Compare commits

2773 Commits

Author SHA1 Message Date
Anna Stuchlik
933192a0bb doc: replace Scylla with ScyllaDB in Glossary
This commit replaces "Scylla" with "ScyllaDB" on the Glossary page.

The product has been rebranded as "ScyllaDB".
2024-04-18 13:10:15 +02:00
Aleksandr Bykov
e8833c6f2a test: Kill coordinator during topology operation
If the coordinator node is killed, restarted, or otherwise becomes
inoperable during a topology operation, a new coordinator should be elected,
the operation should be aborted, and the cluster should be rolled back.

Error injection will be used to kill the coordinator before streaming
starts

Closes scylladb/scylladb#16197
2024-04-17 17:24:20 +02:00
Tomasz Grabiec
c6c8347493 migration_manager: Pull all of group0 state on repair
Current code uses non-raft path to pull the schema, which violates
group0 linearizability because the node will have latest schema but
miss group0 updates of other system tables. In particular,
system.tablets. This manifests as repair errors due to missing
tablet_map for a given table when trying to access it. Tablet map is
always created together with the table in the same group0 command.

When a node is bootstrapping, repair calls sync_schema() to make
sure local schema is up to date. This races with group0 catch up,
and if sync_schema() wins, repair may fail on a missing tablet map.

Fix by making sync_schema() do a group0 read barrier when in raft
mode.

Fixes #18002

Closes scylladb/scylladb#18175
2024-04-17 16:21:05 +02:00
Nadav Har'El
e78fc75323 Merge 'tools/scylla-nodetool: add doc link for getsstables and sstableinfo commands' from Botond Dénes
Just like all the other commands already have it. These commands didn't have documentation at the point where they were implemented, hence the missing doc link.

The links don't work yet, but they will work once we release 6.0 and the current master documentation is promoted to stable.

Closes scylladb/scylladb#18147

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: fix typo: Fore -> For
  tools/scylla-nodetool: add doc link for getsstables and sstableinfo commands
2024-04-17 15:15:56 +03:00
Asias He
642f9a1966 repair: Improve estimated_partitions to reduce memory usage
Currently, we use the sum of the estimated_partitions from each
participant node as the estimated_partitions for sstable produced by
repair. This way, the estimated_partitions is the biggest possible
number of partitions repair would write.

Since repair will write only the difference between repair participant
nodes, using the biggest possible estimation will overestimate the
partitions written by repair, most of the time.

The problem is that overestimated partitions makes the bloom filter
consume more memory. It is observed that it causes OOM in the field.

This patch changes the estimation to use a fraction of the average
partitions per node instead of sum. It is still not a perfect estimation
but it already improves memory usage significantly.

Fixes #18140

Closes scylladb/scylladb#18141
2024-04-17 14:31:38 +03:00
Kefu Chai
e431e7dc16 test: partitioner_test: print using fmt::print()
instead of using `operator<<`, use `fmt::print()` to
format and print, so we can ditch the `operator<<`-based formatters.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18259
2024-04-17 07:13:20 +03:00
Kefu Chai
0ff28b2a2a test: extract boost_test_print_type() into test_utils.hh
since Boost.Test relies on operator<< or `boost_test_print_type()`
to print the value of variables being compared, instead of defining
the fallback formatter of `boost_test_print_type()` for each
individual test, let's define it in `test/lib/test_utils.hh`, so
that it can be shared across tests.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18260
2024-04-17 07:12:39 +03:00
Kefu Chai
2bb8e7c3c3 utils: include "seastarx.hh" in composite_abort_source.hh
there is a chance that `utils/small_vector.hh` does not include
`using namespace seastar`, and even if it does, we should not rely
on it. but if it does not, checkhh would fail. so let's include
"seastarx.hh" in this header, so it is self-contained.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18265
2024-04-17 07:11:01 +03:00
David Garcia
6707bc673c docs: update theme 1.7
Closes scylladb/scylladb#18252
2024-04-16 13:48:11 +02:00
Kamil Braun
eb9ba914a3 Merge 'Set dc and rack in gossiper when loaded from system.peers and load the ignored nodes state for replace' from Benny Halevy
The problem this series solves is correctly ignoring DOWN nodes state
when replacing a node.

When a node is replaced and there are other nodes that are down, the
replacing node is told to ignore those DOWN nodes using the
`ignore_dead_nodes_for_replace` option.

Since the replacing node is bootstrapping it starts with an empty
system.peers table so it has no notion about any node state and it
learns about all other nodes via gossip shadow round done in
`storage_service::prepare_replacement_info`.

Normally, since the DOWN nodes to ignore already joined the ring, the
remaining node will have their endpoint state already in gossip, but if
the whole cluster was restarted while those DOWN nodes did not start,
the remaining nodes will only have a partial endpoint state from them,
which is loaded from system.peers.

Currently, the partial endpoint state contains only `HOST_ID` and
`TOKENS`, and in particular it lacks `STATUS`, `DC`, and `RACK`.

The first part of this series loads also `DC` and `RACK` from
system.peers to make them available to the replacing node as they are
crucial for building a correct replication map with network topology
replication strategy.

But still, without a `STATUS` those nodes are not considered as normal
token owners yet, and they do not go through handle_state_normal which
adds them to the topology and token_metadata.

The second part of this series uses the endpoint state retrieved in the
gossip shadow round to explicitly add the ignored nodes' state to
topology (including dc and rack) and token_metadata (tokens) in
`prepare_replacement_info`.  If there are more DOWN nodes that are not
explicitly ignored replace will fail (as it should).

Fixes scylladb/scylladb#15787

Closes scylladb/scylladb#15788

* github.com:scylladb/scylladb:
  storage_service: join_token_ring: load ignored nodes state if replacing
  storage_service: replacement_info: return ignore_nodes state
  locator: host_id_or_endpoint: keep value as variant
  gms: endpoint_state: add getters for host_id, dc_rack, and tokens
  storage_service: topology_state_load: set local STATUS state using add_saved_endpoint
  gossiper: add_saved_endpoint: set dc and rack
  gossiper: add_saved_endpoint: fixup indentation
  gossiper: add_saved_endpoint: make host_id mandatory
  gossiper: add load_endpoint_state
  gossiper: start_gossiping: log local state
2024-04-16 10:27:36 +02:00
Pavel Emelyanov
2c3d6fe72f storage_proxy: Simplify create_hint_sync_point() code
It tries to call container().invoke_on_all() the hard way.
Calling it directly is not possible, because there's no
sharded::invoke_on_all() const overload

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18202
2024-04-16 07:26:06 +03:00
Nadav Har'El
a175e34375 cql-pytest: add instructions on how to get Cassandra
The cql-pytest framework allows running tests also against Cassandra,
but developers need to install Cassandra on their own because modern
distributions such as Fedora no longer carry a Cassandra package.

This patch adds clear and easy to follow (I think) instructions on how
to download a pre-compiled Cassandra, or alternatively how to download
and build Cassandra from source - and how either can be used with the
test/cql-pytest/run-cassandra script.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18138
2024-04-16 07:23:36 +03:00
Botond Dénes
298a7fcbf2 Merge 'Drain view_builder in generic drain' from ScyllaDB
For view builder draining there's dedicated deferred action in main while all other services that need to be drained do it via storage_service. The latter is to unify shutdown for services and to make `nodetool drain` drain everything, not just some part of those. This PR makes view builder drain look the same. As a side effect it also moves `mark_existing_views_as_built` from storage service to view builder and generalizes this marking code inside view builder itself.

refs: #2737
refs: #2795

Closes scylladb/scylladb#16558

* github.com:scylladb/scylladb:
  storage_service: Drain view builder on drain too
  view_builder: Generalize mark_as_built(view_ptr) method
  view_builder: Move mark_existing_views_as_built from storage service
  storage_service: Add view_builder& reference
  main,cql_test_env: Move view_builder start up (and make unconditional)
2024-04-16 07:21:42 +03:00
Pavel Emelyanov
5cf53e670d replica: Remove unused ex variable from table::take_snapshot
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18215
2024-04-16 07:16:38 +03:00
Pavel Emelyanov
f17c594d21 large_data_handler: If-less statistics increment
The partitions_bigger_than_threshold is incremented only if the previous
check detects that the partition exceeds a threshold by its size. It's
done with an extra if, but it can be done without (explicit) condition
as bool type is guaranteed by the standard to convert into integers as
true = 1 and false = 0

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18217
2024-04-16 07:16:05 +03:00
Pavel Emelyanov
0f70d276d2 tools/scylla-sstable: Use shorter check if unordered_set contains a key
Current code counts the number of keys in it just to see if this number
is non-zero. Using the .contains() method is a better fit here

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18219
2024-04-16 07:14:48 +03:00
Pavel Emelyanov
1df7c2a0e9 topology_coordinator: Mark retake_node() const
Runaway from 4d83a8c12c

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18218
2024-04-16 07:13:07 +03:00
Pavel Emelyanov
05c4042511 api/lsa: Don't use database to perform invoke-on-all
The sharded<database> is used as an invoke_on_all() method provider,
there's no real need for the database itself. A simple smp::invoke_on_all()
would work just as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18221
2024-04-16 07:12:40 +03:00
Pavel Emelyanov
4a6291dce5 test/sstable: Use .handle_exception_type() shortcut
Some tests want to ignore out_of_range exception in continuation and go
the longer route for that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18216
2024-04-16 07:11:35 +03:00
Pavel Emelyanov
1612aa01ca cql3: Reserve vector with pk columns
When constructing a vector with partition key data, the size of that
vector is known beforehand

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18239
2024-04-16 07:06:07 +03:00
Pavel Emelyanov
f3edde7d2e api: Qualify callback commitlog* argument with const
There's a helper map-reducer that accepts a function to call on
commitlog. All callers accumulate statistics with it, so the commitlog
argument is const pointer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18238
2024-04-16 07:02:31 +03:00
Botond Dénes
162c9ad6f6 Merge 'gossiper: lock local endpoint when updating heart_beat' from Kamil Braun
In testing, we've observed multiple cases where nodes would fail to
observe updated application states of other nodes in gossiper.

For example:
- in scylladb/scylladb#16902, a node would finish bootstrapping and enter
NORMAL state, propagating this information through gossiper. However,
other nodes would never observe that the node entered NORMAL state,
still thinking that it is in joining state. This would lead to further
bad consequences down the line.
- in scylladb/scylladb#15393, a node got stuck in bootstrap, waiting for
schema versions to converge. Convergence would never be achieved and the
test eventually timed out. The node was observing outdated schema state
of some existing node in gossip.

I created a test that would bootstrap 3 nodes, then wait until they all
observe each other as NORMAL, with timeout. Unfortunately, thousands of
runs of this test on different machines failed to reproduce the problem.

After banging my head against the wall failing to reproduce, I decided
to sprinkle randomized sleeps across multiple places in gossiper code
and finally: the test started catching the problem in about 1 in 1000
runs.

With additional logging and additional head-banging, I determined
the root cause.

The following scenario can happen, 2 nodes are sufficient, let's call
them A and B:
- Node B calls `add_local_application_state` to update its gossiper
  state, for example, to propagate its new NORMAL status.
- `add_local_application_state` takes a copy of the endpoint_state, and
  updates the copy:
```
            auto local_state = *ep_state_before;
            for (auto& p : states) {
                auto& state = p.first;
                auto& value = p.second;
                value = versioned_value::clone_with_higher_version(value);
                local_state.add_application_state(state, value);
            }
```
  `clone_with_higher_version` bumps `version` inside
  gms/version_generator.cc.
- `add_local_application_state` calls `gossiper.replicate(...)`
- `replicate` works in 2 phases to achieve exception safety: in first
  phase it copies the updated `local_state` to all shards into a
  separate map. In second phase the values from separate map are used to
  overwrite the endpoint_state map used for gossiping.

  Due to the cross-shard calls of the first phase, there is a yield before
  the second phase. *During this yield* the following happens:
- `gossiper::run()` loop on B executes and bumps node B's `heart_beat`.
  This uses the monotonic version_generator, so it uses a higher version
  than the ones we used for states added above. Let's call this new version
  X. Note that X is larger than the versions used by application_states
  added above.
- now node B handles a SYN or ACK message from node A, creating
  an ACK or ACK2 message in response. This message contains:
    - old application states (NOT including the update described above,
      because `replicate` is still sleeping before phase 2),
    - but bumped heart_beat == X from `gossiper::run()` loop,
  and sends the message.
- node A receives the message and remembers that the max
  version across all states (including heart_beat) of node B is X.
  This means that it will no longer request or apply states from node B
  with versions smaller than X.
- `gossiper.replicate(...)` on B wakes up, and overwrites
  endpoint_state with the ones it saved in phase 1. In particular it
  reverts heart_beat back to smaller value, but the larger problem is that it
  saves updated application_states that use versions smaller than X.
- now when node B sends the updated application_states in ACK or ACK2
  message to node A, node A will ignore them, because their versions are
  smaller than X. Or node B will never send them, because whenever node
  A requests states from node B, it only requests states with versions >
  X. Either way, node A will fail to observe new states of node B.

If I understand correctly, this is a regression introduced in
38c2347a3c, which introduced a yield in
`replicate`. Before that, the updated state would be saved atomically on
shard 0, there could be no `heart_beat` bump in-between making a copy of
the local state, updating it, and then saving it.

With the description above, it's easy to make a consistent
reproducer for the problem -- introduce a longer sleep in
`add_local_application_state` before second phase of replicate, to
increase the chance that gossiper loop will execute and bump heart_beat
version during the yield. Further commit adds a test based on that.

The fix is to bump the heart_beat under local endpoint lock, which is
also taken by `replicate`.

The PR also adds a regression test.

Fixes: scylladb/scylladb#15393
Fixes: scylladb/scylladb#15602
Fixes: scylladb/scylladb#16668
Fixes: scylladb/scylladb#16902
Fixes: scylladb/scylladb#17493
Fixes: scylladb/scylladb#18118
Ref: scylladb/scylla-enterprise#3720

Closes scylladb/scylladb#18184

* github.com:scylladb/scylladb:
  test: reproducer for missing gossiper updates
  gossiper: lock local endpoint when updating heart_beat
2024-04-16 06:46:24 +03:00
Tzach Livyatan
289793d964 Update Driver root page
The right term is Amazon DynamoDB not AWS DynamoDB
See https://aws.amazon.com/dynamodb/

Closes scylladb/scylladb#18214
2024-04-16 06:41:28 +03:00
Beni Peled
223275b4d1 test.py: add the pytest junit_suite_name parameter
By default the suitename in the junit files generated by pytest
is named `pytest` for all suites instead of the suite, ex. `topology_experimental_raft`
With this change, the junit files will use the real suitename

This change doesn't affect the Test Report in Jenkins, but it
came up as part of another task, publishing the test results to
elasticsearch https://github.com/scylladb/scylla-pkg/pull/3950
where we parse the XMLs and need the correct suitename

Closes scylladb/scylladb#18172
2024-04-15 21:07:00 +03:00
Tomasz Grabiec
95d93c1668 Merge 'Extend tablet_transition_kind::rebuild to remove replicas' from Pavel Emelyanov
When altering rf for a keyspace, all tablets in this ks may have fewer replicas. Part of this process is removing replicas from some node(s). This PR extends the tablets rebuild transition to handle this case by making pending_replica optional.

fixes: #18176

Closes scylladb/scylladb#18203

* github.com:scylladb/scylladb:
  test: Tune up tablet-transition test to check del_replica
  api: Add method to delete replica from tablet
  tablet: Make pending replica optional
2024-04-15 21:01:03 +03:00
Pavel Emelyanov
c60639d582 sstables: Coroutinize drop_caches() method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18220
2024-04-15 17:22:59 +03:00
Pavel Emelyanov
b06b85c270 test: Tune up tablet-transition test to check del_replica
For that the test case is modified to have 3 nodes and 2 replicas on
start. Existing test cases are changed slightly in the way "from" host
is detected.

Also, the final check for data presence is modified to check that hosts
in "replicas" have data and other hosts don't have it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-15 16:31:07 +03:00
Pavel Emelyanov
8bad828208 api: Add method to delete replica from tablet
Copied from the add_replica counterpart

TODO: Generalize common parts of move_tablet and add_|del_tablet_replica

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-15 16:31:07 +03:00
Pavel Emelyanov
725b2863d2 tablet: Make pending replica optional
Just like leaving replica could be optional when adding replica to
tablet, the pending replica can be optional too if we're removing a
replica from tablet

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-15 16:31:07 +03:00
Amnon Heiman
06dc56df01 Update seastar submodule
Fixes scylladb/scylladb#18083

* seastar cd8a9133...f3058414 (18):
  > src/core/metrics.cc: rewrite set_metric_family_configs
  > include/seastar/core/metrics_api.hh: Revert d2929c2ade5bd0125a73d53280c82ae5da86218e
  > sstring: include <fmt/format.h> instead of <fmt/ostream.h>
  > seastar.cc: include used header
  > tls: include used header of <unordered_set>
  > docs: remove unused parameter from handle_connection function of echo-HTTP-server tutorial example
  > stall-analyser: use 0 for the default value of --width
  > http: Move parsed params and urls
  > scripts: use raw string to avoid invalid escape sequences
  > timed_out_error: add fmt::formatter for timed_out_error
  > scripts/stall-analyser: change default branch-threshold to 3%
  > scripts/stall-analyser: resolve string escape sequence warning
  > io_queue: Use static vector for fair groups too
  > io_queue: Use static vector to store fair queues
  > stall-analyser: add space around '=' in param list
  > stall-analyser: add a space between 'var: Type' in type annotation
  > stall-analyser: move variables closer to where they are used
  > memory: drop support for compilers that don't support aligned new

Closes scylladb/scylladb#18235
2024-04-15 15:19:59 +02:00
Tomasz Grabiec
2ceef1d600 scripts: tablet-mon.py: Support for annotating tablets by table id
Closes scylladb/scylladb#18225
2024-04-15 15:19:59 +02:00
Benny Halevy
655d624e01 storage_service: join_token_ring: load ignored nodes state if replacing
When a node bootstraps or replaces a node after full cluster
shutdown and restart, some nodes may be down.

Existing nodes in the cluster load the down nodes TOKENS
(and recently, in this series, also DC and RACK) from system.peers
and then populate locator::topology and token_metadata
accordingly with the down nodes' tokens in storage_service::join_cluster.

However, a bootstrapping/replacing node has no persistent knowledge
of the down nodes, and it learns about their existence only from gossip.
But since the down nodes have unknown status, they never go
through `handle_state_normal` (in gossiper mode) and therefore
they are not accounted as normal token owners.
This is handled by `topology_state_load`, but not with
gossip-based node operations.

This patch updates the ignored nodes (for replace) state in topology
and token_metadata as if they were loaded from system tables,
after calling `prepare_replacement_info` when raft topology changes are
disabled, based on the endpoint_state retrieved in the shadow round
initiated in prepare_replacement_info.

Fixes scylladb/scylladb#15787

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:45:55 +03:00
Benny Halevy
e4c3c07510 storage_service: replacement_info: return ignore_nodes state
Instead of `parse_node_list` resolving host ids to inet_address
let `prepare_replacement_info` get host_id_or_endpoint from
parse_node_list and prepare `loaded_endpoint_state` for
the ignored nodes so it can be used later by the callers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:43:19 +03:00
Benny Halevy
7c2bd8dc34 locator: host_id_or_endpoint: keep value as variant
Rather than allowing both host_id and
endpoint to be kept, keep only one of them
and provide resolve functions that use the
token_metadata to resolve the host_id into
an inet_address or vice versa.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:25:50 +03:00
Benny Halevy
86f1fcdcdd gms: endpoint_state: add getters for host_id, dc_rack, and tokens
Allow getting metadata from the endpoint_state based
on the respective application states instead of going
through the gossiper.

To be used by the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:16:58 +03:00
Benny Halevy
239069eae5 storage_service: topology_state_load: set local STATUS state using add_saved_endpoint
When loading this node endpoint state and it has
tokens in token_metadata, its status can already be set
to normal.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:07:00 +03:00
Benny Halevy
6aaa1b0f48 gossiper: add_saved_endpoint: set dc and rack
When loading endpoint_state from system.peers,
pass the loaded nodes dc/rack info from
storage_service::join_token_ring to gossiper::add_saved_endpoint.

Load the endpoint DC/RACK information to the endpoint_state,
if available so they can propagate to bootstrapping nodes
via gossip, even if those nodes are DOWN after a full cluster-restart.

Note that this change makes the host_id presence
mandatory following https://github.com/scylladb/scylladb/pull/16376.
The reason to do so is that the other states: tokens, dc, and rack
are useless without the host_id.
This change is backward compatible since the HOST_ID application state
was written to system.peers since inception in scylla
and it would be missing only due to potential exception
in older versions that failed to write it.
In this case, manual intervention is needed and
the correct HOST_ID needs to be manually updated in system.peers.

Refs #15787

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:07:00 +03:00
Benny Halevy
468462aa73 gossiper: add_saved_endpoint: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:07:00 +03:00
Benny Halevy
b9e2aa4065 gossiper: add_saved_endpoint: make host_id mandatory
Require all callers to provide a valid host_id parameter.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:07:00 +03:00
Benny Halevy
1061455442 gossiper: add load_endpoint_state
Pack the topology-related data loaded from system.peers
in `gms::load_endpoint_state`, to be used in a following
patch for `add_saved_endpoint`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:06:56 +03:00
Benny Halevy
6b2d94045a gossiper: start_gossiping: log local state
The trace level message hides important information
about the initial node state in gossip.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:06:30 +03:00
Kefu Chai
0be61e51d3 treewide: include <fmt/ostream.h>
this header was previously brought in by seastar's sstring.hh. but
since sstring.hh does not include <fmt/ostream.h> anymore,
`gms/application_state.cc` does not have access to this header.
also, `gms/application_state.cc` should `#include` the used header
by itself.

so, in this change, let's include  <fmt/ostream.h> in `gms/application_state.cc`.
this change addresses the FTBFS with the latest seastar.

the same applies to other places changed in this commit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18193
2024-04-11 11:59:41 +03:00
Pavel Emelyanov
1e0d96cfed storage_service: Drain view builder on drain too
This gets rid of dangling deferred drain on stop and makes nodetool drain
more "consistent" by stopping one more unneeded background activity

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:56:12 +03:00
Pavel Emelyanov
90593f4e82 view_builder: Generalize mark_as_built(view_ptr) method
Marking is performed in two places and they can be generalized

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:56:12 +03:00
Pavel Emelyanov
3c3f2cd337 view_builder: Move mark_existing_views_as_built from storage service
Now it's in the correct component

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:56:11 +03:00
Pavel Emelyanov
895391fb4b storage_service: Add view_builder& reference
Storage service will need to drain v.b. on its drain. Also on cluster
join it marks existing views as built while it's v.b.'s job to do it.
Both will be fixed by the next patches and this is a prerequisite.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:55:07 +03:00
Pavel Emelyanov
f00f1f117b main,cql_test_env: Move view_builder start up (and make unconditional)
Just starting sharded<view_builder> is lightweight, its constructor does
nothing but initializes on-board variables. Real work takes off on
view_builder::start() which is not moved.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:53:33 +03:00
Botond Dénes
c01b19fcb3 Merge 'test/boost: add test for writing large partition notifications' from Ferenc Szili
The current test in boost/cql_query_large_test::test_large_data only checks whether notifications for large rows and cells are written into the system keyspace. It doesn't check this for partitions.

This change adds this check for partitions.

Closes scylladb/scylladb#18189

* github.com:scylladb/scylladb:
  test/boost: added test for large row count warning
  test/boost: add test for writing large partition notifications
2024-04-05 15:35:54 +03:00
Botond Dénes
f6efa17713 Merge 'repair: fix memory counting in repair' from Aleksandra Martyniuk
Repair memory limit includes only the size of frozen mutation
fragments in repair row. The size of other members of repair
row may grow uncontrollably and cause out of memory.

Modify what is counted toward the repair memory limit.

Fixes: #16710.

Closes scylladb/scylladb#17785

* github.com:scylladb/scylladb:
  test: add test for repair_row::size()
  repair: fix memory accounting in repair_row
2024-04-05 14:53:55 +03:00
Tomasz Grabiec
0c74c2c12f Merge 'Extend tablet_transition_kind::rebuild to rebuild tablet to new replica' from Pavel Emelyanov
When altering rf for a keyspace, all tablets in this ks will get more replicas. Part of this process is rebuilding tablets onto new node(s). This PR extends the tablets transition code to support rebuilding of tablet on new replica.

fixes: #18030

Closes scylladb/scylladb#18082

* github.com:scylladb/scylladb:
  test: Check data presence as well
  test: Test how tablets are copied between nodes
  test: Add sanity test for tablet migration
  api: Add method to add replica to a tablet
  tablet: Make leaving replica optional
2024-04-05 12:51:10 +02:00
Ferenc Szili
443192e36d test/boost: added test for large row count warning
2024-04-05 11:50:09 +02:00
Pavel Emelyanov
639cc1f576 compaction: Replace formatted_sstables_list with fmt:: facilities
The formatted_sstables_list is auxiliary class that collects a bunch of
sstables::to_string(shared_sstable)-generated strings. One of bad side
effects of this helper is that it allocates memory for the vector of
strings.

This patch achieves the same goal with the help of fmt::join() equipped
with transformed boost adaptor.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18160
2024-04-05 09:17:15 +03:00
Kefu Chai
ff43628b44 gms: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18194
2024-04-05 08:48:17 +03:00
Pavel Emelyanov
2a98e95cd0 api: Coroutinize API get_snapshot_details handler
Now it's possible to understand what it does

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18190
2024-04-04 22:20:28 +03:00
Kamil Braun
72955093eb test: reproducer for missing gossiper updates
Regression test for scylladb/scylladb#17493.
2024-04-04 18:47:01 +02:00
Kamil Braun
a0b331b310 gossiper: lock local endpoint when updating heart_beat
In testing, we've observed multiple cases where nodes would fail to
observe updated application states of other nodes in gossiper.

For example:
- in scylladb/scylladb#16902, a node would finish bootstrapping and enter
NORMAL state, propagating this information through gossiper. However,
other nodes would never observe that the node entered NORMAL state,
still thinking that it is in joining state. This would lead to further
bad consequences down the line.
- in scylladb/scylladb#15393, a node got stuck in bootstrap, waiting for
schema versions to converge. Convergence would never be achieved and the
test eventually timed out. The node was observing outdated schema state
of some existing node in gossip.

I created a test that would bootstrap 3 nodes, then wait until they all
observe each other as NORMAL, with timeout. Unfortunately, thousands of
runs of this test on different machines failed to reproduce the problem.

After banging my head against the wall failing to reproduce, I decided
to sprinkle randomized sleeps across multiple places in gossiper code
and finally: the test started catching the problem in about 1 in 1000
runs.

With additional logging and additional head-banging, I determined
the root cause.

The following scenario can happen, 2 nodes are sufficient, let's call
them A and B:
- Node B calls `add_local_application_state` to update its gossiper
  state, for example, to propagate its new NORMAL status.
- `add_local_application_state` takes a copy of the endpoint_state, and
  updates the copy:
```
            auto local_state = *ep_state_before;
            for (auto& p : states) {
                auto& state = p.first;
                auto& value = p.second;
                value = versioned_value::clone_with_higher_version(value);
                local_state.add_application_state(state, value);
            }
```
  `clone_with_higher_version` bumps `version` inside
  gms/version_generator.cc.
- `add_local_application_state` calls `gossiper.replicate(...)`
- `replicate` works in 2 phases to achieve exception safety: in first
  phase it copies the updated `local_state` to all shards into a
  separate map. In second phase the values from separate map are used to
  overwrite the endpoint_state map used for gossiping.

  Due to the cross-shard calls of the first phase, there is a yield before
  the second phase. *During this yield* the following happens:
- `gossiper::run()` loop on B executes and bumps node B's `heart_beat`.
  This uses the monotonic version_generator, so it uses a higher version
  than the ones we used for states added above. Let's call this new version
  X. Note that X is larger than the versions used by application_states
  added above.
- now node B handles a SYN or ACK message from node A, creating
  an ACK or ACK2 message in response. This message contains:
    - old application states (NOT including the update described above,
      because `replicate` is still sleeping before phase 2),
    - but bumped heart_beat == X from `gossiper::run()` loop,
  and sends the message.
- node A receives the message and remembers that the max
  version across all states (including heart_beat) of node B is X.
  This means that it will no longer request or apply states from node B
  with versions smaller than X.
- `gossiper.replicate(...)` on B wakes up, and overwrites
  endpoint_state with the ones it saved in phase 1. In particular it
  reverts heart_beat back to smaller value, but the larger problem is that it
  saves updated application_states that use versions smaller than X.
- now when node B sends the updated application_states in ACK or ACK2
  message to node A, node A will ignore them, because their versions are
  smaller than X. Or node B will never send them, because whenever node
  A requests states from node B, it only requests states with versions >
  X. Either way, node A will fail to observe new states of node B.
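
The interleaving above can be replayed with a toy model. This is only a sketch in Python, not gossiper code: `VersionGenerator` and `simulate` are made-up names, node A is reduced to "remember the max version seen", and the `lock_heart_beat` switch stands in for bumping heart_beat under the lock that `replicate` holds.

```python
import itertools

class VersionGenerator:
    """Monotonic counter, standing in for gms/version_generator.cc."""
    def __init__(self):
        self._counter = itertools.count(1)

    def next(self):
        return next(self._counter)

def simulate(lock_heart_beat):
    """Return the status of node B as finally observed by node A."""
    gen = VersionGenerator()
    # Endpoint state of B that is used for gossiping.
    published = {"STATUS": ("joining", gen.next()), "heart_beat": gen.next()}

    # add_local_application_state: copy the state, update the copy.
    local_copy = dict(published)
    local_copy["STATUS"] = ("NORMAL", gen.next())

    # replicate() phase 1 yields here. Without the lock the gossiper loop
    # bumps heart_beat during the yield; with the lock the bump has to wait.
    if not lock_heart_beat:
        published["heart_beat"] = gen.next()

    # B answers A during the yield; A remembers the max version it has seen.
    max_seen_by_a = max(published["STATUS"][1], published["heart_beat"])

    # replicate() phase 2: overwrite the gossiped state with the saved copy.
    published = local_copy
    if lock_heart_beat:
        published["heart_beat"] = gen.next()

    # A only applies states whose version is greater than what it has seen.
    status, version = published["STATUS"]
    return status if version > max_seen_by_a else "joining"
```

In this model `simulate(False)` leaves A believing that B is still joining, while `simulate(True)` lets A observe NORMAL.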

If I understand correctly, this is a regression introduced in
38c2347a3c, which introduced a yield in
`replicate`. Before that, the updated state would be saved atomically on
shard 0, there could be no `heart_beat` bump in-between making a copy of
the local state, updating it, and then saving it.

With the description above, it's easy to make a consistent
reproducer for the problem -- introduce a longer sleep in
`add_local_application_state` before second phase of replicate, to
increase the chance that gossiper loop will execute and bump heart_beat
version during the yield. Further commit adds a test based on that.

The fix is to bump the heart_beat under local endpoint lock, which is
also taken by `replicate`.

Fixes: scylladb/scylladb#15393
Fixes: scylladb/scylladb#15602
Fixes: scylladb/scylladb#16668
Fixes: scylladb/scylladb#16902
Fixes: scylladb/scylladb#17493
Fixes: scylladb/scylladb#18118
Ref: scylladb/scylla-enterprise#3720
2024-04-04 18:46:56 +02:00
Ferenc Szili
5624abfbeb test/boost: add test for writing large partition notifications
The current test in boost/cql_query_large_test::test_large_data only checks whether notifications for large rows and cells are written into the system keyspace. It doesn't check this for partitions.

This change adds this check for partitions.
2024-04-04 17:33:23 +02:00
Pavel Emelyanov
c7908c319f test: Check data presence as well
Other than making sure that system.tablets is updated with correct
replica set, it's also good to check that the data is present on the
respective nodes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 18:01:24 +03:00
Aleksandra Martyniuk
51c09a84cc test: add test for repair_row::size()
Add a test which checks whether repair_row::size() considers external
memory.
2024-04-04 16:03:05 +02:00
Aleksandra Martyniuk
a4dc6553ab repair: fix memory accounting in repair_row
In repair, only the size of frozen mutation fragments of repair row
is counted to the memory limit. So, huge keys of repair rows may
lead to OOM.

Include other repair_row's members' memory size in repair memory
limit.
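
A rough illustration of the accounting gap, as a Python sketch. `RepairRow` here is a made-up stand-in with two byte strings, not the real class; the real repair_row has more members.

```python
from dataclasses import dataclass

@dataclass
class RepairRow:
    key: bytes              # stands in for the row's keys (external memory)
    frozen_fragment: bytes  # stands in for the frozen mutation fragment

    def size_before_fix(self) -> int:
        # Only the frozen mutation fragment is counted.
        return len(self.frozen_fragment)

    def size(self) -> int:
        # The other members' memory is counted as well.
        return len(self.frozen_fragment) + len(self.key)

# A row with a huge key and a tiny fragment is almost free under the old rule.
row = RepairRow(key=b"k" * 1_000_000, frozen_fragment=b"v" * 100)
print(row.size_before_fix(), row.size())
```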
2024-04-04 15:50:53 +02:00
Raphael S. Carvalho
9f93dd9fa3 replica: Use flat_hash_map for tablet storage
The reason that we want to switch to flat_hash_map is that only a small
subset of tablets will be allocated on any given shard, therefore it's
wasteful to use a sparse array, and iterations are slow.
Also, the map gives greater development flexibility as one doesn't have
to worry about empty entries.
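
To make the argument concrete, here is a small Python sketch of the two layouts. The tablet ids and the `sg…` values are invented for the example; a list with empty slots plays the role of the sparse array and a dict plays the role of the hash map.

```python
tablet_count = 128
# Tablets allocated on this shard: tablet id -> storage group (hypothetical).
owned_here = {3: "sg3", 77: "sg77"}

# Sparse array: one slot per tablet of the table, mostly empty on this shard.
sparse = [owned_here.get(i) for i in range(tablet_count)]
visited_sparse = sum(1 for _ in sparse)
# Every user of the array has to skip the empty entries.
groups_from_sparse = [sg for sg in sparse if sg is not None]

# Hash map: only the tablets allocated on this shard are stored.
visited_map = sum(1 for _ in owned_here.values())

print(visited_sparse, visited_map)
```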

perf result:

-- reads

scylla_with_chunked_vector-read-no-tablets.txt
median 73223.28 tps ( 62.3 allocs/op,  13.3 tasks/op,   41932 insns/op,        0 errors)
median 74952.87 tps ( 62.3 allocs/op,  13.3 tasks/op,   41969 insns/op,        0 errors)
median 73016.37 tps ( 62.3 allocs/op,  13.3 tasks/op,   41934 insns/op,        0 errors)
median 74078.14 tps ( 62.3 allocs/op,  13.3 tasks/op,   41938 insns/op,        0 errors)
median 75323.07 tps ( 62.3 allocs/op,  13.3 tasks/op,   41944 insns/op,        0 errors)

scylla_with_hash_map-read-no-tablets.txt
median 74963.30 tps ( 62.3 allocs/op,  13.3 tasks/op,   41926 insns/op,        0 errors)
median 74032.09 tps ( 62.3 allocs/op,  13.3 tasks/op,   41918 insns/op,        0 errors)
median 74850.09 tps ( 62.3 allocs/op,  13.3 tasks/op,   41937 insns/op,        0 errors)
median 74239.37 tps ( 62.3 allocs/op,  13.3 tasks/op,   41921 insns/op,        0 errors)
median 74798.14 tps ( 62.3 allocs/op,  13.3 tasks/op,   41925 insns/op,        0 errors)

scylla_with_chunked_vector-read-tablets-1.txt
median 74234.27 tps ( 62.1 allocs/op,  13.3 tasks/op,   41903 insns/op,        0 errors)
median 75775.98 tps ( 62.1 allocs/op,  13.3 tasks/op,   41910 insns/op,        0 errors)
median 76481.56 tps ( 62.1 allocs/op,  13.2 tasks/op,   41874 insns/op,        0 errors)
median 74056.67 tps ( 62.1 allocs/op,  13.3 tasks/op,   41894 insns/op,        0 errors)
median 75287.68 tps ( 62.1 allocs/op,  13.3 tasks/op,   41894 insns/op,        0 errors)

scylla_with_hash_map-read-tablets-1.txt
median 75613.63 tps ( 62.1 allocs/op,  13.2 tasks/op,   41990 insns/op,        0 errors)
median 74819.51 tps ( 62.1 allocs/op,  13.2 tasks/op,   41973 insns/op,        0 errors)
median 75648.41 tps ( 62.1 allocs/op,  13.3 tasks/op,   42025 insns/op,        0 errors)
median 74170.89 tps ( 62.1 allocs/op,  13.2 tasks/op,   42002 insns/op,        0 errors)
median 75447.72 tps ( 62.1 allocs/op,  13.3 tasks/op,   41952 insns/op,        0 errors)

scylla_with_chunked_vector-read-tablets-128.txt
median 73788.57 tps ( 62.1 allocs/op,  13.2 tasks/op,   41956 insns/op,        0 errors)
median 76563.63 tps ( 62.1 allocs/op,  13.3 tasks/op,   42006 insns/op,        0 errors)
median 75536.12 tps ( 62.1 allocs/op,  13.2 tasks/op,   42005 insns/op,        0 errors)
median 74679.17 tps ( 62.1 allocs/op,  13.3 tasks/op,   41958 insns/op,        0 errors)
median 75380.95 tps ( 62.1 allocs/op,  13.2 tasks/op,   41946 insns/op,        0 errors)

scylla_with_hash_map-read-tablets-128.txt
median 75459.99 tps ( 62.1 allocs/op,  13.3 tasks/op,   42055 insns/op,        0 errors)
median 74280.11 tps ( 62.1 allocs/op,  13.3 tasks/op,   42085 insns/op,        0 errors)
median 74502.61 tps ( 62.1 allocs/op,  13.3 tasks/op,   42063 insns/op,        0 errors)
median 74692.27 tps ( 62.1 allocs/op,  13.3 tasks/op,   41994 insns/op,        0 errors)
median 75402.64 tps ( 62.1 allocs/op,  13.3 tasks/op,   42015 insns/op,        0 errors)

-- writes

scylla_with_chunked_vector-write-no-tablets.txt
median 68635.17 tps ( 58.4 allocs/op,  13.3 tasks/op,   52709 insns/op,        0 errors)
median 68716.36 tps ( 58.4 allocs/op,  13.3 tasks/op,   52691 insns/op,        0 errors)
median 68512.76 tps ( 58.4 allocs/op,  13.3 tasks/op,   52721 insns/op,        0 errors)
median 68606.14 tps ( 58.4 allocs/op,  13.3 tasks/op,   52696 insns/op,        0 errors)
median 68619.25 tps ( 58.4 allocs/op,  13.3 tasks/op,   52697 insns/op,        0 errors)

scylla_with_hash_map-write-no-tablets.txt
median 67678.10 tps ( 58.4 allocs/op,  13.3 tasks/op,   52723 insns/op,        0 errors)
median 67966.06 tps ( 58.4 allocs/op,  13.3 tasks/op,   52736 insns/op,        0 errors)
median 67881.47 tps ( 58.4 allocs/op,  13.3 tasks/op,   52743 insns/op,        0 errors)
median 67856.81 tps ( 58.4 allocs/op,  13.3 tasks/op,   52730 insns/op,        0 errors)
median 67812.58 tps ( 58.4 allocs/op,  13.3 tasks/op,   52740 insns/op,        0 errors)

scylla_with_chunked_vector-write-tablets-1.txt
median 67741.83 tps ( 58.4 allocs/op,  13.3 tasks/op,   53425 insns/op,        0 errors)
median 68014.20 tps ( 58.4 allocs/op,  13.3 tasks/op,   53455 insns/op,        0 errors)
median 68228.48 tps ( 58.4 allocs/op,  13.3 tasks/op,   53447 insns/op,        0 errors)
median 67950.96 tps ( 58.4 allocs/op,  13.3 tasks/op,   53443 insns/op,        0 errors)
median 67832.69 tps ( 58.4 allocs/op,  13.3 tasks/op,   53462 insns/op,        0 errors)

scylla_with_hash_map-write-tablets-1.txt
median 66873.70 tps ( 58.4 allocs/op,  13.3 tasks/op,   53548 insns/op,        0 errors)
median 67568.23 tps ( 58.4 allocs/op,  13.3 tasks/op,   53547 insns/op,        0 errors)
median 67653.70 tps ( 58.4 allocs/op,  13.3 tasks/op,   53525 insns/op,        0 errors)
median 67389.21 tps ( 58.4 allocs/op,  13.3 tasks/op,   53536 insns/op,        0 errors)
median 67437.91 tps ( 58.4 allocs/op,  13.3 tasks/op,   53537 insns/op,        0 errors)

scylla_with_chunked_vector-write-tablets-128.txt
median 67115.41 tps ( 58.3 allocs/op,  13.3 tasks/op,   53341 insns/op,        0 errors)
median 66836.07 tps ( 58.3 allocs/op,  13.3 tasks/op,   53342 insns/op,        0 errors)
median 67214.07 tps ( 58.3 allocs/op,  13.3 tasks/op,   53303 insns/op,        0 errors)
median 67198.25 tps ( 58.3 allocs/op,  13.3 tasks/op,   53347 insns/op,        0 errors)
median 67368.78 tps ( 58.3 allocs/op,  13.3 tasks/op,   53374 insns/op,        0 errors)

scylla_with_hash_map-write-tablets-128.txt
median 66273.50 tps ( 58.3 allocs/op,  13.3 tasks/op,   53400 insns/op,        0 errors)
median 66564.89 tps ( 58.3 allocs/op,  13.3 tasks/op,   53432 insns/op,        0 errors)
median 66568.52 tps ( 58.3 allocs/op,  13.3 tasks/op,   53408 insns/op,        0 errors)
median 66368.00 tps ( 58.3 allocs/op,  13.3 tasks/op,   53441 insns/op,        0 errors)
median 66293.55 tps ( 58.3 allocs/op,  13.3 tasks/op,   53408 insns/op,        0 errors)
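
One way to compare the runs above is to take the median of the five reported values per configuration. The Python sketch below does that for two of the write configurations, with the tps values copied from this message; the helper name `relative_change` is made up for the example.

```python
from statistics import median

# tps values copied from the write results above.
write_tps = {
    "no-tablets": {
        "chunked_vector": [68635.17, 68716.36, 68512.76, 68606.14, 68619.25],
        "hash_map": [67678.10, 67966.06, 67881.47, 67856.81, 67812.58],
    },
    "tablets-128": {
        "chunked_vector": [67115.41, 66836.07, 67214.07, 67198.25, 67368.78],
        "hash_map": [66273.50, 66564.89, 66568.52, 66368.00, 66293.55],
    },
}

def relative_change(before, after):
    """Percentage change of the median tps going from `before` to `after`."""
    b, a = median(before), median(after)
    return (a - b) / b * 100.0

for name, runs in write_tps.items():
    change = relative_change(runs["chunked_vector"], runs["hash_map"])
    print(f"write {name}: {change:+.2f}%")
```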

Fixes #18010.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18093
2024-04-04 16:25:48 +03:00
Yaniv Kaul
2ce2649ec1 Typo: you -> your
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#17806
2024-04-04 14:55:46 +03:00
Nadav Har'El
c24bc3b57a alternator: do not use tablets on new Alternator tables
A few months ago, in merge d3c1be9107,
we decided that if Scylla has the experimental "tablets" feature enabled,
new Alternator tables should use this feature by default - exactly like
this is the default for new CQL tables.

Sadly, it was now decided to reverse this decision: We do not yet trust
LWT on tablets enough, and since Alternator often (if not always) relies
on LWT, we want Alternator tables to continue to use vnodes - not tablets.

The fix is trivial - just changing the default. No test needed to change
because anyway, all Alternator tests work correctly on Scylla with the
tablets experimental feature disabled. I added a new test to enshrine
the fact that Alternator does not use tablets.

An unfortunate result of this patch will be that Alternator tables
created on versions with this patch (e.g., Scylla 6.0) will not use
tablets and will continue to not use tablets even if Scylla is upgraded
(currently, the use of tablets is decided at table creation time, and
there is no way to "upgrade" a vnode-based table to be tablet based).

This patch should be reverted as soon as LWT support matures on tablets.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18157
2024-04-04 12:11:29 +03:00
Pavel Emelyanov
1c1004d1bd sstables_loader: Format list of sstables' filenames in place
Loader wants to print set of sstables' names. For that it collects names
into a dedicated vector, then prints it using fmt/ranges facility.

There's a way to achieve the same goal without allocating extra vector
with names -- use fmt::format() and pass it a range converting sstables
into their names.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18159
2024-04-04 12:09:52 +03:00
Ferenc Szili
f1cc6252fd logging: Don't log PK/CK in large partition/row/cell warning
Currently, Scylla logs a warning when it writes a cell, row or partition that is larger than a configured size. These warnings contain the partition key and, in the case of rows and cells, also the clustering key, which allow the large row or partition to be identified. However, these keys can contain user-private, sensitive information. The information which identifies the partition/row/cell is also inserted into the tables system.large_partitions, system.large_rows and system.large_cells, respectively.

This change removes the partition and clustering keys from the log messages, but still inserts them into the system tables.

The logged data will look like this:

Large cells:
WARN  2024-04-02 16:49:48,602 [shard 3:  mt] large_data - Writing large cell ks_name/tbl_name: cell_name (SIZE bytes) to sstable.db

Large rows:
WARN  2024-04-02 16:49:48,602 [shard 3:  mt] large_data - Writing large row ks_name/tbl_name: (SIZE bytes) to sstable.db

Large partitions:
WARN  2024-04-02 16:49:48,602 [shard 3:  mt] large_data - Writing large partition ks_name/tbl_name: (SIZE bytes) to sstable.db

Fixes #18041

Closes scylladb/scylladb#18166
2024-04-04 12:06:31 +03:00
Kefu Chai
3b50c39a83 scylla-gdb: access io_queue::_streams and io_queue::_fgs with static_vector
in seastar's b28342fa5a301de3facf5e83dc691524a6b20604, we switched
* `io_queue::_streams` from
  `boost::container::small_vector<fair_queue, 2>` to
  `boost::container::static_vector<fair_queue, 2>`
* `io_queue::_fgs` from
  `std::vector<std::unique_ptr<fair_group>>` to
  `boost::container::static_vector<fair_group, 2>`

so we need to update the gdb script accordingly to reflect this
change, and to avoid the nested try-except blocks, we switch to
a `while` statement to simplify the code structure.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18165
2024-04-04 11:39:10 +03:00
Anna Stuchlik
994f807bf6 docs: add the latest image info to GCP and Azure pages
This commit adds image information for the latest patch release
to the GCP and Azure deployment page.
The information now replaces the reference to the Download Center
so that the user doesn't have to jump to another website.

Fixes https://github.com/scylladb/scylladb/issues/18144

Closes scylladb/scylladb#18168
2024-04-04 11:24:39 +03:00
Kefu Chai
64b8bb239f api/storage_service: throw if table is not found when move tablets
`database::find_column_family()` throws no_such_column_family
if an unknown ks.cf is fed to it. and we call into this function
without checking for the existence of ks.cf first. since
"/storage_service/tablets/move" is a public interface, we should
translate this error to a better http error.

in this change, we check for the existence of the given ks.cf, and
throw an exception so that it can be caught by seastar::httpd::routers,
and converted to an HTTP error.

Fixes #17198
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17217
2024-04-04 11:23:52 +03:00
Pavel Emelyanov
590f0329ae test: Test how tablets are copied between nodes
This patches the previously introduced test by introducing the 'action'
test parameter and tweaking the final checking assertions around tablet
replicas read from system.tablets

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 09:22:57 +03:00
Pavel Emelyanov
28964ba5fe test: Add sanity test for tablet migration
It just checks that after api call to move_tablet the resulting replica
is in expected state. This test will be later expanded to check for
rebuild transition.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 09:22:31 +03:00
Pavel Emelyanov
79ad760e95 api: Add method to add replica to a tablet
The new API submits rebuild transition with new replicas set to be old
(current) replicas plus the provided one. It looks and acts like the
move_tablet API call with several changes:

- lacks the "source" replica argument
- submits "rebuild" transition kind
- cross racks checks are not performed

The 'force' argument is inherited from move_tablet, but is unused now
and is left for future.
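
The difference in the resulting replica set can be sketched as follows. This is a Python model only: the function names mirror the API calls, replicas are written as (host, shard) pairs for illustration, and the "migration" kind name used for move_tablet is an assumption of this sketch.

```python
def move_tablet(replicas, src, dst):
    """Model of move_tablet: the source replica is replaced by the destination."""
    return "migration", [dst if r == src else r for r in replicas]

def add_tablet_replica(replicas, new_replica):
    """Model of the new API: no source, the replica set grows by one."""
    return "rebuild", [*replicas, new_replica]

current = [("host1", 0), ("host2", 3)]
print(move_tablet(current, ("host2", 3), ("host3", 1)))
print(add_tablet_replica(current, ("host3", 1)))
```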

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 09:22:16 +03:00
Tomasz Grabiec
1a839bcb36 main: Skip tablet metadata loading in maintenance mode
If system.tablets is corrupted, the node would not boot in maintenance
mode, which is needed to fix system.tablets.

Closes scylladb/scylladb#17990
2024-04-04 09:20:09 +03:00
Pavel Emelyanov
b0cba57e29 tablet: Make leaving replica optional
When getting the leaving replica from tablet info and transition info,
the getter code assumes that this replica always exists. It's not going
to be the case soon, so make the return value be optional.

There are four places that mess with leaving replica:

- stream tablet handler: this place checks that the leaving replica is
  _not_ current host. If leaving replica is missing, the check should
  pass

- cleanup tablet handler: this place checks that the leaving replica
  _is_ current host. If leaving replica is missing, the check should
  fail as well

- topology coordinator: it gets leaving replica to call cleanup on. If
  leaving replica is missing, the cleanup call is short-circuited to
  succeed immediately

- load-stats calculator: it checks if the leaving replica is self. This
  check is not patched as it's automatically satisfied by std::optional
  comparison operator overload for wrapped type
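
The four cases can be written down as a small Python model, with `None` playing the role of an empty std::optional. All function names are invented for the sketch and hosts are plain strings.

```python
from typing import Callable, Optional

def stream_check_passes(leaving: Optional[str], this_host: str) -> bool:
    # Stream handler: the leaving replica must NOT be this host.
    # A missing leaving replica passes the check.
    return leaving is None or leaving != this_host

def cleanup_check_passes(leaving: Optional[str], this_host: str) -> bool:
    # Cleanup handler: the leaving replica must BE this host.
    # A missing leaving replica fails the check.
    return leaving is not None and leaving == this_host

def coordinator_cleanup(leaving: Optional[str],
                        call_cleanup: Callable[[str], bool]) -> bool:
    # Topology coordinator: nothing to clean up, succeed immediately.
    if leaving is None:
        return True
    return call_cleanup(leaving)

def leaving_is_self(leaving: Optional[str], this_host: str) -> bool:
    # Load-stats calculator: plain equality, an empty value never matches.
    return leaving == this_host
```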

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 09:03:36 +03:00
Michał Chojnowski
8147ab69ac row_cache_test: avoid a throw in external_updater
In test_exception_safety_of_update_from_memtable, we have a potential
throw from external_updater.

external_updater is supposed to be infallible.
Scylla currently aborts when an external_updater throws, so a throw from
there just fails the test.

This isn't intended. We aren't testing external_updater in this test.

Fixes #18163

Closes scylladb/scylladb#18171
2024-04-03 23:22:08 +02:00
Piotr Dulikowski
baae811142 Merge 'auth: keep auth version in scylla_local' from Marcin Maliszkiewicz
Before the patch, selection of the auth version depended
on the consistent topology feature, but during the raft recovery
procedure this feature is disabled, so we need to persist
the version somewhere to not switch back to v1, as this
is not supported.

During recovery auth works in read-only mode, writes
will fail.

Fixes https://github.com/scylladb/scylladb/issues/17736

Closes scylladb/scylladb#18039

* github.com:scylladb/scylladb:
  auth: keep auth version in scylla_local
  auth: coroutinize service::start
2024-04-03 12:25:56 +02:00
Kefu Chai
e2f3fed373 service: qos: fix a typo
s/accesor/accessor/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18124
2024-04-03 10:33:54 +02:00
Raphael S. Carvalho
12714a4123 locator: Avoid tablet map lookup on every write for getting replicas
We can cache tablet map in erm, to avoid looking it up on every write for
getting write replicas. We do that in tablet_sharder, but not in tablet
erm. Tablet map is immutable in the context of a given erm, so the
address of the map is stable during erm lifetime.

This caught my attention when looking at perf diff output
(comparing tablet and vnode modes).

It also helps when erm is called again on write completion for
checking locality, used for forwarding info to the driver if needed.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18158
2024-04-03 10:28:04 +02:00
Botond Dénes
d43670046b test/lib: random_schema: disallow boolean_type in keys
They result in poor distribution and poor cardinality, interfering with
tests which want to generate N partitions or rows.
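
The cardinality problem is easy to see in isolation. A Python sketch, not using random_schema itself: ask for 1000 keys and count how many distinct ones come out.

```python
import random

rng = random.Random(0)  # fixed seed, only to make the run repeatable
wanted = 1000

# A boolean key can only ever take two values, however many are requested.
bool_keys = {rng.choice([True, False]) for _ in range(wanted)}
# A 32-bit integer key, for comparison.
int_keys = {rng.randrange(2**32) for _ in range(wanted)}

print(len(bool_keys), len(int_keys))
```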

Fixes: #17821

Closes scylladb/scylladb#17856
2024-04-03 09:52:36 +03:00
Botond Dénes
2cb5dcabf7 docs/dev/maintainer.md: document another exception to rule no.0
Maintainers are also allowed to commit their own backport PR. They are
allowed to backport their own code, opening a PR to get a CI run for a
backport doesn't change this.

Closes scylladb/scylladb#17727
2024-04-03 09:51:19 +03:00
Botond Dénes
6771c646c4 tools/scylla-nodetool: fix typo: Fore -> For 2024-04-03 02:16:59 -04:00
Botond Dénes
b6db56286a tools/scylla-nodetool: add doc link for getsstables and sstableinfo commands
Just like all the other commands already have it. These commands didn't
have documentation at the point where they were implemented, hence the
missing doc link.

The links don't work yet, but they will work once we release 6.0 and the
current master documentation is promoted to stable.
2024-04-03 02:16:03 -04:00
Piotr Dulikowski
3ba7a4ead2 Merge 'api: upgrade_to_raft topology: add logging' from Benny Halevy
Upgrading raft topology is an important api call
that should be logged.

When failed, it is also important to log the
exception to get better visibility into why
the call failed.

Closes scylladb/scylladb#18143

* github.com:scylladb/scylladb:
  api: storage_service: upgrade_to_raft_topology: fixup indentation
  api: storage_service: upgrade_to_raft_topology: add logging
2024-04-03 07:00:10 +02:00
Pavel Emelyanov
8550a38a8b cql: Reserve vector of column definitions in advance
The vector in question is populated from the content of another map, so
its size is known in advance

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18155
2024-04-02 22:35:10 +03:00
Marcin Maliszkiewicz
562caaf6c6 auth: keep auth version in scylla_local
Before the patch, selection of the auth version depended
on the consistent topology feature, but during the raft recovery
procedure this feature is disabled, so we need to persist
the version somewhere to not switch back to v1, as this
is not supported.

During recovery auth works in read-only mode, writes
will fail.
2024-04-02 19:04:21 +02:00
Benny Halevy
1272d736c0 api: storage_service: upgrade_to_raft_topology: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-02 20:02:51 +03:00
Benny Halevy
31026ae27f api: storage_service: upgrade_to_raft_topology: add logging
Upgrading raft topology is an important api call
that should be logged.

When failed, it is also important to log the
exception to get better visibility into why
the call failed.

Indentation will be fixed in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-02 20:02:49 +03:00
Kefu Chai
15d59db98b cql3: select_statement: include <ranges>
we should include the headers we use, to avoid compilation failures like:
```
cql3/statements/select_statement.cc:229:79: error: no member named 'filter' in namespace 'std::ranges::views'
        for (const auto& used_function : used_functions | std::ranges::views::filter(not_native)) {
                                                          ~~~~~~~~~~~~~~~~~~~~^
1 error generated.
```
if one of the included headers drops its own `#include <optional>`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18145
2024-04-02 18:47:54 +03:00
Botond Dénes
2179bfc40d Merge 'Relax initialization of virtual tables' from Pavel Emelyanov
It now happens in initialize_virtual_tables(), but this function is split into sub-calls and iterates over the virtual tables map several times to do its work. This PR squashes it into straightforward code which is shorter and, hopefully, easier to read.

Closes scylladb/scylladb#18133

* github.com:scylladb/scylladb:
  virtual_tables: Open-code install_virtual_readers_and_writers()
  virtual_tables: Move readers setup loop into add_table()
  virtual_tables: Move tables creation loop into add_table()
  virtual_tables: Make add_tablet() a coroutine
  virtual_tables: Open-code register_virtual_tables()
2024-04-02 13:39:26 +03:00
Botond Dénes
469ff4f290 Merge 'repair: Load repair history in background' from Asias He
Currently, we load the repair history during boot up. If the number of
repair history entries is high, it might take a while to load them.

In my test, to load 10M entries, it took around 60 seconds.

It is not a must to load the entries during boot up. It is better to
load them in the background to speed up the boot time.

Fixes #17993

Closes scylladb/scylladb#17994

* github.com:scylladb/scylladb:
  repair: Load repair history in background
  repair: Abort load_history process in shutdown
2024-04-02 10:53:10 +03:00
Botond Dénes
fd12052c89 Update tools/java/ submodule
* tools/java/ d61296dc...b810e8b0 (1):
  > do not include {dclocal_,}read_repair_chance if not enabled
2024-04-02 10:47:57 +03:00
Yaron Kaikov
fcdb80773e github: sync-labels: run only in scylladb oss repo
We currently support the sync-label workflow only in OSS. Since Scylla-enterprise
gets all the commits from the OSS repo, the sync-label workflow runs there and fails
during checkout (since it's a private repo and should have a different
configuration).

For now, let's limit the workflow to the OSS repo.

Closes scylladb/scylladb#18142
2024-04-02 10:45:17 +03:00
Botond Dénes
ffdd47c2b1 Merge 'Track and limit memory used by bloom filters' from Lakshmi Narayanan Sreethar
Added support to track and limit the memory usage by sstable components. A reclaimable component of an SSTable is one from which memory can be reclaimed. SSTables and their managers now track such reclaimable memory and limit the component memory usage accordingly. A new configuration variable defines the memory reclaim threshold. If the total memory of the reclaimable components exceeds this limit, memory will be reclaimed to keep the usage under the limit. This PR considers only the bloom filters as reclaimable and adds support to track and limit them as required.

The feature can be manually verified by doing the following:
1. run a single-node single-shard 1GB cluster
2. create a table with bloom-filter-false-positive-chance of 0.001 (to intentionally cause large bloom filter)
3. populate with tiny partitions
4. watch the bloom filter metrics get capped at 100MB

The default value of the `components_memory_reclaim_threshold` config variable which controls the reclamation process is `.1`. This can also be reduced further during manual tests to easily hit the threshold and verify the feature.

Fixes #17747

Closes scylladb/scylladb#17771

* github.com:scylladb/scylladb:
  test_bloom_filter.py: disable reclaiming memory from components
  sstable_datafile_test: add tests to verify auto reclamation of components
  test/lib: allow overriding available memory via test_env_config
  sstables_manager: support reclaiming memory from components
  sstables_manager: store available memory size
  sstables_manager: add variable to track component memory usage
  db/config: add a new variable to limit memory used by table components
  sstable_datafile_test: add testcase to verify reclamation from sstables
  sstables: support reclaiming memory from components
2024-04-02 10:40:52 +03:00
Amnon Heiman
803d414896 get_description.py: Make the Script a library
This patch makes the get_description.py script easier to use by the
documentation automation:
1. The script is now a library.
2. You can choose the output format of the script; currently supported are pipe
   and yml.

You can still call the script from the command line, like before, but you can
also call it from another Python script.

For example, the following Python script would generate the documentation
for the metrics description of the ./alternator/ttl.cc file.
```

import get_description

metrics = get_description.get_metrics_from_file("./alternator/ttl.cc", "scylla", get_description.get_metrics_information("metrics-config.yml"))
get_description.write_metrics_to_file("out.yaml", metrics, "yml")
```

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes scylladb/scylladb#18136
2024-04-02 10:07:11 +03:00
Botond Dénes
ea8478a3e7 scripts/open-coredump.sh: introduce --ci
Coredumps coming from CI are produced by a commit, which is not
available in the scylla.git repository, as CI runs on a merge commit
between the main branch (master or enterprise) and the tested PR branch.
Currently the script will attempt to checkout this commit and will fail
as the commit hash is unrecognized.
To work around this, add a --ci flag, which when used, will force the
main branch to be checked out, instead of the commit hash.

Closes scylladb/scylladb#18023
2024-04-02 09:27:52 +03:00
Kefu Chai
55d0ea48bd test: randomized_nemesis_test: remove fmt::formatter for seastar::timed_out_error
This reverts commit 97b203b1af.

since Seastar provides the formatter, it's not necessary to vendor it in
scylladb anymore.

Refs #13245

Closes scylladb/scylladb#18114
2024-04-02 09:25:51 +03:00
Benny Halevy
d5ac0c06b3 test_sstable_reversing_reader_random_schema: drop workaround for #9352
Issue #9352 was fixed about a year and a half ago
so this workaround should not be needed anymore.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18121
2024-04-02 09:25:06 +03:00
Raphael S. Carvalho
29f9f7594f replica: Kill table::storage_group_id_for_token()
storage_group_id_for_token() was only needed from within
tablet_storage_group_manager, so we can kill
table::storage_group_id_for_token().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18134
2024-04-02 09:23:23 +03:00
Asias He
99b7ccfa8b repair: Load repair history in background
Currently, we load the repair history during boot up. If the number of
repair history entries is high, it might take a while to load them.

In my test, to load 10M entries, it took around 60 seconds.

It is not a must to load the entries during boot up. It is better to
load them in the background to speed up the boot time.
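
The shape of the change, as a Python asyncio sketch rather than the actual Seastar code. The names `load_repair_history` and `boot` are invented, and the `abort` flag mirrors the companion commit that stops the load on shutdown.

```python
import asyncio

async def load_repair_history(entries, loaded, abort):
    """Load entries one by one, giving up early if shutdown was requested."""
    for entry in entries:
        if abort.is_set():
            return
        loaded.append(entry)
        await asyncio.sleep(0)  # yield, so other work can make progress

async def boot(entries):
    loaded = []
    abort = asyncio.Event()
    # Start the load in the background instead of awaiting it during boot.
    task = asyncio.create_task(load_repair_history(entries, loaded, abort))
    loaded_when_boot_done = len(loaded)  # boot does not wait for the load
    await task  # only this sketch waits, to be able to report the final count
    return loaded_when_boot_done, len(loaded)

print(asyncio.run(boot(range(5))))
```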

Fixes #17993
2024-04-02 09:24:35 +08:00
Asias He
523895145d repair: Abort load_history process in shutdown
If the node is shutting down, there is no point in continuing to load the
repair history.

Refs #17993
2024-04-02 09:24:35 +08:00
Lakshmi Narayanan Sreethar
d86505e399 test_bloom_filter.py: disable reclaiming memory from components
Disabled reclaiming memory from sstable components in the testcase as it
interferes with the false positive calculation.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
d261f0fbea sstable_datafile_test: add tests to verify auto reclamation of components
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
169629dd40 test/lib: allow overriding available memory via test_env_config
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
a36965c474 sstables_manager: support reclaiming memory from components
Reclaim memory from the SSTable that has the most reclaimable memory if
the total reclaimable memory has crossed the threshold. Only the bloom
filter memory is considered reclaimable for now.
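
A minimal Python model of that policy. It assumes the reclaim step repeats until the total is back under the limit and that reclaiming frees all of an SSTable's reclaimable memory; both are assumptions of this sketch, as are the function name and the sizes.

```python
def reclaim_until_under_limit(reclaimable, available_memory, threshold=0.1):
    """reclaimable maps an sstable name to its reclaimable bytes.

    Returns (names reclaimed from, what is still resident).
    """
    limit = available_memory * threshold
    remaining = dict(reclaimable)
    reclaimed = []
    while remaining and sum(remaining.values()) > limit:
        # Pick the SSTable that has the most reclaimable memory.
        victim = max(remaining, key=remaining.get)
        reclaimed.append(victim)
        del remaining[victim]
    return reclaimed, remaining

# 1000 bytes available and a 0.1 threshold give a limit of 100 bytes.
print(reclaim_until_under_limit({"a": 60, "b": 50, "c": 30}, 1000))
```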

Fixes #17747

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
2ca4b0a7a2 sstables_manager: store available memory size
The available memory size is required to calculate the reclaim memory
threshold, so store that within the sstables manager.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
f05bb4ba36 sstables_manager: add variable to track component memory usage
sstables_manager::_total_reclaimable_memory variable tracks the total
memory that is reclaimable from all the SSTables managed by it.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
e8026197d2 db/config: add a new variable to limit memory used by table components
A new configuration variable, components_memory_reclaim_threshold, has
been added to configure the maximum allowed percentage of available
memory for all SSTable components in a shard. If the total memory usage
exceeds this threshold, it will be reclaimed from the components to
bring it back under the limit. Currently, only the memory used by the
bloom filters will be restricted.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
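
The reclaim policy described in the sstables_manager commits above can be sketched roughly as follows. This is an illustration in Python, not the C++ implementation: reclaim_components_memory is a hypothetical name, the threshold is assumed to be a fraction of available memory, and each SSTable is reduced to the size of its bloom filter:

```python
def reclaim_components_memory(sstables, available_memory, threshold):
    """Reclaim bloom filter memory until the total is back under the limit.

    sstables maps an SSTable name to its reclaimable bytes. The dict is
    updated in place and the list of reclaimed SSTables is returned.
    """
    limit = available_memory * threshold
    total = sum(sstables.values())
    reclaimed = []
    while total > limit:
        # Pick the SSTable with the most reclaimable memory.
        victim = max(sstables, key=sstables.get)
        if sstables[victim] == 0:
            break  # nothing left to reclaim
        total -= sstables[victim]
        sstables[victim] = 0
        reclaimed.append(victim)
    return reclaimed
```

For example, with 1000 bytes available and a threshold of 0.05, three SSTables holding 50, 30 and 10 bytes exceed the 50-byte limit, so only the largest one is reclaimed.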
Lakshmi Narayanan Sreethar
e0b6186d16 sstable_datafile_test: add testcase to verify reclamation from sstables
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
4f0aee62d1 sstables: support reclaiming memory from components
Added support to track total memory from components that are reclaimable
and to reclaim memory from them if and when required. Right now only the
bloom filters are considered as reclaimable components but this can be
extended to any component in the future.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Pavel Emelyanov
627c5fdf04 virtual_tables: Open-code install_virtual_readers_and_writers()
It's pretty short already and is naturally a "part" of
initialize_virtual_tables(). It also no longer installs writers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:02:40 +03:00
Pavel Emelyanov
1d79cfc6cf virtual_tables: Move readers setup loop into add_table()
Similarly to previous patch, after virtual tables are registered the
registry is iterated over to install virtual readers onto each entry.
Again, this can happen at the time of registering; there is no need for a
dedicated loop.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:01:50 +03:00
Pavel Emelyanov
891e792717 virtual_tables: Move tables creation loop into add_table()
Once virtual_tables map is populated, it's iterated over to create
replica::table entries for each virtual table. This can be done in the
same place where the virtual table is created; there is no need for a dedicated
loop for it nowadays.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:00:38 +03:00
Pavel Emelyanov
420ce3634f virtual_tables: Make add_table() a coroutine
Next patches will populate it with sleeping calls, this patch prepares
for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:00:15 +03:00
Pavel Emelyanov
ddc6f9279f virtual_tables: Open-code register_virtual_tables()
It's naturally a "part" of initialize_virtual_tables(). Further patching
gets possible with it being open-coded.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 18:59:18 +03:00
Kefu Chai
c5601a749e github: sync_labels: do not error out if PR's cover letter is empty
if a pull request's cover letter is empty, `pr.body` is None. in that
case we should not try to pass it to `re.findall()` as the "string"
parameter. otherwise, we'd get

```
TypeError: expected string or bytes-like object, got 'NoneType'
```
so, in this change, we just return an empty list if the PR in question
has an empty cover letter.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18125
2024-04-01 18:13:22 +03:00
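
A minimal sketch of the guard described above. The function name extract_fixes and the regular expression are assumptions for illustration; the actual pattern used by sync_labels.py is not shown in the commit message:

```python
import re


def extract_fixes(body):
    """Return issue numbers referenced in a PR cover letter.

    body may be None when the cover letter is empty, and re.findall()
    rejects None, so return an empty list in that case.
    """
    if body is None:
        return []
    # Hypothetical pattern: "Fixes #123", "Closes: #123", and similar.
    return re.findall(r"(?:fixes|closes)\s*:?\s*#(\d+)", body, flags=re.IGNORECASE)
```

Without the None check, extract_fixes(None) would raise the TypeError quoted in the commit message.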
Avi Kivity
88fb686d67 test: generate core dumps on crashes in debug clusters
The cluster manager library doesn't set the asan/ubsan options
to abort on error and create core dumps; this makes debugging much
harder.

Fix by preparing the environment correctly.

Fixes scylladb/scylladb#17510

Closes scylladb/scylladb#17511
2024-04-01 18:11:41 +03:00
Kefu Chai
07c40f5600 github: sync_labels: use ${{}} expression syntax in "if" condition
to ensure that the expression is evaluated properly.
see https://docs.github.com/en/actions/creating-actions/metadata-syntax-for-github-actions#runsstepsif

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18127
2024-04-01 17:17:43 +03:00
Kefu Chai
1494499f90 github: sync_labels: checkout a single file not the whole repo
what we need is but a script, so instead of checking out the whole repo,
with all history for all tags and branches, let's just check out
a single file. faster this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18126
2024-04-01 17:15:50 +03:00
Yaron Kaikov
b8c705bc54 .github: sync-labels: fix pull request permissions
when adding a label to a pull request we keep getting the following error
message:
```
Traceback (most recent call last):
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 93, in <module>
    main()
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 89, in main
    sync_labels(repo, args.number, args.label, args.action, args.is_issue)
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 74, in sync_labels
    target.add_to_labels(label)
  File "/usr/lib/python3/dist-packages/github/Issue.py", line 321, in add_to_labels
    headers, data = self._requester.requestJsonAndCheck(
  File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck
    return self.__check(
  File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.GithubException: 403 {"message": "Resource not accessible by integration", "documentation_url": "https://docs.github.com/rest/issues/labels#add-labels-to-an-issue"}
```

According to
https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token,
the maximum access for pull requests from public forked repositories is
set to `read`.

Switch to `pull_request_target` to solve it.

Fixes: https://github.com/scylladb/scylladb/issues/18102

Closes scylladb/scylladb#18052
2024-04-01 17:11:35 +03:00
Pavel Emelyanov
46bbfc0c53 expression: Shorten making raw_value from FragmentedView
The read_field is std::optional<View>. The raw_value::make_value()
accepts managed_bytes_opt, which is std::optional<managed_bytes>.
Finally, there's the std::optional<T>::optional(std::optional<U>&&)
converting move constructor (and its copy-constructor peer), so the
optional can be passed directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18128
2024-04-01 16:52:18 +03:00
Benny Halevy
01fc1a9f66 schema_tables: std::move mutation into the mutation vector
To save a copy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18120
2024-04-01 14:16:30 +03:00
Pavel Emelyanov
5427967f45 schema: Introduce build() && overload
The schema_builder::build() method creates a copy of the raw schema
internally in the hope that the builder will be updated and asked to build
the resulting schema again (e.g. alternator uses this).

However, there are places that build a schema using a temporary object once
in a `return schema_builder().with_...().build()` manner. For those
invocations copying the raw schema is just a waste of cycles.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18094
2024-04-01 14:00:42 +03:00
Nadav Har'El
b6854cbb21 Merge 'test/cql-pytest: match error message formatted using {fmt} ' from Kefu Chai
currently, our homebrew formatter formats `std::map` like
```
{{k1, v1}, {k2, v2}}
```
while {fmt} formats a map like:
```
{k1: v1, k2: v2}
```
and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like
```
{"max_threshold": "1"}
```
before switching the formatter to the ones supported by {fmt},
let's update the test to match with the new format. this should
reduce the overhead of reviewing the change of switching the
formatter. we can revert this change, and use a simpler approach
after the change of formatter lands.

Closes scylladb/scylladb#18058

* github.com:scylladb/scylladb:
  test/cql-pytest: match error message formatted using {fmt}
  test/cql-pytest: extract scylla_error() for not allowed options test
2024-04-01 11:23:24 +03:00
Kefu Chai
fcf7ca5675 utils/logalloc: do not allocate memory in reclaim_timer::report()
before this change, `reclaim_timer::report()` calls

```c++
fmt::format(", at {}", current_backtrace())
```

which allocates a `std::string` on heap, so it can fail and throw. in
that case, `std::terminate()` is called. but at that moment, the reason
why `reclaim_timer::report()` gets called is that we fail to reclaim
memory for the caller. so we are more likely to run into this issue. anyway,
we should not allocate memory in this path.

in this change, a dedicated printer is created so that we don't format
to a temporary `std::string`, and instead write directly to the buffer
of logger. this avoids the memory allocation.

Fixes #18099
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18100
2024-04-01 11:01:52 +03:00
Botond Dénes
885cb2af07 utils/rjson: include tasklocal backtrace in rapidjson assert error message
Currently, the error message on a failed RAPIDJSON_ASSERT() is this:

    rjson::error (JSON error: condition not met: false)

This is printed e.g. when the code processing a json expects an object
but the JSON has a different type. Or if a JSON object is missing an
expected member. This message however is completely inadequate for
determining what went wrong. Change this to include a task-local
backtrace, like a real assert failure would. The new error looks like
this:

    rjson::error (JSON assertion failed on condition '{}' at: libseastar.so+0x56dede 0x2bde95e 0x2cc18f3 0x2cf092d 0x2d2316b libseastar.so+0x46b623)

Closes scylladb/scylladb#18101
2024-03-29 18:41:54 +01:00
Pavel Emelyanov
41a1b1c0d0 move_tablets: Emplace mutations into vector, not push
It's more applicable in this case.

Also, the built tablet mutations are cast to canonical_mutations, but
when emplaced, the compiler can pick up the canonical_mutation(const mutation&)
constructor and the cast is not required.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18090
2024-03-29 15:21:49 +02:00
Kamil Braun
f5603ad9ca Merge 'test.py: test_topology_upgrade_basic: make ring_delay_ms nonzero' from Mikołaj Grzebieluch
Test.py uses `ring_delay_ms = 0` by default. CDC creates generation's timestamp by adding `ring_delay_ms` to it.

In this test, nodes are learning about new generations (introduced by upgrade procedure and then by node bootstrap) concurrently with doing writes that should go to these generations.

Because of `ring_delay_ms = 0`, the generation could have been committed when it should have already been in use.

This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```

Writes issued during such a generation can be assigned a wrong generation or fail. A write may fail if it hits the short time window when `generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but `_cdc_metadata.insert(...)` has not yet been executed. With a nonzero ring_delay_ms it's not a problem, because during this time window, the generation should not be in use.

Write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```

Set ring_delay_ms to 15000 for the debug mode and 5000 in other modes. Wait for the last generation to be in use and sleep one second to make sure there are writes to the CDC table in this generation.

Fixes scylladb/scylladb#17977

Reapply b4144d14c6.

Closes scylladb/scylladb#17998

* github.com:scylladb/scylladb:
  test.py: test_topology_upgrade_basic: make ring_delay_ms nonzero
  Reapply "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
2024-03-29 12:52:31 +01:00
Tzach Livyatan
4930095d39 Docs: Fix link from scylla-sstable.rst to /architecture/sstable/
Fix https://github.com/scylladb/scylladb/issues/18096

Closes scylladb/scylladb#18097
2024-03-29 10:48:24 +02:00
Piotr Dulikowski
57719ece4f Merge 'main: reload service levels data accessor after join_cluster' from Marcin Maliszkiewicz
Setting data accessor implicitly depends on node joining the cluster
with raft leader elected as only then service level mutation is put
into scylla_local table. Calling it after join_cluster avoids starting
new cluster with older version only to immediately migrate it to the
latest one in the background.

Closes scylladb/scylladb#18040

* github.com:scylladb/scylladb:
  main: reload service levels data accessor after join_cluster
  service: qos: create separate function for reloading data accessor
2024-03-29 09:39:11 +01:00
Kefu Chai
1632fbbef9 test/cql-pytest: match error message formatted using {fmt}
currently, our homebrew formatter formats `std::map` like

{{k1, v1}, {k2, v2}}

while {fmt} formats a map like:

{k1: v1, k2: v2}

and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like

{"max_threshold": "1"}

before switching the formatter to the ones supported by {fmt},
let's update the test to match with the new format. this should
reduce the overhead of reviewing the change of switching the
formatter. we can revert this change, and use a simpler approach
after the change of formatter lands.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-29 08:07:59 +08:00
Kefu Chai
8f47fcedf6 test/cql-pytest: extract scylla_error() for not allowed options test
currently, our homebrew formatter formats `std::map` like

{{k1, v1}, {k2, v2}}

while {fmt} formats a map like:

{k1: v1, k2: v2}

and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like

{"max_threshold": "1"}

as we are switching to the formatters provided by {fmt}, would be
better to support its convention directly.

so, in this change, to prepare the change, before migrating to
{fmt}, let's refactor the test to support both formats by
extracting a helper to format the error message, so that we can
change it to emit both formats.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-29 08:03:02 +08:00
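
One way a test helper could accept both renderings of a string-to-string map is sketched below. The helper name scylla_error comes from the commit title, but its signature and body here are assumptions, not the code that was merged:

```python
import re


def scylla_error(options):
    """Build a regex matching either rendering of a str -> str map."""
    # Homebrew formatter: {{k1, v1}, {k2, v2}}
    homebrew = "{" + ", ".join("{%s, %s}" % (k, v) for k, v in options.items()) + "}"
    # {fmt} formatter with quoted strings: {"k1": "v1", "k2": "v2"}
    fmt_style = "{" + ", ".join('"%s": "%s"' % (k, v) for k, v in options.items()) + "}"
    return "(?:%s|%s)" % (re.escape(homebrew), re.escape(fmt_style))
```

A test can then use re.search(scylla_error({"max_threshold": "1"}), message) and pass both before and after the formatter switch.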
Mikołaj Grzebieluch
1e2607563f test.py: test_topology_upgrade_basic: make ring_delay_ms nonzero
Test.py uses `ring_delay_ms = 0` by default. CDC creates generation's timestamp
by adding `ring_delay_ms` to it.

In this test, nodes are learning about new generations (introduced by upgrade
procedure and then by node bootstrap) concurrently with doing writes that
should go to these generations.

Because of `ring_delay_ms = 0`, the generation could have been committed when
it should have already been in use.

This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```

Creating writes during such a generation can result in assigning them a wrong
generation or a failure. Failure may occur if it hits the short time window when
`generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but `_cdc_metadata.insert(...)` has not yet
been executed. With a nonzero ring_delay_ms it's not a problem, because during
this time window, the generation should not be in use.

Write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```

Set ring_delay_ms to 15000 for the debug mode and 5000 in other modes.
Wait for the last generation to be in use and sleep one second to make sure
there are writes to the CDC table in this generation.

Fixes #17977
2024-03-28 17:13:43 +01:00
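
The timing argument above can be illustrated with simple arithmetic. The sketch assumes that a generation's timestamp is its creation time plus ring_delay_ms and that a node learns about the generation after some propagation delay. Both function names are hypothetical, and the exact formula used by the CDC code may differ:

```python
def generation_timestamp(created_at_ms, ring_delay_ms):
    # Assumption: the generation becomes active ring_delay_ms after creation.
    return created_at_ms + ring_delay_ms


def learned_too_late(created_at_ms, ring_delay_ms, propagation_ms):
    """True if a node learns about the generation after it should be in use."""
    learned_at_ms = created_at_ms + propagation_ms
    return learned_at_ms > generation_timestamp(created_at_ms, ring_delay_ms)
```

With ring_delay_ms = 0, any propagation delay at all makes the node late. With ring_delay_ms = 5000 and a one-second propagation delay, the node learns about the generation before it is in use.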
Botond Dénes
4c0dadee7c Merge 'test: changes to prepare for dropping FMT_DEPRECATED_OSTREAM' from Kefu Chai
this series includes test related changes to enable us to drop `FMT_DEPRECATED_OSTREAM` deprecated in {fmt} v10.

Refs #13245

Closes scylladb/scylladb#18054

* github.com:scylladb/scylladb:
  test: unit: add fmt::formatter for test_data in tests
  test/lib: do not print with fmt::to_string()
  test/boost: print runtime_error using e.what()
2024-03-28 15:33:56 +02:00
Kamil Braun
33751f8f4e Merge 'raft topology: drop RAFT_PULL_TOPOLOGY_SNAPSHOT RPC' from Gleb
* 'gleb/raft_snapshot_rpc-v3' of github.com:scylladb/scylla-dev:
  raft topology: drop RAFT_PULL_TOPOLOGY_SNAPSHOT RPC
  Use correct limit for raft commands throughout the code.
2024-03-28 14:25:58 +01:00
Nadav Har'El
566223c34a Merge ' tools/scylla-nodetool: repair: abort on first failed repair' from Botond Dénes
When repairing multiple keyspaces, bail out on the first failed keyspace repair, instead of continuing and reporting all failures at the end. This is what Origin does as well.

To be able to test this, a bit of refactoring was needed, to be able to assert that `scylla-nodetool` doesn't make repair requests, beyond the expected ones.

Refs: https://github.com/scylladb/scylla-cluster-tests/issues/7226

Closes scylladb/scylladb#17678

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: repair: abort on first failed repair
  test/nodetool: nodetool(): add check_return_code param
  test/nodetool: nodetool(): return res object instead of just stdout
  test/nodetool: count unexpected requests
2024-03-28 14:02:29 +02:00
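
The behavior change in the merge above (stop at the first failed keyspace instead of collecting all failures) can be sketched as follows. This is not the scylla-nodetool code; repair_keyspaces and the repair_one callback are hypothetical names used for illustration:

```python
def repair_keyspaces(keyspaces, repair_one):
    """Repair keyspaces in order and stop at the first failure.

    repair_one(keyspace) returns True on success. Returns the list of
    keyspaces for which a repair was attempted and the failed keyspace
    (or None if all repairs succeeded).
    """
    attempted = []
    for keyspace in keyspaces:
        attempted.append(keyspace)
        if not repair_one(keyspace):
            # Bail out: no repair requests are made for later keyspaces.
            return attempted, keyspace
    return attempted, None
```

Tracking the attempted keyspaces mirrors what the refactored tests check: no repair requests beyond the expected ones.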
Botond Dénes
81bbfae77a tools/scylla-nodetool: implement the checkAndRepairCdcStreams command
Closes scylladb/scylladb#18076
2024-03-28 13:54:37 +02:00
Pavel Emelyanov
1adf16ce73 Merge 'network_topology_strategy: reallocate_tablets: support for rf changes' from Benny Halevy
This series provides a reallocate_tablets function, that's initially called by allocate_tablets_for_new_table.
The new allocation implementation is independent of vnodes/token ownership.
Rather than using the natural_endpoints_tracker, it implements its own tracking
based on dc/rack load (== number of replicas in rack), with the additional benefit
that tablet allocation will balance the allocation across racks, using a heap structure,
similar to the one we use to balance tablet allocation across shards in each node.

reallocate_tablets may also be called with an optional parameter pointing to the current tablet_map.
In this case the function either allocates more tablet replicas in datacenters for which the replication factor was increased,
or it will deallocate tablet replicas from datacenters for which replication factor was decreased.

The NetworkTopologyStrategy_tablets_test unit test was extended to cover replication factor changes.

Closes scylladb/scylladb#17846

* github.com:scylladb/scylladb:
  network_topology_strategy: reallocate_tablets: consider new_racks before existing racks
  network_topology_strategy_test: add NetworkTopologyStrategy_tablet_allocation_balancing_test
  network_topology_strategy: reallocate_tablets: support deallocation via rf change
  network_topology_strategy_test: tablets_test: randomize cases
  network_topology_strategy: allocate_tablets_for_new_table: do not rely on token ownership
  network_topology_strategy_test: add NetworkTopologyStrategy_tablets_negative_test
  network_topology_strategy_test: endpoints_check: use particular BOOST_CHECK_* functions
  network_topology_strategy_test: endpoints_check: verify that replicas are placed on unique nodes
  network_topology_strategy_test: endpoints_check: strictly check rf for tablets
  network_topology_strategy_test: full_ring_check for tablets: drop unused options param
2024-03-28 11:19:11 +03:00
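
The load-based balancing described in the merge above can be sketched with a min-heap keyed on rack load. This is a simplified illustration for a single datacenter, with one replica per rack per tablet. allocate_tablets is a hypothetical name and the real implementation also balances across nodes and shards:

```python
import heapq


def allocate_tablets(racks, tablet_count, rf):
    """Place rf replicas per tablet on the least-loaded racks."""
    if rf > len(racks):
        raise ValueError("replication factor exceeds the number of racks")
    # Heap of (load, rack): load is the number of replicas in the rack.
    heap = [(0, rack) for rack in racks]
    heapq.heapify(heap)
    tablets = []
    for _ in range(tablet_count):
        # Pop the rf least-loaded racks; they are distinct by construction.
        picked = [heapq.heappop(heap) for _ in range(rf)]
        tablets.append([rack for _, rack in picked])
        for load, rack in picked:
            heapq.heappush(heap, (load + 1, rack))
    return tablets
```

With three racks, three tablets and rf = 2, every rack ends up with exactly two replicas.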
Kefu Chai
2bfc7324d4 mutation: friend fmt::formatter<atomic_cell> in atomic_cell_view
GCC-14 rightly points out that the constructor of `atomic_cell_view`
is marked private, and cannot be called from its formatter:
```
/usr/bin/g++-14 -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/var/ssd/scylladb -I/var/ssd/scylladb/build/gen -I/var/ssd/scylladb/seastar/include -I/var/ssd/scylladb/build/seastar/gen/include -I/var/ssd/scylladb/build/seastar/gen/src -g -Og -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unused-parameter -ffile-prefix-map=/var/ssd/scylladb=. -march=westmere -Wstack-usage=40960 -U_FORTIFY_SOURCE -Wno-maybe-uninitialized -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o -MF mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o.d -o mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o -c /var/ssd/scylladb/mutation/atomic_cell.cc
In file included from /var/ssd/scylladb/mutation/atomic_cell.cc:9:
/var/ssd/scylladb/mutation/atomic_cell.hh: In member function ‘auto fmt::v10::formatter<atomic_cell>::format(const atomic_cell&, fmt::v10::format_context&) const’:
/var/ssd/scylladb/mutation/atomic_cell.hh:413:67: error: ‘atomic_cell_view::atomic_cell_view(basic_atomic_cell_view<is_mutable>) [with mutable_view is_mutable = mutable_view::yes]’ is private within this context
  413 |         return fmt::format_to(ctx.out(), "{}", atomic_cell_view(ac));
      |                                                                   ^
/var/ssd/scylladb/mutation/atomic_cell.hh:275:5: note: declared private here
  275 |     atomic_cell_view(basic_atomic_cell_view<is_mutable> view)
      |     ^~~~~~~~~~~~~~~~
```
so, in this change, we make the formatter a friend of
`atomic_cell_view`.
since the operator<< was dropped, there is no need to keep its friend
declaration around, so it is dropped in this change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18081
2024-03-28 09:44:00 +02:00
Kefu Chai
99e743de9d test: nodetool: match with vector printed by {fmt}
our homebrew formatter for std::vector<string> formats like

```
{hello, world}
```

while {fmt}'s formatter for sequence-like container formats like

```
["hello", "world"]
```

since we are moving to {fmt} formatters, and in this context
quoting the verbatim text makes more sense to the user, let's
support the format used by {fmt} as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18057
2024-03-28 09:35:37 +02:00
Kefu Chai
c2ffa0d813 bytes.hh: stop at '}' in fmt::formatter<fmt_hex>
according to {fmt}'s document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,

```
  // the range will contain "f} continued". The formatter should parse
  // specifiers until '}' or the end of the range. In this example the
  // formatter should parse the 'f' specifier and return an iterator
  // pointing to '}'.
```

so we should check for _both_ '}' and end of the range. when building
scylla with {fmt} 10.2.1, it fails to build code like

```c++
fmt::format_to(out, "{}", fmt_hex(frag))
```

as {fmt}'s compile-time checker fails to parse this format string
along with given argument, as at compile time,
```c++
throw format_error("invalid group_size")
```
is executed.

so, in this change, we check both '}' and the end of range.

the change which introduced this formatter was
2f9dfba800

Refs 2f9dfba800
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18080
2024-03-28 08:58:36 +02:00
Marcin Maliszkiewicz
50e0032bca test: auth: remove if not exists from auth cql statement
They were added due to https://github.com/scylladb/python-driver/issues/296
but it looks like the issue no longer reproduces.

Change was tested with ./test.py -vv --repeat=100 test_auth
to minimize the chance of introducing flakiness.

Closes scylladb/scylladb#18043
2024-03-28 06:06:45 +01:00
Raphael S. Carvalho
902c71bac8 storage_service: Fix undefined behavior in stream_tablet()
correctness when constructing range_streamer depends on compiler
evaluation order of params.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18079
2024-03-27 23:50:37 +01:00
Gleb Natapov
6e6aefc9ab raft topology: drop RAFT_PULL_TOPOLOGY_SNAPSHOT RPC
We have new, more generic, RPC to pull group0 mutations now: RAFT_PULL_SNAPSHOT.
Use it instead of more specific RAFT_PULL_TOPOLOGY_SNAPSHOT one.
2024-03-27 19:18:45 +02:00
Gleb Natapov
c1dcf0fae7 Use correct limit for raft commands throughout the code.
Raft uses the schema commitlog, so all its limits should be derived from
that commitlog's segment size, but many places used the regular commitlog size
to calculate the limits and did not do what they are really supposed to be
doing.
2024-03-27 19:16:09 +02:00
Kamil Braun
c3989d8e03 Merge 'storage_service: keep subscription to raft topology feature alive' from Piotr Dulikowski
The storage_service::track_upgrade_progress_to_topology_coordinator
function is supposed to wait on the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
cluster feature (among other things) before starting the
raft_state_monitor_fiber. The wait is realized by passing a callback to
feature::when_enabled which sets a shared_promise that is waited on by
the tracking fiber. If the feature is already enabled, when_enabled will
call the callback immediately. However, if it's not, then it will return
a non-null listener_registration object - as long as it is alive, the
callback is registered. The listener_registration object was not
assigned to a variable which caused it to be destroyed shortly after the
when_enabled function returns.

Due to that, if upgrade was requested but the current group0 leader
didn't have the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature enabled
right after boot, the upgrade would not start until the leader is
changed to a node which has that cluster feature already enabled on
boot. Moreover, the topology coordinator would not start on such a node
until the node was rebooted.

Fix the issue by assigning the subscription to a variable.

Fixes: scylladb/scylladb#18049

Closes scylladb/scylladb#18051

* github.com:scylladb/scylladb:
  gms: feature: mark when_enabled(func) with nodiscard
  storage_service: keep subscription to raft topology feature alive
2024-03-27 14:46:43 +01:00
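
The lifetime bug described above can be mimicked in Python, under the assumption that the interpreter is CPython, where an object is destroyed as soon as its last reference goes away (similar to a C++ temporary). The classes below are hypothetical stand-ins, not the gms::feature API:

```python
class ListenerRegistration:
    """Keeps a callback registered for as long as this object is alive."""

    def __init__(self, listeners, callback):
        self._listeners = listeners
        self._callback = callback
        listeners.append(callback)

    def __del__(self):
        # Unregister on destruction, like the C++ destructor would.
        if self._callback in self._listeners:
            self._listeners.remove(self._callback)


class Feature:
    def __init__(self):
        self.enabled = False
        self._listeners = []

    def when_enabled(self, callback):
        if self.enabled:
            callback()  # already enabled: call immediately
            return None
        return ListenerRegistration(self._listeners, callback)

    def enable(self):
        self.enabled = True
        for callback in list(self._listeners):
            callback()


def run(keep_subscription):
    fired = []
    feature = Feature()
    if keep_subscription:
        # Fix: keep the registration alive in a variable.
        subscription = feature.when_enabled(lambda: fired.append("upgrade"))
    else:
        # Bug: the registration is dropped right away, so the callback
        # is unregistered before the feature gets enabled.
        feature.when_enabled(lambda: fired.append("upgrade"))
    feature.enable()
    return fired
```

Under CPython, run(False) returns an empty list because the callback never fires, while run(True) returns ["upgrade"]. Marking when_enabled(func) as nodiscard in C++ lets the compiler flag the first pattern.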
Avi Kivity
96a3544739 Merge 'alternator: reduce stall for Query and Scan with large pages' from Nadav Har'El
Before this series, Alternator's Query and Scan operations convert an
entire result page to JSON without yielding. For a page of maximum
size (1MB) and tiny rows, this can cause a significant stall - the
test included in this PR reported stalls of 14-26ms on my laptop.

The problem is the describe_items() function, which does this conversion
immediately, without yielding. This patch changes this function to
return a future, and use a new result_set::visit_gently() method
that does what visit() does, but with yields when needed.

This PR improves #17995, but does not completely fix it, as the stalls
are not completely eliminated. But on my laptop it usually reduces the stalls
to around 5ms. It appears that the remaining stalls come from other places
not fixed in this PR, such as perhaps query_page::handle_result(), and will need
to be fixed by additional patches.

Closes scylladb/scylladb#18036

* github.com:scylladb/scylladb:
  alternator: reduce stall for Query and Scan with large pages
  result_set: introduce visit_gently()
  alternator: coroutinize do_query() function
2024-03-27 15:06:32 +02:00
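
The idea behind visit_gently() (do the same work as visit(), but give other tasks a chance to run) can be sketched with asyncio. This is an analogy, not the Seastar code: describe_items_gently is a hypothetical name, and yielding every fixed number of rows is an assumption, since the real code yields only when needed:

```python
import asyncio
import json


async def describe_items_gently(rows, yield_every=100):
    """Convert rows to JSON strings, yielding to the event loop periodically."""
    items = []
    for index, row in enumerate(rows, start=1):
        items.append(json.dumps(row))
        if index % yield_every == 0:
            # Let other tasks run so a large page does not cause a long stall.
            await asyncio.sleep(0)
    return items
```

The caller has to await the result, which matches the change of describe_items() to return a future.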
Kamil Braun
404406e6a1 Merge ' test/cql-pytest: test_select_from_mutation_fragments.py: move away from memtables' from Botond Dénes
Memtables are fickle, they can be flushed when there is memory pressure,
if there is too much commitlog or if there is too much data in them. The
tests in test_select_from_mutation_fragments.py currently assume data
written is in the memtable. This is true most of the time but we have
seen some odd test failures that couldn't be understood. To make the
tests more robust, flush the data to the disk and read it from the
sstables. This means that some range scans need to filter to read from
just a single mutation source, but this does not influence the tests.
Also fix a use-after-return found when modifying the tests.

This PR tentatively fixes the below issues, based on our best guesses on why they failed (each was seen just once):
Fixes: scylladb/scylladb#16795
Fixes: scylladb/scylladb#17031

Closes scylladb/scylladb#17562

* github.com:scylladb/scylladb:
  test/cql-pytest: test_select_from_mutation_fragments.py: move away from memtables
  cql3: select_statement: mutation_fragments_select_statement: fix use-after-return
2024-03-27 13:21:19 +01:00
Botond Dénes
fdd5367974 Merge 'compaction: implement unchecked_tombstone_compaction' from Ferenc Szili
This change adds the missing Cassandra compaction option unchecked_tombstone_compaction.
Setting this option to true causes the compaction to ignore tombstone_threshold, and decide whether to do a compaction based only on the value of tombstone_compaction_interval.

Fixes #1487

Closes scylladb/scylladb#17976

* github.com:scylladb/scylladb:
  removed forward declaration of resharding_descriptor
  compaction options and troubleshooting docs
  cql-pytest/test_compaction_strategy_validation.py
  test/boost/sstable_compaction_test.cc
  compaction: implement unchecked_tombstone_compaction
2024-03-27 13:56:02 +02:00
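
The decision described above can be sketched as a small predicate. The function name, the parameter layout and the default values (one day, 0.2) are assumptions for illustration, and the real compaction code considers more than this:

```python
def worth_tombstone_compaction(sstable_age_s, droppable_ratio, *,
                               tombstone_compaction_interval_s=86400,
                               tombstone_threshold=0.2,
                               unchecked_tombstone_compaction=False):
    """Decide whether an SSTable is a candidate for tombstone compaction."""
    # The SSTable must be older than tombstone_compaction_interval.
    if sstable_age_s < tombstone_compaction_interval_s:
        return False
    # With the option set, tombstone_threshold is ignored.
    if unchecked_tombstone_compaction:
        return True
    return droppable_ratio > tombstone_threshold
```

An old SSTable with few droppable tombstones is skipped by default, but becomes a candidate once unchecked_tombstone_compaction is true.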
Kefu Chai
6bd0be71ab mutation: add fmt::formatter for invalid_mutation_fragment_stream
before this change, we rely on the default-generated fmt::formatter
created from operator<<. but this depends on the
`FMT_DEPRECATED_OSTREAM` macro which is not respected in {fmt} v10.

this change addresses the formatting with fmtlib < 10, and without
`FMT_DEPRECATED_OSTREAM` defined. please note, in {fmt} v10 and up,
it defines formatter for classes derived from `std::exception`, so
our formatter is only added when compiled with {fmt} < 10.

in this change, `fmt::formatter<invalid_mutation_fragment_stream>`
is added for backward compatibility with {fmt} < 10.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18053
2024-03-27 13:37:48 +02:00
Kefu Chai
d1e8d89ae2 doc: topology-over-raft: add transition_state to node state diagram
in order to help the developers to understand the transitions
of `node_state` and the `transition_state` on each of the `node_state`,
in this change, the nested state machine diagram is added to the
node state diagram.

please note, instead of trying to merge similar states like
bootstrapping and replacing into a single state, we keep them as
separate ones, and replicate the nested state machine diagram in them
as well, to be more clear.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18025
2024-03-27 12:16:35 +01:00
Andrei Chekun
0752ef1481 test: remove skip annotation for multi-DC test with 5 DCs with one node in each
As a follow-up to https://github.com/scylladb/scylladb/pull/17503, remove the skip annotation for the multi-DC test, with the number of DCs used in it reduced from 30 to 5.

Closes scylladb/scylladb#17898
2024-03-27 13:13:13 +02:00
Michał Chojnowski
295b27a07b cache_flat_mutation_reader: only call get_iterator_in_latest() when pointing at a row
Calling `_next_row.get_iterator_in_latest()` is illegal when `_next_row` is not
pointing at a row. In particular, the iterator returned by such call might be
dangling.

We have observed this to cause a use-after-free in the field, when a reverse
read called `maybe_add_to_cache` after `_latest_it` was left dangling after
a dead row removal in `copy_from_cache_to_buffer`.

To fix this, we should ensure that we only call `_next_row.get_iterator_in_latest()`
when `_next_row` is pointing at a row.

Only the occurrences of this problem in `maybe_add_to_cache` are truly dangerous.
As far as I can see, other occurrences can't break anything as of now.
But we apply fixes to them anyway.

Closes scylladb/scylladb#18046
2024-03-27 11:48:42 +01:00
Kamil Braun
d274f63d89 Merge 'Add support for "initial-token" parameter in raft mode' from Gleb
Fixes scylladb/scylladb#17893

* 'gleb/initial-token-v1' of github.com:scylladb/scylla-dev:
  dht: drop unused parameter from get_random_bootstrap_tokens() function
  test: add test for initial_token parameter
  topology coordinator: use provided initial_token parameter to choose bootstrap tokens
  topology cooordinator: propagate initial_token option to the coordinator
2024-03-27 11:41:06 +01:00
Kefu Chai
71a519dee8 test: unit: add fmt::formatter for test_data in tests
this change is created in same spirit of d1c35f943d.

before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for test_data in
radix_tree_stress_test.cc, and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 18:18:32 +08:00
Kefu Chai
4f8c1a4729 test/lib: do not print with fmt::to_string()
we should not format a variable unless we want to print it. in this
case, we format `first_row` using `fmt::to_string()` to a string,
and then insert the string into another string. even though this is
in a cold path, it is still an anti-pattern -- both convoluted
and not performant.

so let's just pass `first_row` to `format()`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 18:18:32 +08:00
Kefu Chai
d0ceb35e7e test/boost: print runtime_error using e.what()
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter. but fortunately, fmt v10 brings the builtin
formatter for classes derived from `std::exception`. but before
switching to {fmt} v10, and after dropping `FMT_DEPRECATED_OSTREAM`
macro, we need to print out `std::runtime_error`. so far, we don't
have a shared place for formatter for `std::runtime_error`. so we
are addressing the needs on a case-by-case basis.

in this change, we just print it using `e.what()`. its behavior
is identical to what we have now.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 18:18:32 +08:00
Benny Halevy
8a77319cb7 network_topology_strategy: reallocate_tablets: consider new_racks before existing racks
Allocate first from new (unpopulated) racks before
allocating from racks that are already populated
with replicas.

Still, rotate both new and existing racks by tablet id
to ensure fairness.
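
The ordering described above can be sketched as follows. This is a simplified Python model under stated assumptions: `rack_order` and `rotate` are hypothetical names, and racks are plain strings rather than the real topology objects.

```python
def rotate(items, k):
    """Rotate a list left by k positions (k may exceed the list length)."""
    if not items:
        return []
    k %= len(items)
    return items[k:] + items[:k]


def rack_order(new_racks, existing_racks, tablet_id):
    """Return the order in which racks are tried for one tablet.

    Unpopulated racks always come first; each group is rotated by the
    tablet id so that successive tablets start from different racks.
    """
    return rotate(new_racks, tablet_id) + rotate(existing_racks, tablet_id)
```

Rotating each group separately keeps the "new racks first" priority intact while still spreading the starting rack across tablets.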

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Benny Halevy
c5ff060dee network_topology_startegy_test: add NetworkTopologyStrategy_tablet_allocation_balancing_test
Test that tablet allocation is balanced across
racks, nodes, and shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Benny Halevy
4a7d57525e network_topology_strategy: reallocate_tablets: support deallocation via rf change
Add support for deallocating tablet replicas when the
datacenter replication factor is decreased.

We deallocate replicas in back-to-front order to maintain
replica pairing between the base table and
its materialized views.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Benny Halevy
1e8f8db5b8 network_topology_startegy_test: tablets_test: randomize cases
Instead of deterministically testing a very small set of cases,
randomize the shard_count per node, the cluster topology
and the NetworkTopologyStrategy options.

The next patch will extend the test to also test
`reallocate_tablets` with randomized options.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Benny Halevy
898cd1d404 network_topology_strategy: allocate_tablets_for_new_table: do not rely on token ownership
Base the initial tablet allocation for a new table
on the dc/rack topology, rather than on the token ring,
to remove the dependency on token ownership.

We keep the rack ordinal order in each dc
to facilitate in-rack pairing of base/view
replicas, and we apply load-balancing
principles by sorting the nodes in each rack
by their load (number of tablets allocated to
the node), and attempting to fill least-loaded
nodes first.

This method is more efficient than circling
the token ring and attempting to insert the endpoints
to the natural_endpoint_tracker until the replication
factor per dc is fulfilled, and it allows an easier
way to incrementally allocate more replicas after
rf is increased.
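
A minimal sketch of the rack-ordered, least-loaded-first idea for a single dc, assuming a hypothetical `pick_replicas` helper, node names as strings, and a plain dict for per-node load. The real allocator works on topology objects and differs in detail.

```python
def pick_replicas(racks, load, rf):
    """Pick `rf` replica nodes for one tablet in one dc.

    racks: list of lists of node names, in rack ordinal order.
    load:  dict mapping node name -> number of tablets already allocated;
           it is updated in place for every node that is picked.
    """
    # Within each rack, try the least-loaded node first (name breaks ties).
    candidates = [sorted(nodes, key=lambda n: (load.get(n, 0), n)) for nodes in racks]
    if rf > sum(len(nodes) for nodes in candidates):
        raise ValueError("not enough nodes for the requested replication factor")

    replicas = []
    depth = 0
    while len(replicas) < rf:
        # Walk the racks in ordinal order, taking one node per rack per round.
        for rack_nodes in candidates:
            if depth < len(rack_nodes) and len(replicas) < rf:
                node = rack_nodes[depth]
                replicas.append(node)
                load[node] = load.get(node, 0) + 1
        depth += 1
    return replicas
```

Because the walk does not depend on token ownership, raising `rf` later only requires continuing the same walk for the additional replicas.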

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:21 +02:00
Botond Dénes
f70f04c240 tools/scylla-nodetool: repair: abort on first failed repair
When repairing multiple keyspaces, bail out on the first failed keyspace
repair, instead of continuing and reporting all failures at the end.
This is what Origin does as well.
2024-03-27 05:46:18 -04:00
Mikołaj Grzebieluch
fa4193e09f Reapply "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
This reverts commit 230f23004b.
2024-03-27 10:39:01 +01:00
Benny Halevy
40a4b349bd network_topology_startegy_test: add NetworkTopologyStrategy_tablets_negative_test
Test that attempting to allocate tablets
throws an error when there are not enough nodes
for the configured replication factor.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Benny Halevy
f19dbb4ae5 network_topology_strategy_test: endpoints_check: use particular BOOST_CHECK_* functions
Using e.g. `BOOST_CHECK_EQUAL(endpoints.size(), total_rf)`
rather than `BOOST_CHECK(endpoints.size() == total_rf)`
prints a more detailed error message that includes the
runtime values, if it fails.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Benny Halevy
93b6573a90 network_topology_strategy_test: endpoints_check: verify that replicas are placed on unique nodes
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Benny Halevy
c11ffd14cc network_topology_strategy_test: endpoints_check: strictly check rf for tablets
With tablets, we want to verify that the number of
replicas allocated per tablet per dc exactly matches
the replication strategy per-dc replication factor options.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Benny Halevy
ffa5870758 network_topology_strategy_test: full_ring_check for tablets: drop unused options param
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Botond Dénes
764e9a344d test/nodetool: nodetool(): add check_return_code param
When set to false, the returncode is not checked, this is left to the
caller. This in turn allows for checking the expected and unexpected
requests which is not checked when the nodetool process fails.
This is used by utils._do_check_nodetool_fails_with(), so that expected
and unexpected requests are checked even for failed invocations.

Some tests need adjustment to the stricter checks.
2024-03-27 04:18:19 -04:00
Botond Dénes
8f3b1db37f test/nodetool: nodetool(): return res object instead of just stdout
So callers have access to stderr, return code and more.
This causes some churn in the test, but the changes are mechanical.
2024-03-27 04:18:19 -04:00
Kefu Chai
2e2c3a5fea locator: fix a typo in comment
s/Substracts/Subtracts/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18048
2024-03-27 10:15:18 +02:00
Piotr Dulikowski
e76817502f gms: feature: mark when_enabled(func) with nodiscard
The feature::when_enabled function takes a callback and returns a
listener_registration object. Unless the feature were enabled right from
the start, the listener_registration will be non-null and will keep the
callback registered until the registration is destroyed. If the
registration is destroyed before the feature is enabled, the callback
will not be called. It's easy to make a mistake and forget to keep the
returned registration alive - especially when, in tests, the feature is
enabled early in boot, because in that case when_enabled calls the
callback immediately and returns a null object instead.

In order to prevent issues with prematurely dropped
listener_registration in the future, mark feature::when_enabled with the
[[nodiscard]] attribute.
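
The lifetime pitfall described above can be modelled in Python, which has no `[[nodiscard]]` equivalent. The sketch below is an analogy under stated assumptions, not the C++ code: the `Feature` and `ListenerRegistration` classes are hypothetical, and dropping a registration is modelled with weak references, which relies on CPython releasing unreferenced objects promptly.

```python
import weakref


class ListenerRegistration:
    """Keeps a callback registered for as long as this object is alive."""

    def __init__(self, callback):
        self.callback = callback


class Feature:
    def __init__(self, enabled=False):
        self._enabled = enabled
        # Weak references: a registration dropped by the caller disappears.
        self._listeners = weakref.WeakSet()

    def when_enabled(self, callback):
        if self._enabled:
            # Already enabled: call immediately, return a "null" registration.
            callback()
            return None
        registration = ListenerRegistration(callback)
        self._listeners.add(registration)
        return registration

    def enable(self):
        self._enabled = True
        for registration in list(self._listeners):
            registration.callback()
```

A caller that ignores the return value of `when_enabled()` loses its callback unless the feature happened to be enabled already, which is why the mistake is easy to miss in tests.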
2024-03-27 08:55:45 +01:00
Piotr Dulikowski
7ea6e1ec0a storage_service: keep subscription to raft topology feature alive
The storage_service::track_upgrade_progress_to_topology_coordinator
function is supposed to wait on the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
cluster feature (among other things) before starting the
raft_state_monitor_fiber. The wait is realized by passing a callback to
feature::when_enabled which sets a shared_promise that is waited on by
the tracking fiber. If the feature is already enabled, when_enabled will
call the callback immediately. However, if it's not, then it will return
a non-null listener_registration object - as long as it is alive, the
callback is registered. The listener_registration object was not
assigned to a variable which caused it to be destroyed shortly after the
when_enabled function returns.

Due to that, if upgrade was requested but the current group0 leader
didn't have the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature enabled
right after boot, the upgrade would not start until the leader is
changed to a node which has that cluster feature already enabled on
boot. Moreover, the topology coordinator would not start on such a node
until the node were rebooted.

Fix the issue by assigning the subscription to a variable.
2024-03-27 08:55:45 +01:00
Botond Dénes
2d12db81cf Merge 'docs: document nodetool {getsstables, sstableinfo}' from Kefu Chai
these two subcommands are provided by cassandra, and are also implemented natively in scylla. so let's document them.

Closes scylladb/scylladb#17982

* github.com:scylladb/scylladb:
  docs/operating-scylla: document nodetool sstableinfo
  docs/operating-scylla: document nodetool getsstables
2024-03-27 09:04:55 +02:00
Botond Dénes
4d98b7d532 test/nodetool: count unexpected requests
We currently check at the end of each test, that all expected requests
set by the test were consumed. This patch adds a mechanism to count
unexpected requests -- requests which didn't match any of the expected
ones set by the test. This can be used to assert that nodetool didn't
make any request to the server, beyond what the test expected it to do.
Before this patch, requests like this would only be noticed by the test,
if the response of 404/500 caused nodetool to fail, which is not always
the case.
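
The counting mechanism can be sketched as below. This is a toy stand-in for the mock REST server, assuming a hypothetical `MockRestServer` class that matches requests only by method and path. The real `rest_api_mock.py` is more elaborate.

```python
class MockRestServer:
    def __init__(self):
        self.expected = []    # (method, path) pairs not yet consumed
        self.unexpected = 0   # requests that matched nothing

    def expect(self, method, path):
        self.expected.append((method, path))

    def handle(self, method, path):
        """Return an HTTP-like status code for an incoming request."""
        key = (method, path)
        if key in self.expected:
            self.expected.remove(key)
            return 200
        # Not expected by the test: count it instead of relying on the
        # client failing because of the error response.
        self.unexpected += 1
        return 404

    def verify(self):
        """Called at the end of a test."""
        assert not self.expected, f"unconsumed expected requests: {self.expected}"
        assert self.unexpected == 0, f"{self.unexpected} unexpected request(s)"
```

Counting on the server side catches stray requests even when the client tolerates the 404 and exits successfully.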
2024-03-27 02:39:28 -04:00
Kefu Chai
8af9c735f2 docs/operating-scylla: document nodetool sstableinfo
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 07:29:24 +08:00
Kefu Chai
da90e368dc docs/operating-scylla: document nodetool getsstables
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 07:29:24 +08:00
Pavel Emelyanov
04370dc8a4 tablets: Introduce substract_sets()
There are several places in the code that calculate replica sets associated
with a specific tablet transition. Having a helper to subtract two sets
improves code readability.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18033
2024-03-26 23:33:06 +02:00
Tomasz Grabiec
042a4b7627 Merge 'tablets: add warning on CREATE KEYSPACE' from Nadav Har'El
The CDC feature is not supported on a table that uses tablets
(Refs https://github.com/scylladb/scylladb/issues/16317), so if a user creates a keyspace with tablets enabled
they may be surprised later (perhaps much later) when they try to enable
CDC on the table and can't.

The LWT feature always had issue https://github.com/scylladb/scylladb/issues/5251, but it has become potentially
more common with tablets.

So it was proposed that as long as we have missing features (like CDC or
LWT), every time a keyspace is created with tablets it should output a
warning (a bona-fide CQL warning, not a log message) that some features
are missing, and if you need them you should consider re-creating the
keyspace without tablets.

This PR does this.

The warning text which will be produced is the following (obviously, it can
be improved later, as we perhaps find more missing features):

>   "Tables in this keyspace will be replicated using tablets, and will
>    not support the CDC feature (issue https://github.com/scylladb/scylladb/issues/16317) and LWT may suffer from
>    issue https://github.com/scylladb/scylladb/issues/5251 more often. If you want to use CDC or LWT, please drop
>    this keyspace and re-create it without tablets, by adding AND TABLETS
>    = {'enabled': false} to the CREATE KEYSPACE statement."

This PR also includes a test that checks that this warning is
indeed generated when a keyspace is created with tablets (either by default
or explicitly), and not generated if the keyspace is created without
tablets. It also fixes existing tests which didn't like the new warning.

Fixes https://github.com/scylladb/scylladb/issues/16807

Closes scylladb/scylladb#17318

* github.com:scylladb/scylladb:
  tablets: add warning on CREATE KEYSPACE
  test/cql-pytest: fix guadrail tests to not be sensitive to more warnings
2024-03-26 20:04:07 +01:00
Gleb Natapov
9b00847f31 dht: drop unused parameter from get_random_bootstrap_tokens() function 2024-03-26 18:43:31 +02:00
Gleb Natapov
ed534fde8f test: add test for initial_token parameter
Test that configured tokens are used and tokens collision is detected.
2024-03-26 18:43:31 +02:00
Gleb Natapov
06952ec6dd topology coordinator: use provided initial_token parameter to choose bootstrap tokens
Use the same logic as with gossiper to choose bootstrap tokens in case
the initial_token parameter is not empty.
2024-03-26 18:43:25 +02:00
Gleb Natapov
6ab78e13c6 topology cooordinator: propagate initial_token option to the coordinator
The patch propagates initial_token option to the topology coordinator
where it is added to join request parameter.
2024-03-26 18:43:16 +02:00
Marcin Maliszkiewicz
e1fea3af6b main: reload service levels data accessor after join_cluster
Setting data accessor implicitly depends on node joining the cluster
with raft leader elected as only then service level mutation is put
into scylla_local table. Calling it after join_cluster avoids starting
new cluster with older version only to immediately migrate it to the
latest one in the background.
2024-03-26 17:36:03 +01:00
Nadav Har'El
ba97fd98a3 alternator: reduce stall for Query and Scan with large pages
Before this patch, Alternator's Query and Scan operations convert an
entire result page to JSON without yielding. For a page of maximum
size (1MB) and tiny rows, this can cause a significant stall - the
test included in this patch reported stalls of 14-26ms on my laptop.

The problem is the describe_items() function, which does this conversion
immediately, without yielding. This patch changes this function to
return a future, and to use the result_set::visit_gently() method,
which yields when needed, instead of visit().

This patch does not completely eliminate stalls in the test, but
on my laptop usually reduces them to around 5ms. It appears that
the remaining stalls come from other places not fixed in this PR,
such as perhaps query_page::handle_result(), and will need to be
fixed by additional patches.

The test included in this patch is useful for manually reproducing
the stall, but not useful as a regression test: It is slow (requiring
a couple of seconds to set up the large partition) and doesn't
check anything, and can't even report the stall without modifying the
test runner. So the test is skipped by default (using the "veryslow"
marker) and can be enabled and run manually by developers who want
to continue working on #17995.

Refs #17995.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-26 18:32:45 +02:00
Nadav Har'El
e24629a635 result_set: introduce visit_gently()
Whereas result_set::visit() passes all the rows to the visitor and
returns void, this patch introduces a method visit_gently() that returns
a future, and may yield before visiting each row.

This method will be used in the next patch to allow Alternator, which
used visit() to convert a result_set into JSON format, to potentially
yield between rows and avoid large stalls when converting a large
result set.

Note that I decided to add the yield points in the new visit_gently()
between rows - not between each cell. Many places in our code (including
the memtable) already work on a per-row basis and do not yield in the
middle of a row, so it won't really be helpful to do this either.
But if we want, we will still be able to modify visit_gently() later
to be even more gentle, and yield between individual cells. The callers
shouldn't know or care.
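
The difference between the two methods can be illustrated with an asyncio analogy. This is not Seastar code: the free functions below are hypothetical, and `asyncio.sleep(0)` yields unconditionally, whereas the commit describes a method that only *may* yield.

```python
import asyncio


def visit(rows, visitor):
    """Visit every row in one go, without giving other tasks a chance to run."""
    for row in rows:
        visitor(row)


async def visit_gently(rows, visitor):
    """Visit every row, with a yield point before each row (not each cell)."""
    for row in rows:
        await asyncio.sleep(0)  # let other ready tasks run
        visitor(row)
```

Callers see the same rows in the same order either way; only the scheduling behaviour changes, which is why the granularity of the yield points can be revisited later without touching them.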

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-26 18:32:11 +02:00
Marcin Maliszkiewicz
ff17a29b54 service: qos: create separate function for reloading data accessor
Scylla's main is already too long, it's better to contain this logic inside qos service.
2024-03-26 17:26:19 +01:00
Avi Kivity
4ddf82e58b treewide: don't #include "gms/feature_service.hh" from other headers
feature_service.hh is a high-level header that integrates much
of the system functionality, so including it in lower-level headers
causes unnecessary rebuilds. Specifically, when retiring features.

Fix by removing feature_service.hh from headers, and supply forward
declarations and includes in .cc where needed.

Closes scylladb/scylladb#18005
2024-03-26 15:31:18 +02:00
Nadav Har'El
c146b1224c alternator: coroutinize do_query() function
This patch changes the do_query() function, used to implement Alternator's
Query and Scan operations, from using continuations to be a coroutine.
There are no functional changes in this patch, it's just the necessary
changes to convert the function to a coroutine.

The new code is easier to read and less indented, but more importantly,
will be easier to extend in the next patch to add additional awaits
in the middle of the function.

In addition to the obvious changes, I also had to rename one local
variable (as the same name was used in two scopes), and to convert
pass-by-rvalue-reference to pass-by-value (these parameters are *moved*
by the caller, and moreover the old code had to move them again to a
continuation, so there is no performance penalty in this change).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-26 15:08:08 +02:00
Pavel Emelyanov
8bf9098663 system_keyspace: Consolidate node-state vs tokens checks
When loading topology state, nodes are checked to have or not to have
"tokens" field set. The check is done based on node state and it's
spread across the loading method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17957
2024-03-26 14:55:46 +02:00
Avi Kivity
22b8065a89 Merge 'tools/scylla-nodetool: implement the getsstables and sstableinfo commands' from Botond Dénes
These commands manage to avoid detection because they are not documented on https://opensource.docs.scylladb.com/stable/operating-scylla/nodetool.html.

They were discovered when running dtests, with ccm tuned to use the native nodetool directly. See https://github.com/scylladb/scylla-ccm/pull/565.

The commands come with tests, which pass with both the native and Java nodetools. I also checked that the relevant dtests pass with the native implementation.

Closes scylladb/scylladb#17979

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the sstableinfo command
  tools/scylla-nodetool: implement the getsstables command
  tools/scylla-nodetool: move get_ks_cfs() to the top of the file
  test/nodetool: rest_api_mock.py: add expected_requests context manager
2024-03-26 14:38:00 +02:00
Kefu Chai
101fdfc33a test: randomized_nemesis_test: add fmt::formatter for stop_crash::result_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

also, it's impossible to partially specialize a nested type of a
template class, so we cannot specialize the `fmt::formatter` for
`stop_crash<M>::result_type`. as a workaround, a new type is
added.

in this change,

* define a new type named `stop_crash_result`
* add fmt::formatter for `stop_crash_result`
* define stop_crash::result_type as an alias of `stop_crash_result`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18018
2024-03-26 12:18:55 +02:00
Pavel Emelyanov
67c2a06493 api: Rename (un)set_server_load_sstable -> (un)set_server_column_family
The method sets up column family API, not load-sstables one

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18022
2024-03-26 12:16:08 +02:00
Botond Dénes
7edbf189e6 Merge 'treewide: use fmt::to_string() to transform a UUID to std::string and drop UUID::to_sstring()' from Kefu Chai
`UUID::to_sstring()` relies on `FMT_DEPRECATED_OSTREAM` to generate `fmt::formatter` for `UUID`, and this feature is deprecated in {fmt} v9, and dropped in {fmt} v10.

in this series, all callers of `UUID::to_sstring()` are switched to `fmt::to_string()`, and this function is dropped.

Closes scylladb/scylladb#18020

* github.com:scylladb/scylladb:
  utils: UUID: drop UUID::to_sstring()
  treewide: use fmt::to_string() to transform a UUID to std::string
2024-03-26 12:14:56 +02:00
Kefu Chai
f3532cbaa0 db: commitlog: use fmt::streamed() to print segment
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change:

* add `format_as()` for `segment` so we can use it as a fallback
  after upgrading to {fmt} v10
* use fmt::streamed() when formatting `segment`, this will be used
  as the intermediate solution before {fmt} v10 after dropping
  `FMT_DEPRECATED_OSTREAM` macro

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18019
2024-03-26 12:13:29 +02:00
Botond Dénes
cd9589ec78 Merge 'test.py: Sanitize test list creation' from Pavel Emelyanov
To create the list of tests to run there's a loop that first collects all tests from suites, then filters the list in two ways -- excludes opt-out-ed lists (disabled and matching the skip pattern) or leaves there only opt-in-ed (those, specified as positional arguments).

This patch keeps both list-checking code close to each other so that the intent is explicitly clear.

Closes scylladb/scylladb#17981

* github.com:scylladb/scylladb:
  test.py: Give local variable meaningful name
  test.py: Sanitize test list creation
2024-03-26 12:09:49 +02:00
Marcin Maliszkiewicz
5844d66676 auth: coroutinize service::start 2024-03-26 09:45:15 +01:00
Patryk Jędrzejczak
13fecd4e36 raft topology: decommission: allow only in NORMAL mode
We move the mode check so that the raft-based decommission also uses
it. Without this check, it hung after the drain operation instead
of instantly failing. `test_decommission_after_drain_is_invalid` was
failing because of it with the raft-based topology enabled.

Fixes scylladb/scylladb#16761

Closes scylladb/scylladb#18000
2024-03-26 08:52:26 +01:00
Botond Dénes
f0ff23492f Merge 'Sanitize topology suites' skiplists' from Pavel Emelyanov
There are skip_in_<mode> lists in the suite yaml that tell test.py not to run the listed tests. This PR sanitizes these lists in two ways.

First, to skip pytests the skip-decorators are much more convenient, e.g. because they show the reason why the test is skipped.

Also, if a test wants to be opt-in-ed for some mode only, it's opt-out-ed in all other lists instead. There's run_in_<mode> list in suite for that.

Closes scylladb/scylladb#17964

* github.com:scylladb/scylladb:
  test: Do not duplicate test name in several skip-lists
  test: Mark tests with skip_mode instead of suite skip-list
2024-03-26 08:24:57 +02:00
Kefu Chai
a047178fe7 utils: UUID: drop UUID::to_sstring()
this function is not used anymore, and it relies on
`FMT_DEPRECATED_OSTREAM` to generate `fmt::formatter` for
`UUID`, and this feature is deprecated in {fmt} v9, and
dropped in {fmt} v10.

in this change, let's drop this member function.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-26 13:38:37 +08:00
Kefu Chai
1b859e484f treewide: use fmt::to_string() to transform a UUID to std::string
without `FMT_DEPRECATED_OSTREAM` macro, `UUID::to_sstring()` is
implemented using its `fmt::formatter`, which is not available
at the end of this header file where `UUID` is defined. at this moment,
we still use `FMT_DEPRECATED_OSTREAM` and {fmt} v9, so we can
still use `UUID::to_sstring()`, but in {fmt} v10, we cannot.

so, in this change, we change all callers of `UUID::to_sstring()`
to `fmt::to_string()`, so that we don't depend on
`FMT_DEPRECATED_OSTREAM` and {fmt} v9 anymore.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-26 13:38:37 +08:00
Wojciech Mitros
9789a3dc7c mv: keep semaphore units alive until the end of a remote view update
When a view update has both a local and remote target endpoint,
it extends the lifetime of its memory tracking semaphore units
only until the end of the local update, while the resources are
actually used until the remote update finishes.
This patch changes the semaphore transferring so that in case
of both local and remote endpoints, both view updates share the
units, causing them to be released only after the update that
takes longer finishes.
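
The sharing of units between the local and remote update can be sketched as follows. This is a simplified model with an explicit holder count; the `SharedUnits` class is hypothetical and stands in for whatever shared ownership of semaphore units the actual patch uses.

```python
import threading


class SharedUnits:
    """Returns one unit to `semaphore` only after every holder is done."""

    def __init__(self, semaphore, holders):
        self._semaphore = semaphore
        self._holders = holders

    def done(self):
        # Each update (local or remote) calls this when it finishes.
        self._holders -= 1
        if self._holders == 0:
            # The last finisher, i.e. the slower update, releases the unit.
            self._semaphore.release()
```

With a single holder this degenerates to the usual behaviour; with two holders the memory stays accounted for until the slower of the two updates completes.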

Fixes #17890

Closes scylladb/scylladb#17891
2024-03-25 19:43:58 +02:00
Tzach Livyatan
6702ba3664 Docs: Add link from migration tools page to nodetool refresh load and stream
Closes scylladb/scylladb#18006
2024-03-25 17:47:05 +02:00
Botond Dénes
1ea7b408db tools/scylla-nodetool: implement the sstableinfo command 2024-03-25 11:29:30 -04:00
Botond Dénes
50da93b9c8 tools/scylla-nodetool: implement the getsstables command 2024-03-25 11:29:30 -04:00
Botond Dénes
f51061b198 tools/scylla-nodetool: move get_ks_cfs() to the top of the file
So it can be used by all commands.
2024-03-25 11:29:30 -04:00
Botond Dénes
4ff88b848c test/nodetool: rest_api_mock.py: add expected_requests context manager
So tests and fixtures can use `with expected_requests():` and have
cleanup be taken care for them. I just discovered that some tests do not
clean up after themselves and when running all tests in a certain order,
this causes unrelated tests to fail.
Fix by using the context everywhere, getting guaranteed cleanup after
each test.
2024-03-25 11:29:30 -04:00
Petr Gusev
7c84fc527b test_invalid_user_type_statements: increase raft timeout
The test creates ut4 with a lot of fields,
this may take a while in debug builds,
to avoid raft operation timeout set the threshold
to some big value.

The error injector is disabled in release builds,
so this setting won't be applied to them.
This shouldn't be a problem since release builds
are fast enough, even on arm.

Fixes scylladb/scylladb#17987

Closes scylladb/scylladb#17997
2024-03-25 14:52:16 +01:00
Ferenc Szili
8bb7a18de2 test/cql-pytest: add --omit-scylla-output to Cassandra test runs
Currently, the tests in test/cql-pytest can be run against both ScyllaDB and Cassandra.
Running the test for either will first output the test results, and subsequently
print the stdout output of the process under test. Using the command line
option --omit-scylla-output it is possible to disable this print for Scylla,
but it is not possible for tests run against Cassandra.

This change adds the option to suppress output for Cassandra tests, too. By default,
the stdout of the Cassandra run will still be printed after the test results, but
this can now be disabled with --omit-scylla-output

Closes scylladb/scylladb#17996
2024-03-25 15:14:45 +02:00
Pavel Emelyanov
16343b3edc test: Do not duplicate test name in several skip-lists
Some tests are only run in dev mode for some reason. For such tests
there's run_in_dev list, no need in putting it in all the non-dev
skip_in_... ones.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-25 14:56:37 +03:00
Pavel Emelyanov
90dfcec86b test: Mark tests with skip_mode instead of suite skip-list
There are many tests that are skipped in release mode because they rely
on error-injection machinery which doesn't work in release mode. Most of
those tests are listed in suite's skip_in_release, but it's not very
handy, mainly because it's not clear why the test is there. The
skip_mode decoration is much more convenient.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-25 14:56:37 +03:00
Pavel Emelyanov
2c90aeb5ee test.py: Give local variable meaningful name
Rename t to testname as it's more informative

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-25 14:53:48 +03:00
Pavel Emelyanov
b2f5b63aaa test.py: Sanitize test list creation
To create the list of tests to run there's a loop that first collects all
tests from suites, then filters the list in two ways -- excludes
opt-out-ed lists (disabled and matching the skip pattern) or leaves
there only opt-in-ed (those, specified as positional arguments).

This patch keeps both list-checking code close to each other so that the
intent is explicitly clear.
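
Keeping the two checks side by side might look like the sketch below. The function and parameter names are hypothetical, and substring matching for positional arguments is an assumption. test.py's actual matching rules may differ.

```python
import re


def select_tests(all_tests, disabled=(), skip_pattern=None, requested=()):
    """Filter the collected test names into the list of tests to run."""
    selected = []
    for testname in all_tests:
        # Opt-out: disabled tests and tests matching the skip pattern.
        if testname in disabled:
            continue
        if skip_pattern and re.search(skip_pattern, testname):
            continue
        # Opt-in: when names were given on the command line, keep only those.
        if requested and not any(name in testname for name in requested):
            continue
        selected.append(testname)
    return selected
```

With both conditions in one place, it is easy to see at a glance why a given test was or was not scheduled.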

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-25 14:53:20 +03:00
Kamil Braun
69bf962522 Merge 'allow changing snitch with topology over raft' from Gleb
Fixes scylladb/scylladb#17513

* 'gleb/raft-snitch-change-v3' of github.com:scylladb/scylla-dev:
  doc: amend snitch changing procedure to work with raft
  test: add test to check that snitch change takes effect.
  raft topology: update rack/dc info in topology state on reboot if changed
2024-03-25 10:41:39 +01:00
Gleb Natapov
3b272c5650 doc: amend snitch changing procedure to work with raft
To change snitch with raft all nodes need to be started simultaneously
since each node will try to update its state in the raft and for that
quorum is required.
2024-03-25 11:31:30 +02:00
Beni Peled
eecfd164ff Remove docs-amplify-enhanced github-workflow
Since we implemented the CI-Docs on pkg, there is no need for this
workflow

Closes scylladb/scylladb#17908
2024-03-25 11:30:06 +02:00
Kefu Chai
e97ae6b0de raft: server: print pointee of server_impl::_fsm
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, instead of printing the `unique_ptr` instance, we
print the pointee of it. since `server_impl` uses pimpl paradigm,
`_fsm` is always valid after `server_impl::start()`, we can always
dereference it without checking for null.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17953
2024-03-25 11:20:34 +02:00
Botond Dénes
ff421168d0 Update tools/jmx submodule
* tools/jmx 3257897a...53696b13 (1):
  > dist/debian: do not use substvar of ${shlib:Depends}
2024-03-25 11:16:25 +02:00
Gleb Natapov
d7adf26a56 test: add test to check that snitch change takes effect.
The test creates a two-node cluster with the default snitch (SimpleSnitch)
and checks that dc and rack names are as expected. Then it changes the
config to use GossipingPropertyFileSnitch with different names, restarts
the nodes and checks that the peers table now has the new names.
2024-03-25 10:41:49 +02:00
Kefu Chai
4eabf8b617 topology_coordinator: add fmt::formatter for wait_for_ip_timeout
before this change, we rely on the default-generated fmt::formatter
created from operator<<. but this depends on the
`FMT_DEPRECATED_OSTREAM` macro which is not respected in {fmt} v10.

this change addresses the formatting with fmtlib < 10, and without
`FMT_DEPRECATED_OSTREAM` defined. please note, in {fmt} v10 and up,
it defines formatter for classes derived from `std::exception`, so
our formatter is only added when compiled with {fmt} < 10.

in this change, `fmt::formatter<service::wait_for_ip_timeout>` is
added for backward compatibility with {fmt} < 10.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17955
2024-03-25 10:39:38 +02:00
Kefu Chai
5d59dd585f configure.py: always rebuild SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE
before this change, SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE are generated
at the first run of `configure.py`. once these files are around, they
are never updated, because we don't rerun `SCYLLA_VERSION_GEN` at all.
the check in `SCYLLA_VERSION_GEN`, which skips regeneration only as long
as the release string retrieved from the git sha1 is identical to the
one stored in `SCYLLA-RELEASE-FILE`, therefore never gets a chance to run.

but the pain is, when performing incremental build, like other built
artifacts, these generated files stay with the build directory, so
even if the sha1 of the workspace changes, the SCYLLA-RELEASE-FILE
stays the same -- it still contains the original git sha1 from when it
was created. this could lead to confusion if a developer or even our
CI performs an incremental build using the same workspace and build
directory, as the built scylla executables always report the same
version number.

in this change, we always rebuild the said
SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE files, and instruct ninja
to re-stat the output files, see
https://ninja-build.org/manual.html#ref_rule, in order to avoid
unnecessary rebuilds. so the downside is that `SCYLLA_VERSION_GEN`
is executed every time we run `ninja` even if all targets are up to date.
but the upside is that the release number reported by scylla is
accurate even if we perform incremental build.

also, since we encode the product, version and release stored
in the above files in the generated `build.ninja` file, in this change,
these three files are added as dependencies of `build.ninja`,
so that this file is regenerated if any of them is newer than
`build.ninja`.
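
re-running a generator on every build only stays cheap if the generator
leaves unchanged outputs untouched, so that their mtime is preserved and
ninja's `restat` can prune the dependent steps. a minimal sketch of that
idea, assuming a helper named `write_if_changed` (hypothetical, not part of
`SCYLLA_VERSION_GEN` or `configure.py`):

```python
def write_if_changed(path, content):
    """Write content to path only when it differs; return True if written."""
    try:
        with open(path) as f:
            if f.read() == content:
                return False  # unchanged: the file's mtime is preserved
    except FileNotFoundError:
        pass  # first run: the file does not exist yet
    with open(path, "w") as f:
        f.write(content)
    return True
```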

Fixes #8255

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17974
2024-03-25 10:29:42 +02:00
Kefu Chai
5bc6d83f3b build: cmake: always rebuild SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE
before this change, SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE are generated
when CMake generates `build.ninja` for the first time. once these files
are around, they are never updated, because we don't rerun
`SCYLLA_VERSION_GEN` at all. the check in `SCYLLA_VERSION_GEN`, which
skips regeneration only as long as the release string retrieved from the
git sha1 is identical to the one stored in `SCYLLA-RELEASE-FILE`,
therefore never gets a chance to run.

but the pain is, when performing incremental build, like other built
artifacts, these generated files stay with the build directory, so
even if the sha1 of the workspace changes, the SCYLLA-RELEASE-FILE
stays the same -- it still contains the original git sha1 from when it
was created. this could lead to confusion if a developer or even our
CI performs an incremental build using the same workspace and build
directory, as the built scylla executables always report the same
version number.

in this change, we always rebuild the said
SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE files, and instruct CMake
to regenerate `build.ninja` if any of these files is updated.

Fixes #17975
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17983
2024-03-25 10:28:28 +02:00
Kefu Chai
0eb990fbf6 .github: skip "raison" when running codespell workflow
the codespell workflow checks the tree for common misspellings.
it considers "raison" in "raison d'etre" (the accent
mark over "e" is removed, so the commit message can be encoded in
ASCII) to be a misspelling of "reason" or "raisin". apparently, the
dictionary it uses does not include les mots francais les plus
utilises.

so, in this change, let's ignore "raison" for this very use case,
before we start the l10n support of the document.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17985
2024-03-25 09:51:12 +02:00
Kefu Chai
0713c324d4 cql3: provide fmt::formatter for cql3_type::raw only for {fmt} < 10
since we already have `format_as()` for `cql3_type::raw`, there is no
need to provide `fmt::formatter` for `cql3_type::raw` if the tree is
compiled with {fmt} >= 10, otherwise the compiler is not able to figure
out which one to match, see the error at the end of this commit message.
so, in this change, we only provide the specialized `fmt::formatter` for
`cql3_type::raw` when {fmt} < 10. this should address the FTBFS with
{fmt} >= 10.

```
/usr/lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/type_traits:1040:25: error: ambiguous partial specializations of 'formatter<cql3::cql3_type::raw>'
 1040 |       = __bool_constant<__is_constructible(_Tp, _Args...)>;
      |                         ^
/usr/lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/type_traits:1046:16: note: in instantiation of template type alias '__is_constructible_impl' requested here
 1046 |       : public __is_constructible_impl<_Tp, _Args...>
      |                ^
/usr/include/fmt/core.h:1420:13: note: in instantiation of template class 'std::is_constructible<fmt::formatter<cql3::cql3_type::raw>>' requested here
 1420 |            !has_formatter<T, Context>::value))>
      |             ^
/usr/include/fmt/core.h:1421:22: note: while substituting prior template arguments into non-type template parameter [with T = cql3::cql3_type::raw]
 1421 |   FMT_CONSTEXPR auto map(const T&) -> unformattable_pointer {
      |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1422 |     return {};
      |     ~~~~~~~~~~
 1423 |   }
      |   ~
```

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17986
2024-03-25 09:49:40 +02:00
Yaron Kaikov
cb2c69a3f7 github: mergify: Add Ref to original PR
When opening a backport PR, add a reference to the original PR.
This will be used later for updating the original PR/issue once the
backport is done (with a different label)

Closes scylladb/scylladb#17973
2024-03-25 08:12:47 +02:00
Raphael S. Carvalho
6bdb456fad sstables_loader: Fix loader when write selector is previous during tablet migration
The loader is writing to pending replica even when write selector is set
to previous. If migration is reverted, then the writes won't be rolled
back as it assumes pending replicas weren't written to yet. That can
cause data resurrection if tablet is later migrated back into the same
replica.

NOTE: write selector is handled correctly when set to next, because
get_natural_endpoints() will return the next replica set, and none
of the replicas will be considered leaving. And of course, selector
set to both is also handled correctly.
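
A simplified model of the rule the fix enforces, for illustration only. The
function name and the string selector values are hypothetical and do not
mirror the real tablet transition code:

```python
def write_replicas(selector, previous, next_):
    """Hypothetical model: which replicas may receive writes during migration."""
    if selector == "previous":
        # The pending replica must not be written to yet, otherwise a
        # reverted migration would leave data behind on it.
        return set(previous)
    if selector == "both":
        return set(previous) | set(next_)
    if selector == "next":
        return set(next_)
    raise ValueError(f"unknown write selector: {selector}")
```

With `previous = {"A", "B"}` and `next_ = {"A", "C"}`, the pending replica
`"C"` only shows up once the selector is `both` or `next`.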

Fixes #17892.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17902
2024-03-24 01:20:50 +01:00
Kamil Braun
230f23004b Revert "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
This reverts commit b4144d14c6.

The test is flaky and blocks next promotions.
2024-03-22 17:25:04 +01:00
Petr Gusev
2a5f5d1948 test_fencing: fix flakiness
To cause the stale topology exception the test reads
the version from the last bootstrapped host and assigns its decremented
value to version and fence_version fields of system.topology.
The test assumes that version == fence_version here; if version
is greater than fence_version, we won't get the stale topology
exception in this setup. Tablet balancer can break
this -- it may increment the version after the last node is
bootstrapped.

Fix this by disabling the tablet balancer earlier.

fixes scylladb/scylladb#17807

Closes scylladb/scylladb#17940
2024-03-22 12:49:13 +01:00
Piotr Dulikowski
f23f8f81bf Merge 'Raft-based service levels' from Michał Jadwiszczak
This patch introduces raft-based service levels.

The difference to the current method of working is:
- service levels are stored in `system.service_levels_v2`
- reads are executed with `LOCAL_ONE`
- writes are done via raft group0 operation

Service levels are migrated to v2 in topology upgrade.
After the service levels are migrated, `key: service_level_v2_status; value: data_migrated` is written to `system.scylla_local` table. If this row is present, raft data accessor is created from the beginning and it handles recovery mode procedure (service levels will be read from v2 table even if consistent topology is disabled then)

Fixes #17926

Closes scylladb/scylladb#16585

* github.com:scylladb/scylladb:
  test: test service levels v2 works in recovery mode
  test: add test for service levels migration
  test: add test for service levels snapshot
  test:topology: extract `trigger_snapshot` to utils
  main: create raft dda if sl data was migrated
  service:qos: store information about sl data migration
  service:qos: service levels migration
  main: assign standard service level DDA before starting group0
  service:qos: fix `is_v2()` method
  service:qos: add a method to upgrade data accessor
  test: add unit_test_raft_service_levels_accessor
  service:storage_service: add support for service levels raft snapshot
  service:qos: add abort_source for group0 operations
  service:qos: raft service level distributed data accessor
  service:qos: use group0_guard in data accessor
  cql3:statements: run service level statements on shard0 with raft guard
  test: fix overrides in unit_test_service_levels_accessor
  service:qos: fix indentation
  service:qos: coroutinize some of the methods
  db:system_keyspace: add `SERVICE_LEVELS_V2` table
  service:qos: extract common service levels' table functions
2024-03-22 11:51:53 +01:00
Ferenc Szili
b50a9f9bab removed forward declaration of resharding_descriptor
resharding_descriptor has been removed in e40aa042 in 2020
2024-03-22 11:35:10 +01:00
Ferenc Szili
93395e2ebe compaction options and troubleshooting docs
Added unchecked_tombstone_compaction description to compaction docs.
Added section to troubleshooting pointless compaction.
2024-03-22 11:26:17 +01:00
Ferenc Szili
455959b80e cql-pytest/test_compaction_strategy_validation.py
Adds the check for the wording of the validation error on invalid
values of unchecked_tombstone_compaction
2024-03-22 11:22:56 +01:00
Ferenc Szili
5c0de3b097 test/boost/sstable_compaction_test.cc
Checks if the tombstone_threshold value will be ignored if
unchecked_tombstone_compaction is set to true
2024-03-22 11:21:21 +01:00
Kamil Braun
9979adb670 Merge 'topology_coordinator: do not clear unpublished CDC generation's data' from Patryk Jędrzejczak
In this PR, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.

We also add a test that passes only after the fix. However, this test
needs to block execution of the CDC generation publisher's loop
twice. Currently, error injections with handlers do not allow it
because handlers always share received messages. Apart from the
first created handler, all handlers would be instantly unblocked by
a message from the past that has already unblocked the first
handler. This seems like a general limitation that could cause
problems in the future, so in this PR, we extend injections with
handlers to solve it once and for all. We add the `share_messages`
parameter to the `inject` (with handler) function. Depending on its
value, handlers will share messages (as before) or not.

Fixes scylladb/scylladb#17497

Closes scylladb/scylladb#17934

* github.com:scylladb/scylladb:
  topology_coordinator: clean_obsolete_cdc_generations: fix log
  topology_coordinator: do not clear unpublished CDC generation's data
  topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages
  error_injection: allow injection handlers to not share messages
2024-03-22 11:20:26 +01:00
Ferenc Szili
5a65169f46 compaction: implement unchecked_tombstone_compaction
This change adds the missing Cassandra compaction option unchecked_tombstone_compaction.
Setting this option to true causes the compaction to ignore tombstone_threshold,
and decide whether to do a compaction based only on the value of tombstone_compaction_interval
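
The decision described above might be sketched as follows. This is not the
actual compaction code: the function name, the age and ratio parameters, and
the strict `>` comparison against the threshold are all assumptions made for
illustration.

```python
def worth_tombstone_compaction(sstable_age_s, droppable_ratio,
                               tombstone_compaction_interval_s,
                               tombstone_threshold,
                               unchecked_tombstone_compaction=False):
    """Hypothetical sketch of the tombstone compaction decision."""
    # An SSTable younger than the interval is never picked.
    if sstable_age_s < tombstone_compaction_interval_s:
        return False
    # With the option set, the threshold is ignored entirely.
    if unchecked_tombstone_compaction:
        return True
    return droppable_ratio > tombstone_threshold
```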
2024-03-22 11:19:43 +01:00
Kamil Braun
4359a1b460 Merge 'raft timeouts: better handling of lost quorum' from Petr Gusev
In this PR we add timeouts support to raft groups registry. We introduce
the `raft_server_with_timeouts` class, which wraps the `raft::server`
and exposes its interface with an additional `raft_timeout` parameter. If
it's set, the wrapper cancels the `abort_source` after a certain amount of
time. The value of the timeout can be specified either in the
`raft_timeout` parameter, or the default value can be set in the
`raft_server_with_timeouts` class constructor.

The `raft_group_registry` interface is extended with
`group0_with_timeouts()` method. It returns an instance of
`raft_server_with_timeouts` for group0 raft server. The timeout value
for it is configured in `create_server_for_group0`. It's one minute by
default and can be overridden for tests with
`group0-raft-op-timeout-in-ms` parameter.

The new api allows the client to decide whether to use timeouts or not.
In this PR we are reviewing all the group0 call sites and add
`raft_timeout` if that makes sense. The general principle is that if the
code is handling a client request and the client expects a potential
error, we use timeouts. We don't use timeouts for background fibers
(such as topology coordinator), since they wouldn't add much value. The
only thing the background fiber can do with a timeout is to retry, and
this will have the same end effect as not having a timeout at all.
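
The wrapper idea can be illustrated with a small asyncio analogue. It is not
the C++ implementation: `ServerWithTimeouts`, `run` and `stuck_operation` are
hypothetical names, and `asyncio.wait_for` stands in for cancelling an
`abort_source` after the timeout.

```python
import asyncio

class ServerWithTimeouts:
    """Hypothetical wrapper: run an operation, giving up after a timeout."""

    def __init__(self, default_timeout_s=60.0):
        self.default_timeout_s = default_timeout_s

    async def run(self, operation, timeout_s=None, use_timeout=True):
        # Background fibers would pass use_timeout=False and simply wait.
        if not use_timeout:
            return await operation()
        # A per-call value overrides the default set in the constructor.
        limit = self.default_timeout_s if timeout_s is None else timeout_s
        return await asyncio.wait_for(operation(), timeout=limit)

async def stuck_operation():
    # Stands in for a group0 call that blocks because quorum is lost.
    await asyncio.sleep(3600)
```

A caller handling a client request would catch `asyncio.TimeoutError` and
report the failure, which is the behaviour the PR aims for on the C++ side.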

Fixes scylladb/scylladb#16604

Closes scylladb/scylladb#17590

* github.com:scylladb/scylladb:
  migration_manager: use raft_timeout{}
  storage_service::join_node_response_handler: use raft_timeout{}
  storage_service::start_upgrade_to_raft_topology: use raft_timeout{}
  storage_service::set_tablet_balancing_enabled: use raft_timeout{}
  storage_service::move_tablet: use raft_timeout{}
  raft_check_and_repair_cdc_streams: use raft_timeout{}
  raft_timeout: test that node operations fail properly
  raft_rebuild: use raft_timeout{}
  do_cluster_cleanup: use raft_timeout{}
  raft_initialize_discovery_leader: use raft_timeout{}
  update_topology_with_local_metadata: use with_timeout{}
  raft_decommission: use raft_timeout{}
  raft_removenode: use raft_timeout{}
  join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
  raft_group0: make_raft_config_nonvoter: add raft_timeout parameter
  raft_group0: make_raft_config_nonvoter: add abort_source parameter
  manager_client: server_add with start=false shouldn't call driver_connect
  scylla_cluster: add seeds parameter to the add_server and servers_add
  raft_server_with_timeouts: report the lost quorum
  join_node_request_handler: add raft_timeout{} for start_operation
  skip_mode: add platform_key
  auth: use raft_timeout{}
  raft_group0_client: add raft_timeout parameter
  raft_group_registry: add group0_with_timeouts
  utils: add composite_abort_source.hh
  error_injection: move api registration to set_server_init
  error_injection: add inject_parameter method
  error_injection: move injection_name string into injection_shared_data
  error_injection: pass injection parameters at startup
2024-03-22 10:45:33 +01:00
Botond Dénes
f02baef871 Merge 'test/lib: sstable::test_env consolidate and reduce header footprint' from Avi Kivity
Reduce the sprawl of sstables::test_env in .cc and .hh files, to ease
maintenance and reduce recompilations.

Closes scylladb/scylladb#17965

* github.com:scylladb/scylladb:
  test: sstables::test_env: complete pimplification
  test/lib: test_env: move test_env::reusable_sst() to test_services.cc
2024-03-22 11:26:12 +02:00
Botond Dénes
8b2856339a Merge 'github: sync-labels: use more descriptive name for workflow' from Kefu Chai
* rename `sync_labels.yaml` to `sync-labels.yaml`
* use more descriptive name for workflow

Closes scylladb/scylladb#17971

* github.com:scylladb/scylladb:
  github: sync-labels: use more descriptive name for workflow
  github: sync_labels: rename sync_labels to sync-labels
2024-03-22 10:01:56 +02:00
David Garcia
0375faa6aa docs: add experimental tag
Closes scylladb/scylladb#17633
2024-03-22 09:53:30 +02:00
Patryk Wrobel
28ed20d65e scylla-nodetool: adjust effective ownership handling
When a keyspace uses tablets, then effective ownership
can be obtained per table. If the user passes only a
keyspace, then /storage_service/ownership/{keyspace}
returns an error.

This change:
 - adds an additional positional parameter to 'status'
   command that allows a user to query status for table
   in a keyspace
 - makes usage of /storage_service/ownership/{keyspace}
   optional to avoid errors when user tries to obtain
   effective ownership of a keyspace that uses tablets
 - implements new frontend tests in 'test_status.py'
   that verify the new logic

Refs: scylladb#17405
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17827
2024-03-22 09:51:57 +02:00
Yaron Kaikov
407d25e47b [mergify] delete backport branch after merge
Since those branches clutter the branch search UI and we don't need them
after merging

Closes scylladb/scylladb#17961
2024-03-22 09:51:22 +02:00
Calle Wilund
7e09517433 Update seastar submodule
Submodule seastar 6b7b16a8a3..cd8a9133d2:
  > abort_source: add fmt::formatter for abort_requested_exception
  > memory: Ensure thread locals etc are minimally initialized even with non-seastar reactor options for alloc
  > rpc: add fmt::formatter for rpc::error classes and rpc::optional
  > Merge 'Adding Metrics family config' from Amnon Heiman
  > util: add fmt::formatter for bool_class<Tag>
  > util/bool_class: use the default-generated comparison operators
  > membarrier: cooperatively serialize calls to sys_membarrier
  > Merge 'build: relax the version constraint for Protobuf' from Kefu Chai
  > tls: add fmt::formatter for tls::subject_alt_name
  > memory.cc: Fix static init fiasco in system malloc override

diff --git a/seastar b/seastar
index 6b7b16a8a3..cd8a9133d2 160000
--- a/seastar
+++ b/seastar
@@ -1 +1 @@
-Subproject commit 6b7b16a8a329d831b94fdd4b41f6f55b260e9afd
+Subproject commit cd8a9133d2c02f63dbd578d882cf7333a427e194

Closes scylladb/scylladb#17865
2024-03-22 09:49:23 +02:00
Kefu Chai
7ebdfdb705 github: sync-labels: use more descriptive name for workflow
"label-sync" is not very helpful for developers to understand what
this workflow is for.

the "name" field of a job shows in the webpage on github of the
pull request against which the job is performed, so if the author
or reviewer checks the status of the pull request, he/she would
notice these names aside of the workflow's name. for this very
job, what we have now is:

```
Sync labels / label-sync
```

after this change it will be:
```
Sync labels / Synchronize labels between PR and the issue(s) fixed by it
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-22 10:41:20 +08:00
Kefu Chai
af879759b9 github: sync_labels: rename sync_labels to sync-labels
to be more consistent with other github workflows

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-22 10:31:31 +08:00
Michał Jadwiszczak
c0853b461c test: test service levels v2 works in recovery mode
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
c551a85cda test: add test for service levels migration
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
5811f696be test: add test for service levels snapshot
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
bf3aed1ecb test:topology: extract trigger_snapshot to utils
The function was defined separately in a few tests.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
a08918a320 main: create raft dda if sl data was migrated
Create `raft_service_levels_distributed_data_accessor` if service levels
were migrated to v2 table.
This supports raft recovery mode, as service levels will be read from
the v2 table in that mode.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
dab909b1d1 service:qos: store information about sl data migration
Save information whether service levels data was migrated to v2 table.
The information is stored in `system.scylla_local` table. It's
written with raft command and included in raft snapshot.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
2917ec5d51 service:qos: service levels migration
Migrate data from `system_distributed.service_levels` to
`system.service_levels_v2` during raft topology upgrade.

Migration process reads data from old table with CL ALL
and inserts the data to the new table via raft.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
36c9afda99 main: assign standard service level DDA before starting group0
`topology_state_load()` is responsible for upgrading service level DDA,
so the standard DDA has to be assigned first so that it can be upgraded
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
159a6a2169 service:qos: fix is_v2() method
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
fd32f5162a service:qos: add a method to upgrade data accessor
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
d403bdfdd5 test: add unit_test_raft_service_levels_accessor
Raft service level data accessor with logic similar to
`unit_test_service_levels_accessor` to avoid sleeps in boost tests.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
8bbeea0169 service:storage_service: add support for service levels raft snapshot
Include mutations from `system.service_levels_v2` in `raft_snapshot`.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
d5fa0747d7 service:qos: add abort_source for group0 operations
Add mechanism to abort ongoing group0 operations while draining
service_level_controller or leaving the cluster.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
7e61bbb0d5 service:qos: raft service level distributed data accessor
`raft_service_level_distributed_data_accessor` works this way:
- on read path it reads service levels from `SYSTEM.SERVICE_LEVELS_V2`
  table with CL = LOCAL_ONE
- on write path it starts group0 operation and it makes the change
  using raft command
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
71c07addb5 service:qos: use group0_guard in data accessor
Adjust service_level_controller and
service_level_controller::service_level_distributed_data_accessor
interfaces to take `group0_guard` while adding/altering/dropping a
service level.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
da82c5f0b0 cql3:statements: run service level statements on shard0 with raft guard
To migrate service levels to be raft managed, obtain `group0_guard` to
be able to pass it to service_level_controller's methods.

Using this mechanism also automatically provides retries in case of
concurrent group0 operation.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
674286b868 test: fix overrides in unit_test_service_levels_accessor
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
c0e22fcb9c service:qos: fix indentation
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
1f3c6b2813 service:qos: coroutinize some of the methods
Functions:
- `service_level_controller::set_distributed_service_level()`
- `service_level_controller::drop_distributed_service_level()`
- `service_level_controller::drain()`

Coroutines increase readability of those functions.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
8e242f5acd db:system_keyspace: add SERVICE_LEVELS_V2 table
The table has the same schema as `system_distributed.service_levels`.
However it's created entirely at once (unlike old table which creates
base table first and then it adds other columns) because `system` tables
are local to the node.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
990c5e7dd0 service:qos: extract common service levels' table functions
Getting a service level(s) will be done the same way in raft-based
service levels as it's done in standard service levels, so those
functions are extracted so that they can be reused.
2024-03-21 23:14:57 +01:00
Avi Kivity
b530dc1e3b test: sstables::test_env: complete pimplification
sstables::test_env uses the pimpl idiom, but incompletely. This
prevents reaping some of the benefits.

Complete the pimplification:
 - the `impl` nested struct is moved out-of-line
 - all non-template member functions are moved out-of-line
 - a destructor is declared and defined out-of-line
 - the move constructor is also defined (necessary after the destructor is
   defined)

After this, we can forward-declare more components.
2024-03-21 22:29:01 +02:00
Avi Kivity
d745929b44 test/lib: test_env: move test_env::reusable_sst() to test_services.cc
test_env implementation is scattered across two .cc files, concentrate it
in test_services.cc, which happens to be the file that doesn't cause
link errors.

Move toc_filename with it, as it is its only caller and it is static.
2024-03-21 22:21:02 +02:00
Kefu Chai
900b56b117 raft_group0: print runtime_error by printing e.what()
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter. but fortunately, fmt v10 brings the builtin
formatter for classes derived from `std::exception`. but before
switching to {fmt} v10, and after dropping `FMT_DEPRECATED_OSTREAM`
macro, we need to print out `std::runtime_error`. so far, we don't
have a shared place for formatter for `std::runtime_error`. so we
are addressing the needs on a case-by-case basis.

in this change, we just print it using `e.what()`. its behavior
is identical to what we have now.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17954
2024-03-21 19:43:52 +02:00
Avi Kivity
f0ca5e5a08 Merge 'treewide: add fmt::formatter for exception types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter` is added for following types for backward compatibility with {fmt} < 10:

* `utils::bad_exception_container_access`
* `cdc::no_generation_data_exception`
* classes derived from `sstables::malformed_sstable_exception`
* classes derived from `cassandra_exception`

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#17944

* github.com:scylladb/scylladb:
  cdc: add fmt::formatter for exception types in data_dictionary.hh
  utils: add fmt::formatter for utils::bad_exception_container_access
  sstables: add fmt::formatter for classes derived from sstables::malformed_sstable_exception
  exceptions: add fmt::formatter for classes derived from cassandra_exception
  cdc: add fmt::formatter for cdc::no_generation_data_exception
2024-03-21 18:44:37 +02:00
Botond Dénes
f9104fbfa9 tools/toolchain/image: update python driver (implicit)
Fixes: #17662

Closes scylladb/scylladb#17956
2024-03-21 18:27:40 +02:00
Andrei Chekun
7de28729e7 test: change maintenance socket location to /tmp
Fixes #16912

By default, ScyllaDB stores the maintenance socket in the workdir. Test.py by default uses testlog/{mode}/scylla-# as the location for the ScyllaDB workdir. The usual location for cloning the repo is the user's home folder. In some cases, this can lead to the socket path being too long, and the test will start to fail. The simple way is to move the maintenance socket to the /tmp folder to eliminate such a possibility.
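
For context, UNIX domain socket paths are limited by the size of `sun_path` in `sockaddr_un`, which is 108 bytes on Linux including the terminating NUL. A minimal sketch of a length check, assuming a Linux host; `socket_path_fits` and the example paths are hypothetical and not part of test.py:

```python
import os

# sizeof(sockaddr_un.sun_path) on Linux, including the trailing NUL byte.
SUN_PATH_MAX = 108

def socket_path_fits(path):
    """Return True if path is short enough to bind a UNIX socket on Linux."""
    return len(os.fsencode(path)) < SUN_PATH_MAX
```

A socket under /tmp stays well within the limit, while a workdir nested deep inside a home folder may not.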

Closes scylladb/scylladb#17941
2024-03-21 18:22:21 +02:00
Patryk Jędrzejczak
33a0864aaa topology_coordinator: clean_obsolete_cdc_generations: fix log
We use a non-inclusive bound here, so the log was incorrect.
2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak
27465a00e0 topology_coordinator: do not clear unpublished CDC generation's data
In this commit, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.

We also add a test that passes only after the fix.
2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak
f45aebeee2 topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages
In the following commit, we add a test that needs to block the CDC
generation publisher's loop twice. We allow it in this commit by
making handlers of the `cdc_generation_publisher_fiber` injection
share messages. From now on, unblocking every step of the loop will
require sending a new message from the test.

This change breaks the test already using the
`cdc_generation_publisher_fiber` injection, so we adjust the test.
2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak
c5c4cc7d00 error_injection: allow injection handlers to not share messages
For a single injection, all created injection handlers share all
received messages. In particular, it means that one received message
unblocks all handlers waiting for the first message. This behavior
is often desired, for example, if multiple fibers execute the
injected code and we want to unblock them all with a single message.
However, there is a problem if we want to block every execution
of the injected code. Apart from the first created handler, all
handlers will be instantly unblocked by messages from the past that
have already unblocked the first handler.

In one of the following commits, we add a test that needs to block
the CDC generation publisher's loop twice. Since it looks like there
are no good workarounds for this arguably general problem, we extend
injections with handlers in a way that solves it. We introduce the
new `share_messages` parameter. Depending on its value, handlers
will share messages or not. The details are described in the new
comments in `error_injection.hh`.
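
The difference between the two modes can be shown with a toy model. It is
not the `error_injection.hh` implementation; the `Injection` and `Handler`
classes below and their methods are hypothetical and only model the message
bookkeeping described above.

```python
class Injection:
    """Toy model of one named injection and the messages sent to it."""

    def __init__(self, share_messages=True):
        self.share_messages = share_messages
        self.messages = []   # every message received so far
        self.consumed = 0    # common cursor, used when messages are not shared

    def receive(self, msg):
        self.messages.append(msg)

    def make_handler(self):
        return Handler(self)


class Handler:
    def __init__(self, injection):
        self.injection = injection
        self.read = 0        # private cursor, used when messages are shared

    def try_wait(self):
        """Return the next message, or None if the handler would block."""
        inj = self.injection
        if inj.share_messages:
            # Each handler replays the full history on its own.
            if self.read < len(inj.messages):
                self.read += 1
                return inj.messages[self.read - 1]
            return None
        # Not shared: a message is consumed by exactly one handler.
        if inj.consumed < len(inj.messages):
            inj.consumed += 1
            return inj.messages[inj.consumed - 1]
        return None
```

In this model, a handler created after a message has already been consumed
blocks again when messages are not shared, which is what a test needs in
order to stop every iteration of a loop.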

We also add some basic unit tests for the new functionality.
2024-03-21 14:35:38 +01:00
Petr Gusev
ae0ec19537 migration_manager: use raft_timeout{}
Checking all the call sites of the migration manager shows
that all of them are initiated by user requests,
not background activities. Therefore, we add a global
raft_timeout{} here.
2024-03-21 16:35:48 +04:00
Petr Gusev
294e1ff464 storage_service::join_node_response_handler: use raft_timeout{}
This function is called as part of a node join procedure
initiated by the user, so having timeouts here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
e53189dcdc storage_service::start_upgrade_to_raft_topology: use raft_timeout{}
This function is called from the REST api,
so having a timeout here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
6e350fb580 storage_service::set_tablet_balancing_enabled: use raft_timeout{}
This function is called from the REST api,
so having a timeout here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
22d7c62c3c storage_service::move_tablet: use raft_timeout{}
This function is called from the REST api,
so having a timeout here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
dafd5d0160 raft_check_and_repair_cdc_streams: use raft_timeout{}
This function is called from the REST api,
so having a timeout here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
ca21362ade raft_timeout: test that node operations fail properly 2024-03-21 16:35:48 +04:00
Petr Gusev
dcc275cb0f raft_rebuild: use raft_timeout{}
This is a user-requested operation, so having
a timeout here makes sense.

The test will be provided in a subsequent commit.
2024-03-21 16:35:48 +04:00
Petr Gusev
8deb06647a do_cluster_cleanup: use raft_timeout{}
This function is called from the REST api,
so having a timeout here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
d5d2f04cd6 raft_initialize_discovery_leader: use raft_timeout{}
This function is called as part of a node startup
procedure, so a timeout may be useful.

As outlined in the comment, there is no valid way
we can lose quorum here, but some subsystems may
just become unreasonably slow for various reasons,
so we nonetheless use raft_timeout{} here.
2024-03-21 16:35:48 +04:00
Petr Gusev
f498cfae79 update_topology_with_local_metadata: use with_timeout{}
This function is called as part of a node startup
procedure, so having a timeout here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
f1f77b4882 raft_decommission: use raft_timeout{}
This is a user requested operation, so having
a timeout here makes sense.

The test will be provided in a subsequent commit.
2024-03-21 16:35:48 +04:00
Petr Gusev
aabcc0852a raft_removenode: use raft_timeout{}
This is a user requested operation, so having
a timeout here makes sense.

The test will be provided in a subsequent commit.
2024-03-21 16:35:48 +04:00
Petr Gusev
099c756ba1 join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
We also add a specific test_quorum_lost_during_node_join. It
exercises the case when the quorum is lost after start_operation
but before these methods are called.
2024-03-21 16:35:48 +04:00
Petr Gusev
0ad852e323 raft_group0: make_raft_config_nonvoter: add raft_timeout parameter
We'll use this parameter in subsequent commits.
2024-03-21 16:35:48 +04:00
Petr Gusev
ce7fb39750 raft_group0: make_raft_config_nonvoter: add abort_source parameter 2024-03-21 16:35:48 +04:00
Petr Gusev
99ddffac32 manager_client: server_add with start=false shouldn't call driver_connect
If the server is not started, there is no point
in starting the driver, it would fail because there
are no nodes to connect to. On the other hand, we
should connect the driver in server_start()
if it's not connected yet.
2024-03-21 16:35:48 +04:00
Petr Gusev
3f6cf38dd5 scylla_cluster: add seeds parameter to the add_server and servers_add
If this parameter is set, we use its value for
the scylla.yaml of the new node, otherwise we
use IPs of all running nodes as before.

We'll need this parameter in subsequent commits to
restrict the communication between nodes.

We remove default values for _create_server_add_data parameters
since they are redundant - in the two call sites we pass all
of them.
2024-03-21 16:35:48 +04:00
Petr Gusev
99419d5964 raft_server_with_timeouts: report the lost quorum
In this commit we extend the timeout error message with
additional context - if we see that there is no quorum of
available nodes, we report this as the most likely
cause of the error.
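
As a rough illustration of the idea, assuming the usual Raft rule that a quorum is a strict majority of voters: the function names and the message wording below are invented for this sketch and are not taken from the patch.

```python
def has_quorum(voters, reachable):
    """True if a strict majority of the voting members is reachable."""
    voters = set(voters)
    return len(voters & set(reachable)) > len(voters) // 2


def timeout_error(operation, voters, reachable):
    """Build a timeout message; the wording is invented for illustration."""
    message = f"operation '{operation}' timed out"
    if not has_quorum(voters, reachable):
        # Extra context: a lost quorum is the most likely cause.
        message += "; most likely cause: no quorum of available nodes"
    return message
```

For a three-node group with only one reachable voter, `timeout_error` appends the quorum hint. With two reachable voters it returns the plain timeout message.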

We adjust the test by adding this new information to the
expected_error. We need raft-group-registry-fd-threshold-in-ms
to make _direct_fd threshold less than
group0-raft-op-timeout-in-ms.
2024-03-21 16:35:48 +04:00
Petr Gusev
1a3fc58438 join_node_request_handler: add raft_timeout{} for start_operation
In the test, we use the group0-raft-op-timeout-in-ms parameter to
reduce the timeout to one second so as not to waste time.

The join_node_request_handler method contains other group0 calls
which should have timeouts (make_nonvoters and add_entry). They
will be handled in a separate commit.
2024-03-21 16:35:48 +04:00
Petr Gusev
854531ae8e skip_mode: add platform_key
In subsequent commits we are going to add test.py
tests for the raft_timeout{} feature. The problem is that the
aarch64/debug configuration is infamously slow. Timeout
settings used in tests work for all platforms but aarch64/debug.

In this commit we extend the skip_mode attribute with the
platform_key property. We'll use @skip_mode('debug', platform_key='aarch64')
to skip the tests for this specific configuration.
The tests will still be run for aarch64/release.
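
The skip decision can be sketched as a small predicate. This is only a guess at the logic: the `should_skip` helper is hypothetical, and matching `platform_key` as a substring of the machine name is an assumption, not something the commit message states.

```python
import platform


def should_skip(test_mode, skip_mode, platform_key=None, machine=None):
    """Return True if a test marked skip_mode(skip_mode, platform_key=...) is skipped.

    `machine` defaults to the local machine name, e.g. 'aarch64' or 'x86_64'.
    """
    machine = machine if machine is not None else platform.machine()
    if test_mode != skip_mode:
        return False
    # Without platform_key the build mode alone decides; with it, the
    # machine name must also contain the key (assumed matching rule).
    return platform_key is None or platform_key in machine
```

Under these assumptions, `should_skip("debug", "debug", "aarch64", machine="aarch64")` is true, while the same test in release mode or on `x86_64` is not skipped.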
2024-03-21 16:35:43 +04:00
Yaron Kaikov
5bd6b4f4c2 github: sync_labels: match issue number with better pattern
Seen in https://github.com/scylladb/scylladb/actions/runs/8357352616/job/22876314535

```
python .github/scripts/sync_labels.py --repo scylladb/scylladb --number 17309 --action labeled --label backport/none
  shell: /usr/bin/bash -e {0}
  env:
    GITHUB_TOKEN: ***

Found issue number: ('', '', '15465')
Traceback (most recent call last):
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 93, in <module>
    main()
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 89, in main
    sync_labels(repo, args.number, args.label, args.action, args.is_issue)
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 71, in sync_labels
    target = repo.get_issue(int(pr_or_issue_number))
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'tuple'
Error: Process completed with exit code 1.
```

Fixing the pattern to catch all GitHub-supported closing keywords, as
described in https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword
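
The `('', '', '15465')` tuple in the log is what `re.findall` returns when a pattern has several capturing groups. The sketch below reproduces that behavior and shows one way to avoid it. Both patterns are illustrative and are not the ones used in `sync_labels.py`.

```python
import re

body = "Some description\n\nFixes: #15465\n"

# Three capturing groups: findall returns one tuple per match, with an
# empty string for every group that did not take part in the match.
multi_group = re.compile(
    r"(?:Fixes|Closes):? "
    r"(?:scylladb/scylladb#(\d+)"
    r"|https://github\.com/scylladb/scylladb/issues/(\d+)"
    r"|#(\d+))"
)
broken = multi_group.findall(body)  # a list of 3-tuples, not strings

# One capturing group for the number, non-capturing groups for the rest,
# covering the close/fix/resolve keyword families (case-insensitive).
single_group = re.compile(
    r"\b(?:fix(?:es|ed)?|close[sd]?|resolve[sd]?):?\s+"
    r"(?:scylladb/scylladb#"
    r"|https://github\.com/scylladb/scylladb/issues/"
    r"|#)(\d+)",
    re.IGNORECASE,
)
issue_numbers = [int(n) for n in single_group.findall(body)]
```

Passing an element of `broken` to `int()` raises the `TypeError` seen above, while `issue_numbers` holds plain integers.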

Fixed: https://github.com/scylladb/scylladb/issues/17917
Fixed: https://github.com/scylladb/scylladb/issues/17921

Closes scylladb/scylladb#17920
2024-03-21 14:25:24 +02:00
Petr Gusev
e335b17190 auth: use raft_timeout{}
The only place where we don't need raft_timeout{}
is migrate_to_auth_v2 since it's called from
topology_coordinator fiber. All other places are
called from user context, so raft_timeout{} is used.
2024-03-21 16:12:51 +04:00
Petr Gusev
cebf87bf59 raft_group0_client: add raft_timeout parameter
In this commit we add a raft_timeout parameter to the
start_operation and add_entry methods.

We fix compilation in default_authorizer.cc,
bind_front doesn't account for default parameter
values. We should use raft_timeout{} here, but this
is for another commit.
2024-03-21 16:12:51 +04:00
Petr Gusev
3d1b94475f raft_group_registry: add group0_with_timeouts
In this commit we add timeouts support to raft groups
registry. We introduce the raft_server_with_timeouts
class, which wraps the raft::server and exposes its
interface with an additional raft_timeout parameter.
If it's set, the wrapper cancels the abort_source
after a certain amount of time. The value of the timeout
can be specified in the raft_timeout parameter,
or the default value can be set in the raft_server_with_timeouts
class constructor.

The raft_group_registry interface is extended with
get_server_with_timeouts(group_id) and group0_with_timeouts()
methods. They return an instance of raft_server_with_timeouts for
a specified group id or for group0. The timeout value for it is configured in
create_server_for_group0. It's one minute by default, can be overridden
for tests with group0-raft-op-timeout-in-ms parameter.

The new api allows the client to decide whether to use timeouts or not.
In subsequent commits we are going to review all group0 call sites
and add raft_timeout if that makes sense. The general principle is that
if the code is handling a client request and the client expects
a potential error, we use timeouts. We don't use timeouts for
background fibers (such as topology coordinator), since they won't
add much value. The only thing the background fiber can do
with a timeout is to retry, and this will have the same effect
as not having a timeout at all.
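
The wrapper pattern can be sketched with `asyncio`, as an analogy only. The real code is C++ and works through an abort_source; the class names, the `use_timeout` flag and the `StubServer` below are hypothetical.

```python
import asyncio


class StubServer:
    """Stand-in for a raft server whose add_entry takes `delay` seconds."""

    def __init__(self, delay):
        self._delay = delay

    async def add_entry(self, entry):
        await asyncio.sleep(self._delay)
        return entry


class ServerWithTimeouts:
    """Forwards calls to the wrapped server, optionally bounding them in time."""

    def __init__(self, server, default_timeout):
        self._server = server
        self._default_timeout = default_timeout  # seconds

    async def add_entry(self, entry, use_timeout=False, timeout=None):
        call = self._server.add_entry(entry)
        if not use_timeout:
            # Background work: wait for as long as it takes.
            return await call
        limit = timeout if timeout is not None else self._default_timeout
        # wait_for cancels the call and raises a timeout error on expiry.
        return await asyncio.wait_for(call, limit)
```

A caller handling a user request would pass `use_timeout=True` and report the timeout error to the user. A background fiber would leave it off, since all it could do after a timeout is retry.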
2024-03-21 16:12:51 +04:00
Petr Gusev
532a720c3d utils: add composite_abort_source.hh 2024-03-21 16:12:51 +04:00
Kefu Chai
8dacec589d cql3: add fmt::formatter for cql3_type and cql3_type::raw
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<>` is added for following classes:

* `cql3::cql3_type`
* `cql3::cql3_type::raw`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17945
2024-03-21 14:08:50 +02:00
Nadav Har'El
fdeb14b468 Merge 'scylla-nodetool: make command-line parsing fully compatible with the legacy nodetool' from Botond Dénes
There were two more things missing:
* Allow global options to be positioned before the operation/command option (https://github.com/scylladb/scylladb/issues/16695)
* Ignore JVM args (https://github.com/scylladb/scylladb/issues/16696)

This PR fixes both. With this, hopefully we are fully compatible with nodetool as far as command line parsing is concerned.
After this PR goes in, we will need another fix to tools/java/bin/nodetool-wrapper, to allow users to benefit from this fix. Namely, after this PR, we can just try to invoke scylla-nodetool first with all the command-line args as-is. If it returns with exit-code 100, we fall back to nodetool. We will not need the current trick with `--help $1`. In fact, this trick doesn't work currently, because `$1` is not guaranteed to be the command in the first place.

In addition to the above, this PR also introduces a new option, to help us in the switching process. This is `--rest-api-port`, which can also be provided as `-Dcom.scylladb.apiPort`. When provided, this option takes precedence over `--port|-p`. This is intended as a bridge for `scylla-ccm`, which currently provides the JMX port as `--port`. With this change, it can also provide the REST API port as `-Dcom.scylladb.apiPort`. The legacy nodetool will ignore this, while the native nodetool will use it to connect to the correct REST API address. After the switch we can ditch these options.
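
The precedence rule can be sketched as follows. This is not the scylla-nodetool parser: the `resolve_api_port` helper is hypothetical, the `-Dcom.scylladb.apiPort=<value>` spelling and the default port of 10000 are assumptions, and only the space-separated option form is handled.

```python
def resolve_api_port(argv, default_port=10000):
    """Pick the REST API port from a nodetool-style argument list."""
    port = None
    rest_api_port = None
    args = iter(argv)
    for arg in args:
        if arg in ("-p", "--port"):
            port = int(next(args))
        elif arg == "--rest-api-port":
            rest_api_port = int(next(args))
        elif arg.startswith("-Dcom.scylladb.apiPort="):
            # JVM-property spelling of the same option (assumed format).
            rest_api_port = int(arg.split("=", 1)[1])
    # --rest-api-port wins over --port|-p whenever both are given.
    if rest_api_port is not None:
        return rest_api_port
    return port if port is not None else default_port
```

This is the behavior that lets a caller such as `scylla-ccm` keep passing the JMX port as `--port` for the legacy tool while the native tool picks up the REST API port.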

Fixes: https://github.com/scylladb/scylladb/issues/16695
Fixes: https://github.com/scylladb/scylladb/issues/16696
Refs: https://github.com/scylladb/scylladb/issues/16679
Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#17168

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: add --rest-api-port option
  tools/scylla-nodetool: ignore JVM args
  tools/utils: make finding the operation command line option more flexible
  tools/utils: get_selected_operation(): remove alias param
  tools: add constant with current help command-line arguments
2024-03-21 14:06:45 +02:00
Pavel Emelyanov
c8fc43d169 test: Update topology_custom/suite::run_first list
The recently added test_tablets_migration dominates with its run-time (10
minutes). Also update other tests, e.g. test_read_repair is not in top-7
for any mode, test_replace and test_raft_recovery_majority_loss are both
not notably slower than most of other tests (~40 sec both). On the other
hand, the test_raft_recovery_basic and test_group0_schema_versioning are
both 1+ minute

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17927
2024-03-21 12:48:50 +01:00
Gleb Natapov
e26b0f34a0 raft topology: update rack/dc info in topology state on reboot if changed
It is allowed to change a snitch after the cluster is already running.
Changing a snitch may cause dc and/or rack names to be changed and
gossiper handles it by gossiping new names on restart. The patch changes
raft mode to update the names on restart as well.
2024-03-21 12:44:12 +02:00
Andrei Chekun
a5455460d8 test: fix flakiness of the multi_dc tests
The initial version used a redundant method, and it did not cover all
cases. This led to flakiness in the tests that used this method.
Switching to the cluster_con() method removes flakiness since it's
written more robustly.

Fixes scylladb/scylladb#17914

Closes scylladb/scylladb#17932
2024-03-21 11:17:22 +01:00
Asias He
9587352f13 repair: Invoke group0 read barrier in repair_tablets
This allows the repair master to see all previous metadata changes.

Refs #17658

Closes scylladb/scylladb#17942
2024-03-21 10:54:40 +01:00
Kamil Braun
4dfb7e3051 Merge 'storage_service::merge_topology_snapshot: handle big mutations' from Petr Gusev
The group0 state machine calls `merge_topology_snapshot` from
`transfer_snapshot`. It feeds it with `raft_topology_snapshot` returned
from `raft_pull_topology_snapshot`. This snapshot includes the entire
`system.cdc_generations_v3` table. It can be huge and break the
commitlog `max_record_size` limit.

The `system.cdc_generations_v3` is a single-partition table, so all the
data is contained in one mutation object. To fit the commitlog limit we
split this mutation into many smaller ones and apply them in separate
`database::apply` calls. That means we give up the atomicity guarantee,
but we actually don't need it for `system.cdc_generations_v3` and
`system.topology_requests`.

This PR fixes the dtest
`update_cluster_layout_tests.py::TestLargeScaleCluster::test_add_many_nodes_under_load`

Fixes scylladb/scylladb#17545

Closes scylladb/scylladb#17632

* github.com:scylladb/scylladb:
  test_cdc_generation_data: test snapshot transfer
  storage_service::merge_topology_snapshot: handle big cdc_generations_v3 mutations
  mutation: add split_mutation function
  storage_service::merge_topology_snapshot: fix indentation
2024-03-21 10:50:03 +01:00
Avi Kivity
628017c810 test: sstables::test_env: mock sstables_registry
sstables::test_env is intended for sstable unit tests, but to satisfy its
dependency of an sstables_registry we instantiate an entire database.

Remove the dependency by having a mock implementation of sstables_registry
and using that instead.

Closes scylladb/scylladb#17895
2024-03-21 10:19:46 +01:00
Tomasz Grabiec
baf12b0b2f test: tablets: Avoid infinite loop in rebalance_tablets()
If there is a bug in the tablet scheduler which makes it never
converge for a given state of topology, rebalance_tablets() will never
complete and will generate a huge amount of logs. This patch adds a
sanity limit so that we fail earlier.
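
A minimal sketch of such a guard, assuming the loop repeatedly asks a scheduler for a plan and applies it until the plan is empty. The `rebalance` helper, its callbacks and the limit of 1000 are hypothetical and not taken from the test code.

```python
def rebalance(make_plan, apply_plan, max_iterations=1000):
    """Apply migration plans until the scheduler returns an empty one.

    Returns the number of plans applied; raises if the limit is reached.
    """
    for applied in range(max_iterations):
        plan = make_plan()
        if not plan:
            return applied
        apply_plan(plan)
    # The scheduler never converged: fail loudly instead of looping forever.
    raise RuntimeError(
        f"rebalancing did not converge in {max_iterations} iterations"
    )
```

A scheduler bug then shows up as a single clear test failure rather than as a hung CI job with an ever-growing log.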

This was observed in one of the test_load_balancing_with_random_load runs in CI.

Fixes scylladb/scylladb#17894.

Closes scylladb/scylladb#17916
2024-03-21 10:19:46 +01:00
Kamil Braun
bc42a5a092 Merge 'make sure that address map entry is not dropped between join request placement and the request handling' from Gleb
The series marks nodes to be non expiring in the address map earlier, when
they are placed in the topology.

Fixes: scylladb/scylladb#16849

* 'gleb/16849-fix-v2' of github.com:scylladb/scylla-dev:
  test: add test to check that address cannot expire between join request placement and its processing
  topology_coordinator: set address map entry to nonexpiring when a node is added to the topology
  raft_group0: add modifiable_address_map() function
2024-03-21 10:19:46 +01:00
Kamil Braun
676af581d8 Merge 'cdc: should_propose_first_generation: get my_host_id from caller' from Benny Halevy
There is no need to map this node's inet_address to host_id. The
storage_service can easily just pass the local host_id. While at it, get
the other node's host_id directly from their endpoint_state instead of
looking it up yet again in the gossiper, using the nodes' address.

Refs #12283

Closes scylladb/scylladb#17919

* github.com:scylladb/scylladb:
  cdc: should_propose_first_generation: get my_host_id from caller
  storage_service: add my_host_id
2024-03-21 10:19:46 +01:00
Avi Kivity
43bcaeb87f Merge 'test: randomized_nemesis_test: add fmt::formatter for some types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* raft_call
* raft_read
* network_majority_grudge
* reconfiguration
* stop_crash
* operation::thread_id
* append_seq
* AppendReg::append
* AppendReg::ret
* operation::either_of<Ops...>
* operation::exceptional_result<Op>
* operation::completion<Op>
* operation::invocable<Op>

and drop their operator<<:s.

in which,

* `operator<<` for append_entry is never used. so it is removed.
* `operator<<` for `std::monostate` and `std::variant` are dropped. as we are now using their counterparts in {fmt}.
* stop_crash::result_type 's `fmt::formatter` is not added, as we cannot define a partial specialization of `fmt::formatter` for a nested class for a template class. we will tackle this struct in another change.

Refs #13245

Closes scylladb/scylladb#17884

* github.com:scylladb/scylladb:
  test: raft: generator: add fmt::formatter:s
  test: randomized_nemesis_test: add fmt::formatter for some types
  test: randomized_nemesis_test: add fmt::formatter for seastar::timed_out_error
  raft: add fmt::formatter for error classes
2024-03-21 10:19:46 +01:00
Kefu Chai
6d77283941 cdc: add fmt::formatter for exception types in data_dictionary.hh
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<>` is added for following classes for
backward compatibility with {fmt} < 10:

* `data_dictionary::no_such_keyspace`
* `data_dictionary::no_such_column_family`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-21 13:26:01 +08:00
Kefu Chai
a58be49abf utils: add fmt::formatter for utils::bad_exception_container_access
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<utils::bad_exception_container_access>` is
added for backward compatibility with {fmt} < 10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-21 12:48:19 +08:00
Kefu Chai
0d6bff0f56 sstables: add fmt::formatter for classes derived from sstables::malformed_sstable_exception
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<T>` is added for classes derived from
`malformed_sstable_exception`, where `T` is the class type derived from
`malformed_sstable_exception`.

this change is implemented to be backward compatible with {fmt} < 10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-21 12:48:19 +08:00
Kefu Chai
0609cd676f exceptions: add fmt::formatter for classes derived from cassandra_exception
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<T>` is added for classes derived from
`cassandra_exception`, where `T` is the class type derived from
`cassandra_exception`.

this change is implemented to be backward compatible with {fmt} < 10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-21 12:48:19 +08:00
Kefu Chai
f5e1f0ccc7 cdc: add fmt::formatter for cdc::no_generation_data_exception
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<cdc::no_generation_data_exception>` is
added for backward compatibility with {fmt} < 10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-21 12:48:19 +08:00
Petr Gusev
740b240e9d test_cdc_generation_data: test snapshot transfer
The test only looked at the initial cdc_generation
generation. It made the changes bigger to go
past the raft max_command_size limit.
It then made sure this large mutation set is saved
in several raft commands.

In this commit we enhance the test to check that the
mutations are properly handled during snapshot transfer.
The problem is that the entire system.cdc_generations_v3
table is read into the topology_snapshot and its total
size can exceed the commitlog max_record_size limit.

We need a separate injection since the compaction
could nullify the effects of the previous injection.

The test fails without the fix from the previous commit.
2024-03-20 22:40:03 +04:00
Petr Gusev
276d58114d storage_service::merge_topology_snapshot: handle big cdc_generations_v3 mutations
The group0 state machine calls merge_topology_snapshot
from transfer_snapshot. It feeds it with raft_topology_snapshot
returned from raft_pull_topology_snapshot. This snapshot
includes the entire system.cdc_generations_v3 table.
It can be huge and break the commitlog max_record_size limit.

The system.cdc_generations_v3 is a single-partition table,
so all the data is contained in one mutation object. To
fit the commitlog limit we split this mutation into several
smaller ones and apply them in separate database::apply calls.
That means we give up the atomicity guarantee, but we
actually don't need it for system.cdc_generations_v3.
The cdc_generations_v3 data is not used in any way until
it's referenced from the topology table. By applying the
cdc_generations_v3 mutations before topology mutations
we ensure that the lack of atomicity isn't a problem here.

The database::apply method takes frozen_mutation parameter by
const reference, so we need to keep them alive until
all the futures are complete.

fixes #17545
2024-03-20 22:40:03 +04:00
Petr Gusev
db1afa0aba mutation: add split_mutation function
The function splits the source mutation into multiple
mutations so that their size does not exceed the
max_size limit. The size of a mutation is calculated
as the sum of the memory_usage() of its constituent
mutation_fragments.

The implementation is taken from view_updating_consumer.
We use mutation_rebuilder_v2 to reconstruct mutations from
a stream of mutation fragments and recreate the output
mutation whenever we reach the limit.
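
The splitting idea can be sketched on plain Python objects. This is not the C++ implementation: the `split_by_size` helper is hypothetical, `len` stands in for `memory_usage()`, and flushing a batch just before it would exceed the limit is an assumption about the exact boundary behavior.

```python
def split_by_size(fragments, max_size, size_of=len):
    """Group fragments, in order, into batches whose total size stays
    within max_size.

    A single fragment larger than max_size still gets a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for fragment in fragments:
        size = size_of(fragment)
        if current and current_size + size > max_size:
            # Adding this fragment would cross the limit: start a new batch.
            batches.append(current)
            current, current_size = [], 0
        current.append(fragment)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each batch plays the role of one output mutation, so a caller can apply the batches one by one and keep every write under a size limit.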

We'll need this function in the next commit.
2024-03-20 22:39:51 +04:00
Petr Gusev
d07e0efdd8 storage_service::merge_topology_snapshot: fix indentation
It was three spaces, should be four.
2024-03-20 22:30:48 +04:00
Kefu Chai
61424b615c test: raft: generator: add fmt::formatter:s
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* operation::either_of<Ops...>
* operation::exceptional_result<Op>
* operation::completion<Op>
* operation::invocable<Op>

and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-20 21:01:29 +08:00
Kefu Chai
72899f573e test: randomized_nemesis_test: add fmt::formatter for some types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* raft_call
* raft_read
* network_majority_grudge
* reconfiguration
* stop_crash
* operation::thread_id
* append_seq
* append_entry
* AppendReg::append
* AppendReg::ret

and drop their operator<<:s.

in which,

* `operator<<` for `std::monostate` and `std::variant` are dropped.
  as we are now using their counterparts in {fmt}.
* stop_crash::result_type 's `fmt::formatter` is not added, as we
  cannot define a partial specialization of `fmt::formatter` for
  a nested class for a template class. we will tackle this struct
  in another change.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-20 21:01:29 +08:00
Kefu Chai
97b203b1af test: randomized_nemesis_test: add fmt::formatter for seastar::timed_out_error
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatter for `seastar::timed_out_error`,
which will be used by the `fmt::formatter` for `std::variant<...>`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-20 21:01:29 +08:00
Kefu Chai
50637964ed raft: add fmt::formatter for error classes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatter for classes derived from
`raft::error`. since {fmt} v10 defines the formatter for all classes
derived from `std::exception`, the definition is provided only when
the tree is compiled with {fmt} < 10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-20 21:01:29 +08:00
Pavel Emelyanov
21a5911e60 Merge 'db/virtual_tables: make token_ring_table tablet aware' from Botond Dénes
The token ring table is a virtual table (`system.token_ring`), which contains the ring information for all keyspaces in the system. This is essentially an alternative to `nodetool describering`, but since it is a virtual table, it allows for all the usual filtering/aggregation/etc. that CQL supports.
Up until now, this table only supported keyspaces which use vnodes. This PR adds support for tablet keyspaces. To accommodate these keyspaces a new `table_name` column is added, which is set to `ALL` for vnodes keyspaces. For tablet keyspaces, this contains the name of the table.
Simple sanity tests are added for this virtual table (it had none).

Fixes: #16850

Closes scylladb/scylladb#17351

* github.com:scylladb/scylladb:
  test/cql-pytest: test_virtual_tables: add test for token_ring table
  db/virtual_tables: token_ring_table: add tablet support
  db/virtual_tables: token_ring_table: add table_name column
  db/virtual_tables: token_ring_table: extract ring emit
  service/storage_service: describe_ring_for_table(): use topology to map hostid to ip
2024-03-20 14:05:49 +03:00
Benny Halevy
fceb1183d3 cdc: should_propose_first_generation: get my_host_id from caller
There is no need to map this node's inet_address to host_id.
The storage_service can easily just pass the local host_id.
While at it, get the other node's host_id directly
from their endpoint_state instead of looking it up
yet again in the gossiper, using the nodes' address.

Refs #12283

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-20 12:53:49 +02:00
Benny Halevy
37adcd3ecf storage_service: add my_host_id
Shorthand for getting this node's host_id
from token_metadata.topology, similar to the
`get_broadcast_address` helper.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-20 12:53:49 +02:00
Mikołaj Grzebieluch
b4144d14c6 test.py: adjust the test for topology upgrade to write to and read from CDC tables
In topology on raft, management of CDC generations is moved to the topology coordinator.
We need to verify that CDC keeps working correctly during the upgrade to topology on raft.

A similar change will be made in the topology recovery test. It will reuse
the `start_writes_to_cdc_table` function.

Ref #17409

Closes scylladb/scylladb#17828
2024-03-20 11:15:02 +01:00
Yaron Kaikov
d859067486 [action sync labels] improve pr search when labeling an issue
This PR contains a few fixes and improvements identified during the
https://github.com/scylladb/scylladb/issues/15902 label addition.

When we add a label to an issue, we go through all PRs.
1) Set the PR base to `master` (release PRs are not relevant)
2) Since each issue has only one PR, end the search after a
   match is found
3) Make sure to skip PRs with an empty body (mainly debug ones)
4) Set the backport label prefix to `backport/`

Closes scylladb/scylladb#17912
2024-03-20 12:14:42 +02:00
David Garcia
559dc9bb27 docs: Implement relative link support for configuration properties
Introduces relative link support for individual properties listed on the configuration properties page.  For instance, to link to a property from a different document, use the syntax :ref:`memtable_flush_static_shares <confprop_memtable_flush_static_shares>`.

Additionally, it also adds support for linking groups. For example, :ref:`Ungrouped properties <confgroup_ungrouped_properties>`.

Closes scylladb/scylladb#17753
2024-03-20 11:39:30 +02:00
Gleb Natapov
2b11842cb4 test: add test to check that address cannot expire between join request placement and its processing 2024-03-20 11:05:31 +02:00
Kefu Chai
2479328e3b Update seastar submodule
> Revert "build: do not provide zlib as an ingredient"
> Fix reference to sstring type in tutorial about concurrency in coroutines
> Merge 'Adding a Metrics tester app' from Amnon Heiman
> cooking.sh: do not quote backtick in here document

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17887
2024-03-20 09:18:35 +02:00
Kefu Chai
432c000dfa ./: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17888
2024-03-20 09:16:46 +02:00
Raphael S. Carvalho
6115c113fe sstables_loader: Don't discard sstable that is not fully exhausted
Affects load-and-stream for tablets only.

The intention is that only this loop is responsible for detecting
exhausted sstables and then discarding them for next iterations:
        while (sstable_it != _sstables.rend() && exhausted(*sstable_it)) {
            sstable_it++;
        }

But the loop which consumes non exhausted sstables, on behalf of
each tablet, was incorrectly advancing the iterator, even though the
sstable wasn't considered exhausted.

Fixes #17733.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17899
2024-03-20 09:11:59 +02:00
Yaron Kaikov
0cbe5f1aa8 [action] add Fixes validation in backport PR
When we open a backport PR, we should make sure the patch contains a ref to the issue it is supposed to fix, so that we have more accurate backport information

This action will only be triggered when the base branch is `branch-*`

If `Fixes` are missing, this action will fail and notify the author.

Ref: https://github.com/scylladb/scylla-pkg/issues/3539

Closes scylladb/scylladb#17897
2024-03-20 08:55:36 +02:00
Nadav Har'El
8df2ea3f95 cql: don't crash when creating a view during a truncate
The test dtest materialized_views_test.py::TestMaterializedViews::
test_mv_populating_from_existing_data_during_truncate reproduces an
assertion failure, and crash, while doing a CREATE MATERIALIZED VIEW
during a TRUNCATE operation.

This patch fixes the crash by removing the assert() call for a view
(replacing it by a warning message) - we'll explain below why this is fine.
Also, for base tables, we change the assertion to an on_internal_error
(Refs #7871).
This makes the test stop crashing Scylla, but it still fails due to
issue #17635.

Let's explain the crash, and the fix:

The test starts TRUNCATE on a table that doesn't yet have a view.
truncate_table_on_all_shards() begins by disabling compaction on
the table and all its views (of which there are none, at this
point). At this point, the test creates a new view on this table.
The new view has, by default, compaction enabled. Later, TRUNCATE
calls discard_sstables() on this new view, asserts that it has
compaction disabled - and this assertion fails.

The fix in this patch is to not do the assert() for views. In other words,
we acknowledge that in this use case, the view *will* have compactions
enabled while being truncated. I claim that this is "good enough", if we
remember *why* we disable compaction in the first place: It's important
to disable compaction while truncating because truncating during compaction
can lead us to data resurrection when the old sstable is deleted during
truncation but the result of the compaction is written back. True,
this can now happen in a new view (a view created *DURING* the
truncation). But I claim that worse things can happen for this
new view: Notably, we may truncate a view and then the ongoing
view building (which happens in a new view) might copy data from
the base to the view and only then truncate the base - ending up
with an empty base and non-empty view. This problem - issue #17635 -
is more likely, and more serious, than the compaction problem, so
will need to be solved in a separate patch.

Fixes #17543.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17634
2024-03-20 08:54:39 +02:00
Raphael S. Carvalho
d5a5005afa sstables: Fix clone semantics for runs in partitioned_sstable_set
When an sstable set is cloned, we don't want a change in the cloned set
to propagate to the original one.

It happens today with partitioned_sstable_set::_all_runs, because
sets are sharing ownership of runs, which is wrong.

Let's not violate clone semantics by copying all_runs when cloning.

Doesn't affect data correctness as readers work directly with
sstables, which are properly cloned. Can result in a crash in ICS
when it is estimating pending tasks, but should be very rare in
practice.
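
The difference between sharing and copying can be sketched with a toy class. `run_set` and its methods are hypothetical and only model the ownership problem, not the real partitioned_sstable_set:

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for a set that tracks runs by name.
class run_set {
    std::shared_ptr<std::set<std::string>> _all_runs =
        std::make_shared<std::set<std::string>>();
public:
    void insert(const std::string& run) { _all_runs->insert(run); }
    bool contains(const std::string& run) const { return _all_runs->count(run) != 0; }

    // Problematic clone: both objects keep pointing at the same container,
    // so a change made through the clone is visible in the original.
    run_set clone_sharing() const { return *this; }

    // Clone with proper semantics: the container itself is copied.
    run_set clone_copying() const {
        run_set copy;
        copy._all_runs = std::make_shared<std::set<std::string>>(*_all_runs);
        return copy;
    }
};
```

Inserting into the result of `clone_sharing()` also changes the original, while inserting into the result of `clone_copying()` leaves the original untouched.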

Fixes #17878.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17879
2024-03-20 08:41:32 +02:00
Botond Dénes
c2425ca135 tools/scylla-nodetool: add --rest-api-port option
This option is an alternative to --port|-p and takes precedence over it.
This is meant to aid the switch from the legacy nodetool to the native
one. Users of the legacy nodetool pass the port of JMX to --port. We
need a way to provide both the JMX port (via --port) and also the REST
API port, which only the native nodetool will interpret. So we add this
new --rest-api-port, which when provided, overwrites the --port|-p
option. To ensure the legacy nodetool doesn't try to interpret this,
this option can also be provided as -Dcom.scylladb.apiPort (which is
substituted to --rest-api-port behind the scenes).
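
A minimal sketch of the precedence and substitution described above. The helper names, the `=<port>` value syntax of the `-D` form, and the fallback port of 10000 are assumptions made for the example, not taken from the patch:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Rewrites the JVM-style spelling into the native option; other arguments
// are returned unchanged. The "=<port>" value syntax is an assumption.
std::string substitute_api_port_arg(const std::string& arg) {
    static const std::string jvm_prefix = "-Dcom.scylladb.apiPort=";
    if (arg.compare(0, jvm_prefix.size(), jvm_prefix) == 0) {
        return "--rest-api-port=" + arg.substr(jvm_prefix.size());
    }
    return arg;
}

struct port_options {
    std::optional<std::uint16_t> port;           // --port|-p
    std::optional<std::uint16_t> rest_api_port;  // --rest-api-port
};

// --rest-api-port wins over --port|-p; the fallback value is an assumption.
std::uint16_t effective_api_port(const port_options& opts, std::uint16_t fallback = 10000) {
    if (opts.rest_api_port) {
        return *opts.rest_api_port;
    }
    if (opts.port) {
        return *opts.port;
    }
    return fallback;
}
```

With both options given, the value of `--rest-api-port` is used, so a legacy script can keep passing the JMX port via `--port`.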
2024-03-20 02:11:47 -04:00
Botond Dénes
a85ec6fc60 tools/scylla-nodetool: ignore JVM args
Legacy scripts and tests for nodetool might pass JVM args like
-Dcom.sun.jndi.rmiURLParsing=legacy. Ignore these, by dropping anything
that starts with -D from the command line args.
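
The filtering amounts to something like the sketch below; `drop_jvm_args()` is a hypothetical helper name, not the function used in the patch:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the arguments with everything that starts with "-D" removed.
std::vector<std::string> drop_jvm_args(const std::vector<std::string>& args) {
    std::vector<std::string> kept;
    for (const auto& arg : args) {
        if (arg.compare(0, 2, "-D") == 0) {
            continue;  // JVM-style argument, ignored
        }
        kept.push_back(arg);
    }
    return kept;
}
```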
2024-03-20 02:11:47 -04:00
Botond Dénes
12516b0861 tools/utils: make finding the operation command line option more flexible
Currently all scylla-tools assume that the operation/command is in
argv[1]. This is not very flexible, because most programs allow global
options (that are not dependent on the current operation/command) to be
passed before the operation name on the command line. Notably C*'s
nodetool is one such program and indeed scripts and tests using nodetool
do utilize this.
This patch makes this more flexible. Instead of looking at argv[1], do
an initial option parsing with boost::program_options to locate the
operation parameter. This initial parser knows about the global options,
and the operation positional argument. It allows for unrecognized
positional and non-positional arguments, but only after the command.
With this, any combination of global options + operation is allowed, in
any order.
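
The idea of locating the operation can be illustrated without boost::program_options by a hand-rolled scan. This is a simplification, not what the patch does: `find_operation()` is a hypothetical helper, the caller supplies the set of global options that take a value, and unknown options before the operation are treated as value-less flags:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// `args` excludes argv[0]. Returns the first positional argument, skipping
// global options and their values, or nullopt if there is none.
std::optional<std::string> find_operation(const std::vector<std::string>& args,
                                          const std::set<std::string>& opts_with_value) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg.compare(0, 1, "-") != 0) {
            return arg;  // first positional argument is the operation
        }
        if (arg.find('=') != std::string::npos) {
            continue;  // "--opt=value" is a single token
        }
        if (opts_with_value.count(arg) != 0) {
            ++i;  // "--opt value": skip the value as well
        }
    }
    return std::nullopt;
}
```

For example, with `-h` and `-p` registered as options taking a value, both `-h 127.0.0.1 -p 7199 status` and `status -h 127.0.0.1` resolve to the operation `status`.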
2024-03-20 02:11:47 -04:00
Botond Dénes
7ae98c586a tools/utils: get_selected_operation(): remove alias param
This method has a single caller, who always passes "operation". Just
hard-code this into the method, no need to keep a param for it.
2024-03-20 02:11:47 -04:00
Botond Dénes
28e7eecf0b tools: add constant with current help command-line arguments
Unfortunately, we have code in scylla-nodetool.cc which needs to know
which help options are currently available. Soon, there will be more
code like this in tools/utils.cc, so centralize this list in a const
static tool_app_template member.
2024-03-20 02:11:47 -04:00
Petr Gusev
5db6b8b3c2 error_injection: move api registration to set_server_init
The set_server_done function is called only
when a node is fully initialized. To allow error
injection to be used during initialization we
move the handler registration to set_server_init,
which is called as soon as the api http server
is started.
2024-03-19 20:18:29 +04:00
Petr Gusev
e4318e139d error_injection: add inject_parameter method
In this commit we extend the error_injector
with a new method inject_parameter. It allows
passing parameters from tests to scylla, e.g. to
lower timeouts or limits. A typical use case is
described in scylladb/scylladb#15571.

It's logically the same as inject_with_handler,
whose lambda reads the parameter named 'value'.
The only difference is that inject_parameter
doesn't return a future; it just reads the
parameter from the injection shared_data.
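
To make the idea concrete, here is a toy model of such a lookup. `toy_error_injection` is a hypothetical class written for this note; it only mirrors the described behavior (read the parameter named 'value' synchronously, if the injection is enabled) and is not the real error_injection API:

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>

class toy_error_injection {
    using params_map = std::unordered_map<std::string, std::string>;
    // injection name -> (parameter name -> value as a string)
    std::unordered_map<std::string, params_map> _enabled;
public:
    void enable(const std::string& injection, params_map params = {}) {
        _enabled[injection] = std::move(params);
    }

    // Returns the parameter named "value" of an enabled injection, parsed as T,
    // or nullopt if the injection is disabled, has no such parameter, or the
    // value cannot be parsed.
    template <typename T>
    std::optional<T> inject_parameter(const std::string& injection) const {
        auto it = _enabled.find(injection);
        if (it == _enabled.end()) {
            return std::nullopt;
        }
        auto p = it->second.find("value");
        if (p == it->second.end()) {
            return std::nullopt;
        }
        std::istringstream in(p->second);
        T result{};
        if (!(in >> result)) {
            return std::nullopt;
        }
        return result;
    }
};
```

Production code would then fall back to its normal setting when nothing is injected, e.g. `inj.inject_parameter<int>("some_timeout").value_or(30)`, where the injection name and the default are made up for the example.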
2024-03-19 20:18:23 +04:00
Petr Gusev
460567c4fd error_injection: move injection_name string into injection_shared_data
In a subsequent commit we'll need the injection_name from inside
injection_shared_data, so in this commit we move it there.
Additionally, we fix the todo about switching the injections dictionary
from map to unordered_set, now unordered_map contains
string_views, pointing to injection_name inside
injection_shared_data.
2024-03-19 20:17:02 +04:00
Petr Gusev
49a4220fea error_injection: pass injection parameters at startup
Injection parameters can be used in the lambda passed to
inject_with_handler method to take some values from
the test. However, there was no way to set values to these
parameters on node startup, only through
the error injection REST api. Therefore, we couldn't rely
on this when inject_with_handler is used during
node startup, it could trigger before we call the api
from the test.

In this commit we solve this problem by allowing these
parameters to be assigned through scylla.yaml config.

The defer.hh header was added to error_injection.hh to fix
compilation after adding error_injection.hh to config.hh,
defer function is used in error_injection.hh.
2024-03-19 20:17:02 +04:00
Andrei Chekun
b52f79b1ce Fix leaking file descriptors in test.py
Fixes #17569

Tests are not closing file descriptors after they finish. This makes it impossible to continue running tests, since the default limit on open files in Linux is 1024. The issue is easy to reproduce with the following command:
```
$ ./test.py --mode debug test_native_transport --repeat 1500
```
After the fix is applied, all tests pass with the following command:
```
$ ./test.py --mode debug test_native_transport --repeat 10000
```

Closes scylladb/scylladb#17798
2024-03-19 14:59:14 +01:00
Piotr Dulikowski
70cb1dc8fe doc: describe upgrade and recovery for raft topology
Document the manual upgrade procedure that is required to enable
consistent cluster management in clusters that were upgraded from an
older version to ScyllaDB Open Source 6.0. This instruction is placed in
previously placeholder "Enable Raft-based Topology" page which is a part
of the upgrade instructions to ScyllaDB Open Source 6.0.

Add references to the new description in the "Raft Consensus Algorithm
in ScyllaDB" document in relevant places.

Extend the "Handling Node Failures" document so that it mentions steps
required during recovery of a ScyllaDB cluster running version 6.0.

Fixes: scylladb/scylladb#17341

Closes scylladb/scylladb#17624
2024-03-19 14:59:14 +01:00
Gleb Natapov
fde3068530 topology_coordinator: set address map entry to nonexpiring when a node is added to the topology
Currently a node's address is set to nonexpiring in the address map when
the node is added to group0, but the node is added to the topology earlier
(during the join request) and the cluster must be able to communicate
with it (potentially) much later when the request will be processed.
The patch marks nodes that are in the topology, but not yet in group0 as
non expiring, so they will not be dropped from address map until their
join request is processed.

Fixes: scylladb/scylladb#16849
2024-03-19 13:35:19 +02:00
Gleb Natapov
9651ae875f raft_group0: add modifiable_address_map() function
Provide access to non const address_map. We will need it later.
2024-03-19 13:34:41 +02:00
Yaron Kaikov
ad76f0325e [action] Sync labels from an Issue to linked PR
After merging https://github.com/scylladb/scylladb/pull/17365, all backport labels should be added to the PR (previously, we added backport labels to the issues).

Adding a GitHub action which will be triggered in the following conditions only:

1) The base branch is `master` or `next`
2) Pull request events:
- opened: For every new PR that someone opens, we will sync all labels from the linked issue (if available)
- labeled: This rule only applies to labels with the `backport/` prefix. When we add a new backport label, we will update the relevant issue or PR to keep them both in sync
- unlabeled: Same as `labeled`, only applies to the `backport/` prefix. When we remove a backport label, we will update the relevant issue or PR

Closes scylladb/scylladb#17715
2024-03-19 09:17:07 +02:00
Avi Kivity
e48eb76f61 sstables_manager: decouple from system_keyspace
sstables_manager now depends on system_keyspace for access to the
system.sstables table, needed by object storage. This violates
modularity, since sstables_manager is a relatively low-level leaf
module while system_keyspace integrates large parts of the system
(including, indirectly, sstables_manager).

One area where this is grating is sstables::test_env, which has
to include the much higher level cql_test_env to accommodate it.

Fix this by having sstables_manager expose its dependency on
system_keyspace as an interface, sstables_registry, and have
system_keyspace implement the glue logic in
system_keyspace_sstables_manager.
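
The shape of this decoupling can be sketched as below. All class and method names are hypothetical (the real sstables_registry interface is not shown in this message); the point is only that the low-level module owns the interface and a higher layer, or a test, plugs in the implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Interface owned by the low-level module. Method names are hypothetical.
class toy_sstables_registry {
public:
    virtual ~toy_sstables_registry() = default;
    virtual void create_entry(const std::string& location, int generation) = 0;
    virtual std::vector<int> list(const std::string& location) const = 0;
};

// Low-level module: depends only on the interface above.
class toy_sstables_manager {
    toy_sstables_registry& _registry;
public:
    explicit toy_sstables_manager(toy_sstables_registry& registry) : _registry(registry) {}
    void add_sstable(const std::string& location, int generation) {
        _registry.create_entry(location, generation);
    }
    std::size_t count(const std::string& location) const {
        return _registry.list(location).size();
    }
};

// Glue provided by a higher layer (or by a test), here backed by a vector
// rather than a system table.
class in_memory_registry final : public toy_sstables_registry {
    std::vector<std::pair<std::string, int>> _entries;
public:
    void create_entry(const std::string& location, int generation) override {
        _entries.emplace_back(location, generation);
    }
    std::vector<int> list(const std::string& location) const override {
        std::vector<int> generations;
        for (const auto& [loc, gen] : _entries) {
            if (loc == location) {
                generations.push_back(gen);
            }
        }
        return generations;
    }
};
```

A low-level test environment can then construct the manager with a trivial registry instead of pulling in a full CQL environment.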

Closes scylladb/scylladb#17868
2024-03-18 20:38:07 +03:00
Anna Stuchlik
a13694daea doc: fix the image upgrade page
This commit updates the Upgrade ScyllaDB Image page.

- It removes the incorrect information that updating underlying OS packages is mandatory.
- It adds information about the extended procedure for non-official images.

Closes scylladb/scylladb#17867
2024-03-18 18:27:46 +02:00
Gleb Natapov
af218d0063 raft_group0_client: assert that hold_read_apply_mutex is called on shard 0
group0 operations are valid on shard 0 only. Assert that. We already do
that in the version of the function that gets abort source.

Message-ID: <ZeCti70vrd7UFNim@scylladb.com>
2024-03-18 16:20:41 +01:00
Pavel Emelyanov
a8f48e0f6b test/boost/tablets: Use verbose BOOST_REQUIRE checkers
Lots of BOOST_REQUIREs in this test require some integers to be in some
eq/gt/le relation to each other, and one place compares rack names
as strings. Using more verbose boost checkers is preferred in such cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17866
2024-03-18 17:09:02 +02:00
Botond Dénes
270d01f16a Merge 'build: cmake: put server deb packages under build/dist/$<CONFIG>/debian' from Kefu Chai
this change is a follow up of ca7f7bf8e2, which changed the output path to build/$<CONFIG>/debian. but what dist/docker/debian/build_docker.sh expects is `build/dist/$config/debian/*.deb`, where `$config` is the normalized mode, when the debian packages are built using CMake generated rules, `$mode` is CMake configuration name, i.e., `$<CONFIG>`. so, ca7f7bf8e2 made a mistake, as it does not match the expectation of `build_docker.sh`.

in this change, this issue is addressed. so we use the same path in both `dist/CMakeLists.txt` and `dist/docker/debian/build_docker.sh`.

Closes scylladb/scylladb#17848

* github.com:scylladb/scylladb:
  build: cmake: add dist-* targets to the default build target
  build: cmake: put server deb packages under build/dist/$<CONFIG>/debian
2024-03-18 16:18:35 +02:00
Avi Kivity
72bbe75d5b Merge 'Fix node replace with tablets for RF=N' from Tomasz Grabiec
This PR fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because tablet replica allocation for
rebuild will not be able to find a viable destination, as the replacing node
is not considered to be a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.

The solution taken here is to not drain tablet replicas from replaced
node during topology request but leave it to happen later after the
replaced node is in left state and replacing node is in normal state.

The replacing node waits for this draining to be complete on boot
before the node is considered booted.

Fixes https://github.com/scylladb/scylladb/issues/17025

Nodes in the left state will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those nodes' locations (dc, rack) for two reasons:

 1) algorithms which work with replica sets filter nodes based on their location. For example materialized views code which pairs base replicas with view replicas filters by datacenter first.

 2) tablet scheduler needs to identify each node's location in order to make decisions about new replica placement.

It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.

Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacenter endpoints excludes them).

In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet replica sets.

Currently left nodes are never removed from topology, so will
accumulate in memory. We could garbage-collect them from topology
coordinator if a left node is absent in any replica set. That means we
need a new state - left_for_real.

Closes scylladb/scylladb#17388

* github.com:scylladb/scylladb:
  test: py: Add test for view replica pairing after replace
  raft, api: Add RESTful API to query current leader of a raft group
  test: test_tablets_removenode: Verify replacing when there is no spare node
  doc: topology-on-raft: Document replace behavior with tablets
  tablets, raft topology: Rebuild tablets after replacing node is normal
  tablets: load_balancer: Access node attributes via node struct
  tablets: load_balancer: Extract ensure_node()
  mv: Switch to using host_id-based replica set
  effective_replication_map: Introduce host_id-based get_replicas()
  raft topology: Keep nodes in the left state to topology
  tablets: Introduce read_required_hosts()
2024-03-18 16:16:08 +02:00
Kefu Chai
d1c35f943d test: unit: add fmt::formatter for test_data in tests
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* test_data in two different tests
* row_cache_stress_test::reader_id

and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17861
2024-03-18 15:35:28 +02:00
Kefu Chai
de6803de92 build: cmake: use --ld-path for specifying linker for clang
Clang > 12 starts to complain like
```
warning:  '-fuse-ld=' taking a path is deprecated; use '--ld-path=' instead [-Wfuse-ld-path]'
```
this option is not supported by GCC yet. also instead of using
the generic driver's name, use the specific name. otherwise ld
fails like
```
lld is a generic driver.
Invoke ld.lld (Unix), ld64.lld (macOS), lld-link (Windows), wasm-ld (WebAssembly) instead
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17825
2024-03-18 14:49:11 +02:00
Pavel Emelyanov
933b346166 test/tablets: Add test to check how ALTER changes RF (in one DC)
For now test is incomplete in several ways

1. It xfails, until #17116
2. It doesn't rebuild/repair tablets
3. It doesn't check that tablet data actually exists on replicas

refs: #17575

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17808
2024-03-18 14:47:57 +02:00
Yaron Kaikov
6406d3083c [mergify] set draft PR when conflicts
When Mergify opens a backport PR and identifies conflicts, it adds the
`conflicts` label. Since GitHub can't identify conflicts in a PR, we set
a rule to move the PR to draft, so that CI is not triggered.

Once the conflicts are resolved, the developer should mark the PR as `ready
for review` (i.e. not a draft), and then CI will be triggered.

`conflict` label can also be removed

Closes scylladb/scylladb#17834
2024-03-18 14:45:08 +02:00
Beni Peled
bddac3279e Skip the backport-label workflow for draft pull requests
It's unnecessary (and annoying) for this workflow to run and fail
against PRs in draft mode.

Closes scylladb/scylladb#17864
2024-03-18 14:42:55 +02:00
Wojciech Mitros
efcb718e0a mv: adjust memory tracking of single view updates within a batch
Currently, when dividing memory tracked for a batch of updates
we do not take into account the overhead that we have for processing
every update. This patch adds the overhead for single updates
and joins the memory calculation path for batches and their parts
so that both use the same overhead.

Fixes #17854

Closes scylladb/scylladb#17855
2024-03-18 14:31:54 +02:00
Kefu Chai
d57a82c156 build: cmake: add dist-* targets to the default build target
also, add a target of `dist-server`, which mirrors the structure
of the targets created by `configure.py`, and it is consistent
with the ones defined by `build_submodule()`.

so that they are built when our CI runs `ninja -C $build`. CI
expects all these rpm and deb packages to be built when
`ninja -C $build` finishes. so that it can continue with
building the container image. let's make it happen. so that
the CMake-based rules can work better with CI.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-18 20:02:43 +08:00
Raphael S. Carvalho
2c9b13d2d1 compaction: Check for key presence in memtable when calculating max purgeable timestamp
It was observed that some use cases might append old data constantly to
memtable, blocking GC of expired tombstones.

That's because timestamp of memtable is unconditionally used for
calculating max purgeable, even when the memtable doesn't contain the
key of the tombstone we're trying to GC.

The idea is to treat memtable as we treat L0 sstables, i.e. it will
only prevent GC if it contains data that is possibly shadowed by the
expired tombstone (after checking for key presence and timestamp).

Memtable will usually have a small subset of keys in largest tier,
so after this change, a large fraction of keys containing expired
tombstones can be GCed when memtable contains old data.
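
A simplified model of the rule, under the assumption that every data source (memtable or sstable outside the compaction) is reduced to a minimum timestamp and a key set. `toy_data_source`, `toy_max_purgeable()` and `can_purge()` are hypothetical names, and the exact key lookup stands in for the real key-presence check:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a memtable or an sstable not being compacted.
struct toy_data_source {
    std::int64_t min_timestamp;
    std::set<std::string> keys;
};

// Only sources that contain the key can lower the max purgeable timestamp.
std::int64_t toy_max_purgeable(const std::vector<toy_data_source>& sources,
                               const std::string& key) {
    std::int64_t result = std::numeric_limits<std::int64_t>::max();
    for (const auto& src : sources) {
        if (src.keys.count(key) != 0) {
            result = std::min(result, src.min_timestamp);
        }
    }
    return result;
}

// An expired tombstone may be dropped only if no remaining data for its key
// is as old as, or older than, the tombstone.
bool can_purge(std::int64_t tombstone_timestamp, std::int64_t max_purgeable) {
    return tombstone_timestamp < max_purgeable;
}
```

In this model, a memtable holding old data for some other key no longer blocks purging a tombstone for the key at hand.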

Fixes #17599.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17835
2024-03-18 13:37:44 +02:00
Benny Halevy
2c0b1d1fa7 compaction: get_max_purgeable_timestamp: optimize sstable filtering by min_timestamp
There is no point in checking `sst->filter_has_key(*hk)`
if the sstable contains no data older than the running
minimum timestamp, since even if it matches, it won't change
the minimum.
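
The short-circuit can be sketched as follows. The types and names are hypothetical, and a set lookup stands in for `filter_has_key()`. The sketch counts the key checks to show how many are skipped:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an sstable: minimum timestamp plus its keys.
struct toy_sstable_info {
    std::int64_t min_timestamp;
    std::set<std::string> keys;
};

struct purge_result {
    std::int64_t max_purgeable;
    int key_checks;  // how many (comparatively expensive) key lookups were made
};

purge_result max_purgeable_with_filtering(const std::vector<toy_sstable_info>& sstables,
                                          const std::string& key) {
    purge_result r{std::numeric_limits<std::int64_t>::max(), 0};
    for (const auto& sst : sstables) {
        // Even if this sstable contained the key, it could not lower the
        // running minimum, so the key check is skipped altogether.
        if (sst.min_timestamp >= r.max_purgeable) {
            continue;
        }
        ++r.key_checks;
        if (sst.keys.count(key) != 0) {
            r.max_purgeable = sst.min_timestamp;
        }
    }
    return r;
}
```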

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17839
2024-03-18 13:26:49 +02:00
Avi Kivity
ed211cd0bf sstables: partition_index_cache: reindent
Fix up after e120ba3514.

Closes scylladb/scylladb#17847
2024-03-18 13:23:21 +02:00
Andrei Chekun
b6edf056ea Add sanity tests for multi dc
Fix writing cassandra-rackdc.properties with correct format data instead of yaml
Add a parameter to overwrite RF for specific DC
Add the possibility to connect cql to the specific node

In this PR, 4 tests were added to test multi-DC functionality. One comes from the initial commit where the multi-DC possibility was introduced; however, that test was not committed at the time. The other three are migrations from dtest, where they will later be deleted. To be able to execute the migrated tests, additional functionality is added: the ability to connect cql to a specific node in the cluster instead of using a pooled connection, and the possibility to overwrite the replication factor for a specific DC. To be able to use multi-DC in test.py, the issue with the incorrect format of the properties file is fixed in this PR.

Closes scylladb/scylladb#17503
2024-03-18 13:00:36 +02:00
Nadav Har'El
680e37c4af Merge 'schema_tables: unfreeze frozen_mutation:s gently' from Avi Kivity
With large schemas, unfreezing can stall, especially as it requires
a lot of memory. Switch to a gentle version that will not stall.

As a preparation step, we add unfreeze_gently() for a span of mutations.

Fixes #17841

Closes scylladb/scylladb#17842

* github.com:scylladb/scylladb:
  schema_tables: unfreeze frozen_mutation:s gently
  frozen_mutation: add unfreeze_gently(span<frozen_mutation>)
2024-03-18 12:56:44 +02:00
Kefu Chai
fe28aac440 test/perf: add fmt::formatter for perf_result_with_aio_writes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `perf_result_with_aio_writes`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17849
2024-03-18 12:53:39 +02:00
Botond Dénes
a4e8bea679 tools/scylla-nodetool: status: handle missing host_id
Newly joining nodes may not have a host id yet. Handle this and print a
"?" for these nodes, instead of the host-id.
Extend the existing test for joining node case (also rename it and add
comment).

Closes scylladb/scylladb#17853
2024-03-18 12:26:59 +02:00
Kefu Chai
384e9e9c7c build: cmake: put server deb packages under build/dist/$<CONFIG>/debian
this change is a follow up of ca7f7bf8e2, which changed the output path
to build/$<CONFIG>/debian. but what dist/docker/debian/build_docker.sh
expects is `build/dist/$config/debian/*.deb`, where `$config` is the
normalized mode, when the debian packages are built using CMake
generated rules, `$mode` is CMake configuration name, i.e., `$<CONFIG>`.
so, ca7f7bf8e2 made a mistake, as it does not match the expectation of
`build_docker.sh`.

in this change, this issue is addressed. so we use the same path
in both `dist/CMakeLists.txt` and `dist/docker/debian/build_docker.sh`.

apply the same change to `dist-server-rpm`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-18 14:21:39 +08:00
Avi Kivity
731b5c5120 schema_tables: unfreeze frozen_mutation:s gently
With large schemas, unfreezing can stall, especially as it requires
a lot of memory. Switch to a gentle version that will not stall.
2024-03-17 17:46:02 +02:00
Avi Kivity
a34edb0a93 frozen_mutation: add unfreeze_gently(span<frozen_mutation>)
While we have unfreeze(vector<frozen_mutation>), a gentle version is
preferred.
2024-03-17 17:45:30 +02:00
Kefu Chai
8811900602 build: cmake: do not link randomized_nemesis_test with replication.cc
test/raft/replication.cc defines a symbol named `tlogger`, while
test/raft/randomized_nemesis_test.cc also defines a symbol with
the same name. when linking the test with mold, it identified the ODR
violation.

in this change, we extract test-raft-helper out, so that
randomized_nemesis_test can selectively only link against this library.
this also matches with the behavior of the rules generated by `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17836
2024-03-17 17:01:47 +02:00
Kefu Chai
e1ae36ecfd test/boost: add formatter for BOOST_REQUIRE_EQUAL
in gossiping_property_file_snitch_test, we use
`BOOST_REQUIRE_EQUAL(dc_racks[i], dc_racks[0])` to check the equality
of two instances of `pair<sstring, sstring>`, like:
```c++
BOOST_REQUIRE_EQUAL(dc_racks[i], dc_racks[0])
```

since the standard library does not provide the formatter for printing
`std::pair<>`, we rely on the homebrew generic formatter to
print `std::pair<>`, which in turn uses operator<< to format the
elements in the `pair`, but we intend to remove this formatter
in the future, as the last step of #13245.

so in order to enable Boost.test to print out lhs and rhs when
`BOOST_REQUIRE_EQUAL` check fails, we are adding
`boost_test_print_type()` for `pair<sstring,sstring>`. the helper
function uses {fmt} to print the `pair<>`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17831
2024-03-17 16:58:39 +02:00
Kefu Chai
6244a2ae00 service:qos: add fmt::formatter for service_level_options::workload_type
this change prepares for the fmt::formatter based formatter used by
tests, which will use {fmt} to print the elements in a container,
so we need to define the formatter using fmt::formatter for these
elements. the operator<< for service_level_options::workload_type is
preserved, as the tests are still using it.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17837
2024-03-17 16:52:57 +02:00
Kefu Chai
7df3acd39c repair: add fmt::formatter for row_level_diff_detect_algorithm
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
row_level_diff_detect_algorithm. please note, we already have
`format_as()` overload for this type, but we cannot use it as a
fallback of the proper `fmt::formatter<>` specialization before
{fmt} v10. so before we update our CI to a distro with {fmt} v10,
`fmt::formatter<row_level_diff_detect_algorithm>` is still
needed.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17824
2024-03-16 19:12:49 +02:00
Botond Dénes
03c47bc30b tools/scylla-nodetool: status: handle nodes without load
Some nodes may not have a load yet. Handle this. Also add a test
covering this case.

Closes scylladb/scylladb#17823
2024-03-16 17:38:53 +02:00
Pavel Emelyanov
42a2dce4b6 test/lib: Eliminate variadic futures from template
The assert_that_failed(future) pair of helpers are templates with
variadic futures, but since those are gone from seastar, they should
go from test/lib as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17830
2024-03-16 17:37:25 +02:00
Kefu Chai
8bab51733f db: add fmt::formatter for db::functions::function
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `db::functions::function`.
please note, we use `std::ostream` as the parameter of
the polymorphic implementation of `function::print()`, so
without an intrusive change, we have to use `fmt::ostream_formatter`
or at least use a similar technique to format the `function` instance
into an instance of `ostream` first. so instead of implementing
a "native" `fmt::formatter`, in this change, we just use
`fmt::ostream_formatter`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17832
2024-03-16 17:36:49 +02:00
Kefu Chai
23e9958ebb data_dictionary: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17826
2024-03-15 21:17:11 +03:00
Botond Dénes
ad9bad4700 tools/scylla-nodetool: {proxy,table}histograms: handle empty histograms
Empty histograms are missing some of the members that non-empty
histograms have. The code handling these histograms assumed all required
members are always present and thus error out when receiving an empty
histogram.
Add tests for empty histograms and fix the code handling them to check
for the potentially missing members, instead of making assumptions.

Closes scylladb/scylladb#17816
2024-03-15 15:59:31 +03:00
Tomasz Grabiec
a233a699cc test: py: Add test for view replica pairing after replace 2024-03-15 13:20:08 +01:00
Tomasz Grabiec
6d50e93f10 raft, api: Add RESTful API to query current leader of a raft group
Example:

  $ curl -X GET "http://127.0.0.1:10000/raft/leader_host"
  "f7f57588-62de-4cac-9e4b-c62bfc458d91"

Accepts optional group_id param, defaults to group0.
2024-03-15 13:20:08 +01:00
Tomasz Grabiec
6d24fdee75 test: test_tablets_removenode: Verify replacing when there is no spare node
The test is changed to be more strict. Verifies the case of replacing
when RF=N in which case tablet replicas have to be rebuilt using the
replacing node.

This would fail if tablets are drained as part of replace operation,
since replacing node is not yet a viable target for tablet migration.
2024-03-15 13:20:08 +01:00
Tomasz Grabiec
1d01b4ca20 doc: topology-on-raft: Document replace behavior with tablets 2024-03-15 13:20:08 +01:00
Tomasz Grabiec
1c71f44e63 tablets, raft topology: Rebuild tablets after replacing node is normal
This fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because new tablet replica allocation
will not be able to find a viable destination, as the replacing node
is not considered a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.

The solution taken here is to not drain tablet replicas from replaced
node during topology request but leave it to happen later after the
replaced node is left and replacing node is normal.

The replacing node waits for this draining to be complete on boot
before the node is considered booted.

Fixes #17025
2024-03-15 13:20:08 +01:00
Tomasz Grabiec
b2418fab39 tablets: load_balancer: Access node attributes via node struct
Reduces lookups into topology and decouples the algorithm more from
the topology object.
2024-03-15 11:22:34 +01:00
Tomasz Grabiec
9090050244 tablets: load_balancer: Extract ensure_node()
Will be called in another loop to populate the "nodes" map with left node.
2024-03-15 11:22:32 +01:00
Artsiom Mishuta
73ed4c0eb5 test.py: fix aiohttp usage issue in python 3.12
Fix aiohttp usage issue in python 3.12:
"Timeout context manager should be used inside a task"

This occurs because UnixRESTClient is created in one event loop (created
inside pytest) but used in another (created in the rewritten event_loop
fixture). It is fixed by updating the UnixRESTClient object for every new
loop.

Closes scylladb/scylladb#17760
2024-03-15 11:17:29 +01:00
Tomasz Grabiec
9b656ec2aa mv: Switch to using host_id-based replica set
This is necessary to not break replica pairing between base and
view. After replacing a node, the tablet replica set contains, for a while,
the replaced node, which is in the left state. This node is not
returned by the IP-based get_natural_endpoints() so the replica
indexes would shift, changing the pairing with the view.

The host_id-based replica set always has stable indexes for replicas.
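
The index shift can be reproduced with a toy model. The sketch below pairs replicas purely by position, which is a simplification (the real pairing also filters by datacenter), and all names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct toy_replica {
    std::string host_id;
    std::optional<std::string> ip;  // nullopt: node is in the left state, IP unknown
};

// host_id-based view of a replica set: every replica is listed.
std::vector<std::string> host_ids(const std::vector<toy_replica>& replicas) {
    std::vector<std::string> out;
    for (const auto& r : replicas) {
        out.push_back(r.host_id);
    }
    return out;
}

// IP-based view of a replica set: replicas without a known IP are dropped.
std::vector<std::string> known_ips(const std::vector<toy_replica>& replicas) {
    std::vector<std::string> out;
    for (const auto& r : replicas) {
        if (r.ip) {
            out.push_back(*r.ip);
        }
    }
    return out;
}

// Pairs the base replica `self` with the view replica at the same index.
std::optional<std::string> paired_with(const std::vector<std::string>& base,
                                       const std::vector<std::string>& view,
                                       const std::string& self) {
    for (std::size_t i = 0; i < base.size() && i < view.size(); ++i) {
        if (base[i] == self) {
            return view[i];
        }
    }
    return std::nullopt;
}
```

If the first base replica is a left node without an IP, the IP-based list is one element shorter, so every remaining base replica is paired with a different view replica than in the host_id-based list.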
2024-03-15 11:05:29 +01:00
Tomasz Grabiec
888dc41d66 effective_replication_map: Introduce host_id-based get_replicas() 2024-03-15 11:05:29 +01:00
Tomasz Grabiec
61b3453552 raft topology: Keep nodes in the left state to topology
Those nodes will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those nodes' locations (dc, rack) for two reasons:

 1) algorithms which work with replica sets filter nodes based on
 their location. For example materialized views code which pairs base
 replicas with view replicas filters by datacenter first.

 2) tablet scheduler needs to identify each node's location in order
 to make decisions about new replica placement.

It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.

Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacenter endpoints exclude them).

In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet
replica sets.

We load topology information only for left nodes which are actually
referenced by any tablet. To achieve that, topology loading code
queries system.tablets for the set of hosts. This set is then passed to
system.topology loading method which decides whether to load
replica_state for a left node or not.
2024-03-15 11:05:29 +01:00
Tomasz Grabiec
f7851696fa tablets: Introduce read_required_hosts()
Will be used by topology loading code to determine which hosts are
needed in topology, even if they're in the left state. We want to load
left nodes only if they are referenced by any tablet, which may happen
temporarily until the replacement replica is rebuilt.
2024-03-15 11:05:29 +01:00
Botond Dénes
598e5aebfb test/cql-pytest: test_virtual_tables: add test for token_ring table
Just a simple sanity test for both vnodes and tablets.
2024-03-15 04:23:20 -04:00
Botond Dénes
279e496133 db/virtual_tables: token_ring_table: add tablet support
For keyspaces which use tablets, we describe each table separately.
2024-03-15 04:23:20 -04:00
Botond Dénes
61b6ac7ffe db/virtual_tables: token_ring_table: add table_name column
As the first clustering column. For vnode keyspaces, this will always be
"ALL", for tablet keyspaces, this will contain the name of the described
table.
2024-03-15 04:23:20 -04:00
Botond Dénes
fdef62c232 db/virtual_tables: token_ring_table: extract ring emit
Into a separate method. For vnodes there is a single ring per keyspace,
but for tablets, there is a separate ring for each table in the
keyspace. To accommodate both, we move the code emitting the ring into a
separate method, so execute() can just call it once per keyspace or once
per table, whichever is appropriate.
2024-03-15 04:23:20 -04:00
Botond Dénes
a205752513 service/storage_service: describe_ring_for_table(): use topology to map hostid to ip
Do not use the internal host2ip() method. It relies on `_group0`, which
is only set on shard 0. Consequently, any call to this method, coming
from a shard other than shard 0, would crash ScyllaDB, as it
dereferences a nullptr.
2024-03-15 04:23:20 -04:00
Nadav Har'El
6cdb68f094 test/cql-pytest: remove unused function
Remove an unused function from test/cql-pytest/test_using_timeout.py.
Some linters can complain that this function used re.compile(), but
the "re" package was never imported. Since this function isn't used,
the right fix is to remove it - and not add the missing import.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17801
2024-03-15 09:56:30 +02:00
Kefu Chai
e1a9340cc1 partition_version: add fmt::formatter for partition_entry::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `partition_entry::printer`,
and drop its operator<< .

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17812
2024-03-15 09:52:27 +02:00
Kefu Chai
a0625261ef build: cmake: reword the comment for dev-headers
before this change, the comment was difficult to parse. let's update
it for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17814
2024-03-15 09:51:47 +02:00
Kefu Chai
640d573106 schema_mutations: add fmt::formatter for schema_mutations
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `schema_mutations`,
and drop its operator<< .

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17815
2024-03-15 09:49:56 +02:00
Kefu Chai
3edd530bd1 test/boost: add formatter for BOOST_REQUIRE_EQUAL
before this change, we rely on the homebrew generic formatter to
print unordered_set<>, which in turn uses operator<< to format the
elements in the `unordered_set`, but we intend to remove this formatter
in future, as the last step of #13245 .

so, to enable Boost.Test to print out lhs and rhs when `BOOST_REQUIRE_EQUAL`
check fails, we are adding `boost_test_print_type()` for
`unordered_set<fruit>`. the helper function uses {fmt} to print the
`unordered_set<>`, so we are adding a fmt::formatter for `fruit`, the
operator<< for this type is dropped, as it is not used anymore.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17813
2024-03-15 09:40:22 +02:00
Benny Halevy
530d270828 api: /storage_service/tablets/balancing: fix incorrect operation summary
It was probably copy-pasted from /storage_service/tablets/move

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17811
2024-03-14 22:52:57 +01:00
Tomasz Grabiec
8c5d088928 Merge 'Drop tablets of dropped views and indices' from Benny Halevy
This series adds notification before dropping views and indices so that the
tablet_allocator can generate mutations to respectively drop all tablets associated with them from system.tablets.

Additional unit tests were added for these cases.

Note that one case is not yet tested: where a table is allowed to be dropped while having views that depend on it, when it is dropped from the alternator path.

This is tested indirectly by testing dropping a table with live secondary index as it follows the same notification path as views in this series.

Fixes #17627

Closes scylladb/scylladb#17773

* github.com:scylladb/scylladb:
  migration_manager: notify before_drop_column_family when dropping indices
  schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices
  migration_manager: notify before_drop_column_family before dropping views
  cql-pytest: test_tablets: add test_tablets_are_dropped_when_dropping_table
  tablet_allocator: on_before_drop_column_family: remove unused result variable
2024-03-14 22:52:29 +01:00
Raphael S. Carvalho
c46c2d436f sstables: Reduce cost for loading sstables with tablets
Loader was changed to quickly determine ownership after consuming
sharding metadata only. If it's not available, it falls back to
reading first and last keys from summary. The fallback is only there
for backward compatibility and it costs a lot more as we don't
skip to the end where keys are located in summary.

With tablets, sharding metadata is only first and last keys so
we can do it without sharder. So loader will be able to use it
instead of looking up keys in summary.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17805
2024-03-14 21:06:35 +01:00
Pavel Emelyanov
8ffb5f27c7 topology_coordinator: Clear tablet transition session after streaming
When jumping from streaming stage into cleanup_target, session must also
be cleared as pending replica may still process some incoming mutations
blocked in the pipeline. Deleting session prior to executing barrier
makes sure those mutations will not be applied.

fixes: #17682

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17800
2024-03-14 20:35:00 +01:00
Pavel Emelyanov
6a77f36519 doc: Add tablets migration state diagram
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17790
2024-03-14 20:29:21 +01:00
Benny Halevy
5bfca73b30 migration_manager: notify before_drop_column_family when dropping indices
Fixes #17627

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 20:19:12 +02:00
Benny Halevy
9cf6a2e510 schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices
When dropping indices, we don't need to go through
`create_view_for_index` in order to drop the index.
That actually creates a new schema for this view
which is used just for its metadata for generating mutations
dropping it.

Instead, use `find_schema` to lookup the current schema
for the dropped index.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 20:19:11 +02:00
Benny Halevy
358e92e645 migration_manager: notify before_drop_column_family before dropping views
Call the before_drop_column_family notifications
before dropping the views to allow the tablet_allocator
to delete the view's tablets.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 20:14:56 +02:00
Avi Kivity
5e28bf9b5c Merge 'Do not try to balance tablets on nodes which are known to be down' from Pavel Emelyanov
Tablet transition would get stuck anyway for such nodes, so it's not worth trying

refs: #16372 (not fixes, because there's also repair transitions with same problem)

Closes scylladb/scylladb#17796

* github.com:scylladb/scylladb:
  topology_coordinator: Skip dead nodes when balancing tablets
  test: Add test for load_balancer skiplist
  tablet_allocator: Add skiplist to load_balancer
2024-03-14 18:47:51 +02:00
Avi Kivity
0f188f2d9f Merge 'tools/scylla-nodetool: implement the status command' from Botond Dénes
The status command sends a large number of requests to the server. To be able to handle this more easily, the rest api mock server is refactored extensively to be more flexible, accepting expected requests out-of-order. While at it, the rest api mock server also moves away from a deprecated `aiohttp` feature: providing a custom router argument to the `aiohttp` app. This forces us to pre-register all API endpoints that any test currently uses, although due to some templating support, this is not as bad as it sounds. Still, this is an annoyance, but at this point we have implemented almost all commands, so this won't be much of a problem going forward.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#17547

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the status command
  test/nodetool: rest_api_mock.py: match requests out-of-order
  test/nodetool: rest_api_mock.py: remove trailing / from request paths
  test/nodetool: rest_api_mock.py: use static routes
  test/nodetool: check only non-exhausted requests
  tools/scylla-nodetool: repair: set the jobThreads request parameter
2024-03-14 18:42:54 +02:00
Kamil Braun
5ef47c42b3 Merge 'remove_rpc_client_with_ignored_topology: recreate rpc client earlier' from Petr Gusev
It's too late to call `remove_rpc_client_with_ignored_topology` on messaging service when a node becomes normal. Data plane requests can be routed to the node much earlier, at least when topology switches to `write_both_read_new`. The `remove_rpc_client_with_ignored_topology` function shuts down sockets and causes such requests to time out.

In this PR we move the `remove_rpc_client_with_ignored_topology` call to the earliest point possible when a node first appears in `token_metadata.topology`.

From the topology coordinator perspective this happens when a joining node moves to `node_state::bootstrapping` and the topology moves to `transition_state::join_group0`. In `sync_raft_topology_nodes` the node should be contained in transition_nodes. The successful `wait_for_ip` before entering `transition_state::join_group0` ensures that update_topology should find a node's IP and put it into the topology. The barrier in `commit_cdc_generation` will ensure that all nodes in the cluster are using the proper connection parameters.

Only outgoing connections are tracked by `remove_rpc_client_with_ignored_topology`, those created by the current node. This means we need to call `remove_rpc_client_with_ignored_topology` on each node of the cluster.

fixes scylladb/scylladb#17445

Closes scylladb/scylladb#17757

* github.com:scylladb/scylladb:
  test_remove_rpc_client_with_pending_requests: add a regression test
  remove_rpc_client_with_ignored_topology: call it earlier
  storage_service: decouple remove_rpc_client_with_ignored_topology from notify_joined
2024-03-14 17:20:59 +01:00
Yaniv Kaul
a2ac80340f Typo: pint -> print
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#17804
2024-03-14 15:50:35 +02:00
Wojciech Mitros
59d5bfa742 mv: fail base writes instead of dropping view updates when overloaded
Since 4c767c379c we can reach a situation
where we know that we have admitted too many expensive view update
operations and the mechanism of dropping the following view updates
can be triggered in a wider range of scenarios. Ideally, we would
want to fail whole requests on the coordinator level, but for now, we
change the behavior to failing just the base writes. This allows us
to avoid creating inconsistencies between base replicas and views
at the cost of introducing inconsistencies between different base
replicas. This, however, can be fixed by repair, in contrast to
base-view inconsistencies which we don't have a good method of fixing.

Fixes #17795

Closes scylladb/scylladb#17777
2024-03-14 15:11:45 +02:00
Aleksandra Martyniuk
43ef6e6ab9 test: fix regular compaction tasks check
Since 6b87778 regular compaction tasks are removed from task manager
immediately after they are finished.

test_regular_compaction_task lists compaction tasks and then requests
their statuses. Only one regular compaction task is guaranteed to still
be running at that time, the rest of them may finish before their status
is requested and so it will no longer be in task manager, causing the test
to fail.

Fix statuses check to consider the possibility of a regular compaction
task being removed from task manager.

Fixes: #17776.

Closes scylladb/scylladb#17784
2024-03-14 14:40:18 +02:00
Piotr Smaron
ad2d039e3d db: move all group 0 tables to schema commitlog
This is to have durability for the group0 tables.
But also because I need it specifically to make
`system.topology` & `system_schema.scylla_keyspaces`
mutations under a single raft command in https://github.com/scylladb/scylladb/pull/16723

Fixes: #15596

Closes scylladb/scylladb#17783
2024-03-14 13:33:30 +01:00
Piotr Dulikowski
2d9e78b09a gossiper: failure detector: don't handle directly removed live endpoints
Commit 0665d9c346 changed the gossiper
failure detector in the following way: when live endpoints change
and per-node failure detectors finish their loops, the main failure
detector calls gossiper::convict for those nodes which were alive when
the current iteration of the main FD started but now are not. This was
changed in order to make sure that nodes are marked as down, because
some other code in gossiper could concurrently remove nodes from
the live node lists without marking them properly.

This was committed around 3 years ago and the situation changed:

- After 75d1dd3a76
  the `endpoint_state::_is_alive` field was removed and liveness
  of a node is solely determined by its presence
  in the `gossiper::_live_endpoints` field.
- Currently, all gossiper code which modifies `_live_endpoints`
  takes care to trigger relevant callback. The only function which
  modifies the field but does not trigger notifications
  is `gossiper::evict_from_membership`, but it is either called
  after `gossiper::remove_endpoint` which triggers callbacks
  by itself, or when a node is already dead and there is no need
  to trigger callbacks.

So, it looks like the reasons it was introduced for are not relevant
anymore. What's more important though is that it is involved in a bug
described in scylladb/scylladb#17515. In short, the following sequence
of events may happen:

1. Failure detector for some remote node X decides that it was dead
   long enough and `convict`s it, causing live endpoints to be updated.
2. The gossiper main loop sends a successful echo to X and *decides*
  to mark it as alive.
3. At the same time, failure detectors for all nodes other than X finish
  and the main failure detector continues; it notices that node X is
  not alive (because it was convicted in point 1.) and *decides*
  to convict it.
4. Actions planned in 2 and 3 run one after another, i.e. node is first
  marked as alive and then immediately as dead.

This causes `on_alive` callbacks to run first and then `on_dead`. The
second one is problematic as it closes RPC connections to node X - in
particular, if X is in the process of replacing another node with the
same IP then it may cause the replace operation to fail.

In order to simplify the code and fix the bug - remove the piece
of logic in question.

Fixes: scylladb/scylladb#17515

Closes scylladb/scylladb#17754
2024-03-14 13:29:17 +01:00
Botond Dénes
d6103dc1b6 tools/scylla-nodetool: snapshot: handle ks.tbl positional args correctly
Nodetool currently assumes that positional arguments are only keyspaces.
ks.tbl pairs are only provided when --kt-list or friends are used. This
is not the case however. So check positional args too, and if they look
like ks.tbl, handle them accordingly.

While at it, also make sure that alternator keyspace and tables names
are handled correctly.

Closes scylladb/scylladb#17480
2024-03-14 13:42:23 +02:00
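The positional-argument handling described in d6103dc1b6 above can be sketched roughly as follows. The real nodetool is native code and this is not its implementation: the `classify_positional_args` helper is hypothetical, and the special treatment of alternator keyspace and table names mentioned in the commit is deliberately not modeled.

```python
# Illustrative only: a hypothetical helper, not the scylla-nodetool code.
# Alternator keyspace/table names are NOT handled in this sketch.

def classify_positional_args(args):
    """Split positional args into plain keyspaces and (keyspace, table) pairs."""
    keyspaces, tables = [], []
    for arg in args:
        ks, sep, tbl = arg.partition(".")
        if sep and ks and tbl:
            # Looks like "ks.tbl": treat it as a keyspace/table pair.
            tables.append((ks, tbl))
        else:
            # No usable separator: treat the whole argument as a keyspace.
            keyspaces.append(arg)
    return keyspaces, tables

keyspaces, tables = classify_positional_args(["ks1", "ks2.tbl"])
print(keyspaces)  # ['ks1']
print(tables)     # [('ks2', 'tbl')]
```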
Avi Kivity
dd76e1c834 Merge 'Simplify error_injection::inject_with_handler()' from Pavel Emelyanov
The method in question can have a shorter name that matches all other injections in this class, and can be non-template

Closes scylladb/scylladb#17734

* github.com:scylladb/scylladb:
  error_injection: De-template inject() with handler
  error_injection: Overload inject() instead of inject_with_handler()
2024-03-14 13:37:54 +02:00
Petr Gusev
2783985bb2 test_remove_rpc_client_with_pending_requests: add a regression test
This test reproduces the problem from scylladb/scylladb#17445.
It fails quite reliably without the fix from the previous
commit.

The test just bootstraps a new node while bombarding the cluster
with read requests.
2024-03-14 15:17:34 +04:00
Petr Gusev
398e14d6d0 remove_rpc_client_with_ignored_topology: call it earlier
In this commit we move the remove_rpc_client_with_ignored_topology
call to the earliest point possible - when a node first appears
in token_metadata.topology.

From the topology coordinator perspective this happens when a joining
node moves to node_state::bootstrapping and the topology moves to
transition_state::join_group0. In sync_raft_topology_nodes
the node should be contained in transition_nodes. The successful
wait_for_ip before entering transition_state::join_group0 ensures
that update_topology should find a node's IP and put it into the topology.
The barrier in commit_cdc_generation will ensure that all nodes
in the cluster are using the proper connection parameters.

Only outgoing connections are tracked by remove_rpc_client_with_ignored_topology,
those created by the current node. This means we need to call
remove_rpc_client_with_ignored_topology on each node of the cluster.

fixes scylladb/scylladb#17445
2024-03-14 15:10:09 +04:00
Petr Gusev
1b9f21314f storage_service: decouple remove_rpc_client_with_ignored_topology from notify_joined

It's too late to call remove_rpc_client_with_ignored_topology on
messaging service when a node becomes normal. Data
plane requests can be routed to the node much earlier,
at least when topology switches to write_both_read_new.
The remove_rpc_client_with_ignored_topology function
shuts down sockets and causes such requests to time out.

We intend to call remove_rpc_client_with_ignored_topology
as soon as a node becomes part of token_metadata topology.
In this preparatory commit we refactor
storage_service::notify_joined. We remove the
remove_rpc_client_with_ignored_topology call from it and
call it separately at the two call sites of notify_joined.
2024-03-14 15:10:09 +04:00
Kefu Chai
ce17841860 tools/scylla-nodetool: print bpo::options_description with fmt::streamed
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, since boost::program_options::options_description is
defined by the boost.program_options library, and it only provides the
operator<< overload, we're inclined not to specialize `fmt::formatter`
for it at this moment, because

* this class is not defined by the scylla project. we would have to
  find a home for this formatter.
* we are not likely to reuse the formatter in multiple places

so, in this change we just print it using `fmt::streamed`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17791
2024-03-14 10:44:32 +02:00
Pavel Emelyanov
33d258528e topology_coordinator: Skip dead nodes when balancing tablets
The coordinator can find out which nodes are marked as DOWN, thus when
calling tablets balancer it can feed it a skiplist

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-14 10:51:11 +03:00
Pavel Emelyanov
ee55e8442a test: Add test for load_balancer skiplist
The test is inspired by the test_load_balancing_with_empty_node one and
verifies that when a node is skiplisted, balancer doesn't put load on it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-14 10:50:21 +03:00
Pavel Emelyanov
b4dd732dab tablet_allocator: Add skiplist to load_balancer
Currently load balancer skips nodes only based on its "administrative"
state, i.e. whether it's drained/decommissioned/removed/etc. There's no
way to exclude any node from balancing decision based on anything else.
This patch adds this ability by adding a skiplist argument to the
balance_tablets() method. When a node is in it, it will not be
considered, as if it was removenode-d.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-14 10:47:31 +03:00
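The effect of the skiplist from b4dd732dab above can be sketched as a filter over candidate nodes. This is an illustration under stated assumptions, not the load_balancer code: the `balancing_candidates` helper, the state labels and the host names are all hypothetical.

```python
# Illustrative only: not the tablet load_balancer. Names are hypothetical.

# Hypothetical labels for the "administrative" states that already
# exclude a node from balancing.
EXCLUDED_STATES = {"drained", "decommissioned", "removed"}

def balancing_candidates(node_states, skiplist=frozenset()):
    """Return hosts that may receive load: not excluded by state, not skiplisted."""
    return [
        host
        for host, state in node_states.items()
        if state not in EXCLUDED_STATES and host not in skiplist
    ]

nodes = {"n1": "normal", "n2": "normal", "n3": "removed"}
print(balancing_candidates(nodes))          # ['n1', 'n2']
print(balancing_candidates(nodes, {"n2"}))  # ['n1']
```

A skiplisted node is treated the same way as one excluded by its state, which matches the "as if it was removenode-d" wording of the commit.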
Kefu Chai
926fe29ebd db: commitlog: add fmt::formatter for commitlog types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* db::commitlog::segment::cf_mark
* db::commitlog::segment_manager::named_file
* db::commitlog::segment_manager::dispose_mode
* db::commitlog::segment_manager::byte_flow<T>

please note, the formatter of `db::commitlog::segment` is not
included in this commit, as we are formatting it in the inline
definition of this class. so we cannot define the specialization
of `fmt::formatter` for this class before its callers -- we'd
either use `format_as()` provided by {fmt} v10, or use `fmt::streamed`.
either way, it's different from the theme of this commit, and we
will handle it in a separated commit.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17792
2024-03-14 09:28:12 +02:00
Botond Dénes
20d5c536b5 tools/scylla-nodetool: implement the status command
Contrary to Origin, the single-token case is not discriminated in the
native implementation, for two reasons:
* ScyllaDB doesn't ever run with a single token, it is even moving away
  from vnodes.
* Origin implemented the logic to detect single-token with a mistake: it
  compares the number of tokens to the number of DCs, not the number of
  nodes.

Another difference is that the native implementation doesn't request
ownership information when a keyspace argument was not provided -- it is
not printed anyway.
2024-03-14 03:27:04 -04:00
Botond Dénes
2d4f4cfad4 test/nodetool: rest_api_mock.py: match requests out-of-order
In the previous patch, we made requests to different endpoints
match out-of-order. In this patch we go one step further and make
matching requests to the same endpoint match out-of-order too.
With this, tests can register the expected requests in any order, not in
the same order as the nodetool-under-test is expected to send them. This
makes testing more flexible. Also, how requests are ordered is not
interesting from the correctness' POV anyway.
2024-03-14 03:27:04 -04:00
Botond Dénes
09a27f49ea test/nodetool: rest_api_mock.py: remove trailing / from request paths
The legacy nodetool likes to append an "/" to the requests paths every
now and then, but not consistently. Unfortunately, request path matching
in the mock rest server and in aiohttp is quite sensitive to this
currently. Reduce friction by removing trailing "/" from paths in the
mock api, allowing paths to match each other even if one has a trailing
"/" but the other doesn't.
Unfortunately there is nothing we can do about the aiohttp part, so some
API endpoints have to be registered with a trailing "/".
2024-03-14 03:27:04 -04:00
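The path normalization from 09a27f49ea above amounts to something like the following sketch. The `normalize_path` helper is hypothetical and is not taken from rest_api_mock.py; the endpoint path is used only as an example.

```python
# Illustrative only: a hypothetical helper, not the rest_api_mock.py code.

def normalize_path(path: str) -> str:
    """Drop trailing slashes so "/a/b/" and "/a/b" compare equal."""
    stripped = path.rstrip("/")
    # Keep the root path as "/" instead of turning it into an empty string.
    return stripped or "/"

expected = normalize_path("/storage_service/tablets/balancing")
received = normalize_path("/storage_service/tablets/balancing/")
print(expected == received)  # True
```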
Botond Dénes
5659f23b2a test/nodetool: rest_api_mock.py: use static routes
The mock server currently provides its own router to the aiohttp.web
app. The ability to provide custom routers, however, is deprecated and
can be removed at any point. So refactor the mock server to use the
built-in router. This requires some changes, because the built-in router
does not allow adding/removing routes once the server starts. However
the mock server only learns of the used routes when the tests run.
This unfortunately means that we have to statically register all
possible routes the tests will use. Fortunately, aiohttp has variable
route support (templated routes) and with this, we can get away with
just 9 statically registered routes, which is not too bad.

A (desired) side-effect of this refactoring is that now requests to
different routes do not have to arrive in order. This constraint of the
previous implementation proved to be not useful, and even made writing
certain tests awkward.
2024-03-14 03:27:04 -04:00
Botond Dénes
061bd89957 test/nodetool: check only non-exhausted requests
Refactor how the tests check for expected requests which were never
invoked. At the end of every test, the nodetool fixture requests all
unconsumed expected requests from the rest_api_mock.py and checks that
there is none. This mechanism has some interaction with requests which
have a "multiple" set: rest_api_mock.py allows registering requests with
different "multiple" requirements -- how many times a request is
expected to be invoked:
* ANY: [0, +inf)
* ONE: 1
* MULTIPLE: [1, +inf)

Requests are stored in a stack. When a request arrives, we pop off
requests from the top until we find a perfect match. We pop off
requests, iff: multiple == ANY || multiple == MULTIPLE and was hit at
least once.
This works as long as we don't have a multiple=ANY request at the
bottom of the stack which is never invoked. Or a multiple=MULTIPLE one.
This will get worse once we refactor requests to be not stored in a
stack.

So in this patch, we filter requests when collecting unexhausted ones,
dropping those which would be qualified to be popped from the stack.
2024-03-14 03:27:04 -04:00
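The filtering rule from 061bd89957 above can be sketched as follows. The `Multiple` values mirror the ones listed in the commit message, but the `ExpectedRequest` class, its `hits` counter and the `unexhausted` helper are hypothetical and are not the rest_api_mock.py implementation.

```python
# Illustrative only: not the rest_api_mock.py code. Names are hypothetical.
from dataclasses import dataclass
from enum import Enum

class Multiple(Enum):
    ANY = "any"            # expected [0, +inf) times
    ONE = "one"            # expected exactly once
    MULTIPLE = "multiple"  # expected [1, +inf) times

@dataclass
class ExpectedRequest:
    path: str
    multiple: Multiple
    hits: int = 0  # how many times the request was matched so far

    def exhausted(self) -> bool:
        """True if the expectation is already satisfied."""
        if self.multiple is Multiple.ANY:
            return True
        if self.multiple is Multiple.MULTIPLE:
            return self.hits >= 1
        return self.hits == 1  # Multiple.ONE

def unexhausted(requests):
    """Return only the expectations that were not satisfied."""
    return [r for r in requests if not r.exhausted()]

reqs = [
    ExpectedRequest("/a", Multiple.ANY),
    ExpectedRequest("/b", Multiple.ONE),
    ExpectedRequest("/c", Multiple.MULTIPLE, hits=2),
    ExpectedRequest("/d", Multiple.MULTIPLE),
]
print([r.path for r in unexhausted(reqs)])  # ['/b', '/d']
```

Only "/b" and "/d" are reported, because a never-invoked ANY request and an already-hit MULTIPLE request both count as satisfied.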
Botond Dénes
be5a18c07d tools/scylla-nodetool: repair: set the jobThreads request parameter
Although ScyllaDB ignores this request parameter, the Java nodetool
sets it, so it is better to have the native one do the same for
symmetry. It makes testing easier.
Discovered with the more strict request matching introduced in the next
patches.
2024-03-14 03:26:13 -04:00
Benny Halevy
b4245bf46e cql-pytest: test_tablets: add test_tablets_are_dropped_when_dropping_table
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 09:01:30 +02:00
Asias He
9d41fb9bcd repair: Add hosts and ignore_nodes option support for tablet repair
It is not supported currently.

If a user passes the option, the request will be rejected with:

    The hosts option is not supported for tablet repair
    The ignore_nodes option is not supported for tablet repair

This option is useful to select nodes to repair.

Fixes: #17742

Tests: repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes
       repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes_errors
       repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_dc_host

Closes scylladb/scylladb#17767
2024-03-14 08:40:30 +02:00
Benny Halevy
b73aaee5e4 tablet_allocator: on_before_drop_column_family: remove unused result variable
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 08:34:02 +02:00
Avi Kivity
c1d8a1dda5 Merge 'Fix false-positive errors in scrub validate-mode' from Botond Dénes
The new MX-native validator, which validates the index in tandem with the data file, was discovered to print false-positive errors, related to range-tombstones and promoted-index positions.
This series fixes that. But first, it refactors the scrub-related tests. These are currently dominated by boiler-plate code. They are hard to read and hard to write. In the first half of the series, a new `scrub_test` is introduced, which moves all the boiler-plate to a central place, allowing the tests to focus on just the aspect of scrub that is tested.
Then, all the found bugs in validate are fixed and finally a new test, checking validate with valid sstable is introduced.

Fixes: #16326

Closes scylladb/scylladb#16327

* github.com:scylladb/scylladb:
  test/boost/sstable_compaction_test: add validation test with valid sstable
  sstablex/mx/reader: validate(): print trace message when finishing the PI block
  sstablex/mx/reader: validate(): make index-data PI position check message consistent
  sstablex/mx/reader: validate(): only load the next PI block if current is exhausted
  sstablex/mx/reader: validate(): reset the current PI block on partition-start
  sstablex/mx/reader: validate(): consume_range_tombstone(): check for finished clustering blocked
  sstablex/mx/reader: validate(): fix validator for range tombstone end bounds
  test/boost/sstable_compaction_test: drop write_corrupt_sstable() helper
  test/boost/sstable_compaction_test: fix indentation
  test/boost/sstable_compaction_test: use test_scrub_framework in test_scrub_quarantine_mode_test
  test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_segregate_mode_test
  test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_skip_mode_test
  test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_validate_mode_test
  test/boost/sstable_compaction_test: introduce scrub_test_framework
  test/lib/random_schema: add uncompatible_timestamp_generator()
2024-03-13 20:51:30 +02:00
Kefu Chai
15bea069a9 docs: use less slangy language
this is a follow-up change of 1519904fb9, to incorporate the comment
from Anna Stuchlik.

Signed-off-by: Anna Stuchlik <anna.stuchlik@scylladb.com>

Closes scylladb/scylladb#17671
2024-03-13 13:33:37 +02:00
Avi Kivity
4db4b2279c Merge 'tools/scylla-nodetool: implement the last batch of commands' from Botond Dénes
This PR implements the following new nodetool commands:
* netstats
* tablehistograms/cfhistograms
* proxyhistograms

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#17651

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the proxyhistograms command
  tools/scylla-nodetool: implement the tableshistograms command
  tools/scylla-nodetool: introduce buffer_samples
  utils/estimated_histogram: estimated_histogram: add constructor taking buckets
  tools/scylla-nodetool: implement the netstats command
  tools/scylla-nodetool: add correct units to file_size_printer
2024-03-13 12:46:11 +02:00
Avi Kivity
e120ba3514 sstables: partition_index_cache: evict entries within a page gently
When the partition_index_cache is evicted, we yield for preemption between
pages, but not within a page.

Commit 3b2890e1db ("sstables: Switch index_list to chunked_vector
to avoid large allocations") recognized that index pages can be large enough
to overflow a 128k alignment block (this was before the index cache and
index entries were not stored in LSA then). However, it did not go as far as
to gently free individual entries; either the problem was not recognized
or wasn't as bad.

As the referenced issue shows, a fairly large stall can happen when freeing
the page. The workload had a large number of tombstones, so index selectivity
was poor.

Fix by evicting individual rows gently.

The fix ignores the case where rows are still referenced: it is unlikely
that all index pages will be referenced, and in any case skipping over
a referenced page takes an insignificant amount of time, compared to freeing
a page.

Fixes #17605

Closes scylladb/scylladb#17606
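
The difference between yielding only between pages and yielding after every entry can be illustrated with a toy model. This is a sketch in Python, not the Seastar/LSA code touched by the patch; `evict` and `preemption_point` are hypothetical names, and a "page" is modelled as a plain list of entries.

```python
from typing import List


def evict(pages: List[list], gentle: bool) -> int:
    """Evict all pages and return the largest number of entries that were
    freed between two consecutive preemption points (a proxy for stall size)."""
    longest = 0
    since_yield = 0

    def preemption_point() -> None:
        # Hypothetical stand-in for "yield to the scheduler if needed".
        nonlocal longest, since_yield
        longest = max(longest, since_yield)
        since_yield = 0

    for page in pages:
        if gentle:
            # Gentle eviction: free one entry at a time, yielding after each.
            while page:
                page.pop()
                since_yield += 1
                preemption_point()
        else:
            # Page-level eviction: the whole page is freed in one go.
            since_yield += len(page)
            page.clear()
            preemption_point()
    return longest


if __name__ == "__main__":
    print(evict([[0] * 1000, [0] * 3], gentle=False))
    print(evict([[0] * 1000, [0] * 3], gentle=True))
```

In this model the longest uninterrupted run of work equals the size of the largest page without the fix, and a single entry with it.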
2024-03-13 10:44:37 +01:00
Marcin Maliszkiewicz
7b60752e47 test: fix cql connection problem in test_auth_raft_command_split
This is a speculative fix as the problem is observed only on CI.
When run_async is called right after driver_connect and get_cql
it fails with ConnectionException('Host has been marked down or
removed').

If the approach proves to be successful we can start to deprecate
base get_cql in favor of get_ready_cql. It's better to have robust
testing helper libraries than to take care of it in every test
case separately.

Fixes #17713

Closes scylladb/scylladb#17772
2024-03-13 10:36:51 +01:00
Pavel Emelyanov
4d83a8c12c topology_coordinator: Mark constant class methods with const
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17756
2024-03-13 10:23:39 +02:00
Pavel Emelyanov
2e982df898 test/tablets: Generalize repair history loading
Two repair test cases verify that repair generated enough rows in the
history table. Both use identical code for that, worth generalizing

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17761
2024-03-13 10:22:57 +02:00
Pavel Emelyanov
88a40b0dfa uuid: UUID_gen::get_UUID src argument is const pointer
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17762
2024-03-13 10:21:25 +02:00
Botond Dénes
53e3325845 Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* mutation_partition_v2::printer
* frozen_mutation::printer
* mutation

their operator<<:s are dropped.

Refs #13245

Closes scylladb/scylladb#17769

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for mutation
  mutation: add fmt::formatter for frozen_mutation::printer
  mutation: add fmt::formatter for mutation_partition_v2::printer
2024-03-13 10:13:09 +02:00
Pavel Emelyanov
488404e080 gms: Remove unused i_failure_detection_event_listener
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17765
2024-03-13 09:33:56 +02:00
Kefu Chai
fb4f48b4ed schema: add fmt::formatter for schema
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* column_definition
* column_mapping
* ordinal_column_id
* raw_view_info
* schema
* view_ptr

their operator<<:s are dropped. but operator<< for schema is preserved,
as we are still printing `seastar::lw_shared_ptr<const schema>` with
our homebrew generic formatter for `seastar::lw_shared_ptr<>`, which
uses operator<< to print the pointee.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17768
2024-03-13 09:29:00 +02:00
Kefu Chai
85c4034495 .git: skip redis/lolwut.cc when scanning spelling errors
codespell reports "Nees" should be "Needs" but "Nees" is the last
name of Georg Nees. so it is not a misspelling and should not be
fixed.

the purpose of lolwut.cc is to display the Redis version and
print a piece of generative computer art; the one included in our
version was created by Georg Nees. since the LOLWUT command does not
contain business logic connected with scylladb, we don't lose much if
we skip it when scanning for spelling errors. so, in this change, let's
skip it. this should silence one more warning from the github
codespell workflow.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17770
2024-03-13 09:25:58 +02:00
Michał Chojnowski
75864e18a2 open-coredump.sh: respect http redirects
downloads.scylladb.com recently started redirecting from http to https
(via `301 Moved Permanently`).
This broke package downloading in open-coredump.sh.

To fix this, we have to instruct curl to follow redirects.

Closes scylladb/scylladb#17759
2024-03-13 08:57:04 +02:00
Pavel Emelyanov
d90db016bf treewide: Use partition_slice::is_reversed()
Continuation of cc56a971e8, more noisy places detected

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17763
2024-03-13 08:52:46 +02:00
Botond Dénes
a329cc34b7 tools/scylla-nodetool: implement the proxyhistograms command 2024-03-13 02:06:30 -04:00
Botond Dénes
a52eddc9c1 tools/scylla-nodetool: implement the tableshistograms command 2024-03-13 02:06:30 -04:00
Botond Dénes
151fb5a53b tools/scylla-nodetool: introduce buffer_samples
Based on Origin's org.apache.cassandra.tools.NodeProbe.BufferSamples.
To be used to compute quantiles of latency histogram samples.
2024-03-13 02:06:30 -04:00
Botond Dénes
47ac7d70e4 utils/estimated_histogram: estimated_histogram: add constructor taking buckets
And bucket offsets. Allows constructing the histogram back from a json
format.
2024-03-13 02:06:30 -04:00
Botond Dénes
006bc84761 tools/scylla-nodetool: implement the netstats command 2024-03-13 02:06:10 -04:00
Botond Dénes
ec7e1a2e92 tools/scylla-nodetool: add correct units to file_size_printer
When printing human-readable file-sizes, the Java nodetool always uses
base-2 steps (1024) to arrive at the human-readable size, but it uses
the base-10 units (MB) and base-2 units (MiB) interchangeably.
Adapt file_size_printer to support both. Add a flag to control which is
used.
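
The behaviour described above, always dividing by 1024 while switching only the unit label, can be sketched as follows. This is an illustration in Python, not the C++ file_size_printer itself; the function name, the `binary_units` flag name, and the two-decimal output format are assumptions.

```python
def format_file_size(size_bytes: int, binary_units: bool = False) -> str:
    """Scale by 1024 at every step; the flag only selects the unit label."""
    si_labels = ["bytes", "KB", "MB", "GB", "TB"]
    iec_labels = ["bytes", "KiB", "MiB", "GiB", "TiB"]
    labels = iec_labels if binary_units else si_labels

    value = float(size_bytes)
    step = 0
    # Base-2 steps regardless of which unit family is printed.
    while value >= 1024 and step < len(labels) - 1:
        value /= 1024
        step += 1
    return f"{value:.2f} {labels[step]}"


if __name__ == "__main__":
    print(format_file_size(1536))
    print(format_file_size(1536, binary_units=True))
```

Both calls scale the value identically; only the printed suffix differs.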
2024-03-13 02:05:22 -04:00
Kefu Chai
2d319fa789 mutation: add fmt::formatter for mutation
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for mutation. but its operator<<
is preserved, as we are still using our homebrew generic formatter
for printing `std::vector<mutation>`, and this formatter is using
operator<< for printing the elements in vector.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-13 11:07:42 +08:00
Kefu Chai
acd14f12f0 mutation: add fmt::formatter for frozen_mutation::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for frozen_mutation::printer,
and drop its operator.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-13 10:47:22 +08:00
Kefu Chai
94d25e02ad mutation: add fmt::formatter for mutation_partition_v2::printer
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for mutation_partition_v2::printer, and
drop its operator<<

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-13 10:47:09 +08:00
Asias He
f74053af40 repair: Add dc option support for tablet repair
This patch adds the dc option support for tablet repair. The management
tool can use this option to select nodes in specific data centers to run
repair.

Fixes: #17550
Tests: repair_additional_test.py::TestRepairAdditional::test_repair_option_dc

Closes scylladb/scylladb#17571
2024-03-12 22:19:50 +02:00
Ferenc Szili
1da5b3033e scylla-nodetool: check for missing keyspace argument on describering
Calling scylla-nodetool with option describering and omitting the keyspace
name argument results in a boost exception with the following error message:

error running operation: boost::wrapexcept<boost::bad_any_cast> (boost::bad_any_cast: failed conversion using boost::any_cast)

This change checks for the missing keyspace and outputs a more sensible
error message:

error processing arguments: keyspace must be specified

Closes scylladb/scylladb#17741
2024-03-12 21:19:11 +02:00
Avi Kivity
f410038296 Merge 'Use do_with_cql_env_thread() helper in storage proxy test' from Pavel Emelyanov
Just a cleanup -- replace do_with_cql_env + async with do_with_cql_env_thread

Closes scylladb/scylladb#17758

* github.com:scylladb/scylladb:
  test/storage_proxy: Restore indentation after previous patch
  test/storage_proxy: Use do_with_cql_env_thread()
2024-03-12 20:23:40 +02:00
Pavel Emelyanov
34477ad98e test/storage_proxy: Restore indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-12 19:10:44 +03:00
Pavel Emelyanov
fd112446c2 test/storage_proxy: Use do_with_cql_env_thread()
One of the test cases explicitly wraps itself into async, but there's a
convenience helper for that already.

Indentation is deliberately left broken

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-12 19:10:33 +03:00
Botond Dénes
2335f42b2b test/boost/sstable_compaction_test: add validation test with valid sstable
Add a positive test, as it turns out we had some false-positive
validation bugs in the validator and we need a regression test for this.
2024-03-12 11:05:18 -04:00
Botond Dénes
a19a2d76c9 sstablex/mx/reader: validate(): print trace message when finishing the PI block 2024-03-12 11:05:18 -04:00
Botond Dénes
677be168c4 sstablex/mx/reader: validate(): make index-data PI position check message consistent
The message says "index-data" but when printing the position, the data
position is printed first, causing confusion. Fix this and while at it,
also print the position of the partition start.
2024-03-12 11:05:18 -04:00
Botond Dénes
5bff7c40d3 sstablex/mx/reader: validate(): only load the next PI block if current is exhausted
The validate() consumes the content of partitions in a consume-loop.
Every time the consumer asks for a "break", the next PI block is loaded
and set on the validator, so it can validate that further clustering
elements are indeed from this block.
This loop assumed the consumer would only request interruption when the
current clustering block is finished. This is wrong, the consumer can
also request interruption when yielding is needed. When this is the
case, the next PI block doesn't have to be loaded yet, the current one
is not exhausted yet. Check this condition, before loading the next PI
block, to prevent false positive errors, due to mismatched PI block
and clustering elements from the sstable.
2024-03-12 11:05:18 -04:00
Botond Dénes
e073df1dbb sstablex/mx/reader: validate(): reset the current PI block on partition-start
It is possible that the next partition has no PI and thus there won't be
a new PI block to overwrite the old one. This will result in
false-positive messages about rows being outside of the finished PI
block.
2024-03-12 11:05:18 -04:00
Botond Dénes
2737899c21 sstablex/mx/reader: validate(): consume_range_tombstone(): check for finished clustering blocked
Promoted index entries can be written on any clustering elements,
including range tombstones. So the validating consumer also has to check
whether the current expected clustering block is finished, when
consuming a range tombstone. If it is, consumption has to be
interrupted, so that the outer-loop can load up the next promoted index
block, before moving on to the next clustering element.
2024-03-12 11:05:18 -04:00
Botond Dénes
f46b458f0d sstablex/mx/reader: validate(): fix validator for range tombstone end bounds
For range tombstone end-bounds, the validate_fragment_order() should be
passed a null tombstone, not a disengaged optional. The latter means no
change in the current tombstone. This caused the end bound of range
tombstones to not make it to the validator and the latter complained
later on partition-end that the partition has unclosed range tombstone.
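
The distinction between a null tombstone and a disengaged optional can be shown with a small model. This is a sketch in Python under stated assumptions: `Tombstone`, `OrderValidator` and `validate` are hypothetical names, `None` stands in for a disengaged optional, and the real validate_fragment_order() is C++ and differs in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tombstone:
    # timestamp=None models a "null" tombstone (nothing is deleted).
    timestamp: Optional[int] = None

    def is_active(self) -> bool:
        return self.timestamp is not None


class OrderValidator:
    def __init__(self) -> None:
        self.current = Tombstone()

    def on_fragment(self, new_tombstone: Optional[Tombstone]) -> None:
        # None models a disengaged optional: keep the current tombstone.
        if new_tombstone is not None:
            self.current = new_tombstone

    def on_partition_end(self) -> bool:
        # True when no range tombstone is left open.
        return not self.current.is_active()


def validate(end_bound_argument: Optional[Tombstone]) -> bool:
    validator = OrderValidator()
    validator.on_fragment(Tombstone(timestamp=10))  # range tombstone start
    validator.on_fragment(end_bound_argument)       # range tombstone end
    return validator.on_partition_end()


if __name__ == "__main__":
    print(validate(None))         # end bound passed as "no change"
    print(validate(Tombstone()))  # end bound passed as a null tombstone
```

Passing "no change" for the end bound leaves the tombstone open at partition end, which is the false positive described above; passing a null tombstone closes it.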
2024-03-12 11:05:18 -04:00
Botond Dénes
8be97884ec test/boost/sstable_compaction_test: drop write_corrupt_sstable() helper
It is not used anymore.
2024-03-12 11:05:18 -04:00
Botond Dénes
da0f4d3a9f test/boost/sstable_compaction_test: fix indentation 2024-03-12 11:05:18 -04:00
Botond Dénes
c35092aff6 test/boost/sstable_compaction_test: use test_scrub_framework in test_scrub_quarantine_mode_test
The test becomes a lot shorter and it now uses random schema and random
data.
Indentation is left broken, to be fixed in a future patch.
2024-03-12 11:05:18 -04:00
Botond Dénes
3f76aad609 test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_segregate_mode_test
The test becomes a lot shorter and it now uses random schema and random
data.
Indentation is left broken, to be fixed in a future patch.
2024-03-12 11:05:18 -04:00
Botond Dénes
5237e8133b test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_skip_mode_test
The test becomes a lot shorter and it now uses random schema and random
data. The test is also split in two: one test for abort mode and one for
skip mode.
Indentation is left broken, to be fixed in a future patch.
2024-03-12 11:05:18 -04:00
Botond Dénes
76785baf43 test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_validate_mode_test
The test becomes a lot shorter and it now uses random schema and random
data.
Indentation is left broken, to be fixed in a future patch.
2024-03-12 11:05:18 -04:00
Botond Dénes
b6f0c4efa0 test/boost/sstable_compaction_test: introduce scrub_test_framework
Scrub tests require a lot of boilerplate code to work. This has a lot of
disadvantages:
* Tests are long
* The "meat" of the test is lost between all the boiler-plate, it is
  hard to glean what a test actually does
* Tests are hard to write, so we have only a few of them and they test
  multiple things.
* The boiler-plate differs slightly from test-to-test.

To solve this, this patch introduces a new class, `scrub_test_framework`,
which is a central place for all the boiler-plate code needed to write
scrub-related tests. In the next patches, we will migrate scrub related
tests to this class.
2024-03-12 11:05:18 -04:00
Botond Dénes
e412673c44 test/lib/random_schema: add uncompatible_timestamp_generator()
Guarantees that produced mutations will not be compactible.
2024-03-12 11:05:18 -04:00
Pavel Emelyanov
3a734facc7 view_builder: Complete build step early if reader produces nothing
Builder works in "steps". Each step runs for a given base table, when a
new view is created it either initiates a step or appends to currently
running step.

Running a step means reading mutations from local sstables reader and
applying them to all views that have jumped into this step so far. When a
view is added to the step it remembers the current token value the step
is on. When step receives end-of-stream it rewinds to minimal-token.
Rewinding is done by closing current reader and creating a new one. Each
time token is advanced, all the views that meet the new token value for
the second time (i.e. -- scan full round) are marked as built and are
removed from step. When no views are left on step, it finishes.

The above machinery can break when rewinding the end-of-stream reader.
The trick is that a running step silently assumes that if the reader
once produced some token (and there can be a view that remembered this
token as its starting one), then after rewinding the reader would
generate the same token or greater. With tablets, however, that's not
the case. When a node is decommissioned tablets are cleaned and all
sstables are removed. Rewinding a reader after that creates an empty reader
that produces no tokens from then on. Consequently, any build steps that
had captured tokens prior to cleanup would get stuck forever.

The fix is to check if the mutation consumer stepped at least one step
forward after rewind, and if not, complete all the attached views.

fixes: #17293

A similar thing should happen if the base table is truncated with views
being built from it. Testing it trips a compaction assertion elsewhere
and needs more research.

refs: #17543

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17548
2024-03-12 14:58:47 +02:00
Kefu Chai
69f140eea6 test.py: s/summarize_tests/summarize_boost_tests/
summarize_tests() is only used to summarize boost tests, so reflect
this fact using its name. we will need to summarize the tests which
generate JUnit XML as well, so this change also prepares for a
following-up change to implement a new summarize helper.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17746
2024-03-12 14:49:01 +02:00
Pavel Emelyanov
def5fed619 api: Fix stats reported for row cache
Here are three endpoints in the api/cache_service that report "metrics"
for the row cache and the values they return

    - entries:  number of partitions
    - size:     number of partitions
    - capacity: used space

The size and capacity seem very inaccurate.

The comment says that in C* the size should be weighted, but Scylla doesn't
support weight of entries in cache. Also, capacity is configurable via
row_cache_size_in_mb config option or set_row_cache_capacity_in_mb API
call, but Scylla doesn't support either.

This patch suggests changing return values for size and capacity endpoints.

Although the row cache doesn't support weights, it's natural to return
used_space in bytes as the value, which is closer to what "size"
means than the number of entries.

The capacity may return the total memory size, because this is what
Scylla really does -- row cache growth is only limited by other memory
consumers, not by configured limits.

fixes: #9418

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17724
2024-03-12 13:44:59 +02:00
Pavel Emelyanov
a755914265 test/cql_query_test: Use string_view by value
The test carries const std::string_view& around, but the type is
a lightweight class that can be copied around at the same cost as its
reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17735
2024-03-12 13:44:04 +02:00
Kefu Chai
17fe4a6439 view_info: add fmt::formatter for view_info
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `view_info`, its operator<<
is dropped.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17745
2024-03-12 13:28:27 +02:00
Botond Dénes
f3735dc8e0 Merge 'utils: add fmt::formatter for utils types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* utils::human_readable_value
* std::strong_ordering
* std::weak_ordering
* std::partial_ordering
* utils::exception_container

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#17710

* github.com:scylladb/scylladb:
  utils/exception_container: add fmt::formatter for exception_container
  utils/human_readable: add fmt::formatter for human_readable_value
  utils: add fmt::formatter for std::strong_ordering and friends
2024-03-12 13:27:37 +02:00
Botond Dénes
8e90b856b5 Merge 'Extend test.py's ability to select test cases' from Pavel Emelyanov
This PR fixes comments left from #17481, namely

- adds case selection to boost suite
- describes the case selection in documentation

Closes scylladb/scylladb#17721

* github.com:scylladb/scylladb:
  docs: Add info about the ability to run specific test case
  test.py: Support case selection for boost tests
2024-03-12 13:21:50 +02:00
Kefu Chai
9c1d517bcc data_dictionary: drop unused friend declaration
the corresponding implementation of operator<< was dropped in
a40d3fc25b, so there is no need to
keep this friend declaration anymore.

also, drop `include <ostream>`, as this header does not reference
any of the ostream types with the change above.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17743
2024-03-12 09:45:15 +02:00
Kefu Chai
af3b69a4d1 Update seastar submodule
* seastar 5d3ee980...a71bd96d (51):
  > util: add formatter for optimized_optional<>
  > build: search protobuf using package config
  > reactor: Move pieces of scollectd to scollectd
  > reactor: Remove write-only task_queue._current
  > Add missing include in tests/unit/rpc_test.cc
  > doc/io_tester.md: include request_type::unlink in the docs
  > doc/io-tester.md: update obsolete information in io_tester docs
  > io_tester/conf.yaml: include an example of request_type::unlink job
  > io_tester: implement request_type::unlink
  > reactor: Print correct errno on io_submit failure
  > src/core/reactor.cc: qualify metric function calls with "sm::"
  > build: add shard_id.hh to seastar library
  > thread: speed up thread creation in debug mode
  > include: add missing modules.hh import to shard_id.hh
  > prometheus: avoid ambiguity when calling MetricFamily.set_name()
  > util/log: add formatter for log_level
  > util/log: use string_view for log_level_names
  > perf: Calculate length of name column in perf tests
  > rpc_test: add a test for inter-compressor communication
  > rpc: in multi_algo_compressor_factory, propagate send_empty_frame
  > rpc: give compressors a way to send something over the connection
  > rpc: allow (and skip) empty compressed frames
  > metrics: change value_vector type to std::deque
  > HACKING.md: remove doc related to test_dist
  > test/unit: do not check if __cplusplus > 201703L
  > json_elements: s/foramted/formatted/
  > iostream: Refactor input_stream::read_exactly_part
  > add unit test to verify str.starts_with(str), str.ends_with(str) return true.
  > str.starts_with(str) and str.ends_with(str) should return true, just like std::string
  > rpc: Remove FrameType::header_and_buffer_type
  > rpc: Defuturize FrameType::return_type
  > rpc: Kill FrameType::get_size()
  > treewide: put std::invocable<> constraints in template param list
  > include: do not include unuser headers
  > rpc: fix a deadlock in connection::send()
  > iostream: Replace recursion by iteration in input_stream::read_exactly_part
  > core/bitops.hh: use std::integral when appropriate
  > treewide: include <concepts> instead of seastar/util/concepts.hh
  > abortable_fifo: fix the indent
  > treewide: expand `SEASTAR_CONCEPT` macro
  > util/concepts: always define SEASTAR_CONCEPT
  > file: Remove unused thread-pool arg from directory lister
  > seastar-json2code: collect required_query_params using a list
  > seastar-json2code: reduce the indent level
  > seastar-json2code: indent the enum and array elements
  > seastar-json2code: generate code for enum type using Template
  > seastar-json2code: extract add_operation() out
  > reactor: Re-ifdef SIGSEGV sigaction installing
  > reactor: Re-ifdef reactor::enable_timer()
  > reactor: Re-ifdef task_histogram_add_task()
  > reactor: Re-ifdef install_signal_handler_stack()

Closes scylladb/scylladb#17714
2024-03-12 09:19:28 +02:00
Botond Dénes
3a7364525f Merge 'test/alternator: improve metrics tests' from Nadav Har'El
This small series improves the Alternator tests for metrics:
1. Improves some comments in the test.
2. Restores a test that was previously hidden by two tests having the same name.
3. Adds tests for latency histogram metrics.

Closes scylladb/scylladb#17623

* github.com:scylladb/scylladb:
  test/alternator: tests for latency metrics
  test/alternator: improve comments and unhide hidden test
2024-03-12 09:13:17 +02:00
Kefu Chai
35fc065458 utils/exception_container: add fmt::formatter for exception_container
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `exception_container<..>`
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-12 14:53:55 +08:00
Kefu Chai
9300d7b80b utils/human_readable: add fmt::formatter for human_readable_value
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for `utils::human_readable_value`,
and drop its operator<<

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-12 14:53:55 +08:00
Kefu Chai
007d7f1355 utils: add fmt::formatter for std::strong_ordering and friends
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* std::strong_ordering
* std::weak_ordering
* std::partial_ordering

and their operator<<:s are moved to test/lib/test_utils.{hh,cc}, as they
are only used by Boost.test.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-12 14:53:55 +08:00
Tomasz Grabiec
47a66d0150 Merge 'Handle tablet migration failure in wrapping-up stages' from Pavel Emelyanov
There are four stages left to handle: cleanup, cleanup_target, end_migration and revert_migration. All are handling removed nodes already, so the PR just extends the test.

fixes: #16527

Closes scylladb/scylladb#17684

* github.com:scylladb/scylladb:
  test/tablets_migration: Test revert_migration failure handling
  test/tablets_migration: Test end_migration failure handling
  test/tablets_migration: Test cleanup_target failure handling
  test/tablets_migration: Test cleanup failure handling
  test/tablets_migration: Prepare for do_... stages
  test/tablets_migration: Add ability to removenode via any other node
  test/tablets_migration: Wrap migration stages failing code into a helper class
  storage_service: Add failure injection to crash cleanup_tablet
2024-03-12 00:20:56 +01:00
Botond Dénes
c6cff53771 reader_concurrency_semaphore: use variable reference for metrics
Instead of a functor, for those metrics that just return the value of an
existing member variable. This is ever so slightly more efficient than a
functor.

Closes scylladb/scylladb#17726
2024-03-11 20:47:04 +02:00
Mikołaj Grzebieluch
cb17b4ac59 docs: maintenance socket: add section about accessing maintenance socket
Closes scylladb/scylladb#17701
2024-03-11 20:25:00 +02:00
Asias He
ebc0ab94e5 repair: Add ranges option support for tablet repair
The management tool, e.g., scylla manager, needs the ranges option to
select which ranges to repair on a node to schedule repair jobs.

This patch adds ranges option support.

E.g.,

curl -X POST "http://127.0.0.1:10000/storage_service/repair_async/ks1?ranges=-4611686018427387905:-1,4611686018427387903:9223372036854775807"

Fixes: #17416
Tests: test_tablet_repair_ranges_selection

Closes scylladb/scylladb#17436
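
The value of the ranges parameter in the example above is a comma-separated list of start:end token pairs. A client could split it as sketched below. This is an illustration in Python; `parse_repair_ranges` is a hypothetical helper, not part of Scylla, and whether each bound is inclusive or exclusive is not stated here, so the sketch only splits the string.

```python
from typing import List, Tuple


def parse_repair_ranges(ranges: str) -> List[Tuple[int, int]]:
    """Split 'start:end,start:end,...' into a list of integer token pairs."""
    parsed = []
    for item in ranges.split(","):
        # Tokens may be negative; the ':' separator is still unambiguous.
        start_text, end_text = item.split(":")
        parsed.append((int(start_text), int(end_text)))
    return parsed


if __name__ == "__main__":
    example = "-4611686018427387905:-1,4611686018427387903:9223372036854775807"
    for start, end in parse_repair_ranges(example):
        print(start, end)
```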
2024-03-11 20:03:12 +02:00
Nadav Har'El
d207962e40 test/alternator: tests for latency metrics
In test/alternator/test_metrics.py we had tests for the operation-count
metrics for different Alternator API operations, but not for the latency
histograms for these same operations. So this patch adds the missing
tests (and removes a TODO asking to do that).

Note that only a subset of the operations - PutItem, GetItem, DeleteItem,
UpdateItem, and GetRecords - currently have a latency histogram, and this
test verifies this. We have an issue (Refs #17616) about adding latency
histograms for more operations - at which point we will be able to expand
this test for the additional operations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-11 19:26:59 +02:00
Nadav Har'El
970c2dc7a6 test/alternator: improve comments and unhide hidden test
The original goal of this patch was to improve comments in
test/alternator/test_metrics.py, but while doing that I discovered
that one of the test functions was hidden by a second test with
the same name! So this patch also renames the second test.

The test continues to work after this patch - the hidden test
was successful.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
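
How one test can hide another with the same name follows from ordinary Python semantics: a second `def` with the same name rebinds the module attribute, so a collector that looks functions up by name sees only the last one. A minimal sketch, with a hypothetical module body and a simplified `collect_tests` helper standing in for a real test runner:

```python
MODULE_SOURCE = '''
def test_metric():
    return "first"

def test_metric():
    return "second"
'''


def collect_tests(source: str) -> dict:
    """Execute module source and return its test_* functions by name."""
    namespace: dict = {}
    exec(source, namespace)
    return {name: fn for name, fn in namespace.items() if name.startswith("test_")}


tests = collect_tests(MODULE_SOURCE)

if __name__ == "__main__":
    print(sorted(tests))
    print(tests["test_metric"]())
```

Only one `test_metric` is left in the namespace, and it is the second definition; the first never runs until one of the two is renamed.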
2024-03-11 19:26:59 +02:00
Pavel Emelyanov
0d5c25aef5 error_injection: De-template inject() with handler
The recently renamed inject_with_handler() was a template, but it can be
symmetrical to its peer that accepts void function as a callback, and
use std::function as its argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 19:32:21 +03:00
Pavel Emelyanov
1f44a374b8 error_injection: Overload inject() instead of inject_with_handler()
The inject_with_handler() method accepts a coroutine that can be called
with injection_handler. With such a function as an argument, there's no
need for a distinctive inject_with_handler() name for the method; it can be
an overload of the existing inject()-s

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 19:30:19 +03:00
Botond Dénes
7d31093d4b Merge 'storage_service/ownership: handle requests when tablets are enabled' from Patryk Wróbel
Before this change, when a user tried to use the
'storage_service/ownership/{keyspace}' API with a
keyspace parameter that uses tablets, an internal
error was thrown. The code was calling a function
that is intended for vnodes: get_vnode_effective_replication_map().

This commit introduces graceful handling of this scenario and
extends the API to allow passing a 'cf' parameter that denotes
the table name.

Now, when the keyspace uses tablets and the cf parameter is not passed,
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for a keyspace that uses tablets,
but they can query ownership for a table in a given keyspace that uses tablets.

Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.

Fixes: https://github.com/scylladb/scylladb/issues/17342

Closes scylladb/scylladb#17405

* github.com:scylladb/scylladb:
  storage_service/ownership: discard get_ownership() requests when tablets enabled
  storage_service/ownership/{keyspace}: handle requests when tablets are enabled
  locator/effective_replication_map: make 'get_ranges(inet_address ep)' virtual
  locator/tablets: add tablet_map::get_sorted_tokens()
  pylib/rest_client.py: add ownership API to ScyllaRESTAPIClient
  rest_api/test_storage_service: add simplistic tests of ownership API for vnodes
2024-03-11 14:55:26 +02:00
Kefu Chai
50c6fc1141 scylla-gdb: use current_scheduling_group_ptr instead of task_queue._current
Seastar removed `task_queue::_current` in
258b11220d343d8c7ae1a2ab056fb5e202723cc8. let's adapt scylla-gdb.py
accordingly. although `current_scheduling_group_ptr()` is an internal
API, it has been around for a while and is relatively stable. so let's use
it instead.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17720
2024-03-11 13:13:59 +02:00
Kamil Braun
65b4f754ff Merge 'gossiper: do_status_check: allow evicting dead nodes from membership with no host_id' from Benny Halevy
The short series allows do_status_check to handle down nodes that don't have HOST_ID application state.

Fixes #16936

Closes scylladb/scylladb#17024

* github.com:scylladb/scylladb:
  gossiper: do_status_check: fixup indentation
  gossiper: do_status_check: allow evicting dead nodes from membership with no host_id
  gossiper: print the host_id when endpoint state goes UP/DOWN
  gossiper: get_host_id: differentiate between no endpoint_state and no application_state
  gms: endpoint_state: add get_host_id
  gossiper: do_status_check: continue loop after evicting FatClient
2024-03-11 11:21:49 +01:00
Kefu Chai
e1dbfedcdb service: add fmt::formatter for service/storage_proxy.cc types
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for internal types in service/storage_proxy.cc.
please note, `service::storage_proxy::remote::read_verb` is extracted out of
the outer class, because the class's implementation formats `read_verb` within
the class, so the formatter has to be placed where its callers can see it.
that's why it is moved up and out of `service::storage_proxy::remote`.

some of the operator<<:s are preserved, as they are still being used by
the existing formatters, for instance, the one for
`seastar::shared_ptr<>`, which is used to print
`seastar::shared_ptr<service::paxos_response_handler>`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17708
2024-03-11 11:52:58 +02:00
Kefu Chai
1ab30fc306 clustering_bounds_comparator: add fmt::formatter for bound_{kind,view}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `bound_kind` and `bound_view`,
and drop the latter's operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17706
2024-03-11 11:37:48 +02:00
Botond Dénes
1e7180de57 Update tools/java submodule
* tools/java e4878ae7...d61296dc (1):
  > build.xml: update scylla-driver-core to 3.11.5.2

Closes scylladb/scylladb#17722
2024-03-11 11:36:29 +02:00
Amnon Heiman
8b43609920 alternator: Use summary for shard-level latencies.
Shard-level latencies generate a lot of metrics. This patch reduces the
number of latencies reported by Alternator while keeping the same
functionality.

On the shard level, summaries will be reported instead of histograms.
On the instance level, an aggregated histogram will be reported.

Summaries, histograms, and counters are marked with skip_when_empty.

Fixes #12230

Closes scylladb/scylladb#17581
2024-03-11 11:12:08 +02:00
Patryk Wrobel
9eb91b5526 storage_service/ownership: discard get_ownership() requests when tablets enabled
This change introduces logic that checks whether
tablets are enabled for any keyspace when
get_ownership() is invoked.

Without it, the result would be calculated
based solely on sorted_tokens(), which is
invalid when tablets are enabled.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:52:25 +01:00
Patryk Wrobel
51da80da7d storage_service/ownership/{keyspace}: handle requests when tablets are enabled
Before this change, when a user tried to use the
'storage_service/ownership/{keyspace}' API with a
keyspace parameter that uses tablets, an internal
error was thrown. The code was calling a function
that is intended for vnodes: get_vnode_effective_replication_map().

This commit introduces graceful handling of this scenario and
extends the API to allow passing a 'cf' parameter that denotes
the table name.

Now, when the keyspace uses tablets and the cf parameter is not passed,
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for a keyspace that uses tablets,
but they can query ownership for a table in a given keyspace that uses tablets.

Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:52:23 +01:00
Patryk Wrobel
75aadeb32f locator/effective_replication_map: make 'get_ranges(inet_address ep)' virtual
Before this patch, the mentioned function was a specific
member of vnode_effective_replication_strategy class.
To allow its usage also when tablets are enabled it was
shifted to the base class - effective_replication_strategy
and made pure virtual to force the derived classes to
implement it.

It is used by 'storage_service::get_ranges_for_endpoint()'
that is used in calculation of effective ownership. Such
calculation needs to be performed also when tablets are
enabled.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:50:20 +01:00
Patryk Wrobel
3fff6bd407 locator/tablets: add tablet_map::get_sorted_tokens()
This change introduces a new member function that
returns a vector of sorted tokens where each pair of adjacent
elements depicts a range of tokens that belong to a tablet.

It will be used to produce the equivalent of sorted_tokens() of
vnodes when trying to use dht::describe_ownership() for tablets.
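
As a rough illustration of the "adjacent pairs" idea, here is a minimal Python sketch. It is not the C++ implementation; `token_ranges` is a hypothetical helper, the token values are made up, and whether each range bound is inclusive or exclusive is not stated in this message, so the sketch makes no claim about it.

```python
from typing import List, Tuple


def token_ranges(sorted_tokens: List[int]) -> List[Tuple[int, int]]:
    """Pair each token with its successor (hypothetical helper)."""
    # N sorted tokens yield N - 1 adjacent pairs.
    return list(zip(sorted_tokens, sorted_tokens[1:]))


# Made-up token values, for illustration only.
print(token_ranges([-100, 0, 100]))  # [(-100, 0), (0, 100)]
```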

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:50:20 +01:00
Patryk Wrobel
a39a5b671e pylib/rest_client.py: add ownership API to ScyllaRESTAPIClient
This change adds a member function that can be used
to access 'storage_service/ownership' API.

It will be used by tests that need to access this API.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:50:20 +01:00
Patryk Wrobel
dea76c4763 rest_api/test_storage_service: add simplistic tests of ownership API for vnodes
This change is intended to introduce tests for vnodes for
the following API paths:
 - 'storage_service/ownership'
 - 'storage_service/ownership/{keyspace}'

In next patches the logic that is tested will be adjusted
to work correctly when tablets are enabled. This is a safety
net that ensures that the logic is not broken.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:50:20 +01:00
Kefu Chai
38ae52d5cd add fmt::formatter for reader_permit::state and reader_resources
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* reader_permit::state
* reader_resources

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17707
2024-03-11 09:55:51 +02:00
Kefu Chai
ca7b73f34e tools/scylla-nodetool: use constexpr for compile-time format check
instead of using fmt::runtime format string, use compile-time
format string, so that we can have compile-time format check provided
by {fmt}.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17709
2024-03-11 09:45:32 +02:00
Pavel Emelyanov
3453a934ba docs: Add info about the ability to run specific test case
The test.py usage is documented, including the ability to run a specific
test by its name. Extend the doc with the new ability to run a
specific test case as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 09:10:20 +03:00
Pavel Emelyanov
3afbd21faa test.py: Support case selection for boost tests
Boost tests support case-by-case execution and always turn it on -- when
run, boost test is split into parallel-running sub-tests each with the
specific case name.

This patch tunes this, so that when a test is run like

   test.py boost/testname::casename

No parallel-execution happens, but instead just the needed casename is
run. Example of selection:

   test.py --mode=${mode} boost/bptree_test::test_cookie_find

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 09:09:10 +03:00
Pavel Emelyanov
feae470475 test/tablets_migration: Test revert_migration failure handling
This stage is also the error path that starts from write_both_read_old,
so check this failure in two steps -- first fail the latter stage in one
of the nodes, then fail the former in another.

For that one more node in the cluster is needed.

Also, to avoid name conflicts, the do_revert_migration pseudo stage name
is used.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 08:16:13 +03:00
Pavel Emelyanov
c3d96b1a86 test/tablets_migration: Test end_migration failure handling
This stage is a pure barrier. Barriers already take ignored nodes into
account, and so does the fail-injector, so just wire the stage name into the
test.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 08:16:13 +03:00
Pavel Emelyanov
180446e7b8 test/tablets_migration: Test cleanup_target failure handling
This stage is an error path, so in order to fail it we need to fail some
other stage prior to that. This leads to the testing sequence of

1. fail streaming via source node
2. stop and remove source node to let state machine proceed
3. fail cleanup_target on the destination node
4. stop and remove destination node

First thing to note here, is that the test doesn't fail source node for
cleanup_target stage, symmetrically to how it does for cleanup stage.

Next, since we're removing two nodes, the cluster is equipped with more
nodes to have raft quorum.

Finally, since remove of source node doesn't finish until tablet
migration finishes, it's impossible to remove destination node via the
same node-0, so the 2nd removenode happens via node-3.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 08:16:13 +03:00
Pavel Emelyanov
724c79ecf6 test/tablets_migration: Test cleanup failure handling
The handling itself is already there -- if the leaving node is excluded
the cleanup stage resolves immediately. So just add code that
validates that.

Also, skip testing of pending replica failure during cleanup stage, as
it doesn't really participate in it any longer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 08:16:13 +03:00
Pavel Emelyanov
ccefb7f21f test/tablets_migration: Prepare for do_... stages
The tablets migration test is parametrized with stage name to inject
failure in. Internal class node_failer uses this parameter as is when
injecting a failure into scylla barrier handler.

Next patch will need to extend the test with revert_migration value and
add handling of this name to node_failer class. The node_failer class,
in turn, will want to instantiate two other instances of the same class
-- one to fail the write_both_read_old stage, and the other one to fail
the revert_migration barrier. So internally the class will need to tell
revert_migration value as full test parameter from revert_migration as
barrier-only parameter.

This test adds the ability to add do_ prefix to node_failer parameter to
tell full test from barrier-only. When injecting a failure into scylla
the do_ prefix needs to be cut off, since scylla still needs to fail the
barrier named revert_migration, not do_revert_migration.
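
The prefix handling described above could look roughly like the following Python sketch. The helper names `is_barrier_only` and `barrier_name` are hypothetical and are not taken from the test; only the `do_` prefix and the stage names come from this message.

```python
DO_PREFIX = "do_"


def is_barrier_only(stage: str) -> bool:
    """True when the parameter carries the 'do_' marker (hypothetical helper)."""
    return stage.startswith(DO_PREFIX)


def barrier_name(stage: str) -> str:
    """Return the name to inject into scylla, with any 'do_' prefix cut off."""
    return stage[len(DO_PREFIX):] if is_barrier_only(stage) else stage


print(barrier_name("do_revert_migration"))  # revert_migration
print(barrier_name("write_both_read_old"))  # write_both_read_old
```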

Also split the long line while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 07:56:58 +03:00
Pavel Emelyanov
abbd22cb90 test/tablets_migration: Add ability to removenode via any other node
Currently the test calls removenode via node-0 in the cluster, which is
always alive. Next test case will need to call removenode on some other
node (more details in that patch later).

refs: #17681

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 07:56:55 +03:00
Pavel Emelyanov
5d3291f322 test/tablets_migration: Wrap migration stages failing code into a helper class
One of the next stages will need to use two of them at the same time and
it's going to be easier if the failing code is encapsulated.

No functional changes here, just large portions of code and local
variables are moved into class and its methods.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 07:56:55 +03:00
Pavel Emelyanov
82270e3ec4 storage_service: Add failure injection to crash cleanup_tablet
Will be needed by test that verifies how failures in tablets migration
stages are handled by state machine

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 07:56:55 +03:00
Benny Halevy
9804ce79d8 gossiper: do_status_check: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-10 20:17:00 +02:00
Benny Halevy
1375c4e6a3 gossiper: do_status_check: allow evicting dead nodes from membership with no host_id
Be more permissive about the presence of host_id
application state for dead and expired nodes in release mode,
so do not throw runtime_error in this case, but
rather consider them as non-normal token owners.
Instead, call on_internal_error_noexcept that will
log the internal error and a backtrace, and will abort
if abort-on-internal-error is set.

This was seen when replacing dead nodes,
without https://github.com/scylladb/scylladb/pull/15788

Fixes #16936

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-10 20:17:00 +02:00
Benny Halevy
f32efcb7a6 gossiper: print the host_id when endpoint state goes UP/DOWN
The host_id is now used in token_metadata
and in raft topology changes so print it
when the gossiper marks the node as UP/DOWN.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-10 20:17:00 +02:00
Benny Halevy
fbf85ee199 gossiper: get_host_id: differentiate between no endpoint_state and no application_state
Currently, we throw the same runtime_error:
`Host {} does not have HOST_ID application_state`
in both cases: when there is no endpoint_state
and when the endpoint_state has no HOST_ID
application state.

The latter case is unexpected, especially
after 8ba0decda5
(and also from the add_saved_endpoint path
after https://github.com/scylladb/scylladb/pull/15788
is merged), so throw a different error in each case
so we can tell them apart in the logs.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-10 20:16:49 +02:00
Benny Halevy
a9fb0cf3dc gms: endpoint_state: add get_host_id
A simpler getter to get the HOST_ID application state
from the endpoint_state.

Return a null host_id if the application state is not found.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-10 15:19:51 +02:00
Benny Halevy
234774295e gossiper: do_status_check: continue loop after evicting FatClient
We're seeing cases like #16936:
```
INFO  2024-01-23 02:14:19,915 [shard 0:strm] gossip - failure_detector_loop: Mark node 127.0.23.4 as DOWN
INFO  2024-01-23 02:14:19,915 [shard 0:strm] gossip - InetAddress 127.0.23.4 is now DOWN, status = BOOT
INFO  2024-01-23 02:14:27,913 [shard 0: gms] gossip - FatClient 127.0.23.4 has been silent for 30000ms, removing from gossip
INFO  2024-01-23 02:14:27,915 [shard 0: gms] gossip - Removed endpoint 127.0.23.4
WARN  2024-01-23 02:14:27,916 [shard 0: gms] gossip - === Gossip round FAIL: std::runtime_error (Host 127.0.23.4 does not have HOST_ID application_state)
```

Since the FatClient timeout handling already evicts the endpoint
from membership, there is no need to check further if the
node is dead and expired, so just co_return.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-10 15:19:51 +02:00
Nadav Har'El
af90910687 Merge 'repair: add fmt::formatter for repair types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* repair_hash
* read_strategy
* streaming::stream_summary

and drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17711

* github.com:scylladb/scylladb:
  repair: add fmt::formatter for streaming::stream_summary
  repair: add fmt::formatter for read_strategy
  repair: add fmt::formatter for repair_hash
2024-03-10 12:15:15 +02:00
Kefu Chai
5687c289f4 repair: add fmt::formatter for streaming::stream_summary
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for streaming::stream_summary, and
drop its operator<<

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-09 23:43:32 +08:00
Kefu Chai
7be93084b3 repair: add fmt::formatter for read_strategy
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for read_strategy, and drop its
operator<<

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-09 23:42:19 +08:00
Kefu Chai
39ee8593cb repair: add fmt::formatter for repair_hash
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for repair_hash.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-09 23:41:58 +08:00
Botond Dénes
9f97d21339 Merge 'Enhance perf-simple-query test' from Pavel Emelyanov
While measuring #17149 with this test some changes were applied, here they are

- keep initial_tablets number in output json's parameters section
- disable auto compaction
- add control over the amount of sstables generated for --bypass-cache case

Closes scylladb/scylladb#17473

* github.com:scylladb/scylladb:
  perf_simple_query: Add --memtable-partitions option
  perf_simple_query: Disable auto compaction
  perf_simple_query: Keep number of initial tablets in output json
2024-03-08 15:21:04 +02:00
Kefu Chai
079d70145e raft: add fmt::formatter for raft tracker types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* raft::election_tracker
* raft::votes
* raft::vote_result

and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17670
2024-03-08 15:19:37 +02:00
Piotr Smaroń
44bbf2e57b test.py: improve readability of failures resulting in empty XML
Before the change, when a test failed because of some error
in the `cql_test_env.cc`, we were getting:
```
error: boost/virtual_table_test: failed to parse XML output '/home/piotrs/src/scylla2/testlog/debug/xml/boost.virtual_table_test.test_system_config_table_read.1.xunit.xml': no element found: line 1, column 0
```
After the change we're getting:
```
error: boost/virtual_table_test: Empty testcase XML output, possibly caused by a crash in the cql_test_env.cc, details: '/home/piotrs/src/scylla2/testlog/debug/xml/boost.virtual_table_test.test_system_config_table_read.1.xunit.xml': no element found: line 1, column 0
```
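
A minimal Python sketch of this kind of error handling is shown below. It is not the actual test.py code: `parse_xunit` is a hypothetical helper, and the message wording is shortened from the example above. It relies only on `xml.etree.ElementTree` raising `ParseError` for empty input.

```python
import xml.etree.ElementTree as ET


def parse_xunit(text: str, path: str) -> ET.Element:
    """Parse xunit XML, reporting empty output separately (hypothetical helper)."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as e:
        if not text.strip():
            # Empty output usually means the test never got to write its report.
            raise RuntimeError(
                f"Empty testcase XML output, possibly caused by a crash, "
                f"details: '{path}': {e}"
            ) from e
        raise RuntimeError(f"failed to parse XML output '{path}': {e}") from e
```

Keeping the original parser message in the `details` part preserves the information that was printed before the change.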

Closes scylladb/scylladb#17679
2024-03-08 15:17:12 +02:00
Kefu Chai
362a8a777c partition_snapshot_row_cursor: add fmt::format to this class
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`partition_snapshot_row_cursor`, and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17669
2024-03-08 15:15:43 +02:00
Botond Dénes
630be97d2f Merge 'tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"' from Kefu Chai
before this change, "ring" subcommand has two issues:

1. `--resolve-ip` option accepts a boolean argument, but this option
   should be a switch, which does not accept any argument at all
2. it always prints the endpoint no matter if `--resolve-ip` is
   specified or not. but it should print the resolved name, instead
   of an IP address if `--resolve-ip` is specified.

in this change, both issues are addressed. and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.

Closes scylladb/scylladb#17553

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"
  test/nodetool: calc max_width from all_hosts
  test/nodetool: keep tokens as Host's member
  test/nodetool: remove unused import
2024-03-08 15:15:19 +02:00
Pavel Emelyanov
fc9fb03b90 cql3: Remove unused cf_name::operator<<
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17686
2024-03-08 15:14:52 +02:00
Nadav Har'El
ba585905e5 Update tools/java submodule
* tools/java 5e11ed17...e4878ae7 (2):
  > nodetool: fix a typo in error message
  > bin/cassandra-stress: Add extended version info

Closes scylladb/scylladb#17680
2024-03-08 15:14:21 +02:00
Kefu Chai
f5f5ff1d51 clustering_interval_set: add fmt::formatter for clustering_interval_set
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for `clustering_interval_set`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17593
2024-03-08 15:13:14 +02:00
Kefu Chai
9b5ec53355 tombstone_gc_options: add fmt::formatter for tombstone_gc_mode
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `tombstone_gc_mode`, and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17673
2024-03-08 15:12:00 +02:00
Kefu Chai
8ca672a02c test/pylib: return better error if self.create_server() raises
in `ScyllaServer::add_server()`, `self.create_server()` is called to
create a server, but if it raises, we would reference a local variable
of `server` which is not bound to any value, as `server` is not assigned
at that moment. if `ScyllaServer` is used by `ScyllaClusterManager`, we
would not be able to see the real exception apart from the error like

```
cannot access local variable 'server' where it is not associated with a
value
```

which is but the error from Python runtime.

in this change, `server` is always initialized, and we check for None
before dereferencing it.
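
The pattern can be sketched in a few lines of Python. This is an illustration under assumptions, not the pylib code: `FakeServer`, `create_server` and `add_server` here are hypothetical stand-ins.

```python
from typing import Optional


class FakeServer:
    """Hypothetical stand-in for a server object."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True


def create_server(fail: bool) -> FakeServer:
    """Hypothetical factory that can fail before returning anything."""
    if fail:
        raise RuntimeError("failed to create server")
    return FakeServer()


def add_server(fail: bool) -> FakeServer:
    server: Optional[FakeServer] = None  # bound before anything can raise
    try:
        server = create_server(fail)
        return server
    except Exception:
        # Without the initialization above, touching `server` here would raise
        # UnboundLocalError and hide the original exception.
        if server is not None:
            server.stop()
        raise
```

With the initialization in place, the caller sees the original `RuntimeError` rather than the Python runtime's complaint about an unbound local variable.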

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17693
2024-03-08 15:10:27 +02:00
Kefu Chai
70ef7e63b5 tools: toolchain: prepare: do not bail out when checking for command
before this change, if `buildah` is not available in $PATH, this script
fails like:
```console
$ tools/toolchain/prepare --help
tools/toolchain/prepare: line 3: buildah: command not found
```

the error message never gets a chance to show up, as `set -e` in the
shebang line just lets bash quit.

after this change, we check for the existence of buildah, and bail out
if it is not available. so, on a machine without buildah around, we now
have:
```console
$ tools/toolchain/prepare  --help
install buildah 1.19.3 or later
```

the same applies to "reg".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17697
2024-03-08 15:09:21 +02:00
Botond Dénes
05307d0be9 Merge 'service: add fmt::formatter for service types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* service::fencing_token
* service::topology::transition_state
* service::node_state
* service::topology_request
* service::global_topology_request
* service::raft_topology_cmd::command
* service::paxos::proposal
* service::paxos::promise

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#17692

* github.com:scylladb/scylladb:
  service/paxos: add fmt::formatter for paxos::promise
  service/paxos: add fmt::formatter for paxos::proposal
  service: add fmt::formatter for topology_state_machine types
2024-03-08 15:06:07 +02:00
Botond Dénes
505f137cc9 Merge 'Make object_store suite use ManagerClient' from Pavel Emelyanov
The test cases in this suite need to start scylla with custom config options, restart it and call API on it. By the time the suite was created all this wasn't possible with any library facility, so the suite carries its version of managed_cluster class that piggy-backs cql-pytest scylla starting. Now test.py has pretty flexible manager that provides all the scylla cluster management object_store suite needs. This PR makes the suite use the manager client instead of the home-brew managed_cluster thing

refs: #16006
fixes: #16268

Closes scylladb/scylladb#17292

* github.com:scylladb/scylladb:
  test/object_store: Remove unused managed_cluster (and other stuff)
  test/object_store: Use tmpdir fixture in flush-retry case
  test/object_store: Turn flush-retry case to use ManagerClient
  test/object_store: Turn "misconfigured" case to use ManagerClient
  test/object_store: Turn garbage-collect case to use ManagerClient
  test/object_store: Turn basic case to use ManagerClient
  test/object_store: Prepare to work with ManagerClient
2024-03-08 15:04:46 +02:00
Tomasz Grabiec
85ae10f632 Merge 'Make it possible to run individual pytest cases with test.py' from Pavel Emelyanov
Today's test.py allows filtering tests to run with the `test.py --options name` syntax. The "name" argument is then considered to be some prefix, and when iterating tests only those whose name starts with that prefix are collected and executed. This has two troubles.

Minor: since it is prefix filtering, running e.g. topology_custom/test_tablets will run test_tablets _and_ test_tablets_migration from it. There's no way to exclude the latter from this selection. It's not common, but careful file names selection is welcome for better ~~user~~ testing experience.

Major: most of test files in topology and python suites contain many cases, some are extremely long. When the intent is to run a single, potentially fast, test case one needs to either wait or patch the test .py file by hand to somehow exclude unwanted test cases.

This PR adds the ability to run individual test case with test.py. The new syntax is `test.py --options name::case`. If the "::case" part is present two changes apply.

First, the test file selection is done by name match, not by prefix match. So running topology_custom/test_tablets will _not_ select test_tablets_migration from it.

Second, the "::case" part is appended to the pytest execution so that it collects and runs only the specified test case.
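
The selection rule described above can be sketched in Python as follows. This is a minimal illustration, not the actual test.py code: `split_selector` and `select_files` are hypothetical helpers and `test_some_case` is a made-up case name. Only the `name::case` syntax and the prefix versus exact-match distinction come from the description.

```python
from typing import List, Optional, Tuple


def split_selector(selector: str) -> Tuple[str, Optional[str]]:
    """Split 'name::case' into (name, case); case is None when '::' is absent."""
    name, sep, case = selector.partition("::")
    return name, (case if sep else None)


def select_files(available: List[str], selector: str) -> List[str]:
    """Prefix match without '::case', exact name match with it."""
    name, case = split_selector(selector)
    if case is None:
        return [t for t in available if t.startswith(name)]
    return [t for t in available if t == name]


tests = ["topology_custom/test_tablets", "topology_custom/test_tablets_migration"]
# Prefix match selects both files.
print(select_files(tests, "topology_custom/test_tablets"))
# Exact match selects only test_tablets; the case name would go to pytest.
print(select_files(tests, "topology_custom/test_tablets::test_some_case"))
```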

Closes scylladb/scylladb#17481

* github.com:scylladb/scylladb:
  test.py: Add test-case splitting in 'name' selection
  test.py: Add casename argument to PythonTest
2024-03-08 12:56:39 +01:00
Kamil Braun
ae954fb2ec test: unflake test_tablets_removenode
These tests are inserting data into RF=3 tables, but used the default
consistency level which is taken from the default execution profile
which is set to LOCAL_QUORUM. The tests would then read with CL=ONE, so
we cannot give a guarantee that some of the data won't be missed. Fix
this by inserting the data with CL=ALL. (Do it for all RF cases for
simplicity.)
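
The arithmetic behind this can be checked with a small Python sketch. It assumes a single datacenter, so LOCAL_QUORUM behaves like QUORUM, and uses the usual rule that a read is guaranteed to see a write only when the write and read replica counts sum to more than RF. The function names are hypothetical.

```python
def replicas_required(cl: str, rf: int) -> int:
    """Replicas that must respond for a consistency level (single DC assumed)."""
    if cl == "ONE":
        return 1
    if cl in ("QUORUM", "LOCAL_QUORUM"):
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {cl}")


def read_sees_write(write_cl: str, read_cl: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf


print(read_sees_write("LOCAL_QUORUM", "ONE", 3))  # False: 2 + 1 is not > 3
print(read_sees_write("ALL", "ONE", 3))           # True: 3 + 1 > 3
```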

Fixes scylladb/scylladb#17695

Closes scylladb/scylladb#17700
2024-03-08 12:47:47 +01:00
Benny Halevy
8456967012 tablets: read_tablet_mutations: unfreeze_gently
Use co_await unfreeze_gently in the loop body
unfreezing each partition mutation to prevent
reactor stalls when building group0 snapshot
with lots of tablets.

Fixes #15303

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17688
2024-03-08 10:52:39 +01:00
Yaron Kaikov
ad842e5ad7 [mergify] Fix wrong label and base branch for backport pr
This PR contains 2 fixes for mergify config file:
1) When opening a backport PR, the base branch should be `branch-x.y`

2) Once a commit is promoted, we should add the label
   `promoted-to-master`. In the 5.4 configuration we were using the wrong
   label; fix it.

Closes scylladb/scylladb#17698
2024-03-08 10:08:09 +01:00
Kamil Braun
76fb902858 test: unflake test_topology_remove_garbage_group0
The test is booting nodes, and then immediately starts shutting down
nodes and removing them from the cluster. The shutting down and
removing may happen before driver manages to connect to all nodes in the
cluster. In particular, the driver didn't yet connect to the last
bootstrapped node. Or it can even happen that the driver has connected,
but the control connection is established to the first node, and the
driver fetched topology from the first node when the first node didn't
yet consider the last node to be normal. So the driver decides to close
connection to the last node like this:
```
22:34:03.159 DEBUG> [control connection] Removing host not found in
   peers metadata: <Host: 127.42.90.14:9042 datacenter1>
```

Eventually, at the end of the test, only the last node remains, all
other nodes have been removed or stopped. But the driver does not have a
connection to that last node.

Fix this problem by ensuring that:
- all nodes see each other as NORMAL,
- the driver has connected to all nodes
at the beginning of the test, before we start shutting down and removing
nodes.

Fixes scylladb/scylladb#16373

Closes scylladb/scylladb#17676
2024-03-08 10:08:09 +01:00
Mikołaj Grzebieluch
a0915115c3 maintenance_socket: change log message to differentiate from regular CQL ports
Scylla-ccm uses function `wait_for_binary_interface` that waits for
scylla logs to print "Starting listening for CQL clients". If this log
is printed far before the regular cql_controller is initialized,
scylla-ccm assumes too early that node is initialized.
It can result in timeouts that throw errors, for example in the function
`watch_rest_for_alive`.

Closes scylladb/scylladb#17496
2024-03-08 10:08:09 +01:00
Nadav Har'El
ea53db379f Merge 'tools/scylla-nodetool: listsnapshot: make it compatible with origin' from Botond Dénes
The following incompatibilities were identified by `listsnapshots_test.py` in dtests:
* Command doesn't bail out when there are no snapshots; instead it prints a meaningless empty report
* Formatting is incompatible

Both are fixed in this mini-series.

Closes scylladb/scylladb#17541

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
  tools/scylla-nodetool: listsnapshots: bail out if there are no snapshots
2024-03-08 10:08:09 +01:00
Kefu Chai
185b503b73 service/paxos: add fmt::formatter for paxos::promise
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `service::paxos::promise`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-08 14:26:58 +08:00
Kefu Chai
cb6c7bb9bf service/paxos: add fmt::formatter for paxos::proposal
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `service::paxos::proposal`,
but its operator<< is preserved, as it is still used by our generic
formatter for std::tuple<> which uses operator<< for printing the
elements in it, so operator<< of this class is indirectly used.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-08 14:26:58 +08:00
Kefu Chai
14cb48eb0a service: add fmt::formatter for topology_state_machine types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* service::fencing_token
* service::topology::transition_state
* service::node_state
* service::topology_request
* service::global_topology_request
* service::raft_topology_cmd::command

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-08 14:05:45 +08:00
Kefu Chai
de276901f2 tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"
before this change, "ring" subcommand has two issues:

1. `--resolve-ip` option accepts a boolean argument, but this option
   should be a switch, which does not accept any argument at all
2. it always prints the endpoint no matter if `--resolve-ip` is
   specified or not. but it should print the resolved name, instead
   of an IP address if `--resolve-ip` is specified.

in this change, both issues are addressed. and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:29:31 +08:00
Kefu Chai
d927ee8d8f test/nodetool: calc max_width from all_hosts
for better readability. as `token_to_endpoint` is but a derived
variable from `all_hosts`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:28:54 +08:00
Kefu Chai
4a748c7fb0 test/nodetool: keep tokens as Host's member
to be more consistent with the test_status.py.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:28:54 +08:00
Kefu Chai
aefc385786 test/nodetool: remove unused import
and add two empty lines in between global functions

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:28:54 +08:00
Botond Dénes
b69ee6bc27 Merge 'Fix load-and-stream for tablets' from Raphael "Raph" Carvalho
It might happen that multiple tablets co-habit the same shard, so we want load-and-stream to jump into a new streaming session for every tablet, such that the receiver will have the data properly segregated. That's a similar treatment we gave to repair. Today, load-and-stream fails due to sstables spanning more than 1 tablet in the receiver.

Synchronization with migration is done by taking replication map, so migrations cannot advance while streaming new data. A bug was fixed too, where data must be streamed to pending replicas too, to handle case where migration is ongoing and new data must reach both old and new replica set. A test was added stressing this synchronization path.

Another bug was fixed in sstable loading, which expected sharder to not be invalidated throughout the operation, but that breaks during migrations.

Fixes #17315.

Closes scylladb/scylladb#17449

* github.com:scylladb/scylladb:
  test: test_tablets: Add load-and-stream test
  sstables_loader: Stream to pending tablet replica if needed
  sstables_loader: Implement tablet based load-and-stream
  sstables_loader: Virtualize sstable_streamer for tablet
  sstables_loader: Avoid reallocations in vector
  sstable_loader: Decouple sstable streaming from selection
  sstables_loader: Introduce sstable_streamer
  Fix online SSTable loading with concurrent tablet migration
2024-03-07 14:18:30 +02:00
Nadav Har'El
19bcea6216 materialized views: fix rare failure caused by empty update
This one-line patch fixes a failure in the dtest

        lwt_schema_modification_test.py::TestLWTSchemaModification
        ::test_table_alter_delete

Where an update sometimes failed due to an internal server error, and the
log had the mysterious warning message:

        "std::logic_error (Empty materialized view updated)"

We've also seen this log-message in the past in another user's log, and
never understood what it meant.

It turns out that the error message was generated (and warning printed)
while building view updates for a base-table mutation, and noticing that
the base mutation contains an *empty* row - a row with no cells or
tombstone or anything whatsoever. This case was deemed (8 years ago,
in d5a61a8c48) unexpected and nonsensical,
and we threw an exception. But this case actually *can* happen - here is
how it happened in test_table_alter_delete - which is a test involving
a strange combination of materialized views, LWT and schema changes:

 1. A table has a materialized view, and also a regular column "int_col".
 2. A background thread repeatedly drops and re-creates this column
    int_col.
 3. Another thread deletes rows with LWT ("IF EXISTS").
 4. These LWT operations each reads the existing row, and because of
    repeated drop-and-recreate of the "int_col" column, sometimes this
    read notices that one node has a value for int_col and the other
    doesn't, and creates a read-repair mutation setting int_col (the
    difference between the two reads includes just in this column).
 5. The node missing "int_col" receives this mutation which sets only
    int_col. It upgrade()s this mutation to its most recent schema,
    which doesn't have int_col, so it removes this column from the
    mutation row - and is left with a completely empty mutation row.
    This completely empty row is not useful, but upgrade() doesn't
    remove it.
 6. The view-update generation code sees this empty base-mutation row
    and fails it with this std::logic_error.
 7. The node which sent the read-repair mutation sees that the read
    repair failed, so it fails the read and therefore fails the LWT
    delete operation.
    It is this LWT operation which failed in the test, and caused
    the whole test to fail.

The fix is trivial: an empty base-table row mutation should simply be
*ignored* when generating view updates - it shouldn't cause any error.
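
The idea of the fix can be sketched as follows. This is only an illustration under simplifying assumptions: `row_mutation` and `rows_needing_view_update` are hypothetical names, not the types or functions used in the actual view-update code.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a base-table row mutation.
struct row_mutation {
    std::map<std::string, std::string> cells;  // column name -> value
    std::optional<long> tombstone;             // deletion timestamp, if any

    // A row with no cells and no tombstone carries no information.
    bool empty() const { return cells.empty() && !tombstone; }
};

// Returns the rows that should produce view updates. Empty rows are
// skipped silently instead of being reported as a logic error.
std::vector<row_mutation> rows_needing_view_update(const std::vector<row_mutation>& rows) {
    std::vector<row_mutation> out;
    for (const auto& r : rows) {
        if (r.empty()) {
            continue;  // nothing to propagate to the view
        }
        out.push_back(r);
    }
    return out;
}
```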

Before this patch, test_table_alter_delete used to fail in roughly
20% of the runs on my laptop. After this patch, I ran it 100 times
without a single failure.

Fixes #15228
Fixes #17549

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17607
2024-03-07 12:00:43 +02:00
Botond Dénes
09068d20ea tools/scylla-nodetool: scrub: make keyspace parameter optional
When no keyspace is provided, request all keyspaces from the server,
then scrub all of them. This is what the legacy nodetool does; for some
reason this was missed when re-implementing scrub.

Closes scylladb/scylladb#17495
2024-03-07 11:15:46 +02:00
Tomasz Grabiec
ec6ed18b5c Merge 'Handle tablet migration failure in barrier stages' from Pavel Emelyanov
There are 4 barrier-only stages when migrating a tablet and the test needs to fail pending/leaving replica that handles it in order to validate how coordinator handles dead node. Failing the barrier is done by suspending it with injection code and stopping the node without waking it up. The main difficulty here is how to tell one barrier RPC call from another, because they don't have anything onboard that could tell which stage the barrier is run for. This PR suggests that barrier injection code looks directly into the system.tablets table for the transition stage, the stage is already there by the time barrier is about to ack itself over RPC.

refs: #16527

Closes scylladb/scylladb#17450

* github.com:scylladb/scylladb:
  topology.tablets_migration: Handle failed use_new
  topology.tablets_migration: Handle failed write_both_read_new
  topology.tablets_migration: Handle failed write_both_read_old
  topology.tablets_migration: Handle failed allow_write_both_read_old
  test/tablets_migration: Add conditional break-point into barrier handler
  replica: Add helper to read tablet transition stage
  topology_coordinator: Add action_failed() helper
2024-03-07 09:56:13 +01:00
Botond Dénes
5dfaa69bde tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
The author (me) tried to be clever and fix the formatting, but then he
realized this just means a lot of unnecessary fighting with tests. So
this patch makes the formatting compatible with that of the legacy
nodetool:
* Use compatible rounding and precision formatting
* Use incorrect unit (KB instead of KiB)
* Align numbers to the left
* Add trailing white-space to "Snapshot Details: "
2024-03-07 03:54:54 -05:00
Botond Dénes
80483ba732 tools/scylla-nodetool: listsnapshots: bail out if there are no snapshots
Print a message and exit, don't continue to output the snapshot table.
This is what the legacy nodetool does too.
2024-03-07 03:54:54 -05:00
Botond Dénes
ac15e4c109 tools/scylla-nodetool: repair: accept and ignore -full/--full and -j/--job-threads
These two parameters are not used by the native nodetool, because
ScyllaDB itself doesn't support them. These should be just ignored and
indeed there was a unit test checking that this is the case. However,
due to a mistake in the unit test, this was not actually tested and
nodetool complained when seeing these params.
This patch fixes both the test and the native nodetool.

Closes scylladb/scylladb#17477
2024-03-07 11:53:50 +03:00
Nadav Har'El
a36c8b28dd Merge 'scylla-gdb.py: fixes warnings raised by flake8' from Kefu Chai
this changeset addresses some warnings raised by flake8 in hope to improve the readability of this script in general.

Closes scylladb/scylladb#17668

* github.com:scylladb/scylladb:
  scylla-gdb: s/if not foo is None/if foo is not None/
  scylla-gdb.py: add space after keyword
  scylla-gdb.py: remove extraneous spaces
  scylla-gdb.py: use 2 empty lines between top-level funcs/classes
  scylla-gdb.py: replace <tab> with 4 spaces
  scylla-gdb: fix the indent
2024-03-07 10:41:15 +02:00
Botond Dénes
28639e6a59 Merge 'docs: trigger the docs-pages workflow on release branches' from Beni Peled
Currently, the github docs-pages workflow is triggered only when changes are merged to the master/enterprise branches, which means that in the case of changes to a release branch, for example, a fix to branch-5.4, or a branch-5.4>branch-2024.1 merge, the docs-pages workflow is not triggered and therefore the documentation is not updated with the new change.

In this change, I added the `branch-**` pattern, so changes to release branches will trigger the workflow

Closes scylladb/scylladb#17281

* github.com:scylladb/scylladb:
  docs: always build from the default branch
  docs: trigger the docs-pages workflow on release branches
2024-03-07 10:01:50 +02:00
Botond Dénes
75fe2f5c3a Merge 'test: rest_api: fix tests to work with tablets' from Aleksandra Martyniuk
Fix test_compaction_task.py, test_repair_task.py and
test_storage_service.py to work with tablets.

Fixes: #17338.

Closes scylladb/scylladb#17474

* github.com:scylladb/scylladb:
  test: rest_api: enable tablets by default
  test: fix indentation and delete unused this_dc param
  test: rest_api: fix test_storage_service.py
  test: rest_api: fix test_repair_task.py
  test: rest_api: fix test_compaction_task.py
  test: rest_api: use skip_without_tablets fixture
  test: rest_api: add some tablet related fixtures
2024-03-07 10:00:09 +02:00
Asias He
83a28342ea service: Drop unused table param from session_topology_guard
The table param is not used. Dropping it so it can be used in places
where the table object is not available.

Closes scylladb/scylladb#17628
2024-03-07 09:34:40 +02:00
Israel Fruchter
6eb0509ff9 Update tools/cqlsh submodule
* tools/cqlsh b8d86b76...e5f5eafd (2):
  > dist/debian: fix the trailer line format
  > `COPY TO STDOUT` shouldn't put None where a function is expected

Fixes: scylladb/scylladb#17451

Closes scylladb/scylladb#17447
2024-03-07 09:33:36 +02:00
Michał Chojnowski
f9e97fa632 sstables: fix a use-after-free in key_view::explode()
key_view::explode() contains a blatant use-after-free:
unless the input is already linearized, it returns a view to a local temporary buffer.

This is rare, because partition keys are usually not large enough to be fragmented.
But for a sufficiently large key, this bug causes a corrupted partition_key down
the line.
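
A minimal sketch of this class of bug, assuming a simplified fragmented buffer. The `fragmented` alias and `linearize` function are hypothetical and are not the real `key_view` API; the point is only that a view into a function-local buffer dangles, while an owning return value does not.

```cpp
#include <string>
#include <vector>

// Hypothetical fragmented buffer: the logical value is the concatenation
// of the fragments.
using fragmented = std::vector<std::string>;

// BUGGY shape (shown as a comment only): the returned view would point
// into `tmp`, which is destroyed when the function returns.
//
//   std::string_view linearize_bad(const fragmented& f) {
//       std::string tmp;
//       for (const auto& frag : f) tmp += frag;
//       return tmp;   // dangling view
//   }

// Safe shape: return an owning string, so the caller controls the lifetime.
std::string linearize(const fragmented& f) {
    std::string out;
    for (const auto& frag : f) {
        out += frag;
    }
    return out;
}
```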

Fixes #17625

Closes scylladb/scylladb#17626
2024-03-07 09:07:07 +02:00
Kefu Chai
7631605892 query-request: use default-generated operator==
instead of using the hand-crafted operator==, use the default-generated
one, which is equivalent to the former.

regarding the difference between global operator== and member operator==,
the default-generated operator in C++20 is now symmetric. so we don't
need to worry about the problem of `max_result_size` being lhs or rhs.
but neither do we need to worry about the implicit conversion, because
all constructors of `max_result_size` are marked explicit. so we don't
gain any advantage by making the operator== global instead of a member
operator.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17536
2024-03-07 09:02:42 +03:00
Kefu Chai
64e14d21db locator/tablets: add fmt::formatter for tablet_*
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* tablet_id
* tablet_replica
* tablet_metadata
* tablet_map

their operator<<:s are dropped

Refs scylladb/scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17504
2024-03-07 09:00:49 +03:00
Kefu Chai
6ef507e842 build: cmake: add table_check.cc to repair/CMakeLists.txt
in 5202bb9d, we introduced repair/table_check.cc, but we didn't
update repair/CMakeLists.txt accordingly. but the symbols defined
by this compilation unit are referenced by other source files when
building scylla.

so, in this change, we add this table_check.cc to the "repair"
target.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17517
2024-03-07 08:59:02 +03:00
Pavel Emelyanov
52a1b2c413 Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* position_range
* mutation_fragment
* range_tombstone_stream
* mutation_fragment_v2::printer

Refs #13245

Closes scylladb/scylladb#17521

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for position_range
  mutation: add fmt::formatter for mutation_fragment and range_tombstone_stream
  mutation: add fmt::formatter for mutation_fragment_v2::printer
2024-03-07 08:56:21 +03:00
Pavel Emelyanov
df6048adec topology.tablets_migration: Handle failed use_new
This stage doesn't need any special treatment, because we cannot revert
to old replicas and should proceed normally. The barrier itself won't
get stuck, because it already handles excluded/ignored nodes.

Just make the test validate it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
fb7428c560 topology.tablets_migration: Handle failed write_both_read_new
Two options here -- go revert to old replicas by jumping into
cleanup_target stage or proceed normally. The choice depends on which
replica set has fewer dead nodes.
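
The choice could be sketched like this. The enum and function names are hypothetical, and proceeding when both sets have the same number of dead nodes is an assumption made for the sketch, not something stated in this commit.

```cpp
#include <cstddef>

enum class failure_action { proceed, revert_via_cleanup_target };

// Hypothetical decision helper: prefer the replica set with fewer dead
// nodes. Proceeding on a tie is an assumption of this sketch.
failure_action on_failed_write_both_read_new(std::size_t dead_in_old, std::size_t dead_in_new) {
    return dead_in_old < dead_in_new ? failure_action::revert_via_cleanup_target
                                     : failure_action::proceed;
}
```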

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
324eaaf873 topology.tablets_migration: Handle failed write_both_read_old
At this stage it can happen that target replica got some writes, so its
tablet needs to be cleaned up, so jump to cleanup_target stage.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
f81e0b2e88 topology.tablets_migration: Handle failed allow_write_both_read_old
This is an early stage, just proceed to the existing revert_migration

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
5bb1597a30 test/tablets_migration: Add conditional break-point into barrier handler
There are several transition stages that are executed by the topology
coordinator with the help of barrier-and-drain raft commands. For the
test to stop and remove a node while handling this stage it must inject
a break-point into barrier handler, wait for it to happen and then stop
the node without resuming the break-point. Then removenode from the
cluster.

The break-point suspends barrier handling when a specific tablet is in
specific transition stage. Tablet ID and desired stage are configured
via injector parameters.

With today's error-injection facilities the way to suspend code
execution is with injecting a lambda that waits for a message from the
injection engine.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
f5264dc501 replica: Add helper to read tablet transition stage
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:25 +03:00
Kefu Chai
4f8b618be7 scylla-gdb: s/if not foo is None/if foo is not None/
more readable this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
643a6d5bda scylla-gdb.py: add space after keyword
it'd be more pythonic to just put an expression after `assert`,
instead of quoting it with a pair of parentheses. and there is no need
to add `;` after `break`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
8c65f92f1f scylla-gdb.py: remove extraneous spaces
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
12c06c39c3 scylla-gdb.py: use 2 empty lines between top-level funcs/classes
and 1 empty line for nested functions/classes, to be more PEP8
compliant. for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
8e3b22c76a scylla-gdb.py: replace <tab> with 4 spaces
do not mix tab and spaces for indent

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
c4b679fe3b scylla-gdb: fix the indent
indent should be multiple of 4 spaces.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Pavel Emelyanov
79b5a75ded topology_coordinator: Add action_failed() helper
It checks if the action holder holds a failed action.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:46:29 +03:00
Botond Dénes
8dd6fe75e7 Merge 'tools/scylla-nodetool: implement info ' from Kefu Chai
Refs #15588

Closes scylladb/scylladb#17498

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement info
  test/nodetool: move format_size into utils.py
2024-03-07 07:14:51 +02:00
Avi Kivity
c5f01349b1 Merge 'Add specialized tablet_sstable_set' from Benny Halevy
Make a specialized sstable_set for tablets
via tablet_storage_group_manager::make_sstable_set.

This sstable set takes a snapshot of the storage_groups
(compound) sstable_sets and maps the selected tokens
directly into the tablet compound_sstable_set.

This sstable_set provides much more efficient access
to the table's sstable sets as it takes advantage of the disjointness
of sstable sets between tablets/storage_groups, and making it is cheaper
than rebuilding a complete partitioned_sstable_set from all sstables in the table.
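
To illustrate why disjointness helps, here is a sketch that finds the owning tablet for a token with a binary search, so only that tablet's sstable set needs to be consulted. The layout (a sorted vector of each tablet's last token) and the name `tablet_of` are assumptions made for the example, not the actual tablet_sstable_set implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: tablet i owns every token greater than
// last_token[i - 1] and up to and including last_token[i]; the vector is
// sorted in ascending order.
// Returns the index of the owning tablet, or last_token.size() if the
// token lies beyond the last tablet.
std::size_t tablet_of(const std::vector<std::int64_t>& last_token, std::int64_t token) {
    // First tablet whose last token is >= the requested token.
    auto it = std::lower_bound(last_token.begin(), last_token.end(), token);
    return static_cast<std::size_t>(it - last_token.begin());
}
```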

Fixes #16876

Cassandra-stress setup:
```
$ sudo cpupower frequency-set -g userspace
$ build/release/scylla (developer-mode options) --smp=16 --memory=8G --experimental-features=consistent-topology-changes --experimental-features=tablets
cqlsh> CREATE KEYSPACE keyspace1 WITH replication={'class':'NetworkTopologyStrategy', 'replication_factor':1} AND tablets={'initial':2048};
$ ./tools/java/tools/bin/cassandra-stress write no-warmup n=10000000 -pop 'seq=1...10000000' -rate threads=128
$ scylla-api-client system drop_sstable_caches POST
$ ./tools/java/tools/bin/cassandra-stress read no-warmup duration=60s -pop 'dist=uniform(1..10000000)' -rate threads=128
$ scylla-api-client system drop_sstable_caches POST
$ ./tools/java/tools/bin/cassandra-stress mixed no-warmup duration=60s -pop 'dist=uniform(1..10000000)' -rate threads=128
```

Baseline (0a7854ea4d) vs. fix (0c2c00f01b)

Throughput (op/s):
workload | baseline | fix
---------|----------|----------
write | 76,806 | 100,787
read | 34,330 | 106,099
mixed | 32,195 | 79,246

Closes scylladb/scylladb#17149

* github.com:scylladb/scylladb:
  table: tablet_storage_group_manager: make tablet_sstable_set
  storage_group_manager: add make_sstable_set
  tablet_storage_group_manager: handle_tablet_split_completion: pre-calc new_tablet_count
  table: tablet_storage_group_manager: storage_group_of: do not validate in release build mode
  table: move compaction_group_list and storage_group_vector to storage_group_manager
  compaction_group::table_state: get_group_id: become self-sufficient
  compaction_group, table: make_compound_sstable_set: declare as const
  tablet_storage_group_manager: precalculate my_host_id and _tablet_map
  table: coroutinize update_effective_replication_map
2024-03-06 23:59:39 +02:00
Botond Dénes
557d851191 tools/toolchain/README.md: mention the need of credentials for publishing images
Without this, the push will fail, complaining about bad permissions.

Closes scylladb/scylladb#17652
2024-03-06 15:58:24 +02:00
Kefu Chai
3e91b1382b tools/scylla-nodetool: always use compile-time format string
instead of passing fmt string as a plain `const char*`, pass it as
a consteval type, so that `fmt::format()` can perform compile-time
format check against it and the formatted params.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17656
2024-03-06 14:55:10 +02:00
Avi Kivity
3ab2088119 Merge 'build: cmake: use scylla build mode for rust profile name ' from Kefu Chai
before this change, we used the lower-case CMake build configuration
name for the rust profile names. but this was wrong, because the
profiles are named with the scylla build mode.

in this change, we translate the `$<CONFIG>` to scylla build mode,
and use it for the profile name and for the output directory of
the built library.

Closes scylladb/scylladb#17648

* github.com:scylladb/scylladb:
  build: cmake: use scylla build mode for rust profile name
  build: cmake: define per-config build mode
2024-03-06 13:46:20 +02:00
Botond Dénes
65b9e10543 repair: resolve start-up deadlock
Repairs have to obtain a permit to the reader concurrency semaphore on
each shard they have a presence on. This is prone to deadlocks:

node1                              node2
repair1_master (takes permit)      repair1_follower (waits on permit)
repair2_master (waits for permit)  repair2_follower (takes permit)

In lieu of strong central coordination, we solved this by making permits
evictable: repair2 can evict repair1's permit so it can obtain one
and make progress. This is not efficient, as evicting a permit usually
means discarding already done work, but it prevents the deadlocks.
We recently discovered that there is a window when deadlocks can still
happen. The permit is made evictable when the disk reader is created.
This reader is an evictable one, which effectively makes the permit
evictable. But the permit is obtained when the repair control
structure -- repair meta -- is created. Between creating the repair meta
and reading the first row from disk, the deadlock is still possible. And
we know that what is possible, will happen (and did happen). Fix by
making the permit evictable as soon as the repair meta is created. This
is very clunky and we should have a better API for this (refs #17644),
but for now we go with this simple patch, to make it easy to backport.

Refs: #17644
Fixes: #17591

Closes scylladb/scylladb#17646
2024-03-06 11:38:07 +02:00
Kamil Braun
19b816bb68 Merge 'Migrate system_auth to raft group0' from Marcin Maliszkiewicz
This patch series makes all auth writes serialized via raft. Reads stay
eventually consistent for performance reasons. To make transition to new
code easier data is stored in a newly created keyspace: system_auth_v2.

Internally the difference is that instead of executing CQL directly for
writes we generate mutations and then announce them via raft group0. Per
commit descriptions provide more implementation details.

Refs https://github.com/scylladb/scylladb/issues/16970
Fixes https://github.com/scylladb/scylladb/issues/11157

Closes scylladb/scylladb#16578

* github.com:scylladb/scylladb:
  test: extend auth-v2 migration test to catch stale static
  test: add auth-v2 migration test
  test: add auth-v2 snapshot transfer test
  test: auth: add tests for lost quorum and command splitting
  test: pylib: disconnect driver before re-connection
  test: adjust tests for auth-v2
  auth: implement auth-v2 migration
  auth: remove static from queries on auth-v2 path
  auth: coroutinize functions in password_authenticator
  auth: coroutinize functions in standard_role_manager
  auth: coroutinize functions in default_authorizer
  storage_service: add support for auth-v2 raft snapshots
  storage_service: extract getting mutations in raft snapshot to a common function
  auth: service: capture string_view by value
  alternator: add support for auth-v2
  auth: add auth-v2 write paths
  auth: add raft_group0_client as dependency
  cql3: auth: add a way to create mutations without executing
  cql3: run auth DML writes on shard 0 and with raft guard
  service: don't loose service_level_controller when bouncing client_state
  auth: put system_auth and users consts in legacy namespace
  cql3: parametrize keyspace name in auth related statements
  auth: parametrize keyspace name in roles metadata helpers
  auth: parametrize keyspace name in password_authenticator
  auth: parametrize keyspace name in standard_role_manager
  auth: remove redundant consts auth::meta::*::qualified_name
  auth: parametrize keyspace name in default_authorizer
  db: make all system_auth_v2 tables use schema commitlog
  db: add system_auth_v2 tables
  db: add system_auth_v2 keyspace
2024-03-06 10:11:33 +01:00
Botond Dénes
58265a7dc1 tools/utils: fix use-after-free when printing error message for unknown operation
When a tool application is invoked with an unknown operation, an error
message is printed, which includes all the known operations, with all
their aliases. This is collected in `std::vector<std::string_view>`. The
problem is that the vector containing alias names, is returned as a
value, so the code ends up creating views to temporaries.
Fix this by returning alias vector with const&.
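
A minimal sketch of the pattern, with hypothetical names (`operation`, `all_names`) that only loosely mirror the tool code: because `aliases()` returns a const reference, the collected views refer to strings owned by the operation rather than to a temporary vector.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tool operation with a name and aliases.
class operation {
    std::string _name;
    std::vector<std::string> _aliases;
public:
    operation(std::string name, std::vector<std::string> aliases)
        : _name(std::move(name)), _aliases(std::move(aliases)) {}

    const std::string& name() const { return _name; }
    // Returning by const& keeps the strings alive as long as the operation;
    // returning by value would hand the caller a temporary vector.
    const std::vector<std::string>& aliases() const { return _aliases; }
};

// Collects views of every name and alias. The views stay valid only while
// `ops` is alive and unmodified.
std::vector<std::string_view> all_names(const std::vector<operation>& ops) {
    std::vector<std::string_view> names;
    for (const auto& op : ops) {
        names.push_back(op.name());
        for (const auto& alias : op.aliases()) {
            names.push_back(alias);
        }
    }
    return names;
}
```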

Fixes: #17584

Closes scylladb/scylladb#17586
2024-03-06 10:42:02 +02:00
Pavel Emelyanov
ca8bfed8e6 topology_coordinator: Demote log level for advance_in_background() errors
The helper in question is supposed to spawn a background fiber with
tablet migration stage action and repeat it in case action fails (until
operator intervention, but that's another story). In case action fails
a message with ERROR level is logged about the failure.

This error confuses some tests that scan scylla log messages for
ERROR-s at the end, treat most of them (if not all) as critical and
fail. But this particular message is not in fact an error -- topology
coordinator would re-execute this action anyway, so let's demote the
message to be WARN instead.

refs: #17027

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17568
2024-03-06 10:39:00 +02:00
Botond Dénes
88a76245ba Merge 'Get metrics description' from Amnon Heiman
This series adds a Python script that searches the code for metric definitions and their descriptions.
Because part of the code uses a nonstandard way of definition, it uses a configuration file to resolve parameter values.

The script supports the code that uses string format and string concatenation with variables.

The documentation team will use the results to both document the existing metrics and to get the metrics changes between releases.

Replaces #16328

Closes scylladb/scylladb#17479

* github.com:scylladb/scylladb:
  Adding scripts/metrics-config.yml
  Adding scripts/get_description.py to fetch metrics description
2024-03-06 10:37:35 +02:00
Kefu Chai
e248ab48db tools/scylla-nodetool: correct tablestats filtering
before this change, we failed to apply the filtering of tablestats
command in the right way:

1. `table_filter` failed to check if delimiter is npos before
   extracting the cf component from the specified table name.
2. the stats should not include the keyspaces which are not
   included by the filter.
3. the total number of tables in the stats report should contain
   all tables no matter whether they are filtered or not.

in this change, all the problems above are addressed. and the tests
are updated to cover these use cases.
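
The first issue is the usual `npos` pitfall when splitting a name. A small sketch, assuming a `keyspace.table` spelling with `.` as the delimiter; `split_table_name` is a hypothetical helper, not the real `table_filter` code.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Splits "keyspace" or "keyspace.table" into its parts.
// Hypothetical helper; '.' as the delimiter is an assumption.
std::pair<std::string, std::optional<std::string>> split_table_name(std::string_view spec) {
    const auto pos = spec.find('.');
    if (pos == std::string_view::npos) {
        // No delimiter: the whole spec names a keyspace and there is no
        // table component to extract.
        return {std::string(spec), std::nullopt};
    }
    return {std::string(spec.substr(0, pos)), std::string(spec.substr(pos + 1))};
}
```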

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17468
2024-03-06 10:36:20 +02:00
Benny Halevy
0c2c00f01b table: tablet_storage_group_manager: make tablet_sstable_set
Make a specialized sstable_set for tablets
via tablet_storage_group_manager::make_sstable_set.

This sstable set takes a snapshot of the storage_groups
(compound) sstable_sets and maps the selected tokens
directly into the tablet compound_sstable_set.

Refs #16876

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:36 +02:00
Benny Halevy
0745865914 storage_group_manager: add make_sstable_set
Move the responsibility for preparing the table_set
covering all sstables in the table to the storage_group_manager
so it can specialize the sstable_set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:36 +02:00
Benny Halevy
3cee24c148 tablet_storage_group_manager: handle_tablet_split_completion: pre-calc new_tablet_count
Mini-cleanup of `new_tablet_count`, similar
to pre-calculating `old_tablet_count` once.

While at it, add some missing coding-style related spaces.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:36 +02:00
Benny Halevy
c65768dc24 table: tablet_storage_group_manager: storage_group_of: do not validate in release build mode
No validation is really required in release build.
Add `#ifndef SCYLLA_BUILD_MODE_RELEASE` before
adding another term to the logic in the next patch
that adds support for sparse allocation in a cloned
tablet_storage_group_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:36 +02:00
Benny Halevy
7f203f0551 table: move compaction_group_list and storage_group_vector to storage_group_manager
So the storage_group_manager can be used later by table_sstable_set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:33 +02:00
Tzach Livyatan
a245c0bb98 Docs: Remove 3rd party Rust Driver from the driver list
The 3rd party Rust https://github.com/AlexPikalov/cdrs is not maintained, and we have a better internal alternative.

Closes scylladb/scylladb#15815
2024-03-06 10:34:43 +02:00
Aleksandra Martyniuk
923ef3c8c8 repair: reuse table name from repair_range argument
Currently in shard_repair_task_impl::repair_range table name is
retrieved with database::find_column_family and in case of exception,
we return from the function.

But the table name is already kept in table_info passed to repair_range
as an argument. Let's reuse it. If a table is dropped, we will find it
out almost immediately after calling repair_cf_range_row_level and
handle it more adequately.

Closes scylladb/scylladb#17245
2024-03-06 10:34:21 +02:00
Botond Dénes
41424231f1 Merge 'compaction: reshape sstables within compaction groups' from Lakshmi Narayanan Sreethar
For tables using tablet based replication strategies, the sstables should be reshaped only within the compaction groups they belong to. The shard_reshaping_compaction_task_impl now groups the sstables based on their compaction groups before reshaping them.
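
The grouping step might look roughly like this. `sstable_ref` and `group_by_compaction_group` are hypothetical names used for illustration; the real task works on ScyllaDB's sstable and compaction group types.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in: an sstable tagged with its compaction group id.
struct sstable_ref {
    std::string name;
    std::size_t group_id;
};

// Buckets sstables by compaction group so that each bucket can be
// reshaped independently and no reshape job mixes groups.
std::map<std::size_t, std::vector<sstable_ref>>
group_by_compaction_group(const std::vector<sstable_ref>& sstables) {
    std::map<std::size_t, std::vector<sstable_ref>> groups;
    for (const auto& sst : sstables) {
        groups[sst.group_id].push_back(sst);
    }
    return groups;
}
```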

Fixes https://github.com/scylladb/scylladb/issues/16966

Closes scylladb/scylladb#17395

* github.com:scylladb/scylladb:
  test/topology_custom: add testcase to verify reshape with tablets
  test/pylib/rest_client: add get_sstable_info, enable/disable_autocompaction
  replica/distributed_loader: enable reshape for sstables
  compaction: reshape sstables within compaction groups
  replica/table : add method to get compaction group id for an sstable
  compaction: reshape: update total reshaped size only on success
  compaction: simplify exception handling in shard_reshaping_compaction_task_impl::run
2024-03-06 10:33:56 +02:00
Botond Dénes
f164ed8bae Merge 'docs: fix the formattings in operating-scylla/nodetool-commands/info.rst' from Kefu Chai
couple minor formatting fixes.

Closes scylladb/scylladb#17518

* github.com:scylladb/scylladb:
  docs: remove leading space in table element
  docs: remove space in words
2024-03-06 10:33:21 +02:00
Tzach Livyatan
dafc83205b Docs: rename the select-from-mutation-fragments page name
Closes scylladb/scylladb#17456
2024-03-06 10:32:56 +02:00
David Garcia
d27d89fd34 docs: add collapsible for images
Introduces collapsible dropdowns for images reference docs. With this update, only the latest version's details will be displayed open by default. Information about previous versions will be hidden under dropdowns, which users can expand as needed. This enhancement aims to make pages shorter and easier to navigate.

Closes scylladb/scylladb#17492
2024-03-06 10:32:35 +02:00
Botond Dénes
dce42b2517 Merge 'tools/scylla-nodetool: fixes to address the test failure with dtest' from Kefu Chai
- use API endpoint of /storage_service/toppartition/
- only print out the specified samplings.
- print "\n" separator between samplings

Closes scylladb/scylladb#17574

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: print separator between samplings
  tools/scylla-nodetool: only print the specified sampling
  tools/scylla-nodetool: use /storage_service/toppartition/
2024-03-06 10:27:25 +02:00
David Garcia
847882b981 docs: add dynamic substitutions
This pull request adds dynamic substitutions for the following variables:

* `.. |CURRENT_VERSION| replace:: {current_version}`
* `.. |UBUNTU_SCYLLADB_LIST| replace:: scylla-{current_version}.list`
* `.. |CENTOS_SCYLLADB_REPO| replace:: scylla-{current_version}.repo`

As a result, it is no longer needed to update the "Installation on Linux" page manually after every new release.
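
A minimal sketch of how the three templates listed above expand for a given version. How the docs build actually injects the rendered text is not shown in this message, so `render_substitutions` is a hypothetical helper.

```python
# The three reStructuredText substitution templates from the commit message.
TEMPLATES = [
    ".. |CURRENT_VERSION| replace:: {current_version}",
    ".. |UBUNTU_SCYLLADB_LIST| replace:: scylla-{current_version}.list",
    ".. |CENTOS_SCYLLADB_REPO| replace:: scylla-{current_version}.repo",
]


def render_substitutions(current_version: str) -> str:
    """Fill in the version once so every page picks it up automatically."""
    return "\n".join(t.format(current_version=current_version) for t in TEMPLATES)
```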

Closes scylladb/scylladb#17544
2024-03-06 10:25:57 +02:00
comsky
48ad1b3d20 Update stats-output.rst
I read this doc to learn how to use nodetool commands, and I eventually found some typos in the docs. 😄

Closes scylladb/scylladb#15771
2024-03-06 10:25:06 +02:00
Kefu Chai
7bb33a1f8d node_ops: add fmt::formatter for node_ops_cmd and node_ops_cmd_request
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* node_ops_cmd
* node_ops_cmd_request

their operator<<:s are dropped

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17505
2024-03-06 10:24:31 +02:00
Benny Halevy
dc10d02890 compaction_group::table_state: get_group_id: become self-sufficient
Printing the compaction_group group_id as "i/size"
where size is the total number of compaction_groups in
the table is convenient but it comes with a price
of a circular dependency on the table, as noted by
Aleksandra Martyniuk in c25827feb3 (r1511341251),
which can be triggered when hitting an error when adding the
compaction_group::table_state to the table's compaction_manager
within the table's constructor.

This patch just prints the _group_id member
resolving the dependency on the table.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:21:48 +02:00
Avi Kivity
6383aa1e3c docs: maintainer.md: add exceptions to the don't-commit-your-own-code rules
Submodule and toolchain updates aren't original code and so are exempt
from the don't-commit-own-code rule.

Closes scylladb/scylladb#17534
2024-03-06 10:19:46 +02:00
Tzach Livyatan
04b483e286 Docs: fix RF type in the consistency-calculator
Closes scylladb/scylladb#17557
2024-03-06 10:18:29 +02:00
Kefu Chai
d93b018bcf create-relocatable-package.py: add --debian-dir option
before this change, we assume that debian packaging directory is
always located under `build/debian/debian`. which is hardwired by
`configure.py`. but this might not hold anymore, if we want to
have a self-contained build, in the sense that different builds do
not share the same build directory. this could be a waste for the
non-multi-config build, but `configure.py` uses a multi-config generator
when building with CMake. so in that case, all builds still share the
same $build_dir/debian/ directory.

in order to work with the out-of-source build, where the build
directory is not necessarily "build", a new option is added to
`create-relocatable-package.py`, this allows us to specify the directory
where "debian" artifacts are located.

Refs #15241

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17558
2024-03-06 10:18:00 +02:00
Kefu Chai
19e02de1aa transport/controller: remove unused struct definition
the removed struct definition is not used, so drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17537
2024-03-06 10:17:08 +02:00
Tzach Livyatan
1edce9f4b6 Improve the frozen vs. non-frozen doc section, removing false claims
Closes scylladb/scylladb#17556
2024-03-06 10:16:33 +02:00
Kefu Chai
4d4c0ddf31 build: cmake: exclude Seastar's tests from "all"
in 02de9f1833, we enable building Seastar testing for using the
testing facilities in scylla's own tests. but this brings in
Seastar's tests.

since scylladb's CI builds the "all" targets, and we are not
interested in running Seastar's tests when building scylladb,
let's exclude Seastar's tests from the "all" target.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17554
2024-03-06 10:15:45 +02:00
Benny Halevy
bfe13daed4 compaction_group, table: make_compound_sstable_set: declare as const
It does not modify the compaction_group/table respectively.
This is required by the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:15:34 +02:00
Benny Halevy
d7b1851449 tablet_storage_group_manager: precalculate my_host_id and _tablet_map
The node host_id never changes, so get it once,
when the object is constructed.

A pointer to the tablet_map is taken when constructed
using the effective_replication_map and it is
updated whenever the e_r_m changes, using a newly added
`update_effective_replication_map` method.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:15:34 +02:00
Benny Halevy
f2ff701489 table: coroutinize update_effective_replication_map
It's better to wait on deregistering the
old main compaction_groups:s in handle_tablet_split_completion
rather than leaving work in the background.
Especially since their respective storage_groups
are being destroyed by handle_tablet_split_completion.

handle_tablet_split_completion keeps a continuation chain
for all non-ready compaction_group stop fibers
and returns it so that update_effective_replication_map
can await it, leaving no cleanup work in the background.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:15:34 +02:00
Konstantin Osipov
39d882ddca main: print pid (process id) at start
Print process id to the log at start.
It aids debugging/administering the instance if you have multiple
instances running on the same machine.

Closes scylladb/scylladb#17582
2024-03-06 10:14:22 +02:00
Kefu Chai
80d2981473 dist/docker: collect deb packages from different dir for CMake builds
CMake generates debian packages under build/$<CONFIG>/debian instead of
build/$mode/debian. so let's translate $mode to $<CONFIG> if
build.ninja is found under build/ directory, as configure.py puts
build.ninja under $top_srcdir, while CMake puts it under build/ .
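
The translation described above might look roughly like this. It is a sketch, not the code in dist/docker: the mode-to-configuration mapping and the name `debian_package_dir` are assumptions made for illustration.

```python
# Assumed mapping from Scylla build modes to CMake configuration names;
# the real table lives in the build scripts and may differ.
MODE_TO_CMAKE_CONFIG = {
    "debug": "Debug",
    "release": "RelWithDebInfo",
    "dev": "Dev",
    "sanitize": "Sanitize",
}


def debian_package_dir(mode: str, ninja_in_build_dir: bool) -> str:
    """Return the directory holding the deb packages for `mode`.

    `ninja_in_build_dir` says whether build.ninja was found under build/,
    which indicates a CMake build; configure.py puts it in the source root.
    """
    subdir = MODE_TO_CMAKE_CONFIG[mode] if ninja_in_build_dir else mode
    return f"build/{subdir}/debian"
```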

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17592
2024-03-06 10:13:47 +02:00
Botond Dénes
d37ac1545b Merge 'build: cmake: fixes for debian packaging' from Kefu Chai
- changes to use build/$<CONFIG> for build directory
- add ${CMAKE_BINARY_DIR}/debian as a dep
- generate deb packages under build/$<CONFIG>/debian

Closes scylladb/scylladb#17560

* github.com:scylladb/scylladb:
  build: cmake: generate deb packages under build/$<CONFIG>/debian
  build: cmake: add ${CMAKE_BINARY_DIR}/debian as a dep
  build: cmake: use build/$<CONFIG>/ instead of build
  build: cmake: always pass absolute path for add_stripped()
2024-03-06 10:12:18 +02:00
Anna Stuchlik
a024c2d692 doc: remove Membership changes vs LWT page
This commit removes the redundant
"Cluster membership changes and LWT consistency" page.

The page is no longer useful because the Raft algorithm
serializes topology operations, which results in
consistent topology updates.

Closes scylladb/scylladb#17523
2024-03-06 10:10:01 +02:00
Kefu Chai
e8473d6d03 row_cache: add fmt::formatter for cache_entry
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for cache_entry, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17594
2024-03-06 10:08:11 +02:00
Botond Dénes
6f374aa7d6 Merge 'doc: update procedures following the introduction of Raft-based topology' from Anna Stuchlik
This PR updates the procedures that changed as a result of introducing Raft-based topology.

Refs https://github.com/scylladb/scylladb/issues/15934
Applied the updates from https://docs.google.com/document/d/1BgZaYtKHs2GZKAxudBZv4G7uwaXcRt2jM6TK9dctRQg/edit

In addition, it adds a placeholder for the 5.4-to-6.0 upgrade guide, as a file included in that guide, Enable Raft topology, is referenced from other places in the docs.

Closes scylladb/scylladb#17500

* github.com:scylladb/scylladb:
  doc: replace "Raft Topology" with "Consistent Topology"
  doc: (Raft topology) update Removenode
  doc: (Raft topology) update Upscale a Cluster
  doc:(Raft topology)update Membership Change Failures
  doc: doc: (Raft topology) update Replace Dead Node
  doc: (Raft topology) update Remove a Node
  doc: (Raft topology) update Add a New DC
  doc: (Raft topology) update Add a New Node
  doc: (Raft topology) update Create Cluster (EC2)
  doc: (Raft topology) update Create Cluster (n-DC)
  doc: (Raft topology) update Create Cluster (1DC)
  doc: include the quorum requirement file
  doc: add the quorum requirement file
  doc: add placeholder for Enable Raft topology page
2024-03-06 10:05:47 +02:00
Botond Dénes
c843f98769 Merge 'cql3: add fmt::formatter for cql3 types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* std::vector<data_type>
* column_identifier
* column_identifier_raw
* untyped_constant::type_class

and drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17538

* github.com:scylladb/scylladb:
  cql3: add fmt::formatter for expression::printer
  cql3: add fmt::formatter for raw_value{,_view}
  cql3: add fmt::formatter for std::vector<data_type>
  cql3: add fmt::formatter for untyped_constant::type_class
  cql3: add fmt::formatter for column_identifier{,_row}
2024-03-06 10:03:50 +02:00
Kefu Chai
1519904fb9 docs: quote CQL keywords
this "misspelling" was identified by codespell. actually, it's not
quite a misspelling, as "UPDATE" and "INSERT" are keywords in CQL,
and we intended to emphasize them. so, to make codespell more useful
and to preserve the intention, let's quote the keywords with backticks.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17391
2024-03-06 09:57:07 +02:00
Kefu Chai
51a789afc1 build: cmake: use scylla build mode for rust profile name
before this change, we used the lower-case CMake build configuration
name for the rust profile names. but this was wrong, because the
profiles are named with the scylla build mode.

in this change, we translate the $<CONFIG> to scylla build mode,
and use it for the profile name and for the output directory of
the built library.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-06 15:53:11 +08:00
Kefu Chai
0c1864eebd build: cmake: define per-config build mode
so that scylla_build_mode_$<CONFIG> can be referenced when necessary.
we use it to reference the build mode in the build system instead
of the CMake configuration name.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-06 15:53:11 +08:00
Kefu Chai
7e9b0d3d9e network_topology_strategy: use structured binding when appropriate
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17642
2024-03-06 09:52:20 +02:00
Botond Dénes
c370f42d8b Merge 'Automation of ScyllaDB backports - Phase #1: Master → OSS backports' from Yaron Kaikov
This PR includes 3 commits:

- **[actions] Add a check for backport labels**: As part of the Automation of ScyllaDB backports project, each PR should get either a `backport/none` or `backport/X.Y` label. Based on this label we will automatically open a backport PR for the relevant OSS release.

In this commit, I am adding a GitHub action to verify if such a label was added. This only applies to PRs with a base branch of `master` or `next`. For releases, we don't need this check.

- **Add Mergify (https://mergify.com/) configuration file**: In this PR we introduce the `.mergify.yml` configuration file, which
includes a set of rules that we will use for automating our backport
process.

For each supported OSS release (currently 5.2 and 5.4) we have an almost
identical configuration section which includes the four conditions before
we open a backport pr:
* PR should be closed
* PR should have the proper label. for example: backport/5.4 (we can
  have multiple labels)
* Base branch should be `master`
* PR should be set with a `promoted` label - this condition will be set
  automatically once the commits are promoted to the `master` branch (passed
gating)

Once all conditions are met, the Mergify bot will open a backport PR and
will assign it to the author of the original PR, then CI will start
running, and only after it passes do we merge.

- **[action] Add promoted label when commits are in master**: In Scylla, we don't merge our PR but use `./script/pull_github_pr.sh` to close the pull request, adding a `closes scylladb/scylladb <PR number>` remark and pushing changes to the `next` branch.

One of the conditions for opening a backport PR is that all relevant commits are in `master` (passed gating), in this GitHub action, we will go through the list of commits once a push was made to `master` and will identify the relevant PR, and add `promoted` label to it. This will allow Mergify to start the process of backporting
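
The four backport conditions listed above can be sketched as a plain predicate. This is not the `.mergify.yml` syntax; the `PullRequest` class and `backport_targets` function are hypothetical and only restate the rules in Python.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical, minimal view of a pull request."""
    closed: bool
    base_branch: str
    labels: set = field(default_factory=set)


def backport_targets(pr, supported=("5.2", "5.4")):
    """Return the releases for which a backport PR should be opened.

    The PR must be closed, based on `master`, and carry the `promoted`
    label; each `backport/X.Y` label then selects one supported release.
    """
    if not (pr.closed and pr.base_branch == "master" and "promoted" in pr.labels):
        return []
    return [v for v in supported if f"backport/{v}" in pr.labels]
```

A PR labeled `backport/none` matches no release and therefore yields an empty list.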

Closes scylladb/scylladb#17365

* github.com:scylladb/scylladb:
  [action] Add promoted label when commits are in master
  Add mergify (https://mergify.com/) configuration file
  [actions] Add a check for backport labels
2024-03-06 09:50:30 +02:00
Dawid Medrek
b36becc1f3 db/hints: Fix too_many_in_flight_hints_for
The semantics of the function was accidentally
modified in 6e79d64. The consequence of the change
was that we didn't limit memory consumption:
the function always returned false for any node
different from the local node. The returned value
is used by storage_proxy to decide whether it
is able to store a hint or not.

This commit fixes the problem by taking other
nodes into consideration again.

Fixes #17636

Closes scylladb/scylladb#17639
2024-03-06 09:48:30 +02:00
Benny Halevy
08b0426318 scripts/open-coredump.sh: calculate MAIN_BRANCH before cloning repo
We need MAIN_BRANCH calculated earlier so we can use it
to checkout the right branch when cloning the src repo
(either `master` or `enterprise`, based on the detected `PRODUCT`)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17647
2024-03-06 09:46:30 +02:00
Avi Kivity
c32a4c8d5c build: docker: clean up after docker build
The `buildah commit` command doesn't remove the working container. These
accumulate in ~/.local/container/storage until something bad happens.

Fix by adding the `--rm` flag to remove the container and volume.

Closes scylladb/scylladb#17546
2024-03-06 09:41:36 +02:00
Kefu Chai
3d8ac06ee8 cql3: add fmt::formatter for expression::printer
before this change, we already have a `fmt::formatter` specialized for
`expression::printer`. but the formatter was implemented by

1. formatting the `printer` instance to an `ostringstream`, and
2. extracting a `std::string` from this `ostringstream`
3. formatting the `std::string` instance to the fmt context

this is convoluted and is not an optimal implementation. so,
in this change, it is reimplemented by formatting directly to
the context. its operator<< is also dropped in this change.
please note, to avoid adding the large chunk of code into the
.hh file, the implementation is put in the .cc file. but in order
to preserve the usage of `transformed(fmt::to_string<expression::printer>)`,
the `format()` function is defined as a template, and instantiated
explicitly for two use cases:

1. to format to `fmt::context`
2. to format using `fmt::to_string()`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-05 14:00:13 +08:00
Kefu Chai
fc774361e8 cql3: add fmt::formatter for raw_value{,_view}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* raw_value
* raw_value_view

`raw_value_view` 's operator<< is still being used by the generic
homebrew printer for vector<>, so it is preserved.

`raw_value` 's operator<< is still being used by the generic
homebrew printer for optional<>, so it's preserved as well.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-05 14:00:13 +08:00
Kamil Braun
0a7854ea4d Merge 'test: test_topology_ops: fix flakiness and reenable bg writes' from Patryk Jędrzejczak
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.

Since scylladb/scylladb#15924 is the last issue mentioned in
scylladb/scylladb#15962, this PR also reenables background
writes in `test_topology_ops` with tablets disabled. The test
doesn't pass with tablets and background writes because of
scylladb/scylladb#17025. We will reenable background writes
with tablets after fixing that issue.

Fixes scylladb/scylladb#15924
Fixes scylladb/scylladb#15962

Closes scylladb/scylladb#17585

* github.com:scylladb/scylladb:
  test: test_topology_ops: reenable background writes without tablets
  test: test_topology_ops: run with and without tablets
  test: topology: decrease the server's request timeouts
2024-03-04 20:57:24 +01:00
Patryk Jędrzejczak
f1d9248df9 test: wait for CDC generations publishing before checking CDC-topology consistency
Tests that verify upgrading to the raft-based topology
(`test_topology_upgrade`, `test_topology_recovery_basic`,
`test_topology_recovery_majority_loss`) have flaky
`check_system_topology_and_cdc_generations_v3_consistency` calls.
`assert topo_results[0] == topo_res` can fail because of different
`unpublished_cdc_generations` on different nodes.

The upgrade procedure creates a new CDC generation, which is later
published by the CDC generation publisher. However, this can happen
after the upgrade procedure finishes. In tests, if publishing
happens just before querying `system.topology` in
`check_system_topology_and_cdc_generations_v3_consistency`, we can
observe different `unpublished_cdc_generations` on different nodes.
It is an expected and temporary inconsistency.

For the same reasons,
`check_system_topology_and_cdc_generations_v3_consistency` can
fail after adding a new node.

To make the tests not flaky, we wait until the CDC generation
publisher finishes its job. Then, all nodes should always have
equal (and empty) `unpublished_cdc_generations`.
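
A minimal sketch of such a wait, assuming a hypothetical `fetch_unpublished` coroutine that returns the current `unpublished_cdc_generations` of a node. The helper name and the timeout values are illustrative and are not taken from the test suite.

```python
import asyncio
import time


async def wait_until_published(fetch_unpublished, timeout=30.0, period=0.1):
    """Poll until the node reports no unpublished CDC generations.

    `fetch_unpublished` is a hypothetical coroutine function returning the
    collection of generations that have not been published yet.
    """
    deadline = time.monotonic() + timeout
    while True:
        unpublished = await fetch_unpublished()
        if not unpublished:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("CDC generations were not published in time")
        await asyncio.sleep(period)
```

Running the consistency check only after this returns on every node avoids comparing rows captured in the middle of publishing.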

Fixes scylladb/scylladb#17587
Fixes scylladb/scylladb#17600
Fixes scylladb/scylladb#17621

Closes scylladb/scylladb#17622
2024-03-04 19:28:51 +02:00
Kamil Braun
ec1f574b3a test/pylib: util: silence exception from refresh_nodes
Driver's `refresh_nodes` function may throw an exception if we call it
in the middle of driver reconnecting. Silence it.
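
One possible shape of the silencing, as a sketch only. The wrapper name `refresh_nodes_quietly` is hypothetical, and catching the broad `Exception` is an assumption, since the message does not say which exception type the driver raises.

```python
import logging

logger = logging.getLogger(__name__)


def refresh_nodes_quietly(cluster) -> bool:
    """Call cluster.refresh_nodes(), logging instead of raising on failure.

    Returns True if the refresh succeeded and False if it was skipped
    because the driver raised (for example, while it was reconnecting).
    """
    try:
        cluster.refresh_nodes()
        return True
    except Exception as exc:  # assumption: the exact type is not known here
        logger.debug("refresh_nodes failed, ignoring: %s", exc)
        return False
```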

Fixes scylladb/scylladb#17616

Closes scylladb/scylladb#17620
2024-03-04 17:50:16 +02:00
Avi Kivity
e3de30f943 tools: toolchain: update frozen toolchain for python driver 3.26.7
Fixes scylladb/scylladb#16709
Fixes scylladb/scylladb#17353

Closes scylladb/scylladb#17604
2024-03-03 16:36:14 +02:00
Kefu Chai
4cc5fcde72 cql3: add fmt::formatter for std::vector<data_type>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for std::vector<data_type>,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-02 10:52:50 +08:00
Kefu Chai
ed6dc6e3b4 cql3: add fmt::formatter for untyped_constant::type_class
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for untyped_constant::type_class,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-02 10:52:50 +08:00
Kefu Chai
213d13a31c cql3: add fmt::formatter for column_identifier{,_row}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* column_identifier
* column_identifier_raw

and their operator<<:s are dropped.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-02 10:52:50 +08:00
Marcin Maliszkiewicz
eb56ae3bb9 test: extend auth-v2 migration test to catch stale static 2024-03-01 16:31:04 +01:00
Marcin Maliszkiewicz
6c30dc6351 test: add auth-v2 migration test 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
53996e2557 test: add auth-v2 snapshot transfer test 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
4f65e173cf test: auth: add tests for lost quorum and command splitting
With auth-v2 we can log in even if quorum is lost. So the test
which checks that an error occurs in such a situation is deleted,
and the opposite test, which checks that logging in works, is
added.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
a5f81f0836 test: pylib: disconnect driver before re-connection 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
1badd09d45 test: adjust tests for auth-v2 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
ebb0ffeb6c auth: implement auth-v2 migration
During raft topology upgrade procedure data from
system_auth keyspace will be migrated to system_auth_v2.

Migration works mostly on top of CQL layer to minimize
amount of new code introduced, it mostly executes SELECTs
on old tables and then INSERTs on new tables. Writes are
not executed as usual but rather announced via raft.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
a8175ce5c6 auth: remove static from queries on auth-v2 path
Because the keyspace is part of the query, the query should
change when we migrate from v1 to v2; otherwise the code
would operate on the old keyspace if those statics
were already initialized.

Likewise keyspace name can no longer be class
field initialized in constructor as it can change
during class lifetime.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
ca488c5777 auth: coroutinize functions in password_authenticator
Affected functions are: create, create_default_if_missing,
authenticate, alter, drop
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
9f172f1843 auth: coroutinize functions in standard_role_manager
Affected functions are: find_record, create_default_role_if_missing,
create_or_replace, drop, modify_membership, query_all, get_attribute,
set_attribute, remove_attribute
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
896b474db0 auth: coroutinize functions in default_authorizer
Affected functions: authorize, list_all, revoke_all
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
5a6d4dbc37 storage_service: add support for auth-v2 raft snapshots
This patch adds new RPC for pulling snapshot of auth tables.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
c27a84d8e7 storage_service: extract getting mutations in raft snapshot to a common function 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
17572a0e44 auth: service: capture string_view by value
This doesn't seem to fix anything but typically
we capture string_view by value, so do it consistently
the same way.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
9cb1f111d5 alternator: add support for auth-v2
Alternator doesn't do any writes to auth
tables so it's simply change of keyspace
name.

Docs will be updated later, when auth-v2
is enabled as default.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
913a773b1a auth: add auth-v2 write paths
All auth modifications will go now via group0.
This is achieved by acquiring group0 guard,
creating mutations without executing and
then announcing them.

Actually first guard is taken by query processor,
it serves as read barrier for query validations
(such as standard_role_manager::exists), otherwise
we could read older data. In principle this single
guard should be used for entire query but it's impossible
to achieve with current code without major refactor.

For read before write cases it's good to do write with
the guard acquired before the read so that there
wouldn't be any modify operation allowed in between.
Although not doing it doesn't make the implementation
worse than it currently is so the most complex cases
were left with FIXME.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
7f204a6e80 auth: add raft_group0_client as dependency
Most auth classes need this to be able to announce
raft commands.

Usage added in subsequent commit.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
bd444ed6f1 cql3: auth: add a way to create mutations without executing
To make table modifications go via raft we need to publish
mutations. Currently many system tables (especially auth) use
CQL to generate table modifications. Added function is a missing
link which will allow to do a seamless transition of certain
system tables to raft.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
b482679857 cql3: run auth DML writes on shard 0 and with raft guard
Because we'll be doing group0 operations we need to run on shard 0. Additional benefit
is that with needs_guard set query_processor will also do automatic retries in case of
concurrent group0 operations.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
5607aa590e service: don't lose service_level_controller when bouncing client_state
When bounce_to_shard happens we need to fill client_state with
sl_controller appropriate for destination shard.

Before the patch sl_controller was set to null after the bounce.
It was fine because it looks like it was never used in such a scenario.
With auth-v2 we need to bounce attach/detach service level statements
because they modify things via auth subsystem which needs to be called
on shard 0.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
e26e786340 auth: put system_auth and users consts in legacy namespace
This is done to clearly mark legacy (no longer used, once auth-v2
feature becomes default) code paths.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
661eec6e07 cql3: parametrize keyspace name in auth related statements 2024-03-01 16:25:11 +01:00
Marcin Maliszkiewicz
6728965869 auth: parametrize keyspace name in roles metadata helpers 2024-03-01 16:25:03 +01:00
Marcin Maliszkiewicz
f9b985b68c auth: parametrize keyspace name in password_authenticator
It's the same approach as done for standard_role_manager in
earlier commit.
2024-03-01 16:24:54 +01:00
Marcin Maliszkiewicz
1901b1c808 auth: parametrize keyspace name in standard_role_manager
It's the same approach as done for default_authorizer in
earlier commit.

Note that only non-legacy paths were changed, in particular
legacy migrations and table creations won't be ever executed
in new keyspace as they will be managed by system_auth_keyspace
implementation.

For now we add keyspace name as class member because it's static
value anyway. But statics will be removed in future commits because
migration can occur and auth needs to switch keyspace name in runtime.
2024-03-01 16:24:32 +01:00
Marcin Maliszkiewicz
12d7b40b34 auth: remove redundant consts auth::meta::*::qualified_name
Just follow the same pattern as in default_authorizer so
it's easy to track where system_auth keyspace is actually
used. It will also allow for easier parametrization.
2024-03-01 16:24:32 +01:00
Marcin Maliszkiewicz
ae2d8975b9 auth: parametrize keyspace name in default_authorizer
When adding group0 replication for auth we will change only
write path and plan to reuse read path. To not copy the code
or make more complicated class hierarchy default_authorizer's
read code will remain unchanged except this parametrization,
it is needed as group0 implementation uses separate keyspace
(replication is defined on a keyspace level).

In subsequent commits legacy write path code will be separated
and new implementation placed in default_authorizer.

For now we add keyspace name as class member because it's static
value anyway. But statics will be removed in future commits because
migration can occur and auth needs to switch keyspace name in runtime.
2024-03-01 16:22:17 +01:00
Gleb Natapov
94cd235888 topology_coordinator: drop group0 guard while changing raft configuration
Changing config under the guard can cause a deadlock.

The guard holds _read_apply_mutex. The same lock is held by the group0
apply() function. It means that no entry can be applied while the guard
is held and raft apply fiber may be even sleeping waiting for this lock
to be released. Configuration change OTOH waits for the config change
command to be committed before returning, but the way raft is implemented
is that commit notifications are triggered from apply fiber which may
be stuck. Deadlock.

Drop and re-take guard around configuration changes.
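
The pattern can be illustrated with a small asyncio analogy. This is not the topology coordinator code: an `asyncio.Lock` stands in for _read_apply_mutex, and both `change_config_without_guard` and the `change_config` callback are hypothetical.

```python
import asyncio


async def change_config_without_guard(lock: asyncio.Lock, change_config):
    """Release the held lock, run the config change, then re-take the lock.

    The caller must hold `lock` on entry and holds it again on return.
    `change_config` is a hypothetical coroutine function that only finishes
    once another task has taken `lock` to apply the committed entry, so
    awaiting it while still holding the lock would never complete.
    """
    lock.release()
    try:
        await change_config()
    finally:
        await lock.acquire()
```

Any state read under the lock before the release may be stale after it is re-acquired, so the caller should re-read what it needs.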

Fixes scylladb/scylladb#17186
2024-03-01 11:20:15 +01:00
Marcin Maliszkiewicz
d3679de1d2 db: make all system_auth_v2 tables use schema commitlog 2024-03-01 10:40:29 +01:00
Marcin Maliszkiewicz
a706424825 db: add system_auth_v2 tables
Their schema is equivalent to legacy tables
in system_auth.
2024-03-01 10:40:29 +01:00
Marcin Maliszkiewicz
9144d8203b db: add system_auth_v2 keyspace
New keyspace is added similarly as system_schema keyspace,
it's being registered via system_keyspace::make which calls
all_tables to build its schema.

Dummy table 'roles' is added as keyspaces are being currently
registered by walking through their tables. Full table schemas
will be added in subsequent commits.

Change can be observed via cqlsh:

cassandra@cqlsh> describe keyspaces;

system_auth_v2  system_schema       system         system_distributed_everywhere
system_auth     system_distributed  system_traces

cassandra@cqlsh> describe keyspace system_auth_v2;

CREATE KEYSPACE system_auth_v2 WITH replication = {'class': 'LocalStrategy'}  AND durable_writes = true;

CREATE TABLE system_auth_v2.roles (
    role text PRIMARY KEY
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = 'comment'
    AND compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 604800
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
2024-03-01 10:40:29 +01:00
Kefu Chai
ca7f7bf8e2 build: cmake: generate deb packages under build/$<CONFIG>/debian
this follows the convention of configure.py, which puts
debian packages under build/$mode/debian.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-01 09:50:30 +08:00
Patryk Jędrzejczak
e7d4e080e9 test: test_topology_ops: reenable background writes without tablets
After fixing scylladb/scylladb#15924 in one of the previous
patches, we reenable background writes in `test_topology_ops`.

We also start background writes a bit later after adding all nodes.
Without this change and with tablets, the test fails with:
```
>       await cql.run_async(f"CREATE TABLE tbl (pk int PRIMARY KEY, v int)")
E       cassandra.protocol.ConfigurationException: <Error from server: code=2300
        [Query invalid because of configuration issue] message="Datacenter
        datacenter1 doesn't have enough nodes for replication_factor=3">
```

The change above makes the test a bit weaker, but we don't have to
worry about it. If adding nodes is bugged, other tests should
detect it.

Unfortunately, the test still doesn't pass with tablets and
background writes because of scylladb/scylladb#17025, so we keep
background writes disabled with tablets and leave FIXME.

Fixes scylladb/scylladb#15962
2024-02-29 18:37:41 +01:00
Patryk Jędrzejczak
90317c5ceb test: test_topology_ops: run with and without tablets
`test_topology_ops` is a valuable test that has uncovered many bugs.
It's worth running it with and without tablets.
2024-02-29 18:37:41 +01:00
Patryk Jędrzejczak
9dfb26428b test: topology: decrease the server's request timeouts
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.

A high server's request timeout can slow down the topology tests
(see the new comment in `make_scylla_conf`). We make the timeout
dependent on the testing mode to not slow down tests for no reason.

We don't touch the driver's request timeout. Decreasing it in some
modes would require too much effort for almost no improvement.

Fixes scylladb/scylladb#15924
2024-02-29 18:37:38 +01:00
Gleb Natapov
4ef57096bc topology coordinator: fix use after free after streaming failure
node.rs pointer can be freed while guard is released, so it cannot be
accessed during error processing. Save state locally.

Fixes #17577

Message-ID: <Zd9keSwiIC4v_EiF@scylladb.com>
2024-02-29 18:27:12 +02:00
Kamil Braun
57b14580f0 Merge 'move migration_request handling to shard0' from Gleb
The RPC is used by group0 now which is available only on shard0

Fixes scylladb/scylladb#17565

* 'gleb/migration-request-shard0' of github.com:scylladb/scylla-dev:
  raft_group0_client: assert that hold_read_apply_mutex is called on shard 0
  migration_manager: fix indentation after the previous patch.
  messaging_service: process migration_request rpc on shard 0
2024-02-29 15:13:16 +01:00
Anna Stuchlik
85cfc6059b doc: replace "Raft Topology" with "Consistent Topology"
This commit replaces "Raft-based Topology" with
"Consistent Topology Updates"
in the 5.4-to-6.0 upgrade guide and all the links to it.
2024-02-29 14:42:30 +01:00
Anna Stuchlik
9250e0d8e0 doc: (Raft topology) update Removenode
This commit updates the Nodetool Removenode page
with reference to the Raft-related topology.
Specifically, it removes outdated warnings, and
adds the information about banning removed and ignored
nodes from the cluster.
2024-02-29 14:40:19 +01:00
Anna Stuchlik
d59f38a6ad doc: (Raft topology) update Upscale a Cluster
This commit updates the Upscale a Cluster page
with reference to the Raft-related topology.
Specifically, it adds a note with the quorum requirement.
2024-02-29 14:40:11 +01:00
Anna Stuchlik
5bece99d4d doc:(Raft topology)update Membership Change Failures
This commit updates the Handling Cluster Membership Change Failures page
with reference to the Raft-related topology.
Specifically, it adds a note that the page only applies when
Raft-based topology is not enabled.
In addition, it removes the Raft-enabled option.
2024-02-29 14:38:45 +01:00
Anna Stuchlik
48dd7021a7 doc: (Raft topology) update Replace Dead Node
This commit updates the Replace a Dead Node page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to replace the nodes one by one and the requirement to ensure
that the replaced node will never come back to the cluster.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
2024-02-29 14:38:45 +01:00
Anna Stuchlik
a390ce9e6b doc: (Raft topology) update Remove a Node
This commit updates the Remove a Node page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to remove the nodes one by one and the requirement to ensure
that the removed node will never come back to the cluster.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
2024-02-29 14:38:45 +01:00
Anna Stuchlik
59f890c0ef doc: (Raft topology) update Add a New DC
This commit updates the Add a New DC page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
2024-02-29 14:38:36 +01:00
Anna Stuchlik
5a3a720b82 doc: (Raft topology) update Add a New Node
This commit updates the Add a New Node (Out Scale) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
2024-02-29 14:35:03 +01:00
Anna Stuchlik
631fcebe12 doc: (Raft topology) update Create Cluster (EC2)
This commit updates the Create Cluster (EC2) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.

In addition, it updates the concept of the seed node.
2024-02-29 14:30:00 +01:00
Anna Stuchlik
b6b610c16e doc: (Raft topology) update Create Cluster (n-DC)
This commit updates the Create Cluster (Multi DC) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.

In addition, it updates the concept of the seed node.
2024-02-29 14:30:00 +01:00
Anna Stuchlik
cbf054f2b9 doc: (Raft topology) update Create Cluster (1DC)
This commit updates the Create Cluster (Single DC) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.

In addition, it updates the concept of the seed node.
2024-02-29 14:30:00 +01:00
Anna Stuchlik
57e0f15c7c doc: include the quorum requirement file
Include the file to avoid repetition.
2024-02-29 14:29:39 +01:00
Gleb Natapov
9847e272f9 raft_group0_client: assert that hold_read_apply_mutex is called on shard 0
group0 operations are valid on shard 0 only. Assert that.
2024-02-29 12:39:48 +02:00
Gleb Natapov
77907b97f1 migration_manager: fix indentation after the previous patch. 2024-02-29 12:39:48 +02:00
Gleb Natapov
4a3c79625f messaging_service: process migration_request rpc on shard 0
Commit 0c376043eb added access to group0
semaphore which can be done on shard0 only. Unlike all other group0 rpcs
(that are already always forwarded to shard0), migration_request is not,
since it is an rpc that was reused from non-raft days. The patch adds
the missing jump to shard0 before executing the rpc.
2024-02-29 12:39:48 +02:00
Petr Gusev
6afa80a443 sync_raft_topology_nodes: do not emit REMOVED_NODE on IP change
Calling notify_left for old ip on topology change in raft mode
was a regression. In gossiper mode it didn't occur. In gossiper
mode the function handle_state_normal was responsible for spotting
IP addresses that weren't managing any parts of the data, and
it would then initiate their removal by calling remove_endpoint.
This removal process did not include calling notify_left.
Actually, notify_left was only supposed to be called (via excise) by
'real' removal procedures - removenode and decommission.

The redundant notify_left caused trouble in the scylla python driver.
The driver could receive REMOVED_NODE and NEW_NODE notifications
at the same time and their handling routines could race with each other.

In this commit we fix the problem by not calling notify_left if
the remove_ip lambda was called from the ip change code path.
Also, we add a test which verifies that the driver log doesn't
mention the REMOVED_NODE notification.

Fixes scylladb/scylladb#17444

Closes scylladb/scylladb#17561
2024-02-29 10:18:20 +01:00
Kefu Chai
ce45f93caf tools/scylla-nodetool: print separator between samplings
instead of printing it out after samplings, we should print it
in between them. as toppartitions_test.py in dtest splits the
samplings using "\n\n". without this change, dtest would consider
the empty line as another sampling and then fail the test. as
the empty sampling does not match with the expected regular expressions.
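
The difference can be sketched in Python. This is a minimal illustration
under stated assumptions, not the nodetool code: `render_after` and
`render_between` are hypothetical helpers and the sampling strings are
placeholders; only the split on "\n\n" mirrors what the dtest is described
as doing above.

```python
SEPARATOR = "\n\n"  # the delimiter the dtest is described as splitting on


def render_after(samplings):
    """Old behaviour: emit the separator after every sampling."""
    return "".join(s + SEPARATOR for s in samplings)


def render_between(samplings):
    """New behaviour: emit the separator only between samplings."""
    return SEPARATOR.join(samplings)


samplings = ["WRITES Sampler:", "READS Sampler:"]  # placeholder text
old_parts = render_after(samplings).split(SEPARATOR)
new_parts = render_between(samplings).split(SEPARATOR)
print(old_parts)  # contains a trailing empty "sampling"
print(new_parts)  # exactly the samplings that were rendered
```

Splitting the old output yields one extra, empty element, which is the
empty sampling the test would then try to match.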

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-29 16:17:44 +08:00
Kefu Chai
a53457f740 tools/scylla-nodetool: only print the specified sampling
before this change, we print all samplings returned by the API,
but this is not cassandra nodetool's behavior, which only
prints out the specified one. and the toppartitions_test.py
in dtest actually expects that the number of sampling should
match with the one specified with command line.

so, in this change, we only print out the specified samplings.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-29 16:17:44 +08:00
Kefu Chai
604c7440d2 tools/scylla-nodetool: use /storage_service/toppartition/
instead of using the endpoint of /storage_service/toppartition,
use /storage_service/toppartition/. otherwise API server refuses
to return the expected result, as it does not match any API endpoint.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-29 16:17:44 +08:00
Anna Stuchlik
b02f8a0759 doc: add the quorum requirement file 2024-02-28 13:21:11 +01:00
Botond Dénes
60e04e2c59 test/cql-pytest: test_select_from_mutation_fragments.py: move away from memtables
Memtables are fickle, they can be flushed when there is memory pressure,
if there is too much commitlog or if there is too much data in them.
The tests in test_select_from_mutation_fragments.py currently assume
data written is in the memtable. This is true most of the time but we
have seen some odd test failures that couldn't be understood.
To make the tests more robust, flush the data to the disk and read it
from the sstables. This means that some range scans need to filter to
read from just a single mutation source, but this does not influence
the tests.
2024-02-28 07:00:25 -05:00
Botond Dénes
c228e4d518 cql3: select_statement: mutation_fragments_select_statement: fix use-after-return
Don't capture stack variables by reference... it can (and will) explode
in your face.
2024-02-28 06:48:09 -05:00
Kefu Chai
9dbc30a385 build: cmake: add ${CMAKE_BINARY_DIR}/debian as a dep
create-relocatable-package.py packages debian packaging as well,
so we have to add it as a dependency for the targets which
use `create-relocatable-package.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-28 16:09:48 +08:00
Kefu Chai
a1cd019e50 build: cmake: use build/$<CONFIG>/ instead of build
with multi-config generator, the generated artifacts are located
under ${CMAKE_BINARY_DIR}/$<CONFIG>/ instead of ${CMAKE_BINARY_DIR}.
so update the paths referencing the built executables. and update
the `--build-dir` option of `create-relocatable-package.py` accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-28 16:09:48 +08:00
Kefu Chai
bf9a895c09 build: cmake: always pass absolute path for add_stripped()
before this change, we assumed that the $<TARGET_FILE:${name}>
is the path to the parameter passed to this function, but this was
wrong. it actually refers to the `TARGET` argument of the keyword
of this function. also, the path to the generated files should
be located under path like "build/Debug" instead of "build" if
multi-config generator is used. as multi-config builds share
the same `${CMAKE_BINARY_DIR}`.

in this change, instead of accepting a CMake target, we always
accept an absolute path. and use "${CMAKE_BINARY_DIR}/$<CONFIG>"
for the directory of the executable, this should work for
multi-config generator which is used by `configure.py`, when
CMake is used to build the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-28 16:09:48 +08:00
Raphael S. Carvalho
305c63c629 test: test_tablets: Add load-and-stream test
stresses concurrent migration and stream.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 15:18:21 -03:00
Raphael S. Carvalho
771cbf9b79 sstables_loader: Stream to pending tablet replica if needed
Even though taking erm blocks migration, it cannot prevent
load-and-stream from starting while a migration is going on; erm
only prevents migration from advancing.

With tablets, new data will be streamed to pending replica too if
the write replica selector, in transition metadata, is set to both.
If migration is at a later stage where only new replica is written
to, then data is streamed only to new replica as selector is set
to next (== new replica set).

primary_replica_only flag is handled by only streaming to pending
if the primary replica is the one leaving through migration.
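
The selection rules above can be sketched as follows. This is a hedged
illustration only: `stream_targets`, its parameters, and the convention
that the primary replica is listed first are assumptions of the sketch,
not the sstables_loader implementation. Only the "both" and "next"
selector values come from the message above.

```python
def stream_targets(replicas, leaving, pending, selector,
                   primary_replica_only=False):
    """Return the replicas a tablet's data would be streamed to.

    replicas: current replica set, primary first (assumption of this sketch)
    leaving / pending: the replica migrating away and its replacement
    selector: write replica selector from the transition metadata
    """
    if selector == "next":
        # Later migration stage: only the new replica set is written to.
        new_set = [pending if r == leaving else r for r in replicas]
        return new_set[:1] if primary_replica_only else new_set
    if selector == "both":
        if primary_replica_only:
            primary = replicas[0]
            # Stream to pending only if the primary is the one leaving.
            return [primary, pending] if primary == leaving else [primary]
        return list(replicas) + [pending]
    # Assumption: any other selector means no pending replica is written to.
    return replicas[:1] if primary_replica_only else list(replicas)


replicas = ["n1", "n2", "n3"]
print(stream_targets(replicas, "n1", "n4", "both"))
print(stream_targets(replicas, "n1", "n4", "next"))
print(stream_targets(replicas, "n1", "n4", "both", primary_replica_only=True))
```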

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 15:17:05 -03:00
Avi Kivity
616eec2214 Merge ' test/topology_custom: test_read_repair.py: reduce run-time ' from Botond Dénes
This test needed a lot of data to ensure multiple pages when doing the read repair. This changes two key configuration items, allowing for a drastic reduction of the data size and consequently a large reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa,  reducing this to a too small value would start killing internal queries. Now, after said commit, this is no longer a concern, as this limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.

The latter configuration is a new one, added by the first patches of this series. It allows configuring the page-size in bytes, after which pages are cut. Previously this was a hard-coded constant: 1MB. This forced any tests which wanted to check paging, with pages cut on size, to work with large datasets. This was especially pronounced in the tests fixed in this PR, because this test works with tombstones which are tiny and a lot of them were needed to trigger paging based on the size.

With these two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8

The runtime of the test consequently drops from 62 seconds to 13.5 seconds (dev mode, on my build machine).

Fixes: https://github.com/scylladb/scylladb/issues/15425
Fixes: https://github.com/scylladb/scylladb/issues/16899

Closes scylladb/scylladb#17529

* github.com:scylladb/scylladb:
  test/topology_custom: test_read_repair.py: reduce run-time
  replica/database: get_query_max_result_size(): use query_page_size_in_bytes
  replica/database: use include page-size in max-result-size
  query-request: max_result_size: add without_page_limit()
  db/config: introduce query_page_size_in_bytes
2024-02-27 18:54:38 +02:00
Aleksandra Martyniuk
9dcb5c76d6 test: rest_api: enable tablets by default
Enable tablets by default. Add --vnodes flag to test/rest_api/run
to run tests without tablets.
2024-02-27 17:46:30 +01:00
Aleksandra Martyniuk
92d87eb1f7 test: fix indentation and delete unused this_dc param 2024-02-27 17:37:31 +01:00
Aleksandra Martyniuk
9cca241ec6 test: rest_api: fix test_storage_service.py
Fix test_storage_service.py to work with tablets.

- test_describe_ring was failing because in storage_service/describe_ring
  table must be specified for keyspaces with tablets.
  Do not check the status if tablets are enabled. Add checks for
  specified table;
- test_storage_service_keyspace_cleanup_with_no_owned_ranges
  was failing because cleanup is disabled on keyspaces with tablets.
  Use test_keyspace_vnodes fixture to use keyspace with tablets disabled;
- test_storage_service_get_natural_endpoints required
  some minor type-related fixes.
2024-02-27 17:34:40 +01:00
Aleksandra Martyniuk
aee0257051 test: rest_api: fix test_repair_task.py
Injection set in test_repair_task_progress didn't consider the case
when repair::shard_repair_task_impl::ranges_size() == 1 which is
true for tablets.

Move the injection so that it is triggered before number of complete
ranges is increased.
2024-02-27 17:33:59 +01:00
Aleksandra Martyniuk
6210c210ff test: rest_api: fix test_compaction_task.py
Fix test_compaction_task.py to work with tablets.

Currently the tests fail because cleanup on keyspaces with tablets is
disabled, and reshape and reshard of keyspaces with tablets use
load_and_stream which isn't covered by tasks.

Use test_keyspace_vnodes for these tests to have a keyspace with
tablets disabled.
2024-02-27 17:32:24 +01:00
Aleksandra Martyniuk
a996ed8be9 test: rest_api: use skip_without_tablets fixture
Use skip_without_tablets in tests that can be run only with tablets
enabled. Delete xfails for these tests.
2024-02-27 17:12:04 +01:00
Aleksandra Martyniuk
1fbe76814e test: rest_api: add some tablet related fixtures
Add fixtures for checking if tablets are enabled or skipping a test
if they are/aren't enabled.
2024-02-27 17:11:57 +01:00
Raphael S. Carvalho
ab498489fe sstables_loader: Implement tablet based load-and-stream
Similar treatment to repair is given to load-and-stream.

Jumps into a new streaming session for every tablet, so we guarantee
data will be segregated into tablets co-habiting the same shard.

Fixes #17315.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 13:04:20 -03:00
Nadav Har'El
fc861742d7 cql: avoid undefined behavior in totimestamp() of extreme dates
This patch fixes a UBSAN-reported integer overflow during one of our
existing tests,

   test_native_functions.py::test_mintimeuuid_extreme_from_totimestamp

when attempting to convert an extreme "date" value, millions of years
in the past, into a "timestamp" value. When UBSAN crashing is enabled,
this test crashes before this patch, and succeeds after this patch.

The "date" CQL type is 32-bit count of *days* from the epoch, which can
span 2^31 days (5 million years) before or after the epoch. Meanwhile,
the "timestamp" type measures the number of milliseconds from the same
epoch, in 64 bits. Luckily (or intentionally), every "date", however
extreme, can be converted into a "timestamp": This is because 2^31 days
is 1.85e17 milliseconds, well below timestamp's limit of 2^63 milliseconds
(9.2e18).

But it turns out that our conversion function, date_to_time_point(),
used some boost::gregorian library code, which carried out these
calculations in **microsecond** resolution. The extra conversion to
microseconds wasn't just wasteful, it also caused an integer overflow
in the extreme case: 2^31 days is 1.85e20 microseconds, which does NOT
fit in a 64-bit integer. UBSAN notices this overflow, and complains
(plus, the conversion is incorrect).

The fix is to do the trivial conversion on our own (a day is, by
convention, exactly 86400 seconds - no fancy library is needed),
without the grace of Boost. The result is simpler, faster, correct
for the Pliocene-age dates, and fixes the UBSAN crash in the test.
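
The arithmetic above can be checked with a few lines of Python. This is
only a sketch of the numbers, not the C++ fix: `date_to_timestamp_ms` is
a hypothetical stand-in for the day-to-millisecond conversion, and Python
integers do not overflow, so the 64-bit limit is compared explicitly.

```python
INT64_MAX = 2**63 - 1
MAX_DAYS = 2**31                 # magnitude of the most extreme "date"
MS_PER_DAY = 86_400 * 1_000      # a day is, by convention, 86400 seconds
US_PER_DAY = MS_PER_DAY * 1_000


def date_to_timestamp_ms(days):
    """Convert days since the epoch to milliseconds since the epoch."""
    return days * MS_PER_DAY


max_ms = date_to_timestamp_ms(MAX_DAYS)
max_us = MAX_DAYS * US_PER_DAY   # what a microsecond-resolution path needs

print(f"{max_ms:.3e} ms fits in int64: {max_ms <= INT64_MAX}")
print(f"{max_us:.3e} us fits in int64: {max_us <= INT64_MAX}")
```

The millisecond value stays within the 64-bit range while the
microsecond value does not, which is the overflow UBSAN reported.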

Fixes #17516

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17527
2024-02-27 17:04:18 +02:00
Raphael S. Carvalho
b9158e36ef sstables_loader: Virtualize sstable_streamer for tablet
virtualization allows for tablet version of streaming.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:30:14 -03:00
Raphael S. Carvalho
3523cc8063 sstables_loader: Avoid reallocations in vector
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:28:11 -03:00
Raphael S. Carvalho
d1db17d490 sstable_loader: Decouple sstable streaming from selection
That will make it easy to introduce tablet-based load-and-stream.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:28:11 -03:00
Raphael S. Carvalho
0a41f2a11f sstables_loader: Introduce sstable_streamer
Will make it easier to implement tablet oriented variant.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:28:11 -03:00
Raphael S. Carvalho
21533aff0f Fix online SSTable loading with concurrent tablet migration
load-and-stream is currently the only method -- for tablets -- that
can load SSTables while the node is online.
Today, sstable_directory relies on replication map (erm) not being
invalidated during loading, and the assumption is broken with
concurrent tablet migration.
It causes load-and-stream to segfault.

The sstable loader needs the sharder from erm in order to compute
the owning shard.

To fix, let's use auto_refreshing_sharder, which refreshes sharder
every time table has replication map updated. So we guarantee any
user of sharder will find it alive throughout the lifetime of
sstable_directory.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:27:07 -03:00
Gleb Natapov
0c376043eb migration_manager: take group0 lock during raft snapshot taking
Group0 state machine access atomicity is guaranteed by a mutex in group0
client. Code that reads or writes the state needs to hold the lock. To
transfer schema part of the snapshot we used existing "migration
request" verb which did not follow the rule. Fix the code to take group0
lock before accessing schema in case the verb is called as part of
group0 snapshot transfer.

Fixes scylladb/scylladb#16821
2024-02-27 11:15:17 +01:00
Botond Dénes
5dc145a93f test/topology_custom: test_read_repair.py: reduce run-time
This test needed a lot of data to ensure multiple pages when doing the
read repair. This changes two key configuration items, allowing
for a drastic reduction of the data size and consequently a large
reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa,
  reducing this to a too small value would start killing internal
  queries. Now, after said commit, this is no longer a concern, as this
  limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.

With these two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8

The runtime of the test consequently drops from 62 seconds to 13.5
seconds (dev mode, on my build machine).
2024-02-27 02:27:55 -05:00
Botond Dénes
7f3ca3a3d8 replica/database: get_query_max_result_size(): use query_page_size_in_bytes
As the page size for user queries, instead of the hard-coded constant
used before. For system queries, we keep using the previous constant.
2024-02-27 02:27:55 -05:00
Botond Dénes
8213e66815 replica/database: use include page-size in max-result-size
This patch changes get_unlimited_query_max_result_size():
* Also set the page-size field, not just the soft/hard limits
* Renames it to get_query_max_result_size()
* Update callers, specifically storage_proxy::get_max_result_size(),
  which now has a much simpler common return path and has to drop the
  page size on one rare return path.

This is a purely mechanical change, no behaviour is changed.
2024-02-27 02:27:55 -05:00
Botond Dénes
97615e0d9a query-request: max_result_size: add without_page_limit()
Returns an instance with the page_limit reset to 0. This converts a
max_results_size which is usable only with the
"page_size_and_safety_limit" feature, to one which can be used before
this feature.
To be used in the next patch.
2024-02-27 02:14:46 -05:00
Botond Dénes
5e37c1465f db/config: introduce query_page_size_in_bytes
Regulates the page size in bytes via config, instead of the currently
used hard-coded constant. Allows tests to configure lower limits so they
can work with smaller data-sets when testing paging related
functionality.
Not wired yet.
2024-02-27 02:14:45 -05:00
Kefu Chai
0fd85a98a9 mutation: add fmt::formatter for position_range
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `position_range`, and the
helpers for printing related types are dropped.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 20:15:57 +08:00
Kefu Chai
2f532b9ebc mutation: add fmt::formatter for mutation_fragment and range_tombstone_stream
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* mutation_fragment
* range_tombstone_stream

their operator<<:s are dropped

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 20:15:57 +08:00
Beni Peled
c06282b312 docs: always build from the default branch
In order to publish the docs-pages from release branches (see the other
commit), we need to make sure that docs is always built from the default
branch which contains the updated conf.py

Ref https://github.com/scylladb/scylladb/pull/17281
2024-02-26 11:48:38 +02:00
Beni Peled
f59f70fc58 docs: trigger the docs-pages workflow on release branches
Currently, the github docs-pages workflow is triggered only when changes
are merged to the master/enterprise branches, which means that in the
case of changes to a release branch, for example, a fix to branch-5.4,
or a branch-5.4>branch-2024.1 merge, the docs-pages workflow is not triggered and
therefore the documentation is not updated with the new change.

In this change, I added the `branch-**` pattern, so changes to release
branches will trigger the workflow.
2024-02-26 11:48:13 +02:00
Kefu Chai
1fe7a467e7 mutation: add fmt::formatter for mutation_fragment_v2::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for mutation_fragment_v2::printer

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 17:47:05 +08:00
Kefu Chai
3d6948c13e tools/scylla-nodetool: implement info
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 14:52:22 +08:00
Kefu Chai
4d8f74f301 test/nodetool: move format_size into utils.py
so that this helper can be shared across more tests. `test_info.py`
will be using it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 14:52:22 +08:00
Kefu Chai
cd228f4d6c docs: remove leading space in table element
otherwise sphinx would consider "Within which Data Center the"
as the "term" part of an entry in a definition list, and
"node is located" as the definition part of this entry.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 13:03:26 +08:00
Kefu Chai
d12655ff46 docs: remove space in words
* remove space in "Exceptions", otherwise it renders like "Except"
  "tions", which does not look right.
* remove space in "applicable".
* remove space in "Transport".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 13:03:26 +08:00
Kamil Braun
fd32e2ee10 Merge 'misc_services: fix data race from bad usage of get_next_version' from Piotr Dulikowski
The function `gms::version_generator::get_next_version()` can only be called from shard 0 as it uses a global, unsynchronized counter to issue versions. Notably, the function is used as a default argument for the constructor of `gms::versioned_value` which is used from shorthand constructors such as `versioned_value::cache_hitrates`, `versioned_value::schema` etc.

The `cache_hitrate_calculator` service runs a periodic job which updates the `CACHE_HITRATES` application state in the local gossiper state. Each time the job is scheduled, it runs on the next shard (it goes through shards in a round-robin fashion). The job uses the `versioned_value::cache_hitrates` shorthand to create a `versioned_value`, therefore risking a data race if it is not currently executing on shard 0.

The PR fixes the race by moving the call to `versioned_value::cache_hitrates` to shard 0. Additionally, in order to help detect similar issues in the future, a check is introduced to `get_next_version` which aborts the process if the function was called on a shard other than 0.

There is a possibility that it is a fix for #17493. Because `get_next_version` uses a simple incrementation to advance the global counter, a data race can occur if two shards call it concurrently and it may result in shard 0 returning the same or smaller value when called two times in a row. The following sequence of events is suspected to occur on node A:

1. Shard 1 calls `get_next_version()`, loads version `v - 1` from the global counter and stores in a register; the thread then is preempted,
2. Shard 0 executes `add_local_application_state()` which internally calls `get_next_version()`, loads `v - 1` then stores `v` and uses version `v` to update the application state,
3. Shard 0 executes `add_local_application_state()` again, increments version to `v + 1` and uses it to update the application state,
4. Gossip message handler runs, exchanging application states with node B. It sends its application state to B. Note that the max version of any of the local application states is `v + 1`,
5. Shard 1 resumes and stores version `v` in the global counter,
6. Shard 0 executes `add_local_application_state()` and updates the application state - again - with version `v + 1`.
7. After that, node B will never learn about the application state introduced in point 6. as gossip exchange only sends endpoint states with version larger than the previous observed max version, which was `v + 1` in point 4.
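
The suspected interleaving can be replayed deterministically in Python.
This is a single-threaded sketch under stated assumptions: the
`VersionCounter` class and `replay` function are hypothetical, and the
split of the increment into a separate load and store models the
unsynchronized counter described above. It is not the gossiper code.

```python
class VersionCounter:
    """Global counter whose increment is a non-atomic load + store."""

    def __init__(self, start):
        self.value = start

    def load(self):
        return self.value

    def store(self, v):
        self.value = v

    def get_next_version(self):
        v = self.load() + 1
        self.store(v)
        return v


def replay(v):
    counter = VersionCounter(v - 1)
    shard1_reg = counter.load()            # 1. shard 1 loads v - 1, preempted
    first = counter.get_next_version()     # 2. shard 0 issues v
    second = counter.get_next_version()    # 3. shard 0 issues v + 1
    max_seen = max(first, second)          # 4. node B observes max v + 1
    counter.store(shard1_reg + 1)          # 5. shard 1 resumes, stores v
    third = counter.get_next_version()     # 6. shard 0 issues v + 1 again
    sent_to_b = third > max_seen           # 7. only larger versions are sent
    return first, second, third, sent_to_b


print(replay(10))
```

In this replay the third update reuses version `v + 1`, so the "larger
than the previously observed max" filter drops it.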

Note that the above scenario was _not_ reproduced. However, I managed to observe a race condition by:

1. modifying Scylla to run update of `CACHE_HITRATES` much more frequently than usual,
2. putting an assertion in `add_local_application_state` which fails if the version returned by `get_next_version` was not larger than the previous returned value,
3. running a test which performs schema changes in a loop.

The assertion from the second point was triggered. While it's hard to tell how likely it is to occur without making updates of cache hitrates more frequent - not to mention the full theorized scenario - for now this is the best lead that we have, and the data race being fixed here is a real bug anyway.

Refs: #17493

Closes scylladb/scylladb#17499

* github.com:scylladb/scylladb:
  version_generator: check that get_next_version is called on shard 0
  misc_services: fix data race from bad usage of get_next_version
2024-02-25 19:35:34 +01:00
Gleb Natapov
59df47920b topology coordinator: fix use after free in rollback_to_normal state
node.rs pointer can be freed while guard is released, so it cannot be
accessed during error processing. Save state locally.

Fixes scylladb/scylladb#17402
CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/6993/

Message-ID: <ZdtJNJM056r4EZzz@scylladb.com>
2024-02-25 16:34:19 +02:00
Raphael S. Carvalho
f07c233ad5 Fix potential data resurrection when another compaction type does cleanup work
Since commit f1bbf70, many compaction types can do cleanup work, but it turns out
we forgot to invalidate cache on their completion.

So if a node regains ownership of token that had partition deleted in its previous
owner (and tombstone is already gone), data can be resurrected.

Tablet is not affected, as it explicitly invalidates cache during migration
cleanup stage.

Scylla 5.4 is affected.

Fixes #17501.
Fixes #17452.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17502
2024-02-25 13:08:04 +02:00
Yaron Kaikov
493327afd8 [action] Add promoted label when commits are in master
In Scylla, we don't merge our PRs but use `./script/pull_github_pr.sh` to close the pull request, adding a `closes scylladb/scylladb` remark and pushing changes to the `next` branch.
One of the conditions for opening a backport PR is that all relevant commits are in master (passed gating). In this GitHub action, we go through the list of commits once a push is made to master, identify the relevant PR, and add the promoted label to it. This will allow Mergify to start the process of backporting.
2024-02-25 11:56:50 +02:00
Nadav Har'El
b4cef638ef Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* canonical_mutation
* atomic_cell_view
* atomic_cell
* atomic_cell_or_collection::printer

Refs #13245

Closes scylladb/scylladb#17506

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for canonical_mutation
  mutation: add fmt::formatter for atomic_cell_view and atomic_cell
  mutation: add fmt::formatter for atomic_cell_or_collection::printer
2024-02-25 09:48:56 +02:00
Kefu Chai
84ba624415 mutation: add fmt::formatter for canonical_mutation
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for canonical_mutation

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-25 12:48:13 +08:00
Kefu Chai
3625796222 mutation: add fmt::formatter for atomic_cell_view and atomic_cell
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* atomic_cell_view
* atomic_cell

and drop their operator<<:s.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-25 12:19:11 +08:00
Kefu Chai
b4fa32ec17 mutation: add fmt::formatter for atomic_cell_or_collection::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`atomic_cell_or_collection::printer`, and drop its operator<<.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-25 12:18:41 +08:00
Lakshmi Narayanan Sreethar
c7eab9329f test/topology_custom: add testcase to verify reshape with tablets
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 18:43:39 +05:30
Lakshmi Narayanan Sreethar
ed2d8529f3 test/pylib/rest_client: add get_sstable_info, enable/disable_autocompaction
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 18:43:39 +05:30
Lakshmi Narayanan Sreethar
7196d2fff4 replica/distributed_loader: enable reshape for sstables
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 18:43:39 +05:30
Lakshmi Narayanan Sreethar
83fecc2f1f compaction: reshape sstables within compaction groups
For tables using tablet based replication strategies, the sstables
should be reshaped only within the compaction groups they belong to.
Updated shard_reshaping_compaction_task_impl to group the sstables based
on their compaction groups before reshaping them within the groups.
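
The grouping step can be sketched as follows. This is only an illustration in Python, not the C++ code of shard_reshaping_compaction_task_impl; `group_by_compaction_group`, `reshape_within_groups`, `group_id_of` and `reshape` are hypothetical names introduced here.

```python
from collections import defaultdict


def group_by_compaction_group(sstables, group_id_of):
    """Bucket sstables by the compaction group id reported by group_id_of."""
    groups = defaultdict(list)
    for sst in sstables:
        groups[group_id_of(sst)].append(sst)
    return dict(groups)


def reshape_within_groups(sstables, group_id_of, reshape):
    """Run reshape once per group so no job mixes sstables across groups."""
    grouped = group_by_compaction_group(sstables, group_id_of)
    return {gid: reshape(members) for gid, members in grouped.items()}


# Toy data: (sstable name, compaction group id) pairs.
sstables = [("sst-1", 0), ("sst-2", 1), ("sst-3", 0)]
# The stand-in "reshape" just counts the sstables it was handed.
result = reshape_within_groups(sstables, lambda s: s[1], lambda members: len(members))
print(result)
```

Each reshape call only ever sees sstables that share a compaction group, which is the property the change is after.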

Fixes #16966

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 18:43:39 +05:30
Piotr Dulikowski
54546e1530 version_generator: check that get_next_version is called on shard 0
The get_next_version function can only be safely called from shard 0,
but this constraint is not enforced in any way. As evidenced in the
previous commit, it is easy to accidentally call it from a non-zero
shard.

Introduce a runtime check to get_next_version which calls
on_fatal_internal_error if it detects that the function was called from
the wrong shard. This will let us detect cross-shard use issues in
runtime.
2024-02-23 13:49:49 +01:00
Piotr Dulikowski
21d5d4e15c misc_services: fix data race from bad usage of get_next_version
The function `gms::version_generator::get_next_version()` can only be
called from shard 0 as it uses a global, unsynchronized counter to
issue versions. Notably, the function is used as a default argument for
the constructor of `gms::versioned_value` which is used from shorthand
constructors such as `versioned_value::cache_hitrates`,
`versioned_value::schema` etc.

The `cache_hitrate_calculator` service runs a periodic job which
updates the `CACHE_HITRATES` application state in the local gossiper
state. Each time the job is scheduled, it runs on the next shard (it
goes through shards in a round-robin fashion). The job uses the
`versioned_value::cache_hitrates` shorthand to create a
`versioned_value`, therefore risking a data race if it is not currently
executing on shard 0.

Fix the race by constructing the versioned value on shard 0.
2024-02-23 12:54:32 +01:00
Kefu Chai
496cf9a1d8 interval: add fmt::formatters for managed_bytes and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* wrapping_interval
* interval

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17488
2024-02-23 10:26:30 +02:00
Nadav Har'El
0aaa6b1a08 fmt: add formatter for mutation_fragment_v2::kind
Unfortunately, fmt v10 dropped support for operator<< formatters,
forcing us to replace the huge number of operator<< implementations
in our code by uglier and templated fmt::formatter implementations
to get Scylla to compile on modern distros (such as Fedora 39) :-(

Kefu has already started doing this migration, here is my small
contribution - the formatter for mutation_fragment_v2::kind.
This patch is needed to compile, for example,
build/dev/mutation/mutation_fragment_stream_validator.o.

I can't remove the old operator<< because it's still used by
the implementation of other operator<< functions. We can remove
all of them when we're done with this conversion. In the meantime,
I replaced the original implementation of operator<< by a trivial
implementation just passing the work to the new fmt::print support.

Refs #13245

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17432
2024-02-23 10:25:39 +02:00
Botond Dénes
c1267900c6 Merge 'sstables: add fmt::formatter for sstable types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* bound_kind_m
* sstable_state
* indexable_element
* deletion_time

drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17490

* github.com:scylladb/scylladb:
  sstables: add fmt::formatter for deletion_time
  sstable: add fmt::formatter for indexable_element
  sstables: add fmt::formatter for sstable_state
  sstables: add fmt::formatter for sstables::bound_kind_m
2024-02-23 10:09:26 +02:00
Botond Dénes
89efa89dd7 Merge 'test: add fmt::formatters' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for some types used in testing.

Refs #13245

Closes scylladb/scylladb#17485

* github.com:scylladb/scylladb:
  test/unit: add fmt::formatter for tree_test_key_base
  test: add printer for type for BOOST_REQUIRE_EQUAL
  test: add fmt::formatters
  test/perf: add fmt::formatters for scheduling_latency_measurer and perf_result
2024-02-23 09:32:39 +02:00
Botond Dénes
1f363a876e Merge 'utils: add fmt::formatter for occupancy_stats, managed_bytes and friends ' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* managed_bytes
* managed_bytes_view
* managed_bytes_opt
* occupancy_stats

and drop their operator<<:s

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#17462

* github.com:scylladb/scylladb:
  utils/managed_bytes: add fmt::formatters for managed_bytes and friends
  utils/logalloc: add fmt::formatter for occupancy_stats
2024-02-23 09:31:22 +02:00
Botond Dénes
d314ad2725 Merge 'sstables: close index_reader in has_partition_key' from Aleksandra Martyniuk
If index_reader isn't closed before it is destroyed, then ongoing
sstables reads won't be awaited and assertion will be triggered.

Close index_reader in has_partition_key before destroying it.

Fixes: #17232.

Closes scylladb/scylladb#17355

* github.com:scylladb/scylladb:
  test: add test to check if reader is closed
  sstables: close index_reader in has_partition_key
2024-02-23 09:27:55 +02:00
Kefu Chai
010fb5f323 tools/scylla-nodetool: make keyspace argument optional for "ring"
the "keyspace" argument of the "ring" command is optional. but before
this change, we considered it a mandatory option. it was wrong.

so, in this change, we make it optional, and print out the warning
message if the keyspace is not specified.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17472
2024-02-23 09:25:29 +02:00
Kefu Chai
6800810dba interval, multishard_mutation_query: fix typos in comments
these misspellings were identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17491
2024-02-23 09:06:24 +02:00
Botond Dénes
a08d9ba2a4 Merge 'tools/scylla-nodetool: fixes to address test failures with dtest' from Kefu Chai
* tighten the param check for toppartitions
* add an extra empty line inbetween reports

Closes scylladb/scylladb#17486

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: add an extra empty line inbetween reports
  tools/scylla-nodetool: tighten the param check for toppartitions
2024-02-23 09:05:30 +02:00
Botond Dénes
959d33ba39 Merge 'repair: streaming: handle no_such_column_family from remote node' from Aleksandra Martyniuk
RPC calls lose information about the type of returned exception.
Thus, if a table is dropped on receiver node, but it still exists
on a sender node and sender node streams the table's data, then
the whole operation fails.

To prevent that, add a method which synchronizes schema and then
checks, if the exception was caused by table drop. If so,
the exception is swallowed.

Use the method in streaming and repair to continue them when
the table is dropped in the meantime.
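
The control flow described above might look roughly like this. It is a Python sketch under stated assumptions, not the actual repair or streaming code; `run_skipping_dropped_table` and its three callables are hypothetical stand-ins.

```python
def run_skipping_dropped_table(operation, sync_schema, table_exists):
    """Run operation; return False if it failed only because the table is gone.

    All three callables are hypothetical stand-ins for the real streaming
    or repair step, the schema synchronization, and a local schema lookup.
    """
    try:
        operation()
        return True
    except Exception:
        # The remote error arrives without its original type, so refresh
        # the local schema and check whether the table still exists.
        sync_schema()
        if table_exists():
            raise  # unrelated failure: propagate it
        return False  # table was dropped meanwhile: skip it


def failing_stream():
    raise RuntimeError("remote error")


# The table is reported as dropped, so the failure is swallowed.
print(run_skipping_dropped_table(failing_stream, lambda: None, lambda: False))
```

Synchronizing the schema before the check matters: without it the local node could still believe the table exists and would propagate the error.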

Fixes: #17028.
Fixes: #15370.
Fixes: #15598.

Closes scylladb/scylladb#17231

* github.com:scylladb/scylladb:
  repair: handle no_such_column_family from remote node gracefully
  test: test drop table on receiver side during streaming
  streaming: fix indentation
  streaming: handle no_such_column_family from remote node gracefully
  repair: add methods to skip dropped table
2024-02-23 08:25:45 +02:00
Kefu Chai
3574c22d73 test/nodetool/utils: print out unmatched output on test failure
it would be more helpful if the matcher printed the unmatched
output on test failure. so, in this change, both stdout and stderr
are printed if they fail to match the expected error.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17489
2024-02-23 08:20:30 +02:00
Botond Dénes
234aa99aaa Merge 'tools/scylla-nodetool: extract and use {yaml,json}_writers' from Kefu Chai
simpler this way.

Closes scylladb/scylladb#17437

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: use {yaml,json}_writers in compactionhistory_operation
  tools/scylla-nodetool: add {json,yaml}_writer
2024-02-23 08:13:07 +02:00
Kefu Chai
3a3f0d392f gms/versioned_value: impl operator<<(.., const gms::versioned_value) using fmt
less repetition this way. this is also a follow-up change to
cb781c0ff7.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17390
2024-02-23 08:11:03 +02:00
Kefu Chai
62abf89312 sstables: add fmt::formatter for deletion_time
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `sstables::deletion_time`,
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 13:56:32 +08:00
Kefu Chai
a5a757387a sstable: add fmt::formatter for indexable_element
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `sstables::indexable_element`,
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 13:56:28 +08:00
Kefu Chai
5754b9eb08 sstables: add fmt::formatter for sstable_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `sstables::sstable_state`,
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 13:55:49 +08:00
Kefu Chai
9a32029a8f sstables: add fmt::formatter for sstables::bound_kind_m
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `sstables::bound_kind_m`,
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 13:55:22 +08:00
Kefu Chai
67c69be3c6 tools/scylla-nodetool: add an extra empty line inbetween reports
before this change, `toppartitions` does not print an empty line
after an empty sampling warning message. but
dtest/toppartitions_test.py actually split sampling reports with
two newlines, so let's appease it. the output also looks better
this way, as the samplings for READS and WRITES are always visually
separated with an empty line.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 12:57:51 +08:00
Kefu Chai
381c389b56 tools/scylla-nodetool: tighten the param check for toppartitions
the test cases in `test_any_of_required_parameters_is_missing`
assume that we should pass either all positional arguments or
none of them, otherwise nodetool should fail. but `scylla nodetool`
accepted a partial set of positional arguments.

to be more consistent with the expected behavior, in this change,
we enforce the sanity check so that we only accept either all
positional args or none of them. the corresponding test is added.
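
A minimal sketch of such an all-or-none check, written in Python for illustration only (the real check lives in the C++ nodetool). `check_positional_args` is a hypothetical helper, and the count of three positional arguments is an assumption of this example.

```python
def check_positional_args(provided, expected_count):
    """Accept either every positional argument or none of them."""
    if len(provided) not in (0, expected_count):
        raise ValueError(
            f"expected all {expected_count} positional arguments or none, "
            f"got {len(provided)}"
        )
    return provided


# Assumed for the example: three positional arguments in total.
print(check_positional_args([], 3))
print(check_positional_args(["ks", "tbl", "5000"], 3))
try:
    check_positional_args(["ks"], 3)
except ValueError as e:
    print("rejected:", e)
```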

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 12:57:51 +08:00
Kefu Chai
3835ebfcdc utils/managed_bytes: add fmt::formatters for managed_bytes and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* managed_bytes
* managed_bytes_view
* managed_bytes_opt

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 11:32:41 +08:00
Kefu Chai
3d9054991b utils/logalloc: add fmt::formatter for occupancy_stats
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `occupancy_stats`, and
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 11:32:41 +08:00
Avi Kivity
bf107dae84 test/unit: add fmt::formatter for tree_test_key_base
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for the classes derived from `tree_test_key_base`

(this change was extracted from a larger change at #15599)

Refs #13245
2024-02-23 10:52:12 +08:00
Kefu Chai
a70318e722 test: add printer for type for BOOST_REQUIRE_EQUAL
after dropping the operator<< for vector, we would not be able to
use BOOST_REQUIRE_EQUAL to compare vector<>. to be prepared for this,
let's define the printer for Boost.Test

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 10:52:12 +08:00
Kefu Chai
63396f780d test: add fmt::formatters
the operator<< for `cql3::expr::test_utils::mutation_column_value` is
preserved, as it is used by test/lib/expr_test_utils.cc, which prints
std::map<sstring, cql3::expr::test_utils::mutation_column_value> using
the homebrew generic formatter for std::map<>. and the formatter uses
operator<< for printing the elements in map.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 10:52:12 +08:00
Kefu Chai
2ccd9e695d test/perf: add fmt::formatters for scheduling_latency_measurer and perf_result
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* scheduling_latency_measurer
* perf_result

and drop their operator<<:s

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 10:17:50 +08:00
Lakshmi Narayanan Sreethar
c76871aa65 replica/table : add method to get compaction group id for an sstable
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 01:07:54 +05:30
Lakshmi Narayanan Sreethar
9fffd8905f compaction: reshape: update total reshaped size only on success
The total reshaped size should only be updated on reshape success and
not after reshape has failed due to some exception.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 01:07:54 +05:30
Lakshmi Narayanan Sreethar
4fb099659a compaction: simplify exception handling in shard_reshaping_compaction_task_impl::run
Catch and handle the exceptions directly instead of rethrowing and
catching again.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 01:07:54 +05:30
Pavel Emelyanov
5682e51a97 test.py: Add test-case splitting in 'name' selection
When filtering a test by 'name' consider that name can be in a
'test::case' format. If so, get the left part to be the filter and the
right part to be the case name to be passed down to test itself.

Later, when the pytest starts it then appends the case name (if not
None) to the pytest execution, thus making it run only the specified
test-case, not the whole test file.
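
A minimal sketch of the splitting, assuming two hypothetical helpers, `split_test_name` and `pytest_target`. It is not the actual test.py code, and the file and case names are made up for the example.

```python
def split_test_name(name):
    """Split a 'test::case' selector into (test filter, case name).

    Hypothetical helper for illustration; the case is None when the
    selector names a whole test file.
    """
    test, sep, case = name.partition("::")
    return test, (case if sep else None)


def pytest_target(path, case):
    """Append the case name to the pytest target only when one was given."""
    return f"{path}::{case}" if case is not None else path


print(split_test_name("test_tablets::test_reshape"))
print(pytest_target("test_tablets.py", "test_reshape"))
```

Using `str.partition` keeps the no-separator case simple: the whole name stays the filter and the case is reported as None.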

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 19:24:10 +03:00
Pavel Emelyanov
b64710b0c6 test.py: Add casename argument to PythonTest
And propagate it from add_test() helper. For now keep it None, next
patch will bring more sense to this place

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 19:23:06 +03:00
Amnon Heiman
8859b4d991 Adding scripts/metrics-config.yml
The scripts/metrics-config.yml is a configuration file used by
get_description.py. It covers the places in the code that use a
non-standard way of defining metrics.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-02-22 17:15:30 +02:00
Amnon Heiman
4e67a98a21 Adding scripts/get_description.py to fetch metrics description
The get_description script parses a C++ file and searches for metric
declarations and their descriptions.

It creates a pipe-delimited file with the metric name, metric family
name, description and location in the file.

To find all descriptions in all files:
find . -name "*.cc" -exec grep -l '::description' {} \; | xargs -i ./get_description.py {}

While many of the metrics are defined in the form of
_metrics.add_group("hints_manager", {
        sm::make_gauge("size_of_hints_in_progress", _stats.size_of_hints_in_progress,
                        sm::description("Size of hinted mutations that are scheduled to be written.")),

Some metric declarations use variables and string formatting.
The script uses a configuration file to translate parameters and
concatenations to the actual names.
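
For the standard form shown above, the extraction could be sketched like this. It is a simplified illustration, not the contents of get_description.py: the regular expressions, the `describe_metrics` helper, the `hints.cc` file name, the column order and the way the family name is composed from group and metric name are all assumptions of this example.

```python
import re

# Simplified pattern: a make_* call whose first argument is the metric
# name, followed later by an sm::description("...") literal.
METRIC_RE = re.compile(
    r'sm::make_\w+\(\s*"(?P<name>[^"]+)".*?sm::description\(\s*"(?P<desc>[^"]*)"\s*\)',
    re.DOTALL,
)
GROUP_RE = re.compile(r'add_group\(\s*"(?P<group>[^"]+)"')


def describe_metrics(source, filename):
    """Yield 'name|family|description|file:line' rows (assumed layout)."""
    group_match = GROUP_RE.search(source)
    group = group_match.group("group") if group_match else ""
    for m in METRIC_RE.finditer(source):
        # 1-based line number of the make_* call inside the source text.
        line = source.count("\n", 0, m.start()) + 1
        family = f"{group}_{m.group('name')}" if group else m.group("name")
        yield "|".join([m.group("name"), family, m.group("desc"), f"{filename}:{line}"])


sample = '''_metrics.add_group("hints_manager", {
        sm::make_gauge("size_of_hints_in_progress", _stats.size_of_hints_in_progress,
                        sm::description("Size of hinted mutations that are scheduled to be written.")),
'''
rows = list(describe_metrics(sample, "hints.cc"))
for row in rows:
    print(row)
```

Declarations built from variables or format strings would not match such a pattern, which is the gap the configuration file is meant to cover.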

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-02-22 17:06:26 +02:00
Anna Stuchlik
14a4fa16a8 doc: add placeholder for Enable Raft topology page
This commit adds a placeholder for the Enable Raft-based Topology page
in the 5.4-to-6.0 upgrade guide.
This page needs to be referenced from other pages in the docs.
2024-02-22 16:02:06 +01:00
Pavel Emelyanov
5afaa03241 test/object_store: Remove unused managed_cluster (and other stuff)
Now all test cases use pylib manager client to manipulate cluster
While at it -- drop more unused bits from suite .py files

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:40:25 +03:00
Kefu Chai
57c408ab5d alternator: add fmt::formatter for alternator::parsed::path
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `alternator::parsed::path`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17458
2024-02-22 16:40:01 +02:00
Pavel Emelyanov
95ed46e26a test/object_store: Use tmpdir fixture in flush-retry case
Now that the test case in question is not using ManagerCluster, there's
no point in using test_tempdir either and the temporary object-store
config can be generated in generic temporary directory

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:39:30 +03:00
Pavel Emelyanov
252688fe0c test/object_store: Turn flush-retry case to use ManagerClient
In the middle, this test case needs to force the scylla server to reload its
configs. Currently the manager API requires that some existing config option
is provided as an argument, but in this test case scylla.yaml remains
intact. So it satisfies the API with a non-changing option.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:32:34 +03:00
Pavel Emelyanov
e742906f1f test/object_store: Turn "misconfigured" case to use ManagerClient
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:32:34 +03:00
Pavel Emelyanov
857b48f950 test/object_store: Turn garbage-collect case to use ManagerClient
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:32:34 +03:00
Pavel Emelyanov
d27b91cfb4 test/object_store: Turn basic case to use ManagerClient
This case is a bit tricky, as it needs to know where scylla's workdir
is, so it replaces the use of test_tempdir with the call to manager to
get server's workdir.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:32:34 +03:00
Avi Kivity
67f8dc5a7c Merge 'mutation: add fmt::formatter for clustering_row, row_tombstone and friends' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* row_tombstone
* row_marker
* deletable_row::printer
* row::printer
* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer

and drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17461

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for clustering_row and friends
  mutation: add fmt::formatter for row_tombstone and friends
2024-02-22 16:16:26 +02:00
Pavel Emelyanov
89d0704d9b test/object_store: Prepare to work with ManagerClient
This includes

- marking the suite as Topology
- import needed fixtures and options from topology conftest
- configuring the zero initial cluster size and anonymous auth
- marking all test cases as skipped, as they no longer work after above

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:02:05 +03:00
Aleksandra Martyniuk
4530be9e5b test: add test to check if reader is closed
Add test to check if reader is closed in sstable::has_partition_key.
2024-02-22 14:53:14 +01:00
Aleksandra Martyniuk
5227336a32 sstables: close index_reader in has_partition_key
If index_reader isn't closed before it is destroyed, then ongoing
sstables reads won't be awaited and assertion will be triggered.

Close index_reader in has_partition_key before destroying it.
2024-02-22 14:53:07 +01:00
Yaron Kaikov
6d07f7a0ea Add mergify (https://mergify.com/) configuration file
In this PR we introduce the .mergify.yml configuration file, which
includes a set of rules that we will use for automating our backport
process.
For each supported OSS release (currently 5.2 and 5.4) we have an almost
identical configuration section which includes the four conditions before
we open a backport pr:

* PR should be closed
* PR should have the proper label. for example: backport/5.4 (we can
have multiple labels)
* Base branch should be master
* PR should be set with a promoted label - this condition will be set
automatically once the commits are promoted to the master branch (passed
gating)

Once all conditions are met, the verify bot will open a backport PR and
will assign it to the author of the original PR, then CI will start
running, and only after it passes do we merge.
2024-02-22 14:28:08 +02:00
Nadav Har'El
b0233c0833 Merge 'interval: rename nonwrapping_interval to interval' from Avi Kivity
Our interval template started life as `range`, and supported wrapping to follow Cassandra's convention of wrapping around the maximum token.

We later recognized that an interval type should usually be non-wrapping and split it into wrapping_range and nonwrapping_range, with `range` aliasing wrapping_range to preserve compatibility.

Even later, we realized the name was already taken by C++ ranges and so renamed it to `interval`. Given that intervals are usually non-wrapping, the default `interval` type is non-wrapping.

We can now simplify it further, recognizing that everyone assumes that an interval is non-wrapping and so doesn't need the nonwrapping_interval_designation. We just rename nonwrapping_interval to `interval` and remove the type alias.

Closes scylladb/scylladb#17455

* github.com:scylladb/scylladb:
  interval: rename nonwrapping_interval to interval
  interval: rename interval_test to wrapping_interval_test
2024-02-22 14:03:43 +02:00
Kefu Chai
8afdc503b8 cdc: s/string_view/std::string_view/
in af2553e8, we added formatters for cdc::image_mode and
cdc::delta_mode. but in that change, we failed to qualify `string_view`
with `std::` prefix. even though it compiles, it depends on a `using
std::string_view` or a more error-prone `using namespace std`,
neither of which should be relied on. so, in this change, we
add the `std::` prefix to `string_view`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17459
2024-02-22 13:49:19 +02:00
Avi Kivity
35b700a884 Merge 'compaction: add fmt::formatter for types' from Kefu Chai
* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode`
* `formatted_sstables_list`

Refs #13245

Closes scylladb/scylladb#17439

* github.com:scylladb/scylladb:
  compaction: add formatter for formatted_sstables_list
  compaction: add fmt::formatter for compaction_type and friends
2024-02-22 13:48:30 +02:00
Pavel Emelyanov
027282ee07 perf_simple_query: Add --memtable-partitions option
There's the --partitions one that specifies how many partitions the test
would generate before measuring. When --bypass-cache option is in use,
thus making the test always engage sstables readers, it makes sense to
add some control over sstables granularity. The new option suggests that
during population phase, memtable gets flushed every $this-number
partitions, not just once at the end (and unknown amount of times in the
middle because of dirty memory limit).
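
The population loop with periodic flushes can be pictured as below. This is a Python sketch only, not the C++ code of perf_simple_query; `populate`, `write` and `flush` are hypothetical names.

```python
def populate(n_partitions, flush_every, write, flush):
    """Write partitions, flushing the memtable after every flush_every of them."""
    for i in range(1, n_partitions + 1):
        write(i)
        if i % flush_every == 0:
            flush()
    if n_partitions % flush_every != 0:
        flush()  # final flush for the trailing partial batch


events = []
populate(5, 2, lambda i: events.append(f"w{i}"), lambda: events.append("flush"))
print(events)
```

A smaller flush interval yields more, smaller sstables for the same number of partitions, which is the granularity knob the option provides.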

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 14:44:17 +03:00
Pavel Emelyanov
fd4c2e607e perf_simple_query: Disable auto compaction
Usually a perf test doesn't expect that some activity runs in the
background without controls. Compaction is one of a kind, so it makes
sense to keep it off while running the measurement.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 14:43:23 +03:00
Pavel Emelyanov
74899f71de perf_simple_query: Keep number of initial tablets in output json
When producing the output json file, keep how many initial tablets were
requested (if at all) next to other workload parameters

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 14:42:39 +03:00
Kefu Chai
643c01fd80 locator: fix typo in comment -- s/slecting/selecting/
fix a typo

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17470
2024-02-22 13:28:18 +02:00
Avi Kivity
89f86962f5 Merge 'streaming: add fmt::formatter for stream_session_state and stream_request' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* `streaming::stream_request`,
* `stream_session_state`

and drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17464

* github.com:scylladb/scylladb:
  streaming: add fmt::formatter for streaming::stream_request
  streaming: add fmt::formatter for stream_session_state
2024-02-22 13:04:02 +02:00
Kefu Chai
5c0952ab59 compaction: add fmt::formatter for compaction_type and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode`

and drop their operator<<:s.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17441
2024-02-22 13:02:37 +02:00
Kamil Braun
3d15fecf12 Merge 'amend cluster_status_table virtual table to work with raft' from Gleb
The cluster_status_table virtual table has a status field for each node. In
gossiper mode the status is taken from the gossiper, but with raft the
states are different and are stored in the topology state machine. The
series fixes the code to check the current mode and take the status from
the correct place.

Refs scylladb/scylladb#16984

* 'gleb/cluster_status_table-v1' of github.com:scylladb/scylla-dev:
  gossiper: remove unused REMOVAL_COORDINATOR state
  virtual_tables: take node state from raft for cluster_status_table table if topology over raft is enabled
  virtual_tables: create result for  cluster_status_table read on shard 0
2024-02-22 11:47:57 +01:00
Kamil Braun
3ee56e1936 Merge 'raft topology: enable writes to previous CDC generations' from Patryk Jędrzejczak
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.

In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. This PR adjusts the Raft-based topology
to ensure all required generations are loaded into memory and their
data isn't cleared too early.

To load all required generations into memory, we replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
To ensure this set doesn't grow endlessly, we remove an entry from
this set together with the data in CDC_GENERATIONS_V3.

Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.
The new solution introduced in this PR is to clear data of the
generations that finished operating more than 24 hours ago.
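
The new clearing rule can be illustrated with a small sketch. It assumes that a generation stops operating at the moment its successor starts, and it uses a hypothetical `obsolete_generations` helper working on start timestamps only; the real logic lives in the topology coordinator.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=24)


def obsolete_generations(start_times, now):
    """Return start times of generations that stopped operating over 24h ago.

    Assumes a generation stops operating when its successor starts, so the
    newest generation is never obsolete.
    """
    ordered = sorted(start_times)
    return [
        start
        for start, successor_start in zip(ordered, ordered[1:])
        if now - successor_start > RETENTION
    ]


now = datetime(2024, 2, 22, 12, 0, tzinfo=timezone.utc)
# Generations that started 72, 30 and 1 hours ago.
starts = [now - timedelta(hours=h) for h in (72, 30, 1)]
print(obsolete_generations(starts, now))
```

In this example only the oldest generation is cleared. The middle one is 30 hours old, so the previous age-based rule would have cleared it, yet it stopped operating just one hour ago and may still receive writes.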

Apart from the changes mentioned above, this PR hardens
`test_cdc_generation_clearing.py`.

Fixes scylladb/scylladb#16916
Fixes scylladb/scylladb#17184
Fixes scylladb/scylladb#17288

Closes scylladb/scylladb#17374

* github.com:scylladb/scylladb:
  test: harden test_cdc_generation_clearing
  test: test clean-up of committed_cdc_generations
  raft topology: clean committed_cdc_generations
  raft topology: clean only obsolete CDC generations' data
  storage_service: topology_state_load: load all committed CDC generations
  system_keyspace: load_topology_state: fix indentation
  raft topology: store committed CDC generations' IDs in the topology
2024-02-22 11:41:25 +01:00
Gleb Natapov
fe5853aacc storage_service: disable removenode --force in raft mode and deprecate it for gossiper mode
removenode --force is an unsafe operation and does not even make sense with
topology over raft. This patch disables it if raft is enabled and prints
a deprecation note otherwise. We already have a PR to remove it
(https://github.com/scylladb/scylladb/pull/15834), but it was decided
there that a deprecation period is needed for legacy use case.

Fixes: scylladb/scylladb#16293
2024-02-22 11:08:57 +01:00
Kefu Chai
37c6073fd5 mutation: add fmt::formatter for clustering_row and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer

and drop their operator<<:s

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-22 17:53:34 +08:00
Kefu Chai
9ee728dab9 scylla-gdb: use raw string when '\' is not used in an escape sequence
when '\' does not start an escape sequence, Python complains at seeing
it. but it continues anyway by considering '\' as a separate char.
but the warning message is still annoying:

```
scylla-gdb.py: 2417: SyntaxWarning: invalid escape sequence '\-'
  branches = (r" |-- ", " \-- ")
```

when sourcing this script.

so, let's mark these strings as raw strings.
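
A minimal illustration of why this is behavior-preserving (only the quoted tuple comes from scylla-gdb.py; the variable names `raw` and `escaped` are made up for this sketch): a raw string literal and a literal with an explicitly escaped backslash denote the same text, so only the warning goes away.

```python
# Raw string: the backslash is kept literally, no escape processing happens.
raw = r" \-- "
# Equivalent non-raw spelling with the backslash escaped explicitly.
escaped = " \\-- "

# Both literals produce the same five characters:
# space, backslash, '-', '-', space.
same_text = raw == escaped
length = len(raw)

# The tuple from the warning, with both members written as raw strings.
branches = (r" |-- ", r" \-- ")
print(branches, same_text, length)
```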

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17466
2024-02-22 09:03:26 +02:00
Kefu Chai
4ee2aee279 tools/scylla-nodetool: define operator<< for vector<sstring>
we already have generic operator<< based formatter for sequence-alike
ranges defined in `utils/to_string.hh`, but as a part of efforts to
address #13245, we will eventually drop the formatter.

to prepare for this change, we should create/find the alternatives
where the operator<< for printing the ranges is still used.
Boost::program_options is one of them. it prints the options' default
values using operator<< in its error message or usage. so in order
to keep it working, we define operator<< for `vector<sstring>` here.
if more types are required, we will need to generalize
this formatter. if there are more needs from different compilation
units, we might need to extract this helper into, for instance,
`utils/to_string.hh`. but we should do this after removing it.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17413
2024-02-22 09:01:04 +02:00
Kefu Chai
da7ffd4e73 tools/scylla-types: print using managed_bytes
instead of materializing the `managed_bytes_view` to a string, and
print it, print it directly to stdout. this change helps to deprecate
`to_hex()` helpers, we should materialize string only when necessary.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17463
2024-02-22 09:00:38 +02:00
Kefu Chai
f644ba9cdc streaming: add fmt::formatter for streaming::stream_request
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `streaming::stream_request`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-22 14:03:59 +08:00
Kefu Chai
618091f6f7 streaming: add fmt::formatter for stream_session_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`streaming::stream_session_state`, and drop its operator<<

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-22 14:03:59 +08:00
Kefu Chai
b61b5a8b5d mutation: add fmt::formatter for row_tombstone and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* row_tombstone
* row_marker
* deletable_row::printer
* row::printer

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-22 12:44:33 +08:00
Avi Kivity
51df8b9173 interval: rename nonwrapping_interval to interval
Our interval template started life as `range`, and supported
wrapping to follow Cassandra's convention of wrapping around the
maximum token.

We later recognized that an interval type should usually be non-wrapping
and split it into wrapping_range and nonwrapping_range, with `range`
aliasing wrapping_range to preserve compatibility.

Even later, we realized the name was already taken by C++ ranges and
so renamed it to `interval`. Given that intervals are usually non-wrapping,
the default `interval` type is non-wrapping.

We can now simplify it further, recognizing that everyone assumes
that an interval is non-wrapping and so doesn't need the
nonwrapping_interval_designation. We just rename nonwrapping_interval
to `interval` and remove the type alias.
2024-02-21 19:43:17 +02:00
Avi Kivity
e338f0e009 interval: rename interval_test to wrapping_interval_test
As preparation for reclaiming the name `interval` for nonwrapping_interval,
rename interval_test to wrapping_interval_test.
2024-02-21 19:38:53 +02:00
Avi Kivity
1df5697bd7 Merge 'Refine some api/column_family endpoints' from Pavel Emelyanov
Those that collect vectors with ks/cf names can reserve the vectors in advance. Also one of those can use range loop for shorter code

Closes scylladb/scylladb#17433

* github.com:scylladb/scylladb:
  api: Reserve vectors in advance
  api: Use range-loop to iterate keyspaces
2024-02-21 19:19:28 +02:00
Tomasz Grabiec
ef9e5e64a3 locator: token_metadata: Introduce topology barrier stall detector
When topology barrier is blocked for longer than configured threshold
(2s), stale versions are marked as stalled and when they get released
they report backtrace to the logs. This should help to identify what
was holding the token metadata pointer for too long.

Example log:

  token_metadata - topology version 30 held for 299.159 [s] past expiry, released at:  0x2397ae1 0x23a36b6 ...

Closes scylladb/scylladb#17427
2024-02-21 15:05:34 +02:00
Nadav Har'El
e02cfd0035 Merge 'query*.h: add fmt::formatter for types' from Kefu Chai
* query::specific_ranges
* query::partition_slice
* query::read_command
* query::forward_request
* query::forward_request::reduction_type
* query::forward_request::aggregation_info
* query::forward_result::printer
* query::result_set
* query::result_set_row
* query::result::printer

Refs #13245

Closes scylladb/scylladb#17440

* github.com:scylladb/scylladb:
  query-result.hh: add formatter for query::result::printer
  query-result-set: add formatter for query-result-set.hh types
  query-request: add formatter for query-request.hh types
2024-02-21 14:46:36 +02:00
Avi Kivity
4be70bfc2b Merge 'multishard_mutation_query: add tablets support' from Botond Dénes
When reading a list of ranges with tablets, we don't need a multishard reader. Instead, we intersect the range list with the local node's tablet ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets multishard_mutation_query.cc is no longer on the hot paths, range scans
on tables with tablets fork off to a different code-path in the coordinator. The only code using multishard_mutation_query.cc are forced, replica-local scans, like those used by SELECT * FROM MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests, so we optimize for simplicity, not performance.

Fixes: #16484

Closes scylladb/scylladb#16802

* github.com:scylladb/scylladb:
  test/cql-pytest: remove skip_with_tablets fixture
  test/cql-pytest: test_select_from_mutation_fragments.py parameterize tests
  test/cql-pytest: test_select_from_mutation_fragments.py: remove skip_with_tablets
  multishard_mutation_query: add tablets support
  multishard_mutation_query: remove compaction-state from result-builder factory
  multishard_mutation_query: do_query(): return foreign_ptr<lw_shared_ptr<result>>
  mutation_query: reconcilable_result: add merge_disjoint()
  locator: introduce tablet_range_spliter
  dht/i_partitioner: to_partition_range(): don't assume input is fully inclusive
  interval: add before() overload which takes another interval
2024-02-21 13:40:55 +02:00
Botond Dénes
94dac43b2f tools/utils: configure tools to use the epoll reactor backend
The default AIO backend requires AIO blocks. On production systems, all
available AIO blocks could have been already taken by ScyllaDB. Even
though the tools only require a single unit, we have seen cases where
not even that is available, ScyllaDB having siphoned all of the available
blocks.
We could try to ensure all deployments have some spare blocks, but it is
just less friction to not have to deal with this problem at all, by just
using the epoll backend. We don't care about performance in the case of
the tools anyway, so long as they are not unreasonably slow. And since
these tools are replacing legacy tools written in Java, the bar is low.

Closes scylladb/scylladb#17438
2024-02-21 11:58:09 +02:00
Kefu Chai
1263494dd1 query-result.hh: add formatter for query::result::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for following types

* query::result::printer

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 17:57:18 +08:00
Kefu Chai
e5a930e7c6 query-result-set: add formatter for query-result-set.hh types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for following types

* query::result_set
* query::result_set_row

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 17:54:48 +08:00
Kefu Chai
4383ca431c query-request: add formatter for query-request.hh types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for following types

* query::specific_ranges
* query::partition_slice
* query::read_command
* query::forward_request
* query::forward_request::reduction_type
* query::forward_request::aggregation_info
* query::forward_result::printer

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 17:54:41 +08:00
Kefu Chai
6408834e33 compaction: add formatter for formatted_sstables_list
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `formatted_sstables_list`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 17:45:45 +08:00
Kefu Chai
9969d88d82 compaction: add fmt::formatter for compaction_type and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode`

and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 17:45:40 +08:00
Kefu Chai
61308d51ef tools/scylla-nodetool: use {yaml,json}_writers in compactionhistory_operation
simpler this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 16:49:30 +08:00
Kefu Chai
e9e558534a tools/scylla-nodetool: add {json,yaml}_writer
so that we have less repetition when dumping the metrics. the repetition
is error-prone and not maintainable. also move them out into a separate
header, to keep this source file manageable -- it's now 3000 LOC. also,
by moving them out, we can reuse them in other subcommands without
moving them to the top of this source file.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 16:49:30 +08:00
Botond Dénes
ca585903b7 test/cql-pytest: remove skip_with_tablets fixture
All tests that used it are fixed, and we should not add any new tests
failing with tablets from now on, so remove.
2024-02-21 02:08:49 -05:00
Botond Dénes
8df82d4781 test/cql-pytest: test_select_from_mutation_fragments.py parameterize tests
To run with both vnodes and tablets. For this functionality, both
replication methods should be covered with tests, because it uses
different ways to produce partition lists, depending on the replication
method.

Also add scylla_only to those tests that were missing this fixture
before. All tests in this suite are scylla-only and with the
parameterization, this is even more apparent.
2024-02-21 02:08:49 -05:00
Botond Dénes
b09b949159 test/cql-pytest: test_select_from_mutation_fragments.py: remove skip_with_tablets
The underlying functionality was fixed, the tests should now pass with
tablets.
2024-02-21 02:08:49 -05:00
Botond Dénes
ce472b33b8 multishard_mutation_query: add tablets support
When reading a list of ranges with tablets, we don't need a multishard
reader. Instead, we intersect the range list with the local node's tablet
ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with
database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets,
multishard_mutation_query.cc is no longer on the hot paths, range scans
on tables with tablets fork off to a different code-path in the
coordinator. The only code using multishard_mutation_query.cc are
forced, replica-local scans, like those used by SELECT * FROM
MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests,
so we optimize for simplicity, not performance.
2024-02-21 02:08:48 -05:00
Botond Dénes
d160a179ee multishard_mutation_query: remove compaction-state from result-builder factory
This param was used by the query-result builder, to set the
last-position on end-of-stream. Instead, do this via a new ResultBuilder
method, maybe_set_last_position(), which is called from read_page(),
which has access to the compaction-state.
With this, the ResultBuilder can be created without a compaction-state
at hand. This will be important in the next patch.
2024-02-21 02:08:48 -05:00
Botond Dénes
95bc0cb1c0 multishard_mutation_query: do_query(): return foreign_ptr<lw_shared_ptr<result>>
Makes future patching easier.
2024-02-21 02:08:48 -05:00
Botond Dénes
35e6cbf42e mutation_query: reconcilable_result: add merge_disjoint()
Merging two disjoint reconcilable_result instances.
2024-02-21 02:08:48 -05:00
Botond Dénes
7bdd0c2cae locator: introduce tablet_range_spliter
Given a list of partition-ranges, yields the intersection of this
range-list with the tablet-ranges, for tablets located on the
given host.
This will be used in multishard_mutation_query.cc, to obtain the ranges
to read from the local node: given the read ranges, obtain the ranges
belonging to tablets that have replicas on the local node.
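
The intersection described above can be sketched roughly as follows. This is not the splitter's actual interface: the `Tablet` type, the `split_ranges` name, the integer token space and the half-open bounds are all assumptions made for illustration (the real code has to track bound inclusiveness).

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Tablet(NamedTuple):
    # Hypothetical model: a tablet owns the half-open token range [start, end).
    start: int
    end: int
    replicas: frozenset  # hosts that hold a replica of this tablet


def split_ranges(
    ranges: Iterable[Tuple[int, int]],
    tablets: Iterable[Tablet],
    host: str,
) -> Iterator[Tuple[int, int]]:
    """Yield the parts of `ranges` covered by tablets with a replica on `host`."""
    tablets = list(tablets)
    for r_start, r_end in ranges:
        for t in tablets:
            if host not in t.replicas:
                continue
            lo = max(r_start, t.start)
            hi = min(r_end, t.end)
            if lo < hi:  # skip empty intersections
                yield (lo, hi)


tablets = [
    Tablet(0, 100, frozenset({"A", "B"})),
    Tablet(100, 200, frozenset({"B", "C"})),
    Tablet(200, 300, frozenset({"A", "C"})),
]
on_a = list(split_ranges([(50, 250)], tablets, "A"))
on_b = list(split_ranges([(50, 250)], tablets, "B"))
print(on_a, on_b)
```

Each yielded range lies within a single tablet, so it can be read from one shard, which is what makes the sequential per-range read possible.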
2024-02-21 02:08:48 -05:00
Botond Dénes
4993d0e30a dht/i_partitioner: to_partition_range(): don't assume input is fully inclusive
Consider the inclusiveness of the token-range's start and end bounds and
copy the flag to the output bounds, instead of assuming they are always
inclusive.
2024-02-21 02:08:48 -05:00
Botond Dénes
239484f259 interval: add before() overload which takes another interval
The current point variant cannot take inclusiveness into account, when
said point comes from another interval bound.
This method had no tests at all, so add tests covering both overloads.
2024-02-21 02:08:48 -05:00
Avi Kivity
605bf6e221 range.hh: retire
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.

Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.

The unit tests are renamed and range.hh is deleted.

Closes scylladb/scylladb#17428
2024-02-21 00:24:25 +02:00
Wojciech Mitros
4c767c379c mv: adjust the overhead estimation for view updates
In order to avoid running out of memory, we can't
underestimate the memory used when processing a view
update. Particularly, we need to handle the remote
view updates well, because we may create many of them
at the same time in contrast to local updates which
are processed synchronously.

After investigating a coredump generated in a crash
caused by running out of memory due to these remote
view updates, we found that the current estimation
is much lower than what we observed in practice; we
identified overhead of up to 2288 bytes for each
remote view update. The overhead consists of:
- 512 bytes - a write_response_handler
- less than 512 bytes - excessive memory allocation
for the mutation in bytes_ostream
- 448 bytes - the apply_to_remote_endpoints coroutine
started in mutate_MV()
- 192 bytes - a continuation to the coroutine above
- 320 bytes - the coroutine in result_parallel_for_each
started in mutate_begin()
- 112 bytes - a continuation to the coroutine above
- 192 bytes - 5 unspecified allocations of 32, 32, 32,
48 and 48 bytes

This patch changes the previous overhead estimate
of 256 bytes to 2288 bytes, which should take into
account all allocations in the current version of the
code. It's worth noting that changes in the related
pieces of code may result in a different overhead.
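
As a quick check of the arithmetic, the listed components add up to the new estimate. The dictionary keys below are informal labels for this sketch, not identifiers from the code, and the bytes_ostream entry uses its stated upper bound of 512 bytes.

```python
# Per-remote-view-update allocations from the list above, in bytes.
overhead_bytes = {
    "write_response_handler": 512,
    "bytes_ostream excess (upper bound)": 512,
    "apply_to_remote_endpoints coroutine": 448,
    "continuation to apply_to_remote_endpoints": 192,
    "result_parallel_for_each coroutine": 320,
    "continuation to result_parallel_for_each": 112,
    "unspecified allocations": 32 + 32 + 32 + 48 + 48,
}

total = sum(overhead_bytes.values())
print(total)
```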

The allocations seem to be mostly captures for the
background tasks. Coroutines seem to allocate extra,
however testing shows that replacing a coroutine with
continuations may result in generating a few smaller
futures/continuations with a larger total size.
Besides that, considering that we're waiting for
a response for each remote view update, we need the
relatively large write_response_handler, which also
includes the mutation in case we needed to reuse it.

The change should not majorly affect workloads with many
local updates because we don't keep many of them at
the same time anyway, and an added benefit of correct
memory utilization estimation is avoiding evictions
of other memory that would be otherwise necessary
to handle the excessive memory used by view updates.

Fixes #17364

Closes scylladb/scylladb#17420
2024-02-21 00:05:49 +02:00
Tomasz Grabiec
e63d8ae272 Merge 'Handle tablet migration failure while streaming' from Pavel Emelyanov
It can happen that a node is lost during tablet migration involving that node. Migration will be stuck, blocking the topology state machine. To recover from this, the current procedure is for the admin to run nodetool removenode or replace the node. This marks the node as "ignored" and the tablet state machine can pick this up and abort the migration.

This PR implements the handling for the streaming stage only and adds a test for it. Checking other stages needs more work with failure injection to inject failures into specific barriers.

To handle streaming failure two new stages are introduced -- cleanup_target and revert_migration. The former is to clean the pending replica that could receive some data by the time streaming stopped working, the latter is like end_migration, but doesn't commit the new_replicas into replicas field.

refs: #16527

Closes scylladb/scylladb#17360

* github.com:scylladb/scylladb:
  test/topology: Add checking error paths for failed migration
  topology.tablets_migration: Handle failed streaming
  topology.tablets_migration: Add cleanup_target transition stage
  topology.tablets_migration: Add revert_migration transition stage
  storage_service: Rewrap cleanup stage checking in cleanup_tablet()
  test/topology: Move helpers to get tablet replicas to pylib
2024-02-20 18:50:55 +01:00
Anna Stuchlik
37237407f6 doc: remove info about outdated versions
This PR removes information about outdated versions, including disclaimers and information when a given feature was added.
Now that the documentation is versioned, information about outdated versions is unnecessary (and makes the docs harder to read).

Fixes https://github.com/scylladb/scylladb/issues/12110

Closes scylladb/scylladb#17430
2024-02-20 19:32:13 +02:00
Pavel Emelyanov
ceac65be1e api: Reserve vectors in advance
Some endpoints in api/column_family fill vectors with data obtained from
database and return them back. Since the amount of data is known in
advance, it's good to reserve the vector.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 19:13:05 +03:00
Pavel Emelyanov
f3e58cb806 api: Use range-loop to iterate keyspaces
The code uses standard for (;;) loop, but range version is nicer

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 19:12:12 +03:00
Avi Kivity
93af3dd69b Merge 'Maintenance socket: set filesystem permissions to 660' from Mikołaj Grzebieluch
Set filesystem permissions for the maintenance socket to 660 (previously it was 755) to allow the scyllaadm group to connect.
Split the logic of creating sockets into two separate functions, one for each case: when it is a regular cql controller or used by maintenance_socket.
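
For readers unfamiliar with socket-file permissions, here is a small standalone sketch of what mode 660 means for a Unix domain socket. It does not use ScyllaDB code; the socket file name and the temporary directory are made up for illustration.

```python
import os
import socket
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "maintenance.sock")  # hypothetical path
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        srv.bind(path)          # creates the socket file
        os.chmod(path, 0o660)   # rw for owner and group, nothing for others
        mode = stat.S_IMODE(os.stat(path).st_mode)
    finally:
        srv.close()

print(oct(mode))
```

With 660, members of the owning group can connect to the socket while all other users are denied.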

Fixes https://github.com/scylladb/scylladb/issues/16487.

Closes scylladb/scylladb#17113

* github.com:scylladb/scylladb:
  maintenance_socket: add option to set owning group
  transport/controller: get rid of magic number for socket path's maximal length
  transport/controller: set unix_domain_socket_permissions for maintenance_socket
  transport/controller: pass unix_domain_socket_permissions to generic_server::listen
  transport/controller: split configuring sockets into separate functions
2024-02-20 15:09:54 +02:00
Botond Dénes
73a3a3faf3 Merge 'tools/scylla-nodetool: implement tablestats' from Kefu Chai
Refs #15588

Closes scylladb/scylladb#17387

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement tablestats
  utils/rjson: add templated streaming_writer::Write()
2024-02-20 14:46:07 +02:00
Botond Dénes
8c228bffc8 Merge 'repair: accelerate repair load_history time' from Xu Chang
Use `parallel_for_each_table` instead of `for_each_table_gently` in
`repair_service::load_history` to reduce bootstrap time.
Use uuid_xor_to_uint32 when dispatching repair load_history to shards.

Ref: https://github.com/scylladb/scylladb/issues/16774

Closes scylladb/scylladb#16927

* github.com:scylladb/scylladb:
  repair: resolve load_history shard load skew
  repair: accelerate repair load_history time
2024-02-20 13:45:26 +02:00
Kefu Chai
b0bb3ab5b0 topology: print node* with node_printer
in da53854b66, we added formatter for printing a `node*`, and switched
to this formatter when printing `node*`. but we failed to update some
caller sites when migrating to the new formatter, where a
`unique_ptr<node>` is printed instead. this is not the behavior before
the change, and is not expected.

so, in this change, we explicitly instantiate `node_printer` instances
with the pointer held by `unique_ptr<node>`, to restore the behavior
before da53854b66.

this issue was identified when compiling the tree using {fmt} v10 and
compile-time format-string check enabled, which is yet to be upstreamed to
Seastar.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17418
2024-02-20 14:35:56 +03:00
Patryk Jędrzejczak
419354bc9f test: harden test_cdc_generation_clearing
In one of the previous patches, we fixed scylladb/scylladb#16916 as
a side effect. We removed
`system_keyspace::get_cdc_generations_cleanup_candidate`, which
contained the bug causing the issue.

Even though we didn't have to fix this issue directly, it showed us
that `test_cdc_generation_clearing` was too weak. If something went
wrong during/after the only clearing, the test still could pass
because the clearing was the last action in the test. In
scylladb/scylladb#16916, the CDC generation publisher was stuck
after the clearing because of a recurring error. The test wouldn't
detect it. Therefore, we harden the test by expecting two clearings
instead of one. If something goes wrong during the first clearing,
there is a high chance that the second clearing will fail. The new
test version wouldn't pass with the old bug in the code.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
2b724735d1 test: test clean-up of committed_cdc_generations
We extend `test_cdc_generation_clearing`. Now, it also tests the
clean-up of `TOPOLOGY.committed_cdc_generations` added in the
previous patch.

In the implementation, we harden the already existing
`check_system_topology_and_cdc_generations_v3_consistency`. After
the previous patch, data of every generation present in
`committed_cdc_generations` should be present in CDC_GENERATIONS_V3.
In other words, `committed_cdc_generations` should always be a
subset of a set containing generations in CDC_GENERATIONS_V3.
Before the previous patch, this wasn't true after the clearing, so
the new version of `test_cdc_generation_clearing` wouldn't pass
back then.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
7301d1317b raft topology: clean committed_cdc_generations
We clean `TOPOLOGY.committed_cdc_generations` from obsolete
generations to ensure this set doesn't grow endlessly. After this
patch, the following invariant will be true: if a generation is in
`committed_cdc_generations`, its data is in CDC_GENERATIONS_V3.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
b8aa74f539 raft topology: clean only obsolete CDC generations' data
Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.

The new solution is to clear data of the generations that
finished operating more than 24 hours ago. The rationale behind
it is in the new comment in
`topology_coordinator::clean_obsolete_cdc_generations`.
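
The difference between the two conditions can be sketched as follows. This is an illustration, not the coordinator's code: the function names, the `(id, timestamp)` tuples and the example dates are made up, and it assumes that a generation finishes operating when its successor's timestamp is reached.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)


def old_rule(committed, now):
    # Old condition: not the last committed generation and at least 24h old.
    return [gen for gen, start in committed[:-1] if start <= now - RETENTION]


def new_rule(committed, now):
    # New condition: the generation finished operating more than 24h ago.
    # Assumption: a generation stops operating when the next one starts.
    obsolete = []
    for (gen, _), (_, next_start) in zip(committed, committed[1:]):
        if next_start < now - RETENTION:
            obsolete.append(gen)
    return obsolete


# Hypothetical committed generations, sorted by timestamp.
now = datetime(2024, 2, 20, 12, 0)
committed = [
    ("g1", datetime(2024, 2, 17)),
    ("g2", datetime(2024, 2, 18)),
    ("g3", datetime(2024, 2, 20)),
]
cleared_old = old_rule(committed, now)
cleared_new = new_rule(committed, now)
print(cleared_old, cleared_new)
```

In this example g2 stopped operating only 12 hours before `now`, so it may still receive writes. The old condition would clear it and the new one keeps it.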

The previous solution used the clean-up candidate. After
introducing `committed_cdc_generations`, it became unneeded.
The last obsolete generation can be computed in
`topology_coordinator::clean_obsolete_cdc_generations`. Therefore,
we remove all the code that handles the clean-up candidate.

After changing how we clear CDC generations' data,
`test_current_cdc_generation_is_not_removed` became obsolete.
The tested feature is not present in the code anymore.

`test_dependency_on_timestamps` became the only test case covering
the CDC generation's data clearing. We adjust it after the changes.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
8b214d02fb storage_service: topology_state_load: load all committed CDC generations
We load all committed CDC generations into `cdc::metadata`. Since
we have allowed sending writes to the previous generations in
scylladb/scylladb#17134, the committed generations may be necessary
to handle a correct request.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
18cff1aa6a system_keyspace: load_topology_state: fix indentation
Broken in the previous patch.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
e145e758eb raft topology: store committed CDC generations' IDs in the topology
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.

In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. We need to adjust the Raft-based topology to ensure
all required generations are loaded into memory and their data
isn't cleared too early.

This patch is the first step of the adjustment. We replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
This set is sorted by timestamps, just like
`unpublished_cdc_generations`.

This patch is mostly refactoring. The last generation in
`committed_cdc_generations` is the equivalent of the previous
`current_cdc_generation_{uuid, timestamp}`. The other generations
are irrelevant for now. They will be used in the following patches.

After introducing `committed_cdc_generations`, a newly committed
generation is also unpublished (it was current and unpublished
before the patch). We introduce `add_new_committed_cdc_generation`,
which updates both sets of generations so that we don't have to
call `add_committed_cdc_generation` and
`add_unpublished_cdc_generation` together. It's easy to forget
that both of them are necessary. Before this patch, there was
no call to `add_unpublished_cdc_generation` in
`topology_coordinator::build_coordinator_state`. It was a bug
reported in scylladb/scylladb#17288. This patch fixes it.
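
The idea of a single helper that keeps both sets in step can be sketched like this. The class `TopologySketch` and its list-based storage are hypothetical; only the names `committed_cdc_generations`, `unpublished_cdc_generations` and `add_new_committed_cdc_generation` come from the patch.

```python
import bisect


class TopologySketch:
    """Toy model: both collections hold (timestamp, id) pairs sorted by timestamp."""

    def __init__(self):
        self.committed_cdc_generations = []
        self.unpublished_cdc_generations = []

    def add_new_committed_cdc_generation(self, timestamp, gen_id):
        # A newly committed generation is also unpublished, so one call
        # updates both collections and neither update can be forgotten.
        entry = (timestamp, gen_id)
        for generations in (
            self.committed_cdc_generations,
            self.unpublished_cdc_generations,
        ):
            if entry not in generations:
                bisect.insort(generations, entry)

    def last_committed(self):
        # Plays the role of the former "current" generation.
        if not self.committed_cdc_generations:
            return None
        return self.committed_cdc_generations[-1]


topo = TopologySketch()
topo.add_new_committed_cdc_generation(20, "gen-b")
topo.add_new_committed_cdc_generation(10, "gen-a")
print(topo.committed_cdc_generations, topo.last_committed())
```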

This patch also removes "the current generation" notion from the
Raft-based topology. For the Raft-based topology, the current
generation was the last committed generation. However, for the
`cdc::metadata`, it was the generation operating now. These two
generations could be different, which was confusing. For the
`cdc::metadata`, the current generation is relevant as it is
handled differently, but for the Raft-based topology, it isn't.
Therefore, we change only the Raft-based topology. The generation
called "current" is called "the last committed" from now.
2024-02-20 12:35:16 +01:00
Kefu Chai
c627d9134e tools/scylla-nodetool: implement tablestats
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-20 18:12:35 +08:00
Kefu Chai
a7a2cf64cc utils/rjson: add templated streaming_writer::Write()
so we can use it in a templated context.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-20 18:12:35 +08:00
Botond Dénes
050c6dcad7 api: storage_service/keyspaces: add replication filter
To allow filtering the returned keyspaces by the replication they
use: tablets or vnodes.
The filter can be disabled by omitting the parameter or passing "all".
The default is "all".
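
The filter semantics can be sketched as below. This models only the behavior described above: the function name, the dictionary input and the keyspace names are hypothetical, and it does not reproduce the REST endpoint's actual parameter handling.

```python
def filter_keyspaces(keyspaces, replication="all"):
    """Return keyspace names using the given replication method.

    keyspaces: hypothetical mapping of name -> "tablets" or "vnodes".
    replication: "tablets", "vnodes", or "all" (the default, no filtering).
    """
    if replication == "all":
        return sorted(keyspaces)
    if replication not in ("tablets", "vnodes"):
        raise ValueError(f"unknown replication filter: {replication!r}")
    return sorted(name for name, kind in keyspaces.items() if kind == replication)


keyspaces = {"ks_a": "tablets", "ks_b": "vnodes", "ks_c": "tablets"}
print(filter_keyspaces(keyspaces))
print(filter_keyspaces(keyspaces, "tablets"))
```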

Fixes: #16509

Closes scylladb/scylladb#17319
2024-02-20 09:04:41 +01:00
Kefu Chai
57ede58a64 raft: add fmt::formatter for raft::fsm
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `raft::fsm`, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17414
2024-02-20 09:02:02 +02:00
Kefu Chai
acefde0735 mutation: add fmt::formatter for mutation_partition::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `mutation_partition::printer`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17419
2024-02-20 09:01:22 +02:00
Kefu Chai
0b13de52de sstable/mx: add fmt::formatter for cached_promoted_index::promoted_index_block
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`cached_promoted_index::promoted_index_block`, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17415
2024-02-20 09:00:32 +02:00
Botond Dénes
2a494b6c47 Merge 'test/nodetool: parameterize test_ring' from Kefu Chai
so we exercise the cases where state and status are not "normal" and "up".

turns out the MBean is able to cache some objects. so the requests retrieving datacenter and rack are now marked `ANY`.

* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requests in the raised `AssertionError`. this
  should help with debugging.

Fixes #17401

Closes scylladb/scylladb#17417

* github.com:scylladb/scylladb:
  test/nodetool: parameterize test_ring
  test/nodetool: fail a test only with leftover expected requests
2024-02-20 08:48:11 +02:00
Anna Stuchlik
69ead0142d doc: remove outdated/invalid entries from FAQ
This commit removes outdated or invalid
FAQ entries specified in https://github.com/scylladb/scylladb/issues/16631

In addition, the questions about Cassandra compatibility
are removed as they are already answered on the forum:
https://forum.scylladb.com/t/which-cassandra-version-is-scylladb-it-compatible-with/84

Also, the incorrect entry about the cache has been removed
and the correct answer is added to the forum.
Fixes https://github.com/scylladb/scylladb/issues/17003

The question about troubleshooting performance issues
has also been removed, as it's already covered on the Forum.

Also, it removes the Apache copyright entry,
which should not be added to the FAQ page.

Closes scylladb/scylladb#17200
2024-02-20 08:43:58 +02:00
Anna Stuchlik
4f8f183736 doc: remove SSTable2json from the docs
This commit removes the SSTable2json documentation,
as well as the links to the removed page.

In addition, it adds a redirection for that page
to prevent 404.

Fixes https://github.com/scylladb/scylladb/issues/17204

Closes scylladb/scylladb#17340
2024-02-20 08:43:27 +02:00
Kefu Chai
64f9d90f7b tools/scylla-nodetool: implement toppartitions
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17357
2024-02-20 08:16:43 +02:00
Pavel Emelyanov
1440eddc58 test/topology: Add checking error paths for failed migration
For now only fail streaming stage and check that migration doesn't get
stuck and doesn't make tablet appear on dead node.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:59:06 +03:00
Pavel Emelyanov
cb02297642 topology.tablets_migration: Handle failed streaming
In case pending or leaving replica is marked as ignored by operator,
streaming cannot be retried and should jump to "cleanup_target" stage
after a barrier.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:59:06 +03:00
Pavel Emelyanov
72f3b1d5fe topology.tablets_migration: Add cleanup_target transition stage
The new stage will be used to revert a migration that fails at some
stage. The goal is to clean up the pending replica, which may have already
received some writes, by sending the cleanup RPC to the pending replica
and then jumping to the "revert_migration" stage introduced earlier.

If pending node is dead, the call to cleanup RPC is skipped.
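
A rough sketch of the stage handling described above, in Python for brevity. The function and parameter names are hypothetical; the real coordinator code is C++ and also involves barriers that are not modelled here.

```python
def run_cleanup_target(pending_node_alive, send_cleanup_rpc):
    """Handle the cleanup_target stage and return the next stage name.

    `send_cleanup_rpc` is a hypothetical callable standing in for the
    cleanup RPC sent to the pending replica.
    """
    if pending_node_alive:
        # The pending replica may have received writes, so clean it up.
        send_cleanup_rpc()
    # A dead pending node cannot be cleaned up, so the RPC is skipped.
    return "revert_migration"
```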

Coordinators use old replicas.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:59:06 +03:00
Pavel Emelyanov
ced5bf56eb topology.tablets_migration: Add revert_migration transition stage
It's like end_migration, but keeps the old replicas intact and just removes
the transition (including new replicas).

Coordinators use old replicas.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:53:36 +03:00
Pavel Emelyanov
a0a33e8be1 storage_service: Rewrap cleanup stage checking in cleanup_tablet()
Next patch will need to teach this code to handle new cleanup_target
stage, this change prepares this place for smoother patching

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:53:36 +03:00
Pavel Emelyanov
c06cbc391f test/topology: Move helpers to get tablet replicas to pylib
These are very useful and will be used across different test files soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:53:36 +03:00
Kefu Chai
3a94a7c1ff test/nodetool: parameterize test_ring
so we exercise the cases where state and status are not "normal" and "up".

turns out the MBean is able to cache some objects. so the requests
retrieving datacenter and rack are now marked `ANY`.

Fixes #17401
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-20 12:59:59 +08:00
Kefu Chai
3d8a6956fc test/nodetool: fail a test only with leftover expected requests
if there are unconsumed requests whose `multiple` is -1, we should
not consider them required: the test can consume them or not. if
it does not, we should not consider the test a failure just because
these requests are sitting at the end of the queue.

so, in this change, we

* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requests in the raised `AssertionError`. this
  should help with debugging.
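
a minimal sketch of the check, assuming a simplified request record. `ExpectedRequest` and `check_unconsumed` are hypothetical names for illustration; only the `ANY` marker and the value -1 come from the description above.

```python
from dataclasses import dataclass

ANY = -1  # the request may be consumed any number of times, including zero


@dataclass
class ExpectedRequest:
    method: str
    path: str
    multiple: int = 1


def check_unconsumed(unconsumed):
    """Fail only if a required expected request was left unconsumed."""
    leftover = [r for r in unconsumed if r.multiple != ANY]
    if leftover:
        # Listing the leftover requests makes the failure easier to debug.
        raise AssertionError(f"unconsumed expected requests: {leftover}")
```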

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-20 12:59:59 +08:00
Patryk Wrobel
82104b6f50 test_tablets: tablet count metric - remove assumption about tablets existence
The mentioned test failed on CI. It sets up two nodes and performs
operations related to creation and dropping of tables as well as
moving tablets. Locally, the issue was not visible - also, the test
was passing on CI in majority of cases.

One of steps in the test case is intended to select the shard that
has some tablets on host_0 and then move them to (host_1, shard_3).
It contains also a precondition that requires the tablets count to
be greater than zero - to ensure that the move_tablets operation really
moves tablets.

The error message in the failed CI run comes from the precondition
related to tablets count on (host0, src_shard) - it was zero.
This indicated that there were no tablets on entire host_0.

The following commit removes the assumption about the existence of
tablets on host_0. In case when there are no tablets there, the
procedure is rerun for host_1.

Now the logic is as follows:
 - find shard that has some tablets on host_0
 - if such shard does not exist, then find such shard on host_1
 - depending on the result of search set src/dest nodes
 - verify that reported tablet count metric is changed when
   move_tablet operation finishes
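
The selection logic above can be sketched like this. `pick_source` and the nested-dict layout of the tablet counts are assumptions made for the example, not the test's actual helpers.

```python
def pick_source(tablet_counts):
    """Pick (src_host, src_shard, dst_host) for the tablet move.

    `tablet_counts` is a hypothetical mapping {host: {shard: tablet_count}}
    with the hosts named "host_0" and "host_1".
    """
    # Prefer host_0 as the source; fall back to host_1 if it has no tablets.
    for src, dst in (("host_0", "host_1"), ("host_1", "host_0")):
        for shard, count in sorted(tablet_counts.get(src, {}).items()):
            if count > 0:
                return src, shard, dst
    raise RuntimeError("no tablets found on either host")
```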

Refs: scylladb#17386

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17398
2024-02-19 21:26:08 +01:00
Kefu Chai
3c84f08b93 alternator: add formatter for attribute_path_map_node<update_expression::action>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`attribute_path_map_node<update_expression::action>`, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17270
2024-02-19 20:09:11 +02:00
Gleb Natapov
f00ea36f63 gossiper: remove unused REMOVAL_COORDINATOR state
This is leftover from 66ff072540
2024-02-19 15:01:33 +02:00
Gleb Natapov
461bba08cb virtual_tables: take node state from raft for cluster_status_table table if topology over raft is enabled
If topology over raft is enabled the most up-to-date node status is in
the topology state machine. Get it from there.
2024-02-19 15:01:33 +02:00
Gleb Natapov
eb6fa81714 virtual_tables: create result for cluster_status_table read on shard 0
Next patch will access data that is available only on shard 0 during
result creation.
2024-02-19 15:01:33 +02:00
Petr Gusev
f83df24108 test_decommission: fix log messages
Closes scylladb/scylladb#17396
2024-02-19 12:09:43 +02:00
Mikołaj Grzebieluch
182cfebe40 maintenance_socket: add option to set owning group
Option `maintenance-socket-group` sets the owning group of the maintenance socket.
If not set, the group will be the same as the user running the scylla node.
2024-02-19 10:21:00 +01:00
Kefu Chai
34cc245da5 gms: add formatter for read_context::dismantle_buffer_stats
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`read_context::dismantle_buffer_stats`, and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17389
2024-02-19 09:43:53 +02:00
Kefu Chai
fe8e37c5bd configure.py: remove -Wno-unused-command-line-argument
`-Wno-unused-command-line-argument` is used to disable the warning of
`-Wunused-command-line-argument`, which is in turn used to emit
warnings if any of the command line arguments passed to the compiler
driver is not used. see
https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-command-line-argument
but it seems we are not passing unused command line arguments to
the compiler anymore. so let's drop this option.

this change helps to

* reduce the discrepancies between the compiling options used by
  CMake-generated rules and those generated directly using
  `configure.py`
* reenable the warning so we are aware if any of the options
  is not used by compiler. this could be a sign that the option fails
  to serve its purpose.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17195
2024-02-19 09:42:31 +02:00
Botond Dénes
42a76ca568 Merge 'Improve printing of nodes and backtraces in topology' from Pavel Emelyanov
There's a bunch of debug- and trace-level logging of locator::node-s that also include current_backtrace(). Printing node is done via debug_format() helper that generates and returns an sstring to print. Backtrace printing is not very lightweight on its own because of backtrace collecting. Not to slow things down in info log level, which is default, all such prints are wrapped with explicit if-s about log-level being enabled or not.

This PR removes those level checks by introducing lazy_backtrace() helper and by providing a formatter for nodes that also results in lazy node format string calculation.
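
The idea of deferring the expensive formatting until the logger actually needs it can be illustrated with a Python analogue. This is not the C++ `lazy_backtrace()` helper itself; `Lazy`, `ListHandler` and `log_with_backtrace` are hypothetical names used only for this sketch.

```python
import logging


class Lazy:
    """Wrap a callable so it only runs when the value is formatted."""

    def __init__(self, fn):
        self._fn = fn

    def __str__(self):
        return self._fn()


class ListHandler(logging.Handler):
    """Collect formatted log lines in memory."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def log_with_backtrace(level):
    """Log one debug message; return (lines, number of backtrace collections)."""
    calls = []

    def collect_backtrace():
        calls.append(1)  # stands in for the costly backtrace collection
        return "frame0 <- frame1"

    logger = logging.getLogger(f"topology_sketch.{level}")
    logger.propagate = False
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(level)
    # No explicit "is debug enabled?" check is needed at the call site.
    logger.debug("node updated at %s", Lazy(collect_backtrace))
    logger.removeHandler(handler)
    return handler.lines, len(calls)
```

With the logger at info level the backtrace is never collected; at debug level it is collected once, when the message is formatted.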

Closes scylladb/scylladb#17235

* github.com:scylladb/scylladb:
  topology: Restore indentation after previous patch
  topology: Drop if_enabled checks for logging
  topology: Add lazy_backtrace() helper
  topology: Add printer wrapper for node* and formatter for it
  topology: Expand formatter<locator::node>
2024-02-19 09:32:53 +02:00
Kefu Chai
47ec74ad1a tools/scylla-nodetool: implement ring
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17375
2024-02-19 09:30:01 +02:00
Anna Stuchlik
ef1468d5ec doc: remove Enterprise OS support from Open Source
With this commit:
- The information about ScyllaDB Enterprise OS support
  is removed from the Open Source documentation.
- The information about ScyllaDB Open Source OS support
  is moved to the os-support-info file in the _common folder.
- The os-support-info file is included in the os-support page
  using the scylladb_include_flag directive.

This update employs the solution we added with
https://github.com/scylladb/scylladb/pull/16753.
It allows content to be added to a page dynamically,
depending on the opensource/enterprise flag.

Refs https://github.com/scylladb/scylladb/issues/15484

Closes scylladb/scylladb#17310
2024-02-18 22:09:06 +02:00
Petr Gusev
1d6caa42b9 join_cluster: move was_decommissioned check earlier
Before the patch if a decommissioned node tries
to restart, it calls _group0->discover_group0 first
in join_cluster, which hangs since decommissioned
nodes are banned and other nodes don't respond
to their discovering requests.

We fix the problem by checking was_decommissioned()
flag before calling discover_group0.

fixes scylladb/scylladb#17282

Closes scylladb/scylladb#17358
2024-02-18 22:07:28 +02:00
Kefu Chai
9d666f7d29 cmake: add -Wextra to compiling options
this matches what we have in configure.py

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17376
2024-02-18 19:21:54 +02:00
Kefu Chai
cb781c0ff7 gms: add formatter for gms::versioned_value
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `gms::versioned_value`. its
operator<< is preserved, as it's still being used by the homebrew
generic formatter for std::unordered_map<gms::application_state,
gms::versioned_value>, which is in turn used in gms/gossiper.cc.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17366
2024-02-18 19:21:54 +02:00
Avi Kivity
43f1c3df2e Merge 'repair: Update repair history for tablet repair' from Asias He
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.

Fixes: #17046
Tests: test_tablet_repair_history

Closes scylladb/scylladb#17047

* github.com:scylladb/scylladb:
  repair: Update repair history for tablet repair
  repair: Extract flush hints code
2024-02-18 19:21:54 +02:00
Kefu Chai
8fc4243cf6 configure.py: do not include cxx_ldflags in cxxflags
ldflags are passed to ld (the linker), while cxxflags are passed to the
C++ compiler. the compiler does not understand the ldflags. if we
pass ldflags to it, it complains if `-Wunused-command-line-argument` is
enabled.

in this change, we do not include the ldflags in cxxflags. this helps
us to enable the warning option of `-Wunused-command-line-argument`,
so we don't need to disable it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17328
2024-02-18 19:21:54 +02:00
Avi Kivity
d257cc5003 Merge 'scylla-nodetool: implement the repair command' from Botond Dénes
As usual, the new command is covered with tests, which pass with both the legacy and the new native implementation.

Refs: #15588

Closes scylladb/scylladb#17368

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the repair command
  test/nodetool: utils: add check_nodetool_fails_with_error_contains()
  test/nodetool: util: replace flags with custom matcher
2024-02-18 19:21:54 +02:00
Petr Gusev
4ef5d92f50 gossiping_property_file_snitch_test: modernize + fix potential race
This is mostly a refactoring commit to make the test
more readable, as a byproduct of
scylladb/scylladb#17369 investigation.

We add the check for specific type of exceptions that
can be thrown (bad_property_file_error).

We also fix the potential race - the test may write
to res from multiple cores with no locks.

Closes scylladb/scylladb#17371
2024-02-18 19:21:53 +02:00
Kefu Chai
4812a57f71 gms: add formatter for gms::gossip_*
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

- gms::gossip_digest
- gms::gossip_digest_ack
- gms::gossip_digest_syn

and drop their operator<<:s

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17379
2024-02-18 19:21:53 +02:00
Patryk Wrobel
3842bf18a7 storage_service/range_to_endpoint_map: allow API to properly handle tablets
This API endpoint was failing when tablets were enabled
because of usage of get_vnode_effective_replication_map().
Moreover, it was providing an error message that was not
user-friendly.

This change extends the handler to properly service the incoming requests.
Furthermore, it introduces two new test cases that verify the behavior of
storage_service/range_to_endpoint_map API. It also adjusts the test case
of this endpoint for vnodes to succeed when tablets are enabled by default.

The new logic is as follows:
 - when tablets are disabled then users may query endpoints
   for a keyspace or for a given table in a keyspace
 - when tablets are enabled then users have to provide
   table name, because effective replication map is per-table

When user does not provide table name when tablets are enabled
for a given keyspace, then BAD_REQUEST is returned with a
meaningful error message.
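
A minimal sketch of the validation rule, assuming a simplified handler signature. `validate_request` and the error text are hypothetical; only the BAD_REQUEST status and the tablets/table-name rule come from the description above.

```python
from http import HTTPStatus


def validate_request(keyspace_uses_tablets, table=None):
    """Return (status, message) for a range_to_endpoint_map request."""
    if keyspace_uses_tablets and not table:
        # With tablets the effective replication map is per-table,
        # so a keyspace-only query cannot be answered.
        return (HTTPStatus.BAD_REQUEST,
                "table name is required for keyspaces that use tablets")
    return HTTPStatus.OK, ""
```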

Fixes: scylladb#17343

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17372
2024-02-18 19:21:53 +02:00
Kefu Chai
808f4d72fb storage_service: fix typos in comment
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17377
2024-02-18 19:21:53 +02:00
Botond Dénes
b11213e547 tools/scylla-nodetool: implement the upgradesstables command
Refs: #15588

Closes scylladb/scylladb#17370
2024-02-18 19:21:53 +02:00
Kefu Chai
af2553e8bc cdc: add formatter for cdc::image_mode and cdc::delta_mode
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
cdc::image_mode and cdc::delta_mode, and drop their operator<<:s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17381
2024-02-18 19:21:53 +02:00
Avi Kivity
9bb4482ad0 Merge 'cdc: metadata: allow sending writes to the previous generations' from Patryk Jędrzejczak
Before this PR, writes to the previous CDC generations would
always be rejected. After this PR, they will be accepted if the
write's timestamp is greater than `now - generation_leeway`.
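
The acceptance rule can be written down as a one-line predicate. This is a sketch only: the function name is hypothetical and all three values are assumed to be in the same time unit.

```python
def accepts_write_to_previous_generation(write_ts, now, generation_leeway):
    """True if a write to a previous CDC generation would be accepted."""
    # Accepted only when the write's timestamp is newer than now - leeway.
    return write_ts > now - generation_leeway
```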

This change was proposed around 3 years ago. The motivation was
to improve user experience. If a client generates timestamps by
itself and its clock is desynchronized with the clock of the node
the client is connected to, there could be a period during
generation switching when writes fail. We didn't consider this
problem critical because the client could simply retry a failed
write with a higher timestamp. Eventually, it would succeed. This
approach is safe because these failed writes cannot have any side
effects. However, it can be inconvenient. Writing to previous
generations was proposed to improve it.

The idea was rejected 3 years ago. Recently, it turned out that
there is a case when the client cannot retry a write with the
increased timestamp. It happens when a table uses CDC and LWT,
which makes timestamps permanent. Once Paxos commits an entry
with a given timestamp, Scylla will keep trying to apply that entry
until it succeeds, with the same timestamp. Applying the entry
involves writing to the CDC log table. If it fails, we get stuck.
It's a major bug with no known perfect solution.

Allowing writes to previous generations for `generation_leeway` is
a probabilistic fix that should solve the problem in practice.

Apart from this change, this PR adds tests for it and updates
the documentation.

This PR is sufficient to enable writes to the previous generations
only in the gossiper-based topology. The Raft-based topology
needs some adjustments in loading and cleaning CDC generations.
These changes won't interfere with the changes introduced in this
PR, so they are left for a follow-up.

Fixes scylladb/scylladb#7251
Fixes scylladb/scylladb#15260

Closes scylladb/scylladb#17134

* github.com:scylladb/scylladb:
  docs: using-scylla: cdc: remove info about failing writes to old generations
  docs: dev: cdc: document writing to previous CDC generations
  test: add test_writes_to_previous_cdc_generations
  cdc: generation: allow increasing generation_leeway through error injection
  cdc: metadata: allow sending writes to the previous generations
2024-02-18 19:21:53 +02:00
Asias He
796044be1c repair: Update repair history for tablet repair
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.

Fixes: #17046
Tests: test_tablet_repair_history
2024-02-18 10:21:58 +08:00
Asias He
e43bc775d0 repair: Extract flush hints code
So it can be used by tablet repair as well.
2024-02-18 09:42:02 +08:00
Kefu Chai
50964c423e hints: host_filter: add formatter for hints::host_filter
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `hints::host_filter`. its
operator<< is preserved as it's still used by the homebrew generic
formatter for vector<>, which is in turn used by db/config.cc.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17347
2024-02-16 19:03:11 +03:00
Anna Stuchlik
e132ffdb60 doc: add missing redirections
This commit adds the missing redirections
to the pages whose source files were
previously stored in the install-scylla folder
and were moved to another location.

Closes scylladb/scylladb#17367
2024-02-16 14:09:26 +02:00
Kefu Chai
47fec0428a tools/scylla-nodetool: return 1 when viewbuild not succeeds
this change introduces a new exception which carries the status code
so that an operation can return a non-zero exit code without printing
any errors. this mimics the behavior of "viewbuildstatus" command of
C* nodetool.
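
a rough sketch of the pattern, in Python rather than the C++ used by the tool. the class and function names are hypothetical, and the exit code 1 for ordinary errors is an assumption of the sketch.

```python
import sys


class OperationFailedWithStatus(Exception):
    """Carries an exit status; nothing should be printed for it."""

    def __init__(self, exit_status):
        super().__init__()
        self.exit_status = exit_status


def run_operation(operation):
    """Run `operation` and translate its outcome into an exit code."""
    try:
        operation()
    except OperationFailedWithStatus as e:
        # Silent failure: only the exit code tells the caller what happened.
        return e.exit_status
    except Exception as e:
        print(f"error: {e}", file=sys.stderr)
        return 1
    return 0
```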

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17359
2024-02-16 13:53:33 +02:00
Botond Dénes
8d8ea12862 tools/scylla-nodetool: implement the repair command 2024-02-16 04:42:08 -05:00
Botond Dénes
48e8435466 test/nodetool: utils: add check_nodetool_fails_with_error_contains()
Checks that at least one error snippet is contained in the error output.
2024-02-16 04:40:31 -05:00
Botond Dénes
190c9a7239 test/nodetool: util: replace flags with custom matcher
_do_check_nodetool_fails_with() currently has a `match_all` flag to
control how the match is checked. Now we need yet another way to control
how matching is done. Instead of adding yet another flag (and who knows
how many more), just replace the flag and the errors input with a matcher
functor, which gets the stdout and stderr and is delegated to do any
checks it wants. This method will scale much better going forward.
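
The matcher approach might look roughly like this. All names here are hypothetical and simplified; they are not the actual helpers in the test utilities.

```python
from collections import namedtuple

# Hypothetical stand-in for the result of running nodetool.
Result = namedtuple("Result", "returncode stdout stderr")


def match_all(snippets):
    """Matcher: every snippet must appear in stderr."""
    def matcher(stdout, stderr):
        return all(s in stderr for s in snippets)
    return matcher


def match_any(snippets):
    """Matcher: at least one snippet must appear in stderr."""
    def matcher(stdout, stderr):
        return any(s in stderr for s in snippets)
    return matcher


def check_fails_with(result, matcher):
    """Assert the command failed and its output satisfies the matcher."""
    assert result.returncode != 0, "command unexpectedly succeeded"
    assert matcher(result.stdout, result.stderr), "unexpected error output"
```

A new matching mode then only needs a new matcher function, not another flag on the checking helper.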
2024-02-16 04:40:31 -05:00
Yaron Kaikov
44edb89f79 [actions] Add a check for backport labels
As part of the Automation of ScyllaDB backports project, each PR should get either a backport/none or backport/X.Y label.
Based on this label we will automatically open a backport PR for the relevant OSS release.
In this commit, I am adding a GitHub action to verify if such a label was added.
This only applies to PRs with a base branch of master or next. For releases, we don't need this check.
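
A sketch of the label rule in Python. The function name and the exact label pattern are assumptions; the actual check is a GitHub action whose implementation is not shown here.

```python
import re


def backport_label_ok(labels, base_branch):
    """Return True if the PR satisfies the backport label rule."""
    if base_branch not in ("master", "next"):
        # Release branches are exempt from the check.
        return True
    # Assumed label shape: backport/none or backport/X.Y
    pattern = re.compile(r"backport/(none|\d+\.\d+)")
    return any(pattern.fullmatch(label) for label in labels)
```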
2024-02-15 22:40:09 +02:00
Avi Kivity
eedb997568 Merge 'compaction: upgrade: handle keyspaces that use tablets' from Lakshmi Narayanan Sreethar
Tables in keyspaces governed by replication strategy that uses tablets, have separate effective_replication_maps. Update the upgrade compaction task to handle this when getting owned key ranges for a keyspace.

Fixes #16848

Closes scylladb/scylladb#17335

* github.com:scylladb/scylladb:
  compaction: upgrade: handle keyspaces that use tablets
  replica/database: add an optional variant to get_keyspace_local_ranges
2024-02-15 21:31:54 +02:00
Kefu Chai
f0b3068bcf build: cmake: disable unused-parameter, missing-field-initializers and deprecated-copy
-Wunused-parameter, -Wmissing-field-initializers and -Wdeprecated-copy
warning options are enabled by -Wextra. the tree fails to build with
these options enabled. until we address them (if the warnings are genuine
problems), let's disable them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17352
2024-02-15 21:27:44 +02:00
Kamil Braun
50ebce8acc Merge 'Purge old ip on change' from Petr Gusev
When a node changes IP address we need to remove its old IP from `system.peers` and gossiper.

We do this in `sync_raft_topology_nodes` when the new IP is saved into `system.peers` to avoid losing the mapping if the node crashes between deleting and saving the new IP. We also handle the possible duplicates in this case by dropping them on the read path when the node is restarted.

The PR also fixes the problem with old IPs getting resurrected when a node changes its IP address.
The following scenario is possible: a node `A` changes its IP from `ip1` to `ip2` with restart, other nodes are not yet aware of `ip2` so they keep gossiping `ip1`. After restart `A` receives `ip1` in a gossip message and calls `handle_major_state_change` since it considers it as a new node. Then the `on_join` event is called on the gossiper notification handlers; `raft_ip_address_updater` receives this event and reverts the IP of node `A` back to `ip1`.

To fix this we ensure that the new gossiper generation number is used when a node registers its IP address in `raft_address_map` at startup.

The `test_change_ip` is adjusted to ensure that the old IPs are properly removed in all cases, even if the node crashes.

Fixes #16886
Fixes #16691
Fixes #17199

Closes scylladb/scylladb#17162

* github.com:scylladb/scylladb:
  test_change_ip: improve the test
  raft_ip_address_updater: remove stale IPs from gossiper
  raft_address_map: add my ip with the new generation
  system_keyspace::update_peer_info: check ep and host_id are not empty
  system_keyspace::update_peer_info: make host_id an explicit parameter
  system_keyspace::update_peer_info: remove any_set flag optimisation
  system_keyspace: remove duplicate ips for host_id
  system_keyspace: peers table: use coroutines
  storage_service::raft_ip_address_updater: log gossiper event name
  raft topology: ip change: purge old IP
  on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes
2024-02-15 17:40:29 +01:00
Nadav Har'El
6873a4772f tablets: add warning on CREATE KEYSPACE
The CDC feature is not supported on a table that uses tablets
(Refs #16317), so if a user creates a keyspace with tablets enabled
they may be surprised later (perhaps much later) when they try to enable
CDC on the table and can't.

The LWT feature always had issue Refs #5251, but it has become potentially
more common with tablets.

So it was proposed that as long as we have missing features (like CDC or
LWT), every time a keyspace is created with tablets it should output a
warning (a bona-fide CQL warning, not a log message) that some features
are missing, and if you need them you should consider re-creating the
keyspace without tablets.

This patch does this. It was surprisingly hard and ugly to find a place
in the code that can check the tablet-ness of a keyspace while it is
still being created, but I think I found a reasonable solution.

The warning text in this patch is the following (obviously, it can
be improved later, as we perhaps find more missing features):

   "Tables in this keyspace will be replicated using tablets, and will
    not support the CDC feature (issue #16317) and LWT may suffer from
    issue #5251 more often. If you want to use CDC or LWT, please drop
    this keyspace and re-create it without tablets, by adding AND TABLETS
    = {'enabled': false} to the CREATE KEYSPACE statement."

This patch also includes a test - that checks that this warning is
indeed generated when a keyspace is created with tablets (either by default
or explicitly), and not generated if the keyspace is created without
tablets.

Obviously, this entire patch - the warning and its test - can be reverted
as soon as we support CDC (and all other features) on tablets.

Fixes #16807

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-02-15 15:51:47 +02:00
Nadav Har'El
29b42e47e5 test/cql-pytest: fix guardrail tests to not be sensitive to more warnings
The guardrail tests check that certain guardrails enable and disable
certain warnings.

These tests currently check for the *number* of warnings returned by a
request, assuming that without the guardrail there would be no warning.
But in the following patch we plan to add an additional warning on
keyspace creation (that warns about tablets missing some features).
So the tests should check for whether or not a *specific* warning is
returned - not the count.

I only modified tests which the change in the next patch will break.
Tests which use SimpleStrategy and will not get the extra warning,
are unmodified and continue to use the old approach of counting
warnings.
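
The difference between the two approaches can be sketched as follows. `has_warning` is a hypothetical helper and the warning strings in the usage example are made up for illustration.

```python
def has_warning(warnings, fragment):
    """True if any returned warning contains the given text fragment."""
    # Some drivers report "no warnings" as None rather than an empty list.
    return any(fragment in w for w in (warnings or []))
```

For example, a check such as `has_warning(warnings, "replication factor")` keeps passing when an unrelated tablets warning is added to the response, whereas `len(warnings) == 1` would start to fail.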

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-02-15 15:08:08 +02:00
Lakshmi Narayanan Sreethar
7a98877798 compaction: upgrade: handle keyspaces that use tablets
Tables in keyspaces governed by replication strategy that uses tablets, have
separate effective_replication_maps. Update the upgrade compaction task to
handle this when getting owned key ranges for a keyspace.

Fixes #16848

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-15 17:47:39 +05:30
Lakshmi Narayanan Sreethar
8925a2c3cb replica/database: add an optional variant to get_keyspace_local_ranges
Add a new method database::maybe_get_keyspace_local_ranges that
optionally returns the owned ranges for the given keyspace if it has an
effective_replication_map for the entire keyspace.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-15 17:44:47 +05:30
Botond Dénes
22a5112bf1 tools/scylla-sstable-scripts: add keys.lua and largest-key.lua
I wrote these scripts to identify sstables with too large keys for a
recent investigation. I think they could be useful in the future,
certainly as further examples on how to write lua scripts for
scylla-sstable script.

Closes scylladb/scylladb#17000
2024-02-15 13:39:41 +02:00
Avi Kivity
5df5714331 Merge 'api: storage_service/natural_endpoints: add tablets support' from Botond Dénes
This API endpoint currently returns status 500 when called for a table which uses tablets. This series adds tablet support. No change in usage semantics is required, the endpoint already has a table parameter.
This endpoint is the backend of `nodetool getendpoints` which should now work, after this PR.

Fixes: #17313

Closes scylladb/scylladb#17316

* github.com:scylladb/scylladb:
  service/storage_service: get_natural_endpoints(): add tablets support
  replica/database: keyspace: add uses_tablets()
  service/storage_service: remove token overload of get_natural_endpoints()
2024-02-15 13:36:56 +02:00
Kefu Chai
caa20c491f storage_service: pass non-empty keyspace when performing cleanup_all
this change addresses the regression introduced by 5e0b3671, which
falls back to local cleanup in cleanup_all. but 5e0b3671 failed to
pass the keyspace to `shard_cleanup_keyspace_compaction_task_impl`
as its constructor parameter, that's why the test fails like
```
error executing POST request to http://localhost:10000/storage_service/cleanup_all with parameters {}: remote replied with status code 400 Bad Request:
Can't find a keyspace

```

where the string after "Can't find a keyspace" is empty.

in this change, the keyspace name of the keyspace to be cleaned is passed to
`shard_cleanup_keyspace_compaction_task_impl`.

we always enable the topology coordinator when performing testing,
that's why this issue does not pop up until the longevity test.

Fixes #17302
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17320
2024-02-15 13:17:45 +02:00
Aleksandra Martyniuk
cf36015591 repair: handle no_such_column_family from remote node gracefully
If no_such_column_family is thrown on remote node, then repair
operation fails as the type of exception cannot be determined.

Use repair::with_table_drop_silenced in repair to continue operation
if a table was dropped.
2024-02-15 12:06:47 +01:00
Aleksandra Martyniuk
2ea5d9b623 test: test drop table on receiver side during streaming 2024-02-15 12:06:47 +01:00
Aleksandra Martyniuk
b08f539427 streaming: fix indentation 2024-02-15 12:06:47 +01:00
Aleksandra Martyniuk
219e1eda09 streaming: handle no_such_column_family from remote node gracefully
If no_such_column_family is thrown on remote node, then streaming
operation fails as the type of exception cannot be determined.

Use repair::with_table_drop_silenced in streaming to continue
operation if a table was dropped.
2024-02-15 12:06:47 +01:00
Aleksandra Martyniuk
5202bb9d3c repair: add methods to skip dropped table
Schema propagation is async so one node can see the table while on
the other node it is already dropped. So, if the nodes stream
the table data, the latter node throws no_such_column_family.
The exception is propagated to the other node, but its type is lost,
so the operation fails on the other node.

Add method which waits until all raft changes are applied and then
checks whether given table exists.

Add a function which uses the above to determine whether an operation
failed because of a dropped table (e.g. on the remote node, so the exact
exception type is unknown). If so, the exception isn't rethrown.
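
A minimal Python sketch of the idea, not the actual C++ code: the
wrapper runs an operation and, if it fails with any exception, rethrows
only when the table still exists. RemoteError and the table_exists
callback are hypothetical; table_exists is assumed to wait for pending
schema changes before answering.

```python
import asyncio


class RemoteError(Exception):
    """Type-erased failure, as seen when the original exception type is lost."""


async def with_table_drop_silenced(table_exists, operation):
    """Run operation(); swallow its failure only if the table was dropped."""
    try:
        return await operation()
    except Exception:
        # Assumption: table_exists() first waits until schema changes are applied.
        if await table_exists():
            raise  # the table is still there, so the failure is genuine
        return None  # the table was dropped; treat the failure as benign
```

With a dropped table the wrapper returns None; with an existing table
the original exception propagates unchanged.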
2024-02-15 12:06:42 +01:00
Botond Dénes
811e931b09 Merge 'tools/scylla-nodetool: implement compactionstats and viewbuildstatus' from Kefu Chai
Refs #15588

Closes scylladb/scylladb#17344

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement viewbuildstatus
  tools/scylla-nodetool: implement compactionstats
2024-02-15 12:44:05 +02:00
Petr Gusev
c4140678ba test_change_ip: improve the test
In this commit we refactor test_change_ip to improve
it in several ways:
  * We inject failure before old IP is removed and verify
    that after restart the node sees the proper peers - the
    new IP for node2 and old IP for node3, which is not restarted
    yet.
  * We introduce the lambda wait_proper_ips, which checks not only the
    system.peers table, but also gossiper and token_metadata.
  * We call this lambda for all nodes, not only the first node;
    this allows us to validate that the node that has changed its
    IP has the proper IP of itself in the data structures above.

Note that we need to inject an additional delay ip-change-raft-sync-delay
before old IP is removed. Otherwise the problem stops reproducing - other
nodes remove the old IP before it's sent back to the just restarted node.
2024-02-15 13:26:02 +04:00
Petr Gusev
a068dba8c9 raft_ip_address_updater: remove stale IPs from gossiper
In the scenario described in the previous commit the
on_endpoint_change could be called with our previous IP.
We can easily detect this case - after add_or_update_entry
the IP for a given id in address_map hasn't changed. We
remove such IP from gossiper since it's not needed, and doing so
makes the test in the next commit more natural - all old
IPs are removed from all subsystems.
2024-02-15 13:25:56 +04:00
Petr Gusev
4b33ba2894 raft_address_map: add my ip with the new generation
The following scenario is possible: a node A changes its IP
from ip1 to ip2 with restart, other nodes are not yet aware of ip2
so they keep gossiping ip1, after restart A receives
ip1 in a gossip message and calls handle_major_state_change
since it considers it as a new node. Then on_join event is
called on the gossiper notification handlers, we receive
such event in raft_ip_address_updater and revert the IP
of the node A back to ip1.

The essence of the problem is that we don't pass the proper
generation when we add ip2 as a local IP during initialization
when node A restarts, so the zero generation is used
in raft_address_map::add_or_update_entry and the gossiper
message overwrites ip2 with ip1.

In this commit we fix this problem by passing the new generation.
To do that we move the increment_and_get_generation call
from join_token_ring to scylla_main, so that we have a new generation
value before init_address_map is called.
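
The effect of the generation can be illustrated with a small Python
model. This is a sketch only: the AddressMap class, its find() method
and the exact comparison rule (an update is dropped when its generation
is older than the stored one) are assumptions, not the real
raft_address_map code.

```python
class AddressMap:
    """Toy host_id -> (ip, generation) map; a hypothetical model only."""

    def __init__(self):
        self._entries = {}

    def add_or_update_entry(self, host_id, ip, generation=0):
        current = self._entries.get(host_id)
        # Assumed rule: ignore updates carrying an older generation.
        if current is not None and generation < current[1]:
            return False
        self._entries[host_id] = (ip, generation)
        return True

    def find(self, host_id):
        entry = self._entries.get(host_id)
        return None if entry is None else entry[0]
```

In this model, adding ip2 with the default zero generation lets a later
gossip entry for ip1 with generation 5 replace it, while adding ip2 with
generation 6 makes the same gossip entry a no-op.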

Also we remove the load_initial_raft_address_map function from
raft_group0 since it's redundant. The comment above its call site
says that it's needed to not miss gossiper updates, but
the function storage_service::init_address_map where raft_address_map
is now initialized is called before gossiper is started. This
function does both - it loads the previously persisted host_id<->IP
mappings from system.local and subscribes to gossiper notifications,
so there is no room for races.

Note that this problem is less likely to reproduce with the
'raft topology: ip change: purge old IP' commit - other
nodes remove the old IP before it's sent back to the
just restarted node. This is also the reason why this
problem doesn't occur in gossiper mode.

fixes scylladb/scylladb#17199
2024-02-15 13:21:04 +04:00
Petr Gusev
2bf75c1a4e system_keyspace::update_peer_info: check ep and host_id are not empty 2024-02-15 13:21:04 +04:00
Petr Gusev
86410d71d1 system_keyspace::update_peer_info: make host_id an explicit parameter
The host_id field should always be set, so it's more
appropriate to pass it as a separate parameter.

The function storage_service::get_peer_info_for_update
is updated. It shouldn't look for the host_id app
state in the passed map, instead the callers should
get the host_id on their own.
2024-02-15 13:21:04 +04:00
Petr Gusev
e0072f7cb3 system_keyspace::update_peer_info: remove any_set flag optimisation
This optimization never worked -- there were four usages of
the update_peer_info function and in all of them some of
the peer_info fields were set or should be set:
* sync_raft_topology_nodes/process_normal_node: e.g. tokens is set
* sync_raft_topology_nodes/process_transition_node: host_id is set
* handle_state_normal: tokens is set
* storage_service::on_change: get_peer_info_for_update could potentially
return a peer_info with all fields set to empty, but this shouldn't
be possible, host_id should always be set.

Moreover, there is a bug here: we extract host_id from the
states_ parameter, which represents the gossiper application
states that have been changed. This parameter contains host_id
only if a node changes its IP address, in all other cases host_id
is unset. This means we could end up with a record with empty
host_id, if it wasn't previously set by some other means.

We are going to fix this bug in the next commit.
2024-02-15 13:21:04 +04:00
Petr Gusev
4a14988735 system_keyspace: remove duplicate ips for host_id
When a node changes IP we call sync_raft_topology_nodes
from raft_ip_address_updater::on_endpoint_change with
the old IP value in prev_ip parameter.
It's possible that the node crashes right after
we insert a new IP for the host_id, but before we
remove the old IP. In this commit we fix the
possible inconsistency by removing the system.peers
record with old timestamp. This is what the new
peers_table_read_fixup function is responsible for.

We call this function in all system_keyspace methods
that read the system.peers table. The function
loads the table in memory, decides if some rows
are stale by comparing their timestamps and
removes them.

The new function also removes the records with no
host_id, so we no longer need the get_host_id function.
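
The read-path fixup described above can be sketched in Python as
follows. This is a simplified model under stated assumptions, not the
peers_table_read_fixup implementation: PeerRow and split_stale_rows are
hypothetical names, and a row is assumed to carry a single write
timestamp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PeerRow:
    ip: str
    host_id: Optional[str]
    timestamp: int


def split_stale_rows(rows):
    """Return (rows to keep, rows to delete) for a loaded peers table."""
    newest = {}  # host_id -> newest row seen so far
    stale = []
    for row in rows:
        if row.host_id is None:
            stale.append(row)  # records without host_id are dropped
            continue
        current = newest.get(row.host_id)
        if current is None:
            newest[row.host_id] = row
        elif row.timestamp > current.timestamp:
            stale.append(current)  # an older IP for the same host_id
            newest[row.host_id] = row
        else:
            stale.append(row)
    return list(newest.values()), stale
```

For two rows with the same host_id, only the one with the larger
timestamp is kept; the other is returned in the list of rows to delete.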

We'll add a test for the problem this commit fixes
in the next commit.
2024-02-15 13:21:04 +04:00
Petr Gusev
fa8718085a system_keyspace: peers table: use coroutines
This is a refactoring commit with no observable
changes in behaviour.

We switch the functions to coroutines, it'll
be easy to work with them in this way in the
next commit. Also, we add more const-s
along the way.
2024-02-15 13:21:04 +04:00
Petr Gusev
00547d3f48 storage_service::raft_ip_address_updater: log gossiper event name
It's useful for debugging.
2024-02-15 13:20:54 +04:00
Petr Gusev
6955cfa419 raft topology: ip change: purge old IP
When a node changes IP address we need to
remove its old IP from system.peers and
gossiper.

We do this in sync_raft_topology_nodes when
the new IP is saved into system.peers to avoid
losing the mapping if the node crashes
between deleting and saving the new IP. In the
next commit we handle the possible duplicates
in this case by dropping them on the read path.

In subsequent commits, test_change_ip will be
adjusted to ensure that old IPs are removed.

fixes scylladb/scylladb#16886
fixes scylladb/scylladb#16691
2024-02-15 13:19:13 +04:00
Petr Gusev
a2c0384cd1 on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes
We introduce the helper 'ensure_alive' which takes a
coroutine lambda and returns a wrapper which
ensures the proper lifetime for it.
It works by moving the input lambda onto the heap and
keeping the ptr alive until the resulting future
is resolved.

We also move the holder acquired from _async_gate
to the 'then' lambda closure, since now these closures
will be kept alive during the lambda coroutine execution.

We'll be adding more code to this lambda in the subsequent
commits, it's easier to work with coroutines.
2024-02-15 13:13:44 +04:00
Kefu Chai
f9d19a61ff tools/scylla-nodetool: implement viewbuildstatus
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-15 16:54:16 +08:00
Nadav Har'El
28db187756 alternator, tablets: return error if enabling TTL with tablets
Alternator TTL doesn't yet work on tables using tablets (this is
issue #16567). Before this patch, it can be enabled on a table with
tablets, and the result is a lot of log spam and nothing will get expired.

So let's make the attempt to enable TTL on a table that uses tablets
into a clear error. The error message points to the issue, and also
suggests how to create a table that uses vnodes, not tablets.

This patch also adds a test that verifies that trying to enable TTL
with tablets is an error. Obviously, this test should be removed
once the issue is solved and TTL begins working with tablets.

Refs #16567
Refs #16807

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17306
2024-02-15 10:47:06 +02:00
Kefu Chai
4da9a62472 utils: managed_bytes: fix typo in comment
s/assigments/assignments/

this misspelling was identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17333
2024-02-15 10:37:25 +02:00
Kefu Chai
8e8b73fa82 dht: add formatter for partition_range_view and i_partition
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`partition_range_view` and `i_partition`, and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17331
2024-02-15 09:46:03 +02:00
Lakshmi Narayanan Sreethar
3b7b315f6a replica/database: quiesce compaction before closing system tables during shutdown
During shutdown, as all system tables are closed in parallel, there is a
possibility of a race condition between compaction stoppage and the
closure of the compaction_history table. So, quiesce all the compaction
tasks before attempting to close the tables.

Fixes #15721

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#17218
2024-02-15 09:44:16 +02:00
Nadav Har'El
b97ded5c4a test/topology: tests for setting tombstone_gc on materialized view
A user asked on the ScyllaDB forum several questions on whether
tombstone_gc works on materialized views. This patch includes two
tests that confirm the following:

1. The tombstone_gc may be set on a view - either during its creation
   with CREATE MATERIALIZED VIEW or later with ALTER MATERIALIZED VIEW.

2. The tombstone_gc setting is correctly shown - for both base tables
   and views - by the "DESC" statement.

3. The tombstone_gc setting is NOT inherited from a base table to a new
   view - if you want this option on a view, you need to set it
   separately.

Unfortunately, this test could not be a single-node cql-pytest because
we forbid tombstone_gc=repair when RF=1, and since recently, we forbid
setting RF>1 on a single-node setup. So the new tests are written in
the test/topology framework - which may run multiple tests against
a single three-node cluster.

To write tests over a shared cluster, we need functions which create
temporary keyspaces, tables and views, which are deleted automatically
as soon as a test ends. The test/topology framework was lacking such
functions, so this test includes them - currently inside the test
file, but if other people find them useful they can be moved to a more
central location.

The new functions, new_test_keyspace(), new_test_table() and
new_materialized_view() are inspired by the identically-named
functions in test/cql-pytest/util.py, but the implementation is
different: Importantly, the new functions here are *async*
context managers, used via "async with", to fit with the rest
of the asynchronous code used in the topology test framework.
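
As an illustration of the pattern (a sketch, not the code added by this
patch), an async context manager for a temporary keyspace could look
like the following. FakeCql and its run_async() method are hypothetical
stand-ins for a driver session, and the generated keyspace name is
arbitrary.

```python
import asyncio
import itertools
from contextlib import asynccontextmanager

_counter = itertools.count()


class FakeCql:
    """Hypothetical stand-in for a driver session; it only records statements."""

    def __init__(self):
        self.statements = []

    async def run_async(self, stmt):
        self.statements.append(stmt)


@asynccontextmanager
async def new_test_keyspace(cql, opts):
    name = f"test_ks_{next(_counter)}"
    await cql.run_async(f"CREATE KEYSPACE {name} {opts}")
    try:
        yield name
    finally:
        # Runs even if the test body raises, so the shared cluster stays clean.
        await cql.run_async(f"DROP KEYSPACE {name}")


async def demo():
    cql = FakeCql()
    opts = "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3}"
    async with new_test_keyspace(cql, opts) as ks:
        await cql.run_async(f"CREATE TABLE {ks}.t (p int PRIMARY KEY)")
    return cql.statements
```

The "async with" form lets the cleanup statement be awaited like any
other query in the asynchronous test code.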

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17345
2024-02-15 09:43:30 +02:00
Kefu Chai
bcb144ada3 configure.py: disable stack-use-after-scope check only when ASan is enabled
`-fno-sanitize-address-use-after-scope` is used to disable the check for
stack-use-after-scope bugs, but this check is only performed when ASan
is enabled. if we pass this option when ASan is not enabled, we'd have the
following warning, so let's apply it only when ASan is enabled.

```
clang-16: error: argument unused during compilation:
'-fno-sanitize-address-use-after-scope' [-Werror,-Wunused-command-line-argument]
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17329
2024-02-15 09:28:29 +02:00
Botond Dénes
ca13ff10ea service/storage_service: get_natural_endpoints(): add tablets support
Also add a unit test for this API endpoint, testing it with both tablets
and vnodes.
2024-02-15 02:07:18 -05:00
Botond Dénes
7f17d3bb0e replica/database: keyspace: add uses_tablets()
Mirroring table::uses_tablets(), provides a convenient and -- more
importantly -- easily discoverable way to determine whether the keyspace
uses tablets or not.
This information is of course already available via the abstract
replication strategy, but as seen in a few examples, this is not easily
discoverable and sometimes people resorted to enumerating the keyspace's
tables to be able to invoke table::uses_tablets().
2024-02-15 01:51:26 -05:00
Botond Dénes
0b2acf90ff service/storage_service: remove token overload of get_natural_endpoints()
This overload does not work with tablets because it only has keyspace
and token parameters. The only caller is the other overload, which also
has a table parameter, so it can be made to work with tablets. Inline
this overload into the other and remove it, in preparation to fixing
this method for tablets.
2024-02-15 01:51:25 -05:00
Kefu Chai
68795eb8fa tools/scylla-nodetool: implement gossipinfo
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17317
2024-02-15 08:41:39 +02:00
Kefu Chai
a7abaa457b tools/scylla-nodetool: implement compactionstats
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-15 12:29:10 +08:00
Anna Stuchlik
710d182654 doc: update Handling Node Failures to add topology
This commit updates the Handling Node Failures page
to specify that the quorum requirement refers to both
schema and topology updates.

Closes scylladb/scylladb#17321
2024-02-14 17:15:13 +01:00
Kamil Braun
7e9e10186f Merge 'change the way ignored nodes are handled by the topology coordinator' from Gleb
This series makes several changes to how ignored nodes list is treated
by the topology coordinator. First the series makes it global and not
part of a single topology operation, second it extends the list at the
time of removenode/replace invocation and third it bans all nodes in
the list from contacting the cluster ever again.

The main motivation is to have a way to unblock tablet migration in case
of a node failure. Tablet migration knows how to avoid nodes in ignored
nodes list and this patch series provides a way to extend it without
performing any topology operation (which is not possible while tablet
migration runs).

Fixes scylladb/scylladb#16108

* 'gleb/ignore-nodes-handling-v2' of github.com:scylladb/scylla-dev:
  test: add test for the new ignore nodes behaviour
  topology coordinator: cleanup node_state::decommissioning state handling code
  topology coordinator: ban ignored nodes just like we ban nodes that are left
  storage_service: topology coordinator: validate ignore dead nodes parameters in removenode/replace
  topology coordinator: add removed/replaced nodes to ignored_nodes list at the request invocation time
  topology coordinator: make ignored_nodes list global and permanent
  topology_coordinator: do not cancel rebuild just because some other nodes are dead
  topology coordinator: throw more specific error from wait_for_ip() function in case of a timeout
  raft_group0: add make_nonvoters function that can make multiple node non voters simultaneously
2024-02-14 16:36:01 +01:00
Marcin Maliszkiewicz
0b8b9381f4 auth: drop const from methods on write path
In a follow-up patch abort_source will be used
inside those methods. Current pattern is that abort_source
is passed everywhere as non const so it needs to be
executed in non const context.

Closes scylladb/scylladb#17312
2024-02-14 13:24:53 +01:00
Tzach Livyatan
902733cd7e Docs: rename doc page from REST to Admin REST API
Closes scylladb/scylladb#17334
2024-02-14 13:49:54 +02:00
Kefu Chai
d43c418f72 tools/scylla-nodetool: implement getendpoints
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17332
2024-02-14 11:20:52 +02:00
Gleb Natapov
7802c206c7 test: add test for the new ignore nodes behaviour
The test checks that once a node is specified in ignored node list by
one topology operation the information is carried over to the next
operation as well.
2024-02-14 10:35:11 +02:00
Gleb Natapov
7ec9316774 topology coordinator: cleanup node_state::decommissioning state handling code
The code is shared between decommission and removenode and it has
scattered 'ifs' for different behaviours between those. Change it to
have only one 'if'.
2024-02-14 10:35:11 +02:00
Gleb Natapov
363af9e664 topology coordinator: ban ignored nodes just like we ban nodes that are left
Since a node that was at one point marked as dead, either via the
--ignore-dead-nodes parameter or by being a target for removenode or
replace, can no longer be made "undead", we need to make sure that it
cannot rejoin the cluster any longer. Do that by banning such nodes on the
messaging layer just like we do for nodes that have left.

Note that the removenode failure test had to be altered since it restarted
a node after removenode failure (which now will not work). Also, since
the check for liveness was removed from the topology coordinator (because
the node is already banned by then), the test case that triggers the
removed code is removed as well.
2024-02-14 10:35:06 +02:00
Kefu Chai
ab07fb25f5 scylla_raid_setup: reference xfsprog on the minimal 1024 block size
the quote of "The minimum block size for crc enabled filesystems is
1024" comes from the output of mkfs.xfs, let's quote the source for
better maintainability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17094
2024-02-14 08:44:14 +02:00
Michał Chojnowski
3d81138852 configure.py: don't modify modes in write_build_file()
The true motivation for this patch is a certain problem with configure.py
in scylla-enterprise, which can only be solved by moving the `extra_cxxflags`
lines before configure_seastar(). This patch does that by hoisting
get_extra_cxxflags() up to create_build_system().

But this patch makes sense even if we disregard the real motivation.
It's weird that a function called `write_build_file()` adds additional
build flags on its own.

Closes scylladb/scylladb#17189
2024-02-13 21:28:32 +02:00
Patryk Wrobel
a3fb44cbca Rename keyspace::get_effective_replication_map()
This commit renames keyspace::get_effective_replication_map()
to keyspace::get_vnode_effective_replication_map(). This change
is required to ease the analysis of the usage of this function.

When tablets are enabled, then this function shall not be used.
Instead of per-keyspace, per-table replication map should be used.
The rename was performed to distinguish between those two calls.
The next step will be an audit of usages of
keyspace::get_vnode_effective_replication_map().

Refs: scylladb#16626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17314
2024-02-13 20:22:02 +02:00
Nadav Har'El
5d4c60aee3 test/cql-pytest: avoid spurious guardrail warnings
All cql-pytest tests use one node, and unsurprisingly most use RF=1.
By default, as part of the "guardrails" feature, we print a warning
when creating a keyspace with RF=1. This warning gets printed on
every cql-pytest run, which creates a "boy who cried wolf" effect
whereby developers get used to seeing these warnings, and won't care
if new warnings start appearing.

The fix is easy - in run.py start Scylla with minimum-replication-factor-
warn-threshold set to -1 instead of the default 3.

Note that we do have cql-pytest tests for this guardrail, but those don't
rely on the default setting of this variable (they can't, cql-pytest
tests can also be run on a Scylla instance run manually by a developer).
Those tests temporarily set the threshold during the test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17274
2024-02-13 17:44:20 +02:00
Kefu Chai
b309e42195 collection_mutation: add formatter for collection_mutation_view::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`collection_mutation_view::printer`, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17300
2024-02-13 17:42:25 +02:00
Botond Dénes
120442231f Merge 'row_cache: test cache consistency during multi-partition cache updates' from Michał Chojnowski
Adds a test reproducing https://github.com/scylladb/scylladb/issues/16759, and the instrumentation needed for it.

Closes scylladb/scylladb#17208

* github.com:scylladb/scylladb:
  row_cache_test: test cache consistency during memtable-to-cache merge
  row_cache: use preemption_source in update()
  utils: preempt: add preemption_source
2024-02-13 17:37:06 +02:00
Kefu Chai
54ed65bb50 mutation: s/statics/static content/
codespell reports that "statics" could be the misspelling of
"statistics". but "static" here means the static column(s). so
replace "statics" with more specific wording.

Refs #589
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17216
2024-02-13 17:33:21 +02:00
Kefu Chai
9b6a66826c api/storage_service: add more constness to http_context parameter
when we just want to perform read access to `http_context`, there
is no need to use a non-const reference. so let's add `const` specifier
to make this explicit. this should help with the readability and
maintainability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17219
2024-02-13 17:32:45 +02:00
Lakshmi Narayanan Sreethar
f8f8d64982 test.py: support skipping multiple test patterns
Support skipping multiple patterns by allowing them to be passed via
multiple '--skip' arguments to test.py.

Example : `test.py --skip=topology --skip=sstables`
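
A repeatable option of this kind is commonly built with argparse's
"append" action. The sketch below shows that approach; parse_args,
should_skip and the substring matching are assumptions made for
illustration, not the actual test.py code.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of --skip into one list.
    parser.add_argument("--skip", action="append", dest="skip_patterns",
                        default=[], metavar="PATTERN")
    return parser.parse_args(argv)


def should_skip(test_name, skip_patterns):
    # Assumption: a test is skipped if any pattern is a substring of its name.
    return any(pattern in test_name for pattern in skip_patterns)
```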

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#17220
2024-02-13 17:32:03 +02:00
Kefu Chai
57d138b80f row_cache: s/fro/reader/
"fro" is the short of "from" but the value is an
`optimized_optional<flat_mutation_reader_v2>`. codespell considers
it a misspelling of "for" or "from". neither of them makes sense,
so let's change it to "reader" for better readability, and also to
silence the warning, so that genuine warnings can stand out.
this would help to make codespell more useful.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17221
2024-02-13 17:28:14 +02:00
Kefu Chai
c555af3cd8 raft: add formatter for raft::log
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `raft::log`, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17301
2024-02-13 17:17:57 +02:00
Anna Stuchlik
02cd84adbf doc: remove OSS-vs-Ent Matrix from OSS docs
This commit removes the Open Source vs. Enterprise matrix
from the Open Source documentation.

In addition, a redirection is added to prevent 404 in the OSS docs,
and the removed page is replaced with a link to the same page
in the Enterprise docs.

This commit must be reverted in enterprise.git, because
we want to keep the Matrix in the Enterprise docs.

Fixes https://github.com/scylladb/scylladb/issues/17289

Closes scylladb/scylladb#17295
2024-02-13 17:17:22 +02:00
Yaniv Kaul
d2ef100b60 Typos: more/less then -> more/less than
Fix repeated typos in comments: more then -> more than, less then -> less than

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#17303
2024-02-13 17:16:15 +02:00
Nadav Har'El
dce47a81b0 alternator, tablets: return error if enabling Streams with tablets
Alternator Streams doesn't yet work on tables using tablets (this is
issue #16317). Before this patch, an attempt to enable it results in
an unsightly InternalServerError, which isn't terrible - but we can
do better.

So in this patch, we make the attempt to enable Streams and tablets
together into a clear error. The error message points to the open issue,
and also suggests how to create a table that uses vnodes, not tablets.

Unfortunately, there are two slightly different code paths and error
messages for two cases: One case is the creation of a new table (where
the validation happens before the keyspace is actually created), and
the other case is an attempt to enable streams on an existing table
with an existing keyspace (which already might or might not be using
tablets).

This patch also adds a test that verifies that trying to enable Streams
with tablets is an error - in both cases (table creation and update).
Obviously, this test - and the validation code - should be removed once
the issue is solved and Alternator Streams begins working with tablets.

Fixes #16497
Refs #16807

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17311
2024-02-13 16:42:35 +02:00
Raphael S. Carvalho
54226dddf5 replica: Kill vnode-oriented cleanup handling for multiple compaction groups
With tablets, we don't use vnode-oriented sstable cleanup.
So let's just remove unused code and bail out silently if sharding is
tablet based. The reason for silence is that we don't want to break
tests that might be reused for tablets, and it's not a problem for
sstable cleanup to be ignored with tablets.
This approach is actually already used in the higher level code,
implementing the cleanup API.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17296
2024-02-13 16:35:15 +02:00
Gleb Natapov
8f7d2fd44b storage_service: topology coordinator: validate ignore dead nodes parameters in removenode/replace
Fail commands if provided nodes are not in the "normal" state.
2024-02-13 16:15:35 +02:00
Gleb Natapov
d062a04df0 topology coordinator: add removed/replaced nodes to ignored_nodes list at the request invocation time
To unblock tablet migration in case of a node failure we need a way to
dynamically extend a list of ignored_nodes while the migration is
happening. This patch does it by piggybacking on existing topology
operations that assume their target node is already dead. It adds the
target node to the now-global ignored_nodes list when the request is issued and,
for better HA, makes the nodes in ignored_nodes non voters.
2024-02-13 16:15:35 +02:00
Gleb Natapov
9b52dc4560 topology coordinator: make ignored_nodes list global and permanent
Currently ignored_nodes list is part of a request (removenode or
replace) and exists only while a request is handled. This patch
changes it to be global and exist outside of any request. A node stays
in the list until it is eventually removed and moved to the "left" state.
If a node is specified in the ignore-dead-nodes option for any command
it will be ignored for all other operations that support ignored_nodes
(like tablet migration).
2024-02-13 16:15:35 +02:00
Gleb Natapov
cbef807e69 topology_coordinator: do not cancel rebuild just because some other nodes are dead
Rebuild may not contact all the nodes, so it may succeed even while some
nodes are dead.
2024-02-13 16:15:35 +02:00
Gleb Natapov
0fe00e34ef topology coordinator: throw more specific error from wait_for_ip() function in case of a timeout
It will be easier to distinguish the failure reason.
2024-02-13 16:15:35 +02:00
Gleb Natapov
f21a3b4ca5 raft_group0: add make_nonvoters function that can make multiple node non voters simultaneously 2024-02-13 16:15:35 +02:00
Petr Gusev
3722ca0a41 sync_raft_topology_nodes: parallelize system_keyspace update functions
In sync_raft_topology_nodes we execute a system keyspace
update query for each node of the cluster. The system keyspace
tables use schema commitlog which by default enables use_o_dsync.
This means that each write to the commitlog is accompanied by fsync.
For large clusters this can incur hundreds of writes with fsyncs, which
is very expensive. For example, in #17039 for a moderate size cluster
of 50 nodes sync_raft_topology_nodes took almost 5 seconds.

In this commit we solve this problem by running all such update
queries in parallel. The commitlog should batch them and issue
only one write syscall to the OS.
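
The batching effect can be illustrated with a toy asyncio model. This is
only a sketch: FakeCommitlog is a hypothetical class that groups writes
queued in the same scheduling round into one flush, and it says nothing
about how the real commitlog is implemented.

```python
import asyncio


class FakeCommitlog:
    """Toy model: writes queued before the next flush share a single flush."""

    def __init__(self):
        self.flushes = 0
        self._pending = []
        self._flush_task = None

    async def write(self, entry):
        loop = asyncio.get_running_loop()
        done = loop.create_future()
        self._pending.append((entry, done))
        if self._flush_task is None:
            self._flush_task = loop.create_task(self._flush())
        await done  # resolved once the batch holding this entry is flushed

    async def _flush(self):
        await asyncio.sleep(0)  # give concurrent writers a chance to queue up
        batch, self._pending = self._pending, []
        self._flush_task = None
        self.flushes += 1  # stands in for one write + fsync
        for _, done in batch:
            done.set_result(None)


async def sync_sequential(log, nodes):
    for node in nodes:
        await log.write(f"update peer {node}")


async def sync_parallel(log, nodes):
    await asyncio.gather(*(log.write(f"update peer {node}") for node in nodes))
```

In this model, awaiting each update in turn costs one flush per node,
while issuing them together lets them share a flush.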

Closes scylladb/scylladb#17243
2024-02-13 14:44:48 +01:00
Piotr Dulikowski
314fd9a11f test: test_topology_recovery_basic: add missing driver reconnect
Unfortunately, scylladb/python-driver#230 is not fixed yet, so it is
necessary for the sake of our CI's stability to re-create the driver
session after all nodes in the cluster are restarted.

There is one place in test_topology_recovery_basic where all nodes are
restarted but the driver session is not re-created. Even though nodes
are not restarted at once but rather sequentially, we observed a failure
with similar symptoms in a CI run for scylla-enterprise.

Add the missing driver reconnect as a workaround for the issue.

Fixes: scylladb/scylladb#17277

Closes scylladb/scylladb#17278
2024-02-13 12:28:30 +01:00
David Garcia
f45d9d33f1 docs: remove liveness asterisks
Instead of adding an asterisk next to "liveness" linking to the glossary, we will temporarily replace them with a hyperlink pending the implementation of tooltip functionality.

Closes scylladb/scylladb#17244
2024-02-12 20:37:52 +02:00
Avi Kivity
b22db74e6a Regenerate frozen toolchain
For gnutls 3.8.3 and clang clang-16.0.6-4.

Fixes #17285.

Closes scylladb/scylladb#17287
2024-02-12 18:36:11 +02:00
Botond Dénes
3f2d7e8b25 tree: remove unnecessary yields around for_each_tablet()
Commit 904bafd069 consolidated the two
existing for_each_tablet() overloads, to the one which has a future<>
returning callback. It also added yields to the bodies of said
callbacks. This is unnecessary, the loop in for_each_tablet() already
has a yield per tablet, which should be enough to prevent stalls.

This patch is a follow-up to #17118

Closes scylladb/scylladb#17284
2024-02-12 17:10:25 +01:00
Kamil Braun
2e81f045cc Merge 'transport: controller: do_start_server: do not set_cql_read for maintenance port' from Benny Halevy
RPC is not ready yet at this point, so we should not set this application state yet.

Also, simplify add_local_application_state as it contains dead code
that will never generate an internal error after 1d07a596bf.

Fixes #16932

Closes scylladb/scylladb#17263

* github.com:scylladb/scylladb:
  gossiper: add_local_application_state: drop internal error
  transport: controller: do_start_server: do not set_cql_read for maintenance port
2024-02-12 13:26:45 +01:00
Pavel Emelyanov
2b1612aa04 main: Stop lifecycle notifier for real
It wasn't because of storage service, now the latter is stopped (since
e6b34527c1), so the former can be stopped too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17251
2024-02-12 13:59:50 +02:00
Kefu Chai
7baee379de sstable/storage: pass fs::path to storage::create_links()
this change is a follow-up of 637dd730. the goal is to use
std::filesystem::path for manipulating paths, and to avoid
converting between sstring and fs::path back and forth.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17257
2024-02-12 13:26:11 +02:00
Kefu Chai
7a5cb69e33 storage_service: s/format()/fmt::format/
in the same spirit of e84a0991, let's switch the callers who expect
std::string to fmt::format(). to minimize the impact and to reduce
the risk, the switch will be performed piecemeal.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17253
2024-02-12 13:24:21 +02:00
Pavel Emelyanov
b9721bd397 test/tablets: Decommissioning node below RF is not allowed
When a node is decommissioned, all tablet replicas need to be moved away
from it. In some cases it may not be possible. If the number of nodes in
the cluster equals the keyspace RF, one cannot decommission any node
because it's not possible to find nodes for every replica.

The new test case validates this constraint is satisfied.

refs: #16195

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17248
2024-02-12 13:21:47 +02:00
Nadav Har'El
21e7deafeb alternator, mv: fix case of two new key columns in GSI
A materialized view in CQL allows AT MOST ONE view key column that
wasn't a key column in the base table. This is because if there were
two or more of those, the "liveness" (timestamp, ttl) of these different
columns can change at every update, and it's not possible to pick what
liveness to use for the view row we create.

We made an exception for this rule for Alternator: DynamoDB's API allows
creating a GSI whose partition key and range key are both regular columns
in the base table, and we must support this. We claim that the fact that
Alternator allows neither TTL (Alternator's "TTL" is a different feature)
nor user-defined timestamps, does allow picking the liveness for the view
row we create. But we did it wrong!

We claimed in a comment - and implemented in the code before this patch -
that in Alternator we can assume that both GSI key columns will have the
*same* liveness, and in particular timestamp. But this is only true if
one modifies both columns together! In fact, in general it is not true:
We can have two non-key attributes 'a' and 'b' which are the GSI's key
columns, and we can modify *only* b, without modifying a, in which case
the timestamp of the view modification should be b's newer timestamp,
not a's older one. The existing code took a's timestamp, assuming it
will be the same as b's, which is incorrect. The result was that if
we repeatedly modify only b, all view updates will receive the same
timestamp (a's old timestamp), and a deletion will always win over
all the modifications. This patch includes a reproducing test written by
a user (@Zak-Kent) that demonstrates how after a view row is deleted
it doesn't get recreated - because all the modifications use the same
timestamp.

The fix is, as suggested above, to use the *higher* of the two
timestamps of both base-regular-column GSI key columns as the timestamp
for the new view rows or view row deletions. The reproducer that
failed before this patch passes with it. As usual, the reproducer
passes on AWS DynamoDB as well, proving that the test is correct and
should really work.
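
A minimal Python sketch of the timestamp rule described above, under a simplified last-write-wins model. The names `old_timestamp`, `fixed_timestamp` and `view_row_is_live` are hypothetical and are not taken from the ScyllaDB sources; the timestamps are made-up integers.

```python
def old_timestamp(ts_a: int, ts_b: int) -> int:
    # Behavior before the patch, as described above: take a's timestamp
    # and assume b's is the same.
    return ts_a


def fixed_timestamp(ts_a: int, ts_b: int) -> int:
    # Behavior after the patch: take the higher of the two timestamps.
    return max(ts_a, ts_b)


def view_row_is_live(update_ts: int, deletion_ts: int) -> bool:
    # Simplified last-write-wins rule (an assumption of this sketch):
    # a deletion wins over any update that is not newer than it.
    return update_ts > deletion_ts


# 'a' is written once at t=1, the view row is deleted at t=2,
# and then only 'b' is written again at t=3.
ts_a, deletion_ts, ts_b = 1, 2, 3

old_live = view_row_is_live(old_timestamp(ts_a, ts_b), deletion_ts)
new_live = view_row_is_live(fixed_timestamp(ts_a, ts_b), deletion_ts)
```

With the old rule the view update carries a's stale timestamp and loses to the deletion; with the higher of the two timestamps the row is recreated.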

Fixes #17119

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17172
2024-02-12 13:17:29 +02:00
Nadav Har'El
341af86167 test/cql-pytest: reproducer for GROUP BY regression
This patch adds a simple reproducer for a regression in Scylla 5.4 caused
by commit 432cb02, breaking LIMIT support in GROUP BY.

Refs #17237

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17275
2024-02-12 13:09:52 +02:00
Kefu Chai
57df20eef8 configure.py: use un-deprecated module
PEP 632 deprecates the distutils module, and it is removed from Python 3.12.
we are actually using the one vendored by setuptools, if we are using
3.12. so let's use shutil for finding ninja executable.
see https://peps.python.org/pep-0632/
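
A short sketch of locating an executable with `shutil.which()` instead of distutils. The helper name `find_ninja` and the candidate name `ninja-build` are assumptions made for illustration; the actual lookup in configure.py may differ.

```python
import shutil


def find_ninja(candidates=("ninja", "ninja-build")):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```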

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17271
2024-02-12 13:05:35 +02:00
Kamil Braun
7d73c40125 Merge 'test.py: tablets: Fix flakiness of test_tablet_missing_data_repair' from Tomasz Grabiec
Reimplements stop/start sequence using rolling_restart() which is safe
with regards to UP status propagation and not prone to sudden
connection drop which may cause later CQL queries to time out. It also
ensures that CQL is up on all the remaining nodes when the with_down
callback is executed.

The test was observed to fail in CI like this:

```
  cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.157.135.26:9042 datacenter1>: ConnectionException('Pool for 127.157.135.26:9042 is shutdown')})
  ...
      @pytest.mark.repair
      @pytest.mark.asyncio
      async def test_tablet_missing_data_repair(manager: ManagerClient):
  ...
          for idx in range(0,3):
              s = servers[idx].server_id
              await manager.server_stop_gracefully(s, timeout=120)
  >           await check()
```

Hopefully: Fixes #17107

Closes scylladb/scylladb#17252

* github.com:scylladb/scylladb:
  test: py: tablets: Fix flakiness of test_tablet_missing_data_repair
  test: pylib: manager_client: Wait for driver to catch up in rolling_restart()
  test: pylib: manager_client: Accept callback in rolling_restart() to execute with node down
2024-02-12 11:52:09 +01:00
Botond Dénes
f068d1a6fa query: do not kill unpaged queries when they reach the tombstone-limit
The reason we introduced the tombstone-limit
(query_tombstone_page_limit), was to allow paged queries to return
incomplete/empty pages in the face of large tombstone spans. This works
by cutting the page after the tombstone-limit amount of tombstones were
processed. If the read is unpaged, it is killed instead. This was a
mistake. First, it doesn't really make sense, the reason we introduced
the tombstone limit, was to allow paged queries to process large
tombstone-spans without timing out. It does not help unpaged queries.
Furthermore, the tombstone-limit can kill internal queries done on
behalf of user queries, because all our internal queries are unpaged.
This can cause denial of service.

So in this patch we disable the tombstone-limit for unpaged queries
altogether, they are allowed to continue even after having processed the
configured limit of tombstones.
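
The behavior after this patch can be sketched roughly as follows. This is an illustrative Python model, not the actual query code; `Page`, `read_page` and the tuple-based input format are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    rows: list = field(default_factory=list)
    cut_short: bool = False  # True when the page ended at the tombstone limit


def read_page(entries, tombstone_limit, paged):
    """entries: iterable of ("row", value) or ("tombstone", None) tuples."""
    page, tombstones = Page(), 0
    for kind, value in entries:
        if kind == "tombstone":
            tombstones += 1
            # Only paged reads are cut at the limit; unpaged reads keep going.
            if paged and tombstones >= tombstone_limit:
                page.cut_short = True
                break
        else:
            page.rows.append(value)
    return page


data = [("tombstone", None)] * 3 + [("row", "r1")]
paged_result = read_page(data, tombstone_limit=2, paged=True)
unpaged_result = read_page(data, tombstone_limit=2, paged=False)
```

In this model the paged read returns an empty, cut-short page, while the unpaged read runs past the limit and returns the row.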

Fixes: #17241

Closes scylladb/scylladb#17242
2024-02-12 12:34:04 +02:00
Kefu Chai
9b85d1aebf configure.py, cmake: do not pass -Wignored-qualifiers explicitly
we recently added -Wextra to configure.py, and this option enables
a bunch of warning options, including `-Wignored-qualifiers`. so
there is no need to enable this specific warning anymore. this change
removes this option from both `configure.py` and the CMake build system.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17272
2024-02-12 12:32:00 +02:00
Avi Kivity
c14571af16 Update seastar submodule
Because Seastar now defaults to C++23, we downgrade it explicitly to
C++20.

* seastar 289ad5e593...5d3ee98073 (10):
  > Update supported C++ standards to C++23 and C++20 (dropping C++17)
  > docker: install clang-tools-18
  > http: add handler_base::verify_mandatory_params()
  > coroutine/exception: document return_exception_ptr()
  > http: use structured-binding when appropriate
  > test/http: Read full server response before sending next
  > doc/lambda-coroutine-fiasco: fix a syntax error
  > util/source_location-compat: use __cpp_consteval
  > Fix incorrect class name in documentation.
  > Add support for missing HTTP PATCH method.

Closes scylladb/scylladb#17268
2024-02-12 12:21:47 +02:00
Patryk Wrobel
9fccd968d3 test_tablets.py: implement test_tablet_count_metric_per_shard
This change introduces a new test that verifies the
functionality related to tablet_count metric.

It checks if tablet_count metric is correctly reported
and updated when new tables are created, when tables
are dropped and when `move_tablet` is executed.

Refs: scylladb#16131
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17165
2024-02-12 11:49:38 +02:00
Kefu Chai
54995fcac0 test/manual: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17255
2024-02-12 11:49:38 +02:00
Patryk Jędrzejczak
38e1ddb8bc docs: using-scylla: cdc: remove info about failing writes to old generations
In one of the previous patches, we have allowed writing to the
previous CDC generations for `generation_leeway`. This change has
made the information about failing writes to the previous
generation and the "rejecting writes to an old generation" example
obsolete so we remove them.

After the change, a write can only fail if its timestamp is distant
from the node's timestamp. We add the information about it.
2024-02-12 10:14:00 +01:00
Patryk Jędrzejczak
9b923f8b81 docs: dev: cdc: document writing to previous CDC generations
We update the dev documentation after allowing writes to the
previous CDC generations in one of the previous patches.
2024-02-12 10:14:00 +01:00
Patryk Jędrzejczak
e64162e8f6 test: add test_writes_to_previous_cdc_generations
In one of the previous patches, we allowed writing to the previous
CDC generations for `generation_leeway`. Now, we add tests for this
change.
2024-02-12 10:14:00 +01:00
Patryk Jędrzejczak
0470b721c2 cdc: generation: allow increasing generation_leeway through error injection
The increased `generation_leeway` is used in the next patch to
write a test. Since it's no longer a constant, we create a new
getter for it.
2024-02-12 10:14:00 +01:00
Patryk Jędrzejczak
330a37b5c9 cdc: metadata: allow sending writes to the previous generations
Before this patch, writes to the previous CDC generations would
always be rejected. After this patch, they will be accepted if
the write's timestamp is greater than `now - generation_leeway`.
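
As a rough illustration of the acceptance rule above, here is a Python sketch. The function name and the 5-second leeway are assumptions chosen for the example, not values taken from the code.

```python
from datetime import datetime, timedelta


def accepts_write_to_previous_generation(write_ts, now, generation_leeway):
    # Rule stated above: accept if the write's timestamp is greater than
    # `now - generation_leeway`.
    return write_ts > now - generation_leeway


now = datetime(2024, 2, 12, 10, 0, 0)
leeway = timedelta(seconds=5)  # illustrative value only

recent = accepts_write_to_previous_generation(now - timedelta(seconds=2), now, leeway)
stale = accepts_write_to_previous_generation(now - timedelta(seconds=30), now, leeway)
```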

This change was proposed around 3 years ago. The motivation was
to improve user experience. If a client generates timestamps by
itself and its clock is desynchronized with the clock of the node
the client is connected to, there could be a period during
generation switching when writes fail. We didn't consider this
problem critical because the client could simply retry a failed
write with a higher timestamp. Eventually, it would succeed. This
approach is safe because these failed writes cannot have any side
effects. However, it can be inconvenient. Writing to previous
generations was proposed to improve it.

The idea was rejected 3 years ago. Recently, it turned out that
there is a case when the client cannot retry a write with the
increased timestamp. It happens when a table uses CDC and LWT,
which makes timestamps permanent. Once Paxos commits an entry with
a given timestamp, Scylla will keep trying to apply that entry
until it succeeds, with the same timestamp. Applying the entry
involves writing to the CDC log table. If it fails, we get stuck.
It's a major bug with no known perfect solution.

Allowing writes to previous generations for `generation_leeway` is
a probabilistic fix that should solve the problem in practice.

Note that allowing writes only to the previous generation might
not be enough. With the Raft-based topology, it is possible to
add multiple nodes concurrently. Moreover, tablets make streaming
instant, which allows the topology coordinator to add multiple nodes
very quickly. So, creating generations with almost identical
timestamps is possible. Then, we could encounter the same bug but,
for example, for a generation before the previous generation.
2024-02-12 10:14:00 +01:00
Asias He
a0e46a6b47 repair: Fix rpc::source and rpc::optional parameter order in rpc message
In a mixed cluster (5.4.1-20231231.3d22f42cf9c3 and
5.5.0~dev-20240119.b1ba904c4977), in the rolling upgrade test, we saw
repair never finishing.

The following was observed:

rpc - client 127.0.0.2:65273 msg_id 5524:  caught exception while
processing a message: std::out_of_range (deserialization buffer
underflow)

It turns out the repair rpc message was not compatible between the two
versions. Even with a rpc stream verb, the new rpc parameters must come
after the rpc::source<> parameter. The rpc::source<> parameter is not
special: it does not have to be the last parameter.

For example, it should be:

void register_repair_get_row_diff_with_rpc_stream(
std::function<future<rpc::sink<repair_row_on_wire_with_cmd>> (
const rpc::client_info& cinfo, uint32_t repair_meta_id,
rpc::source<repair_hash_with_cmd> source, rpc::optional<shard_id> dst_cpu_id_opt)>&& func);

not:

void register_repair_get_row_diff_with_rpc_stream(
std::function<future<rpc::sink<repair_row_on_wire_with_cmd>> (
const rpc::client_info& cinfo, uint32_t repair_meta_id,
rpc::optional<shard_id> dst_cpu_id_opt, rpc::source<repair_hash_with_cmd> source)>&& func);

Fixes #16941

Closes scylladb/scylladb#17156
2024-02-12 09:50:30 +02:00
Nadav Har'El
13e16475fa cql-pytest: fix skipping of tests on Cassandra or old Scylla
Recently we added a trick to allow running cql-pytests either with or
without tablets. A single fixture test_keyspace uses two separate
fixtures test_keyspace_tablets or test_keyspace_vnodes, as requested.

The problem is that even if test_keyspace doesn't use its
test_keyspace_tablets fixture (it doesn't, if the test isn't
parameterized to ask for tablets explicitly), it's still a fixture,
and it causes the test to be skipped. This causes every test to be
skipped when running on Cassandra or old Scylla which doesn't support
tablets.

The fix is simple - the internal fixture test_keyspace_tablets should
yield None instead of skipping. It is the caller, test_keyspace, which
now skips the test if tablets are requested but test_keyspace_tablets
is None.
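
The control flow of the fix can be modelled without pytest as below. This is only a sketch: the function names and keyspace strings are hypothetical, and `unittest.SkipTest` stands in for the skip that the real fixture would perform.

```python
import unittest


def keyspace_tablets(server_supports_tablets: bool):
    # Inner helper: report "not available" with None instead of skipping.
    return "ks_tablets" if server_supports_tablets else None


def keyspace_for_test(want_tablets: bool, server_supports_tablets: bool):
    # Outer helper: only skip when the test explicitly asked for tablets.
    tablets_ks = keyspace_tablets(server_supports_tablets)
    if want_tablets:
        if tablets_ks is None:
            raise unittest.SkipTest("tablets requested but not supported")
        return tablets_ks
    return "ks_vnodes"
```

A test that does not ask for tablets now gets a vnodes keyspace even on a server without tablet support, instead of being skipped.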

Fixes #17266

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17267
2024-02-11 21:03:25 +02:00
Kefu Chai
f990ea9678 tools/scylla-nodetool: implement describecluster
Refs #15588
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17240
2024-02-11 20:21:07 +02:00
Avi Kivity
14bf09f447 Merge 'utils: managed_bytes: optimize memory usage for small buffers' from Michał Chojnowski
managed_bytes is implemented as a chain of blob_storage objects.
Each blob_storage contains 24 bytes of metadata. But in the most
common case -- when there is only a single element in the chain --
16 bytes of this metadata is trivial/unused.

This is a regrettable waste because managed_bytes is used for every
database cell in the memtables and cache. It means that every value
of size >= 7 bytes (smaller ones fit in the inline storage of
managed_bytes) receives 16 bytes of useless overhead.

To correct that, this series adds to managed_bytes an alternative storage
layout -- used for buffers small enough to fit in one fragment -- which only
stores the necessary minimum of metadata. (That is: a pointer to the parent,
to facilitate moving the storage during memory defragmentation).

This saves 16 bytes on every cell greater than 15 bytes. Which includes e.g.
every live cell with value bigger than 6 bytes, which likely applies to most cells.

Before:
```
$ build/release/scylla perf-simple-query --duration 10
median 218692.88 tps ( 61.1 allocs/op,  13.1 tasks/op,   41762 insns/op,        0 errors)
$ build/release/scylla perf-simple-query --duration 10 --write
median 173511.46 tps ( 58.3 allocs/op,  13.2 tasks/op,   53258 insns/op,        0 errors)
$ build/release/test/perf/mutation_footprint_test -c1 --row-count=20 --partition-count=100 --data-size=8 --column-count=16
 - in cache:     2580222
 - in memtable:  2549852
```

After:
```
$ build/release/scylla perf-simple-query --duration 10
median 218780.89 tps ( 61.1 allocs/op,  13.1 tasks/op,   41763 insns/op,        0 errors)
$ build/release/scylla perf-simple-query --duration 10 --write
median 173105.78 tps ( 58.3 allocs/op,  13.2 tasks/op,   52913 insns/op,        0 errors)
$ build/release/test/perf/mutation_footprint_test -c1 --row-count=20 --partition-count=100 --data-size=8 --column-count=16
 - in cache:     2068238
 - in memtable:  2037696
```
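
As a sanity check on the footprint numbers above, the following Python snippet compares the measured savings with 16 bytes per cell. It assumes, without confirmation from the source, that the test creates partition-count * row-count * column-count cells.

```python
# Figures copied from the "Before" and "After" runs above.
before = {"cache": 2580222, "memtable": 2549852}
after = {"cache": 2068238, "memtable": 2037696}

# Assumption: the test creates partition-count * row-count * column-count cells.
cells = 100 * 20 * 16
expected_saving = cells * 16  # 16 bytes of metadata saved per cell

measured = {name: before[name] - after[name] for name in before}
for name, saved in measured.items():
    print(f"{name}: saved {saved} bytes, expected about {expected_saving}")
```

Under that assumption, the measured savings are within a few hundred bytes of the expected 512000.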

Closes scylladb/scylladb#14263

* github.com:scylladb/scylladb:
  utils: managed_bytes: optimize memory usage for small buffers
  utils: managed_bytes: rewrite managed_bytes methods in terms of managed_bytes_view
2024-02-11 16:43:40 +02:00
Kefu Chai
cfb2c2c758 db: add formatter for gc_clock::time_point
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `gc_clock::time_point`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17254
2024-02-11 16:39:25 +02:00
Kefu Chai
33224cc10b sstables/storage: avoid unnecessary type cast
the type of `_dir` was changed to fs::path back in 637dd730, there
is no need to cast `_dir` to fs::path anymore.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17256
2024-02-11 16:37:05 +02:00
Benny Halevy
2ed29e31db gms: inet_address: make constructors explicit
In particular, `inet_address(const sstring& addr)` is
dangerous, since a function like
`topology::get_datacenter(inet_address ep)`
might accidentally convert a `sstring` argument
into an `inet_address` (which would most likely
throw an obscure std::invalid_argument if the datacenter
name does not look like an inet_address).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17260
2024-02-11 15:44:13 +02:00
Benny Halevy
136df58cbc data_value: delete data_value(T*) constructor
Currently, since the data_value(bool) ctor
is implicit, pointers of any kind are implicitly
convertible to data_value via intermediate conversion
to `bool`.

This is error prone, since it allows unsafe comparison
between e.g. an `sstring` with `some*` by implicit
conversion of both sides to `data_value`.

For example:
```
    sstring name = "dc1";
    struct X {
        sstring s;
    };
    X x(name);
    auto p = &x;
    if (name == p) {}
```

Refs #17261

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17262
2024-02-11 15:42:55 +02:00
Benny Halevy
f86a5072d6 gossiper: add_local_application_state: drop internae error
After 1d07a596bf that
dropped before_change notifications there is no sense
in getting the local endpoint_state_ptr twice: before
and after the notifications, and calling on_internal_error
if the state isn't found after the notifications.

Just throw the runtime_error if the endpoint state is not
found, otherwise, use it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-02-11 13:33:26 +02:00
Benny Halevy
ac83df4875 transport: controller: do_start_server: do not set_cql_read for maintenance port
RPC is not ready yet at this point, so we should not
set this application state yet.

This is indicated by the following warning from
`gossiper::add_local_application_state`:
```
WARN  2024-01-22 23:40:53,978 [shard 0:stmt] gossip - Fail to apply application_state: std::runtime_error (endpoint_state_map does not contain endpoint = 127.227.191.13, application_states = {{RPC_READY -> Value(1,1)}})
```

That should really be an internal error, but
it can't because of this bug.

Fixes #16932

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-02-11 11:49:52 +02:00
Kefu Chai
d7a404e1ec alternator: add formatter for alternator::calculate_value_caller
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `alternator::calculate_value_caller`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17259
2024-02-11 11:49:46 +02:00
Michał Chojnowski
5a3e4a1cc0 utils: managed_bytes: optimize memory usage for small buffers
managed_bytes is implemented as a chain of blob_storage objects.
Each blob_storage contains 24 bytes of metadata. But in the most
common case -- when there is only a single element in the chain --
16 bytes of this metadata is trivial/unused.

This is a regrettable waste because managed_bytes is used for every
database cell in the memtables and cache. It means that every value
of size >= 7 bytes (smaller ones fit in the inline storage of
managed_bytes) receives 16 bytes of useless overhead.

To correct that, this patch adds to managed_bytes an alternative storage
layout -- used for buffers small enough to fit in one contiguous
fragment -- which only stores the necessary minimum of metadata.
(That is: a pointer to the parent, to facilitate moving the storage during
memory defragmentation).
2024-02-09 20:56:20 +01:00
Tomasz Grabiec
1eedc85990 test: py: tablets: Fix flakiness of test_tablet_missing_data_repair
Reimplement stop/start sequence using rolling_restart() which is safe
with regards to UP status propagation and not prone to sudden
connection drop which may cause later CQL queries to time out. It also
ensures that CQL is up on all the remaining nodes when the with_down
callback is executed.

Hopefully: Fixes #17107
2024-02-09 20:37:06 +01:00
Tomasz Grabiec
27ed2d94fc test: pylib: manager_client: Wait for driver to catch up in rolling_restart()
For sanity of the developers who want to execute CQL queries after
rolling restarts.
2024-02-09 20:35:41 +01:00
Tomasz Grabiec
3ce4ec796a test: pylib: manager_client: Accept callback in rolling_restart() to execute with node down
2024-02-09 20:35:41 +01:00
Pavel Emelyanov
7a710425f0 streaming: Open-code on-stack lambda
It just wraps one if, no benefit in keeping it this way

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17250
2024-02-09 20:31:09 +01:00
Petr Gusev
4554653ad9 storage_proxy: add a test for stop_remote
This patch adds a reproducer test for issue #16382.
See scylladb/seastar#2044 for details of the problem.

The test is enabled only in dev mode since it requires
error injection mechanism. The patch adds a new injection
into storage_proxy::handle_read to simulate the problem
scenario - the node is shutting down and there are some
unfinished pending replica requests.

Closes scylladb/scylladb#16776
2024-02-09 17:23:13 +01:00
Michał Chojnowski
277a31f0ae utils: managed_bytes: rewrite managed_bytes methods in terms of managed_bytes_view
Some methods of managed_bytes contain the logic needed to read/write the
contents of managed_bytes, even though this logic is already present in
managed_bytes_{,mutable}_view.

Reimplementing those methods by using the views as intermediates allows us to
remove some code and makes the responsibilities cleaner -- after the change,
managed_bytes contains the logic of allocating and freeing the storage,
while views provide read/write access to the storage.

This change will simplify the next patch which changes the internals of
managed_bytes.
2024-02-09 17:00:33 +01:00
Botond Dénes
ba89b86913 Update tools/java submodule
* tools/java c75ce2c1...5e11ed17 (1):
  > bin/nodetool-wrapper: pass all args to nodetool for testings its ability
2024-02-09 16:34:47 +01:00
Raphael S. Carvalho
daa82f406c test_tablets: Enable table debug log in split test
If the test fails, it's helpful to see how split completion was
handled.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17236
2024-02-09 14:38:24 +02:00
Mikołaj Grzebieluch
38191144ac transport/controller: get rid of magic number for socket path's maximal length
Calculate `max_socket_length` from the size of the structure
representing the Unix domain socket address.
2024-02-09 12:32:37 +01:00
Mikołaj Grzebieluch
fffb732704 transport/controller: set unix_domain_socket_permissions for maintenance_socket
Set filesystem permissions for the maintenance socket to 660.
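
For illustration only, this is how a Unix domain socket can be given mode 660 from Python. The helper name and socket file name are hypothetical; the actual change is made in the C++ transport controller.

```python
import os
import socket
import stat
import tempfile


def bind_with_permissions(path: str, mode: int = 0o660) -> socket.socket:
    """Bind a Unix domain socket at `path` and restrict its permissions."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    os.chmod(path, mode)  # owner and group may read/write, others get nothing
    return sock


with tempfile.TemporaryDirectory() as tmp:
    sock_path = os.path.join(tmp, "maintenance.sock")
    sock = bind_with_permissions(sock_path)
    observed_mode = stat.S_IMODE(os.stat(sock_path).st_mode)
    sock.close()
```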

Fixes #16487
2024-02-09 12:32:26 +01:00
Botond Dénes
c7d9708092 Merge 'repair: delete table reference from repair related classes' from Aleksandra Martyniuk
row_level_repair and repair_meta keep a reference to a table.
If the table is dropped during repair, its object is destructed, leaving
a dangling reference.

Delete {row_level_repair,repair_meta}::_cf and replace their usages.

Fixes: #17233.

Closes scylladb/scylladb#17234

* github.com:scylladb/scylladb:
  repair: delete _cf from repair_meta
  repair: delete _cf from row_level_repair
2024-02-09 13:16:43 +02:00
Kamil Braun
e9e24f47ec Merge 'raft topology: implement upgrade and recovery procedure' from Piotr Dulikowski
This PR implements a procedure that upgrades existing clusters to use
raft-based topology operations. The procedure does not start
automatically, it must be triggered manually by the administrator after
making sure that no topology operations are currently running.

Upgrade is triggered by sending `POST
/storage_service/raft_topology/upgrade` request. This causes the
topology coordinator to start, which drives the rest of the process: it
builds the `system.topology` state based on information observed in
gossip and tells all nodes to switch to raft mode. Then, topology
coordinator runs normally.
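
A hedged sketch of building the upgrade request from Python. The endpoint path comes from the description above; the base URL `http://localhost:10000` and the helper name are assumptions, so adjust them to the node's REST API address.

```python
import urllib.request


def build_upgrade_request(api_base: str = "http://localhost:10000") -> urllib.request.Request:
    # The endpoint path comes from the description above; host and port are assumed.
    url = api_base.rstrip("/") + "/storage_service/raft_topology/upgrade"
    return urllib.request.Request(url, data=b"", method="POST")


request = build_upgrade_request()
# To actually send it against a running node:
#     with urllib.request.urlopen(request) as response:
#         print(response.status)
```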

Upgrade progress is tracked in a new static column `upgrade_state` in
`system.topology`.

The procedure also serves as an extension to the current recovery
procedure on raft. The current recovery procedure requires restarting
nodes in a special mode which disables raft, performing `nodetool
removenode` on the dead nodes, cleaning up some state on the nodes and
restarting them so that they automatically rebuild the group 0. Raft
topology fits into existing procedure by falling back to legacy topology
operations after disabling raft. After rebuilding the group 0, upgrade
needs to be triggered again.

Because upgrade is manual and it might not be convenient for
administrators to run it right after upgrading the cluster, we allow the
cluster to operate in legacy topology operations mode until upgrade,
which includes allowing new nodes to join. In order to allow it, nodes
now ask the cluster about the mode they should use to join before
proceeding by using a new `JOIN_NODE_QUERY` RPC.

The procedure is explained in more detail in `topology-over-raft.md`.

Fixes: https://github.com/scylladb/scylladb/issues/15008

Closes scylladb/scylladb#17077

* github.com:scylladb/scylladb:
  test/topology_custom: upgrade/recovery tests for topology on raft
  cdc/generation_service: in legacy mode, fall back to raft tables
  system_keyspace: add read_cdc_generation_opt
  cdc/generation_service: turn off gossip notifications in raft topo mode
  cql_test_env: move raft_topology_change_enabled var earlier
  group0_state_machine: pull snapshot after raft topology feature enabled
  storage_service: disable persistent feature enabler on upgrade
  storage_service: replicate raft features to system.peers
  storage_service: gossip tokens and cdc generation in raft topology mode
  API: add api for triggering and monitoring topology-on-raft upgrade
  storage_service: infer which topology operations to use on startup
  storage_service: set the topology kind value based on group 0 state
  raft_group0: expose link to the upgrade doc in the header
  feature_service: fall back to checking legacy features on startup
  storage_service: add fiber for tracking the topology upgrade progress
  gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
  topology_coordinator: implement core upgrade logic
  topology_coordinator: extract top-level error handling logic
  storage_service: initialize discovery leader's state earlier
  topology_coordinator: allow for custom sharding info in prepare_and_broadcast_cdc_generation_data
  topology_coordinator: allow for custom sharding info in prepare_new_cdc_generation_data
  topology_coordinator: remove outdated fixme in prepare_new_cdc_generation_data
  topology_state_machine: introduce upgrade_state
  storage_service: disallow topology ops when upgrade is in progress
  raft_group0_client: add in_recovery method
  storage_service: introduce join_node_query verb
  raft_group0: make discover_group0 public
  raft_group0: filter current node's IP in discover_group0
  raft_group0: remove my_id arg from discover_group0
  storage_service: make _raft_topology_change_enabled more advanced
  docs: document raft topology upgrade and recovery
2024-02-09 11:54:53 +01:00
Kefu Chai
c1c96bbc16 api/storage_service: drop /storage_service/describe_ring/ API
per its description, "`/storage_service/describe_ring/`" returns the
token ranges of an arbitrary keyspace. actually, it returns the token
ranges of the first keyspace which uses a non-local vnode-based strategy.
this API is not used by nodetool, neither is it exercised in dtest.
scylla-manager has a wrapper for this API though, but that wrapper
is not used anywhere.

in this change, this API is dropped.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17197
2024-02-09 12:49:21 +02:00
Pavel Emelyanov
309d34a147 topology: Restore indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Pavel Emelyanov
f7a13b9bb0 topology: Drop if_enabled checks for logging
Now all the logged arguments are lazily evaluated (node* format string
and backtrace) so the preliminary log-level checks are not needed.

indentation is deliberately left broken

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Pavel Emelyanov
c1ea6c8acf topology: Add lazy_backtrace() helper
This helper returns lazy_eval-ed current_backtrace(), so it will be
generated and printed only if logger is really going to do it with its
current log-level.
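
The same idea can be shown with Python's standard `logging` module, which formats arguments only when a record is actually emitted. This is an analogy, not the C++ helper itself; `LazyBacktrace` and `make_logger` are hypothetical names.

```python
import io
import logging
import traceback


class LazyBacktrace:
    """Defers building the stack trace until the logger formats the record."""

    calls = 0  # counts how many times a trace was actually rendered

    def __str__(self):
        LazyBacktrace.calls += 1
        return "".join(traceback.format_stack(limit=5))


def make_logger(level):
    logger = logging.getLogger(f"topology-sketch-{level}")
    logger.setLevel(level)
    logger.propagate = False
    stream = io.StringIO()
    logger.addHandler(logging.StreamHandler(stream))
    return logger, stream


quiet, quiet_out = make_logger(logging.INFO)
quiet.debug("node removed, at: %s", LazyBacktrace())  # filtered: __str__ never runs

verbose, verbose_out = make_logger(logging.DEBUG)
verbose.debug("node removed, at: %s", LazyBacktrace())  # emitted: __str__ runs once
```

Because the expensive work sits behind `__str__`, no explicit log-level check is needed at the call site.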

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Pavel Emelyanov
da53854b66 topology: Add printer wrapper for node* and formatter for it
Currently to print node information there's a debug_format(node*) helper
function that returns back an sstring object. Here's the formatter
that's more flexible and convenient, and a node_printer wrapper, since
formatters cannot format non-void pointers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Pavel Emelyanov
aa0293f411 topology: Expand formatter<locator::node>
Equip it with :v specifier that turns verbose mode on and prints much
more data about the node. Main user will appear in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Kefu Chai
c07de1fad1 topology_coordinator: s/sate/state/
fix a typo in the logging message.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17201
2024-02-09 10:27:33 +01:00
Kefu Chai
876478b84f storage_service: allow concurrent tablet migration in tablets/move API
Currently it waits for topology state machine to be idle, so it allows
one tablet to be moved at a time. We should allow it to start migration
if the current transition state is

- topology::transition_state::tablet_migration or
- topology::transition_state::tablet_draining

to allow starting parallel tablet movement. That will be useful when
scripting a custom rebalancing algorithm.

in this change, we wait until the topology state machine is idle or
it is at either of the above two states.
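
The relaxed wait condition can be sketched as a small predicate. This Python model is illustrative only: the enum mirrors the two state names above, `OTHER` stands in for every other transition state, and `None` models an idle state machine.

```python
from enum import Enum, auto
from typing import Optional


class TransitionState(Enum):
    TABLET_MIGRATION = auto()
    TABLET_DRAINING = auto()
    OTHER = auto()  # stands in for every other transition state


def can_start_tablet_move(state: Optional[TransitionState]) -> bool:
    # None models an idle topology state machine (no transition in progress).
    return state is None or state in (
        TransitionState.TABLET_MIGRATION,
        TransitionState.TABLET_DRAINING,
    )
```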

Fixes #16437
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17203
2024-02-08 21:47:15 +01:00
Piotr Dulikowski
4d4976feb0 test/topology_custom: upgrade/recovery tests for topology on raft
Adds three tests for the new upgrade procedure:

- test_topology_upgrade - upgrades a cluster operating in legacy mode to
  use raft topology operations,
- test_topology_recovery_basic - performs recovery on a three-node
  cluster, no node removal is done,
- test_topology_majority_loss - simulates a majority loss scenario, i.e.
  removes two nodes out of three, performs recovery to rebuild the
  raft topology state and re-adds the two nodes.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
d04b3338ce cdc/generation_service: in legacy mode, fall back to raft tables
When a node enters recovery after being in raft topology mode, topology
operations switch back to legacy mode. We want CDC to keep working when
that happens, so we need for the legacy code to be able to access
generations created back in raft mode - so that the node can still
properly serve writes to CDC log tables.

In order to make this possible, modify the legacy logic to also look for
a cdc generation in raft tables, if it is not found in legacy tables.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
fb02453686 system_keyspace: add read_cdc_generation_opt
The `system_keyspace::read_cdc_generation` loads a cdc generation from
the system tables. One of its preconditions is that the generation
exists - this precondition is quite easy to satisfy in raft mode, and
the function was designed to be used solely in that mode.

In legacy mode however, in case when we revert from raft mode through
recovery, it might be necessary to use generations created in raft mode
for some time. In order to make the function useful as a fallback in
case lookup of a generation in legacy mode fails, introduce a relaxed
variant of `read_cdc_generation` which returns std::nullopt if the
generation does not exist.
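
A minimal sketch of the difference between the strict and the relaxed variant, and of the fallback that the relaxed one enables. Everything here is a simplified assumption: the "system tables" are in-memory maps, generation ids are plain integers, and `lookup_in_legacy_mode` is a hypothetical helper, not a function from the tree.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-ins: a generation id is an int, its data a string,
// and each "system table" is an in-memory map.
using generation_id = int;
using generation_data = std::string;
using table = std::map<generation_id, generation_data>;

// Strict variant: the caller must guarantee that the generation exists.
const generation_data& read_cdc_generation(const table& t, generation_id id) {
    auto it = t.find(id);
    assert(it != t.end() && "precondition: the generation must exist");
    return it->second;
}

// Relaxed variant: a missing generation is reported as std::nullopt.
std::optional<generation_data> read_cdc_generation_opt(const table& t, generation_id id) {
    auto it = t.find(id);
    if (it == t.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Legacy-mode lookup that falls back to the raft table when the legacy one has no entry.
std::optional<generation_data> lookup_in_legacy_mode(const table& legacy, const table& raft,
                                                     generation_id id) {
    if (auto found = read_cdc_generation_opt(legacy, id)) {
        return found;
    }
    return read_cdc_generation_opt(raft, id);
}
```

Returning an optional keeps the "not found" case out of the exception path, so the fallback is an ordinary branch.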
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
77a8f5e3d6 cdc/generation_service: turn off gossip notifications in raft topo mode
In raft topology mode CDC information is propagated through group 0.
Prevent the generation service from reacting to gossiper notifications
after we made the switch to raft mode.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
29e286ee03 cql_test_env: move raft_topology_change_enabled var earlier
We will need to pass it to cdc::generation_service::config in the next
commit, so move it a bit earlier.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
07aba3abc4 group0_state_machine: pull snapshot after raft topology feature enabled
Pulling a snapshot of the raft topology is done via new rpc verb
(RAFT_PULL_TOPOLOGY_SNAPSHOT). If the recipient runs an older version of
scylla and does not understand the verb, sending it will result in an
error. We usually use cluster features to avoid such situations, but in
the case when a node joins the cluster, it doesn't have access to
features yet. Therefore, we need to enable pulling snapshots in two
situations:

- when the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature becomes enabled,
- in case when starting group 0 server when joining a cluster that uses
  raft-based topology.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
53932420f8 storage_service: disable persistent feature enabler on upgrade
When starting in legacy mode, a gossip event listener called persistent
feature enabler is registered. This listener marks a feature as enabled
when it notices, in gossip, that all nodes declare support for the
feature.

With raft-based topology, features are managed in group 0 instead and do
not rely on the persistent feature enabler at all. Make the listener
look at the raft_topology_change_enabled() method and prevent it from
enabling more features after that method starts returning true.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
4fdd3e014a storage_service: replicate raft features to system.peers
This is necessary for cluster features to work after we switch from raft
topology mode to legacy topology mode during recovery, because
information in system.peers is used during legacy cluster feature check
and when enabling features.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
08865a0bd7 storage_service: gossip tokens and cdc generation in raft topology mode
A mixed raft/legacy cluster can happen when entering recovery mode, i.e.
when the group 0 upgrade state is set to 0 and a rolling restart is
performed. Legacy nodes expect at least information about tokens,
otherwise an internal error occurs in the handle_state_normal function.
Therefore, make nodes that use raft topology behave well with respect to
other nodes.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
a672383c2a API: add api for triggering and monitoring topology-on-raft upgrade
Implements the /storage_service/raft_topology/upgrade route. The route
supports two methods: POST, which triggers the cluster-wide upgrade to
topology-on-raft, and GET which reports the status of the upgrade.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
0bfcf7d4c6 storage_service: infer which topology operations to use on startup
Adds a piece of logic to storage_service::join_cluster which chooses the
mode in which it will boot.

If the experimental raft topology flag is disabled, it will fall back to
legacy node operations.

When the node starts for the first time, it will perform group 0
discovery. If the node creates a cluster, it will start it in raft
topology mode. If it joins an existing one, it will ask the node chosen
by the discovery algorithm about which joining method to use.

If the node is already a part of the cluster, it will base its decision
on the group0 state.
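
The decision described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in storage_service::join_cluster: the variant names are taken from the topology change kind enum described elsewhere in this series, while `boot_inputs`, its fields and `choose_boot_mode` are hypothetical.

```cpp
#include <cassert>

// Variant names follow the enum described elsewhere in this series.
enum class topology_change_kind { unknown, legacy, upgrading_to_raft, raft };

// Hypothetical inputs to the decision; the field names are illustrative.
struct boot_inputs {
    bool raft_topology_flag_enabled;        // experimental raft topology flag
    bool first_boot;                        // the node has never joined a cluster
    bool creates_new_cluster;               // outcome of group 0 discovery
    topology_change_kind answer_from_peer;  // reply of the node chosen by discovery
    topology_change_kind group0_state;      // mode recorded in the local group 0 state
};

topology_change_kind choose_boot_mode(const boot_inputs& in) {
    if (!in.raft_topology_flag_enabled) {
        return topology_change_kind::legacy; // flag disabled: legacy node operations
    }
    if (in.first_boot) {
        // A node that creates a cluster starts it in raft topology mode;
        // a joining node uses whatever the existing cluster tells it.
        return in.creates_new_cluster ? topology_change_kind::raft : in.answer_from_peer;
    }
    // Already a member of the cluster: follow the group 0 state.
    return in.group0_state;
}
```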
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
1e0aae8576 storage_service: set the topology kind value based on group 0 state
When booting for the first time, the node determines whether to use raft
mode or not by asking the cluster, or by going straight to raft mode
when it creates a new cluster by itself. This happens before joining
group 0. However, right after joining group 0, the `upgrade_state`
column from `system.topology` is supposed to control which operations
the node is supposed to be using.

In order to have a single source of control over the flag (either
storage_service code or group 0 code), the
`_manage_topology_change_kind_from_group0` flag is added which controls
whether the `_topology_change_kind_enabled` flag is controlled from
group 0 or not.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
5392bac85b raft_group0: expose link to the upgrade doc in the header
So that it can be referenced from other files.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
3513a07d8a feature_service: fall back to checking legacy features on startup
When checking features on startup (i.e. whether support for any feature
was revoked in an unsafe way), it might happen that upgrade to raft
topology didn't finish yet. In that case, instead of loading an empty
set of features - which supposedly represents the set of features that
were enabled until last boot - we should fall back to loading the set
from the legacy `enabled_features` key in `system.scylla_local`.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
d5a2837658 storage_service: add fiber for tracking the topology upgrade progress
The topology coordinator fiber is not started if a node starts in legacy
topology mode. We need to start the raft state monitor fiber after all
preconditions for starting upgrade to raft topology are met.

Add a fiber which is spawned only in legacy mode that will wait until:

- The schema-on-raft upgrade finishes,
- The SUPPORTS_CONSISTENT_CLUSTER_MANAGEMENT feature is enabled,
- The upgrade is triggered by the user.

and, after that, will spawn the raft state monitor fiber.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
2ecb8641b1 gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
All nodes being capable of support for raft topology is a prerequisite
for starting upgrade to raft topology. The newly introduced feature will
track this prerequisite.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
a55797fd41 topology_coordinator: implement core upgrade logic
Implement topology coordinator's logic responsible for building the
group 0 state related to topology.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
b3369611bc topology_coordinator: extract top-level error handling logic
...to a separate method. It will be reused in another method that will
be introduced in the next commit.
2024-02-08 19:09:35 +01:00
Kefu Chai
082ad51b71 .git: skip *.svg when scanning spelling errors
codespell reports following warnings:
```
Error: ./docs/kb/flamegraph.svg:1: writen ==> written
Error: ./docs/kb/flamegraph.svg:1: writen ==> written
Error: ./docs/kb/flamegraph.svg:1: storag ==> storage
Error: ./docs/kb/flamegraph.svg:1: storag ==> storage
```

these misspellings come from the flamegraph, which can be viewed
at https://opensource.docs.scylladb.com/master/kb/flamegraph.html
they are very likely truncated function names displayed
in the frames. the spelling of these names is not the responsibility
of the author of the article, nor can we change them in a
meaningful way. so add *.svg to the skip list.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17215
2024-02-08 19:46:54 +02:00
Kefu Chai
e84a09911a data_dictionary: use fmt::format() when appropriate
we have three format()s in our arsenal:

* seastar::format()
* fmt::format()
* std::format()

the first one is used most frequently. but it has two limitations:

1. it returns seastar::sstring instead of std::string. under some
   circumstances, the caller of the format() function actually
   expects std::string, in that case a deep copy is performed to
   construct an instance of std::string. this incurs unnecessary
   performance overhead. but this limitation is a by-design behavior.
2. it does not do compile-time format check. this can be improved
   at the Seastar's end.

to address these two problems, we switch the callers who expect
std::string to fmt::format(). to minimize the impact and to reduce
the risk, the switch will be performed piecemeal.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17212
2024-02-08 19:44:56 +02:00
Kefu Chai
64c829da70 docs: reformat the state machine diagram using mermaid
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16620
2024-02-08 19:43:53 +02:00
Kefu Chai
3dfb0f86f1 db: add formatter for error_injection_at_startup
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `error_injection_at_startup`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17211
2024-02-08 19:40:48 +02:00
Piotr Dulikowski
09a6862f96 storage_service: initialize discovery leader's state earlier
Move it before the topology coordinator is started. This way, the
topology coordinator will see non-empty state when it is started and it
will allow for us to assert that topology coordinator is never started
for an empty system.topology table.
2024-02-08 18:05:02 +01:00
Piotr Dulikowski
61e2b2fd9f topology_coordinator: allow for custom sharding info in prepare_and_broadcast_cdc_generation_data
Extend the prepare_and_broadcast_cdc_generation_data function like we
did in the case of prepare_new_cdc_generation_data - the topology
coordinator state building process not only has to create a new
generation, but also broadcast it.
2024-02-08 18:05:02 +01:00
Piotr Dulikowski
0d9b88fd78 topology_coordinator: allow for custom sharding info in prepare_new_cdc_generation_data
During topology coordinator state build phase a new cdc generation will
be generated. We can reuse prepare_new_cdc_generation_data for that.
Currently, it always takes sharding information (shard count + ignore
msb) from the topology state machine - which won't be available yet at
the point of building the topology, so extend the function so that it
can accept a custom source of sharding information.
2024-02-08 18:05:02 +01:00
Piotr Dulikowski
573bb8dd98 topology_coordinator: remove outdated fixme in prepare_new_cdc_generation_data
The FIXME mentions that token metadata should return host ID for given
token (instead of, presumably, an IP) - but that is already the case, so
let's remove the fixme.
2024-02-08 18:05:02 +01:00
Piotr Dulikowski
32a2e24a0f topology_state_machine: introduce upgrade_state
`upgrade_state` is a static column which will be used to track the
progress of building the topology state machine.
2024-02-08 18:05:02 +01:00
Piotr Dulikowski
b8e4e04096 storage_service: disallow topology ops when upgrade is in progress
Forbid starting new topology changes while upgrade to topology on raft
is in progress. While this does not take into account any ongoing
topology operations, it makes sure that at the end of the upgrade no
node will try to perform any legacy topology operations.
2024-02-08 18:05:02 +01:00
Avi Kivity
f1e11a7060 Merge 'scylla-nodetool: implement the describering command' from Botond Dénes
On top of the capabilities of the java-nodetool command, tablet support is also implemented: in addition to the existing keyspace parameter, an optional table parameter is also accepted and forwarded to the REST API. For tablet keyspaces this is required to get a ring description.

The command comes with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588
Refs: https://github.com/scylladb/scylladb/issues/16846

Closes scylladb/scylladb#17163

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement describering
  tools/scylla-nodetool.cc: handle API request failures gracefully
  test/nodetool: util.py: add check_nodetool_fails_with_all()
2024-02-08 18:52:34 +02:00
Tomasz Grabiec
c06173b3a3 range_streamer, tablets: Do not keep token metadata around streaming
It holds back global token metadata barrier during streaming, which
limits parallelism of load balancing.

Topology transition is protected by the means of topology_guard.

Closes scylladb/scylladb#17230
2024-02-08 18:26:00 +02:00
Aleksandra Martyniuk
5f7263afb5 repair: delete _cf from repair_meta
repair_meta keeps a reference to a table. If the table is dropped
during repair, its object is destructed, leaving a dangling reference.

Delete repair_meta::_cf and replace its usages with appropriate
methods.
2024-02-08 17:01:41 +01:00
Aleksandra Martyniuk
36882e1c4a repair: delete _cf from row_level_repair
row_level_repair keeps a reference to a table. If the table is dropped
during repair, its object is destructed, leaving a dangling reference.

Delete row_level_repair::_cf and replace its usages with appropriate
methods.
2024-02-08 16:47:02 +01:00
Botond Dénes
8fcb4ed707 tools/scylla-nodetool: implement describering
Also implementing tablet support, which basically just means that a new
table parameter is also accepted and forwarded to the API, in addition
to the existing keyspace one.
2024-02-08 09:20:25 -05:00
Botond Dénes
2df2733ed1 tools/scylla-nodetool.cc: handle API request failures gracefully
Currently, error handling is done via catching
http::unexpected_status_error and re-throwing an std::runtime_error.
Turns out this no longer works, because this error will only be thrown
by the http client, if the request had an expected reply code set.
The scylla_rest_client doesn't set an expected reply code, so this
exception has not been thrown for some time now.
Furthermore, even when the above worked, it was not too user-friendly as
the error message only included the reply-code, but not the reply
itself.

So in this patch this is fixed:
* The handling of http::unexpected_status_error is removed, we don't
  want to use this mechanism, because it yields very terse error
  messages.
* Instead, the status code of the request is checked explicitly and all
  cases where it is not 200 are handled.
* A new api_request_failed exception is added, which is thrown for all
  non-200 statuses with the extracted error message from the server (if
  any).
* This exception is caught by main, the error message is printed and
  scylla-nodetool returns with a new distinct error-code: 4.

With this, all cases where the request fails on ScyllaDB are handled and
we shouldn't hit cases where a nodetool command fails with some
obscure JSON parsing error, because the error reply has different JSON
schema than the expected happy-path reply.
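
The error path described above can be sketched like this. It is a simplified model, not the scylla-nodetool code: `http_reply`, `check_reply` and `run_command` are hypothetical names, and only the exception name, the "non-200 is an error" rule and the exit code 4 come from the text.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical reply type; the real client returns a richer object.
struct http_reply {
    int status;
    std::string error_message; // text extracted from the server reply, may be empty
};

// Thrown for every non-200 status, carrying the server's message if there is one.
class api_request_failed : public std::runtime_error {
    static std::string describe(const http_reply& r) {
        std::string msg = "request failed with status " + std::to_string(r.status);
        if (!r.error_message.empty()) {
            msg += ": " + r.error_message;
        }
        return msg;
    }
public:
    int status;
    explicit api_request_failed(const http_reply& r)
        : std::runtime_error(describe(r)), status(r.status) {}
};

// Exit code reserved for failed API requests, as stated above.
constexpr int exit_code_api_request_failed = 4;

// Check the status explicitly instead of relying on the HTTP client to throw.
void check_reply(const http_reply& r) {
    if (r.status != 200) {
        throw api_request_failed(r);
    }
}

// Models what main does: run the command, report API failures, pick the exit code.
int run_command(const std::function<void()>& command) {
    try {
        command();
        return 0;
    } catch (const api_request_failed& e) {
        std::cerr << e.what() << '\n';
        return exit_code_api_request_failed;
    }
}
```

Because the check happens before any reply parsing, an error reply never reaches the JSON handling of the happy path.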
2024-02-08 09:20:25 -05:00
Botond Dénes
d4f7f23b98 test/nodetool: util.py: add check_nodetool_fails_with_all()
Similar to the existing check_nodetool_fails_with() but checks that all
error messages from expected_errors are contained in stderr.

While at it, use list as the typing hint, instead of typing.List.
2024-02-08 09:20:25 -05:00
Kefu Chai
e02958ad35 sstable: let make_entry_descriptor() accept a single fs::path
both of its callers pass parent_path() and filename() to
it, so let the callee do this. simpler this way.
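
A minimal sketch of the single-argument shape, assuming a much simplified descriptor that only stores the two path components. The struct and its fields are hypothetical; the real make_entry_descriptor() also parses the file name.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical, simplified descriptor; it only keeps the two path components.
struct entry_descriptor {
    std::string dir;
    std::string file_name;
};

// Single-argument form: the callee splits the path itself, so callers no
// longer have to call parent_path() and filename() on their own.
entry_descriptor make_entry_descriptor(const fs::path& sst_path) {
    return entry_descriptor{sst_path.parent_path().string(), sst_path.filename().string()};
}
```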

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17225
2024-02-08 16:44:16 +03:00
Kefu Chai
770baa806e streaming: ignore failures when streaming dropped tables
before this change, when performing `stream_transfer_task`, if an
exception is raised, we check if the table being streamed is still
around, if it is missing, we just skip the table as it should be
dropped during streaming, otherwise we consider it a failure, and
report it back to the peer. this behavior was introduced by
953af382.

but we perform the streaming on all shards in parallel, and if any
of the shards fail because of the dropped table, the exception is
thrown. and the current shard is not necessarily the one which
throws the exception. actually, current shard might be still
waiting for a write lock for removing the table from the database's
table metadata. in that case, we consider the streaming RPC call a
failure even if the table is already removed on some shard(s). and
the peer would fail to bootstrap because of streaming failure.

in this change, before catching all exceptions, we handle
`no_such_column_family`, and do not fail the streaming in that case.
please note, we don't touch other tables, so we can just assume that
`no_such_column_family` is thrown only if the table to be transferred
is missing. that's why `assert()` is added.
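
The handler ordering can be sketched as below. This is an illustration only: `no_such_column_family` here is a local stand-in for the real exception type, and `transfer_result` and `run_transfer` are hypothetical names that do not appear in the streaming code.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Local stand-in for the exception thrown when a table no longer exists.
struct no_such_column_family : std::runtime_error {
    using std::runtime_error::runtime_error;
};

enum class transfer_result { ok, skipped_dropped_table, failed };

// `stream_on_all_shards` models the parallel per-shard streaming; any shard
// may be the one that throws.
transfer_result run_transfer(const std::function<void()>& stream_on_all_shards) {
    try {
        stream_on_all_shards();
        return transfer_result::ok;
    } catch (const no_such_column_family&) {
        // Handled before the catch-all: the table was dropped during
        // streaming, so this is not reported to the peer as a failure.
        return transfer_result::skipped_dropped_table;
    } catch (...) {
        return transfer_result::failed;
    }
}
```

The key point is that the decision no longer depends on whether the table is still visible on the current shard.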

Fixes #15370
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17160
2024-02-08 14:07:22 +02:00
Amnon Heiman
f4e82174b2 replica/table.cc: Align the tablet's behavior with other metrics.
Due to the potentially large number of per-table metrics, ScyllaDB uses
configuration to determine what metrics will be reported.  The user can
decide if they want per-table-per-shard metrics, per-table-per-instance
metrics, or none.

This patch uses the same logic for tablet metrics registration.
It adds a new metrics group tablets with one metric inside it - count.
So, scylla_tablets_count will report the number of tablets per shard.

The existing per-table metrics will be reported aggregated or not like
the other per-table metrics.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes scylladb/scylladb#17182
2024-02-08 12:48:25 +01:00
xuchang
9b675d1fe4 repair: resolve load_history shard load skew
Use uuid_xor_to_uint32 instead of table_uuid's most_significant_bits
to reduce hash collisions when mapping tables to shards.
2024-02-08 18:18:01 +08:00
xuchang
ae422fdf69 repair: accelerate repair load_history time
Use `parallel_for_each_table` instead of `for_each_table_gently` in
`repair_service::load_history`, with a parallelism of 16 per shard,
to reduce bootstrap time.
2024-02-08 18:18:01 +08:00
Kefu Chai
6eae678eb3 db: add formatter for gms::gossip_digest_ack2
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `gms::gossip_digest_ack2`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17153
2024-02-08 11:49:37 +02:00
Kefu Chai
07da9fd197 sstable: change sstable_touch_directory_io_check() to accept fs::path
this change is a follow-up of 637dd730. the goal is to use
std::filesystem::path for manipulating paths, and to avoid
converting between sstring and fs::path back and forth.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17214
2024-02-08 10:01:47 +03:00
Kefu Chai
2c859bc310 sstables: let state_to_dir(sstable_state) return string_view
state_to_dir(sstable_state) translates the enum to the corresponding
directory component, and it returns a `seastar::sstring`. not all
the callers of this function expect a full-blown sstring instance.
on the contrary, quite a few of them just want a string-like object
which represents the directory component, so they can use it, for
instance, to compose a path, or just format the given `state` enum.

so to avoid the overhead of creating/destroying the `seastar::sstring`
instance, let's switch to `std::string_view`. with this change, we
will be able to implement the fmt::formatter for `sstable_state`
without the help of the formatter of sstring.
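
A rough sketch of the idea, using only the standard library. The enum variants and the directory names below are assumptions made for illustration, and `make_state_path` is a hypothetical caller, not a function from the tree.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Assumed set of states; the real enum may differ.
enum class sstable_state { normal, staging, quarantine, upload };

// Returns a view onto a string literal, so no string object is created or destroyed.
constexpr std::string_view state_to_dir(sstable_state state) {
    switch (state) {
    case sstable_state::normal:     return "";
    case sstable_state::staging:    return "staging";
    case sstable_state::quarantine: return "quarantine";
    case sstable_state::upload:     return "upload";
    }
    return ""; // not reached for valid enum values
}

// Hypothetical caller that composes a path from the directory component.
std::string make_state_path(std::string_view table_dir, sstable_state state) {
    std::string path(table_dir);
    const std::string_view sub = state_to_dir(state);
    if (!sub.empty()) {
        path += '/';
        path.append(sub.data(), sub.size());
    }
    return path;
}
```

Since the returned views point at string literals, they stay valid for the whole lifetime of the program.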

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17213
2024-02-08 10:00:08 +03:00
Kurashkin Nikita
7ce9a3e9e5 cql: add limits for integer values when creating date type
Added a simple check that prevents entering int values that lead to
overflow when creating a date type.

Fixes #17066

Closes scylladb/scylladb#17102
2024-02-08 00:08:01 +02:00
Michał Chojnowski
f5e3a728e4 row_cache_test: test cache consistency during memtable-to-cache merge
A rather minimal reproducer for #16759. Not extensive.
2024-02-07 18:31:36 +01:00
Michał Chojnowski
bed20a2e37 row_cache: use preemption_source in update()
To facilitate testing the state of cache after the update is preempted
at various points, pass a preemption_source& to update() instead of
calling the reactor directly.

In release builds, the calls to preemption_source methods should compile
to the same direct reactor calls as today. Only in dev mode they should
add an extra branch. (However, the `preemption_source&` argument has
to be shoveled in any mode).
2024-02-07 18:31:36 +01:00
Michał Chojnowski
fabab2f46f utils: preempt: add preemption_source
While `preemption_check` can be passed to functions to control
their preemption points, there is no way to inspect the
state of the system after the preemption results in a yield.

`preemption_source` is a superset of `preemption_check`,
which also allows for customizing the yield, not just the preemption
check. An implementation passed by a test can hook the yield to
put the tested function to sleep, run some code, and then wake the
function up.

We use the preprocessor to minimize the impact on release builds.
Only dev-mode preemption_source is hookable. When it's used in other
modes, it should compile to direct reactor calls, as if it wasn't used.
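
The hookable variant can be pictured with the following synchronous model. It is only a sketch under assumptions: the real type works with the reactor and its yield is asynchronous, whereas here both operations are plain callbacks, and `copy_gently` is a hypothetical function under test.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, synchronous model of a hookable preemption source.
class preemption_source {
    std::function<bool()> _need_preempt;
    std::function<void()> _on_yield;
public:
    preemption_source(std::function<bool()> need_preempt, std::function<void()> on_yield)
        : _need_preempt(std::move(need_preempt)), _on_yield(std::move(on_yield)) {}
    // The preemption check, as with preemption_check.
    bool should_preempt() const { return _need_preempt(); }
    // The customizable yield: a test can run arbitrary code here.
    void yield() { _on_yield(); }
};

// Function under test: copies elements one by one, with a preemption point after each.
void copy_gently(const std::vector<int>& src, std::vector<int>& dst, preemption_source& ps) {
    for (int v : src) {
        dst.push_back(v);
        if (ps.should_preempt()) {
            ps.yield(); // a test can inspect `dst` from inside the hook
        }
    }
}
```

A test that always requests preemption and records `dst.size()` in the yield hook observes every intermediate state of the copy.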
2024-02-07 18:31:28 +01:00
Piotr Dulikowski
f6b303d589 raft_group0_client: add in_recovery method
It tells whether the current node currently operates in recovery mode or
not. It will be vital for storage_service in determining which topology
operations to use at startup.
2024-02-07 10:02:01 +01:00
Piotr Dulikowski
7601f40bf8 storage_service: introduce join_node_query verb
When a node joins an existing cluster, it will ask a node that already
belongs to the cluster about which topology operations to use when
joining.
2024-02-07 10:02:00 +01:00
Piotr Dulikowski
bab5d3bbe5 raft_group0: make discover_group0 public
The `discover_group0` function returns only after it either finds a node
that belongs to some group 0, or learns that the current node is
supposed to create a new one. It will be very helpful to storage_service
in determining which topology mode to use.
2024-02-07 10:00:16 +01:00
Piotr Dulikowski
367df7322e raft_group0: filter current node's IP in discover_group0
This was previously done by `setup_group0`, which always was an
(indirect) caller of `discover_group0`. As we want to make
`discover_group0` public, it's more convenient for the callers if the
called method takes care of sanitizing the argument.
2024-02-07 10:00:16 +01:00
Piotr Dulikowski
86e4a59d5b raft_group0: remove my_id arg from discover_group0
The goal is to make `discover_group0` public. The `my_id` argument was
always set to `this->load_my_id()`, so we can get rid of it and it will
make it more convenient to call `discover_group0` from the outside.
2024-02-07 10:00:16 +01:00
Piotr Dulikowski
4174a32d3f storage_service: make _raft_topology_change_enabled more advanced
Currently, nodes either operate in the topology-on-raft mode or legacy
mode, depending on whether the experimental topology on raft flag is
enabled. This also affects the way nodes join the cluster, as both modes
have different procedures.

We want to allow joining nodes in legacy mode until the cluster is
upgraded. Nodes should automatically choose the best method. Therefore,
the existing boolean _raft_topology_change_enabled flag is extended into
an enum with the following variants:

- unknown - the node still didn't decide in which mode it will operate
- legacy - the node uses legacy topology operations
- upgrading_to_raft - the node is upgrading to use raft topology
  operations
- raft - the node uses raft topology operations

Currently, only the `legacy` and `raft` variants are utilized, but this
will change in the commits that follow.

Additionally, the `_raft_experimental_topology` bool flag is introduced
which retains the meaning of the old `_raft_topology_change_enabled` but
has a more fitting name. It is explicitly needed in
`topology_state_load`.
2024-02-07 10:00:15 +01:00
Piotr Dulikowski
1104f8b00f docs: document raft topology upgrade and recovery 2024-02-07 09:54:54 +01:00
Botond Dénes
35da9551fb Merge 'storage_service: Add describe_ring support for tablet table' from Asias He
The table query param is added to get the describe_ring result for a
given table.

Both vnode table and tablet table can use this table param, so it is
easier for users to use.

If the table param is not provided by user and the keyspace contains
tablet table, the request will be rejected.

E.g.,
curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles"
curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1"

Refs #16509

Closes scylladb/scylladb#17118

* github.com:scylladb/scylladb:
  tablets: Convert to use the new version of for_each_tablet
  storage_service: Add describe_ring support for tablet table
  storage_service: Mark host2ip as const
  tablets: Add for_each_tablet_gently
2024-02-07 10:41:36 +02:00
Kefu Chai
b1e4513c2d dht: add formatter for dht::ring_position
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `dht::ring_position`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17194
2024-02-07 09:30:45 +02:00
Kefu Chai
75be212ab2 lang: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17193
2024-02-07 09:27:39 +02:00
Pavel Emelyanov
ca261f8916 utils: Mark chunked_vector::max_chunk_capacity with constexpr
It uses only compile-time constants to produce the value, so deserves
this marking

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17181
2024-02-07 09:22:23 +02:00
Raphael S. Carvalho
41a5c9eaec test: Reduce mem footprint of test_token_group_based_splitting_mutation_writer
Reduces footprint from hundreds of MB to just a few MB.

Issue could be reproduced with:
./build/dev/test/boost/mutation_writer_test --run_test=test_token_group_based_splitting_mutation_writer -- -m 500M --smp 1 --random-seed 1848215131

Fixes #17076.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17187
2024-02-07 09:21:24 +02:00
Tomasz Grabiec
032c1a3d04 Merge 'tablets: Make sure topology has enough endpoints for RF' from Pavel Emelyanov
When creating a keyspace, scylla allows setting an RF value larger than the number of nodes in the DC. With vnodes, when new nodes are bootstrapped, new tokens are inserted thus catching up with RF. With tablets, it's not the case as replica set remains unchanged.

With tablets, this is a good opportunity to stop mimicking the vnodes behavior and instead require as many nodes to be up and running as the requested RF. This patch implements this in a lazy manner -- when creating a keyspace, RF can be any value, but when a new table is created the topology must meet the RF requirements. If they are not met, the user can bootstrap new nodes or ALTER KEYSPACE.
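
The lazy check performed at table creation can be sketched as follows. The maps, the function name and the error text are illustrative assumptions, not the interfaces used by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// rf_per_dc: requested replication factor per datacenter.
// nodes_per_dc: number of nodes currently in each datacenter.
// Both maps and the function name are illustrative, not the real interfaces.
void check_enough_endpoints(const std::map<std::string, std::size_t>& rf_per_dc,
                            const std::map<std::string, std::size_t>& nodes_per_dc) {
    for (const auto& [dc, rf] : rf_per_dc) {
        auto it = nodes_per_dc.find(dc);
        const std::size_t nodes = it == nodes_per_dc.end() ? 0 : it->second;
        if (nodes < rf) {
            throw std::invalid_argument("datacenter " + dc + " has " + std::to_string(nodes)
                + " node(s), but replication factor " + std::to_string(rf) + " was requested");
        }
    }
}
```

Running the check when a table is created, rather than when the keyspace is created, leaves room to bootstrap nodes or alter the keyspace in between.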

closes: #16529

Closes scylladb/scylladb#17079

* github.com:scylladb/scylladb:
  tablets: Make sure topology has enough endpoints for RF
  cql-pytest: Disable tablets when RF > nodes-in-DC
  test: Remove test that configures RF larger than the number of nodes
  keyspace_metadata: Include tablets property in DESCRIBE
2024-02-06 22:38:11 +01:00
Kefu Chai
f3845a7f3d sstable: replace "welp" with more descriptive words
although "welp" is more emotionally expressive, it is considered
a misspelling of "whelp" by codespell. that's why this comment stands
out. but from a non-native speaker's point of view, we can probably
use more descriptive words to explain what "welp" is for in plain
English.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17183
2024-02-06 16:31:18 +02:00
David Garcia
f14edf3543 docs: correct image sorting order for reference docs
This commit displays images in reference docs in the correct order. Prior to this fix, the images were listed as 4.0.0, 4.0.1, and 4.0.2, but they should be sorted in reverse order (4.0.2, 4.0.1, 4.0.0).

The changes made in this PR resolve the issue introduced in https://github.com/scylladb/scylladb/pull/16942 when common functions for Azure and GCP were extracted into a separate file without reversing the list as defined in the original extension: https://github.com/scylladb/scylladb/pull/16942/files#diff-b8f6253ea8fdcca681deb556ca61cd1f3feb3b7aeb7e856b145ef9b685aad460L185

Closes scylladb/scylladb#17185
2024-02-06 16:24:22 +02:00
Kamil Braun
c0c291b985 Merge 'raft topology: harden IP related tests' from Petr Gusev
In this PR we add the tests for two scenarios, related to the use of IPs in raft topology.

* When the replaced node transitions to the `LEFT` state we used to
  remove the IP of such node from gossiper. If we replace with same IP,
  this caused the IP of the new node to be removed from gossiper. This
  problem was fixed by #16820, this PR adds a regression test for it.
* When a node is restarted after decommissioning some other node, the
  restarting node tries to apply the raft log, this log contains a
  record about the decommissioned node, and we got stuck trying to resolve
  its IP. This was fixed by #16639 - we excluded IPs from the raft log
  application code and moved it entirely to host_id-s. This PR adds a
  regression test for this case.

Closes scylladb/scylladb#15967
Closes scylladb/scylladb#14803

Closes scylladb/scylladb#17180

* github.com:scylladb/scylladb:
  test_topology_ops: check node restart after decommission
  test_replace_reuse_ip: check other servers see the IP
2024-02-06 14:28:06 +01:00
Nadav Har'El
14315fcbc3 mv: fix missing view deletions in some cases of range tombstones
For efficiency, if a base-table update generates many view updates that
go to the same partition, they are collected as one mutation. If this
mutation grows too big it can lead to memory exhaustion, so since
commit 7d214800d0 we split the output
mutation to mutations no longer than 100 rows (max_rows_for_view_updates)
each.

This patch fixes a bug where this split was done incorrectly when
the update involved range tombstones, a bug which was discovered by
a user in a real use case (#17117).

Range tombstones are read in two parts, a beginning and an end, and the
code could split the processing between these two parts, with the result
that some of the range tombstones in the update could be missed - and the
view could miss some deletions that happened in the base table.

This patch fixes the code in two places to avoid breaking up the
processing between range tombstones:

1. The counter "_op_count" that decides where to break the output mutation
   should only be incremented when adding rows to this output mutation.
   The existing code strangely incremented it on every read (!?) which
   resulted in the counter being incremented on every *input* fragment,
   and in particular could reach the limit 100 between two range
   tombstone pieces.

2. Moreover, the length of output was checked in the wrong place...
   The existing code could get to 100 rows, not check at that point,
   read the next input - half a range tombstone - and only *then*
   check that we reached 100 rows and stop. The fix is to calculate
   the number of rows in the right place - exactly when it's needed,
   not before the step.
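
The two fixes above can be sketched with a simplified, hypothetical model.
This is not the actual view-update code: the names `fragment_kind`,
`split_view_updates` and `max_rows` are assumptions made for illustration,
and the real code works on mutation fragments rather than on an enum.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified input fragment kinds; not ScyllaDB's real types.
enum class fragment_kind { skipped_row, row, rt_begin, rt_end };

// Splits the output generated from `input` into batches of at most
// `max_rows` rows. Only fragments that produce an output row are counted,
// and the limit is checked before the next input fragment is consumed.
std::vector<std::vector<fragment_kind>>
split_view_updates(const std::vector<fragment_kind>& input, std::size_t max_rows) {
    std::vector<std::vector<fragment_kind>> batches;
    std::vector<fragment_kind> current;
    std::size_t rows_in_current = 0;

    for (fragment_kind f : input) {
        // Check the limit *before* consuming the next fragment, so half of a
        // range tombstone is never pulled into an already-full batch.
        if (rows_in_current >= max_rows) {
            batches.push_back(std::move(current));
            current.clear();
            rows_in_current = 0;
        }
        switch (f) {
        case fragment_kind::skipped_row:
            break;                  // no output, so the counter is untouched
        case fragment_kind::row:
            current.push_back(f);
            ++rows_in_current;      // count output rows only
            break;
        case fragment_kind::rt_begin:
        case fragment_kind::rt_end:
            current.push_back(f);   // tombstone pieces are not rows
            break;
        }
    }
    if (!current.empty()) {
        batches.push_back(std::move(current));
    }
    return batches;
}
```

In this model, skipped input rows never advance the counter, and a batch
that is already full is closed before the beginning of a range tombstone is
read, so both pieces of the tombstone land in the same batch.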

The first change needs more justification: The old code, that incremented
_op_count on every input fragment and not just output fragments did not
fit the stated goal of its introduction - to avoid large allocations.
In one test it resulted in breaking up the output mutation to chunks of
25 rows instead of the intended 100 rows. But, maybe there was another
goal, to stop the iteration after 100 *input* rows and avoid the possibility
of stalls if there are no output rows? It turns out the answer is no -
we don't need this _op_count increment to avoid stalls: The function
build_some() uses `co_await on_results()` to run one step of processing
one input fragment - and `co_await` always checks for preemption.
I verified that indeed no stalls happen by using the existing test
test_long_skipped_view_update_delete_with_timestamp. It generates a
very long base update where all the view updates go to the same partition,
but all but the last few updates don't generate any view updates.
I confirmed that the fixed code loops over all these input rows without
increasing _op_count and without generating any view update yet, but it
does NOT stall.

This patch also includes two tests reproducing this bug and confirming
it's fixed, and also two additional tests for breaking up long deletions
that I wanted to make sure don't fail after this patch (they don't).

By the way, this fix would have also fixed issue #12297 - which we
fixed a year ago in a different way. That issue happened when the code
went through 100 input rows without generating *any* output rows,
and incorrectly concluded that there's no view update to send.
With this fix, the code no longer stops generating the view
update just because it saw 100 input rows - it would have waited
until it generated 100 output rows in the view update (or the
input is really done).

Fixes #17117

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17164
2024-02-06 14:57:33 +02:00
Asias He
e7e1f4b01a streaming: Fix rpc::source and rpc::optional parameter order
The new rpc::optional parameter must come after any existing parameters,
including the rpc::source parameters, otherwise it will break
compatibility.

The regression was introduced in:

```
commit fd3c089ccc
Author: Tomasz Grabiec <tgrabiec@scylladb.com>
Date:   Thu Oct 26 00:35:19 2023 +0200

    service: range_streamer: Propagate topology_guard to receivers
```

We need to backport this patch ASAP before we release anything that
contains commit fd3c089ccc.

Refs: #16941
Fixes: #17175

Closes scylladb/scylladb#17176
2024-02-06 13:15:28 +01:00
Botond Dénes
a3d4131918 Merge 'Sanitize replication factor parsing by strategies' from Pavel Emelyanov
RF values appear as strings and strategy classes convert them to integers. This PR removes some duplication of effort in the conversion code.

Closes scylladb/scylladb#17132

* github.com:scylladb/scylladb:
  network_topology_strategy: Do not walk list of datacenters twice
  replication_strategy: Do not convert string RF into int twise
  abstract_replication_strategy: Make validate_replication_factor return value
2024-02-06 13:26:31 +02:00
Kefu Chai
a40d3fc25b db: add formatter for data_dictionary::user_types_metadata
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `data_dictionary::user_types_metadata`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17140
2024-02-06 13:24:07 +02:00
Kefu Chai
97587a2ea4 test/boost: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17139
2024-02-06 13:22:16 +02:00
Kefu Chai
16e1700246 exceptions: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17152
2024-02-06 13:16:03 +02:00
Kefu Chai
3bca11668a db: add formatter for exceptions::exception_code
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `exceptions::exception_code`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17151
2024-02-06 13:15:08 +02:00
Pavel Emelyanov
93918eef62 ks_prop_defs: Remove preprocessor-guarded java code
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17166
2024-02-06 13:14:15 +02:00
Botond Dénes
53a11cba62 Merge 'types/types.cc: move stringstream content instead of copying it' from Patryk Wróbel
C++20 introduced a new overload of std::ostringstream::str() that is selected when the mentioned member function is called on r-value.

The new overload returns a string, that is move-constructed from the underlying string instead of being copy-constructed.

This change applies std::move() on stringstream objects before calling str() member function to avoid copying of the underlying buffer.

It also removes a helper function `inet_addr_type_impl::to_sstring()` - it was used only in two places. It was replaced with `fmt::to_string()`.

Closes scylladb/scylladb#16991

* github.com:scylladb/scylladb:
  use fmt::to_string() for seastar::net::inet_address
  types/types.cc: move stringstream content instead of copying it
2024-02-06 13:11:41 +02:00
Botond Dénes
619c3fdf32 Merge 'types: use {fmt} to format time and boolean' from Kefu Chai
so we can tighten our dependencies a little bit. there are only three places where we are using the `date` library. also, there is no need to reinvent the wheels if there are ready-to-use ones.

Closes scylladb/scylladb#17177

* github.com:scylladb/scylladb:
  types: use {fmt} to format boolean
  types: use {fmt} to format time
2024-02-06 13:10:39 +02:00
Kefu Chai
3dfe7c44f6 dht: add formatter for dht::sharder
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `dht::sharder`, and drop
its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17178
2024-02-06 13:06:46 +02:00
Kefu Chai
c38325db26 Update seastar submodule
* seastar 85359b28...289ad5e5 (19):
  > net/dpdk: use user-defined literal when appropriate
  > io_tester: Allow running on non-XFS fs
  > io: Apply rate-factor early
  > circular_buffer: make iterator default constructible
  > net/posix: add a way to change file permissions of unix domain socket
  > resource: move includes to the top of the source file
  > treewide: replace calls to future::get0() by calls to future::get()
  > core/future: add as_ready_future utility
  > build: do not expose -Wno-error=#warnings
  > coroutine: remove remnants of variadic futures
  > build: prevent gcc -Wstringop-overflow workaround from affecting clang
  > util/spinlock: use #warning instead of #warn
  > io_tester: encapsulate code into allocate_and_fill_buffer()
  > io_tester: make maybe_remove_file a function
  > future: remove tuples from get0_return_type
  > circular_buffer_fixed_capacity: use std::uninitialized_move() instead of open-coding
  > rpc/rpc_types: do not use integer literal in preprocessor macro
  > future: use "(T(...))" instead of "{T(...)}" in uninitialized_set()
  > net/posix: include used header

Closes scylladb/scylladb#17179
2024-02-06 13:05:33 +02:00
David Garcia
ad1c9ae452 docs: fix logging in images extensions
Adds a missing logging import in the file scylladb_common_images extension, which prevents the enterprise build from building.

Additionally, it standardizes logging handling across the extensions and removes "ami" references in Azure and GCP extensions.

Closes scylladb/scylladb#17137
2024-02-06 13:00:37 +02:00
Botond Dénes
ce3233112e Merge 'configure.py: add -Wextra to cflags' from Kefu Chai
also disable some more warnings which are failing the build after `-Wextra` is enabled. we can fix them on a case-by-case basis, if they are genuine issues. but before that, we just disable them.

the goal of this change is to reduce the discrepancies between the compile options used by CMake and those used by configure.py. the side effect is that we enable some more warnings enabled by `-Wextra`, for instance, `-Wsign-compare` is enabled now. for the full list of the enabled warnings when building with Clang, please see https://clang.llvm.org/docs/DiagnosticsReference.html#wextra.

Closes scylladb/scylladb#17131

* github.com:scylladb/scylladb:
  configure.py: add -Wextra to cflags
  test/tablets: do not compare signed and unsigned
2024-02-06 12:57:32 +02:00
Petr Gusev
646ca9515e test_topology_ops: check node restart after decommission
There used to be a problem with restarting a node after
decommissioning some other node - the restarting node
tries to apply the raft log, this log contains a record
about the decommissioned node, and we got stuck trying
to resolve its IP.

This was fixed in #16639 - we excluded IPs from
the Raft log application code and moved it entirely
to host_id-s.

In this commit we add a regression test
for this case. We move the decommission_node
call before server_stop/server_start. We need
to add one more server to retain majority when
the node is decommissioned, otherwise the topology
coordinator won't migrate from the stopped node
before replacing it, and we'll get an error.

closes #14803
2024-02-06 13:29:42 +04:00
Petr Gusev
aeed5c5fe3 test_replace_reuse_ip: check other servers see the IP
The replaced node transitions to LEFT state, and
we used to remove the IPs of such nodes from gossiper.
If we replace with the same IP, this caused the IP of the
new node to be removed from gossiper.

This problem was fixed by #16820, this commit
adds a regression test for it.

closes #15967
2024-02-06 13:28:04 +04:00
Botond Dénes
115ee4e1f5 Merge 'doc: remove the OSS and Enterprise Features pages' from Anna Stuchlik
This PR removes the following pages:
- ScyllaDB Open Source Features
- ScyllaDB Enterprise Features

They were outdated, incomplete, and misleading. They were also redundant, as the per-release updates are added as Release Notes.

With this update, the features listed on the removed pages are added under the common page: ScyllaDB Features.

In addition, a reference to the Enterprise-only Features section is added.

Note: No redirections are added because no file paths or URLs are changed with this PR.

Fixes https://github.com/scylladb/scylladb/issues/13485

Refs https://github.com/scylladb/scylladb/issues/16496

(nobackport)

Closes scylladb/scylladb#17150

* github.com:scylladb/scylladb:
  Update docs/using-scylla/features.rst
  doc: remove the OSS and Enterprise Features pages
2024-02-06 08:17:18 +02:00
Botond Dénes
edb983d165 Merge 'doc: add the 5.4-to-2024.1 upgrade guide' from Anna Stuchlik
This PR:
- Adds the upgrade guide from ScyllaDB Open Source 5.4 to ScyllaDB Enterprise 2024.1. Note: The need to include the "Restore system tables" step in rollback has been confirmed; see https://github.com/scylladb/scylladb/issues/11907#issuecomment-1842657959.
- Removes the 5.1-to-2022.2 upgrade guide (unsupported versions).

Fixes https://github.com/scylladb/scylladb/issues/16445

Closes scylladb/scylladb#16887

* github.com:scylladb/scylladb:
  doc: fix the OSS version number
  doc: metric updates between 2024.1. and 5.4
  doc: remove the 5.1-to-2022.2 upgrade guide
  doc: add the 5.4-to-2024.1 upgrade guide
2024-02-06 08:16:05 +02:00
Kefu Chai
6f07d9edaa types: use {fmt} to format boolean
{fmt} formats booleans as "true" / "false" since v2.0.1, no need to
reinvent the wheel.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-06 10:40:02 +08:00
Kefu Chai
be29556955 types: use {fmt} to format time
so we can tighten our dependencies a little bit. there are only
three places where we are using the `date` library. the outputs
of these two ways are identical:
see https://wandbox.org/permlink/Lo9NUrQNUEqyiMEa and https://godbolt.org/z/YEha9ah7v to compare their outputs.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-06 10:39:30 +08:00
Kefu Chai
02376250b5 storage_service: do no filter tablets tables manually
instead of filtering the keyspaces manually, let's reuse
`database::get_non_local_strategy_keyspaces_erms()`. less
repetition and more future-proof this way.

Fixes #16974
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17121
2024-02-05 21:28:35 +01:00
Anna Stuchlik
d6723134ab doc: fix the OSS version number
Replace "5.2" with "5.4", as this is
the 5.4-to-2024.1 upgrade guide.
2024-02-05 21:10:50 +01:00
Tomasz Grabiec
448e117e7d Merge 'service: validate replication strategy constraints in tablet-moving API' from Aleksandra Martyniuk
Validate replication strategy constraints in /storage_service/tablets/move API:
- replicas are not on the same node
- replicas don't move across DC (violates RF in each DC)
- availability is not reduced due to rack overloading

Add flag to force tablet move even though dc/rack constraints aren't fulfilled.

Test for the change: https://github.com/scylladb/scylla-dtest/pull/3911.

Fixes: #16379.

Closes scylladb/scylladb#16648

* github.com:scylladb/scylladb:
  api: service: add force param to move_tablet api
  service: validate replication strategy constraints
2024-02-05 20:07:21 +01:00
Avi Kivity
9dd76c1035 Merge 'db: add formatter for dht::ring_position_{ext,view}' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `dht::ring_position_ext` and
`dht::ring_position_view`, and drop their operator<<.

Refs #13245

Closes scylladb/scylladb#17128

* github.com:scylladb/scylladb:
  db: add formatter for dht::ring_position_ext
  db: add formatter for dht::ring_position_view
2024-02-05 20:27:54 +02:00
Patryk Wrobel
cc186c1798 use fmt::to_string() for seastar::net::inet_address
This change removes inet_addr_type_impl::to_sstring()
and replaces its usages with fmt::to_string().
The removed helper performed an unneeded copy via
std::ostringstream::str().

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-02-05 16:56:40 +01:00
Patryk Wrobel
8c0d30cd88 types/types.cc: move stringstream content instead of copying it
C++20 introduced a new overload of std::ostringstream::str()
that is selected when the mentioned member function is called
on r-value.

The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.

This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.
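
As a minimal sketch of the pattern described above (the helper name
`join_with_dash` is hypothetical and is not taken from types/types.cc):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper, for illustration only.
std::string join_with_dash(int a, int b) {
    std::ostringstream out;
    out << a << '-' << b;
    // Since C++20, calling str() on an rvalue stream selects the
    // rvalue-qualified overload, which moves the underlying string out
    // instead of copying it. The stream must not be reused afterwards
    // without resetting its contents.
    return std::move(out).str();
}
```

With a pre-C++20 standard library the same expression still compiles, but
it falls back to the copying overload.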

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-02-05 16:35:27 +01:00
Kamil Braun
968d1e3e78 Merge 'raft topology: make rollback_to_normal a transition state' from Patryk Jędrzejczak
After changing `left_token_ring` from a node state to a transition
state in scylladb/scylladb#17009, we do the same for
`rollback_to_normal`. `rollback_to_normal` was created as a node
state because `left_token_ring` was a node state.

This change will allow us to distinguish a failed removenode from
a failed decommission in the `rollback_to_normal` handler.
Currently, we use the same logic for both of them, so it's not
required. However, this might change, as it has happened with the
decommission and the failed bootstrap/replace in the
`left_token_ring` state (scylladb/scylladb#16797). We are making
this change now because it would be much harder after branching.

Fixes scylladb/scylladb#17032

Closes scylladb/scylladb#17136

* github.com:scylladb/scylladb:
  docs: dev: topology-over-raft: align indentation
  docs: dev: topology-over-raft: document the rollback_to_normal state
  topology_coordinator: improve logs in rollback_to_normal handler
  raft topology: make rollback_to_normal a transition state
2024-02-05 16:30:20 +01:00
Anna Stuchlik
6d6c400b77 doc: metric updates between 2024.1. and 5.4
This commit adds the information about
metrics updates between these two versions.

Fixes https://github.com/scylladb/scylladb/issues/16446
2024-02-05 16:24:16 +01:00
Anna Stuchlik
1e9c7ab6d1 Update docs/using-scylla/features.rst
Co-authored-by: Tzach Livyatan <tzach.livyatan@gmail.com>
2024-02-05 14:44:31 +01:00
Mikołaj Grzebieluch
4cecda7ead transport/controller: pass unix_domain_socket_permissions to generic_server::listen 2024-02-05 14:22:03 +01:00
Mikołaj Grzebieluch
6b178f9a4a transport/controller: split configuring sockets into separate functions
TCP sockets and unix domain sockets don't share common listen options
excluding `socket_address`. For unix domain sockets, available options will be
expanded to cover also filesystem permissions and owner for the socket.
Storing listen options for both types of sockets in one structure would become messy.
For now, both use `listen_cfg`.

In a singular cql controller, only sockets of one type are created, thus it
can be easily split into two cases.
Isolate maintenance socket from `listen_cfg`.
2024-02-05 14:20:17 +01:00
Nadav Har'El
7888b23e9e Merge 'test/cql-pytest: re-enable disabled tests' from Botond Dénes
In a previous PR (https://github.com/scylladb/scylladb/pull/16840), we enabled tablets by default when running the cql-pytest suite. To handle tests which are failing with tablets enabled, we used a new fixture, `xfail_tablets` to mark these as xfail. This means that we effectively lost test coverage, as these tests can now freely fail and no-one will notice if this is due to a new regression. To restore test coverage, this PR re-enables all the previously disabled tests, by parametrizing each one of them to run with both vnodes and tablets, and targetedly mark as xfail, only the tablet variant. After these tests are fixed with tablets (or the underlying functionality they test is fixed to work with tablets), we will run them with both vnodes and tablets, because these tests apparently *do* care which replication method is used.

Together with https://github.com/scylladb/scylladb/pull/16802, this means all previously disabled tests are re-enabled and no coverage is lost.

Closes scylladb/scylladb#16945

* github.com:scylladb/scylladb:
  test/cql-pytest: conftest.py: remove xfail_tablets fixture
  test/cql-pytest: test_tombstone_limit.py: re-enable disabled tests
  test/cql-pytest: test_describe.py: re-enable disabled tests
  test/cql-pytest: test_cdc.py: re-enable disabled tests
  test/cql-pytest: add parameter support to test_keyspace
2024-02-05 14:12:57 +02:00
Asias He
904bafd069 tablets: Convert to use the new version of for_each_tablet
It is gentler than the old one.
2024-02-05 18:45:40 +08:00
Asias He
04773bd1df storage_service: Add describe_ring support for tablet table
The table query param is added to get the describe_ring result for a
given table.

Both vnode table and tablet table can use this table param, so it is
easier for users to use.

If the table param is not provided by user and the keyspace contains
tablet table, the request will be rejected.

E.g.,
curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles"
curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1"

Refs #16509
2024-02-05 18:11:07 +08:00
Pavel Emelyanov
45dbe38658 tablets: Make sure topology has enough endpoints for RF
When creating a keyspace, scylla allows setting RF value smaller than
there are nodes in the DC. With vnodes, when new nodes are bootstrapped,
new tokens are inserted thus catching up with RF. With tablets, it's not
the case as replica set remains unchanged.

With tablets it's a good chance not to mimic the vnodes behavior and
require as many nodes to be up and running as the requested RF is. This
patch implements this in a lazy manner -- when creating a keyspace RF
can be any, but when a new table is created the topology should meet RF
requirements. If not met, user can bootstrap new nodes or ALTER KEYSPACE.
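
The check performed at table creation can be sketched roughly as follows.
This is a hypothetical illustration, not the code from the patch: the
function name `topology_satisfies_rf` and the map-based representation of
per-DC replication factors and node counts are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical check: every datacenter named in the replication options
// must have at least as many nodes as its requested replication factor.
bool topology_satisfies_rf(const std::map<std::string, std::size_t>& rf_per_dc,
                           const std::map<std::string, std::size_t>& nodes_per_dc) {
    for (const auto& [dc, rf] : rf_per_dc) {
        auto it = nodes_per_dc.find(dc);
        // A datacenter missing from the topology counts as having no nodes.
        const std::size_t nodes = (it == nodes_per_dc.end()) ? 0 : it->second;
        if (nodes < rf) {
            return false;
        }
    }
    return true;
}
```

When such a check fails, the options described above apply: bootstrap more
nodes or lower the RF with ALTER KEYSPACE.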

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-05 12:50:04 +03:00
Pavel Emelyanov
8471d88576 cql-pytest: Disable tablets when RF > nodes-in-DC
All the cql-pytest-s run against a single scylla node, but
new_random_keyspace() helper may request RF in the range of 1 through 6,
so tablets need to be explicitly disabled when the RF is too large

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-05 12:50:04 +03:00
Pavel Emelyanov
3b9ca29411 test: Remove test that configures RF larger than the number of nodes
This is going to be disabled soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-05 12:50:03 +03:00
Pavel Emelyanov
8910d37994 keyspace_metadata: Include tablets property in DESCRIBE
When tablets are enabled and a keyspace being described has them
explicitly disabled or non-automatic initial value of zero, include this
into the returned describe statement too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-05 12:49:20 +03:00
Benny Halevy
bd3ed168ab api/compaction_manager: stop_keyspace_compaction: prevent stack use-after-free
Since `t.parallel_foreach_table_state` may yield,
we should access `type` by reference when calling
`stop_compaction` since it is captured by the calling
lambda and gets lost when it returns if
`parallel_foreach_table_state` returns an unavailable
future.

Instead change all captures to `[&]` so we can access
the `type` variable held by the coroutine frame.

Fixes #16975

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17143
2024-02-05 09:32:08 +02:00
Asias He
ab560c1580 storage_service: Mark host2ip as const
So it can be used by another const function.
2024-02-05 13:42:08 +08:00
Asias He
fab0d33d08 tablets: Add for_each_tablet_gently
In this version, the callback returns a future<>, so it can yield itself
to avoid stalls in func itself.
2024-02-05 13:42:08 +08:00
Anna Stuchlik
f7afa6773f doc: remove the OSS and Enterprise Features pages
This commit removes the following pages:
- ScyllaDB Open Source Features
- ScyllaDB Enterprise Features

They were outdated, incomplete, and misleading.
They were also redundant, as the per-release
updates are added as Release Notes.

With this update, the features listed on the removed
pages are added under the common page: ScyllaDB Features.

Note: No redirections are added, because no file paths
or URLs are changed with this commit.

Fixes https://github.com/scylladb/scylladb/issues/13485

Refs https://github.com/scylladb/scylladb/issues/16496
2024-02-04 20:55:40 +01:00
Avi Kivity
784c2f8ad2 Merge 'treewide: replace calls to future::get0() by calls to future::get()' from Kefu Chai
get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.

Closes scylladb/scylladb#17130

* github.com:scylladb/scylladb:
  treewide: replace seastar::future::get0() with seastar::future::get()
  sstable: capture return value of get0() using auto
  utils: result_loop: define result_type with decayed type

[avi: add another one that snuck in while this was cooking]
2024-02-04 15:23:33 +02:00
Michał Chojnowski
ed98102c45 row_cache: update _prev_snapshot_pos even if apply_to_incomplete() is preempted
Commit e81fc1f095 accidentally broke the control
flow of row_cache::do_update().

Before that commit, the body of the loop was wrapped in a lambda.
Thus, to break out of the loop, `return` was used.

The bad commit removed the lambda, but didn't update the `return` accordingly.
Thus, since the commit, the statement doesn't just break out of the loop as
intended, but also skips the code after the loop, which updates `_prev_snapshot_pos`
to reflect the work done by the loop.
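
The control-flow mistake can be reproduced in a few lines. The sketch below
is a hypothetical reduction, not the code of `row_cache::do_update()`: the
`progress` struct and both function names are made up, and
`position_updated` merely stands in for the `_prev_snapshot_pos` update.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the state that is updated after the loop.
struct progress {
    int processed = 0;
    bool position_updated = false;
};

// Buggy shape: `return` was right while the loop body was a lambda, but in
// a plain loop it leaves the whole function and skips the bookkeeping.
progress update_with_return(const std::vector<int>& items, int preempt_at) {
    progress p;
    for (int item : items) {
        if (item == preempt_at) {
            return p;               // also skips the code after the loop
        }
        ++p.processed;
    }
    p.position_updated = true;      // never reached on preemption
    return p;
}

// Intended shape: `break` only leaves the loop, so the code after it runs.
progress update_with_break(const std::vector<int>& items, int preempt_at) {
    progress p;
    for (int item : items) {
        if (item == preempt_at) {
            break;
        }
        ++p.processed;
    }
    p.position_updated = true;      // reflects the work done so far
    return p;
}
```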

As a result, whenever `apply_to_incomplete()` (the `updater`) is preempted,
`do_update()` fails to update `_prev_snapshot_pos`. It remains in a
stale state, until `do_update()` runs again and either finishes or
is preempted outside of `updater`.

If we read a partition processed by `do_update()` but not covered by
`_prev_snapshot_pos`, we will read stale data (from the previous snapshot),
which will be remembered in the cache as the current data.

This results in outdated data being returned by the replica.
(And perhaps in something worse if range tombstones are involved.
I didn't investigate this possibility in depth).

Note: for queries with CL>1, occurrences of this bug are likely to be hidden
by reconciliation, because the reconciled query will only see stale data if
the queried partition is affected by the bug on *all* queried replicas
at the time of the query.

Fixes #16759

Closes scylladb/scylladb#17138
2024-02-04 11:17:41 +02:00
Aleksandra Martyniuk
89c683f51a api: service: add force param to move_tablet api
Force flag is added to /storage_service/tablets/move. If force is set
to true, replication strategy constraints regarding racks and dcs can
be broken.
2024-02-02 19:08:01 +01:00
Aleksandra Martyniuk
3b0fa7335a service: validate replication strategy constraints
Check whether tablet move meets replication strategy constraints, i.e.
replicas aren't on the same node, replicas don't move across DCs
or HA isn't reduced due to rack overloading. Throw if constraints
are broken.
2024-02-02 19:06:45 +01:00
Botond Dénes
017a574b16 tools: lua_sstable_consumer.cc: load os and math libs
The amount of standard Lua libraries loaded for the sstable-script was
limited, due to fears that some libraries (like the io library) could
expose methods, which if used from the script could interfere with
seastar's asynchronous architecture. So initially only the table and
string libraries were loaded.
This patch adds two more libraries to be loaded: math and os. The
former is self-explanatory and the latter contains methods to work with
date/time, obtain the values of environment variables as well as launch
external processes. None of these should interfere with seastar, on the
other hand the facilities they provide can come very handy for sstable
scripts.

Closes scylladb/scylladb#17126
2024-02-02 19:00:57 +03:00
Patryk Jędrzejczak
2687204c7f docs: dev: topology-over-raft: align indentation 2024-02-02 16:55:28 +01:00
Patryk Jędrzejczak
fdd3c3a280 docs: dev: topology-over-raft: document the rollback_to_normal state
In one of the previous patches, we changed the `rollback_to_normal`
state from a node state to a transition state. We document it
in this patch. The node state wasn't documented, so there is
nothing to remove.
2024-02-02 16:55:28 +01:00
Patryk Jędrzejczak
8d6a9730db topology_coordinator: improve logs in rollback_to_normal handler
After making `rollback_to_normal` a transition state, we can
distinguish a failed decommission from a failed bootstrap in the
`rollback_to_normal` handler. We use it to make logs more
descriptive.
2024-02-02 16:55:28 +01:00
Patryk Jędrzejczak
25b90f5554 raft topology: make rollback_to_normal a transition state
After changing `left_token_ring` from a node state to a transition
state in scylladb/scylladb#17009, we do the same for
`rollback_to_normal`. `rollback_to_normal` was created as a node
state because `left_token_ring` was a node state.

This change will allow us to distinguish a failed removenode from
a failed decommission in the `rollback_to_normal` handler.
Currently, we use the same logic for both of them, so it's not
required. However, this might change, as it has happened with the
decommission and the failed bootstrap/replace in the
`left_token_ring` state (scylladb/scylladb#16797). We are making
this change now because it would be much harder after branching.

The change also simplifies the code in
`topology_coordinator:rollback_current_topology_op`.

Moving the `rollback_to_normal` handler from
`handle_node_transition` to `handle_topology_transition` created a
large diff. There is only one change - adding
`auto node = get_node_to_work_on(std::move(guard));`.
2024-02-02 16:55:20 +01:00
Pavel Emelyanov
52e6398ad6 messaging: Add formatter for netw::msg_addr
As a part of ongoing "support fmt v10" effort

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17053
2024-02-02 15:20:40 +01:00
Kefu Chai
cd3c7a50ed scylla_raid_setup: drop unused import
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17095
2024-02-02 15:20:40 +01:00
Kefu Chai
e62b29bab7 tasks: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17125
2024-02-02 15:20:40 +01:00
Pavel Emelyanov
75bc702ae8 utils: Remove unused operator<< for file_lock object
The lock itself is only used by utils/directories code

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17051
2024-02-02 15:20:40 +01:00
Kefu Chai
792fa4441e docs: s/ontop/on top/
this misspelling is identified by codespell. ontop cannot be found
on merriam-webster, but "on top" can, see
https://www.merriam-webster.com/dictionary/on%20top, so let's
replace ontop with "on top".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17127
2024-02-02 15:20:40 +01:00
Botond Dénes
c9ab39af88 install-dependencies.sh: remove duplicate python3-pyudev package
It appeared in the list twice.

Closes scylladb/scylladb#17060
2024-02-02 15:20:40 +01:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Kefu Chai
deef78c796 sstable: capture return value of get0() using auto
instead of capturing the return value of `get0()` with a reference
type, use a plain type. as `get0()` returns a plain `T` while `get()`
returns a `T&&`, to prevent the value referenced by `T&&` from being
destroyed after the expression, let's use a plain `auto` instead of `auto&&`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 22:12:18 +08:00
Kefu Chai
9fcca8f585 utils: result_loop: define result_type with decayed type
this change prepares for replacing `seastar::future::get0()` with
`seastar::future::get()`. the former's return type is a plain `T`,
while the latter is `T&&`. in this case `T` is
`boost::outcome::result<..>`. in order to extract its `error_type`,
we need to get its decayed type. since `std::remove_reference_t<T>`
also returns `T`, let's use it so it works with both `get0()` and `get()`.
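
A minimal sketch of the idea, assuming a stand-in type: `result_sketch` and `error_type_of_sketch` are hypothetical names used only here, with `result_sketch` playing the role of `boost::outcome::result<..>`.

```cpp
#include <type_traits>

// Hypothetical stand-in for boost::outcome::result<..>; only the nested alias matters.
template <typename T, typename E>
struct result_sketch {
    using value_type = T;
    using error_type = E;
};

// Strips a possible reference before looking up the nested error_type, so the
// alias works for both `result` (what get0() yields) and `result&&` (what get() yields).
template <typename R>
using error_type_of_sketch = typename std::remove_reference_t<R>::error_type;

using demo_result_sketch = result_sketch<int, long>;

static_assert(std::is_same<error_type_of_sketch<demo_result_sketch>, long>::value,
              "works for a plain T");
```

Without the `std::remove_reference_t`, looking up `::error_type` on a reference type would not compile.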

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 22:12:18 +08:00
Kefu Chai
19025127c3 configure.py: add -Wextra to cflags
also disable some more warnings which are failing the build after
`-Wextra` is enabled. we can fix them on a case-by-case basis, if
they are genuine issues. but before that, we just disable them.

the goal of this change is to reduce the discrepancies between
the compile options used by CMake and those used by configure.py.
the side effect is that we enable some more warnings enabled by
`-Wextra`, for instance, `-Wsign-compare` is enabled now. for
the full list of the enabled warnings when building with Clang,
please see https://clang.llvm.org/docs/DiagnosticsReference.html#wextra.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 20:49:21 +08:00
Kefu Chai
aea6cd0b2d test/tablets: do not compare signed and unsigned
this change should silence following warning:

```
 test/boost/tablets_test.cc:1600:27: error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
19:47:04          for (int i = 0; i < smp::count * 20; i++) {
19:47:04                          ~ ^ ~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 20:49:21 +08:00
Pavel Emelyanov
afda0f6ddf network_topology_strategy: Do not walk list of datacenters twice
The constructor of that class walks the provided options to get per-DC
replication factors. It does it twice -- first to populate the dc:rf
map, second to calculate the sum of provided RF values. The latter loop
can be optimized away.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-02 14:39:24 +03:00
Pavel Emelyanov
06f9e7367c replication_strategy: Do not convert string RF into int twice
There are two replication strategy classes that validate string RF and
then convert it into integer. Since validation helper returns the parsed
value, it can be just used avoiding the 2nd conversion.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-02 14:38:17 +03:00
Pavel Emelyanov
a8cd3bc636 abstract_replication_strategy: Make validate_replication_factor return value
The helper in question checks if string RF is indeed an integer. Make
this helper return the "checked" integer value, because it does this
conversion. And rename it to parse_... to reflect what it now does. Next
patches will make use of this change.
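
The shape of such a helper can be sketched as follows. The name `parse_replication_factor_sketch`, its signature and its error messages are assumptions made for illustration; the real helper is not shown in this log.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical sketch: validates that the string is a non-negative integer and
// returns the parsed value, so callers do not need a second conversion.
// Overflow handling is deliberately left out to keep the example short.
constexpr std::size_t parse_replication_factor_sketch(std::string_view rf) {
    if (rf.empty()) {
        throw std::invalid_argument("replication factor must not be empty");
    }
    std::size_t value = 0;
    for (char c : rf) {
        if (c < '0' || c > '9') {
            throw std::invalid_argument("replication factor must be a non-negative integer");
        }
        value = value * 10 + static_cast<std::size_t>(c - '0');
    }
    return value;
}
```

A caller that previously validated and then converted again can use the returned value directly, e.g. `auto rf = parse_replication_factor_sketch("3");`.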

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-02 14:36:47 +03:00
Kefu Chai
e56e74df0a db: add formatter for dht::ring_position_ext
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `dht::ring_position_ext`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 18:37:56 +08:00
Kefu Chai
bb3ba81b15 db: add formatter for dht::ring_position_view
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `dht::ring_position_view`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 18:36:17 +08:00
Pavel Emelyanov
9450a03cdf data_dictionary: Add formatter for keyspace-metadata
Other than being fmt v10 compatible, it's also shorter and easier to
read, thanks to fmt::join() helper

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17115
2024-02-02 11:26:39 +02:00
Kefu Chai
c7a01b9eb4 transport: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17092
2024-02-02 11:20:24 +02:00
Lakshmi Narayanan Sreethar
e86965c272 compaction: run rewrite_sstables_compaction_task_executor tasks in maintenance group
Use maintenance group to run all the compaction tasks that use the
rewrite_sstables_compaction_task_executor.

Fixes #16699

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#17112
2024-02-02 11:18:49 +02:00
Pavel Emelyanov
b557dcbf5a cql3: Sanitize ALTER KEYSPACE check for non-local storages
This kills three birds with one stone

1. fixes broken indentation
2. re-uses new_options local variable
3. stops using string literal to check storage type

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17111
2024-02-02 11:13:29 +02:00
Botond Dénes
63d44712af Merge 'storage_service: Fix indentation for stream_ranges' from Asias He
This is a follow up of "storage_service: Run stream_ranges cmd in streaming group" to fix indentation and drop an unnecessary co_return.

Refs: #17090

Closes scylladb/scylladb#17114

* github.com:scylladb/scylladb:
  storage_service: Drop unnecessary co_return in raft_topology_cmd_handler
  storage_service: Fix indentation for stream_ranges
2024-02-02 11:12:52 +02:00
Kefu Chai
b45af994c2 locator/utils: remove stale comment
this comment has already served its purpose when rewriting
C* in C++. since we've re-implemented it, there is no need to keep it
around.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17120
2024-02-02 11:07:35 +02:00
Asias He
23a8b0552c storage_service: Drop unnecessary co_return in raft_topology_cmd_handler
It is introduced in "storage_service: Run stream_ranges cmd in streaming
group".

Refs: #17090
2024-02-02 08:20:06 +08:00
Asias He
732a9b5253 storage_service: Fix indentation for stream_ranges
Fixes the indentation introduced in "storage_service: Run
stream_ranges cmd in streaming group".

Refs: #17090
2024-02-02 08:20:03 +08:00
Pavel Emelyanov
66b859a29f gms: Remove unused operator<< for feature object
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17109
2024-02-01 19:00:46 +02:00
Kefu Chai
aad8035bed replica/database: use structured-bind when appropriate
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17104
2024-02-01 16:31:29 +02:00
Botond Dénes
dc8e13baed Merge 'Move some tablets tests from topology_custom to cql-pytest' from Pavel Emelyanov
The latter suite is now tablets-aware and tablets cases from the former one can happily work with single shared scylla instance

Closes scylladb/scylladb#17101

* github.com:scylladb/scylladb:
  test/topology_custom: Remove test_tablets.py
  test/topology: Move test_tablet_change_initial_tablets
  test/topology: Move test_tablet_explicit_disabling
  test/topology: Move test_tablet_default_initialization
  test/topology: Move test_tablet_change_replication_strategy
  test/topology: Move test_tablet_change_replication_vnode_to_tablets
  cql-pytest: Add skip_without_tablets fixture
2024-02-01 16:28:43 +02:00
Kamil Braun
c911bf1a33 test_raft_snapshot_request: fix flakiness (again)
At the end of the test, we wait until a restarted node receives a
snapshot from the leader, and then verify that the log has been
truncated.

To check the snapshot, the test used the `system.raft_snapshots` table,
while the log is stored in `system.raft`.

Unfortunately, the two tables are not updated atomically when Raft
persists a snapshot (scylladb/scylladb#9603). We first update
`system.raft_snapshots`, then `system.raft` (see
`raft_sys_table_storage::store_snapshot_descriptor`). So after the wait
finishes, there's no guarantee the log has been truncated yet -- there's
a race between the test's last check and Scylla doing that last delete.

But we can check the snapshot using `system.raft` instead of
`system.raft_snapshots`, as `system.raft` has the latest ID. And since
1640f83fdc, storing that ID and truncating
the log in `system.raft` happens atomically.

Closes scylladb/scylladb#17106
2024-02-01 16:06:12 +02:00
Kefu Chai
946d281d39 exceptions: s/#warn/#warning/
`#warning` is a preprocessor directive in C/C++, while `#warn` is not. the
reason we haven't run into the build failure caused by this is likely
that we are only building on amd64/aarch64 with libstdc++ at the time
of writing.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17074
2024-02-01 14:50:17 +02:00
Botond Dénes
1a0300dba6 Merge 'compaction_manager: flush tables before cleanup' from Kefu Chai
according to the document "nodetool cleanup"

> Triggers removal of data that the node no longer owns

currently, scylla performs cleanup by rewriting the sstables. but
commitlog segments may still contain the mutations to the tables
which are dropped during sstable rewriting. when scylla server
restarts, the dirty mutations are replayed to the memtable. if
any of these dirty mutations changes the tables cleaned up, the
stale data are reapplied. this would lead to data resurrection.

so, in this change we follow the same model of major compaction
where we

1. force new active segment,
2. flush tables being cleaned up
3. perform cleanup using compaction

Fixes #4734

Closes scylladb/scylladb#16757

* github.com:scylladb/scylladb:
  storage_service: fall back to local cleanup in cleanup_all
  compaction: format flush_mode without the helper
  compaction_manager: flush all tables before cleanup
  replica: table: pass do_flush to table::perform_cleanup_compaction()
  api, compaction: promote flush_mode
2024-02-01 13:47:45 +02:00
libo-sober
a341b870bc Remove unnecessary calculations in integrity_checked_file_impl::write_dma.
Use calculated `rbuf_end` in `std::mismatch` to reduce unnecessary calculations.

Closes scylladb/scylladb#16979
2024-02-01 13:42:59 +02:00
Botond Dénes
8debb6b98f Merge 'storage_service: Run stream_ranges cmd in streaming group' from Asias He
Otherwise it will inherit the rpc verb's scheduling group which is gossip. As a result, streaming runs in the wrong scheduling group.

Fixes #17090

Closes scylladb/scylladb#17097

* github.com:scylladb/scylladb:
  streaming: Verify stream consumer runs inside streaming group
  storage_service: Run stream_ranges cmd in streaming group
2024-02-01 13:18:26 +02:00
Patryk Wrobel
25324bbe50 cql_test_env.cc: remove dead code
This change removes an empty anonymous namespace
that is dead code.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17099
2024-02-01 13:17:48 +02:00
Pavel Emelyanov
64cb3a6496 test/topology_custom: Remove test_tablets.py
It's now empty, all test cases had been moved to cql-pytest

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 13:59:51 +03:00
Pavel Emelyanov
3fbe93e45d test/topology: Move test_tablet_change_initial_tablets
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 13:59:51 +03:00
Pavel Emelyanov
480227fcad test/topology: Move test_tablet_explicit_disabling
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 13:59:51 +03:00
Pavel Emelyanov
45b0490100 test/topology: Move test_tablet_default_initialization
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 13:59:51 +03:00
Pavel Emelyanov
3258c56ca3 test/topology: Move test_tablet_change_replication_strategy
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 13:59:51 +03:00
Pavel Emelyanov
6f50cc2783 test/topology: Move test_tablet_change_replication_vnode_to_tablets
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 13:59:51 +03:00
Botond Dénes
b9af2efcb1 Merge 'directories: prevent inode cache fragmentation by orderly verifying data directory contents' from Lakshmi Narayanan Sreethar
During startup, the contents of the data directory are verified to ensure that they have the right owner and permissions. Verifying all the contents, which includes files that will be read and closed immediately, and files that will be held open for longer durations, together, can lead to memory fragmentation in the dentry/inode cache.

Mitigate this by updating the verification in such a way that these two sets of files are verified separately, ensuring their separation in the dentry/inode cache.

Fixes https://github.com/scylladb/scylladb/issues/14506

Closes scylladb/scylladb#16952

* github.com:scylladb/scylladb:
  directories: prevent inode cache fragmentation by orderly verifying data directory contents
  directories: skip verifying data directory contents during startup
  directories: co-routinize create_and_verify
2024-02-01 12:30:07 +02:00
Kefu Chai
4ec104e086 api: storage_service: correct a typo
s/a any keyspace/a given keyspace/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17098
2024-02-01 10:55:58 +02:00
Botond Dénes
2a4b991772 Merge 'Fix mintimeuuid() call that could crash Scylla' from Nadav Har'El
This PR fixes a bug where certain calls to the `mintimeuuid()` CQL function with large negative timestamps could crash Scylla. It turns out we already had protections in place against very positive timestamps, but very negative timestamps could still cause bugs.

The actual fix in this series is just a few lines, but the bigger effort was improving the test coverage in this area. I added tests for the "date" type (the original reproducer for this bug used totimestamp() which takes a date parameter), and also reproducers for this bug directly, without totimestamp() function, and one with that function.

Finally this PR also replaces the assert() which made this molehill-of-a-bug into a mountain, by a throw.

Fixes #17035

Closes scylladb/scylladb#17073

* github.com:scylladb/scylladb:
  utils: replace assert() by on_internal_error()
  utils: add on_internal_error with common logger
  utils: add a timeuuid minimum, like we had maximum
  test/cql-pytest: tests for "date" type
2024-02-01 10:48:48 +02:00
Patryk Wrobel
6e5a85c387 replica/table: add tablet count metric
This change introduces a new metric called tablet_count
that is recalculated during construction of table object
and on each call to table::update_effective_replication_map().

To get the count of tablet per current shard, tablet map
is traversed and for each tablet_id tablet_map::get_shard()
is called. Its return value is compared with this_shard_id().
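
The counting step can be sketched as below. This is a simplified model, not the real code: the tablet map is represented as a plain array whose index is the tablet id and whose element is the owning shard, and `count_tablets_on_shard_sketch` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>

using shard_id_sketch = unsigned;

// Hypothetical model of the traversal: tablet_to_shard[tablet_id] plays the role
// of tablet_map::get_shard(), and this_shard plays the role of this_shard_id().
template <std::size_t N>
constexpr std::size_t count_tablets_on_shard_sketch(
        const std::array<shard_id_sketch, N>& tablet_to_shard,
        shard_id_sketch this_shard) {
    std::size_t count = 0;
    for (std::size_t tablet_id = 0; tablet_id < N; ++tablet_id) {
        if (tablet_to_shard[tablet_id] == this_shard) {
            ++count;
        }
    }
    return count;
}
```

Because the traversal is linear in the number of tablets, recomputing it only on construction and on replication map updates keeps the cost off the hot path.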

The new metric is maintained and exposed only for tables
that use tablets.

Refs: scylladb#16131
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17056
2024-02-01 10:46:53 +02:00
Asias He
2888c3086c utils: Add uuid_xor_to_uint32 helper
Convert the uuid to a uint32_t using xor.
It is useful to get a uint32_t number from the uuid.
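
One plausible way to fold a 128-bit UUID into 32 bits with xor is sketched below. The exact folding used by the real helper is not shown in this log, so the function name, the two-halves signature and the folding order are all assumptions.

```cpp
#include <cstdint>

// Hypothetical sketch: the UUID is passed as its two 64-bit halves.
constexpr std::uint32_t uuid_xor_to_uint32_sketch(std::uint64_t msb, std::uint64_t lsb) {
    // Fold 128 bits into 64 bits, then 64 bits into 32 bits.
    const std::uint64_t folded = msb ^ lsb;
    return static_cast<std::uint32_t>(folded >> 32) ^ static_cast<std::uint32_t>(folded);
}
```

Every input bit contributes to the result, which makes the value usable as a cheap 32-bit derivative of the UUID, though distinct UUIDs can of course collide.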

Refs: #16927

Closes scylladb/scylladb#17049
2024-02-01 10:27:55 +02:00
Botond Dénes
f5917b215f Merge 'replica, tablet_allocator: do not compare unsigned with signed' from Kefu Chai
this series addresses a couple of `-Wsign-compare` warnings surfaced in the tree.

Closes scylladb/scylladb#17091

* github.com:scylladb/scylladb:
  tablet_allocator: do not compare signed and unsigned
  replica: table: do not compare signed with unsigned
2024-02-01 10:26:04 +02:00
Kefu Chai
7a8e8c2ced db: add formatter for db::write_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `db::write_type`, and drop
its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17093
2024-02-01 10:22:45 +02:00
Kefu Chai
005d231f96 db: add formatter for gms::application_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `gms::application_state`,
but its operator<< is preserved, as it is still used by the generic
homebrew formatter for `std::unordered_map<>`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17096
2024-02-01 10:02:25 +02:00
Pavel Emelyanov
ab7ce3d1fa cql-pytest: Add skip_without_tablets fixture
It's opposite to skip_with_tablets one and thus also depends on
scylla_only one

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 10:58:13 +03:00
Lakshmi Narayanan Sreethar
dbe758d309 directories: prevent inode cache fragmentation by orderly verifying data directory contents
During startup, the contents of the data directory are verified to ensure
that they have the right owner and permissions. Verifying all the
contents, which includes files that will be read and closed immediately,
and files that will be held open for longer durations, together, can
lead to memory fragmentation in the dentry/inode cache.

Prevent this by updating the verification in such a way that these two
sets of files will be verified separately ensuring their separation in
the dentry/inode cache.

Fixes #14506

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-01 12:20:23 +05:30
Lakshmi Narayanan Sreethar
74a4085426 directories: skip verifying data directory contents during startup
This is in preparation for a subsequent patch that will verify the
contents of the data directory in a specific order.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-01 11:54:59 +05:30
Lakshmi Narayanan Sreethar
2e3d2498f4 directories: co-routinize create_and_verify
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-01 11:41:10 +05:30
Kefu Chai
5e0b3671d3 storage_service: fall back to local cleanup in cleanup_all
before this change, if no keyspaces are specified,
scylla-nodetool just enumerates all non-local keyspaces, and
calls "/storage_service/keyspace_cleanup" on them one after another.
this is not quite efficient, as each such RESTful API call
forces a new active commitlog segment, and flushes all tables.
so, if the target node of this command has N non-local keyspaces,
it would repeat the steps above for N times. this is not necessary.
and after a topology change, we would like to run a global
"nodetool cleanup" without specifying the keyspace, so this
is a typical use case which we do care about.

to address this performance issue, in this change, we improve
an existing RESTful API call "/storage_service/cleanup_all", so
if the topology coordinator is not enabled, we fall back to
a local cleanup to cleanup all non-local keyspaces.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
4f90a875f6 compaction: format flush_mode without the helper
since flush_mode is moved out of major_compaction_task_impl, let's
drop the helper hosted in that class as well, and implement the
formatter without it.

please note, the `__builtin_unreachable()` is dropped. it should
not change the behavior of the formatter. we don't put it in the
`default` branch in hope that `-Wswitch` can warn us in the case
when another enum of `flush_mode` is added, but we fail to handle
it somehow.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
b39cc01bb3 compaction_manager: flush all tables before cleanup
according to the document "nodetool cleanup"

> Triggers removal of data that the node no longer owns

currently, scylla performs cleanup by rewriting the sstables. but
commitlog segments may still contain the mutations to the tables
which are dropped during sstable rewriting. when scylla server
restarts, the dirty mutations are replayed to the memtable. if
any of these dirty mutations changes the tables cleaned up, the
stale data are reapplied. this would lead to data resurrection.

so, in this change we follow the same model of major compaction:

1. force new active segment,
2. flush all tables
3. perform cleanup using compaction, which rewrites the sstables
   of specified tables

because we already `flush()` all tables in
`cleanup_keyspace_compaction_task_impl::run()`, there is no need to
call `flush()` again, in `table::perform_cleanup_compaction()`, so
the `flush()` call is dropped in this function, and the tests using
this function are updated to call `flush()` manually to preserve
the existing behavior.

there are two callers of `cleanup_keyspace_compaction_task_impl`,

* one is `storage_service::sstable_cleanup_fiber()`, which listens
  for the events fired by topology_state_machine, which is in turn
  driven by, for instance, the "/storage_service/cleanup_all" API,
  which cleans up all keyspaces one after another.
* another is "/storage_service/keyspace_cleanup", which cleans up
  the specified keyspace.

in the first use case, we can force a new active segment for a single
time, so another parameter to the ctor of
`cleanup_keyspace_compaction_task_impl` is introduced to specify if
the `db.flush_all_tables()` call should be skipped.

please note, there are two possible optimizations,

1. force new active segment only if the mutations in it touch the
   tables being cleaned up
2. after forcing new active segment, only flush the (mem)tables
   mutated by the non-active segments

but let's leave them for following-up changes. this change is a
minimal fix for data resurrection issue.

Fixes #16757
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
34d80690fa replica: table: pass do_flush to table::perform_cleanup_compaction()
this parameter defaults to do_flush::yes, so the existing behavior is
preserved. and this change prepares for a change which flushes all
tables before performing cleanup on the tables per-demand.

please note, we cannot pass compaction::flush_mode to this function,
as it is used by compaction/task_manager_module.hh, if we want to
share it by both database.hh and compaction/task_manager_module.hh,
we would have to find it a new home. so `table::do_flush` boolean
tag is reused instead.

Refs #16757

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
9afec2e3e7 api, compaction: promote flush_mode
so that this enum type can be shared by other task(s) as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
110d2e52be tablet_allocator: do not compare signed and unsigned
`available_shards` could be negative when `resize_plan` is empty, and
the loop to build `resize_plan` stops at the next iteration after
`available_shards` is assigned with a negative number. so, instead of
making it an `unsigned`, let's just compare it using `std::cmp_less()`.
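
To show why a sign-aware comparison matters here, the sketch below hand-writes the behaviour that `std::cmp_less()` provides for a signed left operand and an unsigned right operand. Both function names are hypothetical and exist only for this example; the real change calls `std::cmp_less()` directly.

```cpp
#include <cstddef>

// What a plain `<` effectively does once the signed operand is converted to
// unsigned: a negative value turns into a huge positive one.
constexpr bool naive_less_sketch(long lhs, std::size_t rhs) {
    return static_cast<std::size_t>(lhs) < rhs;
}

// Hand-written equivalent of the sign-aware comparison: a negative value is
// smaller than any unsigned value; otherwise compare as unsigned.
constexpr bool cmp_less_sketch(long lhs, std::size_t rhs) {
    return lhs < 0 || static_cast<std::size_t>(lhs) < rhs;
}

static_assert(!naive_less_sketch(-1, 4), "the converted -1 is not less than 4");
static_assert(cmp_less_sketch(-1, 4), "the sign-aware comparison gets it right");
```

This keeps `available_shards` signed, so a negative value still compares as smaller than any shard count.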

this change should silence following warning:

```
/home/kefu/.local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -g -O0 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wignored-qualifiers -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT service/CMakeFiles/service.dir/Debug/tablet_allocator.cc.o -MF service/CMakeFiles/service.dir/Debug/tablet_allocator.cc.o.d -o service/CMakeFiles/service.dir/Debug/tablet_allocator.cc.o -c /home/kefu/dev/scylladb/service/tablet_allocator.cc
/home/kefu/dev/scylladb/service/tablet_allocator.cc:529:60: error: comparison of integers of different signs: 'long' and 'const size_t' (aka 'const unsigned long') [-Werror,-Wsign-compare]
  529 |             if (resize_plan.size() > 0 && available_shards < size_desc.shard_count) {
      |                                           ~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:01:19 +08:00
Kefu Chai
493a608417 replica: table: do not compare signed with unsigned
this change helps to silence the following warning:
```
/home/kefu/dev/scylladb/replica/table.cc:1952:26: error: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
 1952 |     for (auto id = 0; id < _storage_groups.size(); id++) {
      |                       ~~ ^ ~~~~~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:01:19 +08:00
Asias He
e1fc91bea9 streaming: Verify stream consumer runs inside streaming group
This will catch accidental scheduling group leaks.

Refs: #17090
2024-02-01 10:37:24 +08:00
Asias He
f103f75ed8 storage_service: Run stream_ranges cmd in streaming group
Otherwise it will inherit the rpc verb's scheduling group which is
gossip. As a result, streaming runs in the wrong scheduling
group.

Fixes #17090
2024-02-01 10:20:02 +08:00
Kamil Braun
b2c02d8268 Merge 'schema: column_mapping::{static,regular}_column_at(): use on_internal_error()' from Botond Dénes
Instead of std::out_of_range(). Accessing a non-existing column is a
serious bug and the backtrace coming with `on_internal_error()` can be
very valuable when debugging it. As can be the coredump that is possible
to trigger with `--abort-on-internal-error`.

This change follows another similar change to `schema::column_at()`.

This should help us get to the bottom of the mysterious repair failures
caused by invalid column access, seen in
https://github.com/scylladb/scylladb/issues/16821.

Refs: https://github.com/scylladb/scylladb/issues/16821

Closes scylladb/scylladb#17080

* github.com:scylladb/scylladb:
  schema: column_mapping::{static,regular}_column_at(): use on_internal_error()
  schema: column_mapping: move column accessors out-of-line
2024-01-31 16:29:15 +01:00
Nadav Har'El
458fd0c2f7 utils: replace assert() by on_internal_error()
In issue #17035 we had a situation where a certain input timestamp
could result in the create_time() utility function getting called on
a timestamp that cannot be represented as timeuuid, and this resulted
in an *assertion failure*, and a crash.

I guess we used an assertion because we believed that callers try to
avoid calling this function on excessively large timestamps, but
evidently, they didn't try hard enough and we got a crash.
The code in UUID_gen.hh changed a lot over the years and has become
very convoluted and it is almost impossible to understand all the
code paths that could lead to this assertion failure. So it's better
to replace this assertion by an on_internal_error, which by default
is just an exception - and also logs the backtrace of the failure.
Issue #17035 would have been much less serious if we had an exception
instead of an assert.
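
The difference in failure mode can be sketched as follows. This is not the real UUID_gen.hh code: the bounds are hypothetical placeholders, the function names are made up, and a plain `std::runtime_error` stands in for the exception path of on_internal_error().

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical bounds, chosen only for the example; the real limits are not
// shown in this log.
constexpr std::int64_t min_supported_ms_sketch = -1000000;
constexpr std::int64_t max_supported_ms_sketch = 1000000;

constexpr bool is_representable_sketch(std::int64_t ms) {
    return ms >= min_supported_ms_sketch && ms <= max_supported_ms_sketch;
}

constexpr std::int64_t checked_timestamp_sketch(std::int64_t ms) {
    if (!is_representable_sketch(ms)) {
        // An assert() here would abort the whole process; throwing only fails
        // the request that supplied the bad timestamp.
        throw std::runtime_error("timestamp cannot be represented as timeuuid");
    }
    return ms;
}
```

A caller that passes an out-of-range value gets an exception it can report back to the client, while in-range values pass through unchanged.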

Refs #17035
Refs #7871, Refs #13970 (removes an assert)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-31 16:45:28 +02:00
Nadav Har'El
259811b6ec utils: add on_internal_error with common logger
Seastar's on_internal_error() is a useful replacement for assert()
but it requires each caller to supply a logger, which is often
inconvenient, especially when the caller is a header file.

So in this patch we introduce a utils::on_internal_error() function
which is the same as seastar::on_internal_error() (the former calls
the latter), except it uses a single logger instead of asking the caller
to pass a logger.

Refs #7871

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-31 16:45:09 +02:00
Patryk Wrobel
c6de20a608 replica/mutation_dump.cc: move stringstream content instead of copying it
C++20 introduced a new overload of std::stringstream::str()
that is selected when the mentioned member function is called
on r-value.

The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.

This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.

Moreover, it introduces usage of std::stringstream::view() when
checking if the stream contains some characters. It skips another
copy of the underlying string, because std::string_view is returned.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17084
2024-01-31 14:58:20 +02:00
Pavel Emelyanov
7c5c89ba8d Revert "Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel"
This reverts commit 370fbd346c, reversing
changes made to 0912d2a2c6.

This makes scylla-manager mis-interpret the data_file_directories
somehow, issue #17078
2024-01-31 15:08:14 +03:00
Avi Kivity
c8397f0287 Merge 'Implement tablet splitting' from Raphael "Raph" Carvalho
The motivation for tablet resizing is that we want to keep the average tablet size reasonable, such that load rebalancing can remain efficient. A tablet that is too large makes migration inefficient, therefore slowing down the balancer.

If the avg size grows beyond the upper bound (split threshold), then balancer decides to split. Split spans all tablets of a table, due to power-of-two constraint.

Likewise, if the avg size decreases below the lower bound (merge threshold), then merge takes place in order to grow the avg size. Merge is not implemented yet, although this series lays foundation for it to be implemented later on.

A resize decision can be revoked if the avg size changes and the decision is no longer needed. For example, let's say table is being split and avg size drops below the target size (which is 50% of split threshold and 100% of merge one). That means after split, the avg size would drop below the merge threshold, causing a merge after split, which is wasteful, so it's better to just cancel the split.
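
The decision rules described above can be sketched as follows. This is a simplified reading, not the balancer's real code: the names are hypothetical, the thresholds are passed in rather than derived from the target size, and the revoke rule assumes a split halves the average tablet size.

```cpp
#include <cstdint>

enum class resize_type_sketch { none, split, merge };

// Hypothetical container for the two bounds named in the text.
struct resize_thresholds_sketch {
    std::uint64_t merge_threshold;
    std::uint64_t split_threshold;
};

// Split above the upper bound, merge below the lower bound, otherwise do nothing.
constexpr resize_type_sketch decide_resize_sketch(std::uint64_t avg_size,
                                                  resize_thresholds_sketch t) {
    if (avg_size > t.split_threshold) {
        return resize_type_sketch::split;
    }
    if (avg_size < t.merge_threshold) {
        return resize_type_sketch::merge;
    }
    return resize_type_sketch::none;
}

// A pending split is wasteful if the halved average would fall below the merge
// threshold, because a merge would follow right after the split.
constexpr bool should_revoke_split_sketch(std::uint64_t avg_size,
                                          resize_thresholds_sketch t) {
    return avg_size / 2 < t.merge_threshold;
}
```

For example, with a merge threshold of 50 and a split threshold of 200 (arbitrary units), an average of 250 triggers a split, and the split is worth revoking if the average later drops to 80.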

Tablet metadata gains 2 new fields for managing this:
resize_type: resize decision type, can be either of "merge", "split", or "none".
resize_seq_number: a sequence number that works as the global identifier of the decision (monotonically increasing, increased by 1 on every new decision emitted by the coordinator).

A new RPC was implemented to pull stats from each table replica, such that load balancer can calculate the avg tablet size and know the "split status", for a given table. Avg size is aggregated carefully while taking RF of each DC into account (which might differ).
When a table is done splitting its storage, it loads (mirrors) the resize_seq_number from tablet metadata into its local state (in other words, my split status is ready). If a table is split ready, coordinator will see that table's seq number is the same as the one in tablet metadata. This helps to distinguish stale decisions from the latest one (in case decisions are revoked and re-emitted later on). Also, it's aggregated carefully, by taking the minimum among all replicas, so coordinator will only update topology when all replicas are ready.
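
The readiness check described above can be sketched as below. The function name and the array-based interface are hypothetical; the sketch only models the rule that the minimum sequence number reported by the replicas must equal the one in tablet metadata.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: replica_seq holds the resize_seq_number each replica has
// mirrored into its local state; decision_seq is the one in tablet metadata.
template <std::size_t N>
constexpr bool all_replicas_split_ready_sketch(const std::array<std::int64_t, N>& replica_seq,
                                               std::int64_t decision_seq) {
    static_assert(N > 0, "at least one replica is required");
    // Take the minimum, so a single lagging replica holds back the whole table.
    std::int64_t min_seq = replica_seq[0];
    for (std::size_t i = 1; i < N; ++i) {
        if (replica_seq[i] < min_seq) {
            min_seq = replica_seq[i];
        }
    }
    return min_seq == decision_seq;
}
```

A replica still reporting an older sequence number, for instance one from a revoked decision, keeps the minimum below the current decision and so blocks finalization.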

When the load balancer emits a split decision, replicas pick it up through a "split monitor" that is awakened once a table has its replication metadata updated and detects the need for split (i.e. resize_type field is "split").
The split monitor will start splitting of compaction groups (using mechanism introduced here: 081f30d149) for the table. And once splitting work is completed, the table updates its local state as having completed split.

When coordinator pulls the split status of all replicas for a table via RPC, the balancer can see whether that table is ready for "finalizing" the decision, which is about updating tablet metadata to split each tablet into two. Once table replicas have their replication metadata updated with the new tablet count, they can update appropriately their set of compaction groups (that were previously split in the preparation step).

Fixes #16536.

Closes scylladb/scylladb#16580

* github.com:scylladb/scylladb:
  test/topology_experimental_raft: Add tablet split test
  replica: Bypass reshape on boot with tablets temporarily
  replica: Fix table::compaction_group_for_sstable() for tablet streaming
  test/topology_experimental_raft: Disable load balancer in test fencing
  replica: Remap compaction groups when tablet split is finalized
  service: Split tablet map when split request is finalized
  replica: Update table split status if completed split compaction work
  storage_service: Implement split monitor
  topology_cordinator: Generate updates for resize decisions made by balancer
  load_balancer: Introduce metrics for resize decisions
  db: Make target tablet size a live-updateable config option
  load_balancer: Implement resize decisions
  service: Wire table_resize_plan into migration_plan
  service: Introduce table_resize_plan
  tablet_mutation_builder: Add set_resize_decision()
  topology_coordinator: Wire load stats into load balancer
  storage_service: Allow tablet split and migration to happen concurrently
  topology_coordinator: Periodically retrieve table_load_stats
  locator: Introduce topology::get_datacenter_nodes()
  storage_service: Implement table_load_stats RPC
  replica: Expose table_load_stats in table
  replica: Introduce storage_group::live_disk_space_used()
  locator: Introduce table_load_stats
  tablets: Add resize decision metadata to tablet metadata
  locator: Introduce resize_decision
2024-01-31 13:59:56 +02:00
Kefu Chai
bd71e0b794 tracing: add formatter for tracing::span_id
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `tracing::span_id`, and drop
its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17058
2024-01-31 13:43:46 +02:00
Kefu Chai
f5e3a2d98e test.py: add boost_tests() to suite
this change is a cleanup.

so it only returns tests, to be more symmetric with `junit_tests()`.
this allows us to drop the dummy `get_test_case()` in `PythonTestSuite`.
as only the BoostTest will be asked for `get_test_case()` after this
change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16961
2024-01-31 13:43:21 +02:00
Botond Dénes
181f68f248 Merge 'raft_group0: trigger snapshot if existing snapshot index is 0' from Kamil Braun
The persisted snapshot index may be 0 if the snapshot was created in
older version of Scylla, which means snapshot transfer won't be
triggered to a bootstrapping node. Commands present in the log may not
cover all schema changes --- group 0 might have been created through the
upgrade procedure, on a cluster with existing schema. So a
deployment with index=0 snapshot is broken and we need to fix it. We can
use the new `raft::server::trigger_snapshot` API for that.

Also add a test.

Fixes scylladb/scylladb#16683

Closes scylladb/scylladb#17072

* github.com:scylladb/scylladb:
  test: add test for fixing a broken group 0 snapshot
  raft_group0: trigger snapshot if existing snapshot index is 0
2024-01-31 13:04:59 +02:00
Kefu Chai
843d74428d configure.py: s/-DBOOST_TEST_DYN_LINK/-DBOOST_ALL_DYN_LINK/
we add `-DBOOST_TEST_DYN_LINK` to the cflags when `--static-boost` is
not passed to `configure.py`. but we never pass this option to
`configure.py` in our CI/CD. also, we don't install `boost-static` in
`install-dependencies.sh`, so the linker always uses the boost shared
libraries when building scylla and other executables in this project.
this fact has been verified with the latest master HEAD, after building
scylla from `build.ninja` which was in turn created using `configure.py`.

Seastar::seastar_testing exposes `Boost::dynamic_linking` in its public
interface, and `Boost::dynamic_linking` exposes `-DBOOST_ALL_DYN_LINK`
as one of its cflags.

so, when building tests using CMake, the tests are compiled with
`-DBOOST_ALL_DYN_LINK`, while when building tests using `configure.py`,
they are compiled with `-DBOOST_TEST_DYN_LINK`. the former is exposed
by `Boost::dynamic_linking`, the latter is hardwired using
`configure.py`. but the net results are identical. it would be better
to use identical cflags on these two building systems. so, let's use
`-DBOOST_ALL_DYN_LINK` in `configure.py` also. furthermore, this is what
non-static-boost implies.

please note, we don't consume the cflags exposed by
`seastar-testing.pc`, so they don't override the ones we set using
`configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17070
2024-01-31 12:21:31 +02:00
Botond Dénes
ecf654ea11 schema: column_mapping::{static,regular}_column_at(): use on_internal_error()
Instead of std::out_of_range(). Accessing a non-existing column is a
serious bug and the backtrace coming with on_internal_error() can be
very valuable when debugging it. As can be the coredump that is possible
to trigger with --abort-on-internal-error.

This change follows another similar change to schema::column_at().
2024-01-31 05:12:33 -05:00
Botond Dénes
03ed9f77ff schema: column_mapping: move column accessors out-of-line
To facilitate further patching.
2024-01-31 05:06:34 -05:00
Lakshmi Narayanan Sreethar
b5e1097858 build: cmake: include raft.cc in api library
When building with cmake, include the raft source files introduced by
commit 617e0913 as sources for api library target.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#17075
2024-01-31 11:39:41 +02:00
Nadav Har'El
827c20467c utils: add a timeuuid minimum, like we had maximum
Our time-handling code in UUID_gen.hh is very fragile for very large
timestamps, because the different types - such as Cassandra "timestamp"
and Timeuuid use very different resolution and ranges.

In issue #17035 we discovered a situation where a certain CQL
"timestamp"-type value could cause an assertion-failure and a crash
in the create_time() function that creates a timeuuid - because that
timestamp didn't fit the place we have in timeuuid.

We already added in the past a limit, UUID_UNIXTIME_MAX, beyond which
we refuse timestamps, to avoid these assertion failures. However, we
missed the possibility of *negative* timestamps (which are allowed in
CQL), and indeed a negative timestamp (or a timestamp which was "wrapped"
to a negative value) is what caused issue #17035.

So this patch adds a second limit, UUID_UNIXTIME_MIN - limiting the
most negative timestamp that we support to well below the area which
causes problems, and adds tests that reproduce #17035 and that we
didn't break anything else (e.g., negative timestamps are still
allowed - just not extremely negative timestamps).
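
A two-sided range check of this kind can be sketched in Python. The sketch assumes the standard version 1 UUID layout (a 60-bit count of 100 ns intervals since 1582-10-15). The bounds below are derived from that layout purely for illustration; they are not the actual values of UUID_UNIXTIME_MIN and UUID_UNIXTIME_MAX, which this message does not state and which may be more conservative. The function names are hypothetical.

```python
# Number of 100 ns intervals between the UUID epoch (1582-10-15) and the
# Unix epoch (1970-01-01), as used by version 1 UUIDs.
UUID_EPOCH_OFFSET_100NS = 0x01B21DD213814000

# A version 1 UUID stores its timestamp in 60 bits.
UUID_TIMESTAMP_BITS = 60

# Illustrative bounds in Unix milliseconds; NOT ScyllaDB's actual constants.
SKETCH_UNIXTIME_MIN_MS = -(UUID_EPOCH_OFFSET_100NS // 10_000)
SKETCH_UNIXTIME_MAX_MS = (
    (1 << UUID_TIMESTAMP_BITS) - 1 - UUID_EPOCH_OFFSET_100NS
) // 10_000


def fits_in_timeuuid(unix_ms: int) -> bool:
    """Return True if a Unix timestamp in milliseconds can be encoded."""
    return SKETCH_UNIXTIME_MIN_MS <= unix_ms <= SKETCH_UNIXTIME_MAX_MS


def to_timeuuid_timestamp(unix_ms: int) -> int:
    """Convert Unix milliseconds to a 60-bit count of 100 ns intervals.

    Rejects out-of-range input with an error instead of letting the
    conversion overflow or go negative.
    """
    if not fits_in_timeuuid(unix_ms):
        raise ValueError(f"timestamp {unix_ms} ms is outside the timeuuid range")
    return unix_ms * 10_000 + UUID_EPOCH_OFFSET_100NS
```

Moderately negative timestamps (dates before 1970) still pass the check; only values that cannot be represented in the 60-bit field are refused.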

Fixes #17035.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-31 11:32:26 +02:00
Kamil Braun
bb22e06a9e Merge 'abort failed rebuild instead of retrying it forever' from Gleb
Add error handling to rebuild instead of retrying it until it succeeds.

* 'gleb/rebuild-fail-v2' of github.com:scylladb/scylla-dev:
  test: add test for rebuild failure
  test: add expected_error to rebuild_node operation
  topology_coordinator: Propagate rebuild failure to the initiator
2024-01-31 10:07:28 +01:00
Nadav Har'El
47955642d9 test/cql-pytest: tests for "date" type
This patch adds a few simple tests for the values of the "date" column
type, how it can be initialized from strings or integers, and what
those values mean.

Two of the tests reproduce issue #17066, where validation is missing
for values that don't fit in a 32-bit unsigned integer.
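
For orientation, the kind of validation whose absence #17066 reports can be sketched in Python. The sketch assumes the usual CQL encoding of "date": an unsigned 32-bit number of days with 1970-01-01 at the center of the range (2^31). The helper names are hypothetical and this is not the server's code.

```python
import datetime

DATE_EPOCH_CENTER = 1 << 31   # 1970-01-01 sits in the middle of the range
DATE_MAX = (1 << 32) - 1      # largest 32-bit unsigned value


def validate_date_int(value: int) -> int:
    """Reject integers that do not fit in a 32-bit unsigned integer."""
    if not 0 <= value <= DATE_MAX:
        raise ValueError(f"date value {value} does not fit in 32 unsigned bits")
    return value


def days_since_epoch(value: int) -> int:
    """Interpret an encoded date as a signed day offset from 1970-01-01."""
    return validate_date_int(value) - DATE_EPOCH_CENTER


def date_from_string(text: str) -> int:
    """Encode an ISO 'YYYY-MM-DD' string as a CQL date integer."""
    day = datetime.date.fromisoformat(text)
    return (day - datetime.date(1970, 1, 1)).days + DATE_EPOCH_CENTER
```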

Refs #17066

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-31 10:58:02 +02:00
Patryk Wrobel
1b6ab65c51 reader_concurrency_semaphore.cc: move stringstream content instead of copying it
C++20 introduced a new overload of std::stringstream::str()
that is selected when the mentioned member function is called
on r-value.

The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.

This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17064
2024-01-31 09:31:50 +02:00
Botond Dénes
f8d3070559 Merge 'Fix flakiness in test_raft_snapshot_request' from Kamil Braun
Add workaround for scylladb/python-driver#295.

Also, an assert made at the end of the test was false; it is fixed with
an appropriate comment added.

Closes scylladb/scylladb#17071

* github.com:scylladb/scylladb:
  test_raft_snapshot_request: fix flakiness
  test: topology/util: update comment for `reconnect_driver`
2024-01-31 09:30:27 +02:00
Pavel Emelyanov
84ddc37130 utils: Coroutinize disk_sanity()
It's pretty hairy in its future-promises form, with coroutines it's
much easier to read

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17052
2024-01-31 09:20:21 +02:00
Kefu Chai
8a9f13c187 redis: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17057
2024-01-31 09:17:18 +02:00
Kefu Chai
b931d93668 treewide: fix misspellings in code comments
these misspellings are identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17004
2024-01-31 09:16:10 +02:00
Kamil Braun
57d5aa5a68 test: add test for fixing a broken group 0 snapshot
In a cluster with group 0 with snapshot at index 0 (such group 0 might
be established in a 5.2 cluster, then preserved once it upgrades to 5.4
or later), no snapshot transfer will be triggered when a node is
bootstrapped. This way the new node might not obtain the full schema, or
obtain incorrect schema, like in scylladb/scylladb#16683.

Simulate this scenario in a test case using the RECOVERY mode and error
injections. Check that the newly added logic for creating a new snapshot
if such situation is detected helps in this case.
2024-01-30 16:44:01 +01:00
Kamil Braun
98d75c65af raft_group0: trigger snapshot if existing snapshot index is 0
The persisted snapshot index may be 0 if the snapshot was created in
older version of Scylla, which means snapshot transfer won't be
triggered to a bootstrapping node. Commands present in the log may not
cover all schema changes --- group 0 might have been created through the
upgrade procedure, on a cluster with existing schema. So a
deployment with index=0 snapshot is broken and we need to fix it. We can
use the new `raft::server::trigger_snapshot` API for that.
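
The startup check can be pictured with a small Python model. `FakeRaftServer` is a stand-in invented for this sketch; it only models the two indices involved and is not the `raft::server` interface. The function name `fix_broken_snapshot` is likewise hypothetical.

```python
class FakeRaftServer:
    """Stand-in for a Raft server; models only what the sketch needs."""

    def __init__(self, snapshot_index: int, last_applied_index: int):
        self.snapshot_index = snapshot_index
        self.last_applied_index = last_applied_index

    def trigger_snapshot(self) -> None:
        # Model: the new snapshot covers everything applied so far.
        self.snapshot_index = self.last_applied_index


def fix_broken_snapshot(server: FakeRaftServer) -> bool:
    """Trigger a snapshot if the persisted one is at index 0.

    Returns True if a snapshot was triggered, False if nothing was needed.
    """
    if server.snapshot_index == 0:
        server.trigger_snapshot()
        return True
    return False
```

After the call, a bootstrapping node would receive state through a snapshot transfer rather than relying on a log that may not cover all schema changes.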

Fixes scylladb/scylladb#16683
2024-01-30 16:35:54 +01:00
Kamil Braun
74bf60a8ca test_raft_snapshot_request: fix flakiness
Add workaround for scylladb/python-driver#295.

Also, an assert made at the end of the test was false; it is fixed with
an appropriate comment added.
2024-01-30 16:21:24 +01:00
Kamil Braun
39339b9f70 test: topology/util: update comment for reconnect_driver
The issues mentioned in the comment before are already fixed.
Unfortunately, there is another, opposite issue which this function can
be used for. The previous issue was about the existing driver session
not reconnecting. The current issue is about the existing driver session
reconnecting too much... (and in the middle of queries.)
2024-01-30 15:36:48 +01:00
Piotr Smaroń
35ba037724 config: fix a typo in --role-manager's description
Closes scylladb/scylladb#17063
2024-01-30 16:13:33 +02:00
Kamil Braun
cf3f26dc94 test_maintenance_mode: fix flakiness
Wait until CQL is available and nodes see each other before trying to
perform a query.

Closes scylladb/scylladb#17059
2024-01-30 14:11:14 +02:00
Gleb Natapov
8b50613465 test: add test for rebuild failure 2024-01-30 11:04:19 +02:00
Gleb Natapov
d62204e758 test: add expected_error to rebuild_node operation 2024-01-30 11:04:19 +02:00
Gleb Natapov
51c40034f5 topology_coordinator: Propagate rebuild failure to the initiator
Do not retry rebuild endlessly, but report the error instead.
2024-01-30 11:04:19 +02:00
Kefu Chai
90c0e83f9a thrift: remove unused namespace definition
thrift_transport is never used, so drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17050
2024-01-30 09:16:47 +02:00
Michał Chojnowski
904bb25987 test: test_tablet_cleanup: wait for servers to see each other before multi-node queries
Waiting for CQL connections is not enough. For the queries to succeed,
nodes must see each other. We have to wait for this, otherwise the test
will be flaky.
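
The waiting pattern can be sketched as a polling loop. This is a synchronous Python illustration with hypothetical helpers (`wait_until`, `all_nodes_see_each_other`) and a caller-supplied `live_peers` callback; the real test helpers are not shown in this message and may look different.

```python
import time
from typing import Callable, Set


def wait_until(predicate: Callable[[], bool], timeout: float, period: float = 0.05) -> None:
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before the deadline")
        time.sleep(period)


def all_nodes_see_each_other(live_peers: Callable[[str], Set[str]], nodes: Set[str]) -> bool:
    """True when every node reports all other nodes as alive.

    `live_peers(node)` is assumed to return the set of peers that `node`
    currently considers alive.
    """
    return all(nodes - {node} <= live_peers(node) for node in nodes)
```

A test would call `wait_until(lambda: all_nodes_see_each_other(live_peers, nodes), timeout=...)` after the CQL connections are up and before issuing the first multi-node query.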

Fixes #17029

Closes scylladb/scylladb#17040
2024-01-30 08:56:01 +02:00
Tomasz Grabiec
36f218c83b Merge 'main: refuse startup when tablet resharding is required' from Botond Dénes
We do not support tablet resharding yet. All tablet-related code assumes that the (host_id, shard) tablet replica is always valid. Violating this leads to undefined behaviour: errors in the tablet load balancer and potential crashes.
Avoid this by refusing to start if the need for resharding is detected. Be as lenient as possible: check all tablets with a replica on this node, and only refuse startup if at least one tablet has an invalid replica shard.

Startup will fail as:

    ERROR 2024-01-26 07:03:06,931 [shard 0:main] init - Startup failed: std::runtime_error (Detected a tablet with invalid replica shard, reducing shard count with tablet-enabled tables is not yet supported. Replace the node instead.)

Refs: #16739
Fixes: #16843

Closes scylladb/scylladb#17008

* github.com:scylladb/scylladb:
  test/topolgy_experimental_raft: test_tablets.py: add test for resharding
  test/pylib: manager[_client]: add update_cmdline()
  main: refuse startup when tablet resharding is required
  locator: tablets: add check_tablet_replica_shards()
2024-01-29 23:39:41 +01:00
Pavel Emelyanov
370fbd346c Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel
`db::config` is a class that is used in many places across the code base. When it is changed, its clients' code needs to be recompiled. It represents the configuration of the database. Some fields of the configuration that describe the location of directories may be empty. In such cases the `db::config::setup_directories()` function is called, and it modifies the provided configuration. Such modification is not good; it is better to keep `db::config` intact.

This PR:
 - extends the public interface of utils::directories class to provide required directory paths to the users
 - removes 'db::config::setup_directories()' to avoid altering the fields of configuration object
 - replaces usages of db::config object with utils::directories object in places that require obtaining paths to dirs

Fixes: scylladb#5626

Closes scylladb/scylladb#16787

* github.com:scylladb/scylladb:
  utils/directories: make utils::directories::set an internal type
  db::config: keep dir paths unchanged
  cql_transport/controler: use utils::directories to get paths of dirs
  service/storage_proxy: use utils::directories to get paths of dirs
  api/storage_service.cc: use utils::directories to get paths of dirs
  tools/scylla-sstable.cc: use utils::directories to get paths
  db/commitlog: do not use db::config to get dirs
  Use utils::directories to get dirs paths in replica::database
  Allow utils::directories to provide paths to dirs
  Clean-up of utils::directories
2024-01-29 18:01:15 +03:00
Kamil Braun
0912d2a2c6 Merge 'raft topology: make left_token_ring a transition state' from Patryk Jędrzejczak
When a node is in the `left_token_ring` state, we don't know how
it has ended up in this state. We cannot distinguish a node that
has finished decommissioning from a node that has failed bootstrap.

The main problem it causes is that we incorrectly send the
`barrier_and_drain` command to a node that has failed
bootstrapping or replacing. We must do it for a node that has
finished decommissioning because it could still coordinate
requests. However, since we cannot distinguish nodes in the
`left_token_ring` state, we must send the command to all of them.
This issue appeared in scylladb/scylladb#16797 and this PR is
a follow-up that fixes it.

The solution is changing `left_token_ring` from a node state
to a transition state.

Fixes scylladb/scylladb#16944

Closes scylladb/scylladb#17009

* github.com:scylladb/scylladb:
  docs: dev: topology-over-raft: document the left_token_ring state
  topology_coordinator: adjust reason string in left_token_ring handler
  raft topology: make left_token_ring a transition state
  topology_coordinator: rollback_current_topology_op: remove unused exclude_nodes
2024-01-29 15:29:01 +01:00
Kefu Chai
819fc95a67 reader: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17036
2024-01-29 16:21:42 +02:00
Kefu Chai
43094d2023 db: add formatter for db::read_repair_decision
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `db::read_repair_decision`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17033
2024-01-29 15:43:51 +02:00
Botond Dénes
d202d32f81 Merge 'Add an API to trigger snapshot in Raft servers' from Kamil Braun
This allows the user of `raft::server` to cause it to create a snapshot
and truncate the Raft log (leaving no trailing entries; in the future we
may extend the API to specify number of trailing entries left if
needed). In a later commit we'll add a REST endpoint to Scylla to
trigger group 0 snapshots.

One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.

In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44)
but a problem remains with these existing deployments coming from 5.2,
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).

Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).

The PR adds the API to `raft::server` and an HTTP endpoint that uses it.

In a follow-up PR, we plan to modify group 0 server startup logic to automatically
call this API if it sees that no snapshot is present yet (to automatically
fix the aforementioned 5.2 deployments once they upgrade.)

Closes scylladb/scylladb#16816

* github.com:scylladb/scylladb:
  raft: remove `empty()` from `fsm_output`
  test: add test for manual triggering of Raft snapshots
  api: add HTTP endpoint to trigger Raft snapshots
  raft: server: add `trigger_snapshot` API
  raft: server: track last persisted snapshot descriptor index
  raft: server: framework for handling server requests
  raft: server: inline `poll_fsm_output`
  raft: server: fix indentation
  raft: server: move `io_fiber`'s processing of `batch` to a separate function
  raft: move `poll_output()` from `fsm` to `server`
  raft: move `_sm_events` from `fsm` to `server`
  raft: fsm: remove constructor used only in tests
  raft: fsm: move trace message from `poll_output` to `has_output`
  raft: fsm: extract `has_output()`
  raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor`
  raft: server: pass `*_aborted` to `set_exception` call
2024-01-29 15:06:04 +02:00
Beni Peled
8009170d3a docs: update the installation instructions with the new gpg 2024 key
Closes scylladb/scylladb#17019
2024-01-29 14:37:25 +02:00
Kefu Chai
6f55d68dd9 .git: add more skip words
these words are either

* shortened words: strategy => strat, read_from_primary => fro
* or acronyms: node_or_data => nd

before we rename them with better names, let's just add them to the
ignore word list.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17002
2024-01-29 14:37:03 +02:00
Patryk Wrobel
781a6a5071 utils/directories: make utils::directories::set an internal type
Previously, utils::directories::set could have been used by
clients of utils::directories class to provide dirs for creation.
Due to moving the responsibility for providing paths of dirs from
db::config to utils::directories, such usage is no longer the case.

This change:
 - defines utils::directories::set in utils/directories.cc to disallow
   its usage by the clients of utils::directories
 - makes utils::directories::create_and_verify() member function
   private; now it is used only by the internals of the class
 - introduces a new member function to utils::directories called
   create_and_verify_sharded_directory() to limit the functionality
   provided to clients

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:20:41 +01:00
Patryk Wrobel
dc8d5ffaf6 db::config: keep dir paths unchanged
This change is intended to ensure, that
db::config fields related to directories
are not changed. To achieve that a member
function called setup_directories() is
removed.

The responsibility for directories paths
has been moved to utils::directories,
which may generate default paths if the
configuration does not provide a specific
value.

Fixes: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:20:41 +01:00
Patryk Wrobel
0f3b00f9ad cql_transport/controler: use utils::directories to get paths of dirs
This change replaces usage of db::config with
usage of utils::directories to get paths of
directories in cql_transport/controler.

Refs: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:20:38 +01:00
Patryk Wrobel
f08768e767 service/storage_proxy: use utils::directories to get paths of dirs
This change replaces usage of db::config with
usage of utils::directories to get paths of
directories in service/storage_proxy.

Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
5ac3d0f135 api/storage_service.cc: use utils::directories to get paths of dirs
This change replaces usage of db::config with usage
of utils::directories in api/storage_service.cc in
order to get the paths of directories.

Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
51fa108df7 tools/scylla-sstable.cc: use utils::directories to get paths
This change replaces usage of db::config with usage
of utils::directories to get paths of directories
in tools/scylla-sstable.cc.

Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
804afffb11 db/commitlog: do not use db::config to get dirs
This change removes usage of db::config to
get path of commitlog_directory. Instead, it
introduces a new parameter to directly pass
the path to db::commitlog::config::from_db_config().

Refs: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
9483d149af Use utils::directories to get dirs paths in replica::database
This change replaces the usage of db::config with
usage of utils::directories to get dirs paths in
replica::database class.

Moreover, it adjusts tests that require construction
of replica::database - its constructor has been
changed to accept utils::directories object.

Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
1cd676e438 Allow utils::directories to provide paths to dirs
This change extends utils::directories class in
the following way:
 - adds new member variables that correspond to
   fields from db::config that describe paths
   of directories
 - introduces a public interface to retrieve the
   values of the new members
 - allows construction of utils::directories
   object based on db::config to setup internal
   member variables related to paths to dirs

The new members of utils::directories are overridden
when the provided values are empty. The way of setting
paths is taken from db::config.
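
The fallback rule can be illustrated in Python. The function `resolve_dirs` and the sub-directory names are assumptions made for this sketch; the actual defaults are defined in the C++ code and are not listed in this message. The point of the sketch is that the configured values are read but never modified.

```python
from pathlib import PurePosixPath
from typing import Dict, Mapping

# Illustrative sub-directory names; the real defaults live in the C++ code.
DEFAULT_SUBDIRS = {
    "commitlog_directory": "commitlog",
    "hints_directory": "hints",
    "saved_caches_directory": "saved_caches",
}


def resolve_dirs(work_dir: str, configured: Mapping[str, str]) -> Dict[str, str]:
    """Return effective directory paths without touching `configured`.

    A configured value wins when it is non-empty; otherwise a default
    below `work_dir` is generated.
    """
    resolved = {}
    for key, subdir in DEFAULT_SUBDIRS.items():
        value = configured.get(key, "")
        resolved[key] = value if value else str(PurePosixPath(work_dir) / subdir)
    return resolved
```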

To ensure that the new logic works correctly
`utils_directories_test` has been created.

Refs: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
1b0ccaf4f2 Clean-up of utils::directories
This change is intended to clean-up files in which
utils::directories class is defined to ease further
extensions.

The preparation consists of:
 - removal of `using namespace` from directories.hh to
   avoid namespace pollution in files, that include this
   header
 - explicit inclusion of headers, that were missing or
   were implicitly included to ensure that directories.hh
   is self-sufficient
 - defining directories::set class outside of its parent
   to improve readability

Refs: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Botond Dénes
fd66ce1591 test/topolgy_experimental_raft: test_tablets.py: add test for resharding
Check that scylla refuses to start when the shard count is reduced.
2024-01-29 07:04:33 -05:00
Botond Dénes
a7a5aada2a test/pylib: manager[_client]: add update_cmdline()
Similar to the existing update_config(). Updates the command-line
arguments of the specified nodes, merging the new options into the
existing ones. Needs a restart to take effect.
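
One plausible reading of "merging the new options into the existing ones" is sketched below. The helpers `parse_cmdline` and `merge_cmdline` are hypothetical and the argument representation (a flat list of strings) is an assumption; this is not the code of update_cmdline().

```python
from typing import Dict, List, Optional


def parse_cmdline(args: List[str]) -> Dict[str, Optional[str]]:
    """Parse ["--smp", "2", "--flag"] into {"--smp": "2", "--flag": None}."""
    options: Dict[str, Optional[str]] = {}
    i = 0
    while i < len(args):
        name = args[i]
        # An option has a value if the next token is not another option.
        has_value = i + 1 < len(args) and not args[i + 1].startswith("--")
        options[name] = args[i + 1] if has_value else None
        i += 2 if has_value else 1
    return options


def merge_cmdline(existing: List[str], new: List[str]) -> List[str]:
    """Merge `new` into `existing`; options present in both take the new value."""
    merged = parse_cmdline(existing)
    merged.update(parse_cmdline(new))
    result: List[str] = []
    for name, value in merged.items():
        result.append(name)
        if value is not None:
            result.append(value)
    return result
```

For example, merging `["--smp", "2"]` into a node started with `["--smp", "4", "--overprovisioned"]` keeps `--overprovisioned` and lowers the shard count, which is the kind of change the resharding test needs.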
2024-01-29 07:04:33 -05:00
Botond Dénes
8a439fc2a8 main: refuse startup when tablet resharding is required
We do not support tablet resharding yet. All tablet-related code assumes
that the (host_id, shard) tablet replica is always valid. Violating this
leads to undefined behaviour: errors in the tablet load balancer and
potential crashes.
Avoid this by refusing to start if the need for resharding is detected.
Be as lenient as possible: check all tablets with a replica on this node,
and only refuse startup if at least one tablet has an invalid replica
shard.

Startup will fail as:

    ERROR 2024-01-26 07:03:06,931 [shard 0:main] init - Startup failed: std::runtime_error (Detected a tablet with invalid replica shard, reducing shard count with tablet-enabled tables is not yet supported. Replace the node instead.)
2024-01-29 07:04:33 -05:00
Botond Dénes
95b6aeebae locator: tablets: add check_tablet_replica_shards()
Checks that all tablets with a replica on this node have a valid
replica shard (< smp::count).
Will be used to check whether the node can start-up with the current
shard-count.
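
The check can be modelled in a few lines of Python. The representation of a tablet as a list of (host_id, shard) pairs and the function names are assumptions of this sketch; only the validity condition (shard < shard count, for replicas on this node) comes from the description above.

```python
from typing import List, Sequence, Tuple

Replica = Tuple[str, int]  # (host_id, shard)


def invalid_local_replicas(
    tablets: Sequence[Sequence[Replica]], this_host: str, shard_count: int
) -> List[Replica]:
    """Collect replicas on `this_host` whose shard is >= `shard_count`."""
    return [
        (host, shard)
        for replicas in tablets
        for host, shard in replicas
        if host == this_host and shard >= shard_count
    ]


def check_replica_shards_sketch(
    tablets: Sequence[Sequence[Replica]], this_host: str, shard_count: int
) -> None:
    """Raise if at least one local tablet replica has an invalid shard."""
    if invalid_local_replicas(tablets, this_host, shard_count):
        raise RuntimeError("Detected a tablet with invalid replica shard")
```

Replicas that live on other nodes are ignored, so a node is only refused startup because of its own shard count.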
2024-01-29 07:04:33 -05:00
Patryk Jędrzejczak
7c10cae6c4 docs: dev: topology-over-raft: document the left_token_ring state
In one of the previous patches, we changed the `left_token_ring`
state from a node state to a transition state. We document it
in this patch. The node state wasn't documented, so there is
nothing to remove.
2024-01-29 10:39:07 +01:00
Patryk Jędrzejczak
9b2d1a20a3 topology_coordinator: adjust reason string in left_token_ring handler
We were using the "finished decommission node" reason string for a
failed bootstrap and replace.
2024-01-29 10:39:07 +01:00
Patryk Jędrzejczak
b0eef50b2e raft topology: make left_token_ring a transition state
A node can be in the `left_token_ring` state after:
- a finished decommission,
- a failed bootstrap,
- a failed replace.

When a node is in the `left_token_ring` state, we don't know how
it has ended up in this state. We cannot distinguish a node that
has finished decommissioning from a node that has failed bootstrap.

The main problem it causes is that we incorrectly send the
`barrier_and_drain` command to a node that has failed
bootstrapping or replacing. We must do it for a node that has
finished decommissioning because it could still coordinate
requests. However, since we cannot distinguish nodes in the
`left_token_ring` state, we must send the command to all of them.
This issue appeared in scylladb/scylladb#16797 and this patch is
a follow-up that fixes it.

The solution is changing `left_token_ring` from a node state
to a transition state.
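
The benefit can be shown with a small Python model. It assumes that, once `left_token_ring` is a transition state, the node keeps its own state while the transition is handled; the enum and the function `needs_barrier_and_drain` are hypothetical and only illustrate the decision that was impossible before.

```python
from enum import Enum


class NodeState(Enum):
    BOOTSTRAPPING = "bootstrapping"
    REPLACING = "replacing"
    DECOMMISSIONING = "decommissioning"


def needs_barrier_and_drain(node_state: NodeState) -> bool:
    """Decide whether the node leaving the ring must get barrier_and_drain.

    A node that finished decommissioning could still coordinate requests,
    so it must be drained. A node whose bootstrap or replace failed does
    not need the command.
    """
    return node_state is NodeState.DECOMMISSIONING
```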

Regarding implementation, most of the changes are simple
refactoring. The less obvious are:
- Before this patch, in `system_keyspace::left_topology_state`, we
had to keep the ignored nodes' IDs for replace to ensure that the
replacing node will have access to it after moving to the
`left_token_ring` state, which happens when replace fails. We
don't need this workaround anymore. When we enter the new
`left_token_ring` transition state, the new node will still be in
the `decommissioning` state, so it won't lose its request param.
- Before this patch, a decommissioning node lost its tokens
while moving to the `left_token_ring` state. After the patch, it
loses tokens while still being in the `decommissioning` state. We
ensure that all `decommissioning` handlers correctly handle a node
that lost its tokens.

Moving the `left_token_ring` handler from `handle_node_transition`
to `handle_topology_transition` created a large diff. There are
only three changes:
- adding `auto node = get_node_to_work_on(std::move(guard));`,
- adding `builder.del_transition_state()`,
- changing error logged when `global_token_metadata_barrier` fails.
2024-01-29 10:39:07 +01:00
Patryk Jędrzejczak
12eb0738cf topology_coordinator: rollback_current_topology_op: remove unused exclude_nodes
The `exclude_nodes` variable was unused, but it wasn't a bug.
The `left_token_ring` and `rollback_to_normal` handlers correctly
compute excluded nodes on their own.
2024-01-29 10:39:06 +01:00
Kefu Chai
0cbf8f75f0 db: add formatter for dht::decorated_key and repair_sync_boundary
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for dht::decorated_key and
repair_sync_boundary.

please note, before this change, repair_sync_boundary was using
the operator<< based formatter of `dht::decorated_key`, so we are
updating both of them in a single commit.

because we still use the homebrew generic formatter of vector<>
to format vector<repair_sync_boundary> and vector<dht::decorated_key>,
their operator<< are preserved.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16994
2024-01-29 11:11:41 +02:00
Tzach Livyatan
06a9a925a5 Update link to sizing / pricing calc
Closes scylladb/scylladb#17015
2024-01-29 11:07:20 +02:00
Kefu Chai
b5ff098f28 thrift: add formatter for cassandra::ConsistencyLevel::type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
cassandra::ConsistencyLevel::type.
please note, the operator<< for `cassandra::ConsistencyLevel::type`
is generated using `thrift` command line tool, which does not emit
specialization for fmt::formatter yet, so we need to use
`fmt::ostream_formatter` to implement the formatter for this type.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17013
2024-01-29 10:10:35 +02:00
Pavel Emelyanov
3abdb3c7ee tablets: Remove tablet_aware_replication_strategy::parse_initial_tablets
It's now unused; the string with initial tablets is parsed elsewhere

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17010
2024-01-29 10:03:38 +02:00
Kefu Chai
912c588975 thrift: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17012
2024-01-29 10:02:30 +02:00
Kefu Chai
abb12979f8 raft: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17011
2024-01-29 10:00:56 +02:00
Kefu Chai
8f38bd5376 commitlog: add formatter for db::replay_position
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `db::replay_position`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17014
2024-01-29 09:59:30 +02:00
Botond Dénes
d3c1be9107 Merge 'alternator: enable tablets by default if experimental feature is enabled' from Nadav Har'El
This series does a similar change to Alternator as was done recently to CQL:

1. If the "tablets" experimental feature is enabled, new Alternator tables will use tablets automatically, without requiring an option on each new table. A default choice of initial_tablets is used. These choices can still be overridden per-table if the user wants to.
2. In particular, all test/alternator tests will also automatically run with tablets enabled
3. However, some tests will fail on tablets because they use features that haven't yet been implemented with tablets - namely Alternator Streams (Refs #16317) and Alternator TTL (Refs #16567). These tests will - until those features are implemented with tablets - continue to be run without tablets.
4. An option is added to the test/alternator/run to allow developers to manually run tests without tablets enabled, if they wish to (this option will be useful in the short term, and can be removed later).

Fixes #16355

Closes scylladb/scylladb#16900

* github.com:scylladb/scylladb:
  test/alternator: add "--vnodes" option to run script
  alternator: use tablets by default, if available
  test/alternator: run some tests without tablets
2024-01-29 09:22:13 +02:00
Kefu Chai
cb5453d534 .git: only allow codespell to run on master branch
so that non-master branches are not read by 3rd-party tools unless
they are audited.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16999
2024-01-29 09:04:20 +02:00
Kefu Chai
f96d25a0a7 tool: check for existence of keyspace before getting it
in general, user should save output of `DESC foo.bar` to a file,
and pass the path to the file as the argument of `--schema-file`
option of `scylla sstable` commands. the CQL statement generated
from `DESC` command always include the keyspace name of the table.
but if the user writes the CQL statement manually and omits
the keyspace name, he/she would hit the following assertion failure
```
scylla: cql3/statements/cf_statement.cc:49: virtual const sstring &cql3::statements::raw::cf_statement::keyspace() const: Assertion `_cf_name->has_keyspace()' failed.
```
this is not a great user experience.

so, in this change, we check for the existence of keyspace before
looking it up, and throw a runtime error with a better error message.
so when the CQL statement does not have the keyspace name, the new
error message would look like:
```
error processing arguments: could not load schema via schema-file: std::runtime_error (tools::do_load_schemas(): CQL statement does not have keyspace specified)
```

since this check is only performed by `do_load_schemas()` which
cares about the existence of keyspace, and it only expects the
CQL statement to create table/keyspace/type, we just override the
new `has_keyspace()` method of the corresponding types derived
from `cf_statement`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16981
2024-01-29 09:02:01 +02:00
Anna Stuchlik
dfa88ccc28 doc: document nodetool resetlocalschema
This adds the documentation for the nodetool resetlocalschema
command.
The syntax description is based on the description for Cassandra
and the ScyllaDB help for nodetool.

Fixes https://github.com/scylladb/scylladb/issues/16286

Closes scylladb/scylladb#16790
2024-01-28 21:09:02 +01:00
Kefu Chai
fe3bc00045 topology_coordinator: fix misspellings in log
these misspellings are identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17006
2024-01-26 16:50:39 +02:00
Dawid Medrek
b92fb3537a main: Postpone start-up of hint manager
In this commit, we postpone the start-up
of the hint manager until we obtain information
about other nodes in the cluster.

When we start the hint managers, one of the
things that happen is creating endpoint
managers -- structures managed by
db::hints::manager. Whether we create
an instance of endpoint manager depends on
the value returned by host_filter::can_hint_for,
which, in turn, may depend on the current state
of locator::topology.

If locator::topology is incomplete, some endpoint
managers may not be started even though they
should (because the target node IS part of the
cluster and we SHOULD send hints to it if there
are some).

The situation like that can happen because we
start the hint managers too early. This commit
aims to solve that problem. We only start
the hint managers when we've gathered information
about the other nodes in the cluster and created
the locator::topology using it.

Hinted Handoff is not negatively affected by these
changes since in between the previous point of
starting the hint managers and the current one,
all of the mutations performed by
service::storage_proxy target the local node, so
no hints would need to be generated anyway.

Fixes scylladb/scylladb#11870
Closes scylladb/scylladb#16511
2024-01-26 12:49:40 +01:00
Botond Dénes
c6fd4dffbb Merge 'Remove anonymous namespaces from headers' from Patryk Wróbel
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.

This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.

This PR removes unnamed namespaces from header files.

References:

- [CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous) namespace in a header"](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#sf21-dont-use-an-unnamed-anonymous-namespace-in-a-header)

- [SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace in a header file"](https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL59-CPP.+Do+not+define+an+unnamed+namespace+in+a+header+file)

Closes scylladb/scylladb#16998

* github.com:scylladb/scylladb:
  utils/config_file_impl.hh: remove anonymous namespace from header
  mutation/mutation.hh: remove anonymous namespace from header
2024-01-26 13:20:17 +02:00
Kefu Chai
a9d781d70f test/nodetool: only test "storage_service/cleanup_all" with scylla
this RESTful API is a scylla specific extension and is only used
by scylla-nodetool. currently, the java-based nodetool does not use
it at all, so mark it with "scylla_only".

one can verify this change with:
```
pytest --mode=debug --nodetool=cassandra test_cleanup.py::test_cleanup
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17001
2024-01-26 13:19:15 +02:00
Botond Dénes
582ddc70ec Merge 'test/nodetool: return a randomized address if not running with unshare' from Kefu Chai
we should allow users to run nodetool tests without `test.py`. but there
is a good chance that the host could be reused by multiple tests or
multiple users who could be using port 12345. by randomizing the IP and
port, they would have better chance to complete the test without running
into used port problem.

Closes scylladb/scylladb#16996

* github.com:scylladb/scylladb:
  test/nodetool: return a randomized address if not running with unshare
  test/nodetool: return an address from loopback_network fixture
2024-01-26 13:15:58 +02:00
Kefu Chai
9ee6c00c84 docs: fix misspellings
these misspellings are identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17005
2024-01-26 13:14:21 +02:00
Kefu Chai
72cec22932 repair: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16993
2024-01-26 13:12:38 +02:00
Kamil Braun
4f736894e1 Merge 'Add maintenance mode' from Mikołaj Grzebieluch
In this mode, the node is not reachable from the outside, i.e.
* it refuses all incoming RPC connections,
* it does not join the cluster, thus
  * all group0 operations are disabled (e.g. schema changes),
  * all cluster-wide operations are disabled for this node (e.g. repair),
  * other nodes see this node as dead,
  * it cannot read or write data from/to other nodes,
* it does not open Alternator and Redis transport ports and the TCP CQL port.

The only way to make CQL queries is to use the maintenance socket. The node serves only local data.

To start the node in maintenance mode, use the `--maintenance-mode true` flag or set `maintenance_mode: true` in the configuration file.

REST API works as usual, but some routes are disabled:
* authorization_cache
* failure_detector
* hinted_hand_off_manager

This PR also updates the maintenance socket documentation:
* add cqlsh usage to the documentation
* update the documentation to use `WhiteListRoundRobinPolicy`

Fixes #5489.

Closes scylladb/scylladb#15346

* github.com:scylladb/scylladb:
  test.py: add test for maintenance mode
  test.py: generalize usage of cluster_con
  test.py: when connecting to node in maintenance mode use maintenance socket
  docs: add maintenance mode documentation
  main: add maintenance mode
  main: move some REST routes initialization before joining group0
  message_service: add sanity check that rpc connections are not created in the maintenance mode
  raft_group0_client: disable group0 operations in the maintenance mode
  service/storage_service: add start_maintenance_mode() method
  storage_service: add MAINTENANCE option to mode enum
  service/maintenance_mode: add maintenance_mode_enabled bool class
  service/maintenance_mode: move maintenance_socket_enabled definition to seperate file
  db/config: add maintenance mode flag
  docs: add cqlsh usage to maintenance socket documentation
  docs: update maintenance socket documentation to use WhiteListRoundRobinPolicy
2024-01-26 11:02:34 +01:00
Botond Dénes
f94acc2eb4 test/cql-pytest: conftest.py: remove xfail_tablets fixture
No test uses it and going forward we should not add tests which do not
work with tablets.
2024-01-26 04:02:40 -05:00
Botond Dénes
dcaf308a59 test/cql-pytest: test_tombstone_limit.py: re-enable disabled tests
The tests in this file, that are related to partition-scans are failing
with tablets, and were hence disabled with xfail_tablets. This means we
are losing test coverage, so parametrize these tests to run with both
vnodes and tablets, and mark them as xfail only when running with
tablets.
2024-01-26 04:02:40 -05:00
Botond Dénes
3527d0aaed test/cql-pytest: test_describe.py: re-enable disabled tests
This test file has two tests disabled:
* test_desc_cluster - due to #16789
* test_whitespaces_in_table_options - due to #16317

They are disabled via xfail, because they do not work with tablets. This
means we lose test coverage of the respective functionality.
This patch re-enables the two tests, by parametrizing them to run with
both vnodes and tablets:
* test_desc_cluster - when run with tablets, endpoint info is not
  validated. The test is still useful because it checks that DESC
  CLUSTER doesn't break with tablets. A FIXME with a link to #16789
  is left.
* test_whitespaces_in_table_options - marked xfail when run with
  tablets, but not when run with vnodes, thus we re-gain the test
  coverage.
2024-01-26 04:02:40 -05:00
Botond Dénes
a3b75e863b test/cql-pytest: test_cdc.py: re-enable disabled tests
The tests in this file are currently all marked with xfail_tablets,
because tablets are not enabled by default in the cql-pytest suite and
CDC doesn't currently work with tablets at all. This however means that
the CDC functionality loses test coverage. So instead of a blanket
xfail, parametrize these tests to run with both vnodes and tablets, and
add a targeted xfail for the tablets parameter. This way no coverage
is lost, the tests are still running with vnodes (and will fail if
regressions are introduced), and they are allowed to xfail with tablets
enabled.

We could simply make these tests only run with vnodes for now. But
looking forward, after the CDC functionality is fixed to work with
tablets, we want to verify that it works with both vnodes and tablets.
So we run the test with both and leave the xfail as a reminder that a
fix is required.
2024-01-26 04:02:40 -05:00
Botond Dénes
631f7c99f5 test/cql-pytest: add parameter support to test_keyspace
Tests can now request to be run against both tablets and vnodes, via:

    @pytest.mark.parametrize("test_keyspace", ["tablets", "vnodes"], indirect=True)

This will set request.param for the test_keyspace fixture, which can
create the keyspace according to the requested parameter. This way,
tests can conveniently opt-in to be run against both replication
methods.
When not parameterized like this, the test_keyspace fixture will create
a keyspace as before -- with tablets, if support is enabled.
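
A minimal sketch of how a fixture can branch on the requested parameter.
`replication_mode()` is a hypothetical helper, not the code in conftest.py.
pytest sets `request.param` only when the fixture is parametrized, so the
sketch falls back to the default otherwise. A `SimpleNamespace` stands in
for pytest's `request` object so that the snippet runs without pytest:

```python
from types import SimpleNamespace

# Hypothetical helper, not part of the actual conftest.py: picks the
# replication method a test_keyspace-like fixture should use.
def replication_mode(request, tablets_supported=True):
    # request.param exists only for parametrized (indirect) fixtures.
    param = getattr(request, "param", None)
    if param is None:
        # Not parametrized: behave as before, tablets if support is enabled.
        return "tablets" if tablets_supported else "vnodes"
    if param not in ("tablets", "vnodes"):
        raise ValueError(f"unexpected test_keyspace parameter: {param!r}")
    return param

# Stand-ins for the request object pytest would pass to the fixture.
parametrized = SimpleNamespace(param="vnodes")
plain = SimpleNamespace()

print(replication_mode(parametrized))
print(replication_mode(plain))
```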
2024-01-26 04:02:40 -05:00
Kefu Chai
637dd73079 sstable/storage: use fs::path to represent _dir and _temp_dir
they are directories, and we are concatenating strings to build the paths
to the sstable components. so it would be more elegant to use fs::path
for manipulating paths.

this change was inspired by the discussion on passing the relative
path to sstable to `scylla sstables`, where we use the
`path::parent_path()` as the dir of sstable, and then concatenate
it with the filename component. but if the `parent_path()` method
returns an empty string, we end up with a path like
"/me-42-big-TOC.txt", which is not reachable. what we should be
reading is "me-42-big-TOC.txt". so, we would be better off either
using `fs::path` or enforcing the absolute path.

since we are already using "/" as separator, and concatenating strings,
this is an opportunity to switch over to `fs::path` to address
the problem and to avoid the string concatenating.
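
the pitfall described above can be reproduced outside of C++. the sketch
below uses Python's `pathlib` as a stand-in for `fs::path`; it is only an
analogy and not the code touched by this change:

```python
import posixpath
from pathlib import PurePosixPath

name = "me-42-big-TOC.txt"  # relative path with no directory component

# String concatenation: the empty parent directory turns into the root.
parent = posixpath.dirname(name)        # ""
concatenated = parent + "/" + name

# Path-aware join: an empty parent leaves the file name relative.
joined = PurePosixPath(parent) / name

print(concatenated)  # /me-42-big-TOC.txt
print(joined)        # me-42-big-TOC.txt
```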

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16982
2024-01-26 09:54:41 +02:00
Patryk Wrobel
6faa178f10 utils/config_file_impl.hh: remove anonymous namespace from header
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.

This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.

This change aligns the code with the following guidelines:
 - CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous)
                       namespace in a header"
 - SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace
                  in a header file"

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-26 08:44:44 +01:00
Patryk Wrobel
c218333afb cql3/type_json.cc: move stringstream content instead of copying it
C++20 introduced a new overload of std::ostringstream::str()
that is selected when the mentioned member function is called
on r-value.

The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.

This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16990
2024-01-26 09:41:09 +02:00
Kefu Chai
36e81f93d2 .git: do not apply codespell to licenses
we should keep the licenses as they are, even with misspellings.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16992
2024-01-26 09:39:27 +02:00
Patryk Wrobel
ba488b10ec mutation/mutation.hh: remove anonymous namespace from header
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.

This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.

This change aligns the code with the following guidelines:
 - CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous)
                       namespace in a header"
 - SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace
                  in a header file"

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-26 08:38:39 +01:00
Kefu Chai
01727a5399 test/nodetool: return a randomized address if not running with unshare
we should allow users to run nodetool tests without `test.py`. but there
is a good chance that the host could be reused by multiple tests or
multiple users who could be using port 12345. by randomizing the IP and
port, they would have better chance to complete the test without running
into used port problem.
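
a minimal sketch of the idea, assuming a Linux host where the whole
127.0.0.0/8 block is routed to loopback. `random_loopback_address()` and
the port range are hypothetical and are not taken from the test suite:

```python
import random

# Hypothetical helper: choose a loopback address and an unprivileged port
# at random, so that concurrent test runs on one host rarely collide.
def random_loopback_address(rng=random):
    # Assumption: every address in 127.0.0.0/8 is usable as loopback.
    ip = f"127.{rng.randint(0, 255)}.{rng.randint(0, 255)}.{rng.randint(1, 254)}"
    # Assumption: ports in this range are not reserved on the host.
    port = rng.randint(10000, 60000)
    return ip, port

ip, port = random_loopback_address()
print(f"{ip}:{port}")
```

randomizing lowers the chance of a collision but does not guarantee a free
port, so the caller still has to handle a failed bind.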

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-26 13:32:47 +08:00
Kefu Chai
358d30fd29 test/nodetool: return an address from loopback_network fixture
* rename "maybe_setup_loopback_network" to "server_address"
* return an address from the fixture

this change prepares for bringing back the randomized IP and port.
when users run this test without test.py, randomizing the
IP and port gives them a better chance to complete the test
without running into used port problem.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-26 13:20:37 +08:00
Raphael S. Carvalho
3b14c5b84a test/topology_experimental_raft: Add tablet split test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
90c9a5d7af replica: Bypass reshape on boot with tablets temporarily
Without it, table loading fails as reshape mixes sstables from
different tablets together, and now we have a guard for that:

Unable to load SSTable ...-big-Data.db that belongs to tablets 1 and 31,

The fix is about making reshape compaction group aware.
It will be fixed, but not now.

Refs #16966.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
2cb8a824ec replica: Fix table::compaction_group_for_sstable() for tablet streaming
It might happen that sstable being streamed during migration is not
split yet, therefore it should be added to the main compaction group,
allowing the streaming stage to start split work on it, and not
fool the coordinator thinking it can proceed with split execution
which would cause problems.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
4245ad333a test/topology_experimental_raft: Disable load balancer in test fencing
This is easier to reproduce after changes in load balancer, to
emit resize decisions, which in turn results in topology version
being incremented, and that might race with fencing tests that
manipulate the topology version manually.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
85020861fc replica: Remap compaction groups when tablet split is finalized
When coordinator executes split, i.e. commits the new tablet map with
each tablet split into two, all replicas must then proceed with
remapping of compaction groups that were previously split.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
bf6f692f60 service: Split tablet map when split request is finalized
When load balancer emits finalize request, the coordinator will
now react to it by splitting each tablet in the current tablet
map and then committing the new map.

There can be no active migration while we do it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
9342792173 replica: Update table split status if completed split compaction work
The table replica will say to coordinator that its split status
is ready by loading the sequence number from tablet metadata
into its local state, which is pulled periodically by the
coordinator via RPC.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
cfa8200da5 storage_service: Implement split monitor
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
e0de3dd844 topology_cordinator: Generate updates for resize decisions made by balancer
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:40 -03:00
Raphael S. Carvalho
3ef792c4e8 load_balancer: Introduce metrics for resize decisions
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
638e6e30cb db: Make target tablet size a live-updateable config option
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
7ed5b44d52 load_balancer: Implement resize decisions
This implements the ability in load balancer to emit split or merge
requests, cancel ongoing ones if they're no longer needed, and
also finalize those that are ready for the topology changes.

That's all based on average tablet size, collected by coordinator
from all nodes, and split and merge thresholds.
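
A minimal sketch of the threshold comparison described above. The names
`Resize` and `decide_resize()`, the strict comparisons and the byte values
are assumptions for illustration; they are not taken from the load balancer:

```python
from enum import Enum

class Resize(Enum):
    NONE = "none"
    SPLIT = "split"
    MERGE = "merge"

# Hypothetical decision function: compares the average tablet size of a
# table against a split threshold and a merge threshold.
def decide_resize(avg_tablet_size, split_threshold, merge_threshold):
    if avg_tablet_size > split_threshold:
        return Resize.SPLIT
    if avg_tablet_size < merge_threshold:
        return Resize.MERGE
    return Resize.NONE

# Illustrative thresholds in bytes, chosen only for the example.
SPLIT_AT = 10 * 2**30
MERGE_AT = 2 * 2**30

ongoing = Resize.SPLIT
wanted = decide_resize(5 * 2**30, SPLIT_AT, MERGE_AT)
if wanted != ongoing:
    # The ongoing request is no longer needed and would be cancelled.
    print(f"cancel {ongoing.value}, new decision: {wanted.value}")
```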

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
8f7f74c490 service: Wire table_resize_plan into migration_plan
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
8d283b2593 service: Introduce table_resize_plan
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
ed2138a35a tablet_mutation_builder: Add set_resize_decision()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
490d109055 topology_coordinator: Wire load stats into load balancer
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
ce353bf47c storage_service: Allow tablet split and migration to happen concurrently
Lack of synchronization could lead the coordinator to think that a
pending replica in migration has split ready status, when in reality
it escaped the check if it happens that the leaving replica escaped
the split ready check, after the status has already been pulled at
destination by coordinator.

Example:
1) Coordinator pulls split status (ready) from destination replica
2) Migration sends a non-split tablet into destination
3) Coordinator pulls split status (ready) from source after
transition stage of migration moved to cleanup (so there's no
longer a leaving replica in it).
4) Migration completes, but compaction group is not split yet.
Coordinator thinks destination is ready.

To solve it, streaming now guarantees that pending replica is
split before returning, so migration can only advance to next
stage after the pending replica is split, if and only if
there's a split request emitted.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
2209c7440c topology_coordinator: Periodically retrieve table_load_stats
This implements the fiber that aggregates per-table stats that will
be fed into load balancer to make resize decisions (split,
merge, or revoke ongoing ones).

Initially, the stats will be refreshed every 60s, but the idea
is that eventually we make the frequency table based, where
the size of each table is taken into account.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
489a527e20 locator: Introduce topology::get_datacenter_nodes()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
9519a0c9e4 storage_service: Implement table_load_stats RPC
This implements the RPC for collecting table stats.

Since both leaving and pending replica can be accounted during
tablet migration, the RPC handler will look at tablet transition
info and account only either leaving or pending replica based on the
tablet migration stage. Replicas that are not leaving or
pending, of course, don't contribute to the anomaly in the
reported size.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
4684615927 replica: Expose table_load_stats in table
This is the table replica state that coordinator will aggregate
from all nodes and feed into the load balancer.

A tablet filter is added to not double account migrating tablets,
so only one of pending or leaving tablet replica will be accounted
based on current migration stage. More details can be found in
the patch that will implement the filter.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
beef9c9f70 replica: Introduce storage_group::live_disk_space_used()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
6c74fc4b82 locator: Introduce table_load_stats
This is per table stats that will be aggregated from all nodes, by
the coordinator, in order to help load balancer make resize
decisions.

size_in_bytes is the total aggregated table size, so coordinator
becomes responsible for taking into account RF of each DC and
also tablet count, for computing an accurate average size.

split_ready_seq_number is the minimum sequence number among all
replicas. If coordinator sees all replicas store the seq number
of current split, then it knows all replicas are ready for the
next stage in the split process.
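
A minimal sketch of the aggregation described above. The field names follow
this message, but the `TableLoadStats` class, the helper functions and the
exact averaging formula (total size divided by tablet count times the sum
of per-DC RF) are assumptions, not the coordinator's code:

```python
from dataclasses import dataclass

@dataclass
class TableLoadStats:
    size_in_bytes: int
    split_ready_seq_number: int

# Combine per-node stats: sizes add up, the sequence number is the minimum.
def aggregate(per_node):
    return TableLoadStats(
        size_in_bytes=sum(s.size_in_bytes for s in per_node),
        split_ready_seq_number=min(s.split_ready_seq_number for s in per_node),
    )

# Assumed formula: every tablet has one replica per unit of RF in each DC.
def average_tablet_size(stats, tablet_count, rf_per_dc):
    replicas = tablet_count * sum(rf_per_dc.values())
    return stats.size_in_bytes / replicas

# All replicas are ready only if the minimum equals the current split's number.
def all_replicas_split_ready(stats, current_split_seq):
    return stats.split_ready_seq_number == current_split_seq

per_node = [TableLoadStats(300, 5), TableLoadStats(300, 5), TableLoadStats(300, 4)]
total = aggregate(per_node)
print(average_tablet_size(total, tablet_count=2, rf_per_dc={"dc1": 3}))
print(all_replicas_split_ready(total, current_split_seq=5))
```

Taking the minimum means that a single lagging replica keeps the whole
table from advancing to the next stage.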

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
0d5ba1ee4b tablets: Add resize decision metadata to tablet metadata
The new metadata describes the ongoing resize operation (can be either
of merge, split or none) that spans tablets of a given table.
That's managed by group0, so down nodes will be able to see the
decision when they come back up and see the changes to the
metadata.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:06 -03:00
Raphael S. Carvalho
57582ac9c4 locator: Introduce resize_decision
resize_decision is the metadata that says whether tablets of a table
need split, merge, or none. That will be recorded in tablet metadata,
and therefore stored in group0.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:31:12 -03:00
Avi Kivity
03313d359e Merge ' db: commitlog_replayer: ignore mutations affected by (tablet) cleanups ' from Michał Chojnowski
To avoid data resurrection, mutations deleted by cleanup operations should be skipped during commitlog replay.

This series implements the above for tablet cleanups, by using a new system table which holds records of cleanup operations.

Fixes #16752

Closes scylladb/scylladb#16888

* github.com:scylladb/scylladb:
  test: test_tablets: add a test for cleanup after migration
  test: pylib: add ScyllaCluster.wipe_sstables
  test: boost: add commitlog_cleanup_test
  db: commitlog_replayer: ignore mutations affected by (tablet) cleanups
  replica: table: garbage-collect irrelevant system.commitlog_cleanups records
  db: commitlog: add min_position()
  replica: table: populate system.commitlog_cleanups on tablet cleanup
  db: system_keyspace: add system.commitlog_cleanups
  replica: table: refresh compound sstable set after tablet cleanup
2024-01-25 20:51:03 +02:00
Patryk Wrobel
a858daf038 service/client_state.cc: remove redundant copying
db::schema_tables::all_table_names() returns std::vector<sstring>.
Usage of range-for loop without reference results in copying each
of the elements of the traversed container. Such copying is redundant.

This change introduces usage of const reference to avoid copying.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16983
2024-01-25 20:35:05 +02:00
Kamil Braun
543ad0987a Merge 'raft topology: send barrier_and_drain to a decommissioning node' from Patryk Jędrzejczak
We didn't send the `barrier_and_drain` command to a
decommissioning node that could still be coordinating requests. It
could happen that a decommissioning node sent a request with an
old topology version after normal nodes received the new fence
version. Then, the request would fail on replicas with the stale
topology exception.

This PR fixes this problem by modifying `exec_global_command`.
From now on, it sends `barrier_and_drain` to a decommissioning
node.

We also stop filtering stale topology exceptions in
`test_topology_ops`. We added this filter after detecting the bug
fixed by this PR.

Fixes scylladb/scylladb#15804
Fixes scylladb/scylladb#16579
Fixes scylladb/scylladb#16642

Closes scylladb/scylladb#16797

* github.com:scylladb/scylladb:
  test: test_topology_ops: remove failed mutations filter
  raft topology: send barrier_and_drain to a decommissioning node
  raft topology: ensure at most one transitioning node
2024-01-25 16:09:02 +01:00
Kefu Chai
ee28cf2285 test.py: s/defalt/default/
this typo was identified by codespell

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16980
2024-01-25 16:54:07 +02:00
Botond Dénes
6d5ee6d48a Merge 'test/nodetool: run nodetool tests using "unshare"' from Kefu Chai
before this change, we use a random address when launching
rest_api_mock server, but there are chances that the randomly
picked address conflicts with an already-used address on the
host. and the subprocess fails right away with the returncode of
1 upon this failure, but we just continue on and check the readiness
of the already-dead server. actually, we've seen test failures
caused by the EADDRINUSE failure, and when we checked the readiness
of the rest_api_mock by sending HTTP request and reading the
response, what we had is not a JSON encoded response but a webpage,
which was likely the one returned by a minio server.

in this change, we

* specify the "launcher" option of nodetool
  test suite to "unshare", so that all its tests are launched
  in separated namespaces.
* do not use a random address for the mock server, as the network
  namespaces are separated.

Fixes #16542

Closes scylladb/scylladb#16773

* github.com:scylladb/scylladb:
  test/nodetool: run nodetool tests using "unshare"
  test.py: add "launcher" option support
2024-01-25 16:53:49 +02:00
Mikołaj Grzebieluch
763911af5b test.py: add test for maintenance mode
The test checks that in maintenance mode server A is not available for other
nodes and for clients. It is possible to connect by the maintenance socket
to server A and perform local CQL operations.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
ca35e352f5 test.py: generalize usage of cluster_con
Add option to pass load_balancing policy.
Change hosts type to list of IPs or cassandra.Endpoint.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
77a656bfd6 test.py: when connecting to node in maintenance mode use maintenance socket
A node in maintenance mode doesn't have an open regular CQL port.
To connect to the node, the scylla cluster needs to use the node's maintenance socket.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
9c07a189e8 docs: add maintenance mode documentation 2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
0bdbd6e8f5 main: add maintenance mode
In maintenance mode:
* Group0 doesn't start and the node doesn't join the token ring to behave as a dead
node to others,
* Group0 operations are disabled and result in an error,
* Only the maintenance socket listens for CQL requests,
* The storage service initialises token_metadata with the local node as the only node
on the token ring.

Maintenance mode is enabled by passing the --maintenance-mode flag.

Maintenance mode starts before the group0 is initialised.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
617adde9c9 main: move some REST routes initialization before joining group0
Move REST endpoints that don't need connection with other nodes, before joining the group0.
This way, they can be initialized in the maintenance mode.

Move `snapshot_ctl` along with routes because of snapshots API and tasks API.
Its constructor is a noop, so it is safe to move it.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
d8de209dcf message_service: add sanity check that rpc connections are not created in the maintenance mode
In maintenance mode, a node shouldn't be able to communicate with other nodes.

To make sure this does not happen, the sanity check is added.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
c08266cfe5 raft_group0_client: disable group0 operations in the maintenance mode
In maintenance mode, the node doesn't communicate with other nodes, so it doesn't
start or apply group0 operations. Users can still try to start it, e.g. change
the schema, and the node can't allow it.

Init _upgrade_state with recovery in the maintenance mode.
Throw an error if the group0 operation is started in maintenance mode.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
97641f646a service/storage_service: add start_maintenance_mode() method
In the maintenance mode, other nodes won't be available thus we disabled joining
the token ring and the token metadata won't be populated with the local node's endpoint.
When a CQL query is executed it checks the `token_metadata` structure and fails if it is empty.

Add a method that initialises `token_metadata` with the local node as the only node in the token ring.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
c530756837 storage_service: add MAINTENANCE option to mode enum
join_cluster and start_maintenance_mode are incompatible.
To make sure that only one is called when the node starts, add the MAINTENANCE option.

start_maintenance_mode sets _operation_mode to MAINTENANCE.
join_cluster sets _operation_mode to STARTING.

set_mode will result in an internal error if:
* it tries to set MAINTENANCE mode when the _operation_mode is other than NONE,
  i.e. start_maintenance_mode is called after join_cluster (or it is called during
  the drain, but it also shouldn't happen).
* it tries to set STARTING mode when the mode is set to MAINTENANCE,
  i.e. join_cluster is called after start_maintenance_mode.
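
The two rules above can be modelled with a small state machine. This is a Python sketch of the described behaviour only; the real implementation is C++ inside storage_service, and `Mode`, `InternalError` and `StorageServiceModel` are illustrative names, not the actual types.

```python
from enum import Enum, auto


class Mode(Enum):
    NONE = auto()
    STARTING = auto()
    MAINTENANCE = auto()


class InternalError(Exception):
    """Stand-in for the internal error mentioned in the commit message."""


class StorageServiceModel:
    def __init__(self):
        self.operation_mode = Mode.NONE

    def set_mode(self, new_mode):
        # MAINTENANCE may only be entered from NONE.
        if new_mode is Mode.MAINTENANCE and self.operation_mode is not Mode.NONE:
            raise InternalError("cannot enter MAINTENANCE from " + self.operation_mode.name)
        # STARTING may not be entered once MAINTENANCE is set.
        if new_mode is Mode.STARTING and self.operation_mode is Mode.MAINTENANCE:
            raise InternalError("cannot enter STARTING from MAINTENANCE")
        self.operation_mode = new_mode

    def join_cluster(self):
        self.set_mode(Mode.STARTING)

    def start_maintenance_mode(self):
        self.set_mode(Mode.MAINTENANCE)
```

Calling either entry point after the other raises, which is the "only one is called" guarantee the message describes.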
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
d4c22fc86c service/maintenance_mode: add maintenance_mode_enabled bool class 2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
8b2f0e38d9 service/maintenance_mode: move maintenance_socket_enabled definition to separate file 2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
e6a83b9819 db/config: add maintenance mode flag 2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
81ef9fc91e docs: add cqlsh usage to maintenance socket documentation
After https://github.com/scylladb/scylla-cqlsh/pull/67, the user can use
cqlsh to connect to the node by maintenance socket.
2024-01-25 15:27:53 +01:00
Botond Dénes
c67698ea06 compaction/compaction_manager: perform_cleanup(): hold the compaction gate
While the cleanup is ongoing. Otherwise, a concurrent table drop might
trigger a use-after-free, as we have seen in dtests recently.

Fixes: #16770

Closes scylladb/scylladb#16874
2024-01-25 14:52:50 +01:00
Mikołaj Grzebieluch
2c34d9fcd8 docs: update maintenance socket documentation to use WhiteListRoundRobinPolicy
After https://github.com/scylladb/python-driver/pull/287, the user can use
WhiteListRoundRobinPolicy to connect to the node by maintenance socket.
2024-01-25 14:52:24 +01:00
Pavel Emelyanov
bf3cae4992 Merge 'tests: utils: error injection: print time duration instead of count' from Kefu Chai
before this change, we always cast the wait duration to millisecond,
even if it could be using a higher resolution. actually
`std::chrono::steady_clock` is using `nanosecond` for its duration,
so if we inject a deadline using `steady_clock`, we could be awakened
earlier due to the narrowing of the duration type caused by the
duration_cast.

in this change, we just use the duration as it is. this should allow
the caller to use the resolution provided by Seastar without losing
the precision. the tests are updated to print the time duration
instead of count to provide information with a higher resolution.

Fixes #15902

Closes scylladb/scylladb#16264

* github.com:scylladb/scylladb:
  tests: utils: error injection: print time duration instead of count
  error_injection: do not cast to milliseconds when injecting timeout
2024-01-25 16:13:27 +03:00
Avi Kivity
69d597075a Merge 'tablets: Add support for removenode and replace handling' from Tomasz Grabiec
New tablet replicas are allocated and rebuilt synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.

Tablet scheduler is responsible for scheduling tablet rebuilding transition which
changes the replicas set. The infrastructure for handling decommission
in tablet scheduler is reused for this.

Scheduling is done incrementally, respecting per-shard load
limits. Rebuilding transitions are recognized by load calculation to
affect all tablet replicas.

New kind of tablet transition is introduced called "rebuild" which
adds new tablet replica and rebuilds it from existing replicas. Other
than that, the transition goes through the same stages as regular
migration to ensure safe synchronization with request coordinators.

In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.

Fixes https://github.com/scylladb/scylladb/issues/16690.

Closes scylladb/scylladb#16894

* github.com:scylladb/scylladb:
  tests: tablets: Add tests for removenode and replace
  tablets: Add support for removenode and replace handling
  topology_coordinator: tablets: Do not fail in a tight loop
  topology_coordinator: tablets: Avoid warnings about ignored failed future
  storage_service, topology: Track excluded state in locator::topology
  raft topology: Introduce param-less topology::get_excluded_nodes()
  raft topology: Move get_excluded_nodes() to topology
  tablets: load_balancer: Generalize load tracking
  tablets: Introduce get_migration_streaming_info() which works on migration request
  tablets: Move migration_to_transition_info() to tablets.hh
  tablets: Extract get_new_replicas() which works on migration request
  tablets: Move tablet_migration_info to tablets.hh
  tablets: Store transition kind per tablet
2024-01-25 14:49:43 +02:00
Patryk Jędrzejczak
b348014745 test: test_topology_ops: remove failed mutations filter
We added this filter after detecting a bug in the Raft-based
topology. We weren't sending `barrier_and_drain` commands to a
decommissioning node that could still be coordinating requests.
It could cause stale topology exceptions on replicas if the
decommissioning node sent a request with an old topology version
after normal nodes received the new fence version.

This bug has been fixed in the previous commit, so we remove the
filter.
2024-01-25 13:42:48 +01:00
Patryk Jędrzejczak
9aebd6dd96 raft topology: send barrier_and_drain to a decommissioning node
Before this patch, we didn't send the `barrier_and_drain` command
to a decommissioning node that could still be coordinating
requests. It could happen that a decommissioning node sent
a request with an old topology version after normal nodes received
the new fence version. Then, the request would fail on replicas
with the stale topology exception.

We fix this problem by modifying `exec_global_command`. From now
on, it sends `barrier_and_drain` to a decommissioning node, which
can also be in the `left_token_ring` state.
2024-01-25 13:42:48 +01:00
Patryk Jędrzejczak
378cbd0b70 raft topology: ensure at most one transitioning node
We add a sanity check to ensure at most one transitioning node at
a time. If there is more, something must have gone wrong.

In the future, we might implement concurrent topology operations.
Then, we will remove this sanity check.

We also extend the comment describing `transition_nodes` so that
it better explains why we use a map and how it should be handled.
2024-01-25 13:42:46 +01:00
Alexander Turetskiy
c1ae5425f7 DROP TYPE IF EXISTS should work on non-existent keyspace
DROP TYPE IF EXISTS should pass and do nothing on a non-existent keyspace

fixes #9082

Closes scylladb/scylladb#16504
2024-01-25 14:28:43 +02:00
Kefu Chai
b1431f08f7 test/nodetool: run nodetool tests using "unshare"
before this change, we use a random address when launching
rest_api_mock server, but there are chances that the randomly
picked address conflicts with an already-used address on the
host. and the subprocess fails right away with the returncode of
1 upon this failure, but we just continue on and check the readiness
of the already-dead server. actually, we've seen test failures
caused by the EADDRINUSE failure, and when we checked the readiness
of the rest_api_mock by sending HTTP request and reading the
response, what we had is not a JSON encoded response but a webpage,
which was likely the one returned by a minio server.

in this change, we

* specify the "launcher" option of nodetool
  test suite to "unshare", so that all its tests are launched
  in separated namespaces.
* use a random fixed address for the mock server, as the network
  namespaces are not shared anymore
* add an option in `nodetool/conftest.py`, so that it can optionally
  setup the lo network interface when it is launched in a separated
  new network namespace.

Fixes #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-25 20:28:36 +08:00
Kefu Chai
35b3c51f40 test.py: add "launcher" option support
before this change, all "tool" test suites use "pytest" to launch their
tests. but some of the tests might need a dedicated namespace so they
do not interfere with each other. fortunately, "unshare(1)" allows us
to run a program in new namespaces.
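
A configurable launcher of the kind this change introduces might be pictured as follows. This is a hedged sketch, not test.py's code: the `suite_config` dictionary, the `build_command` helper and the example launcher string are assumptions made for illustration.

```python
import shlex

DEFAULT_LAUNCHER = "pytest"


def build_command(suite_config, test_args):
    """Prefix the test arguments with the suite's launcher (hypothetical helper)."""
    # Fall back to plain pytest when the suite does not name a launcher.
    launcher = suite_config.get("launcher", DEFAULT_LAUNCHER)
    # The launcher may itself be a multi-word command line.
    return shlex.split(launcher) + list(test_args)
```

For example, `build_command({"launcher": "unshare --net pytest"}, ["-s"])` would yield `["unshare", "--net", "pytest", "-s"]`, while an empty configuration yields a plain pytest invocation.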

in this change, we add a "launcher" option to "tool" test suites. so
that these tests can run with the specified "launcher" instead of using
"pytest". if "launcher" is not specified, its default value of
"pytest" is used.

Refs #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-25 20:28:01 +08:00
Kurashkin Nikita
d90eeb5c4f cql3:statement_restrictions.cc: multi-column relation null check
Before this patch, using null in a multi-column relation resulted in the
internal server error "Attempted to create key component from empty optional".
This patch adds a null check for each element of each tuple in the
expression and generates an invalid request error if it finds such an element.
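
The check described above can be sketched in a few lines. This is a Python model of the idea only (the actual patch is C++ in statement_restrictions.cc); `InvalidRequest`, `check_multi_column_tuples` and the error texts are hypothetical.

```python
class InvalidRequest(Exception):
    """Stand-in for the invalid request error returned to the client."""


def check_multi_column_tuples(columns, tuples):
    """Reject tuples of the wrong arity or containing null before building keys."""
    for values in tuples:
        if len(values) != len(columns):
            raise InvalidRequest(
                f"expected {len(columns)} elements in tuple, got {len(values)}")
        for column, value in zip(columns, values):
            if value is None:
                # Fail with a client-facing error instead of an internal one.
                raise InvalidRequest(f"invalid null value for column {column}")
```

Validating up front turns what used to surface as an internal server error into an ordinary invalid request.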

Modified cassandra test and added a new one that checks the occurrence of null values in tuples.
Added a test that checks whether the wrong number of items is entered in tuples.

Fixes #13217

Closes scylladb/scylladb#16415
2024-01-25 14:17:43 +02:00
Botond Dénes
5df4ad2e48 test/cql-pytest: test_tools.py: fix flaky schema load failure test
The test TestScyllaSsstableSchemaLoading.test_fail_schema_autodetect was
observed to be flaky. Sometimes failing on local setups, but not in CI.
As it turns out, this is because, when run via test.py, the test's
working directory is the root directory of scylla.git. In this case,
scylla-sstable will find and read conf/scylla.yaml. After having done
so, it will try to look in the default data directory
(/var/lib/scylla/data) for the schema tables. If the local machine
happens to have a scylla data-dir set up at the above-mentioned location,
it will read the schema tables and will succeed in finding the tested
table (which is a system table, so it is always present). This will fail
the test, as the test expects the opposite -- the table not being found.

The solution is to change the test's working directory to the random
temporary work dir, so that the local environment doesn't interfere with
it.
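
Running a tool from a scratch directory, so that files relative to the caller's working directory cannot be picked up, can look like this. It is a sketch of the approach under the assumption that the tool is started as a subprocess; `run_in_scratch_dir` is a hypothetical helper, not the fixture used in test_tools.py.

```python
import subprocess
import tempfile


def run_in_scratch_dir(argv):
    """Run argv with a fresh temporary directory as its working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        # cwd=workdir hides relative paths such as conf/scylla.yaml from the child.
        result = subprocess.run(argv, cwd=workdir, capture_output=True,
                                text=True, check=True)
        return result.stdout
```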

Fixes: #16828

Closes scylladb/scylladb#16837
2024-01-25 15:14:16 +03:00
Botond Dénes
b341aa8f6d Merge 'api/api.hh: improve usage of standard containers' from Patryk Wróbel
This PR contains improvements related to usage of std::vector and looping over containers in the range-for loop.

It is advised to use `std::vector::reserve()` to avoid unneeded memory allocations when the total size is known beforehand.

When looping over a container that stores non-trivial types usage of const reference is advised to avoid redundant copies.

Closes scylladb/scylladb#16978

* github.com:scylladb/scylladb:
  api/api.hh: use const reference when looping over container
  api/api.hh: use std::vector::reserve() when the total size is known
2024-01-25 13:22:48 +02:00
Kamil Braun
994a2ea5c3 Merge 'Call left/joined notifiers when topology coordinator is enabled' from Gleb
The gossiper topology change code calls left/joined notifiers when a
node leaves or joins the cluster. This code is not executed in topology coordinator
mode, so the coordinator needs to call those notifiers by itself. The
series adds the calls.

Fixes scylladb/scylladb#15841

* 'gleb/raft-topo-notifications-v1' of github.com:scylladb/scylla-dev:
  storage service: topology coordinator: call notify_joined() when a node joins a cluster
  storage service: topology coordinator: call notify_left() when a node leaves a cluster
  storage_service: drop redundant check from notify_joined()
2024-01-25 12:12:53 +01:00
Kefu Chai
1d33a68dd7 tests: utils: error injection: print time duration instead of count
instead of casting / comparing the count of duration unit, let's just
compare the durations, so that boost.test is able to print the duration
in a more informative and user friendly way (line wrapped)

test/boost/error_injection_test.cc(167): fatal error:
    in "test_inject_future_disabled":
      critical check wait_time > sleep_msec has failed [23839ns <= 10ms]

Refs #15902
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-25 19:10:24 +08:00
Kefu Chai
8a5689e7a7 error_injection: do not cast to milliseconds when injecting timeout
before this change, we always cast the wait duration to millisecond,
even if it could be using a higher resolution. actually
`std::chrono::steady_clock` is using `nanosecond` for its duration,
so if we inject a deadline using `steady_clock`, we could be awakened
earlier due to the narrowing of the duration type caused by the
duration_cast.

in this change, we just use the duration as it is. this should allow
the caller to use the resolution provided by Seastar without losing
the precision.
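
The precision loss from narrowing to milliseconds is plain integer arithmetic. The sketch below models it in Python with nanosecond counts; it illustrates the truncation only and is not the Seastar or ScyllaDB code. `to_whole_milliseconds` is a hypothetical helper.

```python
NS_PER_MS = 1_000_000


def to_whole_milliseconds(duration_ns):
    """Drop the sub-millisecond part of a non-negative duration given in ns."""
    # For non-negative values, floor division matches a truncating cast.
    return duration_ns // NS_PER_MS


requested_ns = 10_999_999  # just under 11 ms
waited_ns = to_whole_milliseconds(requested_ns) * NS_PER_MS
lost_ns = requested_ns - waited_ns
```

Here `waited_ns` is 10,000,000 and `lost_ns` is 999,999, so a wait based on the truncated value can end almost a full millisecond before the requested deadline.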

Fixes #15902

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-25 19:10:24 +08:00
Gleb Natapov
adf70aae15 storage service: topology coordinator: call notify_joined() when a node joins a cluster
When the topology coordinator is used for topology changes the gossiper
based code that calls notify_joined() is not called. The coordinator needs
to call it itself. But it needs to call it only once when node becomes
normal. For that the patch changes state loading code to remember the
old set of nodes in normal state to check if a node that is normal after
new state is loaded was not in the normal state before.
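
The "notify only once" logic amounts to a set difference between the old and the new normal nodes. A minimal Python sketch, assuming nodes are identified by hashable IDs; `load_state` and its parameters are illustrative names, not the storage_service API.

```python
def load_state(old_normal, new_normal, notify_joined):
    """Call notify_joined for nodes that are normal now but were not before."""
    newly_normal = set(new_normal) - set(old_normal)
    for node in sorted(newly_normal):  # sorted only to make the order deterministic
        notify_joined(node)
    # The caller keeps the returned set as "old" for the next state load.
    return set(new_normal)
```

Reloading an unchanged state produces an empty difference, so no node is announced twice.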
2024-01-25 12:28:08 +02:00
Botond Dénes
c9f247f3e8 Merge 'sstables: writer: don't block topology changes while writing sstables' from Avi Kivity
The sstable writer held the effective_replication_map_ptr while writing
sstables, which is both a layering violation and slows down tablet load
balancing. It was needed in order to ensure the sharder was stable. But
it turns out that sharding metadata is unnecessary for tablets, so just
skip the whole thing when writing an sstable for tablets.

Closes scylladb/scylladb#16953

* github.com:scylladb/scylladb:
  sstables: writer: don't require effective_replication_map for sharding metadata
  schema: provide method to get sharder, iff it is static
2024-01-25 12:12:01 +02:00
Botond Dénes
8e82df6fb6 Merge 'coverage libraries: bug fixes' from Eliran Sinvani
This mini-series contains two bug fixes that were found as part of testing coverage reporting in CI:
ref: https://github.com/scylladb/scylladb/pull/16895

1. The html-fixup which is triggered when using: `test/pylib/coverage_utils.py lcov-tools genhtml...` rendered incorrect links for multiple links in the same line.
2. For files that contained `,` in their name the output was simply wrong and resulted in lcov not being able to find such files for the purpose of filtering or generating reports.

The aforementioned draft PR served as a testing bed for finding and fixing those bugs.

Closes scylladb/scylladb#16977

* github.com:scylladb/scylladb:
  lcov_utils.py: support sourcefiles that contains commas in their name
  coverage_utils.py: make regular expression lazy in html-fixup
2024-01-25 11:46:15 +02:00
Kefu Chai
0fbfc96619 db: add formatter for schema_tables::table_kind
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for db::schema_tables::table_kind,
and its operator<<() is still used by the homebrew generic formatter
for std::map<>, so it is preserved.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16972
2024-01-25 11:33:13 +03:00
Kefu Chai
ffb5ad494f api: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16973
2024-01-25 11:28:02 +03:00
Patryk Wrobel
cdfe0c1c35 api/api.hh: use const reference when looping over container
When reference is not used in the range-for loop, then
each element of a container is copied. Such copying
is not a problem for scalar types. However, in the case
of non-trivial types it may cause unneeded overhead.

This change replaces copying with const references
to avoid copying of types like seastar::sstring etc.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-25 09:20:35 +01:00
Patryk Wrobel
1ca71f2532 api/api.hh: use std::vector::reserve() when the total size is known
When growing via push_back(), std::vector may need to reallocate
its internal block of memory due to not enough space. It is advised
to allocate the required space before appending elements if the
size is known beforehand.

This change introduces usage of std::vector::reserve() in api.hh
to ensure that push_back() does not cause reallocations.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-25 08:50:19 +01:00
Eliran Sinvani
d27283918f lcov_utils.py: support sourcefiles that contains commas in their name
As part of the parsing, every line of an lcov file was modeled as
INFO_TYPE:field[,field]...
However specifically for info type "SF" which represents the source file
there can only be one field.
This caused files that are using ',' in their names to be cut down up to
the first ',' and as a result not handled correctly by lcov_utils.py,
especially when rewriting a file.
This patch adds a special handling for the "SF" INFO_TYPE.
ref : `man geninfo`
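
The special case can be illustrated with a tiny parser. This is a sketch of the described rule, not the code in lcov_utils.py; `parse_lcov_line` is a hypothetical function.

```python
def parse_lcov_line(line):
    """Split an lcov tracefile line into (info_type, fields)."""
    info_type, _, rest = line.rstrip("\n").partition(":")
    if info_type == "SF":
        # The source file is a single field, even if its name contains commas.
        return info_type, [rest]
    fields = rest.split(",") if rest else []
    return info_type, fields
```

With this rule, `SF:/src/a,b.cc` keeps the whole path as one field, while a record such as `DA:12,3` is still split into its two fields.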

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-25 09:30:52 +02:00
Eliran Sinvani
11eb9f5bb2 coverage_utils.py: make regular expression lazy in html-fixup
The html-fixup procedure was created because of a bug in genhtml (`man
genhtml` for details about what genhtml is). The bug is that genhtml
doesn't account for file names that contain illegal URL characters (ref:
https://stackoverflow.com/a/1547940/2669716). html-fixup converts those
characters to the %<octet> notation (i.e. the space character becomes %20,
etc.). However, the regular expression used to detect links was eager (greedy),
which didn't account for multiple links in the same line. This was
discovered while browsing one of the reports and noticing that the links
that are meant to alternate between code view and function view of a
source got scrambled and unusable after html-fixup.
This change makes the regex that is used to detect links lazy so it can
handle multiple links in the same line in an html file correctly.
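
The difference between the two quantifiers is easy to reproduce. The patterns below are illustrative and are not the expression used in coverage_utils.py; `fix_links` is a hypothetical helper.

```python
import re
from urllib.parse import quote

GREEDY = re.compile(r'href="(.*)"')   # .* runs to the last quote on the line
LAZY = re.compile(r'href="(.*?)"')    # .*? stops at the first closing quote

line = '<a href="a b.html">code</a> <a href="c d.html">functions</a>'


def fix_links(html_line):
    """Percent-encode every link target found on the line."""
    return LAZY.sub(lambda m: 'href="' + quote(m.group(1)) + '"', html_line)
```

On the sample line, `GREEDY.findall(line)` returns a single match that swallows both links, whereas `LAZY.findall(line)` returns the two targets separately, so each one can be encoded on its own.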

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-25 09:30:42 +02:00
Nadav Har'El
69a68e35dd Merge 'scylla-sstable: add support for loading schema of views and indexes' from Botond Dénes
Loading schemas of views and indexes was not supported, with either `--schema-file`, or when loading schema from schema sstables.
This PR addresses both:
* When loading schema from CQL (file), `CREATE MATERIALIZED VIEW` and `CREATE INDEX` statements are now also processed correctly.
* When loading schema from schema tables, `system_schema.views` is also processed, when the table has no corresponding entry in `system_schema.tables`.

Tests are also added.

Fixes: #16492

Closes scylladb/scylladb#16517

* github.com:scylladb/scylladb:
  test/cql-pytest: test_tools.py: add schema-loading tests for MV/SI
  test/cql-pytest: test_tools.py: extract some fixture logic to functions
  test/cql-pytest: test_tools.py: extract common schema-loading facilities into base-class
  tools/schema_loader: load_schema_from_schema_tables(): add support for MV/SI schemas
  tools/schema_loader: load_one_schema_from_file(): add support for view/index schemas
  test/boost/schema_loader_test: add test for mvs and indexes
  tools/schema_loader: load_schemas(): implement parsing views/indexes from CQL
  replica/database: extract existing_index_names and get_available_index_name
  tools/schema_loader: make real_db.tables the only source of truth on existing tables
  tools/schema_loader: table(): store const keyspace&
  tools/schema_loader: make database,keyspace,table non-movable
  cql3/statements/create_index_statement: build_index_schema(): include index metadata in returned value
  cql3/statements/create_index_statement: make build_index_schema() public
  cql3/statements/create_index_statement: relax some method's dependence on qp
  cql3/statements/create_view_statement: make prepare_view() public
2024-01-24 23:36:54 +02:00
Nadav Har'El
df6c9828ef Merge 'Add protobuf and Native histogram support' from Amnon Heiman
Native histograms (also known as sparse histograms) are an experimental Prometheus feature.
They use protobuf as the reporting layer.
Native histograms hold the benefits of high resolution at a lower resource cost.

This series allows sending histograms in a native histogram format over protobuf.
By default, protobuf support is disabled. To use protobuf with native histograms, the command line flag prometheus_allow_protobuf should be set to true, and the Prometheus server should send the accept header with protobuf.

Fixes #12931

Closes scylladb/scylladb#16737

* github.com:scylladb/scylladb:
  main.cc: Add prometheus_allow_protobuf command line
  histogram_metrics_helper: support native histogram
  config: Add prometheus_allow_protobuf flag
2024-01-24 21:24:50 +02:00
Michał Chojnowski
f0eadc734e test: test_tablets: add a test for cleanup after migration
Reproduces the problems fixed by earlier commits in the series.
2024-01-24 19:36:29 +01:00
Botond Dénes
7bb3ed7f23 docs/operating-scylla: scylla-sstable.rst: fix checksum list
Add empty line before list of different checksums in
validate-checksums's description. Otherwise the list is not rendered.

Closes scylladb/scylladb#16401
2024-01-24 16:34:13 +01:00
Kefu Chai
a9851cf834 test.py: replace "$foo is False" with "not $foo"
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16960
2024-01-24 15:21:53 +02:00
Kefu Chai
add74ec8ee mutation_writer: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16958
2024-01-24 15:20:02 +02:00
Kefu Chai
c978d1b3f8 config: s/re-use/reuse/
this misspelling is identified by codespell.
per m-w, reuse is a word per-se, and we don't need the hyphen for
addressing the ambiguity in the use cases, like, recover and re-cover.
see also https://www.merriam-webster.com/dictionary/reuse

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16962
2024-01-24 15:19:03 +02:00
Kefu Chai
8c39aba820 tools/scylla-sstable: use canonical path for sst_path
we deduce the paths to other SSTable components from the one
specified from the command line, for instance, if
/a/b/c/me-really-big-Data.db is fed to `scylla sstable`, the tool
would try to read /a/b/c/me-really-big-TOC.txt for the list of
other components. this works fine if the full path is specified
in the command line.

but if a relative path is specified, like, "me-really-big-Data.db",
this does not work anymore. before this change, the tool
would be reading "/me-really-big-TOC.txt", which does not exist
under most circumstances. while $PWD/me-really-big-TOC.txt should
exist if the SSTable is sane.

after this change, we always convert the specified path to
its canonical representation, whether it is relative or absolute.
this enables us to get the correct parent path when trying
to read, for instance, the TOC component.
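
One way such a bug arises is when the (empty) parent of a bare file name is joined to the sibling name with a separator. The Python sketch below reproduces that effect on a POSIX system; it is an analogy, not the C++ code of the tool, and both helper names are hypothetical.

```python
import os


def sibling_naive(sst_path, component):
    # dirname() of a bare file name is "", so this yields "/<component>".
    return os.path.dirname(sst_path) + "/" + component


def sibling_canonical(sst_path, component):
    # realpath() makes the path absolute first, so the parent is never empty.
    parent = os.path.dirname(os.path.realpath(sst_path))
    return os.path.join(parent, component)
```

For `me-really-big-Data.db`, the naive variant produces `/me-really-big-TOC.txt`, while the canonical variant produces the TOC path inside the current working directory.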

Fixes #16955
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16964
2024-01-24 13:28:40 +02:00
Michał Chojnowski
b88a0eb9ab test: pylib: add ScyllaCluster.wipe_sstables
Add a method which wipes sstables files for a particular table on a particular
stopped node.
2024-01-24 11:52:49 +01:00
Michał Chojnowski
94cdfcaa94 test: boost: add commitlog_cleanup_test
Adds a test for the commitlog cleanup functionality added
earlier in the series.
2024-01-24 10:37:39 +01:00
Michał Chojnowski
a246bb39ef db: commitlog_replayer: ignore mutations affected by (tablet) cleanups
To avoid data resurrection, mutations deleted by cleanup operations
have to be skipped during commitlog replay.

This patch implements this, based on the metadata recorded on cleanup
operations into system.commitlog_cleanups.
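
Conceptually, replay then becomes a filter over the logged mutations. The sketch below is a guess at the shape of that filter, for illustration only: the record fields (table, token range, replay position) and the comparison used are assumptions, not the schema of system.commitlog_cleanups or the replayer's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleanupRecord:      # hypothetical shape of one cleanup record
    table: str
    first_token: int
    last_token: int
    position: int         # replay position at which the cleanup happened


@dataclass(frozen=True)
class LoggedMutation:     # hypothetical shape of one commitlog entry
    table: str
    token: int
    position: int


def should_replay(mutation, cleanups):
    """Skip mutations that a recorded cleanup has already removed."""
    for record in cleanups:
        covered = (record.table == mutation.table
                   and record.first_token <= mutation.token <= record.last_token)
        # Assumed rule: entries written at or before the cleanup are dropped.
        if covered and mutation.position <= record.position:
            return False
    return True
```

Mutations written after the cleanup, or outside the cleaned range, are still replayed under this model.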
2024-01-24 10:37:39 +01:00
Michał Chojnowski
f458a1bf3e replica: table: garbage-collect irrelevant system.commitlog_cleanups records
Currently, rows in system.commitlog_cleanups are only dropped on node restart,
so the table can accumulate an unbounded number of records.

This probably isn't a problem in practice, because tablet cleanups aren't that
frequent, but this patch adds a countermeasure anyway.

This patch makes the choice to delete the unneeded records right when new records
are added. This isn't ideal -- it would be more natural if the unneeded records
were deleted as soon as they become unneeded -- but it does the job with a
minimal amount of code.
2024-01-24 10:37:38 +01:00
Michał Chojnowski
05ff32ebf9 db: commitlog: add min_position()
Add a helper function which returns the minimum replay position
across all existing or future commitlog segments.
Only positions greater than or equal to it can be replayed on the next reboot.

We will use this helper in a future patch to garbage collect some cleanup
metadata which refers to replay positions.
2024-01-24 10:37:38 +01:00
Michał Chojnowski
a10650959c replica: table: populate system.commitlog_cleanups on tablet cleanup
To avoid data resurrection after cleanup, we have to filter out the
cleaned mutations during commitlog replay.

In this patch, we get tablet cleanup to record the affected set of mutations
to system.commitlog_cleanups. In a later patch, we will use these records
for filtering during commitlog replay.
2024-01-24 10:37:38 +01:00
Michał Chojnowski
7c5a8894be db: system_keyspace: add system.commitlog_cleanups
Add a system table which will hold records of cleanup operations,
for the purpose of filtering commitlog replays to avoid data
resurrections.
2024-01-24 10:37:38 +01:00
Michał Chojnowski
8bfd078c54 replica: table: refresh compound sstable set after tablet cleanup
If the compound set isn't refreshed, readers will keep seeing
the dataset as it was before the cleanup, which is a bug.
2024-01-24 10:37:38 +01:00
Kefu Chai
207fe93b90 utils: add formatter for rjson::value
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for rjson::value, and drop its
operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16956
2024-01-24 10:30:52 +02:00
Gleb Natapov
b97ff54a41 storage service: topology coordinator: call notify_left() when a node leaves a cluster
When the topology coordinator is used for topology changes the gossiper
based code that calls notify_left() is not called. The coordinator needs
to call it itself.
2024-01-24 10:21:01 +02:00
Gleb Natapov
5459a8b9a5 storage_service: drop redundant check from notify_joined()
notify_joined() is called from handle_state_normal only, so there is no
point checking that the state is normal inside the function as well.
2024-01-24 10:17:12 +02:00
Avi Kivity
8ee75ae8f4 sstables: writer: don't require effective_replication_map for sharding metadata
Currently, we pass an effective_replication_map_ptr to sstable_writer,
so that we can get a stable dht::sharder for writing the sharding metadata.
This is needed because with tablets, the sharder can change dynamically.

However, this is both bad and unnecessary:
 - bad: holding on to an effective_replication_map_ptr is a barrier
   for topology operations, preventing tablet migrations (etc) while
   an sstable is being written
 - unnecessary: tablets don't require sharding metadata at all, since
   two tablets cannot overlap (unlike two sstables from different shards in
   the same node). So the first/last key is sufficient to determine the
   shard/tablet ownership.

Given that, just pass the sharder for vnode sstables, and don't generate
sharding metadata for tablet sstables.
2024-01-23 22:23:08 +02:00
Avi Kivity
b88f422a53 schema: provide method to get sharder, iff it is static
The current get_sharder() method only allows getting a static sharder
(since a dynamic sharder needs additional protection). However, it
chooses to abort if someone attempts to get a dynamic sharder.

In one case, it's useful to get a sharder only if it's static, so
provide a method to do that. This is for providing sstable sharding
metadata, which isn't useful with tablets.
2024-01-23 22:20:59 +02:00
Kamil Braun
05643208a8 Merge 'raft topology: move the topology coordinator to a dedicated file' from Piotr Dulikowski
The `topology_coordinator` is a large class (>1000 loc) which resides in
an even larger source file (storage_service.cc, ~7800 loc). This PR
moves the topology_coordinator class out of the storage_service.cc file
in order to improve modularity and recompilation times during
development.

As a first step, the `topology_mutation_builder` and
`topology_node_mutation_builder` classes are also moved from
storage_service.cc to their own, new header/source files as they are an
important abstraction used both by the topology coordinator code and
some other code in storage_service.cc that won't be moved.

Then, the `topology_coordinator` is moved out. The
`topology_coordinator` class is completely hidden in the new
topology_coordinator.cc file and can only be started and waited on to
finish via the new `run_topology_coordinator` function.

Fixes: scylladb/scylladb#16605

Closes scylladb/scylladb#16609

* github.com:scylladb/scylladb:
  service: move topology coordinator to a separate file
  storage_service: introduce run_topology_coordinator function
  service: move topology mutation builder out of storage_service
  storage_service: detemplate topology_node_mutation_builder::set
2024-01-23 20:02:06 +01:00
Kefu Chai
f86a5ae87a streaming: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16947
2024-01-23 19:38:30 +02:00
Kefu Chai
d493f949ca cql3: add formatter for cql3::statements::statement_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
cql3::statements::statement_type. and its operator<<() is dropped.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16948
2024-01-23 19:36:24 +02:00
Piotr Dulikowski
c3c3f5c1c8 service: move topology coordinator to a separate file
The topology coordinator is a large class that sits in an even larger
storage_service.cc file. For the sake of code modularization and
reducing recompilation time, move the topology coordinator outside
storage_service.cc.

The topology_coordinator class is moved to the new
topology_coordinator.cc unchanged. Along with it, the following items
are moved:

- wait_for_ip function - it's used both by storage_service and
  topology_coordinator, so in order for the new topology_coordinator.cc
  not to depend on storage service, it is moved to the new file,
- raft_topology logger - for the same reason as wait_for_ip,
- run_topology_coordinator - serves as the main interface for the
  topology coordinator. The topology coordinator class is not exposed at
  all, it's only possible to start the coordinator and wait until it
  shuts down itself via that function.
2024-01-23 17:51:10 +01:00
Avi Kivity
4a57b67634 docs: add a rough diagram of module interaction
It is incomplete and maybe inaccurate, but it is a start.

Closes scylladb/scylladb#16903
2024-01-23 18:08:48 +02:00
Kamil Braun
1824c12975 raft: remove empty() from fsm_output
Nobody remembered to keep this function up to date when adding stuff to
`fsm_output`.

Turns out that it's not being used by any Raft logic but only in some
tests. That use case can now be replaced with `fsm::has_output()` which
is also being used by `raft::server` code.
2024-01-23 16:48:28 +01:00
Kamil Braun
bf6d5309ca test: add test for manual triggering of Raft snapshots 2024-01-23 16:48:28 +01:00
Kamil Braun
617e09137d api: add HTTP endpoint to trigger Raft snapshots
This uses the `trigger_snapshot()` API added in the previous commit on a
server running for the given Raft group.

It can be used for example in tests or in the context of disaster
recovery (ref scylladb/scylladb#16683).
2024-01-23 16:48:28 +01:00
Kamil Braun
0eda7a2619 raft: server: add trigger_snapshot API
This allows the user of `raft::server` to ask it to create a snapshot
and truncate the Raft log. In a later commit we'll add a REST endpoint
to Scylla to trigger group 0 snapshots.

One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.

In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44)
but a problem remains with these existing deployments coming from 5.2:
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).

Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).
2024-01-23 16:48:28 +01:00
David Garcia
77822fc51d chore: add azure and gcp images extensions
Closes scylladb/scylladb#16942
2024-01-23 16:06:40 +02:00
Botond Dénes
e79ea91990 Merge 'Extend query tracing information' from Michał Jadwiszczak
This little patch adds:
- authenticated user to "Processing a statement" tracing log
- name of a semaphore to reader concurrency semaphore logs

The purpose of this patch is to be able to verify parts of query execution to track down issues with service levels.

```
cassandra@cqlsh> select * from ks1.t where a = 1;

 a | b
---+---

(0 rows)

Tracing session: ea7e5ce0-b9f5-11ee-b123-b0816809f2c0

 activity                                                                                                                                     | timestamp                  | source    | source_elapsed | client
----------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                                           Execute CQL3 query | 2024-01-23 14:47:14.734000 | 127.0.0.1 |              0 | 127.0.0.1
                                                                                                         Parsing a statement [shard 1/sl:sl1] | 2024-01-23 14:47:14.734126 | 127.0.0.1 |              3 | 127.0.0.1
                                                                    Processing a statement for authenticated user: cassandra [shard 1/sl:sl1] | 2024-01-23 14:47:14.734279 | 127.0.0.1 |            156 | 127.0.0.1
      Creating read executor for token -4069959284402364209 with all: {127.0.0.2} targets: {127.0.0.2} repair decision: NONE [shard 1/sl:sl1] | 2024-01-23 14:47:14.737348 | 127.0.0.1 |           3225 | 127.0.0.1
   Creating never_speculating_read_executor - speculative retry is disabled or there are no extra replicas to speculate with [shard 1/sl:sl1] | 2024-01-23 14:47:14.737351 | 127.0.0.1 |           3228 | 127.0.0.1
                                                                                  read_data: sending a message to /127.0.0.2 [shard 1/sl:sl1] | 2024-01-23 14:47:14.737358 | 127.0.0.1 |           3236 | 127.0.0.1
                                                                                 read_data: message received from /127.0.0.1 [shard 1/sl:sl1] | 2024-01-23 14:47:14.737593 | 127.0.0.2 |             16 | 127.0.0.1
                                                        Start querying singular range {{-4069959284402364209, 000400000001}} [shard 0/sl:sl1] | 2024-01-23 14:47:14.737676 | 127.0.0.2 |             24 | 127.0.0.1
                                                                  [reader concurrency semaphore sl:sl1] admitted immediately [shard 0/sl:sl1] | 2024-01-23 14:47:14.737684 | 127.0.0.2 |             31 | 127.0.0.1
                                                                        [reader concurrency semaphore sl:sl1] executing read [shard 0/sl:sl1] | 2024-01-23 14:47:14.737688 | 127.0.0.2 |             35 | 127.0.0.1
                                    Querying cache for range {{-4069959284402364209, 000400000001}} and slice {(-inf, +inf)} [shard 0/sl:sl1] | 2024-01-23 14:47:14.737715 | 127.0.0.2 |             63 | 127.0.0.1
 Page stats: 0 partition(s), 0 static row(s) (0 live, 0 dead), 0 clustering row(s) (0 live, 0 dead) and 0 range tombstone(s) [shard 0/sl:sl1] | 2024-01-23 14:47:14.737724 | 127.0.0.2 |             72 | 127.0.0.1
                                                                                                            Querying is done [shard 0/sl:sl1] | 2024-01-23 14:47:14.737731 | 127.0.0.2 |             79 | 127.0.0.1
                                                                read_data handling is done, sending a response to /127.0.0.1 [shard 1/sl:sl1] | 2024-01-23 14:47:14.738321 | 127.0.0.2 |            743 | 127.0.0.1
                                                                                     read_data: got response from /127.0.0.2 [shard 1/sl:sl1] | 2024-01-23 14:47:14.739148 | 127.0.0.1 |           5026 | 127.0.0.1
                                                                                        Done processing - preparing a result [shard 1/sl:sl1] | 2024-01-23 14:47:14.739196 | 127.0.0.1 |           5074 | 127.0.0.1
                                                                                                                             Request complete | 2024-01-23 14:47:14.739087 | 127.0.0.1 |           5087 | 127.0.0.1

```

Closes scylladb/scylladb#16920

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: add name of semaphore in tracing messages
  cql3:query_processor: add logged user to query tracing info
2024-01-23 16:06:16 +02:00
Piotr Dulikowski
4ad6b6563b storage_service: introduce run_topology_coordinator function
Extracts a part of the logic of the raft_state_monitor_fiber method into
a separate function. It will be moved to a separate file in the next
commit along with the topology coordinator, and will serve as the only
way of interaction with the topology coordinator while the class itself
will remain hidden.

The topology_coordinator class is now directly constructed on the stack
(or rather in the coroutine frame), the indirection via shared_ptr is no
longer needed.
2024-01-23 14:09:12 +01:00
Patryk Wrobel
f15880dc48 compaction_group::stop(): always call compaction_manager.remove()
Before the introduction of PR#15524, the removal had always been invoked
via a finally() continuation. Despite making flush() noexcept, the
mentioned PR modified the logic: if flush() returns an exceptional future,
the removal is not performed.

This change restores the old behavior: the removal operation is always called.
From now on, the logic of compaction_group::stop() is as follows:
 - firstly, it waits for completion of flush() via
   seastar::coroutine::as_future() to avoid premature exception
 - then it executes compaction_manager.remove()
 - in the end it inspects the future returned from flush()
   to re-throw the exception if the operation failed
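
The ordering above can be illustrated with a small Python sketch. It is not
the Seastar code from the patch: CompactionGroupSketch, CompactionManagerStub
and run_stop are hypothetical stand-ins, and the try/except that captures the
flush failure only plays the role that seastar::coroutine::as_future() plays
in the real change.

```python
import asyncio


class CompactionManagerStub:
    """Hypothetical stand-in that only records whether remove() ran."""

    def __init__(self):
        self.removed = False

    async def remove(self, group):
        self.removed = True


class CompactionGroupSketch:
    """Hypothetical stand-in for a compaction group, not the real class."""

    def __init__(self, manager, flush_error=None):
        self._manager = manager
        self._flush_error = flush_error

    async def flush(self):
        # Fail only when the caller asked for a failing flush.
        if self._flush_error is not None:
            raise self._flush_error

    async def stop(self):
        # 1. Wait for flush(), capturing a failure instead of raising early.
        flush_error = None
        try:
            await self.flush()
        except Exception as e:
            flush_error = e
        # 2. Always remove the group from the compaction manager.
        await self._manager.remove(self)
        # 3. Only now surface the flush failure, if there was one.
        if flush_error is not None:
            raise flush_error


def run_stop(flush_error=None):
    """Run stop() once and return (was remove() called, raised exception)."""
    manager = CompactionManagerStub()
    group = CompactionGroupSketch(manager, flush_error)
    try:
        asyncio.run(group.stop())
    except Exception as e:
        return manager.removed, e
    return manager.removed, None
```

In this sketch, run_stop(RuntimeError("flush failed")) reports that remove()
was called and still returns the flush error to the caller.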

Fixed: scylladb#16751

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16940
2024-01-23 14:56:27 +02:00
Botond Dénes
78ec96f5f3 Merge 'alternator: allow empty tag value' from Nadav Har'El
Alternator incorrectly refuses an empty tag value for TagResource, but DynamoDB does allow this case and it's useful (note that an empty tag key is rightly forbidden). So this short series fixes this case, and adds additional tests for TagResource which cover this case and other cases we forgot to cover in tests.

Fixes #16904.

Closes scylladb/scylladb#16910

* github.com:scylladb/scylladb:
  test/alternator: add more tests for TagResource
  alternator: allow empty tag value
2024-01-23 13:53:30 +02:00
Botond Dénes
26d814d8be Merge 'Configure initial tablets count scaling' from Pavel Emelyanov
There are currently two ways to "request" the number of initial tablets for a table:

1. specify it explicitly when creating a keyspace
2. let scylla calculate it on its own

Both are not very nice. The former doesn't take cluster layout into consideration. The latter does, but starts with one tablet per shard, which can be too low if the amount of data grows rapidly.

Here's a (maybe temporary) proposal to facilitate at least perf tests -- the --tablets-initial-scale-factor option that enhances option number two above by multiplying the calculated number of tablets by the configured number. This is what we currently do to run perf tests by patching scylla; with the option it is going to be more convenient.
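
As a rough illustration of the arithmetic, here is a minimal Python sketch. It assumes the calculated baseline is simply one tablet per shard summed over all nodes, and it ignores replication and any rounding the real allocator may apply. The function name `initial_tablet_count` and its parameters are hypothetical and do not come from the series.

```python
def initial_tablet_count(shards_per_node, scale_factor=1):
    """Sketch of the initial tablet count for a new table.

    shards_per_node: shard count of every node in the cluster.
    scale_factor: stands in for the --tablets-initial-scale-factor value
                  (assumed here to be a positive integer).
    """
    if scale_factor < 1:
        raise ValueError("scale_factor must be at least 1")
    # Assumed baseline: one tablet for every shard in the cluster.
    baseline = sum(shards_per_node)
    # The option multiplies the calculated baseline.
    return baseline * scale_factor
```

Under these assumptions, three nodes with 8 shards each start with 24 tablets, or 96 with a scale factor of 4.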

Closes scylladb/scylladb#16919

* github.com:scylladb/scylladb:
  config: Add --tablets-initial-scale-factor
  tablet_allocator: Add initial tablets scale to config
  tablet_allocator: Add config
2024-01-23 13:25:12 +02:00
Amnon Heiman
50b3078916 main.cc: Add prometheus_allow_protobuf command line
This patch adds the prometheus_allow_protobuf command line support.

When set to true, Prometheus will accept protobuf requests and will
reply with protobuf protocol.
This will also enable the experimental Prometheus Native Histograms.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-01-23 13:12:34 +02:00
Amnon Heiman
95d1146fea histogram_metrics_helper: support native histogram
approx_exponential_histogram uses logic similar to the Prometheus native
histogram. To allow Prometheus to send its data in the native histogram
format, it needs to report the schema and the min id (id of the first bucket).

This patch updates to_metrics_histogram to set those optional parameters,
leaving it to Prometheus to decide in what format the histogram will
be reported.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-01-23 13:12:34 +02:00
Amnon Heiman
fc9bd2de03 config: Add prometheus_allow_protobuf flag
Native histograms (also known as sparse histograms) are an experimental
Prometheus feature. They use protobuf as the reporting layer.  The
prometheus_allow_protobuf flag allows the user to enable protobuf
protocol. When this flag is set to true, and the Prometheus server sends
in the request that it accepts protobuf, the result will be in protobuf
protocol.
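
The decision described above can be sketched in Python as follows. This is
only an illustration of the negotiation, not the Seastar implementation: the
function name choose_exposition_format is hypothetical, and the sketch
ignores q-values and media type parameters in the Accept header.

```python
# Media type a Prometheus server uses to offer the protobuf exposition format.
PROTOBUF_MEDIA_TYPE = "application/vnd.google.protobuf"


def choose_exposition_format(accept_header, allow_protobuf):
    """Return "protobuf" or "text" for a scrape request.

    accept_header: value of the HTTP Accept header sent by the scraper.
    allow_protobuf: stands in for the prometheus_allow_protobuf flag.
    """
    if not allow_protobuf:
        # Flag is off: always reply in the text format.
        return "text"
    # Keep only the media type of each entry, dropping parameters.
    offered = [part.split(";")[0].strip().lower()
               for part in accept_header.split(",")]
    return "protobuf" if PROTOBUF_MEDIA_TYPE in offered else "text"
```

Both conditions must hold for a protobuf reply: the flag is set and the
request says it accepts protobuf. Otherwise the text format is used.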

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-01-23 13:12:07 +02:00
Piotr Dulikowski
79c3ed7fdb service: move topology mutation builder out of storage_service
The topology_mutation_builder, topology_node_mutation_builder and
topology_request_tracking_mutation_builder are currently used by
storage service - mainly, but not exclusively, by the topology
coordinator logic. As we are going to extract the topology coordinator
to a separate file, we need to move the builders to their own file as
well so that they will be accessible both by the topology coordinator
and the storage service.
2024-01-23 11:17:46 +01:00
Piotr Dulikowski
6f11651222 storage_service: detemplate topology_node_mutation_builder::set
One of the overloads of `topology_node_mutation_builder::set` is a
template which takes a std::set of things that convert to a sstring.
This was done to support sets of strings of different types (e.g.
sstring, string_view) but it turns out that only sstring is used at the
moment.

De-template the method as it is unnecessary for it to be a template.
Moreover, the `topology_node_mutation_builder` is going to be moved in
the next commit of the PR to a separate file, so not having template
methods makes the task simpler.
2024-01-23 11:17:46 +01:00
Nadav Har'El
830e52008d test/alternator: add more tests for TagResource
Issue #16904 discovered that Alternator refuses to allow an empty tag
value while it's useful (and DynamoDB allows it). This brought to my
attention that our test coverage of the TagResource operation was lacking.
So this patch adds more tests for some corner cases of TagResource which
we missed, including the allowed lengths of tag keys and values.

These tests reproduce #16904 (the case of empty tag value) and also #16908
(allowing and correctly counting unicode letters), and also add
regression testing to cases which we already handled correctly.

As usual, all the new tests also pass on DynamoDB.

Refs #16904
Refs #16908

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 11:55:22 +02:00
Nadav Har'El
08b26269d8 alternator: allow empty tag value
The existing code incorrectly forbids setting a tag on a table to an empty
string value, but this is allowed by DynamoDB and is useful, so we fix it
in this patch.

While at it, improve the error-checking code for tag parameters to
cleanly detect more cases (like missing or non-string keys or values).

The following patch is a test that fails before this patch (because
it fails to insert a tag with an empty value) and passes after it.

Fixes #16904.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 11:26:08 +02:00
Michał Jadwiszczak
49544c47a1 reader_concurrency_semaphore: add name of semaphore in tracing messages 2024-01-23 10:25:34 +01:00
Michał Jadwiszczak
aac90c1f92 cql3:query_processor: add logged user to query tracing info 2024-01-23 10:25:34 +01:00
Nadav Har'El
4d6b286345 test/alternator: add "--vnodes" option to run script
test/cql-pytest/run.py was recently modified to add the "tablets"
experimental feature, so test/alternator/run now also runs Scylla by
default with tablets enabled.

This is the correct default going forward, but in the short term it
would be nice to also have an option to easily do a manual test run
*without* tablets.

So this patch adds a "--vnodes" option to the test/alternator/run script.
This option causes "run" to run Scylla without enabling the "tablets"
experimental feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 10:53:23 +02:00
Nadav Har'El
c496d60716 alternator: use tablets by default, if available
Before this patch, Alternator tables did not use tablets even if this
feature was available - tablets had to be manually enabled per table
by using a tag. But recently we changed CQL to enable tablets by default
on all keyspaces (when the experimental "tablets" option is turned on),
so this patch does the same for Alternator tables:

1. When the "tablets" experimental feature is on, new Alternator tables
   will use tablets instead of vnodes. They will use the default choice
   of initial_tablets.

2. The same tag that in the past could be used to enable tablets on a
   specific table, now can be used to disable tablets or change the
   default initial_tablets for a specific table at creation time.

Fixes #16355

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 10:53:23 +02:00
Nadav Har'El
36f14f89df test/alternator: run some tests without tablets
If an Alternator table uses tablets (we'll turn this on in a following
patch), some tests are known to fail because of features not yet
supported with tablets, namely:

  Refs #16317 - Support Alternator Streams with tablets (CDC)
  Refs #16567 - Support Alternator TTL with tablets

This patch changes all tests failing on tablets due to one of these two
known issues to explicitly ask to disable tablets when creating their
test table. This means that at least we continue to test these two
features (Streams and TTL) even if they don't yet work with tablets.

We'll need to remember to remove this override when tablet support
for CDC and Alternator TTL arrives. I left a comment in the right
places in the code with the relevant issue numbers, to remind us what
to change when we fix those issues.

This patch also adds xfail_tablets and skip_tablets fixtures that can
be used to xfail or skip tests when running with tablets - but we
don't use them yet - and may never use them, but since I already wrote
this code it won't hurt having it, just in case. When running without
tablets, or against an older Scylla or on DynamoDB, the tests with
these marks are run normally.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 10:46:48 +02:00
Botond Dénes
08cf5ccd23 Merge 'Fix test_tablet_missing_data_repair' from Asias He
This PR fixes test_tablet_missing_data_repair and enables the test again.

If a node is not UP yet, repair in the test will be a partial repair. A partial repair does not repair all the data, which causes the check of rows after repair to fail. Check that nodes see each other as UP before repair.

Closes scylladb/scylladb#16930

* github.com:scylladb/scylladb:
  test: Enable test_tablet_missing_data_repair again
  test: Wait for nodes to be up when repair
  test: Check repair status in ScyllaRESTAPIClient
2024-01-23 10:38:13 +02:00
Anna Stuchlik
9076a944c5 doc: improve the ScyllaDB for Developers page
This commit improves the developer-oriented section
of the core documentation:

- Added links to the developer sections in the new
  Get Started guide (Develop with ScyllaDB and
  Tutorials and Example Projects) for ease of access.

- Replaced the outdated Learn to Use ScyllaDB page with
  a link to the up-to-date page in the Get Started guide.
  This involves removing the learn.rst file and adding
  an appropriate redirection.

- Removed the Apache Copyrights, as this page does not
  need it.

- Removed the Features panel box as there was only one
  feature listed, which looked weird. Also, we are in
  the process of removing the Features section.

Closes scylladb/scylladb#16800
2024-01-23 10:06:31 +02:00
Kefu Chai
ac473eca91 utils:: add formatter for enum_option
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for enum_option<>. since its
operator<<() is still used by the homebrew generic formatter for
formatting vector<>, operator<<() is preserved.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16917
2024-01-23 10:03:51 +02:00
Kefu Chai
91a93b125b utils:: add formatter for cql3::authorized_prepared_statements_cache_key
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
cql3::authorized_prepared_statements_cache_key, and remove its
operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16924
2024-01-23 09:13:14 +02:00
Kefu Chai
76b9e4f4f4 locator: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16914
2024-01-23 09:12:23 +02:00
Asias He
99e3d2ce72 test: Enable test_tablet_missing_data_repair again
Fixes #16859
2024-01-23 15:02:02 +08:00
Kefu Chai
db77587309 tracing: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16925
2024-01-23 08:57:11 +02:00
Kefu Chai
26004071b3 configure.py: reenable -Wnarrowing
it seems that the tree builds just fine with this warning enabled.
and narrowing is a potentially unsafe numeric conversion. so let's
enable this warning option.

this change also helps to reduce the difference between the rules
generated by configure.py and those generated by CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16929
2024-01-23 08:49:25 +02:00
Kefu Chai
5005e0a156 configure.py: s/--std=/-std/
neither clang nor gcc supports the --std flag, they support -std=
though. see https://clang.llvm.org/cxx_status.html and
https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
so, let's use the -std=gnu++20 for the C++20 standard with GNU
extensions.

this change also helps to reduce the difference between the rules
generated by `configure.py` and those generated by CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16928
2024-01-23 08:48:05 +02:00
Asias He
7c230f17cc test: Wait for nodes to be up when repair
If a node is not UP yet, repair in the test will be a partial repair.
Check nodes see each other as UP before repair.
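
A minimal Python sketch of such a check is shown below. It is not the code
added to the test: wait_until_all_up and the sees_as_up callback are
hypothetical, and the real test presumably asks each server for its view of
the cluster through the test framework.

```python
import time


def wait_until_all_up(nodes, sees_as_up, timeout=60.0, poll_interval=0.5,
                      clock=time.monotonic, sleep=time.sleep):
    """Block until every node sees every other node as UP.

    sees_as_up(a, b) is a caller-supplied callback returning True when
    node a currently considers node b to be UP.
    """
    deadline = clock() + timeout
    while True:
        # Collect every (observer, observed) pair that is not UP yet.
        pending = [(a, b) for a in nodes for b in nodes
                   if a != b and not sees_as_up(a, b)]
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"nodes not UP in time: {pending}")
        sleep(poll_interval)
```

Running repair only after this returns avoids the partial repair described
above, at the cost of a bounded wait.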

Fixes #16859
2024-01-23 11:10:08 +08:00
Asias He
57a4e5594d test: Check repair status in ScyllaRESTAPIClient
Raise an exception in case the repair is not successful.
2024-01-23 11:10:08 +08:00
Tomasz Grabiec
06c42681bd tests: tablets: Add tests for removenode and replace 2024-01-23 01:19:42 +01:00
Tomasz Grabiec
e5dcf03b88 tablets: Add support for removenode and replace handling
New tablet replicas are allocated synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.

Tablet scheduler is responsible for scheduling tablet transition which
changes the replicas set. The infrastructure for handling decommission
in tablet scheduler is reused for this.

Scheduling is done incrementally, respecting per-shard load
limits. Rebuilding transitions are recognized by load calculation to
affect all tablet replicas.

A new kind of tablet transition, called "rebuild", is introduced. It
adds a new tablet replica and rebuilds it from existing replicas. Other
than that, the transition goes through the same stages as regular
migration to ensure safe synchronization with request coordinators.

In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.

Fixes #16690.
2024-01-23 01:19:42 +01:00
Tomasz Grabiec
bdd5bdae14 topology_coordinator: tablets: Do not fail in a tight loop
If streaming or cleanup RPC fails, we would retry immediately. That
fills the logs with errors. Throttle them by sleeping on error before
the same action is retried.
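
The idea can be sketched in Python as follows. This is an illustration only:
retry_throttled, its parameters and the fixed delay are hypothetical and are
not taken from the topology coordinator code.

```python
import time


def retry_throttled(action, retry_delay=1.0, max_attempts=None,
                    sleep=time.sleep, log=print):
    """Call action() until it succeeds, sleeping after every failure.

    Without the sleep, a persistently failing action would be retried in a
    tight loop and log an error on every iteration.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return action()
        except Exception as e:
            log(f"attempt {attempt} failed: {e}")
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # Throttle: wait before retrying the same action.
            sleep(retry_delay)
```

With a retry_delay of one second, a persistently failing RPC produces at most
about one error message per second instead of flooding the log.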
2024-01-23 01:19:42 +01:00
Tomasz Grabiec
a3f6682ba2 topology_coordinator: tablets: Avoid warnings about ignored failured future 2024-01-23 01:18:10 +01:00
Tomasz Grabiec
5fccee3a13 storage_service, topology: Track excluded state in locator::topology
Will be used by tablet load balancer to avoid excluded nodes in
scheduling.
2024-01-23 01:12:58 +01:00
Tomasz Grabiec
d59db94f3c raft topology: Introduce param-less topology::get_excluded_nodes()
Picks up currently excluded nodes. Will be used during tablet rebuild
on removenode.
2024-01-23 01:12:58 +01:00
Tomasz Grabiec
d053c5ef1e raft topology: Move get_excluded_nodes() to topology
Will be accessed outside topology coordinator from tablet rebuild handler.
2024-01-23 01:12:58 +01:00
Tomasz Grabiec
92f01674f2 tablets: load_balancer: Generalize load tracking
This patch removes some duplication of logic and implicit assumptions
by creating clear algebra for load impact calculation and its
application to state of the load balancer.

Will make adding new kinds of tablet transitions with different impact
on load much easier.
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
649ca0e46c tablets: Introduce get_migration_streaming_info() which works on migration request
Will be used by tablet load balancer to compute impact on load of
planned migrations. Currently, the logic is hard coded in the load
balancer and may get out of sync with the logic we have in
get_migration_streaming_info() for already running tablet transitions.

The logic will become more complex for rebuild transition, so use
shared code to compute it.
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
6dc56fd80b tablets: Move migration_to_transition_info() to tablets.hh 2024-01-23 01:12:57 +01:00
Tomasz Grabiec
1df256221c tablets: Extract get_new_replicas() which works on migraiton request
Now we have a single place which translates tablet migration request to new
replicas.

Will be reused in other places.
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
ae382196f1 tablets: Move tablet_migration_info to tablets.hh
Will add methods which operate on it to tablets.hh where they belong.
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
4a06ffb43c tablets: Store transition kind per tablet
Will be used to distinguish regular migration from rebuild, repair and
RF change.
2024-01-23 01:12:57 +01:00
Pavel Emelyanov
d1d4620af8 config: Add --tablets-initial-scale-factor
The previous patch taught the tablet allocator to multiply the initial tablets
count by some value. This patch makes this factor configurable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-22 19:18:18 +03:00
Pavel Emelyanov
eb3b237e05 tablet_allocator: Add initial tablets scale to config
When allocating tablets for a table for the first time, their initial count
is calculated so that each shard in a cluster gets one tablet. It may
happen that more than one initial tablet per shard is better, e.g. perf
tests typically rely on that.

It's possible to specify the initial tablets count when creating a
keyspace, but this number doesn't take the cluster topology into
consideration and may also be not very nice.

As a temporary solution (e.g. for perf tests) we may add a configurable
option that scales the initial number of calculated tablets by some factor.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-22 19:14:45 +03:00
Pavel Emelyanov
f57b194db0 tablet_allocator: Add config
Tablet allocator is a sharded service that starts in main, so it's worth
equipping it with a config. Next patches will fill it with some payload.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-22 19:13:58 +03:00
Kamil Braun
3268be3860 raft: server: track last persisted snapshot descriptor index
Also introduce a condition variable notified whenever this index is
updated.

Will be used in following commits.
2024-01-22 16:48:08 +01:00
Kamil Braun
1e786d9d64 raft: server: framework for handling server requests
Add data structures and modify `io_fiber` code to prepare it for
handling requests generated by the `server`, not just `fsm`.
Used in later commits.
2024-01-22 16:47:34 +01:00
Kefu Chai
33794eca19 database: wait until commitlog are reclaimed in flush_all_tables()
this change addresses the possible data resurrection after
"nodetool compact" and "nodetool flush" commands, and prepares for
the fix of a similar data resurrection issue after "nodetool cleanup".

active commitlog segments are recycled in the background once they are
discarded.

and there is a chance that we could have data resurrection even after
"nodetool cleanup", because the mutations in commitlog's active segments
could change the tables which are supposed to be removed by
"nodetool cleanup", so as a solution to address this problem in the
pre-tablets era, we force new active segments of commitlog, and flush the
involved memtables. since the active segments are discarded in the
background, the completion of the "nodetool cleanup" does not guarantee
that these mutations won't be applied to memtable when server restarts,
if it is killed right away.

the same applies to "force_flush", "force_compaction" and
"force_keyspace_compaction" API calls which are used by nodetool as
well. quote from Benny's comment

> If major compaction doesn't wait for the commitlog deletion it is
> also exposed to data resurrection since theoretically it could purge
> tombstones based on the assumption that commitlog would not resurrect
> data that they might shadow, BUT on a crash/restart scenario commitlog
> replay would happen since the commitlog segments weren't deleted -
> breaking the contract with compaction.

so to ensure that the active segments are reclaimed upon completion of
"nodetool cleanup", "nodetool compact" and "nodetool flush" commands,
let's wait for pending deletes in `database::flush_all_tables()`, so the
caller waits until the reclamation of deleted active segments completes.

Refs #4734
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16915
2024-01-22 17:31:57 +02:00
David Garcia
f3eeba8cc6 docs: parse config.cc properties as rst text
This enhancement formats descriptions in config.cc using the standard markup language reStructuredText (RST).

By doing so, it improves the rendering of these descriptions in the documentation, allowing you to use various directives like admonitions, code blocks, ordered lists, and more.

Closes scylladb/scylladb#16311
2024-01-22 16:40:18 +02:00
Botond Dénes
a48881801a replica/tablets: drop keyspace_name from system.tablets partition-key
The name of the keyspace being part of the partition key is not useful,
the table_id already uniquely identifies the table. The keyspace name
being part of the key, means that code wanting to interact with this
table, often has to resolve the table id, just to be able to provide the
keyspace name. This is counterproductive, so make the keyspace_name
just a static column instead, just like table_name already is.

Fixes: #16377

Closes scylladb/scylladb#16881
2024-01-22 13:12:02 +01:00
Petr Gusev
6a4176c84f Update seastar submodule
* seastar 8b9ae36b...85359b28 (4):
  > rpc: extend the use_gate until request processing is finished

Fixes scylladb/scylladb#16382

  > scripts: Remove build.sh
  > build: do not install FindProtobuf.cmake
  > net: add missing include

Closes scylladb/scylladb#16883
2024-01-22 11:29:50 +01:00
Kamil Braun
1007ac4956 Merge 'sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes' from Petr Gusev
Before the patch we called `gossiper.remove_endpoint` for IP-s of the
left nodes. The problem is that in replace-with-same-ip scenario we
called `gossiper.remove_endpoint` for IP which is used by the new,
replacing node. The `gossiper.remove_endpoint` method puts the IP into
quarantine, which means gossiper will ignore all events about this IP
for `quarantine_delay` (one minute by default). If we immediately
replace the just-replaced node with the same IP again, the bootstrap will
fail since the gossiper events are blocked for this IP, and we won't be
able to resolve an IP for the new host_id.

Another problem was that we called gossiper.remove_endpoint method,
which doesn't remove an endpoint from `_endpoint_state_map`, only from
live and unreachable lists. This means the IP will keep circulating in
the gossiper message exchange between cluster nodes until full cluster
restart.

This patch fixes both of these problems. First, we rely on the fact that
when topology coordinator moves the `being_replaced` node to the left
state, the IP of the `replacing` node is known to all nodes. This means
before removing an IP from the gossiper we can check if this IP is
currently used by another node in the current raft topology. This is
done by constructing the `used_ips` map based on normal and transition
nodes. This map is cached to avoid quadratic behaviour.

Second, we call `gossiper.force_remove_endpoint`, not
`gossiper.remove_endpoint`. This function removes an IP from
`_endpoint_state_map`, as well as from live and unreachable lists.
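
The check described above can be sketched roughly as follows. This is a
simplified Python model, not the actual C++ code in `storage_service`: the
`Node` class, the string state names and the `ips_to_force_remove` helper
are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    host_id: str
    ip: str
    state: str  # "normal", "transition" or "left" (simplified, hypothetical)


def ips_to_force_remove(nodes):
    """Return IPs of left nodes that no normal/transition node still uses."""
    # Build the set of in-use IPs once, so the check for each left node
    # is a constant-time lookup instead of a scan over all nodes.
    used_ips = {n.ip for n in nodes if n.state in ("normal", "transition")}
    return sorted({n.ip for n in nodes if n.state == "left" and n.ip not in used_ips})


topology = [
    Node("old-host", "10.0.0.1", "left"),    # replaced node
    Node("new-host", "10.0.0.1", "normal"),  # replacing node, same IP
    Node("gone-host", "10.0.0.2", "left"),   # IP no longer used by anyone
]
result = ips_to_force_remove(topology)
print(result)
```

In this model only `10.0.0.2` is selected for removal; `10.0.0.1` is kept
because the replacing node still uses it.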

Closes scylladb/scylladb#16820

* github.com:scylladb/scylladb:
  get_peer_info_for_update: update only required fields in raft topology mode
  get_peer_info_for_update: introduce set_field lambda
  storage_service::on_change: fix indent
  storage_service::on_change: skip handle_state functions in raft topology mode
  test_replace_different_ip: check old IP is removed from gossiper
  test_replace: check two replace with same IP one after another
  storage_service: sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes
2024-01-22 11:25:55 +01:00
Botond Dénes
742bc1bd11 test/topology_experimental_raft: test_tablet.py: disable flaky test
Skip test_tablet_missing_data_repair: it fails frequently, breaking
promotion and CI. We can't revert because the PR introducing it was already
piled on. So disable it while the failure is investigated.

Refs: #16859

Closes scylladb/scylladb#16879
2024-01-22 11:49:05 +02:00
Avi Kivity
9e8b65f587 chunked_vector: remove range constructor
Standard containers don't have constructors that take ranges;
instead people use boost::copy_range or C++23 std::ranges::to.

Make the API more uniform by removing this special constructor.

The only caller, in a test, is adjusted.

Closes scylladb/scylladb#16905
2024-01-22 10:26:15 +02:00
Lakshmi Narayanan Sreethar
a1867986e7 test.py: deduce correct path for unit tests when built with cmake
Fix the path deduction for unit test executables when the source code is
built with cmake.

Fixes #16906

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16907
2024-01-22 10:03:44 +02:00
Nadav Har'El
0bef50ef0c cql-pytest: add "--vnodes" option to "run" script
Running test/cql-pytest/run now defaults to enabling the "tablets"
experimental feature when running Scylla - and tests detect this and
use this feature as appropriate. This is the correct default going
forward, but in the short term it would be nice to also have an
option to easily do a manual test run *without* tablets.

So this patch adds a "--vnodes" option to the test/cql-pytest/run
script. This option causes "run" to run Scylla without enabling the
"tablets" experimental feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16896
2024-01-22 09:35:11 +02:00
Anna Stuchlik
a462b914cb doc: add 2024.1 to the OSS vs. Enterprise matrix
This commit adds the information that
ScyllaDB Enterprise 2024.1 is based
on ScyllaDB Open Source 5.4
to the OSS vs. Enterprise matrix.

Closes scylladb/scylladb#16880
2024-01-22 09:25:08 +02:00
Kefu Chai
9550f29d22 cql3: add formatter for cql3::prepared_cache_key_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for cql3::prepared_cache_key_type
and cql3::prepared_cache_key_type::cache_key_type, and remove
their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16901
2024-01-21 19:12:59 +02:00
Avi Kivity
3092e3a5dc Merge 'doc: improvements to the Create Cluster page' from Anna Stuchlik
This PR:
- Removes the redundant information about previous versions from the Create Cluster page.
- Fixes language mistakes on that page, and replaces "Scylla" with "ScyllaDB".

(nobackport)

Closes scylladb/scylladb#16885

* github.com:scylladb/scylladb:
  doc: fix the language on the Create Cluster page
  doc: remove redundant info about old versions
2024-01-21 18:18:32 +02:00
Avi Kivity
5810396ba1 Merge 'Invalidate prepared statements for views when their schema changes.' from Eliran Sinvani
When a base table is altered, so are the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for row lifetime, i.e. virtual
columns).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect, as of this commit, is that prepared statements for views
which changed due to a base table change will be invalidated.

Fixes https://github.com/scylladb/scylladb/issues/16392

This series also adds a test which fails without this fix and passes when the fix is applied.

Closes scylladb/scylladb#16897

* github.com:scylladb/scylladb:
  Add test for mv prepared statements invalidation on base alter
  query processor: treat view changes at least as table changes
2024-01-21 17:43:49 +02:00
Kefu Chai
d1dd71fbd7 mutation: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16889
2024-01-21 16:58:26 +02:00
Kefu Chai
1ce58595aa dht: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16891
2024-01-21 16:56:16 +02:00
Kefu Chai
45c4f2039b cql3: add formatter for cql3::ut_name
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for cql3::ut_name, and remove
their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16890
2024-01-21 16:53:05 +02:00
Kefu Chai
f916286b25 index: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16892
2024-01-21 16:52:25 +02:00
Kefu Chai
ce076b5ae3 gossiping_property_file_snitch: drop unused using namespace
we don't use any symbol in this namespace, in this function, so drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16893
2024-01-21 16:48:37 +02:00
Eliran Sinvani
0e5a8cad62 Add test for mv prepared statements invalidation on base alter
Issue #16392 describes a bug where, when a base table is altered, its
materialized views' prepared statements are not invalidated, which in turn
causes them to return missing data.
This test reproduces this bug and serves as a regression test for this
problem.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-21 15:44:06 +02:00
Eliran Sinvani
5e33d9346b query processor: treat view changes at least as table changes
When a base table is altered, so are the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for row lifetime, i.e. virtual
columns).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect, as of this commit, is that prepared statements for views
which changed due to a base table change will be invalidated.

Fixes #16392

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-21 15:40:54 +02:00
Anna Stuchlik
652cf1fa70 doc: remove the 5.1-to-2022.2 upgrade guide
This commit removes the 5.1-to-2022.2 upgrade
guide - the upgrade guide for versions we
no longer support.
We should remove it while adding the 5.4-to-2024.1
upgrade guide (the previous commit).
2024-01-19 18:33:08 +01:00
Anna Stuchlik
3c17fca363 doc: add the 5.4-to-2024.1 upgrade guide
This commit adds the upgrade guide from
ScyllaDB Open Source 5.4 to ScyllaDB
Enterprise 2024.1.

The need to include the "Restore system tables"
step in rollback has been confirmed; see
https://github.com/scylladb/scylladb/issues/11907#issuecomment-1842657959

Fixes https://github.com/scylladb/scylladb/issues/16445
2024-01-19 18:23:37 +01:00
Petr Gusev
5de970e430 get_peer_info_for_update: update only required fields in raft topology mode
Some fields of system.peers table are updated
through raft, we don't need to peek them from gossiper.

The goal of the patch is to declare explicitly
which code is responsible for which fields.
In particular, in raft topology mode we don't
need to update raft-managed fields since
it's done in topology_state_load and
raft_ip_address_updater.
2024-01-19 20:37:12 +04:00
Petr Gusev
f51f843b67 get_peer_info_for_update: introduce set_field lambda
This is a refactoring commit. In the next commit
we'll add a parameter to this unified lambda and
this is easy to do if we have only one lambda and
not three.
2024-01-19 20:37:12 +04:00
Petr Gusev
37063e2432 storage_service::on_change: fix indent 2024-01-19 20:37:12 +04:00
Petr Gusev
8e6b569de5 storage_service::on_change: skip handle_state functions in raft topology mode
We don't need them in raft topology mode since the token_metadata
update happens in topology_state_load function. We lift the
_raft_topology_change_enabled checks from those functions to on_change.
2024-01-19 20:37:12 +04:00
Petr Gusev
1e00889842 test_replace_different_ip: check old IP is removed from gossiper
In this commit we modify the existing
test_replace_different_ip. We add the check that the old
IP is not contained in alive or down lists, which
means it's completely wiped from gossiper. This test is failing
without the force_remove_endpoint fix from
a previous commit. We also check that the state of
local system.peers table is correct.
2024-01-19 20:36:52 +04:00
Anna Stuchlik
d345a893d6 doc: fix the language on the Create Cluster page
This commit fixes language mistakes on
the Create Cluster page, and replaces
"Scylla" with "ScyllaDB".
2024-01-19 17:21:12 +01:00
Anna Stuchlik
af669dd7ae doc: remove redundant info about old versions
This commit removes the information about
old versions, which is redundant in the
upcoming version.
2024-01-19 17:06:34 +01:00
Anna Stuchlik
b1ba904c49 doc: remove upgrade for unsupported versions
This commit removes the upgrade guides
from ScyllaDB Open Source to Enterprise
for versions we no longer support.

In addition, it removes a link to
one of the removed pages from
the Troubleshooting section (the link is
redundant).

Closes scylladb/scylladb#16249
2024-01-19 15:59:35 +02:00
Mikołaj Grzebieluch
c589793a9e test.py: test_maintenance_socket: remove pytest.xfail
Issue https://github.com/scylladb/python-driver/issues/278 was fixed in
https://github.com/scylladb/python-driver/pull/279.

Closes scylladb/scylladb#16873
2024-01-19 14:54:15 +01:00
Botond Dénes
b50d9bb802 Merge 'Add code coverage support' from Eliran Sinvani
This mini-set includes code coverage support for ScyllaDB, it provides:
1. Support for building ScyllaDB with coverage support.
2. Utilities for processing coverage profiling data
3. test.py support for generation and processing of coverage profiling into an lcov trace files which can later be used to produce HTML or textual coverage reports.

Refs #16323

Closes scylladb/scylladb#16784

* github.com:scylladb/scylladb:
  Add code coverage documentation
  test.py: support code coverage
  code coverage: Add libraries for coverage handling
  test.py: support --coverage and --coverage-mode
  configure.py support coverage profiles on standard build modes
2024-01-19 15:27:44 +02:00
Pavel Emelyanov
e62114214f Merge 'More logging for Raft-based topology' from Kamil Braun
Currently if topology coordinator gets stuck in a CI test run it's hard to debug this (e.g. scylladb/scylladb#16708). We can add a lot of logging inside topology coordinator code to aid debugging, without spamming the logs -- these are relatively rare control plane events.

Closes scylladb/scylladb#16749

* github.com:scylladb/scylladb:
  test/pylib: scylla_cluster: enable raft_topology=debug level by default
  raft topology: increase level of some TRACE messages
  raft topology: log when entering transition states
  raft topology: don't include null ID in exclude_nodes
  raft topology: INFO log when executing global commands and updating topology state
  storage_service: separate logger for raft topology
2024-01-19 16:19:44 +03:00
Nadav Har'El
debf6753c7 Merge 'test/cql-pytest: run tests with tablets' from Botond Dénes
Add `--experimental-features=tablets` to both `test/cql-pytest/suite.yaml` and `test/cql-pytest/run.py`, so tablets are enabled. Detect tablet support in `conftest.py` and add an xfail and skip marker to mark tests that fail/crash with tablets. These are expected to be fixed soon.

Some tests checking things around alter-keyspace, had to force-disable tablets on the created keyspace, because tablets interfere with the test (a keyspace with tablets cannot have simple strategy for example).
Tablets were also interfering with `test_keyspace.py:test_storage_options_local`, because it is expecting `system_schema.scylla_keyspaces` to not have any entries for local storage keyspace, but they have it if tablets are enabled. Adjust the test to account for this.

Closes scylladb/scylladb#16840

* github.com:scylladb/scylladb:
  test/cql-pytest: run.py,suite.yaml: enable tablets by default
  test/cql-pytest: sprinkle xfail_tablets and skip_with_tablets as needed
  test/cql-pytest: disable tablets for some keyspace-altering tests
  test/cql-pytest: test_keyspace.py: test_storage_options_local(): fix for tablets
  test/cql-pytest: fix test_tablets.py to set initial_tablets correctly
  test/cql-pytest: add tablet detection logic and fixtures
  test/cql-pytest: extract is_scylla check into util.py
2024-01-19 13:38:56 +02:00
Kamil Braun
cc039498c6 Update tools/cqlsh submodule
* tools/cqlsh 426fa0ea...b8d86b76 (8):
  > Make cqlsh work with unix domain sockets

Fixes scylladb/scylladb#16489

  > Bump python-driver version
  > dist/debian: add trailer line
  > dist/debian: wrap long line
  > Draft: explicit build-time package dependencies
  > stop returning status_code=2 on schema disagreement
  > Fix minor typos in the code
  > Dockerfile: apt-get update and apt-get upgrade to get latest OS packages
2024-01-19 11:23:22 +01:00
Botond Dénes
04881b3915 test/cql-pytest: run.py,suite.yaml: enable tablets by default
All the preparations are done, the tests can now run with tablets.
2024-01-19 03:46:38 -05:00
Botond Dénes
075be5a04a test/cql-pytest: sprinkle xfail_tablets and skip_with_tablets as needed
For tests that cover functionality, which doesn't yet work with tablets.
These tests and the respective functionality they test, are expected to
be fixed soon, and then these fixtures will be removed.
2024-01-19 03:46:38 -05:00
Botond Dénes
6e6bee4368 test/cql-pytest: disable tablets for some keyspace-altering tests
When tablets are enabled on a keyspace, it cannot be altered to simple
replication strategy anymore.
These tests are testing exactly that, so disable tablets on the
initial keyspace create statements.
2024-01-19 03:46:38 -05:00
Botond Dénes
5f11aa940d test/cql-pytest: test_keyspace.py: test_storage_options_local(): fix for tablets
This test expects a keyspace with local storage option, to not have a
row in system_schema.scylla_keyspaces. With tablets enabled by default,
this won't be the case. Adjust the test to check for the specific
storage-related columns instead.
2024-01-19 03:46:38 -05:00
Nadav Har'El
f92d2b4928 test/cql-pytest: fix test_tablets.py to set initial_tablets correctly
Recently, in commit 49026dc319, the
way to choose the number of tablets in a new keyspace changed.
This broke the test we had for a memory leak when many tablets were
used, which saw the old syntax wasn't recognized and assumed Scylla
is running without tablet support - so the test was skipped.

Let's fix the syntax. After this patch the test passes if the tablets
experimental feature is enabled, and only skipped if it isn't.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-19 03:46:38 -05:00
Botond Dénes
2119faf7fe test/cql-pytest: add tablet detection logic and fixtures
Add keyspace_has_tablets() utility function, which, given a keyspace,
returns whether it is using tablets or not.
In addition, 3 new fixtures are added:
* has_tablets - does scylla have tablets by default?
* xfail_tablets - the test is marked xfail, when tablets are enabled by
  default.
* skip_with_tablets - the test is skipped when tablets are enabled by
  default, because it might crash with tablets.

We expect the latter two to be removed soon(ish), as we make all tests,
and the functionality they test work with tablets.
2024-01-19 03:46:38 -05:00
Botond Dénes
6e53264bc3 test/cql-pytest: extract is_scylla check into util.py
This logic is currently in the scylla_only fixture, but we want to
re-use this in other utility functions in the next patches too.
2024-01-19 03:46:38 -05:00
Petr Gusev
070de5c551 test_replace: check two replace with same IP one after another
This is a test case for the problem, described in the
previous commit. Before that fix the second replace
failed since it couldn't resolve an IP for the new host_id.
2024-01-19 12:24:04 +04:00
Petr Gusev
30b2e5838c storage_service: sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes
Before the patch we called gossiper.remove_endpoint for IP-s
of the left nodes. The problem is that in replace-with-same-ip
scenario we called gossiper.remove_endpoint for IP which is
used by the new, replacing node. The gossiper.remove_endpoint
method puts the IP into quarantine, which means gossiper will
ignore all events about this IP for quarantine_delay (one minute by
default). If we immediately replace the just-replaced node with
the same IP again, the bootstrap will fail since the gossiper
events are blocked for this IP, and we won't be able to
resolve an IP for the new host_id.

Another problem was that we called gossiper.remove_endpoint
method, which doesn't remove an endpoint from _endpoint_state_map,
only from live and unreachable lists. This means the IP
will keep circulating in the gossiper message exchange between cluster
nodes until full cluster restart.

This patch fixes both of these problems. First, we rely on
the fact that when topology coordinator moves the being_replaced
node to the left state, the IP of the replacing node is known to all nodes.
This means before removing an IP from the gossiper we can check if
this IP is currently used by another node in the current raft topology.
This is done by constructing the used_ips map based on normal and
transition nodes. This map is cached to avoid quadratic behaviour.

Second, we call gossiper.force_remove_endpoint, not
gossiper.remove_endpoint. This function removes an IP from
_endpoint_state_map, as well as from live and unreachable lists.

The tests for both of these improvements will be added in subsequent
commits.
2024-01-19 12:24:04 +04:00
Kefu Chai
0dbb0ed09f api: storage_service: correct a typo
s/trough/through/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16870
2024-01-19 10:21:41 +02:00
Kefu Chai
5c0484cb02 db: add formatter for db::operation_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for db::operation_type, and
remove their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16832
2024-01-19 10:16:41 +02:00
Kefu Chai
2d2cd5fa3a repair: do not compare unsigned with signed
this change should silence the warning like

```
/home/kefu/dev/scylladb/repair/repair.cc:222:23: error: comparison of integers of different signs: 'int' and 'size_type' (aka 'unsigned long') [-Werror,-Wsign-compare]
  222 |     for (int i = 0; i < all.size(); i++) {
      |                     ~ ^ ~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16867
2024-01-19 08:52:02 +02:00
Kefu Chai
21d55abe8b unimplemented: add format_as() for unimplemented::cause
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we replace operator<< with format_as() for
unimplemented::cause, so that we don't rely on the deprecated behavior,
and neither do we create a fully blown fmt::formatter. as in
fmt v10, format_as() can be used in place of fmt::formatter,
while in fmt v9, format_as() is only allowed to return an integer.
so, to be future-proof, and to be simpler, format_as() is used.
we can even replace `format_as(c)` with `c`, once fmt v10 is
available in future.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16866
2024-01-19 08:38:30 +02:00
Botond Dénes
70252ee36f Merge 'auth: do not include unused headers' from Kefu Chai
these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning.

Closes scylladb/scylladb#16868

* github.com:scylladb/scylladb:
  auth: do not include unused headers
  locator: Handle replication factor of 0 for initial_tablets calculations
  table: add_sstable_and_update_cache: trigger compaction only in compaction group
  compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
  compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
2024-01-19 08:30:11 +02:00
Kefu Chai
263e2fabae auth: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-19 10:49:17 +08:00
Avi Kivity
d65ce16cf6 Merge 'Prevent empty compaction tasks in cleanup, upgrade sstables, and add_sstable' from Benny Halevy
This short series prevents the creation of compaction tasks when we know in advance that they have nothing to do.
This is possible in the cleanup path by:
- improving the detection of candidates for cleanup by skipping sstables that require cleanup but are already being compacted
- checking that the list of sstables selected for cleanup isn't empty before creating the cleanup task

For upgrade sstables, and generally when rewriting all sstables: launch the task only if the list of candidate sstables isn't empty.

For regular compaction, when triggered via `table::add_sstable_and_update_cache`, we currently trigger compaction (by calling `submit`) on all compaction groups while the sstable is added only to one of them.
Also, it is typically called for maintenance sstables that are awaiting offstrategy compaction, in which case we can skip calling `submit` entirely since the caller triggers offstrategy compaction at a later stage.

Refs scylladb/scylladb#15673
Refs scylladb/scylladb#16694
Fixes scylladb/scylladb#16803

Closes scylladb/scylladb#16808

* github.com:scylladb/scylladb:
  table: add_sstable_and_update_cache: trigger compaction only in compaction group
  compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
  compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
2024-01-18 19:47:33 +02:00
Pavel Emelyanov
8595d64d01 locator: Handle replication factor of 0 for initial_tablets calculations
When calculating per-DC tablets the formula is shards_in_dc / rf_in_dc,
but the denominator in it can be configured to be literally zero and the
division doesn't work.

Fix by assuming zero tablets for dcs with zero rf
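
A minimal sketch of the guarded calculation, written in Python for brevity
rather than in the C++ of the locator code. The function name
`initial_tablets_per_dc` and the use of floor division are assumptions made
for illustration; only the shards_in_dc / rf_in_dc formula and the zero-rf
rule come from the description above.

```python
def initial_tablets_per_dc(shards_in_dc, rf_in_dc):
    """Map each DC name to its tablet count (hypothetical helper)."""
    tablets = {}
    for dc, shards in shards_in_dc.items():
        rf = rf_in_dc.get(dc, 0)
        # A DC with rf == 0 holds no replicas, so assume zero tablets
        # instead of dividing by zero.
        tablets[dc] = shards // rf if rf > 0 else 0
    return tablets


per_dc = initial_tablets_per_dc({"dc1": 12, "dc2": 8}, {"dc1": 3, "dc2": 0})
print(per_dc)
```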

fixes: #16844

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16861
2024-01-18 19:42:08 +02:00
Kamil Braun
8d9b0a6538 raft: server: inline poll_fsm_output 2024-01-18 18:09:13 +01:00
Kamil Braun
754a7b54e4 raft: server: fix indentation 2024-01-18 18:09:11 +01:00
Kamil Braun
527780987b raft: server: move io_fiber's processing of batch to a separate function 2024-01-18 18:09:02 +01:00
Kamil Braun
3e6b4910a6 raft: move poll_output() from fsm to server
`server` was the only user of this function and it can now be
implemented using `fsm`'s public interface.

In later commits we'll extend the logic of `io_fiber` to also subscribe
to other events, triggered by `server` API calls, not only to outputs
from `fsm`.
2024-01-18 18:07:52 +01:00
Kamil Braun
95b6a60428 raft: move _sm_events from fsm to server
In later commits we will use it to wake up `io_fiber` directly from
`raft::server` based on events generated by `raft::server` itself -- not
only from events generated by `raft::fsm`.

`raft::fsm` still obtains a reference to the condition variable so it
can keep signaling it.
2024-01-18 18:07:44 +01:00
Kamil Braun
a83e04279e raft: fsm: remove constructor used only in tests
This constructor does not provide persisted commit index. It was only
used in tests, so move it there, to the helper `fsm_debug` which
inherits from `fsm`.

Test cases which used `fsm` directly instead of `fsm_debug` were
modified to use `fsm_debug` so they can access the constructor.
`fsm_debug` doesn't change the behavior of `fsm`, only adds some helper
members. This will be useful in following commits too.
2024-01-18 18:07:17 +01:00
Kamil Braun
689d59fccd raft: fsm: move trace message from poll_output to has_output
In a later commit we'll move `poll_output` out of `fsm` and it won't
have access to internals logged by this message (`_log.stable_idx()`).

Besides, having it in `has_output` gives a more detailed trace. In
particular we can now see values such as `stable_idx` and `last_idx`
from the moment of returning a new fsm output, not only when poll
started waiting for it (a lot of time can pass between these two
events).
2024-01-18 18:06:55 +01:00
Kamil Braun
f6d43779af raft: fsm: extract has_output()
Also use the more efficient coroutine-specific
`condition_variable::when` instead of `wait`.
2024-01-18 18:06:27 +01:00
Kamil Braun
dccfd09d83 raft: pass max_trailing_entries through fsm_output to store_snapshot_descriptor
This parameter says how many entries at most should be left trailing
before the snapshot index. There are multiple places where this
decision is made:
- in `applier_fiber` when the server locally decides to take a snapshot
  due to log size pressure; this applies to the in-memory log
- in `fsm::step` when the server received an `install_snapshot` message
  from the leader; this also applies to the in-memory log
- and in `io_fiber` when calling `store_snapshot_descriptor`; this
  applies to the on-disk log.

The logic of how many entries should be left trailing is calculated
twice:
- first, in `applier_fiber` or in `fsm::step` when truncating the
  in-memory log
- and then again as the snapshot descriptor is being persisted.

The logic is to take `_config.snapshot_trailing` for locally generated
snapshots (coming from `applier_fiber`) and `0` for remote snapshots
(from `fsm::step`).

But there is already an error injection that changes the behavior of
`applier_fiber` to leave `0` trailing entries. However, this doesn't
affect the following `store_snapshot_descriptor` call which still uses
`_config.snapshot_trailing`. So if the server got restarted, the entries
which were truncated in-memory would get "revived" from disk.
Fortunately, this is test-only code.

However in future commits we'd like to change the logic of
`applier_fiber` even further. So instead of having a separate
calculation of trailing entries inside `io_fiber`, it's better for it to
use the number that was already calculated once. This number is passed to
`fsm::apply_snapshot` (by `applier_fiber` or `fsm::step`) and can then
be received by `io_fiber` from `fsm_output` to use it inside
`store_snapshot_descriptor`.
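
The idea of computing the trailing count once and reusing it can be modelled
as below. This is a toy Python model, not the Raft implementation: the
classes, the `truncate` helper and the list-of-indices log are hypothetical
simplifications, and only the names `apply_snapshot`,
`store_snapshot_descriptor` and `max_trailing_entries` mirror the text above.

```python
def truncate(log, snapshot_idx, max_trailing_entries):
    """Drop entries covered by the snapshot, keeping at most
    max_trailing_entries of them (log is a list of entry indices)."""
    first_kept = snapshot_idx - max_trailing_entries + 1
    return [i for i in log if i >= first_kept]


class Fsm:
    def __init__(self, log):
        self.log = list(log)  # in-memory log
        self.output = None    # stands in for fsm_output

    def apply_snapshot(self, snapshot_idx, max_trailing_entries):
        self.log = truncate(self.log, snapshot_idx, max_trailing_entries)
        # Record the decision that was already made, so the persistence
        # step does not have to recompute it.
        self.output = (snapshot_idx, max_trailing_entries)


def store_snapshot_descriptor(disk_log, snapshot_idx, max_trailing_entries):
    """Truncate the on-disk log with the same count as the in-memory one."""
    return truncate(disk_log, snapshot_idx, max_trailing_entries)


disk_log = list(range(1, 11))
fsm = Fsm(disk_log)
fsm.apply_snapshot(snapshot_idx=8, max_trailing_entries=0)
disk_log = store_snapshot_descriptor(disk_log, *fsm.output)
print(fsm.log, disk_log)
```

Because the persistence step takes its count from the recorded output, the
in-memory and on-disk logs end up truncated at the same point in this model.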
2024-01-18 18:05:45 +01:00
Kamil Braun
40cd91cff7 raft: server: pass *_aborted to set_exception call
This looks like a minor oversight, in `server_impl::abort` there are
multiple calls to `set_exception` on the different promises, only one of
them would not receive `*_aborted`.
2024-01-18 18:05:18 +01:00
Kefu Chai
09a688d325 sstables: do not use lambda when not necessary
before this change, we always reference the return value of
`make_reader()`, and the return value's type `flat_mutation_reader_v2`
is movable, so we can just pass it by moving away from it.

in this change, instead of using a lambda, let's just use the
return value of it directly. simpler this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16835
2024-01-18 15:54:49 +02:00
Kefu Chai
a1dcddd300 utils: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16833
2024-01-18 12:50:06 +02:00
Asias He
d3efb3ab6f storage_service: Set session id for raft_rebuild
Raft rebuild is broken because the session id is not set.

The following was seen when running rebuild

stream_session - [Stream #8cfca940-afc9-11ee-b6f1-30b8f78c1451]
stream_transfer_task: Fail to send to 127.0.70.1:0:
seastar::rpc::remote_verb_error (Session not found:
00000000-0000-0000-0000-000000000000)

with raft topology, e.g.,

scylla --enable-repair-based-node-ops 0 --consistent-cluster-management true --experimental-features consistent-topology-changes

Fix by setting the session id.

Fixes #16741

Closes scylladb/scylladb#16814
2024-01-18 12:47:20 +02:00
Kamil Braun
e4918c0d31 test/pylib: scylla_cluster: enable raft_topology=debug level by default
To help debugging test.py failures in CI.
2024-01-18 11:24:16 +01:00
Kamil Braun
52e67ca121 raft topology: increase level of some TRACE messages
Increased them to DEBUG level, and in one case to WARN (inside an
exception handler).

The selected messages are still relatively rare (per-node per-transition
control plane events, plus events such as fibers sleeping and waking up)
although more low level. They are also small messages. Messages that are
large such as those which print all tokens of nodes or large mutations
are left on TRACE level.

The plan is to enable DEBUG level logging in test.py tests for
raft_topology, while not spamming the logs completely such as by
printing large mutations.
2024-01-18 11:24:16 +01:00
Kamil Braun
92e6604127 raft topology: log when entering transition states
Those are rare control plane events, but might be useful when debugging
problems with topology coordinator (e.g. where it got stuck).
2024-01-18 11:24:15 +01:00
Kamil Braun
aeb53ea31d raft topology: don't include null ID in exclude_nodes
Observed with newly added logs:
```
raft topology - executing global topology command barrier_and_drain, excluded nodes: {00000000-0000-0000-0000-000000000000}
```
2024-01-18 11:24:15 +01:00
Kamil Braun
ae25f703c4 raft topology: INFO log when executing global commands and updating topology state
Those are rare control plane events, but useful for debugging, e.g. if
the topology coordinator gets stuck at some point.
2024-01-18 11:24:15 +01:00
Kamil Braun
71957b4320 storage_service: separate logger for raft topology
Allows selectively enabling higher logging levels for just raft-topology
related things, without doing it for the entire storage_service (which
includes things like gossiper callbacks).

Also gets rid of the redundant "raft topology:" prefix which was also
not included everywhere.
2024-01-18 11:24:14 +01:00
Eliran Sinvani
32d8dadf1a Add code coverage documentation
Add `docs/dev/code-coverage.md` with explanations about how to work with
the different tools added for coverage reporting and cli options added
to `configure.py` and `test.py`

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Eliran Sinvani
c7dff1b81b test.py: support code coverage
test.py already supports the routing of coverage data into a
predetermined folder under the `tmpdir` logs folder. This patch extends
that and leverages the code coverage processing libraries to produce
test coverage lcov files and a coverage summary at the end of the run.
The reason for not generating the full report (which can be achieved
with a one liner through the `coverage_utils.py` cli) is that it is
assumed that unit testing is not necessarily the "last stop" in the
testing process and it might need to be joined with other coverage
information that is created at other testing stages (for example dtest).

The result of this patch is that when running test.py with one of the
coverage options (`--coverage` / `--coverage-mode`) it will perform
another step of processing and aggregating the profiling information
created.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Eliran Sinvani
00a55abdd6 code coverage: Add libraries for coverage handling
Coverage handling is divided into 3 steps:
1. Generation of  profiling data from a run of an instrumented file
   (which this patch doesn't cover)
2. Processing of profiling data, which involves indexing the profile and
   producing the data in some format that can be manipulated and
   unified.
3. Generate some reporting based on this data.

The following patch is aiming to deal with the last two steps by providing a
cli and a library for this end.
This patch adds two libraries:
1. `coverage_utils.py` which is a library for manipulating coverage
   data, it also contains a cli for the (assumed) most common operations
   that are needed in order to eventually generate coverage reporting.
2. `lcov_utils.py`, which is a library for dealing with lcov format data,
   a textual form containing source-dependent coverage data.
   An example of such manipulation is the `coverage diff` operation,
   which produces a set-like difference: cov_a - cov_b = diff,
   where diff is an lcov-formatted file containing coverage data for code
   covered in cov_a that is not covered at all in cov_b.
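
The `coverage diff` semantics described above can be sketched as follows. This is a minimal illustration and not the `lcov_utils.py` API: the function name `coverage_diff` and the in-memory representation (a dict mapping `(source_file, line_number)` to a hit count) are assumptions made for the example.

```python
def coverage_diff(cov_a, cov_b):
    """Return lines hit in cov_a that are not hit at all in cov_b.

    Both arguments map (source_file, line_number) -> hit count.
    """
    return {
        key: hits
        for key, hits in cov_a.items()
        if hits > 0 and cov_b.get(key, 0) == 0
    }


cov_a = {("a.cc", 1): 3, ("a.cc", 2): 1, ("b.cc", 7): 0}
cov_b = {("a.cc", 1): 5, ("a.cc", 2): 0}
diff = coverage_diff(cov_a, cov_b)
```

Here only `("a.cc", 2)` survives: it is hit in `cov_a`, while `cov_b` records it with zero hits.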

The main goal of the libraries and cli is to provide a unified way to
handle coverage data that is easily scriptable and extensible.

This will pave the way for automating the coverage reporting and
processing in test.py and in jenkins pipelines (for example to also
process dtest or sct coverage reporting)

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Eliran Sinvani
f4b6c9074a test.py: support --coverage and --coverage-mode
We aim to support code coverage reporting as part of our development
process. To this end, we need the ability to "route" the dumped
profiles from scylla and unit tests to a predetermined location.
We can consider profile data as logged data that should persist after
tests have been run.

For this we add two supported options to test.py:
--coverage - which means that all suites on all modes will participate in
             coverage.
--coverage-mode - which can be used to "turn on" coverage support only
                  for some of the modes in this run.

The strategy chosen is to save the profile data in
`tmpdir`/mode/coverage/%m.profraw (ref:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program)
This means that for every suite the profiling data of each object is
going to be merged into the same file (llvm claims to lock the file so
concurrency is fine).
More resolution than the suite level seems to not give us anything
useful (at least not at the moment). Moreover, it can also be achieved
by running a single test.
Data at the suite level will help us to detect suites that don't generate
coverage data at all and to fix this or to skip generating the profiles
for them.
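
As a rough sketch of the routing described above: the Clang documentation referenced in this message describes the `LLVM_PROFILE_FILE` environment variable and its `%m` pattern. Whether test.py sets the variable exactly this way is an assumption, and `coverage_env` and the `testlog` directory name are hypothetical.

```python
import os


def coverage_env(tmpdir, mode, base_env=None):
    """Return an environment that routes LLVM profiles for one build mode."""
    env = dict(os.environ if base_env is None else base_env)
    # %m is expanded by the LLVM profiling runtime, not by Python.
    env["LLVM_PROFILE_FILE"] = f"{tmpdir}/{mode}/coverage/%m.profraw"
    return env


env = coverage_env("testlog", "dev", base_env={})
```

The resulting dict could be passed as the `env` argument when spawning an instrumented binary with `subprocess`.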

Also added support of  'coverage' parameter in the `suite.yaml` file,
which can be used to disable coverage for a specific suite, this
parameter defaults to True but if a suite is known to not generate
profiles or the suite profile data is not needed or obfuscates the result
it can be set to false in order to cancel profiles routing and
processing for this suite.
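
How the two command-line options and the `suite.yaml` parameter might combine can be sketched like this. The helper name `suite_collects_coverage` and its arguments are hypothetical; only the option names, the `coverage` key and its default of True come from the description above.

```python
def suite_collects_coverage(mode, suite_cfg, coverage_all=False, coverage_modes=()):
    """Decide whether a suite's profiles are routed and processed for `mode`.

    `suite_cfg` stands in for the parsed suite.yaml contents.
    """
    # --coverage enables every mode; --coverage-mode enables selected modes.
    mode_enabled = coverage_all or mode in coverage_modes
    # The suite.yaml `coverage` key defaults to True when absent.
    return mode_enabled and bool(suite_cfg.get("coverage", True))
```

For example, `suite_collects_coverage("dev", {"coverage": False}, coverage_all=True)` is false, because the suite opted out even though coverage was requested for all modes.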

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Eliran Sinvani
759d70deee configure.py support coverage profiles on standard build modes
We already have a dedicated coverage build, however, this build is
dedicated mostly for coverage in boost and standalone unit tests.
This added configuration option will compile every configured
build mode with coverage profiling support (excluding 'coverage' mode).
It also does targeted profiling that is narrowed down only to ScyllaDB
code and doesn't instrument seastar and testing code, this should give
a more accurate coverage reporting and also impact performance less, as
one example, the reactor loop in seastar will not be profiled (along
with everything else).
The targeted profiling is done with the help of the newly added
`coverage_sources.list` file which excludes all seastar sub directories
from the profiling.
Also an extra measure is taken to make sure that the seastar
library will not be linked with the coverage framework
(so it will not dump confusing empty profiles).
Some of the seastar headers are still going to be included in the
profile since they are indirectly included by profiled source files.
To remove them from the final report, a processing step on the
resulting profile will need to take place.

A note about expected performance impact:
It is expected to have minimal impact on performance since the
instrumentation adds counter increments without locking.
Ref: https://clang.llvm.org/docs/UsersManual.html#cmdoption-fprofile-update
This means that the numbers themselves are less reliable but all covered
lines are guaranteed to have at least non-zero value.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Kefu Chai
f5d1836a45 types: fix indent
f344e130 failed to get the indent right, so fix it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16834
2024-01-18 09:14:39 +02:00
Botond Dénes
8087bc72f0 Merge 'Basic tablet repair support' from Asias He
This patch adds basic tablet repair support.

Below is an example showing how tablet repair works. The `nodetool
repair -pr` cmd was performed on all the nodes, which makes sure no duplicate
repair work is performed and each tablet is repaired exactly once.

Three nodes in the cluster. RF = 2. 16 initial tablets.

Tablets:
```
cqlsh> SELECT  * FROM system.tablets;

 keyspace_name | table_id                             | last_token           | table_name | tablet_count | new_replicas | replicas                                                                               | session | stage
---------------+--------------------------------------+----------------------+------------+--------------+--------------+----------------------------------------------------------------------------------------+---------+-------
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -8070450532247928833 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 6), (2dd3808d-6601-4483-b081-adf41ef094e5, 5)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -6917529027641081857 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 0), (19caaeb3-d754-4704-a998-840df53eb54c, 5)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -5764607523034234881 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 2), (2dd3808d-6601-4483-b081-adf41ef094e5, 3)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -4611686018427387905 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 5), (2dd3808d-6601-4483-b081-adf41ef094e5, 4)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -3458764513820540929 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 1), (951cb5bc-5749-481a-9645-4dd0f624f24a, 0)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -2305843009213693953 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 7), (2dd3808d-6601-4483-b081-adf41ef094e5, 1)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -1152921504606846977 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 7), (951cb5bc-5749-481a-9645-4dd0f624f24a, 1)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |                   -1 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 2), (2dd3808d-6601-4483-b081-adf41ef094e5, 7)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  1152921504606846975 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 6), (19caaeb3-d754-4704-a998-840df53eb54c, 2)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  2305843009213693951 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 5), (951cb5bc-5749-481a-9645-4dd0f624f24a, 7)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  3458764513820540927 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 1), (19caaeb3-d754-4704-a998-840df53eb54c, 3)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  4611686018427387903 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 7), (951cb5bc-5749-481a-9645-4dd0f624f24a, 1)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  5764607523034234879 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 6), (2dd3808d-6601-4483-b081-adf41ef094e5, 2)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  6917529027641081855 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 5), (951cb5bc-5749-481a-9645-4dd0f624f24a, 3)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  8070450532247928831 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 0), (19caaeb3-d754-4704-a998-840df53eb54c, 7)] |    null |  null
```
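
The `last_token` values above are consistent with an even split of the signed 64-bit token range into 16 tablets. The following sketch reproduces them; it is inferred from the numbers shown here rather than taken from the tablet allocation code, and `tablet_last_tokens` is a hypothetical helper.

```python
def tablet_last_tokens(tablet_count):
    """Last token of each tablet for an even split of the signed 64-bit token range.

    Assumes tablet_count is a power of two.
    """
    step = 2**64 // tablet_count
    return [-(2**63) + (i + 1) * step - 1 for i in range(tablet_count)]


tokens = tablet_last_tokens(16)
```

`tokens[0]` matches the first row of the table, and `tokens[15]` matches the upper bound of the last tablet's range as printed in the repair logs.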

node1:
```
$nodetool repair -p 7199 -pr ks1 standard1
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: starting user-requested repair for keyspace ks1, repair id 6, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 1 out of 5 tablets: table=ks1.standard1 tablet_id=2 range=(-6917529027641081857,-5764607523034234881] replicas={19caaeb3-d754-4704-a998-840df53eb54c:2, 2dd3808d-6601-4483-b081-adf41ef094e5:3} primary_replica_only=true
[shard 2:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07399633 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7174440}, {127.0.0.2, 7174440}}, row_from_disk_nr={{127.0.0.1, 15330}, {127.0.0.2, 15330}}, row_from_disk_bytes_per_sec={{127.0.0.1, 92.4651}, {127.0.0.2, 92.4651}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 207172}, {127.0.0.2, 207172}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 2 out of 5 tablets: table=ks1.standard1 tablet_id=4 range=(-4611686018427387905,-3458764513820540929] replicas={19caaeb3-d754-4704-a998-840df53eb54c:1, 951cb5bc-5749-481a-9645-4dd0f624f24a:0} primary_replica_only=true
[shard 1:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07302664 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7195032}, {127.0.0.3, 7195032}}, row_from_disk_nr={{127.0.0.1, 15374}, {127.0.0.3, 15374}}, row_from_disk_bytes_per_sec={{127.0.0.1, 93.9618}, {127.0.0.3, 93.9618}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 210526}, {127.0.0.3, 210526}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 3 out of 5 tablets: table=ks1.standard1 tablet_id=6 range=(-2305843009213693953,-1152921504606846977] replicas={19caaeb3-d754-4704-a998-840df53eb54c:7, 951cb5bc-5749-481a-9645-4dd0f624f24a:1} primary_replica_only=true
[shard 7:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06781354 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7095816}, {127.0.0.3, 7095816}}, row_from_disk_nr={{127.0.0.1, 15162}, {127.0.0.3, 15162}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.7898}, {127.0.0.3, 99.7898}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 223584}, {127.0.0.3, 223584}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 4 out of 5 tablets: table=ks1.standard1 tablet_id=12 range=(4611686018427387903,5764607523034234879] replicas={19caaeb3-d754-4704-a998-840df53eb54c:6, 2dd3808d-6601-4483-b081-adf41ef094e5:2} primary_replica_only=true
[shard 6:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06793772 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7150572}, {127.0.0.2, 7150572}}, row_from_disk_nr={{127.0.0.1, 15279}, {127.0.0.2, 15279}}, row_from_disk_bytes_per_sec={{127.0.0.1, 100.376}, {127.0.0.2, 100.376}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 224897}, {127.0.0.2, 224897}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 5 out of 5 tablets: table=ks1.standard1 tablet_id=13 range=(5764607523034234879,6917529027641081855] replicas={19caaeb3-d754-4704-a998-840df53eb54c:5, 951cb5bc-5749-481a-9645-4dd0f624f24a:3} primary_replica_only=true
[shard 5:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.068579935 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7129512}, {127.0.0.3, 7129512}}, row_from_disk_nr={{127.0.0.1, 15234}, {127.0.0.3, 15234}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.1432}, {127.0.0.3, 99.1432}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 222135}, {127.0.0.3, 222135}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=6 duration=0.352379s
```

node2:
```
$nodetool repair -p 7200 -pr ks1 standard1
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: starting user-requested repair for keyspace ks1, repair id 1, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 1 out of 6 tablets: table=ks1.standard1 tablet_id=1 range=(-8070450532247928833,-6917529027641081857] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:0, 19caaeb3-d754-4704-a998-840df53eb54c:5} primary_replica_only=true
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07016466 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7212816}, {127.0.0.2, 7212816}}, row_from_disk_nr={{127.0.0.1, 15412}, {127.0.0.2, 15412}}, row_from_disk_bytes_per_sec={{127.0.0.1, 98.0362}, {127.0.0.2, 98.0362}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 219655}, {127.0.0.2, 219655}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 2 out of 6 tablets: table=ks1.standard1 tablet_id=9 range=(1152921504606846975,2305843009213693951] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:5, 951cb5bc-5749-481a-9645-4dd0f624f24a:7} primary_replica_only=true
[shard 5:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07180758 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7236216}, {127.0.0.3, 7236216}}, row_from_disk_nr={{127.0.0.2, 15462}, {127.0.0.3, 15462}}, row_from_disk_bytes_per_sec={{127.0.0.2, 96.104}, {127.0.0.3, 96.104}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 215325}, {127.0.0.3, 215325}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 3 out of 6 tablets: table=ks1.standard1 tablet_id=10 range=(2305843009213693951,3458764513820540927] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:1, 19caaeb3-d754-4704-a998-840df53eb54c:3} primary_replica_only=true
[shard 1:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06772773 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7039188}, {127.0.0.2, 7039188}}, row_from_disk_nr={{127.0.0.1, 15041}, {127.0.0.2, 15041}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.1188}, {127.0.0.2, 99.1188}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 222080}, {127.0.0.2, 222080}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 4 out of 6 tablets: table=ks1.standard1 tablet_id=11 range=(3458764513820540927,4611686018427387903] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:7, 951cb5bc-5749-481a-9645-4dd0f624f24a:1} primary_replica_only=true
[shard 7:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07025768 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7229664}, {127.0.0.3, 7229664}}, row_from_disk_nr={{127.0.0.2, 15448}, {127.0.0.3, 15448}}, row_from_disk_bytes_per_sec={{127.0.0.2, 98.1351}, {127.0.0.3, 98.1351}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 219876}, {127.0.0.3, 219876}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 5 out of 6 tablets: table=ks1.standard1 tablet_id=14 range=(6917529027641081855,8070450532247928831] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:0, 19caaeb3-d754-4704-a998-840df53eb54c:7} primary_replica_only=true
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0719635 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7225452}, {127.0.0.2, 7225452}}, row_from_disk_nr={{127.0.0.1, 15439}, {127.0.0.2, 15439}}, row_from_disk_bytes_per_sec={{127.0.0.1, 95.7531}, {127.0.0.2, 95.7531}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 214539}, {127.0.0.2, 214539}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 6 out of 6 tablets: table=ks1.standard1 tablet_id=15 range=(8070450532247928831,9223372036854775807] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:4, 19caaeb3-d754-4704-a998-840df53eb54c:3} primary_replica_only=true
[shard 4:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0691715 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7122960}, {127.0.0.2, 7122960}}, row_from_disk_nr={{127.0.0.1, 15220}, {127.0.0.2, 15220}}, row_from_disk_bytes_per_sec={{127.0.0.1, 98.2049}, {127.0.0.2, 98.2049}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 220033}, {127.0.0.2, 220033}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=1 duration=0.42178s
```

node3:
```
$nodetool repair -p 7300 -pr ks1 standard1
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: starting user-requested repair for keyspace ks1, repair id 1, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 1 out of 5 tablets: table=ks1.standard1 tablet_id=0 range=(minimum token,-8070450532247928833] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:6, 2dd3808d-6601-4483-b081-adf41ef094e5:5} primary_replica_only=true
[shard 6:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07126866 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7133256}, {127.0.0.3, 7133256}}, row_from_disk_nr={{127.0.0.2, 15242}, {127.0.0.3, 15242}}, row_from_disk_bytes_per_sec={{127.0.0.2, 95.4529}, {127.0.0.3, 95.4529}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 213867}, {127.0.0.3, 213867}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 2 out of 5 tablets: table=ks1.standard1 tablet_id=3 range=(-5764607523034234881,-4611686018427387905] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:5, 2dd3808d-6601-4483-b081-adf41ef094e5:4} primary_replica_only=true
[shard 5:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0701025 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7138404}, {127.0.0.3, 7138404}}, row_from_disk_nr={{127.0.0.2, 15253}, {127.0.0.3, 15253}}, row_from_disk_bytes_per_sec={{127.0.0.2, 97.1108}, {127.0.0.3, 97.1108}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 217581}, {127.0.0.3, 217581}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 3 out of 5 tablets: table=ks1.standard1 tablet_id=5 range=(-3458764513820540929,-2305843009213693953] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:7, 2dd3808d-6601-4483-b081-adf41ef094e5:1} primary_replica_only=true
[shard 7:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06859512 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7171632}, {127.0.0.3, 7171632}}, row_from_disk_nr={{127.0.0.2, 15324}, {127.0.0.3, 15324}}, row_from_disk_bytes_per_sec={{127.0.0.2, 99.7068}, {127.0.0.3, 99.7068}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 223398}, {127.0.0.3, 223398}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 4 out of 5 tablets: table=ks1.standard1 tablet_id=7 range=(-1152921504606846977,-1] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:2, 2dd3808d-6601-4483-b081-adf41ef094e5:7} primary_replica_only=true
[shard 2:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06975318 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7105176}, {127.0.0.3, 7105176}}, row_from_disk_nr={{127.0.0.2, 15182}, {127.0.0.3, 15182}}, row_from_disk_bytes_per_sec={{127.0.0.2, 97.1429}, {127.0.0.3, 97.1429}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 217653}, {127.0.0.3, 217653}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 5 out of 5 tablets: table=ks1.standard1 tablet_id=8 range=(-1,1152921504606846975] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:6, 19caaeb3-d754-4704-a998-840df53eb54c:2} primary_replica_only=true
[shard 6:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.070810474 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7023276}, {127.0.0.3, 7023276}}, row_from_disk_nr={{127.0.0.1, 15007}, {127.0.0.3, 15007}}, row_from_disk_bytes_per_sec={{127.0.0.1, 94.5894}, {127.0.0.3, 94.5894}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 211932}, {127.0.0.3, 211932}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=1 duration=0.351395s
```
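
In the logs above, each tablet is repaired by the node that appears first in its replica list. Assuming that is how the primary replica is selected (an inference from this output, not a statement about the implementation), the split of 5, 6 and 5 tablets across the three nodes can be reproduced with a short sketch. Host IDs are abbreviated to their first four characters.

```python
# Short prefixes of the host IDs in the `replicas` column, in tablet_id order
# (0..15). Tablet 15 is taken from node2's log, since its row is not in the table.
REPLICAS = [
    ("951c", "2dd3"),  # 0
    ("2dd3", "19ca"),  # 1
    ("19ca", "2dd3"),  # 2
    ("951c", "2dd3"),  # 3
    ("19ca", "951c"),  # 4
    ("951c", "2dd3"),  # 5
    ("19ca", "951c"),  # 6
    ("951c", "2dd3"),  # 7
    ("951c", "19ca"),  # 8
    ("2dd3", "951c"),  # 9
    ("2dd3", "19ca"),  # 10
    ("2dd3", "951c"),  # 11
    ("19ca", "2dd3"),  # 12
    ("19ca", "951c"),  # 13
    ("2dd3", "19ca"),  # 14
    ("2dd3", "19ca"),  # 15
]


def primary_tablets(replicas):
    """Group tablet ids by the first replica in each tablet's replica list."""
    owned = {}
    for tablet_id, replica_set in enumerate(replicas):
        owned.setdefault(replica_set[0], []).append(tablet_id)
    return owned


owned = primary_tablets(REPLICAS)
```

The three groups are disjoint and together cover all 16 tablets, which is why running `-pr` on every node repairs each tablet exactly once.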

Fixes #16599

Closes scylladb/scylladb#16600

* github.com:scylladb/scylladb:
  test: Add test_tablet_missing_data_repair
  test: Add test_tablet_repair
  test: Allow timeout in server_stop_gracefully
  test: Increase STOP_TIMEOUT_SECONDS
  repair: Wire tablet repair with the user repair request
  repair: Pass raft_address_map to repair service
  repair: Add host2ip_t type
  repair: Add finished user-requested log for vnode table too
  repair: Log error in the rpc_stream_handler
  repair: Make row_level repair work with tablet
  repair: Add get_dst_shard_id
  repair: Add shard to repair_node_state
  repair: Add shard map to repair_neighbors
2024-01-18 09:13:00 +02:00
Asias He
1399dc0ff2 test: Add test_tablet_missing_data_repair
The test verifies repair brings the missing rows to the owner.

- Shutdown part of the nodes in the cluster
- Insert data
- Start all nodes
- Run repair
- Shutdown part of the nodes
- Check all data is present
2024-01-18 08:49:06 +08:00
Asias He
bfe5894a9f test: Add test_tablet_repair
A basic repair test that verifies tablet repair works.
2024-01-18 08:49:06 +08:00
Asias He
39912d7bed test: Allow timeout in server_stop_gracefully
The default timeout is 60s, but stopping a node sometimes takes longer
than that.
2024-01-18 08:49:06 +08:00
Asias He
276b04a572 test: Increase STOP_TIMEOUT_SECONDS
Stopping scylla was observed to take more than 60s in some cases.
Increase the hard-coded stop timeout.
2024-01-18 08:49:06 +08:00
Asias He
54239514af repair: Wire tablet repair with the user repair request
Currently, only the table and primary replica selection options are
supported.

Reject repair request if the repair options are not supported yet.

With this patch, users can repair tablet tables by running

    nodetool repair -pr myks mytable

on each node in the cluster, so that each tablet is repaired only
once, without duplicated work.

Below is an example showing how tablet repair works. The `nodetool
repair -pr` command was run on all the nodes. The cluster has three
nodes, RF = 2, and 16 initial tablets.

Tablets:

cqlsh> SELECT  * FROM system.tablets;

 keyspace_name | table_id                             | last_token           | table_name | tablet_count | new_replicas | replicas                                                                               | session | stage
---------------+--------------------------------------+----------------------+------------+--------------+--------------+----------------------------------------------------------------------------------------+---------+-------
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -8070450532247928833 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 6), (2dd3808d-6601-4483-b081-adf41ef094e5, 5)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -6917529027641081857 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 0), (19caaeb3-d754-4704-a998-840df53eb54c, 5)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -5764607523034234881 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 2), (2dd3808d-6601-4483-b081-adf41ef094e5, 3)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -4611686018427387905 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 5), (2dd3808d-6601-4483-b081-adf41ef094e5, 4)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -3458764513820540929 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 1), (951cb5bc-5749-481a-9645-4dd0f624f24a, 0)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -2305843009213693953 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 7), (2dd3808d-6601-4483-b081-adf41ef094e5, 1)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -1152921504606846977 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 7), (951cb5bc-5749-481a-9645-4dd0f624f24a, 1)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |                   -1 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 2), (2dd3808d-6601-4483-b081-adf41ef094e5, 7)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  1152921504606846975 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 6), (19caaeb3-d754-4704-a998-840df53eb54c, 2)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  2305843009213693951 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 5), (951cb5bc-5749-481a-9645-4dd0f624f24a, 7)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  3458764513820540927 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 1), (19caaeb3-d754-4704-a998-840df53eb54c, 3)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  4611686018427387903 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 7), (951cb5bc-5749-481a-9645-4dd0f624f24a, 1)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  5764607523034234879 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 6), (2dd3808d-6601-4483-b081-adf41ef094e5, 2)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  6917529027641081855 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 5), (951cb5bc-5749-481a-9645-4dd0f624f24a, 3)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  8070450532247928831 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 0), (19caaeb3-d754-4704-a998-840df53eb54c, 7)] |    null |  null

node1:
$nodetool repair -p 7199 -pr ks1 standard1
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: starting user-requested repair for keyspace ks1, repair id 6, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 1 out of 5 tablets: table=ks1.standard1 tablet_id=2 range=(-6917529027641081857,-5764607523034234881] replicas={19caaeb3-d754-4704-a998-840df53eb54c:2, 2dd3808d-6601-4483-b081-adf41ef094e5:3} primary_replica_only=true
[shard 2:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07399633 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7174440}, {127.0.0.2, 7174440}}, row_from_disk_nr={{127.0.0.1, 15330}, {127.0.0.2, 15330}}, row_from_disk_bytes_per_sec={{127.0.0.1, 92.4651}, {127.0.0.2, 92.4651}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 207172}, {127.0.0.2, 207172}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 2 out of 5 tablets: table=ks1.standard1 tablet_id=4 range=(-4611686018427387905,-3458764513820540929] replicas={19caaeb3-d754-4704-a998-840df53eb54c:1, 951cb5bc-5749-481a-9645-4dd0f624f24a:0} primary_replica_only=true
[shard 1:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07302664 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7195032}, {127.0.0.3, 7195032}}, row_from_disk_nr={{127.0.0.1, 15374}, {127.0.0.3, 15374}}, row_from_disk_bytes_per_sec={{127.0.0.1, 93.9618}, {127.0.0.3, 93.9618}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 210526}, {127.0.0.3, 210526}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 3 out of 5 tablets: table=ks1.standard1 tablet_id=6 range=(-2305843009213693953,-1152921504606846977] replicas={19caaeb3-d754-4704-a998-840df53eb54c:7, 951cb5bc-5749-481a-9645-4dd0f624f24a:1} primary_replica_only=true
[shard 7:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06781354 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7095816}, {127.0.0.3, 7095816}}, row_from_disk_nr={{127.0.0.1, 15162}, {127.0.0.3, 15162}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.7898}, {127.0.0.3, 99.7898}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 223584}, {127.0.0.3, 223584}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 4 out of 5 tablets: table=ks1.standard1 tablet_id=12 range=(4611686018427387903,5764607523034234879] replicas={19caaeb3-d754-4704-a998-840df53eb54c:6, 2dd3808d-6601-4483-b081-adf41ef094e5:2} primary_replica_only=true
[shard 6:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06793772 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7150572}, {127.0.0.2, 7150572}}, row_from_disk_nr={{127.0.0.1, 15279}, {127.0.0.2, 15279}}, row_from_disk_bytes_per_sec={{127.0.0.1, 100.376}, {127.0.0.2, 100.376}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 224897}, {127.0.0.2, 224897}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 5 out of 5 tablets: table=ks1.standard1 tablet_id=13 range=(5764607523034234879,6917529027641081855] replicas={19caaeb3-d754-4704-a998-840df53eb54c:5, 951cb5bc-5749-481a-9645-4dd0f624f24a:3} primary_replica_only=true
[shard 5:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.068579935 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7129512}, {127.0.0.3, 7129512}}, row_from_disk_nr={{127.0.0.1, 15234}, {127.0.0.3, 15234}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.1432}, {127.0.0.3, 99.1432}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 222135}, {127.0.0.3, 222135}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=6 duration=0.352379s

node2:
$nodetool repair -p 7200 -pr ks1 standard1
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: starting user-requested repair for keyspace ks1, repair id 1, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 1 out of 6 tablets: table=ks1.standard1 tablet_id=1 range=(-8070450532247928833,-6917529027641081857] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:0, 19caaeb3-d754-4704-a998-840df53eb54c:5} primary_replica_only=true
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07016466 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7212816}, {127.0.0.2, 7212816}}, row_from_disk_nr={{127.0.0.1, 15412}, {127.0.0.2, 15412}}, row_from_disk_bytes_per_sec={{127.0.0.1, 98.0362}, {127.0.0.2, 98.0362}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 219655}, {127.0.0.2, 219655}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 2 out of 6 tablets: table=ks1.standard1 tablet_id=9 range=(1152921504606846975,2305843009213693951] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:5, 951cb5bc-5749-481a-9645-4dd0f624f24a:7} primary_replica_only=true
[shard 5:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07180758 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7236216}, {127.0.0.3, 7236216}}, row_from_disk_nr={{127.0.0.2, 15462}, {127.0.0.3, 15462}}, row_from_disk_bytes_per_sec={{127.0.0.2, 96.104}, {127.0.0.3, 96.104}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 215325}, {127.0.0.3, 215325}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 3 out of 6 tablets: table=ks1.standard1 tablet_id=10 range=(2305843009213693951,3458764513820540927] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:1, 19caaeb3-d754-4704-a998-840df53eb54c:3} primary_replica_only=true
[shard 1:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06772773 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7039188}, {127.0.0.2, 7039188}}, row_from_disk_nr={{127.0.0.1, 15041}, {127.0.0.2, 15041}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.1188}, {127.0.0.2, 99.1188}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 222080}, {127.0.0.2, 222080}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 4 out of 6 tablets: table=ks1.standard1 tablet_id=11 range=(3458764513820540927,4611686018427387903] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:7, 951cb5bc-5749-481a-9645-4dd0f624f24a:1} primary_replica_only=true
[shard 7:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07025768 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7229664}, {127.0.0.3, 7229664}}, row_from_disk_nr={{127.0.0.2, 15448}, {127.0.0.3, 15448}}, row_from_disk_bytes_per_sec={{127.0.0.2, 98.1351}, {127.0.0.3, 98.1351}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 219876}, {127.0.0.3, 219876}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 5 out of 6 tablets: table=ks1.standard1 tablet_id=14 range=(6917529027641081855,8070450532247928831] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:0, 19caaeb3-d754-4704-a998-840df53eb54c:7} primary_replica_only=true
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0719635 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7225452}, {127.0.0.2, 7225452}}, row_from_disk_nr={{127.0.0.1, 15439}, {127.0.0.2, 15439}}, row_from_disk_bytes_per_sec={{127.0.0.1, 95.7531}, {127.0.0.2, 95.7531}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 214539}, {127.0.0.2, 214539}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 6 out of 6 tablets: table=ks1.standard1 tablet_id=15 range=(8070450532247928831,9223372036854775807] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:4, 19caaeb3-d754-4704-a998-840df53eb54c:3} primary_replica_only=true
[shard 4:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0691715 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7122960}, {127.0.0.2, 7122960}}, row_from_disk_nr={{127.0.0.1, 15220}, {127.0.0.2, 15220}}, row_from_disk_bytes_per_sec={{127.0.0.1, 98.2049}, {127.0.0.2, 98.2049}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 220033}, {127.0.0.2, 220033}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=1 duration=0.42178s

node3:
$nodetool repair -p 7300 -pr ks1 standard1
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: starting user-requested repair for keyspace ks1, repair id 1, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 1 out of 5 tablets: table=ks1.standard1 tablet_id=0 range=(minimum token,-8070450532247928833] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:6, 2dd3808d-6601-4483-b081-adf41ef094e5:5} primary_replica_only=true
[shard 6:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07126866 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7133256}, {127.0.0.3, 7133256}}, row_from_disk_nr={{127.0.0.2, 15242}, {127.0.0.3, 15242}}, row_from_disk_bytes_per_sec={{127.0.0.2, 95.4529}, {127.0.0.3, 95.4529}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 213867}, {127.0.0.3, 213867}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 2 out of 5 tablets: table=ks1.standard1 tablet_id=3 range=(-5764607523034234881,-4611686018427387905] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:5, 2dd3808d-6601-4483-b081-adf41ef094e5:4} primary_replica_only=true
[shard 5:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0701025 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7138404}, {127.0.0.3, 7138404}}, row_from_disk_nr={{127.0.0.2, 15253}, {127.0.0.3, 15253}}, row_from_disk_bytes_per_sec={{127.0.0.2, 97.1108}, {127.0.0.3, 97.1108}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 217581}, {127.0.0.3, 217581}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 3 out of 5 tablets: table=ks1.standard1 tablet_id=5 range=(-3458764513820540929,-2305843009213693953] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:7, 2dd3808d-6601-4483-b081-adf41ef094e5:1} primary_replica_only=true
[shard 7:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06859512 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7171632}, {127.0.0.3, 7171632}}, row_from_disk_nr={{127.0.0.2, 15324}, {127.0.0.3, 15324}}, row_from_disk_bytes_per_sec={{127.0.0.2, 99.7068}, {127.0.0.3, 99.7068}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 223398}, {127.0.0.3, 223398}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 4 out of 5 tablets: table=ks1.standard1 tablet_id=7 range=(-1152921504606846977,-1] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:2, 2dd3808d-6601-4483-b081-adf41ef094e5:7} primary_replica_only=true
[shard 2:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06975318 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7105176}, {127.0.0.3, 7105176}}, row_from_disk_nr={{127.0.0.2, 15182}, {127.0.0.3, 15182}}, row_from_disk_bytes_per_sec={{127.0.0.2, 97.1429}, {127.0.0.3, 97.1429}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 217653}, {127.0.0.3, 217653}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 5 out of 5 tablets: table=ks1.standard1 tablet_id=8 range=(-1,1152921504606846975] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:6, 19caaeb3-d754-4704-a998-840df53eb54c:2} primary_replica_only=true
[shard 6:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.070810474 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7023276}, {127.0.0.3, 7023276}}, row_from_disk_nr={{127.0.0.1, 15007}, {127.0.0.3, 15007}}, row_from_disk_bytes_per_sec={{127.0.0.1, 94.5894}, {127.0.0.3, 94.5894}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 211932}, {127.0.0.3, 211932}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=1 duration=0.351395s
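
The division of work in the example above can be checked with a short
Python sketch. It assumes, based only on the `system.tablets` output and
the logs shown here, that with `-pr` a node repairs exactly the tablets
whose first listed replica is that node. Host IDs are abbreviated, and the
names `FIRST_REPLICA` and `tablet_ids_for` are illustrative, not part of
ScyllaDB.

```python
from collections import Counter

# First listed replica of each tablet (tablet_id 0..15), as abbreviated
# host IDs copied from the system.tablets output and repair logs above.
FIRST_REPLICA = [
    "951cb5bc", "2dd3808d", "19caaeb3", "951cb5bc",
    "19caaeb3", "951cb5bc", "19caaeb3", "951cb5bc",
    "951cb5bc", "2dd3808d", "2dd3808d", "2dd3808d",
    "19caaeb3", "19caaeb3", "2dd3808d", "2dd3808d",
]


def tablet_ids_for(host, first_replicas):
    """Return the tablet ids whose first listed replica is `host`."""
    return [i for i, h in enumerate(first_replicas) if h == host]


def tablets_per_primary(first_replicas):
    """Count how many tablets each host would repair with -pr."""
    return Counter(first_replicas)


if __name__ == "__main__":
    for host, count in sorted(tablets_per_primary(FIRST_REPLICA).items()):
        print(host, count, tablet_ids_for(host, FIRST_REPLICA))
```

The per-host tablet lists match the "Repair N out of M tablets" lines in
the three logs, and the counts add up to the 16 tablets of the table, so
no tablet is repaired twice.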

Fixes #16599
2024-01-18 08:49:06 +08:00
Asias He
93028f4848 repair: Pass raft_address_map to repair service
It is needed to translate hostid to ip address.
2024-01-18 08:49:06 +08:00
Asias He
194e870996 repair: Add host2ip_t type
It is used to translate hostid to ip address in repair code.
2024-01-18 08:49:06 +08:00
Asias He
637b8e4f51 repair: Add finished user-requested log for vnode table too 2024-01-18 08:49:06 +08:00
Asias He
b24f6fbc92 repair: Log error in the rpc_stream_handler
It is useful for debugging when the handler goes wrong. In addition to
sending the error back to the peer, log the error as well.
2024-01-18 08:49:06 +08:00
Asias He
fd774862be repair: Make row_level repair work with tablet
Since a given tablet belongs to a single shard on both repair master and repair
followers, row level repair code needs to be changed to work on a single
shard for a given tablet. In order to tell the repair followers which
shard to work on, a dst_cpu_id value is passed over rpc from the repair
master.
2024-01-18 08:49:06 +08:00
Asias He
e1f68ea64a repair: Add get_dst_shard_id
A helper to get the dst shard id on the repair follower.

If the repair master specifies the shard id for the follower, use it.
Otherwise, the follower chooses one itself.
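
A minimal sketch of that selection rule in Python. The parameter names
and the fallback policy (source shard modulo the follower's shard count)
are assumptions made for illustration. The commit message only states
that the follower picks a shard itself when none is given.

```python
from typing import Optional


def get_dst_shard_id(src_shard: int,
                     dst_shard: Optional[int],
                     follower_shard_count: int) -> int:
    """Pick the shard a repair follower should run on."""
    if follower_shard_count <= 0:
        raise ValueError("follower_shard_count must be positive")
    # The repair master specified a shard explicitly: use it as-is.
    if dst_shard is not None:
        return dst_shard
    # Assumed fallback: derive a shard from the master's shard id.
    return src_shard % follower_shard_count
```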
2024-01-18 08:49:06 +08:00
Asias He
2e8c6ebfca repair: Add shard to repair_node_state
It is used to specify the shard id that the repair instance runs on.
2024-01-18 08:49:06 +08:00
Asias He
16349be37e repair: Add shard map to repair_neighbors
It is used to specify the shard id that the repair instance should run
repair on.
2024-01-18 08:49:06 +08:00
Avi Kivity
394ef13901 build: regenerate frozen toolchain for tablets-aware Python driver
Pull in scylla-driver 3.26.5, which supports tablets.

Closes scylladb/scylladb#16829
2024-01-17 22:47:36 +02:00
Kefu Chai
0ae81446ef ./: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16766
2024-01-17 16:30:14 +02:00
Kamil Braun
787b24cd24 Merge 'raft topology: join: shut down a node on error in response handler' from Patryk Jędrzejczak
If the joining node fails while handling the response from the
topology coordinator, it hangs even though it knows the join
operation has failed. Therefore, we ensure it shuts down in
this patch.

Additionally, we ensure that if the first join request response
was a rejection or the node failed while handling it, the
following acceptances by the (possibly different) coordinator
don't succeed. The node considers the join operation as failed.
We shouldn't add it to the cluster.

Fixes scylladb/scylladb#16333

Closes scylladb/scylladb#16650

* github.com:scylladb/scylladb:
  topology_coordinator: clarify warnings
  raft topology: join: allow only the first response to be a successful acceptance
  storage_service: join_node_response_handler: fix indentation
  raft topology: join: shut down a node on error in response handler
2024-01-17 14:55:26 +01:00
Botond Dénes
f22fc88a64 Merge 'Configure service levels interval' from Michał Jadwiszczak
The service level controller updates itself at a fixed interval. However, the interval is hardcoded in main to 10 seconds, which leads to long sleeps in some of the tests.

This patch moves this value to the `service_levels_interval_ms` command line option and sets it to 0.5s in cql-pytest.

Closes scylladb/scylladb#16394

* github.com:scylladb/scylladb:
  test:cql-pytest: change service levels intervals in tests
  configure service levels interval
2024-01-17 12:24:49 +02:00
Benny Halevy
0d937f3974 table: add_sstable_and_update_cache: trigger compaction only in compaction group
There is no need to trigger compaction in all compaction
groups when an sstable is added to only one of them.

And with that level of control, if the caller passes
sstables::offstrategy::yes, we know it will
trigger offstrategy compaction later on so there
is no need to trigger compaction at all
for this sstable at this time.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-01-17 12:13:17 +02:00
Benny Halevy
51a46aa83b compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
Prevent the creation of a compaction task when
the list of sstables is known to be empty ahead
of time.

Refs scylladb/scylladb#16694
Fixes scylladb/scylladb#16803

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-01-17 11:53:39 +02:00
Benny Halevy
bd1d65ec38 compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
3b424e391b introduced a loop
in `perform_cleanup` that waits until all sstables that require
cleanup are cleaned up.

However, with f1bbf705f9,
an sstable that is_eligible_for_compaction (i.e. it
is not in staging, awaiting view update generation),
may already be compacted by e.g. regular compaction.
And so perform_cleanup should interrupt that
by calling try_perform_cleanup, since the latter
reevaluates `update_sstable_cleanup_state` with
compaction disabled - that stops ongoing compactions.

Refs scylladb/scylladb#15673

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-01-17 11:53:39 +02:00
David Garcia
f555a2cb05 docs: dynamic include based on flag
docs: extend include options

Closes scylladb/scylladb#16753
2024-01-17 09:33:40 +02:00
Calle Wilund
af0772d605 commitlog: Add wait_for_pending_deletes
Refs #16757

Allows waiting for all previous and pending segment deletes to finish.
Useful if a caller of `discard_completed_segments` (i.e. a memtable
flush target) not only wants to ensure segments are clean and released,
but thoroughly deleted/recycled, and hence pose no threat of
resurrecting data on crash+restart.

Test included.

Closes scylladb/scylladb#16801
2024-01-17 09:30:55 +02:00
Kefu Chai
84a9d2fa45 add formatter for auth::role_or_anonymous
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for auth::role_or_anonymous,
and remove its operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16812
2024-01-17 09:28:13 +02:00
Kefu Chai
3f0fbdcd86 replica: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16810
2024-01-17 09:27:09 +02:00
Tomasz Grabiec
3d76aefb98 Merge "Enhance topology request status tracking" from Gleb
Currently, to figure out if a topology request is complete, a submitter
checks the topology state and tries to infer the status of the request
from it. This is not exact. Let's look at rebuild handling, for
instance. To figure out if a request is completed, the code waits for
the request object to disappear from the topology, but if another
rebuild starts between the end of the previous one and the code noticing
that it completed, the code will continue waiting for the next rebuild.
Another problem is that in case of operation failure there is no way to
pass an error back to the initiator.

This series solves those problems by assigning an id to each request and
tracking the status of each request in a separate table. The initiator
can query the request status from the table and see if the request was
completed successfully or if it failed with an error, which is also
available from the table.

The schema for the table is:

    CREATE TABLE system.topology_requests (
        id timeuuid PRIMARY KEY,

        initiating_host uuid,
        start_time timestamp,

        done boolean,
        error text,
        end_time timestamp,
    );

and all entries have TTL of one month.
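
As an illustration, an initiator could interpret a row of this table as
follows. This is a Python sketch, not the actual implementation: the
`TopologyRequestRow` class and the `request_outcome` helper are
hypothetical, and the sketch assumes that a failed request is marked
`done` with a non-empty `error`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from uuid import UUID, uuid1


@dataclass
class TopologyRequestRow:
    """In-memory model of one row of system.topology_requests."""
    id: UUID
    initiating_host: Optional[UUID] = None
    start_time: Optional[datetime] = None
    done: bool = False
    error: Optional[str] = None
    end_time: Optional[datetime] = None


def request_outcome(row: TopologyRequestRow) -> str:
    """Classify a request as pending, succeeded, or failed."""
    if not row.done:
        return "pending"
    if row.error:  # assumed convention: empty or null error means success
        return f"failed: {row.error}"
    return "succeeded"


if __name__ == "__main__":
    row = TopologyRequestRow(id=uuid1(), done=True, error="node down")
    print(request_outcome(row))
```

Because every request has its own id, a later rebuild cannot be mistaken
for an earlier one, which is the ambiguity described above.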
2024-01-17 00:37:19 +01:00
Benny Halevy
d6071945c8 compaction, table: ignore foreign sstables replay_position
The sstables replay_position in stats_metadata is
valid only on the originating node and shard.

Therefore, validate the originating host and shard
before using it in compaction or table truncate.
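
The check can be sketched in Python as below. The `SSTableStats` fields
and the `usable_replay_position` helper are hypothetical names used for
illustration. They only model the rule stated above: a replay position
is trusted only when the sstable originated on the local host and shard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SSTableStats:
    """Hypothetical subset of an sstable's stats metadata."""
    originating_host: str
    originating_shard: int
    replay_position: int


def usable_replay_position(sst: SSTableStats,
                           local_host: str,
                           local_shard: int) -> Optional[int]:
    """Return the replay position only if the sstable is local."""
    if sst.originating_host == local_host and sst.originating_shard == local_shard:
        return sst.replay_position
    # Foreign sstable: its replay position is meaningless here, ignore it.
    return None
```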

Fixes #10080

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16550
2024-01-16 18:45:59 +02:00
Benny Halevy
7a7a1db86b sstables_loader: load_new_sstables: auto-enable load-and-stream for tablets
And call on_internal_error if process_upload_dir
is called for a tablets-enabled keyspace, as it isn't
supported at the moment (maybe it could be in the future
if we make sure that the sstables are confined to tablet
boundaries).

Refs #12775
Fixes #16743

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16788
2024-01-16 18:43:52 +02:00
Gleb Natapov
9a7243d71a storage_service: topology coordinator: Consolidate some mutation builder code 2024-01-16 17:02:54 +02:00
Gleb Natapov
a145a73136 storage_service: topology coordinator: make topology operation rollback error more informative
Include an error which caused the rollback.
2024-01-16 17:02:54 +02:00
Gleb Natapov
bf91eb37f2 storage_service: topology coordinator: make topology operation cancellation error more informative
Include the list of nodes that were down when cancellation happened.
2024-01-16 17:02:54 +02:00
Gleb Natapov
8beb399b72 storage_service: topology coordinator: consolidate some code in cancel_all_requests
There is a code duplication that can be avoided.
2024-01-16 17:02:54 +02:00
Gleb Natapov
fba6877b3e storage_service: topology coordinator: TTL topology request table
To prevent unbounded growth of the topology_requests table, all writes
are given a TTL so that they expire after a month.
2024-01-16 17:02:54 +02:00
Gleb Natapov
d576ed31dc storage_service: topology request: drop explicit shutdown rpc
Now that we have explicit status for each request we may use it to
replace shutdown notification rpc. During a decommission, in
left_token_ring state, we set done to true after metadata barrier
that waits for all requests to the decommissioning node to complete
and notify the decommissioning node with a regular barrier. At this
point the node will see that the request is complete and exit.
2024-01-16 17:02:54 +02:00
Gleb Natapov
84197ff735 storage_service: topology coordinator: check topology operation completion using status in topology_requests table
Instead of trying to guess whether a request completed by looking into the
topology state (which can sometimes be error prone), look at the
request status in the new topology_requests table. If the request failed,
report the reason for the failure from the table.
2024-01-16 17:02:54 +02:00
Kefu Chai
0092700ad1 memtable: add formatter for replica::{memtable,memtable_entry}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for replica::memtable and
replica::memtable_entry, and remove their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16793
2024-01-16 16:46:52 +02:00
Kefu Chai
2dbf044b91 cql3: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16791
2024-01-16 16:43:17 +02:00
Avi Kivity
a9844ed69a Merge 'view: revert cleanup filter that doesn't work with tablets' from Nadav Har'El
The goal of this PR is to fix Scylla so that the dtest test_mvs_populating_from_existing_data, which starts to fail when enabling tablets, will pass.

The main fix (the second patch) is reverting code which doesn't work with tablets, and I explain why I think this code was not necessary in the first place.

Fixes #16598

Closes scylladb/scylladb#16670

* github.com:scylladb/scylladb:
  view: revert cleanup filter that doesn't work with tablets
  mv: sleep a bit before view-update-generator restart
2024-01-16 16:42:20 +02:00
Gleb Natapov
1c18476385 storage_service: topology coordinator: update topology_requests table with requests progress
Make topology coordinator update request's status in topology_requests table as it changes.
2024-01-16 15:35:18 +02:00
Benny Halevy
e277ec6aef force_keyspace_cleanup: skip keyspaces that do not require or support cleanup
Local keyspaces do not need cleanup, and
keyspaces configured with tablets, where their
replication strategy is per-table do not support
cleanup.

In both cases, just skip their cleanup via the api.

Fixes #16738

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16785
2024-01-16 15:01:49 +03:00
Gleb Natapov
1ce1c5001d topology coordinator: add topology_requests table to group0 snapshot
Since the table is updated through raft's group0 state machine its
content needs to be part of the snapshot.
2024-01-16 13:57:27 +02:00
Gleb Natapov
584551f849 topology coordinator: add request_id to the topology state machine
Provide a unique ID for each topology request and store it in the topology
state machine. It will be used to index the new topology requests table in
order to retrieve request status.
2024-01-16 13:57:27 +02:00
Gleb Natapov
ecb8778950 system keyspace: introduce local table to store topology requests status
The table has the following schema and will be managed by raft:

CREATE TABLE system.topology_requests (
    id timeuuid PRIMARY KEY,

    initiating_host uuid,
    start_time timestamp,

    done boolean,
    error text,
    end_time timestamp,
);

If a request completes with an error, the "error" field will be non-empty when "done" is set to true.
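
As a rough illustration of how an initiator might interpret a row read from this table, here is a minimal Python sketch. The `RequestStatus` class and the `describe` helper are hypothetical names used only for illustration and are not part of ScyllaDB; only the `done` and `error` column names come from the schema above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestStatus:
    """Hypothetical in-memory view of one system.topology_requests row."""
    done: bool
    error: Optional[str] = None


def describe(row: RequestStatus) -> str:
    # The request is still in flight until "done" is set to true.
    if not row.done:
        return "in progress"
    # A finished request with a non-empty "error" is a failure.
    if row.error:
        return f"failed: {row.error}"
    return "succeeded"
```

An empty or missing `error` on a finished row is treated as success here, which matches the description above but is still an assumption about how callers read the table.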
2024-01-16 13:57:16 +02:00
Tomasz Grabiec
49026dc319 Merge 'Turn on tablets on keyspace by default when the feature is enabled' from Pavel Emelyanov
To enable tablets replication one needs to turn on the (experimental) feature and specify the `initial_tablets: N` option when creating a keyspace. We want tablets to become the default in the future and allow users to explicitly opt out if they want to.

This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is

* `CREATE KEYSPACE ...` will turn tablets on or off based on cluster feature being enabled/disabled
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }` will turn tablets off regardless of what
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }` will try to enable tablets with default configuration
* `CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }` is now the replacement for `REPLICATION = { ... 'initial_tablets': <int> }` thing
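
The cases listed above can be read as a small decision function. The Python sketch below is only an illustration: `tablets_enabled` is a hypothetical helper, and two of its behaviours are assumptions that the text does not confirm (that `'enabled': true` still requires the cluster feature, and that passing only `'initial'` counts as asking for tablets).

```python
from typing import Optional


def tablets_enabled(feature_on: bool, tablets_opts: Optional[dict]) -> bool:
    """Hypothetical helper mirroring the CREATE KEYSPACE cases listed above."""
    if tablets_opts is None:
        # No TABLETS map at all: follow the cluster feature.
        return feature_on
    if "enabled" in tablets_opts:
        # Explicit switch. Assumption: enabling still needs the feature.
        return bool(tablets_opts["enabled"]) and feature_on
    # Only 'initial' given. Assumption: this is a request for tablets.
    return feature_on
```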

fixes: #16319

Closes scylladb/scylladb#16364

* github.com:scylladb/scylladb:
  code: Enable tablets if cluster feature is enabled
  test: Turn off tablets feature by default
  test: Move test_tablet_drain_failure_during_decommission to another suite
  test/tablets: Enable tables for real on test keyspace
  test/tablets: Make timestamp local
  cql3: Add feature service to as_ks_metadata_update()
  cql3: Add feature service to ks_prop_defs::as_ks_metadata()
  cql3: Add feature service to get_keyspace_metadata()
  cql: Add tablets on/off switch to CREATE KEYSPACE
  cql: Move initial_tablets from REPLICATION to TABLETS in DDL
  network_topology_strategy: Estimate initial_tablets if 0 is set
2024-01-16 00:15:10 +01:00
Avi Kivity
5e70dd1dbe database: don't allow keyspace objects to be copied
keyspace objects are heavyweight and copies are immediately out-of-date,
so copying them is bad.

Fix by deleting the copy constructor and copy assignment operator. One
call site is fixed. This call site is safe since it's only used
for accessing a few attributes (introduced in f70c4127c6).

Closes scylladb/scylladb#16782
2024-01-15 21:48:32 +01:00
Botond Dénes
204d3284fa readers/multishard: evictable_reader::fast_forward_to(): close reader on exception
When the reader is currently paused, it is resumed, fast-forwarded, then
paused again. The fast forwarding part can throw and this will lead to
destroying the reader without it being closed first.
Add a try-catch surrounding this part in the code. Also mark
`maybe_pause()` and `do_pause()` as noexcept, to make it clear why
that part doesn't need to be in the try-catch.

Fixes: #16606

Closes scylladb/scylladb#16630
2024-01-15 20:55:55 +01:00
Kefu Chai
e5300f3e21 topology_state_machine: add formatter for service::cleanup_status
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for service::cleanup_status,
and remove its operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16778
2024-01-15 21:31:42 +02:00
Anna Stuchlik
af1405e517 doc: remove support for CentOS 7
This commit removes support for CentOS 7
from the docs.

The change applies to version 5.4, so it
must be backported to branch-5.4.

Refs https://github.com/scylladb/scylla-enterprise/issues/3502

In addition, this commit removes the information
about Amazon Linux and Oracle Linux, unnecessarily added
without request, and there's no clarity over which versions
should be documented.

Closes scylladb/scylladb#16279
2024-01-15 15:37:29 +02:00
Anna Stuchlik
bca39b2a93 doc: remove Serverless from the Drivers page
This commit removes the information about ScyllaDB Cloud Serverless,
which is no longer valid.

Closes scylladb/scylladb#16700
2024-01-15 15:36:51 +02:00
Botond Dénes
66bef6e961 cql3: cluster_describe_statement: don't produce range ownership for tablet keyspaces
Tablet keyspaces have per-table range ownership, which cannot currently
be expressed in a DESC CLUSTER statement, which describes range
ownership in the current keyspace (if set). Until we figure out how to
represent range ownership (tablets) of all tables of a keyspace, we
disable range ownership for tablet keyspaces.

Fixes: #16483

Closes scylladb/scylladb#16713
2024-01-15 14:03:54 +01:00
Patryk Wrobel
aec0db1b96 cql_auth_query_test.cc: do not rely on templated operator<<
This change is intended to remove the dependency to
operator<<(std::ostream&, const std::unordered_set<seastar::sstring>&)
from test/boost/cql_auth_query_test.cc.

It prepares the test for removal of the templated helpers.
Such removal is one of goals of the referenced issue that is linked below.

Refs: #13245

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16758
2024-01-15 13:30:05 +02:00
Kefu Chai
ece2bd2f6e service: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16764
2024-01-15 13:29:33 +02:00
Kefu Chai
fc97d91f1a auth: add fmt::format for auth::resource and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we

* define a formatter for `auth::resource` and friends,
* update their callers of `operator<<` to use `fmt::print()`.
* drop `operator<<`, as they are not used anymore.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16765
2024-01-15 13:26:39 +02:00
Kefu Chai
f344e13066 types: add formatter for data_value
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for data_value, but
its operator<<() is preserved as we are still using the generic
homebrew formatter for formatting std::vector, which in turn uses
operator<< of the element type.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16767
2024-01-15 13:18:23 +02:00
Kefu Chai
218334eaf5 test/nodetool: use build/$CMAKE_BUILD_TYPE when appropriate
because the CMake-generated build.ninja is located under build/,
and it puts the `scylla` executable at build/$CMAKE_BUILD_TYPE/scylla,
instead of at build/$scylla_build_mode/scylla, so let's adapt to this
change accordingly.

we will promote this change to a shared place if we have similar
needs in other tests as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16775
2024-01-15 12:52:35 +02:00
Pavel Emelyanov
dd892b0d8a code: Enable tablets if cluster feature is enabled
If the TABLETS map is missing in the CREATE KEYSPACE statement the
tablets are anyway enabled if the respective cluster feature is enabled.

To opt-out keyspaces one may use TABLETS = { 'enabled': false } syntax.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
4838eeb201 test: Turn off tablets feature by default
Next patches will make the per-keyspace initial_tablets option really
optional and turn tablets ON when the feature is ON. This will break all
other tests' assumptions that they are testing vnodes replication. So
turn the feature off by default; tests that do need tablets will need to
explicitly enable this feature on their own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
ae7da54f88 test: Move test_tablet_drain_failure_during_decommission to another suite
In its current location it will be started with 3 pre-created scylla
nodes with default features ON. Next patch will exclude `tablets` from
the default list, so the test needs to create servers on its own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
46b36d8c07 test/tablets: Enable tables for real on test keyspace
When started cql_test_env creates a test keyspace. Some tablets test
cases create a table in this keyspace, but misuse the whole feature. The
thing is that while the tablets feature is ON in those test cases, the
keyspace itself does _not_ have the initial_tablets option and thus
tablets are not enabled for the ks' table for real. Currently test cases
work just because this table is only used as a transparent table ID
placeholder. If tablets were turned on for the keyspace, several test cases
would break for two reasons.

First, the tablets map will no longer be empty on test start.

Second, changes applied to tablet metadata may not be visible, because
the test case uses a "random" timestamp that can be less than the initial
metadata mutations' timestamp.

This patch fixes all three places:

1. enables tablets for the test keyspace
2. removes assumption that the initial metadata is empty
3. uses large enough timestamp for subsequent mutations

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
2376b699e0 test/tablets: Make timestamp local
Just to make next patching simpler

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
f3a69bfaca cql3: Add feature service to as_ks_metadata_update()
To call prepare_options() with tablets feature state later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
4dede19e4f cql3: Add feature service to ks_prop_defs::as_ks_metadata()
To call prepare_options() with tablets feature state later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
267770bf0f cql3: Add feature service to get_keyspace_metadata()
To be passed down to ks_prop_defs::as_ks_metadata()

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
6cb3055059 cql: Add tablets on/off switch to CREATE KEYSPACE
Now the user can do

  CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }

to turn tablets off. It will be useful in the future to opt-out keyspace
from tablets when they will be turned on by default based on cluster
features only.

Also one can do just

  CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }

and let Scylla select the initial tablets value on its own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:11 +03:00
Pavel Emelyanov
941f6d8fca cql: Move initial_tablets from REPLICATION to TABLETS in DDL
This patch changes the syntax of enabling tablets from

  CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }

to be

  CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }

and updates all tests accordingly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:04:48 +03:00
Pavel Emelyanov
4c4a9679d8 network_topology_strategy: Estimate initial_tablets if 0 is set
If the user configured zero initial tablets (spoiler: or this value was set
automagically when enabling tablets behind the scenes) we still need
some value to start with and this patch calculates one.

The math is based on topology and RF so that all shards are covered:

initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters)

The estimation is done when a table is created, not when the keyspace is
created. For that, the keyspace is configured with zero initial tablets,
and at table-creation time zero is converted into an auto-estimated value.
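
A worked version of the formula above, as a minimal Python sketch. The function name and the dictionary inputs are hypothetical, and rounding up is an assumption (the text does not say how a non-integer quotient is handled); it is chosen here only because the stated goal is that all shards are covered.

```python
def estimate_initial_tablets(shards_per_dc: dict, rf_per_dc: dict) -> int:
    """Hypothetical helper: max over DCs of shards / RF.

    Assumes every datacenter listed has RF > 0.
    Assumption: the quotient is rounded up (ceiling division).
    """
    return max(
        -(-shards_per_dc[dc] // rf_per_dc[dc])  # ceiling division
        for dc in rf_per_dc
    )


# dc1: 24 shards / RF 3 = 8, dc2: 10 shards / RF 3 rounds up to 4
print(estimate_initial_tablets({"dc1": 24, "dc2": 10}, {"dc1": 3, "dc2": 3}))
```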

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:04:48 +03:00
Kamil Braun
423234841e Merge 'add automatic sstable cleanup to the topology coordinator' from Gleb
For correctness sstable cleanup has to run between (some) topology
changes.  Sometimes even a failed topology change may require running
the cleanup.  The series introduces automatic sstable cleanup step to the
topology change coordinator. Unlike other operations it is not represented
as a global transition state, but done by each node independently which
allows cleanup to run without locking the topology state machine so
tablet code can run in parallel with the cleanup.

It is done by having a cleanup state flag for each node in the
topology. The flag is a tri state: "clean" - the node is clean, "needed"
- cleanup is needed (but not running), "running" - cleanup is running. No
topology operation can proceed if there is a node in "running" state, but
some operations can proceed even if there are nodes in "needed" state. If
the coordinator needs to perform a topology operation that cannot run while
there are nodes that need cleanup the coordinator will start one
automatically and continue only after cleanup completes. There is also a
possibility to kick cleanup manually through the new RAFT API call.
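
The tri-state flag and the gating rule described above can be sketched in Python as follows. `CleanupStatus` and `can_proceed` are hypothetical names for illustration, and the `requires_clean_cluster` parameter stands in for whatever property the coordinator actually checks to decide that an operation cannot run while cleanup is pending.

```python
from enum import Enum


class CleanupStatus(Enum):
    CLEAN = "clean"      # the node is clean
    NEEDED = "needed"    # cleanup is needed but not running
    RUNNING = "running"  # cleanup is running


def can_proceed(node_states, requires_clean_cluster: bool) -> bool:
    """Hypothetical gate for starting a topology operation."""
    states = list(node_states)
    # No topology operation may proceed while any node runs cleanup.
    if CleanupStatus.RUNNING in states:
        return False
    # Some operations must also wait for nodes that still need cleanup.
    if requires_clean_cluster and CleanupStatus.NEEDED in states:
        return False
    return True
```

In the second case the coordinator, per the description above, would start the cleanup itself and retry once it completes; the sketch only models the check.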

* 'cleanup-needed-v8' of https://github.com/gleb-cloudius/scylla:
  test: add test for automatic cleanup procedure
  test: add test for topology requests queue management
  storage_service: topology coordinator: add error injection point to be able to pause the topology coordinator
  storage_service: topology coordinator: add logging to removenode and decommission
  storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator
  storage_service: topology coordinator: manage cluster cleanup as part of the topology management
  storage_service: topology coordinator: provide a version of get_excluded_nodes that does not need node_to_work_on as a parameter
  test: use servers_see_each_other when needed
  test: add servers_see_each_other helper
  storage_service: topology coordinator: make topology coordinator lifecycle subscriber
  system_keyspace: raft topology: load ignore nodes parameter together with removenode topology request
  storage_service: topology coordinator: introduce sstable cleanup fiber
  storage_proxy: allow to wait for all ongoing writes
  storage_service: topology coordinator: mark nodes as needing cleanup when required
  storage_service: add mark_nodes_as_cleanup_needed function
  vnode_effective_replication_map: add get_all_pending_nodes() function
  vnode_effective_replication_map: pre calculate dirty endpoints during topology change
  raft topology: add cleanup state to the topology state machine
2024-01-14 18:54:02 +01:00
Gleb Natapov
f8b90aeb14 test: add test for automatic cleanup procedure
The test runs two bootstraps and checks that there is no cleanup
in between.  Then it runs a decommission and checks that cleanup runs
automatically and then it runs one more decommission and checks that no
cleanup runs again.  Second part checks manual cleanup triggering. It
adds a node, triggers cleanup through the REST API, checks that it runs,
decommissions a node and checks that the cleanup did not run again.
2024-01-14 15:45:53 +02:00
Gleb Natapov
5882855669 test: add test for topology requests queue management
This test creates a 5 node cluster with 2 down nodes (A and B). After
that it creates a queue of 3 topology operations: bootstrap, removenode
A and removenode B with ignore_nodes=A. It checks that all operations
manage to complete.  Then it downs one node and creates a queue with
two requests: bootstrap and decommission. Since neither can proceed both
should be canceled.
2024-01-14 15:45:53 +02:00
Gleb Natapov
ba7aa0d582 storage_service: topology coordinator: add error injection point to be able to pause the topology coordinator 2024-01-14 15:45:53 +02:00
Gleb Natapov
1afc891bd5 storage_service: topology coordinator: add logging to removenode and decommission
Add some useful logging to removenode and decommission to be used by
tests later.
2024-01-14 15:45:53 +02:00
Gleb Natapov
97ab3f6622 storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator
Introduce new REST API "/storage_service/cleanup_all"
that, when triggered, instructs the topology coordinator to initiate
cluster wide cleanup on all dirty nodes. It is done by introducing new
global command "global_topology_request::cleanup".
2024-01-14 15:45:53 +02:00
Gleb Natapov
0adb3904d8 storage_service: topology coordinator: manage cluster cleanup as part of the topology management
Sometimes it is unsafe to start a new topology operation before cleanup
runs on dirty nodes. This patch detects the situation when the topology
operation to be executed cannot be run safely until all dirty nodes do
cleanup and initiates the cleanup automatically. It also waits for
cleanup to complete before proceeding with the topology operation.

There can be a situation where a node that needs cleanup dies and will
never clear the flag. In this case, if the topology operation that wants to
run next does not have this node in its ignore node list, it may get stuck
forever. To fix this the patch also introduces "liveness aware"
request queue management: we do not simply choose _a_ request to run next,
but go over the queue and find requests that can proceed considering
the node liveness situation. If there are multiple requests eligible to
run, the patch orders them by operation type: replace,
join, remove, leave, rebuild. The order is chosen so as not to trigger
cleanup needlessly.
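
A minimal Python sketch of the queue selection described above. All names here are hypothetical, and the eligibility rule is an assumption: a request is treated as runnable when every dead node is one it tolerates (for example its ignore list, plus the node being removed). Only the operation order comes from the text.

```python
from dataclasses import dataclass

# Priority order taken from the commit message.
OP_ORDER = ["replace", "join", "remove", "leave", "rebuild"]


@dataclass(frozen=True)
class Request:
    op: str
    # Hypothetical: nodes this request tolerates being down.
    tolerated_dead: frozenset = frozenset()


def pick_next(queue, dead_nodes):
    """Return the request to run next, or None if none can proceed."""
    dead = set(dead_nodes)
    # Assumed eligibility rule: all dead nodes must be tolerated.
    eligible = [r for r in queue if dead <= r.tolerated_dead]
    if not eligible:
        return None
    # Among eligible requests, prefer the earlier operation type.
    return min(eligible, key=lambda r: OP_ORDER.index(r.op))
```

When `pick_next` returns `None`, no queued request can make progress with the current set of live nodes; how the coordinator reacts to that is outside this sketch.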
2024-01-14 15:45:50 +02:00
Nadav Har'El
2d04070120 Update seastar submodule
* seastar 0ffed835...8b9ae36b (4):
  > net/posix: Track ap-server ports conflict

Fixes #16720

  > include/seastar/core: do not include unused header
  > build: expose flag like -std=c++20 via seastar.pc
  > src: include used headers for C++ modules build

Closes scylladb/scylladb#16769
2024-01-14 14:51:11 +02:00
Gleb Natapov
c9b7bd5a33 storage_service: topology coordinator: provide a version of get_excluded_nodes that does not need node_to_work_on as a parameter
Needed by the next patch.
2024-01-14 14:44:07 +02:00
Gleb Natapov
0e68073b22 test: use servers_see_each_other when needed
In the next patch we want to abort topology operations if there are not
enough live nodes to perform them. This will break tests that do a
topology operation right after restarting a node since a topology
coordinator may still not see the restarted node as alive. Fix all those
tests to wait between restart and a topology operation until UP state
propagates.
2024-01-14 14:44:07 +02:00
Gleb Natapov
455ffaf5d8 test: add servers_see_each_other helper
The helper makes sure that all nodes in the cluster see each other as
alive.
2024-01-14 14:44:07 +02:00
Gleb Natapov
067267ff76 storage_service: topology coordinator: make topology coordinator lifecycle subscriber
We want to change the coordinator to consider node liveness when
processing the topology operation queue. If there are not enough live
nodes to process any of the ops we want to cancel them. For that to work
we need to be able to kick the coordinator if the liveness situation
changes.
2024-01-14 14:44:07 +02:00
Gleb Natapov
a4ac64a652 system_keyspace: raft topology: load ignore nodes parameter together with removenode topology request
Next patch will need ignore nodes list while processing removenode
request. Load it.
2024-01-14 14:44:07 +02:00
Gleb Natapov
f70c4127c6 storage_service: topology coordinator: introduce sstable cleanup fiber
Introduce a fiber that waits on a topology event and when it sees that
the node it runs on needs to perform sstable cleanup it initiates one
for each non tablet, non local table and resets "cleanup" flag back to
"clean" in the topology.
2024-01-14 14:44:07 +02:00
Gleb Natapov
5b246920ae storage_proxy: allow to wait for all ongoing writes
We want to be able to wait for all writes started through the storage
proxy before a fence is advanced. Add phased_barrier that is entered
on each local write operation before checking the fence to do so. A
write will be either tracked by the phased_barrier or fenced. This will
be needed to wait for all non fenced local writes to complete before
starting a cleanup.
2024-01-14 14:44:07 +02:00
Gleb Natapov
b2ba77978c storage_service: topology coordinator: mark nodes as needing cleanup when required
A cleanup needs to run when a node loses ownership of a range (during
bootstrap) or if a range movement to a normal node failed (removenode,
decommission failure). Mark all dirty nodes as "cleanup needed" in those cases.
2024-01-14 14:43:59 +02:00
Gleb Natapov
dbededb1a6 storage_service: add mark_nodes_as_cleanup_needed function
The function creates a mutation that sets cleanup to "needed" for each
normal node that, according to the erm, has data it does not own after
successful or unsuccessful topology operation.
2024-01-14 14:43:33 +02:00
Gleb Natapov
23a27ccc24 vnode_effective_replication_map: add get_all_pending_nodes() function
Add a function that returns all nodes that have had vnodes moved to them
during a topology change operation. It is needed to know which nodes need to
do cleanup in case of a failed topology change operation.
2024-01-14 14:37:16 +02:00
Gleb Natapov
a8f11852da vnode_effective_replication_map: pre calculate dirty endpoints during topology change
Some topology change operations cause some nodes to lose ranges. This
information is needed to know which nodes need to do cleanup after
a topology operation completes. Pre-calculate it during erm creation.
2024-01-14 14:11:19 +02:00
Gleb Natapov
cc54796e23 raft topology: add cleanup state to the topology state machine
The patch adds cleanup state to the persistent and in memory state and
handles the loading. The state can be "clean" which means no cleanup
needed, "needed" which means the node is dirty and needs to run cleanup
at some point, "running" which means that cleanup is being run by the node
right now and when it completes the state will be reset to "clean".
2024-01-14 13:30:54 +02:00
Nadav Har'El
1bcaeb89c7 view: revert cleanup filter that doesn't work with tablets
This patch reverts commit 10f8f13b90 from
November 2022. That commit added to the "view update generator", the code
which builds view updates for staging sstables, a filter that ignores
ranges that do not belong to this node. However,

1. I believe this filter was never necessary, because the view update
   code already silently ignores base updates which do not belong to
   this replica (see get_view_natural_endpoint()). After all, the view
   update needs to know that this replica is the Nth owner of the base
   update to send its update to the Nth view replica, but if no such
   N exists, no view update is sent.

2. The code introduced for that filter used a per-keyspace replication
   map, which was ok for vnodes but no longer works for tablets, and
   causes the operation using it to fail.

3. The filter was used every time the "view update generator" was used,
   regardless of whether any cleanup is necessary or not, so every
   such operation would fail with tablets. So for example the dtest
   test_mvs_populating_from_existing_data fails with tablets:
     * This test has view building in parallel with automatic tablet
       movement.
     * Tablet movement is streaming.
     * When streaming happens before view building has finished, the
       streamed sstables get "view update generator" run on them.
       This causes the problematic code to be called.

Before this patch, the dtest test_mvs_populating_from_existing_data
fails when tablets are enabled. After this patch, it passes.

Fixes #16598

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-14 13:24:44 +02:00
Nadav Har'El
0fe40f729e mv: sleep a bit before view-update-generator restart
The "view update generator" is responsible for generating view updates
for staging sstables (such as coming from repair). If the processing
fails, the code retries - immediately. If there is some persistent bug,
such as issue #16598, we will have a tight loop of error messages,
potentially a gigabyte of identical messages every second.

In this patch we simply add a sleep of one second after view update
generation fails before retrying. We can still get many identical
error messages if there is some bug, but not more than one per second.
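
The retry-with-delay behaviour described above can be sketched in Python as follows. This is not the actual view update generator code; `run_with_retry`, its parameters, and the injectable `sleep` are hypothetical and exist only to make the idea concrete and testable.

```python
import time


def run_with_retry(step, delay_s=1.0, sleep=time.sleep, max_attempts=None):
    """Call step() until it succeeds, sleeping between failed attempts.

    max_attempts is a hypothetical safety valve; None means retry forever.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return step()
        except Exception as exc:
            if max_attempts is not None and attempts >= max_attempts:
                raise
            # At most one message per delay_s, instead of a tight loop.
            print(f"processing failed: {exc}; retrying in {delay_s}s")
            sleep(delay_s)
```

Passing a fake `sleep` makes it possible to check the number of delays without actually waiting.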

Refs #16598.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-14 13:13:52 +02:00
Kamil Braun
4e18f8b453 Merge 'topology_state_load: stop waiting for IP-s' from Petr Gusev
The loop in `id2ip` lambda causes problems if we are applying an old raft
log that contains long-gone nodes. In this case, we may never receive
the `IP` for a node and get stuck in the loop forever. In this series we
replace the loop with an if - we just don't update the `host_id <-> ip`
mapping in the `token_metadata.topology` if we don't have an `IP` yet.

The PR moves `host_id -> IP` resolution to the data plane, now it
happens each time the IP-based methods of `erm` are called. We need this
because IPs may not be known at the time the erm is built. The overhead
of `raft_address_map` lookup is added to each data plane request, but it
should be negligible. In this PR `erm/resolve_endpoints` continues to
treat missing IP for `host_id` as `internal_error`, but we plan to relax
this in the follow-up (see this PR first comment).

Closes scylladb/scylladb#16639

* github.com:scylladb/scylladb:
  raft ips: rename gossiper_state_change_subscriber_proxy -> raft_ip_address_updater
  gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes
  storage_service: topology_state_load: remove IP waiting loop
  storage_service: sync_raft_topology_nodes: add target_node parameter
  storage_service: sync_raft_topology_nodes: move loops to the end
  storage_service: sync_raft_topology_nodes: rename extract process_left_node and process_transition_node
  storage_service: sync_raft_topology_nodes: rename add_normal_node -> process_normal_node
  storage_service: sync_raft_topology_nodes: move update_topology up
  storage_service: topology_state_load: remove clone_async/clear_gently overhead
  storage_service: fix indentation
  storage_service: extract sync_raft_topology_nodes
  storage_service: topology_state_load: move remove_endpoint into mutate_token_metadata
  address_map: move gossiper subscription logic into storage_service
  topology_coordinator: exec_global_command: small refactor, use contains + reformat
  storage_service: wait_for_ip for new nodes
  storage_service.idl.hh: fix raft_topology_cmd.command declaration
  erm: for_each_natural_endpoint_until: use is_vnode == true
  erm: switch the internal data structures to host_id-s
  erm: has_pending_ranges: switch to host_id
2024-01-12 18:46:51 +01:00
Petr Gusev
e24bee545b raft ips: rename gossiper_state_change_subscriber_proxy -> raft_ip_address_updater 2024-01-12 18:29:22 +04:00
Petr Gusev
6e7bbc94f4 gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes
When a node changes its IP we need to store the mapping in
system.peers and update token_metadata.topology and erm
in-memory data structures.

The test_change_ip was improved to verify this new
behaviour. Before this patch the test didn't check
that IPs used for data requests are updated on
IP change. In this commit we add the read/write check.
It fails on insert with 'node unavailable'
error without the fix.
2024-01-12 18:28:57 +04:00
Petr Gusev
6d6e1ba8fb storage_service: topology_state_load: remove IP waiting loop
The loop causes problems if we are applying an old
raft log that contains long-gone nodes. In this case, we may
never receive the IP for a node and get stuck in the loop forever.

The idea of the patch is to replace the loop with an
if - we just don't update the host_id <-> ip mapping
in the token_metadata.topology if we don't have an IP yet.
When we get the mapping later, we'll call
sync_raft_topology_nodes again from
gossiper_state_change_subscriber_proxy.
2024-01-12 15:37:50 +04:00
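The "replace the loop with an if" idea from 6d6e1ba8fb can be sketched roughly as below. This is only an illustration, not the actual storage_service code: `host_id`, `ip_address`, `address_map` and `sync_nodes` are hypothetical stand-ins, and returning the skipped nodes is an assumption made so the caller can retry once the mapping arrives.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for locator::host_id and an IP address type.
using host_id = std::string;
using ip_address = std::string;

// Hypothetical address map: host_id -> IP, filled in as IPs become known.
struct address_map {
    std::map<host_id, ip_address> known;

    std::optional<ip_address> find(const host_id& id) const {
        auto it = known.find(id);
        if (it == known.end()) {
            return std::nullopt;
        }
        return it->second;
    }
};

// Updates the host_id <-> IP mapping only for nodes whose IP is already known.
// Nodes without an IP are skipped instead of waited for; they are returned so
// that a later notification can trigger another sync for them.
std::vector<host_id> sync_nodes(const std::vector<host_id>& nodes,
                                const address_map& am,
                                std::map<host_id, ip_address>& topology) {
    std::vector<host_id> skipped;
    for (const auto& id : nodes) {
        if (auto ip = am.find(id)) {
            topology[id] = *ip;
        } else {
            skipped.push_back(id); // no IP yet: do not block, just move on
        }
    }
    return skipped;
}
```

With this shape, a long-gone node that never gets an IP simply stays in the skipped list and cannot block the rest of the update.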
Petr Gusev
260874c860 storage_service: sync_raft_topology_nodes: add target_node parameter
If it's set, instead of going over all the nodes in raft topology,
the function will update only the specified node. This parameter
will be used in the next commit, in the call to sync_raft_topology_nodes
from gossiper_state_change_subscriber_proxy.
2024-01-12 15:37:50 +04:00
Petr Gusev
a9d58c3db5 storage_service: sync_raft_topology_nodes: move loops to the end 2024-01-12 15:37:50 +04:00
Petr Gusev
d1bce3651b storage_service: sync_raft_topology_nodes: rename extract process_left_node and process_transition_node 2024-01-12 15:37:50 +04:00
Petr Gusev
aa37b6cfd3 storage_service: sync_raft_topology_nodes: rename add_normal_node -> process_normal_node 2024-01-12 15:37:50 +04:00
Petr Gusev
a508d7ffc5 storage_service: sync_raft_topology_nodes: move update_topology up
In this and the following commits we prepare sync_raft_topology_nodes
to handle target_node parameter - the single host_id which should be
updated.
2024-01-12 15:37:50 +04:00
Petr Gusev
1b12f4b292 storage_service: topology_state_load: remove clone_async/clear_gently overhead
Before the patch we used to clone the entire token_metadata
and topology only to immediately drop everything in clear_gently.
This is a sheer waste.
2024-01-12 15:37:50 +04:00
Petr Gusev
1531e5e063 storage_service: fix indentation 2024-01-12 15:37:50 +04:00
Petr Gusev
9c50637f28 storage_service: extract sync_raft_topology_nodes
In the following commits we need part of the
topology_state_load logic to be applied
from gossiper_state_change_subscriber_proxy.
In this commit we extract this logic into a
new function sync_raft_topology_nodes.
2024-01-12 15:37:50 +04:00
Petr Gusev
9679b49cf4 storage_service: topology_state_load: move remove_endpoint into mutate_token_metadata
In the next commit we extract the loops by nodes into
a new function, in this commit we just move them
closer to each other.

Now the remove_endpoint function might be called under
token_metadata_lock (mutate_token_metadata takes it).
It's not a problem since gossiper event handlers in
raft_topology mode don't modify token_metadata so
we won't get a deadlock.
2024-01-12 15:37:50 +04:00
Petr Gusev
15b8e565ed address_map: move gossiper subscription logic into storage_service
We are going to remove the IP waiting loop from topology_state_load
in subsequent commits. An IP for a given host_id may change
after this function has been called by raft. This means we need
to subscribe to the gossiper notifications and call it later
with a new id<->ip mapping.

In this preparatory commit we move the existing address_map
update logic into storage_service so that in later commits
we can enhance it with topology_state_load call.
2024-01-12 15:37:50 +04:00
Petr Gusev
743be190f9 topology_coordinator: exec_global_command: small refactor, use contains + reformat 2024-01-12 15:37:50 +04:00
Petr Gusev
db1f0d5889 storage_service: wait_for_ip for new nodes
When a new node joins the cluster we need to be sure that its IP
is known to all other nodes. In this patch we do this by waiting
for the IP to appear in raft_address_map.

A new raft_topology_cmd::command::wait_for_ip command is added.
It's run on all nodes of the cluster before we put the topology
into transition state. This applies both to new and replacing nodes.
It's important to run wait_for_ip before moving to
topology::transition_state::join_group0 since in this state
node IPs are already used to populate pending nodes in erm.
2024-01-12 15:37:46 +04:00
Michał Jadwiszczak
013487e1e1 test:cql-pytest: change service levels intervals in tests
Set the interval to 0.5s to reduce required sleep time.
2024-01-12 10:28:28 +01:00
Michał Jadwiszczak
f6a464ad81 configure service levels interval
So far the service levels interval, responsible for updating SL configuration,
was hardcoded in main.
Now it's extracted to `service_levels_interval_ms` option.
2024-01-12 10:28:24 +01:00
Kefu Chai
a0e5c14c55 alternator: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16736
2024-01-12 10:53:32 +02:00
Botond Dénes
5f44ae8371 Merge 'Add more logging for gossiper::lock_endpoint and storage_service::handle_state_normal' from Kamil Braun
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.

Thus the plan is to continue debugging using the longevity test, but we need
more logs. To check whether `handle_state_normal` was called and which branches
were taken, include some INFO level logs there. Also, detect deadlocks inside
`gossiper::lock_endpoint` by reporting an error message if `lock_endpoint`
waits for the lock for too long.

Ref: scylladb/scylladb#16668

Closes scylladb/scylladb#16733

* github.com:scylladb/scylladb:
  gossiper: report error when waiting too long for endpoint lock
  gossiper: store source_location instead of string in endpoint_permit
  storage_service: more verbose logging in handle_state_normal
2024-01-12 10:51:21 +02:00
Lakshmi Narayanan Sreethar
cd9e027047 types: fix ambiguity in align_up call
Compilation fails with recent boost versions (>=1.79.0) due to an
ambiguity with the align_up function call. Fix that by adding type
inference to the function call.

Fixes #16746

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16747
2024-01-12 10:50:31 +02:00
Kefu Chai
344ea25ed8 db: add fmt::format for db::consistency_level
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we

* define a formatter for `db::consistency_level`
* drop its `operator<<`, as it is not used anymore

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16755
2024-01-12 10:49:00 +02:00
Patryk Wrobel
87545e40c7 test/boost/auth_resource_test.cc: do not rely on templated operator<<
This change is intended to remove the dependency on
operator<<(std::ostream&, const std::unordered_set<T>&)
from auth_resource_test.cc.

It prepares the test for removal of the templated helpers
from utils/to_string.hh, which is one of the goals of the
referenced issue that is linked below.

Refs: #13245

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16754
2024-01-12 10:48:01 +02:00
Petr Gusev
802da1e7a5 storage_service.idl.hh: fix raft_topology_cmd.command declaration
Make IDL correspond to the declaration of
raft_topology_cmd::command in topology_state_machine.hh.
2024-01-12 12:23:22 +04:00
Petr Gusev
41c15814e6 erm: for_each_natural_endpoint_until: use is_vnode == true
This is an optimisation - for_each_natural_endpoint_until is
called only for vnode tokens, we don't need to run the
binary search for it in tm.first_token.

Also the function is made private since it's only used
in erm itself.
2024-01-12 12:23:22 +04:00
Petr Gusev
07f2ec63c7 erm: switch the internal data structures to host_id-s
Before this patch the host_id -> IP mapping was done
in calculate_effective_replication_map. This function
is called from mutate_token_metadata, which means we
have to have an IP for each host_id in topology_state_load,
otherwise we get an error. We are going to remove
the IP waiting loop from topology_state_load, so
we need to get rid of IPs resolution from
calculate_effective_replication_map.

In this patch we move the host_id -> IP resolution to
the data plane. When a write or read request is sent
the target endpoints are requested from erm through
get_natural_endpoints_without_node_being_replaced,
get_pending_endpoints and get_endpoints_for_reading
methods and this is where the IP resolution
will now occur.
2024-01-12 12:23:22 +04:00
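Moving the host_id -> IP resolution to the data plane (07f2ec63c7) can be pictured with the sketch below. It is a simplified illustration only: `replication_map_sketch`, `resolve` and the type aliases are hypothetical names, and throwing when an IP is missing is an assumption, not necessarily what erm does.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real types.
using host_id = std::string;
using ip_address = std::string;
using token = long;

// Replicas are stored by host_id, so building the map needs no IPs at all.
struct replication_map_sketch {
    std::map<token, std::vector<host_id>> replicas;

    // Resolution happens per request, against the mapping known at that time.
    std::vector<ip_address> resolve(token t,
                                    const std::map<host_id, ip_address>& addrs) const {
        std::vector<ip_address> out;
        auto it = replicas.find(t);
        if (it == replicas.end()) {
            return out;
        }
        for (const auto& id : it->second) {
            auto a = addrs.find(id);
            if (a == addrs.end()) {
                // Assumption: a missing mapping is reported as an error here.
                throw std::runtime_error("no IP known for host " + id);
            }
            out.push_back(a->second);
        }
        return out;
    }
};
```

Because the stored structure holds only host_id-s, an IP change needs no rebuild of the map: the next request simply resolves to the new address.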
Petr Gusev
1928dc73a8 erm: has_pending_ranges: switch to host_id
In the next patches we are going to change erm data structures
(replication_map and ring_mapping) from IP to host_id. Having
locator::host_id instead of IP in has_pending_ranges arguments
makes this transition easier.
2024-01-12 12:23:19 +04:00
Botond Dénes
b69f7126c3 Update tools/java submodule
* tools/java 24e51259...c75ce2c1 (1):
  > Update JNA dependency to 5.14.0
2024-01-12 09:47:20 +02:00
Benny Halevy
3e938dbb5a storage_service: get rid of handle_state_moving declaration
The implementation was already removed in e64613154f

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16742
2024-01-12 09:38:23 +02:00
Nadav Har'El
5c7e029012 test/cql-pytest: add reproducer for task-tracking memory leak
This patch adds a reproducer test for the memory leak described in
issue #16493: If a table is repeatedly created and dropped, memory
is leaked by task tracking. Although this "leak" can be temporary
if task_ttl_in_seconds is properly configured, it may still use too
much memory if tables are too frequently created and dropped.
The test here shows that (before #16493 was fixed) as little as
100 tables created and deleted can cause Scylla to run out of
memory.

The problem is severely exacerbated when tablets are used which is
why the test here uses tablets. Before the fix for #16493 (a Seastar
patch, scylladb/seastar#2023), this test of 100 iterations always
failed (with test/cql-pytest/run's default memory allowance).
After the fix, the test doesn't fail in 100 iterations - and even
if increased manually to 10,000 iterations it doesn't fail.

The new test uses the initial_tablets feature, so requires Scylla to be
run with the "tablets" experimental option turned on. This is not
currently the default of test.py or test/cql-pytest/run, so I turned
it on manually to check this test. I also checked that the test is
correctly skipped if tablets are not turned on.

Refs #16493

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16717
2024-01-12 09:37:32 +02:00
Botond Dénes
63b266e94c Merge ' db: Make the "me" sstable format mandatory' from Kefu Chai
The `me` sstable format includes an important feature of storing the `host_id` of the local node when writing sstables.
This is crucial for validating the sstable's `replay_position` in stats metadata as it is valid only on the originating node and shard (#10080); therefore, we would like to make the `me` format mandatory.

in this series, the `sstable_format` option is deprecated, and the default sstable format is bumped up from `mc` to `md`, so that a cluster composed of nodes with this change should always use `me` as the sstable format. if a node with this change joins a 5.x cluster which is still using `md` because its nodes are configured as such, this node will also be using `md`, unless the other node(s) change their `sstable_format` setting to `me`.

Fixes #16551

Closes scylladb/scylladb#16716

* github.com:scylladb/scylladb:
  db/config.cc: do not respect sstable_format option
  feature_service: abort if sstable_format < md
  db, sstable: bump up default sstable format to "md"
2024-01-12 09:33:08 +02:00
Kamil Braun
cf646022cb gossiper: report error when waiting too long for endpoint lock
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.

One hypothesis is that `gossiper` is stuck on `lock_endpoint`. We dealt
with gossiper deadlocks in the past (e.g. scylladb/scylladb#7127).

Modify the code so it reports an error if `lock_endpoint` waits for the
lock for more than a minute. When the issue reproduces again in
longevity, we will see if `lock_endpoint` got stuck.
2024-01-11 17:29:25 +01:00
Kefu Chai
7abd263ee6 db/config.cc: do not respect sstable_format option
"me" sstable format includes an important feature of storing the
`host_id` of the local node when writing sstables. This is crucial
for validating the sstable's `replay_position` in stats metadata as
it is valid only on the originating node and shard (#10080), therefore
we would like to make the `me` format mandatory.

before making `me` mandatory, we need to stop handling `sstable_format`
option if it is "md".

in this change

- gms/feature_service: do not disable `ME_SSTABLE_FORMAT` even if
  `sstable_format` is configured with "md". and in that case, instead,
  a warning is printed in the logging message to note that
  this setting is not valid anymore.
- docs/architecture/sstable: note that "me" is used by default now.

after this change, "sstable_format" will only accept "me" if it's
explicitly configured. and when a server with this change joins a
cluster, it uses "md" if any of the nodes in the cluster still has
`sstable_format`. practically, this change makes "me" mandatory
in a 6.x cluster, assuming this change will be included in 6.x
releases.

Fixes #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 22:43:05 +08:00
Kefu Chai
bece3eff0c feature_service: abort if sstable_format < md
sstable_format comes from scylla.yaml or from the command line
arguments, and we gate scylla from unallowed sstable formats lower
than `md` when parsing the configuration, and scylla bails out
at seeing the unallowed sstable format like:

```
terminate called after throwing an instance of 'std::invalid_argument'
  what():  Invalid value for sstable_format: got ka which is not inside the set of allowed values md, me
Aborted (core dumped)
```

scylla errors out way before `feature_config_from_db_config()`
gets called -- it throws in `bpo::notify(configuration)`,
way before `func` is evaluated in `app_template::run_deprecated()`.

so, in this change, we do not handle these values anymore, and
consider it a bug if we run into any of them.

Refs #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 22:43:05 +08:00
Kefu Chai
54d49c04e0 db, sstable: bump up default sstable format to "md"
before this change, we default to the "mc" sstable format, and
switch to "md" if the cluster agrees on using it, and to
"me" if the cluster agrees on using this. the cluster feature
is used to get the consensus across the members in the cluster,
if any of the existing nodes in the cluster has its `sstable_format`
configured to, for instance, "mc", then the cluster is stuck with
"mc".

but we disabled "mc" sstable format back in 3d345609, the first LTS
release including that change was scylla v5.2.0. which means, the
cluster of the last major version Scylla should be using "md" or
"me". per our document on upgrade, see docs/upgrade/index.rst,

> You should perform the upgrades consecutively - to each
> successive X.Y version, without skipping any major or minor version.
>
> Before you upgrade to the next version, the whole cluster (each
> node) must be upgraded to the previous version.

we can assume that, a 6.x node will only join a cluster
with 5.x or 6.x nodes. (joining a 7.x cluster should work, but
this is not relevant to this change). in both cases, since
5.x and up scylla can only be configured with "md" `sstable_format`,
there is no need to switch from "mc" to "md" anymore. so we can
ditch the code supporting it.

Refs #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 22:43:05 +08:00
Avi Kivity
f0d6330204 build: add crypto++ to dependencies
We depend on the crypto++ library (see utils/hashers.hh) but don't list
it in install-dependencies.sh. Currently this works because Seastar's
install-dependencies.sh installs it, but that's going away in [1]. List
crypto++ directly to keep install-dependencies.sh working.

Regenerating the frozen toolchain is unnecessary since we're re-adding
an existing dependency.

[1] 6bdef1e431

Closes scylladb/scylladb#16563
2024-01-11 16:26:20 +02:00
Patryk Jędrzejczak
e99d03a21e topology_coordinator: clarify warnings
It was unclear where the error messages ended if they consisted of
multiple sentences.
2024-01-11 14:19:42 +01:00
Patryk Jędrzejczak
b4b170047b raft topology: join: allow only the first response to be a successful acceptance
The joining node might receive more than one join response (see
the comment at the beginning of `join_node_response_handler`).

If the first response was a rejection or it was an acceptance but
the joining node failed while handling it, the following
acceptances by the coordinator shouldn't succeed. The joining
node considers the join operation as failed.

Currently, we always immediately return from non-first response
handler calls. However, if the response is an acceptance, and the
first response wasn't a successfully handled acceptance, we need
to throw an exception to ensure the topology coordinator moves
the node to the left state. We do it in this patch. We throw the
exception set while handling the first response. It explains why
we are failing the current acceptance.

We don't want to throw the exception on rejection. The topology
coordinator will move the node to the left state anyway. Also,
failing the rejection with an error message containing "the
topology coordinator rejected request to join the cluster" (from
the previous rejection) would be very confusing.
2024-01-11 14:19:42 +01:00
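The rules described in b4b170047b for first and later join responses can be summarized with the simplified sketch below. It is not the real `join_node_response_handler`: the class, its `handle` method and the `handling_fails` flag are hypothetical, and the first-rejection path is reduced to recording the error.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical model of the joining node's response handling.
class join_response_sketch {
    bool _handled = false;                   // has the first response been seen?
    std::optional<std::string> _first_error; // set if the first response failed
public:
    // `accepted` is the coordinator's decision; `handling_fails` simulates the
    // joining node failing while it processes an acceptance.
    void handle(bool accepted, bool handling_fails = false) {
        if (_handled) {
            // Later acceptance after a failed first response: fail it with the
            // original error so the coordinator moves the node to "left".
            if (accepted && _first_error) {
                throw std::runtime_error(*_first_error);
            }
            return; // later rejections, and acceptances after success, are no-ops
        }
        _handled = true;
        if (!accepted) {
            _first_error = "the topology coordinator rejected request to join the cluster";
            return;
        }
        if (handling_fails) {
            _first_error = "failed while handling the acceptance";
            throw std::runtime_error(*_first_error);
        }
    }
};
```

The point of the design is the asymmetry: only a later acceptance is turned into an error, since a later rejection already leads the coordinator to move the node to the left state.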
Patryk Jędrzejczak
f3a08757af storage_service: join_node_response_handler: fix indentation
Broken in the previous patch.
2024-01-11 14:19:42 +01:00
Patryk Jędrzejczak
ddfd9c3173 raft topology: join: shut down a node on error in response handler
If the joining node fails while handling the response from the
topology coordinator, it hangs even though it knows the join
operation has failed. Therefore, we ensure it shuts down in
this patch.

We rethrow the caught exception to ensure the topology coordinator
knows the RPC has failed. In case of rejection, it does not matter
because the coordinator behaves the same way in both cases: RPC
success and RPC failure. It transitions the rejected node to the
left state. However, in case of acceptance, this only happens if
the RPC fails. Otherwise, the coordinator continues handling the
request.

On abort, one of the two events happens first:
- the new catch statement catches `abort_requested_exception` and
sets it on `_join_node_response_done`,
- `co_await _ss._join_node_response_done.get_shared_future(as);`
in `join_node_rpc_handshaker::post_server_start` resolves with
`abort_requested_exception` after triggering `as`. In both cases,
`join_node_rpc_handshaker::post_server_start` throws
`abort_requested_exception`. Therefore, we don't need a separate
catch statement for `abort_requested_exception` in
`join_node_response_handler`.
2024-01-11 14:19:37 +01:00
Botond Dénes
697ebef149 Merge 'tasks: compaction: drop regular compaction tasks after they are finished' from Aleksandra Martyniuk
Make compaction tasks internal. Drop all internal tasks without parents
immediately after they are done.

Fixes: #16735
Refs: #16694.

Closes scylladb/scylladb#16698

* github.com:scylladb/scylladb:
  compaction: make regular compaction tasks internal
  tasks: don't keep internal root tasks after they complete
2024-01-11 12:10:44 +02:00
Nadav Har'El
5762170526 main: fix "starting {}" messages
The supervisor::notify() function expects a single string - not a
format and parameters. Calls we have in main.cc like

    supervisor::notify("starting {}", what);

end up printing the silly message "starting {}". The second parameter
"what" is converted to a bool, with the unintended consequence
of telling notify we're "ready".

This patch fixes it to call fmt::format, as intended.

Fixes #16728

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16729
2024-01-11 11:43:07 +02:00
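The pitfall fixed in 5762170526 can be reproduced in isolation with the sketch below. The `notify` function here is a hypothetical stand-in for `supervisor::notify()` with an assumed `(message, ready)` signature, and `what` is assumed to be a `const char*`. The fixed variant uses plain string concatenation only to keep the example free of external dependencies; the actual patch calls fmt::format.

```cpp
#include <string>
#include <utility>

// What the hypothetical notify() received, recorded for inspection.
struct notification {
    std::string msg;
    bool ready;
};

// Stand-in for a notify function taking one finished message plus a flag,
// not a format string followed by arguments.
notification notify(std::string msg, bool ready = false) {
    return {std::move(msg), ready};
}

// Buggy call shape: the pointer `what` silently converts to bool, so the
// placeholder is never substituted and `ready` becomes true.
notification buggy(const char* what) {
    return notify("starting {}", what);
}

// Fixed call shape: build the full message first, then pass a single string.
notification fixed(const char* what) {
    return notify(std::string("starting ") + what);
}
```

The buggy shape compiles because a pointer-to-bool conversion is implicit in C++, which is why such a mistake can go unnoticed.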
Botond Dénes
ac69473bac Merge 'utils/pretty_printers: add "I" specifier support' from Kefu Chai
this is to mimic the formatting of `human_readable_value`, and to prepare for consolidating these two formatters, so we don't have two pretty printers in the tree.

Closes scylladb/scylladb#16726

* github.com:scylladb/scylladb:
  utils/pretty_printers: add "I" specifier support
  utils/pretty_printers: use the formatting of to_hr_size()
2024-01-11 10:54:14 +02:00
Kefu Chai
0c2ef5de54 test/unit/bptree_validation: use "{}" for formatting test_data
before this change, "{:d}" is used for formatting `test_data` in
bptree_stress_test.cc. but the "d" specifier is only used for
formatting integers, not for formatting `test_data` or generic
data types, so this fails when the test is compiled with {fmt} v10,
like:

```
In file included from /home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:20:
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:294:35: error: call to consteval function 'fmt::basic_format_string<char, test_data &, test_data &>::basic_format_string<char[31], 0>' is not a constant expression
  294 |             fmt::print(std::cout, "Iterator broken, {:d} != {:d}\n", val, *_fwd);
      |                                   ^
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:267:20: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::forward_check' requested here
  267 |             return forward_check();
      |                    ^
/home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:92:35: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::step' requested here
   92 |                         if (!itc->step()) {
      |                                   ^
/usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression
 2322 |       if (!in(arg_type, set)) throw_format_error("invalid format specifier");
      |                               ^
```

in this change, instead of specifying "{:d}", let's just use "{}",
which works for both integer and `test_data`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16727
2024-01-11 10:53:33 +02:00
Kefu Chai
6c06751640 cdc: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16725
2024-01-11 09:13:37 +02:00
Kefu Chai
5874652967 cql3: define format_as() for formatting cql3::cql3_type
in the same spirit of 724a6e26, format_as() is defined for
cql3::cql3_type. despite that this is not used yet by fmt v9,
where we still have FMT_DEPRECATED_OSTREAM, this prepares us for
fmt v10.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16232
2024-01-11 09:07:18 +02:00
Botond Dénes
3d1667c720 Update ./tools/java submodule
* ./tools/java e106b500...24e51259 (1):
  > build.xml: update io.airlift to 0.9
2024-01-11 08:55:51 +02:00
Lakshmi Narayanan Sreethar
76f0d5e35b reader_permit: store schema_ptr instead of raw schema pointer
Store schema_ptr in reader permit instead of storing a const pointer to
schema to ensure that the schema doesn't get changed elsewhere when the
permit is holding on to it. Also update the constructors and all the
relevant callers to pass down schema_ptr instead of a raw pointer.

Fixes #16180

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16658
2024-01-11 08:37:56 +02:00
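The ownership change in 76f0d5e35b can be illustrated with the sketch below. It assumes a `std::shared_ptr<const schema>` as a stand-in for Scylla's `schema_ptr`; `schema` and `permit_sketch` are hypothetical minimal types, not the real reader_permit.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal schema; the real class carries far more state.
struct schema {
    std::string name;
};

// Stand-in for schema_ptr: shared ownership of an immutable schema.
using schema_ptr = std::shared_ptr<const schema>;

class permit_sketch {
    schema_ptr _schema; // the permit co-owns the schema instead of borrowing it
public:
    explicit permit_sketch(schema_ptr s) : _schema(std::move(s)) {}

    const std::string& table_name() const {
        return _schema->name;
    }
};
```

With a raw `const schema*` the permit would only borrow the object; holding a shared pointer keeps the same schema object alive for as long as the permit exists, even after every other holder has dropped it.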
Kefu Chai
f11a53856d utils/pretty_printers: add "I" specifier support
this is to mimic the formatting of `human_readable_value`, and
to prepare for consolidating these two formatters, so we don't have
two pretty printers in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 14:33:47 +08:00
Patryk Wrobel
f4e311e871 cql3: add formatter for cql3::expr::oper_t
This change introduces a specialization of fmt::formatter
for cql3::expr::oper_t. This enables the usage of this
type with FMTv10, which dropped the default generated formatter.

Usage of cql3::expr::oper_t without the defined formatter
resulted in compilation error when compiled with FMTv10.

Refs: #13245

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16719
2024-01-11 08:33:35 +02:00
Kefu Chai
7d627b328f utils/pretty_printers: use the formatting of to_hr_size()
keep the precision of 4 digits, for instance, so that we format
"8191" as "8191" instead of as "8 Ki". this is modeled after
the behavior of `to_hr_size()`. for better user experience.
and also prepares to consolidate these two formatters.

tests are updated to exercise both IEC and SI notations.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 14:33:03 +08:00
Kefu Chai
8c4576f55d api: storage_service: correct the descriptions of two APIs
this change is more about documentation of the RESTful API of
storage_service. we define the API using the Swagger 2.0 format and
generate the API document from the definitions, so it would be great
if the document matched the API.

in this change, since the keyspace is not queried but mutated, the
description is changed to a more accurate one.

from the code perspective, it is but cosmetic. as we don't read the
description fields or verify them in our tests.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16637
2024-01-11 08:28:14 +02:00
Kamil Braun
6e39c2ffde gossiper: store source_location instead of string in endpoint_permit
The original code extracted only the function_name from the
source_location for logging. We'll use more information from the
source_location in later commits.
2024-01-10 17:02:52 +01:00
Kamil Braun
664349a10f storage_service: more verbose logging in handle_state_normal
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.

Improve the INFO level logging in handle_state_normal to aid debugging
in the future.

The amount of logs is still constant per-node. Even though some log
messages report all tokens owned by a node, handle_state_normal calls
are still rare. The most "spammy" situation is when a node starts and
calls handle_state_normal for every other node in the cluster, but it is
a once-per-startup event.
2024-01-10 16:39:55 +01:00
Patryk Wrobel
a64eb92369 utils: specialize fmt::formatter for utils::tagged_integer
This change introduces a specialization of fmt::formatter
for utils::tagged_integer. This enables the usage of this
type with FMTv10, which dropped the default generated formatter.

Usage of utils::tagged_integer without the defined formatter
resulted in compilation error when compiled with FMTv10.

Refs: #13245

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16715
2024-01-10 18:32:43 +03:00
Nadav Har'El
083868508c Update seastar submodule
* seastar 70349b74...0ffed835 (15):
  > http/client: include used header files
  > treewide: s/format/fmt::format/ when appropriate
  > shared_future: shared_state::run_and_dispose(): release reserve of _peers

Fixes #16493

  > metrics_tester - A demo app to test metrics
  > build: silence the warning of -Winclude-angled-in-module-purview
  > estimated_histogram.hh: Support native histograms
  > prometheus.cc: Clean the pick representation code
  > prometheus.cc add native histogram
  > memory: fix the indentation.
  > metrics_types.hh: add optional native histogram information
  > memory: include used header
  > prometheus.cc: Add filter, aggregate by label and skip_when_empty
  > src/proto/metrics2.proto: newer proto buf definition
  > print: deprecate format_separated()
  > reactor: use fmt::join() when appropriate

Closes scylladb/scylladb#16712
2024-01-10 14:02:04 +02:00
Nadav Har'El
39dd2a2690 cql-pytest: translated Cassandra's test for LWT with static column
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertUpdateIfConditionStaticsTest.java into our
cql-pytest framework.

This test file checks various LWT conditional updates which involve
static columns or UDTs (there is a separate test file for LWT conditional
updates that do not involve static columns).

This test did not uncover any new bugs, but demonstrates yet again
several places where we intentionally deviated from Cassandra's behavior,
forcing me to add "is_scylla" checks in many of the checks to allow
them to pass on both Scylla and Cassandra. These deviations are known,
intentional and some are documented in docs/kb/lwt-differences.rst but
not all, so it's worth listing here the ones re-discovered by this test:

    1. On a successful conditional write, Cassandra returns just True, Scylla
       also returns the old contents of the row. This difference is officially
       documented in docs/kb/lwt-differences.rst.

    2. On a batch request, Scylla always returns a row per statement,
       Cassandra doesn't - it often returns just a single failed row,
       or just True if the whole batch succeeded. This difference is
       officially documented in docs/kb/lwt-differences.rst.

    3. In a DELETE statement with a condition, in the returned row
       Cassandra lists the deleted column first - while Scylla lists
       the static column first (as in any other row). This difference
       is probably inconsequential, because columns also have names
       so their order in the response usually doesn't matter.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16643
2024-01-10 12:14:06 +02:00
Nadav Har'El
b1a441ba56 test/cql-pytest: correct xfail status of timestamp parser
The recently-added test test_fromjson_timestamp_submilli demonstrated a
difference between Scylla's and Cassandra's parsing timestamps in JSON:
Trying to use too many (more than 3) digits of precision is forbidden
in Scylla, but ignored in Cassandra. So we marked the test "xfail",
suggesting we think it's a Scylla bug that should be fixed in the future.

However, it turns out that we already had a different test,
test_type_timestamp_from_string_overprecise, which showed the same
difference in a different context (without JSON). In that older test,
the decision was to consider this a Cassandra bug, not Scylla bug -
because Cassandra seemingly allows the sub-millisecond timestamp but
in reality drops the extra precision.

So we need to be consistent in the tests - this is either a Scylla bug
or a Cassandra bug, we can't make one choice in one test and another
in a different test :-) So let's accept our older decision, and consider
Scylla's behavior the correct one in this case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16586
2024-01-10 12:12:26 +02:00
Kefu Chai
eb9216ef11 compaction: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16707
2024-01-10 11:07:36 +02:00
Kefu Chai
317af97e41 test/pylib: shutdown unix RESTful client
when stopping the ManagerClient, it would be better to close
all connected connectors, otherwise aiohttp complains like:

```
13:57:53.763 ERROR> Unclosed connector
connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x7f939d2ca5f0>, 96672.211256817)]']
connector: <aiohttp.connector.UnixConnector object at 0x7f939d2da890>
```

this warning message is printed to the console, and it is distracting
when testing manually.

so, in this change, let's close the client connecting to unix domain
socket.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16675
2024-01-10 11:07:14 +02:00
Kefu Chai
f61f6c27e3 gms: add formatter for gms::endpoint_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for gms::endpoint_state, and
update the callers of `operator<<` to use `fmt::print()`.
but we cannot drop `operator<<` yet, as we are still using the
templated operator<< and templated fmt::formatter to print containers
in scylla and in seastar -- they are still using `operator<<`
under the hood.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16705
2024-01-10 09:16:23 +02:00
Sylwia Szunejko
eabe97bcd0 transport: remove additional options from TABLETS_ROUTING_V1
Closes scylladb/scylladb#16701
2024-01-10 09:00:25 +02:00
Botond Dénes
5981900dca Update tools/jmx submodule
* tools/jmx 80ce5996...3257897a (1):
  > scylla-apiclient: drop hk2-locator dependency
2024-01-10 08:53:20 +02:00
Kefu Chai
34b03867b2 tools: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16673
2024-01-10 08:44:09 +02:00
Kefu Chai
0dc7db54d1 build: cmake: add "unit_test_list" target
this target is used by test.py for enumerating unit tests

* test/CMakeLists.txt: append executable's full path to
  `scylla_tests`. add `unit_test_list` target printing
  `scylla_tests`, please note, `cmake -E echo` does not
  support the `-e` option of `echo`, and ninja does not
  support command line with newline in it, we have to use
  `echo` to print the list of tests.
* test/{boost,raft,unit}/CMakeLists.txt: set scylla_tests
  only if $PWD/suite.yaml exists. we could hardwire this
  logic in these files, as it is known that this file
  exists in these directories, but it is kept this way
  so that it serves as a comment explaining why we update
  scylla_tests here but not in the other places where we
  also use the `add_scylla_test()` function: the reason is
  simply that suite.yaml exists here.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16702
2024-01-10 08:43:04 +02:00
Botond Dénes
4aba445ef6 Merge 'test.py: adapt to cmake building system' from Kefu Chai
in this series, we adapt to cmake building system by mapping scylla build mode to `CMAKE_BUILD_TYPE` and by using `build/build.ninja` if it exists, as `configure.py` generates `build.ninja` in `build` when using CMake for creating `build.ninja`.

Closes scylladb/scylladb#16703

* github.com:scylladb/scylladb:
  test.py: build using build/build.ninja when it exists
  test.py: extract ninja()
  test.py: extract path_to()
  test.py: define all_modes as a dict of mode:CMAKE_BUILD_TYPE
2024-01-10 08:39:33 +02:00
Kefu Chai
382a5e2d0c test.py: build using build/build.ninja when it exists
CMake puts `build.ninja` under `build`, so use it if it exists, and
fall back to current directory otherwise.
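
The lookup described above can be sketched as follows. This is a minimal
illustration, not the code from test.py: the helper name `find_ninja_file`
and its `source_root` parameter are hypothetical.

```python
import os


def find_ninja_file(source_root: str) -> str:
    """Return build/build.ninja if it exists, else build.ninja in source_root.

    Hypothetical helper: it only mirrors the rule stated in the commit
    message (prefer the CMake location, fall back to the source root).
    """
    cmake_ninja = os.path.join(source_root, "build", "build.ninja")
    if os.path.exists(cmake_ninja):
        return cmake_ninja
    return os.path.join(source_root, "build.ninja")
```

The returned path could then be handed to `ninja -f <path>`.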

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-10 10:01:02 +08:00
Kefu Chai
6674e87842 test.py: extract ninja()
use ninja() to build target using `ninja`. since CMake puts
`build.ninja` under "build", while `configure.py` puts it under
the root source directory, this change prepares us for a follow-up
change to build with build/build.ninja.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-10 10:01:02 +08:00
Kefu Chai
5fda822c4e test.py: extract path_to()
use path_to() to find the path to the directory under build directory.

this change helps to find the executables built using CMake as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-10 10:01:02 +08:00
Kefu Chai
0b11ae9fe6 test.py: define all_modes as a dict of mode:CMAKE_BUILD_TYPE
because scylla build mode and CMAKE_BUILD_TYPE are not identical,
let's define `all_modes` as a dict so we can look it up.
this change prepares for a follow-up commit which adds a path
resolver which supports both build system generators: the plain
`configure.py` and CMake driven by `configure.py`.
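
A minimal sketch of such a lookup table, assuming illustrative values: the
entries below and the helper `cmake_build_type` are assumptions made for this
example, and the authoritative mapping is the one defined in test.py.

```python
# Assumed mapping from scylla build mode to CMAKE_BUILD_TYPE.
# The entries are illustrative; consult test.py for the real table.
ALL_MODES = {
    "debug": "Debug",
    "release": "RelWithDebInfo",
    "dev": "Dev",
}


def cmake_build_type(mode: str) -> str:
    """Translate a scylla build mode into its CMAKE_BUILD_TYPE name."""
    try:
        return ALL_MODES[mode]
    except KeyError:
        raise ValueError(f"unknown build mode: {mode!r}") from None
```

Keeping the mode names as dict keys means existing code that iterates over
the modes keeps working, while the values are available where needed.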

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-10 10:01:02 +08:00
Botond Dénes
f4f724921c load_meter: get_load_map(): don't unconditionally dereference _lb
Said method has a check on `_lb` not being null, before accessing it.
However, since 0e5754a, there was an unconditional access, adding an
entry for the local node. Move this inside the if, so it is covered by
the null-check. The only caller is the api (probably nodetool), the
worst that can happen is that they get a completely empty load-map if
they call too early during startup.

Fixes: #16617

Closes scylladb/scylladb#16659
2024-01-09 16:02:12 +03:00
Aleksandra Martyniuk
6b87778ef2 compaction: make regular compaction tasks internal
Regular compaction tasks are internal.

Adjust test_compaction_task accordingly: modify test_regular_compaction_task,
delete test_running_compaction_task_abort (relying on regular compaction),
whose checks are already achieved by test_not_created_compaction_task_abort.
Rename the latter.
2024-01-09 13:13:54 +01:00
Aleksandra Martyniuk
6b2b384c83 tasks: don't keep internal root tasks after they complete 2024-01-09 13:13:54 +01:00
Pavel Emelyanov
cdf5124003 Merge 'tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()' from Botond Dénes
The default error handler throws an exception, which means scylla-sstable will exit with exception if there is any problem in the configuration. Not even ScyllaDB itself is this harsh -- it will just log a warning for most errors. A tool should be much more lenient. So this patch passes an error handler which just logs all errors with debug level.
If reading an sstable fails, the user is expected to investigate by turning debug-level logging on. When they do so, they will see any problems while reading the configuration (if it is relevant, e.g. when using EAR).

Fixes: #16538

Closes scylladb/scylladb#16657

* github.com:scylladb/scylladb:
  tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()
  tools/scylla-sstable: allow always passing --scylla-yaml-file option
2024-01-09 14:28:49 +03:00
Kefu Chai
b91eb89ffa gms: heart_beat_state: add formatter for gms::heart_beat_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for gms::heart_beat_state, and
remove its operator<<(). the only caller site of its operator<< is
updated to use `fmt::print()`

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16652
2024-01-09 11:52:40 +02:00
Kefu Chai
cca786e847 gms: endpoint_state: fix a typo in comment
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16653
2024-01-09 11:51:49 +02:00
Kefu Chai
c1beba1f7d utils: config_file: throw bpo::invalid_option_value() when seeing invalid option
before this change, `std::invalid_argument` is thrown by
`bpo::notify(configuration)` in `app_template::run_deprecated()` when
invalid option is passed in via command line. `utils::named_value`
throws `std::invalid_argument` if the given value is not listed in
`_allowed_values`. but we don't handle `std::invalid_argument` in
`app_template::run_deprecated()`. so the application aborts with
unhandled exception if the specified argument is not allowed.

in this change, we convert the `std::invalid_argument` to a
derived class of `bpo::error` in the customized notify handler,
so that it can be handled in `app_template::run_deprecated()`.

because `named_value::operator()` is also used elsewhere, we
should not throw a bpo::error there. so its exception type
is preserved.

Fixes #16687
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16688
2024-01-09 11:49:06 +02:00
Kefu Chai
a6152cb87b sstables: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16666
2024-01-09 11:45:44 +02:00
Kefu Chai
be364d30fd db: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16664
2024-01-09 11:44:19 +02:00
Aleksandra Martyniuk
6f13e55187 tasks: call release_resources when task is finished
Call task_manager::task::impl::release_resources when task is finished
instead of putting the responsibility on user.

Closes scylladb/scylladb#16660
2024-01-09 11:41:54 +02:00
Pavel Emelyanov
cfeff893c6 network_topology_strategy: Print map of dc:rf pairs in one go
The strategy constructor prints the dc:rf at the end making the sstring
for it by hand. Modern fmt-based logger can format unordered_map-s on
its own. The message will look slightly different, though, changing from

  Configured datacenter replicas are: foo:1 bar:2

into

  Configured datacenter replicas are: {"foo": 1, "bar": 2}

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16443
2024-01-09 11:30:49 +02:00
Kamil Braun
d93074e87e cql3: don't parallelize select aggregates to local tables
We've observed errors during shutdown like the following:
```
ERROR 2023-12-26 17:36:17,413 [shard 0:main] raft - [088f01a3-a18b-4821-b027-9f49e55c1926] applier fiber stopped because of the error: std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (forward_service is shutting down)
INFO  2023-12-26 17:36:17,413 [shard 0:strm] storage_service - raft_state_monitor_fiber aborted with raft::stopped_error (Raft instance is stopped)
ERROR 2023-12-26 17:36:17,413 [shard 0:strm] storage_service - raft topology: failed to fence previous coordinator raft::stopped_error (Raft instance is stopped, reason: "background error, std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (forward_service is shutting down)")
```

Some CQL statement execution was trying to use `forward_service` during
shutdown.

It turns out that the statement is in
`system_keyspace::load_topology_state`:
```
auto gen_rows = co_await execute_cql(
    format("SELECT count(range_end) as cnt FROM {}.{} WHERE key = '{}' AND id = ?",
           NAME, CDC_GENERATIONS_V3, cdc::CDC_GENERATIONS_V3_KEY),
    gen_uuid);
```
It's querying a table in the `system` keyspace.

Pushing local table queries through `forward_service` doesn't make sense
as the data is not distributed. Excluding local tables from this logic
also fixes the shutdown error.

Fixes scylladb/scylladb#16570

Closes scylladb/scylladb#16662
2024-01-08 14:44:22 -05:00
Kamil Braun
d4f4b58f3a Merge 'topology_coordinator: reject removenode if the removed node is alive' from Patryk Jędrzejczak
The removenode operation is defined to succeed only if the node
being removed is dead. Currently, we reject this operation on the
initiator side (in `storage_service::raft_removenode`) when the
failure detector considers the node being removed alive. However,
it is possible that even if the initiator considers the node dead,
the topology coordinator will consider it alive when handling the
topology request. For example, the topology coordinator can use
a bigger failure detector timeout, or the node being removed can
suddenly resurrect.

This PR makes the topology coordinator reject removenode if the
node being removed is considered alive. It also adds
`test_remove_alive_node` that verifies this change.

Fixes scylladb/scylladb#16109

Closes scylladb/scylladb#16584

* github.com:scylladb/scylladb:
  test: add test_remove_alive_node
  topology_coordinator: reject removenode if the removed node is alive
  test: ManagerClient: remove unused wait_for_host_down
  test: remove_node: wait until the node being removed is dead
2024-01-08 12:39:23 +01:00
Kamil Braun
d11e824802 Merge 'storage_service: make all Raft-based operations abortable' from Patryk Jędrzejczak
During a shutdown, we call `storage_service::stop_transport` first.
We may try to apply a Raft command after that, or still be in the
the process of applying a command. In such a case, the shutdown
process will hang because Raft retries replicating a command until
it succeeds even in the case of a network error. It will stop when
a corresponding abort source is set. However, if we pass `nullptr`
to a function like `add_entry`, it won't stop. The shutdown
process will hang forever.

We fix all places that incorrectly pass `nullptr`. These shutdown
hangs are not only theoretical. The incorrect `add_entry` call in
`update_topology_state` caused scylladb/scylladb#16435.

Additionally, we remove the default `nullptr` values in all member
functions of `server` and `raft_group0_client` to avoid similar bugs
in the future.

Fixes scylladb/scylladb#16435

Closes scylladb/scylladb#16663

* github.com:scylladb/scylladb:
  server, raft_group0_client: remove the default nullptr values
  storage_service: make all Raft-based operations abortable
2024-01-08 11:30:56 +01:00
Botond Dénes
9119bcbd67 tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()
The default error handler throws an exception, which means
scylla-sstable will exit with exception if there is any problem in the
configuration. Not even ScyllaDB itself is this harsh -- it will just
log a warning for most errors. A tool should be much more lenient. So
this patch passes an error handler which just logs all errors with debug
level.
If reading an sstable fails, the user is expected to investigate by turning
debug-level logging on. When they do so, they will see any problems
while reading the configuration (if it is relevant, e.g. when using EAR).

Fixes: #16538
2024-01-08 02:18:15 -05:00
Botond Dénes
16791a63c9 tools/scylla-sstable: allow always passing --scylla-yaml-file option
Currently, if multiple schema sources are provided, the tool complains
about ambiguity over which one to use. One of these options is
--scylla-yaml-file. However, we want to allow passing this option any
time, otherwise encrypted sstables cannot be read. So relax the multiple
schema source check to also allow this option to be used even when e.g.
--schema-file was used as the schema source.
2024-01-08 02:18:12 -05:00
Nadav Har'El
61395a3658 Update tools/java submodule
* tools/java b7ebfd38...e106b500 (3):
  > build.xml: update scylla-driver-core to 3.11.5.1
  > Use ReplicaOrdering.NEUTRAL in TokenAwarePolicy to respect RackAwareness
  > treewide: update "guava" package

Refs https://github.com/scylladb/scylladb/pull/16491
Refs https://github.com/scylladb/scylla-tools-java/pull/372
2024-01-07 15:12:15 +02:00
Patryk Jędrzejczak
df2034ebd7 server, raft_group0_client: remove the default nullptr values
The previous commit has fixed 5 bugs of the same type - incorrectly
passing the default nullptr to one of the changed functions. At
least some of these bugs wouldn't appear if there was no default
value. It's much harder to make this kind of a bug if you have to
write "nullptr". It's also much easier to detect it in review.

Moreover, these default values are rarely used outside tests.
Keeping them is just not worth the time spent on debugging.
2024-01-05 18:45:50 +01:00
Patryk Jędrzejczak
3d4af4ecf1 storage_service: make all Raft-based operations abortable
During a shutdown, we call `storage_service::stop_transport` first.
We may try to apply a Raft command after that, or still be in the
process of applying a command. In such a case, the shutdown
process will hang because Raft retries replicating a command until
it succeeds even in the case of a network error. It will stop when
a corresponding abort source is set. However, if we pass `nullptr`
to a function like `add_entry`, it won't stop. The shutdown
process will hang forever.

We fix all places that incorrectly pass `nullptr`. These shutdown
hangs are not only theoretical. The incorrect `add_entry` call in
`update_topology_state` caused scylladb/scylladb#16435.
2024-01-05 18:45:20 +01:00
Kefu Chai
7e84e03f52 gms: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

because the removal of `#include "unimplemented.hh"`,
`service/migration_manager.cc` misses the definition of
`unimplemented::cause::VALIDATION`, so include the header where it is
used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16654
2024-01-05 13:37:08 +02:00
Nadav Har'El
94580df1c5 test/alternator: fix flaky test in test_filter_expression.py
The test test_filter_expression.py::test_filter_expression_precedence
is flaky - and can fail very rarely (so far we've only actually seen it
fail once). The problem is that the test generates items with random
clustering keys, chosen as an integer between 1 and 1 million, and there
is a chance (roughly 2/10,000) that two of the 20 items happen to have the
same key, so one of the items is "lost" and the comparison we do to the
expected truth fails.
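
The "roughly 2/10,000" figure can be checked with a standard birthday-problem
calculation. The sketch below assumes the keys are drawn independently and
uniformly; the function name `collision_probability` is made up for this
example.

```python
from math import prod


def collision_probability(n_items: int, key_space: int) -> float:
    """Probability that at least two of n_items uniform random keys coincide."""
    if n_items > key_space:
        return 1.0  # pigeonhole principle: a collision is certain
    # Probability that all keys are distinct: the i-th key must avoid
    # the i keys already drawn.
    p_all_distinct = prod((key_space - i) / key_space for i in range(n_items))
    return 1.0 - p_all_distinct


# 20 items with keys drawn from 1..1,000,000
p = collision_probability(20, 1_000_000)
print(f"{p:.6f}")
```

With a key space of only 3, the same function returns 1.0 for 20 items: a
collision is then guaranteed.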

The solution is to just use sequential keys, not random keys.
There is nothing to gain in this test by using random keys.

To make this test bug easy to reproduce, I temporarily changed
random_i()'s range from 1,000,000 to 3, and saw the test failing every
single run before this patch. After this patch - no longer using
random_i() for the keys - the test doesn't fail any more.

Fixes #16647

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16649
2024-01-04 21:36:40 +02:00
Kamil Braun
bf068dd023 Merge handle error in cdc generation propagation during bootstrap from Gleb
Bootstrap cannot proceed if cdc generation propagation to all nodes
fails, so the patch series handles the error by rolling the ongoing
topology operation back.

* 'gleb/raft-cdc-failure' of github.com:scylladb/scylla-dev:
  test: add test to check failure handling in cdc generation commit
  storage_service: topology coordinator: rollback on failure to commit cdc generation
2024-01-04 15:38:51 +01:00
Kamil Braun
f942bf4a1f Merge 'Do not update endpoint state via gossiper::add_saved_endpoint once it was updated via gossip' from Benny Halevy
Currently, `add_saved_endpoint` is called from two paths:  One, is when
loading states from system.peers in the join path (join_cluster,
join_token_ring), when `_raft_topology_change_enabled` is false, and the
other is from `storage_service::topology_state_load` when raft topology
changes are enabled.

In the latter path, from `topology_state_load`, `add_saved_endpoint` is
called only if the endpoint_state does not exist yet.  However, this is
checked without acquiring the endpoint_lock and so it races with the
gossiper, and once `add_saved_endpoint` acquires the lock, the endpoint
state may already be populated.

Since `add_saved_endpoint` applies local information about the endpoint
state (e.g. tokens, dc, rack), it uses the local heart_beat_version,
with generation=0 to update the endpoint states, and that is
incompatible with changes applied via gossip that will carry the
endpoint's generation and version, determining the state's update order.

This change makes sure that the endpoint state is never updated in
`add_saved_endpoint` if it has non-zero generation.  An internal error
exception is thrown if non-zero generation is found, and in the only
call site that might reach that state, in
`storage_service::topology_state_load`, the caller acquires the
endpoint_lock for checking for the existence of the endpoint_state,
calling `add_saved_endpoint` under the lock only if the endpoint_state
does not exist.

Fixes #16429

Closes scylladb/scylladb#16432

* github.com:scylladb/scylladb:
  gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
  storage_service: topology_state_load: lock endpoint for add_saved_endpoint
  raft_group_registry: move on_alive error injection to gossiper
2024-01-04 14:47:10 +01:00
qiulijuan2
7fa2c33ba1 replica: remove duplicated function calling
set_skip_when_empty is called twice for the metric column_family_row_hits in replica/table.cc

fix: #16582

Signed-off-by: qiulijuan2<qiulijuan2_yewu@cmss.chinamobile.com>

Closes scylladb/scylladb#16581
2024-01-04 15:04:31 +02:00
Kefu Chai
ee28a1cf4b build: enable -Wimplicit-int-float-conversion
a209ae15 addresses the last -Wimplicit-int-float-conversion warning
in the tree, so we now have the luxury of enabling this warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16640
2024-01-04 12:45:23 +02:00
Botond Dénes
9f0bd62d78 test/cql-pytest: test_tools.py: add schema-loading tests for MV/SI 2024-01-04 03:20:17 -05:00
Botond Dénes
58d5339baa test/cql-pytest: test_tools.py: extract some fixture logic to functions
Namely, the fixture for preparing an sstable and the fixture for
producing a reference dump (from an sstable). In the next patch we will
add more similar fixtures, this patch enables them to share their core
logic, without repeating code.
2024-01-04 03:20:17 -05:00
Botond Dénes
f7d59b3af0 test/cql-pytest: test_tools.py: extract common schema-loading facilities into base-class
In the next patch, we want to add schema-load tests specific to views
and indexes. Best to place these into a separate class, so extract the
to-be-shared parts into a common base-class.
2024-01-04 03:20:17 -05:00
Botond Dénes
bea21657ec tools/schema_loader: load_schema_from_schema_tables(): add support for MV/SI schemas
The table information of MVs (either user-created, or those backing a
secondary index) is stored in system_schema.views, not
system_schema.tables. So load this table when system_schema.tables has
no entries for the looked-up table. Base table schema is not loaded.
2024-01-04 03:20:17 -05:00
Botond Dénes
79a006d6a8 tools/schema_loader: load_one_schema_from_file(): add support for view/index schemas
The underlying infrastructure (`load_schemas()`) already supports
loading views and indexes, extend this to said method.
When loading a view/index, expect `load_schemas()` to return two
schemas. The first is the base schema, the second is the view/index
schema (this is validated). Only the latter is returned.
2024-01-04 03:20:17 -05:00
Botond Dénes
276bb16013 test/boost/schema_loader_test: add test for mvs and indexes 2024-01-04 03:20:17 -05:00
Botond Dénes
f5d4c1216e tools/schema_loader: load_schemas(): implement parsing views/indexes from CQL
Add support for processing cql3::statement::create_view_statement and
cql3::statement::create_index_statement statements. The CQL text
(usually a file) has to provide the definition of the base table,
before the definition of the views/indexes.
2024-01-04 03:20:17 -05:00
Botond Dénes
94aac35169 replica/database: extract existing_index_names and get_available_index_name
To standalone functions in index/secondary_index_manager.{hh,cc}. This
way, alternative data dictionary implementations (in
tools/schema_loader.cc), can also re-use this code without having to
instantiate a database or resorting to copy-paste.

The functions are slightly changed: there are some additional params
added to cover for things not internally available in the database
object. const sstring& is converted to std::string_view.
2024-01-04 03:20:17 -05:00
Kefu Chai
cf932888de Update seastar submodule
* seastar e0d515b6...70349b74 (33):
  > util/log: drop unused function
  > util/log, rpc, core: use compile-time formatting with fmtlib >= 8.0
  > Fix edge case in memory sampler at OOM
  > exp/geo distribution benchmark
  > Additional allocation tests
  > Remove null pointer check on free hot path
  > Optimize final part of allocation hot path
  > Optimize zero size checking in allocator
  > memory: Optimize free fast path
  > memory: Optimize small alloc alloation path
  > memory: Limit alloc_sites size
  > memory: Add general comment about sampling strategy
  > memory: Use probabilistic sampler
  > util: Adapt memory sampler to seastar
  > util: Import Android Memory Sampler
  > memory: Use separate small pool for tracking sampled allocations
  > memory: Support enabling memory profiling at runtime
  > util/source_location-compat: mark `source_location::current()` consteval
  > build: use new behavior defined by CMP0155 when building C++ modules
  > circleci: build with C++20 modules enabled
  > seastar.cc: replace cryptopp with gnutls when building seastar modules
  > alien: include used header
  > seastar.cc: include used headers in the global purview
  > docker: install clang-tools-17
  > net/tcp: generate a random src_port hashed to current shard if smp::count > 1
  > net, websocket: replace Crypto++ calls with GnuTLS
  > README-DPDK.md: point user to DPDK's quick start guide
  > reactor: print fatal error using logger as well
  > Avoid ping-pong in spinlock::lock
  > memory: Add allocator perf tests
  > memory: Add a basic sized deletion test
  > Prometheus: Disable Prometheus protobuf with a configuration
  > treewide: bring back prometheus protobuf support
* test/manual/sstable_scan_footprint_test: update to adapt to the
  breaking change of "memory: Use probabilistic sampler" in seastar

Closes scylladb/scylladb#16610
2024-01-04 09:36:53 +02:00
Kefu Chai
47d8edc0fc test.py: s/asyncio.get_event_loop()/asyncio.get_running_loop()/
the latter raises a RuntimeError if there is no running event loop,
while the former gets one from the default policy in this case.
in the use cases in test.py, there is always a running event loop,
when `asyncio.get_event_loop()` gets called. so let's use
the preferred `asyncio.get_running_loop()`.

see https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_event_loop
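
A small sketch of the documented behavior of `asyncio.get_running_loop()`.
The helper names `loop_outside_coroutine` and `loop_inside_coroutine` are
invented for this illustration and do not come from test.py.

```python
import asyncio


def loop_outside_coroutine():
    """Called with no running loop: get_running_loop() raises RuntimeError."""
    try:
        return asyncio.get_running_loop()
    except RuntimeError:
        return None


async def loop_inside_coroutine():
    """Called from a coroutine: a running loop is always available."""
    return asyncio.get_running_loop()


outside = loop_outside_coroutine()
inside = asyncio.run(loop_inside_coroutine())
```

Because a missing loop becomes an immediate error instead of an implicitly
created loop, calls made from the wrong context are easier to notice.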

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16398
2024-01-04 08:39:49 +02:00
Botond Dénes
d9c30833ea tools/schema_loader: make real_db.tables the only source of truth on existing tables
Currently, we have `real_db.tables` and `schemas`, the former containing
system tables needed to parse statements, and the latter accumulating
user tables parsed from CQL. This will be error-prone to maintain with
view/index support, so ditch `schemas` and instead add a `user` flag to
`table` and accumulate all tables in `real_db.tables`.
At the end, just return the schemas of all user tables.
2024-01-04 01:32:10 -05:00
Botond Dénes
ef3d143886 tools/schema_loader: table(): store const keyspace&
No need for mutable reference, const ref makes life easier, because some
lookup APIs of data_dictionary::database return const keyspace& only.
2024-01-04 01:32:10 -05:00
Botond Dénes
1003508066 tools/schema_loader: make database,keyspace,table non-movable
These types contain self-references. Make sure they are not moved, not
even accidentally.
2024-01-04 01:32:10 -05:00
Botond Dénes
1f7b03672c cql3/statements/create_index_statement: build_index_schema(): include index metadata in returned value
Scylla's schema tables code determines which index was added, by diffing
index definitions with previous ones. This is clunky to use in
tools/schema_loader.cc, so also return the index metadata for the newly
created index.
2024-01-04 01:32:10 -05:00
Botond Dénes
94dbb7cb29 cql3/statements/create_index_statement: make build_index_schema() public
tools/schema_loader.cc wants it.
2024-01-04 01:32:10 -05:00
Botond Dénes
039d41f5d4 cql3/statements/create_index_statement: relax some method's dependence on qp
The method `validate_while_executing()` and its only caller,
`build_index_schema()`, only use the query processor to get db from it.
So replace qp parameter with db one, relaxing requirements w.r.t.
callers.
2024-01-04 01:32:10 -05:00
Botond Dénes
5f42c2c7c4 cql3/statements/create_view_statement: make prepare_view() public
tools/schema_loader.cc wants to use it.
2024-01-04 01:32:10 -05:00
Kefu Chai
50cf62e186 build: cmake: do not link against Boost::dynamic_linking
Boost::dynamic_linking was introduced as a compatibility target
which adds "BOOST_ALL_DYN_LINK" macro on Win32 platform. but since
Scylla only runs on Linux, there is no need to link against this
library.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16544
2024-01-04 08:06:19 +02:00
Lakshmi Narayanan Sreethar
1d6eaf2985 compaction manager: remove: cleanup _compaction_state on exceptions
If for some reason an exception is thrown in compaction_manager::remove,
it might leave behind stale table pointers in _compaction_state. Fix
that by setting up a deferred action to perform the cleanup.

Fixes #16635

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16632
2024-01-03 22:03:24 +02:00
Benny Halevy
9e8998109f gossiper: get_*_members_synchronized: acquire endpoint update semaphore
To ensure that the value they return is synchronized on all shards.

This got broken recently by 147f30caff.

Refs https://github.com/scylladb/scylladb/pull/16597#discussion_r1440445432

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16629
2024-01-03 17:41:46 +01:00
Michał Chojnowski
a209ae1573 cql3: type_json: fix an edge case in float-to-int conversion
Refer to the added comment for details.

This problem was found by a compiler warning, and I'm fixing
it mainly to silence the warning. I didn't give any thought
to its effects in practice.

Fixes #13077

Closes scylladb/scylladb#16625

[avi: changed Refs to Fixes]
2024-01-03 17:59:01 +02:00
Kefu Chai
2ad532df43 test: randomized_nemesis_test: move std::variant formatter up
we format `std::variant<std::monostate, seastar::timed_out_error,
raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown,
raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>`
in this source file. and currently, we format `std::variant<..>` using
the default-generated `fmt::formatter` from `operator<<`, so in order to
format it using {fmt}'s compile-time check enabled, we have to make the
`operator<<` overload for `std::variant<...>` visible from the caller
sites which format `std::variant<...>` using {fmt}.

in this change, the `operator<<` for `std::variant<...>` is moved
from the middle of the source file to the top of it, so that it can
be found when the compiler looks up a matching `fmt::formatter`
for `std::variant<...>`.

please note, we cannot use the `fmt::formatter` provided by `fmt/std.h`,
as its specialization for `std::variant` requires that all the types
of the variant are `is_formattable`. but the default generated formatter
for type `T` is not considered as the proof that `T` is formattable.

this should address the FTBFS with the latest seastar like:

```
 /usr/include/fmt/core.h:2743:12: error: call to deleted constructor of 'conditional_t<has_formatter<mapped_type, context>::value, formatter<mapped_type, char_type>, fallback_formatter<stripped_type, char_type>>' (aka 'fmt::detail::fallback_formatter<std::variant<std::monostate, seastar::timed_out_error, raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown, raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>>')
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16616
2024-01-03 16:38:25 +01:00
Kefu Chai
2c394e3f6f tablets: remove unused #includes
the removed #include headers are not used, so let's drop their
`#include`s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16619
2024-01-03 15:30:40 +01:00
Avi Kivity
20531872a7 Merge 'test: randomized_nemesis_test: add formatter for append_entry' from Kefu Chai
we are using `seastar::format()` to format `append_entry` in
`append_reg_model`, so we have to provide a `fmt::formatter` for
these callers which format `append_entry`.

with FMT_DEPRECATED_OSTREAM, fmt v9 defines this formatter for us,
but fmt v10 no longer does. so this change prepares us
for fmt v10.

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#16614

* github.com:scylladb/scylladb:
  test: randomized_nemesis_test: add formatter for append_entry
  test: randomized_nemesis_test: move append_reg_model::entry out
2024-01-03 15:06:33 +02:00
Kefu Chai
dde8f694f6 build: cmake: use # for line comment
it was a copy-paste error introduced by 2508d339. the copyright
blob was copied from a C++ source file, but the block comment syntax
of the CMake language is different from that of C++.

let's use the line comment of CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16615
2024-01-03 15:05:00 +02:00
Tomasz Grabiec
715e062d4a Merge 'table, memtable: share log structured allocator statistics across all tablets in a table' from Avi Kivity
In 7d5e22b43b ("replica: memtable: don't forget memtable
memory allocation statistics") we taught memtable_list to remember
learned memory allocation reserves so a new memtable inherits these
statistics from an older memtable. Share it now further across tablets
that belong to the same table as well. This helps the statistics be more
accurate for tablets that are migrated in, as they can share existing
tablet's memory allocation history.

Closes scylladb/scylladb#16571

* github.com:scylladb/scylladb:
  table, memtable: share log-structured allocator statistics across all memtables in a table
  memtable: consolidate _read_section, _allocating_section in a struct
2024-01-03 14:03:40 +01:00
Avi Kivity
b8a0e3543e docs: ddl: document the initial_tablets replication strategy option
While the feature is experimental, this makes it easier to experiment
with it.

An example is provided.

Closes scylladb/scylladb#16193
2024-01-03 13:49:30 +01:00
Benny Halevy
147f30caff gossiper: mutate_live_and_unreachable_endpoints: make exception safe
Change the mutate_live_and_unreachable_endpoints procedure
so that the called `func` would mutate a cloned
`live_and_unreachable_endpoints` object in place.

Those are replicated to temporary copies on all shards
using `foreign<unique_ptr<>>` so that they would be
automatically freed on exception.

Only after all copies are made, they are applied
on all gossiper shards in a noexcept loop
and finally, an `on_success` function is called
to apply further side effects if everything else
was replicated successfully.

The latter is still susceptible to exceptions,
but we can live with those as long as `_live_endpoints`
and `_unreachable_endpoints` are synchronized on all shards.

With that, the read-only methods:
`get_live_members_synchronized` and
`get_unreachable_members_synchronized`
become trivial and they just return the required data
from shard 0.

Fixes #15089

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16597
2024-01-03 14:46:10 +02:00
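
The staging scheme described in 147f30caff can be sketched roughly as follows. This is only an illustration in Python, not the gossiper code: `Shard`, `mutate_endpoints`, and the plain list of shards are hypothetical stand-ins, and ordinary set copies take the place of `foreign<unique_ptr<>>`.

```python
class Shard:
    """Hypothetical stand-in for one gossiper shard's endpoint sets."""

    def __init__(self):
        self.live = set()
        self.unreachable = set()


def mutate_endpoints(shards, func, on_success=None):
    # Phase 1 (may raise): mutate a clone of shard 0's state, then
    # prepare one private copy per shard. Nothing is published yet,
    # so an exception here leaves every shard untouched.
    staged = Shard()
    staged.live = set(shards[0].live)
    staged.unreachable = set(shards[0].unreachable)
    func(staged)
    copies = [(set(staged.live), set(staged.unreachable)) for _ in shards]

    # Phase 2 (only plain assignments): swap the prepared copies in.
    for shard, (live, unreachable) in zip(shards, copies):
        shard.live, shard.unreachable = live, unreachable

    # Phase 3 (may raise): further side effects, run only after all
    # shards already agree on the new state.
    if on_success is not None:
        on_success()
```

If `func` raises, the staged copies are simply dropped and all shards keep their previous, mutually consistent state.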
Benny Halevy
fadcef01f5 database: setup_scylla_memory_diagnostics_producer: replace infinity sign with unlimited string
The infinity unicode sign used for dumping read concurrency semaphore
state, `∞` may be misrendered.
For example: https://jenkins.scylladb.com/job/scylla-master/job/dtest-release/451/artifact/logs-full.release.011/1703288463175_materialized_views_test.py%3A%3ATestMaterializedViews%3A%3Atest_add_dc_during_mv_insert/node1.log
```
  Read Concurrency Semaphores:
    user: 0/100, 1K/9M, queued: 0
    streaming: 0/10, 0B/9M, queued: 0
    system: 0/10, 0B/9M, queued: 0
    compaction: 0/∞, 0B/∞
```

Instead, just print the word `unlimited`.

This was introduced in 34c213f9bb

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16534
2024-01-03 14:46:10 +02:00
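
The idea in fadcef01f5 amounts to substituting a plain ASCII word for the sentinel value when printing. A minimal sketch, assuming a hypothetical `UNLIMITED` sentinel and helper names that are not taken from the Scylla sources:

```python
import sys

# Hypothetical sentinel meaning "no limit"; the real code uses its own value.
UNLIMITED = sys.maxsize


def format_limit(value):
    # Print an ASCII word instead of a unicode infinity sign, which
    # may be misrendered in logs.
    return "unlimited" if value == UNLIMITED else str(value)


def format_semaphore(name, used, limit):
    return f"{name}: {used}/{format_limit(limit)}"
```

With this sketch, `format_semaphore("compaction", 0, UNLIMITED)` yields `compaction: 0/unlimited`.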
Kefu Chai
3e4159fece repair: remove unused #include
remove the unused #include headers from repair.hh, as they are not
directly used. after this change, task_manager_module.hh fails to
have access to stream_reason, so include it where it is used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16618
2024-01-03 14:46:10 +02:00
Kefu Chai
1f4b5126f6 build: cmake: add comment explaining CMAKE_CXX_FLAGS_RELWITHDEBINFO
to clarify why we need to set this flagset instead of appending to it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16546
2024-01-03 14:46:10 +02:00
Kefu Chai
3ef0345b7f test/nodetool: log response from mock server when handling JSONDecodeError
it's observed that the mock server could return something not decodable
as JSON. so let's print out the response in the logging message in this case.
this should help us to understand the test failure better if it surfaces again.

Refs #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16543
2024-01-03 14:46:10 +02:00
Kefu Chai
0484ac46af test: randomized_nemesis_test: add formatter for append_entry
we are using `seastar::format()` to format `append_entry` in
`append_reg_model`, so we have to provide a `fmt::formatter` for
these callers which format `append_entry`.

with FMT_DEPRECATED_OSTREAM, fmt v9 defines this formatter for us,
but fmt v10 no longer does. so this change prepares us
for fmt v10.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-03 08:38:43 +08:00
Kefu Chai
32e55731ab test: randomized_nemesis_test: move append_reg_model::entry out
this change prepares for adding fmt::formatter for append_entry.
we are using its formatter in the inline member functions of
`append_reg_model`, but its `fmt::formatter` can only be specialized outside
this class, and we don't have access to `format_as()` yet in {fmt} 9.1.0,
which is shipped along with fedora38, which is in turn used for
our base build image.

so, in this change, `append_reg_model::entry` is extracted and renamed
to `append_entry`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-03 08:38:43 +08:00
Sylwia Szunejko
91a5a41313 add a way to negotiate generation of the tablet info for drivers
Tablets metadata is quite expensive to generate (each data_value is
an allocation), so an old driver (without support for tablets) will
generate huge amounts of such notifications. This commit adds a way
to negotiate generation of the notification: a new driver will ask
for them, and an old driver won't get them. It uses the
OPTIONS/SUPPORTED/STARTUP protocol described in native_protocol_v4.spec.

Closes scylladb/scylladb#16611
2024-01-02 20:00:50 +02:00
Kefu Chai
2508d33946 build: cmake: add Findcryptopp.cmake
seastar dropped the dependency to Crypto++, and it also removed
Findcryptopp.cmake from its `cmake` directory. but scylladb still
depends on this library. and it has been using the `Findcryptopp.cmake`
in seastar submodule for finding it.

after the removal of this file, scylladb would not be able to
use it anymore. so, we have to provide our own `Findcryptopp.cmake`.

Findcryptopp.cmake is copied from the Seastar project, so its
date of copyright is preserved. it was licensed under Apache 2.0.
since we are creating a derivative work from it, let's relicense
it under Apache 2.0 and AGPL 3.0 or later.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16601
2024-01-02 19:09:50 +02:00
Kefu Chai
34259a03d0 treewide: use consteval string as format string when formatting log message
seastar::logger is using the compile-time format checking by default if
compiled using {fmt} 8.0 and up. and it requires the format string to be
consteval string, otherwise we have to use `fmt::runtime()` explicitly.

so, to adapt to this change, let's use consteval strings when formatting
logging messages.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16612
2024-01-02 19:08:47 +02:00
Kefu Chai
64a227fba0 alternator/auth: remove unused #include
in `alternator/auth.cc`, none of the symbols in "query" namespace
provided by the removed headers is used, so there is no
need to include this header file.

the same applies to other removed header files.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16603
2024-01-02 17:50:59 +02:00
Kamil Braun
949658590f Merge 'raft topology: do not update token metadata in on_alive and on_remove' from Patryk Jędrzejczak
In the Raft-based topology, we should never update token metadata
through gossip notifications. `storage_service::on_alive` and
`storage_service::on_remove` do it, so we ignore their parts that
touch token metadata.

Additionally, we improve some logs in other places where we ignore
the function because of using the Raft-based topology.

Fixes scylladb/scylladb#15732

Closes scylladb/scylladb#16528

* github.com:scylladb/scylladb:
  storage_service: handle_state_left, handle_state_normal: improve logs
  raft topology: do not update token metadata in on_alive and on_remove
2024-01-02 16:08:50 +01:00
Kefu Chai
dd496afff3 mutation: add formatter for {atomic_cell_view,atomic_cell}::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for `atomic_cell_view::printer`
and `atomic_cell::printer` respectively, and remove their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16602
2024-01-02 16:14:42 +02:00
Kamil Braun
7f6955b883 Merge 'test: make use of concurrent bootstrap' from Patryk Jędrzejczak
In #16102, we added a test for concurrent bootstrap in the raft-based
topology. This test was running in CI for some time and
never failed. Now, we can believe that concurrent bootstrap is not
bugged or at least the probability of a failure is very low. Therefore,
we can safely make use of it in all tests using the raft-based topology.

This PR:
- makes all initial servers start concurrently in topology tests,
- replaces all multiple `server_add` calls with a single `servers_add`
  call in tests using the raft-based topology,
- removes no longer needed `test_concurrent_bootstrap`.

The changes listed above:
- make running tests a bit faster due to concurrent bootstraps,
- make multiple tests test concurrent bootstrap previously tested by
  a single test.

Fixes scylladb/scylladb#15423

Closes scylladb/scylladb#16384

* github.com:scylladb/scylladb:
  test: test_different_group0_ids: fix comments
  test: remove test_concurrent_bootstrap
  test: replace multiple server_add calls with servers_add
  test: ScyllaCluster: start all initial servers concurrently
  test: ManagerClient: servers_add: specify consistent-topology-changes assumption
2024-01-02 15:11:18 +01:00
Sylwia Szunejko
467d466f7e put all tablet info into one field of custom_payload and update docs
Previously, the tablet information was sent to the drivers
in two pieces within the custom_payload. We had information
about the replicas under the `tablet_replicas` key and token range
information under `token_range`. These names were quite generic
and might have caused problems for other custom_payload users.
Additionally, dividing the information into two pieces raised
the question of what to do if one key is present while the other
is missing.

This commit changes the serialization mechanism to pack all information
under one specific name, `tablets-routing-v1`.

From: Sylwia Szunejko <sylwia.szunejko@scylladb.com>

Closes scylladb/scylladb#16148
2024-01-02 14:35:37 +02:00
Patryk Jędrzejczak
215534d527 test: test_different_group0_ids: fix comments
The test disables consistent topology changes, not cluster
management.
2024-01-02 12:19:33 +01:00
Patryk Jędrzejczak
466723a74f test: remove test_concurrent_bootstrap
This test only adds 3 nodes concurrently to the empty cluster.
After making many other tests use ManagerClient.servers_add, it
serves no purpose.

We had added this test before we decided to use
ManagerClient.servers_add in many tests to avoid multiple failures
in CI if it turned out that the concurrent bootstrap is flaky with
high frequency there. This test was running in CI for some time and
never failed. Now, we can believe that concurrent bootstrap is not
bugged or at least the probability of a failure is very low.
2024-01-02 12:19:33 +01:00
Patryk Jędrzejczak
a8513bd41b test: replace multiple server_add calls with servers_add
ManagerClient.servers_add can be used in every test that uses
consistent topology changes. We replace all multiple server_add
calls in such tests with a single servers_add call to make these
tests faster and simplify their code. Additionally, these
servers_add calls will test concurrent bootstraps for free.
2024-01-02 12:19:33 +01:00
Patryk Jędrzejczak
debf1db3ef test: ScyllaCluster: start all initial servers concurrently
Starting all initial servers concurrently makes tests in suites
with initial_size > 1 run a bit faster. Additionally, these tests
test concurrent bootstraps for free.

add_servers can be called only if the cluster uses consistent
topology changes. We can use this function unconditionally in
install_and_start because every suite uses consistent topology
changes by default. The only way to not use it is by adding all
servers with a config that contains experimental_features without
consistent-topology-changes.
2024-01-02 12:19:33 +01:00
Patryk Jędrzejczak
16b0eeb3d6 test: ManagerClient: servers_add: specify consistent-topology-changes assumption
ManagerClient.servers_add can be called only if the cluster uses
consistent topology changes. We add this specification to the
leading comment.
2024-01-02 12:19:31 +01:00
Kefu Chai
f4bd86384b install.sh: use a temporary file when packaging scylla.yaml
we create a default `scylla.yaml` on the fly in `install.sh`. but
the path to the temporary file holding the default yaml file is
hardwired to `/tmp/scylla.yaml`. this works fine if we only have a
single `install.sh` at a certain time point. but if we have multiple
`install.sh` processes running in parallel, these packaging jobs could
step on each other when they create and remove the `scylla.yaml`.

in this change, because `installconfig` always treats
the "dest" parameter as a directory, `mktemp` is used for creating a
parent directory of the temporary file.

Fixes #16591
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16592
2024-01-01 21:50:29 +02:00
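
The fix in f4bd86384b is made in shell with `mktemp`. The same idea, a private parent directory per run holding a fixed file name, can be illustrated in Python. This is an analogy only, not the install.sh code; `write_default_config` and the `scylla-install-` prefix are hypothetical.

```python
import os
import tempfile


def write_default_config(contents, filename="scylla.yaml"):
    """Write contents to filename inside a fresh, private directory."""
    # mkdtemp creates a uniquely named directory, so parallel runs
    # never share (or delete) each other's scylla.yaml.
    tmpdir = tempfile.mkdtemp(prefix="scylla-install-")
    path = os.path.join(tmpdir, filename)
    with open(path, "w") as f:
        f.write(contents)
    return path
```

The caller is responsible for removing the returned file's parent directory once packaging is done.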
Kefu Chai
48b8544a63 .git: add skip more words and directories
we use "ue" as a short name for "update_expressions". before we change
our minds and use a more readable name, let's add "ue" to the
"ignore_word_list" option of codespell.

also, use absolute paths in the "skip" option, as absolute paths
are also used by codespell's own github workflow. we are still
observing that the codespell github workflow shows misspelling errors
in our "test/" directory even though we have it listed in "skip". so this
change should silence them as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16593
2024-01-01 14:32:16 +02:00
Avi Kivity
8ba0decda5 Merge 'System.peers: enforce host_id' from Benny Halevy
The HOST_ID is already written to system.peers since inception pretty much (See https://github.com/scylladb/scylladb/pull/16376#discussion_r1429248185 for details).

However, it is written to the table using an individual CQL query and so it is not set atomically with other columns.
If scylla crashes or even hits an exception before updating the host_id, then system.peers might be left in an inconsistent state, and in particular without a HOST_ID value.

This series makes sure that HOST_ID is written to system.peers and uses it to "seal" the record by upserting it in a single CQL BATCH query when adding the state for new nodes.

On the read side, skip rows that have no HOST_ID state in system.peers, assuming they are incomplete, i.e. scylla got an exception or crashed while writing them, so they can't be trusted.

With that change we can assume that endpoint state loaded from system.peers will always have a valid host_id.

Refs https://github.com/scylladb/scylladb/pull/15903

Closes scylladb/scylladb#16376

* github.com:scylladb/scylladb:
  gms: endpoint_state: change application_state_map to std::unordered_map
  system_keyspace: update_peer_info: drop single-column overloads
  storage_service: drop do_update_system_peers_table
  storage_service: on_change: fixup indentation
  endpoint_state subscriptions: batch on_change notification
  everywhere: drop before_change subscription
  system_keyspace: load_tokens/peers/host_ids: enforce presence of host_id
  system_keyspace: drop update_tokens(endpoint, tokens) overload
  storage_service: seal peer info with host_id
  storage_service: update_peer_info: pass peer_info to sys_ks
  gms: endpoint_state: define application_state_map
  system_keyspace: update_peer_info: use struct peer_info for all optional values
  query_processor: execute_internal: support unset values
  types: add data_value_list
  system_keyspace: get rid of update_cached_values
  storage_service: do not update peer info for this node
2023-12-31 21:22:04 +02:00
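
The read-side rule of the series above (8ba0decda5) can be sketched as a filter over loaded rows. This is a simplified illustration: rows are modeled as plain dictionaries, and `load_peers` is a hypothetical name, not the system_keyspace API.

```python
def load_peers(rows):
    """Return {peer: host_id} for rows that carry a host_id."""
    peers = {}
    for row in rows:
        host_id = row.get("host_id")
        if host_id is None:
            # The record was never "sealed" with a host_id, so it is
            # assumed to be incomplete and is skipped.
            continue
        peers[row["peer"]] = host_id
    return peers
```

Everything returned by such a loader is guaranteed to have a host_id, which is the invariant the series relies on.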
Benny Halevy
cdd5605d81 gms: endpoint_state: change application_state_map to std::unordered_map
State changes are processed as a batch and
there is no reason to maintain them as an ordered map.
Instead, use a std::unordered_map that is more efficient.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
c520fc23f0 system_keyspace: update_peer_info: drop single-column overloads
They are no longer used.
Instead, all callers now pass peer_info.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
0e5a666e6f storage_service: drop do_update_system_peers_table
It is no longer used after previous patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
13d395fa6a storage_service: on_change: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
ad8a9104d8 endpoint_state subscriptions: batch on_change notification
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.

In particular, this allows storage_service::on_change
to update_peer_info once for all changed states.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
1d07a596bf everywhere: drop before_change subscription
None of the subscribers is doing anything before_change.
This is done before changing `on_change` in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
7670f60b83 system_keyspace: load_tokens/peers/host_ids: enforce presence of host_id
Skip rows that have no host_id to make
sure the node state we load always has a valid host_id.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
74159bb5ae system_keyspace: drop update_tokens(endpoint, tokens) overload
It is unused now after the previous patch
to update_peer_info in one call.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
2075c85b70 storage_service: seal peer info with host_id
When adding a peer via update_peer_info,
insert all columns in a single query
using system_keyspace::peer_info.
This ensures that `host_id` is inserted along with all
other app states, so we can rely on it
when loading the peer info after restart.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
eb4cd388ce storage_service: update_peer_info: pass peer_info to sys_ks
Use the newly added system_keyspace::peer_info
to pass a struct of all optional system.peers members
to system_keyspace::update_peer_info.

Add `get_peer_info_for_update` to construct said struct
from the endpoint state.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
5abf556399 gms: endpoint_state: define application_state_map
Have a central definition for the map held
in the endpoint_state (before changing it to
std::unordered_map).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
b2735d47f7 system_keyspace: update_peer_info: use struct peer_info for all optional values
Define struct peer_info holding optional values
for all system.peers columns, allowing the caller to
update any column.

Pass the values as std::vector<std::optional<data_value>>
to query_processor::execute_internal.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:30 +02:00
Benny Halevy
6123dc6b09 query_processor: execute_internal: support unset values
Add overloads for execute_internal and friends
accepting a vector of optional<data_value>.

The caller can pass nullopt for any unset value.
The vector of optionals is translated internally to
`cql3::raw_value_vector_with_unset` by `make_internal_options`.

This path will be called by system_keyspace::update_peer_info
for updating a subset of the system.peers columns.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:21:35 +02:00
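
The translation described in 6123dc6b09 can be pictured as follows. This is a sketch under assumptions: `None` plays the role of `nullopt`, `UNSET` is a hypothetical marker, and neither function mirrors the real `make_internal_options` signature.

```python
# Hypothetical marker meaning "leave this column untouched".
UNSET = object()


def to_bound_values(values):
    # Translate optional values: None (nullopt) becomes UNSET,
    # everything else is passed through unchanged.
    return [UNSET if v is None else v for v in values]


def columns_to_write(columns, values):
    """Return only the columns whose bound value is actually set."""
    bound = to_bound_values(values)
    return {c: v for c, v in zip(columns, bound) if v is not UNSET}
```

A caller updating a subset of columns passes `None` for the rest, for example `columns_to_write(["data_center", "rack"], ["dc1", None])` keeps only `data_center`.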
Benny Halevy
328ce23c78 types: add data_value_list
data_value_list is a wrapper around std::initializer_list<data_value>.
Use it for passing values to `cql3::query_processor::execute_internal`
and friends.

A following patch will add a std::variant for data_value_or_unset
and extend data_value_list to support unset values.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:17:27 +02:00
Benny Halevy
3cba079b26 gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
Currently, when loading peers' endpoint state from system.peers,
add_saved_endpoint is called.
The first instance of the endpoint state is created with the default
heart_beat_state, with both generation and version set to zero.
However, if add_saved_endpoint finds an existing instance of the
endpoint state, it reuses it, but it updates its heart_beat_state
with the local heart_beat_state() rather than keeping the existing
heart_beat_state, as it should.

This is a problem since it may confuse updates over gossip
later on via do_apply_state_locally that compares the remote
generation vs. the local generation, so they must stem from
the same root that is the endpoint itself.

Fixes #16429

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 16:48:57 +02:00
Benny Halevy
3099c5b8ab storage_service: topology_state_load: lock endpoint for add_saved_endpoint
`topology_state_load` currently calls `add_saved_endpoint`
only if it finds no endpoint_state_ptr for the endpoint.
However, this is done before locking the endpoint
and the endpoint state could be inserted concurrently.

To prevent that, a permit_id parameter was added to
`add_saved_endpoint` allowing the caller to call it
while the endpoint is locked.  With that, `topology_state_load`
locks the endpoint and checks the existence of the endpoint state
under the lock, before calling `add_saved_endpoint`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 16:48:57 +02:00
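
The race fixed in 3099c5b8ab is a check-then-act problem: the existence check has to happen under the same lock as the insertion. A minimal sketch, assuming a single `threading.Lock` in place of the per-endpoint lock and permit_id used by the real code; `EndpointStates` is hypothetical.

```python
import threading


class EndpointStates:
    """Hypothetical container for saved endpoint states."""

    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}

    def add_saved_endpoint_if_absent(self, endpoint, make_state):
        # Check and insert under the lock, so a concurrent insertion
        # cannot slip in between the two steps.
        with self._lock:
            if endpoint in self._states:
                return False
            self._states[endpoint] = make_state()
            return True

    def get(self, endpoint):
        with self._lock:
            return self._states.get(endpoint)
```

Checking before taking the lock, as the old code did, would let two callers both see "absent" and both insert.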
Benny Halevy
db434e8cb5 raft_group_registry: move on_alive error injection to gossiper
Move the `raft_group_registry::on_alive` error injection point
to `gossiper::real_mark_alive` so it can delay marking the endpoint as
alive, and calling the `on_alive` callback, but without holding
the endpoint_lock.

Note that the entry for this endpoint in `_pending_mark_alive_endpoints`
still blocks marking it as alive until real_mark_alive completes.

Fixes #16506

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 15:28:54 +02:00
Konstantin Osipov
246da8884a test.py: override SCYLLA_* env keys
test.py inherits its env from the user, which is the right thing:
some python modules, e.g. logging, do accept env-based configuration.

However, test.py also starts subprocesses, i.e. tests, which start
scylladb instances. And when the instance is started without an explicit
configuration file, SCYLLA_CONF from user environment can be used.

If this scylla.conf contains funny parameters, e.g. unsupported
configuration options, the tests may break in an unexpected way.

Avoid this by resetting the respective env keys in test.py.

Fixes gh-16583

Closes scylladb/scylladb#16577
2023-12-31 13:02:49 +02:00
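
One way to picture the approach in 246da8884a is to build the child environment from the inherited one with every `SCYLLA_*` key dropped. This sketch is an assumption about the shape of the fix, not the test.py code: `scrubbed_env` is a hypothetical helper, and the actual change may reset the keys differently (for example, by overriding them with fixed values).

```python
import os


def scrubbed_env(base=None):
    """Copy the environment, dropping every SCYLLA_* key."""
    env = dict(os.environ if base is None else base)
    for key in [k for k in env if k.startswith("SCYLLA_")]:
        del env[key]
    return env
```

The result can be passed as the `env` argument of `subprocess.Popen` or `subprocess.run`, while the harness itself keeps the user's environment.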
Benny Halevy
85b3232086 system_keyspace: get rid of update_cached_values
It's a no-op.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 10:10:51 +02:00
Benny Halevy
f64ecc2edf storage_service: do not update peer info for this node
system_keyspace had a hack to skip update_peer_info
for the local node, and then to remove an entry for
the local node in system.peers if `update_tokens(endpoint, ...)`
was called for this node.

This change unhacks system_keyspace by considering
update of system.peers with the local address as
an internal error and fixing the call sites that do that.

Fixes #16425

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 10:10:51 +02:00
Patryk Jędrzejczak
da37e82fb9 test: add test_remove_alive_node
We add a test for the Raft-based topology's new feature - rejecting
the removenode operation on the topology coordinator side if the
node being removed is considered alive by the failure detector.

Additionally, the test tests a case when the removenode operation
is rejected on the initiator side.
2023-12-29 17:12:46 +01:00
Patryk Jędrzejczak
bd5ee04c18 topology_coordinator: reject removenode if the removed node is alive
The removenode operation is defined to succeed only if the node
being removed is dead. Currently, we reject this operation on the
initiator side (in storage_service::raft_removenode) when the
failure detector considers the node being removed alive. However,
it is possible that even if the initiator considers the node dead,
the topology coordinator will consider it alive when handling the
topology request. For example, the topology coordinator can use
a bigger failure detector timeout, or the node being removed can
suddenly resurrect. This patch adds a check on the topology
coordinator side.

Note that the only goal of this change is to improve the user
experience. The topology coordinator does not rely on the gossiper
to ensure correctness.
2023-12-29 17:12:46 +01:00
Patryk Jędrzejczak
cf955094c1 test: ManagerClient: remove unused wait_for_host_down
The previous commit removed the only call to wait_for_host_down.
Moreover, this function is identical to server_not_sees_other_server.
We can safely remove it.
2023-12-29 17:12:46 +01:00
Patryk Jędrzejczak
7038a033f2 test: remove_node: wait until the node being removed is dead
In the following commits, we make the topology coordinator reject
removenode requests if the node being removed is considered alive
by the gossiper. Before making this change, we need to adapt the
testing framework so that we don't have flaky removenode operations
that fail because the node being removed hasn't been marked as dead
yet. We achieve this by waiting until all other running nodes see
the node being removed as dead in all removenode operations.

Some tests are simplified after this change because they don't have
to call server_not_sees_other_server anymore.
2023-12-29 17:12:45 +01:00
Patryk Jędrzejczak
6ffacae0c7 storage_service: handle_state_left, handle_state_normal: improve logs
We log the information about ignoring the `handle_state_left`
function after logging the general entry information. It is better
to know what exactly is being ignored during debugging.

We also add the `permit_id` info to the log. All other functions
called through gossip notifications log it.
2023-12-29 15:10:56 +01:00
Patryk Jędrzejczak
3e551ef485 raft topology: do not update token metadata in on_alive and on_remove
In the Raft-based topology, we should never update token metadata
through gossip notifications. `storage_service::on_alive` and
`storage_service::on_remove` do it, so we ignore their parts that
touch token metadata.

There are other functions in storage_service called through gossip
notifications that are not ignored in the Raft-based topology.
However, we don't have to or cannot ignore them. We cannot ignore
`on_join` and `on_change` because they update the PEERS table used
by drivers. The rest of those functions don't have to be ignored.
These are:
- `before_change` - it does nothing,
- `on_dead` and `on_restart` - they only remove the RPC client and
  send notifications,
- `handle_state_bootstrap` and `handle_state_removed` - they are
  never called in the Raft-based topology.
2023-12-29 15:10:35 +01:00
Patryk Jędrzejczak
f1dea4bc8a storage_proxy: do not fence reads and writes to local tables
Fencing is necessary only for reads and writes to non-local tables.
Moreover, fencing a read or write to a local table can cause an
error on the bootstrapping node. It is explained in the comment
in storage_proxy::get_fence.

A scenario described in the comment has been reported in
scylladb/scylladb#16423. A write to the local RAFT table failed
because of fencing, and it killed server_impl::io_fiber.

Fixes scylladb/scylladb#16423

Closes scylladb/scylladb#16525
2023-12-28 19:34:27 +02:00
Nadav Har'El
91636f6d21 test/cql-pytest: reproducer of slightly too strict parser of timestamp
Scylla refuses the timestamp format "2014-01-01 12:15:45.0000000Z" that
has 6 digits of precision for the fractional second, and only allows
3 digits of precision. This restriction makes sense - after all CQL
timestamp columns (note - this is NOT "using timestamp"!) only have
millisecond precision. Nevertheless, Cassandra does not have this
restriction and does allow these over-precise timestamps. In this patch
we add a test that demonstrates this difference.

Curiously, in the past Scylla *generated* this forbidden timestamp
format when outputting the timestamp to a string (e.g. toJson()),
which it then couldn't read back! This was issue #16575.
Today Scylla no longer generates this forbidden timestamp format.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16576
2023-12-28 19:01:25 +02:00
Takuya ASADA
7275b614aa scylla_util.py: wait for apt operation on other processes
apt_install() / apt_uninstall() may fail if a background process, such as
unattended-upgrades, is running an apt operation.

To avoid this, we need to add two things:

1. For apt-get install / remove, we need to pass the option "DPkg::Lock::Timeout=-1"
to wait for the dpkg lock.

2. For apt-get update, there is no option to wait for the cache lock.
Therefore, we need to implement a retry loop that waits for apt-get update
to succeed.

Fixes #16537

Closes scylladb/scylladb#16561
2023-12-28 19:00:36 +02:00
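
The two measures listed in 7275b614aa can be sketched like this. It is not the scylla_util.py implementation: the function names, the number of attempts, and the delay are hypothetical. Only the `DPkg::Lock::Timeout=-1` option comes from the commit message.

```python
import subprocess
import time


def apt_install_cmd(pkg):
    # Measure 1: let apt-get wait for the dpkg lock instead of failing.
    return ["apt-get", "-o", "DPkg::Lock::Timeout=-1", "install", "-y", pkg]


def run_with_retry(cmd, attempts=30, delay=10.0,
                   run=subprocess.run, sleep=time.sleep):
    # Measure 2: apt-get update has no lock-wait option, so retry
    # until it exits successfully or the attempts run out.
    for attempt in range(attempts):
        result = run(cmd)
        if result.returncode == 0:
            return result
        if attempt + 1 < attempts:
            sleep(delay)
    raise RuntimeError(f"{cmd!r} still failing after {attempts} attempts")
```

A caller would use `run_with_retry(["apt-get", "update"])` before `subprocess.run(apt_install_cmd(pkg), check=True)`. The `run` and `sleep` parameters exist only to make the loop testable without touching the system.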
Takuya ASADA
331d9ce788 install.sh: fix scylla-server.service failure on nonroot mode
In 3da346a86d, we moved
AmbientCapabilities to scylla-server.service, but it causes "Operation
not permitted" in nonroot mode.
This is because a nonroot user does not have enough privileges to set
capabilities, so we need to disable the parameter in nonroot mode.

Closes scylladb/scylladb#16574
2023-12-27 20:52:17 +02:00
Avi Kivity
6394854f04 Merge 'Some cleanups in tests for tablets + MV ' from Nadav Har'El
This small series improves two things in the multi-node tests for tablet support in materialized views:

1. The test for Alternator LSI, which "sometimes" could reproduce the bug by creating 10-node cluster with a random tablet distribution, is replaced by a reliable 2-node cluster which controls the tablet distribution. The new test also confirms that tablets are actually enabled in Alternator (reviewers of the original test noted it would be easy to pass the test if tablets were accidentally not enabled... :-)).
2. Simplify the tablet lookup code in the test to not go through a "table id", and lookup the table's (or view's) name directly (requires a full-table of the tablets table, but that's entirely reasonable in a test).

The third patch in this series also fixes a comment typo discovered in a previous review.

Closes scylladb/scylladb#16440

* github.com:scylladb/scylladb:
  materialized views: fix typo in comment
  test_mv_tablets: simplify lookup of tablets
  alternator, tablets: improve Alternator LSI tablets test
2023-12-27 20:18:14 +02:00
Gleb Natapov
e31f6893af storage_service: topology coordinator: fix accessing outdated node in case of barrier failure
When a metadata barrier fails, the guard is released and the node becomes
outdated. The failure handling path needs to re-take the guard and re-create
the node before continuing.

Fixes: #16568

Message-ID: <ZYxEm+SaBeFcRT8E@scylladb.com>
2023-12-27 18:40:10 +02:00
Avi Kivity
3ce0576a31 Merge 'Sanitize keyspace_metadata creation' from Pavel Emelyanov
The number of arguments needed to create a ks metadata object is pretty large, and the object is created in many different ways across the code. This set simplifies creation for the most typical patterns.

closes: #16447
closes: #16449

Closes scylladb/scylladb#16565

* github.com:scylladb/scylladb:
  schema_tables: Use new_keyspace() sugar
  keyspace_metadata: Drop vector-of-schemas argument from new_keyspace()
  keyspace_metadata: Add default value for new_keyspace's durable_writes
  keyspace_metadata: Pack constructors with default arguments
2023-12-27 17:15:04 +02:00
Botond Dénes
1647b29cba tools/schema_loader: add db::config parameter to all load methods
So that a single centrally managed db::config instance can be shared by
all code requiring it, instead of creating local instances where needed.
This is required to load schema from encrypted schema-tables, and it
also helps memory consumption a bit (db::config consumes a lot of
memory).

Fixes: #16480

Closes scylladb/scylladb#16495
2023-12-27 16:28:38 +02:00
Nadav Har'El
e6dc9bca0d Merge 'Profile dumping rest api support' from Eliran Sinvani
This change is motivated by wanting to have code coverage reporting support.
Currently the only way to get a profile dump in ScyllaDB is stopping it with SIGTERM. However, this doesn't
suit all cases, more specifically:
1. In dtest, when some of the tests intentionally abruptly kill a node
2. In test.py, where we would like to distinguish (at least for now), graceful shutdown of ScyllaDB testing and
teardown procedures (which currently kills the nodes).

This mini series adds two changes:
1. It adds the support for profile dumping in ScyllaDB with rest api ('/system/dump_profile')
2. It adds the support for this API in test.py and also adds a call for it as part of the node stop procedure in a permissive way that will not fail the teardown or test if the call doesn't succeed for whatever reason - after this change, all current
test.py suites except for pylib_test (expected) dump profiles if instrumented and will be able to participate in coverage
reporting.

Refs #16323

Closes scylladb/scylladb#16557

* github.com:scylladb/scylladb:
  test.py: Dump coverage profile before killing a node
  rest api: Add an api for profile dumping
2023-12-27 12:06:39 +02:00
Eliran Sinvani
e49b3ffc89 test.py: Dump coverage profile before killing a node
Up until now the only way to get a coverage profile was to shut down the
ScyllaDB nodes gracefully (using SIGTERM), this means that the coverage
profile was lost for every node that was killed abruptly (SIGKILL).
This in turn would have required us to shut down all nodes
gracefully, which is not something we set out to do.
Here we use the rest API for dumping the coverage profile, which
has the smallest possible impact on the test runs.
If the dumping fails (because the node doesn't support the API or because of
a real error in dumping), we ignore it, as it is not part of the system we
would like to test.
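
A minimal sketch of such a permissive call, for illustration only. The
endpoint path is the one named in this series; the function name
`dump_profile_best_effort`, the port, the timeout and the use of POST are
assumptions, not the actual test.py code.

```python
import urllib.request


def dump_profile_best_effort(host, port=10000, opener=urllib.request.urlopen):
    # The path comes from the series description; the port and the HTTP
    # method are assumptions made for this sketch.
    req = urllib.request.Request(
        f"http://{host}:{port}/system/dump_profile", method="POST")
    try:
        with opener(req, timeout=5):
            return True
    except Exception:
        # Unsupported API or a real dumping error: ignore it on purpose,
        # because profile dumping is not what the test is checking.
        return False
```

The caller can ignore the return value entirely; it is only there to make
the outcome observable when debugging coverage runs.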

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-12-27 07:17:26 +02:00
Eliran Sinvani
4c60804c4c rest api: Add an api for profile dumping
As part of code coverage support we need to work with dumped profiles
for ScyllaDB executables.
Those profiles are created on two occasions:
1. When an application exits normally (which will trigger
   __llvm_dump_profile registered in the exit hooks).
2. For ScyllaDB commit d7b524cf10 introduced a manual call to
   __llvm_dump_profile upon receiving a SIGTERM signal.

This commit adds a third option, a rest API to dump the profile.
In addition the target file is logged and the counters are reset, which
enables incremental dumping of the profile.
Except for logging, if the executable is not instrumented, this API call
becomes a no-op so it bears minimal risk in keeping it in our releases.
Specifically for code coverage, the gain will be that we will not be
required to change the entire test run to shut down clusters gracefully
and this will cause minimal effect to the actual test behavior.

The change was tested by manually triggering the API both with and
without instrumentation, as well as by re-triggering it with write
permissions for the profile file disabled (to test fault tolerance).

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-12-27 07:06:54 +02:00
Avi Kivity
2a76065e3d table, memtable: share log-structured allocator statistics across all memtables in a table
The log-structured allocator collects allocation statistics (which it
uses to manage memory reserves) in some objects kept in
memtable_table_shared_data. Right now, this object is local to memtable_list,
which itself is local to a tablet replica. Move it to table scope so
different tablets in the shard share the statistics. This helps a
newly-migrated tablet adjust more quickly.
2023-12-26 21:24:51 +02:00
Avi Kivity
02111d6754 memtable: consolidate _read_section, _allocating_section in a struct
Those two members are passed from memtable_list to memtable. Since we
wish to pass them from table, it becomes awkward to pass them as two
separate variables as their contents are specific to memtable internals.

Wrap them in a name that indicates their role (being table-wide shared
data for memtables) and pass them as a unit.
2023-12-26 21:11:48 +02:00
Nadav Har'El
fc71c34597 Merge 'select statement: verify EXECUTE permissions only for non native functions' from Eliran Sinvani
Commit 62458b8e4f introduced the enforcement of EXECUTE permissions of functions in cql select. However, according to the reference in #12869, the permissions should be enforced only on UDFs and UDAs.
The code does not distinguish between the two, so the permissions are unintentionally enforced on native functions as well. This commit introduces the distinction and only enforces the permissions on non-native functions.

Fixes #16526

Manually verified (before and after change) with the reproducer supplied in #16526 and also with the `min` and `max` native functions.
Also added test that checks for regression on native functions execution and verified that it fails on authorization before
the fix and passes after the fix.

Closes scylladb/scylladb#16556

* github.com:scylladb/scylladb:
  test.py: Add test for native functions permissions
  select statement: verify EXECUTE permissions only for non native functions
2023-12-26 18:14:21 +02:00
Gleb Natapov
74d17719db test: add test to check failure handling in cdc generation commit
2023-12-26 16:01:34 +02:00
Gleb Natapov
21063b80fb storage_service: topology coordinator: rollback on failure to commit cdc generation
If the coordinator fails to notify all nodes about the new cdc generation
during bootstrap, it cannot proceed with booting, since that can cause data
loss with cdc. Roll back the topology operation if a failure happens
during this state.
2023-12-26 15:58:15 +02:00
Pavel Emelyanov
129196db98 schema_tables: Use new_keyspace() sugar
The create_keyspace_from_schema_partition code creates ks metadata
without schemas and user-types. There's new_keyspace() convenience
helper for such cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-26 13:26:58 +03:00
Pavel Emelyanov
a1ad2571fc keyspace_metadata: Drop vector-of-schemas argument from new_keyspace()
It's only testing code that wants to call new_keyspace with existing
schemas, all the other callers either construct the ks metadata
directly, or use convenience new_keyspace with explicitly empty schemas.
By and large it's nicer if new_keyspace() doesn't require this
argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-26 13:00:44 +03:00
Pavel Emelyanov
ffdafe4024 keyspace_metadata: Add default value for new_keyspace's durable_writes
Almost all callers call new_keyspace with durable writes ON, so it's
worth having default value for it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-26 11:47:37 +03:00
Pavel Emelyanov
9ab0065796 keyspace_metadata: Pack constructors with default arguments
There's a cascade of keyspace_metadata constructors, each adding one
default argument to the previous one. All this can be expressed more briefly
with the help of native default arguments

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-26 11:41:01 +03:00
Eliran Sinvani
a336550041 test.py: Add test for native functions permissions
Native functions (non UDF/UDA functions), should be usable even if a
user is not granted EXECUTE permissions on them.

This is a regression test that was added following:
https://github.com/scylladb/scylladb/issues/16526

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-12-26 10:27:04 +02:00
Eliran Sinvani
cac79977d6 select statement: verify EXECUTE permissions only for non native functions
Commit 62458b8e4f introduced the
enforcement of EXECUTE permissions of functions in cql select. However,
according to the reference in #12869, the permissions should be enforced
only on UDFs and UDAs.
The code does not distinguish between the two, so the permissions are
unintentionally enforced on native functions as well.
This commit introduces the distinction and only enforces the permissions
on non-native functions.
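
The distinction can be illustrated with a toy model. The `Function` class,
its `native` flag and `functions_needing_execute` are hypothetical names
used only for this sketch; they are not the cql3 code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    native: bool  # True for built-ins such as min/max, False for UDFs/UDAs


def functions_needing_execute(functions):
    # Only user-defined functions and aggregates are subject to the
    # EXECUTE permission check; native functions are skipped.
    return [f.name for f in functions if not f.native]
```

With this model, a SELECT using both `max` and a UDF would only require
EXECUTE on the UDF.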

Fixes #16526

Manually verified (before and after change) with the reproducer
supplied in #16526 and also with the `min` and `max` native
functions.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-12-26 10:27:04 +02:00
Avi Kivity
3968fc11bf Merge 'cql: fix regression in SELECT * GROUP BY' from Nadav Har'El
This short series fixes a regression from Scylla 5.2 to Scylla 5.4 in "SELECT * GROUP BY" - this query was supposed to return just a single row from each partition (the first one in clustering order), but after the expression rewrite started to wrongly return all rows.

The series also includes a regression test that verifies that this query doesn't work correctly before this series, but works with this patch - and also works as expected in Scylla 5.2 and in Cassandra.

Fixes #16531.

Closes scylladb/scylladb#16559

* github.com:scylladb/scylladb:
  test/cql-pytest: check that most aggregators don't take "*"
  cql-pytest: add reproducer for GROUP BY regression
  cql: fix regression in SELECT * GROUP BY
2023-12-25 19:53:55 +02:00
Avi Kivity
3da346a86d Merge 'Drop CentOS7 specific codes' from Takuya ASADA
Since we decided to drop CentOS7 support from latest version of Scylla, now we can drop CentOS7 specific codes from packaging scripts and setup scripts.

Related scylladb/scylla-enterprise#3502

Closes scylladb/scylladb#16365

* github.com:scylladb/scylladb:
  scylla-server.service: switch deprecated PermissionsStartsOnly to ExecStartPre=+
  dist: drop legacy control group parameters
  scylla-server.slice: Drop workaround for MemorySwapMax=0 bug
  dist: move AmbientCapabilities to scylla-server.service
  Revert "scylla_setup: add warning for CentOS7 default kernel"

[avi: CentOS 7 reached EOL on June 2024]
2023-12-25 18:25:05 +02:00
Kefu Chai
68c98d2203 build: cmake: link against boost static when --static-boost is specified
`--static-boost` is an option provided by `configure.py`. this option is
not used by our CI or building scripts. but in order to be compatible
with the existing behavior of `configure.py`, let's support this option
when building with CMake.

`Boost_USE_STATIC_LIBS` is a cmake variable supported by CMake's
FindBoost and Boost's own `BoostConfig.cmake`. see
https://cmake.org/cmake/help/latest/module/FindBoost.html#other-variables

by default boost is linked via its shared libraries. by setting
this variable, we link boost's static libraries.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16545
2023-12-25 18:23:49 +02:00
Avi Kivity
da022ca4e8 Merge 'build: cmake: add "mode_list" target ' from Kefu Chai
scylla uses build modes like "debug" and "release" to differentiate
different build modes. while we intend to use the typical build
configurations / build types used by CMake like "Debug" and
"RelWithDebInfo" for naming CMAKE_CONFIGURATION_TYPES and
CMAKE_BUILD_TYPE. the former is used for naming the build directory and
for the preprocess macro named "SCYLLA_BUILD_MODE".

`test.py` and scylladb's CI are designed based on the naming of build
directory. in which, `test.py` lists the build modes using the dedicated
build target named `list_modes`, which is added by `configure.py`.

so, in this change, the target is added to CMake as well. the variables
of "scylla_build_mode" defined by the per-mode configuration are
collected and printed by the `list_modes`.

because, by default, CMake generates a target for each build
configuration when a multi-config generator is used, but we only want to
print the build modes once when "list_modes" is built, a
"BYPRODUCTS" is deliberately added for the target, and the path of
this "BYPRODUCTS" is named without the "$<CONFIG>" in it.

Closes scylladb/scylladb#16532

* github.com:scylladb/scylladb:
  build: cmake: add "mode_list" target
  build: cmake: define scylla_build_mode
2023-12-25 18:20:34 +02:00
Kefu Chai
4a817f8a2a data_dictionary: use insert_or_assign() when appropriate
when compiling with clang-18 in "release" mode, `assert()` is optimized out,
so `i` is not used, and clang complains like:

```
/home/kefu/dev/scylladb/data_dictionary/user_types_metadata.hh:29:14: error: unused variable 'i' [-Werror,-Wunused-variable]
   29 |         auto i = _user_types.find(type->_name);
      |              ^
```

in this change, we use `i` as the hint for the insertion, for two
reasons:

- silence the warning.
- avoid looking up the same key in the unordered_map
  twice.

`type` is not moved away when being passed to `insert_or_assign()`,
because otherwise, `type->_name` could be referencing a moved-away
shared_ptr, because the order of evaluating a function's parameter
is not determined. since `type` is a shared_ptr, the overhead is
negligible.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16530
2023-12-25 18:18:20 +02:00
Takuya ASADA
0b894a7cac locator::ec2_snitch: change retry logic to exponential backoff
Since Amazon recommends using exponential backoff when retrying
AWS API calls, we should switch the retry logic in ec2_snitch to it.

see https://docs.aws.amazon.com/general/latest/gr/api-retries.html
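
As a rough illustration of the idea (not the ec2_snitch implementation,
which is C++), the schedule of waits could look like the sketch below. The
function name `backoff_delays` and the base, factor and cap values are
assumptions.

```python
def backoff_delays(retries, base=1.0, factor=2.0, cap=None):
    # Each retry waits `factor` times longer than the previous one,
    # optionally clamped to `cap` seconds.
    delays = []
    delay = base
    for _ in range(retries):
        delays.append(delay if cap is None else min(delay, cap))
        delay *= factor
    return delays
```

A caller would sleep for each returned delay between consecutive attempts.
Production code often also adds random jitter, which is left out here for
brevity.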

Related with #12160

Closes scylladb/scylladb#13442
2023-12-25 18:17:23 +02:00
Yaron Kaikov
8917947f29 build_docker: Add description and summary labels
Add description and summary labels to our docker images, per @tzach
and @mykaul's request.

Closes scylladb/scylladb#16419
2023-12-25 18:14:56 +02:00
Pavel Emelyanov
ac3dd4bf5d test: Coroutinize some secondary_index_test cases
Now they are long then-chains that are hard to read

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16547
2023-12-25 18:08:19 +02:00
Nadav Har'El
55317666c6 test/cql-pytest: check that most aggregators don't take "*"
Although you can "SELECT COUNT(*)", this has special handling in the CQL
parser (it is converted into a special row-counting request) and you can't
give "*" to other aggregators - e.g., "SELECT SUM(*)". This patch includes
a simple test that confirms this.

I wanted to check this in relation to the previous patch, which did,
sort of, a "SELECT $$first$$(*)" - a syntax which this test shows
wouldn't have actually worked if we tried it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-25 17:53:42 +02:00
Nadav Har'El
e2773b4a3a cql-pytest: add reproducer for GROUP BY regression
test/cql-pytest/test_group_by.py has tests that verify that requests
like

   SELECT p,c1,c2,v FROM tbl WHERE p=0 GROUP BY p

work as expected - the "GROUP BY p" means in this case that we should
only return the first row in the p=0 partition.

As a user discovered, it turns out that the almost identical request:

   SELECT * FROM tbl WHERE p=0 GROUP BY p

Doesn't work the same - before the fix in the previous patch, it
erroneously returned all rows in p=0, not just the first one.
The test in this patch demonstrates this - it fails on Scylla 5.4,
passes on Scylla 5.2 and on Cassandra - and passes when the fix
from the previous patch is used.
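
The expected behaviour can be modelled in a few lines. This is only a
model of the semantics, not the test itself: rows are plain `(p, c, v)`
tuples, and `first_row_per_partition` is a hypothetical name.

```python
from itertools import groupby


def first_row_per_partition(rows):
    # rows are (p, c, v) tuples, already sorted by partition key and then
    # by clustering key, as a scan would return them.
    return [next(group) for _, group in groupby(rows, key=lambda r: r[0])]
```

For a single partition p=0 holding several rows, both the explicit column
list and "*" are expected to yield exactly one row: the first one in
clustering order.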

This patch includes another tiny test, to check the interaction of GROUP BY
with filtering. This second test passes on Scylla - but I want it in
anyway because it is yet another interaction that might break (the
user that reported #16531 also had filtering, and I was worried it might
have been related).

Refs #16531

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-25 17:53:42 +02:00
Nadav Har'El
1aea2136c8 cql: fix regression in SELECT * GROUP BY
Recently, the expression-rewrite effort changed the way that GROUP BY is
implemented. Usually GROUP BY involves an aggregation function (e.g., if
you want a separate SUM per partition). But there's also a query like

   SELECT p, c1, c2, v FROM tbl GROUP BY p

This query is supposed to return one row - the *first* row in clustering
order - per group (in this case, partition). The expression rewrite
re-implemented this feature by introducing a new internal aggregator,
first(), which returns the first aggregated value. The above query is
rewritten into:

   SELECT first(p), first(c1), first(c2), first(v) FROM tbl GROUP BY p

This case works correctly, and we even have a regression test for it.
But unfortunately the rewrite broke the following query:

   SELECT * FROM tbl GROUP BY p

Note the "*" instead of the explicit list of columns.
In our implementation, a selection of "*" looks like an empty
selection, and it didn't get the "first()" treatment and it remained
a "SELECT *" - and wrongly returned all rows instead of just the first
one in each partition. This was a regression - it worked correctly in
Scylla 5.2 (and also in Cassandra) - see the next patch for a
regression test.

In this patch we fix this regression. When there is a GROUP BY, the "*"
is rewritten to the appropriate list of all visible columns and then
gets the first() treatment, so it will return only the first row as
expected. The next patch will be a test that confirms the bug and its
fix.
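
A toy model of the rewrite described above, assuming (as the message says)
that "*" arrives as an empty selection. The function name
`rewrite_group_by_selection` is hypothetical, and the real rewrite operates
on expression trees rather than strings.

```python
def rewrite_group_by_selection(selection, visible_columns):
    # "*" is modelled as an empty selection; expand it to all visible
    # columns first, then give every column the first() treatment.
    columns = selection if selection else visible_columns
    return [f"first({c})" for c in columns]
```

Before the fix, the empty selection skipped the first() step entirely,
which is why all rows of the partition came back.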

Fixes #16531

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-25 17:52:57 +02:00
Avi Kivity
a7efaca878 Merge 'Move initial_tablets to system_schema.scylla_keyspaces' from Pavel Emelyanov
Right now the initial_tablets is kept as replication strategy option in the legacy system_schema.keyspaces table. However, r.s. options are all considered to be replication factors, not anything else. Other than being confusing, this also makes it impossible to extend keyspace configuration with non-integer tablets-related values.

This PR moves the initial_tablets into scylla-specific part of the schema. This opens a way to more ~~ugly~~ flexible ways of configuring tablets for keyspace, in particular it should be possible to use boolean on/off switch in CREATE KEYSPACE or some other trick we find appropriate.

Most of what this PR does is extend the arguments passed around keyspace_metadata and abstract_replication_strategy. The essence of the change is in the last patches
* schema_tables: Relax extract_scylla_specific_ks_info() check
* locator,schema: Move initial tablets from r.s. options to params

refs: #16319
refs: #16364

Closes scylladb/scylladb#16555

* github.com:scylladb/scylladb:
  test: Add sanity tests for tablets initialization and altering
  locator,schema: Move initial tablets from r.s. options to params
  schema_tables: Relax extract_scylla_specific_ks_info() check
  locator: Keep optional initial_tablets on r.s. params
  ks_prop_defs: Add initial_tablets& arg to prepare_options()
  keyspace_metadata: Carry optional<initial_tablets> on board
  locator: Pass abstract_replication_strategy& into validate_tablet_options()
  locator: Carry r.s. params into process_tablet_options()
  locator: Call create_replication_strategy() with r.s. params
  locator: Wrap replication_strategy_config_options into replication_strategy_params
  locator: Use local members in ..._replication_strategy constructors
2023-12-25 17:44:10 +02:00
Pavel Emelyanov
1d2c871219 test: Add sanity tests for tablets initialization and altering
Check that the initial_tablets appears in system_schema.scylla_keyspaces
if turned on explicitly

Check that it's possible to change initial_tablets with ALTER KEYSPACE

Check that changing r.s. from simple to network-topology doesn't
activate tablets

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:09:01 +03:00
Pavel Emelyanov
c43501d973 locator,schema: Move initial tablets from r.s. options to params
The option is kept in DDL, but is _not_ stored in
system_schema.keyspaces. Instead, it's removed from the provided options
and kept in the scylla_keyspaces table in its own column. All the places
that had optional initial_tablets disengaged now set this value up the
way they find appropriate.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:07:10 +03:00
Pavel Emelyanov
30e7273658 schema_tables: Relax extract_scylla_specific_ks_info() check
Nowadays reading scylla-specific info from schema happens under
respective schema feature. However (at least in raft case) when a new
node joins the cluster merging schema for the first time may happen
_before_ features are merged and enabled. Thus merging schema can go the
wrong way by erroneously skipping the scylla-specific info.

On the other hand, if system_schema.scylla_keyspaces is there it's
there, there's no reason _not_ to pick this data up in that case.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:05:01 +03:00
Pavel Emelyanov
562fcf0c19 locator: Keep optional initial_tablets on r.s. params
Now all the callers have it at hand (spoiler: not yet initialized, but
still) so the params can also have it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:02:41 +03:00
Pavel Emelyanov
2d480a2093 ks_prop_defs: Add initial_tablets& arg to prepare_options()
The prepare_options() method is in charge of pre-tuning the replication
strategy CQL parameters so that real keyspace and r.s. creation code
doesn't see some of those. The "initial_tablets" option is going to be
removed from the real options and be placed into scylla-specific part of
the schema. So the prepare_options() will need to modify both -- the
legacy options _and_ the (soon to be separate) initial_tablets thing.
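
Conceptually, the split looks like the sketch below. It is a model only:
`split_initial_tablets` is a hypothetical name, options are modelled as a
plain string-to-string dict, and the real prepare_options() is C++ code
with a different signature.

```python
def split_initial_tablets(options):
    # Returns (legacy_options, initial_tablets). The input dict is left
    # untouched; initial_tablets is None when the option was not given.
    legacy = dict(options)
    raw = legacy.pop("initial_tablets", None)
    return legacy, (int(raw) if raw is not None else None)
```

The legacy part can then be stored as before, while the second value goes
to the scylla-specific part of the schema.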

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:00:50 +03:00
Pavel Emelyanov
a67c535539 keyspace_metadata: Carry optional<initial_tablets> on board
The object in question fully describes the keyspace to be created and,
among other things, contains replication strategy options. Next patches
move the "initial_tablets" option out of those options and keep it
separately, so the ks metadata should also carry this option separately.

This patch is _just_ extending the metadata creation API, in fact the
new field is unused (write-only) so all the places that need to provide
this data keep it disengaged and are explicitly marked with FIXME
comment. Next patches will fix that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:58:05 +03:00
Pavel Emelyanov
45f4276de6 locator: Pass abstract_replication_strategy& into validate_tablet_options()
It will need to check if the r.s. in question had been marked as
per-table one in next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:56:49 +03:00
Pavel Emelyanov
bf824d79d9 locator: Carry r.s. params into process_tablet_options()
The latter method is the one that will need extended params in next
patches. It's called from network_topology_strategy() constructor which
already has params at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:56:02 +03:00
Pavel Emelyanov
a943bd927b locator: Call create_replication_strategy() with r.s. params
Previous patch added params to r.s. classes' constructors, but callers
don't construct those directly, instead they use the create_r.s.()
wrapper. This patch adds params to the wrapper too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:54:59 +03:00
Pavel Emelyanov
f88ba0bf5a locator: Wrap replication_strategy_config_options into replication_strategy_params
When a replication strategy class is created, the caller passes a const reference
to the config options, which is, in turn, a map<string, string>. In the
future r.s. classes will need to get "scylla specific" info along with
legacy options and this patch prepares for that by passing more generic
params argument into constructor. Currently the only inhabitant of the
new params is the legacy options.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:53:03 +03:00
Pavel Emelyanov
ecbafd81f2 locator: Use local members in ..._replication_strategy constructors
The `config_options` arg had been used to initialize `_config_options`
field of the base abstract_replication_strategy class, so it's more
idiomatic to use the latter. Also it makes next patches simpler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:51:51 +03:00
Pavel Emelyanov
f621afa3ec database: Copy storage options too when updating keyspace metadata
When altering a keyspace several keyspace_metadata objects are created
along the way. The last one, that is then kept on the keyspace_metadata
object, forgets to get its copy of storage options thus transparently
converting to LOCAL type.

The bug surfaces itself when altering replication strategy class for
S3-backed storage -- the 2nd attempt fails, because after the 1st one
the keyspace_metadata gets LOCAL storage options and changing storage
options is not allowed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16524
2023-12-25 13:31:15 +02:00
Benny Halevy
060b16f987 view: apply_to_remote_endpoints: fix use-after-free
b815aa021c added a yield before
the trace point, causing the moved `frozen_mutation_and_schema`
(and `inet_address_vector_topology_change`) to drop out of scope
and be destroyed, as the rvalue-referenced objects aren't moved
onto the coroutine frame.

This change passes them by value rather than by rvalue-reference
so they will be stored in the coroutine frame.

Fixes #16540

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16541
2023-12-24 21:43:48 +02:00
Botond Dénes
da033343b7 tools/schema_loader: read_schema_table_mutation(): close the reader
The reader used to read the sstables was not closed. This could
sometimes trigger an abort(), because the reader was destroyed, without
it being closed first.
Why only sometimes? This is due to two factors:
* read_mutation_from_flat_mutation_reader() - the method used to extract
  a mutation from the reader, uses consume(), which does not trigger
  `set_close_is_required()` (#16520). Due to this, the top-level
  combined reader did not complain when destroyed without close.
* The combined reader closes underlying readers that have no more data
  for the current range. If the circumstances are just right, all
  underlying readers are closed before the combined reader is
  destroyed. It looks like this is what happens most of the time.

This bug was discovered in SCT testing. After fixing #16520, all
invocations of `scylla-sstable` that use this code would trigger the
abort without this patch. So no further testing is required.

Fixes: #16519

Closes scylladb/scylladb#16521
2023-12-24 17:21:32 +02:00
Nadav Har'El
6640278aa7 materialized views: fix typo in comment
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-24 10:12:44 +02:00
Nadav Har'El
f9f20e779c test_mv_tablets: simplify lookup of tablets
The tests looked up a table's tablets in an elaborate two-stage search -
first find the table's "id", and then look up this id in the list of
tablets. It is much simpler to just look up the table's name directly
in the list of tablets - although this name is not a key, an ALLOW
FILTERING search is good enough for a test.
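
The difference between the two lookups can be modelled with in-memory rows.
The dict keys ("id", "name", "table_id", "table_name") and both function
names are hypothetical stand-ins chosen for this sketch, not the actual
system table schema or test helpers.

```python
def tablets_two_stage(tables, tablets, name):
    # Old approach: resolve the name to an id, then look the id up.
    table_id = next(t["id"] for t in tables if t["name"] == name)
    return [t for t in tablets if t["table_id"] == table_id]


def tablets_by_name(tablets, name):
    # New approach: filter the tablets rows on the name directly
    # (a full scan, comparable to an ALLOW FILTERING query).
    return [t for t in tablets if t["table_name"] == name]
```

Both return the same rows; the second needs only one data source and does
not care whether the name belongs to a table or a view.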

As a bonus, with the new technique we don't care if the given name
is the name of a table or a view, further simplifying the test.

This is just a test code cleanup - there is no functional change in
the test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-24 10:12:44 +02:00
Nadav Har'El
cdd5b19f12 alternator, tablets: improve Alternator LSI tablets test
The test test_tablet_alternator_lsi_consistency, checking that Alternator
LSI allow strongly-consistent reads even with tablets, used a large
cluster (10 nodes), to improve the chance of reaching an "unlucky" tablet
placement - and even then only failed in about half the runs without
the code fixed.

In this patch, we rewrite the test using a much more reliable approach:
We start only two nodes, and force the base's tablet onto one node, and
the view's tablet onto the second node. This ensures with 100% certainty
that the view update is remote, and the new test fails every single time
before the code fix (I reverted the fix to verify) - and passes after it.

The new test is not only more reliable, it's also significantly faster
because it doesn't need to start a 10-node cluster.

We can also remove the tag that excluded this test from debug build
mode tests because the 10-node boot was too slow.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-24 10:11:43 +02:00
Kefu Chai
2bec6751d3 build: cmake: add "mode_list" target
scylla uses build modes like "debug" and "release" to differentiate
different build modes. while we intend to use the typical build
configurations / build types used by CMake like "Debug" and
"RelWithDebInfo" for naming CMAKE_CONFIGURATION_TYPES and
CMAKE_BUILD_TYPE. the former is used for naming the build directory and
for the preprocess macro named "SCYLLA_BUILD_MODE".

`test.py` and scylladb's CI are designed based on the naming of build
directory. in which, `test.py` lists the build modes using the dedicated
build target named `list_modes`, which is added by `configure.py`.

so, in this change, the target is added to CMake as well. the variables
of "scylla_build_mode" defined by the per-mode configuration are
collected and printed by the `list_modes`.

because, by default, CMake generates a target for each build
configuration when a multi-config generator is used, but we only want to
print the build modes once when "list_modes" is built, a
"BYPRODUCTS" is deliberately added for the target, and the path of
this "BYPRODUCTS" is named without the "$<CONFIG>" in it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-24 12:35:02 +08:00
Kefu Chai
79943e0516 build: cmake: define scylla_build_mode
scylla uses build modes like "debug" and "release" to differentiate
different build modes. while we intend to use the typical build
configurations / build types used by CMake like "Debug" and
"RelWithDebInfo" for naming CMAKE_CONFIGURATION_TYPES and
CMAKE_BUILD_TYPE. the former is used for naming the build directory and
for the preprocess macro named "SCYLLA_BUILD_MODE".

`test.py` and scylladb's CI are designed based on the naming of build
directory. in which, `test.py` lists the build modes using the dedicated
build target named "list_modes", which is added by `configure.py`.

so, in this change, to prepare for adding the target,
"scylla_build_mode" is defined, so we can reuse it in a following-up
change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-24 12:28:23 +08:00
Tomasz Grabiec
2590274f95 Merge 'Don't allow ALTER KEYSPACE to change replication strategy vnode/per-table flavor' from Pavel Emelyanov
This switch is currently possible, but results in not supported keyspace state

Closes scylladb/scylladb#16513

* github.com:scylladb/scylladb:
  test: Add a test that switching between vnodes and tablets is banned
  cql3/statements: Don't allow switching between vnode and per-table replication strategies
  cql3/statements: Keep local keyspace variable in alter_keyspace_statement::validate
2023-12-22 17:22:36 +01:00
Kefu Chai
642652efab test/cql-pytest/test_tools.py: test shard-of with a single partition
test_scylla_sstable_shard_of takes lots of time preparing the keys for a
certain shard. with the debug build, it takes 3 minutes to complete the
test.

so in order to test the "shard-of" subcommand in a more efficient way,
in this change, we improve the test in two ways:

1. cache the output of `scylla types shardof`, so we can avoid the
   overhead of running a seastar application repeatedly for the
   same keys.
2. reduce the number of partitions from 42 to 1. as the number of
   partitions in an sstable does not matter when testing the
   output of "shard-of" command of a certain sstable. because,
   the sstable is always generated by a certain shard.
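
The caching idea in item 1 can be sketched as below. This is only an illustration, not the test's actual code: `run_shardof_tool` is a hypothetical stand-in for spawning the real tool, and its return value is a placeholder rather than Scylla's sharding function.

```python
import functools

# Records every simulated tool invocation so the effect of the cache is visible.
calls = []


def run_shardof_tool(key: str, shards: int) -> int:
    """Hypothetical stand-in for running the external tool once per key."""
    calls.append((key, shards))
    # Placeholder computation; NOT the real sharding algorithm.
    return sum(key.encode()) % shards


@functools.lru_cache(maxsize=None)
def shard_of(key: str, shards: int) -> int:
    """Memoized wrapper: the expensive call runs once per distinct (key, shards)."""
    return run_shardof_tool(key, shards)


first = shard_of("pk-1", 4)
second = shard_of("pk-1", 4)  # served from the cache, no second invocation
```

After the two lookups above, `calls` holds a single entry, because the second lookup never reaches the expensive function.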

before this change, with pytest-profiling:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/3    0.000    0.000  181.950   60.650 runner.py:219(call_and_report)
      4/3    0.000    0.000  181.948   60.649 runner.py:247(call_runtest_hook)
      4/3    0.000    0.000  181.948   60.649 runner.py:318(from_call)
      4/3    0.000    0.000  181.948   60.649 runner.py:262(<lambda>)
    44/11    0.000    0.000  181.935   16.540 _hooks.py:427(__call__)
    43/11    0.000    0.000  181.935   16.540 _manager.py:103(_hookexec)
    43/11    0.000    0.000  181.935   16.540 _callers.py:30(_multicall)
      361    0.001    0.000  181.531    0.503 contextlib.py:141(__exit__)
   782/81    0.001    0.000  177.578    2.192 {built-in method builtins.next}
     1044    0.006    0.000   92.452    0.089 base_events.py:1894(_run_once)
       11    0.000    0.000   91.129    8.284 fixtures.py:686(<lambda>)
    17/11    0.000    0.000   91.129    8.284 fixtures.py:1025(finish)
        4    0.000    0.000   91.128   22.782 fixtures.py:913(_teardown_yield_fixture)
      2/1    0.000    0.000   91.055   91.055 runner.py:111(pytest_runtest_protocol)
      2/1    0.000    0.000   91.055   91.055 runner.py:119(runtestprotocol)
        2    0.000    0.000   91.052   45.526 conftest.py:50(cql)
        2    0.000    0.000   91.040   45.520 util.py:161(cql_session)
        1    0.000    0.000   91.040   91.040 runner.py:180(pytest_runtest_teardown)
        1    0.000    0.000   91.040   91.040 runner.py:509(teardown_exact)
     1945    0.002    0.000   90.722    0.047 events.py:82(_run)
```

after this change:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/3    0.000    0.000    8.271    2.757 runner.py:219(call_and_report)
    44/11    0.000    0.000    8.270    0.752 _hooks.py:427(__call__)
    44/11    0.000    0.000    8.270    0.752 _manager.py:103(_hookexec)
    44/11    0.000    0.000    8.270    0.752 _callers.py:30(_multicall)
      4/3    0.000    0.000    8.269    2.756 runner.py:247(call_runtest_hook)
      4/3    0.000    0.000    8.269    2.756 runner.py:318(from_call)
      4/3    0.000    0.000    8.269    2.756 runner.py:262(<lambda>)
       48    0.000    0.000    8.269    0.172 {method 'send' of 'generator' objects}
       27    0.000    0.000    5.671    0.210 contextlib.py:141(__exit__)
       11    0.000    0.000    4.297    0.391 fixtures.py:686(<lambda>)
      2/1    0.000    0.000    4.228    4.228 runner.py:111(pytest_runtest_protocol)
      2/1    0.000    0.000    4.228    4.228 runner.py:119(runtestprotocol)
        2    0.000    0.000    4.213    2.106 capture.py:877(pytest_runtest_teardown)
        1    0.000    0.000    4.213    4.213 runner.py:180(pytest_runtest_teardown)
        1    0.000    0.000    4.213    4.213 runner.py:509(teardown_exact)
        2    0.000    0.000    3.628    1.814 capture.py:872(pytest_runtest_call)
        1    0.000    0.000    3.627    3.627 runner.py:160(pytest_runtest_call)
        1    0.000    0.000    3.627    3.627 python.py:1797(runtest)
   114/81    0.001    0.000    3.505    0.043 {built-in method builtins.next}
       15    0.784    0.052    3.183    0.212 subprocess.py:417(check_output)
```

Fixes #16516
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16523
2023-12-22 15:20:03 +02:00
Petr Gusev
c05fd8c018 storage_service: node_ops_cmd_handler: decommission rollback, ignore the node if it's already removed
This is a regression after #15903. Before these changes
del_leaving_endpoint took IP as a parameter and did nothing
if it was called with a non-existent IP.

The problem was revealed by the dtest test_remove_garbage_members_from_group0_after_abort_decommission[Announcing_that_I_have_left_the_ring-]. The test was
flaky as in most cases the node died before the
gossiper notification reached all the other nodes. To make
it fail consistently and reproduce the problem one
can move the info log 'Announcing that I have' after
the sleep and add additional sleep after it in
storage_service::leave_ring function.

Fixes #16466

Closes scylladb/scylladb#16508
2023-12-22 12:42:38 +01:00
Avi Kivity
6f6170aae7 Update seastar submodule
* seastar ae8449e04f...e0d515b6cf (18):
  > reactor: poll less frequently in debug mode
  > build: s/exec_program/execute_process/
  > Merge 'httpd: support temporary redirect from inside async reply' from Noah Watkins
  > Merge 'core: enable seastar to run multiple times in a single process' from Kefu Chai
  > rpc/rpc_types: add formatter for rpc::optional<T>
  > memory: do not set_reclaim_hook if cpu_mem_ptr is not set
  > circleci: do not set disable dpdk explicitly
  > fair_queue: Do not pop unplugged class immediately
  > build: install Finducontext.cmake and FindSystem-SDT.cmake
  > treewide: include used headers
  > build: define SEASTAR_COROUTINES_ENABLED for Seastar module
  > seastar.cc: include "core/prefault.hh"
  > build: enable build C++20 modules with GCC 14
  > build: replace seastar_supports_flag() with check_cxx_compiler_flag()
  > Merge 'build: cleanups configure.py to be more PEP8 compatible' from Kefu Chai
  > circleci: build with dpdk enabled
  > build: add "--enable-cxx-modules" option to configure.py
  > build: use a different *_CMAKE_API for CMake 3.27

Closes scylladb/scylladb#16500
2023-12-22 12:58:39 +02:00
Tzach Livyatan
45ffa5221e Improve nodetool scrub definition
fix #16505

Closes scylladb/scylladb#16518
2023-12-22 12:09:58 +02:00
Tomasz Grabiec
9c7e5f6277 Merge 'Fix secondary index feature with tablets' from Nadav Har'El
Before this series, materialized views already worked correctly on keyspaces with tablets, but secondary indexes did not. The goal of this series is to make CQL secondary indexes fully supported on tablets:

1. First we need to make CREATE INDEX work with tablets (it didn't before this series). Fixes #16396.
2. Then we need to keep the promise that our documentation makes - that **local** secondary index should be synchronously updated - Fixes #16371.

As you can see in the patches below, and as was expected already in the design phase, the code changes needed to make indexes support tablets were minimal. But writing reliable tests for these issues was the biggest effort that went into this series.

Closes scylladb/scylladb#16436

* github.com:scylladb/scylladb:
  secondary-index, tablets: ensure that LSI are synchronous
  test: add missing "tags" schema extension to cql_test_env
  mv, test: fix delay_before_remote_view_update injection point
  secondary index: fix view creation when using tablets
2023-12-21 23:37:00 +01:00
Botond Dénes
1ce07c6f27 test/cql-pytest: test_select_from_mutation_fragments: bump timeout for test_many_partitions
The test test_many_partitions is very slow, as it tests a slow scan over
a lot of partitions. This was observed to time out on the slower ARM
machines, making the test flaky. To prevent this, create an
extra-patient cql connection with a 10 minutes timeout for the scan
itself.
This is a follow-up to fb9379edf1, which
attempted to fix this, but didn't patch all the places doing slow scans.
This patch fixes the other scan, the one actually observed to time-out
in CI.

Fixes: #16145

Closes scylladb/scylladb#16370
2023-12-21 19:55:06 +02:00
Pavel Emelyanov
a03755d6d7 test: Add a test that switching between vnodes and tablets is banned
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-21 19:57:55 +03:00
Pavel Emelyanov
4de433ac23 cql3/statements: Don't allow switching between vnode and per-table replication strategies
When ALTER-ing a keyspace one may as well change its vnode/tablet
flavor, which is not currently supported, so prohibit this change
explicitly

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-21 19:57:00 +03:00
Pavel Emelyanov
299219833b cql3/statements: Keep local keyspace variable in alter_keyspace_statement::validate
For convenience of next patching

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-21 19:56:18 +03:00
Nadav Har'El
79011eeb24 Merge 'virtual_tables, schema_registry: fix use after free related to schema registry' from Avi Kivity
Both virtual tables and schema registry contain thread_local caches that are destroyed
at thread exit. after a Seastar change[1], these destructions can happen after the reactor
is destroyed, triggering a use-after-free.

Fix by scoping the destruction so it takes place earlier.

[1] 101b245ed7

Closes scylladb/scylladb#16510

* github.com:scylladb/scylladb:
  schema_registry, database: flush entries when no longer in use
  virtual_tables: scope virtual tables registry in system_keyspace
2023-12-21 17:10:25 +02:00
Avi Kivity
c00b376a3e schema_registry, database: flush entries when no longer in use
The schema registry disarms internal timers when it is destroyed.
This accesses the Seastar reactor. However, after [1] we don't have ordering
between the reactor destruction and the thread_local registry destruction.

Fix this by flushing all entries when the database is destroyed. The
database object is fundamental so it's unlikely we'll have anything
using the registry after it's gone.

[1] 101b245ed7
2023-12-21 17:00:41 +02:00
Michał Chojnowski
d7b524cf10 main: add a call to LLVM profile dump before exit
Scylla skips exit hooks so we have to manually trigger the data dump to disk
from the LLVM profiling instrumentation runtime which we need in order
to support code coverage.
We use a weak symbol to get the address of the profile dump function. This
is legal: the function is a public interface of the instrumentation runtime.

Closes scylladb/scylladb#16430
2023-12-21 16:48:42 +02:00
Avi Kivity
2853f79f96 virtual_tables: scope virtual tables registry in system_keyspace
Virtual tables are kept in a thread_local registry for deduplication
purposes. The problem is that thread_local variables are destroyed late,
possibly after the schema registry and the reactor are destroyed.
Currently this isn't a problem, but after a seastar change to
destroy the reactor after termination [1], things break.

Fix by moving the registry to system_keyspace. system_keyspace was chosen
since it was the birthplace of virtual tables.

Pimpl is used to avoid increasing dependencies.

[1] 101b245ed7
2023-12-21 16:19:42 +02:00
Nadav Har'El
a41140f569 Merge 'scylla-sstable: handle attempt to load schema for non-existent tables more gracefully' from Botond Dénes
In other words, print more user-friendly messages, and avoid crashing.
Specifically:
* Don't crash when attempting to load schema tables from configured data-dir, while configuration does not have any configured data-directories.
* Detect the case where schema mutations have no rows for the current table -- the keyspace exists, but the table doesn't.
* Add negative tests for schema-loading.

Fixes: https://github.com/scylladb/scylladb/issues/16459

Closes scylladb/scylladb#16494

* github.com:scylladb/scylladb:
  test/cql-pytest: test_tools.py: add test for failed schema loading
  tools/scylla-sstable: use at() instead of operator [] when obtaining data dirs
  tools/schema_loader: also check for empty table/column mutations
  tools/schema_loader: log more details when loading schema from schema tables
2023-12-21 15:40:51 +02:00
Kefu Chai
6018e0fea7 database: log when done with truncating
truncating is an unusual operation, and we write a logging message
at INFO level when the truncate op starts. it would be great if
we could have a matching logging message indicating the end of truncate
on the server side. this would help with investigating the TRUNCATE
timeouts spotted on the client: at least we could rule out the problem
happening while the server is performing the truncate.

Refs #15610
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16247
2023-12-21 13:59:09 +02:00
Raphael S. Carvalho
5e55954f27 replica: Make the storage snapshot survive concurrent compactions
Consider this:
1) file streaming takes storage snapshot = list of sstables
2) concurrent compaction unlinks some of those sstables from the file system
3) file streaming tries to send unlinked sstables, but files other
than data and index cannot be read as only data and index have file
descriptors opened

To fix it, the snapshot now returns a set of files, one per sstable
component, for each sstable.
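
The reason holding open files helps can be shown with a small POSIX-only sketch: a file that is unlinked while a descriptor is still open remains readable through that descriptor. The helper name `take_snapshot` and the `Statistics.db` component are illustrative assumptions, not the actual replica code.

```python
import os
import tempfile


def take_snapshot(paths):
    """Open every component file up front and return the open handles."""
    return [open(p, "rb") for p in paths]


with tempfile.TemporaryDirectory() as d:
    # Illustrative component files of one sstable.
    components = {"Data.db": b"data", "Index.db": b"index", "Statistics.db": b"stats"}
    paths = []
    for name, content in components.items():
        path = os.path.join(d, name)
        with open(path, "wb") as f:
            f.write(content)
        paths.append(path)

    snapshot = take_snapshot(paths)

    # Simulate a concurrent compaction removing the files from the file system.
    for p in paths:
        os.unlink(p)

    # Every component is still readable through the handles taken earlier.
    contents = [f.read() for f in snapshot]
    for f in snapshot:
        f.close()
```

Had the snapshot kept only paths for some components, reading them after the unlink would fail, which is the situation described in step 3.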

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#16476
2023-12-21 12:50:28 +02:00
Botond Dénes
e6147c1853 Merge 'Some cleanup in compaction group' from Raphael "Raph" Carvalho
Closes scylladb/scylladb#16448

* github.com:scylladb/scylladb:
  replica: Fix indentation
  replica: Kill unused calculate_disk_space_used_for()
2023-12-21 12:48:38 +02:00
Nadav Har'El
a613a3cad2 secondary-index, tablets: ensure that LSI are synchronous
CQL Local Secondary Index is a Scylla-only extension to Cassandra's
secondary index API where the index is separate per partition.
Scylla's documentation guarantees that:

  "As of Scylla Open Source 4.0, updates for local secondary indexes are
   performed synchronously. When updates are synchronous, the client
   acknowledges the write operation only after both the base table
   modification and the view update are written."

This happened automatically with vnodes, because the base table and the
view have the same partition key, so base and view replicas are co-located,
and the view update is always local and therefore done synchronously.

But with tablets, this does NOT happen automatically - the base and view
tablets may be located on different nodes, and the view update may be
remote, and NOT synchronous.

So in this patch we explicitly mark the view as synchronous_update when
building the view for an LSI.

The bigger part of this patch is to add a test which reliably fails
before this patch, and passes after it. The test creates a two-node
cluster and a table with LSI, and pins the base's tablets to one node
and the view's to the second node, forcing the view updates to be
remote. It also uses an injection point to make the view update slower.
The test then writes to the base and immediately tries to use the index
to read. Before this patch, the read doesn't find the new data (contrary
to the guarantee in the documentation). After this patch, the read
does find the new data - because the write waited for the index to
be updated.
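
The difference the test observes can be modeled with a toy asyncio sketch. It is not Scylla code: the dictionaries stand in for the base table and the index, and `VIEW_DELAY` plays the role of the injected delay on the remote view update.

```python
import asyncio

VIEW_DELAY = 0.05  # stands in for the injected delay before the remote view update


async def update_view(index: dict, key: str, value: str) -> None:
    """Apply the (slow, remote) index update."""
    await asyncio.sleep(VIEW_DELAY)
    index[value] = key


async def write(base: dict, index: dict, key: str, value: str, synchronous: bool):
    """Write to the base table; wait for the index update only if synchronous."""
    base[key] = value
    task = asyncio.create_task(update_view(index, key, value))
    if synchronous:
        await task  # the write is acknowledged only after the index is updated
    return task


async def write_then_read(synchronous: bool) -> bool:
    base, index = {}, {}
    task = await write(base, index, "k", "v", synchronous)
    found = "v" in index  # read through the index right after the write returns
    await task  # let a background update finish before the event loop closes
    return found


async_result = asyncio.run(write_then_read(synchronous=False))
sync_result = asyncio.run(write_then_read(synchronous=True))
```

In this model the asynchronous write returns before the index is updated, so the immediate read misses the new value, while the synchronous write makes it visible.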

Fixes #16371

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Nadav Har'El
7c5092cb8f test: add missing "tags" schema extension to cql_test_env
One of the unfortunate anti-features of cql_test_env (the framework used
in our CQL tests that are written in C++) is that it needs to repeat
various bizarre initialization steps done in main.cc, otherwise various
requests work incorrectly. One of these steps that main.cc performs is to initialize
various "schema extensions" which some of the Scylla features need to work
correctly.

We remembered to initialize some schema extensions in cql_test_env, but
forgot others. The one I will need in the following patch is the "tags"
extension, which we need to mark materialized views used by local
secondary indexes as "synchronous_updates" - without this patch the LSI
tests in secondary_index_test.cc will crash.

In addition to adding the missing extension, this patch also replaces
the segmentation-fault crash when it's missing (caused by a dynamic
cast failure) by a clearer on_internal_error() - so if we ever have
this bug again, it will be easier to debug.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Nadav Har'El
b815aa021c mv, test: fix delay_before_remote_view_update injection point
The "delay_before_remote_view_update" is a recently-added injection
point which should add a delay before remote view updates, but NOT
force the writer to wait for it (whether the writer waits for it or
not depends on whether the view is configured as synchronous or not).

Unfortunately, the delay was added at the WRONG place, which caused
it to sometimes be done even on asynchronous views, breaking (with
false-negative) the tests that need this delay to reproduce bugs of
missing synchronous updates (Refs #16371).

The fix here is even simpler than the (wrong) old code - we just add
the sleep to the existing function apply_to_remote_endpoints() instead
of making the caller even more complex.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Nadav Har'El
8181e28731 secondary index: fix view creation when using tablets
In commit 88a5ddabce, we fixed materialized
view creation to support tablets. We added to the function called to
create materialized views in CQL, prepare_new_view_announcement()
a missing call to the on_before_create_column_family() notifier that
creates tablets for this new view.

Unfortunately, we have the same problem when creating a secondary index,
because it does not use prepare_new_view_announcement(), and instead uses
a generic function to "update" the base table, which in some cases ends
up creating new views when a new index is requested. In this path, the
notifier did not get called, so we must add the call here too.
Unfortunately, the notifiers must run in a Seastar thread, which means
that yet another function now needs to run in a Seastar thread.

Before this patch, creating a secondary index in a table using tablets
fails with "Tablet map not found for table <uuid>". With this patch,
it works.

The patch also includes tests for creating a regular and local secondary
index. Both tests fail (with the aforementioned error) before this
patch, and pass with it.

Fixes #16396

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Raphael S. Carvalho
ee203f846e test: Fix segfault when running offstrategy test
An observer that references table_for_test must, of course, not
outlive table_for_test. The observer can be called later, after the
last input sstable is removed from the sstable manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#16428
2023-12-20 19:04:41 +02:00
David Garcia
9af6c7e40b docs: add myst parser
Closes scylladb/scylladb#16316
2023-12-20 19:04:41 +02:00
Raphael S. Carvalho
d1e6dfadea sstables: Harden estimate_droppable_tombstone_ratio() interface
The interface is fragile because the user may incorrectly use the
wrong "gc before". Given that sstable knows how to properly calculate
"gc before", let's do it in estimate__d__t__r(), leaving no room
for mistakes.

sstable_run's variant was also changed to conform to new interface,
allowing ICS to properly estimate droppable ratio, using GC before
that is calculated using each sstable's range. That's important for
upcoming tablets, as we want to query only the range that belongs
to a particular tablet in the repair history table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#15931
2023-12-20 19:04:41 +02:00
Botond Dénes
758d9cf005 Merge 'build: cmake: map 'release' to 'RelWithDebInfo'' from Kefu Chai
this preserves the existing behavior of `configure.py` in the CMake
generated `build.ninja`.

* configure.py: map 'release' to 'RelWithDebInfo'
* cmake: rename cmake/mode.Release.cmake to cmake/mode.RelWithDebInfo.cmake
* CMakeLists.txt: s/Release/RelWithDebInfo/

Closes scylladb/scylladb#16479

* github.com:scylladb/scylladb:
  build: cmake: map 'release' to 'RelWithDebInfo'
  build: define BuildType for enclosing build_by_default
2023-12-20 19:04:40 +02:00
Pavel Emelyanov
5866d265c3 Merge ' tools/utils: tool_app_template: handle the case of no args ' from Botond Dénes
Currently, `tool_app_template::run_async()` crashes when invoked with empty argv (with just `argv[0]` populated). This can happen if the tool app is invoked without any further args, e.g. just invoking `scylla nodetool`. The crash happens because of the unconditional dereferencing of `argv[1]` to get the current operation.

To fix, add an early-exit for this case, just printing a usage message and exiting with exit code 2.
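
The shape of the early exit can be sketched in Python as follows. This is an analogy to the C++ fix, not its actual code; `run_tool` and the usage text are made up for illustration.

```python
import io
import sys

USAGE = "Usage: tool OPERATION [OPTIONS]"  # illustrative usage text


def run_tool(argv, out=sys.stderr) -> int:
    """Return an exit code; never index argv[1] before checking it exists."""
    if len(argv) < 2:
        # Only the program name was given: print usage and bail out early.
        print(USAGE, file=out)
        return 2
    operation = argv[1]
    print(f"running {operation}", file=out)
    return 0


# Invoking the tool with no operation yields exit code 2 instead of a crash.
buf = io.StringIO()
exit_code = run_tool(["nodetool"], out=buf)
```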

Fixes: #16451

Closes scylladb/scylladb#16456

* github.com:scylladb/scylladb:
  test: add regression tests for invoking tools with no args
  tools/utils: tool_app_template: handle the case of no args
  tools/utils: tool_app_template: remove "scylla-" prefix from app name
2023-12-20 19:04:40 +02:00
Kamil Braun
6fcaec75db Merge 'Add maintenance socket' from Mikołaj Grzebieluch
It enables interaction with the node through CQL protocol without authentication. It gives full-permission access.
The maintenance socket is available by Unix domain socket with file permissions `755`, thus it is not accessible from outside of the node and from other POSIX groups on the node.
It is created before the node joins the cluster.

To set up the maintenance socket, use the `maintenance-socket` option when starting the node.

* If set to `ignore` maintenance socket will not be created.
* If set to `workdir` maintenance socket will be created in `<node's workdir>/cql.m`.
* Otherwise maintenance socket will be created in the specified path.

The default value is `ignore`.
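
The three cases above can be summarized by a small sketch. The function name `resolve_maintenance_socket` is hypothetical and only mirrors the description; it is not how the server parses the option.

```python
import os
from typing import Optional


def resolve_maintenance_socket(option: str, workdir: str) -> Optional[str]:
    """Map the option value to a socket path, or None when no socket is created."""
    if option == "ignore":
        return None  # the maintenance socket is not created
    if option == "workdir":
        return os.path.join(workdir, "cql.m")  # socket lives in the node's workdir
    return option  # any other value is taken as an explicit path
```

For example, with a workdir of `/var/lib/scylla` (an assumed location), the `workdir` setting would resolve to `/var/lib/scylla/cql.m`.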

* With python driver

```python
from cassandra.cluster import Cluster
from cassandra.connection import UnixSocketEndPoint
from cassandra.policies import HostFilterPolicy, RoundRobinPolicy

socket = "<node's workdir>/cql.m"
cluster = Cluster([UnixSocketEndPoint(socket)],
                  # Driver tries to connect to other nodes in the cluster, so we need to filter them out.
                  load_balancing_policy=HostFilterPolicy(RoundRobinPolicy(), lambda h: h.address == socket))
session = cluster.connect()
```

Merge note: apparently cqlsh does not support unix domain sockets; it
will have to be fixed in a follow-up.

Closes scylladb/scylladb#16172

* github.com:scylladb/scylladb:
  test.py: add maintenance socket test
  test.py: enable maintenance socket in tests by default
  docs: add maintenance socket documentation
  main: add maintenance socket
  main: refactor initialization of cql controller and auth service
  auth/service: don't create system_auth keyspace when used by maintenance socket
  cql_controller: maintenance socket: fix indentation
  cql_controller: add option to start maintenance socket
  db/config: add maintenance_socket_enabled bool class
  auth: add maintenance_socket_role_manager
  db/config: add maintenance_socket variable
2023-12-20 19:04:40 +02:00
Botond Dénes
5ef0d16eb3 test/cql-pytest: test_tools.py: add test for failed schema loading 2023-12-20 10:31:03 -05:00
Botond Dénes
3e0058a594 tools/scylla-sstable: use at() instead of operator [] when obtaining data dirs
The configuration is not guaranteed to have any, so use the safe
variant, to simply abort the schema load attempt, instead of crashing
the tool.
2023-12-20 10:31:03 -05:00
Botond Dénes
208d2e890e tools/schema_loader: also check for empty table/column mutations
system_schema.tables and system_schema.columns must have content for
every existing table. To detect a failed load of a table, before
attempting to invoke `db::schema_tables::create_table_from_mutations()`,
we check that the mutations read from these two tables are not
disengaged. There is another failure scenario however: the mutations are
not null, but do not have any clustering rows. This currently results in
a cryptic error message about failing to look up a row in a result-set.
This happens when the looked-up keyspace exists, but the table doesn't.
Add this to the check, so we get a human-readable error message when
this happens.
2023-12-20 10:31:00 -05:00
Botond Dénes
81e5033902 tools/schema_loader: log more details when loading schema from schema tables
Currently, there is no visibility at all into what happens when
attempting to load schema from schema tables. If it fails, we are left
guessing on what went wrong.
Add a logger and add various debug/trace logs to help following the
process and identify what went wrong.
2023-12-20 10:30:21 -05:00
Nadav Har'El
7ee55dd03e cdc, tablets: don't allow enabling CDC with tablets
We do not yet support enabling CDC in a keyspace that uses tablets
(Refs #16317). But the problem is that today, if this is attempted,
we get a nasty failure: the CDC code creates the extra CDC log table,
it doesn't get tablets, and Raft gets surprised and croaks with a
message like:

    Raft instance is stopped, reason: "background error,
    std::_Nested_exception<raft::state_machine_error> (State machine error at
    raft/server.cc:1230): std::runtime_error (Tablet map not found for
    table 48ca1620-9ea5-11ee-bd7c-22730ed96b85)

After Raft croaks, Scylla never recovers until it is rebooted.

In this patch, we replace this disaster with a graceful error - a CREATE
TABLE or ALTER TABLE operation with CDC enabled will fail in a clear way,
allowing Scylla to continue operating normally after this failed request.

This fix is important for allowing us to run tests on Scylla with
tablets, and although CDC tests will fail as expected, they won't
fail the other tests that follow (Refs #16473).

Fixes #16318

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16474
2023-12-20 10:06:34 +01:00
Kamil Braun
ffb6ae917f Merge 'Add support for tablets in Alternator' from Nadav Har'El
This pull request adds support for tablets in Alternator, and particularly focuses on getting Alternator's GSI and LSI (i.e., materialized views) to work.

After this series, support for tablets in Alternator _mostly_ works, but not completely:
1. CDC doesn't yet work with tablets, and Alternator needs to provide CDC (known as "DynamoDB Streams").
2. Alternator's TTL feature was not tested with tablets, and probably doesn't work because it assumes the replication map belongs to a keyspace.

For these reasons, Alternator does not yet use tablets by default, and they need to be enabled explicitly by adding an experimental tag to the new table. This will allow us to test Alternator with tablets even before it is ready for the limelight.

Fixes #16203
Fixes #16313

Closes scylladb/scylladb#16353

* github.com:scylladb/scylladb:
  mv, tablets, alternator: test for Alternator LSI with tablets
  mv: coroutinize wait code for remote view updates
  mv, test: add injection point to delay remote view update
  alternator: explicitly request synchronous updates for LSI
  alternator: fix view creation when using tablets
  alternator: add experimental method to create a table with tablets
2023-12-20 10:00:31 +01:00
Kamil Braun
1f6460972b Merge 'Fix crash on table drop concurrent with streaming ' from Tomasz Grabiec
The observed crash was in the following piece on "cf" access:

        if (*table_is_dropped) {
            sslog.info("[Stream #{}] Skipped streaming the dropped table {}.{}", si->plan_id, si->cf.schema()->ks_name(), si->cf.schema()->cf_name());

Fixes #16181

Also, add a test case which reproduces the problem by doing table drop during tablet migration. But note that the problem is not tablet-specific.

Closes scylladb/scylladb#16341

* github.com:scylladb/scylladb:
  test: tablets: Add test case which tests table drop concurrent with migration
  tests: tablets: Do read barrier in get_tablet_replicas()
  streaming: Keep table by shared ptr to avoid crash on table drop
2023-12-20 09:57:06 +01:00
Kefu Chai
db9e314965 treewide: apply codespell to the comments in source code
for fewer spelling errors in comments.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16408
2023-12-20 10:25:03 +02:00
Kefu Chai
fafe9d9c38 build: cmake: map 'release' to 'RelWithDebInfo'
this preserves the existing behavior of `configure.py` in the CMake
generated `build.ninja`.

* configure.py: map 'release' to 'RelWithDebInfo'
* cmake: rename cmake/mode.Release.cmake to cmake/mode.RelWithDebInfo.cmake
* CMakeLists.txt: s/Release/RelWithDebInfo/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-20 15:07:43 +08:00
Kefu Chai
72dcb2466d build: define BuildType for enclosing build_by_default
in existing `modes` defined in `configure.py`, "release" is mapped to
"RelWithDebInfo". this behavior matches that of seastar's
`configure.py`, where we also map "release" build mode to
"RelWithDebInfo" CMAKE_BUILD_TYPE.

but scylladb's existing cmake settings map "release" to
"Release", even though "Release" is listed as one of the typical
CMAKE_BUILD_TYPE values.

so, in this change, to prepare for the mapping, `BuildType` is
introduced to map a build mode to its related settings. the
building settings are still kept in `cmake.${CMAKE_BUILD_TYPE}.cmake`,
but the other settings, like if a build type should be enabled or
its mappings, are stored in `BuildType` in `configure.py`.
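
A mapping of this kind might look like the sketch below. The field names and the `build_by_default` values are assumptions for illustration; only the "release" to "RelWithDebInfo" mapping and the "Debug" build type name come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildType:
    """Per-mode settings kept outside the CMake mode files (hypothetical fields)."""
    cmake_build_type: str
    build_by_default: bool


# Build mode name -> settings. The build_by_default values are placeholders.
MODES = {
    "debug": BuildType("Debug", build_by_default=True),
    "release": BuildType("RelWithDebInfo", build_by_default=True),
}


def cmake_build_type(mode: str) -> str:
    """Translate a scylla build mode into the matching CMAKE_BUILD_TYPE."""
    return MODES[mode].cmake_build_type
```

Keeping the translation in one table means the mode name used for build directories and the CMake build type can differ without scattering the mapping across the script.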

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-20 15:07:43 +08:00
Nadav Har'El
2e031f2d8e mv, tablets, alternator: test for Alternator LSI with tablets
This patch adds a test (in the topology test framework) for issue #16313 -
the bug where Alternator LSI must use synchronous view updates but didn't.
This test fails with high probability (around 50%) before the previous patch,
which fixed this bug - and passes consistently after the patch (I ran it
100 times and it didn't fail even once).

This is the first test in the topology framework that uses the DynamoDB
API and not CQL. This required a couple of tiny convenience functions,
which are introduced in the only test file that uses them - but if we
want we can later move them out to a library file.

Unfortunately, the standard AWS SDK for Python - boto3 - is *not*
asynchronous, so this test is also not really asynchronous, and will
block the event loop while making requests to Alternator. However,
for now it doesn't matter (we do NOT run multiple tests in the same
event loop), and if it ever matters, I mentioned a couple of options
for what we can do in a comment.

Because this test uses a 10-node cluster, it is skipped in debug-mode
runs. In a later patch we will replace it with a more efficient - and
more reliable - 2-node test.

Refs #16313

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-19 15:41:15 +02:00
Avi Kivity
15acceb69f Merge 'commitlog_test::test_commitlog_reader: handle segment_truncation' from Calle Wilund
Fixes #16312

This test replays a segment before it might be closed or even fully flushed, thus it can (with the new semantics) generate a segment_truncation exception if hitting eof earlier than expected. (Note: test does not use pre-allocated segments).

(The first patch coroutinizes the test to make for a nicer, easier fix.)

Closes scylladb/scylladb#16368

* github.com:scylladb/scylladb:
  commitlog_test::test_commitlog_reader: handle segment_truncation
  commitlog_test: coroutinize test_commitlog_reader
2023-12-19 15:33:38 +02:00
Botond Dénes
6abdced7b9 test: add regression tests for invoking tools with no args
This was recently found to produce a crash. Add a simple regression
test, to make sure future changes don't re-introduce problems with this
rarely used code-path.
2023-12-19 04:08:48 -05:00
Botond Dénes
76492407ab tools/utils: tool_app_template: handle the case of no args
Currently, tool_app_template::run_async() crashes when invoked with
empty argv (with just argv[0] populated). This can happen if the tool
app is invoked without any further args, e.g. just invoking `scylla
nodetool`. The crash happens because of the unconditional dereferencing of
argv[1] to get the current operation.
To fix, add an early-exit for this case, just printing a usage message
and exiting with exit code 2.
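
The early-exit described above can be sketched as follows. This is a Python
illustration only, not the C++ code of tool_app_template: `run_tool`, the
`operations` mapping and the `emit` callback are hypothetical names, and
treating unknown operations the same way is an assumption of the sketch.

```python
USAGE_EXIT_CODE = 2  # the exit code named in the commit message


def run_tool(argv, operations, emit=print):
    """Dispatch argv[1] to a handler; print usage if no operation was given.

    `operations` maps an operation name to a callable that takes the
    remaining arguments and returns an exit code (hypothetical shape).
    """
    if len(argv) < 2:
        # Early exit: never touch argv[1] when only argv[0] is present.
        names = ", ".join(sorted(operations))
        emit(f"Usage: {argv[0]} OPERATION [OPTIONS]; operations: {names}")
        return USAGE_EXIT_CODE
    handler = operations.get(argv[1])
    if handler is None:
        # Assumption of this sketch: unknown operations also print and exit.
        emit(f"Unknown operation: {argv[1]}")
        return USAGE_EXIT_CODE
    return handler(argv[2:])
```

The point of the guard is that the length check happens before any access to
argv[1], so an invocation with no further args ends in a usage message
instead of a crash.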
2023-12-19 04:08:33 -05:00
Botond Dénes
975c11a54b tools/utils: tool_app_template: remove "scylla-" prefix from app name
In other words, have all tools pass their name without the "scylla-"
prefix to `tool_app_template::config::name`. E.g., replace
"scylla-nodetool" with just "nodetool".
Patch all usages to re-add the prefix if needed.

The app name is just more flexible this way, some users might want the
name without the "scylla-" prefix (in the next patch).
2023-12-19 04:04:57 -05:00
Botond Dénes
ce317d50bc bytes.hh: correct spelling of delimiter and delimited
Pointed out by the new spellcheck workflow.

Closes scylladb/scylladb#16450
2023-12-18 20:46:21 +02:00
Mikołaj Grzebieluch
ef10b497e1 test.py: add maintenance socket test
Test that when connecting to the maintenance socket, the user has superuser permissions,
even if the authentication is enabled on the regular port.
2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
e327478bb5 test.py: enable maintenance socket in tests by default 2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
21b3ba4927 docs: add maintenance socket documentation 2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
f96d30c2b5 main: add maintenance socket
Add initialization of maintenance_auth_service and cql_maintenance_server_ctl.

Create maintenance socket which enables interaction with the node through
CQL protocol without authentication. The maintenance port is available
by Unix domain socket. It gives full-permission access.
It is created before the node joins the cluster.
2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
16ab2c28e4 main: refactor initialization of cql controller and auth service
Move initialization of cql controller and auth service to functions.
It will make it easier to create a new cql controller with a separate auth service,
for example for the maintenance socket.

Make it possible to initialize new services before joining group0.
2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
999be1d14b auth/service: don't create system_auth keyspace when used by maintenance socket
The maintenance socket is created before joining the cluster. When maintenance auth service
is started it creates system_auth keyspace if it's missing. It is not synchronized
with other nodes, because this node hasn't joined the group0 yet. Thus a node has
a mismatched schema and is unable to join the cluster.

The maintenance socket doesn't use role management, thus the problem is solved
by not creating system_auth keyspace when maintenance auth service is created.

The logic of the regular CQL port's auth service won't be changed. A new, separate
auth service will be created for the maintenance socket.
2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
2b9a88d17a cql_controller: maintenance socket: fix indentation 2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
ac61d0f695 cql_controller: add option to start maintenance socket
Add an option to listen on the maintenance socket. It is set up on a Unix domain socket
and the metrics are disabled.
This enables having an independent authentication mechanism for this socket.

To start the maintenance socket, a new cql_controller has to be created
with
`db::maintenance_socket_enabled::yes` argument.

Creating maintenance socket will raise an exception if
* the path is longer than 107 chars (due to linux limits),
* a file or a directory already exists in the path.
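
The two failure conditions above can be illustrated with a small Python
sketch. It is not the cql_controller code: `socket_path_problem` and
`check_socket_path` are hypothetical helpers, and measuring the limit in
encoded bytes rather than characters is an assumption of the sketch.

```python
import os

MAX_SOCKET_PATH_LEN = 107  # limit quoted in the commit message


def socket_path_problem(path):
    """Return a description of what is wrong with `path`, or None if usable."""
    # Assumption: the limit applies to the encoded path, not to characters.
    if len(os.fsencode(path)) > MAX_SOCKET_PATH_LEN:
        return "path is longer than 107 bytes"
    if os.path.lexists(path):
        return "a file or a directory already exists in the path"
    return None


def check_socket_path(path):
    """Raise if the maintenance socket cannot be created at `path`."""
    problem = socket_path_problem(path)
    if problem is not None:
        raise ValueError(f"cannot create maintenance socket at {path!r}: {problem}")
```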

The indentation is fixed in the next commit.
2023-12-18 17:58:13 +01:00
Tomasz Grabiec
84ea8b32b2 test: tablets: Restart cluster in a graceful manner to avoid connection drop in the middle of request serving
After restarting each node, we should wait for other nodes to notice
the node is UP before restarting the next server. Otherwise, the next
node we restart may not send the shutdown notification to the
previously restarted node, if it still sees it as down when we
initiate its shutdown. In this case, the node will learn about the
restart from gossip later, possibly when we already started CQL
requests. When a node learns that some node restarted while it
considers it as UP, it will close connections to that node. This will
fail RPC sent to that node, which will cause CQL request to time-out.

Fixes #14746

Closes scylladb/scylladb#16010
2023-12-18 16:22:02 +01:00
Raphael S. Carvalho
63e4d6c965 test: Enable debug compaction logging for sstable_compaction_test
It will make it easier to understand obscure issues like
https://github.com/scylladb/scylladb/issues/13280.

Refs #13280.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#16426
2023-12-18 16:57:46 +03:00
Kefu Chai
db16048761 test/pylib: avoid using asyncio.get_event_loop()
asyncio.get_event_loop() returns the current event loop. but if there
is none, the result of `get_event_loop_policy().get_event_loop()` is
returned. but this behavior is deprecated since Python 3.12, so let's
use asyncio.run() as recommended by
https://docs.python.org/3/library/asyncio-eventloop.html.
asyncio.run() was introduced by Python 3.7, so we should be able to
use it.

this change should silence the warning when running this script
as a stand-alone script with Python 3.12.
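
a minimal sketch of the pattern, assuming a script whose entry point is a
coroutine. `main` and `run_script` are hypothetical names; the script
touched by this change may be structured differently.

```python
import asyncio


async def main() -> int:
    # Placeholder for the script's real asynchronous work.
    await asyncio.sleep(0)
    return 0


def run_script() -> int:
    # Before: asyncio.get_event_loop().run_until_complete(main())
    # After: asyncio.run() creates a fresh event loop, runs main() to
    # completion, and closes the loop afterwards.
    return asyncio.run(main())
```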

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16385
2023-12-18 16:47:31 +03:00
Raphael S. Carvalho
5fa69b8a67 replica: Fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-18 10:23:22 -03:00
Raphael S. Carvalho
8a9784d29c replica: Kill unused calculate_disk_space_used_for()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-18 10:22:19 -03:00
Avi Kivity
cd88f9eb76 Update tools/java submodule (native nodetool)
* tools/java 3963c3abf7...b7ebfd38ef (1):
  > Merge 'Add nodetool interposer script' from Botond Dénes
2023-12-18 14:50:25 +02:00
Mikołaj Grzebieluch
cf43787295 db/config: add maintenance_socket_enabled bool class 2023-12-18 11:42:40 +01:00
Mikołaj Grzebieluch
11a2748d7f auth: add maintenance_socket_role_manager
Add `maintenance_socket_role_manager` which will disable all operations
associated with roles so as not to depend on the system_auth keyspace, which may
not yet be created when the maintenance socket starts listening.
2023-12-18 11:42:40 +01:00
Mikołaj Grzebieluch
e682e362a3 db/config: add maintenance_socket variable
If set to "ignore", maintenance socket will be disabled.
If set to "workdir", maintenance socket will be opened on <scylla's
workdir>/cql.m.
Otherwise it will be opened on path provided by maintenance_socket
variable.

It is set by default to 'ignore'.
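
The three cases above can be sketched in Python as follows. This is only an
illustration of the mapping from the option value to a socket path;
`resolve_maintenance_socket` is a hypothetical helper, not a function in
db/config.

```python
import os
from typing import Optional


def resolve_maintenance_socket(value: str, workdir: str) -> Optional[str]:
    """Map the maintenance_socket option to a socket path, or None if disabled."""
    if value == "ignore":
        return None  # maintenance socket disabled (the default)
    if value == "workdir":
        return os.path.join(workdir, "cql.m")
    return value  # any other value is taken as the path itself
```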
2023-12-18 11:42:05 +01:00
Kamil Braun
3b108f2e31 Merge 'db: config: make consistent_cluster_management mandatory' from Patryk Jędrzejczak
We make `consistent_cluster_management` mandatory in 5.5. This
option will be always unused and assumed to be true.

Additionally, we make `override_decommission` deprecated, as this option
has been supported only with `consistent_cluster_management=false`.

Making `consistent_cluster_management` mandatory also simplifies
the code. Branches that execute only with
`consistent_cluster_management` disabled are removed.

We also update documentation by removing information irrelevant in 5.5.

Fixes scylladb/scylladb#15854

Note about upgrades: this PR does not introduce any more limitations
to the upgrade procedure than there are already. As in
scylladb/scylladb#16254, we can upgrade from the first version of Scylla
that supports the schema commitlog feature, i.e. from 5.1 (or
corresponding Enterprise release) or later. Assuming this PR ends up in
5.5, the documented upgrade support is from 5.4. For corresponding
Enterprise release, it's from 2023.x (based on 5.2), so all requirements
are met.

Closes scylladb/scylladb#16334

* github.com:scylladb/scylladb:
  docs: update after making consistent_cluster_management mandatory
  system_keyspace, main, cql_test_env: fix indendations
  db: config: make consistent_cluster_management mandatory
  test: boost: schema_change_test: replace disable_raft_schema_config
  db: config: make override_decommission deprecated
  db: config: make force_schema_commit_log deprecated
2023-12-18 09:44:52 +01:00
Botond Dénes
a6200e99e6 Merge 'Handle S3 partial read overflows' from Pavel Emelyanov
The test case that validates upload-sink works does this by getting several random ranges from the uploaded object and checks that the content is what it should be. The range boundaries are generated like this:

```
    uint64_t len = random(1, chunk_size);
    uint64_t offset = random(file_size) - len;
```

The 2nd line is not correct: if the random number happens to be less than len, the offset becomes "negative", i.e. a very large 64-bit unsigned value.

Next, this offset:len pair gets into the s3 client's get_object_contiguous() helper, which in turn converts it into the http range header's bytes-specifier format, "first_byte-last_byte". The math here is

```
    first_byte = offset;
    last_byte = offset + len - 1;
```

Here the wrapped-around offset makes last_byte wrap as well, so it becomes less than first_byte. According to the RFC this range-specifier is invalid and (!) can be ignored by the server. This is what minio does -- it ignores the invalid range and returns back the full object.

But that's not all. When returning object portion the http request status code is PartialContent, but when the range is ignored and full object is returned, the status is OK. This makes s3 client's request fail with unexpected_status_error in the middle of the test. Then the object is removed with deferred action and actual error is printed into logs. In the end of the day logs look as if deletion of an object failed with OK status %)
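
The wraparound can be reproduced with a small sketch. It is written in Python and emulates unsigned 64-bit arithmetic with a mask; `buggy_offset`, `safe_offset`, `byte_range` and `is_valid_range` are hypothetical helper names, and the safe variant is only one possible way to keep the range inside the object, not necessarily what the patches do.

```python
U64_MASK = (1 << 64) - 1  # emulate uint64_t wraparound in Python


def byte_range(offset, length):
    """Return (first_byte, last_byte) as computed with uint64_t arithmetic."""
    first_byte = offset & U64_MASK
    last_byte = (offset + length - 1) & U64_MASK
    return first_byte, last_byte


def buggy_offset(rnd, length):
    # Mirrors `offset = random(file_size) - len`: wraps when rnd < length.
    return (rnd - length) & U64_MASK


def safe_offset(rnd, length, file_size):
    # Assumed fix: map rnd into [0, file_size - length] so the range fits.
    if not 0 < length <= file_size:
        raise ValueError("length must be in 1..file_size")
    return rnd % (file_size - length + 1)


def is_valid_range(first_byte, last_byte):
    # A byte range whose last position precedes its first one is invalid.
    return first_byte <= last_byte
```

With a random value of 3 and a length of 10, the buggy offset wraps to 2^64 - 7 and last_byte wraps again to 2, which is the invalid range described above.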

fixes: #16133

Closes scylladb/scylladb#16324

* github.com:scylladb/scylladb:
  test/s3: Avoid object range overflow
  s3/client: Handle GET-with-Range overflows correctly
2023-12-18 10:00:32 +02:00
Avi Kivity
081f30d149 Merge 'Add support to tablet storage splitting' from Raphael "Raph" Carvalho
Support for splitting tablet storage is added.
Until now, tablet storage was composed of a single compaction group, i.e. a group of sstables eligible to be compacted together.

For splitting, tablet storage can now be composed of multiple compaction groups, main, left and right.

Main group stores sstables that require splitting, whereas left and right groups store sstables that were already split according to the tablet's token range.

After table storage is put in splitting mode, new writes will only go to either left or right group, depending on the token.

When all main groups completed splitting their sstables, then coordinator can proceed with tablet metadata changes.
The coordination part is not implemented yet. Only the storage part. The former will come next and will be wired into the latter.

Missing:
- splitting monitor (verify whether coordinator asked for splitting and acts accordingly) (will come next)

Closes scylladb/scylladb#16158

* github.com:scylladb/scylladb:
  replica: Introduce storage group splitting
  replica: Add storage_group::memtable_count()
  replica: Add compaction_group::empty()
  replica: Rename compaction_group_manager to storage_group_manager
  replica: Introduce concept of storage group
  compaction: Add splitting compaction task to manager
  compaction: Prepare rewrite_sstables_compaction_task_executor to be reused for splitting
  compaction: remove scrub-specific code from rewrite_sstables_compaction_task_executor
  replica: Allow uncompacted SSTables to be moved into a new set
  compaction: Add splitting compaction
  flat_mutation_reader: Allow interposer consumers to be stacked
  mutation_writer: Introduce token-group-based mutation segregator
  locator: Introduce tablet_map::get_tablet_id_and_range_side(token)
2023-12-17 21:12:01 +02:00
Nadav Har'El
37b5c03865 mv: coroutinize wait code for remote view updates
In the previous patch we added a delay injection point (for testing)
in the view update code. Because the code was using continuation style,
this resulted in increased indentation and ugly repetition of captures.

So in this patch we coroutinize the code that waits for remote view
updates, making it simpler, shorter, and less indented.

Note that this function still uses continuations in one place:
The remote view update is still composed of two steps that need
to happen one after another, but we don't necessarily need to wait
for them to happen. This is easiest to do with chaining continuations,
and then either waiting or not waiting for the resulting future.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 20:15:08 +02:00
Nadav Har'El
bf6848d277 mv, test: add injection point to delay remove view update
It's difficult to write a test (as we plan to do in the next patch)
that verifies that synchronous view updates are indeed synchronous, i.e.,
that write with CL=QUORUM on the base-table write returns only after
CL=QUORUM was also achieved in the view table. The difficulty is that in a
fast test machine, even if the synchronous-view-update is completely buggy,
it's likely that by the time the test reads from the view, all view updates
will have been completed anyway.

So in this patch we introduce an injection point, for testing, named
"delay_before_remote_view_update", which adds a delay before the base
replica sends its update to the remote view replica (in case the view
replica is indeed remote). As usual, this injection point isn't
configurable - when enabled it adds a fixed (0.5 second) delay, on all
view updates on all tables.

The existing code used continuation-style Seastar programming, and the
addition of the injection point in this patch made it even uglier, so
in the next patch we will coroutine-ize this code.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 20:15:08 +02:00
Nadav Har'El
2c0b472f44 alternator: explicitly request synchronous updates for LSI
DynamoDB's *local* secondary index (LSI) allows strongly-consistent
reads from the materialized view, which must be able to read what was
previously written to the base. To support this, we need the view to
use the "synchronous_updates".

Previously, with vnodes, there was no need for using this option
explicitly, because an LSI has the same partition key as the base table
so the base and view replicas are the same, and the local writes are
done synchronously. But with tablets, this changes - there is no longer
a guarantee that the base and view tablets are located on the same node.
So to restore the strong consistency of LSIs when tablets are enabled,
this patch explicitly adds the "synchronous_updates" option to views
created by Alternator LSIs. We do *not* add this option for GSIs - those
do not support strongly-consistent reads.

This fix was tested by a test that will be introduced in the following
patches. The test showed that before this patch, it was possible that
reading with ConsistentRead=True from an LSI right after the base was
written would miss the new changes, but after this patch, it always
sees the new data in the LSI.

Fixes #16313.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 20:14:59 +02:00
Nadav Har'El
d11f5e9625 alternator: fix view creation when using tablets
In commit 88a5ddabce, we fixed materialized
view creation to support tablets. We added to the function called to
create materialized views in CQL, prepare_new_view_announcement()
a missing call to the on_before_create_column_family() notifier that
creates tablets for this new view.

We have the same problem in Alternator when creating a view (GSI or LSI).
The Alternator code does not use prepare_new_view_announcement(), and
instead uses the lower-level function add_table_or_view_to_schema_mutation()
so it didn't get the call to the notifier, so we must add it here too.

Before this patch, creating an Alternator table with tablets (which has
become possible after the previous patch) fails with "Tablet map not found
for table <uuid>". With this patch, it works.

A test for materialized views in Alternator will come in a following
patch, and will test everything together - the CreateTable tag to use
tablets (from the previous patch), the LSI/GSI creation (fixed in this patch)
and the correct consistency of the LSI (fixed in the next patch).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 19:55:36 +02:00
Nadav Har'El
8e356d8c31 alternator: add experimental method to create a table with tablets
As explained in issue #16203, we cannot yet enable tablets on Alternator
keyspaces by default, because support for some of the features that
Alternator needs, such as CDC, is not yet available.
Nevertheless, to start testing Alternator integration with tablets,
we want to provide a way to enable tablets in Alternator for tests.

In this patch we add support for a tag, 'experimental:initial_tablets',
which if added on a table during creation, uses tablets for its keyspace.
The value of this tag is a numeric string, and it is exactly analogous
to the 'initial_tablets' property we have in CQL's NetworkTopologyStrategy.

We name this tag with the "experimental:" prefix to emphasize that it
is experimental, and the way to enable or disable tablets will probably
change later.

The new tag only has effect when added while *creating* a table.
Adding, deleting or changing it later on an existing table will have
no effect.
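
As an illustration, a CreateTable request carrying this tag might be built
as in the sketch below. The helper name, the table name and the key schema
are made up for the example; only the tag key and its numeric-string value
come from this patch.

```python
def create_table_request(table_name, initial_tablets):
    """Build CreateTable parameters that ask for tablets via the tag."""
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",
        # Hypothetical schema: a single string partition key named "p".
        "KeySchema": [{"AttributeName": "p", "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": "p", "AttributeType": "S"}],
        # The tag value is a numeric string, analogous to 'initial_tablets'.
        "Tags": [
            {"Key": "experimental:initial_tablets", "Value": str(initial_tablets)}
        ],
    }
```

With boto3, such a dictionary could be passed as keyword arguments to
`create_table()`. The tag has to be part of the creation request, since
adding it later has no effect.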

A later patch will have tests that use this tag to test Alternator with
tablets.

Refs #16203.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 19:55:30 +02:00
Kefu Chai
e436856cf7 token_metadata: pass node id when formatting it
before this change, we use the format string of
"Can't replace node {} with itself", but fail to include the host id as seastar::format()'s arguments. this fails the compile-time check of fmt, which is not yet merged. so, if we really run into this problem, {fmt} would throw before the intended runtime_error is raised -- currently, seastar::log formats the logging messages at runtime, this is not intended.

in this change, we pass `existing_node`, so it can be formatted, and the
intended error message can be printed in log.
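
a rough analogy in Python, not the C++ code: `str.format()` with a
placeholder but no argument also fails while formatting, so the intended
error is never constructed. `buggy_error` and `fixed_error` are hypothetical
names used only for this sketch.

```python
MESSAGE = "Can't replace node {} with itself"


def buggy_error():
    # No argument for the placeholder: format() raises IndexError here,
    # before the RuntimeError can be built.
    return RuntimeError(MESSAGE.format())


def fixed_error(existing_node):
    # Passing the node lets the intended message be produced.
    return RuntimeError(MESSAGE.format(existing_node))
```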

Refs 11a4908683
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16342
2023-12-17 19:54:09 +02:00
Evgeniy Naydanov
10eebe3c66 test: use different IP addresses for listen and RPC addresses
Scylla can be configured to use different IPs for the internode communication
and client connections.  This test allocates and configures unique IP addresses
for the client connections (`rpc_address`) for a 2-node cluster.

Two scenarios tested:
  1) Change RPC IPs sequentially
  2) Change RPC IPs simultaneously

Closes scylladb/scylladb#15965
2023-12-17 18:00:09 +02:00
Raphael S. Carvalho
546b31846a replica: Introduce storage group splitting
This introduces the ability to split a storage group.
The main compaction group is split into left and right groups.

set_split() is used to set the storage group to splitting mode, which
will create left and right compaction groups. Incoming writes will
now be placed into memtable of either left or right groups.

split() is used to complete the splitting of a group. It only
returns when all preexisting data is split. That means main
compaction group will be empty and all the data will be stored
in either left or right group.
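
The two steps can be modelled with a toy sketch in Python. It is only an
illustration under simplifying assumptions: groups are plain lists instead
of compaction groups with memtables and sstables, the split point is a
single token passed to set_split(), and placing the split token itself on
the left side is an arbitrary choice of the sketch.

```python
class StorageGroup:
    """Toy model of a storage group with main, left and right groups."""

    def __init__(self):
        self.main = []      # data written before splitting mode was set
        self.left = None    # created by set_split()
        self.right = None
        self.split_token = None

    def splitting(self):
        return self.left is not None

    def set_split(self, split_token):
        # Enter splitting mode: create the left and right groups.
        if not self.splitting():
            self.left, self.right = [], []
            self.split_token = split_token

    def _side(self, token):
        return self.left if token <= self.split_token else self.right

    def write(self, token, value):
        # In splitting mode new writes go only to the left or right group.
        target = self._side(token) if self.splitting() else self.main
        target.append((token, value))

    def split(self):
        # Complete the split: move all preexisting data out of main.
        if not self.splitting():
            raise RuntimeError("set_split() must be called first")
        for token, value in self.main:
            self._side(token).append((token, value))
        self.main.clear()
```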

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 12:02:01 -03:00
Raphael S. Carvalho
3c5b00ea04 replica: Add storage_group::memtable_count()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
e5a9299696 replica: Add compaction_group::empty()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
213b2f1382 replica: Rename compaction_group_manager to storage_group_manager
That's to reflect the fact that the manager now works with
storage groups instead.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
15de1cdcbc replica: Introduce concept of storage group
Storage group is the storage of tablets. This new concept is helpful
for tablet splitting, where the storage of tablet will be split
in multiple compaction groups, where each can be compacted
independently.

The reason for not going with arena concept is that it added
complexity, and it felt much more elegant to keep compaction
group unchanged which at the end of the day abstracts the concept
of a set of sstables that can be compacted and operated
independently.

When splitting, the storage group for a tablet may therefore own
multiple compaction groups, left, right, and main, where main
keeps the data that needs splitting. When splitting completes,
only left and right compaction groups will be populated.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
dd1a6d6309 compaction: Add splitting compaction task to manager
The task for splitting compaction will run until all sstables
in the main set are split. The only exceptions are shutdown
or user has explicitly asked for abort.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
f87161e556 compaction: Prepare rewrite_sstables_compaction_task_executor to be reused for splitting
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
c96938c49b compaction: remove scrub-specific code from rewrite_sstables_compaction_task_executor
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
55bcfba4de replica: Allow uncompacted SSTables to be moved into a new set
With off-strategy, we allow sstables to be moved into a new sstable
set even if they didn't undergo reshape compaction.
That's done by specifying a sstable is present both in input and
output, with the completion desc.

We want to do the same with other compaction types.
Think for example of split compaction: compaction manager may decide
a sstable doesn't need splitting, yet it wants that sstable to be
moved into a new sstable set.

Theoretically, we could introduce new code to do this movement,
but more code means increased maintenance burden and higher chances
of bugs. It makes sense to reuse the compaction completion path,
as we do today with off-strategy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
b1c5d5dd4e compaction: Add splitting compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:08 -03:00
Raphael S. Carvalho
3dcb800a96 flat_mutation_reader: Allow interposer consumers to be stacked
reader_consumer_v2 being a noncopyable_function imposes a restriction
when stacking one interposer consumer on top of another.

Think for example of a token-based segregator on top of a timestamp
based one.

To achieve that, the interposer consumer creator must be reentrant,
such that the consumer can be created on each "channel", but today
the creator becomes unusable after first usage.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:26:32 -03:00
Raphael S. Carvalho
c8668b90e3 mutation_writer: Introduce token-group-based mutation segregator
Token group is an abstraction that allows us to easily segregate a
mutation stream into buckets. Groups share the same properties as
compaction groups. Groups follow the ring order and they don't
overlap each other. Groups are defined according to a classifier,
which returns an id given a token. It's expected that the classifier
returns ids in monotonically increasing order.

The reasons for this abstraction are:
1) we don't want to make segregator aware of compaction groups
2) splitting happens before tablet metadata is changed, so the
segregator will have to classify based on whether the token
belongs to left (group id 0) or right (group id 1) side of
the range to be split.

The reason for not extending sstable writer instead, is that
today, writer consumer can only tell producer to switch to a
new writer, when consuming the end of a partition, but that
would be too late for us, as we have to decide to move to
a new writer at partition start instead.

It will be wired into compaction when it happens in split mode.
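
The idea can be sketched in Python as follows. This is an illustration, not
the mutation_writer code: the stream is a list of (token, payload) pairs in
ring order, `segregate_by_token_group` and `make_split_classifier` are
hypothetical names, and putting the split token itself in the left group is
an assumption of the sketch.

```python
def make_split_classifier(split_token):
    """Classifier for a two-way split: id 0 is the left side, id 1 the right."""
    return lambda token: 0 if token <= split_token else 1


def segregate_by_token_group(stream, classify):
    """Bucket (token, payload) pairs by the group id of their token.

    The group is chosen when an item starts, and ids must not decrease,
    matching the expectation that the classifier is monotonic.
    """
    buckets = {}
    last_id = None
    for token, payload in stream:
        group_id = classify(token)
        if last_id is not None and group_id < last_id:
            raise ValueError("classifier ids must be monotonically increasing")
        buckets.setdefault(group_id, []).append((token, payload))
        last_id = group_id
    return buckets
```

Because the classifier only sees tokens, the segregator stays unaware of
compaction groups, which is the first reason given above.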

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:26:32 -03:00
Raphael S. Carvalho
bcbba9a5e3 locator: Introduce tablet_map::get_tablet_id_and_range_side(token)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:26:32 -03:00
Kefu Chai
c36945dea2 tasks: include used headers
when compiling with Clang-18 + libstdc++-13, the tree fails to build:
```
/home/kefu/dev/scylladb/tasks/task_manager.hh:45:36: error: no template named 'list' in namespace 'std'
   45 |     using foreign_task_list = std::list<foreign_task_ptr>;
      |                               ~~~~~^
```
so let's include the used header

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16433
2023-12-17 15:28:02 +02:00
Kefu Chai
81d5c4e661 db/system_keyspace: explicitly instantiate used template
future<std::optional<utils::UUID>>
system_keyspace::get_scylla_local_param_as<utils::UUID>(const sstring&)
is used by db/schema_tables.cc. so let's instantiate this template
explicitly.
otherwise we'd have following link failure:

```
: && /home/kefu/.local/bin/clang++ -ffunction-sections -fdata-sections -O3 -g -gz -Xlinker --build-id=sha1 -fuse-ld=lld -dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2 -Xlinker --gc-sections CMakeFiles/scylla_version.dir/Release/release.cc.o CMakeFiles/scylla.dir/Release/main.cc.o -o Release/scylla  Release/libscylla-main.a  api/Release/libapi.a  alternator/Release/libalternator.a  db/Release/libdb.a  cdc/Release/libcdc.a  compaction/Release/libcompaction.a  cql3/Release/libcql3.a  data_dictionary/Release/libdata_dictionary.a  gms/Release/libgms.a  index/Release/libindex.a  lang/Release/liblang.a  message/Release/libmessage.a  mutation/Release/libmutation.a  mutation_writer/Release/libmutation_writer.a  raft/Release/libraft.a  readers/Release/libreaders.a  redis/Release/libredis.a  repair/Release/librepair.a  replica/Release/libreplica.a  schema/Release/libschema.a  service/Release/libservice.a  sstables/Release/libsstables.a  streaming/Release/libstreaming.a  test/perf/Release/libtest-perf.a  thrift/Release/libthrift.a  tools/Release/libtools.a  transport/Release/libtransport.a  types/Release/libtypes.a  utils/Release/libutils.a  seastar/Release/libseastar.a  /usr/lib64/libboost_program_options.so.1.81.0  test/lib/Release/libtest-lib.a  Release/libscylla-main.a  -Xlinker --push-state -Xlinker --whole-archive  auth/Release/libscylla_auth.a  -Xlinker --pop-state  /usr/lib64/libcrypt.so  cdc/Release/libcdc.a  compaction/Release/libcompaction.a  mutation_writer/Release/libmutation_writer.a  
-Xlinker --push-state -Xlinker --whole-archive  dht/Release/libscylla_dht.a  -Xlinker --pop-state  index/Release/libindex.a  -Xlinker --push-state -Xlinker --whole-archive  locator/Release/libscylla_locator.a  -Xlinker --pop-state  message/Release/libmessage.a  gms/Release/libgms.a  sstables/Release/libsstables.a  readers/Release/libreaders.a  schema/Release/libschema.a  -Xlinker --push-state -Xlinker --whole-archive  tracing/Release/libscylla_tracing.a  -Xlinker --pop-state  service/Release/libservice.a  node_ops/Release/libnode_ops.a  service/Release/libservice.a  node_ops/Release/libnode_ops.a  raft/Release/libraft.a  repair/Release/librepair.a  streaming/Release/libstreaming.a  replica/Release/libreplica.a  /usr/lib64/libabsl_raw_hash_set.so.2308.0.0  /usr/lib64/libabsl_hash.so.2308.0.0  /usr/lib64/libabsl_city.so.2308.0.0  /usr/lib64/libabsl_bad_variant_access.so.2308.0.0  /usr/lib64/libabsl_low_level_hash.so.2308.0.0  /usr/lib64/libabsl_bad_optional_access.so.2308.0.0  /usr/lib64/libabsl_hashtablez_sampler.so.2308.0.0  /usr/lib64/libabsl_exponential_biased.so.2308.0.0  /usr/lib64/libabsl_synchronization.so.2308.0.0  /usr/lib64/libabsl_graphcycles_internal.so.2308.0.0  /usr/lib64/libabsl_kernel_timeout_internal.so.2308.0.0  /usr/lib64/libabsl_stacktrace.so.2308.0.0  /usr/lib64/libabsl_symbolize.so.2308.0.0  /usr/lib64/libabsl_malloc_internal.so.2308.0.0  /usr/lib64/libabsl_debugging_internal.so.2308.0.0  /usr/lib64/libabsl_demangle_internal.so.2308.0.0  /usr/lib64/libabsl_time.so.2308.0.0  /usr/lib64/libabsl_strings.so.2308.0.0  /usr/lib64/libabsl_int128.so.2308.0.0  /usr/lib64/libabsl_strings_internal.so.2308.0.0  /usr/lib64/libabsl_string_view.so.2308.0.0  /usr/lib64/libabsl_throw_delegate.so.2308.0.0  /usr/lib64/libabsl_base.so.2308.0.0  /usr/lib64/libabsl_spinlock_wait.so.2308.0.0  /usr/lib64/libabsl_civil_time.so.2308.0.0  /usr/lib64/libabsl_time_zone.so.2308.0.0  /usr/lib64/libabsl_raw_logging_internal.so.2308.0.0  
/usr/lib64/libabsl_log_severity.so.2308.0.0  -lsystemd  /usr/lib64/libz.so  /usr/lib64/libdeflate.so  types/Release/libtypes.a  utils/Release/libutils.a  /usr/lib64/libcryptopp.so  /usr/lib64/libboost_regex.so.1.81.0  /usr/lib64/libicui18n.so  /usr/lib64/libicuuc.so  /usr/lib64/libboost_unit_test_framework.so.1.81.0  seastar/Release/libseastar_perf_testing.a  /usr/lib64/libjsoncpp.so.1.9.5  interface/Release/libinterface.a  /usr/lib64/libthrift.so  db/Release/libdb.a  data_dictionary/Release/libdata_dictionary.a  cql3/Release/libcql3.a  transport/Release/libtransport.a  cql3/Release/libcql3.a  transport/Release/libtransport.a  lang/Release/liblang.a  /usr/lib64/liblua-5.4.so  -lm  rust/Release/libwasmtime_bindings.a  rust/librust_combined.a  /usr/lib64/libsnappy.so.1.1.10  mutation/Release/libmutation.a  seastar/Release/libseastar.a  /usr/lib64/libboost_program_options.so  /usr/lib64/libboost_thread.so  /usr/lib64/libboost_chrono.so  /usr/lib64/libboost_atomic.so  /usr/lib64/libcares.so  /usr/lib64/libcryptopp.so  /usr/lib64/libfmt.so.10.0.0  /usr/lib64/liblz4.so  -ldl  /usr/lib64/libgnutls.so  -latomic  /usr/lib64/libsctp.so  /usr/lib64/libyaml-cpp.so  /usr/lib64/libhwloc.so  //usr/lib64/liburing.so  /usr/lib64/libnuma.so  /usr/lib64/libxxhash.so && :
ld.lld: error: undefined symbol: seastar::future<std::optional<utils::UUID>> db::system_keyspace::get_scylla_local_param_as<utils::UUID>(seastar::basic_sstring<char, unsigned int, 15u, true> const&)
>>> referenced by schema_tables.cc:981 (./build/./db/schema_tables.cc:981)
>>>               schema_tables.cc.o:(db::schema_tables::merge_schema(seastar::sharded<db::system_keyspace>&, seastar::sharded<service::storage_proxy>&, gms::feature_service&, std::vector<mutation, std::allocator<mutation>>, bool)::$_1::operator()()) in archive db/Release/libdb.a
>>> referenced by schema_tables.cc:981 (./build/./db/schema_tables.cc:981)
>>>               schema_tables.cc.o:(db::schema_tables::recalculate_schema_version(seastar::sharded<db::system_keyspace>&, seastar::sharded<service::storage_proxy>&, gms::feature_service&)::$_0::operator()() const) in archive db/Release/libdb.a
>>> referenced by schema_tables.cc:981 (./build/./db/schema_tables.cc:981)
>>>               schema_tables.cc.o:(db::schema_tables::merge_schema(seastar::sharded<db::system_keyspace>&, seastar::sharded<service::storage_proxy>&, gms::feature_service&, std::vector<mutation, std::allocator<mutation>>, bool)::$_1::operator()() (.resume)) in archive db/Release/libdb.a
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```

it seems that, without the explicit instantiation, clang-18
just inlines the body of the instantiated template function at the
caller site.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16434
2023-12-17 15:12:05 +02:00
Wojciech Mitros
629ea63922 rust: update dependencies
The currently used versions of "time" and "rustix" dependencies
had minor security vulnerabilities.
In this patch:
- the "rustix" crate is updated
- the "chrono" crate that we depend on was not compatible
with the version of the "time" crate that had fixes, so
we updated the "chrono" crate, which actually removed the
dependency on "time" completely.
Both updates were performed using "cargo update" on the
relevant package and the corresponding version.

Fixes #15772

Closes scylladb/scylladb#16378
2023-12-17 13:20:25 +02:00
Kefu Chai
10a11c2886 token_metadata: pass node id when formatting it
before this change, we use the format string of
"Can't replace node {} with itself", but fail to pass the host id as an argument to seastar::format(). this fails the compile-time check of fmt, which is not yet merged. so, if we really run into this problem, {fmt} would throw before the intended runtime_error is raised -- currently, seastar::log formats the logging messages at runtime. this is not intended.

in this change, we pass `existing_node`, so it can be formatted, and the
intended error message can be printed in log.
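
The failure mode described above has a close analogue in Python's `str.format`, shown here as a minimal sketch rather than the seastar code itself: a `{}` placeholder with no matching argument makes the formatting call raise, so the intended error is never constructed. The helper names and the sample host id are hypothetical.

```python
def replace_error_buggy() -> RuntimeError:
    # The placeholder has no argument, so str.format raises IndexError
    # before the RuntimeError can even be constructed.
    return RuntimeError("Can't replace node {} with itself".format())


def replace_error_fixed(existing_node: str) -> RuntimeError:
    # Passing the node id lets the message be formatted as intended.
    return RuntimeError("Can't replace node {} with itself".format(existing_node))


try:
    replace_error_buggy()
except IndexError as exc:
    print(f"formatting failed first: {exc!r}")

print(replace_error_fixed("host-1234"))
```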

Refs 11a4908683
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16422
2023-12-15 16:43:44 +01:00
Kefu Chai
273ee36bee tools/scylla-sstable: add scylla sstable shard-of command
when migrating to the uuid-based identifiers, the mapping from the
integer-based generation to the shard-id is preserved. we used to have
"gen % smp_count" for calculating the shard which is responsible for hosting
a given sstable. although this is not a documented behavior, it is
handy when we try to correlate an sstable to a shard, typically when
looking at a performance issue.

in this change, a new subcommand is added to expose the connection
between the sstable and its "owner" shards.
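
The legacy mapping mentioned above can be sketched in a few lines. This only illustrates the undocumented `gen % smp_count` rule for integer generations; it is not the implementation of the new subcommand, and the function name is made up for the example.

```python
def legacy_shard_of(generation: int, smp_count: int) -> int:
    """Return the shard that the legacy rule assigns to an integer generation."""
    if smp_count <= 0:
        raise ValueError("smp_count must be positive")
    return generation % smp_count


# With 8 shards, generations wrap around the shard ids.
for gen in (1, 8, 13):
    print(gen, "->", legacy_shard_of(gen, smp_count=8))
```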

Fixes #16343
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16345
2023-12-15 11:36:45 +02:00
Kefu Chai
fa3efe6166 .git: use ssh/key or token for auth
enable the checkout action to authenticate when it needs to
clone a non-public repo.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16421
2023-12-15 11:34:50 +02:00
Kamil Braun
6a4106edf3 migration_manager: don't attach empty system.scylla_local mutation in migration request handler
In effb9fb3cb, the migration request handler
(called when a node requests schema pull) was extended with a
`system.scylla_local` mutation:
```
        cm.emplace_back(co_await self._sys_ks.local().get_group0_schema_version());
```

This mutation is empty if the GROUP0_SCHEMA_VERSIONING feature is
disabled.

Nevertheless, it turned out to cause problems during upgrades.
The following scenario shows the problem:

We upgrade from 5.2 to enterprise version with the aforementioned patch.
In 5.2, `system.scylla_local` does not use schema commitlog.
After the first node upgrades to the enterprise version, it immediately
on boot creates a new enterprise-only table
(`system_replicated_keys.encrypted_keys`) -- the specific table is not
important, only the fact that a schema change is performed.
This happens before the restarting node notices other nodes being UP, so
the schema change is not immediately pushed to the other nodes.
Instead, soon after boot, the other non-upgraded nodes pull the schema
from the upgraded node.
The upgraded node attaches a `system.scylla_local` mutation to the
vector of returned mutations.
The non-upgraded nodes try to apply this vector of mutations. Because
some of these mutations are for tables that already use schema
commitlog, while the `system.scylla_local` table does not use schema
commitlog, this triggers the following error (even though the mutation
is empty):
```
    Cannot apply atomically across commitlog domains: system.scylla_local, system_schema.keyspaces
```

Fortunately, the fix is simple -- instead of attaching an empty
mutation, do not attach a mutation at all if the handler of migration
request notices that group0_schema_version is not present.

Note that group0_schema_version is only present if the
GROUP0_SCHEMA_VERSIONING feature is enabled, which happens only after
the whole upgrade finishes.
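
The scenario and the fix can be modelled with a small sketch. Everything here is a simplification made for illustration: mutations are represented only by their table names, and the domain table, `apply_atomically`, and `migration_response` are hypothetical names, not the actual handler code.

```python
from typing import Optional

# Hypothetical model: each table belongs to one commitlog domain.
COMMITLOG_DOMAIN = {
    "system_schema.keyspaces": "schema",
    "system_schema.tables": "schema",
    "system.scylla_local": "regular",  # as seen by a non-upgraded 5.2 node
}


def apply_atomically(tables: list[str]) -> None:
    """Reject a batch whose tables span more than one commitlog domain."""
    domains = {COMMITLOG_DOMAIN[t] for t in tables}
    if len(domains) > 1:
        raise RuntimeError(
            "Cannot apply atomically across commitlog domains: " + ", ".join(tables)
        )


def migration_response(schema_tables: list[str],
                       group0_schema_version: Optional[str]) -> list[str]:
    """List the tables that get a mutation in the schema pull response."""
    response = list(schema_tables)
    # The fix: attach system.scylla_local only when the version is present.
    if group0_schema_version is not None:
        response.append("system.scylla_local")
    return response


batch = migration_response(["system_schema.keyspaces"], group0_schema_version=None)
apply_atomically(batch)  # succeeds: only schema tables are present
print(batch)
```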

Refs: scylladb/scylladb#16414

Not using "Fixes" because the issue will only be fixed once this PR is
merged to `master` and the commit is cherry-picked onto next-enterprise.

Closes scylladb/scylladb#16416
2023-12-14 22:58:13 +01:00
Avi Kivity
2b8392b8b8 Merge 'database, reader_concurrency_semaphore: deduplicate reader_concurrency_semaphore metrics ' from Botond Dénes
Reduce code duplication by defining each metric just once, instead of three times, by having the semaphore register metrics by itself. This also makes the lifecycle of metrics contained in that of the semaphore. This is important on enterprise where semaphores are added and removed, together with service levels.
We don't want all semaphores to export metrics, so a new parameter is introduced and all call-sites make a call whether they opt-in or not.

Fixes: https://github.com/scylladb/scylladb/issues/16402

Closes scylladb/scylladb#16383

* github.com:scylladb/scylladb:
  database, reader_concurrency_sempaphore: deduplicate reader_concurrency_sempaphore metrics
  reader_concurrency_semaphore: add register_metrics constructor parameter
  sstables: name sstables_manager
2023-12-14 18:26:24 +02:00
Patryk Jędrzejczak
f23f8628b7 docs: update after making consistent_cluster_management mandatory
We remove Raft documentation irrelevant in 5.5.

One of the changes is removing a part of the "Enabling Raft" section
in raft.rst. Since Raft is mandatory in 5.5, the only way to enable
it in this version is by performing a rolling upgrade from 5.4. We
only need to have this case well-documented. In particular, we
remove information that also appears in the upgrade guides like
verifying schema synchronization.

Similarly, we remove a sentence from the "Manual Recovery Procedure"
section in handling-node-failures.rst because it mentions enabling
Raft manually, which is impossible in 5.5.

The rest of the changes are just removing information about
checking or setting consistent_cluster_management, which has become
unused.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
dced4bb924 system_keyspace, main, cql_test_env: fix indendations
Broken in the previous patch.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
5ebfbf42bc db: config: make consistent_cluster_management mandatory
Code that executed only when consistent_cluster_management=false is
removed. In particular, after this patch:
- raft_group0 and raft_group_registry are always enabled,
- raft_group0::status_for_monitoring::disabled becomes unused,
- topology tests can only run with consistent_cluster_management.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
7dd7ec8996 test: boost: schema_change_test: replace disable_raft_schema_config
In the following commits, we make consistent cluster management
mandatory. This will make disable_raft_schema_config unusable,
so we need to get rid of it. However, we don't want to remove
tests that use it.

The idea is to use the Raft RECOVERY mode instead of disabling
consistent cluster management directly.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
a54f9052fc db: config: make override_decommission deprecated
The override_decommission option is supported only when
consistent_cluster_management is disabled. In the following commit,
we make consistent_cluster_management mandatory, which makes
override_decommission unusable.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
571db3c983 db: config: make force_schema_commit_log deprecated
In scylladb/scylladb#16254, we made force_schema_commit_log unused.
After this change, if someone passes this option as the command line
argument, the boot fails. This behavior is undesired. We only want
this option to be ignored. We can achieve this effect by making it
deprecated.
2023-12-14 16:53:46 +01:00
Paweł Zakrzewski
5af066578a doc: Offer replication_factor=3 as the default in the examples
The goal is to make the available defaults safe for future use, as they
are often taken from existing config files or documentation verbatim.

Referenced issue: #14290

Closes scylladb/scylladb#15947
2023-12-14 16:14:01 +01:00
Piotr Dulikowski
c0cf3e398a raft_rpc: use compat source location instead of std one
The std::source_location is broken on some versions of clang. In order
to be able to use its functionality in code, seastar defines
seastar::compat::source_location, which is a typedef over
std::source_location if the latter works, or a custom, dummy
implementation if the std type doesn't work. Therefore, sometimes
seastar::compat::source_location == std::source_location, but not
always.

In service/raft/raft_rpc.cc, both std source location and compat source
location are used, and std source location is sometimes passed as an
argument to compat source location, breaking builds on older toolchains.
Fix this by switching the code there to only use compat source location.

Fixes: scylladb/scylladb#16336

Closes scylladb/scylladb#16337
2023-12-14 16:14:01 +01:00
Kefu Chai
764d1e01da locator: include used headers
* exceptions/exceptions.hh is not used
* std::set is not used, while std::unordered_set is used

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16406
2023-12-14 16:14:01 +01:00
Kefu Chai
37868e5fdc tools: fix spelling errors in user-facing messages
they are identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16409
2023-12-13 21:39:46 +02:00
Kefu Chai
caa0230e5d test/cql-pytest: use raw string when appropriate
we use "\w" to represent a character class in Python. see
https://docs.python.org/3/library/re.html. but "\" should be
escaped as well. CPython accepts "\w" after failing to find
an escape sequence for it, and leaves "\w" as it is,
but it complains.

in this change, we use raw string to avoid escaping "\" in
the regular expression.
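
As a small illustration (the pattern and the input string are invented for this example and are not taken from the test suite), a raw string spells the same regular expression without doubling the backslash:

```python
import re

# r"\w+" and "\\w+" are the same pattern; the raw form avoids doubling "\".
assert r"\w+" == "\\w+"

pattern = re.compile(r"\w+@\w+")
match = pattern.search("user: alice@example")
print(match.group(0))
```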

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16405
2023-12-13 21:14:32 +02:00
Israel Fruchter
514ef48d75 docker: put cqlsh configuration in correct place
we have always put the cqlsh configuration into `~/.cqlshrc`.
according to a commit from 8 years ago [1], this path is deprecated.

it was not until commit [2] that this path was actually removed from the cqlsh code.

as part of moving to scylla-cqlsh, we got [2], and didn't
notice until the first release with it.

this change writes the configuration into `~/.cassandra/cqlshrc`,
as this is the default place cqlsh looks for it.

[1]: 13ea8a6669/bin/cqlsh.py (L264)
[2]: 2024ea4796
Fixes: scylladb/scylladb#16329

Closes scylladb/scylladb#16340
2023-12-13 18:40:52 +02:00
Kamil Braun
26cbd28883 Merge 'token_metadata: switch to host_id' from Petr Gusev
In this PR we refactor `token_metadata` to use `locator::host_id` instead of `gms::inet_address` for node identification in its internal data structures. Main motivation for these changes is to make raft state machine deterministic. The use of IPs is a problem since they are distributed through gossiper and can't be used reliably. One specific scenario is outlined [in this comment](https://github.com/scylladb/scylladb/pull/13655#issuecomment-1521389804) - `storage_service::topology_state_load` can't resolve host_id to IP when we are applying old raft log entries, containing host_id-s of the long-gone nodes.

The refactoring is structured as follows:
  * Turn `token_metadata` into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a `get_new()` method, which can be used to access the new version.
  * Go over all places which write to the old version and make the corresponding writes to the new version through `get_new()`. When this stage is finished we can use any version of the `token_metadata` for reading.
  * Go over all the places which read `token_metadata` and switch them to the new version.
  * Make `host_id`-based `token_metadata` default, drop `inet_address`-based version, change `token_metadata` back to non-template.
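
The staged migration above can be pictured with a toy model. This is only a sketch under simplifying assumptions: the classes are hypothetical stand-ins that map a node key to a set of tokens, and only `get_new()` is a name taken from the description.

```python
class TokenMetadata:
    """Toy stand-in: maps a node key (IP or host id) to its tokens."""

    def __init__(self) -> None:
        self.tokens: dict[str, set[int]] = {}

    def update_normal_tokens(self, node: str, tokens: set[int]) -> None:
        self.tokens[node] = set(tokens)


class LegacyTokenMetadata(TokenMetadata):
    """IP-keyed version that also carries the host_id-keyed one."""

    def __init__(self) -> None:
        super().__init__()
        self._new = TokenMetadata()

    def get_new(self) -> TokenMetadata:
        return self._new


# Second stage: every writer updates both versions.
tm = LegacyTokenMetadata()
tm.update_normal_tokens("10.0.0.1", {1, 2})
tm.get_new().update_normal_tokens("host-a", {1, 2})

# Third stage: readers can switch to the host_id-keyed version.
print(tm.get_new().tokens)
```

Once all readers use the new version, the IP-keyed wrapper has no remaining users and can be dropped, which corresponds to the last step of the list.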

This series [depends](1745a1551a) on RPC sender `host_id` being present in RPC `client_info` for `bootstrap` and `replace` node_ops commands. This feature was added in [this commit](95c726a8df) and released in `5.4`. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially first to `5.4` (or the corresponding Enterprise version) then to the version with these changes (`5.5` or `6.0`) should be fine. If for some reason they upgrade from a version without `host_id` in RPC `client_info` to the version with these changes and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with an error `Coordinator host_id not found` if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can finish the upgrade first to version 5.4 or later, or start bootstrap/replace with an upgraded node. Note that removenode and decommission do not depend on coordinator host_id so they can be started in the middle of upgrade from any node.

Closes scylladb/scylladb#15903

* github.com:scylladb/scylladb:
  topology: remove_endpoint: remove inet_address overload
  token_metadata: topology: cleanup add_or_update_endpoint
  token_metadata: add_replacing_endpoint: forbid replacing node with itself
  topology: drop key_kind, host_id is now the primary key
  dc_rack_fn: make it non-template
  token_metadata: drop the template
  shared_token_metadata: switch to the new token_metadata
  gossiper: use new token_metadata
  database: get_token_metadata -> new token_metadata
  erm: switch to the new token_metadata
  storage_service: get_token_metadata -> token_metadata2
  storage_service: get_token_to_endpoint_map: use new token_metadata
  api/token_metadata: switch to new version
  storage_service::on_change: switch to new token_metadata
  cdc: switch to token_metadata2
  calculate_natural_endpoints: fix indentation
  calculate_natural_endpoints: switch to token_metadata2
  storage_service: get_changed_ranges_for_leaving: use new token_metadata
  decommission_with_repair, removenode_with_repair -> new token_metadata
  rebuild_with_repair, replace_with_repair: use new token_metadata
  bootstrap: use new token_metadata
  tablets: switch to token_metadata2
  calculate_effective_replication_map: use new token_metadata
  calculate_natural_endpoints: fix formatting
  abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
  network_topology_strategy_test: update new token_metadata
  storage_service: on_alive: update new token_metadata
  storage_service: handle_state_bootstrap: update new token_metadata
  storage_service: snitch_reconfigured: update new token_metadata
  storage_service: leave_ring: update new token_metadata
  storage_service: node_ops_cmd_handler: update new token_metadata
  storage_service: node_ops_cmd_handler: add coordinator_host_id
  storage_service: bootstrap: update new token_metadata
  storage_service: join_token_ring: update new token_metadata
  storage_service: excise: update new token_metadata
  storage_service: join_cluster: update new token_metadata
  storage_service: on_remove: update new token_metadata
  storage_service: handle_state_normal: fill new token_metadata
  storage_service: topology_state_load: fill new token_metadata
  storage_service: adjust update_topology_change_info to update new token_metadata
  topology: set self host_id on the new topology
  locator::topology: allow being_replaced and replacing nodes to have the same IP
  token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
  token_metadata: get_host_id: exception -> on_internal_error
  token_metadata: add get_all_ips method
  token_metadata: support host_id-based version
  token_metadata: make it a template with NodeId=inet_address/host_id NodeId is used in all internal token_metadata data structures, that previously used inet_address. We choose topology::key_kind based on the value of the template parameter.
  locator: make dc_rack_fn a template
  locator/topology: add key_kind parameter
  token_metadata: topology_change_info: change field types to token_metadata_ptr
  token_metadata: drop unused method get_endpoint_to_token_map_for_reading
2023-12-13 16:35:52 +01:00
Avi Kivity
7fce057cda database, reader_concurrency_sempaphore: deduplicate reader_concurrency_sempaphore metrics
reader_concurrency_semaphore metrics are triplicated: each metric is registered
for streaming, user, and system classes.

To fix, just move the metrics registration from database to
reader_concurrency_semaphore, so each reader_concurrency_semaphore
instantiated will register its metrics (if its creator asked for it).

Adjust the names given to reader_concurrency_semaphore so we don't
change the labels.

scylla-gdb is adjusted to support the new names.
2023-12-13 09:16:18 -05:00
Nadav Har'El
89d311ec23 tablet, mv: fix doc on implicit synchronous update
The document docs/cql/cql-extensions.md documents Scylla's extension
of *synchronous* view updates, and mentioned a few cases where view
updates are synchronous even if synchronous updates are not requested
explicitly. But with tablets, these statements and examples are no
longer correct - with tablets, base and view tablets may find
themselves migrated to entirely different nodes. So in this patch
we correct the statements that are no longer accurate.

Note that after this patch we still have in this document, and in
other documents, similar promises about CQL *local secondary indexes*.
Either the documentation or the implementation needs to change in
that case too, but we'll do it in a separate patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16369
2023-12-13 14:58:06 +02:00
Botond Dénes
e1b30f50be reader_concurrency_semaphore: add register_metrics constructor parameter
To be used in the next patch to control whether the semaphore registers
and exports metrics or not. We want to move metric registration to the
semaphore but we don't want all semaphores to export metrics. The
decision on whether a semaphore should or shouldn't export metrics
should be made on a case-by-case basis so this new parameter has no
default value (except for the for_tests constructor).
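
A Python sketch of the idea, not the C++ constructor: a required, keyword-only flag forces each call site to opt in or out, and the metrics live exactly as long as the object that registered them. The registry class, the metric name, and the semaphore names are hypothetical.

```python
class MetricsRegistry:
    """Hypothetical registry mapping a semaphore name to its metric values."""

    def __init__(self) -> None:
        self.groups: dict[str, dict[str, int]] = {}


class ReaderConcurrencySemaphore:
    # register_metrics is keyword-only and has no default, so every
    # call site must decide explicitly whether to export metrics.
    def __init__(self, name: str, registry: MetricsRegistry, *,
                 register_metrics: bool) -> None:
        self.name = name
        self._registry = registry
        self._registered = register_metrics
        if register_metrics:
            registry.groups[name] = {"active_reads": 0}

    def close(self) -> None:
        # Metrics are removed together with the semaphore that owns them.
        if self._registered:
            self._registry.groups.pop(self.name, None)
            self._registered = False


registry = MetricsRegistry()
user = ReaderConcurrencySemaphore("user", registry, register_metrics=True)
internal = ReaderConcurrencySemaphore("internal", registry, register_metrics=False)
print(sorted(registry.groups))
user.close()
print(sorted(registry.groups))
```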
2023-12-13 06:25:45 -05:00
Avi Kivity
814f3eb6b5 sstables: name sstables_manager
Soon, the reader_concurrency_semaphore will require a unique
and meaningful name in order to label its metrics. To prepare
for that, name sstables_manager instances. This will be used
to generate a name for sstables_manager's reader_concurrency_semaphore.
2023-12-13 04:40:33 -05:00
Kefu Chai
5ea3af067d .git: add codespell workflow
to identify misspelling in the code.

The GitHub actions in this workflow run codespell when a new pull
request is created targeting the master or enterprise branch. Errors
will be annotated in the pull request. A new entry along with the
existing tests like build, unit test and dtest will be added to the
"checks" shown in github PR web UI. one can follow the "Details" to
find the details of the errors.

unfortunately, this check checks all text files unless they
are explicitly skipped, not just the new ones added / changed in the
PR under test. in other words, if there are 42 misspelling
errors in master, and you are adding a new one in your PR,
this workflow shows all of the 43 errors -- both the old
and new ones.

misspellings in the code hurt the user experience and sometimes
the developer experience. but the text files under test/cql
can be sensitive to their exact text, and sometimes a tiny edit could
break a test, so that directory is added to the skip list.

So far, since there are lots of errors identified by the tool,
before we address all of them, the identified problems are only
annotated; they are not considered errors, so they don't
fail the check.

and in this change `only_warn` is set, so the check does not
fail even if there are misspellings. this prevents the distractions
before all problems are addressed. we can remove this setting in
future, once we either fix all the misspellings or add the ignore
words or skip files. but either way, the check is not considered
as blockers for merging the tested PR, even if this check fails --
the check failure is just represented for information purpose, unless
we make it a required in the github settings for the target
branch.

if we want to change this, we can configure it in github's Branch
protection rule on a per-branch basis, to make this check a
must-pass.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16285
2023-12-13 10:53:09 +02:00
Aleksandra Martyniuk
9b9ea1193c tasks: keep task's children in list
If std::vector is resized its iterators and references may
get invalidated. While task_manager::task::impl::_children's
iterators are avoided throughout the code, references to its
elements are being used.

Since the children vector does not need random access to its
elements, change its type to std::list<foreign_task_ptr>, whose
iterators and references aren't invalidated on element insertion.

Fixes: #16380.

Closes scylladb/scylladb#16381
2023-12-13 10:47:27 +02:00
Yaniv Kaul
0b0a3ee7fc Typos: fix typos in code
Last batch, hopefully. Using codespell, went over the docs and fixed some typos.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#16388
2023-12-13 10:45:21 +02:00
Botond Dénes
57f5ac03e1 Merge 'scripts/coverage.py: cleanups' from Kefu Chai
various cleanups in `scripts/coverage.py`. they do not change the behavior of this script in the happy path.

Closes scylladb/scylladb#16399

* github.com:scylladb/scylladb:
  scripts/coverage.py: s/exit/sys.exit/
  scripts/coverage.py: do not inherit Value from argparse.Action
  scripts/coverage.py: use `is not None`
  scripts/coverage.py: correct the formatted string in error message
  scripts/coverage.py: do not use f-string when nothing to format
  scripts/coverage.py: use raw string to avoid escaping "\"
2023-12-13 10:25:44 +02:00
Kefu Chai
1b57ba44eb scripts/coverage.py: s/exit/sys.exit/
the former is supposed to be used in "the interactive interpreter
shell and should not be used in programs.". this function
prints out its argument, and the exit code is 1. so just
print the error message using sys.exit()

see also
https://docs.python.org/3/library/sys.html#sys.exit and
https://docs.python.org/3/library/constants.html#exit
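
The behavior relied on here can be checked with a short, self-contained sketch: passing a string to `sys.exit()` prints it to stderr and ends the process with status 1. The error message is invented for the example and is not the one used in the script.

```python
import subprocess
import sys

# Run a tiny child program that exits with a message instead of a number.
child = "import sys; sys.exit('error: no profraw files found')"
result = subprocess.run([sys.executable, "-c", child],
                        capture_output=True, text=True)

print(result.returncode)      # exit status of the child
print(result.stderr.strip())  # the message is written to stderr
```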

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-13 10:50:00 +08:00
Kefu Chai
7600b68d5c scripts/coverage.py: do not inherit Value from argparse.Action
as Value is not an argparse.Action, and it is not passed as the argument
of the "action" parameter. neither does it implement the `__call__`
function. so just derive it from object.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-13 10:41:52 +08:00
Kefu Chai
9c112dacc4 scripts/coverage.py: use is not None
`is not None` is the more idiomatic Python way to check if an
expression evaluates to not None. and it is more readable.
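
The commit message does not show the expression that was replaced, so the following is only a generic sketch of why the explicit identity check is preferred: it separates "no value" from falsy values such as 0. The function and parameter names are hypothetical.

```python
from typing import Optional


def describe(limit: Optional[int]) -> str:
    # "is not None" distinguishes "no value" from falsy values such as 0.
    if limit is not None:
        return f"limit={limit}"
    return "no limit"


print(describe(0))     # a plain truthiness test would treat 0 as missing
print(describe(None))
```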

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-13 10:41:52 +08:00
Kefu Chai
0d15fc57d5 scripts/coverage.py: correct the formatted string in error message
the formatted string should be `basename`. `input_file` is not defined
in that context.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-13 10:41:52 +08:00
Kefu Chai
bc94b7bc04 scripts/coverage.py: do not use f-string when nothing to format
there is no string interpolation in this case, so drop the "f" prefix.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-13 10:41:52 +08:00
Kefu Chai
c3c715236d scripts/coverage.py: use raw string to avoid escaping "\"
we use "\." to escape "." in a regular expression. but "\" should
be escaped as well. CPython accepts "\." after failing to find
an escape sequence for it, and leaves "\." as it is,
but it complains:

```
/home/kefu/dev/scylladb/scripts/coverage.py:107: SyntaxWarning: invalid escape sequence '\.'
  input_file_re_str = f"(.+)\.profraw(\.{__DISTINCT_ID_RE})?"
```

in this change, we use raw string to avoid escaping "\" in
the regular expression.
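
One way to keep the interpolation while avoiding the warning is an `rf` string, sketched below. The value of `DISTINCT_ID_RE` is a hypothetical stand-in, since the script's real `__DISTINCT_ID_RE` is not shown in this message, and the file name is invented.

```python
import re

# Hypothetical stand-in for the script's __DISTINCT_ID_RE.
DISTINCT_ID_RE = r"[0-9a-f]+"

# The "rf" prefix keeps "\." literal while still interpolating the name.
input_file_re_str = rf"(.+)\.profraw(\.{DISTINCT_ID_RE})?"

m = re.fullmatch(input_file_re_str, "unit_test.profraw.1a2b")
print(m.group(1), m.group(2))
```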

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-13 10:41:52 +08:00
Tomasz Grabiec
cdc53d0a49 test: tablets: Add test case which tests table drop concurrent with migration 2023-12-13 00:06:56 +01:00
Avi Kivity
1f7c049791 Update tools/java submodule (minor security fixes)
* tools/java 29fe44da84...3963c3abf7 (2):
  > Revert "build: update `guava` dependency"
  > Merge "Update Netty , Guava and Logback dependencies" from Yaron Kaikov

    Ref scylladb/scylla-tools-java#363
    Ref scylladb/scylla-tools-java#364
2023-12-12 22:23:20 +02:00
Avi Kivity
c3d679e31e Merge 'sstables, utils: do not include unused header' from Kefu Chai
do not include unused header

Closes scylladb/scylladb#16386

* github.com:scylladb/scylladb:
  utils: bit_cast: drop unused #includes
  sstables: writer: do not include unused header
2023-12-12 22:22:36 +02:00
Avi Kivity
22b77edef3 Merge 'scylla-nodetool: implement the scrub command' from Botond Dénes
On top of the capabilities of the java-nodetool command, the following additional functionality is implemented:
* Expose quarantine-mode option of the scrub_keyspace REST API
* Exit with error and print a message, when scrub finishes with abort or validation_errors return code

The command comes with tests and all tests pass with both the new and the current nodetool implementations.

Refs: #15588
Refs: #16208

Closes scylladb/scylladb#16391

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the scrub command
  test/nodetool: rest_api_mock.py: add missing "f" to error message f string
  api: extract scrub_status into its own header
2023-12-12 22:22:35 +02:00
Petr Gusev
9d93a518ac topology: remove_endpoint: remove inet_address overload
The overload was used only in tests.
2023-12-12 23:19:54 +04:00
Petr Gusev
fbf507b1ba token_metadata: topology: cleanup add_or_update_endpoint
Make host_id parameter non-optional and
move it to the beginning of the arguments list.

Delete unused overloads of add_or_update_endpoint.

Delete unused overload of token_metadata::update_topology
with inet_address argument.
2023-12-12 23:19:54 +04:00
Petr Gusev
11a4908683 token_metadata: add_replacing_endpoint: forbid replacing node with itself
This used to work before in replace-with-same-ip scenario, but
with host_id-s it's no longer relevant.

base_token_metadata has been removed from topology_change_info
because the conditions needed for its creation
are no longer met.
2023-12-12 23:19:54 +04:00
Petr Gusev
3b59919a9c topology: drop key_kind, host_id is now the primary key 2023-12-12 23:19:54 +04:00
Petr Gusev
8c551f9104 dc_rack_fn: make it non-template 2023-12-12 23:19:54 +04:00
Petr Gusev
7b55ccbd8e token_metadata: drop the template
Replace token_metadata2 -> token_metadata,
make token_metadata non-template again.

No behavior changes, just compilation fixes.
2023-12-12 23:19:54 +04:00
Petr Gusev
799f747c8f shared_token_metadata: switch to the new token_metadata 2023-12-12 23:19:54 +04:00
Petr Gusev
c7314aa8e2 gossiper: use new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
e50dbef3e2 database: get_token_metadata -> new token_metadata
database::get_token_metadata() is switched to token_metadata2.

get_all_ips method is added to the host_id-based token_metadata, since
it's convenient and will be used in several places. It returns all current
nodes converted to inet_address by means of the topology
contained within token_metadata.

hint_sender::can_send: if the node has already left the
cluster we may not find its host_id. This case is handled
in the same way as if it's not a normal token owner - we
simply send a hint to all replicas.
2023-12-12 23:19:53 +04:00
Petr Gusev
11cc21d0a9 erm: switch to the new token_metadata
In this commit we replace token_metadata with token_metadata2
in the erm interface and field types. To accommodate the change
some of strategy-related methods are also updated.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
309e08e597 storage_service: get_token_metadata -> token_metadata2
In this commit we change the return type of
storage_service::get_token_metadata_ptr() to
token_metadata2_ptr and fix whatever breaks.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
f53f34f989 storage_service: get_token_to_endpoint_map: use new token_metadata
The token_metadata::get_normal_and_bootstrapping_token_to_endpoint_map
method was used only here. It's inlined in this
commit since it's too specific and incurs the overhead
of creating an intermediate map.
2023-12-12 23:19:53 +04:00
Petr Gusev
0e4c90dca6 api/token_metadata: switch to new version 2023-12-12 23:19:53 +04:00
Petr Gusev
b2d3dc33e2 storage_service::on_change: switch to new token_metadata
The check *ep == endpoint is needed when a node
changes its IP - on_change can be called by the
gossiper for old IP as part of its removal, after
handle_state_normal has already been called for
the new one. Without the check, the
do_update_system_peers_table call overwrites the IP
back to its old value.

Previously token_metadata used endpoint as the key
and the *ep == endpoint condition followed from the
is_normal_token_owner check. Now with host_id-s we have
an additional layer of indirection, and we need
*ep == endpoint check to get the same end condition.
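
The race can be modelled with a toy sketch. The dictionaries and the `on_change` signature below are hypothetical simplifications, not the storage_service code: a late notification for the old IP must not overwrite the peers entry once the host is known under its new IP.

```python
# Hypothetical state: host_id -> current IP as tracked by token metadata,
# and the persisted peers table (host_id -> IP).
current_ip = {"host-a": "10.0.0.2"}   # the node already moved to its new IP
peers_table = {"host-a": "10.0.0.2"}


def on_change(endpoint: str, host_id: str) -> bool:
    """Handle a gossip update for `endpoint`; return True if peers were updated."""
    ep = current_ip.get(host_id)
    # Being a known token owner is not enough: the update must also be for
    # the IP the host currently uses, otherwise it is a stale notification.
    if ep is None or ep != endpoint:
        return False
    peers_table[host_id] = endpoint
    return True


print(on_change("10.0.0.1", "host-a"))  # late update for the old IP is ignored
print(peers_table["host-a"])
```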

This case was revealed by the dtest
update_cluster_layout_tests.py::TestUpdateClusterLayout::test_change_node_ip
2023-12-12 23:19:53 +04:00
Petr Gusev
7eb7863635 cdc: switch to token_metadata2
Change the token_metadata type to token_metadata2 in
the signatures of CDC-related methods in
storage_service and cdc/generation. Use
get_new_strong to get a pointer to the new host_id-based
token_metadata from the inet_address-based one,
living in the shared_token_metadata.

The starting point of the patch is in
storage_service::handle_global_request. We change the
tmptr type to token_metadata2 and propagate the change
down the call chains. This includes token-related methods
of the boot_strapper class.
2023-12-12 23:19:53 +04:00
Petr Gusev
b2fb650098 calculate_natural_endpoints: fix indentation 2023-12-12 23:19:53 +04:00
Petr Gusev
80ccbc0d53 calculate_natural_endpoints: switch to token_metadata2
All usages of calculate_natural_endpoints are migrated,
now we can change its interface to take token_metadata2
instead of token_metadata.
2023-12-12 23:19:53 +04:00
Petr Gusev
933acb0f72 storage_service: get_changed_ranges_for_leaving: use new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
7c7dbe3779 decommission_with_repair, removenode_with_repair -> new token_metadata
Just mechanical changes to the new token_metadata.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
ef534ac876 rebuild_with_repair, replace_with_repair: use new token_metadata
Just mechanical changes to the new token_metadata.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
93263bf9e7 bootstrap: use new token_metadata
Just mechanical changes to the new token_metadata.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
d9283bd025 tablets: switch to token_metadata2
locator_topology_test, network_topology_strategy_test and
tablets_test are fully switched to the host_id-based token_metadata,
meaning they no longer populate the old token_metadata.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
f5038f6c72 calculate_effective_replication_map: use new token_metadata
In this commit we switch the function
calculate_effective_replication_map to use the new
token_metadata. We do this by employing our new helper
calculate_natural_ips function. We can't use this helper for
current_endpoints/target_endpoints though,
since in that case we won't add the IP to the
pending_endpoints in the replace-with-same-ip scenario.

The token_metadata_test is migrated to host_ids in the same
commit to make it pass. Other tests work because they fill
both versions of the token_metadata, but for this test it was
simpler to just migrate it straight away. The test constructs
the old token_metadata over the new token_metadata,
this means only the get_new() method will work on it. That's
why we also need to switch some other functions
(maybe_remove_node_being_replaced, do_get_natural_endpoints,
get_replication_factor) to the new version in the same commit.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
fe3c543c4e calculate_natural_endpoints: fix formatting 2023-12-12 23:19:53 +04:00
Petr Gusev
d5b4b02b28 abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
We've updated all the places where token_metadata
is mutated, and now we can progress to the next stage
of the refactoring - gradually switching the read
code paths.

The calculate_natural_endpoints function
is at the core of all of them. It decides which nodes
the given token should be replicated to for the given
token_metadata. It has a lot of usages in various contexts,
and we can't switch them all in one commit, so instead we
allow the function to behave in both ways. If
use_host_id parameter is false, the function uses the provided
token_metadata as is and returns endpoint_set as a result.
If it's true, it uses get_new() on the provided token_metadata
and returns host_id_set as a result.

The scope of the whole refactoring is limited to the erm data
structure, its interface will be kept inet_address based for now.
This means we'll often need to resolve host_ids to inet_address-es
as soon as we get a result from calculate_natural_endpoints.
A new calculate_natural_ips function is added for convenience.
It uses the new token_metadata and immediately resolves
returned host_id-s to inet_address-es.

The auxiliary declarations natural_ep_type, set_type, vector_type,
get_self_id, select_tm are introduced only for the sake of
migration, they will be removed later.
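
The host_id-to-IP resolution step described above can be sketched roughly
as follows. This is only an illustration under stated assumptions: host_id
and inet_address are simplified stand-in types, and address_map and
resolve_to_ips are hypothetical names, not the actual calculate_natural_ips
implementation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Simplified stand-ins for locator::host_id and gms::inet_address (assumption).
using host_id = int;
using inet_address = std::string;

// Hypothetical stand-in for the host_id -> IP lookup of the new token_metadata.
using address_map = std::map<host_id, inet_address>;

// Resolve a set of host_ids (what a host_id-based natural endpoints
// calculation would return) into the IP addresses an inet_address-based
// interface still expects.
std::set<inet_address> resolve_to_ips(const std::set<host_id>& replicas,
                                      const address_map& addrs) {
    std::set<inet_address> ips;
    for (host_id id : replicas) {
        auto it = addrs.find(id);
        if (it == addrs.end()) {
            // A replica without a known IP is treated as an error in this sketch.
            throw std::runtime_error("no IP known for host_id " + std::to_string(id));
        }
        ips.insert(it->second);
    }
    return ips;
}
```

Resolving immediately after the calculation keeps the host_id-based logic
internal while callers continue to see inet_address-es.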
2023-12-12 23:19:53 +04:00
Petr Gusev
1960436d93 network_topology_strategy_test: update new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
90234861ac storage_service: on_alive: update new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
5c04a47d6f storage_service: handle_state_bootstrap: update new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
4e03ba3ede storage_service: snitch_reconfigured: update new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
0aab20d3fe storage_service: leave_ring: update new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
278c832285 storage_service: node_ops_cmd_handler: update new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
1745a1551a storage_service: node_ops_cmd_handler: add coordinator_host_id
We'll need it in the next commits to address
replacing and bootstrapping nodes by id.

We assume this change will be shipped in 6.0 with upgrade
from 5.4, where host_id already exists in client_info.
We don't support upgrade between non-adjacent versions.
2023-12-12 23:19:48 +04:00
Botond Dénes
47450ae4db tools/scylla-nodetool: implement the scrub command
On top of the capabilities of the java-nodetool command, the following
additional functionality is implemented:
* Expose quarantine-mode option of the scrub_keyspace REST API
* Exit with error and print a message, when scrub finishes with abort or
  validation_errors return code
2023-12-12 09:39:58 -05:00
Botond Dénes
892683cace test/nodetool: rest_api_mock.py: add missing "f" to error message f string 2023-12-12 09:33:39 -05:00
Botond Dénes
8064d17f78 api: extract scrub_status into its own header
So it can be shared with scylla-nodetool code.
2023-12-12 09:33:39 -05:00
Petr Gusev
2794b14a80 storage_service: bootstrap: update new token_metadata 2023-12-12 17:27:25 +04:00
Petr Gusev
c20c8c653c storage_service: join_token_ring: update new token_metadata 2023-12-12 17:27:25 +04:00
Petr Gusev
fde20bddc0 storage_service: excise: update new token_metadata
excise is called from handle_state_left, the endpoint
may have already been removed from tm by then -
test_raft_upgrade_majority_loss fails if we use
unconditional tmptr->get_new()->get_host_id
instead of get_host_id_if_known
2023-12-12 17:27:25 +04:00
Petr Gusev
23811486d8 storage_service: join_cluster: update new token_metadata 2023-12-12 17:27:25 +04:00
Petr Gusev
711aaa0e29 storage_service: on_remove: update new token_metadata 2023-12-12 17:27:25 +04:00
Petr Gusev
6412cd64f1 storage_service: handle_state_normal: fill new token_metadata 2023-12-12 17:27:15 +04:00
Kefu Chai
c485644303 utils: bit_cast: drop unused #includes
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-12 21:09:51 +08:00
Kefu Chai
af0ba3d648 sstables: writer: do not include unused header
the helpers in bit_cast.hh are not used, so drop this #include.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-12 21:09:51 +08:00
Tomasz Grabiec
9b0d9e7c6b tests: tablets: Do read barrier in get_tablet_replicas()
In order for the call to see all prior changes to group0. Also, we
should query on the host on which we executed the barrier.

I hope this will reduce flakiness observed in CI runs on
https://github.com/scylladb/scylladb/pull/16341 where the expected
tablet replica didn't match the one returned by get_tablet_replica()
after tablet movement, possibly because the node is still behind
group0 changes.
2023-12-12 12:46:39 +01:00
Botond Dénes
493b6bc65f Merge 'Guard tables in compaction tasks' from Benny Halevy
Currently, if a compaction function enters the table
or compaction_group async_gate, we can't stop it
on the table/compaction_group stop path as they co_await
their respective async_gate.close().

This series introduces a table_ptr smart pointer that guards
the table object by entering its async_gate, and
it also defers awaiting the gate.close future
till after stopping ongoing compaction so that
closing the gate will prevent starting new compactions
while ongoing compaction can be stopped and finally
awaiting the close() future will wait for them to
unwind and exit the gate after being stopped.

Fixes #16305

Closes scylladb/scylladb#16351

* github.com:scylladb/scylladb:
  compaction: run_on_table: skip compaction also on gate_closed_exception
  compaction: run_on_table: hold table
  table: add table_holder and hold method
  table: stop: allow compactions to be stopped while closing async_gate
2023-12-12 12:50:17 +02:00
Botond Dénes
885a807c71 Merge 'api: storage_service: api for starting async compaction' from Aleksandra Martyniuk
For all compaction types which can be started with api, add an asynchronous version of api, which returns task_id of the corresponding task manager task. With the task_id a user can check task status, abort, or wait for it, using task manager api.

Closes scylladb/scylladb#15092

* github.com:scylladb/scylladb:
  test: use async api in test_not_created_compaction_task_abort
  test: test compaction task started asynchronously
  api: tasks: api for starting async compaction
  api: compaction: pass pointer to top level compaction tasks
2023-12-12 12:06:52 +02:00
Asias He
5f20e33e15 api: Reject unsupported http api options for repair
If an option is not supported, reject the request instead of silently
ignoring the unsupported options.

This prevents the user from thinking an option is supported when it is
in fact ignored by scylla core.
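
A minimal sketch of the reject-instead-of-ignore approach, assuming options
arrive as a name/value map. The function name validate_options and the
option names used with it are hypothetical and are not taken from the
repair API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Throws if the request carries any option outside the supported set,
// instead of silently dropping it.
void validate_options(const std::map<std::string, std::string>& request,
                      const std::set<std::string>& supported) {
    for (const auto& kv : request) {
        if (supported.count(kv.first) == 0) {
            throw std::invalid_argument("unsupported option: " + kv.first);
        }
    }
}
```

Failing the whole request makes the mismatch visible to the caller at once,
rather than leaving it to be discovered from the operation's behaviour.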

Fixes #16299

Closes scylladb/scylladb#16300
2023-12-12 09:18:00 +02:00
Benny Halevy
7843025a53 compaction: run_on_table: skip compaction also on gate_closed_exception
Similar to the no_such_column_family error,
gate_closed_exception indicates that the table
is stopped and we should skip compaction on it
gracefully.

Fixes #16305

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-12 08:46:37 +02:00
Benny Halevy
92c718c60a compaction: run_on_table: hold table
To ensure the table will not be dropped while
the compaction task is ongoing.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-12 08:45:59 +02:00
Benny Halevy
cddcf3ad0c table: add table_holder and hold method
A smart pointer that guards the table object
while it's being used by async functions.
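
The guard idea can be sketched in plain C++ as an RAII holder that enters a
gate on construction and leaves it on destruction. This is a simplified,
single-threaded illustration: simple_gate and table_guard are hypothetical
names, and nothing here is the Seastar gate API or the actual table_holder.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Simplified stand-in for an async gate: counts holders and refuses
// new entries once closed.
class simple_gate {
    std::size_t _count = 0;
    bool _closed = false;
public:
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        ++_count;
    }
    void leave() noexcept { --_count; }
    void close() noexcept { _closed = true; }
    std::size_t holders() const noexcept { return _count; }
};

// Hypothetical table whose lifetime is guarded by a gate.
struct table {
    simple_gate gate;
};

// RAII guard: enters the table's gate on construction, leaves on destruction.
class table_guard {
    table* _t;
public:
    explicit table_guard(table& t) : _t(&t) { _t->gate.enter(); }
    table_guard(table_guard&& o) noexcept : _t(o._t) { o._t = nullptr; }
    table_guard(const table_guard&) = delete;
    table_guard& operator=(const table_guard&) = delete;
    table_guard& operator=(table_guard&&) = delete;
    ~table_guard() {
        if (_t) {
            _t->gate.leave();
        }
    }
    table& operator*() const noexcept { return *_t; }
    table* operator->() const noexcept { return _t; }
};
```

Once the gate is closed, constructing a new guard fails, which corresponds
to skipping work on a table that is being stopped.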

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-12 08:43:49 +02:00
Benny Halevy
c8768f9102 table: stop: allow compactions to be stopped while closing async_gate
To make sure a table object is kept valid throughout the lifetime
of compaction, a following patch will enter the table's
_async_gate when the compaction task starts.

This change defers awaiting the gate.close future
till after stopping ongoing compaction so that
closing the gate will prevent starting new compactions
while ongoing compaction can be stopped and finally
awaiting the close() future will wait for them to
unwind and exit the gate after being stopped.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-12 08:31:50 +02:00
Anna Stuchlik
ff2457157d doc: add the 5.4-to-5.5 upgrade guide
This commit adds the upgrade guide from version
5.4 to 5.5.
Also, it removes all previous OSS guides not related
to version 5.5.

The guide includes the required Raft-related
information.

NOTE: The content of the guide must be further
verified closer to the release. I'm making
these updates now to avoid errors and warnings
related to outdated upgrade guides in other PRs,
and to include the Raft information.

Closes scylladb/scylladb#16350
2023-12-11 16:58:43 +01:00
Botond Dénes
3c125891f4 Update ./tools/java submodule
* ./tools/java 26f5f71c...29fe44da (3):
  > tools: catch and print UnsupportedOperationException
  > tools/SSTableMetadataViewer: continue if sstable does not exist
  > throw more informative error when fail to parse sstable generation

Fixes: scylladb/scylla-tools-java#360
2023-12-11 17:08:01 +02:00
Tomasz Grabiec
a33d45f889 streaming: Keep table by shared ptr to avoid crash on table drop
The observed crash was in the following piece on "cf" access:

    if (*table_is_dropped) {
        sslog.info("[Stream #{}] Skipped streaming the dropped table {}.{}", si->plan_id, si->cf.schema()->ks_name(), si->cf.schema()->cf_name());

Fixes #16181
2023-12-11 14:58:04 +01:00
Calle Wilund
b34366957e commitlog_test::test_commitlog_reader: handle segment_truncation
Fixes #16312

This test replays a segment before it might be closed or even fully flushed,
thus it can (with the new semantics) generate a segment_truncation exception
if hitting eof earlier than expected. (Note: test does not use pre-allocated
segments).
2023-12-11 11:53:12 +00:00
Calle Wilund
d85c0ea26f commitlog_test: coroutinize test_commitlog_reader
To make it easier to read and modify.
2023-12-11 11:47:48 +00:00
Takuya ASADA
7c38aff368 scylla_swap_setup: fix AttributeError
In dffadabb94 we mistakenly added
"if args.overwrite_unit_file", but the option comes from an unmerged
patch.
So we need to drop it to fix the script error.

Fixes #16331

Closes scylladb/scylladb#16358
2023-12-11 13:41:00 +02:00
Tomasz Grabiec
effb9fb3cb Merge 'Don't calculate hashes for schema versions in Raft mode' from Kamil Braun
When performing a schema change through group 0, extend the schema mutations with a version that's persisted and then used by the nodes in the cluster in place of the old schema digest, which becomes horribly slow as we perform more and more schema changes (#7620).

If the change is a table create or alter, also extend the mutations with a version for this table to be used for `schema::version()`s instead of having each node calculate a hash which is susceptible to bugs (#13957).

When performing a schema change in Raft RECOVERY mode we also extend schema mutations which forces nodes to revert to the old way of calculating schema versions when necessary.

We can only introduce these extensions if all of the cluster understands them, so protect this code by a new cluster/schema feature, `GROUP0_SCHEMA_VERSIONING`.

Fixes: #7620
Fixes: #13957

---

This is a reincarnation of PR scylladb/scylladb#15331. The previous PR was reverted due to a bug it unmasked; the bug has now been fixed (scylladb/scylladb#16139). Some refactors from the previous PR were already merged separately, so this one is a bit smaller.

I have checked with @Lorak-mmk's reproducer (https://github.com/Lorak-mmk/udt_schema_change_reproducer -- many thanks for it!) that the originally exposed bug is no longer reproducing on this PR, and that it can still be reproduced if I revert the aforementioned fix on top of this PR.

Closes scylladb/scylladb#16242

* github.com:scylladb/scylladb:
  docs: describe group 0 schema versioning in raft docs
  test: add test for group 0 schema versioning
  feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode
  schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0
  migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations
  schema_tables: use schema version from group 0 if present
  migration_manager: store `group0_schema_version` in `scylla_local` during schema changes
  system_keyspace: make `get/set_scylla_local_param` public
  feature_service: add `GROUP0_SCHEMA_VERSIONING` feature
2023-12-11 12:17:57 +01:00
Eliran Sinvani
befd910a06 install-dependencies.sh : Add packages for supporting code coverage
As part of code coverage we need some additional packages in order to
be able to process the code coverage data and to provide
some meaningful information in logs.
Here we add the following packages:
fedora packages:
----------------
lcov - A package of utilities to manipulate lcov traces and generate
       coverage html reports

fedora python3 packages:
------------------------
The following packages are added into fedora_packages and not the
python3_packages since we don't need them to be packaged into
scylla-python3 package but we only require them for the build
environment.

python3-unidiff - A python library for working with patch files, this is
                  required in order to generate "patch coverage" reports.
python3-humanfriendly - A python library to format some quantities into
                        human-readable strings (time spans, sizes, etc.).
                        We use it to print meaningful logs that track
                        the volume and time it takes to process coverage
                        data so we can better debug and optimize it in the
                        future.
python3-jinja3 - This is a template-based generator that will eventually
                 allow us to consolidate and rearrange several reports into one so we
                 can publish a single report "site" for all of the coverage information.
                 For example, include both, coverage report as well as
                 patch report in a tab based site.

pip packages:
-------------
treelib - A tree data structure that supports also pretty printing of
          the tree data. We use it to log the coverage processing steps in
          order to have debugging capabilities in the future.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes scylladb/scylladb#16330

[avi: regenerate toolchain]

Closes scylladb/scylladb#16357
2023-12-11 13:12:05 +02:00
Aleksandra Martyniuk
31977a1cde test: use async api in test_not_created_compaction_task_abort 2023-12-11 11:39:41 +01:00
Aleksandra Martyniuk
68f6886d50 test: test compaction task started asynchronously
Check whether task id returned by asynchronous api is correct and
whether tasks of proper type are created.
2023-12-11 11:39:41 +01:00
Aleksandra Martyniuk
b485897704 api: tasks: api for starting async compaction
For all compaction types which can be started with api, add an asynchronous
version of api, which returns task_id of the corresponding task manager
task. With the task_id a user can check task status, abort, or wait for it,
using task manager api.
2023-12-11 11:39:33 +01:00
Takuya ASADA
cc90ff1646 scylla-server.service: switch deprecated PermissionsStartsOnly to ExecStartPre=+
Since we dropped CentOS7 support, now we can switch from
"PermissionsStartsOnly=True" to "ExecStartPre=+".

Fixes scylladb/scylla-enterprise#1067
2023-12-11 19:38:28 +09:00
Takuya ASADA
6f1fff58ba dist: drop legacy control group parameters
Since we dropped CentOS7 support, now we can drop legacy control group
parameters, which are deprecated in systemd v252.
2023-12-11 19:38:28 +09:00
Takuya ASADA
dcb5fd6fce scylla-server.slice: Drop workaround for MemorySwapMax=0 bug
It was a workaround for https://github.com/systemd/systemd/issues/8363,
but the bug was fixed in
906bdbf5e7
and merged as of systemd v239-8.
Since we dropped support for CentOS7, we don't need the workaround
anymore.
2023-12-11 19:38:28 +09:00
Takuya ASADA
6d7cb97645 dist: move AmbientCapabilities to scylla-server.service
Since we dropped support for CentOS7, we can now always use AmbientCapabilities
without a systemd version check, so we can move it from capabilities.conf
to scylla-server.service.
However, we still cannot hardcode CAP_PERFMON since it is too new and
only newer kernels support it, so keep it in scylla_post_install.sh.
2023-12-11 19:38:28 +09:00
Takuya ASADA
1dc4feb68d Revert "scylla_setup: add warning for CentOS7 default kernel"
This reverts commit 85339d1820.
2023-12-11 19:38:28 +09:00
Aleksandra Martyniuk
ceec5577d8 api: compaction: pass pointer to top level compaction tasks
As a preparation for asynchronous compaction api, from which we
cannot take values by reference, top level compaction tasks get
pointers which need to be set to nullptr when they are not needed
(like in async api).
2023-12-11 11:36:10 +01:00
Nadav Har'El
12f0007ede Merge 'Skip auto snapshot for non-local storages' from Pavel Emelyanov
When a table is truncated or dropped it can be auto-snapshotted if the respective config option is set (by default it is). Non local storages don't implement snapshotting yet and emit on_internal_error() in that case aborting the whole process. It's better to skip snapshot with a warning instead.

Closes scylladb/scylladb#16220

* github.com:scylladb/scylladb:
  database: Do not auto snapshot non-local storages' tables
  database: Simplify snapshot booleans in truncate_table_on_all_shards()
2023-12-11 12:13:48 +02:00
Petr Gusev
b6fbbe28aa storage_service: topology_state_load: fill new token_metadata
For each inet_address-based modification of token_metadata we
make a corresponding host_id-based change in token_metadata->get_new().

The _gossiper.add_saved_endpoint logic is switched to the new token_metadata.
2023-12-11 12:51:34 +04:00
Piotr Dulikowski
e7e1c4e63c storage_service: adjust update_topology_change_info to update new token_metadata
Both versions of the token_metadata need to be updated. For
the new version we provide a dc_rack_fn function which looks
for dc_rack by host_id in topology_state_machine if raft
topology is on. Otherwise, it looks for IP for the given
host_id and falls back to the gossiper-based function
get_dc_rack_for.
2023-12-11 12:51:34 +04:00
Petr Gusev
66c30e4f8e topology: set self host_id on the new topology
With this commit, we begin the next stage of the
refactoring - updating the new version of the token_metadata
in all places where the old version is currently being updated.

In this commit we assign host_id of this node, both in main.cc
and in boost tests.
2023-12-11 12:51:34 +04:00
Petr Gusev
e4253776a1 locator::topology: allow being_replaced and replacing nodes to have the same IP
When we're replacing a node with the same IP address, we want
the following behavior:
  * host_id -> IP mapping should work and return the same IP address for two
  different host_ids - old and new.
  * the IP -> host_id mapping should return the host_id of the old (replaced)
  host.
This variant is most convenient for preserving the current behavior
of the code, especially the functions maybe_remove_node_being_replaced,
erm::get_natural_endpoints_without_node_being_replaced,
erm::get_pending_endpoints. The 'being_replaced' node will be properly removed in
maybe_remove_node_being_replaced and 'replacing' node will be added to
the pending_endpoints.
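
The two mapping rules above can be illustrated with a small two-way index.
This is a sketch under simplifying assumptions: host_id and inet_address
are stand-in types, and node_index is a hypothetical class, not
locator::topology.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Simplified stand-ins for locator::host_id and gms::inet_address (assumption).
using host_id = int;
using inet_address = std::string;

// Hypothetical two-way index illustrating the replace-with-same-IP rule.
class node_index {
    std::map<host_id, inet_address> _ip_by_id;
    std::map<inet_address, host_id> _id_by_ip;
public:
    // Adding a node never overwrites an existing IP -> host_id entry, so the
    // first owner of an IP (the node being replaced) keeps the reverse mapping.
    void add(host_id id, const inet_address& ip) {
        _ip_by_id[id] = ip;
        _id_by_ip.emplace(ip, id); // no-op if ip is already mapped
    }

    std::optional<inet_address> ip_of(host_id id) const {
        auto it = _ip_by_id.find(id);
        if (it == _ip_by_id.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    std::optional<host_id> id_of(const inet_address& ip) const {
        auto it = _id_by_ip.find(ip);
        if (it == _id_by_ip.end()) {
            return std::nullopt;
        }
        return it->second;
    }
};
```

With the old node added first and the replacing node added second under the
same IP, both host_ids resolve to that IP, while the IP resolves back to the
old host_id.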
2023-12-11 12:51:34 +04:00
Petr Gusev
5a1418fdba token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
This commit fixes an inconsistency in method names:
get_host_id and get_host_id_if_known form a pair
(on_internal_error, returns null), but there was only
one method for the opposite conversion - get_endpoint_for_host_id,
and it returned null. In this commit we change it to call on_internal_error
if it can't find the argument, and add another method,
get_endpoint_for_host_id_if_known, which returns null in this case.

We can't use get_endpoint_for_host_id/get_host_id
in host_id_or_endpoint::resolve since it's called
from storage_service::parse_node_list
-> token_metadata::parse_host_id_and_endpoint,
and exceptions are caught and handled in
`storage_service::parse_node_list`.
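
The naming convention can be sketched as a pair of lookups, one strict and
one tolerant. The class id_to_ip and its method names are hypothetical, the
types are simplified stand-ins, and std::logic_error only stands in for
on_internal_error.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified stand-ins for locator::host_id and gms::inet_address (assumption).
using host_id = int;
using inet_address = std::string;

class id_to_ip {
    std::map<host_id, inet_address> _map;
public:
    void set(host_id id, inet_address ip) { _map[id] = std::move(ip); }

    // "_if_known" flavour: a missing entry is an expected outcome.
    std::optional<inet_address> get_endpoint_if_known(host_id id) const {
        auto it = _map.find(id);
        if (it == _map.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Strict flavour: a missing entry is treated as a caller bug.
    inet_address get_endpoint(host_id id) const {
        auto ip = get_endpoint_if_known(id);
        if (!ip) {
            throw std::logic_error("unknown host_id");
        }
        return *ip;
    }
};
```

Callers that legitimately may see an unknown node use the tolerant variant
and check the result, so the strict variant can stay loud about real bugs.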
2023-12-11 12:51:34 +04:00
Petr Gusev
08b47d645a token_metadata: get_host_id: exception -> on_internal_error
It's a bug to use get_host_id on a non-existent endpoint,
so on_internal_error is more appropriate. Also, it's
easier to debug since it provides a backtrace.

If a missing inet_address is expected, get_host_id_if_known
should be used instead. We update one such case in
storage_service::force_remove_completion. Other
usages of get_host_id are correct.
2023-12-11 12:51:34 +04:00
Petr Gusev
39bbe5f457 token_metadata: add get_all_ips method
This is convenient for migrating code that uses
get_all_endpoints.
2023-12-11 12:51:34 +04:00
Petr Gusev
9edf0709e6 token_metadata: support host_id-based version
In this commit we enhance token_metadata with a pointer to the
new host_id-based generic_token_metadata specialisation (token_metadata2).
The idea is that in the following commits we'll go over all token_metadata
modifications and make the corresponding modifications to its new
host_id-based alternative.

The pointer to token_metadata2 is stored in the
generic_token_metadata::_new_value field. The pointer can be
mutable, immutable, or absent altogether (std::monostate).
It's mutable if this generic_token_metadata owns it, meaning
it was created using the generic_token_metadata(config cfg)
constructor. It's immutable if the
generic_token_metadata(lw_shared_ptr<const token_metadata2> new_value);
constructor was used. This means this old token_metadata is a wrapper for
new token_metadata and we can only use the get_new() method on it. The field
_new_value is empty for the new host_id-based token_metadata version.

The generic_token_metadata(std::unique_ptr<token_metadata_impl<NodeId>> impl, token_metadata2 new_value);
constructor is used for clone methods. We clone both versions,
and we need to pass a cloned token_metadata2 into constructor.

There are two overloads of get_new, for mutable and immutable
generic_token_metadata. Both of them throw an exception if
they can't get the appropriate pointer. There is also a
get_new_strong method, which returns an immutable owning
pointer. This is convenient since a lot of APIs want an
owning pointer. We can't make the get_new/get_new_strong API
simpler and use get_new_strong everywhere since it mutates the
original generic_token_metadata by incrementing the reference
counter, and this causes races when it's passed between
shards in replicate_to_all_cores.
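
The three states of the pointer (absent, mutable, immutable) and the two
get_new overloads can be sketched with std::variant. This is an
illustration only: new_metadata and old_metadata are hypothetical stand-ins,
std::shared_ptr replaces lw_shared_ptr, and the absent state is declared but
never constructed in this sketch.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <variant>

// Hypothetical stand-in for token_metadata2.
struct new_metadata {
    int version = 0;
};

class old_metadata {
    // Absent, owned (mutable), or wrapped (immutable).
    std::variant<std::monostate,
                 std::shared_ptr<new_metadata>,
                 std::shared_ptr<const new_metadata>> _new_value;
public:
    // Owning form: creates its own mutable new-style instance.
    old_metadata() : _new_value(std::make_shared<new_metadata>()) {}

    // Wrapper form: only read access to the new-style instance.
    explicit old_metadata(std::shared_ptr<const new_metadata> v)
        : _new_value(std::move(v)) {}

    // Mutable access; throws unless this object owns the new-style instance.
    new_metadata* get_new() {
        if (auto* p = std::get_if<std::shared_ptr<new_metadata>>(&_new_value)) {
            return p->get();
        }
        throw std::runtime_error("no mutable new-style metadata");
    }

    // Read-only access; works for both owned and wrapped instances.
    const new_metadata* get_new() const {
        if (auto* p = std::get_if<std::shared_ptr<new_metadata>>(&_new_value)) {
            return p->get();
        }
        if (auto* p = std::get_if<std::shared_ptr<const new_metadata>>(&_new_value)) {
            return p->get();
        }
        throw std::runtime_error("no new-style metadata");
    }
};
```

Returning raw pointers from get_new avoids touching the reference count,
which is the property the message relies on when objects cross shards.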
2023-12-11 12:51:34 +04:00
Petr Gusev
63f64f3303 token_metadata: make it a template with NodeId=inet_address/host_id
NodeId is used in all internal token_metadata data structures, that
previously used inet_address. We choose topology::key_kind based
on the value of the template parameter.

generic_token_metadata::update_topology overload with host_id
parameter is added to make update_topology_change_info work,
it now uses NodeId as a parameter type.

topology::remove_endpoint(host_id) is added to make
generic_token_metadata::remove_endpoint(NodeId) work.

pending_endpoints_for and endpoints_for_reading are just removed - they
are not used and not implemented. The declarations were left by mistake
from a refactoring in which these methods were moved to erm.

generic_token_metadata_base is extracted to contain declarations, common
to both token_metadata versions.

Templates are explicitly instantiated inside token_metadata.cc, since
implementation part is also a template and it's not exposed to the header.

There are no other behavioral changes in this commit, just syntax
fixes to make token_metadata a template.
2023-12-11 12:51:34 +04:00
Petr Gusev
c9fbe3d377 locator: make dc_rack_fn a template
In the next commits token_metadata will be
made a template with NodeId=inet_address|host_id
parameter. This parameter will be passed to dc_rack_fn
function, so it also should be made a template.
2023-12-11 12:51:33 +04:00
Piotr Dulikowski
5227b71363 locator/topology: add key_kind parameter
For the host_id-based token_metadata we want host_id
to be the main node key, meaning it should be used
in add_or_update_endpoint to find the node to update.
For the inet_address-based token_metadata version
we want to retain the old behaviour during transition period.

In this commit we introduce key_kind parameter and use
key_kind::inet_address in all current topology usages.
Later we'll use key_kind::host_id for the new token_metadata.

In the last commits of the series, when the new token_metadata
version is used everywhere, we will remove key_kind enum.
2023-12-11 12:51:33 +04:00
Petr Gusev
2f137776c3 token_metadata: topology_change_info: change field types to token_metadata_ptr
In subsequent commits we'll need the following api for token_metadata:
  token_metadata(token_metadata2_ptr);
  get_new() -> token_metadata2*
where token_metadata2 is the new version of token_metadata,
based on host_id.

In other words:
* token_metadata knows the new version of itself and returns a pointer
to it through get_new()
* token_metadata can be constructed based solely on the new version,
without its own implementation. In this case the only method we can
use on it is get_new.

This allows passing token_metadata2 to APIs with token_metadata in the method
signature, if these APIs are known to only use the get_new method on the
passed token_metadata.

And back to topology_change_info - if we got it from the new token_metadata
we want to be able to construct token_metadata from token_metadata2 contained
in it, and this requires it to be a ptr, not value.
2023-12-11 12:51:33 +04:00
Petr Gusev
f21f23483c token_metadata: drop unused method get_endpoint_to_token_map_for_reading 2023-12-11 12:51:22 +04:00
Alexander Turetskiy
f30b5473ab cql: Reject empty options while altering a keyspace
Reject ALTER KEYSPACE request for NetworkTopologyStrategy when
replication options are missing.

Also reject CREATE KEYSPACE with no replication factor options.
Cassandra has a default_keyspace_rf configuration that may allow such
CREATE KEYSPACE commands, but Scylla doesn't have this option (refs #16028).

fixes #10036

Closes scylladb/scylladb#16221
2023-12-10 17:44:35 +02:00
Kefu Chai
818343b57d build: build session.cc in CMake building system
this source file was added in d3d83869. so let's update cmake
as well.

sessions_tests was added in the same commit, so add it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16344
2023-12-09 22:14:47 +02:00
Avi Kivity
d62a5fc60b Merge 'tools/scylla-nodetool: implement additional commands, part 5/N ' from Botond Dénes
This PR implements the following new nodetool commands:
* decommission
* rebuild
* removenode
* getlogginglevels
* setlogginglevel
* move
* refresh

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#16348

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the refresh command
  tools/scylla-nodetool: implement the move command
  tools/scylla-nodetool: implement setlogginglevel command
  tools/sclla-sstable: implement the getlogginglevels command
  tools/scylla-nodetool: implement the removenode command
  tools/scylla-nodetool: implement the rebuild command
  tools/scylla-nodetool: implement the decommission command
2023-12-09 21:47:22 +02:00
Pavel Emelyanov
5e69415387 guardrails: Do not validate initial_tablets as replication factor
When checking replication strategy options the code assumes (and it's
stated in the preceding code comment) that all options are replication
factors. This is no longer the case: initial_tablets is not a
replication factor and should be skipped.
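
A rough sketch of the skip, assuming the options are a name/value map and
the guardrail is a minimum replication factor. The function rf_violations
and the datacenter names used with it are hypothetical and do not reflect
the actual guardrails code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns the names of options whose replication factor is below min_rf.
// "initial_tablets" is not a replication factor and is skipped.
std::vector<std::string> rf_violations(const std::map<std::string, std::string>& options,
                                       int min_rf) {
    std::vector<std::string> bad;
    for (const auto& kv : options) {
        if (kv.first == "initial_tablets") {
            continue;
        }
        if (std::stoi(kv.second) < min_rf) {
            bad.push_back(kv.first);
        }
    }
    return bad;
}
```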

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16335
2023-12-09 15:56:41 +02:00
Kamil Braun
3352d9bccc docs: describe group 0 schema versioning in raft docs 2023-12-08 17:46:31 +01:00
Kamil Braun
30fc36f8d2 test: add test for group 0 schema versioning
Perform schema changes while mixing nodes in RECOVERY mode with nodes in
group 0 mode:
- schema changes originating from RECOVERY node use
  digest-based schema versioning.
- schema changes originating from group 0
  nodes use persisted versions committed through group 0.

Verify that schema versions are in sync after each schema change, and
that each schema change results in a different version.

Also add a simple upgrade test, performing a schema change before we
enable Raft (which also enables the new versioning feature) in the
entire cluster, then once upgrade is finished.

One important upgrade test is missing, which we should add to dtest:
create a cluster in Raft mode but in a Scylla version that doesn't
understand GROUP0_SCHEMA_VERSIONING. Then start upgrading to a version
that has this patchset. Perform schema changes while the cluster is
mixed, both on non-upgraded and on upgraded nodes. Such test is
especially important because we're adding a new column to the
`system.scylla_local` table (which we then redact from the schema
definition when we see that the feature is disabled).
2023-12-08 17:46:31 +01:00
Kamil Braun
7dad31c78f feature_service: enable GROUP0_SCHEMA_VERSIONING in Raft mode
As promised in earlier commits:
Fixes: #7620
Fixes: #13957

Also modify two test cases in `schema_change_test` which depend on
the digest calculation method in their checks. Details are explained in
the comments.
2023-12-08 17:46:31 +01:00
Kamil Braun
522540da40 schema_tables: don't delete version cell from scylla_tables mutations from group 0
As explained in the previous commit, we use the new
`committed_by_group0` flag attached to each row of a `scylla_tables`
mutation to decide whether the `version` cell needs to be deleted or
not.

The rest of #13957 is solved by pre-existing code -- if the `version`
column is present in the mutation, we don't calculate a hash for
`schema::version()`, but take the value from the column:

```
table_schema_version schema_mutations::digest(db::schema_features sf) const {
    if (_scylla_tables) {
        auto rs = query::result_set(*_scylla_tables);
        if (!rs.empty()) {
            auto&& row = rs.row(0);
            auto val = row.get<utils::UUID>("version");
            if (val) {
                return table_schema_version(*val);
            }
        }
    }

    ...
```

The issue will therefore be fixed once we enable
`GROUP0_SCHEMA_VERSIONING`.
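
The decision described above can be sketched roughly as follows. This is a simplified model, not the actual schema_tables code: `scylla_tables_row`, `prepare_for_merge()` and `table_version()` are hypothetical names, and the row is reduced to the two cells that matter here.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for one row of a system.scylla_tables mutation.
struct scylla_tables_row {
    std::optional<std::string> version;  // persisted schema version, if any
    bool committed_by_group0 = false;    // set when the change went through group 0
};

// During schema merge: keep the `version` cell only for group 0 changes.
inline void prepare_for_merge(scylla_tables_row& row) {
    if (!row.committed_by_group0) {
        row.version.reset();  // forces the digest to be recalculated later
    }
}

// Use the persisted version if the cell survived the merge, else fall back to a digest.
inline std::string table_version(const scylla_tables_row& row,
                                 const std::function<std::string()>& compute_digest) {
    return row.version ? *row.version : compute_digest();
}
```

With this shape, a change made in RECOVERY mode loses its `version` cell and ends up with a digest, while a group 0 change keeps the committed version.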
2023-12-08 17:46:31 +01:00
Kamil Braun
defcf9915c migration_manager: add committed_by_group0 flag to system.scylla_tables mutations
As described in #13957, when creating or altering a table in group 0
mode, we don't want each node to calculate `schema::version()`s
independently using a hash algorithm. Instead, we want all nodes to
use a single version for that table, committed by the group 0 command.

There's even a column ready for this in `system.scylla_tables` --
`version`. This column is currently being set for system tables, but
it's not being used for user tables.

Similarly to what we did with global schema version in earlier commits,
the obvious thing to do would be to include a live cell for the `version`
column in the `system.scylla_tables` mutation when we perform the schema
change in Raft mode, and to include a tombstone when performing it
outside of Raft mode, for the RECOVERY case.

But it's not that simple because as it turns out, we're *already*
sending a `version` live cell (and also a tombstone, with timestamp
decremented by 1) in all `system.scylla_tables` mutations. But then we
delete that cell when doing schema merge (which begs the question
why were we sending it in the first place? but I digress):
```
        // We must force recalculation of schema version after the merge, since the resulting
        // schema may be a mix of the old and new schemas.
        delete_schema_version(mutation);
```
the above function removes the `version` cell from the mutation.

So we need another way of distinguishing the cases of schema change
originating from group 0 vs outside group 0 (e.g. RECOVERY).

The method I chose is to extend `system.scylla_tables` with a boolean
column, `committed_by_group0`, and extend schema mutations to set
this column.

In the next commit we'll decide whether or not the `version` cell should
be deleted based on the value of this new column.
2023-12-08 17:46:31 +01:00
Kamil Braun
87b2c8a041 schema_tables: use schema version from group 0 if present
As promised in the previous commit, if we persisted a schema version
through a group 0 command, use it after a schema merge instead of
calculating a digest.

Ref: #7620

The above issue will be fixed once we enable the
`GROUP0_SCHEMA_VERSIONING` feature.
2023-12-08 17:46:31 +01:00
Kamil Braun
3db8ac80cb migration_manager: store group0_schema_version in scylla_local during schema changes
We extend schema mutations with an additional mutation to the
`system.scylla_local` table which:
- in Raft mode, stores a UUID under the `group0_schema_version` key.
- outside Raft mode, stores a tombstone under that key.

As we will see in later commits, nodes will use this after applying
schema mutations. If the key is absent or has a tombstone, they'll
calculate the global schema digest on their own -- using the old way. If
the key is present, they'll take the schema version from there.

The Raft-mode schema version is equal to the group 0 state ID of this
schema command.

The tombstone is necessary for the case of performing a schema change in
RECOVERY mode. It will force a revert to the old digest-based way.

Note that extending schema mutations with a `system.scylla_local`
mutation is possible thanks to earlier commits which moved
`system.scylla_local` to schema commitlog, so all mutations in the
schema mutations vector still go to the same commitlog domain.

Also, since we introduce a replicated tombstone to
`system.scylla_local`, we need to set GC grace to nonzero. We set it to
`schema_gc_grace`, which makes sense given the use case.
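
The lookup rule stated above (absent key or tombstone means digest, live value means use it) can be sketched like this. `local_cell` and `global_schema_version()` are hypothetical names used only for illustration; the real code reads the cell from `system.scylla_local`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the `group0_schema_version` cell in system.scylla_local.
struct local_cell {
    bool tombstone = false;            // written outside Raft mode (e.g. RECOVERY)
    std::optional<std::string> value;  // group 0 state ID, written in Raft mode
};

// Pick the global schema version: the persisted value if it is live,
// otherwise the digest calculated the old way.
inline std::string global_schema_version(const std::optional<local_cell>& cell,
                                         const std::string& digest) {
    if (cell && !cell->tombstone && cell->value) {
        return *cell->value;
    }
    return digest;
}
```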
2023-12-08 17:45:41 +01:00
Botond Dénes
496459165e tools/scylla-nodetool: implement the refresh command 2023-12-08 08:58:16 -05:00
Botond Dénes
ad148a9dbc tools/scylla-nodetool: implement the move command
In the java nodetool, this command ends up calling an API endpoint which
just throws an exception saying moving tokens is not supported. So in
the native implementation we just throw an exception to the same effect
in scylla-nodetool itself.
2023-12-08 08:29:39 -05:00
Botond Dénes
58d3850da1 tools/scylla-nodetool: implement setlogginglevel command 2023-12-08 08:18:56 -05:00
Botond Dénes
3a8590e1af tools/sclla-sstable: implement the getlogginglevels command 2023-12-08 07:32:45 -05:00
Botond Dénes
c35ed794de tools/scylla-nodetool: implement the removenode command 2023-12-08 07:32:31 -05:00
Botond Dénes
9a484cb145 tools/scylla-nodetool: implement the rebuild command 2023-12-08 07:05:30 -05:00
Botond Dénes
ea62f7c848 tools/scylla-nodetool: implement the decommission command 2023-12-08 06:14:36 -05:00
Kefu Chai
893f319004 sstables: add formatter for index_consume_entry_context_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, in order to enable the code in the header to
access the formatter without being moved down after the full specialization's
definition, we

* move the enum definition out of the class and before the
  class,
* rename the enum's name from state to index_consume_entry_context_state
* define a formatter for index_consume_entry_context_state
* remove its operator<<().

as fmt v10 is able to use `format_as()` as a fallback, the formatter
full specialization is guarded with `#if FMT_VERSION < 10'00'00`. we
will remove it after we start build with fmt v10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16204
2023-12-08 12:45:38 +02:00
Kurashkin Nikita
c071cd92b5 cql3:statement_restrictions.cc add more conditions to prevent "allow filtering" error to pop up in delete/update statements
Modified Cassandra tests to check for Scylla's error messages
Fixes #12474

Closes scylladb/scylladb#15811
2023-12-07 21:25:18 +02:00
Avi Kivity
9c0f05efa1 Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec
Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later.

This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted.

The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained.

The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In that case it should not be blocked for long, because the streaming process frequently checks whether the guard was left behind and stops if it was.

This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas.
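
As a rough illustration of the fencing idea, here is a single-threaded sketch of a per-replica session registry. All names (`session_registry`, `enter()`, `leave()`, `drained()`) are hypothetical and the real implementation is asynchronous; the sketch only shows the admit / close / drain relationship.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-session bookkeeping.
struct session_state {
    bool closed = false;
    int in_flight = 0;
};

// Hypothetical per-replica registry, keyed by session UUID (kept as text here).
class session_registry {
    std::map<std::string, session_state> _sessions;
public:
    void create(const std::string& id) { _sessions.try_emplace(id); }

    // Admit an operation under the session; refused if the session is unknown or closed.
    bool enter(const std::string& id) {
        auto it = _sessions.find(id);
        if (it == _sessions.end() || it->second.closed) {
            return false;
        }
        ++it->second.in_flight;
        return true;
    }

    // An admitted operation has finished.
    void leave(const std::string& id) {
        auto it = _sessions.find(id);
        if (it != _sessions.end() && it->second.in_flight > 0) {
            --it->second.in_flight;
        }
    }

    // Fence: no new operations are admitted after this.
    void close(const std::string& id) {
        auto it = _sessions.find(id);
        if (it != _sessions.end()) {
            it->second.closed = true;
        }
    }

    // What a barrier would wait for: closed sessions have no admitted work left.
    bool drained() const {
        for (const auto& kv : _sessions) {
            if (kv.second.closed && kv.second.in_flight > 0) {
                return false;
            }
        }
        return true;
    }
};
```

Work that arrives late (e.g. an RPC that was still on the wire) fails `enter()` after `close()`, and work admitted earlier keeps `drained()` false until it calls `leave()`.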

Closes scylladb/scylladb#15847

* github.com:scylladb/scylladb:
  test: tablets: Add test for failed streaming being fenced away
  error_injection: Introduce poll_for_message()
  error_injection: Make is_enabled() public
  api: Add API to kill connection to a particular host
  range_streamer: Do not block topology change barriers around streaming
  range_streamer, tablets: Do not keep token metadata around streaming
  tablets: Fail gracefully when migrating tablet has no pending replica
  storage_service, api: Add API to disable tablet balancing
  storage_service, api: Add API to migrate a tablet
  storage_service, raft topology: Run streaming under session topology guard
  storage_service, tablets: Use session to guard tablet streaming
  tablets: Add per-tablet session id field to tablet metadata
  service: range_streamer: Propagate topology_guard to receivers
  streaming: Always close the rpc::sink
  storage_service: Introduce concept of a topology_guard
  storage_service: Introduce session concept
  tablets: Fix topology_metadata_guard holding on to the old erm
  docs: Document the topology_guard mechanism
2023-12-07 16:29:02 +02:00
Avi Kivity
4b1ef00dbb Merge 'File stream for tablet preparation' from Asias He
This series adds preparation patches for file stream tablet implementation in enterprise branch. It minimizes the differences between those two branches.

Closes scylladb/scylladb#16297

* github.com:scylladb/scylladb:
  messaging_service: Introduce STREAM_BLOB and TABLET_STREAM_FILES verb
  compaction_group_for_token: Handle minimum_token and maximum_token token
  serializer: Add temporary_buffer support
  cql_test_env: Allow messaging_service to start listen
2023-12-07 16:26:22 +02:00
Pavel Emelyanov
3eaadfcd4a database: Do not auto snapshot non-local storages' tables
Snapshotting is not yet supported for those (see #13025) and
auto-snapshot would step on internal error. Skip it and print a warning
into logs

fixes #16078

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-07 13:47:12 +03:00
Avi Kivity
ed2a9b8750 Merge 'Commitlog: Fix reading/writing position calculations and allocation size checks' from Calle Wilund
Fixes #16298

The adjusted buffer position calculation in buffer_position(), introduced in https://github.com/scylladb/scylladb/pull/15494
was in fact broken. It calculated (like previously) a "position" based on diff between
underlying buffer size and ostream size() (i.e. avail), then adjusted this according to
sector overhead rules.

However, the underlying buffer size is in unadjusted terms, and the ostream is adjusted.
The two cannot be compared as such, which means the "positions" we get here are borked.

Luckily for us (sarcasm), the position calculation in replayer made a similar error,
in that it adjusts up current position by one sector overhead too much, leading to us
more or less getting the same, erroneous results in both ends.

However, when/iff one needs to adjust the segment file format further, one might very
quickly realize that this does not work well if, say, one needs to be able to safely
read some extra bytes before first chunk in a segment. Conversely, trying to adjust
this also exposes a latent potential error in the skip mechanism, manifesting here.

Issue fixed by keeping track of the initial ostream capacity for segment buffer, and
use this for position calculation, and in the case of replayer, move file pos adjustment
from read_data() to subroutine (shared with skipping), that better takes data stream
position vs. file position adjustment. In implementation terms, we first inc the
"data stream" pos (i.e. pos in data without overhead), then adjust for overhead.

Also fix replayer::skip, so that we handle the buffer/pos relation correctly now.

Added test for initial entry position, as well as data replay consistency for single
entry_writer paths.

Fixes #16301

The calculation on whether data may be added is based on position vs. size of incoming data.
However, it did not take sector overhead into account, which led us to writing past allowed
segment end, which in turn also leads to metrics overflows.

Closes scylladb/scylladb#16302

* github.com:scylladb/scylladb:
  commitlog: Fix allocation size check to take sector overhead into account.
  commitlog: Fix commitlog_segment::buffer_position() calculation and replay counterpart
2023-12-07 12:27:54 +02:00
Pavel Emelyanov
44c076472c database: Simplify snapshot booleans in truncate_table_on_all_shards()
There are three of them in this function -- with_snapshot argument,
auto_snapshot local copy of db::config option and the should_snapshot
local variable that's && of the above two. The code can go with just one

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-07 13:06:28 +03:00
Botond Dénes
fb9379edf1 test/cql-pytest: test_select_from_mutation_fragments: bump timeout for slow test
The test test_many_partitions is very slow, as it tests a slow scan over
a lot of partitions. This was observed to time out on the slower ARM
machines, making the test flaky. To prevent this, create an
extra-patient cql connection with a 10 minutes timeout for the scan
itself.

Fixes: #16145

Closes scylladb/scylladb#16303
2023-12-07 11:55:53 +02:00
Yaniv Kaul
862909ee4f Typos: fix typos in documentation
Using codespell, went over the docs and fixed some typos.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#16275
2023-12-07 11:10:17 +02:00
Anna Stuchlik
8b01cb7fb8 doc: set 5.4 as the latest stable version
This commit updates the configuration for
ScyllaDB documentation so that:
- 5.4 is the latest version.
- 5.4 is removed from the list of unstable versions.

It must be merged when ScyllaDB 5.4 is released.

No backport is required.

Closes scylladb/scylladb#16308
2023-12-07 10:04:26 +02:00
Pavel Emelyanov
76705b6ba2 test/s3: Avoid object range overflow
There's a test case that validates the uploading sink by getting random
portions of the uploaded object. The portions are generated as

   len = random % chunk_size
   off = random % file_size - len

The latter may apparently render negative value which will translate
into huuuuge 64-bit range offset which, in turn, would result in invalid
http range specifier and getting object part fails with status OK
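
One way to keep the generated range inside the object is sketched below. This is not necessarily how the patch does it; `pick_range()` is a hypothetical helper and `r1`/`r2` stand in for the two random draws.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns {offset, length} with offset + length <= file_size.
// Requires file_size > 0 and chunk_size > 0.
inline std::pair<std::uint64_t, std::uint64_t> pick_range(std::uint64_t r1, std::uint64_t r2,
                                                          std::uint64_t file_size,
                                                          std::uint64_t chunk_size) {
    assert(file_size > 0 && chunk_size > 0);
    std::uint64_t len = r1 % chunk_size;
    if (len > file_size) {
        len = file_size;
    }
    // Draw the offset from [0, file_size - len], so no subtraction can wrap around.
    std::uint64_t off = r2 % (file_size - len + 1);
    return {off, len};
}
```

By contrast, computing `r2 % file_size - len` in unsigned 64-bit arithmetic wraps around to a huge value whenever the remainder is smaller than `len`.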

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-07 10:54:54 +03:00
Pavel Emelyanov
3e9309caf4 s3/client: Handle GET-with-Range overflows correctly
The get_object_contiguous() accepts an optional range argument in the form of
offset:length and then converts it into first_byte:last_byte pair to
satisfy http's Range header range-specifier.

If the last_byte, which is offset + length - 1, overflows 64 bits, the
range specifier becomes invalid. According to RFC9110 servers may ignore
invalid ranges if they want to and this is what minio does.

The result is pretty interesting. Since the range is specified, the client
expects a PartialContent response, but since the range is ignored by the server
the result is OK, as if the full object was requested. So instead of
some sane "overflow" error, the get_object_contiguous() fails with
status "success".

The fix is in pre-checking provided ranges and failing early
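
A minimal sketch of such a pre-check follows. `to_byte_range()` is a hypothetical helper, not the client's actual API, and treating a zero length as invalid is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <utility>

// Converts offset:length into {first_byte, last_byte}.
// Returns nullopt if the range is empty or last_byte would overflow 64 bits.
inline std::optional<std::pair<std::uint64_t, std::uint64_t>>
to_byte_range(std::uint64_t offset, std::uint64_t length) {
    constexpr auto max = std::numeric_limits<std::uint64_t>::max();
    // Compare against max - (length - 1) so the check itself cannot overflow.
    if (length == 0 || offset > max - (length - 1)) {
        return std::nullopt;
    }
    return std::make_pair(offset, offset + length - 1);
}
```

The caller can turn the empty result into an error before any request is sent, instead of relying on how the server treats an invalid range-specifier.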

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-07 10:50:55 +03:00
Calle Wilund
dba39b47bd commitlog: Fix allocation size check to take sector overhead into account.
Fixes #16301

The calculation on whether data may be added is based on position vs. size of incoming data.
However, it did not take sector overhead into account, which led us to writing past allowed
segment end, which in turn also leads to metrics overflows.
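
To illustrate why the overhead matters for the check, here is a sketch under an assumed layout: every sector of `sector_size` bytes reserves `overhead` bytes for metadata. The names and the numbers in the usage note are hypothetical; the real commitlog format and code differ in detail.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout: each sector reserves `overhead` bytes, the rest is payload.
struct sector_layout {
    std::size_t sector_size;
    std::size_t overhead;
    std::size_t payload() const { return sector_size - overhead; }
};

// On-disk bytes needed to hold `data_bytes` of payload, rounded up to whole sectors.
inline std::size_t on_disk_size(const sector_layout& l, std::size_t data_bytes) {
    assert(l.sector_size > l.overhead);
    std::size_t sectors = (data_bytes + l.payload() - 1) / l.payload();
    return sectors * l.sector_size;
}

// True if `incoming` more payload bytes still fit in a segment of `segment_size` bytes.
inline bool fits(const sector_layout& l, std::size_t used_data, std::size_t incoming,
                 std::size_t segment_size) {
    return on_disk_size(l, used_data + incoming) <= segment_size;
}
```

For example, with 512-byte sectors and 12 bytes of overhead, a 1024-byte segment holds only 1000 payload bytes, so a check that compares raw payload size against 1024 would admit writes that do not fit.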
2023-12-07 07:36:27 +00:00
Calle Wilund
0d35c96ef4 commitlog: Fix commitlog_segment::buffer_position() calculation and replay counterpart
Fixes #16298

The adjusted buffer position calculation in buffer_position(), introduced in #15494
was in fact broken. It calculated (like previously) a "position" based on diff between
underlying buffer size and ostream size() (i.e. avail), then adjusted this according to
sector overhead rules.

However, the underlying buffer size is in unadjusted terms, and the ostream is adjusted.
The two cannot be compared as such, which means the "positions" we get here are borked.

Luckily for us (sarcasm), the position calculation in replayer made a similar error,
in that it adjusts up current position by one sector overhead too much, leading to us
more or less getting the same, erroneous results in both ends.

However, when/iff one needs to adjust the segment file format further, one might very
quickly realize that this does not work well if, say, one needs to be able to safely
read some extra bytes before first chunk in a segment. Conversely, trying to adjust
this also exposes a latent potential error in the skip mechanism, manifesting here.

Issue fixed by keeping track of the initial ostream capacity for segment buffer, and
use this for position calculation, and in the case of replayer, move file pos adjustment
from read_data() to subroutine (shared with skipping), that better takes data stream
position vs. file position adjustment. In implementation terms, we first inc the
"data stream" pos (i.e. pos in data without overhead), then adjust for overhead.

Also fix replayer::skip, so that we handle the buffer/pos relation correctly now.

Added test for initial entry position, as well as data replay consistency for single
entry_writer paths.
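
The "advance the data stream position, then adjust for overhead" order can be sketched as below. The layout is an assumption made for illustration (overhead bytes at the end of each sector), and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Maps a position in the data stream (payload only) to a position in the file,
// assuming each sector ends with `overhead` bytes that are not payload.
inline std::size_t data_to_file_pos(std::size_t data_pos, std::size_t sector_size,
                                    std::size_t overhead) {
    assert(sector_size > overhead);
    const std::size_t payload = sector_size - overhead;
    return (data_pos / payload) * sector_size + data_pos % payload;
}

// Advance by `n` payload bytes: first move the data stream position,
// then derive the file position from it.
inline std::size_t advance(std::size_t& data_pos, std::size_t n, std::size_t sector_size,
                           std::size_t overhead) {
    data_pos += n;
    return data_to_file_pos(data_pos, sector_size, overhead);
}
```

Deriving the file position from a single data stream position means the writer and the replayer cannot drift apart by a sector's worth of overhead, as long as both use the same mapping.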
2023-12-07 07:36:27 +00:00
Asias He
6beadab9e6 messaging_service: Introduce STREAM_BLOB and TABLET_STREAM_FILES verb
They will be used to implement file stream for tablet in the future. Reserve
the verb ID.
2023-12-07 14:54:12 +08:00
Asias He
67cfa12c7d compaction_group_for_token: Handle minimum_token and maximum_token token
The following error was seen:

[shard 0] table - compaction_group_for_token: compaction_group idx=0 range=(minimum
token,-6917529027641081857] does not contain token=minimum token

Since minimum_token and maximum_token are never inside a token range, skip
the in-range check for them.
2023-12-07 14:54:12 +08:00
Asias He
974b28a750 serializer: Add temporary_buffer support
It will be used by file stream for tablet.
2023-12-07 09:46:37 +08:00
Asias He
faaf58f62c cql_test_env: Allow messaging_service to start listen
This is needed for RPC calls to work in the tests. With this patch,
messaging_service still does not listen by default, as before.

This is useful for file stream for tablet test.
2023-12-07 09:46:36 +08:00
Avi Kivity
92d61def57 Merge 'scylla_swap_setup: run error check before allocating swap and increase swap allocation speed' from Takuya ASADA
This series fixes the error check and speeds up swap allocation.

Following patches are included:
 - scylla_swap_setup: run error check before allocating swap
   avoid creating the swapfile before running the error check
 - scylla_swap_setup: use fallocate on ext4
   this increases swap allocation speed on ext4

Closes scylladb/scylladb#12668

* github.com:scylladb/scylladb:
  scylla_swap_setup: use fallocate on ext4
  scylla_swap_setup: run error check before allocating swap
2023-12-06 21:40:10 +02:00
Avi Kivity
55dacb8480 Merge 'Generalize atomic sstables deletion' from Pavel Emelyanov
The current implementation starts in sstables_manager that gets the deletion function from storage which, in turn, should atomically do sst.unlink() over a list of sstables (s3 driver is still not atomic though #13567).

This PR generalizes the atomic deletion inside sstables_manager method and removes the atomic deletor function that nobody liked when it was introduced (#13562)

Closes scylladb/scylladb#16290

* github.com:scylladb/scylladb:
  sstables/storage: Drop atomic deleter
  sstables/storage: Reimplement atomic deletion in sstables_manager
  sstables/storage: Add prepare/complete skaffold for atomic deletion
2023-12-06 19:48:07 +02:00
Tomasz Grabiec
7d0f4c10a2 test: tablets: Add test for failed streaming being fenced away 2023-12-06 18:37:01 +01:00
Tomasz Grabiec
083a0279a9 error_injection: Introduce poll_for_message()
To allow more complex waiting, which involves other exit conditions.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
ce0dc9e940 error_injection: Make is_enabled() public 2023-12-06 18:36:17 +01:00
Tomasz Grabiec
733eb21601 api: Add API to kill connection to a particular host
For testing failure scenarios.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
9dac0febce range_streamer: Do not block topology change barriers around streaming
Streaming was keeping effective_replication_map_ptr around the whole
process, which blocks topology change barriers.

This will inhibit progress of tablet load balancer or concurrent
migrations, resulting in worse performance.

Fix by switching to the most recent erm on sharder
calls. multishard_writer calls shard_of() for each new partition.

A better way would be to switch immediately when topology version
changes, but this is left for later.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
c228f2c940 range_streamer, tablets: Do not keep token metadata around streaming
It holds back global token metadata barrier during streaming, which
limits parallelism of load balancing.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
7a59acf248 tablets: Fail gracefully when migrating tablet has no pending replica
Before the patch we SIGSEGV trying to access pending replica in this
case. Fail early instead.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
d1c1b59236 storage_service, api: Add API to disable tablet balancing
Load balancing needs to be disabled before making a series of manual
migrations so that we don't fight with the load balancer.

Also will be used in tests to ensure tablets stick to expected locations.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
1f57d1ea28 storage_service, api: Add API to migrate a tablet
Will be used in tests, or for hot fixes in production.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
31c995332c storage_service, raft topology: Run streaming under session topology guard
Prevents stale streaming operations from running beyond the topology
operation they were started in. After the session field is cleared, or
changed to something else, the old topology_guard used by streaming is
interrupted and fenced and the next barrier will join with any
remaining work.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
080169cad6 storage_service, tablets: Use session to guard tablet streaming 2023-12-06 18:36:17 +01:00
Tomasz Grabiec
5381792401 tablets: Add per-tablet session id field to tablet metadata
range_streamer will pick it up when creating topology_guard.

It's materialized in memory only for migrating tablets in
tablet_transition_info.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
fd3c089ccc service: range_streamer: Propagate topology_guard to receivers 2023-12-06 18:36:16 +01:00
Tomasz Grabiec
063095ea50 streaming: Always close the rpc::sink
rpc::sink::~sink aborts if not closed. There is a try/catch clause
which ensures that close() is called, but there was code after sink is
created which is not covered by it. Move sink construction past that
code.
2023-12-06 18:35:41 +01:00
Nadav Har'El
300e549267 tablets, mv: disable self-pairing when tablets are used
A write to a base table can generate one or more writes to a materialized
view. The write to RF base replicas need to cause writes to RF view
replicas. Our MV implementation, based on Cassandra's implementation,
does this via "pairing": Each one of the base replicas involved in this
write sends each view update to exactly one view replica. The function
get_view_natural_endpoint() tells a base replica which of the view
replicas it should send the update to.

The standard pairing is based on the ring order: The first owner of the
base token sends to the first owner of the view token, the second to the
second, and so on. However, the existing code also uses an optimization
we call self-pairing: If a single node is both a base replica and a view
replica, the pairing is modified so this node sends the update to itself.

This patch *disables* the self-pairing optimization in keyspaces that
use tablets:

The self-pairing optimization can cause the pairing to change after
token ranges are moved between nodes, so it can break base-view consistency
in some edge cases, leading to "ghost rows". With tablets, these range
movements become even more frequent - they can happen even if the
cluster doesn't grow.  This is why we want to solve this problem for tablets.

For backward compatibility and to avoid sudden inconsistencies emerging
during upgrades, we decided to continue using the self-pairing optimization
for keyspaces that are *not* using tablets (i.e., using vnodes).

Currently, we don't introduce a "CREATE MATERIALIZED VIEW" option to
override these defaults - i.e., we don't provide a way to disable
self-pairing with vnodes or to enable it with tablets. We could introduce
such a schema flag later, if we ever want to (and I'm not sure we want to).

It's important to note, that in some cases, this change has implications
on when view updates become synchronous, in the tablets case.
For example:

  * If we have 3 nodes and RF=3, with the self-pairing optimization each
    node is paired with itself, the view update is local, and is
    implicitly synchronous (without requiring a "synchronous_updates"
    flag).
  * In the same setup with tablets, without the self-pairing optimization
    (due to this patch), this is not guaranteed. Some view updates may not
    be synchronous, i.e., the base write will not wait for the view
    write. If the user really wants synchronous updates, they should
    be requested explicitly, with the "synchronous_updates" view option.
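
The two pairing modes described above can be sketched as follows. This is a simplified model and not the actual get_view_natural_endpoint(): replicas are plain strings in ring order, `paired_view_replica()` is a hypothetical name, and datacenter/rack handling is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using node = std::string;

inline bool contains(const std::vector<node>& v, const node& n) {
    return std::find(v.begin(), v.end(), n) != v.end();
}

// Returns the view replica that base replica `me` sends its update to,
// or nullopt if `me` is not a base replica or has no partner.
// `base` and `view` list the replicas in ring order.
inline std::optional<node> paired_view_replica(std::vector<node> base, std::vector<node> view,
                                               const node& me, bool self_pairing) {
    if (self_pairing) {
        if (contains(base, me) && contains(view, me)) {
            return me;  // a node that is both base and view replica pairs with itself
        }
        // Nodes present in both lists pair with themselves, so drop them
        // before pairing the remaining nodes by order.
        std::vector<node> b, v;
        for (const auto& n : base) {
            if (!contains(view, n)) b.push_back(n);
        }
        for (const auto& n : view) {
            if (!contains(base, n)) v.push_back(n);
        }
        base = std::move(b);
        view = std::move(v);
    }
    auto it = std::find(base.begin(), base.end(), me);
    if (it == base.end()) {
        return std::nullopt;
    }
    auto idx = static_cast<std::size_t>(it - base.begin());
    if (idx >= view.size()) {
        return std::nullopt;
    }
    return view[idx];
}
```

With base replicas {A, B, C} and view replicas {C, A, B}, node A sends to itself when self-pairing is on and to C when it is off, which is why the update stops being implicitly local in the tablets case.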

Fixes #16260.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16272
2023-12-06 17:11:17 +02:00
Kefu Chai
f483309165 compaction, api: drop unused functions
run_on_existing_tables() is not used at all. and we have two of them.
in this change, let's drop them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16304
2023-12-06 14:31:08 +02:00
Takuya ASADA
f90c10260f scylla_post_install.sh: Add CAP_PERFMON to AmbientCapabilities
Add CAP_PERFMON to AmbientCapabilities in capabilities.conf, to enable
perf_event based stall detector in Seastar.

However, on Debian/Ubuntu, CAP_PERFMON with a non-root user does not work
because the distribution sets kernel.perf_event_paranoid=4, which disallows all
non-root user access.
(On Debian it sets kernel.perf_event_paranoid=3.)
So we need to configure kernel.perf_event_paranoid=2 on these distros.
see: https://askubuntu.com/questions/1400874/what-does-perf-paranoia-level-four-do

Also, CAP_PERFMON is only available on linux-5.8+; older kernels do not
have this capability.
To support older kernel environments such as CentOS7, we need to configure
kernel.perf_event_paranoid=1 to allow non-root user access even without
the capability.

Fixes #15743

Closes scylladb/scylladb#16070
2023-12-06 13:53:08 +02:00
Avi Kivity
3e8f37f0a4 Update seastar submodule
* seastar 55a821524d...ae8449e04f (22):
  > Revert "Merge 'reactor: merge pollfn on I/O paths into a single one' from Kefu Chai"
  > http/exception: Make unexpected status message more informative
  > docker: bump up to clang {16,17} and gcc {12,13}
  > doc: replace space (0xA0) in unicode with ASCII space (0x20)
  > file: Remove reactor class friendship
  > dpdk: adjust for poller in internal namespace
  > http: make_requests accept optional expected
  > Merge 'future: future_state_base: assert owner shard in debug mode' from Benny Halevy
  > Merge 'Keep pollers in internal/poll.hh' from Pavel Emelyanov
  > sharded: access instance promise only on instance shard
  > test: network_interface_test: add tests for format and parse
  > Merge 'reactor: merge pollfn on I/O paths into a single one' from Kefu Chai
  > reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc (v2)
  > reactor: set local_engine after it is fully initialized
  > build: do not error when running into GCC BZ-1017852
  > Merge 'shared_future: make available() immediate after set_value()' from Piotr Dulikowski
  > tls: add format_as(subject_alt_name_type) overload
  > tls: linearize small packets on send
  > shared_future: remove unused #include
  > shared_ptr: add fmt::formatter for shared_ptr types
  > lazy: add fmt::formatter for lazy_eval types
  > Merge 'file: use unbuffered generator in experimental_list_directory()' from Kefu Chai

Closes scylladb/scylladb#16274
2023-12-06 13:24:53 +02:00
Kamil Braun
9b73bff752 docs: raft: mention unavailability for topology changes under quorum loss
Closes scylladb/scylladb#16307
2023-12-06 13:18:28 +02:00
Botond Dénes
56c3515751 Merge 'doc: fix Rust Driver release information' from Anna Stuchlik
This PR removes the incorrect information that the ScyllaDB Rust Driver is not GA.

In addition, it replaces "Scylla" with "ScyllaDB".

Fixes https://github.com/scylladb/scylladb/issues/16178

(nobackport)

Closes scylladb/scylladb#16199

* github.com:scylladb/scylladb:
  doc: remove the "preview" label from Rust driver
  doc: fix Rust Driver release information
2023-12-06 08:59:49 +02:00
Botond Dénes
d2a88cd8de Merge 'Typos: fix typos in code' from Yaniv Kaul
Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255

Closes scylladb/scylladb#16289

* github.com:scylladb/scylladb:
  Update unified/build_unified.sh
  Update main.cc
  Update dist/common/scripts/scylla-housekeeping
  Typos: fix typos in code
2023-12-06 07:36:41 +02:00
Avi Kivity
12f160045b Merge 'Get rid of fb_utilities' from Benny Halevy
utils::fb_utilities is a global in-memory registry for storing and retrieving broadcast_address and broadcast_rpc_address.
As part of the effort to get rid of all global state, this series gets rid of fb_utilities.
This will eventually allow e.g. cql_test_env to instantiate multiple scylla server nodes, each serving on its own address.

Closes scylladb/scylladb#16250

* github.com:scylladb/scylladb:
  treewide: get rid of now unused fb_utilities
  tracing: use locator::topology rather than fb_utilities
  streaming: use locator::topology rather than fb_utilities
  raft: use locator::topology/messaging rather than fb_utilities
  storage_service: use locator::topology rather than fb_utilities
  storage_proxy: use locator::topology rather than fb_utilities
  service_level_controller: use locator::topology rather than fb_utilities
  misc_services: use locator::topology rather than fb_utilities
  migration_manager: use messaging rather than fb_utilities
  forward_service: use messaging rather than fb_utilities
  messaging_service: accept broadcast_addr in config rather than via fb_utilities
  messaging_service: move listen_address and port getters inline
  test: manual: modernize message test
  table: use gossiper rather than fb_utilities
  repair: use locator::topology rather than fb_utilities
  dht/range_streamer: use locator::topology rather than fb_utilities
  db/view: use locator::topology rather than fb_utilities
  database: use locator::topology rather than fb_utilities
  db/system_keyspace: use topology via db rather than fb_utilities
  db/system_keyspace: save_local_info: get broadcast addresses from caller
  db/hints/manager: use locator::topology rather than fb_utilities
  db/consistency_level: use locator::topology rather than fb_utilities
  api: use locator::topology rather than fb_utilities
  alternator: ttl: use locator::topology rather than fb_utilities
  gossiper: use locator::topology rather than fb_utilities
  gossiper: add get_this_endpoint_state_ptr
  test: lib: cql_test_env: pass broadcast_address in cql_test_config
  init: get_seeds_from_db_config: accept broadcast_address
  locator: replication strategies: use locator::topology rather than fb_utilities
  locator: topology: add helpers to retrieve this host_id and address
  snitch: pass broadcast_address in snitch_config
  snitch: add optional get_broadcast_address method
  locator: ec2_multi_region_snitch: keep local public address as member
  ec2_multi_region_snitch: reindent load_config
  ec2_multi_region_snitch: coroutinize load_config
  ec2_snitch: reindent load_config
  ec2_snitch: coroutinize load_config
  thrift: thrift_validation: use std::numeric_limits rather than fb_utilities
2023-12-05 19:40:14 +02:00
Eliran Sinvani
d1aaca893c install-dependencies.sh: Complete the pip install logic
install-dependencies.sh includes a list of pip packages that the build
environment requires.
This functionality was added in
729d0feef0, however, the actual use of the
list is missing and instead the `pip install` commands are hard coded
into the logic.

This change completes the transition to the pip packages list.
It also modifies the `pip_packages` array to include a
constraint (if needed) for every package.

Fixes #16269

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes scylladb/scylladb#16282
2023-12-05 16:35:31 +02:00
Benny Halevy
0bcce35abd treewide: get rid of now unused fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 16:22:49 +02:00
Benny Halevy
f8a957898b tracing: use locator::topology rather than fb_utilities
Get my_address via query_processor->proxy and pass it
to all static make_ methods, instead of getting it from
utils::fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 16:22:15 +02:00
Benny Halevy
6f7de427f0 streaming: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 16:12:11 +02:00
Anna Stuchlik
409e20e5ab doc: enabling experimental Raft-managed topology
This commit adds a short paragraph to the Raft
page to explain how to enable consistent
topology updates with Raft - an experimental
feature in version 5.4.

The paragraph should satisfy the requirements
for version 5.4. The Raft page will be
rewritten in the next release when consistent
topology changes with Raft will be GA.

Fixes https://github.com/scylladb/scylladb/issues/15080

Requires backport to branch-5.4.

Closes scylladb/scylladb#16273
2023-12-05 14:49:17 +01:00
Pavel Emelyanov
b9abd504be sstables/storage: Drop atomic deleter
Now the deleter function is not in use and can be dropped

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 16:47:52 +03:00
Pavel Emelyanov
604279f064 sstables/storage: Reimplement atomic deletion in sstables_manager
Right now the atomic deletion is called on manager, but it gets the
actual deletion function from storage and off-loads the deletion to it.
This patch makes the manager fully responsible for the deletion by
implementing the sequence of

    auto ctx = storage.prepare()
    for sst in sstables:
        sst.unlink()
    storage.complete(ctx)

Storage implementations provide the prepare/complete methods. The
filesystem storage does it via deletion log and the s3 storage is still
not atomic :(
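
The prepare/unlink/complete sequence can be sketched in Python roughly as
follows. This is only an illustration under stated assumptions: `Sstable`,
`FilesystemStorage` and `delete_atomically` are hypothetical stand-ins, not
the real classes; the deletion log is modelled as an in-memory list; and
passing the sstables to `prepare()` is an assumption of this sketch.

```python
class Sstable:
    """Hypothetical stand-in for an sstable; only tracks whether it was unlinked."""

    def __init__(self, name):
        self.name = name
        self.unlinked = False

    def unlink(self):
        self.unlinked = True


class FilesystemStorage:
    """Hypothetical storage that models the deletion log as an in-memory list."""

    def __init__(self):
        self.pending_logs = []

    def prepare(self, sstables):
        # Record the whole batch up front, so that an interrupted deletion
        # could be replayed from the log (assumption of this sketch).
        ctx = [sst.name for sst in sstables]
        self.pending_logs.append(ctx)
        return ctx

    def complete(self, ctx):
        # Every sstable in the batch is gone, so the log entry can be dropped.
        self.pending_logs.remove(ctx)


def delete_atomically(storage, sstables):
    """Manager-side sequence: prepare, unlink every sstable, complete."""
    ctx = storage.prepare(sstables)
    for sst in sstables:
        sst.unlink()
    storage.complete(ctx)
```

A storage without atomicity guarantees could return an empty context from
`prepare()` and make `complete()` a no-op, which matches the remark about s3.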

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 16:46:01 +03:00
Pavel Emelyanov
4ecf4c4a6a sstables/storage: Add prepare/complete scaffold for atomic deletion
The atomic deletion is going to look like

    auto ctx = storage.prepare()
    for sst in sstables:
        sst.unlink()
    storage.complete(ctx)

and this patch prepares the class storage for that by extending it with
prepare and complete methods. The opaque ctx object is also here

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 16:44:13 +03:00
Yaniv Kaul
fef565482c Update unified/build_unified.sh
fix sentence overall
2023-12-05 15:23:38 +02:00
Yaniv Kaul
8f97429b16 Update main.cc
fix sentence overall, not just the typo
2023-12-05 15:21:48 +02:00
Yaniv Kaul
f2b810a16a Update dist/common/scripts/scylla-housekeeping
cobvert -> convert
2023-12-05 15:20:35 +02:00
Yaniv Kaul
ae2ab6000a Typos: fix typos in code
Fixes some more typos as found by codespell run on the code.
In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255
2023-12-05 15:18:11 +02:00
Tomasz Grabiec
0e42fe4c3c storage_service: Introduce concept of a topology_guard
topology_guard is used to track distributed operations started by the
topology change coordinator, e.g. streaming, to make sure that those
operations have no side effects after topology change coordinator
moved to the next migration stage, of a given tablet or of the whole
ring.

topology_guard can be sent over the wire in the form of
frozen_topology_guard. It can be materialized again on the other
side. While in transit, it doesn't block the coordinator barriers. But
if the coordinator moved on, materialization of the guard will
fail. So tracking safety is preserved.
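
The freeze/materialize behaviour can be illustrated with a small Python
sketch. All names here (`SessionManager`, `StaleTopologyGuard`, integer
session ids standing in for a frozen guard) are hypothetical and chosen for
illustration only; the real mechanism lives in C++ and differs in detail.

```python
class StaleTopologyGuard(Exception):
    """Raised when a frozen guard refers to a session that was already closed."""


class SessionManager:
    """Hypothetical tracker of sessions opened by the coordinator."""

    def __init__(self):
        self._open = set()
        self._next_id = 0

    def create_session(self):
        # The returned id plays the role of a frozen guard that can be
        # sent over the wire.
        self._next_id += 1
        self._open.add(self._next_id)
        return self._next_id

    def close_session(self, session_id):
        # The coordinator moved on to the next stage.
        self._open.discard(session_id)

    def materialize(self, frozen_guard):
        # Materialization fails if the coordinator has already moved on.
        if frozen_guard not in self._open:
            raise StaleTopologyGuard(f"session {frozen_guard} is closed")
        return frozen_guard
```

While the id is only in transit nothing is held open, so closing the session
is never blocked by it; the check happens at materialization time.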

In this patch, the guard implementation is based on tracking work
under global sessions, but the concept is flexible and other
mechanisms can be used without changing user code.
2023-12-05 14:09:35 +01:00
Tomasz Grabiec
d3d83869ce storage_service: Introduce session concept 2023-12-05 14:09:34 +01:00
Tomasz Grabiec
2d4cd9c574 tablets: Fix topology_metadata_guard holding on to the old erm
Since abort callbacks are fired synchronously, we must change the
table's erm before we do that so that the callbacks obtain the new
erm.

Otherwise, we will block barriers.
2023-12-05 14:09:34 +01:00
Tomasz Grabiec
6cd310fc1a docs: Document the topology_guard mechanism 2023-12-05 14:09:34 +01:00
Botond Dénes
5fb0d667cb tools/scylla-sstable: always read scylla.yaml
Currently, scylla.yaml is read conditionally, if either the user
provided `--scylla-yaml-file` command line parameter, or if deducing the
data dir location from the sstable path failed.
We want the scylla.yaml file to be always read, so that when working
with encrypted files (enterprise), scylla-sstable can pick up the
configuration for the encryption.
This patch makes scylla-sstable always attempt to read the scylla.yaml
file, whether the user provided a location for it or not. When not, the
default location is used (also considering the `SCYLLA_CONF` and
`SCYLLA_HOME` environment variables).
Failing to find the scylla.yaml file is not considered an error. The
rationale is that the user will discover this if they attempt to do an
operation that requires this anyway.
There is a debug-level log about whether it was successfully read or
not.
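
A rough Python sketch of the lookup described above. The precedence shown
(`SCYLLA_CONF` first, then `$SCYLLA_HOME/conf`, then a relative `conf`
directory) is an assumption made for illustration, and both function names
are hypothetical; this is not the tool's actual code.

```python
import os
from pathlib import Path


def default_scylla_yaml_path(env=None):
    """Resolve the default scylla.yaml location (assumed precedence)."""
    env = os.environ if env is None else env
    if "SCYLLA_CONF" in env:
        conf_dir = Path(env["SCYLLA_CONF"])
    else:
        conf_dir = Path(env.get("SCYLLA_HOME", "")) / "conf"
    return conf_dir / "scylla.yaml"


def try_read_scylla_yaml(user_path=None, env=None):
    """Always attempt the read; a missing file is not an error."""
    path = Path(user_path) if user_path else default_scylla_yaml_path(env)
    try:
        return path.read_text()
    except OSError:
        # The real tool emits a debug-level log here; the sketch just
        # reports "nothing read" to the caller.
        return None
```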

Fixes: #16132

Closes scylladb/scylladb#16174
2023-12-05 15:06:29 +02:00
Kefu Chai
2ebdc40b0b docs: add Deprecated to value_status_count
despite that the "value_status_count" is not rendered/used yet,
it'd be better to keep it in sync with the code.

since 5fd30578d7 added
"Deprecated" to `value_status` enum, let's update the sphinx
extension accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16236
2023-12-05 14:52:13 +02:00
Avi Kivity
4498979b14 Merge 'When discarding table's sstables, delete them in one atomic batch' from Pavel Emelyanov
The table::discard_sstables() removes sstables attached to a table. For that it tries to atomically delete _each_ suitable sstable, which is a bit heavyweight -- each atomic deletion operation results in a deletion log file being written. This PR deletes all of the table's sstables in one atomic batch. While at it, the body of discard_sstables is simplified so that it does not allocate the "pruner" object. The latter became possible once the method was turned into a coroutine.

Closes scylladb/scylladb#16202

* github.com:scylladb/scylladb:
  discard_sstables: Atomically delete all sstables
  discard_sstables: Indentation and formatting fix after previous patch
  discard_sstable: Open-code local prune() lambda
  discard_sstables: Do not allocate pruner
2023-12-05 14:17:06 +02:00
Kamil Braun
1763c65662 system_keyspace: make get/set_scylla_local_param public
We'll use it outside `system_keyspace` code in a later commit.
2023-12-05 13:03:29 +01:00
Kamil Braun
07984215a3 feature_service: add GROUP0_SCHEMA_VERSIONING feature
This feature, when enabled, will modify how schema versions
are calculated and stored.

- In group 0 mode, schema versions are persisted by the group 0 command
  that performs the schema change, then reused by each node instead of
  being calculated as a digest (hash) by each node independently.
- In RECOVERY mode or before Raft upgrade procedure finishes, when we
  perform a schema change, we revert to the old digest-based way, taking
  into account the possibility of having performed group0-mode schema
  changes (that used persistent versions). As we will see in future
  commits, this will be done by storing additional flags and tombstones
  in system tables.

By "schema versions" we mean both the UUIDs returned from
`schema::version()` and the "global" schema version (the one we gossip
as `application_state::SCHEMA`).

For now, in this commit, the feature is always disabled. Once all
necessary code is set up in the following commits, we will enable it together
with Raft.
2023-12-05 13:03:28 +01:00
Benny Halevy
6c00c9a45d raft: use locator::topology/messaging rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 13:26:46 +02:00
Benny Halevy
b3bede8141 storage_service: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 13:23:27 +02:00
Kamil Braun
52ae6b8738 Merge 'fix shutdown order between group0 and storage service' from Gleb
Storage service uses group0 internally, but group0 is created long after
storage service is initialized and passed to it using the ss::set_group0()
function. What it means is that during shutdown group0 is destroyed
before ss::stop() is called and thus storage service is left with a
dangling reference. Fix it by introducing a function that cancels all
group0 operations and waits for background fibers to complete. For that
we need a separate abort source for group0 operations, which the patch
series also introduces.

* 'gleb/group0-ss-shutdown' of github.com:scylladb/scylla-dev:
  storage_service: topology coordinator: ignore abort_requested_exception in background fibers
  storage_service: fix de-initialization order between storage service and group0_service
2023-12-05 11:20:52 +01:00
Kefu Chai
e88bd9c5bd gms/inet_address: pass sstring param by std::move()
less overhead this way. the caller of lookup() always passes
an rvalue reference. and seastar::dns::get_host_by_name() actually
moves away from the parameter, so let's pass by std::move() for
slightly better performance, and to match the expectation of
the underlying seastar API.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16280
2023-12-05 12:05:21 +03:00
Benny Halevy
a529097d96 storage_proxy: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 10:44:13 +02:00
Benny Halevy
0b310c471c service_level_controller: use locator::topology rather than fb_utilities
Expose cql3::query_processor in auth::service
to get to the topology via storage_proxy.replica::database

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 10:17:47 +02:00
Pavel Emelyanov
9bbbe7a99f discard_sstables: Atomically delete all sstables
When collected sstables are deleted, each is passed into
sstables_manager.delete_atomically(). For on-disk sstables this creates
a deletion log for each removed sstable, which is quite an overkill. The
atomic deletion callback already accepts a vector of shared sstables, so
it's simpler (and a bit faster) to remove them all in a batch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 11:14:23 +03:00
Pavel Emelyanov
96bc530a57 discard_sstables: Indentation and formatting fix after previous patch
By "formatting" fix I mean -- remove the temporary on-stack references
that were left for the ease of patching

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 11:13:40 +03:00
Pavel Emelyanov
6d135fea43 discard_sstable: Open-code local prune() lambda
The lambda in question was the struct pruner method and was left there
for the ease of patching. Now, when this lambda is only called once
inside the function it is declared in, it can be open-coded into the
place where it's called

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 11:13:40 +03:00
Pavel Emelyanov
68cb2e66fc discard_sstables: Do not allocate pruner
This allocation remained from the pre-coroutine times of the method. Now
the contents of pruner -- reference to table, vector and replay_position --
can reside on the coroutine frame

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 11:13:40 +03:00
Benny Halevy
0e5754adc6 misc_services: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 10:01:36 +02:00
Benny Halevy
d49d10dbdb migration_manager: use messaging rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:48:33 +02:00
Benny Halevy
860b2d38c6 forward_service: use messaging rather than fb_utilities
Use _forwarder._messaging to get to the broadcast address
rather than the global fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:48:12 +02:00
Benny Halevy
984a576405 messaging_service: accept broadcast_addr in config rather than via fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:46:25 +02:00
Benny Halevy
586f35bb55 messaging_service: move listen_address and port getters inline
And make them const noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:44:41 +02:00
Benny Halevy
eabd4570da test: manual: modernize message test
Basically, make it work (great) again.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:44:26 +02:00
Benny Halevy
f9acc90926 table: use gossiper rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:43:47 +02:00
Benny Halevy
6826d87052 repair: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:09:06 +02:00
Benny Halevy
e1239e63bf dht/range_streamer: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:01:31 +02:00
Benny Halevy
63b556123b db/view: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:55:46 +02:00
Benny Halevy
f40bb7c583 database: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
64145388c9 db/system_keyspace: use topology via db rather than fb_utilities
So as not to rely on fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
4bb4d673c3 db/system_keyspace: save_local_info: get broadcast addresses from caller
So as not to rely on fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
6e79d647e6 db/hints/manager: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
4c20b84680 db/consistency_level: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
e5d3c6741f api: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
03fe674314 alternator: ttl: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
f3e0358563 gossiper: use locator::topology rather than fb_utilities
And add `get_endpoint_state_ptr` for this_node.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
25754f843b gossiper: add get_this_endpoint_state_ptr
Returns this node's endpoint_state_ptr.
With this entry point, the caller doesn't need to
get_broadcast_address.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
21ace44f03 test: lib: cql_test_env: pass broadcast_address in cql_test_config
For getting rid of fb_utilities.

In the future, that could be used to instantiate
multiple scylla node instances.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
3c846d3801 init: get_seeds_from_db_config: accept broadcast_address
Pass the broadcast_address from main to get_seeds_from_db_config
rather than getting it from fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
4d461fc788 locator: replication strategies: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
86716b2048 locator: topology: add helpers to retrieve this host_id and address
And respective `is_me()` predicates,
to prepare for getting rid of fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
52412087b7 snitch: pass broadcast_address in snitch_config
To untangle snitch from fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
94fc8e2a9a snitch: add optional get_broadcast_address method
and set broadcast_address / broadcast_rpc_address in main
to remove this dependency of snitch on fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
1d0e71308b locator: ec2_multi_region_snitch: keep local public address as member
To be used in the next patch to retrieve the broadcast_address.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
90af71ffa7 ec2_multi_region_snitch: reindent load_config
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
fecb597ad6 ec2_multi_region_snitch: coroutinize load_config
Now that ec2_snitch::load_config is a coroutine
there's no need for a seastar thread here either.

Refs #16241

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
cb7e096a59 ec2_snitch: reindent load_config
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
1c1a048d3f ec2_snitch: coroutinize load_config
Fixes #16241

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:48 +02:00
Benny Halevy
9e1dd78539 thrift: thrift_validation: use std::numeric_limits rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:48 +02:00
Kefu Chai
50332f796e script/base36-uuid.py: interpret timestamp with Gregorian calendar
UUID v1 uses an epoch derived from the Gregorian calendar. but
base36-uuid.py interprets the timestamp with the UNIX epoch time.
that's why it prints a UUID like

```console
$ ./scripts/base36-uuid.py -d 3gbi_0mhs_4sjf42oac6rxqdsnyx
date = 2411-02-16 16:05:52
decimicro_seconds = 0x7ad550
lsb = 0xafe141a195fe0d59
```

even though this UUID was generated on Nov 30, 2023. so in this change,
we shift the time with the timestamp of UNIX epoch derived from
the Gregorian calendar's day 0. so, after this change, we have:

```console
$ ./scripts/base36-uuid.py -d 3gbi_0mhs_4sjf42oac6rxqdsnyx
date = 2023-11-30 16:05:52
decimicro_seconds = 0x7ad550
lsb = 0xafe141a195fe0d59
```

see https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.4
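
The epoch shift can be sketched as below. This is a minimal illustration,
not the script's actual code: the function name is hypothetical, and it
assumes the input is the RFC 4122 timestamp, i.e. a count of 100-nanosecond
intervals since 1582-10-15 00:00 UTC.

```python
from datetime import datetime, timedelta, timezone

# Number of 100-ns intervals between 1582-10-15 00:00 UTC (UUID v1 epoch)
# and 1970-01-01 00:00 UTC (UNIX epoch).
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuid1_timestamp_to_datetime(timestamp_100ns):
    """Convert a UUID v1 timestamp to a timezone-aware UTC datetime."""
    # Shift from the Gregorian epoch to the UNIX epoch first ...
    unix_100ns = timestamp_100ns - GREGORIAN_TO_UNIX_100NS
    # ... then drop the sub-microsecond digit, since datetime only has
    # microsecond resolution.
    return UNIX_EPOCH + timedelta(microseconds=unix_100ns // 10)
```

Skipping the subtraction leaves the result roughly 388 years in the future,
which is the kind of offset seen in the first console output above.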

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16235
2023-12-05 07:39:34 +02:00
Anna Stuchlik
97244eb68e doc: add metric upgrade info to the 5.4 upgrade
This commit adds the information about metrics
update to the 5.2-to-5.4 upgrade guide.

Fixes https://github.com/scylladb/scylladb/issues/15966

Closes scylladb/scylladb#16161
2023-12-05 07:36:29 +02:00
Kefu Chai
3608d9be97 gms/inet_address: remove unused '#include'
neither <iomanip> nor "utils/to_string.hh" is used in
`gms/inet_address.cc`, so let's remove their "#include"s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16281
2023-12-05 08:30:03 +03:00
Kurashkin Nikita
1438e531f8 cql3: statement_restrictions: cartesian product size error message fix.
This commit fixes:
1. The error message will be specific about what type of keys
   exceeds the limit (e.g. clustering keys or partition keys).
2. The error message will be more general about what causes it: a cartesian
   product or a simple list.
3. The error message will advise using the --max-partition-key-restrictions-per-query
   or --max-clustering-key-restrictions-per-query configuration options to
   override the current limit (100).

Fixes #15627

Closes scylladb/scylladb#16226
2023-12-05 07:27:03 +02:00
Kefu Chai
a03be17da7 test/boost/sstable_generation_test: s/LE/LT/ when appropriate
in 7a1fbb38, a new test is added to an existing test for
comparing the UUIDs with different time stamps, but we should tighten
the test a little bit to reflect the intention of the test:

 the timestamp of "2023-11-24 23:41:56" should be less than
 "2023-11-24 23:41:57".

in this change, we replace LE with LT to correct it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16245
2023-12-05 08:25:04 +03:00
Anna Stuchlik
1e80bdb440 doc: fix rollback in the 4.6-to-5.0 upgrade guide
This commit fixes the rollback procedure in
the 4.6-to-5.0 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
  is fixed.
- The "Gracefully shutdown ScyllaDB" command
  is fixed.

In addition, there are the following updates
to be in sync with the tests:

- The "Backup the configuration file" step is
  extended to include a command to backup
  the packages.
- The Rollback procedure is extended to restore
  the backup packages.
- The Reinstallation section is fixed for RHEL.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4, branch-5.2, and branch-5.1

Closes scylladb/scylladb#16155
2023-12-05 07:17:49 +02:00
Anna Stuchlik
52c2698978 doc: fix rollback for RHEL (install) in 5.4
This commit fixes the installation command
in the Rollback section for RHEL/Centos
in the 5.2-5.4 upgrade guide.

It's a follow-up to https://github.com/scylladb/scylladb/pull/16114
where the command was not updated.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4.

Closes scylladb/scylladb#16156
2023-12-05 07:17:14 +02:00
Anna Stuchlik
91cddb606f doc: fix rollback in the 5.1-to-5.2 upgrade guide
This commit fixes the rollback procedure in
the 5.1-to-5.2 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
  is fixed.
- The "Gracefully shutdown ScyllaDB" command
  is fixed.

In addition, there are the following updates
to be in sync with the tests:

- The "Backup the configuration file" step is
  extended to include a command to backup
  the packages.
- The Rollback procedure is extended to restore
  the backup packages.
- The Reinstallation section is fixed for RHEL.

Also, I've removed the rollback section
for images, as it's not correct or
relevant.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4 and branch-5.2.

Closes scylladb/scylladb#16152
2023-12-05 07:16:44 +02:00
Anna Stuchlik
7ad0b92559 doc: fix rollback in the 5.0-to-5.1 upgrade guide
This commit fixes the rollback procedure in
the 5.0-to-5.1 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
  is fixed.
- The "Gracefully shutdown ScyllaDB" command
  is fixed.

In addition, there are the following updates
to be in sync with the tests:

- The "Backup the configuration file" step is
  extended to include a command to backup
  the packages.
- The Rollback procedure is extended to restore
  the backup packages.
- The Reinstallation section is fixed for RHEL.

Also, I've removed the rollback section
for images, as it's not correct or
relevant.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4, branch-5.2, and branch-5.1

Closes scylladb/scylladb#16154
2023-12-05 07:15:41 +02:00
Patryk Jędrzejczak
c8ee7d4499 db: make schema commitlog feature mandatory
Using consistent cluster management and not using schema commitlog
ends with a bad configuration throw during bootstrap. Soon, we
will make consistent cluster management mandatory. This forces us
to also make schema commitlog mandatory, which we do in this patch.

A booting node decides to use schema commitlog if at least one of
the two statements below is true:
- the node has `force_schema_commitlog=true` config,
- the node knows that the cluster supports the `SCHEMA_COMMITLOG`
  cluster feature.
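
The decision rule that this patch retires amounts to a simple disjunction.
A minimal sketch, with a hypothetical function name and plain booleans
standing in for the config option and the cluster feature check:

```python
def uses_schema_commitlog(force_schema_commitlog, cluster_supports_schema_commitlog):
    """Pre-patch rule: either condition alone is enough."""
    return force_schema_commitlog or cluster_supports_schema_commitlog
```

After the patch the result is effectively always true, which is why both
inputs become unnecessary.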

The `SCHEMA_COMMITLOG` cluster feature has been added in version
5.1. This patch is supposed to be a part of version 6.0. We don't
support a direct upgrade from 5.1 to 6.0 because it skips two
versions - 5.2 and 5.4. So, in a supported upgrade we can assume
that the version which we upgrade from has schema commitlog. This
means that we don't need to check the `SCHEMA_COMMITLOG` feature
during an upgrade.

The reasoning above also applies to Scylla Enterprise. Version
2024.2 will be based on 6.0. Probably, we will only support
an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even
if we support an upgrade from 2023.x, this patch won't break
anything because 2023.1 is based on 5.2, which has schema
commitlog. Upgrades from 2022.x definitely won't be supported.

When we populate a new cluster, we can use the
`force_schema_commitlog=true` config to use schema commitlog
unconditionally. Then, the cluster feature check is irrelevant.
This check could fail because we initiate schema commitlog before
we learn about the features. The `force_schema_commitlog=true`
config is especially useful when we want to use consistent cluster
management. Failing feature checks would lead to crashes during
initial bootstraps. Moreover, there is no point in creating a new
cluster with `consistent_cluster_management=true` and
`force_schema_commitlog=false`. It would just cause some initial
bootstraps to fail, and after successful restarts, the result would
be the same as if we used `force_schema_commitlog=true` from the
start.

In conclusion, we can unconditionally use schema commitlog without
any checks in 6.0 because we can always safely upgrade a cluster
and start a new cluster.

Apart from making schema commitlog mandatory, this patch adds two
changes that are its consequences:
- making the unneeded `force_schema_commitlog` config unused,
- deprecating the `SCHEMA_COMMITLOG` feature, which is always
  assumed to be true.

Closes scylladb/scylladb#16254
2023-12-04 21:02:16 +02:00
Calle Wilund
75a8be5b87 commitlog.hh: Fix numeric constant for file format version 3 to be actual '3'
Fixes #16277

When the PR for 'tagged pages' was submitted for RFC, it was assumed that PR #12849
(compression) would be committed first. The latter introduced v3 format, and the
format in #12849 (tagged pages) was assumed to have to be bumped to 4.

This ended up not being the case, and I missed that the code went in with file format
tag numeric value being '4' (and constant named v3).

While not detrimental, it is confusing, and should be changed asap (before anything
depends on files with the tag applied).

Closes scylladb/scylladb#16278
2023-12-04 21:01:44 +02:00
Calle Wilund
e94070db64 commitlog_test: Add test for commit log replay skip past EOF
Refs #15269

Unit test to check that trying to skip past EOF in a borked segment
will not crash the process. file_data_input_impl asserts iff caller
tries this.
2023-12-04 20:50:42 +02:00
Takuya ASADA
6eb9344cb3 dist: introduce scylla-tune-sched.service to tune kernel scheduler
On /usr/lib/sysctl.d/99-scylla-sched.conf, we have some sysctl settings to
tune the scheduler for lower latency.
This is mostly to prevent softirq threads processing tcp and reactor threads
from injecting latency into each other.
However, these parameters were moved to debugfs in linux-5.13+, so we lost
scheduler tuning on recent kernels.

To support tuning recent kernels, let's add a new service that can
configure both sysctl and debugfs.
The service is named scylla-tune-sched.service.
It is enabled unconditionally when installed; on older kernels
it tunes via sysctl, on recent kernels it tunes via debugfs.
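
The choice between the two interfaces could look roughly like this in
Python. It is a sketch only: the function name is hypothetical, selecting
by kernel version (rather than, say, probing which file exists) is an
assumption, and so is the `sched_<name>` to `<name>` renaming between the
sysctl and debugfs locations.

```python
import re


def sched_tunable_path(kernel_release, name):
    """Return where a scheduler tunable is expected to live for this kernel.

    kernel_release is a string such as "5.15.0-91-generic";
    name is the tunable without the "sched_" prefix, e.g. "latency_ns".
    """
    match = re.match(r"(\d+)\.(\d+)", kernel_release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {kernel_release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    if version >= (5, 13):
        # Recent kernels: tunables live under debugfs.
        return f"/sys/kernel/debug/sched/{name}"
    # Older kernels: tunables are exposed as sysctls.
    return f"/proc/sys/kernel/sched_{name}"
```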

Fixes #16077

Closes scylladb/scylladb#16122
2023-12-04 19:29:46 +02:00
Kefu Chai
3ffd8737e4 gms/inet_address: format gms::inet_address via net::inet_address
in 4ea6e06c, we specialized fmt::formatter<gms::inet_address> using
the formatter of bytes if the underlying address is an IPv6 address.
this breaks the tests with JMX which expected the shortened form of
the text representation of the IPv6 address.

in this change, instead of reinventing the wheel, let's reuse the
existing formatter of net::inet_address, which is able to handle
both IPv4 and IPv6 addresses, also it follows
https://datatracker.ietf.org/doc/html/rfc5952 by compressing the
consecutive zeros.
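
For a feel of what the shortened form looks like, here is a small Python
illustration using the standard `ipaddress` module as a stand-in. It is not
the Seastar formatter, and `format_inet_address` is a hypothetical name;
the sketch only shows the kind of zero compression that RFC 5952 describes.

```python
import ipaddress


def format_inet_address(text):
    """Return the canonical short text form of an IPv4 or IPv6 address."""
    # str() on an IPv6Address yields the compressed, lowercase form;
    # IPv4 addresses are returned in dotted-quad notation unchanged.
    return str(ipaddress.ip_address(text))
```

For example, `format_inet_address("2001:0db8:0000:0000:0000:0000:0000:0001")`
gives `2001:db8::1`, whereas a byte-wise formatter would print every group.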

since this new formatter is a thin wrapper of seastar::net::inet_address,
the corresponding unit test will be added to Seastar.

Refs #16039
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16267
2023-12-04 19:24:00 +02:00
Kefu Chai
28906725df repair: add formatter for row_level_diff_detect_algorithm
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for
row_level_diff_detect_algorithm. but its operator<<() is preserved,
as we are still using our homebrew generic formatter for
std::vector, and this formatter is still using operator<< for formatting
the elements in the vector.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16248
2023-12-04 18:59:52 +02:00
Yaniv Kaul
21cce458d8 test: alternator: fix typo passs instead of pass in test_gsi.py
Fix a typo.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#16258
2023-12-04 18:58:31 +02:00
Avi Kivity
c1d0baf11a Merge 'build: add an option to create building system with CMake' from Kefu Chai
as part of the efforts to migrate to the CMake-based building system,
this change enables `configure.py` to optionally create
`build.ninja` with CMake.

in this change, we add a new option named `--use-cmake` to
`configure.py` so we can create `build.ninja`. please note,
instead of using the "Ninja" generator used by Seastar's
`configure.py` script, we use "Ninja Multi-Config" generator
along with `CMAKE_CROSS_CONFIGS` setting in this project.
so that we can generate a `build.ninja` which is capable of
building the same artifacts with multiple configurations.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15916

* github.com:scylladb/scylladb:
  build: cmake: add compatibility target of dev-headers
  build: add an option to use CMake as the build system
2023-12-04 18:51:24 +02:00
Kefu Chai
3a8a3100af raft: add formatter for raft::logical_clock::time_point
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we

* define a formatter for logical_clock::time_point, as fmt does not
  provide formatter for this time_point, as it is not a part of the
  standard library
* remove operator<<() for logical_clock::time_point, as its sole
  purpose is to generate the corresponding fmt::formatter when
  FMT_DEPRECATED_OSTREAM is defined.
* remove operator<<() for logical_clock::duration, as fmt provides
  a default implementation for formatting
  std::chrono::nanoseconds already, which uses `int64_t` as its rep
  template parameter as well.
* include "fmt/chrono.h" so that the source files including this
  header can access the formatter without including it
  themselves. this preserves the existing behavior which we had
  before the removal of "operator<<()".

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16263
2023-12-04 18:32:03 +02:00
Nadav Har'El
4505a86f46 tablets, mv: fix base-view pairing to consider base replication map
In the view update code, the function get_view_natural_endpoint()
determines which view replica this base replica should send an update
to. It currently gets the *view* table's replication map (i.e., the map
from view tokens to lists of replicas holding the token), but assumes
that this is also the *base* table's replication map.

This assumption was true with vnodes, but is no longer true with
tablets - the base table's replication map can be completely different
from the view table's. By looking at the wrong mapping,
get_view_natural_endpoint() can believe that this node isn't really
a base-replica and drop the view update. Alternatively, it can think
it is a base replica - but use the wrong base-view pairing and create
base-view inconsistencies.

This patch solves this bug - get_view_natural_endpoint() now gets two
separate replication maps - the base's and the view's. The callers
need to remember what the base table was (in some cases they didn't
care at the point of the call), and pass it to the function call.
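
The pairing problem can be sketched as follows. This is a simplified illustration, not the actual get_view_natural_endpoint() code: it assumes base and view replicas are paired by their position in the replica lists, and the node names and map layout are hypothetical.

```python
def view_natural_endpoint(me, base_replicas, view_replicas):
    """Pair the base replica `me` with a view replica by list position.

    Hypothetical simplification: the real function takes more inputs.
    """
    if me not in base_replicas:
        return None  # "not a base replica": the view update is dropped
    return view_replicas[base_replicas.index(me)]


# With tablets, the base and view tables have independent replication maps.
base_map = {"base_token": ["n1"]}   # RF=1, base row lives on n1
view_map = {"view_token": ["n4"]}   # the view row lives on n4

# Before the fix: the view's map is mistaken for the base's map.
buggy = view_natural_endpoint("n1", view_map["view_token"], view_map["view_token"])
# After the fix: each table contributes its own map.
fixed = view_natural_endpoint("n1", base_map["base_token"], view_map["view_token"])
```

In this toy setup `buggy` is `None` (the update is lost) while `fixed` is `"n4"`, which mirrors the RF=1 reproducer described below.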

This patch also includes a simple test that reproduces the bug, and
confirms it is fixed: The test has a 6-node cluster using tablets
and a base table with RF=1, and writes one row to it. Before this
patch, the code usually gets confused, thinking the base replica
isn't a replica and loses the view update. With this patch, the
view update works.

Fixes #16227.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16228
2023-12-04 16:38:54 +02:00
Avi Kivity
60af2f3cb2 Merge 'New commitlog file format using tagged pages' from Calle Wilund
Prototype implementation of format suggested/requested by @avikivity:

Divides segments into disk-write-alignment sized pages, each tagged with segment ID + CRC of data content.
When read, we both verify sector integrity (CRC) to detect corruption, as well as matching ID read with expected one.

If the latter mismatches we have a prematurely terminated segment (read truncation), which, depending on whether the CL is
written in batch or periodic mode, as well as explicit sync, can mean data loss.

Note: all-zero pages are treated as kosher, both to align with newly allocated segments, as well as fully terminated (zero-page) ones.
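
A rough sketch of the page check described above, in Python rather than the C++ of the implementation. The page size, the tag layout (64-bit segment ID followed by a 32-bit CRC at the end of the page) and the use of zlib's CRC32 are assumptions made for illustration only; the actual format is specified in the docs added by this series.

```python
import struct
import zlib

PAGE_SIZE = 4096                   # assumed disk write alignment
TAG = struct.Struct("<QI")         # assumed tag: segment id + CRC of the data
PAYLOAD_SIZE = PAGE_SIZE - TAG.size


def seal_page(payload: bytes, segment_id: int) -> bytes:
    """Pad the payload to a full page and append the (id, CRC) tag."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload does not fit in one page")
    data = payload.ljust(PAYLOAD_SIZE, b"\0")
    return data + TAG.pack(segment_id, zlib.crc32(data))


def check_page(page: bytes, expected_id: int) -> str:
    """Classify a page read back from a segment."""
    if len(page) != PAGE_SIZE:
        raise ValueError("not a full page")
    if page == bytes(PAGE_SIZE):
        return "empty"             # all-zero pages are accepted
    data, tag = page[:PAYLOAD_SIZE], page[PAYLOAD_SIZE:]
    segment_id, crc = TAG.unpack(tag)
    if crc != zlib.crc32(data):
        return "corrupt"           # page integrity check failed
    if segment_id != expected_id:
        return "truncated"         # page belongs to an older segment
    return "ok"
```

A page left over from a recycled segment still has a valid CRC but carries the old segment ID, which is how the reader can tell truncation apart from corruption.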

Note: This is a preview/RFC - the rest of the file format is not modified. At least parts of entry CRC could probably be removed, but I have not done so yet (needs some thinking).

Note: Some slight abstraction breaks in impl. and probably less than maximal efficiency.

v2:
* Removed entry CRC:s in file format.
* Added docs on format v3
* Added one more test for recycling-truncation

v3:
* Fixed typos in size calc and docs
* Changed sect metadata order
* Explicit iter type

Closes scylladb/scylladb#15494

* github.com:scylladb/scylladb:
  commitlog_test: Add test for replaying large-ish mutation
  commitlog_test: Add additional test for segmnent truncation
  docs: Add docs on commitlog format 3
  commitlog: Remove entry CRC from file format
  commitlog: Implement new format using CRC:ed sectors
  commitlog: Add iterator adaptor for doing buffer splitting into sub-page ranges
  fragmented_temporary_buffer: Add const iterator access to underlying buffers
  commitlog_replayer: differentiate between truncated file and corrupt entries
2023-12-04 13:31:13 +01:00
Avi Kivity
8fa2e3ad2a Merge 'Remove sstables::remove_by_toc_name()' from Pavel Emelyanov
The helper in question complicates the logic of sstable_directory::process() by making garbage collection work differently for sstables deleted "atomically" and those deleted "one-by-one". Also, the code that deletes sstables one-by-one and uses remove_by_toc_name() reads the TOC file needlessly, because there is an sstable object at hand that has all_components() ready for use.

Surprisingly, there was no test for the deletion-log functionality. This PR adds one. The test passes before the g.c. and regular unlink fix, and (of course) continues passing after it.

Closes scylladb/scylladb#16240

* github.com:scylladb/scylladb:
  sstables: Drop remove_by_name()
  sstables/fs_storage: Wipe by recognized+unrecognized components
  sstable_directory: Enlight deletion log replay
  sstables: Split remove_by_toc_name()
  test: Add test case to validate deletion log work
  sstable_directory: Close dir on exception
  sstable_directory: Fix indentation after previous patch
  sstable_directory: Coroutinize delete_with_pending_deletion_log()
  test: Sstable on_delete() is not necessarily in a thread
  sstable_directory: Split delete_with_pending_deletion_log()
2023-12-03 17:29:34 +02:00
Wojciech Mitros
a8c9451fb2 commitlog: add max disk size api
Currently, the max size of commitlog is obtained either from the
config parameter commitlog_total_space_in_mb or, when the parameter
is -1, from the total memory allocated for Scylla.
To facilitate testing of the behavior of commitlog hard limit,
expose the value of commitlog max_disk_size in a dedicated API.

Closes scylladb/scylladb#16020
2023-12-03 17:16:58 +02:00
Kefu Chai
39b2ee9751 dist/redhat: avoid mixed use of spaces and tabs
rpmlint complains about "mixed-use-of-spaces-and-tabs", and it
does not look good in the editor. so let's replace tabs with spaces.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16246
2023-12-03 17:11:03 +02:00
Nadav Har'El
59ff27ea4a Merge 'Typos: fix typos in comments' from Yaniv Kaul
Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255

Closes scylladb/scylladb#16257

* github.com:scylladb/scylladb:
  Update service/topology_state_machine.hh
  Update raft/tracker.hh
  Update db/view/view.cc
  Typos: fix typos in comments
2023-12-03 11:23:51 +02:00
Yaniv Kaul
030d421931 Update service/topology_state_machine.hh 2023-12-03 10:08:11 +02:00
Yaniv Kaul
7c4b742583 Update raft/tracker.hh 2023-12-03 10:07:55 +02:00
Yaniv Kaul
2b73793a39 Update db/view/view.cc 2023-12-03 10:07:45 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Kamil Braun
01e54f5b12 Merge 'test: delete topology_raft_disabled suite' from Patryk Jędrzejczak
This PR is a necessary step to fix #15854 -- making consistent
cluster management mandatory on master.

Before making consistent cluster management mandatory, we have
to get rid of all tests that depend on the
`consistent_cluster_management=false` config. These are the tests
in the `topology_raft_disabled` suite.

There's the internal Raft upgrade procedure, which is the bulk of the
upgrade logic. Then, there are two thin "layers" around it that
invoke it underneath: recovery procedure and
enable-raft-in-the-cluster procedure. We're getting rid of the
second one by making Raft always enabled, so we naturally have to
get rid of tests that depend on it. The idea is to replace every
necessary enable-raft-in-the-cluster procedure in these tests with
the recovery procedure. Then, we will still be testing the internal
Raft upgrade procedure in the in-tree tests. The
enable-raft-in-the-cluster procedure is already tested by QA tests,
so we don't need to worry about these changes.

Unfortunately, we cannot adapt `test_raft_upgrade_no_schema`.
After making consistent cluster management mandatory on master,
schema commitlog will also become mandatory because
`consistent_cluster_management: True`,
`force_schema_commit_log: False`
is considered a bad configuration. These changes will make
`test_raft_upgrade_no_schema` unimplementable in the Scylla repo.
Therefore, we remove this test. If we want to keep it, we must
rewrite it as an upgrade dtest.

After making all tests in `topology_raft_disabled` use consistent
cluster management, there is no point in keeping this suite.
Therefore, we delete it and move all the tests to `topology_custom`.

Closes scylladb/scylladb#16192

* github.com:scylladb/scylladb:
  test: delete topology_raft_disabled suite
  test: topology_raft_disabled: move tests to topology_custom suite
  test: topology_raft_disabled: move utils to topology suite
  test: topology_raft_disabled: use consistent cluster management
  test: topology_raft_disabled: add new util functions
  test: topology_raft_disabled: delete test_raft_upgrade_no_schema
2023-12-01 17:11:32 +01:00
Pavel Emelyanov
17fd558df8 sstables: Drop remove_by_name()
It was used by deletion log replay and by storage wipe, now it's unused

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
4405a625f6 sstables/fs_storage: Wipe by recognized+unrecognized components
Currently wiping fs-backed sstable happens via reading and parsing its
TOC file back. Then the three-step process goes:

- move TOC -> TOC.tmp
- remove components (obtained from TOC.tmp)
- remove TOC.tmp

However, wiping sstable happens in one of two cases -- the sstable was
loaded from the TOC file _or_ sstable had evaluated the needed
components and generated TOC file. With that, the 2nd step can be made
without reading the TOC file, just by looking at all components sitting
on the sstable

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
de931702ec sstable_directory: Enlight deletion log replay
Garbage collection of sstables is scattered between two stages -- g.c.
per-se and the regular processing.

The former stage collects deletion logs and for each log found goes
ahead and deletes the full sstable with the standard sequence:

- move TOC -> TOC.tmp
- remove components
- remove TOC.tmp

The latter stage picks up partially unlinked sstables that didn't go via
atomic deletion with the log. This comes as

- collect all components
  - keep TOC's and TOC.tmp's in separate lists
  - attach other components to TOC/TOC.tmp by generation value
- for all TOC.tmp's get all attached components and remove them
- continue loading TOC's with attached components

That said, replaying deletion log can be as light as just the first step
out of the above sequence -- just move TOC to TOC.tmp. After that the
regular processing would pick the remaining components and clean them
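
The interplay of the two stages can be sketched like this. It is an illustration only: the `<generation>-<component>` file naming, the `TOC.txt.tmp` spelling and both function names are hypothetical simplifications, not the sstable_directory API.

```python
import os
import tempfile
from collections import defaultdict

TOC, TOC_TMP = "TOC.txt", "TOC.txt.tmp"


def replay_deletion_log(directory, generations):
    """Light replay: only move TOC -> TOC.tmp for each logged sstable."""
    for gen in generations:
        toc = os.path.join(directory, f"{gen}-{TOC}")
        if os.path.exists(toc):
            os.rename(toc, os.path.join(directory, f"{gen}-{TOC_TMP}"))


def process_directory(directory):
    """Regular processing: wipe sstables marked by TOC.tmp, return live ones."""
    components = defaultdict(list)
    for name in os.listdir(directory):
        gen, _, component = name.partition("-")
        components[gen].append(component)
    live = []
    for gen, parts in components.items():
        if TOC_TMP in parts:
            for part in parts:
                if part != TOC_TMP:
                    os.remove(os.path.join(directory, f"{gen}-{part}"))
            # TOC.tmp goes last, so an interrupted cleanup can be resumed
            os.remove(os.path.join(directory, f"{gen}-{TOC_TMP}"))
        elif TOC in parts:
            live.append(gen)
    return sorted(live)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        for gen in ("1", "2", "3"):
            for component in (TOC, "Data.db", "Index.db"):
                open(os.path.join(directory, f"{gen}-{component}"), "w").close()
        replay_deletion_log(directory, ["2"])
        live = process_directory(directory)
        return live, sorted(os.listdir(directory))
```

Here `demo()` logs sstable 2 for deletion; the replay only renames its TOC, and the regular pass removes the remaining components.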

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
5ff5946520 sstables: Split remove_by_toc_name()
The helper consists of three phases:

- move TOC -> TOC.tmp
- remove components listed in TOC
- remove TOC.tmp

The first step is needed separately by the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
b10ca96e07 test: Add test case to validate deletion log work
The test sequence is

- create several sstables
- create deletion log for a sub-set of them
- partially unlink smaller sub-sub-set
- make sstable directory do the processing with g.c.
- check that the sstables loaded do NOT include the deleted ones

The .throw_on_missing_toc bit set additionally validates that the
directory doesn't contain garbage not attached to any other TOCs

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
fcf080b63b sstable_directory: Close dir on exception
When committing the deletion log creation its containing directory is
sync-ed via opened file. This place is not exception safe and directory
can be left unclosed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:38 +03:00
Pavel Emelyanov
bb167dcca5 sstable_directory: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:38 +03:00
Pavel Emelyanov
28b1289d4b sstable_directory: Coroutinize delete_with_pending_deletion_log()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:38 +03:00
Pavel Emelyanov
92f0aa04d0 test: Sstable on_delete() is not necessarily in a thread
One of the test cases injects an observer into sstable->unlink() method
via its _on_delete() callback. The test's callback assumes that it runs
in an async context, but it's a happy coincidence, because deletion via
the deletion log runs so. Next patch is changing it and the test case
will no longer work. But since it's a test case it can just directly
call a libc function for its needs

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:38 +03:00
Pavel Emelyanov
ed043e5762 sstable_directory: Split delete_with_pending_deletion_log()
The helper consists of three parts -- prepare the deletion log, unlink
sstables and drop the deletion log. For testing the first part is needed
as a separate step, so here's this split.

It renders two nested async contexts, but it will change soon.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:37 +03:00
Nadav Har'El
bae6f3387f CODEOWNERS: remove some entries
The ".github/CODEOWNERS" is used by github to recommend reviewers for
pull requests depending on the directories touched in the pull request.
Github ignores entries in that file that are not **maintainers**. Since
Jan is no longer a Scylla maintainer, I remove his entries in the list.

Additionally, I am removing *myself* from *some* of the directories.
For many years, it was an (unwritten) policy that experienced Scylla
developers are expected to help in reviewing pieces of the code they
are familiar with - even if they no longer work on that code today.
But as ScyllaDB the company grew, this is no longer true; The policy
is now that experienced developers are requested to review only code in
their own or their team's area of responsibility - experienced developers
should help review *designs* of other parts, but not the actual code.
For this reason I'm removing my name from various directories.
I can still help review such code if asked specifically - but I will no
longer be the "default" reviewer for such code.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16239
2023-11-30 20:29:05 +02:00
Tomasz Grabiec
c64ae7b733 scripts: Introduce tablet-mon.py
Closes scylladb/scylladb#15512
2023-11-30 19:15:36 +02:00
Nadav Har'El
49860952f9 Merge 'LIST EFFECTIVE SERVICE LEVEL statement' from Michał Jadwiszczak
Add `LIST EFFECTIVE SERVICE LEVEL` statement to display which service level each effective service level option comes from.

Example:
There are 2 roles: role1 and role2. Role1 is assigned with sl1 (timeout = 2s, workload_type = interactive) and role2 is assigned with sl2 (timeout = 10s, workload_type = batch).
Then, if we grant role1 to role2, the user with role2 will have 2s timeout (from sl1) and batch workload type (from sl2).

```
> LIST EFFECTIVE SERVICE LEVEL OF role2;

 service_level_option | effective_service_level | value
----------------------+-------------------------+-------------
        workload_type |                     sl2 |       batch
              timeout |                     sl1 |          2s
```
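
The example above can be modelled with a small sketch. The merge rule used here (the shortest timeout wins, and `batch` takes precedence over `interactive`) is an assumption inferred from this example, and all names in the code are hypothetical.

```python
# Service levels and role setup taken from the example above.
SERVICE_LEVELS = {
    "sl1": {"timeout": 2, "workload_type": "interactive"},
    "sl2": {"timeout": 10, "workload_type": "batch"},
}
ROLE_SERVICE_LEVEL = {"role1": "sl1", "role2": "sl2"}
GRANTS = {"role2": ["role1"]}                    # role1 is granted to role2
WORKLOAD_RANK = {"interactive": 0, "batch": 1}   # assumed precedence


def prefer(option, candidate, current):
    """Return True if `candidate` should replace `current` for this option."""
    if option == "timeout":
        return candidate < current
    return WORKLOAD_RANK[candidate] > WORKLOAD_RANK[current]


def effective_service_level(role):
    """Map each option to (value, name of the service level it came from).

    Only direct grants are followed in this sketch.
    """
    result = {}
    for r in [role] + GRANTS.get(role, []):
        sl = ROLE_SERVICE_LEVEL.get(r)
        if sl is None:
            continue
        for option, value in SERVICE_LEVELS[sl].items():
            if option not in result or prefer(option, value, result[option][0]):
                result[option] = (value, sl)
    return result
```

Calling `effective_service_level("role2")` yields the 2s timeout attributed to sl1 and the batch workload type attributed to sl2, matching the table.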

Fixes: https://github.com/scylladb/scylladb/issues/15604

Closes scylladb/scylladb#14431

* github.com:scylladb/scylladb:
  cql-pytest: add `LIST EFFECTIVE SERVICE LEVEL OF` test
  docs: add `LIST EFFECTIVE SERVICE LEVEL` statement docs
  cql3:statements: add `LIST EFFECTIVE SERVICE LEVEL` statement
  service:qos: add option to include effective names to SLO
2023-11-30 18:12:52 +02:00
Gleb Natapov
3ddc1458ee storage_service: topology coordinator: ignore abort_requested_exception in background fibers
The exception may be thrown by "event" CV during shutdown.
2023-11-30 17:52:40 +02:00
Gleb Natapov
8ed8b151da storage_service: fix de-initialization order between storage service and group0_service
Storage service uses group0 internally, but group0 is created long after
storage service is initialized and passed to it using ss::set_group0()
function. But what it means is that during shutdown group0 is destroyed
before ss::stop() is called and thus storage service is left with a
dangling reference. Fix it by introducing a function that cancels all
group0 operations and waits for background fibers to complete. For that
we need separate abort source for group0 operation which the patch also
introduces.
2023-11-30 17:52:38 +02:00
Patryk Jędrzejczak
77c4ee92e5 test: delete topology_raft_disabled suite
After moving all tests out of topology_raft_disabled, we can safely
remove this suite.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
ba990d90bb test: topology_raft_disabled: move tests to topology_custom suite
We move the remaining tests in topology_raft_disabled to
topology_custom. We choose topology_custom because these tests
cannot use consistent topology changes.

We need to modify these tests a bit because we cannot pass
RandomTables to a test case function if the initial cluster size
equals 0. RandomTables.__init__ requires manager.cql to be present.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
659ac9c7f5 test: topology_raft_disabled: move utils to topology suite
We move all used util functions from topology_raft_disabled to
topology before we remove topology_raft_disabled. After this
change, util.py in topology will be the single util file for all
topology tests.

Some util functions in topology_raft_disabled aren't used anymore.
We don't move such functions and remove them instead.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
684b070b20 test: topology_raft_disabled: use consistent cluster management
Soon, we will make consistent cluster management mandatory on
master. Before this, we have to change all tests in the
topology_raft_disabled suite so that they do not depend on the
consistent_cluster_management=false config.

Adapting test_raft_upgrade_majority_loss is simple. We only have
to get rid of the initial upgrade. This initial upgrade didn't
test anything. Every test in topology_raft_disabled had to do it
at the beginning because of consistent_cluster_management=false.

Adapting test_raft_upgrade_basic and test_raft_upgrade_stuck is
more difficult. It requires changing the initial upgrade to
clearing Raft data in RECOVERY mode on all servers and restarting
them. Then, the servers will run the same upgrade procedure as
before.

After changing the tests, we also update their names appropriately.

test_raft_upgrade_stuck becomes a bit slower, so we remove the
comment about running time. Also, one TODO was fixed in the process
of rewriting the test. This fix forced us to skip the test in the
release mode since we cannot update the list of error injections
through manager.server_update_config in this mode.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
1059fece19 test: topology_raft_disabled: add new util functions
They are shorter and more readable than long CQL queries. We use
them even more in the following commit.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
7e43ebf88e test: topology_raft_disabled: delete test_raft_upgrade_no_schema
After making consistent cluster management mandatory on master,
schema commitlog will also become mandatory because
consistent_cluster_management: True,
force_schema_commit_log: False
is considered a bad configuration. These changes will make
test_raft_upgrade_no_schema unimplementable in the Scylla repo, so
we remove it.

If we want to keep this test, we must rewrite it as an upgrade
dtest.
2023-11-30 15:50:21 +01:00
Kefu Chai
7a1fbb38f9 sstable: order uuid-based generation as timeuuid
under most circumstances, we don't care about the ordering of the sstable
identifiers, as they are just identifiers. so, as long as they can be
compared, we are good. but we have tests which expect that the sstables
can be ordered by the time they are created. for instance,
sstable_run_based_compaction_test has this expectation.

before this change, we compare two UUID-based generations by their
(MSB, LSB) lexicographically. but UUID v1 puts the lower bits of
the timestamp at the higher bits of MSB, so the ordering of the
"time" in timeuuid is not preserved when comparing the UUID-based
generations. this breaks the test of sstable_run_based_compaction_test,
which feeds the sstables to be compacted in a set, and the set is
ordered with the generation of the sstables.

after this change, we consider the UUID-based generation as
a timeuuid when comparing them.
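
the difference between the two orderings can be illustrated with Python's standard `uuid` module. this is a sketch, not the comparison code in this change; `make_timeuuid` and the two key functions are hypothetical helpers.

```python
import uuid

LOW64 = (1 << 64) - 1


def make_timeuuid(timestamp: int, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Build a version-1 UUID from a 60-bit timestamp (hypothetical helper)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | (1 << 12)  # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80     # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def msb_lsb_key(u: uuid.UUID):
    """Old ordering: compare (MSB, LSB) lexicographically."""
    return (u.int >> 64, u.int & LOW64)


def timeuuid_key(u: uuid.UUID):
    """New ordering: compare the embedded timestamp first."""
    return (u.time, u.int & LOW64)


# time_low occupies the top 32 bits of the MSB, so a carry into time_mid
# makes the newer UUID look smaller when compared by (MSB, LSB).
older = make_timeuuid(0x0000_0000_FFFF_FFFF)
newer = make_timeuuid(0x0000_0001_0000_0000)
by_bits = sorted([newer, older], key=msb_lsb_key)
by_time = sorted([newer, older], key=timeuuid_key)
```

`by_bits` places `newer` first, while `by_time` keeps creation order.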

Fixes #16215
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16238
2023-11-30 14:50:44 +02:00
Michał Jadwiszczak
e3515cfc1b cql-pytest: add LIST EFFECTIVE SERVICE LEVEL OF test 2023-11-30 13:07:20 +01:00
Michał Jadwiszczak
e1d86f9afb docs: add LIST EFFECTIVE SERVICE LEVEL statement docs 2023-11-30 13:07:20 +01:00
Michał Jadwiszczak
2438965b6a cql3:statements: add LIST EFFECTIVE SERVICE LEVEL statement
Add statement to print effective service level of a specified role.
2023-11-30 13:07:20 +01:00
Michał Jadwiszczak
1b08338fe7 service:qos: add option to include effective names to SLO
Allow including `slo_effective_names` in `service_level_options`
to be able to determine which service level a specific option value comes from.
2023-11-30 13:07:20 +01:00
Yaron Kaikov
7ce6962141 build_docker.sh: Upgrade package during creation and remove sshd service
When scanning our latest docker image using `trivy` (command: `trivy
image docker.io/scylladb/scylla-nightly:latest`), it shows we have OS
packages which are out of date.

Also removing `openssh-server` and `openssh-client` since we don't use
them for our docker images

Fixes: https://github.com/scylladb/scylladb/issues/16222

Closes scylladb/scylladb#16224
2023-11-30 14:00:15 +02:00
Botond Dénes
d6d9751dd8 tools/scylla-sstable: validate,validate-checksums: print JSON last
Said commands print errors as they validate the sstables. Currently this
intermingles with the regular JSON output of these commands, resulting
in ugly and confusing output.
This is not a problem for scripted use, as logs go to stderr while the
JSON go to stdout, but it is a problem for human users.
Solve this by outputting the JSON into a std::stringstream and printing
it in one go at the very end. This means JSON is accumulated in a memory
buffer, but these commands don't output a lot of JSON, so this shouldn't
be a problem.
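
The buffering idea, sketched in Python with `io.StringIO` standing in for std::stringstream. The function name, its parameters and the shape of the JSON are hypothetical and do not reflect the tool's actual output.

```python
import io
import json
import sys


def validate_sstables(names, validate, out=None, log=None):
    """Log errors as they are found, but emit the JSON only at the very end."""
    out = sys.stdout if out is None else out
    log = sys.stderr if log is None else log
    buffer = io.StringIO()           # JSON accumulates here, not on `out`
    results = {}
    for name in names:
        errors = validate(name)
        for error in errors:
            print(f"error: {name}: {error}", file=log)
        results[name] = {"valid": not errors}
    json.dump(results, buffer, indent=2)
    print(buffer.getvalue(), file=out)   # printed in one go
```

Even when `out` and `log` end up on the same terminal, every error line precedes the JSON document.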

Closes scylladb/scylladb#16216
2023-11-30 09:53:47 +03:00
Piotr Smaroń
5fd30578d7 config: introduce value_status::Deprecated
Current mechanism to deprecate config options is implemented in a hacky
way in `main.cpp` and doesn't account for existing
`db::config/boost::po` API controlling lifetime of config options, hence
it's being replaced in this PR by adding yet another `value_status`
enumerator: `Deprecated`, so that deprecation of config options is
controlled in one place in `config.cc`, i.e. when specifying config options.
Motivation: https://docs.google.com/document/d/18urPG7qeb7z7WPpMYI2V_lCOkM5YGKsEU78SDJmt8bM/edit?usp=sharing

With this change, if a `Deprecated` config option is specified as
1. a command line parameter, scylla will run and log:
```
WARN  2023-11-25 23:37:22,623 [shard 0:main] init - background-writer-scheduling-quota option ignored (deprecated)
```
(Previously it was only a message printed to standard output, not a
scylla log of warn level).
2. an option in `scylla.yaml`, scylla will run and log:
```
WARN  2023-11-27 23:55:13,534 [shard 0:main] init - Option is deprecated : background_writer_scheduling_quota
```
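
The idea of carrying the deprecation status on the option declaration itself can be sketched as below. This is an illustrative Python model, not the `db::config` code: the enum members, the `OPTIONS` table and `apply_options` are hypothetical, and only the warning text is borrowed from the log line above.

```python
import enum


class ValueStatus(enum.Enum):
    USED = "used"
    DEPRECATED = "deprecated"


# Status is declared once, next to the option itself.
OPTIONS = {
    "commitlog_total_space_in_mb": ValueStatus.USED,
    "background_writer_scheduling_quota": ValueStatus.DEPRECATED,
}


def apply_options(supplied):
    """Return (accepted options, warnings) for user-supplied values."""
    accepted, warnings = {}, []
    for name, value in supplied.items():
        if OPTIONS[name] is ValueStatus.DEPRECATED:
            # Startup continues; the value is ignored with a warning.
            warnings.append(f"{name} option ignored (deprecated)")
            continue
        accepted[name] = value
    return accepted, warnings
```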

Fixes #15887
Incorporates dropped https://github.com/scylladb/scylladb/pull/15928

Closes scylladb/scylladb#16184
2023-11-30 08:52:57 +03:00
Avi Kivity
8e9d3af431 Merge 'Commitlog: complete prerequisites and enforce hard limit by default' from Eliran Sinvani
This miniset completes the prerequisites for enabling the commitlog hard limit by default.
Namely, start flushing and evacuating segments halfway to the limit in order to never hit it under normal circumstances.
It is worth mentioning that hitting the limit is an exceptional condition whose root cause needs to be resolved; however,
once we do hit the limit, the performance impact that is inflicted as a result of this enforcement is irrelevant.

Tests: unit tests.
          LWT write test (#9331)
Whitebox testing was performed by @wmitros. The test aimed at putting as much pressure as possible on the commitlog segments by using a write pattern that rewrites the partitions in the memtable, keeping it at ~85% occupancy so the dirty memory manager will not kick in. The test compared 3 configurations:
1. The default configuration
2. Hard limit on (without changing the flush threshold)
3. the changes in this PR applied.
The last exhibited the "best" behavior in terms of metrics: its graphs were the flattest and the least jagged of the three.

Closes scylladb/scylladb#10974

* github.com:scylladb/scylladb:
  commitlog: enforce commitlog size hard limit by default
  commitlog: set flush threshold to half of the  limit size
  commitlog: unfold flush threshold assignment
2023-11-29 20:55:53 +02:00
Kamil Braun
8a14839a00 Merge 'handle more failures during topology operations' from Gleb
This series adds handling for more failures during a topology operation
(we already handle a failure during streaming). Here we add handling of
tablet draining errors by aborting the operation and handling of errors
after streaming where an operation cannot be aborted any longer. If the
error happens when rollback is no longer possible we wait for ring delay
and proceed to the next step. Each individual patch that adds the sleep
has an explanation what the consequences of the patch are.

* 'gleb/topology-coordinator-failures' of github.com:scylladb/scylla-dev:
  test: add test to check errro handling during tablet draining
  test: fix test_topology_streaming_failure test to not grep the whole file
  storage_service: add error injection into the tablet migration code
  storage_service: topology coordinator: rollback on handle_tablet_migration failure during tablet_draining stage
  storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state
  storage_service: topology coordinator: do not retry the metadata barrier forever in left_token_ring state
  storage_service: topology coordinator: return a node that is being removed from get_excluded_nodes
  storage_service: topology_coordinator: use new rollback_to_normal state in the rollback procedure
  storage_service: topology coordinator: add rollback_to_normal node state
  storage_service: topology coordinator: put fence version into the raft state
  storage_service: topology coordinator: do fencing even if draining failed
2023-11-29 19:02:35 +01:00
Avi Kivity
cccd2e7fa7 Merge 'Generalize sstables TOC file reading' from Pavel Emelyanov
TOC file is read and parsed in several places in the code. All do it differently, and it's worth generalizing this place.
To make this happen, also fix the S3 readable_file so that it can be used inside file_input_stream.

Closes scylladb/scylladb#16175

* github.com:scylladb/scylladb:
  sstable: Generalize toc file read and parse
  s3/client: Don't GET object contents on out-of-bound reads
  s3/client: Cache stats on readable_file
2023-11-29 19:18:31 +02:00
Nadav Har'El
62f89d49e5 tablets, mv: fix on_internal_error on write to base table
The situation before this patch is that when tablets are enabled for
a keyspace, we can create a materialized view but later any write to
the base table fails with an on_internal_error(), saying that:

     "Tried to obtain per-keyspace effective replication map of test
      but it's per-table."

Indeed, with tablets, the replication is different for each table - it's
not the same for the entire keyspace.

So this patch changes the view update code to take the replication
map from the specific base table, not the keyspace.

This is good enough to get materialized-views reads and writes working
in a simple single-node case, as the included test demonstrates (the
test fails with on_internal_error() before this patch, and passes
afterwards).

But this fix is not perfect - the base-view pairing code really needs
to consider not only the base table's replication map, but also the
view table's replication map - as those can be different. We'll fix
this remaining problem as a followup in a separate patch - it will
require a substantially more elaborate test to reproduce the need
for the different mapping and to verify that fix.

Fixes #16209.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16211
2023-11-29 15:29:17 +01:00
Anna Stuchlik
ce6b15af34 doc: remove the "preview" label from Rust driver 2023-11-29 15:01:31 +01:00
Avi Kivity
cd732b1364 Update seastar submodule
* seastar 830ce8673...55a821524 (34):
  > Revert "reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc"
  > epoll: Avoid spinning on aborted connections
Fixes #12774
Fixes #7753
Fixes #13337
  > Merge 'Sanitize test-only reactor facilities' from Pavel Emelyanov
  > test/unit: fix fmt version check
  > reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc
  > build: add spaces before () and after commands
  > reactor: use zero-initialization to initialize io_uring_params
  > Merge 'build: do not return a non-false condition if the option is off ' from Kefu Chai
  > memory: do not use variable length array
  > build: use tri_state_option() to link against Sanitizers
  > build: do not define SEASTAR_TYPE_ERASE_MORE on all builds
  > Revert "shared_future: make available() immediate after set_value()"
  > test_runner: do not throw when seastar.app fails to start
  > Merge 'Address issue where Seastar faults in toeplitz hash when reassembling fragment' from John Hester
  > defer, closeable: do not use [[nodiscard(str)]]
  > Merge 'build: generate config-specific rules using generator expressions' from Kefu Chai
  > treewide: use *_v and *_t for better readability
  > build: use different names for .pc files for each build mode
  > perftune.py: skip discovering IRQs for iSCSI disks
  > io-tester: explicit use uint64_t for boost::irange(...)
  > gate: correct the typo in doxygen comment
  > shared_future: make available() immediate after set_value()
  > smp: drop unused templates
  > include fmt/ostream.h to make headers self-sufficient
  > Support ccache in ./configure.py
  > rpc_tester: Disable -Wuninitialized when including boost.accumulators
  > file: construct directory_entry with aggregated ctor
  > file: s/ino64_t/ino_t/, s/off64_t/off_t/
  > sstring_test: include fmt/std.h only if fmtlib >= 10.0.0
  > file: do not include coroutine headers if coroutine is disabled
  > fair_queue::unregister_priority_class:fix assertion
  > Merge 'Generalize `net::udp_channel` into `net::datagram_channel`' from Michał Sala
  > Merge 'Add file::list_directory() that co_yields entries' from Pavel Emelyanov
  > http/file_handler: remove unnecessary cast

Closes scylladb/scylladb#16201
2023-11-29 14:34:30 +02:00
Kefu Chai
c40da20092 utils/pretty_printers: stop using undocumented fmt api
format_parse_context::on_error() is an undocumented API in fmt v9
and in fmt v10, see

- https://fmt.dev/9.1.0/api.html#_CPPv4I0EN3fmt16basic_format_argE
- https://fmt.dev/10.0.0/api.html#_CPPv4I0EN3fmt26basic_format_parse_contextE

despite that this API was once used in its document for fmt v10.0.0, see
https://fmt.dev/10.0.0/api.html#formatting-user-defined-types. it's
still, well, undocumented.

so, to have better compatibility, let's use the documented API in place
of undocumented one. please note, `throw_format_error()` was still
not a public API before 10.1.0, so before that release we have to
throw `fmt::format_error` explicitly. so we cannot use it yet during
the transitional period.

because the class of `fmt::format_error` is defined in `fmt/format.h`,
we need to include this header for using it.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16212
2023-11-29 12:49:04 +02:00
Pavel Emelyanov
0da37d5fa6 sstable: Generalize toc file read and parse
There are several places where TOC file is parsed into a vector of
components -- sstable::read_toc(), remove_by_toc_name() and
remove_by_registry_entry(). All three deserve some generalization.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-29 12:09:52 +03:00
Pavel Emelyanov
c5d85bdf79 s3/client: Don't GET object contents on out-of-bound reads
If S3 readable file is used inside file input stream, the latter may
call its read methods with position that is above file size. In that
case server replies with generic http error and the fact that the range
was invalid is encoded into reply body's xml.

Catching this via a wrong-reply-status exception and XML parsing is not
great, all the more so because we can know in advance that the read is
out of bounds.
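
A minimal sketch of the idea, in Python rather than the C++ of the actual patch. It assumes the object size is already known to the reader (for example from a cached stat); the function name and return convention are hypothetical and only illustrate deciding locally whether a request needs to be sent at all.

```python
from typing import Optional, Tuple


def clamp_read_range(pos: int, length: int, object_size: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive (first, last) byte range to request, or None.

    None means the read starts at or past the end of the object (or is
    empty), so no GET request is needed. Hypothetical helper, not the
    actual s3/client code.
    """
    if length <= 0 or pos >= object_size:
        return None  # out-of-bound read: nothing to fetch
    last = min(pos + length, object_size) - 1  # trim reads that cross the end
    return (pos, last)
```

A caller would return an empty buffer when the helper yields None, instead of sending the request and parsing the error reply.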

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-29 12:09:52 +03:00
Pavel Emelyanov
339182287f s3/client: Cache stats on readable_file
S3-based sstable components are immutable, so every time stat is called
there's no need to ping server again.

But the main intention of this patch is to provide stats for read calls
in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-29 12:06:54 +03:00
Calle Wilund
3b70fde3cd commitlog: Make named_files in delete_segments have updated size
Fixes #16207

commitlog::delete_segments deletes (or recycles) segments replayed.
The actual file size here is added to footprint so actual delete then
can determine iff things should be recycled or removed.
However, we build a pending delete list of named_files, and the files
we added did not have size set. Bad. Actual deletion then treated files
as zero-byte sized, i.e. footprint calculations borked.

The simple fix is to fill in the size of the objects when adding them.
Added unit test for the problem.

Closes scylladb/scylladb#16210
2023-11-29 09:58:47 +02:00
Yaron Kaikov
c3ee53f3be test.py: enable xml validation
Following https://github.com/scylladb/scylladb/issues/4774#issuecomment-1752089862

Adding back xml validation

Closes: https://github.com/scylladb/scylla-pkg/issues/3441

Closes scylladb/scylladb#16198
2023-11-29 09:02:36 +02:00
Botond Dénes
3ed6925673 Merge 'Major compaction: flush commitlog by forcing new active segment and flushing all tables' from Benny Halevy
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6

However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).

Flushing all tables in the database releases
all references to commitlog segments and thereby
maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.

However, flushing all tables too frequently might
result in tiny sstables.  Since when flushing all
keyspaces using `nodetool flush` the `force_keyspace_compaction`
api is invoked for each keyspace successively, we need a mechanism
to prevent too frequent flushes by major compaction.

Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).

In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.
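
The interval check described above can be sketched as follows. This is a simplified Python model, not the actual compaction_manager code: the class and method names are hypothetical, and only the 24-hour default is taken from the description.

```python
import time
from typing import Optional


class MajorCompactionFlushPolicy:
    """Hypothetical model of the flush-all-tables interval check."""

    def __init__(self, interval_seconds: float = 24 * 60 * 60):
        self.interval_seconds = interval_seconds
        self._last_flush_all: Optional[float] = None

    def should_flush_all_tables(self, now: Optional[float] = None) -> bool:
        """True if all tables were never flushed or the interval has elapsed."""
        now = time.monotonic() if now is None else now
        if self._last_flush_all is None:
            return True
        return now - self._last_flush_all >= self.interval_seconds

    def record_flush_all(self, now: Optional[float] = None) -> None:
        """Remember when all tables were last flushed."""
        self._last_flush_all = time.monotonic() if now is None else now
```

When `should_flush_all_tables()` returns False, the caller would fall back to flushing only the tables it is about to major-compact.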

Fixes scylladb/scylladb#15777

Closes scylladb/scylladb#15820

* github.com:scylladb/scylladb:
  docs: nodetool: flush: enrich examples
  docs: nodetool: compact: fix example
  api: add /storage_service/compact
  api: add /storage_service/flush
  compaction_manager: flush_all_tables before major compaction
  database: add flush_all_tables
  api: compaction: add flush_memtables option
  test/nodetool: jmx: fix path to scripts/scylla-jmx
  scylla-nodetool, docs: improve optional params documentation
2023-11-29 08:48:40 +02:00
Kefu Chai
65994b1e83 build: cmake: add compatibility target of dev-headers
our CI builds "dev-headers" as a gating check. but the target names
generated by CMake's Ninja Multi-Config generator do not follow
this naming convention. we could have headers:Dev, but still, it's
different from what we are using, before completely switching to
CMake, let's keep this backward compatibility by adding a target
with the same name.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-29 10:08:59 +08:00
Kefu Chai
2d284f4749 build: add an option to use CMake as the build system
as part of the efforts to migrate to the CMake-based building system,
this change enables `configure.py` to optionally create
`build.ninja` with CMake.

in this change, we add a new option named `--use-cmake` to
`configure.py` so we can create `build.ninja`. please note,
instead of using the "Ninja" generator used by Seastar's
`configure.py` script, we use "Ninja Multi-Config" generator
along with `CMAKE_CROSS_CONFIGS` setting in this project.
so that we can generate a `build.ninja` which is capable of
building the same artifacts with multiple configurations.

Fixes #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-29 10:08:59 +08:00
Nadav Har'El
88a5ddabce tablets, mv: create tablets for a new materialized view
Before this patch, trying to create a materialized view when tablets
are enabled for a keyspace results in a failure: "Tablet map not found
for table <uuid>", with uuid referring to the new view.

When a table schema is created, the handler on_before_create_column_family()
is called - and this function creates the tablet map for the new table.
The bug was that we forgot to do the same when creating a materialized
view - which is also a bona fide table.

In this patch we call on_before_create_column_family() also when
creating the materialized view. I decided *not* to create a new
callback (e.g., on_before_create_view()) and rather call the existing
on_before_create_column_family() callback - after all, a view is
a column family too.

This patch also includes a test for this issue, which fails to create
the view before this patch, and passes with the patch. The test is
in the test/topology_experimental_raft suite, which runs Scylla with
the tablets experimental feature, and will also allow me to create
tests that need multiple nodes. However, the first test added here
only needs a single node to reproduce the bug and validate its fix.

Fixes #16194.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16205
2023-11-28 21:54:32 +01:00
Kamil Braun
3582095b79 schema_tables: use smaller timestamp for base mutations included with view update
When a view schema is changed, the schema change command also includes
mutations for the corresponding base table; these mutations don't modify
the base schema but are included in case the receiver of view
mutations somehow didn't receive base mutations yet (this may in theory
happen outside Raft mode).

There are situations where the schema change command contains both
mutations that describe the current state of the base table -- included
by a view update, as explained above -- and mutations that want to
modify the base table. Such situation arises, for example, when we
update a user-defined type which is referenced by both a view and its
corresponding base table. This triggers a schema change of the view,
which generates mutations to modify the view and includes mutations of
the current base schema, and at the same time it triggers a schema
change of the base, which generates mutations to modify the base.

These two sets of mutations are conflicting with each other. One set
wants to preserve the current state of the base table while the other
wants to modify it. And the two sets of mutations are generated using
the same timestamp, which means that conflict resolution between them is
made on a per-mutation-cell basis, comparing the values in each cell and
taking the "larger" one (meaning of "larger" depends on the type of each
cell).

Fortunately, this conflict is currently benign -- or at least there is
no known situation where it causes problems.

Unfortunately, it started causing problems when I attempted to implement
group 0 schema versioning (PR scylladb/scylladb#15331), where instead of
calculating table versions as hashes of schema mutations, we would send
versions as part of schema change command. These versions would be
stored inside the `system_schema.scylla_tables` table, `version` column,
and sent as part of schema change mutations.

And then the conflict showed. One set of mutations wanted to preserve
the old value of `version` column while the other wanted to update it.
It turned out that sometimes the old `version` prevailed, because the
`version` column in `system_schema.scylla_tables` uses UUID-based
comparison (not timeuuid-based comparison). This manifested as issue
scylladb/scylladb#15530.

To prevent this, the idea in this commit is simple: when generating
mutations for the base table as part of corresponding view update, do
not use the provided timestamp directly -- instead, decrement it by one.
This way, if the schema change command contains mutations that want to
modify the base table, these modifying mutations will win all conflicts
based on the timestamp alone (they are using the same provided
timestamp, but not decremented).

One could argue that the choice of this timestamp is anyway arbitrary.
The original purpose of including base mutations during view update was
to ensure that a node which somehow missed the base mutations, gets them
when applying the view. But in that case, the "most correct" solution
should have been to use the *original* base mutations -- i.e. the ones
that we have on disk -- instead of generating new mutations for the base
with a refreshed timestamp. The base mutations that we have on disk have
smaller timestamps already (since these mutations are from the past,
when the base was last modified or created), so the conflict would also
not happen in this case.

But that solution would require doing a disk read, and we can avoid the
read while still fixing the conflict by using an intermediate solution:
regenerating the mutations but with `timestamp - 1`.
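
A toy Python model of the conflict and the fix. It assumes plain last-write-wins reconciliation where ties on the timestamp fall back to comparing values; string comparison stands in for the type-specific comparison mentioned above, and the cell values are made up for illustration.

```python
from typing import Tuple

Cell = Tuple[int, str]  # (write timestamp, value)


def reconcile(a: Cell, b: Cell) -> Cell:
    """Last-write-wins: higher timestamp wins, ties fall back to the value."""
    return max(a, b)  # tuple comparison: timestamp first, then value


ts = 100
old_version = "f"  # current base `version`, re-sent along with the view update
new_version = "a"  # `version` written by the mutation that modifies the base

# Same timestamp: the winner depends on the values, so the old one may prevail.
same_ts = reconcile((ts, old_version), (ts, new_version))

# Base mutations included by the view update use `timestamp - 1`:
# the modifying mutation now wins on the timestamp alone.
decremented = reconcile((ts - 1, old_version), (ts, new_version))
```

Here `same_ts` keeps the old value while `decremented` takes the new one, which is the behavior the patch relies on.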

Ref: scylladb/scylladb#15530

Closes scylladb/scylladb#16139
2023-11-28 21:51:18 +01:00
Benny Halevy
310ff20e1e docs: nodetool: flush: enrich examples
Provide 3 examples, like in the nodetool/compact page:
global, per-keyspace, per-table.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:48:22 +02:00
Benny Halevy
d32b90155a docs: nodetool: compact: fix example
It looks like `nodetool compact standard1` is meant
to show how to compact a specified table, not a keyspace.
Note that the previous example line is for a keyspace.
So fix the table compaction example to:
`nodetool compact keyspace1 standard1`

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:45:20 +02:00
Benny Halevy
b12b142232 api: add /storage_service/compact
For major compacting all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool compact` translates to
a sequence of `/storage_service/keyspace_compaction` calls).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
1b576f358b api: add /storage_service/flush
For flushing all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool flush` translates to
a sequence of `/storage_service/keyspace_flush` calls).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
66ba983fe0 compaction_manager: flush_all_tables before major compaction
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6

However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).

Flushing all tables in the database releases
all references to commitlog segments and thereby
maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.

However, flushing all tables too frequently might
result in tiny sstables.  Since when flushing all
keyspaces using `nodetool flush` the `force_keyspace_compaction`
api is invoked for each keyspace successively, we need a mechanism
to prevent too frequent flushes by major compaction.

Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).

In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.

Fixes scylladb/scylladb#15777

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
be763bea34 database: add flush_all_tables
Flushes all tables after forcing force_new_active_segment
of the commitlog to make sure all commitlog segments can
get recycled.

Otherwise, due to "false sharing", rarely-written tables
might inhibit recycling of the commitlog segments they reference.

After f42eb4d1ce,
that won't allow compaction to purge some tombstones based on
the min_gc_time.

To be used in the next patch by major compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
1fd85bd37b api: compaction: add flush_memtables option
When flushing is done externally, e.g. by running
`nodetool flush` prior to `nodetool compact`,
flush_memtables=false can be passed to skip flushing
of tables right before they are major-compacted.

This is useful to prevent creation of small sstables
due to excessive memtable flushing.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
7f860d612a test/nodetool: jmx: fix path to scripts/scylla-jmx
The current implementation makes no sense.

Like `nodetool_path`, base the default `jmx_path`
on the assumption that the test is run using, e.g.
```
(cd test/nodetool; pytest --nodetool=cassandra test_compact.py)
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
9324363e55 scylla-nodetool, docs: improve optional params documentation
Document the behavior if no keyspace is specified
or no table(s) are specified for a given keyspace.

Fixes scylladb/scylladb#16032

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Anna Stuchlik
bfe19c0ed2 doc: add experimental support for object storage
This commit adds information on how to enable
object storage for a keyspace.

The "Keyspace storage options" section already
existed in the doc, but it was not valid as
the support was only added in version 5.4

The scope of this commit:
- Update the "Keyspace storage options" section.
- Add the information about object storage support
  to the Data Definition> CREATE KEYSPACE section
  * Marked as "Experimental".
  * Excluded from the Enterprise docs with the
    .. only:: opensource directive.

This commit must be backported to branch-5.4,
as support for object storage was added
in version 5.4.

Closes scylladb/scylladb#16081
2023-11-28 14:27:01 +02:00
Anna Stuchlik
37f20f2628 doc: fix Rust Driver release information
This PR removes the incorrect information that
the ScyllaDB Rust Driver is not GA.

In addition, it replaces "Scylla" with "ScyllaDB".

Fixes https://github.com/scylladb/scylladb/issues/16178
2023-11-28 10:32:08 +01:00
Botond Dénes
f46cdce9d3 Merge 'Make memtable flush tolerate misconfigured S3 storage' from Pavel Emelyanov
Nowadays if memtable gets flushed into misconfigured S3 storage, the flush fails and aborts the whole scylla process. That's not very elegant. First, because upon restart garbage collecting non-sealed sstables would fail again. Second, because re-configuring an endpoint can be done at runtime, scylla re-reads this config upon HUP signal.

Flushing memtable restarts when seeing ENOSPC/EDQUOT errors from on-disk sstables. This PR extends this to handle misconfigured S3 endpoints as well.

fixes: #13745

Closes scylladb/scylladb#15635

* github.com:scylladb/scylladb:
  test: Add object_store test to validate config reloading works
  test: Add config update facility to test cluster
  test: Make S3_Server export config file as pathlib.Path
  config: Make object storage config updateable_value_source
  memtable: Extend list of checking codes
  sstables/storage/s3: Fix missing TOC status check
  s3/client: Map http exceptions into storage_io_error
  exceptions: Extend storage_io_error construction options
2023-11-28 09:33:37 +02:00
Botond Dénes
3ccf1e020b Merge ' compaction: abort compaction tasks' from Aleksandra Martyniuk
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.

Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.

Closes scylladb/scylladb#16177

* github.com:scylladb/scylladb:
  test: test abort of compaction task that isn't started yet
  test: test running compaction task abort
  tasks: fail if a task was aborted
  compaction: abort task manager compaction tasks
2023-11-28 09:08:04 +02:00
Pavel Emelyanov
1efddc228d sstable: Do not nest io-check wrappers into each other
When sealing an sstable on local storage, the storage driver performs
several flushes on a file that is a directory opened via checked-file.
Flush calls are wrapped with sstable_write_io_check, but that's
excessive: the checked file wraps flushes with io-checks on its own.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16173
2023-11-27 15:53:02 +02:00
Kefu Chai
724a6e26f3 cql3: define format_as() for formatting cql3::cql3_type::raw
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

to define a formatter which can be used by raw class and its derived
classes, we have to put the full template specialization before the
call sites. also, please note, the forward declaration is not enough,
as the compile-time formatter check of fmt requires the definition of
formatter. fmt v10 also enables us to use `format_as()` to format
a certain type with the return value of `format_as()`, which fulfills
our needs.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16125
2023-11-27 15:28:19 +02:00
Kefu Chai
0b69a1badc transport: cast unaligned<T> to T for formatting it
fmt v10 does not cast unaligned<T> to T when formatting it;
instead it insists on finding a matching fmt::formatter<> specialization for it.
that's why we have FTBFS with fmt v10 when printing
these packed<T> variables.

in this change, we just cast them to the underlying types before
formatting them. because seastar::unaligned<T> does not provide
a method for accessing the raw value, neither does it provide
a type alias of the type of the underlying raw value, we have
to cast to the type without deducing it from the printed value.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16167
2023-11-27 15:26:13 +02:00
Gleb Natapov
e68e998b15 test: add test to check error handling during tablet draining
The test checks that the topology operation is aborted if an error
happens during tablet migration stage.
2023-11-27 15:06:52 +02:00
Gleb Natapov
b1c0b57acf test: fix test_topology_streaming_failure test to not grep the whole file
A cluster can be reused between tests, so let's grep only the part of the
log that is relevant for the test itself.
2023-11-27 15:05:21 +02:00
Petr Gusev
dca28417b2 storage_service: drop unused method handle_state_replacing_update_pending_ranges 2023-11-27 12:37:26 +01:00
Tomasz Grabiec
ae5220478c tablets: Release group0 guard when waiting for streaming to finish
This bug manifested as delays in DDL statement execution, which had to
wait until streaming is finished so that the topology change
coordinator releases the guard.

The reason is that topology change coordinator didn't release the
group0 guard if there is no work to do with active migrations, and
awaits the condition variable without leaving the scope.

Fixes #16182

Closes scylladb/scylladb#16183
2023-11-27 12:24:27 +01:00
Gleb Natapov
c83ff5a0dd storage_service: add error injection into the tablet migration code 2023-11-27 13:09:58 +02:00
Gleb Natapov
4ebdddc31b storage_service: topology coordinator: rollback on handle_tablet_migration failure during tablet_draining stage
During remove or decommission, as a first step, tables are drained from
the leaving node. Theoretically this step may fail. Roll back the
topology operation if it happens. Since some tables may stay in migration
state, the topology needs to go to the tablet_migration state. Let's do it
always, since it should be safe to do even if there are no ongoing
tablet migrations.
2023-11-27 13:09:58 +02:00
Nadav Har'El
8d040325ab cql: fix SELECT toJson() or SELECT JSON of time column
The implementation of "SELECT TOJSON(t)" or "SELECT JSON t" for a column
of type "time" forgot to put the time string in quotes. The result was
invalid JSON. This patch is a one-liner fixing this bug.
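
To illustrate why the missing quotes matter, here is a small Python check using the standard `json` module. The two strings only mimic the shape of the output before and after the fix; the column name and the time value are made up.

```python
import json

unquoted = '{"t": 08:12:54.123456789}'   # shape of the buggy output
quoted = '{"t": "08:12:54.123456789"}'   # shape of the fixed output


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(unquoted), is_valid_json(quoted))
```

Only the quoted form parses, and the time comes back as a JSON string.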

This patch also removes the "xfail" marker from one xfailing test
for this issue which now starts to pass. We also add a second test for
this issue - the existing test was for "SELECT TOJSON(t)", and the second
test shows that "SELECT JSON t" had exactly the same bug - and both are
fixed by the same patch.

We also had a test translated from Cassandra which exposed this bug,
but that test continues to fail because of other bugs, so we just
need to update the xfail string.

The patch also fixes one C++ test, test/boost/json_cql_query_test.cc,
which enshrined the *wrong* behavior - JSON output that isn't even
valid JSON - and had to be fixed. Unlike the Python tests, the C++ test
can't be run against Cassandra, and doesn't even run a JSON parser
on the output, which explains how it came to enshrine wrong output
instead of helping to discover the bug.

Fixes #7988

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16121
2023-11-27 10:03:04 +02:00
Anna Stuchlik
24d5dbd66f doc: replace the OSS-only link on the Raft page
This commit replaces the link to the OSS-only page
(the 5.2-to-5.4 upgrade guide not present in
the Enterprise docs) on the Raft page.

While providing the link to the specific upgrade
guide is more user-friendly, it causes build failures
of the Enterprise documentation. I've replaced
it with the link to the general Upgrade section.

The ".. only:: opensource" directive used to wrap
the OSS-only content correctly excludes the content
from the Enterprise docs - but it doesn't prevent
build warnings.

This commit must be backported to branch-5.4 to
prevent errors in all versions.

Closes scylladb/scylladb#16176
2023-11-27 08:52:58 +02:00
Kefu Chai
c937827308 mutation_query: add formatter for reconcilable_result::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for
reconcilable_result::printer, and remove its operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16186
2023-11-26 20:20:50 +02:00
Konstantin Osipov
f0aa325187 test: provide overview of the contents of test/ directory
Fixes #16080

Closes scylladb/scylladb#16088
2023-11-26 15:51:07 +02:00
Marcin Maliszkiewicz
81be3e0935 test/alternator/run: port -h and --omit-scylla-output options from cql-pytest
Closes scylladb/scylladb#16171
2023-11-26 13:52:01 +02:00
Botond Dénes
fe7c81ea30 Update ./tools/jmx and ./tools/java submodules
* ./tools/jmx 05bb7b68...80ce5996 (4):
  > StorageService: Normalize endpoint inetaddress strings to java form

Fixes #16039

  > ColumnFamilyStore: only quote table names if necessary
  > APIBuilder: allow quoted scope names
  > ColumnFamilyStore: don't fail if there is a table with ":" in its name

Fixes #16153

* ./tools/java 10480342...26f5f71c (1):
  > NodeProbe: allow addressing table name with colon in it

Also needed for #16153

Closes scylladb/scylladb#16146
2023-11-26 13:35:38 +02:00
Kefu Chai
ba3dce3815 build: do escape "\" in regular string
in Python, a raw string is created using 'r' or 'R' prefix. when
creating the regex using Python string, sometimes, we have to use
"\" to escape the parenthesis so the tools like "sed" can consider
the parenthesis as a capture group. but "\" is also used to escape
strings in Python, in order to put "\" as it is, we use "\" instead
of escaping "\" with "\\" which is obscure. when generating rules,
we use multiple-lines string and do not want to have an empty line
at the beginning of the string so added "\" continuation mark.

but we fail to escape some of the "\" in the string, and just put
"\(", despite that Python accepts it after failing to find a matched
escaped char for it, and interprets it as "\\(". this should still
be considered a misuse of oversight. with python's warning enabled,
one is able see its complaints.

in this change, we escape the "\" properly.
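
The behavior described above can be observed with a short, self-contained Python sketch. It is not taken from the build scripts; the helper name is hypothetical. It compiles three spellings of the same two-character string and records whether the compiler complains about an invalid escape sequence (a DeprecationWarning or SyntaxWarning, depending on the Python version).

```python
import warnings


def compile_with_warnings(source: str):
    """Compile and run `source`; return (value of `s`, invalid-escape flag)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is not filtered
        code = compile(source, "<demo>", "exec")
    namespace = {}
    exec(code, namespace)
    flagged = any("invalid escape sequence" in str(w.message) for w in caught)
    return namespace["s"], flagged


# Three spellings of the same string: a backslash followed by "(".
unescaped = r's = "\("'    # accepted, but flagged as an invalid escape
escaped = r's = "\\("'     # backslash escaped properly
raw = r's = r"\("'         # raw string literal

for src in (unescaped, escaped, raw):
    print(src, "->", compile_with_warnings(src))
```

All three produce the same string; only the unescaped spelling triggers the warning.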

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16179
2023-11-26 13:34:10 +02:00
Kefu Chai
3053d63c7f main: notify systemd that the service is ready
this change addresses a regression introduced by
f4626f6b8e, which stopped notifying
systemd with the status that scylla is READY. without the
notification, systemd would wait in vain for the readiness of
scylla.

Refs f4626f6b8e

Fixes #16159
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16166
2023-11-26 10:38:53 +02:00
Aleksandra Martyniuk
9c2c964b8e test: test abort of compaction task that isn't started yet
Test whether a task whose parent was aborted has a proper status.
2023-11-24 19:25:27 +01:00
Aleksandra Martyniuk
8639eae0ce test: test running compaction task abort
Test whether a task which is aborted while running has a proper status.
2023-11-24 19:25:20 +01:00
Botond Dénes
a472700309 Merge 'Minor fixes and refactors' from Kamil Braun
- remove some code that is obsolete in newer Scylla versions,
- fix some minor bugs. These bugs appear to be benign, there are no known issues caused by them, but fixing them is a good idea nevertheless,
- refactor some code for better maintainability.

Parts of this PR were extracted from https://github.com/scylladb/scylladb/pull/15331 (which was merged but later reverted), parts of it are new.

Closes scylladb/scylladb#16162

* github.com:scylladb/scylladb:
  test/pylib: log_browsing: fix type hint
  migration_manager: take `abort_source&` in get_schema_for_read/write
  migration_manager: inline merge_schema_in_background
  migration_manager: remove unused merge_schema_from overload
  migration_manager: assume `canonical_mutation` support
  migration_manager: add `std::move` to avoid a copy
  schema_tables: refactor `scylla_tables(schema_features)`
  schema_tables: pass `reload` flag when calling `merge_schema` cross-shard
  system_keyspace: fix outdated comment
2023-11-24 17:34:21 +02:00
Patryk Jędrzejczak
15d3ed4357 test: topology: update run_first lists
`run_first` lists in `suite.yaml` files provide a simple way to
shorten the tests' average running time by running the slowest
tests first.
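
A toy Python model of why the order matters. It assumes tests are handed to the first free worker in list order, which is a simplification of what the test runner does; the durations and worker count are made up.

```python
import heapq
from typing import Iterable


def makespan(durations: Iterable[float], workers: int) -> float:
    """Wall-clock time when each test goes to the first free worker."""
    free_at = [0.0] * workers          # time at which each worker becomes free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + d)
    return max(free_at)


slow_last = makespan([1, 1, 1, 1, 4], workers=2)
slow_first = makespan([4, 1, 1, 1, 1], workers=2)
print(slow_last, slow_first)
```

With the slow test scheduled last, one worker idles while the other finishes it; scheduling it first lets the short tests fill the other worker in the meantime.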

We update these lists, since they got outdated over time:
- `test_topology_ip` was renamed to `test_replace`
   and changed suite,
- `test_tablets` changed suite,
- new slow tests were added:
  - `test_cluster_features`,
  - `test_raft_cluster_features`,
  - `test_raft_ignore_nodes`,
  - `test_read_repair`.

Closes scylladb/scylladb#16104
2023-11-24 16:18:30 +01:00
Aleksandra Martyniuk
c74b3ec596 tasks: fail if a task was aborted
run() method of task_manager::task::impl does not have to throw when
a task is aborted with task manager api. Thus, a user will see that
the task finished successfully which makes it inconsistent.

Finish a task with a failure if it was aborted with task manager api.
2023-11-24 15:45:00 +01:00
Aleksandra Martyniuk
aa7bba2d8b compaction: abort task manager compaction tasks
Set top level compaction tasks as abortable.

Compaction tasks which have no children, i.e. compaction task
executors, have abort method overridden to stop compaction data.
2023-11-24 15:44:34 +01:00
Kefu Chai
ca31dab9d2 sstable: drop repaired_at related code
before we support incremental repair, there is no point in having the
code path setting / getting it. and even worse, it causes confusion.

so, in this change, we

* just set the field to 0,
* drop the corresponding field in metadata_collector, as we never
  update it.
* add a comment to explain why this variable is initialized to 0

Fixes #16098
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16169
2023-11-24 15:12:25 +02:00
Botond Dénes
697cf41b9b Merge 'repair: Introduce small table optimization' from Asias He
repair: Introduce small table optimization

*) Problem:

We have seen in the field it takes longer than expected to repair system tables
like system_auth which has a tiny amount of data but is replicated to all nodes
in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The
main reason for the slowness is that even if the amount of data is small,
repair has to walk through all the token ranges, that is num_tokens *
number_of_nodes_in_the_cluster. The overhead of the repair protocol for each
token range dominates due to the small amount of data per token range. Another
reason is the high network latency between DCs makes the RPC calls used to
repair consume more time.

*) Solution:

To solve this problem, a small table optimization for repair is introduced in
this patch. A new repair option is added to turn on this optimization.

- No token range needs to be specified by the user. It will repair all token
ranges automatically.

- Users only need to send the repair rest api to one of the nodes in the
cluster. It can be any of the nodes in the cluster.

- It does not require the RF to be configured to replicate to all nodes in the
cluster. This means it can work with any tables as long as the amount of data
is low, e.g., less than 100MiB per node.

*) Performance:

1)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}

Before:
```
repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1537, round_nr=4612,
round_nr_fast_path_already_synced=4611,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds,
tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2,
0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848},
{127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0},
{127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s,
tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={}
```

After:
```
repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds,
tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178},
{127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1},
{127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286},
{127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286},
{127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026},
{127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={},
rx_row_nr_peer={}
```

Speedup in repair time = 1.5648403 seconds / 0.001459798 seconds = 1072X
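
As a rough cross-check of the figures in test 1, the sketch below recomputes the ratio from the two stats lines. The value of `num_tokens` (256) and the extra wrap-around range are assumptions that this message does not state; they are used only because they reproduce the logged `ranges_nr`.

```python
# Figures copied from the "Before" and "After" stats lines of test 1.
before = {"ranges_nr": 1537, "rpc_call_nr": 115289, "duration_s": 1.5648403}
after = {"ranges_nr": 1, "rpc_call_nr": 80, "duration_s": 0.001459798}

# Assumptions (not stated in the message): 256 vnodes per node, plus one
# extra range for the wrap-around of the token ring.
num_tokens = 256
nodes = 6
expected_ranges = num_tokens * nodes + 1

speedup = before["duration_s"] / after["duration_s"]
rpc_per_range_before = before["rpc_call_nr"] / before["ranges_nr"]

print(f"expected ranges: {expected_ranges}")
print(f"speedup: {speedup:.0f}X")
print(f"RPC calls per range before: {rpc_per_range_before:.1f}")
```

Under the same assumptions, 9 nodes give 9 * 256 + 1 = 2305 ranges, which matches the `ranges_nr` logged in test 3.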

2)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}

Same test as above, except that a 5 ms delay is added to simulate
multi-DC network latency:

The time to repair is reduced from 333s to 0.2s.

333.26758 s / 0.22625381s = 1472.98

3)

3 DCs, each DC has 3 nodes, 9 nodes in the cluster. RF = {dc1: 3, dc2: 3, dc3: 3},
10 ms network latency

Before:

```
repair - repair[86124a4a-fd26-42ea-a078-437ca9e372df]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=2305, round_nr=6916,
round_nr_fast_path_already_synced=6915,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=276630, tx_hashes_nr=0, rx_hashes_nr=8, duration=986.34015
seconds, tx_row_nr=7, rx_row_nr=0, tx_row_bytes=1246, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0},
{127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_nr={{127.0.57.1, 1},
{127.0.57.2, 1}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}},
row_from_disk_bytes_per_sec={{127.0.57.1, 1.72105e-07}, {127.0.57.2,
1.72105e-07}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 0.00101385},
{127.0.57.2, 0.00101385}, {127.0.57.3, 0}, {127.0.57.4, 0},
{127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0},
{127.0.57.9, 0}} Rows/s, tx_row_nr_peer={{127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}}, rx_row_nr_peer={}
```

After:

```
repair - repair[07ebd571-63cb-4ef6-9465-6e5f1e98f04f]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=1, round_nr=4,
round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=128, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.6052915
seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
178}, {127.0.57.4, 178}, {127.0.57.5, 178}, {127.0.57.6, 178},
{127.0.57.7, 178}, {127.0.57.8, 178}, {127.0.57.9, 178}},
row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}},
row_from_disk_bytes_per_sec={{127.0.57.1, 0.00037793}, {127.0.57.2,
0.00037793}, {127.0.57.3, 0.00037793}, {127.0.57.4, 0.00037793},
{127.0.57.5, 0.00037793}, {127.0.57.6, 0.00037793}, {127.0.57.7,
0.00037793}, {127.0.57.8, 0.00037793}, {127.0.57.9, 0.00037793}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 2.22634},
{127.0.57.2, 2.22634}, {127.0.57.3, 2.22634}, {127.0.57.4,
2.22634}, {127.0.57.5, 2.22634}, {127.0.57.6, 2.22634},
{127.0.57.7, 2.22634}, {127.0.57.8, 2.22634}, {127.0.57.9,
2.22634}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
```

The time to repair is reduced from 986s (16 minutes) to 1.6s

*) Summary

So, a speedup of more than 1000X is observed for this common use of the
system table repair procedure.

Fixes #16011
Refs  #15159

Closes scylladb/scylladb#15974

* github.com:scylladb/scylladb:
  repair: Introduce small table optimization
  repair: Convert put_row_diff_with_rpc_stream to use coroutine
2023-11-24 15:11:42 +02:00
Kamil Braun
1f56962591 Merge 'test: topology: test concurrent bootstrap' from Patryk Jędrzejczak
We add a test for concurrent bootstrap in the raft-based topology.

Additionally, we extend the testing framework with a new function -
`ManagerClient.servers_add`. It allows adding multiple servers
concurrently to a cluster.

This PR is the first step to fix #15423. After merging it, if the new test
doesn't fail for some time in CI, we can:
- use `ManagerClient.servers_add` in other tests wherever possible,
- start initial servers concurrently in all suites with
  `initial_size > 0`.

Closes scylladb/scylladb#16102

* github.com:scylladb/scylladb:
  test: topology: add test_concurrent_bootstrap
  test: ManagerClient: introduce servers_add
  test: ManagerClient: introduce _create_server_add_data
2023-11-24 12:41:05 +01:00
Kefu Chai
f99223919a compaction: add formatter for map<timestamp_type, vector<shared_sstable>>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for
map<timestamp_type, vector<shared_sstable>>. since the operator<<
for this type is only used in the .cc file, and its only use case
is to provide the formatter for fmt, the operator<< based
formatter is removed in this change.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16163
2023-11-24 11:56:28 +02:00
Kamil Braun
5acfcd8ef5 Merge 'raft: send group0 RPCs only if the destination group0 server is seen as alive' from Piotr Dulikowski
In topology on raft mode, the events "new node starts its group0 server"
and "new node is added to group0 configuration" are not synchronized
with each other. Therefore it might happen that the cluster starts
sending commands to the new node before the node starts its server. This
might lead to harmless, but ugly messages like:

    INFO  2023-09-27 15:42:42,611 [shard 0:stat] rpc - client
    127.0.0.1:56352 msg_id 2:  exception "Raft group
    b8542540-5d3b-11ee-99b8-1052801f2975 not found" in no_wait handler
    ignored

In order to solve this, the failure detector verb is extended to report
information about whether group0 is alive. The raft rpc layer will drop
messages to nodes whose group0 is not seen as alive.

Tested by adding a delay before group0 is started on the joining node,
running all topology tests and grepping for the aforementioned log
messages.

Fixes: scylladb/scylladb#15853
Fixes: scylladb/scylladb#15167

Closes scylladb/scylladb#16071

* github.com:scylladb/scylladb:
  raft: rpc: introduce destination_not_alive_error
  raft: rpc: drop RPCs if the destination is not alive
  raft: pass raft::failure_detector to raft_rpc
  raft: transfer information about group0 liveness in direct_fd_ping
  raft: add server::is_alive
2023-11-24 10:34:05 +01:00
Patryk Jędrzejczak
a8d06aa9fd test: topology: add test_concurrent_bootstrap
We add a test for concurrent bootstrap support in the raft-based
topology.

The plan is to make this test temporary. In the future, we will:
- use ManagerClient.servers_add in other tests wherever possible,
- start initial servers concurrently in all suites with
  initial_size > 0.
So, this test will not test anything unique.

We could make the changes proposed above now instead of adding
this small test. However, if we did that and it turned out that
concurrent bootstrap is flaky in CI, we would make almost every CI
run fail with many failures. We want to avoid such a situation.
Running only this test for some time in CI will reduce the risk
and make investigating any potential failures easier.
2023-11-24 09:39:01 +01:00
Patryk Jędrzejczak
cd7b282db6 test: ManagerClient: introduce servers_add
We add a new function - servers_add - that allows adding multiple
servers concurrently to a cluster. It makes use of a concurrent
bootstrap now supported in the raft-based topology.

servers_add doesn't have the replace_cfg parameter. The reason is
that we don't support concurrent replace operations, at least for
now.

There is an implementation detail in ScyllaCluster.add_servers. We
cannot simply do multiple calls to add_server concurrently. If we
did that in an empty cluster, every node would take itself as the
only seed and start a new cluster. To solve this, we introduce a
new field - initial_seed. It is used to choose one of the servers
as a seed for all servers added concurrently to an empty cluster.

Note that the add_server calls in asyncio.gather in add_servers
cannot race with each other when setting initial_seed because
there is only one thread.
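
The single-thread argument above can be sketched with a toy model. `ToyCluster` and its methods are hypothetical stand-ins, not the real `ScyllaCluster` code; the sketch only shows why a check-and-set with no `await` in between cannot race under `asyncio.gather`.

```python
import asyncio

class ToyCluster:
    """Hypothetical stand-in for ScyllaCluster; not the real framework class."""

    def __init__(self):
        self.initial_seed = None
        self.servers = {}  # server ip -> seed it was started with

    async def add_server(self, ip):
        # Check-and-set with no await in between: on a single-threaded
        # event loop no other coroutine can run between these statements.
        if self.initial_seed is None:
            self.initial_seed = ip
        seed = self.initial_seed
        await asyncio.sleep(0)  # simulate yielding while the server boots
        self.servers[ip] = seed
        return ip

    async def add_servers(self, ips):
        return await asyncio.gather(*(self.add_server(ip) for ip in ips))

ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
cluster = ToyCluster()
added = asyncio.run(cluster.add_servers(ips))
seeds = set(cluster.servers.values())
print(f"all servers share one seed: {len(seeds) == 1}")
```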

In the future, we will also start all initial servers concurrently
in ScyllaCluster.install_and_start. The changes in this commit were
designed in a way that will make changing install_and_start easy.
2023-11-24 09:39:01 +01:00
Patryk Jędrzejczak
aca90e6640 test: ManagerClient: introduce _create_server_add_data
We introduce this function to avoid code duplication. After the
following commits, it will also be used in the new
ManagerClient.servers_add function.
2023-11-24 09:39:01 +01:00
Botond Dénes
c47a63835e Merge 'test/sstable_compaction_test: check every sstable replaced sstable ' from Kefu Chai
before this change, in sstable_run_based_compaction_test, we check
every 4 sstables, to verify that we close the sstable to be replaced
in a batch of 4.

since the integer-based generation identifier is monotonically
incremental, we can assume that the identifiers of sstables are like
0, 1, 2, 3, .... so if the compaction consumes sstable in a
batch of 4, the identifier of the first one in the batch should
always be the multiple of 4. unfortunately, this test does not work
if we use uuid-based identifier.

but if we take a closer look at how we create the dataset, we can
have following facts:

1. the `compaction_descriptor` returned by
   `sstable_run_based_compaction_strategy_for_tests` never
   set `owned_ranges` in the returned descriptor
2. in `compaction::setup_sstable_reader`, `mutation_reader::forward::no`
   is used, if `_owned_ranges_checker` is empty
3. `mutation_reader_merger` respects the `fwd_mr` passed to its
   ctor, so it closes current sstable immediately when the underlying
   mutation reader reaches the end of stream.

in other words, we close every sstable once it is fully consumed in
sstable_compaction_test. and the reason why the existing test passes
is that we just sample the sstables whose generation id is a multiple
of 4. what happens when we perform compaction in this test is:

1. replace 5 with 33, closing 5
2. replace 6 with 34, closing 6
3. replace 7 with 35, closing 7
4. replace 8 with 36, closing 8   << let's check here.. good, go on!
5. replace 13 with 37, closing 13
...
8. replace 16 with 40, closing 16 << let's check here.. also, good, go on!

so, in this change, we just check all old sstables, to verify that
we close each of them once it is fully consumed.

Fixes https://github.com/scylladb/scylladb/issues/16073

Closes scylladb/scylladb#16074

* github.com:scylladb/scylladb:
  test/sstable_compaction_test: check every sstable replaced sstable
  test/sstable_compaction_test: s/old_sstables.front()/old_sstable/
2023-11-24 07:25:28 +02:00
Kamil Braun
35bb025f99 test/pylib: log_browsing: fix type hint 2023-11-23 17:23:47 +01:00
Kamil Braun
819f542ee6 migration_manager: take abort_source& in get_schema_for_read/write
No callsite needed the `nullptr` case, so we can convert pointer to
reference.
2023-11-23 17:23:47 +01:00
Kamil Braun
ddfe4f65a8 migration_manager: inline merge_schema_in_background
There was only one use site of this template.
2023-11-23 17:23:47 +01:00
Kamil Braun
42f6c5c2db migration_manager: remove unused merge_schema_from overload
The `frozen_mutation` version is now dead code.
2023-11-23 17:23:47 +01:00
Kamil Braun
8f5c2c88b8 migration_manager: assume canonical_mutation support
Support for `canonical_mutation`s was added way back in Scylla 3.2. A
lot of code in `migration_manager` is still checking whether the old
`frozen_mutations` are received or need to be sent.

We no longer need this code, since we don't support version skips during
upgrade (and certainly not upgrades like 3.2->5.4).

Leave sanity checks in place, but otherwise delete the
`frozen_mutation` branches.
2023-11-23 17:23:47 +01:00
Kamil Braun
0479e5529a migration_manager: add std::move to avoid a copy 2023-11-23 17:23:47 +01:00
Kamil Braun
269a189526 schema_tables: refactor scylla_tables(schema_features)
The `scylla_tables` function gives a different schema definition
for the `system_schema.scylla_tables` table, depending on whether
certain schema features are enabled or not.

The way it was implemented, we had to write `θ(2^n)` amount
of code and comments to handle `n` features.

Refactor it so that the amount of code we have to write to handle `n`
features is `θ(n)`.
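
A minimal sketch of the linear approach, in Python rather than the C++ of the actual change. The feature and column names are hypothetical placeholders; the point is only that each feature contributes one table entry instead of one branch per combination.

```python
# Hypothetical feature flags and column names, for illustration only.
BASE_COLUMNS = ["base_col_1", "base_col_2"]
FEATURE_COLUMNS = [
    ("feature_a", "col_a"),
    ("feature_b", "col_b"),
    ("feature_c", "col_c"),
]

def scylla_tables_columns(enabled_features):
    """Build the column list with one table entry per feature."""
    columns = list(BASE_COLUMNS)
    for feature, column in FEATURE_COLUMNS:
        if feature in enabled_features:
            columns.append(column)
    return columns

# Three table entries cover all 2**3 = 8 feature combinations.
all_off = scylla_tables_columns(set())
some_on = scylla_tables_columns({"feature_a", "feature_c"})
print(all_off)
print(some_on)
```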
2023-11-23 17:23:47 +01:00
Raphael S. Carvalho
157a5c4b1b treewide: Avoid using namespace sstables in header to avoid conflicts
That's needed for compaction_group.hh to be included in headers.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-11-23 17:36:57 +02:00
Kamil Braun
c3257bf546 Revert "test: cql_test_env: Interrupt all components on cql_test_env teardown"
This reverts commit 93ee7b7df9.

It's causing assertion failures when shutting down `cql_test_env` in
boost unit tests: scylladb/scylladb#16144
2023-11-23 15:32:13 +01:00
Gleb Natapov
7267376eac storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state
Handle the barrier failure by sleeping for a "ring delay" and
continuing. The purpose of the barrier is to wait for all reads to
the old replica set to complete and to fence the remaining requests. If the
barrier fails, we give the fence some time to propagate and continue with
the topology change. If the fence did not propagate, we may have stale reads,
but this is no worse than what we have with the gossiper.
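
The control flow described above can be sketched as follows. This is a Python illustration, not the coordinator code; the function name, the return strings, and the delay value of 30 are hypothetical.

```python
import time

def barrier_or_ring_delay(barrier, ring_delay_s, sleep=time.sleep):
    """Run the barrier once; on failure wait for the ring delay and go on."""
    try:
        barrier()
        return "barrier_ok"
    except Exception:
        # No retry loop: give the fence time to propagate, then continue.
        sleep(ring_delay_s)
        return "continued_after_ring_delay"

def failing_barrier():
    raise RuntimeError("node unreachable")

slept = []  # records requested delays instead of really sleeping
outcome_ok = barrier_or_ring_delay(lambda: None, 30, sleep=slept.append)
outcome_fail = barrier_or_ring_delay(failing_barrier, 30, sleep=slept.append)
print(outcome_ok, outcome_fail, slept)
```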
2023-11-23 15:30:10 +02:00
Gleb Natapov
7ea8fa459c storage_service: topology coordinator: do not retry the metadata barrier forever in left_token_ring state
Handle the barrier failure by sleeping for a "ring delay" and
continuing. The purpose of the barrier is to wait for unfinished writes
to the decommissioned node to complete. If the barrier fails, we give them
some time to complete and then proceed with the node decommission. The worst
thing that may happen is that some writes fail because the node is
shut down.
2023-11-23 15:30:10 +02:00
Gleb Natapov
11b7ee32ec storage_service: topology coordinator: return a node that is being removed from get_excluded_nodes
A node that is being removed is dead, so there is no need to talk to it.
2023-11-23 15:30:10 +02:00
Gleb Natapov
4c76b8b59f storage_service: topology_coordinator: use new rollback_to_normal state in the rollback procedure
Go through the rollback_to_normal state when the node needs to move to
normal during the rollback, and update the fence in this state before moving
the node to normal. This guarantees that the fence update will not
be missed. Note that when a node moves to the left state it already passes
through left_token_ring, which guarantees the same.
2023-11-23 15:29:36 +02:00
Gleb Natapov
95dd0e453d storage_service: topology coordinator: add rollback_to_normal node state
When a topology coordinator rolls back from an unsuccessful topology operation,
it advances the fence (which is now in the raft state) after moving to the
normal state. We do not want this to fail (only a majority of nodes is needed
for it to succeed), but currently it may fail if the coordinator moves
to another node after changing the rollback node's state to normal but
before updating the fence. To solve that, the rollback operation needs to
go through a new rollback_to_normal state that will do the fencing
before moving to normal. This patch introduces that state, but does not use
it yet.
2023-11-23 15:27:28 +02:00
Kamil Braun
5223d32fab schema_tables: pass reload flag when calling merge_schema cross-shard
In 0c86abab4d `merge_schema` obtained a new flag, `reload`.

Unfortunately, the flag was assigned a default value, which I think is
almost always a bad idea, and indeed it was in this case. When
`merge_schema` is called on shard different than 0, it recursively calls
itself on shard 0. That recursive call forgot to pass the `reload` flag.

Fix this.
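
A toy Python illustration of how a default value hides a forgotten argument. The function names and return shape are hypothetical. The commit itself only states that the recursive call now passes `reload`; dropping the default, as the second function does, is an extra step shown here as an assumption.

```python
def merge_schema_buggy(shard, mutations, reload=False):
    # The default value lets the recursive call run without `reload`.
    if shard != 0:
        return merge_schema_buggy(0, mutations)  # bug: `reload` is dropped
    return {"mutations": mutations, "reload": reload}

def merge_schema_fixed(shard, mutations, *, reload):
    # No default: forgetting `reload` is now a TypeError at the call site.
    if shard != 0:
        return merge_schema_fixed(0, mutations, reload=reload)
    return {"mutations": mutations, "reload": reload}

buggy = merge_schema_buggy(3, ["m1"], reload=True)
fixed = merge_schema_fixed(3, ["m1"], reload=True)
print(buggy["reload"], fixed["reload"])
```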
2023-11-23 14:06:40 +01:00
Kamil Braun
de3607810d system_keyspace: fix outdated comment 2023-11-23 14:06:27 +01:00
Piotr Dulikowski
c58ff554d8 raft: rpc: introduce destination_not_alive_error
Add a new destination_not_alive_error, thrown from two-way RPCs when
the RPC is not issued because the destination is not reported as
alive by the failure detector.
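
A minimal Python sketch of the behaviour described here and in the related "drop RPCs if the destination is not alive" commit. `ToyRaftRpc`, its method names, and the message labels are hypothetical; the real code is C++ in the raft RPC layer.

```python
class DestinationNotAliveError(Exception):
    """Hypothetical Python analogue of destination_not_alive_error."""

class ToyRaftRpc:
    def __init__(self, is_alive):
        self._is_alive = is_alive  # callable: server id -> bool
        self.sent = []

    def send_one_way(self, dst, msg):
        # One-way requests to a dead destination are dropped silently.
        if not self._is_alive(dst):
            return False
        self.sent.append((dst, msg))
        return True

    def send_two_way(self, dst, msg):
        # Two-way RPCs report the drop to the caller via the new error.
        if not self._is_alive(dst):
            raise DestinationNotAliveError(dst)
        self.sent.append((dst, msg))
        return "reply"

alive = {"n1"}
rpc = ToyRaftRpc(lambda node: node in alive)
dropped = rpc.send_one_way("n2", "one_way_request")
try:
    rpc.send_two_way("n2", "two_way_request")
    raised = False
except DestinationNotAliveError:
    raised = True
print(dropped, raised, rpc.sent)
```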

In snapshot transfer code, lower the verbosity of the message printed in
case it fails on the new error. This is done to prevent flakiness in the
CI - in case of slow runs, nodes might get spuriously marked as dead if
they are busy, and a message with the "error" verbosity can cause some
tests to fail.
2023-11-23 11:14:28 +01:00
Kamil Braun
03ecc8457c Merge 'raft topology: reject replace if the node being replaced is not dead' from Patryk Jędrzejczak
The replace operation is defined to succeed only if the node being
replaced is dead. We should reject this operation when the failure
detector considers the node being replaced alive.

Apart from adding this change, this PR adds a test case -
`test_replacing_alive_node_fails` - that verifies it. A few testing
framework adjustments were necessary to implement this test and
to avoid flakiness in other tests that use the replace operation after
the change. From now, we need to ensure that all nodes see the
node being replaced as dead before starting the replace. Otherwise,
the check added in this PR could reject the replace.

Additionally, this PR changes the replace procedure in a way that
if the replacing node reuses the IP of the node being replaced, other
nodes can see it as alive only after the topology coordinator accepts
its join request. The replacing node may become alive before the
topology coordinator checks if the node being replaced is dead. If
that happens and the replacing node reuses the IP of the node being
replaced, the topology coordinator cannot know which of these two
nodes is alive and whether it should reject the join request.

Fixes #15863

Closes scylladb/scylladb#15926

* github.com:scylladb/scylladb:
  test: add test_replacing_alive_node_fails
  raft topology: reject replace if the node being replaced is not dead
  raft topology: add the gossiper ref to topology_coordinator
  test: test_cluster_features: stop gracefully before replace
  test: decrease failure_detector_timeout_in_ms in replace tests
  test: move test_replace to topology_custom
  test: server_add: wait until the node being replaced is dead
  test: server_add: add support for expected errors
  raft topology: join: delay advertising replacing node if it reuses IP
  raft topology: join: fix a condition in validate_joining_node
2023-11-23 10:31:59 +01:00
Kefu Chai
55103f4a6b hints: move formatter of db::hints::sync_point to test
the operator<<() based formatter is only used in its test, so
let's move it to where it is used.
we can always bring it back later if it is required in other places.
but better off implementing it as a fmt::formatter<> then.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16142
2023-11-23 11:22:31 +02:00
Kefu Chai
a9c1a435ec result_message: add formatter for result_message::rows
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for
`cql_transport::messages::result_message::rows`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16143
2023-11-23 11:12:55 +02:00
Kefu Chai
6749d963ed config: define formatter for db::seed_provider_type
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define a formatter for db::seed_provider_type.

please note, we are still formatting vector<db::seed_provider_type>
with the helper provided by seastar/core/sstring.hh, which uses
operator<<() to print the elements in the vector being printed.
so we have to keep the operator<< formatter before disabling
the generic formatter for vector<T>.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16138
2023-11-23 11:04:35 +02:00
Kefu Chai
ef76c4566b gossiper: do not use {:d} fmt specifier when formating generation_number
generation_number's type is `generation_type`, which in turn is a
`utils::tagged_integer<struct generation_type_tag, int32_t>`,
which fmtlib formats using ostream_formatter backed by
operator<<. but `ostream_formatter` does not provide the specifier
support, so {:d} does not apply to this type. when compiling with fmtlib
v10, it rejects the format specifier (the error is attached at the end
of the commit message).

so in this change, we just drop the format specifier. fmtlib prints
`int32_t` as a decimal integer by default, so dropping {:d} does not
change the behavior.

```
/home/kefu/dev/scylladb/gms/gossiper.cc:1798:35: error: call to consteval function 'fmt::basic_format_string<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int> &, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int> &>::basic_format_string<char[48], 0>' is not a constant expression
 1798 |                 auto err = format("Remote generation {:d} != local generation {:d}", remote_gen, local_gen);
      |                                   ^
/usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression
 2322 |       if (!in(arg_type, set)) throw_format_error("invalid format specifier");
      |                               ^
/usr/include/fmt/core.h:2395:14: note: in call to 'parse_presentation_type.operator()(1, 510)'
 2395 |       return parse_presentation_type(pres::dec, integral_set);
      |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2706:9: note: in call to 'parse_format_specs<char>(&"Remote generation {:d} != local generation {:d}"[20], &"Remote generation {:d} != local generation {:d}"[47], formatter<mapped_type, char_type>().formatter::specs_, checker(s).context_, 13)'
 2706 |         detail::parse_format_specs(ctx.begin(), ctx.end(), specs_, ctx, type);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2561:10: note: in call to 'formatter<mapped_type, char_type>().parse<fmt::detail::compile_parse_context<char>>(checker(s).context_)'
 2561 |   return formatter<mapped_type, char_type>().parse(ctx);
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2647:39: note: in call to 'parse_format_specs<utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, fmt::detail::compile_parse_context<char>>(checker(s).context_)'
 2647 |     return id >= 0 && id < num_args ? parse_funcs_[id](context_) : begin;
      |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2485:15: note: in call to 'handler.on_format_specs(0, &"Remote generation {:d} != local generation {:d}"[20], &"Remote generation {:d} != local generation {:d}"[47])'
 2485 |       begin = handler.on_format_specs(adapter.arg_id, begin + 1, end);
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2541:13: note: in call to 'parse_replacement_field<char, fmt::detail::format_string_checker<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>> &>(&"Remote generation {:d} != local generation {:d}"[19], &"Remote generation {:d} != local generation {:d}"[47], checker(s))'
 2541 |     begin = parse_replacement_field(p, end, handler);
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2769:7: note: in call to 'parse_format_string<true, char, fmt::detail::format_string_checker<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>>>({&"Remote generation {:d} != local generation {:d}"[0], 47}, checker(s))'
 2769 |       detail::parse_format_string<true>(str_, checker(s));
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/gms/gossiper.cc:1798:35: note: in call to 'basic_format_string<char[48], 0>("Remote generation {:d} != local generation {:d}")'
 1798 |                 auto err = format("Remote generation {:d} != local generation {:d}", remote_gen, local_gen);
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16126
2023-11-23 11:02:44 +02:00
Tzach Livyatan
225f0ff5aa Remove i3i from EC2 recommended EC2 instance types list
There is no reason to prefer i3i over i4i.

Closes scylladb/scylladb#16141
2023-11-23 10:09:34 +02:00
Kefu Chai
0e3f6186cb build: disable enum-constexpr-conversion
Clang-18 starts to complain when a constexpr value is cast to an
enum and the value is out of the range of the enum values. in this
case, boost intentionally casts out-of-range values to the
enum type. so silence this warning for now.
since `lexical_cast.hpp` is included in multiple places in the
source tree, this warning is disabled globally.

the warning looks like:

```
In file included from /home/kefu/dev/scylladb/types/types.cc:9:
In file included from /usr/include/boost/lexical_cast.hpp:32:
In file included from /usr/include/boost/lexical_cast/try_lexical_convert.hpp:43:
In file included from /usr/include/boost/lexical_cast/detail/converter_numeric.hpp:36:
In file included from /usr/include/boost/numeric/conversion/cast.hpp:33:
In file included from /usr/include/boost/numeric/conversion/converter.hpp:13:
In file included from /usr/include/boost/numeric/conversion/conversion_traits.hpp:13:
In file included from /usr/include/boost/numeric/conversion/detail/conversion_traits.hpp:18:
In file included from /usr/include/boost/numeric/conversion/detail/int_float_mixture.hpp:19:
In file included from /usr/include/boost/mpl/integral_c.hpp:32:
/usr/include/boost/mpl/aux_/integral_wrapper.hpp:73:31: error: integer value -1 is outside the valid range of values [0, 3] for the enumeration type 'udt_builtin_mixture_enum' [-Wenum-constexpr-conversion]
   73 |     typedef AUX_WRAPPER_INST( BOOST_MPL_AUX_STATIC_CAST(AUX_WRAPPER_VALUE_TYPE, (value - 1)) ) prior;
      |                               ^
/usr/include/boost/mpl/aux_/static_cast.hpp:24:47: note: expanded from macro 'BOOST_MPL_AUX_STATIC_CAST'
   24 | #   define BOOST_MPL_AUX_STATIC_CAST(T, expr) static_cast<T>(expr)
      |                                               ^
In file included from /home/kefu/dev/scylladb/types/types.cc:9:
In file included from /usr/include/boost/lexical_cast.hpp:32:
In file included from /usr/include/boost/lexical_cast/try_lexical_convert.hpp:43:
In file included from /usr/include/boost/lexical_cast/detail/converter_numeric.hpp:36:
In file included from /usr/include/boost/numeric/conversion/cast.hpp:33:
In file included from /usr/include/boost/numeric/conversion/converter.hpp:13:
In file included from /usr/include/boost/numeric/conversion/conversion_traits.hpp:13:
In file included from /usr/include/boost/numeric/conversion/detail/conversion_traits.hpp:18:
In file included from /usr/include/boost/numeric/conversion/detail/int_float_mixture.hpp:19:
In file included from /usr/include/boost/mpl/integral_c.hpp:32:
/usr/include/boost/mpl/aux_/integral_wrapper.hpp:73:31: error: integer value -1 is outside the valid range of values [0, 3] for the enumeration type 'int_float_mixture_enum' [-Wenum-constexpr-conversion]
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16082
2023-11-23 10:08:56 +02:00
Kefu Chai
d28598763d build: s/-Wignore-qualifiers/-Wignored-qualifiers/
this was a typo introduced by 781b7de5, which intended to add
-Wignored-qualifiers to the compiling options, but ended up
adding -Wignore-qualifiers.

in this change, the typo is corrected.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16124
2023-11-23 09:47:35 +02:00
Pavel Emelyanov
2f7f4ebb74 raft_state_machine: Check system.topology presense before tying to find it
The write_mutations_to_database() decides if it needs to flush the
database by checking if the mutations came to system.topology table and
performing some more checks if they did. Overall this looks like

    auto topo_schema = db.find_schema(system.topology)
    if (target_schema != topo_schema)
        return false;

    // extra checks go here

However, the system.topology table exists only if the feature named
CONSISTENT_TOPOLOGY_CHANGES is enabled via the command line. If it's not, the
call to db.find_schema(system.topology) throws, and the whole attempt to
write mutations throws too, stopping the raft state machine.

Since the intention is to check if the target schema is the topology
table, the check for the presence of this table should come first.
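
The reordered check can be sketched as below, in Python rather than the C++ of the actual fix. `ToyDatabase`, `has_schema`, and `targets_topology` are hypothetical names; only the ordering (test for presence first, then look up) reflects the description above.

```python
class NoSuchTable(Exception):
    pass

class ToyDatabase:
    """Hypothetical stand-in for the database object."""

    def __init__(self, schemas):
        self._schemas = dict(schemas)

    def has_schema(self, name):
        return name in self._schemas

    def find_schema(self, name):
        # Mirrors the behaviour described above: a missing table raises.
        if name not in self._schemas:
            raise NoSuchTable(name)
        return self._schemas[name]

TOPOLOGY = "system.topology"

def targets_topology(db, target_schema):
    # Presence check first, so a disabled feature cannot raise here.
    if not db.has_schema(TOPOLOGY):
        return False
    # The extra checks mentioned in the message would follow this lookup.
    return target_schema is db.find_schema(TOPOLOGY)

topo, other = object(), object()
without_feature = ToyDatabase({"ks.t": other})
with_feature = ToyDatabase({"ks.t": other, TOPOLOGY: topo})
r_missing = targets_topology(without_feature, other)
r_topology = targets_topology(with_feature, topo)
r_other = targets_topology(with_feature, other)
print(r_missing, r_topology, r_other)
```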

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16089
2023-11-23 09:35:43 +02:00
Takuya ASADA
c9d77699e1 scylla_setup: stop listing virtual devices on the NIC prompt
Currently, the NIC prompt in scylla_setup shows virtual devices such as
VLAN devices and bridge devices, but perftune.py does not support them.
To prevent errors while running scylla_setup, we should stop listing
these devices in the NIC prompt.

closes #6757

Closes scylladb/scylladb#15958
2023-11-23 10:27:09 +03:00
Piotr Dulikowski
ab42932ba4 raft: rpc: drop RPCs if the destination is not alive
If the failure detector sees the destination as dead, there is no point in
sending the RPC, so drop it silently.

This only affects two-way RPCs and "request" one-way RPCs. The one-way
RPCs used as responses to other one-way RPCs are not affected.
2023-11-23 00:34:22 +01:00
Piotr Dulikowski
3e32ee2d36 raft: pass raft::failure_detector to raft_rpc
In following commits, raft_rpc will drop outgoing messages if the
destination is not seen as alive by the failure detector.
2023-11-23 00:34:22 +01:00
Piotr Dulikowski
a8ee4d543a raft: transfer information about group0 liveness in direct_fd_ping
Add a new variant of the reply to the direct_fd_ping which specifies
whether the local group0 is alive or not, and start actively using it.

There is no need to introduce a cluster feature. Due to how our
serialization framework works, nodes which do not recognize the new
variant will treat it as the existing std::monostate. The std::monostate
means "the node and group0 is alive"; nodes before the changes in this
commit would send a std::monostate anyway, so this is completely
transparent for the old nodes.
2023-11-23 00:34:22 +01:00
Piotr Dulikowski
a1ebfcf006 raft: add server::is_alive
Add a method which reports whether given raft server is running.

In following commits, the information about whether the local raft
group 0 is running or not will be included in the response to the
failure detector ping, and the is_alive method will be used there.
2023-11-23 00:34:22 +01:00
Avi Kivity
00d82c0d54 Update tools/java submodule
* tools/java 8485bef333...1048034277 (1):
  > resolver: download sigar artifact only for Linux / AMD64
2023-11-22 18:02:04 +02:00
Kefu Chai
cfcd34ba64 cql3: test_assignment: define formatter for assignment_testable
add fmt formatter for `assignment_testable`.

this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `assignment_testable` without the help of `operator<<`.

since we are still printing the shared_ptr<assignment_testable> using
operator<<(.., const assignment_testable&), we cannot drop this operator
yet.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16127
2023-11-22 17:44:07 +02:00
Tomasz Grabiec
b06a0078fb Merge 'Support for sending tablet info to the drivers' from Sylwia Szunejko
There is a need for sending tablet info to the drivers so they can be tablet aware. For the best performance we want to get this info lazily only when it is needed.

The info is sent when the driver directs a query about a specific tablet to the wrong node/shard, so that the driver can use that information for every subsequent query. In that case, we send the RESULT message with additional information about the tablet (replicas and token range) in custom_payload.

Mechanism for sending custom_payload added.

Sending custom_payload was tested using a three-node cluster and cqlsh queries. I used RF=1 so that choosing the wrong node was testable.

I also manually tested it with the python-driver and confirmed that the tablet info can be deserialized properly.

Automatic tests added.

Closes scylladb/scylladb#15410

* github.com:scylladb/scylladb:
  docs: add documentation about sending tablet info to protocol extensions
  Add tests for sending tablet info
  cql3: send tablet if wrong node/shard is used during modification statement
  cql3: send tablet if wrong node/shard is used during select statement
  locator: add function to check locality
  locator: add function to check if host is local
  transport: add function to add tablet info to the result_message
  transport: add support for setting custom payload
2023-11-22 17:44:07 +02:00
Botond Dénes
0ae1335daa Revert "Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk"
This reverts commit 11cafd2fc8, reversing
changes made to 2bae14f743.

Reverting because this series causes frequent CI failures, and the
proposed quickfix causes other failures of its own.

Fixes: #16113
2023-11-22 17:44:07 +02:00
Kefu Chai
48340380dd scylla-sstable: print "validate" result in JSON
instead of printing the result of the "validate" subcommand in
free-form plain text, let's print it using JSON. for two reasons:

1. it is simpler to consume the output with other tools and tests.
2. more consistent with other commands.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16105
2023-11-22 17:44:07 +02:00
Botond Dénes
8c5f5b7722 service/migration_manager: only reload schema when enabling disabled features
Instead of unconditionally reloading the schema when enabling any schema
feature, only create a listener if the feature was disabled in the
first place, so that we don't trigger a schema reload for each
schema feature on node restarts. In this case, the node will start with
all these features already enabled.
This prevents unnecessary work on restarts.

Fixes: #16112

Closes scylladb/scylladb#16118
2023-11-22 17:44:07 +02:00
Kefu Chai
ca1828c718 scylla-sstable: print "validate-checksum" result in JSON
instead of printing the result of the "validate-checksum" subcommand
with the logging message, let's print it using JSON. for four reasons:

1. it is simpler to consume the output with other tools and tests.
2. more consistent with other commands.
3. the logging system is used for auditing the behavior and for debugging
   purposes, not for building a user-facing command line interface.
4. the behavior should match the corresponding documentation. and
   in docs/operating-scylla/admin-tools/scylla-sstable.sst, we claim
   that `validate-checksums` subcommand prints a dict of

   ```
   $ROOT := { "$sstable_path": Bool, ... }
   ```
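
For illustration, a consumer of such output might look like the Python sketch below. The sample JSON and the sstable paths are made up; only the `{ "$sstable_path": Bool, ... }` shape comes from the documentation quoted above, and `failed_sstables` is a hypothetical helper.

```python
import json


def failed_sstables(output):
    """Return the paths whose checksum validation reported False."""
    result = json.loads(output)
    return sorted(path for path, ok in result.items() if not ok)


# Hypothetical sample output, shaped like { "$sstable_path": Bool, ... }.
sample = '{"/data/ks/t/me-1-big-Data.db": true, "/data/ks/t/me-2-big-Data.db": false}'
print(failed_sstables(sample))
```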

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16106
2023-11-22 17:44:07 +02:00
Kefu Chai
43fd63e28c clocks-impl: format time_point using fmt
instead of relying on the operator<<() of an opaque type, use fmtlib
to print a timepoint for better support of new fmtlib which dropped
the default-generated formatter for types with operator<<().

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16116
2023-11-22 17:44:07 +02:00
Nadav Har'El
242a4b23c0 Merge 'tests: Skip unnecessary sleeps in cql_test_env teardown' from Tomasz Grabiec
This PR contains two patches which get rid of unnecessary sleeps on cql_test_env teardown greatly reducing run time of tests.

Reduces run time of `build/dev/test/boost/schema_change_test` from 90s to 6s.

Closes scylladb/scylladb#16111

* github.com:scylladb/scylladb:
  test: cql_test_env: Interrupt all components on cql_test_env teardown
  tests: cql_test_env: Skip gossip shutdown sleep
2023-11-22 17:44:07 +02:00
Anna Stuchlik
3751acce42 doc: fix rollback in the 5.2-to-5.4 upgrade guide
This commit fixes the rollback procedure in
the 5.2-to-5.4 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
  is fixed.
- The "Gracefully shutdown ScyllaDB" command
  is fixed.

In addition, there are the following updates
to be in sync with the tests:

- The "Backup the configuration file" step is
  extended to include a command to backup
  the packages.
- The Rollback procedure is extended to restore
  the backup packages.
- The Reinstallation section is fixed for RHEL.

Also, I've removed the optional step to enable
consistent schema management from the list of
steps - the appropriate section has already
been removed, but it remained in the procedure
description, which was misleading.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4

Closes scylladb/scylladb#16114
2023-11-22 17:44:07 +02:00
Takuya ASADA
b97df92d76 scylla_setup: stop aborting on old kernel warning when non-interactive mode
In non-interactive mode, the RHEL/CentOS7 old kernel check causes
"Setup aborted", which is not what we want.
We should keep the warning but proceed with setup, so the default value of
the kernel check should be True, since it will be applied automatically in
non-interactive mode.

Fixes #16045

Closes scylladb/scylladb#16100
2023-11-22 17:44:07 +02:00
Botond Dénes
b1a76ebb93 Merge 'Sanitize storage service init/deinit sequences' from Pavel Emelyanov
Currently storage service starts too early and its initialization is split into several steps. This PR makes storage service start "late enough" and makes its initialization (minimally required before joining cluster) happen in one place.

refs: #2795
refs: #2737

Closes scylladb/scylladb#16103

* github.com:scylladb/scylladb:
  storage_service: Drop (un)init_messaging_service_part() pair
  storage_service: Init/Deinit RPC handlers in constructor/stop
  storage_service: Dont capture container() on RPC handler
  storage_service: Use storage_service::_sys_dist_ks in some places
  storage_service: Add explicit dependency on system dist. keyspace
  storage_service: Rurn query processor pointer into reference
  storage_service: Add explicity query_processor dependency
  main: Start storage service later
2023-11-22 17:44:07 +02:00
sylwiaszunejko
ac51c417ea docs: add documentation about sending tablet info to protocol extensions 2023-11-22 09:23:43 +01:00
sylwiaszunejko
207d673ad6 Add tests for sending tablet info 2023-11-22 09:23:43 +01:00
sylwiaszunejko
cea4c40685 cql3: send tablet if wrong node/shard is used during modification statement 2023-11-22 09:23:43 +01:00
sylwiaszunejko
54f22927a3 cql3: send tablet if wrong node/shard is used during select statement 2023-11-22 09:23:43 +01:00
sylwiaszunejko
954d51389c locator: add function to check locality 2023-11-22 09:23:43 +01:00
Eliran Sinvani
bfa839ce92 commitlog: enforce commitlog size hard limit by default
Since the commitlog size hard limit is a failsafe mechanism,
we don't expect to ever hit it. If we do hit the limit, it means
that we have an exceptional condition in the system. Hence, the
impact of enforcing the commitlog hard limit is irrelevant.
Here we enforce the limit by default.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-11-22 08:48:28 +02:00
Eliran Sinvani
63d62a7db2 commitlog: set flush threshold to half of the limit size
Once we enable commitlog hard limit by default, we would like
to have some room in case flushing memtables takes some time
to catch up. This threshold is half the limit.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-11-22 08:48:28 +02:00
Eliran Sinvani
d2a8651bce commitlog: unfold flush threshold assignment
This commit is only a cosmetic change. It is meant to
make the flush threshold assignment more readable and
comprehensible so future changes are easier to review.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-11-22 08:48:28 +02:00
sylwiaszunejko
a0c8531875 locator: add function to check if host is local 2023-11-21 15:15:20 +01:00
sylwiaszunejko
93420353f4 transport: add function to add tablet info to the result_message 2023-11-21 15:15:20 +01:00
sylwiaszunejko
75b3dbf7ea transport: add support for setting custom payload
A custom payload can now be added to response_message.
If it is set, it will be sent to client and the custom_payload
flag will be set.

write_string_bytes_map method is added to response class
and a missing custom_payload flag is added to
cql_frame_flags.
2023-11-21 15:09:36 +01:00
Pavel Emelyanov
74329e5aee test: Add object_store test to validate config reloading works
The test case is

- start scylla with broken object storage endpoint config
- create and populate s3-backed keyspace
- try flushing it (API call would hang, so do it in the background)
- wait for a few seconds, then fix the config
- wait for the flush to finish and stop scylla
- start scylla again and check that the keyspace is properly populated

Nice side effect of this test is that once flush fails (due to broken
config) it tries to remove the not-yet-sealed sstables and (!) fails
again, for the same reason. So during the restart there happen to be
several sstables in "creating" state with no stored objects, so this
additionally tests one more g.c. corner case

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
26f8202651 test: Add config update facility to test cluster
The Cluster wrapper used by object_store test already has the ability to
access cluster via CQL and via API. Add the sugar to make the cluster
re-read its scylla.yaml and other configs

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
4a531e4129 test: Make S3_Server export config file as pathlib.Path
The pylib minio server does that already. A test case added by the next
patch would need to have both cases as path, not as string

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
210b01a5ce config: Make object storage config updateable_value_source
Now it's a plain updateable_value, but without the ..._source object the
updateable_value is just a no-op value holder. In order for the
observers to operate there must be the value source, updating it would
update the attached updateable values _and_ notify the observers.

In order for the config to be the u.v._source, config entries should be
comparable to each other, thus the <=> operator for it
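
The relationship between a value source, its attached values and its observers can be sketched in Python as below. `ValueSource` and `Value` are hypothetical analogues, not the C++ classes themselves, and skipping no-op updates is only one plausible reason for requiring comparable entries.

```python
class ValueSource:
    """Hypothetical analogue of updateable_value_source."""

    def __init__(self, value):
        self._value = value
        self._observers = []

    def get(self):
        return self._value

    def observe(self, callback):
        self._observers.append(callback)

    def set(self, new_value):
        # Comparing old and new values lets the source skip no-op updates.
        if new_value == self._value:
            return
        self._value = new_value
        for callback in self._observers:
            callback(new_value)


class Value:
    """Hypothetical analogue of updateable_value: a read-only view."""

    def __init__(self, source):
        self._source = source

    def __call__(self):
        return self._source.get()
```

Without a source behind it, a `Value` in this sketch would have nothing to read from or be notified by, which mirrors the "no-op value holder" remark above.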

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
9eb96a03f0 memtable: Extend list of checking codes
When flushing an sstable there can be errors that are not fatal and
shouldn't cause the whole scylla to die. Currently only ENOSPC and
EDQUOT are considered as such, but there's one more possibility --
access denied errors.

Those can happen, for example, if datadir is chmod/chown-ed by mistake
or intentionally while scylla is running (doing it pre-start time won't
trigger the issue as distributed loader checks permissions of datadir on
boot). Another option to step on "access denied" error is to flush
memtable on S3 storage with broken configuration.

Anyway, seeing the access denied error is also a good reason not to
crash, but to print a warning in the logs and retry in the hope that the
node administrator has fixed things.
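
A rough Python sketch of the idea follows. The `flush_with_retry` helper, its bounded `max_attempts` and the choice of EACCES for "access denied" are assumptions made for illustration, not the patch's actual logic.

```python
import errno

# Error codes treated as non-fatal in this sketch: out of space, quota
# exceeded, and access denied.
RETRYABLE = {errno.ENOSPC, errno.EDQUOT, errno.EACCES}


def flush_with_retry(flush, max_attempts=3, log=print):
    """Call flush() until it succeeds, retrying only on RETRYABLE errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return flush()
        except OSError as e:
            if e.errno not in RETRYABLE or attempt == max_attempts:
                raise
            log(f"flush failed ({errno.errorcode.get(e.errno, e.errno)}), retrying")


# Usage: the first attempt is denied, the second one succeeds.
attempts = []


def flaky_flush():
    attempts.append(1)
    if len(attempts) == 1:
        raise PermissionError(errno.EACCES, "Permission denied")
    return "flushed"


result = flush_with_retry(flaky_flush, log=lambda msg: None)
```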

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
a34dae8c37 sstables/storage/s3: Fix missing TOC status check
When the TOC file is missing while garbage collecting, the S3 server
now resolves with storage_io_error(ENOENT)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
855626f7de s3/client: Map http exceptions into storage_io_error
When an http request resolves with an exception, it makes sense to translate
the network exception into a storage exception so that upper layers see
it as some sort of IO error, not, unexpectedly, an http one.

The translation is, for now, pretty simple:

- 404 and 3xx -> ENOENT
- 403(forbidden) and 401(unauthorized) -> EACCES
- anything else -> EIO
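
The mapping in the list above can be written as a small function. This is a Python sketch with a hypothetical name (`http_status_to_errno`); the actual translation is done in the C++ S3 client.

```python
import errno


def http_status_to_errno(status):
    """Map an HTTP status code to an errno value, per the list above."""
    if status == 404 or 300 <= status < 400:
        return errno.ENOENT
    if status in (401, 403):
        return errno.EACCES
    return errno.EIO
```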

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Patryk Jędrzejczak
566176bcd1 test: add test_replacing_alive_node_fails
We add a test for the Raft-based topology's new feature - rejecting
the replace operation if the node being replaced is considered
alive by the failure detector.

This test is not so fast, and it does not test any critical paths
so we run it only in dev mode.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
bf7a67224c raft topology: reject replace if the node being replaced is not dead
The replace operation is defined to succeed only if the node being
replaced is dead. We should reject this operation when the failure
detector considers the node being replaced alive.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
94ffdb4792 raft topology: add the gossiper ref to topology_coordinator
It is used in the following commit.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
8605cdd9cd test: test_cluster_features: stop gracefully before replace
In one of the previous commits, we have made
ManagerClient.server_add wait until all running nodes see the node
being replaced as dead. Unfortunately, the waiting time is around
20 s if we stop the node being replaced ungracefully. We change the
stop procedure to graceful to not slow down the test.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
206a446a02 test: decrease failure_detector_timeout_in_ms in replace tests
In one of the previous commits, we have made
ManagerClient.server_add wait until all running nodes see the node
being replaced as dead. Unfortunately, the waiting time can be
around 20 s if we stop the node being replaced ungracefully. 20 s
is the default value of the failure detector timeout.

We don't want to slow down the replace operations this much for no
good reason. We could use server_stop_gracefully instead of
server_stop everywhere, but we should have at least a few replace
tests with server_stop. For now, test_replace and
test_raft_ignore_nodes will be these tests. To keep them reasonably
fast, we decrease the failure_detector_timeout_in_ms value on all
initial servers.

We also skip test_replace in debug mode to avoid flakiness due to
low failure_detector_timeout_in_ms (test_raft_ignore_nodes is
already skipped).
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
7062ff145e test: move test_replace to topology_custom
In the following commit, we make all servers in test_replace use
failure-detector-timeout-in-ms = 2000. Therefore, we need
test_replace to be in a suite with initial_size equal to 0.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
9775b1c12d test: server_add: wait until the node being replaced is dead
In the following commits, we make the topology coordinator reject
join requests if the node being replaced is considered alive by the
gossiper. Before making this change, we need to adapt the testing
framework so that we don't have flaky replace operations that fail
because the node being replaced hasn't been marked as dead yet. We
achieve this by waiting until all other running nodes see the node
being replaced as dead in all replace operations.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
18ed89f760 test: server_add: add support for expected errors
After this change, if we try to add a server and it fails with an
expected error, the add_server function will not throw. Also, the
server will be correctly installed and stopped.

Two issues are motivating this feature.

The first one is that if we want to add a server while expecting
an error, we have to do it in two steps:
- call server_add with the start parameter set to False,
- call server_start with the expected_error parameter.
It is quite inconvenient.

The second one is that we want to be able to test the replace
operation when it is considered incorrect, for example when we try
to replace an alive node. To do this, we would have to remove
some assertions from ScyllaCluster.add_server. However, we should
not remove them because they give us clear information when we
write an incorrect test. After adding the expected_error parameter,
we can ignore these assertions only when we expect an error. In
this way, we enable testing failing replace operations without
sacrificing the testing framework's protection.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
ee45a1c430 raft topology: join: delay advertising replacing node if it reuses IP
After this change, other nodes can see the replacing node as alive
only after the topology coordinator accepts its join request.

In the following commits, we make the topology coordinator reject
join requests if the node being replaced is considered alive by the
gossiper. However, the replacing node may become alive before the
topology coordinator does the validation. If the replacing node
reuses the IP of the node being replaced, the topology coordinator
cannot know which of these two nodes is alive and whether it should
reject the join request.

The gossiper-based topology also delays the replacing node from
advertising itself if it reuses the IP. To achieve the same effect
in raft-based topology, we only need to move the definition of
replacing_a_node_with_same_ip. However, there is a code that puts
bootstrap tokens of the node being replaced into the gossiper
state, and it depends on replacing_a_node_with_same_ip and
replacing_a_node_with_diff_ip being always false in the raft-based
topology mode. We prevent it from breaking by changing the
condition.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
c0e4b8e9c0 raft topology: join: fix a condition in validate_joining_node
It was incorrect. node.rs->state evaluated to node_state::none
for both join and replace.
2023-11-21 12:39:13 +01:00
Tomasz Grabiec
93ee7b7df9 test: cql_test_env: Interrupt all components on cql_test_env teardown
This should interrupt all sleeps in component teardown.

Before this patch, there was a 1s sleep on gossiper shutdown, which I
don't know where it comes from. After the patch there is no such
sleep.
2023-11-21 12:22:32 +01:00
Tomasz Grabiec
7f3a74efab tests: cql_test_env: Skip gossip shutdown sleep
Removes unnecessary 2s sleep on each cql test env teardown.
2023-11-21 12:22:24 +01:00
Pavel Emelyanov
0e9428ab4a exceptions: Extend storage_io_error construction options
To make it possible to construct it with plain errno value and a string

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 13:37:52 +03:00
Calle Wilund
33fba28265 commitlog_test: Add test for replaying large-ish mutation
(i.e. cross several normal-sized buffers).
2023-11-21 08:50:57 +00:00
Calle Wilund
0d41769daa commitlog_test: Add additional test for segmnent truncation
Emulate replay of a non-sealed segment, verifying we don't get
data beyond termination point, as well as the correct exception.
2023-11-21 08:50:57 +00:00
Calle Wilund
57a4645c81 docs: Add docs on commitlog format 3 2023-11-21 08:50:57 +00:00
Calle Wilund
6b66daabfc commitlog: Remove entry CRC from file format
Since CRC is already handled by disk blocks, we can remove some of the
entry CRC:ing, both simplifying code and making at least that part of
both write and read faster.
2023-11-21 08:50:57 +00:00
Calle Wilund
e29bf6f9e8 commitlog: Implement new format using CRC:ed sectors
Breaks the file into individually tagged + crc:ed pages.
Each page (sized as disk write alignment) gets a trailing
12-byte metadata, including CRC of the first page-12 bytes,
and the ID of the segment being written.

When reading, each page read is CRC:ed and checked to be part
of the expected segment by comparing ID:s. If crc is broken,
we have broken data. If crc is ok, but ID does not match, we
have a prematurely terminated segment (truncated), which, depending
on whether we use batch mode or not, implies data loss.
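
The page check described above can be sketched in Python as follows. The 4096-byte page size, the trailer layout (4-byte CRC followed by an 8-byte segment ID) and the use of `zlib.crc32` are assumptions chosen to add up to the 12-byte trailer; the real on-disk format may differ.

```python
import struct
import zlib

PAGE_SIZE = 4096                 # assumed disk write alignment
TRAILER = struct.Struct("<IQ")   # assumed layout: CRC32 (4 bytes) + segment ID (8 bytes)
PAYLOAD_SIZE = PAGE_SIZE - TRAILER.size


def seal_page(payload, segment_id):
    """Pad the payload to a full page and append the CRC + segment ID trailer."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAYLOAD_SIZE, b"\0")
    return body + TRAILER.pack(zlib.crc32(body), segment_id)


def check_page(page, expected_segment_id):
    """Classify a page as 'ok', 'corrupt' (bad CRC) or 'truncated' (foreign ID)."""
    body, trailer = page[:PAYLOAD_SIZE], page[PAYLOAD_SIZE:]
    crc, segment_id = TRAILER.unpack(trailer)
    if crc != zlib.crc32(body):
        return "corrupt"
    if segment_id != expected_segment_id:
        return "truncated"
    return "ok"
```

A page with a valid CRC but a different segment ID is stale data left in the file by an earlier segment, which is why it marks the end of the current segment rather than corruption.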
2023-11-21 08:50:54 +00:00
Calle Wilund
18e79d730e commitlog: Add iterator adaptor for doing buffer splitting into sub-page ranges
With somewhat less overhead than creating 100+ temporary_buffer proxies
2023-11-21 08:42:33 +00:00
Calle Wilund
560364d278 fragmented_temporary_buffer: Add const iterator access to underlying buffers
Breaks abstraction a bit, but some (me) might need something like it...
2023-11-21 08:42:33 +00:00
Calle Wilund
862f4f2ed3 commitlog_replayer: differentiate between truncated file and corrupt entries
Refs #11845

When replaying, differentiate between the two cases for failure we have:
 - A broken actual entry - i.e. entry header/data does not hold up to
   crc scrutiny
 - Truncated file - i.e. a chunk header is broken or unreadable. This can
   be due to either "corruption" (i.e. borked write, post-corruption, hw
   whatever), or simply an unterminated segment.

The difference is that the former is recoverable, the latter is not.
We now signal and report the two separately. The end result for a user
is not much different, in either case they imply data loss and the
need for repair. But there is some value in differentiating which
of the two we encountered.

Modifies and adds test cases.
2023-11-21 08:42:33 +00:00
Botond Dénes
65e42e4166 Merge 'mutation_query: properly send range tombstones in reverse queries' from Michał Chojnowski
reconcilable_result_builder passes range tombstone changes to _rt_assembler
using table schema, not query schema.
This means that a tombstone with bounds (a; b), where a < b in query schema
but a > b in table schema, will not be emitted from mutation_query.

This is a very serious bug, because it means that such tombstones in reverse
queries are not reconciled with data from other replicas.
If *any* queried replica has a row, but not the range tombstone which deleted
the row, the reconciled result will contain the deleted row.

In particular, range deletes performed while a replica is down will not
later be visible to reverse queries which select this replica, regardless of the
consistency level.

As far as I can see, this doesn't result in any persistent data loss.
Only in that some data might appear resurrected to reverse queries,
until the relevant range tombstone is fully repaired.

This series fixes the bug and adds a minimal reproducer test.

Fixes #10598

Closes scylladb/scylladb#16003

* github.com:scylladb/scylladb:
  mutation_query_test: test that range tombstones are sent in reverse queries
  mutation_query: properly send range tombstones in reverse queries
2023-11-21 09:19:14 +02:00
Kefu Chai
691f7f6edb util: do not use variable length array
vla (variable length array) is an extension in GCC and Clang. and
it is not part of the C++ standard.

so let's avoid using it if possible, for better standard compliance.
it's also more consistent with other places where we calculate the size
of an array of T in the same source file.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16084
2023-11-20 23:02:41 +02:00
Nadav Har'El
0fd10690d4 Merge 'When creating S3-backed keyspace, check the endpoint instantly' from Pavel Emelyanov
Currently CREATE KEYSPACE ... WITH STORAGE = { 'type' = 'S3' ... } will create keyspace even if the backend configuration is "invalid" in the sense that the requested endpoint is not known to scylla via object_storage.yaml config file. The first time after that when this misconfiguration will reveal itself is when flushing a memtable (see #15635), but it's good to know the endpoint is not configured earlier than that.

fixes: #15074

Closes scylladb/scylladb#16038

* github.com:scylladb/scylladb:
  test: Add validation of misconfigured storage creation
  sstables: Throw early if endpoint for keyspace is not configured
  replica: Move storage options validation to sstables manager
  test/cql-pytest/test_keyspaces: Move DESCRIBE case to object store
  sstables: Add has_endpoint_client() helper to manager
2023-11-20 21:12:48 +02:00
Kefu Chai
9a3c7cd768 build: cmake: drop Seastar_OptimizationLevel_*
in this change,

* all `Seastar_OptimizationLevel_*` are dropped.
* mode.Sanitize.cmake:
    s/CMAKE_CXX_FLAGS_COVERAGE/CMAKE_CXX_FLAGS_SANITIZE/
* mode.Dev.cmake:
    s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/

Seastar_OptimizationLevel_* variables have nothing to do with
Seastar, and they introduce unnecessary indirection. the function
of `update_cxx_flags()` already requires an option name for this
parameter, so there is no need to have a name for it.

the cached entry of `Seastar_OptimizationLevel_DEBUG` is also
dropped, if we really need to have knobs which can be configured
by user, we should define them in a more formal way. at this
moment, this is not necessary. so drop it along with this
variable.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16059
2023-11-20 19:26:54 +02:00
Botond Dénes
6e9850067b Merge 'Make test-only write_memtable_to_sstable() overloads shorter' from Pavel Emelyanov
There are three of them, one is used by core, another by tests and the third one passes arguments between those two. And the ..._for_tests() helper in test utils. This PR leaves only one for tests out of three.

Closes scylladb/scylladb#16068

* github.com:scylladb/scylladb:
  tests: Shorten the write_memtable_to_sstable_for_test()
  replica: Squash two write_memtable_to_sstable()
  replica: Coroutinize one of write_memtable_to_sstable() overloads
2023-11-20 16:05:06 +02:00
Pavel Emelyanov
9b16c298e9 test: Add validation of misconfigured storage creation
An attempt to create a non-local keyspace with an unknown endpoint
should raise a configuration exception.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 15:25:58 +03:00
Pavel Emelyanov
2bf1e2a294 sstables: Throw early if endpoint for keyspace is not configured
When a keyspace is created it initializes the storage for it, and
initialization of S3 storage is a good place to check if the endpoint
for the storage is configured at all.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 15:25:58 +03:00
Pavel Emelyanov
f2a99ad30a replica: Move storage options validation to sstables manager
Currently the cql statement .validate() callback is responsible for
checking if the non-local storage options are allowed with the
respective feature. Next patch will need to extend this check to also
validate the details of the provided storage options, but doing it at
cql level doesn't seem correct -- it's "too far" from query processor
down to sstables manager.

Good news is that there's a lower-level validation of the new keyspace,
namely the database::validate_new_keyspace() call. Move the storage
options validation into sstables manager. While at it, reimplement it
as a visitor to facilitate further extensions and plug the new
validation into the aforementioned database::validate_new_keyspace().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 15:24:59 +03:00
Botond Dénes
f53961248d gms,service: add a feature to protect the usage of allow_mutation_read_page_without_live_row
allow_mutation_read_page_without_live_row is a new option in the
partition_slice::option option set. In a mixed cluster, old nodes
possibly don't know this new option, so its usage must be protected by a
cluster feature. This patch does just that.

Fixes: #15795

Closes scylladb/scylladb#15890
2023-11-20 13:03:55 +01:00
Botond Dénes
935065fd8d Update tools/java submodule
* tools/java b776096d...8485bef3 (2):
  > dist: Require jre-11-headless in from rpm
  > dist: remove duplicated java-headless from "Requires"
2023-11-20 13:55:55 +02:00
Pavel Emelyanov
b31b51ae90 test/cql-pytest/test_keyspaces: Move DESCRIBE case to object store
We're going to ban creation of a keyspace with S3 type in case the
requested endpoint is not configured. The problem is that this test case
of cql-pytest needs such keyspace to be created and in order to provide
the object storage configuration we'd need to touch the generic scylla
cluster management which is overkill for a generic cql-pytest case.

Simpler solution is to make object_store test suite perform all the
S3-related checks, including the way DESCRIBE for S3-backed ks works.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 14:31:08 +03:00
Pavel Emelyanov
2c31cd7817 sstables: Add has_endpoint_client() helper to manager
It's the get_endpoint_client() peer that only checks the client
presence. To be used by next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 14:31:08 +03:00
Pavel Emelyanov
8ae751a3ff tests: Shorten the write_memtable_to_sstable_for_test()
The wrapper just calls the test-only core write_memtable_to_sstable()
overload, tests can do it on their own.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 14:27:57 +03:00
Pavel Emelyanov
1d7d2dff50 replica: Squash two write_memtable_to_sstable()
There are three of them and one acts purely as arguments passer between
other two.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 14:27:57 +03:00
Pavel Emelyanov
e9826858a9 replica: Coroutinize one of write_memtable_to_sstable() overloads
Simpler to read and patch further this way

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 14:27:57 +03:00
Pavel Emelyanov
f4626f6b8e storage_service: Drop (un)init_messaging_service_part() pair
It's no longer needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:59:08 +03:00
Pavel Emelyanov
c42c13e658 storage_service: Init/Deinit RPC handlers in constructor/stop
All the services that need to register RPC handlers do it in service
constructor or .start() method. Unregistration happens in .stop().
Storage service explicitly (de)initializes its RPC handlers in dedicated
calls, but there's no point in that. The handlers' accessibility is
determined by messaging service start_listen/shutdown, handlers
themselves can be registered any time before it and unregistered any
time after it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:57:07 +03:00
Pavel Emelyanov
40cb9dd66f storage_service: Dont capture container() on RPC handler
The handlers are about to be initialized from inside storage_service
constructor. At that time container() is not yet available and it's
invalid to capture it on handlers' lambda. Fortunately, there's only one
handler that does it, other handlers capture 'this' and call container()
explicitly. This patch fixes the remaining one to do the same.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:55:56 +03:00
Pavel Emelyanov
cc76f03f63 storage_service: Use storage_service::_sys_dist_ks in some places
The main goal here is to drop the sys.dist.ks argument from the
init_messaging_service call to make future patching simpler. While doing
so, it turned out that the argument had to be passed all the way
down to mark_existing_views_as_built(), so this patch also drops
the argument from the whole call trace.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:53:55 +03:00
Pavel Emelyanov
4df5af931a storage_service: Add explicit dependency on system dist. keyspace
This effectively reverts bc051387c5 (storage_service: Remove sys_dist_ks
from storage_service dependencies) since storage service now needs the
sys. dist. ks not only at cluster join time. Next patch will make more use
of it as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:52:42 +03:00
Pavel Emelyanov
a7f23930cb storage_service: Rurn query processor pointer into reference
It's non-nullptr all the time after previous patch and can be a
reference instead

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:52:04 +03:00
Pavel Emelyanov
e59544674a storage_service: Add explicity query_processor dependency
It's now set via a dedicated call that happens after query processor is
started. Now query processor is started before storage service and the
latter can get the q.p. local reference via constructor.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:51:09 +03:00
Pavel Emelyanov
6ee8e7a031 main: Start storage service later
The storage service is a top-level service which depends on many other
services. Recently (see d42685d0cb storage_service: Load tablet
metadata on boot and from group0 changes) it also got implicit
dependency on query processor, but it still starts too early for
explicit reference on q.p.

This patch moves storage service start to later times. This is possible
because storage service is not explicitly needed by any other component
start/init in between its old and new start places. Also, cql_test_env
starts storage service "that late" too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:48:30 +03:00
Nadav Har'El
5752dc875b Merge 'Materialize_views: don't construct global_schema_ptr from views schemas that lacks base information' from Eliran Sinvani
This miniset addresses two potential conversions of incomplete materialized view schemas to `global_schema_ptr`.
One of them was completely unnecessary and is also a "chicken and egg" problem: in the schema sync procedure itself, a view schema was converted to `global_schema_ptr` solely for the purpose of logging. This can create a "hiccup" in materialized view updates if they are coming from a node with a different mv schema.
A synced schema can sometimes have no base info because of deactivation and reactivation of the schema inside the `schema_registry`, which doesn't restore the base information due to lack of context.
When a schema is synced, the problem becomes easy, since we can just use the latest base information from the database.

Fixes #14011

Closes scylladb/scylladb#14861

* github.com:scylladb/scylladb:
  migration manager: fix incomplete mv schemas returned from get_schema_for_write
  migration_manager: do not globalize potentially incomplete schema
2023-11-20 11:54:01 +02:00
Pavel Emelyanov
3471f30b58 view_update_generator: Unplug from database later
Patch 967ebacaa4 (view_update_generator: Move abort kicking to
do_abort()) moved unplugging v.u.g from database from .stop() to
.do_abort(). The latter call happens very early on stop -- once scylla
receives SIGINT. However, database may still need v.u.g. plugged to
flush views.

This patch moves unplug to later, namely to .stop() method of v.u.g.
which happens after database is drained and should no longer continue
view updates.

fixes: #16001

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16091
2023-11-20 11:47:55 +02:00
Botond Dénes
fd11eeeaa3 Merge 'dist/redhat: drop unnecessary variables and tags' from Kefu Chai
this is a cleanup in `scylla.spec`.

Closes scylladb/scylladb#16097

* github.com:scylladb/scylladb:
  dist/redhat: group sub-package preambles together
  dist/redhat: drop unused `defines` variable
  dist/redhat: remove tags for subpackage which are same as main preamble
2023-11-20 11:46:56 +02:00
Asias He
c605220bb3 repair: Introduce small table optimization
*) Problem:

We have seen in the field that it takes longer than expected to repair system tables
like system_auth, which has a tiny amount of data but is replicated to all nodes
in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The
main reason for the slowness is that even if the amount of data is small,
repair has to walk through all the token ranges, that is num_tokens *
number_of_nodes_in_the_cluster. The overhead of the repair protocol for each
token range dominates due to the small amount of data per token range. Another
reason is that the high network latency between DCs makes the RPC calls used to
repair consume more time.
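
As a rough illustration of the range count above (a sketch, not code from
this patch; `num_tokens = 256` is an assumed per-node vnode count and the
function name is hypothetical):

```python
def token_ranges_to_walk(num_tokens: int, nodes_in_cluster: int) -> int:
    """Approximate number of token ranges a full repair has to walk."""
    return num_tokens * nodes_in_cluster


# Assumption: 256 vnodes per node. The cluster sizes (6 and 9 nodes) are
# the ones used in the Performance section of this message.
for nodes in (6, 9):
    print(nodes, "nodes ->", token_ranges_to_walk(256, nodes), "ranges")
```

Under that assumption, the products are within one of the ranges_nr values
(1537 and 2305) reported in the "Before" logs of the Performance section.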

*) Solution:

To solve this problem, a small table optimization for repair is introduced in
this patch. A new repair option is added to turn on this optimization.

- The user does not need to specify token ranges to repair. All token
ranges are repaired automatically.

- Users only need to send the repair REST API request to one of the nodes in the
cluster. It can be any of the nodes in the cluster.

- It does not require the RF to be configured to replicate to all nodes in the
cluster. This means it can work with any tables as long as the amount of data
is low, e.g., less than 100MiB per node.

*) Performance:

1)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}

Before:
```
repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1537, round_nr=4612,
round_nr_fast_path_already_synced=4611,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds,
tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2,
0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848},
{127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0},
{127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s,
tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={}
```

After:
```
repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds,
tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178},
{127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1},
{127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286},
{127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286},
{127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026},
{127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={},
rx_row_nr_peer={}
```

The time to finish repair difference = 1.5648403 seconds / 0.001459798 seconds = 1072X

2)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}

Same test as above, except that a 5 ms delay is added to simulate multi-DC
network latency:

The time to repair is reduced from 333s to 0.2s.

333.26758 s / 0.22625381s = 1472.98

3)

3 DCs, each DC has 3 nodes, 9 nodes in the cluster. RF = {dc1: 3, dc2: 3, dc3: 3}, 10 ms network latency

Before:

```
repair - repair[86124a4a-fd26-42ea-a078-437ca9e372df]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=2305, round_nr=6916,
round_nr_fast_path_already_synced=6915,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=276630, tx_hashes_nr=0, rx_hashes_nr=8, duration=986.34015
seconds, tx_row_nr=7, rx_row_nr=0, tx_row_bytes=1246, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0},
{127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_nr={{127.0.57.1, 1},
{127.0.57.2, 1}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}},
row_from_disk_bytes_per_sec={{127.0.57.1, 1.72105e-07}, {127.0.57.2,
1.72105e-07}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 0.00101385},
{127.0.57.2, 0.00101385}, {127.0.57.3, 0}, {127.0.57.4, 0},
{127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0},
{127.0.57.9, 0}} Rows/s, tx_row_nr_peer={{127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}}, rx_row_nr_peer={}
```

After:

```
repair - repair[07ebd571-63cb-4ef6-9465-6e5f1e98f04f]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=1, round_nr=4,
round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=128, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.6052915
seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
178}, {127.0.57.4, 178}, {127.0.57.5, 178}, {127.0.57.6, 178},
{127.0.57.7, 178}, {127.0.57.8, 178}, {127.0.57.9, 178}},
row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}},
row_from_disk_bytes_per_sec={{127.0.57.1, 0.00037793}, {127.0.57.2,
0.00037793}, {127.0.57.3, 0.00037793}, {127.0.57.4, 0.00037793},
{127.0.57.5, 0.00037793}, {127.0.57.6, 0.00037793}, {127.0.57.7,
0.00037793}, {127.0.57.8, 0.00037793}, {127.0.57.9, 0.00037793}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 2.22634},
{127.0.57.2, 2.22634}, {127.0.57.3, 2.22634}, {127.0.57.4,
2.22634}, {127.0.57.5, 2.22634}, {127.0.57.6, 2.22634},
{127.0.57.7, 2.22634}, {127.0.57.8, 2.22634}, {127.0.57.9,
2.22634}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
```

The time to repair is reduced from 986s (16 minutes) to 1.6s

*) Summary

So, a difference of several hundred to more than 1000X is observed for this
common usage of the system table repair procedure.
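
The ratios can be recomputed from the durations logged above. This is only
a small arithmetic sketch; the dictionary keys and the `speedup` helper are
illustrative names, not part of the patch:

```python
# (duration_before_s, duration_after_s), copied from the measurements above.
MEASUREMENTS = {
    "1) 6 nodes, no added latency": (1.5648403, 0.001459798),
    "2) 6 nodes, 5 ms latency": (333.26758, 0.22625381),
    "3) 9 nodes, 10 ms latency": (986.34015, 1.6052915),
}


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the repair finished with the optimization."""
    return before_s / after_s


for name, (before, after) in MEASUREMENTS.items():
    print(f"{name}: {speedup(before, after):.0f}X")
```

With these inputs the three cases come out at roughly 1072X, 1473X and 614X.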

Fixes #16011
Refs  #15159
2023-11-20 15:11:16 +08:00
Kefu Chai
71f352896d dist/redhat: group sub-package preambles together
group sections like `%build` and `%install` together, to improve
the readability of the spec recipe.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-20 12:19:33 +08:00
Kefu Chai
3f108629b9 dist/redhat: drop unused defines variable
this variable was introduced in 6d7d0231. back then, we were still
building the binaries in .spec, but we've switched to the relocatable
package now, so there is no need to keep these compilation-related
flags anymore.

in this change, the `defines` variable is dropped.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-20 12:19:33 +08:00
Kefu Chai
d69b4838ea dist/redhat: remove tags for subpackage which are same as main preamble
this is a cleanup.

if a subpackage is licensed under a different license from the one
specified in the main preamble, we need to use a distinct License
tag on a per-subpackage basis. but if it is licensed with the
identical license, it is not necessary. since all three
subpackages of "*-{server, conf, kernel-conf}" are licensed under
AGPLv3, there is no need to repeat the "License:" tag in their
own preamble section.

the same applies to the "URL" tag.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-20 12:19:33 +08:00
Eliran Sinvani
63631257db migration manager: fix incomplete mv schemas returned from
get_schema_for_write

Sometimes a view registry can get deactivated inside the schema
registry. This happens due to deactivating and reactivating the registry
entry, which doesn't rebuild the base table information in the view.
This error is later caught when trying to convert the schema into a
`global_schema_ptr`, however, the real bug here is that not all schemas
returned from `get_schema_for_write` are suitable for write because the
mv schemas can be incomplete.
This commit changes the aforementioned function in order to fix the bug.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-11-20 06:07:20 +02:00
Piotr Grabowski
321459ec51 install-dependencies.sh: update node_exporter to 1.7.0
Update node_exporter to 1.7.0.

The previous version (1.6.1) was flagged by security scanners (such as
Trivy) with HIGH-severity CVE-2023-39325. 1.7.0 release fixed that
problem.

[Botond: regenerate frozen toolchain]

Fixes #16085

Closes scylladb/scylladb#16086

Closes scylladb/scylladb#16090
2023-11-19 18:15:44 +02:00
Calle Wilund
6ffb482bf3 Commitlog replayer: Range-check skip call
Fixes #15269

If the segment being replayed is corrupted/truncated, we can attempt skipping
completely bogus byte amounts, which can cause an assert (i.e. a crash) in
file_data_source_impl. This is not a crash-level error, so ensure we
range check the distance in the reader.

v2: Add to corrupt_size if trying to skip more than available. The amount added is "wrong", but at least will
    ensure we log the fact that things are broken
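
The idea can be sketched with a toy in-memory reader. This is not the
commitlog replayer code: the class name, its methods, and the choice to add
the excess distance to `corrupt_size` are all assumptions made for
illustration.

```python
import io


class RangeCheckedReader:
    """Toy reader over an in-memory segment that never skips past the end."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)
        self._size = len(data)
        self.corrupt_size = 0  # bytes accounted as corrupted

    def remaining(self) -> int:
        return self._size - self._buf.tell()

    def skip(self, distance: int) -> int:
        """Skip up to `distance` bytes; return how many were actually skipped."""
        if distance < 0:
            raise ValueError("skip distance must be non-negative")
        available = self.remaining()
        if distance > available:
            # Assumption: record the excess so the breakage is at least logged.
            self.corrupt_size += distance - available
            distance = available
        self._buf.seek(distance, io.SEEK_CUR)
        return distance
```

A bogus skip request is thus clamped to the bytes that are actually left
instead of being passed down to a layer that would assert on it.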

Closes scylladb/scylladb#15270
2023-11-19 17:44:55 +02:00
Gleb Natapov
6edbf4b663 storage_service: topology coordinator: put fence version into the raft state
Currently when the coordinator decides to move the fence it issues an
RPC to each node and each node locally advances fence version. This is
fine if there are no failures or failures are handled by retrying
fencing, but if we want to allow topology changes to progress even in
the presence of barrier failures it is easier to store the fence version
in the raft state. The nodes that missed fence rpc may easily catch up
to the latest fence version by simply executing a raft barrier.
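
A toy model of the catch-up path, under loud assumptions: `RaftState`,
`Node`, `read_barrier()` and `accepts()` are hypothetical names, and the
rule that requests older than the fence are rejected is a simplification
used only to make the effect visible.

```python
class RaftState:
    """Stands in for the replicated state that now carries the fence version."""

    def __init__(self) -> None:
        self.fence_version = 0


class Node:
    def __init__(self, raft: RaftState) -> None:
        self._raft = raft
        self.local_fence_version = 0

    def read_barrier(self) -> None:
        # After a barrier the node has seen everything committed so far,
        # including the latest fence version.
        self.local_fence_version = max(self.local_fence_version,
                                       self._raft.fence_version)

    def accepts(self, request_version: int) -> bool:
        # Simplified fencing rule: reject requests older than the fence.
        return request_version >= self.local_fence_version


raft = RaftState()
node = Node(raft)
raft.fence_version = 3   # coordinator moved the fence; this node missed the RPC
stale_ok_before = node.accepts(2)
node.read_barrier()      # catching up needs no retry of the fence RPC
stale_ok_after = node.accepts(2)
```

Here `stale_ok_before` is true and `stale_ok_after` is false: the barrier
alone is enough for the node to start rejecting the stale request.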
2023-11-19 15:28:08 +02:00
Eliran Sinvani
562403b82f migration_manager: do not globalize potentially incomplete schema
There was a case where the maybe sync function of a materialized view could
fail to sync if the view version was old. This is because adding the
base information to the view is only relevant until the record is
synced. This triggers an internal error in the `global_schema_ptr`
constructor.
The conversion to global pointer in that case was solely for logging
purposes so instead, we pass the pieces of information needed for the
logging itself.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-11-19 14:13:01 +02:00
Botond Dénes
eb674128ca Merge 'treewide: do not mark return value const if this has no effect ' from Kefu Chai
this change is a cleanup to add `-Wignored-qualifiers` when building the tree.

marking a return value `const` has no effect in these cases, so the
`const` specifiers are useless. let's drop them.

and, if we compile the tree with `-Wignored-qualifiers`, the compiler
would warn like:

```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
  245 |     const index_metadata_kind kind() const;
      |     ^~~~~
```
so this change also silences the above warnings.

Closes scylladb/scylladb#16083

* github.com:scylladb/scylladb:
  build: enable -Wignore-qualifiers
  treewide: do not mark return value const if this has no effect
2023-11-17 15:35:20 +02:00
Kefu Chai
781b7de502 build: enable -Wignore-qualifiers
`-Wignored-qualifiers` is included by -Wextra, but we are not there yet.
with this change, we can keep changes that introduce -Wignored-qualifiers
warnings out of the repo before applying `-Wextra`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-17 17:49:47 +08:00
Kefu Chai
15bfa09454 treewide: do not mark return value const if this has no effect
this change is a cleanup.

marking a return value `const` has no effect in these cases, so the
`const` specifiers are useless. let's drop them.

and, if we compile the tree with `-Wignored-qualifiers`, the compiler
would warn like:

```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
  245 |     const index_metadata_kind kind() const;
      |     ^~~~~
```
so this change also silences the above warnings.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-17 17:46:19 +08:00
Tomasz Grabiec
6bcf3ac86c Merge 'Fix a few rare bugs in row cache' from Michał Chojnowski
This is a loose collection of fixes to rare row cache bugs flushed out by running test_concurrent_reads_and_eviction several million times. See individual commits for details.

Fixes #15483

Closes scylladb/scylladb#15945

* github.com:scylladb/scylladb:
  partition_version: fix violation of "older versions are evicted first" during schema upgrades
  cache_flat_mutation_reader: fix a broken iterator validity guarantee in ensure_population_lower_bound()
  cache_flat_mutation_reader: fix a continuity loss in maybe_update_continuity()
  cache_flat_mutation_reader: fix continuity losses during cache population races with reverse reads
  partition_snapshot_row_cursor: fix a continuity loss in ensure_entry_in_latest() with reverse reads
  cache_flat_mutation_reader: fix some cache mispopulations with reverse reads
  cache_flat_mutation_reader: fix a logic bug in ensure_population_lower_bound() with reverse reads
  cache_flat_mutation_reader: never make an unlinked last dummy continuous
2023-11-16 23:48:17 +01:00
Michał Chojnowski
9ccd4ea416 partition_version: fix violation of "older versions are evicted first" during schema upgrades
A schema upgrade appends a MVCC version B after an existing version A.

The last dummy in B is added to the front of LRU,
so it will be evicted after the entries in A.

This alone doesn't quite violate the "older versions are evicted first" rule,
because the new last dummy carries no information. But apply_monotonically
generally assumes that entries on the same position have the obvious
eviction order, even if they carry no information. Thus, after the merge,
the rule can become broken.

The proposed fix is as follows:

- In the case where A is merged into B, the merged last dummy
  inherits the link of A.
- The merging of B into anything is prevented until its merge with A is finished.

This is relatively hacky, because it still involves a state that
goes against some natural expectations granted by the "older versions..."
rule. A less hacky fix would be to ensure that the new dummy is inserted
into a proper place in the eviction order to begin with.

Or, better yet, we could eliminate the rule altogether.
Aside from being very hard to maintain, it also prevents the introduction
of any eviction algorithm other than LRU.
2023-11-16 19:01:18 +01:00
Michał Chojnowski
2aac8690c7 cache_flat_mutation_reader: fix a broken iterator validity guarantee in ensure_population_lower_bound()
ensure_population_lower_bound() guarantees that _last_row is valid or null.

However, it fails to provide this guarantee in the special rare case when
`_population_range_starts_before_all_rows == true` and _last_row is non-null.

(This can happen in practice if there is a dummy at before_all_clustering_rows
and eviction makes the `(before_all_clustering_rows, ...)` interval
discontinuous. When the interval is read in this state, _last_row will point to
the dummy, while _population_range_starts_before_all_rows will still be true.)

In this special case, `ensure_population_lower_bound()` does not refresh
`_last_row`, so it can be non-null but invalid after the call.
If it is accessed in this state, undefined behaviour occurs.
This was observed to happen in a test,
in the `read_from_underlying() -- maybe_drop_last_entry()` codepath.

The proposed fix is to make the meaning of _population_range_starts_before_all_rows
closer to its real intention. Namely: it's supposed to handle the special case of a
left-open interval, not the case of an interval starting at -inf.
2023-11-16 19:01:18 +01:00
Michał Chojnowski
0dcf91491e cache_flat_mutation_reader: fix a continuity loss in maybe_update_continuity()
To reflect the final range tombstone change in the populated range,
maybe_update_continuity() might insert a dummy at `before_key(_next_row.table_position())`.

But the relevant logic breaks down in the special case when that position is
equal to `_last_row.position()`. The code treats the dummy as a part of
the (_last_row, _next_row) range, but this is wrong in the special case.

This can lead to inconsistent state. For example, `_last_row` can be wrongly made
continuous, or its range tombstone can be wrongly nulled.

The proposed fix is to only modify the dummy if it was actually inserted.
If it had been inserted beforehand (which is true in the special case, because
of the `ensure_population_lower_bound()` call earlier), then it's already in a
valid state and doesn't need changes.
2023-11-16 19:01:18 +01:00
Michał Chojnowski
6601c778dd cache_flat_mutation_reader: fix continuity losses during cache population races with reverse reads
Cache population routines insert new row entries.

In non-reverse reads, the new entries (except for the lower bound of the query
range) are filled with the correct continuity and range tombstones immediately
after insertion, because that information has already arrived from underlying
when the entries are inserted.

But in reverse reads, it's the interval *after* the newly-inserted entry
that's made continuous. The continuity information in the new entries isn't
filled. When two population routines race, the one which comes later can
punch holes in the continuity left by the first routine, which can break
the "older versions are evicted first" rule and revert the affected
interval to an older version.

To fix this, we must make sure that inserting new row entries doesn't
change the total continuity of the version.
2023-11-16 19:01:18 +01:00
Michał Chojnowski
47299d6b06 partition_snapshot_row_cursor: fix a continuity loss in ensure_entry_in_latest() with reverse reads
The FIXME comment claims that setting continuity isn't very important in this
place, but in fact this is just wrong.

If two calls to read_from_underlying() get into a race, the one which finishes
later can call ensure_entry_in_latest() on a position which lies inside a
continuous interval in the newest version. If we don't take care to preserve
the total continuity of the version, this can punch a hole in the continuity of the
newest version, potentially reverting the affected interval to an older version.

Fix that.
2023-11-16 19:01:18 +01:00
Michał Chojnowski
b5988fb389 cache_flat_mutation_reader: fix some cache mispopulations with reverse reads
`_last_row` is in table schema, but it is sometimes compared with positions in
query schema. This leads to unexpected behaviour when reverse reads
are used.
The previous patch fixed one such case, which was affecting correctness.

As far as I can tell, the three cases affected by this patch aren't
a correctness problem, but can cause some intervals to fail to be made continuous.
(And they won't be cached even if the same read is repeated many times).
2023-11-16 19:01:18 +01:00
Michał Chojnowski
f9eb64b8e0 cache_flat_mutation_reader: fix a logic bug in ensure_population_lower_bound() with reverse reads
`_last_row` is in table schema, while `cur.position()` is in query schema
(which is either equal to table schema, or its reverse).

Thus, the comparison affected by this patch doesn't work as intended.
In reverse reads, the check will pass even if `_last_row` has the same key,
but opposite bound weight to `cur`, which will lead to the dummy being inserted
at the wrong position, which can e.g. wrongly extend a range tombstone.

Fix that.
2023-11-16 19:01:18 +01:00
Michał Chojnowski
ec364c3580 cache_flat_mutation_reader: never make an unlinked last dummy continuous
It is illegal for an unlinked last dummy to be continuous
(this is how last dummies respect the "older versions are evicted first" rule),
but it is technically possible for an unlinked last dummy to be
made continuous by read_from_underlying. This commit fixes that.

Found by row_cache_test.

The bug is very unlikely to happen in practice because the relevant
rows_entry is bumped in LRU before read_from_underlying starts.
For the bug to manifest, the entry has to fall down to the end of the
LRU list and be evicted before read_from_underlying() ends.
Usually it takes several minutes for an entry to fall out of LRU,
and read_from_underlying takes maybe a few hundred milliseconds.

And even if the above happened, there still needs to appear a new
version, which needs to have its continuous last dummy evicted
before it's merged.
2023-11-16 19:01:18 +01:00
Anna Stuchlik
ca22de4843 doc: mark the link to upgrade guide as OSS-only
This commit adds the .. only:: opensource directive
to the Raft page to exclude the link to the 5.2-to-5.4
upgrade guide from the Enterprise documentation.

The Raft page belongs to both OSS and Enterprise
documentation sets, while the upgrade guide
is OSS-only. This causes documentation build
issues in the Enterprise repository, for example,
https://github.com/scylladb/scylla-enterprise/pull/3242.

As a rule, all OSS-only links should be provided
by using the .. only:: opensource directive.

This commit must be backported to branch-5.4
to prevent errors in the documentation for
ScyllaDB Enterprise 2024.1

(backport)

Closes scylladb/scylladb#16064
2023-11-16 10:36:27 +02:00
Kefu Chai
687ba9cacc test/sstable_compaction_test: check every sstable replaced sstable
before this change, in sstable_run_based_compaction_test, we check
every 4 sstables, to verify that we close the sstable to be replaced
in a batch of 4.

since the integer-based generation identifier is monotonically
incremental, we can assume that the identifiers of sstables are like
0, 1, 2, 3, .... so if the compaction consumes sstable in a
batch of 4, the identifier of the first one in the batch should
always be the multiple of 4. unfortunately, this test does not work
if we use uuid-based identifier.

but if we take a closer look at how we create the dataset, we can
observe the following facts:

1. the `compaction_descriptor` returned by
   `sstable_run_based_compaction_strategy_for_tests` never
   set `owned_ranges` in the returned descriptor
2. in `compaction::setup_sstable_reader`, `mutation_reader::forward::no`
   is used, if `_owned_ranges_checker` is empty
3. `mutation_reader_merger` respects the `fwd_mr` passed to its
   ctor, so it closes current sstable immediately when the underlying
   mutation reader reaches the end of stream.

in other words, we close every sstable once it is fully consumed in
sstable_compaction_test. and the reason why the existing test passes
is that we just sample the sstables whose generation id is a multiple
of 4. what happens when we perform compaction in this test is:

1. replace 5 with 33, closing 5
2. replace 6 with 34, closing 6
3. replace 7 with 35, closing 7
4. replace 8 with 36, closing 8   << let's check here.. good, go on!
5. replace 13 with 37, closing 13
...
8. replace 16 with 40, closing 16 << let's check here.. also, good, go on!

so, in this change, we just check all old sstables, to verify that
we close each of them once it is fully consumed.
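
to see why sampling every 4th generation cannot tell the two behaviors
apart, here is a simplified model. it is not the test code: the function
names and the `batch_size` knob are made up for illustration, and the
generation numbers are the ones listed above.

```python
def closed_after_each_step(generations, batch_size):
    """Yield (generation, closed_set) after each sstable is fully consumed.

    batch_size=1 models closing every sstable right away; batch_size=4
    models closing them only in batches of 4.
    """
    closed = set()
    pending = []
    for gen in generations:
        pending.append(gen)
        if len(pending) == batch_size:
            closed.update(pending)
            pending.clear()
        yield gen, frozenset(closed)


def sampled_check(generations, batch_size, every=4):
    """Old style: only look at generations that are a multiple of `every`."""
    return all(gen in closed
               for gen, closed in closed_after_each_step(generations, batch_size)
               if gen % every == 0)


def full_check(generations, batch_size):
    """New style: every consumed sstable must already be closed."""
    return all(gen in closed
               for gen, closed in closed_after_each_step(generations, batch_size))


GENERATIONS = [5, 6, 7, 8, 13, 14, 15, 16]
```

in this model the sampled check passes for both `batch_size=1` and
`batch_size=4`, while the full check passes only when each sstable is
closed as soon as it is consumed. it also does not depend on the
generation being an integer.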

Fixes #16073
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-16 16:21:46 +08:00
Kefu Chai
18792fe059 test/sstable_compaction_test: s/old_sstables.front()/old_sstable/
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-16 16:21:40 +08:00
Botond Dénes
323e34e1ed Update tools/java submodule
* tools/java 97c49094...b776096d (2):
  > build: take care of old libthrift [PART 2/2]
  > build: take care of old libthrift [PART 1/2]
2023-11-16 10:14:38 +02:00
Kefu Chai
12f4f9f481 build: cmake: link against cryptopp::cryptopp
instead of linking against cryptopp, we should link against
cryptopp::cryptopp. the latter is the target exposed by
Findcryptopp.cmake, while the former is just a library name which
is not even exposed by any find_package() call.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16060
2023-11-15 17:14:04 +02:00
Anna Stuchlik
e8129d9a0c doc: remove DateTieredCompactionStrategy
This commit removes support for DateTieredCompactionStrategy
from the documentation.

Support for DTCS was removed in 5.4, so this commit
must be backported to branch-5.4.

Refs https://github.com/scylladb/scylladb/issues/15869#issuecomment-1784181274

The information is already added to the 5.2-to-5.4
upgrade guide: https://github.com/scylladb/scylladb/pull/15988

(backport)

Closes scylladb/scylladb#16061
2023-11-15 15:39:57 +02:00
Pavel Emelyanov
f4fd5c7207 s3/client: Tag pieces of jumbo uploader
The jumbo sink is there to upload files that can be potentially larger
than 50Gb (10000*5Mb). For that the sink uploads a set of so-called
"pieces" (files up to 50Gb each), then uses the copy-upload API call
to squash the pieces together. After copying, the piece is removed. If a
crash happens while uploading, pieces remain in the bucket forever,
which is not great.

This patch tags pieces with the 'kind=piece' tag in order to tell pieces
from regular objects. This can be used, for example, to set up a
tag-based lifecycle policy that eventually collects dangling pieces.
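
For example, a lifecycle rule along these lines could expire tagged pieces.
This is a sketch of the JSON shape of an S3 bucket lifecycle configuration;
the rule ID, the helper name and the 7-day period are assumptions, and only
the 'kind=piece' tag comes from this patch.

```python
import json


def piece_expiration_rule(days: int) -> dict:
    """Build a lifecycle rule that expires objects tagged kind=piece."""
    return {
        "ID": "expire-dangling-jumbo-pieces",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Tag": {"Key": "kind", "Value": "piece"}},
        "Expiration": {"Days": days},
    }


# Assumption: 7 days is long enough that no in-flight upload is affected.
lifecycle_configuration = {"Rules": [piece_expiration_rule(days=7)]}
print(json.dumps(lifecycle_configuration, indent=2))
```

Regular objects carry no such tag, so a rule filtered this way leaves them
alone.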

fixes: #13670

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16023
2023-11-15 15:32:30 +02:00
Kefu Chai
6a753f9f06 build: cmake: define SCYLLA_BUILD_MODE=dev for Dev mode
it was a typo in b234c839. so let's correct it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16063
2023-11-15 13:17:30 +02:00
Kefu Chai
972b852e0a build: cmake: explain the build dependencies in check-headers
a developer might notice that building 'check-headers' also builds
the whole tree. so let's explain this behavior.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16062
2023-11-15 13:16:01 +02:00
Botond Dénes
ba17ae2ab6 Merge 'Fix tests in test/cql-pytest/ that fail on Cassandra' from Nadav Har'El
As a general rule, tests in test/cql-pytest shouldn't just pass on Scylla - they also should not fail on Cassandra; A test that fails on Cassandra may indicate that the test is wrong, or that Scylla's behavior is wrong and the test just enshrines that wrong behavior. Each time we see a test fail on Cassandra we need to check if this is not the case. We also have special markers scylla_only and cassandra_bug to put on tests that we know _should_ fail on Cassandra because it is missing some Scylla-only feature or there is a bug in Cassandra, respectively. Such tests will be xfailed/skipped when running on Cassandra, and not report failures.

Unfortunately, over time several more tests that did not pass on Cassandra got into our suite. In this series I went over all of them, and fixed each to pass - or be skipped - on Cassandra, in a way that each patch explains.

Fixes #16027

Closes scylladb/scylladb#16033

* github.com:scylladb/scylladb:
  test/cql-pytest: fix test_describe.py to not fail on Cassandra
  test/cql-pytest: fix select_single_column_relation_test.py to not fail on Cassandra
  test/cql-pytest: fix compact_storage_test.py to not fail on Cassandra
  test/cql-pytest: fix test_secondary_index.py to not fail on Cassandra
  test/cql-pytest: fix test_materialized_view.py to not fail on Cassandra
  test/cql-pytest: fix test_keyspace.py to not fail on Cassandra
  test/cql-pytest: test_guardrail_replication_strategy.py is Scylla-only
  test/cql-pytest: partial fix for test_compaction_strategy_validation.py on Cassandra
  test/cql-pytest: fix test_filtering.py to not fail on Cassandra
2023-11-15 09:13:09 +02:00
Nadav Har'El
8964cce04c test/cql-pytest: fix test_describe.py to not fail on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).

Some of the tests checked on Cassandra things that don't exist there
(namely local secondary indexes) and could skip that part. Other tests
need to be skipped completely ("scylla_only") because they rely on a
Scylla-only feature. We have a bit too many of those in this file, but
I don't want to fix this now.

Yet another test found a real bug in Cassandra 4.1.1 (CASSANDRA-17918)
but passes in Cassandra 4.1.2 and up, so there's nothing to fix except
a comment about the situation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:40:30 +02:00
Nadav Har'El
6802dca6b5 test/cql-pytest: fix select_single_column_relation_test.py to not fail on Cassandra
In commit 52bbc1065c, we started to allow "IN NULL" - it started to
match nothing instead of being an error as it is in Cassandra. The
commit *incorrectly* "fixed" the existing translated Cassandra unit test
to match the new behavior - but after this "fix" the test started to
fail on Cassandra.

The appropriate fix is just to comment out this part of the test and
not do it. It's a small point where we deliberately decided to deviate
from Cassandra's behavior, so the test it had for this behavior is
irrelevant.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
d8997d49e7 test/cql-pytest: fix compact_storage_test.py to not fail on Cassandra
Some error-message checks in this test file (which was translated in
the past from Cassandra) try operations which actually have two errors,
and expect to see one error message - but recent Cassandra prints
the other one. This caused several tests to fail when running on
Cassandra 4.1. Both messages are fine, so let's accept both.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
a7f5eb3621 test/cql-pytest: fix test_secondary_index.py to not fail on Cassandra
Fixed two tests which failed when running on Cassandra:

One test waited for a secondary index to appear, but in Cassandra, the
index can be broken (cause a read failure) for a short while and we
need to wait through this failure as well and not fail the entire test.

Another test was for local secondary index, which is a Scylla-only
feature, but we forgot the "scylla_only" tag.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
92f591dc38 test/cql-pytest: fix test_materialized_view.py to not fail on Cassandra
The test function test_mv_synchronous_updates checks the
synchronous_updates feature, which is a ScyllaDB extension and
doesn't exist in Cassandra. So it should be marked with "scylla_only"
so that it doesn't fail when running the tests on Cassandra.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
301189ee28 test/cql-pytest: fix test_keyspace.py to not fail on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).

When testing some invalid cases of ALTER TABLE, the test required
that you cannot choose SimpleStrategy without specifying a
replication_factor. As explained in Refs #16028, this isn't true
in Cassandra 4.1 and up - it now has a default value for
replication_factor and it's no longer required.

So in this patch we split that part of the test to a separate test
function and mark it scylla_only.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
2b67cd3921 test/cql-pytest: test_guardrail_replication_strategy.py is Scylla-only
The tests in test/cql-pytest/test_guardrail_replication_strategy.py
are for a Scylla-only feature that doesn't exist in Cassandra, so
obviously they all fail on Cassandra. Let's mark them all as
scylla_only.

We use an autouse fixture to automatically mark all tests in this file
as scylla-only, instead of marking each one separately.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
c4d3e08987 test/cql-pytest: partial fix for test_compaction_strategy_validation.py on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).

This patch is only a partial fix - it fixes trivial differences in error
messages, but some potentially-real differences remain so three of the
tests still fail:

1. Trying to set tombstone_threshold to 5.5 is an error in ScyllaDB
   ("must be between 0.0 and 1.0") but allowed in Cassandra.

2. Trying to set bucket_low to 0.0 is an error in ScyllaDB, giving the
   wrong-looking error message "must be between 0.0 and 1.0" (so 0.0 should
   have been fine?!) but allowed in Cassandra.

3. Trying to set timestamp_resolution to SECONDS is an error in ScyllaDB
   ("invalid timestamp resolution SECONDS") but allowed in Cassandra.
   I don't think anybody wants to actually use "SECONDS", but it seems
   legal in Cassandra, so do we need to support it?

The patch also simplifies the test to use cql-pytest's util.py, instead
of cassandra_tests/porting.py. The latter was meant to make porting
existing Cassandra tests easier - not for writing new ones - and made
using a regular expression for testing error messages harder so I
switched to using pytest.raises() whose "match=" accepts a regular
expression.
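
As a rough sketch of the regular-expression matching described above: pytest.raises(match=...) applies re.search() to the exception text, so one alternation can accept either server's wording. The pattern and messages below are made up for illustration and are not the ones used in the test file.

```python
import re

# One pattern that accepts either wording (both alternatives are hypothetical).
PATTERN = r"must be between 0\.0 and 1\.0|is out of range"

def message_accepted(message, pattern=PATTERN):
    # Same call that pytest.raises(match=...) makes on str(exception).
    return re.search(pattern, message) is not None

print(message_accepted("tombstone_threshold must be between 0.0 and 1.0"))
print(message_accepted("value 5.5 is out of range"))
print(message_accepted("unrelated failure"))
```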

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
8e51ebd8a0 test/cql-pytest: fix test_filtering.py to not fail on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).

It turns out that when the token() function is used with incorrect
parameters (it needs to be passed all partition-key columns), the
error message is different in ScyllaDB and Cassandra. Both are
reasonable error messages, so if we insist on checking the error
message - we should allow both.

Also the same test called its second partition-key column "ck". This
is confusing, because we usually use the name "ck" to refer to a clustering
key. So just for clarity, we change this name to "pk2". This is not a
functional change in the test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
64d1d5cf62 Merge 'Fix partition estimation with TWCS tables during streaming' from Raphael "Raph" Carvalho
TWCS tables require partition estimation adjustment as incoming streaming data can be segregated into the time windows.

Turns out we had two problems in this area that lead to suboptimal bloom filters.

1) With off-strategy enabled, data segregation is postponed, but partition estimation was adjusted as if segregation wasn't postponed. Solved by not adjusting estimation if segregation is postponed.
2) With off-strategy disabled, data segregation is not postponed, but streaming didn't feed any metadata into partition estimation procedure, meaning it had to assume the max windows input data can be segregated into (100). Solved by using schema's default TTL for a precise estimation of window count.
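
The two cases above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the formula (window count taken as the default TTL divided by the window size, capped at the 100 windows mentioned above) is a guess at the idea rather than the actual implementation.

```python
import math

MAX_WINDOWS = 100  # the cap mentioned in the text above

def estimated_window_count(default_ttl_s, window_size_s):
    """Guess how many time windows the incoming data can span.

    Assumption: live data is never older than the default TTL, so it
    spans at most ceil(ttl / window) windows. Without a TTL, fall back
    to the cap.
    """
    if default_ttl_s <= 0 or window_size_s <= 0:
        return MAX_WINDOWS
    return min(MAX_WINDOWS, max(1, math.ceil(default_ttl_s / window_size_s)))

def partitions_per_sstable(total, default_ttl_s, window_size_s, postponed):
    # Case 1: segregation is postponed, so the estimate is left alone.
    if postponed:
        return total
    # Case 2: data is split across windows right away.
    windows = estimated_window_count(default_ttl_s, window_size_s)
    return math.ceil(total / windows)

DAY = 86400
print(partitions_per_sstable(1000, 10 * DAY, DAY, postponed=False))
print(partitions_per_sstable(1000, 10 * DAY, DAY, postponed=True))
```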

For the future, we want to dynamically size filters (see https://github.com/scylladb/scylladb/issues/2024), especially for TWCS that might have SSTables that are left uncompacted until they're fully expired, meaning that the system won't heal itself in a timely manner through compaction on a SSTable that had partition estimation really wrong.

Fixes https://github.com/scylladb/scylladb/issues/15704.

Closes scylladb/scylladb#15938

* github.com:scylladb/scylladb:
  streaming: Improve partition estimation with TWCS
  streaming: Don't adjust partition estimate if segregation is postponed
2023-11-14 20:41:36 +02:00
Kefu Chai
d49ea833fd scylla-sstable: reject duplicate sstable names
before this change, `load_sstables()` fills the output sstables vector
by indexing it with the sstable's path. but if there are duplicated
items in the given sstable_names, the returned vector would have uninitialized
shared_sstable instance(s) in it. if we feed such sstables to the
operation funcs, they would segfault when dereferencing the empty
lw_shared_ptr.

in this change, we error out if duplicated sstable names are specified
in the command line.

an alternative is to tolerate this usage by initializing the sstables
vector with a back_inserter, as we always return a dictionary with the
sstable's name as the key, but it might be desirable from user's
perspective to preserve the order, like OrderedDict in Python. so
let's preserve the ordering of the sstables in the command line.
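
a minimal sketch of the chosen behavior, written in Python rather than the tool's C++: reject duplicates up front and otherwise keep the command-line order. the function name and the error text are hypothetical.

```python
def check_sstable_names(sstable_names):
    """Return the names in their original order, or raise on a duplicate."""
    seen = set()
    for name in sstable_names:
        if name in seen:
            raise ValueError(f"duplicate sstable name: {name}")
        seen.add(name)
    return list(sstable_names)

ordered = check_sstable_names(["b-Data.db", "a-Data.db"])

try:
    check_sstable_names(["a-Data.db", "b-Data.db", "a-Data.db"])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```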

this should address the problem of the segfault if we pass duplicated
sstable paths to this tool.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16048
2023-11-14 19:37:14 +02:00
Botond Dénes
11cafd2fc8 Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.

Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.

Closes scylladb/scylladb#16050

* github.com:scylladb/scylladb:
  test: test abort of compaction task that isn't started yet
  test: test running compaction task abort
  tasks: fail if a task was aborted
  compaction: abort task manager compaction tasks
2023-11-14 14:55:17 +02:00
Kefu Chai
2bae14f743 dist: let scylla-server.service Wants var-lib-systemd-coredump
without adding `WantedBy=scylla-server.service` in
var-lib-systemd-coredump, if we start `scylla-server.service`,
it does not necessarily start `var-lib-systemd-coredump`
even if the latter is installed.

with `WantedBy=scylla-server.service` in var-lib-systemd-coredump,
if we start `scylla-server.service`, var-lib-systemd-coredump
will be started also. and `Before=scylla-server.service` ensures
that, before `scylla-server.service` is started,
var-lib-systemd-coredump is already ready.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15984
2023-11-14 14:54:39 +02:00
Michał Jadwiszczak
0083ddd7a0 generic_server: use mutable reference in for_each_gently
Make `generic_server::gentle_iterator` a mutable iterator to allow
`for_each_gently` to make changes to the connections.

Fixes: #16035

Closes scylladb/scylladb#16036
2023-11-14 14:25:22 +02:00
Pavel Emelyanov
a87b5cfbec test/object_store: Generalize test table creation
Both existing test cases, and the upcoming third one, create the very
same ks.cf pair with the very same sequence of steps. Generalize them.

For the basic test case also tune up the way "expected" rows are
calculated -- now they are SELECT-ed right after insertion and the size
is checked to be non zero. Not _exactly_ the same check, but it's good
enough for basic testing purposes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15986
2023-11-14 13:55:02 +02:00
Takuya ASADA
338a9492c9 scylla_post_install.sh: detect RHEL correctly
$ID_LIKE = "rhel" works only on RHEL compatible OSes, not for RHEL
itself.
To detect RHEL correctly, we also need to check $ID = "rhel".
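
A sketch of the combined check, in Python rather than the script's shell. The helper name is hypothetical, and treating ID_LIKE as a space-separated list is an assumption of this example, not necessarily how the script compares it.

```python
def is_rhel_family(os_release_text):
    """Return True if os-release content describes RHEL or a RHEL-like OS."""
    fields = {}
    for line in os_release_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    # RHEL itself reports ID=rhel; compatible OSes list rhel in ID_LIKE.
    ids = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    return "rhel" in ids

rhel = 'NAME="Red Hat Enterprise Linux"\nID="rhel"\nID_LIKE="fedora"\n'
rocky = 'ID="rocky"\nID_LIKE="rhel centos fedora"\n'
ubuntu = 'ID=ubuntu\nID_LIKE=debian\n'
print(is_rhel_family(rhel), is_rhel_family(rocky), is_rhel_family(ubuntu))
```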

Fixes #16040

Closes scylladb/scylladb#16041
2023-11-14 13:53:35 +02:00
Kefu Chai
5a6c5320de test/sstable_compaction_test: use BOOST_REQUIRE_EQUAL when appropriate
Boost.Test prints the LHS and RHS when the predicate statement passed
to BOOST_REQUIRE_EQUAL() macro evaluates to false. so the error message
printed by Boost would be more developer friendly when the test fails.

in this test, we replace some BOOST_REQUIRE() with BOOST_REQUIRE_EQUAL()
when appropriate.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16047
2023-11-14 13:51:47 +02:00
Botond Dénes
f63645ceab Merge 'test/cql-pytest: fix test_permissions.py to not fail on Cassandra' from Nadav Har'El
This short series fixes test/cql-pytest/test_permissions.py to stop failing on Cassandra.

The second patch fixes these failures (and explains why). The first patch is a new test for UDFs, which helped me prove that one of the test_permissions.py failures in Cassandra is a Cassandra bug - some esoteric error path that prints the right message when no permissions are involved, becomes wrong when permissions are added.

Fixes #15969

Closes scylladb/scylladb#15979

* github.com:scylladb/scylladb:
  test/cql-pytest: fix test_permissions.py to not fail on Cassandra
  test/cql-pytest: add test for DROP FUNCTION
2023-11-14 13:50:51 +02:00
Gleb Natapov
f04e890690 storage_service: topology coordinator: do fencing even if draining failed
Token metadata barrier consists of two steps. First, old requests are
drained, and then requests that are not drained are fenced. But currently,
if draining fails, fencing is not done. This is fine if the
barrier's failure is handled by retrying, but we want to start handling
errors differently. In fact, during topology operation rollback we
already do not retry a failed barrier.

The patch fixes the metadata barrier to do fencing even if draining
failed.
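
The control flow can be sketched roughly like this. The function and callback names are hypothetical, and re-raising the drain error after fencing is an assumption of the sketch.

```python
def token_metadata_barrier(drain, fence):
    """Run both barrier steps; fencing happens even if draining fails."""
    drain_error = None
    try:
        drain()
    except Exception as exc:
        drain_error = exc  # remember the failure, but keep going
    fence()
    if drain_error is not None:
        raise drain_error  # assumption: the barrier still reports the failure

calls = []

def failing_drain():
    calls.append("drain")
    raise RuntimeError("drain failed")

def fence():
    calls.append("fence")

try:
    token_metadata_barrier(failing_drain, fence)
    barrier_failed = False
except RuntimeError:
    barrier_failed = True
```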
2023-11-14 13:06:41 +02:00
Aleksandra Martyniuk
6af581301b test: test abort of compaction task that isn't started yet
Test whether a task whose parent was aborted has a proper status.
2023-11-14 10:36:38 +01:00
Botond Dénes
a66ec1d3c1 Merge 'Drop compaction_manager_test' from Pavel Emelyanov
This is continuation of a34c8dc4 (Drop compaction_manager_for_testing).

There's one more wrapper over compaction_manager to access its private fields. All such access was recently moved to sstables::test_env's compaction manager, now it's time to drop the remaining legacy wrapper class.

Closes scylladb/scylladb#16017

* github.com:scylladb/scylladb:
  test/utils: Drop compaction_manager_test
  test/utils: Get compaction manager from test_env
  test/sstables: Introduce test_env_compaction_manager::perform_compaction()
  test/env: Add sstables::test_env& to compaction_manager_test::run()
  test/utils: Add sstables::test_env& to compact_sstables()
  test/utils: Simplify and unify compaction_manager_test::run()
  test/utils: Squash two compact_sstables() helpers
  test/compaction: Use shorter compact_sstables() helper
  test/utils: Keep test task compaction gate on task itself
  test/utils: Move compaction_manager_test::propagate_replacement()
2023-11-14 11:25:17 +02:00
Kamil Braun
9212bdc6b1 migration_manager: more verbose logging for schema versions
We're observing nodes getting stuck during bootstrap inside
`storage_service::wait_for_ring_to_settle()`, which periodically checks
`migration_manager::have_schema_agreement()` until it becomes `true`:
scylladb/scylladb#15393.

There is no obvious reason why that happens -- according to the nodes'
logs, their latest in-memory schema version is the same.

So either the gossiped schema version is for some reason different
(perhaps there is a race in publishing `application_state::SCHEMA`) or
missing entirely.

Alternatively, `wait_for_ring_to_settle` is leaving the
`have_schema_agreement` loop and getting stuck in
`update_topology_change_info` trying to acquire a lock.

Modify logging inside `have_schema_agreement` so details about missing
schema or version mismatch are logged on INFO level, and an INFO level
message is printed before we return `true`. To prevent logs from getting
spammed, rate-limit the periodic messages to once every 5 seconds. This
will still show the reason in our tests which allow the node to hang for
many minutes before timing out. Also these schema agreement checks are
done on relatively rare occasions such as bootstrap, so the additional
logs should not be harmful.
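
For illustration, rate-limiting a periodic message to once every 5 seconds could look like the sketch below. It is written in Python with hypothetical names and does not reflect the logger API used in the patch.

```python
import time

class RateLimitedLogger:
    """Emit at most one message per interval; drop the rest."""

    def __init__(self, emit, interval_s=5.0, clock=time.monotonic):
        self._emit = emit
        self._interval_s = interval_s
        self._clock = clock
        self._last = None  # time of the last emitted message

    def log(self, message):
        now = self._clock()
        if self._last is not None and now - self._last < self._interval_s:
            return False  # still inside the quiet period
        self._last = now
        self._emit(message)
        return True

# Demo with a fake clock so the outcome does not depend on real time.
fake_now = [0.0]
emitted = []
logger = RateLimitedLogger(emitted.append, clock=lambda: fake_now[0])
for t in (0.0, 1.0, 4.9, 5.0, 9.0, 10.0):
    fake_now[0] = t
    logger.log(f"schema version mismatch at t={t}")
print(emitted)
```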

Furthermore, when publishing schema version to gossip, log it on INFO
level. This is happening at most once per schema change so it's a rare
message. If there's a race in publishing schema versions, this should
allow us to observe it.

Ref: scylladb/scylladb#15393

Closes scylladb/scylladb#16021
2023-11-14 11:24:47 +02:00
Alexey Novikov
bd73536b33 When add duration field to UDT check whether this UDT is used in some clustering key
Having values of the duration type is not allowed for clustering
columns, because duration can't be ordered. This is correctly validated
when creating a table but is not validated when we alter the type.

Fixes #12913

Closes scylladb/scylladb#16022
2023-11-14 11:23:05 +02:00
Botond Dénes
4968f50ff7 Merge 'auth: fix error message when consistency level is not met' from Paweł Zakrzewski
Propagate `exceptions::unavailable_exception` error message to the client such as cqlsh.

Fixes #2339

Closes scylladb/scylladb#15922

* github.com:scylladb/scylladb:
  test: add the auth_cluster test suite
  auth: fix error message when consistency level is not met
2023-11-14 11:22:38 +02:00
Kefu Chai
4f361b73c4 build: cmake: consolidate the setting of cxx_flags
before this change, we define the CMAKE_CXX_FLAGS_${CONFIG} directly.
and some of the configurations are supposed to generate debugging info with
"-g -gz" options, but they failed to include these options in the cxx
flags.

in this change:

* a macro named `update_cxx_flags` is introduced to set this option.
* this macro also sets -O option

instead of using function, this facility is implemented as a macro so
that we can update the CMAKE_CXX_FLAGS_${CONFIG} without setting
this variable with awkward syntax like set

```cmake
set(${flags} "${${flags}}" PARENT_SCOPE)
```

this mirrors the behavior in configure.py in sense that the latter
sets the option on a per-mode basis, and interprets the option to
compiling option.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16043
2023-11-14 11:21:52 +02:00
Kefu Chai
a846291ce8 build: cmake: define SCYLLA_BUILD_MODE for Release build
this macro definition was dropped in 2b961d8e3f by accident.
in this change, let's bring it back. this macro is always necessary,
as it is checked in scylla source.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16044
2023-11-14 11:21:33 +02:00
Tomasz Grabiec
dc6a0b2c35 gossiper: Elevate logging level for node restart events
They cause connection drops, which is a significant disruptive
event. We should log it so that we can know that this is the cause of
the problems it may cause, like requests timing out. Connection drop
will cause coordinator-side requests to time out in the absence of
speculation.

Refs #14746

Closes scylladb/scylladb#16018
2023-11-14 11:21:13 +02:00
Kefu Chai
58f3ced4d6 scylla-gdb: raise if no tasks are found
the "task" fixture is supposed to return a task for test, if it
fails to do so, it would be an issue not directly related to
the test. so let's fail it early.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16042
2023-11-14 11:12:43 +02:00
Botond Dénes
22381441b0 migration_manager: also reload schema on enabling digest_insensitive_to_expiry
Currently, when said feature is enabled, we recalculate the schema
digest. But this feature also influences how table versions are
calculated, so it has to trigger a recalculation of all table versions,
so that we can guarantee correct versions.
Before, this used to happen by happy accident. Another feature --
table_digest_insensitive_to_expiry -- used to take care of this, by
triggering a table version recalculation. However this feature only takes
effect if digest_insensitive_to_expiry is also enabled. This used to be
the case incidentally, by the time the reload triggered by
table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was
already enabled. But this was not guaranteed whatsoever and as we've
recently seen, any change to the feature list, which changes the order
in which features are enabled, can cause this intricate balance to
break.
This patch makes digest_insensitive_to_expiry also kick off a schema
reload, to eliminate our dependence on (unguaranteed) feature order, and
to guarantee that table schemas have a correct version after all features
are enabled. In fact, all schema feature notification handlers now kick
off a full schema reload, to ensure bugs like this don't creep in, in
the future.

Fixes: #16004

Closes scylladb/scylladb#16013
2023-11-13 23:32:20 +02:00
Aleksandra Martyniuk
a63a6dcd93 test: test running compaction task abort
Test whether a task which is aborted while running has a proper status.
2023-11-13 16:06:36 +01:00
Aleksandra Martyniuk
2a9ee59cc4 tasks: fail if a task was aborted
run() method of task_manager::task::impl does not have to throw when
a task is aborted with task manager api. Thus, a user will see that
the task finished successfully which makes it inconsistent.

Finish a task with a failure if it was aborted with task manager api.
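
A rough sketch of the intended outcome, with hypothetical names and a simplified (state, error) result in place of the real task status:

```python
def finish_task(run, was_aborted):
    """Return (state, error) for a task body and an abort flag."""
    try:
        run()
    except Exception as exc:
        return ("failed", str(exc))
    # run() returned normally, but an abort was requested meanwhile:
    # report a failure instead of a misleading success.
    if was_aborted():
        return ("failed", "task aborted")
    return ("done", "")

aborted_result = finish_task(lambda: None, lambda: True)
normal_result = finish_task(lambda: None, lambda: False)
```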
2023-11-13 16:06:20 +01:00
Aleksandra Martyniuk
599d6ebd52 compaction: abort task manager compaction tasks
Set top level compaction tasks as abortable.

Compaction tasks which have no children, i.e. compaction task
executors, have abort method overridden to stop compaction data.
2023-11-13 15:46:58 +01:00
Kamil Braun
d24b305712 Merge 'raft topology: join: do not time out waiting for the node to be joined' from Patryk Jędrzejczak
When a node tries to join the cluster, it asks the topology
coordinator to add it and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.

Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.

Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.

Additionally, after removing the timeout, we adjust the topology
coordinator. We make it try sending the response (both acceptance
and rejection) only once since we do not care if it fails anymore. We
only need to ensure that the joining node is moved to the left state
if sending fails.

Fixes #15865

Closes scylladb/scylladb#15944

* github.com:scylladb/scylladb:
  raft topology: fix indentation
  raft topology: join: try sending the response only once
  raft topology: join: do not time out waiting for the node to be joined
  group 0: group0_handshaker: add the abort_source parameter to post_server_start
2023-11-13 15:02:27 +01:00
Paweł Zakrzewski
a0dcc154c1 test: add the auth_cluster test suite
This commit adds the auth_cluster test suite to test a custom scenario
involving password authentication:
- create a cluster of 2 nodes with password authentication
- down one node
- the other node should refuse login stating that it couldn't reach
  QUORUM

References ScyllaDB OSS #2339
2023-11-13 14:04:28 +01:00
Paweł Zakrzewski
400aa2e932 auth: fix error message when consistency level is not met
Propagate `exceptions::unavailable_exception` error message to the
client such as cqlsh.

Fixes #2339
2023-11-13 14:04:23 +01:00
Takuya ASADA
85339d1820 scylla_setup: add warning for CentOS7 default kernel
Since CentOS7 default kernel is too old, has performance issues and also
has some bugs, we have been recommending the kernel-ml kernel.
Let's check kernel version in scylla_setup and print warning if the
kernel is CentOS7 default one.

related #7365

Closes scylladb/scylladb#15705
2023-11-13 13:47:06 +02:00
Botond Dénes
2b11a02b67 Merge 'Improvements to gossiper shadow round' from Kamil Braun
Remove `fall_back_to_syn_msg` which is not necessary in newer Scylla versions.
Fix the calculation of `nodes_down` which could count a single node multiple times.
Make shadow round mandatory during bootstrap and replace -- these operations are unsafe to do without checking features first, which are obtained during the shadow round (outside raft-topology mode).
Finally, during node restart, allow the shadow round to be skipped when getting `timeout_error`s from contact points, not only when getting `closed_error`s (during restart it's best-effort anyway, and in general it's impossible to distinguish between a dead node and a partitioned node).
More details in commit messages.
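
A sketch of the `nodes_down` idea, assuming the double counting came from several failed attempts against the same node: collecting failed nodes in a set counts each node once, and timeouts are treated like closed connections. The function name and data layout are hypothetical.

```python
def count_nodes_down(attempts):
    """attempts: iterable of (node, outcome) pairs, possibly repeated per node."""
    down = set()
    for node, outcome in attempts:
        if outcome in ("closed_error", "timeout_error"):
            down.add(node)  # a set counts each node only once
    return len(down)

attempts = [
    ("10.0.0.1", "closed_error"),
    ("10.0.0.1", "closed_error"),   # retry against the same node
    ("10.0.0.2", "timeout_error"),
    ("10.0.0.3", "ok"),
]
print(count_nodes_down(attempts))
```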

Ref: https://github.com/scylladb/scylladb/issues/15675

Closes scylladb/scylladb#15941

* github.com:scylladb/scylladb:
  gossiper: do_shadow_round: increment `nodes_down` in case of timeout
  gossiper: do_shadow_round: fix `nodes_down` calculation
  storage_service: make shadow round mandatory during bootstrap/replace
  gossiper: do_shadow_round: remove default value for nodes param
  gossiper: do_shadow_round: remove `fall_back_to_syn_msg`
2023-11-13 13:37:13 +02:00
Botond Dénes
dfd7981fa7 api/storage_service: start/stop native transport in the statement sg
Currently, it is started/stopped in the streaming/maintenance sg, which
is what the API itself runs in.
Starting the native transport in the streaming sg, will lead to severely
degraded performance, as the streaming sg has significantly less
CPU/disk shares and reader concurrency semaphore resources.
Furthermore, it will lead to multi-paged reads possibly switching
between scheduling groups mid-way, triggering an internal error.

To fix, use `with_scheduling_group()` for both starting and stopping
native transport. Technically, it is only strictly necessary for
starting, but I added it for stop as well for consistency.

Also apply the same treatment to RPC (Thrift). Although no one uses it,
best to fix it, just to be on the safe side.

I think we need a more systematic approach for solving this once and for
all, like passing the scheduling group to the protocol server and have
it switch to it internally. This allows the server to always run on the
correct scheduling group, not depending on the caller to remember using
it. However, I think this is best done in a follow-up, to keep this
critical patch small and easily backportable.

Fixes: #15485

Closes scylladb/scylladb#16019
2023-11-13 14:08:01 +03:00
Anna Stuchlik
8a4a8f077a doc: document full support for RBNO
This commit updates the Repair-Based Node
Operations page. In particular:
- Information about RBNO enabled for all
  node operations is added (before 5.4, RBNO
  was enabled for the replace operation, while
  it was experimental for others).
- The content is rewritten to remove redundant
  information about previous versions.

The improvement is part of the 5.4 release.
This commit must be backported to branch-5.4

Closes scylladb/scylladb#16015
2023-11-13 13:06:15 +02:00
Pavel Emelyanov
492b842929 messaging_service: Define metrics domain for client connections
Recent seastar update included RPC metrics (scylladb/seastar#1753). The
reported metrics group sockets together based on their "metrics_domain"
configuration option. This patch makes use of this domain to make scylla
metrics sane.

The domain as this patch defines it includes two strings:

First, the datacenter the server lives in. This is because grouping
metrics for connections to different datacenters makes little sense for
several reasons. For example -- packet delays _will_ differ for local-DC
vs cross-DC traffic and mixing those latencies together is pointless.
Another example -- the amount of traffic may also differ for local- vs
cross-DC connections e.g. because of different usage of encryption and/or
compression.

Second, each verb-idx gets its own domain. That's to be able to tell
e.g. query-related traffic from gossiper traffic. For that the existing
isolation cookie is taken as is.

Note that the metrics are _not_ per-server node. So e.g. two gossiper
connections to two different nodes (in one DC) will belong to the same
domain and thus their stats will be summed when reported.
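
To illustrate the grouping, here is a small sketch in which the domain is built from the datacenter and the isolation cookie, and per-connection counters are summed per domain. The string layout of the domain and all names are assumptions of the example.

```python
from collections import defaultdict

def metrics_domain(datacenter, isolation_cookie):
    # The separator and layout are assumptions of this sketch.
    return f"{datacenter}:{isolation_cookie}"

def sum_by_domain(connections):
    """connections: iterable of (node, datacenter, isolation_cookie, bytes_sent)."""
    totals = defaultdict(int)
    for _node, dc, cookie, sent in connections:
        # The node is deliberately not part of the key.
        totals[metrics_domain(dc, cookie)] += sent
    return dict(totals)

connections = [
    ("node1", "dc1", "gossip", 100),
    ("node2", "dc1", "gossip", 50),
    ("node3", "dc2", "gossip", 70),
    ("node1", "dc1", "statement", 900),
]
totals = sum_by_domain(connections)
print(totals)
```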

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15785
2023-11-13 11:13:20 +01:00
Pavel Emelyanov
f4696f21a8 test/utils: Drop compaction_manager_test
This class only provides a .run() method which allocates a task and
calls sstables::test_env::perform_compaction(). This can be done in a
helper method, no need for the whole class for it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
b68f9c32bb test/utils: Get compaction manager from test_env
This is just to reduce churn in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
9fd270566a test/sstables: Introduce test_env_compaction_manager::perform_compaction()
Take it from compaction_manager_test::run(), which is a simplified rewrite
of compaction_manager::perform_compaction().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
0160265c7d test/env: Add sstables::test_env& to compaction_manager_test::run()
Continuation of the previous patch that will also be used further.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
393c066f3e test/utils: Add sstables::test_env& to compact_sstables()
Will be used in next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
ca18db4a71 test/utils: Simplify and unify compaction_manager_test::run()
The method is a simplified rewrite of the compaction_manager's
perform_compaction() one, but it does task registration and
unregistration the hard way. Keep it shorter and simpler, resembling the
compaction_manager's prototype.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
9a9e1fdd7d test/utils: Squash two compact_sstables() helpers
Now the one sitting in utils is only called from its peer in compaction
test. Things get simpler if they get merged.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
69657a2a97 test/compaction: Use shorter compact_sstables() helper
There are several of them spread between the test and utils. One of the
test cases can use its local shorter overload for brevity. Also this
makes one of the next patches shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
59943267c2 test/utils: Keep test task compaction gate on task itself
They both have the same scope, but keeping it on the task frees the
caller from the need to mess with its private fields. For now it's not a
problem, but it will be critical in one of the next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
aec3fc493a test/utils: Move compaction_manager_test::propagate_replacement()
The purpose of this method is to turn public the private
compaction_manager method of the same name. The caller of this method is
having sstable_test_env at hand with its test_env_compaction_manager, so
the de-private-isation call can be moved.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Kefu Chai
efd65aebb2 build: cmake: add check-header target
to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.

we generate a rule for each .hh file to create a corresponding
.cc and then compile it, in order to verify the self-containedness of
that header. so the number of rules is quite large. to avoid the
unnecessary overhead, the check-header target is enabled only if
`Scylla_CHECK_HEADERS` option is enabled.
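
As a rough illustration of the idea (not the actual CMake rule), the per-header check amounts to generating a translation unit that includes only that header and compiling it. The helper names `stub_source_for` and `write_stubs`, and the output layout, are assumptions made for this sketch:

```python
from pathlib import Path


def stub_source_for(header: Path) -> str:
    """Return the text of a .cc file that includes only `header`.

    If this file compiles on its own, the header is self-contained.
    """
    return f'#include "{header.as_posix()}"\n'


def write_stubs(headers, out_dir: Path) -> list:
    """Write one stub .cc per header under out_dir and return the paths.

    The "<header>.cc" naming is hypothetical, chosen only for the sketch.
    """
    written = []
    for header in headers:
        target = out_dir / (header.as_posix() + ".cc")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(stub_source_for(header))
        written.append(target)
    return written
```

Because every header gets its own stub and compile step, the number of generated rules grows with the number of headers, which is why the target is opt-in.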

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15913
2023-11-13 10:27:06 +02:00
Avi Kivity
7b08886e8d Update tools/java submodule (dependencies update)
* tools/java 86a200e324...97c490947c (1):
  > Merge 'build: update several dependencies' from Piotr Grabowski

Ref https://github.com/scylladb/scylla-tools-java/issues/348
Ref https://github.com/scylladb/scylla-tools-java/issues/349
Ref https://github.com/scylladb/scylla-tools-java/issues/350
2023-11-12 18:17:04 +02:00
Nadav Har'El
7f34006ce2 test/cql-pytest: fix test_permissions.py to not fail on Cassandra
We shouldn't have cql-pytest tests that report failure when run on
Cassandra (with test/cql-pytest/run-cassandra): A test that passes
on Scylla but fails on Cassandra indicates a *difference* between
Scylla's behavior and Cassandra's, and this difference should always
be investigated:

 1. It can be a Scylla bug, which should be fixed immediately
    or reported as a bug and the test changed to fail on Scylla ("xfail").

 2. It can be a minor difference in Scylla's and Cassandra's
    behavior where both can be accepted. In this case the test should
    be modified to accept both behaviors, and a comment added to
    explain why we decided to do that.

 3. It can be a Cassandra bug which causes a correct test to fail.
    This case should not be taken lightly, and a serious effort
    is needed to be convinced that this is really a Cassandra bug
    and not our misunderstanding of what Cassandra does. In
    this case the test should be marked "cassandra_bug" and a
    detailed comment should explain why.

 4. Or it can be an outright bug in the test that caused it to fail
    on Cassandra.

This test had most of these cases :-) There was a test bug in one place
(in a Cassandra-specific Java UDF), a minor and (arguably) acceptable
difference between the error codes returned by Scylla and Cassandra
in one case, and two minor Cassandra bugs (in the error path). All
of these are fixed here, and after this patch test/cql-pytest/run-cassandra
no longer fails on this file.

Fixes #15969

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-12 17:14:09 +02:00
Nadav Har'El
0ecf84e83e test/cql-pytest: add test for DROP FUNCTION
We already have in test/cql-pytest various tests for UDF in the bigger
context of UDA (test_uda.py), WASM (test_wasm.py) and permissions, but
somehow we never had a file for simple tests only for UDF, so we
add one here, test/cql-pytest/test_udf.py

We add a test for checking something which was already assumed in
test_permissions.py - that it is possible to create two different
UDFs with the same name and different parameters, and then you must
specify the parameters when you want to DROP one of them. The test
confirms that ScyllaDB's and Cassandra's behavior is identical in
this, as hoped.

To allow the test to run on both ScyllaDB and Cassandra, it needs to
support both Lua (for ScyllaDB) or Java (for Cassandra), and we introduce
a fixture to make it easier to support both. This fixture can later
be used in more tests added to this file.
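
A minimal sketch of what such a helper could look like, assuming nothing about the real fixture in test_udf.py. The function names, the trivial UDF bodies and the single-parameter signature are all hypothetical; only the Lua/Java split and the need to name parameter types in DROP FUNCTION come from the description above:

```python
def udf_body(is_scylla: bool):
    """Return (language, body) for a trivial UDF.

    ScyllaDB runs Lua UDFs while Cassandra runs Java ones, so the same
    test needs a different body depending on the server under test.
    """
    if is_scylla:
        return "lua", "return 42"
    return "java", "return 42;"


def create_function(ks, name, arg_type, is_scylla):
    """Build a CREATE FUNCTION statement for one overload of `name`."""
    lang, body = udf_body(is_scylla)
    return (f"CREATE FUNCTION {ks}.{name}(i {arg_type}) "
            f"RETURNS NULL ON NULL INPUT RETURNS int "
            f"LANGUAGE {lang} AS '{body}'")


def drop_function(ks, name, arg_type=None):
    """Build a DROP FUNCTION statement.

    With two overloads of the same name, the parameter types are needed
    to say which one is being dropped.
    """
    signature = f"({arg_type})" if arg_type else ""
    return f"DROP FUNCTION {ks}.{name}{signature}"
```

With overloads `f(int)` and `f(text)` in place, `drop_function("ks", "f", "int")` names exactly one of them, whereas `drop_function("ks", "f")` is the ambiguous form.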

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-12 17:14:08 +02:00
Tomasz Grabiec
457d170078 Merge 'Multishard mutation query test fix misses expectations' from Botond Dénes
There are two tests, test_read_all and test_read_with_partition_row_limits, which assert on every page as well
as at the end that there are no misses whatsoever. This is incorrect, because it is possible that on a given page, not all shards participate and thus there won't be a saved reader on every shard. On the subsequent page, a shard without a reader may produce a miss. This is fine. Refine the asserts to check that there are only as many misses as there are
shards without readers on them.
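
The refined expectation can be pictured roughly as follows. This is a Python sketch of the bound only, not the C++ test code; the function and parameter names are made up for illustration, and treating the bound as "at most" is an assumption:

```python
def check_misses(misses: int, shards_with_saved_reader: int, shard_count: int) -> None:
    """Allow at most one cache miss per shard that had no saved reader."""
    shards_without_reader = shard_count - shards_with_saved_reader
    assert 0 <= misses <= shards_without_reader, (
        f"{misses} misses, but only {shards_without_reader} shards lacked a reader")
```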

Fixes: https://github.com/scylladb/scylladb/issues/14087

Closes scylladb/scylladb#15806

* github.com:scylladb/scylladb:
  test/boost/multishard_mutation_query_test: fix querier cache misses expectations
  test/lib/test_utils: add require_* variants for all comparators
2023-11-12 13:15:29 +01:00
Benny Halevy
68a7bbe582 compaction_manager: perform_cleanup: ignore condition_variable_timed_out
The polling loop was intended to ignore
`condition_variable_timed_out` and check for progress
using a longer `max_idle_duration` timeout in the loop.
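
The intended shape of such a loop, sketched in Python rather than Seastar C++. The function and its parameters are hypothetical; only the idea of ignoring a timed-out wait and giving up after a longer `max_idle_duration` comes from the message above:

```python
import time


def wait_for_progress(cond, get_progress, is_done, poll_interval, max_idle_duration):
    """Wait until is_done() is true, tolerating individual wait timeouts.

    A timed-out wait is not an error by itself; the loop only gives up
    when no progress has been observed for max_idle_duration seconds.
    """
    last_progress = get_progress()
    last_change = time.monotonic()
    with cond:
        while not is_done():
            # wait() returning False means the poll interval elapsed
            # without a notification; that is ignored on purpose.
            cond.wait(timeout=poll_interval)
            current = get_progress()
            now = time.monotonic()
            if current != last_progress:
                last_progress, last_change = current, now
            elif now - last_change > max_idle_duration:
                raise TimeoutError("no progress within max_idle_duration")
```

Here `cond` is expected to be a `threading.Condition`; the short poll timeout only bounds how often progress is sampled.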

Fixes #15669

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#15671
2023-11-12 13:53:51 +02:00
Patryk Jędrzejczak
2d7bfeb3fa raft topology: fix indentation
Broken in the previous commit.
2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak
e94c7cff28 raft topology: join: try sending the response only once
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response.
In the previous commit, we have made the operator responsible for
shutting down the joining node if the topology coordinator fails
to deliver a response by removing the timeout. In this commit, we
adjust the topology coordinator. We make it try sending the
response (both acceptance and rejection) only once since we do not
care if it fails anymore. We only need to ensure that the joining
node is moved to the left state if sending fails.
2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak
4ffa692cb3 raft topology: join: do not time out waiting for the node to be joined
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.

Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.

Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.

This change additionally fixes the TODO in
raft_group0::join_group0.
2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak
5f36e1d7f2 group 0: group0_handshaker: add the abort_source parameter to post_server_start
Used in the following commit to enable the clean shutdown of a
node that does not receive the join rejection from the topology
coordinator.
2023-11-10 12:35:38 +01:00
Anna Stuchlik
8d618bbfc6 doc: update cqlsh compatibility with Python
This commit updates the cqlsh compatibility
with Python to Python 3.

In addition it:
- Replaces "Cassandra" with "ScyllaDB" in
  the description of cqlsh.
  The previous description was outdated, as
  we no longer can talk about using cqlsh
  released with Cassandra.
- Replaces occurrences of "Scylla" with "ScyllaDB".
- Adds additional locations of cqlsh (Docker Hub
  and PyPI), as well as the link to the scylla-cqlsh
  repository.

Closes scylladb/scylladb#16016
2023-11-10 09:19:41 +02:00
Avi Kivity
d8bf8f0f43 Merge 'Do not create directories in datadir for S3-backed sstables' from Pavel Emelyanov
After 146e49d0dd (Rewrap keyspace population loop) the datadir layout is no longer needed by sstables boot-time loader and finally directories can be omitted for S3-backed keyspaces. Tables of that keyspace don't touch/remove their datadirs either (snapshots still don't work for S3)

fixes: #13020

Closes scylladb/scylladb#16007

* github.com:scylladb/scylladb:
  test/object_store: Check that keyspace directory doesn't appear
  sstables/storage: Do storage init/destroy based on storage options
  replica/{ks|cf}: Move storage init/destroy to sstables manager
  database: Add get_sstables_manager(bool_class is_system) method
2023-11-09 20:35:13 +02:00
Kamil Braun
3bcee6a981 Revert "Merge 'Change all tests to shut down gracefully on shutdown' from Eliran Sinvani"
This reverts commit 7c7baf71d5.

If `stop_gracefully` times out during test teardown phase, it crashes
the test framework reporting multiple errors, for example:
```
12:35:52  /jenkins/workspace/scylla-master/next/scylla/test/pylib/artifact_registry.py:41: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
12:35:52    self.exit_artifacts = {}
12:35:52  RuntimeWarning: Enable tracemalloc to get the object allocation traceback
12:35:52  Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:52  Traceback (most recent call last):
12:35:52    File "/usr/lib64/python3.11/asyncio/tasks.py", line 500, in wait_for
12:35:52      return fut.result()
12:35:52             ^^^^^^^^^^^^
12:35:52    File "/usr/lib64/python3.11/asyncio/subprocess.py", line 137, in wait
12:35:52      return await self._transport._wait()
12:35:52             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12:35:52    File "/usr/lib64/python3.11/asyncio/base_subprocess.py", line 230, in _wait
12:35:52      return await waiter
12:35:52             ^^^^^^^^^^^^
12:35:52  asyncio.exceptions.CancelledError
12:35:52
12:35:52  The above exception was the direct cause of the following exception:
12:35:52
12:35:52  Traceback (most recent call last):
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 521, in stop_gracefully
12:35:52      await asyncio.wait_for(wait_task, timeout=STOP_TIMEOUT_SECONDS)
12:35:52    File "/usr/lib64/python3.11/asyncio/tasks.py", line 502, in wait_for
12:35:52      raise exceptions.TimeoutError() from exc
12:35:52  TimeoutError
12:35:52
12:35:52  During handling of the above exception, another exception occurred:
12:35:52
12:35:52  Traceback (most recent call last):
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1615, in workaround_python26789
12:35:52      code = await main()
12:35:52             ^^^^^^^^^^^^
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1582, in main
12:35:52      await run_all_tests(signaled, options)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1359, in run_all_tests
12:35:52      await reap(done, pending, signaled)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1342, in reap
12:35:52      result = coro.result()
12:35:52               ^^^^^^^^^^^^^
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 201, in run
12:35:52      await test.run(options)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 957, in run
12:35:52      async with get_cluster_manager(self.mode + '/' + self.uname, self.suite.clusters, test_path) as manager:
12:35:52    File "/usr/lib64/python3.11/contextlib.py", line 211, in __aexit__
12:35:52      await anext(self.gen)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1330, in get_cluster_manager
12:35:52      await manager.stop()
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1024, in stop
12:35:52      await self.clusters.put(self.cluster, is_dirty=True)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/pool.py", line 104, in put
12:35:52      await self.destroy(obj)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 368, in recycle_cluster
12:35:52      await cluster.stop_gracefully()
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 689, in stop_gracefully
12:35:52      await asyncio.gather(*(server.stop_gracefully() for server in self.running.values()))
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 527, in stop_gracefully
12:35:52      raise RuntimeError(
12:35:52  RuntimeError: Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:58  sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.uninstall' was never awaited
12:35:58  sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
```
2023-11-09 12:30:35 +01:00
Gleb Natapov
2dd8152c8b storage_service: topology coordinator: log rollback event before changing node's state
The test for the rollback relies on the log being there after the operation
fails, but if the node's state is changed before logging, the operation may
fail before the log is printed.

Fixes scylladb/scylladb#15980

Message-ID: <ZUuwoq65SJcS+yTH@scylladb.com>
2023-11-09 12:11:58 +01:00
Botond Dénes
d8b6771eb8 Merge 'doc: add CQL Reference for Materialized Views and remove irrelevant version information' from Anna Stuchlik
This PR is a follow-up to https://github.com/scylladb/scylladb/pull/15742#issuecomment-1766888218.
It adds CQL Reference for Materialized Views to the Materialized Views page.

In addition, it removes the irrelevant information about when the feature was added and replaces "Scylla" with "ScyllaDB".

(nobackport)

Closes scylladb/scylladb#15855

* github.com:scylladb/scylladb:
  doc: remove versions from Materialized Views
  doc: add CQL Reference for Materialized Views
2023-11-09 10:43:11 +01:00
Botond Dénes
1cccc86813 Revert "Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk"
This reverts commit 2860d43309, reversing
changes made to a3621dbd3e.

Reverting because rest_api.test_compaction_task started failing after
this was merged.

Fixes: #16005
2023-11-09 10:43:11 +01:00
Eliran Sinvani
c5956957f3 use_statement: Covert an exception to a future exception
The use statement execution code can throw if the keyspace
doesn't exist. This can be a problem for code that will use
execute in a fiber, since the exception will break the fiber even
if `then_wrapped` is used.
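
A loose Python analogy of the difference, using `concurrent.futures.Future` in place of a Seastar future. The names `futurize` and `use_keyspace` are invented for this sketch and do not correspond to the C++ code:

```python
from concurrent.futures import Future


def futurize(func, *args) -> Future:
    """Call func and always return a future, even if func raises.

    A synchronous exception is captured and stored in the returned
    future instead of propagating to the caller.
    """
    fut = Future()
    try:
        fut.set_result(func(*args))
    except Exception as exc:
        fut.set_exception(exc)
    return fut


def use_keyspace(known_keyspaces, name):
    """Hypothetical stand-in for executing a USE statement."""
    if name not in known_keyspaces:
        raise KeyError(f"keyspace {name} does not exist")
    return name
```

Calling `use_keyspace` directly raises before any future exists, so a continuation attached later never sees the error; going through `futurize` delivers the same error via the future.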

Fixes #14449

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes scylladb/scylladb#14394
2023-11-09 10:43:11 +01:00
Pavel Emelyanov
7e1017c7d8 test/object_store: Check that keyspace directory doesn't appear
When creating an S3-backed keyspace, its storage dir shouldn't be made.
Also it shouldn't be "resurrected" by boot-time loader of existing
keyspaces.

For extra confidence, check that the system keyspace's directory does
exist where the test expects keyspaces' directories to appear.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Pavel Emelyanov
f6eae191ff sstables/storage: Do storage init/destroy based on storage options
Only the local storage type needs directories touched/removed. S3
storage initialization is a no-op for now; maybe some day soon it will
appear.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Pavel Emelyanov
11b704e8b8 replica/{ks|cf}: Move storage init/destroy to sstables manager
It's the manager that knows about storages and it should init/destroy
it. Also the "upload" and "staging" paths are about to be hidden in
sstables/ code, this code move also facilitates that.

The indentation in storage.cc is deliberately broken to make next patch
look nicer (spoiler: it won't have to shift those lines right).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Pavel Emelyanov
68cf26587c database: Add get_sstables_manager(bool_class is_system) method
There's one place that does this selection, soon there will appear
another, so it's worth having a convenience helper getter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Michał Chojnowski
206e313c60 mutation_query_test: test that range tombstones are sent in reverse queries
Reproducer for #10598.
2023-11-08 14:54:48 +01:00
Michał Chojnowski
002357e238 mutation_query: properly send range tombstones in reverse queries
reconcilable_result_builder passes range tombstone changes to _rt_assembler
using table schema, not query schema.
This means that a tombstone with bounds (a; b), where a < b in query schema
but a > b in table schema, is not emitted from mutation_query.

This is a very serious bug, because it means that such tombstones in reverse
queries are not reconciled with data from other replicas.
If *any* queried replica has a row, but not the range tombstone which deleted
the row, the reconciled result will contain the deleted row.

In particular, range deletes performed while a replica is down, will not
later be visible to reverse queries which select this replica, regardless of the
consistency level.

As far as I can see, this doesn't result in any persistent data loss.
Only in that some data might appear resurrected to reverse queries,
until the relevant range tombstone is fully repaired.
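
A toy model of why the choice of schema matters, assuming plain integers as clustering positions. This is not the `_rt_assembler` logic, only an illustration of how the same bounds look valid under one ordering and degenerate under the other; `is_empty_range` is a hypothetical helper:

```python
def is_empty_range(start, end, reversed_order: bool) -> bool:
    """Return True if (start; end) contains nothing under the given order.

    With reversed_order=True, keys are compared in descending order, which
    is how a reverse query sees clustering positions.
    """
    if reversed_order:
        return not (start > end)
    return not (start < end)
```

In this model a tombstone with bounds (5; 2) is a proper range for a reverse query (`is_empty_range(5, 2, True)` is `False`), but looks empty when checked with the forward table order (`is_empty_range(5, 2, False)` is `True`).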
2023-11-08 14:54:48 +01:00
Nadav Har'El
6453f41ca9 Merge 'schema: add whitespaces to values of table options' from Michał Jadwiszczak
Add a space after each colon and comma (if they don't have any after) in values of table option which are json objects (`caching`, `tombstone_gc` and `cdc`).
This improves readability and matches client-side describe format.
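
For illustration only, the difference between the two formats can be reproduced with Python's `json` module. The option values below are sample data, and real CQL output quotes with single quotes rather than JSON double quotes:

```python
import json

caching = {"keys": "ALL", "rows_per_partition": "ALL"}

# Compact form: no whitespace after ':' or ','.
compact = json.dumps(caching, separators=(",", ":"))

# Spaced form: one space after each ':' and ','.
spaced = json.dumps(caching, separators=(", ", ": "))
```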

Fixes: #14895

Closes scylladb/scylladb#15900

* github.com:scylladb/scylladb:
  cql-pytest:test_describe: add test for whitespaces in json objects
  schema: add whitespace to description of  table options
2023-11-08 15:26:49 +02:00
Anna Stuchlik
ca0f5f39b5 doc: fix info about in 5.4 upgrade guide
This commit fixes the information about
Raft-based consistent cluster management
in the 5.2-to-5.4 upgrade guide.

This is a follow-up to https://github.com/scylladb/scylladb/pull/15880 and must be backported to branch-5.4.

In addition, it adds information about removing
DateTieredCompactionStrategy to the 5.2-to-5.4
upgrade guide, including the guideline to
migrate to TimeWindowCompactionStrategy.

Closes scylladb/scylladb#15988
2023-11-08 13:21:53 +01:00
Kamil Braun
3036a80334 docs: mention Raft getting enabled when upgrading to 5.4
Fixes: scylladb/scylladb#15952

Closes scylladb/scylladb#16000
2023-11-08 14:18:29 +02:00
Raphael S. Carvalho
b551f4abd2 streaming: Improve partition estimation with TWCS
When off-strategy is disabled, data segregation is not postponed,
meaning that getting the partition estimate right is important to
decrease the filter's false positives. With streaming, we don't
have min and max timestamps at the destination. We could have
extended the RPC verb to send them, but it turns out we can easily
deduce the number of windows using default TTL. Given the partitioner's
random nature, it's not absurd to assume that a given range being
streamed may overlap with all windows, meaning that each range
will yield one sstable for each window when segregating incoming
data. Today, we assume the worst case of 100 windows (which is the
max number of sstables the input data can be segregated into)
due to the lack of metadata for estimating the window count.
But given that users are recommended to target a max of ~20
windows, it means the partition estimate is being downsized 5x more
than needed. Let's improve it by using default TTL when
estimating window count, so even in the absence of timestamp
metadata, the partition estimation won't be way off.
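
The arithmetic behind the "5x" figure can be sketched as below. The function names and the exact rounding are assumptions for illustration; only the cap of 100 windows and the use of default TTL come from the message:

```python
import math

MAX_WINDOWS = 100  # worst case mentioned above


def estimated_window_count(default_ttl_s: int, window_size_s: int) -> int:
    """Estimate how many time windows incoming data may be split into."""
    if default_ttl_s <= 0 or window_size_s <= 0:
        # No TTL to go by: fall back to the worst case.
        return MAX_WINDOWS
    return min(MAX_WINDOWS, max(1, math.ceil(default_ttl_s / window_size_s)))


def partitions_per_sstable(total_partitions: int, windows: int) -> int:
    """Spread the partition estimate across one sstable per window."""
    return max(1, math.ceil(total_partitions / windows))
```

With a 20-day default TTL and 1-day windows the estimate is 20 windows, so one million partitions map to 50000 per sstable instead of the 10000 obtained with the 100-window worst case.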

Fixes #15704.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-11-08 12:10:03 +02:00
Kamil Braun
f094e23d84 system_keyspace: use system memory for system.raft table
`system.raft` was using the "user memory pool", i.e. the
`dirty_memory_manager` for this table was set to
`database::_dirty_memory_manager` (instead of
`database::_system_dirty_memory_manager`).

This meant that if a write workload caused memory pressure on the user
memory pool, internal `system.raft` writes would have to wait for
memtables of user tables to get flushed before the write would proceed.

This was observed in SCT longevity tests which ran a heavy workload on
the cluster and concurrently, schema changes (which underneath use the
`system.raft` table). Raft would often get stuck waiting many seconds
for user memtables to get flushed. More details in issue #15622.
Experiments showed that moving Raft to system memory fixed this
particular issue, bringing the waits to reasonable levels.

Currently `system.raft` stores only one group, group 0, which is
internally used for cluster metadata operations (schema and topology
changes) -- so it makes sense to use system memory.

In the future we'd like to have other groups, for strongly consistent
tables. These groups should use the user memory pool. It means we won't
be able to use `system.raft` for them -- we'll just have to use a
separate table.

Fixes: scylladb/scylladb#15622

Closes scylladb/scylladb#15972
2023-11-08 11:21:14 +02:00
Nadav Har'El
284534f489 Merge 'Nodetool additional commands 4/N' from Botond Dénes
This PR implements the following new nodetool commands:
* snapshot
* drain
* flush
* disableautocompaction
* enableautocompaction

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#15939

* github.com:scylladb/scylladb:
  test/nodetool: add README.md
  tools/scylla-nodetool: implement enableautocompaction command
  tools/scylla-nodetool: implement disableautocompaction command
  tools/scylla-nodetool: implement the flush command
  tools/scylla-nodetool: extract keyspace/table parsing
  tools/scylla-nodetool: implement the drain command
  tools/scylla-nodetool: implement the snapshot command
  test/nodetool: add support for matching aproximate query parameters
  utils/http: make dns_connection_factory::initialize() static
2023-11-08 11:18:35 +02:00
Kefu Chai
cf70970226 build: cmake: use $<CONFIG:cfgs> when appropriate
since CMake 3.19, we are able to use $<CONFIG:cfgs> instead of
the more cumbersome $<IN_LIST:$<CONFIG>,foo;bar> expression for
checking if a config is in a list of configurations.
and since the minimal required CMake version of scylla is 3.27, let's
use $<CONFIG:cfgs> when possible.

see also https://cmake.org/cmake/help/git-stage/manual/cmake-generator-expressions.7.html#configuration-expressions

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15989
2023-11-08 08:50:44 +02:00
Nadav Har'El
3729ea8bfd cql-pytest: translate Cassandra's test for CREATE operations
This is a translation of Cassandra's CQL unit test source file
validation/operations/CreateTest.java into our cql-pytest framework.

The 15 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for several known issues:

Refs #6442: Always print all schema parameters (including default values)
Refs #8001: Documented unit "µs" not supported for assigning a duration
            type.
Refs #8892: Add an option for default RF for new keyspaces.
Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
            for compression settings by default

Unfortunately, I also had to comment out - and not translate - several
tests which weren't real "CQL tests" (tests that use only the CQL driver),
and instead relied on Cassandra's Java implementation details:

1. Tests for CREATE TRIGGER were commented out because testing them
   in Cassandra requires adding a Java class for the test. We're also
   not likely to ever add this feature to Scylla (Refs #2205).

2. Similarly, tests for CEP-11 (Pluggable memtable implementations)
   used internal Java APIs instead of CQL, and it is also unlikely
   we'll ever implement it in a way compatible with Cassandra because
   of its Java reliance.

3. One test for data center names used internal Cassandra Java APIs, not
   CQL to create mock data centers and snitches.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#15791
2023-11-08 08:46:27 +02:00
Botond Dénes
2860d43309 Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.

Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.

Closes scylladb/scylladb#15083

* github.com:scylladb/scylladb:
  test: test abort of compaction task that isn't started yet
  test: test running compaction task abort
  tasks: fail if a task was aborted
  compaction: abort task manager compaction tasks
2023-11-08 08:45:16 +02:00
Asias He
194507dffa repair: Convert put_row_diff_with_rpc_stream to use coroutine
It will be easier to add more logic to this function.
2023-11-08 13:52:34 +08:00
Nadav Har'El
a3621dbd3e Merge 'Alternator: Support new ReturnValuesOnConditionCheckFailure feature' from Marcin Maliszkiewicz
alternator: add support for ReturnValuesOnConditionCheckFailure feature

As announced in https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-dynamodb-cost-failed-conditional-writes/, DynamoDB added a new option for write operations (PutItem, UpdateItem, or DeleteItem), ReturnValuesOnConditionCheckFailure, which if set to ALL_OLD returns the current value of the item - but only if a condition check failed.
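
A sketch of what a request using the new option might look like, built as a plain JSON body. The table name, item and condition are placeholders invented for the example; `ReturnValuesOnConditionCheckFailure` and `ALL_OLD` are the names given above:

```python
import json


def put_item_request(table, item, condition):
    """Build a PutItem request body that asks for the existing item
    to be returned if the condition check fails."""
    return {
        "TableName": table,
        "Item": item,
        "ConditionExpression": condition,
        "ReturnValuesOnConditionCheckFailure": "ALL_OLD",
    }


# Hypothetical table and item, used only to show the shape of the body.
body = json.dumps(put_item_request(
    "tbl", {"p": {"S": "key"}, "v": {"N": "1"}}, "attribute_not_exists(p)"))
```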

Fixes https://github.com/scylladb/scylladb/issues/14481

Closes scylladb/scylladb#15125

* github.com:scylladb/scylladb:
  alternator: add support for ReturnValuesOnConditionCheckFailure feature
  alternator: add ability to send additional fields in api_error
2023-11-07 23:19:51 +02:00
Takuya ASADA
a4aeef2eb0 scylla_util.py: run apt-get update before apt-get install if it necessary
Unlike yum, "apt-get install" may fail because the package cache is outdated.
Let's check the package cache mtime and run "apt-get update" if it's too old.
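
The check could look roughly like this. It is a sketch, not the code in scylla_util.py: the helper name, which cache file to inspect (for example `/var/cache/apt/pkgcache.bin`) and the age threshold are all assumptions:

```python
import os
import time


def apt_cache_is_stale(cache_path, max_age_seconds, now=None):
    """Return True if the apt package cache is missing or older than
    max_age_seconds, in which case "apt-get update" should run first."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(cache_path).st_mtime
    except FileNotFoundError:
        # No cache at all counts as stale.
        return True
    return now - mtime > max_age_seconds
```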

Fixes #4059

Closes scylladb/scylladb#15960
2023-11-07 20:40:16 +02:00
Wojciech Mitros
ab743271f1 test: increase timeout for lua UDF execution
When running on a particularly slow setup, for example on
an ARM machine in debug mode, the execution time of even
a small Lua UDF that we're using in tests may exceed our
default limits.
To avoid timeout errors, the limit in tests is now increased
to a value that won't be exceeded in any reasonable scenario
(for the current set of tested UDFs), while not making the
test take an excessive amount of time in case of an error in
the UDF execution.

Fixes #15977

Closes scylladb/scylladb#15983
2023-11-07 20:28:28 +02:00
Kamil Braun
07e9522d6c Merge 'raft topology: handle abort exceptions better in fence_previous_coordinator' from Piotr Dulikowski
When topology coordinator tries to fence the previous coordinator it
performs a group0 operation. The current topology coordinator might be
aborted in the meantime, which will result in a `raft::request_aborted`
exception being thrown. After the fix to scylladb/scylladb#15728 was
merged, the exception is caught, but then `sleep_abortable` is called
which immediately throws `abort_requested_exception` as it uses the same
abort source as the group0 operation. The `fence_previous_coordinator`
function which does all those things is not supposed to throw
exceptions, if it does - it causes `raft_state_monitor_fiber` to exit,
completely disabling the topology coordinator functionality on that
node.

Modify the code in the following way:

- Catch `abort_requested_exception` thrown from `sleep_abortable` and
  exit the function if it happens. In addition to the described issue,
  it will also handle the case when abort is requested while
  `sleep_abortable` happens.
- Catch `raft::request_aborted` thrown from group0 operation, log the
  exception with lower verbosity and exit the function explicitly.

Finally, wrap both `fence_previous_coordinator` and `run` functions in a
`try` block with `on_fatal_internal_error` in the catch handler in order
to implement the behavior that adding `noexcept` was originally supposed
to introduce.

Fixes: scylladb/scylladb#15747

Closes scylladb/scylladb#15948

* github.com:scylladb/scylladb:
  raft topology: catch and abort on exceptions from topology_coordinator::run
  Revert "storage_service: raft topology: mark topology_coordinator::run function as noexcept"
  raft topology: don't print an error when fencing previous coordinator is aborted
  raft topology: handle abort exceptions from sleeping in fence_previous_coordinator
2023-11-07 17:17:49 +01:00
Botond Dénes
60ea940f9e Merge 'docs: render options with role' from Kefu Chai
this series tries to

1. render options with role. so the options can be cross referenced and defined.
2. move the formatting out of the content. so the representation can be defined in a more flexible way.

Closes scylladb/scylladb#15860

* github.com:scylladb/scylladb:
  docs: add divider using CSS
  docs: extract _clean_description as a filter
  docs: render option with role
  docs: parse source files right into rst
2023-11-07 17:01:22 +02:00
Botond Dénes
3088453a09 test/nodetool: add README.md 2023-11-07 09:49:56 -05:00
Botond Dénes
7ff7cdc86a tools/scylla-nodetool: implement enableautocompaction command 2023-11-07 09:49:56 -05:00
Botond Dénes
0e0401a5c5 tools/scylla-nodetool: implement disableautocompaction command 2023-11-07 09:49:56 -05:00
Botond Dénes
f5083f66f5 tools/scylla-nodetool: implement the flush command 2023-11-07 09:49:56 -05:00
Botond Dénes
f082cc8273 tools/scylla-nodetool: extract keyspace/table parsing
Having to extract 1 keyspace and N tables from the command-line is
proving to be a common pattern among commands. Extract this into a
method, so the boiler-plate can be shared. Add a forward-looking
overload as well, which will be used in the next patch.
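
The shared pattern, shown as a small Python sketch rather than the C++ helper itself. The function name and the error raised on missing input are hypothetical:

```python
def parse_keyspace_and_tables(args):
    """Split positional arguments into one keyspace and N table names."""
    if not args:
        raise ValueError("a keyspace name is required")
    keyspace, *tables = args
    return keyspace, tables
```

For example, `parse_keyspace_and_tables(["ks", "t1", "t2"])` yields the keyspace `"ks"` and the tables `["t1", "t2"]`, while a lone keyspace yields an empty table list.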
2023-11-07 09:49:56 -05:00
Botond Dénes
ec5b24550a tools/scylla-nodetool: implement the drain command 2023-11-07 09:49:56 -05:00
Botond Dénes
598dbd100d tools/scylla-nodetool: implement the snapshot command 2023-11-07 09:49:56 -05:00
Benny Halevy
6a628dd9a6 docs: operating-scylla: nodetool: improve documentation for {en,dis}ableautocompaction
Fixes scylladb/scylladb#15554

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#15950
2023-11-07 14:05:55 +02:00
Kamil Braun
e64613154f Merge 'cleanup no longer used gossiper states' from Gleb
Remove no longer used gossiper states that are not needed even for
compatibility any longer.

* 'remove_unused_states' of github.com:scylladb/scylla-dev:
  gossip: remove unused HIBERNATE gossiper status
  gossip: remove unused STATUS_MOVING state
2023-11-07 11:48:04 +01:00
Botond Dénes
07c7109eb6 test/nodetool: add support for matching aproximate query parameters
Match parameters within some delta of the expected value. Useful when
nodetool generates a timestamp, whose exact value cannot be predicted in
an exact manner.
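
A minimal sketch of delta-based matching, assuming query parameters arrive as a dict of strings. The function names and the `approximate`/`delta` arguments are invented for the example and are not the test framework's actual interface:

```python
def approx_equal(expected: int, actual: int, delta: int) -> bool:
    """True if actual is within delta of expected."""
    return abs(expected - actual) <= delta


def params_match(expected, actual, approximate=(), delta=0):
    """Compare query parameters; keys listed in `approximate` only need
    to be within delta of the expected value, all others must be equal."""
    if expected.keys() != actual.keys():
        return False
    for key, want in expected.items():
        got = actual[key]
        if key in approximate:
            if not approx_equal(int(want), int(got), delta):
                return False
        elif want != got:
            return False
    return True
```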
2023-11-07 04:58:41 -05:00
Botond Dénes
b61822900b utils/http: make dns_connection_factory::initialize() static
Said method can out-live the factory instance. This was not a problem
because the method takes care to keep everything it needs from `this` alive, by
copying it to the coroutine stack. However, the fact that this method
can out-live the instance is not obvious, and an unsuspecting developer
(me) added a new member (_logger) which was not kept alive.
This can cause a use-after-free in the factory. Fix by making
initialize() static, forcing the instance to pass all parameters
explicitly, and add a comment explaining that this method can out-live
the instance.
2023-11-07 04:39:33 -05:00
Pavel Emelyanov
9443253f3d Merge 'api: failure_detector: invoke on shard 0' from Kamil Braun
These APIs may return stale or simply incorrect data on shards
other than 0. Newer versions of Scylla are better at maintaining
cross-shard consistency, but we need a simple fix that can be backported
to older versions easily and without risk; this is the fix.

Add a simple test to check that the `failure_detector/endpoints`
API returns nonzero generation.

Fixes: scylladb/scylladb#15816

Closes scylladb/scylladb#15970

* github.com:scylladb/scylladb:
  test: rest_api: test that generation is nonzero in `failure_detector/endpoints`
  api: failure_detector: fix indentation
  api: failure_detector: invoke on shard 0
2023-11-07 11:54:27 +03:00
Botond Dénes
76ab66ca1f Merge 'Support state change for S3-backed sstables' from Pavel Emelyanov
An sstable can currently move between the normal, staging and quarantine states at runtime. For S3-backed sstables, a state change means maintaining the state itself in the ownership table and updating it accordingly.

There's also the upload facility that's implemented as state change too, but this PR doesn't support this part.

fixes: #13017

Closes scylladb/scylladb#15829

* github.com:scylladb/scylladb:
  test: Make test_sstables_excluding_staging_correctness run over s3 too
  sstables,s3: Support state change (without generation change)
  system_keyspace: Add state field to system.sstables
  sstable_directory: Tune up sstables entries processing comment
  system_keyspace: Tune up status change trace message
  sstables: Add state string to state enum class convert
2023-11-07 10:45:41 +02:00
Botond Dénes
74f68a472f Merge 'doc: add the upgrade guide from 5.2 to 5.4' from Anna Stuchlik
This PR adds the 5.2-5.4 upgrade guide.
In addition, it removes the redundant upgrade guide from 5.2 to 5.3 (as 5.3 was skipped), as well as some mentions of version 5.3.

This PR must be backported to branch-5.4.

Closes scylladb/scylladb#15880

* github.com:scylladb/scylladb:
  doc: add the upgrade guide from 5.2 to 5.4
  doc: remove version "5.3" from the docs
  doc: remove the 5.2-to-5.3 upgrade guide
2023-11-07 10:35:33 +02:00
David Garcia
afaeb30930 docs: add dynamic version on aws images extension
Closes scylladb/scylladb#15940
2023-11-07 10:30:23 +02:00
Takuya ASADA
2e7552a0ca dist/redhat: drop rpm conflict with ABRT, add systemd conflict instead
Currently, "yum install scylla" causes conflict when ABRT is installed.

To avoid this behavior and keep using systemd-coredump for scylla
coredump, let's drop "Conflicts: abrt" from rpm and
add "Conflicts=abrt-ccpp.service" to systemd unit.

Fixes #892

Closes scylladb/scylladb#15691
2023-11-07 10:30:23 +02:00
Botond Dénes
2f0284f30d Merge 'build: cmake: configure all available config types' from Kefu Chai
in this series, instead of assuming that we always have only one single `CMAKE_BUILD_TYPE`, we configure all available configurations, to be better prepared for the multi-config support.

Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15933

* github.com:scylladb/scylladb:
  build: cmake: set compile options with generator expression
  build: cmake: configure all available config types
  build: cmake: set per-mode stack usage threshold
  build: cmake: drop build_mode
  build: cmake: check for config type if multi-config is used
2023-11-07 09:45:57 +02:00
Botond Dénes
7679152209 Merge 'Sanitize usage of make_sstable_easy+make_memtable in tests' from Pavel Emelyanov
The helper makes an sstable, writes mutations into it and loads it. Internally it uses the make_memtable() helper that prepares a memtable out of a vector of mutations. Many test cases don't use these facilities, which leads to some code duplication.

The make_sstable() wrapper around make_sstable_easy() is removed along the way.

Closes scylladb/scylladb#15930

* github.com:scylladb/scylladb:
  tests: Use make_sstable_easy() where appropriate
  sstable_conforms_to_mutation_source_test: Open-code the make_sstable() helper
  sstable_mutation_test: Use make_sstable_easy() instead of make_sstable()
  tests: Make use of make_memtable() helper
  tests: Drop as_mutation_source helper
  test/sstable_utils: Hide assertion-related manipulations into branch
2023-11-07 09:29:30 +02:00
Kefu Chai
882e7eca25 build: cmake: set compile options with generator expression
instead of using a single compile option for all modes, use per-mode
compile options. this change keeps us away from using `CMAKE_BUILD_TYPE`
directly, and prepares us for the multi-config generator support.

because we only apply these settings in the configurations where
sanitizers are used, there is no need to check if these option can be
accepted by the compiler. if this turns out to be a problem, we can
always add the check back on a per-mode basis.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-07 10:35:20 +08:00
Kefu Chai
61a542ffd0 build: cmake: configure all available config types
if `CMAKE_CONFIGURATION_TYPES` is set, it implies that the
multi-config generator is used, in this case, we include all
available build types instead of only the one specified by
`CMAKE_BUILD_TYPE`, which is typically used by non-multi-config
generators.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-07 10:14:33 +08:00
Kefu Chai
6fcff51cf1 build: cmake: set per-mode stack usage threshold
instead of setting a single stack usage threshold, set per-mode
stack usage threshold. this prepares for the support of
multi-config generator.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-07 10:13:50 +08:00
Kefu Chai
23bb644314 build: cmake: drop build_mode
there is no benefit having this variable. and it introduces
another layer of indirection. so drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-07 10:10:59 +08:00
Kefu Chai
7369e2e3df build: cmake: check for config type if multi-config is used
we should not set_property() on a non-existent property. if a multi-config
generator is used, `CMAKE_BUILD_TYPE` is not added as a cached entry at all.

Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-07 10:10:59 +08:00
Paweł Zakrzewski
9e240c2dc8 test/cql-pytest: Verify that GRANT ALTER ALL allows changing the superuser password
This is a test for #14277. We do want to match Cassandra's behavior,
which means that a user who is granted ALTER ALL is able to change
the password of a superuser.

Closes scylladb/scylladb#15961
2023-11-06 18:39:53 +01:00
Takuya ASADA
a23278308f dist: fix local-fs.target dependency
systemd man page says:

systemd-fstab-generator(3) automatically adds dependencies of type Before= to
all mount units that refer to local mount points for this target unit.

So "Before=local-fs.target" is the correct dependency for local mount
points, but we currently specify "After=local-fs.target"; this should be
fixed.

Also replaced "WantedBy=multi-user.target" with "WantedBy=local-fs.target",
since .mount units are not related to multi-user but depend on local
filesystems.

Fixes #8761

Closes scylladb/scylladb#15647
2023-11-06 18:39:53 +01:00
Kefu Chai
d78ccab337 test/s3: add --keep-tmp option to preserve the tmp dir
before this change, the tempdir is always nuked whether or not the
test succeeds. but sometimes, it would be important to check
scylla's sstables after the test finishes.

so, in this change, an option named `--keep-tmp` is added so
we can optionally preserve the temp directory. this option is off
by default.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15949
2023-11-06 18:39:53 +01:00
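The opt-in behaviour described in d78ccab337 can be sketched as below. This is an illustration under stated assumptions: only the `--keep-tmp` flag name comes from the commit message, while the `run` function, the directory prefix and the use of `argparse` are hypothetical (the real test may wire the option up differently).

```python
import argparse
import shutil
import tempfile


def run(argv):
    """Hypothetical test driver: create a temp dir, run, then clean up."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--keep-tmp", action="store_true",
                        help="preserve the temporary directory after the run")
    args = parser.parse_args(argv)

    tmpdir = tempfile.mkdtemp(prefix="s3-test-")
    try:
        pass  # the test body would start the server and use tmpdir here
    finally:
        if args.keep_tmp:
            print(f"keeping {tmpdir} for inspection")
        else:
            shutil.rmtree(tmpdir)  # default: always remove the directory
    return tmpdir


run([])              # directory is removed
run(["--keep-tmp"])  # directory is left in place
```

Because the flag uses `store_true`, it is off unless given explicitly, which matches the default described in the commit.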
Anna Stuchlik
3756705520 doc: add OS support in version 5.4
This commit adds OS support information
in version 5.4 (removing the non-released
version 5.3).

In particular, it adds support for Oracle Linux
and Amazon Linux.

Also, it removes support for outdated versions.

Closes scylladb/scylladb#15923
2023-11-06 18:39:53 +01:00
Anna Stuchlik
1e0cbfe522 doc: update package installation in version 5.4
This commit updates the package installation
instructions in version 5.4.
- It updates the variables to include "5.4"
  as the version name.
- It adds the information for the newly supported
  Rocky/RHEL 9 - a new EPEL download link is required.

Closes scylladb/scylladb#15963
2023-11-06 18:39:53 +01:00
Pavel Emelyanov
bcec9c4ffc Merge 'test/object_store: PEP8 compliant cleanups' from Kefu Chai
this series applies fixes to make the test more PEP8 compliant. the goal is to improve the readability and maintainability.

Closes scylladb/scylladb#15946

* github.com:scylladb/scylladb:
  test/object_store: wrap line which is too long
  test/object_store: use pattern matching to capture variable in loop
  test/object_store: remove space after and before '{' and '}'
  test/object_store: add an empty line before nested function definition
  test/object_store: use two empty lines in-between global functions
2023-11-06 18:39:53 +01:00
Benny Halevy
0064fc55b0 interval: make default ctor and make_open_ended_both_sides constexpr
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#15955
2023-11-06 18:39:53 +01:00
Kefu Chai
39340d23e5 storage_service: avoid using non-constexpr as format string
in order to use compile-time format check, we would need to use
compile-time constexpr for the format string. despite that we
might be able to find a way to tell if an expression is compile-time
constexpr in C++20, it'd be much simpler to always use a
known-to-be-constexpr format string. this would help us to eventually
migrate to the compile-time format check in seastar's logging subsystem.

so, in this change, instead of feeding `seastar::logger::info()` and
friends with a non-constexpr format string, let's just use "{}" for
printing it, or mark the format string with `constexpr` instead of
`const`. as the former tells the compiler it is a variable that
can be evaluated at compile-time, while the latter just informs the
compiler that the variable is not mutable after it is initialized.

This change also helps to address the compilation failure with the
not-yet-merged compile-time format check patch in Seastar:

```
/usr/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -Og -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result "-Wno-error=#warnings" -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT service/CMakeFiles/service.dir/storage_service.cc.o -MF service/CMakeFiles/service.dir/storage_service.cc.o.d -o service/CMakeFiles/service.dir/storage_service.cc.o -c /home/kefu/dev/scylladb/service/storage_service.cc
/home/kefu/dev/scylladb/service/storage_service.cc:2460:18: error: call to consteval function 'seastar::logger::format_info<>::format_info<const char *, 0>' is not a constant expression
    slogger.info(str.c_str());
                 ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15959
2023-11-06 18:39:53 +01:00
Kamil Braun
315c69cec2 test: rest_api: test that generation is nonzero in failure_detector/endpoints 2023-11-06 18:03:34 +01:00
Kamil Braun
eb6943b852 api: failure_detector: fix indentation 2023-11-06 17:12:17 +01:00
Kamil Braun
a89c69007e api: failure_detector: invoke on shard 0
These APIs may return stale or simply incorrect data on shards
other than 0. Newer versions of Scylla are better at maintaining
cross-shard consistency, but we need a simple fix that can be backported
to older versions easily and without risk; this is the fix.

Fixes: scylladb/scylladb#15816
2023-11-06 17:03:38 +01:00
Piotr Dulikowski
85516c9155 raft topology: catch and abort on exceptions from topology_coordinator::run
The `topology_coordinator::run` function is supposed to handle all
exceptions internally. Assert, at runtime, that this is the case by
wrapping the `run` invocation with a try..catch; in case of an
exception, step down as a leader first and then abort.
2023-11-06 15:25:38 +01:00
Anna Stuchlik
a6fd4cccf2 doc: add the upgrade guide from 5.2 to 5.4
This commit adds the upgrade guide from
version 5.2 to 5.4.
Version 5.3 was never released.

This commit must be backported to branch-5.4.
2023-11-06 14:48:26 +01:00
Piotr Dulikowski
843f02eb5d Revert "storage_service: raft topology: mark topology_coordinator::run function as noexcept"
This reverts commit dcaaa74cd4. The
`noexcept` specifier that it added is only relevant to the function and
not the coroutine returned from that function. This was not the
intention and it looks confusing now, so remove it.
2023-11-06 12:00:42 +01:00
Piotr Dulikowski
41c2dac250 raft topology: don't print an error when fencing previous coordinator is aborted
An attempt to fence the previous coordinator may fail because the
current coordinator is aborted. It's not a critical error and it can
happen during normal operations, so lower the verbosity used to print a
message about this error to 'debug'.

Return from the function immediately in that case - the sleep_aborted
that happens as a next step would fail on abort_requested_exception
anyway, so make it more explicit.
2023-11-06 12:00:42 +01:00
Piotr Dulikowski
1408b7cfa8 raft topology: handle abort exceptions from sleeping in fence_previous_coordinator
The fence_previous_coordinator function has a retry loop: if it fails to
perform a group0 operation, it will try again after a 1 second delay.
However, if the topology coordinator is aborted while it waits, an
exception will be thrown and will be propagated out of the function. The
function is supposed to handle all exceptions internally, so this is not
desired.

Fix this by catching the abort_requested_exception and returning from
the function if the exception is caught.
2023-11-06 12:00:41 +01:00
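The retry loop described in 1408b7cfa8 (and the early return from 41c2dac250) can be sketched as follows. This is a simplified, thread-based illustration rather than the Seastar coroutine in the source tree; `AbortRequested`, `sleep_abortable` and the `group0_op` callable are hypothetical stand-ins.

```python
import threading


class AbortRequested(Exception):
    """Stand-in for abort_requested_exception."""


def sleep_abortable(seconds, abort):
    # Event.wait() returns True as soon as the event is set, False on timeout.
    if abort.wait(seconds):
        raise AbortRequested()


def fence_previous_coordinator(group0_op, abort, retry_delay=1.0):
    """Retry group0_op until it succeeds; return False if aborted meanwhile."""
    while True:
        try:
            group0_op()
            return True
        except AbortRequested:
            return False  # aborted during the operation: not a critical error
        except Exception as e:
            print(f"fencing failed, will retry: {e}")
        try:
            sleep_abortable(retry_delay, abort)
        except AbortRequested:
            return False  # aborted while waiting: handle it here, do not propagate


abort = threading.Event()
abort.set()  # simulate the coordinator being aborted


def failing_op():
    raise RuntimeError("group0 unavailable")


print(fence_previous_coordinator(failing_op, abort))
```

The point of the sketch is that no exception escapes the function: both the operation and the sleep are wrapped, so an abort becomes a normal return.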
Michał Jadwiszczak
213e39a937 cql-pytest:test_describe: add test for whitespaces in json objects 2023-11-06 10:37:10 +01:00
Kamil Braun
15b441550b gossiper: do_shadow_round: increment nodes_down in case of timeout
Previously we would only increment `nodes_down` when getting
`rpc::closed_error`. Distinguishing between that and timeout is
unreliable. Consider:
1. if a node is dead but we can reach the IP, we'd get `closed_error`
2. if we cannot reach the IP (there's a network partition), the RPC
   would hang so we'd get `timeout_error`
3. if the node is both dead and the IP is unreachable, we'd get
   `timeout_error`

And there are probably other more complex scenarios as well. In general,
it is impossible to distinguish a dead node from a partitioned node in
asynchronous networks, and whether we end up with `closed_error` or
`timeout_error` is an implementation detail of the underlying protocol
that we use.

The fact that `nodes_down` was not incremented for timeouts would
prevent a node from starting if it cannot reach isolated IPs (whether or
not there were dead or alive nodes behind those IPs). This was observed
in a Jepsen test: https://github.com/scylladb/scylladb/issues/15675.

Note that `nodes_down` is only used to skip shadow round outside
bootstrap/replace, i.e. during restarts, where the shadow round was
"best effort" anyway (not mandatory). During bootstrap/replace it is now
mandatory.

Also fix grammar in the error message.
2023-11-06 10:28:08 +01:00
Kamil Braun
897cb6510e gossiper: do_shadow_round: fix nodes_down calculation
During shadow round we would calculate the number of nodes from which we
got `rpc::closed_error` using `nodes_down`, and if the counter
reached the size of all contact points passed to shadow round, we would
skip the shadow round (and after the previous commit, we do it only in
the case of restart, not during bootstrap/replace which is unsafe).

However, shadow round might have multiple loops, and `nodes_down` was
initialized to `0` before the loop, then reused. So the same node might
be counted multiple times in `nodes_down`, and we might incorrectly
enter the skipping branch. Or we might go over `nodes.size()` and never
finish the loop.

Fix this by initializing `nodes_down = 0` inside the loop.
2023-11-06 10:28:07 +01:00
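The counting bug fixed in 897cb6510e can be reproduced with a small model. This is a deliberately simplified sketch, not the gossiper code: the `shadow_round` function, the `probe` callable and the three status strings are assumptions made for illustration, and `reset_each_round=False` models the old behaviour.

```python
def shadow_round(nodes, probe, max_rounds, reset_each_round=True):
    """Return "done", "skipped" or "timed out" for a simplified shadow round.

    probe(node) is a hypothetical callable returning "ok", "down"
    or "not_ready" for the given contact point.
    """
    nodes_down = 0
    for _ in range(max_rounds):
        if reset_each_round:
            nodes_down = 0  # the fix: count each node at most once per round
        for node in nodes:
            status = probe(node)
            if status == "ok":
                return "done"
            if status == "down":
                nodes_down += 1
        if nodes_down == len(nodes):
            return "skipped"  # every contact point looked dead in this round
    return "timed out"


# One node is down, the other is alive but not ready to answer yet.
statuses = {"n1": "down", "n2": "not_ready"}
nodes = list(statuses)
print(shadow_round(nodes, statuses.get, max_rounds=3))
print(shadow_round(nodes, statuses.get, max_rounds=3, reset_each_round=False))
```

Without the reset, `n1` is counted once per round, so the counter reaches `len(nodes)` in the second round and the round is skipped even though `n2` is alive.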
Kamil Braun
b03fa87551 storage_service: make shadow round mandatory during bootstrap/replace
It is unsafe to bootstrap or perform replace without performing the
shadow round, which is used to obtain features from the existing cluster
and verify that we support all enabled features.

Before this patch, I could easily produce the following scenario:
1. bootstrap first node in the cluster
2. shut it down
3. start bootstrapping second node, pointing to the first as seed
4. the second node skips shadow round because it gets
   `rpc::closed_error` when trying to connect to first node.
5. the node then passes the feature check (!) and proceeds to the next
   step, where it waits for nodes to show up in gossiper
6. we now restart the first node, and the second node finishes bootstrap

The shadow round must be mandatory during bootstrap/replace, which is
what this patch does.

On restart it can remain optional as it was until now. In fact it should
be completely unnecessary during restart, but since we did it until now
(as best-effort), we can keep doing it.
2023-11-06 10:28:07 +01:00
Kamil Braun
7e9e84200c gossiper: do_shadow_round: remove default value for nodes param 2023-11-06 10:28:07 +01:00
Kamil Braun
108aae09c5 gossiper: do_shadow_round: remove fall_back_to_syn_msg
If during shadow round we learned that a contact node does not
understand the GET_ENDPOINT_STATES verb, we'd fall back to old shadow
round method (using gossiper SYN messages).

The verb was added a long time ago and it ended up in Scylla 4.3 and
2021.1. So in newer versions we can make it mandatory, as we don't
support skipping major versions during upgrades. Even if someone
attempted to, they would just get an error and they can retry bootstrap
after finishing the upgrade.
2023-11-06 10:28:07 +01:00
Botond Dénes
2e1562d889 Merge 'dht: i_partitioner cleanup' from Benny Halevy
This series refactors the `dht/i_partitioner.hh` header file
and cleans up its usage so as to reduce the dependencies on it,
since it carries a lot of baggage that is rarely required in other header files.

Closes scylladb/scylladb#15954

* github.com:scylladb/scylladb:
  everywhere: reduce dependencies on i_partitioner.hh
  locator: resolve the dependency of token_metadata.hh on token_range_splitter.hh
  cdc: cdc_partitioner: remove extraneous partition_key_view fwd declaration
  dht: reduce dependency on i_partitioner.hh
  dht: fold compatible_ring_position in ring_position.hh
  dht: refactor i_partitioner.hh
  dht: move token_comparator to token.{cc,hh}
  dht/i_partitioner: include i_partitioner_fwd.hh
2023-11-06 10:34:38 +02:00
Kefu Chai
2b961d8e3f build: cmake: define per-mode compile definition
instead of setting for a single CMAKE_BUILD_TYPE, set the compilation
definitions for each build configuration.

this prepares for the multi-config generator.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15943
2023-11-06 10:34:38 +02:00
Kefu Chai
f2693752f1 build: cmake: avoid referencing CMAKE_BUILD_TYPE
use a generator expression instead, so that the value can be evaluated
when generating the build system. this prepares for the multi-config
support.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15942
2023-11-06 10:34:38 +02:00
Botond Dénes
7c7baf71d5 Merge 'Change all tests to shut down gracefully on shutdown' from Eliran Sinvani
The purpose of this mini series is to move all tests (that use the infrastructure to create a Scylla cluster) to shut down gracefully
on teardown.
One benefit is that the shutdown sequence for the cluster will be tested better; however, that is not the main purpose of this change. The main purpose is to pave the way for coverage reporting on all tests and not only the ones that
have standalone executables.

Full test runs are only slightly impacted by this change (~2.4% increase in runtime):

Without graceful shutdown
```
time ./test.py --mode dev
Found 2966 tests.
================================================================================
[N/TOTAL]   SUITE    MODE   RESULT   TEST
------------------------------------------------------------------------------
[2966/2966] topology_experimental_raft   dev   [ PASS ] topology_experimental_raft.test_raft_cluster_features.1
------------------------------------------------------------------------------
CPU utilization: 13.1%

real    4m50.587s
user    13m58.358s
sys     6m55.975s
```

With graceful shutdown
```
time ./test.py --mode dev
Found 2966 tests.
================================================================================
[N/TOTAL]   SUITE    MODE   RESULT   TEST
------------------------------------------------------------------------------
[2966/2966] topology_experimental_raft   dev   [ PASS ] topology_experimental_raft.test_raft_cluster_features.1
------------------------------------------------------------------------------
CPU utilization: 12.6%

real    4m57.637s
user    13m56.864s
sys     6m46.657s
```

Closes scylladb/scylladb#15851

* github.com:scylladb/scylladb:
  test.py: move to a graceful termination of nodes on teardown
  test.py: Use stop lock also in the graceful version
2023-11-06 10:34:38 +02:00
Benny Halevy
a1acf6854b everywhere: reduce dependencies on i_partitioner.hh
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:47:44 +02:00
Benny Halevy
6de1cc2993 locator: resolve the dependency of token_metadata.hh on token_range_splitter.hh
define token_metadata_ptr in token_metadata_fwd.hh
So that the declaration of `make_splitter` can be moved
to token_range_splitter.hh, where it belongs,
and so token_metadata.hh won't have to include it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:29 +02:00
Benny Halevy
182e5381d8 cdc: cdc_partitioner: remove extraneous partition_key_view fwd declaration
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:29 +02:00
Benny Halevy
4b184e950a dht: reduce dependency on i_partitioner.hh
include only the required header files where needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:29 +02:00
Benny Halevy
aa70e3a536 dht: fold compatible_ring_position in ring_position.hh
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:29 +02:00
Benny Halevy
28b5482403 dht: refactor i_partitioner.hh
Extract decorated_key.hh and ring_position.hh
out of i_partitioner.hh so they can be included
selectively, since i_partitioner.hh contains too much
baggage that is not always needed in full.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:27 +02:00
Benny Halevy
232918eef0 dht: move token_comparator to token.{cc,hh}
Move the `token_comparator` definition and
implementation to token.{hh,cc}, respectively
since they are independent of i_partitioner.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:15 +02:00
Benny Halevy
8309cf743e dht/i_partitioner: include i_partitioner_fwd.hh
Rather than repeating the same declarations in i_partitioner.hh

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:14 +02:00
Kefu Chai
08f8796cf0 test/object_store: wrap line which is too long
to be compliant to PEP8, see
https://peps.python.org/pep-0008/#blank-lines

also easier to read with smaller screen and/or large fonts.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 21:29:31 +08:00
Kefu Chai
5c0e4df624 test/object_store: use pattern matching to capture variable in loop
instead of referencing the elements in tuple with their indexes, use
pattern matching to capture them. for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 21:29:31 +08:00
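The readability change in 5c0e4df624 amounts to unpacking each tuple in the loop header instead of indexing into it. A minimal sketch with made-up data (the `rows` list and its field names are hypothetical, not taken from test/object_store):

```python
# Hypothetical rows of (keyspace, table, sstable count).
rows = [("ks1", "cf1", 3), ("ks1", "cf2", 5)]

# Before: elements are referenced by index, so the reader has to
# remember what row[0], row[1] and row[2] stand for.
by_index = [f"{row[0]}.{row[1]}: {row[2]}" for row in rows]

# After: the tuple is unpacked into named variables in the loop header.
unpacked = [f"{ks}.{cf}: {count}" for ks, cf, count in rows]

print(unpacked)
```

Both forms produce the same result; the second one documents the tuple layout at the point of use.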
Kefu Chai
6208a05c40 test/object_store: remove space after and before '{' and '}'
to be compliant with PEP8, see
https://peps.python.org/pep-0008/#whitespace-in-expressions-and-statements

for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 21:29:31 +08:00
Kefu Chai
231938f739 test/object_store: add an empty line before nested function definition
to be compliant to PEP8, see
https://peps.python.org/pep-0008/#blank-lines

for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 21:29:31 +08:00
Kefu Chai
38d5e7cae2 test/object_store: use two empty lines in-between global functions
to be compliant to PEP8, see
https://peps.python.org/pep-0008/#blank-lines

for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 21:29:31 +08:00
Michał Jadwiszczak
cbfbcffc75 schema: add whitespace to description of table options
Values of `caching`, `tombstone_gc` and `cdc` are json object but they
were printed without any whitespaces. This commit adds them after
colons(:) and commas(,), so the values are more readable and it matches
format of old client-side describe.
2023-11-04 12:30:19 +01:00
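The formatting difference addressed in cbfbcffc75 can be illustrated with Python's `json` module. This only demonstrates the whitespace after colons and commas; the `caching` value is a hypothetical example, and the actual DESCRIBE output is produced by C++ code and may quote values differently.

```python
import json

# Hypothetical value of a table option that is stored as a JSON object.
caching = {"keys": "ALL", "rows_per_partition": "ALL"}

# Compact form: no whitespace after ',' and ':'.
compact = json.dumps(caching, separators=(",", ":"))

# Readable form: one space after each ',' and ':'.
spaced = json.dumps(caching, separators=(", ", ": "))

print(compact)  # {"keys":"ALL","rows_per_partition":"ALL"}
print(spaced)   # {"keys": "ALL", "rows_per_partition": "ALL"}
```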
Kefu Chai
ff12f1f678 docs: add divider using CSS
instead of hardwiring the formatting in the html code, do this using
CSS, more flexible this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 00:22:34 +08:00
Kefu Chai
1694a7addc docs: extract _clean_description as a filter
would be better to split the parser from the formatter. in future,
we can apply more filters on top of the existing one.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 00:22:34 +08:00
Kefu Chai
9ddc639237 docs: render option with role
so we can cross-reference them with the syntax like

:confval:`alternator_timeout_in_ms`.

or even render an option like:

.. confval:: alternator_timeout_in_ms

in order to make the headerlink of the option visible,
a new CSS rule is added.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 00:22:34 +08:00
Kefu Chai
53dfb5661d docs: parse source files right into rst
so we can render the rst without writing a temporary YAML.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 00:22:33 +08:00
Kamil Braun
6cc5bcae80 test: test_topology_ops: disable background writes
Recently, in a3ba4b3109, this test was
extended with a background task that continuously performs CQL writes.

This turned out to be very valuable and detected a couple of bugs,
including:
https://github.com/scylladb/scylladb/issues/15924
https://github.com/scylladb/scylladb/issues/15935

Unfortunately this causes CI to be flaky.
Until these bugs are fixed, we disable the background writes to unflake
CI.

Closes scylladb/scylladb#15937
2023-11-03 16:52:10 +02:00
Raphael S. Carvalho
cca85f5454 streaming: Don't adjust partition estimate if segregation is postponed
When off-strategy is enabled, data segregation is postponed to when
off-strategy runs. Turns out we're adjusting partition estimate even
when segregation is postponed, meaning that sstables in the maintenance
set will have smaller filters than they should otherwise have.
This condition is transient as the system eventually heals this
through compactions. But note that with TWCS, problem of inefficient
filters may persist for a long time as sstables written into older
windows may stay around for a significant amount of time.
In the future, we're planning to make this less fragile by dynamically
resizing filters on sstable write completion.
The problem aforementioned is solved by skipping adjustment when
segregation is postponed (i.e. off-strategy is enabled).

Refs #15704.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-11-03 16:22:07 +02:00
Asias He
2b2302d373 streaming: Ignore dropped table on both sides
It is possible the sender and receiver of streaming nodes have different
views on if a table is dropped or not.

For example:
- n1, n2 and n3 in the cluster

- n4 started to join the cluster and stream data from n1, n2, n3

- a table was dropped

- n4 failed to write data from n2 to sstable because a table was dropped

- n4 ended the streaming

- n2 checked if the table was present and would ignore the error if the table was dropped

- however n2 found the table was still present and was not dropped

- n2 marked the streaming as failed

This will fail the streaming when a table is dropped. We want streaming to
ignore such dropped tables.

In this patch, a status code is sent back to the sender to notify it that
the table is dropped, so the sender can ignore the dropped table.

Fixes #15370

Closes scylladb/scylladb#15912
2023-11-03 13:38:48 +02:00
David Garcia
84e073d0ec docs: update theme 1.6
Closes scylladb/scylladb#15782
2023-11-03 09:45:16 +01:00
Piotr Dulikowski
70f4f8d799 test/pylib: increase control connection timeout in cql_is_up
After starting the associated node, ScyllaServer waits until the node
starts serving CQL requests. It does that by periodically trying to
establish a python driver session to the node.

During session establishment, the driver tries to fetch some metadata
from the system tables, and uses a pretty short timeout to do so (by
default it's 2 seconds). When running tests in debug mode, this timeout
can prove to be too short and may prevent the testing framework from
noticing that the node came up.

Fix the problem by increasing the timeout. Currently, after the session
is established, a query is sent in order to further verify that the
session works and it uses a very generous timeout of 1000 seconds to do
so - use the same timeout for internal queries in the python driver.

Fixes: scylladb/scylladb#15898

Closes scylladb/scylladb#15929
2023-11-03 09:32:11 +01:00
Kefu Chai
5b7feb8b95 build: s/create_building_system/create_build_system/
as build system is more correct in this context.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15932
2023-11-03 09:37:44 +02:00
Pavel Emelyanov
3173336e97 tests: Use make_sstable_easy() where appropriate
There are two test cases out there that make an sstable, write it and then
load it, but make_sstable_easy() is for that, so use it there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-02 19:32:43 +03:00
Pavel Emelyanov
cc89acff67 sstable_conforms_to_mutation_source_test: Open-code the make_sstable()
helper

This test case is pretty special in the sense that it uses custom path
for tempdir to create, write and load sstable to/from. It's better to
open-code the make_sstable() helper into the test case rather than
encourage callers to use custom tempdirs. "Good" test cases can use
make_sstable_easy() for the same purposes (in fact they already do).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-02 19:30:54 +03:00
Pavel Emelyanov
7f6423bc35 sstable_mutation_test: Use make_sstable_easy() instead of make_sstable()
The latter is only used in the former test case and doesn't provide
extra value.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-02 19:30:02 +03:00
Pavel Emelyanov
eeee58def8 tests: Make use of make_memtable() helper
There's one in the utils that creates lw_shared_ptr<memtable> and
applies provided vector of mutations into it. Lots of other test cases
do literally the same by hand.

The make_memtable() assumes that the caller is sitting in the seastar
thread, and all the test cases that can benefit from it already are.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-02 19:28:35 +03:00
Pavel Emelyanov
c1824324bd tests: Drop as_mutation_source helper
It does nothing but call the sstable method of the same name. Callers
can do it on their own, the method is public.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-02 19:27:59 +03:00
Pavel Emelyanov
3ff32a2ca5 test/sstable_utils: Hide assertion-related manipulations into branch
The make_sstable_containing() can validate the applied mutations are
produced by the resulting sstable if the caller asks for it. To do so
the mutations are merged prior to checking and this merging should only
happen if validation is requested, otherwise it just makes no sense.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-02 19:26:46 +03:00
Kamil Braun
8179296f56 Merge 'retry automatic announcements of the schema changes on concurrent operation' from Patryk Jędrzejczak
The follow-up to #15594.

We retry every automatic `migration_manager::announce` if
`group0_concurrent_modification` occurs. Concurrent operations can
happen during concurrent bootstrap in Raft-based topology, so we need
this change to enable support for concurrent bootstrap.

This PR adds retry loops in 4 places:
- `service::create_keyspace_if_missing`,
- `system_distributed_keyspace::start`,
- `redis::create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (used for creating the `system_traces` keyspace).
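
A minimal sketch of such a retry loop, written in Python for brevity rather than in the C++ of the actual change. `Group0ConcurrentModification`, `announce_with_retry` and the `max_attempts` bound are hypothetical names introduced for illustration; only the idea of retrying on `group0_concurrent_modification` comes from this PR.

```python
class Group0ConcurrentModification(Exception):
    """Hypothetical stand-in for group0_concurrent_modification."""


def announce_with_retry(prepare_and_announce, max_attempts=10):
    """Run prepare_and_announce() until it succeeds or attempts run out.

    prepare_and_announce is a caller-supplied callable that prepares the
    schema change and announces it in one go. The attempt bound is an
    assumption of this sketch, not something the PR states.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return prepare_and_announce()
        except Group0ConcurrentModification:
            if attempt == max_attempts:
                raise
            # Another operation changed group 0 first: go around again.
```

The sketch re-runs the whole prepare-and-announce step, so each attempt would start from fresh state instead of reusing a stale one.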

Fixes #15435

Closes scylladb/scylladb#15613

* github.com:scylladb/scylladb:
  table_helper: fix indentation
  table_helper: retry in setup_keyspace on concurrent operation
  table_helper: add logger
  redis/keyspace_utils: fix indentation
  redis: retry creating default databases on concurrent operation
  db/system_distributed_keyspace: fix indentation
  db/system_distributed_keyspace: retry start on concurrent operation
  auth/service: retry creating system_auth on concurrent operation
2023-11-02 17:24:52 +01:00
Kamil Braun
5cf18b18b2 Merge 'raft: topology: outside topology-on-raft mode, make sure not to use its RPCs' from Piotr Dulikowski
Topology on raft is still an experimental feature. The RPC verbs
introduced in that mode shouldn't be used when it's disabled, otherwise
we lose the right to make breaking changes to those verbs.

First, make sure that the aforementioned verbs are not sent outside the
mode. It turns out that `raft_pull_topology_snapshot` could be sent
outside topology-on-raft mode - after the PR, it no longer can.

Second, topology-on-raft mode verbs are now not registered at all on the
receiving side when the mode is disabled.

Additionally tested by running `topology/` tests with
`consistent_cluster_management: True` but with experimental features
disabled.

Fixes: scylladb/scylladb#15862

Closes scylladb/scylladb#15917

* github.com:scylladb/scylladb:
  storage_service: fix indentation
  raft: topology: only register verbs in topology-on-raft mode
  raft: topology: only pull topology snapshot in topology-on-raft mode
2023-11-02 16:44:18 +01:00
Kefu Chai
798eede61a build: cmake: update 3rd party library deps where it is found
move the code which updates the third-party library closer to where
the library is found. for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15915
2023-11-02 17:20:57 +02:00
Kefu Chai
0421db2471 build: cmake: enable Seastar_UNUSED_RESULT_ERROR
this mirrors what we already have in `configure.py`.

so that Seastar can report [[nodiscard]] violations as error.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15914
2023-11-02 17:19:31 +02:00
Patryk Jędrzejczak
dacec6374d table_helper: fix indentation
Broken in the previous commit.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
e10036babe table_helper: retry in setup_keyspace on concurrent operation
Currently, table_helper::setup_keyspace is used only for starting
the system_traces keyspace. We need to handle concurrent group 0
operations possible during concurrent bootstrap in the Raft-based
topology.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
e2894a081a table_helper: add logger
It will be used in the next commit to log information when
a concurrent group 0 modification occurs.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
3e8a307cd4 redis/keyspace_utils: fix indentation
Broken in the previous commit.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
24aa5bf72c redis: retry creating default databases on concurrent operation
A concurrent group 0 operation in
create_keyspace_if_not_exists_impl can happen during concurrent
bootstrap in the Raft-based topology.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
0357636f16 db/system_distributed_keyspace: fix indentation
Broken in the previous commit.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
813c7a582c db/system_distributed_keyspace: retry start on concurrent operation
A concurrent group 0 operation in
system_distributed_keyspace::start can happen during concurrent
bootstrap in the Raft-based topology.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
dfba0b9e9b auth/service: retry creating system_auth on concurrent operation
A concurrent group 0 operation in
service::create_keyspace_if_missing can happen during concurrent
bootstrap in the Raft-based topology.
2023-11-02 14:21:15 +01:00
Pavel Emelyanov
1a44f362b2 pytest: Do not try to guess which scylla binary user wants to run
When running some pytest-based tests they start scylla binary by hand
instead of relying on test.py's "clusters". In automatic run (e.g. via
test.py itself) the correct scylla binary is the one pointed to by
SCYLLA environment, but when run from shell via pytest directly it tries
to be smart and looks at build/*/scylla binaries picking the one with
the greatest mtime.

That guess is not very nice, because if the developer switches between
build modes with configure.py and rebuilds binaries, binaries from
"older" or "previous" builds stay in the way and confuse the guessing
code. It's better to be explicit.
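
A rough sketch of the explicit approach, assuming the test helper simply refuses to run when `SCYLLA` is unset. The function name `find_scylla` and the error text are hypothetical and not taken from the patch.

```python
import os


def find_scylla(environ=os.environ):
    """Return the scylla binary path from the SCYLLA environment variable.

    No guessing based on build/*/scylla mtimes: if the variable is not
    set, fail loudly and let the developer choose the binary.
    """
    path = environ.get("SCYLLA")
    if not path:
        raise RuntimeError(
            "SCYLLA is not set; point it at the scylla binary to run")
    return path
```

From a shell this would look like `SCYLLA=<path to scylla binary> pytest ...`, with the path chosen by the developer.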

refs: #15679

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15684
2023-11-02 12:34:49 +02:00
Kamil Braun
0846d324d7 Merge 'rollback topology operation on streaming failure' from Gleb
This patch series adds error handling for streaming failure during
topology operations instead of an infinite retry. If streaming fails the
operation is rolled back: bootstrap/replace nodes move to left and
decommissioned/remove nodes move back to normal state.

* 'gleb/streaming-failure-rollback-v4' of github.com:scylladb/scylla-dev:
  raft: make sure that all operation forwarded to a leader are completed before destroying raft server
  storage_service: raft topology: remove code duplication from global_tablet_token_metadata_barrier
  tests: add tests for streaming failure in bootstrap/replace/remove/decommission
  test/pylib: do not stop node if decommission failed with an expected error
  storage_service: raft topology: fix typo in "decommission" everywhere
  storage_service: raft topology: add streaming error injection
  storage_service: raft topology: do not increase topology version during CDC repair
  storage_service: raft topology: rollback topology operation on streaming failure.
  storage_service: raft topology: load request parameters in left_token_ring state as well
  storage_service: raft topology: do not report term_changed_error during global_token_metadata_barrier as an error
  storage_service: raft topology: change global_token_metadata_barrier error handling to try/catch
  storage_service: raft topology: make global_token_metadata_barrier node independent
  storage_service: raft topology: split get_excluded_nodes from exec_global_command
  storage_service: raft topology: drop unused include_local and do_retake parameters from exec_global_command which are always true
  storage_service: raft topology: simplify streaming RPC failure handling
2023-11-02 10:15:45 +01:00
Kamil Braun
ae58e39743 Merge 'reduce announcements of the automatic schema changes' from Patryk Jędrzejczak
There are some schema modifications performed automatically (during
bootstrap, upgrade etc.) by Scylla that are announced by multiple calls
to `migration_manager::announce` even though they are logically one
change. Precisely, they appear in:
- `system_distributed_keyspace::start`,
- `redis::create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (for the `system_traces` keyspace).

All these places contain a FIXME telling us to `announce` only once.
There are a few reasons for this:
- calling `migration_manager::announce` with Raft is quite expensive --
  taking a `read_barrier` is necessary, and that requires contacting a
leader, which then must contact a quorum,
- we must implement a retrying mechanism for every automatic `announce`
  if `group0_concurrent_modification` occurs to enable support for
concurrent bootstrap in Raft-based topology. Doing it before the FIXMEs
mentioned above would be harder, and fixing the FIXMEs later would also
be harder.

This PR fixes the first two FIXMEs and improves the situation with the
last one by reducing the number of the `announce` calls to two.
Unfortunately, reducing this number to one requires a big refactor. We
can do it as a follow-up to a new, more specific issue. Also, we leave a
new FIXME.

Fixing the first two FIXMEs required enabling the announcement of a
keyspace together with its tables. Until now, the code responsible for
preparing mutations for a new table could assume the existence of the
keyspace. This assumption wasn't necessary, but removing it required
some refactoring.

Fixes scylladb/scylladb#15437

Closes scylladb/scylladb#15897

* github.com:scylladb/scylladb:
  table_helper: announce twice in setup_keyspace
  table_helper: refactor setup_table
  redis: create_keyspace_if_not_exists_impl: fix indentation
  redis: announce once in create_keyspace_if_not_exists_impl
  db: system_distributed_keyspace: fix indentation
  db: system_distributed_keyspace: announce once in start
  tablet_allocator: update on_before_create_column_family
  migration_listener: add parameter to on_before_create_column_family
  alternator: executor: use new prepare_new_column_family_announcement
  alternator: executor: introduce create_keyspace_metadata
  migration_manager: add new prepare_new_column_family_announcement
2023-11-02 09:32:35 +01:00
Piotr Dulikowski
6d15f0283e storage_service: fix indentation
It was broken by the previous commit.
2023-11-02 07:39:27 +01:00
Piotr Dulikowski
190d549bd5 raft: topology: only register verbs in topology-on-raft mode
Verbs related to topology on raft should not be sent outside the
topology on raft mode - and, after the previous commit, they aren't.

Make sure not to register handlers for those verbs if topology on raft
mode is not enabled.
2023-11-02 07:39:27 +01:00
Piotr Dulikowski
8727634e9c raft: topology: only pull topology snapshot in topology-on-raft mode
Currently, during group0 snapshot transfer, the node pulling
the snapshot will send the `raft_pull_topology_snapshot` verb even if
the cluster is not in topology-on-raft mode. The RPC handler returns an
empty snapshot in that case. However, using the verb outside topology on
raft causes problems:

- It can cause issues during rolling upgrade as the snapshot transfer
  will keep failing on the upgraded nodes until the leader node is
  upgraded,
- Topology changes on raft are still experimental, and using the RPC
  outside experimental mode will prevent us from doing breaking changes
  to it.

Solve the issue by passing the "topology changes on raft enabled" flag
to group0_state_machine and send the RPC only in topology on raft mode.
2023-11-02 07:39:27 +01:00
Yaniv Kaul
c662fe6444 Debian based Dockerfile: do not install 'suggested' packages
We can opt out from installing suggested packages. Mainly those related to Java and friends that we do not seem to need.

Fixes: #15579

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#15580
2023-11-01 17:16:18 +02:00
Botond Dénes
a34c8dc485 Merge 'Drop compaction_manager_for_testing' from Pavel Emelyanov
There's such a wrapper class in test_services. After #15889 this class resembles the test_env_compaction_manager and can be replaced with it. However, two users of the former wrapper class need it just to construct table object, and the way they do it is re-implementation of table_for_tests class.

This PR patches the test cases to make use of table_for_tests and removes the compaction_manager_for_testing that becomes unused after it.

Closes scylladb/scylladb#15909

* github.com:scylladb/scylladb:
  test_services: Ditch compaction_manager_for_testing
  test/sstable_compaction_test: Make use of make_table_for_tests()
  test/sstable_3_x_test: Make use of make_table_for_tests()
  table_for_tests: Add const operator-> overload
  sstable_test_env: Add test_env_compaction_manager() getter
  sstable_test_env: Tune up maybe_start_compaction_manager() method
  test/sstable_compaction_test: Remove unused tracker allocation
2023-11-01 16:08:34 +02:00
Botond Dénes
665a5cb322 Update tools/jmx submodule
* tools/jmx 8d15342e...05bb7b68 (4):
  > README: replace 0xA0 (NBSP) character with space
  > scylla-apiclient: update Guava dependency
  > scylla-apiclient: update snakeyaml dependency
  > scylla-apiclient: update Jackson dependencies

[Botond: regenerate frozen toolchain]
2023-11-01 08:08:37 -04:00
Pavel Emelyanov
787c6576fe test_services: Ditch compaction_manager_for_testing
Now this wrapper is unused, all (both) test cases that needed it were
patched to use make_table_for_tests().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:17 +03:00
Pavel Emelyanov
731a82869a test/sstable_compaction_test: Make use of make_table_for_tests()
The max_ongoing_compaction_test test case constructs table object by
hand. For that it needs tracker, compaction manager and stats. Similarly
to previous patch, the test_env::make_table_for_tests() helper does
exactly that, so the test case can be simplified as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:17 +03:00
Pavel Emelyanov
5b3b8c2176 test/sstable_3_x_test: Make use of make_table_for_tests()
The compacted_sstable_reader() helper constructs table object and all
its "dependencies" by hand. The test_env::make_table_for_tests() helper
does the same, so the test code can be simplified.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:17 +03:00
Pavel Emelyanov
9b8f03bdb0 table_for_tests: Add const operator-> overload
Will be used later in boost transformation lambda

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:17 +03:00
Pavel Emelyanov
3021fb7b6c sstable_test_env: Add test_env_compaction_manager() getter
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:17 +03:00
Pavel Emelyanov
19b524d0f3 sstable_test_env: Tune up maybe_start_compaction_manager() method
Make it public and add `bool enable` flag so that test cases could start
the compaction manager (to call make_table_for_tests() later) but keep
it disabled for their testing purposes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:17 +03:00
Pavel Emelyanov
3f354c07a3 test/sstable_compaction_test: Remove unused tracker allocation
The sstable_run_based_compaction_test case allocates the tracker but
doesn't use it. Probably was left after the case was patched to use
make_table_for_tests() helper.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:12 +03:00
Kefu Chai
ef023dae44 s3: use rapidxml/rapidxml.hpp as a fallback
on debian derivatives librapidxml-dev installs rapidxml.h as
rapidxml/rapidxml.hpp, so let's use it as a fallback.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15814
2023-11-01 10:25:40 +03:00
Kefu Chai
7253369ad9 SCYLLA-VERSION-GEN: respect --date-stamp
before this change the argument passed to --date-stamp option is
ignored, as we don't reference the date-stamp specified with this option
at all. instead, we always overwrite it with the output of
`date --utc +%Y%m%d`, if we are going to reference this value.

so, in this change instead of unconditionally overwriting it, we
keep its value intact if it is already set.

the change which introduced this regression was 839d8f40e6

Fixes #15894
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15895
2023-11-01 10:24:04 +03:00
Avi Kivity
fcd86d993d Merge 'Put table_for_tests on a diet' from Pavel Emelyanov
The object in question is used to facilitate creation of table objects for compaction tests. Currently the table_for_test carries a bunch of auxiliary objects that are needed for table creation, such as stats of all sorts and table state. However, there's also some "infrastructure" stuff onboard namely:

- reader concurrency semaphore
- cache tracker
- task manager
- compaction manager

And those four are excessive because all the tests in question run inside the sstables::test_env that has most of it.

This PR removes the mentioned objects from table_for_tests and re-uses those from test_env. Also, while at it, it also removes the table::config object from table_for_tests so that it looks more like core code that creates table does.

Closes scylladb/scylladb#15889

* github.com:scylladb/scylladb:
  table_for_tests: Use test_env's compaction manager
  sstables::test_env: Carry compaction manager on board
  table_for_tests: Stop table on stop
  table_for_tests: Get compaction manager from table
  table_for_tests: Ditch on-board concurrency semaphore
  table_for_tests: Require config argument to make table
  table_for_tests: Create table config locally
  table_for_tests: Get concurrency semaphore from table
  table_for_tests: Get table directory from table itself
  table_for_tests: Reuse cache tracker from sstables manager
  table_for_tests: Remove unused constructor
  tests: Split the compaction backlog test case
  sstable_test_env: Coroutinize and move to .cc test_env::stop()
2023-10-31 18:03:07 +02:00
Piotr Smaroń
8c464b2ddb guardrails: restrict replication strategy (RS)
Replacing `restrict_replication_simplestrategy` config option with
2 config options: `replication_strategy_{warn,fail}_list`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on replication strategy when creating/altering a keyspace.
The reason for replacing rather than extending the `restrict_replication_simplestrategy` config
option is that it was not used and we wanted to generalize it.
Only soft guardrail is enabled by default and it is set to SimpleStrategy,
which means that we'll generate a CQL warning whenever replication strategy
is set to SimpleStrategy. For new cloud deployments we'll move
SimpleStrategy from warn to the fail list.
Guardrails violations will be tracked by metrics.
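
The soft/hard distinction can be sketched as follows. This is an illustrative Python model, not the server code: `check_replication_strategy` and `GuardrailViolation` are hypothetical names, and the two parameters stand in for the `replication_strategy_warn_list` and `replication_strategy_fail_list` options with the defaults described above.

```python
class GuardrailViolation(Exception):
    """Hypothetical error for a statement rejected by a hard limit."""


def check_replication_strategy(strategy,
                               warn_list=("SimpleStrategy",),
                               fail_list=()):
    """Return the warnings to attach to the response, or raise.

    A strategy on the fail list is rejected outright; a strategy on the
    warn list is accepted but produces a warning.
    """
    if strategy in fail_list:
        raise GuardrailViolation(
            f"replication strategy {strategy} is not allowed")
    if strategy in warn_list:
        return [f"replication strategy {strategy} is discouraged"]
    return []
```

With the defaults, `check_replication_strategy("SimpleStrategy")` yields one warning, while moving SimpleStrategy to `fail_list` turns the same call into an error.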

Resolves #5224
Refs #8892 (the replication strategy part, not the RF part)

Closes scylladb/scylladb#15399
2023-10-31 18:34:41 +03:00
Botond Dénes
287f05ad26 Merge 'scylla-sstable/tools: Use semi-properly initiated db::config + extensions to allow encrypted sstables' from Calle Wilund
Refs https://github.com/scylladb/scylla-enterprise/issues/3461
Refs https://github.com/scylladb/scylla-enterprise/issues/3210

Adds a tool-app global db::config + extensions to each tool invocation + configurable init.
Uses this in scylla-sstables, allowing both enterprise-only configs to be read, as well as (almost all)
encrypted sstables.

Note: Do not backport to enterprise before https://github.com/scylladb/scylla-enterprise/pull/3473 is merged, otherwise tools will break there.

Closes scylladb/scylladb#15615

* github.com:scylladb/scylladb:
  scylla-sstable: Use tool-global config + extensions
  tools: Add db config + extensions to tool app run
2023-10-31 14:21:57 +02:00
Pavel Emelyanov
b974d8ca1b stream_session: Do not print benign exceptions with error level
Handler of STREAM_MUTATION_FRAGMENTS verb creates and starts reader. The
resulting future is then checked for being exceptional and an error
message is printed in logs.

However, if reader fails because of socket being closed by peer, the
error looks excessive. In that case the exception is just regular
handling of the socket/stream closure and can be demoted down to debug
level.
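
The idea can be illustrated with a small Python sketch. It is only an analogy: the real handler is C++, and treating Python's `ConnectionError` family as "peer closed the socket" is an assumption of this example, as are the function names.

```python
import logging


def reader_failure_level(exc):
    """Pick a log level for an exception raised by the stream reader."""
    # A connection closed by the peer is part of normal stream teardown,
    # so it is only worth a debug message.
    if isinstance(exc, ConnectionError):
        return logging.DEBUG
    # Anything else is unexpected and stays at error level.
    return logging.ERROR


def log_reader_failure(logger, exc):
    """Log the failure at the level chosen above."""
    logger.log(reader_failure_level(exc), "stream reader failed: %s", exc)
```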

fixes: #15891

Similar cherry-picking of log level exists in e.g. storage proxy, see
for example 56bd9b5d (service: storage_proxy: do not report abort
    requests in handle_write )

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15892
2023-10-31 14:21:22 +02:00
Gleb Natapov
15a34f650d gossip: remove unused HIBERNATE gossiper status
The status is not used since 2ec1f719de
which is included in scylla-4.6.0. We cannot have a mixed cluster with a
version that old, so the new version should not carry the compatibility
burden.
2023-10-31 14:08:38 +02:00
Gleb Natapov
35a1ac1a9a gossip: remove unused STATUS_MOVING state
Moving operation was removed by 4a0b561376
and since then the state is unused. Even back then it worked only for
the case of one token so it is safe to say we never used it. Let's
remove the remains of the code instead of carrying it forever.
2023-10-31 13:54:46 +02:00
Kefu Chai
2cd804b8e5 build: cmake: do not hardwire build_reloc.sh arguments
before this change, we feed `build_reloc.sh` with hardwired arguments
when building python3 submodule. but this is not flexible, and hurts
the maintainability.

in this change, we mirror the behavior of `configure.py`, and collect
the arguments from the output of `install-dependencies.sh`, and feed
the collected argument to `build_reloc.sh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15885
2023-10-31 13:27:12 +02:00
Botond Dénes
90a8489809 repair/repair.cc: do_repair_ranges(): prevent stalls when skipping ranges
We have observed do_repair_ranges() receiving tens of thousands of
ranges to repair on occasion. do_repair_ranges() repairs all ranges in
parallel, with parallel_for_each(). This is normally fine, as the lambda
inside parallel_for_each() takes a semaphore and this will result in
limited concurrency.
However, in some instances, it is possible that most of these ranges are
skipped. In this case the lambda will become synchronous, only logging a
message. This can cause stalls because there are no opportunities to
yield. Solve this by adding an explicit yield to prevent this.
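
The effect can be reproduced in any cooperative scheduler. Below is a Python asyncio analogue, not the Seastar code: the loop is sequential instead of using parallel_for_each(), and `repair_ranges`, `should_skip` and `repair_one` are hypothetical names. It only shows why a skip path that does no waiting needs an explicit yield.

```python
import asyncio


async def repair_ranges(ranges, should_skip, repair_one):
    """Process ranges one by one, yielding to the event loop on skips."""
    repaired = 0
    for r in ranges:
        if should_skip(r):
            # The skip path does no I/O. Without this explicit yield a
            # long run of skipped ranges would never give other tasks a
            # chance to run.
            await asyncio.sleep(0)
            continue
        await repair_one(r)
        repaired += 1
    return repaired
```

Here `asyncio.sleep(0)` plays the role of the explicit yield: it suspends the coroutine for one scheduler turn and does nothing else.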

Fixes: #14330

Closes scylladb/scylladb#15879
2023-10-31 13:24:54 +02:00
Avi Kivity
ef7db6df99 Merge 'schema_tables: turn view schema fixing code into a sanity check' from Kamil Braun
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.

The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).

The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.

The main motivation of this PR is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.

The migration code is now turned into a sanity check: if users
try something crazy, they will get an error instead of silent data
corruption.

Closes scylladb/scylladb#15695

* github.com:scylladb/scylladb:
  view: remove unused `_backing_secondary_index`
  schema_tables: turn view schema fixing code into a sanity check
  schema_tables: make comment more precise
  feature_service: make COMPUTED_COLUMNS feature unconditionally true
2023-10-31 13:23:19 +02:00
Kefu Chai
e853d7bb4b build: cmake: add Scylla_DATE_STAMP option
to be compatible with `configure.py` which allows us to optionally
specify the --date-stamp option for SCYLLA-VERSION-GEN. this option
is used by our CI workflow.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15896
2023-10-31 13:21:30 +02:00
Eliran Sinvani
2a45fed0cf test.py: move to a graceful termination of nodes on teardown
This change moves existing suites that create clusters through the
testing infra to be stopped and uninstalled gracefully.
The motivation, besides the obvious advantage of testing our stop
sequence is that it will pave the way for applying code coverage support
to all tests (not only standalone unit and boost test executables).

testing:
	Ran all tests 10 times in a row in dev mode.
	Ran all tests once in release mode
	Ran all tests once in debug mode

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-10-31 13:12:49 +02:00
Eliran Sinvani
62ec1fe8e0 test.py: Use stop lock also in the graceful version
An already known race (see: https://github.com/scylladb/scylladb/issues/15755)
has been found once again as part of moving all tests to stop all nodes
gracefully on teardown.
The solution was to add the lock acquisition also to `stop_gracefully`.
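
A simplified model of the fix, assuming both stop paths share one asyncio lock. The `Server` class, its fields and `_teardown` are hypothetical and only mimic the shape of the problem; the details of the original race are in the linked issue and are not reproduced here.

```python
import asyncio


class Server:
    """Toy server whose two stop paths must not run concurrently."""

    def __init__(self):
        self.stop_lock = asyncio.Lock()
        self.running = True
        self.teardowns = 0

    async def _teardown(self):
        if not self.running:
            return
        await asyncio.sleep(0)  # stands in for waiting on process exit
        self.running = False
        self.teardowns += 1

    async def stop(self):
        async with self.stop_lock:
            await self._teardown()

    async def stop_gracefully(self):
        # Takes the same lock as stop(), so a concurrent stop() cannot
        # interleave with the graceful shutdown.
        async with self.stop_lock:
            await self._teardown()
```

With the shared lock, running `stop()` and `stop_gracefully()` concurrently tears the server down once; without it both calls would pass the `running` check before either finishes.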

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-10-31 13:12:49 +02:00
Patryk Jędrzejczak
ba5275a6ae table_helper: announce twice in setup_keyspace
We refactor table_helper::setup_keyspace so that it calls
migration_manager::announce at most twice. We achieve it by
announcing all tables at once.

The number of announcements should further be reduced to one, but
it requires a big refactor. The CQL code used in
parse_new_cf_statement assumes the keyspace has already been
created. We cannot have such an assumption if we want to announce
a keyspace and its tables together. However, we shouldn't touch
the CQL code as it would impact user requests, too.

One solution is using schema_builder instead of the CQL statements
to create tables in table_helper.

Another approach is removing table_helper completely. It is used
only for the system_traces keyspace, which Scylla creates
automatically. We could refactor the way Scylla handles this
keyspace and make table_helper unneeded.
2023-10-31 12:08:04 +01:00
Patryk Jędrzejczak
bf15d5f7bb table_helper: refactor setup_table
In the following commit, we reduce migration_manager::announce
calls in table_helper::setup_keyspace by announcing all tables
together. To do it, we cannot use table_helper::setup_table
anymore, which announces a single table itself. However, the new
code still has to translate CQL statements, so we extract it to the
new parse_new_cf_statement function to avoid duplication.
2023-10-31 12:08:04 +01:00
Patryk Jędrzejczak
4dd5d8e5be redis: create_keyspace_if_not_exists_impl: fix indentation
Broken in the previous commit.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
3be7215163 redis: announce once in create_keyspace_if_not_exists_impl
We refactor create_keyspace_if_not_exists_impl so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
df199eec11 db: system_distributed_keyspace: fix indentation
Broken in the previous commit.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
91ff8007b3 db: system_distributed_keyspace: announce once in start
We refactor system_distributed_keyspace::start so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.

We remove a catch expression together with the FIXME from
get_updated_service_levels (add_new_columns_if_missing before the
patch) because we cannot treat the service_levels update
differently anymore.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
5027c5f1e5 tablet_allocator: update on_before_create_column_family
After adding the keyspace_metadata parameter to
migration_listener::on_before_create_column_family,
tablet_allocator doesn't need to load it from the database.

This change is necessary before merging migration_manager::announce
calls in the following commit.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
a762179972 migration_listener: add parameter to on_before_create_column_family
After adding the new prepare_new_column_family_announcement that
doesn't assume the existence of a keyspace, we also need to get
rid of the same assumption in all on_before_create_column_family
calls. After all, they may be initiated before creating the
keyspace. However, some listeners require keyspace_metadata, so we
pass it as a new parameter.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
a2e48b1a5b alternator: executor: use new prepare_new_column_family_announcement
We can use the new prepare_new_column_family_announcement function
that doesn't assume the existence of the keyspace instead of the
previous work-around.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
4ad2d895a3 alternator: executor: introduce create_keyspace_metadata
We need to store a new keyspace's keyspace_metadata as a local
variable in create_table_on_shard0. In the following commit, we
use it to call the new prepare_new_column_family_announcement
function.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
fb2703de50 migration_manager: add new prepare_new_column_family_announcement
In the following commits, we reduce the number of the
migration_manager::announce calls by merging some of them in a way
that logically makes sense. Some of these merges are similar --
we announce a new keyspace and its tables together. However,
we cannot use the current prepare_new_column_family_announcement
there because it assumes that the keyspace has already been created
(when it loads the keyspace from the database). Luckily, this
assumption is not necessary as this function only needs
keyspace_metadata. Instead of loading it from the database, we can
pass it as a parameter.
2023-10-31 12:08:03 +01:00
Kefu Chai
9dd5af7fef alternator: avoid using the deprecated API
this change silences following compiling warning due to using the
deprecated API by using the recommended API in place of the deprecated
one:

```
/home/kefu/dev/scylladb/alternator/server.cc:569:27: warning: 'set_tls_credentials' is deprecated: use listen(socket_address addr, server_credentials_ptr credentials) [-Wdeprecated-declarations]
            _https_server.set_tls_credentials(creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
                          ^
/home/kefu/dev/scylladb/seastar/include/seastar/http/httpd.hh:186:7: note: 'set_tls_credentials' has been explicitly marked deprecated here
    [[deprecated("use listen(socket_address addr, server_credentials_ptr credentials)")]]
      ^
1 warning generated.
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15884
2023-10-31 12:05:58 +03:00
Botond Dénes
4a0f16474f Merge 'row_cache: abort on external_updater::execute errors' from Benny Halevy
Currently the cache updaters aren't exception safe
yet they are intended to be.

Instead of allowing exceptions from
`external_updater::execute` escape `row_cache::update`,
abort using `on_fatal_internal_error`.

Future changes should harden all `execute` implementations
to effectively make them `noexcept`, then the pure virtual
definition can be made `noexcept` to cement that.

Fixes scylladb/scylladb#15576

Closes scylladb/scylladb#15577

* github.com:scylladb/scylladb:
  row_cache: abort on external_updater::execute errors
  row_cache: do_update: simplify _prev_snapshot_pos setup
2023-10-31 10:07:01 +02:00
Pavel Emelyanov
4db80ed61f table_for_tests: Use test_env's compaction manager
Now that the sstables::test_env provides the compaction manager
instance, the table_for_tests can start using it and can remove c.m. and
the sidecar task_manager.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:42:19 +03:00
Pavel Emelyanov
2c78b46c78 sstables::test_env: Carry compaction manager on board
Most of the test cases that use sstables::test_env do not mess with
table objects, they only need sstables. However, compaction test cases
do need table objects and, respectively, a compaction manager instance.
Today those test cases create compaction manager instance for each table
they create, but that's a bit heavyweight and doesn't work the way core
code works. This patch prepares the sstables::test_env to provide
compaction manager on demand by starting it as soon as it's asked to
create table object.

For now this compaction manager is unused, but it will be in next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:39:54 +03:00
Pavel Emelyanov
b96d39e63a table_for_tests: Stop table on stop
Next patches will stop using the compaction manager from table_for_tests in
favor of an external one (spoiler: the one from sstables::test_env), thus
the compaction manager would outlive the table_for_tests object and
the table object wrapped by it. So in order for the table_for_tests to
stop correctly, it needs to stop the wrapped table too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:38:03 +03:00
Pavel Emelyanov
e71409df38 table_for_tests: Get compaction manager from table
There's table_for_tests::get_compaction_manager() helper that's
excessive as compaction manager reference can be provided by the wrapped
table object itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:37:22 +03:00
Pavel Emelyanov
ac45aae0c4 table_for_tests: Ditch on-board concurrency semaphore
It's not used any longer and can be removed. This makes table_for_tests
stopping code a bit shorter as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:36:59 +03:00
Pavel Emelyanov
21998296a7 table_for_tests: Require config argument to make table
This is the continuation of the previous patch. Make the caller of
table_for_tests constructor provide the table::config. This makes the
table_for_tests constructor shorter and more self-contained.

Also, the caller now needs to provide the reference to reader
concurrency semaphore, and that's good news, because the only caller for
today is the sstables::test_env that does have it. This makes the
semaphore sitting on table_for_tests itself unused and it will be
removed eventually.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:34:59 +03:00
Pavel Emelyanov
5ab1af3804 table_for_tests: Create table config locally
The table_for_tests keeps a copy of table::config on board. That's not
"idiomatic" as table config is a temporary object that should only be
needed while creating table object. Fortunately, the copy of config on
table_for_tests is no longer needed and it can be made temporary.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:33:29 +03:00
Pavel Emelyanov
76e57cc805 table_for_tests: Get concurrency semaphore from table
Making compaction permit needs a semaphore. Current code gets it from
the table_for_tests, but the very same semaphore reference sits on the
table. So get it from table, as the core code does. This will allow
removing the dedicated semaphore from table_for_tests in the future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:32:32 +03:00
Pavel Emelyanov
35f7ada949 table_for_tests: Get table directory from table itself
Making sstable for a table needs passing table directory as an argument.
Current table_for_tests's helper gets the directory from table config,
but the very same path sits on the table itself. This makes testing code
to construct sstable look closer to the core code and is also the
prerequisite for removing the table config from table_for_tests in the
future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:30:59 +03:00
Pavel Emelyanov
769d9f17eb table_for_tests: Reuse cache tracker from sstables manager
When making table object it needs the cache tracker reference. The
table_for_tests keeps one on board, but the very same object already
sits on the sstables manager which has public getter.

This makes the table_for_tests's cache tracker object not needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:29:49 +03:00
Pavel Emelyanov
89e253c77e table_for_tests: Remove unused constructor
No code constructs it with just sstables manager argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:29:29 +03:00
Pavel Emelyanov
cba8f633f1 tests: Split the compaction backlog test case
To improve parallelism of embedded test sub-cases.
By coincidence, indentation fix is not required.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:27:57 +03:00
Pavel Emelyanov
8d704f2532 sstable_test_env: Coroutinize and move to .cc test_env::stop()
It's going to get larger, so better to move.
Also when coroutinized it's going to be easier to extend.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:26:58 +03:00
Kefu Chai
89a75967b1 build: ignore FileExistsError when creating compile_commands.json
before this change, we only check the existence of compile_commands.json
before creating a symlink to build/*/compile_commands.json. but there are
chances that multiple ninja tasks are calling into `configure.py` for
updating `build.ninja`: this does not break the process, as the last one
wins: we just unconditionally `mv build.ninja.new build.ninja` for
updating this file. but this could break the build of
`compile_commands.json`: we create a symlink with Python, and if it
fails the Python script errors out.

in this change, we just ignore the `FileExistsError` when creating
the symlink to `compile_commands.json`, because if this symlink exists,
we've achieved the goal, and should not consider it a failure.
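
The approach can be sketched as follows. This is a minimal illustration, not the actual `configure.py` code: the helper name `ensure_symlink` and the paths are hypothetical.

```python
import os
import tempfile


def ensure_symlink(target: str, link_name: str) -> None:
    """Create link_name -> target, tolerating a concurrent creator.

    Hypothetical helper for illustration; not taken from configure.py.
    """
    try:
        os.symlink(target, link_name)
    except FileExistsError:
        # Another process (or an earlier run) already created the link,
        # so the goal is achieved and this is not treated as a failure.
        pass


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "build", "compile_commands.json")
        link = os.path.join(d, "compile_commands.json")
        ensure_symlink(target, link)
        ensure_symlink(target, link)  # the second call is a no-op
```

Catching the exception, rather than checking for existence first, avoids the window between the check and the creation in which another process could create the link.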

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15870
2023-10-30 23:47:48 +02:00
Anna Stuchlik
d4b1e8441a doc: add the latest AWS image info to Installation
This commit adds the AWS image information for
the latest patch release to the Launch on AWS
page in the installation section.

This is a follow-up PR required to finalize
the AWS installation docs and should be
backported to branch-5.4.

Related:
https://github.com/scylladb/scylladb/pull/14153
https://github.com/scylladb/scylladb/pull/15651

Closes scylladb/scylladb#15867
2023-10-30 23:41:23 +02:00
Avi Kivity
949e9f1205 Merge 'Nodetool additional commands 3/N' from Botond Dénes
This PR implements the following new nodetool commands:
* cleanup
* clearsnapshot
* listsnapshots

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#15843

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the listsnapshots command
  tools/scylla-nodetool: implement clearsnapshot command
  tools/scylla-nodetool: implement the cleanup command
  test/nodetool: rest_api_mock: add more options for multiple requests
  tools/scylla-nodetool: log responses with trace level
2023-10-30 21:53:36 +02:00
Avi Kivity
5a7d15a666 Update seastar submodule
* seastar 17183ed4e4...830ce86738 (6):
  > coroutine: fix use-after-free in parallel_for_each
  > build: do not provide zlib as an ingredient
  > http: do not use req.content_length as both input parameter
  > io_tester: disable -Wuninitialized when including boost.accumulators
  > scheduling: revise the doxygen comment of create_scheduling_group()
  > Merge 'Added ability to configure different credentials per HTTP listeners' from Michał Maślanka

Closes scylladb/scylladb#15871
2023-10-30 21:39:12 +02:00
Avi Kivity
03a801b61b Merge 'Nodetools docs improvements 1/N' from Botond Dénes
While working on https://github.com/scylladb/scylladb/issues/15588, I noticed problems with the existing documentation, when comparing it with the actual code.
This PR contains fixes for nodetool compact, stop and scrub.

Closes scylladb/scylladb#15636

* github.com:scylladb/scylladb:
  docs: nodetool compact: remove common arguments
  docs: nodetool stop: fix compaction types and examples
  docs: nodetool compact: remove unsupported partition option
2023-10-30 20:17:14 +02:00
Pavel Emelyanov
c88de8f91e test/compaction: Use shorter make_table_for_tests() overload
There's one that doesn't need tempdir path argument since it gets one
from the env onboard tempdir anyway

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15825
2023-10-30 20:16:29 +02:00
Paweł Zakrzewski
384427bd02 doc: Replace instances of SimpleStrategy with NetworkTopologyStrategy
The goal is to make the available defaults safe for future use, as they
are often taken from existing config files or documentation verbatim.

Referenced issue: #14290

Closes scylladb/scylladb#15856
2023-10-30 20:15:48 +02:00
Pavel Emelyanov
7fa7a9495d task_manager: Don't leave task_ttl uninitialized
When task_manager is constructed without config (tests) its task_ttl is
left uninitialized (i.e. a random number gets in there). This results
in tasks staying registered for an arbitrarily long time,
making a long-living task manager look hung.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15859
2023-10-30 20:15:05 +02:00
Kefu Chai
d01b9f95a0 build: cmake: disable sanitize-address-use-after-scope only when needed
we enable sanitizers only in Debug and Sanitize build modes. if we pass
`-fno-sanitize-address-use-after-scope` to the compiler when the
sanitizers are not enabled, Clang complains like:

```
clang-16: error: argument unused during compilation: '-fno-sanitize-address-use-after-scope' [-Werror,-Wunused-command-line-argument]
```

this breaks the build on the build modes where sanitizers are not
enabled.

so, in this change, we only disable the sanitize-address-use-after-scope
sanitizer if the sanitizers are enabled.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15868
2023-10-30 20:14:12 +02:00
Anna Stuchlik
9f85b1dc38 doc: remove version "5.3" from the docs
Version 5.3 was never released. This commit
removes mentions of the version from the docs.
2023-10-30 15:56:53 +01:00
Anna Stuchlik
8723f71a3d doc: remove the 5.2-to-5.3 upgrade guide
Version 5.3 was never released, so the upgrade
guide must be removed.
2023-10-30 15:47:23 +01:00
Marcin Maliszkiewicz
3992d1c2ce alternator: add support for ReturnValuesOnConditionCheckFailure feature
As announced in https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-dynamodb-cost-failed-conditional-writes/,
DynamoDB added a new option for write operations (PutItem, UpdateItem, or DeleteItem),
ReturnValuesOnConditionCheckFailure, which if set to ALL_OLD returns the
current value of the item - but only if a condition check failed.

Fixes https://github.com/scylladb/scylladb/issues/14481
2023-10-30 15:33:56 +01:00
Marcin Maliszkiewicz
b4c77a373d alternator: add ability to send additional fields in api_error
While it may not be explicitly documented, DynamoDB sometimes enriches error
messages with additional fields. For instance, when ConditionalCheckFailedException
occurs while ReturnValuesOnConditionCheckFailure is set, it will add an Item object;
similarly, for TransactionCanceledException it will add a CancellationReasons object.
There may be more cases like this, so a generic JSON field is added to our error class.

The change will be used by future commit implementing ReturnValuesOnConditionCheckFailure
feature.
2023-10-30 15:13:06 +01:00
Calle Wilund
b9e57583f3 scylla-sstable: Use tool-global config + extensions
Uses a single db::config + extensions, allowing both handling
of enterprise-only scylla.yaml keys, as well as loading sstables
utilizing extension in that universe.
2023-10-30 10:22:12 +00:00
Calle Wilund
6de4e7af21 tools: Add db config + extensions to tool app run
Initializes extensions for tools runs, allowing potentially more interaction
with, say, sstables in some versions of scylla.
2023-10-30 10:20:53 +00:00
Avi Kivity
d450a145ce Revert "Merge 'reduce announcements of the automatic schema changes ' from Patryk Jędrzejczak"
This reverts commit 4b80130b0b, reversing
changes made to a5519c7c1f. It's suspected
of causing dtest failures due to a bug in coroutine::parallel_for_each.
2023-10-29 18:32:06 +02:00
Wojciech Mitros
f08e7aad61 test: account for multiple flushes of commitlog segments
Currently, when we calculate the number of deactivated segments
in test_commitlog_delete_when_over_disk_limit, we only count the
segments that were active during the first flush. However, during
the test, there may have been more than one flush, and a segment
could have been created between them. This segment would sometimes
get deactivated and even destroyed, and as a result, the count of
destroyed segments would appear larger than the count of deactivated
ones.

This patch fixes this behavior by accounting for all segments that
were active during any flush instead of just segments active during
the first flush.

Fixes #10527

Closes scylladb/scylladb#14610
2023-10-29 18:30:32 +02:00
Michał Chojnowski
93ea3d41d8 position_in_partition: make operator= exception-safe
The copy assignment operator of _ck can throw
after _type and _bound_weight have already been changed.
This leaves position_in_partition in an inconsistent state,
potentially leading to various weird symptoms.

The problem was witnessed by test_exception_safety_of_reads.
Specifically: in cache_flat_mutation_reader::add_to_buffer,
which requires the assignment to _lower_bound to be exception-safe.

The easy fix is to perform the only potentially-throwing step first.
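
The ordering idea can be illustrated with a small Python analogy. The actual fix is in C++; the `Position` and `FailingKey` classes below are hypothetical and only mimic the shape of the problem: do the step that may fail first, and update the remaining fields only after it succeeded.

```python
import copy


class Position:
    """Toy stand-in for an object with a kind, a weight and a key."""

    def __init__(self, kind, bound_weight, key):
        self.kind = kind
        self.bound_weight = bound_weight
        self.key = key

    def assign(self, other):
        # The step that may raise goes first; self is still untouched here.
        new_key = copy.deepcopy(other.key)
        # The remaining steps are plain attribute rebinds.
        self.kind = other.kind
        self.bound_weight = other.bound_weight
        self.key = new_key


class FailingKey:
    """Key whose copy always fails, to simulate an allocation failure."""

    def __deepcopy__(self, memo):
        raise MemoryError("simulated copy failure")
```

If the copy raises, the target keeps its old kind, weight and key, so it never ends up with a mix of old and new fields.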

Fixes #15822

Closes scylladb/scylladb#15864
2023-10-29 18:30:32 +02:00
Andrii Patsula
5807ef0bb7 test: Verify server exit code during graceful process shutdown.
Currently, it's possible for a test to pass even if the server crashes
during a graceful shutdown. Additionally, the server may crash in the
middle of a test, resulting in a test failure with an inaccurate
description.  This commit updates the test framework to monitor the
server's return code and throw an exception in the event of an abnormal
server shutdown.
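
A minimal sketch of such a check, assuming the server is a child process managed through Python's `subprocess` module. The helper name `wait_and_check` is hypothetical and is not the test framework's actual API.

```python
import subprocess
import sys


def wait_and_check(proc: subprocess.Popen, timeout: float = 30.0) -> None:
    """Wait for a process that was asked to stop; raise on abnormal exit.

    Hypothetical helper for illustration only.
    """
    returncode = proc.wait(timeout=timeout)
    if returncode != 0:
        # On POSIX a negative value means the process was killed by a signal.
        raise RuntimeError(f"server exited abnormally with code {returncode}")


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "raise SystemExit(0)"])
    wait_and_check(child)  # a clean exit does not raise
```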

Fixes scylladb/scylla#15365

Closes scylladb/scylladb#15660
2023-10-29 18:30:32 +02:00
Kefu Chai
2be5a86a14 test/pylib: unset the env variables set by MinIoServer
before this change, when running object_store tests with `pytest`
directly, an instance of MinIoServer is started as a function-scope
fixture, but the environment variables set by it stay with the
process, even after the fixture is torn down. So, when the 2nd test
in the same process checks these environment variables, it is
under the impression that there is already an S3 server running,
thinks it is driven by `test.py`, and hence tries to reuse the S3 server.
But the MinIoServer instance was already torn down when
the first test completed.

So the test is likely to fail when the Scylla instance tries
to read the missing conf file previously created by the MinIoServer.

after this change, the environment variables are reset, so they
won't be seen by the succeeding tests in the same pytest session.
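
The pattern of restoring the environment on teardown can be sketched like this. It is a minimal illustration rather than the MinIoServer code; the helper `scoped_env` and the variable name `EXAMPLE_S3_ADDRESS` are made up.

```python
import os
from contextlib import contextmanager


@contextmanager
def scoped_env(**variables: str):
    """Set environment variables and restore the previous state on exit.

    Hypothetical helper for illustration only.
    """
    saved = {name: os.environ.get(name) for name in variables}
    os.environ.update(variables)
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                # The variable did not exist before, so remove it again.
                os.environ.pop(name, None)
            else:
                os.environ[name] = old


if __name__ == "__main__":
    with scoped_env(EXAMPLE_S3_ADDRESS="127.0.0.1"):
        print(os.environ["EXAMPLE_S3_ADDRESS"])
    print("EXAMPLE_S3_ADDRESS" in os.environ)
```

A function-scope fixture could wrap its setup and teardown in such a context manager, so that later tests in the same process do not see stale values.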

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15779
2023-10-29 18:30:32 +02:00
Botond Dénes
132ae92c75 Merge 'build: extract code fragments into functions' from Kefu Chai
this series is one of the steps to remove global statements in `configure.py`.

not only is the script more structured this way, but this also allows us to quickly identify the parts which should/can be reused when migrating to a CMake-based build system.

Refs #15379

Closes scylladb/scylladb#15818

* github.com:scylladb/scylladb:
  build: move the code with side effects into a single function
  build: create outdir when outdir is explicitly used
  build: group the code with side effects together
  build: do not rely on updating global with a dict
  build: extract generate_version() out
  build: extract get_release_cxxflags() out
  build: extract get_extra_cxxflags() out
  build: move thrift_libs to where it is used
  build: move pkg closer to where it is used
  build: remove unused variable
  build: move variable closer to where it is used
2023-10-29 18:30:32 +02:00
Avi Kivity
e349a2657c Merge 'Allow running perf-simple-query with tablets' from Tomasz Grabiec
Usage:

```
build/dev/scylla perf-simple-query --tablets
```

Closes scylladb/scylladb#15656

* github.com:scylladb/scylladb:
  perf_simple_query: Allow running with tablets
  tests: cql_test_env: Allow creating keyspace with tablets
  tests: cql_test_env: Register storage_service in migration notifier
  test: cql_test_env: Initialize node state in topology
2023-10-29 18:30:32 +02:00
Aleksandr Bykov
6b991b4791 doc: add note about running test.py with toolchain/dbuild
test.py tests can be run with toolchain/dbuild, and in this case
there is no need to execute ./install-dependencies.sh.

Closes scylladb/scylladb#15837
2023-10-29 18:30:32 +02:00
Kefu Chai
3a6e359328 build: cmake: add token_metadata.cc to api
`token_metadata.cc` moved into api in e4c0a4d34d, let's update CMake
accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15857
2023-10-29 18:30:32 +02:00
Kefu Chai
8819865c8d build: cmake: correct the variable names in mode.Dev.cmake
it was a copy-pasta error.

- s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/
- s/Seastar_OptimizationLevel_RELEASE/Seastar_OptimizationLevel_DEV/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15849
2023-10-29 18:30:32 +02:00
Kamil Braun
1c0ae2e7ef Merge 'raft topology: assign tokens after join node response rpc' from Piotr Dulikowski
Currently, when the topology coordinator accepts a node, it moves it to bootstrap state and assigns tokens to it (either new ones during bootstrap, or the replaced node's tokens). Only then does it contact the joining node to tell it about the decision and let it perform a read barrier.

However, this means that the tokens are inserted too early. After inserting the tokens the cluster is free to route write requests to it, but it might not have learned about all of the schema yet.

Fix the issue by inserting the tokens later, after completing the join node response RPC which forces the receiving node to perform a read barrier.

Refs: scylladb/scylladb#15686
Fixes: scylladb/scylladb#15738

Closes scylladb/scylladb#15724

* github.com:scylladb/scylladb:
  test: test_topology_ops: continuously write during the test
  raft topology: assign tokens after join node response rpc
  storage_service: fix indentation after previous commit
  raft topology: loosen assumptions about transition nodes having tokens
2023-10-29 18:30:32 +02:00
Marcin Maliszkiewicz
020a9c931b db: view: run local materialized view mutations on a separate smp service group
When a base write triggers an MV write that needs to be sent to another
shard, it used the same service group and we could end up with a
deadlock.

This fix also affects alternator's secondary indexes.

Testing was done using a not yet committed framework for easy alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I've changed the hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:

./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000

Without the patch, when scylla is overloaded (i.e. the number of scheduled futures is close to max_nonlocal_requests), after a couple of seconds
scylla hangs, cpu usage drops to zero, and no progress is made. We can confirm we're hitting this issue by seeing under gdb:

p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0

With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency,
but I think it's hitting a different limit, as there wasn't any depleted
smp service group semaphore and it was also happening on non-MV loads.

Fixes https://github.com/scylladb/scylladb/issues/15844

Closes scylladb/scylladb#15845
2023-10-29 18:30:32 +02:00
Patryk Jędrzejczak
a6236072ee raft topology: join_node_request_handler: wait until first node becomes normal
We need to wait until the first node becomes normal in
`join_node_request_handler` to ensure that joining nodes are not
handled as the first node in the cluster.

If we placed a join request before the first node becomes normal,
the topology coordinator would incorrectly skip the join node
handshake in `handle_node_transition` (`case node_state::none`).
It would happen because the topology coordinator decides whether
a node is the first in the cluster by checking if there are no
normal nodes. Therefore, we must ensure at least one normal node
when the topology coordinator handles a join request for a
non-first node.

We change the previous check because it can return true if there
are no normal nodes. `topology::is_empty` would also return false
if the first node was still new or in transition.

Additionally, calling `join_node_request_handler` before the first
node sets itself as normal is frequent during concurrent bootstrap,
so we remove "unlikely" from the comment.

Fixes: scylladb/scylladb#15807

Closes scylladb/scylladb#15775
2023-10-29 18:30:32 +02:00
Botond Dénes
16ce212c31 tools/scylla-nodetool: implement the listsnapshots command
The output is changed slightly, compared to the current nodetool:
* Number columns are aligned to the right
* Number columns don't have decimal places
* There are no trailing whitespaces
2023-10-27 01:26:54 -04:00
Botond Dénes
27854a50be tools/scylla-nodetool: implement clearsnapshot command 2023-10-27 01:26:54 -04:00
Botond Dénes
b32ee54ba0 tools/scylla-nodetool: implement the cleanup command
The --jobs command-line argument is accepted but ignored, just like the
current nodetool does.
2023-10-27 01:26:53 -04:00
Botond Dénes
7e3a78d73d test/nodetool: rest_api_mock: add more options for multiple requests
Change the current bool multiple param to a weak enum, allowing for a
third value: ANY, which allows for 0 matches too.
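
One way to picture the three-valued parameter is sketched below. Only the `ANY` value and its "0 matches too" meaning come from the commit message; the names `Multiplicity`, `ONE`, `MULTIPLE` and `count_is_acceptable`, the exact semantics of the first two values, and the use of Python's `Enum` (rather than a weak enum) are assumptions made for illustration.

```python
from enum import Enum


class Multiplicity(Enum):
    ONE = "one"            # assumed: exactly one matching request
    MULTIPLE = "multiple"  # assumed: one or more matching requests
    ANY = "any"            # zero or more matching requests


def count_is_acceptable(multiplicity: Multiplicity, matches: int) -> bool:
    """Return True if the number of matched requests satisfies the setting."""
    if multiplicity is Multiplicity.ONE:
        return matches == 1
    if multiplicity is Multiplicity.MULTIPLE:
        return matches >= 1
    return matches >= 0


if __name__ == "__main__":
    print(count_is_acceptable(Multiplicity.ANY, 0))
```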
2023-10-26 08:31:12 -04:00
Botond Dénes
b878dcc1c3 tools/scylla-nodetool: log responses with trace level
With this, both requests and responses to/from the remote are logged
when trace-level logging is enabled. This should greatly simplify
debugging any problems.
2023-10-26 08:28:37 -04:00
Anna Stuchlik
eb57c3bc22 doc: remove versions from Materialized Views
This commit removes irrelevant information
about versions from the Materialized Views
page (CQL Reference).
In addition, it replaces "Scylla" with
"ScyllaDB" on MV-related pages.
2023-10-26 12:08:13 +02:00
Anna Stuchlik
29bd044db3 doc: add CQL Reference for Materialized Views
This commit adds CQL Reference for Materialized
Views to the Materialized Views page.
2023-10-26 11:47:22 +02:00
Kefu Chai
227136ddf5 main.cc: specify shortname for scheduling groups
so, for instance, the logging message looks like:
```
INFO  2023-10-24 15:19:37,290 [shard 0:strm] storage_service - entering STARTING mode
```
instead of
```
INFO  2023-10-24 15:19:37,290 [shard 0:stre] storage_service - entering STARTING mode
```

Fixes #15267
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15821
2023-10-26 10:52:05 +03:00
Kefu Chai
d43afd576e cql3/restrictions/statement_restrictions: s/allow filtering/ALLOW FILTERING/
use the capitalized "ALLOW FILTERING" in the error message. because the
error message is a part of the user interface, it would be better to
keep it aligned with our documentation, where "ALLOW FILTERING" is used.

so, in this change, the lower-cased "allow filtering" error message is
changed to "ALLOW FILTERING", and the tests are updated accordingly.

see also a0ffbf3291

Refs #14321
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15718
2023-10-26 10:00:37 +03:00
Kefu Chai
bfd99fad7f build: move the code with side effects into a single function
so that we can optionally utilize CMake for generating the building
system instead.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
85cc9073c9 build: create outdir when outdir is explicitly used
actually we've created outdir when using it as the parent directory
of `tempfile.tempdir`, but there are many places where we use
`tempfile.tempdir` for, for instance, testing the compiler flags,
and these tests will be removed once we migrate to CMake, so they
do not really matter when reviewing the change which migrates to
CMake.

the point of this change is to help the reviewer understand the major
changes performed by the migration.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
6c7cc927b5 build: group the code with side effects together
so we can move them into a single function

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
a375ce2ac1 build: do not rely on updating global with a dict
we use `globals().update(vars(args))` for updating the global variables
with a dict in `args`, this is convenient, but it hurts the readability.
let's reference the parsed options explicitly.
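
A minimal sketch of the difference, assuming an `argparse`-based script. The option names `--mode` and `--out` and the helper functions are hypothetical and not taken from `configure.py`.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # Hypothetical options, chosen only for illustration.
    parser.add_argument("--mode", default="dev")
    parser.add_argument("--out", dest="outdir", default="build")
    return parser.parse_args(argv)


def describe(args: argparse.Namespace) -> str:
    # Implicit style (what the commit moves away from):
    #     globals().update(vars(args))
    #     ... later code uses bare `mode` and `outdir` ...
    # Explicit style: it is obvious where each value comes from.
    return f"{args.outdir}/{args.mode}"


if __name__ == "__main__":
    print(describe(parse_args(["--mode", "release"])))
```

With explicit references, linters and readers can trace each name to the parser, and functions can receive `args` as a parameter instead of relying on module-level state.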

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
a25a153e9f build: extract generate_version() out
so we do fewer things with side effects in the global scope.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
cb6531b1a8 build: extract get_release_cxxflags() out
prepare for the change to read the SCYLLA-*-FILE in functions instead
of doing this in global scope.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
ec7ac3c750 build: extract get_extra_cxxflags() out
on top of per-mode cxxflags, we apply more of them based on settings
and building environment. to reduce the statements in global scope,
let's extract the related code into a function.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
8646e6c5d1 build: move thrift_libs to where it is used
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 11:47:38 +08:00
Kefu Chai
8b76f2a835 build: move pkg closer to where it is used
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 11:47:37 +08:00
Kefu Chai
ea6bf6b908 build: remove unused variable
`optional_packages` was introduced in 8b0a26f06d, but we don't
offer the alternative versions of libsystemd anymore, and this
variable is not used in `configure.py`, so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 11:47:37 +08:00
Kefu Chai
846218a8bc build: move variable closer to where it is used
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 11:47:37 +08:00
Yaniv Kaul
600822379d Docs: small typo in cql extensions page
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#15840
2023-10-25 17:27:04 +03:00
Botond Dénes
5d1e9d8c46 Merge 'Sanitize API -> token_metadata dependency' from Pavel Emelyanov
This is the continuation for 19fc01be23

Registering API handlers for services needs to

* use only the required service (sharded<> one if needed)
* get the service to handle requests via argument, not from http context (http context, in turn, is not going to depend on anything)

There are several endpoints scattered over storage_service and snitch that use token metadata and topology. This PR makes those endpoints work the described way and drop the api::ctx -> token_metadata dependency.

Closes scylladb/scylladb#15831

* github.com:scylladb/scylladb:
  api: Remove http::context -> token_metadata dependency
  api: Pass shared_token_metadata instead of storage_service
  api: Move snitch endpoints that use token metadata only
  api: Move storage_service endpoints that use token metadata only
2023-10-25 17:19:39 +03:00
Anna Stuchlik
ad29ba4cad doc: add info about encrypted tables to Backup
This commit updates the introduction of the Backup Your Data page to include information about encryption.

Fixes https://github.com/scylladb/scylladb/issues/15573

Closes scylladb/scylladb#15612
2023-10-25 17:15:15 +03:00
Avi Kivity
782c6a208a Merge 'cql3: mutation_fragments_select_statement: keep erm alive for duration of the query' from Botond Dénes
Said statement keeps a reference to erm indirectly, via a topology node pointer, but doesn't keep erm alive. This can result in use-after-free. Furthermore, it allows for vnodes being pulled from under the query's feet, as it is running.
To prevent this, keep the erm alive for the duration of the query.
Also, use `host_id` instead of `node`, the node pointer is not needed really, as the statement only uses the host id from it.

Fixes: #15802

Closes scylladb/scylladb#15808

* github.com:scylladb/scylladb:
  cql3: mutation_fragments_select_statement: use host_id instead of node
  cql3: mutation_fragments_select_statement: pin erm reference
2023-10-25 15:03:07 +03:00
Gleb Natapov
9f6e93c144 raft: make sure that all operations forwarded to a leader are completed before destroying raft server
Hold a gate around all operations that are forwarded to a leader to be
able to wait for them during server::abort(). Otherwise, abort() may
complete while those operations are still running, which may cause a
use-after-free.
2023-10-25 13:29:36 +03:00
Gleb Natapov
ba044b769a storage_service: raft topology: remove code duplication from global_tablet_token_metadata_barrier
global_token_metadata_barrier and global_tablet_token_metadata_barrier
are doing practically the same thing now. Call the former from the
latter.
2023-10-25 13:29:36 +03:00
Gleb Natapov
72419f1a61 tests: add tests for streaming failure in bootstrap/replace/remove/decommission 2023-10-25 13:29:36 +03:00
Gleb Natapov
b072ddd8a7 test/pylib: do not stop node if decommission failed with an expected error 2023-10-25 13:03:57 +03:00
Gleb Natapov
cee7aab32c storage_service: raft topology: fix typo in "decommission" everywhere 2023-10-25 13:03:57 +03:00
Gleb Natapov
0201304096 storage_service: raft topology: add streaming error injection
Add error injection into the stream_ranges topology command.
2023-10-25 13:03:57 +03:00
Gleb Natapov
ba217d9341 storage_service: raft topology: do not increase topology version during CDC repair
CDC repair operation does not change the topology, but it goes through
the same state as bootstrap that does. Distinguish between two cases and
increment the topology version only in the case of the bootstrap.
2023-10-25 13:03:56 +03:00
Gleb Natapov
8e393ea750 storage_service: raft topology: rollback topology operation on streaming failure.
Currently, if streaming fails during a topology operation, the streaming
is retried until it succeeds. If it never succeeds, it will be
retried forever. There is no way to stop the topology operation.

This patch introduces a rollback mechanism on streaming failure. If
streaming fails during bootstrap/replace, the bootstrapping/replacing node
is moved to the left_token_ring state (and then the left state)
and the operation has to be restarted after removing the data directory. If
streaming fails during decommission/remove, the node is moved back to
normal and the operation needs to be restarted after the failure reason
is eliminated.
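
The rollback rules described above can be summarized as a small lookup table. This is an illustrative sketch only: the state names follow the commit message, while the dictionary and function names are hypothetical and do not reflect the C++ implementation.

```python
# State a node is moved to when streaming fails, keyed by the operation.
ROLLBACK_STATE = {
    "bootstrap": "left_token_ring",  # then "left"; restart after removing data
    "replace": "left_token_ring",    # same as bootstrap
    "decommission": "normal",        # retry once the failure cause is fixed
    "remove": "normal",              # same as decommission
}


def state_after_streaming_failure(operation: str) -> str:
    """Return the state a node is moved to when streaming fails."""
    return ROLLBACK_STATE[operation]


if __name__ == "__main__":
    for op in ROLLBACK_STATE:
        print(op, "->", state_after_streaming_failure(op))
```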
2023-10-25 13:03:55 +03:00
Gleb Natapov
0a8c3e5c78 storage_service: raft topology: load request parameters in left_token_ring state as well
Next patch will want to access request parameters in left_token_ring for
failure recovery purposes.
2023-10-25 12:56:27 +03:00
Gleb Natapov
49b6153d27 storage_service: raft topology: do not report term_changed_error during global_token_metadata_barrier as an error
Term change is not an error. Do not report it as such.
2023-10-25 12:56:27 +03:00
Gleb Natapov
5b760572df storage_service: raft topology: change global_token_metadata_barrier error handling to try/catch
Currently we get a future and check whether it has failed, but with
coroutines this complication is not needed. And since the next patch
needs to filter out some errors, try/catch will be more effective.
2023-10-25 12:56:27 +03:00
Gleb Natapov
466fe35474 storage_service: raft topology: make global_token_metadata_barrier node independent
We want to use global_token_metadata_barrier without the node, so make
it accept guard and excluded nodes directly.
2023-10-25 12:56:26 +03:00
Gleb Natapov
a49ae3ff87 storage_service: raft topology: split get_excluded_nodes from exec_global_command
Will be used later.
2023-10-25 12:56:26 +03:00
Gleb Natapov
897a7e599a storage_service: raft topology: drop unused include_local and do_retake parameters from exec_global_command which are always true 2023-10-25 12:56:26 +03:00
Gleb Natapov
7f1aa41e86 storage_service: raft topology: simplify streaming RPC failure handling
Currently streaming failure handling is different for "removing" and all
other operations. Unify them in one try/catch.
2023-10-25 12:56:26 +03:00
Piotr Dulikowski
a3ba4b3109 test: test_topology_ops: continuously write during the test
In order to detect issues where requests are routed incorrectly during
topology changes, modify the test_topology_ops test so that it runs a
background process that continuously writes while the test performs
topology changes in the cluster.

At the end of the test, check that:

- All writes were successful (we only require CL=LOCAL_ONE)
- There are no errors from the replica-side logic in the nodes'
  logs (which happen e.g. when a node receives writes before learning
  about the schema)
2023-10-25 11:50:17 +02:00
Piotr Dulikowski
63aa9332aa raft topology: assign tokens after join node response rpc
Currently, when the topology coordinator accepts a node, it moves it to
bootstrap state and assigns tokens to it (either new ones during
bootstrap, or the replaced node's tokens). Only then does it contact the
joining node to tell it about the decision and let it perform a read
barrier.

However, this means that the tokens are inserted too early. After
inserting the tokens the cluster is free to route write requests to the
joining node, but that node might not have learned about all of the
schema yet.

Fix the issue by inserting the tokens later, after completing the join
node response RPC which forces the receiving node to perform a read
barrier.
2023-10-25 11:50:17 +02:00
Piotr Dulikowski
46fce4cff3 storage_service: fix indentation after previous commit 2023-10-25 11:50:17 +02:00
Piotr Dulikowski
2d161676c7 raft topology: loosen assumptions about transition nodes having tokens
In later commits, tokens for a joining/replacing node will not be
inserted when the node enters `bootstrapping`/`replacing` state but at
some later step of the procedure. Loosen some of the assumptions in
`storage_service::topology_state_load` and
`system_keyspace::load_topology_state` appropriately.
2023-10-25 11:50:17 +02:00
Anna Stuchlik
e223624e2e doc: fix the Reference page layout
This commit fixes the layout of the Reference
page. Previously, the toctree level was "2",
which made the page hard to navigate.
This PR changes the level to "1".

In addition, the capitalization of page
titles is fixed.

This is a follow-up PR to the ones that
created and updated the Reference section.
It must be backported to branch-5.4.

Closes scylladb/scylladb#15830
2023-10-25 12:15:27 +03:00
Botond Dénes
ceb866fa2e Merge 'Make s3 upload sink PUT small objects' from Pavel Emelyanov
When the upload sink is flushed, it may notice that the upload has not yet been started and fall back to a plain PUT in that case. This makes uploading small files much nicer, because a multipart upload would take 3 API calls (start, part, complete) in this case.

fixes: #13014

Closes scylladb/scylladb#15824

* github.com:scylladb/scylladb:
  test: Add s3_client test for upload PUT fallback
  s3/client: Add PUT fallback to upload sink
2023-10-25 10:03:46 +03:00
Pavel Emelyanov
cb63d303f0 test: Make test_sstables_excluding_staging_correctness run over s3 too
This test checks the way sstable is moved and lives in staging state.
Now it passes on S3 as well

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 19:12:37 +03:00
Pavel Emelyanov
d827068d01 sstables,s3: Support state change (without generation change)
Now when the system.sstables has the state field, it can be changed
(UPDATEd). However, when changing the state AND generation, this still
won't work, because generation is the clustering key of the table in
question and cannot be just changed. This, nonetheless, is OK, as
generation changes with state only when moving an sstable from upload
dir into normal/staging and this is separate issue for S3 (#13018). For
now changing state only is OK.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 19:12:37 +03:00
Pavel Emelyanov
ca5d3d217f system_keyspace: Add state field to system.sstables
The state is one of <empty>(normal)/staging/quarantine. Currently, when
an sstable is moved to a non-normal state, the s3 backend state_change()
call throws, so such sstables do not appear. The next patches are going to
change that, and the new field in system.sstables is needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 19:12:37 +03:00
Pavel Emelyanov
295936c1d3 sstable_directory: Tune up sstables entries processing comment
In fact, this FIXME had been fixed by 2c9ec6bc (sstable_directory:
Garbage collect S3 sstables on reboot) and is no longer valid. However,
it's still good to know if GC failed or misbehaved, so replace the
comment with a warning.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 19:12:37 +03:00
Pavel Emelyanov
e4162227ff system_keyspace: Tune up status change trace message
A very similar message tracing the state change is about to appear, so
it's good to be able to tell them apart.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 19:12:37 +03:00
Pavel Emelyanov
63758d19ce sstables: Add state string to state enum class convert
There's the backward converter already out there. Next code will need to
convert string representation of the state back to the internal type.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 19:12:37 +03:00
Pavel Emelyanov
8e1ff745fa api: Remove http::context -> token_metadata dependency
Now the token metadata usage is fine grained by the relevant endpoint
handlers only.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 17:49:05 +03:00
Pavel Emelyanov
be9ea0c647 api: Pass shared_token_metadata instead of storage_service
The token metadata endpoints need token metadata, not storage service

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 17:48:27 +03:00
Pavel Emelyanov
c23193bed0 api: Move snitch endpoints that use token metadata only
Snitch is now a service that speaks for the local node only. In order to
get dc/rack for peers in the cluster one needs to use topology which, in
turn, lives on token metadata. This patch moves the dc/rack getters to
api/token_metadata.cc next to other t.m. related endpoints.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 17:47:18 +03:00
Pavel Emelyanov
e4c0a4d34d api: Move storage_service endpoints that use token metadata only
A few of them need the storage service only to get token metadata from
it. Move them to their own .cc/.hh units.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 17:44:53 +03:00
Pavel Emelyanov
caa3e751f7 test: Add s3_client test for upload PUT fallback
The test case creates a non-jumbo upload sink and puts some bytes into it,
then flushes. In order to make sure the fallback did take place, the
multipart memory tracker semaphore is broken in advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 15:03:53 +03:00
Kamil Braun
db49ccccb0 view: remove unused _backing_secondary_index
This boolean was only used for a sanity check which was replaced with a
stronger sanity check in the previous commit that doesn't require the
boolean.
2023-10-24 13:33:36 +02:00
Kamil Braun
3976808b12 schema_tables: turn view schema fixing code into a sanity check
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.

The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).

The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.

The main motivation of this patch is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.

The migration code is now turned into a sanity check: if users
try something crazy, they will get an error instead of silent data
corruption.
2023-10-24 13:33:35 +02:00
Kamil Braun
f02ac9a9e7 schema_tables: make comment more precise
`maybe_fix_legacy_secondary_index_mv_schema` function has this piece of
code:

```
// If the first clustering key part of a view is a column with name not found in base schema,
// it implies it might be backing an index created before computed columns were introduced,
// and as such it must be recreated properly.
if (!base_schema->columns_by_name().contains(first_view_ck.name())) {
    schema_builder builder{schema_ptr(v)};
    builder.mark_column_computed(first_view_ck.name(), std::make_unique<legacy_token_column_computation>());
   if (preserve_version) {
       builder.with_version(v->version());
   }
   return view_ptr(builder.build());
}
```

The comment uses the phrase "it might be".
However, the code inside the `if` assumes that it "must be": once we
determined that the first column in this materialized view does not have
a corresponding name in the base table, we set it to be computed using
`legacy_token_column_computation`, so we assumed that the column was
indeed storing the token. Doing that for a column which is not the token
column would be a small disaster.

Assuming that the code is correct, we can make the comment more precise.

I checked the documentation and I don't see any other way how we could
have such a column other than the token column which is internally
created by Scylla when creating a secondary index (for example, it is
forbidden to use an alias in select statement when creating materialized
views, which I checked experimentally).
2023-10-24 13:30:13 +02:00
Kamil Braun
5397524875 feature_service: make COMPUTED_COLUMNS feature unconditionally true
The feature is assumed to be true, it was introduced in 2019.
It's still advertised in gossip, but it's assumed to always be present.

The `schema_feature` enum class still contains `COMPUTED_COLUMNS`,
and the `all_tables` function in schema_tables.cc still checks for the
schema feature when deciding if `computed_columns()` table should be
included. This is necessary because digest calculation tests contain
many digests calculated with the feature disabled, if we wanted to make
it unconditional in the schema_tables code we'd have to regenerate
almost all digests in the tests. It is simpler to leave the possibility
for the tests to disable the feature.
2023-10-24 13:30:13 +02:00
Pavel Emelyanov
63f2bdca01 s3/client: Add PUT fallback to upload sink
When the non-jumbo sink is flushed and notices that the real upload is
not started yet, it may just go ahead and PUT the buffers into the
object with the single request.

For the jumbo sink the fallback is not implemented, as it likely doesn't
make any sense -- jumbo sinks are unlikely to produce less than 5Mb of
data so it's going to be dead code anyway.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 10:59:46 +03:00
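The flush-time fallback described in the commit above can be sketched with a toy sink. This is a minimal sketch under stated assumptions: `FakeClient`, `UploadSink`, their method names, and the 5 MiB part threshold are all invented for illustration and do not mirror the actual `s3/client` code.

```python
class FakeClient:
    """Stand-in for an object-storage client; it only records API calls."""

    def __init__(self):
        self.calls = []

    def put_object(self, name, data):
        self.calls.append("put")

    def start_upload(self, name):
        self.calls.append("start")
        return "upload-id"

    def upload_part(self, upload_id, data):
        self.calls.append("part")

    def complete_upload(self, upload_id):
        self.calls.append("complete")


class UploadSink:
    PART_SIZE = 5 * 1024 * 1024  # assumed part threshold for this sketch

    def __init__(self, client, name):
        self.client = client
        self.name = name
        self.buf = bytearray()
        self.upload_id = None  # None means the multipart upload has not started

    def write(self, data):
        self.buf += data
        if len(self.buf) >= self.PART_SIZE:
            if self.upload_id is None:
                self.upload_id = self.client.start_upload(self.name)
            self.client.upload_part(self.upload_id, bytes(self.buf))
            self.buf.clear()

    def flush(self):
        if self.upload_id is None:
            # Upload never started: one PUT instead of start + part + complete.
            self.client.put_object(self.name, bytes(self.buf))
        else:
            if self.buf:
                self.client.upload_part(self.upload_id, bytes(self.buf))
            self.client.complete_upload(self.upload_id)
        self.buf.clear()
```

For a sink that receives only a few bytes before `flush()`, the recorded calls are a single `put`, which is the saving the commit message refers to.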
Botond Dénes
23898581d5 cql3: mutation_fragments_select_statement: use host_id instead of node
The statement only uses the node to get its host_id later. It is simpler
to obtain and store only the host_id in the first place.
2023-10-24 03:12:58 -04:00
Botond Dénes
3cb1669340 cql3: mutation_fragments_select_statement: pin erm reference
This query bypasses the usual read-path in storage-proxy and therefore
also misses the erm pinning done by storage-proxy. To avoid a vnode
being pulled from under its feet, do the erm pinning in the statement
itself.
2023-10-24 03:12:36 -04:00
Botond Dénes
8180f61147 test/boost/multishard_mutation_query_test: fix querier cache misses expectations
There are two tests, test_read_all and
test_read_with_partition_row_limits, which assert on every page, as well
as at the end, that there are no misses whatsoever. This is incorrect,
because it is possible that on a given page not all shards participate,
and thus there won't be a saved reader on every shard. On the subsequent
page, a shard without a reader may produce a miss. This is fine.
Refine the asserts to check that there are at most as many misses as
there are shards without readers.
2023-10-23 08:07:14 -04:00
Botond Dénes
0a34f29ea5 test/lib/test_utils: add require_* variants for all comparators
Not just equal. This allows for better error messages, printing both
values and the failed relation operator, instead of a generic fail
message.
2023-10-23 07:52:38 -04:00
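The idea behind the `require_*` variants in the commit above is that a failed check should report both operands and the relation that failed. A rough Python analogue follows; the helper names here are hypothetical and are not the C++ test utilities themselves.

```python
import operator


def _require(lhs, rhs, op, op_name):
    # Report both values and the failed relation instead of a generic message.
    if not op(lhs, rhs):
        raise AssertionError(f"check failed: {lhs!r} {op_name} {rhs!r}")


def require_equal(lhs, rhs):
    _require(lhs, rhs, operator.eq, "==")


def require_less(lhs, rhs):
    _require(lhs, rhs, operator.lt, "<")


def require_less_equal(lhs, rhs):
    _require(lhs, rhs, operator.le, "<=")


def require_greater(lhs, rhs):
    _require(lhs, rhs, operator.gt, ">")


def require_greater_equal(lhs, rhs):
    _require(lhs, rhs, operator.ge, ">=")
```

A failing `require_less(3, 2)` then raises an error whose message names `3`, `<` and `2`, which is more useful in a test log than a bare failure.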
Botond Dénes
f811a63e1b docs: nodetool compact: remove common arguments
These are already documented in the nodetool index page. The list in the
nodetool index page is less informative, so copy the list from nodetool
compact over there.
2023-10-20 10:16:42 -04:00
Botond Dénes
397f67990f docs: nodetool stop: fix compaction types and examples
Nodetool doesn't recognize RESHARD, even though ScyllaDB supports
stopping RESHARD compaction.
Remove VALIDATE from the list - ScyllaDB doesn't support it.
Add a note about the unimplemented --id option.
Fix the examples, they are broken.
Fix the entry in the nodetool command list, the command is called
`stop`, not `stop compaction`.
2023-10-20 10:15:47 -04:00
Botond Dénes
70ba6b94c3 docs: nodetool compact: remove unsupported partition option
This option is not supported by either the nodetool frontend, nor
ScyllaDB itself. Remove it.
Also improve the wording on the unsupported options.
2023-10-20 10:15:44 -04:00
Aleksandra Martyniuk
56221f2161 test: test abort of compaction task that isn't started yet
Test whether a task whose parent was aborted has a proper status.
2023-10-19 10:47:20 +02:00
Aleksandra Martyniuk
520d9db92d test: test running compaction task abort
Test whether a task which is aborted while running has a proper status.
2023-10-19 10:47:20 +02:00
Aleksandra Martyniuk
b91064bd2a tasks: fail if a task was aborted
run() method of task_manager::task::impl does not have to throw when
a task is aborted with task manager api. Thus, a user will see that
the task finished successfully which makes it inconsistent.

Finish a task with a failure if it was aborted with task manager api.
2023-10-19 10:47:20 +02:00
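The behaviour introduced by the commit above, where a task that was aborted is reported as failed even if its body returned normally, can be sketched as follows. The `Task` class, its state strings, and its methods are hypothetical and only loosely modelled on the description of `task_manager::task::impl`.

```python
class Task:
    """Toy task: `run` is a callable that receives the task itself."""

    def __init__(self, run):
        self._run = run
        self.aborted = False
        self.state = "created"

    def abort(self):
        self.aborted = True

    def start(self):
        self.state = "running"
        try:
            self._run(self)
        except Exception:
            self.state = "failed"
            return
        # The body did not throw, but an aborted task must not be
        # reported as having finished successfully.
        self.state = "failed" if self.aborted else "done"
```

Without the final check, a body that simply stops early on abort would leave the task in the "done" state, which is the inconsistency the commit message describes.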
Aleksandra Martyniuk
0681795417 compaction: abort task manager compaction tasks
Set top level compaction tasks as abortable.

Compaction tasks which have no children, i.e. compaction task
executors, have the abort method overridden to stop compaction data.
2023-10-19 10:47:17 +02:00
Tomasz Grabiec
7862ffbd14 perf_simple_query: Allow running with tablets 2023-10-06 23:49:15 +02:00
Tomasz Grabiec
0edb39715d tests: cql_test_env: Allow creating keyspace with tablets 2023-10-06 23:49:15 +02:00
Tomasz Grabiec
0ff10c72de tests: cql_test_env: Register storage_service in migration notifier
The procedure in main already does this.

Processing of tablet metadata on schema changes relies on
this. Without this, creating a tablet-based table will fail on missing
tablet map in token metadata because the listener in storage service
does not fire.
2023-10-06 23:49:15 +02:00
Tomasz Grabiec
3c0d723ad4 test: cql_test_env: Initialize node state in topology
This is necessary for using tablets with cql_test_env in tools like
perf-simple-query.

Otherwise, the test will fail with:

  Shard count not known for node c06a7e7f-ee6c-44e5-9257-09cdc5b2bb10

The existing tablets_test works because it creates its own topology
bypassing the one in storage_service.
2023-10-06 23:49:15 +02:00
Benny Halevy
bec489409e row_cache: abort on exteral_updater::execute errors
Currently the cache updaters aren't exception safe
yet they are intended to be.

Instead of allowing exceptions from
`external_updater::execute` to escape `row_cache::update`,
abort using `on_fatal_internal_error`.

Future changes should harden all `execute` implementations
to effectively make them `noexcept`, then the pure virtual
definition can be made `noexcept` to cement that.

Fixes scylladb/scylladb#15576

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-28 09:11:04 +03:00
Benny Halevy
80bba3d4b7 row_cache: do_update: simplify _prev_snapshot_pos setup
ring_position::min() is noexcept since 6d7ae4ead1
So no need to call it outside of the critical noexcept block.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-28 08:21:30 +03:00
Takuya ASADA
ea61b14f27 scylla_swap_setup: use fallocate on ext4
We stopped using fallocate for allocating swap since it does not work on
xfs (#6650).
However, dd is much slower than fallocate since it fills the file with
data. Let's use fallocate when the filesystem is ext4, since it actually
works there and is faster.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2023-02-01 01:58:13 +09:00
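The choice described in the commit above can be sketched as a function that picks the allocation command by filesystem type. This is only an illustration: `swapfile_alloc_command` is a hypothetical helper, and the exact command lines used by `scylla_swap_setup` are not shown in this log.

```python
def swapfile_alloc_command(fs_type: str, path: str, size_mib: int) -> list[str]:
    """Return an argv list that allocates a swap file of `size_mib` MiB.

    Hypothetical helper: it only builds the command and does not run it.
    """
    if fs_type == "ext4":
        # fallocate reserves the blocks without writing data, so it is fast.
        return ["fallocate", "-l", f"{size_mib}MiB", path]
    # Fallback for filesystems where fallocate-backed swap does not work
    # (the commit cites xfs): write zeros with dd, which is slower.
    return ["dd", "if=/dev/zero", f"of={path}", "bs=1M", f"count={size_mib}"]
```

Returning an argv list rather than a shell string keeps the sketch free of quoting concerns if the result is later passed to `subprocess.run`.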
Takuya ASADA
dffadabb94 scylla_swap_setup: run error check before allocating swap
We should run the error check before running dd, otherwise it will leave
a swapfile on disk without completing swap setup.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2023-02-01 01:58:13 +09:00
1678 changed files with 66948 additions and 25912 deletions

.github/CODEOWNERS vendored

@@ -19,13 +19,13 @@ db/batch* @elcallio
service/storage_proxy* @gleb-cloudius
# COMPACTION
compaction/* @raphaelsc @nyh
compaction/* @raphaelsc
# CQL TRANSPORT LAYER
transport/*
# CQL QUERY LANGUAGE
cql3/* @tgrabiec @cvybhu @nyh
cql3/* @tgrabiec
# COUNTERS
counters* @jul-stas
@@ -45,44 +45,44 @@ dist/docker/*
utils/logalloc* @tgrabiec
# MATERIALIZED VIEWS
db/view/* @nyh @cvybhu @piodul
cql3/statements/*view* @nyh @cvybhu @piodul
test/boost/view_* @nyh @cvybhu @piodul
db/view/* @nyh @piodul
cql3/statements/*view* @nyh @piodul
test/boost/view_* @nyh @piodul
# PACKAGING
dist/* @syuu1228
# REPAIR
repair/* @tgrabiec @asias @nyh
repair/* @tgrabiec @asias
# SCHEMA MANAGEMENT
db/schema_tables* @tgrabiec @nyh
db/legacy_schema_migrator* @tgrabiec @nyh
service/migration* @tgrabiec @nyh
schema* @tgrabiec @nyh
db/schema_tables* @tgrabiec
db/legacy_schema_migrator* @tgrabiec
service/migration* @tgrabiec
schema* @tgrabiec
# SECONDARY INDEXES
index/* @nyh @cvybhu @piodul
cql3/statements/*index* @nyh @cvybhu @piodul
test/boost/*index* @nyh @cvybhu @piodul
index/* @nyh @piodul
cql3/statements/*index* @nyh @piodul
test/boost/*index* @nyh @piodul
# SSTABLES
sstables/* @tgrabiec @raphaelsc @nyh
sstables/* @tgrabiec @raphaelsc
# STREAMING
streaming/* @tgrabiec @asias
service/storage_service.* @tgrabiec @asias
# ALTERNATOR
alternator/* @nyh @havaker @nuivall
test/alternator/* @nyh @havaker @nuivall
alternator/* @havaker @nuivall
test/alternator/* @havaker @nuivall
# HINTED HANDOFF
db/hints/* @piodul @vladzcloudius @eliransin
# REDIS
redis/* @nyh @syuu1228
test/redis/* @nyh @syuu1228
redis/* @syuu1228
test/redis/* @syuu1228
# READERS
reader_* @denesb

.github/mergify.yml vendored Normal file

@@ -0,0 +1,67 @@
pull_request_rules:
  - name: put PR in draft if conflicts
    conditions:
      - label = conflicts
      - author = mergify[bot]
      - head ~= ^mergify/
    actions:
      edit:
        draft: true
  - name: Delete mergify backport branch
    conditions:
      - base~=branch-
      - or:
        - merged
        - closed
    actions:
      delete_head_branch:
  - name: Automate backport pull request 5.2
    conditions:
      - or:
        - closed
        - merged
      - or:
        - base=master
        - base=next
      - label=backport/5.2 # The PR must have this label to trigger the backport
      - label=promoted-to-master
    actions:
      copy:
        title: "[Backport 5.2] {{ title }}"
        body: |
          {{ body }}
          {% for c in commits %}
          (cherry picked from commit {{ c.sha }})
          {% endfor %}
          Refs #{{number}}
        branches:
          - branch-5.2
        assignees:
          - "{{ author }}"
  - name: Automate backport pull request 5.4
    conditions:
      - or:
        - closed
        - merged
      - or:
        - base=master
        - base=next
      - label=backport/5.4 # The PR must have this label to trigger the backport
      - label=promoted-to-master
    actions:
      copy:
        title: "[Backport 5.4] {{ title }}"
        body: |
          {{ body }}
          {% for c in commits %}
          (cherry picked from commit {{ c.sha }})
          {% endfor %}
          Refs #{{number}}
        branches:
          - branch-5.4
        assignees:
          - "{{ author }}"

.github/scripts/label_promoted_commits.py vendored Executable file

@@ -0,0 +1,59 @@
import requests
from github import Github
import argparse
import re
import sys
import os
try:
    github_token = os.environ["GITHUB_TOKEN"]
except KeyError:
    print("Please set the 'GITHUB_TOKEN' environment variable")
    sys.exit(1)
def parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--repository', type=str, required=True,
                        help='Github repository name (e.g., scylladb/scylladb)')
    parser.add_argument('--commit_before_merge', type=str, required=True, help='Git commit ID to start labeling from ('
                                                                               'newest commit).')
    parser.add_argument('--commit_after_merge', type=str, required=True,
                        help='Git commit ID to end labeling at (oldest '
                             'commit, exclusive).')
    parser.add_argument('--update_issue', type=bool, default=False, help='Set True to update issues when backport was '
                                                                         'done')
    parser.add_argument('--label', type=str, required=True, help='Label to use')
    return parser.parse_args()
def main():
    args = parser()
    pr_pattern = re.compile(r'Closes .*#([0-9]+)')
    g = Github(github_token)
    repo = g.get_repo(args.repository, lazy=False)
    commits = repo.compare(head=args.commit_after_merge, base=args.commit_before_merge)
    # Print commit information
    for commit in commits.commits:
        print(commit.sha)
        match = pr_pattern.search(commit.commit.message)
        if match:
            pr_number = match.group(1)
            url = f'https://api.github.com/repos/{args.repository}/issues/{pr_number}/labels'
            data = {
                "labels": [f'{args.label}']
            }
            headers = {
                "Authorization": f"token {github_token}",
                "Accept": "application/vnd.github.v3+json"
            }
            response = requests.post(url, headers=headers, json=data)
            if response.ok:
                print(f"Label added successfully to {url}")
            else:
                print(f"No label was added to {url}")
if __name__ == "__main__":
    main()
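The script above finds the pull request to label by searching each commit message for a `Closes ...#<number>` trailer. The matching step can be tried in isolation with the sketch below; `pr_number_from_message` is a hypothetical wrapper added for the example, while the pattern is the one from the script.

```python
import re

# Same pattern as in label_promoted_commits.py.
PR_PATTERN = re.compile(r'Closes .*#([0-9]+)')


def pr_number_from_message(message):
    """Return the PR number from a commit message's "Closes" trailer, or None."""
    match = PR_PATTERN.search(message)
    return int(match.group(1)) if match else None


message = (
    "migration_manager: Pull all of group0 state on repair\n"
    "\n"
    "Fixes #18002\n"
    "\n"
    "Closes scylladb/scylladb#18175\n"
)
print(pr_number_from_message(message))
```

Since `.` does not match a newline by default, the pattern only looks at the line containing `Closes`, so the `Fixes #18002` line does not affect the result.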

.github/scripts/sync_labels.py vendored Executable file

@@ -0,0 +1,95 @@
#!/usr/bin/env python3
import argparse
import os
import sys
from github import Github
import re
try:
    github_token = os.environ["GITHUB_TOKEN"]
except KeyError:
    print("Please set the 'GITHUB_TOKEN' environment variable")
    sys.exit(1)
def parser():
    parse = argparse.ArgumentParser()
    parse.add_argument('--repo', type=str, required=True, help='Github repository name (e.g., scylladb/scylladb)')
    parse.add_argument('--number', type=int, required=True, help='Pull request or issue number to sync labels from')
    parse.add_argument('--label', type=str, default=None, help='Label to add/remove from an issue or PR')
    parse.add_argument('--is_issue', action='store_true', help='Determined if label change is in Issue or not')
    parse.add_argument('--action', type=str, choices=['opened', 'labeled', 'unlabeled'], required=True, help='Sync labels action')
    return parse.parse_args()
def copy_labels_from_linked_issues(repo, pr_number):
    pr = repo.get_pull(pr_number)
    if pr.body:
        linked_issue_numbers = set(re.findall(r'Fixes:? (?:#|https.*?/issues/)(\d+)', pr.body))
        for issue_number in linked_issue_numbers:
            try:
                issue = repo.get_issue(int(issue_number))
                for label in issue.labels:
                    pr.add_to_labels(label.name)
                print(f"Labels from issue #{issue_number} copied to PR #{pr_number}")
            except Exception as e:
                print(f"Error processing issue #{issue_number}: {e}")
def get_linked_pr_from_issue_number(repo, number):
    linked_prs = []
    for pr in repo.get_pulls(state='all', base='master'):
        if pr.body and f'{number}' in pr.body:
            linked_prs.append(pr.number)
            break
        else:
            continue
    return linked_prs
def get_linked_issues_based_on_pr_body(repo, number):
    pr = repo.get_pull(number)
    repo_name = repo.full_name
    pattern = rf"(?:fix(?:|es|ed)|resolve(?:|d|s))\s*:?\s*(?:(?:(?:{repo_name})?#)|https://github\.com/{repo_name}/issues/)(\d+)"
    issue_number_from_pr_body = []
    if pr.body is None:
        return issue_number_from_pr_body
    matches = re.findall(pattern, pr.body, re.IGNORECASE)
    if matches:
        for match in matches:
            issue_number_from_pr_body.append(match)
            print(f"Found issue number: {match}")
    return issue_number_from_pr_body
def sync_labels(repo, number, label, action, is_issue=False):
    if is_issue:
        linked_prs_or_issues = get_linked_pr_from_issue_number(repo, number)
    else:
        linked_prs_or_issues = get_linked_issues_based_on_pr_body(repo, number)
    for pr_or_issue_number in linked_prs_or_issues:
        if is_issue:
            target = repo.get_issue(pr_or_issue_number)
        else:
            target = repo.get_issue(int(pr_or_issue_number))
        if action == 'labeled':
            target.add_to_labels(label)
            print(f"Label '{label}' successfully added.")
        elif action == 'unlabeled':
            target.remove_from_labels(label)
            print(f"Label '{label}' successfully removed.")
        elif action == 'opened':
            copy_labels_from_linked_issues(repo, number)
        else:
            print("Invalid action. Use 'labeled', 'unlabeled' or 'opened'.")
def main():
    args = parser()
    github = Github(github_token)
    repo = github.get_repo(args.repo)
    sync_labels(repo, args.number, args.label, args.action, args.is_issue)
if __name__ == "__main__":
    main()
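The issue-reference pattern used by `get_linked_issues_based_on_pr_body` above can be exercised without the GitHub API. In the sketch below, `linked_issue_numbers` is a hypothetical standalone wrapper; the regular expression itself is copied from the script.

```python
import re


def linked_issue_numbers(body, repo_name):
    """Return issue numbers (as strings) referenced by fix/resolve keywords."""
    pattern = rf"(?:fix(?:|es|ed)|resolve(?:|d|s))\s*:?\s*(?:(?:(?:{repo_name})?#)|https://github\.com/{repo_name}/issues/)(\d+)"
    return re.findall(pattern, body or "", re.IGNORECASE)


repo = "scylladb/scylladb"
print(linked_issue_numbers("Fixes #18002", repo))
print(linked_issue_numbers("Fixes: scylladb/scylladb#15576", repo))
print(linked_issue_numbers("resolved https://github.com/scylladb/scylladb/issues/13014", repo))
print(linked_issue_numbers("Refs #1", repo))
```

The three reference styles (bare `#N`, `owner/repo#N`, and a full issue URL) are all accepted, and matching is case-insensitive; a plain `Refs #1` yields an empty list.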


@@ -0,0 +1,26 @@
name: Check if commits are promoted
on:
  push:
    branches:
      - master
jobs:
  check-commit:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      issues: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Fetch all history for all tags and branches
      - name: Install dependencies
        run: sudo apt-get install -y python3-github
      - name: Run python script
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: python .github/scripts/label_promoted_commits.py --commit_before_merge ${{ github.event.before }} --commit_after_merge ${{ github.event.after }} --repository ${{ github.repository }} --label promoted-to-master


@@ -0,0 +1,26 @@
name: Fixes validation for backport PR
on:
  pull_request:
    types: [opened, reopened, edited]
    branches: [branch-*]
jobs:
  check-fixes-prefix:
    runs-on: ubuntu-latest
    steps:
      - name: Check PR body for "Fixes" prefix patterns
        uses: actions/github-script@v7
        with:
          script: |
            const body = context.payload.pull_request.body;
            const repo = context.payload.repository.full_name;
            // Regular expression pattern to check for "Fixes" prefix
            // Adjusted to dynamically insert the repository full name
            const pattern = `Fixes:? (?:#|${repo.replace('/', '\\/')}#|https://github\\.com/${repo.replace('/', '\\/')}/issues/)(\\d+)`;
            const regex = new RegExp(pattern);
            if (!regex.test(body)) {
              core.setFailed("PR body does not contain a valid 'Fixes' reference.");
            }

.github/workflows/codespell.yaml vendored Normal file

@@ -0,0 +1,17 @@
name: codespell
on:
  pull_request:
    branches:
      - master
permissions: {}
jobs:
  codespell:
    name: Check for spelling errors
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: codespell-project/actions-codespell@master
        with:
          only_warn: 1
          ignore_words_list: "ans,datas,fo,ser,ue,crate,nd,reenable,strat,stap,te,raison"
          skip: "./.git,./build,./tools,*.js,*.thrift,*.lock,./test,./licenses,./redis/lolwut.cc,*.svg"


@@ -1,17 +0,0 @@
name: "Docs / Amplify enhanced"
on: issue_comment
jobs:
  build:
    runs-on: ubuntu-latest
    if: ${{ github.event.issue.pull_request }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Amplify enhanced
        env:
          TOKEN: ${{ secrets.GITHUB_TOKEN }}
        uses: scylladb/sphinx-scylladb-theme/.github/actions/amplify-enhanced@master


@@ -4,12 +4,14 @@ name: "Docs / Publish"
env:
FLAG: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'opensource' }}
DEFAULT_BRANCH: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'master' }}
on:
push:
branches:
- 'master'
- 'enterprise'
- 'branch-**'
paths:
- "docs/**"
workflow_dispatch:
@@ -19,14 +21,15 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
ref: ${{ env.DEFAULT_BRANCH }}
persist-credentials: false
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v3
uses: actions/setup-python@v5
with:
python-version: 3.7
python-version: "3.10"
- name: Set up env
run: make -C docs FLAG="${{ env.FLAG }}" setupenv
- name: Build docs


@@ -18,14 +18,14 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
persist-credentials: false
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v3
uses: actions/setup-python@v5
with:
python-version: 3.7
python-version: "3.10"
- name: Set up env
run: make -C docs FLAG="${{ env.FLAG }}" setupenv
- name: Build docs


@@ -0,0 +1,22 @@
name: PR require backport label
on:
  pull_request:
    types: [opened, labeled, unlabeled, synchronize]
    branches:
      - master
      - next
jobs:
  label:
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    permissions:
      issues: write
      pull-requests: write
    steps:
      - uses: mheap/github-action-required-labels@v5
        with:
          mode: minimum
          count: 1
          labels: "backport/none\nbackport/\\d.\\d"
          use_regex: true
          add_comment: false
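The `labels` value in the workflow above decodes to two patterns, `backport/none` and `backport/\d.\d`, and the job requires at least one matching label. The rule can be approximated in Python as below. This is a sketch under assumptions: `has_backport_label` is a hypothetical helper, and it assumes each pattern is applied as an unanchored regular-expression search, which may differ from how the action actually matches.

```python
import re

# The two patterns from the workflow's `labels` input, one per line.
PATTERNS = [r"backport/none", r"backport/\d.\d"]


def has_backport_label(labels, minimum=1):
    """Return True if at least `minimum` labels match one of the patterns."""
    matching = [
        label for label in labels
        if any(re.search(pattern, label) for pattern in PATTERNS)
    ]
    return len(matching) >= minimum


print(has_backport_label(["backport/5.4", "area/docs"]))
print(has_backport_label(["bug"]))
```

Note that the `.` in `backport/\d.\d` is unescaped, so as a regular expression it matches any single character between the two digits, not only a literal dot.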

.github/workflows/sync-labels.yaml vendored Normal file

@@ -0,0 +1,45 @@
name: Sync labels
on:
pull_request_target:
types: [opened, labeled, unlabeled]
branches: [master, next]
issues:
types: [labeled, unlabeled]
jobs:
label-sync:
if: ${{ github.repository == 'scylladb/scylladb' }}
name: Synchronize labels between PR and the issue(s) fixed by it
runs-on: ubuntu-latest
permissions:
pull-requests: write
issues: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
sparse-checkout: |
.github/scripts/sync_labels.py
sparse-checkout-cone-mode: false
- name: Install dependencies
run: sudo apt-get install -y python3-github
- name: Pull request opened event
if: ${{ github.event.action == 'opened' }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.number }} --action ${{ github.event.action }}
- name: Pull request labeled or unlabeled event
if: github.event_name == 'pull_request' && startsWith(github.event.label.name, 'backport/')
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.number }} --action ${{ github.event.action }} --label ${{ github.event.label.name }}
- name: Issue labeled or unlabeled event
if: github.event_name == 'issues' && startsWith(github.event.label.name, 'backport/')
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.issue.number }} --action ${{ github.event.action }} --is_issue --label ${{ github.event.label.name }}
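
The step conditions above can be read as a small dispatch table. The following Python sketch mirrors the three `if:` expressions as written; the function name is hypothetical, and lowercasing is used on the assumption that GitHub expression string comparisons and `startsWith` ignore case.

```python
def steps_to_run(event_name, action, label_name=""):
    """Return the names of the sync steps whose `if:` condition holds."""
    event_name = event_name.lower()
    action = action.lower()
    label_name = label_name.lower()

    steps = []
    # if: github.event.action == 'opened'
    if action == "opened":
        steps.append("Pull request opened event")
    # if: github.event_name == 'pull_request' && startsWith(label, 'backport/')
    if event_name == "pull_request" and label_name.startswith("backport/"):
        steps.append("Pull request labeled or unlabeled event")
    # if: github.event_name == 'issues' && startsWith(label, 'backport/')
    if event_name == "issues" and label_name.startswith("backport/"):
        steps.append("Issue labeled or unlabeled event")
    return steps
```

The second condition compares against `pull_request`, while the workflow is triggered by `pull_request_target`; whether that step fires depends on the value GitHub reports in `github.event_name` for that trigger.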


@@ -10,22 +10,39 @@ list(APPEND CMAKE_MODULE_PATH
# Set the possible values of build type for cmake-gui
set(scylla_build_types
"Debug" "Release" "Dev" "Sanitize" "Coverage")
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS
${scylla_build_types})
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "Release" CACHE
STRING "Choose the type of build." FORCE)
message(WARNING "CMAKE_BUILD_TYPE not specified, Using 'Release'")
elseif(NOT CMAKE_BUILD_TYPE IN_LIST scylla_build_types)
message(FATAL_ERROR "Unknown CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}. "
"Following types are supported: ${scylla_build_types}")
endif()
string(TOUPPER "${CMAKE_BUILD_TYPE}" build_mode)
include(mode.${build_mode})
"Debug" "RelWithDebInfo" "Dev" "Sanitize" "Coverage")
if(DEFINED CMAKE_BUILD_TYPE)
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS
${scylla_build_types})
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "RelWithDebInfo" CACHE
STRING "Choose the type of build." FORCE)
message(WARNING "CMAKE_BUILD_TYPE not specified, Using 'RelWithDebInfo'")
elseif(NOT CMAKE_BUILD_TYPE IN_LIST scylla_build_types)
message(FATAL_ERROR "Unknown CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}. "
"Following types are supported: ${scylla_build_types}")
endif()
endif(DEFINED CMAKE_BUILD_TYPE)
include(mode.common)
if(CMAKE_CONFIGURATION_TYPES)
foreach(config ${CMAKE_CONFIGURATION_TYPES})
include(mode.${config})
list(APPEND scylla_build_modes ${scylla_build_mode_${config}})
endforeach()
add_custom_target(mode_list
COMMAND ${CMAKE_COMMAND} -E echo "$<JOIN:${scylla_build_modes}, >"
COMMENT "List configured modes"
BYPRODUCTS mode-list.phony.stamp
COMMAND_EXPAND_LISTS)
else()
include(mode.${CMAKE_BUILD_TYPE})
add_custom_target(mode_list
${CMAKE_COMMAND} -E echo "${scylla_build_mode}"
COMMENT "List configured modes")
endif()
add_compile_definitions(
${Seastar_DEFINITIONS_${build_mode}}
FMT_DEPRECATED_OSTREAM)
include(limit_jobs)
# Configure Seastar compile options to align with Scylla
@@ -37,11 +54,17 @@ set(Seastar_TESTING ON CACHE BOOL "" FORCE)
set(Seastar_API_LEVEL 7 CACHE STRING "" FORCE)
set(Seastar_APPS ON CACHE BOOL "" FORCE)
set(Seastar_EXCLUDE_APPS_FROM_ALL ON CACHE BOOL "" FORCE)
set(Seastar_EXCLUDE_TESTS_FROM_ALL ON CACHE BOOL "" FORCE)
set(Seastar_UNUSED_RESULT_ERROR ON CACHE BOOL "" FORCE)
add_subdirectory(seastar)
# System libraries dependencies
find_package(Boost REQUIRED
COMPONENTS filesystem program_options system thread regex unit_test_framework)
target_link_libraries(Boost::regex
INTERFACE
ICU::i18n
ICU::uc)
find_package(Lua REQUIRED)
find_package(ZLIB REQUIRED)
find_package(ICU COMPONENTS uc i18n REQUIRED)
@@ -108,6 +131,32 @@ target_link_libraries(scylla-main
Snappy::snappy
systemd
ZLIB::ZLIB)
option(Scylla_CHECK_HEADERS
"Add check-headers target for checking the self-containness of headers")
if(Scylla_CHECK_HEADERS)
add_custom_target(check-headers)
# compatibility target used by CI, which builds "check-headers" only for
# the "Dev" mode.
# our CI currently builds "dev-headers" using ninja without specify a build
# mode. where "dev" is actually a prefix encoded in the target name for the
# underlying "headers" target. while we don't have this convention in CMake
# targets. in contrast, the "check-headers" which is built for all
# configurations defined by "CMAKE_DEFAULT_CONFIGS". however, we only need
# to build "check-headers" for the "Dev" configuration. Therefore, before
# updating the CI to use build "check-headers:Dev", let's add a new target
# that specifically builds "check-headers" only for Dev configuration. The
# new target will do nothing for other configurations.
add_custom_target(dev-headers
COMMAND ${CMAKE_COMMAND}
"$<IF:$<CONFIG:Dev>,--build;${CMAKE_BINARY_DIR};--config;$<CONFIG>;--target;check-headers,-E;echo;skipping;dev-headers;in;$<CONFIG>>"
COMMAND_EXPAND_LISTS)
endif()
include(check_headers)
check_headers(check-headers scylla-main
GLOB ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
add_subdirectory(api)
add_subdirectory(alternator)
add_subdirectory(db)
@@ -185,10 +234,6 @@ target_link_libraries(scylla PRIVATE
transport
types
utils)
target_link_libraries(Boost::regex
INTERFACE
ICU::i18n
ICU::uc)
target_link_libraries(scylla PRIVATE
seastar


@@ -28,7 +28,7 @@ The files created are:
By default, these files are created in the 'build'
subdirectory under the directory containing the script.
The destination directory can be overriden by
The destination directory can be overridden by
using '-o PATH' option.
END
)
@@ -87,7 +87,6 @@ then
else
SCYLLA_VERSION=$VERSION
if [ -z "$SCYLLA_RELEASE" ]; then
DATE=$(date --utc +%Y%m%d)
GIT_COMMIT=$(git -C "$SCRIPT_DIR" log --pretty=format:'%h' -n 1 --abbrev=12)
# For custom package builds, replace "0" with "counter.your_name",
# where counter starts at 1 and increments for successive versions.


@@ -28,3 +28,6 @@ target_link_libraries(alternator
idl
Seastar::seastar
xxHash::xxhash)
check_headers(check-headers alternator
GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)


@@ -7,19 +7,17 @@
*/
#include "alternator/error.hh"
#include "auth/common.hh"
#include "log.hh"
#include <string>
#include <string_view>
#include "bytes.hh"
#include "alternator/auth.hh"
#include <fmt/format.h>
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "auth/roles-metadata.hh"
#include "service/storage_proxy.hh"
#include "alternator/executor.hh"
#include "cql3/selection/selection.hh"
#include "query-result-set.hh"
#include "cql3/result_set.hh"
#include <seastar/core/coroutine.hh>
@@ -27,8 +25,8 @@ namespace alternator {
static logging::logger alogger("alternator-auth");
future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::string username) {
schema_ptr schema = proxy.data_dictionary().find_schema("system_auth", "roles");
future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::service& as, std::string username) {
schema_ptr schema = proxy.data_dictionary().find_schema(auth::get_auth_ks_name(as.query_processor()), "roles");
partition_key pk = partition_key::from_single_value(*schema, utf8_type->decompose(username));
dht::partition_range_vector partition_ranges{dht::partition_range(dht::decorate_key(*schema, pk))};
std::vector<query::clustering_range> bounds{query::clustering_range::make_open_ended_both_sides()};


@@ -9,10 +9,8 @@
#pragma once
#include <string>
#include <string_view>
#include <array>
#include "gc_clock.hh"
#include "utils/loading_cache.hh"
#include "auth/service.hh"
namespace service {
class storage_proxy;
@@ -22,6 +20,6 @@ namespace alternator {
using key_cache = utils::loading_cache<std::string, std::string, 1>;
future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::string username);
future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::service& as, std::string username);
}


@@ -6,12 +6,9 @@
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include <list>
#include <map>
#include <string_view>
#include "alternator/conditions.hh"
#include "alternator/error.hh"
#include "cql3/constants.hh"
#include <unordered_map>
#include "utils/rjson.hh"
#include "serialization.hh"
@@ -342,7 +339,7 @@ static bool check_NOT_NULL(const rjson::value* val) {
}
// Only types S, N or B (string, number or bytes) may be compared by the
// various comparion operators - lt, le, gt, ge, and between.
// various comparison operators - lt, le, gt, ge, and between.
// Note that in particular, if the value is missing (v->IsNull()), this
// check returns false.
static bool check_comparable_type(const rjson::value& v) {


@@ -18,8 +18,6 @@
#pragma once
#include "cql3/restrictions/statement_restrictions.hh"
#include "serialization.hh"
#include "expressions_types.hh"
namespace alternator {


@@ -73,11 +73,11 @@ future<> controller::start_server() {
// shards - if necessary for LWT.
smp_service_group_config c;
c.max_nonlocal_requests = 5000;
_ssg = create_smp_service_group(c).get0();
_ssg = create_smp_service_group(c).get();
rmw_operation::set_default_write_isolation(_config.alternator_write_isolation());
net::inet_address addr = utils::resolve(_config.alternator_address, family).get0();
net::inet_address addr = utils::resolve(_config.alternator_address, family).get();
auto get_cdc_metadata = [] (cdc::generation_service& svc) { return std::ref(svc.get_cdc_metadata()); };
auto get_timeout_in_ms = [] (const db::config& cfg) -> utils::updateable_value<uint32_t> {


@@ -10,6 +10,7 @@
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
#include "utils/rjson.hh"
namespace alternator {
@@ -27,10 +28,16 @@ public:
status_type _http_code;
std::string _type;
std::string _msg;
api_error(std::string type, std::string msg, status_type http_code = status_type::bad_request)
// Additional data attached to the error, null value if not set. It's wrapped in copyable_value
// class because copy constructor is required for exception classes otherwise it won't compile
// (despite that its use may be optimized away).
rjson::copyable_value _extra_fields;
api_error(std::string type, std::string msg, status_type http_code = status_type::bad_request,
rjson::value extra_fields = rjson::null_value())
: _http_code(std::move(http_code))
, _type(std::move(type))
, _msg(std::move(msg))
, _extra_fields(std::move(extra_fields))
{ }
// Factory functions for some common types of DynamoDB API errors
@@ -58,8 +65,13 @@ public:
static api_error access_denied(std::string msg) {
return api_error("AccessDeniedException", std::move(msg));
}
static api_error conditional_check_failed(std::string msg) {
return api_error("ConditionalCheckFailedException", std::move(msg));
static api_error conditional_check_failed(std::string msg, rjson::value&& item) {
if (!item.IsNull()) {
auto tmp = rjson::empty_object();
rjson::add(tmp, "Item", std::move(item));
item = std::move(tmp);
}
return api_error("ConditionalCheckFailedException", std::move(msg), status_type::bad_request, std::move(item));
}
static api_error expired_iterator(std::string msg) {
return api_error("ExpiredIteratorException", std::move(msg));
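
The new `conditional_check_failed()` factory wraps a non-null item in an object under the key `Item` before passing it on as the error's extra fields. A Python sketch of just that wrapping, with `None` standing in for a null `rjson::value`; the function name is hypothetical, and how the extra fields reach the HTTP response is not shown in this diff.

```python
def conditional_check_failed_extra_fields(item):
    """Return the extra fields for the error: None, or the item wrapped as {"Item": item}."""
    if item is None:
        # A null item means no extra fields are attached.
        return None
    return {"Item": item}
```

An empty item (`{}`) is not null, so it is still wrapped and reported as `{"Item": {}}`.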


@@ -6,13 +6,11 @@
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include "utils/base64.hh"
#include <seastar/core/sleep.hh>
#include "alternator/executor.hh"
#include "db/config.hh"
#include "log.hh"
#include "schema/schema_builder.hh"
#include "data_dictionary/keyspace_metadata.hh"
#include "exceptions/exceptions.hh"
#include "timestamp.hh"
#include "types/map.hh"
@@ -21,17 +19,13 @@
#include "query-result-reader.hh"
#include "cql3/selection/selection.hh"
#include "cql3/result_set.hh"
#include "cql3/type_json.hh"
#include "bytes.hh"
#include "cql3/update_parameters.hh"
#include "server.hh"
#include "service/pager/query_pagers.hh"
#include <functional>
#include "error.hh"
#include "serialization.hh"
#include "expressions.hh"
#include "conditions.hh"
#include "cql3/constants.hh"
#include "cql3/util.hh"
#include <optional>
#include "utils/overloaded_functor.hh"
@@ -41,6 +35,7 @@
#include "schema/schema.hh"
#include "db/tags/extension.hh"
#include "db/tags/utils.hh"
#include "replica/database.hh"
#include "alternator/rmw_operation.hh"
#include <seastar/core/coroutine.hh>
#include <boost/range/adaptors.hpp>
@@ -48,7 +43,6 @@
#include <unordered_set>
#include "service/storage_proxy.hh"
#include "gms/gossiper.hh"
#include "schema/schema_registry.hh"
#include "utils/error_injection.hh"
#include "db/schema_tables.hh"
#include "utils/rjson.hh"
@@ -77,10 +71,10 @@ static sstring_view table_status_to_sstring(table_status tbl_status) {
case table_status::deleting:
return "DELETING";
}
return "UKNOWN";
return "UNKNOWN";
}
static lw_shared_ptr<keyspace_metadata> create_keyspace_metadata(std::string_view keyspace_name, service::storage_proxy& sp, gms::gossiper& gossiper, api::timestamp_type);
static lw_shared_ptr<keyspace_metadata> create_keyspace_metadata(std::string_view keyspace_name, service::storage_proxy& sp, gms::gossiper& gossiper, api::timestamp_type, const std::map<sstring, sstring>& tags_map);
static map_type attrs_type() {
static thread_local auto t = map_type_impl::get_instance(utf8_type, bytes_type, true);
@@ -770,15 +764,33 @@ enum class update_tags_action { add_tags, delete_tags };
static void update_tags_map(const rjson::value& tags, std::map<sstring, sstring>& tags_map, update_tags_action action) {
if (action == update_tags_action::add_tags) {
for (auto it = tags.Begin(); it != tags.End(); ++it) {
const rjson::value& key = (*it)["Key"];
const rjson::value& value = (*it)["Value"];
auto tag_key = rjson::to_string_view(key);
if (tag_key.empty() || tag_key.size() > 128 || !validate_legal_tag_chars(tag_key)) {
throw api_error::validation("The Tag Key provided is invalid string");
if (!it->IsObject()) {
throw api_error::validation("invalid tag object");
}
auto tag_value = rjson::to_string_view(value);
if (tag_value.empty() || tag_value.size() > 256 || !validate_legal_tag_chars(tag_value)) {
throw api_error::validation("The Tag Value provided is invalid string");
const rjson::value* key = rjson::find(*it, "Key");
const rjson::value* value = rjson::find(*it, "Value");
if (!key || !key->IsString() || !value || !value->IsString()) {
throw api_error::validation("string Key and Value required");
}
auto tag_key = rjson::to_string_view(*key);
auto tag_value = rjson::to_string_view(*value);
if (tag_key.empty()) {
throw api_error::validation("A tag Key cannot be empty");
}
if (tag_key.size() > 128) {
throw api_error::validation("A tag Key is limited to 128 characters");
}
if (!validate_legal_tag_chars(tag_key)) {
throw api_error::validation("A tag Key can only contain letters, spaces, and [+-=._:/]");
}
// Note tag values are limited similarly to tag keys, but have a
// longer length limit, and *can* be empty.
if (tag_value.size() > 256) {
throw api_error::validation("A tag Value is limited to 256 characters");
}
if (!validate_legal_tag_chars(tag_value)) {
throw api_error::validation("A tag Value can only contain letters, spaces, and [+-=._:/]");
}
tags_map[sstring(tag_key)] = sstring(tag_value);
}
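
The order of checks in the rewritten `update_tags_map()` can be sketched in Python as follows. `_legal_tag_chars` is a hypothetical stand-in for `validate_legal_tag_chars()`, whose body is not part of this diff; accepting digits is an assumption, since the error message only mentions letters, spaces and `+-=._:/`. Note also that the C++ code limits `size()` (bytes), while `len()` here counts characters.

```python
MAX_TAG_KEY_LEN = 128
MAX_TAG_VALUE_LEN = 256
_EXTRA_LEGAL_CHARS = set("+-=._:/")


class ValidationError(ValueError):
    """Stand-in for api_error::validation."""


def _legal_tag_chars(s):
    # Hypothetical approximation of validate_legal_tag_chars().
    return all(c.isalnum() or c.isspace() or c in _EXTRA_LEGAL_CHARS for c in s)


def validate_tag(tag):
    """Validate one entry of the Tags array and return (key, value)."""
    if not isinstance(tag, dict):
        raise ValidationError("invalid tag object")
    key, value = tag.get("Key"), tag.get("Value")
    if not isinstance(key, str) or not isinstance(value, str):
        raise ValidationError("string Key and Value required")
    if not key:
        raise ValidationError("A tag Key cannot be empty")
    if len(key) > MAX_TAG_KEY_LEN:
        raise ValidationError("A tag Key is limited to 128 characters")
    if not _legal_tag_chars(key):
        raise ValidationError("A tag Key can only contain letters, spaces, and [+-=._:/]")
    # Values follow the same rules but may be empty and may be longer.
    if len(value) > MAX_TAG_VALUE_LEN:
        raise ValidationError("A tag Value is limited to 256 characters")
    if not _legal_tag_chars(value):
        raise ValidationError("A tag Value can only contain letters, spaces, and [+-=._:/]")
    return key, value
```

Compared with the removed code, a malformed entry (not an object, or a missing or non-string `Key`/`Value`) now yields a validation error instead of being dereferenced, and an empty tag value is accepted.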
@@ -994,7 +1006,7 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
add_column(view_builder, view_range_key, attribute_definitions, column_kind::clustering_key);
}
// Base key columns which aren't part of the index's key need to
// be added to the view nontheless, as (additional) clustering
// be added to the view nonetheless, as (additional) clustering
// key(s).
if (hash_key != view_hash_key && hash_key != view_range_key) {
add_column(view_builder, hash_key, attribute_definitions, column_kind::clustering_key);
@@ -1002,6 +1014,8 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
if (!range_key.empty() && range_key != view_hash_key && range_key != view_range_key) {
add_column(view_builder, range_key, attribute_definitions, column_kind::clustering_key);
}
// GSIs have no tags:
view_builder.add_extension(db::tags_extension::NAME, ::make_shared<db::tags_extension>());
sstring where_clause = format("{} IS NOT NULL", cql3::util::maybe_quote(view_hash_key));
if (!view_range_key.empty()) {
where_clause = format("{} AND {} IS NOT NULL", where_clause,
@@ -1051,7 +1065,7 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
}
add_column(view_builder, view_range_key, attribute_definitions, column_kind::clustering_key);
// Base key columns which aren't part of the index's key need to
// be added to the view nontheless, as (additional) clustering
// be added to the view nonetheless, as (additional) clustering
// key(s).
if (!range_key.empty() && view_range_key != range_key) {
add_column(view_builder, range_key, attribute_definitions, column_kind::clustering_key);
@@ -1066,6 +1080,11 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
cql3::util::maybe_quote(view_range_key));
}
where_clauses.push_back(std::move(where_clause));
// LSIs have no tags, but Scylla's "synchronous_updates" feature
// (which an LSIs need), is actually implemented as a tag so we
// need to add it here:
std::map<sstring, sstring> tags_map = {{db::SYNCHRONOUS_VIEW_UPDATES_TAG_KEY, "true"}};
view_builder.add_extension(db::tags_extension::NAME, ::make_shared<db::tags_extension>(tags_map));
view_builders.emplace_back(std::move(view_builder));
}
}
@@ -1112,7 +1131,6 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
}
const bool include_all_columns = true;
view_builder.with_view_info(*schema, include_all_columns, *where_clause_it);
view_builder.add_extension(db::tags_extension::NAME, ::make_shared<db::tags_extension>());
++where_clause_it;
}
@@ -1121,7 +1139,19 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
auto group0_guard = co_await mm.start_group0_operation();
auto ts = group0_guard.write_timestamp();
std::vector<mutation> schema_mutations;
auto ksm = create_keyspace_metadata(keyspace_name, sp, gossiper, ts);
auto ksm = create_keyspace_metadata(keyspace_name, sp, gossiper, ts, tags_map);
// Alternator Streams doesn't yet work when the table uses tablets (#16317)
if (stream_specification && stream_specification->IsObject()) {
auto stream_enabled = rjson::find(*stream_specification, "StreamEnabled");
if (stream_enabled && stream_enabled->IsBool() && stream_enabled->GetBool()) {
locator::replication_strategy_params params(ksm->strategy_options(), ksm->initial_tablets());
auto rs = locator::abstract_replication_strategy::create_replication_strategy(ksm->strategy_name(), params);
if (rs->uses_tablets()) {
co_return api_error::validation("Streams not yet supported on a table using tablets (issue #16317). "
"If you want to use streams, create a table with vnodes by setting the tag 'experimental:initial_tablets' set to 'none'.");
}
}
}
try {
schema_mutations = service::prepare_new_keyspace_announcement(sp.local_db(), ksm, ts);
} catch (exceptions::already_exists_exception&) {
@@ -1135,8 +1165,18 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
}
co_await service::prepare_new_column_family_announcement(schema_mutations, sp, *ksm, schema, ts);
for (schema_builder& view_builder : view_builders) {
view_ptr view(view_builder.build());
db::schema_tables::add_table_or_view_to_schema_mutation(
view_ptr(view_builder.build()), ts, true, schema_mutations);
view, ts, true, schema_mutations);
// add_table_or_view_to_schema_mutation() is a low-level function that
// doesn't call the callbacks that prepare_new_view_announcement()
// calls. So we need to call this callback here :-( If we don't, among
// other things *tablets* will not be created for the new view.
// These callbacks need to be called in a Seastar thread.
co_await seastar::async([&sp, &ksm, &view, &schema_mutations, ts] {
return sp.local_db().get_notifier().before_create_column_family(*ksm, *view, schema_mutations, ts);
});
}
co_await mm.announce(std::move(schema_mutations), std::move(group0_guard), format("alternator-executor: create {} table", table_name));
@@ -1199,6 +1239,13 @@ future<executor::request_return_type> executor::update_table(client_state& clien
rjson::value* stream_specification = rjson::find(request, "StreamSpecification");
if (stream_specification && stream_specification->IsObject()) {
add_stream_options(*stream_specification, builder, p.local());
// Alternator Streams doesn't yet work when the table uses tablets (#16317)
auto stream_enabled = rjson::find(*stream_specification, "StreamEnabled");
if (stream_enabled && stream_enabled->IsBool() && stream_enabled->GetBool() &&
p.local().local_db().find_keyspace(tab->ks_name()).get_replication_strategy().uses_tablets()) {
co_return api_error::validation("Streams not yet supported on a table using tablets (issue #16317). "
"If you want to enable streams, re-create this table with vnodes (with the tag 'experimental:initial_tablets' set to 'none').");
}
}
auto schema = builder.build();
@@ -1489,11 +1536,31 @@ rmw_operation::returnvalues rmw_operation::parse_returnvalues(const rjson::value
}
}
rmw_operation::returnvalues_on_condition_check_failure
rmw_operation::parse_returnvalues_on_condition_check_failure(const rjson::value& request) {
const rjson::value* attribute_value = rjson::find(request, "ReturnValuesOnConditionCheckFailure");
if (!attribute_value) {
return rmw_operation::returnvalues_on_condition_check_failure::NONE;
}
if (!attribute_value->IsString()) {
throw api_error::validation(format("Expected string value for ReturnValuesOnConditionCheckFailure, got: {}", *attribute_value));
}
auto s = rjson::to_string_view(*attribute_value);
if (s == "NONE") {
return rmw_operation::returnvalues_on_condition_check_failure::NONE;
} else if (s == "ALL_OLD") {
return rmw_operation::returnvalues_on_condition_check_failure::ALL_OLD;
} else {
throw api_error::validation(format("Unrecognized value for ReturnValuesOnConditionCheckFailure: {}", s));
}
}
rmw_operation::rmw_operation(service::storage_proxy& proxy, rjson::value&& request)
: _request(std::move(request))
, _schema(get_table(proxy, _request))
, _write_isolation(get_write_isolation_for_schema(_schema))
, _returnvalues(parse_returnvalues(_request))
, _returnvalues_on_condition_check_failure(parse_returnvalues_on_condition_check_failure(_request))
{
// _pk and _ck will be assigned later, by the subclass's constructor
// (each operation puts the key in a slightly different location in
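
The parsing rules of `parse_returnvalues_on_condition_check_failure()`, added earlier in this hunk, can be sketched in Python like this. The enum and exception names are hypothetical stand-ins for the C++ types; the error messages follow the ones in the diff.

```python
from enum import Enum


class ReturnValuesOnConditionCheckFailure(Enum):
    NONE = "NONE"
    ALL_OLD = "ALL_OLD"


class ValidationError(ValueError):
    """Stand-in for api_error::validation."""


def parse_returnvalues_on_condition_check_failure(request):
    """Map the request's ReturnValuesOnConditionCheckFailure field to an enum value."""
    field = "ReturnValuesOnConditionCheckFailure"
    if field not in request:
        # A missing field defaults to NONE.
        return ReturnValuesOnConditionCheckFailure.NONE
    value = request[field]
    if not isinstance(value, str):
        raise ValidationError(f"Expected string value for {field}, got: {value!r}")
    if value == "NONE":
        return ReturnValuesOnConditionCheckFailure.NONE
    if value == "ALL_OLD":
        return ReturnValuesOnConditionCheckFailure.ALL_OLD
    raise ValidationError(f"Unrecognized value for {field}: {value}")
```

Only `NONE` and `ALL_OLD` are accepted; the match is exact, so `all_old` is rejected.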
@@ -1599,7 +1666,7 @@ future<executor::request_return_type> rmw_operation::execute(service::storage_pr
[this, &proxy, trace_state, permit = std::move(permit)] (std::unique_ptr<rjson::value> previous_item) mutable {
std::optional<mutation> m = apply(std::move(previous_item), api::new_timestamp());
if (!m) {
return make_ready_future<executor::request_return_type>(api_error::conditional_check_failed("Failed condition."));
return make_ready_future<executor::request_return_type>(api_error::conditional_check_failed("The conditional request failed", std::move(_return_attributes)));
}
return proxy.mutate(std::vector<mutation>{std::move(*m)}, db::consistency_level::LOCAL_QUORUM, executor::default_timeout(), trace_state, std::move(permit), db::allow_per_partition_rate_limit::yes).then([this] () mutable {
return rmw_operation_return(std::move(_return_attributes));
@@ -1624,7 +1691,7 @@ future<executor::request_return_type> rmw_operation::execute(service::storage_pr
{timeout, std::move(permit), client_state, trace_state},
db::consistency_level::LOCAL_SERIAL, db::consistency_level::LOCAL_QUORUM, timeout, timeout).then([this, read_command] (bool is_applied) mutable {
if (!is_applied) {
return make_ready_future<executor::request_return_type>(api_error::conditional_check_failed("Failed condition."));
return make_ready_future<executor::request_return_type>(api_error::conditional_check_failed("The conditional request failed", std::move(_return_attributes)));
}
return rmw_operation_return(std::move(_return_attributes));
});
@@ -1713,6 +1780,10 @@ public:
virtual std::optional<mutation> apply(std::unique_ptr<rjson::value> previous_item, api::timestamp_type ts) const override {
if (!verify_expected(_request, previous_item.get()) ||
!verify_condition_expression(_condition_expression, previous_item.get())) {
if (previous_item && _returnvalues_on_condition_check_failure ==
returnvalues_on_condition_check_failure::ALL_OLD) {
_return_attributes = std::move(*previous_item);
}
// If the update is to be cancelled because of an unfulfilled Expected
// condition, return an empty optional mutation, which is more
// efficient than throwing an exception.
@@ -1753,7 +1824,7 @@ future<executor::request_return_type> executor::put_item(client_state& client_st
});
}
return op->execute(_proxy, client_state, trace_state, std::move(permit), needs_read_before_write, _stats).finally([op, start_time, this] {
_stats.api_operations.put_item_latency.add(std::chrono::steady_clock::now() - start_time);
_stats.api_operations.put_item_latency.mark(std::chrono::steady_clock::now() - start_time);
});
}
@@ -1798,6 +1869,10 @@ public:
virtual std::optional<mutation> apply(std::unique_ptr<rjson::value> previous_item, api::timestamp_type ts) const override {
if (!verify_expected(_request, previous_item.get()) ||
!verify_condition_expression(_condition_expression, previous_item.get())) {
if (previous_item && _returnvalues_on_condition_check_failure ==
returnvalues_on_condition_check_failure::ALL_OLD) {
_return_attributes = std::move(*previous_item);
}
// If the update is to be cancelled because of an unfulfilled Expected
// condition, return an empty optional mutation, which is more
// efficient than throwing an exception.
@@ -1838,7 +1913,7 @@ future<executor::request_return_type> executor::delete_item(client_state& client
});
}
return op->execute(_proxy, client_state, trace_state, std::move(permit), needs_read_before_write, _stats).finally([op, start_time, this] {
_stats.api_operations.delete_item_latency.add(std::chrono::steady_clock::now() - start_time);
_stats.api_operations.delete_item_latency.mark(std::chrono::steady_clock::now() - start_time);
});
}
@@ -2240,7 +2315,7 @@ enum class select_type { regular, count, projection };
static select_type parse_select(const rjson::value& request, table_or_view_type table_type) {
const rjson::value* select_value = rjson::find(request, "Select");
if (!select_value) {
// If "Select" is not specificed, it defaults to ALL_ATTRIBUTES
// If "Select" is not specified, it defaults to ALL_ATTRIBUTES
// on a base table, or ALL_PROJECTED_ATTRIBUTES on an index
return table_type == table_or_view_type::base ?
select_type::regular : select_type::projection;
@@ -2677,22 +2752,35 @@ static std::optional<rjson::value> action_result(
}, action._action);
}
}
// Print an attribute_path_map_node<action> as the list of paths it contains:
static std::ostream& operator<<(std::ostream& out, const attribute_path_map_node<parsed::update_expression::action>& h) {
template <> struct fmt::formatter<alternator::attribute_path_map_node<alternator::parsed::update_expression::action>> {
constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
// this function recursively call into itself, so we have to forward declare it.
auto format(const alternator::attribute_path_map_node<alternator::parsed::update_expression::action>& h, fmt::format_context& ctx) const
-> decltype(ctx.out());
};
auto fmt::formatter<alternator::attribute_path_map_node<alternator::parsed::update_expression::action>>::format(const alternator::attribute_path_map_node<alternator::parsed::update_expression::action>& h, fmt::format_context& ctx) const
-> decltype(ctx.out()) {
auto out = ctx.out();
if (h.has_value()) {
out << " " << h.get_value()._path;
out = fmt::format_to(out, " {}", h.get_value()._path);
} else if (h.has_members()) {
for (auto& member : h.get_members()) {
out << *member.second;
out = fmt::format_to(out, "{}", *member.second);
}
} else if (h.has_indexes()) {
for (auto& index : h.get_indexes()) {
out << *index.second;
out = fmt::format_to(out, "{}", *index.second);
}
}
return out;
}
namespace alternator {
// Apply the hierarchy of actions in an attribute_path_map_node<action> to a
// JSON object which uses DynamoDB's serialization conventions. The complete,
// unmodified, previous_item is also necessary for the right-hand sides of the
@@ -2794,6 +2882,10 @@ std::optional<mutation>
update_item_operation::apply(std::unique_ptr<rjson::value> previous_item, api::timestamp_type ts) const {
if (!verify_expected(_request, previous_item.get()) ||
!verify_condition_expression(_condition_expression, previous_item.get())) {
if (previous_item && _returnvalues_on_condition_check_failure ==
returnvalues_on_condition_check_failure::ALL_OLD) {
_return_attributes = std::move(*previous_item);
}
// If the update is to be cancelled because of an unfulfilled
// condition, return an empty optional mutation, which is more
// efficient than throwing an exception.
@@ -3085,14 +3177,14 @@ future<executor::request_return_type> executor::update_item(client_state& client
});
}
return op->execute(_proxy, client_state, trace_state, std::move(permit), needs_read_before_write, _stats).finally([op, start_time, this] {
_stats.api_operations.update_item_latency.add(std::chrono::steady_clock::now() - start_time);
_stats.api_operations.update_item_latency.mark(std::chrono::steady_clock::now() - start_time);
});
}
// Check according to the request's "ConsistentRead" field, which consistency
// level we need to use for the read. The field can be True for strongly
// consistent reads, or False for eventually consistent reads, or if this
// field is absense, we default to eventually consistent reads.
// field is absence, we default to eventually consistent reads.
// In Scylla, eventually-consistent reads are implemented as consistency
// level LOCAL_ONE, and strongly-consistent reads as LOCAL_QUORUM.
static db::consistency_level get_read_consistency(const rjson::value& request) {
@@ -3169,7 +3261,7 @@ future<executor::request_return_type> executor::get_item(client_state& client_st
return _proxy.query(schema, std::move(command), std::move(partition_ranges), cl,
service::storage_proxy::coordinator_query_options(executor::default_timeout(), std::move(permit), client_state, trace_state)).then(
[this, schema, partition_slice = std::move(partition_slice), selection = std::move(selection), attrs_to_get = std::move(attrs_to_get), start_time = std::move(start_time)] (service::storage_proxy::coordinator_query_result qr) mutable {
_stats.api_operations.get_item_latency.add(std::chrono::steady_clock::now() - start_time);
_stats.api_operations.get_item_latency.mark(std::chrono::steady_clock::now() - start_time);
return make_ready_future<executor::request_return_type>(make_jsonable(describe_item(schema, partition_slice, *selection, *qr.query_result, std::move(attrs_to_get))));
});
}
@@ -3540,7 +3632,7 @@ public:
// the JSON but take them out before finally returning the JSON.
if (_attrs_to_get) {
_filter.for_filters_on([&] (std::string_view attr) {
std::string a(attr); // no heterogenous maps searches :-(
std::string a(attr); // no heterogeneous maps searches :-(
if (!_attrs_to_get->contains(a)) {
_extra_filter_attrs.emplace(std::move(a));
}
@@ -3625,9 +3717,9 @@ public:
}
};
static std::tuple<rjson::value, size_t> describe_items(const cql3::selection::selection& selection, std::unique_ptr<cql3::result_set> result_set, std::optional<attrs_to_get>&& attrs_to_get, filter&& filter) {
static future<std::tuple<rjson::value, size_t>> describe_items(const cql3::selection::selection& selection, std::unique_ptr<cql3::result_set> result_set, std::optional<attrs_to_get>&& attrs_to_get, filter&& filter) {
describe_items_visitor visitor(selection.get_columns(), attrs_to_get, filter);
result_set->visit(visitor);
co_await result_set->visit_gently(visitor);
auto scanned_count = visitor.get_scanned_count();
rjson::value items = std::move(visitor).get_items();
rjson::value items_descr = rjson::empty_object();
@@ -3644,7 +3736,7 @@ static std::tuple<rjson::value, size_t> describe_items(const cql3::selection::se
if (!attrs_to_get || !attrs_to_get->empty()) {
rjson::add(items_descr, "Items", std::move(items));
}
return {std::move(items_descr), size};
co_return std::tuple<rjson::value, size_t>{std::move(items_descr), size};
}
static rjson::value encode_paging_state(const schema& schema, const service::pager::paging_state& paging_state) {
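
`describe_items()` becomes a coroutine here and switches from `visit()` to `visit_gently()`. The Python `asyncio` sketch below illustrates the general idea of visiting a large result set while periodically yielding to the scheduler. It is an analogy only: the yield interval and both function names are assumptions, and the actual yielding policy of `visit_gently()` is not shown in this diff.

```python
import asyncio


async def visit_gently(rows, visitor, yield_every=64):
    """Call `visitor` on each row, yielding to the event loop every `yield_every` rows."""
    for i, row in enumerate(rows, start=1):
        visitor(row)
        if i % yield_every == 0:
            # Give other tasks a chance to run during a long scan.
            await asyncio.sleep(0)


async def collect_items(rows):
    """Collect all rows into a list without monopolizing the event loop."""
    items = []
    await visit_gently(rows, items.append)
    return items
```

The result is the same as a plain loop; only the scheduling behavior differs, which is why the caller has to `await` the function.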
@@ -3685,18 +3777,18 @@ static rjson::value encode_paging_state(const schema& schema, const service::pag
static future<executor::request_return_type> do_query(service::storage_proxy& proxy,
schema_ptr schema,
const rjson::value* exclusive_start_key,
dht::partition_range_vector&& partition_ranges,
std::vector<query::clustering_range>&& ck_bounds,
std::optional<attrs_to_get>&& attrs_to_get,
dht::partition_range_vector partition_ranges,
std::vector<query::clustering_range> ck_bounds,
std::optional<attrs_to_get> attrs_to_get,
uint32_t limit,
db::consistency_level cl,
filter&& filter,
filter filter,
query::partition_slice::option_set custom_opts,
service::client_state& client_state,
cql3::cql_stats& cql_stats,
tracing::trace_state_ptr trace_state,
service_permit permit) {
lw_shared_ptr<service::pager::paging_state> paging_state = nullptr;
lw_shared_ptr<service::pager::paging_state> old_paging_state = nullptr;
tracing::trace(trace_state, "Performing a database query");
@@ -3706,7 +3798,7 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr
if (schema->clustering_key_size() > 0) {
pos = pos_from_json(*exclusive_start_key, schema);
}
paging_state = make_lw_shared<service::pager::paging_state>(pk, pos, query::max_partitions, query_id::create_null_id(), service::pager::paging_state::replicas_per_token_range{}, std::nullopt, 0);
old_paging_state = make_lw_shared<service::pager::paging_state>(pk, pos, query::max_partitions, query_id::create_null_id(), service::pager::paging_state::replicas_per_token_range{}, std::nullopt, 0);
}
auto regular_columns = boost::copy_range<query::column_id_vector>(
@@ -3725,34 +3817,28 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr
// FIXME: should be moved above, set on opts, so get_max_result_size knows it?
command->slice.options.set<query::partition_slice::option::allow_short_read>();
auto query_options = std::make_unique<cql3::query_options>(cl, std::vector<cql3::raw_value>{});
query_options = std::make_unique<cql3::query_options>(std::move(query_options), std::move(paging_state));
query_options = std::make_unique<cql3::query_options>(std::move(query_options), std::move(old_paging_state));
auto p = service::pager::query_pagers::pager(proxy, schema, selection, *query_state_ptr, *query_options, command, std::move(partition_ranges), nullptr);
return p->fetch_page(limit, gc_clock::now(), executor::default_timeout()).then(
[p = std::move(p), schema, cql_stats, partition_slice = std::move(partition_slice),
selection = std::move(selection), query_state_ptr = std::move(query_state_ptr),
attrs_to_get = std::move(attrs_to_get),
query_options = std::move(query_options),
filter = std::move(filter)] (std::unique_ptr<cql3::result_set> rs) mutable {
if (!p->is_exhausted()) {
rs->get_metadata().set_paging_state(p->state());
}
auto paging_state = rs->get_metadata().paging_state();
bool has_filter = filter;
auto [items, size] = describe_items(*selection, std::move(rs), std::move(attrs_to_get), std::move(filter));
if (paging_state) {
rjson::add(items, "LastEvaluatedKey", encode_paging_state(*schema, *paging_state));
}
if (has_filter){
cql_stats.filtered_rows_read_total += p->stats().rows_read_total;
// update our "filtered_row_matched_total" for all the rows matched, despite the filter
cql_stats.filtered_rows_matched_total += size;
}
if (is_big(items)) {
return make_ready_future<executor::request_return_type>(make_streamed(std::move(items)));
}
return make_ready_future<executor::request_return_type>(make_jsonable(std::move(items)));
});
std::unique_ptr<cql3::result_set> rs = co_await p->fetch_page(limit, gc_clock::now(), executor::default_timeout());
if (!p->is_exhausted()) {
rs->get_metadata().set_paging_state(p->state());
}
auto paging_state = rs->get_metadata().paging_state();
bool has_filter = filter;
auto [items, size] = co_await describe_items(*selection, std::move(rs), std::move(attrs_to_get), std::move(filter));
if (paging_state) {
rjson::add(items, "LastEvaluatedKey", encode_paging_state(*schema, *paging_state));
}
if (has_filter){
cql_stats.filtered_rows_read_total += p->stats().rows_read_total;
// update our "filtered_row_matched_total" for all the rows matched, despite the filter
cql_stats.filtered_rows_matched_total += size;
}
if (is_big(items)) {
co_return executor::request_return_type(make_streamed(std::move(items)));
}
co_return executor::request_return_type(make_jsonable(std::move(items)));
}
static dht::token token_for_segment(int segment, int total_segments) {
@@ -4463,7 +4549,7 @@ future<executor::request_return_type> executor::describe_continuous_backups(clie
// of nodes in the cluster: A cluster with 3 or more live nodes, gets RF=3.
// A smaller cluster (presumably, a test only), gets RF=1. The user may
// manually create the keyspace to override this predefined behavior.
static lw_shared_ptr<keyspace_metadata> create_keyspace_metadata(std::string_view keyspace_name, service::storage_proxy& sp, gms::gossiper& gossiper, api::timestamp_type ts) {
static lw_shared_ptr<keyspace_metadata> create_keyspace_metadata(std::string_view keyspace_name, service::storage_proxy& sp, gms::gossiper& gossiper, api::timestamp_type ts, const std::map<sstring, sstring>& tags_map) {
int endpoint_count = gossiper.num_endpoints();
int rf = 3;
if (endpoint_count < rf) {
@@ -4473,7 +4559,36 @@ static lw_shared_ptr<keyspace_metadata> create_keyspace_metadata(std::string_vie
}
auto opts = get_network_topology_options(sp, gossiper, rf);
return keyspace_metadata::new_keyspace(keyspace_name, "org.apache.cassandra.locator.NetworkTopologyStrategy", std::move(opts), true);
// Even if the "tablets" experimental feature is available, we currently
// do not enable tablets by default on Alternator tables because LWT is
// not yet fully supported with tablets.
// The user can override the choice of whether or not to use tablets at
// table-creation time by supplying the following tag with a numeric value
// (setting the value to 0 means enabling tablets with automatic selection
// of the best number of tablets).
// Setting this tag to any non-numeric value (e.g., an empty string or the
// word "none") will ask to disable tablets.
// If we make this tag a permanent feature, it will get a "system:" prefix -
// until then we give it the "experimental:" prefix to not commit to it.
static constexpr auto INITIAL_TABLETS_TAG_KEY = "experimental:initial_tablets";
// initial_tablets currently defaults to unset, so tablets will not be
// used by default on new Alternator tables. Change this initialization
// to 0 to enable tablets by default, with automatic number of tablets.
std::optional<unsigned> initial_tablets;
if (sp.get_db().local().get_config().check_experimental(db::experimental_features_t::feature::TABLETS)) {
auto it = tags_map.find(INITIAL_TABLETS_TAG_KEY);
if (it != tags_map.end()) {
// Tag set. If it's a valid number, use it. If not - e.g., it's
// empty or a word like "none", disable tablets by setting
// initial_tablets to a disengaged optional.
try {
initial_tablets = std::stol(tags_map.at(INITIAL_TABLETS_TAG_KEY));
} catch(...) {
initial_tablets = std::nullopt;
}
}
}
return keyspace_metadata::new_keyspace(keyspace_name, "org.apache.cassandra.locator.NetworkTopologyStrategy", std::move(opts), initial_tablets);
}
future<> executor::start() {
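
The tag handling added to create_keyspace_metadata() above can be sketched with the standard library alone. This is a minimal illustration, not the patch's code: `parse_initial_tablets` and its `tablets_feature_enabled` flag are hypothetical names standing in for the experimental-feature check, and the tag key is copied from the hunk.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical helper (not part of the patch): mirrors the
// "experimental:initial_tablets" tag handling using only the standard library.
std::optional<unsigned> parse_initial_tablets(
        const std::map<std::string, std::string>& tags_map,
        bool tablets_feature_enabled) {
    static const std::string tag_key = "experimental:initial_tablets";
    // Disengaged optional: tablets are not used for the new table.
    std::optional<unsigned> initial_tablets;
    if (!tablets_feature_enabled) {
        return initial_tablets;
    }
    auto it = tags_map.find(tag_key);
    if (it != tags_map.end()) {
        try {
            // std::stol throws std::invalid_argument for "" or "none".
            // Note that negative inputs wrap around when cast to unsigned.
            initial_tablets = static_cast<unsigned>(std::stol(it->second));
        } catch (...) {
            initial_tablets = std::nullopt;
        }
    }
    return initial_tablets;
}
```

With this sketch, a tag value of "0" yields an engaged optional holding 0 (tablets with an automatically chosen count), while "none" or a missing tag yields a disengaged optional.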


@@ -133,21 +133,6 @@ void path::check_depth_limit() {
}
}
std::ostream& operator<<(std::ostream& os, const path& p) {
os << p.root();
for (const auto& op : p.operators()) {
std::visit(overloaded_functor {
[&] (const std::string& member) {
os << '.' << member;
},
[&] (unsigned index) {
os << '[' << index << ']';
}
}, op);
}
return os;
}
} // namespace parsed
// The following resolve_*() functions resolve references in parsed
@@ -756,3 +741,20 @@ rjson::value calculate_value(const parsed::set_rhs& rhs,
}
} // namespace alternator
auto fmt::formatter<alternator::parsed::path>::format(const alternator::parsed::path& p, fmt::format_context& ctx) const
-> decltype(ctx.out()) {
auto out = ctx.out();
out = fmt::format_to(out, "{}", p.root());
for (const auto& op : p.operators()) {
std::visit(overloaded_functor {
[&] (const std::string& member) {
out = fmt::format_to(out, ".{}", member);
},
[&] (unsigned index) {
out = fmt::format_to(out, "[{}]", index);
}
}, op);
}
return out;
}
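
The formatter above walks the path's operators with std::visit and appends either ".member" or "[index]". A standalone sketch of the same traversal follows, assuming a simplified `path_sketch` struct in place of alternator::parsed::path and a local `overloaded` helper in place of utils/overloaded_functor.hh; it builds a std::string rather than writing to an fmt output iterator.

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for alternator::parsed::path: a root attribute name
// followed by a sequence of ".member" or "[index]" operators.
struct path_sketch {
    std::string root;
    std::vector<std::variant<std::string, unsigned>> operators;
};

// Local helper that merges several lambdas into one overload set.
template <typename... Fs>
struct overloaded : Fs... { using Fs::operator()...; };
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// Renders the path in the same textual form as the formatter in the diff.
std::string render_path(const path_sketch& p) {
    std::string out = p.root;
    for (const auto& op : p.operators) {
        std::visit(overloaded{
            [&](const std::string& member) {
                out += '.';
                out += member;
            },
            [&](unsigned index) {
                out += '[';
                out += std::to_string(index);
                out += ']';
            }
        }, op);
    }
    return out;
}
```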


@@ -60,24 +60,30 @@ enum class calculate_value_caller {
UpdateExpression, ConditionExpression, ConditionExpressionAlone
};
inline std::ostream& operator<<(std::ostream& out, calculate_value_caller caller) {
switch (caller) {
case calculate_value_caller::UpdateExpression:
out << "UpdateExpression";
break;
case calculate_value_caller::ConditionExpression:
out << "ConditionExpression";
break;
case calculate_value_caller::ConditionExpressionAlone:
out << "ConditionExpression";
break;
default:
out << "unknown type of expression";
break;
}
return out;
}
template <> struct fmt::formatter<alternator::calculate_value_caller> {
constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
auto format(alternator::calculate_value_caller caller, fmt::format_context& ctx) const {
std::string_view name = "unknown type of expression";
switch (caller) {
using enum alternator::calculate_value_caller;
case UpdateExpression:
name = "UpdateExpression";
break;
case ConditionExpression:
name = "ConditionExpression";
break;
case ConditionExpressionAlone:
name = "ConditionExpression";
break;
}
return fmt::format_to(ctx.out(), "{}", name);
}
};
namespace alternator {
rjson::value calculate_value(const parsed::value& v,
calculate_value_caller caller,
const rjson::value* previous_item);


@@ -255,3 +255,7 @@ public:
} // namespace parsed
} // namespace alternator
template <> struct fmt::formatter<alternator::parsed::path> : fmt::formatter<std::string_view> {
auto format(const alternator::parsed::path&, fmt::format_context& ctx) const -> decltype(ctx.out());
};


@@ -19,7 +19,7 @@ namespace alternator {
// operations which may involve a read of the item before the write
// (so-called Read-Modify-Write operations). These operations include PutItem,
// UpdateItem and DeleteItem: All of these may be conditional operations (the
// "Expected" parameter) which requir a read before the write, and UpdateItem
// "Expected" parameter) which require a read before the write, and UpdateItem
// may also have an update expression which refers to the item's old value.
//
// The code below supports running the read and the write together as one
@@ -69,7 +69,11 @@ protected:
enum class returnvalues {
NONE, ALL_OLD, UPDATED_OLD, ALL_NEW, UPDATED_NEW
} _returnvalues;
enum class returnvalues_on_condition_check_failure {
NONE, ALL_OLD
} _returnvalues_on_condition_check_failure;
static returnvalues parse_returnvalues(const rjson::value& request);
static returnvalues_on_condition_check_failure parse_returnvalues_on_condition_check_failure(const rjson::value& request);
// When _returnvalues != NONE, apply() should store here, in JSON form,
// the values which are to be returned in the "Attributes" field.
// The default null JSON means do not return an Attributes field at all.
@@ -77,6 +81,8 @@ protected:
// it (see explanation below), but note that because apply() may be
// called more than once, if apply() will sometimes set this field it
// must set it (even if just to the default empty value) every time.
// Additionally when _returnvalues_on_condition_check_failure is ALL_OLD
// then condition check failure will also result in storing values here.
mutable rjson::value _return_attributes;
public:
// The constructor of a rmw_operation subclass should parse the request
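
The hunk above only declares parse_returnvalues_on_condition_check_failure(); its body is not shown here. A rough sketch of what such a parser might look like is given below. It assumes the request field is already extracted as an optional string, that a missing field means NONE, and that any other value is rejected; `parse_rv_on_failure` is a hypothetical name, and the real function takes an rjson::value and reports errors through Alternator's own error types.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

enum class returnvalues_on_condition_check_failure { NONE, ALL_OLD };

// Hypothetical parser: maps the textual request value to the enum.
// Assumption: an absent value behaves like "NONE".
returnvalues_on_condition_check_failure
parse_rv_on_failure(std::optional<std::string_view> attr) {
    if (!attr || *attr == "NONE") {
        return returnvalues_on_condition_check_failure::NONE;
    }
    if (*attr == "ALL_OLD") {
        return returnvalues_on_condition_check_failure::ALL_OLD;
    }
    throw std::invalid_argument(
        "Unrecognized value for ReturnValuesOnConditionCheckFailure: " + std::string(*attr));
}
```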


@@ -11,7 +11,6 @@
#include "log.hh"
#include "serialization.hh"
#include "error.hh"
#include "rapidjson/writer.h"
#include "concrete_types.hh"
#include "cql3/type_json.hh"
#include "mutation/position_in_partition.hh"
@@ -59,7 +58,7 @@ type_representation represent_type(alternator_type atype) {
// calculate its magnitude and precision from its scale() and unscaled_value().
// So in the following ugly implementation we calculate them from the string
// representation instead. We assume the number was already parsed
// sucessfully to a big_decimal to it follows its syntax rules.
// successfully to a big_decimal so it follows its syntax rules.
//
// FIXME: rewrite this function to take a big_decimal, not a string.
// Maybe a snippet like this can help:


@@ -23,7 +23,6 @@
#include "service/storage_proxy.hh"
#include "gms/gossiper.hh"
#include "utils/overloaded_functor.hh"
#include "utils/fb_utilities.hh"
#include "utils/aws_sigv4.hh"
static logging::logger slogger("alternator-server");
@@ -118,7 +117,7 @@ public:
}
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
auto res = resf.get0();
auto res = resf.get();
std::visit(overloaded_functor {
[&] (const json::json_return_type& json_return_value) {
slogger.trace("api_handler success case");
@@ -156,6 +155,9 @@ public:
protected:
void generate_error_reply(reply& rep, const api_error& err) {
rjson::value results = rjson::empty_object();
if (!err._extra_fields.IsNull() && err._extra_fields.IsObject()) {
results = rjson::copy(err._extra_fields);
}
rjson::add(results, "__type", rjson::from_string("com.amazonaws.dynamodb.v20120810#" + err._type));
rjson::add(results, "message", err._msg);
rep._content = rjson::print(std::move(results));
@@ -308,8 +310,8 @@ future<std::string> server::verify_signature(const request& req, const chunked_c
}
}
auto cache_getter = [&proxy = _proxy] (std::string username) {
return get_key_from_roles(proxy, std::move(username));
auto cache_getter = [&proxy = _proxy, &as = _auth_service] (std::string username) {
return get_key_from_roles(proxy, as, std::move(username));
};
return _key_cache.get_ptr(user, cache_getter).then([this, &req, &content,
user = std::move(user),
@@ -566,14 +568,14 @@ future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std:
set_routes(_https_server._routes);
_https_server.set_content_length_limit(server::content_length_limit);
_https_server.set_content_streaming(true);
_https_server.set_tls_credentials(creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
auto server_creds = creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
if (ep) {
slogger.warn("Exception loading {}: {}", files, ep);
} else {
slogger.info("Reloaded {}", files);
}
}).get0());
_https_server.listen(socket_address{addr, *https_port}).get();
}).get();
_https_server.listen(socket_address{addr, *https_port}, std::move(server_creds)).get();
_enabled_servers.push_back(std::ref(_https_server));
}
});


@@ -21,10 +21,12 @@ stats::stats() : api_operations{} {
_metrics.add_group("alternator", {
#define OPERATION(name, CamelCaseName) \
seastar::metrics::make_total_operations("operation", api_operations.name, \
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}),
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}).set_skip_when_empty(),
#define OPERATION_LATENCY(name, CamelCaseName) \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return to_metrics_histogram(api_operations.name);}),
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return to_metrics_histogram(api_operations.name.histogram());}).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(), \
seastar::metrics::make_summary("op_latency_summary", \
seastar::metrics::description("Latency summary of an operation via Alternator API"), [this]{return to_metrics_summary(api_operations.name.summary());})(op(CamelCaseName)).set_skip_when_empty(),
OPERATION(batch_get_item, "BatchGetItem")
OPERATION(batch_write_item, "BatchWriteItem")
OPERATION(create_backup, "CreateBackup")


@@ -11,8 +11,8 @@
#include <cstdint>
#include <seastar/core/metrics_registration.hh>
#include "seastarx.hh"
#include "utils/estimated_histogram.hh"
#include "utils/histogram.hh"
#include "cql3/stats.hh"
namespace alternator {
@@ -66,11 +66,11 @@ public:
uint64_t get_shard_iterator = 0;
uint64_t get_records = 0;
utils::time_estimated_histogram put_item_latency;
utils::time_estimated_histogram get_item_latency;
utils::time_estimated_histogram delete_item_latency;
utils::time_estimated_histogram update_item_latency;
utils::time_estimated_histogram get_records_latency;
utils::timed_rate_moving_average_summary_and_histogram put_item_latency;
utils::timed_rate_moving_average_summary_and_histogram get_item_latency;
utils::timed_rate_moving_average_summary_and_histogram delete_item_latency;
utils::timed_rate_moving_average_summary_and_histogram update_item_latency;
utils::timed_rate_moving_average_summary_and_histogram get_records_latency;
} api_operations;
// Miscellaneous event counters
uint64_t total_operations = 0;


@@ -13,8 +13,6 @@
#include <seastar/json/formatter.hh>
#include "utils/base64.hh"
#include "log.hh"
#include "db/config.hh"
#include "cdc/log.hh"
@@ -25,7 +23,6 @@
#include "utils/UUID_gen.hh"
#include "cql3/selection/selection.hh"
#include "cql3/result_set.hh"
#include "cql3/type_json.hh"
#include "cql3/column_identifier.hh"
#include "schema/schema_builder.hh"
#include "service/storage_proxy.hh"
@@ -33,7 +30,6 @@
#include "gms/feature_service.hh"
#include "executor.hh"
#include "rmw_operation.hh"
#include "data_dictionary/data_dictionary.hh"
/**
@@ -280,7 +276,7 @@ struct sequence_number {
* Timeuuids viewed as msb<<64|lsb are _not_,
* but they are still sorted as
* timestamp() << 64|lsb
* so we can simpy unpack the mangled msb
* so we can simply unpack the mangled msb
* and use as hi 64 in our "bignum".
*/
uint128_t hi = uint64_t(num.uuid.timestamp());
@@ -419,7 +415,7 @@ using namespace std::string_literals;
*
* In scylla, this is sort of akin to an ID having corresponding ID/ID:s
* that cover the token range it represents. Because ID:s are per
* vnode shard however, this relation can be somewhat ambigous.
* vnode shard however, this relation can be somewhat ambiguous.
* We still provide some semblance of this by finding the ID in
* older generation that has token start < current ID token start.
* This will be a partial overlap, but it is the best we can do.
@@ -526,7 +522,7 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl
// (see explanation above) since we want to find closest
// token boundary when determining parent.
// #7346 - we processed and searched children/parents in
// stored order, which is not neccesarily token order,
// stored order, which is not necessarily token order,
// so the finding of "closest" token boundary (using upper bound)
// could give somewhat weird results.
static auto token_cmp = [](const cdc::stream_id& id1, const cdc::stream_id& id2) {
@@ -1020,7 +1016,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
// shard did end, then the next read will have nrecords == 0 and
// will notice the end of the shard and not return NextShardIterator.
rjson::add(ret, "NextShardIterator", next_iter);
_stats.api_operations.get_records_latency.add(std::chrono::steady_clock::now() - start_time);
_stats.api_operations.get_records_latency.mark(std::chrono::steady_clock::now() - start_time);
return make_ready_future<executor::request_return_type>(make_jsonable(std::move(ret)));
}
@@ -1043,7 +1039,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
shard_iterator next_iter(iter.table, iter.shard, utils::UUID_gen::min_time_UUID(high_ts.time_since_epoch()), true);
rjson::add(ret, "NextShardIterator", iter);
}
_stats.api_operations.get_records_latency.add(std::chrono::steady_clock::now() - start_time);
_stats.api_operations.get_records_latency.mark(std::chrono::steady_clock::now() - start_time);
if (is_big(ret)) {
return make_ready_future<executor::request_return_type>(make_streamed(std::move(ret)));
}


@@ -32,13 +32,11 @@
#include "service/pager/paging_state.hh"
#include "service/pager/query_pagers.hh"
#include "gms/feature_service.hh"
#include "sstables/types.hh"
#include "mutation/mutation.hh"
#include "types/types.hh"
#include "types/map.hh"
#include "utils/rjson.hh"
#include "utils/big_decimal.hh"
#include "utils/fb_utilities.hh"
#include "cql3/selection/selection.hh"
#include "cql3/values.hh"
#include "cql3/query_options.hh"
@@ -81,6 +79,11 @@ future<executor::request_return_type> executor::update_time_to_live(client_state
co_return api_error::validation("UpdateTimeToLive requires boolean Enabled");
}
bool enabled = v->GetBool();
// Alternator TTL doesn't yet work when the table uses tablets (#16567)
if (enabled && _proxy.local_db().find_keyspace(schema->ks_name()).get_replication_strategy().uses_tablets()) {
co_return api_error::validation("TTL not yet supported on a table using tablets (issue #16567). "
"Create a table with the tag 'experimental:initial_tablets' set to 'none' to use vnodes.");
}
v = rjson::find(*spec, "AttributeName");
if (!v || !v->IsString()) {
co_return api_error::validation("UpdateTimeToLive requires string AttributeName");
@@ -155,7 +158,7 @@ future<executor::request_return_type> executor::describe_time_to_live(client_sta
// node owning this range as a "primary range" (the first node in the ring
// with this range), but when this node is down, the secondary owner (the
// second in the ring) may take over.
// An expiration thread is reponsible for all tables which need expiration
// An expiration thread is responsible for all tables which need expiration
// scans. Currently, the different tables are scanned sequentially (not in
// parallel).
// The expiration thread scans item using CL=QUORUM to ensures that it reads
@@ -417,6 +420,7 @@ class token_ranges_owned_by_this_shard {
};
schema_ptr _s;
locator::effective_replication_map_ptr _erm;
// _token_ranges will contain a list of token ranges owned by this node.
// We'll further need to split each such range to the pieces owned by
// the current shard, using _intersecter.
@@ -430,15 +434,14 @@ class token_ranges_owned_by_this_shard {
size_t _range_idx;
size_t _end_idx;
std::optional<dht::selective_token_range_sharder> _intersecter;
locator::effective_replication_map_ptr _erm;
public:
token_ranges_owned_by_this_shard(replica::database& db, gms::gossiper& g, schema_ptr s)
: _s(s)
, _token_ranges(db.find_keyspace(s->ks_name()).get_effective_replication_map(),
g, utils::fb_utilities::get_broadcast_address())
, _erm(s->table().get_effective_replication_map())
, _token_ranges(db.find_keyspace(s->ks_name()).get_vnode_effective_replication_map(),
g, _erm->get_topology().my_address())
, _range_idx(random_offset(0, _token_ranges.size() - 1))
, _end_idx(_range_idx + _token_ranges.size())
, _erm(s->table().get_effective_replication_map())
{
tlogger.debug("Generating token ranges starting from base range {} of {}", _range_idx, _token_ranges.size());
}


@@ -15,10 +15,12 @@ set(swagger_files
api-doc/lsa.json
api-doc/messaging_service.json
api-doc/metrics.json
api-doc/raft.json
api-doc/storage_proxy.json
api-doc/storage_service.json
api-doc/stream_manager.json
api-doc/system.json
api-doc/tasks.json
api-doc/task_manager.json
api-doc/task_manager_test.json
api-doc/utils.json)
@@ -52,12 +54,15 @@ target_sources(api
hinted_handoff.cc
lsa.cc
messaging_service.cc
raft.cc
storage_proxy.cc
storage_service.cc
stream_manager.cc
system.cc
tasks.cc
task_manager.cc
task_manager_test.cc
token_metadata.cc
${swagger_gen_files})
target_include_directories(api
PUBLIC
@@ -66,6 +71,8 @@ target_include_directories(api
target_link_libraries(api
idl
wasmtime_bindings
Seastar::seastar
xxHash::xxhash)
check_headers(check-headers api
GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)


@@ -84,6 +84,14 @@
"type":"string",
"paramType":"path"
},
{
"name":"flush_memtables",
"description":"Controls flushing of memtables before compaction (true by default). Set to \"false\" to skip automatic flushing of memtables before compaction, e.g. when the table is flushed explicitly before invoking the compaction api.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"split_output",
"description":"true if the output of the major compaction should be split in several sstables",
@@ -203,7 +211,7 @@
"operations":[
{
"method":"POST",
"summary":"Sets the minumum and maximum number of sstables in queue before compaction kicks off",
"summary":"Sets the minimum and maximum number of sstables in queue before compaction kicks off",
"type":"string",
"nickname":"set_compaction_threshold",
"produces":[


@@ -144,6 +144,21 @@
"parameters": []
}
]
},
{
"path": "/commitlog/metrics/max_disk_size",
"operations": [
{
"method": "GET",
"summary": "Get max disk size",
"type": "long",
"nickname": "get_max_disk_size",
"produces": [
"application/json"
],
"parameters": []
}
]
}
]
}


@@ -90,6 +90,30 @@
}
]
},
{
"path":"/v2/error_injection/disconnect/{ip}",
"operations":[
{
"method":"POST",
"summary":"Drop connection to a given IP",
"type":"void",
"nickname":"inject_disconnect",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ip",
"description":"IP address to disconnect from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/v2/error_injection/injection",
"operations":[


@@ -12,7 +12,7 @@
"operations":[
{
"method":"GET",
"summary":"Get the addreses of the down endpoints",
"summary":"Get the addresses of the down endpoints",
"type":"array",
"items":{
"type":"string"
@@ -31,7 +31,7 @@
"operations":[
{
"method":"GET",
"summary":"Get the addreses of live endpoints",
"summary":"Get the addresses of live endpoints",
"type":"array",
"items":{
"type":"string"


@@ -7,11 +7,11 @@
"items": {
"type": "string"
},
"description": "The source labels, a match is based on concatination of the labels"
"description": "The source labels, a match is based on concatenation of the labels"
},
"action": {
"type": "string",
"description": "The action to perfrom on match",
"description": "The action to perform on match",
"enum": ["skip_when_empty", "report_when_empty", "replace", "keep", "drop", "drop_label"]
},
"target_label": {
@@ -28,7 +28,7 @@
},
"separator": {
"type": "string",
"description": "The separator string to use when concatinating the labels"
"description": "The separator string to use when concatenating the labels"
}
}
}

api/api-doc/raft.json Normal file

@@ -0,0 +1,67 @@
{
"apiVersion":"0.0.1",
"swaggerVersion":"1.2",
"basePath":"{{Protocol}}://{{Host}}",
"resourcePath":"/raft",
"produces":[
"application/json"
],
"apis":[
{
"path":"/raft/trigger_snapshot/{group_id}",
"operations":[
{
"method":"POST",
"summary":"Triggers snapshot creation and log truncation for the given Raft group",
"type":"string",
"nickname":"trigger_snapshot",
"produces":[
"application/json"
],
"parameters":[
{
"name":"group_id",
"description":"The ID of the group which should get snapshotted",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"timeout",
"description":"Timeout in seconds after which the endpoint returns a failure. If not provided, 60s is used.",
"required":false,
"allowMultiple":false,
"type":"long",
"paramType":"query"
}
]
}
]
},
{
"path":"/raft/leader_host",
"operations":[
{
"method":"GET",
"summary":"Returns host ID of the current leader of the given Raft group",
"type":"string",
"nickname":"get_leader_host",
"produces":[
"application/json"
],
"parameters":[
{
"name":"group_id",
"description":"The ID of the group. When absent, group0 is used.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
}
]
}
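
As an illustration of the trigger_snapshot endpoint described in this file, the request target can be assembled as shown below. This is a sketch only: `trigger_snapshot_target` is a hypothetical helper, no URL escaping is performed (the group ID is assumed to be a plain UUID string), and the host and port of the REST API are deliberately left out.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: builds the request target for
//   POST /raft/trigger_snapshot/{group_id}[?timeout=<seconds>]
// The optional timeout maps to the "timeout" query parameter; when it is
// omitted, the API description says 60 seconds is used.
std::string trigger_snapshot_target(const std::string& group_id,
                                    std::optional<long> timeout_seconds = std::nullopt) {
    std::string target = "/raft/trigger_snapshot/" + group_id;
    if (timeout_seconds) {
        target += "?timeout=" + std::to_string(*timeout_seconds);
    }
    return target;
}
```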


@@ -336,6 +336,14 @@
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"cf",
"description":"Column family name",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
@@ -368,25 +376,6 @@
}
]
},
{
"path":"/storage_service/describe_ring/",
"operations":[
{
"method":"GET",
"summary":"The TokenRange for a any keyspace",
"type":"array",
"items":{
"type":"token_range"
},
"nickname":"describe_any_ring",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/describe_ring/{keyspace}",
"operations":[
@@ -409,6 +398,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"table",
"description":"The name of table to fetch information about",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
@@ -436,6 +433,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Column family name",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
@@ -701,6 +706,30 @@
}
]
},
{
"path":"/storage_service/compact",
"operations":[
{
"method":"POST",
"summary":"Forces major compaction in all keyspaces",
"type":"void",
"nickname":"force_compaction",
"produces":[
"application/json"
],
"parameters":[
{
"name":"flush_memtables",
"description":"Controls flushing of memtables before compaction (true by default). Set to \"false\" to skip automatic flushing of memtables before compaction, e.g. when tables were flushed explicitly before invoking the compaction api.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/keyspace_compaction/{keyspace}",
"operations":[
@@ -715,7 +744,7 @@
"parameters":[
{
"name":"keyspace",
"description":"The keyspace to query about",
"description":"The keyspace to compact",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -728,6 +757,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"flush_memtables",
"description":"Controls flushing of memtables before compaction (true by default). Set to \"false\" to skip automatic flushing of memtables before compaction, e.g. when tables were flushed explicitly before invoking the compaction api.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
@@ -747,7 +784,7 @@
"parameters":[
{
"name":"keyspace",
"description":"The keyspace to query about",
"description":"The keyspace to clean up",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -765,6 +802,21 @@
}
]
},
{
"path":"/storage_service/cleanup_all",
"operations":[
{
"method":"POST",
"summary":"Trigger a global cleanup",
"type":"long",
"nickname":"cleanup_all",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/storage_service/keyspace_offstrategy_compaction/{keyspace}",
"operations":[
@@ -912,6 +964,21 @@
}
]
},
{
"path":"/storage_service/flush",
"operations":[
{
"method":"POST",
"summary":"Flush all memtables in all keyspaces.",
"type":"void",
"nickname":"force_flush",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/storage_service/keyspace_flush/{keyspace}",
"operations":[
@@ -1122,6 +1189,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"small_table_optimization",
"description":"If the value is the string 'true' with any capitalization, perform small table optimization. When this option is enabled, user can send the repair request to any of the nodes in the cluster. There is no need to send repair requests to multiple nodes. All token ranges for the table will be repaired automatically.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
},
@@ -1455,6 +1530,15 @@
"type":"string",
"enum": [ "all", "user", "non_local_strategy" ],
"paramType":"query"
},
{
"name":"replication",
"description":"Filter keyspaces for the replication used: vnodes or tablets (default: all)",
"required":false,
"allowMultiple":false,
"type":"string",
"enum": [ "all", "vnodes", "tablets" ],
"paramType":"query"
}
]
}
@@ -1602,7 +1686,7 @@
},
{
"method":"POST",
"summary":"allows a user to reenable thrift",
"summary":"allows a user to re-enable thrift",
"type":"void",
"nickname":"start_rpc_server",
"produces":[
@@ -2410,6 +2494,238 @@
}
]
},
{
"path":"/storage_service/tablets/move",
"operations":[
{
"nickname":"move_tablet",
"method":"POST",
"summary":"Moves a tablet replica",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ks",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"token",
"description":"Token owned by the tablet to move",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"src_host",
"description":"Source host id",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"dst_host",
"description":"Destination host id",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"src_shard",
"description":"Source shard number",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"dst_shard",
"description":"Destination shard number",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"force",
"description":"When set to true, replication strategy constraints can be broken (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/tablets/add_replica",
"operations":[
{
"nickname":"add_tablet_replica",
"method":"POST",
"summary":"Adds replica to tablet",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ks",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"token",
"description":"Token owned by the tablet to add replica to",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"dst_host",
"description":"Destination host id",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"dst_shard",
"description":"Destination shard number",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"force",
"description":"When set to true, replication strategy constraints can be broken (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/tablets/del_replica",
"operations":[
{
"nickname":"del_tablet_replica",
"method":"POST",
"summary":"Deletes replica from tablet",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ks",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"token",
"description":"Token owned by the tablet to delete replica from",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"host",
"description":"Host id to remove replica from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"shard",
"description":"Shard number to remove replica from",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"force",
"description":"When set to true, replication strategy constraints can be broken (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/tablets/balancing",
"operations":[
{
"nickname":"tablet_balancing_enable",
"method":"POST",
"summary":"Controls tablet load-balancing",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"enabled",
"description":"When set to false, tablet load balancing is disabled",
"required":true,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/metrics/total_hints",
"operations":[
@@ -2511,6 +2827,33 @@
]
}
]
},
{
"path":"/storage_service/raft_topology/upgrade",
"operations":[
{
"method":"POST",
"summary":"Trigger the upgrade to topology on raft.",
"type":"void",
"nickname":"upgrade_to_raft_topology",
"produces":[
"application/json"
],
"parameters":[
]
},
{
"method":"GET",
"summary":"Get information about the current upgrade status of topology on raft.",
"type":"string",
"nickname":"raft_topology_upgrade_status",
"produces":[
"application/json"
],
"parameters":[
]
}
]
}
],
"models":{

@@ -179,6 +179,21 @@
]
}
]
},
{
"path":"/system/dump_llvm_profile",
"operations":[
{
"method":"POST",
"summary":"Dump llvm profile data (raw profile data) that can later be used for coverage reporting or PGO (no-op if the current binary is not instrumented)",
"type":"void",
"nickname":"dump_profile",
"produces":[
"application/json"
],
"parameters":[]
}
]
}
]
}

api/api-doc/tasks.json

@@ -0,0 +1,230 @@
{
"apiVersion":"0.0.1",
"swaggerVersion":"1.2",
"basePath":"{{Protocol}}://{{Host}}",
"resourcePath":"/tasks",
"produces":[
"application/json"
],
"apis":[
{
"path":"/tasks/compaction/keyspace_compaction/{keyspace}",
"operations":[
{
"method":"POST",
"summary":"Forces major compaction of a single keyspace asynchronously, returns uuid which can be used to check progress with task manager",
"type":"string",
"nickname":"force_keyspace_compaction_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace to query about",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Comma-separated table (column family) names",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"flush_memtables",
"description":"Controls flushing of memtables before compaction (true by default). Set to \"false\" to skip automatic flushing of memtables before compaction, e.g. when tables were flushed explicitly before invoking the compaction api.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/tasks/compaction/keyspace_cleanup/{keyspace}",
"operations":[
{
"method":"POST",
"summary":"Trigger a cleanup of keys on a single keyspace asynchronously, returns uuid which can be used to check progress with task manager",
"type": "string",
"nickname":"force_keyspace_cleanup_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace to query about",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Comma-separated table (column family) names",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/tasks/compaction/keyspace_offstrategy_compaction/{keyspace}",
"operations":[
{
"method":"POST",
"summary":"Perform offstrategy compaction, if needed, in a single keyspace asynchronously, returns uuid which can be used to check progress with task manager",
"type":"string",
"nickname":"perform_keyspace_offstrategy_compaction_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace to operate on",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Comma-separated table (column family) names",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/tasks/compaction/keyspace_scrub/{keyspace}",
"operations":[
{
"method":"GET",
"summary":"Scrub (deserialize + reserialize at the latest version, resolving corruptions if any) the given keyspace asynchronously, returns uuid which can be used to check progress with task manager. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false. Scrub has the following modes: Abort (default) - abort scrub if corruption is detected; Skip (same as `skip_corrupted=true`) skip over corrupt data, omitting them from the output; Segregate - segregate data into multiple sstables if needed, such that each sstable contains data with valid order; Validate - read (no rewrite) and validate data, logging any problems found.",
"type": "string",
"nickname":"scrub_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"disable_snapshot",
"description":"When set to true, disable snapshot",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"skip_corrupted",
"description":"When set to true, skip corrupted",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"scrub_mode",
"description":"How to handle corrupt data (overrides 'skip_corrupted')",
"required":false,
"allowMultiple":false,
"type":"string",
"enum":[
"ABORT",
"SKIP",
"SEGREGATE",
"VALIDATE"
],
"paramType":"query"
},
{
"name":"quarantine_mode",
"description":"Controls whether to scrub quarantined sstables (default INCLUDE)",
"required":false,
"allowMultiple":false,
"type":"string",
"enum":[
"INCLUDE",
"EXCLUDE",
"ONLY"
],
"paramType":"query"
},
{
"name":"keyspace",
"description":"The keyspace to query about",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Comma-separated table (column family) names",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/tasks/compaction/keyspace_upgrade_sstables/{keyspace}",
"operations":[
{
"method":"GET",
"summary":"Rewrite all sstables to the latest version asynchronously. Unlike scrub, it doesn't skip bad rows and doesn't snapshot sstables first. Returns uuid which can be used to check progress with task manager.",
"type": "string",
"nickname":"upgrade_sstables_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"exclude_current_version",
"description":"When set to true, exclude current version",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"cf",
"description":"Comma-separated table (column family) names",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
}
]
}
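Reviewer note: the new `tasks.json` routes take the keyspace as a path parameter and the rest as optional query parameters. The sketch below shows how a client might assemble the request target for the asynchronous keyspace compaction route. The function name `keyspace_compaction_target` is hypothetical, and the sketch assumes the keyspace and table names need no percent-encoding. Only the path and the `cf` and `flush_memtables` parameter names come from the specification above.

```cpp
#include <string>
#include <string_view>

// Hypothetical client-side helper: builds the request target for
// POST /tasks/compaction/keyspace_compaction/{keyspace}.
// Assumption: names contain only characters that are safe in a URL.
std::string keyspace_compaction_target(std::string_view keyspace,
                                       std::string_view tables,
                                       bool flush_memtables = true) {
    std::string target = "/tasks/compaction/keyspace_compaction/";
    target += keyspace;
    char sep = '?';
    if (!tables.empty()) {
        // "cf" is a comma-separated list of table (column family) names.
        target += sep;
        target += "cf=";
        target += tables;
        sep = '&';
    }
    if (!flush_memtables) {
        // The server default is true, so the parameter is sent only to opt out.
        target += sep;
        target += "flush_memtables=false";
    }
    return target;
}
```

According to the summary in the specification, the response to such a request is a uuid that can be used to check progress with the task manager.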

@@ -11,6 +11,7 @@
#include <seastar/http/transformers.hh>
#include <seastar/http/api_docs.hh>
#include "storage_service.hh"
#include "token_metadata.hh"
#include "commitlog.hh"
#include "gossiper.hh"
#include "failure_detector.hh"
@@ -31,6 +32,8 @@
#include "api/config.hh"
#include "task_manager.hh"
#include "task_manager_test.hh"
#include "tasks.hh"
#include "raft.hh"
logging::logger apilog("api");
@@ -65,6 +68,9 @@ future<> set_server_init(http_context& ctx) {
"The system related API");
rb02->add_definitions_file(r, "metrics");
set_system(ctx, r);
rb->register_function(r, "error_injection",
"The error injection API");
set_error_injection(ctx, r);
});
}
@@ -155,6 +161,14 @@ future<> unset_server_snapshot(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_snapshot(ctx, r); });
}
future<> set_server_token_metadata(http_context& ctx, sharded<locator::shared_token_metadata>& tm) {
return ctx.http_server.set_routes([&ctx, &tm] (routes& r) { set_token_metadata(ctx, r, tm); });
}
future<> unset_server_token_metadata(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_token_metadata(ctx, r); });
}
future<> set_server_snitch(http_context& ctx, sharded<locator::snitch_ptr>& snitch) {
return register_api(ctx, "endpoint_snitch_info", "The endpoint snitch info API", [&snitch] (http_context& ctx, routes& r) {
set_endpoint_snitch(ctx, r, snitch);
@@ -172,14 +186,14 @@ future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g) {
});
}
future<> set_server_load_sstable(http_context& ctx, sharded<db::system_keyspace>& sys_ks) {
future<> set_server_column_family(http_context& ctx, sharded<db::system_keyspace>& sys_ks) {
return register_api(ctx, "column_family",
"The column family API", [&sys_ks] (http_context& ctx, routes& r) {
set_column_family(ctx, r, sys_ks);
});
}
future<> unset_server_load_sstable(http_context& ctx) {
future<> unset_server_column_family(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_column_family(ctx, r); });
}
@@ -264,9 +278,6 @@ future<> set_server_done(http_context& ctx) {
rb->register_function(r, "collectd",
"The collectd API");
set_collectd(ctx, r);
rb->register_function(r, "error_injection",
"The error injection API");
set_error_injection(ctx, r);
});
}
@@ -302,6 +313,32 @@ future<> unset_server_task_manager_test(http_context& ctx) {
#endif
future<> set_server_tasks_compaction_module(http_context& ctx, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, &ss, &snap_ctl](routes& r) {
rb->register_function(r, "tasks",
"The tasks API");
set_tasks_compaction_module(ctx, r, ss, snap_ctl);
});
}
future<> unset_server_tasks_compaction_module(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_tasks_compaction_module(ctx, r); });
}
future<> set_server_raft(http_context& ctx, sharded<service::raft_group_registry>& raft_gr) {
auto rb = std::make_shared<api_registry_builder>(ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, &raft_gr] (routes& r) {
rb->register_function(r, "raft", "The Raft API");
set_raft(ctx, r, raft_gr);
});
}
future<> unset_server_raft(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_raft(ctx, r); });
}
void req_params::process(const request& req) {
// Process mandatory parameters
for (auto& [name, ent] : params) {

@@ -14,11 +14,11 @@
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include <boost/units/detail/utility.hpp>
#include "api/api_init.hh"
#include "api/api-doc/utils.json.hh"
#include "utils/histogram.hh"
#include "utils/estimated_histogram.hh"
#include <seastar/http/exception.hh>
#include "api_init.hh"
#include "seastarx.hh"
namespace api {
@@ -26,7 +26,9 @@ namespace api {
template<class T>
std::vector<sstring> container_to_vec(const T& container) {
std::vector<sstring> res;
for (auto i : container) {
res.reserve(std::size(container));
for (const auto& i : container) {
res.push_back(fmt::to_string(i));
}
return res;
@@ -35,27 +37,31 @@ std::vector<sstring> container_to_vec(const T& container) {
template<class T>
std::vector<T> map_to_key_value(const std::map<sstring, sstring>& map) {
std::vector<T> res;
for (auto i : map) {
res.reserve(map.size());
for (const auto& [key, value] : map) {
res.push_back(T());
res.back().key = i.first;
res.back().value = i.second;
res.back().key = key;
res.back().value = value;
}
return res;
}
template<class T, class MAP>
std::vector<T>& map_to_key_value(const MAP& map, std::vector<T>& res) {
for (auto i : map) {
res.reserve(res.size() + std::size(map));
for (const auto& [key, value] : map) {
T val;
val.key = fmt::to_string(i.first);
val.value = fmt::to_string(i.second);
val.key = fmt::to_string(key);
val.value = fmt::to_string(value);
res.push_back(val);
}
return res;
}
template <typename T, typename S = T>
T map_sum(T&& dest, const S& src) {
for (auto i : src) {
for (const auto& i : src) {
dest[i.first] += i.second;
}
return std::move(dest);
@@ -64,6 +70,8 @@ T map_sum(T&& dest, const S& src) {
template <typename MAP>
std::vector<sstring> map_keys(const MAP& map) {
std::vector<sstring> res;
res.reserve(std::size(map));
for (const auto& i : map) {
res.push_back(fmt::to_string(i.first));
}
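Reviewer note: the helper changes in this file follow one pattern: reserve the result vector up front and iterate by `const auto&` (with structured bindings for maps) instead of copying each element. A standalone sketch of the pattern is shown below. It uses `std::string` in place of `sstring` and a hypothetical `key_value` struct in place of the generated JSON type, so it illustrates the idea rather than reproducing the ScyllaDB code.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the generated JSON key/value type.
struct key_value {
    std::string key;
    std::string value;
};

std::vector<key_value> map_to_key_value(const std::map<std::string, std::string>& map) {
    std::vector<key_value> res;
    // Reserve once so push_back never reallocates while the vector grows.
    res.reserve(map.size());
    // Bind by const reference: no per-entry copy of the map's pair.
    for (const auto& [key, value] : map) {
        res.push_back(key_value{key, value});
    }
    return res;
}
```

The previous form, `for (auto i : map)`, copied each key/value pair on every iteration, which matters when keys and values are strings.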

@@ -23,6 +23,7 @@ class load_meter;
class storage_proxy;
class storage_service;
class raft_group0_client;
class raft_group_registry;
} // namespace service
@@ -32,6 +33,10 @@ namespace streaming {
class stream_manager;
}
namespace gms {
class inet_address;
}
namespace locator {
class token_metadata;
@@ -73,14 +78,12 @@ struct http_context {
httpd::http_server_control http_server;
distributed<replica::database>& db;
service::load_meter& lmeter;
const sharded<locator::shared_token_metadata>& shared_token_metadata;
http_context(distributed<replica::database>& _db,
service::load_meter& _lm, const sharded<locator::shared_token_metadata>& _stm)
: db(_db), lmeter(_lm), shared_token_metadata(_stm) {
service::load_meter& _lm)
: db(_db), lmeter(_lm)
{
}
const locator::token_metadata& get_token_metadata();
};
future<> set_server_init(http_context& ctx);
@@ -103,9 +106,11 @@ future<> set_server_authorization_cache(http_context& ctx, sharded<auth::service
future<> unset_server_authorization_cache(http_context& ctx);
future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl);
future<> unset_server_snapshot(http_context& ctx);
future<> set_server_token_metadata(http_context& ctx, sharded<locator::shared_token_metadata>& tm);
future<> unset_server_token_metadata(http_context& ctx);
future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g);
future<> set_server_load_sstable(http_context& ctx, sharded<db::system_keyspace>& sys_ks);
future<> unset_server_load_sstable(http_context& ctx);
future<> set_server_column_family(http_context& ctx, sharded<db::system_keyspace>& sys_ks);
future<> unset_server_column_family(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);
future<> unset_server_messaging_service(http_context& ctx);
future<> set_server_storage_proxy(http_context& ctx, sharded<service::storage_proxy>& proxy);
@@ -122,5 +127,9 @@ future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>
future<> unset_server_task_manager(http_context& ctx);
future<> set_server_task_manager_test(http_context& ctx, sharded<tasks::task_manager>& tm);
future<> unset_server_task_manager_test(http_context& ctx);
future<> set_server_tasks_compaction_module(http_context& ctx, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl);
future<> unset_server_tasks_compaction_module(http_context& ctx);
future<> set_server_raft(http_context&, sharded<service::raft_group_registry>&);
future<> unset_server_raft(http_context&);
}

@@ -9,8 +9,6 @@
#include "api/api-doc/authorization_cache.json.hh"
#include "api/authorization_cache.hh"
#include "api/api.hh"
#include "auth/common.hh"
#include "auth/service.hh"
namespace api {

@@ -197,7 +197,7 @@ void set_cache_service(http_context& ctx, routes& r) {
cs::get_row_capacity.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {
return db.row_cache_tracker().region().occupancy().used_space();
return memory::stats().total_memory();
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);
});
@@ -240,9 +240,9 @@ void set_cache_service(http_context& ctx, routes& r) {
cs::get_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
// In origin row size is the weighted size.
// We currently do not support weights, so we use num entries instead
// We currently do not support weights, so we use raw size in bytes instead
return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {
return db.row_cache_tracker().partitions();
return db.row_cache_tracker().region().occupancy().used_space();
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);
});

@@ -10,9 +10,9 @@
#include "api/api-doc/collectd.json.hh"
#include <seastar/core/scollectd.hh>
#include <seastar/core/scollectd_api.hh>
#include "endian.h"
#include <boost/range/irange.hpp>
#include <regex>
#include "api/api_init.hh"
namespace api {

@@ -8,10 +8,13 @@
#pragma once
#include "api.hh"
namespace seastar::httpd {
class routes;
}
namespace api {
void set_collectd(http_context& ctx, httpd::routes& r);
struct http_context;
void set_collectd(http_context& ctx, seastar::httpd::routes& r);
}

@@ -7,6 +7,7 @@
*/
#include "column_family.hh"
#include "api/api.hh"
#include "api/api-doc/column_family.json.hh"
#include <vector>
#include <seastar/http/exception.hh>
@@ -306,7 +307,9 @@ ratio_holder filter_recent_false_positive_as_ratio_holder(const sstables::shared
void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace>& sys_ks) {
cf::get_column_family_name.set(r, [&ctx] (const_req req){
std::vector<sstring> res;
ctx.db.local().get_tables_metadata().for_each_table_id([&] (const std::pair<sstring, sstring>& kscf, table_id) {
const replica::database::tables_metadata& meta = ctx.db.local().get_tables_metadata();
res.reserve(meta.size());
meta.for_each_table_id([&] (const std::pair<sstring, sstring>& kscf, table_id) {
res.push_back(kscf.first + ":" + kscf.second);
});
return res;
@@ -326,8 +329,10 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::get_column_family_name_keyspace.set(r, [&ctx] (const_req req){
std::vector<sstring> res;
for (auto i = ctx.db.local().get_keyspaces().cbegin(); i!= ctx.db.local().get_keyspaces().cend(); i++) {
res.push_back(i->first);
const flat_hash_map<sstring, replica::keyspace>& keyspaces = ctx.db.local().get_keyspaces();
res.reserve(keyspaces.size());
for (const auto& i : keyspaces) {
res.push_back(i.first);
}
return res;
});
@@ -1047,12 +1052,19 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::force_major_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
if (req->get_query_param("split_output") != "") {
auto params = req_params({
std::pair("name", mandatory::yes),
std::pair("flush_memtables", mandatory::no),
std::pair("split_output", mandatory::no),
});
params.process(*req);
if (params.get("split_output")) {
fail(unimplemented::cause::API);
}
auto [ks, cf] = parse_fully_qualified_cf_name(*params.get("name"));
auto flush = params.get_as<bool>("flush_memtables").value_or(true);
apilog.info("column_family/force_major_compaction: name={} flush={}", req->param["name"], flush);
apilog.info("column_family/force_major_compaction: name={}", req->param["name"]);
auto [ks, cf] = parse_fully_qualified_cf_name(req->param["name"]);
auto keyspace = validate_keyspace(ctx, ks);
std::vector<table_info> table_infos = {table_info{
.name = cf,
@@ -1060,7 +1072,11 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
}};
auto& compaction_module = ctx.db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), ctx.db, std::move(table_infos));
std::optional<flush_mode> fmopt;
if (!flush) {
fmopt = flush_mode::skip;
}
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), ctx.db, std::move(table_infos), fmopt);
co_await task->done();
co_return json_void();
});
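Reviewer note: the handler above reads the optional `flush_memtables` parameter, defaults it to true, and passes an explicit flush mode to the task only when flushing is disabled. A self-contained sketch of that mapping follows. The `flush_mode` enum here is a stand-in: only the `skip` enumerator appears in the diff, the other enumerator is an assumption, and `flush_mode_from_param` is a hypothetical name.

```cpp
#include <optional>

// Stand-in for the compaction task flush mode. Only `skip` is visible in the
// diff; `all_tables` is an assumed placeholder for the remaining modes.
enum class flush_mode { skip, all_tables };

// Hypothetical helper mirroring the handler logic: an absent parameter means
// "flush" (the default), and only an explicit false selects flush_mode::skip.
std::optional<flush_mode> flush_mode_from_param(std::optional<bool> flush_memtables) {
    const bool flush = flush_memtables.value_or(true);
    if (!flush) {
        return flush_mode::skip;
    }
    // Leave the choice to the task's own default.
    return std::nullopt;
}
```

Returning an empty optional in the default case keeps the decision about what to flush inside the task, so the API layer only overrides it when the caller asked to skip flushing.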

@@ -8,11 +8,11 @@
#pragma once
#include "api.hh"
#include "api/api-doc/column_family.json.hh"
#include "replica/database.hh"
#include <seastar/core/future-util.hh>
#include <seastar/json/json_elements.hh>
#include <any>
#include "api/api_init.hh"
namespace db {
class system_keyspace;

@@ -9,6 +9,7 @@
#include "commitlog.hh"
#include "db/commitlog/commitlog.hh"
#include "api/api-doc/commitlog.json.hh"
#include "api/api_init.hh"
#include "replica/database.hh"
#include <vector>
@@ -16,7 +17,7 @@ namespace api {
using namespace seastar::httpd;
template<typename T>
static auto acquire_cl_metric(http_context& ctx, std::function<T (db::commitlog*)> func) {
static auto acquire_cl_metric(http_context& ctx, std::function<T (const db::commitlog*)> func) {
typedef T ret_type;
return ctx.db.map_reduce0([func = std::move(func)](replica::database& db) {
@@ -62,6 +63,9 @@ void set_commitlog(http_context& ctx, routes& r) {
httpd::commitlog_json::get_total_commit_log_size.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));
});
httpd::commitlog_json::get_max_disk_size.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::disk_limit, std::placeholders::_1));
});
}
}

@@ -8,10 +8,12 @@
#pragma once
#include "api.hh"
namespace seastar::httpd {
class routes;
}
namespace api {
void set_commitlog(http_context& ctx, httpd::routes& r);
struct http_context;
void set_commitlog(http_context& ctx, seastar::httpd::routes& r);
}

@@ -10,6 +10,7 @@
#include "compaction_manager.hh"
#include "compaction/compaction_manager.hh"
#include "api/api.hh"
#include "api/api-doc/compaction_manager.json.hh"
#include "db/system_keyspace.hh"
#include "column_family.hh"
@@ -50,7 +51,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {
for (const auto& c : cm.get_compactions()) {
cm::summary s;
s.id = c.compaction_uuid.to_sstring();
s.id = fmt::to_string(c.compaction_uuid);
s.ks = c.ks_name;
s.cf = c.cf_name;
s.unit = "keys";
@@ -115,9 +116,9 @@ void set_compaction_manager(http_context& ctx, routes& r) {
table_names = map_keys(ctx.db.local().find_keyspace(ks_name).metadata().get()->cf_meta_data());
}
auto type = req->get_query_param("type");
co_await ctx.db.invoke_on_all([&ks_name, &table_names, type] (replica::database& db) {
co_await ctx.db.invoke_on_all([&] (replica::database& db) {
auto& cm = db.get_compaction_manager();
return parallel_for_each(table_names, [&db, &cm, &ks_name, type] (sstring& table_name) {
return parallel_for_each(table_names, [&] (sstring& table_name) {
auto& t = db.find_column_family(ks_name, table_name);
return t.parallel_foreach_table_state([&] (compaction::table_state& ts) {
return cm.stop_compaction(type, &ts);
@@ -157,7 +158,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {
return s.write("[").then([&ctx, &s, &first] {
return ctx.db.local().get_compaction_manager().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable {
cm::history h;
h.id = entry.id.to_sstring();
h.id = fmt::to_string(entry.id);
h.ks = std::move(entry.ks);
h.cf = std::move(entry.cf);
h.compacted_at = entry.compacted_at;

@@ -8,10 +8,12 @@
#pragma once
#include "api.hh"
namespace seastar::httpd {
class routes;
}
namespace api {
void set_compaction_manager(http_context& ctx, httpd::routes& r);
struct http_context;
void set_compaction_manager(http_context& ctx, seastar::httpd::routes& r);
}

@@ -11,6 +11,7 @@
#include "db/config.hh"
#include <sstream>
#include <boost/algorithm/string/replace.hpp>
#include <seastar/http/exception.hh>
namespace api {
using namespace seastar::httpd;

@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api/api_init.hh"
#include <seastar/http/api_docs.hh>
namespace api {

@@ -6,45 +6,15 @@
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include "locator/token_metadata.hh"
#include "locator/snitch_base.hh"
#include "locator/production_snitch_base.hh"
#include "endpoint_snitch.hh"
#include "api/api-doc/endpoint_snitch_info.json.hh"
#include "api/api-doc/storage_service.json.hh"
#include "utils/fb_utilities.hh"
namespace api {
using namespace seastar::httpd;
void set_endpoint_snitch(http_context& ctx, routes& r, sharded<locator::snitch_ptr>& snitch) {
static auto host_or_broadcast = [](const_req req) {
auto host = req.get_query_param("host");
return host.empty() ? gms::inet_address(utils::fb_utilities::get_broadcast_address()) : gms::inet_address(host);
};
httpd::endpoint_snitch_info_json::get_datacenter.set(r, [&ctx](const_req req) {
auto& topology = ctx.shared_token_metadata.local().get()->get_topology();
auto ep = host_or_broadcast(req);
if (!topology.has_endpoint(ep)) {
// Cannot return error here, nodetool status can race, request
// info about just-left node and not handle it nicely
return locator::endpoint_dc_rack::default_location.dc;
}
return topology.get_datacenter(ep);
});
httpd::endpoint_snitch_info_json::get_rack.set(r, [&ctx](const_req req) {
auto& topology = ctx.shared_token_metadata.local().get()->get_topology();
auto ep = host_or_broadcast(req);
if (!topology.has_endpoint(ep)) {
// Cannot return error here, nodetool status can race, request
// info about just-left node and not handle it nicely
return locator::endpoint_dc_rack::default_location.rack;
}
return topology.get_rack(ep);
});
httpd::endpoint_snitch_info_json::get_snitch_name.set(r, [&snitch] (const_req req) {
return snitch.local()->get_name();
});
@@ -60,8 +30,6 @@ void set_endpoint_snitch(http_context& ctx, routes& r, sharded<locator::snitch_p
}
void unset_endpoint_snitch(http_context& ctx, routes& r) {
httpd::endpoint_snitch_info_json::get_datacenter.unset(r);
httpd::endpoint_snitch_info_json::get_rack.unset(r);
httpd::endpoint_snitch_info_json::get_snitch_name.unset(r);
httpd::storage_service_json::update_snitch.unset(r);
}

@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api/api_init.hh"
namespace locator {
class snitch_ptr;

@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api/api_init.hh"
namespace api {

@@ -7,6 +7,7 @@
*/
#include "failure_detector.hh"
#include "api/api.hh"
#include "api/api-doc/failure_detector.json.hh"
#include "gms/application_state.hh"
#include "gms/gossiper.hh"
@@ -18,37 +19,43 @@ namespace fd = httpd::failure_detector_json;
void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
fd::get_all_endpoint_states.set(r, [&g](std::unique_ptr<request> req) {
std::vector<fd::endpoint_state> res;
res.reserve(g.num_endpoints());
g.for_each_endpoint_state([&] (const gms::inet_address& addr, const gms::endpoint_state& eps) {
fd::endpoint_state val;
val.addrs = fmt::to_string(addr);
val.is_alive = g.is_alive(addr);
val.generation = eps.get_heart_beat_state().get_generation().value();
val.version = eps.get_heart_beat_state().get_heart_beat_version().value();
val.update_time = eps.get_update_timestamp().time_since_epoch().count();
for (const auto& [as_type, app_state] : eps.get_application_state_map()) {
fd::version_value version_val;
// We return the enum index and not its name to stay compatible with origin:
// the state indexes are static, but the names can change.
version_val.application_state = static_cast<std::underlying_type<gms::application_state>::type>(as_type);
version_val.value = app_state.value();
version_val.version = app_state.version().value();
val.application_state.push(version_val);
}
res.emplace_back(std::move(val));
return g.container().invoke_on(0, [] (gms::gossiper& g) {
std::vector<fd::endpoint_state> res;
res.reserve(g.num_endpoints());
g.for_each_endpoint_state([&] (const gms::inet_address& addr, const gms::endpoint_state& eps) {
fd::endpoint_state val;
val.addrs = fmt::to_string(addr);
val.is_alive = g.is_alive(addr);
val.generation = eps.get_heart_beat_state().get_generation().value();
val.version = eps.get_heart_beat_state().get_heart_beat_version().value();
val.update_time = eps.get_update_timestamp().time_since_epoch().count();
for (const auto& [as_type, app_state] : eps.get_application_state_map()) {
fd::version_value version_val;
// We return the enum index and not its name to stay compatible with origin:
// the state indexes are static, but the names can change.
version_val.application_state = static_cast<std::underlying_type<gms::application_state>::type>(as_type);
version_val.value = app_state.value();
version_val.version = app_state.version().value();
val.application_state.push(version_val);
}
res.emplace_back(std::move(val));
});
return make_ready_future<json::json_return_type>(res);
});
return make_ready_future<json::json_return_type>(res);
});
fd::get_up_endpoint_count.set(r, [&g](std::unique_ptr<request> req) {
int res = g.get_up_endpoint_count();
return make_ready_future<json::json_return_type>(res);
return g.container().invoke_on(0, [] (gms::gossiper& g) {
int res = g.get_up_endpoint_count();
return make_ready_future<json::json_return_type>(res);
});
});
fd::get_down_endpoint_count.set(r, [&g](std::unique_ptr<request> req) {
int res = g.get_down_endpoint_count();
return make_ready_future<json::json_return_type>(res);
return g.container().invoke_on(0, [] (gms::gossiper& g) {
int res = g.get_down_endpoint_count();
return make_ready_future<json::json_return_type>(res);
});
});
fd::get_phi_convict_threshold.set(r, [] (std::unique_ptr<request> req) {
@@ -56,11 +63,13 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
});
fd::get_simple_states.set(r, [&g] (std::unique_ptr<request> req) {
std::map<sstring, sstring> nodes_status;
g.for_each_endpoint_state([&] (const gms::inet_address& node, const gms::endpoint_state&) {
nodes_status.emplace(node.to_sstring(), g.is_alive(node) ? "UP" : "DOWN");
return g.container().invoke_on(0, [] (gms::gossiper& g) {
std::map<sstring, sstring> nodes_status;
g.for_each_endpoint_state([&] (const gms::inet_address& node, const gms::endpoint_state&) {
nodes_status.emplace(node.to_sstring(), g.is_alive(node) ? "UP" : "DOWN");
});
return make_ready_future<json::json_return_type>(map_to_key_value<fd::mapper>(nodes_status));
});
return make_ready_future<json::json_return_type>(map_to_key_value<fd::mapper>(nodes_status));
});
fd::set_phi_convict_threshold.set(r, [](std::unique_ptr<request> req) {
@@ -71,13 +80,15 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
});
fd::get_endpoint_state.set(r, [&g] (std::unique_ptr<request> req) {
auto state = g.get_endpoint_state_ptr(gms::inet_address(req->param["addr"]));
if (!state) {
return make_ready_future<json::json_return_type>(format("unknown endpoint {}", req->param["addr"]));
}
std::stringstream ss;
g.append_endpoint_state(ss, *state);
return make_ready_future<json::json_return_type>(sstring(ss.str()));
return g.container().invoke_on(0, [req = std::move(req)] (gms::gossiper& g) {
auto state = g.get_endpoint_state_ptr(gms::inet_address(req->param["addr"]));
if (!state) {
return make_ready_future<json::json_return_type>(format("unknown endpoint {}", req->param["addr"]));
}
std::stringstream ss;
g.append_endpoint_state(ss, *state);
return make_ready_future<json::json_return_type>(sstring(ss.str()));
});
});
fd::get_endpoint_phi_values.set(r, [](std::unique_ptr<request> req) {


@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api_init.hh"
namespace gms {


@@ -12,6 +12,7 @@
#include "api/api-doc/gossiper.json.hh"
#include "gms/endpoint_state.hh"
#include "gms/gossiper.hh"
#include "api/api.hh"
namespace api {
using namespace seastar::httpd;


@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api/api_init.hh"
namespace gms {


@@ -6,10 +6,10 @@
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include <algorithm>
#include <vector>
#include "hinted_handoff.hh"
#include "api/api.hh"
#include "api/api-doc/hinted_handoff.json.hh"
#include "gms/inet_address.hh"


@@ -9,7 +9,7 @@
#pragma once
#include <seastar/core/sharded.hh>
#include "api.hh"
#include "api/api_init.hh"
namespace service { class storage_proxy; }


@@ -8,12 +8,10 @@
#include "api/api-doc/lsa.json.hh"
#include "api/lsa.hh"
#include "api/api.hh"
#include <seastar/http/exception.hh>
#include "utils/logalloc.hh"
#include "log.hh"
#include "replica/database.hh"
namespace api {
using namespace seastar::httpd;
@@ -21,9 +19,9 @@ using namespace seastar::httpd;
static logging::logger alogger("lsa-api");
void set_lsa(http_context& ctx, routes& r) {
httpd::lsa_json::lsa_compact.set(r, [&ctx](std::unique_ptr<request> req) {
httpd::lsa_json::lsa_compact.set(r, [](std::unique_ptr<request> req) {
alogger.info("Triggering compaction");
return ctx.db.invoke_on_all([] (replica::database&) {
return smp::invoke_on_all([] {
logalloc::shard_tracker().reclaim(std::numeric_limits<size_t>::max());
}).then([] {
return json::json_return_type(json::json_void());


@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api/api_init.hh"
namespace api {


@@ -10,8 +10,8 @@
#include "message/messaging_service.hh"
#include <seastar/rpc/rpc_types.hh>
#include "api/api-doc/messaging_service.json.hh"
#include <iostream>
#include <sstream>
#include "api/api-doc/error_injection.json.hh"
#include "api/api.hh"
using namespace seastar::httpd;
using namespace httpd::messaging_service_json;
@@ -19,6 +19,8 @@ using namespace netw;
namespace api {
namespace hf = httpd::error_injection_json;
using shard_info = messaging_service::shard_info;
using msg_addr = messaging_service::msg_addr;
@@ -112,7 +114,7 @@ void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging
}));
get_version.set(r, [&ms](const_req req) {
return ms.local().get_raw_version(req.get_query_param("addr"));
return ms.local().get_raw_version(gms::inet_address(req.get_query_param("addr")));
});
get_dropped_messages_by_ver.set(r, [&ms](std::unique_ptr<request> req) {
@@ -142,6 +144,14 @@ void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging
return make_ready_future<json::json_return_type>(res);
});
});
hf::inject_disconnect.set(r, [&ms] (std::unique_ptr<request> req) -> future<json::json_return_type> {
auto ip = msg_addr(req->param["ip"]);
co_await ms.invoke_on_all([ip] (netw::messaging_service& ms) {
ms.remove_rpc_client(ip);
});
co_return json::json_void();
});
}
void unset_messaging_service(http_context& ctx, routes& r) {
@@ -155,6 +165,7 @@ void unset_messaging_service(http_context& ctx, routes& r) {
get_respond_completed_messages.unset(r);
get_version.unset(r);
get_dropped_messages_by_ver.unset(r);
hf::inject_disconnect.unset(r);
}
}


@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api/api_init.hh"
namespace netw { class messaging_service; }

84
api/raft.cc Normal file

@@ -0,0 +1,84 @@
/*
* Copyright (C) 2024-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include <seastar/core/coroutine.hh>
#include "api/api.hh"
#include "api/api-doc/raft.json.hh"
#include "service/raft/raft_group_registry.hh"
using namespace seastar::httpd;
extern logging::logger apilog;
namespace api {
namespace r = httpd::raft_json;
using namespace json;
void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_registry>& raft_gr) {
r::trigger_snapshot.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {
raft::group_id gid{utils::UUID{req->param["group_id"]}};
auto timeout_dur = std::invoke([timeout_str = req->get_query_param("timeout")] {
if (timeout_str.empty()) {
return std::chrono::seconds{60};
}
auto dur = std::stoll(timeout_str);
if (dur <= 0) {
throw std::runtime_error{"Timeout must be a positive number."};
}
return std::chrono::seconds{dur};
});
std::atomic<bool> found_srv{false};
co_await raft_gr.invoke_on_all([gid, timeout_dur, &found_srv] (service::raft_group_registry& raft_gr) -> future<> {
auto* srv = raft_gr.find_server(gid);
if (!srv) {
co_return;
}
found_srv = true;
abort_on_expiry aoe(lowres_clock::now() + timeout_dur);
apilog.info("Triggering Raft group {} snapshot", gid);
auto result = co_await srv->trigger_snapshot(&aoe.abort_source());
if (result) {
apilog.info("New snapshot for Raft group {} created", gid);
} else {
apilog.info("Could not create new snapshot for Raft group {}, no new entries applied", gid);
}
});
if (!found_srv) {
throw std::runtime_error{fmt::format("Server for group ID {} not found", gid)};
}
co_return json_void{};
});
r::get_leader_host.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {
return smp::submit_to(0, [&] {
auto& srv = std::invoke([&] () -> raft::server& {
if (req->query_parameters.contains("group_id")) {
raft::group_id id{utils::UUID{req->get_query_param("group_id")}};
return raft_gr.local().get_server(id);
} else {
return raft_gr.local().group0();
}
});
return json_return_type(srv.current_leader().to_sstring());
});
});
}
void unset_raft(http_context&, httpd::routes& r) {
r::trigger_snapshot.unset(r);
r::get_leader_host.unset(r);
}
}

18
api/raft.hh Normal file

@@ -0,0 +1,18 @@
/*
* Copyright (C) 2023-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#pragma once
#include "api_init.hh"
namespace api {
void set_raft(http_context& ctx, httpd::routes& r, sharded<service::raft_group_registry>& raft_gr);
void unset_raft(http_context& ctx, httpd::routes& r);
}

18
api/scrub_status.hh Normal file

@@ -0,0 +1,18 @@
/*
* Copyright (C) 2023-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
namespace api {
enum class scrub_status {
successful = 0,
aborted,
unable_to_cancel, // Not used in Scylla, included to ensure compatibility with nodetool api.
validation_errors,
};
} // namespace api


@@ -8,6 +8,7 @@
#include "storage_proxy.hh"
#include "service/storage_proxy.hh"
#include "api/api.hh"
#include "api/api-doc/storage_proxy.json.hh"
#include "api/api-doc/utils.json.hh"
#include "db/config.hh"
@@ -27,7 +28,7 @@ utils::time_estimated_histogram timed_rate_moving_average_summary_merge(utils::t
}
/**
* This function implement a two dimentional map reduce where
* This function implement a two dimensional map reduce where
* the first level is a distributed storage_proxy class and the
* second level is the stats per scheduling group class.
* @param d - a reference to the storage_proxy distributed class.
@@ -48,7 +49,7 @@ future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,
}
/**
* This function implement a two dimentional map reduce where
* This function implement a two dimensional map reduce where
* the first level is a distributed storage_proxy class and the
* second level is the stats per scheduling group class.
* @param d - a reference to the storage_proxy distributed class.


@@ -9,7 +9,7 @@
#pragma once
#include <seastar/core/sharded.hh>
#include "api.hh"
#include "api/api_init.hh"
namespace service { class storage_proxy; }


@@ -7,8 +7,11 @@
*/
#include "storage_service.hh"
#include "api/api.hh"
#include "api/api-doc/column_family.json.hh"
#include "api/api-doc/storage_service.json.hh"
#include "api/api-doc/storage_proxy.json.hh"
#include "api/scrub_status.hh"
#include "db/config.hh"
#include "db/schema_tables.hh"
#include "utils/hash.hh"
@@ -16,11 +19,14 @@
#include <sstream>
#include <time.h>
#include <algorithm>
#include <functional>
#include <iterator>
#include <boost/range/adaptor/map.hpp>
#include <boost/range/adaptor/filtered.hpp>
#include <boost/algorithm/string/trim_all.hpp>
#include <boost/algorithm/string/case_conv.hpp>
#include <boost/functional/hash.hpp>
#include "service/raft/raft_group0_client.hh"
#include "service/storage_service.hh"
#include "service/load_meter.hh"
#include "db/commitlog/commitlog.hh"
@@ -35,6 +41,7 @@
#include "log.hh"
#include "release.hh"
#include "compaction/compaction_manager.hh"
#include "compaction/task_manager_module.hh"
#include "sstables/sstables.hh"
#include "replica/database.hh"
#include "db/extensions.hh"
@@ -58,26 +65,66 @@ namespace ss = httpd::storage_service_json;
namespace sp = httpd::storage_proxy_json;
using namespace json;
sstring validate_keyspace(http_context& ctx, sstring ks_name) {
sstring validate_keyspace(const http_context& ctx, sstring ks_name) {
if (ctx.db.local().has_keyspace(ks_name)) {
return ks_name;
}
throw bad_param_exception(replica::no_such_keyspace(ks_name).what());
}
sstring validate_keyspace(http_context& ctx, const parameters& param) {
sstring validate_keyspace(const http_context& ctx, const parameters& param) {
return validate_keyspace(ctx, param["keyspace"]);
}
static void validate_table(const http_context& ctx, sstring ks_name, sstring table_name) {
auto& db = ctx.db.local();
try {
db.find_column_family(ks_name, table_name);
} catch (replica::no_such_column_family& e) {
throw bad_param_exception(e.what());
}
}
static void ensure_tablets_disabled(const http_context& ctx, const sstring& ks_name, const sstring& api_endpoint_path) {
if (ctx.db.local().find_keyspace(ks_name).uses_tablets()) {
throw bad_param_exception{fmt::format("{} is per-table in keyspace '{}'. Please provide table name using 'cf' parameter.", api_endpoint_path, ks_name)};
}
}
static bool any_of_keyspaces_use_tablets(const http_context& ctx) {
auto& db = ctx.db.local();
auto uses_tablets = [&db](const auto& ks_name) {
return db.find_keyspace(ks_name).uses_tablets();
};
auto keyspaces = db.get_all_keyspaces();
return std::any_of(std::begin(keyspaces), std::end(keyspaces), uses_tablets);
}
locator::host_id validate_host_id(const sstring& param) {
auto hoep = locator::host_id_or_endpoint(param, locator::host_id_or_endpoint::param_type::host_id);
return hoep.id;
return hoep.id();
}
bool validate_bool(const sstring& param) {
if (param == "true") {
return true;
} else if (param == "false") {
return false;
} else {
throw std::runtime_error("Parameter must be either 'true' or 'false'");
}
}
static
int64_t validate_int(const sstring& param) {
return std::atoll(param.c_str());
}
// splits a request parameter assumed to hold a comma-separated list of table names
// verify that the tables are found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective no_such_column_family error.
std::vector<sstring> parse_tables(const sstring& ks_name, http_context& ctx, sstring value) {
std::vector<sstring> parse_tables(const sstring& ks_name, const http_context& ctx, sstring value) {
if (value.empty()) {
return map_keys(ctx.db.local().find_keyspace(ks_name).metadata().get()->cf_meta_data());
}
@@ -92,7 +139,7 @@ std::vector<sstring> parse_tables(const sstring& ks_name, http_context& ctx, sst
return names;
}
std::vector<sstring> parse_tables(const sstring& ks_name, http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name) {
std::vector<sstring> parse_tables(const sstring& ks_name, const http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name) {
auto it = query_params.find(param_name);
if (it == query_params.end()) {
return {};
@@ -100,7 +147,7 @@ std::vector<sstring> parse_tables(const sstring& ks_name, http_context& ctx, con
return parse_tables(ks_name, ctx, it->second);
}
std::vector<table_info> parse_table_infos(const sstring& ks_name, http_context& ctx, sstring value) {
std::vector<table_info> parse_table_infos(const sstring& ks_name, const http_context& ctx, sstring value) {
std::vector<table_info> res;
try {
if (value.empty()) {
@@ -125,30 +172,11 @@ std::vector<table_info> parse_table_infos(const sstring& ks_name, http_context&
return res;
}
std::vector<table_info> parse_table_infos(const sstring& ks_name, http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name) {
std::vector<table_info> parse_table_infos(const sstring& ks_name, const http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name) {
auto it = query_params.find(param_name);
return parse_table_infos(ks_name, ctx, it != query_params.end() ? it->second : "");
}
// Run on all tables, skipping dropped tables
future<> run_on_existing_tables(sstring op, replica::database& db, std::string_view keyspace, const std::vector<table_info> local_tables, std::function<future<> (replica::table&)> func) {
std::exception_ptr ex;
for (const auto& ti : local_tables) {
apilog.debug("Starting {} on {}.{}", op, keyspace, ti);
try {
co_await func(db.find_column_family(ti.id));
} catch (const replica::no_such_column_family& e) {
apilog.warn("Skipping {} of {}.{}: {}", op, keyspace, ti, e.what());
} catch (...) {
ex = std::current_exception();
apilog.error("Failed {} of {}.{}: {}", op, keyspace, ti, ex);
}
if (ex) {
co_await coroutine::return_exception_ptr(std::move(ex));
}
}
}
static ss::token_range token_range_endpoints_to_json(const dht::token_range_endpoints& d) {
ss::token_range r;
r.start_token = d._start_token;
@@ -249,18 +277,86 @@ future<json::json_return_type> set_tables_tombstone_gc(http_context& ctx, const
});
}
future<scrub_info> parse_scrub_options(const http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl, std::unique_ptr<http::request> req) {
scrub_info info;
auto rp = req_params({
{"keyspace", {mandatory::yes}},
{"cf", {""}},
{"scrub_mode", {}},
{"skip_corrupted", {}},
{"disable_snapshot", {}},
{"quarantine_mode", {}},
});
rp.process(*req);
info.keyspace = validate_keyspace(ctx, *rp.get("keyspace"));
info.column_families = parse_tables(info.keyspace, ctx, *rp.get("cf"));
auto scrub_mode_opt = rp.get("scrub_mode");
auto scrub_mode = sstables::compaction_type_options::scrub::mode::abort;
if (!scrub_mode_opt) {
const auto skip_corrupted = rp.get_as<bool>("skip_corrupted").value_or(false);
if (skip_corrupted) {
scrub_mode = sstables::compaction_type_options::scrub::mode::skip;
}
} else {
auto scrub_mode_str = *scrub_mode_opt;
if (scrub_mode_str == "ABORT") {
scrub_mode = sstables::compaction_type_options::scrub::mode::abort;
} else if (scrub_mode_str == "SKIP") {
scrub_mode = sstables::compaction_type_options::scrub::mode::skip;
} else if (scrub_mode_str == "SEGREGATE") {
scrub_mode = sstables::compaction_type_options::scrub::mode::segregate;
} else if (scrub_mode_str == "VALIDATE") {
scrub_mode = sstables::compaction_type_options::scrub::mode::validate;
} else {
throw httpd::bad_param_exception(fmt::format("Unknown argument for 'scrub_mode' parameter: {}", scrub_mode_str));
}
}
if (!req_param<bool>(*req, "disable_snapshot", false)) {
auto tag = format("pre-scrub-{:d}", db_clock::now().time_since_epoch().count());
co_await coroutine::parallel_for_each(info.column_families, [&snap_ctl, keyspace = info.keyspace, tag](sstring cf) {
// We always pass here db::snapshot_ctl::snap_views::no since:
// 1. When scrubbing particular tables, there's no need to auto-snapshot their views.
// 2. When scrubbing the whole keyspace, column_families will contain both base tables and views.
return snap_ctl.local().take_column_family_snapshot(keyspace, cf, tag, db::snapshot_ctl::snap_views::no, db::snapshot_ctl::skip_flush::no);
});
}
info.opts = {
.operation_mode = scrub_mode,
};
const sstring quarantine_mode_str = req_param<sstring>(*req, "quarantine_mode", "INCLUDE");
if (quarantine_mode_str == "INCLUDE") {
info.opts.quarantine_operation_mode = sstables::compaction_type_options::scrub::quarantine_mode::include;
} else if (quarantine_mode_str == "EXCLUDE") {
info.opts.quarantine_operation_mode = sstables::compaction_type_options::scrub::quarantine_mode::exclude;
} else if (quarantine_mode_str == "ONLY") {
info.opts.quarantine_operation_mode = sstables::compaction_type_options::scrub::quarantine_mode::only;
} else {
throw httpd::bad_param_exception(fmt::format("Unknown argument for 'quarantine_mode' parameter: {}", quarantine_mode_str));
}
co_return info;
}
void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl) {
ss::start_native_transport.set(r, [&ctl](std::unique_ptr<http::request> req) {
ss::start_native_transport.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {
return smp::submit_to(0, [&] {
return ctl.start_server();
return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {
return ctl.start_server();
});
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::stop_native_transport.set(r, [&ctl](std::unique_ptr<http::request> req) {
ss::stop_native_transport.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {
return smp::submit_to(0, [&] {
return ctl.request_stop_server();
return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {
return ctl.request_stop_server();
});
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -282,17 +378,21 @@ void unset_transport_controller(http_context& ctx, routes& r) {
}
void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl) {
ss::stop_rpc_server.set(r, [&ctl](std::unique_ptr<http::request> req) {
ss::stop_rpc_server.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {
return smp::submit_to(0, [&] {
return ctl.request_stop_server();
return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {
return ctl.request_stop_server();
});
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::start_rpc_server.set(r, [&ctl](std::unique_ptr<http::request> req) {
ss::start_rpc_server.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {
return smp::submit_to(0, [&] {
return ctl.start_server();
return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {
return ctl.start_server();
});
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
@@ -315,9 +415,22 @@ void unset_rpc_controller(http_context& ctx, routes& r) {
void set_repair(http_context& ctx, routes& r, sharded<repair_service>& repair) {
ss::repair_async.set(r, [&ctx, &repair](std::unique_ptr<http::request> req) {
static std::vector<sstring> options = {"primaryRange", "parallelism", "incremental",
static std::unordered_set<sstring> options = {"primaryRange", "parallelism", "incremental",
"jobThreads", "ranges", "columnFamilies", "dataCenters", "hosts", "ignore_nodes", "trace",
"startToken", "endToken", "ranges_parallelism"};
"startToken", "endToken", "ranges_parallelism", "small_table_optimization"};
// Nodetool still sends those unsupported options. Ignore them to avoid failing nodetool repair.
static std::unordered_set<sstring> legacy_options_to_ignore = {"pullRepair", "ignoreUnreplicatedKeyspaces"};
for (auto& x : req->query_parameters) {
if (legacy_options_to_ignore.contains(x.first)) {
continue;
}
if (!options.contains(x.first)) {
return make_exception_future<json::json_return_type>(
httpd::bad_param_exception(format("option {} is not supported", x.first)));
}
}
std::unordered_map<sstring, sstring> options_map;
for (auto o : options) {
auto s = req->get_query_param(o);
@@ -347,7 +460,7 @@ void set_repair(http_context& ctx, routes& r, sharded<repair_service>& repair) {
.then_wrapped([] (future<repair_status>&& fut) {
ss::ns_repair_async_status::return_type_wrapper res;
try {
res = fut.get0();
res = fut.get();
} catch(std::runtime_error& e) {
throw httpd::bad_param_exception(e.what());
}
@@ -380,7 +493,7 @@ void set_repair(http_context& ctx, routes& r, sharded<repair_service>& repair) {
.then_wrapped([] (future<repair_status>&& fut) {
ss::ns_repair_async_status::return_type_wrapper res;
try {
res = fut.get0();
res = fut.get();
} catch (std::exception& e) {
return make_exception_future<json::json_return_type>(httpd::bad_param_exception(e.what()));
}
@@ -462,25 +575,11 @@ static future<json::json_return_type> describe_ring_as_json(sharded<service::sto
co_return json::json_return_type(stream_range_as_array(co_await ss.local().describe_ring(keyspace), token_range_endpoints_to_json));
}
static future<json::json_return_type> describe_ring_as_json_for_table(const sharded<service::storage_service>& ss, sstring keyspace, sstring table) {
co_return json::json_return_type(stream_range_as_array(co_await ss.local().describe_ring_for_table(keyspace, table), token_range_endpoints_to_json));
}
void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_service>& ss, service::raft_group0_client& group0_client) {
ss::local_hostid.set(r, [&ss](std::unique_ptr<http::request> req) {
auto id = ss.local().get_token_metadata().get_my_id();
return make_ready_future<json::json_return_type>(id.to_sstring());
});
ss::get_tokens.set(r, [&ss] (std::unique_ptr<http::request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(ss.local().get_token_metadata().sorted_tokens(), [](const dht::token& i) {
return fmt::to_string(i);
}));
});
ss::get_node_tokens.set(r, [&ss] (std::unique_ptr<http::request> req) {
gms::inet_address addr(req->param["endpoint"]);
return make_ready_future<json::json_return_type>(stream_range_as_array(ss.local().get_token_metadata().get_tokens(addr), [](const dht::token& i) {
return fmt::to_string(i);
}));
});
ss::get_commitlog.set(r, [&ctx](const_req req) {
return ctx.db.local().commitlog()->active_config().commit_log_location;
});
@@ -544,24 +643,6 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
});
ss::get_leaving_nodes.set(r, [&ss](const_req req) {
return container_to_vec(ss.local().get_token_metadata().get_leaving_endpoints());
});
ss::get_moving_nodes.set(r, [](const_req req) {
std::unordered_set<sstring> addr;
return container_to_vec(addr);
});
ss::get_joining_nodes.set(r, [&ss](const_req req) {
auto points = ss.local().get_token_metadata().get_bootstrap_tokens();
std::unordered_set<sstring> addr;
for (auto i: points) {
addr.insert(fmt::to_string(i.second));
}
return container_to_vec(addr);
});
ss::get_release_version.set(r, [&ss](const_req req) {
return ss.local().get_release_version();
});
@@ -583,8 +664,23 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::get_range_to_endpoint_map.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto keyspace = validate_keyspace(ctx, req->param);
auto table = req->get_query_param("cf");
auto erm = std::invoke([&]() -> locator::effective_replication_map_ptr {
auto& ks = ctx.db.local().find_keyspace(keyspace);
if (table.empty()) {
ensure_tablets_disabled(ctx, keyspace, "storage_service/range_to_endpoint_map");
return ks.get_vnode_effective_replication_map();
} else {
validate_table(ctx, keyspace, table);
auto& cf = ctx.db.local().find_column_family(keyspace, table);
return cf.get_effective_replication_map();
}
});
std::vector<ss::maplist_mapper> res;
co_return stream_range_as_array(co_await ss.local().get_range_to_address_map(keyspace),
co_return stream_range_as_array(co_await ss.local().get_range_to_address_map(erm),
[](const std::pair<dht::token_range, inet_address_vector_replica_set>& entry){
ss::maplist_mapper m;
if (entry.first.start()) {
@@ -612,25 +708,19 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
return make_ready_future<json::json_return_type>(res);
});
ss::describe_any_ring.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) {
// Find an arbitrary non-system keyspace.
auto keyspaces = ctx.db.local().get_non_local_vnode_based_strategy_keyspaces();
if (keyspaces.empty()) {
throw std::runtime_error("No keyspace provided and no non system kespace exist");
}
auto ks = keyspaces[0];
return describe_ring_as_json(ss, ks);
});
ss::describe_ring.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) {
if (!req->param.exists("keyspace")) {
throw bad_param_exception("The keyspace param is not provided");
}
auto keyspace = req->param["keyspace"];
auto table = req->get_query_param("table");
if (!table.empty()) {
validate_table(ctx, keyspace, table);
return describe_ring_as_json_for_table(ss, keyspace, table);
}
return describe_ring_as_json(ss, validate_keyspace(ctx, req->param));
});
ss::get_host_id_map.set(r, [&ss](const_req req) {
std::vector<ss::mapper> res;
return map_to_key_value(ss.local().get_token_metadata().get_endpoint_to_host_id_map_for_reading(), res);
});
ss::get_load.set(r, [&ctx](std::unique_ptr<http::request> req) {
return get_cf_stats(ctx, &replica::column_family_stats::live_disk_space_used);
});
@@ -649,7 +739,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
ss::get_current_generation_number.set(r, [&ss](std::unique_ptr<http::request> req) {
gms::inet_address ep(utils::fb_utilities::get_broadcast_address());
auto ep = ss.local().get_token_metadata().get_topology().my_address();
return ss.local().gossiper().get_current_generation_number(ep).then([](gms::generation_type res) {
return make_ready_future<json::json_return_type>(res.value());
});
@@ -669,14 +759,50 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
});
});
ss::force_keyspace_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
ss::force_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& db = ctx.db;
auto keyspace = validate_keyspace(ctx, req->param);
auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
apilog.debug("force_keyspace_compaction: keyspace={} tables={}", keyspace, table_infos);
auto params = req_params({
std::pair("flush_memtables", mandatory::no),
});
params.process(*req);
auto flush = params.get_as<bool>("flush_memtables").value_or(true);
apilog.info("force_compaction: flush={}", flush);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), db, table_infos);
std::optional<flush_mode> fmopt;
if (!flush) {
fmopt = flush_mode::skip;
}
auto task = co_await compaction_module.make_and_start_task<global_major_compaction_task_impl>({}, db, fmopt);
try {
co_await task->done();
} catch (...) {
apilog.error("force_compaction failed: {}", std::current_exception());
throw;
}
co_return json_void();
});
ss::force_keyspace_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& db = ctx.db;
auto params = req_params({
std::pair("keyspace", mandatory::yes),
std::pair("cf", mandatory::no),
std::pair("flush_memtables", mandatory::no),
});
params.process(*req);
auto keyspace = validate_keyspace(ctx, *params.get("keyspace"));
auto table_infos = parse_table_infos(keyspace, ctx, params.get("cf").value_or(""));
auto flush = params.get_as<bool>("flush_memtables").value_or(true);
apilog.debug("force_keyspace_compaction: keyspace={} tables={}, flush={}", keyspace, table_infos, flush);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
std::optional<flush_mode> fmopt;
if (!flush) {
fmopt = flush_mode::skip;
}
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), db, table_infos, fmopt);
try {
co_await task->done();
} catch (...) {
@@ -691,6 +817,12 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
auto& db = ctx.db;
auto keyspace = validate_keyspace(ctx, req->param);
auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
const auto& rs = db.local().find_keyspace(keyspace).get_replication_strategy();
if (rs.get_type() == locator::replication_strategy_type::local || !rs.is_vnode_based()) {
auto reason = rs.get_type() == locator::replication_strategy_type::local ? "require" : "support";
apilog.info("Keyspace {} does not {} cleanup", keyspace, reason);
co_return json::json_return_type(0);
}
apilog.info("force_keyspace_cleanup: keyspace={} tables={}", keyspace, table_infos);
if (!co_await ss.local().is_cleanup_allowed(keyspace)) {
auto msg = "Can not perform cleanup operation when topology changes";
@@ -699,7 +831,8 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
}
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<cleanup_keyspace_compaction_task_impl>({}, std::move(keyspace), db, table_infos);
auto task = co_await compaction_module.make_and_start_task<cleanup_keyspace_compaction_task_impl>(
{}, std::move(keyspace), db, table_infos, flush_mode::all_tables);
try {
co_await task->done();
} catch (...) {
@@ -710,11 +843,36 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
co_return json::json_return_type(0);
});
ss::cleanup_all.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
apilog.info("cleanup_all");
auto done = co_await ss.invoke_on(0, [] (service::storage_service& ss) -> future<bool> {
if (!ss.is_topology_coordinator_enabled()) {
co_return false;
}
co_await ss.do_cluster_cleanup();
co_return true;
});
if (done) {
co_return json::json_return_type(0);
}
// fall back to the local global cleanup if topology coordinator is not enabled
auto& db = ctx.db;
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<global_cleanup_compaction_task_impl>({}, db);
try {
co_await task->done();
} catch (...) {
apilog.error("cleanup_all failed: {}", std::current_exception());
throw;
}
co_return json::json_return_type(0);
});
ss::perform_keyspace_offstrategy_compaction.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<http::request> req, sstring keyspace, std::vector<table_info> table_infos) -> future<json::json_return_type> {
apilog.info("perform_keyspace_offstrategy_compaction: keyspace={} tables={}", keyspace, table_infos);
bool res = false;
auto& compaction_module = ctx.db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<offstrategy_keyspace_compaction_task_impl>({}, std::move(keyspace), ctx.db, table_infos, res);
auto task = co_await compaction_module.make_and_start_task<offstrategy_keyspace_compaction_task_impl>({}, std::move(keyspace), ctx.db, table_infos, &res);
try {
co_await task->done();
} catch (...) {
@@ -743,6 +901,14 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
co_return json::json_return_type(0);
}));
ss::force_flush.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
apilog.info("flush all tables");
co_await ctx.db.invoke_on_all([] (replica::database& db) {
return db.flush_all_tables();
});
co_return json_void();
});
ss::force_keyspace_flush.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto keyspace = validate_keyspace(ctx, req->param);
auto column_families = parse_tables(keyspace, ctx, req->query_parameters, "cf");
@@ -860,12 +1026,22 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::get_keyspaces.set(r, [&ctx](const_req req) {
auto type = req.get_query_param("type");
auto replication = req.get_query_param("replication");
std::vector<sstring> keyspaces;
if (type == "user") {
return ctx.db.local().get_user_keyspaces();
keyspaces = ctx.db.local().get_user_keyspaces();
} else if (type == "non_local_strategy") {
return ctx.db.local().get_non_local_strategy_keyspaces();
keyspaces = ctx.db.local().get_non_local_strategy_keyspaces();
} else {
keyspaces = map_keys(ctx.db.local().get_keyspaces());
}
return map_keys(ctx.db.local().get_keyspaces());
if (replication.empty() || replication == "all") {
return keyspaces;
}
const auto want_tablets = replication == "tablets";
return boost::copy_range<std::vector<sstring>>(keyspaces | boost::adaptors::filtered([&ctx, want_tablets] (const sstring& ks) {
return ctx.db.local().find_keyspace(ks).get_replication_strategy().uses_tablets() == want_tablets;
}));
});
ss::stop_gossiping.set(r, [&ss](std::unique_ptr<http::request> req) {
@@ -897,7 +1073,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::is_initialized.set(r, [&ss](std::unique_ptr<http::request> req) {
return ss.local().get_operation_mode().then([&ss] (auto mode) {
bool is_initialized = mode >= service::storage_service::mode::STARTING;
bool is_initialized = mode >= service::storage_service::mode::STARTING && mode != service::storage_service::mode::MAINTENANCE;
if (mode == service::storage_service::mode::NORMAL) {
is_initialized = ss.local().gossiper().is_enabled();
}
@@ -911,7 +1087,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::is_joined.set(r, [&ss] (std::unique_ptr<http::request> req) {
return ss.local().get_operation_mode().then([] (auto mode) {
return make_ready_future<json::json_return_type>(mode >= service::storage_service::mode::JOINING);
return make_ready_future<json::json_return_type>(mode >= service::storage_service::mode::JOINING && mode != service::storage_service::mode::MAINTENANCE);
});
});
@@ -1194,7 +1370,11 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
return make_ready_future<json::json_return_type>(0);
});
ss::get_ownership.set(r, [&ss] (std::unique_ptr<http::request> req) {
ss::get_ownership.set(r, [&ctx, &ss] (std::unique_ptr<http::request> req) {
if (any_of_keyspaces_use_tablets(ctx)) {
throw httpd::bad_param_exception("storage_service/ownership cannot be used when a keyspace uses tablets");
}
return ss.local().get_ownership().then([] (auto&& ownership) {
std::vector<storage_service_json::mapper> res;
return make_ready_future<json::json_return_type>(map_to_key_value(ownership, res));
@@ -1203,7 +1383,17 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
ss::get_effective_ownership.set(r, [&ctx, &ss] (std::unique_ptr<http::request> req) {
auto keyspace_name = req->param["keyspace"] == "null" ? "" : validate_keyspace(ctx, req->param);
return ss.local().effective_ownership(keyspace_name).then([] (auto&& ownership) {
auto table_name = req->get_query_param("cf");
if (!keyspace_name.empty()) {
if (table_name.empty()) {
ensure_tablets_disabled(ctx, keyspace_name, "storage_service/ownership");
} else {
validate_table(ctx, keyspace_name, table_name);
}
}
return ss.local().effective_ownership(keyspace_name, table_name).then([] (auto&& ownership) {
std::vector<storage_service_json::mapper> res;
return make_ready_future<json::json_return_type>(map_to_key_value(ownership, res));
});
@@ -1348,6 +1538,90 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
co_return json_void();
});
ss::upgrade_to_raft_topology.set(r,
[&ss] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
apilog.info("Requested to schedule upgrade to raft topology");
try {
co_await ss.invoke_on(0, [] (auto& ss) {
return ss.start_upgrade_to_raft_topology();
});
} catch (...) {
auto ex = std::current_exception();
apilog.error("Failed to schedule upgrade to raft topology: {}", ex);
std::rethrow_exception(std::move(ex));
}
co_return json_void();
});
ss::raft_topology_upgrade_status.set(r,
[&ss] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
const auto ustate = co_await ss.invoke_on(0, [] (auto& ss) {
return ss.get_topology_upgrade_state();
});
co_return sstring(format("{}", ustate));
});
ss::move_tablet.set(r, [&ctx, &ss] (std::unique_ptr<http::request> req) -> future<json_return_type> {
auto src_host_id = validate_host_id(req->get_query_param("src_host"));
shard_id src_shard_id = validate_int(req->get_query_param("src_shard"));
auto dst_host_id = validate_host_id(req->get_query_param("dst_host"));
shard_id dst_shard_id = validate_int(req->get_query_param("dst_shard"));
auto token = dht::token::from_int64(validate_int(req->get_query_param("token")));
auto ks = req->get_query_param("ks");
auto table = req->get_query_param("table");
validate_table(ctx, ks, table);
auto table_id = ctx.db.local().find_column_family(ks, table).schema()->id();
auto force_str = req->get_query_param("force");
auto force = service::loosen_constraints(force_str == "" ? false : validate_bool(force_str));
co_await ss.local().move_tablet(table_id, token,
locator::tablet_replica{src_host_id, src_shard_id},
locator::tablet_replica{dst_host_id, dst_shard_id},
force);
co_return json_void();
});
ss::add_tablet_replica.set(r, [&ctx, &ss] (std::unique_ptr<http::request> req) -> future<json_return_type> {
auto dst_host_id = validate_host_id(req->get_query_param("dst_host"));
shard_id dst_shard_id = validate_int(req->get_query_param("dst_shard"));
auto token = dht::token::from_int64(validate_int(req->get_query_param("token")));
auto ks = req->get_query_param("ks");
auto table = req->get_query_param("table");
auto table_id = ctx.db.local().find_column_family(ks, table).schema()->id();
auto force_str = req->get_query_param("force");
auto force = service::loosen_constraints(force_str == "" ? false : validate_bool(force_str));
co_await ss.local().add_tablet_replica(table_id, token,
locator::tablet_replica{dst_host_id, dst_shard_id},
force);
co_return json_void();
});
ss::del_tablet_replica.set(r, [&ctx, &ss] (std::unique_ptr<http::request> req) -> future<json_return_type> {
auto dst_host_id = validate_host_id(req->get_query_param("host"));
shard_id dst_shard_id = validate_int(req->get_query_param("shard"));
auto token = dht::token::from_int64(validate_int(req->get_query_param("token")));
auto ks = req->get_query_param("ks");
auto table = req->get_query_param("table");
auto table_id = ctx.db.local().find_column_family(ks, table).schema()->id();
auto force_str = req->get_query_param("force");
auto force = service::loosen_constraints(force_str == "" ? false : validate_bool(force_str));
co_await ss.local().del_tablet_replica(table_id, token,
locator::tablet_replica{dst_host_id, dst_shard_id},
force);
co_return json_void();
});
ss::tablet_balancing_enable.set(r, [&ss] (std::unique_ptr<http::request> req) -> future<json_return_type> {
auto enabled = validate_bool(req->get_query_param("enabled"));
co_await ss.local().set_tablet_balancing_enabled(enabled);
co_return json_void();
});
sp::get_schema_versions.set(r, [&ss](std::unique_ptr<http::request> req) {
return ss.local().describe_schema_versions().then([] (auto result) {
std::vector<sp::mapper_list> res;
@@ -1363,15 +1637,9 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
}
void unset_storage_service(http_context& ctx, routes& r) {
ss::local_hostid.unset(r);
ss::get_tokens.unset(r);
ss::get_node_tokens.unset(r);
ss::get_commitlog.unset(r);
ss::get_token_endpoint.unset(r);
ss::toppartitions_generic.unset(r);
ss::get_leaving_nodes.unset(r);
ss::get_moving_nodes.unset(r);
ss::get_joining_nodes.unset(r);
ss::get_release_version.unset(r);
ss::get_scylla_release_version.unset(r);
ss::get_schema_version.unset(r);
@@ -1379,18 +1647,19 @@ void unset_storage_service(http_context& ctx, routes& r) {
ss::get_saved_caches_location.unset(r);
ss::get_range_to_endpoint_map.unset(r);
ss::get_pending_range_to_endpoint_map.unset(r);
ss::describe_any_ring.unset(r);
ss::describe_ring.unset(r);
ss::get_host_id_map.unset(r);
ss::get_load.unset(r);
ss::get_load_map.unset(r);
ss::get_current_generation_number.unset(r);
ss::get_natural_endpoints.unset(r);
ss::cdc_streams_check_and_repair.unset(r);
ss::force_compaction.unset(r);
ss::force_keyspace_compaction.unset(r);
ss::force_keyspace_cleanup.unset(r);
ss::cleanup_all.unset(r);
ss::perform_keyspace_offstrategy_compaction.unset(r);
ss::upgrade_sstables.unset(r);
ss::force_flush.unset(r);
ss::force_keyspace_flush.unset(r);
ss::decommission.unset(r);
ss::move.unset(r);
@@ -1450,51 +1719,45 @@ void unset_storage_service(http_context& ctx, routes& r) {
ss::get_effective_ownership.unset(r);
ss::sstable_info.unset(r);
ss::reload_raft_topology_state.unset(r);
ss::upgrade_to_raft_topology.unset(r);
ss::raft_topology_upgrade_status.unset(r);
ss::move_tablet.unset(r);
ss::add_tablet_replica.unset(r);
ss::del_tablet_replica.unset(r);
ss::tablet_balancing_enable.unset(r);
sp::get_schema_versions.unset(r);
}
enum class scrub_status {
successful = 0,
aborted,
unable_to_cancel, // Not used in Scylla, included to ensure compatibility with nodetool api.
validation_errors,
};
void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_ctl) {
ss::get_snapshot_details.set(r, [&snap_ctl](std::unique_ptr<http::request> req) {
return snap_ctl.local().get_snapshot_details().then([] (std::unordered_map<sstring, std::vector<db::snapshot_ctl::snapshot_details>>&& result) {
std::function<future<>(output_stream<char>&&)> f = [result = std::move(result)](output_stream<char>&& s) {
return do_with(output_stream<char>(std::move(s)), true, [&result] (output_stream<char>& s, bool& first){
return s.write("[").then([&s, &first, &result] {
return do_for_each(result, [&s, &first](std::tuple<sstring, std::vector<db::snapshot_ctl::snapshot_details>>&& map){
return do_with(ss::snapshots(), [&s, &first, &map](ss::snapshots& all_snapshots) {
all_snapshots.key = std::get<0>(map);
future<> f = first ? make_ready_future<>() : s.write(", ");
first = false;
std::vector<ss::snapshot> snapshot;
for (auto& cf: std::get<1>(map)) {
ss::snapshot snp;
snp.ks = cf.ks;
snp.cf = cf.cf;
snp.live = cf.live;
snp.total = cf.total;
snapshot.push_back(std::move(snp));
}
all_snapshots.value = std::move(snapshot);
return f.then([&s, &all_snapshots] {
return all_snapshots.write(s);
});
});
});
}).then([&s] {
return s.write("]").then([&s] {
return s.close();
});
});
});
};
ss::get_snapshot_details.set(r, [&snap_ctl](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto result = co_await snap_ctl.local().get_snapshot_details();
co_return std::function([res = std::move(result)] (output_stream<char>&& o) -> future<> {
auto result = std::move(res);
output_stream<char> out = std::move(o);
bool first = true;
return make_ready_future<json::json_return_type>(std::move(f));
co_await out.write("[");
for (auto&& map : result) {
if (!first) {
co_await out.write(", ");
}
std::vector<ss::snapshot> snapshot;
for (auto& cf : std::get<1>(map)) {
ss::snapshot snp;
snp.ks = cf.ks;
snp.cf = cf.cf;
snp.live = cf.live;
snp.total = cf.total;
snapshot.push_back(std::move(snp));
}
ss::snapshots all_snapshots;
all_snapshots.key = std::get<0>(map);
all_snapshots.value = std::move(snapshot);
co_await all_snapshots.write(out);
first = false;
}
co_await out.write("]");
co_await out.close();
});
});
@@ -1554,68 +1817,11 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_
ss::scrub.set(r, [&ctx, &snap_ctl] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& db = ctx.db;
auto rp = req_params({
{"keyspace", {mandatory::yes}},
{"cf", {""}},
{"scrub_mode", {}},
{"skip_corrupted", {}},
{"disable_snapshot", {}},
{"quarantine_mode", {}},
});
rp.process(*req);
auto keyspace = validate_keyspace(ctx, *rp.get("keyspace"));
auto column_families = parse_tables(keyspace, ctx, *rp.get("cf"));
auto scrub_mode_opt = rp.get("scrub_mode");
auto scrub_mode = sstables::compaction_type_options::scrub::mode::abort;
if (!scrub_mode_opt) {
const auto skip_corrupted = rp.get_as<bool>("skip_corrupted").value_or(false);
if (skip_corrupted) {
scrub_mode = sstables::compaction_type_options::scrub::mode::skip;
}
} else {
auto scrub_mode_str = *scrub_mode_opt;
if (scrub_mode_str == "ABORT") {
scrub_mode = sstables::compaction_type_options::scrub::mode::abort;
} else if (scrub_mode_str == "SKIP") {
scrub_mode = sstables::compaction_type_options::scrub::mode::skip;
} else if (scrub_mode_str == "SEGREGATE") {
scrub_mode = sstables::compaction_type_options::scrub::mode::segregate;
} else if (scrub_mode_str == "VALIDATE") {
scrub_mode = sstables::compaction_type_options::scrub::mode::validate;
} else {
throw httpd::bad_param_exception(fmt::format("Unknown argument for 'scrub_mode' parameter: {}", scrub_mode_str));
}
}
if (!req_param<bool>(*req, "disable_snapshot", false)) {
auto tag = format("pre-scrub-{:d}", db_clock::now().time_since_epoch().count());
co_await coroutine::parallel_for_each(column_families, [&snap_ctl, keyspace, tag](sstring cf) {
// We always pass here db::snapshot_ctl::snap_views::no since:
// 1. When scrubbing particular tables, there's no need to auto-snapshot their views.
// 2. When scrubbing the whole keyspace, column_families will contain both base tables and views.
return snap_ctl.local().take_column_family_snapshot(keyspace, cf, tag, db::snapshot_ctl::snap_views::no, db::snapshot_ctl::skip_flush::no);
});
}
sstables::compaction_type_options::scrub opts = {
.operation_mode = scrub_mode,
};
const sstring quarantine_mode_str = req_param<sstring>(*req, "quarantine_mode", "INCLUDE");
if (quarantine_mode_str == "INCLUDE") {
opts.quarantine_operation_mode = sstables::compaction_type_options::scrub::quarantine_mode::include;
} else if (quarantine_mode_str == "EXCLUDE") {
opts.quarantine_operation_mode = sstables::compaction_type_options::scrub::quarantine_mode::exclude;
} else if (quarantine_mode_str == "ONLY") {
opts.quarantine_operation_mode = sstables::compaction_type_options::scrub::quarantine_mode::only;
} else {
throw httpd::bad_param_exception(fmt::format("Unknown argument for 'quarantine_mode' parameter: {}", quarantine_mode_str));
}
auto info = co_await parse_scrub_options(ctx, snap_ctl, std::move(req));
sstables::compaction_stats stats;
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<scrub_sstables_compaction_task_impl>({}, std::move(keyspace), db, column_families, opts, stats);
auto task = co_await compaction_module.make_and_start_task<scrub_sstables_compaction_task_impl>({}, info.keyspace, db, info.column_families, info.opts, &stats);
try {
co_await task->done();
if (stats.validation_errors) {
@@ -1624,7 +1830,7 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_
} catch (const sstables::compaction_aborted_exception&) {
co_return json::json_return_type(static_cast<int>(scrub_status::aborted));
} catch (...) {
apilog.error("scrub keyspace={} tables={} failed: {}", keyspace, column_families, std::current_exception());
apilog.error("scrub keyspace={} tables={} failed: {}", info.keyspace, info.column_families, std::current_exception());
throw;
}
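
The `replication` query parameter added to the `get_keyspaces` handler in this file filters the keyspace list by whether each keyspace uses tablets. The selection logic can be sketched in isolation as below. This is only a sketch: `keyspace_desc` and `filter_by_replication` are hypothetical names, and a plain struct stands in for the real replication strategy lookup.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a keyspace and its replication strategy.
struct keyspace_desc {
    std::string name;
    bool uses_tablets;
};

// Mirrors the handler's logic: an empty or "all" filter keeps everything,
// "tablets" keeps tablet-based keyspaces, any other value keeps the rest.
std::vector<std::string> filter_by_replication(const std::vector<keyspace_desc>& keyspaces,
                                               const std::string& replication) {
    const bool keep_all = replication.empty() || replication == "all";
    const bool want_tablets = replication == "tablets";
    std::vector<std::string> out;
    for (const auto& ks : keyspaces) {
        if (keep_all || ks.uses_tablets == want_tablets) {
            out.push_back(ks.name);
        }
    }
    return out;
}
```

As in the handler, the filter only compares against `"tablets"`, so any other non-empty value apart from `"all"` selects the keyspaces that do not use tablets.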


@@ -8,10 +8,9 @@
#pragma once
#include <iostream>
#include <seastar/core/sharded.hh>
#include "api.hh"
#include <seastar/json/json_elements.hh>
#include "api/api_init.hh"
#include "db/data_listeners.hh"
namespace cql_transport { class controller; }
@@ -37,25 +36,35 @@ namespace api {
// verify that the keyspace is found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective keyspace error.
sstring validate_keyspace(http_context& ctx, sstring ks_name);
sstring validate_keyspace(const http_context& ctx, sstring ks_name);
// verify that the keyspace parameter is found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective keyspace error.
sstring validate_keyspace(http_context& ctx, const httpd::parameters& param);
sstring validate_keyspace(const http_context& ctx, const httpd::parameters& param);
// splits a request parameter assumed to hold a comma-separated list of table names
// verify that the tables are found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective no_such_column_family error.
// Returns an empty vector if no parameter was found.
// If the parameter is found and empty, returns a list of all table names in the keyspace.
std::vector<sstring> parse_tables(const sstring& ks_name, http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name);
std::vector<sstring> parse_tables(const sstring& ks_name, const http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name);
// splits a request parameter assumed to hold a comma-separated list of table names
// verify that the tables are found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective no_such_column_family error.
// Returns a vector of all table infos given by the parameter, or
// if the parameter is not found or is empty, returns a list of all table infos in the keyspace.
std::vector<table_info> parse_table_infos(const sstring& ks_name, http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name);
std::vector<table_info> parse_table_infos(const sstring& ks_name, const http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name);
std::vector<table_info> parse_table_infos(const sstring& ks_name, const http_context& ctx, sstring value);
struct scrub_info {
sstables::compaction_type_options::scrub opts;
sstring keyspace;
std::vector<sstring> column_families;
};
future<scrub_info> parse_scrub_options(const http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl, std::unique_ptr<http::request> req);
void set_storage_service(http_context& ctx, httpd::routes& r, sharded<service::storage_service>& ss, service::raft_group0_client&);
void unset_storage_service(http_context& ctx, httpd::routes& r);
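
The comments on `parse_tables` above describe three cases: parameter absent, parameter present but empty, and a comma-separated list that must be validated. A minimal standalone sketch of those semantics, under stated assumptions, might look like this. `split_names` and `select_tables` are hypothetical helpers, `std::optional` models "parameter not found", and `std::invalid_argument` stands in for `bad_param_exception`.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Split a comma-separated list into its non-empty items.
std::vector<std::string> split_names(const std::string& value) {
    std::vector<std::string> names;
    std::string::size_type start = 0;
    while (start <= value.size()) {
        auto end = value.find(',', start);
        if (end == std::string::npos) {
            end = value.size();
        }
        if (end > start) {
            names.push_back(value.substr(start, end - start));
        }
        start = end + 1;
    }
    return names;
}

// Hypothetical analogue of parse_tables(): an absent parameter yields an
// empty vector, a present-but-empty one yields every table in the keyspace,
// and a non-empty one is split and checked against the known tables.
std::vector<std::string> select_tables(const std::vector<std::string>& all_tables,
                                       const std::optional<std::string>& param) {
    if (!param) {
        return {};
    }
    if (param->empty()) {
        return all_tables;
    }
    auto names = split_names(*param);
    for (const auto& n : names) {
        if (std::find(all_tables.begin(), all_tables.end(), n) == all_tables.end()) {
            throw std::invalid_argument("no such table: " + n);
        }
    }
    return names;
}
```

Per the comments above, `parse_table_infos` differs in one case: it treats a missing parameter the same as an empty one and returns all tables.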


@@ -9,8 +9,10 @@
#include "stream_manager.hh"
#include "streaming/stream_manager.hh"
#include "streaming/stream_result_future.hh"
#include "api/api.hh"
#include "api/api-doc/stream_manager.json.hh"
#include <vector>
#include <rapidjson/document.h>
#include "gms/gossiper.hh"
namespace api {


@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api/api_init.hh"
namespace api {


@@ -6,21 +6,20 @@
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include "api/api_init.hh"
#include "api/api-doc/system.json.hh"
#include "api/api-doc/metrics.json.hh"
#include "replica/database.hh"
#include "api/api.hh"
#include <rapidjson/document.h>
#include <seastar/core/reactor.hh>
#include <seastar/core/metrics_api.hh>
#include <seastar/core/relabel_config.hh>
#include <seastar/http/exception.hh>
#include <seastar/util/short_streams.hh>
#include <seastar/http/short_streams.hh>
#include "utils/rjson.hh"
#include "log.hh"
#include "replica/database.hh"
extern logging::logger apilog;
@@ -30,6 +29,10 @@ using namespace seastar::httpd;
namespace hs = httpd::system_json;
namespace hm = httpd::metrics_json;
extern "C" void __attribute__((weak)) __llvm_profile_dump();
extern "C" const char * __attribute__((weak)) __llvm_profile_get_filename();
extern "C" void __attribute__((weak)) __llvm_profile_reset_counters();
void set_system(http_context& ctx, routes& r) {
hm::get_metrics_config.set(r, [](const_req req) {
std::vector<hm::metrics_config> res;
@@ -158,6 +161,27 @@ void set_system(http_context& ctx, routes& r) {
return json::json_return_type(json::json_void());
});
});
hs::dump_profile.set(r, [](std::unique_ptr<request> req) {
if (!__llvm_profile_dump) {
apilog.info("Profile will not be dumped, executable is not instrumented with profile dumping.");
return make_ready_future<json::json_return_type>(json::json_return_type(json::json_void()));
}
sstring profile_dest(__llvm_profile_get_filename ? __llvm_profile_get_filename() : "disk");
apilog.info("Dumping profile to {}", profile_dest);
__llvm_profile_dump();
if (__llvm_profile_reset_counters) {
// If counters are not reset the profile dumping mechanism will issue a warning and exit
// next time it is attempted. If the counters are reset, profiles can be accumulated
// (if %m is present in LLVM_PROFILE_FILE pattern) so it can be dumped in stages or
// multiple times during runtime.
__llvm_profile_reset_counters();
} else {
apilog.warn("Could not reset profile counters, profile dumping will be skipped next time it is attempted");
}
apilog.info("Profile dumped to {}", profile_dest);
return make_ready_future<json::json_return_type>(json::json_return_type(json::json_void()));
}) ;
}
}
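
The `dump_profile` handler above relies on weak symbol declarations: the `__llvm_profile_*` functions only exist in an instrumented build, so the code tests each symbol's address before calling it. The pattern can be sketched on its own as follows, assuming GCC or Clang on an ELF platform such as Linux. `hypothetical_profile_hook` and `call_hook_if_present` are made-up names for illustration, not part of any runtime.

```cpp
#include <cassert>

// Hypothetical hook, declared weak: if no object file defines it, the
// symbol's address is null at run time instead of causing a link error.
extern "C" void __attribute__((weak)) hypothetical_profile_hook();

// Calls the hook only when it was linked in; reports whether it ran.
bool call_hook_if_present() {
    if (!hypothetical_profile_hook) {
        return false;
    }
    hypothetical_profile_hook();
    return true;
}
```

Because nothing defines the hook in this sketch, `call_hook_if_present()` takes the early-return branch, which corresponds to the "executable is not instrumented" path in the handler.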


@@ -7,13 +7,12 @@
*/
#include <seastar/core/coroutine.hh>
#include <seastar/http/exception.hh>
#include "task_manager.hh"
#include "api/api.hh"
#include "api/api-doc/task_manager.json.hh"
#include "db/system_keyspace.hh"
#include "column_family.hh"
#include "unimplemented.hh"
#include "storage_service.hh"
#include <utility>
#include <boost/range/adaptors.hpp>
@@ -232,8 +231,8 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
while (!q.empty()) {
auto& current = q.front();
res.push_back(co_await retrieve_status(current));
for (size_t i = 0; i < current->get_children().size(); ++i) {
q.push(co_await current->get_children()[i].copy());
for (auto& child: current->get_children()) {
q.push(co_await child.copy());
}
q.pop();
}
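
The loop above walks a task and its descendants breadth-first: it reads the front of the queue, enqueues that task's children, then pops it. The same traversal can be sketched without coroutines as below. `task_node` and `collect_ids_bfs` are hypothetical names, and the sketch stores pointers instead of the copied task handles used in the real code.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical task node; the real code works with task manager tasks.
struct task_node {
    std::string id;
    std::vector<task_node> children;
};

// Breadth-first walk: visit the front of the queue, enqueue its children,
// then pop it. Pointers stay valid because the tree is not modified.
std::vector<std::string> collect_ids_bfs(const task_node& root) {
    std::vector<std::string> ids;
    std::queue<const task_node*> q;
    q.push(&root);
    while (!q.empty()) {
        const task_node* current = q.front();
        ids.push_back(current->id);
        for (const auto& child : current->children) {
            q.push(&child);
        }
        q.pop();
    }
    return ids;
}
```

Statuses come out level by level, so a parent always appears before any of its descendants in the result.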


@@ -9,7 +9,7 @@
#pragma once
#include <seastar/core/sharded.hh>
#include "api.hh"
#include "api/api_init.hh"
#include "db/config.hh"
namespace tasks {

api/tasks.cc (new file, 118 lines)

@@ -0,0 +1,118 @@
/*
* Copyright (C) 2022-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include <seastar/core/coroutine.hh>
#include "api/api.hh"
#include "api/storage_service.hh"
#include "api/api-doc/tasks.json.hh"
#include "compaction/compaction_manager.hh"
#include "compaction/task_manager_module.hh"
#include "service/storage_service.hh"
#include "tasks/task_manager.hh"
using namespace seastar::httpd;
extern logging::logger apilog;
namespace api {
namespace t = httpd::tasks_json;
using namespace json;
using ks_cf_func = std::function<future<json::json_return_type>(http_context&, std::unique_ptr<http::request>, sstring, std::vector<table_info>)>;
static auto wrap_ks_cf(http_context &ctx, ks_cf_func f) {
return [&ctx, f = std::move(f)](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
return f(ctx, std::move(req), std::move(keyspace), std::move(table_infos));
};
}
void set_tasks_compaction_module(http_context& ctx, routes& r, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl) {
t::force_keyspace_compaction_async.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& db = ctx.db;
auto params = req_params({
std::pair("keyspace", mandatory::yes),
std::pair("cf", mandatory::no),
std::pair("flush_memtables", mandatory::no),
});
params.process(*req);
auto keyspace = validate_keyspace(ctx, *params.get("keyspace"));
auto table_infos = parse_table_infos(keyspace, ctx, params.get("cf").value_or(""));
auto flush = params.get_as<bool>("flush_memtables").value_or(true);
apilog.debug("force_keyspace_compaction_async: keyspace={} tables={}, flush={}", keyspace, table_infos, flush);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
std::optional<flush_mode> fmopt;
if (!flush) {
fmopt = flush_mode::skip;
}
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), db, table_infos, fmopt);
co_return json::json_return_type(task->get_status().id.to_sstring());
});
t::force_keyspace_cleanup_async.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& db = ctx.db;
auto keyspace = validate_keyspace(ctx, req->param);
auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
apilog.info("force_keyspace_cleanup_async: keyspace={} tables={}", keyspace, table_infos);
if (!co_await ss.local().is_cleanup_allowed(keyspace)) {
auto msg = "Can not perform cleanup operation when topology changes";
apilog.warn("force_keyspace_cleanup_async: keyspace={} tables={}: {}", keyspace, table_infos, msg);
co_await coroutine::return_exception(std::runtime_error(msg));
}
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<cleanup_keyspace_compaction_task_impl>({}, std::move(keyspace), db, table_infos, flush_mode::all_tables);
co_return json::json_return_type(task->get_status().id.to_sstring());
});
t::perform_keyspace_offstrategy_compaction_async.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<http::request> req, sstring keyspace, std::vector<table_info> table_infos) -> future<json::json_return_type> {
apilog.info("perform_keyspace_offstrategy_compaction: keyspace={} tables={}", keyspace, table_infos);
auto& compaction_module = ctx.db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<offstrategy_keyspace_compaction_task_impl>({}, std::move(keyspace), ctx.db, table_infos, nullptr);
co_return json::json_return_type(task->get_status().id.to_sstring());
}));
t::upgrade_sstables_async.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<http::request> req, sstring keyspace, std::vector<table_info> table_infos) -> future<json::json_return_type> {
auto& db = ctx.db;
bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);
apilog.info("upgrade_sstables: keyspace={} tables={} exclude_current_version={}", keyspace, table_infos, exclude_current_version);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<upgrade_sstables_compaction_task_impl>({}, std::move(keyspace), db, table_infos, exclude_current_version);
co_return json::json_return_type(task->get_status().id.to_sstring());
}));
t::scrub_async.set(r, [&ctx, &snap_ctl] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& db = ctx.db;
auto info = co_await parse_scrub_options(ctx, snap_ctl, std::move(req));
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<scrub_sstables_compaction_task_impl>({}, std::move(info.keyspace), db, std::move(info.column_families), info.opts, nullptr);
co_return json::json_return_type(task->get_status().id.to_sstring());
});
}
void unset_tasks_compaction_module(http_context& ctx, httpd::routes& r) {
t::force_keyspace_compaction_async.unset(r);
t::force_keyspace_cleanup_async.unset(r);
t::perform_keyspace_offstrategy_compaction_async.unset(r);
t::upgrade_sstables_async.unset(r);
t::scrub_async.unset(r);
}
}

api/tasks.hh (new file, 19 lines)

@@ -0,0 +1,19 @@
/*
* Copyright (C) 2023-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#pragma once
#include "api.hh"
#include "db/config.hh"
namespace api {
void set_tasks_compaction_module(http_context& ctx, httpd::routes& r, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl);
void unset_tasks_compaction_module(http_context& ctx, httpd::routes& r);
}

api/token_metadata.cc (new file, 114 lines)

@@ -0,0 +1,114 @@
/*
* Copyright (C) 2023-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include "api/api.hh"
#include "api/api-doc/storage_service.json.hh"
#include "api/api-doc/endpoint_snitch_info.json.hh"
#include "locator/token_metadata.hh"
using namespace seastar::httpd;
namespace api {
namespace ss = httpd::storage_service_json;
using namespace json;
void set_token_metadata(http_context& ctx, routes& r, sharded<locator::shared_token_metadata>& tm) {
ss::local_hostid.set(r, [&tm](std::unique_ptr<http::request> req) {
auto id = tm.local().get()->get_my_id();
return make_ready_future<json::json_return_type>(id.to_sstring());
});
ss::get_tokens.set(r, [&tm] (std::unique_ptr<http::request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(tm.local().get()->sorted_tokens(), [](const dht::token& i) {
return fmt::to_string(i);
}));
});
ss::get_node_tokens.set(r, [&tm] (std::unique_ptr<http::request> req) {
gms::inet_address addr(req->param["endpoint"]);
auto& local_tm = *tm.local().get();
const auto host_id = local_tm.get_host_id_if_known(addr);
return make_ready_future<json::json_return_type>(stream_range_as_array(host_id ? local_tm.get_tokens(*host_id): std::vector<dht::token>{}, [](const dht::token& i) {
return fmt::to_string(i);
}));
});
ss::get_leaving_nodes.set(r, [&tm](const_req req) {
const auto& local_tm = *tm.local().get();
const auto& leaving_host_ids = local_tm.get_leaving_endpoints();
std::unordered_set<gms::inet_address> eps;
eps.reserve(leaving_host_ids.size());
for (const auto host_id: leaving_host_ids) {
eps.insert(local_tm.get_endpoint_for_host_id(host_id));
}
return container_to_vec(eps);
});
ss::get_moving_nodes.set(r, [](const_req req) {
std::unordered_set<sstring> addr;
return container_to_vec(addr);
});
ss::get_joining_nodes.set(r, [&tm](const_req req) {
const auto& local_tm = *tm.local().get();
const auto& points = local_tm.get_bootstrap_tokens();
std::unordered_set<gms::inet_address> eps;
eps.reserve(points.size());
for (const auto& [token, host_id]: points) {
eps.insert(local_tm.get_endpoint_for_host_id(host_id));
}
return container_to_vec(eps);
});
ss::get_host_id_map.set(r, [&tm](const_req req) {
std::vector<ss::mapper> res;
return map_to_key_value(tm.local().get()->get_endpoint_to_host_id_map_for_reading(), res);
});
static auto host_or_broadcast = [&tm](const_req req) {
auto host = req.get_query_param("host");
return host.empty() ? tm.local().get()->get_topology().my_address() : gms::inet_address(host);
};
httpd::endpoint_snitch_info_json::get_datacenter.set(r, [&tm](const_req req) {
auto& topology = tm.local().get()->get_topology();
auto ep = host_or_broadcast(req);
if (!topology.has_endpoint(ep)) {
// Cannot return error here, nodetool status can race, request
// info about just-left node and not handle it nicely
return locator::endpoint_dc_rack::default_location.dc;
}
return topology.get_datacenter(ep);
});
httpd::endpoint_snitch_info_json::get_rack.set(r, [&tm](const_req req) {
auto& topology = tm.local().get()->get_topology();
auto ep = host_or_broadcast(req);
if (!topology.has_endpoint(ep)) {
// Cannot return error here, nodetool status can race, request
// info about just-left node and not handle it nicely
return locator::endpoint_dc_rack::default_location.rack;
}
return topology.get_rack(ep);
});
}
void unset_token_metadata(http_context& ctx, routes& r) {
ss::local_hostid.unset(r);
ss::get_tokens.unset(r);
ss::get_node_tokens.unset(r);
ss::get_leaving_nodes.unset(r);
ss::get_moving_nodes.unset(r);
ss::get_joining_nodes.unset(r);
ss::get_host_id_map.unset(r);
httpd::endpoint_snitch_info_json::get_datacenter.unset(r);
httpd::endpoint_snitch_info_json::get_rack.unset(r);
}
}

21
api/token_metadata.hh Normal file

@@ -0,0 +1,21 @@
/*
* Copyright (C) 2023-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#pragma once
#include <seastar/core/sharded.hh>
#include "api/api_init.hh"
namespace locator { class shared_token_metadata; }
namespace api {
void set_token_metadata(http_context& ctx, httpd::routes& r, sharded<locator::shared_token_metadata>& tm);
void unset_token_metadata(http_context& ctx, httpd::routes& r);
}


@@ -20,7 +20,8 @@ target_sources(scylla_auth
sasl_challenge.cc
service.cc
standard_role_manager.cc
transitional.cc)
transitional.cc
maintenance_socket_role_manager.cc)
target_include_directories(scylla_auth
PUBLIC
${CMAKE_SOURCE_DIR})
@@ -35,3 +36,6 @@ target_link_libraries(scylla_auth
libxcrypt::libxcrypt)
add_whole_archive(auth scylla_auth)
check_headers(check-headers scylla_auth
GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)


@@ -20,6 +20,7 @@ static const class_registrator<
authenticator,
allow_all_authenticator,
cql3::query_processor&,
::service::raft_group0_client&,
::service::migration_manager&> registration("org.apache.cassandra.auth.AllowAllAuthenticator");
}


@@ -28,7 +28,7 @@ extern const std::string_view allow_all_authenticator_name;
class allow_all_authenticator final : public authenticator {
public:
allow_all_authenticator(cql3::query_processor&, ::service::migration_manager&) {
allow_all_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&) {
}
virtual future<> start() override {
@@ -59,15 +59,15 @@ public:
return make_ready_future<authenticated_user>(anonymous_user());
}
virtual future<> create(std::string_view, const authentication_options& options) const override {
virtual future<> create(std::string_view, const authentication_options& options) override {
return make_ready_future();
}
virtual future<> alter(std::string_view, const authentication_options& options) const override {
virtual future<> alter(std::string_view, const authentication_options& options) override {
return make_ready_future();
}
virtual future<> drop(std::string_view) const override {
virtual future<> drop(std::string_view) override {
return make_ready_future();
}


@@ -20,6 +20,7 @@ static const class_registrator<
authorizer,
allow_all_authorizer,
cql3::query_processor&,
::service::raft_group0_client&,
::service::migration_manager&> registration("org.apache.cassandra.auth.AllowAllAuthorizer");
}


@@ -9,7 +9,6 @@
#pragma once
#include "auth/authorizer.hh"
#include "exceptions/exceptions.hh"
namespace cql3 {
class query_processor;
@@ -17,6 +16,7 @@ class query_processor;
namespace service {
class migration_manager;
class raft_group0_client;
}
namespace auth {
@@ -25,7 +25,7 @@ extern const std::string_view allow_all_authorizer_name;
class allow_all_authorizer final : public authorizer {
public:
allow_all_authorizer(cql3::query_processor&, ::service::migration_manager&) {
allow_all_authorizer(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&) {
}
virtual future<> start() override {
@@ -44,12 +44,12 @@ public:
return make_ready_future<permission_set>(permissions::ALL);
}
virtual future<> grant(std::string_view, permission_set, const resource&) const override {
virtual future<> grant(std::string_view, permission_set, const resource&) override {
return make_exception_future<>(
unsupported_authorization_operation("GRANT operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke(std::string_view, permission_set, const resource&) const override {
virtual future<> revoke(std::string_view, permission_set, const resource&) override {
return make_exception_future<>(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
@@ -60,12 +60,12 @@ public:
"LIST PERMISSIONS operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke_all(std::string_view) const override {
virtual future<> revoke_all(std::string_view) override {
return make_exception_future(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke_all(const resource&) const override {
virtual future<> revoke_all(const resource&) override {
return make_exception_future(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}


@@ -8,7 +8,6 @@
#pragma once
#include <iosfwd>
#include <optional>
#include <stdexcept>
#include <unordered_map>


@@ -11,10 +11,6 @@
#include "auth/authenticator.hh"
#include "auth/authenticated_user.hh"
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "cql3/query_processor.hh"
#include "utils/class_registrator.hh"
const sstring auth::authenticator::USERNAME_KEY("username");
const sstring auth::authenticator::PASSWORD_KEY("password");


@@ -12,8 +12,6 @@
#include <string_view>
#include <memory>
#include <set>
#include <stdexcept>
#include <unordered_map>
#include <optional>
#include <functional>
@@ -26,9 +24,6 @@
#include "auth/authentication_options.hh"
#include "auth/resource.hh"
#include "auth/sasl_challenge.hh"
#include "bytes.hh"
#include "enum_set.hh"
#include "exceptions/exceptions.hh"
namespace db {
class config;
@@ -111,7 +106,7 @@ public:
///
/// The options provided must be a subset of `supported_options()`.
///
virtual future<> create(std::string_view role_name, const authentication_options& options) const = 0;
virtual future<> create(std::string_view role_name, const authentication_options& options) = 0;
///
/// Alter the authentication record of an existing user.
@@ -120,12 +115,12 @@ public:
///
/// Callers must ensure that the specification of `alterable_options()` is adhered to.
///
virtual future<> alter(std::string_view role_name, const authentication_options& options) const = 0;
virtual future<> alter(std::string_view role_name, const authentication_options& options) = 0;
///
/// Delete the authentication record for a user. This will disallow the user from logging in.
///
virtual future<> drop(std::string_view role_name) const = 0;
virtual future<> drop(std::string_view role_name) = 0;
///
/// Query for custom options (those corresponding to \ref authentication_options::options).


@@ -11,8 +11,6 @@
#pragma once
#include <string_view>
#include <functional>
#include <optional>
#include <stdexcept>
#include <tuple>
#include <vector>
@@ -83,14 +81,14 @@ public:
///
/// \throws \ref unsupported_authorization_operation if granting permissions is not supported.
///
virtual future<> grant(std::string_view role_name, permission_set, const resource&) const = 0;
virtual future<> grant(std::string_view role_name, permission_set, const resource&) = 0;
///
/// Revoke a set of permissions from a role for a particular \ref resource.
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke(std::string_view role_name, permission_set, const resource&) const = 0;
virtual future<> revoke(std::string_view role_name, permission_set, const resource&) = 0;
///
/// Query for all directly granted permissions.
@@ -104,14 +102,14 @@ public:
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke_all(std::string_view role_name) const = 0;
virtual future<> revoke_all(std::string_view role_name) = 0;
///
/// Revoke all permissions granted to any role for a particular resource.
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke_all(const resource&) const = 0;
virtual future<> revoke_all(const resource&) = 0;
///
/// System resources used internally as part of the implementation. These are made inaccessible to users.


@@ -30,13 +30,14 @@ static const std::string cfg_source_altname = "ALTNAME";
static const class_registrator<auth::authenticator
, auth::certificate_authenticator
, cql3::query_processor&
, ::service::raft_group0_client&
, ::service::migration_manager&> cert_auth_reg(CERT_AUTH_NAME);
enum class auth::certificate_authenticator::query_source {
subject, altname
};
auth::certificate_authenticator::certificate_authenticator(cql3::query_processor& qp, ::service::migration_manager&)
auth::certificate_authenticator::certificate_authenticator(cql3::query_processor& qp, ::service::raft_group0_client&, ::service::migration_manager&)
: _queries([&] {
auto& conf = qp.db().get_config();
auto queries = conf.auth_certificate_role_queries();
@@ -154,16 +155,16 @@ future<auth::authenticated_user> auth::certificate_authenticator::authenticate(c
throw exceptions::authentication_exception("Cannot authenticate using attribute map");
}
future<> auth::certificate_authenticator::create(std::string_view role_name, const authentication_options& options) const {
future<> auth::certificate_authenticator::create(std::string_view role_name, const authentication_options& options) {
// TODO: should we keep track of roles/enforce existence? Role manager should deal with this...
co_return;
}
future<> auth::certificate_authenticator::alter(std::string_view role_name, const authentication_options& options) const {
future<> auth::certificate_authenticator::alter(std::string_view role_name, const authentication_options& options) {
co_return;
}
future<> auth::certificate_authenticator::drop(std::string_view role_name) const {
future<> auth::certificate_authenticator::drop(std::string_view role_name) {
co_return;
}


@@ -20,6 +20,7 @@ class query_processor;
namespace service {
class migration_manager;
class raft_group0_client;
}
namespace auth {
@@ -30,7 +31,7 @@ class certificate_authenticator : public authenticator {
enum class query_source;
std::vector<std::pair<query_source, boost::regex>> _queries;
public:
certificate_authenticator(cql3::query_processor&, ::service::migration_manager&);
certificate_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&);
~certificate_authenticator();
future<> start() override;
@@ -46,9 +47,9 @@ public:
future<authenticated_user> authenticate(const credentials_map& credentials) const override;
future<std::optional<authenticated_user>> authenticate(session_dn_func) const override;
future<> create(std::string_view role_name, const authentication_options& options) const override;
future<> alter(std::string_view role_name, const authentication_options& options) const override;
future<> drop(std::string_view role_name) const override;
future<> create(std::string_view role_name, const authentication_options& options) override;
future<> alter(std::string_view role_name, const authentication_options& options) override;
future<> drop(std::string_view role_name) override;
future<custom_options> query_custom_options(std::string_view role_name) const override;


@@ -6,30 +6,51 @@
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include <seastar/core/coroutine.hh>
#include "auth/common.hh"
#include <optional>
#include <seastar/core/coroutine.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/sharded.hh>
#include "mutation/canonical_mutation.hh"
#include "schema/schema_fwd.hh"
#include "timestamp.hh"
#include "utils/exponential_backoff_retry.hh"
#include "cql3/query_processor.hh"
#include "cql3/statements/create_table_statement.hh"
#include "replica/database.hh"
#include "schema/schema_builder.hh"
#include "service/migration_manager.hh"
#include "service/raft/group0_state_machine.hh"
#include "timeout_config.hh"
#include "db/config.hh"
#include "db/system_auth_keyspace.hh"
#include "utils/error_injection.hh"
namespace auth {
namespace meta {
constinit const std::string_view AUTH_KS("system_auth");
constinit const std::string_view USERS_CF("users");
namespace legacy {
constinit const std::string_view AUTH_KS("system_auth");
constinit const std::string_view USERS_CF("users");
} // namespace legacy
constinit const std::string_view AUTH_PACKAGE_NAME("org.apache.cassandra.auth.");
}
} // namespace meta
static logging::logger auth_log("auth");
bool legacy_mode(cql3::query_processor& qp) {
return qp.auth_version < db::system_auth_keyspace::version_t::v2;
}
std::string_view get_auth_ks_name(cql3::query_processor& qp) {
if (legacy_mode(qp)) {
return meta::legacy::AUTH_KS;
}
return db::system_auth_keyspace::NAME;
}
// Func must support being invoked more than once.
future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_function<future<>()> func) {
struct empty_state { };
@@ -55,7 +76,7 @@ static future<> create_metadata_table_if_missing_impl(
auto parsed_statement = cql3::query_processor::parse_statement(cql);
auto& parsed_cf_statement = static_cast<cql3::statements::raw::cf_statement&>(*parsed_statement);
parsed_cf_statement.prepare_keyspace(meta::AUTH_KS);
parsed_cf_statement.prepare_keyspace(meta::legacy::AUTH_KS);
auto statement = static_pointer_cast<cql3::statements::create_table_statement>(
parsed_cf_statement.prepare(db, qp.get_cql_stats())->statement);
@@ -98,4 +119,88 @@ future<> create_metadata_table_if_missing(
return qs;
}
static future<> announce_mutations_with_guard(
::service::raft_group0_client& group0_client,
std::vector<canonical_mutation> muts,
::service::group0_guard group0_guard,
seastar::abort_source* as,
std::optional<::service::raft_timeout> timeout) {
auto group0_cmd = group0_client.prepare_command(
::service::write_mutations{
.mutations{std::move(muts)},
},
group0_guard,
"auth: modify internal data"
);
return group0_client.add_entry(std::move(group0_cmd), std::move(group0_guard), as, timeout);
}
future<> announce_mutations_with_batching(
::service::raft_group0_client& group0_client,
start_operation_func_t start_operation_func,
std::function<mutations_generator(api::timestamp_type& t)> gen,
seastar::abort_source* as,
std::optional<::service::raft_timeout> timeout) {
// account for command's overhead, it's better to use smaller threshold than constantly bounce off the limit
size_t memory_threshold = group0_client.max_command_size() * 0.75;
utils::get_local_injector().inject("auth_announce_mutations_command_max_size",
[&memory_threshold] {
memory_threshold = 1000;
});
size_t memory_usage = 0;
std::vector<canonical_mutation> muts;
// guard has to be taken before we execute code in gen as
// it can do read-before-write and we want announce_mutations
// operation to be linearizable with other such calls,
// for instance if we do select and then delete in gen
// we want both to operate on the same data or fail
// if someone else modified it in the middle
std::optional<::service::group0_guard> group0_guard;
group0_guard = co_await start_operation_func(as);
auto timestamp = group0_guard->write_timestamp();
auto g = gen(timestamp);
while (auto mut = co_await g()) {
muts.push_back(canonical_mutation{*mut});
memory_usage += muts.back().representation().size();
if (memory_usage >= memory_threshold) {
if (!group0_guard) {
group0_guard = co_await start_operation_func(as);
timestamp = group0_guard->write_timestamp();
}
co_await announce_mutations_with_guard(group0_client, std::move(muts), std::move(*group0_guard), as, timeout);
group0_guard = std::nullopt;
memory_usage = 0;
muts = {};
}
}
if (!muts.empty()) {
if (!group0_guard) {
group0_guard = co_await start_operation_func(as);
timestamp = group0_guard->write_timestamp();
}
co_await announce_mutations_with_guard(group0_client, std::move(muts), std::move(*group0_guard), as, timeout);
}
}
future<> announce_mutations(
cql3::query_processor& qp,
::service::raft_group0_client& group0_client,
const sstring query_string,
std::vector<data_value_or_unset> values,
seastar::abort_source* as,
std::optional<::service::raft_timeout> timeout) {
auto group0_guard = co_await group0_client.start_operation(as, timeout);
auto timestamp = group0_guard.write_timestamp();
auto muts = co_await qp.get_mutations_internal(
query_string,
internal_distributed_query_state(),
timestamp,
std::move(values));
std::vector<canonical_mutation> cmuts = {muts.begin(), muts.end()};
co_await announce_mutations_with_guard(group0_client, std::move(cmuts), std::move(group0_guard), as, timeout);
}
}
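
The batching loop in `announce_mutations_with_batching` above can be hard to follow through the coroutine and guard handling. The sketch below isolates only the size-threshold batching idea in plain, synchronous C++: it accumulates payloads, flushes once the accumulated size reaches a threshold, and flushes any remainder at the end. It is not ScyllaDB code. `payload`, `send_in_batches` and `flush` are hypothetical names introduced for illustration, and the sketch deliberately omits the group0 guard re-acquisition and timestamps that the real function performs between batches.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a serialized mutation; not a ScyllaDB type.
using payload = std::string;

// Groups payloads into batches. A batch is flushed as soon as its
// accumulated size reaches `threshold` bytes, and once more at the end
// if anything is left over. `flush` plays the role of submitting one
// command. Returns the number of flushes performed.
inline std::size_t send_in_batches(
        const std::vector<payload>& items,
        std::size_t threshold,
        const std::function<void(std::vector<payload>)>& flush) {
    assert(threshold > 0);
    std::vector<payload> batch;
    std::size_t usage = 0;
    std::size_t commands = 0;
    for (const auto& item : items) {
        batch.push_back(item);
        usage += item.size();
        if (usage >= threshold) {
            flush(std::move(batch));
            batch = {};  // reset the moved-from vector to a known empty state
            usage = 0;
            ++commands;
        }
    }
    if (!batch.empty()) {
        flush(std::move(batch));
        ++commands;
    }
    return commands;
}
```

With a threshold of 8 bytes and the payloads `"aaaa"`, `"bbbb"`, `"cc"`, the first two items form one batch and the third is sent as the remainder. Note that the check runs after the item is appended, so a batch can exceed the threshold by up to one item; this appears to be why the diff sets its threshold to 0.75 of `max_command_size()` rather than to the limit itself.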


@@ -8,7 +8,6 @@
#pragma once
#include <chrono>
#include <string_view>
#include <seastar/core/future.hh>
@@ -19,9 +18,9 @@
#include <seastar/core/sstring.hh>
#include <seastar/core/smp.hh>
#include "log.hh"
#include "seastarx.hh"
#include "utils/exponential_backoff_retry.hh"
#include "schema/schema_registry.hh"
#include "types/types.hh"
#include "service/raft/raft_group0_client.hh"
using namespace std::chrono_literals;
@@ -42,12 +41,22 @@ namespace auth {
namespace meta {
constexpr std::string_view DEFAULT_SUPERUSER_NAME("cassandra");
namespace legacy {
extern constinit const std::string_view AUTH_KS;
extern constinit const std::string_view USERS_CF;
} // namespace legacy
constexpr std::string_view DEFAULT_SUPERUSER_NAME("cassandra");
extern constinit const std::string_view AUTH_PACKAGE_NAME;
}
} // namespace meta
// This is a helper to check whether auth-v2 is on.
bool legacy_mode(cql3::query_processor& qp);
// We have legacy implementation using different keyspace
// and need to parametrize depending on runtime feature.
std::string_view get_auth_ks_name(cql3::query_processor& qp);
template <class Task>
future<> once_among_shards(Task&& f) {
@@ -72,4 +81,28 @@ future<> create_metadata_table_if_missing(
///
::service::query_state& internal_distributed_query_state() noexcept;
// Execute update query via group0 mechanism, mutations will be applied on all nodes.
// Use this function when need to perform read before write on a single guard or if
// you have more than one mutation and potentially exceed single command size limit.
using start_operation_func_t = std::function<future<::service::group0_guard>(abort_source*)>;
using mutations_generator = coroutine::experimental::generator<mutation>;
future<> announce_mutations_with_batching(
::service::raft_group0_client& group0_client,
// since we can operate also in topology coordinator context where we need stronger
// guarantees than start_operation from group0_client gives we allow to inject custom
// function here
start_operation_func_t start_operation_func,
std::function<mutations_generator(api::timestamp_type& t)> gen,
seastar::abort_source* as,
std::optional<::service::raft_timeout> timeout);
// Execute update query via group0 mechanism, mutations will be applied on all nodes.
future<> announce_mutations(
cql3::query_processor& qp,
::service::raft_group0_client& group0_client,
const sstring query_string,
std::vector<data_value_or_unset> values,
seastar::abort_source* as,
std::optional<::service::raft_timeout> timeout);
}
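
The `legacy_mode` and `get_auth_ks_name` helpers declared above select the auth keyspace at run time from the active auth version. A minimal standalone sketch of that pattern follows. The enum `auth_version`, the functions `is_legacy` and `keyspace_for`, and the value `"system_auth_v2"` are assumptions made for illustration; the diff only shows that the non-legacy name comes from `db::system_auth_keyspace::NAME` and does not show its value.

```cpp
#include <string_view>

// Hypothetical version enum; the real code compares qp.auth_version
// against db::system_auth_keyspace::version_t::v2.
enum class auth_version { v1 = 1, v2 = 2 };

constexpr std::string_view legacy_keyspace = "system_auth";
// Assumed placeholder: the diff does not show the real non-legacy name.
constexpr std::string_view current_keyspace = "system_auth_v2";

// True while the cluster has not yet switched to the v2 auth schema.
constexpr bool is_legacy(auth_version v) {
    return v < auth_version::v2;
}

// Picks the keyspace that queries should target for the given version.
constexpr std::string_view keyspace_for(auth_version v) {
    return is_legacy(v) ? legacy_keyspace : current_keyspace;
}
```

Because the result depends on a value that can change while the process runs, a query string built from it should not be cached in a function-local `static`. That is presumably why several queries in `default_authorizer.cc` change from `static const sstring query` to `const sstring query` in this diff.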


@@ -9,20 +9,18 @@
*/
#include "auth/default_authorizer.hh"
#include "db/system_auth_keyspace.hh"
extern "C" {
#include <crypt.h>
#include <unistd.h>
}
#include <chrono>
#include <random>
#include <boost/algorithm/string/join.hpp>
#include <boost/range.hpp>
#include <seastar/core/seastar.hh>
#include <seastar/core/sleep.hh>
#include "auth/authenticated_user.hh"
#include "auth/common.hh"
#include "auth/permission.hh"
#include "auth/role_or_anonymous.hh"
@@ -30,7 +28,6 @@ extern "C" {
#include "cql3/untyped_result_set.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "replica/database.hh"
#include "utils/class_registrator.hh"
namespace auth {
@@ -51,10 +48,12 @@ static const class_registrator<
authorizer,
default_authorizer,
cql3::query_processor&,
::service::raft_group0_client&,
::service::migration_manager&> password_auth_reg("org.apache.cassandra.auth.CassandraAuthorizer");
default_authorizer::default_authorizer(cql3::query_processor& qp, ::service::migration_manager& mm)
default_authorizer::default_authorizer(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm)
: _qp(qp)
, _group0_client(g0)
, _migration_manager(mm) {
}
@@ -64,11 +63,11 @@ default_authorizer::~default_authorizer() {
static const sstring legacy_table_name{"permissions"};
bool default_authorizer::legacy_metadata_exists() const {
return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);
return _qp.db().has_schema(meta::legacy::AUTH_KS, legacy_table_name);
}
future<bool> default_authorizer::any_granted() const {
static const sstring query = format("SELECT * FROM {}.{} LIMIT 1", meta::AUTH_KS, PERMISSIONS_CF);
future<bool> default_authorizer::legacy_any_granted() const {
static const sstring query = format("SELECT * FROM {}.{} LIMIT 1", meta::legacy::AUTH_KS, PERMISSIONS_CF);
return _qp.execute_internal(
query,
@@ -79,9 +78,9 @@ future<bool> default_authorizer::any_granted() const {
});
}
future<> default_authorizer::migrate_legacy_metadata() const {
future<> default_authorizer::migrate_legacy_metadata() {
alogger.info("Starting migration of legacy permissions metadata.");
static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);
static const sstring query = format("SELECT * FROM {}.{}", meta::legacy::AUTH_KS, legacy_table_name);
return _qp.execute_internal(
query,
@@ -112,7 +111,7 @@ future<> default_authorizer::start() {
"{} set<text>,"
"PRIMARY KEY({}, {})"
") WITH gc_grace_seconds={}",
meta::AUTH_KS,
meta::legacy::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME,
RESOURCE_NAME,
@@ -129,11 +128,11 @@ future<> default_authorizer::start() {
_migration_manager).then([this] {
_finished = do_after_system_ready(_as, [this] {
return async([this] {
_migration_manager.wait_for_schema_agreement(_qp.db().real_database(), db::timeout_clock::time_point::max(), &_as).get0();
_migration_manager.wait_for_schema_agreement(_qp.db().real_database(), db::timeout_clock::time_point::max(), &_as).get();
if (legacy_metadata_exists()) {
if (!any_granted().get0()) {
migrate_legacy_metadata().get0();
if (!legacy_any_granted().get()) {
migrate_legacy_metadata().get();
return;
}
@@ -153,27 +152,25 @@ future<> default_authorizer::stop() {
future<permission_set>
default_authorizer::authorize(const role_or_anonymous& maybe_role, const resource& r) const {
if (is_anonymous(maybe_role)) {
return make_ready_future<permission_set>(permissions::NONE);
co_return permissions::NONE;
}
static const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? AND {} = ?",
const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? AND {} = ?",
PERMISSIONS_NAME,
meta::AUTH_KS,
get_auth_ks_name(_qp),
PERMISSIONS_CF,
ROLE_NAME,
RESOURCE_NAME);
return _qp.execute_internal(
const auto results = co_await _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
{*maybe_role.name, r.name()},
cql3::query_processor::cache_internal::yes).then([](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return permissions::NONE;
}
return permissions::from_strings(results->one().get_set<sstring>(PERMISSIONS_NAME));
});
cql3::query_processor::cache_internal::yes);
if (results->empty()) {
co_return permissions::NONE;
}
co_return permissions::from_strings(results->one().get_set<sstring>(PERMISSIONS_NAME));
}
future<>
@@ -181,88 +178,88 @@ default_authorizer::modify(
std::string_view role_name,
permission_set set,
const resource& resource,
std::string_view op) const {
return do_with(
format("UPDATE {}.{} SET {} = {} {} ? WHERE {} = ? AND {} = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
PERMISSIONS_NAME,
PERMISSIONS_NAME,
op,
ROLE_NAME,
RESOURCE_NAME),
[this, &role_name, set, &resource](const auto& query) {
return _qp.execute_internal(
std::string_view op) {
const sstring query = format("UPDATE {}.{} SET {} = {} {} ? WHERE {} = ? AND {} = ?",
get_auth_ks_name(_qp),
PERMISSIONS_CF,
PERMISSIONS_NAME,
PERMISSIONS_NAME,
op,
ROLE_NAME,
RESOURCE_NAME);
if (legacy_mode(_qp)) {
co_return co_await _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
{permissions::to_strings(set), sstring(role_name), resource.name()},
cql3::query_processor::cache_internal::no).discard_result();
});
}
co_return co_await announce_mutations(_qp, _group0_client, query,
{permissions::to_strings(set), sstring(role_name), resource.name()}, &_as, ::service::raft_timeout{});
}
future<> default_authorizer::grant(std::string_view role_name, permission_set set, const resource& resource) const {
future<> default_authorizer::grant(std::string_view role_name, permission_set set, const resource& resource) {
return modify(role_name, std::move(set), resource, "+");
}
future<> default_authorizer::revoke(std::string_view role_name, permission_set set, const resource& resource) const {
future<> default_authorizer::revoke(std::string_view role_name, permission_set set, const resource& resource) {
return modify(role_name, std::move(set), resource, "-");
}
future<std::vector<permission_details>> default_authorizer::list_all() const {
static const sstring query = format("SELECT {}, {}, {} FROM {}.{}",
const sstring query = format("SELECT {}, {}, {} FROM {}.{}",
ROLE_NAME,
RESOURCE_NAME,
PERMISSIONS_NAME,
meta::AUTH_KS,
get_auth_ks_name(_qp),
PERMISSIONS_CF);
return _qp.execute_internal(
const auto results = co_await _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
{},
cql3::query_processor::cache_internal::yes).then([](::shared_ptr<cql3::untyped_result_set> results) {
std::vector<permission_details> all_details;
cql3::query_processor::cache_internal::yes);
for (const auto& row : *results) {
if (row.has(PERMISSIONS_NAME)) {
auto role_name = row.get_as<sstring>(ROLE_NAME);
auto resource = parse_resource(row.get_as<sstring>(RESOURCE_NAME));
auto perms = permissions::from_strings(row.get_set<sstring>(PERMISSIONS_NAME));
all_details.push_back(permission_details{std::move(role_name), std::move(resource), std::move(perms)});
}
std::vector<permission_details> all_details;
for (const auto& row : *results) {
if (row.has(PERMISSIONS_NAME)) {
auto role_name = row.get_as<sstring>(ROLE_NAME);
auto resource = parse_resource(row.get_as<sstring>(RESOURCE_NAME));
auto perms = permissions::from_strings(row.get_set<sstring>(PERMISSIONS_NAME));
all_details.push_back(permission_details{std::move(role_name), std::move(resource), std::move(perms)});
}
return all_details;
});
}
co_return all_details;
}
future<> default_authorizer::revoke_all(std::string_view role_name) const {
static const sstring query = format("DELETE FROM {}.{} WHERE {} = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME);
return _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
{sstring(role_name)},
cql3::query_processor::cache_internal::no).discard_result().handle_exception([role_name](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", role_name, e);
future<> default_authorizer::revoke_all(std::string_view role_name) {
try {
const sstring query = format("DELETE FROM {}.{} WHERE {} = ?",
get_auth_ks_name(_qp),
PERMISSIONS_CF,
ROLE_NAME);
if (legacy_mode(_qp)) {
co_await _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_query_state(),
{sstring(role_name)},
cql3::query_processor::cache_internal::no).discard_result();
} else {
co_await announce_mutations(_qp, _group0_client, query, {sstring(role_name)}, &_as, ::service::raft_timeout{});
}
});
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", role_name, e);
}
}
future<> default_authorizer::revoke_all(const resource& resource) const {
future<> default_authorizer::revoke_all_legacy(const resource& resource) {
static const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? ALLOW FILTERING",
ROLE_NAME,
meta::AUTH_KS,
get_auth_ks_name(_qp),
PERMISSIONS_CF,
RESOURCE_NAME);
@@ -272,13 +269,13 @@ future<> default_authorizer::revoke_all(const resource& resource) const {
{resource.name()},
cql3::query_processor::cache_internal::no).then_wrapped([this, resource](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
auto res = f.get();
return parallel_for_each(
res->begin(),
res->end(),
[this, res, resource](const cql3::untyped_result_set::row& r) {
static const sstring query = format("DELETE FROM {}.{} WHERE {} = ? AND {} = ?",
meta::AUTH_KS,
get_auth_ks_name(_qp),
PERMISSIONS_CF,
ROLE_NAME,
RESOURCE_NAME);
@@ -304,8 +301,55 @@ future<> default_authorizer::revoke_all(const resource& resource) const {
});
}
future<> default_authorizer::revoke_all(const resource& resource) {
if (legacy_mode(_qp)) {
co_return co_await revoke_all_legacy(resource);
}
auto name = resource.name();
try {
auto gen = [this, name] (api::timestamp_type& t) -> mutations_generator {
const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? ALLOW FILTERING",
ROLE_NAME,
get_auth_ks_name(_qp),
PERMISSIONS_CF,
RESOURCE_NAME);
auto res = co_await _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
{name},
cql3::query_processor::cache_internal::no);
for (const auto& r : *res) {
const sstring query = format("DELETE FROM {}.{} WHERE {} = ? AND {} = ?",
get_auth_ks_name(_qp),
PERMISSIONS_CF,
ROLE_NAME,
RESOURCE_NAME);
auto muts = co_await _qp.get_mutations_internal(
query,
internal_distributed_query_state(),
t,
{r.get_as<sstring>(ROLE_NAME), name});
if (muts.size() != 1) {
on_internal_error(alogger,
format("expecting single delete mutation, got {}", muts.size()));
}
co_yield std::move(muts[0]);
}
};
const auto timeout = ::service::raft_timeout{};
co_await announce_mutations_with_batching(
_group0_client,
[this, timeout](abort_source* as) { return _group0_client.start_operation(as, timeout); },
std::move(gen),
&_as,
timeout);
} catch (exceptions::request_execution_exception& e) {
alogger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", name, e);
}
}
const resource_set& default_authorizer::protected_resources() const {
static const resource_set resources({ make_data_resource(meta::AUTH_KS, PERMISSIONS_CF) });
static const resource_set resources({ make_data_resource(meta::legacy::AUTH_KS, PERMISSIONS_CF) });
return resources;
}

Some files were not shown because too many files have changed in this diff