There are two path parsers. One of them accepts keyspace and table names
and the other one doesn't. The latter is then supposed to parse the
ks.cf pair from path and put it on the descriptor. This patch makes this
method return ks.cf so that later it will be possible to remove these
strings from the descriptor itself.
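A minimal sketch of the idea in Python, not the actual C++ code: the parser returns the ks.cf pair to its caller next to the parsed file name, instead of only stashing it on the descriptor. The `<ks>/<cf>-<uuid>/<file>` directory layout, the `ParsedPath` type and the `parse_path()` name are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class ParsedPath(NamedTuple):
    keyspace: str
    table: str
    filename: str


def parse_path(path: str) -> ParsedPath:
    """Split '<...>/<ks>/<cf>-<uuid>/<file>' into its parts (assumed layout)."""
    p = PurePosixPath(path)
    table_dir = p.parent.name          # e.g. 'cf1-abc123'
    keyspace = p.parent.parent.name    # e.g. 'ks1'
    # The table name is everything before the last '-' of the directory name.
    table, sep, _uuid = table_dir.rpartition("-")
    if not sep or not table or not keyspace:
        raise ValueError(f"cannot extract ks.cf from {path!r}")
    return ParsedPath(keyspace, table, p.name)
```

With the pair returned to the caller, the descriptor no longer has to be the only place that carries these strings.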
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method really parses provided path, so the existing name is pretty
confusing. It's extra confusing in the table::get_snapshot_details()
where it's just called and the return value is simply ignored.
Named "parse_..." makes it clear what the method is for.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
this is redundant code that should have been gone a long time ago.
the snippet (which lies above the code being deleted):
    db.invoke_on_all([] (replica::database& db) {
        db.get_tables_metadata().for_each_table([] (table_id, lw_shared_ptr<replica::table> table) {
            replica::table& t = *table;
            t.enable_auto_compaction();
        });
    }).get();
provides the same thing as this code being deleted.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#15597
The following new commands are implemented:
* disablebackup
* disablebinary
* disablegossip
* enablebackup
* enablebinary
* enablegossip
* gettraceprobability
* help
* settraceprobability
* statusbackup
* statusbinary
* statusgossip
* version
All are associated with tests. All tests (both old and new) pass with both the scylla-native and the cassandra nodetool implementation.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#15593
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement help operation
tools/scylla-nodetool: implement the traceprobability commands
tools/scylla-nodetool: implement the gossip commands
tools/scylla-nodetool: implement the binary commands
tools/scylla-nodetool: implement backup related commands
tools/scylla-nodetool: implement version command
test/nodetool: introduce utils.check_nodetool_fails_with()
test/nodetool: return stdout of nodetool invokation
test/nodetool/rest_api_mock.py: fix request param matching
tools/scylla-nodetool: compact: remove --partition argument
tools/scylla-nodetool: scylla_rest_client: add support delete method
tools/scylla-nodetool: get rid of check_json_type()
tools/scylla-nodetool: log more details for failed requests
tools/scylla-*: use operation_option for positional options
tools/utils: add support for operation aliases
Checking that nodetool fails with a given message turned out to be a
common pattern, so extract the logic for checking this into a method of
its own. Refactor the existing tests to use it, instead of the
hand-coded equivalent.
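A rough sketch of what such a helper could look like. The signature below is hypothetical and is not the one in `test/nodetool/utils.py`; it assumes the nodetool runner is a callable that raises `subprocess.CalledProcessError` (with `stderr` captured) on a non-zero exit.

```python
import subprocess
from typing import Callable, Sequence


def check_nodetool_fails_with(run_nodetool: Callable[..., str],
                              args: Sequence[str],
                              expected_errors: Sequence[str]) -> None:
    """Assert that nodetool fails and prints one of the expected messages."""
    try:
        run_nodetool(*args)
    except subprocess.CalledProcessError as e:
        stderr = e.stderr or ""
        # Any one of the expected messages is enough, so the same test can
        # cover implementations that word the error differently.
        assert any(msg in stderr for msg in expected_errors), \
            f"none of {list(expected_errors)} found in stderr: {stderr!r}"
        return
    raise AssertionError(f"nodetool {' '.join(args)} unexpectedly succeeded")
```

Accepting a list of messages is a design choice of this sketch: it lets one test pass against several implementations whose error texts differ.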
This PR updates the information on the ScyllaDB vs. Cassandra compatibility. It covers the information from https://github.com/scylladb/scylladb/issues/15563, but there could be more to fix.
@tzach @scylladb/scylla-maint Please review this PR and the page covering our compatibility with Cassandra and let me know if you see anything else that needs to be fixed.
I've added the updates with separate commits in case you want to backport some info (e.g. about AzureSnitch).
Fixes https://github.com/scylladb/scylladb/issues/15563
Closes scylladb/scylladb#15582
* github.com:scylladb/scylladb:
doc: deprecate Thrift in Cassandra compatibility
doc: remove row/key cache from Cassandra compatibility
doc: add AzureSnitch to Cassandra compatibility
The sentence says that if table args are provided compaction will run on
all tables. This is ambiguous, so the sentence is rephrased to specify
that compaction will only run on the provided tables.
Closes scylladb/scylladb#15394
It may happen that wrapping up multipart upload fails too. However,
before sending the request the driver clears the _upload_id field thus
marking the whole process as "all is OK". So in case the finalization
method fails and throws, the upload context remains on the server side
forever.
Fix this by keeping the _upload_id set, so even if finalization throws,
closing the uploader notices this and calls abort.
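The ordering issue can be illustrated with a small Python sketch. `FakeStore` and `Uploader` are hypothetical stand-ins, not the real S3 client classes; only the `_upload_id` field name comes from the description above.

```python
class FakeStore:
    """Stand-in for the object store; records abort calls."""
    def __init__(self, fail_complete: bool = False):
        self.fail_complete = fail_complete
        self.aborted = []

    def complete(self, upload_id: str) -> None:
        if self.fail_complete:
            raise IOError("complete-multipart-upload failed")

    def abort(self, upload_id: str) -> None:
        self.aborted.append(upload_id)


class Uploader:
    def __init__(self, store: FakeStore, upload_id: str):
        self._store = store
        self._upload_id = upload_id   # non-empty means "upload still open"

    def finalize(self) -> None:
        # Clear _upload_id only after the server confirmed completion.
        # Clearing it before the request would hide a failure from close().
        self._store.complete(self._upload_id)
        self._upload_id = ""

    def close(self) -> None:
        if self._upload_id:
            # Finalization never succeeded: drop the server-side context.
            self._store.abort(self._upload_id)
            self._upload_id = ""
```

If `finalize()` throws, `_upload_id` is still set, so a later `close()` issues the abort; after a successful `finalize()`, `close()` does nothing.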
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15521
This check is redundant. Originally it was intended to work around
rapidjson using an assert by default to check that the fields have the
expected type. But it turns out we already configure rapidjson to use a
plain exception in utils/rjson.hh, so check_json_type() is not needed
for graceful error handling.
Use operation_option to describe positional options. The structure used
before -- app_template::positional_option -- was not a good fit for
this, as it was designed to store a description that is immediately
passed to the boost::program_options subsystem and then discarded.
As such, it had a raw pointer member, which was expected to be
immediately wrapped by boost::shared_ptr<> by boost::program_options.
This produced memory leaks for tools, for options that ended up not
being used. To avoid this altogether, use operation_option, converting
to the app_template::positional_option at the last moment.
The test needs to call the flush-keyspace API endpoint and currently does it by hand, which is not very convenient.
Also, in the future there will be a need for _background_ API kicking; the currently used requests package cannot do it, while the pylib REST API can.
Closes scylladb/scylladb#15565
* github.com:scylladb/scylladb:
test/object_store: Use REST client from pylib
test/pylib: Add flush_keyspace() method to rest client
test/object_store: Wrap yielded managed cluster
Registering API handlers for services needs to
- happen next to the corresponding service's start
- use only the provided service, not any other ones (if needed, the handler's service can use its internal dependencies to do its job)
- get the service to handle requests via argument, not from http context (http context, in turn, is going _not_ to depend on anything)
The storage proxy handlers don't follow any of these rules; this PR fixes them.
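The three rules can be sketched in Python as follows. This is an illustration only: the real code is C++, and the `StorageProxy` class, the route name and the `set_storage_proxy()`/`unset_storage_proxy()` helpers are hypothetical.

```python
from typing import Callable, Dict


class StorageProxy:
    """Hypothetical service with one value that the API exposes."""
    def __init__(self, read_timeout_ms: int):
        self.read_timeout_ms = read_timeout_ms


Routes = Dict[str, Callable[[], object]]


def set_storage_proxy(routes: Routes, proxy: StorageProxy) -> None:
    # The handler captures the service it was given as an argument;
    # it does not look anything up in a shared context object.
    routes["storage_proxy/read_timeout"] = lambda: proxy.read_timeout_ms


def unset_storage_proxy(routes: Routes) -> None:
    routes.pop("storage_proxy/read_timeout", None)


def main_sketch() -> Routes:
    routes: Routes = {}                # dependency-less API context
    proxy = StorageProxy(5000)         # the service starts ...
    set_storage_proxy(routes, proxy)   # ... and its API is registered next to it
    return routes
```

Because the handler closes over `proxy`, the routes container needs no reference to any service, which is what lets the context stay dependency-free.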
Closes scylladb/scylladb#15584
* github.com:scylladb/scylladb:
api: Make storage_proxy handlers use proxy argument
api: Change some static helpers to use proxy instead of ctx
api: Pass sharded<storage_proxy> reference to storage_proxy handlers
api: Start (and stop) storage_proxy API earlier
api: Remove storage_service argument from storage_proxy setup
api: Move storage_proxy/ endpoint using storage_service
api: Remove storage_proxy.hh from storage_service.cc
main: Initialize API server early
Use NullCompactionStrategy for the test_table fixture
rather than using the `no_autocompaction_context`.
This is simpler, since regular compaction just gets in
the way of all tests that use `SELECT MUTATION_FRAGMENTS`.
In addition, `no_autocompaction_context` would be problematic
once we start running cql-pytest test cases in parallel rather
than serially, since it would inadvertently affect other test cases.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#15574
And stop using proxy reference from http context. After a while the
proxy dependency will be removed from http context
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are some helpers in storage_proxy.cc that get proxy reference from
passed http context argument. Next patch will stop using ctx for that
purpose, so prepare in advance by making the helpers use proxy reference
argument directly
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The goal is to make handlers use the proxy argument instead of keeping
proxy as a dependency on http context (most other handlers already
work this way)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The code setting up storage_proxy/ endpoints no longer needs
storage_service and related decoration
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage_proxy/get_schema_version is served by storage_service, so it
should be in storage_service.cc instead
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Surprisingly, the dependency-less API server context is initialized
somewhere in the middle of main. By that time some "real" services had
already started and should have the ability to register their endpoints,
so API context should be initialized way ahead. This patch places its
initialization next to prometheus init.
One thing that's not nice here is that API port listening remains where
it was before the patch, so for the external ... observer API
initialization doesn't change. Likely the API should start listening for
connections early as well, but that's left for future patching.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This PR is the second step in refactoring the Hinted Handoff module. It cleans up the contents of the file `hint_storage.cc`. The biggest change is the transition from continuations to coroutines.
Refs #15358
Closes scylladb/scylladb#15496
* github.com:scylladb/scylladb:
db/hints: Alias segment list in hint_storage.cc
db/hints: Rename rebalance to rebalance_hints
db/hints: Clean up rebalance() in hint_storage.cc
db/hints: Coroutinize hint_storage.cc
db/hints: Clean up remove_irrelevant_shards_directories() in hint_storage.cc
db/hints: Clean up rebalance_segments() in hint_storage.cc
db/hints: Clean up rebalance_segments_for() in hint_storage.cc
db/hints: Clean up get_current_hints_segments() in hint_storage.cc
db/hints: Rename scan_for_hints_dirs to scan_shard_hint_directories
db/hints: Clean up scan_for_hints_dirs() in hint_storage.cc
db/hints: Wrap hint_storage.cc in an anonymous namespace
Add a REST API to reload Raft topology state without having to restart a node and use it in `test_fence_hints`. Restarting the node has undesired side effects which cause test flakiness; more details provided in commit messages.
Refactor the test a bit while at it.
Fixes: #15285
Closes scylladb/scylladb#15523
* github.com:scylladb/scylladb:
test: test_fencing.py: enable hints_manager=trace logs in `test_fence_hints`
test: test_fencing.py: reload topology through REST API in `test_fence_hints`
test: refactor test_fencing.py
api: storage_service: add REST API to reload topology state
Allow disabling auto-compaction for given table(s)
using either the ks.table syntax or ks:table (as the
api suggests).
The first syntax would likely be more common since
the test tables we automatically create are named
as test_keyspace.test_table so we can pass that name
to `no_autocompaction_context` as is.
test_tools.system_scylla_local_sstable_prepared was
modified to disable auto-compaction only on
the `system.scylla_local` table rather than
the whole `system` keyspace, since it only relies
on this table. Plus, it helps test this change :)
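A small sketch of accepting both spellings. The `split_table_name()` helper is hypothetical and is not the code used by `no_autocompaction_context`; it only shows one way to treat `ks.table` and `ks:table` the same while still allowing a bare keyspace name.

```python
import re
from typing import Optional, Tuple


def split_table_name(name: str) -> Tuple[str, Optional[str]]:
    """Return (keyspace, table); table is None when only a keyspace is given."""
    # Split on the first '.' or ':' only, so both syntaxes are accepted.
    parts = re.split(r"[.:]", name, maxsplit=1)
    keyspace = parts[0]
    table = parts[1] if len(parts) == 2 else None
    if not keyspace or table == "":
        raise ValueError(f"malformed table name: {name!r}")
    return keyspace, table
```

For example, `split_table_name("test_keyspace.test_table")` and `split_table_name("test_keyspace:test_table")` yield the same pair, so a fixture-generated name can be passed as is.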
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#15575
Enable TRACE level logging on the server that's supposed to send the
hints. Should make it easier to debug failures in the future, if any
happen again.
Restarting a node in order to reload topology may have side effects that
lead to test flakiness. While the node is shutting down, it gives up
leadership. Before it finishes shutting down, another node may become
Raft group 0 leader, then topology coordinator, then send a topology
command, triggering topology state reload on the shutting down node,
causing its topology version to get updated, allowing it to send a
successful hint before it shuts down and restarts. After it restarts, no
more hints will be sent, so the metrics condition we're waiting for (for
a hint to be sent) will never become true (metrics are not persisted
between restarts).
Instead of restarting, reload topology state through the new REST API.
This also makes the test a bit faster.
Fixes #15285
- use `manager.get_cql()` to silence mypy (`manager.cql` is `Optional`)
- extract `metrics.lines_by_prefix('scylla_hints_manager_')` to a helper
function
- when waiting for conditions on metrics, split the condition into
safety and liveness part, and fail early if the safety part does not
hold
- in `exactly_one_hint` send, don't check that `send_errors_metric` is
`0` (it won't be after the next commit)
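The safety/liveness split could look roughly like the sketch below. `wait_for_metric()` and its parameters are hypothetical and not the helpers in `test_fencing.py`; metrics are modelled as a plain dict.

```python
import time
from typing import Callable, Dict

Metrics = Dict[str, int]


def wait_for_metric(read_metrics: Callable[[], Metrics],
                    safety: Callable[[Metrics], bool],
                    liveness: Callable[[Metrics], bool],
                    timeout: float = 5.0,
                    period: float = 0.01) -> Metrics:
    """Poll until `liveness` holds; fail at once if `safety` is violated."""
    deadline = time.monotonic() + timeout
    while True:
        m = read_metrics()
        if not safety(m):
            # No point in waiting for the timeout: the state can't recover.
            raise AssertionError(f"safety condition violated: {m}")
        if liveness(m):
            return m
        if time.monotonic() >= deadline:
            raise TimeoutError(f"liveness condition not reached: {m}")
        time.sleep(period)
```

For an "exactly one hint sent" check, the safety part would be "at most one hint sent" and the liveness part "at least one hint sent"; a second hint then fails the test immediately instead of after the timeout.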
Some tests may want to modify system.topology table directly. Add a REST
API to reload the state into memory. An alternative would be restarting
the server, but that's slower and may have other side effects undesired
in the test.
The API can also be called outside tests, it should not have any
observable effects unless the user modifies `system.topology` table
directly (which they should never do, outside perhaps some disaster
recovery scenarios).
This PR implements a new procedure for joining nodes to group 0, based on the description in the "Cluster features on Raft (v2)" document. This is a continuation of the previous PRs related to cluster features on raft (https://github.com/scylladb/scylladb/pull/14722, https://github.com/scylladb/scylladb/pull/14232), and the last piece necessary to replace cluster feature checks in gossip.
The current implementation relies on the gossip shadow round to fetch the set of enabled features, determine whether the node supports all of the enabled features, and join only if it is safe. As we are moving management of cluster features to group 0, we encounter a problem: the contents of group 0 itself may depend on features, hence it is not safe to join it unless we perform the feature check, which in turn depends on information in group 0. Hence, we have a dependency cycle.
In order to solve this problem, the algorithm for joining group 0 is modified, and verification of features and other parameters is offloaded to an existing node in group 0. Instead of directly asking the discovery leader to unconditionally add the node to the configuration with `GROUP0_MODIFY_CONFIG`, two different RPCs are added: `JOIN_NODE_REQUEST` and `JOIN_NODE_RESPONSE`. The main idea is as follows:
- The new node sends `JOIN_NODE_REQUEST` to the discovery leader. It sends a bunch of information describing the node, including supported cluster features. The discovery leader verifies some of the parameters and adds the node in the `none` state to `system.topology`.
- The topology coordinator picks up the request for the node to be joined (i.e. the node in `none` state), verifies its properties - including cluster features - and then:
- If the node is accepted, the coordinator transitions it to `bootstrap`/`replace` state and transitions the topology to `join_group0` state. The node is added to group 0 and then `JOIN_NODE_RESPONSE` is sent to it with information that the node was accepted.
- Otherwise, the node is moved to `left` state, told by the coordinator via `JOIN_NODE_RESPONSE` that it was rejected and it shuts down.
The procedure is not retryable - if a node fails to do it from start to end and crashes in between, it will not be allowed to retry it with the same host_id - `JOIN_NODE_REQUEST` will fail. The data directory must be cleared before attempting to add it again (so that a new host_id is generated).
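The flow above can be modelled with a toy Python sketch. It is not the real implementation: the `Topology` class and both function names are hypothetical, only the bootstrap path is shown (replace is omitted), and the returned strings stand in for the `JOIN_NODE_RESPONSE` payload.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Topology:
    enabled_features: Set[str]
    nodes: Dict[str, str] = field(default_factory=dict)            # host_id -> state
    supported: Dict[str, Set[str]] = field(default_factory=dict)   # host_id -> features


def join_node_request(topo: Topology, host_id: str, features: Set[str]) -> None:
    """Discovery leader: record the joining node in the `none` state."""
    if host_id in topo.nodes:
        # Not retryable: a host_id that already tried cannot try again.
        raise RuntimeError(f"host_id {host_id} already attempted to join")
    topo.nodes[host_id] = "none"
    topo.supported[host_id] = set(features)


def coordinator_step(topo: Topology, host_id: str) -> str:
    """Topology coordinator: accept or reject a node in the `none` state."""
    assert topo.nodes[host_id] == "none"
    # Accept only if the node supports every feature the cluster has enabled.
    if topo.enabled_features <= topo.supported[host_id]:
        topo.nodes[host_id] = "bootstrap"
        return "accepted"
    topo.nodes[host_id] = "left"
    return "rejected"
```

A rejected node ends up in `left`, so a second `join_node_request()` with the same host_id raises, mirroring the requirement to clear the data directory and obtain a new host_id.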
More details about the procedure and the RPC are described in `topology-over-raft.md`.
Fixes: #15152
Closes scylladb/scylladb#15196
* github.com:scylladb/scylladb:
tests: mark test_blocked_bootstrap as skipped
storage_service: do not check features in shadow round
storage_service: remove raft_{boostrap,replace}
topology_coordinator: relax the check in enable_features
raft_group0: insert replaced node info before server setup
storage_service: use join node rpc to join the cluster
topology_coordinator: handle joining nodes
topology_state_machine: add join_group0 state
storage_service: add join node RPC handlers
raft: expose current_leader in raft::server
storage_service: extract wait_for_live_nodes_timeout constant
raft_group0: abstract out node joining handshake
storage_service: pass raft_topology_change_enabled on rpc init
rpc: add new join handshake verbs
docs: document the new join procedure
topology_state_machine: add supported_features to replica_state
storage_service: check destination host ID in raft verbs
group_state_machine: take reference to raft address map
raft_group0: expose joined_group0
This commit adds AzureSnitch (together with a link
to the AzureSnitch description) to the Cassandra
compatibility page.
In addition, the Snitches table is fixed.
Test cases kick scylla to force keyspaces flush (to have the objects on
object store) by hand. Equip the wrapped cluster object with the REST
API class instance for convenience
The assertion for 200 return status code is dropped, since the REST client
checks it behind the scenes
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>