Commit Graph

6619 Commits

Author SHA1 Message Date
Kefu Chai
d0ceb35e7e test/boost: print runtime_error using e.what()
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter. but fortunately, fmt v10 brings the builtin
formatter for classes derived from `std::exception`. but before
switching to {fmt} v10, and after dropping `FMT_DEPRECATED_OSTREAM`
macro, we need to print out `std::runtime_error`. so far, we don't
have a shared place for formatter for `std::runtime_error`. so we
are addressing the needs on a case-by-case basis.

in this change, we just print it using `e.what()`. it's behavior
is identical to what we have now.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 18:18:32 +08:00
Tomasz Grabiec
042a4b7627 Merge 'tablets: add warning on CREATE KEYSPACE' from Nadav Har'El
The CDC feature is not supported on a table that uses tablets
(Refs https://github.com/scylladb/scylladb/issues/16317), so if a user creates a keyspace with tablets enabled
they may be surprised later (perhaps much later) when they try to enable
CDC on the table and can't.

The LWT feature always had issue Refs https://github.com/scylladb/scylladb/issues/5251, but it has become potentially
more common with tablets.

So it was proposed that as long as we have missing features (like CDC or
LWT), every time a keyspace is created with tablets it should output a
warning (a bona-fide CQL warning, not a log message) that some features
are missing, and if you need them you should consider re-creating the
keyspace without tablets.

This PR does this.

The warning text which will be produced is the following (obviously, it can
be improved later, as we perhaps find more missing features):

>   "Tables in this keyspace will be replicated using tablets, and will
>    not support the CDC feature (issue https://github.com/scylladb/scylladb/issues/16317) and LWT may suffer from
>    issue https://github.com/scylladb/scylladb/issues/5251 more often. If you want to use CDC or LWT, please drop
>    this keyspace and re-create it without tablets, by adding AND TABLETS
>    = {'enabled': false} to the CREATE KEYSPACE statement."

This PR also includes a test - that checks that this warning is is
indeed generated when a keyspace is created with tablets (either by default
or explicitly), and not generated if the keyspace is created without
tablets. It also fixes existing tests which didn't like the new warning.

Fixes https://github.com/scylladb/scylladb/issues/16807

Closes scylladb/scylladb#17318

* github.com:scylladb/scylladb:
  tablets: add warning on CREATE KEYSPACE
  test/cql-pytest: fix guadrail tests to not be sensitive to more warnings
2024-03-26 20:04:07 +01:00
Avi Kivity
4ddf82e58b treewide: don't #include "gms/feature_service.hh" from other headers
feature_service.hh is a high-level header that integrates much
of the system functionality, so including it in lower-level headers
causes unnecessary rebuilds. Specifically, when retiring features.

Fix by removing feature_service.hh from headers, and supply forward
declarations and includes in .cc where needed.

Closes scylladb/scylladb#18005
2024-03-26 15:31:18 +02:00
Avi Kivity
22b8065a89 Merge 'tools/scylla-nodetool: implement the getsstables and sstableinfo commands' from Botond Dénes
These commands manage to avoid detection because they are not documented on https://opensource.docs.scylladb.com/stable/operating-scylla/nodetool.html.

They were discovered when running dtests, with ccm tuned to use the native nodetool directly. See https://github.com/scylladb/scylla-ccm/pull/565.

The commands come with tests, which pass with both the native and Java nodetools. I also checked that the relevant dtests pass with the native implementation.

Closes scylladb/scylladb#17979

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the sstableinfo command
  tools/scylla-nodetool: implement the getsstables command
  tools/scylla-nodetool: move get_ks_cfs() to the top of the file
  test/nodetool: rest_api_mock.py: add expected_requests context manager
2024-03-26 14:38:00 +02:00
Kefu Chai
101fdfc33a test: randomized_nemesis_test: add fmt::formatter for stop_crash::result_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

also, it's impossible to partial specialize a nested type of a
template class, we cannot specialize the `fmt::formatter` for
`stop_crash<M>::result_type`, as a workaround, a new type is
added.

in this change,

* define a new type named `stop_crash_result`
* add fmt::formatter for `stop_crash_result`
* define stop_crash::result_type as an alias of `stop_crash_result`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18018
2024-03-26 12:18:55 +02:00
Botond Dénes
7edbf189e6 Merge 'treewide: use fmt::to_string() to transform a UUID to std::string and drop UUID::to_sstring()' from Kefu Chai
`UUID::to_sstring()` relies on `FMT_DEPRECATED_OSTREAM` to generated `fmt::formatter` for `UUID`, and this feature is deprecated in {fmt} v9, and dropped in {fmt} v10.

in this series, all callers of `UUID::to_sstring()` are switched to `fmt::to_string()`, and this function is dropped.

Closes scylladb/scylladb#18020

* github.com:scylladb/scylladb:
  utils: UUID: drop UUID::to_sstring()
  treewide: use fmt::to_string() to transform a UUID to std::string
2024-03-26 12:14:56 +02:00
Botond Dénes
f0ff23492f Merge 'Sanitize topology suites' skiplists' from Pavel Emelyanov
There are skip_in_<mode> lists in suite yaml that tells test.py not to run the test from it. This PR sanitizes these lists in two ways.

First, to skip pytests the skip-decorators are much more convenient, e.g. because they show the reason why the test is skipped.

Also, if a test wants to be opt-in-ed for some mode only, it's opt-out-ed in all other lists instead. There's run_in_<mode> list in suite for that.

Closes scylladb/scylladb#17964

* github.com:scylladb/scylladb:
  test: Do not duplicate test name in several skip-lists
  test: Mark tests with skip_mode instead of suite skip-list
2024-03-26 08:24:57 +02:00
Kefu Chai
1b859e484f treewide: use fmt::to_string() to transform a UUID to std::string
without `FMT_DEPRECATED_OSTREAM` macro, `UUID::to_sstring()` is
implemented using its `fmt::formatter`, which is not available
at the end of this header file where `UUID` is defined. at this moment,
we still use `FMT_DEPRECATED_OSTREAM` and {fmt} v9, so we can
still use `UUID::to_sstring()`, but in {fmt} v10, we cannot.

so, in this change, we change all callers of `UUID::to_sstring()`
to `fmt::to_string()`, so that we don't depend on
`FMT_DEPRECATED_OSTREAM` and {fmt} v9 anymore.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-26 13:38:37 +08:00
Botond Dénes
1ea7b408db tools/scylla-nodetool: implement the sstableinfo command 2024-03-25 11:29:30 -04:00
Botond Dénes
50da93b9c8 tools/scylla-nodetool: implement the getsstables command 2024-03-25 11:29:30 -04:00
Botond Dénes
4ff88b848c test/nodetool: rest_api_mock.py: add expected_requests context manager
So tests and fixtures can use `with expected_requests():` and have
cleanup be taken care for them. I just discovered that some tests do not
clean up after themselves and when running all tests in a certain order,
this causes unrelated tests to fail.
Fix by using the context everywhere, getting guaranteed cleanup after
each test.
2024-03-25 11:29:30 -04:00
Petr Gusev
7c84fc527b test_invalid_user_type_statements: increase raft timeout
The test creates ut4 with a lot of fields,
this may take a while in debug builds,
to avoid raft operation timeout set the threshold
to some big value.

The error injector is disabled in release builds,
so this settings won't be applied to them.
This shouldn't be a problem since release builds
are fast enough, even on arm.

Fixes scylladb/scylladb#17987

Closes scylladb/scylladb#17997
2024-03-25 14:52:16 +01:00
Ferenc Szili
8bb7a18de2 test/cql-pytest: add --omit-scylla-output to Cassandra test runs
Currently, the tests in test/cql-pytest can be run against both ScyllaDB and Cassandra.
Running the test for either will first output the test results, and subsequently
print the stdout output of the process under test. Using the command line
option --omit-scylla-output it is possible to disable this print for Scylla,
but it is not possible for tests run against Cassandra.

This change adds the option to suppress output for Cassandra tests, too. By default,
the stdout of the Cassandra run will still be printed after the test results, but
this can now be disabled with --omit-scylla-output

Closes scylladb/scylladb#17996
2024-03-25 15:14:45 +02:00
Pavel Emelyanov
16343b3edc test: Do not duplicate test name in several skip-lists
Some tests are only run in dev mode for some reason. For such tests
there's run_in_dev list, no need in putting it in all the non-dev
skip_in_... ones.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-25 14:56:37 +03:00
Pavel Emelyanov
90dfcec86b test: Mark tests with skip_mode instead of suite skip-list
There are many tests that are skipped in release mode becuase they rely
on error-injection machinery which doesn't work in release mode. Most of
those tests are listed in suite's skip_in_release, but it's not very
handy, mainly because it's not clear why the test is there. The
skip_mode decoration is much more convenient.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-25 14:56:37 +03:00
Kamil Braun
69bf962522 Merge 'allow changing snitch with topology over raft' from Gleb
Fixes scylladb/scylladb#17513

* 'gleb/raft-snitch-change-v3' of github.com:scylladb/scylla-dev:
  doc: amend snitch changing procedure to work with raft
  test: add test to check that snitch change takes effect.
  raft topology: update rack/dc info in topology state on reboot if changed
2024-03-25 10:41:39 +01:00
Gleb Natapov
d7adf26a56 test: add test to check that snitch change takes effect.
The test creates two node cluster with default snitch (SimpleSnitch) and
checks that dc and rack names are as expected. Then it changes the
config to use GossipingPropertyFileSnitch with different names, restart
nodes and check that now peers table has new names.
2024-03-25 10:41:49 +02:00
Raphael S. Carvalho
6bdb456fad sstables_loader: Fix loader when write selector is previous during tablet migration
The loader is writing to pending replica even when write selector is set
to previous. If migration is reverted, then the writes won't be rolled
back as it assumes pending replicas weren't written to yet. That can
cause data resurrection if tablet is later migrated back into the same
replica.

NOTE: write selector is handled correctly when set to next, because
get_natural_endpoints() will return the next replica set, and none
of the replicas will be considered leaving. And of course, selector
set to both is also handled correctly.

Fixes #17892.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17902
2024-03-24 01:20:50 +01:00
Kamil Braun
230f23004b Revert "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
This reverts commit b4144d14c6.

The test is flaky and blocks next promotions.
2024-03-22 17:25:04 +01:00
Petr Gusev
2a5f5d1948 test_fencing: fix flakiness
To cause the stale topology exception the test reads
the version from the last bootstrapped host and assigns its decremented
value to version and fence_version fields of system.topology.
The test assumes that version == fence_version here, if version
is greater than fence_version we won't get state topology
exception in this setup. Tablet balancer can break
this -- it may increment the version after the last node is
bootstrapped.

Fix this by disabling the tablet balancer earlier.

fixes scylladb/scylladb#17807

Closes scylladb/scylladb#17940
2024-03-22 12:49:13 +01:00
Piotr Dulikowski
f23f8f81bf Merge 'Raft-based service levels' from Michał Jadwiszczak
This patch introduces raft-based service levels.

The difference to the current method of working is:
- service levels are stored in `system.service_levels_v2`
- reads are executed with `LOCAL_ONE`
- writes are done via raft group0 operation

Service levels are migrated to v2 in topology upgrade.
After the service levels are migrated, `key: service_level_v2_status; value: data_migrated` is written to `system.scylla_local` table. If this row is present, raft data accessor is created from the beginning and it handles recovery mode procedure (service levels will be read from v2 table even if consistent topology is disabled then)

Fixes #17926

Closes scylladb/scylladb#16585

* github.com:scylladb/scylladb:
  test: test service levels v2 works in recovery mode
  test: add test for service levels migration
  test: add test for service levels snapshot
  test:topology: extract `trigger_snapshot` to utils
  main: create raft dda if sl data was migrated
  service:qos: store information about sl data migration
  service:qos: service levels migration
  main: assign standard service level DDA before starting group0
  service:qos: fix `is_v2()` method
  service:qos: add a method to upgrade data accessor
  test: add unit_test_raft_service_levels_accessor
  service:storage_service: add support for service levels raft snapshot
  service:qos: add abort_source for group0 operations
  service:qos: raft service level distributed data accessor
  service:qos: use group0_guard in data accessor
  cql3:statements: run service level statements on shard0 with raft guard
  test: fix overrides in unit_test_service_levels_accessor
  service:qos: fix indentation
  service:qos: coroutinize some of the methods
  db:system_keyspace: add `SERVICE_LEVELS_V2` table
  service:qos: extract common service levels' table functions
2024-03-22 11:51:53 +01:00
Kamil Braun
9979adb670 Merge 'topology_coordinator: do not clear unpublished CDC generation's data' from Patryk Jędrzejczak
In this PR, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.

We also add a test that passes only after the fix. However, this test
needs to block execution of the CDC generation publisher's loop
twice. Currently, error injections with handlers do not allow it
because handlers always share received messages. Apart from the
first created handler, all handlers would be instantly unblocked by
a message from the past that has already unblocked the first
handler. This seems like a general limitation that could cause
problems in the future, so in this PR, we extend injections with
handlers to solve it once and for all. We add the `share_messages`
parameter to the `inject` (with handler) function. Depending on its
value, handlers will share messages (as before) or not.

Fixes scylladb/scylladb#17497

Closes scylladb/scylladb#17934

* github.com:scylladb/scylladb:
  topology_coordinator: clean_obsolete_cdc_generations: fix log
  topology_coordinator: do not clear unpublished CDC generation's data
  topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages
  error_injection: allow injection handlers to not share messages
2024-03-22 11:20:26 +01:00
Kamil Braun
4359a1b460 Merge 'raft timeouts: better handling of lost quorum' from Petr Gusev
In this PR we add timeouts support to raft groups registry. We introduce
the `raft_server_with_timeouts` class, which wraps the `raft::server`
add exposes its interface with additional `raft_timeout` parameter. If
it's set, the wrapper cancels the `abort_source` after certain amount of
time. The value of the timeout can be specified either in the
`raft_timeout` parameter, or the default value can be set in `the
raft_server_with_timeouts` class constructor.

The `raft_group_registry` interface is extended with
`group0_with_timeouts()` method. It returns an instance of
`raft_server_with_timeouts` for group0 raft server. The timeout value
for it is configured in `create_server_for_group0`. It's one minute by
default and can be overridden for tests with
`group0-raft-op-timeout-in-ms` parameter.

The new api allows the client to decide whether to use timeouts or not.
In this PR we are reviewing all the group0 call sites and add
`raft_timeout` if that makes sense. The general principle is that if the
code is handling a client request and the client expects a potential
error, we use timeouts. We don't use timeouts for background fibers
(such as topology coordinator), since they wouldn't add much value. The
only thing the background fiber can do with a timeout is to retry, and
this will have the same end effect as not having a timeout at all.

Fixes scylladb/scylladb#16604

Closes scylladb/scylladb#17590

* github.com:scylladb/scylladb:
  migration_manager: use raft_timeout{}
  storage_service::join_node_response_handler: use raft_timeout{}
  storage_service::start_upgrade_to_raft_topology: use raft_timeout{}
  storage_service::set_tablet_balancing_enabled: use raft_timeout{}
  storage_service::move_tablet: use raft_timeout{}
  raft_check_and_repair_cdc_streams: use raft_timeout{}
  raft_timeout: test that node operations fail properly
  raft_rebuild: use raft_timeout{}
  do_cluster_cleanup: use raft_timeout{}
  raft_initialize_discovery_leader: use raft_timeout{}
  update_topology_with_local_metadata: use with_timeout{}
  raft_decommission: use raft_timeout{}
  raft_removenode: use raft_timeout{}
  join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
  raft_group0: make_raft_config_nonvoter: add raft_timeout parameter
  raft_group0: make_raft_config_nonvoter: add abort_source parameter
  manager_client: server_add with start=false shouldn't call driver_connect
  scylla_cluster: add seeds parameter to the add_server and servers_add
  raft_server_with_timeouts: report the lost quorum
  join_node_request_handler: add raft_timeout{} for start_operation
  skip_mode: add platform_key
  auth: use raft_timeout{}
  raft_group0_client: add raft_timeout parameter
  raft_group_registry: add group0_with_timeouts
  utils: add composite_abort_source.hh
  error_injection: move api registration to set_server_init
  error_injection: add inject_parameter method
  error_injection: move injection_name string into injection_shared_data
  error_injection: pass injection parameters at startup
2024-03-22 10:45:33 +01:00
Botond Dénes
f02baef871 Merge 'test/lib: sstable::test_env consolidate and reduce header footprint' from Avi Kivity
Reduce the sprawl of sstables::test_env in .cc and .hh files, to ease
maintenance and reduce recompilations.

Closes scylladb/scylladb#17965

* github.com:scylladb/scylladb:
  test: sstables::test_env: complete pimplification
  test/lib: test_env: move test_env::reusable_sst() to test_services.cc
2024-03-22 11:26:12 +02:00
Patryk Wrobel
28ed20d65e scylla-nodetool: adjust effective ownership handling
When a keyspace uses tablets, then effective ownership
can be obtained per table. If the user passes only a
keyspace, then /storage_service/ownership/{keyspace}
returns an error.

This change:
 - adds an additional positional parameter to 'status'
   command that allows a user to query status for table
   in a keyspace
 - makes usage of /storage_service/ownership/{keyspace}
   optional to avoid errors when user tries to obtain
   effective ownership of a keyspace that uses tablets
 - implements new frontend tests in 'test_status.py'
   that verify the new logic

Refs: scylladb#17405
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17827
2024-03-22 09:51:57 +02:00
Michał Jadwiszczak
c0853b461c test: test service levels v2 works in recovery mode 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
c551a85cda test: add test for service levels migration 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
5811f696be test: add test for service levels snapshot 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
bf3aed1ecb test:topology: extract trigger_snapshot to utils
The function was defined separately in a few tests.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
2917ec5d51 service:qos: service levels migration
Migrate data from `system_distributes.service_levels` to
`system.service_levels_v2` during raft topology upgrade.

Migration process reads data from old table with CL ALL
and inserts the data to the new table via raft.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
159a6a2169 service:qos: fix is_v2() method 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
fd32f5162a service:qos: add a method to upgrade data accessor 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
d403bdfdd5 test: add unit_test_raft_service_levels_accessor
Raft service level data accessor with logic simillar to
`unit_test_service_levels_accessor` to avoid sleeps in boost tests.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
d5fa0747d7 service:qos: add abort_source for group0 operations
Add mechanism to abort ongoing group0 operations while draining
service_level_controller or leaving the cluster.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
71c07addb5 service:qos: use group0_guard in data accessor
Adjust service_level_controller and
service_level_controller::service_level_distributed_data_accessor
interfaces to take `group0_guard` while adding/altering/dropping a
service level.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
674286b868 test: fix overrides in unit_test_service_levels_accessor 2024-03-21 23:14:57 +01:00
Avi Kivity
b530dc1e3b test: sstables::test_env: complete pimplification
sstables::test_env uses the pimpl idiom, but incompletely. This
prevents reaping some of the benefits.

Complete the pimplification:
 - the `impl` nested struct is moved out-of-line
 - all non-template member functions are moved out-of-line
 - a destructor is declared and defined out-of-line
 - the move constructor is also defined (necessary after the destructor is
   defined)

After this, we can forward-declare more components.
2024-03-21 22:29:01 +02:00
Avi Kivity
d745929b44 test/lib: test_env: move test_env::reusable_sst() to test_services.cc
test_env implementation is scattered around two .cc, concentrate it
in test_services.cc, which happens to be the file that doesn't cause
link errors.

Move toc_filename with it, as it is its only caller and it is static.
2024-03-21 22:21:02 +02:00
Andrei Chekun
7de28729e7 test: change maintenance socket location to /tmp
Fixes #16912

By default, ScyllaDB stores the maintenance socket in the workdir. Test.py by default uses the location for the ScyllaDB workdir as testlog/{mode}/scylla-#. The Usual location for cloning the repo is the user's home folder. In some cases, it can lead the socket path being too long and the test will start to fail. The simple way is to move the maintenance socket to /tmp folder to eliminate such a possibility.

Closes scylladb/scylladb#17941
2024-03-21 18:22:21 +02:00
Patryk Jędrzejczak
27465a00e0 topology_coordinator: do not clear unpublished CDC generation's data
In this commit, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.

We also add a test that passes only after the fix.
2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak
f45aebeee2 topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages
In the following commit, we add a test that needs to block the CDC
generation publisher's loop twice. We allow it in this commit by
making handlers of the `cdc_generation_publisher_fiber` injection
share messages. From now on, unblocking every step of the loop will
require sending a new message from the test.

This change breaks the test already using the
`cdc_generation_publisher_fiber` injection, so we adjust the test.
2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak
c5c4cc7d00 error_injection: allow injection handlers to not share messages
For a single injection, all created injection handlers share all
received messages. In particular, it means that one received message
unblocks all handlers waiting for the first message. This behavior
is often desired, for example, if multiple fibers execute the
injected code and we want to unblock them all with a single message.
However, there is a problem if we want to block every execution
of the injected code. Apart from the first created handler, all
handlers will be instantly unblocked by messages from the past that
have already unblocked the first handler.

In one of the following commits, we add a test that needs to block
the CDC generation publisher's loop twice. Since it looks like there
are no good workarounds for this arguably general problem, we extend
injections with handlers in a way that solves it. We introduce the
new `share_messages` parameter. Depending on its value, handlers
will share messages or not. The details are described in the new
comments in `error_injection.hh`.

We also add some basic unit tests for the new funcionality.
2024-03-21 14:35:38 +01:00
Petr Gusev
ae0ec19537 migration_manager: use raft_timeout{}
Checking all the call sites of the migration manager shows
that all of them are initiated by user requests,
not background activities. Therefore, we add a global
raft_timeout{} here.
2024-03-21 16:35:48 +04:00
Petr Gusev
294e1ff464 storage_service::join_node_response_handler: use raft_timeout{}
This function is called as part of a node join procedure
initiated by the user, so having timeouts here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
ca21362ade raft_timeout: test that node operations fail properly 2024-03-21 16:35:48 +04:00
Petr Gusev
099c756ba1 join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
We also add a specific test_quorum_lost_during_node_join. It
exercises the case when the quorum is lost after start_operation
but before these methods are called.
2024-03-21 16:35:48 +04:00
Petr Gusev
99ddffac32 manager_client: server_add with start=false shouldn't call driver_connect
If the server is not started there is not point
in starting the driver, it would fail because there
are no nodes to connect to. On the other hand, we
should connect the driver in server_start()
if it's not connected yet.
2024-03-21 16:35:48 +04:00
Petr Gusev
3f6cf38dd5 scylla_cluster: add seeds parameter to the add_server and servers_add
If this parameter is set, we use its value for
the scylla.yaml of the new node, otherwise we
use IPs of all running nodes as before.

We'll need this parameter in subsequent commits to
restrict the communication between nodes.

We remove default values for _create_server_add_data parameters
since they are redundant - in the two call sites we pass all
of them.
2024-03-21 16:35:48 +04:00
Petr Gusev
99419d5964 raft_server_with_timeouts: report the lost quorum
In this commit we extend the timeout error message with
additional context - if we see that there is no quorum of
available nodes, we report this as the most likely
cause of the error.

We adjust the test by adding this new information to the
expected_error. We need raft-group-registry-fd-threshold-in-ms
to make _direct_fd threshold less than
group0-raft-op-timeout-in-ms.
2024-03-21 16:35:48 +04:00
Petr Gusev
1a3fc58438 join_node_request_handler: add raft_timeout{} for start_operation
In the test, we use the group0-raft-op-timeout-in-ms parameter to
reduce the timeout to one second so as not to waste time.

The join_node_request_handler method contains other group0 calls
which should have timeouts (make_nonvoters and add_entry). They
will be handled in a separate commit.
2024-03-21 16:35:48 +04:00