Commit Graph

37748 Commits

Author SHA1 Message Date
Anna Stuchlik
88e62ec573 doc: improve User Data info in Launch on AWS
Fixes https://github.com/scylladb/scylladb/issues/14565

This commit improves the description of ScyllaDB configuration
via User Data on AWS.
- The info about experimental features and developer mode is removed.
- The description of User Data is fixed.
- The example in User Data is updated.
- The broken link is fixed.

Closes #14569
2023-07-07 16:34:06 +02:00
Kamil Braun
de7f668441 Merge 'raft topology: send cdc generation data in parts' from Mikołaj Grzebieluch
The CDC generation data can be large and not fit in a single command.
This PR splits it into multiple mutations by smartly picking a
`mutation_size_threshold` and sending each mutation as a separate group
0 command.

Commands are sent sequentially to avoid concurrency problems.

Topology snapshots contain only the mutation of the current CDC generation data
but don't contain any previous or future generations. If a new
generation of data is being broadcasted but hasn't been entirely applied
yet, the applied part won't be sent in a snapshot. New or delayed nodes
can never get the applied part in this scenario.

Send the entire cdc_generations_v3 table in the snapshot to resolve this
problem.

A mechanism to remove old CDC generations will be introduced as a
follow-up.

Closes #13962

* github.com:scylladb/scylladb:
  test: raft topology: test `prepare_and_broadcast_cdc_generation_data`
  service: raft topology: print warning in case of `raft::commit_status_unknown` exception in topology coordinator loop
  raft topology: introduce `prepare_and_broadcast_cdc_generation_data`
  raft: add release_guard
  raft: group0_state_machine::merger take state_id as the maximal value from all merged commands
  raft topology: include entire cdc_generations_v3 table in cdc_generation_mutations snapshot
  raft topology: make `mutation_size_threshold` depends on `max_command_size`
  raft: reduce max batch size of raft commands and raft entries
  raft: add description argument to add_entry_unguarded
  raft: introduce `write_mutations` command
  raft: refactor `topology_change` applying
2023-07-07 16:31:29 +02:00
Kamil Braun
f9cfd7e4f5 Merge 'raft: do not ping self in direct failure detector' from Konstantin Osipov
Avoid pinging self in direct failure detector, this adds confusing noise and adds constant overhead.
Fixes #14388

Closes #14558

* github.com:scylladb/scylladb:
  direct_fd: do not ping self
  raft: initialize raft_group_registry with host id early
  raft: code cleanup
2023-07-07 14:26:17 +02:00
Mikołaj Grzebieluch
4e3c97d8d4 test: raft topology: test prepare_and_broadcast_cdc_generation_data
This test limits `commitlog_segment_size_in_mb` to 2, thus `max_command_size`
is limited to less than 1 MB. It adds an injection which copies mutations
generated by `get_cdc_generation_mutations` n times, where n is picked so that
the memory size of all mutations exceeds `max_command_size`.

This test passes if cdc generation data is committed by raft in multiple commands.
If all the data is committed in a single command, the leader node will loop, trying
to send the raft command and getting the error:
```
storage_service - raft topology: topology change coordinator fiber got error raft::command_is_too_big_error (Command size {} is greater than the configured limit {})
```
2023-07-07 13:56:35 +02:00
Mikołaj Grzebieluch
8d6c95f9e3 service: raft topology: print warning in case of raft::commit_status_unknown exception in topology coordinator loop
When the topology_coordinator fiber gets `raft::commit_status_unknown`, it
prints an error. This exception is not an error in this case, and it can be
thrown when the leader has changed. It can happen in `add_entry_unguarded`
while sending a part of the CDC generation data in the `write_mutations` command.

Catch this exception in `topology_coordinator::run` and print a warning.
2023-07-07 13:56:35 +02:00
Mikołaj Grzebieluch
ade15ad74a raft topology: introduce prepare_and_broadcast_cdc_generation_data
Broadcasts all mutations returned from `prepare_new_cdc_generation_data`
except the last one. Each mutation is sent in a separate raft command. It takes
`group0_guard`, and if the number of mutations is greater than one, the guard
is dropped, and a new one is created and returned, otherwise the old one will
be returned. Commands are sent in parallel and unguarded (the guard used for
sending the last mutation will guarantee that the term hasn't been changed).
Returns the generation's UUID, guard and last mutation, which will be sent
with additional topology data by the caller.

If we send the last mutation in the `write_mutations` command, we would use a
total of `n + 1` commands instead of `n-1 + 1` (where `n` is the number of
mutations), so it's better to send it in `topology_change` (we need to send
it after all `write_mutations`) with some small metadata.

With the default commitlog segment size, `mutation_size_threshold` will be 4 MB.
In large clusters, e.g. 100 nodes, 64 shards per node, 256 vnodes, cdc generation
data can reach the size of 30 MB, thus there will be no more than 8 commands.

In a multi-DC cluster with 100ms latencies between DCs, this operation should
take about 200ms since we send the commands concurrently, but even if the commands
were replicated sequentially by Raft, it should take no more than 1.6s, which is
incomparably smaller than the bootstrapping operation (bootstrapping is quick if there
is no data in the cluster, but usually if one has 100 nodes they have tons of data,
so indeed streaming/repair will take much longer (hours/days)).
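
The estimate above can be checked with a few lines of arithmetic. This is only
an illustrative sketch in Python: the helper name `count_commands` is ours, and
the 200 ms per replicated command is an assumption derived from the figures in
this message (roughly two 100 ms cross-DC hops), not something taken from the code.

```python
import math

def count_commands(total_size_mb: float, mutation_size_threshold_mb: float) -> int:
    """Number of mutations (and hence raft commands) needed to carry the data."""
    return math.ceil(total_size_mb / mutation_size_threshold_mb)

commands = count_commands(30, 4)   # 30 MB of CDC generation data, 4 MB threshold
per_command_ms = 200               # assumed cost of replicating one command across DCs
sequential_ms = commands * per_command_ms

print(commands)        # 8
print(sequential_ms)   # 1600, i.e. the 1.6 s upper bound quoted above
```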

Fixes FIXME in pr #13683.
2023-07-07 13:56:35 +02:00
Mikołaj Grzebieluch
04c38c6185 raft: add release_guard
This function takes a guard and calls its destructor. It's used to avoid calling the raw destructor.
2023-07-07 13:49:25 +02:00
Mikołaj Grzebieluch
d2a4079bbe raft: group0_state_machine::merger take state_id as the maximal value from all merged commands
If `group0_state_machine` applies all commands individually (without batching),
the resulting current `state_id` -- which will be compared with the
`prev_state_id` of the next command if it is a guarded command -- equals the
maximum of the `next_state_id` of all commands applied up to this point.
That's because the current `state_id` is obtained from the history table by
taking the row with the largest clustering key.

When `group0_state_machine::apply` is called with a batch of commands, the
current `state_id` is loaded from `system.group0_history` to `merger::last_group0_state_id`
only once. When a command is merged, its `next_state_id` overwrites
`last_group0_state_id`, regardless of their order.

Let's consider the following situation:
The leader sends two unguarded `write_mutations` commands concurrently, with
timeuuids T1 and T2, where T1 < T2. Leader waits to apply them and sends guarded
`topology_change` with `prev_state_id` equal to T2.
Suppose that the command with timeuuid T2 is committed first, and these commands
are small enough that all of `write_mutations` could be merged into one command.
Some followers can get all three of these commands before their `fsm` polls them.
In this situation, `group0_state_machine::apply` is called with all three of
them and `merger` will merge both `write_mutations` into one command. After that,
`merger::last_group0_state_id` will be equal to T1 (this command was committed
as the second one). When it processes the `topology_change` command, it will
compare its `prev_state_id` and `merger::last_group0_state_id`, resulting in
making this command a no-op (which wouldn't happen if the commands were applied
individually).
Such a scenario results in inconsistent results: one replica applies `topology_change`,
but another makes it a no-op.
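
The scenario can be replayed with a small model. This is a hedged sketch, not
the actual `merger` code: plain integers stand in for timeuuids, and the two
function names are made up to contrast the overwrite behavior described above
with taking the maximum.

```python
def merge_overwrite(last_state_id, next_state_ids):
    """Old behavior: every merged command overwrites the last state id."""
    for next_state_id in next_state_ids:
        last_state_id = next_state_id
    return last_state_id

def merge_max(last_state_id, next_state_ids):
    """New behavior: keep the maximal state id among all merged commands."""
    for next_state_id in next_state_ids:
        last_state_id = max(last_state_id, next_state_id)
    return last_state_id

T0, T1, T2 = 0, 1, 2           # integers stand in for timeuuids, T1 < T2
committed_order = [T2, T1]     # the command with T2 is committed first
prev_state_id = T2             # carried by the guarded topology_change

print(merge_overwrite(T0, committed_order) == prev_state_id)  # False: topology_change becomes a no-op
print(merge_max(T0, committed_order) == prev_state_id)        # True: same as applying individually
```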
2023-07-07 13:49:25 +02:00
Mikołaj Grzebieluch
b2d22d665e raft topology: include entire cdc_generations_v3 table in cdc_generation_mutations snapshot
Topology snapshots contain only the mutation of the current CDC generation data but don't
contain any previous or future generations. If a new generation of data is being
broadcasted but hasn't been entirely applied yet, the applied part won't be sent
in a snapshot. In this scenario, new or delayed nodes can never get the applied part.

Send the entire cdc_generations_v3 table in the snapshot to resolve this problem.

As a follow-up, a mechanism to remove old CDC generations will be introduced.
2023-07-07 13:11:52 +02:00
Mikołaj Grzebieluch
dc6017b71b raft topology: make mutation_size_threshold depends on max_command_size
`get_cdc_generation_mutations` splits data to mutations of maximal size
`mutation_size_threshold`. Before this commit it was hardcoded to 2 MB.

Calculate `mutation_size_threshold` to leave space for cdc generation
data and not exceed `max_command_size`.
2023-07-07 13:11:52 +02:00
Mikołaj Grzebieluch
6dad582796 raft: reduce max batch size of raft commands and raft entries
For now, `raft_sys_table_storage::_max_mutation_size` equals `max_mutation_size`
(half of the commitlog segment size), so with some additional information, it
can exceed this threshold resulting in throwing an exception when writing
mutation to the commitlog.

A batch of raft commands has the size at most `group0_state_machine::merger::max_command_size`
(half of the commitlog segment size). It doesn't have additional metadata, but
it may have a size of exactly `max_mutation_size`. It shouldn't cause any trouble,
but it is preferred to be careful.

Make `raft_sys_table_storage::_max_mutation_size` and
`group0_state_machine::merger::max_command_size` more strict to leave space
for metadata.

Fixed typo "1204" => "1024".
2023-07-07 13:11:52 +02:00
Mikołaj Grzebieluch
760d415781 raft: add description argument to add_entry_unguarded
Provide useful description for `write_mutations` and
`broadcast_tables_query` that is stored in `system.group0_history`.

Reduces scope of issue #13370.
2023-07-07 13:11:44 +02:00
Anna Stuchlik
799ae97b52 doc: add the Rust CDC Connector to the docs
Fixes https://github.com/scylladb/scylladb/issues/13877

This commit adds the information about Rust CDC Connector
to the documentation. All relevant pages are updated:
the ScyllaDB Rust Driver page, and other places in
the docs where Java and Go CDC connectors are mentioned.

In addition, the drivers table is updated to indicate
Rust driver support for CDC.

Closes #14530
2023-07-07 11:13:25 +02:00
Nadav Har'El
edfb89ef65 sstables: stop warning when auto-snapshot leaves non-empty directory
When a table is dropped, we delete its sstables, and finally try to delete
the table's top-level directory with the rmdir system call. When the
auto-snapshot feature is enabled (this is still Scylla's default),
the snapshot will remain in that directory so it won't be empty and
cannot be removed. Today, this results in a long, ugly and scary warning
in the log:

```
WARN  2023-07-06 20:48:04,995 [shard 0] sstable - Could not remove table directory "/tmp/scylla-test-198265/data/alternator_alternator_Test_1688665684546/alternator_Test_1688665684546-4238f2201c2511eeb15859c589d9be4d/snapshots": std::filesystem::__cxx11::filesystem_error (error system:39, filesystem error: remove failed: Directory not empty [/tmp/scylla-test-198265/data/alternator_alternator_Test_1688665684546/alternator_Test_1688665684546-4238f2201c2511eeb15859c589d9be4d/snapshots]). Ignored.
```

It is bad to log as a warning something which is completely normal - it
happens every time a table is dropped with the perfectly valid (and even
default) auto-snapshot mode. We should only log a warning if the deletion
failed because of some unexpected reason.

And in fact, this is exactly what the code **tried** to do - it does
not log a warning if the rmdir failed with EEXIST. It even had a comment
saying why it was doing this. But the problem is that in Linux, deleting
a non-empty directory does not return EEXIST, it returns ENOTEMPTY...
Posix actually allows both. So we need to check both, and this is the
only change in this patch.
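
The idea of the check can be sketched in Python (the actual fix is in the C++
sstables code; the helper name `try_remove_dir` is hypothetical and exists only
for this illustration):

```python
import errno
import os
import tempfile

def try_remove_dir(path: str) -> bool:
    """Remove an empty directory; quietly return False if it is not empty.

    POSIX allows rmdir() to fail with either EEXIST or ENOTEMPTY for a
    non-empty directory, so both are treated as the expected case.
    """
    try:
        os.rmdir(path)
        return True
    except OSError as e:
        if e.errno in (errno.EEXIST, errno.ENOTEMPTY):
            return False   # expected, e.g. a snapshot is still in there
        raise              # anything else is unexpected and should surface

with tempfile.TemporaryDirectory() as top:
    table_dir = os.path.join(top, "table")
    os.makedirs(os.path.join(table_dir, "snapshots"))
    print(try_remove_dir(table_dir))   # directory is non-empty: kept, no error raised
```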

To confirm that this patch works, edit test/cql-pytest/run.py and
change auto-snapshot from 0 to 1, run test/alternator/run (for example)
and see many "Directory not empty" warnings as above. With this patch,
none of these warnings appear.

Fixes #13538

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14557
2023-07-07 11:08:10 +02:00
Benny Halevy
cd44ad9338 docs: compaction: correct min_sstable_size default value
DEFAULT_MIN_SSTABLE_SIZE is defined as `50L * 1024L * 1024L`
which is 50 MB, not 50 bytes.

Fixes #14413

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14414
2023-07-07 11:08:10 +02:00
Marcin Maliszkiewicz
c5de25be4c locator: use deferred_close in azure and gcp snitches
Close needs to be called even if function throws in the middle.
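
As a rough analogy only (the patch itself uses `deferred_close` in C++), the
same guarantee in Python looks like the sketch below. The `Connection` class
and `query_metadata` function are hypothetical stand-ins, not names from the
snitch code.

```python
from contextlib import closing

class Connection:
    """Hypothetical resource standing in for whatever the snitch has to close."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

def query_metadata(conn: Connection):
    raise RuntimeError("request failed midway")

conn = Connection()
try:
    with closing(conn):          # close() runs when the block exits, even on error
        query_metadata(conn)
except RuntimeError:
    pass

print(conn.closed)   # True
```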

Closes #14458
2023-07-07 11:08:10 +02:00
Avi Kivity
1f9a999c26 cql3: statement_restrictions: clean up dead code
We have plenty of code marked with #if 0. Once it was an indication
of missing functionality, but the code has evolved so much it's
useless as an indication and only a distraction.

Delete it.

Closes #14511
2023-07-07 11:08:10 +02:00
Gleb Natapov
4f23eec44f Rename experimental raft feature to consistent-topology-changes
Make the name more descriptive

Fixes #14145

Message-Id: <ZKQ2wR3qiVqJpZOW@scylladb.com>
2023-07-07 11:08:10 +02:00
Kamil Braun
3c139265b3 Merge 'doc: remove the dead link to unirestore' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/14459

This PR removes the (dead) link to the unirestore tool in a private repository. In addition, it adds minor language improvements.

Closes #14519

* github.com:scylladb/scylladb:
  doc: minor language improvements on the Migration Tools page
  doc: remove the link to the private repository
2023-07-07 11:08:10 +02:00
Nadav Har'El
d6aba8232b alternator: configurable override for DescribeEndpoints
The AWS C++ SDK has a bug (https://github.com/aws/aws-sdk-cpp/issues/2554)
where even if a user specifies a specific endpoint URL, the SDK uses
DescribeEndpoints to try to "refresh" the endpoint. The problem is that
DescribeEndpoints can't return a scheme (http or https) and the SDK
arbitrarily picks https - making it unable to communicate with Alternator
over http. As an example, the new "dynamodb shell" (written in C++)
cannot communicate with Alternator running over http.

This patch adds a configuration option, "alternator_describe_endpoints",
which can be used to override what DescribeEndpoints does:

1. Empty string (the default) leaves the current behavior -
   DescribeEndpoints echoes the request's "Host" header.

2. The string "disabled" disables the DescribeEndpoints (it will return
   an UnknownOperationException). This is how DynamoDB Local behaves,
   and the AWS C++ SDK and the Dynamodb Shell work well in this mode.

3. Any other string is a fixed string to be returned by DescribeEndpoints.
   It can be useful in setups that should return a known address.
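
The three modes can be summarized with a short Python sketch. It is an
illustration of the rules listed above, not the Alternator implementation: the
function name, its parameters and the exception class are hypothetical.

```python
class UnknownOperationException(Exception):
    """Stands in for the error returned when the operation is disabled."""

def describe_endpoints(option_value: str, host_header: str) -> str:
    """Pick the address DescribeEndpoints reports, following the three modes."""
    if option_value == "":
        return host_header        # default: echo the request's "Host" header
    if option_value == "disabled":
        raise UnknownOperationException("DescribeEndpoints is disabled")
    return option_value           # any other string is returned as-is

print(describe_endpoints("", "localhost:8000"))
print(describe_endpoints("alternator.example.com:8000", "localhost:8000"))
```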

Note that this patch does not, by default, change the current behavior
of DescribeEndpoints. But it allows us in the future to override its behavior
if a user experiences problems in the field - without code changes.

Fixes #14410.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14432
2023-07-07 11:08:10 +02:00
Konstantin Osipov
ff41ea86b6 direct_fd: do not ping self
No need to ping self in direct failure detector. This is confusing
during debugging and adds extra overhead.

Fixes #14388
2023-07-06 21:05:39 +03:00
Konstantin Osipov
50140980ac raft: initialize raft_group_registry with host id early
Earlier, when local query processor wasn't available at
the beginning of system start, we couldn't query our own
host id when initializing the raft group registry. The local
host id is needed by the registry since it is responsible
to route RPC messages to specific raft groups, and needs
to reject messages destined to a different host.

Now that the host id is known early at boot, remove the optional
and pass host id in the constructor. Resolves an earlier fixme.
2023-07-06 20:54:05 +03:00
Konstantin Osipov
d79d05aa46 raft: code cleanup
Rename raft_rpc::_server_id to raft_rpc::_my_id as is already the
name used in raft_group0:
- for consistency
- to reflect which server id it is.
2023-07-06 19:46:24 +03:00
Kamil Braun
0d437a7d63 Merge 'utils: error injection: add inject_with_handler for interactions with injected code' from Mikołaj Grzebieluch
Currently, it is hard for injected code to wait for some events, for example, requests on some REST endpoint.

This PR adds the `inject_with_handler` method that executes injected function and passes `injection_handler` as its argument.
The `injection_handler` class is used to wait for events inside the injected code.
The `error_injection` class can notify the injection's handler or handlers associated with the injection on all shards about the received message.

Closes #14357.

Closes #14460

* github.com:scylladb/scylladb:
  tests: introduce InjectionHandler class for communicating with injected code
  api/error_injection: add message_injection endpoint
  tests: utils: error injections: add test for inject_with_handler
  utils: error injection: add inject_with_handler for interactions with injected code
  utils: error injection: create structure for error injections data
2023-07-06 18:16:51 +02:00
Mikołaj Grzebieluch
907c0e8900 tests: introduce InjectionHandler class for communicating with injected code
Add a client for sending empty messages to the injected code from tests.
2023-07-06 12:34:53 +02:00
Mikołaj Grzebieluch
8b1f5ba293 api/error_injection: add message_injection endpoint
Add an endpoint for sending empty messages to the injected code.
2023-07-06 12:34:53 +02:00
Mikołaj Grzebieluch
7e5c42af0a tests: utils: error injections: add test for inject_with_handler
Add a test checking the correctness of the `inject_with_handler` method
in presence of concurrency.
2023-07-06 12:34:53 +02:00
Mikołaj Grzebieluch
086b3369f4 utils: error injection: add inject_with_handler for interactions with injected code
Currently, it is hard for injected code to wait for some events, for example,
requests on some REST endpoint.

This commit adds the `inject_with_handler` method that executes injected function
and passes `injection_handler` as its argument.
The `injection_handler` class is used to wait for events inside the injected code.
The `error_injection` class can notify the injection's handler or handlers
associated with the injection on all shards about the received message.
There is a counter of received messages in `received_messages_counter`; it is shared
between the injection_data, which is created once when enabling an injection on
a given shard, and all `injection_handler`s, that are created separately for each
firing of this injection. The `counter` is incremented when receiving a message from
the REST endpoint and the condition variable is signaled.
Each `injection_handler` (separate for each firing) stores its own private counter,
`_read_messages_counter` that private counter is incremented whenever we wait for a
message, and compared to the received counter. We sleep on the condition variable
if not enough messages were received.
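
The counter scheme can be modeled in a few lines of Python. This is a sketch
under assumptions, not the C++ code: `threading.Condition` replaces the
condition variable, the class and method names are ours, and rolling the
private counter back on timeout is our choice for the sketch.

```python
import threading

class InjectionData:
    """Shared per-injection state: received-message counter plus a condition variable."""
    def __init__(self):
        self.received = 0
        self.cond = threading.Condition()

    def receive_message(self):
        with self.cond:
            self.received += 1
            self.cond.notify_all()

class InjectionHandler:
    """Created per firing; keeps a private count of the messages it has waited for."""
    def __init__(self, data: InjectionData):
        self._data = data
        self._read = 0

    def wait_for_message(self, timeout: float) -> bool:
        with self._data.cond:
            self._read += 1
            # Sleep until enough messages were received or the timeout expires.
            ok = self._data.cond.wait_for(
                lambda: self._data.received >= self._read, timeout)
            if not ok:
                self._read -= 1   # sketch-only choice: undo the count on timeout
            return ok

data = InjectionData()
handler = InjectionHandler(data)
threading.Timer(0.05, data.receive_message).start()   # plays the REST endpoint
print(handler.wait_for_message(timeout=5.0))
```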
2023-07-06 12:32:07 +02:00
Tomasz Grabiec
c25201c1a3 Merge 'view: fix range tombstone handling on flushes in view_updating_consumer' from Michał Chojnowski
View update routines accept `mutation` objects.
But what comes out of staging sstable readers is a stream of mutation_fragment_v2 objects.
To build view updates after a repair/streaming, we have to convert the fragment stream into `mutation`s. This is done by piping the stream to mutation_rebuilder_v2.

To keep memory usage limited, the stream for a single partition might have to be split into multiple partial `mutation` objects. view_updating_consumer does that, but in an improper way -- when the split/flush happens inside an active range tombstone, the range tombstone isn't closed properly. This is illegal, and triggers an internal error.

This patch fixes the problem by closing the active range tombstone (and reopening in the same position in the next `mutation` object).

The tombstone is closed just after the last seen clustered position. This is not necessary for correctness -- for example we could delay all processing of the range tombstone until we see its end bound -- but it seems like the most natural semantic.
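
A toy model of the fix, in Python, may help. Everything here is an assumption
made for illustration: fragments are `(kind, position, value)` tuples, an
`"rt"` fragment with value `None` ends the active range tombstone, the buffer
limit counts fragments rather than bytes, and the closing fragment reuses the
last seen position instead of the position just after it.

```python
def split_partition(fragments, limit):
    """Split one partition's fragment stream into chunks of at most `limit`
    input fragments, keeping the range tombstones of every chunk balanced."""
    chunks, current = [], []
    active = None      # tombstone currently open, or None
    last_pos = None    # last clustering position seen
    consumed = 0       # input fragments placed in the current chunk
    for kind, pos, value in fragments:
        if consumed == limit:
            if active is not None:
                current.append(("rt", last_pos, None))     # close before flushing
            chunks.append(current)
            current, consumed = [], 0
            if active is not None:
                current.append(("rt", last_pos, active))   # reopen in the next chunk
        current.append((kind, pos, value))
        consumed += 1
        last_pos = pos
        if kind == "rt":
            active = value   # None means the tombstone ends here
    if current:
        chunks.append(current)
    return chunks

stream = [("rt", 1, "t1"), ("row", 2, "a"), ("row", 3, "b"), ("rt", 4, None)]
for chunk in split_partition(stream, limit=2):
    print(chunk)
```

With `limit=2` the flush falls inside the tombstone opened at position 1, so
the first chunk gets an extra closing fragment and the second chunk starts by
reopening the same tombstone.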

Fixes https://github.com/scylladb/scylladb/issues/14503

Closes #14502

* github.com:scylladb/scylladb:
  test: view_build_test: add range tombstones to test_view_update_generator_buffering
  test: view_build_test: add test_view_udate_generator_buffering_with_random_mutations
  view_updating_consumer: make buffer limit a variable
  view: fix range tombstone handling on flushes in view_updating_consumer
2023-07-05 21:21:43 +02:00
Michał Chojnowski
f6203f2bd4 test: view_build_test: add range tombstones to test_view_update_generator_buffering
This patch adds a full-range tombstone to the compacted mutation.
This raises the coverage of the test. In particular, it reproduces
issue #14503, which should have been caught by this test, but wasn't.
2023-07-05 17:33:49 +02:00
Michał Chojnowski
aab10402ce test: view_build_test: add test_view_udate_generator_buffering_with_random_mutations
A random mutation test for view_updating_consumer's buffering logic.
Reproduces #14503.
2023-07-05 17:33:49 +02:00
Michał Chojnowski
ac29b6f198 view_updating_consumer: make buffer limit a variable
The limit doesn't change at runtime, but this patch makes it a variable for
unit testing purposes.
2023-07-05 17:33:47 +02:00
Kefu Chai
fa8eaab62b build: remove duplicated test
this change has no impact on `build.ninja` generated by `configure.py`.
as we are using a `set` for tracking the tests to be built. but it's
still an improvement, as we should not add duplicated entries in a set
when initializing it.

there are two occurrences of `test/boost/double_decker_test`, the one
which is in the club of the local cluster of collections tests - bptree,
btree, radix_tree and double_decker - is preserved.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14478
2023-07-05 15:43:04 +03:00
Kefu Chai
e4697e2bd2 sstable: remove stale comment
this comment should have been removed in
f014ccf369. but better late than never.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14497
2023-07-05 15:42:11 +03:00
Pavel Emelyanov
e91f95a629 Merge 's3/test: restructure object_store test into a pytest based test suite' from Kefu Chai
in this series, test/object_storage is restructured into a pytest based test. this paves the road to a test suite that covers more use cases, so we can add some more lower-level tests for tiered/caching-store.

Closes #14165

* github.com:scylladb/scylladb:
  s3/test: do not return ip in managed_cluster()
  s3/test: verify the behavior with asserts
  s3/test: restructure object_store/run into a pytest
  s3/test: extract get_scylla_with_s3_cmd() out
  s3/test: s/restart_with_dir/kill_with_dir/
  s3/test: vendor run_with_dir() and friends
  s3/test: remove get_tempdir()
  s3/test: extract managed_cluster() out
2023-07-05 15:40:43 +03:00
Gleb Natapov
c42a91ec72 cql3: Extend the scope of group0_guard during DDL statement execution
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.
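
The resulting control flow can be sketched as follows. This is a Python
illustration of the ordering described above, not the query processor code:
the exception class mirrors `service::group0_concurrent_modification`, while
`run_statement`, `max_attempts` and the `FlakyDDL` class are hypothetical.

```python
class Group0ConcurrentModification(Exception):
    """Stands in for service::group0_concurrent_modification."""

def run_statement(statement, max_attempts=10):
    """Take the guard first, run check_access/validate/execute under it, and
    retry the whole sequence if a concurrent modification is detected."""
    for _ in range(max_attempts):
        guard = statement.take_guard()
        try:
            statement.check_access()
            statement.validate()
            return statement.execute(guard)
        except Group0ConcurrentModification:
            continue   # state changed underneath us: take a fresh guard and retry
    raise RuntimeError("too many concurrent modifications")

class FlakyDDL:
    """Hypothetical statement whose first attempt hits a concurrent modification."""
    def __init__(self):
        self.attempts = 0

    def take_guard(self):
        self.attempts += 1
        return f"guard-{self.attempts}"

    def check_access(self):
        pass

    def validate(self):
        pass

    def execute(self, guard):
        if self.attempts == 1:
            raise Group0ConcurrentModification()
        return guard

print(run_statement(FlakyDDL()))   # guard-2
```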

Fixes: #13942
Message-Id: <ZJ2aeNIBQCtnTaE2@scylladb.com>
2023-07-05 14:38:34 +02:00
Mikołaj Grzebieluch
01bc6f5294 utils: error injection: create structure for error injections data
This enables holding additional data associated with the injection.
2023-07-05 13:52:46 +02:00
Anna Stuchlik
088a31cdb0 doc: minor language improvements on the Migration Tools page
2023-07-05 11:39:52 +02:00
Pavel Emelyanov
dfff5f2f2e Merge 'test/pylib: retry if minio_server is not ready and define a name for alias' from Kefu Chai
there is a chance that minio_server is not ready to serve after
launching the server executable process. so we need to retry until
the first "mc" command is able to talk to it.

in this change, method `mc()` is added to run the minio client,
so we can retry the command before it times out. and it allows us to
ignore the failure or specify the timeout. this should ready the
minio server before tests start to connect to it.
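
the retry idea can be sketched as below. this is not the `mc()` method from
test/pylib: the helper name `retry_until_ready`, its parameters and the fake
command are hypothetical and only illustrate retrying with a timeout and an
option to ignore the failure.

```python
import time

def retry_until_ready(command, timeout: float, interval: float = 0.1,
                      ignore_failure: bool = False) -> bool:
    """Call `command()` until it succeeds or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            command()
            return True
        except Exception:
            if time.monotonic() >= deadline:
                if ignore_failure:
                    return False
                raise
            time.sleep(interval)

attempts = []

def fake_mc():
    """Pretends the server needs two failed attempts before it is ready."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("minio_server is not ready yet")

print(retry_until_ready(fake_mc, timeout=5.0, interval=0.01), len(attempts))   # True 3
```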

also, in this change, instead of hardwiring the alias of "local" in the code,
define a variable for it. less repeating this way.

Fixes https://github.com/scylladb/scylladb/issues/1719

Closes #14517

* github.com:scylladb/scylladb:
  test/pylib: do not hardwire alias to "local"
  test/pylib: retry if minio_server is not ready
2023-07-05 12:32:58 +03:00
Anna Stuchlik
3213feee5f doc: remove the link to the private repository
This commit removes the dead link to the unirestore tool in
the private repository.
2023-07-05 11:28:37 +02:00
Kefu Chai
9080f8842b s3/test: do not return ip in managed_cluster()
let's just use cluster.contact_points for retrieving the IP address
of the scylla node in this single-node cluster. so the name of
managed_cluster() is less weird.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 17:07:39 +08:00
Kefu Chai
ec6410653f s3/test: verify the behavior with asserts
instead of assigning to "success", let's use assert for this purpose.
simpler this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 17:07:21 +08:00
Kefu Chai
471d75c6c6 s3/test: restructure object_store/run into a pytest
instead of using a single run to perform the test, restructure
it into a pytest based test suite with a single test case.
this should allow us to add more tests exercising the object-storage
and cached/tiered storage in future.

* add fixtures so they can be reused by tests
* use tmpdir fixture for managing the tmpdir, see
  https://docs.pytest.org/en/6.2.x/tmpdir.html#the-tmpdir-fixture
* perform part of the teardown in the "test_tempdir()" fixture
* change the type of test from "Run" to "Python"
* rename "run" to "test_basic.py"
* optionally start the minio server if the settings are not
  found in command line or env variables, so that the tests are
  self-contained without the fixture setup by test.py.
* instead of sys.exit(), use assert statement, as this is
  what pytest uses.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 17:05:13 +08:00
Kefu Chai
bffaf84395 s3/test: extract get_scylla_with_s3_cmd() out
* define a dedicated S3_server class which duck types MinioServer.
  it will be used to represent S3 server in place of MinioServer if
  S3 is used for testing
* prepare object_storage.yaml in get_scylla_with_s3(), so it is more
  clear that we are using the same set of settings for launching
  scylla

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 16:49:04 +08:00
Kefu Chai
f74218f434 s3/test: s/restart_with_dir/kill_with_dir/
replace the restart_with_dir() with kill_with_dir(), so
that we can simplify the usage of managed_cluster() by enabling it
to start and stop the single-node cluster. with this change, the caller
does not need to run the scylla and pass its pid to this function
any more.

since the restart_with_dir() call is superseded by managed_cluster(),
which tears down the cluster, teardown() is now only responsible for
printing out the log file.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 16:48:25 +08:00
Kefu Chai
a6bb5864ff s3/test: vendor run_with_dir() and friends
so we don't need to mess with cql-pytest/run.py, which is
used by cql-pytest.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 16:48:04 +08:00
Kefu Chai
b45049c968 s3/test: remove get_tempdir()
to match with another call of managed_cluster(), so it's clear that
we are just reusing test_tempdir.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 16:45:14 +08:00
Kefu Chai
a5a87d81c6 s3/test: extract managed_cluster() out
for setting up the cluster and tearing it down.
this helps to indent the code so that the lifecycle of the cluster
is visually explicit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 16:45:14 +08:00
Kefu Chai
1faf50fc05 test/pylib: do not hardwire alias to "local"
define a variable for it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 15:58:41 +08:00
Kefu Chai
d55cfdc152 test/pylib: retry if minio_server is not ready
there is a chance that minio_server is not ready to serve after
launching the server executable process. so we need to retry until
the first "mc" command is able to talk to it.

in this change, method `mc()` is added to run the minio client,
so we can retry the command before it times out. and it allows us to
ignore the failure or specify the timeout. this should ready the
minio server before tests start to connect to it.

Fixes #1719
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 15:57:59 +08:00