105 Commits

Author SHA1 Message Date
Alejo Sanchez
700054abee test.py: use internal id to manage servers
Instead of using assigned IP addresses, use an internal server id.

Define types to distinguish local server id, host ID (UUID), and IP
address.

This is needed to test servers changing IP address and for node replace
(host UUID).
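
A minimal sketch of how the three kinds of identifier could be kept apart with `typing.NewType`. The names `ServerNum`, `HostID`, and `IPAddress` are assumptions made for illustration; the commit message does not state the actual type names.

```python
from typing import Dict, NewType

# Hypothetical type names: the commit message only says that three kinds
# of identifier are distinguished, not what the types are called.
ServerNum = NewType("ServerNum", int)   # local id assigned by the test framework
HostID = NewType("HostID", str)         # Scylla host id (UUID string)
IPAddress = NewType("IPAddress", str)   # address the server currently uses

# Servers are keyed by the internal id, so a server can change its IP
# address without changing its key.
servers: Dict[ServerNum, IPAddress] = {}
servers[ServerNum(1)] = IPAddress("127.0.0.1")
servers[ServerNum(1)] = IPAddress("127.0.0.2")  # same server, new address
```

With distinct types, a type checker can flag code that passes an IP address where a server id is expected.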

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-11-10 09:14:37 +01:00
Alejo Sanchez
1e38f5478c test.py: rename hostname to ip_addr
The code manages the IP as a string; make that explicit in the
variable name.

Define its type and check whether it is set on the instance instead of
using an empty string as a placeholder.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-11-10 09:14:37 +01:00
Alejo Sanchez
f478eb52a3 test.py: get host id
When initializing a ScyllaServer, try to get the host id instead of only
checking the REST API is up.

Use the existing aiohttp session from ScyllaCluster.

In case of an HTTP error, check that the status was not an internal error (500+).

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-11-10 09:14:37 +01:00
Alejo Sanchez
78663dda72 test.py: use REST api client in ScyllaCluster
Move the REST api client to ScyllaCluster. This will allow the cluster
to query its own servers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-11-10 09:14:37 +01:00
Alejo Sanchez
75ea345611 test.py: remove unnecessary reference to web app
The aiohttp.web.Application only needs to be passed, so don't store a
reference in ScyllaCluster object.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-11-10 09:14:37 +01:00
Alejo Sanchez
a5316b0c6b test.py: requests without aiohttp ClientSession
Simplify REST helper by doing requests without a session.

Reusing an aiohttp.ClientSession causes knock-on effects on
`rest_api/test_task_manager` due to handling exceptions outside of an
async with block.

Requests for cluster management and Scylla REST API don't need session,
anyway.

Raise HTTPError with status code, text reason, params, and json.

In ScyllaCluster.install_and_start() instead of adding one more custom
exception, just catch all exceptions as they will be re-raised later.

While there avoid code duplication and improve sanity, type checking,
and lint score.
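
An exception carrying the fields listed above (status code, text reason, params, and json) might look roughly like the following sketch. The constructor signature and the message format are assumptions, not the actual code from this commit.

```python
from typing import Any, Dict, Optional


class HTTPError(Exception):
    """Error raised for a failed HTTP request (illustrative sketch)."""

    def __init__(self, uri: str, code: int, reason: str,
                 params: Optional[Dict[str, str]] = None,
                 json: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(reason)
        # Keep everything needed to diagnose the failure on the exception.
        self.uri = uri
        self.code = code
        self.reason = reason
        self.params = params
        self.json = json

    def __str__(self) -> str:
        return (f"HTTP error {self.code}, uri: {self.uri}, "
                f"params: {self.params}, json: {self.json}, "
                f"reason: {self.reason}")
```

A caller can then branch on `code`, for example to treat 500+ statuses as internal errors.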

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-11-10 09:14:37 +01:00
Petr Gusev
44f48bea0f raft: test_remove_node_with_concurrent_ddl
The test runs remove_node command with background ddl workload.
It was written in an attempt to reproduce scylladb#11228 but seems to have
value on its own.

The if_exists parameter has been added to the add_table
and drop_table functions, since the driver could retry
the request sent to a removed node, but that request
might have already been completed.

Function wait_for_host_known waits until the information
about the node reaches the destination node. Since we add
new nodes at each iteration in main, this can take some time.

A number of abort-related options were added to
SCYLLA_CMDLINE_OPTIONS, as this simplifies
nailing down problems.

Closes #11734
2022-11-04 17:16:35 +01:00
Kamil Braun
4974a31510 test/topology_raft_disabled: more Raft upgrade tests
The tests are checking the upgrade procedure and recovery from failure
in scenarios like when a node fails causing the procedure to get stuck
or when we lose a majority in a fully upgraded cluster.

Added some new functionalities to `ScyllaRESTAPIClient` like injecting
errors and obtaining gossip generation numbers.
2022-10-10 14:32:10 +02:00
Kamil Braun
fa8dcb0d54 test/pylib: scylla_cluster: pass a list of ignored nodes to removenode
The `removenode` operation normally requires the removing node to
contact every node in the cluster except the one that is being removed.
But if more than 1 node is down it's possible to specify a list of nodes
to ignore for the operation; the `/storage_service/remove_node` endpoint
accepts an `ignore_nodes` param which is a comma-separated list of IPs.

Extend `ScyllaRESTAPIClient`, `ScyllaClusterManager` and `ManagerClient`
so it's possible to pass the list of ignored nodes.

We also modify the `/cluster/remove-node` Manager endpoint to use
`put_json` instead of `get_text` and pass all parameters except the
initiator IP (the IP of the node that coordinates the `removenode`
operation) through JSON. This simplifies the URL greatly (it was already
messy with 3 parameters) and more closely resembles Scylla's endpoint.
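
As a rough sketch, the request parameters could be assembled as below. Only the `ignore_nodes` parameter and its comma-separated format come from the description above; the helper name `remove_node_params` and the `host_id` parameter name are assumptions for illustration.

```python
from typing import Dict, List, Optional


def remove_node_params(host_id: str,
                       ignore_nodes: Optional[List[str]] = None) -> Dict[str, str]:
    """Build query parameters for a remove-node request (hypothetical helper)."""
    # "host_id" is an assumed parameter name for the node being removed.
    params = {"host_id": host_id}
    if ignore_nodes:
        # The endpoint expects a single comma-separated list of IPs.
        params["ignore_nodes"] = ",".join(ignore_nodes)
    return params
```

Omitting the key when the list is empty keeps the single-node-down case identical to a request made before this change.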
2022-10-10 12:59:12 +02:00
Kamil Braun
130ab1d312 test/pylib: rest_client: propagate errors from put_json
2022-10-10 12:59:12 +02:00
Kamil Braun
63892326d5 test/pylib: fix some type hints
2022-10-10 12:59:12 +02:00
Kamil Braun
6e3fe13fcf test/pylib: scylla_cluster: don't create and drop keyspaces to check if cql is up
Do a simple `SELECT` instead.

This speeds up tests - creating and dropping keyspaces is relatively
expensive, and we did this on every server restart.
2022-10-10 12:59:12 +02:00
Alejo Sanchez
abf1425ad4 test.py: Scylla REST methods for topology tests
Provide a helper client for Scylla REST requests. Use it on both
ScyllaClusterManager (e.g. remove node, test.py process) and
ManagerClient (e.g. get uuid, pytest process).

For now keep using IPs as key in ScyllaCluster, but this will be changed
to UUID -> IP in the future. So, for now, pass both independently. Note
the UUID must be obtained from the server before stopping it.

Refresh client driver connection when decommissioning or removing
a node.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-10-03 19:01:03 +02:00
Alejo Sanchez
86c752c2a0 test.py: rename server_id to server_ip
In ScyllaCluster currently servers are tracked by the host IP. This is
not the host id (UUID). Fix the variable name accordingly.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-10-03 19:01:03 +02:00
Alejo Sanchez
a7a0b446f0 test.py: HTTP client helper
Split aiohttp client to a shared helper file.

While there, move aiohttp session setup back to constructors. When there
were teardown issues, it looked like they could be caused by the aiohttp session
being created outside a coroutine. But this is proven not to be the case
after recent fixes. So move it back to the ManagerClient constructor.

On the other hand, create a close() coroutine to stop the aiohttp session.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-10-03 19:01:03 +02:00
Alejo Sanchez
41dbdf0f70 test.py: topology pass ManagerClient instead of...
cql connection

When there are topology changes, the driver needs to be updated. Instead
of passing the CassandraCluster.Connection, pass the ManagerClient
instance which manages the driver connection inside of it.

Remove workaround for test_raft_upgrade.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-10-03 19:00:47 +02:00
Alejo Sanchez
0c3a06d0d7 test.py: delete unimplemented remove server
Delete the unused, unimplemented, and broken version of remove server.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-10-03 18:57:38 +02:00
Alejo Sanchez
98bc4c198f test.py: fix variable name ssl name clash
Change variable ssl to use_ssl to avoid clash with ssl module.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-10-03 18:57:38 +02:00
Kamil Braun
b2cf610567 test/pylib: scylla_cluster: improve cluster printing
Print the cluster name and stopped servers in addition to the running
servers.

Fix a logging call which tried to print a server in place of a cluster
and even at that it failed (the server didn't have a hostname yet so it
printed as an empty string). Add another logging call.
2022-09-30 17:00:05 +02:00
Kamil Braun
05ed3769dd test/pylib: don't pass test_case_name to after-test endpoint
It's redundant now, the manager tracks the current test case using
before-test endpoint calls.
2022-09-30 16:41:45 +02:00
Kamil Braun
dc6f37b7f7 test/pylib: scylla_cluster: track current test case name and print it
Use `_before_test` calls to track the current test case name.
Concatenate it with the unique test name like this:
`test_topology.1::test_add_server_add_column`, and print it
instead of the test case name.
2022-09-30 16:38:35 +02:00
Kamil Braun
5be818d73b test.py: pass the unique test name (e.g. test_topology.1) to cluster manager
This helps us distinguish the different repeats of a test in logs.
Rename the variable accordingly in `ScyllaClusterManager`.
2022-09-30 16:24:10 +02:00
Kamil Braun
fde4642472 test/pylib: scylla_cluster: pass the test case name to before_test
We pass the test case name to `after_test` - so make it consistent.
Arguably, the test case name is more useful (as it's more precise) than
the test name.
2022-09-30 16:17:59 +02:00
Kamil Braun
43d8b4a214 test/pylib: use "test_case_name" variable name when talking about test cases
Distinguish "test name" (e.g. `test_topology`) from "test case name"
(e.g. `test_add_server_add_column` - a test case inside
`test_topology`).
2022-09-30 16:15:48 +02:00
Kamil Braun
1793d43b15 test/pylib: scylla_cluster: mark server_remove as not implemented
The `server_remove` function did a very weird thing: it shut down a
server and made the framework 'forget' about it. From the point of view
of the Scylla cluster and the driver the server was still there.

Replace the function's body with `raise NotImplementedError`. In the
future it can be replaced with an implementation that calls
`removenode` on the Scylla cluster.

Remove `test_remove_server_add_column` from `test_topology`. It
effectively does the same thing as `test_stop_server_add_column`, except
that the framework also 'forgets' about the stopped server. This could
lead to weird situations because the forgotten server's IP could be
reused in another test that was running concurrently with this test.

Closes #11657
2022-09-29 21:03:18 +03:00
Nadav Har'El
de1bc147bc Merge 'test.py: cleanups in topology test suites' from Kamil Braun
Fix the type of `create_server`, rename `topology_for_class` to `get_cluster_factory`, and simplify the suite definitions and parameters passed to `get_cluster_factory`.

Closes #11590

* github.com:scylladb/scylladb:
  test.py: replace `topology` with `cluster_size` in Topology tests
  test.py: rename `topology_for_class` to `get_cluster_factory`
  test/pylib: ScyllaCluster: fix create_server parameter type
2022-09-28 15:19:54 +03:00
Kamil Braun
1bcc28b48b test/topology_raft_disabled: reenable test_raft_upgrade
The test was disabled due to a bug in the Python driver which caused the
driver not to reconnect after a node was restarted (see
scylladb/python-driver#170).

Introduce a workaround for that bug: we simply create a new driver
session after restarting the nodes. Reenable the test.

Closes #11641
2022-09-28 15:13:42 +03:00
Alejo Sanchez
02933c9b82 test.py: close aiohttp session for topology tests
Close the aiohttp ClientSession after pytest session finishes.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #11648
2022-09-27 18:09:08 +02:00
Kamil Braun
06cc4f9259 test/pylib: ScyllaCluster: fix create_server parameter type
The only usage of `ScyllaCluster` constructor passed a `create_server`
function which expected a `List[str]` for the second parameter, while
the constructor specified that the function should expect an
`Optional[List[str]]`. There was no reason for the latter, we can easily
fix this type error.

Also give a type hint for `create_cluster` function in
`PythonTestSuite.topology_for_class`. This is actually what caught the
type error.
2022-09-26 11:45:44 +02:00
Alejo Sanchez
510215d79a test.py: fix ScyllaClusterManager start/stop
Check existing is_running member to avoid re-starting.

While there, set it to false after stopping.
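
The guard described here can be sketched as follows. The class name `ManagerSketch` and the `starts` counter are hypothetical, and the early return in `stop()` is an assumption added for symmetry; only the `is_running` check on start and the reset after stopping come from the commit message.

```python
import asyncio


class ManagerSketch:
    """Illustrative stand-in for a manager with idempotent start/stop."""

    def __init__(self) -> None:
        self.is_running = False
        self.starts = 0  # counts real start-ups, for demonstration only

    async def start(self) -> None:
        if self.is_running:
            return  # already started: do not start again
        self.starts += 1
        self.is_running = True

    async def stop(self) -> None:
        if not self.is_running:
            return  # assumed guard, not stated in the commit message
        self.is_running = False  # allows a later start() to work again
```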

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-21 11:42:02 +02:00
Alejo Sanchez
933d93d052 test.py: fix topology init error handling
Start ScyllaClusterManager within error handling so the ScyllaCluster
logs are available in case of error starting up.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-21 09:15:25 +02:00
Alejo Sanchez
087ae521c5 test.py: make client fail if before test check fails
Check if request to server side (test.py) failed and raise if so.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #11575
2022-09-19 18:04:07 +02:00
Kamil Braun
348582c4c8 test/pylib: pool: make it possible to free up space
Some tests mark clusters as 'dirty', which makes them non-reusable by
later tests; we don't want to return them to the pool of clusters.

This use-case was covered by the `add_one` function in the `Pool` class.
However, it had the unintended side effect of creating extra clusters
even if there were no more tests that were waiting for new clusters.

Rewrite the implementation of `Pool` so it provides 3 interface
functions:
- `get` borrows an object, building it first if necessary
- `put` returns a borrowed object
- `steal` is called by a borrower to free up space in the pool;
  the borrower is then responsible for cleaning up the object.

Both `put` and `steal` wake up any outstanding `get` calls. Objects are
built only in `get`, so no objects are built if none are needed.
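
The three-function interface can be sketched with an `asyncio.Condition`. This is a simplified illustration under assumed names (`max_size`, `build`, `total`), not the actual `Pool` implementation.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, Deque, Generic, TypeVar

T = TypeVar("T")


class Pool(Generic[T]):
    """Bounded pool of lazily built objects (illustrative sketch)."""

    def __init__(self, max_size: int, build: Callable[[], Awaitable[T]]) -> None:
        self.max_size = max_size
        self.build = build
        self.cond = asyncio.Condition()
        self.pool: Deque[T] = deque()
        self.total = 0  # objects sitting in the pool plus objects borrowed

    async def get(self) -> T:
        """Borrow an object, building it first if necessary."""
        async with self.cond:
            # Wait until an object is free or there is room to build one.
            await self.cond.wait_for(
                lambda: bool(self.pool) or self.total < self.max_size)
            if self.pool:
                return self.pool.popleft()
            self.total += 1  # reserve a slot before building
        try:
            return await self.build()
        except BaseException:
            async with self.cond:
                self.total -= 1  # give the slot back if building failed
                self.cond.notify()
            raise

    async def put(self, obj: T) -> None:
        """Return a borrowed object to the pool."""
        async with self.cond:
            self.pool.append(obj)
            self.cond.notify()

    async def steal(self) -> None:
        """Free a slot; the borrower keeps the object and cleans it up."""
        async with self.cond:
            self.total -= 1
            self.cond.notify()
```

Because objects are built only inside `get`, a `steal` with no waiting caller frees a slot without creating anything.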

Closes #11558
2022-09-18 12:05:57 +03:00
Alejo Sanchez
2da7304696 test.py: log server restarts for topology tests
Add missing logging for server restart.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-15 15:10:29 +02:00
Alejo Sanchez
61a92afa2d test.py: log actions for topology tests
For debugging, log driver connection, before and after checks, and
topology changes.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-15 15:10:29 +02:00
Alejo Sanchez
604f7353ef Revert "test.py: restart stopped servers before...
teardown..."

This reverts commit df1ca57fda.

In order to prevent timeouts on teardown queries, the previous commit
added functionality to restart servers that were down. This issue is
fixed in fc0263fc9b so there's no longer need to restart stopped servers
on test teardown.
2022-09-15 14:47:01 +02:00
Alejo Sanchez
ed81f1a85c test.py: ManagerClient API fix return text
For ManagerClient request API, don't return status, raise an exception.
Server side errors are signaled by status 500, not text body.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-15 14:47:01 +02:00
Alejo Sanchez
4a5f2418ec test.py: ManagerClient raise on HTTP != 200
Raise an exception if the request result is not HTTP 200 for .get()
helper.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-15 14:47:01 +02:00
Alejo Sanchez
a84bde38c0 test.py: ManagerClient fix paths to updated resource
Fix missing path renames for server-side rename
"node" -> "server" API.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-15 14:47:01 +02:00
Alejo Sanchez
b8f68729b0 test.py: Pool add fresh when item not returned
Pool.get() might have waiting callers, so if an item is not returned
to the pool after use, tell the pool to add a new one and tell the pool
an entry was taken (used for total running entries, i.e. clusters).

Use it when a ScyllaCluster is dirty and not returned.

While there improve logging and docstrings.

Issue reported by @kbr-.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #11546
2022-09-15 13:56:44 +03:00
Kamil Braun
73bf781e17 test/pylib: APIs to read and modify configuration from tests
We introduce `server_get_config` to fetch the entire configuration dict
and `update_config` to update a value under the given key.
2022-09-14 12:46:41 +02:00
Kamil Braun
1f550428a9 test/pylib: ScyllaServer: extract _write_config_file function
For refreshing the on-disk config file with the config stored in dict
form in the `self.config` field.
2022-09-14 12:46:41 +02:00
Kamil Braun
52e52e8503 test/pylib: ScyllaCluster: extend ActionReturn with dict data
For returning types more complex than text. Also specify a default empty
string value for the `msg` field for non-text return values.
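
A sketch of what such a return type might look like. The field names `success` and `data`, and the use of a dataclass, are assumptions; the commit message only mentions dict data and a `msg` field that defaults to an empty string.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class ActionReturn:
    """Result of a manager action (illustrative sketch)."""
    success: bool                       # assumed field name
    msg: str = ""                       # empty by default for non-text results
    data: Dict[str, Any] = field(default_factory=dict)  # assumed field name
```

A non-text result can then be returned as `ActionReturn(success=True, data={...})` without supplying a message.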
2022-09-14 12:46:41 +02:00
Kamil Braun
c9348ae8ea test/pylib: ManagerClient: introduce _put_json
For sending PUT requests to the Manager (such as updating
configuration).
2022-09-14 12:46:41 +02:00
Kamil Braun
d81c722476 test/pylib: ManagerClient: replace _request with _get, _get_text
`_request` performed a GET request and extracted a text body out of the
response.

Split it into `_get`, which only performs the request, and `_get_text`,
which calls `_get` and extracts the body as text.

Also extract a `_resource_uri` function which will be used for other
request types.
2022-09-14 12:46:41 +02:00
Kamil Braun
9d39e14518 test: pylib: store server configuration in ScyllaServer
In following commits we will make this configuration accessible from
tests through the Manager (for fetching and updating).
2022-09-14 12:46:41 +02:00
Kamil Braun
311806244d test: pylib: use Python dicts to manipulate ScyllaServer configuration
Previously we used a formattable string to represent the configuration;
values in the string were substituted by Python's formatting mechanism
and the resulting string was stored to obtain the config file.

This approach had some downsides, e.g. it required boilerplate work to
extend: to add a new config option, you would have to modify this
template string.

Instead we can represent the configuration as a Python dictionary. Dicts
are easy to manipulate, for example you can sum two dicts; if a key
appears in both, the second dict 'wins':
```
{1:1} | {1:2} == {1:2}
```

This makes the configuration easy to extend without having to write
boilerplate: if the user of `ScyllaServer` wants to add or override a
config option, they can simply add it to the `config_options` dict and
that's it - no need to modify any internal template strings in
`ScyllaServer` implementation like before. The `config_options` dict is
simply summed with the 'base' config dict of `ScyllaServer`
(`config_options` is the right summand so anything in there overrides
anything in the base dict).
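
A small sketch of this merge, using illustrative keys and values rather than the real base configuration of `ScyllaServer`. The helper name `make_config` is hypothetical. The `{**a, **b}` form is used so the sketch also runs on Python versions older than 3.9; on 3.9 and later it is equivalent to `a | b`.

```python
from typing import Any, Dict


def make_config(base: Dict[str, Any],
                config_options: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two config dicts; config_options wins on key conflicts."""
    return {**base, **config_options}


# Illustrative values only.
base = {"cluster_name": "base-name", "workdir": "/tmp/example"}
config_options = {"cluster_name": "suite-name",
                  "authenticator": "PasswordAuthenticator"}
config = make_config(base, config_options)
```

Neither input dict is modified, so the base configuration can be shared between servers.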

An example of this extensibility is the `authenticator` and `authorizer`
options which no longer appear in `scylla_cluster.py` module after this
change, they only appear in the suite.yaml file.

Also, use "workdir" option instead of specifying data dir, commitlog
dir etc. separately.
2022-09-12 11:57:58 +02:00
Kamil Braun
fd19825eaa test: pylib: store config_options in ScyllaServer
Previously the code extracted `authenticator` and `authorizer` keys from
the config options and stored them.

Store the entire dict instead. The new code is easier to extend if we
want to make more options configurable.
2022-09-12 11:57:18 +02:00
Pavel Emelyanov
bbad3eac63 pylib: Cast port number config to int explicitly
Otherwise it crashes on some Python versions.

The cast was there before a2dd64f68f
explicitly dropped one while moving the code between files.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11511
2022-09-09 18:08:08 +02:00
Kamil Braun
dba595d347 Merge 'Minimal implementation of Broadcast Tables' from Mikołaj Grzebieluch
Broadcast tables are tables for which all statements are strongly
consistent (linearizable), replicated to every node in the cluster and
available as long as a majority of the cluster is available. If a user
wants to store a “small” volume of metadata that is not modified “too
often” but provides high resiliency against failures and strong
consistency of operations, they can use broadcast tables.

The main goal of the broadcast tables project is to solve problems which
need to be solved when we eventually implement general-purpose strongly
consistent tables: designing the data structure for the Raft command,
ensuring that the commands are idempotent, handling snapshots correctly,
and so on.

In this MVP (Minimum Viable Product), statements are limited to simple
SELECT and UPDATE operations on the built-in table. In the future, other
statements and data types will be available but with this PR we can
already work on features like idempotent commands or snapshotting.
Snapshotting is not handled yet which means that restarting a node or
performing too many operations (which would cause a snapshot to be
created) will give incorrect results.

In a follow-up, we plan to add end-to-end Jepsen tests
(https://jepsen.io/). With this PR we can already simulate operations on
lists and test linearizability in linear complexity. This can also test
Scylla's implementation of persistent storage, failure detector, RPC,
etc.

Design doc: https://docs.google.com/document/d/1m1IW320hXtsGulzSTSHXkfcBKaG5UlsxOpm6LN7vWOc/edit?usp=sharing

Closes #11164

* github.com:scylladb/scylladb:
  raft: broadcast_tables: add broadcast_kv_store test
  raft: broadcast_tables: add returning query result
  raft: broadcast_tables: add execution of intermediate language
  raft: broadcast_tables: add compilation of cql to intermediate language
  raft: broadcast_tables: add definition of intermediate language
  db: system_keyspace: add broadcast_kv_store table
  db: config: add BROADCAST_TABLES feature flag
2022-09-09 18:05:37 +02:00