scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 12:17:02 +00:00

Author	SHA1	Message	Date
Gleb Natapov	4f23eec44f	Rename experimental raft feature to consistent-topology-changes Make the name more descriptive Fixes #14145 Message-Id: <ZKQ2wR3qiVqJpZOW@scylladb.com>	2023-07-07 11:08:10 +02:00
Kamil Braun	0d437a7d63	Merge 'utils: error injection: add inject_with_handler for interactions with injected code' from Mikołaj Grzebieluch Currently, it is hard for injected code to wait for some events, for example, requests on some REST endpoint. This PR adds the `inject_with_handler` method that executes injected function and passes `injection_handler` as its argument. The `injection_handler` class is used to wait for events inside the injected code. The `error_injection` class can notify the injection's handler or handlers associated with the injection on all shards about the received message. Closes #14357. Closes #14460 * github.com:scylladb/scylladb: tests: introduce InjectionHandler class for communicating with injected code api/error_injection: add message_injection endpoint tests: utils: error injections: add test for inject_with_handler utils: error injection: add inject_with_handler for interactions with injected code utils: error injection: create structure for error injections data	2023-07-06 18:16:51 +02:00
Mikołaj Grzebieluch	907c0e8900	tests: introduce InjectionHandler class for communicating with injected code Add a client for sending empty messages to the injected code from tests.	2023-07-06 12:34:53 +02:00
Kefu Chai	1faf50fc05	test/pylib: do not hardwire alias to "local" define a variable for it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-05 15:58:41 +08:00
Kefu Chai	d55cfdc152	test/pylib: retry if minio_server is not ready there is a chance that minio_server is not ready to serve after launching the server executable process. so we need to retry until the first "mc" command is able to talk to it. in this change, a method `mc()` is added to run the minio client, so we can retry the command before it times out. and it allows us to ignore the failure or specify the timeout. this should ready the minio server before tests start to connect to it. Fixes #1719 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-05 15:57:59 +08:00
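The retry idea in the commit above could be sketched roughly as follows. This is a minimal illustration, not the code from `minio_server.py`: the helper name `retry_until_ready` and its parameters are hypothetical, and the real `mc()` method runs the minio client as a subprocess rather than calling a Python function.

```python
import time


def retry_until_ready(probe, timeout=30.0, interval=0.5,
                      clock=time.monotonic, sleep=time.sleep):
    """Call `probe` until it succeeds or `timeout` seconds have passed.

    `probe` is any callable that raises while the server is not ready
    (hypothetical stand-in for the first "mc" command).
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            return probe()
        except Exception as exc:
            # Give up once the deadline has passed; otherwise wait and retry.
            if clock() >= deadline:
                raise TimeoutError(
                    f"server not ready after {attempts} attempt(s)") from exc
            sleep(interval)
```

Passing `clock` and `sleep` as parameters is only a convenience for testing the loop without real delays.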
Kefu Chai	c005b6dce0	test/pylib: chmod +x minio_server.py add a shebang line. so we can just launch a minio_server using ```console test/pylib/minio_server.py --host 127.0.0.1 ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-04 13:19:34 +08:00
Kefu Chai	2bae0b9aa8	test/pylib: allow run minio_server.py as a stand-alone tool this would allow developer to run a minio server for testing, for instance, s3_test, using something like: ```console $ python3 test/pylib/minio_server.py --host 127.0.0.1 tempdir='/tmp/tmpfoobar-minio' export S3_SERVER_ADDRESS_FOR_TEST=127.0.0.1 export S3_SERVER_PORT_FOR_TEST=900 export S3_PUBLIC_BUCKET_FOR_TEST=testbucket ``` and developer is supposed to copy-and-paste the `export` commands to prepare the environmental variables for the test using the minio server. the tempdir is used for the rundir of minio, and it is also used for holding the log file of this tool. one might want to check it when necessary. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-04 13:14:42 +08:00
Konstantin Osipov	3d81408a58	test.py: make `experimental: raft` the default for all tests Make sure all tests use the new centralized topology coordinator. This is a step forward towards maturing the coordinator implementation. Closes #14039	2023-06-29 14:44:00 +02:00
Kamil Braun	b38dcba6ed	test: pylib: increase checking period for `get_alive_endpoints` `server_sees_others` and similar functions periodically call `get_alive_endpoints`. The period was `.1` seconds, increase it to `.5` to reduce the log spam (I checked empirically that `.5` is usually how long it takes in dev mode on my laptop.)	2023-06-20 13:03:46 +02:00
Kamil Braun	ae92932240	test: pylib: manager_client: `get_cql()` helper	2023-06-20 13:03:46 +02:00
Kamil Braun	e02249f0cd	test: pylib: ScyllaCluster: server pause/unpause API	2023-06-20 13:03:46 +02:00
Piotr Dulikowski	e7c355e84f	test: introduce get_supported_features/get_enabled_features Introduces two helper functions that allow getting information about supported/enabled features on a node, according to its system tables. As a bonus, the `wait_for_feature` function is refactored to use `get_enabled_features`.	2023-06-12 13:28:16 +02:00
Piotr Dulikowski	56d3d8b9e2	test: move wait_for_feature to pylib utils The `wait_for_feature` can be useful, and will be used, in other test suites than `topology_raft_disabled`, so it is moved to the common pylib utils.	2023-06-12 10:09:00 +02:00
Alejo Sanchez	5b8fc86737	test/pylib: minio unique temp dir Create a unique minio server temp dir for each test run. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #14095	2023-06-07 16:29:58 +03:00
Nadav Har'El	5984db047d	Merge 'mv: forbid IS NOT NULL on columns outside the primary key' from Jan Ciołek statement_restrictions: forbid IS NOT NULL on columns outside the primary key IS NOT NULL is currently allowed only when creating materialized views. It's used to convey that the view will not include any rows that would make the view's primary key columns NULL. Generally materialized views allow to place restrictions on the primary key columns, but restrictions on the regular columns are forbidden. The exception was IS NOT NULL - it was allowed to write regular_col IS NOT NULL. The problem is that this restriction isn't respected, it's just silently ignored (see #10365). Supporting IS NOT NULL on regular columns seems to be as hard as supporting any other restrictions on regular columns. It would be a big effort, and there are some reasons why we don't support them. For now let's forbid such restrictions, it's better to fail than be wrong silently. Throwing a hard error would be a breaking change. To avoid breaking existing code the reaction to an invalid IS NOT NULL restriction is controlled by the `strict_is_not_null_in_views` flag. This flag can have the following values: * `true` - strict checking. Having an `IS NOT NULL` restriction on a column that doesn't belong to the view's primary key causes an error to be thrown. * `warn` - allow invalid `IS NOT NULL` restrictions, but throw a warning. The invalid restrictions are silently ignored. * `false` - allow invalid `IS NOT NULL` restrictions, without any warnings or errors. The invalid restrictions are silently ignored. The default values for this flag are `warn` in `db::config` and `true` in scylla.yaml. This way the existing clusters will have `warn` by default, so they'll get a warning if they try to create such an invalid view. New clusters with fresh scylla.yaml will have the flag set to `true`, as scylla.yaml overwrites the default value in `db::config`. New clusters will throw a hard error for invalid views, but in older existing clusters it will just be a warning. This way we can maintain backwards compatibility, but still move forward by rejecting invalid queries on new clusters. Fixes: #10365 Closes #13013 * github.com:scylladb/scylladb: boost/restriction_test: test the strict_is_not_null_in_views flag docs/cql/mv: columns outside of view's primary key can't be restricted cql-pytest: enable test_is_not_null_forbidden_in_filter statement_restrictions: forbid IS NOT NULL on columns outside the primary key schema_altering_statement: return warnings from prepare_schema_mutations() db/config: add strict_is_not_null_in_views config option statement_restrictions: add get_not_null_columns() test: remove invalid IS NOT NULL restrictions from tests	2023-06-07 12:12:19 +03:00
Jan Ciolek	c67d65987e	db/config: add strict_is_not_null_in_views config option IS NOT NULL shouldn't be allowed on columns which are outside of the materialized view's primary key. It's currently allowed to create views with such restrictions, but they're silently ignored, it's a bug. In the following commits restricting regular columns with IS NOT NULL will be forbidden. This is a breaking change. Some users might have existing code that creates views with such restrictions, we don't want to break it. To deal with this a new feature flag is introduced: strict_is_not_null_in_views. By default it's set to `warn`. If a user tries to create a view with such invalid restrictions they will get a warning saying that this is invalid, but the query will still go through, it's just a warning. The default value in scylla.yaml will be `true`. This way new clusters will have strict enforcement enabled and they'll throw errors when the user tries to create such an invalid view, Old clusters without the flag present in scylla.yaml will have the flag set to warn, so they won't break on an update. There's also the option to set the flag to `false`. It's dangerous, as it silences information about a bug, but someone might want it to silence the warnings for a moment. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-06-07 01:48:39 +02:00
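The three values of `strict_is_not_null_in_views` described in the two commits above can be illustrated with a small sketch. This is not Scylla's implementation (which lives in C++ in `statement_restrictions`); the function name `check_is_not_null` and its arguments are hypothetical, and only the `true` / `warn` / `false` behaviour is taken from the commit messages.

```python
import warnings


def check_is_not_null(restricted_columns, primary_key, mode="warn"):
    """Return the IS NOT NULL columns that will be silently ignored.

    mode mirrors the flag values from the commit message:
      "true"  - raise an error for columns outside the view's primary key
      "warn"  - emit a warning, then ignore the invalid restrictions
      "false" - ignore the invalid restrictions without any message
    """
    invalid = [c for c in restricted_columns if c not in primary_key]
    if not invalid:
        return []
    message = ("IS NOT NULL is only allowed on the view's primary key "
               f"columns; invalid columns: {', '.join(invalid)}")
    if mode == "true":
        raise ValueError(message)
    if mode == "warn":
        warnings.warn(message)
    elif mode != "false":
        raise ValueError(f"unknown mode: {mode!r}")
    return invalid
```

With `mode="warn"`, which the commits describe as the `db::config` default, a view restricting a regular column would still be created, and the caller would see a warning naming that column.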
Kamil Braun	7e56388721	test: pylib: ScyllaCluster: generalize config type for `server_add` Generalize from `dict[str, str]` to `dict[str, Any]`.	2023-05-29 11:03:36 +02:00
Kamil Braun	ce13395ce4	test: pylib: scylla_cluster: add explicit timeout for graceful server stop If server shutdown hangs, the `manager.server_stop_gracefully` call would eventually (after 5 minutes) timeout with a cryptic `TimeoutError`; it's a generic timeout for performing requests by the tests to `ScyllaClusterManager`. It was non-obvious how to find what actually caused the timeout - you'd have to browse multiple logs. Introduce an explicit timeout in `ScyllaServer.stop_gracefully`. Set it to 1 minute. Whether this is a good value may be arguable, but shutdown taking longer than that probably indicates problems. The important thing is that this timeout is shorter than the generic request timeout. If this times out we get a nice error in the test: ``` E test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/server/1/stop_gracefully, params: None, json: None, body: E Stopping server ScyllaServer(1, 127.162.40.1, 826d5884-4696-4a22-80a7-cc872aa43102) gracefully took longer than 60s ```	2023-05-29 11:03:30 +02:00
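The pattern in the commit above, an explicit inner timeout that fires before the generic request timeout and produces a descriptive error, might look like this with `asyncio.wait_for`. It is a sketch under assumptions: the free function `stop_gracefully` and the `shutdown` coroutine factory are hypothetical stand-ins for `ScyllaServer.stop_gracefully`; only the 60-second value and the wording of the error come from the commit message.

```python
import asyncio

STOP_TIMEOUT = 60  # seconds; the commit sets the graceful-stop timeout to 1 minute


async def stop_gracefully(server_name, shutdown, timeout=STOP_TIMEOUT):
    """Await `shutdown()` but fail with a clear message if it hangs.

    `shutdown` is a hypothetical coroutine function that performs the
    actual graceful stop of the server process.
    """
    try:
        await asyncio.wait_for(shutdown(), timeout)
    except asyncio.TimeoutError:
        # Replace the cryptic TimeoutError with an error naming the server.
        raise RuntimeError(
            f"Stopping server {server_name} gracefully "
            f"took longer than {timeout}s") from None
```

Because this timeout is shorter than the generic 5-minute request timeout mentioned in the commit, the descriptive error is the one that reaches the test.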
Jan Ciolek	d2ef55b12c	test: use NetworkTopologyStrategy in all unit tests As described in https://github.com/scylladb/scylladb/issues/8638, we're moving away from `SimpleStrategy`, in the future it will become deprecated. We should remove all uses of it and replace them with `NetworkTopologyStrategy`. This change replaces `SimpleStrategy` with `NetworkTopologyStrategy` in all unit tests, or at least in the ones where it was reasonable to do so. Some of the tests were written explicitly to test the `SimpleStrategy` strategy, or changing the keyspace from `SimpleStrategy` to `NetworkTopologyStrategy`. These tests were left intact. It's still a feature that is supported, even if it's slowly getting deprecated. The typical way to use `NetworkTopologyStrategy` is to specify a replication factor for each datacenter. This could be a bit cumbersome, we would have to fetch the list of datacenters, set the repfactors, etc. Luckily there is another way - we can just specify a replication factor to use for each existing datacenter, like this: ```cql CREATE KEYSPACE {} WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1}; ``` This makes the change rather straightforward - just replace all instances of `'SimpleStrategy'` with `'NetworkTopologyStrategy'`. Refs: https://github.com/scylladb/scylladb/issues/8638 Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> Closes #13990	2023-05-23 08:52:56 +03:00
Tomasz Grabiec	9d4bca26cc	Merge 'raft topology: implement `check_and_repair_cdc_streams` API' from Kamil Braun `check_and_repair_cdc_streams` is an existing API which you can use when the current CDC generation is suboptimal, e.g. after you decommissioned a node the current generation has more stream IDs than you need. In that case you can do `nodetool checkAndRepairCdcStreams` to create a new generation with fewer streams. It also works when you change number of shards on some node. We don't automatically introduce a new generation in that case but you can use `checkAndRepairCdcStreams` to create a new generation with restored shard-colocation. This PR implements the API on top of raft topology, it was originally implemented using gossiper. It uses the `commit_cdc_generation` topology transition state and a new `publish_cdc_generation` state to create new CDC generations in a cluster without any nodes changing their `node_state`s in the process. Closes #13683 * github.com:scylladb/scylladb: docs: update topology-over-raft.md test: topology_experimental_raft: test `check_and_repair_cdc` API raft topology: implement `check_and_repair_cdc_streams` API raft topology: implement global request handling raft topology: introduce `prepare_new_cdc_generation_data` raft_topology: `get_node_to_work_on_opt`: return guard if no node found raft topology: remove `node_to_work_on` from `commit_cdc_generation` transition raft topology: separate `publish_cdc_generation` state raft topology: non-node-specific `exec_global_command` raft topology: introduce `start_operation()` raft topology: non-node-specific `topology_mutation_builder` topology_state_machine: introduce `global_topology_request` topology_state_machine: use `uint16_t` for `enum_class`es raft topology: make `new_cdc_generation_data_uuid` topology-global	2023-05-22 11:33:58 +02:00
Kamil Braun	64dc76db55	test: pylib: fix `read_barrier` implementation The previous implementation didn't actually do a read barrier, because the statement failed on an early prepare/validate step which happened before read barrier was even performed. Change it to a statement which does not fail and doesn't perform any schema change but requires a read barrier. This breaks one test which uses `RandomTables.verify_schema()` when only one node is alive, but `verify_schema` performs a read barrier. Unbreak it by skipping the read barrier in this case (it makes sense in this particular test). Closes #13933	2023-05-18 18:30:11 +02:00
Pavel Emelyanov	01628ae8c1	test,minio: Run mc with --debug option With that if mc fails we'll (hopefully) get some meaningful information about why it happened. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-16 12:16:15 +03:00
Pavel Emelyanov	4041c2f30d	test,minio: Log mc operations to log file Currently everything minio.py does goes to test.py log, while mc (and minio) output go to another log file. That's inconvenient, better to keep minio.py's messages in minio log file. Also, while at it, print a message if local alias drop fails (it's benign failure, but it's good to have the note anyway). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-16 12:14:49 +03:00
Kamil Braun	f581282625	test: topology_experimental_raft: test `check_and_repair_cdc` API	2023-05-08 16:49:01 +02:00
Kamil Braun	3f3dcf451b	test: pylib: random_tables: perform read barrier in `verify_schema` `RandomTables.verify_schema` is often called in topology tests after performing a schema change. It compares the schema tables fetched from some node to the expected latest schema stored by the `RandomTables` object. However there's no guarantee that the latest schema change has already propagated to the node which we query. We could have performed the schema change on a different node and the change may not have been applied yet on all nodes. To fix that, pick a specific node and perform a read barrier on it, then use that node to fetch the schema tables. Fixes #13788 Closes #13789	2023-05-08 13:21:10 +02:00
Petr Gusev	330d1d5163	scylla_cluster.py: fix read_last_line This is a follow-up to #13399, the patch addresses the issues mentioned there: * linesep can be split between blocks; * linesep can be part of UTF-8 sequence; * avoid excessively long lines, limit to 512 chars; * the logic of the function made simpler and more maintainable.	2023-05-05 12:57:36 +04:00
Petr Gusev	8a5e211c30	scylla_cluster.py: move read_last_line to util.py We want to add tests for read_last_line, so we move it to make this simpler.	2023-05-05 12:51:25 +04:00
Pavel Emelyanov	3bec5ea2ce	s3/client: Keep server port on config Currently the code temporarily assumes that the endpoint port is 9000. This is what tests' local minio is started with. This patch keeps the port number on endpoint config and makes test get the port number from minio starting code via environment. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:43 +03:00
Botond Dénes	1426c623eb	Merge 'Tune up S3 unit tests environment usage (and a bit more)' from Pavel Emelyanov The tests in question are using MINIO_SERVER_ADDRESS environment variable to export minio server address from pylib to test cases. Also they use hard-coded public bucket name. Both plays badly with AWS S3, the former due to MINIO_... in its name and the latter because public bucket name can be any. So this PR puts address and public bucket name into S3_..._FOR_TEST environment variables and fixes output stream closure on failure while at it. Detached from #13493 Closes #13546 * github.com:scylladb/scylladb: s3/test: Rename MINIO_SERVER_ADDRESS environment variable s3/test: Keep public bucket name in environment s3/test: Fix upload stream closure test/lib: Add getenv_safe() helper	2023-04-20 18:01:12 +03:00
Alejo Sanchez	11561a73cb	test/pylib: ManagerClient helpers to wait for... server to see other servers after start/restart When starting/restarting a server, provide a way to wait for the server to see at least n other servers. Also leave the implementation methods available for manual use and update previous tests, one to wait for a specific server to be seen, and one to wait for a specific server to not be seen (down). Fixes #13147 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13438	2023-04-20 14:22:31 +02:00
Pavel Emelyanov	a77ca69360	s3/test: Rename MINIO_SERVER_ADDRESS environment variable Using it the pylib minio code exports the minio address for tests. This creates unneeded WTFs when running the test over AWS S3, so it's better to rename the variable not to mention MINIO at all. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-19 12:51:12 +03:00
Pavel Emelyanov	12c4e7d605	s3/test: Keep public bucket name in environment Local test.py runs minio with the public 'testbucket' bucket and all test cases know that. This series adds an ability to run tests over real S3 so the bucket name should be configurable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-19 12:51:12 +03:00
Tomasz Grabiec	041ee3ffdd	test: pylib: Add a way to create cql connections with particular coordinators Usage: await manager.driver_connect(server=servers[0]) manager.cql.execute(f"...", execution_profile='whitelist')	2023-04-13 21:23:03 +02:00
Alejo Sanchez	62a945ccd5	test/pylib: get gossiper alive endpoints Helper to get list of gossiper alive endpoints from REST API. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-04-13 21:23:03 +02:00
Alejo Sanchez	3508a4e41e	test/pylib: configurable replication factor Make replication factor configurable for the RandomTables helper. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-04-13 21:23:02 +02:00
Pavel Emelyanov	6dbe41d277	test.py: Equip it with minio server When test.py starts it activates a minio server inside test-dir and configures an anonymous bucket for test cases to run on Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Petr Gusev	09636b20f3	scylla_cluster.py: optimize node logs reading There are two occasions in scylla_cluster where we read the node logs, and in both of them we read the entire file in memory. This is not efficient and may cause an OOM. In the first case we need the last line of the log file, so we seek at the end and move backwards looking for a new line symbol. In the second case we look through the log file to find the expected_error. The readlines() method returns a Python list object, which means it reads the entire file in memory. It's sufficient to just remove it since iterating over the file instance already yields lines lazily one by one. This is a follow-up for #13134. Closes #13399	2023-04-03 12:28:08 +02:00
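The two log-reading techniques from the commit above, seeking backwards from the end of the file for the last line and iterating over the file object lazily when searching for an error, could be sketched as follows. This is an illustrative version, not the code in `scylla_cluster.py`; the name `contains_error` and the `block_size` parameter are assumptions, and the 512-character limit added by the later follow-up commit is not implemented here.

```python
import os


def read_last_line(path, block_size=4096):
    """Return the last line of a file, reading only blocks from its end."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
            # Ignore the trailing newline, then look for the previous one.
            stripped = tail.rstrip(b"\r\n")
            idx = stripped.rfind(b"\n")
            if idx != -1:
                # Searching raw bytes is safe: b"\n" never appears inside
                # a multi-byte UTF-8 sequence. Decode only the full line.
                return stripped[idx + 1:].decode("utf-8", errors="replace")
        return tail.rstrip(b"\r\n").decode("utf-8", errors="replace")


def contains_error(path, expected_error):
    """Scan the log line by line instead of calling readlines()."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return any(expected_error in line for line in f)
```

Working on bytes and decoding only the final line is one way to avoid splitting a UTF-8 sequence across block boundaries, which is one of the issues the follow-up commit lists.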
Alejo Sanchez	81b40c10de	test/pylib: RandomTables.add_column with value column When adding extra columns in a test, make them value column. Name them with the "v_" prefix and use the value column number counter. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13271	2023-03-31 11:19:49 +02:00
Alejo Sanchez	e3b462507d	test/pylib: topology: support clusters of initial size 0 To allow tests with custom clusters, allow configuration of initial cluster size of 0. Add a proof-of-concept test to be removed later. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13342	2023-03-31 11:17:58 +02:00
Petr Gusev	e407956e9f	scylla_cluster.py: add start flag to server_add Sometimes when creating a node it's useful to just install it and not start. For example, we may want to try to start it later with expected error. The ScyllaServer.install method has been made exception safe, if an exception occurs, it reverts to the original state. This allows to not duplicate the try/except logic in two of its call sites.	2023-03-24 16:08:17 +04:00
Petr Gusev	794d0e4000	ServerInfo: drop host_id We are going to allow the ScyllaCluster.add_server function not to start the server if the caller has requested that with a special parameter. The host_id can only be obtained from a running node, so add_server won't be able to return it in this case. I've grepped the tests for host_id and there doesn't seem to be any reference to it in the code.	2023-03-24 16:08:17 +04:00
Petr Gusev	8e3392c64f	scylla_cluster.py: add config to server_add Sometimes when creating a node it's useful to pass a custom node config.	2023-03-24 16:08:17 +04:00
Petr Gusev	c1d0ee2bce	scylla_cluster.py: add expected_error to server_start Sometimes it's useful to check that the node has failed to start for a particular reason. If server_start can't find expected_error in the node's log or if the node has started without errors, it throws an exception.	2023-03-24 16:08:11 +04:00
Petr Gusev	a4411e9ec4	scylla_cluster.py: ScyllaServer.start, refactor error reporting Extract the function that encapsulates all the error reporting logic. We are going to use it in several other places to implement expected_error feature.	2023-03-24 15:54:52 +04:00
Petr Gusev	21b505e67c	scylla_cluster.py: fix ScyllaServer.start, reset cmd if start failed The ScyllaServer expects cmd to be None if the Scylla process is not running. Otherwise, if start failed and the test called update_config, the latter will try to send a signal to a non-existent process via cmd.	2023-03-24 15:54:52 +04:00
Konstantin Osipov	7309a1bd6b	test: improve logging in ScyllaCluster Print IP addresses and cluster identifiers in more log messages, it helps debugging.	2023-03-10 19:53:19 +03:00
Konstantin Osipov	4ace19928d	raft: (test) test ip address change	2023-03-10 19:52:40 +03:00
Botond Dénes	e55f475db1	Merge 'test/pylib: use larger timeout for decommission/removenode' from Kamil Braun Recently we enabled RBNO by default in all topology operations. This made the operations a bit slower (repair-based topology ops are a bit slower than classic streaming - they do more work), and in debug mode with large number of concurrent tests running, they might timeout. The timeout for bootstrap was already increased before, do the same for decommission/removenode. The previously used timeout was 300 seconds (this is the default used by aiohttp library when it makes HTTP requests), now use the TOPOLOGY_TIMEOUT constant from ScyllaServer which is 1000 seconds. Closes #12765 * github.com:scylladb/scylladb: test/pylib: use larger timeout for decommission/removenode test/pylib: scylla_cluster: rename START_TIMEOUT to TOPOLOGY_TIMEOUT	2023-02-13 16:30:24 +02:00
Nadav Har'El	2653865b34	Merge 'test.py: improve test failure handling' from Kamil Braun Improve logging by printing the cluster at the end of each test. Stop performing operations like attempting queries or dropping keyspaces on dirty clusters. Dirty clusters might be completely dead and these operations would only cause more "errors" to happen after a failed test, making it harder to find the real cause of failure. Mark cluster as dirty when a test that uses it fails - after a failed test, we shouldn't assume that the cluster is in a usable state, so we shouldn't reuse it for another test. Rely on the `is_dirty` flag in `PythonTest`s and `CQLApprovalTest`s, similarly to what `TopologyTest`s do. Closes #12652 * github.com:scylladb/scylladb: test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters test/topology: don't drop random_tables keyspace after a failed test test/pylib: mark cluster as dirty after a failed test test: pylib, topology: don't perform operations after test on a dirty cluster test/pylib: print cluster at the end of test	2023-02-12 12:13:25 +02:00
Kamil Braun	54f85c641d	test/pylib: use larger timeout for decommission/removenode Recently we enabled RBNO by default in all topology operations. This made the operations a bit slower (repair-based topology ops are a bit slower than classic streaming - they do more work), and in debug mode with large number of concurrent tests running, they might timeout. The timeout for bootstrap was already increased before, do the same for decommission/removenode. The previously used timeout was 300 seconds (this is the default used by aiohttp library when it makes HTTP requests), now use the TOPOLOGY_TIMEOUT constant from ScyllaServer which is 1000 seconds.	2023-02-10 15:56:31 +01:00

216 Commits