scylladb

Author	SHA1	Message	Date
Avi Kivity	cccd2e7fa7	Merge 'Generalize sstables TOC file reading' from Pavel Emelyanov TOC file is read and parsed in several places in the code. All do it differently, and it's worth generalizing this place. To make it happen also fix the S3 readable_file so that it could be used inside file_input_stream. Closes scylladb/scylladb#16175 * github.com:scylladb/scylladb: sstable: Generalize toc file read and parse s3/client: Don't GET object contents on out-of-bound reads s3/client: Cache stats on readable_file	2023-11-29 19:18:31 +02:00
Nadav Har'El	62f89d49e5	tablets, mv: fix on_internal_error on write to base table The situation before this patch is that when tablets are enabled for a keyspace, we can create a materialized view but later any write to the base table fails with an on_internal_error(), saying that: "Tried to obtain per-keyspace effective replication map of test but it's per-table." Indeed, with tablets, the replication is different for each table - it's not the same for the entire keyspace. So this patch changes the view update code to take the replication map from the specific base table, not the keyspace. This is good enough to get materialized-views reads and writes working in a simple single-node case, as the included test demonstrates (the test fails with on_internal_error() before this patch, and passes afterwards). But this fix is not perfect - the base-view pairing code really needs to consider not only the base table's replication map, but also the view table's replication map - as those can be different. We'll fix this remaining problem as a followup in a separate patch - it will require a substantially more elaborate test to reproduce the need for the different mapping and to verify that fix. Fixes #16209. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16211	2023-11-29 15:29:17 +01:00
Pavel Emelyanov	c5d85bdf79	s3/client: Don't GET object contents on out-of-bound reads If the S3 readable file is used inside a file input stream, the latter may call its read methods with a position beyond the file size. In that case the server replies with a generic HTTP error, and the fact that the range was invalid is encoded in the reply body's XML. Catching this via a wrong-reply-status exception and XML parsing is not great, all the more so because we can know in advance that the read is out-of-bound. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-29 12:09:52 +03:00
Calle Wilund	3b70fde3cd	commitlog: Make named_files in delete_segments have updated size Fixes #16207 commitlog::delete_segments deletes (or recycles) segments replayed. The actual file size here is added to footprint so actual delete then can determine iff things should be recycled or removed. However, we build a pending delete list of named_files, and the files we added did not have size set. Bad. Actual deletion then treated files as zero-byte sized, i.e. footprint calculations borked. Simple fix is just filling in the size of the objects when adding. Added unit test for the problem. Closes scylladb/scylladb#16210	2023-11-29 09:58:47 +02:00
Botond Dénes	3ed6925673	Merge 'Major compaction: flush commitlog by forcing new active segment and flushing all tables' from Benny Halevy Major compaction already flushes each table to make sure it considers any mutations that are present in the memtable for the purpose of tombstone purging. See `64ec1c6ec6` However, tombstone purging may be inhibited by data in commitlog segments based on `gc_time_min` in the `tombstone_gc_state` (See `f42eb4d1ce`). Flushing all sstables in the database releases all references to commitlog segments and thereby maximizes the potential for tombstone purging, which is typically the reason for running major compaction. However, flushing all tables too frequently might result in tiny sstables. Since when flushing all keyspaces using `nodetool flush` the `force_keyspace_compaction` api is invoked for each keyspace successively, we need a mechanism to prevent too frequent flushes by major compaction. Hence a `compaction_flush_all_tables_before_major_seconds` interval configuration option is added (defaults to 24 hours). In the case that not all tables are flushed prior to major compaction, we revert to the old behavior of flushing each table in the keyspace before major-compacting it. Fixes scylladb/scylladb#15777 Closes scylladb/scylladb#15820 * github.com:scylladb/scylladb: docs: nodetool: flush: enrich examples docs: nodetool: compact: fix example api: add /storage_service/compact api: add /storage_service/flush compaction_manager: flush_all_tables before major compaction database: add flush_all_tables api: compaction: add flush_memtables option test/nodetool: jmx: fix path to scripts/scylla-jmx scylla-nodetool, docs: improve optional params documentation	2023-11-29 08:48:40 +02:00
Nadav Har'El	88a5ddabce	tablets, mv: create tablets for a new materialized view Before this patch, trying to create a materialized view when tablets are enabled for a keyspace results in a failure: "Tablet map not found for table <uuid>", with uuid referring to the new view. When a table schema is created, the handler on_before_create_column_family() is called - and this function creates the tablet map for the new table. The bug was that we forgot to do the same when creating a materialized view - which is also a bona fide table. In this patch we call on_before_create_column_family() also when creating the materialized view. I decided not to create a new callback (e.g., on_before_create_view()) and rather call the existing on_before_create_column_family() callback - after all, a view is a column family too. This patch also includes a test for this issue, which fails to create the view before this patch, and passes with the patch. The test is in the test/topology_experimental_raft suite, which runs Scylla with the tablets experimental feature, and will also allow me to create tests that need multiple nodes. However, the first test added here only needs a single node to reproduce the bug and validate its fix. Fixes #16194. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16205	2023-11-28 21:54:32 +01:00
Benny Halevy	b12b142232	api: add /storage_service/compact For major compacting all tables in the database. The advantage of this api is that `commitlog->force_new_active_segment` happens only once in `database::flush_all_tables` rather than once per keyspace (when `nodetool compact` translates to a sequence of `/storage_service/keyspace_compaction` calls). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-28 16:37:42 +02:00
Benny Halevy	1b576f358b	api: add /storage_service/flush For flushing all tables in the database. The advantage of this api is that `commitlog->force_new_active_segment` happens only once in `database::flush_all_tables` rather than once per keyspace (when `nodetool flush` translates to a sequence of `/storage_service/keyspace_flush` calls). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-28 16:37:42 +02:00
Benny Halevy	1fd85bd37b	api: compaction: add flush_memtables option When flushing is done externally, e.g. by running `nodetool flush` prior to `nodetool compact`, flush_memtables=false can be passed to skip flushing of tables right before they are major-compacted. This is useful to prevent creation of small sstables due to excessive memtable flushing. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-28 16:37:42 +02:00
Benny Halevy	7f860d612a	test/nodetool: jmx: fix path to scripts/scylla-jmx The current implementation makes no sense. Like `nodetool_path`, base the default `jmx_path` on the assumption that the test is run using, e.g. ``` (cd test/nodetool; pytest --nodetool=cassandra test_compact.py) ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-28 16:37:42 +02:00
Botond Dénes	f46cdce9d3	Merge 'Make memtable flush tolerate misconfigured S3 storage' from Pavel Emelyanov Nowadays if memtable gets flushed into misconfigured S3 storage, the flush fails and aborts the whole scylla process. That's not very elegant. First, because upon restart garbage collecting non-sealed sstables would fail again. Second, because re-configuring an endpoint can be done runtime, scylla re-reads this config upon HUP signal. Flushing memtable restarts when seeing ENOSPC/EDQUOT errors from on-disk sstables. This PR extends this to handle misconfigured S3 endpoints as well. fixes: #13745 Closes scylladb/scylladb#15635 * github.com:scylladb/scylladb: test: Add object_store test to validate config reloading works test: Add config update facility to test cluster test: Make S3_Server export config file as pathlib.Path config: Make object storage config updateable_value_source memtable: Extend list of checking codes sstables/storage/s3: Fix missing TOC status check s3/client: Map http exceptions into storage_io_error exceptions: Extend storage_io_error construction options	2023-11-28 09:33:37 +02:00
Botond Dénes	3ccf1e020b	Merge ' compaction: abort compaction tasks' from Aleksandra Martyniuk Compaction tasks which do not have a parent are abortable through task manager. Their children are aborted recursively. Compaction tasks of the lowest level are aborted using existing compaction task executors stopping mechanism. Closes scylladb/scylladb#16177 * github.com:scylladb/scylladb: test: test abort of compaction task that isn't started yet test: test running compaction task abort tasks: fail if a task was aborted compaction: abort task manager compaction tasks	2023-11-28 09:08:04 +02:00
Nadav Har'El	8d040325ab	cql: fix SELECT toJson() or SELECT JSON of time column The implementation of "SELECT TOJSON(t)" or "SELECT JSON t" for a column of type "time" forgot to put the time string in quotes. The result was invalid JSON. This patch is a one-liner fixing this bug. This patch also removes the "xfail" marker from one xfailing test for this issue which now starts to pass. We also add a second test for this issue - the existing test was for "SELECT TOJSON(t)", and the second test shows that "SELECT JSON t" had exactly the same bug - and both are fixed by the same patch. We also had a test translated from Cassandra which exposed this bug, but that test continues to fail because of other bugs, so we just need to update the xfail string. The patch also fixes one C++ test, test/boost/json_cql_query_test.cc, which enshrined the wrong behavior - JSON output that isn't even valid JSON - and had to be fixed. Unlike the Python tests, the C++ test can't be run against Cassandra, and doesn't even run a JSON parser on the output, which explains how it came to enshrine wrong output instead of helping to discover the bug. Fixes #7988 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16121	2023-11-27 10:03:04 +02:00
Konstantin Osipov	f0aa325187	test: provide overview of the contents of test/ directory Fixes #16080 Closes scylladb/scylladb#16088	2023-11-26 15:51:07 +02:00
Marcin Maliszkiewicz	81be3e0935	test/alternator/run: port -h and --omit-scylla-output options from cql-pytest Closes scylladb/scylladb#16171	2023-11-26 13:52:01 +02:00
Aleksandra Martyniuk	9c2c964b8e	test: test abort of compaction task that isn't started yet Test whether a task whose parent was aborted has a proper status.	2023-11-24 19:25:27 +01:00
Aleksandra Martyniuk	8639eae0ce	test: test running compaction task abort Test whether a task which is aborted while running has a proper status.	2023-11-24 19:25:20 +01:00
Botond Dénes	a472700309	Merge 'Minor fixes and refactors' from Kamil Braun - remove some code that is obsolete in newer Scylla versions, - fix some minor bugs. These bugs appear to be benign, there are no known issues caused by them, but fixing them is a good idea nevertheless, - refactor some code for better maintainability. Parts of this PR were extracted from https://github.com/scylladb/scylladb/pull/15331 (which was merged but later reverted), parts of it are new. Closes scylladb/scylladb#16162 * github.com:scylladb/scylladb: test/pylib: log_browsing: fix type hint migration_manager: take `abort_source&` in get_schema_for_read/write migration_manager: inline merge_schema_in_background migration_manager: remove unused merge_schema_from overload migration_manager: assume `canonical_mutation` support migration_manager: add `std::move` to avoid a copy schema_tables: refactor `scylla_tables(schema_features)` schema_tables: pass `reload` flag when calling `merge_schema` cross-shard system_keyspace: fix outdated comment	2023-11-24 17:34:21 +02:00
Patryk Jędrzejczak	15d3ed4357	test: topology: update run_first lists `run_first` lists in `suite.yaml` files provide a simple way to shorten the tests' average running time by running the slowest tests at first. We update these lists, since they got outdated over time: - `test_topology_ip` was renamed to `test_replace` and changed suite, - `test_tablets` changed suite, - new slow tests were added: - `test_cluster_features`, - `test_raft_cluster_features`, - `test_raft_ignore_nodes`, - `test_read_repair`. Closes scylladb/scylladb#16104	2023-11-24 16:18:30 +01:00
Patryk Jędrzejczak	a8d06aa9fd	test: topology: add test_concurrent_bootstrap We add a test for concurrent bootstrap support in the raft-based topology. The plan is to make this test temporary. In the future, we will: - use ManagerClient.servers_add in other tests wherever possible, - start initial servers concurrently in all suites with initial_size > 0. So, this test will not test anything unique. We could make the changes proposed above now instead of adding this small test. However, if we did that and it turned out that concurrent bootstrap is flaky in CI, we would make almost every CI run fail with many failures. We want to avoid such a situation. Running only this test for some time in CI will reduce the risk and make investigating any potential failures easier.	2023-11-24 09:39:01 +01:00
Patryk Jędrzejczak	cd7b282db6	test: ManagerClient: introduce servers_add We add a new function - servers_add - that allows adding multiple servers concurrently to a cluster. It makes use of a concurrent bootstrap now supported in the raft-based topology. servers_add doesn't have the replace_cfg parameter. The reason is that we don't support concurrent replace operations, at least for now. There is an implementation detail in ScyllaCluster.add_servers. We cannot simply do multiple calls to add_server concurrently. If we did that in an empty cluster, every node would take itself as the only seed and start a new cluster. To solve this, we introduce a new field - initial_seed. It is used to choose one of the servers as a seed for all servers added concurrently to an empty cluster. Note that the add_server calls in asyncio.gather in add_servers cannot race with each other when setting initial_seed because there is only one thread. In the future, we will also start all initial servers concurrently in ScyllaCluster.install_and_start. The changes in this commit were designed in a way that will make changing install_and_start easy.	2023-11-24 09:39:01 +01:00
Patryk Jędrzejczak	aca90e6640	test: ManagerClient: introduce _create_server_add_data We introduce this function to avoid code duplication. After the following commits, it will also be used in the new ManagerClient.servers_add function.	2023-11-24 09:39:01 +01:00
Botond Dénes	c47a63835e	Merge 'test/sstable_compaction_test: check every sstable replaced sstable ' from Kefu Chai before this change, in sstable_run_based_compaction_test, we check every 4 sstables, to verify that we close the sstable to be replaced in a batch of 4. since the integer-based generation identifier is monotonically incremental, we can assume that the identifiers of sstables are like 0, 1, 2, 3, .... so if the compaction consumes sstable in a batch of 4, the identifier of the first one in the batch should always be the multiple of 4. unfortunately, this test does not work if we use uuid-based identifier. but if we take a closer look at how we create the dataset, we can establish the following facts: 1. the `compaction_descriptor` returned by `sstable_run_based_compaction_strategy_for_tests` never sets `owned_ranges` in the returned descriptor 2. in `compaction::setup_sstable_reader`, `mutation_reader::forward::no` is used, if `_owned_ranges_checker` is empty 3. `mutation_reader_merger` respects the `fwd_mr` passed to its ctor, so it closes current sstable immediately when the underlying mutation reader reaches the end of stream. in other words, we close every sstable once it is fully consumed in sstable_compaction_test. and the reason why the existing test passes is that we just sample the sstables whose generation id is a multiple of 4. what happens when we perform compaction in this test is: 1. replace 5 with 33, closing 5 2. replace 6 with 34, closing 6 3. replace 7 with 35, closing 7 4. replace 8 with 36, closing 8 << let's check here.. good, go on! 5. replace 13 with 37, closing 13 ... 8. replace 16 with 40, closing 16 << let's check here.. also, good, go on! so, in this change, we just check all old sstables, to verify that we close each of them once it is fully consumed.
Fixes https://github.com/scylladb/scylladb/issues/16073 Closes scylladb/scylladb#16074 * github.com:scylladb/scylladb: test/sstable_compaction_test: check every sstable replaced sstable test/sstable_compaction_test: s/old_sstables.front()/old_sstable/	2023-11-24 07:25:28 +02:00
Kamil Braun	35bb025f99	test/pylib: log_browsing: fix type hint	2023-11-23 17:23:47 +01:00
Raphael S. Carvalho	157a5c4b1b	treewide: Avoid using namespace sstables in header to avoid conflicts That's needed for compaction_group.hh to be included in headers. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-11-23 17:36:57 +02:00
Kamil Braun	c3257bf546	Revert "test: cql_test_env: Interrupt all components on cql_test_env teardown" This reverts commit `93ee7b7df9`. It's causing assertion failures when shutting down `cql_test_env` in boost unit tests: scylladb/scylladb#16144	2023-11-23 15:32:13 +01:00
Kamil Braun	03ecc8457c	Merge 'raft topology: reject replace if the node being replaced is not dead' from Patryk Jędrzejczak The replace operation is defined to succeed only if the node being replaced is dead. We should reject this operation when the failure detector considers the node being replaced alive. Apart from adding this change, this PR adds a test case - `test_replacing_alive_node_fails` - that verifies it. A few testing framework adjustments were necessary to implement this test and to avoid flakiness in other tests that use the replace operation after the change. From now, we need to ensure that all nodes see the node being replaced as dead before starting the replace. Otherwise, the check added in this PR could reject the replace. Additionally, this PR changes the replace procedure in a way that if the replacing node reuses the IP of the node being replaced, other nodes can see it as alive only after the topology coordinator accepts its join request. The replacing node may become alive before the topology coordinator checks if the node being replaced is dead. If that happens and the replacing node reuses the IP of the node being replaced, the topology coordinator cannot know which of these two nodes is alive and whether it should reject the join request. Fixes #15863 Closes scylladb/scylladb#15926 * github.com:scylladb/scylladb: test: add test_replacing_alive_node_fails raft topology: reject replace if the node being replaced is not dead raft topology: add the gossiper ref to topology_coordinator test: test_cluster_features: stop gracefully before replace test: decrease failure_detector_timeout_in_ms in replace tests test: move test_replace to topology_custom test: server_add: wait until the node being replaced is dead test: server_add: add support for expected errors raft topology: join: delay advertising replacing node if it reuses IP raft topology: join: fix a condition in validate_joining_node	2023-11-23 10:31:59 +01:00
Kefu Chai	55103f4a6b	hints: move formatter of db::hints::sync_point to test the operator<<() based formatter is only used in its test, so let's move it to where it is used. we can always bring it back later if it is required in other places. but better off implementing it as a fmt::formatter<> then. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16142	2023-11-23 11:22:31 +02:00
Tomasz Grabiec	b06a0078fb	Merge 'Support for sending tablet info to the drivers' from Sylwia Szunejko There is a need for sending tablet info to the drivers so they can be tablet aware. For the best performance we want to get this info lazily only when it is needed. The info is sent when the driver asks about information that a specific tablet contains and the request is directed to the wrong node/shard, so that the driver can use that information for every subsequent query. If we send the query to the wrong node/shard, we want to send the RESULT message with additional information about the tablet (replicas and token range) in custom_payload. Mechanism for sending custom_payload added. Sending custom_payload tested using three node cluster and cqlsh queries. I used RF=1 so choosing wrong node was testable. I also manually tested it with the python-driver and confirmed that the tablet info can be deserialized properly. Automatic tests added. Closes scylladb/scylladb#15410 * github.com:scylladb/scylladb: docs: add documentation about sending tablet info to protocol extensions Add tests for sending tablet info cql3: send tablet if wrong node/shard is used during modification statement cql3: send tablet if wrong node/shard is used during select statement locator: add function to check locality locator: add function to check if host is local transport: add function to add tablet info to the result_message transport: add support for setting custom payload	2023-11-22 17:44:07 +02:00
Botond Dénes	0ae1335daa	Revert "Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk" This reverts commit `11cafd2fc8`, reversing changes made to `2bae14f743`. Reverting because this series causes frequent CI failures, and the proposed quickfix causes other failures of its own. Fixes: #16113	2023-11-22 17:44:07 +02:00
Kefu Chai	48340380dd	scylla-sstable: print "validate" result in JSON instead of printing the result of the "validate" subcommand in a free-style plain text, let's print it using JSON. for two reasons: 1. it is simpler to consume the output with other tools and tests. 2. more consistent with other commands. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16105	2023-11-22 17:44:07 +02:00
Nadav Har'El	242a4b23c0	Merge 'tests: Skip unnecessary sleeps in cql_test_env teardown' from Tomasz Grabiec This PR contains two patches which get rid of unnecessary sleeps on cql_test_env teardown greatly reducing run time of tests. Reduces run time of `build/dev/test/boost/schema_change_test` from 90s to 6s. Closes scylladb/scylladb#16111 * github.com:scylladb/scylladb: test: cql_test_env: Interrupt all components on cql_test_env teardown tests: cql_test_env: Skip gossip shutdown sleep	2023-11-22 17:44:07 +02:00
Botond Dénes	b1a76ebb93	Merge 'Sanitize storage service init/deinit sequences' from Pavel Emelyanov Currently storage service starts too early and its initialization is split into several steps. This PR makes storage service start "late enough" and makes its initialization (minimally required before joining cluster) happen in one place. refs: #2795 refs: #2737 Closes scylladb/scylladb#16103 * github.com:scylladb/scylladb: storage_service: Drop (un)init_messaging_service_part() pair storage_service: Init/Deinit RPC handlers in constructor/stop storage_service: Dont capture container() on RPC handler storage_service: Use storage_service::_sys_dist_ks in some places storage_service: Add explicit dependency on system dist. keyspace storage_service: Rurn query processor pointer into reference storage_service: Add explicity query_processor dependency main: Start storage service later	2023-11-22 17:44:07 +02:00
sylwiaszunejko	207d673ad6	Add tests for sending tablet info	2023-11-22 09:23:43 +01:00
sylwiaszunejko	cea4c40685	cql3: send tablet if wrong node/shard is used during modification statement	2023-11-22 09:23:43 +01:00
Pavel Emelyanov	74329e5aee	test: Add object_store test to validate config reloading works The test case is - start scylla with broken object storage endpoint config - create and populate s3-backed keyspace - try flushing it (API call would hang, so do it in the background) - wait for a few seconds, then fix the config - wait for the flush to finish and stop scylla - start scylla again and check that the keyspace is properly populated Nice side effect of this test is that once flush fails (due to broken config) it tries to remove the not-yet-sealed sstables and (!) fails again, for the same reason. So during the restart there happen to be several sstables in "creating" state with no stored objects, so this additionally tests one more g.c. corner case Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-21 16:47:50 +03:00
Pavel Emelyanov	26f8202651	test: Add config update facility to test cluster The Cluster wrapper used by object_store test already has the ability to access cluster via CQL and via API. Add the sugar to make the cluster re-read its scylla.yaml and other configs Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-21 16:47:50 +03:00
Pavel Emelyanov	4a531e4129	test: Make S3_Server export config file as pathlib.Path The pylib minio server does that already. A test case added by the next patch would need to have both cases as path, not as string Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-21 16:47:50 +03:00
Pavel Emelyanov	210b01a5ce	config: Make object storage config updateable_value_source Now it's a plain updateable_value, but without the ..._source object the updateable_value is just a no-op value holder. In order for the observers to operate there must be the value source, updating it would update the attached updateable values _and_ notify the observers. In order for the config to be the u.v._source, config entries should be comparable to each other, thus the <=> operator for it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-21 16:47:50 +03:00
Pavel Emelyanov	855626f7de	s3/client: Map http exceptions into storage_io_error When an http request resolves with an exception it makes sense to translate the network exception into a storage exception, so that upper layers see it as some sort of IO error and not, suddenly, an http one. The translation is, for now, pretty simple: - 404 and 3xx -> ENOENT - 403(forbidden) and 401(unauthorized) -> EACCES - anything else -> EIO Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-21 16:47:50 +03:00
Patryk Jędrzejczak	566176bcd1	test: add test_replacing_alive_node_fails We add a test for the Raft-based topology's new feature - rejecting the replace operation if the node being replaced is considered alive by the failure detector. This test is not so fast, and it does not test any critical paths so we run it only in dev mode.	2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak	8605cdd9cd	test: test_cluster_features: stop gracefully before replace In one of the previous commits, we have made ManagerClient.server_add wait until all running nodes see the node being replaced as dead. Unfortunately, the waiting time is around 20 s if we stop the node being replaced ungracefully. We change the stop procedure to graceful to not slow down the test.	2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak	206a446a02	test: decrease failure_detector_timeout_in_ms in replace tests In one of the previous commits, we have made ManagerClient.server_add wait until all running nodes see the node being replaced as dead. Unfortunately, the waiting time can be around 20 s if we stop the node being replaced ungracefully. 20 s is the default value of the failure detector timeout. We don't want to slow down the replace operations this much for no good reason. We could use server_stop_gracefully instead of server_stop everywhere, but we should have at least a few replace tests with server_stop. For now, test_replace and test_raft_ignore_nodes will be these tests. To keep them reasonably fast, we decrease the failure_detector_timeout_in_ms value on all initial servers. We also skip test_replace in debug mode to avoid flakiness due to low failure_detector_timeout_in_ms (test_raft_ignore_nodes is already skipped).	2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak	7062ff145e	test: move test_replace to topology_custom In the following commit, we make all servers in test_replace use failure-detector-timeout-in-ms = 2000. Therefore, we need test_replace to be in a suite with initial_size equal to 0.	2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak	9775b1c12d	test: server_add: wait until the node being replaced is dead In the following commits, we make the topology coordinator reject join requests if the node being replaced is considered alive by the gossiper. Before making this change, we need to adapt the testing framework so that we don't have flaky replace operations that fail because the node being replaced hasn't been marked as dead yet. We achieve this by waiting until all other running nodes see the node being replaced as dead in all replace operations.	2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak	18ed89f760	test: server_add: add support for expected errors After this change, if we try to add a server and it fails with an expected error, the add_server function will not throw. Also, the server will be correctly installed and stopped. Two issues are motivating this feature. The first one is that if we want to add a server while expecting an error, we have to do it in two steps: - call server_add with the start parameter set to False, - call server_start with the expected_error parameter. It is quite inconvenient. The second one is that we want to be able to test the replace operation when it is considered incorrect, for example when we try to replace an alive node. To do this, we would have to remove some assertions from ScyllaCluster.add_server. However, we should not remove them because they give us clear information when we write an incorrect test. After adding the expected_error parameter, we can ignore these assertions only when we expect an error. In this way, we enable testing failing replace operations without sacrificing the testing framework's protection.	2023-11-21 12:39:16 +01:00
Tomasz Grabiec	93ee7b7df9	test: cql_test_env: Interrupt all components on cql_test_env teardown This should interrupt all sleeps in component teardown. Before this patch, there was a 1s sleep on gossiper shutdown, which I don't know where it comes from. After the patch there is no such sleep.	2023-11-21 12:22:32 +01:00
Tomasz Grabiec	7f3a74efab	tests: cql_test_env: Skip gossip shutdown sleep Removes unnecessary 2s sleep on each cql test env teardown.	2023-11-21 12:22:24 +01:00
Botond Dénes	65e42e4166	Merge 'mutation_query: properly send range tombstones in reverse queries' from Michał Chojnowski reconcilable_result_builder passes range tombstone changes to _rt_assembler using table schema, not query schema. This means that a tombstone with bounds (a; b), where a < b in query schema but a > b in table schema, will not be emitted from mutation_query. This is a very serious bug, because it means that such tombstones in reverse queries are not reconciled with data from other replicas. If any queried replica has a row, but not the range tombstone which deleted the row, the reconciled result will contain the deleted row. In particular, range deletes performed while a replica is down will not later be visible to reverse queries which select this replica, regardless of the consistency level. As far as I can see, this doesn't result in any persistent data loss. Only in that some data might appear resurrected to reverse queries, until the relevant range tombstone is fully repaired. This series fixes the bug and adds a minimal reproducer test. Fixes #10598 Closes scylladb/scylladb#16003 * github.com:scylladb/scylladb: mutation_query_test: test that range tombstones are sent in reverse queries mutation_query: properly send range tombstones in reverse queries	2023-11-21 09:19:14 +02:00
Nadav Har'El	0fd10690d4	Merge 'When creating S3-backed keyspace, check the endpoint instantly' from Pavel Emelyanov Currently CREATE KEYSPACE ... WITH STORAGE = { 'type' = 'S3' ... } will create keyspace even if the backend configuration is "invalid" in the sense that the requested endpoint is not known to scylla via object_storage.yaml config file. The first time after that when this misconfiguration will reveal itself is when flushing a memtable (see #15635), but it's good to know the endpoint is not configured earlier than that. fixes: #15074 Closes scylladb/scylladb#16038 * github.com:scylladb/scylladb: test: Add validation of misconfigured storage creation sstables: Throw early if endpoint for keyspace is not configured replica: Move storage options validation to sstables manager test/cql-pytest/test_keyspaces: Move DESCRIBE case to object store sstables: Add has_endpoint_client() helper to manager	2023-11-20 21:12:48 +02:00
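
The status translation listed in commit 855626f7de (s3/client: Map http exceptions into storage_io_error) can be illustrated with a minimal Python sketch. The function name `http_status_to_errno` is hypothetical and exists only for this example; the actual change lives in Scylla's C++ S3 client and may differ in detail.

```python
import errno


def http_status_to_errno(status: int) -> int:
    """Translate an HTTP status code into an errno value.

    Follows the table from the commit message:
    404 and 3xx -> ENOENT, 401/403 -> EACCES, anything else -> EIO.
    Hypothetical helper, not part of Scylla.
    """
    if status == 404 or 300 <= status < 400:
        return errno.ENOENT
    if status in (401, 403):
        return errno.EACCES
    return errno.EIO
```

With such a mapping, a caller handling storage errors only needs to inspect the errno value and does not have to know that the backend speaks HTTP.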

5892 Commits