scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 13:37:04 +00:00

Author	SHA1	Message	Date
Kamil Braun	4376854473	schema_tables: remove default value for `reload` in `merge_schema` To avoid bugs like the one fixed in the previous commit.	2023-09-15 13:04:04 +02:00
Petr Gusev	6c3cc7d6e0	test_fence_hints: increase timeouts We saw failures on CI in debug mode, probably the machine running the test is shared, and we starved for some resources. Fix #15285 Closes #15388	2023-09-14 16:22:50 +02:00
Avi Kivity	d9a453e72e	Merge 'Introduce a scylla-native nodetool' from Botond Dénes This series introduces a scylla-native nodetool. It is invokable via the main scylla executable, like the other native tools we have. It uses seastar's new `http::client` to connect to the specified node and execute the desired commands. For now a single command is implemented: `nodetool compact`, invokable as `scylla nodetool compact`. Once all the boilerplate is added to create a new tool, implementing a single command is not too bad, in terms of code-bloat. Certainly not as clean as a python implementation would be, but good enough. The advantages of a C++ implementation are that all of us in the core team know C++ and that it is shipped right as part of the scylla executable. Closes #14841 * github.com:scylladb/scylladb: test: add nodetool tests test.py: add ToolTestSuite and ToolTest tools/scylla-nodetool: implement compact operation tools/scylla-nodetool: implement basic scylla_rest_api_client tools: introduce scylla-nodetool utils: export dns_connection_factory from s3/client.cc to http.hh utils/s3/client: pass logger to dns_connection_factory in constructor tools/utils: tool_app_template::run_async(): also detect --help* as --help	2023-09-14 17:20:40 +03:00
Avi Kivity	a3d73bfba7	Merge 'Add support for decommission with tablets' from Tomasz Grabiec Load balancer will recognize decommissioning nodes and will move tablet replicas away from such nodes with highest priority. Topology changes have now an extra step called "tablet draining" which calls the load balancer. The step will execute tablet migration track as long as there are nodes which require draining. It will not do regular load balancing. If load balancer is unable to find new tablet replicas, because RF cannot be met or availability is at risk due to insufficient node distribution in racks, it will throw an exception. Currently, topology change will retry in a loop. We should make this error cause topology change to be aborted. There is no infrastructure for aborts yet, so this is not implemented. Closes #15197 * github.com:scylladb/scylladb: tablets, raft topology: Add support for decommission with tablets tablet_allocator: Compute load sketch lazily tablet_allocator: Set node id correctly tablet_allocator: Make migration_plan a class tablets: Implement cleanup step storage_service, tablets: Prevent stale RPCs from running beyond their stage locator: Introduce tablet_metadata_guard locator, replica: Add a way to wait for table's effective_replication_map change storage_service, tablets: Extract do_tablet_operation() from stream_tablet() raft topology: Add break in the final case clause raft topology: Fix SIGSEGV when trace-level logging is enabled raft topology: Set node state in topology raft topology: Always set host id in topology	2023-09-14 17:16:23 +03:00
Kamil Braun	0564d000c6	Merge 'Validate compaction strategy options' from Aleksandra Martyniuk When a column family's schema is changed new compaction strategy type may be applied. To make sure that it will behave as expected, compaction strategy need to contain only the allowed options and values. Methods throwing exception on invalid options are added. Fixes: #2336. Closes #13956 * github.com:scylladb/scylladb: test: add test for compaction strategy validation compaction: unify exception messages compaction: cql3: validate options in check_restricted_table_properties compaction: validate options used in different compaction strategies compaction: validate common compaction strategy options compaction: split compaction_strategy_impl constructor compaction: validate size_tiered_compaction_strategy specific options compaction: validate time_window_compaction_strategy specific options compaction: add method to validate min and max threshold compaction: split size_tiered_compaction_strategy_options constructor compaction: make compaction strategy keys static constexpr compaction: use helpers in validate_* functions compaction: split time_window_compaction_strategy_options construtor compaction: add validate method to compaction_strategy_options time_window_compaction_strategy_options: make copy and move-able size_tiered_compaction_strategy_options: make copy and move-able	2023-09-14 16:11:52 +02:00
Tomasz Grabiec	551cc0233d	tablets, raft topology: Add support for decommission with tablets Load balancer will recognize decommissioning nodes and will move tablet replicas away from such nodes with highest priority. Topology changes have now an extra step called "tablet draining" which calls the load balancer. The step will execute tablet migration track as long as there are nodes which require draining. It will not do regular load balancing. If load balancer is unable to find new tablet replicas, because RF cannot be met or availability is at risk due to insufficient node distribution in racks, it will throw an exception. Currently, topology change will retry in a loop. We should make this error cause topology change to be paused so that admin becomes aware of the problem and issues an abort on the topology change. There is no infrastructure for aborts yet, so this is not implemented.	2023-09-14 13:05:49 +02:00
Tomasz Grabiec	389573543e	tablet_allocator: Make migration_plan a class It will be extended with more fields so that load balancer can communicate more information to the coordinator.	2023-09-14 13:04:47 +02:00
Botond Dénes	3e2d8ca94d	test: add nodetool tests Testing the new scylla nodetool tool. The tests can be run against both implementations of nodetool: the scylla-native one and the cassandra one. They all pass with both implementations.	2023-09-14 05:25:14 -04:00
Kamil Braun	bff9cedef9	Merge 'system_keyspace: remove flushes when writing to system tables' from Petr Gusev There are several system tables with strict durability requirements. This means that if we have written to such a table, we want to be sure that the write won't be lost in case of node failure. We currently accomplish this by accompanying each write to these tables with `db.flush()` on all shards. This is expensive, since it causes all the memtables to be written to sstables, which causes a lot of disk writes. This overheads can become painful during node startup, when we write the current boot state to `system.local`/`system.scylla_local` or during topology change, when `update_peer_info`/`update_tokens` write to `system.peers`. In this series we remove flushes on writes to the `system.local`, `system.peers`, `system.scylla_local` and `system.cdc_local` tables and start using schema commitlog for durability. Fixes: #15133 Closes #15279 * github.com:scylladb/scylladb: system_keyspace: switch CDC_LOCAL to schema commitlog system_keyspace: scylla_local: use schema commitlog database.cc: make _uses_schema_commitlog optional system_keyspace: drop load phases database.hh: add_column_family: add readonly parameter schema_tables: merge_tables_and_views: delay events until tables/views are created on all shards system_keyspace: switch system.peers to schema commitlog system_keyspace: switch system.local to schema commitlog main.cc: move schema commitlog replay earlier sstables_format_selector: extract listener sstables_format_selector: wrap when_enabled with seastar::async main.cc: inline and split system_keyspace.setup system_keyspace: refactor save_system_schema function system_keyspace: move initialize_virtual_tables into virtual_tables.hh system_keyspace: remove unused parameter config.cc: drop db::config::host_id main.cc:: extract local_info initialization into function schema.cc: check static_props for sanity system_keyspace: set null sharder when configuring 
schema commitlog system_keyspace: rename static variables system_keyspace: remove redundant wait_for_sync_to_commitlog	2023-09-14 10:39:20 +02:00
Botond Dénes	cc16502691	Merge 'Add metrics to S3 client' from Pavel Emelyanov The added metrics include: - http client metrics, which include the number of connections, the number of active connections and the number of new connections made so far - IO metrics that mimic those for traditional IO -- total number of object read/write ops, total number of get/put/uploaded bytes and individual IO request delay (round-trip, including body transfer time) fixes: #13369 Closes #14494 * github.com:scylladb/scylladb: s3/client: Add IO stats metrics s3/client: Add HTTP client metrics s3/client: Split make_request() s3/client: Wrap http client with struct group_client s3/client: Move client::stats to namespace scope s3/client: Keep part size local variable	2023-09-14 09:49:08 +03:00
Petr Gusev	ce0ee32d5a	database.cc: make _uses_schema_commitlog optional This field on the null shard is properly initialized in maybe_init_schema_commitlog function, until then we can't make decisions based on its value. This problem can happen e.g. if add_column_family function is called with readonly=false before maybe_init_schema_commitlog. It will call commitlog_for to pass the commitlog to mark_ready_for_writes and commitlog_for reads _uses_schema_commitlog. In this commit we add protection against this case - we trigger internal_error if _uses_schema_commitlog is read before it is initialized. maybe_init_schema_commitlog() was added to cql_test_env to make boost tests work with the new invariant.	2023-09-13 23:17:20 +04:00
Petr Gusev	beb29f094b	system_keyspace: drop load phases We want to switch system.scylla_local table to the schema commitlog, but load phases hamper here - schema commitlog is initialized after phase1, so a table which is using it should be moved to phase2, but system.scylla_local contains features, and we need them before schema commitlog initialization for SCHEMA_COMMITLOG feature. In this commit we are taking a different approach to loading system tables. First, we load them all in one pass in 'readonly' mode. In this mode, the table cannot be written to and has not yet been assigned a commit log. To achieve this we've added _readonly bool field to the table class, it's initialized to true in table's constructor. In addition, we changed the table constructor to always assign nullptr to commitlog, and we trigger an internal error if table.commitlog() property is accessed while the table is in readonly mode. Then, after triggering on_system_tables_loaded notifications on feature_service and sstable_format_selector, we call system_keyspace::mark_writable and eventually table::mark_ready_for_writes which selects the proper commitlog and marks the table as writable. In sstable_compaction_test we drop several mark_ready_for_writes calls since they are redundant, the table has already been made writable in env.make_table_for_tests call. The table::commitlog function either returns the current commitlog or causes an error if the table is readonly. This didn't work for virtual tables, since they never called mark_ready_for_writes. In this commit we add this call to initialize_virtual_tables.	2023-09-13 23:17:20 +04:00
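The readonly-then-writable lifecycle described in the commit above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `table_sketch` and `commitlog_sketch` are hypothetical stand-ins rather than the real ScyllaDB classes, and `std::logic_error` takes the place of the internal-error mechanism the commit mentions.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical placeholder for a commitlog object.
struct commitlog_sketch {};

// Hypothetical, heavily simplified table: it starts out readonly with no
// commitlog, and only becomes writable once a commitlog has been chosen.
class table_sketch {
    bool _readonly = true;                   // set to true on construction
    commitlog_sketch* _commitlog = nullptr;  // always null until marked writable
public:
    bool readonly() const { return _readonly; }

    // Accessing the commitlog of a readonly table is treated as a bug.
    // (Assumption: an exception stands in for the real error reporting.)
    commitlog_sketch* commitlog() const {
        if (_readonly) {
            throw std::logic_error("commitlog() called on a readonly table");
        }
        return _commitlog;
    }

    // Select the commitlog and flip the table to writable in one step.
    void mark_ready_for_writes(commitlog_sketch* cl) {
        _commitlog = cl;
        _readonly = false;
    }
};
```

In this sketch a caller that touches `commitlog()` too early fails loudly instead of silently observing a null or wrong commitlog, which is the property the commit message is after.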
Petr Gusev	47ffc66c7f	database.hh: add_column_family: add readonly parameter Previously, creating a table or view in schema_tables.cc/merge_tables_and_views was a two-step process: first adding a column family (add_column_family function) and then marking it as ready for writes (mark_table_as_writable). There is a yield between these stages, which means someone could see a table or view for which the mark_table_as_writable method had not yet been called, and start writing to it. This problem was demonstrated by materialised view dtests. A view is created on all nodes. On some nodes it will be created earlier than on others and the view rebuild process will start writing data to that view on other nodes, where mark_table_as_writable has not yet been called. In this patch we solve this problem by adding a readonly parameter to the add_column_family method. When loading tables from disk, this flag is set to true and mark_table_as_writable is called only after all sstables have been loaded. When creating a new table, this flag is set to false, mark_table_as_writable is called from inside add_column_family and the new table becomes visible already as writable.	2023-09-13 23:17:20 +04:00
Petr Gusev	e395086557	system_keyspace: move initialize_virtual_tables into virtual_tables.hh This is a readability refactoring commit without observable changes in behaviour. initialize_virtual_tables logically belongs to virtual_tables module, and it allows to make other functions in virtual_tables.cc (register_virtual_tables, install_virtual_readers) local to the module, which simplifies the matters a bit. all_virtual_tables() is not needed anymore, all the references to registered virtual tables are now local to virtual_tables module and can just use virtual_tables variable directly.	2023-09-13 23:00:15 +04:00
Petr Gusev	c4787a160b	system_keyspace: remove unused parameter	2023-09-13 23:00:15 +04:00
Petr Gusev	b90011294d	config.cc: drop db::config::host_id In this refactoring commit we remove the db::config::host_id field, as it's hacky and duplicates token_metadata::get_my_id. Some tests want specific host_id, we add it to cql_test_config and use in cql_test_env. We can't pass host_id to sstables_manager by value since it's initialized in database constructor and host_id is not loaded yet. We also prefer not to make a dependency on shared_token_metadata since in this case we would have to create artificial shared_token_metadata in many tools and tests where sstables_manager is used. So we pass a function that returns host_id to sstables_manager constructor.	2023-09-13 23:00:15 +04:00
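The last point of the commit above, passing a function that returns the host id instead of the value itself, can be sketched as follows. This is only an illustration under stated assumptions: `host_id` is a plain string here, and `manager_sketch` and `node_state` are hypothetical names, not the real `sstables_manager` or token metadata types.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a host identifier; not the real ScyllaDB type.
using host_id = std::string;

// Hypothetical component constructed before the host id is known.
// It stores a getter and resolves the id only when it is actually needed.
class manager_sketch {
    std::function<host_id()> _get_host_id;
public:
    explicit manager_sketch(std::function<host_id()> get_host_id)
        : _get_host_id(std::move(get_host_id)) {}

    // The id is looked up at call time, not at construction time.
    host_id current_host_id() const { return _get_host_id(); }
};

// Hypothetical owner of the id, which is loaded some time after startup.
struct node_state {
    std::optional<host_id> id;

    host_id get() const {
        if (!id) {
            throw std::logic_error("host id not loaded yet");
        }
        return *id;
    }
};
```

A caller would construct the manager with a lambda such as `[&st] { return st.get(); }`, so the manager needs no dependency on the type that owns the id.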
Avi Kivity	0a5d9532f9	Merge 'Sanitize batchlog manager start/stop' from Pavel Emelyanov This code is now spread over main and differs in cql_test_env. The PR unifies both places and makes the manager start-stop look standard refs: #2795 Closes #15375 * github.com:scylladb/scylladb: batchlog_manager: Remove start() method batchlog_manager: Start replay loop in constructor main, cql_test_env: Start-stop batchlog manager in one "block" batchlog_manager: Move shard-0 check into batchlog_replay_loop() batchlog_manager: Fix drain() reentrability	2023-09-13 18:20:56 +03:00
Aleksandra Martyniuk	14598fdfdd	test: add test for compaction strategy validation	2023-09-13 16:59:40 +02:00
Botond Dénes	7e7101c180	Revert "Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes" This reverts commit `628e6ffd33`, reversing changes made to `45ec76cfbf`. The test included with this PR is flaky and often breaks CI. Revert while a fix is found. Fixes: #15371	2023-09-13 10:45:37 +03:00
Pavel Emelyanov	512465288f	main, cql_test_env: Start-stop batchlog manager in one "block" Currently starting and stopping of b.m. is spread over main(). Keep it close to each other. Another trickery here is that calling b.m.::start() can only be done after joining the cluster, because this start() spawns replay loop which, in turn calls token_metadata::count_normal_token_owners() and if the latter returns zero, the b.m. code uses it as a fraction denominator and crashes. With the above in mind, cql_test_env should start batchlog manager after it "joins the ring" too. For now it doesn't make any difference, but next patch will make use of it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-12 16:33:31 +03:00
Avi Kivity	89ba4e4a5e	Merge 'Stop using anonymous minio bucket for tests' from Pavel Emelyanov Currently minio starts with a bucket that has public anonymous access. Respectively, all tests use unsigned S3 requests. That was done for simplicity, and its better to apply some policy to the bucket and, consequentially, make tests sign their requests. Other than the obvious benefit that we test requests signing in unit tests, another goal of this PR is to make it possible to simulate and test various error paths locally, e.g. #13745 and #13022 Closes #14525 * github.com:scylladb/scylladb: test/s3: Remove AWS_S3_EXTRA usage test/s3: Run tests over non-anonymous bucket test/minio: Create random temp user on start code: Rename S3_PUBLIC_BUCKET_FOR_TEST	2023-09-11 23:12:56 +03:00
Tomasz Grabiec	f77e90a0f0	tests: test_tablets: Reconnect the driver after server restart This is a workaround for the flakiness of the test where INSERT statements following the rolling restart fail with "No host available" exception. The hypothesis is that those INSERTS race with driver reconnecting to the cluster and if INSERTs are attempted before reconnection is finished, the driver will refuse to execute the statements. The real fix should be in the driver to join with reconnections but before that is ready we want to fix CI flakiness. Refs #14746 Closes #15355	2023-09-11 21:58:46 +03:00
Avi Kivity	628e6ffd33	Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes Currently, mutation query on replica side will not respond with a result which doesn't have at least one live row. This causes problems if there is a lot of dead rows or partitions before we reach a live row, which stem from the fact that resulting reconcilable_result will be large: 1. Large allocations. Serialization of reconcilable_result causes large allocations for storing result rows in std::deque 2. Reactor stalls. Serialization of reconcilable_result on the replica side and on the coordinator side causes reactor stalls. This impacts not only the query at hand. For 1M dead rows, freezing takes 130ms, unfreezing takes 500ms. Coordinator does multiple freezes and unfreezes. The reactor stall on the coordinator side is >5s 3. Too large repair mutations. If reconciliation works on large pages, repair may fail due to too large mutation size. 1M dead rows is already too much: Refs https://github.com/scylladb/scylladb/issues/9111. This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows. This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones,like regular queries do. My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition): * Node1: 1 live row, 1M dead rows * Node2: 1M dead rows, 1 live row This was designed to trigger reconciliation right from the very start of the query. 
Before: ``` Running query (node2, CL=ONE, cold cache) Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (node2, CL=ONE, hot cache) Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (all-nodes, CL=ALL, reconcile, cold-cache) Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)] ``` After: ``` Running query (node2, CL=ONE, cold cache) Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (node2, CL=ONE, hot cache) Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (all-nodes, CL=ALL, reconcile, cold-cache) Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)] ``` Non-reconciling queries have almost identical durations (differences of a few ms can be observed between runs). Note how in the after case, the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than 1/4 of the before). 
Refs https://github.com/scylladb/scylladb/issues/7929 Refs https://github.com/scylladb/scylladb/issues/3672 Refs https://github.com/scylladb/scylladb/issues/7933 Fixes https://github.com/scylladb/scylladb/issues/9111 Closes #14923 * github.com:scylladb/scylladb: test/topology_custom: add test_read_repair.py replica/mutation_dump: detect end-of-page in range-scans tools/scylla-sstable: write: abort parser thread if writing fails test/pylib: add REST methods to get node exe and workdir paths test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction} service/storage_proxy: add trace points for the actual read executor type service/storage_proxy: add trace points for read-repair storage_proxy: Add more trace-level logging to read-repair database: Fix accounting of small partitions in mutation query database, storage_proxy: Reconcile pages with no live rows incrementally	2023-09-11 19:20:19 +03:00
Nadav Har'El	45ec76cfbf	Merge 'Enlighten native-transport shutdown' from Pavel Emelyanov When the `nodetool disablebinary` command executes, its handler aborts listening sockets, shuts down all client connections _and_ (!) then waits for the connections to stop existing. Effectively the command tries to make sure that no activity initiated by a CQL query continues, even though the client would never see its result (client sockets are closed). This sometimes makes the disablebinary command hang for a long time, which is not really nice. The proposal is to wait for the connections to terminate in the background. So once the disablebinary command exits, what's guaranteed is that all client connections are aborted and new connections are not admitted, but some activity started by them may still be running (e.g. up until `nodetool drain` is issued). Driver-side sockets won't get the queries' results anyway. The behavior of `disablebinary` is not documented wrt whether it should wait for CQL processing to stop or not, so technically we're not breaking anything. However, it can happen that it's a disruptive change and some setups may behave differently after it. refs: #14031 refs: #14711 Closes #14743 * github.com:scylladb/scylladb: test/cql-pytest: Add enable\|disable-binary test case test.py: Add suite option to auto-dirty cluster after test test/pylib: Add nodetool enable\|disable-binary commands transport: Shutdown server on disablebinary generic_server: Introduce shutdown() generic_server: Decouple server stopped from connection stopped transport/controller: Coroutinize do_stop_server() transport/controller: Coroutinize stop_server()	2023-09-11 17:54:52 +03:00
Pavel Emelyanov	821a9c1fd4	test/cql-pytest: Add enable\|disable-binary test case The test checks that `nodetool disablebinary` makes subsequent queries fail and `nodetool enablebinary` lets clients establish new connections. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-11 17:38:49 +03:00
Pavel Emelyanov	2c3b30b395	test/pylib: Add nodetool enable\|disable-binary commands Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-11 17:37:48 +03:00
Benny Halevy	7119c1d8cc	token_metadata: update_topology: make endpoint_dc_rack arg optional It's better to pass a disengaged optional when the caller doesn't have the information rather than passing the default dc_rack location so the latter will never implicitly override a known endpoint dc/rack location. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15300	2023-09-11 16:16:19 +02:00
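The idea in the commit above, that a disengaged optional means "location unknown" and must never overwrite a known dc/rack, can be sketched like this. The types and the fallback are assumptions for illustration only: `dc_rack` and `topology_sketch` are hypothetical, simplified stand-ins, and falling back to a default location for a previously unseen endpoint is this sketch's choice, not something the commit states.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical simplified location type.
struct dc_rack {
    std::string dc;
    std::string rack;
};

// Hypothetical topology map keyed by endpoint name.
class topology_sketch {
    std::map<std::string, dc_rack> _locations;
public:
    // An engaged optional carries a real location and is always applied.
    // A disengaged optional means the caller does not know the location,
    // so an existing entry is left untouched.
    void update(const std::string& endpoint, std::optional<dc_rack> loc) {
        if (loc) {
            _locations.insert_or_assign(endpoint, *loc);
        } else if (_locations.find(endpoint) == _locations.end()) {
            // Assumption of this sketch: unseen endpoints get a default.
            _locations.emplace(endpoint, dc_rack{"default_dc", "default_rack"});
        }
    }

    const dc_rack& location(const std::string& endpoint) const {
        return _locations.at(endpoint);
    }
};
```

Had the caller passed the default location by value instead, the second kind of update would be indistinguishable from a real one and could clobber a known dc/rack.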
Botond Dénes	f770ff7a2b	test/topology_custom: add test_read_repair.py	2023-09-11 07:07:12 -04:00
Botond Dénes	b55cead5cd	replica/mutation_dump: detect end-of-page in range-scans The current read-loop fails to detect end-of-page and if the query result builder cuts the page, it will just proceed to the next partition. This will result in distorted query results, as the result builder will request the consumption to stop after each clustering row. To fix, check if the page was cut before moving on to the next partition. A unit test reproducing the bug was also added.	2023-09-11 07:02:14 -04:00
Botond Dénes	46e37436d0	test/pylib: add REST methods to get node exe and workdir paths	2023-09-11 07:02:14 -04:00
Botond Dénes	dc269cb6bd	test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction} To support the equivalent (roughly) of the following nodetool commands: * nodetool refresh * nodetool flush * nodetool compact	2023-09-11 07:01:20 -04:00
Botond Dénes	b062b245ad	Merge 'Don't cache dc:rack on system keyspace local cache' from Pavel Emelyanov The local node's dc:rack pair is cached on system keyspace on start. However, most of other code don't need it as they get dc:rack from topology or directly from snitch. There are few places left that still mess with sysks cache, but they are easy to patch. So after this patch all the core code uses two sources of dc:rack -- topology / snitch -- instead of three. Closes #15280 * github.com:scylladb/scylladb: system_keyspace: Don't require snitch argument on start system_keyspace: Don't cache local dc:rack pair system_keyspace: Save local info with explicit location storage_service: Get endpoint location from snitch, not system keyspace snitch: Introduce and use get_location() method repair: Local location variables instead of system keyspace's one repair: Use full endpoint location instead of datacenter part	2023-09-11 10:26:26 +03:00
Nadav Har'El	ea56c8efcd	test/alternator: reduce code duplication in test for list_append() A reviewer noted that test_update_expression_list_append_non_list_arguments has too much code duplication - the same long API call to run "SET a = list_append(...)" was repeated many times. So in this patch we add a short inner function "try_list_append" to avoid this duplication. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes: #15298	2023-09-11 10:09:35 +03:00
Botond Dénes	7385f93816	Merge 'Task manager repair tasks progress' from Aleksandra Martyniuk Find progress of repair tasks based on the number of ranges that have been repaired. Fixes: [#1156](https://github.com/scylladb/scylla-enterprise/issues/1156). Closes #14698 * github.com:scylladb/scylladb: test: repair tasks test repair: add methods making repair progress more precise tasks: make progress related methods virtual repair: add get_progress method to shard_repair_task_impl repair: add const noexcept qualifiers to shard_repair_task_impl::ranges_size() repair: log a name of a particular table repair is working on tasks: delete move and copy constructors from task_manager::task::impl	2023-09-11 09:32:23 +03:00
Kamil Braun	26d9a82636	Merge 'raft topology: replace publish_cdc_generation with a bg fiber' from Patryk Jędrzejczak Currently, the topology coordinator has the `topology::transition_state::publish_cdc_generation` state responsible for publishing the already created CDC generations to the user-facing description tables. This process cannot fail as it would cause some CDC updates to be missed. On the other hand, we would like to abort the `publish_cdc_generation` state when bootstrap aborts. Of course, we could also wait until handling this state finishes, even in the case of the bootstrap abort, but that would be inefficient. We don't want to unnecessarily block topology operations by publishing CDC generations. The solution proposed by this PR is to remove the `publish_cdc_generation` state completely and introduce a new background fiber of the topology coordinator -- `cdc_generation_publisher` -- that continually publishes committed CDC generations. Apart from introducing the CDC generation publisher, we add `test_cdc_generation_publishing.py` that verifies its correctness and we adapt other CDC tests to the new changes. Fixes #15194 Closes #15281 * github.com:scylladb/scylladb: test: test_cdc: introduce wait_for_first_cdc_generation test: move cdc_streams_check_and_repair check test: add test_cdc_generation_publishing docs: remove information about publish_cdc_generation raft topology: introduce the CDC generation publisher system_keyspace: load unpublished_cdc_generations to topology raft topology: mark committed CDC generations as unpublished raft topology: add unpublished_cdc_generations to system.topology	2023-09-08 15:08:41 +02:00
Kamil Braun	8bff5843b5	Merge 'test: topology: add tests for gossiper/endpoint/live and gossiper/endpoint/down' from Aleksandra Martyniuk Add tests for gossiper/endpoint/live and gossiper/endpoint/down which run only in release mode. Enable test_remove_node_with_concurrent_ddl and fix types and variables names used by it, so that they can be reused in gossiper test. Fixes: #15223. Closes #15244 * github.com:scylladb/scylladb: test: topology: add gossiper test test: fix types and variable names in wait_for_host_down	2023-09-08 12:43:11 +02:00
Patryk Jędrzejczak	23a4557662	test: test_cdc: introduce wait_for_first_cdc_generation After introducing the CDC generation publisher, test_cdc_log_entries_use_cdc_streams could (at least in theory) fail by accessing system_distributed.cdc_streams_descriptions_v2 before the first CDC generation has been published. To avoid flakiness, we simply wait until the first CDC generation is published in a new function -- wait_for_first_cdc_generation.	2023-09-08 09:05:01 +02:00
Patryk Jędrzejczak	3a2c080cbe	test: move cdc_streams_check_and_repair check The part of test_topology_ops that tests the cdc_streams_check_and_repair request could (at least in theory) fail on `assert(len(gen_timestamps) + 1 == len(new_gen_timestamps))` after introducing the CDC generation publisher because we can no longer assume that all previously committed CDC generations have been published before sending the request. To prevent flakiness, we move this part of the test to test_cdc_generations_are_published. This test allows for ensuring that all previous CDC generations have been published. Additionally, checking cdc_streams_check_and_repair there is simpler and arguably fits the test better.	2023-09-08 09:05:01 +02:00
Patryk Jędrzejczak	4ee68a47bb	test: add test_cdc_generation_publishing We add two test cases that test the new CDC generation publisher to detect potential bugs like incorrect order of publications or not publishing some generations at all. The purpose of the second test case -- test_multiple_unpublished_cdc_generations -- is to enforce and test a scenario when there are multiple unpublished CDC generations at the same time. We expect that this is a rare case. The main fiber of the topology coordinator would have to make much more progress (like finishing two bootstraps) than the CDC generation publisher fiber. Since multiple unpublished CDC generations might never appear in other tests but could be handled incorrectly, having such a test is valuable.	2023-09-08 09:05:01 +02:00
Nadav Har'El	42e26ab13b	Merge 'Explicitly use do_with_cql_env_thread in query test' from Pavel Emelyanov Some tests use non-threaded do_with_cql_env() and wrap the inner lambda with seastar::async(). The cql env already provides a helper for that Closes #15305 * github.com:scylladb/scylladb: cql_query_test: Fix indentation after previous patch cql_query_test: Use do_with_cql_env_thread() explicitly	2023-09-07 11:54:54 +03:00
Pavel Emelyanov	4dc4f65b18	test/s3: Remove AWS_S3_EXTRA usage Now that the keys and region can be configured with "standard" environment variables, the old custom one can be removed. No automation uses it; it was purely support for manual testing of a client against AWS's S3 server Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-07 11:16:13 +03:00
Pavel Emelyanov	1d00cc5baa	test/s3: Run tests over non-anonymous bucket Currently minio applies anonymous public policy for the test bucket and all tests just use unsigned S3 requests. This patch generates a policy for the temporary minio user and removes the anon public one. All tests are updated respectively to use the provided key:secret pair. The use-https bit is off by default as minio still starts with plain http. That's OK for now, all tests are local and have no secret data anyway Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-07 11:16:13 +03:00
Pavel Emelyanov	bff8064abd	test/minio: Create random temp user on start The user is going to have rights to access the test bucket. For now just create one and export the tests via environment Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-07 10:40:12 +03:00
Pavel Emelyanov	e8e8539c7c	code: Rename S3_PUBLIC_BUCKET_FOR_TEST The bucket is going to stop being public, rename the env variable in advance to make the essential patch smaller Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-07 10:25:53 +03:00
Pavel Emelyanov	627c1932e4	s3/client: Move client::stats to namespace scope The stats is stats about object, not about client, so it's better if it lives in namespace scope. Also it will avoid conflicts with client stats that will be reported as metrics (later patch) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-07 09:25:00 +03:00
Nadav Har'El	c52e0fd333	test/alternator: avoid warnings about unverified HTTPS The Alternator tests can run against HTTPS - namely when using test/alternator/run with the "--https" option (local Alternator configured with HTTPS) or "--aws" option (DynamoDB, using HTTPS). In some cases we make these HTTPS requests with verify=False, to avoid checking the SSL certificates. E.g., this is necessary for Alternator with a self-signed certificate. Unfortunately, the urllib3 library adds an ugly warning message when SSL certificate verification is disabled. In the past we tried to disable these warnings, using the documented urllib3.disable_warnings() function, but it didn't help. It turns out that pytest has its own warning handling, so to disable warnings in pytest we must say so in a special configuration parameter in pytest.ini. So in this patch, we drop the disable_warnings call from conftest.py (where it didn't help), and instead put a similar declaration in pytest.ini. The disable_warnings call in the test/alternator/run script needs to remain - it is run outside pytest, so pytest.ini doesn't affect it. After this patch, running test/alternator/run with --https or --aws finishes without warnings, as desired. Fixes #15287 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #15292	2023-09-07 07:23:57 +03:00
Pavel Emelyanov	9da4668c71	cql_query_test: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-06 16:54:25 +03:00
Pavel Emelyanov	84e30ab56c	cql_query_test: Use do_with_cql_env_thread() explicitly Some tests use non-threaded do_with_cql_env() and wrap the inner lambda with seastar::async(). The cql env already provides a helper for that Indentation is deliberately left broken until next patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-06 16:54:14 +03:00
Nadav Har'El	5930637ad8	Merge 'task_manager: module: make_task: enter gate when the task is created' from Benny Halevy Passing the gate_closed_exception to the task promise ends up with abandoned exception since no-one is waiting for it. Instead, enter the gate when the task is made so it will fail make_task if the gate is already closed. Fixes scylladb/scylladb#15211 In addition, this series adds a private abort_source for each task_manager module (chained to the main task_manager::abort_source) and abort is requested on task_manager::module::stop(). gate holding in compaction_manager is hardened and makes sure to stop compaction_manager and task_manager in sstable_compaction_test cases. Closes #15213 * github.com:scylladb/scylladb: compaction_manager: stop: close compaction_state:s gates compaction_manager: gracefully handle gate close task_manager: task: start: fixup indentation task_manager: module: make_task: enter gate when the task is created task_manaer: module: stop: request abort task_manager: task::impl: subscribe to module about_source test: compaction_manager_stop_and_drain_race_test: stop compaction and task managers test: simple_backlog_controller_test: stop compaction and task managers	2023-09-06 13:29:26 +03:00
Nadav Har'El	cfc70810d3	test/alternator: more error-path tests for list_append() function Improved the coverage of the tests for the list_append() function in UpdateExpression - test that if one of its arguments is not a list, including a missing attribute or item, it is reported as an error as expected. The new tests pass on both Alternator and DynamoDB. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #15291	2023-09-06 11:59:54 +03:00
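
Note on commit c52e0fd333 above: the message says the urllib3 warning had to be silenced through pytest's own warning configuration instead of a call in conftest.py. The exact line added to pytest.ini is not shown in this log; pytest does support a `filterwarnings` ini option, so something like `filterwarnings = ignore::urllib3.exceptions.InsecureRequestWarning` is a plausible form, stated here only as an assumption. The sketch below illustrates the underlying mechanism (a category-based "ignore" filter) with the standard `warnings` module only. `InsecureRequestWarning` is a hypothetical stand-in class, and `make_unverified_request` and `count_warnings` are illustrative helpers, not code from the repository.

```python
import warnings


class InsecureRequestWarning(Warning):
    """Hypothetical stand-in for urllib3.exceptions.InsecureRequestWarning."""


def make_unverified_request():
    # Stand-in for an HTTPS request made with verify=False,
    # which would emit this kind of warning in urllib3.
    warnings.warn("Unverified HTTPS request", InsecureRequestWarning)
    return "ok"


def count_warnings(install_filter):
    # Record every warning raised inside the block.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if install_filter:
            # Category-based "ignore" filter; it is inserted ahead of the
            # "always" filter, so it takes precedence for this category.
            warnings.simplefilter("ignore", InsecureRequestWarning)
        make_unverified_request()
    return len(caught)


if __name__ == "__main__":
    print(count_warnings(install_filter=False))
    print(count_warnings(install_filter=True))
```

Under pytest, filters installed by test code can be overridden by pytest's own per-test warning handling, which is consistent with the commit's observation that the filter belongs in pytest.ini.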


5555 Commits