scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 08:23:29 +00:00

Author	SHA1	Message	Date
Piotr Dulikowski	8b9e62e107	Merge '[Backport 6.0] cql3/statement/select_statement: do not parallelize single-partition aggregations' from Michał Jadwiszczak This patch adds a check for whether an aggregation query performs a single-partition read and, if so, makes the query bypass forward_service and skip parallelizing the request. Fixes scylladb/scylladb#19349 (cherry picked from commit `e9ace7c203`) (cherry picked from commit `8eb5ca8202`) Refs scylladb/scylladb#19350 Closes scylladb/scylladb#19499 * github.com:scylladb/scylladb: test/boost/cql_query_test: add test for single-partition aggregation cql3/select_statement: do not parallelize single-partition aggregations	2024-07-02 21:03:24 +02:00
Gleb Natapov	724ec62e22	test: add test that checks that local address cannot expire between join request placement and its processing (cherry picked from commit `3f136cf2eb`)	2024-07-01 10:44:31 +00:00
Piotr Smaron	6a1e0489c6	cql: forbid switching from tablets to vnodes in ALTER KS This check is already in place, but isn't fully working, i.e. switching from a vnode KS to a tablets KS is not allowed, but this check doesn't work in the other direction. To fix the latter, `ks_prop_defs::get_initial_tablets()` has been changed to handle 3 states: (1) init_tablets is set, (2) it was skipped, (3) tablets are disabled. These couldn't fit into std::optional, so a new local struct to hold these states has been introduced. Callers of this function have been adjusted to set init_tablets to an appropriate value according to the circumstances, i.e. if tablets are globally enabled, but have been skipped in the CQL, init_tablets is automatically set to 0, but if someone executes ALTER KS and doesn't provide tablets options, they're inherited from the old KS. I tried various approaches and this one resulted in the least lines of code changed. I also provided testcases to explain how the code behaves. Fixes: #18795 (cherry picked from commit `758139c8b2`) Closes scylladb/scylladb#19540	2024-06-28 17:58:35 +03:00
Botond Dénes	fa644c6269	Merge '[Backport 6.0] tasks: fix tasks abort' from Aleksandra Martyniuk Currently if task_manager::task::impl::abort preempts before children are recursively aborted and then the task gets unregistered, we hit use after free since abort uses children vector which is no longer alive. Modify abort method so that it goes over all tasks in task manager and aborts those with the given parent. Fixes: https://github.com/scylladb/scylladb/issues/19304. Requires backport to all versions containing task manager (cherry picked from commit `3463f495b1`) (cherry picked from commit `50cb797d95`) Refs https://github.com/scylladb/scylladb/pull/19305 Closes scylladb/scylladb#19437 * github.com:scylladb/scylladb: test: add test for abort while a task is being unregistered tasks: fix tasks abort	2024-06-27 14:45:34 +03:00
Botond Dénes	cb4b4fe678	Merge '[Backport 6.0] test_tablets: add test_tablet_storage_freeing' from ScyllaDB Before work on tablets was completed, it was noticed that — due to some missing pieces of implementation — Scylla doesn't properly close sstables for migrated-away tablets. Because of this, disk space wasn't being reclaimed properly. Since the missing pieces of implementation were added, the problem should be gone now. This patch adds a test which was used to reproduce the problem earlier. It's expected to pass now, validating that the issue was fixed. Should be backported to branch-6.0, because the tested problem was also affecting that branch. Fixes #16946 (cherry picked from commit `7741491b47`) (cherry picked from commit `823da140dd`) Refs #18906 Closes scylladb/scylladb#19295 * github.com:scylladb/scylladb: test_tablets: add test_tablet_storage_freeing test: pylib: add get_sstables_disk_usage()	2024-06-27 14:40:06 +03:00
Michał Jadwiszczak	29c6a4cf44	test/boost/cql_query_test: add test for single-partition aggregation (cherry picked from commit `8eb5ca8202`)	2024-06-25 23:56:49 +02:00
Kefu Chai	1b2f10a4e7	sstables: use "me" sstable format by default in `7952200c`, we changed the `selected_format` from `mc` to `me`, but to be backward compatible the cluster starts with "md", so when the nodes in cluster agree on the "ME_SSTABLE_FORMAT" feature, the format selector believes that the node is already using "ME", which is specified by `_selected_format`, even though it is actually still using "md", which is specified by `sstable_manager::_format`, as changed by `54d49c04`. as explained above, it was set to "md" in the hope of being backward compatible when upgrading from an existing installation which might still be using "md". but after a second thought, since we are able to read sstables persisted with older formats, this concern is not valid. in other words, `7952200c` introduced a regression which changed the "default" sstable format from `me` to `md`. to address this, we just change `sstable_manager::_format` to "me", so that all sstables are created using "me" format. a test is added accordingly. Fixes #18995 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `5a0d30f345`) Closes scylladb/scylladb#19422	2024-06-23 19:26:53 +03:00
Aleksandra Martyniuk	169dfaf037	test: add test for abort while a task is being unregistered (cherry picked from commit `50cb797d95`)	2024-06-22 15:47:03 +02:00
Botond Dénes	cfac9d8bef	Merge '[Backport 6.0] Reduce TWCS off-strategy space overhead' from ScyllaDB Normally, the space overhead for TWCS is 1/N, where N is the number of windows. But during off-strategy, the overhead is 100% because input sstables cannot be released earlier. Reshaping a TWCS table that takes ~50% of available space can result in the system running out of space. That's fixed by restricting every TWCS off-strategy job to 10% of free space on disk. Tables that aren't big will not be penalized with increased write amplification, as all input (disjoint) sstables can still be compacted in a single round. Fixes #16514. (cherry picked from commit `b8bd4c51c2`) (cherry picked from commit `51c7ee889e`) (cherry picked from commit `0ce8ee03f1`) (cherry picked from commit `ace4e5111e`) Refs #18137 Closes scylladb/scylladb#19404 * github.com:scylladb/scylladb: compaction: Reduce twcs off-strategy space overhead to 10% of free space compaction: wire storage free space into reshape procedure sstables: Allow to get free space from underlying storage replica: don't expose compaction_group to reshape task	2024-06-21 20:00:10 +03:00
Nadav Har'El	0715038dbe	test: unflake test test_alternator_ttl_scheduling_group This test in topology_experimental_raft/test_alternator.py wants to check that during Alternator TTL's expiration scans, ALL of the CPU was used in the "streaming" scheduling group and not in the "statement" scheduling group. But to allow for some fluke requests (e.g., from the driver), the test actually allows work in the statement group to be up to 1% of the work. Unfortunately, in one test run - a very slow debug+aarch64 run - we saw the work on the statement group reach 1.4%, failing the test. I don't know exactly where this work comes from, perhaps the driver, but before this bug was fixed we saw more than 58% of the work in the wrong scheduling group, so neither 1% nor 1.4% is a sign that the bug came back. In fact, let's just change the threshold in the test to 10%, which is also much lower than the pre-fix value of 58%, so is still a valid regression test. Fixes #19307 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `9fc70a28ca`) Closes scylladb/scylladb#19333	2024-06-21 19:55:09 +03:00
Raphael S. Carvalho	3d9aa9d49e	compaction: Reduce twcs off-strategy space overhead to 10% of free space TWCS off-strategy suffers with 100% space overhead, so a big TWCS table can cause scylla to run out of disk space during node ops. To not penalize TWCS tables, that take a small percentage of disk, with increased write ampl, TWCS off-strategy will be restricted to 10% of free disk space. Then small tables can still compact all disjoint sstables in a single round. Fixes #16514. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `ace4e5111e`)	2024-06-20 20:41:41 +00:00
Raphael S. Carvalho	ef72075920	compaction: wire storage free space into reshape procedure After this, TWCS reshape procedure can be changed to limit job to 10% of available space. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `0ce8ee03f1`)	2024-06-20 20:41:41 +00:00
Raphael S. Carvalho	37f1af2646	sstables: Allow to get free space from underlying storage That will be used in turn to restrict reshape to 10% of available space in underlying storage. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `51c7ee889e`)	2024-06-20 20:41:41 +00:00
Calle Wilund	fd59176a73	main/minio_server.py: Respect any preexisting AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY vars Fixes scylladb/scylla-pkg#3845 Don't overwrite (or rather change) AWS credentials variables if already set in enclosing environment. Ensures EAR tests for AWS KMS can run properly in CI. v2: * Allow environment variables in reading obj storage config - allows CI to use real credentials in env without risking putting them into less secure files * Don't write credentials info from miniserver into config, instead use said environment vars to propagate creds. v3: * Fix python launch scripts to not clear environment, thus retaining above aws envs. (cherry picked from commit `5056a98289`) Closes scylladb/scylladb#19330	2024-06-20 18:08:51 +03:00
Botond Dénes	869f2637b8	Merge '[Backport 6.0] Fix usage of utils/chunked_vector::reserve_partial' from ScyllaDB utils/chunked_vector::reserve_partial: fix usage in callers The method reserve_partial(), when used as documented, quits before the intended capacity can be reserved fully. This can lead to overallocation of memory in the last chunk when data is inserted to the chunked vector. The method itself doesn't have any bug but the way it is being used by the callers needs to be updated to get the desired behaviour. Instead of calling it repeatedly with the value returned from the previous call until it returns zero, it should be repeatedly called with the intended size until the vector's capacity reaches that size. This PR updates the method comment and all the callers to use the right way. Fixes #19254 (cherry picked from commit `64768b58e5`) (cherry picked from commit `29f036a777`) (cherry picked from commit `0a22759c2a`) (cherry picked from commit `d4f8b91bd6`) (cherry picked from commit `310c5da4bb`) (cherry picked from commit `83190fa075`) (cherry picked from commit `c49f6391ab`) Refs #19279 Closes scylladb/scylladb#19310 * github.com:scylladb/scylladb: utils/large_bitset: remove unused includes identified by clangd utils/large_bitset: use thread::maybe_yield() test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial utils/lsa/chunked_managed_vector: fix reserve_partial() utils/chunked_vector: return void from reserve_partial and make_room test/boost/chunked_vector_test: fix testcase tests_reserve_partial utils/chunked_vector::reserve_partial: fix usage in callers	2024-06-17 09:31:28 +03:00
Lakshmi Narayanan Sreethar	e64e659ef1	test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial Update the maximum size tested by the testcase. The test always created only one chunk as the maximum size tested by it (1 << 12 = 4KB) was less than the default max chunk size (12.8 KB). So, use twice the max_chunk_capacity as the test size distribution upper limit to verify that partial_reserve can reserve multiple chunks. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `310c5da4bb`)	2024-06-14 15:48:57 +00:00
Lakshmi Narayanan Sreethar	397b04b2a4	utils/lsa/chunked_managed_vector: fix reserve_partial() Fix the method comment and return types of chunked_managed_vector's reserve_partial() similar to chunked_vector's reserve_partial() as it has the same issues mentioned in #19254. Also update the usage in the chunked_managed_vector_test. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `d4f8b91bd6`)	2024-06-14 15:48:56 +00:00
Lakshmi Narayanan Sreethar	4e68599b17	test/boost/chunked_vector_test: fix testcase tests_reserve_partial Fix the usage of reserve_partial in the testcase. Also update the maximum chunk size used by the testcase. The test always created only one chunk as the maximum size tested by it (1 << 12 = 4KB) was less than the default max chunk size (128 KB). So, use smaller chunk size, 512 bytes, to verify that partial_reserve can reserve multiple chunks. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `29f036a777`)	2024-06-14 15:48:56 +00:00
Kefu Chai	57980b77d3	test: test_topology_ops: adapt to tablets in `e7d4e080`, we reenabled the background writes in this test, but when running with tablets enabled, background writes are still disabled because of #17025, which was fixed last week. so we can enable background writes with tablets. in this change, * background writes are enabled with tablets. * increase the number of nodes by 1 so that we have enough nodes to fulfill the needs of tablets, which enforces that the number of replicas should always satisfy RF. * pass rf to `start_writes()` explicitly, so we have less magic numbers in the test, and make the data dependencies more obvious. Fixes #17589 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `77f0259a63`) Closes scylladb/scylladb#19184	2024-06-14 15:54:36 +03:00
Michał Chojnowski	ddcaefefdc	test_tablets: add test_tablet_storage_freeing Tests that tablet storage is freed after it is migrated away. Fixes #16946 (cherry picked from commit `823da140dd`)	2024-06-14 10:19:32 +00:00
Michał Chojnowski	f466dcfa5f	test: pylib: add get_sstables_disk_usage() Adds a util for measuring the disk usage of the given table on the given node. Will be used in a follow-up patch for testing that sstables are freed properly. (cherry picked from commit `7741491b47`)	2024-06-14 10:19:32 +00:00
Botond Dénes	b18d9e5d0d	Merge '[Backport 6.0] make enable_compacting_data_for_streaming_and_repair truly live-update' from ScyllaDB This config item is propagated to the table object via table::config. Although the field in `table::config`, used to propagate the value, was `utils::updateable_value<T>`, it was assigned a constant and so the live-update chain was broken. This series fixes this and adds a test which fails before the patch and passes after. The test needed new test infrastructure, around the failure injection api, namely the ability to exfiltrate the value of internal variable. This infrastructure is also added in this series. Fixes: https://github.com/scylladb/scylladb/issues/18674 - [x] This patch has to be backported because it fixes broken functionality (cherry picked from commit `dbccb61636`) (cherry picked from commit `4590026b38`) (cherry picked from commit `feea609e37`) (cherry picked from commit `0c61b1822c`) (cherry picked from commit `8ef4fbdb87`) Refs #18705 Closes scylladb/scylladb#19240 * github.com:scylladb/scylladb: test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update test/pylib: rest_client: add get_injection() api/error_injection: add getter for error_injection utils/error_injection: add set_parameter() replica/database: fix live-update enable_compacting_data_for_streaming_and_repair	2024-06-13 12:45:23 +03:00
Kefu Chai	b39c0a1d15	test: memtable_test: increase unspooled_dirty_soft_limit before this change, when performing memtable_test, we expect that the memtables of ks.cf are the only memtables being flushed. and we inject 4 failures in the code path of flush, and wait until 4 of them are triggered. but in the background, `dirty_memory_manager` performs flush on all tables when necessary. so, the total number of failures is not necessarily the total number of failures triggered when flushing ks.cf, some of them could be triggered when flushing system tables. that's why we have sporadic test failures from this test, as we might check `t.min_memtable_timestamp()` too soon. after this change, we increase `unspooled_dirty_soft_limit` setting, in order to disable `dirty_memory_manager`, so that the only flush is performed by the test. Fixes #19034 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `223fba3243`)	2024-06-12 15:44:11 +00:00
Kefu Chai	548fd01bd4	test: memtable_test: replace BOOST_ASSERT with BOOST_REQUIRE before this change, we verify the behavior of design under test using `BOOST_ASSERT()`, which is a wrapper around `assert()`, so if a test fails, the test just aborts. this is not very helpful for postmortem debugging. after this change, we use `BOOST_REQUIRE` macro for verifying the behavior, so that Boost.Test prints out the condition if it does not hold when we test it. Refs #19034 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `2df4e9cfc2`)	2024-06-12 15:44:11 +00:00
Pavel Emelyanov	2306c3b522	test: Reduce failure detector timeout for failed tablets migration test Most of the time this test spends waiting for a node to die. Helps about 3x. Was: real 9m21,950s user 1m11,439s sys 1m26,022s Now: real 3m37,780s user 0m58,439s sys 1m13,698s refs: #17764 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `a4e8f9340a`) Closes scylladb/scylladb#19233	2024-06-12 10:02:45 +03:00
Tomasz Grabiec	6d90ff84d9	Merge '[Backport 6.0] tablets: Filter-out left nodes in get_natural_endpoints()' from ScyllaDB The API already promises this, the comment on effective_replication_map says: "Excludes replicas which are in the left state". Tablet replicas on the replaced node are rebuilt after the node already left. We may no longer have the IP mapping for the left node so we should not include that node in the replica set. Otherwise, storage_proxy may try to use the empty IP and fail: storage_proxy - No mapping for :: in the passed effective replication map It's fine to not include it, because storage proxy uses keyspace RF and not replica list size to determine quorum. The node is not coming up, so no one should need to contact it. Users which need replica list stability should use the host_id-based API. Fixes #18843 (cherry picked from commit `3e1ba4c859`) (cherry picked from commit `0d596a425c`) Refs #18955 Closes scylladb/scylladb#19143 * github.com:scylladb/scylladb: tablets: Filter-out left nodes in get_natural_endpoints() test: pylib: Extract start_writes() load generator utility	2024-06-12 01:31:38 +02:00
Botond Dénes	0d13c51dd4	test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update Avoid the live-update feature of this config item breaking silently. (cherry picked from commit `8ef4fbdb87`)	2024-06-11 17:32:37 +00:00
Botond Dénes	d4563e2b28	test/pylib: rest_client: add get_injection() The /v2/error_injection/{injection} endpoint now has a GET method too, expose this. (cherry picked from commit `0c61b1822c`)	2024-06-11 17:32:37 +00:00
Raphael S. Carvalho	d4c3a43b34	replica: Refresh mutation source when allocating tablet replicas Consider the following: 1) table A has N tablets and views 2) migration starts for a tablet of A from node 1 to 2. 3) migration is at write_both_read_old stage 4) coordinator will push writes to both nodes (pending and leaving) 5) A has view, so writes to it will also result in reads (table::push_view_replica_updates()) 6) tablet's update_effective_replication_map() is not refreshing tablet sstable set (for new tablet migrating in) 7) so read on step 5 is not being able to find sstable set for tablet migrating in Causes the following error: "tablets - SSTable set wasn't found for tablet 21 of table mview.users" which means loss of write on pending replica. The fix will refresh the table's sstable set (tablet_sstable_set) and cache's snapshot. It's not a problem to refresh the cache snapshot as long as the logical state of the data hasn't changed, which is true when allocating new tablet replicas. That's also done in the context of compactions for example. Fixes #19052. Fixes #19033. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `7b41630299`) Closes scylladb/scylladb#19229	2024-06-11 18:12:43 +03:00
Tomasz Grabiec	7479167af2	tablets: Filter-out left nodes in get_natural_endpoints() The API already promises this, the comment on effective_replication_map says: "Excludes replicas which are in the left state". Tablet replicas on the replaced node are rebuilt after the node already left. We may no longer have the IP mapping for the left node so we should not include that node in the replica set. Otherwise, storage_proxy may try to use the empty IP and fail: storage_proxy - No mapping for :: in the passed effective replication map It's fine to not include it, because storage proxy uses keyspace RF and not replica list size to determine quorum. The node is not coming up, so no one should need to contact it. Users which need replica list stability should use the host_id-based API. Fixes #18843 (cherry picked from commit `0d596a425c`)	2024-06-11 12:18:17 +02:00
Tomasz Grabiec	e35ab96f8b	test: pylib: Extract start_writes() load generator utility (cherry picked from commit `3e1ba4c859`)	2024-06-11 12:18:17 +02:00
Nadav Har'El	4810937ddf	test/alternator: fix flaky test test_item_latency The Alternator test test_metrics.py::test_item_latency confirms that for several operation types (PutItem, GetItem, DeleteItem, UpdateItem) we did not forget to measure their latencies. The test checked that a latency was updated by checking that two metrics increase: scylla_alternator_op_latency_count scylla_alternator_op_latency_sum However, it turns out that the "sum" is only an approximate sum of all latencies, and when the total sum grows large it sometimes does not increase when a short latency is added to the statistics. When this happens, this test fails on the assertion that the "sum" increases after an operation. We saw this happening sometimes in CI runs. The simple fix is to stop checking _sum at all, and only verify that the _count increases - this is really an integer counter that unconditionally increases when a latency is added to the histogram. Don't worry that the strength of this test is reduced - this test was never meant to check the accuracy or correctness of the histograms - we should have different (and better) tests for that, unrelated to Alternator. The purpose of this test is only to verify that for some specific operation like PutItem, Alternator didn't forget to measure its latency and update the histogram. We want to avoid a bug like we had in counters in the past (#9406). Fixes #18847. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `13cf6c543d`) Closes scylladb/scylladb#19193	2024-06-10 20:20:54 +03:00
Tomasz Grabiec	a3e4dc7b6c	test: tablets: Fix flakiness of test_removenode_with_ignored_node due to read timeout The check query may be executed on a node which doesn't yet see that the downed server is down, as it is not shut down gracefully. The query coordinator can choose the down node as a CL=1 replica for read and time out. To fix, wait for all nodes to notice the node is down before executing the checking query. Fixes #17938 (cherry picked from commit `c8f71f4825`) Closes scylladb/scylladb#19199	2024-06-10 20:12:56 +03:00
Botond Dénes	7a6ff12ace	Merge '[Backport 6.0] alternator: keep TTL work in the maintenance scheduling group' from ScyllaDB Alternator has a custom TTL implementation. This is based on a loop, which scans existing rows in the table, then decides whether each row has reached its end-of-life and deletes it if it did. This work is done in the background, and therefore it uses the maintenance (streaming) scheduling group. However, it was observed that part of this work leaks into the statement scheduling group, competing with user workloads, negatively affecting their latencies. This was found to be caused by the reads and writes done on behalf of the alternator TTL, which loses its maintenance scheduling group when these have to go to a remote node. This is because the messaging service was not configured to recognize the streaming scheduling group, when statement verbs like read or writes are invoked. The messaging service currently recognizes two statement "tenants": the user tenant (statement scheduling group) and system (default scheduling group), as we used to have only user-initiated operations and system (internal) ones. With alternator TTL, there is now a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group). This series adds a streaming tenant to the messaging service configuration and it adds a test which confirms that with this change, alternator TTL is entirely contained in the maintenance scheduling group. Fixes: #18719 - [x] Scans executed on behalf of alternator TTL are running in the statement group, disturbing user-workloads, this PR has to be backported to fix this.
(cherry picked from commit `5d3f7c13f9`) (cherry picked from commit `1fe8f22d89`) Refs #18729 Closes scylladb/scylladb#19196 * github.com:scylladb/scylladb: alternator, scheduler: test reproducing RPC scheduling group bug main: add maintenance tenant to messaging_service's scheduling config	2024-06-10 19:58:38 +03:00
Gleb Natapov	45ff4d2c41	group0, topology coordinator: run group0 and the topology coordinator in gossiper scheduling group Currently they both run in streaming group and it may become busy during repair/mv building and affect group0 functionality. Move it to the gossiper group where it should have more time to run. Fixes #18863 (cherry picked from commit `a74fbab99a`) Closes scylladb/scylladb#19175	2024-06-10 10:34:29 +02:00
Nadav Har'El	0662e80917	alternator, scheduler: test reproducing RPC scheduling group bug This patch adds a test for issue #18719: Although the Alternator TTL work is supposedly done in the "streaming" scheduling group, it turned out we had a bug where work sent on behalf of that code to other nodes failed to inherit the correct scheduling group, and was done in the normal ("statement") group. Because this problem only happens when more than one node is involved, the test is in the multi-node test framework test/topology_experimental_raft. The test uses the Alternator API. We already had in that framework a test using the Alternator API (a test for alternator+tablets), so in this patch we move the common Alternator utility functions to a common file, test_alternator.py, where I also put the new test. The test is based on metrics: We write expiring data, wait for it to expire, and then check the metrics on how much CPU work was done in the wrong scheduling group ("statement"). Before #18719 was fixed, a lot of work was done there (more than half of the work done in the right group). After the issue was fixed in the previous patch, the work on the wrong scheduling group went down to zero. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `1fe8f22d89`)	2024-06-10 07:42:23 +00:00
Tomasz Grabiec	f8243cbf19	Merge '[Backport 6.0] Serialize repair with tablet migration' from ScyllaDB We want to exclude repair with tablet migrations to avoid races between repair reads and writes with replica movement. Repair is not prepared to handle topology transitions in the middle. One reason why it's not safe is that repair may successfully write to a leaving replica post streaming phase and consider all replicas to be repaired, but in fact they are not, the new replica would not be repaired. Other kinds of races could result in repair failures. If repair writes to a leaving replica which was already cleaned up, such writes will fail, causing repair to fail. Excluding works by keeping effective_replication_map_ptr in a version which doesn't have table's tablets in transitions. That prevents later transitions from starting because topology coordinator's barrier will wait for that erm before moving to a stage later than allow_write_both_read_old, so before any requests start using the new topology. Also, if transitions are already running, repair waits for them to finish. A blocked tablet migration (e.g. due to down node) will block repair, whereas before it would fail. Once admin resolves the cause of blocked migration, repair will continue. Fixes #17658. Fixes #18561. 
(cherry picked from commit `6c64cf33df`) (cherry picked from commit `1513d6f0b0`) (cherry picked from commit `476c076a21`) (cherry picked from commit `c45ce41330`) (cherry picked from commit `e97acf4e30`) (cherry picked from commit `98323be296`) (cherry picked from commit `5ca54a6e88`) Refs #18641 Closes scylladb/scylladb#19144 * github.com:scylladb/scylladb: test: pylib: Do not block async reactor while removing directories repair: Exclude tablet migrations with tablet repair repair_service: Propagate topology_state_machine to repair_service main, storage_service: Move topology_state_machine outside storage_service storage_service, topology: Extract topology_state_machine::await_quiesced() tablet_scheduler: Make disabling of balancing interrupt shuffle mode tablet_scheduler: Log whether balancing is considered as enabled	2024-06-09 00:20:44 +02:00
Tomasz Grabiec	27f01bf4e3	test: pylib: Do not block async reactor while removing directories This fixes a problem where suite cleanup schedules lots of uninstall() tasks for servers started in the suite, which schedules lots of tasks, which synchronously call rmtree(). These take over a minute to finish, which blocks other tasks for tests which are still executing. In particular, this was observed to cause ManagerClient.server_stop_gracefully() to time out. It has a timeout of 60 seconds. The server was stopped quickly, but the RESTful API response was not processed in time and the call timed out when it got the async reactor. (cherry picked from commit `5ca54a6e88`)	2024-06-08 16:31:18 +02:00
Tomasz Grabiec	ded9aca6ee	repair: Exclude tablet migrations with tablet repair We want to exclude repair with tablet migrations to avoid races between repair reads and writes with replica movement. Repair is not prepared to handle topology transitions in the middle. One reason why it's not safe is that repair may successfully write to a leaving replica post streaming phase and consider all replicas to be repaired, but in fact they are not, the new replica would not be repaired. Other kinds of races could result in repair failures. If repair writes to a leaving replica which was already cleaned up, such writes will fail, causing repair to fail. Excluding works by keeping effective_replication_map_ptr in a version which doesn't have table's tablets in transitions. That prevents later transitions from starting because topology coordinator's barrier will wait for that erm before moving to a stage later than allow_write_both_read_old, so before any requests start using the new topology. Also, if transitions are already running, repair waits for them to finish. Fixes #17658. Fixes #18561. (cherry picked from commit `98323be296`)	2024-06-08 16:31:18 +02:00
Tomasz Grabiec	e518bb68b2	main, storage_service: Move topology_state_machine outside storage_service It will be propagated to repair_service to avoid cyclic dependency: storage_service <-> repair_service (cherry picked from commit `c45ce41330`)	2024-06-06 13:01:19 +00:00
Kamil Braun	5d3dde50f4	Merge '[Backport 6.0] Fail bootstrap if ip mapping is missing during double write stage' from ScyllaDB If a node restarts just before it stores the bootstrapping node's IP, it will not have the ID-to-IP mapping for the bootstrapping node, which may cause a failure on the write path. Detect this and fail bootstrapping if it happens. (cherry picked from commit `1faef47952`) (cherry picked from commit `27445f5291`) (cherry picked from commit `6853b02c00`) (cherry picked from commit `f91db0c1e4`) Refs #18927 Closes scylladb/scylladb#19118 * github.com:scylladb/scylladb: raft topology: fix indentation after previous commit raft topology: do not add bootstrapping node without IP as pending test: add test of bootstrap where the coordinator crashes just before storing IP mapping schema_tables: remove unused code	2024-06-06 11:35:13 +02:00
Tomasz Grabiec	b7fe4412d0	test: pylib: Fetch all pages by default in run_async Fetching only the first page is not the intuitive behavior expected by users. This causes flakiness in some tests which generate a variable number of keys depending on execution speed and verify later that all keys were written using a single SELECT statement. When the number of keys becomes larger than the page size, the test fails. Fixes #18774 (cherry picked from commit `2c3f7c996f`) Closes scylladb/scylladb#19130	2024-06-06 08:22:45 +03:00
Botond Dénes	8d12eeee62	Merge '[Backport 6.0] tasks: introduce task manager's task folding' from Aleksandra Martyniuk Task manager's tasks stay in memory after they are finished. Moreover, even if a child task is unregistered from task manager, it is still alive since its parent keeps a foreign pointer to it. Also, when a task has finished successfully there is no point in keeping all of its descendants in memory. The patch introduces folding of task manager's tasks. Whenever a task which has a parent is finished it is unregistered from task manager and foreign_ptr to it (kept in its parent) is replaced with its status. Children's statuses of the task are dropped unless they or one of their descendants failed. So for each operation we keep a tree of tasks which contains: - a root task and its direct children (status if they are finished, a task otherwise); - running tasks and their direct children (same as above); - a statuses path from root to failed tasks. /task_manager/wait_task/ does not unregister tasks anymore. Refs: https://github.com/scylladb/scylladb/issues/16694. - [ ] Backport reason (please explain below if this patch should be backported or not) Requires backport to 6.0 as task number exploded with tablets. 
(cherry picked from commit `6add9edf8a`) (cherry picked from commit `319e799089`) (cherry picked from commit `e6c50ad2d0`) (cherry picked from commit `a82a2f0624`) (cherry picked from commit `c1b2b8cb2c`) (cherry picked from commit `30f97ea133`) (cherry picked from commit `fc0796f684`) (cherry picked from commit `d7e80a6520`) (cherry picked from commit `beef77a778`) Refs https://github.com/scylladb/scylladb/pull/18735 Closes scylladb/scylladb#19104 * github.com:scylladb/scylladb: docs: describe task folding test: rest_api: add test for task tree structure test: rest_api: modify new_test_module tasks: test: modify test_task methods api: task_manager: do not unregister task in /task_manager/wait_task/ tasks: unregister tasks with parents when they are finished tasks: fold finished tasks into their parents tasks: make task_manager::task::impl::finish_failed noexcept tasks: change _children type	2024-06-06 07:56:12 +03:00
Gleb Natapov	c53cd98a41	test: add test of bootstrap where the coordinator crashes just before storing IP mapping On the next boot there is no host ID to IP mapping which causes node to crash again with "No mapping for :: in the passed effective replication map" assertion. (cherry picked from commit `27445f5291`)	2024-06-05 13:55:28 +00:00
Patryk Jędrzejczak	65021c4b1c	[Backport 6.0] test: test_topology_ops: run correctly without tablets The values of `tablets_enabled` were nonempty strings, so they always evaluated to `True` in the if statement responsible for enabling writing workers only if tablets are disabled. Hence, the writing workers were always disabled. The original commit, `ea4717da65`, contains one more change, which is not needed (and conflicting) in 6.0 because scylladb/scylladb#18898 has been backported first. Closes scylladb/scylladb#19111	2024-06-05 15:15:00 +02:00
Botond Dénes	341c29bd74	Merge '[Backport 6.0] storage_service: Fix race between tablet split and stats retrieval' from Raphael "Raph" Carvalho Retrieval of tablet stats must be serialized with mutation to token metadata, as the former requires tablet id stability. If tablet split is finalized while retrieving stats, the saved erm, used by all shards, can have a lower tablet count than the one in a particular shard, causing an abort as tablet map requires that any id fed into it is lower than its current tablet count. Fixes https://github.com/scylladb/scylladb/issues/18085. (cherry picked from commit `abcc68dbe7`) (cherry picked from commit `551bf9dd58`) (cherry picked from commit `e7246751b6`) Refs https://github.com/scylladb/scylladb/pull/18287 Closes scylladb/scylladb#19095 * github.com:scylladb/scylladb: topology_experimental_raft/test_tablets: restore usage of check_with_down test: Fix flakiness in topology_experimental_raft/test_tablets service: Use tablet read selector to determine which replica to account table stats storage_service: Fix race between tablet split and stats retrieval	2024-06-05 13:06:32 +03:00
Aleksandra Martyniuk	50e1369d1d	test: rest_api: add test for task tree structure Add test which checks whether the tasks are folded into their parent as expected. (cherry picked from commit `d7e80a6520`)	2024-06-04 14:42:10 +00:00
Aleksandra Martyniuk	21e860453c	test: rest_api: modify new_test_module Remove remaining test tasks when a test module is removed, so that a node can shut down even if a test fails. (cherry picked from commit `fc0796f684`)	2024-06-04 14:42:10 +00:00
Aleksandra Martyniuk	607be221b8	tasks: unregister tasks with parents when they are finished Unregister children that are finished from task manager. They can be examined through their parents. (cherry picked from commit `a82a2f0624`)	2024-06-04 14:42:09 +00:00
Raphael S. Carvalho	a373ed52a5	topology_experimental_raft/test_tablets: restore usage of check_with_down `e7246751b6` incorrectly dropped its usage in test_tablet_missing_data_repair. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-04 11:01:22 -03:00

6933 Commits