scylladb

Author	SHA1	Message	Date
Botond Dénes	50c05abd14	test/cluster: add test_data_resurrection_in_memtable.py Reproducers for #23252 and #23291 -- cache garbage collecting tombstones resurrecting data in the memtable. (cherry picked from commit `34b18d7ef4`)	2025-04-10 06:52:18 -04:00
Botond Dénes	de1d8372fa	test/pylib/utils: wait_for_cql_and_get_hosts(): sort hosts Such that a given index in the returned hosts refers to the same underlying Scylla instance as the same index in the passed-in nodes list. This is what users of this method intuitively expect, but currently the returned hosts list is unordered (has random order). (cherry picked from commit `e5afd9b5fb`)	2025-04-10 03:17:27 -04:00
Botond Dénes	b43d024ffb	db/row_cache: add overlap-check for cache tombstone garbage collection The cache should not garbage-collect tombstones which cover data in the memtable. Add overlap checks (get_max_purgeable) to garbage collection to detect tombstones which cover data in the memtable and to prevent their garbage collection. (cherry picked from commit `6b5b563ef7`)	2025-04-10 03:17:27 -04:00
Nadav Har'El	c6825920a6	alternator: in GetRecords, enforce Limit to be <= 1000 Alternator Streams' "GetRecords" operation has a "Limit" parameter on how many records to return. The DynamoDB documentation says that the upper limit on this Limit parameter is 1000 - but Alternator didn't enforce this. In this patch we begin enforcing this highest Limit, and also add a test for verifying this enforcement. As usual, the new test passes on DynamoDB, and after this patch - also on Alternator. The reason why it's useful to have some upper limit on Limit is that the existing executor::get_records() implementation does not really have preemption points in all the necessary places. In particular, we have a loop on all returned records without preemption points. We also store the returned records in a RapidJson vector, which requires a contiguous allocation. Even before this patch, GetRecords had a hard limit of 1 MB of results. But still, in some cases 1 MB of results may be a lot of results, and we can see stalls in the aforementioned places being O(number of results). Fixes #23534 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23547 (cherry picked from commit `84fd52315f`) Closes scylladb/scylladb#23643	2025-04-09 12:46:30 +03:00
Benny Halevy	27ca0d1812	boost/tablets_test: verify failure to create keyspace with tablets and non network replication strategy Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `9fac0045d1`)	2025-04-08 08:35:26 +03:00
Benny Halevy	736f89b31a	tablets: enforce tablets using tablets_mode_for_new_keyspaces=enforced config option `tablets_mode_for_new_keyspaces=enforced` enables tablets by default for new keyspaces, like `tablets_mode_for_new_keyspaces=enabled`. However, it does not allow opting out when creating new keyspaces by setting `tablets = {'enabled': false}`. Refs scylladb/scylla-enterprise#4355 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `62aeba759b`)	2025-04-08 08:35:14 +03:00
Benny Halevy	a49e27ac8f	db/config: add tablets_mode_for_new_keyspaces option The new option deprecates the existing `enable_tablets` option. It will be extended in the next patch with a 3rd value: "enforced", which will enable tablets by default for new keyspaces but without the possibility to opt out using the `tablets = {'enabled': false}` keyspace schema option. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `c62865df90`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-04-08 08:08:47 +03:00
Botond Dénes	1a896169dc	Merge '[Backport 2025.1] repair: release erm in repair_writer_impl::create_writer when possible' from Scylladb[bot] Currently, repair_writer_impl::create_writer keeps erm to ensure that a sharder is valid. If we repair a tablet, erm blocks the state machine and no operation on any tablet of this table might be performed. Use auto_refreshing_sharder and topology_guard to ensure that the operation is safe and that tablet operations on the whole table aren't blocked. Fixes: #23453. Needs backport to 2025.1 that introduces the tablet repair scheduler. - (cherry picked from commit `1dc29ddc86`) - (cherry picked from commit `bae6711809`) Parent PR: #23455 Closes scylladb/scylladb#23580 * github.com:scylladb/scylladb: test: add test to check concurrent migration and repair of two different tablets repair: release erm in repair_writer_impl::create_writer when possible	2025-04-07 10:10:20 +03:00
Piotr Smaron	a17dd4d4c9	[Backport 2025.1] auth: forbid modifying system ks by non-superusers Before this patch, granting a user MODIFY permissions on ALL KEYSPACES allowed the user to write to system tables, where the user could also set himself to "superuser" granting him all other permissions. After this patch, MODIFY permissions on ALL KEYSPACES is limited only to non-system keyspaces. Fixes: scylladb/scylladb#23218 (cherry picked from commit `fee50f287c`) Parent PR: #23219 Closes scylladb/scylladb#23594	2025-04-06 15:10:06 +03:00
Nadav Har'El	a2a4c6e4b2	test/alternator: increase timeout in Alternator RBAC test On our testing infrastructure, tests often run a hundred times (!) slower than usual, for various reasons that we can't always avoid. This is why all our test frameworks drastically increase the default timeouts. We forgot to increase the timeout in one place - where Alternator tests use CQL. This is needed for the Alternator role-based access control (RBAC) tests, which is configured via CQL and therefore the Alternator test unusually uses CQL. So in this patch we increase the timeout of CQL driver used by Alternator tests to the same high timeouts (60-120 seconds) used by the regular CQL tests. As the famous saying goes, these timeouts should be enough for anyone. Fixes #23569. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23578 (cherry picked from commit `a9a6f9eecc`) Closes scylladb/scylladb#23601	2025-04-06 11:49:46 +03:00
Aleksandra Martyniuk	b5b2ffa5df	test: add test to check concurrent migration and repair of two different tablets (cherry picked from commit `bae6711809`)	2025-04-04 10:14:51 +02:00
Dawid Mędrek	c56e47f72f	db/hints: Cancel draining when stopping node Draining hints may occur in one of the two scenarios: * a node leaves the cluster and the local node drains all of the hints saved for that node, * the local node is being decommissioned. Draining may take some time and the hint manager won't stop until it finishes. It's not a problem when decommissioning a node, especially because we want the cluster to retain the data stored in the hints. However, it may become a problem when the local node started draining hints saved for another node and now it's being shut down. There are two reasons for that: * Generally, in situations like that, we'd like to be able to shut down nodes as fast as possible. The data stored in the hints won't disappear from the cluster yet since we can restart the local node. * Draining hints may introduce flakiness in tests. Replaying hints doesn't have the highest priority and it's reflected in the scheduling groups we use as well as the explicitly enforced throughput. If there are a large number of hints to be replayed, it might affect our tests. It's already happened, see: scylladb/scylladb#21949. To solve those problems, we change the semantics of draining. It will behave as before when the local node is being decommissioned. However, when the local node is only being stopped, we will immediately cancel all ongoing draining processes and stop the hint manager. To amend for that, when we start a node and it initializes a hint endpoint manager corresponding to a node that's already left the cluster, we will begin the draining process of that endpoint manager right away. That should ensure all data is retained, while possibly speeding up the shutdown process. There's a small trade-off to it, though. If we stop a node, we can then remove it. It won't have a chance to replay hints it might've before these changes, but that's an edge case. We expect this commit to bring more benefit than harm. 
We also provide tests verifying that the implementation works as intended. Fixes scylladb/scylladb#21949 Closes scylladb/scylladb#22811 (cherry picked from commit `0a6137218a`) Closes scylladb/scylladb#23370	2025-04-03 09:09:05 +02:00
Tomasz Grabiec	51ee15f02d	Merge '[Backport 2025.1] tablets: Make load balancing capacity-aware' from Tomasz Grabiec Before this patch, the load balancer was equalizing tablet count per shard, so it achieved balance assuming that: 1) tablets have the same size 2) shards have the same capacity That can cause imbalance of utilization if shards have different capacity, which can happen in heterogeneous clusters with different instance types. One of the causes for capacity difference is that larger instances run with fewer shards due to vCPUs being dedicated to IRQ handling. This makes those shards have more disk capacity, and more CPU power. After this patch, the load balancer equalizes shard's storage utilization, so it no longer assumes that shards have the same capacity. It still assumes that each tablet has equal size. So it's a middle step towards full size-aware balancing. One consequence is that to be able to balance, the load balancer needs to know about every node's capacity, which is collected with the same RPC which collects load_stats for average tablet size. This is not a significant setback because migrations cannot proceed anyway if nodes are down due to barriers. We could make intra-node migration scheduling work without capacity information, but it's pointless due to the above, so not implemented. Also, per-shard goal for tablet count is still the same for all nodes in the cluster, so nodes with less capacity will be below limit and nodes with more capacity will be slightly above limit. This shouldn't be a significant problem in practice, we could compensate for this by increasing the limit. 
Fixes #23042 * github.com:scylladb/scylladb: tablets: Make load balancing capacity-aware topology_coordinator: Fix confusing log message topology_coordinator: Refresh load stats after adding a new node topology_coordinator: Allow capacity stats to be refreshed with some nodes down topology_coordinator: Refactor load status refreshing so that it can be triggered from multiple places test: boost: tablets_test: Always provide capacity in load_stats test: perf_load_balancing: Set node capacity test: perf_load_balancing: Convert to topology_builder config, disk_space_monitor: Allow overriding capacity via config storage_service, tablets: Collect per-node capacity in load_stats test: tablets_test: Add support for auto-split mode test: cql_test_env: Expose db config Closes scylladb/scylladb#23443 * github.com:scylladb/scylladb: Merge 'tablets: Make load balancing capacity-aware' from Tomasz Grabiec test: tablets_test: Add support for auto-split mode test: cql_test_env: Expose db config	2025-04-01 20:31:05 +02:00
Tomasz Grabiec	975882a489	test: tablets: Fix flakiness due to ungraceful shutdown The test fails sporadically with: cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test3.test2 - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1} That's because a server is stopped in the middle of the workload. The server is stopped ungracefully which will cause some requests to time out. We should stop it gracefully to allow in-flight requests to finish. Fixes #20492 Closes scylladb/scylladb#23451 (cherry picked from commit `8e506c5a8f`) Closes scylladb/scylladb#23469	2025-03-28 14:56:02 +01:00
Evgeniy Naydanov	3653662099	test.py: random_failures: deselect topology ops for some injections After recent changes, #18640 and #19151 started to reproduce for stop_after_sending_join_node_request and stop_after_bootstrapping_initial_raft_configuration error injections too. The solution is the same: deselect the tests. Fixes #23302 Closes scylladb/scylladb#23405 (cherry picked from commit `574c81eac6`) Closes scylladb/scylladb#23460	2025-03-27 13:19:59 +02:00
Avi Kivity	cff90755d8	Merge 'tablets: Make load balancing capacity-aware' from Tomasz Grabiec Before this patch, the load balancer was equalizing tablet count per shard, so it achieved balance assuming that: 1) tablets have the same size 2) shards have the same capacity That can cause imbalance of utilization if shards have different capacity, which can happen in heterogeneous clusters with different instance types. One of the causes for capacity difference is that larger instances run with fewer shards due to vCPUs being dedicated to IRQ handling. This makes those shards have more disk capacity, and more CPU power. After this patch, the load balancer equalizes shard's storage utilization, so it no longer assumes that shards have the same capacity. It still assumes that each tablet has equal size. So it's a middle step towards full size-aware balancing. One consequence is that to be able to balance, the load balancer needs to know about every node's capacity, which is collected with the same RPC which collects load_stats for average tablet size. This is not a significant setback because migrations cannot proceed anyway if nodes are down due to barriers. We could make intra-node migration scheduling work without capacity information, but it's pointless due to the above, so not implemented. Also, per-shard goal for tablet count is still the same for all nodes in the cluster, so nodes with less capacity will be below limit and nodes with more capacity will be slightly above limit. This shouldn't be a significant problem in practice, we could compensate for this by increasing the limit. 
Refs #23042 Closes scylladb/scylladb#23079 * github.com:scylladb/scylladb: tablets: Make load balancing capacity-aware topology_coordinator: Fix confusing log message topology_coordinator: Refresh load stats after adding a new node topology_coordinator: Allow capacity stats to be refreshed with some nodes down topology_coordinator: Refactor load status refreshing so that it can be triggered from multiple places test: boost: tablets_test: Always provide capacity in load_stats test: perf_load_balancing: Set node capacity test: perf_load_balancing: Convert to topology_builder config, disk_space_monitor: Allow overriding capacity via config storage_service, tablets: Collect per-node capacity in load_stats (cherry picked from commit `b1d9f80d85`)	2025-03-25 23:16:35 +01:00
Tomasz Grabiec	3be469da29	test: tablets_test: Add support for auto-split mode rebalance_tablets() was performing migrations and merges automatically but not splits, because splits need to be acked by replicas via load_stats. It's inconvenient in tests which want to rebalance to the equilibrium point. This patch changes rebalance_tablets() to split automatically by default, can be disabled for tests which expect differently. shared_load_stats was introduced to provide a stable holder of load_stats which can be reused across rebalance_tablets() calls. (cherry picked from commit `5e471c6f1b`)	2025-03-25 18:23:22 +01:00
Tomasz Grabiec	1895724465	test: cql_test_env: Expose db config (cherry picked from commit `f3b63bfeff`)	2025-03-25 18:22:32 +01:00
Avi Kivity	bc98301783	Merge '[Backport 2025.1] repair: allow concurrent repair and migration of two different tablets' from Aleksandra Martyniuk Do not hold erm during repair of a tablet that is started with tablet repair scheduler. This way two different tablets can be repaired and migrated concurrently. The same tablet won't be migrated while being repaired as it is provided by topology coordinator. Use topology_guard to maintain safety. Fixes: https://github.com/scylladb/scylladb/issues/22408. Needs backport to 2025.1 that introduces the tablet repair scheduler. Closes scylladb/scylladb#23362 * github.com:scylladb/scylladb: test: add test to check concurrent tablets migration and repair repair: do not hold erm for repair scheduled by scheduler repair: get total rf based on current erm repair: make shard_repair_task_impl::erm private repair: do not pass erm to put_row_diff_with_rpc_stream when unnecessary repair: do not pass erm to flush_rows_in_working_row_buf when unnecessary repair: pass session_id to repair_writer_impl::create_writer repair: keep materialized topology guard in shard_repair_task_impl repair: pass session_id to repair_meta	2025-03-23 20:14:53 +02:00
Dawid Mędrek	ecdefe801c	main: Refuse to start node when RF-rack-invalid keyspace exists When a node is started with the option `rf_rack_valid_keyspaces` enabled, the initialization will fail if there is an RF-rack-invalid keyspace. We want to force the user to adjust their existing keyspaces when upgrading to 2025.* so that the invariant that every keyspace is RF-rack-valid is always satisfied. Fixes scylladb/scylladb#23300 (cherry picked from commit `0e04a6f3eb`)	2025-03-21 12:27:04 +00:00
Dawid Mędrek	af2215c2d2	cql3: Ensure that CREATE and ALTER never lead to RF-rack-invalid keyspaces In this commit, we refuse to create or alter a keyspace when that operation would make it RF-rack-invalid if the option `rf_rack_valid_keyspaces` is enabled. We provide two tests verifying that the changes work as intended. Fixes scylladb/scylladb#23276 (cherry picked from commit `41f862d7ba`)	2025-03-21 12:27:04 +00:00
Aleksandra Martyniuk	5153b91514	test: add test to check concurrent tablets migration and repair Add a test to check whether a tablet can be migrated while another tablet is repaired. (cherry picked from commit `20f9d7b6eb`)	2025-03-19 10:15:19 +01:00
Botond Dénes	b8797551eb	Merge '[Backport 2025.1] Rack aware tablet merge colocation migration ' from Tomasz Grabiec service: Introduce rack-aware co-location migrations for tablet merge Merge co-location can emit migrations across racks even when RF=#racks, reducing availability and affecting consistency of base-view pairing. Given replica set of sibling tablets T0 and T1 below: [T0: (rack1,rack3,rack2)] [T1: (rack2,rack1,rack3)] Merge will co-locate T1:rack2 into T0:rack1, so T1 will temporarily be at only a subset of racks, reducing availability. This is the main problem fixed by this patch. It also lays the ground for consistent base-view replica pairing, which is rack-based. For tables on which views can be created we plan to enforce the constraint that replicas don't move across racks and that all tablets use the same set of racks (RF=#racks). This patch avoids moving replicas across racks unless it's necessary, so if the constraint is satisfied before merge, there will be no co-locating migrations across racks. This constraint of RF=#racks is not enforced yet, it requires more extensive changes. Fixes #22994. Refs #17265. This patch is based on Raphael's work done in PR #23081. The main differences are: 1) Instead of sorting replicas by rack, we try to find replicas in sibling tablets which belong to the same rack. This is similar to how we match replicas within the same host. It reduces the number of across-rack migrations even if RF!=#racks, which the original patch didn't handle. Unlike the original patch, it also avoids rack overload in case RF!=#racks 2) We emit across-rack co-locating migrations if we have no other choice in order to finalize the merge This is ok, since views are not supported with tablets yet. Later, we will disallow this for tables which have views, and we will allow creating views in the first place only when no such migrations can happen (RF=#racks). 
3) Added boost unit test which checks that rack overload is avoided during merge in case RF<#racks 4) Moved logging of across-rack migration to debug level 5) Exposed metric for across-rack co-locating migrations (cherry picked from commit `af949f3b6a`) Also backports dependent patches: - locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes - locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables - Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec Closes scylladb/scylladb#22657 Closes scylladb/scylladb#22652 Closes scylladb/scylladb#23297 * github.com:scylladb/scylladb: service: Introduce rack-aware co-location migrations for tablet merge Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes	2025-03-18 16:22:29 +02:00
Nadav Har'El	b1cf1890a9	alternator: document the state of tablet support in Alternator In commit `c24bc3b` we decided that creating a new table in Alternator will by default use vnodes - not tablets - because of all the missing features in our tablets implementation that are important for Alternator, namely - LWT, CDC and Alternator TTL. We never documented this, or the fact that we support a tag `experimental:initial_tablets` which allows overriding this decision and creating an Alternator table using tablets. We also never documented what exactly doesn't work when Alternator uses tablets. This patch adds the missing documentation in docs/alternator/new-apis.md (which is a good place for describing the `experimental:initial_tablets` tag). The patch also adds a new test file, test_tablets.py, which includes tests for all the statements made in the document regarding how `experimental:initial_tablets` works and what works or doesn't work when tablets are enabled. Two existing tests - for TTL and Streams non-support with tablets - are moved to the new test file. When the tablets feature is finally completed, both the document and the tests will need to be modified (some of the tests should be outright deleted). But it seems this will not happen for at least several months, and that is too long to wait without accurate documentation. Fixes #21629 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#22462 (cherry picked from commit `c0821842de`) Closes scylladb/scylladb#23298	2025-03-16 18:25:21 +02:00
Raphael S. Carvalho	33b5f27057	service: Introduce rack-aware co-location migrations for tablet merge Merge co-location can emit migrations across racks even when RF=#racks, reducing availability and affecting consistency of base-view pairing. Given replica set of sibling tablets T0 and T1 below: [T0: (rack1,rack3,rack2)] [T1: (rack2,rack1,rack3)] Merge will co-locate T1:rack2 into T0:rack1, so T1 will temporarily be at only a subset of racks, reducing availability. This is the main problem fixed by this patch. It also lays the ground for consistent base-view replica pairing, which is rack-based. For tables on which views can be created we plan to enforce the constraint that replicas don't move across racks and that all tablets use the same set of racks (RF=#racks). This patch avoids moving replicas across racks unless it's necessary, so if the constraint is satisfied before merge, there will be no co-locating migrations across racks. This constraint of RF=#racks is not enforced yet, it requires more extensive changes. Fixes #22994. Refs #17265. This patch is based on Raphael's work done in PR #23081. The main differences are: 1) Instead of sorting replicas by rack, we try to find replicas in sibling tablets which belong to the same rack. This is similar to how we match replicas within the same host. It reduces the number of across-rack migrations even if RF!=#racks, which the original patch didn't handle. Unlike the original patch, it also avoids rack overload in case RF!=#racks 2) We emit across-rack co-locating migrations if we have no other choice in order to finalize the merge This is ok, since views are not supported with tablets yet. Later, we will disallow this for tables which have views, and we will allow creating views in the first place only when no such migrations can happen (RF=#racks). 
3) Added boost unit test which checks that rack overload is avoided during merge in case RF<#racks 4) Moved logging of across-rack migration to debug level 5) Exposed metric for across-rack co-locating migrations (cherry picked from commit `af949f3b6a`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>	2025-03-14 20:02:33 +01:00
Botond Dénes	eb147ec564	Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec This PR converts boost load balancer tests in preparation for load balancer changes which add per-table tablet hints. After those changes, load balancer consults with the replication strategy in the database, so we need to create proper schema in the database. To do that, we need proper topology for replication strategies which use RF > 1, otherwise keyspace creation will fail. Topology is created in tests via group0 commands, which is abstracted by the new `topology_builder` class. Tests cannot modify token_metadata only in memory now as it needs to be consistent with the schema and on-disk metadata. That's why modifications to tablet metadata are now made under group0 guard and save back metadata to disk. Closes scylladb/scylladb#22648 * github.com:scylladb/scylladb: test: tablets: Drop keyspace after do_test_load_balancing_merge_colocation() scenario tests: tablets: Set initial tablets to 1 to exit growing mode test: tablets_test: Create proper schema in load balancer tests test: lib: Introduce topology_builder test: cql_test_env: Expose topology_state_machine topology_state_machine: Introduce lock transition (cherry picked from commit `51a273401c`)	2025-03-13 14:08:30 +01:00
Tomasz Grabiec	637e5fc9b5	locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables For example, nodes which are being decommissioned should not be considered as available capacity for new tables. We don't allocate tablets on such nodes. Counting them would result in higher per-shard load than planned. Closes scylladb/scylladb#22657 (cherry picked from commit `3bb19e9ac9`)	2025-03-13 14:08:27 +01:00
Tomasz Grabiec	0d77754c63	locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes In that case, new_racks will be used, but when we discover no candidates, we try to pop from existing_racks. Fixes #22625 Closes scylladb/scylladb#22652 (cherry picked from commit `e22e3b21b1`)	2025-03-13 14:00:48 +01:00
Aleksandra Martyniuk	1957dac2b4	test: add new cases to tablet_repair tests Add tests for tablet repair with host and dc filters that select one or no replica. (cherry picked from commit `c7c6d820d7`)	2025-03-05 10:59:00 +01:00
Aleksandra Martyniuk	1091ef89e1	test: extract repair check to function (cherry picked from commit `c40eaa0577`)	2025-03-05 10:59:00 +01:00
Aleksandra Martyniuk	7fdc7bdc4b	test: add test to check dcs and hosts repair filter (cherry picked from commit `e499f7c971`)	2025-02-27 12:14:47 +01:00
Aleksandra Martyniuk	c2e926850d	test: add repair dc selection to test_tablet_metadata_persistence (cherry picked from commit `1c8a41e2dd`)	2025-02-27 12:14:47 +01:00
Evgeniy Naydanov	871fabd60a	test.py: test_random_failures: improve handling of hung node In some cases the paused/unpaused node can hang not after 30s timeout. This makes the test flaky. Change the condition to always check the coordinator's log if there is a hung node. Add `stop_after_streaming` to the list of error injections which can cause a node to hang. Also add a wait for a new coordinator election in cluster events which cause such elections. Closes scylladb/scylladb#22825 (cherry picked from commit `99be9ac8d8`) Closes scylladb/scylladb#23007	2025-02-25 14:31:51 +03:00
Pavel Emelyanov	aa5cb15166	Merge 'Alternator: implement UpdateTable operation to add or delete GSI' from Nadav Har'El In this series we implement the UpdateTable operation to add a GSI to an existing table, or remove a GSI from a table. As the individual commit messages will explain, this required changing how Alternator stores materialized view keys - instead of insisting that these keys must be real columns (that is not the case when adding a GSI to an existing table), the materialized view can now take as its key any Alternator attribute serialized inside the ":attrs" map holding all non-key attributes. Fixes #11567. We also fix the IndexStatus and Backfilling attributes returned by DescribeTable - as DynamoDB API users use this API to discover when a newly added GSI completed its "backfilling" (what we call "view building") stage. Fixes #11471. This series should not be backported lightly - it's a new feature and required fairly large and intrusive changes that can introduce bugs to use cases that don't even use Alternator or its UpdateTable operations - every user of CQL materialized views or secondary indexes, as well as Alternator GSI or LSI, will use modified code. It should be backported to 2025.1, though - this version was actually branched long after this PR was sent, and it provides a feature that was promised for 2025.1. 
Closes scylladb/scylladb#21989 * github.com:scylladb/scylladb: alternator: fix view build on oversized GSI key attribute mv: clean up do_delete_old_entry test/alternator: unflake test for IndexStatus test/alternator: work around unrelated bug causing test flakiness docs/alternator: adding a GSI is no longer an unimplemented feature test/alternator: remove xfail from all tests for issue 11567 alternator: overhaul implementation of GSIs and support UpdateTable mv: support regular_column_transformation key columns in view alternator: add new materialized-view computed column for item in map build: in cmake build, schema needs alternator build: build tests with Alternator alternator: add function serialized_value_if_type() mv: introduce regular_column_transformation, a new type of computed column alternator: add IndexStatus/Backfilling in DescribeTable alternator: add "LimitExceededException" error type docs/alternator: document two more unimplemented Alternator features (cherry picked from commit `529ff3efa5`) Closes scylladb/scylladb#22826	2025-02-18 19:05:21 +02:00
Nadav Har'El	35b410326b	test/topology_custom: fix very slow test test_localnodes_broadcast_rpc_address The test topology_custom/test_alternator::test_localnodes_broadcast_rpc_address sets up nodes with a silly "broadcast rpc address" and checks that Alternator's "/localnodes" request returns it correctly. The problem is that although we don't use CQL in this test, the test framework does open a CQL connection when the test starts, and closes it when it ends. It turns out that when we set a silly "broadcast RPC address", the driver tends to try to connect to it when shutting down, I'm not even sure why. But the choice of 1.2.3.4 as the silly address is unfortunate, because this IP address is actually routable - and the driver hangs until it times out (in practice, in a bit over two minutes). This trivial patch changes 1.2.3.4 to 127.0.0.0 - an equally silly address but one to which connections fail immediately. Before this patch, the test often takes more than 2 minutes to finish on my laptop, after this patch, it always finishes in 4-5 seconds. Fixes #22744 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#22746 (cherry picked from commit `f89235517d`) Closes scylladb/scylladb#22875	2025-02-18 10:33:21 +02:00
Asias He	b50a6657e8	repair: Add await_completion option for tablet_repair api Set true to wait for the repair to complete. Set false to skip waiting for the repair to complete. When the option is not provided, it defaults to false. It is useful for management tool that wants the api to be async. Fixes #22418 Closes scylladb/scylladb#22436 (cherry picked from commit `fb318d0c81`) Closes scylladb/scylladb#22851	2025-02-18 10:31:53 +02:00
Botond Dénes	93479ffcf9	Merge '[Backport 2025.1] raft/group0_state_machine: load current RPC compression dict on startup' from Michał Chojnowski We are supposed to be loading the most recent RPC compression dictionary on startup, but we forgot to port the relevant piece of logic during the source-available port. This causes a restarted node not to use the dictionary for RPC compression until the next dictionary update. Fix that. Fixes #22738 This is more of a bugfix than an improvement, so it should be backported to 2025.1. * (cherry picked from commit [`dd82b40`](`dd82b40186`)) * (cherry picked from commit [`8fb2ea6`](`8fb2ea61ba`)) Additionally cherry picked https://github.com/scylladb/scylladb/pull/22836 to fix the timeout. Parent PR: #22739 Closes scylladb/scylladb#22837 * github.com:scylladb/scylladb: test_rpc_compression.py: fix an overly-short timeout test_rpc_compression.py: test the dictionaries are loaded on startup raft/group0_state_machine: load current RPC compression dict on startup	2025-02-18 10:31:23 +02:00
Botond Dénes	38bd74b2d4	tools/scylla-nodetool: netstats: don't assume both senders and receivers The code currently assumes that a session has both sender and receiver streams, but it is possible to have just one or the other. Change the test to include this scenario and remove this assumption from the code. Fixes: #22770 Closes scylladb/scylladb#22771 (cherry picked from commit `87e8e00de6`) Closes scylladb/scylladb#22874	2025-02-17 14:34:36 +02:00
Botond Dénes	c627aff5f7	Merge '[Backport 2025.1] reader_concurrency_semaphore: set_notify_handler(): disable timeout ' from Scylladb[bot] `set_notify_handler()` is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. This latter was not disabling the pre-existing timeout of the permit (if any) and this would lead to premature eviction of the cache entry if the timeout was shorter than TTL (which is typical). Disable the timeout before setting the TTL to prevent premature eviction. Fixes: https://github.com/scylladb/scylladb/issues/22629 Backport required to all active releases, they are all affected. - (cherry picked from commit `a3ae0c7cee`) - (cherry picked from commit `9174f27cc8`) Parent PR: #22701 Closes scylladb/scylladb#22752 * github.com:scylladb/scylladb: reader_concurrency_semaphore: set_notify_handler(): disable timeout reader_permit: mark check_abort() as const	2025-02-13 15:24:54 +02:00
Michał Chojnowski	ffca4a9f85	test_rpc_compression.py: fix an overly-short timeout The timeout of 10 seconds is too small for CI. I didn't mean to make it so short, it was an accident. Fix that by changing the timeout to 10 minutes.	2025-02-13 10:03:13 +01:00
Botond Dénes	1998733228	service: query_pager: fix last-position for filtering queries On short-pages, cut short because of a tombstone prefix. When page-results are filtered and the filter drops some rows, the last-position is taken from the page visitor, which does the filtering. This means that last partition and row position will be that of the last row the filter saw. This will not match the last position of the replica, when the replica cut the page due to tombstones. When fetching the next page, this means that all the tombstone suffix of the last page, will be re-fetched. Worse still: the last position of the next page will not match that of the saved reader left on the replica, so the saved reader will be dropped and a new one created from scratch. This wasted work will show up as elevated tail latencies. Fix by always taking the last position from raw query results. Fixes: #22620 Closes scylladb/scylladb#22622 (cherry picked from commit `7ce932ce01`) Closes scylladb/scylladb#22719	2025-02-13 09:40:05 +02:00
Botond Dénes	d05b3897a2	Merge '[Backport 2025.1] api: task_manager: do not unregister finish task when its status is queried' from Scylladb[bot] Currently, when the status of a task is queried and the task is already finished, it gets unregistered. Getting the status shouldn't be a one-time operation. Stop removing the task after its status is queried. Adjust tests not to rely on this behavior. Add task_manager/drain API and nodetool tasks drain command to remove finished tasks in the module. Fixes: https://github.com/scylladb/scylladb/issues/21388. It's a fix to task_manager API, should be backported to all branches - (cherry picked from commit `e37d1bcb98`) - (cherry picked from commit `18cc79176a`) Parent PR: #22310 Closes scylladb/scylladb#22598 * github.com:scylladb/scylladb: api: task_manager: do not unregister tasks on get_status api: task_manager: add /task_manager/drain	2025-02-13 09:38:12 +02:00
Botond Dénes	9116fc635e	Merge '[Backport 2025.1] split: run set_split_mode() on all storage groups during all_storage_groups_split()' from Scylladb[bot] `tablet_storage_group_manager::all_storage_groups_split()` calls `set_split_mode()` for each of its storage groups to create split ready compaction groups. It does this by iterating through storage groups using `std::ranges::all_of()` which is not guaranteed to iterate through the entire range, and will stop iterating on the first occurrence of the predicate (`set_split_mode()`) returning false. `set_split_mode()` creates the split compaction groups and returns false if the storage group's main compaction group or merging groups are not empty. This means that in cases where the tablet storage group manager has non-empty storage groups, we could have a situation where split compaction groups are not created for all storage groups. The missing split compaction groups are later created in `tablet_storage_group_manager::split_all_storage_groups()` which also calls `set_split_mode()`, and that is the reason why split completes successfully. The problem is that `tablet_storage_group_manager::all_storage_groups_split()` runs under a group0 guard, but `tablet_storage_group_manager::split_all_storage_groups()` does not. This can cause problems with operations which should exclude with compaction group creation. i.e. DROP TABLE/DROP KEYSPACE Fixes #22431 This is a bugfix and should be back ported to versions with tablets: 6.1 6.2 and 2025.1 - (cherry picked from commit `24e8d2a55c`) - (cherry picked from commit `8bff7786a8`) Parent PR: #22330 Closes scylladb/scylladb#22560 * github.com:scylladb/scylladb: test: add reproducer and test for fix to split ready CG creation table: run set_split_mode() on all storage groups during all_storage_groups_split()	2025-02-13 09:36:23 +02:00
Raphael S. Carvalho	5f74b5fdff	test: Use linux-aio backend again on seastar-based tests Since mid December, tests started failing with ENOMEM while submitting I/O requests. Logs of failed tests show IO uring was used as backend, but we never deliberately switched to IO uring. Investigation pointed to it happening accidentally in commit `1bac6b75dc`, which turned on IO uring for allowing native tool in production, and picked linux-aio backend explicitly when initializing Scylla. But it missed that seastar-based tests would pick the default backend, which is io_uring once enabled. There's a reason we never made io_uring the default, which is that it's not stable enough, and turns out we made the right choice back then and it apparently continues to be unstable, causing flakiness in the tests. Let's undo that accidental change in tests by explicitly picking the linux-aio backend for seastar-based tests. This should hopefully bring back stability. Refs #21968. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#22695 (cherry picked from commit `ce65164315`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#22800	2025-02-12 20:50:51 +02:00
Michał Chojnowski	a746fd2bb8	test_rpc_compression.py: test the dictionaries are loaded on startup Reproduces scylladb/scylladb#22738 (cherry picked from commit `8fb2ea61ba`)	2025-02-11 15:52:34 +00:00
Michael Litvak	8d1f6df818	test/test_view_build_status: fix flaky asserts In a few test cases of test_view_build_status we create a view, wait for it and then query the view_build_status table and expect it to have all rows for each node and view. But it may fail because it could happen that the wait_for_view query and the following queries are done on different nodes, and some of the nodes didn't apply all the table updates yet, so they have missing rows. To fix it, we change the assert to work in the eventual consistency sense, retrying until the number of rows is as expected. Fixes scylladb/scylladb#22644 Closes scylladb/scylladb#22654 (cherry picked from commit `c098e9a327`) Closes scylladb/scylladb#22780	2025-02-11 10:21:54 +01:00
Botond Dénes	fa9b1800b6	reader_concurrency_semaphore: set_notify_handler(): disable timeout set_notify_handler() is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. This latter was not disabling the pre-existing timeout of the permit (if any) and this would lead to premature eviction of the cache entry if the timeout was shorter than TTL (which is typical). Disable the timeout before setting the TTL to prevent premature eviction. Fixes: #scylladb/scylladb#22629 (cherry picked from commit `9174f27cc8`)	2025-02-09 00:32:38 +00:00
Botond Dénes	319626e941	reader_concurrency_semaphore: with_permit(): proper clean-up after queue overload with_permit() creates a permit, with a self-reference, to avoid attaching a continuation to the permit's run function. This self-reference is used to keep the permit alive, until the execution loop processes it. This self reference has to be carefully cleared on error-paths, otherwise the permit will become a zombie, effectively leaking memory. Instead of trying to handle all loose ends, get rid of this self-reference altogether: ask caller to provide a place to save the permit, where it will survive until the end of the call. This makes the call-site a little bit less nice, but it gets rid of a whole class of possible bugs. Fixes: #22588 Closes scylladb/scylladb#22624 (cherry picked from commit `f2d5819645`) Closes scylladb/scylladb#22704	2025-02-06 10:08:19 +02:00
Aleksandra Martyniuk	cca2d974b6	service: use read barrier in tablet_virtual_task::contains Currently, when the tablet repair is started, info regarding the operation is kept in the system.tablets. The new tablet states are reflected in memory after load_topology_state is called. Before that, the data in the table and the memory aren't consistent. To check the supported operations, tablet_virtual_task uses in-memory tablet_metadata. Hence, it may not see the operation, even though its info is already kept in system.tablets table. Run read barrier in tablet_virtual_task::contains to ensure it will see the latest data. Add a test to check it. Fixes: #21975. Closes scylladb/scylladb#21995 (cherry picked from commit `610a761ca2`) Closes scylladb/scylladb#22694	2025-02-06 10:07:51 +02:00
Michael Litvak	246635c426	test/test_view_build_status: fix wrong assert in test The test expects and asserts that after wait_for_view is completed we read the view_build_status table and get a row for each node and view. But this is wrong because wait_for_view may have read the table on one node, and then we query the table on a different node that didn't insert all the rows yet, so the assert could fail. To fix it we change the test to retry and check that eventually all expected rows are found and then eventually removed on the same host. Fixes scylladb/scylladb#22547 Closes scylladb/scylladb#22585 (cherry picked from commit `44c06ddfbb`) Closes scylladb/scylladb#22608	2025-02-03 09:24:17 +01:00
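
The short-circuit problem described in commit 9116fc635e (`std::ranges::all_of()` stopping at the first `set_split_mode()` that returns false) can be illustrated with a small Python analogy. This is only a sketch, not the Scylla code: `StorageGroup`, its `empty` flag, and both helper functions are hypothetical stand-ins, and Python's built-in `all()` is used here because it short-circuits in the same way.

```python
class StorageGroup:
    """Hypothetical stand-in for a tablet storage group."""

    def __init__(self, empty):
        self.empty = empty          # assumed: True when main/merging groups are empty
        self.split_mode = False     # set once split-ready groups were "created"

    def set_split_mode(self):
        # Side effect first (create split-ready compaction groups),
        # then report whether the group is already fully split.
        self.split_mode = True
        return self.empty


def all_split_short_circuit(groups):
    # all() over a generator stops at the first False result, so later
    # groups never get their set_split_mode() side effect.
    return all(g.set_split_mode() for g in groups)


def all_split_visit_all(groups):
    # Evaluate the predicate on every group first, then combine the results.
    results = [g.set_split_mode() for g in groups]
    return all(results)


buggy = [StorageGroup(empty=False), StorageGroup(empty=True)]
fixed = [StorageGroup(empty=False), StorageGroup(empty=True)]

buggy_result = all_split_short_circuit(buggy)
fixed_result = all_split_visit_all(fixed)

print(buggy_result, [g.split_mode for g in buggy])
print(fixed_result, [g.split_mode for g in fixed])
```

Both variants return the same boolean; the difference is only in which groups had the side effect applied. With the short-circuiting form, the second group is skipped because the first one already returned false.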


8210 Commits