scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 13:37:04 +00:00

Author	SHA1	Message	Date
Andrei Chekun	ca615af407	test.py: refactor resource_gather.py Refactor resource_gather.py to not create the initial cgroup when the process is already in it. This avoids going ever deeper by creating the same cgroup again and again with each test.py execution while the terminal stays open. Create its own event loop if one does not exist. This is needed to work both with test.py, which creates a loop, and with pytest, which does not.	2025-04-24 14:05:49 +02:00
Wojciech Mitros	ee5883770a	test: remove flakiness from test_schema_is_recovered_after_dying Due to the changes in creating schemas with base info, the test_schema_is_recovered_after_dying seems to be flaky when checking that the schema is actually lost after 'grace_period'. We don't actually guarantee that the schema will be lost at that exact moment, so there's no reason to test this. To remove the flakiness, we remove the check and the related sleep, which should also slightly improve the speed of this test.	2025-04-24 01:09:35 +02:00
Wojciech Mitros	bf7bba9634	mv: add a test for dropping an index while it's building Dropping an index is a schema change of its base table and a schema drop of the index's materialized view. This combination of schema changes used to cause issues during view building, because when a view schema was dropped, it wasn't getting updated with the new version of the base schema, and while the view building was in progress, we would update the base schema for the base table mutation reader and try generating updates with a view schema that wasn't compatible with the base schema, failing on an `on_internal_error`. In this patch we add a test for this scenario. We create an index, halt its view building process using an injection, and drop it. If no errors are thrown, the test succeeds. The test was failing before https://github.com/scylladb/scylladb/pull/23337 and is passing afterwards.	2025-04-24 01:09:32 +02:00
Wojciech Mitros	d77f11d436	base_info: remove the lw_shared_ptr variant The base_dependent_view_info no longer needs to be shared or modified in the view_info, so we no longer need to keep it as a shared pointer.	2025-04-24 01:08:40 +02:00
Wojciech Mitros	d7bd86591e	view_info: don't re-set base_info after construction In the previous commits we made sure that the base info is not dependent on the base schema version, and the info dependent on the base schema version is calculated when it's needed. In this patch we remove the unnecessary re-setting of the base_info. The set_base_info method isn't removed completely, because it also has a secondary function - zeroing the view_info fields other than base_info. Because of this, in this patch we rename it accordingly and limit its use to the updates caused by a base schema change.	2025-04-24 01:08:40 +02:00
Wojciech Mitros	05fce91945	schema_registry: store base info instead of base schema for view entries In the following patch we plan to remove the base schema from the base_info to make the base_info immutable. To do that, we first prepare the schema registry for the change; we need to be able to create view schemas from frozen schemas there and frozen schemas have no information about the base table. Without this change, after base schemas are removed from the base info, we'll no longer be able to load a view schema to the schema registry without looking up the base schema in the database. This change also required some updates to schema building: * we add a method for unfreezing a view schema with base info instead of a base schema * we make it possible to use schema_builder with a base info instead of a base schema * we add a method for creating a view schema from mutations with a base info instead of a base schema * we add a view_info constructor with a base info instead of a base schema * we update the naming in schema_registry to reflect the usage of base info instead of base schema	2025-04-24 01:08:39 +02:00
Wojciech Mitros	900687c818	view_info: set base info on construction Currently, the base_info may or may not be set in view schemas. Even when it's set, it may be modified. This necessitates extra checks when handling view schemas, as well as potentially causing errors when we forget to set it at some point. Instead, we want to make the base info an immutable member of view schemas (inside view_info). The first step towards that is making sure that all newly created schemas have the base info set. We achieve that by requiring a base schema when constructing a view schema. Unfortunately, this adds complexity each time we're making a view schema - we need to get the base schema as well. In most cases, the base schema is already available. The most problematic scenario is when we create a schema from mutations: - when parsing system tables we can get the schema from the database, as regular tables are parsed before views - when loading a view schema using the schema loader tool, we need to load the base schema in addition to the view schema, effectively doubling the work - when pulling the schema from another node - in this case we can only get the current version of the base schema from the local database Additionally, we need to consider the base schema version - when we generate view updates the version of the base schema used for reads should match the version of the base schema in view's base info. This is achieved by selecting the correct (old or new) schema in `db::schema_tables::merge_tables_and_views` and using the stored base schema in the schema_registry.	2025-04-24 01:08:39 +02:00
Benny Halevy	f279625f59	test_tablets_cql: test_alter_dropped_tablets_keyspace: extend expected error The query may fail also on a no_such_keyspace exception, which generates the following cql error: ``` Error from server: code=2200 [Invalid query] message="Can\'t find a keyspace test_1745198244144_qoohq" ``` Extend the pytest.raises match expression to include this error as well. Fixes #23812 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#23875	2025-04-23 18:54:22 +03:00
Aleksandra Martyniuk	c1618c7de5	test: test table drop during flush	2025-04-23 14:29:28 +02:00
Nadav Har'El	64a5eee6b9	test/cqlpy: insert test names into Scylla logs Both test.py and test/cqlpy/run run many test functions against the same Scylla process. In the resulting log file, it is hard to understand which log messages are related to which test. In this patch, we log a message (using the "/system/log" REST API) every time a test is started or ends. The messages look like this: INFO 2025-04-22 15:10:44,625 [shard 1:strm] api - /system/log: test/cqlpy: Starting test_lwt.py::test_lwt_missing_row_with_static ... INFO 2025-04-22 15:10:44,631 [shard 0:strm] api - /system/log: test/cqlpy: Ended test_lwt.py::test_lwt_missing_row_with_static We already had a similar feature in test/alternator, added three years ago in commit `b0371b6bf8`. The implementation is similar but not identical due to different available utility functions, and in any case it's very simple. While at it, this patch also fixes the has_rest_api() to timeout after one second. Without this, if the REST API is blocked in a way that a connection attempt just hangs, the tests can hang. With the new timeout, the test will hang for a second, realize the REST API is not available, and remember this decision (the next tests will not wait one second again). We had the same bug in Alternator, and fixed it in `758f8f01d7`. This one second "pause" will only happen if the REST API port is blocked - in the more typical case the REST API port is just not listening but not blocked, and the failure will be noticed immediately and won't wait a whole second. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23857	2025-04-23 12:04:14 +03:00
Piotr Dulikowski	3d73c79a72	test: mv: skip test_view_building_scheduling_group in debug The test populates a table with 50k rows, creates a view on that table and then compares the time spent in streaming vs. gossip scheduling groups. It only takes 10s in dev mode on my machine, but is much slower in debug mode in CI - building the view doesn't finish within 2 minutes. The bigger the view to build, the more accurate the measurement; moreover, the test scenario isn't interesting enough to be worth running in debug mode, as this should be covered by other tests. Therefore, just skip this test in debug mode. Fixes: scylladb/scylladb#23862 Closes scylladb/scylladb#23866	2025-04-23 11:29:35 +03:00
Pavel Emelyanov	a6ba535c3c	Merge 'test.py: refactoring before boost pytest integration' from Andrei Chekun This PR contains changes that do not add new functionality, only small refactorings of the existing code. The most significant change, though, is switching the SQLite writer from a singleton to a thread locking mechanism that will be needed later on. This PR is an extraction of several commits from https://github.com/scylladb/scylladb/pull/22894, as a reviewer [requested](https://github.com/scylladb/scylladb/pull/22894?notification_referrer_id=NT_kwDOACiLR7MxNDg0ODk2MDU1MjoyNjU3MDk1&notifications_query=reason%3Aparticipating#pullrequestreview-2778582278). Closes scylladb/scylladb#23867 * github.com:scylladb/scylladb: test.py: move the readme file for LDAP tests to the correct location test.py: eliminate deprecation warning for xml.etree.ElementTree.Element test.py: align the behavior of max-failures parameter with pytest maxfail test.py: fix typo in toxiproxy name parameter test.py: add locking to the sqlite writer for resource gather test.py: add sqlite datetime adapter for resource gather test.py: change the parameter for get_modes_to_run()	2025-04-23 11:10:56 +03:00
Andrzej Jackowski	3c69340b8c	test: add test_long_query_timeout_erm This commit adds a test to verify that a query with long timeout doesn't block ERM on failure. The motivation for the test is fixing scylladb#21831. This commit: - add test_long_query_timeout_erm	2025-04-23 09:29:47 +02:00
Andrzej Jackowski	1f1e4f09cd	test: add get_cql_exclusive to manager_client.py This commit adds to ManagerClient a get_cql_exclusive function that allows creating a cql connection with WhiteListRoundRobinPolicy for a single server. Such a connection is useful in tests that kill nodes, to make sure that the live node handles the queries. Before this commit, some tests used cluster_con from test/cluster/conftest.py; after this commit, tests can start using a method from ManagerClient instead. This change: - Extend ManagerClient con_gen type to allow LoadBalancingPolicy arg - Implement get_cql_exclusive()	2025-04-23 09:29:47 +02:00
Andrei Chekun	57b66e6b2e	test.py: move the readme file for LDAP tests to the correct location The README file was created in the wrong location; it is now moved to the directory with the source files, where it was intended to be.	2025-04-22 19:03:28 +02:00
Andrei Chekun	cf4747c151	test.py: eliminate deprecation warning for xml.etree.ElementTree.Element Testing the truth value of an Element emits DeprecationWarning. Perform this check correctly instead, without relying on the Element's truth value.	2025-04-22 19:03:21 +02:00
Andrei Chekun	5c3501e4bf	test.py: fix typo in toxiproxy name parameter Fix a typo in the toxiproxy name parameter. No functional changes, just a cosmetic fix.	2025-04-22 19:02:12 +02:00
Andrei Chekun	2c37a793d1	test.py: add locking to the sqlite writer for resource gather SQLite locks the database during writes, so it is not possible to write from several threads at the same time. To be able to gather metrics in several threads, we need a locking mechanism for threads during writes, so that a thread will not try to write metrics while another thread is performing writes.	2025-04-22 19:01:30 +02:00
Andrei Chekun	800710dc2c	test.py: add sqlite datetime adapter for resource gather Add an sqlite datetime adapter for resource gather, since the default adapters are deprecated as of Python 3.12.	2025-04-22 18:59:49 +02:00
Andrei Chekun	bf2a9e267e	test.py: change the parameter for get_modes_to_run() Change the parameter for get_modes_to_run() from session to config to narrow the scope, and to prepare it for later use in methods that do not have access to the session but do have access to the config object.	2025-04-22 18:58:33 +02:00
Pavel Emelyanov	65efd2b2f6	Merge 'Refactor and enhance s3_tests' from Ernest Zaslavsky This PR introduces a cleanup mechanism in s3_tests to remove uploaded objects after the test completes, ensuring a clean testing environment. Additionally, the recently added test has been refactored and split into smaller, more maintainable parts, improving readability and extending its coverage to include the "proxied" case. As these changes primarily improve code aesthetics and maintainability, backporting is not necessary. Refs: https://github.com/scylladb/scylladb/issues/23830 Closes scylladb/scylladb#23828 * github.com:scylladb/scylladb: s3_tests: Improve and extend copy object test coverage s3_tests: Implement post-test cleanup for uploaded objects	2025-04-22 16:40:37 +03:00
Nadav Har'El	8d1a413357	test/scylla_gdb: better error message when running on dev build mode The test/scylla_gdb suite needs Scylla to have been built with debug symbols - which is NOT the case for the dev build. So the script test/scylla_gdb/run attempts to recognize when a developer runs it on an executable with the debug symbols missing - and prints a clear error. Unfortunately, as we noticed in #10863, and again in #23832, because wasmtime is compiled with debug symbols and linked with Scylla, build/dev/scylla "pretends" to have debug symbols, foiling the check in test/scylla_gdb/run. Reviewers rejected two solutions to this problem (pull requests #10865 and #10923), so in pull request #10937 I added a cosmetic solution just for test/scylla_gdb: in test/scylla_gdb/conftest.py we check that there are really debug symbols that interest us, and if not, exit immediately instead of failing each test separately. For some reason, the sys.exit() we used is no longer effective - it no longer exits pytest, so in this patch we use pytest.exit() instead. Fixes #23832 (sort of, we leave build/dev/scylla with the fake claim that it has debug symbols, but test/scylla_gdb will handle this situation more gracefully). Closes scylladb/scylladb#23834	2025-04-22 15:02:06 +03:00
Michael Litvak	5c1d24f983	test: test_mv_topology_change: increase timeout for remove_node The test `test_mv_write_to_dead_node` currently uses a timeout of 60 seconds for remove_node, after it was increased from 30 seconds to fix scylladb/scylladb#22953. Apparently it is still too low, and it was observed to fail in debug mode. Normally remove_node uses a default timeout of TOPOLOGY_TIMEOUT = 1000 seconds, but the test requires a timeout which is shorter than 5 minutes, because it is a regression test for an issue where MV updates hold topology changes for more than 5 minutes, and we want to verify in the test that the topology change completes in less than 5 minutes. To resolve the issue, we set the test to skip in debug mode, because the remove node operation is unpredictably slow, and we increase the timeout to 180 seconds which is hopefully enough time for remove_node in non-debug modes, and still sufficient to satisfy the test requirements. Fixes scylladb/scylladb#22530 Closes scylladb/scylladb#23833	2025-04-22 10:51:19 +02:00
Ernest Zaslavsky	edaa3f4bdd	s3_tests: Improve and extend copy object test coverage Refactored the copy object test to enhance readability and maintainability. The test was simplified and split into smaller, more focused parts. Additionally, a "proxied" variant of the test was introduced to expand coverage.	2025-04-21 20:54:14 +03:00
Ernest Zaslavsky	252a0a14af	s3_tests: Implement post-test cleanup for uploaded objects Ensure cleanup after tests by deleting objects uploaded to MinIO. This improves resource management and maintains a clean test environment.	2025-04-21 20:54:14 +03:00
Avi Kivity	2dcd2b21ae	Merge 'tablets: Equalize per-table balance when allocating tablets for a new table' from Tomasz Grabiec Fixes the following scenario: 1. Scale out adds new nodes to each rack 2. Table is created - all tablets are allocated to new nodes because they have low load 3. Rebalancing moves tablets from old nodes to new nodes - table balance for the new table is not fixed We're wrong to try to equalize global load when allocating tablets, and we should equalize per-table load instead, and let background load balancing fix it in a fair way. It will add to the allocated storage imbalance, but: 1. The table is initially empty, so doesn't impact actual storage imbalance. 2. It's more important to avoid overloading CPU on the nodes - imbalance hurts this aspect immediately. 3. If the table was created before imbalance was formed, we would end up in the same situation as in the problematic scenario after the patch. 4. It's the job of the load balancing to keep up with storage growing, and if it's not, scale out should kick in. Before we have CPU-aware tablet allocation, and thus can prove we have CPU capacity on the small nodes, we should respect per-table balance as this is the way in which we achieve full CPU utilization. Fixes #23631 Backport to 2025.1 because load imbalance is a serious problem in production. Closes scylladb/scylladb#23708 * github.com:scylladb/scylladb: tablets: Equalize per-table balance when allocating tablets for a new table load_sketch: Tolerate missing tablet_map when selecting for a given table tests: tablets: Simplify tests by moving common code to topology_builder	2025-04-21 17:06:30 +03:00
Pavel Emelyanov	eb5b52f598	Merge 'main: make DC and rack immutable after bootstrap' from Piotr Dulikowski Changing DC or rack on a node which was already bootstrapped is, in case of vnodes, very unsafe (almost guaranteed to cause data loss or unavailability), and is outright not supported if the cluster has tablet-backed keyspaces. Moreover, the possibility of doing that makes it impossible to uphold some of the invariants promised by the RF-rack-valid flag, which is eventually going to become unconditionally enabled. Get rid of the above problems by removing the possibility of changing the DC / rack of a node. A node will now fail to start if its snitch reports a different DC or rack than the one that was reported during the first boot. Fixes: scylladb/scylladb#23278 Fixes: scylladb/scylladb#22869 Marking for backport to 2025.1, as this is a necessary part of the RF-rack-valid saga. Closes scylladb/scylladb#23800 * github.com:scylladb/scylladb: doc: changing topology when changing snitches is no longer supported test: cluster: introduce test_no_dc_rack_change storage_service: don't update DC/rack in update_topology_with_local_metadata main: make dc and rack immutable after bootstrap test: cluster: remove test_snitch_change	2025-04-21 15:52:55 +03:00
Avi Kivity	0ba3ce1741	test: gdb: avoid using `file(1)` to determine if debug information is present The scylla_gdb tests verify, as a sanity check, that the executable was built with debug information. They do so via file(1). In Fedora 42, file(1) crashes on ELF files that have interpreter pathnames larger than 128 characters[1]. This was later fixed[2], but the fix is not in any release. Work around the problem by using objdump instead of file. [1] https://bugzilla.redhat.com/show_bug.cgi?id=2354970 [2] `b3384a1fbf` Closes scylladb/scylladb#23823	2025-04-21 13:29:27 +03:00
Andrei Chekun	441cee8d9c	test.py: fix gathering logs in case of fail Currently, log file names contain the run_id twice: cluster.object_store_test_backup.10.test_abort_restore_with_rpc_error.dev.10_cluster.log However, sometimes the first run_id can be incorrect: cluster.object_store_test_backup.1.test_abort_restore_with_rpc_error.dev.10_cluster.log Remove the first run_id from the name to avoid this issue, and because it is redundant anyway. Remove the creation of an empty file for the scylla manager log, since it is redundant and was based on an incorrect assumption about the root cause of the failure. Add an extension to the stacktrace file, so that Jenkins opens it in a new browser tab instead of downloading it. Fixes: https://github.com/scylladb/scylladb/issues/23731 Closes scylladb/scylladb#23797	2025-04-21 13:12:35 +03:00
Pavel Emelyanov	09caad6147	test: Remove sstable_assertions::get_stats_metadata() It mirrors the sstable method of the same name, which is public. With -> operator, it's just as convenient to call it directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-04-18 18:53:41 +03:00
Pavel Emelyanov	294e56207d	test: Add sstable_assertions::operator->() ... and replace get_sstable() with it. It's more natural (despite having the only user) to consider the class to be yet another "pointer" to an sstable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-04-18 18:52:39 +03:00
Sergey Zolotukhin	2314feeae2	test: Ignore DEBUG,TRACE,INFO level messages when checking for failed mutations. Update the regular expression in `check_node_log_for_failed_mutations` to avoid false test failures when DEBUG-level logging is enabled. Fixes scylladb/scylladb#23688 Closes scylladb/scylladb#23658	2025-04-18 16:17:41 +03:00
Calle Wilund	4a44651fce	encryption_at_rest_test: Make fake_proxy read/write loop noexcept Fixes #23774 Test code falls into the same when_all issue as the http client did. Avoid passing exceptions through it, and instead catch and report them in the worker lambda. Closes scylladb/scylladb#23778	2025-04-18 16:17:41 +03:00
Pavel Emelyanov	324daac156	Merge 'Add CopyObject API implementation to S3 client' from Ernest Zaslavsky Implement the CopyObject API to directly copy an S3 object from one location to another. This implementation incurs no networking overhead on the client side, since the object is copied internally by the S3 machinery. Usage example: backup of tiered SSTables - the SSTables are already on S3, so CopyObject is the ideal way to go. No need to backport, since we are adding new functionality for future use. Closes scylladb/scylladb#23779 * github.com:scylladb/scylladb: s3_client: implement S3 copy object s3_client: improve exception message s3_client: reposition local function for future use	2025-04-18 16:17:41 +03:00
Pavel Emelyanov	cc919b08c2	Merge 'backup: Optimize S3 throughput with shard-based upload' from Ernest Zaslavsky This PR enhances S3 throughput by leveraging every available shard to upload backup files concurrently. By distributing the load across multiple shards, we significantly improve the upload performance. Each shard retrieves an SSTable and processes its files sequentially, ensuring efficient, file-by-file uploads. To prevent uncontrolled fiber creation and potential resource exhaustion, the backup task employs a directory semaphore from the sstables_manager. This mechanism helps regulate concurrency at the directory level, ensuring stable and predictable performance during large-scale backup operations. Refs #22460 fixes: #22520 ``` =========================================== Release build, master, smp-16, mem-32GiB Bytes: 2342880184, backup time: 9.51 s =========================================== Release build, this PR, smp-16, mem-32GiB Bytes: 2342891015, backup time: 1.23 s =========================================== ``` That is roughly 7.7x faster (9.51 s vs. 1.23 s). No backport needed, since it (native backup) is still unused functionality. Closes scylladb/scylladb#23727 * github.com:scylladb/scylladb: backup: Add test for invalid endpoint backup_task: upload on all shards backup_task: integrate sharded storage manager for upload	2025-04-18 16:17:41 +03:00
Avi Kivity	6b415cfd4b	Merge 'managed_bytes: in the copy constructor, respect the target preferred allocation size' from Michał Chojnowski Commit `14bf09f447` added a single-chunk layout to `managed_bytes`, which makes the overhead of `managed_bytes` smaller in the common case of a small buffer. But there was a bug in it. In the copy constructor of `managed_bytes`, a copy of a single-chunk `managed_bytes` is made single-chunk too. But this is wrong, because the source of the copy and the target of the copy might have different preferred max contiguous allocation sizes. In particular, if a `managed_bytes` of size between 13 kiB and 128 kiB is copied from the standard allocator into LSA, the resulting `managed_bytes` is a single chunk which violates LSA's preferred allocation size (and is therefore placed by LSA in the standard allocator). In other words, since Scylla 6.0, cache and memtable cells between 13 kiB and 128 kiB are getting allocated in the standard allocator rather than inside LSA segments. Consequences of the bug: 1. Effective memory consumption of an affected cell is rounded up to the nearest power of 2. 2. With a pathological-enough allocation pattern (for example, one which somehow ends up placing a single 16 kiB memtable-owned allocation in every aligned 128 kiB span), memtable flushing could theoretically deadlock, because the allocator might be too fragmented to let the memtable grow by another 128 kiB segment, while keeping the sum of all allocations small enough to avoid triggering a flush. (Such an allocation pattern probably wouldn't happen in practice, though.) 3. It triggers a bug in reclaim which results in spurious allocation failures despite ample evictable memory. There is a path in the reclaimer procedure where we check whether reclamation succeeded by checking that the number of free LSA segments grew. But in the presence of evictable non-LSA allocations, this is wrong because the reclaim might have met its target by evicting the non-LSA allocations, in which case memory is returned directly to the standard allocator, rather than to the pool of free segments. If that happens, the reclaimer wrongly returns `reclaimed_nothing` to Seastar, which fails the allocation. Refs (possibly fixes) https://github.com/scylladb/scylladb/issues/21072 Fixes https://github.com/scylladb/scylladb/issues/22941 Fixes https://github.com/scylladb/scylladb/issues/22389 Fixes https://github.com/scylladb/scylladb/issues/23781 This is a regression fix and should be backported to all affected releases. Closes scylladb/scylladb#23782 * github.com:scylladb/scylladb: managed_bytes_test: add a reproducer for #23781 managed_bytes: in the copy constructor, respect the target preferred allocation size	2025-04-17 21:14:10 +03:00
Pavel Emelyanov	ca2cc5e826	Merge 'test/cluster/test_read_repair: make incremental test work with tablets' from Botond Dénes There are two tests which test incremental read repair: one with row tombstones, the other with partition tombstones. The tests currently force vnodes by creating the test keyspace with {'enabled': false}. Even so, the tests were found to be flaky, so one of them is marked for skip. This commit makes the following changes: * Make the tests use tablets by creating the test keyspace with tablets. * Change the way the tests write data so it works with tablets: currently the tests use scylla-sstable write + upload but this won't work with tablets since upload with tablets implies --load-and-stream which means data is streamed to all replicas (no difference created between nodes). Switch to the classic stop-node + write to other replica with CL=ONE. * Remove the skip added to the partition-tombstone test variant. Fixes: #21179 Test improvement, no backport required. Closes scylladb/scylladb#23167 * github.com:scylladb/scylladb: wip test/cluster/test_read_repair: make incremental test work with tablets	2025-04-17 18:54:00 +03:00
Piotr Dulikowski	796c8d1601	test: cluster: introduce test_no_dc_rack_change The test makes sure that changing the DC or rack in the snitch's configuration fails with an expected error.	2025-04-17 16:22:58 +02:00
Piotr Dulikowski	ce2fab7cce	main: make dc and rack immutable after bootstrap Changing DC or rack on a node which was already bootstrapped is, in case of vnodes, very unsafe (almost guaranteed to cause data loss or unavailability), and is outright not supported if the cluster has tablet-backed keyspaces. Moreover, the possibility of doing that makes it impossible to uphold some of the invariants promised by the RF-rack-valid flag, which is eventually going to become unconditionally enabled. Get rid of the above problems by removing the possibility of changing the DC / rack of a node. A node will now fail to start if its snitch reports a different DC or rack than the one that was reported during the first boot. Fixes: scylladb/scylladb#23278	2025-04-17 16:22:26 +02:00
Tomasz Grabiec	1e407ab4d2	tablets: Equalize per-table balance when allocating tablets for a new table Fixes the following scenario: 1. Scale out adds new nodes to each rack 2. Table is created - all tablets are allocated to new nodes because they have low load 3. Rebalancing moves tablets from old nodes to new nodes - table balance for the new table is not fixed We're wrong to try to equalize global load when allocating tablets, and we should equalize per-table load instead, and let background load balancing fix it in a fair way. It will add to the allocated storage imbalance, but: 1. The table is initially empty, so doesn't impact actual storage imbalance. 2. It's more important to avoid overloading CPU on the nodes - imbalance hurts this aspect immediately. 3. If the table was created before imbalance was formed, we would end up in the same situation as in the problematic scenario after the patch. 4. It's the job of the load balancing to keep up with storage growing, and if it's not, scale out should kick in. Before we have CPU-aware tablet allocation, and thus can prove we have CPU capacity on the small nodes, we should respect per-table balance as this is the way in which we achieve full CPU utilization. Fixes #23631	2025-04-17 16:01:23 +02:00
Ernest Zaslavsky	b79ca5a1aa	backup: Add test for invalid endpoint * During the development phase, the backup functionality broke because we lacked a test that runs backup with an invalid endpoint. This commit adds a test to cover that scenario. * Add a check that the expected error is propagated from a failing/aborted backup	2025-04-17 16:31:43 +03:00
Piotr Dulikowski	dd2e507ece	test: cluster: remove test_snitch_change This test checked that it is possible to change DC/rack of a node during restart. This will become explicitly forbidden, so remove the test.	2025-04-17 13:51:22 +02:00
Aleksandra Martyniuk	e178bd7847	test: add test for getting tasks children Add test that checks whether the children of a virtual task will be properly gathered if a node is down.	2025-04-17 13:48:44 +02:00
Michał Chojnowski	6c1889f65c	managed_bytes_test: add a reproducer for #23781	2025-04-17 12:51:01 +02:00
Nadav Har'El	84d4af1f0e	Merge 'Alternator batch rcu' from Amnon Heiman This series adds support for reporting consumed capacity in BatchGetItem operations in Alternator. It includes changes to the RCU accounting logic, exposing internal functionality to support batch-specific behavior, and adds corresponding tests for both simple and complex use cases involving multiple tables and consistency modes. Needs backporting to 2025.1, as RCU and WCU are not fully supported there. Fixes #23690 Closes scylladb/scylladb#23691 * github.com:scylladb/scylladb: test_returnconsumedcapacity.py: test RCU for batch get item alternator/executor: Add RCU support for batch get items alternator/consumed_capacity: make functionality public	2025-04-17 10:08:16 +03:00
Botond Dénes	22a28ca1db	wip	2025-04-17 03:01:17 -04:00
Ernest Zaslavsky	a369dda049	s3_client: implement S3 copy object Add support for the CopyObject API to enable direct copying of S3 objects between locations. This approach eliminates networking overhead on the client side, as the operation is handled internally by S3.	2025-04-17 09:47:47 +03:00
Botond Dénes	19b4f10598	test/cluster/test_read_repair: make incremental test work with tablets There are two tests which test incremental read repair: one with row tombstones, the other with partition tombstones. The tests currently force vnodes by creating the test keyspace with {'enabled': false}. Even so, the tests were found to be flaky, so one of them is marked for skip. This commit makes the following changes: * Make the tests use tablets by creating the test keyspace with tablets. * Change the way the tests write data so it works with tablets: currently the tests use scylla-sstable write + upload but this won't work with tablets since upload with tablets implies --load-and-stream which means data is streamed to all replicas (no difference created between nodes). Switch to the classic stop-node + write to other replica with CL=ONE. * Remove the skip added to the partition-tombstone test variant. Also add tracing to the read-repair query, to make debugging the test easier if it fails. Fixes: #21179	2025-04-17 02:01:17 -04:00
Avi Kivity	0206da5232	Merge 'readers: strip "flat" and "v2" from names' from Botond Dénes Continue the effort of normalizing reader names, stripping legacy qualifying terms like "flat" and "v2". Flat and v2 readers are the default now, we only need to add qualifying terms to readers which are different than the normal. One such reader remains: `make_generating_reader_v1()`. This PR contains mostly mechanical changes, done with a sed script. Commits which only contain such mechanical renames are marked as such in the commitlog. Code cleanup, no backport needed. Closes scylladb/scylladb#23767 * github.com:scylladb/scylladb: readers: mv reversing_v2.hh reversing.hh readers: mv generating_v2.hh generating.hh tree: s/make_generating_reader_v2/make_generating_reader/ readers: mv from_mutations_v2.hh from_mutations.hh tree: s/make_mutation_reader_from_mutations_v2/make_mutation_reader_from_mutations/s readers: mv from_fragments_v2.hh from_fragments.hh readers: mv forwardable_v2.hh forwardable.hh readers: mv empty_v2.hh empty.hh tree: s/make_empty_flat_reader_v2/make_empty_mutation_reader/ readers/empty_v2.hh: replace forward declarations with include of fwd header readers/mutation_reader_fwd.hh: forward declare reader_permit readers: mv delegating_v2.hh delegating.hh readers/delegating_v2.hh: move reader definition to _impl.hh file	2025-04-16 20:21:51 +03:00
Amnon Heiman	3acde5f904	test_returnconsumedcapacity.py: test RCU for batch get item This patch adds tests for consumed capacity in batch get item. It tests both the simple case and the multi-item, multi-table case that combines consistent and non-consistent reads.	2025-04-16 17:05:32 +03:00


11801 Commits
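Commit ca615af407 makes resource_gather.py create its own event loop when none exists, so it works under both test.py and plain pytest. The sketch below illustrates only that event-loop part with the standard `asyncio` API; the helper name `get_or_create_event_loop` is hypothetical, and it assumes that when test.py's loop exists it is the one currently running.

```python
import asyncio


def get_or_create_event_loop() -> asyncio.AbstractEventLoop:
    """Return the running event loop, or create and install a new one.

    Hypothetical helper (not the actual resource_gather.py code): reuse the
    loop if one is already running, otherwise create one for this thread.
    """
    try:
        # Succeeds only when called while a loop is running in this thread.
        return asyncio.get_running_loop()
    except RuntimeError:
        # No running loop: create one and make it this thread's current loop.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop
```

Synchronous callers can then drive coroutines with `loop.run_until_complete(...)`; whoever created the loop is responsible for closing it.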
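Commit cf4747c151 removes the DeprecationWarning caused by testing the truth value of an `xml.etree.ElementTree.Element`. A minimal sketch of the explicit check, using a hypothetical helper `first_testcase_name` rather than the actual test.py code: `find()` returns `None` when nothing matches, so comparing with `None` is the intended test. The truth value also has a subtle trap, because an element without children is falsy.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def first_testcase_name(xml_text: str) -> Optional[str]:
    """Return the name attribute of the first <testcase>, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find("testcase")
    # `if node:` would test the Element's truth value, which is deprecated and
    # is False for an element without children, so an empty <testcase/> would
    # be skipped by mistake. Compare with None explicitly instead.
    if node is not None:
        return node.get("name")
    return None
```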
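Commits 2c37a793d1 and 800710dc2c describe two changes to the resource-gather SQLite writer: serializing writes from several threads with a lock, and registering an explicit datetime adapter because the default ones are deprecated as of Python 3.12. The sketch below shows both ideas with the standard `sqlite3` and `threading` modules; the `MetricsWriter` class, its table schema, and its method names are hypothetical, not the actual test.py implementation.

```python
import sqlite3
import threading
from datetime import datetime

# Register an explicit adapter instead of relying on sqlite3's default
# datetime adapter (deprecated as of Python 3.12): store ISO 8601 text.
sqlite3.register_adapter(datetime, lambda dt: dt.isoformat())


class MetricsWriter:
    """Hypothetical thread-safe SQLite writer for gathered metrics."""

    def __init__(self, path: str) -> None:
        # check_same_thread=False lets several threads share the connection;
        # the lock below serializes every access to it.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS metrics (test TEXT, ts TEXT, value REAL)"
            )
            self._conn.commit()

    def write(self, test: str, ts: datetime, value: float) -> None:
        # Only one thread writes at a time, so no thread tries to write
        # while another one holds SQLite's write lock.
        with self._lock:
            self._conn.execute(
                "INSERT INTO metrics VALUES (?, ?, ?)", (test, ts, value)
            )
            self._conn.commit()

    def count(self) -> int:
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
```

A single lock around a shared connection is the simplest option; an alternative design would give each thread its own connection and rely on SQLite's busy timeout, at the cost of retry handling.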
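Merge cc919b08c2 describes each shard taking one SSTable at a time and uploading its files sequentially, with a semaphore bounding concurrency. The following is a toy model of that scheme, not Scylla's Seastar code: asyncio tasks stand in for shards, a plain `asyncio.Semaphore` stands in for the sstables_manager directory semaphore, and `upload_file`, `shard_worker`, and `backup` are hypothetical names.

```python
import asyncio


async def upload_file(path: str, uploaded: list) -> None:
    # Hypothetical stand-in for uploading one file to S3.
    await asyncio.sleep(0)
    uploaded.append(path)


async def shard_worker(queue: asyncio.Queue, sem: asyncio.Semaphore, uploaded: list) -> None:
    # Each "shard" repeatedly takes one SSTable and uploads its files one by one.
    while True:
        try:
            sstable, files = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        async with sem:  # caps how many SSTables are being uploaded at once
            for name in files:
                await upload_file(f"{sstable}/{name}", uploaded)


async def backup(sstables: dict, shards: int, max_in_flight: int) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for item in sstables.items():
        queue.put_nowait(item)
    sem = asyncio.Semaphore(max_in_flight)
    uploaded: list = []
    await asyncio.gather(*(shard_worker(queue, sem, uploaded) for _ in range(shards)))
    return uploaded
```

Files of different SSTables may interleave, but each SSTable's own files are uploaded in order.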
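Commit 1e407ab4d2 argues for equalizing per-table load, not global load, when placing a new table's tablets. The toy model below contrasts the two policies. It counts tablets only (no racks, shards, or storage size), and both function names are hypothetical; this is not Scylla's tablet allocator.

```python
from collections import Counter


def allocate_by_global_load(load: dict, tablets: int) -> Counter:
    """Place each new tablet on the node with the lowest total tablet count."""
    total = dict(load)
    placement = Counter()
    for _ in range(tablets):
        # The node name breaks ties deterministically.
        node = min(total, key=lambda n: (total[n], n))
        total[node] += 1
        placement[node] += 1
    return placement


def allocate_by_table_load(load: dict, tablets: int) -> Counter:
    """Place each new tablet on the node holding the fewest tablets of this
    table; total load is used only to break ties."""
    placement = Counter()
    for _ in range(tablets):
        node = min(load, key=lambda n: (placement[n], load[n], n))
        placement[node] += 1
    return placement
```

With `nodes = {"old1": 100, "old2": 100, "new1": 0, "new2": 0}` and an 8-tablet table, `allocate_by_global_load` puts 4 tablets on each new node and none on the old ones, reproducing step 2 of the scenario. `allocate_by_table_load` instead puts 2 tablets on every node.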