The injection point fires after the memtable has been flushed to disk, but
before it is merged into the cache. The injection point will only be active
for the table specified in the "table_name" injection parameter.
(cherry picked from commit 6c1f6427b3)
With this, two-way communication between the error injection point and its
enabler is now possible: the test can enable the error injection point,
then wait until it is hit before proceeding.
(cherry picked from commit f7938e3f8b)
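A minimal plain-C++ sketch of the wait-until-hit handshake this enables (the real mechanism is Scylla's error-injection framework; the class below is illustrative only):

```cpp
#include <condition_variable>
#include <mutex>

struct injection_point {
    std::mutex m;
    std::condition_variable cv;
    bool enabled = false;
    bool hit = false;

    void maybe_fire() {            // called from the code path under test
        std::lock_guard<std::mutex> lk(m);
        if (enabled) { hit = true; cv.notify_all(); }
    }
    void wait_until_hit() {        // called from the test (the enabler)
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return hit; });
    }
};
```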
Order the returned hosts such that a given index in the returned hosts list
refers to the same underlying Scylla instance as the same index in the
passed-in nodes list. This is what users of this method intuitively expect,
but currently the returned hosts list is unordered (it has random order).
(cherry picked from commit e5afd9b5fb)
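A hedged sketch of the ordering fix, with stand-in types (the real code lives in the test harness):

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the harness's node/host types.
struct node { std::string address; };
struct host { std::string address; };

// Build the hosts list in the same order as the passed-in nodes list, so
// hosts[i] and nodes[i] refer to the same underlying Scylla instance.
std::vector<host> hosts_in_node_order(
        const std::vector<node>& nodes,
        const std::map<std::string, host>& hosts_by_address) {
    std::vector<host> ordered;
    ordered.reserve(nodes.size());
    for (const auto& n : nodes) {
        ordered.push_back(hosts_by_address.at(n.address));
    }
    return ordered;
}
```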
Currently the dumper unconditionally extracts the value of atomic cells,
assuming they are live. This doesn't always hold, of course, and
attempting to get the value of a dead cell will lead to marshalling
errors. Fix by checking is_live() before attempting to get the cell
value. The fix covers both regular and collection cells.
(cherry picked from commit df09b3f970)
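A reduced sketch of the fix with stand-in types (the real type is atomic_cell_view):

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// Stand-in for an atomic cell.
struct cell {
    bool live;
    std::string value;       // only meaningful when live
    int64_t deleted_at;      // only meaningful when dead

    bool is_live() const { return live; }
};

// The fix, in miniature: never touch the value of a dead cell.
void dump_cell(const cell& c) {
    if (c.is_live()) {
        std::cout << "value: " << c.value << '\n';
    } else {
        std::cout << "deleted_at: " << c.deleted_at << '\n';
    }
}
```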
This allows writes (to user tables) to be failed on a replica via error
injection. It should simplify tests that want to create differences in
which writes different replicas receive.
(cherry picked from commit cb76cafb60)
The current memtable overlap check that is used by the cache
-- table::get_max_purgeable_fn_for_cache_underlying_reader() -- only
checks the active memtable, so memtables which are either being flushed
or are already flushed and also have active reads against them do not
participate in the overlap check.
This can result in temporary data resurrection, where a cache read can
garbage-collect a tombstone which still covers data in a flushing or
flushed memtable that still has active reads against it.
To prevent this, extend the overlap check to also consider all of the
memtable list. Furthermore, memtable_list::erase() now places the removed
(flushed) memtable in an intrusive list. These entries are alive only as
long as there are readers still keeping an `lw_shared_ptr<memtable>`
alive. This list is now also consulted on overlap checks.
(cherry picked from commit d126ea09ba)
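An illustrative sketch of the extended overlap check; the names and signatures are assumptions, not the actual code:

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in memtable; trivial bodies so the sketch compiles.
struct memtable {
    bool contains(uint64_t /*key*/) const { return false; }
    int64_t min_timestamp() const { return 0; }
};

// A tombstone with timestamp tomb_ts over key may be purged only if no
// memtable -- active, flushing, or flushed-but-still-read -- could still
// hold data it covers.
bool can_purge(uint64_t key, int64_t tomb_ts,
               const std::vector<std::shared_ptr<memtable>>& memtable_list,
               const std::vector<std::shared_ptr<memtable>>& flushed_with_readers) {
    auto overlaps = [&](const memtable& mt) {
        return mt.contains(key) && mt.min_timestamp() <= tomb_ts;
    };
    for (const auto& mt : memtable_list) {          // active + flushing
        if (overlaps(*mt)) { return false; }
    }
    for (const auto& mt : flushed_with_readers) {   // kept alive by readers
        if (overlaps(*mt)) { return false; }
    }
    return true;
}
```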
The cache should not garbage-collect tombstones which cover data in the
memtable. Add overlap checks (get_max_purgeable) to garbage collection
to detect such tombstones and prevent their garbage collection.
(cherry picked from commit 6b5b563ef7)
This doesn't introduce additional work for single-partition queries: the
key is copied anyway on consume_end_of_stream().
Multi-partition reads and compaction are not that sensitive to the
additional copy.
This change fixes a bug in the compacting_reader: currently the reader
passes _last_uncompacted_partition_start.key() to the compactor's
consume_new_partition(). When the compactor emits enough content for this
partition, _last_uncompacted_partition_start is moved from to emit the
partition start, which leaves the key reference passed to the compactor
dangling (it refers to a moved-from value). This in turn means that
subsequent GC checks done by the compactor use a corrupt key and can
therefore result in tombstones being garbage-collected while they still
cover data elsewhere (data resurrection).
The compacting reader is violating the API contract and normally the bug
should be fixed there. We make an exception here because doing the fix
in the mutation compactor better aligns with our future plans:
* The fix simplifies the compactor (gets rid of _last_dk).
* Prepares the way to get rid of the consume API used by the compactor.
(cherry picked from commit c2518cdf1a)
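The bug class, reduced to self-contained C++ for illustration:

```cpp
#include <string>
#include <utility>

// A reference taken before a move dangles (refers to a moved-from
// object) afterwards -- exactly what happened to the partition key.
void moved_from_reference() {
    std::string partition_start = "pk1";
    const std::string& key_ref = partition_start;      // handed to the compactor
    std::string emitted = std::move(partition_start);  // emitting partition start
    // key_ref now refers to a moved-from string: valid but unspecified.
    // Any tombstone-GC check keyed on key_ref compares against garbage.
    (void)emitted;
}
```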
A default timestamp (not to be confused with the timestamp passed via the 'USING TIMESTAMP' query clause) can be set using the 0x20 flag and the <timestamp> field in the binary CQL frame payload of the QUERY, EXECUTE and BATCH ops. It also happens to be the default of the Java CQL Driver.
However, we were only setting the corresponding info in the CQL Tracing context of a QUERY operation. For an unknown reason we were not setting this for EXECUTE and BATCH traces (I guess I simply forgot to set it back then).
This patch fixes this.
Fixes #23173
The issue fixed by this PR is not critical but the fix is simple and safe enough so we should backport it to all live releases.
- (cherry picked from commit ca6bddef35)
- (cherry picked from commit f7e1695068)
Parent PR: #23174
Closes scylladb/scylladb#23524
* github.com:scylladb/scylladb:
CQL Tracing: set common query parameters in a single function
transport/server.cc: set default timestamp info in EXECUTE and BATCH tracing
Fix a UBSan abort caused by integer overflow when calculating the time
difference between read and write operations. The issue occurs when:
1. The queried partition on replicas is not purgeable (has no recorded
modified time)
2. Digests don't match across replicas
3. The system attempts to calculate timespan using missing/negative
last_modified timestamps
This change skips cross-DC repair optimization when write timestamp is
negative or missing, as this optimization is only relevant for reads
occurring within write_timeout of a write.
Error details:
```
service/storage_proxy.cc:5532:80: runtime error: signed integer overflow: -9223372036854775808 - 1741940132787203 cannot be represented in type 'int64_t' (aka 'long')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior service/storage_proxy.cc:5532:80
Aborting on shard 1, in scheduling group sl:default
```
Related to the previous fix 39325cf, which handled negative read_timestamp cases.
Fixes #23314
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#23359
(cherry picked from commit ebf9125728)
Closes scylladb/scylladb#23387
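A sketch of the guard under the stated assumptions (illustrative names, not the actual storage_proxy code):

```cpp
#include <cstdint>
#include <optional>

// Bail out instead of subtracting when either timestamp is missing or
// negative; e.g. INT64_MIN - x overflows int64_t, as in the UBSan report.
std::optional<int64_t> read_write_timespan(int64_t read_ts, int64_t write_ts) {
    if (write_ts <= 0 || read_ts < 0) {
        return std::nullopt;    // skip the cross-DC repair optimization
    }
    return read_ts - write_ts;  // both in range: subtraction cannot overflow
}
```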
To simplify aborting Scylla while its services are starting, add a
_ready state to stop_signal: until main is ready to be stopped by the
abort_source, the signal handler just records that the signal was caught,
and a check() method polls that flag, requesting the abort and throwing
the respective exception only then, at controlled points in between the
starting of services, after each service has started successfully and a
deferred stop action has been installed.
This patch prevents gate_closed_exception from escaping its handler
when start-up is aborted early with the stop signal, which caused
https://github.com/scylladb/scylladb/issues/23153
The regression is apparently due to a25c3eaa1c
Fixes https://github.com/scylladb/scylladb/issues/23153
* Requires backport to 2025.1 due to a25c3eaa1c
- (cherry picked from commit 23433f593c)
- (cherry picked from commit 282ff344db)
- (cherry picked from commit feef7d3fa1)
- (cherry picked from commit b6705ad48b)
Parent PR: #23103Closesscylladb/scylladb#23184
* github.com:scylladb/scylladb:
main: add checkpoints
main: safely check stop_signal in-between starting services
main: move prometheus start message
main: move per-shard database start message
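A minimal sketch of the described pattern (not the actual stop_signal class):

```cpp
#include <atomic>
#include <stdexcept>

// The signal handler only records the signal; check() is polled at
// controlled points between service starts and raises the abort there.
class stop_signal {
    std::atomic<bool> _caught{false};
public:
    void on_signal() noexcept { _caught = true; }  // signal-safe: set a flag
    void check() {                                 // called between services
        if (_caught.load()) {
            throw std::runtime_error("startup aborted by stop signal");
        }
    }
    // Once main is fully up, the _ready state hands control over to the
    // abort_source instead of check() (omitted here).
};
```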
Alternator Streams' "GetRecords" operation has a "Limit" parameter that
caps how many records to return. The DynamoDB documentation says that the
upper limit on this Limit parameter is 1000 - but Alternator didn't
enforce this. In this patch we begin enforcing this highest Limit, and
also add a test for verifying this enforcement. As usual, the new test
passes on DynamoDB, and after this patch - also on Alternator.
The reason why it's useful to have *some* upper limit on Limit is that
the existing executor::get_records() implementation does not really have
preemption points in all the necessary places. In particular, we have a
loop on all returned records without preemption points. We also store
the returned records in a RapidJson vector, which requires a contiguous
allocation.
Even before this patch, GetRecords had a hard limit of 1 MB of results.
But still, in some cases 1 MB of results may be a lot of results, and we
can see stalls in the aforementioned places being O(number of results).
Fixes #23534
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#23547
(cherry picked from commit 84fd52315f)
Closes scylladb/scylladb#23643
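A hedged sketch of the enforcement (the function name is illustrative):

```cpp
#include <stdexcept>

// DynamoDB documents 1000 as the highest allowed Limit for GetRecords.
constexpr int max_get_records_limit = 1000;

// Reject out-of-range Limit values before serving the request.
int validate_get_records_limit(int limit) {
    if (limit < 1 || limit > max_get_records_limit) {
        throw std::invalid_argument("Limit must be between 1 and 1000");
    }
    return limit;
}
```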
This series adds a new config option, `tablets_mode_for_new_keyspaces`, which replaces the existing
`enable_tablets` option. It can be set to the following values:
disabled: New keyspaces use vnodes by default, unless enabled by the tablets={'enabled':true} option
enabled: New keyspaces use tablets by default, unless disabled by the tablets={'disabled':true} option
enforced: New keyspaces must use tablets. Tablets cannot be disabled using the CREATE KEYSPACE option
`tablets_mode_for_new_keyspaces=disabled` and `tablets_mode_for_new_keyspaces=enabled` control whether
tablets are disabled or enabled by default for new keyspaces, respectively.
In either case, tablets can be opted in or out using the `tablets={'enabled':...}`
keyspace option when the keyspace is created.
`tablets_mode_for_new_keyspaces=enforced` enables tablets by default for new keyspaces,
like `tablets_mode_for_new_keyspaces=enabled`.
However, it does not allow opting out by setting `tablets = {'enabled': false}`
when creating new keyspaces.
Fixes scylladb/scylla-enterprise#4355
[Edit: changed `Refs` above to `Fixes` to appease the backport bot gods]
* Requires backport to 2025.1
- (cherry picked from commit c62865df90)
- (cherry picked from commit 62aeba759b)
- (cherry picked from commit 9fac0045d1)
Parent PR: #22273
Closes scylladb/scylladb#23602
* github.com:scylladb/scylladb:
boost/tablets_test: verify failure to create keyspace with tablets and non network replication strategy
tablets: enforce tablets using tablets_mode_for_new_keyspaces=enforced config option
db/config: add tablets_mode_for_new_keyspaces option
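Illustrative decision logic for the three values (not the actual Scylla implementation):

```cpp
#include <optional>
#include <stdexcept>

enum class tablets_mode { disabled, enabled, enforced };

// ks_option is the keyspace's tablets={'enabled':...} setting, if any.
bool use_tablets(tablets_mode cfg, std::optional<bool> ks_option) {
    switch (cfg) {
    case tablets_mode::disabled: return ks_option.value_or(false);
    case tablets_mode::enabled:  return ks_option.value_or(true);
    case tablets_mode::enforced:
        if (ks_option == false) {
            throw std::invalid_argument(
                "tablets cannot be disabled for new keyspaces");
        }
        return true;
    }
    throw std::logic_error("unreachable");
}
```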
`safe_foreach_sstable` doesn't do its job correctly.
It iterates over an sstable set under the sstable deletion
lock in an attempt to ensure that SSTables aren't deleted during the iteration.
However, it takes the deletion lock only after the SSTable set has
already been obtained, so SSTables might get unlinked *before* we take the lock.
Remove this function and fix its usages to obtain the set and iterate
over it under the lock.
Closes scylladb/scylladb#23397
(cherry picked from commit e23fdc0799)
Closes scylladb/scylladb#23628
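The ordering fix in miniature, with stand-in types:

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <vector>

struct sstable {};                                  // stand-in
using sstable_set = std::vector<std::shared_ptr<sstable>>;

// Take the deletion lock *first*, then snapshot the set, so nothing can
// be unlinked between the two steps.
void for_each_sstable_safely(std::mutex& deletion_lock,
                             const std::function<sstable_set()>& get_set,
                             const std::function<void(const sstable&)>& f) {
    std::lock_guard<std::mutex> guard(deletion_lock); // 1. lock
    for (const auto& sst : get_set()) {               // 2. snapshot under lock
        f(*sst);                                      // 3. iterate safely
    }
}
```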
The `table::do_apply()` method checks whether the compaction group's async
gate is open to determine whether the compaction group is active. Closing
this async gate prevents any new operations but waits for existing
holders to exit, allowing their operations to complete. Holders will
observe the gate as closed while it is being closed, but this is
irrelevant, as they are already inside the gate and are allowed to
complete. All the callers of `table::do_apply()` already enter the gate
before calling the method. So, the async gate check inside
`table::do_apply()` would erroneously throw an exception when the
compaction group is closing, despite the caller holding the gate. This
commit removes the check to prevent this from happening.
Fixes#23348
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#23579
(cherry picked from commit 750f4baf44)
Closes scylladb/scylladb#23645
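A toy gate illustrating the invariant the fix relies on (Seastar's gate provides these semantics; the struct below is only a sketch):

```cpp
#include <cassert>

// close() only blocks *new* entries; holders that entered earlier may
// still finish their work.
struct gate {
    int holders = 0;
    bool closing = false;

    bool try_enter() {               // new work: refused once closing
        if (closing) { return false; }
        ++holders;
        return true;
    }
    void leave() { --holders; }
    void close() { closing = true; } // existing holders keep running
};

// do_apply()'s callers enter the gate first, so inside do_apply() the
// gate may already report `closing` -- checking it there was the bug.
void do_apply(gate& g) {
    assert(g.holders > 0);  // caller holds the gate; no is_closed() check
    // ... apply the write; allowed to complete even while closing ...
}
```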
`tablets_mode_for_new_keyspaces=enforced` enables tablets by default for
new keyspaces, like `tablets_mode_for_new_keyspaces=enabled`.
However, it does not allow opting out by setting
`tablets = {'enabled': false}` when creating new keyspaces.
Refs scylladb/scylla-enterprise#4355
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 62aeba759b)
The new option deprecates the existing `enable_tablets` option.
It will be extended in the next patch with a third value, "enforced",
which will enable tablets by default for new keyspaces but without the
possibility to opt out using the `tablets = {'enabled': false}` keyspace
schema option.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit c62865df90)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, repair_writer_impl::create_writer keeps erm to ensure that a sharder is valid. If we repair a tablet, erm blocks the state machine and no operation on any tablet of this table can be performed.
Use auto_refreshing_sharder and topology_guard to ensure that the operation is safe and that tablet operations on the whole table aren't blocked.
Fixes: #23453.
Needs backport to 2025.1, which introduces the tablet repair scheduler.
- (cherry picked from commit 1dc29ddc86)
- (cherry picked from commit bae6711809)
Parent PR: #23455
Closes scylladb/scylladb#23580
* github.com:scylladb/scylladb:
test: add test to check concurrent migration and repair of two different tablets
repair: release erm in repair_writer_impl::create_writer when possible
The "make-pr-ready-for-review" workflow was failing with an "Input
required and not supplied: token" error. This was due to GitHub Actions
security restrictions preventing access to the token when the workflow
is triggered in a fork:
```
Error: Input required and not supplied: token
```
This commit addresses the issue by:
- Running the workflow in the base repository instead of the fork. This
grants the workflow access to the required token with write permissions.
- Simplifying the workflow by using a job-level `if` condition to
control execution, as recommended in the GitHub Actions documentation
(https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/using-conditions-to-control-job-execution).
This is cleaner than conditional steps.
- Removing the repository checkout step, as the source code is not required for this workflow.
This change resolves the token error and ensures the
"make-pr-ready-for-review" workflow functions correctly.
Fixes scylladb/scylladb#22765
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#22766
(cherry picked from commit ca832dc4fb)
Closes scylladb/scylladb#23561
Before this patch, granting a user MODIFY permission on ALL KEYSPACES allowed the user to write to system tables, where the user could also mark themselves as a superuser, granting them all other permissions. After this patch, MODIFY permission on ALL KEYSPACES is limited to non-system keyspaces only.
Fixes: scylladb/scylladb#23218
(cherry picked from commit fee50f287c)
Parent PR: #23219
Closes scylladb/scylladb#23594
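A hedged sketch of the resulting check; the helper names are illustrative:

```cpp
#include <string_view>

// Illustrative helper: real system-keyspace detection is more involved.
bool is_system_keyspace(std::string_view ks) {
    return ks == "system" || ks.substr(0, 7) == "system_";
}

// MODIFY granted on ALL KEYSPACES no longer reaches system keyspaces.
bool may_modify(std::string_view ks, bool has_modify_on_all_keyspaces) {
    if (is_system_keyspace(ks)) {
        return false;  // system tables require explicit, targeted grants
    }
    return has_modify_on_all_keyspaces;
}
```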
On our testing infrastructure, tests often run a hundred times (!)
slower than usual, for various reasons that we can't always avoid.
This is why all our test frameworks drastically increase the default
timeouts.
We forgot to increase the timeout in one place - where Alternator tests
use CQL. This is needed for the Alternator role-based access control
(RBAC) tests, which are configured via CQL, so these Alternator tests
unusually use CQL.
So in this patch we increase the timeout of the CQL driver used by
Alternator tests to the same high timeouts (60-120 seconds) used by
the regular CQL tests. As the famous saying goes, these timeouts should
be enough for anyone.
Fixes #23569.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#23578
(cherry picked from commit a9a6f9eecc)
Closes scylladb/scylladb#23601
The member in question is unconditionally .stop()-ed in the task's
release_resources() method; however, it may happen that it wasn't
.start()-ed in the first place. Start happens in the middle of the
task's .run() method, and there are several reasons why it can be
skipped -- e.g. the task is aborted early, or collecting sstables from
S3 throws.
Fixes: #23231
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#23483
(cherry picked from commit 832d83ae4b)
Closes scylladb/scylladb#23557
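The general fix pattern, sketched with a stand-in member:

```cpp
// Make stop() tolerate a member that was never start()-ed, since run()
// can abort before reaching start().
struct managed_member {
    bool started = false;

    void start() { started = true; /* ... acquire resources ... */ }
    void stop() {
        if (!started) {
            return;    // aborted early, or sstable collection threw
        }
        /* ... actual teardown ... */
        started = false;
    }
};
```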
The function modification_statement::add_raw() is never called, which
leaves the query string in the audit_info of batch queries empty. In the
enterprise branch, add_raw is called in Cql.g, and those changes were
never merged to master.
This change:
- Adds the missing call of add_raw() to Cql.g
- Includes other related changes (from PR #3228 in scylla-enterprise)
Fixes scylladb#23311
Closes scylladb/scylladb#23315
(cherry picked from commit b8adbcbc84)
Closes scylladb/scylladb#23495
Currently, repair_writer_impl::create_writer keeps erm to ensure
that a sharder is valid. If we repair a tablet, erm blocks the state
machine and no operation on any tablet of this table can be performed.
Use auto_refreshing_sharder and topology_guard to ensure that the
operation is safe and that tablet operations on the whole table
aren't blocked.
Fixes: #23453.
(cherry picked from commit 1dc29ddc86)
Draining hints may occur in one of two scenarios:
* a node leaves the cluster and the local node drains all of the hints
saved for that node,
* the local node is being decommissioned.
Draining may take some time and the hint manager won't stop until it
finishes. It's not a problem when decommissioning a node, especially
because we want the cluster to retain the data stored in the hints.
However, it may become a problem when the local node has started draining
hints saved for another node and is now being shut down.
There are two reasons for that:
* Generally, in situations like that, we'd like to be able to shut down
nodes as fast as possible. The data stored in the hints won't
disappear from the cluster yet since we can restart the local node.
* Draining hints may introduce flakiness in tests. Replaying hints doesn't
have the highest priority and it's reflected in the scheduling groups we
use as well as the explicitly enforced throughput. If there are a large
number of hints to be replayed, it might affect our tests.
It's already happened, see: scylladb/scylladb#21949.
To solve those problems, we change the semantics of draining. It will behave
as before when the local node is being decommissioned. However, when the
local node is only being stopped, we will immediately cancel all ongoing
draining processes and stop the hint manager. To compensate for that, when we
start a node and it initializes a hint endpoint manager corresponding to
a node that's already left the cluster, we will begin the draining process
of that endpoint manager right away.
That should ensure all data is retained, while possibly speeding up
the shutdown process.
There's a small trade-off to it, though. If we stop a node, we can then
remove it. It won't have a chance to replay the hints it might have
replayed before these changes, but that's an edge case. We expect this commit to bring
more benefit than harm.
We also provide tests verifying that the implementation works as intended.
Fixes scylladb/scylladb#21949
Closes scylladb/scylladb#22811
(cherry picked from commit 0a6137218a)
Closes scylladb/scylladb#23370
Before this patch, the load balancer was equalizing tablet count per
shard, so it achieved balance assuming that:
1) tablets have the same size
2) shards have the same capacity
That can cause imbalance of utilization if shards have different
capacity, which can happen in heterogeneous clusters with different
instance types. One of the causes for capacity difference is that
larger instances run with fewer shards due to vCPUs being dedicated to
IRQ handling. This makes those shards have more disk capacity, and
more CPU power.
After this patch, the load balancer equalizes per-shard storage
utilization, so it no longer assumes that shards have the same
capacity. It still assumes that each tablet has equal size, so it's a
middle step towards full size-aware balancing.
One consequence is that to be able to balance, the load balancer needs
to know every node's capacity, which is collected with the same RPC
that collects load_stats for average tablet size. This is not a
significant setback because migrations cannot proceed anyway if nodes
are down, due to barriers. We could make intra-node migration
scheduling work without capacity information, but it's pointless due
to the above, so it is not implemented.
Also, the per-shard goal for tablet count is still the same for all nodes in the cluster,
so nodes with less capacity will be below the limit and nodes with more capacity will
be slightly above it. This shouldn't be a significant problem in practice; we could
compensate for it by increasing the limit.
Fixes #23042
* github.com:scylladb/scylladb:
tablets: Make load balancing capacity-aware
topology_coordinator: Fix confusing log message
topology_coordinator: Refresh load stats after adding a new node
topology_coordinator: Allow capacity stats to be refreshed with some nodes down
topology_coordinator: Refactor load status refreshing so that it can be triggered from multiple places
test: boost: tablets_test: Always provide capacity in load_stats
test: perf_load_balancing: Set node capacity
test: perf_load_balancing: Convert to topology_builder
config, disk_space_monitor: Allow overriding capacity via config
storage_service, tablets: Collect per-node capacity in load_stats
test: tablets_test: Add support for auto-split mode
test: cql_test_env: Expose db config
Closes scylladb/scylladb#23443
* github.com:scylladb/scylladb:
Merge 'tablets: Make load balancing capacity-aware' from Tomasz Grabiec
test: tablets_test: Add support for auto-split mode
test: cql_test_env: Expose db config
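An illustrative balance metric under the patch's assumptions (equal tablet size, known shard capacity); the names are not the balancer's actual code:

```cpp
#include <cstdint>

// Compare per-shard storage utilization rather than raw tablet counts,
// so higher-capacity shards get proportionally more tablets.
double shard_utilization(uint64_t tablet_count,
                         uint64_t avg_tablet_size_bytes,
                         uint64_t shard_capacity_bytes) {
    return static_cast<double>(tablet_count)
         * static_cast<double>(avg_tablet_size_bytes)
         / static_cast<double>(shard_capacity_bytes);
}
// The balancer then moves tablets from the most- to the least-utilized
// shard, instead of equalizing counts.
```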
Each query-type (QUERY, EXECUTE, BATCH) CQL opcode has a number of parameters
in their payload which we always want to record in the Tracing object.
Today it's a Consistency Level, Serial Consistency Level and a Default Timestamp.
Setting each of them individually can lead to human error, where one (or
more) of them is not set. Let's eliminate that possibility by defining
a single function that sets them all.
This also allows an easy addition of such parameters to this function in
the future.
(cherry picked from commit f7e1695068)
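A sketch of the consolidation with stand-in types; the helper mirrors the "set common query parameters in a single function" commit named above:

```cpp
#include <cstdint>
#include <optional>

// Stand-ins for the tracing state and consistency type.
enum class consistency_level { one, quorum, all };
struct trace_state {
    void set_consistency_level(consistency_level) {}
    void set_serial_consistency_level(consistency_level) {}
    void set_default_timestamp(int64_t) {}
};

// One function records every common parameter, so no opcode handler
// (QUERY, EXECUTE or BATCH) can forget one of them.
void set_common_query_parameters(trace_state& trace,
                                 consistency_level cl,
                                 std::optional<consistency_level> serial_cl,
                                 std::optional<int64_t> default_timestamp) {
    trace.set_consistency_level(cl);
    if (serial_cl) { trace.set_serial_consistency_level(*serial_cl); }
    if (default_timestamp) { trace.set_default_timestamp(*default_timestamp); }
}
```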
A default timestamp (not to be confused with the timestamp passed via the 'USING TIMESTAMP' query clause)
can be set using the 0x20 flag and the <timestamp> field in the binary CQL frame payload of
the QUERY, EXECUTE and BATCH ops. It also happens to be the default of the Java CQL Driver.
However, we were only setting the corresponding info in the CQL Tracing context of a QUERY operation.
For an unknown reason we were not setting this for EXECUTE and BATCH traces (I guess I simply forgot to
set it back then).
This patch fixes this.
Fixes #23173
(cherry picked from commit ca6bddef35)
Refs scylla-enterprise#5185
Fixes#22901
If a TLS socket gets EPIPE, the error is not translated to a specific
gnutls error code, but only to a generic ERROR_PULL/PUSH. Since we treat
EPIPE as ignorable for plain sockets, we need to unwind the nested
exception here to detect that the error was in fact due to EPIPE, so we
can suppress log output for it.
Closes scylladb/scylladb#22888
(cherry picked from commit e49f2046e5)
Closes scylladb/scylladb#23045
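A plain-C++ sketch of unwinding a nested-exception chain to find an underlying EPIPE (illustrative; the actual code works with the gnutls/Seastar error types):

```cpp
#include <cerrno>
#include <exception>
#include <system_error>
#include <utility>

// Walk the nested-exception chain looking for an underlying EPIPE, so
// the caller can suppress the log line.
bool hides_broken_pipe(std::exception_ptr ep) {
    while (ep) {
        try {
            std::rethrow_exception(std::exchange(ep, nullptr));
        } catch (const std::system_error& e) {
            if (e.code() == std::error_code(EPIPE, std::system_category())) {
                return true;    // found the real cause
            }
            if (auto n = dynamic_cast<const std::nested_exception*>(&e)) {
                ep = n->nested_ptr();   // unwrap one layer, keep looking
            }
        } catch (const std::exception& e) {
            if (auto n = dynamic_cast<const std::nested_exception*>(&e)) {
                ep = n->nested_ptr();
            }
        } catch (...) {}
    }
    return false;
}
```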
The series fixes a regression and demotes a barrier_and_drain logging error to a warning, since this particular condition may happen during normal operation.
We want to backport both, since one is a bug fix and the other is trivial and reduces CI flakiness.
- (cherry picked from commit 1da7d6bf02)
- (cherry picked from commit fe45ea505b)
Parent PR: #22650
Closes scylladb/scylladb#22923
* https://github.com/scylladb/scylladb:
topology_coordinator: demote barrier_and_drain rpc failure to warning
topology_coordinator: read peers table only once during topology state application
The problem discovered is that there is a time gap between group0
creation and the raft_initialize_discovery_leader call. Because of that,
the group0 snapshot/apply entry reads wrong (null) values from disk and
updates the in-memory variables to wrong values. During this time gap,
the in-memory variables hold wrong values, leading to incorrect actions.
This PR removes the variable `_manage_topology_change_kind_from_group0`,
which was used earlier as a workaround for correctly handling the
`topology_change_kind` variable; it was brittle and had some bugs
(causing issues like scylladb/scylladb#21114). The reason for this bug
is that `_manage_topology_change_kind_from_group0` used to block reading
from disk and was only enabled after group0 initialization and starting
the raft server in the restart case. In general, it was hard to manage
`topology_change_kind` via `_manage_topology_change_kind_from_group0`
correctly in a bug-free manner.
After the removal of `_manage_topology_change_kind_from_group0`, careful
management of the `topology_change_kind` variable is needed to maintain
its correct value in all scenarios. So this PR also performs a
refactoring to populate all init data into system tables even before
group0 creation (via the `raft_initialize_discovery_leader` function).
Because `raft_initialize_discovery_leader` now happens before group0
creation, we write mutations directly to system tables instead of
issuing a group0 command. Hence, after group0 creation, the node can
read the correct values from system tables, and correct values are
maintained throughout.
Added a new function, `initialize_done_topology_upgrade_state`, which
takes care of writing the correct upgrade state to system tables before
starting the group0 server. This ensures that the node can read the
correct values from system tables and that correct values are maintained
throughout.
By moving the `raft_initialize_discovery_leader` logic to happen before
starting the group0 server, rather than running it as a group0 command
after server start, we also get rid of the potential problem of the init
group0 command not being the first command on the server, ensuring full
integrity as the programmer expects.
This PR fixes a bug. Hence we need to backport it.
Fixes: scylladb/scylladb#21114
- (cherry picked from commit 4748125a48)
- (cherry picked from commit e491950c47)
- (cherry picked from commit 623e01344b)
- (cherry picked from commit d7884cf651)
Parent PR: #22484
Closes scylladb/scylladb#22966
* https://github.com/scylladb/scylladb:
storage_service: Remove the variable _manage_topology_change_kind_from_group0
storage_service: fix indentation after the previous commit
raft topology: Add support for raft topology system tables initialization to happen before group0 initialization
service/raft: Refactor mutation writing helper functions.
This commit removes the outdated information about seed nodes.
We no longer need it in the docs, as a) the documentation is versioned,
and b) the ScyllaDB Open Source 4.3 and ScyllaDB Enterprise 2021.1 versions
mentioned in the docs are no longer supported.
In addition, some clarification has been added to the existing sections.
Fixes https://github.com/scylladb/scylladb/issues/22400
Closes scylladb/scylladb#23282
(cherry picked from commit dbbf9e19e4)
Closes scylladb/scylladb#23327
Moving a PR out of draft is only allowed for users with write access,
so add a GitHub action to switch a PR to `ready for review` once the
`conflicts` label is removed.
Closes scylladb/scylladb#22446
(cherry picked from commit ed4bfad5c3)
Closes scylladb/scylladb#23023
The test fails sporadically with:
cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test3.test2 - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1}
That's because a server is stopped in the middle of the workload.
The server is stopped ungracefully which will cause some requests to
time out. We should stop it gracefully to allow in-flight requests to
finish.
Fixes #20492
Closes scylladb/scylladb#23451
(cherry picked from commit 8e506c5a8f)
Closes scylladb/scylladb#23469
After recent changes, #18640 and #19151 started to reproduce for the
stop_after_sending_join_node_request and
stop_after_bootstrapping_initial_raft_configuration error injections too.
The solution is the same: deselect the tests.
Fixes #23302
Closes scylladb/scylladb#23405
(cherry picked from commit 574c81eac6)
Closes scylladb/scylladb#23460
Before this patch, the load balancer was equalizing tablet count per
shard, so it achieved balance assuming that:
1) tablets have the same size
2) shards have the same capacity
That can cause imbalance of utilization if shards have different
capacity, which can happen in heterogeneous clusters with different
instance types. One of the causes for capacity difference is that
larger instances run with fewer shards due to vCPUs being dedicated to
IRQ handling. This makes those shards have more disk capacity, and
more CPU power.
After this patch, the load balancer equalizes per-shard storage
utilization, so it no longer assumes that shards have the same
capacity. It still assumes that each tablet has equal size, so it's a
middle step towards full size-aware balancing.
One consequence is that to be able to balance, the load balancer needs
to know every node's capacity, which is collected with the same RPC
that collects load_stats for average tablet size. This is not a
significant setback because migrations cannot proceed anyway if nodes
are down, due to barriers. We could make intra-node migration
scheduling work without capacity information, but it's pointless due
to the above, so it is not implemented.
Also, the per-shard goal for tablet count is still the same for all nodes in the cluster,
so nodes with less capacity will be below the limit and nodes with more capacity will
be slightly above it. This shouldn't be a significant problem in practice; we could
compensate for it by increasing the limit.
Refs #23042
Closes scylladb/scylladb#23079
* github.com:scylladb/scylladb:
tablets: Make load balancing capacity-aware
topology_coordinator: Fix confusing log message
topology_coordinator: Refresh load stats after adding a new node
topology_coordinator: Allow capacity stats to be refreshed with some nodes down
topology_coordinator: Refactor load status refreshing so that it can be triggered from multiple places
test: boost: tablets_test: Always provide capacity in load_stats
test: perf_load_balancing: Set node capacity
test: perf_load_balancing: Convert to topology_builder
config, disk_space_monitor: Allow overriding capacity via config
storage_service, tablets: Collect per-node capacity in load_stats
(cherry picked from commit b1d9f80d85)
rebalance_tablets() was performing migrations and merges automatically
but not splits, because splits need to be acked by replicas via
load_stats. This is inconvenient in tests which want to rebalance to the
equilibrium point. This patch changes rebalance_tablets() to split
automatically by default; this can be disabled for tests which expect
otherwise.
shared_load_stats was introduced to provide a stable holder of
load_stats which can be reused across rebalance_tablets() calls.
(cherry picked from commit 5e471c6f1b)