scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 04:06:59 +00:00

Author	SHA1	Message	Date
Andrzej Jackowski	f85e738b11	test: audit: fix parsing of syslog messages Before this commit, there were following issues with parsing of syslog messages in audit tests: - `line_to_row()` function was never called - `line_to_row()` was not prepared for changes introduced in scylladb#23099 (i.e. key=value pairs) - `line_to_row()` didn't handle newlines in queries - `line_to_row()` didn't handle "\\" escaping in queries Due to the aforementioned issues, the syslog audit tests were failing. This commit fixes all of those issues, by parsing each audit syslog message using a regexp.	2025-07-04 12:40:51 +02:00
Andrzej Jackowski	c8ab5928a3	test: audit: synchronize audit syslog server In audit tests, UnixDatagramServer is used to receive audit logs. This commit introduces a synchronization between the logs receiver and a function that reads already received logs. Without this, there was a race condition that resulted in test failures (e.g., audit logs were missing during assertion check).	2025-06-30 09:19:26 +02:00
Andrzej Jackowski	fcd88e1e54	docs: audit: update syslog audit format to the current one The documentation of the syslog audit format was not updated when scylladb#23099 and earlier audit log changes were introduced. This commit includes the missing update.	2025-06-30 09:19:25 +02:00
Andrzej Jackowski	422b81018d	audit: bring back commas to audit syslog When the audit syslog format was changed in scylladb#23099, commas were removed. This made the syslog format inconsistent, as LOGIN audit logs contained commas while other audit logs did not. Additionally, the lack of commas was not aligned with the audit documentation. This commit brings back the use of commas in the audit syslog format to ensure consistency across all types of audit logs. Fixes: scylladb#24410	2025-06-30 09:19:25 +02:00
Emil Maskovsky	c6307aafd5	test.py: handle cancellation gracefully to avoid TypeError Previously, if test execution was cancelled, `run_all_tests()` could return `None`. This caused a `TypeError` when the result was unconditionally unpacked into `total_tests_pytest, failed_pytest_tests`. This commit updates the code to handle the cancellation appropriately, preventing the confusing `TypeError` exception and ensuring clean cancellation behavior. Closes scylladb/scylladb#24624	2025-06-27 20:14:35 +03:00
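The failure mode described in the commit above can be sketched as follows. This is a minimal illustration, not the actual `test.py` code: the body of `run_all_tests()` and the `summarize()` helper are hypothetical stand-ins, and only the names `run_all_tests`, `total_tests_pytest`, and `failed_pytest_tests` come from the commit message.

```python
from typing import Optional, Tuple


def run_all_tests(cancelled: bool) -> Optional[Tuple[int, int]]:
    """Hypothetical stand-in: returns (total, failed), or None when cancelled."""
    if cancelled:
        return None
    return (10, 1)


def summarize(cancelled: bool) -> str:
    result = run_all_tests(cancelled)
    # Unpacking None directly would raise
    # "TypeError: cannot unpack non-iterable NoneType object",
    # so the cancelled case is checked before unpacking.
    if result is None:
        return "cancelled"
    total_tests_pytest, failed_pytest_tests = result
    return f"{failed_pytest_tests}/{total_tests_pytest} failed"
```

Checking for `None` before unpacking keeps the cancellation path explicit instead of surfacing it as an unrelated exception.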
Pavel Emelyanov	23d86ede72	Merge 'audit: introduce debug level logs on happy path' from Dario Mirovic Audit component defines `audit` logger which it uses only for `error` and `info` logs, regarding `audit` module initialization and errors during audit log writing. This change introduces `debug` level logs on the happy path of audit log writes. Fixes: https://github.com/scylladb/scylladb/issues/23773 No backport needed - this is a small quality-of-life improvement. Closes scylladb/scylladb#24658 * github.com:scylladb/scylladb: audit: change audit test logger level to `debug` audit: introduce debug level logs on happy path	2025-06-27 20:10:54 +03:00
Anna Stuchlik	2367330513	doc: remove OSS mention from the SI notes This commit removes a confusing reference to an Open Source version from the Local Secondary Indexes page. Fixes https://github.com/scylladb/scylladb/issues/24668 Closes scylladb/scylladb#24673	2025-06-27 20:07:51 +03:00
Anna Stuchlik	7537f5f260	doc: fix the headings in the Admin Guide This commit fixes incorrect headings in the Admin Guide and the files that are included in that guide. The purpose is to properly organize the content and improve the search, as well as prevent potential build problems caused by a poor heading organization. Fixes https://github.com/scylladb/scylladb/issues/24441 Closes scylladb/scylladb#24700	2025-06-27 20:07:09 +03:00
Dario Mirovic	ec6249b581	audit: change audit test logger level to `debug` Audit module tests should show the `debug` level messages. This change makes audit_test.py `audit` module log level to `debug`. Closes scylladb/scylladb#23773	2025-06-27 16:27:33 +02:00
Dario Mirovic	666364f651	audit: introduce debug level logs on happy path Audit component defines `audit` logger which it uses only for `error` and `info` logs, regarding `audit` module initialization and errors during audit log writing. This change introduces `debug` level logs on the happy path of audit log writes. Ref: scylladb/scylladb#23773	2025-06-27 16:27:27 +02:00
Botond Dénes	495f607e73	test/cluster/test_read_repair: write 100 rows in trace test This test asserts that a read repair really happened. To ensure this happens it writes a single partition after enabling the database_apply error injection point. For some reason, the write is sometimes reordered with the error injection and the write will get replicated to both nodes and no read repair will happen, failing the test. To make the test less sensitive to such rare reordering, add a clustering column to the table and write 100 rows. The chance of all 100 of them being reordered with the error injection should be low enough that it doesn't happen again (famous last words). Fixes: #24330 Closes scylladb/scylladb#24403	2025-06-27 16:23:08 +03:00
Pavel Emelyanov	4c0154f156	Merge 'test.py: enhance allure reporting' from Andrei Chekun Add run ID for process output file to be not overwritten in the next case: first run failed, second passed. They are using the same name, so the second run will overwrite and delete the file. This will help to investigate in case of C++ test fails Add attaching Scylla log files to allure report in case test failed. This is an alternative for link in JUnit report that exists in CI. That change will help to investigate the cluster tests fails. Example can be found in the failed [job](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/2980/allure/). Backport is not needed, this is only framework enhancements Closes scylladb/scylladb#24677 * github.com:scylladb/scylladb: test.py: Attach node logs in allure report in case of fail test.py: Add run id to the boost output file	2025-06-27 16:22:03 +03:00
Botond Dénes	e715a150b9	tools/scylla-nodetool: backup: add --move-files parameter Allow opting in for backup to move the files instead of copying them. Fixes: https://github.com/scylladb/scylladb/issues/24372 Closes scylladb/scylladb#24503	2025-06-27 16:21:39 +03:00
Piotr Dulikowski	9d70e7a067	Merge 'docs: document the new recovery procedure' from Patryk Jędrzejczak We replace the documentation of the old recovery procedure with the documentation of the new recovery procedure. The new recovery procedure requires the Raft-based topology to be enabled, so to remove the old procedure from the documentation, we must assume users have the Raft-based topology enabled. We can do it in 2025.2 because the upgrade guides to 2025.1 state that enabling the Raft-based topology is a mandatory step of the upgrade. Another reminder is the upgrade guides to 2025.2. Since we rely on the Raft-based topology being enabled, we remove the obsolete parts of the documentation. We will make the Raft-based topology mandatory in the code in the future, hopefully in 2025.3. For this reason, we also don't touch the dev docs in this PR. Fixes scylladb/scylladb#24530 Requires backport to 2025.2 because 2025.2 contains the new recovery procedure. Closes scylladb/scylladb#24583 * github.com:scylladb/scylladb: docs: rely on the Raft-based topology being enabled docs: handling-node-failures: document the new recovery procedure	2025-06-26 17:07:37 +02:00
Gleb Natapov	5f953eb092	storage_proxy: retry paxos repair even if repair write succeeded After paxos state is repaired in begin_and_repair_paxos we need to re-check the state regardless if write back succeeded or not. This is how the code worked originally but it was unintentionally changed when co-routinized in `61b2e41a23`. Fixes #24630 Closes scylladb/scylladb#24651	2025-06-26 17:06:02 +02:00
Andrei Chekun	2c726c5074	test.py: Attach node logs in allure report in case of fail Currently, the allure report has no node logs when a test fails. Attaching them allows viewing the logs in one place without going anywhere else.	2025-06-26 15:37:33 +02:00
Piotr Dulikowski	2f7ed8b1d4	Merge 'Fix for cassandra role gets recreated after DROP ROLE' from Marcin Maliszkiewicz This patchset fixes regression introduced by `7e749cd848` when we started re-creating default superuser role and password from the config, even if new custom superuser was created by the user. Now we'll check, first with CL LOCAL_ONE if there is a need to create default superuser role or password, confirm it with CL QUORUM and only then atomically create role or password. If server is started without cluster quorum we'll skip creating role or password. Fixes https://github.com/scylladb/scylladb/issues/24469 Backport: all versions since 2024.2 Closes scylladb/scylladb#24451 * github.com:scylladb/scylladb: test: auth_cluster: add test for password reset procedure auth: cache roles table scan during startup test: auth_cluster: add test for replacing default superuser test: pylib: add ability to specify default authenticator during server_start test: pylib: allow rolling restart without waiting for cql auth: split auth-v2 logic for adding default superuser password auth: split auth-v2 logic for adding default superuser role auth: ldap: fix waiting for underlying role manager auth: wait for default role creation before starting authorizer and authenticator	2025-06-26 14:36:25 +02:00
Lakshmi Narayanan Sreethar	279253ffd0	utils/big_decimal: fix scale overflow when parsing values with large exponents The exponent of a big decimal string is parsed as an int32, adjusted for the removed fractional part, and stored as an int32. When parsing values like `1.23E-2147483647`, the unscaled value becomes `123`, and the scale is adjusted to `2147483647 + 2 = 2147483649`. This exceeds the int32 limit, and since the scale is stored as an int32, it overflows and wraps around, losing the value. This patch fixes that by parsing the exponent as an int64 value and then adjusting it for the fractional part. The adjusted scale is then checked to see if it is still within int32 limits before storing. An exception is thrown if it is not within the int32 limits. Note that strings with exponents that exceed the int32 range, like `0.01E2147483650`, were previously not parseable as a big decimal. They are now accepted if the final adjusted scale fits within int32 limits. For the above value, unscaled_value = 1 and scale = -2147483648, so it is now accepted. This is in line with how Java's `BigDecimal` parses strings. Fixes: #24581 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#24640	2025-06-26 15:29:28 +03:00
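The scale arithmetic in the commit above can be worked through with a short sketch. It is written in Python, whose integers do not overflow, so the int32 bounds are checked explicitly; `parse_big_decimal` is a hypothetical name for illustration and is not the C++ code in `utils/big_decimal`.

```python
from typing import Tuple

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def parse_big_decimal(text: str) -> Tuple[int, int]:
    """Return (unscaled_value, scale) so that value = unscaled_value * 10**(-scale)."""
    mantissa, _, exponent_text = text.upper().partition("E")
    # Parse the exponent with a wide integer type first (an int64 in the commit).
    exponent = int(exponent_text) if exponent_text else 0
    integer_part, _, fraction_part = mantissa.partition(".")
    unscaled_value = int(integer_part + fraction_part)
    # Adjust for the removed fractional part, then check the int32 range.
    scale = len(fraction_part) - exponent
    if not INT32_MIN <= scale <= INT32_MAX:
        raise ValueError(f"scale {scale} does not fit in int32")
    return unscaled_value, scale
```

With this sketch, `1.23E-2147483647` is rejected because the adjusted scale is 2147483649, while `0.01E2147483650` is accepted with an unscaled value of 1 and a scale of -2147483648, matching the two cases in the commit message.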
Patryk Jędrzejczak	203ea5d8f9	docs: rely on the Raft-based topology being enabled In 2025.2, we don't force enabling the Raft-based topology in the code, but we stated in the upgrade guides that it's a mandatory step of the upgrade to 2025.1. We also remind users to enable the Raft-based topology in the upgrade guides to 2025.2. Hence, we can rely in the documentation on the Raft-based topology being enabled. If it is still disabled, we can just send the user to the upgrade guides. Hence: - we remove all documentation related to enabling the Raft-based topology, enabling the Raft-based schema (enabled Raft-based topology implies enabled Raft-based schema), and the gossip-based topology, - we can replace the documentation of the old manual recovery procedure with the documentation of the new manual recovery procedure (done in the previous commit).	2025-06-26 14:17:54 +02:00
Patryk Jędrzejczak	4e256182a0	docs: handling-node-failures: document the new recovery procedure We replace the documentation of the old recovery procedure with the documentation of the new recovery procedure. We can get rid of the old procedure from the documentation because we requested users to enable the Raft-based topology during upgrades to 2025.1 and 2025.2. We leave the note that enabling the Raft-based topology is required to use the new recovery procedure just in case, since we didn't force enabling the Raft-based topology in the code.	2025-06-26 14:17:50 +02:00
Andrei Chekun	156e7d2e7a	test.py: Add run id to the boost output file To avoid overwriting the test output, add the run id to the output file name. Previously, when the first repeat failed and the second passed, both used the same output file name, so the file was overwritten and then deleted because the second repeat passed.	2025-06-26 12:51:15 +02:00
Marcin Maliszkiewicz	5e7ac34822	test: auth_cluster: add test for password reset procedure	2025-06-26 12:28:08 +02:00
Marcin Maliszkiewicz	0ffddce636	auth: cache roles table scan during startup It may be particularly beneficial during connection storms on startup. In such cases, it can happen that none of the user's read requests succeed, preventing the cache from being populated. This, in turn, makes it more difficult for subsequent reads to succeed, reducing resiliency against such storms.	2025-06-26 12:28:08 +02:00
Marcin Maliszkiewicz	67a4bfc152	test: auth_cluster: add test for replacing default superuser This test demonstrates creating custom superuser guide: https://opensource.docs.scylladb.com/stable/operating-scylla/security/create-superuser.html	2025-06-26 12:28:08 +02:00
Marcin Maliszkiewicz	a3bb679f49	test: pylib: add ability to specify default authenticator during server_start Sometimes we may not want to use default cassandra role for control connection, especially when we test dropping default role.	2025-06-26 12:28:08 +02:00
Marcin Maliszkiewicz	d9ec746c6d	test: pylib: allow rolling restart without waiting for cql Waiting for CQL requires default superuser being present in db. In some cases we may delete it and still want to do rolling restart. Additionally if we need CQL we may want to wait after restart is complete (once, and not for each node).	2025-06-26 12:28:08 +02:00
Marcin Maliszkiewicz	f85d73d405	auth: split auth-v2 logic for adding default superuser password In raft mode (auth-v2) we need to do atomic write after read as we give stricter consistency guarantees. Instead of patching legacy logic this commit adds different path as: - old code may be less tested now so it's best to not change it - new code path avoids quorum selects in a typical flow (passwords set) There may be a case when user deletes a superuser or password right before restarting a node, in such case we may omit updating a password but: - this is a trade-off between quorum reads on startup - it's far more important to not update password when it shouldn't be - if needed password will be updated on next node restart If there is no quorum on startup we'll skip creating password because we can't perform any raft operation. Additionally this fixes a problem when password is created despite having non default superuser in auth-v2.	2025-06-26 12:28:08 +02:00
Marcin Maliszkiewicz	2e2ba84e94	auth: split auth-v2 logic for adding default superuser role In raft mode (auth-v2) we need to do atomic write after read as we give stricter consistency guarantees. Instead of patching legacy logic this commit adds different path as: - old code may be less tested now so it's best to not change it - new code path avoids quorum selects in a typical flow (roles set) This fixes a problem when superuser role is created despite having non default superuser in auth-v2. If there is no quorum on startup we'll skip creating role because we can't perform any raft operation.	2025-06-26 12:28:08 +02:00
Marcin Maliszkiewicz	c96c5bfef5	auth: ldap: fix waiting for underlying role manager ldap_role_manager depends on standard_role_manager, therefore it needs to wait for superuser initialization. If this is missing, the password authenticator will start checking the default password too early and may fail to create the default password if there is no default role yet. Currently password authenticator will create password together with the role in such case but in following commits we want to separate those responsibilities correctly.	2025-06-26 12:28:08 +02:00
Marcin Maliszkiewicz	68fc4c6d61	auth: wait for default role creation before starting authorizer and authenticator There is a hidden dependency: the creation of the default superuser role is split between the password authenticator and the role manager. To work correctly, they must start in the right order: role manager first, then password authenticator.	2025-06-26 12:28:08 +02:00
Piotr Dulikowski	62efe6616a	Merge 'mapreduce: add tablet-aware dispatching algorithm' from Andrzej Jackowski The primary motivation for this change is to reduce the time during which the Effective Replication Map (ERM) is retained by the mapreduce service. This ensures that long aggregate queries do not block topology operations. As ScyllaDB is generally transitioning towards tablets, and using tablets simplifies work dispatching, the decision was made to design the new algorithm specifically for tablets. The goal of the algorithm is to divide the work in such a way that each `tablet_replica` (that is <host, shard> pair) processes two tablets at a time. The new algorithm can be summarized as follows: 1. Prepare a tablet_replica -> partition_range mapping where the values cover the entire space. 2. For each tablet_replica, in parallel, take two partition ranges and dispatch them to the node hosting the replica. The ERM is released and re-acquired in each iteration, allowing the destination (i.e., tablet_replica) to change for each partition range (in such cases, the partition range is assigned to the appropriate tablet_replica). In step 1, the main difference compared to the old algorithm (dispatch_to_vnodes) is that partition ranges are assigned to a tablet_replica rather than just the host. In step 2, the main difference is that the work is divided into smaller batches, and the ERM is released and re-acquired for each batch. In the current implementation, each node can correctly handle every partition range, even if the mapreduce supercoordinator does not retain the ERM and the range is absent locally. This is because mapreduce_service::execute_on_this_shard creates a new pager that coordinates the partition range read, including obtaining its own ERM. However, every partition range that is absent locally is handled by shard 0. Therefore, proper routing of partition ranges is necessary to avoid shard 0 overload. 
This is why, in step 2, the ERM is retained during each batch processing, and the tablet_replica is refreshed for each processed range. Additionally, shard_id is added to mapreduce request. When shard_id is set, the entire partition range is handled by the specified shard. As the new tablet-aware mapreduce algorithm balances the workload across shards, shard_id ensures that the balance is preserved, even during events such as tablet splits. This patch series: - Refactors the mapreduce service a bit, to facilitate having two algorithm versions (one for vnodes and one for tablets). - Implements tablet-aware dispatching algorithm. - Adds shard_id to mapreduce request and uses the information to handle requests entirely by selected shard. - Adds test_long_query_timeout_erm to verify the new functionality. Fixes: scylladb#21831 No backport, as it is rather new feature than a bugfix. Closes scylladb/scylladb#24383 * github.com:scylladb/scylladb: mapreduce: add missing comma and space in mapreduce_request operator<< mapreduce: add shard_id_hint to mapreduce request test: add test_long_query_timeout_erm mapreduce: add tablet-aware dispatching algorithm storage_proxy: make storage_proxy::is_alive public mapreduce: remove _shared_token_metadata from mapreduce_service mapreduce: move dispatching logic to dispatch_to_vnodes mapreduce: remove underscores from variable names mapreduce: move req_with_modified_pr handling to a new function mapreduce: change next_vnode lambda to get_next_partition_range function	2025-06-26 12:25:39 +02:00
Avi Kivity	947906e6fd	Merge 'Make uuid sstable generations mandatory' from Benny Halevy Before we can eradicate the numerical sstable generations, This series completes https://github.com/scylladb/scylladb/issues/20337 by disabling the use of numerical sstable generations where we can and making sure the feature is never disabled. Note that until the cluster feature is enabled in the startup process on first boot, numerical generation might be used for local system tables. Refs #24248 * Enhancement. No backport required Closes scylladb/scylladb#24554 * github.com:scylladb/scylladb: feature_service: never disable UUID_SSTABLE_IDENTIFIERS test: sstable_move_test: always use uuid sstable generation test: sstable_directory_test: always use uuid sstable generation sstables: sstable_generation_generator: set last_generation=0 by default test: database_test: test_distributed_loader_with_pending_delete: use uuid sstable generation test: lib: test_env: always use uuid sstable generation test: sstable_test: always use uuid sstable generation test: sstable_resharding_test::sstable_resharding_over_s3_test: use default use_uuid in config test: sstable_datafile_test: compound_sstable_set_basic_test: use uuid sstable generation test: sstable_compaction_test: always use uuid sstable generation	2025-06-26 12:25:38 +02:00
Szymon Malewski	f28bab741d	utils/exceptions.cc: Added check for `exceptions::request_timeout_exception` in `is_timeout_exception` function. This solves the issue where, in some cases, timeout exceptions in CAS operations are logged incorrectly as a general failure. Fixes #24591 Closes scylladb/scylladb#24619	2025-06-26 12:25:38 +02:00
Pavel Emelyanov	0f5b358c47	test: Use test sched groups, not database ones Some tests want to switch between sched groups. For that there's cql-test-env facility to create and use them. However, there's a test that uses replica::database as sched groups provider, which is not nice. Fix it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#24615	2025-06-26 12:25:38 +02:00
Avi Kivity	ff508ce82c	Merge 'sstables: purge SCYLLA_ASSERT from the sstable read/parse paths' from Botond Dénes Introduce `sstables::parse_assert()`, to replace `SCYLLA_ASSERT()` on the read/parse path. SSTables can get corrupt for various reasons, some outside of the database's control. A bad SSTable should not bring down the database, the parsing should simply be aborted, with as much information printed as possible for the investigation of the nature of the corruption. The newly introduced `parse_assert()` uses `on_internal_error()` under the hood, which prints a backtrace and optionally allows aborting on the error to generate a coredump. Fixes https://github.com/scylladb/scylladb/issues/20845 We just hit another case of `SCYLLA_ASSERT()` triggering due to corrupt sstables bringing down nodes in the field, should be backported to all releases, so we don't hit this in the future. Closes scylladb/scylladb#24534 * github.com:scylladb/scylladb: sstables: replace SCYLLA_ASSERT() with parse_assert() on the read path sstables/exceptions: introduce parse_assert()	2025-06-26 12:25:38 +02:00
Ferenc Szili	96267960f8	logging: Add row count to large partition warning message When writing large partitions, that is: partitions with size or row count above a configurable threshold, ScyllaDB outputs a warning to the log: WARN ... large_data - Writing large partition test/test: (1200031 bytes) to me-3glr_0xkd_54jip2i8oqnl7hk8mu-big-Data.db This warning contains the information about the size of the partition, but it does not contain the number of rows written. This can lead to confusion because in cases where the warning was written because of the row count being larger than the threshold, but the partition size is below the threshold, the warning will only contain the partition size in bytes, leading the user to believe the warning was output because of the partition size, when in reality it was the row count that triggered the warning. See #20125 This change adds a size_desc argument to cql_table_large_data_handler::try_record(), which will contain the description of the size of the object written. This method is used to output warnings for large partitions, row counts, row sizes and cell sizes. This change does not modify the warning message for row and cell sizes, only for partition size and row count. The warning for large partitions and row counts will now look like this: WARN ... large_data - Writing large partition test/test: (1200031 bytes/100001 rows) to me-3glr_0xkd_54jip2i8oqnl7hk8mu-big-Data.db Closes scylladb/scylladb#22010	2025-06-26 12:25:38 +02:00
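The new message shape from the commit above can be illustrated with a small sketch. The function name, its parameters, and the threshold values are assumptions made for the example; only the message text and the idea of a combined size description (`size_desc`) come from the commit message.

```python
from typing import Optional


def large_partition_warning(partition: str, size_bytes: int, rows: int,
                            sstable: str, size_threshold: int,
                            rows_threshold: int) -> Optional[str]:
    """Return the warning text if either threshold is exceeded, else None."""
    if size_bytes <= size_threshold and rows <= rows_threshold:
        return None
    # Report both numbers, so a warning triggered by the row count
    # is not mistaken for one triggered by the partition size.
    size_desc = f"{size_bytes} bytes/{rows} rows"
    return f"Writing large partition {partition}: ({size_desc}) to {sstable}"
```

In this sketch, a partition that is below the size threshold but above the row-count threshold still produces a warning, and the row count in the text shows why.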
Yaniv Michael Kaul	198ecd8039	Do not perform blkdiscard by default on the disks during RAID setup. This is not needed on clean disks, which is often the case with cloud instances, but can be useful on bare metal servers with disks that were used before. Therefore, the default is to skip blkdiscard operation, which makes overall installation faster. If the user wishes to run it anyway, use the newly introduced --blkdiscard option of scylla_raid_setup to perform it. Note: since we either perform online discard or schedule fstrim, the (previously used) space will gradually get trimmed, this way or another. Fixes: https://github.com/scylladb/scylladb/issues/24470 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#24579	2025-06-26 12:25:38 +02:00
Piotr Dulikowski	23f0d275c8	Merge 'generic_server: fix connections semaphore config observer' from Marcin Maliszkiewicz In `ed3e4f33fd` we introduced new connection throttling feature which is controlled by uninitialized_connections_semaphore_cpu_concurrency config. But live updating of it was broken, this patch fixes it. When the temporary value from observer() is destroyed, it disconnects from updateable_value, so observation stops right away. We need to retain the observer. Backport: to 2025.2 where this feature was added Fixes: https://github.com/scylladb/scylladb/issues/24557 Closes scylladb/scylladb#24484 * github.com:scylladb/scylladb: test: add test for live updates of generic server config utils: don't allow do discard updateable_value observer generic_server: fix connections semaphore config observer	2025-06-26 12:25:38 +02:00
Andrzej Jackowski	ba6ed45d7f	mapreduce: add missing comma and space in mapreduce_request operator<< This change is introduced to fix the broken formatting of mapreduce_request `operator<<`. Due to lack of ", " before "cmd" the output was `reductions=[...]cmd=read_command{...}` instead of `reductions=[...], cmd=read_command{...}`.	2025-06-25 19:23:07 +02:00
Andrzej Jackowski	26403df9ea	mapreduce: add shard_id_hint to mapreduce request If a partition range is not present locally, `partition_ranges_owned_by_this_shard` assigns it to shard 0, which can overload shard 0. To address this, this commit adds a `shard_id_hint` to the mapreduce request. When `shard_id_hint` is set, the entire partition range in the request is handled by the specified shard. The `shard_id_hint` is set by the new tablet-aware mapreduce algorithm, introduced in `dispatch_to_tablets`. This algorithm balances the workload across shards, so the changes in this commit ensure that load balancing is preserved, even during events such as tablet splits. Fixes: scylladb#21831	2025-06-25 19:23:07 +02:00
Andrzej Jackowski	5f31011111	test: add test_long_query_timeout_erm This test verifies the effectiveness of the mechanism for releasing ERM introduced in this patch series. In test scenario, during processing of a query in mapreduce service, reads are intentionally blocked by an injected error. However, when table uses tablets, ERM is now often released by the mapreduce service, so the topology is not blocked to the end of the request. As a result, it is possible to add a new node before the query finishes. Refs. scylladb#21831	2025-06-25 19:22:48 +02:00
Robert Bindar	6e7cab5b45	Add repository layout dev documentation This change adds an md file which gives a high level overview of the scylladb repository, the components each path contains and a basic description for each one of them. This is mainly intended for onboarding engineers to help get a mental picture when starting ramping up on Scylla concepts. Refs #22908 Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#23010	2025-06-25 13:58:05 +03:00
Patryk Jędrzejczak	cc8c618356	Merge 'LWT for tablets: fix paxos state for intranode migration' from Petr Gusev This PR fixes the "intra-node tablet migration" issue from the [LWT over tablets spec](https://docs.google.com/document/d/1CPm0N9XFUcZ8zILpTkfP5O4EtlwGsXg_TU4-1m7dTuM/edit?tab=t.0#heading=h.uk3mizf7gvs1). We make `get_replica_lock` to acquire locks on both shards to avoid races. We also implement read_repair for paxos state -- if `load_paxos_state` returns different states on two shards, we 'repair' it by choosing the values with maximum timestamp and writing the 'repaired' state to both shards. LWT for tablets is not enabled yet. It requires migrating paxos state to colocated tablets, which is blocked on [this PR](https://github.com/scylladb/scylladb/pull/22906). Regarding testing: * We could possibly arrange a test case for the locking commit through some error injection magic. We'll return to this when LWT for tablets is enabled. * We can't think of a clear test case for the read_repair commit. Any suggestions are welcome (@gleb-cloudius). Backport: no need, since it's a new feature. Closes scylladb/scylladb#24478 * https://github.com/scylladb/scylladb: paxos_state: read repair for intranode_migration paxos_state: fix get_replica_lock for intranode_migration	2025-06-25 11:08:39 +02:00
Sergey Zolotukhin	0d7de90523	Fix regexp in `check_node_log_for_failed_mutations` The regexp that was added in https://github.com/scylladb/scylladb/pull/23658 does not work as expected: `TRACE`, `INFO` and `DEBUG` level messages are not ignored. This patch corrects the pattern to ensure those log levels are excluded. Fixes scylladb/scylladb#23688 Closes scylladb/scylladb#23889	2025-06-25 12:00:16 +03:00
Anna Stuchlik	592d45a156	doc: remove references to Open Source from README This commit removes the references to ScyllaDB Open Source from the README file for documentation. In addition, it updates the link where the documentation is currently published. We've removed Open Source from all the documentation, but the README was missed. This commit fixes that. Closes scylladb/scylladb#24477	2025-06-25 11:38:46 +03:00
Michał Chojnowski	cace55aaaf	test_sstable_compression_dictionaries_basic.py: fix a flaky check test_dict_memory_limit trains new dictionaries and checks (via metrics) that the old dictionaries are appropriately cleaned up. The problem is that the cleanup is asynchronous (because the lifetimes are handled by foreign_ptr, which sends the destructor call to the owner shard asynchronously), so the metrics might be checked a few milliseconds before the old dictionary is cleaned up. The dict lifetimes are lazy on purpose, the right thing to do is to just let the test retry the check. Fixes scylladb/scylladb#24516 Closes scylladb/scylladb#24526	2025-06-25 11:30:28 +03:00
Amnon Heiman	51cf2c2730	api/failure_detector.cc: stream endpoints Previously, get_all_endpoint_states accumulated all results in memory, which could lead to large allocations when dealing with many endpoints. This change uses the stream_range_as_array helper to stream the results. Fixes #24386 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Closes scylladb/scylladb#24405	2025-06-25 11:28:37 +03:00
Guy Shtub	71ba1f8bc9	docs: update third party driver list with Exandra Elixir driver Closes scylladb/scylladb#24260	2025-06-25 11:27:03 +03:00
Kefu Chai	e212b1af0c	build: add p11-kit's cflags to user_cflags instead of args.user_cflags Fix an issue introduced in commit `083f7353` where p11-kit's compiler flags were incorrectly added to `args.user_cflags` instead of `user_cflags`. This created the following problem: When using CMake generation mode, these flags were added to `CMAKE_CXX_FLAGS`, causing them to be passed to all compiler invocations including linking stages where they were irrelevant. This change moves p11-kit's cflags to `user_cflags`, which ensures the flags are correctly included in compilation commands but not in linking commands. This maintains the proper behavior in the ninja build system while fixing the issue in the CMake build system. `args.user_cflags` is preserved for its intended purpose of storing user-specified compiler flags passed via command line options. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#23988	2025-06-25 11:24:09 +03:00
Andrzej Jackowski	ea2bdae45a	mapreduce: add tablet-aware dispatching algorithm The primary goal of this change is to reduce the time during which the Effective Replication Map (ERM) is retained by the mapreduce service. This ensures that long aggregate queries do not block topology operations. As ScyllaDB transitions towards tablets, which simplify work dispatching, the new algorithm is designed specifically for tablets. The algorithm divides work so that each `tablet_replica` (a <host, shard> pair) processes two tablets at a time. After processing of each `tablet_replica`, the ERM is released and re-acquired. The new algorithm can be summarized as follows: 1. Prepare a set of exclusive `partition_ranges`, where each range represents one tablet. This set is called `ranges_left`, because it contains ranges that still need processing. 2. Loop until `ranges_left` is empty: I. Create `tablet_replica` -> `ranges` mapping for the current ERM and `ranges_left`. Store this mapping and the number representing current ERM version as `ranges_per_replica`. II. In parallel, for each tablet_replica, iterate through ranges_per_tablet_replica. Select independently up to two ranges that are still existing in ranges_left. Remove each range selected for processing from ranges_left. Before each iteration, verify that ERM version has not changed. If it has, return to Step I. Steps I and II are exclusive to simplify maintaining `ranges_left` and `ranges_per_replica`: - Step I iterates through `ranges_left` and creates `ranges_per_replica` - Step II iterates through `ranges_per_replica` and remove processed ranges from `ranges_left` To maintain the exclusivity, the algorithm uses `parallel_for_each` in Step II, requiring all ongoing `tablet_replica` processing to finish before returning to Step I. Currently, each node can handle any partition range, even if the mapreduce supercoordinator does not retain the ERM and the range is absent locally. 
This is because `execute_on_this_shard` creates a new pager to coordinate the partition range read, including obtaining its own ERM. However, absent ranges are handled by shard 0, so proper routing is necessary to avoid overloading shard 0. Thus, in Step II, the ERM is retained during each `tablet_replica` processing. The tablet split scenario is not well-handled in this implementation. After a split, the entire pre-split range is sent to a node hosting the `tablet_replica` containing the range's `end_token`. The node will typically not have other tablets in the range, and as aforementioned, absent ranges are handled by shard 0. As a result, in such scenario, shard 0 handles a significant portion of the range. This issue is addressed later in this patch series by introducing `shard_id` in `mapreduce_request`. Ref. scylladb#21831	2025-06-25 10:18:02 +02:00
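Steps I and II of the dispatching loop described in the commit above can be sketched as follows. This is a simplified, sequential illustration under stated assumptions: the real code is C++ and processes replicas in parallel with `parallel_for_each`, whereas here replicas are visited one after another, and `dispatch_to_tablets`, `get_erm`, and `process` are hypothetical callables introduced for the example. An ERM is modeled as a `(version, range -> tablet_replica)` pair.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Replica = Tuple[str, int]              # (host, shard)
Erm = Tuple[int, Dict[str, Replica]]   # (version, range -> owning replica)


def dispatch_to_tablets(all_ranges: Iterable[str],
                        get_erm: Callable[[], Erm],
                        process: Callable[[Replica, str], None]) -> None:
    ranges_left: Set[str] = set(all_ranges)
    while ranges_left:
        # Step I: map each remaining range to its replica under the current ERM.
        version, owner_of = get_erm()
        ranges_per_replica: Dict[Replica, List[str]] = {}
        for r in sorted(ranges_left):
            ranges_per_replica.setdefault(owner_of[r], []).append(r)
        # Step II: per replica, take up to two ranges per iteration.
        for replica, ranges in ranges_per_replica.items():
            for i in range(0, len(ranges), 2):
                # Before each iteration, verify the ERM version is unchanged;
                # if it changed, stop and rebuild the mapping in Step I.
                if get_erm()[0] != version:
                    break
                for r in ranges[i:i + 2]:
                    if r in ranges_left:
                        ranges_left.discard(r)
                        process(replica, r)
```

Because Step II only removes ranges and Step I only reads `ranges_left`, each range is processed exactly once even if the mapping changes between passes. The sketch does not model the tablet split case or the `shard_id` hint mentioned in the commit message.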


48233 Commits