scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-05 14:33:08 +00:00

Author	SHA1	Message	Date
Łukasz Paszkowski	a84d9cb8c4	test_backup.py: fix race in test_restore_tablets_vs_migration The test was racing move_tablet against restore_tablets without ensuring that move_tablet had actually reached the streaming phase before restore began. This caused restore to win the group0 race, putting the tablet into transition first, which made move_tablet fail with "Tablet is in transition". Fix by adding a log message to the block_tablet_streaming error injection and waiting for it in the test, ensuring the move has entered the streaming phase (and is blocked) before restore starts. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2147 Closes scylladb/scylladb#30173	2026-06-02 08:17:54 +03:00
Aleksandra Martyniuk	33af16d808	test/cluster/test_tablets: increase timeout for test_multi_rf_of_many_keyspaces_0_N Multi-RF change handles multiple keyspaces concurrently, but tablet rebuilds are not all started at once — the load balancer considers machine load when scheduling them. With 3 keyspaces each having a base table and materialized view, the total operation time approaches the default 200s CQL timeout on slow/busy CI machines (observed at ~191s). Double the timeout to 400s to provide sufficient margin. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2042. Closes scylladb/scylladb#30018	2026-06-01 20:07:03 +03:00
Michael Litvak	a7a7f02392	test: test_cdc_with_tablets: add read barrier Add group0 read barrier in test_cdc_with_tablets whenever we observed a condition such as tablet count change or cdc stream change, and we want to proceed to check that cdc tables are consistent with the change. For example, when we wait for tablet count change and then check the cdc streams changed as well. The problem is that when we observe the tablet count change, for example, even though the cdc streams are changed in the same group0 operation, we may observe it during the group0 apply, when the operation is only partially applied. The read barrier ensures that the change we observed is fully applied. Fixes SCYLLADB-2352 Closes scylladb/scylladb#30177	2026-06-01 13:56:01 +02:00
Botond Dénes	bb81dbf65e	Merge 'guardrails: Add replica-side large data guardrails' from Taras Veretilnyk Adds write-path guardrails that reject or warn on mutations targeting partitions, rows, or collections that already exceed configured size thresholds, based on SSTable `large_data_record` metadata. ScyllaDB already detects and records large partitions/rows/cells in `system.large_data_records` after compaction, but takes no preventive action on the write path. Once a partition grows past operational limits it causes latency spikes, OOM, and repair failures. These guardrails let operators set hard and soft thresholds so that writes to already-oversized data are rejected (hard) or logged as warnings (soft) before they make the problem worse. - Intrusive index over SSTable metadata: A per-table `large_data_record_index` maintains three `boost::intrusive::multiset`s (partitions, rows, cells) using `auto_unlink` hooks directly on `large_data_record`. SSTable destruction automatically removes records from the index — no explicit deregistration needed. - Virtual dispatch for zero-cost disabled path: `large_data_guardrail_base` → `noop_large_data_guardrail` / `large_data_guardrail`. Tables without guardrails enabled pay only a virtual call to a no-op. No index is built or maintained for disabled tables. - Schema storage: The per-table flag is stored as a scylla_tables column, following the tablets pattern: only write a live cell when enabled, omit entirely when disabled. The CQL feature gate prevents enabling until all nodes are upgraded. - Write-path integration: The guardrail check runs in `do_apply` after the frozen mutation is deserialized but before it is applied to the memtable. Hint replay and Paxos learn skip the check via `skip_large_data_guardrails`. Uses existing `large__warn_threshold` config options as soft limits and new `large__fail_threshold` options as hard limits. 
Checked dimensions: - Partition size (bytes) - Partition row count - Row size (bytes) - Collection element count Backport is not required Fixes https://scylladb.atlassian.net/browse/SCYLLADB-180 Closes scylladb/scylladb#29733 * github.com:scylladb/scylladb: test/cqlpy: add per-table toggle, LWT exemption, and multi-category tests test/cqlpy: add large collection guardrail tests test/cqlpy: add large row guardrail tests test/cqlpy: add large partition guardrail tests test/boost: add large_data_guardrail unit tests test/cluster: add large data guardrails rolling upgrade test replica: wire large_data_guardrail into the write path schema: add per-table large_data_guardrails_enabled flag db: implement large_data_guardrail db: implement large_data_record_index sstables: add intrusive index hook to large_data_record db: add large_collection_elements_fail_threshold config option db: add large_row_fail_threshold_mb config option db: add rows_count_fail_threshold config option db: add large_partition_fail_threshold_mb config option replica: introduce large_data_exception	2026-06-01 13:26:00 +03:00
Nadav Har'El	b254a9826a	test/cluster: add pylib-style nodetool.py Tests in test/cqlpy use a tiny nodetool-like library, where calls to nodetool.flush() are translated to the parallel REST API request on Scylla - but use an external "nodetool" command when running the test against Cassandra. Some test/cluster tests also began using test/cqlpy/nodetool.py, but it is NOT a good fit for test/cluster tests, because: 1. It falls back to using the external "nodetool" when it thinks the REST API is not available. In cluster tests, no such fallback is needed (these tests can't be run on Cassandra). If the REST API is down, the test should fail - not fall back to an irrelevant method. 2. The nodetool.flush() et al. functions are not async, and cluster tests are supposed (by design...) to only use async APIs. 3. test/cqlpy/nodetool.py was not written in the "style" defined for the test/cluster codebase - specifically, its functions don't have docstrings or strong typing. This patch introduces test/pylib/nodetool.py, based on test/cqlpy/nodetool.py but fixing all the above problems - there are no Cassandra fallbacks, there are docstrings and type hints, and all the functions are async. We also fix the test/cluster tests that used test/cqlpy/nodetool.py to switch to test/pylib/nodetool.py. Of course it means the newly async functions need to be "await"ed, not just called, so this patch changes that too. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#30129	2026-06-01 13:03:29 +03:00
Dimitrios Symonidis	4c0a991017	test/cluster: fix proxy resource leak in internode compression test The test_internode_compression_between_datacenters test was flaky due to proxy servers and leased host IPs not being cleaned up on failure paths. If any exception occurred after proxies were started (e.g. during server_start or driver_connect), the asyncio.Server listeners remained bound and leased hosts were never released back to HostRegistry. On subsequent test runs, this caused EADDRINUSE (errno 98) when trying to bind the same address:port. Wrap the proxy/server lifecycle in try/finally to ensure proxies are always stopped and hosts are always released, regardless of whether the test succeeds or fails. Fixes: SCYLLADB-2183 Closes scylladb/scylladb#30127	2026-05-29 13:51:43 +03:00
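The try/finally pattern described in the commit above can be sketched as follows. This is a minimal illustration, not the test's actual code: `with_proxy`, `leased_hosts` (a plain set standing in for HostRegistry) and `body` are hypothetical names, and the listener is a bare `asyncio` server bound to an OS-chosen port.

```python
import asyncio


async def _handle(reader, writer):
    # Hypothetical proxy handler: close the connection immediately.
    writer.close()


async def with_proxy(leased_hosts: set, host: str, body):
    """Lease a host, start a listener on it, run `body`, and always clean up."""
    leased_hosts.add(host)  # stand-in for leasing a host from a registry
    server = None
    try:
        # Port 0 asks the OS for any free port.
        server = await asyncio.start_server(_handle, host, 0)
        return await body(server)
    finally:
        # Runs on success and on any exception raised after the lease.
        if server is not None:
            server.close()
            await server.wait_closed()
        leased_hosts.discard(host)  # release the lease
```

Because the cleanup lives in `finally`, an exception raised by `body` still leaves the listener closed and the host released, which is the property that avoids EADDRINUSE on the next run.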
Taras Veretilnyk	0201c1530e	test/cluster: add large data guardrails rolling upgrade test Simulated rolling upgrade: start a 2-node cluster where one node suppresses the LARGE_DATA_GUARDRAILS feature, verify that enabling guardrails is rejected, then upgrade the old node and verify that enabling guardrails succeeds.	2026-05-29 12:51:31 +02:00
Pavel Emelyanov	5d0371620d	test/backup: Reduce s3 logging from trace to debug Change s3 log level from TRACE to DEBUG in backup tests. TRACE level generates excessive log volume with too much low-level detail about S3 operations. While it was useful in the early days of the S3 client, nowadays DEBUG level likely provides sufficient diagnostic information for backup test troubleshooting. The reduced log volume significantly improves test performance, which is the main outcome of this change: - Less I/O time writing logs during test execution - Faster teardown: each test scans all server logs for errors, and smaller logs mean faster grep operations (23.3s → 9.97s for 8-node cluster teardown) Impact on test_restore_with_streaming_scopes[topology4] (8 nodes): - Log volume: 49 MB → 23 MB (reduced by half) - Test runtime: 82.55s → 57.53s (30% faster) - Teardown time: 23.3s → 9.97s (57% faster) Tests that start smaller clusters also have notable timing improvements. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#30109	2026-05-29 13:46:10 +03:00
Pavel Emelyanov	24c0ea6b19	sstables_loader: Prevent table destruction during tablet restore download Similar to `e5e6608f20` ("sstables_loader: prevent use-after-free on table drop during streaming") which fixed the same class of race for load_and_stream, the tablet restore path also holds a replica::table& reference across the download_sstable() coroutine without preventing concurrent table destruction. If DROP KEYSPACE is applied while download_sstable() is writing SSTable components to the table's data directory, the directory is removed mid-write causing ENOENT → abort (with --abort-on-internal-error). Fix by acquiring a stream_in_progress() phaser guard after find_column_family() and before download_sstable(). table::stop() calls _pending_streams_phaser.close() which blocks until all outstanding guards are released, keeping the table alive for the duration of the download. Fixes: SCYLLADB-2187 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#30094	2026-05-29 13:43:37 +03:00
Botond Dénes	1384c9523e	Merge 'Simplify handler injection call sites to use appropriate existing API' from Pavel Emelyanov Several error injection call sites use the verbose handler-lambda API when simpler alternatives already exist in the framework. This series converts them to use the appropriate overloads, reducing boilerplate and making the injection intent immediately obvious from the call site. Cleaning up in-code debugging facilities, no need to backport Closes scylladb/scylladb#29962 * github.com:scylladb/scylladb: error_injection: Convert handler-style breakpoints to wait_for_message sugar error_injection: Convert no-op handler injections to enter()/is_enabled() error_injection: Convert handler-throw injections to lambda-throw style utils: Add share_messages parameter to breakpoint injection API	2026-05-29 13:41:09 +03:00
Piotr Dulikowski	8dfd455001	Merge 'strong consistency: fix drop table blocking on stuck writes and handle timeout in update()' from Petr Gusev - Fix table drop blocking for the full client timeout when in-flight writes can't reach quorum - Handle unhandled timeout exception in the wait-for-leader loop during group startup When a strongly consistent table is dropped, `schedule_raft_group_deletion()` calls `g->close()` which waits for all in-flight operations to release their gate holders. But other nodes may have already destroyed their raft servers for this group, so an in-flight write on the leader cannot reach quorum and hangs until the client timeout expires (~seconds), unnecessarily delaying group deletion. Additionally, the wait-for-leader loop in groups_manager::update() uses abort_on_expiry with a 60-second timeout but never catches the exception if it fires, leaving the group in an indeterminate state. SCYLLADB-2080 fix: - Reorder `schedule_raft_group_deletion`: initiate gate close (prevents new operations), then abort the raft server (unblocks stuck writes by causing `raft::stopped_error`), then await the gate future (resolves immediately since holders are released). - Handle `raft::stopped_error` in the coordinator's top-level catch blocks (both write and read paths): if the table no longer exists, return `no_such_column_family` (CQL layer converts to InvalidRequest: unconfigured table). Otherwise fall through to the default timeout handling. - Replace gate->hold() with try_hold() + on_internal_error in acquire_server, with a comment explaining why the gate can never be closed at that point (table removal in `schema_applier::commit_on_shard` precedes gate closure, with no scheduling point in between). Timeout handling fix: - Use `coroutine::as_future` in the wait-for-leader loop to catch timeout exceptions gracefully: log a warning and break out instead of propagating unhandled. 
Includes a cluster test reproducer (test_drop_table_unblocks_stuck_write) that: 1. Pauses a write on the leader before add_entry 2. Drops the table (follower destroys its group immediately) 3. Resumes the write — verifies it fails promptly with InvalidRequest ("unconfigured table") instead of hanging for 15 seconds backport: no need, strong consistency is not released yet Fixes: SCYLLADB-2080 Closes scylladb/scylladb#30105 * github.com:scylladb/scylladb: strong consistency/groups_manager: handle timeout in update() wait-for-leader loop strong consistency: abort raft server before gate close when dropping a table test/cluster: rewrite test_queries_while_dropping_table for SCYLLADB-2080	2026-05-28 09:59:20 +02:00
Petr Gusev	d922c43358	strong consistency: abort raft server before gate close when dropping a table When a strongly consistent table is dropped, schedule_raft_group_deletion() used to call g->close() first, which waits for all in-flight operations to release their gate holders. But other nodes may have already destroyed their raft servers for this group, so an in-flight write on the leader cannot reach quorum and hangs until the client timeout expires, unnecessarily delaying group deletion. Fix: initiate gate close (prevents new operations from entering), then abort the raft server (causes in-flight add_entry/read_barrier to throw raft::stopped_error, releasing their gate holders), then await the gate future (resolves immediately since holders are now released). Handle raft::stopped_error in the coordinator's top-level catch blocks (both write and read paths): if the table no longer exists, return no_such_column_family (which the CQL layer converts to InvalidRequest 'unconfigured table'). Otherwise fall through to the default timeout handling. Also replace gate->hold() with try_hold() + on_internal_error in acquire_server, and handle the timeout exception in the wait-for-leader loop in update() gracefully (log + break instead of propagating). Fixes: SCYLLADB-2080	2026-05-27 12:06:46 +02:00
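The close / abort / await ordering described in the commit above can be modelled with a small Python sketch. It is only an analogy for the C++ code: `Group`, `StoppedError`, `write` and `delete` are hypothetical names, the gate is modelled as a holder counter plus an `asyncio.Event`, and the stuck write is modelled as a coroutine that can only be woken by an abort.

```python
import asyncio


class StoppedError(Exception):
    """Stand-in for raft::stopped_error."""


class Group:
    def __init__(self):
        self.closed = False             # gate no longer admits new operations
        self.holders = 0                # in-flight operations holding the gate
        self.idle = asyncio.Event()     # set while no holders remain
        self.idle.set()
        self.aborted = asyncio.Event()  # set when the server is aborted

    async def write(self):
        if self.closed:
            raise RuntimeError("gate closed")
        self.holders += 1
        self.idle.clear()
        try:
            # Models a write that cannot reach quorum: only abort wakes it.
            await self.aborted.wait()
            raise StoppedError()
        finally:
            self.holders -= 1
            if self.holders == 0:
                self.idle.set()

    async def delete(self):
        self.closed = True      # 1. initiate gate close: no new operations
        self.aborted.set()      # 2. abort the server: stuck writes fail
        await self.idle.wait()  # 3. await the gate: holders are now released
```

If step 3 ran before step 2, `delete()` would wait on a holder that nothing ever releases, which mirrors the hang the commit describes.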
Petr Gusev	89307064b5	test/cluster: rewrite test_queries_while_dropping_table for SCYLLADB-2080 Rewrite the test to use 2 nodes (RF=2) instead of 1 (RF=1), which exposes the quorum-loss scenario: when a table is dropped, the follower destroys its raft group immediately while the leader's in-flight operations are still holding the gate. The test pauses both a read and a write on the leader, drops the table, then resumes them. Both are expected to fail with 'no such column family' since the raft server is aborted as part of group deletion. A 15-second timeout guard detects the old buggy behavior (write stuck forever). Marked xfail until the fix is applied in the next commit.	2026-05-27 12:06:46 +02:00
Pavel Emelyanov	cd7d9a63bc	error_injection: Convert handler-style breakpoints to wait_for_message sugar Replace verbose handler lambdas that only log and call wait_for_message() with the equivalent one-liner breakpoint sugar. The behavior is identical -- the sugar produces the same log messages in the format "{name}: waiting for message" / "{name}: message received". Update Python tests that waited for the old ad-hoc log messages to match the new standardized format. Converted injections: - topology_state_load_before_update_cdc (storage_service.cc) - migration_streaming_wait x2 (storage_service.cc) - pause_after_streaming_tablet (storage_service.cc) - cdc_generation_publisher_fiber (topology_coordinator.cc) - wait_after_tablet_cleanup (topology_coordinator.cc) - fast_orphan_removal_fiber (topology_coordinator.cc) - split_storage_groups_wait (table.cc) - wait_before_stop_compaction_groups (table.cc) - tasks_vt_get_children (task_manager.cc) - truncate_compaction_disabled_wait (database.cc) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 15:01:01 +03:00
Nikos Dragazis	54cb6d4608	test: Order task-wait before finalization in test_migration_wait_task The purpose of this test is to verify that the task manager's "wait" API works correctly for vnodes-to-tablets migration virtual tasks. It starts a `wait_task` HTTP request concurrently with a finalize (or rollback) operation, and asserts that the wait returns the correct final state ("done" or "suspended"). The test uses `asyncio.create_task()` to wrap the wait request into a task, and then immediately calls finalize. With asyncio's lazy task scheduling, the wait coroutine does not start until the event loop yields, so the finalization request reaches the server before wait, and therefore may also complete before it. Once finalization completes, the virtual migration task is no longer discoverable, causing a "task not found" error. Add a log message in Scylla's wait handler and a synchronization point in the test to ensure that the wait request reaches the server before finalization. This follows the same pattern used in `test_tablet_tasks.py::check_and_abort_repair_task`. Fixes SCYLLADB-2077 Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#29973	2026-05-26 10:43:22 +03:00
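The lazy scheduling behind this race is easy to reproduce in isolation. The sketch below is illustrative only: `demo` and `wait_task` are hypothetical names, and an `asyncio.Event` stands in for the server log message that the real test waits on.

```python
import asyncio


async def demo():
    order = []
    started = asyncio.Event()

    async def wait_task():
        order.append("wait started")
        started.set()

    # Unsynchronized: create_task() only schedules the coroutine; it does
    # not run until the current coroutine yields to the event loop.
    task = asyncio.create_task(wait_task())
    order.append("finalize sent")
    await task

    # Synchronized: wait until the waiter reports that it has started.
    started.clear()
    task = asyncio.create_task(wait_task())
    await started.wait()
    order.append("finalize sent")
    await task
    return order
```

Running `asyncio.run(demo())` returns the two orderings back to back: "finalize sent" precedes "wait started" in the unsynchronized half and follows it in the synchronized half.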
Botond Dénes	db89f3f095	Merge 'compaction_manager: unregister compaction module on early shutdown' from Patryk Jędrzejczak The compaction module is registered with task_manager in the compaction_manager constructor, and unregistered in compaction_manager::really_do_stop(), which was gated behind `_state != state::none` in compaction_manager::do_stop(). Since enable() -- which transitions _state from none to running -- is called later during startup (from database::start() or the disk space monitor callback) than the compaction_manager constructor, an early shutdown could leave the compaction module registered after compaction_manager::do_stop() returned. task_manager::stop() then aborted with 'Tried to stop task manager while some modules were not unregistered'. Fix compaction_manager::do_stop() to call _task_manager_module->stop() even when `_state == state::none`, so that the compaction module is always properly unregistered. Fixes: SCYLLADB-2106 Backport to all supported branches, as the bug is there and it has already caused a failure in 2026.1 CI. Closes scylladb/scylladb#30015 * github.com:scylladb/scylladb: test: add test_stop_before_starting_compaction_manager compaction_manager: unregister compaction module on early shutdown	2026-05-25 16:08:20 +03:00
Dmitry Kropachev	06eeaf48ff	tests: avoid CQL_ALTERNATOR_QUERIED on zero-token nodes The keyspace RF test starts zero-token nodes as part of its topology setup. The python driver 3.29.9 can't schedule queries on zero-token nodes, so waiting for `CQL_ALTERNATOR_QUERIED` on those nodes is the wrong readiness gate. This change makes the zero-token `server_add()` calls stop at `CQL_ALTERNATOR_CONNECTED`. The test still exercises the keyspace replication assertions through a normal token-owning contact point. Verified with running all 4 variations of `cluster.test_keyspace_rf::test_create_keyspace_with_default_replication_factor` on this branch. Closes scylladb/scylladb#29779	2026-05-25 14:22:04 +03:00
Piotr Dulikowski	3a5dd2e5be	Merge 'strong_consistency: forward reads to the raft leader' from Wojciech Mitros Strongly consistent reads currently call read_barrier() on whichever replica happens to process the request. When a follower runs read_barrier(), it sends an RPC to the leader to get the current read index, then waits for its local apply index to catch up. If the follower is behind, this wait can be significant. By forwarding linearizable reads to the leader, we don't need an RPC from replica to leader to get the index to wait for apply -- it's available locally. Note that read_barrier() is still required on the leader to confirm it is still the leader and guarantee linearizability. A future optimization would be to implement leases in the raft library, which could eliminate read_barrier() on the leader entirely. The CL-to-behavior mapping is isolated in a single parse_consistency_level() function: - CL=(LOCAL_)QUORUM -> linearizable: forwarded to the raft leader - CL=(LOCAL_)ONE -> non-linearizable: existing behavior (no read_barrier()/forwarding, may return stale results) - All other CLs -> invalid request Read forwarding reuses the same CQL-layer bounce_to_node() mechanism that write forwarding already uses. The transport layer's existing requests_forwarded_* metrics automatically count forwarded reads. Coordinator-level metrics (linearizable_reads, non_linearizable_reads, writes) are added for visibility into the strong consistency workload. Fixes: SCYLLADB-1157 Closes scylladb/scylladb#29575 * github.com:scylladb/scylladb: strong_consistency: test read forwarding to leader strong_consistency: skip read_barrier() for non-linearizable reads strong_consistency: split coordinator-level read latency metrics strong_consistency: forward linearizable reads to raft leader strong_consistency: classify reads by consistency level strong_consistency: add begin_read() to raft_server	2026-05-25 10:55:00 +02:00
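The CL-to-behavior mapping listed in the merge above can be sketched in Python. The real `parse_consistency_level()` is C++ and its signature is not shown here, so the `ReadMode` enum, the string arguments and the `ValueError` are assumptions made for illustration.

```python
from enum import Enum


class ReadMode(Enum):
    LINEARIZABLE = "linearizable"          # forwarded to the raft leader
    NON_LINEARIZABLE = "non_linearizable"  # served locally, may be stale


def parse_consistency_level(cl: str) -> ReadMode:
    """Map a consistency level name to a read mode (illustrative sketch)."""
    if cl in ("QUORUM", "LOCAL_QUORUM"):
        return ReadMode.LINEARIZABLE
    if cl in ("ONE", "LOCAL_ONE"):
        return ReadMode.NON_LINEARIZABLE
    # All other consistency levels are rejected as invalid requests.
    raise ValueError(f"unsupported consistency level: {cl}")
```

Keeping the mapping in one function means the forwarding decision and the metrics classification can both key off the same result.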
Michael Litvak	73470150a0	logstor: disable logstor compaction in table truncate In database::truncate_table_on_all_shards, disable logstor compaction before the table data is truncated, similarly to how non-logstor compaction is disabled, to avoid race conditions between logstor compaction and segment discarding. Fixes SCYLLADB-2186	2026-05-24 10:25:08 +02:00
Wojciech Mitros	45f5df14e5	strong_consistency: test read forwarding to leader Test the linearizable read forwarding behavior in a single test that exercises all scenarios on one cluster: - CL=QUORUM reads on leader, follower, and non-replica nodes - CL=ONE reads (non-linearizable, no forwarding) - Linearizability: write + CL=QUORUM read from follower (10 iterations) - Coordinator latency histogram metrics for both read types Refs: SCYLLADB-1157	2026-05-23 11:35:37 +02:00
Wojciech Mitros	d07692a7ff	strong_consistency: split coordinator-level read latency metrics Split the latency metrics for strongly consistent reads into two categories: linearizable and non-linearizable. They replace the existing metrics for both types combined - this shouldn't cause issues because the feature is still experimental and both the initial introduction of latency metrics and the split will be a part of the same release. Also fix a test that was using the old metric.	2026-05-23 11:35:37 +02:00
Łukasz Paszkowski	96a992002c	tasks: fix busy-spin and shutdown hang in tablet_virtual_task::wait() for repair tasks The condition variable predicate for repair tasks unconditionally returned true (introduced in `e5928497ce`), which meant event.wait(pred) never actually suspended: do_until checks the predicate first, and if it's already satisfied, returns immediately without calling the inner wait(). This caused two problems: 1. The while(true) loop busy-spun, polling without blocking between topology changes. 2. During shutdown, event.broken() had no effect because no waiter was registered on the CV. The loop kept spinning, holding the HTTP server's task gate open and preventing http_server::stop() from completing. After ~15 minutes, systemd killed the process with SIGABRT. The fix replaces the synchronous predicate with an async task_finished() helper that dispatches on the task type. Since the repair check is async (for_each_tablet scans every tablet), we cannot use event.wait(Pred). Instead, we register a waiter via event.wait() before running the async check, ensuring no broadcast is missed during the check. event.broken() during shutdown propagates broken_condition_variable to the registered waiter and unblocks the loop promptly. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1532 Closes scylladb/scylladb#29485	2026-05-22 16:47:48 +03:00
Raphael S. Carvalho	3ba6184462	repair, test: fix split-repair synchronization test timeout in debug mode The test_split_and_incremental_repair_synchronization[True] test was timing out waiting for 'Finalizing resize decision for table' in debug mode. The root cause is a timing race: the incremental_repair_prepare_wait error injection has a hardcoded 60s auto-expiry timeout (wait_for_message(60s)), but split compactions in debug mode take ~58s per SSTable due to -O0 compilation and scheduler starvation (the maintenance_compaction group gets ~10% of wall-clock time). When the injection auto-expires before split finalization, the repair fails, leaving tablets stuck in transition=repair state. This prevents the topology coordinator from finalizing the split, causing the 600s test timeout. Fix both contributing factors: - Increase the injection timeout from 60s to 10min, giving split compactions ample time to complete before the injection auto-expires. The test explicitly messages the injection to release it (line 2200), so the longer timeout is just a safety net. - Reduce data volume from 256 to 64 rows (and repair data from 256 to 64 rows), producing smaller SSTables that split much faster in debug mode. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2123. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#30004	2026-05-22 15:03:47 +03:00
Patryk Jędrzejczak	082936ce43	Merge 'test: pylib: Convict the node on server_stop()' from Tomasz Grabiec This is about ungraceful stop, where the node is killed. Test cases typically need to wait for other nodes to notice that the node is down before proceeding. By default, that takes about 20s. This can be reduced via config by lowering the failure detector threshold, but it's not the best solution: - cannot set the threshold too low, or we'll introduce flakiness due to false positives - so it's still slow (a couple of seconds) - developers forget about it and the test still works This patch speeds this up by adding a way to convict the node immediately after stopping the node, controlled by the "convict" parameter. At the end of the series the "convict" parameter is required, and each test decides what it wants. Commits are split into steps: - the series starts with defaulting to convict=False - each test case sets "convict" explicitly, and changes are split into 3 commits depending on whether convict=True is: useless, beneficial, undesirable - finally, the "convict" parameter is made mandatory There is also a dedicated test for natural failure detection (test_natural_failure_detection in test_gossiper.py) to ensure FD coverage is not lost. 
Tested on dev-mode cluster/test_tablets_parallel_decommission.py::test_node_lost_during_decommission_drain: Wall clock time reduced from 41s to 16s No backport: enhancement Closes scylladb/scylladb#28495 * https://github.com/scylladb/scylladb: test: gossiper: Add test for natural failure detection test: pylib: Make convict a required parameter in server_stop() test: Annotate server_stop() calls where conviction is harmful test: Annotate server_stop() calls where conviction is beneficial test: Annotate server_stop() calls where conviction is useless test: pylib: Add convict option to server_stop() api: failure_detector: Introduce convict-node API gms: gossiper: Make convict() public and safe to call from any scheduling group api: Extract validate functions to common header	2026-05-22 13:39:50 +02:00
Patryk Jędrzejczak	b7400d20dd	test: add test_stop_before_starting_compaction_manager	2026-05-22 11:58:37 +02:00
Marcin Maliszkiewicz	18dd281e72	Merge 'test: audit: pin empty-keyspace DDL audit behavior' from Andrzej Jackowski `9646ee05bd` changed behavior of empty keyspace handling and this code path was never tested for CQL audit. Test CREATE/DROP FUNCTION and CREATE/DROP AGGREGATE targeting both an existing keyspace and a nonexistent one to verify both are audited with empty keyspace. No backport, just a missing test case. Closes scylladb/scylladb#29542 * github.com:scylladb/scylladb: test: audit: pin empty-keyspace DDL audit behavior test: audit: restart server when any non-live config key changes test: audit: rename 'needed' to 'target_config' for clarity	2026-05-22 09:42:34 +02:00
Tomasz Grabiec	445a8b9a3e	test: gossiper: Add test for natural failure detection Add test_natural_failure_detection which verifies that the failure detector detects a killed node as DOWN without using the convict mechanism. Uses the failure_detector_timeout fixture to keep the FD timeout short (2s in release mode). This ensures that natural failure detection continues to work correctly even as other tests adopt the convict mechanism for speed.	2026-05-21 21:33:24 +02:00
Tomasz Grabiec	9b40cf89fe	test: Annotate server_stop() calls where conviction is harmful Add explicit convict=False to server_stop() calls where convicting the node would break or weaken the test. In test_backoff_when_node_fails_task_rpc, the desired behavior is for the node to not be marked as down immediately: # The purpose of this is to simulate a situation when the gossiper # doesn't mark a dead node as such immediately. In raft tests, conviction could trigger voter reassignment while the test wants to test the scenario with voters being still down. In test_tablet_mv_replica_pairing_during_replace, conviction triggers SCYLLADB-1996 (replace fails with "Failed to add server").	2026-05-21 21:33:19 +02:00
Tomasz Grabiec	92416d850a	test: Annotate server_stop() calls where conviction is beneficial Add explicit convict=True to server_stop() calls where the test needs other nodes to detect the stopped node as DOWN in order to proceed. These are cases before remove_node, replace, or explicit waits for failure detection (server_not_sees_other_server, wait_new_coordinator_elected). Convicting immediately speeds up the test.	2026-05-21 21:31:22 +02:00
Tomasz Grabiec	624fe11178	test: Annotate server_stop() calls where conviction is useless Pass convict=False explicitly to server_stop() calls where conviction provides no benefit because there is no consumer of the failure detection: - single-node clusters (no other node to call the API on) - all nodes being stopped concurrently (no live node remains) - immediate restart (no test logic between stop and start depends on other nodes detecting the stopped node as dead) - node stopped for file manipulation or bootstrap abort - majority killed with no quorum on surviving nodes to react - no test logic depends on other nodes detecting the failure This is a no-op change since the default is already convict=False, but makes the intent explicit for each call site.	2026-05-21 21:13:55 +02:00
Patryk Jędrzejczak	1ed3f5c4af	Merge 'storage_service: cancel write handlers during drain to prevent shutdown deadlock' from Petr Gusev Fixes a shutdown deadlock where a node hangs because `stale_versions_in_use()` blocks on stale `token_metadata` versions held by write handlers whose `MUTATION_DONE` responses can never arrive (transport is already stopped). Two manifestations depending on whether the shutting-down node is the topology coordinator: - Coordinator: do_drain → wait_for_group0_stop deadlocks because the topology coordinator fiber is stuck in barrier_and_drain → stale_versions_in_use(). - Non-coordinator: ss::stop → uninit_messaging_service deadlocks because the barrier_and_drain RPC handler holds the gate open. The non-coordinator case was fixed in PR #24714 (cancel all write requests on storage_proxy shutdown), but its test never actually failed: the write handler always captured the current token_metadata version because `pause_before_barrier_and_drain` used `one_shot=True`, so only the first `barrier_and_drain` was paused. The topology state hadn't advanced by that point, meaning the write handler's ERM version matched the current version and `stale_versions_in_use()` returned immediately. The coordinator case was not covered at all. Cancel all write response handlers on all shards right after `stop_transport()` in `do_drain()`. This releases their ERMs and the associated stale token_metadata versions, unblocking `stale_versions_in_use()`. Fixed the test to ensure the write handler holds a stale version: use one_shot=False, let the first barrier_and_drain through (version still current), then wait for the second one (version now stale). Extended to cover both coordinator and non-coordinator shutdown on the same 2-node cluster. 
Also includes supporting changes: - error_injection: release wait_for_message waiters on disable() so the test can atomically unblock paused handlers - error_injection: add non-shared mode to wait_for_message for per-invocation message semantics - scylla_cluster.py: allow stop() to bypass start_stop_lock so SIGKILL works while stop_gracefully is blocked Fixes: SCYLLADB-1842 Refs: scylladb/scylladb#23665 backports: SCYLLADB-1842 reported a failure in 2025.1, so we need to backport to all versions starting from 2025.1 Closes scylladb/scylladb#29882 * https://github.com/scylladb/scylladb: storage_service: cancel write handlers during drain to prevent shutdown deadlock test_unfinished_writes_during_shutdown: extend to cover coordinator shutdown test_unfinished_writes_during_shutdown: fix to reproduce the shutdown deadlock test_unfinished_writes_during_shutdown: await add_last_node_task instead of cancelling it test_unfinished_writes_during_shutdown: add timeout and deadlock detection for shutdown_task test: scylla_cluster: allow stop() to bypass start_stop_lock error_injection: add non-shared mode to wait_for_message error_injection: release waiters when injection is disabled	2026-05-21 15:43:36 +02:00
Piotr Dulikowski	6148316f66	Merge 'db/view/view_building_coordinator: add flag to mark if any remote work was finished' from Michał Jadwiszczak There is a small window, just after the view building coordinator releases the group0 guard and before it waits on view_building_state_machine's CV, in which the coordinator may miss a CV broadcast triggered by finished remote work. To fix it, this patch adds a boolean flag, which is set to true before broadcasting the CV and is checked before awaiting on the CV. Fixes SCYLLADB-2029 The problem is not critical, but it should be backported to 2025.4 and newer versions, all of which contain the view building coordinator. Closes scylladb/scylladb#27313 * github.com:scylladb/scylladb: test/cluster/test_view_building_coordinator: add reproducer db/view/view_building_coordinator: add flag to mark if any remote work was finished	2026-05-21 15:11:58 +02:00
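The missed-broadcast window described in the commit above is the classic lost-wakeup problem. A minimal sketch of the flag-before-broadcast idea in Python's `asyncio` follows; the class and method names are hypothetical and this is not the coordinator's actual C++ code.

```python
import asyncio


class WorkNotifier:
    """Hypothetical stand-in for the coordinator's wait/notify logic."""

    def __init__(self):
        self._cv = asyncio.Condition()
        self._work_finished = False  # set before every broadcast

    async def notify_finished(self):
        async with self._cv:
            self._work_finished = True  # remember the event, then broadcast
            self._cv.notify_all()

    async def wait_for_work(self):
        async with self._cv:
            # Checking the flag first covers a broadcast that fired
            # before this waiter reached the condition variable.
            while not self._work_finished:
                await self._cv.wait()
            self._work_finished = False  # consume the event


async def demo():
    notifier = WorkNotifier()
    await notifier.notify_finished()  # broadcast arrives before anyone waits
    await asyncio.wait_for(notifier.wait_for_work(), timeout=1)
    return True
```

Without the flag, the `wait_for_work()` call in `demo()` would block until the next broadcast, because the earlier one had no waiter to wake.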
Wojciech Mitros	13c043903d	strong_consistency: cache leader location for non-replica nodes When a non-replica node handles a strongly consistent write, it must forward the request to a replica. If the closest replica is not the leader, the request gets redirected again, causing an extra roundtrip. Add a leader location cache in groups_manager, keyed by raft group_id. After a write request is forwarded, the CQL transport layer records the final node as the leader in the cache. Subsequent write requests from the same node for the same group are forwarded directly to the cached leader, eliminating the extra roundtrip. The cache is only used for writes. Reads can be served by any replica, so they skip the cache and use proximity-based routing instead. Cache entries are validated at use time: if the cached leader is no longer a replica (e.g. after tablet migration), the entry is evicted and the normal closest-replica path is taken. This prevents a scenario where two nodes keep redirecting to each other because both think that the other is the leader but actually both are non-replicas - such loop is broken as soon as the tablet maps are updated. On token_metadata updates, entries for groups that no longer exist (e.g. table dropped, tablet merged) are evicted. Entries for groups that still exist are kept — use-time validation handles staleness. An on_node_resolved callback is propagated through the redirect/bounce path so the transport layer can update the cache generically without coupling to the strong-consistency coordinator. The coordinator creates the callback only for writes (capturing the groups_manager and group_id) and attaches it to the bounce message; the transport layer invokes it once the final node is known, keeping the forwarding infrastructure subsystem-agnostic. We also add a test which verifies that after the initial redirect, following requests to the same node avoid the extra redirect and forward directly to the leader. 
Fixes: SCYLLADB-1064 Closes scylladb/scylladb#29392	2026-05-21 10:32:56 +02:00
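The cache behaviour described in the commit above (writes only, validation at use time, eviction of vanished groups) can be sketched roughly as follows. `LeaderCache` and its methods are hypothetical names for illustration, not the `groups_manager` API.

```python
class LeaderCache:
    """Hypothetical leader-location cache keyed by raft group id."""

    def __init__(self):
        self._leaders = {}  # group_id -> node last seen as leader

    def record_leader(self, group_id, node):
        # Called once the final node of a forwarded write is known.
        self._leaders[group_id] = node

    def pick_target(self, group_id, replicas, closest_replica, is_write):
        if not is_write:
            return closest_replica  # reads skip the cache entirely
        leader = self._leaders.get(group_id)
        if leader is not None and leader not in replicas:
            # Use-time validation: the cached node is no longer a replica.
            del self._leaders[group_id]
            leader = None
        return leader if leader is not None else closest_replica

    def evict_missing_groups(self, live_group_ids):
        # On a topology update, drop entries for groups that no longer exist.
        for group_id in list(self._leaders):
            if group_id not in live_group_ids:
                del self._leaders[group_id]
```

Validating at use time, instead of invalidating eagerly on every topology change, keeps still-valid entries across updates and is what breaks a redirect loop between two non-replicas.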
Gleb Natapov	cc034f84c5	schema: ensure committed_by_group0 is set for all non-system tables on boot Tables created before the GROUP0_SCHEMA_VERSIONING feature was enabled have committed_by_group0 = null in system_schema.scylla_tables. This causes maybe_delete_schema_version() to delete their version cell, forcing the legacy hash-based schema version computation path. Add ensure_committed_by_group0() which runs on boot and fixes up any non-system tables where committed_by_group0 is not true (null or false): 1. Queries system_schema.scylla_tables for rows where committed_by_group0 is null or false, skipping system keyspaces (system, system_schema). 2. Takes a group0 guard 3. Re-checks after the raft barrier in case another node already fixed it. 4. For each table needing fixup, creates a mutation writing the version cell (from the in-memory schema). The committed_by_group0 = true flag is stamped by add_committed_by_group0_flag() inside announce(). 5. Announces via raft group0. 6. Retries with a small random delay on group0_concurrent_modification. On other nodes, schema_applier will detect these as "altered" tables (scylla_tables mutation changed), but since the actual table definition is unchanged, update_column_family is effectively a no-op. This is a prerequisite for eventually removing the legacy hash-based schema versioning code path. Closes scylladb/scylladb#29911	2026-05-21 10:22:07 +02:00
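Step 6 of the commit above, retrying with a small random delay on a concurrent modification, might look like the sketch below. The exception class, the function name, and the delay and attempt limits are all assumptions made for illustration; the real code is C++ and uses group0's own error type.

```python
import random
import time


class ConcurrentModification(Exception):
    """Hypothetical stand-in for group0_concurrent_modification."""


def retry_on_conflict(attempt, max_attempts=10, max_delay=0.01, sleep=time.sleep):
    """Run attempt() until it succeeds or max_attempts is exhausted."""
    for _ in range(max_attempts):
        try:
            # One attempt = take guard, re-check, build mutations, announce.
            return attempt()
        except ConcurrentModification:
            # A random delay makes it less likely that competing nodes
            # collide again on the next try.
            sleep(random.uniform(0, max_delay))
    raise RuntimeError("gave up after repeated concurrent modifications")
```

The re-check inside each attempt matters as much as the retry: after a conflict, another node may already have fixed the tables, in which case the attempt has nothing left to announce.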
Patryk Jędrzejczak	cbadc3d675	test: fix flaky test_raft_snapshot_truncation by waiting for async log truncation Snapshot creation and raft log truncation happen asynchronously in the IO fiber after a schema change completes. The test was querying system.raft immediately after the schema change returned, racing with the IO fiber's store_snapshot_descriptor call. Replace immediate assertions with wait_for polling loops: - log_size == 0: wait for log truncation after drop keyspace - new_snap_id != original_snap_id: wait for new snapshot to be persisted Fixes: SCYLLADB-2120 Closes scylladb/scylladb#29967	2026-05-21 10:50:00 +03:00
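The change from immediate assertions to polling in the commit above follows a common pattern, sketched here. `poll_until` is a hypothetical helper written for this illustration (it is not the test suite's own `wait_for`), and `FakeRaftLog` merely imitates a log that is truncated asynchronously.

```python
import asyncio
import time


async def poll_until(pred, timeout=5.0, period=0.01):
    """Call pred() until it returns a non-None value or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        value = await pred()
        if value is not None:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before the deadline")
        await asyncio.sleep(period)


class FakeRaftLog:
    """Imitates a raft log whose truncation lags behind the schema change."""

    def __init__(self, entries):
        self.entries = entries

    async def size(self):
        if self.entries > 0:
            self.entries -= 1  # each query observes a bit more truncation
        return self.entries


async def log_truncated(log):
    # Return None to keep polling, a value to stop.
    return True if await log.size() == 0 else None
```

A test would then write `await poll_until(lambda: log_truncated(log))` in place of asserting `log_size == 0` right after the schema change returns.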
Artsiom Mishuta	2259307c2e	test.py: remove redundant pytest.mark.asyncio decorators Fixes: SCYLLADB-1935	2026-05-21 10:36:47 +03:00
Andrzej Jackowski	d2bb72438e	test: audit: pin empty-keyspace DDL audit behavior `9646ee05bd` changed behavior of empty keyspace handling and this code path was never tested for CQL audit. Test CREATE/DROP FUNCTION and CREATE/DROP AGGREGATE targeting both an existing keyspace and a nonexistent one to verify both are audited with empty keyspace. Before `9646ee05bd`, an empty keyspace in audit_info would be checked against audit_keyspaces like any other value, silently skipping the statement when "" did not match any configured keyspace. That commit introduced a will_log() helper that treats an empty keyspace as unfilterable, so these DDL statements are now always logged when their category matches. Refs SCYLLADB-1641	2026-05-21 08:49:44 +02:00
Andrzej Jackowski	2c15277d02	test: audit: restart server when any non-live config key changes _check_restart_needed only compared NON_LIVE_AUDIT_KEYS against the running server config, so extra keys like enable_user_defined_functions were silently ignored and never applied. Generalize the check to restart whenever any key outside LIVE_AUDIT_KEYS differs.	2026-05-21 08:49:44 +02:00
Botond Dénes	f8ac8540bd	Merge 'logstor: compare records by timestamp and segment sequence number' from Michael Litvak Add the record timestamp. The timestamp is extracted from the row marker of the mutation when we write it. When inserting a record into the index, we compare it with the existing record and insert it only if it has a newer timestamp. Add a segment sequence number, a global (per-shard) increasing number that is allocated when getting a new segment for write and is written in buffer headers in the segment. It is used to distinguish between buffers written to different generations of a segment, and during recovery to break ties by keeping the record from the newest segment. Refs https://scylladb.atlassian.net/browse/SCYLLADB-770 no backport - logstor is a new feature Closes scylladb/scylladb#29933 * github.com:scylladb/scylladb: test: logstor: add basic delete test logstor: rewrite segment seq num from streaming logstor: add segment sequence number logstor: get_segment helper logstor: compare records by timestamp	2026-05-21 08:44:18 +03:00
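One way to picture the ordering described in the commit above is a lexicographic comparison on (timestamp, segment sequence number). This is an assumption made for illustration: the commit only states that the newer timestamp wins and that the sequence number breaks ties during recovery. The `Record` type and `should_replace` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    key: str
    timestamp: int    # taken from the mutation's row marker
    segment_seq: int  # per-shard, increases with each newly allocated segment


def should_replace(existing: Optional[Record], new: Record) -> bool:
    """Return True if `new` should take the index slot held by `existing`."""
    if existing is None:
        return True
    # Newer timestamp wins; on equal timestamps the record from the
    # newer segment generation wins.
    return (new.timestamp, new.segment_seq) > (existing.timestamp, existing.segment_seq)
```

For example, a record with timestamp 10 in segment 7 replaces one with timestamp 10 in segment 5, but never one with timestamp 11.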
Andrzej Jackowski	29b7bef15d	test: audit: rename 'needed' to 'target_config' for clarity	2026-05-21 07:41:51 +02:00
Petr Gusev	2927f0dd21	storage_service: cancel write handlers during drain to prevent shutdown deadlock When a node shuts down, do_drain() calls stop_transport() which tears down the messaging service. After this point, MUTATION_DONE responses from replicas can no longer reach the coordinator, so any in-flight write_response_handlers will never complete naturally. These handlers hold ERMs referencing stale token_metadata versions. If the topology coordinator calls barrier_and_drain (either on itself or via RPC), it blocks in stale_versions_in_use() waiting for these stale versions to be released. This causes: - On the coordinator node: do_drain -> wait_for_group0_stop deadlock (the topology coordinator fiber is stuck in barrier_and_drain). - On non-coordinator nodes: ss::stop -> uninit_messaging_service deadlock (the barrier_and_drain RPC handler holds the gate open). Fix: cancel all write response handlers on all shards right after stop_transport() in do_drain(). This releases their ERMs and the associated stale token_metadata versions, unblocking stale_versions_in_use(). Heap-allocate _write_handlers_gate and add an allow_new parameter to cancel_all_write_response_handlers(). When allow_new=true (used by do_drain), the gate is closed and swapped with a fresh one — existing handlers are waited on while new handlers can still be created. This avoids blocking internal writes (paxos learn, compaction history updates) that still need to create handlers during the remainder of the drain sequence. When allow_new=false (used by drain_on_shutdown), the gate is closed permanently — no new handlers can be created after final shutdown. Update test_lwt_shutdown to wait for 'Stop transport: done' instead of 'Shutting down storage proxy RPC verbs'. 
The latter message is now only logged after do_drain() completes, but do_drain() blocks in cancel_all_write_response_handlers() waiting for the background paxos learn handler — which is exactly what the test needs to release before shutdown can proceed. Fixes: SCYLLADB-1842 Refs: scylladb/scylladb#23665	2026-05-20 22:21:45 +02:00
Petr Gusev	5bc3e84d1e	test_unfinished_writes_during_shutdown: extend to cover coordinator shutdown The existing test only covers the case where the shutting-down node is NOT the topology coordinator (deadlocks in uninit_messaging_service). When the node IS the coordinator, the deadlock manifests differently: the topology coordinator fiber calls barrier_and_drain on itself (without messaging), and do_drain -> wait_for_group0_stop blocks because the coordinator can't stop while stale_versions_in_use is waiting on the uncancelled write handler. Run the test twice on the same 2-node cluster (RF=2): - Run 1: target is a non-coordinator - Restore cluster state (restart target, decommission added node) - Run 2: target is the topology coordinator Use CL=ONE so the write completes from the local replica even with the other server's response paused. Mark as xfail since this reproduces bugs not yet fixed on this branch. Refs: SCYLLADB-1842	2026-05-20 17:22:23 +02:00
Petr Gusev	a093be9ca9	test_unfinished_writes_during_shutdown: fix to reproduce the shutdown deadlock The test was written for another case, and was not supposed to reproduce the issue that was fixed in this PR. Fix the test to reproduce the real scenario: 1. Use one_shot=False for pause_before_barrier_and_drain so the injection fires on every barrier_and_drain RPC, not just the first. 2. Let the first barrier_and_drain through (at this point the write handler's ERM version matches the current token_metadata version). 3. Wait for the second barrier_and_drain. Between the two calls, topology_state_load installs a new token_metadata version. The write handler still holds the old version's ERM — now stale. 4. After stop_transport completes, disable the injection (rather than sending a single message) to release the paused handler and any subsequent ones that arrived during stop_transport. The 'disabled' flag in injection_shared_data ensures all waiters wake up. With these changes the test reliably fails (shutdown deadlock within 15s) on the unfixed code and passes on the fixed version from `e0dc73f52a` ('Cancel all write requests on storage_proxy shutdown'). Refs: scylladb/scylladb#23665	2026-05-20 17:22:23 +02:00
Petr Gusev	32002f6443	test_unfinished_writes_during_shutdown: await add_last_node_task instead of cancelling it asyncio cancel() only affects the client-side coroutine. The server-side addserver handler in the cluster manager continues running. If it can't complete (e.g. no raft quorum because the target node is shut down), the orphaned handler blocks _after_test cleanup for 120s. Await the task instead so it completes cleanly (we restart the target node first to restore quorum).	2026-05-20 17:22:16 +02:00
Petr Gusev	fa01f74ae6	test_unfinished_writes_during_shutdown: add timeout and deadlock detection for shutdown_task Add a 15s timeout around the shutdown_task await. If the timeout fires, the deadlock is reproduced (shutdown hung because stale_versions_in_use blocks on a write handler holding a stale token_metadata version). When the timeout fires, explicitly kill the node via server_stop() so that the manager's _after_test handler does not wait 120s for the stuck stop_gracefully request. Then fail the test with a clear message.	2026-05-20 17:21:56 +02:00
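The timeout-and-kill pattern from the commit above can be sketched with `asyncio.wait_for`. The helper name and the `kill_node` callback are hypothetical; in the real test the kill is done through `server_stop()`.

```python
import asyncio


async def await_shutdown(shutdown_task, kill_node, timeout=15.0):
    """Await a graceful shutdown; on timeout, kill the node and fail loudly."""
    try:
        await asyncio.wait_for(shutdown_task, timeout)
    except asyncio.TimeoutError:
        # Kill the node so that later cleanup does not wait on the stuck stop.
        await kill_node()
        raise AssertionError("shutdown did not finish in time: deadlock reproduced") from None


async def demo():
    killed = []

    async def hung_shutdown():
        await asyncio.Event().wait()  # never completes, like a deadlocked drain

    async def kill_node():
        killed.append(True)

    task = asyncio.create_task(hung_shutdown())
    try:
        await await_shutdown(task, kill_node, timeout=0.05)
    except AssertionError:
        return killed
    return None
```

Note that `asyncio.wait_for` also cancels the awaited client-side task on timeout, which by itself does not stop any server-side work; that is why the explicit kill is still needed.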
Michał Jadwiszczak	eac9449967	test/test_mv_building: ensure nodes see each other after restart In SCYLLADB-2058 we observed a timeout exception while querying the base table after restarting nodes 2 and 3. Unfortunately, the logs don't give us much useful information about the root cause. This patch adds basic checks that nodes see each other after the restart and that the CQL connection sees the restarted node. It doesn't guarantee that the error won't occur again: in the logs from SCYLLADB-2058 we see that each node sees the others via gossip after part of the cluster is restarted. In case the error occurs again, this commit also increases the logging level of `cql_server` and `storage_proxy`. Refs SCYLLADB-2058 Closes scylladb/scylladb#29951	2026-05-20 14:11:41 +02:00
Marcin Maliszkiewicz	83823149e9	Merge 'audit: implement audit_rules config' from Andrzej Jackowski This patch series adds `audit_rules`, a new audit configuration option for fine-grained, role-aware audit filtering with per-rule sink routing. Rules can be configured in `scylla.yaml` or updated live through `system.config` without restarting the node. Each rule specifies target sinks (`table`, `syslog`), statement categories, qualified table name patterns, and role patterns. Table and role patterns use POSIX `fnmatch` with extended glob syntax. For table-scoped categories (`DML`, `DDL`, `QUERY`), a rule matches only when the category, role, and qualified table name all match. For table-independent categories (`AUTH`, `ADMIN`, `DCL`), the table filter is ignored. Empty category or role lists match nothing; an empty table list matches nothing only for table-scoped categories. The new rules are additive with the existing `audit_categories`, `audit_keyspaces`, and `audit_tables` settings: both mechanisms are evaluated for each audit event, and the final sink set is the union of all matches. To avoid evaluating glob patterns on every audit event, audit rules use a preprocessed cache of known roles and tables. The cache is kept in sync through group0 role/table snapshots, role-change notifications, and schema migration notifications. For known entities, rule matching uses precomputed role/table rule sets; unknown entities fall back to direct rule evaluation. When `audit_rules` is empty, per-event rule matching returns immediately and does not evaluate glob patterns. Audit still keeps known role/table metadata in sync while audit is enabled, so rules can be enabled later through live configuration updates without restarting the node. Performance Measured with `perf-simple-query --smp 1 --duration 100` against a null syslog socket. 
Results show no regression when audit is disabled, and audit-rules performance has at most 1% more instructions than legacy config for equivalent workloads: ``` =============================================================================================================================================================================== Configuration \| Binary \| throughput (tps) \| insns/op \| cpu_cycles/op \| alloc/op \| logal/op \| task/op =============================================================================================================================================================================== audit=none [1] \| baseline \| 206922.4 \| 36591.6 \| 15348.3 \| 58.1 \| 0.0 \| 14.1 audit=none [1] \| this PR \| 207856.4 (+0.5%) \| 36544.9 (-0.1%) \| 15274.0 (-0.5%) \| 58.1 \| 0.0 \| 14.1 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- audit=syslog keyspaces=ks [2] \| baseline \| 94871.8 \| 54163.0 \| 27172.4 \| 72.0 \| 0.0 \| 24.0 audit=syslog keyspaces=ks [2] \| this PR \| 96138.4 (+1.3%) \| 54072.3 (-0.2%) \| 26699.3 (-1.7%) \| 72.0 \| 0.0 \| 24.0 audit=syslog audit-rules=ks [3] \| this PR \| 95142.1 (+0.3%) \| 54457.8 (+0.5%) \| 26953.8 (-0.8%) \| 72.0 \| 0.0 \| 24.0 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- audit=syslog keyspaces=ks-non-existent [4] \| baseline \| 213997.8 \| 36735.6 \| 14848.1 \| 58.1 \| 0.0 \| 14.1 audit=syslog keyspaces=ks-non-existent [4] \| this PR \| 219297.2 (+2.5%) \| 36667.3 (-0.2%) \| 14500.1 (-2.3%) \| 58.1 \| 0.0 \| 14.1 audit=syslog audit-rules=ks-non-existent [5] \| this PR \| 211038.7 (-1.4%) \| 36999.7 (+0.7%) \| 15048.6 (+1.4%) \| 58.1 \| 0.0 \| 14.1 
=============================================================================================================================================================================== [1] ./scylla perf-simple-query --smp 1 --duration 100 --audit "none" [2] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-keyspaces "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path "/tmp/audit-null.sock" [3] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-rules '[{"sinks":["syslog"],"categories":["DCL","DDL","AUTH","DML","QUERY"],"qualified_table_names":["ks."],"roles":[""]}]' --audit-unix-socket-path "/tmp/audit-null.sock" [4] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-keyspaces "ks-non-existent" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path "/tmp/audit-null.sock" [5] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-rules '[{"sinks":["syslog"],"categories":["DCL","DDL","AUTH","DML","QUERY"],"qualified_table_names":["ks-non-existent."],"roles":[""]}]' --audit-unix-socket-path "/tmp/audit-null.sock" audit-null.sock was created with `socat -u UNIX-RECV:/tmp/audit-null.sock,type=2 OPEN:/dev/null` ``` Fixes: SCYLLADB-1430 No backport: new feature Closes scylladb/scylladb#29267 * github.com:scylladb/scylladb: test: alternator: audit: rules filtering and batch bypass test: perf: add --audit-rules option to perf-simple-query docs: add audit rules section to the auditing guide test: audit: cover role and schema cache notifications test: audit: cover audit rules cluster behavior audit: rebuild rule caches on group0 snapshot and role changes audit: refresh rule caches on schema, role, and config changes audit: route matching rules to configured sinks test: cover preprocessed audit rule cache audit: add preprocessed rule matching cache audit: pass sink targets to storage helpers test: audit: cover rule matching semantics audit: add rule matching 
and sink helpers test: audit: cover audit_rules configuration config: add live audit_rules option test: cover audit rule parsing and validation audit: define audit_rule type with parsing and validation	2026-05-20 14:10:45 +02:00
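The matching semantics described in the `audit_rules` merge above can be approximated in a few lines. This is a sketch under stated assumptions: Python's `fnmatch.fnmatchcase` stands in for POSIX `fnmatch` with extended glob syntax (the two are not identical), the rule dictionaries reuse the field names visible in the benchmark commands, the patterns `ks.*` and `*` are examples chosen here, and the union with the legacy `audit_categories`/`audit_keyspaces`/`audit_tables` settings is left out.

```python
from fnmatch import fnmatchcase

# Categories for which the qualified table name takes part in matching.
TABLE_SCOPED = {"DML", "DDL", "QUERY"}


def rule_matches(rule, category, role, qualified_table=None):
    if category not in rule["categories"]:
        return False  # an empty category list matches nothing
    if not any(fnmatchcase(role, p) for p in rule["roles"]):
        return False  # an empty role list matches nothing
    if category in TABLE_SCOPED:
        # An empty table list matches nothing for table-scoped categories.
        return qualified_table is not None and any(
            fnmatchcase(qualified_table, p) for p in rule["qualified_table_names"]
        )
    return True  # AUTH, ADMIN, DCL: the table filter is ignored


def sinks_for(rules, category, role, qualified_table=None):
    """Union of the sinks of every rule that matches the event."""
    sinks = set()
    for rule in rules:
        if rule_matches(rule, category, role, qualified_table):
            sinks.update(rule["sinks"])
    return sinks
```

This direct evaluation corresponds to the fallback path for unknown roles and tables; per the description above, known entities are matched through a preprocessed cache so that glob patterns are not evaluated on every audit event.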
Gleb Natapov	c2cc7ebf39	test: fix test_cas_semaphore flakiness due to paxos state table creation timeout The test was starting Scylla with --write-request-timeout-in-ms=500 on the command line. This tight timeout also applied to paxos state table creation, which goes through raft and can take longer than 500ms on slow platforms (e.g. aarch64/dev). When the first batch of CAS requests triggered paxos state table creation under error injection, the raft schema change could still be in-flight when the second batch fired, causing spurious WriteTimeout failures unrelated to the semaphore bug being tested. Fix by changing the write timeout at runtime via the REST API: lower it to 500ms only for the error-injection CAS phase (after table creation is done), then restore it to 10000ms before the second batch that must succeed. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2104 Closes scylladb/scylladb#29969	2026-05-20 13:06:17 +02:00
Andrzej Jackowski	f03398fdba	test: audit: cover role and schema cache notifications Verify on a multi-node cluster that role creation/alter/drop and table/materialized-view create/drop trigger updates to the preprocessed audit-rules cache on every node, and that a matching DML on the newly created table is audited via the cache. Refs SCYLLADB-1430	2026-05-20 06:55:15 +02:00
Andrzej Jackowski	7f61d7662d	test: audit: cover audit rules cluster behavior Cluster-level tests should validate rule matching, live updates, sink routing, role filtering, and error handling without rerunning the broader audit suite. Add audit_rules to LIVE_AUDIT_KEYS so the test framework tracks it as a live-updatable config key. Test that rules with empty categories or roles match nothing, that DML rules coexist with legacy audit config, AUTH rules fire on login events, CQL and REST API update paths reject invalid JSON, per-rule sink routing works for table and syslog, role-based filtering works across sessions, and sink mismatch produces a warning in server logs. Refs SCYLLADB-1430	2026-05-20 06:55:15 +02:00

1385 Commits