scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-08 16:03:20 +00:00

Author	SHA1	Message	Date
Radosław Cybulski	df20f178aa	alternator: fix invalid rebase Fix an invalid rebase that correctly merged code coming from master, except that the merged code ignored the refactor done in this patch.	2025-12-29 08:33:10 +01:00
Radosław Cybulski	a31c8762ca	Update tests	2025-12-29 08:33:09 +01:00
Radosław Cybulski	5e1254eef0	Update documentation	2025-12-29 08:33:08 +01:00
Radosław Cybulski	a86b782d3f	Add table size to DescribeTable's output Add a table size to DescribeTable's output.	2025-12-29 08:33:07 +01:00
Radosław Cybulski	1bd855a650	Promote fill_table_description and create_table_on_shard0 to methods Promote `executor::fill_table_description` and `executor::create_table_on_shard0` to methods (from static functions).	2025-12-29 08:33:06 +01:00
Radosław Cybulski	6a26381f4f	Modify estimate_total_sstable_volume to opt ignore errors Modify the `storage_service::estimate_total_sstable_volume` function to optionally ignore errors (substituting 0 instead) when the `ignore_errors` parameter is set to `yes`.	2025-12-29 08:33:06 +01:00
Radosław Cybulski	a532fc73bc	Add alternator_describe_table_info_cache_validity_in_seconds config option Add an `alternator_describe_table_info_cache_validity_in_seconds` configuration option with a default value of 6 hours.	2025-12-29 08:33:05 +01:00
Radosław Cybulski	e246abec4d	Add ref to service::storage_service to executor Add a reference to `service::storage_service` to executor object.	2025-12-29 08:33:03 +01:00
Radosław Cybulski	dfa600fb8f	Add simple_value_with_expiry util class Add a `simple_value_with_expiry` utility class, which works like a `std::optional` with an added timeout. When emplacing a value, the user must provide a timeout after which the value expires (from then on, the `simple_value_with_expiry` object behaves as if it was never set at all). Add boost tests for the new class.	2025-12-29 08:32:52 +01:00
Pavel Emelyanov	2e33234e91	util: Remove lister::rmdir() There's a seastar helper that does the same, so there's no need to carry yet another implementation Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#27851	2025-12-28 19:46:19 +02:00
Avi Kivity	63e3a22f2e	Merge 'group0_state_machine: don't update in-memory state machine until start' from Piotr Dulikowski Group0 commands consist of one or more mutations and are supposed to be atomic, i.e. the data structures that reflect the state of the group0 tables must not be updated while only some mutations of a command are applied, and the logic responsible for that must not observe an inconsistent state of the group0 tables. It turns out that this assumption can be broken if a node crashes in the middle of applying a multi-mutation group0 command. Because these mutations are, in general, applied separately, only some mutations might survive a crash and a restart, so the group0 tables might be in an inconsistent state. The current logic of group0_state_machine will attempt to read the group0 tables' state as it was left after restart, so it may observe an inconsistent state. This can confuse the node, as it may observe a state that it was not supposed to observe, or the state may outright break some invariants and trigger some sanity checks. One of those was observed in https://github.com/scylladb/scylladb/issues/26945, where a command from the CDC generation publisher fiber was partially applied. In addition to publishing generations, the fiber also removes old, expired generations. Removal is done by removing the data that describes the generation from cdc_generations_v3 and by removing the generation's ID from the committed generation list in the topology table. If only the first mutation gets through but not the other one, on reload the node will see a committed CDC generation without data, which will trigger an on_internal_error check. Fix this by delaying the moment when the in-memory data structures are first loaded. In `579dcf187a`, a mechanism was introduced which persists the commit index before applying commands that are considered committed. Starting a raft server waits until commands are replayed up to that point. 
The fix is to start the group0_state_machine in a mode which only applies mutations - the aforementioned mechanism will re-apply the commands which will, thanks to mutation idempotency, bring group0 to a consistent state. After group0 is known to be in a consistent state (so, after raft::server_impl::start), the in-memory data structures of group0 are loaded for the first time. There is an exception, however: schema tables. Information about schema is actually loaded into memory earlier than the moment when group0 is started. Applying changes to schema is done through the migration manager module, which compares the persisted state before and after the schema mutations are applied and acts on that. Refactoring the migration manager is out of the scope of this PR. However, this is not a problem because the migration manager takes care to apply all of the mutations given in a command in a single commitlog segment, so the initial schema loading code should not see an inconsistent state due to the state being partially applied. The fix is accompanied by a reproducer of scylladb/scylladb#26945. Fixes: scylladb/scylladb#26945 This is not a regression, so no need to backport. Closes scylladb/scylladb#27528 * github.com:scylladb/scylladb: test: cluster: test for recovery after partial group0 command group0_state_machine: remove obsolete comment about group0 consistency group0_state_machine: don't update in-memory state machine until start group0_state_machine: move reloading out of std::visit service: raft: add state machine ref to raft_server_for_group	2025-12-28 13:59:26 +02:00
Pavel Emelyanov	e963a8d603	checked-file: Implement experimental_list_directory() The method in question returns a coroutine generator that co_yields directory_entry-s. If the method is not implemented, seastar creates a fallback generator that calls the existing subscription-based list_directory() and co_yields its entries. Since checked file doesn't implement it yet, the fallback generator is used, thus bypassing the lower file's yielding lister. Not nice. This patch implements the generator lister for checked file, thus making full use of the lower file's generator lister too. A side note: it's not enough to implement it like `return do_io_check([] { return lower_file->experimental_list_directory(); });` as list_directory() does, since io-checking would then _not_ happen on the directory reading itself, as it's supposed to. This is a problem with the check_file::list_directory() implementation -- it only checks for exceptions when creating the subscription (which in practice never fails), but the directory reading itself happens without io checks. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#27850	2025-12-28 13:37:44 +02:00
Yaron Kaikov	1ee89c9682	Revert "scripts: benign fixes flagged by CodeQL/PyLens" This reverts commit `377c3ac072`. This breaks all artifact tests and cloud image build process Closes scylladb/scylladb#27881	2025-12-28 09:49:49 +02:00
Pavel Emelyanov	bda1709734	Merge 'test: fix infinite loop in python log browsing code triggered from test_orphaned_sstables_on_startup' from Avi Kivity Recently, test/cluster/test_tablet.py::test_orphaned_sstables_on_startup started spinning in the log browsing code, the part of the test library that looks into log files for expected or unexpected patterns. This reproduced occasionally in continuous integration, and very reliably for me locally. The test was introduced in `fa10b0b390`, a year ago. There are two bugs involved: first, the test looks for crashes even though a crash is in fact expected (the node is supposed to fail with an on_internal_error). Second, the log browsing code contains an infinite loop if the crash backtrace happens to be the last thing in the log. The series fixes both bugs. Fixes #27860. While the bad code exists in release branches, it doesn't trigger there so far, so it is best to backport the fix only if it starts manifesting there. Closes scylladb/scylladb#27879 * github.com:scylladb/scylladb: test: pylib: log_browsing: fix infinite loop in find_backtraces() test: pylib/log_browsing, cluster/test_tablets: don't look for expected crashes	2025-12-26 10:45:56 +03:00
Nadav Har'El	9c50d29a00	test/boost: fix flaky test_inject_future_disabled The test boost/error_injection_test.cc::test_inject_future_disabled checks what happens when a sleep injection is disabled: the test has a 10-millisecond-sleep injection and measures how long it takes. The test expects it to take less than 10 milliseconds - in fact it should take almost zero. But this is not guaranteed: on a slow debug build and an overcommitted server, this do-nothing injection can take some time, and in one run (#27798) it took 14 milliseconds, and the test failed. The solution is easy: make the sleep-that-doesn't-happen much longer, e.g., 10 whole seconds. Since this sleep still doesn't happen, we expect the injection to return in less (much less) than 10 seconds. Ten seconds is so ridiculously high that we don't expect the do-nothing injection to take that long, not even on a ridiculously busy test machine. Fixes #27798 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27874	2025-12-25 20:46:31 +02:00
Avi Kivity	92996ce9fa	test: pylib: log_browsing: fix infinite loop in find_backtraces() The find_backtraces() function uses a very convoluted loop to read the log file. The loop fails to terminate if the last thing in the log file is the backtrace, since the loop termination condition (`not line`) continues to be true. It's not clear why this did not hit reliably before, but it now reliably reproduces for me on both x86 and aarch64. Perhaps timing changed, or perhaps previously we had more text in the log.	2025-12-25 20:22:17 +02:00
Avi Kivity	50a3460441	test: pylib/log_browsing, cluster/test_tablets: don't look for expected crashes test_tablets.test_orphaned_sstables_on_startup verifies that an on_internal_error("Unable to load SSTable...") is generated when an sstable outside a tablet boundary is found on startup. The test indeed finds the error, but then proceeds to hang in find_backtraces(), or fail if find_backtraces() is fixed, since it finds an unexpected (for it) crash. Fix this by not looking for crashes if a new option expected_crash is set. Set it for this test.	2025-12-25 20:22:17 +02:00
Avi Kivity	55c7bc746e	Revert "vector_search_validator: move high availability tests from vector-store.git" This reverts commit `caa0cbe328`. It is either extremely slow or broken. I was never able to get it to run on an r8gd.8xlarge (on the NVMe disk). Even when it passes, it is very slow. Test script:
```
git submodule update --recursive || exit 125
rm -rf build
d() { ./tools/toolchain/dbuild -it -- "$@"; }
d ./configure.py --mode release || exit 125
d ninja release-build || exit 125
d ./test.py --mode release
```
Ref #27858 Ref #27859 Ref #27860	2025-12-25 12:30:22 +00:00
Botond Dénes	ebb101f8ae	scylla-gdb.py: scylla small-objects: make freelist traversal more robust Traversing the span's freelist is known to generate "Cannot access memory at address ..." errors, which is especially annoying when it results in failed CI. Make this loop more robust: catch gdb.error coming from it and just log a warning that some listed objects in the span may be free ones. Fixes: #27681 Closes scylladb/scylladb#27805	2025-12-25 13:26:09 +03:00
Alex	f769e52877	test: boost: Fix flaky test_large_file_upload_s3 by creating individual files for testing During CI runs, multiple instances of the same test may execute concurrently. Although the test uses a temporary directory, the downloaded bucket artifacts were written using an identical filename across all instances. This caused concurrent writers to operate on the same file, leading to file corruption. In some cases, this manifested as test failures and intermittent std::bad_alloc exceptions. Change description: this change ensures that each test instance uses a unique filename for downloaded bucket files. By isolating file writes per test execution, concurrent runs no longer interfere with each other. Fixes: #27824 backport not required Closes scylladb/scylladb#27843	2025-12-25 09:40:13 +02:00
Nadav Har'El	186c91233b	Merge 'scylla-gdb.py: improve scylla fiber and scylla read-stats' from Botond Dénes Improve scylla fiber's ability to traverse through coroutines. Add --direction command-line parameter to scylla-fiber. Fix out-of-date permit collection in scylla read-stats and improve the printout. scylla-gdb.py improvements, no backport needed Closes scylladb/scylladb#27766 * github.com:scylladb/scylladb: scylla-gdb.py: scylla read-stats: include all permit lists scylla-gdb.py: scylla fiber: add --direction command-line param scylla-gdb.py: scylla fiber: add support for traversing through coroutines backward	2025-12-24 17:49:58 +02:00
Botond Dénes	27bf65e77a	db/batchlog_manager: add missing <seastar/coroutine/parallel_for_each.hh> include Build only fails if `--disable-precompiled-header` is passed to `configure.py`. Not sure why. Closes scylladb/scylladb#27721	2025-12-24 16:32:12 +02:00
Botond Dénes	c66275e05c	cql3/statements/batch_statement: make size error message more verbose Mention the type of batch: Logged or Unlogged. The size (warn/fail on too large size) error has different significance depending on the type. Refs: #27605 Closes scylladb/scylladb#27664	2025-12-24 15:27:01 +02:00
Piotr Szymaniak	9c5b4e74c3	doc: Correct reference in dev/audit.md Closes scylladb/scylladb#27832	2025-12-24 15:25:15 +02:00
Botond Dénes	ccc03d0026	test/pylib/runner.py: pytest_configure(): coerce repeat to int Coerce the return value of config.getoption("--repeat") to int to avoid:
Traceback (most recent call last):
  File "/usr/bin/pytest", line 8, in <module>
    sys.exit(console_main())
             ~~~~~~~~~~~~^^
  File "/usr/lib/python3.14/site-packages/_pytest/config/__init__.py", line 201, in console_main
    code = main()
  File "/usr/lib/python3.14/site-packages/_pytest/config/__init__.py", line 175, in main
    ret: ExitCode | int = config.hook.pytest_cmdline_main(config=config)
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/usr/lib/python3.14/site-packages/pluggy/_hooks.py", line 512, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.14/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.14/site-packages/pluggy/_callers.py", line 167, in _multicall
    raise exception
  File "/usr/lib/python3.14/site-packages/pluggy/_callers.py", line 121, in _multicall
    res = hook_impl.function(*args)
  File "/usr/lib/python3.14/site-packages/_pytest/helpconfig.py", line 154, in pytest_cmdline_main
    config._do_configure()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.14/site-packages/_pytest/config/__init__.py", line 1118, in _do_configure
    self.hook.pytest_configure.call_historic(kwargs=dict(config=self))
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.14/site-packages/pluggy/_hooks.py", line 534, in call_historic
    res = self._hookexec(self.name, self._hookimpls.copy(), kwargs, False)
  File "/usr/lib/python3.14/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.14/site-packages/pluggy/_callers.py", line 167, in _multicall
    raise exception
  File "/usr/lib/python3.14/site-packages/pluggy/_callers.py", line 121, in _multicall
    res = hook_impl.function(*args)
  File "/home/bdenes/ScyllaDB/scylladb/scylladb/test/pylib/runner.py", line 206, in pytest_configure
    config.run_ids = tuple(range(1, config.getoption("--repeat") + 1))
                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
TypeError: can only concatenate str (not "int") to str
Closes scylladb/scylladb#27649	2025-12-24 15:13:02 +02:00
Nadav Har'El	8df5189f9c	Merge 'docs: scylla-sstable.rst: extract script API to separate document' from Botond Dénes The script API is 500+ lines long in an already too long and hard to navigate document. Extract it to a separate document, making both documents shorter and easier to navigate. Documentation refactoring, no backport needed. Closes scylladb/scylladb#27609 * github.com:scylladb/scylladb: docs: scylla-sstable-script-api.rst: add introduction and title docs: scylla-sstable.rst: extract script API to separate document docs: scylla-sstable: prepare for script API extract	2025-12-24 15:02:57 +02:00
Botond Dénes	b036a461b7	tools/scylla-sstable: dump-schema: include UDT description in dump If the table uses UDTs, include their descriptions (CREATE TYPE statements) in the schema dump. Without these, the schema is not useful. Closes scylladb/scylladb#27559	2025-12-24 14:46:52 +02:00
Botond Dénes	3071ccd54a	Merge 'Storage-agnostic table::snapshot_on_all_shards()' from Pavel Emelyanov The method in question assumes that it writes the snapshot to the local filesystem and relies on this heavily. This PR removes this assumption and splits the logic into two parts -- one that orchestrates the sstables snapshot and collects the necessary metadata, and the code that writes the metadata itself. Closes scylladb/scylladb#27762 * github.com:scylladb/scylladb: table: Move snapshot_file_set to table.cc table: Rename and move snapshot_on_all_shards() method table: Ditch jsondir variable table, sstables: Pass snapshot name to sstable::snapshot() table: Use snapshot_writer in write_manifest() table: Use snapshot_writer in write_schema_as_cql() table: Add snapshot_writer::sync() table: Add snapshot_writer::init() table: Introduce snapshot_writer table: Move final sync and rename seal_snapshot() table: Hide write_schema_as_cql() table: Hide table::seal_snapshot() table: Open-code finalize_snapshot() table: Fix indentation after previous patch table: Use smp::invoke_on_all() to populate the vector with filenames table: Don't touch dir once more on seal_snapshot() table: Open-code table::take_snapshot() into caller lambda table: Move parts of table::take_snapshot to sstables_manager table: Introduce table::take_snapshot() table: Store the result of smp::submit_to in local variable	2025-12-24 13:46:47 +02:00
Nadav Har'El	4ae45eb367	test/alternator: remove unused imports Remove many unused "import" statements or parts of import statement. All of them were detected by Copilot, but I verified each one manually and prepared this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27676	2025-12-24 13:44:28 +02:00
Nadav Har'El	da00401b7d	test/alternator: rename test with duplicate name The file test/alternator/test_transact.py accidentally had two tests with the same name, test_transact_get_items_projection_expression. This means the first of the two tests was ignored and never run. This patch renames the second of the two to a more appropriate (and unique...) name. I verified that after this change the number of tests in this file grows by one, and that still all tests pass on DynamoDB and fail (as expected by xfail) on Alternator. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27702	2025-12-24 13:43:43 +02:00
Botond Dénes	95d4c73eb1	Merge 'Make object storage config truly updateable' from Pavel Emelyanov The db::config::object_storage_endpoints parameter is live-updateable, but when the update really happens, the new endpoints may fail to propagate to non-zero shards because of the way db::config sharding is implemented. Refs: #7316 Fixes: #26509 Backport to 2025.3 and 2025.4, since AFAIK there are setups with object storage configs for native backup Closes scylladb/scylladb#27689 * github.com:scylladb/scylladb: sstables/storage_manager: Fix configured endpoints observer test/object_store: Add test to validate how endpoint config update works	2025-12-24 13:42:44 +02:00
Botond Dénes	12dcf79c60	Merge 'build: support (and prefer) sccache as the compiler cache' from Avi Kivity Currently, we support ccache as the compiler cache. Since it is transparent, nothing much is needed to support it. This series adds support for sccache[1] and prefers it over ccache when it is installed. sccache brings the following benefits over ccache:
1. Integrated distributed build support similar to distcc, but with automatic toolchain packaging and a scheduler
2. Rust support
3. C++20 modules (upcoming[2])
It is the C++20 modules support that motivates the series. C++20 modules have the potential to reduce build times, but without a compiler cache and distributed build support, they come with too large a penalty. This removes the penalty. The series detects that sccache is installed, selects it if so (and if not overridden by a new option), enables it for C++ and Rust, and disables ccache transparent caching if sccache is selected. Note: this series doesn't add sccache to the frozen toolchain or add dbuild support. That is left for later. [1] https://github.com/mozilla/sccache [2] https://github.com/mozilla/sccache/pull/2516 Toolchain improvement, won't be backported. Closes scylladb/scylladb#27834 * github.com:scylladb/scylladb: build: apply sccache to rust builds too build: prevent double caching by compiler cache build: allow selecting compiler cache, including sccache	2025-12-24 13:40:02 +02:00
Nadav Har'El	74a57d2872	test/cqlpy: remove unused imports Remove many unused "import" statements or parts of import statement. All of them were detected by Copilot, but I verified each one manually and prepared this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27675	2025-12-24 13:31:41 +02:00
Andrzej Jackowski	632ff66897	doc: audit: mention double audit sink in Enabling Audit section Configuration of both table and syslog audit is possible since scylladb/scylladb#26613 was implemented. However, the "Enabling Audit" section of the documentation wasn't updated, which can be misleading. Ref: scylladb/scylladb#26613 Closes scylladb/scylladb#27790	2025-12-24 13:20:03 +02:00
Gleb Natapov	04976875cc	topology coordinator: set session id for streaming at the correct time Commit `d3efb3ab6f` added a streaming session for rebuild, but it set the session at request submission time. The session should be set when the request starts executing, so this patch moves it to the correct place. Closes scylladb/scylladb#27757	2025-12-24 13:17:53 +02:00
Yaniv Michael Kaul	377c3ac072	scripts: benign fixes flagged by CodeQL/PyLens Unused imports, unused variables and such. No functional changes, just to get rid of some standard CodeQL warnings. Benign - no need to backport. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#27801	2025-12-24 13:08:24 +02:00
Avi Kivity	d6edad4117	test: pylib: resource_gather: don't take ownership of /sys/fs/cgroup under podman Under podman, we already own /sys/fs/cgroup. Run the chown command only under docker where the container does not map the host user to the container root user. The chown process is sometimes observed to fail with EPERM (see issue). But it's not needed, so avoid it. Fixes #27837. Closes scylladb/scylladb#27842	2025-12-24 10:56:24 +02:00
Marcin Maliszkiewicz	3c1e1f867d	raft: auth: add semaphore to auth_cache::load_all At startup, auth cache loading races between the auth service and the raft code, and since loading doesn't support concurrency, this causes a crash. We can't easily remove either of the call sites, because during raft recovery the snapshot is not loaded, and recovery relies on the auth service to load the cache. Therefore, we add a semaphore. Fixes https://github.com/scylladb/scylladb/issues/27540 Closes scylladb/scylladb#27573	2025-12-24 10:56:24 +02:00
Nadav Har'El	f3a4af199f	test/cqlpy/test_materialized_view.py: Fix for Commented-out code This patch was suggested and prepared by copilot; I am writing the commit message because the original one was worthless. In commit `cf138da`, for an unexplained reason, a loop waiting until the expected value appears in a materialized view was replaced by a call to wait_for_view_built(). The old loop code was left behind in a comment, and this commented-out code is now bothering our AI. So let's delete the commented-out code. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Closes scylladb/scylladb#27646	2025-12-24 10:56:23 +02:00
Botond Dénes	1bb897c7ca	Merge 'Unify configuration of object storage endpoints' from Pavel Emelyanov To configure S3 storage, one needs to do
```
object_storage_endpoints:
  - name: s3.us-east-1.amazonaws.com
    port: 443
    https: true
    aws_region: us-east-1
```
and for GCS it's
```
object_storage_endpoints:
  - name: https://storage.googleapis.com:433
    type: gs
    credentials_file: <gcp account credentials json file>
```
This PR updates the S3 part to look like
```
object_storage_endpoints:
  - name: https://s3.us-east-1.amazonaws.com:443
    aws_region: us-east-1
```
fixes: #26570 Not-yet released feature, no need to backport. Old configs are not accepted any longer. If it's needed, then this decision needs to be revised. Closes scylladb/scylladb#27360 * github.com:scylladb/scylladb: object_storage: Temporarily handle pure endpoint addresses as endpoints code: Remove dangling mentions of s3::endpoint_config docs: Update docs according to new endpoints config option format object_storage: Create s3 client with "extended" endpoint name test: Add named constants for test_get_object_store_endpoints endpoint names s3/storage: Tune config updating sstable: Shuffle args for s3_client_wrapper	2025-12-24 06:59:02 +02:00
Botond Dénes	954f2cbd2f	Merge 'config, transport: add listeners for native protocol fronted by proxy protocol v2' from Avi Kivity For deployments fronted by a reverse proxy (haproxy or privatelink), we want to use proxy protocol v2 so that client information in system.clients is correct and so that the shard-aware selection protocol, which depends on the source port, works correctly. Add proxy-protocol enabled variants of each of the existing native transport listeners. Tests are added to verify this works. I also manually tested with haproxy. New feature, no backport. Closes scylladb/scylladb#27522 * github.com:scylladb/scylladb: test: add proxy protocol tests config, transport: support proxy protocol v2 enhanced connections	2025-12-24 06:58:00 +02:00
Nadav Har'El	e75c75f8cd	test/cqlpy: fix two tests that couldn't fail because of typo As noticed by copilot, two tests in test_guardrail_compact_storage.py could never fail, because they used `pytest.fail` instead of the correct `pytest.fail()` to fail. Unfortunately, Python has a footgun where if it sees a bare function name without parentheses, instead of complaining it evaluates the function object and then ignores it, and absolutely nothing happens. So let's add the missing `()`. The test still passes, but now it at least has a chance of failing if we have a regression. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27658	2025-12-24 06:49:54 +02:00
Yaron Kaikov	d671ca9f53	fix: remove return from finally block in s3_proxy.py During any Jenkins job that triggers `test.py`, we get:
```
/jenkins/workspace/releng-testing/byo/byo_build_tests_dtest/scylla/test/pylib/s3_proxy.py:152: SyntaxWarning: 'return' in a 'finally' block
```
The 'return' statement in the finally block was causing a SyntaxWarning. Moving the return outside the finally block ensures proper exception handling while maintaining the intended behavior. Closes scylladb/scylladb#27823	2025-12-24 06:48:03 +02:00
Avi Kivity	fc81983d42	test: sstable_validation_test: actually test `ms` version sstable_validation_test tests the `scylla sstable validate` command by passing it intentionally corrupted sstables. It uses an sstable cache to avoid re-creating the same sstables. However, the cache does not consider the sstable version, so if called twice with the same inputs for different versions, it will return an sstable with the original version for both calls. As a result, `ms` sstables were not tested. Fix this bug by adding the sstable version (and the schema for good measure) to the cache key. An additional bug, hidden by the first, was that we corrupted the sstable by overwriting its Index.db component. But `ms` sstables don't have an Index.db component, they have a Partitions.db component. Adjust the corrupting code to take that into account. With these two fixes, test_scylla_sstable_validate_mismatching_partition_large fails on `ms` sstables. Disable it for that version. Since it was previously practically untested, we're not losing any coverage. Fixing this test unblocks further work on making pytest take charge of running the tests. pytest exposed this problem, likely by running it on different runners (and thus reducing the effectiveness of the cache). Fixes #27822. Closes scylladb/scylladb#27825	2025-12-24 06:47:31 +02:00
Botond Dénes	cf70250a5c	Update seastar submodule
* seastar 7ec14e83...f0298e40 (8):
  > Merge 'coroutine/try_future: call set_current_task() when resuming the coroutine' from Botond Dénes
    coroutine/try_future: call set_current_task() when resuming the coroutine
    core: move set_current_task() out-of-line
  > stop_signal: stop including reactor.hh
  > cmake: Mark hwloc headers as system includes to suppress warnings
  > build: explicitly enable vptr sanitizer
  > httpd: Add API to set tcp keepalive params
  > Merge 'Make datagram_channel::send() use temporary_buffer-s' from Pavel Emelyanov
    net: Remove no longer used to_iovec() helpers
    net,code: Update callers to use new datagram_channel::send()
    net: Introduce datagram_channel::send(span<temporary_buffer>) method
    posix-stack: Make UDP socket implementation use wrapped_iovec
    posix-stack: Introduce wrapped_iovec
  > code: Move pollable_fd_state::write_all(const char*) from API level 9
  > thread: Remove unused sched_group() helper
configure.py: added -lubsan to DEBUG sanitizer flags Closes scylladb/scylladb#27511	2025-12-24 06:46:36 +02:00
Nadav Har'El	54f3e69fdc	Fix for Statement has no effect This problem and its fix were suggested by copilot; I'm just writing the cover letter. test/nodetool/test_status.py has the silly statement `tokens == "?"`, which has no effect. Looking at the surrounding code suggested to me (and also to Copilot, nice) that the correct intent was `assert tokens == "?"` and not, say, `tokens = "?"`. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Closes scylladb/scylladb#27659	2025-12-24 06:43:26 +02:00
Piotr Dulikowski	9ed820cbf5	test: cluster: test for recovery after partial group0 command Add a reproducer for scylladb/scylladb#26945. By using error injections, the test triggers a situation where a command that removes an obsolete CDC generation is partially applied, then the node is killed and brought back. Thanks to the fix, restarting the node succeeds and does not trigger any consistency checks in the group0 reload logic.	2025-12-23 20:50:43 +01:00
Piotr Dulikowski	71bc1886ee	group0_state_machine: remove obsolete comment about group0 consistency The comment is outdated. It is concerned about group0 consistency after a crash, and that re-applying committed commands may require a raft quorum. First, `579dcf1` was introduced (long ago), which gets rid of the need for a quorum, as the node persists the commit index before applying the commands - so it knows up to which command it should re-apply on restart. Second, the preceding commits in this PR make use of this mechanism for group0. Remove the comment, as the concern was fully addressed. Additionally, remove a mention of the comment in raft_group0_client.cc - although it claims that the comment is placed in `group0_state_machine::apply`, it has been moved to `merge_and_apply` in `96c6e0d` (both comments were originally introduced in `6a00e79`).	2025-12-23 20:44:17 +01:00
Piotr Dulikowski	b24001b5e7	group0_state_machine: don't update in-memory state machine until start Group0 commands consist of one or more mutations and are supposed to be atomic, i.e. the data structures that reflect the state of the group0 tables must not be updated while only some mutations of a command are applied, and the logic responsible for that must not observe an inconsistent state of the group0 tables. It turns out that this assumption can be broken if a node crashes in the middle of applying a multi-mutation group0 command. Because these mutations are, in general, applied separately, only some mutations might survive a crash and a restart, so the group0 tables might be in an inconsistent state. The current logic of group0_state_machine will attempt to read the group0 tables' state as it was left after restart, so it may observe an inconsistent state. This can confuse the node, as it may observe a state that it was not supposed to observe, or the state may outright break some invariants and trigger some sanity checks. One of those was observed in scylladb/scylladb#26945, where a command from the CDC generation publisher fiber was partially applied. In addition to publishing generations, the fiber also removes old, expired generations. Removal is done by removing the data that describes the generation from cdc_generations_v3 and by removing the generation's ID from the committed generation list in the topology table. If only the first mutation gets through but not the other one, on reload the node will see a committed CDC generation without data, which will trigger an on_internal_error check. Fix this by delaying the moment when the in-memory data structures are first loaded. In `579dcf1`, a mechanism was introduced which persists the commit index before applying commands that are considered committed. Starting a raft server waits until commands are replayed up to that point. 
The fix is to start the group0_state_machine in a mode which only applies mutations - the aforementioned mechanism will re-apply the commands which will, thanks to mutation idempotency, bring group0 to a consistent state. After group0 is known to be in a consistent state (so, after raft::server_impl::start), the in-memory data structures of group0 are loaded for the first time. There is an exception, however: schema tables. Information about schema is actually loaded into memory earlier than the moment when group0 is started. Applying changes to schema is done through the migration manager module, which compares the persisted state before and after the schema mutations are applied and acts on that. Refactoring the migration manager is out of the scope of this PR. However, this is not a problem because the migration manager takes care to apply all of the mutations given in a command in a single commitlog segment, so the initial schema loading code should not see an inconsistent state due to the state being partially applied. Fixes: scylladb/scylladb#26945	2025-12-23 20:44:16 +01:00
Piotr Dulikowski	f4efdf18a5	group0_state_machine: move reloading out of std::visit In the next commit, we will adjust the logic so that it reloads the in-memory state only when a flag is set. By moving the reload logic to one place in `merge_and_apply`, the next commit will be able to reach its goal by only adding a single `if`.	2025-12-23 20:44:16 +01:00
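
Commit 92996ce9fa reports that find_backtraces() never terminated when a backtrace was the last thing in the log. The real loop is not reproduced in this log. What follows is a minimal, hypothetical Python sketch of a scanner that avoids that failure mode: its only loop is driven by iterating over the input lines, so end of input always ends it, and a backtrace that is still open at that point is flushed after the loop. The function name `find_backtraces_sketch`, the `Backtrace:` marker and the hex-address line format are assumptions for illustration and may not match ScyllaDB's actual log format.

```python
import io
import re
from typing import Iterable, List, Optional

# Hypothetical log format (assumption): a line containing "Backtrace:" opens
# a backtrace, and each following line that starts with a hex address
# belongs to it.
BACKTRACE_START = re.compile(r"Backtrace:")
ADDRESS_LINE = re.compile(r"^\s*0x[0-9a-fA-F]+")


def find_backtraces_sketch(lines: Iterable[str]) -> List[List[str]]:
    """Return every backtrace found in `lines` as a list of address strings.

    The single `for` loop is driven by the input iterator, so it always
    stops at end of input, even if a backtrace is the last thing logged.
    """
    backtraces: List[List[str]] = []
    current: Optional[List[str]] = None  # backtrace being collected, if any
    for line in lines:
        if current is not None and ADDRESS_LINE.match(line):
            current.append(line.strip())
            continue
        # Any other line closes the open backtrace, if there is one.
        if current is not None:
            backtraces.append(current)
            current = None
        if BACKTRACE_START.search(line):
            current = []
    # End of input: keep a backtrace that ran up to the very last line.
    if current is not None:
        backtraces.append(current)
    return backtraces


log = io.StringIO("INFO starting\nERROR on_internal_error\nBacktrace:\n  0x1a2b\n  0x3c4d\n")
print(find_backtraces_sketch(log))  # [['0x1a2b', '0x3c4d']]
```

In this design, termination does not depend on a hand-written `not line` check: the iterator decides when input is exhausted, and the post-loop flush handles the trailing-backtrace case that triggered the bug.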


51208 Commits
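
Commits e75c75f8cd and 54f3e69fdc fix the same Python pitfall. An expression statement such as a bare `pytest.fail` or `tokens == "?"` is evaluated and its value silently discarded, so the check it was meant to perform never happens. The sketch below reproduces the pitfall without pytest. `fail` and `Failed` are hypothetical stand-ins for `pytest.fail` and the exception it raises.

```python
class Failed(Exception):
    """Hypothetical stand-in for the exception pytest.fail() raises."""


def fail(msg: str = "") -> None:
    """Hypothetical stand-in for pytest.fail(): calling it raises Failed."""
    raise Failed(msg)


def check_without_call(tokens: str) -> str:
    if tokens != "?":
        fail  # bare name: the function object is evaluated, then discarded
    tokens == "?"  # the comparison result is discarded; nothing is checked
    return "passed"


def check_with_call(tokens: str) -> str:
    if tokens != "?":
        fail(f"unexpected tokens: {tokens!r}")  # the call actually raises
    assert tokens == "?"  # an assert actually checks the condition
    return "passed"


print(check_without_call("42"))  # passed, although it should have failed
try:
    check_with_call("42")
except Failed as e:
    print("failed:", e)  # failed: unexpected tokens: '42'
```

Linters commonly flag such statements as having no effect, which is how both problems described in these commits were found.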