scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 16:33:35 +00:00

Author	SHA1	Message	Date
Ernest Zaslavsky	71ea973ae4	s3 cleanup: remove obsolete retry-related classes Delete `default_retry_strategy` and `retryable_http_client`, no longer used in `s3_client` after recent refactors.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	d44bbb1b10	s3_client: remove unused `filler_exception` Eliminate the now-obsolete `filler_exception`, rendered redundant by earlier refactors that streamlined error handling in the S3 client.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	d3c6338de6	s3_client: fix indentation Fix indentation in background download fiber in `chunked_download_source`	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	47704deb1e	s3_client: simplify chunked download error handling using `make_request` Refactor `chunked_download_source` to eliminate redundant exception handling by leveraging the new `make_request` override with custom retry strategy. This streamlines the download fiber logic, improving readability and maintainability.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	2bc9b205b6	s3_client: reformat `make_request` functions for readability Reformats `make_request` functions with long argument lists to improve readability and comply with formatting guidelines.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	bf39412f4a	s3_client: eliminate duplication in `make_request` by using overload Removes redundant code in the `make_request` function by invoking the appropriate overload, simplifying logic and improving maintainability.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	695e70834e	s3_client: reformat `make_request` function declarations for readability Reformats the `make_request` function declarations to improve readability due to the large number of arguments. This aligns with our formatting guidelines and makes the code easier to maintain.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	9f01c1f3ff	s3_client: reorder `make_request` and helper declarations Performs minor reordering of helper functor declarations in the header file to improve readability and maintain logical grouping.	2025-10-23 15:58:10 +03:00
Ernest Zaslavsky	3d51124cb0	s3_client: add `make_request` override with custom retry and error handler Introduce an override for `make_request` in `s3_client` to support custom retry strategies and error handlers, enabling flexibility beyond the default client behavior and improving control over request handling	2025-10-23 15:58:10 +03:00
Ernest Zaslavsky	bdb3979456	s3_client: migrate s3_client to Seastar HTTP client Eliminate use of `retryable_http_client` in `s3_client` and adopt Seastar's native HTTP client.	2025-10-23 15:58:10 +03:00
Ernest Zaslavsky	2025760e75	s3_client: fix crash in `copy_s3_object` due to dangling stream In the `copy_part` method, move the `input_stream<char>` argument into a local variable before use. Failing to do so can lead to a SIGSEGV or trigger an abort under address sanitizer.	2025-10-23 15:58:10 +03:00
Ernest Zaslavsky	0983c791e9	s3_client: coroutinize `copy_s3_object` response callback coroutinize `copy_s3_object` response callback for a bugfix in the following commit to prevent failing on dangling stream	2025-10-23 15:58:10 +03:00
Ernest Zaslavsky	237217c798	aws_error: handle missing `unexpected_status_error` case Add a missing `case` clause to the `switch` statement to correctly handle scenarios where `unexpected_status_error` is thrown. This fixes overlooked error handling and improves robustness.	2025-10-23 15:58:10 +03:00
Ernest Zaslavsky	4f6384b1a0	s3_creds: use Seastar HTTP client with retry strategy In AWS credentials providers, replace `retryable_http_client` with Seastar's native HTTP client. Integrate the newly added `default_aws_retry_strategy` to handle retries more efficiently and reduce dependency on external retry logic.	2025-10-23 15:58:07 +03:00
Ernest Zaslavsky	3851ee58d7	retry_strategy: add exponential backoff to `default_aws_retry_strategy` Add exponential backoff to `default_aws_retry_strategy` and call it to `sleep` before returning `true`, no-op in case of non-retryable error	2025-10-23 15:49:34 +03:00
Ernest Zaslavsky	524737a579	retry_strategy: introduce Seastar-based retry strategy Add a new class derived from Seastar's `default_retry_strategy`. Relocate the `should_retry` implementation from Scylla's `default_retry_strategy` into the new class to centralize and standardize retry behavior.	2025-10-23 15:49:34 +03:00
Ernest Zaslavsky	51aadd0ab3	retry_strategy: update CMake and configure.py for new strategy Include `default_aws_retry_strategy` in the build system by updating CMake and `configure.py` to ensure it is properly compiled and linked.	2025-10-23 15:49:34 +03:00
Ernest Zaslavsky	5d65b47a15	retry_strategy: rename `default_retry_strategy` to `default_aws_retry_strategy` Renames the `default_retry_strategy` class to `default_aws_retry_strategy` to clarify its association with the S3 client implementation. This avoids confusion with the unrelated `seastar::default_retry_strategy` class.	2025-10-23 15:49:34 +03:00
Ernest Zaslavsky	cc200ced67	retry_strategy: fix include Fix header inclusion in "newly" created file	2025-10-23 15:49:34 +03:00
Ernest Zaslavsky	d679fd514c	retry_strategy: Copied utils/s3/retry_strategy.hh to utils/s3/default_aws_retry_strategy.hh	2025-10-23 15:49:34 +03:00
Ernest Zaslavsky	7cd4be4c49	retry_strategy: Copied utils/s3/retry_strategy.cc to utils/s3/default_aws_retry_strategy.cc	2025-10-23 15:49:34 +03:00
Ernest Zaslavsky	abd3abc044	cmake: fix the seastar API level Fix the build to make it compile when using CMake by defining the right Seastar API level Closes scylladb/scylladb#26690	2025-10-23 11:20:20 +03:00
Botond Dénes	f8b0142983	Merge 'Add --drop-unfixable-sstables flag for scrub in segregate mode' from Taras Veretilnyk This PR introduces support for a new scrub option: `--drop-unfixable-sstables`, which enables the dropping of corrupted SSTables during scrub only in segregate mode. The patch includes implementation, validation, and set of tests to ensure correct behavior and error handling. Fixes #19060 Backport is not required, it is a new feature Closes scylladb/scylladb#26579 * github.com:scylladb/scylladb: sstable_compaction_test: add segregate mode tests for drop-unfixable-sstables option test/nodetool: add scrub drop-unfixable-sstables option testcase scrub: add support for dropping unfixable sstables in segregate mode	2025-10-23 11:06:19 +03:00
Tomasz Grabiec	564cebd0e6	Merge 'tablet_metadata_guard: fix split/merge handling' from Petr Gusev The guard should stop refreshing the ERM when the number of tablets changes. Tablet splits or merges invalidate the `tablet_id` field (`_tablet`), which means the guard can no longer correctly protect ongoing operations from tablet migrations. The problem is specific to LWT, since `tablet_metadata_guard` is used mostly for heavy topology operations, which are mutually exclusive with split and merge. The guard was used for LWT as an optimization -- we don't need to block topology operations or migrations of unrelated tablets. In the future, we could use the guard for regular reads/writes as well (via the `token_metadata_guard` wrapper). Fixes [scylladb/scylladb#26437](https://github.com/scylladb/scylladb/issues/26437) backports: need to backport to 2025.4 since the bug is relevant to LWT over tablets. Closes scylladb/scylladb#26619 * github.com:scylladb/scylladb: test_tablets_lwt: add test_tablets_merge_waits_for_lwt test.py: add universalasync_typed_wrap tablet_metadata_guard: fix split/merge handling tablet_metadata_guard: add debug logs paxos_state: shards_for_writes: improve the error message storage_service: barrier_and_drain – change log level to info topology_coordinator: fix log message	2025-10-22 20:56:21 +02:00
Taras Veretilnyk	60334c6481	sstable_compaction_test: add segregate mode tests for drop-unfixable-sstables option Added a new test case, sstable_scrub_segregate_mode_drop_unfixable_sstables_test, which verifies that when the drop-unfixable-sstables flag is enabled in segregate mode, corrupted SSTables are correctly dropped.	2025-10-22 17:16:55 +02:00
Taras Veretilnyk	11874755a3	test/nodetool: add scrub drop-unfixable-sstables option testcase This patch introduces the test_scrub_drop_unfixable_sstables_option testcase, which verifies that the correct request is generated when the --drop-unfixable-sstables flag is used. It also validates that an error is thrown if the drop-unfixable-sstables flag is enabled and mode is not set to SEGREGATE.	2025-10-22 17:16:55 +02:00
Taras Veretilnyk	42da7f1eb6	scrub: add support for dropping unfixable sstables in segregate mode This patch adds a new flag `drop-unfixable-sstables` to the scrub operation in segregate mode, allowing scrub to automatically drop SSTables that cannot be fixed. It also includes API support for the 'drop_unfixable_sstables' parameter and validation to ensure this flag is not enabled in modes other than segregate.	2025-10-22 17:16:49 +02:00
Radosław Cybulski	621e88ce52	Fix spelling errors Closes scylladb/scylladb#26652	2025-10-22 16:46:31 +02:00
Asias He	5f1febf545	repair: Remove the regular mode name in the tablet repair api The patch `e34deb72f9` (repair: Rename incremental mode name) missed one place that references the removed regular mode name. Fixes #26503 Closes scylladb/scylladb#26660	2025-10-22 16:55:55 +03:00
Botond Dénes	1c7f1f16c8	Merge 'raft topology: fix group0 tombstone GC in the Raft-based recovery procedure' from Patryk Jędrzejczak Group0 tombstone GC considers only the current group 0 members while computing the group 0 tombstone GC time. It's not enough because in the Raft-based recovery procedure, there can be nodes that haven't joined the current group 0 yet, but they have belonged to a different group 0 and thus have a non-empty group 0 state ID. The current code can cause a data resurrection in group 0 tables. We fix this issue in this PR and add a regression test. This issue was uncovered by `test_raft_recovery_entry_loss`, which became flaky recently. We skipped this test for now. We will unskip it in a following PR because it's skipped only on master, while we want to backport this PR. Fixes #26534 This PR contains an important bugfix, so we should backport it to all branches with the Raft-based recovery procedure (2025.2 and newer). Closes scylladb/scylladb#26612 * github.com:scylladb/scylladb: test: test group0 tombstone GC in the Raft-based recovery procedure group0_state_id_handler: remove unused group0_server_accessor group0_state_id_handler: consider state IDs of all non-ignored topology members	2025-10-22 16:40:11 +03:00
Ernest Zaslavsky	a09ec56e3d	cmake: fix `s3_test` linkage Fix missing `s3_test` executable linkage with `scylla_encryption` Closes scylladb/scylladb#26655	2025-10-22 14:14:43 +03:00
Anna Stuchlik	9c0ff7c46b	doc: add support for Debian 12 Fixes https://github.com/scylladb/scylladb/issues/26640 Closes scylladb/scylladb#26668	2025-10-22 14:09:13 +03:00
Petr Gusev	03d6829783	test_tablets_lwt: add test_tablets_merge_waits_for_lwt	2025-10-22 11:33:20 +02:00
Petr Gusev	33e9ea4a0f	test.py: add universalasync_typed_wrap The universalasync.wrap function doesn't preserve the type information, which confuses the VS Code Pylance plugin and makes code navigation hard. In this commit we fix the problem by adding a typed wrapped around universalasync.wrap. Fixes: scylladb/scylladb#26639	2025-10-22 11:32:37 +02:00
Petr Gusev	b23f2a2425	tablet_metadata_guard: fix split/merge handling The guard should stop refreshing the ERM when the number of tablets changes. Tablet splits or merges invalidate the tablet_id field (_tablet), which means the guard can no longer correctly protect ongoing operations from tablet migrations. Fixes scylladb/scylladb#26437	2025-10-22 11:32:37 +02:00
Petr Gusev	ec6fba35aa	tablet_metadata_guard: add debug logs	2025-10-22 11:32:37 +02:00
Petr Gusev	64ba427b85	paxos_state: shards_for_writes: improve the error message Add the current token and tablet info, remove 'this_shard_id' since it's always written by the logging infrastructure.	2025-10-22 11:32:37 +02:00
Petr Gusev	6f4558ed4b	storage_service: barrier_and_drain – change log level to info Debugging global barrier issues is difficult without these logs. Since barriers do not occur frequently, increasing the log level should not produce excessive output.	2025-10-22 11:32:37 +02:00
Petr Gusev	e1667afa50	topology_coordinator: fix log message	2025-10-22 11:32:37 +02:00
Nadav Har'El	895d89a1b7	Update seastar submodule Among other things, the merge includes the patch "http: add "Connection: close" header to final server response.". This Fixes #26298: A missing response header meant that a test's client code sometimes didn't notice that the server closed the connection (since the client didn't need to use the connection again), which made one test flaky. * seastar bd74b3fa...63900e03 (6): > Merge 'Rework output_stream::slow_write()' from Pavel Emelyanov output_stream: Fix indentation of the slow_write() method output_stream: Remove pointless else output_stream: Replace std::swap with std::exchange output_stream: Unify some code-paths of slow_write() > Merge 'Deprecate in/out streams move-assignment operator' from Pavel Emelyanov iostream: Deprecate input/output stream default constructor and move-assignment operator test: Sub-split test-cases test: Don't reuse output_stream in file demo test: Keep input_/output_stream as optional util: Construct file_data_source in with_file_input_stream() websocket: Construct in/out in initializer list rpc: Wrap socket and buffers > scripts/perftune.py: detect corrupted NUMA topology information > Merge 'memory, smp: support more than 256 shards' from Avi Kivity reactor, smp: allocate smp queues across all shards memory: increase maximum shard count memory: make cpu_id_shift and related mask dynamic resource, memory: move memory limit calculation to memory.cc resource: don't error if --overprovisioned and asking for more vcpus than available > Merge 'Update perf_test text output, make columns selectable' from Travis Downs perf_tests: enhance text output perf_test_tests: add some check_output tests	2025-10-22 11:26:40 +03:00
Nadav Har'El	7c9f5ef59e	Merge 'alternator/executor: instantly mark view as built when creating it with base table' from Michał Jadwiszczak When a `CreateTable` request creates a GSI/LSI together with the base table, the base table is empty and we don't need to actually build the view. In tablet-based keyspaces we can simply skip creating view building tasks and mark the view build status as SUCCESS on all nodes. Then, the view building worker on each node will mark the view as built in `system.built_views` (`view_building_worker::update_built_views()`). Vnode-based keyspaces will use the "old" logic of view builder, which will process the view and mark it as built. Fixes scylladb/scylladb#26615 This fix should be backported to 2025.4. Closes scylladb/scylladb#26657 * github.com:scylladb/scylladb: test/alternator/test_tablets: add test for GSI backfill with tablets test/alternator/test_tablets: add reproducer for GSI with tablets alternator/executor: instantly mark view as built when creating it with base table	2025-10-22 10:44:28 +03:00
Avi Kivity	ab488fbb3f	Merge 'Switch to seastar API level 9 (no more packet-s in output_stream/data_sink API)' from Pavel Emelyanov Other than patching Scylla sinks to implement new data_sink_impl::put(std::span<temporary_buffer>) overload, the PR changes transport write_response() method to stop using output_stream::write(scattered_message) because it's also gone. Using newer seastar API, no need to backport Closes scylladb/scylladb#26592 * github.com:scylladb/scylladb: code: Fix indentation after previous patch code: Switch to seastar API level 9 transport: Open-code invoke_with_counting into counting_data_sink::put transport: Don't use scattered_message utils: Implement memory_data_sink::put(net::packet)	2025-10-22 01:51:43 +03:00
Michał Jadwiszczak	34503f43a1	test/alternator/test_tablets: add test for GSI backfill with tablets The test should pass without the fix for scylladb/scylladb#26615, because the `executor::updata_table()` uses `service::prepare_new_view_announcement()`, which creates view building tasks for the view. But it's better to add this test.	2025-10-22 00:34:49 +02:00
Michał Jadwiszczak	bdab455cbb	test/alternator/test_tablets: add reproducer for GSI with tablets	2025-10-22 00:34:10 +02:00
Andrei Chekun	24d17c3ce5	test.py: rewrite the wait_for_first_completed Rewrite wait_for_first_completed to return only the first completed task while guaranteeing that all cancelled and finished tasks are awaited. Use wait_for_first_completed to avoid falsely passing tests in the future and issues like #26148. Use gather_safely to await tasks and remove the warning that a coroutine was not awaited. Closes scylladb/scylladb#26435	2025-10-22 01:13:43 +03:00
Takuya ASADA	eb30594a60	dist: detect corrupted NUMA topology information Some environments have corrupted NUMA topology information, such as some instance types on AWS EC2 with specific Linux kernel images. In such environments, we cannot get HW information correctly from hwloc, so perftune cannot proceed with optimization. To avoid a script error, check the NUMA topology information and skip running perftune if the information is corrupted. Related scylladb/seastar#2925 Closes scylladb/scylladb#26344	2025-10-22 01:11:14 +03:00
Michał Jadwiszczak	8fbf122277	alternator/executor: instantly mark view as built when creating it with base table When a `CreateTable` request creates a GSI/LSI together with the base table, the base table is empty and we don't need to actually build the view. In tablet-based keyspaces we can simply skip creating view building tasks and mark the view build status as SUCCESS on all nodes. Then, the view building worker on each node will mark the view as built in `system.built_views` (`view_building_worker::update_built_views()`). Vnode-based keyspaces will use the "old" logic of view builder, which will process the view and mark it as built. Fixes scylladb/scylladb#26615	2025-10-22 00:05:40 +02:00
Avi Kivity	029513bee9	Merge 'storage_proxy: wait for write handlers destruction' from Petr Gusev `shared_ptr<abstract_write_response_handler>` instances are captured in the `lmutate` and `rmutate` lambdas of `send_to_live_endpoints()`. As a result, an `abstract_write_response_handler` object may outlive its removal from the `storage_proxy::_response_handlers` map -> `cancel_all_write_response_handlers()` doesn't actually wait for requests completion -> `sp::drain_on_shutdown()` doesn't guarantee all requests are drained -> `sp::stop_remote()` completes too early and `paxos_store` is destroyed while LWT local writes might still be in progress. In this PR we introduce a `write_handler_destroy_promise` to wait for such pending instances in `cancel_write_handlers()` and `cancel_all_write_response_handlers()` to prevent the `use-after-free`. A better long-term solution might be to replace `shared_ptr` with `unique_ptr` for `abstract_write_response_handler` and use a separate gate to track the `lmutate/rmutate` lambdas. We do not actually need to wait for these lambdas to finish before sending a timeout or error response to the client, as we currently do in `~abstract_write_response_handler`. Fixes scylladb/scylladb#26355 backport: need to be backported to 2025.4 since #26355 is reproduced on LWT over tablets Closes scylladb/scylladb#26408 * github.com:scylladb/scylladb: test_tablets_lwt: add test_lwt_shutdown storage_proxy: wait for write handler destruction storage_proxy: coroutinize cancel_write_handlers storage_proxy: cancel_write_handlers: don't hold a strong pointer to handler	2025-10-22 00:02:08 +03:00
Michał Hudobski	5c957e83cb	vector_search: remove dependence on cql3 This patch removes the dependence of the vector search module on the cql3 module by moving the contents of cql3/type_json.hh to types/json_utils.hh and removing the usage of the cql3 primary_key object in vector_store_client. We also make the needed adjustments to files that were previously using the aforementioned type_json.hh file. This fixes the circular dependency cql3 <-> vector_search. Closes scylladb/scylladb#26482	2025-10-21 17:41:55 +03:00
Emil Maskovsky	cf93820c0a	test/cluster: fix missing await in test_group0_tombstone_gc The recursive call to alter_system_schema() was missing the await keyword, which meant the coroutine was never actually executed and the test wasn't doing what it was supposed to do. Not backporting: Test fix only. Closes scylladb/scylladb#26623	2025-10-21 11:22:39 +02:00
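
The missing-`await` bug described in cf93820c0a can be illustrated with a minimal Python sketch. The function names below (`alter_recursively`, `run`) are hypothetical and are not taken from the Scylla test suite; the sketch only assumes that the original test recursed through an `async` function. Calling a coroutine function without `await` merely creates a coroutine object, so the recursive step never runs.

```python
import asyncio


async def alter_recursively(depth, visited, forget_await=False):
    """Hypothetical stand-in for a recursive async test helper."""
    visited.append(depth)
    if depth == 0:
        return
    if forget_await:
        # Bug: this only creates a coroutine object; its body never executes.
        coro = alter_recursively(depth - 1, visited, forget_await)
        # close() is used here only to silence the "never awaited" warning
        # in this demo; the buggy code would simply drop the object.
        coro.close()
    else:
        # Fix: awaiting the call actually runs the recursive step.
        await alter_recursively(depth - 1, visited, forget_await)


def run(forget_await):
    """Run the helper from depth 2 and return the depths that executed."""
    visited = []
    asyncio.run(alter_recursively(2, visited, forget_await))
    return visited
```

With the `await` in place, every recursion level executes; without it, only the outermost call does, which is why a test written this way can pass without exercising the code it was meant to cover.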

50181 Commits