scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-08 16:03:20 +00:00

Author	SHA1	Message	Date
Anna Stuchlik	96a01082bb	doc: add tablets support information to the Drivers table This commit: - Extends the Drivers support table with information on which driver supports tablets and since which version. - Adds the driver support policy to the Drivers page. - Reorganizes the Drivers page to accommodate the updates. In addition: - The CPP-over-Rust driver is added to the table. - The information about Serverless (which we don't support) is removed and replaced with tablets to correctly describe the contents of the table. Fixes https://github.com/scylladb/scylladb/issues/19471 Refs https://github.com/scylladb/scylladb-docs-homepage/issues/69 Closes scylladb/scylladb#24635 (cherry picked from commit `18b4d4a77c`) Closes scylladb/scylladb#25250	2025-07-31 12:18:33 +03:00
Aleksandra Martyniuk	782fb029d6	streaming: close sink when exception is thrown If an exception is thrown in result_handling_cont in streaming, then the sink does not get closed. This leads to a node crash. Close sink in exception handler. Fixes: https://github.com/scylladb/scylladb/issues/25165. Closes scylladb/scylladb#25238 (cherry picked from commit `99ff08ae78`) Closes scylladb/scylladb#25267	2025-07-31 12:18:17 +03:00
Aleksandra Martyniuk	c97da64e45	tasks: do not use binary progress for task manager tasks Currently, progress of a parent task depends on expected_total_workload, expected_children_number, and children progresses. Basically, if total workload is known or all children have already been created, progresses of children are summed up. Otherwise binary progress is returned. As a result, two tasks of the same type may return progress in different units. If they are children of the same task and this parent gathers the progress - it becomes meaningless. Drop expected_children_number as we can't assume that children are able to show their progresses. Modify get_progress method - progress is calculated based on children progresses. If expected_total_workload isn't specified, the total progress of a task may grow. If expected_total_workload isn't specified and no children are created, empty progress (0/0) is returned. Fixes: https://github.com/scylladb/scylladb/issues/24650. Closes scylladb/scylladb#25113 (cherry picked from commit `a7ee2bbbd8`) Closes scylladb/scylladb#25199	2025-07-28 09:25:39 +03:00
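The progress rule described in the commit above can be sketched as follows. This is a minimal illustration in standard C++, not the task manager's actual code: the `progress` struct, the free function `get_progress`, and its parameters are hypothetical names, and the sketch assumes that a specified `expected_total_workload` simply replaces the summed total.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for a task progress value (completed / total units).
struct progress {
    double completed = 0;
    double total = 0;
};

// Sketch of the rule from the commit message: the parent's progress is derived
// from its children's progresses. With no expected total and no children, the
// result is the empty progress 0/0.
progress get_progress(std::optional<double> expected_total_workload,
                      const std::vector<progress>& children) {
    progress result;
    for (const auto& child : children) {
        result.completed += child.completed;
        result.total += child.total;
    }
    if (expected_total_workload) {
        // Assumption: a known total overrides the sum of the children's totals.
        result.total = *expected_total_workload;
    }
    return result;
}
```

Without an expected total, `total` grows as more children are created, which matches the note that the total progress of a task may grow.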
Ran Regev	054c658988	scylla.yaml: add recommended value for stream_io_throughput_mb_per_sec Fixes: #24758 Updated scylla.yaml and the help for scylla --help Closes scylladb/scylladb#24793 (cherry picked from commit `db4f301f0c`) Closes scylladb/scylladb#25196	2025-07-28 09:25:29 +03:00
Pavel Emelyanov	95b906bea9	Merge '[Backport 2025.2] storage_service: cancel all write requests after stopping transports' from Scylladb[bot] When a node shuts down, in storage service, after storage_proxy RPCs are stopped, some write handlers within storage_proxy may still be waiting for background writes to complete. These handlers hold appropriate ERMs to block schema changes before the write finishes. After the RPCs are stopped, these writes cannot receive the replies anymore. If, at the same time, there are RPC commands executing `barrier_and_drain`, they may get stuck waiting for these ERM holders to finish, potentially blocking node shutdown until the writes time out. This change introduces cancellation of all outstanding write handlers from storage_service after the storage proxy RPCs were stopped. Fixes scylladb/scylladb#23665 Backport: since this fixes an issue that frequently causes issues in CI, backport to 2025.1, 2025.2, and 2025.3. - (cherry picked from commit `bc934827bc`) - (cherry picked from commit `e0dc73f52a`) Parent PR: #24714 Closes scylladb/scylladb#25169 * github.com:scylladb/scylladb: storage_service: Cancel all write requests on storage_proxy shutdown test: Add test for unfinished writes during shutdown and topology change	2025-07-28 09:25:15 +03:00
Pavel Emelyanov	8622a07bdd	Merge '[Backport 2025.2] streaming: Avoid deadlock by running view checks in a separate scheduling group' from Scylladb[bot] This issue happens with removenode, when RBNO is disabled, so range streamer is used. The deadlock happens in a scenario like this: 1. Start 3 nodes: {A, B, C}, RF=2 2. Node A is lost 3. removenode A 4. Both B and C gain ownership of ranges. 5. Streaming sessions are started with crossed directions: B->C, C->B Readers created by sender side exhaust streaming semaphore on B and C. Receiver side attempts to obtain a permit indirectly by calling check_needs_view_update_path(), which reads local tables. That read is blocked and times-out, causing streaming to fail. The streaming writer is already using a tracking-only permit. Even if we didn't deadlock, and the streaming semaphore was simply exhausted by other receiving sessions (via tracking-only permit), the query may still time-out due to starvation. To avoid that, run the query under a different scheduling group, which translates to the system semaphore instead of the maintenance semaphore, to break the dependency. The gossip group was chosen because it shouldn't be contended and this change should not interfere with it much. Fixes #24807 Fixes #24925 - (cherry picked from commit `ee2fa58bd6`) - (cherry picked from commit `dff2b01237`) Parent PR: #24929 Closes scylladb/scylladb#25055 * github.com:scylladb/scylladb: streaming: Avoid deadlock by running view checks in a separate scheduling group service: migration_manager: Run group0 barrier in gossip scheduling group	2025-07-28 09:24:53 +03:00
Tomasz Grabiec	3991e4de28	streaming: Avoid deadlock by running view checks in a separate scheduling group This issue happens with removenode, when RBNO is disabled, so range streamer is used. The deadlock happens in a scenario like this: 1. Start 3 nodes: {A, B, C}, RF=2 2. Node A is lost 3. removenode A 4. Both B and C gain ownership of ranges. 5. Streaming sessions are started with crossed directions: B->C, C->B Readers created by sender side exhaust streaming semaphore on B and C. Receiver side attempts to obtain a permit indirectly by calling check_needs_view_update_path(), which reads local tables. That read is blocked and times-out, causing streaming to fail. The streaming writer is already using a tracking-only permit. To avoid that, run the query under a different scheduling group, which translates to the system semaphore instead of the maintenance semaphore, to break the dependency. The gossip group was chosen because it shouldn't be contended and this change should not interfere with it much. Fixes: #24807 (cherry picked from commit `dff2b01237`)	2025-07-27 22:52:56 +02:00
Sergey Zolotukhin	f15df0bcce	storage_service: Cancel all write requests on storage_proxy shutdown During a graceful node shutdown, RPC listeners are stopped in `storage_service::drain_on_shutdown` as one of the first steps. However, even after RPCs are shut down, some write handlers in `storage_proxy` may still be waiting for background writes to complete. These handlers retain the ERM. Since the RPC subsystem is no longer active, replies cannot be received, and if any RPC commands are concurrently executing `barrier_and_drain`, they may get stuck waiting for those writes. This can block the messaging server shutdown and delay the entire shutdown process until the write timeout occurs. This change introduces the cancellation of all outstanding write handlers in `storage_proxy` during shutdown to prevent unnecessary delays. Fixes scylladb/scylladb#23665 (cherry picked from commit `e0dc73f52a`)	2025-07-24 13:02:56 +00:00
Sergey Zolotukhin	487012e972	test: Add test for unfinished writes during shutdown and topology change This test reproduces an issue where a topology change and an ongoing write query during query coordinator shutdown can cause the node to get stuck. When a node receives a write request, it creates a write handler that holds a copy of the current table's ERM (Effective Replication Map). The ERM ensures that no topology or schema changes occur while the request is being processed. After the query coordinator receives the required number of replica write ACKs to satisfy the consistency level (CL), it sends a reply to the client. However, the write response handler remains alive until all replicas respond — the remaining writes are handled in the background. During shutdown, when all network connections are closed, these responses can no longer be received. As a result, the write response handler is only destroyed once the write timeout is reached. This becomes problematic because the ERM held by the handler blocks topology or schema change commands from executing. Since shutdown waits for these commands to complete, this can lead to unnecessary delays in node shutdown and restarts, and occasional test case failures. Test for: scylladb/scylladb#23665 (cherry picked from commit `bc934827bc`)	2025-07-24 13:02:56 +00:00
Tomasz Grabiec	fa1b97f0c5	Merge '[Backport 2025.2] Improve background disposal of tablet_metadata' from Scylladb[bot] As seen in #23284, when the tablet_metadata contains many tables, even empty ones, we're seeing a long queue of seastar tasks coming from the individual destruction of `tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>`. This change improves `tablet_metadata::clear_gently` to destroy the `tablet_map_ptr` objects on their owner shard by sorting them into vectors, per- owner shard. Also, background call to clear_gently was added to `~token_metadata`, as it is destroyed arbitrarily when automatic token_metadata_ptr variables go out of scope, so that the contained tablet_metadata would be cleared gently. Finally, a unit test was added to reproduce the `Too long queue accumulated for gossip` symptom and verify that it is gone with this change. Fixes #24814 Refs #23284 This change is not marked as fixing the issue since we still need to verify that there is no impact on query performance, reactor stalls, or large allocations, with a large number of tablet-based tables. 
* Since the issue exists in 2025.1, requesting backport to 2025.1 and upwards - (cherry picked from commit `3acca0aa63`) - (cherry picked from commit `493a2303da`) - (cherry picked from commit `e0a19b981a`) - (cherry picked from commit `2b2cfaba6e`) - (cherry picked from commit `2c0bafb934`) - (cherry picked from commit `4a3d14a031`) - (cherry picked from commit `6e4803a750`) Parent PR: #24618 Closes scylladb/scylladb#24863 * github.com:scylladb/scylladb: token_metadata_impl: clear_gently: release version tracker early test: cluster: test_tablets_merge: add test_tablet_split_merge_with_many_tables token_metadata: clear_and_destroy_impl when destroyed token_metadata: keep a reference to shared_token_metadata token_metadata: move make_token_metadata_ptr into shared_token_metadata class replica: database: get and expose a mutable locator::shared_token_metadata locator: tablets: tablet_metadata: clear_gently: optimize foreign ptr destruction	2025-07-23 17:00:44 +02:00
Piotr Dulikowski	81d1790655	Merge '[Backport 2025.2] cdc: Forbid altering columns of CDC log tables directly' from Scylladb[bot] The set of columns of a CDC log table should be managed automatically by Scylla, and the user should not have the ability to manipulate them directly. That could lead to disastrous consequences such as a segmentation fault. In this commit, we're restricting those operations. We also provide two validation tests. One of the existing tests had to be adjusted as it modified the type of a column in a CDC log table. Since the test simply verifies that the user has sufficient permissions to perform `ALTER TABLE` on the log table, the test is still valid. Fixes scylladb/scylladb#24643 Backport: we should backport the change to all affected branches to prevent the consequences that may affect the user. - (cherry picked from commit `20d0050f4e`) - (cherry picked from commit `59800b1d66`) Parent PR: #25008 Closes scylladb/scylladb#25107 * github.com:scylladb/scylladb: cdc: Forbid altering columns of inactive CDC log table cdc: Forbid altering columns of CDC log tables directly	2025-07-22 12:35:20 +02:00
Piotr Dulikowski	ea50c02a02	Merge '[Backport 2025.2] cdc: throw error if column doesn't exist' from Scylladb[bot] In the CDC log transformer, when creating a CDC mutation based on some base table mutation, for each value of a base column we set the value in the CDC column with the same name. When looking up the column in the CDC schema by name, we may get a null pointer if a column by that name is not found. This shouldn't happen normally because the base schema and CDC schema should be compatible, and for each base column there should be a CDC column with the same name. However, there are scenarios where the base schema and CDC schema are incompatible for a short period of time when they are being altered. When a base column is being added or dropped, we could get a base mutation with this column set, and then the CDC transformer picks up the latest CDC schema which doesn't have this column. If such a thing happens, we fix the code to throw an exception instead of crashing on null pointer dereference. Currently we don't have a safer approach to handle this, but this might be changed in the future. The other alternative is dropping that data silently which we prefer not to do. Throwing an error is acceptable because this scenario most likely indicates this behavior by the user: * The user adds a new column, and starts writing values to the column before the ALTER is complete. or, * The user drops a column, and continues writing values to the column while it's being dropped. Both cases might as well fail with an error because the column is not found in the base table. Fixes scylladb/scylladb#24952 backport needed - simple fix for a node crash - (cherry picked from commit `b336f282ae`) - (cherry picked from commit `86dfa6324f`) Parent PR: #24986 Closes scylladb/scylladb#25066 * github.com:scylladb/scylladb: test: cdc: add test_cdc_with_alter cdc: throw error if column doesn't exist	2025-07-21 16:01:52 +02:00
Dawid Mędrek	c9735a6015	cdc: Forbid altering columns of inactive CDC log table When CDC becomes disabled on the base table, the CDC log table still exists (cf. scylladb/scylladb@adda43edc7). If it continues to exist up to the point when CDC is re-enabled on the base table, no new log table will be created -- instead, the old log table will be re-attached. Since we want to avoid situations when the definition of the log table has become misaligned with the definition of the base table due to actions of the user, we forbid modifying the set of columns or renaming them in CDC log tables, even when they're inactive. Validation tests are provided. (cherry picked from commit `59800b1d66`)	2025-07-21 11:43:14 +00:00
Dawid Mędrek	038ba48917	cdc: Forbid altering columns of CDC log tables directly The set of columns of a CDC log table should be managed automatically by Scylla, and the user should not have the ability to manipulate them directly. That could lead to disastrous consequences such as a segmentation fault. In this commit, we're restricting those operations. We also provide two validation tests. One of the existing tests had to be adjusted as it modified the type of a column in a CDC log table. Since the test simply verifies that the user has sufficient permissions to perform `ALTER TABLE` on the log table, the test is still valid. Fixes scylladb/scylladb#24643 (cherry picked from commit `20d0050f4e`)	2025-07-21 11:43:13 +00:00
Ernest Zaslavsky	6d8350b20d	s3_client: parse multipart response XML defensively Ensure robust handling of XML responses when initiating multipart uploads. Check for the existence of required nodes before access, and throw an exception if the XML is empty or malformed. Refs: https://github.com/scylladb/scylladb/issues/24676 Closes scylladb/scylladb#24990 (cherry picked from commit `342e94261f`) Closes scylladb/scylladb#25054	2025-07-21 12:08:25 +02:00
Benny Halevy	edce417036	token_metadata_impl: clear_gently: release version tracker early No need to wait for all members to be cleared gently. We can release the version earlier since the held version may be awaited for in barriers. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `6e4803a750`)	2025-07-21 09:49:05 +03:00
Benny Halevy	179e2b3bf1	test: cluster: test_tablets_merge: add test_tablet_split_merge_with_many_tables Reproduces #23284 Currently skipped in release mode since it requires the `short_tablet_stats_refresh_interval` interval. Ref #24641 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `4a3d14a031`)	2025-07-21 09:49:02 +03:00
Benny Halevy	29c33cb065	token_metadata: clear_and_destroy_impl when destroyed We have a lot of places in the code where a token_metadata_ptr is kept in an automatic variable and destroyed when it leaves the scope. Since it's a reference-counted lw_shared_ptr, the token_metadata object is rarely destroyed in those cases, but when it is, it doesn't go through clear_gently, and in particular its tablet_metadata is not cleared gently, leading to inefficient destruction of potentially many foreign_ptr:s. This patch calls clear_and_destroy_impl that gently clears and destroys the impl object in the background using the shared_token_metadata. Fixes #13381 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `2c0bafb934`)	2025-07-21 09:36:40 +03:00
Benny Halevy	4da9539831	token_metadata: keep a reference to shared_token_metadata To be used by a following patch to gently clean and destroy the token_data_impl in the background. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `2b2cfaba6e`)	2025-07-21 09:36:40 +03:00
Benny Halevy	390ca79ae4	token_metadata: move make_token_metadata_ptr into shared_token_metadata class So we can use the local shared_token_metadata instance for safe background destroy of token_metadata_impl:s. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `e0a19b981a`)	2025-07-21 09:36:40 +03:00
Benny Halevy	1113bb2580	replica: database: get and expose a mutable locator::shared_token_metadata Prepare for the next patch, which will use this shared_token_metadata to make mutable_token_metadata_ptr:s Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `493a2303da`)	2025-07-21 09:36:40 +03:00
Benny Halevy	a59a1b422f	locator: tablets: tablet_metadata: clear_gently: optimize foreign ptr destruction Sort all tablet_map_ptr:s by shard_id and then destroy them on each shard to prevent long cross-shard task queues for foreign_ptr destructions. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `3acca0aa63`)	2025-07-21 09:36:40 +03:00
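The bucketing step described in the commit above might look roughly like this. It is a sketch in standard C++ only: `foreign_item`, `owner_shard`, and `group_by_owner` are hypothetical names, and the actual cross-shard destruction (which in Scylla would run on each owner shard) is not modelled here.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pointer that must be destroyed on its owner shard.
struct foreign_item {
    unsigned owner_shard;
    int payload;
};

// Sort items into one vector per owner shard, so that each shard can later
// receive a single batch to destroy instead of one task per item.
std::map<unsigned, std::vector<foreign_item>> group_by_owner(std::vector<foreign_item> items) {
    std::map<unsigned, std::vector<foreign_item>> buckets;
    for (auto& item : items) {
        buckets[item.owner_shard].push_back(std::move(item));
    }
    return buckets;
}
```

Handing each bucket to its shard in one step is what avoids the long cross-shard task queue that per-item destruction would create.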
Michael Litvak	23dbe8952b	test: cdc: add test_cdc_with_alter Add a test that tests adding and dropping a column to a table with CDC enabled while writing to it. (cherry picked from commit `86dfa6324f`)	2025-07-20 09:07:29 +02:00
Michael Litvak	f2af0c5f18	cdc: throw error if column doesn't exist In the CDC log transformer, when creating a CDC mutation based on some base table mutation, for each value of a base column we set the value in the CDC column with the same name. When looking up the column in the CDC schema by name, we may get a null pointer if a column by that name is not found. This shouldn't happen normally because the base schema and CDC schema should be compatible, and for each base column there should be a CDC column with the same name. However, there are scenarios where the base schema and CDC schema are incompatible for a short period of time when they are being altered. When a base column is being added or dropped, we could get a base mutation with this column set, and then the CDC transformer picks up the latest CDC schema which doesn't have this column. If such a thing happens, we fix the code to throw an exception instead of crashing on null pointer dereference. Currently we don't have a safer approach to handle this, but this might be changed in the future. The other alternative is dropping that data silently which we prefer not to do. Throwing an error is acceptable because this scenario most likely indicates this behavior by the user: * The user adds a new column, and starts writing values to the column before the ALTER is complete. or, * The user drops a column, and continues writing values to the column while it's being dropped. Both cases might as well fail with an error because the column is not found in the base table. Fixes scylladb/scylladb#24952 (cherry picked from commit `b336f282ae`)	2025-07-18 10:36:07 +00:00
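The pattern in the commit above, failing with an exception instead of dereferencing a missing column, can be illustrated with a small standard C++ sketch. The types `column_def` and `schema_columns` and the function `require_column` are hypothetical and do not correspond to Scylla's schema API.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical, simplified column definition and schema lookup table.
struct column_def {
    int id;
};
using schema_columns = std::unordered_map<std::string, column_def>;

// Look up a column by name in the log schema. If the base and log schemas are
// briefly out of sync, report the problem by throwing instead of handing back
// a null pointer that a caller might dereference.
const column_def& require_column(const schema_columns& log_schema, const std::string& name) {
    auto it = log_schema.find(name);
    if (it == log_schema.end()) {
        throw std::runtime_error("log schema has no column named '" + name + "'");
    }
    return it->second;
}
```

A caller writing to a column that is mid-ALTER then sees a failed write rather than a node crash.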
Calle Wilund	0d61d63e7e	utils::http::dns_connection_factory: Use a shared certificate_credentials Fixes #24447 This factory type, which is really more a data holder/connection producer per connection instance, creates, if using https, a new certificate_credentials on every instance. Which when used by S3 client is per client and scheduling groups. Which eventually means that we will do a set_system_trust + "cold" handshake for every tls connection created this way. This will cause both IO and cold/expensive certificate checking -> possible stalls/wasted CPU. Since the credentials object in question is literally a "just trust system", it could very well be shared across the shard. This PR adds a thread local static cached credentials object and uses this instead. Could consider moving this to seastar, but maybe this is too much. Closes scylladb/scylladb#24448 (cherry picked from commit `80feb8b676`) Closes scylladb/scylladb#24461	2025-07-18 09:34:45 +03:00
Tomasz Grabiec	23e365fc7b	service: migration_manager: Run group0 barrier in gossip scheduling group Fixes two issues. One is potential priority inversion. The barrier will be executed using scheduling group of the first fiber which triggers it, the rest will block waiting on it. For example, CQL statements which need to sync the schema on replica side can block on the barrier triggered by streaming. That's undesirable. This is theoretical, not proved in the field. The second problem is blocking the error path. This barrier is called from the streaming error handling path. If the streaming concurrency semaphore is exhausted, and streaming fails due to timeout on obtaining the permit in check_needs_view_update_path(), the error path will block too because it will also attempt to obtain the permit as part of the group0 barrier. Running it in the gossip scheduling group prevents this. Fixes #24925 (cherry picked from commit `ee2fa58bd6`)	2025-07-17 17:25:10 +00:00
Piotr Dulikowski	97659e19b8	auth: fix crash when migration code runs parallel with raft upgrade The functions password_authenticator::start and standard_role_manager::start have a similar structure: they spawn a fiber which invokes a callback that performs some migration until that migration succeeds. Both handlers set a shared promise called _superuser_created_promise (those are actually two promises, one for the password authenticator and the other for the role manager). The handlers are similar in both cases. They check if auth is in legacy mode, and behave differently depending on that. If in legacy mode, the promise is set (if it was not set before), and some legacy migration actions follow. In auth-on-raft mode, the superuser is attempted to be created, and if it succeeds then the promise is _unconditionally_ set. While it makes sense at a glance to set the promise unconditionally, there is a non-obvious corner case during upgrade to topology on raft. During the upgrade, auth switches from the legacy mode to auth on raft mode. Thus, if the callback didn't succeed in legacy mode and then tries to run in auth-on-raft mode and succeeds, it will unconditionally set a promise that was already set - this is a bug and triggers an assertion in seastar. Fix the issue by surrounding the `shared_promise::set_value` call with an `if` - like it is already done for the legacy case. Fixes: scylladb/scylladb#24975 Closes scylladb/scylladb#24976 (cherry picked from commit `a14b7f71fe`) Closes scylladb/scylladb#25018	2025-07-17 17:55:25 +02:00
Botond Dénes	4eb070b816	Merge '[Backport 2025.2] storage_service: Use utils::chunked_vector to avoid big allocation' from Scylladb[bot] The following was seen: ``` !WARNING \| scylla[6057]: [shard 12:strm] seastar_memory - oversized allocation: 212992 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at [Backtrace #0] void seastar::backtrace<seastar::current_backtrace_tasklocal()::$_0>(seastar::current_backtrace_tasklocal()::$_0&&, bool) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:89 (inlined by) seastar::current_backtrace_tasklocal() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:99 seastar::current_tasktrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:136 seastar::current_backtrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:169 seastar::memory::cpu_pages::warn_large_allocation(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:848 seastar::memory::allocate_slowpath(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:911 operator new(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:1706 std::allocator<dht::token_range_endpoints>::allocate(unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/allocator.h:196 (inlined by) std::allocator_traits<std::allocator<dht::token_range_endpoints> >::allocate(std::allocator<dht::token_range_endpoints>&, unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/alloc_traits.h:515 (inlined by) std::_Vector_base<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> >::_M_allocate(unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_vector.h:380 (inlined by) void 
std::vector<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> >::_M_realloc_append<dht::token_range_endpoints const&>(dht::token_range_endpoints const&) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/vector.tcc:596 locator::describe_ring(replica::database const&, gms::gossiper const&, seastar::basic_sstring<char, unsigned int, 15u, true> const&, bool) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_vector.h:1294 std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<std::vector<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> > >::promise_type>::resume() const at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/coroutine:242 (inlined by) seastar::internal::coroutine_traits_base<std::vector<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> > >::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:80 seastar::reactor::do_run() at ./build/release/seastar/./build/release/seastar/./seastar/src/core/reactor.cc:2635 std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0>::_M_invoke(std::_Any_data const&) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/reactor.cc:4684 ``` Fix by using chunked_vector. Fixes #24158 - (cherry picked from commit `c5a136c3b5`) Parent PR: #24561 Closes scylladb/scylladb#24891 * github.com:scylladb/scylladb: storage_service: Use utils::chunked_vector to avoid big allocation utils: chunked_vector: implement erase() for single elements and ranges utils: chunked_vector: implement insert() for single-element inserts	2025-07-16 15:58:25 +03:00
Asias He	67375ecf14	storage_service: Use utils::chunked_vector to avoid big allocation The following was seen: ``` !WARNING \| scylla[6057]: [shard 12:strm] seastar_memory - oversized allocation: 212992 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at [Backtrace #0] void seastar::backtrace<seastar::current_backtrace_tasklocal()::$_0>(seastar::current_backtrace_tasklocal()::$_0&&, bool) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:89 (inlined by) seastar::current_backtrace_tasklocal() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:99 seastar::current_tasktrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:136 seastar::current_backtrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:169 seastar::memory::cpu_pages::warn_large_allocation(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:848 seastar::memory::allocate_slowpath(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:911 operator new(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:1706 std::allocator<dht::token_range_endpoints>::allocate(unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/allocator.h:196 (inlined by) std::allocator_traits<std::allocator<dht::token_range_endpoints> >::allocate(std::allocator<dht::token_range_endpoints>&, unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/alloc_traits.h:515 (inlined by) std::_Vector_base<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> >::_M_allocate(unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_vector.h:380 (inlined by) void std::vector<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> 
>::_M_realloc_append<dht::token_range_endpoints const&>(dht::token_range_endpoints const&) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/vector.tcc:596 locator::describe_ring(replica::database const&, gms::gossiper const&, seastar::basic_sstring<char, unsigned int, 15u, true> const&, bool) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_vector.h:1294 std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<std::vector<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> > >::promise_type>::resume() const at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/coroutine:242 (inlined by) seastar::internal::coroutine_traits_base<std::vector<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> > >::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:80 seastar::reactor::do_run() at ./build/release/seastar/./build/release/seastar/./seastar/src/core/reactor.cc:2635 std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0>::_M_invoke(std::_Any_data const&) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/reactor.cc:4684 ``` Fix by using chunked_vector. Fixes #24158 Closes scylladb/scylladb#24561 (cherry picked from commit `c5a136c3b5`)	2025-07-16 07:43:39 +08:00
Avi Kivity	8f65d7e63b	utils: chunked_vector: implement erase() for single elements and ranges Implement using std::rotate() and resize(). The elements to be erased are rotated to the end, then resized out of existence. Again we defer optimization for trivially copyable types. Unit tests are added. Needed for range_streamer with token_ranges using chunked_vector. (cherry picked from commit `d6eefce145`)	2025-07-16 07:43:29 +08:00
Avi Kivity	c6b0bacfb1	utils: chunked_vector: implement insert() for single-element inserts partition_range_compat's unwrap() needs insert if we are to use it for chunked_vector (which we do). Implement using push_back() and std::rotate(). emplace(iterator, args) is also implemented, though the benefit is diluted (it will be moved after construction). The implementation isn't optimal - if T is trivially copyable then using std::memmove() will be much faster than std::rotate(), but this complex optimization is left for later. Unit tests are added. (cherry picked from commit `5301f3d0b5`)	2025-07-16 07:43:21 +08:00
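The push_back-plus-rotate approach from the commit above can be sketched the same way. `insert_by_rotate` is a hypothetical helper over standard containers, not the `chunked_vector` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>

// Insert value at index pos by appending it and then rotating it into place.
// Iterators are fetched only after push_back(), since it may invalidate them.
template <typename Container, typename T>
void insert_by_rotate(Container& c, std::size_t pos, T value) {
    assert(pos <= c.size());
    c.push_back(std::move(value));
    // Rotate [pos, end) so that the last element (the new one) lands at pos.
    std::rotate(std::next(c.begin(), pos), std::prev(c.end()), c.end());
}
```

Inserting `3` at index 2 of a vector holding `1, 2, 4` yields `1, 2, 3, 4`. Inserting at `pos == size()` degenerates to a plain `push_back()`, because the rotation range then contains only the new element.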
Patryk Jędrzejczak	7bb43d812e	test: test_zero_token_nodes_multidc: properly handle reads with CL=ONE The test could fail with RF={DC1: 2, DC2: 0} and CL=ONE when: - both writes succeeded with the same replica responding first, - one of the following reads succeeded with the other replica responding before it applied mutations from any of the writes. We fix the test by not expecting reads with CL=ONE to return a row. We also harden the test by inserting different rows for every pair (CL, coordinator), where one of the two coordinators is a normal node from DC1, and the other one is a zero-token node from DC2. This change makes sure that, for example, every write really inserts a row. Fixes scylladb/scylladb#22967 The fix addresses CI flakiness and only changes the test, so it should be backported. Closes scylladb/scylladb#23518 (cherry picked from commit `21edec1ace`) Closes scylladb/scylladb#24984 scylla-2025.2.1 scylla-2025.2.1-candidate-20250716020406	2025-07-15 15:50:21 +02:00
Botond Dénes	9482f45d13	test/cluster/test_read_repair: write 100 rows in trace test This test asserts that a read repair really happened. To ensure this happens it writes a single partition after enabling the database_apply error injection point. For some reason, the write is sometimes reordered with the error injection and the write will get replicated to both nodes and no read repair will happen, failing the test. To make the test less sensitive to such rare reordering, add a clustering column to the table and write 100 rows. The chance of all 100 of them being reordered with the error injection should be low enough that it doesn't happen again (famous last words). Fixes: #24330 Closes scylladb/scylladb#24403 (cherry picked from commit `495f607e73`) Closes scylladb/scylladb#24973	2025-07-15 13:27:31 +03:00
Aleksandra Martyniuk	5debdce91d	replica: hold compaction group gate during flush Destructor of database_sstable_write_monitor, which is created in table::try_flush_memtable_to_sstable, tries to get the compaction state of the processed compaction group. If at this point the compaction group is already stopped (and the compaction state is removed), e.g. due to concurrent tablet merge, an exception is thrown and a node coredumps. Add flush gate to compaction group to wait for flushes in compaction_group::stop. Hold the gate in seal function in table::make_memtable_list. seal function is turned into a coroutine to ensure it won't throw. Wait until async_gate is closed before flushing, to ensure that all data is written into sstables. Stop ongoing compactions beforehand. Remove unnecessary flush in tablet_storage_group_manager::merge_completion_fiber. Stop method already flushes the compaction group. Fixes: #23911. Closes scylladb/scylladb#24582 (cherry picked from commit `2ec54d4f1a`) Closes scylladb/scylladb#24951	2025-07-15 13:26:39 +03:00
Michael Litvak	15517ba529	tablets: stop storage group on deallocation When a tablet transitions to a post-cleanup stage on the leaving replica we deallocate its storage group. Before the storage can be deallocated and destroyed, we must make sure it's cleaned up and stopped properly. Normally this happens during the tablet cleanup stage, when table::cleanup_table is called, so by the time we transition to the next stage the storage group is already stopped. However, it's possible that tablet cleanup did not run in some scenario: 1. The topology coordinator runs tablet cleanup on the leaving replica. 2. The leaving replica is restarted. 3. When the leaving replica starts, still in `cleanup` stage, it allocates a storage group for the tablet. 4. The topology coordinator moves to the next stage. 5. The leaving replica deallocates the storage group, but it was not stopped. To address this scenario, we always stop the storage group when deallocating it. Usually it will be already stopped and complete immediately, and otherwise it will be stopped in the background. Fixes scylladb/scylladb#24857 Fixes scylladb/scylladb#24828 Closes scylladb/scylladb#24896 (cherry picked from commit `fa24fd7cc3`) Closes scylladb/scylladb#24908	2025-07-15 13:25:38 +03:00
Aleksandra Martyniuk	ccfc053dd5	repair: Reduce max row buf size when small table optimization is on If small_table_optimization is on, a repair works on a whole table simultaneously. It may be distributed across the whole cluster and all nodes might participate in repair. On a repair master, row buffer is copied for each repair peer. This means that the memory scales with the number of peers. In large clusters, repair with small_table_optimization leads to OOM. Divide the max_row_buf_size by the number of repair peers if small_table_optimization is on. Use max_row_buf_size to calculate number of units taken from mem_sem. Fixes: https://github.com/scylladb/scylladb/issues/22244. Closes scylladb/scylladb#24868 (cherry picked from commit `17272c2f3b`) Closes scylladb/scylladb#24905	2025-07-15 13:24:49 +03:00
Botond Dénes	6749954b2a	Merge '[Backport 2025.2] test.py: Fix start 3rd party services' from Scylladb[bot] Move the starting of 3rd party services under a `try` clause to avoid a situation where the main process collapses without stopping the services. Without this, if something goes wrong during start, it will not trigger execution of exit artifacts, so the process will stay forever. This functionality is in 2025.2 and can potentially affect jobs, so a backport is needed. Fixes: #24773 - (cherry picked from commit `0ca539e162`) - (cherry picked from commit `c6c3e9f492`) Parent PR: #24734 Closes scylladb/scylladb#24774 * github.com:scylladb/scylladb: test.py: use unique hostname for Minio test.py: Catch possible exceptions during 3rd party services start	2025-07-15 13:23:12 +03:00
Pavel Emelyanov	71e9f5e662	sstables_loader: Fix load-and-stream vs skip-cleanup check The intention was to fail the REST API call in case --skip-cleanup is requested for --load-and-stream loading. The corresponding if expression is checking something else :( although the log message is correct. Fixes: https://github.com/scylladb/scylladb/issues/24913 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Signed-off-by: Ran Regev <ran.regev@scylladb.com> (cherry picked from commit `bd3bd089e1`) Closes scylladb/scylladb#24947	2025-07-15 13:08:32 +03:00
Yaron Kaikov	f1d4266b7a	dist/common/scripts/scylla_sysconfig_setup: fix `SyntaxWarning: invalid escape sequence` There are invalid escape sequence warnings where raw strings should be used for the regex patterns Fixes: https://github.com/scylladb/scylladb/issues/24915 Closes scylladb/scylladb#24916 (cherry picked from commit `fdcaa9a7e7`) Closes scylladb/scylladb#24968	2025-07-15 11:06:41 +02:00
Yaron Kaikov	9c0181e813	auto-backport.py: Avoid bot push to existing backport branches Changed the backport logic so that the bot only pushes the backport branch if it does not already exist in the remote fork. If the branch exists, the bot skips the push, allowing only users to update (force-push) the branch after the backport PR is open. Fixes: https://github.com/scylladb/scylladb/issues/24953 Closes scylladb/scylladb#24954 (cherry picked from commit `ed7c7784e4`) Closes scylladb/scylladb#24967	2025-07-15 10:27:28 +02:00
Aleksandra Martyniuk	6f8b378e80	nodetool: repair: skip tablet keyspaces Currently, nodetool repair command repairs both vnode and tablet keyspaces if no keyspace is specified. We should use this command to repair only vnode keyspaces, but this isn't easily accessible - we have to explicitly run repair only on vnode keyspaces. nodetool repair skips tablet keyspaces unless a tablet keyspace is explicitly passed as an argument. Fixes: #24040. Closes scylladb/scylladb#24042	2025-07-15 06:36:08 +03:00
Jenkins Promoter	afadcc648d	Update pgo profiles - aarch64	2025-07-15 05:39:11 +03:00
Jenkins Promoter	2397a93410	Update pgo profiles - x86_64	2025-07-15 05:23:22 +03:00
Yaron Kaikov	6e06c57fc7	packaging: add `ps` command to dependencies ScyllaDB container image doesn't have ps command installed, while this command is used by perftune.py script shipped within the same image. This breaks node and container tuning in Scylla Operator. Fixes: #24827 Closes scylladb/scylladb#24830 (cherry picked from commit `66ff6ab6f9`) Closes scylladb/scylladb#24955	2025-07-14 14:26:38 +03:00
Gleb Natapov	ece8a8b3bc	api: unregister raft_topology_get_cmd_status on shutdown In `c8ce9d1c60` we introduced raft_topology_get_cmd_status REST api but the commit forgot to unregister the handler during shutdown. Fixes #24910 Closes scylladb/scylladb#24911 (cherry picked from commit `89f2edf308`) Closes scylladb/scylladb#24922	2025-07-14 11:39:42 +02:00
Avi Kivity	5bed6c7a7f	storage_proxy: avoid large allocation when storing batch in system.batchlog Currently, when computing the mutation to be stored in system.batchlog, we go through data_value. In turn this goes through `bytes` type (#24810), so it causes a large contiguous allocation if the batch is large. Fix by going through the more primitive, but less contiguous, atomic_cell API. Fixes #24809. Closes scylladb/scylladb#24811 (cherry picked from commit `60f407bff4`) Closes scylladb/scylladb#24845	2025-07-13 14:11:01 +03:00
Patryk Jędrzejczak	605106a9c6	Merge '[Backport 2025.2] Make it easier to debug stuck raft topology operation.' from Scylladb[bot] The series adds more logging and provides new REST api around topology command rpc execution to allow easier debugging of stuck topology operations. Backport since we want to have it in production as quickly as possible. Fixes #24860 - (cherry picked from commit `c8ce9d1c60`) - (cherry picked from commit `4e6369f35b`) Parent PR: #24799 Closes scylladb/scylladb#24879 * https://github.com/scylladb/scylladb: topology coordinator: log a start and an end of topology coordinator command execution at info level topology coordinator: add REST endpoint to query the status of ongoing topology cmd rpc	2025-07-09 12:58:14 +02:00
Piotr Dulikowski	e4dde34f52	Merge '[Backport 2025.2] main: don't start maintenance auth service if not enabled' from Scylladb[bot] In `f96d30c2b5` we introduced the maintenance service, which is an additional instance of auth::service. But this service has a somewhat confusing 2-level startup mechanism: it's initialized with sharded<Service>::start and then auth::service::start (different method with the same name to confuse even more). When maintenance_socket was disabled (default setting), the code did only the first part of the startup. This registered a config observer but didn't create a permission_cache instance. As a result, a crash on SIGHUP when config is reloaded can occur. Fixes: https://github.com/scylladb/scylladb/issues/24528 Backport: all not eol versions since 6.0 and 2025.1 - (cherry picked from commit `97c60b8153`) - (cherry picked from commit `dd01852341`) Parent PR: #24527 Closes scylladb/scylladb#24570 * github.com:scylladb/scylladb: test: add test for live updates of permissions cache config main: don't start maintenance auth service if not enabled	2025-07-09 09:47:57 +02:00
Piotr Dulikowski	8ebd67e1c3	Merge '[Backport 2025.2] batchlog_manager: abort replay of a failed batch on shutdown or node down' from Scylladb[bot] When replaying a failed batch and sending the mutation to all replicas, make the write response handler cancellable and abort it on shutdown or if some target is marked down. Also set a reasonable timeout so it gets aborted if it's stuck for some other unexpected reason. Previously, the write response handler was not cancellable and had no timeout. This can cause a scenario where some write operation by the batchlog manager is stuck indefinitely, and node shutdown gets stuck as well because it waits for the batchlog manager to complete, without aborting the operation. Backport to relevant versions since the issue can cause node shutdown to hang. Fixes scylladb/scylladb#24599 - (cherry picked from commit `8d48b27062`) - (cherry picked from commit `fc5ba4a1ea`) - (cherry picked from commit `7150632cf2`) - (cherry picked from commit `74a3fa9671`) - (cherry picked from commit `a9b476e057`) - (cherry picked from commit `d7af26a437`) Parent PR: #24595 Closes scylladb/scylladb#24880 * github.com:scylladb/scylladb: test: test_batchlog_manager: batchlog replay includes cdc test: test_batchlog_manager: test batch replay when a node is down batchlog_manager: set timeout on writes batchlog_manager: abort writes on shutdown batchlog_manager: create cancellable write response handler storage_proxy: add write type parameter to mutate_internal	2025-07-08 15:39:58 +02:00
Michael Litvak	a26d8f72b6	test: test_batchlog_manager: batchlog replay includes cdc Add a new test that verifies that when replaying batch mutations from the batchlog, the mutations include cdc augmentation if needed. This is done in order to verify that it works currently as expected and doesn't break in the future. (cherry picked from commit `d7af26a437`)	2025-07-08 06:25:03 +00:00
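
The two chunked_vector commits in the list above (`8f65d7e63b` and `c6b0bacfb1`) describe erase() as "rotate to the end, then resize" and insert() as "push_back(), then std::rotate()". A minimal sketch of that technique follows. It is not the ScyllaDB implementation: std::vector stands in for utils::chunked_vector, and the helper names erase_by_rotate and insert_by_rotate are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not ScyllaDB code): erase the index range [first, last)
// by rotating it to the tail and shrinking the container.
// Assumes first <= last <= v.size(). std::vector::resize() also requires T to
// be default-constructible, even when only shrinking.
template <typename T>
typename std::vector<T>::iterator
erase_by_rotate(std::vector<T>& v, std::size_t first, std::size_t last) {
    // After the rotate, the elements that followed `last` sit at `first`
    // in their original order; the erased elements are at the end.
    std::rotate(v.begin() + first, v.begin() + last, v.end());
    v.resize(v.size() - (last - first));
    return v.begin() + first;
}

// Hypothetical helper (not ScyllaDB code): insert `value` at index `pos`
// by appending it and rotating it into place. Assumes pos <= v.size().
template <typename T>
typename std::vector<T>::iterator
insert_by_rotate(std::vector<T>& v, std::size_t pos, T value) {
    // push_back() may reallocate, so positions are tracked as indices
    // rather than iterators taken before the append.
    v.push_back(std::move(value));
    // Make the last element the first element of [pos, end).
    std::rotate(v.begin() + pos, v.end() - 1, v.end());
    return v.begin() + pos;
}
```

Both helpers move every element after the affected position once, which matches the commit messages' remark that a memmove-based path for trivially copyable types is left as a later optimization.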


47941 Commits