- Remove manual gzip header parsing - libdeflate handles all format details
- Rename linearize_chunked_content to build_input_buffer and free chunks as we copy
- Add output chunking to split large decompressed data into 1MB chunks
- Add comment explaining libdeflate's whole-buffer requirement
- Use better initial size heuristic based on compression ratio
Co-authored-by: nyh <584227+nyh@users.noreply.github.com>
- Check if total_decompressed >= length_limit before allocating output buffer
- Prevents allocating a zero-sized buffer when limit is already reached
- Ensures clear error message when limit is exceeded
Co-authored-by: nyh <584227+nyh@users.noreply.github.com>
- Removed unused get_gzip_member_size function
- Rely on libdeflate_gzip_decompress to tell us how many input bytes were consumed
- Added check for zero bytes consumed to detect invalid state
- Simplified the logic by removing unnecessary header size tracking
Co-authored-by: nyh <584227+nyh@users.noreply.github.com>
- Created utils/gzip.hh header with ungzip function declaration
- Created utils/gzip.cc implementation using libdeflate
- Updated utils/CMakeLists.txt to include gzip.cc and link libdeflate
- Created comprehensive test suite in test/boost/gzip_test.cc
- Added gzip_test to test/boost/CMakeLists.txt
The implementation:
- Uses libdeflate for high-performance gzip decompression
- Handles chunked_content input/output (vector of temporary_buffer)
- Supports concatenated gzip files
- Validates gzip headers and detects invalid/truncated/corrupted data
- Enforces size limits to prevent memory exhaustion
- Runs in async context to avoid blocking the reactor
Co-authored-by: nyh <584227+nyh@users.noreply.github.com>
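As a rough illustration of the pieces above, here is a minimal synchronous sketch of the decompression loop, assuming plain std::vector buffers instead of chunked_content and omitting the seastar/async and output-chunking parts; the libdeflate calls are the real API, everything else is simplified.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>
#include <libdeflate.h>

std::vector<uint8_t> ungzip(const std::vector<uint8_t>& in, size_t length_limit) {
    // libdeflate has no streaming interface: it needs the whole input buffer
    // and a preallocated output buffer, hence the linearized input.
    libdeflate_decompressor* d = libdeflate_alloc_gzip_decompressor();
    if (!d) {
        throw std::bad_alloc();
    }
    std::vector<uint8_t> out;
    size_t pos = 0;
    try {
        while (pos < in.size()) {
            if (out.size() >= length_limit) {
                throw std::runtime_error("decompressed data exceeds length limit");
            }
            // Initial size guess; the real code uses a heuristic based on the
            // compression ratio observed so far.
            size_t avail = std::min(std::max<size_t>(4 * (in.size() - pos), 16 * 1024),
                                    length_limit - out.size());
            size_t old_size = out.size();
            out.resize(old_size + avail);
            size_t in_used = 0;
            size_t out_used = 0;
            libdeflate_result res = libdeflate_gzip_decompress_ex(
                d, in.data() + pos, in.size() - pos,
                out.data() + old_size, avail, &in_used, &out_used);
            if (res != LIBDEFLATE_SUCCESS || in_used == 0) {
                // Covers invalid, truncated and corrupted data; a full
                // implementation would also grow the buffer and retry on
                // LIBDEFLATE_INSUFFICIENT_SPACE.
                throw std::runtime_error("invalid gzip data");
            }
            out.resize(old_size + out_used);
            pos += in_used; // advance to the next concatenated gzip member
        }
    } catch (...) {
        libdeflate_free_decompressor(d);
        throw;
    }
    libdeflate_free_decompressor(d);
    return out;
}
```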
Not waiting for nodes to see each other as alive can cause the driver to
fail the request sent in `wait_for_upgrade_state()`.
scylladb/scylladb#19771 has already replaced concurrent restarts with
`ManagerClient.rolling_restart()`, but it has missed this single place,
probably because we do concurrent starts here.
Fixes #27055
Closes scylladb/scylladb#27075
This patch adds the missing warning that returning the similarity
distance is not yet possible. Support for it will be added in the next
iteration.
Fixes #27086
It has to be backported to 2025.4, as this limitation applies to 2025.4.
Closes scylladb/scylladb#27096
Following 9b6ce030d0 ("sstables: remove quadratic (and possibly
exponential) compile time in parse()"), where we removed recursion
in reading, we do the same here for variadic write. This results
in a small reduction in compile time.
Note the problem isn't very bad here. This is tail-recursion, so likely
removed by the compiler during optimization, and we don't have additional
amplification due to future::then() double-compiling the ready-future
and unready-future paths. Still, better to avoid quadratic compile
times.
Closes scylladb/scylladb#27050
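For illustration, a toy sketch of the pattern with hypothetical names and plain stream-style writes rather than the actual sstables/seastar code: the recursive form instantiates one template per argument-pack suffix, while a C++17 fold expression keeps it linear.

```cpp
// Recursive variadic write: each call instantiates a template for the
// remaining suffix of the pack, so a pack of size n produces O(n) distinct
// instantiations per call site -- quadratic across nested expansions.
template <typename Stream>
void write_recursive(Stream&) {}

template <typename Stream, typename T, typename... Rest>
void write_recursive(Stream& s, const T& v, const Rest&... rest) {
    s << v;
    write_recursive(s, rest...); // tail call: cheap at run time, costly to compile
}

// Non-recursive form: a single C++17 fold expression, one instantiation
// per call site, linear compile time.
template <typename Stream, typename... Args>
void write_flat(Stream& s, const Args&... args) {
    (s << ... << args);
}
```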
This reverts commit 43738298be.
This commit causes instability in dtests. Several non-gating dtests
started failing, as well as some gating ones, see #27047.
Closes scylladb/scylladb#27067
Fixes #27047
The updates include:
- adding missing parts like topology states and table rows,
- documenting zero-token nodes,
- replacing the old recovery procedure with the new one.
Fixes#26412
Updates of internal docs (usually read on master) don't require
backporting.
Closes scylladb/scylladb#27022
* github.com:scylladb/scylladb:
docs/dev/topology-over-raft: update the recovery section
docs/dev/topology-over-raft: document zero-token nodes
docs/dev/topology-over-raft: clarify the lack of tablet-specific states
docs/dev/topology-over-raft: add the missing join_group0 state
docs/dev/topology-over-raft: update the topology columns
This series allows an operator to reset the 'cleanup needed' flag if they have already cleaned up the node, so that automatic cleanup will not repeat the work. We also change 'nodetool cleanup' back to running cleanup on one node only (resetting the 'cleanup needed' flag at the end), while the new '--global' option runs cleanup simultaneously on all nodes that need it.
Fixes https://github.com/scylladb/scylladb/issues/26866
Backport to all supported versions, since the current automatic cleanup behaviour may create load the operator does not expect during cluster resizing.
Closes scylladb/scylladb#26868
* https://github.com/scylladb/scylladb:
cleanup: introduce "nodetool cluster cleanup" command to run cleanup on all dirty nodes in the cluster
cleanup: Add RESTful API to allow reset cleanup needed flag
The executor::add_stream_options() obtains a local database reference from
the proxy just to get the feature service from it.
Similar chain is used in executor::update_time_to_live().
It's shorter to get features from proxy itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#26973
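A before/after sketch of the shortening, with hypothetical stand-in types; the accessor names are assumptions, not the exact Scylla API.

```cpp
// Stand-in types for illustration only.
struct feature_service {};
struct database {
    feature_service fs;
    feature_service& features() { return fs; }
};
struct storage_proxy {
    database db;
    database& local_db() { return db; }
    feature_service& features() { return db.features(); } // direct accessor
};

void add_stream_options(storage_proxy& proxy) {
    // Before: reach through the local database just for the feature service.
    feature_service& via_db = proxy.local_db().features();
    // After: ask the proxy itself.
    feature_service& direct = proxy.features();
    (void)via_db; (void)direct;
}
```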
…played
Currently, if flushing hints falls within the repair cache timeout, then the flush_time is set to batchlog_manager::_last_replay. _last_replay is updated on each replay, even if some batches weren't replayed. Due to that, we risk data resurrection.
Update _last_replay only if all batches were replayed.
Fixes: https://github.com/scylladb/scylladb/issues/24415.
Needs backport to all live versions.
Closes scylladb/scylladb#26793
* github.com:scylladb/scylladb:
test: extend test_batchlog_replay_failure_during_repair
db: batchlog_manager: update _last_replay only if all batches were replayed
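A minimal sketch of the fix, assuming a simplified synchronous batchlog_manager (the real code is future-based):

```cpp
#include <chrono>

using db_clock = std::chrono::system_clock;

class batchlog_manager {
    db_clock::time_point _last_replay{};

    // Placeholder for the real replay loop; returns false if any batch
    // could not be replayed in this pass.
    bool replay_all_batches() { return true; }

public:
    void do_batch_log_replay() {
        auto started = db_clock::now();
        bool all_replayed = replay_all_batches();
        // Previously _last_replay was updated unconditionally, so hint
        // flushing could assume batches were replayed when they weren't,
        // risking data resurrection.
        if (all_replayed) {
            _last_replay = started;
        }
    }
};
```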
This patch adds support for multiple audit log outputs.
If only one audit log output is enabled, the behavior does not change.
If multiple audit log outputs are enabled, then the `audit_composite_storage_helper` class is used. It has a collection
of `storage_helper` objects.
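A minimal sketch of the composite, simplified to synchronous calls and a string record type (the real helpers are future-based):

```cpp
#include <memory>
#include <string>
#include <vector>

// storage_helper is the existing per-backend interface (e.g. table, syslog);
// the composite fans each audit record out to every configured output.
struct storage_helper {
    virtual ~storage_helper() = default;
    virtual void write(const std::string& record) = 0;
};

class audit_composite_storage_helper : public storage_helper {
    std::vector<std::unique_ptr<storage_helper>> _helpers;
public:
    explicit audit_composite_storage_helper(std::vector<std::unique_ptr<storage_helper>> helpers)
        : _helpers(std::move(helpers)) {}

    void write(const std::string& record) override {
        for (auto& h : _helpers) {
            h->write(record); // e.g. one table helper + one syslog helper
        }
    }
};
```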
Performance testing shows that read query throughput and auth request throughput remain consistent even at high reactor utilization, while read query latency increases slightly.
Read query ops = 60k/s
AUTH ops = 200/s
| Audit Mode   | QUERY latency (p99) | Δ% vs none |
|--------------|---------------------|------------|
| none         | 777                 | 0          |
| table        | 801                 | +3.09%     |
| syslog       | 803                 | +3.35%     |
| table,syslog | 818                 | +5.28%     |
Read query ops = 50k/s
AUTH ops = 200/s
| Audit Mode   | QUERY latency (p99) | Δ% vs none |
|--------------|---------------------|------------|
| none         | 643                 | 0          |
| table        | 647                 | +0.62%     |
| syslog       | 648                 | +0.78%     |
| table,syslog | 656                 | +2.02%     |
Detailed performance results are in the following Confluence document: [Audit performance impact test](https://scylladb.atlassian.net/wiki/spaces/RND/pages/148308005/Audit+performance+impact+test)
Fixes #26022
Backport:
The decision is to not backport for now. After making sure it works on the latest release, and if there is a need, we can do it.
Closes scylladb/scylladb#26613
* github.com:scylladb/scylladb:
test: dtest: audit_test.py: add AuditBackendComposite
test: dtest: audit_test.py: group logs in dict per audit mode
audit: write out to both table and syslog
audit: move storage helper creation from `audit::start` to `audit::audit`
audit: fix formatting in `audit::start_audit`
audit: unify `create_audit` and `start_audit`
97ab3f6622 changed "nodetool cleanup" (without arguments) to run
cleanup on all dirty nodes in the cluster. This was somewhat unexpected,
so this patch changes it back to run cleanup on the target node only (and
resets the "cleanup needed" flag afterwards), and adds the "nodetool cluster
cleanup" command that runs the cleanup on all dirty nodes in the
cluster.
A Vector Store node is now considered down if it returns an HTTP 500
server error. This can happen, for example, if the node fails to
connect to the database or has not completed its initial full scan.
The logic for marking a node as 'up' is also enhanced. A node is now
only considered up when its status is explicitly 'SERVING'.
Fixes: VECTOR-187
Backport to 2025.4 as this feature is expected to be available in 2025.4.
Closes scylladb/scylladb#26413
* github.com:scylladb/scylladb:
vector_search: Improve vector-store health checking
vector_search: Move response_content_to_sstring to utils.hh
vector_search: Add unit tests for client error handling
vector_search: Enable mocking of status requests
vector_search: Extract abort_source_timeout and repeat_until
vector_search: Move vs_mock_server to dedicated files
Some of the columns were added, but the doc wasn't updated.
`upgrade_state` was updated in only one of the two places.
`ignore_nodes` was changed to a static column.
This PR fixes staging sstables handling by the view building coordinator in case of intra-node tablet migration or tablet merge.
To support tablet merge, the worker stores the sstables grouped only by `table_id`, instead of the `(table_id, last_token)` pair.
There shouldn't be that many staging sstables, so selecting the relevant ones for each `process_staging` task is fine.
For the intra-node migration support, the patch adds methods to load migrated sstables on the destination shard and to clean them up on the source shard.
The patch should be backported to 2025.4
Fixes https://github.com/scylladb/scylladb/issues/26244
Closes scylladb/scylladb#26454
* github.com:scylladb/scylladb:
service/storage_service: migrate staging sstables in view building worker during intra-node migration
db/view/view_building_worker: support sstables intra-node migration
db/view_building_worker: fix indent
db/view/view_building_worker: don't organize staging sstables by last token
A Vector Store node is now considered down if it returns an HTTP 5xx status.
This can happen, for example, if the node fails to
connect to the database or has not completed its initial full scan.
The logic for marking a node as 'up' is also enhanced. A node is now
only considered up when its status is 'SERVING'.
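A small sketch of these two rules, with hypothetical names and a simplified reply shape:

```cpp
#include <string_view>

enum class node_health { up, down };

// Any HTTP 5xx marks the node down; only an explicit SERVING status marks
// it up again. Everything else leaves the node out of the request pool.
node_health classify_status_reply(int http_status, std::string_view status_body) {
    if (http_status >= 500) {
        return node_health::down; // e.g. DB unreachable or initial scan unfinished
    }
    if (http_status == 200 && status_body == "SERVING") {
        return node_health::up;
    }
    return node_health::down;
}
```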
Move the response_content_to_sstring utility function from
vector_store_client.cc to utils.hh to enable reuse across
multiple files.
This refactoring prepares for the upcoming `client.cc` implementation
that will also need this functionality.
Introduce dedicated unit tests for the client class to verify existing
functionality and serve as regression tests.
These tests ensure that invalid client requests do not cause nodes to
be marked as down.
Extend the mock server to allow inspecting incoming status requests and
configuring their responses.
This enables client unit tests to simulate various server behaviors,
such as handling node failures and backoff logic.
The `abort_source_timeout` and `repeat_until` functions are moved to
the shared utility header `test/vector_search/utils.hh`.
This allows them to be reused by upcoming `client` unit tests, avoiding
code duplication.
This PR enables integrity check of both checksum and digest for repair/streaming.
In the past, streaming readers only verified the checksum of compressed SSTables.
This change extends the checks to include the digest and the checksum (CRC) for both compressed and uncompressed SSTables. These additional checks require reading the digest and CRC components from disk, which may cause some I/O overhead. For uncompressed SSTables, this involves loading and computing checksums and digest from the data, while for compressed SSTables - where checksums are already verified inline - the only extra cost is reading and verifying the digest. If the reader range doesn't cover the full SSTable, the digest is not loaded and the check is skipped.
To support testing of these changes, a new option was added to the random_mutation_generator that allows disabling compression.
Several new test cases were added to verify that the repair_reader correctly detects corruption. These tests corrupt digest or data component of an SSTable and confirm that the system throws the expected `malformed_sstable_exception`.
Backport is not required; it is an improvement.
Refs #21776
Closes scylladb/scylladb#26444
* github.com:scylladb/scylladb:
boost/repair_test: add repair reader integrity verification test cases
test/lib: allow to disable compression in random_mutation_generator
sstables: Skip checksum and digest reads for unlinked SSTables
table: enable integrity checks for streaming reader
table: Add integrity option to table::make_sstable_reader()
sstables: Add integrity option to create_single_key_sstable_reader
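A sketch of the full-sstable digest check, using zlib's crc32 as a stand-in for the sstable Digest component (the names and the exception type are illustrative):

```cpp
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <zlib.h>

// The reader feeds every data chunk through update(); finish() compares the
// computed digest against the one read from disk and throws (the real code
// throws malformed_sstable_exception) on mismatch. As the text explains,
// this is only done when the read range covers the whole sstable.
class digest_verifier {
    uLong _crc = crc32(0L, nullptr, 0);
public:
    void update(std::string_view chunk) {
        _crc = crc32(_crc,
                     reinterpret_cast<const unsigned char*>(chunk.data()),
                     static_cast<uInt>(chunk.size()));
    }
    void finish(uint32_t digest_on_disk) const {
        if (_crc != digest_on_disk) {
            throw std::runtime_error("sstable digest mismatch: data is corrupted");
        }
    }
};
```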
This patch fixes two issues in one go.
First, sstables::load currently clears the sharding metadata
(via open_data()), so scylla-sstable always prints
an empty array for it.
Second, printing token values would generate invalid JSON,
as they are currently printed as binary bytes; they
should be printed simply as numbers, as we do elsewhere,
for example, for the first and last keys.
Fixes #26982
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#26991
Modify test_batchlog_replay_failure_during_repair to also check
that there isn't data resurrection if flushing hints falls within
the repair cache timeout.
Migration manager depends on storage service. For instance,
it has a reload_schema_in_bg background task which calls
_ss.local() so it expects that storage service is not stopped
before it stops.
To solve this we use a permit approach, and during storage_service
stop:
- we ignore *new* code execution in migration_manager which would use
storage_service
- but delay storage_service shutdown until all *existing*
executions are done
Fixes scylladb/scylladb#26734
Backport: no need. The problem has existed for a very long time; the code restructuring in https://github.com/scylladb/scylladb/commit/389afcd (and following commits) made
it hit more often, as _ss was called earlier, but that restructuring hasn't been released yet.
Closes scylladb/scylladb#26779
* github.com:scylladb/scylladb:
service: attach storage_service to migration_manager using pluggable
service: migration_manager: coroutinize merge_schema_from
service: migration_manager: coroutinize reload_schema
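A minimal sketch of the permit approach using seastar::gate (the wiring names are illustrative):

```cpp
#include <functional>
#include <seastar/core/future.hh>
#include <seastar/core/gate.hh>

// migration_manager enters the gate before any code path that touches
// storage_service; storage_service::stop() closes the gate, which makes
// new entries throw gate_closed_exception (so *new* work is skipped)
// while waiting for all *existing* permit holders to finish.
seastar::gate _ss_permits;

seastar::future<> run_with_ss_permit(std::function<seastar::future<>()> work) {
    // Throws seastar::gate_closed_exception once storage_service is stopping.
    return seastar::with_gate(_ss_permits, std::move(work));
}

seastar::future<> storage_service_stop() {
    return _ss_permits.close(); // resolves when all permits are returned
}
```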
Previously, only nodes in the 'normal' state and decommissioning nodes
were included in the set of nodes participating in barrier and
barrier_and_drain commands. Joining nodes are not included because they
don't coordinate requests, given their cql port is closed.
However, joining nodes may receive mutations from other nodes, for which
they may generate and coordinate materialized view updates. If their
group0 state is not synchronized it could cause lost view updates.
For example:
1. On the topology coordinator, the join completes and the joining node
becomes normal, but the joining node's state lags behind. Since it's
not synchronized by the barrier, it could be in an old state such as
`write_both_read_old`.
2. A normal node coordinates a write and sends it to the new node as the
new replica.
3. The new node applies the base mutation but doesn't generate a view
update for it, because it calculates the base-view pairing according
to its own state and replication map, and determines that it doesn't
participate in the base-view pairing.
Therefore, since the joining node participates as a coordinator for view
updates, it should be included in these barriers as well. This ensures
that before the join completes, the joining node's state is
`write_both_read_new`, where it does generate view updates.
Fixes https://github.com/scylladb/scylladb/issues/26976
backport to previous versions since it fixes a bug in MV with vnodes
Closes scylladb/scylladb#27008
* github.com:scylladb/scylladb:
test: add mv write during node join test
topology_coordinator: include joining node in barrier
This PR refactors excluded nodes handling for tablets and topology. For tablets, a dedicated variable `topology::excluded_tablet_nodes` is introduced; for topology operations, the method get_excluded_nodes() is inlined into topology_coordinator and renamed to `get_excluded_nodes_for_topology_request`.
The PR improves code readability and efficiency; there are no behavior changes.
backport: this is a refactoring/optimization, no need to backport
Closes scylladb/scylladb#26907
* https://github.com/scylladb/scylladb:
topology_coordinator: drop unused exec_global_command overload
topology_coordinator: rename get_excluded_nodes -> get_excluded_nodes_for_topology_request
topology_state_machine: inline get_excluded_nodes
messaging_service: simplify and optimize ban_host
storage_service: topology_state_load: extract topology variable
topology_coordinator: excluded_tablet_nodes -> ignored_nodes
topology_state_machine: add excluded_tablet_nodes field
vector_search: Add backoff for failed nodes
Introduces logic to mark nodes that fail to answer an ANN request as
"down". Down nodes are omitted from further requests until they
successfully respond to a health check.
Health checks for down nodes are performed in the background using the
`status` endpoint, with an exponential backoff retry policy ranging
from 100ms to 20s.
Client list management is moved to separate files (clients.cc/clients.hh)
to improve code organization and modularity.
References: VECTOR-187.
Backport to 2025.4 as this feature is expected to be available in 2025.4.
Closes scylladb/scylladb#26308
* github.com:scylladb/scylladb:
vector_search: Set max backoff delay to 2x read request timeout
vector_search: Report status check exception via on_internal_error_noexcept
vector_search: Extract client management into dedicated class
vector_search: Add backoff for failed clients
vector_search: Make endpoint available
vector_search: Use std::expected for low-level client errors
vector_search: Extract client class
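A sketch of the retry policy described above: exponential backoff from 100ms up to a cap, reset once a health check succeeds. The 20s default cap mirrors the text; a later commit derives it from 2x the read request timeout.

```cpp
#include <algorithm>
#include <chrono>

using namespace std::chrono_literals;

class backoff_policy {
    std::chrono::milliseconds _delay = 100ms;
    std::chrono::milliseconds _max;
public:
    explicit backoff_policy(std::chrono::milliseconds max = 20s) : _max(max) {}

    // Delay to wait before the next background health check of a down node.
    std::chrono::milliseconds next() {
        auto d = _delay;
        _delay = std::min(_delay * 2, _max);
        return d;
    }

    // Called when the node answers the status endpoint again.
    void reset() { _delay = 100ms; }
};
```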
The service level controller relies on `auth::service` to collect
information about roles and the relation between them and the service
levels (those attached to them). Unfortunately, the service level
controller is initialized way earlier than `auth::service` and so we
had to prevent potential invalid queries of user service levels
(cf. 46193f5e79).
Unfortunately, that came at a price: it made the maintenance socket
incompatible with the current implementation of the service level
controller. The maintenance socket starts early, before the
`auth::service` is fully initialized and registered, and is exposed
almost immediately. If the user attempts to connect to Scylla within
this time window, via the maintenance socket, one of the things that
will happen is choosing the right service level for the connection.
Since the `auth::service` is not registered, Scylla will fail an
assertion and crash.
A similar scenario occurs when using maintenance mode. The maintenance
socket is how the user communicates with the database, and we're not
prepared for that either.
To avoid unnecessary crashes, we add new branches if the passed user is
absent or if it corresponds to the anonymous role. Since the role
corresponding to a connection via the maintenance socket is the anonymous
role, that solves the problem.
Some accesses to `auth::service` are not affected and we do not modify
those.
Fixes scylladb/scylladb#26816
Backport: yes. This is a fix of a regression.
Closes scylladb/scylladb#26856
* github.com:scylladb/scylladb:
test/cluster/test_maintenance_mode.py: Wait for initialization
test: Disable maintenance mode correctly in test_maintenance_mode.py
test: Fix keyspace in test_maintenance_mode.py
service/qos: Do not crash Scylla if auth_integration absent
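A sketch of the new branches, with hypothetical names and a plain optional-string user:

```cpp
#include <optional>
#include <string>

// Skip the auth::service-backed lookup when there is no authenticated user
// or the user is the anonymous role -- as is the case for the maintenance
// socket and maintenance mode -- and fall back to the default service level
// instead of failing an assertion.
std::optional<std::string> service_level_for(
        const std::optional<std::string>& user, bool auth_service_registered) {
    if (!user || *user == "anonymous") {
        return std::nullopt; // default service level, auth::service untouched
    }
    if (!auth_service_registered) {
        return std::nullopt; // auth not ready yet; don't assert
    }
    // ... normal path: query auth::service for the role's service level ...
    return std::nullopt;
}
```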
Remove bootstrap and decommission from allowed_repair_based_node_ops.
Using RBNO over streaming for these operations has no benefits, as they
are not exposed to the out-of-date replica problem that replace,
removenode and rebuild are.
On top of that, RBNO is known to have problems with empty user tables.
Using streaming for bootstrap and decommission is safe, and faster
than RBNO in all conditions, especially when the table is small.
One test needs adjustment as it relies on RBNO being used for all node
ops.
Fixes: #24664
Closes scylladb/scylladb#26330
The view building coordinator manages the process by sending RPC
requests to all nodes in the cluster, instructing them what to do.
If processing that message fails, the coordinator decides if it
wants to retry it or (temporarily) abandon the work.
An example of the latter scenario could be if one of the target nodes
dies and any attempts to communicate with it would fail.
Unfortunately, the current approach to it is not perfect and may result
in a storm of warnings, effectively clogging the logs. As an example,
take a look at scylladb/scylladb#26686: the gossiper failed to mark
one of the dead nodes as DOWN fast enough, and it resulted in a warning storm.
To prevent situations like that, we implement a form of backoff.
If processing an RPC message fails, we postpone finishing the task for
a second. That should reduce the number of messages in the logs and avoid
retries that are likely to fail as well.
We provide a reproducer test.
Fixes scylladb/scylladb#26686
Backport: this impacts users, so we should backport it to 2025.4.
Closes scylladb/scylladb#26729
* github.com:scylladb/scylladb:
test/cluster/mv: Clean up test_backoff_when_node_fails_task_rpc
db/view/view_building_coordinator: Rate limit logging failed RPC
db/view: Add backoff when RPC fails
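A sketch of the shape of the backoff (illustrative; the real code lives in the view building coordinator/worker):

```cpp
#include <chrono>
#include <exception>
#include <seastar/core/coroutine.hh>
#include <seastar/core/sleep.hh>

// If processing the coordinator's RPC fails, postpone finishing the task
// by a second before propagating the error, so the coordinator doesn't
// immediately re-issue a request that is likely to fail again and flood
// the logs with warnings.
seastar::future<> finish_task_after_failure(std::exception_ptr ep) {
    co_await seastar::sleep(std::chrono::seconds(1));
    std::rethrow_exception(ep);
}
```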
When dropping a column from a CDC log table, set the column drop
timestamp several seconds into the future.
If a value is written to a column concurrently with dropping that
column, the value's timestamp may be after the column drop timestamp. If
this value is also flushed to an SSTable, the SSTable would be
corrupted, because it considers the column missing after the drop
timestamp and doesn't allow values for it.
While this issue affects general tables, it especially impacts CDC tables
because this scenario can occur when writing to a table with CDC preimage
enabled while dropping a column from the base table. This happens even if
the base mutation doesn't write to the dropped column, because CDC log
mutations can generate values for a column even if the base mutation doesn't.
For general tables, this issue can be avoided by simply not writing to a
column while dropping it.
We fix this for the more problematic case of CDC log tables by setting
the column drop timestamp several seconds into the future, ensuring that
writes concurrent with column drops are much less likely to have
timestamps greater than the column drop timestamp.
Fixes https://github.com/scylladb/scylladb/issues/26340
The issue affects all previous releases; backport to improve stability.
Closes scylladb/scylladb#26533
* github.com:scylladb/scylladb:
test: test concurrent writes with column drop with cdc preimage
cdc: check if recreating a column too soon
cdc: set column drop timestamp in the future
migration_manager: pass timestamp to pre_create
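A sketch of the core idea, with an illustrative margin (the actual offset the patch uses may differ):

```cpp
#include <chrono>
#include <cstdint>

// Scylla timestamps are microseconds since the epoch (api::timestamp_type).
using api_timestamp = int64_t;

// When dropping a column from a CDC log table, place the drop timestamp a
// few seconds in the future, so concurrent writes -- whose timestamps are
// "now" -- stay below it and flushed sstables remain valid.
api_timestamp cdc_column_drop_timestamp() {
    auto now_us = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::system_clock::now().time_since_epoch()).count();
    constexpr int64_t margin_us = 5'000'000; // illustrative 5s margin
    return now_us + margin_us;
}
```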
Migration manager depends on storage service. For instance,
it has a reload_schema_in_bg background task which calls
_ss.local() so it expects that storage service is not stopped
before it stops.
To solve this we use a permit approach, and during storage_service
stop:
- we ignore *new* code execution in migration_manager which would use
storage_service
- but delay storage_service shutdown until all *existing*
executions are done
Fixes scylladb/scylladb#26734
Since the error is not printed to stdout, when working with multiple
files we don't know which sstable the error is associated with.
Closes scylladb/scylladb#27009
The maximum backoff delay for status checking now depends on the
`read_request_timeout_in_ms` configuration option. The delay is set
to twice the value of this parameter.
This exception should only occur due to internal errors, not client or external issues.
If it is triggered, it indicates an internal problem, so we report it
using on_internal_error_noexcept.