scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 10:00:35 +00:00

Author	SHA1	Message	Date
Kefu Chai	d7265a1bc2	storage_proxy: Prevent integer overflow in abstract_read_executor::execute Fix UBSan abort caused by integer overflow when calculating time difference between read and write operations. The issue occurs when: 1. The queried partition on replicas is not purgeable (has no recorded modified time) 2. Digests don't match across replicas 3. The system attempts to calculate timespan using missing/negative last_modified timestamps This change skips cross-DC repair optimization when write timestamp is negative or missing, as this optimization is only relevant for reads occurring within write_timeout of a write. Error details: ``` service/storage_proxy.cc:5532:80: runtime error: signed integer overflow: -9223372036854775808 - 1741940132787203 cannot be represented in type 'int64_t' (aka 'long') SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior service/storage_proxy.cc:5532:80 Aborting on shard 1, in scheduling group sl:default ``` Related to previous fix `39325cf` which handled negative read_timestamp cases. Fixes #23314 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#23359 (cherry picked from commit `ebf9125728`) Closes scylladb/scylladb#23387	2025-04-09 14:56:10 +03:00
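The guard described in this commit can be pictured with a minimal sketch. It is not the code in `service/storage_proxy.cc`; the names `missing_timestamp` and `write_to_read_delta` are hypothetical, and the only assumption taken from the message is that a missing write time shows up as a negative value (the UBSan report shows the minimum `int64_t`).

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical sentinel for "no recorded write time". The UBSan report in the
// commit message shows the minimum int64_t value taking part in the subtraction.
constexpr int64_t missing_timestamp = std::numeric_limits<int64_t>::min();

// Returns the time elapsed between the last write and the read, in the same
// units as the inputs. Returns std::nullopt when either timestamp is negative
// or missing, in which case the caller should skip the optimization.
std::optional<int64_t> write_to_read_delta(int64_t last_modified, int64_t read_timestamp) {
    if (last_modified < 0 || read_timestamp < 0) {
        return std::nullopt;
    }
    // Both operands are non-negative here, so the subtraction cannot overflow.
    return read_timestamp - last_modified;
}
```

Checking the sign before subtracting avoids the undefined behavior entirely, rather than detecting it after the fact.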
Dawid Mędrek	c56e47f72f	db/hints: Cancel draining when stopping node Draining hints may occur in one of the two scenarios: * a node leaves the cluster and the local node drains all of the hints saved for that node, * the local node is being decommissioned. Draining may take some time and the hint manager won't stop until it finishes. It's not a problem when decommissioning a node, especially because we want the cluster to retain the data stored in the hints. However, it may become a problem when the local node started draining hints saved for another node and now it's being shut down. There are two reasons for that: * Generally, in situations like that, we'd like to be able to shut down nodes as fast as possible. The data stored in the hints won't disappear from the cluster yet since we can restart the local node. * Draining hints may introduce flakiness in tests. Replaying hints doesn't have the highest priority and it's reflected in the scheduling groups we use as well as the explicitly enforced throughput. If there are a large number of hints to be replayed, it might affect our tests. It's already happened, see: scylladb/scylladb#21949. To solve those problems, we change the semantics of draining. It will behave as before when the local node is being decommissioned. However, when the local node is only being stopped, we will immediately cancel all ongoing draining processes and stop the hint manager. To amend for that, when we start a node and it initializes a hint endpoint manager corresponding to a node that's already left the cluster, we will begin the draining process of that endpoint manager right away. That should ensure all data is retained, while possibly speeding up the shutdown process. There's a small trade-off to it, though. If we stop a node, we can then remove it. It won't have a chance to replay hints it might've before these changes, but that's an edge case. We expect this commit to bring more benefit than harm. 
We also provide tests verifying that the implementation works as intended. Fixes scylladb/scylladb#21949 Closes scylladb/scylladb#22811 (cherry picked from commit `0a6137218a`) Closes scylladb/scylladb#23370	2025-04-03 09:09:05 +02:00
Ferenc Szili	cf147d8f85	truncate: create session during request handling Currently, the session ID under which the truncate for tablets request is running is created during the request creation and queuing. This is a problem because this could overwrite the session ID of any ongoing operation on system.topology#session This change moves the creation of the session ID for truncate from the request creation to the request handling. Fixes #22613 Closes scylladb/scylladb#22615 (cherry picked from commit `a59618e83d`) Closes scylladb/scylladb#22705	2025-02-06 10:09:00 +02:00
Gleb Natapov	d45ce6fa12	storage_proxy: translate ips to ids in forward array using gossiper We already use it to translate reply_to, so do it for consistency and to drop ip based API usage.	2025-01-16 16:37:08 +02:00
Gleb Natapov	0ec9f7de64	gossiper: drop get_unreachable_token_owners functions It is used by truncate code only and even there it only checks if the returned set is not empty. Check for dead token owners in the truncation code directly.	2025-01-16 16:37:07 +02:00
Gleb Natapov	ae8dc595e1	hints: move id to ip translation into store_hint() function Also use gossiper to translate instead of token_metadata since we want to get rid of ip based APIs there.	2025-01-16 16:37:06 +02:00
Gleb Natapov	2ea8df2cf5	storage_proxy: drop is_alive that works on ip since it is not used any more	2025-01-16 16:37:06 +02:00
Gleb Natapov	448282dc93	storage_proxy: use gossiper to map ip to host id in connection_dropped callback We want to drop ips from token_metadata so move to a different API to map ip to id.	2025-01-15 16:30:29 +02:00
Gleb Natapov	4d7c05ad82	hints: move create_hint_sync_point function to host ids One of its callers is in the RESTful API which gets ips from the user, so we convert ips to ids inside the API handler using gossiper before calling the function. We need to deprecate ip based API and move to host id based.	2025-01-15 16:30:28 +02:00
Kefu Chai	7215d4bfe9	utils: do not include unused headers these unused includes were identified by clang-include-cleaner. after auditing these source files, all of the reports have been confirmed. please note, because quite a few source files relied on `utils/to_string.hh` to pull in the specialization of `fmt::formatter<std::optional<T>>`, after removing `#include <fmt/std.h>` from `utils/to_string.hh`, we have to include `fmt/std.h` directly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2025-01-14 07:56:39 -05:00
Kamil Braun	48a4efba2f	Merge 'Fix possible data corruption due to token keys clashing in read repair.' from Sergey Zolotukhin This update addresses an issue in the mutation diff calculation algorithm used during read repair. Previously, the algorithm used `token` as the hashmap key. Since `token` is calculated based on the Murmur3 hash function, it could generate duplicate values for different partition keys, causing corruption in the affected rows' values. Fixes scylladb/scylladb#19101 Since the issue affects all the relevant scylla versions, backport to: 6.1, 6.2 Closes scylladb/scylladb#21996 * github.com:scylladb/scylladb: storage_proxy/read_repair: Remove redundant 'schema' parameter from `data_read_resolver::resolve` function. storage_proxy/read_repair: Use `partition_key` instead of `token` key for mutation diff calculation hashmap. test: Add test case for checking read repair diff calculation when having conflicting keys.	2025-01-13 10:54:34 +01:00
Michael Litvak	35316a40c8	service/storage_proxy: consider all replicas participating in write for MV backpressure replica writes are delayed according to the view update backlog in order to apply backpressure and reduce the rate of incoming base writes when the backlog is large, allowing slow replicas to catch up. previously the backlog calculation considered only the pending targets, excluding targets that replied successfully, probably due to confusion in the code. instead, we want to consider the backlog of all the targets participating in the write. Fixes scylladb/scylladb#21672 Closes scylladb/scylladb#21935	2025-01-08 12:03:26 +01:00
Sergey Zolotukhin	155480595f	storage_proxy/read_repair: Remove redundant 'schema' parameter from `data_read_resolver::resolve` function. The `data_read_resolver` class inherits from `abstract_read_resolver`, which already includes the `schema_ptr _schema` member. Therefore, using a separate function parameter in `data_read_resolver::resolve` initialized with the same variable in `abstract_read_executor` is redundant.	2025-01-03 10:04:13 +01:00
Sergey Zolotukhin	39785c6f4e	storage_proxy/read_repair: Use `partition_key` instead of `token` key for mutation diff calculation hashmap. This update addresses an issue in the mutation diff calculation algorithm used during read repair. Previously, the algorithm used `token` as the hashmap key. Since `token` is calculated based on the Murmur3 hash function, it could generate duplicate values for different partition keys, causing corruption in the affected rows' values. Fixes scylladb/scylladb#19101	2025-01-03 09:53:02 +01:00
Botond Dénes	7d42b80228	service/storage_proxy: data_read_resolver::resolve(): remove unneeded maybe_yield() We already have a yield in the loop via apply_gently(), the maybe_yield is superfluous so remove it. Follow-up to https://github.com/scylladb/scylladb/pull/21884 Closes scylladb/scylladb#21984	2025-01-02 16:13:29 +01:00
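To see why keying a map by a hash of the partition key can lose data, here is a small sketch. It is not ScyllaDB code: `toy_token` is a hypothetical stand-in with a deliberately tiny output range so that collisions are easy to trigger (real Murmur3 collisions are far rarer), and rows are simplified to string pairs.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical token function with only four possible outputs, so that
// distinct partition keys collide easily.
int64_t toy_token(const std::string& partition_key) {
    return static_cast<int64_t>(partition_key.size() % 4);
}

using row = std::pair<std::string, std::string>; // (partition key, value)

// Keyed by token: two different partition keys with the same token share one
// slot, and the later value silently overwrites the earlier one.
std::unordered_map<int64_t, std::string> index_by_token(const std::vector<row>& rows) {
    std::unordered_map<int64_t, std::string> index;
    for (const auto& [pk, value] : rows) {
        index[toy_token(pk)] = value;
    }
    return index;
}

// Keyed by the partition key itself: the map resolves hash collisions by
// comparing keys, so each partition keeps its own slot.
std::unordered_map<std::string, std::string> index_by_key(const std::vector<row>& rows) {
    std::unordered_map<std::string, std::string> index;
    for (const auto& [pk, value] : rows) {
        index[pk] = value;
    }
    return index;
}
```

With the rows `{"ab", "v1"}` and `{"cd", "v2"}`, both keys map to the same toy token, so the token-keyed index keeps a single entry while the key-keyed index keeps both.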
Benny Halevy	3a3df43799	storage_proxy: sort_endpoints_by_proximity: lookup my_id only if cannot sort by proximity topology::sort_by_proximity already sorts the local node address first, if present, so look it up only when using SimpleSnitch, where sort_by_proximity() is a no-op. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-12-24 12:19:20 +02:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Botond Dénes	1a717f3014	service/storage_proxy: data_resolver::resolve(): apply mutations gently The data resolver has to apply all mutations from all replicas to a single mutation. In the extreme case, when all rows are dead, the mutations can have around 10K rows in them. This is not a huge amount, but it is enough to cause moderate stalls of <20ms. To avoid this, use the gentle variant of apply(), which can yield in the middle. Fixes: scylladb/scylladb#21818 Closes scylladb/scylladb#21884	2024-12-18 15:21:19 +01:00
Botond Dénes	34a8b492be	Merge 'materialized view: make flow-control maximum delay configurable' from Piotr Dulikowski This pull request is continuation of scylladb/scylladb#20688 - contents of the main commit are the same, the only change is the additional commit with a test. Until this patch, the materialized view flow-control algorithm (https://www.scylladb.com/2018/12/04/worry-free-ingestion-flow-control/) used a constant delay_limit_us hard-coded to one second, which means that when the size of view-update backlog reached the maximum (10% of memory), we delay every request by an additional second - while smaller amounts of backlog will result in smaller delays. This hard-coded one-second maximum delay was considered huge - it will slow down a client with concurrency 1000 to just 1000 requests per second - but we already saw some workloads where it was not enough - such as a test workload running very slow reads at high concurrency on a slow machine, where a latency of over one second was expected for each read, so adding a one second latency for writes wasn't having any noticeable effect on slowing down the client. So this patch replaces the hard-coded default with a live-updateable configuration parameter, `view_flow_control_delay_limit_in_ms`, which defaults to 1000ms as before. Another useful way in which the new `view_flow_control_delay_limit_in_ms` can be used is to set it to 0. In that case, the view-update flow control always adds zero delay, and in effect - does absolutely nothing. This setting can be used in emergency situations where it is suspected that the MV flow control is not behaving properly, and the user wants to disable it. The new parameter's help string mentions both these use cases of the parameter. Fixes #18187 This is new functionality, no need to backport to any open source release.
Closes scylladb/scylladb#21647 * github.com:scylladb/scylladb: materialized views: test for the MV delay configuration parameter service: add injection for skipping view update backlog materialized view: make flow-control maximum delay configurable	2024-12-16 14:20:33 +02:00
muthu90tech	e49381119d	locator: topology: use node& instead of node* This change goes thru locator:topology to use node& instead of node* where nullptr is not possible. There are places where the node object is used in unordered_set, in those cases the node is wrapped in std::reference_wrapper. Fixes scylladb/scylladb#20357 Closes scylladb/scylladb#21863	2024-12-12 13:22:55 +01:00
Ferenc Szili	781f0a2397	storage_proxy: fix indentation and remove empty catch/rethrow This change fixes code indentation in storage_proxy::remote::send_truncate_blocking() It also removes an empty catch and rethrow block.	2024-12-09 16:38:50 +01:00
Ferenc Szili	4cd7a1acab	storage_proxy: use new TRUNCATE for tablets This change adds branching based on keyspace replication method, and uses the new TRUNCATE for keyspaces with tablets.	2024-12-09 16:38:50 +01:00
Ferenc Szili	93cfeb9160	truncate: make TRUNCATE a global topology operation This commit adds the code needed to create a TRUNCATE global topology request. It also adds the handler for this request to the topology coordinator. The execution of the truncate operation is not canceled on a timeout, but the query coordinator side will return a timeout error.	2024-12-09 16:38:37 +01:00
Nadav Har'El	49f11f655c	materialized view: make flow-control maximum delay configurable Until this patch, the materialized view flow-control algorithm (https://www.scylladb.com/2018/12/04/worry-free-ingestion-flow-control/) used a constant delay_limit_us hard-coded to one second, which means that when the size of view-update backlog reached the maximum (10% of memory), we delay every request by an additional second - while smaller amounts of backlog will result in smaller delays. This hard-coded one-second maximum delay was considered huge - it will slow down a client with concurrency 1000 to just 1000 requests per second - but we already saw some workloads where it was not enough - such as a test workload running very slow reads at high concurrency on a slow machine, where a latency of over one second was expected for each read, so adding a one second latency for writes wasn't having any noticeable effect on slowing down the client. So this patch replaces the hard-coded default with a live-updateable configuration parameter, `view_flow_control_delay_limit_in_ms`, which defaults to 1000ms as before. Another useful way in which the new `view_flow_control_delay_limit_in_ms` can be used is to set it to 0. In that case, the view-update flow control always adds zero delay, and in effect - does absolutely nothing. This setting can be used in emergency situations where it is suspected that the MV flow control is not behaving properly, and the user wants to disable it. The new parameter's help string mentions both these use cases of the parameter. Fixes #18187 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-12-05 09:51:56 +01:00
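The effect of a configurable delay limit can be illustrated with a rough sketch. The linear scaling below is an assumption made for illustration only; the message says just that a full backlog yields the maximum delay and smaller backlogs yield smaller delays. The function names are hypothetical and do not come from ScyllaDB.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical delay curve: scale the configured limit by the fraction of the
// backlog budget in use (0.0 = empty, 1.0 = full). Linear scaling is an
// assumption of this sketch, not the algorithm used by ScyllaDB.
std::chrono::microseconds view_update_delay(double backlog_fraction,
                                            std::chrono::milliseconds delay_limit) {
    const double f = std::clamp(backlog_fraction, 0.0, 1.0);
    const auto limit_us = std::chrono::duration_cast<std::chrono::microseconds>(delay_limit);
    return std::chrono::microseconds(
        static_cast<std::chrono::microseconds::rep>(limit_us.count() * f));
}

// Upper bound on client throughput when every request is delayed by `delay`
// and at most `concurrency` requests are in flight at once.
double max_requests_per_second(int concurrency, std::chrono::milliseconds delay) {
    if (delay.count() <= 0) {
        return std::numeric_limits<double>::infinity(); // no delay, no bound
    }
    return concurrency / std::chrono::duration<double>(delay).count();
}
```

With a limit of 1000 ms and a full backlog, the delay is one second, and a client with concurrency 1000 is capped at 1000 requests per second, matching the figure in the message. Setting the limit to 0 makes every computed delay zero, which disables the flow control in effect.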
Ferenc Szili	36d35d2297	RPC: add truncate_with_tablets RPC with frozen_topology_guard This change introduces a new truncate_with_tablets RPC with a parameter of type service::frozen_topology_guard. This is materialized on replica nodes into a topology_guard which guarantees that truncate is performed under a global session, which, in turn, makes sure that we don't execute truncate as a result of stale RPCs. Also, this RPC does not have a timeout. Timeout will be handled on the coordinator side, and the truncate operation will not be allowed to time out.	2024-12-04 11:30:07 +01:00
Ferenc Szili	7f29b7d8f6	storage_proxy: propagate group0 client and TSM dependency This commit makes storage_proxy::remote dependent on raft_group0_client and topology_state_machine. storage_proxy::remote gets references to these via the call to start_remote(). These references will be needed to call storage_service::truncate_table_with_tablets().	2024-12-04 11:30:06 +01:00
Gleb Natapov	7d751709e3	gossiper: change get_live_token_owners to return host ids Also amend the only user and drop the ip to id translation.	2024-12-02 10:31:13 +02:00
Gleb Natapov	20d1b80535	view: move view building to host id Use host ids in view building code as well.	2024-12-02 10:31:13 +02:00
Gleb Natapov	0ca14ef8b7	hints: use host id to send hints Drop address translation that is no longer needed. Templates here are used temporarily until another user of the function (MV) is converted as well.	2024-12-02 10:31:12 +02:00
Gleb Natapov	5b9e4c2f07	storage_proxy: remove id_vector_to_addr since it is no longer used Was needed during transition period only.	2024-12-02 10:31:12 +02:00
Gleb Natapov	6116751e44	db: consistency_level: change is_sufficient_live_nodes to work on host ids It is called from storage proxy which works on host ids now.	2024-12-02 10:31:12 +02:00
Gleb Natapov	eb3d2307ce	replication_strategy: move sanity_check_read_replicas to host id It is called from storage proxy which works on host ids now.	2024-12-02 10:31:12 +02:00
Gleb Natapov	ccbfabb858	db: consistency_level: move filter_for_query to host id It is called from storage proxy which works on host ids now.	2024-12-02 10:31:12 +02:00
Gleb Natapov	474b47ed22	database: move hits rates handling to host ids Hits rates map is now indexed by ip. Change it to be indexed by host id since this is what storage proxy uses now.	2024-12-02 10:31:12 +02:00
Gleb Natapov	d2cf5ca030	messaging_service: pass host id to connection_dropped handler if available RPC clients which are host id aware may pass the id to connection_dropped callback and save the need for translation.	2024-12-02 10:31:12 +02:00
Gleb Natapov	9f7183286a	storage_proxy: change batchlog to work on host ids It was not translated in the first pass.	2024-12-02 10:31:12 +02:00
Gleb Natapov	a1fdc8c847	storage_proxy: change mutation rpcs to send forward and reply addresses as host ids RPCs from old nodes will still use old format so translation will be used in this case. The change is backwards compatible thanks to RPC extensibility.	2024-12-02 10:31:12 +02:00
Gleb Natapov	cd9b349886	migration_manager: move to use host ids instead of ips Users also amended to pass ids instead of ips.	2024-12-02 10:31:12 +02:00
Gleb Natapov	12937aeb7f	storage_proxy: move to addressing nodes by host ids instead of ips In this rather large patch we move to address nodes in storage proxy by host ids instead of ips. Some subsystems storage proxy calls to are not yet converted to host ids, so we translate back and forth when we interact with them.	2024-12-02 10:31:11 +02:00
Gleb Natapov	020e8010e8	storage_proxy: remove unused function	2024-11-24 11:01:39 +02:00
Gleb Natapov	3d6fe7beb3	storage_proxy: co-routinize handle_paxos_prepare	2024-11-24 11:01:31 +02:00
Gleb Natapov	e337e5a3f6	storage_proxy: co-routinise handle_paxos_prune	2024-11-24 11:01:15 +02:00
Nadav Har'El	f23800181a	Merge 'Align Metric Family Descriptions' from Amnon Heiman Metrics families (e.g., all metrics with the same name but with different labels) should have the same description. The metric layer does not enforce that. Instead, it will use the first description provided. It's a minor issue but the results are different than what you expect. No need to backport. Closes scylladb/scylladb#19947 * github.com:scylladb/scylladb: service/storage_proxy.cc All metric groups should have the same description raft/server.cc: All metric groups should have the same description	2024-11-17 16:49:57 +02:00
Avi Kivity	d59038fa93	storage_proxy: convert boost range algorithms to std::ranges Standardize on a single range library. The changes are mostly mechanical. The only exception is boost::join, which has no analog in std::ranges (rightly so, since it cannot be implemented efficiently). A variety of tricks were used to convert it: - use std::ranges::join() on an std::array of std::span (when the inputs were all contiguous) - copy to a utils::small_vector (when it is expected that there will be no allocation) - use a small_vector of pointers and iterate+dereference that Closes scylladb/scylladb#21082	2024-10-15 16:52:27 +02:00
Sergey Zolotukhin	c373edab2d	Add conditions checking for get_read_executor During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking: - Condition checking in speculating read executors for the number of replicas. - Checking the consistency of the Effective Replication Map in get_endpoints_for_reading(): the map is considered incorrect if the number of read replica nodes is higher than replication factor. The check is applied only when built in non-release mode. Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs. Refs scylladb/scylladb#20625	2024-10-11 09:38:25 +02:00
Sergey Zolotukhin	ad93cf5753	Improve code readability in consistency_level.cc and storage_proxy.cc Add const correctness and rename some variables to improve code readability.	2024-10-11 09:38:25 +02:00
Pavel Emelyanov	7163fbcef5	Merge 'utils: replace dependency on boost ranges with <ranges>' from Avi Kivity To avoid depending on two similar libraries (boost ranges and std <ranges>), replace uses of the former with the latter. This series tackles the utils/ directory. Code cleanup, no backport. Closes scylladb/scylladb#20997 * github.com:scylladb/scylladb: utils: logalloc: replace boost with std utils: lsa: chunked_managed_vector: replace boost with std utils: config_file: replace boost with std utils: loading_cache: replace boost with std utils: fragment_range: replace boost with std utils: error_injector: replace boost with std utils: crc: replace boost for_each with built-in range for utils: class_registrator: replace boost with std utils: chunked_vector: replace boost with std utils: observable: replace boost with std	2024-10-09 16:04:48 +03:00
Gleb Natapov	d62fbd795b	storage_proxy: make sure there is no end iterator in _live_iterators array storage_proxy::cancellable_write_handlers_list::update_live_iterators assumes that iterators in _live_iterators can be dereferenced, but the code does not make any attempt to make sure this is the case. The iterator can be the end iterator which cannot be dereferenced. The patch makes sure that there is no end iterator in _live_iterators. Fixes scylladb/scylladb#20874 Closes scylladb/scylladb#20977	2024-10-08 13:16:27 +03:00
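The invariant this fix enforces, that a container of saved iterators never holds the end iterator, can be sketched on a plain `std::list`. This is an illustration under simplified assumptions, not the `cancellable_write_handlers_list` code; `erase_and_fix` and the element type are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

using handler_list = std::list<int>;

// Erase `victim` from `handlers` while keeping `live` safe to dereference.
// Saved iterators that point at the victim are moved to the next element;
// any that end up equal to end() are dropped instead of being kept.
void erase_and_fix(handler_list& handlers,
                   std::vector<handler_list::iterator>& live,
                   handler_list::iterator victim) {
    assert(victim != handlers.end());
    const auto next = std::next(victim);
    for (auto& it : live) {
        if (it == victim) {
            it = next; // may be end() if the victim was the last element
        }
    }
    handlers.erase(victim); // std::list::erase leaves other iterators valid
    live.erase(std::remove(live.begin(), live.end(), handlers.end()), live.end());
}
```

After the call, every iterator left in `live` refers to a real element, so later code can dereference it without checking.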
Avi Kivity	b259389a3e	utils: observable: replace boost with std	2024-10-07 21:11:07 +03:00
Benny Halevy	5a0f3889e0	treewide: use std::ranges sort functions rather than boost Using the standard library is preferred over boost. In cql3/expr/expression.cc to_sorted_vector got more of a face-lift and was modernized to also use std::unique and while at it, to move its input range in the uniquely sorted result vector. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-10-01 14:19:05 +03:00


1227 Commits