scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 02:50:33 +00:00

Author	SHA1	Message	Date
Lakshmi Narayanan Sreethar	5dffc19f2d	sstables_manager: introduce reclaim_memory_and_stop_tracking_sstable() When an sstable is unlinked or deactivated, it should be removed from the component memory tracking metrics and any further reload/reclaim should be disabled. This patch adds a new method that implements the above mentioned functionality. This patch also updates the deactivate() to use the new method. Next patch will use it to disable tracking when an sstable is unlinked. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-12-17 18:14:43 +05:30
Lakshmi Narayanan Sreethar	b7b4c5c661	sstables: introduce disable_component_memory_reload() Added a new method to disable reload of previously reclaimed components from the sstable. This will be used to disable reload of bloom filters after an sstable has been unlinked or deactivated. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-12-17 18:14:43 +05:30
Lakshmi Narayanan Sreethar	6ad962cb38	sstables_manager: log sstable name when reclaiming components Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-12-17 18:14:36 +05:30
Aleksandra Martyniuk	d0cda8ebef	replica: check enabled features in tablet_map_to_mutation Before adding a value to a new column in tablet_map_to_mutation check if the column is supported by the whole cluster. Closes scylladb/scylladb#21941	2024-12-17 07:02:11 +02:00
Botond Dénes	e6447f60c2	Merge 'db,auth,locator: Remove unused member variables' from Kefu Chai this issue was identified by clang-20. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#21835 * github.com:scylladb/scylladb: locator: remove unused member variable auth: remove unused member variable db: remove unused member variable	2024-12-16 15:16:17 +02:00
Kefu Chai	f2638c3d18	test: topology_custom: restrcuture comment as ordered list When investigating issue #21724, the docstring for `test_recover_stuck_raft_recovery` was found to be difficult to follow. Restructured the docstring into an ordered list to: 1. Improve readability 2. Clearly outline the test steps 3. Make the test's logic and flow more immediately comprehensible Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21728	2024-12-16 14:30:13 +02:00
Pavel Emelyanov	7db9132b56	test: Add validation of getting/changing compaction strategy via REST API The /column_family/compaction_strategy has GET and POST implemented, the latter changes the strategy on the table. Unknown strategy name implicitly renders internal server error code by catching exception from compaction_strategy::type() that tries to convert strategy name string to strategy enum class type. This is to finish validation of #21533 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21569	2024-12-16 14:28:23 +02:00
Botond Dénes	34a8b492be	Merge 'materialized view: make flow-control maximum delay configurable' from Piotr Dulikowski This pull request is continuation of scylladb/scylladb#20688 - contents of the main commit are the same, the only change is the additional commit with a test. Until this patch, the materialized view flow-control algorithm (https://www.scylladb.com/2018/12/04/worry-free-ingestion-flow-control/) used a constant delay_limit_us hard-coded to one second, which means that when the size of view-update backlog reached the maximum (10% of memory), we delay every request by an additional second - while smaller amounts of backlog will result in smaller delays. This hard-coded one maximum second delay was considered huge - it will slow down a client with concurrency 1000 to just 1000 requests per second - but we already saw some workloads where it was not enough - such as a test workload running very slow reads at high concurrency on a slow machine, where a latency of over one second was expected for each read, so adding a one second latecy for writes wasn't having any noticable affect on slowing down the client. So this patch replaces the hard-coded default with a live-updateable configuration parameter, `view_flow_control_delay_limit_in_ms`, which defaults to 1000ms as before. Another useful way in which the new `view_flow_control_delay_limit_in_ms` can be used is to set it to 0. In that case, the view-update flow control always adds zero delay, and in effect - does absolutely nothing. This setting can be used in emergency situations where it is suspected that the MV flow control is not behaving properly, and the user wants to disable it. The new parameter's help string mentions both these use cases of the parameter. Fixes #18187 This is new functionality, no need to backport to any open source release. Closes scylladb/scylladb#21647 * github.com:scylladb/scylladb: materialized views: test for the MV delay configuration parameter service: add injection for skipping view update backlog materialized view: make flow-control maximum delay configurable	2024-12-16 14:20:33 +02:00
Yaron Kaikov	2e6755ecca	.github/scripts/auto-backport.py: Add comment to PR when conflicts apply When we open a PR with conflicts, the PR owner gets a notification about the assignment but has no idea if this PR is with conflicts or not (in Scylla it's important since CI will not start on draft PR) Let's add a comment to notify the user we have conflicts Closes scylladb/scylladb#21939	2024-12-16 14:17:40 +02:00
Raphael S. Carvalho	013e0d53ff	replica: Fix use-after-free due to a race between split and cleanup There is an assumption that every destroyed compaction_group will be stopped first. Otherwise, the group is still referenced by compaction manager and can use it after freed. That's what happened in issue #21867 in the context of merge. The issue is pre-existing but was made more likely with merge. One problem is a race between split and cleanup, where if split is emitted while cleanup is stopping groups, it can happen split preparation adds new groups that will never be closed, since cleanup is already past the group stopping step. Another problem found is that split completion handler is not accounting for possible existence of merging groups, if split happens right after merge. Split completion handler should stop all empty groups that previously had data split from them. The problems will be fixed by guaranteeing that new groups will not be added for a tablet being migrated away, and that empty groups are properly closed when handling split completion. A reproducer was added. Fixes #21867. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#21920	2024-12-16 13:19:26 +02:00
Avi Kivity	fe9fcdfe30	task_manager.hh: replace boost ranges with std ranges Standardize on one range library to reduce dependency load. Unfortunately, std::views::concat (the replacement for boost::join), is C++26 only. We use two separate inserts to the result vector to compensate, and rationalize it by saying that boost::join() is likely slow due to the need for type-erasure. Closes scylladb/scylladb#21834	2024-12-16 13:08:02 +02:00
Artsiom Mishuta	e4dc86b552	fix(test.py): adjust break_manager method remove unnecessary _mark_dirty call server_broken_event - stop the whole file execution (prevent the next tests from running because Pyhon server object is broken PR: scylladb/scylladb#18236). and next file execution will create its new cluster so _mark_dirty will not change anything Closes scylladb/scylladb#21429	2024-12-16 11:24:03 +01:00
Botond Dénes	5880a1b90b	Merge 'tasks: add tablet migration virtual task' from Aleksandra Martyniuk In this change, tablet_virtual_task starts supporting tablet migration, in addition to tablet repair. Both tablet operations reuse the same virtual_task because their task data is retrieved similarly. However, it changes nothing from the task manager API users' perspective. They can list running migrations or check their statuses all the same as if migration had its own virtual_task. Users can see running migration tasks - finished tasks are not presented with the task manager API. However, the result of the migration (whether it succeeded or failed) would be presented to users, if they use wait API. If a migration was reverted, it will appear to users as failed. We assume that the migration was reverted, when its destination does not contain a tablet replica. Fixes: https://github.com/scylladb/scylladb/issues/21365. No backport, new feature Closes scylladb/scylladb#21729 * github.com:scylladb/scylladb: test: boost: check migration_task_info in tablet_test.cc replica: add repair related fields to tablet_map_to_mutation test: add tests to check the failed migration virtual tasks test: add tests to check the list of migration virtual tasks test: add tests to check migration virtual tasks status test: topology_tasks: generalize repair task functions service: extend tablet_virtual_task::abort service: extend tablet_virtual_task::wait service: extend tablet_virtual_task::get_status_helper service: extend tablet_virtual_task::contains service: extend tablet_virtual_task::get_stats service: tasks: make get_table_id a method of virtual_task_hint service: tasks: extend virtual_task_hint replica: service: add migration_task_info column to system.tablets locator: extend tablet_task_info to cover migration tasks locator: rename tablet_task_info methods	2024-12-13 10:54:03 +02:00
Yaron Kaikov	b4b7617554	github: check if PR is closed instead of merge In Scylla, we can have either `closed` or `merged` PRs. Based on that we decide when to start the backport process when the label was added after the PR is closed (or merged), In https://github.com/scylladb/scylladb/pull/21876 even when adding the proper backport label didn't trigger the backport automation. Https://github.com/scylladb/scylladb/pull/21809/ caused this, we should have left the `state=closed` (this includes both closed and merged PR) Fixing it Closes scylladb/scylladb#21906	2024-12-13 06:36:03 +02:00
Avi Kivity	0114e4c2ae	Update seastar submodule * seastar 72c7ac575...3133ecdd6 (12): > util/backtrace: Optimize formatter to reduce memory allocation overhead > scheduler: Report long queue stall > log: drop specialization of boost::lexical_cast for log_level > stall-detector: Remove unused _stall_detector_reports_per_minute > Merge 'when_all: add Sentinel support to when_all_succeed() ' from Kefu Chai > scripts/perftune.py: Implement AWS IMDSv2 call > net/tls: Add a way to disable certificate validation > tests: Improve websocket parser tests > scripts/stall-analyser: improve error messages on invalid input > reserve-memory: document that seastar just doesnt use the reserves > Merge 'Minor metrics memory optimizations' from Stephan Dollberg > json_formatter: Add support for standard range containers Closes scylladb/scylladb#21869	2024-12-12 18:30:54 +02:00
Gleb Natapov	34a4144a17	messaging_service: do not rely on address map to find an IP rpc client is connected to Store the endpoint ip address together with the client (note it may be different from the address the client is connected to in case preferable address is different). This allows up to drop lookup in the address map which may eventually fail if an endpoint was already deleted. Fixes: scylladb/scylladb#21840 Message-ID: <Z1mpMMe-o0ggBU_F@scylladb.com>	2024-12-12 18:10:58 +02:00
Avi Kivity	ecd78c88bf	Merge "move more verbs to idl" from Gleb " The series moves node ops, repair and streaming verbs to IDL. Also contains IDL related cleanups. In addition to the CI tested manually by bootstrapping a node with the series into a cluster of old nodes with repair and streaming both in gossiper and raft mode. This exercises repair, streaming and node_ops paths. " * 'gleb/move-more-rpcs-to-idl-v3' of github.com:scylladb/scylla-dev: repair: repair_flush_hints_batchlog_request::target_nodes is not used any more, so mark it as such streaming: move streaming verbs to IDL messaging_service: move repair verbs to IDL node_ops: move node_ops_cmd to IDL idl: rename partition_checksum.dist.hh to repair.dist.hh idl: move node_ops related stuff from the repair related IDL	2024-12-12 17:19:43 +02:00
muthu90tech	e49381119d	locator: topology: use node& instead of node* This change goes thru locator:topology to use node& instead of node* where nullptr is not possible. There are places where the node object is used in unordered_set, in those cases the node is wrapped in std::reference_wrapper. Fixes scylladb/scylladb#20357 Closes scylladb/scylladb#21863	2024-12-12 13:22:55 +01:00
Aleksandra Martyniuk	8943188442	test: boost: check migration_task_info in tablet_test.cc	2024-12-12 11:40:55 +01:00
Aleksandra Martyniuk	3f9c76c52d	replica: add repair related fields to tablet_map_to_mutation	2024-12-12 11:40:40 +01:00
Botond Dénes	05246e123d	Merge 'sstables: Avoid computing column_values_fixed_lengths on each read' from Tomasz Grabiec Reads which need sstable index were computing column_values_fixed_lengths each time. This showed up in perf profile for a sstable-read heavy workload, and amounted to about 1-2% of time. Computing it involves type name parsing. Avoid by using cached per-sstable mapping. There is already sstable::_column_translation which can be used for this. It caches the mapping for the least-recently used schema. Since the cursor uses the mapping only for primary key columns, which are stable, any schema will do, so we can use the last _column_translation. We only need to make sure that it's always armed, so sstable loading is augmented with arming with sstable's schema. Also, fixes a potential use-after-free on schema in column_translation. Closes scylladb/scylladb#21347 * github.com:scylladb/scylladb: sstables: Fix potential use-after-free on column_translation::column_info::name sstables: Avoid computing column_values_fixed_lengths on each read	2024-12-12 12:22:32 +02:00
Kefu Chai	714d12014e	sstable/mx: use subrange.advance() when appropriate Replace manual subrange advancement with the more concise and readable `subrange.advance()` method. This change: - Eliminates unnecessary subrange instance creation - Improves code readability - Reduces potential for unnecessary object allocation - Leverages the built-in `advance()` method for cleaner iterator handling The modification simplifies the iteration logic while maintaining the same functional behavior. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21865	2024-12-12 10:04:12 +02:00
Gleb Natapov	c095f63ea5	repair: repair_flush_hints_batchlog_request::target_nodes is not used any more, so mark it as such After `b3b3e880d3` target_nodes is not used by the receiver, so we can skip setting it on sender as well.	2024-12-11 18:26:57 +02:00
Gleb Natapov	92c2558a83	streaming: move streaming verbs to IDL	2024-12-11 18:26:50 +02:00
Aleksandra Martyniuk	bc17535427	test: add tests to check the failed migration virtual tasks	2024-12-11 15:17:16 +01:00
Aleksandra Martyniuk	be8dfd220f	test: add tests to check the list of migration virtual tasks	2024-12-11 15:17:16 +01:00
Aleksandra Martyniuk	b473efbefd	test: add tests to check migration virtual tasks status	2024-12-11 15:17:15 +01:00
Aleksandra Martyniuk	c81dcfc465	test: topology_tasks: generalize repair task functions Generalize repair task functions so that they can be reused for other tablet tasks.	2024-12-11 15:17:15 +01:00
Aleksandra Martyniuk	e0d3182fa0	service: extend tablet_virtual_task::abort Set migration tasks as non abortable.	2024-12-11 15:17:15 +01:00
Aleksandra Martyniuk	4c529a8f2e	service: extend tablet_virtual_task::wait Extend tablet_virtual_task::wait to support migration tasks. To decide what is a state of a finished migration virtual task (done or failed), the tablet replicas are checked. The task state is set to done, if the replicas contain the destination of a tablet migration.	2024-12-11 15:17:14 +01:00
Aleksandra Martyniuk	de191fb851	service: extend tablet_virtual_task::get_status_helper Extend tablet_virtual_task::get_status_helper to cover migration tasks. get_status_helper is used by get_status and wait methods. Waiting for a task in the latter will be modified in the following patch.	2024-12-11 15:17:14 +01:00
Aleksandra Martyniuk	50ce3d9106	service: extend tablet_virtual_task::contains Extend tablet_virtual_task::contains to check migration operations. Returned virtual_task_hint contains also tablet_id (only for migration tasks) and task_type. Return immediately from methods that do not support migration for non-repair task types. The methods' support for migration will be implemented in the following patches.	2024-12-11 15:17:14 +01:00
Aleksandra Martyniuk	53bd61a539	service: extend tablet_virtual_task::get_stats Extend tablet_virtual_task::get_stats to list migration tasks.	2024-12-11 15:17:13 +01:00
Aleksandra Martyniuk	215a15d103	service: tasks: make get_table_id a method of virtual_task_hint	2024-12-11 15:17:08 +01:00
Aleksandra Martyniuk	0caffd67f8	service: tasks: extend virtual_task_hint Extend virtual_task_hint to contain task_type and tablet_id. These fields would be used by tablet_virtual_task in the following patches.	2024-12-11 15:15:28 +01:00
Anna Stuchlik	98860905d8	doc: remove wrong image upgrade info (5.2-to-2023.1) This commit removes the information about the recommended way of upgrading ScyllaDB images - by updating ScyllaDB and OS packages in one step. This upgrade procedure is not supported (it was implemented, but then reverted). Refs https://github.com/scylladb/scylladb/issues/15733 Closes scylladb/scylladb#21876	2024-12-11 14:00:30 +02:00
Kefu Chai	03599477af	dht: include a smaller header file Replace `dht/sharder.hh` with a "smaller" header, which provides just the enough dependencies. in `f744007e`, we traded `database.hh` with a smaller set of headers. but it turns out `dht/sharder.hh` can be replaced with a even smaller one. because `dht::sharder` is defined by `dht/token-sharding.hh`, and what we need from `dht/sharder.hh` is this class's declaration. `clang-include-cleaner` identified this issue. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21881	2024-12-11 13:53:01 +02:00
Aleksandra Martyniuk	9fad3a621a	replica: service: add migration_task_info column to system.tablets Add migration_task_info column to system.tablets. Set migration_task_info value on migration request if the feature is enabled in the cluster. Reflect the column content in tablet_metadata.	2024-12-11 12:07:36 +01:00
Aleksandra Martyniuk	332347490c	locator: extend tablet_task_info to cover migration tasks	2024-12-11 12:07:36 +01:00
Aleksandra Martyniuk	dee6404aa4	locator: rename tablet_task_info methods	2024-12-11 12:07:36 +01:00
Michael Litvak	373855b493	service/qos/service_level_controller: update cache on startup Update the service level cache in the node startup sequence, after the service level and auth service are initialized. The cache update depends on the service level data accessor being set and the auth service being initialized. Before the commit, it may happen that a cache update is not triggered after the initialization. The commit adds an explicit call to update the cache where it is guaranteed to be ready. Fixes scylladb/scylladb#21763 Closes scylladb/scylladb#21773	2024-12-11 12:05:28 +01:00
Tomasz Grabiec	440a96605f	Merge 'topology_custom/test_tablets: add remove/replace tests for edge cases' from Benny Halevy Test cases related to #21826: 1. test_remove_failure_with_no_normal_token_owners_in_dc: attempts to remove a node with another node down in the datacenter, leaving no normal token owners in that dc (reproducing #21826). Removenode is expected to fail in this case since it should have no place to rebuild the removed node replicas, yet it currently succeeds unexpectedly. 2. test_remove_failure_then_replace: verify that removenode fails as expected when there are not enough nodes to rebuild its replicas on, with and without additional zero-token nodes. 3. test_replace_with_no_normal_token_owners_in_dc: verify that nodes can be replaced in a datacenter that has no live token owners, with and without additional zero-token nodes. Tablet replace uses all replicas to rebuild the lost replicas and therefore should succeed in the edge case. The restored data is verified as well. Refs #21826 * New tests, no backport needed Closes scylladb/scylladb#21827 * github.com:scylladb/scylladb: topology_custom/test_tablets: add remove/replace tests for edge cases test: pylib: _cluster_remove_node: log message on successful paths test: pylib: _cluster_remove_node: mark server as removed only when removenode succeeded	2024-12-11 12:04:14 +01:00
Kefu Chai	9f749487cd	main.cc: fix typos in comment Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21868	2024-12-11 08:42:41 +02:00
Benny Halevy	b95312064f	topology_custom/test_tablets: add remove/replace tests for edge cases Test cases related to #21826: 1. test_remove_failure_with_no_normal_token_owners_in_dc: attempts to remove a node with another node down in the datacenter, leaving no normal token owners in that dc (reproducing #21826). Removenode is expected to fail in this case since it should have no place to rebuild the removed node replicas, yet it currently succeeds unexpectedly. 2. test_remove_failure_then_replace: verify that removenode fails as expected when there are not enough nodes to rebuild its replicas on, with and without additional zero-token nodes. 3. test_replace_with_no_normal_token_owners_in_dc: verify that nodes can be replaced in a datacenter that has no live token owners, with and without additional zero-token nodes. Tablet replace uses all replicas to rebuild the lost replicas and therefore should succeed in the edge case. The restored data is verified as well. Refs #21826 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-12-10 21:39:15 +02:00
Tomasz Grabiec	8e60a0b831	Merge 'truncate: make TRUNCATE TABLE safe with tablets' from Ferenc Szili Currently truncating a table works by issuing an RPC to all the nodes which call `database::truncate_table_on_all_shards()`, which makes sure that older writes are dropped. It works with tablets, but is not safe. A concurrent replication process may bring back old data. This change makes makes TRUNCATE TABLE a topology operation, so that it excludes with other processes in the system which could interfere with it. More specifically, it makes TRUNCATE a global topology request. Backporting is not needed. Fixes #16411 Closes scylladb/scylladb#19789 * github.com:scylladb/scylladb: docs: docs: topology-over-raft: Document truncate_table request storage_proxy: fix indentation and remove empty catch/rethrow test: add tests for truncate with tablets storage_proxy: use new TRUNCATE for tablets truncate: make TRUNCATE a global topology operation storage_service: move logic of wait_for_topology_request_completion() RPC: add truncate_with_tablets RPC with frozen_topology_guard feature_service: added cluster feature for system.topology schema change system.topology_requests: change schema storage_proxy: propagate group0 client and TSM dependency	2024-12-10 17:50:50 +01:00
Kefu Chai	8d63d31e57	service: fix a typo in comment s/contraints/constraints/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21851	2024-12-10 15:58:49 +02:00
Gleb Natapov	fbfee9666e	locator: put real host id into the replication map for everywhere replication strategy Everywhere replication strategy returns zero host id in replica set instead of the real one if no tokens are configured yet in token metadata. It worked because code that translates ids to ips knows that zero host id is a special one, so putting zero there was equivalent to allow local access. But now we use host ids directly so we need to return real host id here to allow local access before token metadata is populated. Message-ID: <Z1hBHsEo4wYzzgvJ@scylladb.com>	2024-12-10 15:36:00 +02:00
Patryk Jędrzejczak	74dad7d1eb	raft: improve logs for abort while waiting for apply New logs allow us to easily distinguish two cases in which waiting for apply times out: - the node didn't receive the entry it was waiting for, - the node received the entry but didn't apply it in time. Distinguishing these cases simplifies reasoning about failures. The first case indicates that something went wrong on the leader. The second case indicates that something went wrong on the node on which waiting for apply timed out. As it turns out, many different bugs result in the `read_barrier` (which calls `wait_for_apply`) timeout. This change should help us in debugging bugs like these. We want to backport this change to all supported branches so that it helps us in all tests. Closes scylladb/scylladb#21855	2024-12-10 14:23:39 +01:00
Tomasz Grabiec	bf18a17bd6	tablets: scheduler: Fix temporary imbalance in a mixed-capacity cluster on decommission When tablet scheduler drains nodes, it chooses target location based on "badness" metric. Nodes with lowest score are preferred. Before the patch, the score which was used was the number of tablets on that node post-movement. This way we populate least-loaded node first. But this works only if nodes have equal number of shards. If nodes have different capacity, then number of tablets is not a good metric, because we don't aim to equalize per-node count, but per-shard count. We assume that each shard has equal capacity. Because of this bug, during decommission, the nodes with fewer shards would be preferred to receive replicas, which may lead to overloading of those nodes. This imbalance would be later fixed by the normal load balancing logic, but it's still problematic. Fixes #21783 Closes scylladb/scylladb#21860	2024-12-10 14:18:03 +02:00
Benny Halevy	eeb6d3dd74	test: pylib: _cluster_remove_node: log message on successful paths Log a message when removenode succeeded as expected or when it failed as expected with the `expected_error`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-12-10 11:55:27 +02:00

1 2 3 4 5 ...

45830 Commits