scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 21:47:10 +00:00

Author	SHA1	Message	Date
Piotr Dulikowski	ff51551a94	qos: use the shares field in service level reads/writes Now, the newly introduced `shares` field is used when service levels are either read from or written into system tables.	2025-01-02 07:13:34 +01:00
Piotr Dulikowski	ea25b29684	db/system_distributed_keyspace: add shares column and upgrade code Add the "shares" column to the system_distributed_keyspace.service_levels table, which is used by legacy code. Because this table is in a distributed and not local keyspace, adding the column to an existing cluster during rolling upgrade requires a bit of care. A callback is added to the workload prioritization cluster feature which runs when the feature becomes enabled and adds the column for all nodes in the cluster.	2025-01-02 07:13:34 +01:00
Piotr Dulikowski	346fc84c3e	db/system_keyspace: adjust SL schema for workload prioritization Add a "shares" column which holds the number of shares allocated to a given service level. It is not used by the code at all right now; subsequent commits will make good use of it.	2025-01-02 07:13:34 +01:00
Avi Kivity	76cf5148e1	Merge 'message: introduce advanced rpc compression' from Michał Chojnowski This is a forward port (from scylla-enterprise) of additional compression options (zstd, dictionaries shared across messages) for inter-node network traffic. It works as follows: After the patch, messaging_service (Scylla's interface for all inter-node communication) compresses its network traffic with compressors managed by the new advanced_rpc_compression::tracker. Those compressors compress with lz4, but can also be configured to use zstd as long as a CPU usage limit isn't crossed. A precomputed compression dictionary can be fed to the tracker. Each connection handled by the tracker will then start a negotiation with the other end to switch to this dictionary, and when it succeeds, the connection will start being compressed using that dictionary. All traffic going through the tracker is passed as a single merged "stream" through dict_sampler. dictionary_service has access to the dict_sampler. On chosen nodes (in the "usual" configuration: the Raft leader), it uses the sampler to maintain a random multi-megabyte sample of the sampler's stream. Every several minutes, it copies the sample, trains a compression dictionary on it (by calling zstd's training library via the alien_worker thread) and publishes the new dictionary to system.dicts via Raft's write_mutation command. This update triggers (eventually) a callback on all nodes, which feeds the new dictionary to advanced_rpc_compression::tracker, and this switches (eventually) all inter-node connections to this dictionary. 
Closes scylladb/scylladb#22032 * github.com:scylladb/scylladb: messaging_service: use advanced_rpc_compression::tracker for compression message/dictionary_service: introduce dictionary_service service: make Raft group 0 aware of system.dicts db/system_keyspace: add system.dicts utils: add advanced_rpc_compressor utils: add dict_trainer utils: introduce reservoir_sampling utils: introduce alien_worker utils: add stream_compressor	2024-12-31 15:02:57 +02:00
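The "random sample of the sampler's stream" described in this merge can be maintained in bounded memory with reservoir sampling. The following is a minimal sketch of the classic Algorithm R over whole items, under the assumption that each stream element is sampled as a unit; it is not the code in `utils/reservoir_sampling`, and `reservoir_sampler` and its members are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical illustration of reservoir sampling (Algorithm R).
// Keeps a uniform random sample of at most `capacity` items from a stream
// of unknown length, using memory proportional to `capacity` only.
class reservoir_sampler {
    std::vector<int> _sample;
    std::size_t _capacity;
    std::uint64_t _seen = 0;
    std::mt19937_64 _rng;
public:
    reservoir_sampler(std::size_t capacity, std::uint64_t seed)
        : _capacity(capacity), _rng(seed) {
        _sample.reserve(capacity);
    }

    void add(int item) {
        ++_seen;
        if (_sample.size() < _capacity) {
            // The reservoir is not full yet: keep every item.
            _sample.push_back(item);
            return;
        }
        // Pick a slot uniformly in [0, _seen). The new item replaces an old
        // one only if the slot falls inside the reservoir, which happens
        // with probability capacity / seen.
        std::uniform_int_distribution<std::uint64_t> dist(0, _seen - 1);
        const std::uint64_t slot = dist(_rng);
        if (slot < _capacity) {
            _sample[slot] = item;
        }
    }

    const std::vector<int>& sample() const { return _sample; }
    std::uint64_t seen() const { return _seen; }
};
```

Feeding every observed message through `add()` and periodically copying `sample()` mirrors the "copy the sample, then train on it" step at a very coarse level; the real service samples bytes rather than integers.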
Michał Chojnowski	fdb2d2209c	messaging_service: use advanced_rpc_compression::tracker for compression This patch sets up an `alien_worker`, `advanced_rpc_compression::tracker`, `dict_sampler` and `dictionary_service` in `main()`, and wires them to each other and to `messaging_service`. `messaging_service` compresses its network traffic with compressors managed by the `advanced_rpc_compression::tracker`. All this traffic is passed as a single merged "stream" through `dict_sampler`. `dictionary_service` has access to `dict_sampler`. On chosen nodes (by default: the Raft leader), it uses the sampler to maintain a random multi-megabyte sample of the sampler's stream. Every several minutes, it copies the sample, trains a compression dictionary on it (by calling zstd's training library via the `alien_worker` thread) and publishes the new dictionary to `system.dicts` via Raft. This update triggers a callback into `advanced_rpc_compression::tracker` on all nodes, which updates the dictionary used by the compressors it manages.	2024-12-27 10:17:58 +01:00
Kefu Chai	6acc5294a4	treewide: migrate from boost::copy_range to std::ranges::to now that we are allowed to use C++23. we now have the luxury of using `std::ranges::to`. in this change, we: - replace `boost::copy_range` with `std::ranges::to` - remove unused `#include` of boost headers Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21880	2024-12-26 11:46:26 +02:00
Michał Chojnowski	cc15ca329e	db/system_keyspace: add system.dicts Adds a new system table which will act as the medium for distributing compression dictionaries over the cluster. This table will be managed by Raft (group 0). It will be hooked up to it in follow-up commits.	2024-12-23 23:37:02 +01:00
Avi Kivity	eb62593f2c	treewide: use angle brackets when including seastar headers We treat Seastar as a "system" library, and those are included with angle brackets. Closes scylladb/scylladb#21959	2024-12-20 16:16:28 +02:00
Aleksandra Martyniuk	ee4bd287fd	node_ops: rename a method that gets node ops entries	2024-12-20 12:25:48 +01:00
Aleksandra Martyniuk	a7fc566c7e	node_ops: filter topology_requests entries Currently node_ops_virtual_task shows stats of all system.topology_request entries. However, the table also contains info about non-node_ops requests, e.g. truncate. Filter the entries used by node_ops_virtual_task by their type. With this change bootstrap of the first node will not be visible. Update the test accordingly.	2024-12-20 12:20:42 +01:00
Kefu Chai	93be8f3a0c	db,sstables: migrate boost::range::stable_partition to std library now that we are allowed to use C++23. we now have the luxury of using `std::ranges::stable_partition`. in this change, we: - replace `boost::range::stable_partition()` with `std::ranges::stable_partition()` - since `std::ranges::stable_partition()` returns a subrange instead of an iterator, change the names of variables which were previously used for holding the return value of `boost::range::stable_partition()` accordingly for better readability. - remove unused `#include` of boost headers Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21911	2024-12-19 14:56:07 +02:00
Wojciech Mitros	37a25d3af4	mv: avoid stalls when calculating affected clustering ranges Currently, when finishing db::view::calculate_affected_clustering_ranges we deoverlap, transform and copy all ranges prepared before. This is all done within a single continuation and can cause stalls. We fix this by adding yields after each transform and moving elements to the final vector one by one instead of copying them all at the end. After this change, the longest continuation in this code will be deoverlapping the initial ranges (and one transform). While it has a relatively high computational complexity (we sort all ranges), it should execute quickly because we're operating on views there and we don't need to copy the actual bytes. If we encounter a stall there, we'll need to implement an asynchronous `deoverlap` method. Fixes scylladb/scylladb#21843 Closes scylladb/scylladb#21846	2024-12-19 12:50:30 +01:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Botond Dénes	e6447f60c2	Merge 'db,auth,locator: Remove unused member variables' from Kefu Chai this issue was identified by clang-20. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#21835 * github.com:scylladb/scylladb: locator: remove unused member variable auth: remove unused member variable db: remove unused member variable	2024-12-16 15:16:17 +02:00
Botond Dénes	34a8b492be	Merge 'materialized view: make flow-control maximum delay configurable' from Piotr Dulikowski This pull request is a continuation of scylladb/scylladb#20688 - contents of the main commit are the same, the only change is the additional commit with a test. Until this patch, the materialized view flow-control algorithm (https://www.scylladb.com/2018/12/04/worry-free-ingestion-flow-control/) used a constant delay_limit_us hard-coded to one second, which means that when the size of view-update backlog reached the maximum (10% of memory), we delay every request by an additional second - while smaller amounts of backlog will result in smaller delays. This hard-coded one-second maximum delay was considered huge - it will slow down a client with concurrency 1000 to just 1000 requests per second - but we already saw some workloads where it was not enough - such as a test workload running very slow reads at high concurrency on a slow machine, where a latency of over one second was expected for each read, so adding a one-second latency for writes wasn't having any noticeable effect on slowing down the client. So this patch replaces the hard-coded default with a live-updateable configuration parameter, `view_flow_control_delay_limit_in_ms`, which defaults to 1000ms as before. Another useful way in which the new `view_flow_control_delay_limit_in_ms` can be used is to set it to 0. In that case, the view-update flow control always adds zero delay, and in effect - does absolutely nothing. This setting can be used in emergency situations where it is suspected that the MV flow control is not behaving properly, and the user wants to disable it. The new parameter's help string mentions both these use cases of the parameter. Fixes #18187 This is new functionality, no need to backport to any open source release. 
Closes scylladb/scylladb#21647 * github.com:scylladb/scylladb: materialized views: test for the MV delay configuration parameter service: add injection for skipping view update backlog materialized view: make flow-control maximum delay configurable	2024-12-16 14:20:33 +02:00
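The "concurrency 1000 to just 1000 requests per second" figure in this merge follows from Little's law for a closed-loop client. The sketch below only reproduces that arithmetic; `max_requests_per_second` is a hypothetical helper, not a ScyllaDB function, and it assumes the added delay dominates the request latency.

```cpp
#include <chrono>
#include <limits>

// Hypothetical helper: upper bound on the request rate of a client that
// keeps `concurrency` requests in flight, each taking `latency` to complete
// (Little's law: rate = concurrency / latency).
double max_requests_per_second(unsigned concurrency,
                               std::chrono::milliseconds latency) {
    if (latency.count() <= 0) {
        // With no delay at all this simple model imposes no bound.
        return std::numeric_limits<double>::infinity();
    }
    const double seconds = std::chrono::duration<double>(latency).count();
    return concurrency / seconds;
}
```

With a 1000 ms delay and 1000 requests in flight the bound is 1000 requests per second. If each read already takes more than a second, as in the slow-read workload described above, one extra second of write delay changes this bound comparatively little.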
muthu90tech	e49381119d	locator: topology: use node& instead of node* This change goes thru locator:topology to use node& instead of node* where nullptr is not possible. There are places where the node object is used in unordered_set, in those cases the node is wrapped in std::reference_wrapper. Fixes scylladb/scylladb#20357 Closes scylladb/scylladb#21863	2024-12-12 13:22:55 +01:00
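Storing references in a hash set, as this commit describes, needs a wrapper because containers cannot hold plain references. The sketch below is one way to do it under stated assumptions: `node` is a stand-in struct, and hashing and equality are based on object identity (address). The commit message does not say how ScyllaDB hashes its wrapped nodes.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Stand-in for the real node type; only here to make the sketch compile.
struct node {
    std::string host_id;
};

// Assumption: two references are "equal" when they refer to the same object.
struct node_ref_hash {
    std::size_t operator()(std::reference_wrapper<const node> n) const noexcept {
        return std::hash<const node*>{}(&n.get());
    }
};

struct node_ref_equal {
    bool operator()(std::reference_wrapper<const node> a,
                    std::reference_wrapper<const node> b) const noexcept {
        return &a.get() == &b.get();
    }
};

// A set of non-null node references; nullptr cannot be represented at all.
using node_set = std::unordered_set<std::reference_wrapper<const node>,
                                    node_ref_hash, node_ref_equal>;
```

Elements are inserted with `std::cref(n)`, and the referenced nodes must outlive the set.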
Tomasz Grabiec	8e60a0b831	Merge 'truncate: make TRUNCATE TABLE safe with tablets' from Ferenc Szili Currently truncating a table works by issuing an RPC to all the nodes which call `database::truncate_table_on_all_shards()`, which makes sure that older writes are dropped. It works with tablets, but is not safe. A concurrent replication process may bring back old data. This change makes TRUNCATE TABLE a topology operation, so that it is mutually exclusive with other processes in the system which could interfere with it. More specifically, it makes TRUNCATE a global topology request. Backporting is not needed. Fixes #16411 Closes scylladb/scylladb#19789 * github.com:scylladb/scylladb: docs: docs: topology-over-raft: Document truncate_table request storage_proxy: fix indentation and remove empty catch/rethrow test: add tests for truncate with tablets storage_proxy: use new TRUNCATE for tablets truncate: make TRUNCATE a global topology operation storage_service: move logic of wait_for_topology_request_completion() RPC: add truncate_with_tablets RPC with frozen_topology_guard feature_service: added cluster feature for system.topology schema change system.topology_requests: change schema storage_proxy: propagate group0 client and TSM dependency	2024-12-10 17:50:50 +01:00
Kefu Chai	ce2f80c227	treewide: migrate from boost::make_iterator_range to ranges::subrange Replace boost::make_iterator_range() with std::ranges::subrange. This change improves code modernization and reduces external dependencies: - Replace boost::make_iterator_range() with std::ranges::subrange - Remove boost/range/iterator_range.hpp include - Improve iterator type detection in interval.hh using std::ranges::const_iterator_t<Range> This is part of ongoing efforts to modernize our codebase and minimize external dependencies. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21787	2024-12-09 21:31:53 +02:00
Kefu Chai	48c8d24345	treewide: drop support for fmt < v10 since fedora 38 is EOL. and fedora 39 comes with fmt v10.0.0, also, we've switched to the build image based on fedora 40, which ships fmt-devel v10.2.1, there is no need to support fmt < 10. in this change, we drop support for fmt < 10. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21847	2024-12-09 20:42:38 +02:00
Michael Litvak	53224d90be	service/qos: increase timeout of internal get_service_levels queries The function get_service_levels is used to retrieve all service levels and it is called from multiple different contexts. Importantly, it is called internally from the context of group0 state reload, where it should be executed with a long timeout, similarly to other internal queries, because a failure of this function affects the entire group0 client, and a longer timeout can be tolerated. The function is also called in the context of the user command LIST SERVICE LEVELS, and perhaps other contexts, where a shorter timeout is preferred. The commit introduces a function parameter to indicate whether the context is internal or not. For internal context, a long timeout is chosen for the query. Otherwise, the timeout is shorter, the same as before. When the distinction is not important, a default value is chosen which maintains the same behavior. The main purpose is to fix the case where the timeout is too short and causes a failure that propagates and fails the group0 client. Fixes scylladb/scylladb#20483 Closes scylladb/scylladb#21748	2024-12-09 13:20:32 +01:00
Kefu Chai	fea0548b44	db: remove unused member variable this issue was identified by clang-20: ``` /home/kefu/.local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/build -isystem /home/kefu/dev/scylladb/seastar/include -isystem /home/kefu/dev/scylladb/build/Debug/seastar/gen/include -isystem /usr/include/p11-kit-1 -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -std=gnu++23 -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -DSEASTAR_API_LEVEL=7 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_SSTRING -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_DEBUG -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEBUG_PROMISE -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_TYPE_ERASE_MORE -DFMT_SHARED -DWITH_GZFILEOP -MD -MT db/CMakeFiles/db.dir/Debug/hints/resource_manager.cc.o -MF db/CMakeFiles/db.dir/Debug/hints/resource_manager.cc.o.d -o db/CMakeFiles/db.dir/Debug/hints/resource_manager.cc.o -c /home/kefu/dev/scylladb/db/hints/resource_manager.cc In file included from /home/kefu/dev/scylladb/db/hints/resource_manager.cc:9: /home/kefu/dev/scylladb/db/hints/resource_manager.hh:130:29: error: private field '_proxy' is not used [-Werror,-Wunused-private-field] 130 \| service::storage_proxy& _proxy; \| ^ 1 error generated. 
``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-12-09 10:31:09 +08:00
Avi Kivity	9024e4940c	counters.hh: drop unused boost includes Re-add them to source files that need them. Closes scylladb/scylladb#21738	2024-12-05 12:27:41 +02:00
Nadav Har'El	49f11f655c	materialized view: make flow-control maximum delay configurable Until this patch, the materialized view flow-control algorithm (https://www.scylladb.com/2018/12/04/worry-free-ingestion-flow-control/) used a constant delay_limit_us hard-coded to one second, which means that when the size of view-update backlog reached the maximum (10% of memory), we delay every request by an additional second - while smaller amounts of backlog will result in smaller delays. This hard-coded one-second maximum delay was considered huge - it will slow down a client with concurrency 1000 to just 1000 requests per second - but we already saw some workloads where it was not enough - such as a test workload running very slow reads at high concurrency on a slow machine, where a latency of over one second was expected for each read, so adding a one-second latency for writes wasn't having any noticeable effect on slowing down the client. So this patch replaces the hard-coded default with a live-updateable configuration parameter, `view_flow_control_delay_limit_in_ms`, which defaults to 1000ms as before. Another useful way in which the new `view_flow_control_delay_limit_in_ms` can be used is to set it to 0. In that case, the view-update flow control always adds zero delay, and in effect - does absolutely nothing. This setting can be used in emergency situations where it is suspected that the MV flow control is not behaving properly, and the user wants to disable it. The new parameter's help string mentions both these use cases of the parameter. Fixes #18187 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-12-05 09:51:56 +01:00
Ferenc Szili	3ac44109e3	system.topology_requests: change schema This commit adds the new column in the system.topology_requests table which are needed for the new global topology request.	2024-12-04 11:30:06 +01:00
Avi Kivity	841481c202	Merge "move storage proxy and adjacent services to identify hosts by ids" from Gleb " This rather large patch series moves storage proxy and some adjacent services (like migration manager) to use host ids to identify nodes rather than ips. Messaging service gains a capability to address nodes by host ids (which allows dropping translations from topology coordinator code that worked on host ids already) and also makes sure that a node with incorrect host id will reject a message (can happen during address changes). The series gets rid of the raft address map completely and replaces it with the gossiper address map which is managed by the gossiper since translation is now done in the layer below raft. Fixes: scylladb/scylladb#6403 perf-simple-query -- smp 1 -m 1G output Before: enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 64336.82 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41291 insns/op, 24485 cycles/op, 0 errors) 62669.58 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41277 insns/op, 24695 cycles/op, 0 errors) 69172.12 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41326 insns/op, 24463 cycles/op, 0 errors) 56706.60 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41143 insns/op, 24513 cycles/op, 0 errors) 56416.65 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41186 insns/op, 24851 cycles/op, 0 errors) throughput: mean=61860.35 standard-deviation=5395.48 median=62669.58 median-absolute-deviation=5153.75 maximum=69172.12 minimum=56416.65 instructions_per_op: mean=41244.62 standard-deviation=76.90 median=41276.94 median-absolute-deviation=58.55 maximum=41326.19 minimum=41142.80 cpu_cycles_per_op: mean=24601.35 standard-deviation=167.39 median=24512.64 median-absolute-deviation=116.65 maximum=24851.45 minimum=24462.70 After: enable-cache=1 Running test with 
config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 65237.35 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 40733 insns/op, 23145 cycles/op, 0 errors) 59283.09 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40624 insns/op, 23948 cycles/op, 0 errors) 70851.03 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40625 insns/op, 23027 cycles/op, 0 errors) 70549.61 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40650 insns/op, 23266 cycles/op, 0 errors) 68634.96 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40622 insns/op, 22935 cycles/op, 0 errors) throughput: mean=66911.21 standard-deviation=4814.60 median=68634.96 median-absolute-deviation=3638.40 maximum=70851.03 minimum=59283.09 instructions_per_op: mean=40650.89 standard-deviation=47.55 median=40624.60 median-absolute-deviation=27.11 maximum=40733.37 minimum=40622.33 cpu_cycles_per_op: mean=23264.16 standard-deviation=402.12 median=23145.29 median-absolute-deviation=237.63 maximum=23947.96 minimum=22934.59 CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13531/ SCT (longevity-100gb-4h with nemesis_selector: ['topology_changes']): https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/gleb/job/move-to-host-id/3/ Tested mixed cluster manually. 
" * 'gleb/move-to-host-id-v2' of github.com:scylladb/scylla-dev: (55 commits) group0: drop unused field from replace_info struct test: rename raft_address_map_test to address_map_test and move if from raft tests raft_address_map: remove raft address map topology coordinator: do not modify expire state for left/new nodes any more in raft address map topology coordinator: drop expiring entries in gossiper address map on error injections since raft one is no longer used group0: drop raft address map dependency from raft_rpc group0: move raft_ticker_type definition from raft_address_map.hh storage_service: do not update raft address map on gossiper events group0: drop raft address map dependency from raft_server_with_timeouts group0: move group0 upgrade code to host ids repair: drop raft address map dependency group0: remove unused raft address map getter from raft_group0 group0: drop raft address map from group0_state_machine dependency since it is not used there any more group0: remove dependency on raft address map from group0_state_id_handler gossiper: add get_application_state_ptr that searches by host_id gossiper: change get_live_token_owners to return host ids view: move view building to host id hints: use host id to send hints storage_proxy: remove id_vector_to_addr since it is no longer used db: consistency_level: change is_sufficient_live_nodes to work on host ids ...	2024-12-03 18:18:48 +02:00
Kefu Chai	bab12e3a98	treewide: migrate from boost::adaptors::transformed to std::views::transform now that we are allowed to use C++23. we now have the luxury of using `std::views::transform`. in this change, we: - replace `boost::adaptors::transformed` with `std::views::transform` - use `fmt::join()` when appropriate where `boost::algorithm::join()` is not applicable to a range view returned by `std::views::transform`. - use `std::ranges::fold_left()` to accumulate the range returned by `std::views::transform` - use `std::ranges::fold_left()` to get the maximum element in the range returned by `std::views::transform` - use `std::ranges::min()` to get the minimal element in the range returned by `std::views::transform` - use `std::ranges::equal()` to compare the range views returned by `std::views::transform` - remove unused `#include <boost/range/adaptor/transformed.hpp>` - use `std::ranges::subrange()` instead of `boost::make_iterator_range()`, to feed `std::views::transform()` a view range. to reduce the dependency on boost for better maintainability, and leverage standard library features for better long-term support. this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. limitations: there are still a couple places where we are still using `boost::adaptors::transformed` due to the lack of a C++23 alternative for `boost::join()` and `boost::adaptors::uniqued`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21700	2024-12-03 09:41:32 +02:00
Kefu Chai	99de3962c3	db/schema_applier: Fix spelling annotations to pass codespell checks This commit addresses inconsistent spelling annotations that triggered codespell warnings in our codebase. Problem: - Previous annotations like "CREATEing" and "DROPing" were flagged as misspellings by the codespell workflow - These annotations were used to describe CQL statement execution contexts Solution: - Updated annotations to "CREAT'ing" and "DROP'ing" - Preserves the intent of the original annotations - Silences codespell warnings without changing the underlying meaning - Ensures consistent and spell-checker-friendly code documentation Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21741	2024-12-03 09:01:26 +02:00
Gleb Natapov	fbaf0a3cce	group0: move group0 upgrade code to host ids Drop unneeded ip to id translation.	2024-12-02 10:31:13 +02:00
Gleb Natapov	20d1b80535	view: move view building to host id Use host ids in view building code as well.	2024-12-02 10:31:13 +02:00
Gleb Natapov	0ca14ef8b7	hints: use host id to send hints Drop address translation that is no longer needed. Templates here are used temporarily until another user of the function (MV) is converted as well.	2024-12-02 10:31:12 +02:00
Gleb Natapov	6116751e44	db: consistency_level: change is_sufficient_live_nodes to work on host ids It is called from storage proxy which works on host ids now.	2024-12-02 10:31:12 +02:00
Gleb Natapov	ccbfabb858	db: consistency_level: move filter_for_query to host id It is called from storage proxy which works on host ids now.	2024-12-02 10:31:12 +02:00
Gleb Natapov	474b47ed22	database: move hit rates handling to host ids The hit rates map is currently indexed by ip. Change it to be indexed by host id since this is what storage proxy uses now.	2024-12-02 10:31:12 +02:00
Gleb Natapov	12937aeb7f	storage_proxy: move to addressing nodes by host ids instead of ips In this rather large patch we move to addressing nodes in storage proxy by host ids instead of ips. Some subsystems that storage proxy calls into are not yet converted to host ids, so we translate back and forth when we interact with them.	2024-12-02 10:31:11 +02:00
Kefu Chai	f436edfa22	mutation: remove unused "#include"s these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. please note, because `mutation/mutation.hh` does not include `seastar/coroutine/maybe_yield.hh` anymore, and quite a few source files were relying on this header to bring in the declaration of `maybe_yield()`, we have to include this header in the places where this symbol is used. the same applies to `seastar/core/when_all.hh`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-29 14:01:44 +08:00
Piotr Smaron	a49ed7074d	Update in-memory ks.metadata.init_tablets after ALTER KS Once e.g. `ALTER KEYSPACE` is performed, all in-memory objects should be updated accordingly, but this is not entirely true for keyspace metadata object. The reason for that is that keyspace metadata are stored in 2 system tables: `system_schema.keyspaces` and `system_schema.scylla_keyspaces`. Up until now the in-memory keyspace metadata object has been updated only with entries from the first table, and missed updates when entries from the 2nd table changed. These entries were e.g. initial tablets or storage options. This change fixes this oversight by considering both tables when checking if keyspace metadata need to be updated. From the implementation point of view, the change is simple: we're considering `system_schema.scylla_keyspaces` also in `merge_keyspaces()` and if old and new schemas have any differences, we include that when altering ks. Fixes #20768 Backport: no need, I don't think the issue is severe, atm it seems like it can only influence the tablets number, which should not bring the cluster down nor result in returning bad data, it can mostly influence the speed of the db. Closes scylladb/scylladb#20852	2024-11-28 13:46:32 +01:00
Dawid Mędrek	7cce9a8f64	db/hints: Prevent dereferencing a null pointer Before these changes, we dereferenced `app_state` in `manager::endpoint_downtime_not_bigger_than()` before checking that it's not a null pointer. We fix that. Fixes scylladb/scylladb#21699 Closes scylladb/scylladb#21676	2024-11-28 11:31:57 +01:00
Kefu Chai	5e391eee25	treewide: use coroutine::parallel_for_each(range) when appropriate `coroutine::parallel_for_each` accepts both a range and a pair of iterators. let's use the former when appropriate. it is simpler this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21684	2024-11-27 21:00:47 +02:00
Botond Dénes	ccb433d767	Merge 'tasks: add api_task_ttl for tasks started with API' from Aleksandra Martyniuk When users start an operation asynchronously with API, they are expected to check the operation's status. Hence, the status should be kept in task manager for reasonable time after the operation is done. The operations that are started internally usually don't need to stay in task manager for that long. Add api_task_ttl that will be used for tasks started with API. By default it's 1 hour. The time for which non-API tasks stay in task manager isn't changed. Fixes: #21499. Refs: #21425. No backport needed - previous versions may use task_ttl Closes scylladb/scylladb#21505 * github.com:scylladb/scylladb: test: add test to check user_task_ttl tasks: api: move make_task method docs: nodetool: update backup and restore commands docs docs: update task manager docs nodetool: add nodetool tasks user-ttl command node_ops: use user task ttl for node ops virtual task tasks: use user_task_ttl for tasks started by user api: task_manager: add /task_manager/user_ttl to get and set user task ttl tasks: add task_manager::task::is_user_task method tasks: keep updateable_value of task_ttl in task manager db: config: add user_task_ttl_seconds named value	2024-11-27 09:57:57 +02:00
Ernest Zaslavsky	793f2c95d1	snapshots: Stop taking snapshots of MVs Stop taking snapshots of MVs and allow taking snapshots of individual tables; now one can take a snapshot of any base table, view or index. Also add tests to cover the new cases, both a boost test (using cc code) and a pytest (using the API). Also, update the documentation to reflect the change. fixes: #21339 fixes: #20760 Closes scylladb/scylladb#21433	2024-11-26 15:27:30 +02:00
Kefu Chai	a5ee0c896b	treewide: migrate from boost::adaptors::filtered to std::views::filter Modernize the codebase by replacing Boost range adaptors with C++23 standard library views, reducing external dependencies and leveraging modern C++ language features. Key Changes: - Replace `boost::adaptors::filtered` with `std::views::filter` - Remove `#include <boost/range/adaptor/filtered.hpp>` - Utilize standard library range views Motivation: - Reduce project's external dependency footprint - Leverage standard library's range and view capabilities - Improve long-term code maintainability - Align with modern C++ best practices Implementation Challenges and Considerations: 1. Range Conversion and Move Semantics - `std::ranges::to` adaptor requires rvalue references - Necessitated updates to variable and parameter constness - Example: `cql3/restrictions/statement_restrictions.cc` modified to remove `const` from `common` to enable efficient range conversion 2. Range Iteration and Mutation - Range views may mutate internal state during iteration - Cannot pass ranges by const reference in some scenarios - Solution: Pass ranges by rvalue reference to explicitly indicate state invalidation Limitations: - One instance of `boost::adaptors::filtered` temporarily preserved due to lack of a C++23 alternative for `boost::join()` - A comprehensive replacement will be addressed in a follow-up change This change is part of our ongoing effort to modernize the codebase, reducing external dependencies and adopting modern C++ practices. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21648	2024-11-26 14:26:50 +02:00
Aleksandra Martyniuk	292d00463a	tasks: add task_manager::task::is_user_task method	2024-11-25 14:21:53 +01:00
Aleksandra Martyniuk	1bf073704c	db: config: add user_task_ttl_seconds named value Add user_task_ttl_seconds config option and keep the value in task manager. In the following patches tasks started by user will be kept in task manager for user_task_ttl_seconds after they are finished.	2024-11-25 14:16:06 +01:00
Asias He	9ace191616	repair: Enable small table optimization for RBNO bootstrap and decommission The non local strategy system keyspaces usually contain very little data. All the tables within them have to be repaired for all the token ranges, which could be large in clusters with a large number of nodes. In multiple DC setup, the repair in RBNO is dominated by the network latency. As a result, it takes a long time to repair those tables even if they are almost empty. To speed up the RBNO bootstrap, especially for starting empty clusters, this patch enables small table optimization for RBNO for system tables. We could enable it for small user tables as a follow up. Tests: 1) A 5ms latency is added to simulate cross dc network delay, 256 tokens per node, 10 nodes: - Before topology_custom dev topology_custom.test_boot_time.1 1287.06s - After topology_custom dev topology_custom.test_boot_time.1 12.48s The test shows 100X boot time improvement 2) A SCT test to bootstrap 3 DCs, 3 nodes in each DC. - Before Time to bootstrap = 1h23m - After Time to bootstrap = 13m The test shows 6X bootstrap time improvement Fixes #19131	2024-11-25 13:46:17 +08:00
Dawid Mędrek	f913ae571f	db/view: Don't generate view updates for unselected columns The semantics of Scylla's materialized views may vary depending on how their primary keys correspond to the base table's one. One of the differences is how we handle writes to columns in the base table that are not selected by a view: * Case 1: The view's PK is a permutation of the base table's PK: Since the view's primary key cannot be changed in an update, a row in the view remains alive as long as the corresponding row in the base table is alive. The tricky part comes when the base table has columns that are NOT selected by the view. CQL3 used to not allow for defining a table that didn't have any other columns besides its primary key. Also, when inserting a row into a table, it was mandatory to provide at least one value aside from the primary key. At some point it changed [1] and the implementation of the solution relied on the notion of the row marker. Putting the details aside, consider the following scenario: (i) the base table has a primary key consisting of columns c_1, ..., c_k, and it has regular columns rc_1, ..., rc_n, (ii) the primary key of an MV defined on that table consists of a permutation of c_1, ..., c_k. The MV doesn't select at least one of the regular columns of the base table. Without loss of generality, let that unselected column be rc_1. (iii) the base table has a row R whose only non-null value is the one in the regular column rc_1. Now, what will R correspond to in the MV? The base table doesn't have a row marker, but all of its regular columns in the MV will be NULLs. That's NOT allowed. To solve that problem, all unselected columns have corresponding virtual columns in the MV; the only information they provide is whether there is a value in the base table or not. This way, the MV knows if a row is still alive or not. 
For that reason, we send view updates to virtual columns in the following cases: (i) the value in the column changes from NULL to a value, i.e. it's created, (ii) the value in the column exists, but its TTL has been updated. * Case 2: The view's PK has one more column than the base table's one: Since the primary key of the view has a regular column C from the base table, it is guaranteed that if there's a row in the MV, the corresponding row in the base table can remain alive: since C is part of the view's PK, it must have a value, so the row in the base table has a value in C too. The problem with virtual columns from the previous case doesn't manifest in this one. The liveness of the cell in C determines the liveness of the whole row in the view. The semantics gets more complex, but the conclusion is this: in case 1, virtual columns exist and we may need to generate view updates for them, while in case 2 virtual columns do NOT exist and so we don't generate view updates for them. What changes in this patch is that we adjust the code accordingly. If a view has a regular column from the base table as part of its primary key, we no longer emit view updates when we change a column unselected by that view. It is purely an OPTIMIZATION change. [1]: https://issues.apache.org/jira/browse/CASSANDRA-4361 Fixes scylladb/scylladb#21652 Closes scylladb/scylladb#21653	2024-11-24 19:01:28 +02:00
Avi Kivity	29497f8c5d	Merge 'Automatically compute schema version of system tables' from Tomasz Grabiec Schema of system tables is defined statically and table_schema_version needs to be explicitly set in code like this: ``` builder.with_version(system_keyspace::generate_schema_version(table_id, version_offset)); ``` Whenever schema is changed, the schema version needs to change, otherwise we hit undefined behavior when trying to interpret mutation data created with the old schema using the new schema. It's not obvious that one needs to do that and developers often forget to do that. There were several instances of mistakes of omission, some caught during review, some not, e.g.: `31ea74b96e`. This patch changes definitions to call the new `schema_builder::with_hash_version()`, which will make the schema builder compute version from schema definition so that changes of the schema will automatically change the version. This way we no longer rely on the developer to remember to bump the version offset. All nodes should arrive at the same version, which is verified by existing `test_group0_schema_versioning` and a new unit test: `test_system_schema_version_is_stable`. Closes scylladb/scylladb#21602 * github.com:scylladb/scylladb: system_tables: Compute schema version automatically schema_builder: Introduce with_hash_version() schema: Store raw_view_info in schema::raw_schema schema: Remove dead comment hashing: Add hasher for unordered_map hashing: Add hasher for unique_ptr hashing: Add hasher for double [avi: add missing include <memory> to hashing.hh]	2024-11-24 18:44:32 +02:00
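The idea behind deriving a schema version from the schema definition, as the merge above describes, can be sketched as follows. This is only an illustration under stated assumptions: `table_def`, `column_def`, `hash_version`, and the choice of 64-bit FNV-1a are all hypothetical, and the sketch does not reproduce how `schema_builder::with_hash_version()` actually hashes a schema. A fixed, explicitly specified hash is used instead of `std::hash` because every node has to arrive at the same value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, heavily simplified schema description.
struct column_def {
    std::string name;
    std::string type;
};

struct table_def {
    std::string keyspace;
    std::string name;
    std::vector<column_def> columns;
};

// 64-bit FNV-1a over the bytes of `s`, continuing from state `h`.
inline std::uint64_t fnv1a(std::string_view s, std::uint64_t h) {
    for (unsigned char c : s) {
        h ^= c;
        h *= 0x100000001b3ULL;
    }
    return h;
}

// Feeds one field followed by a NUL separator, so that ("ab", "c") and
// ("a", "bc") do not hash to the same value.
inline std::uint64_t feed(std::string_view field, std::uint64_t h) {
    h = fnv1a(field, h);
    return fnv1a(std::string_view("\0", 1), h);
}

// Derives a version from the definition itself: any change to the columns
// changes the result without a manually bumped offset.
std::uint64_t hash_version(const table_def& t) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a offset basis
    h = feed(t.keyspace, h);
    h = feed(t.name, h);
    for (const auto& c : t.columns) {
        h = feed(c.name, h);
        h = feed(c.type, h);
    }
    return h;
}

// Hypothetical usage: adding a column yields a different version.
void example_usage() {
    table_def before{"system", "service_levels", {}};
    before.columns.push_back({"service_level", "text"});
    table_def after = before;
    after.columns.push_back({"shares", "int"});
    assert(hash_version(before) == hash_version(before));
    assert(hash_version(before) != hash_version(after));
}
```

The design point the sketch tries to capture is that the version is a pure function of the definition, so two nodes running the same code compute the same version, and a developer cannot forget to change it after editing the schema.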
Tomasz Grabiec	0d2583600d	Merge 'Add tablet repair scheduler support' from Asias He This adds a new tablet migration kind: repair. It allows tablet repair scheduler to use this migration kind to schedule repair jobs. The current repair scheduler implementation does the following: - A tablet is picked to be repaired when it is requested by user - The tablet repair can be scheduled along with tablet migration and rebuild. It runs in the tablet_migration track. - Repair jobs are scheduled in a smart way so that at any point in time, there are no more than configured jobs per shard, which is similar to scylla manager's control. New feature. No backport is needed. Closes scylladb/scylladb#21088 * github.com:scylladb/scylladb: test: Add tests for tablet repair scheduler repair: Add restful API for tablet repair repair: Add tablet repair scheduler internal API support docs: Update system_keyspace.md for tablet repair related info docs: Add docs for tablet repair migration repair: Add core tablet repair scheduler support messaging_service: Introduce TABLET_REPAIR verb tablet_allocator: Introduce stream_weight for tablet_migration_streaming_info network_topology_strategy: Preserve fields of task_info in reallocate_tablets	2024-11-20 13:28:17 +01:00
Asias He	b71a563030	repair: Add core tablet repair scheduler support This adds a new tablet migration kind: repair. It allows tablet repair scheduler to use this migration kind to schedule repair jobs. The current repair scheduler implementation does the following: - A tablet is picked to be repaired when the time since last repair is bigger than a threshold (auto repair mode) or it is requested by user (manual repair mode) - The tablet repair can be scheduled along with tablet migration and rebuild. It runs in the tablet_migration track. - Repair jobs are scheduled in a smart way so that at any point in time, there are no more than configured jobs per shard, which is similar to scylla manager's control. In this patch, both the manual repair and the auto repair are not enabled yet.	2024-11-20 09:42:41 +08:00
Kefu Chai	33a0e5b892	treewide: replace boost::find_if with std::ranges::find_if now that we are allowed to use C++23. we now have the luxury of using `std::ranges::find_if`. in this change, we: - replace `boost::find_if` with `std::ranges::find_if` - remove all `#include <boost/range/algorithm/find_if.hpp>` to reduce the dependency to boost for better maintainability, and leverage standard library features for better long-term support. this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-19 10:50:01 +08:00
Nadav Har'El	e639434a89	change remaining sstring_view to std::string_view Our "sstring_view" is an historic alias for the standard std::string_view. The patch changes the last remaining random uses of this old alias across our source directory to the standard type name. After this patch, there are no more uses of the "sstring_view" alias. It will be removed in the following patch. Refs #4062. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-11-18 16:48:57 +02:00


4972 Commits