scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-01 21:55:50 +00:00

Author	SHA1	Message	Date
Michał Chojnowski	d45dec49d3	view_updating_consumer: make buffer limit a variable The limit doesn't change at runtime, but this patch makes it a variable for unit testing purposes.	2023-07-11 10:45:58 +02:00
Michał Chojnowski	32645e6e3e	view: fix range tombstone handling on flushes in view_updating_consumer View update routines accept `mutation` objects. But what comes out of staging sstable readers is a stream of mutation_fragment_v2 objects. To build view updates after a repair/streaming, we have to convert the fragment stream into `mutation`s. This is done by piping the stream to mutation_rebuilder_v2. To keep memory usage limited, the stream for a single partition might have to be split into multiple partial `mutation` objects. view_update_consumer does that, but in an improper way -- when the split/flush happens inside an active range tombstone, the range tombstone isn't closed properly. This is illegal, and triggers an internal error. This patch fixes the problem by closing the active range tombstone (and reopening it at the same position in the next `mutation` object). The tombstone is closed just after the last seen clustered position. This is not necessary for correctness -- for example we could delay all processing of the range tombstone until we see its end bound -- but it seems like the most natural semantics. Fixes #14503	2023-07-11 10:45:58 +02:00
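The close-and-reopen behaviour described in the commit above can be illustrated with a toy model. This is only a sketch under stated assumptions: `position`, `fragment`, `chunk` and `split_partition` are hypothetical stand-ins, not the real `mutation_fragment_v2` or `view_updating_consumer` types, and the flush is triggered by a fragment count rather than by memory usage.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy clustering position: weight -1 = before key, 0 = at key, +1 = after key.
struct position { int key; int weight; };

// Toy fragment: either a row or a range tombstone change.
// For a range tombstone change, tombstone == 0 closes the active range,
// any other value opens (or replaces) it.
struct fragment { bool is_rt_change; position pos; int tombstone; };

using chunk = std::vector<fragment>;

// Split one partition's fragment stream into chunks holding at most `limit`
// input fragments. If a split happens while a range tombstone is active,
// the tombstone is closed just after the last seen position and reopened
// at that same position in the next chunk.
std::vector<chunk> split_partition(const std::vector<fragment>& stream, std::size_t limit) {
    assert(limit > 0);
    std::vector<chunk> chunks;
    chunk current;
    int active = 0;              // currently open tombstone, 0 if none
    std::size_t buffered = 0;    // input fragments in `current`
    position last{0, -1};
    for (const fragment& f : stream) {
        if (buffered == limit) {
            const position cut{last.key, 1};   // "just after" the last position
            if (active != 0) {
                current.push_back({true, cut, 0});       // close before flushing
            }
            chunks.push_back(std::move(current));
            current.clear();
            if (active != 0) {
                current.push_back({true, cut, active});  // reopen in the next chunk
            }
            buffered = 0;
        }
        if (f.is_rt_change) {
            active = f.tombstone;
        }
        current.push_back(f);
        last = f.pos;
        ++buffered;
    }
    if (!current.empty()) {
        chunks.push_back(std::move(current));
    }
    return chunks;
}
```

In this model every chunk is self-contained: no chunk ends with an open range tombstone, which is the property the commit message says the unfixed code violated.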
Botond Dénes	488d36f77e	Merge 'doc: fix rollback in the 4.3-to-2021.1, 5.0-to-2022.1, and 5.1-to-2022.2 upgrade guides' from Anna Stuchlik This PR fixes the Restore System Tables section of the upgrade guides by adding a command to clean upgraded SStables during rollback or adding the entire section to restore system tables (which was missing from the older documents). This PR fixes a bug and must be backported to branch-5.3, branch-5.2, and branch-5.1. Refs: https://github.com/scylladb/scylla-enterprise/issues/3046 - [x] 5.1-to-2022.2 - update command (backport to branch-5.3, branch-5.2, and branch-5.1) - [x] 5.0-to-2022.1 - add "Restore system tables" to rollback (backport to branch-5.3, branch-5.2, and branch-5.1) - [x] 4.3-to-2021.1 - add "Restore system tables" to rollback (backport to branch-5.3, branch-5.2, and branch-5.1) (see https://github.com/scylladb/scylla-enterprise/issues/3046#issuecomment-1604232864) Closes #14444 * github.com:scylladb/scylladb: doc: fix rollback in 4.3-to-2021.1 upgrade guide doc: fix rollback in 5.0-to-2022.1 upgrade guide doc: fix rollback in 5.1-to-2022.2 upgrade guide (cherry picked from commit `8a7261fd70`)	2023-07-10 15:17:07 +03:00
Raphael S. Carvalho	3dd0cb3221	Make off-strategy compaction wait for view building completion Prior to off-strategy compaction, streaming / repair would place staging files into main sstable set, and wait for view building completion before they could be selected for regular compaction. The reason for that is that view building relies on table providing a mutation source without data in staging files. Had regular compaction mixed staging data with non-staging one, table would have a hard time providing the required mutation source. After off-strategy compaction, staging files can be compacted in parallel to view building. If off-strategy completes first, it will place the output into the main sstable set. So a parallel view building (on sstables used for off-strategy) may potentially get a mutation source containing staging data from the off-strategy output. That will mislead view builder as it won't be able to detect changes to data in main directory. To fix it, we'll do what we did before. Filter out staging files from compaction, and trigger the operation only after we're done with view building. We're piggybacking on off-strategy timer for still allowing the off-strategy to only run at the end of the node operation, to reduce the amount of compaction rounds on the data introduced by repair / streaming. Fixes #11882. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11919 (cherry picked from commit `a57724e711`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14365	2023-07-10 15:12:24 +03:00
Raphael S. Carvalho	e7a5c13aab	compaction: avoid excessive reallocation during input list formatting With off-strategy, input list size can be close to 1k, which will lead to unneeded reallocations when formatting the list for logging. In the past, we faced stalls in this area, and excessive reallocation (log2 ~1k = ~10) may have contributed to that. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13907 (cherry picked from commit `5544d12f18`) Fixes scylladb/scylladb#14071	2023-07-09 23:55:12 +03:00
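The general technique of avoiding repeated reallocation while building a long list for logging can be sketched as follows. This is an illustration only: `format_input_list` is a hypothetical helper, and the commit message does not say whether the actual patch reserves a string, a vector, or something else.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Format a list of names as "[a,b,c]". The output size is computed up front
// so the string is allocated once instead of growing (and reallocating)
// roughly log2(n) times as elements are appended.
std::string format_input_list(const std::vector<std::string>& names) {
    std::size_t total = 2;  // opening and closing brackets
    for (const std::string& n : names) {
        total += n.size() + 1;  // element plus a separator
    }
    std::string out;
    out.reserve(total);
    out += '[';
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i != 0) {
            out += ',';
        }
        out += names[i];
    }
    out += ']';
    return out;
}
```

With about 1k inputs, a single `reserve()` replaces the roughly ten growth steps mentioned in the commit message.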
Marcin Maliszkiewicz	da5926c080	docs: link general repairs page to RBNO page Information was duplicated before and the version on this page was outdated - RBNO is enabled for replace operation already. Closes #12984 (cherry picked from commit `bd7caefccf`)	2023-07-07 16:39:19 +02:00
Raphael S. Carvalho	f588b46f55	table: Optimize creation of reader excluding staging for view building View building from staging creates a reader from scratch (memtable + sstables - staging) for every partition, in order to calculate the diff between new staging data and data in base sstable set, and then pushes the result into the view replicas. perf shows that the reader creation is very expensive: + 12.15% 10.75% reactor-3 scylla [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes + 10.01% 9.99% reactor-3 scylla [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 8.95% 8.94% reactor-3 scylla [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator() + 7.29% 7.28% reactor-3 scylla [.] dht::ring_position_tri_compare + 6.28% 6.27% reactor-3 scylla [.] dht::tri_compare + 4.11% 3.52% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+ 4.09% 4.07% reactor-3 scylla [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state + 3.46% 0.93% reactor-3 scylla [.] sstables::sstable_run::will_introduce_overlapping + 2.53% 2.53% reactor-3 libstdc++.so.6 [.] std::_Rb_tree_increment + 2.45% 2.45% reactor-3 scylla [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 2.14% 2.13% reactor-3 scylla [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 2.07% 2.07% reactor-3 scylla [.] 
logalloc::region_impl::free + 2.06% 1.91% reactor-3 scylla [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()#1}::operator()() const::{lambda()#1}::operator() + 2.04% 2.04% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+ 1.87% 0.00% reactor-3 [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe + 1.86% 0.00% reactor-3 [kernel.kallsyms] [k] do_syscall_64 + 1.39% 1.38% reactor-3 libc.so.6 [.] __memcmp_avx2_movbe + 1.37% 0.92% reactor-3 scylla [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables:: + 1.34% 1.33% reactor-3 scylla [.] logalloc::region_impl::alloc_small + 1.33% 1.33% reactor-3 scylla [.] seastar::memory::small_pool::add_more_objects + 1.30% 0.35% reactor-3 scylla [.] seastar::reactor::do_run + 1.29% 1.29% reactor-3 scylla [.] seastar::memory::allocate + 1.19% 0.05% reactor-3 libc.so.6 [.] syscall + 1.16% 1.04% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst + 1.07% 0.79% reactor-3 scylla [.] sstables::partitioned_sstable_set::insert That shows some significant amount of work for inserting sstables into the interval map and maintaining the sstable run (which sorts fragments by first key and checks for overlapping). 
The interval map is known for having issues with L0 sstables, as it will have to be replicated almost to every single interval stored by the map, causing terrible space and time complexity. With enough L0 sstables, it can fall into quadratic behavior. This overhead is fixed by not building a new fresh sstable set when recreating the reader, but rather supplying a predicate to sstable set that will filter out staging sstables when creating either a single-key or range scan reader. This could have another benefit over today's approach which may incorrectly consider a staging sstable as non-staging, if the staging sst wasn't included in the current batch for view building. With this improvement, view building was measured to be 3x faster. from INFO 2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s to INFO 2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s Refs #14089. Fixes #14244. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14476	2023-07-06 10:32:31 +03:00
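The idea of passing a predicate to the existing sstable set, rather than building a fresh set without staging sstables for every reader, can be sketched like this. All names here (`sstable`, `sstable_predicate`, `select_for_reader`) are hypothetical simplifications and do not reflect the real `sstable_set` interface.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy sstable descriptor; the real class is far richer.
struct sstable {
    std::string name;
    bool staging;
};

using sstable_predicate = std::function<bool(const sstable&)>;

// Pick the sstables a reader should see from an existing set, skipping the
// ones the predicate rejects. The set itself is neither copied nor rebuilt,
// so nothing has to be re-inserted into an interval map per reader.
std::vector<const sstable*> select_for_reader(const std::vector<sstable>& set,
                                              const sstable_predicate& accept) {
    std::vector<const sstable*> selected;
    selected.reserve(set.size());
    for (const sstable& sst : set) {
        if (accept(sst)) {
            selected.push_back(&sst);
        }
    }
    return selected;
}
```

A caller would pass something like `[](const sstable& s) { return !s.staging; }` when creating the reader used for view building.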
Botond Dénes	0a6676a594	Merge 'readers: evictable_reader: don't accidentally consume the entire partition' from Kamil Braun The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guaranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in #13491. There was already a fix in this area to handle `partition_start` fragments correctly - #13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in #13563) to detect this case. After the fix, the evictable reader starts generating some redundant (but expected) range tombstone change fragments since it's now being paused and resumed. For this we need to adjust mutation source tests which were a bit too specific. We modify `flat_mutation_reader_assertions` to squash the redundant `r_t_c`s. Fixes #13491 Closes #14375 * github.com:scylladb/scylladb: readers: evictable_reader: don't accidentally consume the entire partition test: flat_mutation_reader_assertions: squash `r_t_c`s with the same position (cherry picked from commit `586102b42e`)	2023-06-29 12:05:04 +03:00
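The forward-progress condition described above can be modelled in a few lines. This is a sketch with integer positions and hypothetical names (`fill_result`, `fill_buffer`); it does not reproduce the evictable reader's real code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

struct fill_result {
    std::vector<int> buffer;   // positions of the fragments read in this fill
    std::size_t next;          // index of the first unread fragment
};

// Fill a buffer from `positions` (non-decreasing) starting at index `next`.
// The fill may stop only when the buffer is full AND its last position is
// strictly greater than the last position of the previous fill.
fill_result fill_buffer(const std::vector<int>& positions, std::size_t next,
                        std::optional<int> prev_last, std::size_t buffer_size) {
    fill_result r{{}, next};
    while (r.next < positions.size()) {
        const bool full = r.buffer.size() >= buffer_size;
        // Correct direction: current last position > previous last position.
        // Writing the comparison the other way round (*prev_last > back())
        // would never hold once progress is made, so the loop would keep
        // reading until the end of the partition.
        const bool progressed =
            !r.buffer.empty() && (!prev_last || r.buffer.back() > *prev_last);
        if (full && progressed) {
            break;
        }
        r.buffer.push_back(positions[r.next++]);
    }
    return r;
}
```

For positions `{5, 5, 5, 6, 7, 8}`, a previous last position of 5 and a buffer size of 2, the fill continues past the fragments at position 5 and stops right after reading 6.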
Michał Chojnowski	f13f8954a4	range_tombstone_change_generator: fix an edge case in flush() range_tombstone_change_generator::flush() mishandles the case when two range tombstones are adjacent and flush(pos, end_of_range=true) is called with pos equal to the end bound of the lesser-position range tombstone. In such case, the start change of the greater-position rtc will be accidentally emitted, and there won't be an end change, which breaks reader assumptions by ending the stream with an unclosed range tombstone, triggering an assertion. This is due to a non-strict inequality used in a place where strict inequality should be used. The modified line was intended to close range tombstones which end exactly on the flush position, but this is unnecessary because such range tombstones are handled by the last `if` in the function anyway. Instead, this line caused range tombstones beginning right after the flush position to be emitted sometimes. Fixes #12462 Closes #13906 (cherry picked from commit `9b0679c140`)	2023-06-27 07:43:28 +03:00
Anna Mikhlin	b635a30b59	release: prepare for 5.1.13 scylla-5.1.13	2023-06-22 16:44:06 +03:00
Avi Kivity	342d13e26a	Update seastar submodule (default priority class shares) * seastar 8d7cc3129d...5c27348333 (1): > reactor: change shares for default IO class from 1 to 200 Fixes #13753. In 5.3: `37e6e65211`	2023-06-21 21:24:56 +03:00
Pavel Emelyanov	db01be31c6	Backport 'Merge 'Enlighten messaging_service::shutdown()'' This includes seastar update titled 'Merge 'Split rpc::server stop into two parts'' Includes backport of #12244 fix * br-5.1-backport-ms-shutdown: messaging_service: Shutdown rpc server on shutdown messaging_service: Generalize stop_servers() messaging_service: Restore indentation after previous patch messaging_service: Coroutinize stop() messaging_service: Coroutinize stop_servers() messaging: Shutdown on stop() if it wasn't shut down earlier Update seastar submodule refs: #14031	2023-06-14 09:28:56 +03:00
Pavel Emelyanov	87531915d9	messaging_service: Shutdown rpc server on shutdown The RPC server now has a lighter .shutdown() method that just does what m.s. shutdown() needs, so call it. On stop call regular stop to finalize the stopping process backport: The messaging_service::shutdown() had conflict due to missing `e147681d85` commit Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-06-14 09:28:23 +03:00
Pavel Emelyanov	4075daf96d	messaging_service: Generalize stop_servers() Make it do_with_servers() and make it accept method to call and message to print. This gives the ability to reuse this helper in next patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-06-14 09:28:23 +03:00
Pavel Emelyanov	b27c5567fa	messaging_service: Restore indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-06-14 09:28:23 +03:00
Pavel Emelyanov	8877f0b28a	messaging_service: Coroutinize stop() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-06-14 09:28:23 +03:00
Pavel Emelyanov	fabc7df720	messaging_service: Coroutinize stop_servers() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-06-14 09:28:23 +03:00
Tomasz Grabiec	bdafe2b98c	messaging: Shutdown on stop() if it wasn't shut down earlier All rpc::client objects have to be stopped before they are destroyed. Currently this is done in messaging_service::shutdown(). The cql_test_env does not call shutdown() currently. This can lead to use-after-free on the rpc::client object, manifesting like this: Segmentation fault on shard 0. Backtrace: column_mapping::~column_mapping() at schema.cc:? db::cql_table_large_data_handler::internal_record_large_cells(sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long) const at ./db/large_data_handler.cc:180 operator() at ./db/large_data_handler.cc:123 (inlined by) seastar::future<void> std::__invoke_impl<seastar::future<void>, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long>(std::__invoke_other, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const&&, column_definition const&, unsigned long&&, unsigned long&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61 (inlined by) std::enable_if<is_invocable_r_v<seastar::future<void>, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, 
utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long>, seastar::future<void> >::type std::__invoke_r<seastar::future<void>, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long>(db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const&&, column_definition const&, unsigned long&&, unsigned long&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:114 (inlined by) std::_Function_handler<seastar::future<void> (sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long), db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1>::_M_invoke(std::_Any_data const&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const&&, column_definition const&, unsigned long&&, unsigned long&&) at 
/usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290 std::function<seastar::future<void> (sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long)>::operator()(sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591 (inlined by) db::cql_table_large_data_handler::record_large_cells(sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long) const at ./db/large_data_handler.cc:175 seastar::rpc::log_exception(seastar::rpc::connection&, seastar::log_level, char const, std::__exception_ptr::exception_ptr) at ./build/release/seastar/./seastar/src/rpc/rpc.cc:109 operator() at ./build/release/seastar/./seastar/src/rpc/rpc.cc:788 operator() at ./build/release/seastar/./seastar/include/seastar/core/future.hh:1682 (inlined by) void seastar::futurize<seastar::future<void> >::satisfy_with_result_of<seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14>(seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>&&, 
seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&) const::{lambda()#1}>(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14>(seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&) const::{lambda()#1}&&) at ./build/release/seastar/./seastar/include/seastar/core/future.hh:2134 (inlined by) operator() at ./build/release/seastar/./seastar/include/seastar/core/future.hh:1681 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address 
const&, seastar::socket_address const&)::$_14>(seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>::run_and_dispose() at ./build/release/seastar/./seastar/include/seastar/core/future.hh:781 seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2319 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2756 seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2925 seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2808 seastar::app_template::run_deprecated(int, char, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:265 seastar::app_template::run(int, char, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:156 operator() at ./build/release/seastar/./seastar/src/testing/test_runner.cc:75 (inlined by) void std::__invoke_impl<void, seastar::testing::test_runner::start_thread(int, char)::$_0&>(std::__invoke_other, seastar::testing::test_runner::start_thread(int, char)::$_0&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::testing::test_runner::start_thread(int, char)::$_0&>, void>::type std::__invoke_r<void, seastar::testing::test_runner::start_thread(int, char)::$_0&>(seastar::testing::test_runner::start_thread(int, char)::$_0&) at 
/usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111 (inlined by) std::_Function_handler<void (), seastar::testing::test_runner::start_thread(int, char)::$_0>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290 std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73 Fix by making sure that shutdown() is called prior to destruction. Fixes #12244 Closes #12276	2023-06-14 09:28:23 +03:00
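The fix amounts to making `stop()` fall back to `shutdown()` when the caller never invoked it. A minimal synchronous sketch, assuming a hypothetical `toy_messaging_service` class (the real `messaging_service` is asynchronous and future-based):

```cpp
#include <cassert>

// Toy service that owns some client connections. Clients must be stopped
// before the service is destroyed.
class toy_messaging_service {
    int _open_clients;
    bool _shut_down = false;
    bool _stopped = false;
public:
    explicit toy_messaging_service(int clients) : _open_clients(clients) {}

    // Stops all clients. Safe to call more than once.
    void shutdown() {
        if (_shut_down) {
            return;
        }
        _open_clients = 0;
        _shut_down = true;
    }

    // Finalizes the service. If the caller skipped shutdown() (as the
    // commit says cql_test_env did), do it here so that no client
    // outlives the service.
    void stop() {
        if (!_shut_down) {
            shutdown();
        }
        _stopped = true;
    }

    int open_clients() const { return _open_clients; }
    bool stopped() const { return _stopped; }
};
```

Calling `stop()` alone is then enough to leave the object in a state that is safe to destroy, and calling `shutdown()` first remains valid.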
Pavel Emelyanov	d78bc60a74	Update seastar submodule * seastar 09063faa...8d7cc312 (1): > rpc: Introduce server::shutdown() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-06-14 09:21:20 +03:00
Raphael S. Carvalho	97985a68a1	compaction: Fix incremental compaction for sstable cleanup After `c7826aa910`, sstable runs are cleaned up together. The procedure which executes cleanup was holding reference to all input sstables, such that it could later retry the same cleanup job on failure. Turns out it was not taking into account that incremental compaction will exhaust the input set incrementally. Therefore cleanup is affected by the 100% space overhead. To fix it, cleanup will now have the input set updated, by removing the sstables that were already cleaned up. On failure, cleanup will retry the same job with the remaining sstables that weren't exhausted by incremental compaction. New unit test reproduces the failure, and passes with the fix. Fixes #14035. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14038 (cherry picked from commit `23443e0574`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14195	2023-06-13 09:57:59 +03:00
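The retry behaviour described in this commit can be sketched as a job that drops each input from its pending set as soon as that input has been processed. The names below (`run_cleanup`, `clean_one`) are hypothetical and the model ignores sstable runs and the compaction machinery entirely.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>

// Run cleanup over `pending`, removing each sstable from the set once it has
// been cleaned. On failure the function returns false and `pending` holds
// only the sstables that still need work, so a retry neither redoes finished
// inputs nor keeps references to them alive.
bool run_cleanup(std::deque<std::string>& pending,
                 const std::function<bool(const std::string&)>& clean_one) {
    while (!pending.empty()) {
        if (!clean_one(pending.front())) {
            return false;  // caller may retry with the remaining inputs
        }
        pending.pop_front();  // release the exhausted input right away
    }
    return true;
}
```

Holding on to the full input list until the whole job finishes is what, per the commit message, led to the 100% space overhead.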
Avi Kivity	c1278994d8	Merge 'multishard_mutation_query: make reader_context::lookup_readers() exception safe' from Botond Dénes With regards to closing the looked-up querier if an exception is thrown. In particular, this requires closing the querier if a semaphore mismatch is detected. Move the table lookup above the line where the querier is looked up, to avoid having to handle the exception from it. As a consequence of closing the querier on the error path, the lookup lambda has to be made a coroutine. This is sad, but this is executed once per page, so its cost should be insignificant when spread over an entire page worth of work. Also add a unit test checking that the mismatch is detected in the first place and that readers are closed. Fixes: #13784 Closes #13790 * github.com:scylladb/scylladb: test/boost/database_test: add unit test for semaphore mismatch on range scans partition_slice_builder: add set_specific_ranges() multishard_mutation_query: make reader_context::lookup_readers() exception safe multishard_mutation_query: lookup_readers(): make inner lambda a coroutine (cherry picked from commit `1c0e8c25ca`)	2023-06-08 05:12:12 -04:00
Michał Chojnowski	c33fb41802	data_dictionary: fix forgetting of UDTs on ALTER KEYSPACE Due to a simple programming oversight, one of keyspace_metadata constructors is using empty user_types_metadata instead of the passed one. Fix that. Fixes #14139 Closes #14143 (cherry picked from commit `1a521172ec`)	2023-06-06 21:53:03 +03:00
Kamil Braun	bca4bf6c11	auth: don't use infinite timeout in `default_role_row_satisfies` query A long long time ago there was an issue about removing infinite timeouts from distributed queries: #3603. There was also a fix: `620e950fc8`. But apparently some queries escaped the fix, like the one in `default_role_row_satisfies`. With the right conditions and timing this query may cause a node to hang indefinitely on shutdown. A node tries to perform this query after it starts. If we kill another node which is required to serve this query right before that moment, the query will hang; when we try to shutdown the querying node, it will wait for the query to finish (it's a background task in auth service), which it never does due to infinite timeout. Use the same timeout configuration as other queries in this module do. Fixes #13545. Closes #14134 (cherry picked from commit `f51312e580`)	2023-06-06 19:39:55 +03:00
Anna Mikhlin	cf08b19dad	release: prepare for 5.1.12 scylla-5.1.12	2023-06-05 18:13:42 +03:00
Vlad Zolotarov	0d5751b4b6	scylla_prepare: correctly handle a former 'MQ' mode Fixes a regression introduced in `80917a1054`: "scylla_prepare: stop generating 'mode' value in perftune.yaml" When cpuset.conf contains a "full" CPU set the negation of it from the "full" CPU set is going to generate a zero mask as an irq_cpu_mask. This is an illegal value that will eventually end up in the generated perftune.yaml, which in turn will make the scylla service fail to start until the issue is resolved. In such a case an irq_cpu_mask must represent a "full" CPU set mimicking a former 'MQ' mode. \Fixes scylladb/scylladb#11701 Tested: - Manually on a 2 vCPU VM in an 'auto-selection' mode. - Manually on a large VM (48 vCPUs) with an 'MQ' manually enforced. Message-Id: <20221004004237.2961246-1-vladz@scylladb.com> (cherry picked from commit `8195dab92a`)	2023-06-04 19:26:04 +03:00
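The mask arithmetic behind this fix is small enough to show directly. The sketch below uses 64-bit bitmasks and a hypothetical `irq_cpu_mask` function; the real logic lives in Scylla's setup scripts and is not written in C++, so this only illustrates the rule stated in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Compute the IRQ CPU mask: every CPU of the machine that is not assigned to
// Scylla in cpuset.conf. If Scylla uses all CPUs, the difference is empty,
// which is not a legal mask, so fall back to the full CPU set (the behaviour
// of the former 'MQ' mode).
std::uint64_t irq_cpu_mask(std::uint64_t all_cpus, std::uint64_t scylla_cpus) {
    assert((scylla_cpus & ~all_cpus) == 0);  // cpuset must fit into 'all'
    const std::uint64_t mask = all_cpus & ~scylla_cpus;
    return mask != 0 ? mask : all_cpus;
}
```

For example, on a 4-CPU machine (`0xF`) with Scylla on CPUs 1 to 3 (`0xE`) the result is `0x1`, while on a 2-CPU machine with Scylla on both CPUs the result is the full set `0x3` rather than zero.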
Vlad Zolotarov	1c7cdf68d6	scylla_prepare + scylla_cpuset_setup: make scylla_cpuset_setup idempotent without introducing regressions This patch fixes the regression introduced by `3a51e78` which broke a very important contract: perftune.yaml should not be "touched" by Scylla scriptology unless explicitly requested. And a call for scylla_cpuset_setup is such an explicit request. The issue that the offending patch was intending to fix was that cpuset.conf was always generated anew for every call of scylla_cpuset_setup - even if a resulting cpuset.conf would come out exactly the same as the one present on the disk before the call. And since the original code was following the contract mentioned above it was also deleting perftune.yaml every time too. However, this was just an unavoidable side-effect of that cpuset.conf re-generation. The above also means that if scylla_cpuset_setup doesn't write to cpuset.conf we should not "touch" perftune.yaml and vice versa. This patch implements exactly that together with reverting the dangerous logic introduced by `3a51e78`. \Fixes scylladb/scylladb#11385 \Fixes scylladb/scylladb#10121 (cherry picked from commit `c538cc2372`)	2023-06-04 19:25:41 +03:00
Vlad Zolotarov	8fc0591f98	scylla_prepare: stop generating 'mode' value in perftune.yaml Modern perftune.py supports a more generic way of defining IRQ CPUs: 'irq_cpu_mask'. This patch makes our auto-generation code create a perftune.yaml that uses this new parameter instead of using outdated 'mode'. As a side effect, this change eliminates the notion of "incorrect" value in cpuset.conf - every value is valid now as long as it fits into the 'all' CPU set of the specific machine. Auto-generated 'irq_cpu_mask' is going to include all bits from 'all' CPU mask except those defined in cpuset.conf. \Fixes scylladb/scylladb#9903 (cherry picked from commit `80917a1054`)	2023-06-04 19:25:27 +03:00
Pavel Emelyanov	a0fa2d043d	Update seastar submodule * seastar a6389d17...09063faa (1): > rpc: Wait for server socket to stop before killing conns Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-30 20:03:52 +03:00
Botond Dénes	50a3cc6b90	compatible_ring_position_or_view: make it cheap to copy This class exists for one purpose only: to serve as glue code between dht::ring_position and boost::icl::interval_map. The latter requires that keys in its intervals are: * default constructible * copyable * have standalone compare operations For this reason we have to wrap `dht::ring_position` in a class, together with a schema to provide all this. This is `compatible_ring_position`. There is one further requirement by code using the interval map: it wants to do lookups without copying the lookup key(s). To solve this, we came up with `compatible_ring_position_or_view` which is a union of a key or a key view + schema. As we recently found out, boost::icl copies its keys a lot. It seems to assume these keys are cheap to copy and carelessly copies them around even when iterating over the map. But `compatible_ring_position_or_view` is not cheap to copy as it copies a `dht::ring_position` which allocates, and it does that via an `std::optional` and `std::variant` to add insult to injury. This patch makes said class cheap to copy, by getting rid of the variant and storing the `dht::ring_position` via a shared pointer. The view is stored separately and either points to the ring position stored in the shared pointer or to an outside ring position (for lookups). Fixes: #11669 Closes #11670 (cherry picked from commit `169a8a66f2`)	2023-05-25 17:30:40 +03:00
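The layout described in the commit above (an owned key behind a shared pointer, plus a separate view that points either at the owned key or at an outside key) might look roughly like the sketch below. It is not the ScyllaDB class: `ring_position` here is a hypothetical stand-in holding a string, the schema is left out, and `position_or_view`, `view_of` and `has_position` are illustrative names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for dht::ring_position: a key that allocates when copied.
struct ring_position {
    std::string token;
};

class position_or_view {
    // Set only when the wrapper owns the key; copies share it instead of cloning it.
    std::shared_ptr<const ring_position> _owned;
    // What comparisons read: points into _owned or at an outside key (lookups).
    const ring_position* _view = nullptr;

    explicit position_or_view(const ring_position* v) : _view(v) {}
public:
    // Default constructible, as the interval map requires of its keys.
    position_or_view() = default;

    // Owning form: the key is allocated once and shared by all copies.
    explicit position_or_view(ring_position rp)
        : _owned(std::make_shared<const ring_position>(std::move(rp)))
        , _view(_owned.get()) {}

    // Non-owning form for lookups: no key copy, the caller keeps rp alive.
    static position_or_view view_of(const ring_position& rp) {
        return position_or_view(&rp);
    }

    bool has_position() const { return _view != nullptr; }
    const ring_position& position() const { return *_view; }
};

// Standalone comparison; an empty (default constructed) key sorts first.
bool operator<(const position_or_view& a, const position_or_view& b) {
    if (!a.has_position() || !b.has_position()) {
        return !a.has_position() && b.has_position();
    }
    return a.position().token < b.position().token;
}
```

Copying such an object copies one shared pointer and one raw pointer, so the cost no longer depends on the size of the wrapped key.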
Botond Dénes	cfa8fa1d77	Merge 'Backport compaction reevaluation fixes to branch-5.1' from Raphael "Raph" Carvalho Fixes #13429. Fixes #12390. Fixes #13430. Closes #14009 * github.com:scylladb/scylladb: compaction: Make compaction reevaluation actually periodic compaction_manager: Fix reactor stalls during periodic submissions compaction_manager: reindent postponed_compactions_reevaluation() compaction_manager: coroutinize postponed_compactions_reevaluation() compaction_manager: make postponed_compactions_reevaluation() return a future replica: Reevaluate regular compaction on off-strategy completion	2023-05-25 07:55:17 +03:00
Raphael S. Carvalho	6cdd5ccabd	compaction: Make compaction reevaluation actually periodic The manager intended to periodically reevaluate compaction need for each registered table. But it's not working as intended. The reevaluation is one-off. This means that compaction was not kicking in later for a table, with little to no write activity, that had expired data 1 hour from now. Also make sure that reevaluation happens within the compaction scheduling group. Fixes #13430. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `156ac0a67a`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-23 21:30:47 -03:00
Raphael S. Carvalho	204baa0c1e	compaction_manager: Fix reactor stalls during periodic submissions Every 1 hour, compaction manager will submit all registered table_state for a regular compaction attempt, all without yielding. This can potentially cause a reactor stall if there are 1000s of table states, as compaction strategy heuristics will run on behalf of each, and processing all buckets and picking the best one is not cheap. This problem can be magnified with compaction groups, as each group is represented by a table state. This might appear in dashboard as periodic stalls, every 1h, misleading the investigator into believing that the problem is caused by a chronological job. This is fixed by piggybacking on compaction reevaluation loop which can yield between each submission attempt if needed. Fixes #12390. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12391 (cherry picked from commit `67ebd70e6e`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-23 21:18:36 -03:00
Avi Kivity	3556d2b4e8	compaction_manager: reindent postponed_compactions_reevaluation() (cherry picked from commit `d2b1d2f695`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-23 21:18:18 -03:00
Avi Kivity	6b699c9667	compaction_manager: coroutinize postponed_compactions_reevaluation() So much nicer. (cherry picked from commit `1669025736`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-23 21:17:58 -03:00
Avi Kivity	316ea63ea0	compaction_manager: make postponed_compactions_reevaluation() return a future postponed_compactions_reevaluation() runs until compaction_manager is stopped, checking if it needs to launch new compactions. Make it return a future instead of stashing its completion somewhere. This makes it easier to convert it to a coroutine. (cherry picked from commit `d2c44cba77`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-23 21:17:26 -03:00
Raphael S. Carvalho	bafde878ba	replica: Reevaluate regular compaction on off-strategy completion When off-strategy compaction completes, regular compaction is not triggered. If off-strategy output causes the table's SSTable set to not conform to the strategy goal, it means that read and space amplification will be suboptimal until the next compaction kicks in, which can take an indefinite amount of time (e.g. when active memtable is flushed). Let's reevaluate compaction on main SSTable set when off-strategy ends. Fixes #13429. Backport note: conflict is around compaction_group vs table. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `2652b41606`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-23 21:06:14 -03:00
Yaron Kaikov	88eeab7838	release: prepare for 5.1.11	2023-05-22 15:14:22 +03:00
Raphael S. Carvalho	f6230b5eec	sstables: Fix use-after-move when making reader in reverse mode static report: sstables/mx/reader.cc:1705:58: error: invalid invocation of method 'operator' on object 'schema' while it is in the 'consumed' state [-Werror,-Wconsumed] legacy_reverse_slice_to_native_reverse_slice(schema, slice.get()), pc, std::move(trace_state), fwd, fwd_mr, monitor); Fixes #13394. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `213eaab246`)	2023-05-15 20:27:51 +03:00
Raphael S. Carvalho	737285d342	db/view/build_progress_virtual_reader: Fix use-after-move use-after-free in ctor, which potentially leads to a failure when locating table from moved schema object. static report In file included from db/system_keyspace.cc:51: ./db/view/build_progress_virtual_reader.hh:202:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed] _db.find_column_family(s->ks_name(), system_keyspace::v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS), Fixes #13395. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `1ecba373d6`)	2023-05-15 20:26:17 +03:00
Raphael S. Carvalho	f1ee68e128	index/built_indexes_virtual_reader.hh: Fix use-after-move static report: ./index/built_indexes_virtual_reader.hh:228:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed] _db.find_column_family(s->ks_name(), system_keyspace::v3::BUILT_VIEWS), Fixes #13396. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `f8df3c72d4`)	2023-05-15 20:24:52 +03:00
Raphael S. Carvalho	7d4abd9e64	replica: Fix use-after-move in table::make_streaming_reader Variant used by streaming/stream_transfer_task.cc: , reader(cf.make_streaming_reader(cf.schema(), std::move(permit_), prs)) as full slice is retrieved after schema is moved (clang evaluates left-to-right), the stream transfer task can be potentially working on a stale slice for a particular set of partitions. static report: In file included from replica/dirty_memory_manager.cc:6: replica/database.hh:706:83: error: invalid invocation of method 'operator->' on object 'schema' while it is in the 'consumed' state [-Werror,-Wconsumed] return make_streaming_reader(std::move(schema), std::move(permit), range, schema->full_slice()); Fixes #13397. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `04932a66d3`)	2023-05-15 20:22:15 +03:00
Asias He	0b4b5c21ad	tombstone_gc: Fix gc_before for immediate mode The immediate mode is similar to timeout mode with gc_grace_seconds zero. Thus, the gc_before returned should be the query_time instead of gc_clock::time_point::max in immediate mode. Setting gc_before to gc_clock::time_point::max, a row could be dropped by compaction even if the ttl is not expired yet. The following procedure reproduces the issue: - Start 2 nodes - Insert data ``` CREATE KEYSPACE ks2a WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 }; CREATE TABLE ks2a.tb (pk int, ck int, c0 text, c1 text, c2 text, PRIMARY KEY(pk, ck)) WITH tombstone_gc = {'mode': 'immediate'}; INSERT into ks2a.tb (pk,ck, c0, c1, c2) values (10 ,1, 'x', 'y', 'z') USING TTL 1000000; INSERT into ks2a.tb (pk,ck, c0, c1, c2) values (20 ,1, 'x', 'y', 'z') USING TTL 1000000; INSERT into ks2a.tb (pk,ck, c0, c1, c2) values (30 ,1, 'x', 'y', 'z') USING TTL 1000000; ``` - Run nodetool flush and nodetool compact - Compaction drops all data ``` ~128 total partitions merged to 0. ``` Fixes #13572 Closes #13800 (cherry picked from commit `7fcc403122`)	2023-05-15 10:34:16 +03:00
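The reasoning in the commit above can be checked with a small model. This sketch rests on assumptions: `sketch_clock` stands in for gc_clock, timeout mode is taken to mean `query_time - gc_grace_seconds`, and data is treated as purgeable when its expiry lies before `gc_before`. The names `gc_before_for` and `is_purgeable` are hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Assumption: a std::chrono clock stands in for gc_clock.
using sketch_clock = std::chrono::system_clock;

enum class tombstone_gc_mode { timeout, immediate };

// Immediate mode behaves like timeout mode with a zero grace period,
// so it returns query_time rather than time_point::max().
sketch_clock::time_point gc_before_for(tombstone_gc_mode mode,
                                       sketch_clock::time_point query_time,
                                       std::chrono::seconds gc_grace) {
    switch (mode) {
    case tombstone_gc_mode::immediate:
        return query_time;
    case tombstone_gc_mode::timeout:
        return query_time - gc_grace;   // assumed definition of timeout mode
    }
    return query_time;
}

// Assumption: expired data may be dropped once its expiry is before gc_before.
bool is_purgeable(sketch_clock::time_point expiry,
                  sketch_clock::time_point gc_before) {
    return expiry < gc_before;
}
```

With `gc_before` equal to `query_time`, a row written with `USING TTL 1000000` expires in the future and is kept; with `time_point::max()` the same row would count as purgeable, which matches the data loss in the reproducer.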
Takuya ASADA	3c7d2a3284	scylla_kernel_check: suppress verbose iotune messages Stop printing verbose iotune messages during the check; just print the error message. Fixes #13373. Closes #13362 (cherry picked from commit `160c184d0b`)	2023-05-14 21:26:15 +03:00
Benny Halevy	d7e65a1a0a	view: view_builder: start: demote sleep_aborted log error This is not really an error, so print it in debug log_level rather than error log_level. Fixes #13374 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #13462 (cherry picked from commit `cc42f00232`)	2023-05-14 21:22:21 +03:00
Raphael S. Carvalho	73d80d55d1	Fix use-after-move when initializing row cache with dummy entry Courtesy of clang-tidy: row_cache.cc:1191:28: warning: 'entry' used after it was moved [bugprone-use-after-move] _partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{_schema}); ^ row_cache.cc:1191:60: note: move occurred here _partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{_schema}); ^ row_cache.cc:1191:28: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated _partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{*_schema}); The use-after-move is UB, as for it to happen, depends on evaluation order. We haven't hit it yet as clang is left-to-right. Fixes #13400. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13401 (cherry picked from commit `d2d151ae5b`)	2023-05-14 21:02:52 +03:00
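The pattern behind this group of use-after-move fixes can be reproduced outside ScyllaDB. In the sketch below, `entry`, `partition_index` and `insert_entry` are hypothetical stand-ins, not the row cache types. The point is that when a callee takes its arguments by value, the language does not fix the order in which the arguments are initialized, so reading from an object in one argument while moving it in another may observe a moved-from value. Reading the needed value into a local first removes the dependence on evaluation order.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a cache entry that owns its key.
struct entry {
    std::string key;
    std::string payload;
};

// Hypothetical container whose insert() takes both arguments by value,
// so a caller's std::move() takes effect during argument evaluation.
class partition_index {
    std::map<std::string, entry> _items;
public:
    void insert(std::string key, entry e) {
        _items.insert_or_assign(std::move(key), std::move(e));
    }
    bool contains(const std::string& key) const {
        return _items.count(key) != 0;
    }
    const std::string& payload_of(const std::string& key) const {
        return _items.at(key).payload;
    }
};

void insert_entry(partition_index& idx, entry e) {
    // Risky shape (not used here): idx.insert(e.key, std::move(e));
    // If the 'e' parameter is initialized first, e.key is read after the move.
    std::string key = e.key;                    // read before moving
    idx.insert(std::move(key), std::move(e));   // order no longer matters
}
```

A call such as `insert_entry(idx, entry{"pk1", "data"})` then stores the entry under `"pk1"` regardless of how the compiler orders argument evaluation.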
Anna Mikhlin	37538f00f5	release: prepare for 5.1.10 scylla-5.1.10 scylla-5.1.11	2023-05-08 22:11:10 +03:00
Botond Dénes	1cb11e7e2f	Update seastar submodule * seastar 84858fde...a6389d17 (2): > core/on_internal_error: always log error with backtrace > on_internal_error: refactor log_error_and_backtrace Fixes: #13786	2023-05-08 10:36:55 +03:00
Marcin Maliszkiewicz	f4200098ce	db: view: use deferred_close for closing staging_sstable_reader When consume_in_thread throws the reader should still be closed. Related https://github.com/scylladb/scylla-enterprise/issues/2661 Closes #13398 Refs: scylladb/scylla-enterprise#2661 Fixes: #13413 (cherry picked from commit `99f8d7dcbe`)	2023-05-08 09:45:54 +03:00
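The idea of closing a reader even when consumption throws can be illustrated with a plain scope guard. This is a simplified, synchronous sketch: `fake_reader`, `close_on_exit` and `consume_all` are hypothetical names, and the guard below only mirrors the general idea of a deferred close, not Seastar's actual helper or its future-based `close()`.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical reader with a synchronous close(); real readers close asynchronously.
struct fake_reader {
    bool closed = false;
    void close() noexcept { closed = true; }
};

// Minimal scope guard: calls close() when the scope exits, by return or by exception.
template <typename Closeable>
class close_on_exit {
    Closeable& _obj;
public:
    explicit close_on_exit(Closeable& obj) noexcept : _obj(obj) {}
    close_on_exit(const close_on_exit&) = delete;
    close_on_exit& operator=(const close_on_exit&) = delete;
    ~close_on_exit() { _obj.close(); }
};

// Stand-in for the consume step, which may throw part way through.
void consume_all(fake_reader& reader, bool fail) {
    close_on_exit<fake_reader> guard(reader);
    if (fail) {
        throw std::runtime_error("consume failed");
    }
}
```

Because the guard's destructor runs during stack unwinding, the reader ends up closed on both the normal and the exceptional path.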
Botond Dénes	f751613924	Merge 'service:forward_service: use long type instead of counter in function mocking' from Michał Jadwiszczak Aggregation query on counter column is failing because forward_service is looking for function with counter as an argument and such function doesn't exist. Instead the long type should be used. Fixes: #12939 Closes #12963 * github.com:scylladb/scylladb: test:boost: counter column parallelized aggregation test service:forward_service: use long type when column is counter (cherry picked from commit `61e67b865a`)	2023-05-07 14:29:33 +03:00
Michał Jadwiszczak	b38d56367f	test/boost/cql_query_test: enable `parallelized_aggregation` Run tests for parallelized aggregation with `enable_parallelized_aggregation` set always to true, so the tests work even if the default value of the option is false. Closes #12409 (cherry picked from commit `83bb77b8bb`) Ref #12939.	2023-05-07 14:29:33 +03:00


32657 Commits