Commit Graph

265 Commits

Author SHA1 Message Date
Gleb Natapov
85cffd1aeb lwt: rewrite storage_proxy::cas using coroutings
Makes code much simpler to understand.

Message-Id: <20201201160213.GW1655743@scylladb.com>
2020-12-17 18:15:35 +01:00
Piotr Sarna
27fba35832 storage_proxy: start propagating local timeouts as timeouts
A local timeout was previously propagated to the client as WriteFailure,
while there exists a more concrete error type for that: WriteTimeout.
2020-12-14 07:50:40 +01:00
Pavel Emelyanov
4c7bc8a3d1 storage_proxy: Add .local_db() getters
To facilitate the next patching

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-12-11 18:48:02 +03:00
Piotr Dulikowski
0fd36e2579 api: allow changing hinted handoff configuration
This commit makes it possible to change hints manager's configuration at
runtime through HTTP API.

To preserve backwards compatibility, we keep the old behavior of not
creating and checking hints directories if they are not enabled at
startup. Instead, hint directories are lazily initialized when hints are
enabled for the first time through HTTP API.
2020-11-17 10:24:43 +01:00
Piotr Dulikowski
1302f1b5bf storage_proxy: always create hints manager
Now, the hints manager object for regular hints is always created, even
if hints are disabled in configuration. Please note that the behavior of
hints will be unchanged - no hints will be sent when they are disabled.
The intent of this change is to make enabling and disabling hints in
runtime easier to implement.
2020-11-17 10:24:43 +01:00
Piotr Dulikowski
cefe5214ff config: plug in hints::host_filter object into configuration
Uses db::hints::host_filter as the type of hinted_handoff_enabled
configuration option.

Previously, hinted_handoff_enabled used to be a string option, and it
was parsed later in a separate function during startup. The function
returned a std::optional<std::unordered_set<sstring>>, whose meaning in
the context of hints is rather enigmatic for an observer not familiar
with hints.

Now, hinted_handoff_enabled has type of db::hints::host_filter, and it
is plugged into the config parsing framework, so there is no need for
later post-processing.
2020-11-17 10:24:42 +01:00
Piotr Sarna
fc8ffe08b9 storage_proxy: unify retiring view response handlers
Materialized view updates participate in a retirement program,
which makes sure that they are immediately taken down once their
target node is down, without having to wait for timeout (since
views are a background operation and it's wasteful to wait in the
background for minutes). However, this mechanism has very delicate
lifetime issues, and it already caused problems more than once,
most recently in #5459.
In order to make another bug in this area less likely, the two
implementations of the mechanism, in on_down() and drain_on_shutdown(),
are unified.

Possibly refs #7572

Closes #7624
2020-11-16 18:50:49 +02:00
Benny Halevy
572638671c storage_proxy: query_ranges_to_vnodes_generator ranges_to_vnodes: use token_metadata_ptr
Fixes use-after-free seen with putget_with_reloaded_certificates_test:
```
==215==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000a8b180 at pc 0x000012eb5a83 bp 0x7ffd2c16d4c0 sp 0x7ffd2c16d4b0
READ of size 8 at 0x603000a8b180 thread T0
    #0 0x12eb5a82 in std::__uniq_ptr_impl<locator::token_metadata_impl, std::default_delete<locator::token_metadata_impl> >::_M_ptr() const /usr/include/c++/10/bits/unique_ptr.h:173
    #1 0x12ea230d in std::unique_ptr<locator::token_metadata_impl, std::default_delete<locator::token_metadata_impl> >::get() const /usr/include/c++/10/bits/unique_ptr.h:422
    #2 0x12e8d3e8 in std::unique_ptr<locator::token_metadata_impl, std::default_delete<locator::token_metadata_impl> >::operator->() const /usr/include/c++/10/bits/unique_ptr.h:416
    #3 0x12e5d0a2 in locator::token_metadata::ring_range(std::optional<interval_bound<dht::ring_position> > const&, bool) const locator/token_metadata.cc:1712
    #4 0x112d0126 in service::query_ranges_to_vnodes_generator::process_one_range(unsigned long, std::vector<nonwrapping_interval<dht::ring_position>, std::allocator<nonwrapping_interval<dht::ring_position> > >&) service/storage_proxy.cc:4658
    #5 0x112cf3c5 in service::query_ranges_to_vnodes_generator::operator()(unsigned long) service/storage_proxy.cc:4616
    #6 0x112b2261 in service::storage_proxy::query_partition_key_range_concurrent(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >, std::vector<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, std::allocator<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&, seastar::lw_shared_ptr<query::read_command>, db::consistency_level, service::query_ranges_to_vnodes_generator&&, int, tracing::trace_state_ptr, unsigned long, unsigned int, std::unordered_map<nonwrapping_interval<dht::token>, std::vector<utils::UUID, std::allocator<utils::UUID> >, std::hash<nonwrapping_interval<dht::token> >, std::equal_to<nonwrapping_interval<dht::token> >, std::allocator<std::pair<nonwrapping_interval<dht::token> const, std::vector<utils::UUID, std::allocator<utils::UUID> > > > >, service_permit) service/storage_proxy.cc:4023
    #7 0x112b094e in operator() service/storage_proxy.cc:4160
    #8 0x1139c8bb in invoke<service::storage_proxy::query_partition_key_range_concurrent(seastar::lowres_clock::time_point, std::vector<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >&&, seastar::lw_shared_ptr<query::read_command>, db::consistency_level, service::query_ranges_to_vnodes_generator&&, int, tracing::trace_state_ptr, uint64_t, uint32_t, service::replicas_per_token_range, service_permit)::<lambda(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:2088
    #9 0x1136625b in futurize_invoke<service::storage_proxy::query_partition_key_range_concurrent(seastar::lowres_clock::time_point, std::vector<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >&&, seastar::lw_shared_ptr<query::read_command>, db::consistency_level, service::query_ranges_to_vnodes_generator&&, int, tracing::trace_state_ptr, uint64_t, uint32_t, service::replicas_per_token_range, service_permit)::<lambda(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:2119
    #10 0x11366372 in operator()<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1480
    #11 0x1139cc3b in call /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:145
    #12 0x116f4944 in seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>::operator()(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&) const /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:201
    #13 0x116b3397 in seastar::future<service::query_partition_key_range_concurrent_result> std::__invoke_impl<seastar::future<service::query_partition_key_range_concurrent_result>, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >(std::__invoke_other, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&) /usr/include/c++/10/bits/invoke.h:60
    #14 0x1165c3a6 in std::__invoke_result<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::type std::__invoke<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&) /usr/include/c++/10/bits/invoke.h:96
    #15 0x115e6542 in decltype(auto) std::__apply_impl<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >, 0ul>(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >&&, std::integer_sequence<unsigned long, 0ul>) /usr/include/c++/10/tuple:1724
    #16 0x115e6663 in decltype(auto) std::apply<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >&&) /usr/include/c++/10/tuple:1736
    #17 0x115e63f9 in seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}::operator()(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&) const::{lambda()#1}::operator()() const /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1530
    #18 0x1165c4b9 in void seastar::futurize<seastar::future<service::query_partition_key_range_concurrent_result> >::satisfy_with_result_of<seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}::operator()(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&) const::{lambda()#1}>(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:2073
    #19 0x115e61f5 in seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}::operator()(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&) const /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1528
    #20 0x1176e9cc in seastar::continuation<seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::run_and_dispose() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:746
    #21 0x16a9a455 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2196
    #22 0x16a9e691 in seastar::reactor::run_some_tasks() /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2575
    #23 0x16aa390e in seastar::reactor::run() /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2730
    #24 0x168ae4f7 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /local/home/bhalevy/dev/scylla/seastar/src/core/app-template.cc:207
    #25 0x168ac541 in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /local/home/bhalevy/dev/scylla/seastar/src/core/app-template.cc:115
    #26 0xd6cd3c4 in main /local/home/bhalevy/dev/scylla/main.cc:504
    #27 0x7f8d905d8041 in __libc_start_main (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libc.so.6+0x27041)
    #28 0xd67c9ed in _start (/local/home/bhalevy/.dtest/dtest-o0qoqmkr/test/node3/bin/scylla+0xd67c9ed)

0x603000a8b180 is located 16 bytes inside of 24-byte region [0x603000a8b170,0x603000a8b188)
freed by thread T0 here:
    #0 0x7f8d92a190cf in operator delete(void*, unsigned long) (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libasan.so.6+0xb30cf)
    #1 0xd7ebe54 in seastar::internal::lw_shared_ptr_accessors_no_esft<locator::token_metadata>::dispose(seastar::lw_shared_ptr_counter_base*) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:213
    #2 0x112b155d in seastar::lw_shared_ptr<locator::token_metadata const>::~lw_shared_ptr() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:300
    #3 0x112b155d in ~<lambda> service/storage_proxy.cc:4137
    #4 0x1132e92d in ~<lambda> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1479
    #5 0x1139cc91 in destroy /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:148
    #6 0x11565673 in seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>::~noncopyable_function() /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:181
    #7 0x1176e783 in seastar::continuation<seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::~continuation() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:729
    #8 0x1176ea06 in seastar::continuation<seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::run_and_dispose() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:750
    #9 0x16a9a455 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2196
    #10 0x16a9e691 in seastar::reactor::run_some_tasks() /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2575
    #11 0x16aa390e in seastar::reactor::run() /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2730
    #12 0x168ae4f7 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /local/home/bhalevy/dev/scylla/seastar/src/core/app-template.cc:207
    #13 0x168ac541 in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /local/home/bhalevy/dev/scylla/seastar/src/core/app-template.cc:115
    #14 0xd6cd3c4 in main /local/home/bhalevy/dev/scylla/main.cc:504
    #15 0x7f8d905d8041 in __libc_start_main (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libc.so.6+0x27041)

previously allocated by thread T0 here:
    #0 0x7f8d92a18067 in operator new(unsigned long) (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libasan.so.6+0xb2067)
    #1 0x13cf7132 in seastar::lw_shared_ptr<locator::token_metadata> seastar::lw_shared_ptr<locator::token_metadata>::make<locator::token_metadata>(locator::token_metadata&&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:266
    #2 0x13cc3bfa in seastar::lw_shared_ptr<locator::token_metadata> seastar::make_lw_shared<locator::token_metadata>(locator::token_metadata&&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:422
    #3 0x13ca3007 in seastar::lw_shared_ptr<locator::token_metadata> locator::make_token_metadata_ptr<locator::token_metadata>(locator::token_metadata) locator/token_metadata.hh:338
    #4 0x13c9bdd4 in locator::shared_token_metadata::clone() const locator/token_metadata.hh:358
    #5 0x13c9c18a in service::storage_service::get_mutable_token_metadata_ptr() service/storage_service.hh:184
    #6 0x13a5a445 in service::storage_service::handle_state_normal(gms::inet_address) service/storage_service.cc:1129
    #7 0x13a6371c in service::storage_service::on_change(gms::inet_address, gms::application_state, gms::versioned_value const&) service/storage_service.cc:1421
    #8 0x12a86269 in operator() gms/gossiper.cc:1639
    #9 0x12ad3eea in call /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:145
    #10 0x12be2aff in seastar::noncopyable_function<void (seastar::shared_ptr<gms::i_endpoint_state_change_subscriber>)>::operator()(seastar::shared_ptr<gms::i_endpoint_state_change_subscriber>) const /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:201
    #11 0x12bb8e98 in atomic_vector<seastar::shared_ptr<gms::i_endpoint_state_change_subscriber> >::for_each(seastar::noncopyable_function<void (seastar::shared_ptr<gms::i_endpoint_state_change_subscriber>)>) utils/atomic_vector.hh:62
    #12 0x12a8662b in gms::gossiper::do_on_change_notifications(gms::inet_address, gms::application_state const&, gms::versioned_value const&) gms/gossiper.cc:1638
    #13 0x12a9387c in operator() gms/gossiper.cc:1978
    #14 0x12b49b20 in __invoke_impl<void, gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()> > /usr/include/c++/10/bits/invoke.h:60
    #15 0x12b21fd6 in __invoke<gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()> > /usr/include/c++/10/bits/invoke.h:95
    #16 0x12b02865 in __apply_impl<gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()>, std::tuple<> > /usr/include/c++/10/tuple:1723
    #17 0x12b028d8 in apply<gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()>, std::tuple<> > /usr/include/c++/10/tuple:1734
    #18 0x12b02967 in apply<gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:2052
    #19 0x12ad866a in operator() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/thread.hh:258
    #20 0x12b609c2 in call /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:116
    #21 0xdfabb5f in seastar::noncopyable_function<void ()>::operator()() const /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:201
    #22 0x16e21bb4 in seastar::thread_context::main() /local/home/bhalevy/dev/scylla/seastar/src/core/thread.cc:297
    #23 0x16e2190f in seastar::thread_context::s_main(int, int) /local/home/bhalevy/dev/scylla/seastar/src/core/thread.cc:275
    #24 0x7f8d9060322f  (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libc.so.6+0x5222f)
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
3fab0f8694 storage_proxy: convert to shared_token_metadata
get() the latest token_metadata_ptr from the
shared_token_metadata before each use.

expose get_token_metadata_ptr() rather than get_token_metadata()
so that caller can keep it across continuations.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
29ed59f8c4 main: start a shared_token_metadata
And use it to get a token_metadata& compatible
with current usage, until the services are converted to
use token_metadata_ptr.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Eliran Sinvani
933b44f676 Storage proxy: use hints smp group in mutate locally
We are using mutate_locally to handle hint mutations that arrived
through RPC. The current implementation makes no distinction whether
the mutation came through hint verb or a mutation verb resulting in
using the same smp group for both. This commit adds the ability to
reference different smp group in mutate_locally private calls and
makes the handlers pass the correct smp group to mutate_locally.
2020-09-08 10:03:50 +03:00
Eliran Sinvani
342fc07bd6 Storage proxy: add a dedicated smp group for hints
Hints and regular writes currently uses the same cross shard
operation semaphore, which can lead to priority inversion, making
cross shard writes wait for cross shard hints. This commit adds
an smp_service_group for hints and adds it usage in the mutate_hint
function.
2020-09-07 15:46:12 +03:00
Avi Kivity
0dcb16c061 Merge "Constify access to token_metadata" from Benny
"
We keep refrences to locator::token_metadata in many places.
Most of them are for read-only access and only a few want
to modify the token_metadata.

Recently, in 94995acedb,
we added yielding loops that access token_metadata in order
to avoid cpu stalls.  To make that possible we need to make
sure they token_metadata object they are traversing won't change
mid-loop.

This series is a first step in ensuring the serialization of
updates to shared token metadata to reading it.

Test: unit(dev)
Dtest: bootstrap_test:TestBootstrap.start_stop_test{,_node}, update_cluster_layout_tests.py -a next-gating(dev)
"

* tag 'constify-token-metadata-access-v2' of github.com:bhalevy/scylla:
  api/http_context: keep a const sharded<locator::token_metadata>&
  gossiper: keep a const token_metadata&
  storage_service: separate get_mutable_token_metadata
  range_streamer: keep a const token_metadata&
  storage_proxy: delete unused get_restricted_ranges declaration
  storage_proxy: keep a const token_metadata&
  storage_proxy: get rid of mutable get_token_metadata getter
  database: keep const token_metadata&
  database: keyspace_metadata: pass const locator::token_metadata& around
  everywhere_replication_strategy: move methods out of line
  replication_strategy: keep a const token_metadata&
  abstract_replication_strategy: get_ranges: accept const token_metadata&
  token_metadata: rename calculate_pending_ranges to update_pending_ranges
  token_metadata: mark const methods
  token_ranges: pending_endpoints_for: return empty vector if keyspace not found
  token_ranges: get_pending_ranges: return empty vector if keyspace not found
  token_ranges: get rid of unused get_pending_ranges variant
  replication_strategy: calculate_natural_endpoints: make token_metadata& param const
  token_metadata: add get_datacenter_racks() const variant
2020-08-22 20:47:45 +03:00
Benny Halevy
2c61383215 storage_proxy: delete unused get_restricted_ranges declaration
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-20 16:20:34 +03:00
Benny Halevy
c8390da5f9 storage_proxy: keep a const token_metadata&
storage_proxy doesn't need to change token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-20 16:20:34 +03:00
Benny Halevy
dfa5f8ff1e storage_proxy: get rid of mutable get_token_metadata getter
We'd like to strictly control who can modify token metadata
and nobody currently needs a mutable reference to storage_proxy::_token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-20 16:20:34 +03:00
Pavel Emelyanov
24cb1b781f storage_proxy: Keep reference on messaging
The proxy is another user of messaging, so keep the reference on it. Its
real usage will come in next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:52 +03:00
Wojciech Mitros
45215746fe increase the maximum size of query results to 2^64
Currently, we cannot select more than 2^32 rows from a table because we are limited by types of
variables containing the numbers of rows. This patch changes these types and sets new limits.

The new limits take effect while selecting all rows from a table - custom limits of rows in a result
stay the same (2^32-1).

In classes which are being serialized and used in messaging, in order to be able to process queries
originating from older nodes, the top 32 bits of new integers are optional and stay at the end
of the class - if they're absent we assume they equal 0.

The backward compatibility was tested by querying an older node for a paged selection, using the
received paging_state with the same select statement on an upgraded node, and comparing the returned
rows with the result generated for the same query by the older node, additionally checking if the
paging_state returned by the upgraded node contained new fields with correct values. Also verified
if the older node simply ignores the top 32 bits of the remaining rows number when handling a query
with a paging_state originating from an upgraded node by generating and sending such a query to
an older node and checking the paging_state in the reply(using python driver).

Fixes #5101.
2020-08-03 17:32:49 +02:00
Botond Dénes
159d37053d storage_proxy: use read_command::max_result_size to pass max result size around
Use the recently added `max_result_size` field of `query::read_command`
to pass the max result size around, including passing it to remote
nodes. This means that the max result size will be sent along each read,
instead of once per connection.
As we want to select the appropriate `max_result_size` based on the type
of the query as well as based on the query class (user or internal) the
previous method won't do anymore. If the remote doesn't fill this
field, the old per-connection value is used.
2020-07-28 18:00:29 +03:00
Botond Dénes
9eb6d704b2 storage_proxy: add get_max_result_size()
Meant to be used by the coordinator node to obtain the max result size
applicable to the query-class (determined based on the current
scheduling group). For normal  paged queries the previously used
`query::result_memory_limiter::maximum_result_size` is used uniformly.
For reverse and unpaged queries, a query class dependent value is used.
For user reads, the value of the
`max_memory_for_unlimited_query_{soft,hard}_limit` configuration items
is used, for other classes no limit is used
(`query::result_memory_limiter::unlimited_result_size`).
2020-07-28 18:00:29 +03:00
Pavel Emelyanov
f845a78d9a storage_proxy: Detach rpc unregistration from stop
The proxy's stop method is not called (and unlikely will be soon), but stopping
the message handlers is needed now, so prepare the existing method for this.'

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-22 16:31:57 +03:00
Pavel Emelyanov
a80403e8f3 storage_proxy: Remove frozen_mutation.hh inclustion
Nothing in it requres the needed classes any longer, forward
declarations are enough.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:47:30 +03:00
Pavel Emelyanov
6174252282 storage_proxy: Move paxos/*.hh inclusions from .hh to .cc
The storage_proxy.hh can live with forward declarations of paxos
classes it refers to.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:44:02 +03:00
Pavel Emelyanov
3df4f3078f storage_proxy: Move hint_wrapper from .hh to .cc
It's only used there, but requires mutation_query.hh, which can thus be
removed from storage_proxy.hh

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:40:25 +03:00
Pavel Solodovnikov
5ff5df1afd storage_proxy: un-hardcode force sync flag for mutate_locally(mutation) overload
Corresponding overload of `storage_proxy::mutate_locally`
was hardcoded to pass `db::commitlog::force_sync::no` to the
`database::apply`. Unhardcode it and substitute `force_sync::no`
to all existing call sites (as it were before).

`force_sync::yes` will be used later for paxos learn writes
when trying to apply mutations upgraded from an obsolete
schema version (similar to the current case when applying
locally a `frozen_mutation` stored in accepted proposal).

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200716124915.464789-1-pa.solodovnikov@scylladb.com>
2020-07-16 16:38:48 +03:00
Avi Kivity
e5be3352cf database, streaming, messaging: drop streaming memtables
Before Scylla 3.0, we used to send streaming mutations using
individual RPC requests and flush them together using dedicated
streaming memtables. This mechanism is no longer in use and all
versions that use it have long reached end-of-life.

Remove this code.
2020-06-25 15:25:54 +02:00
Kamil Braun
013330199d cdc/storage_proxy: keep cdc_service alive in storage_proxy operations
storage_proxy is never deinitialized, so it may have still used cdc_service
after its destructor was called.

This fixes the problem by cdc_service inheriting from
async_sharded_service and storage_proxy calling shared_from_this on
the service whenever it uses it.

cdc_service inherits from async_sharded_service and not simply from
enable_shared_from_this, because there might be other services that
cdc_service depends on. Assuming that these services are
deinitialized after cdc_service (as they should), i.e. after stop() is
called on cdc_service, making cdc_service async_sharded_service will
keep their deinitialization code from being called until all references
to cdc_service disappear (async_sharded_service keeps stop() from
returning until this happens).

Some more improvements should be possible through some refactoring:
1. Make augment_mutation_call a free function, not a member of
   cdc_service: it doesn't need any state that cdc_service has.
   db_context can be passed down from storage_proxy when it calls the
   function.
2. Remove the storage_proxy -> cdc_service reference. storage_proxy
   only needs augment_mutation_call, which would not be a part of the
   service. This would also get rid of the proxy -> cdc -> proxy
   reference cycle that we have now, and would allow storage_proxy to be
   safely deinitialized after cdc_service.
3. Maybe we could even remove the cdc_service -> storage_proxy
   reference. Is it really needed?
2020-06-08 13:25:51 +03:00
Gleb Natapov
e3ff88e674 lwt: prune system.paxos table when quorum of replicas learned the value
Instead of waiting for all replicas to reply execute prune after quorum
of replicas. This will keep system.paxos smaller in the case where one
node is down.

Fixes #6330

Message-Id: <20200525110822.GC233208@scylladb.com>
2020-05-27 08:40:05 +03:00
Piotr Sarna
92aadb94e5 treewide: propagate trace state to write path
In order to add tracing to places where it can be useful,
e.g. materialized view updates and hinted handoff, tracing state
is propagated to all applicable call sites.
2020-05-18 16:05:23 +02:00
Gleb Natapov
d555fb60d7 lwt: add counters for background and foreground paxos operations
Paxos may leave an operation in a background after returning result to a
caller. Lest add a counter for background/foreground paxos handlers so
that it will be easier to detect memory related issues.

Message-Id: <20200510092942.GA24506@scylladb.com>
2020-05-11 14:37:00 +02:00
Gleb Natapov
4622c61a37 lwt: linearise reads
Currently the following scenario may happen:

Consider 3 nodes A, B and C and a LWT failed write operation that
managed to get V accepted on A. The value is read twice. First read
access B and C and returns nothing. Next one access A and B, notices
failed round and completes it. Returns value V. Since two consequent
reads without any writes in the middle return different value this
breaks linearisability.

This happens because read does not do full paxos round. The patch
makes read code to reuse the same logic as write by writing a dummy
value which ensures that complete paxos round is used.
2020-05-05 15:37:42 +03:00
Gleb Natapov
0c2db6f42d lwt: linearise unmet condition operations
Currently the following scenario may happen:

Consider 3 nodes A, B and C and a LWT failed write operation that
managed to get V accepted on A. Next operation may be conditioned on a
value been V, but it may access nodes B and C first and fail. Retrying
the same operation without any writes in the middle may now access A
and B and succeed since it will notice V and will complete previous
transaction. Having to different outcome for the same operation without
any writes in the middle breaks linearisability.

This happens because when condition is unmet we abandon the paxos round,
so this patch makes us complete it with empty value. Now if first
conditional write after failure access B and C it will write accepted
ballot there with the value greater than one of V and V will no longer be
replayed ever.
2020-05-05 12:38:31 +03:00
Gleb Natapov
0fed86e4c6 lwt: change cas_request::apply signature
Change the way query result is passed from getting a reference to a
result to getting a foreign_ptr<lw_shared_ptr<query::result>>. This will
allow cas_request to keep it without copying.
2020-05-05 12:38:23 +03:00
Gleb Natapov
fbb04698d0 lwt: pass paxos::proposal as a shared pointer everywhere
paxos::proposal reference is passed into a lot of functions and sometimes
it has to be copied to prolong its lifetime. Create it as a shared
pointer and pass it everywhere to avoid those copies.
2020-04-22 13:51:43 +03:00
Konstantin Osipov
18b9bb57ac lwt: rename metrics to match accepted terminology
Rename inherited metrics cas_propose and cas_commit
to cas_accept and cas_learn respectively.

A while ago we made a decision to stick to widely accepted
terms for Paxos rounds: prepare, accept, learn. The rest
of the code is using these terms, so rename the metrics
to avoid confusion/technical debt.

While at it, rename a few internal methods and functions.

Fixes #6169

Message-Id: <20200414213537.129547-1-kostja@scylladb.com>
2020-04-15 12:20:30 +02:00
Gleb Natapov
8a408ac5a8 lwt: remove entries from system.paxos table after successful learn stage
The learning stage of PAXOS protocol leaves behind an entry in
system.paxos table with the last learned value (which can be large). In
case not all participants learned it successfully next round on the same
key may complete the learning using this info. But if all nodes learned
the value the entry does not serve useful purpose any longer.

The patch adds another round, "prune", which is executed in background
(limited to 1000 simultaneous instances) and removes the entry in
case all nodes replied successfully to the "learn" round.  It uses the
ballot's timestamp to do the deletion, so not to interfere with the
next round. Since deletion happens very close to previous writes it will
likely happen in memtable and will never reach sstable, so that reduces
memtable flush and compaction overhead.

Fixes #5779

Message-Id: <20200330154853.GA31074@scylladb.com>
2020-03-30 21:02:14 +03:00
Piotr Dulikowski
ef1c62aa04 storage_proxy: track CDC operations in standard flow
Register cdc operation result tracker for write response handlers
coming from the usual write requests.
2020-03-23 14:05:25 +01:00
Piotr Dulikowski
cccc33f0fd storage_proxy: add cdc tracker hooks to write response handlers
Adds a field to abstract_write_response_handler that points to the cdc
operation result tracker, and a function for registering the tracker in
the handlers that currently write to a CDC log table.
2020-03-23 14:05:25 +01:00
Piotr Dulikowski
e7062de02b cdc: register metric counters
This patch defines a CDC metrics object and registers all of its
counters.

storage_proxy is chosen as the owner of the metrics object. Because in
subsequent commits it will become possible for CDC metrics to be updated
after a write operation ends, and because the cdc_service has shorter
lifetime than storage_proxy, we could risk a use-after-free if we placed
this object inside cdc_service.
2020-03-23 14:05:25 +01:00
Piotr Dulikowski
41d82e39ea storage proxy: rename mutate_hint_from_scratch
Changes the name of storage_proxy::mutate_hint_from_scratch function to
another name, whose meaning is more clear: send_hint_to_all_replicas.

Tests: unit(dev)
2020-02-24 17:30:22 +02:00
Avi Kivity
6c7aa18238 Merge "Introduce schema::get_partitioner" from Piotr
"
Introduce schema::get_partitioner and use it instead of dht::global_partitioner.

Fixes #5493

Tests: unit(dev, release, debug)
"

* 'per_table_partitioner_prep' of https://github.com/haaawk/scylla: (35 commits)
  cdc: stop using partitioners
  partitioner_test: stop calling set_global_partitioner
  storage_service: stop calling global_partitioner()
  mutation_writer_test: stop calling global_partitioner()
  schema: reduce number of global_partitioner() calls
  test_services: stop calling global_partitioner()
  sstable_utils: stop calling global_partitioner()
  sstable_resharding_test: stop depending on global partitioner
  sstable_mutation_test: stop calling global_partitioner()
  sstable_data_file_test: stop calling global_partitioner()
  random_schema: stop taking partitioner in constructor
  mutation_reader_test: stop calling global_partitioner()
  multishard_mutation_query_test: stop calling global_partitioner()
  row_level repair: stop calling global_partitioner()
  distribute_reader_and_consume_on_shards: don't take partitioner
  thrift: reduce global_partitioner() calls
  binary_search: stop calling global_partitioner()
  index_entry: stop calling global_partitioner()
  mc writer: stop calling global_partitioner()
  sstable: stop calling global_partitioner()
  ...
2020-02-17 18:12:53 +02:00
Piotr Dulikowski
01084a79b8 hh: send orphaned hints on HINT_MUTATION verb
When replaying a hint with a destination node that is no longer in the
cluster, it will be sent with cl=ALL to all its new replicas. Before
this patch, the MUTATION verb was used, which causes such hints to be
handled on the same connection and with the same priority as regular
writes. This can cause problems when a large number of hints is
orphaned and they are scheduled to be sent at once. Such situation
may happen when replacing a dead node - all nodes that accumulated hints
for the dead node will now send them with cl=ALL to their new replicas.

This patch changes the verb used to send such hints to HINT_MUTATION.
This verb is handled on a separate connection and with streaming
scheduling group, which gives them similar priority to non-orphaned
hints.

Refs: #4712

Tests: unit(dev)
2020-02-17 14:45:22 +01:00
Piotr Jastrzebski
abd76e566f dht::shard_of: stop calling global_partitioner()
Take const schema& as a parameter of shard_of and
use it to obtain partitioner instead of calling
global_partitioner().

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:23:16 +01:00
Gleb Natapov
7694f164c4 lwt: add more tracing to paxos stages
Message-Id: <20200211160653.30317-1-gleb@scylladb.com>
2020-02-16 11:22:30 +02:00
Pavel Emelyanov
fecea1de7e proxy: Use own token_metadata
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-10 20:54:32 +03:00
Avi Kivity
bed61b96a2 Merge "Move features from storage- into feature-service" from Pavel
"
There's a lot of code around that needs storage service purely to
get the specific feature value (cluster_supports_<something> calls).
This creates several circular dependencies, e.g. storage_service <->
migration_manager one and database <-> storage_servuce. Also features
sit on storage_service, but register themselfs on the feature_service
and the former subscribes on them back which also looks strange.

I propose to keep all the features on feature_service, this keeps the
latter intependent from other components, makes it possible to break
one of the mentioned circle dependencyand heavily relax the other.

Also the set helps us fighting the globals and, after it, the
feature_service can be safely stopped at the very last moment.

Tests: unit(dev), manual debug build start-stop
"

* 'br-features-to-service-5' of https://github.com/xemul/scylla:
  gossiper: Avoid string merge-split for nothing
  features: Stop on shutdown
  storage_service: Remove helpers
  storage_service: Prepare to switch from on-board feature helpers
  cql3: Check feature in .validate
  database: Use feature service
  storage_proxy: Use feature service
  migration_manager: Use feature service
  start: Pass needed feature as argument into migrate_truncation_records
  features: Unfriend storage_service
  features: Simplify feature registration
  features: Introduce known_feature_set
  features: Move disabled features set from storage_service
  features: Move schema_features helper
  features: Move all features from storage_service to feature_service
  storage_service: Use feature_config from _feature_service
  features: Add feature_config
  storage_service: Kill set_disabled_features
  gms: Move features stuff into own .cc file
  migration_manager: Move some fns into class
2020-02-09 19:22:07 +02:00
Nadav Har'El
9fd9ec14c2 storage_proxy: make it into a peering sharded service
We consider globals like service::get_storage_proxy() a bad idea,
and would like to reduce their use as much as possible - and eventually,
eliminate it completely.

One easy case to fix case is when we already have a shard-local proxy,
but now we need the sharded object, to invoke_on() something on it.

In this patch, we turn storage_proxy into a peering_sharded_service.
This means that if you already have a storage_proxy, you can call
its container() function to get the sharded<storage_proxy>, without
needing to call the global service::get_storage_proxy().

We found a few such cases in storage_proxy itself, and in Alternator,
and fixed them to use container() instead of the global function.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2020-02-05 21:14:18 +02:00
Pavel Emelyanov
12c1378be0 storage_proxy: Use feature service
Keep reference on local feature service from storage_proxy
and use it in places that have (local) storage_proxy at hands.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-03 15:16:23 +03:00
Eliran Sinvani
971711a546 storage proxy: migrate to per scheduling group statistics
This commit builds on top of the introduced per scheduling group
statistics template and employs it for achieving a per scheduling
group statistics in storage_proxy.

Some of the statistics also had meaning as a global - per
shard one. Those are the ones for determining if to
throttle the write request. This was handled by creating a
global stats struct that will hold those stats and by changing
the stat update to also include the global one.

One point that complicated it is an already existing aggregation
over the per shard stats that now became a per scheduling group
per shard stats, converting the aggregation to a two-dimensional
aggregation.

One thing this commit doesn't handle is validating that an individual
statistic didn't "cross a scheduling group boundary", such validation
is possible but it can easily be added in the future. There is a
subtlety to doing so since if the operation did cross to other
scheduling group two connected statistics can lose balance
for example written bytes and completed write transactions.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2020-01-30 15:01:44 +01:00
Gleb Natapov
0d0c05a569 lwt: allow only one paxos instance to run for each key simultaneously
This will prevent contention in case of parallel updates of the same row
by the same coordinator. The patch does it by introducing a new per key
lock map and taking it before running PAXOS protocol (either for write
of for read).

Message-Id: <20200117101228.GA14816@scylladb.com>
2020-01-28 12:39:23 +02:00