Commit Graph

33638 Commits

Author SHA1 Message Date
Kamil Braun
bdeef77f20 service/raft: ping raft::server_ids, not gms::inet_addresses
Whenever a Raft configuration change is performed, `raft::server` calls
`raft_rpc::add_server`/`raft_rpc::remove_server`. Our `raft_rpc`
implementation has a function, `_on_server_update`, passed in the
constructor, which it called in `add_server`/`remove_server`;
that function would update the set of endpoints detected by the
direct failure detector. `_on_server_update` was passed an IP address
and that address was added to / removed from the failure detector set
(there's another translation layer between the IP addresses and internal
failure detector 'endpoint ID's; but we can ignore it for the purposes
of this commit).

Therefore: the failure detector was pinging a certain set of IP
addresses. These IP addresses were updated during Raft configuration
changes.

To implement the `is_alive(raft::server_id)` function (required by
`raft::failure_detector` interface), we would translate the ID using
the Raft address map, which is currently also updated during
configuration changes, to an IP address, and check if that IP address is
alive according to the direct failure detector (which maintained an
`_alive_set` of type `unordered_set<gms::inet_address>`).

This all works well but it assumes that servers can be identified using
IP addresses - it doesn't play well with the fact that servers may
change their IP addresses. The only immutable identifier we have for a
server is `raft::server_id`. In the future, Raft configurations will not
associate IP addresses with Raft servers; instead we will assume that IP
addresses can change at any time, and there will be a different
mechanism that eventually updates the Raft address map with the latest
IP address for each `raft::server_id`.

To prepare us for that future, in this commit we no longer operate in
terms of IP addresses in the failure detector, but in terms of
`raft::server_id`s. Most of the commit is boilerplate, changing
`gms::inet_address` to `raft::server_id` and function/variable names.
The interesting changes are:
- in `is_alive`, we no longer need to translate the `raft::server_id` to
  an IP address, because now the stored `_alive_set` already contains
  `raft::server_id`s instead of `gms::inet_address`es.
- the `ping` function now takes a `raft::server_id` instead of
  `gms::inet_address`. To send the ping message, we need to translate
  this to IP address; we do it by the `raft_address_map` pointer
  introduced in an earlier commit.

Thus, there is still a point where we have to translate between
`raft::server_id` and `gms::inet_address`; but observe we now do it at
the last possible moment - just before sending the message. If we
have no translation, we consider the `ping` to have failed - it's
equivalent to a network failure where no route to a given address was
found.
2022-11-04 09:38:08 +01:00
Kamil Braun
ac70a05c7e service/raft: store raft_address_map reference in direct_fd_pinger
The pinger will use the map to translate `raft::server_id`s to
`gms::inet_address`es when pinging.
2022-11-04 09:38:08 +01:00
Kamil Braun
2c20f2ab9d gms: gossiper: move direct_fd_pinger out to a separate service
In later commit `direct_fd_pinger` will operate in terms of
`raft::server_id`s. Decouple it from `gossiper` since we don't want to
entangle `gossiper` with Raft-specific stuff.
2022-11-04 09:38:08 +01:00
Kamil Braun
e9a4263e14 gms: gossiper: direct_fd_pinger: extract generation number caching to a separate class
`gms::gossiper::direct_fd_pinger` serves multiple purposes: one of them
is to maintain a mapping between `gms::inet_address`es and
`direct_failure_detector::pinger::endpoint_id`s, another is to cache the
last known gossiper's generation number to use it for sending gossip
echo messages. The latter is the only gossiper-specific thing in this
class.

We want to move `direct_fd_pinger` utside `gossiper`. To do that, split the
gossiper-specific thing -- the generation number management -- to a
smaller class, `echo_pinger`.

`echo_pinger` is a top-level class (not a nested one like
`direct_fd_pinger` was) so we can forward-declare it and pass references
to it without including gms/gossiper.hh header.
2022-11-04 09:38:08 +01:00
Pavel Emelyanov
efbfcdb97e Merge 'Replicate raft_address_map non-expiring entries to other shards' from Kamil Braun
Replicating `raft_address_map` entries is needed for the following use
cases:
- the direct failure detector - currently it assumes a static mapping of
  `raft::server_id`s to `gms::inet_address`es, which is obtained on Raft
  group 0 configuration changes. To handle dynamic mappings we need to
  modify the failure detector so it pings `raft::server_id`s and obtains
  the `gms::inet_address` before sending the message from
  `raft_address_map`. The failure detector is sharded, so we need the
  mappings to be available on all shards.
- in the future we'll have multiple Raft groups running on different
  shards. To send messages they'll need `raft_address_map`.

Initially I tried to replicate all entries - expiring and non-expiring.
The implementation turned out to be very complex - we need to handle
dropping expired entries and refreshing expiring entries' timestamps
across shards, and doing this correctly while accounting for possible
races is quite problematic.

Eventually I arrived at the conclusion that replicating only
non-expiring entries, and furthermore allowing non-expiring entries to
be added only on shard 0, is good enough for our use cases:
- The direct failure detector is pinging group 0 members only; group
  0 members correspond exactly to the non-expiring entries.
- Group 0 configuration changes are handled on shard 0, so non-expiring
  entries are added/removed on shard 0.
- When we have multiple Raft groups, we can reuse a single Raft server
  ID for all Raft servers running on a single node belonging to
  different groups; they are 'namespaced' by the group IDs. Furthermore,
  every node has a server that belongs to group 0. Thus for every Raft
  server in every group, it has a corresponding server in group 0 with
  the same ID, which has a non-expiring entry in `raft_address_map`,
  which is replicated to all shards; so every group will be able to
  deliver its messages.

With these assumptions the implementation is short and simple.
We can always complicate it in the future if we find that the
assumptions are too strong.

Closes #11791

* github.com:scylladb/scylladb:
  test/raft: raft_address_map_test: add replication test
  service/raft: raft_address_map: replicate non-expiring entries to other shards
  service/raft: raft_address_map: assert when entry is missing in drop_expired_entries
  service/raft: turn raft_address_map into a service
2022-11-03 18:34:42 +03:00
Avi Kivity
ca2010144e test: loading_cache_test: fix use-after-free in test_loading_cache_remove_leaves_no_old_entries_behind
We capture `key` by reference, but it is in a another continuation.

Capture it by value, and avoid the default capture specification.

Found by clang 15 + asan + aarch64.

Closes #11884
2022-11-03 17:23:40 +02:00
Avi Kivity
0c3967cf5e Merge 'scylla-gdb.py: improve scylla-fiber' from Botond Dénes
The main theme of this patchset is improving `scylla-fiber`, with some assorted unrelated improvement tagging along.
In lieu of explicit support for mapping up continuation chains in memory from seastar (there is one but it uses function calls), scylla fiber uses a quite crude method to do this: it scans task objects for outbound references to other task objects to find waiters tasks and scans inbound references from other tasks to find waited-on tasks. This works well for most objects, but there are some problematic ones:
* `seastar::thread_context`: the waited-on task (`seastar::(anonymous namespace)::thread_wake_task`) is allocated on the thread's stack which is not in the object itself. Scylla fiber now scans the stack bottom-up to find this task.
* `seastar::smp_message_queue::async_work_item`: the waited on task lives on another shard. Scylla fiber now digs out the remote shard from the work item and continues the search on the remote shard.
* `seastar::when_all_state`: the waited on task is a member in the same object tripping loop detection and terminating the search. Seastar fiber now uses the `_continuation` member explicitely to look for the next links.

Other minor improvements were also done, like including the shard of the task in the printout.
Example demonstrating all the new additions:
```
(gdb) scylla fiber 0x000060002d650200
Stopping because loop is detected: task 0x000061c00385fb60 was seen before.
[shard 28] #-13 (task*) 0x000061c00385fba0 0x00000000003b5b00 vtable for seastar::internal::when_all_state_component<seastar::future<void> > + 16
[shard 28] #-12 (task*) 0x000061c00385fb60 0x0000000000417010 vtable for seastar::internal::when_all_state<seastar::internal::identity_futures_tuple<seastar::future<void>, seastar::future<void> >, seastar::future<void>, seastar::future<void> > + 16
[shard 28] #-11 (task*) 0x000061c009f16420 0x0000000000419830 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEEZNS_6futureISt5tupleIJNS4_IvEES6_EEE14discard_resultEvEUlDpOT_E_ZNS8_14then_impl_nrvoISC_S6_EET0_OT_EUlOS3_RSC_ONS_12future_stateIS7_EEE_S7_EE + 16
[shard 28] #-10 (task*) 0x000061c0098e9e00 0x0000000000447440 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}>::run_and_dispose()::{lambda(auto:1)#1}, seastar::future<void>::then_wrapped_nrvo<void, seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}> >(seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16
[shard  0] #-9 (task*) 0x000060000858dcd0 0x0000000000449d68 vtable for seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}> + 16
[shard  0] #-8 (task*) 0x0000600050c39f60 0x00000000007abe98 vtable for seastar::parallel_for_each_state + 16
[shard  0] #-7 (task*) 0x000060000a59c1c0 0x0000000000449f60 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::sharded<cql_transport::cql_server>::stop()::{lambda(seastar::future<void>)#2}, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, {lambda(seastar::future<void>)#2}>({lambda(seastar::future<void>)#2}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(seastar::future<void>)#2}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16
[shard  0] #-6 (task*) 0x000060000a59c400 0x0000000000449ea0 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, cql_transport::controller::do_stop_server()::{lambda(std::unique_ptr<seastar::sharded<cql_transport::cql_server>, std::default_delete<seastar::sharded<cql_transport::cql_server> > >&)#1}::operator()(std::unique_ptr<seastar::sharded<cql_transport::cql_server>, std::default_delete<seastar::sharded<cql_transport::cql_server> > >&) const::{lambda()#1}::operator()() const::{lambda()#1}, seastar::future<void>::then_impl_nrvo<{lambda()#1}, {lambda()#1}>({lambda()#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda()#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16
[shard  0] #-5 (task*) 0x0000600009d86cc0 0x0000000000449c00 vtable for seastar::internal::do_with_state<std::tuple<std::unique_ptr<seastar::sharded<cql_transport::cql_server>, std::default_delete<seastar::sharded<cql_transport::cql_server> > > >, seastar::future<void> > + 16
[shard  0] #-4 (task*) 0x00006000019ffe20 0x00000000007ab368 vtable for seastar::(anonymous namespace)::thread_wake_task + 16
[shard  0] #-3 (task*) 0x00006000085ad080 0x0000000000809e18 vtable for seastar::thread_context + 16
[shard  0] #-2 (task*) 0x0000600009c04100 0x00000000006067f8 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEEZNS_5asyncIZZN7service15storage_service5drainEvENKUlRS6_E_clES7_EUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayIT_E4typeEDpNSC_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSD_DpOSG_EUlvE0_ZNS_6futureIvE14then_impl_nrvoIST_SV_EET0_SQ_EUlOS3_RST_ONS_12future_stateINS1_9monostateEEEE_vEE + 16
[shard  0] #-1 (task*) 0x000060000a59c080 0x0000000000606ae8 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEENS_6futureIvE12finally_bodyIZNS_5asyncIZZN7service15storage_service5drainEvENKUlRS9_E_clESA_EUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayIT_E4typeEDpNSF_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSG_DpOSJ_EUlvE1_Lb0EEEZNS5_17then_wrapped_nrvoIS5_SX_EENSD_ISG_E4typeEOT0_EUlOS3_RSX_ONS_12future_stateINS1_9monostateEEEE_vEE + 16
[shard  0] #0  (task*) 0x000060002d650200 0x0000000000606378 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<service::storage_service::run_with_api_lock<service::storage_service::drain()::{lambda(service::storage_service&)#1}>(seastar::basic_sstring<char, unsigned int, 15u, true>, service::storage_service::drain()::{lambda(service::storage_service&)#1}&&)::{lambda(service::storage_service&)#1}::operator()(service::storage_service&)::{lambda()#1}, false>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, {lambda(service::storage_service&)#1}>({lambda(service::storage_service&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(service::storage_service&)#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16
[shard  0] #1  (task*) 0x000060000bc40540 0x0000000000606d48 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEENS_6futureIvE12finally_bodyIZNS_3smp9submit_toIZNS_7shardedIN7service15storage_serviceEE9invoke_onIZNSB_17run_with_api_lockIZNSB_5drainEvEUlRSB_E_EEDaNS_13basic_sstringIcjLj15ELb1EEEOT_EUlSF_E_JES5_EET1_jNS_21smp_submit_to_optionsESK_DpOT0_EUlvE_EENS_8futurizeINSt9result_ofIFSJ_vEE4typeEE4typeEjSN_SK_EUlvE_Lb0EEEZNS5_17then_wrapped_nrvoIS5_S10_EENSS_ISJ_E4typeEOT0_EUlOS3_RS10_ONS_12future_stateINS1_9monostateEEEE_vEE + 16
[shard  0] #2  (task*) 0x000060000332afc0 0x00000000006cb1c8 vtable for seastar::continuation<seastar::internal::promise_base_with_type<seastar::json::json_return_type>, api::set_storage_service(api::http_context&, seastar::httpd::routes&)::{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}::operator()(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >) const::{lambda()#1}, seastar::future<void>::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}, {lambda()#1}<seastar::json::json_return_type> >({lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::json::json_return_type>&&, {lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16
[shard  0] #3  (task*) 0x000060000a1af700 0x0000000000812208 vtable for seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, seastar::httpd::function_handler::function_handler(std::function<seastar::future<seastar::json::json_return_type> (std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)> const&)::{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}::operator()(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >) const::{lambda(seastar::json::json_return_type&&)#1}, seastar::future<seastar::json::json_return_type>::then_impl_nrvo<seastar::json::json_return_type&&, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > >(seastar::json::json_return_type&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&, seastar::json::json_return_type&, seastar::future_state<seastar::json::json_return_type>&&)#1}, seastar::json::json_return_type> + 16
[shard  0] #4  (task*) 0x0000600009d86440 0x0000000000812228 vtable for seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, seastar::httpd::function_handler::handle(seastar::basic_sstring<char, unsigned int, 15u, true> const&, std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)::{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::future>({lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&, {lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&, seastar::future_state<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&)#1}, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > + 16
[shard  0] #5  (task*) 0x0000600009dba0c0 0x0000000000812f48 vtable for seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::handle_exception<std::function<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > (std::__exception_ptr::exception_ptr)>&>(std::function<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > (std::__exception_ptr::exception_ptr)>&)::{lambda(auto:1&&)#1}, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::then_wrapped_nrvo<seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, {lambda(auto:1&&)#1}>({lambda(auto:1&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&, {lambda(auto:1&&)#1}&, seastar::future_state<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&)#1}, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > + 16
[shard  0] #6  (task*) 0x0000600026783ae0 0x00000000008118b0 vtable for seastar::continuation<seastar::internal::promise_base_with_type<bool>, seastar::httpd::connection::generate_reply(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)::{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::httpd::connection::generate_reply(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)::{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}<bool> >({lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&&)::{lambda(seastar::internal::promise_base_with_type<bool>&&, {lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&, seastar::future_state<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&)#1}, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > + 16
[shard  0] #7  (task*) 0x000060000a4089c0 0x0000000000811790 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::httpd::connection::read_one()::{lambda()#1}::operator()()::{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}::operator()(std::default_delete<std::unique_ptr>) const::{lambda(std::default_delete<std::unique_ptr>)#1}::operator()(std::default_delete<std::unique_ptr>) const::{lambda(bool)#2}, seastar::future<bool>::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}, {lambda(std::default_delete<std::unique_ptr>)#1}<void> >({lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}&, seastar::future_state<bool>&&)#1}, bool> + 16
[shard  0] #8  (task*) 0x000060000a5b16e0 0x0000000000811430 vtable for seastar::internal::do_until_state<seastar::httpd::connection::read()::{lambda()#1}, seastar::httpd::connection::read()::{lambda()#2}> + 16
[shard  0] #9  (task*) 0x000060000aec1080 0x00000000008116d0 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::httpd::connection::read()::{lambda(seastar::future<void>)#3}, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, {lambda(seastar::future<void>)#3}>({lambda(seastar::future<void>)#3}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(seastar::future<void>)#3}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16
[shard  0] #10 (task*) 0x000060000b7d2900 0x0000000000811950 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<seastar::httpd::connection::read()::{lambda()#4}, true>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::httpd::connection::read()::{lambda()#4}>(seastar::httpd::connection::read()::{lambda()#4}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::httpd::connection::read()::{lambda()#4}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16

Found no further pointers to task objects.
If you think there should be more, run `scylla fiber 0x000060002d650200 --verbose` to learn more.
Note that continuation across user-created seastar::promise<> objects are not detected by scylla-fiber.
```

Closes #11822

* github.com:scylladb/scylladb:
  scylla-gdb.py: collection_element: add support for boost::intrusive::list
  scylla-gdb.py: optional_printer: eliminate infinite loop
  scylla-gdb.py: scylla-fiber: add note about user-instantiated promise objects
  scylla-gdb.py: scylla-fiber: reject self-references when probing pointers
  scylla-gdb.py: scylla-fiber: add starting task to known tasks
  scylla-gdb.py: scylla-fiber: add support for walking over when_all
  scylla-gdb.py: add when_all_state to task type whitelist
  scylla-gdb.py: scylla-fiber: also print shard of tasks
  scylla-gdb.py: scylla-fiber: unify task printing
  scylla-gdb.py: scylla fiber: add support for walking over shards
  scylla-gdb.py: scylla fiber: add support for walking over seastar threads
  scylla-gdb.py: scylla-ptr: keep current thread context
  scylla-gdb.py: improve scylla column_families
  scylla-gdb.py: scylla_sstables.filename(): fix generation formatting
  scylla-gdb.py: improve schema_ptr
  scylla-gdb.py: scylla memory: restore compatibility with <= 5.1
2022-11-03 13:52:31 +02:00
Kamil Braun
2049962e11 Fix version numbers in upgrade page title
Closes #11878
2022-11-03 10:06:25 +02:00
Takuya ASADA
45789004a3 install-dependencies.sh: update node_exporter to 1.4.0
To fix CVE-2022-24675, we need to a binary compiled in <= golang 1.18.1.
Only released version which compiled <= golang 1.18.1 is node_exporter
1.4.0, so we need to update to it.

See scylladb/scylla-enterprise#2317

Closes #11400

[avi: regenerated frozen toolchain]

Closes #11879
2022-11-03 10:15:22 +04:00
Yaron Kaikov
20110bdab4 configure.py: remove un-used tar files creation
Starting from https://github.com/scylladb/scylla-pkg/pull/3035 we
removed all old tar.gz prefix from uploading to S3 or been used by
downstream jobs.

Hence, there is no point building those tar.gz files anymore

Closes #11865
2022-11-02 17:44:09 +02:00
Anna Stuchlik
d1f7cc99bc doc: fix the external links to the ScyllaDB University lesson about TTL
Closes #11876
2022-11-02 15:05:43 +02:00
Nadav Har'El
59fa8fe903 Merge 'doc: add the information about AArch64 support to Requirements' from Anna Stuchlik
Fix https://github.com/scylladb/scylla-doc-issues/issues/864

This PR:
- updates the introduction to add information about AArch64 and rewrite the content.
- replaces "Scylla" with "ScyllaDB".

Closes #11778

* github.com:scylladb/scylladb:
  Update docs/getting-started/system-requirements.rst
  doc: fix the link to the OS Support page
  doc: replace Scylla with ScyllaDB
  doc: update the info about supported architecture and rewrite the introduction
2022-11-02 11:18:20 +02:00
Anna Stuchlik
ea799ad8fd Update docs/getting-started/system-requirements.rst
Co-authored-by: Tzach Livyatan <tzach.livyatan@gmail.com>
2022-11-02 09:56:56 +01:00
guy9
097a65df9f adding top banner to the Docs website with a link to the ScyllaDB University fall LIVE event
Closes #11873
2022-11-02 10:20:40 +02:00
Nadav Har'El
b9d88a3601 cql/pytest: add reproducer for timestamp column validation issue
This patch adds a reproducing test for issue #11588, which is still open
so the test is expected to fail on Scylla ("xfail), and passes on Cassandra.

The test shows that Scylla allows an out-of-range value to be written to
timestamp column, but then it can't be read back.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11864
2022-11-01 08:11:01 +02:00
Botond Dénes
dc46bfa783 Merge 'Prepare repair for task manager integration' from Aleksandra Martyniuk
The PR prepares repair for task manager integration:
- Creates repair_module
- Keeps repair_module in repair_service
- Moves tracker methods to repair_module
- Changes UUID to task_id in repair module

Closes #11851

* github.com:scylladb/scylladb:
  repair: check shutdown with abort source in repair module
  repair: use generic module gate for repair module operations
  repair: move tracker to repair module
  repair: move next_repair_command to repair_module
  repair: generate repair id in repair module
  repair: keep shard number in repair_uniq_id
  repair: change UUID to task_id
  repair: add task_manager::module to repair_service
  repair: create repair module and task
2022-11-01 08:05:14 +02:00
Aleksandra Martyniuk
f2fe586f03 repair: check shutdown with abort source in repair module
In repair module the shutdown can be checked using abort_source.
Thus, we can get rid of shutdown flag.
2022-10-31 10:57:29 +01:00
Aleksandra Martyniuk
2d878cc9b5 repair: use generic module gate for repair module operations
Repair module uses a gate to prevent starting new tasks on shutdown.
Generic module's gate serves the same purpose, thus we can
use it also in repair specific context.
2022-10-31 10:56:36 +01:00
Aleksandra Martyniuk
4aae7e9026 repair: move tracker to repair module
Since both tracker and repair_module serve similar purpose,
it is confusing where we should seek for methods connected to them.
Thus, to make it more transparent, tracker class is deleted and all
its attributes and methods are moved to repair_module.
2022-10-31 10:55:36 +01:00
Aleksandra Martyniuk
a5c05dcb60 repair: move next_repair_command to repair_module
Number of the repair operation was counted both with
next_repair_command from tracer and sequence number
from task_manager::module.

To get rid of redundancy next_repair_command was deleted and all
methods using its value were moved to repair_module.
2022-10-31 10:54:39 +01:00
Aleksandra Martyniuk
c81260fb8b repair: generate repair id in repair module
repair_uniq_id for repair task can be generated in repair module
and accessed from the task.
2022-10-31 10:54:24 +01:00
Aleksandra Martyniuk
6432a26ccf repair: keep shard number in repair_uniq_id
Execution shard is one of the traits specific to repair tasks.
Child task should freely access shard id of its parent. Thus,
the shard number is kept in a repair_uniq_id struct.
2022-10-31 10:41:17 +01:00
guy9
276ec377c0 removed broken roadmap link
Closes #11854
2022-10-31 11:33:03 +02:00
Aleksandra Martyniuk
e2c7c1495d repair: change UUID to task_id
Change type of repair id from utils::UUID to task_id to distinguish
them from ids of other entities.
2022-10-31 10:07:08 +01:00
Aleksandra Martyniuk
dc80af33bc repair: add task_manager::module to repair_service
repair_service keeps a shared pointer to repair_module.
2022-10-31 10:04:50 +01:00
Aleksandra Martyniuk
576277384a repair: create repair module and task
Create repair_task_impl and repair_module inheriting from respectively
task manager task_impl and module to integrate repair operations with
task manager.
2022-10-31 10:04:48 +01:00
Takuya ASADA
159bc7c7ea install-dependencies.sh: use binary distributions of PIP package
We currently avoid compiling C code in "pip3 install scylla-driver", but
we actually providing portable binary distributions of the package,
so we should use it by "pip3 install --only-binary=:all: scylla-driver".
The binary distribution contains dependency libraries, so we won't have
problem loading it on relocatable python3.

Closes #11852
2022-10-31 10:38:36 +02:00
Kamil Braun
db6cc035ed test/raft: raft_address_map_test: add replication test 2022-10-31 09:17:12 +01:00
Kamil Braun
7d84007fd5 service/raft: raft_address_map: replicate non-expiring entries to other shards
Replicating `raft_address_map` entries is needed for the following use
cases:
- the direct failure detector - currently it assumes a static mapping of
  `raft::server_id`s to `gms::inet_address`es, which is obtained on Raft
  group 0 configuration changes. To handle dynamic mappings we need to
  modify the failure detector so it pings `raft::server_id`s and obtains
  the `gms::inet_address` before sending the message from
  `raft_address_map`. The failure detector is sharded, so we need the
  mappings to be available on all shards.
- in the future we'll have multiple Raft groups running on different
  shards. To send messages they'll need `raft_address_map`.

Initially I tried to replicate all entries - expiring and non-expiring.
The implementation turned out to be very complex - we need to handle
dropping expired entries and refreshing expiring entries' timestamps
across shards, and doing this correctly while accounting for possible
races is quite problematic.

Eventually I arrived at the conclusion that replicating only
non-expiring entries, and furthermore allowing non-expiring entries to
be added only on shard 0, is good enough for our use cases:
- The direct failure detector is pinging group 0 members only; group
  0 members correspond exactly to the non-expiring entries.
- Group 0 configuration changes are handled on shard 0, so non-expiring
  entries are added/removed on shard 0.
- When we have multiple Raft groups, we can reuse a single Raft server
  ID for all Raft servers running on a single node belonging to
  different groups; they are 'namespaced' by the group IDs. Furthermore,
  every node has a server that belongs to group 0. Thus for every Raft
  server in every group, it has a corresponding server in group 0 with
  the same ID, which has a non-expiring entry in `raft_address_map`,
  which is replicated to all shards; so every group will be able to
  deliver its messages.

With these assumptions the implementation is short and simple.
We can always complicate it in the future if we find that the
assumptions are too strong.
2022-10-31 09:17:12 +01:00
Kamil Braun
acacbad465 service/raft: raft_address_map: assert when entry is missing in drop_expired_entries 2022-10-31 09:17:12 +01:00
Kamil Braun
159bb32309 service/raft: turn raft_address_map into a service 2022-10-31 09:17:10 +01:00
Botond Dénes
139fbb466e Merge 'Task manager extension' from Aleksandra Martyniuk
The PR adds changes to task manager that allow more convenient integration with modules.

Introduced changes:
- adds internal flag in task::impl that allows user to filter too specific tasks
- renames `parent_data` to more appropriate name `task_info`
- creates `tasks/types.hh` which allows using some types connected with task manager without the necessity to include whole task manager
- adds more flexible version of `make_task` method

Closes #11821

* github.com:scylladb/scylladb:
  tasks: add alternative make_task method
  tasks: rename parent_data to task_info and move it
  tasks: move task_id to tasks/types.hh
  tasks: add internal flag for task_manager::task::impl
2022-10-31 09:57:10 +02:00
Botond Dénes
2c021affd1 Merge 'storage_service, repair: use per-shard abort_source' from Benny Halevy
Prevent copying shared_ptr across shards
in do_sync_data_using_repair by allocating
a shared_ptr<abort_source> per shard in
node_ops_meta_data and respectively in node_ops_info.

Fixes #11826

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11827

* github.com:scylladb/scylladb:
  repair: use sharded abort_source to abort repair_info
  repair: node_ops_info: add start and stop methods
  storage_service: node_ops_abort_thread: abort all node ops on shutdown
  storage_service: node_ops_abort_thread: co_return only after printing log message
  storage_service: node_ops_meta_data: add start and stop methods
  repair: node_ops_info: prevent accidental copy
2022-10-31 09:43:34 +02:00
Botond Dénes
63a90cfb6c scylla-gdb.py: collection_element: add support for boost::intrusive::list 2022-10-31 08:18:20 +02:00
Botond Dénes
2fa1864174 scylla-gdb.py: optional_printer: eliminate infinite loop
Currently, to_string() recursively calls itself for engaged optionals.
Eliminate it. Also, use the std_optional wrapper instead of accessing
std::optional internals directly.
2022-10-31 08:18:20 +02:00
Botond Dénes
77b2555a04 scylla-gdb.py: scylla-fiber: add note about user-instantiated promise objects
Scylla fiber uses a crude method of scanning inbound and outbound
references to/from other task objects of recognized type. This method
cannot detect user instantiated promise<> objects. Add a note about this
to the printout, so users are beware of this.
2022-10-31 08:18:20 +02:00
Botond Dénes
2276565a2e scylla-gdb.py: scylla-fiber: reject self-references when probing pointers
A self-reference is never the pointer we are looking for when looking
for other tasks referencing us. Reject such references when scanning
outright.
2022-10-31 08:18:20 +02:00
Botond Dénes
f4365dd7f5 scylla-gdb.py: scylla-fiber: add starting task to known tasks
We collect already seen tasks in a set to be able to detect perceived
task loops and stop when one is seen. Initialize this set with the
starting task, so if it forms a loop, we won't repeat it in the trace
before cutting the loop.
2022-10-31 08:18:20 +02:00
Botond Dénes
48bbf2e467 scylla-gdb.py: scylla-fiber: add support for walking over when_all 2022-10-31 08:18:20 +02:00
Botond Dénes
cb8f02e24b scylla-gdb.py: add when_all_state to task type whitelist 2022-10-31 08:18:20 +02:00
Botond Dénes
62621abc44 scylla-gdb.py: scylla-fiber: also print shard of tasks
Now that scylla-fiber can cross shards, it is important to display the
shard each task in the chain lives on.
2022-10-31 08:18:19 +02:00
Botond Dénes
c21c80f711 scylla-gdb.py: scylla-fiber: unify task printing
Currently there is two loops and a separate line printing the starting
task, all duplicating the formatting logic. Define a method for it and
use it in all 3 places instead.
2022-10-31 08:18:19 +02:00
Botond Dénes
c103280bfd scylla-gdb.py: scylla fiber: add support for walking over shards
Shard boundaries can be crossed in one direction currently: when looking
for waiters on a task, but not in the other direction (looking for
waited-on tasks). This patch fixes that.
2022-10-31 08:18:19 +02:00
Botond Dénes
437f888ba0 scylla-gdb.py: scylla fiber: add support for walking over seastar threads
Currently seastar threads end any attempt to follow waited-on-futures.
Seastar threads need special handling because it allocates the wake up
task on its stack. This patch adds this special handling.
2022-10-31 08:18:19 +02:00
Botond Dénes
fcc63965ed scylla-gdb.py: scylla-ptr: keep current thread context
scylla_ptr.analyze() switches to the thread the analyzed object lives
on, but forgets to switch back. This was very annoying as any commands
using it (which is a bunch of them) were prone to suddenly and
unexpectedly switching threads.
This patch makes sure that the original thread context is switched back
to after analyzing the pointer.
2022-10-31 08:18:19 +02:00
Botond Dénes
91516c1d68 scylla-gdb.py: improve scylla column_families
Rename to scylla tables. Less typing and more up-to-date.
By default it now only lists tables from local shard. Added flag -a
which brings back old behaviour (lists on all shards).
Added -u (only list user tables) and -k (list tables of provided
keyspace only) filtering options.
2022-10-31 08:18:19 +02:00
Botond Dénes
1d3d613b76 scylla-gdb.py: scylla_sstables.filename(): fix generation formatting
Generation was recently converted from an integer to an object. Update
the filename formatting, while keeping backward compatibility.
2022-10-31 08:18:19 +02:00
Botond Dénes
c869f54742 scylla-gdb.py: improve schema_ptr
Add __getitem__(), so members can be accessed.
Strip " from ks_name and cf_name.
Add is_system().
2022-10-31 08:18:19 +02:00
Botond Dénes
66832af233 scylla-gdb.py: scylla memory: restore compatibility with <= 5.1
Recent reworks around dirty memory manager broke backward compatibility
of the scylla memory command (and possibly others). This patch restores
it.
2022-10-31 08:18:19 +02:00
Tenghuan He
e0948ba199 Add directory change instruction
Add directory change instruction while building scylla

Closes #11717
2022-10-30 23:53:02 +02:00