scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 12:06:44 +00:00

Author	SHA1	Message	Date
Kamil Braun	bdeef77f20	service/raft: ping `raft::server_id`s, not `gms::inet_address`es Whenever a Raft configuration change is performed, `raft::server` calls `raft_rpc::add_server`/`raft_rpc::remove_server`. Our `raft_rpc` implementation has a function, `_on_server_update`, passed in the constructor, which it called in `add_server`/`remove_server`; that function would update the set of endpoints detected by the direct failure detector. `_on_server_update` was passed an IP address and that address was added to / removed from the failure detector set (there's another translation layer between the IP addresses and internal failure detector 'endpoint ID's; but we can ignore it for the purposes of this commit). Therefore: the failure detector was pinging a certain set of IP addresses. These IP addresses were updated during Raft configuration changes. To implement the `is_alive(raft::server_id)` function (required by `raft::failure_detector` interface), we would translate the ID using the Raft address map, which is currently also updated during configuration changes, to an IP address, and check if that IP address is alive according to the direct failure detector (which maintained an `_alive_set` of type `unordered_set<gms::inet_address>`). This all works well but it assumes that servers can be identified using IP addresses - it doesn't play well with the fact that servers may change their IP addresses. The only immutable identifier we have for a server is `raft::server_id`. In the future, Raft configurations will not associate IP addresses with Raft servers; instead we will assume that IP addresses can change at any time, and there will be a different mechanism that eventually updates the Raft address map with the latest IP address for each `raft::server_id`. To prepare us for that future, in this commit we no longer operate in terms of IP addresses in the failure detector, but in terms of `raft::server_id`s. 
Most of the commit is boilerplate, changing `gms::inet_address` to `raft::server_id` and function/variable names. The interesting changes are: - in `is_alive`, we no longer need to translate the `raft::server_id` to an IP address, because now the stored `_alive_set` already contains `raft::server_id`s instead of `gms::inet_address`es. - the `ping` function now takes a `raft::server_id` instead of a `gms::inet_address`. To send the ping message, we need to translate this to an IP address; we do it using the `raft_address_map` pointer introduced in an earlier commit. Thus, there is still a point where we have to translate between `raft::server_id` and `gms::inet_address`; but observe we now do it at the last possible moment - just before sending the message. If we have no translation, we consider the `ping` to have failed - it's equivalent to a network failure where no route to a given address was found.	2022-11-04 09:38:08 +01:00
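A minimal sketch of the approach described in this commit, written in plain synchronous C++ rather than Seastar code. The alive set is keyed by server ID, and the ID is translated to an address only inside `ping`. A missing translation is treated as a failed ping. All names here (`server_id`, `address_map`, `pinger`, `failure_detector`, and the injected `send_echo` callback) are hypothetical stand-ins, not ScyllaDB's actual classes or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for raft::server_id: the only immutable identifier.
struct server_id {
    uint64_t value;
    bool operator==(const server_id& o) const { return value == o.value; }
};

struct server_id_hash {
    std::size_t operator()(const server_id& id) const { return std::hash<uint64_t>{}(id.value); }
};

// Hypothetical stand-in for raft_address_map: ID -> latest known IP.
class address_map {
    std::unordered_map<server_id, std::string, server_id_hash> _map;
public:
    void set(server_id id, std::string ip) { _map[id] = std::move(ip); }
    std::optional<std::string> find(server_id id) const {
        auto it = _map.find(id);
        if (it == _map.end()) return std::nullopt;
        return it->second;
    }
};

// Translates the ID at the last possible moment, just before "sending".
class pinger {
    const address_map& _addrs;
    std::function<bool(const std::string&)> _send_echo; // assumed transport
public:
    pinger(const address_map& a, std::function<bool(const std::string&)> send)
        : _addrs(a), _send_echo(std::move(send)) {}

    bool ping(server_id id) const {
        auto ip = _addrs.find(id);
        if (!ip) {
            return false; // no translation: treated like "no route to address"
        }
        return _send_echo(*ip);
    }
};

// Stores server IDs only, so is_alive() needs no address translation.
class failure_detector {
    const pinger& _pinger;
    std::unordered_set<server_id, server_id_hash> _endpoints;
    std::unordered_set<server_id, server_id_hash> _alive_set;
public:
    explicit failure_detector(const pinger& p) : _pinger(p) {}
    void add_endpoint(server_id id) { _endpoints.insert(id); }
    void remove_endpoint(server_id id) { _endpoints.erase(id); _alive_set.erase(id); }

    // One round of pings; a real detector would run this periodically.
    void ping_all() {
        for (auto id : _endpoints) {
            if (_pinger.ping(id)) _alive_set.insert(id);
            else _alive_set.erase(id);
        }
    }
    bool is_alive(server_id id) const { return _alive_set.count(id) > 0; }
};
```

Because the detector never stores addresses, an IP change only requires updating the map. The next ping round then uses the new address.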
Kamil Braun	ac70a05c7e	service/raft: store `raft_address_map` reference in `direct_fd_pinger` The pinger will use the map to translate `raft::server_id`s to `gms::inet_address`es when pinging.	2022-11-04 09:38:08 +01:00
Kamil Braun	2c20f2ab9d	gms: gossiper: move `direct_fd_pinger` out to a separate service In later commit `direct_fd_pinger` will operate in terms of `raft::server_id`s. Decouple it from `gossiper` since we don't want to entangle `gossiper` with Raft-specific stuff.	2022-11-04 09:38:08 +01:00
Kamil Braun	e9a4263e14	gms: gossiper: direct_fd_pinger: extract generation number caching to a separate class `gms::gossiper::direct_fd_pinger` serves multiple purposes: one of them is to maintain a mapping between `gms::inet_address`es and `direct_failure_detector::pinger::endpoint_id`s, another is to cache the gossiper's last known generation number to use it for sending gossip echo messages. The latter is the only gossiper-specific thing in this class. We want to move `direct_fd_pinger` outside `gossiper`. To do that, split the gossiper-specific thing -- the generation number management -- to a smaller class, `echo_pinger`. `echo_pinger` is a top-level class (not a nested one like `direct_fd_pinger` was) so we can forward-declare it and pass references to it without including the gms/gossiper.hh header.	2022-11-04 09:38:08 +01:00
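A hypothetical sketch of the split described in this commit. The generic address-to-ID mapping stays in `direct_fd_pinger`, while the gossiper-specific generation number lives in a separate top-level `echo_pinger` that can be forward-declared. The member names and types (`endpoint_id` as `uint64_t`, addresses as strings) are assumptions for illustration, not the actual ScyllaDB declarations.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// A forward declaration is enough to hold a reference, so a header declaring
// direct_fd_pinger would not need to include the gossiper header.
class echo_pinger;

using endpoint_id = uint64_t; // assumed representation

class direct_fd_pinger {
    echo_pinger& _echo;
    std::unordered_map<std::string, endpoint_id> _ids;
    std::unordered_map<endpoint_id, std::string> _addrs;
    endpoint_id _next_id = 0;
public:
    explicit direct_fd_pinger(echo_pinger& e) : _echo(e) {}

    // Returns the existing ID for addr, or allocates a fresh one.
    endpoint_id allocate_id(const std::string& addr) {
        auto [it, inserted] = _ids.try_emplace(addr, _next_id);
        if (inserted) {
            _addrs.emplace(_next_id, addr);
            ++_next_id;
        }
        return it->second;
    }

    std::optional<std::string> address_of(endpoint_id id) const {
        auto it = _addrs.find(id);
        if (it == _addrs.end()) return std::nullopt;
        return it->second;
    }

    // Defined below, once echo_pinger is a complete type.
    int64_t generation_for_echo() const;
};

// Gossiper-specific part: caches the last known generation number.
class echo_pinger {
    int64_t _generation = 0;
public:
    void update_generation(int64_t g) { _generation = g; }
    int64_t generation() const { return _generation; }
};

inline int64_t direct_fd_pinger::generation_for_echo() const {
    return _echo.generation();
}
```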
Pavel Emelyanov	efbfcdb97e	Merge 'Replicate `raft_address_map` non-expiring entries to other shards' from Kamil Braun Replicating `raft_address_map` entries is needed for the following use cases: - the direct failure detector - currently it assumes a static mapping of `raft::server_id`s to `gms::inet_address`es, which is obtained on Raft group 0 configuration changes. To handle dynamic mappings we need to modify the failure detector so it pings `raft::server_id`s and obtains the `gms::inet_address` before sending the message from `raft_address_map`. The failure detector is sharded, so we need the mappings to be available on all shards. - in the future we'll have multiple Raft groups running on different shards. To send messages they'll need `raft_address_map`. Initially I tried to replicate all entries - expiring and non-expiring. The implementation turned out to be very complex - we need to handle dropping expired entries and refreshing expiring entries' timestamps across shards, and doing this correctly while accounting for possible races is quite problematic. Eventually I arrived at the conclusion that replicating only non-expiring entries, and furthermore allowing non-expiring entries to be added only on shard 0, is good enough for our use cases: - The direct failure detector is pinging group 0 members only; group 0 members correspond exactly to the non-expiring entries. - Group 0 configuration changes are handled on shard 0, so non-expiring entries are added/removed on shard 0. - When we have multiple Raft groups, we can reuse a single Raft server ID for all Raft servers running on a single node belonging to different groups; they are 'namespaced' by the group IDs. Furthermore, every node has a server that belongs to group 0. 
Thus for every Raft server in every group, it has a corresponding server in group 0 with the same ID, which has a non-expiring entry in `raft_address_map`, which is replicated to all shards; so every group will be able to deliver its messages. With these assumptions the implementation is short and simple. We can always complicate it in the future if we find that the assumptions are too strong. Closes #11791 * github.com:scylladb/scylladb: test/raft: raft_address_map_test: add replication test service/raft: raft_address_map: replicate non-expiring entries to other shards service/raft: raft_address_map: assert when entry is missing in drop_expired_entries service/raft: turn raft_address_map into a service	2022-11-03 18:34:42 +03:00
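A simplified, single-threaded sketch of the replication rule described above. It assumes shards can be modeled as a vector of maps; real code would use Seastar sharded services and cross-shard calls instead. The class and method names are hypothetical. Only the rules come from the merge message: non-expiring entries are added on shard 0 and copied to every shard, while expiring entries stay local to one shard.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using clock_type = std::chrono::steady_clock;

struct entry {
    std::string ip;
    std::optional<clock_type::time_point> expires_at; // nullopt = non-expiring
};

class replicated_address_map {
    // One map per (simulated) shard.
    std::vector<std::unordered_map<uint64_t, entry>> _shards;
public:
    explicit replicated_address_map(std::size_t shard_count) : _shards(shard_count) {}

    // Non-expiring entries (group 0 members) may only be added on shard 0,
    // which then copies them to every shard.
    void set_nonexpiring(unsigned this_shard, uint64_t id, const std::string& ip) {
        if (this_shard != 0) {
            throw std::logic_error("non-expiring entries are added on shard 0 only");
        }
        for (auto& m : _shards) {
            m[id] = entry{ip, std::nullopt};
        }
    }

    // Expiring entries stay local to the shard that learned them and
    // never overwrite a replicated non-expiring entry.
    void set_expiring(unsigned shard, uint64_t id, const std::string& ip, clock_type::duration ttl) {
        auto& m = _shards.at(shard);
        auto it = m.find(id);
        if (it != m.end() && !it->second.expires_at) {
            return;
        }
        m[id] = entry{ip, clock_type::now() + ttl};
    }

    std::optional<std::string> find(unsigned shard, uint64_t id) const {
        const auto& m = _shards.at(shard);
        auto it = m.find(id);
        if (it == m.end()) return std::nullopt;
        return it->second.ip;
    }
};
```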
Avi Kivity	ca2010144e	test: loading_cache_test: fix use-after-free in test_loading_cache_remove_leaves_no_old_entries_behind We capture `key` by reference, but it lives in another continuation. Capture it by value, and avoid the default capture specification. Found by clang 15 + asan + aarch64. Closes #11884	2022-11-03 17:23:40 +02:00
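The bug fixed here can be illustrated without Seastar. A callback scheduled to run later must not hold a reference to a local that has already been destroyed. In this sketch, `task_queue` and `schedule_lookup` are hypothetical names, and only the safe capture-by-value form is shown. The real test uses Seastar continuations.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reactor's task queue: callbacks run later,
// after the function that scheduled them has returned.
struct task_queue {
    std::vector<std::function<void()>> tasks;
    void schedule(std::function<void()> f) { tasks.push_back(std::move(f)); }
    void run_all() {
        auto pending = std::move(tasks);
        tasks.clear();
        for (auto& f : pending) f();
    }
};

// `key` is a local that is destroyed when this function returns, before the
// callback runs. Capturing it with [&] or [&key] would dangle. Naming it
// explicitly by value copies it into the closure. Avoiding a default capture
// makes each capture a deliberate choice. `out` outlives the callback, so
// capturing it by reference is safe.
void schedule_lookup(task_queue& q, std::vector<std::string>& out) {
    std::string key = "k1";
    q.schedule([key, &out] { out.push_back(key); });
}
```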
Avi Kivity	0c3967cf5e	Merge 'scylla-gdb.py: improve scylla-fiber' from Botond Dénes The main theme of this patchset is improving `scylla-fiber`, with some assorted unrelated improvements tagging along. In lieu of explicit support for mapping up continuation chains in memory from seastar (there is one but it uses function calls), scylla fiber uses a quite crude method to do this: it scans task objects for outbound references to other task objects to find waiter tasks and scans inbound references from other tasks to find waited-on tasks. This works well for most objects, but there are some problematic ones: * `seastar::thread_context`: the waited-on task (`seastar::(anonymous namespace)::thread_wake_task`) is allocated on the thread's stack which is not in the object itself. Scylla fiber now scans the stack bottom-up to find this task. * `seastar::smp_message_queue::async_work_item`: the waited-on task lives on another shard. Scylla fiber now digs out the remote shard from the work item and continues the search on the remote shard. * `seastar::when_all_state`: the waited-on task is a member of the same object, which trips loop detection and terminates the search. Scylla fiber now uses the `_continuation` member explicitly to look for the next links. Other minor improvements were also done, like including the shard of the task in the printout. Example demonstrating all the new additions: ``` (gdb) scylla fiber 0x000060002d650200 Stopping because loop is detected: task 0x000061c00385fb60 was seen before. 
[shard 28] #-13 (task) 0x000061c00385fba0 0x00000000003b5b00 vtable for seastar::internal::when_all_state_component<seastar::future<void> > + 16 [shard 28] #-12 (task) 0x000061c00385fb60 0x0000000000417010 vtable for seastar::internal::when_all_state<seastar::internal::identity_futures_tuple<seastar::future<void>, seastar::future<void> >, seastar::future<void>, seastar::future<void> > + 16 [shard 28] #-11 (task) 0x000061c009f16420 0x0000000000419830 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEEZNS_6futureISt5tupleIJNS4_IvEES6_EEE14discard_resultEvEUlDpOT_E_ZNS8_14then_impl_nrvoISC_S6_EET0_OT_EUlOS3_RSC_ONS_12future_stateIS7_EEE_S7_EE + 16 [shard 28] #-10 (task) 0x000061c0098e9e00 0x0000000000447440 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}>::run_and_dispose()::{lambda(auto:1)#1}, seastar::future<void>::then_wrapped_nrvo<void, seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}> >(seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 [shard 0] #-9 (task) 0x000060000858dcd0 0x0000000000449d68 vtable for seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}> + 16 [shard 0] #-8 (task) 0x0000600050c39f60 0x00000000007abe98 vtable for 
seastar::parallel_for_each_state + 16 [shard 0] #-7 (task) 0x000060000a59c1c0 0x0000000000449f60 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::sharded<cql_transport::cql_server>::stop()::{lambda(seastar::future<void>)#2}, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, {lambda(seastar::future<void>)#2}>({lambda(seastar::future<void>)#2}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(seastar::future<void>)#2}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 [shard 0] #-6 (task) 0x000060000a59c400 0x0000000000449ea0 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, cql_transport::controller::do_stop_server()::{lambda(std::unique_ptr<seastar::sharded<cql_transport::cql_server>, std::default_delete<seastar::sharded<cql_transport::cql_server> > >&)#1}::operator()(std::unique_ptr<seastar::sharded<cql_transport::cql_server>, std::default_delete<seastar::sharded<cql_transport::cql_server> > >&) const::{lambda()#1}::operator()() const::{lambda()#1}, seastar::future<void>::then_impl_nrvo<{lambda()#1}, {lambda()#1}>({lambda()#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda()#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 [shard 0] #-5 (task) 0x0000600009d86cc0 0x0000000000449c00 vtable for seastar::internal::do_with_state<std::tuple<std::unique_ptr<seastar::sharded<cql_transport::cql_server>, std::default_delete<seastar::sharded<cql_transport::cql_server> > > >, seastar::future<void> > + 16 [shard 0] #-4 (task) 0x00006000019ffe20 0x00000000007ab368 vtable for seastar::(anonymous namespace)::thread_wake_task + 16 [shard 0] #-3 (task) 0x00006000085ad080 0x0000000000809e18 vtable for seastar::thread_context + 16 [shard 0] #-2 (task) 0x0000600009c04100 0x00000000006067f8 
_ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEEZNS_5asyncIZZN7service15storage_service5drainEvENKUlRS6_E_clES7_EUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayIT_E4typeEDpNSC_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSD_DpOSG_EUlvE0_ZNS_6futureIvE14then_impl_nrvoIST_SV_EET0_SQ_EUlOS3_RST_ONS_12future_stateINS1_9monostateEEEE_vEE + 16 [shard 0] #-1 (task) 0x000060000a59c080 0x0000000000606ae8 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEENS_6futureIvE12finally_bodyIZNS_5asyncIZZN7service15storage_service5drainEvENKUlRS9_E_clESA_EUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayIT_E4typeEDpNSF_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSG_DpOSJ_EUlvE1_Lb0EEEZNS5_17then_wrapped_nrvoIS5_SX_EENSD_ISG_E4typeEOT0_EUlOS3_RSX_ONS_12future_stateINS1_9monostateEEEE_vEE + 16 [shard 0] #0 (task) 0x000060002d650200 0x0000000000606378 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<service::storage_service::run_with_api_lock<service::storage_service::drain()::{lambda(service::storage_service&)#1}>(seastar::basic_sstring<char, unsigned int, 15u, true>, service::storage_service::drain()::{lambda(service::storage_service&)#1}&&)::{lambda(service::storage_service&)#1}::operator()(service::storage_service&)::{lambda()#1}, false>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, {lambda(service::storage_service&)#1}>({lambda(service::storage_service&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(service::storage_service&)#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 [shard 0] #1 (task) 0x000060000bc40540 0x0000000000606d48 
_ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEENS_6futureIvE12finally_bodyIZNS_3smp9submit_toIZNS_7shardedIN7service15storage_serviceEE9invoke_onIZNSB_17run_with_api_lockIZNSB_5drainEvEUlRSB_E_EEDaNS_13basic_sstringIcjLj15ELb1EEEOT_EUlSF_E_JES5_EET1_jNS_21smp_submit_to_optionsESK_DpOT0_EUlvE_EENS_8futurizeINSt9result_ofIFSJ_vEE4typeEE4typeEjSN_SK_EUlvE_Lb0EEEZNS5_17then_wrapped_nrvoIS5_S10_EENSS_ISJ_E4typeEOT0_EUlOS3_RS10_ONS_12future_stateINS1_9monostateEEEE_vEE + 16 [shard 0] #2 (task) 0x000060000332afc0 0x00000000006cb1c8 vtable for seastar::continuation<seastar::internal::promise_base_with_type<seastar::json::json_return_type>, api::set_storage_service(api::http_context&, seastar::httpd::routes&)::{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}::operator()(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >) const::{lambda()#1}, seastar::future<void>::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}, {lambda()#1}<seastar::json::json_return_type> >({lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::json::json_return_type>&&, {lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 [shard 0] #3 (task) 0x000060000a1af700 0x0000000000812208 vtable for seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, seastar::httpd::function_handler::function_handler(std::function<seastar::future<seastar::json::json_return_type> (std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)> const&)::{lambda(std::unique_ptr<seastar::httpd::request, 
std::default_delete<seastar::httpd::request> >, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}::operator()(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >) const::{lambda(seastar::json::json_return_type&&)#1}, seastar::future<seastar::json::json_return_type>::then_impl_nrvo<seastar::json::json_return_type&&, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > >(seastar::json::json_return_type&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&, seastar::json::json_return_type&, seastar::future_state<seastar::json::json_return_type>&&)#1}, seastar::json::json_return_type> + 16 [shard 0] #4 (task) 0x0000600009d86440 0x0000000000812228 vtable for seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, seastar::httpd::function_handler::handle(seastar::basic_sstring<char, unsigned int, 15u, true> const&, std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)::{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::future>({lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&, 
{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&, seastar::future_state<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&)#1}, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > + 16 [shard 0] #5 (task) 0x0000600009dba0c0 0x0000000000812f48 vtable for seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::handle_exception<std::function<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > (std::__exception_ptr::exception_ptr)>&>(std::function<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > (std::__exception_ptr::exception_ptr)>&)::{lambda(auto:1&&)#1}, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::then_wrapped_nrvo<seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, {lambda(auto:1&&)#1}>({lambda(auto:1&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&, {lambda(auto:1&&)#1}&, seastar::future_state<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&)#1}, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > + 16 [shard 0] #6 (task) 0x0000600026783ae0 0x00000000008118b0 vtable for seastar::continuation<seastar::internal::promise_base_with_type<bool>, seastar::httpd::connection::generate_reply(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)::{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, 
seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::httpd::connection::generate_reply(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)::{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}<bool> >({lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&&)::{lambda(seastar::internal::promise_base_with_type<bool>&&, {lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&, seastar::future_state<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&)#1}, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > + 16 [shard 0] #7 (task) 0x000060000a4089c0 0x0000000000811790 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::httpd::connection::read_one()::{lambda()#1}::operator()()::{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}::operator()(std::default_delete<std::unique_ptr>) const::{lambda(std::default_delete<std::unique_ptr>)#1}::operator()(std::default_delete<std::unique_ptr>) const::{lambda(bool)#2}, seastar::future<bool>::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}, {lambda(std::default_delete<std::unique_ptr>)#1}<void> >({lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}&, seastar::future_state<bool>&&)#1}, bool> + 16 [shard 0] #8 (task) 0x000060000a5b16e0 0x0000000000811430 vtable for 
seastar::internal::do_until_state<seastar::httpd::connection::read()::{lambda()#1}, seastar::httpd::connection::read()::{lambda()#2}> + 16 [shard 0] #9 (task) 0x000060000aec1080 0x00000000008116d0 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::httpd::connection::read()::{lambda(seastar::future<void>)#3}, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, {lambda(seastar::future<void>)#3}>({lambda(seastar::future<void>)#3}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(seastar::future<void>)#3}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 [shard 0] #10 (task) 0x000060000b7d2900 0x0000000000811950 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<seastar::httpd::connection::read()::{lambda()#4}, true>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::httpd::connection::read()::{lambda()#4}>(seastar::httpd::connection::read()::{lambda()#4}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::httpd::connection::read()::{lambda()#4}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 Found no further pointers to task objects. If you think there should be more, run `scylla fiber 0x000060002d650200 --verbose` to learn more. Note that continuation across user-created seastar::promise<> objects are not detected by scylla-fiber. 
``` Closes #11822 * github.com:scylladb/scylladb: scylla-gdb.py: collection_element: add support for boost::intrusive::list scylla-gdb.py: optional_printer: eliminate infinite loop scylla-gdb.py: scylla-fiber: add note about user-instantiated promise objects scylla-gdb.py: scylla-fiber: reject self-references when probing pointers scylla-gdb.py: scylla-fiber: add starting task to known tasks scylla-gdb.py: scylla-fiber: add support for walking over when_all scylla-gdb.py: add when_all_state to task type whitelist scylla-gdb.py: scylla-fiber: also print shard of tasks scylla-gdb.py: scylla-fiber: unify task printing scylla-gdb.py: scylla fiber: add support for walking over shards scylla-gdb.py: scylla fiber: add support for walking over seastar threads scylla-gdb.py: scylla-ptr: keep current thread context scylla-gdb.py: improve scylla column_families scylla-gdb.py: scylla_sstables.filename(): fix generation formatting scylla-gdb.py: improve schema_ptr scylla-gdb.py: scylla memory: restore compatibility with <= 5.1	2022-11-03 13:52:31 +02:00
Kamil Braun	2049962e11	Fix version numbers in upgrade page title Closes #11878	2022-11-03 10:06:25 +02:00
Takuya ASADA	45789004a3	install-dependencies.sh: update node_exporter to 1.4.0 To fix CVE-2022-24675, we need a binary compiled with golang >= 1.18.1. The only released version compiled with golang >= 1.18.1 is node_exporter 1.4.0, so we need to update to it. See scylladb/scylla-enterprise#2317 Closes #11400 [avi: regenerated frozen toolchain] Closes #11879	2022-11-03 10:15:22 +04:00
Yaron Kaikov	20110bdab4	configure.py: remove unused tar file creation Starting from https://github.com/scylladb/scylla-pkg/pull/3035, we no longer upload any of the old tar.gz prefixes to S3, and downstream jobs no longer use them. Hence, there is no point in building those tar.gz files anymore. Closes #11865	2022-11-02 17:44:09 +02:00
Anna Stuchlik	d1f7cc99bc	doc: fix the external links to the ScyllaDB University lesson about TTL Closes #11876	2022-11-02 15:05:43 +02:00
Nadav Har'El	59fa8fe903	Merge 'doc: add the information about AArch64 support to Requirements' from Anna Stuchlik Fix https://github.com/scylladb/scylla-doc-issues/issues/864 This PR: - updates the introduction to add information about AArch64 and rewrite the content. - replaces "Scylla" with "ScyllaDB". Closes #11778 * github.com:scylladb/scylladb: Update docs/getting-started/system-requirements.rst doc: fix the link to the OS Support page doc: replace Scylla with ScyllaDB doc: update the info about supported architecture and rewrite the introduction	2022-11-02 11:18:20 +02:00
Anna Stuchlik	ea799ad8fd	Update docs/getting-started/system-requirements.rst Co-authored-by: Tzach Livyatan <tzach.livyatan@gmail.com>	2022-11-02 09:56:56 +01:00
guy9	097a65df9f	Add a top banner to the Docs website with a link to the ScyllaDB University fall LIVE event Closes #11873	2022-11-02 10:20:40 +02:00
Nadav Har'El	b9d88a3601	cql/pytest: add reproducer for timestamp column validation issue This patch adds a reproducing test for issue #11588, which is still open, so the test is expected to fail on Scylla ("xfail") and passes on Cassandra. The test shows that Scylla allows an out-of-range value to be written to a timestamp column, but then it can't be read back. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11864	2022-11-01 08:11:01 +02:00
Botond Dénes	dc46bfa783	Merge 'Prepare repair for task manager integration' from Aleksandra Martyniuk The PR prepares repair for task manager integration: - Creates repair_module - Keeps repair_module in repair_service - Moves tracker methods to repair_module - Changes UUID to task_id in repair module Closes #11851 * github.com:scylladb/scylladb: repair: check shutdown with abort source in repair module repair: use generic module gate for repair module operations repair: move tracker to repair module repair: move next_repair_command to repair_module repair: generate repair id in repair module repair: keep shard number in repair_uniq_id repair: change UUID to task_id repair: add task_manager::module to repair_service repair: create repair module and task	2022-11-01 08:05:14 +02:00
Aleksandra Martyniuk	f2fe586f03	repair: check shutdown with abort source in repair module In the repair module, shutdown can be checked using the abort_source, so we can get rid of the shutdown flag.	2022-10-31 10:57:29 +01:00
Aleksandra Martyniuk	2d878cc9b5	repair: use generic module gate for repair module operations The repair module uses a gate to prevent starting new tasks on shutdown. The generic module's gate serves the same purpose, so we can use it in the repair-specific context as well.	2022-10-31 10:56:36 +01:00
Aleksandra Martyniuk	4aae7e9026	repair: move tracker to repair module Since tracker and repair_module serve similar purposes, it is confusing where to look for methods connected to them. To make this clearer, the tracker class is deleted and all its attributes and methods are moved to repair_module.	2022-10-31 10:55:36 +01:00
Aleksandra Martyniuk	a5c05dcb60	repair: move next_repair_command to repair_module The repair operation number was counted both by next_repair_command from the tracker and by the sequence number from task_manager::module. To get rid of this redundancy, next_repair_command was deleted and all methods using its value were moved to repair_module.	2022-10-31 10:54:39 +01:00
Aleksandra Martyniuk	c81260fb8b	repair: generate repair id in repair module repair_uniq_id for repair task can be generated in repair module and accessed from the task.	2022-10-31 10:54:24 +01:00
Aleksandra Martyniuk	6432a26ccf	repair: keep shard number in repair_uniq_id The execution shard is one of the traits specific to repair tasks, and a child task should be able to freely access its parent's shard id. Thus, the shard number is kept in a repair_uniq_id struct.	2022-10-31 10:41:17 +01:00
guy9	276ec377c0	removed broken roadmap link Closes #11854	2022-10-31 11:33:03 +02:00
Aleksandra Martyniuk	e2c7c1495d	repair: change UUID to task_id Change type of repair id from utils::UUID to task_id to distinguish them from ids of other entities.	2022-10-31 10:07:08 +01:00
Aleksandra Martyniuk	dc80af33bc	repair: add task_manager::module to repair_service repair_service keeps a shared pointer to repair_module.	2022-10-31 10:04:50 +01:00
Aleksandra Martyniuk	576277384a	repair: create repair module and task Create repair_task_impl and repair_module, inheriting from the task manager's task_impl and module respectively, to integrate repair operations with the task manager.	2022-10-31 10:04:48 +01:00
Takuya ASADA	159bc7c7ea	install-dependencies.sh: use binary distributions of PIP package We currently avoid compiling C code in "pip3 install scylla-driver", but we actually provide portable binary distributions of the package, so we should use them via "pip3 install --only-binary=:all: scylla-driver". The binary distribution bundles its dependency libraries, so we won't have problems loading it on relocatable python3. Closes #11852	2022-10-31 10:38:36 +02:00
Kamil Braun	db6cc035ed	test/raft: raft_address_map_test: add replication test	2022-10-31 09:17:12 +01:00
Kamil Braun	7d84007fd5	service/raft: raft_address_map: replicate non-expiring entries to other shards Replicating `raft_address_map` entries is needed for the following use cases: - the direct failure detector - currently it assumes a static mapping of `raft::server_id`s to `gms::inet_address`es, which is obtained on Raft group 0 configuration changes. To handle dynamic mappings we need to modify the failure detector so it pings `raft::server_id`s and obtains the `gms::inet_address` before sending the message from `raft_address_map`. The failure detector is sharded, so we need the mappings to be available on all shards. - in the future we'll have multiple Raft groups running on different shards. To send messages they'll need `raft_address_map`. Initially I tried to replicate all entries - expiring and non-expiring. The implementation turned out to be very complex - we need to handle dropping expired entries and refreshing expiring entries' timestamps across shards, and doing this correctly while accounting for possible races is quite problematic. Eventually I arrived at the conclusion that replicating only non-expiring entries, and furthermore allowing non-expiring entries to be added only on shard 0, is good enough for our use cases: - The direct failure detector is pinging group 0 members only; group 0 members correspond exactly to the non-expiring entries. - Group 0 configuration changes are handled on shard 0, so non-expiring entries are added/removed on shard 0. - When we have multiple Raft groups, we can reuse a single Raft server ID for all Raft servers running on a single node belonging to different groups; they are 'namespaced' by the group IDs. Furthermore, every node has a server that belongs to group 0. Thus for every Raft server in every group, it has a corresponding server in group 0 with the same ID, which has a non-expiring entry in `raft_address_map`, which is replicated to all shards; so every group will be able to deliver its messages. 
With these assumptions the implementation is short and simple. We can always complicate it in the future if we find that the assumptions are too strong.	2022-10-31 09:17:12 +01:00
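The replication policy settled on in the message above can be modelled in plain C++. This is a hypothetical, single-threaded sketch, not Scylla's actual `raft_address_map`: shards are simulated as slots in a `std::vector` rather than Seastar shards, `server_id`/`inet_address` are string stand-ins, and the class and method names are invented for illustration. It shows the two rules the message describes: non-expiring entries are added and removed only on shard 0 and copied to every shard, while expiring entries stay local to the shard that learned them.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for raft::server_id and gms::inet_address.
using server_id = std::string;
using inet_address = std::string;

// Models one address map instance per shard; no Seastar involved.
class sharded_address_map_sketch {
    struct entry {
        inet_address addr;
        bool expiring; // false for group 0 members
    };
    std::vector<std::unordered_map<server_id, entry>> _shards;

public:
    explicit sharded_address_map_sketch(std::size_t nr_shards) : _shards(nr_shards) {}

    // Non-expiring entries may only be added on shard 0 (where group 0
    // configuration changes are handled) and are copied to every shard.
    void set_nonexpiring(std::size_t shard, const server_id& id, const inet_address& addr) {
        if (shard != 0) {
            throw std::logic_error("non-expiring entries are only added on shard 0");
        }
        for (auto& m : _shards) {
            m[id] = entry{addr, false};
        }
    }

    // Removal mirrors addition: issued on shard 0, applied on every shard,
    // and only to non-expiring entries.
    void erase_nonexpiring(std::size_t shard, const server_id& id) {
        if (shard != 0) {
            throw std::logic_error("non-expiring entries are only removed on shard 0");
        }
        for (auto& m : _shards) {
            auto it = m.find(id);
            if (it != m.end() && !it->second.expiring) {
                m.erase(it);
            }
        }
    }

    // Expiring entries stay local to the shard that learned them.
    // Assumption (not from the source): they never overwrite a non-expiring entry.
    void set_expiring(std::size_t shard, const server_id& id, const inet_address& addr) {
        auto& m = _shards.at(shard);
        auto it = m.find(id);
        if (it == m.end()) {
            m.emplace(id, entry{addr, true});
        } else if (it->second.expiring) {
            it->second.addr = addr;
        }
    }

    // Looks up the address as seen from the given shard.
    std::optional<inet_address> find(std::size_t shard, const server_id& id) const {
        const auto& m = _shards.at(shard);
        auto it = m.find(id);
        if (it == m.end()) {
            return std::nullopt;
        }
        return it->second.addr;
    }
};
```

Keeping expiring entries shard-local sidesteps the cross-shard races around expiry and timestamp refresh that the message mentions; the trade-off is that only group 0 members (the non-expiring entries) are guaranteed to be resolvable from every shard.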
Kamil Braun	acacbad465	service/raft: raft_address_map: assert when entry is missing in drop_expired_entries	2022-10-31 09:17:12 +01:00
Kamil Braun	159bb32309	service/raft: turn raft_address_map into a service	2022-10-31 09:17:10 +01:00
Botond Dénes	139fbb466e	Merge 'Task manager extension' from Aleksandra Martyniuk The PR adds changes to the task manager that allow more convenient integration with modules. Introduced changes: - adds an internal flag to task::impl that allows users to filter out overly specific tasks - renames `parent_data` to the more appropriate name `task_info` - creates `tasks/types.hh`, which allows using some task-manager-related types without having to include the whole task manager - adds a more flexible version of the `make_task` method Closes #11821 * github.com:scylladb/scylladb: tasks: add alternative make_task method tasks: rename parent_data to task_info and move it tasks: move task_id to tasks/types.hh tasks: add internal flag for task_manager::task::impl	2022-10-31 09:57:10 +02:00
Botond Dénes	2c021affd1	Merge 'storage_service, repair: use per-shard abort_source' from Benny Halevy Prevent copying shared_ptr across shards in do_sync_data_using_repair by allocating a shared_ptr<abort_source> per shard in node_ops_meta_data and respectively in node_ops_info. Fixes #11826 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11827 * github.com:scylladb/scylladb: repair: use sharded abort_source to abort repair_info repair: node_ops_info: add start and stop methods storage_service: node_ops_abort_thread: abort all node ops on shutdown storage_service: node_ops_abort_thread: co_return only after printing log message storage_service: node_ops_meta_data: add start and stop methods repair: node_ops_info: prevent accidental copy	2022-10-31 09:43:34 +02:00
Botond Dénes	63a90cfb6c	scylla-gdb.py: collection_element: add support for boost::intrusive::list	2022-10-31 08:18:20 +02:00
Botond Dénes	2fa1864174	scylla-gdb.py: optional_printer: eliminate infinite loop Currently, to_string() recursively calls itself for engaged optionals. Eliminate this recursion. Also, use the std_optional wrapper instead of accessing std::optional internals directly.	2022-10-31 08:18:20 +02:00
Botond Dénes	77b2555a04	scylla-gdb.py: scylla-fiber: add note about user-instantiated promise objects Scylla fiber uses a crude method of scanning inbound and outbound references to/from other task objects of a recognized type. This method cannot detect user-instantiated promise<> objects. Add a note about this to the printout, so users are aware of this limitation.	2022-10-31 08:18:20 +02:00
Botond Dénes	2276565a2e	scylla-gdb.py: scylla-fiber: reject self-references when probing pointers A self-reference is never the pointer we are looking for when looking for other tasks referencing us. Reject such references outright when scanning.	2022-10-31 08:18:20 +02:00
Botond Dénes	f4365dd7f5	scylla-gdb.py: scylla-fiber: add starting task to known tasks We collect already seen tasks in a set to be able to detect perceived task loops and stop when one is seen. Initialize this set with the starting task, so if it forms a loop, we won't repeat it in the trace before cutting the loop.	2022-10-31 08:18:20 +02:00
Botond Dénes	48bbf2e467	scylla-gdb.py: scylla-fiber: add support for walking over when_all	2022-10-31 08:18:20 +02:00
Botond Dénes	cb8f02e24b	scylla-gdb.py: add when_all_state to task type whitelist	2022-10-31 08:18:20 +02:00
Botond Dénes	62621abc44	scylla-gdb.py: scylla-fiber: also print shard of tasks Now that scylla-fiber can cross shards, it is important to display the shard each task in the chain lives on.	2022-10-31 08:18:19 +02:00
Botond Dénes	c21c80f711	scylla-gdb.py: scylla-fiber: unify task printing Currently there are two loops and a separate line printing the starting task, all duplicating the formatting logic. Define a method for it and use it in all 3 places instead.	2022-10-31 08:18:19 +02:00
Botond Dénes	c103280bfd	scylla-gdb.py: scylla fiber: add support for walking over shards Currently, shard boundaries can be crossed in only one direction: when looking for waiters on a task, but not in the other direction (looking for waited-on tasks). This patch fixes that.	2022-10-31 08:18:19 +02:00
Botond Dénes	437f888ba0	scylla-gdb.py: scylla fiber: add support for walking over seastar threads Currently seastar threads end any attempt to follow waited-on futures. Seastar threads need special handling because they allocate the wake-up task on their stack. This patch adds this special handling.	2022-10-31 08:18:19 +02:00
Botond Dénes	fcc63965ed	scylla-gdb.py: scylla-ptr: keep current thread context scylla_ptr.analyze() switches to the thread the analyzed object lives on, but forgets to switch back. This was very annoying, as any command using it (and there are a bunch of them) was prone to suddenly and unexpectedly switching threads. This patch makes sure that the original thread context is restored after analyzing the pointer.	2022-10-31 08:18:19 +02:00
Botond Dénes	91516c1d68	scylla-gdb.py: improve scylla column_families Rename to scylla tables. Less typing and more up-to-date. By default it now only lists tables from the local shard. Added the flag -a, which brings back the old behaviour (lists tables on all shards). Added -u (only list user tables) and -k (list tables of provided keyspace only) filtering options.	2022-10-31 08:18:19 +02:00
Botond Dénes	1d3d613b76	scylla-gdb.py: scylla_sstables.filename(): fix generation formatting Generation was recently converted from an integer to an object. Update the filename formatting, while keeping backward compatibility.	2022-10-31 08:18:19 +02:00
Botond Dénes	c869f54742	scylla-gdb.py: improve schema_ptr Add __getitem__(), so members can be accessed. Strip " from ks_name and cf_name. Add is_system().	2022-10-31 08:18:19 +02:00
Botond Dénes	66832af233	scylla-gdb.py: scylla memory: restore compatibility with <= 5.1 Recent reworks around dirty memory manager broke backward compatibility of the scylla memory command (and possibly others). This patch restores it.	2022-10-31 08:18:19 +02:00
Tenghuan He	e0948ba199	Add directory change instruction Add directory change instruction while building scylla Closes #11717	2022-10-30 23:53:02 +02:00

33638 Commits