scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-28 10:41:12 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	677e80a4d5	table: Coroutinize table::delete_sstables_atomically() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18499	2024-05-07 17:10:28 +02:00
Kamil Braun	53443f566a	Merge 'Coroutinize generic_server's listen() method' from Pavel Emelyanov It needs some local naming cleanup, but otherwise it's pretty simple Closes scylladb/scylladb#18510 * github.com:scylladb/scylladb: generic_server: Fix indentation after previous patch generic_server: Coroutinize listen() method generic_server: Rename creds argument to builder	2024-05-07 17:08:59 +02:00
Avi Kivity	9b8dfb2b19	compaction: compaction_strategy validation: don't rely on optional<> formatting std::optional formatting changed while moving from the home-grown formatter to the fmt provided formatter; don't rely on it for user visible messages. Here, the optional formatted is known to be engaged, so just print it. Closes scylladb/scylladb#18533	2024-05-07 12:02:33 +03:00
Kefu Chai	7e578ae964	message: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18527	2024-05-07 11:59:36 +03:00
Raphael S. Carvalho	570e3f8df0	compaction: exclude expired sstables from calculation of base timestamps Base timestamps are fed into the sstable writer for calculating the deltas used by varints. Given that expired sstables are bypassed, we don't have to account for them. So if we are compacting a fully expired sstable and a new sstable together, we can save a bit by having a base timestamp closer to the data actually written into the output. Also, I wanted to move the calculation into the loop in setup(), to avoid two iterations over an input set that can have even more than 1k elements. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18504	2024-05-07 08:43:50 +03:00
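The idea in the commit above can be sketched in a few lines. This is only an illustration under stated assumptions: `sstable_info` and `base_timestamp` are hypothetical names, not ScyllaDB types, and the base is assumed to be the minimum timestamp over the inputs that are actually written to the output.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for an input sstable; not ScyllaDB's real type.
struct sstable_info {
    std::int64_t min_timestamp;
    bool fully_expired;
};

// Single pass over the input set: fully expired sstables are skipped, so the
// returned base is the smallest timestamp among the sstables that contribute
// data to the output. Returns the int64 maximum if every input is expired.
std::int64_t base_timestamp(const std::vector<sstable_info>& inputs) {
    std::int64_t base = std::numeric_limits<std::int64_t>::max();
    for (const auto& sst : inputs) {
        if (sst.fully_expired) {
            continue; // bypassed by compaction, so it must not pull the base down
        }
        base = std::min(base, sst.min_timestamp);
    }
    return base;
}
```

With one expired input at timestamp 10 and live inputs at 100 and 200, the sketch yields 100 rather than 10, which keeps the encoded deltas smaller.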
Raphael S. Carvalho	2d9142250e	Fix flakiness in test_tablet_load_and_stream due to premature gossiper abort on shutdown Until https://github.com/scylladb/scylladb/issues/15356 is fixed, this will be handled by explicitly closing the connection, so if scylla fails to update gossiper state due to premature abort on shutdown, then we won't be stuck in an endless reconnection attempt (later through heartbeats (30s interval)), causing the test to timeout. Manifests in scylla logs like this: gossip - failure_detector_loop: Got error in the loop, live_nodes={127.147.5.10, 127.147.5.16}: seastar::sleep_aborted (Sleep is aborted) gossip - failure_detector_loop: Finished main loop migration_manager - stopping migration service storage_service - Shutting down native transport server gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested) cql_server_controller - CQL server stopped ... gossip - My status = NORMAL gossip - Announcing shutdown gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested) gossip - Sending a GossipShutdown to 127.147.5.10 with generation 1714449924 gossip - Sending a GossipShutdown to 127.147.5.16 with generation 1714449924 gossip - === Gossip round FAIL: seastar::abort_requested_exception (abort requested) Refs #14746. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18484	2024-05-07 02:31:02 +02:00
Piotr Dulikowski	5459cfed6a	Merge 'auth: don't run legacy migrations in auth-v2 mode' from Marcin Maliszkiewicz We won't run: - old pre auth-v1 migration code - code creating auth-v1 tables We will keep running: - code creating default rows - code creating auth-v1 keyspace (needed due to cqlsh legacy hack, it errors when executing `list roles` or `list users` if there is no system_auth keyspace, it does support case when there is no expected tables) Fixes https://github.com/scylladb/scylladb/issues/17737 Closes scylladb/scylladb#17939 * github.com:scylladb/scylladb: auth: don't run legacy migrations on auth-v2 startup auth: fix indent in password_authenticator::start auth: remove unused service::has_existing_legacy_users func	2024-05-06 19:53:35 +02:00
Wojciech Mitros	8472c46c8a	service_level_controller: coroutinize notify_service_level_removed To avoid conflicts arising from the discrepancy between different versions of the repository, use coroutines instead of continuations in service_level_controller::notify_service_level_removed(). Closes scylladb/scylladb#18525	2024-05-06 14:20:49 +03:00
Kamil Braun	ccbb9f5343	Merge 'topology_coordinator: clear obsolete generations earlier' from Patryk Jędrzejczak We want to clear CDC generations that are no longer needed (because all writes are already using a new generation) so they don't take space and are not sent during snapshot transfers (see e.g. https://github.com/scylladb/scylladb/issues/17545). The condition used previously was that we clear generations which were closed (i.e., a new generation started at this time) more than 24h ago. This is a safe choice, but too conservative: we could easily end up with a large number of obsolete generations if we boot multiple nodes during 24h (which is especially easy to do with tablets.) Change this bound from 24h to `5s + ring_delay`. The choice is explained in a comment in the code. Additionally, improve `test_raft_snapshot_request` that would become flaky after the change so it's not sensitive to changes anymore. The raft-based topology was experimental before 6.0, no need to backport. Ref: scylladb/scylladb#17545 Closes scylladb/scylladb#18497 * github.com:scylladb/scylladb: topology_coordinator: clear obsolete generations earlier test: test_raft_snapshot_request: improve the last assertion test: test_raft_snapshot_request: find raft leader after restart test: test_raft_shanpshot_request: simplify appended_command	2024-05-06 12:03:33 +02:00
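The new bound described above (`5s + ring_delay` instead of 24h) amounts to a simple time comparison. The following is a minimal sketch, not the coordinator's code: `gen_clock` and `is_obsolete` are hypothetical names, and the close time of a generation is assumed to be the start time of its successor.

```cpp
#include <chrono>

// Assumption: any wall-clock type works for the illustration.
using gen_clock = std::chrono::system_clock;

// A closed generation is treated as obsolete once more than
// 5 seconds + ring_delay have passed since it was closed.
bool is_obsolete(gen_clock::time_point closed_at,
                 gen_clock::time_point now,
                 std::chrono::milliseconds ring_delay) {
    return now - closed_at > std::chrono::seconds(5) + ring_delay;
}
```

For example, with a `ring_delay` of 30 seconds, a generation closed 36 seconds ago qualifies for clearing, while one closed exactly 35 seconds ago does not yet.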
Kamil Braun	1a50a524e7	Merge 'topology_coordinator: compute cluster size correctly during upgrade' from Piotr Dulikowski During upgrade to raft topology, information about service levels is copied from the legacy tables in system_distributed to the raft-managed tables of group 0. system_distributed has RF=3, so if the cluster has only one or two nodes we should use lower consistency level than ALL - and the current procedure does exactly that, it selects QUORUM in case of two nodes and ONE in case of only one node. The cluster size is determined based on the call to _gossiper.num_endpoints(). Despite its name, gossiper::num_endpoints() does not necessarily return the number of nodes in the cluster but rather the number of endpoint states in gossiper (this behavior is documented in a comment near the declaration of this function). In some cases, e.g. after gossiper-based nodetool remove, the state might be kept for some time after removal (3 days in this case). The consequence of this is that gossiper::num_endpoints() might return more than the current number of nodes during upgrade, and that in turn might cause migration of data from one table to another to fail - causing the upgrade procedure to get stuck if there is only 1 or two nodes in the cluster. In order to fix this, use token_metadata::get_all_endpoints() as a measure of the cluster size. Fixes: scylladb/scylladb#18198 Closes scylladb/scylladb#18261 * github.com:scylladb/scylladb: test: topology: test that upgrade succeeds after recent removal topology_coordinator: compute cluster size correctly during upgrade	2024-05-06 11:06:09 +02:00
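The consistency level selection described above depends only on the cluster size, which is why an inflated node count breaks it. A minimal sketch, assuming hypothetical names (`consistency_level`, `pick_consistency_level`) rather than ScyllaDB's own types:

```cpp
#include <cstddef>

// Hypothetical enum for illustration; only the three levels named in the
// commit message are modeled.
enum class consistency_level { one, quorum, all };

// system_distributed has RF=3, so ALL is only achievable with at least
// three nodes; smaller clusters fall back to QUORUM or ONE.
consistency_level pick_consistency_level(std::size_t cluster_size) {
    switch (cluster_size) {
    case 1:
        return consistency_level::one;
    case 2:
        return consistency_level::quorum;
    default:
        return consistency_level::all;
    }
}
```

If the size passed in is 3 because a removed node's gossiper state still lingers while only two nodes remain, the sketch returns `all`, which matches the failure mode the commit describes.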
Piotr Dulikowski	64ba620dc2	Merge 'hinted handoff: Use host IDs instead of IPs in the module' from Dawid Mędrek This pull request introduces host ID in the Hinted Handoff module. Nodes are now identified by their host IDs instead of their IPs. The conversion occurs on the boundary between the module and `storage_proxy.hh`, but aside from that, IPs have been erased. The changes take into considerations that there might still be old hints, still identified by IPs, on disk – at start-up, we map them to host IDs if it's possible so that they're not lost. Refs scylladb/scylladb#6403 Fixes scylladb/scylladb#12278 Closes scylladb/scylladb#15567 * github.com:scylladb/scylladb: docs: Update Hinted Handoff documentation db/hints: Add endpoint_downtime_not_bigger_than() db/hints: Migrate hinted handoff when cluster feature is enabled db/hints: Handle arbitrary directories in resource manager db/hints: Start using hint_directory_manager db/hints: Enforce providing IP in get_ep_manager() db/hints: Introduce hint_directory_manager db/hints/resource_manager: Update function description db/hints: Coroutinize space_watchdog::scan_one_ep_dir() db/hints: Expose update lock of space watchdog db/hints: Add function for migrating hint directories to host ID db/hints: Take both IP and host ID when storing hints db/hints: Prepare initializing endpoint managers for migrating from IP to host ID db/hints: Migrate to locator::host_id db/hints: Remove noexcept in do_send_one_mutation() service: Add locator::host_id to on_leave_cluster service: Fix indentation db/hints: Fix indentation	2024-05-06 09:58:18 +02:00
Patryk Jędrzejczak	628d7e709e	cdc: generation: fix retrieve_generation_data_v2 `system_keyspace::read_cdc_generation_opt` queries `system.cdc_generations_v3`, which stores ids of CDC generations as timeuuids. This function shouldn't be called with a normal uuid (used by `system.cdc_generations_v2` to store generation ids). Such a call would end with a marshaling error. Before this patch,`retrieve_generation_data_v2` could call `system_keyspace::read_cdc_generation_opt` with a normal uuid if the generation wasn't present in `system.cdc_generations_v2`. This logic caused a marshaling error while handling the `check_and_repair_cdc_streams` request in the `cdc_test.TestCdc.test_check_and_repair_cdc_streams_liveness` dtest. This patch fixes the code being added in 6.0, no need to backport it. Fixes scylladb/scylladb#18473 Closes scylladb/scylladb#18483	2024-05-06 09:12:47 +02:00
Kamil Braun	16846bf5ce	Merge 'Do not serialize removenode operation with api lock if topology over raft is enabled' from Gleb With topology over raft all operations are already serialized by the coordinator anyway, so there is no need to synchronize removenode using the api lock. All others are still synchronized since they cannot be executed in parallel for the same node anyway. * 'gleb/17681-fix' of github.com:scylladb/scylla-dev: storage_service: do not take API lock for removenode operation if topology coordinator is enabled test: return file mark from wait_for that points after the found string	2024-05-06 09:03:03 +02:00
Benny Halevy	ebff5f5d70	everywhere: include seastar headers using angle brackets seastar is an external library therefore it should use the system-include syntax. Closes scylladb/scylladb#18513	2024-05-06 10:00:31 +03:00
Kefu Chai	5ca9a46a91	test/lib: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18515	2024-05-05 23:31:48 +03:00
Kefu Chai	0b0e661a85	build: bring abseil submodule back because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689, the rebuilt abseil package provided by fedora has different settings than the ones used when the tree is built with the sanitizer enabled. this inconsistency leads to a crash. to address this problem, we have to reinstate the abseil submodule, so we can build it with the same compiler options with which we build the tree. in this change * Revert "build: drop abseil submodule, replace with distribution abseil" * update CMake building system with abseil header include settings * bump up the abseil submodule to the latest LTS branch of abseil: lts_2024_01_16 * update scylla-gdb.py to adapt to the new structure of flat_hash_map This reverts commit `8635d24424`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18511	2024-05-05 23:31:09 +03:00
Kefu Chai	ea791919cf	service/storage_proxy: drop unused operator<< operator<<(ostream, paxos_response_handler) is not used anymore, so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18520	2024-05-05 16:33:29 +03:00
Nadav Har'El	21557cfaa6	cql3: Fix invalid JSON parsing for JSON object with different key types More than three years ago, in issue #7949, we noticed that trying to set a `map<ascii, int>` from JSON input (i.e., using INSERT JSON or the fromJson() function) fails - the ascii key is incorrectly parsed. We fixed that issue in commit `75109e9519` but unfortunately, did not do our due diligence: We did not write enough tests inspired by this bug, and failed to discover that actually we have the same bug for many other key types, not just for "ascii". Specifically, the following key types have exactly the same bug: * blob * date * inet * time * timestamp * timeuuid * uuid Other types, like numbers or boolean worked "by accident" - instead of parsing them as a normal string, we asked the JSON parser to parse them again after removing the quotes, and because unquoted numbers and unquoted true/false happen to work in JSON, this didn't fail. The fix here is very simple - for all native types (i.e., not collections or tuples), the encoding of the key in JSON is simply a quoted string - and removing the quotes is all we need to do and there's no need to run the JSON parser a second time. Only for more elaborate types - collections and tuples - we need to run the JSON parser a second time on the key string to build the more elaborate object. This patch also includes tests for fromJson() reading a map with all native key types, confirming that all the aforementioned key types were broken before this patch, and all key types (including the numbers and booleans which worked even before this patch) work with this patch. Fixes #18477. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18482	2024-05-05 15:42:43 +03:00
Kefu Chai	f2b1c47dfc	test/boost: s/boost::range::random_shuffle/std::ranges::shuffle/ `boost::range::random_shuffle()` uses the deprecated `std::random_shuffle()` under the hood, so let's use `std::ranges::shuffle()` which is available since C++20. this change should address the warning like: ``` [312/753] CXX build/debug/test/boost/counter_test.o In file included from test/boost/counter_test.cc:17: /usr/include/boost/range/algorithm/random_shuffle.hpp:106:13: warning: 'random_shuffle<__gnu_cxx::__normal_iterator<counter_shard , std::vector<counter_shard>>>' is deprecated: use 'std::shuffle' instead [-Wdepr ecated-declarations] 106 \| detail::random_shuffle(boost::begin(rng), boost::end(rng)); \| ^ test/boost/counter_test.cc:507:27: note: in instantiation of function template specialization 'boost::range::random_shuffle<std::vector<counter_shard>>' requested here 507 \| boost::range::random_shuffle(shards); \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_algo.h:4489:5: note: 'random_shuffle<__gnu_cxx::__normal_iterator<counter_shard , std::vector<counter_shard>>>' has been explicitly marked deprecated here 4489 \| _GLIBCXX14_DEPRECATED_SUGGEST("std::shuffle") \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:1957:45: note: expanded from macro '_GLIBCXX14_DEPRECATED_SUGGEST' 1957 \| # define _GLIBCXX14_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT) \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:1941:19: note: expanded from macro '_GLIBCXX_DEPRECATED_SUGGEST' 1941 \| __attribute__ ((__deprecated__ ("use '" ALT "' instead"))) \| ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18517	2024-05-05 15:39:57 +03:00
Pavel Emelyanov	99f9807f15	sstables: Remove operator<<(std::ostream&, const deletion_time&) It's completely unused, likely in favor of recently added formatter for the type in question. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18502	2024-05-05 14:43:27 +03:00
Pavel Emelyanov	ddd2623418	generic_server: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-03 12:29:08 +03:00
Pavel Emelyanov	a1daa7093e	generic_server: Coroutinize listen() method Straightforward. Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-03 12:28:42 +03:00
Pavel Emelyanov	030f1ef81c	generic_server: Rename creds argument to builder So that it doesn't clash with local creds variable that will appear in this method after its coroutinization. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-03 12:27:37 +03:00
Kefu Chai	53b98a8610	test: string_format_test: disable test if {fmt} >= 10.0.0 {fmt} v10.0.0 introduces formatter for `std::optional`, so there is no need to test it. furthermore the behavior of this formatter is different from our homebrew one. so let's skip this test if {fmt} v10.0.0 or up is used. Refs #18508 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18509	2024-05-03 11:34:23 +03:00
Kefu Chai	3421e6dcc1	tools/scylla-nodetool: add formatter for char* {fmt} version 10.0.0 has a regression which dropped the formatter for `char*`, even though it still formats `const char*`, as the latter is convertible to `fmt::string_view`. this issue was addressed in 10.1.0 using 616a4937, which adds the formatter for `Char*` back, where `Char` is a template parameter. but we do need to print `vector<char>`, so, to address the build failure with {fmt} version 10.0.0, which is shipped along with fedora 39, let's backport this formatter. Fixes #18503 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18505	2024-05-02 23:25:24 +03:00
Avi Kivity	8de81f8f91	Merge 'Unstall merge topology snapshot' from Benny Halevy This series adds facilities to gently convert canonical mutations back to mutations and to gently make canonical mutations or freeze mutations in a seastar thread. Those are used in storage_service::merge_topology_snapshot to prevent reactor stalls due to large mutations, as seen in the test_add_many_nodes_under_load dtest. Also, migration_manager migration_request was converted to use a seastar thread to use the above facilities to prevent reactor stalls with large schema mutations, e.g., with a large number of tables, and/or when reading tablets mutations with a large number of tablets in a table. perf-simple-query --write results: Before: ``` median 79151.53 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53289 insns/op, 0 errors) ``` After: ``` median 79716.73 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53314 insns/op, 0 errors) ``` Closes scylladb/scylladb#18290 * github.com:scylladb/scylladb: storage_proxy: add mutate_locally(vector<frozen_mutation_and_schema>) method raft: group0_state_machine: write_mutations_to_database: freeze mutations gently database: apply_in_memory: unfreeze_gently large mutations storage_service: get_system_mutations: make_canonical_mutation_gently tablets: read_tablet_mutations: make_canonical_mutation_gently schema_tables: convert_schema_to_mutations: make_canonical_mutation_gently schema_tables: redact_columns_for_missing_features: get input mutation using rvalue reference storage_service: merge_topology_snapshot: freeze_gently canonical_mutation: add make_canonical_mutation_gently frozen_mutation: move unfreeze_gently to async_utils mutation: add freeze_gently idl-compiler: generate async serialization functions for stub members raft: group0_state_machine: write_mutations_to_database: use to_mutation_gently storage_service: merge_topology_snapshot: co_await to_mutation_gently canonical_mutation: add to_mutation_gently idl-compiler: emit include directive in generated impl header file mutation_partition: add apply_gently collection_mutation: improve collection_mutation_view formatting mutation_partition: apply_monotonically: do not support schema upgrade test/perf: report also log_allocations/op	2024-05-02 23:24:38 +03:00
Nadav Har'El	f604269f0a	cql3, secondary index: consistently choose index to use in a query When a table has secondary indexes on multiple columns, and several such columns are used for filtering in a query, Scylla chooses one of these indexes as the main driver of the query, and the second column's restriction is implemented as filtering. Before this patch, the index to use was chosen fairly randomly, based on the order of the indexes in the schema. This order may be different in different coordinators, and may even change across restarts on the same coordinators. This is not only inconsistent, it can cause outright wrong results when using paging and switching (or restarting) coordinates in the middle of a paged scan... One coordinator saves one index's key in the paging state, and then the other coordinator gets this paging state and wrongly believes it is supposed to be a key of a different index. The fix in this patch is to pick the index suitable for the first indexed column mentioned in the query. This has two benefits over the situation before the patch: 1. The decision of which index to use no longer changes between coordinators or across restarts - it just depends on the schema and the specific query. 2. Different indexes can have different "specificity" so using one or the other can change the query's performance. After this patch, the user is in control over which index is used by changing the order of terms in the query. A curious user can use tracing to check which index was used to implement a particular query. An xfailing test we had for this issue no longer fails, so the "xfail" marker is removed. Fixes #7969 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#14450	2024-05-02 19:52:42 +02:00
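The selection rule described above ("the first indexed column mentioned in the query") can be illustrated with a small sketch. The function name and the use of plain column-name strings are assumptions for the example; the real code works on restrictions and index metadata.

```cpp
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// `restricted_columns` lists the restricted columns in the order they appear
// in the query. The result depends only on that order and on which columns
// are indexed, never on the iteration order of the index container.
std::optional<std::string> choose_index_column(
        const std::vector<std::string>& restricted_columns,
        const std::unordered_set<std::string>& indexed_columns) {
    for (const auto& column : restricted_columns) {
        if (indexed_columns.count(column) != 0) {
            return column;
        }
    }
    return std::nullopt; // no usable index: the query falls back to filtering
}
```

Because the choice follows the query text, every coordinator that receives the same query against the same schema picks the same column, and reordering the terms in the query changes which index drives it.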
Benny Halevy	890b890e36	storage_proxy: add mutate_locally(vector<frozen_mutation_and_schema>) method Generalizing the ad-hoc implementation out of group0_state_machine.write_mutations_to_database. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:42:58 +03:00
Benny Halevy	4ae5bbb058	raft: group0_state_machine: write_mutations_to_database: freeze mutations gently write_mutations_to_database might need to handle large mutations from system tables, so to prevent reactor stalls, freeze the mutations gently and call proxy.mutate_locally in parallel on the individual frozen mutations, rather than calling the vector<mutation> based entry point that eventually freezes each mutation synchronously. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:06 +03:00
Benny Halevy	a9f157b648	database: apply_in_memory: unfreeze_gently large mutations Prevent stalls coming from applying large mutations in memory synchronously, like the ones seen with the test_add_many_nodes_under_load dtest: ``` \| \| \| ++[5#2/2 44%] addr=0x1498efb total=256 count=3 avg=85: \| \| \| \| replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}::operator() at ./replica/memtable.cc:804 \| \| \| \| (inlined by) logalloc::allocating_section::with_reclaiming_disabled<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&> at ././utils/logalloc.hh:500 \| \| \| \| (inlined by) logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}>(logalloc::region&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&&)::{lambda()#1}::operator() at ././utils/logalloc.hh:527 \| \| \| \| (inlined by) logalloc::allocating_section::with_reserve<logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}>(logalloc::region&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&&)::{lambda()#1}> at ././utils/logalloc.hh:471 \| \| \| \| (inlined by) logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}> at ././utils/logalloc.hh:526 \| \| \| \| (inlined by) replica::memtable::apply(frozen_mutation const&, 
seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator() at ./replica/memtable.cc:800 \| \| \| \| (inlined by) with_allocator<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0> at ././utils/allocation_strategy.hh:318 \| \| \| \| (inlined by) replica::memtable::apply at ./replica/memtable.cc:799 \| \| \| ++[6#1/1 100%] addr=0x145047b total=1731 count=21 avg=82: \| \| \| \| replica::table::do_apply<frozen_mutation const&, seastar::lw_shared_ptr<schema const>&> at ./replica/table.cc:2896 \| \| \| ++[7#1/1 100%] addr=0x13ddccb total=2852 count=32 avg=89: \| \| \| \| replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0::operator() at ./replica/table.cc:2924 \| \| \| \| (inlined by) seastar::futurize<void>::invoke<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&> at ././seastar/include/seastar/core/future.hh:2032 \| \| \| \| (inlined by) seastar::futurize_invoke<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&> at ././seastar/include/seastar/core/future.hh:2066 \| \| \| \| (inlined by) replica::dirty_memory_manager_logalloc::region_group::run_when_memory_available<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0> at ./replica/dirty_memory_manager.hh:572 \| \| \| \| (inlined by) replica::table::apply at ./replica/table.cc:2923 \| \| \| ++ - 
addr=0x1330ba1: \| \| \| \| replica::database::apply_in_memory at ./replica/database.cc:1812 \| \| \| ++ - addr=0x1360054: \| \| \| \| replica::database::do_apply at ./replica/database.cc:2032 ``` This change has virtually no effect on small mutations (up to 128KB in size). build/release/scylla perf-simple-query --write --default-log-level=error --random-seed=1 -c 1 Before: median 80092.06 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53291 insns/op, 0 errors) After: median 78780.86 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53311 insns/op, 0 errors) To estimate the performance ramifications on large mutations, I measured perf-simple-query --write calling unfreeze_gently in all cases: median 77411.26 tps ( 71.3 allocs/op, 8.0 logallocs/op, 14.3 tasks/op, 53280 insns/op, 0 errors) Showing the allocations that moved out of logalloc (in memtable::apply of frozen_mutation) into seastar allocations (in unfreeze_gently) and <1% cpu overhead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:06 +03:00
Benny Halevy	7dd6a81026	storage_service: get_system_mutations: make_canonical_mutation_gently and also unfreeze_gently the result frozen_mutation:s to prevent the following stalls that were seen with the test_add_many_nodes_under_load dtest: ``` ++[1#1/58 5%] addr=0x16330e9 total=321 count=4 avg=80: \| utils::uleb64_express_encode_impl at ././utils/vle.hh:73 \| (inlined by) utils::uleb64_express_encode<void (&)(char const, unsigned long), void (&)(char const, unsigned long)> at ././utils/vle.hh:82 \| (inlined by) logalloc::region_impl::object_descriptor::encode at ./utils/logalloc.cc:1658 \| (inlined by) logalloc::region_impl::alloc_small at ./utils/logalloc.cc:1743 ++ - addr=0x1634cff: \| logalloc::region_impl::alloc at ./utils/logalloc.cc:2104 \| ++[2#1/2 83%] addr=0x116e22c total=321 count=4 avg=80: \| \| managed_bytes::managed_bytes at ././utils/managed_bytes.hh:552 \| \| ++[3#1/3 51%] addr=0x1551288 total=198 count=3 avg=66: \| \| \| compound_wrapper<clustering_key_prefix, clustering_key_prefix_view>::compound_wrapper at ././keys.hh:149 \| \| \| (inlined by) prefix_compound_wrapper<clustering_key_prefix, clustering_key_prefix_view, clustering_key_prefix>::prefix_compound_wrapper at ././keys.hh:574 \| \| \| (inlined by) clustering_key_prefix::clustering_key_prefix at ././keys.hh:865 \| \| \| (inlined by) rows_entry::rows_entry at ./mutation/mutation_partition.hh:957 \| \| ++ - addr=0x153f09f: \| \| \| allocation_strategy::construct<rows_entry, schema const&, position_in_partition_view&, seastar::bool_class<dummy_tag>&, seastar::bool_class<continuous_tag>&> at ././utils/allocation_strategy.hh:160 \| \| ++ - addr=0x151409a: \| \| \| mutation_partition::append_clustered_row at ./mutation/mutation_partition.cc:719 \| \| ++ - addr=0x14ab38f: \| \| \| partition_builder::accept_row at ././partition_builder.hh:57 \| \| \| ++[4#1/1 100%] addr=0x1579766 total=577 count=7 avg=82: \| \| \| \| mutation_partition_view::do_accept<partition_builder> at 
./mutation/mutation_partition_view.cc:212 \| \| \| ++[5#1/2 56%] addr=0x14e737c total=321 count=4 avg=80: \| \| \| \| frozen_mutation::unfreeze at ./mutation/frozen_mutation.cc:116 \| \| \| \| ++[6#1/1 100%] addr=0x24fb47e total=1476 count=18 avg=82: \| \| \| \| \| service::storage_service::get_system_mutations at ./service/storage_service.cc:6401 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:06 +03:00
Benny Halevy	3143f575e5	tablets: read_tablet_mutations: make_canonical_mutation_gently To prevent reactor stalls due to large tablets mutations (that can contain over 100,000 rows). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:06 +03:00
Benny Halevy	7f372dd9ae	schema_tables: convert_schema_to_mutations: make_canonical_mutation_gently To prevent stalls due to large schema mutations. While at it, reserve the result canonical_mutation vector. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:05 +03:00
Benny Halevy	61dea98185	schema_tables: redact_columns_for_missing_features: get input mutation using rvalue reference The function upgrades the input mutation only in certain cases. Currently it accepts the input mutation by value, which may cause an extraneous copy if the caller doesn't move the mutation, as done in `adjust_schema_for_schema_features`. Getting an rvalue reference instead makes the interface clearer. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:05 +03:00
Benny Halevy	bc1985b8ce	storage_service: merge_topology_snapshot: freeze_gently Freezing large mutations synchronously may cause reactor stalls, as seen in the test_add_many_nodes_under_load dtest: ``` ++[1#1/37 5%] addr=0x15b0bf total=99 count=2 avg=50: ?? ??:0 \| ++[2#1/2 67%] addr=0x15a331f total=66 count=1 avg=66: \| \| bytes_ostream::write at ././bytes_ostream.hh:248 \| \| (inlined by) bytes_ostream::write at ././bytes_ostream.hh:263 \| \| (inlined by) ser::serialize_integral<unsigned int, bytes_ostream> at ././serializer.hh:203 \| \| (inlined by) ser::integral_serializer<unsigned int>::write<bytes_ostream> at ././serializer.hh:217 \| \| (inlined by) ser::serialize<unsigned int, bytes_ostream> at ././serializer.hh:254 \| \| (inlined by) ser::writer_of_column<bytes_ostream>::write_id at ./build/dev/gen/idl/mutation.dist.impl.hh:4680 \| \| ++[3#1/1 100%] addr=0x159df71 total=132 count=2 avg=66: \| \| \| (anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}::operator() at ./mutation/mutation_partition_serializer.cc:99 \| \| \| (inlined by) row::maybe_invoke_with_hash<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1} const, cell_and_hash const> at ./mutation/mutation_partition.hh:133 \| \| \| (inlined by) row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}::operator() at ./mutation/mutation_partition.hh:152 \| \| \| (inlined by) 
compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>::operator() at ././utils/compact-radix-tree.hh:1888 \| \| \| (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::visit_slot<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>&> at ././utils/compact-radix-tree.hh:1560 \| \| ++ - addr=0x159d84d: \| \| \| compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>&> at ././utils/compact-radix-tree.hh:1364 \| \| \| (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned 
int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>&, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, 
unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> > at ././utils/compact-radix-tree.hh:799 \| \| \| (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, 
unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>&> at ././utils/compact-radix-tree.hh:807 \| \| ++[4#1/1 100%] addr=0x1596f4a total=329 count=5 avg=66: \| \| \| compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, 
cell_and_hash const&)#1}, true> > at ././utils/compact-radix-tree.hh:473 \| \| \| (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true> > at ././utils/compact-radix-tree.hh:1626 \| \| \| (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::walk<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}> at ././utils/compact-radix-tree.hh:1909 \| \| \| (inlined by) row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}> at ./mutation/mutation_partition.hh:151 \| \| \| (inlined by) (anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> > at ./mutation/mutation_partition_serializer.cc:97 \| \| \| (inlined by) write_row<ser::writer_of_deletable_row<bytes_ostream> > at ./mutation/mutation_partition_serializer.cc:168 \| \| ++[5#1/2 80%] addr=0x15a310c total=263 count=4 avg=66: \| \| \| mutation_partition_serializer::write_serialized<ser::writer_of_mutation_partition<bytes_ostream> > at ./mutation/mutation_partition_serializer.cc:180 \| \| \| ++[6#1/2 62%] addr=0x14eb60a total=428 count=7 avg=61: \| \| \| \| 
frozen_mutation::frozen_mutation(mutation const&)::$_0::operator()<ser::writer_of_mutation_partition<bytes_ostream> > at ./mutation/frozen_mutation.cc:85 \| \| \| \| (inlined by) ser::after_mutation__key<bytes_ostream>::partition<frozen_mutation::frozen_mutation(mutation const&)::$_0> at ./build/dev/gen/idl/mutation.dist.impl.hh:7058 \| \| \| \| (inlined by) frozen_mutation::frozen_mutation at ./mutation/frozen_mutation.cc:84 \| \| \| \| ++[7#1/1 100%] addr=0x14ed388 total=532 count=9 avg=59: \| \| \| \| \| freeze at ./mutation/frozen_mutation.cc:143 \| \| \| \| ++[8#1/2 74%] addr=0x252cf55 total=394 count=6 avg=66: \| \| \| \| \| service::storage_service::merge_topology_snapshot at ./service/storage_service.cc:763 ``` This change uses freeze_gently to freeze the cdc_generations_v2 mutations one at a time to prevent the stalls reported above. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:05 +03:00
Benny Halevy	a016e1d05d	canonical_mutation: add make_canonical_mutation_gently Make a canonical mutation gently using an async serialization function. Similar to freeze_gently, yielding is considered only in-between range tombstones and rows. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:04 +03:00
Benny Halevy	a126160d7e	frozen_mutation: move unfreeze_gently to async_utils Unfreeze_gently doesn't have to be a method of frozen_mutation. It might as well be implemented as a free function reading from a frozen_mutation and preparing a mutation gently. The logic will be used in a later patch to make a canonical mutation directly from a frozen_mutation instead of unfreezing it and then converting it to a canonical_mutation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:27:56 +03:00
Benny Halevy	aa27ef8811	mutation: add freeze_gently Allow yielding in between serializing of range tombstones and rows to prevent reactor stalls due to large mutations with many rows or range tombstones. Mutations that have many cells might still stall, but those are considered infrequent enough to ignore for now. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:27:56 +03:00
Benny Halevy	0da2940c72	idl-compiler: generate async serialization functions for stub members To be used in a following patch for e.g. mutation::freeze_gently. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:27:56 +03:00
Benny Halevy	504a9ab897	raft: group0_state_machine: write_mutations_to_database: use to_mutation_gently Prevent stalls coming from writing large mutations like the ones seen with the test_add_many_nodes_under_load dtest: ``` ++[1#11/11 6%] addr=0x15408f6 total=33 count=1 avg=33: \| managed_bytes::managed_bytes at ././utils/managed_bytes.hh:284 \| (inlined by) atomic_cell_or_collection::atomic_cell_or_collection at ./mutation/atomic_cell_or_collection.hh:25 \| (inlined by) cell_and_hash::cell_and_hash at ./mutation/mutation_partition.hh:73 \| (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::emplace<atomic_cell_or_collection, seastar::optimized_optional<cell_hash> > at ././utils/compact-radix-tree.hh:1809 ++ - addr=0x1518bae: \| row::append_cell at ./mutation/mutation_partition.cc:1344 ++ - addr=0x14acb23: \| partition_builder::accept_row_cell at ././partition_builder.hh:70 ++ - addr=0x157a6a6: \| mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor::accept_atomic_cell at ./mutation/mutation_partition_view.cc:218 \| (inlined by) (anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor::operator() at ./mutation/mutation_partition_view.cc:138 \| (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>::internal_visit<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>&> at /usr/include/boost/variant/variant.hpp:1028 \| (inlined by) 
boost::detail::variant::visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>, void, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type> > at /usr/include/boost/variant/detail/visitation_impl.hpp:117 \| (inlined by) boost::detail::variant::visitation_impl_invoke<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>, void, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::has_fallback_type_> at /usr/include/boost/variant/detail/visitation_impl.hpp:157 \| (inlined by) boost::detail::variant::visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<3l>, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, boost::mpl::l_item<mpl_::long_<2l>, ser::collection_cell_view, boost::mpl::l_item<mpl_::long_<1l>, ser::unknown_variant_type, boost::mpl::l_end> > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<(anonymous 
namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>, void, boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::has_fallback_type_> at /usr/include/boost/variant/detail/visitation_impl.hpp:238 \| (inlined by) boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::internal_apply_visitor_impl<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>, void> at /usr/include/boost/variant/variant.hpp:2337 \| (inlined by) boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::internal_apply_visitor<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false> > at /usr/include/boost/variant/variant.hpp:2349 \| (inlined by) boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, 
ser::unknown_variant_type>::apply_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const> at /usr/include/boost/variant/variant.hpp:2393 \| (inlined by) boost::apply_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor, boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>&> at /usr/include/boost/variant/detail/apply_visitor_unary.hpp:68 \| (inlined by) (anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor> at ./mutation/mutation_partition_view.cc:158 \| (inlined by) mutation_partition_view::do_accept<partition_builder> at ./mutation/mutation_partition_view.cc:224 ++ - addr=0x151234a: \| mutation_partition::apply at ./mutation/mutation_partition.cc:476 ++ - addr=0x14e1103: \| canonical_mutation::to_mutation at ./mutation/canonical_mutation.cc:76 ++ - addr=0x283f9ee: \| service::write_mutations_to_database at ./service/raft/group0_state_machine.cc:124 ++ - addr=0x283f36c: \| service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2::operator() at ./service/raft/group0_state_machine.cc:165 ++ - addr=0x28395e3: \| std::__invoke_impl<seastar::future<void>, seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, 
service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>, service::topology_change&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:61 \| (inlined by) std::__invoke<seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>, service::topology_change&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:96 \| (inlined by) std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<std::__detail::__variant::__deduce_visit_result<seastar::future<void> > (*)(seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>&&, std::variant<service::schema_change, service::broadcast_table_query, service::topology_change, service::write_mutations>&)>, std::integer_sequence<unsigned long, 2ul> >::__visit_invoke at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/variant:1032 \| (inlined by) std::__do_visit<std::__detail::__variant::__deduce_visit_result<seastar::future<void> >, 
seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>, std::variant<service::schema_change, service::broadcast_table_query, service::topology_change, service::write_mutations>&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/variant:1793 \| (inlined by) std::visit<seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>, std::variant<service::schema_change, service::broadcast_table_query, service::topology_change, service::write_mutations>&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/variant:1854 \| (inlined by) service::group0_state_machine::merge_and_apply at ./service/raft/group0_state_machine.cc:156 ++ - addr=0x284781e: \| service::group0_state_machine::apply at ./service/raft/group0_state_machine.cc:220 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:27:56 +03:00
Benny Halevy	574cb7d977	storage_service: merge_topology_snapshot: co_await to_mutation_gently Prevent stalls from "unpacking" of large canonical mutations seen with test_add_many_nodes_under_load when called from `group0_state_machine::transfer_snapshot`: ``` ++[1#1/44 14%] addr=0x395b2f total=569 count=6 avg=95: ?? ??:0 \| ++[2#1/2 56%] addr=0x3991e3 total=321 count=4 avg=80: ?? ??:0 \| ++ - addr=0x1587159: \| \| std::__new_allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> >::allocate at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/new_allocator.h:147 \| \| (inlined by) std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> >::allocate at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/allocator.h:198 \| \| (inlined by) std::allocator_traits<std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > >::allocate at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/alloc_traits.h:482 \| \| (inlined by) std::_Vector_base<seastar::basic_sstring<signed char, unsigned int, 31u, false>, std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > >::_M_allocate at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_vector.h:378 \| \| (inlined by) std::vector<seastar::basic_sstring<signed char, unsigned int, 31u, false>, std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > >::reserve at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/vector.tcc:79 \| \| (inlined by) ser::idl::serializers::internal::vector_serializer<std::vector<seastar::basic_sstring<signed char, unsigned int, 31u, false>, std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > > >::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ././serializer_impl.hh:226 \| \| (inlined by) 
ser::deserialize<std::vector<seastar::basic_sstring<signed char, unsigned int, 31u, false>, std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > >, seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ././serializer.hh:264 \| \| (inlined by) ser::serializer<clustering_key_prefix>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}::operator()<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ./build/dev/gen/idl/keys.dist.impl.hh:31 \| ++ - addr=0x1587085: \| \| seastar::with_serialized_stream<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>, ser::serializer<clustering_key_prefix>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}, void, void> at ././seastar/include/seastar/core/simple-stream.hh:646 \| \| (inlined by) ser::serializer<clustering_key_prefix>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ./build/dev/gen/idl/keys.dist.impl.hh:28 \| \| (inlined by) ser::deserialize<clustering_key_prefix, seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ././serializer.hh:264 \| \| (inlined by) ser::deletable_row_view::key() const::{lambda(auto:1&)#1}::operator()<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> const> at ./build/dev/gen/idl/mutation.dist.impl.hh:1268 \| \| ++[3#1/1 100%] addr=0x15865a3 total=577 count=7 avg=82: \| \| \| seastar::memory_input_stream<bytes_ostream::fragment_iterator>::with_stream<ser::deletable_row_view::key() const::{lambda(auto:1&)#1}> at ././seastar/include/seastar/core/simple-stream.hh:491 \| \| \| (inlined by) 
seastar::with_serialized_stream<seastar::memory_input_stream<bytes_ostream::fragment_iterator> const, ser::deletable_row_view::key() const::{lambda(auto:1&)#1}, void> at ././seastar/include/seastar/core/simple-stream.hh:639 \| \| \| (inlined by) ser::deletable_row_view::key at ./build/dev/gen/idl/mutation.dist.impl.hh:1264 \| \| ++[4#1/1 100%] addr=0x157cf27 total=643 count=8 avg=80: \| \| \| mutation_partition_view::do_accept<partition_builder> at ./mutation/mutation_partition_view.cc:212 \| \| ++ - addr=0x1516cac: \| \| \| mutation_partition::apply at ./mutation/mutation_partition.cc:497 \| \| ++[5#1/1 100%] addr=0x14e4433 total=1765 count=22 avg=80: \| \| \| canonical_mutation::to_mutation at ./mutation/canonical_mutation.cc:60 \| \| ++[6#1/2 98%] addr=0x2452a60 total=1732 count=21 avg=82: \| \| \| service::storage_service::merge_topology_snapshot at ./service/storage_service.cc:761 \| \| ++ - addr=0x2858782: \| \| \| service::group0_state_machine::transfer_snapshot at ./service/raft/group0_state_machine.cc:303 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:27:56 +03:00
Benny Halevy	c485ed6287	canonical_mutation: add to_mutation_gently to_mutation_gently generates a mutation from a canonical_mutation asynchronously using the newly introduced mutation_partition accept_gently method. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:27:54 +03:00
Benny Halevy	7f7e4616ab	idl-compiler: emit include directive in generated impl header file The generated implementation header file depends on the generated header file for the types it uses. Generate a respective #include directive to make it self-sufficient. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 18:50:16 +03:00
Benny Halevy	e1411f3911	mutation_partition: add apply_gently To be used for freezing mutations or making canonical mutations gently. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 18:45:24 +03:00
Benny Halevy	f625cd76a9	collection_mutation: improve collection_mutation_view formatting Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 18:42:41 +03:00
Benny Halevy	15e8ecb670	mutation_partition: apply_monotonically: do not support schema upgrade Currently, if the input mutation_partition requires a schema upgrade, apply_monotonically always silently reverts to being non-preemptible, even if the caller passed is_preemptible::yes. To prevent that from happening, put the burden of upgrading the mutation_partition schema on the caller, which is today the apply() methods, which are synchronous anyhow. With that, we reduce the proliferation of the `apply_monotonically` overloads and keep only the low level one (which could potentially be private as well, as it's called only from within the mutation/ source files and from tests) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 18:42:41 +03:00
Benny Halevy	e5ca65f78b	test/perf: report also log_allocations/op Currently perf-simple-query --write ignores log allocations that happen on the memtable apply path. This change adds tracking and accounting of the number of log allocations, and reporting thereof. For reference, here's the output of build/release/scylla perf-simple-query --write --default-log-level=error --random-seed=1 -c 1 ``` random-seed=1 enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=write, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction 78073.55 tps ( 59.4 allocs/op, 16.3 logallocs/op, 14.3 tasks/op, 52991 insns/op, 0 errors) 77263.59 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53282 insns/op, 0 errors) 79913.07 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53295 insns/op, 0 errors) 79554.32 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53284 insns/op, 0 errors) 79151.53 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53289 insns/op, 0 errors) median 79151.53 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53289 insns/op, 0 errors) median absolute deviation: 761.54 maximum: 79913.07 minimum: 77263.59 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 18:42:41 +03:00
Avi Kivity	e0d597348b	Merge 'Remove sstable_directory::_sstable_dir member' from Pavel Emelyanov Different sstable storage backends use slightly different notion of what sstable location is. Filesystem storage knows it's `/var/lib/data/ks/cf-uuid/state` path, while s3 storage keeps only this path's part without state (and even that's not very accurate, because bucket prefix is missing as well as "/var/lib/data" prefix is not needed and eventually should be omitted). Nonetheless, the sstable_directory still keeps the filesystem-like path, while it's really only needed by the filesystem lister. This PR removes it. Closes scylladb/scylladb#18496 * github.com:scylladb/scylladb: sstable_directory: Remove _sstable_dir member sstable_directory: Create sstable path with make_path() when logging sstable_directory: Use make_path to construct filesystem lister sstable_directory: Move some logging around	2024-05-02 17:52:21 +03:00
Patryk Jędrzejczak	b8e3bf4b09	topology_coordinator: clear obsolete generations earlier We want to clear CDC generations that are no longer needed (because all writes are already using a new generation) so they don't take space and are not sent during snapshot transfers (see e.g. scylladb/scylladb#17545). The condition used previously was that we clear generations which were closed (i.e., a new generation started at this time) more than 24h ago. This is a safe choice, but too conservative: we could easily end up with a large number of obsolete generations if we boot multiple nodes during 24h (which is especially easy to do with tablets.) Change this bound from 24h to `5s + ring_delay`. The choice is explained in a comment in the code. Also, prevent `test_cdc_generation_clearing` from being flaky by firing the `increase_cdc_generation_leeway` error injection on the server being the topology coordinator. Ref: scylladb/scylladb#17545	2024-05-02 12:46:33 +02:00
Patryk Jędrzejczak	f61c50baa4	test: test_raft_snapshot_request: improve the last assertion The last assertion in the test is very sensitive to changes. The constant has already been increased from 0 to 1 due to flakiness. The old comment explains it. In the following patch, we change the CDC generation publisher so that it clears the obsolete CDC generations earlier. This change would make this assertion flaky again. After restarting the servers, the new topology coordinator could remove the first generation if it became obsolete. This operation appends a new entry to the log. If it happened after triggering snapshot, the assertion could fail with `2 <= 1`. We could increase the constant again to unflake the test, but we had better improve it once and for all. We change the assertion so that it's not sensitive to changes in the code based on Raft. The explanation is in the new comment.	2024-05-02 12:46:33 +02:00
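
Several commits in this listing (aa27ef8811, a016e1d05d, c485ed6287) describe "gently" variants that offer to yield between rows and range tombstones, but not inside a single row. The pattern can be sketched in plain C++ as below. This is only an illustration under stated assumptions: `yield_budget` and `serialize_gently` are hypothetical names invented for this sketch, the yield is simulated with a counter, and none of it is ScyllaDB's or Seastar's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a reactor's preemption check (not Seastar's API).
// It counts how many times a yield point would have been taken.
struct yield_budget {
    std::size_t max_items_per_slice;  // items allowed before offering to yield
    std::size_t used = 0;             // items processed in the current slice
    std::size_t yields = 0;           // number of simulated yields so far

    // Called between items: records a yield once the slice is used up.
    void maybe_yield() {
        assert(max_items_per_slice > 0);
        if (++used >= max_items_per_slice) {
            used = 0;
            ++yields;  // a real reactor would run other tasks here
        }
    }
};

// Processes rows one at a time and offers to yield only between rows,
// so the work done for a single row is never interrupted.
template <typename Row, typename Sink>
void serialize_gently(const std::vector<Row>& rows, Sink&& sink, yield_budget& budget) {
    for (const auto& row : rows) {
        sink(row);
        budget.maybe_yield();
    }
}
```

Because the yield points sit between rows, a single row with very many cells is still processed in one uninterrupted step, which is the limitation that the freeze_gently commit message accepts for now.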

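
The perf-simple-query output quoted in commit e5ca65f78b reports a median and a median absolute deviation over five throughput samples. The sketch below recomputes both from the quoted tps values, assuming the standard definitions (median of the samples, then median of the absolute deviations from it). Whether the tool computes them exactly this way is an assumption, and the function names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty sample. Takes a copy so the caller's data keeps its order.
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Median absolute deviation: the median of |x - median(v)| over all samples.
double median_absolute_deviation(const std::vector<double>& v) {
    const double m = median(v);
    std::vector<double> deviations;
    deviations.reserve(v.size());
    for (double x : v) {
        deviations.push_back(std::fabs(x - m));
    }
    return median(deviations);
}
```

For the five samples in the commit message (78073.55, 77263.59, 79913.07, 79554.32, 79151.53) the sorted middle value is 79151.53, and the sorted absolute deviations are 0, 402.79, 761.54, 1077.98, 1887.94, whose middle value is 761.54. Both agree with the figures in the quoted output.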
42483 Commits