scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-07 15:33:15 +00:00

Author	SHA1	Message	Date
Mikołaj Grzebieluch	65cfb9b4e0	storage_service: skip wait_for_gossip_to_settle if topology changes are based on raft Waiting for gossip to settle slows down the bootstrap of the cluster. It is safe to disable it if the topology is based on Raft. Fixes scylladb/scylladb#16055 Closes scylladb/scylladb#17960	2024-04-20 17:56:51 +02:00
Kefu Chai	5ab527e669	main: do not echo parsed options when calling scylla interactively in `2f0f53ac`, we added logging of parsed command line options so that we can see how scylla is launched in case it fails to boot. but when scylla is called interactively in console. this echo is a little bit annoying. see following console session ```console $ scylla --help-loggers Scylla version 5.5.0~dev-0.20240419.3c9651adf297 with build-id 7dd6a110e608535e5c259a03548eda6517ab4bde starting ... command used: "./RelWithDebInfo/scylla --help-loggers" pid: 996503 parsed command line options: [help-loggers] Available loggers: BatchStatement LeveledManifest alter_keyspace alter_table ... ``` so in this change, we check if the stdin is associated with a terminal device, if that the case, we don't print the scylla version, parsed command line and pid. and the interactive session looks like: ```console $ scylla --help-loggers Available loggers: BatchStatement LeveledManifest alter_keyspace alter_table ``` no more distracting information printed. the original behavior can be tested like: ```console $ : \| ./RelWithDebInfo/scylla --help-loggers ``` assuming scylla is always launched with systemd, which connects stdin to /dev/null. see https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Logging%20and%20Standard%20Input/Output . so this behavior is preserved with this change. Refs #4203 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18309	2024-04-19 15:00:05 +03:00
Raphael S. Carvalho	223214439b	compaction: Disconsider active tables in the hourly compaction reevaluation This hourly reevaluation is there to help tablets that have very low write activity, which can go a long time without flushing a memtable, and it's important to reevaluate compaction as data can get expired. Today it can happen that we reevaluate a table that is being compacted actively, which is waste of cpu as the reevaluation will happen anyway when there are changes to sstable set. This waste can be amplified with a significant tablet count in a given shard. Eventually, we could make the revaluation time per table based on expiration histogram, but until we get there, let's avoid this waste by only reevaluating tables that are compaction idle for more than 1h. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18280	2024-04-19 14:33:40 +03:00
Kefu Chai	a5dae74aee	doc: update `nodetool setlogginglevel` sample output with most recent loggers list in order to reduce the confusion like: > I cannot find foobar in the list, is it supported? also, take this opportunity to use "console" instead of "shell" for rendering the code block. it's a better fit in this case. since we are using pygment for syntax highlighting, see https://pygments.org/docs/lexers/#pygments.lexers.shell.BashSessionLexer for details on the "console" lexer. and add a prompt before the command line, so that "console" lexer can render the command line and output better. also, add a note explaining that user should refer the output of `scylla` to see the list of logger classes. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18311	2024-04-19 13:25:39 +03:00
Kefu Chai	c04654e865	storage_service: capture this explicitly clang-19 complains with `-Wdeprecated-this-capture`: ``` /home/kefu/dev/scylladb/service/storage_service.cc:5837:22: error: implicit capture of 'this' with a capture default of '=' is deprecated [-Werror,-Wdeprecated-this-capture] 5837 \| auto* node = get_token_metadata().get_topology().find_node(dst.host); \| ^ /home/kefu/dev/scylladb/service/storage_service.cc:5830:44: note: add an explicit capture of 'this' to capture '*this' by reference 5830 \| co_await transit_tablet(table, token, [=] (const locator::tablet_map& tmap, api::timestamp_type write_timestamp) { \| ^ \| , this ``` since https://open-std.org/JTC1/SC22/WG21/docs/papers/2018/p0806r2.html was approved, see https://eel.is/c++draft/depr.capture.this. and newer versions of C++ compilers implemented it, so we need to capture `this` explicitly to be more standard compliant, and to be more future-proof in this regard. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18306	2024-04-19 10:05:57 +03:00
Kefu Chai	168ade72f8	treewide: replace formatter<std::string_view> with formatter<string_view> in in {fmt} before v10, it provides the specialization of `fmt::formatter<..>` for `std::string_view` as well as the specialization of `fmt::formatter<..>` for `fmt::string_view` which is an implementation builtin in {fmt} for compatibility of pre-C++17. and this type is used even if the code is compiled with C++ stadandard greater or equal to C++17. also, before v10, the `fmt::formatter<std::string_view>::format()` is defined so it accepts `std::string_view`. after v10, `fmt::formatter<std::string_view>` still exists, but it is now defined using `format_as()` machinery, so it's `format()` method does not actually accept `std::string_view`, it accepts `fmt::string_view`, as the former can be converted to `fmt::string_view`. this is why we can inherit from `fmt::formatter<std::string_view>` and use `formatter<std::string_view>::format(foo, ctx);` to implement the `format()` method with {fmt} v9, but we cannot do this with {fmt} v10, and we would have following compilation failure: ``` FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o /home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc /home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format' 254 \| return formatter<std::string_view>::format(it->second, ctx); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ /usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument 2759 \| FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const \| ^ ~~~~~~~~~~~~ ``` because the inherited `format()` method actually comes from `fmt::formatter<fmt::string_view>`. to reduce the confusion, in this change, we just inherit from `fmt::format<string_view>`, where `string_view` is actually `fmt::string_view`. this follows the document at https://fmt.dev/latest/api.html#formatting-user-defined-types, and since there is less indirection under the hood -- we do not use the specialization created by `FMT_FORMAT_AS` which inherit from `formatter<fmt::string_view>`, hopefully this can improve the compilation speed a little bit. also, this change addresses the build failure with {fmt} v10. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18299	2024-04-19 07:44:07 +03:00
Avi Kivity	6e487a49aa	Merge 'toolchain: support building an optimized clang' from Takuya ASADA This is complete version of #12786, since I take over the issue from @mykaul. Update from original version are: - Support ARM64 build (disable BOLT for now since it doesn't functioning) - Changed toolchain settings to get current scylla able to build (support WASM, etc) - Stop git clone scylladb repo, create git-archive of current scylla directory and import it - Update Clang to 17.0.6 - Save entire clang directory for BUILD mode, not just /usr/bin/clang binary - Implemented INSTALL_PREBUILT mode to install prebuilt image which built in BUILD mode Note that this patch drops cross-build support of frozen toolchain, since building clang and scylla multiple time in qemu-user-static will very slow, it's not usable. Instead, we should build the image for each architecture natively. ---- This is a different way attempting to combine building an optimized clang (using LTO, PGO and BOLT, based on compiling ScyllaDB) to dbuild. Per Avi's request, there are 3 options: skip this phase (which is the current default), build it and build + install it to the default path. Fixes: #10985 Fixes: scylladb/scylla-enterprise#2539 Closes scylladb/scylladb#17196 * github.com:scylladb/scylladb: toolchain: support building an optimized clang configure.py: add --build-dir option	2024-04-18 19:20:23 +00:00
Anna Stuchlik	a3481a4566	doc: document the system_auth_v2 feature This commit includes updates related to replacing system_auth with system_auth_v2. - The keyspace name system_auth is renamed to system_auth_v2. - The procedures are updated to account for system_auth_v2. - No longer required system_auth RF changes are removed from procedures. - The information is added that if the consistent topology updates feature was not enabled upon upgrade from 5.4, there are limitations or additional steps to do (depending on the procedure). The files with that kind of information are to be found in _common folders and included as needed. - The upgrade guide has been updated to reflect system_auth_v2 and related impacts. Closes scylladb/scylladb#18077	2024-04-18 18:33:49 +02:00
Kefu Chai	21b03d2ce3	topology_coordinator: remove unused variable when compiling the tree with clang-19, it complains: ``` /home/kefu/dev/scylladb/service/topology_coordinator.cc:1968:31: error: variable 'reject' set but not used [-Werror,-Wunused-but-set-variable] 1968 \| if (auto* reject = std::get_if<join_node_response_params::rejected>(&validation_result)) { \| ^ 1 error generated. ``` so, despite that we evaluate the assignment statement to see it evaluates to true or false, the compiler still believes that the variable is not used. probably, the value of the statement is not dependent on the value of the value being assigned. either way, let's use `std::holds_alternative<..>` instead of `std::get_if<..>`, to silence this warning, and the code is a little bit more compacted, in the sense of less tokens in the `if` statement. in order to be self-contained, we take the opportunity to include `<variant>` in this source file, as a function declared in this header is used. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18291	2024-04-18 18:04:56 +03:00
Amnon Heiman	e8410848a8	Update seastar submodule This patch updates the seastar submodule to get the latest safety patch for the metric layer. The latest patch allows manipulating metric_families early in the start-up process and is safer if someone chooses to aggregate summaries. * seastar f3058414...8fabb30a (4): > stall-analyser: improve stall pattern matching > TLS: Move background BYE handshake to engine::run_in_background > metrics.cc: Safer set_metric_family_configs > src/core/metrics.cc: handle SUMMARY add operator Closes scylladb/scylladb#18293	2024-04-18 18:02:28 +03:00
Tomasz Grabiec	393cb54c01	Merge 'Generalize tablet transition API calls' from Pavel Emelyanov Recently there had been added add_tablet_replica and del_tablet_replica API calls that copy big portion of the existing move_tablet API call's logic. This PR generalizes the common parts Closes scylladb/scylladb#18272 * github.com:scylladb/scylladb: tablets: Generalize transition mutations preparation tablets: Generalize tablet-already-in-transition check tablets: Generalize raft communications for tablet transition API calls tablets: Drop src vs dst equality check from move_tablet()	2024-04-18 14:42:10 +02:00
Anna Stuchlik	ad81f9f56a	doc: replace Scylla with ScyllaDB in Glossary This commit replaces "Scylla" with "ScyllaDB" on the Glossary page. The product has been rebranded as "ScyllaDB". Closes scylladb/scylladb#18296	2024-04-18 14:59:23 +03:00
Kamil Braun	9c2a836607	Revert "Merge 'Drain view_builder in generic drain' from ScyllaDB" This reverts commit `298a7fcbf2`, reversing changes made to `5cf53e670d`. The change made CI flaky. Fixes: scylladb/scylladb#18278	2024-04-18 11:50:41 +02:00
Aleksandr Bykov	e8833c6f2a	test: Kill coordinator during topology operation If coordinator node was killed, restarted, become not operatable during topology operation, new coordinator should be elected, operation should be aborted and cluster should be rolled back Error injection will be used to kill the coordinator before streaming starts Closes scylladb/scylladb#16197	2024-04-17 17:24:20 +02:00
Tomasz Grabiec	c6c8347493	migration_manager: Pull all of group0 state on repair Current code uses non-raft path to pull the schema, which violates group0 linearizability because the node will have latest schema but miss group0 updates of other system tables. In particular, system.tablets. This manifests as repair errors due to missing tablet_map for a given table when trying to access it. Tablet map is always created together with the table in the same group0 command. When a node is bootstrapping, repair calls sync_schema() to make sure local schema is up to date. This races with group0 catch up, and if sync_schema() wins, repair may fail on misssing tablet map. Fix by making sync_schema() do a group0 read barrier when in raft mode. Fixes #18002 Closes scylladb/scylladb#18175	2024-04-17 16:21:05 +02:00
Nadav Har'El	e78fc75323	Merge 'tools/scylla-nodetool: add doc link for getsstables and sstableinfo commands' from Botond Dénes Just like all the other commands already have it. These commands didn't have documentation at the point where they were implemented, hence the missing doc link. The links don't work yet, but they will work once we release 6.0 and the current master documentation is promoted to stable. Closes scylladb/scylladb#18147 * github.com:scylladb/scylladb: tools/scylla-nodetool: fix typo: Fore -> For tools/scylla-nodetool: add doc link for getsstables and sstableinfo commands	2024-04-17 15:15:56 +03:00
Asias He	642f9a1966	repair: Improve estimated_partitions to reduce memory usage Currently, we use the sum of the estimated_partitions from each participant node as the estimated_partitions for sstable produced by repair. This way, the estimated_partitions is the biggest possible number of partitions repair would write. Since repair will write only the difference between repair participant nodes, using the biggest possible estimation will overestimate the partitions written by repair, most of the time. The problem is that overestimated partitions makes the bloom filter consume more memory. It is observed that it causes OOM in the field. This patch changes the estimation to use a fraction of the average partitions per node instead of sum. It is still not a perfect estimation but it already improves memory usage significantly. Fixes #18140 Closes scylladb/scylladb#18141	2024-04-17 14:31:38 +03:00
Pavel Emelyanov	1b2cd56bcc	tablets: Generalize transition mutations preparation Tablet transition handlers prepare two mutations -- one for tablets table, that sets transition state, transition mode and few others; and another one for topology table that "activates" the tablet_migration state for topology coordinator. The latter is common to all three handlers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-17 12:01:51 +03:00
Pavel Emelyanov	3beccb8165	tablets: Generalize tablet-already-in-transition check Continuation of the previous patch -- there's a common sanity check of tablet transition API handlers, namely that this tablet is not in transition already. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-17 12:01:02 +03:00
Pavel Emelyanov	14923812ad	tablets: Generalize raft communications for tablet transition API calls There are three transition calls -- move, add replica and del replica -- and all three work similarly. In a loop they try to get guard for raft operation, then perform sanity checks on topology state, then prepare mutations and then try to apply them to raft. After the loop finishes all three wait for transition for the given tablet to complete. This patch generalizes the raft kicking loop and the transition completion waiting code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-17 11:59:03 +03:00
Pavel Emelyanov	c4d538320e	tablets: Drop src vs dst equality check from move_tablet() The code here looks like this if src.host == dst.host throw "Local migration not possible" if src == dst co_return; The 2nd check is apparently never satisfied -- if src == dst this means that src.host == dst.host and it should have thrown already Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-17 11:57:10 +03:00
Kefu Chai	e431e7dc16	test: paritioner_test: print using fmt::print() instead of using `operator<<`, use `fmt::print()` to format and print, so we can ditch the `operator<<`-based formatters. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18259	2024-04-17 07:13:20 +03:00
Kefu Chai	0ff28b2a2a	test: extract boost_test_print_type() into test_utils.hh since Boost.Test relies on operator<< or `boost_test_print_type()` to print the value of variables being compared, instead of defining the fallback formatter of `boost_test_print_type()` for each individual test, let's define it in `test/lib/test_utils.hh`, so that it can be shared across tests. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18260	2024-04-17 07:12:39 +03:00
Kefu Chai	2bb8e7c3c3	utils: include "seastarx.hh" in composite_abort_source.hh there is chance that `utils/small_vector.hh` does not include `using namespace seastar`, and even if it does, we should not rely on it. but if it does not, checkhh would fail. so let's include "seastarx.hh" in this header, so it is self-contained. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18265	2024-04-17 07:11:01 +03:00
David Garcia	6707bc673c	docs: update theme 1.7 Closes scylladb/scylladb#18252	2024-04-16 13:48:11 +02:00
Kamil Braun	eb9ba914a3	Merge 'Set dc and rack in gossiper when loaded from system.peers and load the ignored nodes state for replace' from Benny Halevy The problem this series solves is correctly ignoring DOWN nodes state when replacing a node. When a node is replaced and there are other nodes that are down, the replacing node is told to ignore those DOWN nodes using the `ignore_dead_nodes_for_replace` option. Since the replacing node is bootstrapping it starts with an empty system.peers table so it has no notion about any node state and it learns about all other nodes via gossip shadow round done in `storage_service::prepare_replacement_info`. Normally, since the DOWN nodes to ignore already joined the ring, the remaining node will have their endpoint state already in gossip, but if the whole cluster was restarted while those DOWN nodes did not start, the remaining nodes will only have a partial endpoint state from them, which is loaded from system.peers. Currently, the partial endpoint state contains only `HOST_ID` and `TOKENS`, and in particular it lacks `STATUS`, `DC`, and `RACK`. The first part of this series loads also `DC` and `RACK` from system.peers to make them available to the replacing node as they are crucial for building a correct replication map with network topology replication strategy. But still, without a `STATUS` those nodes are not considered as normal token owners yet, and they do not go through handle_state_normal which adds them to the topology and token_metadata. The second part of this series uses the endpoint state retrieved in the gossip shadow round to explicitly add the ignored nodes' state to topology (including dc and rack) and token_metadata (tokens) in `prepare_replacement_info`. If there are more DOWN nodes that are not explicitly ignored replace will fail (as it should). Fixes scylladb/scylladb#15787 Closes scylladb/scylladb#15788 * github.com:scylladb/scylladb: storage_service: join_token_ring: load ignored nodes state if replacing storage_service: replacement_info: return ignore_nodes state locator: host_id_or_endpoint: keep value as variant gms: endpoint_state: add getters for host_id, dc_rack, and tokens storage_service: topology_state_load: set local STATUS state using add_saved_endpoint gossiper: add_saved_endpoint: set dc and rack gossiper: add_saved_endpoint: fixup indentation gossiper: add_saved_endpoint: make host_id mandatory gossiper: add load_endpoint_state gossiper: start_gossiping: log local state	2024-04-16 10:27:36 +02:00
Pavel Emelyanov	2c3d6fe72f	storage_proxy: Simplify create_hint_sync_point() code It tries to call container().invoke_on_all() the hard way. Calling it directly is not possible, because there's no sharded::invoke_on_all() const overload Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18202	2024-04-16 07:26:06 +03:00
Nadav Har'El	a175e34375	cql-pytest: add instructions on how to get Cassandra The cql-pytest framework allows running tests also against Cassandra, but developers need to install Cassandra on their own because modern distributions such as Fedora no longer carry a Cassandra package. This patch adds clear and easy to follow (I think) instructions on how to download a pre-compiled Cassadra, or alternatively how to download and build Cassandra from source - and how either can be used with the test/cql-pytest/run-cassandra script. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18138	2024-04-16 07:23:36 +03:00
Botond Dénes	298a7fcbf2	Merge 'Drain view_builder in generic drain' from ScyllaDB For view builder draining there's dedicated deferred action in main while all other services that need to be drained do it via storage_service. The latter is to unify shutdown for services and to make `nodetool drain` drain everything, not just some part of those. This PR makes view builder drain look the same. As a side effect it also moves `mark_existing_views_as_built` from storage service to view builder and generalizes this marking code inside view builder itself. refs: #2737 refs: #2795 Closes scylladb/scylladb#16558 * github.com:scylladb/scylladb: storage_service: Drain view builder on drain too view_builder: Generalize mark_as_built(view_ptr) method view_builder: Move mark_existing_views_as_built from storage service storage_service: Add view_builder& reference main,cql_test_env: Move view_builder start up (and make unconditional)	2024-04-16 07:21:42 +03:00
Pavel Emelyanov	5cf53e670d	replica: Remove unused ex variable from table::take_snapshot Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18215	2024-04-16 07:16:38 +03:00
Pavel Emelyanov	f17c594d21	large_data_handler: If-less statistics increment The partitions_bigger_than_threshold is incremented only if the previous check detects that the partition exceeds a threshold by its size. It's done with an extra if, but it can be done without (explicit) condition as bool type is guaranteed by the standard to convert into integers as true = 1 and false = 0 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18217	2024-04-16 07:16:05 +03:00
Pavel Emelyanov	0f70d276d2	tools/scylla-sstable: Use shorter check is unordered_set contains a key Currentl code counts the number of keys in it just to see if this number is non-zero. Using .contains() method is better fit here Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18219	2024-04-16 07:14:48 +03:00
Pavel Emelyanov	1df7c2a0e9	topology_coordinator: Mark retake_node() const Runaway from `4d83a8c12c` Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18218	2024-04-16 07:13:07 +03:00
Pavel Emelyanov	05c4042511	api/lsa: Don't use database to perform invoke-on-all The sharded<database> is used as a invoke_in_all() method provider, there's no real need in database itself. Simple smp::invoke_on_all() would work just as good. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18221	2024-04-16 07:12:40 +03:00
Pavel Emelyanov	4a6291dce5	test/sstable: Use .handle_exception_type() shortcut Some tests want to ignore out_of_range exception in continuation and go the longer route for that Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18216	2024-04-16 07:11:35 +03:00
Pavel Emelyanov	1612aa01ca	cql3: Reserve vector with pk columns When constructing a vector with partition key data, the size of that vector is known beforehand Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18239	2024-04-16 07:06:07 +03:00
Pavel Emelyanov	f3edde7d2e	api: Qualify callback commitlog* argument with const There's a helper map-reducer that accepts a function to call on commitlog. All callers accumulate statistics with it, so the commitlog argument is const pointer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18238	2024-04-16 07:02:31 +03:00
Botond Dénes	162c9ad6f6	Merge 'gossiper: lock local endpoint when updating heart_beat' from Kamil Braun In testing, we've observed multiple cases where nodes would fail to observe updated application states of other nodes in gossiper. For example: - in scylladb/scylladb#16902, a node would finish bootstrapping and enter NORMAL state, propagating this information through gossiper. However, other nodes would never observe that the node entered NORMAL state, still thinking that it is in joining state. This would lead to further bad consequences down the line. - in scylladb/scylladb#15393, a node got stuck in bootstrap, waiting for schema versions to converge. Convergence would never be achieved and the test eventually timed out. The node was observing outdated schema state of some existing node in gossip. I created a test that would bootstrap 3 nodes, then wait until they all observe each other as NORMAL, with timeout. Unfortunately, thousands of runs of this test on different machines failed to reproduce the problem. After banging my head against the wall failing to reproduce, I decided to sprinkle randomized sleeps across multiple places in gossiper code and finally: the test started catching the problem in about 1 in 1000 runs. With additional logging and additional head-banging, I determined the root cause. The following scenario can happen, 2 nodes are sufficient, let's call them A and B: - Node B calls `add_local_application_state` to update its gossiper state, for example, to propagate its new NORMAL status. - `add_local_application_state` takes a copy of the endpoint_state, and updates the copy: ``` auto local_state = ep_state_before; for (auto& p : states) { auto& state = p.first; auto& value = p.second; value = versioned_value::clone_with_higher_version(value); local_state.add_application_state(state, value); } ``` `clone_with_higher_version` bumps `version` inside gms/version_generator.cc. - `add_local_application_state` calls `gossiper.replicate(...)` - `replicate` works in 2 phases to achieve exception safety: in first phase it copies the updated `local_state` to all shards into a separate map. In second phase the values from separate map are used to overwrite the endpoint_state map used for gossiping. Due to the cross-shard calls of the 1 phase, there is a yield before the second phase. During this yield* the following happens: - `gossiper::run()` loop on B executes and bumps node B's `heart_beat`. This uses the monotonic version_generator, so it uses a higher version then the ones we used for states added above. Let's call this new version X. Note that X is larger than the versions used by application_states added above. - now node B handles a SYN or ACK message from node A, creating an ACK or ACK2 message in response. This message contains: - old application states (NOT including the update described above, because `replicate` is still sleeping before phase 2), - but bumped heart_beat == X from `gossiper::run()` loop, and sends the message. - node A receives the message and remembers that the max version across all states (including heart_beat) of node B is X. This means that it will no longer request or apply states from node B with versions smaller than X. - `gossiper.replicate(...)` on B wakes up, and overwrites endpoint_state with the ones it saved in phase 1. In particular it reverts heart_beat back to smaller value, but the larger problem is that it saves updated application_states that use versions smaller than X. - now when node B sends the updated application_states in ACK or ACK2 message to node A, node A will ignore them, because their versions are smaller than X. Or node B will never send them, because whenever node A requests states from node B, it only requests states with versions > X. Either way, node A will fail to observe new states of node B. If I understand correctly, this is a regression introduced in `38c2347a3c`, which introduced a yield in `replicate`. Before that, the updated state would be saved atomically on shard 0, there could be no `heart_beat` bump in-between making a copy of the local state, updating it, and then saving it. With the description above, it's easy to make a consistent reproducer for the problem -- introduce a longer sleep in `add_local_application_state` before second phase of replicate, to increase the chance that gossiper loop will execute and bump heart_beat version during the yield. Further commit adds a test based on that. The fix is to bump the heart_beat under local endpoint lock, which is also taken by `replicate`. The PR also adds a regression test. Fixes: scylladb/scylladb#15393 Fixes: scylladb/scylladb#15602 Fixes: scylladb/scylladb#16668 Fixes: scylladb/scylladb#16902 Fixes: scylladb/scylladb#17493 Fixes: scylladb/scylladb#18118 Ref: scylladb/scylla-enterprise#3720 Closes scylladb/scylladb#18184 * github.com:scylladb/scylladb: test: reproducer for missing gossiper updates gossiper: lock local endpoint when updating heart_beat	2024-04-16 06:46:24 +03:00
Tzach Livyatan	289793d964	Update Driver root page The right term is Amazon DynamoDB not AWS DynamoDB See https://aws.amazon.com/dynamodb/ Closes scylladb/scylladb#18214	2024-04-16 06:41:28 +03:00
Beni Peled	223275b4d1	test.py: add the pytest junit_suite_name parameter By default the suitename in the junit files generated by pytest is named `pytest` for all suites instead of the suite, ex. `topology_experimental_raft` With this change, the junit files will use the real suitename This change doesn't affect the Test Report in Jenkins, but it raised part of the other task of publishing the test results to elasticsearch https://github.com/scylladb/scylla-pkg/pull/3950 where we parse the XMLs and we need the correct suitename Closes scylladb/scylladb#18172	2024-04-15 21:07:00 +03:00
Tomasz Grabiec	95d93c1668	Merge 'Extend tablet_transition_kind::rebuild to remove replicas' from Pavel Emelyanov When altering rf for a keyspace, all tablets in this ks may have less replicas. Part of this process is removing replicas from some node(s). This PR extends the tablets rebuild transition to handle this case by making pending_replica optional. fixes: #18176 Closes scylladb/scylladb#18203 * github.com:scylladb/scylladb: test: Tune up tablet-transition test to check del_replica api: Add method to delete replica from tablet tablet: Make pending replica optional	2024-04-15 21:01:03 +03:00
Pavel Emelyanov	c60639d582	sstables: Coroutinize drop_caches() method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18220	2024-04-15 17:22:59 +03:00
Pavel Emelyanov	b06b85c270	test: Tune up tablet-transition test to check del_replica For that the test case is modified to have 3 nodes and 2 replicas on start. Existing test cases are changed slightly in the way "from" host is detected. Also, the final check for data presense is modified to check that hosts in "replicas" have data and other hosts don't have it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-15 16:31:07 +03:00
Pavel Emelyanov	8bad828208	api: Add method to delete replica from tablet Copied from the add_replica counterpart TODO: Generalize common parts of move_tablet and add_\|del_tablet_replica Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-15 16:31:07 +03:00
Pavel Emelyanov	725b2863d2	tablet: Make pending replica optional Just like leaving replica could be optional when adding replica to tablet, the pending replica can be optional too if we're removing a replica from tablet Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-15 16:31:07 +03:00
Amnon Heiman	06dc56df01	Update seastar submodule Fixes scylladb/scylladb#18083 * seastar cd8a9133...f3058414 (18): > src/core/metrics.cc: rewrite set_metric_family_configs > include/seastar/core/metrics_api.hh: Revert d2929c2ade5bd0125a73d53280c82ae5da86218e > sstring: include <fmt/format.h> instead of <fmt/ostream.h> > seastar.cc: include used header > tls: include used header of <unordered_set> > docs: remove unused parameter from handle_connection function of echo-HTTP-server tutorial example > stall-analyser: use 0 for the default value of --width > http: Move parsed params and urls > scripts: use raw string to avoid invalid escape sequences > timed_out_error: add fmt::formatter for timed_out_error > scripts/stall-analyser: change default branch-threshold to 3% > scripts/stall-analyser: resolve string escape sequence warning > io_queue: Use static vector for fair groups too > io_queue: Use static vector to store fair queues > stall-analyser: add space around '=' in param list > stall-analyser: add a space between 'var: Type' in type annotation > stall-analyser: move variables closer to where they are used > memory: drop support for compilers that don't support aligned new Closes scylladb/scylladb#18235	2024-04-15 15:19:59 +02:00
Tomasz Grabiec	2ceef1d600	scripts: tablet-mon.py: Support for annotating tablets by table id Closes scylladb/scylladb#18225	2024-04-15 15:19:59 +02:00
Benny Halevy	655d624e01	storage_service: join_token_ring: load ignored nodes state if replacing When a node bootstraps or replaces a node after full cluster shutdown and restart, some nodes may be down. Existing nodes in the cluster load the down nodes TOKENS (and recently, in this series, also DC and RACK) from system.peers and then populate locator::topology and token_metadata accordingly with the down nodes' tokens in storage_service::join_cluster. However, a bootstrapping/replacing node has no persistent knowledge of the down nodes, and it learns about their existance only from gossip. But since the down nodes have unknown status, they never go through `handle_state_normal` (in gossiper mode) and therefore they are not accounted as normal token owners. This is handled by `topology_state_load`, but not with gossip-based node operations. This patch updates the ignored nodes (for replace) state in topology and token_metadata as if they were loaded from system tables, after calling `prepare_replacement_info` when raft topology changes are disabled, based on the endpoint_state retrieved in the shadow round initiated in prepare_replacement_info. Fixes scylladb/scylladb#15787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-14 15:45:55 +03:00
Benny Halevy	e4c3c07510	storage_service: replacement_info: return ignore_nodes state Instead of `parse_node_list` resolving host ids to inet_address let `prepare_replacement_info` get host_id_or_endpoint from parse_node_list and prepare `loaded_endpoint_state` for the ignored nodes so it can be used later by the callers. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-14 15:43:19 +03:00
Benny Halevy	7c2bd8dc34	locator: host_id_or_endpoint: keep value as variant Rather than allowing to keep both host_id and endpoint, keep only one of them and provide resolve functions that use the token_metadata to resolve the host_id into an inet_address or vice verse. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-14 15:25:50 +03:00

1 2 3 4 5 ...

42251 Commits