scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 20:27:03 +00:00

Author	SHA1	Message	Date
Lakshmi Narayanan Sreethar	24064064e9	sstable_directory_test: fix generation in sstable_directory_test_table_scan_incomplete_sstables The testcase uses an sstable whose mutation key and the generation are owned by different shards. Due to this, when process_sstable_dir is called, the sstable gets loaded into a different shard than the one that was intended. This also means that the sstable and the sstable manager end up in different shards. The following patch will introduce a condition variable in sstables manager which will be signalled from the sstables. If the sstable and the sstable manager are in different shards, the signalling will cause the testcase to fail in debug mode with this error : "Promise task was set on shard x but made ready on shard y". So, fix it by supplying appropriate generation number owned by the same shard which owns the mutation key as well. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	69b2a127b0	sstable_datafile_test: add test to verify reclaimed components reload Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	54bb03cff8	sstables: support reloading reclaimed components Added support to reload components from which memory was previously reclaimed as the total memory of reclaimable components crossed a threshold. The implementation is kept simple as only the bloom filters are considered reclaimable for now. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	2340ab63c6	sstables_manager: add new intrusive set to track the reclaimed sstables The new set holds the sstables from where the memory has been reclaimed and is sorted in ascending order of the total memory reclaimed. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	140d8871e1	sstable: add link and comparator class to support new intrusive set Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	3ef2f79d14	sstable: renamed intrusive list link type Renamed the intrusive list link type to differentiate it from the set link type that will be added in an upcoming patch. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	02d272fdb3	sstable: track memory reclaimed from components per sstable Added a member variable _total_memory_reclaimed to the sstable class that tracks the total memory reclaimed from a sstable. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	a53af1f878	sstable: rename local variable in sstable::total_reclaimable_memory_size Renamed local variable in sstable::total_reclaimable_memory_size in preparation for the next patch which adds a new member variable _total_memory_reclaimed to the sstable class. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Kefu Chai	906700d523	test/nodetool: accept -1 returncode also when --help is invoked in newer seastar, 0 is returned as the returncode of the application when handling `--help`. to prepare for this behavior, let's accept it before updating the seastar submodule. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18574	2024-05-09 08:26:44 +03:00
Kefu Chai	6047b3b6aa	build: cmake: build async_utils.cc async_utils.cc was introduced in `e1411f39`, so let's update the cmake building system to build it. without which, we'd run into link failure like: ``` ld.lld: error: undefined symbol: to_mutation_gently(canonical_mutation const&, seastar::lw_shared_ptr<schema const>) >>> referenced by storage_service.cc >>> storage_service.cc.o:(service::storage_service::merge_topology_snapshot(service::raft_snapshot)) in archive service/Dev/libservice.a >>> referenced by group0_state_machine.cc >>> group0_state_machine.cc.o:(service::write_mutations_to_database(service::storage_proxy&, gms::inet_address, std::vector<canonical_mutation, std::allocator<canonical_mutation>>)) inarchive service/Dev/libservice.a >>> referenced by group0_state_machine.cc >>> group0_state_machine.cc.o:(service::write_mutations_to_database(service::storage_proxy&, gms::inet_address, std::vector<canonical_mutation, std::allocator<canonical_mutation>>) (.resume)) in archive service/Dev/libservice.a >>> referenced 1 more times ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18524	2024-05-09 08:26:44 +03:00
Kefu Chai	c336904722	build: cmake: mark abseil include SYSTEM this change is a followup of `0b0e661a`. it helps to ensure that the header files in abseil submodule have higher priority when the compiler includes abseil headers when building with CMake. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18523	2024-05-09 08:26:44 +03:00
Kefu Chai	2a9a874e19	db,service: fix typos in comments Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18567	2024-05-09 08:26:44 +03:00
Anna Stuchlik	65c8b81051	doc: add OS support in version 6.0 This commit adds OS support in version 6.0. In addition, it removes the information about version 5.2, as this version is no longer supported, according to our policy. Closes scylladb/scylladb#18562	2024-05-09 08:26:44 +03:00
Anna Stuchlik	74fb9808ed	doc: update Consistent Topology with Raft This PR: - Removes the `.. only:: opensource` directive from Consistent Topology with Raft. This feature is no longer an Open Source-only experimental feature. - Removes redundant version-specific information. - Moves the necessary version-specific information to a separate file. This is a follow-up to `55b011902e`. Refs https://github.com/scylladb/scylladb/pull/18285/ Closes scylladb/scylladb#18553	2024-05-09 08:26:44 +03:00
Calle Wilund	79d56ccaad	commitlog: Fix request_controller semaphore accounting. Fixes #18488 Due to the discrepancy between bytes added to CL and bytes written to disk (due to CRC sector overhead), we fail to account for the proper byte count when issuing account_memory_usage in allocate (using bytes added) and in cycle:s notify_memory_written (disk bytes written). This leads us to slowly, but surely, add to the semaphore all the time. Eventually rendering it useless. Also, terminate call would _not_ take any of this into account, and the chunk overhead there would cause a (smaller) discrepancy as well. Fix by simply ensuring that buffer alloc handles its byte usage, then accounting based on buffer position, not input byte size. Closes scylladb/scylladb#18489	2024-05-09 08:26:44 +03:00
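The accounting drift described in the commit above can be illustrated with a small model. This is only a sketch under stated assumptions: `Budget`, `run_cycles`, and the overhead value are hypothetical names and numbers for illustration, not the commitlog's actual classes or sizes. It charges the budget by bytes added and credits it by bytes written to disk, which is the mismatch the message describes.

```python
# Hypothetical model of the mismatch; not ScyllaDB's real request_controller code.
class Budget:
    """A counting budget that is charged on allocate and credited on write."""

    def __init__(self, units):
        self.initial = units
        self.available = units

    def consume(self, n):
        self.available -= n

    def signal(self, n):
        self.available += n


def run_cycles(budget, cycles, added_bytes, disk_overhead, fixed):
    """Return how far the budget drifted from its initial value.

    added_bytes   -- bytes the caller adds to the log per cycle
    disk_overhead -- extra bytes per cycle that only exist on disk
                     (an assumed stand-in for the CRC sector overhead)
    fixed         -- if True, charge by buffer position (added + overhead),
                     mirroring the idea of the fix
    """
    for _ in range(cycles):
        written = added_bytes + disk_overhead
        charged = written if fixed else added_bytes
        budget.consume(charged)   # accounting done at allocation time
        budget.signal(written)    # accounting done once data hits disk
    return budget.available - budget.initial


drift_before = run_cycles(Budget(1000), 10, 100, 4, fixed=False)
drift_after = run_cycles(Budget(1000), 10, 100, 4, fixed=True)
```

In the unfixed mode every cycle returns more units than it took, so the budget grows without bound and stops limiting anything; charging and crediting the same quantity keeps the drift at zero.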
Botond Dénes	155332ebf8	Merge 'Drain view_builder in generic drain (again)' from Pavel Emelyanov Some time ago #16558 was merged that moved view builder drain into generic drain. After this merge dtests started to fail from time to time, so the PR was reverted (see #18278). In #18295 the hang was found. View builder drain was moved from "before stopping messaging service to "after" it, and view update write handlers in proxy hanged for hard-coded timeout of 5 minutes without being aborted. Tests don't wait for 5 minutes and kill scylla, then complain about it and fail. This PR brings back the original PR as well as the necessary fix that cancels view update write handlers on stop. Closes scylladb/scylladb#18408 * github.com:scylladb/scylladb: Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB" view: Abort pending view updates when draining	2024-05-09 08:26:44 +03:00
Aleksandra Martyniuk	67bbaad62e	tasks: use default task_ttl in scylla.yaml Currently default task_ttl_in_seconds is 0, but scylla.yaml changes the value to 10. Change task_ttl_in_seconds in scylla.yaml to 0, so that there are consistent defaults. Comment it out. Fixes: #16714. Closes scylladb/scylladb#18495	2024-05-09 08:26:44 +03:00
Botond Dénes	0438febdc9	Merge 'alternator: fix REST API access to an Alternator LSI' from Nadav Har'El The name of the Scylla table backing an Alternator LSI looks like `basename:!lsiname`. Some REST API clients (including Scylla Manager) when they send a "!" character in the REST API request path may decide to "URL encode" it - convert it to `%21`. Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725) Scylla's REST API server forgets to do the URL decoding on the path part of the request, which leads to the REST API request failing to address the LSI table. The first patch in this PR fixes the bug by using a new Seastar API introduced in https://github.com/scylladb/seastar/pull/2125 that does the URL decoding as appropriate. The second patch in the PR is a new test for this bug, which fails without the fix, and passes afterwards. Fixes #5883. Closes scylladb/scylladb#18286 * github.com:scylladb/scylladb: test/alternator: test addressing LSI using REST API REST API: stop using deprecated, buggy, path parameter	2024-05-09 08:26:43 +03:00
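The percent-encoding problem in the merge above can be reproduced with Python's standard library. This is a sketch, not the Seastar or Scylla REST code: `lookup` and the one-element table set are hypothetical, and only the `basename:!lsiname` naming pattern comes from the commit message.

```python
from urllib.parse import quote, unquote

# Table name pattern quoted in the commit message.
TABLES = {"basename:!lsiname"}


def lookup(path_segment, decode):
    """Hypothetical handler: find a table by the name taken from the URL path."""
    name = unquote(path_segment) if decode else path_segment
    return name in TABLES


# A client that chooses to percent-encode "!" sends "%21" instead.
encoded_bang = quote("!", safe="")
sent_by_client = "basename:" + encoded_bang + "lsiname"

found_without_decoding = lookup(sent_by_client, decode=False)
found_with_decoding = lookup(sent_by_client, decode=True)
```

Without the decoding step the server compares the literal `%21` form against the stored name and misses the table; decoding the path segment first makes both spellings address the same table.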
Yaniv Michael Kaul	124064844f	docs/dev/object_stroage.md: convert example AWS keys to be more innocent Someone thought that they actually represent real keys (the 'EXAMPLE' in their name was not enough). Converted them to be as clear as can be, example data. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#18565	2024-05-09 08:26:43 +03:00
Asias He	46269a99d8	repair: Add ranges_parallelism option support for tablet The ranges_parallelism option is introduced in commit `9b3fd9407b`. Currently, this option works for vnode table repair only. This patch enables it for tablet repair, since it is useful for tablet repair too. Fixes #18383 Closes scylladb/scylladb#18385	2024-05-09 08:26:43 +03:00
Benny Halevy	0156e97560	storage_proxy: cas: reject for tablets-enabled tables Currently, LWT is not supported with tablets. In particular the interaction between paxos and tablet migration is not handled yet. Therefore, it is better to outright reject LWT queries for tablets-enabled tables rather than support them in a flaky way. This commit also marks tests that depend on LWT as expected to fail. Fixes scylladb/scylladb#18066 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#18103	2024-05-09 08:26:43 +03:00
Patryk Jędrzejczak	053a2893cf	raft topology: join_token_ring: prevent shutdown hangs Shutdown of a bootstrapping node could hang on `_topology_state_machine.event.when()` in `wait_for_topology_request_completion`. It caused scylladb/scylladb#17246 and scylladb/scylladb#17608. On a normal node, `wait_for_group0_stop` would prevent it, but this function won't be called before we join group 0. Solve it by adding a new subscriber to `_abort_source`. Additionally, trigger `_group0_as` to prevent other hang scenarios. Note that if both the new subscriber and `wait_for_group0_stop` are called, nothing will break. `abort_source::request_abort` and `conditional_variable::broken` can be called multiple times. The raft-based topology is moved out of experimental in 6.0, no need to backport the patch. Fixes scylladb/scylladb#17246 Fixes scylladb/scylladb#17608 Closes scylladb/scylladb#18549	2024-05-09 08:26:43 +03:00
Botond Dénes	96a7ed7efb	Merge 'sstables: add dead row count when issuing warning to system.large_partitions' from Ferenc Szili This is the second half of the fix for issue #13968. The first half is already merged with PR #18346 Scylla issues warnings for partitions containing more rows than a configured threshold. The warning is issued by inserting a row into the `system.large_partitions` table. This row contains the information about the partition for which the warning is issued: keyspace, table, sstable, partition key and size, compaction time and the number of rows in the partition. A previous PR #18346 also added range tombstone count to this row. This change adds a new counter for dead rows to the large_partitions table. This change also adds cluster feature protection for writing into these new counters. This is needed in case a cluster is in the process of being upgraded to this new version, after which an upgraded node writes data with the new schema into `system.large_partitions`, and finally a node is then rolled back to an old version. This node will then revert the schema to the old version, but the written sstables will still contain data with the new counters, causing any readers of this table to throw errors when they encounter these cells. This is an enhancement, and backporting is not needed. Fixes #13968 Closes scylladb/scylladb#18458 * github.com:scylladb/scylladb: sstable: added test for counting dead rows sstable: added docs for system.large_partitions.dead_rows sstable: added cluster feature for dead rows and range tombstones sstable: write dead_rows count to system.large_partitions sstable: added counter for dead rows	2024-05-09 08:26:43 +03:00
David Garcia	d63d418ae3	docs: change "create an issue" github label to "type/documentation" Closes scylladb/scylladb#18550	2024-05-09 08:26:43 +03:00
Kefu Chai	02be1e9309	.github: add clang-tidy workflow clang-tidy is a tool provided by Clang to perform static analysis on C++ source files. here, we are mostly interested in using its https://clang.llvm.org/extra/clang-tidy/checks/bugprone/use-after-move.html check to reveal the potential issues. this workflow is added to run clang-tidy when building the tree, so that the warnings from clang-tidy can be noticed by developers. a dedicated action is added so other github workflow can reuse it to setup the building environment in an ubuntu:jammy runner. clang-tidy-matcher.json is added to annotate the change, so that the warnings are more visible with github webpage. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18342	2024-05-09 08:26:43 +03:00
David Garcia	4a1b109641	docs: add swagger ui extension Renders the API Reference from api/api-doc using Swagger UI 2.2.10. address comments Closes scylladb/scylladb#18253	2024-05-09 08:26:43 +03:00
Kamil Braun	4dcae66380	Merge 'test: {auth,topology}: use manager.rolling_restart' from Piotr Dulikowski Instead of performing a rolling restart by calling `restart` in a loop over every node in the cluster, use the dedicated `manager.rolling_restart` function. This method waits until all other nodes see the currently processed node as up or down before proceeding to the next step. Not doing so may lead to surprising behavior. In particular, in scylladb/scylladb#18369, a test failed shortly after restarting three nodes. Because nodes were restarted one after another too fast, when the third node was restarted it didn't send a notification to the second node because it still didn't know that the second node was alive. This led the second node to notice that the third node restarted by observing that it incremented its generation in gossip (it restarted too fast to be marked as down by the failure detector). In turn, this caused the second node to send "third node down" and "third node up" notifications to the driver in a quick succession, causing it to drop and reestablish all connections to that node. However, this happened _after_ rolling upgrade finished and _after_ the test logic confirmed that all nodes were alive. When the notifications were sent to the driver, the test was executing some statements necessary for the test to pass - as they broke, the test failed. Fixes: scylladb/scylladb#18369 Closes scylladb/scylladb#18379 * github.com:scylladb/scylladb: test: get rid of server-side server_restart test: util: get rid of the `restart` helper test: {auth,topology}: use manager.rolling_restart	2024-05-08 09:45:08 +02:00
Piotr Dulikowski	180cb7a2b9	storage_service: notify lifecycle subs only after token metadata update Currently, in raft mode, when raft topology is reloaded from disk or a notification is received from gossip about an endpoint change, token metadata is updated accordingly. While updating token metadata we detect whether some nodes are joining or are leaving and we notify endpoint lifecycle subscribers if such an event occurs. These notifications are fired _before_ we finish updating token metadata and before the updated version is globally available. This behavior, for "node leaving" notifications specifically, was not present in legacy topology mode. Hinted handoff depends on token metadata being updated before it is notified about a leaving node (we had a similar issue before: scylladb/scylladb#5087, and we fixed it by enforcing this property). Because this is not true right now for raft mode, this causes the hint draining logic not to work properly - when a node leaves the cluster, there should be an attempt to send out hints for that node, but instead hints are not sent out and are kept on disk. In order to fix the issue with hints, postpone notifying endpoint lifecycle subscribers about joined and left nodes only after the final token metadata is computed and replicated to all shards. Fixes: scylladb/scylladb#17023 Closes scylladb/scylladb#18377	2024-05-08 09:40:44 +02:00
Kamil Braun	03818c4aa9	direct_failure_detector: increase ping timeout and make it tunable The direct failure detector design is simplistic. It sends pings sequentially and times out listeners that reached the threshold (i.e. didn't hear from a given endpoint for too long) in-between pings. Given the sequential nature, the previous ping must finish so the next ping can start. We timeout pings that take too long. The timeout was hardcoded and set to 300ms. This is too low for wide-area setups -- latencies across the Earth can indeed go up to 300ms. 3 subsequent timed out pings to a given node were sufficient for the Raft listener to "mark server as down" (the listener used a threshold of 1s). Increase the ping timeout to 600ms which should be enough even for pinging the opposite side of Earth, and make it tunable. Increase the Raft listener threshold from 1s to 2s. Without the increased threshold, one timed out ping would be enough to mark the server as down. Increasing it to 2s requires 3 timed out pings which makes it more robust in presence of transient network hiccups. In the future we'll most likely want to decrease the Raft listener threshold again, if we use Raft for data path -- so leader elections start quickly after leader failures. (Faster than 2s). To do that we'll have to improve the design of the direct failure detector. Ref: scylladb/scylladb#16410 Fixes: scylladb/scylladb#16607 --- I tested the change manually using `tc qdisc ... netem delay`, setting network delay on local setup to ~300ms with jitter. Without the change, the result is as observed in scylladb/scylladb#16410: interleaving ``` raft_group_registry - marking Raft server ... as dead for Raft groups raft_group_registry - marking Raft server ... as alive for Raft groups ``` happening once every few seconds. The "marking as dead" happens whenever we get 3 subsequent failed pings, which happens with certain (high) probability depending on the latency jitter. Then as soon as we get a successful ping, we mark server back as alive. With the change, the phenomenon no longer appears. Closes scylladb/scylladb#18443	2024-05-07 23:40:23 +02:00
Anna Stuchlik	98367cb6a1	doc: Snitch switch is not supported with tablets This commit adds the tablets-related limitation: if you use tablets, then changing snitch is not supported Refs:https://github.com/scylladb/scylladb/issues/17513 See: https://github.com/scylladb/scylladb/issues/17513#issuecomment-2022552677 Closes scylladb/scylladb#18548	2024-05-07 17:26:05 +02:00
Pavel Emelyanov	677e80a4d5	table: Coroutinize table::delete_sstables_atomically() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18499	2024-05-07 17:10:28 +02:00
Kamil Braun	53443f566a	Merge 'Coroutinize generic_server's listen() method' from Pavel Emelyanov It needs some local naming cleanup, but otherwise it's pretty simple Closes scylladb/scylladb#18510 * github.com:scylladb/scylladb: generic_server: Fix indentation after previous patch generic_server: Coroutinize listen() method generic_server: Rename creds argument to builder	2024-05-07 17:08:59 +02:00
Ferenc Szili	60bf846f68	sstable: added test for counting dead rows	2024-05-07 15:44:33 +02:00
Ferenc Szili	8e9771d010	sstable: added docs for system.large_partitions.dead_rows	2024-05-07 15:44:33 +02:00
Avi Kivity	9b8dfb2b19	compaction: compaction_strategy validation: don't rely on optional<> formatting std::optional formatting changed while moving from the home-grown formatter to the fmt provided formatter; don't rely on it for user visible messages. Here, the optional formatted is known to be engaged, so just print it. Closes scylladb/scylladb#18533	2024-05-07 12:02:33 +03:00
Kefu Chai	7e578ae964	message: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18527	2024-05-07 11:59:36 +03:00
Raphael S. Carvalho	570e3f8df0	compaction: exclude expired sstables from calculation of base timestamps base timestamps are fed into the sstable writer for calculating delta, used by varints. given that expired ssts are bypassed, we don't have to account them. so if we are compacting fully expired and new sstable together, we can save a bit by having a base ts closer to the data actually written into output. also I wanted to move the calculation into the loop in setup(), to avoid two iterations over input set that can have even more than 1k elements. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18504	2024-05-07 08:43:50 +03:00
Raphael S. Carvalho	2d9142250e	Fix flakiness in test_tablet_load_and_stream due to premature gossiper abort on shutdown Until https://github.com/scylladb/scylladb/issues/15356 is fixed, this will be handled by explicitly closing the connection, so if scylla fails to update gossiper state due to premature abort on shutdown, then we won't be stuck in an endless reconnection attempt (later through heartbeats (30s interval)), causing the test to timeout. Manifests in scylla logs like this: gossip - failure_detector_loop: Got error in the loop, live_nodes={127.147.5.10, 127.147.5.16}: seastar::sleep_aborted (Sleep is aborted) gossip - failure_detector_loop: Finished main loop migration_manager - stopping migration service storage_service - Shutting down native transport server gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested) cql_server_controller - CQL server stopped ... gossip - My status = NORMAL gossip - Announcing shutdown gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested) gossip - Sending a GossipShutdown to 127.147.5.10 with generation 1714449924 gossip - Sending a GossipShutdown to 127.147.5.16 with generation 1714449924 gossip - === Gossip round FAIL: seastar::abort_requested_exception (abort requested) Refs #14746. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18484	2024-05-07 02:31:02 +02:00
Piotr Dulikowski	5459cfed6a	Merge 'auth: don't run legacy migrations in auth-v2 mode' from Marcin Maliszkiewicz We won't run: - old pre auth-v1 migration code - code creating auth-v1 tables We will keep running: - code creating default rows - code creating auth-v1 keyspace (needed due to cqlsh legacy hack, it errors when executing `list roles` or `list users` if there is no system_auth keyspace, it does support case when there is no expected tables) Fixes https://github.com/scylladb/scylladb/issues/17737 Closes scylladb/scylladb#17939 * github.com:scylladb/scylladb: auth: don't run legacy migrations on auth-v2 startup auth: fix indent in password_authenticator::start auth: remove unused service::has_existing_legacy_users func	2024-05-06 19:53:35 +02:00
Wojciech Mitros	8472c46c8a	service_level_controller: coroutinize notify_service_level_removed To avoid conflicts arising from the discrepancy between different versions of the repository, use coroutines instead of continuations in service_level_controller::notify_service_level_removed(). Closes scylladb/scylladb#18525	2024-05-06 14:20:49 +03:00
Piotr Dulikowski	92e5018ddb	test: get rid of server-side server_restart Restarting a node amounts to just shutting it down and then starting again. There is no good reason to have a dedicated endpoint in the ScyllaClusterManager for restarting when it can be implemented by calling two endpoints in a sequence: stop and start - it's just code duplication. Remove the server_restart endpoint in ScyllaClusterManager and reimplement it as two endpoint calls in the ManagerClient.	2024-05-06 12:54:53 +02:00
Piotr Dulikowski	8de2bda7ae	test: util: get rid of the `restart` helper We already have `ManagerClient.server_restart`, which can be used in its place.	2024-05-06 12:24:40 +02:00
Piotr Dulikowski	897e603bf0	test: {auth,topology}: use manager.rolling_restart Instead of performing a rolling restart by calling `restart` in a loop over every node in the cluster, use the dedicated `manager.rolling_restart` function. This method waits until all other nodes see the currently processed node as up or down before proceeding to the next step. Not doing so may lead to surprising behavior. In particular, in scylladb/scylladb#18369, a test failed shortly after restarting three nodes. Because nodes were restarted one after another too fast, when the third node was restarted it didn't send a notification to the second node because it still didn't know that the second node was alive. This led the second node to notice that the third node restarted by observing that it incremented its generation in gossip (it restarted too fast to be marked as down by the failure detector). In turn, this caused the second node to send "third node down" and "third node up" notifications to the driver in a quick succession, causing it to drop and reestablish all connections to that node. However, this happened _after_ rolling upgrade finished and _after_ the test logic confirmed that all nodes were alive. When the notifications were sent to the driver, the test was executing some statements necessary for the test to pass - as they broke, the test failed. Fixes: scylladb/scylladb#18369	2024-05-06 12:24:40 +02:00
Kamil Braun	ccbb9f5343	Merge 'topology_coordinator: clear obsolete generations earlier' from Patryk Jędrzejczak We want to clear CDC generations that are no longer needed (because all writes are already using a new generation) so they don't take space and are not sent during snapshot transfers (see e.g. https://github.com/scylladb/scylladb/issues/17545). The condition used previously was that we clear generations which were closed (i.e., a new generation started at this time) more than 24h ago. This is a safe choice, but too conservative: we could easily end up with a large number of obsolete generations if we boot multiple nodes during 24h (which is especially easy to do with tablets.) Change this bound from 24h to `5s + ring_delay`. The choice is explained in a comment in the code. Additionally, improve `test_raft_snapshot_request` that would become flaky after the change so it's not sensitive to changes anymore. The raft-based topology was experimental before 6.0, no need to backport. Ref: scylladb/scylladb#17545 Closes scylladb/scylladb#18497 * github.com:scylladb/scylladb: topology_coordinator: clear obsolete generations earlier test: test_raft_snapshot_request: improve the last assertion test: test_raft_snapshot_request: find raft leader after restart test: test_raft_shanpshot_request: simplify appended_command	2024-05-06 12:03:33 +02:00
Kamil Braun	1a50a524e7	Merge 'topology_coordinator: compute cluster size correctly during upgrade' from Piotr Dulikowski During upgrade to raft topology, information about service levels is copied from the legacy tables in system_distributed to the raft-managed tables of group 0. system_distributed has RF=3, so if the cluster has only one or two nodes we should use lower consistency level than ALL - and the current procedure does exactly that, it selects QUORUM in case of two nodes and ONE in case of only one node. The cluster size is determined based on the call to _gossiper.num_endpoints(). Despite its name, gossiper::num_endpoints() does not necessarily return the number of nodes in the cluster but rather the number of endpoint states in gossiper (this behavior is documented in a comment near the declaration of this function). In some cases, e.g. after gossiper-based nodetool remove, the state might be kept for some time after removal (3 days in this case). The consequence of this is that gossiper::num_endpoints() might return more than the current number of nodes during upgrade, and that in turn might cause migration of data from one table to another to fail - causing the upgrade procedure to get stuck if there is only 1 or two nodes in the cluster. In order to fix this, use token_metadata::get_all_endpoints() as a measure of the cluster size. Fixes: scylladb/scylladb#18198 Closes scylladb/scylladb#18261 * github.com:scylladb/scylladb: test: topology: test that upgrade succeeds after recent removal topology_coordinator: compute cluster size correctly during upgrade	2024-05-06 11:06:09 +02:00
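The effect of overcounting the cluster size, as described in the merge above, can be sketched as follows. The selection rule (ONE for one node, QUORUM for two, ALL otherwise) and RF=3 come from the commit message; the function names and the replica arithmetic (QUORUM = RF // 2 + 1, ALL = RF) are assumptions made for illustration and are not taken from the topology coordinator's code.

```python
def pick_consistency_level(cluster_size):
    """Selection rule as stated in the commit message."""
    if cluster_size == 1:
        return "ONE"
    if cluster_size == 2:
        return "QUORUM"
    return "ALL"


def replicas_required(level, rf):
    """Assumed Cassandra-style replica counts for each level."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def migration_can_succeed(counted_nodes, live_nodes, rf=3):
    """True if the level picked from the counted size is reachable
    with the nodes that actually exist."""
    level = pick_consistency_level(counted_nodes)
    return replicas_required(level, rf) <= live_nodes


# Two real nodes, counted correctly: QUORUM needs 2 replicas, 2 are there.
ok_with_correct_count = migration_can_succeed(counted_nodes=2, live_nodes=2)

# Two real nodes, but a stale endpoint state inflates the count to 3:
# ALL is picked, which needs 3 replicas while only 2 nodes exist.
ok_with_stale_count = migration_can_succeed(counted_nodes=3, live_nodes=2)
```

Under these assumptions the stale count selects a level that the real cluster cannot satisfy, which matches the stuck upgrade the message reports for clusters of one or two nodes.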
Piotr Dulikowski	64ba620dc2	Merge 'hinted handoff: Use host IDs instead of IPs in the module' from Dawid Mędrek This pull request introduces host ID in the Hinted Handoff module. Nodes are now identified by their host IDs instead of their IPs. The conversion occurs on the boundary between the module and `storage_proxy.hh`, but aside from that, IPs have been erased. The changes take into considerations that there might still be old hints, still identified by IPs, on disk – at start-up, we map them to host IDs if it's possible so that they're not lost. Refs scylladb/scylladb#6403 Fixes scylladb/scylladb#12278 Closes scylladb/scylladb#15567 * github.com:scylladb/scylladb: docs: Update Hinted Handoff documentation db/hints: Add endpoint_downtime_not_bigger_than() db/hints: Migrate hinted handoff when cluster feature is enabled db/hints: Handle arbitrary directories in resource manager db/hints: Start using hint_directory_manager db/hints: Enforce providing IP in get_ep_manager() db/hints: Introduce hint_directory_manager db/hints/resource_manager: Update function description db/hints: Coroutinize space_watchdog::scan_one_ep_dir() db/hints: Expose update lock of space watchdog db/hints: Add function for migrating hint directories to host ID db/hints: Take both IP and host ID when storing hints db/hints: Prepare initializing endpoint managers for migrating from IP to host ID db/hints: Migrate to locator::host_id db/hints: Remove noexcept in do_send_one_mutation() service: Add locator::host_id to on_leave_cluster service: Fix indentation db/hints: Fix indentation	2024-05-06 09:58:18 +02:00
Patryk Jędrzejczak	628d7e709e	cdc: generation: fix retrieve_generation_data_v2 `system_keyspace::read_cdc_generation_opt` queries `system.cdc_generations_v3`, which stores ids of CDC generations as timeuuids. This function shouldn't be called with a normal uuid (used by `system.cdc_generations_v2` to store generation ids). Such a call would end with a marshaling error. Before this patch,`retrieve_generation_data_v2` could call `system_keyspace::read_cdc_generation_opt` with a normal uuid if the generation wasn't present in `system.cdc_generations_v2`. This logic caused a marshaling error while handling the `check_and_repair_cdc_streams` request in the `cdc_test.TestCdc.test_check_and_repair_cdc_streams_liveness` dtest. This patch fixes the code being added in 6.0, no need to backport it. Fixes scylladb/scylladb#18473 Closes scylladb/scylladb#18483	2024-05-06 09:12:47 +02:00
Kamil Braun	16846bf5ce	Merge 'Do not serialize removenode operation with api lock if topology over raft is enabled' from Gleb With topology over raft all operation are already serialized by the coordinator anyway, so no need to synchronize removenode using api lock. All others are still synchronized since there cannot be executed in parallel for the same node anyway. * 'gleb/17681-fix' of github.com:scylladb/scylla-dev: storage_service: do not take API lock for removenode operation if topology coordinator is enabled test: return file mark from wait_for that points after the found string	2024-05-06 09:03:03 +02:00
Benny Halevy	ebff5f5d70	everywhere: include seastar headers using angle brackets seastar is an external library therefore it should use the system-include syntax. Closes scylladb/scylladb#18513	2024-05-06 10:00:31 +03:00
Kefu Chai	5ca9a46a91	test/lib: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18515	2024-05-05 23:31:48 +03:00

42525 Commits