Override the methods that return the expected number of children and the job
size in repair tasks. With them, the get_progress method can return a more
precise progress value.
The cql_test_env has a virtual require_column_has_value() helper that better fits the cql_assertions crowd. Also, the helper in question duplicates some existing code, so it can be made shorter (and one helper of the table class gets removed afterwards)
Closes #15208
* github.com:scylladb/scylladb:
cql_assertions: Make permit from env
table: Remove find_partition_slow() helper
sstable_compaction_test: Do not re-decorate key
cql_test_env: Move .require_column_has_value
cql_test_env: Use table.find_row() shortcut
Motivation:
The user can bootstrap 3 different clusters and then connect them
(#14448). When these clusters start gossiping, their token rings will be
merged, but there will be 3 different group 0s among them, resulting in a
corrupted cluster.
We need to prevent such situations from happening in clusters which
don't use Raft-based topology.
-------
The gossiper service sets its group0 id on startup if it is stored in
`scylla_local`, or sets it while joining group0.
The node sends its group0_id (if it is set) when it tries to initiate a
gossip round. When a node receives gossip_digest_syn, it checks whether the
sender's group0 id equals the local one, and if not, the message is discarded.
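A minimal sketch of the discard rule described above; all names are illustrative (not Scylla's actual API), and the treatment of an unset id -- skipping the check -- is an assumption for mixed-version compatibility, not taken from the actual code:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the gossip SYN message carrying the new field.
struct gossip_digest_syn {
    std::optional<std::string> group0_id; // unset for nodes predating the patch
};

// Returns true when the incoming syn should be discarded: both sides carry a
// group0 id and the ids differ, i.e. the sender belongs to a different group 0.
bool should_discard_syn(const std::optional<std::string>& local_group0_id,
                        const gossip_digest_syn& syn) {
    if (!local_group0_id || !syn.group0_id) {
        return false; // assumed: no id on either side means no check
    }
    return *local_group0_id != *syn.group0_id;
}
```

Nodes that predate the patch send no group0_id, so under this sketch the check is skipped for them and mixed-version gossip keeps working.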
Fixes #14448
Performed manual tests with the following scenario:
1. set up a cluster of two nodes (one compiled with this patch and one without)
2. set up a new node
3. create a basic keyspace and table
4. execute simple select and insert queries
Tested 4 scenarios: the seed node was with or without this patch, and the third node was with or without this patch.
These tests didn't detect any errors.
Closes #15004
* github.com:scylladb/scylladb:
tests: raft: cluster of nodes with different group0 ids
gossip: add group0_id attribute to gossip_digest_syn
This field is about to be removed in newer seastar, so it
shouldn't be checked in scylla-gdb
(see also ae6fdf1599)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #15203
To call table::find_row() one needs to provide a permit. Tests have a
short and neat helper to create one from cql_test_env
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
These methods are only used by the public methods of this class and
its derived class "memtable_encoding_stats_collector".
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #15190
The is_partition_dead() local helper accepts a partition key argument and
decorates it. However, its caller gets the partition key from a decorated
key in the first place, and can just pass that along
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The require_column_has_value() helper finds the cell in three steps -- it
finds the partition, then the row, then the cell. The table class already has
a method to facilitate row finding by partition and clustering key
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The main goal of this PR is to stop cdc_generation_service from calling
system_keyspace::bootstrap_complete(). The reason it's there is that the
gen. service doesn't want to handle generations before the node has joined
the ring or after it was decommissioned. The cleanup is done with the help
of the storage_service->cdc_generation_service explicit dependency brought
back, and this, in turn, freed the raft and API code from the need to carry
a cdc gen. service reference around.
Closes #15047
* github.com:scylladb/scylladb:
cdc: Remove bootstrap state assertion from after_join()
cdc: Rework gen. service check for bootstrap state
api: Don't carry cdc gen. service over
storage_service: Use local cdc gen. service in join_cluster()
storage_service: Remove cdc gen. service from raft_state_monitor_fiber()
raft: Do not carry cdc gen. service over
storage_service: Use local cdc gen. service in topo calls
storage_service: Bring cdc_generation_service dependency back
The reproducer for #14448.
The test starts two nodes with different group0_ids. The second node
is restarted and tries to join the cluster consisting of the first node.
gossip_digest_syn message should be rejected by the first node, so
the second node will not be able to join the cluster.
This test uses repair-based node operations to make it easier.
If the second node successfully joins the cluster, their token metadata
will be merged and the repair service will allow decommissioning the second node.
If not, decommissioning the second node will fail with an exception
"zero replica after the removal" thrown by the repair service.
Gossiper service sets its group0 id on startup if it is stored in `scylla_local`
or sets it during joining group0.
Send group0_id (if it is set) when the node tries to initiate the gossip round.
When a node gets gossip_digest_syn it checks if its group0 id equals the local
one and if not, the message is discarded.
Fixes #14448.
Modeled after get_live_members_synchronized,
get_unreachable_members_synchronized calls
replicate_live_endpoints_on_change to synchronize
the state of unreachable_members on all shards.
Fixes #12261
Fixes #15088
Also, add rest_api unit test for those apis
Closes #15093
* github.com:scylladb/scylladb:
test: rest_api: add test_gossiper
gossiper: add get_unreachable_members_synchronized
As was described in the previous patch, this method is explicitly called
by storage service after updating the bootstrap state, so it's unneeded
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The legacy_handle_cdc_generation() checks if the node had bootstrapped
with the help of system_keyspace method. The former is called in two
cases -- on boot via cdc_generation_service::after_join() and via
gossiper on_...() notifications. The notifications, in turn, are set up
in the very same after_join().
The after_join(), in turn, is called from storage_service explicitly
after the bootstrap state is updated to be "complete", so the check for
the state in legacy_handle_...() seems unnecessary. However, there's
still the case when it may be stepped on -- decommission. When performed
it calls storage_service::leave_ring() which updates the bootstrap state
to be "needed", thus preventing the cdc gen. service from doing anything
inside gossiper's on_...() notifications.
It would be more correct to stop the cdc gen. service from handling gossiper
notifications by unsubscribing it, not by adding fragile implicit
dependencies on the bootstrap state.
Checks for sys.dist.ks in the legacy_handle_...() are kept in the form
of an on-internal-error. The system distributed keyspace is activated by
storage service even before the bootstrap state is updated and is
never deactivated, but it's anyway good to have this assertion.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a storage_service/cdc_streams_check_and_repair endpoint that
needs to provide cdc gen. service to call storage_service method on. Now
the latter has its own reference to the former and API can stop taking
care of that
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method in question accepts a cdc_generation_service ref argument from
main and cql_test_env, but storage service now has a local cdc gen.
service reference, so this argument and its propagation down the stack
can be removed
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a cdc_generation_service ref sitting on group0_state_machine, and
the only reason it's there is to call the storage_service::topology_...()
methods. Now that storage service can access the cdc gen. service on its
own, the raft code can forget about cdc
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The topology_state_load() and topology_transition() both take the cdc gen.
service as an argument, but can work with the local reference. This
makes the next patch possible
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This sort of reverts commit 5a97ba7121: storage service now uses the cdc
generation service to serve raft topology updates, which previously meant
carrying the cdc gen. service all over the raft code _just_ to pass it as
an argument to the storage service topology calls.
Also, the API was carrying the cdc gen. service for a single call, and
there's an implicit need to kick the cdc gen. service on decommission, which
also needs storage service to reference the cdc gen. service after boot is
complete
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When sending a mutation to a remote endpoint,
the selected endpoints must be in sync with
the current effective_replication_map.
Currently, the endpoints are sent down the storage_proxy
stack, and later on an effective_replication_map is retrieved
again; it might not match the target or pending endpoints,
similar to the case seen in https://github.com/scylladb/scylladb/issues/15138
The correct way is to carry the same effective replication map
used to select said endpoints and pass it down the stack.
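The idea can be sketched with hypothetical types (none of these match Scylla's actual API): the endpoints and the replication map they were computed from travel down the stack as one unit, so a later stage can never pair them with a different topology snapshot:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the effective replication map.
struct effective_replication_map { int version = 0; };
using erm_ptr = std::shared_ptr<const effective_replication_map>;

// Carries the map together with the endpoints selected from it.
struct write_context {
    erm_ptr erm;                      // the map used to select targets
    std::vector<std::string> targets; // endpoints computed from *erm
};

// Targets are computed from one specific map, which is kept alongside them.
write_context select_targets(erm_ptr erm) {
    return write_context{std::move(erm), {"n1", "n2", "n3"}};
}

// Downstream code consults the same map; no re-fetch that could return a
// newer, mismatched topology.
void send_to_live_endpoints(const write_context& ctx) {
    assert(ctx.erm != nullptr);
}
```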
See also https://github.com/scylladb/scylladb/pull/15141
Fixes scylladb/scylladb#15144
Fixes scylladb/scylladb#14730
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #15142
This function is too flaky with the 30-second timeout.
For example, the following was seen locally with
`test_updated_shards_during_add_decommission_node` in dev mode:
alternator_stream_tests.py::TestAlternatorStreams::test_updated_shards_during_add_decommission_node/node6.log:
```
INFO 2023-08-27 15:47:25,753 [shard 0] gossip - Waiting for 2 live nodes to show up in gossip, currently 1 present...
INFO 2023-08-27 15:47:30,754 [shard 0] gossip - (rate limiting dropped 498 similar messages) Waiting for 2 live nodes to show up in gossip, currently 1 present...
INFO 2023-08-27 15:47:35,761 [shard 0] gossip - (rate limiting dropped 495 similar messages) Waiting for 2 live nodes to show up in gossip, currently 1 present...
INFO 2023-08-27 15:47:40,766 [shard 0] gossip - (rate limiting dropped 498 similar messages) Waiting for 2 live nodes to show up in gossip, currently 1 present...
INFO 2023-08-27 15:47:45,768 [shard 0] gossip - (rate limiting dropped 497 similar messages) Waiting for 2 live nodes to show up in gossip, currently 1 present...
INFO 2023-08-27 15:47:50,768 [shard 0] gossip - (rate limiting dropped 497 similar messages) Waiting for 2 live nodes to show up in gossip, currently 1 present...
ERROR 2023-08-27 15:47:55,758 [shard 0] gossip - Timed out waiting for 2 live nodes to show up in gossip
INFO 2023-08-27 15:47:55,759 [shard 0] init - Shutting down group 0 service
```
alternator_stream_tests.py::TestAlternatorStreams::test_updated_shards_during_add_decommission_node/node1.log:
```
INFO 2023-08-27 15:48:02,532 [shard 0] gossip - InetAddress 127.0.43.6 is now UP, status = UNKNOWN
...
WARN 2023-08-27 15:48:03,552 [shard 0] gossip - failure_detector_loop: Send echo to node 127.0.43.6, status = failed: seastar::rpc::closed_error (connection is closed)
```
Note that node1 saw node6 as UP after node6 already timed out
and was shutting down.
Increase the timeout to 3 minutes in all modes to reduce flakiness.
Fixes #15185
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #15186
Permits added to `_ready_list` remain there until
executed by `execution_loop()`.
But `execution_loop()` exits when `_stopped == true`,
even though nothing prevents new permits from being added
to `_ready_list` after `stop()` sets `_stopped = true`.
Thus, if there are reads concurrent with `stop()`,
it's possible for a permit to be added to `_ready_list`
after `execution_loop()` has already quit. Such a permit will
never be destroyed, and `stop()` will forever block on
`_permit_gate.close()`.
A natural solution is to let `execution_loop()` exit only after
it's certain that `_ready_list` won't receive any new permits.
This is guaranteed by `_permit_gate.close()`. After this call completes,
it is certain that no permits *exist*.
After this patch, `execution_loop()` no longer looks at `_stopped`.
It only exits when `_ready_list_cv` breaks, and this is triggered
by `stop()` right after `_permit_gate.close()`.
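A minimal model of the fixed shutdown ordering, using plain C++ threads in place of seastar primitives; all names are illustrative, and the permit gate is represented only by the comment in stop():

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Illustrative model: execution_loop() never consults a _stopped flag; it
// exits only once stop() "breaks" the condition variable, which stop() does
// only after no new permits can appear.
struct semaphore_sketch {
    std::mutex m;
    std::condition_variable cv;
    std::deque<int> ready_list; // permits waiting to be executed
    bool broken = false;        // models _ready_list_cv being broken
    int executed = 0;

    void add_permit(int p) {
        { std::lock_guard<std::mutex> lk(m); ready_list.push_back(p); }
        cv.notify_one();
    }

    void execution_loop() {
        std::unique_lock<std::mutex> lk(m);
        for (;;) {
            cv.wait(lk, [&] { return broken || !ready_list.empty(); });
            while (!ready_list.empty()) { // drain everything that is ready
                ready_list.pop_front();
                ++executed;
            }
            if (broken) {
                return;                   // the only exit path
            }
        }
    }

    void stop(std::thread& loop) {
        // In the real code, _permit_gate.close() completes here first,
        // guaranteeing no permit exists and ready_list cannot grow anymore.
        { std::lock_guard<std::mutex> lk(m); broken = true; }
        cv.notify_one();
        loop.join();
    }
};
```

Because the loop drains `ready_list` before honoring `broken`, every permit added before stop() is executed, and nothing can leak after the gate is closed.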
Fixes #15198
Closes #15199
It is too early to require that all nodes in normal state
have a non-null host_id.
The assertion was added in 44c14f3e2b
but unfortunately there are several call sites where
we add the node as normal, but without a host_id
and we patch it in later on.
In the future we should be able to require that
once we identify nodes by host_id over gossiper
and in token_metadata.
Fixes scylladb/scylladb#15181
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #15184
`create_write_response_handler` on this path accepts
an `inet_address_vector_replica_set` that corresponds to the
effective_replication_map_ptr in the paxos_response_handler,
but currently the function retrieves a new
effective_replication_map_ptr
that may not hold all the said endpoints.
Fixes scylladb/scylladb#15138
Closes #15141
* github.com:scylladb/scylladb:
storage_proxy: create_write_response_handler: carry effective_replication_map_ptr from paxos_response_handler
storage_proxy: send_to_live_endpoints: throw on_internal_error if node not found
if the endpoint specified when creating a KEYSPACE is not found,
when flushing a memtable, we would throw an `std::out_of_range`
exception when looking up the client in `storage_manager::_s3_endpoints`
by the name of endpoint. and scylla would crash because of it. so
far, we don't have a good way to error out early. since the
storage option for keyspace is still experimental, we can live
with this, but it would be better if we could spot this error in the
logging messages when testing this feature.
also, in this change, `std::invalid_argument` is thrown instead of
`std::out_of_range`. it's more appropriate in this circumstance.
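A sketch of the changed lookup, with hypothetical names standing in for `storage_manager::_s3_endpoints` and its client type; the point is throwing a descriptive `std::invalid_argument` instead of letting `map::at()` raise a bare `std::out_of_range`:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the S3 client type.
struct client {};

// Look up the client for a named endpoint; an unknown endpoint name is a
// configuration error, so report it as std::invalid_argument with context
// rather than a bare std::out_of_range from map::at().
client& get_endpoint_client(std::map<std::string, client>& endpoints,
                            const std::string& name) {
    auto it = endpoints.find(name);
    if (it == endpoints.end()) {
        throw std::invalid_argument("unknown S3 endpoint: " + name);
    }
    return it->second;
}
```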
Refs #15074
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #15075
The effective_replication_map_ptr passed to
`create_write_response_handler` by `send_batchlog_mutation`
must be synchronized with the one used to calculate
_batchlog_endpoints to ensure they use the same topology.
Fixes scylladb/scylladb#15147
Closes #15149
* github.com:scylladb/scylladb:
storage_proxy: mutate_atomically_result: carry effective_replication_map down to create_write_response_handler
storage_proxy: mutate_atomically_result: keep schema of batchlog mutation in context
Since 75d1dd3a76
gossiper::convict will no longer call `mark_dead`
(e.g. when called from the failure detection loop
after a node is stopped following decommission)
and therefore the on_dead notification won't get called.
To make that explicit, if the node was alive before
remove_endpoint erased it from _live_endpoint,
and it has an endpoint_state, call the on_dead notifications.
These are important to clean up after the node is dead,
e.g. in storage_proxy::on_down, which cancels all
respective write handlers.
This is preferred over going through `mark_dead`, as the latter
marks the endpoint as unreachable, which is wrong in this
case since the node left the cluster.
Fixes #15178
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #15179
Disabling fstrim.timer was meant to avoid running fstrim on /var/lib/scylla
from both scylla-fstrim.timer and fstrim.timer, but fstrim.timer never
actually does that, since it only looks at fstab entries, not our systemd unit.
To run fstrim correctly on the rootfs and other filesystems not related to
scylla, we should stop disabling fstrim.timer.
Fixes #15176
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Closes #15177
seastar::format() just forwards the parameters to be formatted to
`fmt::format_to()`, which is able to format `std::string`, so there is
no need to cast the `std::string` instance to `std::string_view` for
formatting it.
in this change, the cast is dropped. simpler this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #15143
Add a class that handles log file browsing with the following features:
* mark: returns "a mark" to the current position of the log.
* wait_for: asynchronously checks if the log contains the given message.
* grep: returns a list of lines matching the regular expression in the log.
Add a new endpoint in `ManagerClient` to obtain the scylla logfile path.
Fixes #14782
Closes #14834
New file streaming for tablets will require integration with compaction
groups. So this patch introduces a way for streaming to take a storage
snapshot of a given tablet using its token range. The memtable is flushed
first, so all data of a tablet can be streamed through its sstables.
The interface is compaction group / tablet agnostic, but a user can
easily pick the data of a single tablet by using the range in the
tablet metadata for that tablet.
E.g.:
```
auto erm = table.get_effective_replication_map();
auto& tm = erm->get_token_metadata();
auto tablet_map = tm.tablets().get_tablet_map(table.schema()->id());
for (auto tid : tablet_map.tablet_ids()) {
    auto tr = tablet_map.get_token_range(tid);
    auto ssts = co_await table.take_storage_snapshot(tr);
    ...
}
```
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #15128
before this change, `checksummed_file_data_sink_impl` just inherits the
`data_sink_impl::flush()` from its parent class. but as a wrapper around
the underlying `_out` data_sink, this is not only an unusual design
decision in a layered design of an I/O system, but also could be
problematic. to be more specific, the typical user of `data_sink_impl`
is a `data_sink`, whose `flush()` member function is called when
the user of `data_sink` wants to ensure that the data sent to the sink
is pushed to the underlying storage / channel.
this in general works, as the typical user of `data_sink` is in turn
`output_stream`, which calls `data_sink.flush()` before closing the
`data_sink` with `data_sink.close()`. and the operating system will
eventually flush the data after application closes the corresponding
fd. to be more specific, almost none of the popular local filesystems
implement the file_operations.op, hence, it's safe even if the
`output_stream` does not flush the underlying data_sink after writing
to it. this is the use case when we write to sstables stored on a local
filesystem. but as explained above, if the data_sink is backed by a
network filesystem, a layered filesystem or a storage connected via
a buffered network device, then it is crucial to flush in a timely
manner, otherwise we could risk data loss if the application / machine /
network breaks when the data is considered persisted but it is
_not_!
but the `data_sink` returned by `client::make_upload_jumbo_sink` is
a little bit different. multipart upload is used under the hood, and
we have to finalize the upload once all the parts are uploaded by
calling `close()`. but if the caller fails / chooses to close the
sink before flushing it, the upload is aborted, and the partially
uploaded parts are deleted.
the default-implemented `checksummed_file_data_sink_impl::flush()`
breaks `upload_jumbo_sink` which is the `_out` data_sink being
wrapped by `checksummed_file_data_sink_impl`. as the `flush()`
calls are shortcircuited by the wrapper, the `close()` call
always aborts the upload. that's why the data and index components
just fail to upload with the S3 backend.
in this change, we just delegate the `flush()` call to the
wrapped sink.
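The delegation can be sketched as follows (hypothetical names; the real classes wrap seastar's data_sink machinery, which is not reproduced here):

```cpp
#include <cassert>
#include <memory>

// Hypothetical base sink; flush() pushes buffered data to the backing store.
struct sink {
    bool flushed = false;
    virtual void flush() { flushed = true; }
    virtual ~sink() = default;
};

// A checksumming wrapper around another sink. It must forward flush() to the
// wrapped sink instead of inheriting a default that short-circuits it;
// otherwise a multipart-upload sink never sees the flush before close() and
// aborts the upload.
struct checksumming_sink : sink {
    std::unique_ptr<sink> out;
    explicit checksumming_sink(std::unique_ptr<sink> s) : out(std::move(s)) {}
    void flush() override { out->flush(); } // delegate, don't swallow
};
```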
Fixes #15079
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #15134
It's possible that compaction task is preempted after completion and
before reevaluation, causing pending_tasks to be > 1.
Let's only exit the loop if there are no pending tasks, and also
reduce the 100ms sleep, which is an eternity for this test.
Fixes #14809.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #15059
`create_write_response_handler` on this path accepts
an `inet_address_vector_replica_set` that corresponds to the
effective_replication_map_ptr in the paxos_response_handler,
but currently the function retrieves a new
effective_replication_map_ptr
that may not hold all the said endpoints.
Fixes scylladb/scylladb#15138
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Modeled after get_live_members_synchronized,
get_unreachable_members_synchronized calls
replicate_live_endpoints_on_change to synchronize
the state of unreachable_members on all shards.
Fixes scylladb/scylladb#15088
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The effective_replication_map_ptr passed to
`create_write_response_handler` by `send_batchlog_mutation`
must be synchronized with the one used to calculate
_batchlog_endpoints to ensure they use the same topology.
Fixes scylladb/scylladb#15147
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The batchlog mutation is for system.batchlog.
Rather than looking the schema up in multiple places,
do that once and keep it in the context object.
It will be used in the next patch to get a respective
effective_replication_map_ptr.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>