scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Benny Halevy	53fdf75cf9	repair: pass erm down to get_hosts_participating_in_repair and get_neighbors Now that it is available in repair_info. Fixes #11993 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:30 +02:00
Benny Halevy	b69be61f41	repair: pass effective_replication_map down to repair_info And make sure the token_metadata ring version is same as the reference one (from the erm on shard 0), when starting the repair on each shard. Refs #11993 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:29 +02:00
Benny Halevy	c47d36b53d	repair: coroutinize sync_data_using_repair Prepare for the next patch that will co_await make_global_effective_replication_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:04 +02:00
Benny Halevy	58b1c17f5d	repair: futurize do_repair_start Turn it into a coroutine to prepare for the next patch that will co_await make_global_effective_replication_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:04 +02:00
Benny Halevy	d6b2124903	repair: sync_data_using_repair: require to run on shard 0 And with that do_sync_data_using_repair can be folded into sync_data_using_repair. This will simplify using the effective_replication_map throughout the operation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	0c56c75cf8	repair: require all node operations to be called on shard 0 To simplify use of the effective_replication_map / token_metadata_ptr throughout the operation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	64b0756adc	repair: repair_info: keep effective_replication_map Sampled when repair info is constructed. To be used throughout the repair process. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	c7d753cd44	repair: do_repair_start: use keyspace erm to get keyspace local ranges Rather than calling db.get_keyspace_local_ranges that looks up the keyspace and its erm again. We want all the information derived from the erm to be based on the same source. The function is synchronous so this change doesn't fix anything, it just cleans up the code. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	aaf74776c2	repair: do_repair_start: use keyspace erm for get_primary_ranges Ensure that the primary ranges are in sync with the keyspace erm. The function is synchronous so this change doesn't fix anything, it just cleans up the code. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	9200e6b005	repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc Ensure the erm and topology are in sync. The function is synchronous so this change doesn't fix anything, just cleans up the code. Fix mistake in comment while at it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:57:56 +02:00
Benny Halevy	59dc2567fd	repair: do_repair_start: check_in_shutdown first Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Benny Halevy	881eb0df83	repair: get_db().local() where needed In several places we get the sharded database using get_db() and then we only use db.local(). Simplify the code by keeping reference only to the local database upfront. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Benny Halevy	c22c4c8527	repair: get topology from erm/token_metadata_ptr We want the topology to be synchronized with the respective effective_replication_map / token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Aleksandra Martyniuk	f2fe586f03	repair: check shutdown with abort source in repair module In repair module the shutdown can be checked using abort_source. Thus, we can get rid of shutdown flag.	2022-10-31 10:57:29 +01:00
Aleksandra Martyniuk	2d878cc9b5	repair: use generic module gate for repair module operations Repair module uses a gate to prevent starting new tasks on shutdown. Generic module's gate serves the same purpose, thus we can use it also in repair specific context.	2022-10-31 10:56:36 +01:00
Aleksandra Martyniuk	4aae7e9026	repair: move tracker to repair module Since both tracker and repair_module serve similar purpose, it is confusing where we should seek for methods connected to them. Thus, to make it more transparent, tracker class is deleted and all its attributes and methods are moved to repair_module.	2022-10-31 10:55:36 +01:00
Aleksandra Martyniuk	a5c05dcb60	repair: move next_repair_command to repair_module Number of the repair operation was counted both with next_repair_command from tracker and sequence number from task_manager::module. To get rid of redundancy next_repair_command was deleted and all methods using its value were moved to repair_module.	2022-10-31 10:54:39 +01:00
Aleksandra Martyniuk	c81260fb8b	repair: generate repair id in repair module repair_uniq_id for repair task can be generated in repair module and accessed from the task.	2022-10-31 10:54:24 +01:00
Aleksandra Martyniuk	6432a26ccf	repair: keep shard number in repair_uniq_id Execution shard is one of the traits specific to repair tasks. Child task should freely access shard id of its parent. Thus, the shard number is kept in a repair_uniq_id struct.	2022-10-31 10:41:17 +01:00
Aleksandra Martyniuk	e2c7c1495d	repair: change UUID to task_id Change type of repair id from utils::UUID to task_id to distinguish them from ids of other entities.	2022-10-31 10:07:08 +01:00
Aleksandra Martyniuk	dc80af33bc	repair: add task_manager::module to repair_service repair_service keeps a shared pointer to repair_module.	2022-10-31 10:04:50 +01:00
Aleksandra Martyniuk	576277384a	repair: create repair module and task Create repair_task_impl and repair_module inheriting from respectively task manager task_impl and module to integrate repair operations with task manager.	2022-10-31 10:04:48 +01:00
Benny Halevy	0ea8250e83	repair: use sharded abort_source to abort repair_info Currently we use a single shared_ptr<abort_source> that can't be copied across shards. Instead, use a sharded<abort_source> in node_ops_info so that each repair_info instance will use an (optional) abort_source* on its own shard. Added respective start and stop methods, plus a local_abort_source getter to get the shard-local abort_source (if available). Fixes #11826 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:18:30 +03:00
Benny Halevy	88f993e5ed	repair: node_ops_info: add start and stop methods Prepare for adding a sharded<abort_source> member. Wire start/stop in storage_service::node_ops_meta_data. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:18:30 +03:00
Benny Halevy	5c25066ea7	repair: node_ops_info: prevent accidental copy Delete node_ops_info copy and move constructors before we add a sharded<abort_source> member for the per-shard repairs in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:14:03 +03:00
Pavel Emelyanov	3dc7c33847	repair: Remove ops_uuid It used to be used to abort repair_info by the corresponding node-ops uuid, but this code is no longer there, so it's good to drop the uuid as well Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:23 +03:00
Pavel Emelyanov	b835c3573c	repair: Remove abort_repair_node_ops() altogether This code is dead after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:23 +03:00
Pavel Emelyanov	8231b4ec1b	repair: Subscribe on node_ops_info::as abortion When node_ops_meta_data aborts it also kicks repair to find and abort all relevant repair_infos. Now it can be simplified by subscribing repair_meta on the abort source and aborting it without explicit kick Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:23 +03:00
Pavel Emelyanov	bf5825daac	repair: Keep abort source on node_ops_info Next patches will need to subscribe on node_ops_meta_data's abort source inside repair code, so keep the pointer on node_ops_info too. At the same time, the node_ops_info::abort becomes obsolete, because the same check can be performed via the abort_source->abort_requested() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:23 +03:00
Pavel Emelyanov	bbb7fca09c	repair: Pass node_ops_info arg to do_sync_data_using_repair() Next patches will need to know more than the ops_uuid. The needed info is (well -- will be) sitting on node_ops_info, so pass it along Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:23 +03:00
Pavel Emelyanov	5e9c3c65b5	repair: Mark repair_info::abort() noexcept Next patch will call it inside abort_source subscription callback which requires the calling code to be noexcept Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:23 +03:00
Asias He	c194c811df	repair: Yield in repair_service::do_decommission_removenode_with_repair When walking through the ranges, we should yield to prevent stalls. We do similar yield in other node operations. Fix a stall in 5.1.dev.20220724.f46b207472a3 with build-id d947aaccafa94647f71c1c79326eb88840c5b6d2 ``` !INFO | scylla[6551]: Reactor stalled for 10 ms on shard 0. Backtrace: 0x4bbb9d2 0x4bba630 0x4bbb8e0 0x7fd365262a1f 0x2face49 0x2f5caff 0x36ca29f 0x36c89c3 0x4e3a0e1 ``` Fixes #11146 Closes #11160	2022-09-28 18:21:35 +03:00
Benny Halevy	6a11c410fd	repair: row_level: repair_update_system_table_handler: get get_tombstone_gc_state for db compaction_manager Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:04:16 +03:00
Benny Halevy	5dd15aa3c8	tombstone_gc: introduce tombstone_gc_state and use it to access the repair history maps. At this introductory patch, we use default-constructed tombstone_gc_state to access the thread-local maps temporarily and those use sites will be replaced in following patches that will gradually pass the tombstone_gc_state down from the compaction_manager to where it's used. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:02:54 +03:00
Benny Halevy	b2b211568e	repair_service: simplify update_repair_time error handling There's no need for per-shard try/catch here. Just catch exceptions from the overall sharded operation to update_repair_time. Also, update warning to indicate that only updating the repair history time failed, not "Loading repair history". Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 22:43:08 +03:00
Benny Halevy	7d13811297	tombstone_gc: update_repair_time: get table_id rather than schema_ptr The function doesn't need access to the whole schema. The table_id is just enough to get by. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 22:43:08 +03:00
Pavel Emelyanov	b6fdea9a79	code: Call sort_endpoints_by_proximity() via topology The method is about to be moved from snitch to topology, this patch prepares the rest of the code to use the latter to call it. The topology's method just calls snitch, but it's going to change in the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-05 15:14:01 +03:00
Pavel Emelyanov	43e83c5415	storage_service,dht,repair: Provide local dc/rack from system ks When a node starts it adds itself to the topology. Mostly it's done in the storage_service::join_cluster() and whoever it calls. In all those places the dc/rack for the added node is taken from the system keyspace (its cache was populated with local dc/rack by the previous patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:52:16 +03:00
Pavel Emelyanov	4cbe6ee9f4	topology: Require entry in the map for update_normal_tokens() The method in question tries to be on the safest side and adds the endpoint for which it updates the tokens into the topology. From now on it's up to the caller to put the endpoint into topology in advance. So most of what this patch does is place topology.update_endpoint() into the relevant places of the code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:44:08 +03:00
Pavel Emelyanov	7305061674	replication_strategy: Accept dc-rack as get_pending_address_ranges argument The method creates a copy of token metadata and pushes an endpoint (with some tokens) into it. Next patches will require providing dc/rack info together with the endpoint, this patch prepares for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:39:44 +03:00
Avi Kivity	be44fd63f9	Merge 'Make get_range_addresses async and hold effective_replication_map_ptr around it' from Benny Halevy This series converts the synchronous `effective_replication_map::get_range_addresses` to async by calling the replication strategy async entry point with the same name, as its callers are already async or can be made so easily. To allow it to yield and work on a coherent view of the token_metadata / topology / replication_map, let the callers of this patch hold an effective_replication_map per keyspace and pass it down to the (now asynchronous) functions that use it (making affected storage_service methods static where possible if they no longer depend on the storage_service instance). Also, the repeated calls to everywhere_replication_strategy::calculate_natural_endpoints are optimized in this series by introducing a virtual abstract_replication_strategy::has_static_natural_endpoints predicate that is true for local_strategy and everywhere_replication_strategy, and is false otherwise. With it, functions repeatedly calling calculate_natural_endpoints in a loop, for every token, will call it only once since it will return the same result every time anyhow.
Refs #11005 Doesn't fix the issue as the large allocation still remains until we change dht::token_range_vector to be chunked (chunked_vector cannot be used as is at the moment since we require the ability to push also to the front when unwrapping) Closes #11009 * github.com:scylladb/scylladb: effective_replication_map: make get_range_addresses asynchronous range_streamer: add_ranges and friends: get erm as param storage_service: get_new_source_ranges: get erm as param storage_service: get_changed_ranges_for_leaving: get erm as param storage_service: get_ranges_for_endpoint: get erm as param repair: use get_non_local_strategy_keyspaces_erms database: add get_non_local_strategy_keyspaces_erms database: add get_non_local_strategy_keyspaces storage_service: coroutinize update_pending_ranges effective_replication_map: add get_replication_strategy effective_replication_map: get_range_addresses: use the precalculated replication_map abstract_replication_strategy: get_pending_address_ranges: prevent extra vector copies abstract_replication_strategy: reindent utils: sequenced_set: expose set and `contains` method abstract_replication_strategy: calculate_natural_endpoints: return endpoint_set utils: sequenced_set: templatize VectorType utils: sanitize sequenced_set utils: sequenced_set: delete mutable get_vector method	2022-08-09 13:25:53 +03:00
Benny Halevy	cffe00cc58	repair: use get_non_local_strategy_keyspaces_erms Use get_non_local_strategy_keyspaces_erms for getting a coherent set of keyspace names and their respective effective replication strategy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	7ee6048255	database: add get_non_local_strategy_keyspaces For node operations, we currently call get_non_system_keyspaces but really want to work on all keyspace that have non-local replication strategy as they are replicated on other nodes. Reflect that in the replica::database function name. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	ebe1edc091	utils: sequenced_set: expose set and `contains` method And use that in sites using the endpoint set returned by abstract_replication_strategy::calculate_natural_endpoints. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:00 +03:00
Benny Halevy	7017ad6822	abstract_replication_strategy: calculate_natural_endpoints: return endpoint_set So it could be used also for easily searching for an endpoint. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:00 +03:00
Benny Halevy	257d74bb34	schema, everywhere: define and use table_id as a strong type Define table_id as a distinct utils::tagged_uuid modeled after raft tagged_id, so it can be differentiated from other uuid-class types, in particular from table_schema_version. Fixes #11207 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:09:41 +03:00
Benny Halevy	2948a4feb6	repair: delete unused include of utils/bit_cast.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Asias He	d3c6e72c69	repair: Allow abort repair jobs in early stage Consider this: - User starts a repair job with http api - User aborts all repair - The repair_info object for the repair job is created - The repair job is not aborted In this patch, the repair uuid is recorded before repair_info object is created, so that repair can now abort repair jobs in the early stage. Fixes #10384 Closes #10428	2022-06-27 16:39:36 +03:00
Avi Kivity	3131cbea62	Merge 'query: allow replica to provide arbitrary continue position' from Botond Dénes Currently, we use the last row in the query result set as the position where the query is continued from on the next page. Since only live rows make it into query result set, this mandates the query to be stopped on a live row on the replica, lest any dead rows or tombstones processed after the live rows, would have to be re-processed on the next page (and the saved reader would have to be thrown away due to position mismatch). This requirement of having to stop on a live row is problematic with datasets which have lots of dead rows or tombstones, especially if these form a prefix. In the extreme case, a query can time out before it can process a single live row and the data-set becomes effectively unreadable until compaction gets rid of the tombstones. This series prepares the way for the solution: it allows the replica to determine what position the query should continue from on the next page. This position can be that of a dead row, if the query stopped on a dead row. For now, the replica supplies the same position that would have been obtained with looking at the last row in the result set, this series merely introduces the infrastructure for transferring a position together with the query result, and it prepares the paging logic to make use of this position. If the coordinator is not prepared for the new field, it will simply fall-back to the old way of looking at the last row in the result set. As I said for now this is still the same as the content of the new field so there is no problem in mixed clusters. Refs: https://github.com/scylladb/scylla/issues/3672 Refs: https://github.com/scylladb/scylla/issues/7689 Refs: https://github.com/scylladb/scylla/issues/7933 Tests: manual upgrade test. 
I wrote a data set with: ``` ./scylla-bench -mode=write -workload=sequential -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -clustering-row-size=8096 -partition-count=1000 ``` This creates large, 80MB partitions, which should fill many pages if read in full. Then I started a read workload: ``` ./scylla-bench -mode=read -workload=uniform -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -duration=10m -rows-per-request=9000 -page-size=100 ``` I confirmed that paging is happening as expected, then upgraded the nodes one-by-one to this PR (while the read-load was ongoing). I observed no read errors or any other errors in the logs. Closes #10829 * github.com:scylladb/scylla: query: have replica provide the last position idl/query: add last_position to query_result multishard_mutation_query: propagate compaction state to result builder multishard_mutation_query: defer creating result builder until needed querier: use full_position instead of ad-hoc struct querier: rely on compactor for position tracking mutation_compactor: add current_full_position() convenience accessor mutation_compactor: s/_last_clustering_pos/_last_pos/ mutation_compactor: add state accessor to compact_mutation introduce full_position idl: move position_in_partition into own header service/paging: use position_in_partition instead of clustering_key for last row alternator/serialization: extract value object parsing logic service/pagers/query_pagers.cc: fix indentation position_in_partition: add to_string(partition_region) and parse_partition_region() mutation_fragment.hh: move operator<<(partition_region) to position_in_partition.hh	2022-06-27 12:23:21 +03:00
Benny Halevy	9c231ad0ce	repair_reader: construct _reader_handle before _reader Currently, the `_reader` member is explicitly initialized with the result of the call to `make_reader`. And `make_reader`, as a side effect, assigns a value to the `_reader_handle` member. Since C++ initializes class members sequentially, in the order they are defined, the assignment to `_reader_handle` in `make_reader()` happens before `_reader_handle` is initialized. This patch fixes that by changing the definition order, and consequently, the member initialization order in the constructor so that `_reader_handle` will be (default-)initialized before the call to `make_reader()`, avoiding the undefined behavior. Fixes #10882 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #10883	2022-06-26 20:17:47 +03:00
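
The last commit in the table (9c231ad0ce) relies on the C++ rule that non-static data members are initialized in declaration order, regardless of how the constructor's initializer list is written. The following is a minimal sketch of the fixed layout only. The types `handle`, `reader`, and `fixed_repair_reader` are hypothetical stand-ins and are not ScyllaDB's actual classes; only the member names `_reader_handle` and `_reader` and the function name `make_reader` come from the commit message.

```cpp
#include <string>

// Hypothetical stand-in for the reader handle; not ScyllaDB's real type.
struct handle {
    bool attached = false;  // default member initializer runs when the member is constructed
};

// Hypothetical stand-in for the reader itself.
struct reader {
    std::string name;
};

class fixed_repair_reader {
    // Declared first, so it is constructed first, no matter how the
    // constructor's initializer list is ordered.
    handle _reader_handle;
    // Declared second: make_reader() runs only after _reader_handle exists.
    reader _reader;

    reader make_reader() {
        // Side effect on a member that is already alive. If _reader were
        // declared before _reader_handle, this write would touch an object
        // whose lifetime has not started (undefined behavior), and the later
        // default initialization would typically overwrite it with false.
        _reader_handle.attached = true;
        return reader{"repair"};
    }

public:
    fixed_repair_reader() : _reader(make_reader()) {}

    bool handle_attached() const { return _reader_handle.attached; }
    const std::string& reader_name() const { return _reader.name; }
};
```

Constructing a `fixed_repair_reader` and calling `handle_attached()` should report `true`, because the handle is alive before `make_reader()` assigns to it. Swapping the two member declarations would reintroduce the ordering problem the commit describes.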


690 Commits