scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 04:37:00 +00:00

Author	SHA1	Message	Date
Botond Dénes	0c381572fd	repair::row_level: pin table for local reads The repair reader depends on the table object being alive while it is reading. However, for local reads, there was no synchronization between the lifecycle of the repair reader and that of the table. In some cases this can result in use-after-free. Solve by using the table's existing mechanism for lifecycle extension: `read_in_progress()`. For the non-local reader, when the local node's shard configuration is different from the remote one's, this problem is already solved, as the multishard streaming reader already pins table objects on the used shards. This creates an inconsistency that might be surprising (in a bad way). One reader takes care of pinning needed resources while the other one doesn't. I was torn on how to reconcile this, and decided to go with the simplest solution, explicitly pinning the table for local reads, that is, to preserve the inconsistency. It was suggested that this inconsistency be remedied by building resource pinning into the local reader as well [1] but there is opposition to this [2]. Adding a wrapper reader which does just the resource pinning seems excessive, both in code and runtime overhead. Spotted while investigating repair-related crashes which occurred during interrupted repairs. Fixes: #4342 [1] https://github.com/scylladb/scylla/issues/4342#issuecomment-474271050 [2] https://github.com/scylladb/scylla/issues/4342#issuecomment-474331657 Tests: none, this is a trivial fix for a not-yet-seen-in-the-wild bug. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <8e84ece8343468960d4e161467ecd9bb10870c27.1553072505.git.bdenes@scylladb.com>	2019-03-20 14:45:22 +02:00
Asias He	a949ccee82	repair: Reject combination of -dc and -hosts options Consider a cluster of 4 nodes: n1, n2 in dc1; n3, n4 in dc2; dc1 RF=2, dc2 RF=2. If we run nodetool repair -hosts 127.0.0.1,127.0.03 -dc "dc1,dc2" multi on n1, the -hosts option will be ignored and only the -dc option will be used to choose which hosts to repair. In this case, n1 to n4 will be repaired. If the user wants to select specific hosts to repair with, there is no need to specify the -dc option; using the -hosts option is enough. Reject the combination so as not to surprise the user. The same logic was introduced in https://issues.apache.org/jira/browse/CASSANDRA-9876 as well. Refs #3836 Message-Id: <e95ac1099f98dd53bb9d6534316005ea3577e639.1551406529.git.asias@scylladb.com>	2019-03-02 16:42:29 +02:00
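The rejection described in the commit above can be sketched as a small validation step. This is only an illustration in Python, not the code from the commit: the function name `validate_repair_options`, its parameters, and the error text are hypothetical.

```python
from typing import Optional, Sequence


def validate_repair_options(
    hosts: Optional[Sequence[str]],
    data_centers: Optional[Sequence[str]],
) -> None:
    """Reject a repair request that names both hosts and data centers.

    Hypothetical helper: the real check lives in Scylla's C++ repair code.
    """
    if hosts and data_centers:
        # Silently ignoring one of the two options would surprise the user,
        # so fail loudly instead.
        raise ValueError("the -hosts and -dc options cannot be combined")


# Either option on its own is accepted.
validate_repair_options(["127.0.0.1", "127.0.0.3"], None)
validate_repair_options(None, ["dc1", "dc2"])
```

Failing early keeps the choice of repair targets explicit: the user either lists hosts or lists data centers, never both.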
Tomasz Grabiec	1a63a313c8	Merge "repair: Rename names to be consistent with rpc verb " from Asias Some of the function names were not updated after we changed the rpc verb names. Rename them to make them consistent with the rpc verb names. * seastar-dev.git asias/row_level_repair_rename_consistent_with_rpc_verb/v1: repair: Rename request_sync_boundary to get_sync_boundary repair: Rename request_full_row_hashes to get_full_row_hashes repair: Rename request_combined_row_hash to get_combined_row_hash repair: Rename request_row_diff to get_row_diff repair: Rename send_row_diff to put_row_diff repair: Update function name in docs/row_level_repair.md	2019-02-26 13:01:36 +01:00
Asias He	62104902db	repair: Rename send_row_diff to put_row_diff Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Asias He	6e4ea1b3c4	repair: Rename request_row_diff to get_row_diff Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Asias He	5b29fb30ac	repair: Rename request_combined_row_hash to get_combined_row_hash Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Asias He	6f6c4878d5	repair: Rename request_full_row_hashes to get_full_row_hashes Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Asias He	02ddfa393e	repair: Rename request_sync_boundary to get_sync_boundary Make it consistent with the row level repair rpc verb.	2019-02-25 15:13:39 +08:00
Rafael Ávila de Espíndola	fd5ea2df5a	Avoid including cryptopp headers cryptopp's config.h has the following pragma: #pragma GCC diagnostic ignored "-Wunused-function" It is not wrapped in a push/pop. Because of that, including cryptopp headers disables that warning on scylla code too. The issue has been reported as https://github.com/weidai11/cryptopp/issues/793 To work around it, this patch uses a pimpl to have a single .cc file that has to include cryptopp headers. While at it, it also reduces the differences and code duplication between the md5 and sha1 hashers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-20 08:03:46 -08:00
Avi Kivity	468f8c7ee7	Merge "Print a warning if a row is too large" from Rafael " This is a first step in fixing #3988. " * 'espindola/large-row-warn-only-v4' of https://github.com/espindola/scylla: Rename large_partition_handler Print a warning if a row is too large Remove defaut parameter value Rename _threshold_bytes to _partition_threshold_bytes keys: add schema-aware printing for clustering_key_prefix	2019-02-03 13:57:42 +02:00
Asias He	9d9ecda619	repair: Log keyspace and table name in repair_cf_range When a repair failed, we saw logs like: repair - Checksum of range (8235770168569320790, 8235957818553794560] on 127.0.0.1 failed: std::bad_alloc (std::bad_alloc) It is hard to tell which keyspace and table has failed. To fix, log the keyspace and table name. It is useful to know when debugging. Fixes #4166 Message-Id: <8424d314125b88bf5378ea02a703b0f82c2daeda.1548818669.git.asias@scylladb.com>	2019-01-31 12:36:46 +02:00
Rafael Ávila de Espíndola	625080b414	Rename large_partition_handler Now that it also handles large rows, rename it to large_data_handler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 15:03:14 -08:00
Piotr Jastrzebski	fab1b7a3a2	Fix cross shard cf usage in repair Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 18:13:49 +01:00
Asias He	4b9e1a9f1d	repair: Add row level metrics Number of rows sent and received - tx_row_nr - rx_row_nr Bytes of rows sent and received - tx_row_bytes - rx_row_bytes Number of row hashes sent and received - tx_hashes_nr - rx_hashes_nr Number of rows read from disk - row_from_disk_nr Bytes of rows read from disk - row_from_disk_bytes Message-Id: <d1ee6b8ae8370857fe45f88b6c13087ea217d381.1547603905.git.asias@scylladb.com>	2019-01-16 14:04:57 +02:00
Duarte Nunes	04a14b27e4	Merge 'Add handling staging sstables to /upload dir' from Piotr " This series adds generating view updates from sstables added through /upload directory if their tables have accompanying materialized views. Said sstables are left in /upload directory until updates are generated from them and are treated just like staging sstables from /staging dir. If there are no views for a given table, sstables are simply moved from /upload dir to datadir without any changes. Tests: unit (release) " * 'add_handling_staging_sstables_to_upload_dir_5' of https://github.com/psarna/scylla: all: rename view_update_from_staging_generator distributed_loader: fix indentation service: add generating view updates from uploaded sstables init: pass view update generator to storage service sstables: treat sstables in upload dir as needing view build sstables,table: rename is_staging to requires_view_building distributed_loader: use proper directory for opening SSTable db,view: make throttling optional for view_update_generator	2019-01-15 18:19:27 +00:00
Piotr Sarna	0eb703dc80	all: rename view_update_from_staging_generator The new name, view_update_generator, is both more concise and correct, since we now generate from directories other than "/staging".	2019-01-15 17:31:47 +01:00
Piotr Sarna	08a42d47a5	repair: add stream phasing to row level repair In order to allow other services to wait for incoming streams to finish, row level repair uses stream phasing when creating new sstables from incoming data. Fixes scylladb#4032	2019-01-15 10:28:21 +01:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Asias He	1de24c8495	repair: Use mf.visit() in fragment_hasher When a new fragment type is added, the code will fail to compile instead of producing runtime errors. Message-Id: <cf10200e4185c779aad15da3a776a5b79f5323af.1546930796.git.asias@scylladb.com>	2019-01-08 12:02:42 +02:00
Avi Kivity	f02c64cadf	streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh This header, which is easily replaced with a forward declaration, introduces a dependency on database.hh everywhere. Remove it and scatter includes of database.hh in source files that really need it.	2019-01-05 17:33:25 +02:00
Piotr Sarna	bc74ac6f09	repair: add staging sstables support to row level repair In some cases, sstables created during row level repair should be enqueued as staging in order to generate view updates from them. Fixes #4034	2019-01-03 08:36:45 +01:00
Piotr Sarna	a0003c52cf	main,repair: add params to row level repair init Row level repair needs references to system distributed keyspace and view update generator in order to enqueue some sstables as staging.	2019-01-03 08:31:41 +01:00
Avi Kivity	c96fc1d585	Merge "Introduce row level repair" from Asias " === How the partition level repair works - The repair master decides which ranges to work on. - The repair master splits the ranges into sub ranges which contain around 100 partitions. - The repair master computes the checksum of the 100 partitions and asks the related peers to compute the checksum of the 100 partitions. - If the checksum matches, the data in this sub range is synced. - If the checksum mismatches, repair master fetches the data from all the peers and sends back the merged data to peers. === Major problems with partition level repair - A mismatch of a single row in any of the 100 partitions causes 100 partitions to be transferred. A single partition can be very large. Not to mention the size of 100 partitions. - Checksum (find the mismatch) and streaming (fix the mismatch) will read the same data twice === Row level repair Row level checksum and synchronization: detect row level mismatch and transfer only the mismatch === How the row level repair works - To solve the problem of reading data twice Read the data only once for both checksum and synchronization between nodes. We work on a small range which contains only a few megabytes of rows. We read all the rows within the small range into memory. Find the mismatch and send the mismatched rows between peers. We need to find a sync boundary among the nodes which contains only N bytes of rows. - To solve the problem of sending unnecessary data. We need to find the mismatched rows between nodes and only send the delta. This is called the set reconciliation problem, which is a common problem in distributed systems. For example: Node1 has set1 = {row1, row2, row3} Node2 has set2 = { row2, row3} Node3 has set3 = {row1, row2, row4} To repair: Node1 fetches nothing from Node2 (set2 - set1), fetches row4 (set3 - set1) from Node3.
Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2 Node1 sends row3 (set1 + set2 + set3 - set3) to Node3. === How to implement repair with set reconciliation - Step A: Negotiate sync boundary class repair_sync_boundary { dht::decorated_key pk; position_in_partition position } Reads rows from disk into row buffers until the size is larger than N bytes. Return the repair_sync_boundary of the last mutation_fragment we read from disk. The smallest repair_sync_boundary of all nodes is set as the current_sync_boundary. - Step B: Get missing rows from peer nodes so that repair master contains all the rows Request combined hashes from all nodes between last_sync_boundary and current_sync_boundary. If the combined hashes from all nodes are identical, data is synced, goto Step A. If not, request the full hashes from peers. At this point, the repair master knows exactly what rows are missing. Request the missing rows from peer nodes. Now, local node contains all the rows. - Step C: Send missing rows to the peer nodes Since local node also knows what peer nodes own, it sends the missing rows to the peer nodes. === What the RPC API looks like - repair_range_start() Step A: - request_sync_boundary() Step B: - request_combined_row_hashes() - request_full_row_hashes() - request_row_diff() Step C: - send_row_diff() - repair_range_stop() === Performance evaluation We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instances. We created a keyspace with a replication factor of 3 and inserted 1 billion rows to each of the 3 nodes. Each node has 241 GiB of data. We tested 3 cases below. 1) 0% synced: one of the nodes has zero data. The other two nodes have 1 billion identical rows. Time to repair: old = 87 min new = 70 min (rebuild took 50 minutes) improvement = 19.54% 2) 100% synced: all of the 3 nodes have 1 billion identical rows.
Time to repair: old = 43 min new = 24 min improvement = 44.18% 3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows. Time to repair: old: 211 min new: 44 min improvement: 79.15% Bytes sent on wire for repair: old: tx= 162 GiB, rx = 90 GiB new: tx= 1.15 GiB, rx = 0.57 GiB improvement: tx = 99.29%, rx = 99.36% It is worth noting that row level repair sends and receives exactly the number of rows needed in theory. In this test case, repair master needs to receive 2 million rows and send 4 million rows. Here are the details: Each node has 1 billion * 0.1% distinct rows, that is 1 million rows. So repair master receives 1 million rows from repair slave 1 and 1 million rows from repair slave 2. Repair master sends 1 million rows from repair master and 1 million rows received from repair slave 1 to repair slave 2. Repair master sends 1 million rows from repair master and 1 million rows received from repair slave 2 to repair slave 1. In the result, we saw the rows on wire were as expected.
tx_row_nr = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000 rx_row_nr = 500233 + 500235 + 499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000 Fixes: #3033 Tests: dtests/repair_additional_test.py " * 'asias/row_level_repair_v7' of github.com:cloudius-systems/seastar-dev: (51 commits) repair: Enable row level repair repair: Add row_level_repair repair: Add docs for row level repair repair: Add repair_init_messaging_service_handler repair: Add repair_meta repair: Add repair_writer repair: Add repair_reader repair: Add repair_row repair: Add fragment_hasher repair: Add decorated_key_with_hash repair: Add get_random_seed repair: Add get_common_diff_detect_algorithm repair: Add shard_config repair: Add suportted_diff_detect_algorithms repair: Add repair_stats to repair_info repair: Introduce repair_stats flat_mutation_reader: Add make_generating_reader storage_service: Introduce ROW_LEVEL_REPAIR feature messaging_service: Add RPC verbs for row level repair repair: Export the repair logger ...	2018-12-25 13:13:00 +02:00
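The set reconciliation example in the merge message above (Node1 as repair master, Node2 and Node3 as peers) can be reproduced with plain set arithmetic. This is a minimal Python sketch of the idea only; the function `reconcile` and its return shape are hypothetical and do not mirror the C++ implementation.

```python
from typing import Dict, Set, Tuple


def reconcile(
    master: Set[str], peers: Dict[str, Set[str]]
) -> Tuple[Dict[str, Set[str]], Set[str], Dict[str, Set[str]]]:
    """Return (rows fetched per peer, merged set on master, rows sent per peer)."""
    # Step B: the master fetches from each peer the rows it does not have.
    fetched = {name: rows - master for name, rows in peers.items()}
    # After fetching, the master holds the union of all sets.
    merged = set(master).union(*peers.values())
    # Step C: each peer receives exactly the rows it is missing.
    to_send = {name: merged - rows for name, rows in peers.items()}
    return fetched, merged, to_send


set1 = {"row1", "row2", "row3"}
set2 = {"row2", "row3"}
set3 = {"row1", "row2", "row4"}

fetched, merged, to_send = reconcile(set1, {"Node2": set2, "Node3": set3})
print(sorted(fetched["Node3"]))   # rows Node1 fetches from Node3
print(sorted(to_send["Node2"]))   # rows Node1 sends to Node2
print(sorted(to_send["Node3"]))   # rows Node1 sends to Node3
```

In the real protocol the sets are not exchanged wholesale; nodes compare row hashes first and transfer only the rows identified as missing.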
Botond Dénes	1865e5da41	treewide: remove include database.hh from headers where possible Many headers don't really need to include database.hh, the include can be replaced by forward declarations and/or including the actually needed headers directly. Some headers don't need this include at all. Each header was verified to be compilable on its own after the change, by including it into an empty `.cc` file and compiling it. `.cc` files that used to get `database.hh` through headers that no longer include it were changed to include it themselves.	2018-12-14 08:03:57 +02:00
Asias He	b9e0db801d	repair: Enable row level repair Finally, enable new row level repair if the cluster supports it. If not, fall back to the old partition level repair. Fixes #3033	2018-12-12 16:49:01 +08:00
Asias He	d372317e99	repair: Add row_level_repair === How the partition level repair works - The repair master decides which ranges to work on. - The repair master splits the ranges into sub ranges which contain around 100 partitions. - The repair master computes the checksum of the 100 partitions and asks the related peers to compute the checksum of the 100 partitions. - If the checksum matches, the data in this sub range is synced. - If the checksum mismatches, repair master fetches the data from all the peers and sends back the merged data to peers. === Major problems with partition level repair - A mismatch of a single row in any of the 100 partitions causes 100 partitions to be transferred. A single partition can be very large. Not to mention the size of 100 partitions. - Checksum (find the mismatch) and streaming (fix the mismatch) will read the same data twice === Row level repair Row level checksum and synchronization: detect row level mismatch and transfer only the mismatch === How the row level repair works - To solve the problem of reading data twice Read the data only once for both checksum and synchronization between nodes. We work on a small range which contains only a few megabytes of rows. We read all the rows within the small range into memory. Find the mismatch and send the mismatched rows between peers. We need to find a sync boundary among the nodes which contains only N bytes of rows. - To solve the problem of sending unnecessary data. We need to find the mismatched rows between nodes and only send the delta. This is called the set reconciliation problem, which is a common problem in distributed systems. For example: Node1 has set1 = {row1, row2, row3} Node2 has set2 = { row2, row3} Node3 has set3 = {row1, row2, row4} To repair: Node1 fetches nothing from Node2 (set2 - set1), fetches row4 (set3 - set1) from Node3. Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2 Node1 sends row3 (set1 + set2 + set3 - set3) to Node3.
=== How to implement repair with set reconciliation - Step A: Negotiate sync boundary class repair_sync_boundary { dht::decorated_key pk; position_in_partition position } Reads rows from disk into row buffers until the size is larger than N bytes. Return the repair_sync_boundary of the last mutation_fragment we read from disk. The smallest repair_sync_boundary of all nodes is set as the current_sync_boundary. - Step B: Get missing rows from peer nodes so that repair master contains all the rows Request combined hashes from all nodes between last_sync_boundary and current_sync_boundary. If the combined hashes from all nodes are identical, data is synced, goto Step A. If not, request the full hashes from peers. At this point, the repair master knows exactly what rows are missing. Request the missing rows from peer nodes. Now, local node contains all the rows. - Step C: Send missing rows to the peer nodes Since local node also knows what peer nodes own, it sends the missing rows to the peer nodes. === What the RPC API looks like - repair_range_start() Step A: - request_sync_boundary() Step B: - request_combined_row_hashes() - request_full_row_hashes() - request_row_diff() Step C: - send_row_diff() - repair_range_stop() === Performance evaluation We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instances. We created a keyspace with a replication factor of 3 and inserted 1 billion rows to each of the 3 nodes. Each node has 241 GiB of data. We tested 3 cases below. 1) 0% synced: one of the nodes has zero data. The other two nodes have 1 billion identical rows. Time to repair: old = 87 min new = 70 min (rebuild took 50 minutes) improvement = 19.54% 2) 100% synced: all of the 3 nodes have 1 billion identical rows. Time to repair: old = 43 min new = 24 min improvement = 44.18% 3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows.
Time to repair: old: 211 min new: 44 min improvement: 79.15% Bytes sent on wire for repair: old: tx= 162 GiB, rx = 90 GiB new: tx= 1.15 GiB, rx = 0.57 GiB improvement: tx = 99.29%, rx = 99.36% It is worth noting that row level repair sends and receives exactly the number of rows needed in theory. In this test case, repair master needs to receive 2 million rows and send 4 million rows. Here are the details: Each node has 1 billion * 0.1% distinct rows, that is 1 million rows. So repair master receives 1 million rows from repair slave 1 and 1 million rows from repair slave 2. Repair master sends 1 million rows from repair master and 1 million rows received from repair slave 1 to repair slave 2. Repair master sends 1 million rows from repair master and 1 million rows received from repair slave 2 to repair slave 1. In the result, we saw the rows on wire were as expected. tx_row_nr = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000 rx_row_nr = 500233 + 500235 + 499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000 Fixes #3033	2018-12-12 16:49:01 +08:00
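The figures in the performance evaluation above can be cross-checked with a few lines of arithmetic. The sketch below assumes the improvement is computed as (old - new) / old, and takes 0.57 GiB as the received volume in the new implementation, which is what the reported rx improvement implies. The reported percentages appear to mix rounding and truncation at the second decimal, so the check only requires agreement within 0.01 percentage points.

```python
def improvement_percent(old: float, new: float) -> float:
    """Relative reduction from old to new, in percent."""
    return (old - new) / old * 100.0


# (label, old value, new value, percentage reported in the commit message)
reported = [
    ("0% synced, minutes", 87, 70, 19.54),
    ("100% synced, minutes", 43, 24, 44.18),
    ("99.9% synced, minutes", 211, 44, 79.15),
    ("99.9% synced, tx GiB", 162, 1.15, 99.29),
    ("99.9% synced, rx GiB", 90, 0.57, 99.36),
]

for label, old, new, claimed in reported:
    computed = improvement_percent(old, new)
    print(f"{label}: computed {computed:.3f}%, reported {claimed}%")

# Per-shard row counters from the 99.9% synced case.
tx_per_shard = [1000505, 999619, 1001257, 998619]
rx_per_shard = [500233, 500235, 499559, 499973]
print("tx_row_nr total:", sum(tx_per_shard))
print("rx_row_nr total:", sum(rx_per_shard))
```

The per-shard counters sum to the 4 million rows sent and 2 million rows received that the message predicts.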
Asias He	fab31efae1	repair: Add repair_init_messaging_service_handler This patch implements all the rpc handlers for row level repair.	2018-12-12 16:49:01 +08:00
Asias He	3c80727d51	repair: Add repair_meta This patch introduces the repair_meta class, which is the core class for the row level repair. For each range to repair, repair_meta objects are created on both repair master and repair slaves. It stores the metadata for the row level repair algorithms, e.g., the current sync boundary, the buffer used to hold the rows the peers are working on, the reader to read data from sstable and the writer to write data to sstable. This patch also implements the RPC verbs for row level repair, for example, REPAIR_ROW_LEVEL_START/REPAIR_ROW_LEVEL_STOP to start/stop row level repair for a range, REPAIR_GET_SYNC_BOUNDARY to get the sync boundary peers want to work on, REPAIR_GET_ROW_DIFF to get missing rows from repair slaves and REPAIR_PUT_ROW_DIFF to push missing rows to repair slaves.	2018-12-12 16:49:01 +08:00
Asias He	65099bac85	repair: Add repair_writer repair_writer uses multishard_writer to apply the mutation_fragments to sstable. The repair master needs one such writer for each repair slave. The repair slave needs one writer for the repair master.	2018-12-12 16:49:01 +08:00
Asias He	5b75f64e0e	repair: Add repair_reader repair_reader is used to read data from disk. It is simply a local flat_mutation_reader for the repair master. It is more complicated for the repair slave. The repair slaves have to follow what the repair master reads from disk. For example, assume repair master has 2 shards and repair slave has 3 shards. Repair master on shard 0 asks repair slave on shard 0 to read range [0,100). Repair master on shard 1 asks repair slave on shard 1 to read range [0,100). Repair master on shard 0 will only read the data that belongs to shard 0 within range [0,100). Since master and slave have different shard counts, repair slave on shard 0 has to use the multi shard reader to collect data on all the shards. It cannot pass range [0, 100) to the multi shard reader, otherwise it will read more data than the repair master. Instead, repair slave uses a sharder with the sharding configuration of the repair master to generate the sub ranges belonging to shard 0 of repair master. If repair master and slave have the same sharding configuration, a simple local reader is enough for repair slave.	2018-12-12 16:49:01 +08:00
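The sub range generation described for repair_reader can be illustrated with a toy model. Everything here is an assumption made for the sketch: `shard_of` is a hypothetical sharder that assigns blocks of 10 tokens round-robin to shards, which is not Scylla's real token-to-shard mapping, and the helper names are invented.

```python
from typing import List, Tuple

BLOCK = 10  # hypothetical block size used by the toy sharder


def shard_of(token: int, shard_count: int) -> int:
    """Toy sharder: blocks of BLOCK tokens are dealt round-robin to shards."""
    return (token // BLOCK) % shard_count


def subranges_for_master_shard(
    start: int, end: int, master_shard: int, master_shard_count: int
) -> List[Tuple[int, int]]:
    """Split [start, end) into maximal sub ranges owned by one master shard."""
    ranges: List[Tuple[int, int]] = []
    run_start = None
    for token in range(start, end):
        owned = shard_of(token, master_shard_count) == master_shard
        if owned and run_start is None:
            run_start = token                 # a new owned run begins
        elif not owned and run_start is not None:
            ranges.append((run_start, token))  # the owned run ended
            run_start = None
    if run_start is not None:
        ranges.append((run_start, end))
    return ranges


# Master has 2 shards, slave has 3; the slave serves master shard 0.
sub = subranges_for_master_shard(0, 100, master_shard=0, master_shard_count=2)
slave_shards = sorted({shard_of(lo, 3) for lo, _ in sub})
print(sub)
print(slave_shards)
```

In this toy setup the sub ranges for master shard 0 land on every slave shard, which is why the slave needs a multi shard reader restricted to those sub ranges rather than a local reader over the whole of [0, 100).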
Asias He	27128d132d	repair: Add repair_row repair_row is the in-memory representation of "row" that the row level repair works on. It represents a mutation_fragment that is read from the flat_mutation reader. The hash of a repair_row is the combination of the mutation_fragment hash and partition_key hash.	2018-12-12 16:49:01 +08:00
Asias He	3e7b1d2ef4	repair: Add fragment_hasher It is used to calculate the hash of a mutation_fragment.	2018-12-12 16:49:01 +08:00
Asias He	e135871e4a	repair: Add decorated_key_with_hash Represents a decorated_key and the hash for it, so that we do not need to calculate the hash more than once if the decorated_key is used more than once.	2018-12-12 16:49:01 +08:00
Asias He	16c1b26937	repair: Add get_random_seed Get a random uint64_t number as the seed for the repair row hashing. The seed is passed to xx_hasher. We add the randomization when hashing rows so that the next time we run repair, the same row produces a different hash value.	2018-12-12 16:49:01 +08:00
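The effect of seeding the row hash can be sketched as follows. This is an illustration, not the commit's code: the commit passes the seed to xx_hasher, while this sketch substitutes keyed BLAKE2b from the Python standard library, and the `hash_row` helper and the byte encoding of a row are hypothetical.

```python
import hashlib
import secrets


def get_random_seed() -> int:
    """Return a random 64-bit unsigned integer to seed row hashing."""
    return secrets.randbits(64)


def hash_row(row_bytes: bytes, seed: int) -> int:
    """Hash a serialized row with a seed (keyed BLAKE2b stands in for xx_hasher)."""
    key = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(row_bytes, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")


row = b"pk=42,ck=7,value=hello"   # hypothetical serialized row
seed_a = get_random_seed()
seed_b = get_random_seed()
print(hex(hash_row(row, seed_a)))
print(hex(hash_row(row, seed_b)))
```

Within one repair run all nodes must use the same seed so their hashes are comparable; a fresh seed on the next run means that rows which happened to collide once are very unlikely to collide again.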
Asias He	54888ac52c	repair: Add get_common_diff_detect_algorithm It is used to find the common difference detection algorithms supported by repair master and repair slaves. It is up to repair master to choose what algorithm to use.	2018-12-12 16:49:01 +08:00
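One way to picture the negotiation described above is an intersection of the algorithm lists, with the master picking its preferred common entry. The sketch below is an assumption about the general idea, not the C++ code: the preference-order rule and the algorithm name `hypothetical_tree_diff` are invented, and only `send_full_set` appears in this log.

```python
from typing import Iterable, List, Sequence


def get_common_diff_detect_algorithm(
    master_algorithms: Sequence[str],
    peers_algorithms: Iterable[Sequence[str]],
) -> str:
    """Pick the master's most preferred algorithm that every peer supports."""
    common = set(master_algorithms)
    for algorithms in peers_algorithms:
        common &= set(algorithms)   # keep only what this peer also supports
    for algorithm in master_algorithms:   # master's list is in preference order
        if algorithm in common:
            return algorithm
    raise ValueError("no common difference detection algorithm")


master: List[str] = ["hypothetical_tree_diff", "send_full_set"]
peers = [["send_full_set"], ["send_full_set", "hypothetical_tree_diff"]]
print(get_common_diff_detect_algorithm(master, peers))
```

Negotiating this way lets nodes running different versions repair each other as long as they share at least one algorithm.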
Asias He	0b294d5829	repair: Add shard_config It is used to store the shard configuration.	2018-12-12 16:49:01 +08:00
Asias He	a36b0966cf	repair: Add suportted_diff_detect_algorithms It returns a vector of row level repair difference detection algorithms supported by this node. We are going to implement the "send_full_set" in the following patches.	2018-12-12 16:49:01 +08:00
Asias He	42f2cd8dc5	repair: Add repair_stats to repair_info Also add update_statistics() to update current stats.	2018-12-12 16:49:01 +08:00
Asias He	43c04302f3	repair: Introduce repair_stats It is used by row level repair to track repair statistics.	2018-12-12 16:49:01 +08:00
Asias He	8cfdcf435e	repair: Export the repair logger It will be used by the row level repair soon.	2018-12-12 16:49:01 +08:00
Asias He	e62aeae2db	repair: Export repair_info It will be used by the row level repair soon.	2018-12-12 16:49:01 +08:00
Asias He	6be3b35d52	repair: Export estimate_partitions It will be used by row level repair soon.	2018-12-12 16:49:01 +08:00
Asias He	1a0bc8acf1	repair: Add struct hash<node_repair_meta_id> for node_repair_meta_id	2018-12-12 16:49:01 +08:00
Asias He	28d090ffda	repair: Add struct hash<repair_hash> for repair_hash	2018-12-12 16:49:01 +08:00
Asias He	ce70225b1c	repair: Introduce row_level_diff_detect_algorithm It specifies the algorithm that is used to find the row difference in repair.	2018-12-12 16:49:01 +08:00
Asias He	e9251df478	repair: Introduce partition_key_and_mutation_fragments Represent a partition_key and frozen_mutation_fragments within the partition_key.	2018-12-12 16:49:01 +08:00
Asias He	5d5a1beaec	repair: Introduce node_repair_meta_id It uses an IP address and a repair_meta_id to identify a repair instance started by the row level repair.	2018-12-12 16:49:01 +08:00
Asias He	edd72e10ac	repair: Introduce get_sync_boundary_response The return value of the REPAIR_GET_SYNC_BOUNDARY verb. It will be used in the row level repair code soon.	2018-12-12 16:49:01 +08:00
Asias He	95b9a889cf	repair: Introduce repair_hash It represents the hash value of a repair row.	2018-12-12 16:49:01 +08:00
Asias He	3e86b7a646	repair: Introduce repair_sync_boundary Represent a position of a mutation_fragment read from a flat mutation reader. Repair nodes negotiate a small sub range identified by two repair_sync_boundary to work on in each round.	2018-12-12 16:49:01 +08:00
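The boundary negotiation that repair_sync_boundary supports (Step A in the row level repair description) can be sketched with tuples. This is a simplified Python model under stated assumptions: a boundary is reduced to a hypothetical `(token, position)` integer pair, rows arrive already sorted, and nodes with no rows are simply ignored, none of which is taken from the C++ implementation.

```python
from typing import List, Optional, Sequence, Tuple

Boundary = Tuple[int, int]   # hypothetical (partition token, position in partition)
Row = Tuple[Boundary, int]   # (boundary of the row, size in bytes)


def propose_boundary(rows: Sequence[Row], max_bytes: int) -> Optional[Boundary]:
    """Buffer rows until the size exceeds max_bytes; return the last row's boundary."""
    buffered = 0
    last: Optional[Boundary] = None
    for boundary, size in rows:
        buffered += size
        last = boundary
        if buffered > max_bytes:
            break
    return last


def negotiate(proposals: Sequence[Optional[Boundary]]) -> Optional[Boundary]:
    """The smallest proposed boundary becomes the current sync boundary."""
    candidates: List[Boundary] = [p for p in proposals if p is not None]
    return min(candidates) if candidates else None


node_a = [((1, 0), 40), ((1, 1), 40), ((2, 0), 40)]
node_b = [((1, 0), 60), ((1, 1), 60)]
current = negotiate([propose_boundary(node_a, 100), propose_boundary(node_b, 100)])
print(current)
```

Taking the minimum guarantees that every node has already buffered all of its rows up to the agreed boundary, so the following hash comparison covers the same range on each node.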


182 Commits