before this change, we relied on the fmt::formatter implicitly
generated from operator<<, but fmt v10 dropped this implicit
generation.
in this change, we define formatters for the classes derived from `tree_test_key_base`
(this change was extracted from a larger change at #15599)
Refs #13245
after dropping the operator<< for vector, we would not be able to
use BOOST_REQUIRE_EQUAL to compare vector<>. to be prepared for this,
let's define the printer for Boost.Test
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
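For reference, a minimal sketch of such a printer. Boost.Test looks up an ADL-visible `boost_test_print_type(std::ostream&, T const&)` overload when printing values in failed assertions; the sketch below shows the function shape without pulling in Boost headers, so the exact wiring is an assumption.

```cpp
// sketch of a printer Boost.Test can use for std::vector once the generic
// operator<< is gone; Boost.Test finds boost_test_print_type via ADL
#include <ostream>
#include <sstream>
#include <vector>

template <typename T>
std::ostream& boost_test_print_type(std::ostream& os, const std::vector<T>& v) {
    os << '{';
    const char* sep = "";
    for (const auto& e : v) {
        os << sep << e;
        sep = ", ";
    }
    return os << '}';
}
```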
the operator<< for `cql3::expr::test_utils::mutation_column_value` is
preserved, as it is used by test/lib/expr_test_utils.cc, which prints
std::map<sstring, cql3::expr::test_utils::mutation_column_value> using
the homebrew generic formatter for std::map<>, and that formatter uses
operator<< for printing the elements in the map.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we relied on the fmt::formatter implicitly
generated from operator<<, but fmt v10 dropped this implicit
generation.
in this change, we define formatters for
* scheduling_latency_measurer
* perf_result
and drop their operator<<s
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we relied on the fmt::formatter implicitly
generated from operator<<, but fmt v10 dropped this implicit
generation.
in this change, we define formatters for
* row_tombstone
* row_marker
* deletable_row::printer
* row::printer
* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer
and drop their operator<<s
Refs #13245
Closes scylladb/scylladb#17461
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for clustering_row and friends
mutation: add fmt::formatter for row_tombstone and friends
Our interval template started life as `range`, and supported wrapping to follow Cassandra's convention of wrapping around the maximum token.
We later recognized that an interval type should usually be non-wrapping and split it into wrapping_range and nonwrapping_range, with `range` aliasing wrapping_range to preserve compatibility.
Even later, we realized the name was already taken by C++ ranges and so renamed it to `interval`. Given that intervals are usually non-wrapping, the default `interval` type is non-wrapping.
We can now simplify it further, recognizing that everyone assumes that an interval is non-wrapping and so doesn't need the `nonwrapping_` designation. We just rename nonwrapping_interval to `interval` and remove the type alias.
Closes scylladb/scylladb#17455
* github.com:scylladb/scylladb:
interval: rename nonwrapping_interval to interval
interval: rename interval_test to wrapping_interval_test
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.
In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. This PR adjusts the Raft-based topology
to ensure all required generations are loaded into memory and their
data isn't cleared too early.
To load all required generations into memory, we replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
To ensure this set doesn't grow endlessly, we remove an entry from
this set together with the data in CDC_GENERATIONS_V3.
Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.
The new solution introduced in this PR is to clear data of the
generations that finished operating more than 24 hours ago.
Apart from the changes mentioned above, this PR hardens
`test_cdc_generation_clearing.py`.
Fixes scylladb/scylladb#16916
Fixes scylladb/scylladb#17184
Fixes scylladb/scylladb#17288
Closes scylladb/scylladb#17374
* github.com:scylladb/scylladb:
test: harden test_cdc_generation_clearing
test: test clean-up of committed_cdc_generations
raft topology: clean committed_cdc_generations
raft topology: clean only obsolete CDC generations' data
storage_service: topology_state_load: load all committed CDC generations
system_keyspace: load_topology_state: fix indentation
raft topology: store committed CDC generations' IDs in the topology
before this change, we relied on the fmt::formatter implicitly
generated from operator<<, but fmt v10 dropped this implicit
generation.
in this change, we define formatters for
* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer
and drop their operator<<s
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Our interval template started life as `range`, and supported
wrapping to follow Cassandra's convention of wrapping around the
maximum token.
We later recognized that an interval type should usually be non-wrapping
and split it into wrapping_range and nonwrapping_range, with `range`
aliasing wrapping_range to preserve compatibility.
Even later, we realized the name was already taken by C++ ranges and
so renamed it to `interval`. Given that intervals are usually non-wrapping,
the default `interval` type is non-wrapping.
We can now simplify it further, recognizing that everyone assumes
that an interval is non-wrapping and so doesn't need the
`nonwrapping_` designation. We just rename nonwrapping_interval
to `interval` and remove the type alias.
To run with both vnodes and tablets. For this functionality, both
replication methods should be covered with tests, because it uses
different ways to produce partition lists, depending on the replication
method.
Also add scylla_only to those tests that were missing this fixture
before. All tests in this suite are scylla-only and with the
parameterization, this is even more apparent.
Given a list of partition-ranges, yields the intersection of this
range-list with the tablet-ranges of tablets located on the
given host.
This will be used in multishard_mutation_query.cc, to obtain the ranges
to read from the local node: given the read ranges, obtain the ranges
belonging to tablets that have replicas on the local node.
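The intersection described above amounts to a sweep over two sorted range lists. A self-contained sketch with illustrative types (half-open integer token ranges, not the Scylla interval library):

```cpp
// sketch: intersect a sorted list of read ranges with the token ranges of
// tablets replicated on the local host
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct token_range { int64_t first; int64_t last; }; // half-open [first, last)

std::vector<token_range> intersect(const std::vector<token_range>& a,
                                   const std::vector<token_range>& b) {
    std::vector<token_range> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        int64_t lo = std::max(a[i].first, b[j].first);
        int64_t hi = std::min(a[i].last, b[j].last);
        if (lo < hi) {
            out.push_back({lo, hi}); // non-empty overlap
        }
        // advance whichever range ends first
        (a[i].last < b[j].last) ? ++i : ++j;
    }
    return out;
}
```

For example, intersecting read ranges `[0,10), [20,30)` with a local tablet range `[5,25)` yields `[5,10)` and `[20,25)`.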
The current point variant cannot take inclusiveness into account, when
said point comes from another interval bound.
This method had no tests at all, so add tests covering both overloads.
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.
Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.
The unit tests are renamed and range.hh is deleted.
Closes scylladb/scylladb#17428
It can happen that a node is lost during a tablet migration involving that node. The migration will be stuck, blocking the topology state machine. To recover from this, the current procedure is for the admin to execute nodetool removenode or to replace the node. This marks the node as "ignored" and the tablet state machine can pick this up and abort the migration.
This PR implements the handling for the streaming stage only and adds a test for it. Checking other stages needs more work with failure injection to inject failures into specific barriers.
To handle streaming failure, two new stages are introduced -- cleanup_target and revert_migration. The former is to clean the pending replica that could have received some data by the time streaming stopped working; the latter is like end_migration, but doesn't commit the new_replicas into the replicas field.
refs: #16527
Closes scylladb/scylladb#17360
* github.com:scylladb/scylladb:
test/topology: Add checking error paths for failed migration
topology.tablets_migration: Handle failed streaming
topology.tablets_migration: Add cleanup_target transition stage
topology.tablets_migration: Add revert_migration transition stage
storage_service: Rewrap cleanup stage checking in cleanup_tablet()
test/topology: Move helpers to get tablet replicas to pylib
In one of the previous patches, we fixed scylladb/scylladb#16916 as
a side effect. We removed
`system_keyspace::get_cdc_generations_cleanup_candidate`, which
contained the bug causing the issue.
Even though we didn't have to fix this issue directly, it showed us
that `test_cdc_generation_clearing` was too weak. If something went
wrong during/after the only clearing, the test could still pass
because the clearing was the last action in the test. In
scylladb/scylladb#16916, the CDC generation publisher was stuck
after the clearing because of a recurring error. The test wouldn't
detect it. Therefore, we harden the test by expecting two clearings
instead of one. If something goes wrong during the first clearing,
there is a high chance that the second clearing will fail. The new
test version wouldn't pass with the old bug in the code.
We extend `test_cdc_generation_clearing`. Now, it also tests the
clean-up of `TOPOLOGY.committed_cdc_generations` added in the
previous patch.
In the implementation, we harden the already existing
`check_system_topology_and_cdc_generations_v3_consistency`. After
the previous patch, data of every generation present in
`committed_cdc_generations` should be present in CDC_GENERATIONS_V3.
In other words, `committed_cdc_generations` should always be a
subset of a set containing generations in CDC_GENERATIONS_V3.
Before the previous patch, this wasn't true after the clearing, so
the new version of `test_cdc_generation_clearing` wouldn't pass
back then.
Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.
The new solution is to clear data of the generations that
finished operating more than 24 hours ago. The rationale behind
it is in the new comment in
`topology_coordinator::clean_obsolete_cdc_generations`.
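A hedged sketch of the new condition with illustrative names and types (not the actual coordinator code): a committed generation stops operating when the next one starts, so its data may be cleared once the timestamp of the *next* committed generation is more than 24 hours in the past.

```cpp
// sketch: select committed generations whose data may be cleared, i.e. those
// that finished operating more than 24 hours ago
#include <chrono>
#include <cstddef>
#include <vector>

using time_point = std::chrono::system_clock::time_point;

// committed generation timestamps, sorted ascending
std::vector<time_point> obsolete_generations(const std::vector<time_point>& committed,
                                             time_point now) {
    using namespace std::chrono_literals;
    std::vector<time_point> out;
    // generation i stops operating when generation i+1 starts, so inspect i+1;
    // the last committed generation is never cleared
    for (std::size_t i = 0; i + 1 < committed.size(); ++i) {
        if (committed[i + 1] < now - 24h) {
            out.push_back(committed[i]);
        }
    }
    return out;
}
```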
The previous solution used the clean-up candidate. After
introducing `committed_cdc_generations`, it became unneeded.
The last obsolete generation can be computed in
`topology_coordinator::clean_obsolete_cdc_generations`. Therefore,
we remove all the code that handles the clean-up candidate.
After changing how we clear CDC generations' data,
`test_current_cdc_generation_is_not_removed` became obsolete.
The tested feature is not present in the code anymore.
`test_dependency_on_timestamps` became the only test case covering
the CDC generation's data clearing. We adjust it after the changes.
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.
In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. We need to adjust the Raft-based topology to ensure
all required generations are loaded into memory and their data
isn't cleared too early.
This patch is the first step of the adjustment. We replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
This set is sorted by timestamps, just like
`unpublished_cdc_generations`.
This patch is mostly refactoring. The last generation in
`committed_cdc_generations` is the equivalent of the previous
`current_cdc_generation_{uuid, timestamp}`. The other generations
are irrelevant for now. They will be used in the following patches.
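A minimal model of this bookkeeping (illustrative types; the real set lives in the topology state machine): the set is ordered by timestamp, and its last element plays the role of the old `current_cdc_generation_{uuid, timestamp}`.

```cpp
// sketch: committed_cdc_generations as a timestamp-ordered set whose last
// element is the equivalent of the former "current" generation
#include <chrono>
#include <set>
#include <string>

struct generation_id {
    std::chrono::system_clock::time_point ts;
    std::string uuid; // illustrative; the real type is a UUID
    bool operator<(const generation_id& o) const { return ts < o.ts; }
};

struct topology_state {
    std::set<generation_id> committed_cdc_generations;

    // the last committed generation, i.e. the old "current" generation
    const generation_id& last_committed() const {
        return *committed_cdc_generations.rbegin();
    }
};
```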
After introducing `committed_cdc_generations`, a newly committed
generation is also unpublished (it was current and unpublished
before the patch). We introduce `add_new_committed_cdc_generation`,
which updates both sets of generations so that we don't have to
call `add_committed_cdc_generation` and
`add_unpublished_cdc_generation` together. It's easy to forget
that both of them are necessary. Before this patch, there was
no call to `add_unpublished_cdc_generation` in
`topology_coordinator::build_coordinator_state`. It was a bug
reported in scylladb/scylladb#17288. This patch fixes it.
This patch also removes "the current generation" notion from the
Raft-based topology. For the Raft-based topology, the current
generation was the last committed generation. However, for the
`cdc::metadata`, it was the generation operating now. These two
generations could be different, which was confusing. For the
`cdc::metadata`, the current generation is relevant as it is
handled differently, but for the Raft-based topology, it isn't.
Therefore, we change only the Raft-based topology. The generation
called "current" is called "the last committed" from now on.
To allow filtering the returned keyspaces by the replication they
use: tablets or vnodes.
The filter can be disabled by omitting the parameter or passing "all".
The default is "all".
Fixes: #16509
Closes scylladb/scylladb#17319
so we exercise the cases where state and status are not "normal" and "up".
turns out the MBean is able to cache some objects. so the requests retrieving datacenter and rack are now marked `ANY`.
* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requests in the raised `AssertionError`. this
should help with debugging.
Fixes #17401
Closes scylladb/scylladb#17417
* github.com:scylladb/scylladb:
test/nodetool: parameterize test_ring
test/nodetool: fail a test only with leftover expected requests
For now only fail streaming stage and check that migration doesn't get
stuck and doesn't make tablet appear on dead node.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
so we exercise the cases where state and status are not "normal" and "up".
turns out the MBean is able to cache some objects. so the requests
retrieving datacenter and rack are now marked `ANY`.
Fixes #17401
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
if there are unconsumed requests whose `multiple` is -1, we should
not consider them required: the test can consume them or not. but if
it does not, we should not consider the test a failure just because
these requests are sitting at the end of the queue.
so, in this change, we
* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requests in the raised `AssertionError`. this
should help with debugging.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
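The leftover check reduces to: only non-`ANY` unconsumed requests should fail the test. A sketch in C++ for illustration (the real code is in the Python nodetool test harness; all types and names here are hypothetical):

```cpp
// sketch: requests with multiple == ANY are optional, so leftover ANY entries
// at the end of the queue must not fail the test; only non-ANY leftovers do
#include <algorithm>
#include <string>
#include <vector>

constexpr int ANY = -1;

struct expected_request {
    std::string path;
    int multiple; // ANY (-1) means "may be consumed zero or more times"
};

bool has_required_leftovers(const std::vector<expected_request>& unconsumed) {
    return std::any_of(unconsumed.begin(), unconsumed.end(),
                       [](const expected_request& r) { return r.multiple != ANY; });
}
```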
The mentioned test failed on CI. It sets up two nodes and performs
operations related to creation and dropping of tables as well as
moving tablets. Locally, the issue was not visible - also, the test
was passing on CI in the majority of cases.
One of the steps in the test case is intended to select the shard that
has some tablets on host_0 and then move them to (host_1, shard_3).
It also contains a precondition that requires the tablets count to
be greater than zero - to ensure that the move_tablets operation really
moves tablets.
The error message in the failed CI run comes from the precondition
related to tablets count on (host0, src_shard) - it was zero.
This indicated that there were no tablets on the entire host_0.
The following commit removes the assumption about the existence of
tablets on host_0. If there are no tablets there, the
procedure is rerun for host_1.
Now the logic is as follows:
- find shard that has some tablets on host_0
- if such shard does not exist, then find such shard on host_1
- depending on the result of the search, set src/dest nodes
- verify that the reported tablet count metric changes when
the move_tablet operation finishes
Refs: scylladb#17386
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#17398
Before the patch, if a decommissioned node tries
to restart, it calls _group0->discover_group0 first
in join_cluster, which hangs since decommissioned
nodes are banned and other nodes don't respond
to their discovery requests.
We fix the problem by checking the was_decommissioned()
flag before calling discover_group0.
fixes scylladb/scylladb#17282
Closes scylladb/scylladb#17358
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.
Fixes: #17046
Tests: test_tablet_repair_history
Closes scylladb/scylladb#17047
* github.com:scylladb/scylladb:
repair: Update repair history for tablet repair
repair: Extract flush hints code
As usual, the new command is covered with tests, which pass with both the legacy and the new native implementation.
Refs: #15588
Closes scylladb/scylladb#17368
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the repair command
test/nodetool: utils: add check_nodetool_fails_with_error_contains()
test/nodetool: util: replace flags with custom matcher
This is mostly a refactoring commit to make the test
more readable, as a byproduct of
scylladb/scylladb#17369 investigation.
We add a check for the specific type of exception that
can be thrown (bad_property_file_error).
We also fix a potential race - the test may write
to res from multiple cores with no locks.
Closes scylladb/scylladb#17371
This API endpoint was failing when tablets were enabled
because of usage of get_vnode_effective_replication_map().
Moreover, it was providing an error message that was not
user-friendly.
This change extends the handler to properly service the incoming requests.
Furthermore, it introduces two new test cases that verify the behavior of
storage_service/range_to_endpoint_map API. It also adjusts the test case
of this endpoint for vnodes to succeed when tablets are enabled by default.
The new logic is as follows:
- when tablets are disabled then users may query endpoints
for a keyspace or for a given table in a keyspace
- when tablets are enabled then users have to provide
table name, because effective replication map is per-table
When the user does not provide a table name and tablets are enabled
for a given keyspace, BAD_REQUEST is returned with a
meaningful error message.
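The validation logic above reduces to a small decision. An illustrative sketch, not the actual handler code (all names are hypothetical):

```cpp
// sketch: range_to_endpoint_map request validation for tablet keyspaces
#include <optional>
#include <string>

enum class status { OK, BAD_REQUEST };

status validate(bool keyspace_uses_tablets, const std::optional<std::string>& table) {
    if (keyspace_uses_tablets && !table) {
        // tablets: the effective replication map is per-table,
        // so a table name is required
        return status::BAD_REQUEST;
    }
    return status::OK;
}
```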
Fixes: scylladb#17343
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#17372
Before this PR, writes to the previous CDC generations would
always be rejected. After this PR, they will be accepted if the
write's timestamp is greater than `now - generation_leeway`.
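The acceptance rule reduces to a single comparison. A sketch with an illustrative leeway value (the actual `generation_leeway` value is defined in the Scylla source, not here):

```cpp
// sketch: a write to a previous CDC generation is accepted iff its timestamp
// is greater than now - generation_leeway
#include <chrono>

using time_point = std::chrono::system_clock::time_point;

constexpr std::chrono::seconds generation_leeway{5}; // illustrative value

bool accept_write_to_previous_generation(time_point write_ts, time_point now) {
    return write_ts > now - generation_leeway;
}
```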
This change was proposed around 3 years ago. The motivation was
to improve user experience. If a client generates timestamps by
itself and its clock is desynchronized with the clock of the node
the client is connected to, there could be a period during
generation switching when writes fail. We didn't consider this
problem critical because the client could simply retry a failed
write with a higher timestamp. Eventually, it would succeed. This
approach is safe because these failed writes cannot have any side
effects. However, it can be inconvenient. Writing to previous
generations was proposed to improve it.
The idea was rejected 3 years ago. Recently, it turned out that
there is a case when the client cannot retry a write with the
increased timestamp. It happens when a table uses CDC and LWT,
which makes timestamps permanent. Once Paxos commits an entry
with a given timestamp, Scylla will keep trying to apply that entry
until it succeeds, with the same timestamp. Applying the entry
involves writing to the CDC log table. If it fails, we get stuck.
It's a major bug with an unknown perfect solution.
Allowing writes to previous generations for `generation_leeway` is
a probabilistic fix that should solve the problem in practice.
Apart from this change, this PR adds tests for it and updates
the documentation.
This PR is sufficient to enable writes to the previous generations
only in the gossiper-based topology. The Raft-based topology
needs some adjustments in loading and cleaning CDC generations.
These changes won't interfere with the changes introduced in this
PR, so they are left for a follow-up.
Fixes scylladb/scylladb#7251
Fixes scylladb/scylladb#15260
Closes scylladb/scylladb#17134
* github.com:scylladb/scylladb:
docs: using-scylla: cdc: remove info about failing writes to old generations
docs: dev: cdc: document writing to previous CDC generations
test: add test_writes_to_previous_cdc_generations
cdc: generation: allow increasing generation_leeway through error injection
cdc: metadata: allow sending writes to the previous generations
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.
Fixes: #17046
Tests: test_tablet_repair_history
this change introduces a new exception which carries the status code
so that an operation can return a non-zero exit code without printing
any errors. this mimics the behavior of the "viewbuildstatus" command
of C* nodetool.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17359
_do_check_nodetool_fails_with() currently has a `match_all` flag to
control how the match is checked. Now we need yet another way to control
how matching is done. Instead of adding yet another flag (and who knows
how many more), just replace the flag and the errors input with a matcher
functor, which gets the stdout and stderr and is delegated to do any
checks it wants. This method will scale much better going forward.
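The shape of this refactoring, sketched in C++ for illustration (the real helper is Python in the nodetool tests; all names below are hypothetical):

```cpp
// sketch: instead of a growing set of boolean flags, the failure checker takes
// a matcher functor that receives stdout/stderr and performs any checks it wants
#include <functional>
#include <string>

using matcher = std::function<bool(const std::string& out, const std::string& err)>;

// hypothetical helper mirroring _do_check_nodetool_fails_with(): matching is
// fully delegated to the functor
bool check_fails_with(const std::string& out, const std::string& err, const matcher& m) {
    return m(out, err);
}

// example matcher: the error output must contain a substring
matcher error_contains(std::string needle) {
    return [needle = std::move(needle)](const std::string&, const std::string& err) {
        return err.find(needle) != std::string::npos;
    };
}
```

New match policies become new functors rather than new flags, which is why this scales better going forward.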
When a node changes IP address we need to remove its old IP from `system.peers` and gossiper.
We do this in `sync_raft_topology_nodes` when the new IP is saved into `system.peers` to avoid losing the mapping if the node crashes between deleting and saving the new IP. We also handle the possible duplicates in this case by dropping them on the read path when the node is restarted.
The PR also fixes the problem with old IPs getting resurrected when a node changes its IP address.
The following scenario is possible: a node `A` changes its IP from `ip1` to `ip2` with a restart; other nodes are not yet aware of `ip2`, so they keep gossiping `ip1`. After the restart, `A` receives `ip1` in a gossip message and calls `handle_major_state_change` since it considers it a new node. Then the `on_join` event is called on the gossiper notification handlers; we receive this event in `raft_ip_address_updater`, which reverts the IP of node `A` back to `ip1`.
To fix this we ensure that the new gossiper generation number is used when a node registers its IP address in `raft_address_map` at startup.
The `test_change_ip` is adjusted to ensure that the old IPs are properly removed in all cases, even if the node crashes.
Fixes #16886
Fixes #16691
Fixes #17199
Closes scylladb/scylladb#17162
* github.com:scylladb/scylladb:
test_change_ip: improve the test
raft_ip_address_updater: remove stale IPs from gossiper
raft_address_map: add my ip with the new generation
system_keyspace::update_peer_info: check ep and host_id are not empty
system_keyspace::update_peer_info: make host_id an explicit parameter
system_keyspace::update_peer_info: remove any_set flag optimisation
system_keyspace: remove duplicate ips for host_id
system_keyspace: peers table: use coroutines
storage_service::raft_ip_address_updater: log gossiper event name
raft topology: ip change: purge old IP
on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes
This API endpoint currently returns status 500 when called for a table which uses tablets. This series adds tablet support. No change in usage semantics is required; the endpoint already has a table parameter.
This endpoint is the backend of `nodetool getendpoints` which should now work, after this PR.
Fixes: #17313
Closes scylladb/scylladb#17316
* github.com:scylladb/scylladb:
service/storage_service: get_natural_endpoints(): add tablets support
replica/database: keyspace: add uses_tablets()
service/storage_service: remove token overload of get_natural_endpoints()
In this commit we refactor test_change_ip to improve
it in several ways:
* We inject failure before old IP is removed and verify
that after restart the node sees the proper peers - the
new IP for node2 and old IP for node3, which is not restarted
yet.
* We introduce the lambda wait_proper_ips, which checks not only the
system.peers table, but also gossiper and token_metadata.
* We call this lambda for all nodes, not only the first node;
this allows us to validate that the node that has changed its
IP has the proper IP of itself in the data structures above.
Note that we need to inject an additional delay, ip-change-raft-sync-delay,
before the old IP is removed. Otherwise the problem stops reproducing - other
nodes remove the old IP before it's sent back to the just restarted node.
The following scenario is possible: a node A changes its IP
from ip1 to ip2 with restart, other nodes are not yet aware of ip2
so they keep gossiping ip1, after restart A receives
ip1 in a gossip message and calls handle_major_state_change
since it considers it a new node. Then the on_join event is
called on the gossiper notification handlers; we receive
this event in raft_ip_address_updater, which reverts the IP
of node A back to ip1.
The essence of the problem is that we don't pass the proper
generation when we add ip2 as a local IP during initialization
when node A restarts, so the zero generation is used
in raft_address_map::add_or_update_entry and the gossiper
message overwrites ip2 with ip1.
In this commit we fix this problem by passing the new generation.
To do that we move the increment_and_get_generation call
from join_token_ring to scylla_main, so that we have a new generation
value before init_address_map is called.
Also we remove the load_initial_raft_address_map function from
raft_group0 since it's redundant. The comment above its call site
says that it's needed to not miss gossiper updates, but
the function storage_service::init_address_map where raft_address_map
is now initialized is called before gossiper is started. This
function does both - it loads the previously persisted host_id<->IP
mappings from system.local and subscribes to gossiper notifications,
so there is no room for races.
Note that this problem is less likely to reproduce with the
'raft topology: ip change: purge old IP' commit - other
nodes remove the old IP before it's sent back to the
just restarted node. This is also the reason why this
problem doesn't occur in gossiper mode.
fixes scylladb/scylladb#17199