scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 18:40:38 +00:00

Author	SHA1	Message	Date
Patryk Jędrzejczak	419354bc9f	test: harden test_cdc_generation_clearing In one of the previous patches, we fixed scylladb/scylladb#16916 as a side effect. We removed `system_keyspace::get_cdc_generations_cleanup_candidate`, which contained the bug causing the issue. Even though we didn't have to fix this issue directly, it showed us that `test_cdc_generation_clearing` was too weak. If something went wrong during/after the only clearing, the test still could pass because the clearing was the last action in the test. In scylladb/scylladb#16916, the CDC generation publisher was stuck after the clearing because of a recurring error. The test wouldn't detect it. Therefore, we harden the test by expecting two clearings instead of one. If something goes wrong during the first clearing, there is a high chance that the second clearing will fail. The new test version wouldn't pass with the old bug in the code.	2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak	2b724735d1	test: test clean-up of committed_cdc_generations We extend `test_cdc_generation_clearing`. Now, it also tests the clean-up of `TOPOLOGY.committed_cdc_generations` added in the previous patch. In the implementation, we harden the already existing `check_system_topology_and_cdc_generations_v3_consistency`. After the previous patch, data of every generation present in `committed_cdc_generations` should be present in CDC_GENERATIONS_V3. In other words, `committed_cdc_generations` should always be a subset of a set containing generations in CDC_GENERATIONS_V3. Before the previous patch, this wasn't true after the clearing, so the new version of `test_cdc_generation_clearing` wouldn't pass back then.	2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak	7301d1317b	raft topology: clean committed_cdc_generations We clean `TOPOLOGY.committed_cdc_generations` from obsolete generations to ensure this set doesn't grow endlessly. After this patch, the following invariant will be true: if a generation is in `committed_cdc_generations`, its data is in CDC_GENERATIONS_V3.	2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak	b8aa74f539	raft topology: clean only obsolete CDC generations' data Currently, we may clear a CDC generation's data from CDC_GENERATIONS_V3 if it is not the last committed generation and it is at least 24 hours old (according to the topology coordinator's clock). However, after allowing writes to the previous CDC generations, this condition became incorrect. We might clear data of a generation that could still be written to. The new solution is to clear data of the generations that finished operating more than 24 hours ago. The rationale behind it is in the new comment in `topology_coordinator:clean_obsolete_cdc_generations`. The previous solution used the clean-up candidate. After introducing `committed_cdc_generations`, it became unneeded. The last obsolete generation can be computed in `topology_coordinator:clean_obsolete_cdc_generations`. Therefore, we remove all the code that handles the clean-up candidate. After changing how we clear CDC generations' data, `test_current_cdc_generation_is_not_removed` became obsolete. The tested feature is not present in the code anymore. `test_dependency_on_timestamps` became the only test case covering the CDC generation's data clearing. We adjust it after the changes.	2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak	8b214d02fb	storage_service: topology_state_load: load all committed CDC generations We load all committed CDC generations into `cdc::metadata`. Since we have allowed sending writes to the previous generations in scylladb/scylladb#17134, the committed generations may be necessary to handle a correct request.	2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak	18cff1aa6a	system_keyspace: load_topology_state: fix indentation Broken in the previous patch.	2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak	e145e758eb	raft topology: store committed CDC generations' IDs in the topology When we create a CDC generation and ring-delay is non-zero, the timestamp of the new generation is in the future. Hence, we can have multiple generations that can be written to. However, if we add a new node to the cluster with the Raft-based topology, it receives only the last committed generation. So, this node will be rejecting writes considered correct by the other nodes until the last committed generation starts operating. In scylladb/scylladb#17134, we have allowed sending writes to the previous CDC generations. So, the situation became even more complicated. We need to adjust the Raft-based topology to ensure all required generations are loaded into memory and their data isn't cleared too early. This patch is the first step of the adjustment. We replace `current_cdc_generation_{uuid, timestamp}` with the set containing IDs of all committed generations - `committed_cdc_generations`. This set is sorted by timestamps, just like `unpublished_cdc_generations`. This patch is mostly refactoring. The last generation in `committed_cdc_generations` is the equivalent of the previous `current_cdc_generation_{uuid, timestamp}`. The other generations are irrelevant for now. They will be used in the following patches. After introducing `committed_cdc_generations`, a newly committed generation is also unpublished (it was current and unpublished before the patch). We introduce `add_new_committed_cdc_generation`, which updates both sets of generations so that we don't have to call `add_committed_cdc_generation` and `add_unpublished_cdc_generation` together. It's easy to forget that both of them are necessary. Before this patch, there was no call to `add_unpublished_cdc_generation` in `topology_coordinator::build_coordinator_state`. It was a bug reported in scylladb/scylladb#17288. This patch fixes it. 
This patch also removes "the current generation" notion from the Raft-based topology. For the Raft-based topology, the current generation was the last committed generation. However, for the `cdc::metadata`, it was the generation operating now. These two generations could be different, which was confusing. For the `cdc::metadata`, the current generation is relevant as it is handled differently, but for the Raft-based topology, it isn't. Therefore, we change only the Raft-based topology. The generation called "current" is called "the last committed" from now.	2024-02-20 12:35:16 +01:00
Botond Dénes	050c6dcad7	api: storage_service/keyspaces: add replication filter Allows filtering the returned keyspaces by the replication they use: tablets or vnodes. The filter can be disabled by omitting the parameter or passing "all". The default is "all". Fixes: #16509 Closes scylladb/scylladb#17319	2024-02-20 09:04:41 +01:00
Kefu Chai	57ede58a64	raft: add fmt::formatter for raft::fsm before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `raft::fsm`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17414	2024-02-20 09:02:02 +02:00
Kefu Chai	acefde0735	mutation: add fmt::formatter for mutation_partition::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `mutation_partition::printer`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17419	2024-02-20 09:01:22 +02:00
Kefu Chai	0b13de52de	sstable/mx: add fmt::formatter for cached_promoted_index::promoted_index_block before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `cached_promoted_index::promoted_index_block`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17415	2024-02-20 09:00:32 +02:00
Botond Dénes	2a494b6c47	Merge 'test/nodetool: parameterize test_ring' from Kefu Chai so we exercise the cases where state and status are not "normal" and "up". turns out the MBean is able to cache some objects. so the requests retrieving datacenter and rack are now marked `ANY`. * filter out the requests whose `multiple` is `ANY` * include the unconsumed requests in the raised `AssertionError`. this should help with debugging. Fixes #17401 Closes scylladb/scylladb#17417 * github.com:scylladb/scylladb: test/nodetool: parameterize test_ring test/nodetool: fail a test only with leftover expected requests	2024-02-20 08:48:11 +02:00
Anna Stuchlik	69ead0142d	doc: remove outdated/invalid entries from FAQ This commit removes outdated or invalid FAQ entries specified in https://github.com/scylladb/scylladb/issues/16631 In addition, the questions about Cassandra compatibility are removed as they are already answered on the forum: https://forum.scylladb.com/t/which-cassandra-version-is-scylladb-it-compatible-with/84 Also, the incorrect entry about the cache has been removed and the correct answer is added to the forum. Fixes https://github.com/scylladb/scylladb/issues/17003 The question about troubleshooting performance issues has also been removed, as it's already covered on the Forum. Also, it removes the Apache copyright entry, which should not be added to the FAQ page. Closes scylladb/scylladb#17200	2024-02-20 08:43:58 +02:00
Anna Stuchlik	4f8f183736	doc: remove SSTable2json from the docs This commit removes the SSTable2json documentation, as well as the links to the removed page. In addition, it adds a redirection for that page to prevent 404. Fixes https://github.com/scylladb/scylladb/issues/17204 Closes scylladb/scylladb#17340	2024-02-20 08:43:27 +02:00
Kefu Chai	64f9d90f7b	tools/scylla-nodetool: implement toppartitions Refs #15588 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17357	2024-02-20 08:16:43 +02:00
Kefu Chai	3a94a7c1ff	test/nodetool: parameterize test_ring so we exercise the cases where state and status are not "normal" and "up". turns out the MBean is able to cache some objects. so the requests retrieving datacenter and rack are now marked `ANY`. Fixes #17401 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-20 12:59:59 +08:00
Kefu Chai	3d8a6956fc	test/nodetool: fail a test only with leftover expected requests if there are unconsumed requests whose `multiple` is -1, we should not consider them required: the test can consume them or not. if it does not, we should not consider the test a failure just because these requests are sitting at the end of the queue. so, in this change, we * filter out the requests whose `multiple` is `ANY` * include the unconsumed requests in the raised `AssertionError`. this should help with debugging. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-20 12:59:59 +08:00
Patryk Wrobel	82104b6f50	test_tablets: tablet count metric - remove assumption about tablets existence The mentioned test failed on CI. It sets up two nodes and performs operations related to creation and dropping of tables as well as moving tablets. Locally, the issue was not visible - also, the test was passing on CI in majority of cases. One of steps in the test case is intended to select the shard that has some tablets on host_0 and then move them to (host_1, shard_3). It contains also a precondition that requires the tablets count to be greater than zero - to ensure, that move_tablets operation really moves tablets. The error message in the failed CI run comes from the precondition related to tablets count on (host0, src_shard) - it was zero. This indicated that there were no tablets on entire host_0. The following commit removes the assumption about the existence of tablets on host_0. In case when there are no tablets there, the procedure is rerun for host_1. Now the logic is as follows: - find shard that has some tablets on host_0 - if such shard does not exist, then find such shard on host_1 - depending on the result of search set src/dest nodes - verify that reported tablet count metric is changed when move_tablet operation finishes Refs: scylladb#17386 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#17398	2024-02-19 21:26:08 +01:00
Kefu Chai	3c84f08b93	alternator: add formatter for attribute_path_map_node<update_expression::action> before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `attribute_path_map_node<update_expression::action>`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17270	2024-02-19 20:09:11 +02:00
Petr Gusev	f83df24108	test_decommission: fix log messages Closes scylladb/scylladb#17396	2024-02-19 12:09:43 +02:00
Kefu Chai	34cc245da5	gms: add formatter for read_context::dismantle_buffer_stats before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `read_context::dismantle_buffer_stats`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17389	2024-02-19 09:43:53 +02:00
Kefu Chai	fe8e37c5bd	configure.py: remove -Wno-unused-command-line-argument `-Wno-unused-command-line-argument` is used to disable the warning of `-Wunused-command-line-argument`, which is in turn used to emit warnings if any of the command line arguments passed to the compiler driver is not used. see https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-command-line-argument but it seems we are not passing unused command line arguments to the compiler anymore. so let's drop this option. this change helps to * reduce the discrepancies between the compiling options used by CMake-generated rules and those generated directly using `configure.py` * reenable the warning so we are aware if any of the options is not used by the compiler. this could be a sign that the option fails to serve its purpose. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17195	2024-02-19 09:42:31 +02:00
Botond Dénes	42a76ca568	Merge 'Improve printing of nodes and backtraces in topology' from Pavel Emelyanov There's a bunch of debug- and trace-level logging of locator::node-s that also include current_backtrace(). Printing node is done via debug_format() helper that generates and returns an sstring to print. Backtrace printing is not very lightweight on its own because of backtrace collecting. Not to slow things down in info log level, which is default, all such prints are wrapped with explicit if-s about log-level being enabled or not. This PR removes those level checks by introducing lazy_backtrace() helper and by providing a formatter for nodes that also results in lazy node format string calculation. Closes scylladb/scylladb#17235 * github.com:scylladb/scylladb: topology: Restore indentation after previous patch topology: Drop if_enabled checks for logging topology: Add lazy_backtrace() helper topology: Add printer wrapper for node* and formatter for it topology: Expand formatter<locator::node>	2024-02-19 09:32:53 +02:00
Kefu Chai	47ec74ad1a	tools/scylla-nodetool: implement ring Refs #15588 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17375	2024-02-19 09:30:01 +02:00
Anna Stuchlik	ef1468d5ec	doc: remove Enterprise OS support from Open Source With this commit: - The information about ScyllaDB Enterprise OS support is removed from the Open Source documentation. - The information about ScyllaDB Open Source OS support is moved to the os-support-info file in the _common folder. - The os-support-info file is included in the os-support page using the scylladb_include_flag directive. This update employs the solution we added with https://github.com/scylladb/scylladb/pull/16753. It allows to dynamically add content to a page depending on the opensource/enterprise flag. Refs https://github.com/scylladb/scylladb/issues/15484 Closes scylladb/scylladb#17310	2024-02-18 22:09:06 +02:00
Petr Gusev	1d6caa42b9	join_cluster: move was_decommissioned check earlier Before the patch if a decommissioned node tries to restart, it calls _group0->discover_group0 first in join_cluster, which hangs since decommissioned nodes are banned and other nodes don't respond to their discovering requests. We fix the problem by checking was_decommissioned() flag before calling discover_group0. fixes scylladb/scylladb#17282 Closes scylladb/scylladb#17358	2024-02-18 22:07:28 +02:00
Kefu Chai	9d666f7d29	cmake: add -Wextra to compiling options this matches what we have in configure.py Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17376	2024-02-18 19:21:54 +02:00
Kefu Chai	cb781c0ff7	gms: add add formatter for gms::versioned_value before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `gms::versioned_value`. its operator<< is preserved, as it's still being used by the homebrew generic formatter for std::unordered_map<gms::application_state, gms::versioned_value>, which is in turn used in gms/gossiper.cc. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17366	2024-02-18 19:21:54 +02:00
Avi Kivity	43f1c3df2e	Merge 'repair: Update repair history for tablet repair' from Asias He This patch wires up tombstone_gc repair with tablet repair. The flush hints logic from the vnode table repair is reused. The way to mark the finish of the repair is also adjusted for tablet repair because it only has one shard per tablet token range instead of smp::count shards. Fixes: #17046 Tests: test_tablet_repair_history Closes scylladb/scylladb#17047 * github.com:scylladb/scylladb: repair: Update repair history for tablet repair repair: Extract flush hints code	2024-02-18 19:21:54 +02:00
Kefu Chai	8fc4243cf6	configure.py: do not pass include cxx_ldflags in cxxflags ldflags are passed to ld (the linker), while cxxflags are passed to the C++ compiler. the compiler does not understand the ldflags. if we pass ldflags to it, it complains if `-Wunused-command-line-argument` is enabled. in this change, we do not include the ldflags in cxxflags. this helps us to enable the warning option of `-Wunused-command-line-argument`, so we don't need to disable it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17328	2024-02-18 19:21:54 +02:00
Avi Kivity	d257cc5003	Merge 'scylla-nodetool: implement the repair command' from Botond Dénes As usual, the new command is covered with tests, which pass with both the legacy and the new native implementation. Refs: #15588 Closes scylladb/scylladb#17368 * github.com:scylladb/scylladb: tools/scylla-nodetool: implement the repair command test/nodetool: utils: add check_nodetool_fails_with_error_contains() test/nodetool: util: replace flags with custom matcher	2024-02-18 19:21:54 +02:00
Petr Gusev	4ef5d92f50	gossiping_property_file_snitch_test: modernize + fix potential race This is mostly a refactoring commit to make the test more readable, as a byproduct of scylladb/scylladb#17369 investigation. We add the check for specific type of exceptions that can be thrown (bad_property_file_error). We also fix the potential race - the test may write to res from multiple cores with no locks. Closes scylladb/scylladb#17371	2024-02-18 19:21:53 +02:00
Kefu Chai	4812a57f71	gms: add add formatter for gms::gossip_* before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for - gms::gossip_digest - gms::gossip_digest_ack - gms::gossip_digest_syn and drop their operator<<:s Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17379	2024-02-18 19:21:53 +02:00
Patryk Wrobel	3842bf18a7	storage_service/range_to_endpoint_map: allow API to properly handle tablets This API endpoint was failing when tablets were enabled because of usage of get_vnode_effective_replication_map(). Moreover, it was providing an error message that was not user-friendly. This change extends the handler to properly service the incoming requests. Furthermore, it introduces two new test cases that verify the behavior of storage_service/range_to_endpoint_map API. It also adjusts the test case of this endpoint for vnodes to succeed when tablets are enabled by default. The new logic is as follows: - when tablets are disabled then users may query endpoints for a keyspace or for a given table in a keyspace - when tablets are enabled then users have to provide table name, because effective replication map is per-table When user does not provide table name when tablets are enabled for a given keyspace, then BAD_REQUEST is returned with a meaningful error message. Fixes: scylladb#17343 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#17372	2024-02-18 19:21:53 +02:00
Kefu Chai	808f4d72fb	storage_service: fix typos in comment Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17377	2024-02-18 19:21:53 +02:00
Botond Dénes	b11213e547	tools/scylla-nodetool: implement the upgradesstables command Refs: #15588 Closes scylladb/scylladb#17370	2024-02-18 19:21:53 +02:00
Kefu Chai	af2553e8bc	cdc: add formatter for cdc::image_mode and cdc::delta_mode before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for cdc::image_mode and cdc::delta_mode, and drop their operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17381	2024-02-18 19:21:53 +02:00
Avi Kivity	9bb4482ad0	Merge 'cdc: metadata: allow sending writes to the previous generations' from Patryk Jędrzejczak Before this PR, writes to the previous CDC generations would always be rejected. After this PR, they will be accepted if the write's timestamp is greater than `now - generation_leeway`. This change was proposed around 3 years ago. The motivation was to improve user experience. If a client generates timestamps by itself and its clock is desynchronized with the clock of the node the client is connected to, there could be a period during generation switching when writes fail. We didn't consider this problem critical because the client could simply retry a failed write with a higher timestamp. Eventually, it would succeed. This approach is safe because these failed writes cannot have any side effects. However, it can be inconvenient. Writing to previous generations was proposed to improve it. The idea was rejected 3 years ago. Recently, it turned out that there is a case when the client cannot retry a write with the increased timestamp. It happens when a table uses CDC and LWT, which makes timestamps permanent. Once Paxos commits an entry with a given timestamp, Scylla will keep trying to apply that entry until it succeeds, with the same timestamp. Applying the entry involves writing to the CDC log table. If it fails, we get stuck. It's a major bug with an unknown perfect solution. Allowing writes to previous generations for `generation_leeway` is a probabilistic fix that should solve the problem in practice. Apart from this change, this PR adds tests for it and updates the documentation. This PR is sufficient to enable writes to the previous generations only in the gossiper-based topology. The Raft-based topology needs some adjustments in loading and cleaning CDC generations. These changes won't interfere with the changes introduced in this PR, so they are left for a follow-up. 
Fixes scylladb/scylladb#7251 Fixes scylladb/scylladb#15260 Closes scylladb/scylladb#17134 * github.com:scylladb/scylladb: docs: using-scylla: cdc: remove info about failing writes to old generations docs: dev: cdc: document writing to previous CDC generations test: add test_writes_to_previous_cdc_generations cdc: generation: allow increasing generation_leeway through error injection cdc: metadata: allow sending writes to the previous generations	2024-02-18 19:21:53 +02:00
Asias He	796044be1c	repair: Update repair history for tablet repair This patch wires up tombstone_gc repair with tablet repair. The flush hints logic from the vnode table repair is reused. The way to mark the finish of the repair is also adjusted for tablet repair because it only has one shard per tablet token range instead of smp::count shards. Fixes: #17046 Tests: test_tablet_repair_history	2024-02-18 10:21:58 +08:00
Asias He	e43bc775d0	repair: Extract flush hints code So it can be used by tablet repair as well.	2024-02-18 09:42:02 +08:00
Kefu Chai	50964c423e	hints: host_filter: add formatter for hints::host_filter before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `hints::host_filter`. its operator<< is preserved as it's still used by the homebrew generic formatter for vector<>, which is in turn used by db/config.cc. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17347	2024-02-16 19:03:11 +03:00
Anna Stuchlik	e132ffdb60	doc: add missing redirections This commit adds the missing redirections to the pages whose source files were previously stored in the install-scylla folder and were moved to another location. Closes scylladb/scylladb#17367	2024-02-16 14:09:26 +02:00
Kefu Chai	47fec0428a	tools/scylla-nodetool: return 1 when viewbuild not succeeds this change introduces a new exception which carries the status code so that an operation can return a non-zero exit code without printing any errors. this mimics the behavior of "viewbuildstatus" command of C* nodetool. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17359	2024-02-16 13:53:33 +02:00
Botond Dénes	8d8ea12862	tools/scylla-nodetool: implement the repair command	2024-02-16 04:42:08 -05:00
Botond Dénes	48e8435466	test/nodetool: utils: add check_nodetool_fails_with_error_contains() Checks that at least one error snippet is contained in the error output.	2024-02-16 04:40:31 -05:00
Botond Dénes	190c9a7239	test/nodetool: util: replace flags with custom matcher _do_check_nodetool_fails_with() currently has a `match_all` flag to control how the match is checked. Now we need yet another way to control how matching is done. Instead of adding yet another flag (and who knows how many more), just replace the flag and the errors input with a matcher functor, which gets the stdout and stderr and is delegated to do any checks it wants. This method will scale much better going forward.	2024-02-16 04:40:31 -05:00
Avi Kivity	eedb997568	Merge 'compaction: upgrade: handle keyspaces that use tablets' from Lakshmi Narayanan Sreethar Tables in keyspaces governed by replication strategy that uses tablets, have separate effective_replication_maps. Update the upgrade compaction task to handle this when getting owned key ranges for a keyspace. Fixes #16848 Closes scylladb/scylladb#17335 * github.com:scylladb/scylladb: compaction: upgrade: handle keyspaces that use tablets replica/database: add an optional variant to get_keyspace_local_ranges	2024-02-15 21:31:54 +02:00
Kefu Chai	f0b3068bcf	build: cmake: disable unused-parameter, missing-field-initializers and deprecated-copy -Wunused-parameter, -Wmissing-field-initializers and -Wdeprecated-copy warning options are enabled by -Wextra. the tree fails to build with these options enabled, before we address them if the warning are genuine problems, let's disable them. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17352	2024-02-15 21:27:44 +02:00
Kamil Braun	50ebce8acc	Merge 'Purge old ip on change' from Petr Gusev When a node changes IP address we need to remove its old IP from `system.peers` and gossiper. We do this in `sync_raft_topology_nodes` when the new IP is saved into `system.peers` to avoid losing the mapping if the node crashes between deleting and saving the new IP. We also handle the possible duplicates in this case by dropping them on the read path when the node is restarted. The PR also fixes the problem with old IPs getting resurrected when a node changes its IP address. The following scenario is possible: a node `A` changes its IP from `ip1` to `ip2` with restart, other nodes are not yet aware of `ip2` so they keep gossiping `ip1`. After restart `A` receives `ip1` in a gossip message and calls `handle_major_state_change` since it considers it as a new node. Then the `on_join` event is called on the gossiper notification handlers; we receive this event in `raft_ip_address_updater` and revert the IP of node `A` back to `ip1`. To fix this we ensure that the new gossiper generation number is used when a node registers its IP address in `raft_address_map` at startup. The `test_change_ip` is adjusted to ensure that the old IPs are properly removed in all cases, even if the node crashes. 
Fixes #16886 Fixes #16691 Fixes #17199 Closes scylladb/scylladb#17162 * github.com:scylladb/scylladb: test_change_ip: improve the test raft_ip_address_updater: remove stale IPs from gossiper raft_address_map: add my ip with the new generation system_keyspace::update_peer_info: check ep and host_id are not empty system_keyspace::update_peer_info: make host_id an explicit parameter system_keyspace::update_peer_info: remove any_set flag optimisation system_keyspace: remove duplicate ips for host_id system_keyspace: peers table: use coroutines storage_service::raft_ip_address_updater: log gossiper event name raft topology: ip change: purge old IP on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes	2024-02-15 17:40:29 +01:00
Lakshmi Narayanan Sreethar	7a98877798	compaction: upgrade: handle keyspaces that use tablets Tables in keyspaces governed by replication strategy that uses tablets, have separate effective_replication_maps. Update the upgrade compaction task to handle this when getting owned key ranges for a keyspace. Fixes #16848 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-02-15 17:47:39 +05:30

41344 Commits