scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 21:47:10 +00:00

Author	SHA1	Message	Date
Botond Dénes	7f04d8231d	Merge 'gms: define and use generation and version types' from Benny Halevy This series cleans up the generation and value types used in gms / gossiper. Currently we use a blend of int, int32_t, and int64_t around messaging. This change defines gms::generation_type and gms::version_type as int32_t and add check in non-release modes that the respective int64 value passed over messaging do not overflow 32 bits. Closes #12966 * github.com:scylladb/scylladb: gossiper: version_generator: add {debug_,}validate_gossip_generation gms: gossip_digest: use generation_type and version_type gms: heart_beat_state: use generation_type and version_type gms: versioned_value: use version_type gms: version_generator: define version_type and generation_type strong types utils: move generation-number to gms utils: add tagged_integer gms: versioned_value: make members private scylla-gdb: add get_gms_versioned_value gms: versioned_value: delete unused compare_to function gms: gossip_digest: delete unused compare_to function	2023-04-24 08:44:48 +03:00
Maxim Korolyov	002bdd7ae7	doc: add jaeger integration docs Closes #13490	2023-04-24 08:26:53 +03:00
Chang Chen Chien	c25a718008	docs: fix typo in using-scylla/local-secondary-indexes.rst Closes #13607	2023-04-24 06:56:19 +03:00
Pavel Emelyanov	5e201b9120	database: Remove compaction_manager.hh inclusion into database.hh The only reason why it's there (right next to compaction_fwd.hh) is because the database::table_truncate_state subclass needs the definition of compaction_manager::compaction_reenabler subclass. However, the former sub is not used outside of database.cc and can be defined in .cc. Keeping it outside of the header allows dropping the compaction_manager.hh from database.hh thus greatly reducing its fanout over the code (from ~180 indirect inclusions down to ~20). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13622	2023-04-23 16:27:11 +03:00
Benny Halevy	5520d3a8e3	gossiper: version_generator: add {debug_,}validate_gossip_generation Make sure that the int64_t generation we get over rpc fits in the int32_t generation_type we keep locally. Restrict this assertion to non-release builds. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:48:01 +03:00
Benny Halevy	5dc7b7811c	gms: gossip_digest: use generation_type and version_type Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:48:01 +03:00
Benny Halevy	4cdad8bc8b	gms: heart_beat_state: use generation_type and version_type Define default constructor as heart_beat_state(gms::generation_type(0)) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:48:01 +03:00
Benny Halevy	b638571cb0	gms: versioned_value: use version_type Adjust scylla-gdb.get_gms_version_value to get the versioned_value version as version_type (utils::tagged_integer). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:48:01 +03:00
Benny Halevy	2d20ee7d61	gms: version_generator: define version_type and generation_type strong types Derived from utils::tagged_integer, using different tags, the types are incompatible with each other and require explicit typecasting to- and from- their value type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:47:17 +03:00
Benny Halevy	d1817e9e1b	utils: move generation-number to gms Although get_generation_number implementation is completely generic, it is used exclusively to seed the gossip generation number. Following patches will define a strong gms::generation_id type and this function should return it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Benny Halevy	f5f566bdd8	utils: add tagged_integer A generic template for defining strongly typed integer types. Use it here to replace raft::internal::tagged_uint64. Will be used for defining gms generation and version as strong and distinguishable types in following patches. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Benny Halevy	c5d819ce60	gms: versioned_value: make members private and provide accessor functions to get them. 1. So they can't be modified by mistake, as the versioned value is immutable. A new value must have a higher version. 2. Before making the version a strong gms::version_type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Benny Halevy	5aaec73612	scylla-gdb: add get_gms_versioned_value Prepare for next patch that makes gms::versioned_value members private, and provides methods by the same name as the current members. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Benny Halevy	44a8db016a	gms: versioned_value: delete unused compare_to function Not only it is unused, it is wrong since it doesn't compare the value, only its version. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Benny Halevy	59e771be5c	gms: gossip_digest: delete unused compare_to function Not only it is unused, it is wrong since it doesn't compare the digest endpoint member. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Tomasz Grabiec	bd0b299322	Merge 'Manage CDC generations when bootstrapping nodes using Raft Group 0 topology coordinator' from Kamil Braun Introduce a new table `CDC_GENERATIONS_V3` (`system.cdc_generations_v3`). The table schema is a copy-paste of the `CDC_GENERATIONS_V2` schema. The difference is that V2 lives in `system_distributed_keyspace` and writes to it are distributed using regular `storage_proxy` replication mechanisms based on the token ring. The V3 table lives in `system_keyspace` and any mutations written to it will go through group 0. Extend the `TOPOLOGY` schema with new columns: - `new_cdc_generation_data_uuid` will be stored as part of a bootstrapping node's `ring_slice`, it stores UUID of a newly introduced CDC generation which is used as partition key for the `CDC_GENERATIONS_V3` table to access this new generation's data. It's a regular column, meaning that every row (corresponding to a node) will have its own. - `current_cdc_generation_uuid` and `current_cdc_generation_timestamp` together form the ID of the newest CDC generation in the cluster. (the uuid is the data key for `CDC_GENERATIONS_V3`, the timestamp is when the CDC generation starts operating). Those are static columns since there's a single newest CDC generation. When topology coordinator handles a request for node to join, calculate a new CDC generation using the bootstrapping node's tokens, translate it to mutation format, and insert this mutation to the CDC_GENERATIONS_V3 table through group 0 at the same time we assign tokens to the node in Raft topology. The partition key for this data is stored in the bootstrapping node's `ring_slice`. After inserting new CDC generation data , we need to pick a timestamp for this generation and commit it, telling all nodes in the cluster to start using the generation for CDC log writes once their clocks cross that timestamp. We introduce a separate step to the bootstrap saga, before `write_both_read_old`, called `commit_cdc_generation`. In this step, the coordinator takes the `new_cdc_generation_data_uuid` stored in a bootstrapping node's `ring_slice` - which serves as the key to the table where the CDC generation data is stored - and combines it with a timestamp which it generates a bit into the future (as in old gossiper-based code, we use 2 * ring_delay, by default 1 minute). This gives us a CDC generation ID which we commit into the topology state as the `current_cdc_generation_id` while switching the saga to the next step, `write_both_read_old`. Once a new CDC generation is committed to the cluster by the topology coordinator, we also need to publish it to the user-facing description tables so CDC applications know which streams to read from. This uses regular distributed table writes underneath (tables living in the `system_distributed` keyspace) so it requires `token_metadata` to be nonempty. We need a hack for the case of bootstrapping the first node in the cluster - turning the tokens into normal tokens earlier in the procedure in `token_metadata`, but this is fine for the single-node case since no streaming is happening. When a node notices that a new CDC generation was introduced in `storage_service::topology_state_load`, it updates its internal data structures that are used when coordinating writes to CDC log tables. We include the current CDC generation data in topology snapshot transfers. Some fixes and refactors included. Closes #13385 * github.com:scylladb/scylladb: docs: cdc: describe generation changes using group 0 topology coordinator cdc: generation_service: add a FIXME cdc: generation_service: add legacy_ prefix for gossiper-based functions storage_service: include current CDC generation data in topology snapshots db: system_keyspace: introduce `query_mutations` with range/slice storage_service: hold group 0 apply mutex when reading topology snapshot service: raft_group0_client: introduce `hold_read_apply_mutex` storage_service: use CDC generations introduced by Raft topology raft topology: publish new CDC generation to the user description tables raft topology: commit a new CDC generation on node bootstrap raft topology: create new CDC generation data during node bootstrap service: topology_state_machine: make topology::find const db: system_keyspace: small refactor of `load_topology_state` cdc: generation: extract pure parts of `make_new_generation` outside db: system_keyspace: add storage for CDC generations managed by group 0 service: topology_state_machine: better error checking for state name (de)serialization service: raft: plumbing `cdc::generation_service&` cdc: generation: `get_cdc_generation_mutations`: take timestamp as parameter cdc: generation: make `topology_description_generator::get_sharding_info` a parameter sys_dist_ks: make `get_cdc_generation_mutations` public sys_dist_ks: move find_schema outside `get_cdc_generation_mutations` sys_dist_ks: move mutation size threshold calculation outside `get_cdc_generation_mutations` service/raft: group0_state_machine: signal topology state machine in `load_snapshot`	2023-04-21 18:11:27 +02:00
Anna Stuchlik	a68b976c91	doc: document `tombstone_gc` as not experimental The tombstone_gc was documented as experimental in version 5.0. It is no longer experimental in version 5.2. This commit updates the information about the option. Closes #13469	2023-04-21 14:43:25 +02:00
Botond Dénes	fcd7f6ac5f	Update tools/java submodule * tools/java c9be8583...eb3c43f8 (1): > Use EstimatedHistogram in metricPercentilesAsArray	2023-04-21 14:31:38 +03:00
Kefu Chai	a2aa133822	treewide: use std::lexicographical_compare_threeway this the standard library offers `std::lexicographical_compare_threeway()`, and we never uses the last two addition parameters which are not provided by `std::lexicographical_compare_threeway()`. there is no need to have the homebrew version of trichotomic compare function. in this change, * all occurrences of `lexicographical_tri_compare()` are replaced with `std::lexicographical_compare_threeway()`. * ``lexicographical_tri_compare()` is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13615	2023-04-21 14:28:18 +03:00
Kefu Chai	51fc0bc698	sstables: use default generated operator== C++20 compiler is able to generate defaulted operator== and operator!=. and the default generated operators behaves exactly the same as the ones crafted by us. so let's it do its job. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13614	2023-04-21 14:25:39 +03:00
Botond Dénes	10c1f1dc80	Merge 'db: system_keyspace: use microsecond resolution for group0_history range tombstone' from Kamil Braun in `make_group0_history_state_id_mutation`, when adding a new entry to the group 0 history table, if the parameter `gc_older_than` is engaged, we create a range tombstone in the mutation which deletes entries older than the new one by `gc_older_than`. In particular if `gc_older_than = 0`, we want to delete all older entries. There was a subtle bug there: we were using millisecond resolution when generating the tombstone, while the provided state IDs used microsecond resolution. On a super fast machine it could happen that we managed to perform two schema changes in a single millisecond; this happened sometimes in `group0_test.test_group0_history_clearing_old_entries` on our new CI/promotion machines, causing the test to fail because the tombstone didn't clear the entry correspodning to the previous schema change when performing the next schema change (since they happened in the same millisecond). Use microsecond resolution to fix that. The consecutive state IDs used in group 0 mutations are guaranteed to be strictly monotonic at microsecond resolution (see `generate_group0_state_id` in service/raft/raft_group0_client.cc). Fixes #13594 Closes #13604 * github.com:scylladb/scylladb: db: system_keyspace: use microsecond resolution for group0_history range tombstone utils: UUID_gen: accept decimicroseconds in min_time_UUID	2023-04-21 14:08:56 +03:00
Kamil Braun	55f43e532c	Merge 'get rid of gms/failure_detector' from Benny Halevy Move gms::arrival_window to api/failure_detector which is its only user. and get rid of the rest, which is not used, now that we use direct_failure_detector instead. TODO: integare direct_failure_detector with failure_detector api. Closes #13576 * github.com:scylladb/scylladb: gms: get rid of unused failure_detector api: failure_detector: remove false dependency on failure_detector::arrival_window test: rest_api: add test_failure_detector	2023-04-21 11:47:44 +02:00
Kamil Braun	f7408130c9	Merge 'Fix topology management when raft-based topology is enabled' from Tomasz Grabiec Fixes a problem when raft-based topology is enabled, which loads topology from storage. It starts by clearing topology and then adding nodes one by one. Before this patch, this violates internal invariant of topology object which puts the local node as the first node. This would manifest by triggering an assert in topology::pop_node() which throws if popping the node at index 0 in order to keep the information about local node around. This is normally prevented by a check in topology::remove_node() which avoid calling pop_node() if removing the local node. But since there is no node which is marked as local, this check allows the first node to be popped. To fix the problem I lift the invariant that local node is always in _nodes. We still have information about local node in config. Instead of keeping it in _nodes, we recognize it as part of indexing. We also allow removing the local node like a regular node. The path which reloads topology works correctly after this, the local node will be recognized when (if) it is added to the topology. Fixes #13495 Closes #13498 * github.com:scylladb/scylladb: locator: topology: Fix move assignment locator: topology: Add printer tests: topology: Test that topology clearing preserves information about local node locator: topology: Recognize local node as part of indexing it locator: topology: Fix get_location(ep) for local node locator: topology: Fix typo locator: topology: Preserve config when cloning	2023-04-21 11:45:08 +02:00
Alejo Sanchez	ce87aedd30	test: topology smp test with custom cluster Instead of decommission of initial cluster, use custom cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13589	2023-04-21 10:43:54 +02:00
Kamil Braun	f9d8118c8d	db: system_keyspace: use microsecond resolution for group0_history range tombstone in `make_group0_history_state_id_mutation`, when adding a new entry to the group 0 history table, if the parameter `gc_older_than` is engaged, we create a range tombstone in the mutation which deletes entries older than the new one by `gc_older_than`. In particular if `gc_older_than = 0`, we want to delete all older entries. There was a subtle bug there: we were using millisecond resolution when generating the tombstone, while the provided state IDs used microsecond resolution. On a super fast machine it could happen that we managed to perform two schema changes in a single millisecond; this happened sometimes in `group0_test.test_group0_history_clearing_old_entries` on our new CI/promotion machines, causing the test to fail because the tombstone didn't clear the entry correspodning to the previous schema change when performing the next schema change (since they happened in the same millisecond). Use microsecond resolution to fix that. The consecutive state IDs used in group 0 mutations are guaranteed to be strictly monotonic at microsecond resolution (see `generate_group0_state_id` in service/raft/raft_group0_client.cc). Fixes #13594	2023-04-21 10:33:05 +02:00
Kamil Braun	218a056825	utils: UUID_gen: accept decimicroseconds in min_time_UUID The function now accepts higher-resolution duration types, such as microsecond resolution timestamps. Will be used by the next commit.	2023-04-21 10:33:02 +02:00
Kefu Chai	ca6ebbd1f0	cql3, db: sstable: specialize fmt::formatter<function_name> this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `function_name` without the help of `operator<<`. the corresponding `operator<<()` are dropped dropped in this change, as all its callers are now using fmtlib for formatting now. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13608	2023-04-21 10:07:28 +03:00
Botond Dénes	d74f3598f4	Merge 'dht: specialize fmt::formatter<dht::token>' from Kefu Chai this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `dht::token` without the help of `operator<<`. the corresponding `operator<<()` is preserved in this change, as it has lots of users in this project, we will tackle them case-by-case in follow-up changes. also, the forward declaration of `operator<<(ostream&, constdht::token&)` in `dht/i_partitioner.hh` is removed. ias it not necessary. Refs https://github.com/scylladb/scylladb/issues/13245 Closes #13610 * github.com:scylladb/scylladb: dht: remove unnecessarily forward declaration dht: specialize fmt::formatter<dht::token>	2023-04-21 09:51:25 +03:00
Kefu Chai	c5fa1ac9f7	sstable: specialize fmt::formatter<component_type> this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `component_type` without the help of `operator<<`. the corresponding `operator<<()` are dropped dropped in this change, as all its callers are now using fmtlib for formatting now. also, please note, to enable fmtlib to format `std::set<component_type>` in `test/boost/sstable_3_x_test.cc` , we need to include `<fmt/ranges.h>` in that source file. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13598	2023-04-21 09:49:24 +03:00
Kefu Chai	9215adee46	streaming: specialize fmt::formatter<stream_reason> this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `stream_reason` without the help of `operator<<`. please note, because we still cannot use the generic formatter for std::unordered_map provided by fmtlib, so in order to drop `operator<<` for `stream_reason`, and to print `unordered_map<stream_reason>`, `fmt::join()` is used as a temporary solution. we will audit all `fmt::join()` calls, after removing the homebrew formatter of `std::unordered_map`. the corresponding `operator<<()` are dropped dropped in this change, as all its callers are now using fmtlib for formatting now. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13609	2023-04-21 09:44:23 +03:00
Kefu Chai	ecb5380638	treewide: s/boost::lexical_cast<std::string>/fmt::to_string()/ this change replaces all occurrences of `boost::lexical_cast<std::string>` in the source tree with `fmt::to_string()`. for couple reasons: * `boost::lexical_cast<std::string>` is longer than `fmt::to_string()`, so the latter is easier to parse and read. * `boost::lexical_cast<std::string>` creates a stringstream under the hood, so it can use the `operator<<` to stringify the given object. but stringstream is known to be less performant than fmtlib. * we are migrating to fmtlib based formatting, see #13245. so using `fmt::to_string()` helps us to remove yet another dependency on `operator<<`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13611	2023-04-21 09:43:53 +03:00
Benny Halevy	3f1ac846d8	gms: get rid of unused failure_detector The legacy failure_detector is now unused and can be removed. TODO: integare direct_failure_detector with failure_detector api. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-21 09:08:27 +03:00
Benny Halevy	d546b92685	api: failure_detector: remove false dependency on failure_detector::arrival_window Up until `0ef33b71ba` get_endpoint_phi_values retrieved arrival samples from gms::get_arrival_samples(). That function was removed since it returned a constant ampty map. This patch returns empty results without relying on failure_detector::arrival_window, so the latter can be retired altogether. As Tomasz Grabiec <tgrabiec@scylladb.com> said: > I don't think the logic of arrival_window belongs to api, > it belongs to the failure detector. If there is no longers > a failure detector, there should be no arrival_window. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-21 09:08:25 +03:00
Benny Halevy	35de60670c	test: rest_api: add test_failure_detector Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-21 09:06:15 +03:00
Nadav Har'El	9c3907bb3c	test/cql-pytest: reproducers for incorrect AVG of "decimal" type This patch contains tests reproducing issue #13601 and the corresponding Cassandra issue CASSANDRA-18470. These issues are about what the AVG aggregation does for arbitrary-precision "decimal" numbers - the tests we add here show examples where the current behavior doesn't make sense: The problem is that "decimal" has arbitrary precision - so, should an average of 1/3 be returned as 0.3 or 0.33333333333333333? This is not specified, so Scylla (and Cassandra) decided to pick the result precision based on the input precision. In particular, the average of 1 and 2 is returned as 2 (zero digits after the decimal point, like in the inputs) instead of the expected 1.5. Arguably this isn't useful behavior. The test adds a second test which fails on Cassandra, but does pass on Scylla: Cassandra returns as the average of 1, 2, 2, 3 the integer 1 whereas the correct average is 2 (and Scylla returns it correctly). The reason why this bug is even worse on Cassandra is that Scylla's AVG only loses precision when dividing the sum and count, but Cassandra tries to maintain only the average, and loses precision at every step. Refs #13601 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13603	2023-04-21 08:32:30 +03:00
Kefu Chai	7b21bfd36e	mutation: specialize fmt::formatter<apply_resume> this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `apply_resume` without the help of `operator<<`. the corresponding `operator<<()` are dropped dropped in this change, as all its callers are now using fmtlib for formatting now. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13584	2023-04-21 08:27:57 +03:00
Benny Halevy	77b70dbdb7	sstables: compressed_file_data_source_impl: get: throw malformed_sstable_exception on premature eof Currently, the reader might dereference a null pointer if the input stream reaches eof prematurely, and read_exactly returns an empty temporary_buffer. Detect this condition before dereferencing the buffer and sstables::malformed_sstable_exception. Fixes #13599 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #13600	2023-04-21 07:56:58 +03:00
Botond Dénes	d828cfcb23	Merge 'db, cql3: functions: switch argument passing to std::span' from Avi Kivity Database functions currently receive their arguments as an std::vector. This is inflexible (for example, one cannot use small_vector to reduce allocations). This series adapts the function signature to accept parameters using std::span. Some changes in the keys interface are needed to support this. Lastly, one call site is migrated to small_vector. This is in support of changing selectors to use expressions. Closes #13581 * github.com:scylladb/scylladb: cql3: abstract_function_selector: use small_vector for argument buffer db, cql3: functions: pass function parameters as a span instead of a vector keys: change from_optional_exploded to accept a span instead of a vector	2023-04-21 06:49:07 +03:00
Kefu Chai	fe9f41bd84	dht: remove unnecessarily forward declaration it turns out the declaration of `operator<<(ostream&, const dht::token&)` is unnecessarily. so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-21 11:41:54 +08:00
Kefu Chai	53dedca8cd	dht: specialize fmt::formatter<dht::token> this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `dht::token` without the help of `operator<<`. the corresponding `operator<<()` is preserved in this change, as it has lots of users in this project, we will tackle them case-by-case in follow-up changes. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-21 11:41:54 +08:00
Avi Kivity	0c64dd12b1	test: raft_server_test: fix string compare for clang 15 Clang 15 rejects string compares where the left-hand-side is a C string, so help it along by converting it ourselves. Closes #13582	2023-04-21 06:38:10 +03:00
Tomasz Grabiec	0ec700cd00	locator: topology: Fix move assignment Defaulted assignment doesn't update node::_topology.	2023-04-20 23:39:18 +02:00
Tomasz Grabiec	6ed841b8d7	locator: topology: Add printer	2023-04-20 23:39:18 +02:00
Tomasz Grabiec	3dfd49fe62	tests: topology: Test that topology clearing preserves information about local node	2023-04-20 23:39:18 +02:00
Tomasz Grabiec	7d3384089a	locator: topology: Recognize local node as part of indexing it Fixes a problem when raft-based topology is enabled, which loads topology from storage. It starts by clearing topology and then adding nodes one by one. Before this patch, this violates internal invariant of topology object which puts the local node as the first node. This would manifest by triggering an assert in topology::pop_node() which throws if popping the node at index 0 in order to keep the information about local node around. This is normally prevented by a check in topology::remove_node() which avoid calling pop_node() if removing the local node. But since there is no node which is marked as local, this check allows the first node to be popped. To fix the problem I lift the invariant that local node is always in _nodes. We still have information about local node in config. Instead of keeping it in _nodes, we recognize it as part of indexing. We also allow removing the local node like a regular node. The path which reloads topology works correctly after this, the local node will be recognized when (if) it is added to the topology. Fixes #13495	2023-04-20 23:39:18 +02:00
Tomasz Grabiec	eb9d6df8bf	locator: topology: Fix get_location(ep) for local node topology config may designate a different node than get_broadcast_address() as local node. In particular, some tests don't designate any node as the local node, which leads to logic errors where current get_location(ep) for ep which happens to have the address 127.0.0.1 returns location of the first node in _nodes rather than ep. Fix by looking up in _nodes first and fall back to local node if it's equal to configured local node (if any).	2023-04-20 23:39:18 +02:00
Tomasz Grabiec	0a675291dd	locator: topology: Fix typo	2023-04-20 23:39:18 +02:00
Tomasz Grabiec	0b1dfb2683	locator: topology: Preserve config when cloning Config is separate from state of the topology (nodes it contains). Preserving the config will make it easier in later patches to maintain invariants for cloned instances.	2023-04-20 23:39:18 +02:00
Botond Dénes	1426c623eb	Merge 'Tune up S3 unit tests environment usage (and a bit more)' from Pavel Emelyanov The tests in question are using MINIO_SERVER_ADDRESS environment variable to export minio server address from pylib to test cases. Also they use hard-coded public bucket name. Both plays badly with AWS S3, the former due to MINIO_... in its name and the latter because public bucket name can be any. So this PR puts address and public bucket name into S3_..._FOR_TEST environment variables and fixes output stream closure on failure while at it. Detached from #13493 Closes #13546 * github.com:scylladb/scylladb: s3/test: Rename MINIO_SERVER_ADDRESS environment variable s3/test: Keep public bucket name in environment s3/test: Fix upload stream closure test/lib: Add getenv_safe() helper	2023-04-20 18:01:12 +03:00
Kamil Braun	88aff50e8b	docs: cdc: describe generation changes using group 0 topology coordinator Update the `Generation switching` section: most of the existing description landed in `Gossiper-based topology changes` subsection, and a new subsection was added to describe Raft group 0 based topology changes. Marked as WIP - we expect further development in this area soon. The existing gossiper-based description was also updated a bit.	2023-04-20 16:36:41 +02:00

1 2 3 4 5 ...

36328 Commits