scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 00:13:31 +00:00

Author	SHA1	Message	Date
Avi Kivity	7da12c64bc	Revert "Revert "Merge 'cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query()' from Avi Kivity"" This reverts commit `22f13e7ca3`, and reinstates commit `df8e1da8b2` ("Merge 'cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query()' from Avi Kivity"). The original commit was reverted due to failures in debug mode on aarch64, but after commit `224a2877b9` ("build: disable -Og in debug mode to avoid coroutine asan breakage"), it works again. Closes #12021	2022-11-18 12:44:00 +02:00
Kamil Braun	d7649a86c4	Merge 'Build up to support of dynamic IP address changes in Raft' from Konstantin Osipov We plan to stop storing IP addresses in Raft configuration, and instead use the information disseminated through gossip to locate Raft peers. Implement patches that are building up to that: * improve Raft API of configuration change notifications * disseminate raft host id in Gossip * avoid using Raft addresses from Raft configuraiton, and instead consistently use the translation layer between raft server id <-> IP address Closes #11953 * github.com:scylladb/scylladb: raft: persist the initial raft address map raft: (upgrade) do not use IP addresses from Raft config raft: (and gossip) begin gossiping raft server ids raft: change the API of conf change notifications	2022-11-18 11:38:19 +01:00
Botond Dénes	437fcdeeda	Merge 'Make use of enum_set in directory lister' from Pavel Emelyanov The lister accepts sort of a filter -- what kind of entries to list, regular, directories or both. It currently uses unordered_set, but enum_set is shorter and better describes the intent. Closes #12017 * github.com:scylladb/scylladb: lister: Make lister::dir_entry_types an enum_set database: Avoid useless local variable	2022-11-18 12:15:26 +02:00
Pavel Emelyanov	a44ca06906	Merge 'token_metadata: Do not use topology info for is_member check' from Asias He Since commit `a980f94` (token_metadata: impl: keep the set of normal token owners as a member), we have a set, _normal_token_owners, which contains all the nodes in the ring. We can use _normal_token_owners to check if a node is part of the ring directly instead of going through the _toplogy indirectly. Fixes #11935 Closes #11936 * github.com:scylladb/scylladb: token_metadata: Rename is_member to is_normal_token_owner token_metadata: Add docs for is_member token_metadata: Do not use topology info for is_member check token_metadata: Check node is part of the topology instead of the ring	2022-11-18 11:54:07 +03:00
Asias He	4571fcf9e7	token_metadata: Rename is_member to is_normal_token_owner The name is_normal_token_owner is more clear than is_member. The is_normal_token_owner reflects what it really checks.	2022-11-18 09:29:20 +08:00
Asias He	965097cde5	token_metadata: Add docs for is_member Make it clear, is_member checks if a node is part of the token ring and checks nothing else.	2022-11-18 09:28:56 +08:00
Asias He	a495b71858	token_metadata: Do not use topology info for is_member check Since commit `a980f94` (token_metadata: impl: keep the set of normal token owners as a member), we have a set, _normal_token_owners, which contains all the nodes in the ring. We can use _normal_token_owners to check if a node is part of the ring directly instead of going through the _toplogy indirectly. Fixes #11935	2022-11-18 09:28:56 +08:00
Asias He	f2ca790883	token_metadata: Check node is part of the topology instead of the ring update_normal_tokens is the way to add a new node into the ring. We should not require a new node to already be in the ring to be able to add it to the ring. The current code works accidentally because is_member is checking if a node is in the topology We should use _topology.has_endpoint to check if a node is part of the topology explicitly.	2022-11-18 09:28:56 +08:00
Avi Kivity	2779a171fc	Merge 'Do not run aborted tasks' from Aleksandra Martyniuk task_manager::task::impl contains an abort source which can be used to check whether it is aborted and an abort method which aborts the task (request_abort on abort_source) and all its descendants recursively. When the start method is called after the task was aborted, then its state is set to failed and the task does not run. Fixes: #11995 Closes #11996 * github.com:scylladb/scylladb: tasks: do not run tasks that are aborted tasks: delete unused variable tasks: add abort_source to task_manager::task::impl	2022-11-17 19:42:46 +02:00
Pavel Emelyanov	a396c27efc	Merge 'message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client' from Kamil Braun `get_rpc_client` calculates a `topology_ignored` field when creating a client which says whether the client's endpoint had topology information when this client was created. This is later used to check if that client needs to be dropped and replaced with a new client which uses the correct topology information. The `topology_ignored` field was incorrectly calculated as `true` for pending endpoints even though we had topology information for them. This would lead to unnecessary drops of RPC clients later. Fix this. Remove the default parameter for `with_pending` from `topology::has_endpoint` to avoid similar bugs in the future. Apparently this fixes #11780. The verbs used by decommission operation use RPC client index 1 (see `do_get_rpc_client_idx` in message/messaging_service.cc). From local testing with additional logging I found that by the time this client is created (i.e. the first verb in this group is used), we already know the topology. The node is pending at that point - hence the bug would cause us to assume we don't know the topology, leading us to dropping the RPC client later, possibly in the middle of a decommission operation. Fixes: #11780 Closes #11942 * github.com:scylladb/scylladb: message: messaging_service: check for known topology before calling is_same_dc/rack test: reenable test_topology::test_decommission_node_add_column test/pylib: util: configurable period in wait_for message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client message: messaging_service: topology independent connection settings for GOSSIP verbs	2022-11-17 20:14:32 +03:00
Pavel Emelyanov	bc62ca46d4	lister: Make lister::dir_entry_types an enum_set This type is currently an unordered_set, but only consists of at most two elements. Making it an enum_set renders it into a size_t variable and better describes the intention. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-17 19:01:45 +03:00
Pavel Emelyanov	c6021b57a1	database: Avoid useless local variable It's used to run lister::scan_dir() with directory_entry_type::directory only, but for that is copied around on lambda captures. It's simpler just to use the value directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-17 19:00:49 +03:00
Avi Kivity	76be6402ed	Merge 'repair: harden effective replication map' from Benny Halevy As described in #11993 per-shard repair_info instances get the effective_replication_map on their own with no centralized synchronization. This series ensures that the effective replication maps used by repair (and other associated structures like the token metadata and topology) are all in sync with the one used to initiate the repair operation. While at at, the series includes other cleanups in this area in repair and view that are not fixes as the calls happen in synchronous functions that do not yield. Fixes #11993 Closes #11994 * github.com:scylladb/scylladb: repair: pass erm down to get_hosts_participating_in_repair and get_neighbors repair: pass effective_replication_map down to repair_info repair: coroutinize sync_data_using_repair repair: futurize do_repair_start effective_replication_map: add global_effective_replication_map shared_token_metadata: get_lock is const repair: sync_data_using_repair: require to run on shard 0 repair: require all node operations to be called on shard 0 repair: repair_info: keep effective_replication_map repair: do_repair_start: use keyspace erm to get keyspace local ranges repair: do_repair_start: use keyspace erm for get_primary_ranges repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc repair: do_repair_start: check_in_shutdown first repair: get_db().local() where needed repair: get topology from erm/token_metdata_ptr view: get_view_natural_endpoint: get topology from erm	2022-11-17 13:29:02 +02:00
Konstantin Osipov	262566216b	raft: persist the initial raft address map	2022-11-17 14:26:36 +03:00
Konstantin Osipov	b35af73fdf	raft: (upgrade) do not use IP addresses from Raft config Always use raft address map to obtain the IP addresses of upgrade peers. Right now the map is populated from Raft configuration, so it's an equivalent transformation, but in the future raft address map will be populated from other sources: discovery and gossip, hence the logic of upgrade will change as well. Do not proceed with the upgrade if an address is missing from the map, since it means we failed to contact a raft member.	2022-11-17 14:26:31 +03:00
Pavel Emelyanov	2add9ba292	Merge 'Refactor topology out of token_metadata' from Benny Halevy This series moves the topology code from locator/token_metadata.{cc,hh} out to localtor/topology.{cc,hh} and introduces a shared header file: locator/types.hh contains shared, low level definitions, in anticipation of https://github.com/scylladb/scylladb/pull/11987 While at it, the token_metadata functions are turned into coroutines and topology copy constructor is deleted. The copy functionality is moved into an async `clone_gently` function that allows yielding while copying the topology. Closes #12001 * github.com:scylladb/scylladb: locator: refactor topology out of token_metadata locator: add types.hh topology: delete copy constructor token_metadata: coroutinize clone functions	2022-11-17 13:55:34 +03:00
Aleksandra Martyniuk	7ead1a7857	compaction: request abort only once in compaction_data::stop compaction_manager::task (and thus compaction_data) can be stopped because of many different reasons. Thus, abort can be requested more than once on compaction_data abort source causing a crash. To prevent this before each request_abort() we check whether an abort was requested before. Closes #12004	2022-11-17 12:44:59 +02:00
Benny Halevy	1e2741d2fe	abstract_replication_strategy: recognized_options: return unordered_set An unordered_set is more efficient and there is no need to return an ordered set for this purpose. This change facilitates a follow-up change of adding topology::get_datacenters(), returning an unordered_set of datacenter names. Refs #11987 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12003	2022-11-17 11:27:05 +02:00
Botond Dénes	e925c41f02	utils/gs/barrett.hh: aarch64: s/brarett/barrett/ Fix a typo introduced by the the recent patch fixing the spelling of Barrett. The patch introduced a typo in the aarch64 version of the code, which wasn't found by promotion, as that only builds on X86_64. Closes #12006	2022-11-17 11:09:59 +02:00
Konstantin Osipov	051dceeaff	raft: (and gossip) begin gossiping raft server ids We plan to use gossip data to educate Raft RPC about IP addresses of raft peers. Add raft server ids to application state, so that when we get a notification about a gossip peer we can identify which raft server id this notification is for, specifically, we can find what IP address stands for this server id, and, whenever the IP address changes, we can update Raft address map with the new address. On the same token, at boot time, we now have to start Gossip before Raft, since Raft won't be able to send any messages without gossip data about IP addresses.	2022-11-17 12:07:31 +03:00
Konstantin Osipov	990c7a209f	raft: change the API of conf change notifications Pass a change diff into the notification callback, rather than add or remove servers one by one, so that if we need to persist the state, we can do it once per configuration change, not for every added or removed server. For now still pass added and removed entries in two separate calls per a single configuration change. This is done mainly to fulfill the library contract that it never sends messages to servers outside the current configuration. The group0 RPC implementation doesn't need the two calls, since it simply marks the removed servers as expired: they are not removed immediately anyway, and messages can still be delivered to them. However, there may be test/mock implementations of RPC which could benefit from this contract, so we decided to keep it.	2022-11-17 12:07:31 +03:00
Benny Halevy	53fdf75cf9	repair: pass erm down to get_hosts_participating_in_repair and get_neighbors Now that it is available in repair_info. Fixes #11993 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:30 +02:00
Benny Halevy	b69be61f41	repair: pass effective_replication_map down to repair_info And make sure the token_metadata ring version is same as the reference one (from the erm on shard 0), when starting the repair on each shard. Refs #11993 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:29 +02:00
Benny Halevy	c47d36b53d	repair: coroutinize sync_data_using_repair Prepare for the next path that will co_await make_global_effective_replication_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:04 +02:00
Benny Halevy	58b1c17f5d	repair: futurize do_repair_start Turn it into a coroutine to prepare for the next path that will co_await make_global_effective_replication_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:04 +02:00
Benny Halevy	4b9269b7e2	effective_replication_map: add global_effective_replication_map Class to hold a coherent view of a keyspace effective replication map on all shards. To be used in a following patch to pass the sharded keyspace e_r_m:s to repair. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:01 +02:00
Avi Kivity	b8b78959fb	build: switch to packaged libdeflate rather than a submodule Now that our toolchain is based on Fedora 37, we can rely on its libdeflate rather than have to carry our own in a submodule. Frozen toolchain is regenerated. As a side effect clang is updated from 15.0.0 to 15.0.4. Closes #12000	2022-11-17 08:01:00 +02:00
Benny Halevy	2c677e294b	shared_token_metadata: get_lock is const The lock is acquired using an a function that doesn't modify the shared_token_metadata object. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	d6b2124903	repair: sync_data_using_repair: require to run on shard 0 And with that do_sync_data_using_repair can be folded into sync_data_using_repair. This will simplify using the effective_replication_map throughout the operation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	0c56c75cf8	repair: require all node operations to be called on shard 0 To simplify using of the effective_replication_map / token_metadata_ptr throught the operation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	64b0756adc	repair: repair_info: keep effective_replication_map Sampled when repair info is constructed. To be used throughout the repair process. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	c7d753cd44	repair: do_repair_start: use keyspace erm to get keyspace local ranges Rather than calling db.get_keyspace_local_ranges that looks up the keyspace and its erm again. We want all the inforamtion derived from the erm to be based on the same source. The function is synchronous so this changes doesn't fix anything, just cleans up the code. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	aaf74776c2	repair: do_repair_start: use keyspace erm for get_primary_ranges Ensure that the primary ranges are in sync with the keyspace erm. The function is synchronous so this change doesn't fix anything, it just cleans up the code. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	9200e6b005	repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc Ensure the erm and topology are in sync. The function is synchronous so this change doesn't fix anything, just cleans up the code. Fix mistake in comment while at it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:57:56 +02:00
Benny Halevy	59dc2567fd	repair: do_repair_start: check_in_shutdown first Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Benny Halevy	881eb0df83	repair: get_db().local() where needed In several places we get the sharded database using get_db() and then we only use db.local(). Simplify the code by keeping reference only to the local database upfront. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Benny Halevy	c22c4c8527	repair: get topology from erm/token_metdata_ptr We want the topology to be synchronized with the respective effective_replication_map / token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Benny Halevy	94f2e95a2f	view: get_view_natural_endpoint: get topology from erm Get the topology for the effective replication map rather than from the storage_proxy to ensure its synchronized with the natural endpoints. Since there's no preemption between the two calls currently there is no issue, so this is merely a clean up of the code and not supposed to fix anything. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Nadav Har'El	e393639114	test/cql-pytest: reproducer for crash in LWT with null key This patch adds a reproducer for issue #11954: Attempting an "IF NOT EXISTS" (LWT) write with a null key crashes Scylla, instead of producing a simple error message (like happens without the "IF NOT EXISTS" after #7852 was fixed). The test passed on Cassandra, but crashes Scylla. Because of this crash, we can't just mark the test "xfail" and it's temporarily marked "skip" instead. Refs #11954. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11982	2022-11-17 07:31:13 +02:00
Benny Halevy	d0bd305d16	locator: refactor topology out of token_metadata Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 21:55:54 +02:00
Benny Halevy	297a4de4e4	locator: add types.hh To export low-level types that are used by oher modules for the locator interfaces. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 21:53:05 +02:00
Kamil Braun	0c9cb5c5bf	Merge 'raft: wait for the next tick before retrying' from Gusev Petr When `modify_config` or `add_entry` is forwarded to the leader, it may reach the node at "inappropriate" time and result in an exception. There are two reasons for it - the leader is changing and, in case of `modify_config`, other `modify_config` is currently in progress. In both cases the command is retried, but before this patch there was no delay before retrying, which could led to a tight loop. The patch adds a new exception type `transient_error`. When the client receives it, it is obliged to retry the request after some delay. Previously leader-side exceptions were converted to `not_a_leader`, which is strange, especially for `conf_change_in_progress`. Fixes: #11564 Closes #11769 * github.com:scylladb/scylladb: raft: rafactor: remove duplicate code on retries delays raft: use wait_for_next_tick in read_barrier raft: wait for the next tick before retrying	2022-11-16 18:20:54 +01:00
Aleksandra Martyniuk	4250bd9458	tasks: do not run tasks that are aborted Currently in start() method a task is run even if it was already aborted. When start() is called on an aborted task, its state is set to task_manager::task_state::failed and it doesn't run.	2022-11-16 18:09:41 +01:00
Aleksandra Martyniuk	ebffca7ea5	tasks: delete unused variable	2022-11-16 18:07:57 +01:00
Aleksandra Martyniuk	752edc2205	tasks: add abort_source to task_manager::task::impl task_manager::task can be aborted with impl's abort_source. By default abort request is propagated to all task's descendants.	2022-11-16 18:07:11 +01:00
Avi Kivity	c4f069c6fc	Update seastar submodule * seastar 153223a188...4f4cc00660 (10): > Merge 'Avoid using namespace internal' from Pavel Emelyanov > Merge 'De-futurize IO class update calls' from Pavel Emelyanov > abort_source: subscribe(): remove noexcept qualifier > Merge 'Add Prometheus filtering capabilities by label' from Amnon Heiman > fsqual: stop causing memory leak error on LeakSanitizer > metrics.cc: Do not merge empty histogram > Update tutorial.md > README-DPDK.md: document --cflags option > build: install liburing.pc using stow > core/polymorphic_temporary_buffer: include <seastar/core/memory.hh> Closes #11991	2022-11-16 17:59:33 +02:00
Avi Kivity	3497891cf9	utils: spell "barrett" correctly As P. T. Barnoom famously said, "write what you like but spell my name correctly". Following that, we correct the spelling of Barrett's name in the source tree. Closes #11989	2022-11-16 16:30:38 +02:00
Benny Halevy	0c94ffcc85	topology: delete copy constructor Topology is copied only from token_metadata_impl::clone_only_token_map which copies the token_metadata_impl with yielding to prevent reactor stalls. This should apply to topology as well, so add a clone_gently function for cloning the topology from token_metadata_impl::clone_only_token_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 15:27:28 +02:00
Benny Halevy	4f4fc7fe22	token_metadata: coroutinize clone functions Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 15:27:28 +02:00
Kamil Braun	a83789160d	message: messaging_service: check for known topology before calling is_same_dc/rack `is_same_dc` and `is_same_rack` assume that the peer's topology is known. If it's unknown, `on_internal_error` will be called inside topology. When these functions are used in `get_rpc_client`, they are already protected by an earlier check for knowing the peer's topology (the `has_topology()` lambda). Another use is in `do_start_listen()`, where we create a filter for RPC module to check if it should accept incoming connections. If cross-dc or cross-rack encryption is enabled, we will reject connections attempts to the regular (non-ssl) port from other dcs/rack using `is_same_dc/rack`. However, it might happen that something (other Scylla node or otherwise) tries to contact us on the regular port and we don't know that thing's topology, which would result in `on_internal_error`. But this is not a fatal error; we simply want to reject that connection. So protect these calls as well. Finally, there's `get_preferred_ip` with an unprotected `is_same_dc` call which, for a given peer, may return a different IP from preferred IP cache if the endpoint resides in the same DC. If there is not entry in the preferred IP cache, we return the original (external) IP of the peer. We can do the same if we don't know the peer's topology. It's interesting that we didn't see this particular place blowing up. Perhaps the preferred IP cache is always populated after we know the topology.	2022-11-16 14:01:50 +01:00

1 2 3 4 5 ...

33811 Commits