scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 09:30:45 +00:00

Author	SHA1	Message	Date
Botond Dénes	30597f17ed	tools/scylla-sstable: traverse sstables in argument order In the order the user passed them on the command-line.	2022-11-18 15:58:37 +02:00
Botond Dénes	e337b25aa9	tools/scylla-sstable: dump-data docs: s/clustering_fragments/clustering_elements The usage of clustering_fragments is a typo, the output contains clustering_elements.	2022-11-18 15:58:36 +02:00
Botond Dénes	c39408b394	tools/scylla-sstable: dump-data/json: use Null instead of "<unknown>" The currently used "<unknown>" marker for invalid values/types is indistinguishable from a normal value in some cases. Use the much more distinct and unique json Null instead.	2022-11-18 15:58:36 +02:00
Botond Dénes	1dfceb5716	tools/scylla-sstable: dump-data/json: use more uniform format for collections Instead of trying to be clever and switching the output on the type of collection, use the same format always: a list of objects, where the object has a key and value attribute, containing the respective collection item key and values. This makes processing much easier for machines (and humans too since the previous system wasn't working well).	2022-11-18 15:58:36 +02:00
Botond Dénes	f89acc8df7	tools/scylla-sstable: dump-data/json: make cells easier to parse There are several slightly different cell types in scylla: regular cells, collection cells (frozen and non-frozen) and counter cells (update and shards). In C++ code the type of the cell is always available for code wishing to make out exactly what kind of cell a cell is. In the JSON output of the dump-data this is currently really hard to do as there is not enough information to disambiguate all the different cell types. We wish to make the JSON output self-sufficient so in this patch we introduce a "type" field which contains one of: * regular * counter-update * counter-shards * frozen-collection * collection Furthermore, we bring the different types closer by also printing the counter shards under the 'value' key, not under the 'shards' key as before. The separate 'shards' is no longer needed to disambiguate. The documentation and the write operation are also updated to reflect the changes.	2022-11-18 15:58:36 +02:00
Avi Kivity	2779a171fc	Merge 'Do not run aborted tasks' from Aleksandra Martyniuk task_manager::task::impl contains an abort source which can be used to check whether it is aborted and an abort method which aborts the task (request_abort on abort_source) and all its descendants recursively. When the start method is called after the task was aborted, then its state is set to failed and the task does not run. Fixes: #11995 Closes #11996 * github.com:scylladb/scylladb: tasks: do not run tasks that are aborted tasks: delete unused variable tasks: add abort_source to task_manager::task::impl	2022-11-17 19:42:46 +02:00
Pavel Emelyanov	a396c27efc	Merge 'message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client' from Kamil Braun `get_rpc_client` calculates a `topology_ignored` field when creating a client which says whether the client's endpoint had topology information when this client was created. This is later used to check if that client needs to be dropped and replaced with a new client which uses the correct topology information. The `topology_ignored` field was incorrectly calculated as `true` for pending endpoints even though we had topology information for them. This would lead to unnecessary drops of RPC clients later. Fix this. Remove the default parameter for `with_pending` from `topology::has_endpoint` to avoid similar bugs in the future. Apparently this fixes #11780. The verbs used by decommission operation use RPC client index 1 (see `do_get_rpc_client_idx` in message/messaging_service.cc). From local testing with additional logging I found that by the time this client is created (i.e. the first verb in this group is used), we already know the topology. The node is pending at that point - hence the bug would cause us to assume we don't know the topology, leading us to dropping the RPC client later, possibly in the middle of a decommission operation. Fixes: #11780 Closes #11942 * github.com:scylladb/scylladb: message: messaging_service: check for known topology before calling is_same_dc/rack test: reenable test_topology::test_decommission_node_add_column test/pylib: util: configurable period in wait_for message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client message: messaging_service: topology independent connection settings for GOSSIP verbs	2022-11-17 20:14:32 +03:00
Avi Kivity	76be6402ed	Merge 'repair: harden effective replication map' from Benny Halevy As described in #11993 per-shard repair_info instances get the effective_replication_map on their own with no centralized synchronization. This series ensures that the effective replication maps used by repair (and other associated structures like the token metadata and topology) are all in sync with the one used to initiate the repair operation. While at it, the series includes other cleanups in this area in repair and view that are not fixes as the calls happen in synchronous functions that do not yield. Fixes #11993 Closes #11994 * github.com:scylladb/scylladb: repair: pass erm down to get_hosts_participating_in_repair and get_neighbors repair: pass effective_replication_map down to repair_info repair: coroutinize sync_data_using_repair repair: futurize do_repair_start effective_replication_map: add global_effective_replication_map shared_token_metadata: get_lock is const repair: sync_data_using_repair: require to run on shard 0 repair: require all node operations to be called on shard 0 repair: repair_info: keep effective_replication_map repair: do_repair_start: use keyspace erm to get keyspace local ranges repair: do_repair_start: use keyspace erm for get_primary_ranges repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc repair: do_repair_start: check_in_shutdown first repair: get_db().local() where needed repair: get topology from erm/token_metdata_ptr view: get_view_natural_endpoint: get topology from erm	2022-11-17 13:29:02 +02:00
Pavel Emelyanov	2add9ba292	Merge 'Refactor topology out of token_metadata' from Benny Halevy This series moves the topology code from locator/token_metadata.{cc,hh} out to locator/topology.{cc,hh} and introduces a shared header file: locator/types.hh contains shared, low level definitions, in anticipation of https://github.com/scylladb/scylladb/pull/11987 While at it, the token_metadata functions are turned into coroutines and topology copy constructor is deleted. The copy functionality is moved into an async `clone_gently` function that allows yielding while copying the topology. Closes #12001 * github.com:scylladb/scylladb: locator: refactor topology out of token_metadata locator: add types.hh topology: delete copy constructor token_metadata: coroutinize clone functions	2022-11-17 13:55:34 +03:00
Aleksandra Martyniuk	7ead1a7857	compaction: request abort only once in compaction_data::stop compaction_manager::task (and thus compaction_data) can be stopped because of many different reasons. Thus, abort can be requested more than once on compaction_data abort source causing a crash. To prevent this before each request_abort() we check whether an abort was requested before. Closes #12004	2022-11-17 12:44:59 +02:00
Benny Halevy	1e2741d2fe	abstract_replication_strategy: recognized_options: return unordered_set An unordered_set is more efficient and there is no need to return an ordered set for this purpose. This change facilitates a follow-up change of adding topology::get_datacenters(), returning an unordered_set of datacenter names. Refs #11987 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12003	2022-11-17 11:27:05 +02:00
Botond Dénes	e925c41f02	utils/gs/barrett.hh: aarch64: s/brarett/barrett/ Fix a typo introduced by the recent patch fixing the spelling of Barrett. The patch introduced a typo in the aarch64 version of the code, which wasn't found by promotion, as that only builds on X86_64. Closes #12006	2022-11-17 11:09:59 +02:00
Benny Halevy	53fdf75cf9	repair: pass erm down to get_hosts_participating_in_repair and get_neighbors Now that it is available in repair_info. Fixes #11993 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:30 +02:00
Benny Halevy	b69be61f41	repair: pass effective_replication_map down to repair_info And make sure the token_metadata ring version is same as the reference one (from the erm on shard 0), when starting the repair on each shard. Refs #11993 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:29 +02:00
Benny Halevy	c47d36b53d	repair: coroutinize sync_data_using_repair Prepare for the next patch that will co_await make_global_effective_replication_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:04 +02:00
Benny Halevy	58b1c17f5d	repair: futurize do_repair_start Turn it into a coroutine to prepare for the next patch that will co_await make_global_effective_replication_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:04 +02:00
Benny Halevy	4b9269b7e2	effective_replication_map: add global_effective_replication_map Class to hold a coherent view of a keyspace effective replication map on all shards. To be used in a following patch to pass the sharded keyspace e_r_m:s to repair. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:01 +02:00
Avi Kivity	b8b78959fb	build: switch to packaged libdeflate rather than a submodule Now that our toolchain is based on Fedora 37, we can rely on its libdeflate rather than have to carry our own in a submodule. Frozen toolchain is regenerated. As a side effect clang is updated from 15.0.0 to 15.0.4. Closes #12000	2022-11-17 08:01:00 +02:00
Benny Halevy	2c677e294b	shared_token_metadata: get_lock is const The lock is acquired using a function that doesn't modify the shared_token_metadata object. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	d6b2124903	repair: sync_data_using_repair: require to run on shard 0 And with that do_sync_data_using_repair can be folded into sync_data_using_repair. This will simplify using the effective_replication_map throughout the operation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	0c56c75cf8	repair: require all node operations to be called on shard 0 To simplify use of the effective_replication_map / token_metadata_ptr throughout the operation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	64b0756adc	repair: repair_info: keep effective_replication_map Sampled when repair info is constructed. To be used throughout the repair process. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	c7d753cd44	repair: do_repair_start: use keyspace erm to get keyspace local ranges Rather than calling db.get_keyspace_local_ranges that looks up the keyspace and its erm again. We want all the information derived from the erm to be based on the same source. The function is synchronous so this change doesn't fix anything, just cleans up the code. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	aaf74776c2	repair: do_repair_start: use keyspace erm for get_primary_ranges Ensure that the primary ranges are in sync with the keyspace erm. The function is synchronous so this change doesn't fix anything, it just cleans up the code. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	9200e6b005	repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc Ensure the erm and topology are in sync. The function is synchronous so this change doesn't fix anything, just cleans up the code. Fix mistake in comment while at it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:57:56 +02:00
Benny Halevy	59dc2567fd	repair: do_repair_start: check_in_shutdown first Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Benny Halevy	881eb0df83	repair: get_db().local() where needed In several places we get the sharded database using get_db() and then we only use db.local(). Simplify the code by keeping reference only to the local database upfront. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Benny Halevy	c22c4c8527	repair: get topology from erm/token_metdata_ptr We want the topology to be synchronized with the respective effective_replication_map / token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Benny Halevy	94f2e95a2f	view: get_view_natural_endpoint: get topology from erm Get the topology for the effective replication map rather than from the storage_proxy to ensure it's synchronized with the natural endpoints. Since there's no preemption between the two calls currently there is no issue, so this is merely a clean up of the code and not supposed to fix anything. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Nadav Har'El	e393639114	test/cql-pytest: reproducer for crash in LWT with null key This patch adds a reproducer for issue #11954: Attempting an "IF NOT EXISTS" (LWT) write with a null key crashes Scylla, instead of producing a simple error message (like happens without the "IF NOT EXISTS" after #7852 was fixed). The test passed on Cassandra, but crashes Scylla. Because of this crash, we can't just mark the test "xfail" and it's temporarily marked "skip" instead. Refs #11954. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11982	2022-11-17 07:31:13 +02:00
Benny Halevy	d0bd305d16	locator: refactor topology out of token_metadata Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 21:55:54 +02:00
Benny Halevy	297a4de4e4	locator: add types.hh To export low-level types that are used by other modules for the locator interfaces. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 21:53:05 +02:00
Kamil Braun	0c9cb5c5bf	Merge 'raft: wait for the next tick before retrying' from Gusev Petr When `modify_config` or `add_entry` is forwarded to the leader, it may reach the node at "inappropriate" time and result in an exception. There are two reasons for it - the leader is changing and, in case of `modify_config`, other `modify_config` is currently in progress. In both cases the command is retried, but before this patch there was no delay before retrying, which could lead to a tight loop. The patch adds a new exception type `transient_error`. When the client receives it, it is obliged to retry the request after some delay. Previously leader-side exceptions were converted to `not_a_leader`, which is strange, especially for `conf_change_in_progress`. Fixes: #11564 Closes #11769 * github.com:scylladb/scylladb: raft: rafactor: remove duplicate code on retries delays raft: use wait_for_next_tick in read_barrier raft: wait for the next tick before retrying	2022-11-16 18:20:54 +01:00
Aleksandra Martyniuk	4250bd9458	tasks: do not run tasks that are aborted Currently in start() method a task is run even if it was already aborted. When start() is called on an aborted task, its state is set to task_manager::task_state::failed and it doesn't run.	2022-11-16 18:09:41 +01:00
Aleksandra Martyniuk	ebffca7ea5	tasks: delete unused variable	2022-11-16 18:07:57 +01:00
Aleksandra Martyniuk	752edc2205	tasks: add abort_source to task_manager::task::impl task_manager::task can be aborted with impl's abort_source. By default abort request is propagated to all task's descendants.	2022-11-16 18:07:11 +01:00
Avi Kivity	c4f069c6fc	Update seastar submodule * seastar 153223a188...4f4cc00660 (10): > Merge 'Avoid using namespace internal' from Pavel Emelyanov > Merge 'De-futurize IO class update calls' from Pavel Emelyanov > abort_source: subscribe(): remove noexcept qualifier > Merge 'Add Prometheus filtering capabilities by label' from Amnon Heiman > fsqual: stop causing memory leak error on LeakSanitizer > metrics.cc: Do not merge empty histogram > Update tutorial.md > README-DPDK.md: document --cflags option > build: install liburing.pc using stow > core/polymorphic_temporary_buffer: include <seastar/core/memory.hh> Closes #11991	2022-11-16 17:59:33 +02:00
Avi Kivity	3497891cf9	utils: spell "barrett" correctly As P. T. Barnoom famously said, "write what you like but spell my name correctly". Following that, we correct the spelling of Barrett's name in the source tree. Closes #11989	2022-11-16 16:30:38 +02:00
Benny Halevy	0c94ffcc85	topology: delete copy constructor Topology is copied only from token_metadata_impl::clone_only_token_map which copies the token_metadata_impl with yielding to prevent reactor stalls. This should apply to topology as well, so add a clone_gently function for cloning the topology from token_metadata_impl::clone_only_token_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 15:27:28 +02:00
Benny Halevy	4f4fc7fe22	token_metadata: coroutinize clone functions Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 15:27:28 +02:00
Kamil Braun	a83789160d	message: messaging_service: check for known topology before calling is_same_dc/rack `is_same_dc` and `is_same_rack` assume that the peer's topology is known. If it's unknown, `on_internal_error` will be called inside topology. When these functions are used in `get_rpc_client`, they are already protected by an earlier check for knowing the peer's topology (the `has_topology()` lambda). Another use is in `do_start_listen()`, where we create a filter for RPC module to check if it should accept incoming connections. If cross-dc or cross-rack encryption is enabled, we will reject connection attempts to the regular (non-ssl) port from other dcs/rack using `is_same_dc/rack`. However, it might happen that something (other Scylla node or otherwise) tries to contact us on the regular port and we don't know that thing's topology, which would result in `on_internal_error`. But this is not a fatal error; we simply want to reject that connection. So protect these calls as well. Finally, there's `get_preferred_ip` with an unprotected `is_same_dc` call which, for a given peer, may return a different IP from preferred IP cache if the endpoint resides in the same DC. If there is no entry in the preferred IP cache, we return the original (external) IP of the peer. We can do the same if we don't know the peer's topology. It's interesting that we didn't see this particular place blowing up. Perhaps the preferred IP cache is always populated after we know the topology.	2022-11-16 14:01:50 +01:00
Kamil Braun	9b2449d3ea	test: reenable test_topology::test_decommission_node_add_column Also improve the test to increase the probability of reproducing #11780 by injecting sleeps in appropriate places. Without the fix for #11780 from the earlier commit, the test reproduces the issue in roughly half of all runs in dev build on my laptop.	2022-11-16 14:01:50 +01:00
Kamil Braun	0f49813312	test/pylib: util: configurable period in wait_for	2022-11-16 14:01:50 +01:00
Kamil Braun	1bd2471c19	message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client `get_rpc_client` calculates a `topology_ignored` field when creating a client which says whether the client's endpoint had topology information when this client was created. This is later used to check if that client needs to be dropped and replaced with a new client which uses the correct topology information. The `topology_ignored` field was incorrectly calculated as `true` for pending endpoints even though we had topology information for them. This would lead to unnecessary drops of RPC clients later. Fix this. Remove the default parameter for `with_pending` from `topology::has_endpoint` to avoid similar bugs in the future. Apparently this fixes #11780. The verbs used by decommission operation use RPC client index 1 (see `do_get_rpc_client_idx` in message/messaging_service.cc). From local testing with additional logging I found that by the time this client is created (i.e. the first verb in this group is used), we already know the topology. The node is pending at that point - hence the bug would cause us to assume we don't know the topology, leading us to dropping the RPC client later, possibly in the middle of a decommission operation. Fixes: #11780	2022-11-16 14:01:50 +01:00
Kamil Braun	840be34b5f	message: messaging_service: topology independent connection settings for GOSSIP verbs The gossip verbs are used to learn about topology of other nodes. If inter-dc/rack encryption is enabled, the knowledge of topology is necessary to decide whether it's safe to send unencrypted messages to nodes (i.e., whether the destination lies in the same dc/rack). The logic in `messaging_service::get_rpc_client`, which decided whether a connection must be encrypted, was this (given that encryption is enabled): if the topology of the peer is known, and the peer is in the same dc/rack, don't encrypt. Otherwise encrypt. However, it may happen that node A knows node B's topology, but B doesn't know A's topology. A deduces that B is in the same DC and rack and tries sending B an unencrypted message. As the code currently stands, this would cause B to call `on_internal_error`. This is what I encountered when attempting to fix #11780. To guarantee that it's always possible to deliver gossiper verbs (even if one or both sides don't know each other's topology), and to simplify reasoning about the system in general, choose connection settings that are independent of the topology - for the connection used by gossiper verbs (other connections are still topology-dependent and use complex logic to handle the situation of unknown-and-later-known topology). This connection only contains 'rare' and 'cheap' verbs, so it's not a performance problem to always encrypt it (given that encryption is configured). And this is what already was happening in the past; it was at some point removed during topology knowledge management refactors. We just bring this logic back. Fixes #11992. Inspired by xemul/scylla@45d48f3d02.	2022-11-16 13:58:07 +01:00
Nadav Har'El	2f2f01b045	materialized views: fix view writes after base table schema change When we write to a materialized view, we need to know some information defined in the base table such as the columns in its schema. We have a "view_info" object that tracks each view and its base. This view_info object has a couple of mutable attributes which are used to lazily-calculate and cache the SELECT statement needed to read from the base table. If the base-table schema ever changes - and the code calls set_base_info() at that point - we need to forget this cached statement. If we don't (as before this patch), the SELECT will use the wrong schema and writes will no longer work. This patch also includes a reproducing test that failed before this patch, and passes afterwards. The test creates a base table with a view that has a non-trivial SELECT (it has a filter on one of the base-regular columns), makes a benign modification to the base table (just a silly addition of a comment), and then tries to write to the view - and before this patch it fails. Fixes #10026 Fixes #11542	2022-11-16 13:58:21 +02:00
Nadav Har'El	7cbb0b98bb	Merge 'doc: document user defined functions (UDFs)' from Anna Stuchlik This PR is V2 of the [PR created by @psarna](https://github.com/scylladb/scylladb/pull/11560). I have: - copied the content. - applied the suggestions left by @nyh. - made minor improvements, such as replacing "Scylla" with "ScyllaDB", fixing punctuation, and fixing the RST syntax. Fixes https://github.com/scylladb/scylladb/issues/11378 Closes #11984 * github.com:scylladb/scylladb: doc: label user-defined functions as Experimental doc: restore the note for the Count function (removed by mistatke) doc: document user defined functions (UDFs)	2022-11-16 13:09:47 +02:00
Botond Dénes	cbf9be9715	Merge 'Avoid 0.0.0.0 (and :0) as preferred IP' from Pavel Emelyanov Although the docs discourage using INADDR_ANY as the listen address, this is not disabled in code. Worse -- some snitch drivers may gossip it around as the INTERNAL_IP state. This set prevents this from happening and also adds a sanity check not to use this value if it somehow sneaks in. Closes #11846 * github.com:scylladb/scylladb: messaging_service: Deny putting INADD_ANY as preferred ip messaging_service: Toss preferred ip cache management gossiping_property_file_snitch: Dont gossip INADDR_ANY preferred IP gossiping_property_file_snitch: Make _listen_address optional	2022-11-16 08:30:42 +02:00
Avi Kivity	43d3e91e56	tools: toolchain: prepare: use real bash associative array When we translate from docker/go arch names to the kernel arch names, we use an associative array hack using computed variable names "${!variable_name}". But it turns out bash has real associative arrays, introduced with "declare -A". Use them to make the code a little clearer. Closes #11985	2022-11-16 08:17:47 +02:00
Botond Dénes	e90d0811d0	Merge 'doc: update ScyllaDB requirements - supported CPUs and AWS i4g instances' from Anna Stuchlik Fix https://github.com/scylladb/scylla-docs/issues/4144 Closes #11226 * github.com:scylladb/scylladb: Update docs/getting-started/system-requirements.rst doc: specify the recommended AWS instance types doc: replace the tables with a generic description of support for Im4gn and Is4gen instances doc: add support for AWS i4g instances doc: extend the list of supported CPUs	2022-11-16 08:15:00 +02:00
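
The guard described in commit 7ead1a7857 (request abort only once in compaction_data::stop) can be sketched as below. This is a minimal Python illustration, not ScyllaDB code: the AbortSource and CompactionData classes and the stop_reason field are hypothetical stand-ins, and the failure on a second abort request is modeled here as a RuntimeError purely as an assumption for the example.

```python
class AbortSource:
    """Toy stand-in for an abort source (hypothetical, not the seastar class)."""

    def __init__(self):
        self._requested = False

    def abort_requested(self):
        return self._requested

    def request_abort(self):
        # Assumption for this sketch: a second request is an error.
        if self._requested:
            raise RuntimeError("abort requested twice")
        self._requested = True


class CompactionData:
    """Toy stand-in for the object that owns the abort source."""

    def __init__(self):
        self.abort = AbortSource()
        self.stop_reason = None

    def stop(self, reason):
        # Guard: only the first stop request reaches the abort source;
        # later requests, whatever their reason, are ignored.
        if not self.abort.abort_requested():
            self.stop_reason = reason
            self.abort.request_abort()


data = CompactionData()
data.stop("user request")
data.stop("shutdown")  # no error: the guard skips the second request
```

With the guard in place, stop() may be called for several independent reasons without the second call reaching request_abort().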

33802 Commits