scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 13:06:57 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	f17c594d21	large_data_handler: If-less statistics increment The partitions_bigger_than_threshold is incremented only if the previous check detects that the partition exceeds a threshold by its size. It's done with an extra if, but it can be done without (explicit) condition as bool type is guaranteed by the standard to convert into integers as true = 1 and false = 0 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18217	2024-04-16 07:16:05 +03:00
Benny Halevy	1061455442	gossiper: add load_endpoint_state Pack the topology-related data loaded from system.peers in `gms::load_endpoint_state`, to be used in a following patch for `add_saved_endpoint`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-14 15:06:56 +03:00
Pavel Emelyanov	90593f4e82	view_builder: Generalize mark_as_built(view_ptr) method Marking is performed in two places and they can be generalized Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-05 19:56:12 +03:00
Pavel Emelyanov	3c3f2cd337	view_builder: Move mark_existing_views_as_built from storage service Now it's in the correct component Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-05 19:56:11 +03:00
Ferenc Szili	f1cc6252fd	logging: Don't log PK/CK in large partition/row/cell warning Currently, Scylla logs a warning when it writes a cell, row or partition that is larger than a certain configured size. These warnings contain the partition key and, in the case of rows and cells, also the clustering key, which allow the large row or partition to be identified. However, these keys can contain user-private, sensitive information. The information which identifies the partition/row/cell is also inserted into tables system.large_partitions, system.large_rows and system.large_cells respectively. This change removes the partition and clustering keys from the log messages, but still inserts them into the system tables. The logged data will look like this: Large cells: WARN 2024-04-02 16:49:48,602 [shard 3: mt] large_data - Writing large cell ks_name/tbl_name: cell_name (SIZE bytes) to sstable.db Large rows: WARN 2024-04-02 16:49:48,602 [shard 3: mt] large_data - Writing large row ks_name/tbl_name: (SIZE bytes) to sstable.db Large partitions: WARN 2024-04-02 16:49:48,602 [shard 3: mt] large_data - Writing large partition ks_name/tbl_name: (SIZE bytes) to sstable.db Fixes #18041 Closes scylladb/scylladb#18166	2024-04-04 12:06:31 +03:00
Piotr Dulikowski	baae811142	Merge 'auth: keep auth version in scylla_local' from Marcin Maliszkiewicz Before the patch selection of auth version depended on consistent topology feature but during raft recovery procedure this feature is disabled so we need to persist the version somewhere to not switch back to v1 as this is not supported. During recovery auth works in read-only mode, writes will fail. Fixes https://github.com/scylladb/scylladb/issues/17736 Closes scylladb/scylladb#18039 * github.com:scylladb/scylladb: auth: keep auth version in scylla_local auth: coroutinize service::start	2024-04-03 12:25:56 +02:00
Marcin Maliszkiewicz	562caaf6c6	auth: keep auth version in scylla_local Before the patch selection of auth version depended on consistent topology feature but during raft recovery procedure this feature is disabled so we need to persist the version somewhere to not switch back to v1 as this is not supported. During recovery auth works in read-only mode, writes will fail.	2024-04-02 19:04:21 +02:00
Botond Dénes	2179bfc40d	Merge 'Relax initialization of virtual tables' from Pavel Emelyanov It now happens in initialize_virtual_tables(), but this function is split into sub-calls and iterates over virtual tables map several times to do its work. This PR squashes it into a straightforward code which is shorter and, hopefully, easier to read. Closes scylladb/scylladb#18133 * github.com:scylladb/scylladb: virtual_tables: Open-code install_virtual_readers_and_writers() virtual_tables: Move readers setup loop into add_table() virtual_tables: Move tables creation loop into add_table() virtual_tables: Make add_tablet() a coroutine virtual_tables: Open-code register_virtual_tables()	2024-04-02 13:39:26 +03:00
Lakshmi Narayanan Sreethar	e8026197d2	db/config: add a new variable to limit memory used by table components A new configuration variable, components_memory_reclaim_threshold, has been added to configure the maximum allowed percentage of available memory for all SSTable components in a shard. If the total memory usage exceeds this threshold, it will be reclaimed from the components to bring it back under the limit. Currently, only the memory used by the bloom filters will be restricted. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-04-02 01:37:47 +05:30
Pavel Emelyanov	627c5fdf04	virtual_tables: Open-code install_virtual_readers_and_writers() It's pretty short already and is naturally a "part" of initialize_virtual_tables(). Nor does it install writers any longer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-01 19:02:40 +03:00
Pavel Emelyanov	1d79cfc6cf	virtual_tables: Move readers setup loop into add_table() Similarly to the previous patch, after virtual tables are registered the registry is iterated over to install virtual readers onto each entry. Again, this can happen at the time of registering, so there is no need for a dedicated loop. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-01 19:01:50 +03:00
Pavel Emelyanov	891e792717	virtual_tables: Move tables creation loop into add_table() Once the virtual_tables map is populated, it's iterated over to create replica::table entries for each virtual table. This can be done in the same place where the virtual table is created, so there is no longer a need for a dedicated loop. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-01 19:00:38 +03:00
Pavel Emelyanov	420ce3634f	virtual_tables: Make add_tablet() a coroutine Next patches will populate it with sleeping calls, this patch prepares for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-01 19:00:15 +03:00
Pavel Emelyanov	ddc6f9279f	virtual_tables: Open-code register_virtual_tables() It's naturally a "part" of initialize_virtual_tables(). Further patching gets possible with it being open-coded. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-01 18:59:18 +03:00
Benny Halevy	01fc1a9f66	schema_tables: std::move mutation into the mutation vector To save a copy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#18120	2024-04-01 14:16:30 +03:00
Kamil Braun	d274f63d89	Merge 'Add support for "initial-token" parameter in raft mode' from Gleb Fixes scylladb/scylladb#17893 * 'gleb/initial-token-v1' of github.com:scylladb/scylla-dev: dht: drop unused parameter from get_random_bootstrap_tokens() function test: add test for initial_token parameter topology coordinator: use provided initial_token parameter to choose bootstrap tokens topology cooordinator: propagate initial_token option to the coordinator	2024-03-27 11:41:06 +01:00
Gleb Natapov	6ab78e13c6	topology cooordinator: propagate initial_token option to the coordinator The patch propagates initial_token option to the topology coordinator where it is added to join request parameter.	2024-03-26 18:43:16 +02:00
Avi Kivity	4ddf82e58b	treewide: don't #include "gms/feature_service.hh" from other headers feature_service.hh is a high-level header that integrates much of the system functionality, so including it in lower-level headers causes unnecessary rebuilds. Specifically, when retiring features. Fix by removing feature_service.hh from headers, and supply forward declarations and includes in .cc where needed. Closes scylladb/scylladb#18005	2024-03-26 15:31:18 +02:00
Pavel Emelyanov	8bf9098663	system_keyspace: Consolidate node-state vs tokens checks When loading topology state, nodes are checked to have or not to have "tokens" field set. The check is done based on node state and it's spread across the loading method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17957	2024-03-26 14:55:46 +02:00
Kefu Chai	f3532cbaa0	db: commitlog: use fmt::streamed() to print segment before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change: * add `format_as()` for `segment` so we can use it as a fallback after upgrading to {fmt} v10 * use fmt::streamed() when formatting `segment`, this will be used as the intermediate solution before {fmt} v10, after dropping the `FMT_DEPRECATED_OSTREAM` macro Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18019	2024-03-26 12:13:29 +02:00
Wojciech Mitros	9789a3dc7c	mv: keep semaphore units alive until the end of a remote view update When a view update has both a local and remote target endpoint, it extends the lifetime of its memory tracking semaphore units only until the end of the local update, while the resources are actually used until the remote update finishes. This patch changes the semaphore transferring so that in case of both local and remote endpoints, both view updates share the units, causing them to be released only after the update that takes longer finishes. Fixes #17890 Closes scylladb/scylladb#17891	2024-03-25 19:43:58 +02:00
Piotr Dulikowski	f23f8f81bf	Merge 'Raft-based service levels' from Michał Jadwiszczak This patch introduces raft-based service levels. The difference to the current method of working is: - service levels are stored in `system.service_levels_v2` - reads are executed with `LOCAL_ONE` - writes are done via raft group0 operation Service levels are migrated to v2 in topology upgrade. After the service levels are migrated, `key: service_level_v2_status; value: data_migrated` is written to `system.scylla_local` table. If this row is present, raft data accessor is created from the beginning and it handles recovery mode procedure (service levels will be read from v2 table even if consistent topology is disabled then) Fixes #17926 Closes scylladb/scylladb#16585 * github.com:scylladb/scylladb: test: test service levels v2 works in recovery mode test: add test for service levels migration test: add test for service levels snapshot test:topology: extract `trigger_snapshot` to utils main: create raft dda if sl data was migrated service:qos: store information about sl data migration service:qos: service levels migration main: assign standard service level DDA before starting group0 service:qos: fix `is_v2()` method service:qos: add a method to upgrade data accessor test: add unit_test_raft_service_levels_accessor service:storage_service: add support for service levels raft snapshot service:qos: add abort_source for group0 operations service:qos: raft service level distributed data accessor service:qos: use group0_guard in data accessor cql3:statements: run service level statements on shard0 with raft guard test: fix overrides in unit_test_service_levels_accessor service:qos: fix indentation service:qos: coroutinize some of the methods db:system_keyspace: add `SERVICE_LEVELS_V2` table service:qos: extract common service levels' table functions	2024-03-22 11:51:53 +01:00
Kamil Braun	4359a1b460	Merge 'raft timeouts: better handling of lost quorum' from Petr Gusev In this PR we add timeouts support to raft groups registry. We introduce the `raft_server_with_timeouts` class, which wraps the `raft::server` and exposes its interface with an additional `raft_timeout` parameter. If it's set, the wrapper cancels the `abort_source` after a certain amount of time. The value of the timeout can be specified either in the `raft_timeout` parameter, or the default value can be set in the `raft_server_with_timeouts` class constructor. The `raft_group_registry` interface is extended with `group0_with_timeouts()` method. It returns an instance of `raft_server_with_timeouts` for group0 raft server. The timeout value for it is configured in `create_server_for_group0`. It's one minute by default and can be overridden for tests with `group0-raft-op-timeout-in-ms` parameter. The new api allows the client to decide whether to use timeouts or not. In this PR we are reviewing all the group0 call sites and add `raft_timeout` if that makes sense. The general principle is that if the code is handling a client request and the client expects a potential error, we use timeouts. We don't use timeouts for background fibers (such as topology coordinator), since they wouldn't add much value. The only thing the background fiber can do with a timeout is to retry, and this will have the same end effect as not having a timeout at all.
Fixes scylladb/scylladb#16604 Closes scylladb/scylladb#17590 * github.com:scylladb/scylladb: migration_manager: use raft_timeout{} storage_service::join_node_response_handler: use raft_timeout{} storage_service::start_upgrade_to_raft_topology: use raft_timeout{} storage_service::set_tablet_balancing_enabled: use raft_timeout{} storage_service::move_tablet: use raft_timeout{} raft_check_and_repair_cdc_streams: use raft_timeout{} raft_timeout: test that node operations fail properly raft_rebuild: use raft_timeout{} do_cluster_cleanup: use raft_timeout{} raft_initialize_discovery_leader: use raft_timeout{} update_topology_with_local_metadata: use with_timeout{} raft_decommission: use raft_timeout{} raft_removenode: use raft_timeout{} join_node_request_handler: add raft_timeout to make_nonvoters and add_entry raft_group0: make_raft_config_nonvoter: add raft_timeout parameter raft_group0: make_raft_config_nonvoter: add abort_source parameter manager_client: server_add with start=false shouldn't call driver_connect scylla_cluster: add seeds parameter to the add_server and servers_add raft_server_with_timeouts: report the lost quorum join_node_request_handler: add raft_timeout{} for start_operation skip_mode: add platform_key auth: use raft_timeout{} raft_group0_client: add raft_timeout parameter raft_group_registry: add group0_with_timeouts utils: add composite_abort_source.hh error_injection: move api registration to set_server_init error_injection: add inject_parameter method error_injection: move injection_name string into injection_shared_data error_injection: pass injection parameters at startup	2024-03-22 10:45:33 +01:00
Michał Jadwiszczak	dab909b1d1	service:qos: store information about sl data migration Save information whether service levels data was migrated to v2 table. The information is stored in `system.scylla_local` table. It's written with raft command and included in raft snapshot.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	8e242f5acd	db:system_keyspace: add `SERVICE_LEVELS_V2` table The table has the same schema as `system_distributed.service_levels`. However it's created entirely at once (unlike old table which creates base table first and then it adds other columns) because `system` tables are local to the node.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	990c5e7dd0	service:qos: extract common service levels' table functions Getting a service level(s) will be done the same way in raft-based service levels as it's done in standard service levels, so those functions are extracted for reuse.	2024-03-21 23:14:57 +01:00
Pavel Emelyanov	21a5911e60	Merge 'db/virtual_tables: make token_ring_table tablet aware' from Botond Dénes The token ring table is a virtual table (`system.token_ring`), which contains the ring information for all keyspaces in the system. This is essentially an alternative to `nodetool describering`, but since it is a virtual table, it allows for all the usual filtering/aggregation/etc. that CQL supports. Up until now, this table only supported keyspaces which use vnodes. This PR adds support for tablet keyspaces. To accommodate these keyspaces a new `table_name` column is added, which is set to `ALL` for vnodes keyspaces. For tablet keyspaces, this contains the name of the table. Simple sanity tests are added for this virtual table (it had none). Fixes: #16850 Closes scylladb/scylladb#17351 * github.com:scylladb/scylladb: test/cql-pytest: test_virtual_tables: add test for token_ring table db/virtual_tables: token_ring_table: add tablet support db/virtual_tables: token_ring_table: add table_name column db/virtual_tables: token_ring_table: extract ring emit service/storage_service: describe_ring_for_table(): use topology to map hostid to ip	2024-03-20 14:05:49 +03:00
Petr Gusev	49a4220fea	error_injection: pass injection parameters at startup Injection parameters can be used in the lambda passed to inject_with_handler method to take some values from the test. However, there was no way to set values to these parameters on node startup, only through the error injection REST api. Therefore, we couldn't rely on this when inject_with_handler is used during node startup, it could trigger before we call the api from the test. In this commit we solve this problem by allowing these parameters to be assigned through scylla.yaml config. The defer.hh header was added to error_injection.hh to fix compilation after adding error_injection.hh to config.hh, defer function is used in error_injection.hh.	2024-03-19 20:17:02 +04:00
Avi Kivity	e48eb76f61	sstables_manager: decouple from system_keyspace sstables_manager now depends on system_keyspace for access to the system.sstables table, needed by object storage. This violates modularity, since sstables_manager is a relatively low-level leaf module while system_keyspace integrates large parts of the system (including, indirectly, sstables_manager). One area where this is grating is sstables::test_env, which has to include the much higher level cql_test_env to accommodate it. Fix this by having sstables_manager expose its dependency on system_keyspace as an interface, sstables_registry, and have system_keyspace implement the glue logic in system_keyspace_sstables_manager. Closes scylladb/scylladb#17868	2024-03-18 20:38:07 +03:00
Avi Kivity	72bbe75d5b	Merge 'Fix node replace with tablets for RF=N' from Tomasz Grabiec This PR fixes a problem with replacing a node with tablets when RF=N. Currently, this will fail because tablet replica allocation for rebuild will not be able to find a viable destination, as the replacing node is not considered to be a candidate. It cannot be a candidate because replace rolls back on failure and we cannot roll back after tablets were migrated. The solution taken here is to not drain tablet replicas from replaced node during topology request but leave it to happen later after the replaced node is in left state and replacing node is in normal state. The replacing node waits for this draining to be complete on boot before the node is considered booted. Fixes https://github.com/scylladb/scylladb/issues/17025 Nodes in the left state will be kept in tablet replica sets for a while after node replace is done, until the new replica is rebuilt. So we need to know about those nodes' location (dc, rack) for two reasons: 1) algorithms which work with replica sets filter nodes based on their location. For example materialized views code which pairs base replicas with view replicas filters by datacenter first. 2) tablet scheduler needs to identify each node's location in order to make decisions about new replica placement. It's ok to not know the IP, and we don't keep it. Those nodes will not be present in the IP-based replica sets, e.g. those returned by get_natural_endpoints(), only in host_id-based replica sets. storage_proxy request coordination is not affected. Nodes in the left state are still not present in token ring, and not considered to be members of the ring (datacenter endpoints exclude them). In the future we could make the change even more transparent by only loading locator::node* for those nodes and keeping node* in tablet replica sets. Currently left nodes are never removed from topology, so will accumulate in memory.
We could garbage-collect them from topology coordinator if a left node is absent in any replica set. That means we need a new state - left_for_real. Closes scylladb/scylladb#17388 * github.com:scylladb/scylladb: test: py: Add test for view replica pairing after replace raft, api: Add RESTful API to query current leader of a raft group test: test_tablets_removenode: Verify replacing when there is no spare node doc: topology-on-raft: Document replace behavior with tablets tablets, raft topology: Rebuild tablets after replacing node is normal tablets: load_balancer: Access node attributes via node struct tablets: load_balancer: Extract ensure_node() mv: Switch to using host_id-based replica set effective_replication_map: Introduce host_id-based get_replicas() raft topology: Keep nodes in the left state to topology tablets: Introduce read_required_hosts()	2024-03-18 16:16:08 +02:00
Wojciech Mitros	efcb718e0a	mv: adjust memory tracking of single view updates within a batch Currently, when dividing memory tracked for a batch of updates we do not take into account the overhead that we have for processing every update. This patch adds the overhead for single updates and joins the memory calculation path for batches and their parts so that both use the same overhead. Fixes #17854 Closes scylladb/scylladb#17855	2024-03-18 14:31:54 +02:00
Avi Kivity	731b5c5120	schema_tables: unfreeze frozen_mutation:s gently With large schemas, unfreezing can stall, especially as it requires a lot of memory. Switch to a gentle version that will not stall.	2024-03-17 17:46:02 +02:00
Kefu Chai	8bab51733f	db: add fmt::formatter for db::functions::function before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `db::functions::function`. please note, because we use `std::ostream` as the parameter of the polymorphism implementation of `function::print()`. without an intrusive change, we have to use `fmt::ostream_formatter` or at least use similar technique to format the `function` instance into an instance of `ostream` first. so instead of implementing a "native" `fmt::formatter`, in this change, we just use `fmt::ostream_formatter`. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17832	2024-03-16 17:36:49 +02:00
Tomasz Grabiec	9b656ec2aa	mv: Switch to using host_id-based replica set This is necessary to not break replica pairing between base and view. After replacing a node, tablet replica set contains for a while the replaced node which is in the left state. This node is not returned by the IP-based get_natural_endpoints() so the replica indexes would shift, changing the pairing with the view. The host_id-based replica set always has stable indexes for replicas.	2024-03-15 11:05:29 +01:00
Tomasz Grabiec	61b3453552	raft topology: Keep nodes in the left state to topology Those nodes will be kept in tablet replica sets for a while after node replace is done, until the new replica is rebuilt. So we need to know about those nodes' location (dc, rack) for two reasons: 1) algorithms which work with replica sets filter nodes based on their location. For example materialized views code which pairs base replicas with view replicas filters by datacenter first. 2) tablet scheduler needs to identify each node's location in order to make decisions about new replica placement. It's ok to not know the IP, and we don't keep it. Those nodes will not be present in the IP-based replica sets, e.g. those returned by get_natural_endpoints(), only in host_id-based replica sets. storage_proxy request coordination is not affected. Nodes in the left state are still not present in token ring, and not considered to be members of the ring (datacenter endpoints exclude them). In the future we could make the change even more transparent by only loading locator::node* for those nodes and keeping node* in tablet replica sets. We load topology information only for left nodes which are actually referenced by any tablet. To achieve that, topology loading code queries system.tablet for the set of hosts. This set is then passed to system.topology loading method which decides whether to load replica_state for a left node or not.	2024-03-15 11:05:29 +01:00
Botond Dénes	279e496133	db/virtual_tables: token_ring_table: add tablet support For keyspaces which use tablets, we describe each table separately.	2024-03-15 04:23:20 -04:00
Botond Dénes	61b6ac7ffe	db/virtual_tables: token_ring_table: add table_name column As the first clustering column. For vnode keyspaces, this will always be "ALL", for tablet keyspaces, this will contain the name of the described table.	2024-03-15 04:23:20 -04:00
Botond Dénes	fdef62c232	db/virtual_tables: token_ring_table: extract ring emit Into a separate method. For vnodes there is a single ring per keyspace, but for tablets, there is a separate ring for each table in the keyspace. To accommodate both, we move the code emitting the ring into a separate method, so execute() can just call it once per keyspace or once per table, whichever is appropriate.	2024-03-15 04:23:20 -04:00
Tomasz Grabiec	8c5d088928	Merge 'Drop tablets of dropped views and indices' from Benny Halevy This series adds notification before dropping views and indices so that the tablet_allocator can generate mutations to respectively drop all tablets associated with them from system.tablets. Additional unit tests were added for these cases. Note that one case is not yet tested: where a table is allowed to be dropped while having views that depend on it, when it is dropped from the alternator path. This is tested indirectly by testing dropping a table with live secondary index as it follows the same notification path as views in this series. Fixes #17627 Closes scylladb/scylladb#17773 * github.com:scylladb/scylladb: migration_manager: notify before_drop_column_family when dropping indices schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices migration_manager: notify before_drop_column_family before dropping views cql-pytest: test_tablets: add test_tablets_are_dropped_when_dropping_table tablet_allocator: on_before_drop_column_family: remove unused result variable	2024-03-14 22:52:29 +01:00
Benny Halevy	5bfca73b30	migration_manager: notify before_drop_column_family when dropping indices Fixes #17627 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-03-14 20:19:12 +02:00
Benny Halevy	9cf6a2e510	schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices When dropping indices, we don't need to go through `create_view_for_index` in order to drop the index. That actually creates a new schema for this view which is used just for its metadata for generating mutations dropping it. Instead, use `find_schema` to lookup the current schema for the dropped index. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-03-14 20:19:11 +02:00
Piotr Smaron	ad2d039e3d	db: move all group 0 tables to schema commitlog This is to have durability for the group0 tables. But also because I need it specifically to make `system.topology` & `system_schema.scylla_keyspaces` mutations under a single raft command in https://github.com/scylladb/scylladb/pull/16723 Fixes: #15596 Closes scylladb/scylladb#17783	2024-03-14 13:33:30 +01:00
Kefu Chai	926fe29ebd	db: commitlog: add fmt::formatter for commitlog types before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for * db::commitlog::segment::cf_mark * db::commitlog::segment_manager::named_file * db::commitlog::segment_manager::dispose_mode * db::commitlog::segment_manager::byte_flow<T> please note, the formatter of `db::commitlog::segment` is not included in this commit, as we are formatting it in the inline definition of this class. so we cannot define the specialization of `fmt::formatter` for this class before its callers -- we'd either use `format_as()` provided by {fmt} v10, or use `fmt::streamed`. either way, it's different from the theme of this commit, and we will handle it in a separate commit. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17792	2024-03-14 09:28:12 +02:00
Pavel Emelyanov	3a734facc7	view_builder: Complete build step early if reader produces nothing Builder works in "steps". Each step runs for a given base table, when a new view is created it either initiates a step or appends to currently running step. Running a step means reading mutations from local sstables reader and applying them to all views that have jumped into this step so far. When a view is added to the step it remembers the current token value the step is on. When step receives end-of-stream it rewinds to minimal-token. Rewinding is done by closing current reader and creating a new one. Each time token is advanced, all the views that meet the new token value for the second time (i.e. -- scan full round) are marked as built and are removed from step. When no views are left on step, it finishes. The above machinery can break when rewinding the end-of-stream reader. The trick is that a running step silently assumes that if the reader once produced some token (and there can be a view that remembered this token as its starting one), then after rewinding the reader would generate the same token or greater. With tablets, however, that's not the case. When a node is decommissioned tablets are cleaned and all sstables are removed. Rewinding a reader after that creates an empty reader that produces no tokens from now on. Consequently, any build steps that had captured tokens prior to cleanup would get stuck forever. The fix is to check if the mutation consumer stepped at least one step forward after rewind, and if not, complete all the attached views. fixes: #17293 Similar thing should happen if the base table is truncated with views being built from it. Testing it steps on compaction assertion elsewhere and needs more research. refs: #17543 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17548	2024-03-12 14:58:47 +02:00
Nadav Har'El	19bcea6216	materialized views: fix rare failure caused by empty update This one-line patch fixes a failure in the dtest lwt_schema_modification_test.py::TestLWTSchemaModification ::test_table_alter_delete Where an update sometimes failed due to an internal server error, and the log had the mysterious warning message: "std::logic_error (Empty materialized view updated)" We've also seen this log-message in the past in another user's log, and never understood what it meant. It turns out that the error message was generated (and warning printed) while building view updates for a base-table mutation, and noticing that the base mutation contains an empty row - a row with no cells or tombstone or anything whatsoever. This case was deemed (8 years ago, in `d5a61a8c48`) unexpected and nonsensical, and we threw an exception. But this case actually can happen - here is how it happened in test_table_alter_delete - which is a test involving a strange combination of materialized views, LWT and schema changes: 1. A table has a materialized view, and also a regular column "int_col". 2. A background thread repeatedly drops and re-creates this column int_col. 3. Another thread deletes rows with LWT ("IF EXISTS"). 4. These LWT operations each reads the existing row, and because of repeated drop-and-recreate of the "int_col" column, sometimes this read notices that one node has a value for int_col and the other doesn't, and creates a read-repair mutation setting int_col (the difference between the two reads includes just in this column). 5. The node missing "int_col" receives this mutation which sets only int_col. It upgrade()s this mutation to its most recent schema, which doesn't have int_col, so it removes this column from the mutation row - and is left with a completely empty mutation row. This completely empty row is not useful, but upgrade() doesn't remove it. 6. The view-update generation code sees this empty base-mutation row and fails it with this std::logic_error. 7. The node which sent the read-repair mutation sees that the read repair failed, so it fails the read and therefore fails the LWT delete operation. It is this LWT operation which failed in the test, and caused the whole test to fail. The fix is trivial: an empty base-table row mutation should simply be ignored when generating view updates - it shouldn't cause any error. Before this patch, test_table_alter_delete used to fail in roughly 20% of the runs on my laptop. After this patch, I ran it 100 times without a single failure. Fixes #15228 Fixes #17549 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#17607	2024-03-07 12:00:43 +02:00
Kamil Braun	19b816bb68	Merge 'Migrate system_auth to raft group0' from Marcin Maliszkiewicz This patch series makes all auth writes serialized via raft. Reads stay eventually consistent for performance reasons. To make transition to new code easier data is stored in a newly created keyspace: system_auth_v2. Internally the difference is that instead of executing CQL directly for writes we generate mutations and then announce them via raft group0. Per commit descriptions provide more implementation details. Refs https://github.com/scylladb/scylladb/issues/16970 Fixes https://github.com/scylladb/scylladb/issues/11157 Closes scylladb/scylladb#16578 * github.com:scylladb/scylladb: test: extend auth-v2 migration test to catch stale static test: add auth-v2 migration test test: add auth-v2 snapshot transfer test test: auth: add tests for lost quorum and command splitting test: pylib: disconnect driver before re-connection test: adjust tests for auth-v2 auth: implement auth-v2 migration auth: remove static from queries on auth-v2 path auth: coroutinize functions in password_authenticator auth: coroutinize functions in standard_role_manager auth: coroutinize functions in default_authorizer storage_service: add support for auth-v2 raft snapshots storage_service: extract getting mutations in raft snapshot to a common function auth: service: capture string_view by value alternator: add support for auth-v2 auth: add auth-v2 write paths auth: add raft_group0_client as dependency cql3: auth: add a way to create mutations without executing cql3: run auth DML writes on shard 0 and with raft guard service: don't loose service_level_controller when bouncing client_state auth: put system_auth and users consts in legacy namespace cql3: parametrize keyspace name in auth related statements auth: parametrize keyspace name in roles metadata helpers auth: parametrize keyspace name in password_authenticator auth: parametrize keyspace name in standard_role_manager auth: remove redundant consts auth::meta::*::qualified_name auth: parametrize keyspace name in default_authorizer db: make all system_auth_v2 tables use schema commitlog db: add system_auth_v2 tables db: add system_auth_v2 keyspace	2024-03-06 10:11:33 +01:00
Dawid Medrek	b36becc1f3	db/hints: Fix too_many_in_flight_hints_for The semantics of the function was accidentally modified in `6e79d64`. The consequence of the change was that we didn't limit memory consumption: the function always returned false for any node different from the local node. The returned value is used by storage_proxy to decide whether it is able to store a hint or not. This commit fixes the problem by taking other nodes into consideration again. Fixes #17636 Closes scylladb/scylladb#17639	2024-03-06 09:48:30 +02:00
Marcin Maliszkiewicz	ebb0ffeb6c	auth: implement auth-v2 migration During raft topology upgrade procedure data from system_auth keyspace will be migrated to system_auth_v2. Migration works mostly on top of CQL layer to minimize amount of new code introduced, it mostly executes SELECTs on old tables and then INSERTs on new tables. Writes are not executed as usual but rather announced via raft.	2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz	5a6d4dbc37	storage_service: add support for auth-v2 raft snapshots This patch adds new RPC for pulling snapshot of auth tables.	2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz	d3679de1d2	db: make all system_auth_v2 tables use schema commitlog	2024-03-01 10:40:29 +01:00
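
The build-step logic described in commit 3a734facc7 above (view_builder: Complete build step early if reader produces nothing) can be illustrated with a small toy model. This is only a sketch of the idea in the commit message, not ScyllaDB code: the class `BuildStep`, the method names, and the `complete_on_empty` flag are all hypothetical, tokens are plain integers, and the "reader" is just a list of the tokens it would yield after a rewind.

```python
from dataclasses import dataclass, field


@dataclass
class BuildStep:
    """Toy model of one view-build step (hypothetical, not ScyllaDB code)."""
    reader_tokens: list                         # tokens the reader yields after a rewind
    views: dict = field(default_factory=dict)   # view name -> token at which it joined
    built: list = field(default_factory=list)   # views marked as built, in order

    def add_view(self, name, start_token):
        self.views[name] = start_token

    def scan_after_rewind(self, complete_on_empty=True):
        """Run one full pass of the rewound reader over the base table."""
        consumed = 0
        for token in self.reader_tokens:
            consumed += 1
            # A view that meets its starting token (or a greater one) again
            # has seen a full round and is done.
            for name in [n for n, start in self.views.items() if token >= start]:
                self._mark_built(name)
        if consumed == 0 and complete_on_empty:
            # The reader made no progress after the rewind: nothing is left
            # to build from, so complete every attached view.
            for name in list(self.views):
                self._mark_built(name)
        return consumed

    def _mark_built(self, name):
        del self.views[name]
        self.built.append(name)


# Reader still has data: the view completes once a token >= 42 is reached.
normal = BuildStep(reader_tokens=[10, 50])
normal.add_view("v1", 42)
normal.scan_after_rewind()

# All sstables were removed: without the check the view stays attached.
stuck = BuildStep(reader_tokens=[])
stuck.add_view("v1", 42)
stuck.scan_after_rewind(complete_on_empty=False)

# Same situation with the check enabled: the view is completed.
fixed = BuildStep(reader_tokens=[])
fixed.add_view("v1", 42)
fixed.scan_after_rewind()
```

In this model, `stuck` keeps `v1` attached because no token ever satisfies the "met again" condition, which mirrors the stuck build step the commit message describes; `fixed` completes it because the pass consumed nothing.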

4972 Commits