scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 10:30:38 +00:00

Author	SHA1	Message	Date
Eliran Sinvani	eb368f9f6e	internal_keyspace extension: enhance the semantics also to flushes Commit `7c8c020` introduced a new type of keyspace, an internal keyspace, and defined its semantics: it is something of a hybrid between a system and a user keyspace. Here we extend the semantics to include flushes as well, meaning that flushes will be done using the system dirty_memory_manager. This is in order to allow interdependencies between internal tables and user tables and prevent deadlocks. One example of such a deadlock is our `replicated_key_provider` encryption on the enterprise version. The deadlock occurs because in some circumstances, an encrypted user table flush is dependent upon the `encrypted_keys` table being flushed, but since the requests are serialized, we get a deadlock. Tests: unit tests dev + debug The deadlock dtest reproducer: encryption_at_rest_test.py::TestEncryptionAtRest::test_reboot Fixes #14529 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Closes #14547	2023-08-21 18:17:05 +03:00
Raphael S. Carvalho	b578d6643f	Kill scylla option to configure number of compaction groups The option was introduced to bootstrap the project. It's still useful for testing, but that translates into maintaining an additional option and code that will not be really used outside of testing. A possible option is to later map the option in boost tests to initial_tablets, which may yield the same effect for testing. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-08-16 18:23:53 -03:00
Patryk Jędrzejczak	e7077da12d	replica: reduce the size limit of the schema commitlog The size of the schema commitlog is incorrectly set to 10 TB. To avoid wasting space, we reduce it to 2 * schema commitlog segment size. Closes #14946	2023-08-14 20:41:15 +02:00
Pavel Emelyanov	fa93ac9bfd	database: Add wasm::manager& dependency The dependency is needed by db::schema_tables to get wasm manager for its needs. This patch prepares the ground. Now the wasm::manager is shared between replica::database and cql3::query_processor Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-04 19:47:50 +03:00
Botond Dénes	00a62866ac	Merge 'Make database::add_column_family exception safe.' from Aleksandra Martyniuk If some state update in database::add_column_family throws, info about a column family would be inconsistent. Undo already performed operations in database::add_column_family when one throws. Fixes: #14666. Closes #14672 * github.com:scylladb/scylladb: replica: undo the changes if something fails replica: start table earlier in database::add_column_family	2023-08-04 10:58:17 +03:00
Aleksandra Martyniuk	1e9b2972ea	replica: undo the changes if something fails If a step of adding a table fails, previous steps are undone.	2023-08-03 17:37:31 +02:00
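The undo-on-failure idea described in the two commits above can be illustrated with a minimal sketch. It is a toy Python model, not the actual C++ `database::add_column_family`: the `Database` class, its two maps, and the `fail_at` fault-injection parameter are all hypothetical names introduced for illustration.

```python
class Database:
    """Toy registry with two maps that must stay consistent (hypothetical)."""

    def __init__(self):
        self.tables = {}      # table id -> table name
        self.name_to_id = {}  # (keyspace, table name) -> table id

    def add_table(self, table_id, keyspace, name, fail_at=None):
        undo = []  # callbacks that reverse the steps completed so far
        try:
            self.tables[table_id] = name
            undo.append(lambda: self.tables.pop(table_id, None))

            if fail_at == "name_index":  # injected failure, for demonstration
                raise RuntimeError("injected failure")

            self.name_to_id[(keyspace, name)] = table_id
            undo.append(lambda: self.name_to_id.pop((keyspace, name), None))
        except Exception:
            # Roll back in reverse order so no partial state survives.
            for step in reversed(undo):
                step()
            raise
```

If the second step throws, the first map entry is removed again before the exception propagates, so a caller never observes a half-registered table.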
Aleksandra Martyniuk	9f68566038	replica: start table earlier in database::add_column_family In database::add_column_family table::start() is called before a table is registered in different structures.	2023-08-02 16:35:34 +02:00
Amnon Heiman	d10a3dd19a	config: add enable_node_table_metrics flag By default, per-table-per-shard metrics reporting is turned off, and the aggregated version of the metrics (per-table-per-node) will be turned on. There could be a situation where a user with an excessive number of tables would suffer from performance issues, both from the network and the metrics collection server. This patch adds a config option, enable_node_table_metrics, which allows users to turn off per-table metrics reporting altogether. For example, when running Scylla with the command line argument '--enable-node-aggregated-table_metrics 0' per-table metrics will not be reported. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2023-08-02 10:20:18 +03:00
Botond Dénes	4a02865ea1	Merge 'Prevent invalidation of iterators over database::_column_families' from Aleksandra Martyniuk Maps related to column families in database are extracted to a column_families_data class. Access to them is possible only through methods. All methods which may preempt hold rwlock in relevant mode, so that the iterators can't become invalid. Fixes: #13290 Closes #13349 * github.com:scylladb/scylladb: replica: make tables_metadata's attributes private replica: add methods to get a filtered copy of tables map replica: add methods to check if given table exists replica: add methods to get table or table id replica: api: return table_id instead of const table_id& replica: iterate safely over tables related maps replica: pass tables_metadata to phased_barrier_top_10_counts replica: add methods to safely add and remove table replica: wrap column families related maps into tables_metadata replica: futurize database::add_column_family and database::remove	2023-07-31 15:31:59 +03:00
Botond Dénes	72043a6335	Merge 'Avoid using qctx in schema_tables' column-mapping queries' from Pavel Emelyanov There are three methods in the system_keyspace namespace that run queries over the `system.scylla_table_schema_history` table. For that they use qctx, which is not nice. Fortunately, all the callers already have a system_keyspace& local variable or argument they can pass to those methods. Since the accessed table belongs to the system keyspace, the latter declares the querying methods as "friends" to let them get the private `query_processor& _qp` member Closes #14876 * github.com:scylladb/scylladb: schema_tables: Extract query_processor from system_keyspace for querying schema_tables: Add system_keyspace& argument to ..._column_mapping() calls migration_manager: Add system_keyspace argument to get_schema_mapping()	2023-07-31 15:00:59 +03:00
Pavel Emelyanov	cf4d4d7e9b	schema_tables: Add system_keyspace& argument to ..._column_mapping() calls The callers all have local sys_ks argument: - merge_tables_and_views() - service::get_column_mapping() - database::parse_system_tables() And a test that can get it from cql_test_env. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-28 15:55:13 +03:00
Botond Dénes	b599f15b26	replica: make_[multishard_]streaming_reader(): make compaction_time mandatory Now that all users have opted in unconditionally, there is no point in keeping this optional. Make it mandatory to make sure that no caller opts out by mistake. The global override via the enable_compacting_data_for_streaming_and_repair config item still remains, allowing compaction to be force turned-off.	2023-07-27 04:57:52 -04:00
Botond Dénes	2f8d77e97b	replica/table: add optional compacting to make_multishard_streaming_reader() Doing to make_multishard_streaming_reader() what the previous commit did to make_streaming_reader(). In fact, the new compaction_time parameter is simply forwarded to the make_streaming_reader() on the shard readers. Call sites are updated, but none opt in just yet.	2023-07-27 03:22:11 -04:00
Botond Dénes	42b0dd5558	replica/table: add optional compacting to make_streaming_reader() Opt-in is possible by passing an engaged `compaction_time` (gc_clock::time_point) to the method. When this new parameter is disengaged, no compaction happens. Note that there is a global override, via the enable_compacting_data_for_streaming_and_repair config item, which can force-disable this compaction. Compaction done on the output of the streaming reader does not garbage-collect tombstones! All call-sites are adjusted (the new parameter is not defaulted), but none opt in yet. This will be done in separate commit per user.	2023-07-27 03:22:11 -04:00
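The opt-in behaviour described in the three commits above can be modeled roughly as follows. This is a toy Python sketch, not the C++ reader: the row layout `(key, is_tombstone, expires_at)`, the use of expiry filtering as a stand-in for compaction, and the `compaction_enabled` argument (standing in for the `enable_compacting_data_for_streaming_and_repair` config item) are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical row shape: (key, is_tombstone, expires_at or None)
Row = Tuple[str, bool, Optional[int]]


def make_streaming_reader(
    rows: Iterable[Row],
    compaction_time: Optional[int],  # not defaulted: callers must decide
    compaction_enabled: bool = True,  # stand-in for the global config override
) -> Iterator[Row]:
    # A disengaged compaction_time, or the global override, means pass-through.
    if compaction_time is None or not compaction_enabled:
        yield from rows
        return
    for key, is_tombstone, expires_at in rows:
        # Tombstones are never garbage-collected by this compaction.
        if is_tombstone:
            yield key, is_tombstone, expires_at
            continue
        # Stand-in for compaction: drop live data already expired at compaction_time.
        if expires_at is not None and expires_at <= compaction_time:
            continue
        yield key, is_tombstone, expires_at
```

Passing `None` models a disengaged `compaction_time`; because the parameter has no default, every call site has to state its choice explicitly, which matches the "not defaulted" note in the commit message.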
Avi Kivity	ff1f461a42	Merge 'Introduce tablet load balancer' from Tomasz Grabiec After this series, tablet replication can handle the scenario of bootstrapping new nodes. The ownership is distributed indirectly by the means of a load-balancer which moves tablets around in the background. See docs/dev/topology-over-raft.md for details. The implementation is by no means meant to be perfect, especially in terms of performance, and will be improved incrementally. The load balancer will be also kicked by schema changes, so that allocation/deallocation done during table creation/drop will be rebalanced. Tablet data is streamed using existing `range_streamer`, which is the infrastructure for "the old streaming". This will be later replaced by sstable transfer once integration of tablets with compaction groups is finished. Also, cleanup is not wired yet, also blocked by compaction group integration. Closes #14601 * github.com:scylladb/scylladb: tests: test_tablets: Add test for bootstraping a node storage_service: topology_coordinator: Implement tablet migration state machine tablets: Introduce tablet_mutation_builder service: tablet_allocator: Introduce tablet load balancer tablets: Introduce tablet_map::for_each_tablet() topology: Introduce get_node() token_metadata: Add non-const getter of tablet_metadata storage_service: Notify topology state machine after applying schema change storage_service: Implement stream_tablet RPC tablets: Introduce global_tablet_id stream_transfer_task, multishard_writer: Work with table sharder tablets: Turn tablet_id into a struct db: Do not create per-keyspace erm for tablet-based tables tablets: effective_replication_map: Take transition stage into account when computing replicas tablets: Store "stage" in transition info doc: Document tablet migration state machine and load balancer locator: erm: Make get_endpoints_for_reading() always return read replicas storage_service: topology_coordinator: Sleep on failure between retries storage_service: topology_coordinator: Simplify coordinator loop main: Require experimental raft to enable tablets	2023-07-26 12:30:29 +03:00
Botond Dénes	ad2ddffb22	Merge 'Remove qctx from system_keyspace::save_truncation_record()' from Pavel Emelyanov The method is called by db::truncate_table_on_all_shards(), its call-chain, in turn, starts from - proxy::remote::handle_truncate() - schema_tables::merge_schema() - legacy_schema_migrator - tests All of the above are easy to get a system_keyspace reference from. This, in turn, allows making the method non-static and using the query_processor reference from the system_keyspace object instead of the global qctx Closes #14778 * github.com:scylladb/scylladb: system_keyspace: Make save_truncation_record() non-static code: Pass sharded<db::system_keyspace>& to database::truncate() db: Add sharded<system_keyspace>& to legacy_schema_migrator	2023-07-26 08:48:49 +03:00
Tomasz Grabiec	c2b18ae483	db: Do not create per-keyspace erm for tablet-based tables This erm is not updated when replicating token metadata in storage_service::replicate_to_all_cores() so will pin token metadata version and prevent token metadata barrier from finishing. It is not necessary to have per-keyspace erm for tablet-based tables, so just don't create it.	2023-07-25 21:08:51 +02:00
Aleksandra Martyniuk	c5cad803b3	replica: add methods to get a filtered copy of tables map	2023-07-25 17:13:24 +02:00
Aleksandra Martyniuk	ff26b2ba3f	replica: add methods to check if given table exists	2023-07-25 17:13:24 +02:00
Aleksandra Martyniuk	6796721c3d	replica: add methods to get table or table id	2023-07-25 17:13:24 +02:00
Aleksandra Martyniuk	e072a2341d	replica: api: return table_id instead of const table_id& Return table_id instead of const table_id& from database::find_uuid as copying table_id does not cause much overhead and simplifies methods signature.	2023-07-25 17:13:24 +02:00
Aleksandra Martyniuk	cdbfa0b2f5	replica: iterate safely over tables related maps Loops over _column_families and _ks_cf_to_uuid which may preempt are protected by reader mode of rwlock so that iterators won't get invalid.	2023-07-25 17:13:04 +02:00
Aleksandra Martyniuk	a21d3357c3	replica: pass tables_metadata to phased_barrier_top_10_counts	2023-07-25 16:13:00 +02:00
Aleksandra Martyniuk	8842bd87c3	replica: add methods to safely add and remove table	2023-07-25 16:13:00 +02:00
Aleksandra Martyniuk	52afd9d42d	replica: wrap column families related maps into tables_metadata As a preparation for ensuring access safety for column families related maps, add tables_metadata, access to members of which would be protected by rwlock.	2023-07-25 16:13:00 +02:00
Aleksandra Martyniuk	395ce87eff	replica: futurize database::add_column_family and database::remove As a preparation for further changes, database::add_column_family and database::remove return future<>.	2023-07-25 16:13:00 +02:00
Botond Dénes	a8feb7428d	Merge 'semaphore mismatch: don't throw an error if both semaphores belong to user' from Michał Jadwiszczak If a semaphore mismatch occurs, check whether both semaphores belong to the user. If so, log a warning, record a `querier_cache_scheduling_group_mismatches` stat and drop the cached reader instead of throwing an error. Until now, semaphore mismatch was only checked in multi-partition queries. The PR pushes the check down to `querier_cache` and performs it in all `lookup_*_querier` methods. The mismatch can happen if the user's scheduling group changed during a query. We don't want to throw an error then, but drop and reset the cached reader. This patch doesn't solve the problem of mismatched semaphores caused by changes in service levels/scheduling groups; it only mitigates it. Refers: https://github.com/scylladb/scylla-enterprise/issues/3182 Refers: https://github.com/scylladb/scylla-enterprise/issues/3050 Closes: #14770 Closes #14736 * github.com:scylladb/scylladb: querier_cache: add stats of scheduling group mismatches querier_cache: check semaphore mismatch during querier lookup querier_cache: add reference to `replica::database::is_user_semaphore()` replica:database: add method to determine if semaphore is user one	2023-07-24 14:13:09 +03:00
Kamil Braun	e6099c4685	Merge 'config: set schema_commitlog_segment_size_in_mb to 128 ' from Patryk Jędrzejczak Fixes #14668 In #14668, we have decided to introduce a new `scylla.yaml` variable for the schema commitlog segment size and set it to 128MB. The reason is that segment size puts a limit on the mutation size that can be written at once, and some schema mutation writes are much larger than average, as shown in #13864. This `schema_commitlog_segment_size_in_mb` variable is now added to `scylla.yaml` and `db/config`. Additionally, we do not derive the commitlog sync period for schema commitlog anymore because schema commitlog runs in batch mode, so it doesn't need this parameter. It has also been discussed in #14668. Closes #14704 * github.com:scylladb/scylladb: replica: do not derive the commitlog sync period for schema commitlog config: set schema_commitlog_segment_size_in_mb to 128 config: add schema_commitlog_segment_size_in_mb variable	2023-07-24 10:23:34 +02:00
Michał Jadwiszczak	246728cbbb	querier_cache: add stats of scheduling group mismatches Add stats to count dropped queriers because of scheduling group mismatch.	2023-07-21 19:05:55 +02:00
Michał Jadwiszczak	a5fc53aa11	querier_cache: check semaphore mismatch during querier lookup Previously semaphore mismatch was checked only in multi-partition queries and if happened, an internal error was thrown. This commit pushed the check down to `querier_cache`, so each `lookup_*_querier` method will check for the mismatch. What's more, if semaphore mismatch occurs, check whether both semaphores belong to user. If so, log a warning and drop cached reader instead of throwing an error. The mismatch can happen if user's scheduling group changed during a query. We don't want to throw an error then, but drop and reset cached reader.	2023-07-21 19:05:50 +02:00
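The lookup policy described in the commit above can be sketched as follows. This is a toy Python model rather than the C++ `querier_cache`: the `QuerierCache` class, its `insert`/`lookup` methods, the `SemaphoreMismatch` exception, and the string-valued semaphores are hypothetical; only the decision rule (drop and count when both semaphores are user ones, raise otherwise) comes from the commit message.

```python
import logging

log = logging.getLogger("querier_cache_sketch")


class SemaphoreMismatch(Exception):
    """Stand-in for the internal error raised on a non-user mismatch."""


class QuerierCache:
    def __init__(self, is_user_semaphore):
        self._is_user = is_user_semaphore  # predicate: semaphore -> bool
        self._entries = {}                 # key -> (querier, semaphore)
        self.scheduling_group_mismatches = 0

    def insert(self, key, querier, semaphore):
        self._entries[key] = (querier, semaphore)

    def lookup(self, key, current_semaphore):
        entry = self._entries.pop(key, None)
        if entry is None:
            return None
        querier, cached_semaphore = entry
        if cached_semaphore == current_semaphore:
            return querier
        if self._is_user(cached_semaphore) and self._is_user(current_semaphore):
            # Likely a scheduling-group change mid-query: drop the cached reader.
            log.warning("semaphore mismatch for %r, dropping cached querier", key)
            self.scheduling_group_mismatches += 1
            return None
        raise SemaphoreMismatch(f"semaphore mismatch for {key!r}")
```

Returning `None` on a user-to-user mismatch makes the caller behave as on a cache miss and create a fresh reader, which is the "drop and reset" behaviour the commit describes.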
Michał Jadwiszczak	e5c965b280	querier_cache: add reference to `replica::database::is_user_semaphore()`	2023-07-21 18:58:57 +02:00
Pavel Emelyanov	db1c6e2255	system_keyspace: Make save_truncation_record() non-static ... and stop using qctx Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-21 13:12:50 +03:00
Pavel Emelyanov	eaeffcdb81	code: Pass sharded<db::system_keyspace>& to database::truncate() The argument goes via the db::(drop|truncate)_table_on_all_shards() pair of calls that start from - storage_proxy::remote: has its sys.ks reference already - schema_tables::merge_schema: has sys.ks argument already - legacy_schema_migrator: the reference was added by previous patch - tests: run in cql_test_env with sys.ks on board Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-21 13:11:59 +03:00
Michał Jadwiszczak	d7a3aa2698	replica:database: add method to determine if semaphore is user one Add a method that compares a semaphore with the system ones (streaming, compaction, system read) to be able to tell whether the semaphore belongs to a user.	2023-07-20 10:24:21 +02:00
Patryk Jędrzejczak	ee1c240f2a	replica: do not derive the commitlog sync period for schema commitlog We don't want to apply the value of the commitlog_sync_period_in_ms variable to schema commitlog. Schema commitlog runs in batch mode, so it doesn't need this parameter.	2023-07-19 14:16:50 +02:00
Patryk Jędrzejczak	5b167a4ad7	config: add schema_commitlog_segment_size_in_mb variable In #14668, we have decided to introduce a new scylla.yaml variable for the schema commitlog segment size. The segment size puts a limit on the mutation size that can be written at once, and some schema mutation writes are much larger than average, as shown in #13864. Therefore, increasing the schema commitlog segment size is sometimes necessary.	2023-07-19 14:16:41 +02:00
Pavel Emelyanov	312184c0c7	keys: Move exploded_clustering_prefix's operator<< to keys.cc Now it sits in replica/database.cc, but the latter is overloaded with code and is worth keeping smaller, all the more so as the ..._prefix itself lives in the keys.hh header. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14748	2023-07-19 11:57:27 +03:00
Petr Gusev	f05ab33ee7	database.hh: make_multishard_streaming_reader with range parameter We add an overload of make_multishard_streaming_reader which reads all the data in the given range. We will use it later in row level repair if --smp is different on the nodes and the number of partitions is small.	2023-07-04 13:30:37 +03:00
Petr Gusev	614a1b3770	database.cc: extract streaming_reader_lifecycle_policy We are going to use it later in a new make_multishard_streaming_reader overload. In this commit we just move it outside into the anonymous namespace, no other code changes were made.	2023-07-04 13:30:37 +03:00
Pavel Emelyanov	0d4c981423	database: Remove unused proxy arg from update_keyspace_on_all_shards() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-03 14:19:54 +03:00
Pavel Emelyanov	42b9ba48de	database: Remove unused proxy arg from update_keyspace() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-03 14:19:36 +03:00
Benny Halevy	13dd92e618	database: modify_keyspace_on_all_shards: execute func first on shard 0 When creating or altering a keyspace, we create a new effective_replication_map instance. It is more efficient to do that first on shard 0 and then on all other shards, otherwise multiple shards might need to calculate the new e_r_m (and reach the same result). When the new e_r_m is "seeded" on shard 0, other shards will find it there and clone a local copy of it - which is more efficient. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-06-26 21:08:09 +03:00
Benny Halevy	ba15786059	database: modify_keyspace_on_all_shards: call notifiers only after applying func on all shards When creating, updating, or dropping keyspaces, first execute the database internal function to modify the database state, and only when all shards are updated, run the listener notifications, to make sure they would operate when the database shards are consistent with each other. Fixes #13137 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-06-26 21:08:09 +03:00
Benny Halevy	3b8c913e61	database: add modify_keyspace_on_all_shards Run all keyspace create/update/drop ops via `modify_keyspace_on_all_shards` that will standardize the execution on all shards in the coming patches. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-06-26 21:08:09 +03:00
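The ordering established by the three `modify_keyspace_on_all_shards` commits above (apply on shard 0 first, then on the remaining shards, and notify listeners only once every shard is updated) can be sketched as follows. This is a sequential toy model in Python; the function signature and the `func`/`notify` callbacks are hypothetical, and the sketch does not model how the real database runs work across shards.

```python
from typing import Callable, Sequence


def modify_keyspace_on_all_shards(
    shards: Sequence[int],
    func: Callable[[int], None],    # applies the keyspace change on one shard
    notify: Callable[[int], None],  # runs listener notifications on one shard
) -> None:
    # Shard 0 goes first so that it "seeds" any shared state (for example the
    # new effective_replication_map) that other shards can then copy.
    func(shards[0])
    for shard in shards[1:]:
        func(shard)
    # Listeners run only after every shard has applied the change, so they
    # observe shards that are consistent with each other.
    for shard in shards:
        notify(shard)
```

A usage example that records the order of operations:

```python
events = []
modify_keyspace_on_all_shards(
    [0, 1, 2],
    lambda s: events.append(("apply", s)),
    lambda s: events.append(("notify", s)),
)
```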
Benny Halevy	dc9b0812e9	schema_tables: merge_keyspaces: extract_scylla_specific_keyspace_info for update_keyspace Similar to create_keyspace_on_all_shards, `extract_scylla_specific_keyspace_info` and `create_keyspace_from_schema_partition` can be called once in the upper layer, passing keyspace_metadata& down to database::update_keyspace_on_all_shards which now would only make the per-shard keyspace_metadata from the reference it gets from the schema_tables layer. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-06-26 21:08:09 +03:00
Benny Halevy	3520c786bd	database: create_keyspace_on_all_shards Part of moving the responsibility for applying and notifying keyspace schema changes from schema_tables to the database so that the database can control the order of applying the changes across shards and when to notify its listeners. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-06-26 21:08:09 +03:00
Benny Halevy	53a6ea8616	database: update_keyspace_on_all_shards Part of moving the responsibility for applying and notifying keyspace schema changes from schema_tables to the database so that the database can control the order of applying the changes across shards and when to notify its listeners. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-06-26 09:35:35 +03:00
Benny Halevy	9d40305ef6	database: drop_keyspace_on_all_shards Part of moving the responsibility for applying and notifying keyspace schema changes from schema_tables to the database so that the database can control the order of applying the changes across shards and when to notify its listeners. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-06-26 09:34:42 +03:00
Tomasz Grabiec	21198e8470	treewide: Replace dht::shard_of() uses with table::shard_of() / erm::shard_of() dht::shard_of() does not use the correct sharder for tablet-based tables. Code which is supposed to work with all kinds of tables should use erm::get_sharder().	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	fb0bdcec0c	storage_proxy: Avoid multishard reader for tablets Currently, the coordinator splits the partition range at vnode (or tablet) boundaries and then tries to merge adjacent ranges which target the same replica. This is an optimization which makes less sense with tablets, which are supposed to be of substantial size. If we don't merge the ranges, then with tablets we can avoid using the multishard reader on the replica side, since each tablet lives on a single shard. The main reason to avoid a multishard reader is avoiding its complexity, and avoiding adapting it to work with tablet sharding. Currently, the multishard reader implementation makes several assumptions about shard assignment which do not hold with tablets. It assumes that shards are assigned in a round-robin fashion.	2023-06-21 00:58:24 +02:00


313 Commits