dht::shard_of() does not use the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
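A minimal self-contained sketch of the intended pattern, with stand-in types rather than the real Scylla classes (the exact `get_sharder()`/`shard_of()` signatures are assumptions):

```cpp
#include <cstdint>

// Stand-ins for the real classes. The point: code that must work with both
// vnode- and tablet-based tables asks the effective replication map for the
// sharder instead of calling the static dht::shard_of().
struct token { std::uint64_t raw; };

struct sharder {                      // stand-in for dht::sharder
    virtual unsigned shard_of(token t) const = 0;
    virtual ~sharder() = default;
};

struct effective_replication_map {    // stand-in for the erm
    const sharder* table_sharder;     // vnode or tablet sharder, per table
    const sharder& get_sharder() const { return *table_sharder; }
};

unsigned shard_for(const effective_replication_map& erm, token t) {
    return erm.get_sharder().shard_of(t); // tablet-aware resolution
}
```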
Currently, the coordinator splits the partition range at vnode (or
tablet) boundaries and then tries to merge adjacent ranges which
target the same replica. This is an optimization which makes less
sense with tablets, which are supposed to be of substantial size. If
we don't merge the ranges, then with tablets we can avoid using the
multishard reader on the replica side, since each tablet lives on a
single shard.
The main reason to avoid a multishard reader is to avoid its
complexity, and to avoid adapting it to work with tablet
sharding. Currently, the multishard reader implementation makes
several assumptions about shard assignment which do not hold with
tablets. It assumes that shards are assigned in a round-robin fashion.
This is not strictly necessary, as the multishard reader will later
be avoided altogether for tablet-based tables, but it is a step
towards converting all code to use erm->get_sharder() instead of
schema::get_sharder().
This will make it easier to access table properties in places which
only have a schema_ptr. This is particularly useful when replacing
dht::shard_of() uses with s->table().shard_of(), now that sharding is
no longer static, but table-specific.
Also, it allows us to install a guard which catches invalid uses of
schema::get_sharder() on tablet-based tables.
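A self-contained sketch of the guard idea (stand-in code; uses_tablets() and the member names are assumptions, not the real API):

```cpp
#include <cassert>

// The legacy static-sharder accessor refuses to run for tablet-based
// tables, catching invalid callers early.
struct static_sharder {};

struct schema_like {
    bool tablets = false;
    static_sharder sharder;

    bool uses_tablets() const { return tablets; }

    const static_sharder& get_sharder() const {
        // Guard: only vnode-based tables have a valid static sharder.
        assert(!uses_tablets() && "get_sharder() called on a tablet-based table");
        return sharder;
    }
};
```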
It will be helpful for other uses as well. For example, we can now get
rid of the static_props hack.
`system_keyspace_make` would access private fields of `database` in
order to create local system tables (creating the `keyspace` and
`table` in-memory structures, creating directories for `system` and
`system_schema`).
Extract this part into `database::create_local_system_table`.
Make `database::add_column_family` private.
Some time ago (997a34bf8c) the backlog
controller was generalized to maintain some scheduling group. Back then
the group was the pair of seastar::scheduling_group and
seastar::io_priority_class. Now the latter is gone, so the controller's
notion of what a sched group is can be relaxed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #14266
After recent changes, all wasm-related logic has been moved from
the database class to the query_processor. As a result, the wasm
headers no longer need to be included there, and in particular,
files that include replica/database.hh no longer need to wait
on the generated header rust/wasmtime_bindings.hh to compile.
Fixes #14224
Closes #14223
At that level no io_priority_class-es exist. Instead, all the IO happens
in the context of the current sched-group. The file API no longer accepts
a prio class argument (and makes the io_intent arg mandatory for impls).
So the change consists of:
- removing all usage of io_priority_class
- patching file_impl's inheritors to the updated API
- the priority manager goes away altogether
- the IO bandwidth update is performed on the respective sched group
- tuning up the scylla-gdb.py io_queues command
The first change is huge and was made semi-automatically by:
- grep for io_priority_class | default_priority_class
- remove all the found calls, method args and class fields
Patching the file_impl-s is smaller, but also mechanical (see the
sketch after this list):
- replace the io_priority_class& argument with an io_intent* one
- pass the intent to the lower file (if applicable)
Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile
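A self-contained sketch of the mechanical file_impl change (stand-in types, not the real seastar classes; the overload shown is illustrative):

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for the seastar types; only the shape of the change matters.
struct io_intent {};

struct file_impl {
    // After the change: no io_priority_class parameter, and the io_intent*
    // argument is a mandatory part of the signature (it may be nullptr).
    virtual std::size_t read_dma(std::uint64_t pos, void* buf,
                                 std::size_t len, io_intent* intent) = 0;
    virtual ~file_impl() = default;
};

// An inheritor wrapping a lower file forwards the intent where applicable.
struct wrapping_file_impl : file_impl {
    file_impl* lower = nullptr;
    std::size_t read_dma(std::uint64_t pos, void* buf,
                         std::size_t len, io_intent* intent) override {
        return lower->read_dma(pos, buf, len, intent);
    }
};
```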
The scylla-gdb.py update is a bit hairy -- it needs to use the task
queues list for the IO class names and shares, but to detect whether it
should, it checks that the "commitlog" group is present.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #13963
Similarly to class table, the keyspace class also needs to create a
directory for itself for some reason. It looks excessive, as table
creation would call recursive_touch_directory() and would create the ks
directory too, but this call is there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's no longer used outside of make_column_family_config(). So as not
to encourage people to use it -- drop it and open-code it into that
single caller.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When a table is DROP-ed, the directory with all its sstables is removed
(unless it contains snapshots). Wrap this into a table.destroy_storage()
method; later it will need to become sstable::storage-specific.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's no need to copy the datadirs vector just to call
parallel_for_each on it. datadirs[0] is in fact the datadir field.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This method initializes storage for a table, so it naturally belongs to
that class. Rename it while moving it. Also, there's no longer a need
to carry the table name and uuid as arguments; being a table method, it
can just get the paths to work on from the config.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The metrics that are being deregistered (in this PR) caused Scylla to
crash when a table was dropped, its corresponding in-memory table
object was not yet deallocated, and a new table with the same name was
created. This caused a double-metrics-registration exception to be
thrown. To avoid it, we now deregister a table's metrics as soon as the
table is marked to be disposed of by the database. The table's
in-memory representation can still live on, but it shouldn't prevent
another table with the same name from being created.
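A rough sketch of the mechanism, assuming a seastar metric_groups member and a hypothetical mark_disposed() hook (names are assumptions):

```cpp
#include <seastar/core/metrics_registration.hh>

// Sketch: the table drops its metric registrations as soon as it is
// marked for disposal, instead of waiting for the object's destruction.
class table_like {
    seastar::metrics::metric_groups _metrics;
public:
    void mark_disposed() {
        // clear() unregisters the metrics, so a new table with the same
        // name can register its own while this object is still alive.
        _metrics.clear();
    }
};
```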
Fixes #13548
Closes #13971
The `system_keyspace` has several methods to query the tables in it. These currently require a storage proxy parameter, because the read has to go through storage-proxy. This PR uses the observation that all these reads are really local-replica reads and they only actually need a relatively small code snippet from storage proxy. These small code snippets are exported into standalone functions in a new header (`replica/query.hh`). Then the system keyspace code is patched to use these new standalone functions instead of their equivalents in storage proxy. This allows us to replace the storage proxy dependency with a much more reasonable dependency on `replica::database`.
This PR patches the system keyspace code and the signatures of the affected methods as well as their immediate callers. Indirect callers are only patched to the extent needed to avoid introducing new includes (some had only a forward-declaration of storage proxy and so couldn't get the database from it). There are a lot of opportunities left to free other methods or maybe even entire subsystems from the storage proxy dependency, but this is not pursued in this PR; it is instead left for follow-ups.
This PR was conceived to help us break the storage proxy -> storage service -> system tables -> storage proxy dependency loop, which became a major roadblock in migrating from IP -> host_id. After this PR, system keyspace still indirectly depends on storage proxy, because it still uses `cql3::query_processor` in some places. This will be addressed in another PR.
Refs: #11870
Closes #13869
* github.com:scylladb/scylladb:
db/system_keyspace: remove dependency on storage_proxy
db/system_keyspace: replace storage_proxy::query*() with replica:: equivalent
replica: add query.hh
The loader has a very similar global_column_family_ptr class for its
distributed loadings. Now it can use the "standard" one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Right now all users of global_table know it's a vector and reference
its elements with a this_shard_id() index. Making global_table_ptr a
class makes it possible to stop using operator[] and instead hide the
this_shard_id() indexing in its -> and * operators.
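Something along these lines (a simplified self-contained sketch; this_shard_id() stands in for seastar::this_shard_id(), and the real class holds foreign pointers rather than raw ones):

```cpp
#include <utility>
#include <vector>

extern unsigned this_shard_id(); // provided by seastar in the real code

// The per-shard vector stays inside the class, and operator->/operator*
// index it with the current shard id, so callers no longer write
// ptr[this_shard_id()].
template <typename T>
class global_ptr {
    std::vector<T*> _per_shard; // one element per shard
public:
    explicit global_ptr(std::vector<T*> v) : _per_shard(std::move(v)) {}
    T* operator->() const { return _per_shard[this_shard_id()]; }
    T& operator*() const { return *_per_shard[this_shard_id()]; }
};
```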
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Use sharded<database>::invoke_on_all() instead of an open-coded
analogue. Also don't access database's _column_families directly; use
the find_column_family() method instead.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The methods that take storage_proxy as an argument can now accept a
replica::database instead. So update their signatures and update all
callers. With that, system_keyspace.* no longer depends on storage_proxy
directly.
Containing utility methods to query data from the local replica.
Intended to be used to read system tables, completely bypassing storage
proxy in the process.
This duplicates some code already found in storage proxy, but that is a
small price to pay for being able to break some circular dependencies
involving storage proxy that have been plaguing us since time
immemorial.
One thing we lose with this is the smp service level used in storage
proxy. If this becomes a problem, we can create one in database and use
it in these methods too.
Another thing we lose is incrementing the `replica_cross_shard_ops`
storage proxy stat. I think this is not a problem at all, as these new
functions are meant to be used by internal users; not counting them
will reduce the internal noise in this metric, which is meant to
indicate users not using shard-aware clients.
Right now the map<endpoint, config> sits on the sstables manager and
its update is governed by the database (because it's peering and can
kick other shards to update it as well).
Having the sharded<storage_manager> at hand lets us free the database
from the need to update configs and keeps sstables_manager a bit
smaller.
Also, this will allow keeping the s3 clients shared between sstables
via this map in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The manager in question keeps track of whatever sstables_manager needs
to work with the storage (spoiler: only the S3 one). It's a main-local
sharded peering service, so that the container() call can be used by
the next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In https://github.com/scylladb/scylladb/pull/13482 we renamed the reader permit states to more descriptive names. That PR however covered only the states themselves and their usages, as well as the documentation in `docs/dev`.
This PR is a followup to said PR, completing the name changes: renaming all symbols, names, comments etc, so all is consistent and up-to-date.
Closes #13573
* github.com:scylladb/scylladb:
reader_concurrency_semaphore: misc updates w.r.t. recent permit state name changes
reader_concurrency_semaphore: update permit members w.r.t. recent permit state name changes
reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes
reader_concurrency_semaphore: update API w.r.t. recent permit state name changes
reader_concurrency_semaphore: update stats w.r.t. recent permit state name changes
The user sstables manager will need to provide the endpoint config for
sstables' storage drivers. For that it needs to get it from db::config
and keep it in sync with its updates.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This allows update_pending_ranges(), invoked on keyspace creation, to
succeed in the presence of keyspaces with per-table replication
strategy. It will update only vnode-based erms, which is the intended
behavior, since only those need pending ranges updated.
This change will also make node operations like bootstrap, repair,
etc. work (not fail) in the presence of keyspaces with per-table erms;
they will just not be replicated using those algorithms.
Before, these would fail inside get_effective_replication_map(), which
is forbidden for keyspaces with per-table replication.
It's meant to be used in places where
get_non_local_strategy_keyspaces() is currently used, but which should
work only with keyspaces that use a vnode-based replication strategy.
Will be used by tablet-based replication strategies, for which the
effective replication map is different per table.
Also, this patch adapts existing users of effective replication map to
use the per-table effective replication map.
For simplicity, every table has an effective replication map, even if
the erm is per-keyspace. This way the client code can be uniform and
doesn't have to check whether the replication strategy is per-table.
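A tiny sketch of the uniformity argument, with stand-in types (the accessor name is an assumption):

```cpp
#include <memory>

// Every table exposes an erm, whether the underlying strategy is
// per-keyspace (vnodes) or per-table (tablets), so client code never
// branches on the strategy kind.
struct effective_replication_map {};

struct table_like {
    // For vnode keyspaces this points at the shared per-keyspace erm;
    // for tablet tables it is the table's own erm.
    std::shared_ptr<effective_replication_map> erm;
    const effective_replication_map& get_effective_replication_map() const {
        return *erm;
    }
};
```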
Not all users of the per-keyspace get_effective_replication_map() are
adapted to work per-table yet. Those algorithms will throw an
exception when invoked on a keyspace which uses a per-table replication
strategy.
The only reason why it's there (right next to compaction_fwd.hh) is
that the database::table_truncate_state subclass needs the definition
of the compaction_manager::compaction_reenabler subclass.
However, the former is not used outside of database.cc and can be
defined in the .cc file. Keeping it out of the header allows dropping
compaction_manager.hh from database.hh, thus greatly reducing its
fanout over the code (from ~180 indirect inclusions down to ~20).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #13622
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print classes fulfilling the requirements of the
`FragmentedView` concept without the help of the `to_hex()` template
function. that function is dropped in this change, as all its callers
now use fmtlib for formatting. the `fragment_to_hex()` helper is
dropped as well; its only caller was `to_hex()`.
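a minimal sketch of what such fmtlib support can look like, with a simplified stand-in for the real `FragmentedView` types:

```cpp
#include <fmt/format.h>
#include <string_view>
#include <vector>

// simplified stand-in for a FragmentedView-like type: a sequence of
// non-contiguous buffer fragments.
struct fragmented_buffer {
    std::vector<std::string_view> fragments;
};

// fmtlib formatter printing the fragments as one hex string, which is
// what the dropped to_hex() helper used to produce.
template <>
struct fmt::formatter<fragmented_buffer> {
    constexpr auto parse(fmt::format_parse_context& ctx) { return ctx.begin(); }
    auto format(const fragmented_buffer& buf, fmt::format_context& ctx) const {
        auto out = ctx.out();
        for (auto frag : buf.fragments) {
            for (unsigned char c : frag) {
                out = fmt::format_to(out, "{:02x}", c);
            }
        }
        return out;
    }
};
```

with this in place, callers write `fmt::format("{}", buf)` where they used to call `to_hex(buf)`.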
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13471
The sharded<sys_ks> instances are plugged into the large data handler
and the compaction manager to handle the circular dependency between
these components via the interposing database instance. Do the same for
the user sstables manager, because the S3 driver will need to update
the local ownership table.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The commitlog api originally implied that the commitlog_directory would contain files from a single commitlog instance. This is checked in segment_manager::list_descriptors: if it encounters a file with an unknown prefix, an exception occurs in `commitlog::descriptor::descriptor`, which is logged at the `WARN` level.
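A self-contained sketch of the check in question (stand-in code, not the real `commitlog::descriptor`; the prefix and message are illustrative):

```cpp
#include <stdexcept>
#include <string>

// Constructing a descriptor from a filename that does not carry this
// commitlog instance's prefix throws; segment_manager::list_descriptors
// logs such failures at WARN level.
struct descriptor {
    descriptor(const std::string& filename, const std::string& prefix) {
        if (filename.rfind(prefix, 0) != 0) { // does not start with prefix
            throw std::invalid_argument(
                    "not a segment of this commitlog: " + filename);
        }
        // ... parse version and segment id from the rest of the name ...
    }
};
```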
A new schema commitlog was added recently, which shares the filesystem directory with the main commitlog. This causes warnings to be emitted on each boot. This patch solves the warnings problem by moving the schema commitlog to a separate directory. In addition, the user can employ the new `schema_commitlog_directory` parameter to move the schema commitlog to another disk drive.
This is expected to be released in 5.3.
As #13134 (raft tables->schema commitlog) is also scheduled for 5.3, and it already requires a clean rolling restart (no cl segments to replay), we don't need to specifically handle upgrade here.
Fixes: #11867
Closes #13263
* github.com:scylladb/scylladb:
commitlog: use separate directory for schema commitlog
schema commitlog: fix commitlog_total_space_in_mb initialization
By default, the schema commitlog directory is nested in the
commitlog_directory. This can help avoid problems during an upgrade if
the commitlog_directory in the custom scylla.yaml is located on a
separate disk partition.
Fixes: #11867
schema commitlog: fix commitlog_total_space_in_mb initialization
It seems there was a typo here, which caused commitlog_total_space_in_mb
to always be zero and the schema commitlog to be effectively unlimited
in size.
The wasm engine is moved from replica::database to the query_processor.
The wasm instance cache and compilation thread runner were already there,
but now they're also initialized in the query_processor constructor.
By moving the initialization to the constructor, we can now
be certain that all wasm-related objects (wasm instance cache,
compilation thread runner, and wasm engine, which was already
passed in the constructor) are initialized when we try to use
them, because we have to use the query processor to access them
anyway.
The change is also motivated by the fact that we're planning
to take Wasm UDFs out of experimental, after which they should
stop getting special treatment.
Closes #13311
* github.com:scylladb/scylladb:
wasm: move wasm initialization to query_processor constructor
wasm: return wasm instance cache as a reference instead of a pointer
wasm: move wasm engine to query_processor
The latter is the place where mutate_MV is called, and it needs the
view updates generator nearby.
The call stack starts at database::do_apply(). As was described in one
of the previous patches, applying mutations that need view updates
happens late enough, so if the view updates generator is not yet
plugged into the database, it's OK to bail out with an exception. If it
is plugged in, it's carried over, thus keeping the generator instance
alive and waited for on its stop.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The database is a low-level service, and currently the view update
generator implicitly depends on it via storage proxy. However, the
database does need to push view updates with the help of the mutate_MV
helper, thus adding a dependency loop.
This patch exploits the fact that view updates start being pushed late
enough; by that time all other services, including the proxy and the
view update generator, seem to be up and running. This allows a "weak
dependency" from the database to the view update generator, like the
one that already exists from the database to the system keyspace.
So in this patch the v.u.g. puts its shared-from-this pointer onto the
database at the time it starts. On stop it removes this pointer after
the database is drained and (hopefully) all view updates are pushed.
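Schematically, with stand-in types (the plug/unplug mechanism is the commit's idea; the names here are assumptions):

```cpp
#include <memory>

struct view_update_generator;

struct database_like {
    // The database only holds the pointer while the generator is running.
    std::shared_ptr<view_update_generator> vug;
};

struct view_update_generator
        : std::enable_shared_from_this<view_update_generator> {
    database_like& db;
    explicit view_update_generator(database_like& d) : db(d) {}

    void start() {
        db.vug = shared_from_this(); // plug: database may now push updates
    }
    void stop() {
        // Unplug after the database is drained, so all in-flight view
        // updates are pushed before the generator goes away.
        db.vug = nullptr;
    }
};
```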
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The wasm engine is used for compiling and executing Wasm UDFs, so
the query_processor is a more appropriate location for it than
replica::database, especially because the wasm instance cache
and the wasm alien thread runner are already there.
This patch also reduces the number of wasm engines to 1, shared by
all shards, as recommended by the wasmtime developers.