Commit Graph

773 Commits

Kefu Chai
e4697e2bd2 sstable: remove stale comment
this comment should have been removed in
f014ccf369. but better late than never.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14497
2023-07-05 15:42:11 +03:00
Pavel Emelyanov
0d4c981423 database: Remove unused proxy arg from update_keyspace_on_all_shards()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-03 14:19:54 +03:00
Pavel Emelyanov
42b9ba48de database: Remove unused proxy arg from update_keyspace()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-03 14:19:36 +03:00
Tomasz Grabiec
a9282103ba Merge 'Call storage_service notifications only after keyspace schema changes are applied on all shards' from Benny Halevy
This series aims at hardening schema merges and preventing inconsistencies across shards by
updating the database shards before calling the notification callback.

As seen in #13137, we don't want to call the notifications on all shards in parallel while the database shards are in flux.

In addition, any error while updating the keyspace will cause an abort, so as not to leave the database shards in an inconsistent state.

Other changes optimize this path by:
- updating shard 0 first, to seed the effective_replication_map.
- executing `storage_service::keyspace_changed` only once, on shard 0 to prevent quadratic update of the token_metadata and e_r_m on every keyspace change.

Fixes #13137

Closes #14158

* github.com:scylladb/scylladb:
  migration_manager: propagate listener notification exceptions
  storage_service: keyspace_changed: execute only on shard 0
  database: modify_keyspace_on_all_shards: execute func first on shard 0
  database: modify_keyspace_on_all_shards: call notifiers only after applying func on all shards
  database: add modify_keyspace_on_all_shards
  schema_tables: merge_keyspaces: extract_scylla_specific_keyspace_info for update_keyspace
  database: create_keyspace_on_all_shards
  database: update_keyspace_on_all_shards
  database: drop_keyspace_on_all_shards
2023-06-29 12:17:53 +02:00
Aleksandra Martyniuk
85cc85fc5a replica: delete unused functions and struct 2023-06-28 11:41:43 +02:00
Aleksandra Martyniuk
837d77ba8c compaction: add reshard_sstables_compaction_task_impl
Add task manager's task covering resharding compaction.
2023-06-28 11:41:43 +02:00
Botond Dénes
f5e3b8df6d Merge 'Optimize creation of reader excluding staging for view building' from Raphael "Raph" Carvalho
View building from staging creates a reader from scratch (memtable
+ sstables - staging) for every partition, in order to calculate
the diff between new staging data and data in base sstable set,
and then pushes the result into the view replicas.

perf shows that the reader creation is very expensive:
```
+   12.15%    10.75%  reactor-3        scylla             [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes
+   10.01%     9.99%  reactor-3        scylla             [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    8.95%     8.94%  reactor-3        scylla             [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()
+    7.29%     7.28%  reactor-3        scylla             [.] dht::ring_position_tri_compare
+    6.28%     6.27%  reactor-3        scylla             [.] dht::tri_compare
+    4.11%     3.52%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    4.09%     4.07%  reactor-3        scylla             [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state
+    3.46%     0.93%  reactor-3        scylla             [.] sstables::sstable_run::will_introduce_overlapping
+    2.53%     2.53%  reactor-3        libstdc++.so.6     [.] std::_Rb_tree_increment
+    2.45%     2.45%  reactor-3        scylla             [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.14%     2.13%  reactor-3        scylla             [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.07%     2.07%  reactor-3        scylla             [.] logalloc::region_impl::free
+    2.06%     1.91%  reactor-3        scylla             [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()#1}::operator()() const::{lambda()#1}::operator()
+    2.04%     2.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.87%     0.00%  reactor-3        [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
+    1.86%     0.00%  reactor-3        [kernel.kallsyms]  [k] do_syscall_64
+    1.39%     1.38%  reactor-3        libc.so.6          [.] __memcmp_avx2_movbe
+    1.37%     0.92%  reactor-3        scylla             [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::
+    1.34%     1.33%  reactor-3        scylla             [.] logalloc::region_impl::alloc_small
+    1.33%     1.33%  reactor-3        scylla             [.] seastar::memory::small_pool::add_more_objects
+    1.30%     0.35%  reactor-3        scylla             [.] seastar::reactor::do_run
+    1.29%     1.29%  reactor-3        scylla             [.] seastar::memory::allocate
+    1.19%     0.05%  reactor-3        libc.so.6          [.] syscall
+    1.16%     1.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.07%     0.79%  reactor-3        scylla             [.] sstables::partitioned_sstable_set::insert

```
That shows some significant amount of work for inserting sstables
into the interval map and maintaining the sstable run (which sorts
fragments by first key and checks for overlapping).

The interval map is known for having issues with L0 sstables, as
it will have to be replicated almost to every single interval
stored by the map, causing terrible space and time complexity.
With enough L0 sstables, it can fall into quadratic behavior.

This overhead is fixed by not building a new fresh sstable set
when recreating the reader, but rather supplying a predicate
to sstable set that will filter out staging sstables when
creating either a single-key or range scan reader.

This could have another benefit over today's approach which
may incorrectly consider a staging sstable as non-staging, if
the staging sst wasn't included in the current batch for view
building.
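The predicate approach described above can be sketched as follows. This is an editorial illustration with made-up names, not Scylla's actual API: rather than materializing a fresh sstable set (minus staging) for every partition, the reader walks the existing set and lazily skips entries rejected by a caller-supplied predicate.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Illustrative stand-ins for Scylla's types.
struct sstable {
    std::string name;
    bool staging = false;
};

using sstable_set = std::vector<sstable>;
using sstable_predicate = std::function<bool(const sstable&)>;

// Old approach: build a whole new set minus staging per reader creation.
sstable_set without_staging_copy(const sstable_set& all) {
    sstable_set out;
    std::copy_if(all.begin(), all.end(), std::back_inserter(out),
                 [](const sstable& s) { return !s.staging; });
    return out;  // O(n) copy + set maintenance for every partition
}

// New approach: keep the one existing set and filter lazily at read time.
std::vector<const sstable*> make_reader(const sstable_set& all,
                                        const sstable_predicate& pred) {
    std::vector<const sstable*> selected;
    for (const auto& s : all) {
        if (pred(s)) {
            selected.push_back(&s);  // no new sstable set is built
        }
    }
    return selected;
}
```

The filtering view sidesteps the interval-map insertion and sstable-run maintenance costs entirely, since no new set is ever constructed.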

With this improvement, view building was measured to be 3x faster.

from
`INFO  2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s`

to
`INFO  2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s`

Refs https://github.com/scylladb/scylladb/issues/14089.
Fixes scylladb/scylladb#14244.

Closes #14364

* github.com:scylladb/scylladb:
  table: Optimize creation of reader excluding staging for view building
  view_update_generator: Dump throughput and duration for view update from staging
  utils: Extract pretty printers into a header
2023-06-27 07:25:30 +03:00
Raphael S. Carvalho
1d8cb32a5d table: Optimize creation of reader excluding staging for view building
View building from staging creates a reader from scratch (memtable
+ sstables - staging) for every partition, in order to calculate
the diff between new staging data and data in base sstable set,
and then pushes the result into the view replicas.

perf shows that the reader creation is very expensive:
+   12.15%    10.75%  reactor-3        scylla             [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes
+   10.01%     9.99%  reactor-3        scylla             [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    8.95%     8.94%  reactor-3        scylla             [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()
+    7.29%     7.28%  reactor-3        scylla             [.] dht::ring_position_tri_compare
+    6.28%     6.27%  reactor-3        scylla             [.] dht::tri_compare
+    4.11%     3.52%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    4.09%     4.07%  reactor-3        scylla             [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state
+    3.46%     0.93%  reactor-3        scylla             [.] sstables::sstable_run::will_introduce_overlapping
+    2.53%     2.53%  reactor-3        libstdc++.so.6     [.] std::_Rb_tree_increment
+    2.45%     2.45%  reactor-3        scylla             [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.14%     2.13%  reactor-3        scylla             [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.07%     2.07%  reactor-3        scylla             [.] logalloc::region_impl::free
+    2.06%     1.91%  reactor-3        scylla             [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()#1}::operator()() const::{lambda()#1}::operator()
+    2.04%     2.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.87%     0.00%  reactor-3        [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
+    1.86%     0.00%  reactor-3        [kernel.kallsyms]  [k] do_syscall_64
+    1.39%     1.38%  reactor-3        libc.so.6          [.] __memcmp_avx2_movbe
+    1.37%     0.92%  reactor-3        scylla             [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::
+    1.34%     1.33%  reactor-3        scylla             [.] logalloc::region_impl::alloc_small
+    1.33%     1.33%  reactor-3        scylla             [.] seastar::memory::small_pool::add_more_objects
+    1.30%     0.35%  reactor-3        scylla             [.] seastar::reactor::do_run
+    1.29%     1.29%  reactor-3        scylla             [.] seastar::memory::allocate
+    1.19%     0.05%  reactor-3        libc.so.6          [.] syscall
+    1.16%     1.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.07%     0.79%  reactor-3        scylla             [.] sstables::partitioned_sstable_set::insert

That shows some significant amount of work for inserting sstables
into the interval map and maintaining the sstable run (which sorts
fragments by first key and checks for overlapping).

The interval map is known for having issues with L0 sstables, as
it will have to be replicated almost to every single interval
stored by the map, causing terrible space and time complexity.
With enough L0 sstables, it can fall into quadratic behavior.

This overhead is fixed by not building a new fresh sstable set
when recreating the reader, but rather supplying a predicate
to sstable set that will filter out staging sstables when
creating either a single-key or range scan reader.

This could have another benefit over today's approach which
may incorrectly consider a staging sstable as non-staging, if
the staging sst wasn't included in the current batch for view
building.

With this improvement, view building was measured to be 3x faster.

from
INFO  2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s

to
INFO  2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s

Refs #14089.
Fixes #14244.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-06-26 22:30:39 -03:00
Raphael S. Carvalho
83c70ac04f utils: Extract pretty printers into a header
Can be easily reused elsewhere.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-06-26 21:58:20 -03:00
Benny Halevy
13dd92e618 database: modify_keyspace_on_all_shards: execute func first on shard 0
When creating or altering a keyspace, we create a new
effective_replication_map instance.

It is more efficient to do that first on shard 0
and then on all other shards, otherwise multiple
shards might need to calculate to new e_r_m (and reach
the same result).  When the new e_r_m is "seeded" on
shard 0, other shards will find it there and clone
a local copy of it - which is more efficient.
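A toy model of the shard-0 seeding described above (illustrative names only; the real code uses seastar::smp::submit_to and sharded services, which are omitted here):

```cpp
#include <memory>
#include <vector>

// Stand-in for the effective_replication_map; computing it is expensive.
struct erm {
    int computed_on_shard;  // shard that did the expensive computation
};

struct shard_state {
    std::shared_ptr<erm> current;
};

std::shared_ptr<erm> compute_erm(int shard) {
    return std::make_shared<erm>(erm{shard});  // costly in real life
}

void update_keyspace_on_all_shards(std::vector<shard_state>& shards) {
    // Shard 0 computes the new effective_replication_map once...
    shards[0].current = compute_erm(0);
    // ...and the other shards clone the seeded result instead of each
    // recomputing it and reaching the same answer independently.
    for (std::size_t i = 1; i < shards.size(); ++i) {
        shards[i].current = std::make_shared<erm>(*shards[0].current);
    }
}
```

The point of the ordering is that the expensive computation happens exactly once; the clones are cheap per-shard copies.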

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 21:08:09 +03:00
Benny Halevy
ba15786059 database: modify_keyspace_on_all_shards: call notifiers only after applying func on all shards
When creating, updating, or dropping keyspaces,
first execute the database internal function to
modify the database state, and only when all shards
are updated, run the listener notifications,
to make sure they would operate when the database
shards are consistent with each other.
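The ordering guarantee can be sketched like this (hypothetical names, with a plain loop standing in for Seastar's cross-shard invocations):

```cpp
#include <functional>
#include <string>
#include <vector>

// Records the order of operations so the guarantee is observable.
struct event_log {
    std::vector<std::string> events;
};

// Apply the mutating function on every "shard" first; only once all
// shards are consistent do the listener notifications run.
void modify_keyspace_on_all_shards(event_log& log, int n_shards,
                                   const std::function<void(int)>& func,
                                   const std::function<void()>& notify) {
    for (int shard = 0; shard < n_shards; ++shard) {
        func(shard);  // update database state shard by shard
    }
    notify();  // listeners see a database consistent across shards
}
```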

Fixes #13137

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 21:08:09 +03:00
Benny Halevy
3b8c913e61 database: add modify_keyspace_on_all_shards
Run all keyspace create/update/drop ops
via `modify_keyspace_on_all_shards` that
will standardize the execution on all shards
in the coming patches.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 21:08:09 +03:00
Benny Halevy
dc9b0812e9 schema_tables: merge_keyspaces: extract_scylla_specific_keyspace_info for update_keyspace
Similar to create_keyspace_on_all_shards,
`extract_scylla_specific_keyspace_info` and
`create_keyspace_from_schema_partition` can be called
once in the upper layer, passing keyspace_metadata&
down to database::update_keyspace_on_all_shards
which now would only make the per-shard
keyspace_metadata from the reference it gets
from the schema_tables layer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 21:08:09 +03:00
Benny Halevy
3520c786bd database: create_keyspace_on_all_shards
Part of moving the responsibility for applying
and notifying keyspace schema changes from
schema_tables to the database so that the
database can control the order of applying the changes
across shards and when to notify its listeners.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 21:08:09 +03:00
Alexey Novikov
ca4e7f91c6 compact and remove expired rows from cache on read
When reading from cache, compact and expire row tombstones,
and remove expired empty rows from the cache.
Range tombstones are not expired in this patch.

Refs #2252, #6033

Closes #12917
2023-06-26 15:29:01 +02:00
Benny Halevy
53a6ea8616 database: update_keyspace_on_all_shards
Part of moving the responsibility for applying
and notifying keyspace schema changes from
schema_tables to the database so that the
database can control the order of applying the changes
across shards and when to notify its listeners.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 09:35:35 +03:00
Benny Halevy
9d40305ef6 database: drop_keyspace_on_all_shards
Part of moving the responsibility for applying
and notifying keyspace schema changes from
schema_tables to the database so that the
database can control the order of applying the changes
across shards and when to notify its listeners.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 09:34:42 +03:00
Aleksandra Martyniuk
19ec5b4256 replica: delete unused function 2023-06-23 15:57:43 +02:00
Aleksandra Martyniuk
e3e2d6b886 compaction: add table_reshaping_compaction_task_impl 2023-06-23 15:57:37 +02:00
Botond Dénes
320159c409 Merge 'Compaction group major compaction task' from Aleksandra Martyniuk
Task manager task covering compaction group major
compaction.

Uses multiple inheritance on already existing
major_compaction_task_executor to keep track of
the operation with task manager.

Closes #14271

* github.com:scylladb/scylladb:
  test: extend test_compaction_task.py
  test: use named variable for task tree depth
  compaction: turn major_compaction_task_executor into major_compaction_task_impl
  compaction: take gate holder out of task executor
  compaction: extend signature of some methods
  tasks: keep shared_ptr to impl in task
  compaction: rename compaction_task_executor methods
2023-06-22 08:15:17 +03:00
Kefu Chai
f014ccf369 Revert "Revert "Merge 'treewide: add uuid_sstable_identifier_enabled support' from Kefu Chai""
This reverts commit 562087beff.

The regressions introduced by the reverted change have been fixed.
So let's revert this revert to resurrect the
uuid_sstable_identifier_enabled support.

Fixes #10459
2023-06-21 13:02:40 +03:00
Tomasz Grabiec
f6625e16ee schema: Catch incorrect uses of schema::get_sharder()
We still use it in many places in unit tests, which is ok because
those tables are vnode-based.

We want to check incorrect uses in production as they may lead to hard
to debug consistency problems.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
29cbdb812b dht: Rename dht::shard_of() to dht::static_shard_of()
This is in order to prevent new incorrect uses of dht::shard_of() to
be accidentally added. Also, makes sure that all current uses are
caught by the compiler and require an explicit rename.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
21198e8470 treewide: Replace dht::shard_of() uses with table::shard_of() / erm::shard_of()
dht::shard_of() does not use the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
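The distinction can be illustrated with a toy contrast (made-up names; in the real code the per-table sharder comes from the effective_replication_map):

```cpp
#include <cstdint>
#include <functional>

using token = std::uint64_t;

// Static sharding (vnode-based tables): the shard is a fixed function of
// the token and the shard count, identical for every table.
unsigned static_shard_of(token t, unsigned smp_count) {
    return static_cast<unsigned>(t % smp_count);  // toy formula
}

// Tablet-based tables need a table-specific mapping instead.
struct table_sharder {
    std::function<unsigned(token)> shard_of;
};

// A tablet lives wholly on one shard, regardless of the token value,
// so a static token->shard formula gives wrong answers for it.
table_sharder make_tablet_sharder(unsigned pinned_shard) {
    return table_sharder{[pinned_shard](token) { return pinned_shard; }};
}
```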
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
fb0bdcec0c storage_proxy: Avoid multishard reader for tablets
Currently, the coordinator splits the partition range at vnode (or
tablet) boundaries and then tries to merge adjacent ranges which
target the same replica. This is an optimization which makes less
sense with tablets, which are supposed to be of substantial size. If
we don't merge the ranges, then with tablets we can avoid using the
multishard reader on the replica side, since each tablet lives on a
single shard.

The main reason to avoid a multishard reader is avoiding its
complexity, and avoiding adapting it to work with tablet
sharding. Currently, the multishard reader implementation makes
several assumptions about shard assignment which do not hold with
tablets. It assumes that shards are assigned in a round-robin fashion.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
e48ec6fed3 db, storage_proxy: Drop mutation/frozen_mutation ::shard_of()
dht::shard_of() does not use the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
d92287f997 db: multishard: Obtain sharder from erm
This is not strictly necessary, as the multishard reader will be later
avoided altogether for tablet-based tables, but it is a step towards
converting all code to use the erm->get_sharder() instead of
schema::get_sharder().
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
34ba8a6a53 db: table: Introduce shard_of() helper
Saves some boilerplate code.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
36da062bcb db: Use table sharder in compaction 2023-06-21 00:58:24 +02:00
Tomasz Grabiec
ad983ac23d sstables: Compute sstable shards using sharder from erm when loading
schema::get_sharder() does not use the correct sharder for
tablet-based tables.  Code which is supposed to work with all kinds of
tables should obtain the sharder from erm::get_sharder().
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
17d6163548 sstables: Generate sharding metadata using sharder from erm when writing
We need to keep sharding metadata consistent with tablet mapping to
shards in order for node restart to detect that those sstables belong
to a single shard and that resharding is not necessary. Resharding of
sstables based on tablet metadata is not implemented yet and will
abort after this series.

Keeping sharding metadata accurate for tablets is only necessary until
compaction group integration is finished. After that, we can use the
sstable token range to determine the owning tablet and thus the owning
shard. Before that, we can't, because a single sstable may contain
keys from different tablets, and the whole key range may overlap with
keys which belong to other shards.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
2303466375 db: schema: Attach table pointer to schema
This will make it easier to access table properties in places which
only have schema_ptr. This is in particular useful when replacing
dht::shard_of() uses with s->table().shard_of(), now that sharding is
no longer static, but table-specific.

Also, it allows us to install a guard which catches invalid uses of
schema::get_sharder() on tablet-based tables.

It will be helpful for other uses as well. For example, we can now get
rid of the static_props hack.
2023-06-21 00:58:24 +02:00
Aleksandra Martyniuk
e317ffe23a compaction: extend signature of some methods
Extend the signatures of table::compact_all_sstables and
compaction_manager::perform_major_compaction so that they take
the info of the covering task.

This makes it easy to create child tasks that cover compaction group
compaction.
2023-06-20 10:45:34 +02:00
Botond Dénes
562087beff Revert "Merge 'treewide: add uuid_sstable_identifier_enabled support' from Kefu Chai"
This reverts commit d1dc579062, reversing
changes made to 3a73048bc9.

Said commit caused regressions in dtests. We need to investigate and fix
those, but in the meanwhile let's revert this to reduce the disruption
to our workflows.

Refs: #14283
2023-06-19 08:49:27 +03:00
Kamil Braun
33c19baabc db: system_keyspace: take simpler service references in make
Take references to services which are initialized earlier. The
references to `gossiper`, `storage_service` and `raft_group0_registry`
are no longer needed.

This will allow us to move the `make` step right after starting
`system_keyspace`.
2023-06-18 13:39:27 +02:00
Kamil Braun
035045c288 db: system_keyspace: remove system_keyspace_make
The code can now be inlined in `system_keyspace::make` as we no longer
access private members of `database`.
2023-06-18 13:39:27 +02:00
Kamil Braun
cf120e46b8 db: system_keyspace: refactor local system table creation code
`system_keyspace_make` would access private fields of `database` in
order to create local system tables (creating the `keyspace` and
`table` in-memory structures, creating directory for `system` and
`system_schema`).

Extract this part into `database::create_local_system_table`.

Make `database::add_column_family` private.
2023-06-18 13:39:27 +02:00
Kamil Braun
3f04a5956c replica: database: remove is_bootstrap argument from create_keyspace
Unused.
2023-06-18 13:39:27 +02:00
Kamil Braun
8848c3b809 replica: database: write a comment for parse_system_tables 2023-06-18 13:39:27 +02:00
Kamil Braun
4ca149c1f0 replica: database: remove redundant keyspace::get_erm_factory() getter
`keyspace` can simply access its private field.
2023-06-18 13:39:27 +02:00
Pavel Emelyanov
900c609269 Merge 'Initialize query_processor early, without messaging_service or gossiper' from Kamil Braun
In https://github.com/scylladb/scylladb/pull/14231 we split `storage_proxy` initialization into two phases: for local and remote parts. Here we do the same with `query_processor`. This allows performing queries for local tables early in the Scylla startup procedure, before we initialize services used for cluster communication such as `messaging_service` or `gossiper`.

Fixes: #14202

As a follow-up we will simplify `system_keyspace` initialization, making it available earlier as well.

Closes #14256

* github.com:scylladb/scylladb:
  main, cql_test_env: start `query_processor` early
  cql3: query_processor: split `remote` initialization step
  cql3: query_processor: move `migration_manager&`, `forwarder&`, `group0_client&` to a `remote` object
  cql3: query_processor: make `forwarder()` private
  cql3: query_processor: make `get_group0_client()` private
  cql3: strongly_consistent_modification_statement: fix indentation
  cql3: query_processor: make `get_migration_manager` private
  tracing: remove `qp.get_migration_manager()` calls
  table_helper: remove `qp.get_migration_manager()` calls
  thrift: handler: move implementation of `execute_schema_command` to `query_processor`
  data_dictionary: add `get_version`
  cql3: statements: schema_altering_statement: move `execute0` to `query_processor`
  cql3: statements: pass `migration_manager&` explicitly to `prepare_schema_mutations`
  main: add missing `supervisor::notify` message
2023-06-16 17:41:08 +03:00
Tomasz Grabiec
e41ff4604d Merge 'raft_topology: fencing and global_token_metadata_barrier' from Gusev Petr
This is the initial implementation of [this spec](https://docs.google.com/document/d/1X6pARlxOy6KRQ32JN8yiGsnWA9Dwqnhtk7kMDo8m9pI/edit).

* the topology version (int64) was introduced, it's stored in topology table and updated through RAFT at the relevant stages of the topology change algorithm;
* when the version is incremented, a `barrier_and_drain` command is sent to all the nodes in the cluster, if some node is unavailable we fail and retry indefinitely;
* the `barrier_and_drain` handler first issues a `raft_read_barrier()` to obtain the latest topology, and then waits until all requests using previous versions are finished; once this round of RPCs finishes, the topology change coordinator can be sure that there are no in-flight requests using previous versions and that such requests can't appear in the future.
* after `barrier_and_drain` the topology change coordinator issues the `fence` command, it stores the current version in local table as `fence_version` and blocks requests with older versions by throwing `stale_topology_exception`; if a request with older version was started before the fence, its reply will also be fenced.
* the fencing part of the PR is for the future, when we relax the requirement that all nodes are available during topology change; it should protect the cluster from stale-topology requests coming from nodes that were unavailable during the topology change and were not reached by the `barrier_and_drain()` command;
* currently, fencing is implemented for the `mutation` and `read` RPCs; other RPCs will be handled in follow-ups. Since all nodes are currently required to be alive, the missing parts of the fencing don't break correctness;
* along with fencing, the spec above also describes error handling, isolation and `--ignore_dead_nodes` parameter handling, these will be also added later; [this ticket](https://github.com/scylladb/scylladb/issues/14070) contains all that remains to be done;
* we don't worry about compatibility when we change topology table schema or `raft_topology_cmd_handler` RPC method signature since the raft topology code is currently hidden by `--experimental raft` flag and is not accessible to the users. Compatibility is maintained for other affected RPCs (mutation, read) - the new `fencing_token` parameter is `rpc::optional`, we skip the fencing check if it's not present.
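A minimal sketch of the fencing check described in the points above (illustrative names loosely following the description, not Scylla's exact API):

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>

// Carried on fenced RPCs; holds the sender's view of the topology version.
struct fencing_token {
    std::int64_t topology_version;
};

struct stale_topology_exception : std::runtime_error {
    stale_topology_exception() : std::runtime_error("stale topology") {}
};

// The replica rejects requests whose token is older than its local
// fence_version. The token is optional in the real protocol
// (rpc::optional): old clients that do not send one skip the check,
// which preserves compatibility.
void apply_fence(std::optional<fencing_token> token,
                 std::int64_t fence_version) {
    if (token && token->topology_version < fence_version) {
        throw stale_topology_exception();
    }
}
```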

Closes #13884

* github.com:scylladb/scylladb:
  storage_service: warn if can't find ip for server
  storage_proxy.cc: add and use global_token_metadata_barrier
  storage_service: exec_global_command: bool result -> exceptions
  raft_topology: add cmd_index to raft commands
  storage_proxy.cc: add fencing to read RPCs
  storage_proxy.cc: extract handle_read
  storage_proxy.cc: refactor encode_replica_exception_for_rpc
  storage_proxy: fix indentation
  storage_proxy: add fencing for mutation
  storage_servie: fix indentation
  storage_proxy: add fencing_token and related infrastructure
  raft topology: add fence_version
  raft_topology: add barrier_and_drain cmd
  token_metadata: add topology version
2023-06-16 12:07:31 +02:00
Pavel Emelyanov
5412c7947a backlog_controller: Unwrap scheduling_group
Some time ago (997a34bf8c) the backlog
controller was generalized to maintain some scheduling group. Back then
the group was the pair of seastar::scheduling_group and
seastar::io_priority_class. Now the latter is gone, so the controller's
notion of a scheduling group can be relaxed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14266
2023-06-16 12:02:14 +03:00
Petr Gusev
d34da12240 storage_proxy: add fencing_token and related infrastructure
A new stale_topology_exception was introduced;
it's raised in apply_fence when an RPC arrives
with a stale fencing_token.

An overload of apply_fence with future will be
used to wrap the storage_proxy methods which
need to be fenced.
2023-06-15 15:48:00 +04:00
Kefu Chai
2d265e860d replica,sstable: introduce invalid generation id
the invalid sstable id is the NULL of an sstable identifier. with
this concept, it is a lot simpler to find/track the greatest
generation. the complexity is hidden in the generation_type, which
compares a) integer-based identifiers, b) uuid-based identifiers,
and c) the invalid identifier in different ways.
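A toy model of such a three-way identifier (a std::variant stand-in with made-up ordering rules, not Scylla's real generation_type):

```cpp
#include <array>
#include <cstdint>
#include <variant>
#include <vector>

// Toy generation id: invalid (the "NULL" value), integer-based, or
// UUID-based. The alternative order makes invalid sort below both.
struct invalid_t {};
using uuid = std::array<std::uint8_t, 16>;
using generation_type = std::variant<invalid_t, std::int64_t, uuid>;

bool gen_less(const generation_type& a, const generation_type& b) {
    if (a.index() != b.index()) {
        return a.index() < b.index();  // invalid < integer < uuid (toy rule)
    }
    if (std::holds_alternative<std::int64_t>(a)) {
        return std::get<std::int64_t>(a) < std::get<std::int64_t>(b);
    }
    if (std::holds_alternative<uuid>(a)) {
        return std::get<uuid>(a) < std::get<uuid>(b);
    }
    return false;  // two invalid ids compare equal
}

// Because invalid sorts lowest, it can seed a max-search with no
// special-casing of "no generation seen yet".
generation_type greatest(const std::vector<generation_type>& gens) {
    generation_type best = invalid_t{};
    for (const auto& g : gens) {
        if (gen_less(best, g)) {
            best = g;
        }
    }
    return best;
}
```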

so, in this change

* the default constructor of generation_type is
  now public.
* we don't check for empty generation anymore when loading
  SSTables or enumerating them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-15 17:54:59 +08:00
Kefu Chai
939fa087cc sstables, replica: pass uuid_sstable_identifiers to generation generator
before this change, we assumed that the generation is always integer-based.
in order to enable the UUID-based generation identifier when the related
option is set, we need to propagate this option down to the generation
generator.

because we don't have access to the cluster features in some places where
a new generation is created, a new accessor exposing feature_service from
the sstables manager is added.

Fixes #10459
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-15 17:54:59 +08:00
Kamil Braun
26cd3b9b78 data_dictionary: add get_version
The `replica::database` version simply calls `get_version`
on the real database.

The `schema_loader` version throws `bad_function_call`.
2023-06-15 09:48:54 +02:00
Botond Dénes
a5ce2d5fb4 Merge 'Initialize storage_proxy early, without messaging_service and gossiper' from Kamil Braun
Move the initialization of `storage_proxy` early in the startup procedure, before starting
`system_keyspace`, `messaging_service`, `gossiper`, `storage_service` and more.

As a follow-up, we'll be able to move initialization of `query_processor` right
after `storage_proxy` (but this requires a bit of refactoring in
`query_processor` too).

Local queries through `storage_proxy` can be done after the early initialization step.
In a follow-up, when we do a similar thing for `query_processor`, we'll be able
to perform local CQL queries early as well. (Before starting `gossiper` etc.)

Closes #14231

* github.com:scylladb/scylladb:
  main, cql_test_env: initialize `storage_proxy` early
  main, cql_test_env: initialize `database` early
  storage_proxy: rename `init_messaging_service` to `start_remote`
  storage_proxy: don't pass `gossiper&` and `messaging_service&` during initialization
  storage_proxy: prepare for missing `remote`
  storage_proxy: don't access `remote` during local queries in `query_partition_key_range_concurrent`
  db: consistency_level: remove overload of `filter_for_query`
  storage_proxy: don't access `remote` when calculating target replicas for local queries
  storage_proxy: introduce const version of `remote()`
  replica: table: introduce `get_my_hit_rate`
  storage_proxy: `endpoint_filter`: remove gossiper dependency
2023-06-14 15:37:33 +03:00
Wojciech Mitros
89b6c84b49 database: remove unused header
After recent changes, all wasm related logic has been moved from
the database class to the query_processor. As a result, the wasm
headers no longer need to be included there, and in particular,
files that include replica/database.hh no longer need to wait
on the generated header rust/wasmtime_bindings.hh to compile.

Fixes #14224

Closes #14223
2023-06-14 12:33:20 +03:00
Kamil Braun
2cd17819cd replica: table: introduce get_my_hit_rate
Doesn't require `gossiper&`.
2023-06-12 15:23:56 +02:00