before this change, we relied on the fmt::formatter that fmt generated
by default from operator<<, but fmt v10 dropped this default-generated
formatter.
in this change, we
* define a formatter for `db::consistency_level`
* drop its `operator<<`, as it is not used anymore
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16755
Store schema_ptr in reader permit instead of storing a const pointer to
schema to ensure that the schema doesn't get changed elsewhere when the
permit is holding on to it. Also update the constructors and all the
relevant callers to pass down schema_ptr instead of a raw pointer.
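The ownership change can be sketched like this. The types here are stand-ins: Scylla's real `schema_ptr` is a reference-counted pointer to `const schema`, and `reader_permit` has a much richer interface.

```cpp
#include <memory>

// Stand-ins for illustration only.
struct schema {
    int version;
};
using schema_ptr = std::shared_ptr<const schema>;

// Holding schema_ptr (not a raw const schema*) means the permit keeps
// the schema alive: it cannot be destroyed or swapped out from under
// the permit while the permit exists.
class reader_permit {
    schema_ptr _schema;
public:
    explicit reader_permit(schema_ptr s) : _schema(std::move(s)) {}
    const schema& get_schema() const { return *_schema; }
};
```

With a raw pointer, dropping the last external reference to the schema would leave the permit dangling; with the shared pointer, the permit's copy pins the object.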
Fixes #16180
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#16658
In 7d5e22b43b ("replica: memtable: don't forget memtable
memory allocation statistics") we taught memtable_list to remember
learned memory allocation reserves so a new memtable inherits these
statistics from an older memtable. Now share it further across tablets
that belong to the same table as well. This helps make the statistics
more accurate for tablets that are migrated in, as they can share an
existing tablet's memory allocation history.
Closes scylladb/scylladb#16571
* github.com:scylladb/scylladb:
table, memtable: share log-structured allocator statistics across all memtables in a table
memtable: consolidate _read_section, _allocating_section in a struct
Previously, the tablet information was sent to the drivers
in two pieces within the custom_payload. We had information
about the replicas under the `tablet_replicas` key and token range
information under `token_range`. These names were quite generic
and might have caused problems for other custom_payload users.
Additionally, dividing the information into two pieces raised
the question of what to do if one key is present while the other
is missing.
This commit changes the serialization mechanism to pack all information
under one specific name, `tablets-routing-v1`.
From: Sylwia Szunejko <sylwia.szunejko@scylladb.com>
Closes scylladb/scylladb#16148
The log-structured allocator collects allocation statistics (which it
uses to manage memory reserves) in some objects kept in
memtable_table_shared_data. Right now, this object is local to memtable_list,
which itself is local to a tablet replica. Move it to table scope so
different tablets in the shard share the statistics. This helps a
newly-migrated tablet adjust more quickly.
Those two members are passed from memtable_list to memtable. Since we
wish to pass them from table, it becomes awkward to pass them as two
separate variables, as their contents are specific to memtable internals.
Wrap them in a struct whose name indicates their role (table-wide shared
data for memtables) and pass them as a unit.
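The wrapper's rough shape could look like the following. The member types are stand-ins for `logalloc::allocating_section`; only the grouping idea mirrors the commit.

```cpp
// Stand-in for logalloc::allocating_section, which carries learned
// allocator reserve statistics (the type is an assumption for illustration).
struct allocating_section {
    long learned_reserve = 0;
};

// The two members previously passed separately from memtable_list to
// memtable, grouped so the table can own them and pass them as one unit
// shared by all of its memtables.
struct memtable_table_shared_data {
    allocating_section _read_section;
    allocating_section _allocating_section;
};
```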
Right now initial_tablets is kept as a replication strategy option in the legacy system_schema.keyspaces table. However, r.s. options are all considered to be replication factors, not anything else. Other than being confusing, this also makes it impossible to extend keyspace configuration with non-integer tablets-related values.
This PR moves initial_tablets into the scylla-specific part of the schema. This opens the way to more ~~ugly~~ flexible ways of configuring tablets for a keyspace; in particular, it should be possible to use a boolean on/off switch in CREATE KEYSPACE or some other trick we find appropriate.
Most of what this PR does is extend the arguments passed around keyspace_metadata and abstract_replication_strategy. The essence of the change is in the last patches:
* schema_tables: Relax extract_scylla_specific_ks_info() check
* locator,schema: Move initial tablets from r.s. options to params
refs: #16319
refs: #16364
Closes scylladb/scylladb#16555
* github.com:scylladb/scylladb:
test: Add sanity tests for tablets initialization and altering
locator,schema: Move initial tablets from r.s. options to params
schema_tables: Relax extract_scylla_specific_ks_info() check
locator: Keep optional initial_tablets on r.s. params
ks_prop_defs: Add initial_tablets& arg to prepare_options()
keyspace_metadata: Carry optional<initial_tablets> on board
locator: Pass abstract_replication_strategy& into validate_tablet_options()
locator: Carry r.s. params into process_tablet_options()
locator: Call create_replication_strategy() with r.s. params
locator: Wrap replication_strategy_config_options into replication_strategy_params
locator: Use local members in ..._replication_strategy constructors
Now all the callers have it at hand (spoiler: not yet initialized, but
still), so the params can also have it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The object in question fully describes the keyspace to be created and,
among other things, contains replication strategy options. Next patches
move the "initial_tablets" option out of those options and keep it
separately, so the ks metadata should also carry this option separately.
This patch is _just_ extending the metadata creation API, in fact the
new field is unused (write-only) so all the places that need to provide
this data keep it disengaged and are explicitly marked with FIXME
comment. Next patches will fix that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Previous patch added params to r.s. classes' constructors, but callers
don't construct those directly, instead they use the create_r.s.()
wrapper. This patch adds params to the wrapper too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When altering a keyspace, several keyspace_metadata objects are created
along the way. The last one, which is then kept on the keyspace_metadata
object, forgets to take its copy of the storage options, thus
transparently converting to the LOCAL type.
The bug surfaces when altering the replication strategy class for
S3-backed storage -- the 2nd attempt fails, because after the 1st one
the keyspace_metadata gets LOCAL storage options, and changing storage
options is not allowed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16524
Both virtual tables and the schema registry contain thread_local caches
that are destroyed at thread exit. After a Seastar change [1], these
destructions can happen after the reactor is destroyed, triggering a
use-after-free.
Fix by scoping the destruction so it takes place earlier.
[1] 101b245ed7
Closes scylladb/scylladb#16510
* github.com:scylladb/scylladb:
schema_registry, database: flush entries when no longer in use
virtual_tables: scope virtual tables registry in system_keyspace
The schema registry disarms internal timers when it is destroyed.
This accesses the Seastar reactor. However, after [1] we don't have ordering
between the reactor destruction and the thread_local registry destruction.
Fix this by flushing all entries when the database is destroyed. The
database object is fundamental so it's unlikely we'll have anything
using the registry after it's gone.
[1] 101b245ed7
truncating is an unusual operation, and we write a log message at INFO
level when the truncate op starts. it would be great if we had a matching
log message indicating the end of truncate on the server side. this would
help with investigating TRUNCATE timeouts spotted on the client: at least
we could rule out problems happening while the server is performing the
truncate.
Refs #15610
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16247
Consider this:
1) file streaming takes storage snapshot = list of sstables
2) concurrent compaction unlinks some of those sstables from the file system
3) file streaming tries to send the unlinked sstables, but components
other than data and index cannot be read, as only the data and index
files have open file descriptors
To fix it, the snapshot now returns a set of files, one per sstable
component, for each sstable.
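The fix leans on a POSIX guarantee worth spelling out: file data stays reachable through any open descriptor even after the last directory entry is unlinked. Holding one descriptor per sstable component therefore makes the snapshot immune to concurrent compaction deleting the files. A small demonstration (the path and contents are illustrative):

```cpp
#include <cstring>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Reads a file through a descriptor that was opened before the file was
// unlinked, mimicking a streaming snapshot racing with compaction.
std::string read_component_after_unlink(const char* path) {
    int wfd = ::open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    ::write(wfd, "data", 4);
    ::close(wfd);

    int rfd = ::open(path, O_RDONLY); // snapshot captures the descriptor...
    ::unlink(path);                   // ...then compaction removes the file
    char buf[5] = {};
    ::read(rfd, buf, 4);              // the data is still readable via rfd
    ::close(rfd);
    return buf;
}
```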
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#16476
This introduces the ability to split a storage group.
The main compaction group is split into left and right groups.
set_split() is used to put the storage group into splitting mode, which
creates the left and right compaction groups. Incoming writes are
then placed into the memtable of either the left or right group.
split() is used to complete the splitting of a group. It only
returns when all preexisting data is split. That means the main
compaction group will be empty and all the data will be stored
in either the left or right group.
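The described flow can be modeled with a toy: tokens are plain ints, and a compaction group is just a vector of tokens. The names mirror the description; the real classes are far richer.

```cpp
#include <vector>

struct compaction_group {
    std::vector<int> tokens;
};

class storage_group {
    compaction_group _main, _left, _right;
    bool _splitting = false;
    int _split_point = 0;
public:
    // Incoming writes go to main, or to left/right once splitting started.
    void write(int token) {
        if (!_splitting) {
            _main.tokens.push_back(token);
        } else {
            (token < _split_point ? _left : _right).tokens.push_back(token);
        }
    }
    // set_split(): enter splitting mode; new writes go to left/right.
    void set_split(int split_point) {
        _splitting = true;
        _split_point = split_point;
    }
    // split(): move preexisting data out of main; main ends up empty.
    void split() {
        for (int t : _main.tokens) {
            (t < _split_point ? _left : _right).tokens.push_back(t);
        }
        _main.tokens.clear();
    }
    const compaction_group& main() const { return _main; }
    const compaction_group& left() const { return _left; }
    const compaction_group& right() const { return _right; }
};
```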
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
A storage group is the storage of a tablet. This new concept is helpful
for tablet splitting, where the storage of a tablet will be split
into multiple compaction groups, each of which can be compacted
independently.
The reason for not going with the arena concept is that it added
complexity, and it felt much more elegant to keep compaction
group unchanged, since at the end of the day it abstracts the concept
of a set of sstables that can be compacted and operated on
independently.
When splitting, the storage group for a tablet may therefore own
multiple compaction groups, left, right, and main, where main
keeps the data that needs splitting. When splitting completes,
only left and right compaction groups will be populated.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
With off-strategy, we allow sstables to be moved into a new sstable
set even if they didn't undergo reshape compaction.
That's done by specifying, in the completion descriptor, that an sstable
is present in both the input and the output.
We want to do the same with other compaction types.
Think, for example, of split compaction: the compaction manager may
decide an sstable doesn't need splitting, yet it wants that sstable to
be moved into a new sstable set.
Theoretically, we could introduce new code to do this movement,
but more code means increased maintenance burden and higher chances
of bugs. It makes sense to reuse the compaction completion path,
as we do today with off-strategy.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
reader_concurrency_semaphore metrics are triplicated: each metric is
registered for the streaming, user, and system classes.
To fix this, just move the metrics registration from database to
reader_concurrency_semaphore, so each reader_concurrency_semaphore
instantiated registers its own metrics (if its creator asked for it).
Adjust the names given to each reader_concurrency_semaphore so we don't
change the labels.
scylla-gdb is adjusted to support the new names.
To be used in the next patch to control whether the semaphore registers
and exports metrics or not. We want to move metric registration to the
semaphore but we don't want all semaphores to export metrics. The
decision on whether a semaphore should or shouldn't export metrics
should be made on a case-by-case basis so this new parameter has no
default value (except for the for_tests constructor).
Soon, the reader_concurrency_semaphore will require a unique
and meaningful name in order to label its metrics. To prepare
for that, name sstable_manager instances. This will be used
to generate a name for sstable_manager's reader_concurrency_semaphore.
To make sure a table object is kept valid throughout the lifetime
of compaction, a following patch will enter the table's
_async_gate when the compaction task starts.
This change defers awaiting the gate.close() future until after
stopping ongoing compactions: closing the gate prevents new
compactions from starting while ongoing compactions are stopped,
and finally awaiting the close() future waits for them to unwind
and exit the gate after being stopped.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When a table is truncated or dropped, it can be auto-snapshotted if the respective config option is set (by default it is). Non-local storages don't implement snapshotting yet and call on_internal_error() in that case, aborting the whole process. It's better to skip the snapshot with a warning instead.
Closes scylladb/scylladb#16220
* github.com:scylladb/scylladb:
database: Do not auto snapshot non-local storages' tables
database: Simplify snapshot booleans in truncate_table_on_all_shards()
Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later.
This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted.
The tracking and fencing are implemented using global "sessions", created for the streaming of a single tablet. A session is globally identified by a UUID. The identifier is assigned by the topology change coordinator and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained.
The barrier blocks only if there is some session with work that was left behind by an unsuccessful streaming, in which case it should not be blocked for long, because the streaming process checks frequently whether the guard was left behind and stops if it was.
This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas.
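A synchronous toy of the session mechanism may help: work is admitted only while its session is open, and a barrier considers a closed session drained once no admitted work remains. The real implementation is asynchronous, the ids are UUIDs stored in system.tablets, and the names below are illustrative.

```cpp
#include <map>
#include <set>
#include <string>

using session_id = std::string;

class session_registry {
    std::set<session_id> _open;
    std::map<session_id, int> _active; // admitted operations per session
public:
    void open(const session_id& id) { _open.insert(id); }
    void close(const session_id& id) { _open.erase(id); } // fences new work

    // Streaming work is admitted only while the session is open.
    bool enter(const session_id& id) {
        if (!_open.count(id)) {
            return false;
        }
        ++_active[id];
        return true;
    }
    void leave(const session_id& id) { --_active[id]; }

    // A barrier proceeds once a closed session has no work left behind.
    bool drained(const session_id& id) const {
        auto it = _active.find(id);
        return !_open.count(id) && (it == _active.end() || it->second == 0);
    }
};
```

Closing the session fences stragglers (their `enter` fails), while `drained` gives the barrier its wait condition for work admitted before the close.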
Closes scylladb/scylladb#15847
* github.com:scylladb/scylladb:
test: tablets: Add test for failed streaming being fenced away
error_injection: Introduce poll_for_message()
error_injection: Make is_enabled() public
api: Add API to kill connection to a particular host
range_streamer: Do not block topology change barriers around streaming
range_streamer, tablets: Do not keep token metadata around streaming
tablets: Fail gracefully when migrating tablet has no pending replica
storage_service, api: Add API to disable tablet balancing
storage_service, api: Add API to migrate a tablet
storage_service, raft topology: Run streaming under session topology guard
storage_service, tablets: Use session to guard tablet streaming
tablets: Add per-tablet session id field to tablet metadata
service: range_streamer: Propagate topology_guard to receivers
streaming: Always close the rpc::sink
storage_service: Introduce concept of a topology_guard
storage_service: Introduce session concept
tablets: Fix topology_metadata_guard holding on to the old erm
docs: Document the topology_guard mechanism
Snapshotting is not yet supported for those (see #13025), and
auto-snapshot would step on an internal error. Skip it and print a
warning to the logs instead.
Fixes #16078
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are three of them in this function -- the with_snapshot argument,
the auto_snapshot local copy of the db::config option, and the
should_snapshot local variable that's the && of the above two. The code
can go with just one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The following error was seen:
[shard 0] table - compaction_group_for_token: compaction_group idx=0 range=(minimum
token,-6917529027641081857] does not contain token=minimum token
Since minimum_token and maximum_token will never be inside a token
range, skip the token range containment check.
Fixes some more typos as found by a codespell run on the code. In this commit, more user-visible errors are fixed.
Refs: https://github.com/scylladb/scylladb/issues/16255
Closes scylladb/scylladb#16289
* github.com:scylladb/scylladb:
Update unified/build_unified.sh
Update main.cc
Update dist/common/scripts/scylla-housekeeping
Typos: fix typos in code
utils::fb_utilities is a global in-memory registry for storing and retrieving broadcast_address and broadcast_rpc_address.
As part of the effort to get rid of all global state, this series gets rid of fb_utilities.
This will eventually allow e.g. cql_test_env to instantiate multiple scylla server nodes, each serving on its own address.
Closes scylladb/scylladb#16250
* github.com:scylladb/scylladb:
treewide: get rid of now unused fb_utilities
tracing: use locator::topology rather than fb_utilities
streaming: use locator::topology rather than fb_utilities
raft: use locator::topology/messaging rather than fb_utilities
storage_service: use locator::topology rather than fb_utilities
storage_proxy: use locator::topology rather than fb_utilities
service_level_controller: use locator::topology rather than fb_utilities
misc_services: use locator::topology rather than fb_utilities
migration_manager: use messaging rather than fb_utilities
forward_service: use messaging rather than fb_utilities
messaging_service: accept broadcast_addr in config rather than via fb_utilities
messaging_service: move listen_address and port getters inline
test: manual: modernize message test
table: use gossiper rather than fb_utilities
repair: use locator::topology rather than fb_utilities
dht/range_streamer: use locator::topology rather than fb_utilities
db/view: use locator::topology rather than fb_utilities
database: use locator::topology rather than fb_utilities
db/system_keyspace: use topology via db rather than fb_utilities
db/system_keyspace: save_local_info: get broadcast addresses from caller
db/hints/manager: use locator::topology rather than fb_utilities
db/consistency_level: use locator::topology rather than fb_utilities
api: use locator::topology rather than fb_utilities
alternator: ttl: use locator::topology rather than fb_utilities
gossiper: use locator::topology rather than fb_utilities
gossiper: add get_this_endpoint_state_ptr
test: lib: cql_test_env: pass broadcast_address in cql_test_config
init: get_seeds_from_db_config: accept broadcast_address
locator: replication strategies: use locator::topology rather than fb_utilities
locator: topology: add helpers to retrieve this host_id and address
snitch: pass broadcast_address in snitch_config
snitch: add optional get_broadcast_address method
locator: ec2_multi_region_snitch: keep local public address as member
ec2_multi_region_snitch: reindent load_config
ec2_multi_region_snitch: coroutinize load_config
ec2_snitch: reindent load_config
ec2_snitch: coroutinize load_config
thrift: thrift_validation: use std::numeric_limits rather than fb_utilities
Since abort callbacks are fired synchronously, we must change the
table's erm before firing them, so that the callbacks obtain the new
erm.
Otherwise, we will block barriers.
When collected sstables are deleted, each is passed into
sstables_manager::delete_atomically(). For on-disk sstables this creates
a deletion log for each removed sstable, which is quite an overkill. The
atomic deletion callback already accepts a vector of shared sstables, so
it's simpler (and a bit faster) to remove them all in one batch.
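The effect of batching can be illustrated with a toy manager that counts deletion logs. The class and method names follow the description; the signatures are illustrative, not Scylla's actual API.

```cpp
#include <string>
#include <vector>

struct sstable {
    std::string name;
};

class sstables_manager {
    int _deletion_logs = 0;
public:
    // One call writes one deletion log for the whole batch, so passing
    // the collected vector at once replaces N logs with one.
    void delete_atomically(const std::vector<sstable>& ssts) {
        if (!ssts.empty()) {
            ++_deletion_logs;
        }
    }
    int deletion_logs() const { return _deletion_logs; }
};
```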
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
By "formatting" fix I mean -- remove the temporary on-stack references
that were left in for ease of patching.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The lambda in question was the struct pruner's method and was left there
for ease of patching. Now that this lambda is only called once,
inside the function it is declared in, it can be open-coded into the
place where it's called.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This allocation remained from the pre-coroutine times of the method. Now
the contents of the pruner -- a reference to the table, a vector, and a
replay_position -- can reside on the coroutine frame.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Using consistent cluster management and not using schema commitlog
ends with a bad configuration throw during bootstrap. Soon, we
will make consistent cluster management mandatory. This forces us
to also make schema commitlog mandatory, which we do in this patch.
A booting node decides to use schema commitlog if at least one of
the two statements below is true:
- the node has `force_schema_commitlog=true` config,
- the node knows that the cluster supports the `SCHEMA_COMMITLOG`
cluster feature.
The `SCHEMA_COMMITLOG` cluster feature has been added in version
5.1. This patch is supposed to be a part of version 6.0. We don't
support a direct upgrade from 5.1 to 6.0 because it skips two
versions - 5.2 and 5.4. So, in a supported upgrade we can assume
that the version which we upgrade from has schema commitlog. This
means that we don't need to check the `SCHEMA_COMMITLOG` feature
during an upgrade.
The reasoning above also applies to Scylla Enterprise. Version
2024.2 will be based on 6.0. Probably, we will only support
an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even
if we support an upgrade from 2023.x, this patch won't break
anything because 2023.1 is based on 5.2, which has schema
commitlog. Upgrades from 2022.x definitely won't be supported.
When we populate a new cluster, we can use the
`force_schema_commitlog=true` config to use schema commitlog
unconditionally. Then, the cluster feature check is irrelevant.
This check could fail because we initiate schema commitlog before
we learn about the features. The `force_schema_commitlog=true`
config is especially useful when we want to use consistent cluster
management. Failing feature checks would lead to crashes during
initial bootstraps. Moreover, there is no point in creating a new
cluster with `consistent_cluster_management=true` and
`force_schema_commitlog=false`. It would just cause some initial
bootstraps to fail, and after successful restarts, the result would
be the same as if we used `force_schema_commitlog=true` from the
start.
In conclusion, we can unconditionally use schema commitlog without
any checks in 6.0 because we can always safely upgrade a cluster
and start a new cluster.
Apart from making schema commitlog mandatory, this patch adds two
changes that are its consequences:
- making the unneeded `force_schema_commitlog` config unused,
- deprecating the `SCHEMA_COMMITLOG` feature, which is always
assumed to be true.
Closes scylladb/scylladb#16254
In the view update code, the function get_view_natural_endpoint()
determines which view replica this base replica should send an update
to. It currently gets the *view* table's replication map (i.e., the map
from view tokens to lists of replicas holding the token), but assumes
that this is also the *base* table's replication map.
This assumption was true with vnodes, but is no longer true with
tablets - the base table's replication map can be completely different
from the view table's. By looking at the wrong mapping,
get_view_natural_endpoint() can believe that this node isn't really
a base-replica and drop the view update. Alternatively, it can think
it is a base replica - but use the wrong base-view pairing and create
base-view inconsistencies.
This patch solves this bug - get_view_natural_endpoint() now gets two
separate replication maps - the base's and the view's. The callers
need to remember what the base table was (in some cases they didn't
care at the point of the call), and pass it to the function call.
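The fixed lookup can be sketched as a pairing over two independent replica lists. This is a simplification: the function name matches the commit, but the by-index pairing, the string endpoints, and the signature are illustrative stand-ins for the real token-based logic.

```cpp
#include <optional>
#include <string>
#include <vector>

// With tablets, the base and view tables can have entirely different
// replica sets, so the base replica list must be supplied separately
// instead of being derived from the view's replication map.
std::optional<std::string> view_natural_endpoint(
        const std::vector<std::string>& base_replicas,
        const std::vector<std::string>& view_replicas,
        const std::string& this_node) {
    for (size_t i = 0; i < base_replicas.size() && i < view_replicas.size(); ++i) {
        if (base_replicas[i] == this_node) {
            return view_replicas[i]; // this base replica pairs with view replica i
        }
    }
    return std::nullopt; // not a base replica: no view update to send
}
```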
This patch also includes a simple test that reproduces the bug, and
confirms it is fixed: The test has a 6-node cluster using tablets
and a base table with RF=1, and writes one row to it. Before this
patch, the code usually gets confused, thinking the base replica
isn't a replica and loses the view update. With this patch, the
view update works.
Fixes #16227.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16228
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>