scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 14:15:46 +00:00

Author	SHA1	Message	Date
Michael Litvak	9172cc172e	schema: add logstor cf property add a schema property for tables with logstor storage	2026-03-18 19:24:26 +01:00
Botond Dénes	475220b9c9	Merge 'Remove the rest of pre raft topology code' from Gleb Natapov Remove the rest of the code that assumes that either group0 does not exist yet or a cluster is still not upgraded to raft topology. Both of those are not supported any more. No need to backport since we remove functionality here. Closes scylladb/scylladb#28841 * github.com:scylladb/scylladb: service level: remove version 1 service level code features: move GROUP0_SCHEMA_VERSIONING to deprecated features list migration_manager: remove unused forward definitions test: remove unused code auth: drop auth_migration_listener since it does nothing now schema: drop schema_registry_entry::maybe_sync() function schema: drop make_table_deleting_mutations since it should not be needed with raft schema: remove calculate_schema_digest function schema: drop recalculate_schema_version function and its uses migration_manager: drop check for group0_schema_versioning feature cdc: drop usage of cdc_local table and v1 generation definition storage_service: no need to add yourself to the topology during reboot since raft state loading already did it storage_service: remove unused functions group0: drop with_raft() function from group0_guard since it always returns true now gossiper: do not gossip TOKENS and CDC_GENERATION_ID any more gossiper: drop tokens from loaded_endpoint_state gossiper: remove unused functions storage_service: do not pass loaded_peer_features to join_topology() storage_service: remove unused fields from replacement_info gossiper: drop is_safe_for_restart() function and its use storage_service: remove unused variables from join_topology gossiper: remove the code that was only used in gossiper topology storage_service: drop the check for raft mode from recovery code cdc: remove legacy code test: remove unused injection points auth: remove legacy auth mode and upgrade code treewide: remove schema pull code since we never pull schema any more raft topology: drop upgrade_state and 
its type from the topology state machine since it is not used any longer group0: hoist the checks for an illegal upgrade into main.cc api: drop get_topology_upgrade_state and always report upgrade status as done service_level_controller: drop service level upgrade code test: drop run_with_raft_recovery parameter to cql_test_env group0: get rid of group0_upgrade_state storage_service: drop topology_change_kind as it is no longer needed storage_service: drop check_ability_to_perform_topology_operation since no upgrades can happen any more service_storage: remove unused functions storage_service: remove non raft rebuild code storage_service: set topology change kind only once group0: drop in_recovery function and its uses group0: rename use_raft to maintenance_mode and make it sync	2026-03-11 10:24:20 +02:00
Dario Mirovic	f72081194c	db: use prefix tombstones in DROP TABLE schema mutations When dropping a table, make_drop_table_or_view_mutations() creates a point tombstone in system_schema.columns for every column in the table. The clustering key of system_schema.columns is (table_name, column_name). A clustering key with only the table_name component acts as a prefix tombstone. That tombstone covers all columns belonging to that table. This approach is already used by make_table_deleting_mutations() during CREATE TABLE. Apply the same prefix tombstone approach to DROP TABLE for the columns, view_virtual_columns, computed_columns, and dropped_columns schema tables. This reduces tombstone accumulation in schema table sstables. In test_max_cells test case, which repeatedly creates and drops a table with 32768 columns, overall test time improved from ~180s to ~157s, which is ~12.7% improvement. Refs SCYLLADB-815 Closes scylladb/scylladb#28976	2026-03-10 11:59:00 +01:00
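The difference between point and prefix tombstones described in f72081194c can be illustrated with a minimal sketch. It is not ScyllaDB code: `clustering_key`, `tombstone`, and both helper functions are hypothetical names, and the real clustering key and tombstone types are considerably richer.

```cpp
#include <optional>
#include <string>
#include <vector>

// Simplified clustering key of system_schema.columns: (table_name, column_name).
struct clustering_key {
    std::string table_name;
    std::string column_name;
};

// Hypothetical tombstone keyed by a clustering prefix. With column_name unset
// it acts as a prefix tombstone covering every row of the table; with it set
// it is a point tombstone covering exactly one row.
struct tombstone {
    std::string table_name;
    std::optional<std::string> column_name;

    bool covers(const clustering_key& key) const {
        if (key.table_name != table_name) {
            return false;
        }
        return !column_name || *column_name == key.column_name;
    }
};

// Previous approach: one point tombstone per column of the dropped table.
std::vector<tombstone> drop_with_point_tombstones(const std::string& table,
                                                  const std::vector<std::string>& columns) {
    std::vector<tombstone> result;
    for (const auto& c : columns) {
        result.push_back({table, c});
    }
    return result;
}

// Approach from the commit: a single prefix tombstone for the whole table.
std::vector<tombstone> drop_with_prefix_tombstone(const std::string& table) {
    return {tombstone{table, std::nullopt}};
}
```

For a table with 32768 columns, the first helper would produce 32768 entries and the second exactly one, which is the reduction in tombstone accumulation the commit message refers to.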
Gleb Natapov	b633ec1779	features: move GROUP0_SCHEMA_VERSIONING to deprecated features list	2026-03-10 10:46:48 +02:00
Gleb Natapov	b9f3281af6	schema: drop make_table_deleting_mutations since it should not be needed with raft Also remove the test since it is no longer relevant	2026-03-10 10:46:47 +02:00
Gleb Natapov	f76199e5c2	schema: remove calculate_schema_digest function It is used by the test only, so remove the test and its data as well.	2026-03-10 10:46:47 +02:00
Gleb Natapov	08e33ad7f7	schema: drop recalculate_schema_version function and its uses There is no need to recalculate schema version any more since it is set by group0.	2026-03-10 10:46:39 +02:00
Patryk Jędrzejczak	4c8dba15f1	Merge 'strong_consistency/state_machine: ensure and upgrade mutations schema' from Michał Jadwiszczak This patch fixes 2 issues within strong consistency state machine: - it might happen that apply is called before the schema is delivered to the node - on the other hand, the apply may be called after the schema was changed and purged from the schema registry The first problem is fixed by doing `group0.read_barrier()` before applying the mutations. The second one is solved by upgrading the mutations using column mappings in case the version of the mutations' schema is older. Fixes SCYLLADB-428 Strong consistency is in experimental phase, no need to backport. Closes scylladb/scylladb#28546 * https://github.com/scylladb/scylladb: test/cluster/test_strong_consistency: add reproducer for old schema during apply test/cluster/test_strong_consistency: add reproducer for missing schema during apply test/cluster/test_strong_consistency: extract common function raft_group_registry: allow to drop append entries requests for specific raft group strong_consistency/state_machine: find and hold schemas of applying mutations strong_consistency/state_machine: pull necessary dependencies db/schema_tables: add `get_column_mapping_if_exists()`	2026-03-09 09:49:22 +01:00
Michał Jadwiszczak	d25be9e389	db/schema_tables: add `get_column_mapping_if_exists()` In scenarios where we want to first check if a column mapping exists and we don't want to do flow control with exceptions, it is very wasteful to do ``` if (column_mapping_exists()) { get_column_mapping(); } ``` especially in a hot path like `state_machine::apply()` because this will execute 2 internal queries. This commit introduces the `get_column_mapping_if_exists()` function, which simply wraps the result of `get_column_mapping()` in an optional and doesn't throw an exception if the mapping doesn't exist.	2026-03-05 11:55:57 +01:00
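The cost difference described in d25be9e389 can be sketched with a toy store that counts lookups. This is an illustration under assumptions, not the actual `db/schema_tables` code: `mapping_store`, its `int` version key, and the `std::string` mapping are hypothetical, and the real functions are asynchronous.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the schema tables; it counts "internal queries".
struct mapping_store {
    std::map<int, std::string> mappings;  // schema version -> column mapping (simplified)
    int queries = 0;

    bool column_mapping_exists(int version) {
        ++queries;
        return mappings.count(version) > 0;
    }

    // Throws when the mapping is missing, forcing callers to either
    // pre-check (a second query) or use exceptions for flow control.
    std::string get_column_mapping(int version) {
        ++queries;
        auto it = mappings.find(version);
        if (it == mappings.end()) {
            throw std::runtime_error("no column mapping");
        }
        return it->second;
    }

    // Single query; absence is reported through an empty optional.
    std::optional<std::string> get_column_mapping_if_exists(int version) {
        ++queries;
        auto it = mappings.find(version);
        if (it == mappings.end()) {
            return std::nullopt;
        }
        return it->second;
    }
};
```

In this sketch the check-then-get pattern increments `queries` twice per lookup, while the optional-returning variant increments it once.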
Aleksandra Martyniuk	6b3b174704	db: schema: remove set_is_group0_table param set_is_group0_table takes an enabled flag, based on which it decides whether it's a group0 table. The method is called only with enabled = true. Drop the param. For non-group0 tables nothing should be set.	2026-03-04 17:24:34 +01:00
Nikos Dragazis	d5ec66bc0c	schema: Generalize static configurators into schema initializers Extend the `static_configurator` mechanism to support initialization of arbitrary schema properties, not only static ones, by passing a `schema_builder` reference to the configurator interface. As part of this change, rename `static_configurator` to `schema_initializer` to better reflect its broader responsibility. Add a checkpoint/restore mechanism to allow de-registering an initializer (useful for testing; will be used in the next patch). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-01-13 20:45:59 +02:00
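The initializer mechanism with checkpoint/restore from d5ec66bc0c might look roughly like the sketch below. All names and signatures here are assumptions made for illustration (`schema_initializer_registry`, `take_checkpoint`, `restore`, and the one-field `schema_builder`); the actual interface in the tree may differ.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified builder with a single property.
struct schema_builder {
    std::string comment;
};

// Hypothetical registry. Each initializer receives the builder by reference,
// so it can set any schema property, not only static ones.
class schema_initializer_registry {
    std::vector<std::function<void(schema_builder&)>> _initializers;
public:
    using checkpoint = std::size_t;

    void register_initializer(std::function<void(schema_builder&)> f) {
        _initializers.push_back(std::move(f));
    }

    // Remember how many initializers are registered right now.
    checkpoint take_checkpoint() const {
        return _initializers.size();
    }

    // De-register everything added after the checkpoint was taken.
    void restore(checkpoint cp) {
        if (cp < _initializers.size()) {
            _initializers.resize(cp);
        }
    }

    void apply_all(schema_builder& b) const {
        for (const auto& f : _initializers) {
            f(b);
        }
    }
};
```

A test could take a checkpoint, register a temporary initializer, build schemas, and then call `restore()` so later tests are unaffected.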
Tomasz Grabiec	d4014b7970	Drop legacy schema support We switched to using v3 schema tables (in system_schema keyspace) in 2017, in `9eb91bc30b`. So no system should have the old schema any more. No need to run legacy_schema_migrator on boot. Closes scylladb/scylladb#27420	2025-12-07 00:09:13 +02:00
Piotr Dulikowski	2e5eb92f21	Merge 'cdc: use CDC schema that is compatible with the base schema' from Michael Litvak When generating CDC log mutations for some base mutation, use a CDC schema that is compatible with the base schema. The compatible CDC schema has for every base column a corresponding CDC column with the same name. If using a non-compatible schema, we may encounter a situation, especially during ALTER, that we have a mutation with a base column set with some value, but the CDC schema doesn't have a column by that name. This would cause the user request to fail with an error. We add to the schema object a schema_ptr that for CDC-enabled tables points to the schema object of the CDC table that is compatible with the schema. It is set by the schema merge algorithm when creating the schema for a table that is created or altered. We use the fact that a base table and its CDC table are created and altered in the same group0 operation, and this way we can find and set the cdc schema for a base table. When transporting the base schema as a frozen schema between shards, we transport with it the frozen cdc schema as well. The patch starts with a series of refactoring commits that make extending the frozen schema easier and cleans up some duplication in the code about the frozen schema. We combine the two types `frozen_schema_with_base_info` and `view_schema_and_base_info` to a single type `extended_frozen_schema` that holds a frozen schema with additional data that is not part of the schema mutations but needs to be transported with it to unfreeze it - base_info, and the frozen cdc schema which is added in a later commit. 
Fixes https://github.com/scylladb/scylladb/issues/26405 backport not needed - enhancement Closes scylladb/scylladb#24960 * github.com:scylladb/scylladb: test: cdc: test cdc compatible schema cdc: use compatiable cdc schema db: schema_applier: create schema with pointer to CDC schema db: schema_applier: extract cdc tables schema: add pointer to CDC schema schema_registry: remove base_info from global_schema_ptr schema_registry: use extended_frozen_schema in schema load schema_registry: replace frozen_schema+base_info with extended_frozen_schema frozen_schema: extract info from schema_ptr in the constructor frozen_schema: rename frozen_schema_with_base_info to extended_frozen_schema	2025-11-13 10:11:54 +01:00
Botond Dénes	f3cec5f11a	Merge 'index: Set tombstone_gc when creating underlying view' from Dawid Mędrek Before this commit, when the underlying materialized view was created, it didn't have the property `tombstone_gc` set to any value. We fix the bug in this PR. Implementation strategy: 1. Move code responsible for producing the schema of a secondary index to the file that handles `CREATE INDEX`. 2. Set the property when creating the view. 3. Add reproducer tests. Fixes scylladb/scylladb#26542 Backport: we can discuss it. Closes scylladb/scylladb#26543 * github.com:scylladb/scylladb: index: Set tombstone_gc when creating secondary index index: Make `create_view_for_index` method of `create_index_statement` index: Move code for creating MV of secondary index to cql3 db, cql3: Move creation of underlying MV for index	2025-10-28 14:42:42 +02:00
Michael Litvak	6e2513c4d2	db: schema_applier: create schema with pointer to CDC schema When creating a schema for a non-CDC table in the schema_applier, find its CDC schema that we created previously in the same operation, if any, and create the schema with a pointer to the CDC schema. We use the fact that for a base table with CDC enabled, its CDC schema is created or altered together in the same group0 operation. Similarly, in schema_tables, when creating table schemas from the schema tables, first create all schemas that don't have CDC enabled, then create schemas that have CDC enabled by extending them with the pointer to the CDC schema that we created before. There are a few additional cases where we create schemas and need to decide how to handle them. When loading a schema from schema tables in the schema_loader we decide not to set the CDC schema, because this schema is mostly used for tools and it's not used for generating CDC mutations. When transporting a schema by RPC in the migration manager, we don't transport its CDC schema, and we always set it to null. Because we use raft we expect this shouldn't have any effect, because the schema is synchronized through raft and not through the RPC.	2025-10-21 14:13:43 +02:00
Michael Litvak	ac96e40f13	schema: add pointer to CDC schema Add to the schema object a member that points to the CDC schema object that is compatible with this schema, if any. The compatible CDC schema is created and altered with its base schema in the same group0 operation. When generating CDC log mutations for some base mutation we want them to be created using a compatible schema that has a CDC column corresponding to each base column. This change will allow us to find the right CDC schema given a base mutation. We also update the relevant structures in the schema registry that are related to learning about schemas and transporting schemas across shards or nodes. When transporting a schema as frozen_schema, we need to transport the frozen cdc schema as well, and set it again when unfreezing and reconstructing the schema. When adding a schema to the registry, we need to ensure its CDC schema is added to the registry as well. Currently we always set the CDC schema to nullptr and maintain the previous behavior. We will change it in a later commit. Until then, we mark all places where CDC schema is passed clearly so we don't forget it.	2025-10-21 14:13:43 +02:00
Tomasz Grabiec	ba692d1805	schema_tables: Keep "replication" column backwards-compatible by expanding rack lists to numeric RF In `380f243986` we added support for rack lists in replication options. Drivers which are not prepared to parse that (as of now, all of them), will not create metadata object for that keyspace. This breaks, for example, the "copy to/from" cqlsh command. Potentially other things too. To fix that, keep the "replication" column in the old format, and store numeric RF there, which corresponds to the number of replicas. Accurate options in the new format are put in "replication_v2". We set replication_v2 in the schema only when it differs from the old "replication" so that the new column is not set during upgrade, otherwise downgrade would fail. Partition tombstone is added to ensure that pre-alter replication_v2 value is deleted on alters which change replication to a value which is the same as the post-alter "replication" value. Fixes #26415 Closes scylladb/scylladb#26429	2025-10-21 09:11:25 +03:00
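The backwards-compatible "replication" column from ba692d1805 can be sketched as follows. This is a simplified illustration: the type aliases and both functions are hypothetical, the `class` entry of the real option map is ignored, and "differs from the old replication" is assumed here to mean that at least one value is a rack list.

```cpp
#include <map>
#include <string>
#include <variant>
#include <vector>

using rack_list = std::vector<std::string>;
using option_value = std::variant<std::string, rack_list>;
using options_map = std::map<std::string, option_value>;

// Build the old-format "replication" map: a rack list is replaced by a
// numeric RF equal to the number of racks, i.e. the number of replicas.
std::map<std::string, std::string> to_legacy_replication(const options_map& options) {
    std::map<std::string, std::string> legacy;
    for (const auto& [dc, value] : options) {
        if (const auto* racks = std::get_if<rack_list>(&value)) {
            legacy[dc] = std::to_string(racks->size());
        } else {
            legacy[dc] = std::get<std::string>(value);
        }
    }
    return legacy;
}

// Assumption: "replication_v2" only needs to be written when the legacy form
// loses information, which in this sketch means some value is a rack list.
bool needs_replication_v2(const options_map& options) {
    for (const auto& [dc, value] : options) {
        if (std::holds_alternative<rack_list>(value)) {
            return true;
        }
    }
    return false;
}
```

With `{'dc1': '3', 'dc2': ['rack1', 'rack2']}` the legacy map becomes `{'dc1': '3', 'dc2': '2'}`, which older drivers can parse, while the accurate rack lists would go to "replication_v2".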
Dawid Mędrek	20761b5f13	db, cql3: Move creation of underlying MV for index The main goal of this patch is to give more control over the creation of the underlying view on an index to `create_index_statement.cc`. That goal is in line with how the other statements are executed: the schema is built in the cql3 module and only the ready schema_ptr is passed further. That should also make the code cleaner and easier to understand. There are a few important things to note here: * A call to `service::prepare_new_view_announcement` appears out of nowhere. Aside from some validation checks and logging, that function does pretty much the same as the pre-existing code we remove: a. It creates Raft mutations based on the passed `view_ptr`. b. It creates Raft mutations responsible for view building tasks. c. It notifies about a new column family. * We seemingly get rid of the code that creates view building tasks. That's not true: we still do that via `service::prepare_new_view_announcement`. That should explain why the change doesn't remove any relevant logic. On the other hand, it might be more difficult to explain why moving the code is correct. I'll touch on it below. Before that, it may also be important to highlight that this commit only affects the logic responsible for creating an index. There should be no effect on any other part of how Scylla behaves. --- Proving the correctness of the solution would take quite a lot of space, so I'll only summarize it. It relies on a few things: 1. Two schema changes cannot happen in one operation. We allow for more but only when those changes are dependent on each other and when the additional ones are internal for Scylla, e.g. creating an index leads to creating the underlying materialized view. 2. There are no entities or components that rely on indexes. 3. Each index is uniquely defined by the keyspace it belongs to and the name of the index. 4. 
There is a bijection between rows in `system_schema.indexes` and the currently existing indexes. 5. The name of an unnamed index depends on the name of the base table and the names of the indexed columns. The name of an unnamed index may have a number attached to it, but that number only depends on the state of the schema at the time of creation of the index, and it never changes later on. There are no other things the name of an unnamed index depends on. 6. Scylla doesn't allow for changing any column in the base table that has an index depending on it. Based on that, we conclude that every existing index has exactly one entry in `system_schema.indexes`, and the primary key of that entry never changes. The columns of `system_schema.indexes` that are not part of the primary key are: `kind` and `options`. Both values are only decided at the time of creation of an index, and currently there's no way to modify them. That implies that there are only two events when an entry in the system table can change: when creating an index and when dropping an index. --- When we consider the previous place of the logic that this commit moves to `cql3/statements/create_index_statement.cc`, it works like this: 1. We compare the sets of indexes defined on a specific table (in the form of a structure called `index_metadata`) before and after an operation. 2. We divide the entries into three sets: those present in both sets and those present in only one of them. 3. We handle each of those three sets separately. The structure `index_metadata` is a reflection of entries in `system_schema.indexes`. It stores one more parameter -- `local` -- but its value depends on the other values of an entry, so we can ignore it in this reasoning. Because an index cannot be modified -- it can only be created or dropped -- there are at most two non-empty sets: the set of new indexes and the set of dropped indexes. 
Those sets are only non-empty during an operation like `CREATE INDEX`, `DROP INDEX`, `DROP TABLE (base table)`, `DROP KEYSPACE`. Note that it's impossible to drop an index by dropping the underlying materialized view -- Scylla doesn't allow for that. However, the code in `migration_manager.cc` we call (`prepare_column_family_update_announcement`) and the code that we call in `schema_tables.cc` (`make_update_table_mutations`) is only triggered by updates related to the base table. In the context of `DROP TABLE` or `DROP KEYSPACE`, we'd call `prepare_column_family_drop_announcement` instead. In other words, we're only concerned with `CREATE INDEX` and `DROP INDEX`. --- A conclusion from this reasoning is that we only need to consider those two situations when talking about correctness of this change. The impact of this commit is that we may have potentially reordered mutations in the resulting vector that will be applied to the Raft log. The only mutations we may have reordered are the mutations responsible for creating the underlying view and the mutations responsible for updating columns in the base table. It's clear then that this commit brings no change at all: we only give `cql3/statements/create_index_statement.cc` more control over creating the underlying view. --- We leave a remnant of the code in `db/schema_tables.cc` responsible for dropping an index along with its underlying view. It would require changing a bit more of the logic, and we don't need it for the rest of this sequence of changes. Refs scylladb/scylladb#16454	2025-10-20 14:04:06 +02:00
Tomasz Grabiec	c4a87453a2	Merge 'Add experimental feature flag for strongly consistent tables and extend kesypace creation syntax to allow specifying consistency mode.' from Gleb Natapov The series adds an experimental flag for strongly consistent tables and extends "CREATE KEYSPACE" ddl with `consistency` option that allows specifying the consistency mode for the keyspace. Closes scylladb/scylladb#26116 * github.com:scylladb/scylladb: schema: Allow configuring consistency setting for a keyspace db: experimental consistent-tablets option	2025-10-16 21:48:06 +02:00
Gleb Natapov	c255740989	schema: Allow configuring consistency setting for a keyspace We want to add strongly consistent tables as an option. We will have two kinds of strongly consistent tables: globally consistent and locally consistent. The former means that requests from all DCs will be globally linearisable, while the latter means that only requests to the same DC will be linearisable. To allow configuring all the possibilities, the patch adds a new parameter to a keyspace definition, "consistency", that can be configured to be `eventual`, `global` or `local`. A non-eventual setting is supported for tablets-enabled keyspaces only. Since we want to start with implementing local consistency, configuring global consistency will result in an error for now.	2025-10-16 13:34:49 +03:00
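The validation rules stated in c255740989 can be summarized in a small sketch. The enum, the function name `parse_consistency`, the `tablets_enabled` parameter, and the exception type are all assumptions for illustration; only the three option values and the two restrictions come from the commit message.

```cpp
#include <stdexcept>
#include <string>

enum class consistency_mode { eventual, global, local };

// Hypothetical validator for the keyspace "consistency" option.
consistency_mode parse_consistency(const std::string& value, bool tablets_enabled) {
    if (value == "eventual") {
        return consistency_mode::eventual;
    }
    if (value != "global" && value != "local") {
        throw std::invalid_argument("unknown consistency mode: " + value);
    }
    // Non-eventual modes are only supported for tablets-enabled keyspaces.
    if (!tablets_enabled) {
        throw std::invalid_argument("consistency '" + value + "' requires tablets");
    }
    // Global consistency is accepted by the syntax but rejected for now.
    if (value == "global") {
        throw std::invalid_argument("global consistency is not supported yet");
    }
    return consistency_mode::local;
}
```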
Marcin Maliszkiewicz	209563f478	db: remove unused proxy from create_keyspace_metadata	2025-10-14 10:56:25 +02:00
Tomasz Grabiec	66755db062	locator, cql3: Support rack lists in replication options Allows per-DC replication factor to be either a string, holding a numerical value, or a list of strings, holding a list of rack names. The rack list is not respected yet by the tablet allocator; this is achieved in a subsequent commit. This changes the format of options stored in the flattened map in system_schema.keyspaces#replication. Values which are rack lists are converted into multiple entries, with the list index appended to the key with ':' as the separator. For example, this extended map: { 'dc1': '3', 'dc2': ['rack1', 'rack2'] } is stored as a flattened map: { 'dc1': '3', 'dc2:0': 'rack1', 'dc2:1': 'rack2' } Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>	2025-10-02 19:42:39 +02:00
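The flattening rule from 66755db062 can be sketched in a few lines. The aliases and the `flatten` function are hypothetical names chosen for this example; only the `key:index` encoding is taken from the commit message.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <variant>
#include <vector>

using rack_list = std::vector<std::string>;
using option_value = std::variant<std::string, rack_list>;

// Flatten an extended option map: plain string values are copied as-is,
// rack lists become one entry per rack, keyed as "<key>:<index>".
std::map<std::string, std::string> flatten(const std::map<std::string, option_value>& options) {
    std::map<std::string, std::string> flat;
    for (const auto& [key, value] : options) {
        if (const auto* rf = std::get_if<std::string>(&value)) {
            flat[key] = *rf;
        } else {
            const auto& racks = std::get<rack_list>(value);
            for (std::size_t i = 0; i < racks.size(); ++i) {
                flat[key + ":" + std::to_string(i)] = racks[i];
            }
        }
    }
    return flat;
}
```

Applied to the example from the commit message, `{ 'dc1': '3', 'dc2': ['rack1', 'rack2'] }` flattens to the three entries `dc1`, `dc2:0` and `dc2:1`.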
Tomasz Grabiec	91e51a5dd1	cql3, locator: Use type aliases for option maps In preparation for changing their structure. 1) std::map<sstring, sstring> -> replication_strategy_config_options Parsed options. Values will become std::variant<sstring, rack_list> 2) std::map<sstring, sstring> -> property_definitions::map_type Flattened map of options, as stored system tables.	2025-10-01 16:06:51 +02:00
Benny Halevy	1ceb49f6c1	schema_tables: convert_schema_to_mutations: simplify check for system keyspace Currently, the function unfreezes each schema mutation partition and then checks if it's for a system keyspace. This isn't really needed since we can check the partition key using the frozen_mutation, skip it if the partition is for a system keyspace. Note that the constructed partition_key just copies the frozen partition_key_view, without copying or deserializing the actual key contents. Also, reserve `results` capacity using the queried partitions' size to prevent reallocations of the results vector. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-09-30 17:15:41 +03:00
Botond Dénes	86ed627fc4	compaction: move code to namespace compaction The namespace usage in this directory is very inconsistent, with files and classes scattered in: * global namespace * namespace compaction * namespace sstables In some cases, all three are used in the same file. This code used to live in sstables/ and some of it still retains namespace sstables as a heritage of that time. The mismatch between the dir (future module) and the namespace used is confusing, so finish the migration and move all code in compaction/ to namespace compaction too. This patch, although large, is mechanical and only the following kinds of changes are made: * replace namespace sstable {} with namespace compaction {} * add namespace compaction {} * drop/add sstables:: * drop/add compaction:: * move around forward-declarations so they are in the correct namespace context This refactoring revealed some awkward leftover coupling between sstables and compaction, in sstables/sstable_set.cc, where the make_sstable_set() methods of compaction strategies are implemented.	2025-09-25 15:03:56 +03:00
Pavel Emelyanov	a1ea553fe1	code: Replace distributed<> with sharded<> The latter is recommended in seastar, and the former was left as compatibility alias. Latest seastar explicitly marks it as deprecated so once the submodule is updated, compilation logs will explode. Most of the patch is generated with for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done and a small manual change in test/perf/perf.hh Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26136	2025-09-19 12:22:51 +02:00
Ernest Zaslavsky	a1f18a8883	treewide: Move schema related files to a `schema` directory As requested in #22111 , moved the files and fixed other includes and build system. Moved files: - frozen_schema.hh - frozen_schema.cc - schema_mutations.hh - schema_mutations.cc - column_computation.hh Fixes: #22111 Closes scylladb/scylladb#25089	2025-09-17 17:31:05 +03:00
Ernest Zaslavsky	d624413ddd	treewide: Move query related files to a new `query` directory As requested in #22120, moved the files and fixed other includes and build system. Moved files: - query.cc - query-request.hh - query-result.hh - query-result-reader.hh - query-result-set.cc - query-result-set.hh - query-result-writer.hh - query_id.hh - query_result_merger.hh Fixes: #22120 This is a cleanup, no need to backport Closes scylladb/scylladb#25105	2025-09-16 23:40:47 +03:00
Michał Jadwiszczak	6e3e287a39	db/schema_tables: create/cleanup tasks when an index is created/dropped Similarly as in previous commits, create view building tasks when an index is created and cleanup view building status when it's dropped.	2025-08-27 08:55:47 +02:00
Jan Łakomy	5fecad0ec8	cql3/statements: add `ANN OF` queries support to select statements Add parsing of `ANN OF` queries to the `select_statement` and `indexed_table_select_statement` classes. Add a placeholder for the implementation of external ANN queries. Rename `should_create_view` to `view_should_exist` as it is used not only to check if the view should be created but also if the view has been created. Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>	2025-08-01 12:08:50 +02:00
Ernest Zaslavsky	408aa289fe	treewide: Move misc files to `utils` directory As requested in #22114, moved the files and fixed other includes and build system. Moved files: - interval.hh - Map_difference.hh Fixes: #22114 This is a cleanup, no need to backport Closes scylladb/scylladb#25095	2025-07-21 11:56:40 +03:00
Nadav Har'El	04b263b51a	Merge 'vector_index: do not create a view when creating a vector index' from Michał Hudobski This PR adds a way for custom indexes to decide whether a view should be created for them, as for the vector_index the view is not needed, because we store it in the external service. To allow this, custom logic for describing indexes using custom classes was added (as it used to depend on the view corresponding to an index). Fixes: VECTOR-10 Closes scylladb/scylladb#24438 * github.com:scylladb/scylladb: custom_index: do not create view when creating a custom index custom_index: refactor describe for custom indexes custom_index: remove unneeded duplicate of a static string	2025-07-17 13:48:49 +03:00
Avi Kivity	6fce817aa8	Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz This change is preparing ground for state update unification for raft bound subsystems. It introduces schema_applier which in the future will become generic interface for applying mutations in raft. Pulling database::apply() out of schema merging code will allow to batch changes to subsystems. Future generic code will first call prepare() on all implementations, then single database::apply() and then update() on all implementations, then on each shard it will call commit() for all implementations, without preemption so that the change is observed as atomic across all subsystems, and then post_commit(). Backport: no, it's a new feature Fixes: https://github.com/scylladb/scylladb/issues/19649 Fixes https://github.com/scylladb/scylladb/issues/24531 Closes scylladb/scylladb#24886 [avi: adjust for std::vector<mutations> -> utils::chunked_vector<mutations>] * github.com:scylladb/scylladb: test: add type creation to test_snapshot storage_service: always wake up load balancer on update tablet metadata db: schema_applier: call destroy also when exception occurs db: replica: simplify seeding ERM during shema change db: remove cleanup from add_column_family db: abort on exception during schema commit phase db: make user defined types changes atomic replica: db: make keyspace schema changes atomic db: atomically apply changes to tables and views replica: make truncate_table_on_all_shards get whole schema from table_shards service: split update_tablet_metadata into two phases service: pull out update_tablet_metadata from migration_listener db: service: add store_service dependency to schema_applier service: simplify load_tablet_metadata and update_tablet_metadata db: don't perform move on tablet_hint reference replica: split add_column_family_and_make_directory into steps replica: db: split drop_table into steps db: don't move map references in merge_tables_and_views() db: 
introduce commit_on_shard function db: access types during schema merge via special storage replica: make non-preemptive keyspace create/update/delete functions public replica: split update keyspace into two phases replica: split creating keyspace into two functions db: rename create_keyspace_from_schema_partition db: decouple functions and aggregates schema change notification from merging code db: store functions and aggregates change batch in schema_applier db: decouple tables and views schema change notifications from merging code db: store tables and views schema diff in schema_applier db: decouple user type schema change notifications from types merging code service: unify keyspace notification functions arguments db: replica: decouple keyspace schema change notifications to a separate function db: add class encapsulating schema merging	2025-07-13 20:47:55 +03:00
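The phase ordering that 6fce817aa8 describes for the future generic code can be sketched synchronously as below. This is an assumption-laden illustration: the `applier` interface, `recording_applier`, and `apply_change` are hypothetical, the real code is asynchronous and sharded, and calling `post_commit()` once per implementation after all commits is a guess at the intended ordering.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-subsystem interface following the phases named in the merge.
struct applier {
    virtual ~applier() = default;
    virtual void prepare() = 0;      // read state, compute the diff
    virtual void update() = 0;       // build new in-memory state, not yet visible
    virtual void commit() = 0;       // publish the new state; must not preempt
    virtual void post_commit() = 0;  // notifications and cleanup
};

// Example implementation that only records the order of calls.
struct recording_applier : applier {
    std::string name;
    std::vector<std::string>& log;
    recording_applier(std::string n, std::vector<std::string>& l)
        : name(std::move(n)), log(l) {}
    void prepare() override { log.push_back(name + ":prepare"); }
    void update() override { log.push_back(name + ":update"); }
    void commit() override { log.push_back(name + ":commit"); }
    void post_commit() override { log.push_back(name + ":post_commit"); }
};

// Drive all implementations through the phases with one database apply.
void apply_change(const std::vector<applier*>& impls,
                  const std::function<void()>& database_apply,
                  unsigned shard_count) {
    for (auto* a : impls) { a->prepare(); }
    database_apply();  // single write of the mutations
    for (auto* a : impls) { a->update(); }
    for (unsigned shard = 0; shard < shard_count; ++shard) {
        // All commits on a shard run back to back, so the change is
        // observed atomically across subsystems on that shard.
        for (auto* a : impls) { a->commit(); }
    }
    for (auto* a : impls) { a->post_commit(); }
}
```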
Benny Halevy	3feb759943	everywhere: use utils::chunked_vector for list of mutations Currently, we use std::vector<*mutation> to keep a list of mutations for processing. This can lead to large allocation, e.g. when the vector size is a function of the number of tables. Use a chunked vector instead to prevent oversized allocations. `perf-simple-query --smp 1` results obtained for fixed 400MHz frequency and PGO disabled: Before (read path): ``` enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 89055.97 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39417 insns/op, 18003 cycles/op, 0 errors) 103372.72 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39380 insns/op, 17300 cycles/op, 0 errors) 98942.27 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39413 insns/op, 17336 cycles/op, 0 errors) 103752.93 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39407 insns/op, 17252 cycles/op, 0 errors) 102516.77 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39403 insns/op, 17288 cycles/op, 0 errors) throughput: mean= 99528.13 standard-deviation=6155.71 median= 102516.77 median-absolute-deviation=3844.59 maximum=103752.93 minimum=89055.97 instructions_per_op: mean= 39403.99 standard-deviation=14.25 median= 39406.75 median-absolute-deviation=9.30 maximum=39416.63 minimum=39380.39 cpu_cycles_per_op: mean= 17435.81 standard-deviation=318.24 median= 17300.40 median-absolute-deviation=147.59 maximum=18002.53 minimum=17251.75 ``` After (read path) ``` enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 
59755.04 tps ( 66.2 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39466 insns/op, 22834 cycles/op, 0 errors) 71854.16 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39417 insns/op, 17883 cycles/op, 0 errors) 82149.45 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39411 insns/op, 17409 cycles/op, 0 errors) 49640.04 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.3 tasks/op, 39474 insns/op, 19975 cycles/op, 0 errors) 54963.22 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.3 tasks/op, 39474 insns/op, 18235 cycles/op, 0 errors) throughput: mean= 63672.38 standard-deviation=13195.12 median= 59755.04 median-absolute-deviation=8709.16 maximum=82149.45 minimum=49640.04 instructions_per_op: mean= 39448.38 standard-deviation=31.60 median= 39466.17 median-absolute-deviation=25.75 maximum=39474.12 minimum=39411.42 cpu_cycles_per_op: mean= 19267.01 standard-deviation=2217.03 median= 18234.80 median-absolute-deviation=1384.25 maximum=22834.26 minimum=17408.67 ``` `perf-simple-query --smp 1 --write` results obtained for fixed 400MHz frequency and PGO disabled: Before (write path): ``` enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no} Disabling auto compaction 63736.96 tps ( 59.4 allocs/op, 16.4 logallocs/op, 14.3 tasks/op, 49667 insns/op, 19924 cycles/op, 0 errors) 64109.41 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 49992 insns/op, 20084 cycles/op, 0 errors) 56950.47 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50005 insns/op, 20501 cycles/op, 0 errors) 44858.42 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50014 insns/op, 21947 cycles/op, 0 errors) 28592.87 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50027 insns/op, 27659 cycles/op, 0 errors) throughput: mean= 51649.63 standard-deviation=15059.74 median= 56950.47 median-absolute-deviation=12087.33 maximum=64109.41 minimum=28592.87 instructions_per_op: mean= 49941.18 standard-deviation=153.76 median= 
50005.24 median-absolute-deviation=73.01 maximum=50027.07 minimum=49667.05 cpu_cycles_per_op: mean= 22023.01 standard-deviation=3249.92 median= 20500.74 median-absolute-deviation=1938.76 maximum=27658.75 minimum=19924.32 ``` After (write path) ``` enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no} Disabling auto compaction 53395.93 tps ( 59.4 allocs/op, 16.5 logallocs/op, 14.3 tasks/op, 50326 insns/op, 21252 cycles/op, 0 errors) 46527.83 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50704 insns/op, 21555 cycles/op, 0 errors) 55846.30 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50731 insns/op, 21060 cycles/op, 0 errors) 55669.30 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50735 insns/op, 21521 cycles/op, 0 errors) 52130.17 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50757 insns/op, 21334 cycles/op, 0 errors) throughput: mean= 52713.91 standard-deviation=3795.38 median= 53395.93 median-absolute-deviation=2955.40 maximum=55846.30 minimum=46527.83 instructions_per_op: mean= 50650.57 standard-deviation=182.46 median= 50731.38 median-absolute-deviation=84.09 maximum=50756.62 minimum=50325.87 cpu_cycles_per_op: mean= 21344.42 standard-deviation=202.86 median= 21334.00 median-absolute-deviation=176.37 maximum=21554.61 minimum=21060.24 ``` Fixes #24815 Improvement for rare corner cases. No backport required Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#24919	2025-07-13 19:13:11 +03:00
Marcin Maliszkiewicz	81c3dabe06	db: make user defined types changes atomic The same order of creation/destruction is preserved as in the original code, looking from a single shard's point of view. create_types() is called on each shard separately, while in theory we should be able to reuse results, similarly to diff_rows(). But we don't introduce this optimization yet.	2025-07-10 10:46:55 +02:00
Marcin Maliszkiewicz	2e69016c4f	db: access types during schema merge via special storage Once we create types atomically the code which is before commit may depend on newly added types, so it has to access both old and new types. New storage called in_progress_types_storage was added.	2025-07-10 10:40:42 +02:00
Marcin Maliszkiewicz	ec270b0b5e	db: rename create_keyspace_from_schema_partition It only creates keyspace metadata.	2025-07-10 10:40:42 +02:00
Michał Hudobski	919cca576f	custom_index: do not create view when creating a custom index Currently we create a view for every index, however for currently supported custom index classes (vector_index) that work is redundant, as we store the index in the external service. This patch adds a way for custom indexes to choose whether to create a view when creating the index and makes it so that for vector indexes the view is not created.	2025-07-07 13:47:07 +02:00
Avi Kivity	cd79a8fc25	Revert "Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz" This reverts commit `0b516da95b`, reversing changes made to `30199552ac`. It breaks cluster.random_failures.test_random_failures.test_random_failures in debug mode (at least). Fixes #24513	2025-06-16 22:38:12 +03:00
Marcin Maliszkiewicz	b3730282c3	db: access types during schema merge via special storage Once we create types atomically the code which is before commit may depend on newly added types, so it has to access both old and new types. New storage called in_progress_types_storage was added.	2025-06-06 08:50:33 +02:00
Marcin Maliszkiewicz	aceb1f9659	db: rename create_keyspace_from_schema_partition It only creates keyspace metadata.	2025-05-27 20:00:58 +02:00
Avi Kivity	f195c05b0d	untyped_result_set: mark get_blob() as returning unfragmented data Blobs can be large, and unfragmented blobs can easily exceed 128k (as seen in #23903). Rename get_blob() to get_blob_unfragmented() to warn users. Note that most uses are fine as the blobs are really short strings. Closes scylladb/scylladb#24102	2025-05-26 09:40:34 +02:00
Wojciech Mitros	05fce91945	schema_registry: store base info instead of base schema for view entries In the following patch we plan to remove the base schema from the base_info to make the base_info immutable. To do that, we first prepare the schema registry for the change; we need to be able to create view schemas from frozen schemas there and frozen schemas have no information about the base table. Unless we do this change, after base schemas are removed from the base info, we'll no longer be able to load a view schema to the schema registry without looking up the base schema in the database. This change also required some updates to schema building: * we add a method for unfreezing a view schema with base info instead of a base schema * we make it possible to use schema_builder with a base info instead of a base schema * we add a method for creating a view schema from mutations with a base info instead of a base schema * we add a view_info constructor with a base info instead of a base schema * we update the naming in schema_registry to reflect the usage of base info instead of base schema	2025-04-24 01:08:39 +02:00
Wojciech Mitros	900687c818	view_info: set base info on construction Currently, the base_info may or may not be set in view schemas. Even when it's set, it may be modified. This necessitates extra checks when handling view schemas, as well as potentially causing errors when we forget to set it at some point. Instead, we want to make the base info an immutable member of view schemas (inside view_info). The first step towards that is making sure that all newly created schemas have the base info set. We achieve that by requiring a base schema when constructing a view schema. Unfortunately, this adds complexity each time we're making a view schema - we need to get the base schema as well. In most cases, the base schema is already available. The most problematic scenario is when we create a schema from mutations: - when parsing system tables we can get the schema from the database, as regular tables are parsed before views - when loading a view schema using the schema loader tool, we need to load the base additionally to the view schema, effectively doubling the work - when pulling the schema from another node - in this case we can only get the current version of the base schema from the local database Additionally, we need to consider the base schema version - when we generate view updates the version of the base schema used for reads should match the version of the base schema in view's base info. This is achieved by selecting the correct (old or new) schema in `db::schema_tables::merge_tables_and_views` and using the stored base schema in the schema_registry.	2025-04-24 01:08:39 +02:00
Avi Kivity	a62ab824e6	schema: deprecate schema_extension schema_extension allows making invisible changes to system_schema that evade upgrade rollback tests. They appear in system_schema as an encoded blob which reduces serviceability, as they cannot be read. Deprecate it and point users to adding explicit columns in scylla_tables. We could probably make use of the data structure, after we teach it to encode its payload into proper named and typed columns instead of using IDL. Closes scylladb/scylladb#23151	2025-03-19 20:36:16 +02:00
Pavel Emelyanov	0f9cc956f4	schema_tables: Remove all_table_names() Now it's unused. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-03-10 13:12:56 +03:00
Pavel Emelyanov	5a897d7368	schema_tables,client_state: Switch to using all_table_infos() There are a few more places left that can use all_table_infos() as a replacement for all_table_names(), patch them. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-03-10 13:05:59 +03:00
Pavel Emelyanov	da05765746	schema_tables: Tune up some methods to benefit from table_infos There are convert_schema_to_mutations() and calculate_schema_digest() that collect table names and then use them to find schema and query mutations from the table. Both can use the newly introduced all_table_infos() and use the returned table_id-s to do the same, thus avoiding re-lookups (which are fast anyway, but still). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-03-10 13:01:50 +03:00
Pavel Emelyanov	d7bfa5a545	schema_tables: Introduce all_table_infos() This method is like all_table_names(), but returns a vector of table_info-s which is effectively a pair of string name and uuid id. To be used later, and the string-returning all_table_names() will be removed very soon too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-03-10 12:59:03 +03:00
Kefu Chai	aa8c27b872	db: prevent accidental copies of result_set_row by making it move-only result_set_row is a heavyweight object containing multiple cell types: regular columns, partition keys, and static values. To prevent expensive accidental copies, delete the copy constructor and replace it with: 1. A move constructor for efficient vector reallocation 2. An explicit copy() method when copies are actually needed This change reduces overhead in some non-hot paths by eliminating implicit deep copies. Please note, previously, in `create_view_from_mutation()`, we kept a copy of `result_set_row`, and then reused `table_rs` for holding the mutation for `scylla_tables`. Because we don't copy the `result_set_row` in this change, in order to avoid invalidating the `row` after reusing `table_rs` in the outer scope, we define a new `table_rs` shadowing the one in the outer scope. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22741	2025-02-17 09:48:08 +02:00

593 Commits