When creating a schema for a non-CDC table in the schema_applier, find
its CDC schema that we created previously in the same operation, if any,
and create the schema with a pointer to the CDC schema.
We use the fact that for a base table with CDC enabled, its CDC schema
is created or altered together in the same group0 operation.
Similarly, in schema_tables, when creating table schemas from the
schema tables, first create all schemas that don't have CDC enabled,
then create schemas that have CDC enabled by extending them with the
pointer to the CDC schema that we created before.
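A minimal sketch of the two-pass ordering described above, not the actual schema_applier code. The `schema` struct, `create_schemas()` and the `_scylla_cdc_log` suffix rule are simplified stand-ins assumed for illustration only.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a table schema.
struct schema {
    std::string name;
    std::shared_ptr<const schema> cdc_schema; // null when the table has no CDC log
};
using schema_ptr = std::shared_ptr<const schema>;

// Naming rule assumed for this sketch only.
static const std::string cdc_log_suffix = "_scylla_cdc_log";

inline bool is_cdc_log_name(const std::string& name) {
    return name.size() > cdc_log_suffix.size()
        && name.compare(name.size() - cdc_log_suffix.size(),
                        cdc_log_suffix.size(), cdc_log_suffix) == 0;
}

inline std::map<std::string, schema_ptr>
create_schemas(const std::vector<std::string>& table_names) {
    std::map<std::string, schema_ptr> result;
    // Pass 1: CDC log schemas. They never point at another CDC schema.
    for (const auto& name : table_names) {
        if (is_cdc_log_name(name)) {
            result[name] = std::make_shared<const schema>(schema{name, nullptr});
        }
    }
    // Pass 2: base schemas, each attached to the CDC schema from pass 1, if any.
    for (const auto& name : table_names) {
        if (is_cdc_log_name(name)) {
            continue;
        }
        schema_ptr cdc;
        auto it = result.find(name + cdc_log_suffix);
        if (it != result.end()) {
            cdc = it->second;
        }
        assert(!cdc || is_cdc_log_name(cdc->name));
        result[name] = std::make_shared<const schema>(schema{name, cdc});
    }
    return result;
}
```

The order matters only because the base schema is immutable once built, so its CDC pointer must exist before the base schema is constructed.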
There are a few additional cases where we create schemas, and we need
to consider how to handle them.
When loading a schema from schema tables in the schema_loader we decide
not to set the CDC schema, because this schema is mostly used for tools
and it's not used for generating CDC mutations.
When transporting a schema by RPC in the migration manager, we don't
transport its CDC schema, and we always set it to null. Because we use
raft we expect this shouldn't have any effect, because the schema is
synchronized through raft and not through the RPC.
Previously, the schema applier had two maps of schema_mutations,
one for tables and one for views. Now create another map for CDC tables
by extracting them from the non-view tables map.
We maintain the previous behavior by applying each operation that's done
on the tables map, to the CDC map as well.
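The extraction could look roughly like the sketch below. `schema_mutations_stub`, the integer key and `for_each_table()` are hypothetical names for illustration; the real maps hold schema_mutations and are keyed differently.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for schema_mutations.
struct schema_mutations_stub {
    std::string table_name;
    bool is_cdc_log;
};

using mutations_map = std::map<int, schema_mutations_stub>; // keyed by a made-up table id

// Moves CDC log entries out of `tables` into a separate map.
inline mutations_map extract_cdc(mutations_map& tables) {
    mutations_map cdc;
    for (auto it = tables.begin(); it != tables.end(); ) {
        if (it->second.is_cdc_log) {
            cdc.emplace(it->first, std::move(it->second));
            it = tables.erase(it);
        } else {
            ++it;
        }
    }
    return cdc;
}

// Applies one operation to both maps, so behavior stays the same as when
// CDC tables still lived in the tables map.
template <typename Func>
void for_each_table(mutations_map& tables, mutations_map& cdc, Func f) {
    assert(&tables != &cdc);
    for (auto& kv : tables) {
        f(kv.second);
    }
    for (auto& kv : cdc) {
        f(kv.second);
    }
}
```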
Later we will want to handle CDC and non-CDC tables differently. We want
to be able to create all CDC schemas first, so when we create the
non-CDC tables we can create them with a pointer to their CDC schemas.
Add to the schema object a member that points to the CDC schema object
that is compatible with this schema, if any.
The compatible CDC schema is created and altered with its base schema in
the same group0 operation.
When generating CDC log mutations for some base mutation we want them to
be created using a compatible schema that has a CDC column corresponding
to each base column. This change will allow us to find the right CDC
schema given a base mutation.
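As a rough illustration of why the pointer helps: the log mutation is built with whatever CDC schema the base mutation's own schema version points to. All types below, and the rule that a log column shares the base column's name, are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct schema;
using schema_ptr = std::shared_ptr<const schema>;

// Hypothetical, simplified schema: a version, column names and the new member.
struct schema {
    int version;
    std::vector<std::string> columns;
    schema_ptr cdc_schema; // compatible CDC schema, or null
};

// Hypothetical single-cell mutation.
struct mutation {
    schema_ptr s;
    std::string column;
    std::string value;
};

// Builds the log mutation using the CDC schema attached to the base
// mutation's schema. Returns false when there is no CDC schema or when it
// lacks a column for the written base column.
inline bool make_log_mutation(const mutation& base, mutation& out) {
    assert(base.s);
    const schema_ptr& log_schema = base.s->cdc_schema;
    if (!log_schema) {
        return false;
    }
    const auto& cols = log_schema->columns;
    if (std::find(cols.begin(), cols.end(), base.column) == cols.end()) {
        return false;
    }
    out = mutation{log_schema, base.column, base.value};
    return true;
}
```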
We also update the relevant structures in the schema registry that are
related to learning about schemas and transporting schemas across
shards or nodes.
When transporting a schema as frozen_schema, we need to transport the
frozen cdc schema as well, and set it again when unfreezing and
reconstructing the schema.
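A simplified round trip, assuming a string stands in for the serialized schema mutations. `frozen_schema_with_cdc`, `freeze()` and `unfreeze()` here are illustrative stand-ins, not the registry's real types or signatures.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct schema;
using schema_ptr = std::shared_ptr<const schema>;

struct schema {
    std::string definition; // stands in for everything stored in schema mutations
    schema_ptr cdc_schema;  // not part of the schema mutations
};

// Serialized form; it carries no CDC pointer of its own.
struct frozen_schema {
    std::string bytes;
};

// Hypothetical carrier that transports the frozen CDC schema alongside.
struct frozen_schema_with_cdc {
    frozen_schema base;
    std::shared_ptr<const frozen_schema> cdc; // null when there is no CDC schema
};

inline frozen_schema_with_cdc freeze(const schema_ptr& s) {
    assert(s);
    frozen_schema_with_cdc out{frozen_schema{s->definition}, nullptr};
    if (s->cdc_schema) {
        out.cdc = std::make_shared<const frozen_schema>(
            frozen_schema{s->cdc_schema->definition});
    }
    return out;
}

// Rebuilds the CDC schema first, then the base schema pointing at it.
inline schema_ptr unfreeze(const frozen_schema_with_cdc& f) {
    schema_ptr cdc;
    if (f.cdc) {
        cdc = std::make_shared<const schema>(schema{f.cdc->bytes, nullptr});
    }
    return std::make_shared<const schema>(schema{f.base.bytes, cdc});
}
```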
When adding a schema to the registry, we need to ensure its CDC schema
is added to the registry as well.
Currently we always set the CDC schema to nullptr and maintain the
previous behavior. We will change it in a later commit. Until then, we
mark all places where CDC schema is passed clearly so we don't forget
it.
This commit starts a series of refactoring commits of the frozen_schema
to reduce duplication and make it easier to extend.
Currently there are two essentially identical types,
frozen_schema_with_base_info and view_schema_and_base_info in the
schema_registry that hold a frozen_schema together with a base_info for
view schemas.
Their role is to pass around a frozen schema together with additional
info that is extracted from the schema and passed around with it when
transporting it across shards or nodes, and is needed for
reconstructing it, and it is not part of the schema mutations.
Our goal is to combine them to a single type that we will call
extended_frozen_schema.
Previously, the mutable_token_metadata_ptr containing tablet changes
was replicated to all cores in the post_commit phase, which violated
the atomicity guarantee of schema_applier. Now it's incorporated into
the per-shard commit phase.
It uses service::schema_getter abstraction introduced in earlier
commit to inject "pending" schema which is not yet visible to the
whole system.
It never belonged to tables and views and its placement stems
from location of _tablet_hint handling code.
In the following commits we'll reference it in storage_service.cc.
It prepares pending_token_metadata to handle both new metadata and a
copy of existing metadata, for consistent usage in a later commit.
It also adds a shared_token_metadata getter so that we don't
need to get it from db.
This is a mechanical change which simplifies the code. The schema_applier
class is an object which holds intermediate schema merging state,
so it's fine that all schema merging functions have access to this state.
The latter is recommended in seastar, and the former was left as
compatibility alias. Latest seastar explicitly marks it as deprecated so
once the submodule is updated, compilation logs will explode.
Most of the patch is generated with
for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done
for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done
and a small manual change in test/perf/perf.hh
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#26136
As requested in #22120, moved the files and fixed other includes and build system.
Moved files:
- query.cc
- query-request.hh
- query-result.hh
- query-result-reader.hh
- query-result-set.cc
- query-result-set.hh
- query-result-writer.hh
- query_id.hh
- query_result_merger.hh
Fixes: #22120
This is a cleanup, no need to backport
Closes scylladb/scylladb#25105
As requested in #22114, moved the files and fixed other includes and build system.
Moved files:
- interval.hh
- map_difference.hh
Fixes: #22114
This is a cleanup, no need to backport
Closes scylladb/scylladb#25095
This change prepares the ground for state update unification for raft bound subsystems. It introduces schema_applier, which in the future will become a generic interface for applying mutations in raft.
Pulling database::apply() out of the schema merging code will allow batching changes to subsystems. Future generic code will first call prepare() on all implementations, then a single database::apply(), then update() on all implementations. Then, on each shard, it will call commit() for all implementations without preemption, so that the change is observed as atomic across all subsystems, and finally post_commit().
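The call order described above can be sketched as follows. This is a synchronous illustration only; `state_applier`, `recording_applier` and `apply_all()` are hypothetical names, and the real phases are asynchronous except for commit().

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic interface; one implementation per subsystem.
class state_applier {
public:
    virtual ~state_applier() = default;
    virtual void prepare() = 0;
    virtual void update() = 0;
    virtual void commit(unsigned shard) = 0; // must not yield in real code
    virtual void post_commit() = 0;
};

// Test implementation that only records which phase ran.
class recording_applier : public state_applier {
    std::string _name;
    std::vector<std::string>& _log;
public:
    recording_applier(std::string name, std::vector<std::string>& log)
        : _name(std::move(name)), _log(log) {}
    void prepare() override { _log.push_back(_name + ".prepare"); }
    void update() override { _log.push_back(_name + ".update"); }
    void commit(unsigned shard) override {
        _log.push_back(_name + ".commit@" + std::to_string(shard));
    }
    void post_commit() override { _log.push_back(_name + ".post_commit"); }
};

// Drives all implementations through the phases in the described order.
inline void apply_all(const std::vector<state_applier*>& impls,
                      unsigned shard_count,
                      std::vector<std::string>& log) {
    assert(shard_count > 0);
    for (auto* a : impls) { a->prepare(); }
    log.push_back("database::apply"); // single write of the batched mutations
    for (auto* a : impls) { a->update(); }
    for (unsigned shard = 0; shard < shard_count; ++shard) {
        // No preemption point between implementations on one shard.
        for (auto* a : impls) { a->commit(shard); }
    }
    for (auto* a : impls) { a->post_commit(); }
}
```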
Backport: no, it's a new feature
Fixes: https://github.com/scylladb/scylladb/issues/19649
Fixes https://github.com/scylladb/scylladb/issues/24531
Closes scylladb/scylladb#24886
[avi: adjust for std::vector<mutations> -> utils::chunked_vector<mutations>]
* github.com:scylladb/scylladb:
test: add type creation to test_snapshot
storage_service: always wake up load balancer on update tablet metadata
db: schema_applier: call destroy also when exception occurs
db: replica: simplify seeding ERM during schema change
db: remove cleanup from add_column_family
db: abort on exception during schema commit phase
db: make user defined types changes atomic
replica: db: make keyspace schema changes atomic
db: atomically apply changes to tables and views
replica: make truncate_table_on_all_shards get whole schema from table_shards
service: split update_tablet_metadata into two phases
service: pull out update_tablet_metadata from migration_listener
db: service: add store_service dependency to schema_applier
service: simplify load_tablet_metadata and update_tablet_metadata
db: don't perform move on tablet_hint reference
replica: split add_column_family_and_make_directory into steps
replica: db: split drop_table into steps
db: don't move map references in merge_tables_and_views()
db: introduce commit_on_shard function
db: access types during schema merge via special storage
replica: make non-preemptive keyspace create/update/delete functions public
replica: split update keyspace into two phases
replica: split creating keyspace into two functions
db: rename create_keyspace_from_schema_partition
db: decouple functions and aggregates schema change notification from merging code
db: store functions and aggregates change batch in schema_applier
db: decouple tables and views schema change notifications from merging code
db: store tables and views schema diff in schema_applier
db: decouple user type schema change notifications from types merging code
service: unify keyspace notification functions arguments
db: replica: decouple keyspace schema change notifications to a separate function
db: add class encapsulating schema merging
Since we abort now on failure during schema commit
there is no need for cleanup as it only manages in-memory
state.
Explicit cf.stop was added to code paths outside of schema
merging to avoid unnecessary regressions.
The same order of creation/destruction is preserved as in the
original code, looking from single shard point of view.
create_types() is called on each shard separately, while in theory
we should be able to reuse results similarly to diff_rows(). But we
don't introduce this optimization yet.
Now all keyspace related schema changes are observable
on a given shard as if they were applied atomically.
This is achieved by commit_on_shard() function being
non-preemptive (no futures, no co_awaits).
In the future we'll extend this to the whole schema
and also other subsystems.
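One way to picture a non-preemptive commit: all allocation and copying happens beforehand, and the commit itself is a plain swap. This is a sketch under that assumption; the real commit_on_shard() does not necessarily copy whole maps, and `shard_db`, `keyspace_metadata` and `prepare_keyspaces()` are made-up names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct keyspace_metadata {
    int replication_factor;
};
using keyspace_map = std::map<std::string, keyspace_metadata>;

// Hypothetical per-shard state visible to readers.
struct shard_db {
    keyspace_map keyspaces;
};

// Preparation step: may allocate (and, in real code, yield).
inline keyspace_map prepare_keyspaces(const shard_db& db,
                                      const keyspace_map& created_or_altered,
                                      const std::vector<std::string>& dropped) {
    keyspace_map next = db.keyspaces;
    for (const auto& kv : created_or_altered) {
        next[kv.first] = kv.second;
    }
    for (const auto& name : dropped) {
        next.erase(name);
    }
    return next;
}

// Commit step: no futures, no allocation, just a swap, so readers on this
// shard see either the old state or the new one, never a mix.
inline void commit_on_shard(shard_db& db, keyspace_map& prepared) noexcept {
    assert(&db.keyspaces != &prepared);
    db.keyspaces.swap(prepared);
}
```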
In this commit we make use of the split functions introduced before.
Pattern is as follows:
- in merge_tables_and_views we call some preparatory functions
- in schema_applier::update we call non-yielding step
- in schema_applier::post_commit we call cleanups and other finalizing async
functions
Additionally we introduce frozen_schema_diff because converting
schema_ptr to global_schema_ptr triggers schema registration and
with atomic changes we need to place registration only in commit
phase. Schema freezing is the same method global_schema_ptr uses
to transport schema across shards (via schema_registry cache).
It's not a good usage as there is only one non-empty implementation.
Also we need to change it further in the following commit which
makes it incompatible with listener code.
There is already implicit logical dependency via migration_notifier
but in the next commits we'll be moving store_service out from it
as we need better control (i.e. return a value from the call).
This is done so that actual dropping can be
an atomic step which could be composed with other
schema operations, and eventually all subsystems modified
via raft so that we could introduce atomic changes which
span across different subsystems.
We split drop_table_on_all_shards() into:
- prepare_tables_metadata_change_on_all_shards()
- prepare_drop_table_on_all_shards()
- drop_table()
- cleanup_drop_table_on_all_shards()
prepare_tables_metadata_change_on_all_shards() is necessary
because when applying multiple schema changes at once (e.g. drop
and add tables) we need to lock only once.
We add legacy_drop_table_on_all_shards() which
behaves exactly like old drop_table_on_all_shards() to be
compatible with code which doesn't need to play with atomicity.
Usages of legacy_drop_table_on_all_shards() in schema_applier
will be replaced with direct calls to split functions in the following
commits - that's the place we will take advantage of drop_table not
yielding (as it returns void now).
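The split can be pictured with the single-shard sketch below. The function names are shortened stand-ins for the *_on_all_shards() functions listed above, and `database_stub` and `table` are hypothetical types; only the shape of the steps is meant to match.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct table {
    std::string name;
    bool stopped = false;
};
using table_ptr = std::shared_ptr<table>;

// Hypothetical single-shard database state.
struct database_stub {
    std::map<std::string, table_ptr> tables;
    bool metadata_locked = false;
};

// Step 1: lock once, even when several tables change in the same batch.
inline void prepare_tables_metadata_change(database_stub& db) {
    assert(!db.metadata_locked);
    db.metadata_locked = true;
}

// Step 2: collect the tables to drop (may yield in real code).
inline std::vector<table_ptr> prepare_drop_tables(const database_stub& db,
                                                  const std::vector<std::string>& names) {
    std::vector<table_ptr> doomed;
    for (const auto& n : names) {
        auto it = db.tables.find(n);
        if (it != db.tables.end()) {
            doomed.push_back(it->second);
        }
    }
    return doomed;
}

// Step 3: the atomic part. Returns void and never yields.
inline void drop_table(database_stub& db, const table& t) noexcept {
    db.tables.erase(t.name);
}

// Step 4: asynchronous cleanup in real code; here it stops tables and unlocks.
inline void cleanup_drop_tables(database_stub& db, std::vector<table_ptr>& doomed) {
    for (auto& t : doomed) {
        t->stopped = true;
    }
    doomed.clear();
    db.metadata_locked = false;
}

// Legacy-style wrapper that runs all steps back to back.
inline void legacy_drop_tables(database_stub& db, const std::vector<std::string>& names) {
    prepare_tables_metadata_change(db);
    auto doomed = prepare_drop_tables(db, names);
    for (const auto& t : doomed) {
        drop_table(db, *t);
    }
    cleanup_drop_tables(db, doomed);
}
```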
This will be the place for all atomic schema switching
operations.
Note that atomicity is observed only from single shard
point of view. All shards may switch at slightly different times
as global locking for this is not feasible.
Once we create types atomically the code which is before commit
may depend on newly added types, so it has to access both old and
new types. New storage called in_progress_types_storage was added.
Merging types code now returns generic affected_types structure which
is used both for notifications and dropping types. New static
function drop_types() replaces dropping lambda used before.
I think neither dropping nor notifications need per-shard copies
(which are used both before and after this patch); they could take
string parameters or something similar. But that requires too many
changes in other classes, so it's out of scope here.
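A rough sketch of the overlay idea behind in_progress_types_storage: lookups see not-yet-committed types first and fall back to the committed ones. The class below is a hypothetical simplification and does not reflect the real storage interface.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct user_type {
    std::string name;
    int version;
};
using type_ptr = std::shared_ptr<const user_type>;
using type_map = std::map<std::string, type_ptr>;

// Hypothetical overlay: pending types shadow committed ones without
// modifying them.
class in_progress_types_sketch {
    const type_map& _committed;
    type_map _pending;
public:
    explicit in_progress_types_sketch(const type_map& committed)
        : _committed(committed) {}

    void add(type_ptr t) {
        assert(t);
        std::string key = t->name;
        _pending[key] = std::move(t);
    }

    // Returns the pending type if present, else the committed one, else null.
    type_ptr find(const std::string& name) const {
        auto p = _pending.find(name);
        if (p != _pending.end()) {
            return p->second;
        }
        auto c = _committed.find(name);
        if (c != _committed.end()) {
            return c->second;
        }
        return type_ptr();
    }
};
```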
In the following commits we want to separate the updating code from
committing the schema change (making it visible). Since notifications
should be issued after the change is visible, we need to separate them
and call them after committing.
In subsequent commits other notification types will be moved too.
Here we change the order of notification calls relative to the rest
of the schema updating code. Before, keyspace notifications were triggered
before tables were updated; after the change they trigger once
everything is updated. There is no indication that notification
listeners depend on the old behaviour.
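The new ordering can be sketched like this: the whole batch becomes visible first, and listeners are called afterwards. `db_stub`, `schema_change_batch`, `listener` and `merge_schema()` are hypothetical simplifications of the real merging and notifier code.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical visible state.
struct db_stub {
    std::set<std::string> keyspaces;
    std::set<std::string> tables;
};

// Hypothetical record of what one schema merge changes.
struct schema_change_batch {
    std::vector<std::string> created_keyspaces;
    std::vector<std::string> created_tables;
};

struct listener {
    std::function<void(const std::string&)> on_create_keyspace;
    std::function<void(const std::string&)> on_create_table;
};

// Makes every change in the batch visible.
inline void commit(db_stub& db, const schema_change_batch& b) {
    db.keyspaces.insert(b.created_keyspaces.begin(), b.created_keyspaces.end());
    db.tables.insert(b.created_tables.begin(), b.created_tables.end());
}

// Issues notifications; expected to run only after commit().
inline void notify(const schema_change_batch& b, const listener& l) {
    assert(l.on_create_keyspace && l.on_create_table);
    for (const auto& ks : b.created_keyspaces) {
        l.on_create_keyspace(ks);
    }
    for (const auto& t : b.created_tables) {
        l.on_create_table(t);
    }
}

inline void merge_schema(db_stub& db, const schema_change_batch& b, const listener& l) {
    commit(db, b);
    notify(b, l);
}
```

With this order, a keyspace listener can already observe tables created in the same batch.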
This commit doesn't yet change how schema merging
works but it prepares the ground for it.
We split merging code into several functions.
Main reasons for it are that:
- We want to generalize and create some interface
which each subsystem would use.
- We need to pull mutation's apply() out
of the code because raft will call it directly,
and it will contain a mix of mutations from more
than one subsystem. This is needed because we have
the need to update multiple subsystems atomically
(e.g. auth and schema during auto-grant when creating
a table).
In this commit do_merge_schema() code is split between
prepare(), update(), commit(), post_commit(). The idea
behind each of these phases is described in the comments.
The last 2 phases are not yet implemented as they require more
code changes, but adding the enclosing schema_applier class
will help to create some copied state in the future and
implement the commit() and post_commit() phases.
This reverts commit 0b516da95b, reversing
changes made to 30199552ac. It breaks
cluster.random_failures.test_random_failures.test_random_failures
in debug mode (at least).
Fixes #24513
Since we abort now on failure during schema commit
there is no need for cleanup as it only manages in-memory
state.
Explicit cf.stop was added to code paths outside of schema
merging to avoid unnecessary regressions.
The same order of creation/destruction is preserved as in the
original code, looking from single shard point of view.
create_types() is called on each shard separately, while in theory
we should be able to reuse results similarly to diff_rows(). But we
don't introduce this optimization yet.