By default, cluster tests have skip_wait_for_gossip_to_settle=0 and
ring_delay_ms=0. In tests with gossip topology, it may lead to a race,
where nodes see different state of each other.
In case of test_auth_v2_migration, there are three nodes. If the first
node already knows that the third node is NORMAL, and the second node
does not, the system_auth tables can return incomplete results.
To avoid such a race, this commit adds a check that all nodes see other
nodes as NORMAL before any writes are done.
Refs: #24163Closesscylladb/scylladb#24185
(cherry picked from commit 555d897a15)
Closesscylladb/scylladb#24519
The test test_read_repair_with_trace_logging wants to test read repair with trace logging. Turns out that node restart + trace-level logging + debug mode is too much and even with 1 minute timeout, the read repair times out sometimes. Refactor the test to use injection point instead of restart. To make sure the test still tests what it supposed to test, use tracing to assert that read repair did indeed happen.
Fixes: scylladb/scylladb#23968
Needs backport to 2025.1 and 6.2, both have the flaky test
- (cherry picked from commit 51025de755)
- (cherry picked from commit 29eedaa0e5)
Parent PR: #23989Closesscylladb/scylladb#24049
* github.com:scylladb/scylladb:
test/cluster/test_read_repair.py: improve trace logging test (again)
test/cluster: extract execute_with_tracing() into pylib/util.py
In ScyllaDB, schema modification operations use "optimistic locking": A schema operation reads the current schema, decides what it wants to do and prepares changes to the schema, and then attempts to commit those changes - but only if the schema hasn't changed since the first read. If the schema has already been changed by some other node - we need to try again. In a loop.
In Alternator, there are six operations that perform schema modification: CreateTable, DeleteTable, UpdateTable, TagResource, UntagResource and UpdateTimeToLive. All of them were missing this loop. We knew about this - and even had FIXME in all places. So all these operations, when facing contention of concurrent schema modifications on different nodes may fail one of these operations with an error like:
Internal server error: service::group0_concurrent_modification
(Failed to apply group 0 change due to concurrent modification).
This problem had very minor effect, if any, on real users because the DynamoDB SDK automatically retries operations that fail with retryable errors - like this "Internal server error" - and most likely the schema operation will succeed upon retry. However, as shown in issue #13152 these failures were annoying in our CI, where tests - which disable request retries - failed on these errors.
This patch fixes all six operations (the last three operations all use one common function, db::modify_tags(), so are fixed by one change) to add the missing loop.
The patch also includes reproducing tests for all these operations - the new tests all fail before this patch, and pass with it.
These new tests are much more reliable reproducers than the dtests we had that only sometimes - very rarely - reproduced the problem. Moreover, the new tests reproduces the bug seperately for each of the six operations, so if we forget to fix one of the six operations, one of the tests would have continued to fail. Of course I checked this during development.
The new tests are in the test/cluster framework, not test/alternator, because this problem can only be reproduced in a multi-node cluster: On a single node, it serializes its schema modifications on its own; The collisions only happen when more than one node attempts schema modifications at the same time.
Fixes#13152
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#23827
(cherry picked from commit 3ce7e250cc)
Closesscylladb/scylladb#24494
* github.com:scylladb/scylladb:
alternator: fix indentation
alternator: fix schema "concurrent modification" errors
This PR adjusts existing Boost tests so they respect the invariant
introduced by enabling `rf_rack_valid_keyspaces` configuration option.
We disable it explicitly in more problematic tests. After that, we
enable the option by default in the whole test suite.
Fixes scylladb/scylladb#23958
Backport: backporting to 2025.1 to be able to test the implementation there too.
- (cherry picked from commit 6e2fb79152)
- (cherry picked from commit e4e3b9c3a1)
- (cherry picked from commit 1199c68bac)
- (cherry picked from commit cd615c3ef7)
- (cherry picked from commit fa62f68a57)
- (cherry picked from commit 22d6c7e702)
- (cherry picked from commit 237638f4d3)
- (cherry picked from commit c60035cbf6)
Parent PR: #23802Closesscylladb/scylladb#24367
* github.com:scylladb/scylladb:
test/lib/cql_test_env.cc: Enable rf_rack_valid_keyspaces by default
test/boost/tablets_test.cc: Explicitly disable rf_rack_valid_keyspaces in problematic tests
test/boost/tablets_test.cc: Fix indentation in test_load_balancing_with_random_load
test/boost/tablets_test.cc: Adjust test_load_balancing_with_random_load to RF-rack-validity
test/boost/tablets_test.cc: Adjust test_load_balancing_works_with_in_progress_transitions to RF-rack-validity
test/boost/tablets_test.cc: Adjust test_load_balancing_resize_requests to RF-rack-validity
test/boost/tablets_test.cc: Adjust test_load_balancing_with_two_empty_nodes to RF-rack-validity
test/boost/tablets_test.cc: Adjust test_load_balancer_shuffle_mode to RF-rack-validity
There are two semaphores in table for synchronizing changes to sstable list:
sstable_set_mutation_sem: used to serialize two concurrent operations updating
the list, to prevent them from racing with each other.
sstable_deletion_sem: A deletion guard, used to serialize deletion and
iteration over the list, to prevent iteration from finding deleted files on
disk.
they're always taken in this order to avoid deadlocks:
sstable_set_mutation_sem -> sstable_deletion_sem.
problem:
A = tablet cleanup
B = take_snapshot()
1) A acquires sstable_set_mutation_sem for updating list
2) A acquires sstable_deletion_sem, then delete sstable before updating list
3) A releases sstable_deletion_sem, then yield
4) B acquires sstable_deletion_sem
5) B iterates through list and bumps sstable deleted in step 2
6) B fails since it cannot find the file on disk
Initial reaction is to say that no procedure must delete sstable before
updating the list, that's true.
But we want a iteration, running concurrently to cleanup, to not find sstables
being removed from the system. Otherwise, e.g. snapshot works with sstables
of a tablet that was just cleaned up. That's achieved by serializing iteration
with list update.
Since sstable_deletion_sem is used within the scope of deletion only, it's
useless for achieving this. Cleanup could acquire the deletion sem when
preparing list updates, and then pass the "permit" to deletion function, but
then sstable_deletion_sem would essentially become sstable_set_mutation_sem,
which was created exactly to protect the list update.
That being said, it makes sense to merge both semaphores. Also things become
easier to reason about, and we don't have to worry about deadlocks anymore.
The deletion goes through sstable_list_builder, which holds a permit throughout
its lifetime, which guarantees that list updates and deletion are atomic to
other concurrent operations. The interface becomes less error prone with that.
It allowed us to find discard_sstables() was doing deletion without any permit,
meaning another race could happen between truncate and snapshot.
So we're fixing race of (truncate|cleanup) with take_snapshot, as far as we
know. It's possible another unknown races are fixed as well.
Fixes#23049.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#23117
(cherry picked from commit fedd838b9d)
Closesscylladb/scylladb#23279
In ScyllaDB, schema modification operations use "optimistic locking":
A schema operation reads the current schema, decides what it wants to do
and prepares changes to the schema, and then attempts to commit those
changes - but only if the schema hasn't changed since the first read.
If the schema has already been changed by some other node - we need to
try again. In a loop.
In Alternator, there are six operations that perform schema modification:
CreateTable, DeleteTable, UpdateTable, TagResource, UntagResource and
UpdateTimeToLive. All of them were missing this loop. We knew about
this - and even had FIXME in all places. So all these operations,
when facing contention of concurrent schema modifications on different
nodes may fail one of these operations with an error like:
Internal server error: service::group0_concurrent_modification
(Failed to apply group 0 change due to concurrent modification).
This problem had very minor effect, if any, on real users because the
DynamoDB SDK automatically retries operations that fail with retryable
errors - like this "Internal server error" - and most likely the schema
operation will succeed upon retry. However, as shown in issue #13152
these failures were annoying in our CI, where tests - which disable
request retries - failed on these errors.
This patch fixes all six operations (the last three operations all
use one common function, db::modify_tags(), so are fixed by one
change) to add the missing loop.
The patch also includes reproducing tests for all these operations -
the new tests all fail before this patch, and pass with it.
These new tests are much more reliable reproducers than the dtests
we had that only sometimes - very rarely - reproduced the problem.
Moreover, the new tests reproduces the bug seperately for each of the
six operations, so if we forget to fix one of the six operations, one
of the tests would have continued to fail. Of course I checked this
during development.
The new tests are in the test/cluster framework, not test/alternator,
because this problem can only be reproduced in a multi-node cluster:
On a single node, it serializes its schema modifications on its own;
The collisions only happen when more than one node attempts schema
modifications at the same time.
Fixes#13152
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#23827
(cherry picked from commit 3ce7e250cc)
We've adjusted all of the Boost tests so they respect the invariant
enforced by the `rf_rack_valid_keyspaces` configuration option, or
explicitly disabled the option in those that turned out to be more
problematic and will require more attention. Thanks to that, we can
now enable it by default in the test suite.
(cherry picked from commit c60035cbf6)
Some of the tests in the file verify more subtle parts of the behavior
of tablets and rely on topology layouts or using keyspaces that violate
the invariant the `rf_rack_valid_keyspaces` configuration option is
trying to enforce. Because of that, we explicitly disable the option
to be able to enable it by default in the rest of the test suite in
the following commit.
(cherry picked from commit 237638f4d3)
We make sure that the keyspaces created in the test are always RF-rack-valid.
To achieve that, we change how the test is performed.
Before this commit, we first created a cluster and then ran the actual test
logic multiple times. Each of those test cases created a keyspace with a random
replication factor.
That cannot work with `rf_rack_valid_keyspaces` set to true. We cannot modify
the property file of a node (see commit: eb5b52f598),
so once we set up the cluster, we cannot adjust its layout to work with another
replication factor.
To solve that issue, we also recreate the cluster in each test case. Now we choose
the replication factor at random, create a cluster distributing nodes across as many
racks as RF, and perform the rest of the logic. We perform it multiple times in
a loop so that the test behaves as before these changes.
(cherry picked from commit fa62f68a57)
We distribute the nodes used in the test across two racks so we can
run the test with `rf_rack_valid_keyspaces` set to true.
We want to avoid cross-rack migrations and keep the test as realistic
as possible. Since host3 is supposed to function as a new node in the
cluster, we change the layout of it: now, host1 has 2 shards and resides
in a separate rack. Most of the remaining test logic is preserved and behaves
as before this commit.
There is a slight difference in the tablet migrations. Before the commit,
we were migrating a tablet between nodes of different shard counts. Now
it's impossible because it would force us to migrate tablets between racks.
However, since the test wants to simply verify that an ongoing migration
doesn't interfere with load balancing and still leads to a perfect balance,
that still happens: we explicitly migrate ONLY 1 tablet from host2 to host3,
so to achieve the goal, one more tablet needs to be migrated, and we test
that.
(cherry picked from commit cd615c3ef7)
We assign the nodes created by the test to separate racks. It has no impact
on the test since the keyspace used in the test uses RF=2, so the tablet
replicas will still be the same.
(cherry picked from commit 1199c68bac)
We distribute the nodes used in the test between two racks. Although
that may affect how tablets behave in general, this change will not
have any real impact on the test. The test verifies that load balancing
eventually balances tablets in the cluster, which will still happen.
Because of that, the changes in this commit are safe to apply.
(cherry picked from commit e4e3b9c3a1)
We distribute the nodes used in the test between two racks. Although that
may have an impact on how tablets behave, it's orthogonal to what the test
verifies -- whether the topology coordinator is continuously in the tablet
migration track. Because of that, it's safe to make this change without
influencing the test.
(cherry picked from commit 6e2fb79152)
The "tags" mechanism in Alternator is a convenient way to attach metadata
to Alternator tables. Recently we have started using it more and more for
internal metadata storage:
* UpdateTimeToLive stores the attribute in a tag system:ttl_attribute
* CreateTable stores provisioned throughput in tags
system:provisioned_rcu and system:provisioned_wcu
* CreateTable stores the table's creation time in a tag called
system:table_creation_time.
We do not want any of these internal tags to be visible to a
ListTagsOfResource request, because if they are visible (as before this
patch), systems such as Terraform can get confused when they suddenly
see a tag which they didn't set - and may even attempt to delete it
(as reported in issue #24098).
Moreover, we don't want any of these internal tags to be writable
with TagResource or UntagResource: If a user wants to change the TTL
setting they should do it via UpdateTimeToLive - not by writing
directly to tags.
So in this patch we forbid read or write to *any* tag that begins
with the "system:" prefix, except one: "system:write_isolation".
That tag is deliberately intended to be writable by the user, as
a configuration mechanism, and is never created internally by
Scylla. We should have perhaps chosen a different prefix for
configurable vs. internal tags, or chosen more unique prefixes -
but let's not change these historic names now.
This patch also adds regression tests for the internal tags features,
failing before this patch and passing after:
1. internal tags, specifically system:ttl_attribute, are not visible
in ListTagsOfResource, and cannot be modified by TagResource or
UntagResource.
2. system:write_isolation is not internal, and be written by either
TagResource or UntagResource, and read with ListTagsOfResource.
This patch also fixes a bug in the test where we added more checks
for system:write_isolation - test_tag_resource_write_isolation_values.
This test forgot to remove the system:write_isolation tags from
test_table when it ended, which would lead to other tests that run
later to run with a non-default write isolation - something which we
never intended.
Fixes#24098.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#24299
(cherry picked from commit 6cbcabd100)
Closesscylladb/scylladb#24376
The test test_multiple_unpublished_cdc_generations reads the CDC
generation timestamps to verify they are published in the correct order.
To do so it issues reads in a loop with a short sleep period and checks
the differences between consecutive reads, assuming they are monotonic.
However the assumption that the reads are monotonic is not valid,
because the reads are issued with consistency_level=ONE, thus we may read
timestamps {A,B} from some node, then read timestamps {A} from another
node that didn't apply the write of the new timestamp B yet. This will
trigger the assert in the test and fail.
To ensure the reads are monotonic we change the test to use consistency
level ALL for the reads.
Fixesscylladb/scylladb#24262Closesscylladb/scylladb#24272
(cherry picked from commit 3a1be33143)
Closesscylladb/scylladb#24333
This series adds support for WCU tracking in batch_write_item and tests it.
The patches include:
Switch the metrics (RCU and WCU) to count units vs half-units as they were, to make the metrics clearer for users.
Adding a public static get_half_units function to wcu_consumed_capacity_counter for use by batch write item, which cannot directly use the counter object.
Adding WCU calculation support to batch_write_item, based on item size for puts and a fixed 1 WCU for deletes. WCU metrics are updated, and consumed capacity is returned per table when requested.
The return handling was refactored to be coroutine-like for easier management of the consumed capacity array.
Adding tests that validate WCU calculation for batch put requests on a single table and across multiple tables, ensuring delete operations are counted correctly.
Adding a test that validates that WCU metrics are updated correctly during batch write item operations, ensuring the WCU of each item is calculated independently.
Need backport, WCU is partially supported, and is missing from batch_write_item
Fixes https://github.com/scylladb/scylladb/issues/23940
(cherry picked from commit 5ae11746fa)
(cherry picked from commit f2ade71f4f)
(cherry picked from commit 68db77643f)
(cherry picked from commit 14570f1bb5)
(cherry picked from commit 2ab99d7a07)
Parent PR: https://github.com/scylladb/scylladb/pull/23941
Replaces #24028Closesscylladb/scylladb#24034
* github.com:scylladb/scylladb:
alternator/test_metrics.py: batch_write validate WCU
alternator/test_returnconsumedcapacity.py: Add tests for batch write WCU
alternator/executor: add WCU for batch_write_items
alternator/consumed_capacity: make wcu get_units public
Alternator: Change the WCU/RCU to use units
Currently, the base_info may or may not be set in view schemas.
Even when it's set, it may be modified. This necessitates extra
checks when handling view schemas, as we'll as potentially causing
errors when we forget to set it at some point.
Instead, we want to make the base info an immutable member of view
schemas (inside view_info). To achieve this, in this series we remove
all base_info members that can change due to a base schema update,
and we calculate the remaining values during view update generation,
using the most up-to-date base schema version.
To calculate the values that depend on the base schema version, we
need to iterate over the view primary key and find the corresponding
columns, which adds extra overhead for each batch of view updates.
However, this overhead should be relatively small, as when creating
a view update, we need to prepare each of its columns anyway. And
if we need to read the old value of the base row, the relative
overhead is even lower.
After this change, the base info in view schemas stays the same
for all base schema updates, so we'll no longer get issues with
base_info being incompatible with a base schema version. Additionally,
it's a step towards making the schema objects immutable, which
we sometimes incorrectly assumed in the past (they're still not
completely immutable yet, as some other fields in view_info other
than base_info are initialized lazily and may depend on the base
schema version).
Fixes https://github.com/scylladb/scylladb/issues/9059
Fixes https://github.com/scylladb/scylladb/issues/21292
Fixes https://github.com/scylladb/scylladb/issues/22194
Fixes https://github.com/scylladb/scylladb/issues/22410
- (cherry picked from commit 900687c818)
- (cherry picked from commit a33963daef)
- (cherry picked from commit a3d2cd6b5e)
- (cherry picked from commit 32258d8f9a)
- (cherry picked from commit 6e539c2b4d)
- (cherry picked from commit 05fce91945)
- (cherry picked from commit ad55935411)
- (cherry picked from commit ea462efa3d)
- (cherry picked from commit d7bd86591e)
- (cherry picked from commit d77f11d436)
- (cherry picked from commit bf7bba9634)
- (cherry picked from commit ee5883770a)
Parent PR: #23337Closesscylladb/scylladb#23937
* github.com:scylladb/scylladb:
test: remove flakiness from test_schema_is_recovered_after_dying
mv: add a test for dropping an index while it's building
base_info: remove the lw_shared_ptr variant
view_info: don't re-set base_info after construction
base_info: remove base_info snapshot semantics
base_info: remove base schema from the base_info
schema_registry: store base info instead of base schema for view entries
base_info: make members non-const
view_info: move the base info to a separate header
view_info: move computation of view pk columns not in base pk to view_updates
view_info: move base-dependent variables into base_info
view_info: set base info on construction
alter_table_statement: fix renaming multiple columns in tables with views
Max purgeable has two possible values for each partition: one for
regular tombstones and one for shadowable ones. Yet currently a single
member is used to cache the max-purgeable value for the partition, so
whichever kind of tombstone is checked first, its max-purgeable will
become sticky and apply to the other kind of tombstones too. E.g. if the
first can_gc() check is for a regular tombstone, its max-purgeable will
apply to shadowable tombstones in the partition too, meaning they might
not be purged, even though they are purgeable, as the shadowable
max-purgeable is expected to be more lenient. The other way around is
worse, as it will result in regular tombstone being incorrectly purged,
permitted by the more lenient shadowable tombstone max-purgeable.
Fix this by caching the two possible values in two separate members.
A reproducer unit test is also added.
Fixes: scylladb/scylladb#23272Closesscylladb/scylladb#24171
(cherry picked from commit 7db956965e)
Closesscylladb/scylladb#24328
When map_reduce is called on a collection, one shouldn't expect that it
processes the elements of the collection in any specific order.
Current test of map-reduce over boost outcome assumes that if reduce
function is the string concatenation, then it would concatenate the
given vector of strings in the order they are listed. That requirement
should be relaxed, and the result may have reversed concatentation.
Fixesscylladb/scylladb#24321
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#24325
(cherry picked from commit a65ffdd0df)
Closesscylladb/scylladb#24334
When schema is changed, sstable set is updated according to the compaction strategy of the new schema (no changes to set are actually made, just the underlying set type is updated), but the problem is that it happens without a lock, causing a use-after-free when running concurrently to another set update.
Example:
1) A: sstable set is being updated on compaction completion
2) B: schema change updates the set (it's non deferring, so it happens in one go) and frees the set used by A.
3) when A resumes, system will likely crash since the set is freed already.
ASAN screams about it:
SUMMARY: AddressSanitizer: heap-use-after-free sstables/sstable_set.cc ...
Fix is about deferring update of the set on schema change to compaction, which is triggered after new schema is set. Only strategy state and backlog tracker are updated immediately, which is fine since strategy doesn't depend on any particular implementation of sstable set.
Fixes#22040.
- (cherry picked from commit 628bec4dbd)
- (cherry picked from commit 434c2c4649)
Parent PR: #23680Closesscylladb/scylladb#24081
* github.com:scylladb/scylladb:
replica: Fix use-after-free with concurrent schema change and sstable set update
sstables: Implement sstable_set_impl::all_sstable_runs()
When schema is changed, sstable set is updated according to the compaction
strategy of the new schema (no changes to set are actually made, just
the underlying set type is updated), but the problem is that it happens
without a lock, causing a use-after-free when running concurrently to
another set update.
Example:
1) A: sstable set is being updated on compaction completion
2) B: schema change updates the set (it's non deferring, so it
happens in one go) and frees the set used by A.
3) when A resumes, system will likely crash since the set is freed
already.
ASAN screams about it:
SUMMARY: AddressSanitizer: heap-use-after-free sstables/sstable_set.cc ...
Fix is about deferring update of the set on schema change to compaction,
which is triggered after new schema is set. Only strategy state and
backlog tracker are updated immediately, which is fine since strategy
doesn't depend on any particular implementation of sstable set, since
patch "sstables: Implement sstable_set_impl::all_sstable_runs()".
Fixes#22040.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 434c2c4649)
In the test test_tablet_mv_replica_pairing_during_replace we stop 2 out of 4 servers while using RF=2. Even though in the test we use exactly 4 tablets (1 for each replica of a base table and view), intially, the tablets may not be split evenly between all nodes. Because of this, even when we chose a server that hosts the view and a different server that hosts the base table, we sometimes stoped all replicas of the base or the view table because the node with the base table replica may also be a view replica.
After some time, the tablets should be distributed across all nodes. When that happens, there will be no common nodes with a base and view replica, so the test scenario will continue as planned.
In this patch, we add this waiting period after creating the base and view, and continue the test only when all 4 tablets are on distinct nodes.
Fixesscylladb/scylladb#23982Fixesscylladb/scylladb#23997Fixesscylladb/scylladb#24250
(cherry picked from commit bceb64fb5a)
(cherry picked from commit 5074daf1b7)
Parent PR: scylladb/scylladb#24111Closesscylladb/scylladb#24128
* github.com:scylladb/scylladb:
test: actually wait for tablets to distribute across nodes
test_mv_tablets_replace: wait for tablet replicas to balance before working on them
In test_tablet_mv_replica_pairing_during_replace, after we create
the tables, we want to wait for their tablets to distribute evenly
across nodes and we have a wait_for for that.
But we don't await this wait_for, so it's a no-op. This patch fixes
it by adding the missing await.
Refs scylladb/scylladb#23982
Refs scylladb/scylladb#23997Closesscylladb/scylladb#24250
(cherry picked from commit 5074daf1b7)
Due to the changes in creating schemas with base info the
test_schema_is_recovered_after_dying seems to be flaky when checking
that the schema is actually lost after 'grace_period'. We don't
actually guarantee that the the schema will be lost at that exact
moment so there's no reason to test this. To remove the flakiness,
we remove the check and the related sleep, which should also slightly
improve the speed of this test.
(cherry picked from commit ee5883770a)
Dropping an index is a schema change of its base table and
a schema drop of the index's materialized view. This combination
of schema changes used to cause issues during view building, because
when a view schema was dropped, it wasn't getting updated with the
new version of the base schema, and while the view building was
in progress, we would update the base schema for the base table
mutation reader and try generating updates with a view schema that
wasn't compatible with the base schema, failing on an `on_internal_error`.
In this patch we add a test for this scenario. We create an index,
halt its view building process using an injection, and drop it.
If no errors are thrown, the test succeeds.
The test was failing before https://github.com/scylladb/scylladb/pull/23337
and is passing afterwards.
(cherry picked from commit bf7bba9634)
The base_dependent_view_info is no longer needed to be shared or
modified in the view_info, so we no longer need to keep it as
a shared pointer.
(cherry picked from commit d77f11d436)
In the previous commits we made sure that the base info is not dependent
on the base schema version, and the info dependent on the base schema
version is calculated when it's needed. In this patch we remove the
unnecessary re-setting of the base_info.
The set_base_info method isn't removed completely, because it also has
a secondary function - zeroing the view_info fields other than base_info.
Because of this, in this patch we rename it accordingly and limit its
use to the updates caused by a base schema change.
(cherry picked from commit d7bd86591e)
In the following patch we plan to remove the base schema from the base_info
to make the base_info immutable. To do that, we first prepare the schema
registry for the change; we need to be able to create view schemas from
frozen schemas there and frozen schemas have no information about the base
table. Unless we do this change, after base schemas are removed from the
base info, we'll no longer be able to load a view schema to the schema registry
without looking up the base schema in the database.
This change also required some updates to schema building:
* we add a method for unfreezing a view schema with base info instead of
a base schema
* we make it possible to use schema_builder with a base info instead of
a base schema
* we add a method for creating a view schema from mutations with a base info
instead of a base schema
* we add a view_info constructor withat base info instead of a base schema
* we update the naming in schema_registry to reflect the usage of base info
instead of base schema
(cherry picked from commit 05fce91945)
Currently, the base_info may or may not be set in view schemas.
Even when it's set, it may be modified. This necessitates extra
checks when handling view schemas, as well as potentially causing
errors when we forget to set it at some point.
Instead, we want to make the base info an immutable member of view
schemas (inside view_info). The first step towards that is making
sure that all newly created schemas have the base info set.
We achieve that by requiring a base schema when constructing a view
schema. Unfortunately, this adds complexity each time we're making
a view schema - we need to get the base schema as well.
In most cases, the base schema is already available. The most
problematic scenario is when we create a schema from mutations:
- when parsing system tables we can get the schema from the
database, as regular tables are parsed before views
- when loading a view schema using the schema loader tool, we need
to load the base additionally to the view schema, effectively
doubling the work
- when pulling the schema from another node - in this case we can
only get the current version of the base schema from the local
database
Additionally, we need to consider the base schema version - when
we generate view updates the version of the base schema used for
reads should match the version of the base schema in view's base
info.
This is achieved by selecting the correct (old or new) schema in
`db::schema_tables::merge_tables_and_views` and using the stored
base schema in the schema_registry.
(cherry picked from commit 900687c818)
When we rename columns in a table which has materialized views depending
on it, we need to also rename them in the materialized views' WHERE
clauses.
Currently, we do that by creating a new WHERE clause after each rename,
with the updated column. This is later converted to a mutation that
overwrites the WHERE clause. After multiple renames, we have multiple
mutations, each overwriting the WHERE clause with one column renamed.
As a result, the final WHERE clause is one of the modified clauses with
one column renamed.
Instead, we should prepare one new WHERE clause which includes all the
renamed columns. This patch accomplishes this by processing all the
column renames first, and only preparing the new view schema with the
new WHERE clause afterwards.
This patch also includes a test reproducer for this scenario.
Fixesscylladb/scylladb#22194Closesscylladb/scylladb#23152
(cherry picked from commit 88d3fc68b5)
In the test test_tablet_mv_replica_pairing_during_replace we stop 2 out of 4 servers while using RF=2.
Even though in the test we use exactly 4 tablets (1 for each replica of a base table and view), intially,
the tablets may not be split evenly between all nodes. Because of this, even when we chose a server that
hosts the view and a different server that hosts the base table, we sometimes stoped all replicas of the
base or the view table because the node with the base table replica may also be a view replica.
After some time, the tablets should be distributed across all nodes. When that happens, there will be
no common nodes with a base and view replica, so the test scenario will continue as planned.
In this patch, we add this waiting period after creating the base and view, and continue the test only
when all 4 tablets are on distinct nodes.
Fixes https://github.com/scylladb/scylladb/issues/23982
Fixes https://github.com/scylladb/scylladb/issues/23997Closesscylladb/scylladb#24111
(cherry picked from commit bceb64fb5a)
So that a multi-dc/multi-rack cluster can be populated in a single call.
Original PR: scylladb/scylladb#23341
Fixes https://github.com/scylladb/scylladb/issues/23551
(cherry picked from commit 0fdf2a2090)
Closesscylladb/scylladb#23549
* github.com:scylladb/scylladb:
Merge 'test/pylib: servers_add: support list of property_files' from Benny Halevy
test.py: Remove reuse cluster in cluster tests
test.py apply prepare_3_nodes_cluster in topology
test.py: introduce prepare_3_nodes_cluster marker
The test test_read_repair_with_trace_logging wants to test read repair
with trace logging. Turns out that node restart + trace-level logging
+ debug mode is too much and even with 1 minute timeout, the read repair
times out sometimes.
Refactor the test to use injection point instead of restart. To make
sure the test still tests what it supposed to test, use tracing to
assert that read repair did indeed happen.
(cherry picked from commit 29eedaa)
Any empty object of the json::json_list type has its internal
_set variable assigned to false which results in such objects
being skipped by the json::json_builder.
Hence, the json returned by the api GET//compaction_manager/compaction_history
does not contain the field `rows_merged` if a cell in the
system.compaction_history table is null or an empty list.
In such cases, executing the command `nodetool compactionhistory`
will result in a crash with the following error message:
`error running operation: rjson::error (JSON assert failed on condition 'false'`
The patch fixes it by checking if the json object contains the
`rows_merged` element before processing. If the element does
not exist, the nodetool will now produce an empty list.
Fixes https://github.com/scylladb/scylladb/issues/23540Closesscylladb/scylladb#23514
(cherry picked from commit 113647550f)
Closesscylladb/scylladb#24113
test_tablet_repair_hosts_filter checks whether the host filter
specfied for tablet repair is correctly persisted. To check this,
we need to ensure that the repair is still ongoing and its data
is kept. The test achieves that by failing the repair on replica
side - as the failed repair is going to be retried.
However, if the filter does not contain any host (included_host_count = 0),
the repair is started on no replica, so the request succeeds
and its data is deleted. The test fails if it checks the filter
after repair request data is removed.
Fail repair on topology coordinator side, so the request is ongoing
regardless of the specified hosts.
Fixes: #23986.
Closesscylladb/scylladb#24003
(cherry picked from commit 2549f5e16b)
Closesscylladb/scylladb#24075
The test is failing in CI sometimes due to performance reasons.
There are at least two problems:
1. The initial 500ms (wall time) sleep might be too short. If the reclaimer
doesn't manage to evict enough memory during this time, the test will fail.
2. During the 100ms (thread CPU time) window given by the test to background
reclaim, the `background_reclaim` scheduling group isn't actually
guaranteed to get any CPU, regardless of shares. If the process is
switched out inside the `background_reclaim` group, it might
accumulate so much vruntime that it won't get any more CPU again
for a long time.
We have seen both.
This kind of timing test can't be run reliably on overcommitted machines
without modifying the Seastar scheduler to support that (by e.g. using
thread clock instead of wall time clock in the scheduler), and that would
require an amount of effort disproportionate to the value of the test.
So for now, to unflake the test, this patch removes the performance test
part. (And the tradeoff is a weakening of the test). After the patch,
we only check that the background reclaim happens *eventually*.
Fixes https://github.com/scylladb/scylladb/issues/15677
Backporting this is optional. The test is flaky even in stable branches, but the failure is rare.
- (cherry picked from commit c47f438db3)
- (cherry picked from commit 1c1741cfbc)
Parent PR: #24030Closesscylladb/scylladb#24092
* github.com:scylladb/scylladb:
logalloc_test: don't test performance in test `background_reclaim`
logalloc: make background_reclaimer::free_memory_threshold publicly visible
Currently, stream_manager is initialized after storage_service and
so it is stopped before the storage_service is. In its stop method
storage_service accesses stream_manager which is uninitialized
at a time.
Move stream_manager initialization over the storage_service initialization.
Fixes: #23207.
Closesscylladb/scylladb#24008
(cherry picked from commit 9c03255fd2)
Closesscylladb/scylladb#24189
The test is failing in CI sometimes due to performance reasons.
There are at least two problems:
1. The initial 500ms (wall time) sleep might be too short. If the reclaimer
doesn't manage to evict enough memory during this time, the test will fail.
2. During the 100ms (thread CPU time) window given by the test to background
reclaim, the `background_reclaim` scheduling group isn't actually
guaranteed to get any CPU, regardless of shares. If the process is
switched out inside the `background_reclaim` group, it might
accumulate so much vruntime that it won't get any more CPU again
for a long time.
We have seen both.
This kind of timing test can't be run reliably on overcommitted machines
without modifying the Seastar scheduler to support that (by e.g. using
thread clock instead of wall time clock in the scheduler), and that would
require an amount of effort disproportionate to the value of the test.
So for now, to unflake the test, this patch removes the performance test
part. (And the tradeoff is a weakening of the test).
(cherry picked from commit 1c1741cfbc)
This PR improves and refactors the test.topology.util new_test_keyspace generator
and adds a corresponding create_new_test_keyspace function to be used by most if not
all topology unit tests in order to standardize the way the tests create keyspaces
and to mitigate the python driver create keyspace retry issue: https://github.com/scylladb/python-driver/issues/317Fixes#22342Fixes#21905
Refs https://github.com/scylladb/scylla-enterprise/issues/5060Fixes#23699
- (cherry picked from commit 50ce0aaf1c)
- (cherry picked from commit 5d448f721e)
- (cherry picked from commit f946302369)
- (cherry picked from commit 0fd1b846fe)
- (cherry picked from commit a66ddb7c04)
- (cherry picked from commit df84097a4b)
- (cherry picked from commit 59687c25e0)
- (cherry picked from commit fdb339bf28)
- (cherry picked from commit 205ed113dd)
- (cherry picked from commit 57faab9ffa)
- (cherry picked from commit 4fefffe335)
- (cherry picked from commit 480a5837ab)
- (cherry picked from commit fed078a38a)
- (cherry picked from commit c6653e65ba)
- (cherry picked from commit 9c095b622b)
- (cherry picked from commit 0668c642a2)
- (cherry picked from commit 0e11aad9c5)
- (cherry picked from commit ef85c4b27e)
- (cherry picked from commit b13e48b648)
- (cherry picked from commit a82e734110)
- (cherry picked from commit 629ee3cb46)
- (cherry picked from commit 42a104038d)
- (cherry picked from commit d5e3c578f5)
- (cherry picked from commit c05794c156)
- (cherry picked from commit 966cf82dae)
- (cherry picked from commit 11005b10db)
- (cherry picked from commit ff9c8428df)
- (cherry picked from commit 55b35eb21c)
- (cherry picked from commit 5759a97eb4)
- (cherry picked from commit c68d2a471c)
- (cherry picked from commit e05372afa4)
- (cherry picked from commit 380c5e5ac8)
- (cherry picked from commit 3f35491264)
- (cherry picked from commit e72a9d3faa)
- (cherry picked from commit 47326d01b7)
- (cherry picked from commit 72bc4016e7)
- (cherry picked from commit 4fd6c2d24e)
- (cherry picked from commit 50a8f5c1c0)
- (cherry picked from commit 005ceb77d3)
- (cherry picked from commit 649e68c6db)
- (cherry picked from commit 0b88ea9798)
- (cherry picked from commit 6b37d04aa9)
- (cherry picked from commit e59aca66bf)
- (cherry picked from commit 5ff3153912)
- (cherry picked from commit 20f7eda16e)
- (cherry picked from commit f30e4c6917)
- (cherry picked from commit 96d327fb83)
- (cherry picked from commit 16ef78075c)
- (cherry picked from commit 2d4af01281)
- (cherry picked from commit b810791fbb)
- (cherry picked from commit 46b1850f0c)
- (cherry picked from commit 0564e95c51)
- (cherry picked from commit 12f85ce57c)
- (cherry picked from commit 9829b1594f)
- (cherry picked from commit cbe79b20f7)
- (cherry picked from commit cc281ff88d)
Parent PR: #22399Closesscylladb/scylladb#23408
* github.com:scylladb/scylladb:
test_tablet_repair_scheduler: prepare_multi_dc_repair: use create_new_test_keyspace
test/repair: create_table_insert_data_for_repair: create keyspace with unique name
topology_tasks/test_tablet_tasks: use new_test_keyspace
topology_tasks/test_node_ops_tasks: use new_test_keyspace
topology_custom/test_zero_token_nodes_no_replication: use create_new_test_keyspace
topology_custom/test_zero_token_nodes_multidc: use create_new_test_keyspace
topology_custom/test_view_build_status: use new_test_keyspace
topology_custom/test_truncate_with_tablets: use new_test_keyspace
topology_custom/test_topology_failure_recovery: use new_test_keyspace
topology_custom/test_tablets_removenode: use create_new_test_keyspace
topology_custom/test_tablets_migration: use new_test_keyspace
topology_custom/test_tablets_merge: use new_test_keyspace
topology_custom/test_tablets_intranode: use new_test_keyspace
topology_custom/test_tablets_cql: use new_test_keyspace
topology_custom/test_tablets2: use *new_test_keyspace
topology_custom/test_tablets2: test_schema_change_during_cleanup: drop unused check function
test/cluster/test_tablets.py: Fix test errorneous indentation
topology_custom/test_tablets: use new_test_keyspace
topology_custom/test_table_desc_read_barrier: use new_test_keyspace
topology_custom/test_shutdown_hang: use new_test_keyspace
topology_custom/test_select_from_mutation_fragments: use new_test_keyspace
topology_custom/test_rpc_compression: use new_test_keyspace
topology_custom/test_reversed_queries_during_simulated_upgrade_process: use new_test_keyspace
topology_custom/test_raft_snapshot_truncation: use create_new_test_keyspace
topology_custom/test_raft_no_quorum: use new_test_keyspace
topology_custom/test_raft_fix_broken_snapshot: use new_test_keyspace
topology_custom/test_query_rebounce: use new_test_keyspace
topology_custom/test_not_enough_token_owners: use new_test_keyspace
topology_custom/test_node_shutdown_waits_for_pending_requests: use new_test_keyspace
topology_custom/test_node_isolation: use create_new_test_keyspace
topology_custom/test_mv_topology_change: use new_test_keyspace
topology_custom/test_mv_tablets_replace: use new_test_keyspace
topology_custom/test_mv_tablets_empty_ip: use new_test_keyspace
topology_custom/test_mv_tablets: use new_test_keyspace
topology_custom/test_mv_read_concurrency: use new_test_keyspace
topology_custom/test_mv_fail_building: use new_test_keyspace
topology_custom/test_mv_delete_partitions: use new_test_keyspace
topology_custom/test_mv_building: use new_test_keyspace
topology_custom/test_mv_backlog: use new_test_keyspace
topology_custom/test_mv_admission_control: use new_test_keyspace
topology_custom/test_major_compaction: use new_test_keyspace
topology_custom/test_maintenance_mode: use new_test_keyspace
topology_custom/test_lwt_semaphore: use new_test_keyspace
topology_custom/test_ip_mappings: use new_test_keyspace
topology_custom/test_hints: use new_test_keyspace
topology_custom/test_group0_schema_versioning: use new_test_keyspace
topology_custom/test_data_resurrection_after_cleanup: use new_test_keyspace
topology_custom/test_read_repair_with_conflicting_hash_keys: use new_test_keyspace
topology_custom/test_read_repair: use new_test_keyspace
topology_custom/test_compacting_reader_tombstone_gc_with_data_in_memtable: use new_test_keyspace
topology_custom/test_commitlog_segment_data_resurrection: use new_test_keyspace
topology_custom/test_change_replication_factor_1_to_0: use new_test_keyspace
topology/test_tls: test_upgrade_to_ssl: use new_test_keyspace
test/topology/util: new_test_keyspace: drop keyspace only on success
test/topology/util: refactor new_test_keyspace
test/topology/util: CREATE KEYSPACE IF NOT EXISTS
test/topology/util: new_test_keyspace: accept ManagerClient