Commit Graph

47891 Commits

Author SHA1 Message Date
Marcin Maliszkiewicz
ac254e9722 service: split update_tablet_metadata into two phases
In following commits calls will be split in schema_applier.
2025-06-06 08:50:33 +02:00
Marcin Maliszkiewicz
21a5a3c01f service: pull out update_tablet_metadata from migration_listener
It's not a good usage as there is only one non-empty implementation.
Also we need to change it further in the following commit which
makes it incompatible with listener code.
2025-06-06 08:50:33 +02:00
Marcin Maliszkiewicz
92e3d69f79 db: service: add store_service dependency to schema_applier
There is already implicit logical dependency via migration_notifier
but in the next commits we'll be moving store_service out from it
as we need better control (i.e. return a value from the call).
2025-06-06 08:50:33 +02:00
Marcin Maliszkiewicz
1c8fd3a65d service: simplify load_tablet_metadata and update_tablet_metadata
- remove load_tablet_metadata(), instead we add wake_up_load_balancer flag
to update_tablet_metadata(), it reduces number of public functions and
also serves as a comment (removed comment with very similar meaning)

- reimplement the code to not use mutate_token_metadata(), this way
it's more readable and it's also needed as we'll split
update_tablet_metadata() in following commits so that we can have
subroutine which doesn't yield (for ensuring atomicity)
2025-06-06 08:50:33 +02:00
Marcin Maliszkiewicz
3119a02edd db: don't perform move on tablet_hint reference
This lambda is called several times so there should be no move.
Currently the bug likely doesn't manifest as code does work
only on shard 0.
2025-06-06 08:50:33 +02:00
Marcin Maliszkiewicz
1ad14f02f1 replica: split add_column_family_and_make_directory into steps
This is similar work as for drop_table in previous commit.

add_column_family_and_make_directory() behaves exactly the same
as before but calls to it in schema_applier will be replaced by
calls directly to split steps. Other usages will remain intact as
they don't need atomicity (like creating system tables at startup).
2025-06-06 08:50:33 +02:00
Marcin Maliszkiewicz
141a5643e5 replica: db: split drop_table into steps
This is done so that actual dropping can be
an atomic step which could be composed with other
schema operations, and eventually all subsystems modified
via raft so that we could introduce atomic changes which
span across different subsystems.

We split drop_table_on_all_shards() into:
- prepare_tables_metadata_change_on_all_shards()
- prepare_drop_table_on_all_shards()
- drop_table()
- cleanup_drop_table_on_all_shards()

prepare_tables_metadata_change_on_all_shards() is necessary
because when applying multiple schema changes at once (e.g. drop
and add tables) we need to lock only once.

We add legacy_drop_table_on_all_shards() which
behaves exactly like old drop_table_on_all_shards() to be
compatible with code which doesn't need to play with atomicity.

Usages of legacy_drop_table_on_all_shards() in schema_applier
will be replaced with direct calls to split functions in the following
commits - that's the place we will take advantage of drop_table not
yielding (as it returns void now).
2025-06-06 08:50:33 +02:00
Marcin Maliszkiewicz
2bae38e252 db: don't move map references in merge_tables_and_views()
Since they are const it's not needed and misleading.
2025-06-06 08:50:33 +02:00
Marcin Maliszkiewicz
85f19e165a db: introduce commit_on_shard function
This will be the place for all atomic schema switching
operations.

Note that atomicity is observed only from single shard
point of view. All shards may switch at slightly different times
as global locking for this is not feasible.
2025-06-06 08:50:33 +02:00
Marcin Maliszkiewicz
b3730282c3 db: access types during schema merge via special storage
Once we create types atomically the code which is before commit
may depend on newly added types, so it has to access both old and
new types. New storage called in_progress_types_storage was added.
2025-06-06 08:50:33 +02:00
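The lookup over both old and new types described in b3730282c3 can be pictured as a layered view. The following is only a minimal Python sketch of that idea, not the C++ implementation: `committed_types`, `in_progress_types`, `types_during_merge` and `resolve_type` are hypothetical names, and the assumption that new types are consulted first is mine.

```python
from collections import ChainMap

# Types already committed and visible to the rest of the system (hypothetical data).
committed_types = {"address": ("street", "city")}

# Types created by the schema change being applied, not yet committed.
in_progress_types = {"phone": ("country_code", "number")}

# Assumed layering: in-progress types are consulted first, then committed ones.
types_during_merge = ChainMap(in_progress_types, committed_types)


def resolve_type(name):
    """Look up a type by name during the merge; raises KeyError if unknown."""
    return types_during_merge[name]
```

Code running before the commit step can resolve both `address` and `phone` this way, while `committed_types` itself stays untouched until the commit.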
Marcin Maliszkiewicz
7f057af1f2 replica: make non-preemptive keyspace create/update/delete functions public
As those operations will be managed by schema_applier class. This
will be implemented in following commit.
2025-05-27 20:01:35 +02:00
Marcin Maliszkiewicz
2daa630938 replica: split update keyspace into two phases
- first phase is preemptive (prepare_update_keyspace)
- second phase is non-preemptive (update_keyspace)

This is done so that schema change can be applied atomically.

Additionally, the create keyspace code was changed to share a common
part with the update keyspace flow.

This commit doesn't yet change the behaviour of the code,
as it doesn't guarantee atomicity, it will be done in following
commits.
2025-05-27 20:00:58 +02:00
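The preemptive/non-preemptive split in 2daa630938 can be illustrated with Python's asyncio, where only `async` functions have suspension points. This is a sketch under assumptions: the function names are taken from the commit message, but the signatures, the `keyspaces` dictionary and the option names are hypothetical and do not reflect the C++ code.

```python
import asyncio

# Hypothetical in-memory stand-in for the keyspace registry.
keyspaces = {"ks": {"replication_factor": 1}}


async def prepare_update_keyspace(name, new_options):
    # Preemptive phase: may suspend (sleep(0) stands in for I/O or cross-shard work).
    await asyncio.sleep(0)
    if name not in keyspaces:
        raise KeyError(name)
    return {**keyspaces[name], **new_options}


def update_keyspace(name, prepared):
    # Non-preemptive phase: a plain function has no await points,
    # so no other task can observe a half-applied change.
    keyspaces[name] = prepared


async def main():
    prepared = await prepare_update_keyspace("ks", {"replication_factor": 3})
    update_keyspace("ks", prepared)


asyncio.run(main())
```

All work that can suspend happens while building `prepared`; the visible switch is a single assignment.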
Marcin Maliszkiewicz
fe0f4033ca replica: split creating keyspace into two functions
This is done so that in following commits insert_keyspace can be used
to atomically change schema (as it doesn't yield).
2025-05-27 20:00:58 +02:00
Marcin Maliszkiewicz
aceb1f9659 db: rename create_keyspace_from_schema_partition
It only creates keyspace metadata.
2025-05-27 20:00:58 +02:00
Marcin Maliszkiewicz
f8fe51640a db: decouple functions and aggregates schema change notification from merging code 2025-05-27 20:00:58 +02:00
Marcin Maliszkiewicz
52069d954f db: store functions and aggregates change batch in schema_applier
To be used in following commit.
2025-05-27 20:00:58 +02:00
Marcin Maliszkiewicz
5fff3097a5 db: decouple tables and views schema change notifications from merging code
As post_commit() can't be fully implemented at this stage,
it was moved to interim place to keep things working.
It will be moved back later.
2025-05-27 20:00:58 +02:00
Marcin Maliszkiewicz
6f8579e242 db: store tables and views schema diff in schema_applier
It will be used in subsequent commit for moving
notifications code.
2025-05-27 20:00:58 +02:00
Marcin Maliszkiewicz
b74c1e9ae4 db: decouple user type schema change notifications from types merging code
Merging types code now returns generic affected_types structure which
is used both for notifications and dropping types. New static
function drop_types() replaces dropping lambda used before.

I think neither dropping nor notifications need to use per-shard
copies (as they do both before and after this patch); they could just
use string parameters or something similar. However, that would require
too many changes in other classes, so it's out of scope here.
2025-05-27 20:00:58 +02:00
Marcin Maliszkiewicz
3a95edd0d7 service: unify keyspace notification functions arguments
Keyspace metadata is not used, only name is needed so
we can remove those extra find_keyspace() calls.

Moreover there is no need to copy the name.
2025-05-27 20:00:58 +02:00
Marcin Maliszkiewicz
d7202586ca db: replica: decouple keyspace schema change notifications to a separate function
In following commits we want to separate updating code from committing
schema change (making it visible). Since notifications should be issued
after change is visible we need to separate them and call after
committing.

In subsequent commits other notification types will be moved too.

Here we change the order of notification calls relative to the rest
of the schema updating code. Previously, keyspace notifications were
triggered before tables were updated; after this change they trigger
once everything is updated. There is no indication that notification
listeners depend on the previous ordering.
2025-05-27 19:59:47 +02:00
Marcin Maliszkiewicz
ddf9f7ae05 db: add class encapsulating schema merging
This commit doesn't yet change how schema merging
works but it prepares the ground for it.

We split merging code into several functions.
Main reasons for it are that:

- We want to generalize and create some interface
which each subsystem would use.

- We need to pull mutation's apply() out
of the code because raft will call it directly,
and it will contain a mix of mutations from more
than one subsystem. This is needed because we have
the need to update multiple subsystems atomically
(e.g. auth and schema during auto-grant when creating
a table).

In this commit do_merge_schema() code is split between
prepare(), update(), commit(), post_commit(). The idea
behind each of these phases is described in the comments.
The last 2 phases are not yet implemented as it requires more
code changes but adding schema_applier enclosing class
will help to create some copied state in the future and
implement commit() and post_commit() phases.
2025-05-27 19:33:02 +02:00
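The four phases named in ddf9f7ae05 can be sketched roughly as follows. This is a hypothetical Python illustration, not the schema_applier class itself: the division of work between phases is inferred from the phase names and the neighbouring commit messages, and `SchemaApplierSketch` and all of its attributes are made up for the example.

```python
class SchemaApplierSketch:
    """Hypothetical stand-in for a phased schema merge; not ScyllaDB code."""

    def __init__(self, live_schema):
        self.live_schema = live_schema  # state visible to readers
        self.changes = {}               # inputs collected by prepare()
        self.staged = None              # new state built on the side
        self.notifications = []         # events issued after commit

    def prepare(self, changes):
        # Assumed role: gather inputs; allowed to yield in the real system.
        self.changes = dict(changes)

    def update(self):
        # Assumed role: build the new state without touching the live one.
        self.staged = {**self.live_schema, **self.changes}

    def commit(self):
        # Assumed role: switch to the staged state in one non-yielding step.
        self.live_schema, self.staged = self.staged, None

    def post_commit(self):
        # Notify only after the change is visible.
        self.notifications.extend(f"changed:{name}" for name in self.changes)


applier = SchemaApplierSketch({"ks1": "v1"})
applier.prepare({"ks1": "v2", "ks2": "v1"})
applier.update()
visible_before_commit = dict(applier.live_schema)
applier.commit()
applier.post_commit()
```

Until `commit()` runs, readers still see the old state; notifications are recorded only afterwards.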
Jenkins Promoter
76dddb758e Update pgo profiles - x86_64 2025-05-27 12:02:49 +03:00
Pavel Emelyanov
bd3bd089e1 sstables_loader: Fix load-and-stream vs skip-cleanup check
The intention was to fail the REST API call in case --skip-cleanup is
requested for --load-and-stream loading. The corresponding if expression
is checking something else :( even though the log message is correct.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24208
2025-05-27 12:01:01 +03:00
Jenkins Promoter
de9d9c9ece Update pgo profiles - aarch64 2025-05-27 11:59:56 +03:00
Andrzej Jackowski
555d897a15 test: wait for normal state propagation in test_auth_v2_migration
By default, cluster tests have skip_wait_for_gossip_to_settle=0 and
ring_delay_ms=0. In tests with gossip topology, it may lead to a race,
where nodes see different state of each other.

In case of test_auth_v2_migration, there are three nodes. If the first
node already knows that the third node is NORMAL, and the second node
does not, the system_auth tables can return incomplete results.

To avoid such a race, this commit adds a check that all nodes see other
nodes as NORMAL before any writes are done.

Refs: #24163

Closes scylladb/scylladb#24185
2025-05-27 11:41:09 +03:00
Nikos Dragazis
eaa2ce1bb5 sstables: Fix race when loading checksum component
`read_checksum()` loads the checksum component from disk and stores a
non-owning reference in the shareable components. To avoid loading the
same component twice, the function has an early return statement.
However, this does not guarantee atomicity - two fibers or threads may
load the component and update the shareable components concurrently.
This can lead to use-after-free situations when accessing the component
through the shareable components, since the reference stored there is
non-owning. This can happen when multiple compaction tasks run on the
same SSTable (e.g., regular compaction and scrub-validate).

Fix this by not updating the reference in shareable components, if a
reference is already in place. Instead, create an owning reference to
the existing component for the current fiber. This is less efficient
than using a mutex, since the component may be loaded multiple times
from disk before noticing the race, but no locks are used for any other
SSTable component either. Also, this affects uncompressed SSTables,
which are not that common.

Fixes #23728.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#23872
2025-05-27 11:26:35 +03:00
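The fix described in eaa2ce1bb5 (keep an already published reference, and hand the caller an owning reference to the existing component) can be mimicked in Python with asyncio tasks and a weak reference. This is an analogy under assumptions, not the C++ code: `Checksum`, `Shareable`, `load_from_disk` and the use of `weakref` as the "non-owning reference" are all hypothetical.

```python
import asyncio
import weakref


class Checksum:
    def __init__(self, values):
        self.values = values


class Shareable:
    def __init__(self):
        self.checksum_ref = None  # non-owning (weak) reference, or None


load_count = 0


async def load_from_disk():
    global load_count
    await asyncio.sleep(0)  # suspension point: another task may run here
    load_count += 1
    return Checksum([1, 2, 3])


def current(shared):
    # Resolve the weak reference to a strong (owning) one, if still alive.
    return shared.checksum_ref() if shared.checksum_ref is not None else None


async def read_checksum(shared):
    existing = current(shared)
    if existing is not None:
        return existing  # early return: component already loaded
    loaded = await load_from_disk()
    existing = current(shared)
    if existing is not None:
        # Lost the race: do not overwrite the published reference,
        # return an owning reference to the existing component instead.
        return existing
    shared.checksum_ref = weakref.ref(loaded)
    return loaded


async def main():
    shared = Shareable()
    a, b = await asyncio.gather(read_checksum(shared), read_checksum(shared))
    return a, b, shared


a, b, shared = asyncio.run(main())
```

Both tasks pass the early-return check before either finishes loading, so the component is loaded twice, but the second task discards its copy and both callers end up holding the same object.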
Botond Dénes
2739eb49fd Merge 'docs: remove API reference redirect' from David Garcia
Fix for https://github.com/scylladb/scylladb/pull/24097

The stable branch does not contain the split API reference yet. This change fixes the 404 error raised when accessing the API reference on the stable branch due to the redirect.

Closes scylladb/scylladb#24259

* github.com:scylladb/scylladb:
  docs: fix typo
  docs: remove API reference redirect
2025-05-27 11:24:27 +03:00
Nadav Har'El
8487d81c6e Merge 'test: mark difference in handling IFs in LWT as scylla_only' from Andrzej Jackowski
There is a difference in how ScyllaDB and Cassandra handle conditional
batches with different IF statements (such as "IF EXISTS" and "IF NOT
EXISTS"). Cassandra tries to detect condition conflicts, and prints
an error instead of silently failing the batch, but in ScyllaDB
we considered this check to be inconsistent and unhelpful, and
decided not to implement it.

In this series, we document the ScyllaDB behaviour explicitly
and improve the relevant LWT tests.

Fixes: https://github.com/scylladb/scylladb/issues/13011

Backport not needed, only docs and minor tests changes.

Closes scylladb/scylladb#24086

* github.com:scylladb/scylladb:
  test: mark difference in handling IFs in LWT as scylla_only
  docs: cql: add explicit explanation how mixing IFs works in LWT
  docs: lwt: add two missing spaces
2025-05-27 09:35:41 +03:00
Andrzej Jackowski
7dc0c4cf4f test: close logfile/socket_dir for stopped servers in recycle_cluster
PythonTestSuite::recycle_cluster is a function that releases resources
of an old, dirty cluster to make it reusable. It closes log_file and
maintenance_socket_dir for running nodes in a dirty cluster, however it
doesn't do the same for stopped nodes. It leads to leakage of file
descriptors of stopped nodes, which in turn can lead to hitting ulimit
of open files (that is often 1024) if the leaking test is repeated with
`./test.py --repeat ...`. The problem was detected when tests from
`test/cluster/dtest/` directory were executed with high `repeat` value.

This commit extends `recycle_cluster` to close and cleanup logfile and
`socket_dir` for nodes that are stopped (because self.servers in
ScyllaCluster is ChainMap of self.running and self.stopped).

Closes scylladb/scylladb#24243
2025-05-27 08:37:43 +03:00
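The effect of iterating a ChainMap of running and stopped servers, as mentioned in 7dc0c4cf4f, can be shown with a small Python sketch. `FakeServer`, `close_resources` and `recycle` are hypothetical stand-ins; only the ChainMap-over-running-and-stopped structure comes from the commit message.

```python
from collections import ChainMap


class FakeServer:
    """Hypothetical stand-in for a server holding closable resources."""

    def __init__(self, name):
        self.name = name
        self.log_file_open = True
        self.socket_dir_present = True

    def close_resources(self):
        self.log_file_open = False
        self.socket_dir_present = False


running = {1: FakeServer("a")}
stopped = {2: FakeServer("b")}

# One view over both dictionaries, mirroring the structure named in the commit.
servers = ChainMap(running, stopped)


def recycle(servers):
    # Iterating the ChainMap covers running and stopped servers alike.
    for server in servers.values():
        server.close_resources()


recycle(servers)
```

Cleaning up via `servers` rather than via `running` alone is what releases the resources of stopped nodes as well.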
David Garcia
d99d1c315c docs: remove [erno X] prefix from metrics logger
Closes scylladb/scylladb#24246
2025-05-27 08:37:11 +03:00
David Garcia
3e331cfbbe docs: fix typo 2025-05-26 21:34:23 +02:00
David Garcia
eefc9c33e8 docs: remove API reference redirect
The stable branch does not contain the split API reference yet.
This change fixes the 404 error raised when accessing the API reference on the stable branch.
2025-05-26 21:32:07 +02:00
Andrzej Jackowski
ea6ef5d0aa test: mark difference in handling IFs in LWT as scylla_only
There is a difference in how ScyllaDB and Cassandra handle conditional
batches with different IF statements (such as "IF EXISTS" and "IF NOT
EXISTS"). Cassandra tries to detect condition conflicts, and prints
an error instead of silently failing the batch, but in ScyllaDB
we considered this check to be inconsistent and unhelpful, and
decided not to implement it.

This commit:
 - Makes test_lwt_with_batch_conflict_1 scylla_only instead of xfail, and
   changes the scenario to pass with the current implementation.
 - Adds test_lwt_with_batch_conflict_3, which shows how Cassandra fails a
   batch statement with different conditions, even when the conditions
   are not contradictory.
 - Adds test_lwt_with_batch_conflict_4/5, which show how static rows
   are handled in conditional batches.

Fixes: #13011
2025-05-26 15:47:11 +02:00
Andrzej Jackowski
2d4acb623e docs: cql: add explicit explanation how mixing IFs works in LWT
There is a difference in how ScyllaDB and Cassandra handle conditional
batches with different IF statements (such as "IF EXISTS" and "IF NOT
EXISTS").

This commit explicitly documents the differences in the behavior.

Refs: #13011
2025-05-26 15:13:01 +02:00
Piotr Dulikowski
4508823294 Merge 'test.py: dtest: few fixes missed in the initial implementation' from Evgeniy Naydanov
A few problems were found in the dtest shim code after scylladb/scylladb#21580 was merged:

- The call of `init_default_config()` method was missed in scylladb/scylladb#21580.  It is required to handle dtest options and markers.
- The implementation of dtest shim uses `server_id` to format a name of a node in a cluster. This is a difference in behavior with dtest. Some of dtests use code like `cluster.nodes()["node1"]` to get access to a node object.
- Default timeout was missed in `ScyllaNode.wait_until_stopped()` method. Set it to 600 for debug mode or to 127 otherwise.

Closes scylladb/scylladb#24225

* github.com:scylladb/scylladb:
  test.py: dtest: set default wait_seconds based on build mode
  test.py: dtest: name nodes in cluster using index starting from 1
  test.py: dtest: initialize default config in dtest setup fixture
2025-05-26 13:37:12 +02:00
Yaron Kaikov
89ace09c18 [workflow]: add conflict_reminder to PRs based against master
Today we send a reminder to the PR's author when backport PRs have conflicts.
Often, PR authors wait for their PR to be reviewed/merged, but the merge is not happening because the PR now conflicts with master and so maintainers won't merge it. This can lead to a stall, where maintainers wait for the author to rebase and authors are waiting for merge.

In this PR we added the ability to notify the PR author as soon as the base
branch has moved forward and a rebase is required.

Fixes: https://github.com/scylladb/scylla-pkg/issues/4955

Closes scylladb/scylladb#24209
2025-05-26 14:30:06 +03:00
David Garcia
6f722e8bc0 docs: split api reference in smaller files
Closes scylladb/scylladb#24097
2025-05-26 12:06:59 +03:00
David Garcia
bf9534e2b5 docs: fix \t (tab) is not rendered correctly
Closes scylladb/scylladb#24096
2025-05-26 12:06:03 +03:00
Avi Kivity
29932a5af1 pgo: drop Java configuration
Since 5e1cf90a51
("build: replace tools/java submodule with packaged cassandra-stress")
we run pre-packaged cassandra-stress. As such, we don't need to look for
a Java runtime (which is missing on the frozen toolchain) and can
rely on the cassandra-stress package finding its own Java runtime.

Fix by just dropping all the Java-finding stuff.

Note: Java 11 is in fact present on the frozen toolchain, just
not in a way that pgo.py can find it.

Fixes #24176.

Closes scylladb/scylladb#24178
2025-05-26 10:16:03 +02:00
Avi Kivity
f195c05b0d untyped_result_set: mark get_blob() as returning unfragmented data
Blobs can be large, and unfragmented blobs can easily exceed 128k
(as seen in #23903). Rename get_blob() to get_blob_unfragmented()
to warn users.

Note that most uses are fine as the blobs are really short strings.

Closes scylladb/scylladb#24102
2025-05-26 09:40:34 +02:00
Michał Chojnowski
ff8a119f26 test/boost/sstable_compressor_factory_test: define a test suite name
It seems that tests in test/boost/combined_tests have to define a test
suite name, otherwise they aren't picked up by test.py.

Fixes #24199

Closes scylladb/scylladb#24200
2025-05-26 09:35:30 +02:00
Anna Stuchlik
d303edbc39 doc: remove copyright from Cassandra Stress
This commit removes the Apache copyright note from the Cassandra Stress page.

It's a follow up to https://github.com/scylladb/scylladb/pull/21723, which missed
that update (see https://github.com/scylladb/scylladb/pull/21723#discussion_r1944357143).

Cassandra Stress is a separate tool with separate repo with the docs, so the copyright
information on the page is incorrect.

Fixes https://github.com/scylladb/scylladb/issues/23240

Closes scylladb/scylladb#24219
2025-05-26 09:35:30 +02:00
Pavel Emelyanov
2a253ace5e Merge 'test.py: add coverage for boost with pytest execution' from Andrei Chekun
This PR adds the possibility to gather coverage for the boost tests when they're executed with pytest. Since the pytest will be used as the main runner for boost tests as well, we need this before switching the runners.

Closes scylladb/scylladb#24236

* github.com:scylladb/scylladb:
  test.py: add support for coverage for boost test
  test.py: get the temp dir from facade
2025-05-26 10:18:53 +03:00
Andrei Chekun
537054bfad test.py: add support for coverage for boost test
This PR adds the possibility to gather coverage for the boost tests when they're executed with pytest. Since the pytest will be used as the main runner for boost tests as well, we need this before switching the runners.
2025-05-23 12:54:54 +02:00
Andrei Chekun
c5a7f3415c test.py: get the temp dir from facade
No need to get the temp dir from the options when facade has this information already.
2025-05-23 12:54:48 +02:00
Nadav Har'El
d2844055ad Merge 'index: implement schema management layer for vector search indexes' from null
This pull request adds support for creating custom indexes (at a metadata level) as long as a supported custom class is provided (currently only vector search).

The patch contains:

- a change in CREATE INDEX statement that allows for the USING keyword to be present as long as one of the supported classes is used
-  support for describing custom indexes in the DESCRIBE statement
- unit tests

Co-authored by: @Balwancia

Closes scylladb/scylladb#23720

* github.com:scylladb/scylladb:
  test/cqlpy: add custom index tests
  index: support storing metadata for custom indices
2025-05-22 12:19:36 +03:00
Pavel Emelyanov
a0d2e63303 Merge 'test.py: add the possibility to gather resource metrics for C++ tests' from Andrei Chekun
Move the run_process method to the resource gather instance, since we need to start a monitor to check memory consumption in the cgroup. Pytest has a concept of a test, but it is completely different from the one in test.py. The resource gather instance takes a test instance to save and extract information about the test. An additional method emulating a test.py test instance was added to avoid rewriting the resource gather instance. Together, these changes make it possible to get metrics for a test in both runners: test.py and pytest.

Closes scylladb/scylladb#24091

* github.com:scylladb/scylladb:
  test.py: add missing parameter for boost tests for pytest runner
  test.py: add support for boost_data_test_case in combined tests
  test.py: clean log files after a successful run
  test.py: attach output of the boost test to the report
  test.py: fix metrics DB location
  test.py: move run_process to resource_gather.py
  test.py: unify using constant for finding repo root directory
  test.py: refactor run_process in facade.py
  test.py: add the possibility to create a test alike object
2025-05-22 10:34:34 +03:00
Evgeniy Naydanov
8dc5413f54 test.py: dtest: set default wait_seconds based on build mode
Default timeout was missed in `ScyllaNode.wait_until_stopped()` method.
Set it to 600 for debug mode or to 127 otherwise.
2025-05-22 06:39:03 +00:00
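The default described in 8dc5413f54 amounts to a small selection rule. A minimal Python sketch follows; `default_wait_seconds` and its parameters are hypothetical, and only the values 600 and 127 come from the commit message.

```python
def default_wait_seconds(build_mode, wait_seconds=None):
    """Return an explicit timeout if given, else a default based on build mode."""
    if wait_seconds is not None:
        return wait_seconds
    # Values from the commit message: 600 for debug mode, 127 otherwise.
    return 600 if build_mode == "debug" else 127
```

An explicit `wait_seconds` still takes precedence in this sketch; the build mode only matters when the caller passes nothing.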
Evgeniy Naydanov
eca5d52f1d test.py: dtest: name nodes in cluster using index starting from 1
The current implementation of the dtest shim uses `server_id` to format a
name of a node in a cluster. This is a difference in behavior with dtest.
Some of dtests use code like `cluster.nodes()["node1"]` to get access
to a node object.  This commit changes it to be more consistent with
dtest.
2025-05-22 06:34:03 +00:00