Commit Graph

48284 Commits

Author SHA1 Message Date
Ernest Zaslavsky
d2d69cbc8c s3_client: Stop retries in chunked download source
Disable retries for S3 requests in the chunked download source to
prevent duplicate chunks from corrupting the buffer queue. The
response handler now throws an exception to bypass the retry
strategy, allowing the next range to be attempted cleanly.

This exception is only triggered for retryable errors; unretryable
ones immediately halt further requests.
2025-07-01 18:45:17 +03:00
Ernest Zaslavsky
c75acd274c s3_client: Enhance test coverage for retry logic
Extend the S3 proxy to support error injection when the client
makes multiple requests to the same resource—useful for testing
retry behavior and failure handling.
2025-07-01 18:45:17 +03:00
Ernest Zaslavsky
ec59fcd5e4 s3_client: Add test for Content-Range fix
Introduce a test that accurately verifies the Content-Range
behavior, ensuring the previous fix is properly validated.
2025-07-01 18:45:17 +03:00
Ernest Zaslavsky
6d9cec558a s3_client: Fix missing negation
Restore a missing `not` in a conditional check that caused
incorrect behavior during S3 client execution.
2025-07-01 18:45:17 +03:00
Ernest Zaslavsky
e73b83e039 s3_client: Refine logging
Fix typo in log message to improve clarity and accuracy during
S3 operations.
2025-07-01 18:45:17 +03:00
Ernest Zaslavsky
f1d0690194 s3_client: Improve logging placement for current_range output
Relocated logging to occur after determining the `current_range`,
ensuring more relevant output during S3 client operations.
2025-07-01 18:45:17 +03:00
Yaron Kaikov
fd0e044118 Update ScyllaDB version to: 2025.4.0-dev 2025-07-01 11:33:20 +03:00
Jenkins Promoter
94d7c22880 Update pgo profiles - aarch64 2025-07-01 11:33:20 +03:00
Jenkins Promoter
7531fc72a6 Update pgo profiles - x86_64 2025-07-01 11:33:20 +03:00
Nadav Har'El
e12ff4d3ab Merge 'LWT: use tablet_metadata_guard' from Petr Gusev
This PR is a step towards enabling LWT for tablet-based tables.

It pursues several goals:
* Make it explicit that the tablet can't migrate after the `cas_shard` check in `selec_statement/modification_statement`. Currently, `storage_proxy::cas` expects that the client calls it on a correct shard -- the one which owns the partition key the LWT is running on. There reasons for that are explained in [this commit](f16e3b0491 (diff-1073ea9ce4c5e00bb6eb614154f523ba7962403a4fe6c8cd877d1c8b73b3f649)) message. The statements check the current shard and invokes `bounce_to_shard` if it's not the right one. However , the erm strong pointer is only captured in `storage_proxy::cas` and until that moment there is no explicit structure in the code which would prevent the ongoing migrations. In this PR we introduce such stucture -- `erm_handle`. We create it before the `cas_check` and pass it down to `storage_proxy::cas` and `paxos_response_handler`.
* Another goal of this PR is an optimization -- we don't want to hold erm for the duration of entire LWT, unless it directly affects the current tablet. The is a `tablet_metadata_guard` class which is used for long running tablet operations. It automatically switches to a new erm if the topology change represented by the new erm doesn't affect the current tablet. We use this class in `erm_handle` if the table uses tablets. Otherwise, `erm_handle` just stores erm directly.
* Fixes [shard bouncing issue in alternator](https://github.com/scylladb/scylladb/issues/17399)

Backport: not needed (new feature).

Closes scylladb/scylladb#24495

* github.com:scylladb/scylladb:
  LWT: make cas_shard non-optional in sp::cas
  LWT: create cas_shard in select_statement
  LWT: create cas_shard in modification and batch statements
  LWT: create cas_shard in alternator
  LWT: use cas_shard in storage_proxy::cas
  do_query_with_paxos: remove redundant cas_shard check
  storage_proxy: add cas_shard class
  sp::cas_shard: rename to get_cas_shard
  token_metadata_guard: a topology guard for a token
  tablet_metadata_guard: mark as noncopyable and nonmoveable
2025-07-01 11:33:20 +03:00
Gleb Natapov
a221b2bfde gossiper: do not assume that id->ip mapping is available in failure_detector_loop_for_node
failure_detector_loop_for_node may be started on a shard before id->ip
mapping is available there. Currently the code treats missing mapping
as an internal error, but it uses its result for debug output only, so
lets relax the code to not assume the mapping is available.

Fixes #23407

Closes scylladb/scylladb#24614
2025-07-01 11:33:20 +03:00
Pavel Emelyanov
26c7f7d98b Merge 'encryption_at_rest_test: Fix some spurious errors' from Calle Wilund
Fixes #24574

* Ensure we close the embedded load_cache objects on encryption shutdown, otherwise we can, in unit testing, get destruction of these while a timer is still active -> assert
* Add extra exception handling to `network_error_test_helper`, so even if test framework might exception-escape, we properly stop the network proxy to avoid use after free.

Closes scylladb/scylladb#24633

* github.com:scylladb/scylladb:
  encryption_at_rest_test: Add exception handler to ensure proxy stop
  encryption: Ensure stopping timers in provider cache objects
2025-07-01 11:33:20 +03:00
Avi Kivity
6aa71205d8 repair: row_level: unstall to_repair_rows_on_wire() destroying its input
to_repair_rows_on_wire() moves the contents of its input std::list
and is careful to yield after each element, but the final destruction
of the input list still deals with all of the list elements without
yielding. This is expensive as not all contents of repair_row are moved
(_dk_with_hash is of type lw_shared_ptr<const decorated_key_with_hash>).

To fix, destroy each row element as we move along. This is safe as we
own the input and don't reference row_list other than for the iteration.

Fixes #24725.

Closes scylladb/scylladb#24726
2025-07-01 11:33:19 +03:00
Pavel Emelyanov
6826856cf8 Merge 'test.py: Fix start 3rd party services' from Andrei Chekun
Move 3rd party services starting under `try` clause to avoid situation that main process is collapses without going stopping services.
Without this, if something wrong during start it will not trigger execution exit artifacts, so the process will stay forever.

This functionality in 2025.2 and can potentially affect jobs, so backport needed.

Closes scylladb/scylladb#24734

* github.com:scylladb/scylladb:
  test.py: use unique hostname for Minio
  test.py: Catch possible exceptions during 3rd party services start
2025-07-01 11:33:19 +03:00
Anna Stuchlik
9234e5a4b0 doc: add the SBOM page and the download link
This commit migrates the Software Bill Of Materials (SBOM) page
added to the Enterprise docs with https://github.com/scylladb/scylla-enterprise/pull/5067.

The only difference is the link to the SBOM files - it was Enterprise SBOM in the Enterprise docs,
while here is a link to the ScyllaDB SBOM.

It's a follow-up of migration to Source Avalable and should be backported
to all Source Available versions - 2025.1 and later.

Fixes https://github.com/scylladb/scylladb/issues/24730

Closes scylladb/scylladb#24735
2025-07-01 11:33:19 +03:00
Calle Wilund
8d37e5e24b encryption_at_rest_test: Add exception handler to ensure proxy stop
If boost test is run such that we somehow except even in a test macro
such as BOOST_REQUIRE_THROW, we could end up not stopping the net proxy
used, causing a use after free.
2025-06-30 11:36:38 +00:00
Calle Wilund
ee98f5d361 encryption: Ensure stopping timers in provider cache objects
utils::loading cache has a timer that can, if we're unlucky, be runnnig
while the encryption context/extensions referencing the various host
objects containing them are destroyed in the case of unit testing.

Add a stop phase in encryption context shutdown closing the caches.
2025-06-30 11:36:38 +00:00
Anna Stuchlik
b61641cf57 doc: remove support for Ubuntu 20.04
Fixes https://github.com/scylladb/scylladb/issues/24564

Closes scylladb/scylladb#24565
2025-06-30 12:33:29 +02:00
Andrei Chekun
c6c3e9f492 test.py: use unique hostname for Minio
To avoid situation that port is occupied on localhost, use unique
hostname for Minio
2025-06-30 12:03:06 +02:00
Andrei Chekun
0ca539e162 test.py: Catch possible exceptions during 3rd party services start
With this change if something will go wrong during starting services,
they are still will be shuted down on the finally clause. Without it can
hang forever
2025-06-30 12:00:23 +02:00
Petr Gusev
35aba76401 LWT: make cas_shard non-optional in sp::cas
We also make sp::cas_shard function local since it's now
not used directly by sp clients.
2025-06-30 10:37:33 +02:00
Petr Gusev
3d262d2be8 LWT: create cas_shard in select_statement
In this commit we create cas_shard in select_statement
and pass it to the sp::query_result function.
2025-06-30 10:37:33 +02:00
Petr Gusev
736fa05b17 LWT: create cas_shard in modification and batch statements
We create cas_shard before the shard check to protect
against concurrent tablet migrations.
2025-06-30 10:37:33 +02:00
Petr Gusev
7e64852bfd LWT: create cas_shard in alternator
We create cas_shard instance in shard_for_execute(). This implies that
the decision about the correct shard was made using the specific
token_metadata_guard, and it remains valid only as long as the guard
is held.

When forwarding a request to another shard, we keep the original
cas_shard alive. This ensures that the target shard
remains a valid owner for the given token.

Fixes scylladb/scylladb#17399
2025-06-30 10:37:33 +02:00
Petr Gusev
deb7afbc87 LWT: use cas_shard in storage_proxy::cas
Take cas_shard parameter in sp::cas and pass token_metadata_guard down to paxos_response_handler.

We make cas_shard parameter optional in storage_proxy methods
to make the refactoring easier. The sp::cas method constructs a new
token_metadata_guard if it's not set. All call sites pass null
in this commit, we will add the proper implementation in the next
commits.
2025-06-30 10:33:17 +02:00
Petr Gusev
94f0717a1e do_query_with_paxos: remove redundant cas_shard check
The same check is done in the sp::cas method.
2025-06-30 10:33:17 +02:00
Petr Gusev
43c4de8ad1 storage_proxy: add cas_shard class
The sp::cas method must be called on the correct shard,
as determined by sp::cas_shard. Additionally, there must
be no asynchronous yields between the shard check and
capturing the erm strong pointer in sp::cas. While
this condition currently holds, it's fragile and
easy to break.

To address this, future commits will move the capture of
token_metadata_guard to the call sites of sp::cas, before
performing the shard check.

As a first step, this commit introduces a cas_shard class
that wraps both the target shard and a token_metadata_guard
instance. This ensures the returned shard remains valid for
the given tablet as long as the guard is held.
In the next commits, we’ll pass a cas_shard instance
to sp::cas as a separate parameter.
2025-06-30 10:33:17 +02:00
Nadav Har'El
7db5e9a3e9 test/cqlpy: reproducer for decimal parsing with very high exponent
This patch adds tests reproducing issue #24581, where Scylla incorrectly
parsed "decimal"-type literals in CQL with very high exponents, near or
above the 32-bit limit.

For example, 1.1234e-2147483647 was incorrectly read as 1.1234E+2147483649,
while it should be (as we explain in comments in the test) an error.

The tests in this patch failed (in multiple checks) before #24581 was
fixed, and pass after it was fixed.

These tests all pass on Cassandra 3, confirming our understanding on the
limits of "decimal" to be correct. But they fail on Cassandra 4 and 5 due
to a regression https://issues.apache.org/jira/browse/CASSANDRA-20723
in Cassandra, that mistakenly limited "decimal" exponents to just 309.

Refs #24581

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#24646
2025-06-30 10:37:13 +03:00
Anna Stuchlik
b7683d0eba doc: remove duplicated content
This commit removes the Non-Reserved CQL Keywords and Reserved CQL Keywords pages-keyword
as that content is already covered on the Appendices page.
Redirections are added to avoid 404s for the removed pages.

In addition, the Appendices page title is extended with "Reserved CQL Keywords and Types"
to help users understand what those appendices are about.

Fixes https://github.com/scylladb/scylladb/issues/24319

Closes scylladb/scylladb#24320
2025-06-30 10:30:13 +03:00
Botond Dénes
ee6d7c6ad9 test/boost/memtable_test: only inject error for test table
Currently the test indiscriminately injects failures into the flushes of
any table, via the IO extension mechanism. The tests want to check that
the node correctly handles the IO error by self isolating, however the
indiscriminate IO errors can have unintended consequences when they hit
raft, leading to disorderly shutdown and failure of the tests. Testing
raft's resiliency to IO errors if of course worth doing, but it is not
the goal of this particular test, so to avoid the fallout, the IO errors
are limited to the test tables only.

Fixes: https://github.com/scylladb/scylladb/issues/24637

Closes scylladb/scylladb#24638
2025-06-30 10:08:49 +03:00
Avi Kivity
07c5edcc30 tools: add patchelf utility
We use patchelf to rewrite the dynamic loader (known as the interpreter)
of the binaries we ship, so we can point to our shipped dynamic loader,
which is compatible with our binaries, rather than rely on the distribution's
dynamic loader, which is likely to be incompatible.

Upstream patchelf losing compatibity [1] with Linux 5.17 and below.
This change was also picked up by Fedora 42, so we cannot update the
toolchain to that distribution until we have an alternative.

Here we add a minimal patchelf alternative. It was mostly written by
Claude. It is minimal in that it only supports --set-interpreter and
--print-interpreter, and works well enough for our needs. We still use
the original patchelf for --remove-rpath; this reduces our maintenance
needs.

[1] 43b75fbc9f
[2] 4b015255d1

Closes scylladb/scylladb#24695
2025-06-30 07:24:05 +03:00
Avi Kivity
e2cda38b0f Merge 'alternator: improve, document and test table/index name lengths' from Nadav Har'El
Whereas DynamoDB limits the names of tables, LSIs and GSIs to 255 characters each, Alternator currently has different (and lower) limitations:
 1. A table name must be up to 222 characters.
 2. For a GSI, the sum of the table's and GSI's name length, plus 1, must be up to 222 characters.
 3. For an LSI, the sum of the table's and LSI's name length, plus 2, must be up to 222 characters.

The first patch documents these existing limitations, improves their testing, and fixes a tiny bug found by one of the tests (where UpdateTable adding a GSI's limit testing is off by one).

The second patch unfortunately shows with a reproducer (issue #24598) this limit of 222 is problematic and we may need to lower it: If a user creates a table of length 222 and then enables Alternator streams, Scylla shuts down on an IO error. This will need to be fixed later, but at least this patch properly documents the existing behavior.

No need to backport this patch - it is a very minor improvement that it is unlikely users care about and there is no potential for harm.

Closes scylladb/scylladb#24597

* github.com:scylladb/scylladb:
  test/alternator: reproducer for streams bug with long table name
  alternator: improve, document and test table/index name lengths
2025-06-29 18:53:48 +03:00
Avi Kivity
b33dd2bd7d Merge 'sstables/mx/writer: handle non-full prefix row keys' from Botond Dénes
Although valid for compact tables, non-full (or empty) clustering key prefixes are not handled for row keys when writing sstables. Only the present components are written, consequently if the key is empty, it is omitted entirely.
When parsing sstables, the parsing code unconditionally parses a full prefix.
This mis-match results in parsing failures, as the parser parses part of the row content as a key resulting in a garbage key and subsequent mis-parsing of the row content and maybe even subsequent partitions.

Introduce a new system table: `system.corrupt_data` and infrastructure similar to `large_data_handler`: `corrupt_data_handler` which abstracts how corrupt data is handled. The sstable writer now passes rows such corrupt keys to the corrupt data handler. This way, we avoid corrupting the sstables beyond parsing and the rows are also kept around in system.corrupt_data for later inspection and possible recovery.

Add a full-stack test which checks that rows with bad keys are correctly handled.

Fixes: https://github.com/scylladb/scylladb/issues/24489

The bug is present in all versions, has to be backported to all supported versions.

Closes scylladb/scylladb#24492

* github.com:scylladb/scylladb:
  test/boost/sstable_datafile_test: add test for corrupt data
  sstables/mx/writer: handler rows with empty keys
  test/lib/cql_assertions: introduce columns_assertions
  sstables: add corrupt_data_handler to sstables::sstables
  tools/scylla-sstable: make large_data_handler a local
  db: introduce corrupt_data_handler
  mutation: introduce frozen_mutation_fragment_v2
  mutation/mutation_partition_view: read_{clustering,static}_row(): return row type
  mutation/mutation_partition_view: extract de-ser of {clustering,static} row
  idl-compiler.py: generate skip() definition for enums serializers
  idl: extract full_position.idl from position_in_partition.idl
  db/system_keyspace: add apply_mutation()
  db/system_keyspace: introduce the corrupt_data table
2025-06-29 18:18:36 +03:00
Avi Kivity
48d9f3d2e3 Merge 'mutation: check key of inserted rows' from Botond Dénes
Make sure the keys are full prefixes as it is expected to be the case for rows. At severeal occasions we have seen empty row keys make their ways into the sstables, despite the fact that they are not allowed by the CQL frontend. This means that such empty keys are possibly results of memory corruption or use-after-{free,copy} errors. The source of the corruption is impossible to pinpoint when the empty key is discovered in the sstable. So this patch adds checks for such keys to places where mutations are built: when building or unserializing mutations.

Fixes: https://github.com/scylladb/scylladb/issues/24506

Not a typical backport candidate (not a bugfix or regression fix), but we should still backport so we have the additional checks deployed to existing production clusters.

Closes scylladb/scylladb#24497

* github.com:scylladb/scylladb:
  mutation: check key of inserted rows
  compound: optimize is_full() for single-component types
2025-06-29 18:10:17 +03:00
Pavel Emelyanov
ef396ecf7a api: Reserve resulting vector with schema versions
The get_schema_versions handler gets unordered_map from storage service,
then converts it to API returning type, which is a vector. This vector
can be reserved, the final number of elements is known in advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24715
2025-06-29 14:37:45 +03:00
Nadav Har'El
50d370f06e test/alternator: reproducer for streams bug with long table name
The two tests in this patch reproduce issue #24598: When enabling
Alternator streams on an Alternator table with a very long name,
such as the maximum allowed name length 222, the result is an
I/O error and a Scylla shutdown.

The two tests are currently marked "skip", otherwise they would
crash the Scylla being tested.

Refs #24598

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-06-29 11:40:55 +03:00
Nadav Har'El
0ce0b2934f alternator: improve, document and test table/index name lengths
Whereas DynamoDB limits the names of tables, LSIs and GSIs to 255
characters each, Alternator currently has different (and lower)
limitations:
 1. A table name must be up to 222 characters.
 2. For a GSI, the sum of the table's and GSI's name length, plus 1,
    must be up to 222 characters.
 3. For an LSI, the sum of the table's and LSI's name length, plus 2,
    must be up to 222 characters.

These specific limitations were never documented, so in this patch we
add this information to docs/alternator/compatibility.md.

Moreover, these limitations where only partially tested, so in this patch
we add testing for more cases that we forgot to check - such as length
of LSI names (only GSI were checked before this patch), or adding a
GSI to an existing table. It is important to check all these corner
cases because there is a risk that if we attempt to create a table
without checking its length, we can end up with an I/O error that brings
down Scylla.

In one case - UpdateTable adding a GSI to an existing table - the new
test exposed a trivial bug: Because UpdateTable wants to verify the new
GSI doesn't have the same name as an existing LSI, it mistakenly applied
the LSI's length name limit instead of the GSI's name length limit,
which is one byte less than it should be. So this patch fixes this
trivial bug as well.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-06-29 11:40:55 +03:00
Emil Maskovsky
c6307aafd5 test.py: handle cancellation gracefully to avoid TypeError
Previously, if test execution was cancelled, `run_all_tests()` could
return `None`. This caused a `TypeError` when the result was
unconditionally unpacked into `total_tests_pytest, failed_pytest_tests`.

This commit updates the code to handle the cancellation appropriately,
preventing the confusing `TypeError` exception and ensuring clean
cancellation behavior.

Closes scylladb/scylladb#24624
2025-06-27 20:14:35 +03:00
Pavel Emelyanov
23d86ede72 Merge 'audit: introduce debug level logs on happy path' from Dario Mirovic
Audit component defines `audit` logger which it uses only for `error` and `info` logs,
regarding `audit` module initialization and errors during audit log writing.
This change introduces `debug` level logs on the happy path of audit log writes.

Fixes: https://github.com/scylladb/scylladb/issues/23773

No backport needed - this is a small quality-of-life improvement.

Closes scylladb/scylladb#24658

* github.com:scylladb/scylladb:
  audit: change audit test logger level to `debug`
  audit: introduce debug level logs on happy path
2025-06-27 20:10:54 +03:00
Anna Stuchlik
2367330513 doc: remove OSS mention from the SI notes
This commit removes a confusing reference to an Open Source version
form the Local Secondary Indexes page.

Fixes https://github.com/scylladb/scylladb/issues/24668

Closes scylladb/scylladb#24673
2025-06-27 20:07:51 +03:00
Anna Stuchlik
7537f5f260 doc: fix the headings in the Admin Guide
This commit fixes incorrect headings in the Admin Guide and the files
that are included in that guide.

The purpose is to properly organize the content and improve the search,
as well as prevent potential build problems caused by a poor heading organization.

Fixes https://github.com/scylladb/scylladb/issues/24441

Closes scylladb/scylladb#24700
2025-06-27 20:07:09 +03:00
Dario Mirovic
ec6249b581 audit: change audit test logger level to debug
Audit module tests should show the `debug` level messages.
This change makes audit_test.py `audit` module log level to `debug`.

Closes scylladb/scylladb#23773
2025-06-27 16:27:33 +02:00
Dario Mirovic
666364f651 audit: introduce debug level logs on happy path
Audit component defines `audit` logger which it uses only for `error` and `info` logs,
regarding `audit` module initialization and errors during audit log writing.
This change introduces `debug` level logs on the happy path of audit log writes.

Ref: scylladb/scylladb#23773
2025-06-27 16:27:27 +02:00
Botond Dénes
495f607e73 test/cluster/test_read_repair: write 100 rows in trace test
This test asserts that a read repair really happened. To ensure this
happens it writes a single partition after enabling the database_apply
error injection point. For some reason, the write is sometimes reordered
with the error injection and the write will get replicated to both nodes
and no read repair will happen, failing the test.
To make the test less sensitive to such rare reordering, add a
clustering column to the table and write a 100 rows. The chance of *all*
100 of them being reordered with the error injection should be low
enough that it doesn't happen again (famous last words).

Fixes: #24330

Closes scylladb/scylladb#24403
2025-06-27 16:23:08 +03:00
Pavel Emelyanov
4c0154f156 Merge 'test.py: enhance allure reporting' from Andrei Chekun
Add run ID for process output file to be not overwritten in the next case: first run failed, second passed. They are using the same name, so the second run will overwrite and delete the file. This will help to investigate in case of C++ test fails
Add attaching Scylla log files to allure report in case test failed. This is an alternative for link in JUnit report that exists in CI. That change will help to investigate the cluster tests fails. Example can be found in the failed [job](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/2980/allure/).

Backport is not needed, this is only framework enhancements

Closes scylladb/scylladb#24677

* github.com:scylladb/scylladb:
  test.py: Attach node logs in allure report in case of fail
  test.py: Add run id to the boost output file
2025-06-27 16:22:03 +03:00
Botond Dénes
e715a150b9 tools/scylla-nodetool: backup: add --move-files parameter
Allow opting in for backup to move the files instead of copying them.

Fixes: https://github.com/scylladb/scylladb/issues/24372

Closes scylladb/scylladb#24503
2025-06-27 16:21:39 +03:00
Piotr Dulikowski
9d70e7a067 Merge 'docs: document the new recovery procedure' from Patryk Jędrzejczak
We replace the documentation of the old recovery procedure with the
documentation of the new recovery procedure.

The new recovery procedure requires the Raft-based topology to be
enabled, so to remove the old procedure from the documentation,
we must assume users have the Raft-based topology enabled.
We can do it in 2025.2 because the upgrade guides to 2025.1 state that
enabling the Raft-based topology is a mandatory step of the upgrade.
Another reminder is the upgrade guides to 2025.2.

Since we rely on the Raft-based topology being enabled, we remove the
obsolete parts of the documentation.

We will make the Raft-based topology mandatory in the code in the
future, hopefully in 2025.3. For this reason, we also don't touch the
dev docs in this PR.

Fixes scylladb/scylladb#24530

Requires backport to 2025.2 because 2025.2 contains the new recovery
procedure.

Closes scylladb/scylladb#24583

* github.com:scylladb/scylladb:
  docs: rely on the Raft-based topology being enabled
  docs: handling-node-failures: document the new recovery procedure
2025-06-26 17:07:37 +02:00
Gleb Natapov
5f953eb092 storage_proxy: retry paxos repair even if repair write succeeded
After paxos state is repaired in begin_and_repair_paxos we need to
re-check the state regardless if write back succeeded or not. This
is how the code worked originally but it was unintentionally changed
when co-routinized in 61b2e41a23.

Fixes #24630

Closes scylladb/scylladb#24651
2025-06-26 17:06:02 +02:00
Andrei Chekun
2c726c5074 test.py: Attach node logs in allure report in case of fail
Currently, allure report have no nodes logs in case of fail, this will
allow to view the logs in one place without going anywhere else.
2025-06-26 15:37:33 +02:00
Piotr Dulikowski
2f7ed8b1d4 Merge 'Fix for cassandra role gets recreated after DROP ROLE' from Marcin Maliszkiewicz
This patchset fixes regression introduced by 7e749cd848 when we started re-creating default superuser role and password from the config, even if new custom superuser was created by the user.

Now we'll check, first with CL LOCAL_ONE if there is a need to create default superuser role or password, confirm
it with CL QUORUM and only then atomically create role or password.

If server is started without cluster quorum we'll skip creating role or password.

Fixes https://github.com/scylladb/scylladb/issues/24469
Backport: all versions since 2024.2

Closes scylladb/scylladb#24451

* github.com:scylladb/scylladb:
  test: auth_cluster: add test for password reset procedure
  auth: cache roles table scan during startup
  test: auth_cluster: add test for replacing default superuser
  test: pylib: add ability to specify default authenticator during server_start
  test: pylib: allow rolling restart without waiting for cql
  auth: split auth-v2 logic for adding default superuser password
  auth: split auth-v2 logic for adding default superuser role
  auth: ldap: fix waiting for underlying role manager
  auth: wait for default role creation before starting authorizer and authenticator
2025-06-26 14:36:25 +02:00