Commit Graph

47142 Commits

Author SHA1 Message Date
David Garcia
209ea2ea27 docs: update issues label
Closes scylladb/scylladb#23304
2025-03-20 17:46:58 +03:00
Kefu Chai
c37149d106 test: stop using seastar::at_exit()
seastar::at_exit() was marked deprecated recently. so let's use
the recommended approach to perform cleanups.

the following tests were updated in this change:

- scylla perf-tablets: tested with
  scylla perf-tablets
- scylla perf-row-cache-update: tested with
  scylla perf-row-cache-update
- scylla perf-fast-forward: tested with
  scylla perf-fast-forward --populate --run-tests small-partition-skips \
    --smp 1
  scylla perf-fast-forward --run-tests small-partition-skips \
    --smp 1
- scylla perf-load-balancing: tested with
  scylla perf-load-balancing --nodes 3 --tablets1 16 --tablets2 16 --rf1 3 --rf2 3 --shards 16
- unit/row_cache_stress_test: tested with
  row_cache_stress_test --seconds 10
- perf/perf_cache_eviction: tested with
  ./perf_cache_eviction --seconds 1 --smp 1
- perf/perf_row_cache_reads: tested with
  ./perf_row_cache_reads

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23356
2025-03-20 17:44:57 +03:00
Ernest Zaslavsky
2fb5c7402e s3_client: Rearrange credentials providers chain
As the IAM role is not configured to assume a role at this moment, it
makes sense to move the instance metadata credentials provider up in
the chain. This avoids unnecessary network calls and prevents log
clutter caused by failure messages.

Closes scylladb/scylladb#23360
2025-03-20 17:43:04 +03:00
Pavel Emelyanov
23089e1387 Merge 'Enhance S3 client robustness' from Ernest Zaslavsky
This PR introduces several key improvements to bolster the reliability of our S3 client, particularly in handling intermittent authentication and TLS-related issues. The changes include:

1. **Automatic Credential Renewal and Request Retry**: When credentials expire, the new retry strategy now resets the credentials and sets the client to the retryable state, so the client re-authenticates and automatically retries the request. This change prevents transient authentication failures from propagating as fatal errors.
2. **Enhanced Exception Unwrapping**: The client now extracts the embedded std::system_error from std::nested_exception instances that may be raised by the Seastar HTTP client when using TLS. This allows for more precise error reporting and handling.
3. **Expanded TLS Error Handling**: We've added support for retryable TLS error codes within the std::system_error handler. This modification enables the client to detect and recover from transient TLS issues by retrying the affected operations.

Together, these enhancements improve overall client robustness by ensuring smoother recovery from both credential and TLS-related errors.

No backport needed since it is an enhancement

Closes scylladb/scylladb#22150

* github.com:scylladb/scylladb:
  aws_error: Add GNU TLS codes
  s3_client: Handle nested std::system_error exceptions
  s3_client: Start using new retry strategy
  retry_strategy: Add custom retry strategy for S3 client
  retry_strategy: Make `should_retry` awaitable
2025-03-20 16:52:20 +03:00
Pavel Emelyanov
339a849f13 transport: Remove connection::make_client_key()
It's effectively unused, there's one place where connection initializes
the client_data object using this helper, but that initialization looks
better without it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23321
2025-03-20 10:22:05 +01:00
Calle Wilund
5cc3fc4f14 cluster/test_encryption: bring test from enterprise (and enable)
Fixes scylladb/scylla-enterprise#5262

Part of the source-available code migration from scylla-enterprise.git
to scylla.git.

Original comment: topology_custom: add test_file_streaming_respects_encryption

Reproducer for issue scylladb/scylla-enterprise#4246.

Closes scylladb/scylladb#23320
2025-03-20 10:07:16 +02:00
Kefu Chai
ebf9125728 storage_proxy: Prevent integer overflow in abstract_read_executor::execute
Fix UBSan abort caused by integer overflow when calculating time difference
between read and write operations. The issue occurs when:
1. The queried partition on replicas is not purgeable (has no recorded
   modified time)
2. Digests don't match across replicas
3. The system attempts to calculate timespan using missing/negative
   last_modified timestamps

This change skips cross-DC repair optimization when write timestamp is
negative or missing, as this optimization is only relevant for reads
occurring within write_timeout of a write.
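
The guard described above can be sketched roughly as follows. This is an illustration only, not the code in service/storage_proxy.cc: the function names and the plain int64_t timestamps are assumptions, and the "missing" marker is inferred from the minimum int64_t value that appears in the UBSan report.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Assumed marker for "no recorded modification time" (hypothetical name).
constexpr std::int64_t missing_timestamp = std::numeric_limits<std::int64_t>::min();

// Returns read_ts - last_modified, or nullopt when either timestamp is
// negative or missing. With both operands non-negative, the subtraction
// cannot overflow int64_t.
std::optional<std::int64_t> elapsed_since_write(std::int64_t read_ts, std::int64_t last_modified) {
    if (read_ts < 0 || last_modified < 0) {
        return std::nullopt;
    }
    return read_ts - last_modified;
}

// Hypothetical predicate: the optimization is only considered for reads
// that occur within write_timeout of the write.
bool read_is_within_write_timeout(std::int64_t read_ts, std::int64_t last_modified,
                                  std::int64_t write_timeout) {
    const auto elapsed = elapsed_since_write(read_ts, last_modified);
    return elapsed && *elapsed >= 0 && *elapsed <= write_timeout;
}
```

Checking the sign of both operands before subtracting keeps the arithmetic inside the representable range, which is why the sketch skips the optimization instead of clamping the result.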

Error details:
```
service/storage_proxy.cc:5532:80: runtime error: signed integer overflow: -9223372036854775808 - 1741940132787203 cannot be represented in type 'int64_t' (aka 'long')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior service/storage_proxy.cc:5532:80
Aborting on shard 1, in scheduling group sl:default
```

Related to previous fix 39325cf which handled negative read_timestamp cases.

Fixes #23314
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23359
2025-03-20 10:05:42 +02:00
Botond Dénes
d06bc27979 Merge 'Don't export string filenames from sstable' from Pavel Emelyanov
There are several sstring-returning methods on class sstable that return paths to files. Mostly these are used to print them into logs, and sometimes they are put into exception messages. And there are places that use these strings as file names. Since sstables can now also be stored on S3, generic code shouldn't treat those strings as on-disk file names.

Other than that, even when the methods are used to put component names into logs, in many cases these log messages come with debug or trace level, so generated strings are immediately dropped on the floor, but generating it is not extremely cheap. Code would benefit from using lazily-printed names.

This change introduces the component_name struct that wraps sstable reference and component ID (which is a numerical enum of several items). When printed, the component_name formatter calls the aforementioned filename generation, thus implementing lazy printing. And since there's no automatic conversion of component_name-s into strings, all the code that treats them as file paths, becomes explicit.
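
A minimal sketch of the lazy-printing idea follows. It is not the ScyllaDB implementation: the `sstable` stand-in, `generate_filename()`, the `names_generated` counter, and the toy `log_debug()` are all hypothetical, and a stream `operator<<` is used in place of the fmt formatter mentioned above so that the example depends only on the standard library.

```cpp
#include <ostream>
#include <sstream>
#include <string>

enum class component_type { TOC, Data, Index };

// Hypothetical stand-in for the real sstable class.
struct sstable {
    std::string prefix;
    mutable int names_generated = 0;  // counts the (assumed expensive) name generations

    std::string generate_filename(component_type t) const {
        ++names_generated;
        switch (t) {
        case component_type::TOC:   return prefix + "-TOC.txt";
        case component_type::Data:  return prefix + "-Data.db";
        case component_type::Index: return prefix + "-Index.db";
        }
        return prefix;
    }
};

// Wraps an sstable reference and a component ID; no string is built here.
struct component_name {
    const sstable& sst;
    component_type type;
};

// The name is generated only when the wrapper is actually printed.
std::ostream& operator<<(std::ostream& os, const component_name& n) {
    return os << n.sst.generate_filename(n.type);
}

// Explicit conversion for code that really needs a string.
std::string to_string(const component_name& n) {
    std::ostringstream os;
    os << n;
    return os.str();
}

// Toy logger: the argument is formatted only if the level is enabled.
template <typename T>
void log_debug(bool debug_enabled, std::ostream& out, const T& value) {
    if (debug_enabled) {
        out << value << '\n';
    }
}
```

Because `component_name` has no implicit conversion to a string, any caller that wants a path has to call `to_string()` explicitly, which is the property the description above relies on.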

refs: #14122 (previous ugly attempt to achieve the same goal)

Closes scylladb/scylladb#23194

* github.com:scylladb/scylladb:
  sstable: Remove unused malformed_sstable_exctpion(string filename)
  sstables: Make filename() return component_name
  sstables: Make file_writer keep component_name on board
  sstables: Make get_filename() return component_name
  sstables: Make toc_filename() return component_name
  sstables: Make sstable::index_filename() return component_name
  sstables: Introduce struct component_name
  sstables: Remove unused sstable::component_filenames() method
  sstables: Do not print component filenames on load-and-stream wrap-up
  sstables: Explicitly format prefix in S3 object name making
  sstables: Don't include directory name in exception
  sstables: Use fmt::format instead of string concatenation
  sstables: Rename filename($component) calls to ${component}_filename()
  sstables: Rename local filename variable to component_name
2025-03-20 09:51:03 +02:00
Avi Kivity
a62ab824e6 schema: deprecate schema_extension
schema_extension allows making invisible changes to system_schema
that evade upgrade rollback tests. They appear in system_schema
as an encoded blob which reduces serviceability, as they cannot
be read.

Deprecate it and point users to adding explicit columns in scylla_tables.

We could probably make use of the data structure, after we teach it
to encode its payload into proper named and typed columns instead of
using IDL.

Closes scylladb/scylladb#23151
2025-03-19 20:36:16 +02:00
Kefu Chai
8fdaaf6491 service/storage_proxy: Improve digest comparison
Previously, the code used a find_if to compare each digest to the first
one to check for any mismatches. This was less readable. This change
replaces that with `std::ranges::all_of`, which checks if all elements
in the range are equal to the first digest, improving readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23332
2025-03-19 18:21:14 +03:00
Nadav Har'El
317de64281 test/alternator: enable debugging output during Python crashes
For a long time now, we've been seeing (see #17564), once in a while,
Alternator tests crashing with the Python process getting killed on
SIGSEGV after the tests have already finished successfully and all
pytest had to do is exit. We have not been able to figure out where the
bug is. Unfortunately, we've never been able to reproduce this bug
locally - and only rarely we see it in CI runs, and when it happens
we don't have any information on why it happened.

So the goal of this patch is to print more information that might
hopefully help us next time we see this problem in CI (this patch
does NOT fix the bug). This patch adds to test/alternator's conftest.py
a call to faulthandler.enable(). This traps SIGSEGV and prints a stack
trace (for each thread, if there are several) showing what Python was
trying to do while it is crashing. Hopefully we'll see in this output
some specific cleanup function belonging to boto3 or urllib or whatever,
and be able to figure out where the bug is and how to avoid it.

We could have added this faulthandler.enable() call to the top-level
conftest.py or to test.py, but since we only ever had this Python
crash in Alternator tests, I think it is more suitable that we limit
this desperate debugging attempt only to Alternator tests.

Refs #17564

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23340
2025-03-19 18:18:51 +03:00
Pavel Emelyanov
73187a2e19 Merge 'mutation/mutation_consumer_concepts: simplify consumer hierarchy' from Botond Dénes
The reader consumer concept hierarchy is a sprawling, confusing jungle of deeply nested concepts. Looking at `FlattenedConsumer[V2]` -- the subject of this PR: this consumer is defined in terms of the `StreamedMutationConsumer[V2]`, which in turn is defined in terms of the `FragmentConsumer[V2]`.
This amount of nesting makes it really hard to see what a concept actually comes down to: made even more difficult by the fact that the concepts are scattered across two header files.
In theory, this nesting allows for greater flexibility: some code can use a lower-level concept directly while it can also serve as the basis for the higher-level concepts. But the fact of the matter is that none of the lower-level concepts are used directly, so we pay the price in hard-to-follow code for no benefit.

This PR cuts down the complexity by folding up the entire hierarchy into the top-level `FlattenedConsumer[V2]` and `FlattenedConsumerReturning[V2]` concepts.
Doing this immediately reveals just how similar the two major consumer concepts (`FlattenedConsumer[V2]` and `MutationFragmentConsumer[V2]`) supported by `mutation_reader` are. In a follow-up PR, we will attempt to unify the two.

Refactoring, no backport needed.

Closes scylladb/scylladb#23344

* github.com:scylladb/scylladb:
  mutation: fold FragmentConsumer[V2] into FlattenedConsumer[V2]
  mutation: fold StreamedMutationConsumer[V2] into FlattenedConsumer[V2]
  test/lib/fragment_scatterer: s/StreamedMutationConsumer/FlattenedConsumer/
2025-03-19 15:43:00 +03:00
Pavel Emelyanov
a408a7abe1 sstable: Remove unused malformed_sstable_exctpion(string filename)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 13:03:29 +03:00
Pavel Emelyanov
f06cc32812 sstables: Make filename() return component_name
Similarly to the toc_, index_ and data filenames, make the generic component
name getter return a wrapper object instead of a string. Most of the
callers are log messages and exception generation. Other than that,
there are tests, the filesystem storage driver and a few more places in
generic code that "know" that they work with real files, so make them use
explicit fmt::to_string().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 13:03:29 +03:00
Pavel Emelyanov
68c41f0459 sstables: Make file_writer keep component_name on board
The class in question is a wrapper around output_stream that writes,
flushes and closes the stream in async context. For logging it also
keeps the component filename on board, and now it's a good time to patch
it to keep the component_name instead.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 13:03:29 +03:00
Pavel Emelyanov
1ba91e28cb sstables: Make get_filename() return component_name
Similarly to previous patches -- mostly the result is used as log
argument. The remaining users include

- scylla sstable tool that dumps component names to json output
- API endpoint that returns component names to user
- tests

these are all good to explicitly convert component_names to strings.

There are few more places that expect strings instead of component name
objects. For now they also use fmt::to_string() explicitly, partially it
will be fixed later, mostly -- as future follow-ups.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 13:03:29 +03:00
Pavel Emelyanov
0cdeed858c sstables: Make toc_filename() return component_name
Most of the callers use the returned value as log message parameter,
some construct malformed_sstable_exception that was prepared by previous
patch.

The remaining callers explicitly use fmt::to_string(), these are

- pending deletion log creation
- filesystem storage code
- tests
- stream-blob code that re-loads sstable

All but the last one are OK to use string toc name, the last one is not
very correct in its usage of toc_filename string, but it needs more care
to be fixed properly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 13:03:29 +03:00
Pavel Emelyanov
80e0030613 sstables: Make sstable::index_filename() return component_name
Most of the method callers use it as log parameter. There are few more
places that push it to malformed_sstable_exception, which immediately
converts it to string, so this patch makes the exception constructible
from the component_name as well.

And there's one more place that passes this string to file_writer
constructor. For now, convert it to string explicitly, but next patches
will fix that place to use pure component_name too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 13:01:23 +03:00
Pavel Emelyanov
dbb9ee15c1 sstables: Introduce struct component_name
The structure wraps a const reference to the sstable and a component type value
(it's an enum of several elements). It also has a formatter so that it
can be directly printed in logs (main usage) as well as converted to
strings (auxiliary and discouraged usage).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 12:45:21 +03:00
Pavel Emelyanov
aba400f5d9 sstables: Remove unused sstable::component_filenames() method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 12:45:21 +03:00
Pavel Emelyanov
24e5c30cc8 sstables: Do not print component filenames on load-and-stream wrap-up
When load-and-stream finishes it may call sstable::unlink() method to
drop the loaded (and streamed) sstable. Before calling it it prints a
log message about its intention that includes component_filenames()
vector. This log message is ugly in several ways.

First, it prints only recognized components, while unlink() method
unlinks all of them, so it's sort of misleading (it doesn't seem that
anyone ever read this message IRL though)

Next, that's the only place that is _that_ verbose about sstable
unlinking. "Common" unlinking paths don't print that much info.

Finally, the log message happens at debug level, so it hardly ever
appears in any logs, but collecting several filenames takes time.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 12:45:21 +03:00
Pavel Emelyanov
fb2bd91009 sstables: Explicitly format prefix in S3 object name making
Sometimes a component object name looks like
s3://bucket/prefix/component. For that the path formatting code formats
bucket name with the result of sstable->filename() invocation. This
patch changes it to format bucket name, prefix itself and
sstable->component_filename().

The change preserves behavior, as sstable::filename() just concatenates the prefix
with sstable::component_filename(). This change will help to remove the
former method from sstable soon.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 12:45:21 +03:00
Pavel Emelyanov
f212b5efa9 sstables: Don't include directory name in exception
When filesystem storage throws an exception about failure to create
components hardlinks, it includes three paths into it -- source file
name, destination file name and the directory name. The directory name
is excessive, source file name already has it. Also, this change will
make it possible to remove one of malformed_sstable_exception
constructors soon.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 12:45:21 +03:00
Pavel Emelyanov
a8bc81eb3c sstables: Use fmt::format instead of string concatenation
There are some places that concatenate filenames with something else to
get different filename (tool does it) or message for exception
(read_toc() helper). This patch uses fmt::format() instead to facilitate
future patching.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 12:45:21 +03:00
Pavel Emelyanov
dcc9167734 sstables: Rename filename($component) calls to ${component}_filename()
There's a generic sstable::filename(component_type) method that returns
a file name for the given component. For "popular" components, namely
TOC, Data and Index there are dedicated sstable methods to get their
names. Fix existing callers of the generic method to use the former.
It's shorter, nicer and makes further patching simpler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 12:45:21 +03:00
Pavel Emelyanov
e6898a8854 sstables: Rename local filename variable to component_name
This is to be consistent with future changes and not to bloat them with
extra renames

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 12:45:20 +03:00
Kefu Chai
1ab2b7e7a0 tree: fix misspellings
these two misspellings were flagged by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23357
2025-03-19 09:13:20 +02:00
Botond Dénes
8f0d0daf53 Merge 'repair: allow concurrent repair and migration of two different tablets' from Aleksandra Martyniuk
Do not hold erm during repair of a tablet that is started with tablet
repair scheduler. This way two different tablets can be repaired
and migrated concurrently. The same tablet won't be migrated while
being repaired as it is provided by topology coordinator.

Use topology_guard to maintain safety.

Fixes: https://github.com/scylladb/scylladb/issues/22408.

Needs backport to 2025.1 that introduces the tablet repair scheduler.

Closes scylladb/scylladb#22842

* github.com:scylladb/scylladb:
  test: add test to check concurrent tablets migration and repair
  repair: do not hold erm for repair scheduled by scheduler
  repair: get total rf based on current erm
  repair: make shard_repair_task_impl::erm private
  repair: do not pass erm to put_row_diff_with_rpc_stream when unnecessary
  repair: do not pass erm to flush_rows_in_working_row_buf when unnecessary
  repair: pass session_id to repair_writer_impl::create_writer
  repair: keep materialized topology guard in shard_repair_task_impl
  repair: pass session_id to repair_meta
2025-03-19 08:55:24 +02:00
Kefu Chai
aca00118fb service: fix misspellings
these misspellings were flagged by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23334
2025-03-18 22:21:45 +02:00
Piotr Dulikowski
2ca1c0b6f9 Merge 'introduce the new Raft-based recovery procedure for group 0 majority loss' from Patryk Jędrzejczak
This PR introduces the new Raft-based recovery procedure for group 0
majority loss.

The Raft-based recovery procedure works with tablets. The old
gossip-based recovery procedure does not because we have no code
for tablet migrations after the gossip-based topology changes.

The Raft-based procedure requires the Raft-based topology to be
enabled in the cluster. If the Raft-based topology is not enabled, the
gossip-based procedure must be used.

We will be able to get rid of the gossip-based procedure when we make
the Raft-based topology mandatory (we can do both in the same version,
2025.2 is the plan). Before we do it, we will have to keep both procedures
and explain when each of them should be used.

The idea behind the new procedure is to recreate group 0 without
touching the topology structures. Once we create a new group 0, we
can remove all dead nodes using the standard `removenode` and
`replace` operations.

For the procedure to be safe, we must ensure that each member of the
new group 0 moves to the same initial group 0 state. Also, the only safe
choice for the state is the latest persistent state available among the
live nodes.

The solution to the problem above is to ensure that the leader of the new
group 0 (called the recovery leader) is one of the nodes with the latest
state available. Other members will receive the snapshot from the
recovery leader when they join the new group 0 and move to its state.

Below is the shortened description of the new recovery procedure from
the perspective of the administrator. For the full description, refer to the
design document.
1. Find the set of live nodes.
2. Kill any live node that shouldn't be a member of the new group 0.
3. Ensure the full network connectivity between live nodes.
4. Rolling restart live nodes to ensure they are healthy and ready for
recovery.
5. Check if some data could have been lost. If yes, restore it from
backup after the recovery procedure.
6. Find the recovery leader (the node with the largest `group0_state_id`).
7. Remove `raft_group_id` from `system.scylla_local` and truncate
`system.discovery` on each live node.
8. Set the new scylla.yaml parameter, `recovery_leader`, to Host ID of the
recovery leader on each live node.
9. Rolling restart all live nodes, but the recovery leader must be
restarted first.
10. Remove all dead nodes using `removenode` or `replace`.
11. Unset `recovery_leader` on all nodes.
12. Delete data of the old group 0 from `system.raft`,
`system.raft_snapshots`, and `system.raft_snapshot_config`.

In the future, we could automate some of these steps or even introduce
a tool that will do all (or most) of them by itself. For now, we are fine with
a procedure that is reliable and simple enough.

This PR makes using 2025.1 with tablets much safer. We want to
backport it to 2025.1. We will also want to backport a few follow-ups.

Fixes scylladb/scylladb#20657

Closes scylladb/scylladb#22286

* github.com:scylladb/scylladb:
  test: mark tests with the gossip-based recovery procedure
  test: add tests for the Raft-based recovery procedure
  test: topology: util: fix the tokens consistency check for left nodes
  test: topology: util: extend start_writes
  gossip: allow group 0 ID mismatch in the Raft-based recovery procedure
  raft_group0: modify_raft_voter_status: do not add new members
  treewide: allow recreating group 0 in the Raft-based recovery procedure
2025-03-18 19:10:56 +01:00
Yaron Kaikov
b375222408 ./github/scripts/auto-backport.py: don't remove backport label when backport process has an error
Today, when the `Fixes` prefix is missing or the developer is not a collaborator with `scylladbbot` we remove the backport labels to prevent the process from starting and notifying the developers.

Developers are worried that removing these backport labels will cause us to forget we need to do these backports. @nyh suggested to add a `scylladbbot/backport_error` label instead

Applied those changes, so when a `Fixes` prefix is missing we will add a `scylladbbot/backport_error` label and stop the process

When a user doesn't accept the invite we will still open the PR but he will not be assigned and will not be able to edit the branch when we have conflicts

Fixes: https://github.com/scylladb/scylla-pkg/issues/4898
Fixes: https://github.com/scylladb/scylla-pkg/issues/4897

Closes scylladb/scylladb#23259
2025-03-18 16:19:09 +02:00
Pavel Emelyanov
420b5bee20 test/s3: Increase boost/s3_test log levels
When something goes wrong, it's impossible to find anything out without
s3 and http logs, so increase them for boost tests.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23245
2025-03-18 15:59:05 +02:00
Botond Dénes
a2d0d7b9a0 mutation: fold FragmentConsumer[V2] into FlattenedConsumer[V2]
FragmentConsumer[V2] also has no direct users, so fold it into
FlattenedConsumer[V2] as well. With this, FlattenedConsumer[V2] has a
nice and simple definition, with a single nesting level required due to
the return-type flexibility.
2025-03-18 09:24:49 -04:00
Botond Dénes
8768e2e08e mutation: fold StreamedMutationConsumer[V2] into FlattenedConsumer[V2]
No code uses StreamedMutationConsumer[V2] directly, so let's take this
opportunity to reduce the jungle of consumer concepts.
2025-03-18 09:24:44 -04:00
Botond Dénes
969b07fdfd test/lib/fragment_scatterer: s/StreamedMutationConsumer/FlattenedConsumer/
The class actually implements the FlattenedConsumer, so fix the comment.
This eliminates the only reference to the StreamedMutationConsumer
concept.
2025-03-18 07:57:04 -04:00
Avi Kivity
9867129c7b Update seastar submodule
* seastar 412d058cf9...2f13c461bb (2):
  > smp: prefaulter: don't leave zombie worker threads
Fixes #23316
  > demos/tcp_sctp_server_demo:  Modernize with seastar::async and proper teardown

Closes scylladb/scylladb#23317
2025-03-18 13:36:05 +02:00
Botond Dénes
2795d83b32 Merge 'commitlog: Serialize file deletion and distribute replayed segments' from Calle Wilund
Fixes #23017

When deleting segments while our footprint is over the limit, mainly when recycling/deleting segments after replay (recover boot) we can cause two deletion passes to be running at the same time. This is because delete is triggered by either

a.) replay release
b.) timer check (explicit)
c.) timer initiated flush callback

where the last one is in fact not even waited for. If we are considering many files for delete/recycle, we can, due to task switch, end up considering segments ok to keep, in parallel, even though one of them should be deleted. The end result will be us keeping one more segment than should be allowed.

Now, eventually, this should be released, once we do deletion again, but this can take a while.

Solution is to simply ensure we serialize deletion. This might cause some delay in processing cycles for recycle, but in practice, this should never happen when we are in fact under pressure.

As noted in the issue above, when replaying a large commitlog from an unclean node, we can cause shard 0 db commitlog to reach footprint limit, and then remain there (because we never release segments lower than limit). This is wasteful with diskspace. But deleting segments early here is also wasteful; a better solution is to simply give the segments to all CL shards, thus distributing the available space.

Closes scylladb/scylladb#23150

* github.com:scylladb/scylladb:
  main/commitlog: wait for file deletion and distribute recycled segments to shards
  commitlog: Serialize file deletion
2025-03-18 11:47:17 +02:00
Avi Kivity
176bb464a2 github: error if we see #include "seastar/..."
Seastar is a system library from ScyllaDB's perspective and
so should use angle brackets for #include statements.

Closes scylladb/scylladb#23308
2025-03-17 21:56:48 +02:00
Ernest Zaslavsky
08b9e4d87b aws_error: Add GNU TLS codes
Add GNU TLS error codes to std::system_error handler since we can start getting these once they seep from seastar's http client
2025-03-17 16:38:14 +02:00
Ernest Zaslavsky
012f0e6d8c s3_client: Handle nested std::system_error exceptions
Enhance error handling by detecting and processing std::system_error exceptions
nested within std::nested_exception. This improvement ensures that system-level
errors wrapped in the exception chain are properly caught and managed, leading
to more robust error reporting and recovery.
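
One way to dig a std::system_error out of a nested exception chain, using only the standard library, is sketched below. The function names are hypothetical and this is not the s3_client code; the demo helper merely reproduces the "system_error wrapped in another exception" situation described above.

```cpp
#include <exception>
#include <optional>
#include <stdexcept>
#include <system_error>

// Walks the chain of nested exceptions and returns the error code of the
// first std::system_error found, if any.
std::optional<std::error_code> find_system_error(const std::exception& e) {
    if (const auto* se = dynamic_cast<const std::system_error*>(&e)) {
        return se->code();
    }
    if (const auto* nested = dynamic_cast<const std::nested_exception*>(&e)) {
        if (std::exception_ptr inner = nested->nested_ptr()) {
            try {
                std::rethrow_exception(inner);
            } catch (const std::exception& inner_ex) {
                return find_system_error(inner_ex);
            } catch (...) {
                // Not derived from std::exception; nothing to inspect.
            }
        }
    }
    return std::nullopt;
}

// Demo: a system_error wrapped by a generic exception via throw_with_nested.
std::optional<std::error_code> demo_wrapped_connection_reset() {
    try {
        try {
            throw std::system_error(std::make_error_code(std::errc::connection_reset));
        } catch (...) {
            std::throw_with_nested(std::runtime_error("request failed"));
        }
    } catch (const std::exception& outer) {
        return find_system_error(outer);
    }
    return std::nullopt;
}
```

Checking `nested_ptr()` before rethrowing avoids the `std::terminate` call that `rethrow_nested()` makes when no nested exception was captured.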
2025-03-17 16:38:14 +02:00
Ernest Zaslavsky
367140a9c5 s3_client: Start using new retry strategy
* Previously, token expiration was considered a fatal error. With this change,
the `s3_client` uses new retry strategy that is trying to renew expired
creds
* Added related test to the `s3_proxy`
2025-03-17 16:38:14 +02:00
Ernest Zaslavsky
ed09614c27 retry_strategy: Add custom retry strategy for S3 client
Introduced a new retry strategy that extends the default implementation.
The should_retry method is overridden to handle a specific case for expired credential tokens.
When an expired token error is detected, the credentials are reset so it is expected that the client will re-authenticate, and the
original request is retried.
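
The shape of such a strategy can be sketched as follows. All names here (`error_kind`, `default_retry_strategy`, `credential_renewing_retry_strategy`, the reset callback) are hypothetical, and the sketch is synchronous, whereas the real `should_retry` is awaitable according to the neighboring commit.

```cpp
#include <functional>
#include <utility>

// Hypothetical error classification.
enum class error_kind { expired_token, throttled, access_denied };

// Hypothetical default strategy: retry throttling errors up to a limit.
class default_retry_strategy {
public:
    explicit default_retry_strategy(unsigned max_retries) : _max_retries(max_retries) {}
    virtual ~default_retry_strategy() = default;

    virtual bool should_retry(error_kind err, unsigned attempt) const {
        return attempt < _max_retries && err == error_kind::throttled;
    }

protected:
    unsigned _max_retries;
};

// Extends the default: an expired token resets the credentials (so the next
// attempt re-authenticates) and asks for a retry.
class credential_renewing_retry_strategy : public default_retry_strategy {
public:
    credential_renewing_retry_strategy(unsigned max_retries, std::function<void()> reset_credentials)
        : default_retry_strategy(max_retries)
        , _reset_credentials(std::move(reset_credentials)) {}

    bool should_retry(error_kind err, unsigned attempt) const override {
        if (err == error_kind::expired_token && attempt < _max_retries) {
            _reset_credentials();
            return true;
        }
        return default_retry_strategy::should_retry(err, attempt);
    }

private:
    std::function<void()> _reset_credentials;
};
```

Keeping the attempt limit in the expired-token branch prevents an endless loop if renewed credentials keep being rejected.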
2025-03-17 16:38:14 +02:00
Ernest Zaslavsky
26062c65e4 retry_strategy: Make should_retry awaitable
2025-03-17 16:36:26 +02:00
Avi Kivity
0e4b303339 tools: toolchain: regenerate for python3-pytest-asyncio 0.24
Fixes a bug related to load_scope="module".

python-driver fixed to version 3.28.2, as it looks like
3.29.0 regressed TLS handling [1]. In any case tools/cqlsh
fixes it to 3.28.2.

Optimized clang from

 https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz
 https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz

Ref #22960.

Fixes #23213

[1] https://github.com/scylladb/python-driver/issues/456

Closes scylladb/scylladb#23236
2025-03-17 15:41:55 +02:00
Botond Dénes
fda3486770 Merge 'Remove some excessive ks:cf -> table_id conversions in API and schema_tables' from Pavel Emelyanov
Actually, the main goal of this PR was to remove the parse_tables() helpers from api/ in favor of the more flexible (yet equally complex) parse_table_infos(), but it turned out that it also saves some lookups in database maps.

There are several places in API and schema_tables that have table_id at hand, but at some point drop it and carry keyspace and table names over to a place that maps ks:cf back to table_id and then uses it to find the table object. This PR keeps the table_id with the help of the table_info struct in those places. This change allows removing the aforementioned parse_tables() helpers from api/ and also saves a few lookups in database maps.

Removing parse_tables() from api/ is the continuation of a previous effort to reduce the set of helpers in api/ code that help handlers "parse" keyspace and table names; see #22742 and #21533.

Closes scylladb/scylladb#23216

* github.com:scylladb/scylladb:
  api: Remove the remaining parse_tables() overload
  database: Sanitize flush_tables_on_all_shards()
  schema_tables: Remove all_table_names()
  database: Make tables flushing helper use table_info-s, not names
  api: Make keyspace flush endpoint use parse_table_infos() (and a bit more)
  schema_tables,client_state: Switch to using all_table_infos()
  schema_tables: Tune up some methods to benefit from table_infos
  schema_tables: Introduce all_table_infos()
2025-03-17 15:40:41 +02:00
Pavel Emelyanov
6217124d1d s3/client: Make "expected" reply status truly optional
Currently when a client::make_request() is called it can pass
std::optional<status> argument indicating which status it expects from
server. In case status doesn't match, the request body handler won't be
called, the request will fail with unexpected status exception.

However, a disengaged expected status implicitly means that the requestor
expects the OK (200) status. This makes it impossible to make a query
whose return status is not known in advance, leaving it up to the handler
to check it.

The lower-level http client allows a disengaged expected status with the described
semantics -- the handler checks the status on its own. This behavior for the s3
client is needed for GET requests. The server can respond with OK or partial
content status depending on the Range header. If the header is absent or
is large enough for the requested object to fit into it, the status
would be OK, if the object is "trimmed" the status is partial content.
In the end of the day, requestor cannot "guess" the returning status in
advance and should check it upon response arrival.
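
The "truly optional" semantics can be sketched like this. The `status` enum, `handle_reply()`, and the handler signature are hypothetical simplifications; the real client works with asynchronous request and reply objects that are not shown in this log.

```cpp
#include <functional>
#include <optional>
#include <stdexcept>

// Hypothetical subset of HTTP reply statuses.
enum class status : int { ok = 200, partial_content = 206, not_found = 404 };

using reply_handler = std::function<void(status)>;

// If `expected` is engaged, a mismatching status fails the request before the
// handler runs. If it is disengaged, no default (such as 200) is assumed and
// the handler receives whatever status arrived.
void handle_reply(status actual, std::optional<status> expected, const reply_handler& handler) {
    if (expected && actual != *expected) {
        throw std::runtime_error("unexpected reply status");
    }
    handler(actual);
}
```

A GET with a Range header would pass a disengaged `expected` and let its handler accept both `status::ok` and `status::partial_content`.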

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23243
2025-03-17 15:34:58 +02:00
Botond Dénes
afa305ffb4 Merge 'perf/perf_sstable: stop using at_exit() ' from Kefu Chai
`seastar::at_exit()` was marked deprecated recently. so let's use the recommended approach to perform cleanups.

---

it's a cleanup, hence no need to backport.

Closes scylladb/scylladb#23253

* github.com:scylladb/scylladb:
  perf/perf_sstable: fix the indent
  perf/perf_sstable: stop using at_exit()
2025-03-17 15:30:10 +02:00
Andrei Chekun
d68e54c26d test.py: Remove reuse cluster in cluster tests
Pool is not aware of the cluster configuration, so it can return cluster
to the test that is not suitable for it. Removing reuse will remove such
possibility, so there will be less flaky tests.

Closes scylladb/scylladb#23277
2025-03-17 15:27:59 +02:00
Calle Wilund
1525cb2dba main/commitlog: wait for file deletion and distribute recycled segments to shards
Refs #23017

When replaying a large commitlog from an unclean node, we can cause shard 0
db commitlog to reach footprint limit, and then remain there (because we
never release segments lower than limit). This is wasteful with diskspace.
But deleting segments early here is also wasteful; A better solution is
to simply give the segments to all CL shards, thus distributing the available
space.
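
As a rough illustration of spreading replayed segments over shards, the sketch below hands them out round-robin. The round-robin policy, the function name, and the use of file names as segment handles are assumptions; the log only says that the segments are given to all CL shards.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Distributes segment names over `shard_count` shards in round-robin order.
// Returns one list per shard; an empty result if there are no shards.
std::vector<std::vector<std::string>> distribute_segments(
        const std::vector<std::string>& segments, std::size_t shard_count) {
    std::vector<std::vector<std::string>> per_shard(shard_count);
    if (shard_count == 0) {
        return per_shard;
    }
    for (std::size_t i = 0; i < segments.size(); ++i) {
        per_shard[i % shard_count].push_back(segments[i]);
    }
    return per_shard;
}
```

With this policy no shard ends up holding more than one segment above its fair share, so a single shard does not sit at the footprint limit while the others stay empty.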

v2:
* Do segment distribution using ranges. go c++23
2025-03-17 12:09:00 +00:00
Calle Wilund
4ed81e05bf commitlog: Serialize file deletion
Fixes #23017

When deleting segments while our footprint is over the limit,
mainly when recycling/deleting segments after replay (recover
boot) we can cause two deletion passes to be running at the same
time. This is because delete is triggered by either

a.) replay release
b.) timer check (explicit)
c.) timer initiated flush callback

where the last one is in fact not even waited for. If we are
considering many files for delete/recycle, we can, due to task
switch, end up considering segments ok to keep, in parallel,
even though one of them should be deleted. The end result
will be us keeping one more segment than should be allowed.
Now, eventually, this should be released, once we do deletion
again, but this can take a while.

Solution is to simply ensure we serialize deletion. This might
cause some delay in processing cycles for recycle, but in
practice, this should never happen when we are in fact under
pressure.

Small unit test included.
2025-03-17 12:09:00 +00:00