Commit Graph

2047 Commits

Author SHA1 Message Date
Ernest Zaslavsky
e56081d588 treewide: seastar module update and fix broken rest client
start using `write_body` in `rest/client` to properly set headers due to changes applied to seastar's http client

Seastar module update
```
b6be384e Merge 'http: generalize Content-Type setting' from Nadav Har'El
74472298 http: generalize request's Content-Type setting
9fd5a1cc http: generalize reply's Content-Type setting
a2665f38 memory: Remove deprecated enable_abort_on_allocation_failure()
d2a5a8a9 resource.cc: Remove some dead code
7ad9f424 http: Add support of multiple key repetitions for the request
a636baca task: Move task::get_backtrace() definition in its class
a0101efa Fixed "doxygen" spelling in error message
db969482 Merge 'http/reply: introduce set_cookie()' from Botond Dénes
5357b434 http/reply: introduce set_cookie()
1ddcf05f http/reply: make write_reply*() public
4b782d73 http/connection: start_response(): fix indentation
720feca0 http/reply: encapsulate reply writing in write_reply()
3e19917d Merge 'exceptions: log thrown and propagated exception with distinct log levels' from Botond Dénes
db9aea93 Merge 'Correctly wrap up abandoned yielding directory lister' from Pavel Emelyanov
dbb2bf3f test: Add test for input_stream::read_exactly()
a5308ec9 file/directory_lister: Correctly wrap up fallback generator
4f0811f4 file/directory_lister: Convert on-stack queue to shared pointer
59801da7 tests: Add directory lister early drop cases
33233032 http/reply: s/write_reply_to_connection/write_reply/
69b93620 http/reply: write_reply_{to_connection,headers}(): pass output stream
56e9bda7 test: Convert directory_test into seastar test
96782358 Merge 'Improve io_tester's seqwrite and append workloads' from Pavel Emelyanov
8b46e3d4 SEASTAR_ASSERT: assert to stderr and flush stream
3370e22a tutorial.md: use current_exception_as_future()
e977453a Add fixture support for seastar::testing
3e70d7f7 io_tester: Do not set append_is_unlikely unconditionally
2a4ae7b4 io_tester: Count file size overflows
5e678bb5 io_tester: Tuneup size overflow check
d5dad8ce io_tester: Move position management code to io_class_data
5586a056 io_tester: Rename seqwrite -> overwrite
92df2fb2 io_tester: Relax return value of create_and_fill_file()
03d9500d io_tester: Dont fill file for APPEND
d6844a7b io_tester: Indentation fix after previous patch
fb9e0088 io_tester: Coroutinize create_and_fill_file()
2f802f57 exceptions: log thrown and propagated exception with distinct log levels
4971fa70 util: move log-level into own header
39448fc1 Merge 'Fix and tune http::request setup by client' from Pavel Emelyanov
52d0c4fb iostream: Move output_stream::write(scattered_message) lower
7a52f734 Merge 'read_first_line: Missing pragma and licence' from Ernest Zaslavsky
d0881b7e read_first_line: Add missing license boilerplate
988a0e99 read_first_line:: Add missing `#pragma once`
42675266 http: Make client::make_request accept const request&
c7709fb5 http: Make request making API return exceptional future not throw
b68ed89b http: Move request content length header setup
1d96dac6 http: Move request version configuration
072e86f6 http: Setup request once
```

Closes scylladb/scylladb#25915

(cherry picked from commit 44d34663bc)

Closes scylladb/scylladb#26100
2025-09-19 11:40:59 +03:00
Avi Kivity
f6b6312cf4 Merge 'sstables/trie: prepare for integrating BTI indexes with sstable readers and writers' from Michał Chojnowski
This is yet another part in the BTI index project.

Overarching issue: https://github.com/scylladb/scylladb/issues/19191
Previous part: https://github.com/scylladb/scylladb/pull/25626
Next parts: introducing the new components, Partitions.db and Rows.db

This is the preparatory, uncontroversial part of https://github.com/scylladb/scylladb/pull/26039, which has been split out to a separate PR to make the main part (which, after a revision, will be posted later) smaller.

This series contains several small fixes and changes to BTI-related code added earlier, which either have to be done (i.e. propagating `reader_permit` to IO calls in index reads) or just deserved to be done. There's no single theme for the changes in this PR, refer to the individual commits for details.

The changes are for the sake of new and unreleased code. No backporting should be done.

Closes scylladb/scylladb#26075

* github.com:scylladb/scylladb:
  sstables/mx/reader: remove mx::make_reader_with_index_reader
  test/boost/bti_index_test: fix indentation
  sstables/trie/bti_index_reader: in last_block_offset(), return offset from the beginning of partition, not file
  sstables/trie: support reader_permit and trace_state properly
  sstables/trie/bti_node_reader: avoid calling into `cached_file` if the target position is already cached
  sstables/trie/bti_index_reader: get rid of the seastar::file wrapper in read_row_index_header
  sstables/trie/bti_index_reader: support BYPASS CACHE
  test/boost/bti_index_test: use read_bti_partitions_db_footer where appropriate
  sstables/trie: change the signature of bti_partition_index_writer::finish
  sstables/bti_index: improve signatures of special member functions in index writers
  streaming/stream_transfer_task: coroutinize `estimate_partitions()`
  types/comparable_bytes: add a missing implementation for date_type_impl
  sstables: remove an outdated FIXME
  storage_service: delete `get_splits()`
  sstables/trie: fix some comment typos in bti_index_reader.cc
  sstables/mx/writer: rename _pi_write_m.tomb to partition_tombstone
2025-09-18 12:10:27 +03:00
Pavel Emelyanov
65638232e8 Merge 'utils: azure: Catch system errors when probing IMDS and bump the verbosity of logs' from Nikos Dragazis
This PR fixes a bug in the Azure default credential provider that would cause the `test_azure_provider_with_incomplete_creds` unit test to be flaky. The provider would assume that an unreachable IMDS endpoint would always result in a timeout, but network errors are also possible (e.g., ICMP "host unreachable"). The issue is triggered by this particular test because it sets the IMDS endpoint to a non-routable address. Some routers choose to silently drop such packets, while others return ICMP errors. To fix it, the default credential provider has been updated to catch system errors as well.

This PR also raises the log level of the default credential provider from DEBUG to INFO, making it easier for operators to diagnose authentication issues.

More details in the commit messages.

Fixes #25641.

Closes scylladb/scylladb#25696

* github.com:scylladb/scylladb:
  utils: azure: Catch system errors when detecting IMDS
  utils: azure: Bump default credential logs from DEBUG to INFO
2025-09-18 07:43:00 +03:00
Ernest Zaslavsky
c9c245c756 rest_client: set version on http::request to avoid invalid state
Upcoming changes in Seastar cause `rest::simple_send` to move the
`http::request` into `seastar::http::experimental::client::make_request`
when called multiple times. This leaves the original request in an
invalid state. Specifically, the `_version` field becomes empty,
causing request validation to fail. This patch ensures `version` is
explicitly set to prevent such failures.

Fixes: https://github.com/scylladb/scylladb/issues/26018

Closes scylladb/scylladb#26066
2025-09-18 07:36:25 +03:00
Michał Chojnowski
1f85069389 sstables/trie: support reader_permit and trace_state properly
Before this patch, `reader_permit` taken by `bti_index_reader`.
wasn't actually being passed down to disk reads. In this patch,
we fix this FIXME by propagating the permit down to the I/O
operations on the `cached_file`.

Also, it didn't take `trace_state_ptr` at all.
In this patch, we add a `trace_state_ptr` argument and propagate
it down to disk reads.

(We combine the two changes because the permit and the trace state
are passed together everywhere anyway).
2025-09-17 12:22:40 +02:00
Benny Halevy
3a6208b319 utils: stall_free: clear_gently: release wrapped objects
As discussed in https://github.com/scylladb/scylladb/pull/24606#discussion_r2281870939
clear_gently of shared pointers should release the wrapped
object reference and when the object's use_count reaches 1,
the object itself would be cleared_gently, before it's destroyed.

This behavior is similar to the way we clear gently containers
like arrays or vectors, and so it is extended in this patch
to smart pointers like unique_ptr and foreign_ptr.

The unit tests are adjusted respectively to expect the
smart pointers to be reset after clear_gently, plus
the use of `reset()` for `foreign_ptr<shared_ptr<>>` was
replaced by `clear_gently().get()` which now ensures the
reference to a shared object is released, and awaited for,
if it happens on a foreign owner shard, unlike reset of
a foreign_ptr that kicks off destroy of that shared object
in the background on the owner shard - causing flakiness.

Fixes #25723

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#25759
2025-09-17 11:44:26 +03:00
Pavel Emelyanov
6fb66b796a s3: Add metrics to show S3 prefetch bytes
The chunked download source sends large GET requests and then consumes data
as it arrives. Sometimes it can stop reading from socket early and drop the
in-flight data. The existing read-bytes metrics show only the number of
consumed bytes, we we also want to know the number of requested bytes

Refs #25770 (accounting of read-bytes)
Fixes #25876

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#25877
2025-09-16 23:40:47 +03:00
Nikos Dragazis
58e8142a06 utils: azure: Catch system errors when detecting IMDS
When the default credential provider probes IMDS to check its
availability, it assumes that application-level connection timeouts are
the only error that can occur when the node is not an Azure VM, i.e.,
the packets will be silently dropped somewhere in the network.

However, this has proven not always true for the
`test_azure_provider_with_incomplete_creds` unit test, which overrides
the default IMDS endpoint with a non-routeable IP from TEST-NET-1 [1].
This test has been reported to fail in some local setups where routers
respond with ICMP "host unreachable" errors instead of silently dropping
the packets. This error propagates to user space as an EHOSTUNREACH
system error, which is not caught by the default credential provider,
causing the test to fail. The reason we use a non-routeable address in
this test is to ensure that IMDS probing will always fail, even if
running the test on an Azure VM.

Theoretically, the same problem applies to the default IMDS endpoint as
well (169.254.169.254). The RFC 3927 [2] mandates that packets targeting
link-local addresses (169.254/16) must not be forwarded, but the exact
behavior is left to implementation.

Since we cannot predict how routers will behave, fix this by catching
all relevant system errors when probing IMDS.

[1] https://datatracker.ietf.org/doc/html/rfc5735
[2] https://datatracker.ietf.org/doc/html/rfc3927

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-09-16 15:27:59 +03:00
Nikos Dragazis
78bcecd570 utils: azure: Bump default credential logs from DEBUG to INFO
The default credential provider produces diagnostic logs on each step as
it walks through the credential chain. These logs are useful for
operators to diagnose authentication problems as they expose information
about which credential sources are being evaluated, in which order, why
they fail, and which source is eventually selected.

Promote them from DEBUG to INFO level.

Additionally, concatenate the logs for environment credentials into a
single log statement to avoid interleaving with other logs.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-09-16 15:20:52 +03:00
Botond Dénes
ee7c85919e Revert "treewide: seastar module update and fix broken rest client"
This reverts commit 44d34663bc of PR
https://github.com/scylladb/scylladb/pull/25915.

Breaks articact tests on ARM, blocking us from building new images from
master.
2025-09-16 08:31:08 +03:00
Ernest Zaslavsky
44d34663bc treewide: seastar module update and fix broken rest client
start using `write_body` in `rest/client` to properly set headers due to changes applied to seastar's http client

Seastar module update
```
b6be384e Merge 'http: generalize Content-Type setting' from Nadav Har'El
74472298 http: generalize request's Content-Type setting
9fd5a1cc http: generalize reply's Content-Type setting
a2665f38 memory: Remove deprecated enable_abort_on_allocation_failure()
d2a5a8a9 resource.cc: Remove some dead code
7ad9f424 http: Add support of multiple key repetitions for the request
a636baca task: Move task::get_backtrace() definition in its class
a0101efa Fixed "doxygen" spelling in error message
db969482 Merge 'http/reply: introduce set_cookie()' from Botond Dénes
5357b434 http/reply: introduce set_cookie()
1ddcf05f http/reply: make write_reply*() public
4b782d73 http/connection: start_response(): fix indentation
720feca0 http/reply: encapsulate reply writing in write_reply()
3e19917d Merge 'exceptions: log thrown and propagated exception with distinct log levels' from Botond Dénes
db9aea93 Merge 'Correctly wrap up abandoned yielding directory lister' from Pavel Emelyanov
dbb2bf3f test: Add test for input_stream::read_exactly()
a5308ec9 file/directory_lister: Correctly wrap up fallback generator
4f0811f4 file/directory_lister: Convert on-stack queue to shared pointer
59801da7 tests: Add directory lister early drop cases
33233032 http/reply: s/write_reply_to_connection/write_reply/
69b93620 http/reply: write_reply_{to_connection,headers}(): pass output stream
56e9bda7 test: Convert directory_test into seastar test
96782358 Merge 'Improve io_tester's seqwrite and append workloads' from Pavel Emelyanov
8b46e3d4 SEASTAR_ASSERT: assert to stderr and flush stream
3370e22a tutorial.md: use current_exception_as_future()
e977453a Add fixture support for seastar::testing
3e70d7f7 io_tester: Do not set append_is_unlikely unconditionally
2a4ae7b4 io_tester: Count file size overflows
5e678bb5 io_tester: Tuneup size overflow check
d5dad8ce io_tester: Move position management code to io_class_data
5586a056 io_tester: Rename seqwrite -> overwrite
92df2fb2 io_tester: Relax return value of create_and_fill_file()
03d9500d io_tester: Dont fill file for APPEND
d6844a7b io_tester: Indentation fix after previous patch
fb9e0088 io_tester: Coroutinize create_and_fill_file()
2f802f57 exceptions: log thrown and propagated exception with distinct log levels
4971fa70 util: move log-level into own header
39448fc1 Merge 'Fix and tune http::request setup by client' from Pavel Emelyanov
52d0c4fb iostream: Move output_stream::write(scattered_message) lower
7a52f734 Merge 'read_first_line: Missing pragma and licence' from Ernest Zaslavsky
d0881b7e read_first_line: Add missing license boilerplate
988a0e99 read_first_line:: Add missing `#pragma once`
42675266 http: Make client::make_request accept const request&
c7709fb5 http: Make request making API return exceptional future not throw
b68ed89b http: Move request content length header setup
1d96dac6 http: Move request version configuration
072e86f6 http: Setup request once
```

Closes scylladb/scylladb#25915
2025-09-13 17:14:28 +03:00
Radosław Cybulski
436150eb52 treewide: fix spelling errors
Fix spelling errors reported by copilot on github.
Remove single use namespace alias.

Closes scylladb/scylladb#25960
2025-09-12 15:58:19 +03:00
Avi Kivity
c91b326d5a Merge 'transport: replace throwing protocol_exception with returns' from Dario Mirovic
Replace throwing `protocol_exception` with returning it as a result or an exceptional future in the transport server module. The goal is to improve performance.

Most of the `protocol_exception` throws were made from `fragmented_temporary_buffer` module, by passing `exception_thrower()` to its `read*` methods. `fragmented_temporary_buffer` is changed so that it now accepts an exception creator, not exception thrower. `fragmented_temporary_buffer_concepts::ExceptionCreator` concept replaced `fragmented_temporary_buffer_concepts::ExceptionThrower` and all methods that have been throwing now return failed result of type `utils::result_with_eptr`. This change is then propagated to the callers.

The scope of this patch is `protocol_exception`, so commitlog just calls `.value()` method on the result. If the result failed, that will throw the exception from the result, as defined by `utils::result_with_eptr_throw_policy`. This means that the behavior of commitlog module stays the same.

transport server module handles results gracefully. All the caller functions that return non-future value `T` now return `utils::result_with_eptr<T>`. When the caller is a function that returns a future, and it receives failed result, `make_exception_future(std::move(failed_result).value())` is returned. The rest of the callstack up to the transport server `handle_error` function is already working without throwing, and that's how zero throws is achieved.

cql3 module changes do the same as transport server module.

Benchmark that is not yet merged has commit `67fbe35833e2d23a8e9c2dcb5e04580231d8ec96`, [GitHub diff view](https://github.com/scylladb/scylladb/compare/master...nuivall:scylladb:perf_cql_raw). It uses either read or write query.

Command line used:
```
./build/release/scylla perf-cql-raw --workdir ~/tmp/scylladir --smp 1 --developer-mode 1 --workload write --duration 300 --concurrency 1000 --username cassandra --password cassandra 2>/dev/null
```
The only thing changed across runs is `--workload write`/`--workload read`.

Built and run on `release` target.

<details>

```
throughput:
        mean=   36946.04 standard-deviation=1831.28
        median= 37515.49 median-absolute-deviation=1544.52
        maximum=39748.41 minimum=28443.36
instructions_per_op:
        mean=   108105.70 standard-deviation=965.19
        median= 108052.56 median-absolute-deviation=53.47
        maximum=124735.92 minimum=107899.00
cpu_cycles_per_op:
        mean=   70065.73 standard-deviation=2328.50
        median= 69755.89 median-absolute-deviation=1250.85
        maximum=92631.48 minimum=66479.36

⏱  real=5:11.08  user=2:00.20  sys=2:25.55  cpu=85%
```

```
throughput:
        mean=   40718.30 standard-deviation=2237.16
        median= 41194.39 median-absolute-deviation=1723.72
        maximum=43974.56 minimum=34738.16
instructions_per_op:
        mean=   117083.62 standard-deviation=40.74
        median= 117087.54 median-absolute-deviation=31.95
        maximum=117215.34 minimum=116874.30
cpu_cycles_per_op:
        mean=   58777.43 standard-deviation=1225.70
        median= 58724.65 median-absolute-deviation=776.03
        maximum=64740.54 minimum=55922.58

⏱  real=5:12.37  user=27.461  sys=3:54.53  cpu=83%
```

```
throughput:
        mean=   37107.91 standard-deviation=1698.58
        median= 37185.53 median-absolute-deviation=1300.99
        maximum=40459.85 minimum=29224.83
instructions_per_op:
        mean=   108345.12 standard-deviation=931.33
        median= 108289.82 median-absolute-deviation=55.97
        maximum=124394.65 minimum=108188.37
cpu_cycles_per_op:
        mean=   70333.79 standard-deviation=2247.71
        median= 69985.47 median-absolute-deviation=1212.65
        maximum=92219.10 minimum=65881.72

⏱  real=5:10.98  user=2:40.01  sys=1:45.84  cpu=85%
```

```
throughput:
        mean=   38353.12 standard-deviation=1806.46
        median= 38971.17 median-absolute-deviation=1365.79
        maximum=41143.64 minimum=32967.57
instructions_per_op:
        mean=   117270.60 standard-deviation=35.50
        median= 117268.07 median-absolute-deviation=16.81
        maximum=117475.89 minimum=117073.74
cpu_cycles_per_op:
        mean=   57256.00 standard-deviation=1039.17
        median= 57341.93 median-absolute-deviation=634.50
        maximum=61993.62 minimum=54670.77

⏱  real=5:12.82  user=4:10.79  sys=11.530  cpu=83%
```

This shows ~240 instructions per op increase for reads and ~180 instructions per op increase for writes.

Tests have been run multiple times, with almost identical results. Each run lasted 300 seconds. Number of operations executed is roughly 38k per second * 300 seconds = 11.4m ops.

Update:

I have repeated the benchmark with clean state - reboot computer, put in performance mode, rebuild, closed other apps that might affect CPU and disk usage.

run count: 5 times before and 5 times after the patch
duration: 300 seconds

Average write throughput median before patch: 41155.99
Average write throughput median after patch: 42193.22

Median absolute deviation is also lower now, with values in range 350-550, while the previous runs' values were in range 750-1350.

</details>

Built and run on `release` target.

<details>

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null

```
throughput:
        mean=   14910.90 standard-deviation=477.72
        median= 14956.73 median-absolute-deviation=294.16
        maximum=16061.18 minimum=13198.68
instructions_per_op:
        mean=   659591.63 standard-deviation=495.85
        median= 659595.46 median-absolute-deviation=324.91
        maximum=661184.94 minimum=658001.49
cpu_cycles_per_op:
        mean=   213301.49 standard-deviation=2724.27
        median= 212768.64 median-absolute-deviation=1403.85
        maximum=225837.15 minimum=208110.12

⏱  real=5:19.26  user=5:00.22  sys=15.827  cpu=98%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null

```
throughput:
        mean=   93345.45 standard-deviation=4499.00
        median= 93915.52 median-absolute-deviation=2764.41
        maximum=104343.64 minimum=79816.66
instructions_per_op:
        mean=   65556.11 standard-deviation=97.42
        median= 65545.11 median-absolute-deviation=71.51
        maximum=65806.75 minimum=65346.25
cpu_cycles_per_op:
        mean=   34160.75 standard-deviation=803.02
        median= 33927.16 median-absolute-deviation=453.08
        maximum=39285.19 minimum=32547.13

⏱  real=5:03.23  user=4:29.46  sys=29.255  cpu=98%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null

```
throughput:
        mean=   206982.18 standard-deviation=15894.64
        median= 208893.79 median-absolute-deviation=9923.41
        maximum=232630.14 minimum=127393.34
instructions_per_op:
        mean=   35983.27 standard-deviation=6.12
        median= 35982.75 median-absolute-deviation=3.75
        maximum=36008.24 minimum=35952.14
cpu_cycles_per_op:
        mean=   17374.87 standard-deviation=985.06
        median= 17140.81 median-absolute-deviation=368.86
        maximum=26125.38 minimum=16421.99

⏱  real=5:01.23  user=4:57.88  sys=0.124  cpu=98%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null

```
throughput:
        mean=   16198.26 standard-deviation=902.41
        median= 16094.02 median-absolute-deviation=588.58
        maximum=17890.10 minimum=13458.74
instructions_per_op:
        mean=   659752.73 standard-deviation=488.08
        median= 659789.16 median-absolute-deviation=334.35
        maximum=660881.69 minimum=658460.82
cpu_cycles_per_op:
        mean=   216070.70 standard-deviation=3491.26
        median= 215320.37 median-absolute-deviation=1678.06
        maximum=232396.48 minimum=209839.86

⏱  real=5:17.33  user=4:55.87  sys=18.425  cpu=99%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null

```
throughput:
        mean=   97067.79 standard-deviation=2637.79
        median= 97058.93 median-absolute-deviation=1477.30
        maximum=106338.97 minimum=87457.60
instructions_per_op:
        mean=   65695.66 standard-deviation=58.43
        median= 65695.93 median-absolute-deviation=37.67
        maximum=65947.76 minimum=65547.05
cpu_cycles_per_op:
        mean=   34300.20 standard-deviation=704.66
        median= 34143.92 median-absolute-deviation=321.72
        maximum=38203.68 minimum=33427.46

⏱  real=5:03.22  user=4:31.56  sys=29.164  cpu=99%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null

```
throughput:
        mean=   223495.91 standard-deviation=6134.95
        median= 224825.90 median-absolute-deviation=3302.09
        maximum=234859.90 minimum=193209.69
instructions_per_op:
        mean=   35981.41 standard-deviation=3.16
        median= 35981.13 median-absolute-deviation=2.12
        maximum=35991.46 minimum=35972.55
cpu_cycles_per_op:
        mean=   17482.26 standard-deviation=281.82
        median= 17424.08 median-absolute-deviation=143.91
        maximum=19120.68 minimum=16937.43

⏱  real=5:01.23  user=4:58.54  sys=0.136  cpu=99%
```

</details>

Fixes: #24567

This PR is a continuation of #24738 [transport: remove throwing protocol_exception on connection start](https://github.com/scylladb/scylladb/pull/24738). This PR does not solve a burning issue, but is rather an improvement in the same direction. As it is just an enhancement, it should not be backported.

Closes scylladb/scylladb#25408

* github.com:scylladb/scylladb:
  test/cqlpy: add protocol exception tests
  test/cqlpy: `test_protocol_exceptions.py` refactor message frame building
  test/cqlpy: `test_protocol_exceptions.py` refactor duplicate code
  transport: replace `make_frame` throw with return result
  cql3: remove throwing `protocol_exception`
  transport: replace throw in validate_utf8 with result_with_exception_ptr return
  transport: replace throwing protocol_exception with returns
  utils: add result_with_exception_ptr
  test/cqlpy: add unknown compression algorithm test case
2025-09-10 21:54:15 +03:00
Avi Kivity
fc64333040 Merge 'sstables/trie: add BTI index readers and writers' from Michał Chojnowski
This is yet another part in the BTI index project.

Overarching issue: https://github.com/scylladb/scylladb/issues/19191
Previous part: https://github.com/scylladb/scylladb/pull/25506/
Next part: plugging the BTI index readers and writers into sstable readers and writers.

The new code added in this PR isn't used outside of tests yet, but it's posted as a separate PR for reviewability.

This series implements, on top of the key translation logic, and abstract trie writing and traversal logic, a writer and a reader of sstable index files (which map primary keys to positions in Data.db), as described in f16fb6765b/src/java/org/apache/cassandra/io/sstable/format/bti/BtiFormat.md.

Caveats:
1. I think the added test has reasonable coverage, but that depends on running it multiple times. (Though it shouldn't need more than a few runs to catch any bug it covers). It's somewhat awkward as a test meant for running in CI, it's better as something you run many times after a relevant change.
2. These readers and writers are intended to be compatible with Cassandra, but I did *NOT* do any compatibility testing. The writers and readers added here have only been tested against each other, not against Cassandra's readers and writers.
3. This didn't undergo any proper benchmarking and optimization work. I was doing some measurements in the past, but everything was rewritten so much since then that the my old measurements are effectively invalidated. Frankly I have no idea what the performance of all this branchy-branchy logic is now.

No backports needed, new functionality.

Closes scylladb/scylladb#25626

* github.com:scylladb/scylladb:
  test/manual: add bti_cassandra_compatibility_test
  test/lib/random_schema: add some constraints for generated uuid and time/date values
  test/lib/random_utils: add a variant of get_bytes which takes an `engine&`
  test/boost: add bti_index_test
  sstables/writer: add an accessor for the current write position in Data.db
  sstables/trie: introduce bti_index_reader
  sstables/trie: add bti_partition_index_writer.cc
  sstables/trie: add bti_row_index_writer.cc
  utils/bit_cast: add a new overload of write_unaligned()
  sstables/trie: add trie_writer::add_partial()
  sstables/consumer: add read_56()
  sstables/trie: make bti_node_reader::page_ptr copy-constructible
  sstables: extract abstract_index_reader from index_reader.hh to its own header
  sstables/trie: add an accessor to the file_writer under bti_node_sink
  sstables/types: make `deletion_time::operator tombstone()` const
  sstables/types: add sstables::deletion_time::make_live()
  sstables/trie: fix a special case in max_offset_from_child
  sstables/trie: handle `partition_region`s other than `clustered` in BTI position encoding
  sstables/trie: rewrite lcb_mismatch to handle fragment invalidation
  test/boost/bti_key_translation_test: fix a compilation error hidden behind `if constexpr`
2025-09-10 21:48:52 +03:00
Pavel Emelyanov
9deea3655f s3: Fix chunked download source metrics calculations
In S3 client both read and write metrics have three counters -- number
of requests made, number of bytes processed and request latency. In most
of the cases all three counters are updated at once -- upon response
arrival.

However, in case of chunked download source this way of accounting
metrics is misleading. In this code the request is made once, and then
the obtained bytes are consumed eventually as the data arrive.

Currently, each time a new portion of data is read from the socket the
number of read requests is incremented. That's wrong, the request is
made once, and this counter should also be incremented once, not for
every data buffer that arrived in response.

Same for read request latency -- it's "added" for every data buffer that
arrives, but it's a lenghy process, the _request_ latency should be
accounted once per responce. Maybe later we'll want to have "data
latency" metrics as well, but for what we have now it's request latency.

The number of read bytes is accounted properly, so not touched here.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#25770
2025-09-08 09:49:03 +03:00
Michał Chojnowski
a800fef633 utils/bit_cast: add a new overload of write_unaligned()
Does the same thing as the existing overload, but this one
takes `std::byte*` instead of `void*`, and it additionally
returns the pointer to the end position.
2025-09-07 00:30:15 +02:00
Avi Kivity
ed483647a4 interval: specialize interval_data<T> for trivial types
C++ data movement algorithms (std::uninitialized_copy()) and friends
and the containers that use them optimize for trivially copyable
and destructible types by calling memcpy instead of using a loop
around constructors/destructors. Make intervals of trivially
copyable and destructible types also trivially copyable and
destructible by specializing interval_data<T> not to have
user-defined special member functions. This requires that T have
a default constructor since we can't skip construction when
!_start_exists or !_end_exists.

To choose whether we specialize or not, we look at default
constructiblity (see above) and trivial destructibility. This is
wider than trivial copyablity (a user-defined copy constructor
can exist) but is still beneficial, since the generated copy
constructor for interval_data<T> will be branch-free.

We don't implement the poison words in debug mode; nor are they
necessary, since we no don't manage the lifetime of _start_value
and _end_value manually any more but let the compiler do that for us.

Note [1] prevents full conversion to memcpy for now, but we still
get branch free code.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121789
2025-09-06 18:38:24 +03:00
Avi Kivity
20751517a4 interval: split data members into new interval_data class
Prepare for specialized handling of trivial types by extracting
the data members of wrapping_internal<T> and the special member
functions (constructors/destructors/assignment) into a new
interval_data<T> template.

To avoid having to refer to data member with a this-> prefix,
add using declarations in wrapping_interval<T>.
2025-09-06 18:31:58 +03:00
Radosław Cybulski
c242234552 Revert "build: add precompiled headers to CMakeLists.txt"
This reverts commit 01bb7b629a.

Closes scylladb/scylladb#25735
2025-09-03 09:46:00 +03:00
Avi Kivity
7ed261fc52 Merge 'Inital GCP object storage support' from Calle Wilund
Adds infrastructure and client for interaction with GCP object storage services.

Note: this is just a client object usable for creating, listing, deleting and up/downloading of objects to/from said storage service. It makes no attempt at actually inserting it into the sstable storage flow. That can come later.

This PR breaks out GCP auth and some general REST call functionality into shared routines. Not all code is 100% reused, but at least some.

Test is added, though could be more comprehensive (feel free to suggest a test vector).
Test can run in either local mock server mode (default), or against actual GCP.
See `test/boost/gcp_object_storage_test.cc` for explanation on the config environment vars.
Default is to run the test against a temporary docker deamon.

Closes scylladb/scylladb#24629

* github.com:scylladb/scylladb:
  test::boost::gcp_object_storage_test: Initial unit tests for GCP obj storage
  proc-utils: Re-export waiting types from seastar
  proc-utils: Inherit environment from current process
  utils::gcp::object_storage: Add client for GCP object storage
  utils::http: Add optional external credentials to dns_connection_factory init
  utils::rest: Break out request wrapper and send logic
  encryption::gcp_host: Use shared gcp credentials + REST helpers
  utils::gcp: Move/add gcp credentials management to shared file
  utils::rest::client: Add formatter for seastar::http::reply
  utils::rest::client: Add helper routines for simple REST calls
  utils::http: Make shared system trust certificates public
2025-09-02 14:38:09 +03:00
Avi Kivity
fe308de8df Merge 'treewide: Add missing #pragma once' from Ernest Zaslavsky
Add missing #pragma once and license boilerplate to include headers.

Consider adding a CI step to catch missing header guards early. It can be done easily by running `cpplint` like below
```
 find . -path ./seastar -prune -o -path ./venv -prune -o -path ./idl -prune -o -type f \( -name "*.h" -o -name "*.hh" -o -name "*.hpp" \) -print0 | xargs -0 cpplint 2>&1 | grep "header guard found"
```

No backport is needed, the change is not "functional"

Closes scylladb/scylladb#25768

* github.com:scylladb/scylladb:
  treewide: Add missing license boilerplate
  treewide: Add missing `#pragma once`
2025-09-02 13:18:04 +03:00
Calle Wilund
4a5b547a86 utils::gcp::object_storage: Add client for GCP object storage
Adds a minial client for GCP object storage operations:

* Create buckets
* Delete buckets
* List bucket content
* Copy/move bucket content
* Delete bucket content
* Upload bucket content
* Download bucket content
2025-09-01 18:03:44 +00:00
Calle Wilund
8f54b709ce utils::http: Add optional external credentials to dns_connection_factory init
Also allow creating the object using an endpoint expression.
Note: this moves code to the .cc file, because it introduces a few
more lines, and I feel we have to much stuff in headers as is.
2025-09-01 18:03:44 +00:00
Calle Wilund
0e9e1f7738 utils::rest: Break out request wrapper and send logic
Allows sharing some of the wrapping and logic outside the
single-call object/routine paths, using it also with an external
seastar::http::client, i.e. caching resources across several calls.
2025-09-01 18:03:44 +00:00
Calle Wilund
2b7ad605b3 utils::gcp: Move/add gcp credentials management to shared file
Copied from encryption::gcp_host. Light-weight impl of gcp credentials
management.
2025-09-01 18:03:44 +00:00
Calle Wilund
f6d7c7e300 utils::rest::client: Add formatter for seastar::http::reply 2025-09-01 18:03:44 +00:00
Calle Wilund
cc1e659abd utils::rest::client: Add helper routines for simple REST calls
Packing headers and unpacking response to json. Usable for esp. gcp
interaction.
2025-09-01 18:03:43 +00:00
Calle Wilund
886fcf1759 utils::http: Make shared system trust certificates public
So other clients/factories can share.
2025-09-01 18:03:43 +00:00
Ernest Zaslavsky
0e4292adb4 treewide: Add missing license boilerplate
Add missing license boilerplate to include headers
2025-09-01 14:58:32 +03:00
Ernest Zaslavsky
19345e539f treewide: Add missing #pragma once
Add missing `#pragma once` to include headers
2025-09-01 14:58:21 +03:00
Nadav Har'El
6d1abc5b2c utils/base64: fix misleading code and comment (no functional change)
utils/base64.cc had some strange code with a strange comment in
base64_begins_with().

The code had

        base.substr(operand.size() - 4, operand.size())

The comment claims that this is "last 4 bytes of base64-encoded string",
but this comment is misleading - operand is typically shorter than base
(this this whole point of the base64_begins_with()), so the real
intention of the code is not to find the *last* 4 bytes of base, but rather
the *next* four bytes after the (operand.size() - 4) which we already copied.
These four bytes that may need the full power of base64_decode_string()
because they may or may not contain padding.

But, if we really want the next 4 bytes, why pass operand.size() as the
length of the substring? operand.size() is at least 4 (it's a mutiple of
4, and if it's 0 we returned earlier), but it could me more. We don't
need more, we just need 4. It's not really wrong to take more than 4 (so
this patch doesn't *fix* any bug), but can be wasteful. So this code
should be:

        base.substr(operand.size() - 4, 4)

We already have in test/boost/alternator_unit_test.cc a test,
test_base64_begins_with that takes encoded base64 strings up to 12
characters in length (corresponding to decoded strings up to 8 chars),
and substrings from length 0 to the base string's length, and check
that test_base64_begins_with succeeds.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25712
2025-09-01 08:57:50 +03:00
Ernest Zaslavsky
05154e131a cleanup: Add missing #pragma once
Add missing `#pragma once` to include header

Closes scylladb/scylladb#25761
2025-09-01 06:41:57 +03:00
Nadav Har'El
ff91027eac utils, alternator: fix detection of invalid base-64
This patch fixes an error-path bug in the base-64 decoding code in
utils/base64.cc, which among other things is used in Alternator to decode
blobs in JSON requests.

The base-64 decoding code has a lookup table, which was wrongly sized 255
bytes, but needed to be 256 bytes. This meant that if the byte 255 (0xFF)
was included in an invalid base-64 string, instead of detecting that this
is an invalid byte (since the only valid bytes in a base-64 string are
A-Z,a-z,0-9,+,/ and =), the code would either think it's valid with a
nonsense 6-bit part, or even crash on an out-of-bounds read.

Besides the trivial fix, this patch also includes a reproducing test,
which tries to write a blob as a supposedly base-64 encoded string with
a 0xFF byte in it. The test fails before this patch (the write succeeds,
unexpectedly), and passes after this patch (the write fails as
expected). The test also passes on DynamoDB.

Fixes #25701

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25705
2025-08-31 15:38:01 +03:00
Avi Kivity
bf9a963582 utils: mark crc barrett tables const
They're marked constinit, but constinit does not imply const. Since
they're not supposed to be modified, mark them const too.

Closes scylladb/scylladb#25539
2025-08-31 11:37:39 +03:00
Avi Kivity
bc5773f777 Merge 'Add out of space prevention mechanisms' from Łukasz Paszkowski
When a scaling out is delayed or fails, it is crucial to ensure that clusters remain operational
and recoverable even under extreme conditions. To achieve this, the following proactive measures
are implemented:
- reject writes
      - includes: inserts, updates, deletes, counter updates, hints, read+repair and lwt writes
      - applicable to: user tables, views, CDC log, audit, cql tracing
- stop running compactions/repairs and prevent from starting new ones
- reject incoming tablet migrations

The aforementioned mechanisms are automatically enabled when node's disk utilization reaches
the critical level (default: 98%) and disabled when the utilization drop below the threshold.

Apart from that, the series add tests that require mounted volumes to simulate out of space.
The paths to the volumes can be provided using the a pytest argument, i.e.  `--space-limited-dirs`.
When not provided, tests are skipped.

Test scenarios:

1. Start a cluster and write data until one of the nodes reaches 90% of the disk utilization
2. Perform an **operation** that would take the nodes over 100%
3. The nodes should not exceed the critical disk utilization (98% by default)
4. Scale out the cluster by adding one node per rack
5. Retry or wait for the **operation** from step 2

The **operation** is: writing data, running compactions, building materialized views, running repair,
migrating tablets (caused by RF change, decommission).

The test is successful, if no nodes run out of space, the **operation** from step 2 is
aborted/paused/timed out and the **operation** from step 5 is successful.

`perf-simple-query --smp 1 -m 1G` results obtained for fixed 400MHz frequency:

Read path (before)

```
instructions_per_op:
	mean=   39661.51 standard-deviation=34.53
	median= 39655.39 median-absolute-deviation=23.33
	maximum=39708.71 minimum=39622.61
```

Read path (after)

```
instructions_per_op:
	mean=   39691.68 standard-deviation=34.54
	median= 39683.14 median-absolute-deviation=11.94
	maximum=39749.32 minimum=39656.63
```

Write path (before):

```
instructions_per_op:
	mean=   50942.86 standard-deviation=97.69
	median= 50974.11 median-absolute-deviation=34.25
	maximum=51019.23 minimum=50771.60
```

Write path (after):

```
instructions_per_op:
	mean=   51000.15 standard-deviation=115.04
	median= 51043.93 median-absolute-deviation=52.19
	maximum=51065.81 minimum=50795.00
```

Fixes: https://github.com/scylladb/scylladb/issues/14067
Refs: https://github.com/scylladb/scylladb/issues/2871

No backport, as it is a new feature.

Closes scylladb/scylladb#23917

* github.com:scylladb/scylladb:
  tests/cluster: Add new storage tests
  test/scylla_cluster: Override workdir when passed via cmdline
  streaming: Reject incoming migrations
  storage_service: extend locator::load_stats to collect per-node critical disk utilization flag
  repair_service: Add a facility to disable the service
  compaction_manager: Subscribe to out of space controller
  compaction_manager: Replace enabled/disabled states with running state
  database: Add critical_disk_utilization mode database can be moved to
  disk_space_monitor: add subscription API for threshold-based disk space monitoring
  docs: Add feature documentation
  config: Add critical_disk_utilization_level option
  replica/exceptions: Add a new custom replica exception
2025-08-30 18:47:57 +03:00
Piotr Dulikowski
7ccb50514d Merge 'Introduce view building coordinator' from Michał Jadwiszczak
This patch introduces `view_building_coordinator`, a single entity within whole cluster responsible for building tablet-based views.

The view building coordinator takes slightly different approach than the existing node-local view builder. The whole process is split into smaller view building tasks, one per each tablet replica of the base table.
The coordinator builds one base table at a time and it can choose another when all views of currently processing base table are built.
The tasks are started by setting `STARTED` state and they are executed by node-local view building worker. The tasks are scheduled in a way, that each shard processes only one tablet at a time (multiple tasks can be started for a shard on a node because a table can have multiple views but then all tasks have the same base table and tablet (last_token)). Once the coordinator starts the tasks, it sends `work_on_view_building_tasks` RPC to start the tasks and receive their results.
This RPC is resilient to RPC failure or raft leader change, meaning if one RPC call started a batch of tasks but then failed (for instance the raft leader was changed and caller aborted waiting for the response), next RPC call will attach itself to the already started batch.

The coordinator plugs into handling tablet operations (migration/resize/RF change) and adjusts its tasks accordingly. At the start of each tablet operation, the coordinator aborts necessary view building tasks to prevent https://github.com/scylladb/scylladb/issues/21564. Then, new adjusted tasks are created at the end of the operation.
If the operation fails at any moment, aborted tasks are rollback.

The view building coordinator can also handle staging sstables using process_staging view building tasks. We do this because we don't want to start generating view updates from a staging sstable prematurely, before the writes are directed to the new replica (https://github.com/scylladb/scylladb/issues/19149).

For detailed description check: `docs/dev/view-building-coordinator.md`

Fixes https://github.com/scylladb/scylladb/issues/22288
Fixes https://github.com/scylladb/scylladb/issues/19149
Fixes https://github.com/scylladb/scylladb/issues/21564
Fixes https://github.com/scylladb/scylladb/issues/17603
Fixes https://github.com/scylladb/scylladb/issues/22586
Fixes https://github.com/scylladb/scylladb/issues/18826
Fixes https://github.com/scylladb/scylladb/issues/23930

---

This PR is reimplementation of https://github.com/scylladb/scylladb/pull/21942

Closes scylladb/scylladb#23760

* github.com:scylladb/scylladb:
  test/cluster: add view build status tests
  test/cluster: add view building coordinator tests
  utils/error_injection: allow to abort `injection_handler::wait_for_message()`
  test: adjust existing tests
  utils/error_injection: add injection with `sleep_abortable()`
  db/view/view_builder: ignore `no_such_keyspace` exception
  docs/dev: add view building coordinator documentation
  db/view/view_building_worker: work on `process_staging` tasks
  db/view/view_building_worker: register staging sstable to view building coordinator when needed
  db/view/view_building_worker: discover staging sstables
  db/view/view_building_worker: add method to register staging sstable
  db/view/view_update_generator: add method to process staging sstables instantly
  db/view/view_update_generator: extract generating updates from staging sstables to a method
  db/view/view_update_generator: ignore tablet-based sstables
  db/view/view_building_coordinator: update view build status on node join/left
  db/view/view_building_coordinator: handle tablet operations
  db/view: add view building task mutation builder
  service/topology_coordinator: run view building coordinator
  db/view: introduce `view_building_coordinator`
  db/view/view_building_worker: update built views locally
  db/view: introduce `view_building_worker`
  db/view: extract common view building functionalities
  db/view: prepare to create abstract `view_consumer`
  message/messaging_service: add `work_on_view_building_tasks` RPC
  service/topology_coordinator: make `term_changed_error` public
  db/schema_tables: create/cleanup tasks when an index is created/dropped
  service/migration_manager: cleanup view building state on drop keyspace
  service/migration_manager: cleanup view building state on drop view
  service/migration_manager: create view building tasks on create view
  test/boost: enable proxy remote in some tests
  service/migration_manager: pass `storage_proxy` to `prepare_keyspace_drop_announcement()`
  service/migration_manager: coroutinize `prepare_new_view_announcement()`
  service/storage_proxy: expose references to `system_keyspace` and `view_building_state_machine`
  service: reload `view_building_state_machine` on group0 apply()
  service/vb_coordinator: add currently processing base
  db/system_keyspace: move `get_scylla_local_mutation()` up
  db/system_keyspace: add `view_building_tasks` table
  db/view: add view_building_state and views_state
  db/system_keyspace: add method to get view build status map
  db/view: extract `system.view_build_status_v2` cql statements to system_keyspace
  db/system_keyspace: move `internal_system_query_state()` function earlier
  db/view: ignore tablet-based views in `view_builder`
  gms/feature_service: add VIEW_BUILDING_COORDINATOR feature
2025-08-29 17:28:44 +02:00
Dario Mirovic
51995af258 transport: replace throwing protocol_exception with returns
Replace throwing `protocol_exception` with returning it as a result
or an exceptional future in the transport server module. The goal is
to improve performance.

Most of the `protocol_exception` throws were made from
`fragmented_temporary_buffer` module, by passing `exception_thrower()`
to its `read*` methods. `fragmented_temporary_buffer` is changed so
that it now accepts an exception creator, not exception thrower.
`fragmented_temporary_buffer_concepts::ExceptionCreator` concept replaced
`fragmented_temporary_buffer_concepts::ExceptionThrower` and all
methods that have been throwing now return failed result of type
`utils::result_with_exception_ptr`. This change is then propagated to the callers.

The scope of this patch is `protocol_exception`, so commitlog just calls
`.value()` method on the result. If the result failed, that will throw the
exception from the result, as defined by `utils::result_with_exception_ptr_throw_policy`.
This means that the behavior of commitlog module stays the same.

transport server module handles results gracefully. All the caller functions
that return non-future value `T` now return `utils::result_with_exception_ptr<T>`.
When the caller is a function that returns a future, and it receives
failed result, `make_exception_future(std::move(failed_result).value())`
is returned. The rest of the callstack up to the transport server `handle_error`
function is already working without throwing, and that's how zero throws is
achieved.

Fixes: #24567
2025-08-28 23:31:36 +02:00
Dario Mirovic
f01efd822e utils: add result_with_exception_ptr
Add `result_with_exception_ptr` result type.
Successful result has user specified type.
Failed result has std::exception_ptr.

This approach is simpler than `result_with_exception`. It does not require
user to pass exception types as variadic template through all the callstack.
Specific exception type can still be accessed without costly std::rethrow_exception(eptr)
by using `try_catch`, if configured so via `USE_OPTIMIZED_EXCEPTION_HANDLING`.
This means no information loss, but less verbosity when writing result types.

Refs: #24567
2025-08-28 23:31:04 +02:00
Łukasz Paszkowski
3e740d25b5 disk_space_monitor: add subscription API for threshold-based disk space monitoring
Introduce the `subscribe` method to disk_space_monitor, allowing clients to
register callbacks triggered when disk utilization crosses a configurable
threshold.

The API supports flexible trigger options, including notifications on threshold
crossing and direction (above/below). This enables more granular and efficient
disk space monitoring for consumers.
2025-08-28 18:06:37 +02:00
Radosław Cybulski
01bb7b629a build: add precompiled headers to CMakeLists.txt
Add precompiled header support to CMakeLists.txt and configure.py -
it improves compilation time by approximately 10%.

New header `stdafx.hh` is added, don't include it manually -
the compiler will include it for you. The header contains includes from
external libraries used by Scylla - seastar, standard library,
linux headers and zlib.

The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER`
or configure.py --disable-precompiled-header to disable.

The feature should be disabled, when trying to check headers - otherwise
you might get false negatives on missing includes from seastar / abseil and so on.

Note: following configuration needs to be added to ccache.conf:

    sloppiness = pch_defines,time_macros

Closes #25182
2025-08-27 21:37:54 +03:00
Michał Jadwiszczak
90b5b2c5f5 utils/error_injection: allow to abort injection_handler::wait_for_message() 2025-08-27 10:23:04 +02:00
Michał Jadwiszczak
6056b55309 utils/error_injection: add injection with sleep_abortable() 2025-08-27 10:23:04 +02:00
Botond Dénes
f8b79d563a Merge 's3: Minor refactoring and beautification of S3 client and tests' from Ernest Zaslavsky
This pull request introduces minor code refactoring and aesthetic improvements to the S3 client and its associated test suite. The changes focus on enhancing readability, consistency, and maintainability without altering any functional behavior.

No backport is required, as the modifications are purely cosmetic and do not impact functionality or compatibility.

Closes scylladb/scylladb#25490

* github.com:scylladb/scylladb:
  s3_client: relocate `req` creation closer to usage
  s3_client: reformat long logging lines for readability
  s3_test: extract file writing code to a function
2025-08-18 18:48:42 +03:00
Avi Kivity
96956e48c4 Merge 'utils: stall_free: detect clear_gently method of const payload types' from Benny Halevy
Currently, when a container or smart pointer holds a const payload
type, utils::clear_gently does not detect the object's clear_gently
method as the method is non-const and requires a mutable object,
as in the following example in class tablet_metadata:
```
    using tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>;
    using table_to_tablet_map = std::unordered_map<table_id, tablet_map_ptr>;
```

That said, when a container is cleared gently the elements it holds
are destroyed anyhow, so we'd like to allow to clear them gently before
destruction.

This change still doesn't allow directly calling utils::clear_gently
an const objects.

And respective unit tests.

Fixes #24605
Fixed #25026

* This is an optimization that is not strictly required to backport (as https://github.com/scylladb/scylladb/pull/24618 dealt with clear_gently of `tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>` well enough)

Closes scylladb/scylladb#24606

* github.com:scylladb/scylladb:
  utils: stall_free: detect clear_gently method of const payload types
  utils: stall_free: clear gently a foreign shared ptr only when use_count==1
2025-08-18 12:52:02 +03:00
Ernest Zaslavsky
a0016bd0cc s3_client: relocate req creation closer to usage
Move the creation of the `req` object to the point where it is
actually used, improving code clarity and reducing premature
initialization.
2025-08-14 16:18:43 +03:00
Ernest Zaslavsky
6ef2b0b510 s3_client: reformat long logging lines for readability
Break up excessively long logging statements to improve readability
and maintain consistent formatting across the codebase.
2025-08-14 16:18:43 +03:00
Ernest Zaslavsky
dd51e50f60 s3_client: add memory fallback in chunked_download_source
Introduce fallback logic in `chunked_download_source` to handle
memory exhaustion. When memory is low, feed the `deque` with only
one uncounted buffer at a time. This allows slow but steady progress
without getting stuck on the memory semaphore.

Fixes: https://github.com/scylladb/scylladb/issues/25453
Fixes: https://github.com/scylladb/scylladb/issues/25262

Closes scylladb/scylladb#25452
2025-08-14 09:52:10 +03:00
Ernest Zaslavsky
380c73ca03 s3_client: make memory semaphore acquisition abortable
Add `abort_source` to the `get_units` call for the memory semaphore
in the S3 client, allowing the acquisition process to be aborted.

Fixes: https://github.com/scylladb/scylladb/issues/25454

Closes scylladb/scylladb#25469
2025-08-13 08:48:55 +03:00
Benny Halevy
23ac80fc6b utils: stall_free: detect clear_gently method of const payload types
Currently, when a container or smart pointer holds a const payload
type, utils::clear_gently does not detect the object's clear_gently
method as the method is non-const and requires a mutable object,
as in the following example in class tablet_metadata:
```
    using tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>;
    using table_to_tablet_map = std::unordered_map<table_id, tablet_map_ptr>;
```

That said, when a container is cleared gently the elements it holds
are destroyed anyhow, so we'd like to allow to clear them gently before
destruction.

This change still doesn't allow directly calling utils::clear_gently
an const objects.

And respective unit tests.

Fixes #24605

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-11 14:22:01 +03:00
Benny Halevy
cb9db2f396 utils: stall_free: clear gently a foreign shared ptr only when use_count==1
Unlike clear_gently of SharedPtr, clear_gently of a
`foreign_ptr<shared_ptr<T>>` calls clear_gently on the contained object
even if it's still shared and may still be in use.

This change examines the foreign shared pointer's use_count
and calls clear_gently on the shard object only when
its use_count reaches 1.

Fixes #25026

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-11 14:21:32 +03:00