Large reserves in allocating_section can cause stalls. We already log
reserve increase, but we don't know which table it belongs to:
lsa - LSA allocation failure, increasing reserve in section 0x600009f94590 to 128 segments;
Allocating sections used for updating row cache on memtable flush are
notoriously problematic. Each table has its own row_cache, so its own
allocating_section(s). If we attached table name to those sections, we
could identify which table is causing problems. In some issues we
suspected system.raft, but we can't be sure.
This patch allows naming allocating_sections for the purpose of
identifying them in such log messages. I use abstract_formatter for
this purpose to avoid the cost of formatting strings on the hot path
(e.g. index_reader). And also to avoid duplicating strings which are
already stored elsewhere.
Fixes#25799Closesscylladb/scylladb#27470
Fixes#27366
A URL with numeric host part formats special in case of ipv6,
to avoid confusion with port part.
The parser should handle this.
I.e.
http://[2001:db8:4006:812::200e]:8080
v2:
* Include scheme agnostic parse + case insensitive scheme matching
Fixes#27268
Refs #27268
Includes the auth call in code covered by backoff-retry on server
error, as well as moves the code to use the shared primitive for this
and increase the resilience a bit (increase retry count).
v2:
* Don't do backoff if we need to refresh credentials.
* Use abort source for backoff if avail
v3:
* Include other retryable conditions in auth check
Closesscylladb/scylladb#27269
Add handling for a broader set of transient network-related `std::errc` values in `aws_error::from_system_error`. Treat these conditions as retryable when the client re-creates the socket for each request.
Fixes: https://github.com/scylladb/scylladb/issues/27349Closesscylladb/scylladb#27350
Primary issue with the old method is that each update is a separate
cross-shard call, and all later updates queue behind it. If one of the
shards has high latency for such calls, the queue may accumulate and
system will appear unresponsive for mapping changes on non-zero shards.
This happened in the field when one of the shards was overloaded with
sstables and compaction work, which caused frequent stalls which
delayed polling for ~100ms. A queue of 3k address updates
accumulated, because we update mapping on each change of gossip
states. This made bootstrap impossible because nodes couldn't
learn about the IP mapping for the bootstrapping node and streaming
failed.
To protect against that, use a more efficient method of replication
which requires a single cross-shard call to replicate all prior
updates.
It is also more reliable, if replication fails transiently for some
reason, we don't give up and fail all later updates.
Fixes#26865Closesscylladb/scylladb#26941
* github.com:scylladb/scylladb:
address_map: Use barrier() to wait for replication
address_map: Use more efficient and reliable replication method
utils: Introduce helper for replicated data structures
Add precompiled header support to CMakeLists.txt and configure.py -
it improves compilation time by approximately 10%.
New header `stdafx.hh` is added, don't include it manually -
the compiler will include it for you. The header contains includes from
external libraries used by Scylla - seastar, standard library,
linux headers and zlib.
The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER`
or configure.py --disable-precompiled-header to disable.
The feature should be disabled, when trying to check headers - otherwise
you might get false negatives on missing includes from seastar / abseil and so on.
Note: following configuration needs to be added to ccache.conf:
sloppiness = pch_defines,time_macros,include_file_mtime,include_file_ctime
Closesscylladb/scylladb#26617
Fixes#26874
Due to certain people (me) not being able to tell forward from backward,
the data alignment to ensure partial uploads adhere to the 256k-align
rule would potentially _reorder_ trailing buffers generated iff the
source buffers input into the sink are small enough. Which, as a fun fact,
they are in backup upload.
Change the unit test to use raw sink IO and add two unit tests (of which
the smaller size provokes the bug) that checks the same 64k buf segmented
upload backup uses.
Closesscylladb/scylladb#26938
According to the IETF spec uuid variant bits should be set to '10'. All
others are either invalid or reserved. The patch change the code to
follow the spec.
Closesscylladb/scylladb#27073
* seastar 63900e03...340e14a7 (19):
> Merge 'rpc: harden sink_impl::close()' from Benny Halevy
rpc: sink_impl::close: fixup indentation
rpc: harden sink_impl::close()
> http: Document the way "unread body bytes" accounting works
> net: tighten port load balancing port access
> coroutine: reimplement generator with buffered variant
> Merge 'Stop using net::packet in posix data sink' from Pavel Emelyanov
net/posix-stack: Don't use packet in posix_data_sink_impl
reactor: Move fragment-vs-iovec static assertion
reactor: Make backend::sendmsg() calls use std::span<iovec>
utils: Introduce iovec_trim_front helper
utils: Spannize iovec_len()
> Merge 'Generalize memory data sink in tests' from Pavel Emelyanov
test: Make output_stream_test splitting test case use own sink
test: Make some output_stream_test cases use memory data sink
test: Threadify one of output_stream_test test cases
test: Make json_formatter_test use memory_data_sink
test: Move memory_data_sink to its own header
> dns: avoid using deprecated c-ares API
> reactor: Move read_directory() to posix_file_impl
> Merge 'rpc: sink_impl: batch sending and deletion of snd_buf:s' from Benny Halevy
test: rpc_test: add test_rpc_stream_backpressure_across_shards
reactor: add abort_on_too_long_task_queue option
rpc: make sink flush and close noexcept
rpc: sink_impl: batch sending and deletion of snd_buf:s
rpc: move sink_impl and source_impl into internal namespace
rpc: sink_impl: extend backpressure until snd_buf destroy
> configure.py: fix --api-level help
> Merge 'Close http client connection if handler doesn't consume all chunked-encoded body' from Pavel Emelyanov
test: Fix indentation after previous patch
test/http: Extend test for improper client handling of aborted requests
test/http: Ignore EPIPE exception from server closing connection
test/http: Split the partial response body read test
http: Track "remaining bytes" for chunked_source_impl
http: Switch content_length_source_impl to update remaining bytes
> metrics: Add default ~config()
> headers: Remove smp.hh from app-template.hh
> prometheus: remove hostname and metric_help config
> rpc: Tune up connection methods visibility
> perf_tests: Fix build with fmt 12.0.0 by avoiding internal functions
> doc: Fix some typos in codying style
> reactor: Remove unused try_sleep() method
directory_lister::get is adjusted in this patch to
use the new experimental::coroutine::generator interface
that was changed in scylladb/seastar@81f2dc9dd9
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#26913
Having variadic dispose_gently() clear inputs concurrently
serves no purpose, since this is a CPU bound operation. It
will just add more tasks for the reactor to process.
Reduce disruption to other work by processing inputs
sequentially.
Closesscylladb/scylladb#26993
Key goals:
- efficient (batching updates)
- reliable (no lost updates)
Will be used in data structures maintained on one designed owning
shard and replicated to other shards.
The directory_lister uses utils::lister under the hood which accepts a
callback to put directory_entry-s in. The directory_lister's callback
then puts the entries into a queue and its .get() method pops up entries
from there to return to caller.
This patch simplifies this code by switching the directory_lister to use
experimental generator lister from seastar. With it, the entries to be
returned from .get() are simply co_await-ed from calling the generator
object (wich co_yield-s them).
As a result the directory_lister becomes smaller and drops the need for
utils::lister. Since directory_lister was created as a replacement for
that callback-based lister, the latter can be eventually removed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#26586
Sometimes file::list_directory() returns entries without type set. In
thase case lister calls file_type() on the entry name to get it. In case
the call returns disengated type, the code assumes that some error
occurred and resolves into exception.
That's not correct. The file_type() method returns disengated type only
if the file being inspected is missing (i.e. on ENOENT errno). But this
can validly happen if a file is removed bettween readdir and stat. In
that case it's not "some error happened", but a enry should be just
skipped. In "some error happened", then file_type() would resolve into
exceptional future on its own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#26595
The utils library requires OpenSSL's libcrypto for cryptographic
operations and without linking libcrypto, builds fail with undefined
symbol errors. Fix that by linking `crypto` to `utils` library when
compiled with cmake. The build files generated with configure.py already
have `crypto` lib linked, so they do not have this issue.
Fix#26705
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#26707
With the recent introduction of retry_strategy to Seastar, the pure virtual class previously defined in ScyllaDB is now redundant. This change allows us to streamline our codebase by directly inheriting from Seastar’s implementation, eliminating duplication in ScyllaDB.
Despite this update is purely a refactoring effort and does not introduce functional changes it should be ported back to 2025.3 and 2025.4 otherwise it will make future backports of bugfixes/improvements related to `s3_client` near to impossible
ref: https://github.com/scylladb/seastar/issues/2803
depends on: https://github.com/scylladb/seastar/pull/2960Closesscylladb/scylladb#25801
* github.com:scylladb/scylladb:
s3_client: remove unnecessary `co_await` in `make_request`
s3 cleanup: remove obsolete retry-related classes
s3_client: remove unused `filler_exception`
s3_client: fix indentation
s3_client: simplify chunked download error handling using `make_request`
s3_client: reformat `make_request` functions for readability
s3_client: eliminate duplication in `make_request` by using overload
s3_client: reformat `make_request` function declarations for readability
s3_client: reorder `make_request` and helper declarations
s3_client: add `make_request` override with custom retry and error handler
s3_client: migrate s3_client to Seastar HTTP client
s3_client: fix crash in `copy_s3_object` due to dangling stream
s3_client: coroutinize `copy_s3_object` response callback
aws_error: handle missing `unexpected_status_error` case
s3_creds: use Seastar HTTP client with retry strategy
retry_strategy: add exponential backoff to `default_aws_retry_strategy`
retry_strategy: introduce Seastar-based retry strategy
retry_strategy: update CMake and configure.py for new strategy
retry_strategy: rename `default_retry_strategy` to `default_aws_retry_strategy`
retry_strategy: fix include
retry_strategy: Copied utils/s3/retry_strategy.hh to utils/s3/default_aws_retry_strategy.hh
retry_strategy: Copied utils/s3/retry_strategy.cc to utils/s3/default_aws_retry_strategy.cc
The std::optional<T> inject_parameter(...) method is a template, and in
dev/debug modes this parameter is defaulted to std::string_view, but for
release mode it's not. This patch makes it symmetrical.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#26706
Refactor `chunked_download_source` to eliminate redundant exception
handling by leveraging the new `make_request` override with custom
retry strategy. This streamlines the download fiber logic, improving
readability and maintainability.
Reformats the `make_request` function declarations to improve readability
due to the large number of arguments. This aligns with our formatting
guidelines and makes the code easier to maintain.
handler
Introduce an override for `make_request` in `s3_client` to support
custom retry strategies and error handlers, enabling flexibility
beyond the default client behavior and improving control over request
handling
In the `copy_part` method, move the `input_stream<char>` argument
into a local variable before use. Failing to do so can lead to a
SIGSEGV or trigger an abort under address sanitizer.
Add a missing `case` clause to the `switch` statement to correctly
handle scenarios where `unexpected_status_error` is thrown. This
fixes overlooked error handling and improves robustness.
In AWS credentials providers, replace `retryable_http_client` with
Seastar's native HTTP client. Integrate the newly added
`default_aws_retry_strategy` to handle retries more efficiently and
reduce dependency on external retry logic.
Add a new class derived from Seastar's `default_retry_strategy`.
Relocate the `should_retry` implementation from Scylla's
`default_retry_strategy` into the new class to centralize and
standardize retry behavior.
Renames the `default_retry_strategy` class to `default_aws_retry_strategy`
to clarify its association with the S3 client implementation. This avoids
confusion with the unrelated `seastar::default_retry_strategy` class.
Other than patching Scylla sinks to implement new data_sink_impl::put(std::span<temporary_buffer>) overload, the PR changes transport write_response() method to stop using output_stream::write(scattered_message) because it's also gone.
Using newer seastar API, no need to backport
Closesscylladb/scylladb#26592
* github.com:scylladb/scylladb:
code: Fix indentation after previous patch
code: Switch to seastar API level 9
transport: Open-code invoke_with_counting into counting_data_sink::put
transport: Don't use scattered_message
utils: Implement memory_data_sink::put(net::packet)
Apply two main changes to the s3_client error handling
1. Add a loop to s3_client's `make_request` for the case whe the retry strategy will not help since the request itself have to be updated. For example, authentication token expiration or timestamp on the request header
2. Refine the way we handle exceptions in the `chunked_download_source` background fiber, now we carry the original `exception_ptr` and also we wrap EVERY exception in `filler_exception` to prevent retry strategy trying to retry the request altogether
Fixes: https://github.com/scylladb/scylladb/issues/26483
Should be ported back to 2025.3 and 2025.4 to prevent deadlocks and failures in these versions
Closesscylladb/scylladb#26527
* github.com:scylladb/scylladb:
s3_client: tune logging level
s3_client: add logging
s3_client: improve exception handling for chunked downloads
s3_client: fix indentation
s3_client: add max for client level retries
s3_client: remove `s3_retry_strategy`
s3_client: support high-level request retries
s3_client: just reformat `make_request`
s3_client: unify `make_request` implementation
Refactor the wrapping exception used in `chunked_download_source` to
prevent the retry strategy from reattempting failed requests. The new
implementation preserves the original `exception_ptr`, making the root
cause clearer and easier to diagnose.
It never worked as intended, so the credentials handling is moving to the same place where we handle time skew, since we have to reauthenticate the request
Add an option to retry S3 requests at the highest level, including
reinitializing headers and reauthenticating. This addresses cases
where retrying the same request fails, such as when the S3 server
rejects a timestamp older than 15 minutes.
Integrates GCP object storage as a working storage backend for scylla sstables as well as backup storage.
Adds an abstraction layer (atm very heavily designed around the s3 client interface and usage) to allow the "storage" etc layers of sstable management to pick transparently between "s3" and "gs" providers.
This modifies the scylla config such that endpoints can optionally (through a "type" param) ref a GS backend.
Similarly with storage_options.
Also adds some IO wrapping primitives to make it more feasible to place some logic at a mid level of the implementation stack (such as making networked storage files, ranged reading etc).
Test s3 fixture is replaced (where appropriate) with an `object_storage` fixture that multiplexes the test across both backends.
Unit tests are duplicated and for the GS versions use a boost test fixture for GCS, default local fake.
Fixes#25359Fixes#26453Closesscylladb/scylladb#26186
* github.com:scylladb/scylladb:
docs::dev::object_storage: Add some initial info on GS storage
docs/dev: Add mention of (nested) docker usage in testing.md
sstables::object_storage_client: Forward memory limit semaphore to GS instance
utils::gcp::object_storage: Add optional memory limits to up/download
sstables::object_storage_client: Add multi-upload support for GS
utils::gcp::storage: Add merge objects operation
test_backup/test_basic: Make tests multiplex both s3 and gs backends
test::cluster::conftest: Add support for multiple object storage backends
boost::gcs_storage_test: reindent
boost::gcs_storage_test: Convert to use fixture
tests::boost: Add GS object storage cases to mirror S3 ones
tests::lib::gcs_fixture: Add a reusable test fixture for real/fake GS/GCS
tests::lib::test_utils: Add overloads/helpers for reading and (temp) writing env
sstables::object_storage_client: Add google storage implementation
test_services: Allow testing with GS object storage parameters
utils::gcp::gcp_credentials: Add option to create uninitialized credentials
utils::gcp::object_storage: Make create_download_source return seekable_data_source
utils::gcp::object_storage: Add defensive copies of string_view params
utils::gcp::object_storage: Add missing retry backoff increate
utils::gcp::object_storage: Add timestamp to object listing
utils::gcp::object_storage: Add paging support to list_objects
object_storage_client: Add object_name wrapper type
utils::gcp::object_storage: Add optional abort_source
utils::rest::client: Add abort_source support
sstables: Use object_storage_client for remote storage
sstables::object_storage_client: Add abstraction layer for OS cliens (s3 initial)
s3::upload_progress: Promote to general util type
storage_options: Abstract s3 to "object_storage" and add gs as option
sstables::file_io_extension: Change "creator" callback to just data_source
utils::io-wrappers: Add ranged data_source
utils::io-wrappers: Add file wrapper type for seekable_source
utils::seekable_source: Add a seekable IO source type
object_storage_endpoint_param: Add gs storage as option
config: break out object_storage_endpoint_param preparing for multi storage