As the IAM role is not configured to assume a role at this moment, it
makes sense to move the instance metadata credentials provider up in
the chain. This avoids unnecessary network calls and prevents log
clutter caused by failure messages.
Closes scylladb/scylladb#23360
This PR introduces several key improvements to bolster the reliability of our S3 client, particularly in handling intermittent authentication and TLS-related issues. The changes include:
1. **Automatic Credential Renewal and Request Retry**: When credentials expire, the new retry strategy resets the credentials and sets the client to the retryable state, so the client re-authenticates and automatically retries the request. This change prevents transient authentication failures from propagating as fatal errors.
2. **Enhanced Exception Unwrapping**: The client now extracts the embedded std::system_error from std::nested_exception instances that may be raised by the Seastar HTTP client when using TLS. This allows for more precise error reporting and handling.
3. **Expanded TLS Error Handling**: We've added support for retryable TLS error codes within the std::system_error handler. This modification enables the client to detect and recover from transient TLS issues by retrying the affected operations.
Together, these enhancements improve overall client robustness by ensuring smoother recovery from both credential and TLS-related errors.
No backport needed since it is an enhancement
Closes scylladb/scylladb#22150
* github.com:scylladb/scylladb:
aws_error: Add GNU TLS codes
s3_client: Handle nested std::system_error exceptions
s3_client: Start using new retry strategy
retry_strategy: Add custom retry strategy for S3 client
retry_strategy: Make `should_retry` awaitable
Enhance error handling by detecting and processing std::system_error exceptions
nested within std::nested_exception. This improvement ensures that system-level
errors wrapped in the exception chain are properly caught and managed, leading
to more robust error reporting and recovery.
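The unwrapping described above can be sketched with the standard library alone. This is a minimal illustration, not the actual s3_client code: `find_system_error` and `make_wrapped_error` are hypothetical helper names, and the wrapped error is built by hand rather than raised by Seastar's HTTP client.

```cpp
#include <cassert>
#include <exception>
#include <optional>
#include <stdexcept>
#include <system_error>

// Hypothetical helper: walk a chain of nested exceptions and return the
// error code of the first std::system_error found, or std::nullopt if the
// chain contains none.
inline std::optional<std::error_code> find_system_error(std::exception_ptr ep) {
    while (ep) {
        try {
            std::rethrow_exception(ep);
        } catch (const std::system_error& e) {
            return e.code();
        } catch (const std::nested_exception& e) {
            // Descend one level; nested_ptr() is null at the end of the chain.
            ep = e.nested_ptr();
        } catch (...) {
            return std::nullopt;
        }
    }
    return std::nullopt;
}

// Hypothetical helper: build a sample chain, a std::runtime_error wrapping
// a std::system_error, similar in shape to a wrapped transport error.
inline std::exception_ptr make_wrapped_error() {
    std::exception_ptr result;
    try {
        try {
            throw std::system_error(std::make_error_code(std::errc::connection_reset));
        } catch (...) {
            std::throw_with_nested(std::runtime_error("request failed"));
        }
    } catch (...) {
        result = std::current_exception();
    }
    return result;
}
```

Calling `find_system_error(make_wrapped_error())` yields the `connection_reset` code, which a retry strategy could then compare against its list of retryable codes.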
* Previously, token expiration was considered a fatal error. With this change,
the `s3_client` uses a new retry strategy that tries to renew expired
credentials
* Added related test to the `s3_proxy`
Introduced a new retry strategy that extends the default implementation.
The should_retry method is overridden to handle a specific case for expired credential tokens.
When an expired token error is detected, the credentials are reset so that the client re-authenticates, and the
original request is retried.
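A simplified sketch of this override is shown below. It is synchronous for brevity, whereas the series makes `should_retry` awaitable, and every type here (`error_kind`, `credentials`, `default_retry_strategy`, `s3_retry_strategy`) is a hypothetical stand-in rather than the real class.

```cpp
#include <cassert>
#include <string>

// Hypothetical error classification, standing in for the AWS error types.
enum class error_kind { expired_token, throttling, access_denied };

// Hypothetical credentials holder; reset() forces re-authentication later.
struct credentials {
    std::string token;
    void reset() { token.clear(); }
};

// Stand-in for the default strategy: retry only throttling errors.
struct default_retry_strategy {
    unsigned max_retries = 3;
    virtual ~default_retry_strategy() = default;
    virtual bool should_retry(error_kind err, unsigned attempt) const {
        return err == error_kind::throttling && attempt < max_retries;
    }
};

// Extension: an expired token resets the credentials and asks for a retry;
// everything else falls through to the default behavior.
struct s3_retry_strategy : default_retry_strategy {
    credentials& creds;
    explicit s3_retry_strategy(credentials& c) : creds(c) {}
    bool should_retry(error_kind err, unsigned attempt) const override {
        if (err == error_kind::expired_token && attempt < max_retries) {
            creds.reset();
            return true;
        }
        return default_retry_strategy::should_retry(err, attempt);
    }
};
```

The design point is that the strategy only invalidates the credentials; obtaining fresh ones is left to whatever code signs the next attempt.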
Currently, when client::make_request() is called, the caller can pass a
std::optional<status> argument indicating which status it expects from the
server. If the status doesn't match, the request body handler won't be
called and the request will fail with an unexpected status exception.
However, a disengaged expected implicitly means that the requestor
expects the OK (200) status. This makes it impossible to make a query
whose return status is not known in advance and is left to the handler
to check.
The lower-level http client allows a disengaged expected with the described
semantics: the handler checks the status on its own. The s3 client needs
this behavior for GET requests. The server can respond with OK or partial
content status depending on the Range header. If the header is absent or
is large enough for the requested object to fit into it, the status
is OK; if the object is "trimmed", the status is partial content.
At the end of the day, the requestor cannot "guess" the returned status in
advance and should check it upon response arrival.
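The intended semantics can be sketched as follows. This is an illustration only: `status`, `handle_response` and `accept_get_status` are hypothetical names, and the real client works with Seastar futures and reply objects rather than plain values.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical subset of HTTP status codes.
enum class status { ok = 200, partial_content = 206, not_found = 404 };

// If `expected` is engaged, a mismatch fails before the handler runs.
// If it is disengaged, the handler sees whatever status arrived and
// decides on its own.
template <typename Handler>
auto handle_response(status actual, std::optional<status> expected, Handler&& handler) {
    if (expected && actual != *expected) {
        throw std::runtime_error("unexpected status");
    }
    return handler(actual);
}

// A GET handler that accepts both OK and partial content, and reports
// whether the object was trimmed by the Range header.
inline bool accept_get_status(status s) {
    if (s != status::ok && s != status::partial_content) {
        throw std::runtime_error("unexpected status for GET");
    }
    return s == status::partial_content;
}
```

With a disengaged `expected`, `handle_response(status::partial_content, std::nullopt, accept_get_status)` reaches the handler, while passing `status::ok` as the expectation would reject the same response up front.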
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#23243
* seastar 5b95d1d7...412d058c (62):
> fstream: Export functions for making file_data_source
> build: Include DPDK dependency libraries in Seastar linkage
> demos/tls_echo_server_demo: Modernize with seastar::async
> http/client: Pass abort source by pointer
> rpc: remove deprecated logging function support
> github: Add Alpine Linux workflow to test builds with musl libc
> exception_hacks: Make dl_iterate_phdr resolution manual
> tests: relax test_file_system_space check for empty filesystems
> demos/udp_server_demo: Modernize with seastar::async and proper teardown
> future: remove deprecated functions/concepts
> util: logger: remove deprecated set_stdout_enabled and logger_ostream_type::{stdout,stderr}
> memory: guard __GLIBC_PREREQ usage with __GLIBC__ check
> scheduling_specific: Add noexcept wrapper for free()
> file: Replace __gid_t with standard POSIX gid_t
> aio_storage_context: Use reactor::do_at_exit()
> json2code: support chunked_fifo
> json: remove unused headers
> httpd: test cases for streaming
> build: use find_dependency() instead find_package() in config file
> build: stop using a loop for finding dependencies
> dns: Fix event processing to work safely with recent c-ares
> tutorial: add a section about initialization and cleanup
> reactor: deprecate at_exit()
> httpclient: Add exception handling to connection::close
> file: document max_length-limits for dma_read/write funcs taking vector<iovec>
> build: fix P2582R1 detection in GCC compatibility check
> json2code: optimize string handling using std::string_view
> tests/unit: fix typo in test output
> doc: Update documentation after removing build.sh
> test: Add direct exception passing for awaits for perf test
> github: add Docker build verification workflow
> docker: update LLVM debian repo for Ubuntu Orcular migration
> tests/unit: Use http.HTTPStatus constants instead of raw status codes
> tests/unit: Fix exception verification in json2code_test.py
> httpd: handle streaming results in more handlers
> json: stream_object now moves value
> json: support for rvalue ranges
> chunked_fifo: make copyable
> reactor: deprecate at_destroy()
> testing: prevent test scheduling after reactor exit
> net: Add bytes sent/received metrics
> net: switch rss_key_type to std::span instead of std::string_view
> log: fixes for libc++ 19
> sstring: fixes for lib++ 19
> build: finalize numactl dependency removal
> build: link DPDK against libnuma when detected during build
> memory: remove libnuma dependency
> treewide: replace assert with SEASTAR_ASSERT
> future: fix typo in comment
> http: Unwrap nested exceptions to handle retryable transport errors
> net/ip, net: sed -i 's/to_ulong/to_uint/'
> core: function_traits noexcept specializations
> util/variant: seastar::visit forward value arg
> net/tls: fix missing include
> tls: Add a way to inspect peer certificate chain
> websocket: Extract encode_base64() function
> websocket: Rename wlogger to websocket_logger
> websocket: Extract parts of server_connection usable for client
> websocket: Rename connection to server_connection
> websocket: Extract websocket parser to separate file
> json2code_test: factor out query method
> seastar-json2code: fix error handling
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#23281
scylla-sstable: Enable support for S3-stored sstables
Minimal implementation of what was mentioned in this [issue](https://github.com/scylladb/scylladb/issues/20532)
This update allows Scylla to work with sstables stored on AWS S3. Users can specify the fully qualified location of the sstable using the format: `s3://bucket/prefix/sstable_name`. One should have `object_storage_config_file` referenced in the `scylla.yaml` as described in docs/operating-scylla/admin.rst
ref: https://github.com/scylladb/scylladb/issues/20532
fixes: https://github.com/scylladb/scylladb/issues/20535
No backport needed since the S3 functionality was never released
Closes scylladb/scylladb#22321
* github.com:scylladb/scylladb:
tests: Add Tests for Scylla-SSTable S3 Functionality
docs: Update Scylla Tools Documentation for S3 SSTable Support
scylla-sstable: Enable Support for S3 SSTables
s3: Implement S3 Fully Qualified Name Manipulation Functions
object_storage: Refactor `object_storage.yaml` parsing logic
This series adds an async guard to system_keyspace operations
and adds a deferred action to stop the system_keyspace in main() before destroying the service.
This helps to make sure that sys_ks is unplugged from its users and that all async operations using it are drained once it's stopped.
* Enhancement, no backport needed
Closes scylladb/scylladb#23113
* github.com:scylladb/scylladb:
main: stop system keyspace
system_keyspace: call shutdown from stop
system_keyspace: shutdown: allow calling more than once
database, compaction_manager, large_data_handler: use pluggable<system_keysapce>
utils: add class pluggable
This PR makes several updates and improvements to the retryable HTTP client functionality, and enhances error handling and integration with AWS services. Below is a summary of the changes:
- Moved the retryable HTTP client functionality out of the S3 client to improve modularity and reusability across other services like AWS STS.
- Isolated the retryable_http_client into its own file, improving clarity and maintainability.
- Added a make_request method that introduces a response-skipping handler.
- Introduced a custom error handler constructor, providing greater flexibility in handling errors.
- Updated the STS and Instance Metadata Service credentials providers to utilize the new retryable HTTP client, enhancing their robustness and reliability.
- Extended the AWS error list to handle errors specific to the STS service, ensuring more granular and accurate error management for STS operations.
- Enhanced error handling for system errors returned by Seastar’s HTTP client, ensuring smoother operations.
- Properly closed the HTTP client in instance_profile_credentials_provider and sts_assume_role_credentials_provider to prevent resource leaks.
- Reduced the log severity in the retry strategy to avoid SCT test failures that occur when any log message is tagged as an ERROR.
No backport needed since no S3-related functionality has been released on the Scylla side
Closes scylladb/scylladb#21933
* github.com:scylladb/scylladb:
s3_client: Adjust Log Severity in Retry Strategy
aws_error: Enhance error handling for AWS HTTP client
aws_error: Add STS specific error handling
credentials_providers: Close retryable clients in Credentials Providers
credentials_providers: Integrate retryable_http_client with Credentials Providers
s3_client: enhance `retryable_http_client` functionality
s3_client: isolate `retryable_http_client`
s3_client: Prepare for `retryable_http_client` relocation
s3_client: Remove `is_redirect_status` function
s3_client: Move retryable functionality out of s3 client
Before this patch, the load balancer was equalizing tablet count per
shard, so it achieved balance assuming that:
1) tablets have the same size
2) shards have the same capacity
That can cause imbalance of utilization if shards have different
capacity, which can happen in heterogeneous clusters with different
instance types. One of the causes for capacity difference is that
larger instances run with fewer shards due to vCPUs being dedicated to
IRQ handling. This makes those shards have more disk capacity, and
more CPU power.
After this patch, the load balancer equalizes shard's storage
utilization, so it no longer assumes that shards have the same
capacity. It still assumes that each tablet has equal size. So it's a
middle step towards full size-aware balancing.
One consequence is that, to be able to balance, the load balancer needs
to know every node's capacity, which is collected with the same
RPC that collects load_stats for average tablet size. This is not a
significant setback because migrations cannot proceed anyway if nodes
are down, due to barriers. We could make intra-node migration
scheduling work without capacity information, but it's pointless due
to the above, so it is not implemented.
Also, per-shard goal for tablet count is still the same for all nodes in the cluster,
so nodes with less capacity will be below limit and nodes with more capacity will
be slightly above limit. This shouldn't be a significant problem in practice; we could
compensate for this by increasing the limit.
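The difference between equalizing tablet count and equalizing storage utilization can be illustrated with a small sketch. It assumes, as the patch does, that every tablet has the same average size; `shard_load`, `utilization` and `least_utilized` are hypothetical names and not the load balancer's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-shard view: how many tablets it holds and how much
// disk capacity it has.
struct shard_load {
    std::uint64_t tablet_count;
    std::uint64_t capacity_bytes;
};

// Storage utilization under the equal-tablet-size assumption.
inline double utilization(const shard_load& s, std::uint64_t avg_tablet_size) {
    return static_cast<double>(s.tablet_count) * static_cast<double>(avg_tablet_size)
         / static_cast<double>(s.capacity_bytes);
}

// Pick the shard with the lowest utilization as a migration target.
// Expects a non-empty vector.
inline std::size_t least_utilized(const std::vector<shard_load>& shards,
                                  std::uint64_t avg_tablet_size) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < shards.size(); ++i) {
        if (utilization(shards[i], avg_tablet_size) < utilization(shards[best], avg_tablet_size)) {
            best = i;
        }
    }
    return best;
}
```

For example, a shard holding 12 tablets with twice the capacity is less utilized than a shard holding 10 tablets, so a capacity-aware balancer would still prefer it as a target even though a count-based balancer would not.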
Refs #23042
Closes scylladb/scylladb#23079
* github.com:scylladb/scylladb:
tablets: Make load balancing capacity-aware
topology_coordinator: Fix confusing log message
topology_coordinator: Refresh load stats after adding a new node
topology_coordinator: Allow capacity stats to be refreshed with some nodes down
topology_coordinator: Refactor load status refreshing so that it can be triggered from multiple places
test: boost: tablets_test: Always provide capacity in load_stats
test: perf_load_balancing: Set node capacity
test: perf_load_balancing: Convert to topology_builder
config, disk_space_monitor: Allow overriding capacity via config
storage_service, tablets: Collect per-node capacity in load_stats
- Seastar's HTTP client is known to throw exceptions for various reasons, including network errors, TLS errors and other transient issues.
- Update error handling to correctly capture and process all exceptions from Seastar's HTTP client.
- Previously, only aws_exception was handled, causing retryable errors to be missed and `should_retry` not invoked.
- Now, all exceptions trigger the appropriate retry logic per the intended strategy.
- Add tests for the S3 proxy to ensure robustness and reliability of these enhancements.
Updated the AWS error list to include handling for errors specific to the STS service. This enhancement ensures more comprehensive error management for STS-related operations.
This commit moves the retryable HTTP client functionality out of the S3 client implementation. Since this functionality is also required for other services, such as AWS STS, it has been separated to ensure broader applicability.
Added utility functions to handle S3 Fully Qualified Names (FQN). These
functions enable parsing, splitting, and identification of S3 paths,
enhancing our ability to work with S3 object storage more effectively.
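As an illustration of what such helpers might look like, here is a minimal sketch for the `s3://bucket/prefix/sstable_name` format. The names `is_s3_fqn`, `parse_s3_fqn` and `s3_location` are assumptions made for this example and may not match the functions added by the commit.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type: the bucket and the object key inside it.
struct s3_location {
    std::string bucket;
    std::string object;
};

// Identification: does the path use the s3:// scheme?
inline bool is_s3_fqn(std::string_view path) {
    constexpr std::string_view scheme = "s3://";
    return path.substr(0, scheme.size()) == scheme;
}

// Parsing and splitting: everything up to the first '/' after the scheme
// is the bucket, the rest is the object key. Returns std::nullopt if the
// path is not an S3 FQN or if either part is empty.
inline std::optional<s3_location> parse_s3_fqn(std::string_view path) {
    constexpr std::string_view scheme = "s3://";
    if (!is_s3_fqn(path)) {
        return std::nullopt;
    }
    path.remove_prefix(scheme.size());
    const auto slash = path.find('/');
    if (slash == std::string_view::npos || slash == 0 || slash + 1 == path.size()) {
        return std::nullopt;
    }
    return s3_location{std::string(path.substr(0, slash)),
                       std::string(path.substr(slash + 1))};
}
```

Parsing `s3://bucket/prefix/sstable_name` with this sketch gives the bucket `bucket` and the object key `prefix/sstable_name`; a local filesystem path is rejected.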
During development of #22428 we decided that we have
no need for `object-storage.yaml`, and we'd rather store
the endpoints in `scylla.yaml` and get a REST API to expose
the endpoints for free.
This patch removes the credentials provider used to read the
aws keys from this yaml file.
Followup work will remove the `object-storage.yaml` file
altogether and move the endpoints to `scylla.yaml`.
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Closes scylladb/scylladb#22951
Intended for testing, or hot-fixing out-of-space issues in production.
Tablet load balancer uses this information for determining per-shard load
so reducing capacity will cause tablets to be migrated away from the node.
The scylla-sstable dump-* command suite has proven invaluable in many investigations. In certain cases, however, I found `dump-data` quite cumbersome, for example when trying to find certain values in an sstable, or when trying to read the content of system tables while a node is down: one has to trudge through tons of uninteresting metadata and do compaction in one's head. This PR introduces the new scylla-sstable query command, specifically targeted at situations like this: it allows executing queries on sstables, exposing to the user all the power of CQL, to tailor the output as they see fit.
Select everything from a table:
$ scylla sstable query --system-schema /path/to/data/system_schema/keyspaces-*/*-big-Data.db
keyspace_name | durable_writes | replication
-------------------------------+----------------+-------------------------------------------------------------------------------------
system_replicated_keys | true | ({class : org.apache.cassandra.locator.EverywhereStrategy})
system_auth | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 1})
system_schema | true | ({class : org.apache.cassandra.locator.LocalStrategy})
system_distributed | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 3})
system | true | ({class : org.apache.cassandra.locator.LocalStrategy})
ks | true | ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
system_traces | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 2})
system_distributed_everywhere | true | ({class : org.apache.cassandra.locator.EverywhereStrategy})
Select everything from a single SSTable, use the JSON output (filtered through [jq](https://jqlang.github.io/jq/) for better readability):
$ scylla sstable query --system-schema --output-format=json /path/to/data/system_schema/keyspaces-*/me-3gm7_127s_3ndxs28xt4llzxwqz6-big-Data.db | jq
[
{
"keyspace_name": "system_schema",
"durable_writes": true,
"replication": {
"class": "org.apache.cassandra.locator.LocalStrategy"
}
},
{
"keyspace_name": "system",
"durable_writes": true,
"replication": {
"class": "org.apache.cassandra.locator.LocalStrategy"
}
}
]
Select a specific field in a specific partition using the command-line:
$ scylla sstable query --system-schema --query "select replication from scylla_sstable.keyspaces where keyspace_name='ks'" ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
replication
-------------------------------------------------------------------------------------
({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
Select a specific field in a specific partition using ``--query-file``:
$ echo "SELECT replication FROM scylla_sstable.keyspaces WHERE keyspace_name='ks';" > query.cql
$ scylla sstable query --system-schema --query-file=./query.cql ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
replication
-------------------------------------------------------------------------------------
({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
New functionality: no backport needed.
Closes scylladb/scylladb#22007
* github.com:scylladb/scylladb:
docs/operating-scylla: document scylla-sstable query
test/cqlpy/test_tools.py: add tests for scylla-sstable query
test/cqlpy/test_tools.py: make scylla_sstable() return table name also
scylla-sstable: introduce the query command
tools/utils: get_selected_operation(): use std::string for operation_options
utils/rjson: streaming_writer: add RawValue()
cql3/type_json: add to_json_type()
test/lib/cql_test_env: introduce do_with_cql_env_noreentrant_in_thread()
A wrapper around a shared service allowing
safe plug and unplug of the service from its user
using a phased-barrier operation permit guarding
the service while in use.
Also add a unit test for this class.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The following metrics will be marked with basic_level label:
scylla_lsa_total_space_bytes
scylla_lsa_non_lsa_used_space_bytes
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Scylla generates many metrics, and when multiplied by the number of
shards, the total number of metrics adds a significant load to a
monitoring server.
With multi-tier monitoring, it is helpful to have a smaller subset of
metrics users care about and allow them to get only those.
This patch adds two kinds of labels. The first is a __level label, currently with
a single value, but we can add more in the future.
The second kind is a cross-feature label, currently for alternator, cdc
and cas.
We will use the __level label to mark the interesting user-facing metrics.
The current level value is:
basic - metrics for Scylla monitoring
In this phase, basic will mark all metrics used in the dashboards.
In practice, without any configuration change, Prometheus would get the
same metrics as it gets today.
It is possible to filter by the label, e.g.:
curl http://localhost:9180/metrics?__level=basic
The labels themselves are not reported, thanks to the filtering of
labels that begin with __.
The feature labels:
__cdc, __cas and __alternator can be an easy way to disable a set of
metrics when not using a feature.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Fix a bug where std::same_as<...> constraint was incorrectly used as a
simple requirement instead of a nested requirement or part of a
conjunction. This caused the constraint to be always satisfied
regardless of the actual types involved.
This change promotes std::same_as<...> to a top-level constraint,
ensuring proper type checking while improving code readability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#23068
The tree code has const and non-const overloads for searching methods
like find(), lower_bound(), etc. To avoid implementing them twice, it's coded
like
const_iterator find() const {
... // the implementation itself
}
iterator find() {
return iterator(const_cast<const *>(this)->find());
}
i.e. the const overload is called, and the const_iterator it returns is
converted into a non-const iterator. For that, the latter has a dedicated
constructor with two inaccuracies: it's not marked as explicit and it
accepts a const rvalue reference.
This patch fixes both.
Although this disables implicit const -> non-const conversion of
iterators, the constructor in question is public, which still opens a
way for conversion (without const_cast<>). This constructor would be better
marked private, but there's the double_decker class that uses bptree
and exploits the same hacks in its finding methods, so it needs this
constructor to be callable. Alas.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#23069
Replace value-based exception catching with reference-based catching to address
GCC warnings about polymorphic type slicing:
```
warning: catching polymorphic type ‘class seastar::rpc::stream_closed’ by value [-Wcatch-value=]
```
When catching polymorphic exceptions by value, the C++ runtime copies the
thrown exception into a new instance of the specified type, slicing the
actual exception and potentially losing important information. This change
ensures all polymorphic exceptions are caught by reference to preserve the
complete exception state.
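The slicing effect is easy to demonstrate with a toy hierarchy. The exception types below are made up for this example; the by-value handler is written that way on purpose and is expected to trigger the same `-Wcatch-value` warning.

```cpp
#include <cassert>
#include <exception>
#include <string>

// Hypothetical polymorphic exception hierarchy.
struct base_error : std::exception {
    virtual std::string describe() const { return "base"; }
};

struct stream_closed_error : base_error {
    std::string describe() const override { return "stream closed"; }
};

// Catching by value copies only the base_error part of the thrown object,
// so the derived override is lost.
inline std::string caught_by_value() {
    std::string result;
    try {
        throw stream_closed_error();
    } catch (base_error e) {  // intentionally by value
        result = e.describe();
    }
    return result;
}

// Catching by const reference keeps the dynamic type intact.
inline std::string caught_by_reference() {
    std::string result;
    try {
        throw stream_closed_error();
    } catch (const base_error& e) {
        result = e.describe();
    }
    return result;
}
```

`caught_by_value()` returns "base" while `caught_by_reference()` returns "stream closed", which is the information loss the warning points at.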
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#23064
This commit eliminates unused boost header includes from the tree.
Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#22997
Use std::to_underlying() when comparing unsigned types with enumeration values
to fix type mismatch warnings in GCC-14. This specifically addresses an issue in
utils/advanced_rpc_compressor.hh where comparing a uint8_t with 0 triggered a
'-Werror=type-limits' warning:
```
error: comparison is always false due to limited range of data type [-Werror=type-limits]
if (x < 0 || x >= static_cast<underlying>(type::COUNT))
~~^~~
```
Using std::to_underlying() provides clearer type semantics and avoids this kind
of comparison warning. This change improves code readability while maintaining
the same behavior.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#22898
Exposes the RawValue() method of the underlying rapidjson::Writer. This
method allows writing a pre-formatted json value to the stream. This
will allow using cql3/type_json.hh to pre-format CQL3 types, then write
these pre-formatted values into a json stream.
This patch addresses an issue where the buffer offset becomes incorrect when a request is retried. The new request uses an offset that has already been advanced, causing misalignment. This fix ensures the buffer offset is correctly reset, preventing such errors.
Closes scylladb/scylladb#22729
Before this change, it was possible to change a non-liveupdatable config
parameter without a process restart. This erroneous behavior not only
contradicts the documentation but is potentially dangerous, as various
components theoretically might not be prepared for a change of
configuration parameter value without a restart. The issue came from
the fact that the liveupdatability verification check was skipped for default
configuration parameters (those without initial values
in the configuration file during process start).
This change:
- Introduce _initialization_completed member in config_file
- Set _initialization_completed=true when config file is processed on
server start
- Verify config_file's initialization status during config update - if
config_file was initialized, prevent further changes of
non-liveupdatable parameters
- Implement ScyllaRESTAPIClient::get_config() that obtains a current
value of given configuration parameter via /v2/config REST API
- Implement test to confirm that only liveupdatable parameters are
changed when SIGHUP is sent after configuration file change
Function set_initialization_completed() is called only once in main.cc,
and the effect is expected to be visible in all shards, as a side effect
of cfg->broadcast_to_all_shards() that is called shortly after. The same
technique was already used for enable_3_1_0_compatibility_mode() call.
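The gating logic can be sketched as below. This is a single-threaded toy model with hypothetical names (`config_file_sketch`, `config_entry`, `declare`, `set_value`); the real `config_file` handles YAML nodes, config sources and per-shard state, none of which is modeled here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical config entry: a value plus its liveupdatable flag.
struct config_entry {
    std::string value;
    bool live_updatable = false;
};

class config_file_sketch {
    std::map<std::string, config_entry> _entries;
    bool _initialization_completed = false;
public:
    void declare(std::string name, std::string default_value, bool live_updatable) {
        _entries[std::move(name)] = config_entry{std::move(default_value), live_updatable};
    }

    // Called once, after the configuration file was processed at startup.
    void set_initialization_completed() { _initialization_completed = true; }

    // Returns false if the update is rejected. After initialization, only
    // liveupdatable parameters may change, including those still holding
    // their default value.
    bool set_value(const std::string& name, std::string value) {
        auto it = _entries.find(name);
        if (it == _entries.end()) {
            return false;
        }
        if (_initialization_completed && !it->second.live_updatable) {
            return false;
        }
        it->second.value = std::move(value);
        return true;
    }

    const std::string& get(const std::string& name) const {
        return _entries.at(name).value;
    }
};
```

Because the check depends only on the initialization flag and the liveupdatable attribute, a parameter that was never mentioned in the configuration file is protected in the same way as one that was.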
Fixes scylladb/scylladb#5382
No backport - minor fix.
Closes scylladb/scylladb#22655
* github.com:scylladb/scylladb:
test: SIGHUP doesn't change non-liveupdatable configuration
test: implement ScyllaRESTAPIClient::get_config()
config: prevent SIGHUP from changing non-liveupdatable parameters
config: remove unused set_value_on_all_shards(const YAML::Node&)
This commit introduces two new credentials providers: STS and Instance Metadata Service. The S3 client's provider chain has been updated to incorporate these new providers. Additionally, unit tests have been added to ensure coverage of the new functionality.
This commit entirely removes credentials from the endpoint configuration. It also eliminates all instances of manually retrieving environment credentials. Instead, the construction of file and environment credentials has been moved to their respective providers. Additionally, a new aws_credentials_provider_chain class has been introduced to support chaining of multiple credential providers.
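A provider chain of this kind can be sketched as an ordered list of sources that is queried until one yields credentials. The code below is a synchronous illustration with hypothetical names (`aws_credentials_sketch`, `credentials_provider`, `provider_chain`); it does not reproduce the interface of `aws_credentials_provider_chain`.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical credentials record; `source` is only there to show which
// provider answered.
struct aws_credentials_sketch {
    std::string access_key_id;
    std::string source;
};

// A provider either yields credentials or reports that it has none.
using credentials_provider = std::function<std::optional<aws_credentials_sketch>()>;

class provider_chain {
    std::vector<credentials_provider> _providers;
public:
    provider_chain& add(credentials_provider p) {
        _providers.push_back(std::move(p));
        return *this;
    }

    // Providers are tried in insertion order; the first one that returns
    // credentials wins, so cheap and likely sources should come first.
    std::optional<aws_credentials_sketch> resolve() const {
        for (const auto& p : _providers) {
            if (auto creds = p()) {
                return creds;
            }
        }
        return std::nullopt;
    }
};
```

Since every provider ahead of the successful one is queried and fails first, the order of the chain determines how many wasted calls and failure messages a lookup produces.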
Before this change, it was possible to change a non-liveupdatable config
parameter without a process restart. This erroneous behavior not only
contradicts the documentation but is potentially dangerous, as various
components theoretically might not be prepared for a change of
configuration parameter value without a restart. The issue came from
the fact that the liveupdatability verification check was skipped for default
configuration parameters (those without initial values
in the configuration file during process start).
This change:
- Introduce _initialization_completed member in config_file
- Set _initialization_completed=true when config file is processed on
server start
- Verify config_file's initialization status during config update - if
config_file was initialized, prevent further changes of
non-liveupdatable parameters
Fixes scylladb/scylladb#5382
This commit refactors the way AWS credentials are managed in Scylla. Previously, credentials were included in the endpoint configuration. However, since credentials and endpoint configurations serve different purposes and may have different lifetimes, it’s more logical to manage them separately. Moving forward, credentials will be completely removed from the endpoint_config to ensure clear separation of concerns.
This change:
- Remove unused set_value_on_all_shards(const YAML::Node&) member
function in class config_file::named_value
The function logic was flawed in the same way that
named_value<T>::set_value(const YAML::Node& node) is flawed: the config
source verification is insufficient for liveupdatable parameters,
allowing overwriting of non-liveupdatable config parameters (refer to
scylladb#5382). As the function was not used, it was removed instead of
being fixed.
This series exposes a Clock template parameter for loading_cache so that the test could use
the manual_clock rather than the lowres_clock, since relying on the latter is flaky.
In addition, the test load function is simplified to sleep some small random time and co_return the expected string,
rather than reading it from a real file, since the latter's timing might also be flaky, and is out of scope for this test.
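The benefit of a Clock template parameter can be shown with a tiny expiring value. Both `manual_clock` and `expiring_value` below are simplified stand-ins written for this example; they are not Seastar's `manual_clock` or the real `loading_cache`.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <string>
#include <utility>

// Hypothetical test clock: time only moves when advance() is called.
struct manual_clock {
    using duration = std::chrono::milliseconds;
    using rep = duration::rep;
    using period = duration::period;
    using time_point = std::chrono::time_point<manual_clock, duration>;
    static constexpr bool is_steady = true;

    static time_point now() { return time_point(_current); }
    static void advance(duration d) { _current += d; }
private:
    static inline duration _current{0};
};

// A value that expires after a TTL, parameterized on the clock so that
// tests can substitute a deterministic one.
template <typename Clock = std::chrono::steady_clock>
class expiring_value {
    std::string _value;
    typename Clock::time_point _deadline;
public:
    expiring_value(std::string value, typename Clock::duration ttl)
        : _value(std::move(value)), _deadline(Clock::now() + ttl) {}

    std::optional<std::string> get() const {
        if (Clock::now() >= _deadline) {
            return std::nullopt;
        }
        return _value;
    }
};
```

A test can then create `expiring_value<manual_clock>`, call `manual_clock::advance()` by exactly the TTL, and observe the expiry without sleeping or depending on timer granularity.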
Fixes #20322
* The test was flaky forever, so backport is required for all live versions.
Closes scylladb/scylladb#22064
* github.com:scylladb/scylladb:
tests: loading_cache_test: use manual_clock
utils: loading_cache: make clock_type a template parameter
test: loading_cache_test: use function-scope loader
test: loading_cache_test: simlute loader using sleep
test: lib: eventually: add sleep function param
test: lib: eventually: make *EVENTUALLY_EQUAL inline functions