scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 09:00:35 +00:00

Author	SHA1	Message	Date
Nikos Dragazis	eec49c4d78	utils: azure: Get access token with default credentials Attempt to detect credentials from the system. Inspired by the `DefaultAzureCredential` in the Azure C++ SDK, this credential type detects credentials from the following sources (in this order): * environment variables (SP credentials - same variables as in Azure C++ SDK) * Azure CLI * IMDS Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2025-07-16 17:14:08 +03:00
Nikos Dragazis	937d6261c0	utils: azure: Get access token from Azure CLI Implement token request with Azure CLI. Inspired by the Azure C++ SDK's `AzureCliCredential`, this credential type attempts to run the Azure CLI in a shell and parse the token from its output. This is meant for development purposes, where a user has already installed the Azure CLI and logged in with their user account. Pass the following environment to the process: * PATH * HOME * AZURE_CONFIG_DIR Add a token factory to construct a token from the process output. Unlike in Azure Entra and IMDS, the CLI's JSON output does not contain 'expires_in', and the token key is in camel case. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2025-07-16 17:14:08 +03:00
Nikos Dragazis	52a4bd83d5	utils: azure: Get access token from IMDS Implement token request from IMDS. No credentials are required for that - just a plain HTTP request on the IMDS token endpoint. Since the IMDS endpoint is a raw IP, it's not possible to reliably determine whether IMDS is accessible or not (i.e., whether the node is an Azure VM). Azure provides no node-local indication either. For lack of a better option, attempt to connect and declare failure if the connection is not established within 3 seconds. Use a raw TCP socket for this check, as the HTTP client currently lacks timeout or cancellation support. Perform the check only once, during the first token refresh. For the time being, do not support nodes with multiple user-assigned managed identities. Expect the token request to fail in this case (IMDS requires the identifier of the desired Managed Identity). Add a token factory to correctly parse the HTTP response. This addresses a discrepancy between token requests on IMDS and Azure Entra - the 'expires_in' field is a string in the former and an integer in the latter. Finally, implement a fail-fast retry policy for short-lived transient errors. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2025-07-16 17:14:08 +03:00
Nikos Dragazis	919765fb7f	utils: azure: Get access token with SP certificate Implement token request for Service Principals with a certificate. The request is the same as with a secret, except that the secret is replaced with an assertion. The assertion is a JWT that is signed with the certificate. To be consistent with the Azure C++ SDK, expect the certificate and the associated private key to be encoded in PEM format and be provided in a single file. The docs suggest using 'PS256' for the JWT's 'alg' claim. Since this is not supported by our current JWT library (jwt-cpp), use 'RS256' instead. The JWT also requires a unique identifier for the 'jti' claim. Use a random UUID for that (it should suffice for our use cases). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2025-07-16 17:14:08 +03:00
Nikos Dragazis	a671530af6	utils: azure: Get access token with SP secret Implement token request for Service Principals with a secret. The token request requires a TLS connection. When closing the connection, do not wait for a response to the TLS `close_notify` alert. Azure's OAuth server would ignore it and the Seastar `connected_socket` would hang for 10 seconds. Add log redaction logic to not expose sensitive data from the request and response payloads. Add a token factory to parse the HTTP response. This cannot be shared with other credential types because the JSON format is not consistent. Finally, implement a fail-fast retry policy for short-lived transient errors. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2025-07-16 17:14:08 +03:00
Nikos Dragazis	66c8ffa9bf	utils: rest: Add interface for request/response redaction logic The rest http client, currently used by the AWS and GCP key providers, logs the HTTP requests and responses unaltered. This causes some sensitive data to be exposed (plaintext data encryption keys, credentials, access tokens). Add an interface to optionally redact any sensitive data from HTTP headers and payloads. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2025-07-16 17:14:08 +03:00
Nikos Dragazis	0d0135dc4c	utils: azure: Declare all Azure credential types The goal is to mimic the Azure C++ SDK, which offers a variety of credentials, depending on their type and source. Declare the following credentials: * Service Principal credentials * Managed Identity credentials * Azure CLI credentials * Default credentials Also, define a common exception for SP and MI credentials which are network-based. This patch only defines the API. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2025-07-16 17:14:08 +03:00
Nikos Dragazis	3c4face47b	utils: azure: Define interface for Azure credentials Azure authentication is token based - the client obtains an access token with their credentials, and uses it as a bearer token to authorize requests to Azure services. Define a common API for all credential types. The API will consist of a single `get_access_token()` function that will be returning a new or a cached access token for some resource URI (defines token scope). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2025-07-16 17:14:08 +03:00
Nikos Dragazis	57bc51342e	utils: Introduce base64url_{encode,decode} Add helpers for base64url encoding. base64url is a variant of base64 that uses a URL-safe alphabet. It can be constructed from base64 by replacing the '+' and '/' characters with '-' and '_' respectively. Many implementations also strip the padding, although this is not required by the spec [1]. This will be used in upcoming patches for Azure Key Vault requests that require base64url-encoded payloads. [1] https://datatracker.ietf.org/doc/html/rfc4648#section-5 Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2025-07-16 17:14:08 +03:00
Benny Halevy	0e455c0d45	utils: clear_gently: add support for sets Since set and unordered_set do not allow modifying their stored object in place, we need to first extract each object, clear it gently, and only then destroy it. To achieve that, introduce a new Extractable concept, that extracts all items in a loop and calls clear_gently on each extracted item, until the container is empty. Add respective unit tests for set and unordered_set. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#24608	2025-07-13 12:30:45 +03:00
Pawel Pery	8d3c33f74a	utils: refactor sequential_producer as abortable This patch is part of the vector_store_client sharded service implementation for communication with the vector-store service. There is a need for an abortable sequential_producer operator(). The existing operator() is changed to accept a timeout argument that defaults to time_point::max() (matching the current default usage), and a new operator() taking an abort_source parameter is added. Reference: VS-47	2025-07-08 16:29:55 +02:00
Yaniv Michael Kaul	82fba6b7c0	PowerPC: remove ppc stuff We don't even compile-test it. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#24659	2025-07-08 10:38:23 +03:00
Dawid Mędrek	a151944fa6	treewide: Replace __builtin_expect with (un)likely C++20 introduced two new attributes--likely and unlikely--that function as a built-in replacement for __builtin_expect implemented in various compilers. Since it makes code easier to read and it's an integral part of the language, there's no reason to not use it instead. Closes scylladb/scylladb#24786	2025-07-03 13:34:04 +03:00
Pavel Emelyanov	fa0077fb77	Merge 'S3 chunked download source bug fixes' from Ernest Zaslavsky - Fix missing negation in the `if` in the background downloading fiber - Add test to catch this case - Improve the s3 proxy to inject errors if the same resource is requested more than once - Suppress client retry since retrying the same request when each produces multiple buffers may lead to the same data appearing more than once in the buffer deque - Inject exception from the test to simulate response callback failure in the middle No need to backport anything since this class is not used yet Closes scylladb/scylladb#24657 * github.com:scylladb/scylladb: s3_test: Add s3_client test for non-retryable error handling s3_test: Add trace logging for default_retry_strategy s3_client: Fix edge case when the range is exhausted s3_client: Fix indentation in try..catch block s3_client: Stop retries in chunked download source s3_client: Enhance test coverage for retry logic s3_client: Add test for Content-Range fix s3_client: Fix missing negation s3_client: Refine logging s3_client: Improve logging placement for current_range output	2025-07-02 14:45:10 +03:00
Avi Kivity	1e0b015c8b	Merge 'cql3: Represent create_statement using managed_bytes' from Dawid Mędrek When describing a table, we need to do it carefully: if some columns were dropped, we must specify that explicitly by ``` ALTER TABLE {table} DROP {column} USING TIMESTAMP ... ``` in the result of the DESCRIBE statement. Failing to do so could lead to data resurrection. However, if a table has been altered many, many times, we might end up with a huge create statement. Constructing it could, in turn, trigger an oversized allocation. Some tests ran into that very problem in fact. In this commit, we want to mitigate the problem: instead of allocating a contiguous chunk of memory for the create statement, we use `bytes_ostream` and `managed_bytes` to possibly keep data scattered in memory. It makes handling `cql3::description` less convenient in the code, but since the struct is pretty much immediately serialized after creating it, it's a very good trade-off. A reproducer is intentionally not provided by this commit: it's easy to test the change, but adding and dropping a huge number of columns would take a really long amount of time, so we need to omit it. Fixes scylladb/scylladb#24018 Backport: all of the supported versions are affected, so we want to backport the changes there. Closes scylladb/scylladb#24151 * github.com:scylladb/scylladb: cql3/description: Serialize only rvalues of description cql3: Represent create_statement using managed_string cql3/statements/describe_statement.cc: Don't copy descriptions cql3: Use managed_bytes instead of bytes in DESCRIBE utils/managed_string.hh: Introduce managed_string and fragmented_ostringstream	2025-07-01 21:59:38 +03:00
Ernest Zaslavsky	acf15eba8e	s3_test: Add s3_client test for non-retryable error handling Introduce a test that injects a non-retryable error and verifies that the chunked download source throws an exception as expected.	2025-07-01 18:45:17 +03:00
Ernest Zaslavsky	49e8c14a86	s3_client: Fix edge case when the range is exhausted Handle case where the download loop exits after consuming all data, but before receiving an empty buffer signaling EOF. Without this, the next request is sent with a non-zero offset and zero length, resulting in "Range request cannot be satisfied" errors. Now, an empty buffer is pushed to indicate completion and exit the fiber properly.	2025-07-01 18:45:17 +03:00
Ernest Zaslavsky	e50f247bf1	s3_client: Fix indentation in try..catch block Correct indentation in the `try..catch` block to improve code readability and maintain consistent formatting.	2025-07-01 18:45:17 +03:00
Ernest Zaslavsky	d2d69cbc8c	s3_client: Stop retries in chunked download source Disable retries for S3 requests in the chunked download source to prevent duplicate chunks from corrupting the buffer queue. The response handler now throws an exception to bypass the retry strategy, allowing the next range to be attempted cleanly. This exception is only triggered for retryable errors; unretryable ones immediately halt further requests.	2025-07-01 18:45:17 +03:00
Ernest Zaslavsky	6d9cec558a	s3_client: Fix missing negation Restore a missing `not` in a conditional check that caused incorrect behavior during S3 client execution.	2025-07-01 18:45:17 +03:00
Ernest Zaslavsky	e73b83e039	s3_client: Refine logging Fix typo in log message to improve clarity and accuracy during S3 operations.	2025-07-01 18:45:17 +03:00
Ernest Zaslavsky	f1d0690194	s3_client: Improve logging placement for current_range output Relocated logging to occur after determining the `current_range`, ensuring more relevant output during S3 client operations.	2025-07-01 18:45:17 +03:00
Michał Chojnowski	a29724479a	utils/alien_worker: fix a data race in submit() We move a `seastar::promise` on the external worker thread, after the matching `seastar::future` was returned to the shard. That's illegal. If the `promise` move occurs concurrently with some operation (move, await) on the `future`, it becomes a data race which could cause various kinds of corruption. This patch fixes that by keeping the promise at a stable address on the shard (inside a coroutine frame) and only passing through the worker. Fixes #24751 Closes scylladb/scylladb#24752	2025-07-01 15:13:04 +03:00
Dawid Mędrek	9cc3d49233	utils/managed_string.hh: Introduce managed_string and fragmented_ostringstream Currently, we use `managed_bytes` to represent fragmented sequences of bytes. In some cases, the type corresponds to generic bytes, while in some other cases -- to strings of actual text. Because of that, it's very easy to get confused about what purpose `managed_bytes` serves in a specific piece of code. We should avoid it. In this commit, we're introducing basic wrappers over `managed_bytes` and `bytes_ostream` with a promise that they represent UTF-8-encoded strings. The interface of those types is pretty basic, but it should be sufficient for the most common use: filling a stream with characters and then extracting a fragmented buffer from it.	2025-06-30 19:12:08 +02:00
Lakshmi Narayanan Sreethar	279253ffd0	utils/big_decimal: fix scale overflow when parsing values with large exponents The exponent of a big decimal string is parsed as an int32, adjusted for the removed fractional part, and stored as an int32. When parsing values like `1.23E-2147483647`, the unscaled value becomes `123`, and the scale is adjusted to `2147483647 + 2 = 2147483649`. This exceeds the int32 limit, and since the scale is stored as an int32, it overflows and wraps around, losing the value. This patch fixes that by parsing the exponent as an int64 value and then adjusting it for the fractional part. The adjusted scale is then checked to see if it is still within int32 limits before storing. An exception is thrown if it is not within the int32 limits. Note that strings with exponents that exceed the int32 range, like `0.01E2147483650`, were previously not parseable as a big decimal. They are now accepted if the final adjusted scale fits within int32 limits. For the above value, unscaled_value = 1 and scale = -2147483648, so it is now accepted. This is in line with how Java's `BigDecimal` parses strings. Fixes: #24581 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#24640	2025-06-26 15:29:28 +03:00
Szymon Malewski	f28bab741d	utils/exceptions.cc: Added check for `exceptions::request_timeout_exception` in `is_timeout_exception` function. It solves the issue where, in some cases, timeout exceptions in CAS operations are logged incorrectly as a general failure. Fixes #24591 Closes scylladb/scylladb#24619	2025-06-26 12:25:38 +02:00
Marcin Maliszkiewicz	45392ac29e	utils: don't allow to discard updateable_value observer If the object returned from observe() is destroyed, it stops observing, potentially causing subtle bugs. Typically, the observer object is retained as a class member.	2025-06-23 17:54:01 +02:00
Pavel Emelyanov	dc166be663	s3: Mark claimed_buffer constructor noexcept It just std::move-s a buffer and a semaphore_units object; both moves are noexcept, so the constructor is too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#24552	2025-06-18 20:36:45 +03:00
Pavel Emelyanov	b0766d1e73	Merge 's3_client: Refactor `range` class for state validation' from Ernest Zaslavsky Revamped the `range` class to actively manage its state by enforcing validation on all modifications. This prevents overflow and invalid states, and ensures the object size does not exceed the 5TiB limit in S3. This should address and prevent future problems related to this issue https://github.com/minio/minio/issues/21333 No backport needed since this problem is related only to this change https://github.com/scylladb/scylladb/pull/23880 Closes scylladb/scylladb#24312 * github.com:scylladb/scylladb: s3_client: headers cleanup s3_client: Refactor `range` class for state validation	2025-06-17 10:34:55 +03:00
Ernest Zaslavsky	e398576795	s3_client: Fix hang in get() on EOF by signaling condition variable * Ensure _get_cv.signal() is called when an empty buffer is received * Prevents `get()` from stalling indefinitely while waiting on EOF * Found when testing https://github.com/scylladb/scylladb/pull/23695 Closes scylladb/scylladb#24490	2025-06-17 10:33:19 +03:00
Calle Wilund	4a98c258f6	http: Add missing thread_local specifier for static Refs #24447 Patch adding this somehow managed to leave out the thread_local specifier. While gnutls cert object can be shared across shards just fine, the actual shared_ptr here cannot, thus we could cause memory errors. Closes scylladb/scylladb#24514	2025-06-17 10:23:52 +03:00
Ernest Zaslavsky	1b20e0be4a	s3_client: headers cleanup	2025-06-16 16:02:30 +03:00
Ernest Zaslavsky	9ad7a456fe	s3_client: Refactor `range` class for state validation Revamped the `range` class to actively manage its state by enforcing validation on all modifications. This prevents overflow, invalid states, and ensures the object size does not exceed the 5TiB limit in S3.	2025-06-16 16:02:24 +03:00
Ernest Zaslavsky	2b300c8eb9	s3_client: Improve reporting of S3 client statistics Revise how we report statistics for `chunked_download_source`. Ensure metrics for downloaded but unconsumed data are visible, as they do not contribute to read amplification, which is tracked separately. Closes scylladb/scylladb#24491	2025-06-16 09:33:57 +03:00
Ernest Zaslavsky	30199552ac	s3_client: Mitigate connection exhaustion in `download_source` The existing `download_source` implementation optimizes performance by keeping the connection to S3 open and draining data directly from the socket. While this eliminates the overhead (60-100ms) of repeatedly establishing new connections, it leads to rapid exhaustion of client-side connections. On a single shard, two `mx_readers` for load and stream are enough to trigger this issue. Since each client typically holds two connections, readers keeping index and data sources open can cause deadlocks where processes stall due to unavailable connections. Introduce `chunked_download_source`, a new S3 download method built on `download_source`, to dynamically manage connections: - Buffers data in 5MiB chunks using a producer-consumer model - Closes connections once buffers reach capacity, returning them to the pool for other clients - Uses a filling fiber that resumes fetching once buffers are consumed from the queue Performance remains comparable to `download_source`, achieving 95MiB/s for sequential 1GiB downloads from S3. However, preloading large chunks may cause read amplification. Fixes: https://github.com/scylladb/scylladb/issues/23785 Closes scylladb/scylladb#23880	2025-06-10 12:58:24 +03:00
Calle Wilund	80feb8b676	utils::http::dns_connection_factory: Use a shared certificate_credentials Fixes #24447 This factory type, which is really more a data holder/connection producer per connection instance, creates, if using https, a new certificate_credentials on every instance. Which when used by S3 client is per client and scheduling groups. Which eventually means that we will do a set_system_trust + "cold" handshake for every tls connection created this way. This will cause both IO and cold/expensive certificate checking -> possible stalls/wasted CPU. Since the credentials object in question is literally a "just trust system", it could very well be shared across the shard. This PR adds a thread local static cached credentials object and uses this instead. Could consider moving this to seastar, but maybe this is too much. Closes scylladb/scylladb#24448	2025-06-10 11:20:21 +03:00
Benny Halevy	8b387109fc	disk_space_monitor: add space_source_registration Register the current space_source_fn in an RAII object that resets monitor._space_source to the previous function when the RAII object is destroyed. Use space_source_registration in database_test::mutation_dump_generated_schema_deterministic_id_version to prevent use-after-stack-return in the test. Fixes #24314 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#24342	2025-06-04 16:25:24 +03:00
Calle Wilund	942477ecd9	encryption/utils: Move encryption httpclient to "general" REST client Fixed #24296 While the HTTP client used for REST calls in AWS/GCP KMS integration (EAR) is not general enough to be called a HTTP client as such, it is general enough to be called a REST client (limited to stateless, single-op REST calls). Other code, like general auth integrations (hello Azure) and similar could reuse this to lessen code duplication. This patch simply moves the httpclient class from encryption to "rest" namespace, and explicitly "limits" it to such usage. Making an alias in encryption to avoid touching more files than needed. Closes scylladb/scylladb#24297	2025-05-30 12:21:51 +03:00
Avi Kivity	f0ec9dd8f2	Merge 'utils/logalloc: enforce the max contiguous allocation size limit' from Michał Chojnowski This series fixes the only known violation of logalloc's allocation size limits (in `chunked_managed_vector`), and then it makes those limits hard. Before the series, LSA handles overly-large allocations by forwarding them to the standard allocator. After the series, an attempt to do an overly large allocation via LSA will trigger an `on_internal_error` instead. We do this because the allocator fallback logic turned out to have subtle and problematic accounting bugs. We could fix them, or we can remove the mechanism altogether. It's hard to say which choice is better. This PR arbitrarily makes the choice to remove the mechanism. This makes the logic simpler, at the risk of escalating some allocation size bugs to crashes. See the descriptions of individual commits for more details. Fixes scylladb/scylladb#23850 Fixes scylladb/scylladb#23851 Fixes scylladb/scylladb#23854 I'm not sure if any of this should be backported or not. The `chunked_managed_vector` fix could be backported, because it's a bugfix. It's an old bug, though, and we have never observed problems related to it. The changes to `logalloc` aren't supposed to be fixing any observable problem, so a backport probably has more risk than benefit in this case. Closes scylladb/scylladb#23944 * github.com:scylladb/scylladb: utils/logalloc: enforce LSA allocation size limits utils/lsa/chunked_managed_vector: fix the calculation of max_chunk_capacity()	2025-05-29 22:11:41 +03:00
Michał Chojnowski	cb02d47b10	utils/logalloc: enforce LSA allocation size limits In order to guarantee a decent upper limit on fragmentation, LSA only handles allocations smaller than 0.1 of a segment. Allocations larger than this limit are permitted, but they are not placed in LSA segments. Instead, they are forwarded to the standard allocator. We don't really have any use case for this "fallback". As far as I can tell, it only exists for "historical" reasons, from times where there were some data structures which weren't fully adapted to LSA yet. We don't want the fallback to be used. Long-lived standard allocations are undesirable. They have higher internal fragmentation than LSA allocations, and they can cause external fragmentation in the standard allocator. So we want to eliminate them all. The only reason to keep the fallback is to soften the impact if some bug results in limit-exceeding LSA allocations happening in production. In principle, the fallback turns a crash (or something similarly drastic) into just a performance problem. However, it turns out that the fallback is buggy. Recently we had a bug which caused limit-exceeding LSA allocations to happen. And then it turned out that LSA reclaim doesn't deal fully correctly with evictable non-LSA allocations, and the dirty_memory_manager accounting for non-LSA allocations is completely wrong. This resulted in subtle, serious, and hard to understand stability problems in production. Arguably the biggest problem is that the "fallback" allocations weren't reported in any way. They were happening in some tests, but they were silently permitted, so nobody noticed that they should be eliminated. If we just had a rate-limited error log that reports fallback allocations, they would have never got into a release. So maybe we could fix the fallback, add more tests for it, add a warning for when it's used, and keep it. But this PR instead opts for removing the fallback mechanism altogether and failing fast.
After the patch, if a non-conforming allocation happens, it will trigger an `on_internal_error`. With this, we risk a greater impact if some non-conforming allocations happen in production, but we make the system simpler. It's hard to say if it's a good tradeoff.	2025-05-29 13:05:08 +02:00
Michał Chojnowski	185a032044	utils/stream_compressor: allocate memory for zstd compressors externally The default and recommended way to use zstd compressors is to let zstd allocate and free memory for compressors on its own. That's what we did for zstd compressors used in RPC compression. But it turns out that it generates allocation patterns we dislike. We expected zstd not to generate allocations after the context object is initialized, but it turns out that it tries to downsize the context sometimes (by reallocation). We don't want that because the allocations generated by zstd are large (1 MiB with the parameters we use), so repeating them periodically stresses the reclaimer. We can avoid this by using the "static context" API of zstd, in which the memory for context is allocated manually by the user of the library. In this mode, zstd doesn't allocate anything on its own. The implementation details of this patch add a consideration for forward compatibility: later versions of Scylla can't use a window size greater than the one we hardcoded in this patch when talking to the old version of the decompressor. (This is not a problem, since those compressors are only used for RPC compression at the moment, where cross-version communication can be prevented by bumping COMPRESSOR_NAME. But it's something that the developer who changes the window size must _remember_ to do). Fixes #24160 Fixes #24183 Closes scylladb/scylladb#24161	2025-05-27 12:43:11 +03:00
Avi Kivity	13a75ff835	utils: chunked_vector: add swap() method Following std::vector(), we implement swap(). It's a simple matter of swapping all the contents. A unit test is added.	2025-05-14 16:19:40 +03:00
Avi Kivity	24e0d17def	utils: chunked_vector: add range insert() overloads Inserts an iterator range at some position. Again we insert the range at the end and use std::rotate() to move the newly inserted elements into place, forgoing possible optimizations. Unit tests are added.	2025-05-14 16:19:40 +03:00
Avi Kivity	9425a3c242	utils: chunked_vector: relax static_assert chunked_vector is only implemented for types with a non-throwing move constructor; this greatly simplifies the implementation. We have a static_assert to enforce it (should really be a constraint, but chunked_vector predates C++ concepts). This static_assert prevents forward declarations from compiling: class forward_declared; using a = utils::chunked_vector<forward_declared>; `a` won't compile since the static_assert will be instantiated and will fail since forward_declared is an incomplete type. Using a constraint has the same problem. Fix by moving the static_assert to the destructor. The destructor won't be instantiated by the forward declaration, so it won't trigger. It will trigger when someone destroys the vector; at this point the types are no longer forward declared.	2025-05-14 16:19:40 +03:00
Avi Kivity	d6eefce145	utils: chunked_vector: implement erase() for single elements and ranges Implement using std::rotate() and resize(). The elements to be erased are rotated to the end, then resized out of existence. Again we defer optimization for trivially copyable types. Unit tests are added. Needed for range_streamer with token_ranges using chunked_vector.	2025-05-14 16:19:37 +03:00
Avi Kivity	5301f3d0b5	utils: chunked_vector: implement insert() for single-element inserts partition_range_compat's unwrap() needs insert if we are to use it for chunked_vector (which we do). Implement using push_back() and std::rotate(). emplace(iterator, args) is also implemented, though the benefit is diluted (it will be moved after construction). The implementation isn't optimal - if T is trivially copyable then using std::memmove() will be much faster than std::rotate(), but this complex optimization is left for later. Unit tests are added.	2025-05-14 14:54:59 +03:00
Michał Chojnowski	c47f438db3	logalloc: make background_reclaimer::free_memory_threshold publicly visible Wanted by the change to the background_reclaim test in the next patch.	2025-05-06 18:59:18 +02:00
Pavel Emelyanov	b56d6fbb84	Merge 'sstables: Fix quadratic space complexity in partitioned_sstable_set' from Raphael Raph Carvalho Interval map is very susceptible to quadratic space behavior when it's flooded with many entries overlapping all (or most of) intervals, since each such entry will have presence on all intervals it overlaps with. A trigger we observed was memtable flush storm, which creates many small "L0" sstables that span roughly the entire token range. Since we cannot rely on insertion order, the solution is to store sstables with such wide ranges in a vector (unleveled). There should be no consequence for single-key reads, since upper layer applies an additional filtering based on token of key being queried. And for range scans, there can be an increase in memory usage, but not significant because the sstables span a wide range and would have been selected in the combined reader if the range of scan overlaps with them. Anyway, this is a protection against storm of memtable flushes and shouldn't be the common scenario. It works both with tablets and vnodes, by adjusting the token range spanned by compaction group accordingly. Fixes #23634. We can backport this into 2024.2, 2025.1, but we should let this cook in master for 1 month or so. Closes scylladb/scylladb#23806 * github.com:scylladb/scylladb: test: Verify partitioned set store split and unsplit correctly sstables: Fix quadratic space complexity in partitioned_sstable_set compaction: Wire table_state into make_sstable_set() compaction: Introduce token_range() to table_state dht: Add overlap_ratio() for token range	2025-05-05 11:28:38 +03:00
Piotr Dulikowski	8ffe4b0308	utils::loading_cache: gracefully skip timer if gate closed The loading_cache has a periodic timer which acquires the _timer_reads_gate. The stop() method first closes the gate and then cancels the timer - this order is necessary because the timer is re-armed under the gate. However, the timer callback does not check whether the gate was closed but tries to acquire it, which might result in unhandled exception which is logged with ERROR severity. Fix the timer callback by acquiring access to the gate at the beginning and gracefully returning if the gate is closed. Even though the gate used to be entered in the middle of the callback, it does not make sense to execute the timer's logic at all if the cache is being stopped. Fixes: scylladb/scylladb#23951 Closes scylladb/scylladb#23952	2025-04-30 16:43:22 +03:00
Raphael S. Carvalho	d5bee4c814	test: Verify partitioned set store split and unsplit correctly Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-04-29 15:47:33 -03:00
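
The chunked_vector commits above (5301f3d0b5 and d6eefce145) describe insert() as push_back() followed by std::rotate(), and erase() as std::rotate() followed by resize(). The following is a minimal sketch of that technique, using std::vector as a stand-in for utils::chunked_vector. The function names `insert_by_rotate` and `erase_by_rotate` are hypothetical and do not come from the ScyllaDB source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative only: std::vector stands in for utils::chunked_vector,
// and both function names are assumptions made for this sketch.

// Insert `value` at index `pos`: append it, then rotate the tail
// [pos, end) so that the new last element moves into position `pos`.
template <typename T>
void insert_by_rotate(std::vector<T>& v, std::size_t pos, T value) {
    assert(pos <= v.size());
    v.push_back(std::move(value));
    std::rotate(v.begin() + pos, v.end() - 1, v.end());
}

// Erase the index range [first, last): rotate the doomed elements to
// the end of the container, then shrink the container to drop them.
template <typename T>
void erase_by_rotate(std::vector<T>& v, std::size_t first, std::size_t last) {
    assert(first <= last && last <= v.size());
    std::rotate(v.begin() + first, v.begin() + last, v.end());
    v.resize(v.size() - (last - first));
}
```

Both operations move every element after the affected position, so they are linear in the tail length. As the commit message notes, a std::memmove() path for trivially copyable types would be faster and is not shown here.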

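
The base64url commit above (57bc51342e) notes that base64url can be derived from base64 by replacing '+' and '/' with '-' and '_', and that many implementations also strip the padding. A rough sketch of that conversion follows, assuming the input is already valid base64 text. The names `base64_to_base64url` and `base64url_to_base64` are hypothetical and are not the `base64url_{encode,decode}` helpers the commit introduces.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Illustrative only: these helpers are assumptions for this sketch and
// operate on already-encoded base64 text, not on raw bytes.

// Convert standard base64 text to unpadded base64url text.
std::string base64_to_base64url(std::string s) {
    std::replace(s.begin(), s.end(), '+', '-');
    std::replace(s.begin(), s.end(), '/', '_');
    // Padding is optional in base64url; this sketch strips it.
    while (!s.empty() && s.back() == '=') {
        s.pop_back();
    }
    return s;
}

// Convert base64url text (padded or not) back to padded base64 text.
std::string base64url_to_base64(std::string s) {
    std::replace(s.begin(), s.end(), '-', '+');
    std::replace(s.begin(), s.end(), '_', '/');
    // A valid base64 length is never 1 modulo 4.
    assert(s.size() % 4 != 1);
    while (s.size() % 4 != 0) {
        s.push_back('=');
    }
    return s;
}
```

Stripping the padding is a choice made in this sketch; RFC 4648 section 5 allows base64url to be used with or without it.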

1967 Commits