scylladb

Author	SHA1	Message	Date
Avi Kivity	76cf5148e1	Merge 'message: introduce advanced rpc compression' from Michał Chojnowski This is a forward port (from scylla-enterprise) of additional compression options (zstd, dictionaries shared across messages) for inter-node network traffic. It works as follows: After the patch, messaging_service (Scylla's interface for all inter-node communication) compresses its network traffic with compressors managed by the new advanced_rpc_compression::tracker. Those compressors compress with lz4, but can also be configured to use zstd as long as a CPU usage limit isn't crossed. A precomputed compression dictionary can be fed to the tracker. Each connection handled by the tracker will then start a negotiation with the other end to switch to this dictionary, and when it succeeds, the connection will start being compressed using that dictionary. All traffic going through the tracker is passed as a single merged "stream" through dict_sampler. dictionary_service has access to the dict_sampler. On chosen nodes (in the "usual" configuration: the Raft leader), it uses the sampler to maintain a random multi-megabyte sample of the sampler's stream. Every several minutes, it copies the sample, trains a compression dictionary on it (by calling zstd's training library via the alien_worker thread) and publishes the new dictionary to system.dicts via Raft's write_mutation command. This update triggers (eventually) a callback on all nodes, which feeds the new dictionary to advanced_rpc_compression::tracker, and this switches (eventually) all inter-node connections to this dictionary. 
Closes scylladb/scylladb#22032 * github.com:scylladb/scylladb: messaging_service: use advanced_rpc_compression::tracker for compression message/dictionary_service: introduce dictionary_service service: make Raft group 0 aware of system.dicts db/system_keyspace: add system.dicts utils: add advanced_rpc_compressor utils: add dict_trainer utils: introduce reservoir_sampling utils: introduce alien_worker utils: add stream_compressor	2024-12-31 15:02:57 +02:00
Benny Halevy	4af522f61e	utils: small_vector: expose internal_capacity() So we can use it for defining other small_vector deriving their internal capacity from another small_vector type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-12-24 12:19:20 +02:00
Michał Chojnowski	0fd1050784	utils: add advanced_rpc_compressor Adds glue needed to pass lz4 and zstd with streaming and/or dictionaries as the network traffic compressors for Seastar's RPC servers. The main jobs of this glue are: 1. Implementing the API expected by Seastar from RPC compressors. 2. Expose metrics about the effectiveness of the compression. 3. Allow dynamically switching algorithms and dictionaries on a running connection, without any extra waits. The biggest design decision here is that the choice of algorithm and dictionary is negotiated by both sides of the connection, not dictated unilaterally by the sender. The negotiation algorithm is fairly complicated (a TLA+ model validating it is included in the commit). Unilateral compression choice would be much simpler. However, negotiation avoids re-sending the same dictionary over every connection in the cluster after dictionary updates (with one-way communication, it's the only reliable way to ensure that our receiver possesses the dictionary we are about to start using), lets receivers ask for a cheaper compression mode if they want, and lets them refuse to update a dictionary if they don't think they have enough free memory for that. In hindsight, those properties probably weren't worth the extra complexity and extra development effort. Zstd can be quite expensive, so this patch also includes a mechanism which temporarily downgrades the compressor from zstd to lz4 if zstd has been using too much CPU in a given slice of time. But it should be noted that this can't be treated as a reliable "protection" from negative performance effects of zstd, since a downgrade can happen on the sender side, and receivers are at the mercy of senders.	2024-12-23 23:37:02 +01:00
Michał Chojnowski	5294762ac7	utils: add dict_trainer	2024-12-23 23:37:02 +01:00
Michał Chojnowski	9de52b1c98	utils: introduce reservoir_sampling We are planning to improve some usages of compression in Scylla (in which we compress small blocks of data) by pre-training compression dictionaries on similar data seen so far. For example, many RPC messages have similar structure (and likely similar data), so the similarity could be exploited for better compression. This can be achieved e.g. by training a dictionary on the RPC traffic, and compressing subsequent RPC messages against that dictionary. To work well, the training should be fed a representative sample of the compressible data. Such a sample can be approached by taking a random subset (of some given reasonable size) of the data, with uniform probability. For our purposes, we need an online algorithm for this -- one which can select the random k-subset from a stream of arbitrary size (e.g. all RPC traffic over an hour), while requiring only the necessary minimum of memory. This is a known problem, called "reservoir sampling". This PR introduces `reservoir_sampler`, which implements an optimal algorithm for reservoir sampling. Additionally, it introduces `page_sampler` -- a wrapper for `reservoir_sampler`, which uses it to select a random sample of pages from a stream of bytes.	2024-12-23 23:37:02 +01:00
Michał Chojnowski	d301c29af5	utils: introduce alien_worker Introduces a util which launches a new OS thread and accepts callables for concurrent execution. Meant to be created once at startup and used until shutdown, for running nonpreemptible, 3rd party, non-interactive code. Note: this new utility is almost identical to wasm::alien_thread_runner. Maybe we should unify them.	2024-12-23 23:37:02 +01:00
Michał Chojnowski	866326efe4	utils: add stream_compressor Adds utilities for "advanced" methods of compression with lz4 and zstd -- with streaming (a history buffer persisted across messages) and/or precomputed dictionaries. This patch is mostly just glue needed to use the underlying libraries with discontiguous input and output buffers, and for reusing the same compressor context objects across messages. It doesn't contain any innovations of its own. There is one "design decision" in the patch. The block format of LZ4 doesn't contain the length of the compressed blocks. At decompression time, that length must be delivered to the decompressor by a channel separate to the compressed block itself. In `lz4_cstream`, we deal with that by prepending a variable-length integer containing the compressed size to each compressed block. This is suboptimal for single-fragment messages, since the user of lz4_cstream is likely going to remember the length of the whole message anyway, which makes the length prepended to the block redundant. But a loss of 1 byte is probably acceptable for most uses.	2024-12-23 23:28:12 +01:00
Pavel Emelyanov	bb094cc099	Merge 'Make restore task abortable' from Calle Wilund Fixes #20717 Enables abortable interface and propagates abort_source to all s3 objects used for reading the restore data. Note: because restore is done on each shard, we have to maintain a per-shard abort source proxy for each, and do a background per-shard abort on abort call. This is synced at the end of "run()". Abort source is added as an optional parameter to s3 storage and the s3 path in distributed loader. There is no attempt to "clean up" an aborted restore. As we read on a mutation level from remote sstables, we should not cause incomplete sstables as such, even though we might end up of course with partial data restored. Closes scylladb/scylladb#21567 * github.com:scylladb/scylladb: test_backup: Add restore abort test case sstables_loader: Make restore task abortable distributed_loader: Add optional abort_source to get_sstables_from_object_store s3_storage: Add optional abort_source to params/object s3::client: Make "readable_file" abortable	2024-12-19 12:23:33 +03:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Kefu Chai	48c8d24345	treewide: drop support for fmt < v10 Since Fedora 38 is EOL and Fedora 39 comes with fmt v10.0.0, and since we've switched to a build image based on Fedora 40, which ships fmt-devel v10.2.1, there is no need to support fmt < 10. In this change, we drop support for fmt < 10. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21847	2024-12-09 20:42:38 +02:00
Botond Dénes	b6a9c79af3	utils/big_decimal: add fast paths to operator <=> Currently, the tri-compare operator for big_decimal (operator <=>), uses a precise but potentially very expensive algorithm for comparing the numbers: it first brings them to the same scale, then compares the normalized unscaled values. big_decimal has arbitrary precision, therefore the stored numbers can be arbitrarily large. In extreme cases, comparing two numbers can result in a huge amount of memory being allocated and in stalls. If this type is used in the primary key of a table, these comparisons can make the node completely unresponsive. This patch adds the following fast paths to operator <=>: * An early return for the case of equal scales. * An early return for different signs. * An early return for the case where one or both of the numbers are 0. * A fast algorithm for detecting the case where there is a big difference between the two numbers. This algorithm works only with the scales and is able to compare the two numbers by using only one division and some additions and subtractions. This algorithm is imprecise, and when the numbers are closer than its confidence window, it will fall back to the current slow but precise tri-compare. All but the last case should have been fast before as well, but the scale-compare algorithm makes a huge difference. Numbers which would previously make the node unresponsive now compare in constant time. Fixes: scylladb/scylladb#21716 Closes scylladb/scylladb#21715	2024-12-03 14:56:51 +02:00
Calle Wilund	af4dd1f2cb	s3::client: Make "readable_file" abortable Adds optional abortable source to "readable_file" interface. Note: the abortable aspect is not preserved across a "dup()" call; however, since these objects are generally not used in a cross-shard fashion, it should be ok.	2024-12-02 12:30:24 +00:00
Pavel Emelyanov	4d10cd40f0	s3: Remove unused boost/algorithm/string/classification.hpp inclusion Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21690	2024-11-26 19:51:21 +02:00
Avi Kivity	29497f8c5d	Merge 'Automatically compute schema version of system tables' from Tomasz Grabiec Schema of system tables is defined statically and table_schema_version needs to be explicitly set in code like this: ``` builder.with_version(system_keyspace::generate_schema_version(table_id, version_offset)); ``` Whenever schema is changed, the schema version needs to change, otherwise we hit undefined behavior when trying to interpret mutation data created with the old schema using the new schema. It's not obvious that one needs to do that and developers often forget to do that. There were several instances of mistakes of omission, some caught during review, some not, e.g.: `31ea74b96e`. This patch changes definitions to call the new `schema_builder::with_hash_version()`, which will make the schema builder compute version from schema definition so that changes of the schema will automatically change the version. This way we no longer rely on the developer to remember to bump the version offset. All nodes should arrive at the same version, which is verified by existing `test_group0_schema_versioning` and a new unit test: `test_system_schema_version_is_stable`. Closes scylladb/scylladb#21602 * github.com:scylladb/scylladb: system_tables: Compute schema version automatically schema_builder: Introduce with_hash_version() schema: Store raw_view_info in schema::raw_schema schema: Remove dead comment hashing: Add hasher for unordered_map hashing: Add hasher for unique_ptr hashing: Add hasher for double [avi: add missing include <memory> to hashing.hh]	2024-11-24 18:44:32 +02:00
Nadav Har'El	e639434a89	change remaining sstring_view to std::string_view Our "sstring_view" is an historic alias for the standard std::string_view. The patch changes the last remaining random uses of this old alias across our source directory to the standard type name. After this patch, there are no more uses of the "sstring_view" alias. It will be removed in the following patch. Refs #4062. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-11-18 16:48:57 +02:00
Tomasz Grabiec	5dbbbf6300	hashing: Add hasher for unordered_map	2024-11-15 19:16:40 +01:00
Tomasz Grabiec	8209b301a3	hashing: Add hasher for unique_ptr	2024-11-15 19:16:40 +01:00
Tomasz Grabiec	a2c3b9a038	hashing: Add hasher for double	2024-11-15 19:16:40 +01:00
Kefu Chai	00810e6a01	treewide: include seastar/core/format.hh instead of seastar/core/print.hh The latter includes the former and in addition to `seastar::format()`, `print.hh` also provides helpers like `seastar::fprint()` and `seastar::print()`, which are deprecated and not used by scylladb. Previously, we included `seastar/core/print.hh` for using `seastar::format()`. and in seastar 5b04939e, we extracted `seastar::format()` into `seastar/core/format.hh`. this allows us to include a much smaller header. In this change, we just include `seastar/core/format.hh` in place of `seastar/core/print.hh`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21574	2024-11-14 17:45:07 +02:00
Tomasz Grabiec	1d0c6aa26f	utils: UUID: Make get_time_UUID() respect the clock offset schema_change_test currently fails due to failure to start a cql test env in unit tests after the point where this is called (in one of the test cases): forward_jump_clocks(std::chrono::seconds(60*60*24*31)); The problem manifests with a failure to join the cluster due to missing_column exception ("missing_column: done") being thrown from system_keyspace::get_topology_request_state(). It's a symptom of join request being missing in system.topology_requests. It's missing because the row is expired. When request is created, we insert the mutations with intended TTL of 1 month. The actual TTL value is computed like this: ttl_opt topology_request_tracking_mutation_builder::ttl() const { return std::chrono::duration_cast<std::chrono::seconds>(std::chrono::microseconds(_ts)) + std::chrono::months(1) - std::chrono::duration_cast<std::chrono::seconds>(gc_clock::now().time_since_epoch()); } _ts comes from the request_id, which is supposed to be a timeuuid set from current time when request starts. It's set using utils::UUID_gen::get_time_UUID(). It reads the system clock without adding the clock offset, so after forward_jump_clocks(), _ts and gc_clock::now() may be far off. In some cases the accumulated offset is larger than 1 month and the ttl becomes negative, causing the request row to expire immediately and failing the boot sequence. The fix is to use db_clock, which respects offsets and is consistent with gc_clock. The test doesn't fail in CI because there each test case runs in a separate process, so there is no bootstrap attempt (by new cql test env) after forward_jump_clocks(). Closes scylladb/scylladb#21558	2024-11-14 10:32:07 +02:00
Pavel Emelyanov	57af69e15f	Merge 'Add retries to the S3 client' from Ernest Zaslavsky 1. Add `retry_strategy` interface and default implementation for exponential back-off retry strategy. 2. Add new S3 related errors, also introduce additional errors to describe pure http errors that have no additional information in the body. 3. Add retries to the s3 client, all retries are coordinated by an instance of `retry_strategy`. In case of an error, also parse the response body in an attempt to retrieve additional and more focused error information as suggested by AWS. See https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html. Introduce `aws_exception` to carry the original `aws_error`. 4. Discard whatever exception is thrown in `abort_upload` when aborting multipart upload since we don't care about cleanly aborting it since there are other means to clean up dangling parts, for example `rclone cleanup` or S3 bucket's Lifecycle Management Policy. 5. Add tests to cover retries, and retry exhaustion. Also add tests for jumbo upload. 6. Add the S3 proxy which is used to randomly inject retryable S3 errors to test the "retry" part of the S3 client. Switch the `s3_test` to use the S3 proxy. `s3_tests` surfaced a `put_object` problem that was causing a segmentation fault when retrying; it is now fixed. 7. Extend the `s3_test` to use both `minio` and `proxy` configurations. 8. Add parameter to the proxy to seed the error injection randomization to make it replayable. fixes: #20611 fixes: #20613 Closes scylladb/scylladb#21054 * github.com:scylladb/scylladb: aws_errors: Make error messages more verbose. 
test: Make the minio proxy randomization re-playable test/boost/s3_test: add error injection scenarios to existing test suite test: Switch `s3_test` to use proxy test: Add more tests client: Stop returning error on `DELETE` in multipart upload abortion client: Fix sigsegv when retrying client: Add retries client: Adjust `map_s3_client_exception` to return exception instance aws_errors: Change aws_error::parse to return std::optional<> aws_errors: Add http errors mapping into aws_error client: Add aws_exception mapping aws_error: Add `aws_exeption` to carry original `aws_error` aws_errors: Add new error codes client: Introduce retry strategy	2024-11-11 08:35:55 +03:00
Kefu Chai	aebb532906	bytes, utils: include fmt/iostream.h and iostream when appropriate in seastar e96932b05f394b27cd0101e24f0584736795b50f, we stopped including unused `fmt/ostream.h`. this helped to reduce the header dependency. but this also broke the build of scylladb, as we rely on the `fmt/ostream.h` indirectly included by seastar's header project. in this change, we include `fmt/iostream.h` and `iostream` explicitly when we are using the declarations in them. this enables us to - bump up the seastar submodule - potentially reduce the header dependency as we will be able to include seastar/core/format.hh instead of a more bloated seastar/core/print.hh after bumping up seastar submodule Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21494	2024-11-08 16:43:25 +03:00
Ernest Zaslavsky	029837a4a1	aws_errors: Make error messages more verbose. Add more information to the error messages to make the failure reason clearer. Also add tests to check exceptions propagated from s3 client failure.	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	7fd1ff8d79	client: Stop returning error on `DELETE` in multipart upload abortion Discard whatever exception is thrown in `abort_upload` when aborting multipart upload since we don't care about cleanly aborting it since there are other means to clean up dangling parts, for example `rclone cleanup` or S3 bucket's Lifecycle Management Policy	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	064a239180	client: Fix sigsegv when retrying Stop moving the `file` into the `make_file_input_stream` since it will try to use it again on retry	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	dc6e4c0d97	client: Add retries Add retries to the s3 client, all retries are coordinated by an instance of `retry_strategy`. In case of an error, also parse the response body in an attempt to retrieve additional and more focused error information as suggested by AWS. See https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html. Also move the expected http status check to the `make_s3_error_handler` since the http::client::make_request call is done with `nullopt` - we want to manage all the aws errors handling in s3 client to prevent the http client from validating it and failing before we have a chance to analyze the error properly	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	244635ebd8	client: Adjust `map_s3_client_exception` to return exception instance "Unfuturize" the `map_s3_client_exception` since the retryable client is going to be implemented using coroutines and no `future` is needed here, just to save unnecessary `co_await` on it	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	bd3d4ed417	aws_errors: Change aws_error::parse to return std::optional<> Change aws_error::parse to return std::optional<> to signify that no error was found in the response body	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	58decef509	aws_errors: Add http errors mapping into aws_error Add http errors mapping into aws_error since the retry strategy is going to operate on aws_error and should not be aware of HTTP status codes	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	fa9e8b7ed0	client: Add aws_exception mapping Map aws_exceptions in `map_s3_client_exception`, will be needed in retryable client calls to remap newly added AWS errors to `storage_io_error`	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	54e250a6f1	aws_error: Add `aws_exeption` to carry original `aws_error` Add `aws_exeption` to carry original `aws_error` for proper error handling in retryable s3 client	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	e6ff34046f	aws_errors: Add new error codes Add new S3 related errors, also introduce additional errors to describe pure http errors that have no additional information in the body	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	8dbe351888	client: Introduce retry strategy Add `retry_strategy` interface and default implementation for exponential back-off retry strategy	2024-11-07 21:01:25 +02:00
Avi Kivity	9e67649fe5	utils: loading_cache: tighten clock sampling Sample the clock once to avoid the filter returning different results. Range algorithms may use multiple passes, so it's better to return consistent results. Closes scylladb/scylladb#21400	2024-11-07 10:28:01 +03:00
Pavel Emelyanov	49949092ad	Merge 'Make s3 client ops use abort source + use in backup task' from Calle Wilund Fixes #20716 Adds optional abort_source to all s3 client operations. If provided, will propagate to actual HTTP client and allow for aborting actual net op. Note: this uses an abort source per call, not a client-local one. This is for two reasons: 1.) The usage pattern of the client object is to create it outside the eventual owning object (task) that hosts the relevant abort source 2.) It is quite possible to want to have different/no abort source for some operation usage. Also adds forward usage of task abort_source in backup tasks upload s3 call, making it more readily abort-able. Closes scylladb/scylladb#21431 * github.com:scylladb/scylladb: backup_task: Use task abort source in s3 client call s3::client: Make operations (individually) abortable	2024-11-07 10:03:25 +03:00
Kefu Chai	6efde20939	utils/to_string: do not include fmt/ostream.h to_string.hh does not use this header, neither is it obliged to expose the content of this header. so, let's remove this include. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21440	2024-11-06 17:21:29 +03:00
Calle Wilund	3321820c67	s3::client: Make operations (individually) abortable Refs #20716 Adds optional abort_source to all s3 client operations. If provided, will propagate to actual HTTP client and allow for aborting actual net op. Note: this uses an abort source per call, not a client-local one. This is for two reasons: 1.) The usage pattern of the client object is to create it outside the eventual owning object (task) that hosts the relevant abort source 2.) It is quite possible to want to have different/no abort source for some operation usage.	2024-11-05 14:23:24 +00:00
Pavel Emelyanov	440c1e3e3f	error_injection: Remove unused inject(sleep, then invoke) overload The overload was introduced by `a8b14b0227` (utils: add timeout error injection with lambda), but is only used by the test nowadays. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21377	2024-11-05 09:56:08 +02:00
Pavel Emelyanov	292fd52a60	Merge 'utils: chunked_vector: various constructor improvements' from Avi Kivity Optimize the various constructors a little, and add an std::from_range_t constructor. Minor improvement, so no backports. Closes scylladb/scylladb#21399 * github.com:scylladb/scylladb: utils: chunked_vector: add from_range_t constructor utils: chunked_vector: optimize initializer_list constructor utils: chunked_vector: iterator constructor: copy spanwise utils: chunked_vector: reserve for forward iterators, not just random access iterators, on construction	2024-11-01 15:02:56 +03:00
Botond Dénes	0ee0dd3ef4	Merge 'Collect and report backup progress' from Pavel Emelyanov Task manager GET /status method returns two counters that reflect task progress -- total and completed. To make caller reason about their meaning, additionally there's progress_units field next to those counters. This patch implements this progress report for backup task. The units are bytes, the total counter is total size of files that are being uploaded, and the completed counter is total amount of bytes successfully sent with PUT requests. To get the counters, the client::upload_file() is extended to calculate those. fixes #20653 Closes scylladb/scylladb#21144 * github.com:scylladb/scylladb: backup_task: Report uploading progress s3/client: Account upload progress for real s3/client: Introduce upload_progress s3: Extract client_fwd.hh	2024-11-01 10:57:12 +02:00
Avi Kivity	6a9852d47b	utils: chunked_vector: add from_range_t constructor std::ranges::to<> has a little protocol with containers. Implement it to get optimized construction. Similar to the iterator pair constructor, if the range's size can be obtained (even with an O(N) algorithm), favor that to avoid reallocations. Copy elements spanwise to promote optimization to memcpy when possible.	2024-10-31 19:32:16 +02:00
Avi Kivity	b2769403d2	utils: chunked_vector: optimize initializer_list constructor Delegate to the previously optimized iterator-pair constructor.	2024-10-31 18:10:14 +02:00
Avi Kivity	0a81be4321	utils: chunked_vector: iterator constructor: copy spanwise Instead of copying element-by-element, copy contiguous spans. This is much faster if the input is a span and the constructor is trivial, since the whole thing translates to a memcpy. Make the two branches constexpr to reduce work for the compiler in optimizing the other branch away.	2024-10-31 18:10:08 +02:00
Avi Kivity	4653430c8e	utils: chunked_vector: reserve for forward iterators, not just random access iterators, on construction For a forward iterator, prefer a two pass algorithm to first count the number of elements, reserve, then copy the elements, to a single pass algorithm that involves reallocation and copying.	2024-10-31 17:55:42 +02:00
Pavel Emelyanov	c1432f3657	error_injection: Add inject() overload with wait_for_message wrapper The wrapper object denotes that injection should run a handler and wait_for_message() on it. Wrapper carries the timeout used to call the mentioned method. It's currently unused, next patches will start using it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-30 16:53:33 +03:00
Avi Kivity	020ccbd76a	Merge 'utils: cached_file: Mark permit as awaiting on page miss' from Tomasz Grabiec Otherwise, the read will be considered as on-cpu during promoted index search, which will severely underutilize the disk because by default on-cpu concurrency is 1. I verified this patch on the worst case scenario, where the workload reads missing rows from a large partition. So partition index is cached (no IO) and there is no data file IO (relies on https://github.com/scylladb/scylladb/pull/20522). But there is IO during promoted index search (via cached_file). Before the patch this workload was doing 4k req/s, after the patch it does 30k req/s. The problem is much less pronounced if there is data file or partition index IO involved because that IO will signal read concurrency semaphore to invite more concurrency. Fixes #21325 Closes scylladb/scylladb#21323 * github.com:scylladb/scylladb: utils: cached_file: Mark permit as awaiting on page miss utils: cached_file: Push resource_unit management down to cached_file	2024-10-29 16:15:21 +02:00
Pavel Emelyanov	2efcfc13e8	s3/client: Account upload progress for real Before upload starts file size is checked, so this is the place that updates progress.total counter. Uploading a file happens by reading unit_size bytes from file input stream and writing the buffer into http body writer stream. This is the place to update progress.uploaded counter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-29 08:38:39 +03:00
Pavel Emelyanov	51e03b1025	s3/client: Introduce upload_progress This is a structure with "total" and "uploaded" counters that's passed by user to client::upload_file() method so that client would update it with the progress. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-29 08:38:39 +03:00
Pavel Emelyanov	f9a5e02b53	s3: Extract client_fwd.hh This is to export some simple structures to users without the need to include client.hh itself (rather large already) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-29 08:38:39 +03:00
Pavel Emelyanov	b09bb6bc19	error_injection: Re-use enter() code in inject() overloads Most of inject() overloads check if the injection is enabled, then optionally clear the one-shot one, then do the injection. Everything but doing the injection is implemented in the enter() method, it's perfectly worth re-using one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21285	2024-10-28 21:37:20 +02:00
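
The reservoir sampling problem described in commit 9de52b1c98 (utils: introduce reservoir_sampling) can be illustrated with a minimal C++ sketch. It uses Algorithm R, the simple textbook method, which is not the optimal algorithm that commit refers to. The class name `simple_reservoir`, its interface, and the choice of `std::mt19937_64` are assumptions made for illustration only and are not ScyllaDB's `reservoir_sampler`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical illustration, not ScyllaDB's reservoir_sampler.
// Keeps a uniform random sample of at most `capacity` items from a
// stream of unknown length, using Algorithm R.
template <typename T>
class simple_reservoir {
    std::size_t _capacity;
    std::uint64_t _seen = 0;     // number of items offered so far
    std::vector<T> _sample;      // current sample, size <= _capacity
    std::mt19937_64 _rng;
public:
    explicit simple_reservoir(std::size_t capacity, std::uint64_t seed = 0)
        : _capacity(capacity), _rng(seed) {
        assert(capacity > 0);
        _sample.reserve(capacity);
    }

    void add(T item) {
        ++_seen;
        if (_sample.size() < _capacity) {
            // The first `capacity` items always enter the sample.
            _sample.push_back(std::move(item));
            return;
        }
        // Item number n (1-based) replaces a random slot with
        // probability capacity / n, which keeps the sample uniform.
        std::uniform_int_distribution<std::uint64_t> dist(0, _seen - 1);
        const std::uint64_t slot = dist(_rng);
        if (slot < _capacity) {
            _sample[slot] = std::move(item);
        }
    }

    const std::vector<T>& sample() const { return _sample; }
    std::uint64_t seen() const { return _seen; }
};
```

Memory use stays bounded by the sample size regardless of how long the stream is, which is the property the commit message asks for. Algorithm R draws one random number per item; optimal algorithms skip ahead over runs of rejected items instead.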

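
Commit 866326efe4 (utils: add stream_compressor) notes that `lz4_cstream` prepends a variable-length integer holding the compressed size to each block. The log does not say which integer encoding is used, so the sketch below assumes unsigned LEB128 (7 payload bits per byte, high bit set when more bytes follow). The names `append_varint`, `read_varint`, and `decoded_varint` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helpers; the encoding (unsigned LEB128) is an assumption.

// Appends `value` to `out` as a varint: low 7 bits first, high bit of each
// byte set when more bytes follow.
inline void append_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7f) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

struct decoded_varint {
    std::uint64_t value;     // the decoded integer
    std::size_t bytes_used;  // how many input bytes the varint occupied
};

// Reads a varint from the start of `data`. Returns std::nullopt if the
// input ends before the final byte or the varint exceeds 10 bytes.
inline std::optional<decoded_varint> read_varint(const std::uint8_t* data,
                                                 std::size_t size) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < size && i < 10; ++i) {
        value |= static_cast<std::uint64_t>(data[i] & 0x7f) << (7 * i);
        if ((data[i] & 0x80) == 0) {
            return decoded_varint{value, i + 1};
        }
    }
    return std::nullopt;
}
```

Under this assumed encoding, any compressed block shorter than 128 bytes costs exactly one extra byte for its length prefix, which matches the "loss of 1 byte" trade-off the commit message mentions for small messages.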

1817 Commits