Commit Graph

1507 Commits

Author SHA1 Message Date
Avi Kivity
d9a453e72e Merge 'Introduce a scylla-native nodetool' from Botond Dénes
This series introduces a scylla-native nodetool.  It is invokable via the main scylla executable as the other native tools we have. It uses the seastar's new `http::client` to connect to the specified node and execute the desired commands.
For now a single command is implemented: `nodetool compact`, invokable as `scylla nodetool compact`. Once all the boilerplate is added to create a new tool, implementing a single command is not too bad, in terms of code-bloat. Certainly not as clean as a python implementation would be, but good enough. The advantages of a C++ implementation is that all of us in the core team know C++ and that it is shipped right as part of the scylla executable..

Closes #14841

* github.com:scylladb/scylladb:
  test: add nodetool tests
  test.py: add ToolTestSuite and ToolTest
  tools/scylla-nodetool: implement compact operation
  tools/scylla-nodetool: implement basic scylla_rest_api_client
  tools: introduce scylla-nodetool
  utils: export dns_connection_factory from s3/client.cc to http.hh
  utils/s3/client: pass logger to dns_connection_factory in constructor
  tools/utils: tool_app_template::run_async(): also detect --help* as --help
2023-09-14 17:20:40 +03:00
Botond Dénes
bf2fad3c00 utils: export dns_connection_factory from s3/client.cc to http.hh
So others can use it too. Move headers only used by said class too.
2023-09-14 05:25:14 -04:00
Botond Dénes
17fd57390e utils/s3/client: pass logger to dns_connection_factory in constructor
We want to publish this class in a header so it can be used by others,
but it uses the s3 logger. We don't want future users to pollute the s3
logs, so allow users to pass their own loggers to the factory.
2023-09-14 05:25:14 -04:00
Botond Dénes
cc16502691 Merge 'Add metrics to S3 client' from Pavel Emelyanov
The added metrics include:

- http client metrics, which include the number of connections, the number of active connections and the number of new connections made so far
- IO metrics that mimic those for traditional IO -- total number of object read/write ops, total number of get/put/uploaded bytes and individual IO request delay (round-trip, including body transfer time)

fixes: #13369

Closes #14494

* github.com:scylladb/scylladb:
  s3/client: Add IO stats metrics
  s3/client: Add HTTP client metrics
  s3/client: Split make_request()
  s3/client: Wrap http client with struct group_client
  s3/client: Move client::stats to namespace scope
  s3/client: Keep part size local variable
2023-09-14 09:49:08 +03:00
Kefu Chai
87088b65b6 util: replace <tab> with spaces
to be aligned with seastar's coding-style.md: scylladb uses seastar's
coding-style.md. so let's adhere to it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15345
2023-09-11 14:38:46 +03:00
Kefu Chai
ce291f4385 s3/client: do not use deprecated tls::connect() overload
seastar has deprecated the overload which accepts `server_name`,
let's use the one which accepts `tls::tls_options`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15324
2023-09-08 18:44:45 +03:00
Pavel Emelyanov
308db51306 s3/client: Add IO stats metrics
These metrics mimic the existing IO ones -- total number of read
operation, total number of read bytes and total read delay. And the same
for writing.

This patch makes no difference between wrting object with plain PUT vs
putting it with multipart uploading. Instead, it "measures" individual
IO writes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
91235a84cd s3/client: Add HTTP client metrics
Currently an http client has several exported "numbers" regarding the
number of transport connections the client uses. This patch exports
those via S3 client's per-sched-group metrics and prepares the ground
for more metrics in next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
08a12cd4a6 s3/client: Split make_request()
There will appear another make_request() helper that'll do mostly the
same. This split will help to avoid code duplication

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
4b548dd240 s3/client: Wrap http client with struct group_client
The http-client is per-sched-group. Next patch will need to keep metrics
per-sched-group too and this sched-group -> http-client map is the good
place to put them on. Wrapping struct will allow extending it with
metrics

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
627c1932e4 s3/client: Move client::stats to namespace scope
The stats is stats about object, not about client, so it's better if it
lives in namespace scope. Also it will avoid conflicts with client stats
that will be reported as metrics (later patch)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
896b582850 s3/client: Keep part size local variable
This serves two purposes. First, it fixes potential use-after-move since
the bufs are moved on lambda and bufs.size() are called in the same
statement with no defined evaluation order.

Second, this makes 'size' varable alive up to the time request is
complete thus making it possible to update stats with it (later patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Dawid Medrek
c7fe5d7f94 utils/lister: Limit the API of scan_dir() to fs::path
Right now, the function allows for passing the path to a file as a seastar::sstring,
which is then converted to std::filesystem::path -- implicitly to the caller.
However, the function performs I/O, and there is no reason to accept any other type
than std::filesystem::path, especially because the conversion is straightforward.
Callers can perform it on their own.

This commit introduces the more constrained API.

Closes #15266
2023-09-05 20:50:42 +03:00
Benny Halevy
eb51b70e6d utils: atomic_vector: mark for_each functions as const
They only need to access the _vec_lock rwlock
so mark it as mutable, but otherwise they provide a const
interface to the calls, as the called func receives
the entries by value and it cannot modify them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:14:38 +03:00
Michał Chojnowski
c7d9d35030 utils: cached_file: deglobalize cached_file metrics
Move cached_file metrics from a thread_local variable
to cache_tracker.

This is needed so that cache_tracker can know
the memory usage of index caches (for purposes
of cache eviction) without relying on globals.

But it also makes sense even without that motive.
2023-09-01 22:34:41 +02:00
Michał Chojnowski
50b429f255 config: add index_cache_fraction
Adds a configurable upper limit to memory usage by index caches.
See the source code comments added in this patch for more details.

This patch shouldn't change visible behaviour, because the limit is set to 1.0
by default, so it is never triggerred. We will change the default in a future
patch.
2023-09-01 22:34:23 +02:00
Michał Chojnowski
6a7ce6781e utils: lru: add move semantics to list links
Before the patch, fixing list links is done manually in the move constructor of
`evictable`. After the patch, it is done by the move constructors of the links
themselves.

This makes for slightly cleaner code, especially after we add more links in an
upcoming patch.
2023-09-01 22:34:23 +02:00
Piotr Smaroń
34c3688017 db: config: add live_updatable_config_params_changeable_via_cql option
If `live_updatable_config_params_changeable_via_cql` is set to true, configuration parameters defined with `liveness::LiveUpdate` option can be updated in the runtime with CQL, i.e. by updating `system.config` virtual table.
If we don't want any configuration parameter to be changed in the
runtime by updating `system.config` virtual table, this option should be
set to false. This option should be set to false for e.g. cloud users,
who can only perform CQL queries, and should not be able to change
scylla's configuration on the fly.

Current implemenatation is generic, but has a small drawback - messages
returned to the user can be not fully accurate, consider:
```
cqlsh> UPDATE system.config SET value='2' WHERE name='task_ttl_in_seconds';
WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="option is not live-updateable" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
```
where `task_ttl_in_seconds` has been defined with
`liveness::LiveUpdate`, but because `live_updatable_config_params_changeable_via_cql` is set to
`false` in `scylla.yaml,` `task_ttl_in_seconds` cannot be modified in the
runtime by updating `system.config` virtual table.

Fixes #14355

Closes #14382
2023-08-16 17:56:27 +03:00
Pavel Emelyanov
3c6686e181 bptree: Replace assert with static_assert
The one runs under checked constexpr value anyway

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14951
2023-08-06 16:36:12 +03:00
Kamil Braun
39ca07c49b Merge 'Gossiper endpoint locking' from Benny Halevy
This series cleans up and hardens the endpoint locking design and
implementation in the gossiper and endpoint-state subscribers.

We make sure that all notifications (expect for `before_change`, that
apparently can be dropped) are called under lock_endpoint, as well as
all calls to gossiper::replicate, to serialize endpoint_state changes
across all shards.

An endpoint lock gets a unique permit_id that is passed to the
notifications and passed back by them if the notification functions call
the gossiper back for the same endpoint on paths that modify the
endpoint_state and may acquire the same endpoint lock - to prevent a
deadlock.

Fixes scylladb/scylladb#14838
Refs scylladb/scylladb#14471

Closes #14845

* github.com:scylladb/scylladb:
  gossiper: replicate: ensure non-null permit
  gossiper: add_saved_endpoint: lock_endpoint
  gossiper: mark_as_shutdown: lock_endpoint
  gossiper: real_mark_alive: lock_endpoint
  gossiper: advertise_token_removed: lock_endpoint
  gossiper: do_status_check: lock_endpoint
  gossiper: remove_endpoint: lock_endpoint if needed
  gossiper: force_remove_endpoint: lock_endpoint if needed
  storage_service: lock_endpoint when removing node
  gossiper: use permit_id to serialize state changes while preventing deadlocks
  gossiper: lock_endpoint: add debug messages
  utils: UUID: make default tagged_uuid ctor constexpr
  gossiper: lock_endpoint must be called on shard 0
  gossiper: replicate: simplify interface
  gossiper: mark_as_shutdown: make private
  gossiper: convict: make private
  gossiper: mark_as_shutdown: do not call convict
2023-08-02 13:50:08 +02:00
Benny Halevy
929d03b370 utils: UUID: make default tagged_uuid ctor constexpr
So it can be used for gms::null_permit_id in the next patch

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-07-31 19:29:18 +03:00
Benny Halevy
60862c63dd utils/directories: verify_owner_and_mode: add recursive flag
Allow the caller to verify only the top level directories
so that sub-directories can be verified selectively
(in particular, skip validation of snapshots).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-07-31 16:01:43 +03:00
Raphael S. Carvalho
050ce9ef1d cached_file: Evict unused pages that aren't linked to LRU yet
It was found that cached_file dtor can hit the following assert
after OOM

cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.`

cached_file's dtor iterates through all entries and evict those
that are linked to LRU, under the assumption that all unused
entries were linked to LRU.

That's partially correct. get_page_ptr() may fetch more than 1
page due to read ahead, but it will only call cached_page::share()
on the first page, the one that will be consumed now.

share() is responsible for automatically placing the page into
LRU once refcount drops to zero.

If the read is aborted midway, before cached_file has a chance
to hit the 2nd page (read ahead) in cache, it will remain there
with refcount 0 and unlinked to LRU, in hope that a subsequent
read will bring it out of that state.

Our main user of cached_file is per-sstable index caching.
If the scenario above happens, and the sstable and its associated
cached_file is destroyed, before the 2nd page is hit, cached_file
will not be able to clear all the cache because some of the
pages are unused and not linked.

A page read ahead will be linked into LRU so it doesn't sit in
memory indefinitely. Also allowing for cached_file dtor to
clear all cache if some of those pages brought in advance
aren't fetched later.

A reproducer was added.

Fixes #14814.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14818
2023-07-27 00:01:46 +02:00
Kefu Chai
a8254111ef utils: drop operator<< for pretty printers
since all callers of these operators have switched to fmt formatters.
let's drop them. the tests are updated accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-17 14:02:13 +08:00
Kefu Chai
fc6b84ec1f utils: add fmt formatter for pretty printers
add fmt formatter for `utils::pretty_printed_data_size` and
`utils::pretty_printed_throughput`.

this is a part of a series to migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `utils::pretty_printed_data_size` and
`utils::pretty_printed_throughput` without the help of `operator<<`.

please note, despite that it's more popular to use the IEC prefixes
when presenting the size of storage, i.e., MiB for 1024**2 bytes instead
of MB for 1000**2 bytes, we are still using the SI binary prefixes as
the default binary prefix, in order to preserve the existing behavior.

also, we use the singular form of "byte" when formating "1". this is
more correct.

the tests are updated accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-17 14:02:13 +08:00
Kefu Chai
567b453689 utils: avoid using out-of-range index in pretty_printers
before this change, if the formatter size is greater than a pettabyte,
`exp` would be 6. but we still use it as the index to find the suffix
in `suffixes`, but the array's size is 6. so we would be referencing
random bits after "PB" for the suffix of the formatted size.

in this change

* loop in the suffix for better readability. and to avoid
  the off-by-one errors.
* add tests for both pretty printers

Branches: 5.1,5.2,5.3
Fixes #14702
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14713
2023-07-16 18:46:09 +03:00
Mikołaj Grzebieluch
b165f1e88b utils: error injection: check if it is an ongoing one-shot injection in is_enabled
Change it for consistency with `enabled_injections`.

Closes #14597
2023-07-13 15:56:33 +02:00
Kamil Braun
a2fe63349d Merge 'utils: error injection: add a string-to-string map of injection's parameters' from Mikołaj Grzebieluch
Add `parameters` map to `injection_shared_data`. Now tests can attach
string data to injections that can be read in injected code via
`injection_handler`.

Closes #14521

Closes #14608

* github.com:scylladb/scylladb:
  tests: add a `parameters` argument to code that enables injections
  api/error_injection: add passing injection's parameters to enable endpoint
  tests: utils: error injection: add test for injection's parameters
  utils: error injection: add a string-to-string map of injection's parameters
  utils: error injection: rename received_messages_counter to injection_shared_data
2023-07-13 11:52:15 +02:00
Mikołaj Grzebieluch
f60580ab3e utils: error injection: add a string-to-string map of injection's parameters
Add `parameters` map. Now tests can attach string data to
injections that can be read in injected code via `injection_handler`.
2023-07-13 10:10:52 +02:00
Mikołaj Grzebieluch
b33714a0f0 utils: error injection: rename received_messages_counter to injection_shared_data
For now, `received_messages_counter` have only data for messaging the injection.
In future, there will be more data to keep, for example, a string-to-string map of
injection's parameters.

Rename this class and its attributes.
2023-07-13 10:10:52 +02:00
Kamil Braun
9d4b3c6036 test: use correct timestamp resolution in test_group0_history_clearing_old_entries
In 10c1f1dc80 I fixed
`make_group0_history_state_id_mutation` to use correct timestamp
resolution (microseconds instead of milliseconds) which was supposed to
fix the flakiness of `test_group0_history_clearing_old_entries`.

Unfortunately, the test is still flaky, although now it's failing at a
later step -- this is because I was sloppy and I didn't adjust this
second part of the test to also use microsecond resolution. The test is
counting the number of entries in the `system.group0_history` table that
are older than a certain timestamp, but it's doing the counting using
millisecond resolution, causing it to give results that are off by one
sometimes.

Fix it by using microseconds everywhere.

Fixes #14653

Closes #14670
2023-07-13 10:33:52 +03:00
Tomasz Grabiec
e8ee0a2f86 Merge 'group0_state_machine: use correct comparison for timeuuids in merger' from Kamil Braun
In d2a4079bbe, `merger` was modified so that when we merge a command, `last_group0_state_id` is taken to be the maximum of the merged command's state_id and the current `last_group0_state_id`. This is necessary for achieving the same behavior as if the commands were applied individually instead of being merged -- where we take the maximum state ID from `group0_history` table which was applied until now (because the table is sorted using the state IDs and we take the greatest row).

However, a subtle bug was introduced -- the `std::max` function uses the `utils::UUID` standard comparison operator which is unfortunately not the same as timeuuid comparison that Scylla performs when sorting the `group0_history` table. So in rare cases it could return the *smaller* of the two timeuuids w.r.t. the correct timeuuid ordering. This would then lead to commands being applied which should have been turned to no-ops due to the `prev_state_id` check -- and then, for example, permanent schema desync or worse.

Fix it by using the correct comparison method.

Fixes: #14600

Closes #14616

* github.com:scylladb/scylladb:
  utils/UUID: reference `timeuuid_tri_compare` in `UUID::operator<=>` comment
  group0_state_machine: use correct comparison for timeuuids in `merger`
  utils/UUID: introduce `timeuuid_tri_compare` for `const UUID&`
  utils/UUID: introduce `timeuuid_tri_compare` for `const int8_t*`
2023-07-12 14:48:18 +02:00
Kamil Braun
051728318d utils/UUID: reference timeuuid_tri_compare in UUID::operator<=> comment 2023-07-11 13:19:50 +02:00
Kamil Braun
5ce802676f utils/UUID: introduce timeuuid_tri_compare for const UUID&
The existing `timeuuid_tri_compare` operates on UUIDs serialized in byte
buffers. Introduce a version which operates directly on the
`utils::UUID` type.

To reuse existing comparison code, we serialize to a buffer before
comparing. But we avoid allocations by using `std::array`. Since the
serialized size needs to be known at compile time for `std::array`, mark
`UUID::serialized_size()` as `constexpr`.
2023-07-11 11:48:02 +02:00
Kamil Braun
668beedadc utils/UUID: introduce timeuuid_tri_compare for const int8_t*
`timeuuid_tri_compare` takes `bytes_view` parameters and converts them
to `const int8_t*` before comparing.

Extract the part that operates on `const int8_t*` to separate function
which we will reuse in a later commit.
2023-07-11 11:48:02 +02:00
Kefu Chai
ef78b31b43 s3/client: add tagging ops
with tagging ops, we will be able to attach kv pairs to an object.
this will allow us to mark sstable components with taggings, and
filter them based on them.

* test/pylib/minio_server.py: enable anonymous user to perform
  more actions. because the tagging related ops are not enabled by
  "mc anonymous set public", we have to enable them using "set-json"
  subcommand.
* utils/s3/client: add methods to manipulate taggings.
* test/boost/s3_test: add a simple test accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14486
2023-07-11 09:30:46 +03:00
Kefu Chai
0dca0a7f27 build: cmake: include pretty_printers.cc in util
we added pretty_printers.cc back in
83c70ac04f, in which configure.py is
updated. so let's sync the CMake building system accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14442
2023-07-11 09:16:33 +03:00
Avi Kivity
0cabf4eeb9 build: disable implicit fallthrough
Prevent switch case statements from falling through without annotation
([[fallthrough]]) proving that this was intended.

Existing intended cases were annotated.

Closes #14607
2023-07-10 19:36:06 +02:00
Kefu Chai
26dcfea84a estimated_histogram: do not use dynamic format_string
fmtlib allows us to specify the field width dynamically, so specify
the field width in the same statement formatting the argument improves
the readability. and use the constexpr fmt string allows us to switch
to compile-time formatter supported by fmtlib v8.

this change also use `fmt::print()` to format the argument right to
the output ostream, instead of creating a temporary sstring, and
copy it to the output ostream.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14579
2023-07-08 15:10:41 +03:00
Mikołaj Grzebieluch
086b3369f4 utils: error injection: add inject_with_handler for interactions with injected code
Currently, it is hard for injected code to wait for some events, for example,
requests on some REST endpoint.

This commit adds the `inject_with_handler` method that executes injected function
and passes `injection_handler` as its argument.
The `injection_handler` class is used to wait for events inside the injected code.
The `error_injection` class can notify the injection's handler or handlers
associated with the injection on all shards about the received message.
There is a counter of received messages in `received_messages_counter`; it is shared
between the injection_data, which is created once when enabling an injection on
a given shard, and all `injection_handler`s, that are created separately for each
firing of this injection. The `counter` is incremented when receiving a message from
the REST endpoint and the condition variable is signaled.
Each `injection_handler` (separate for each firing) stores its own private counter,
`_read_messages_counter` that private counter is incremented whenever we wait for a
message, and compared to the received counter. We sleep on the condition variable
if not enough messages were received.
2023-07-06 12:32:07 +02:00
Mikołaj Grzebieluch
01bc6f5294 utils: error injection: create structure for error injections data
This enables holding additional data associated with the injection.
2023-07-05 13:52:46 +02:00
Raphael S. Carvalho
83c70ac04f utils: Extract pretty printers into a header
Can be easily reused elsewhere.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-06-26 21:58:20 -03:00
Petr Gusev
1e851262f2 storage_proxy: handler responses, use pointers to default constructed values instead of nulls
The current Seastar RPC infrastructure lacks support
for null values in tuples in handler responses.
In this commit we add the make_default_rpc_tuple function,
which solves the problem by returning pointers to
default-constructed values for smart pointer types
rather than nulls.

The problem was introduced in this commit
 2d791a5ed4. The
 function `encode_replica_exception_for_rpc` used
 `default_tuple_maker` callback to create tuples
 containing exceptions. Callers returned pointers
 to default-constructed values in this callback,
 e.g. `foreign_ptr(make_lw_shared<reconcilable_result>())`.
 The commit changed this to just `SourceTuple{}`,
 which means nullptr for pointer types.

Fixes: #14282

Closes #14352
2023-06-26 11:10:38 +03:00
Tomasz Grabiec
ad6d2b42f2 test: Extract throttle object to separate header 2023-06-21 00:58:24 +02:00
Botond Dénes
ddf8547f25 Merge 'Add concurrency control and workload isolation for S3 client' from Pavel Emelyanov
In its current state s3 client uses a single default-configured http client thus making different sched classes' workload compete with each other for sockets to make requests on. There's an attempt to handle that in upload-sink implementation that limits itself with some small number of concurrent PUT requests, but that doesn't help much as many sinks don't share this limit.

This PR makes S3 client maintain a set of http clients, one per sched-group, configures maximum number of TCP connections proportional to group's shares and removes the artificial limit from sinks thus making them share the group's http concurrency limit.

As a side effect, the upload-sink fixes the no-writes-after-flush protection -- if it's violated, write will result in exception, while currently it just hangs on a semaphore forever.

fixes: #13458
fixes: #13320
fixes: #13021

Closes #14187

* github.com:scylladb/scylladb:
  s3/client: Replace skink flush semaphore with gate
  s3/client: Configure different max-connections on http clients
  s3/client: Maintain several http clients on-board
  s3/client: Remove now unused http reference from sink and file
  s3/client: Add make_request() method
2023-06-20 07:09:21 +03:00
Petr Gusev
2d791a5ed4 storage_proxy.cc: refactor encode_replica_exception_for_rpc
We are going to add fencing to read RPCs, it would be easier
to do it once for all three of them. This refactoring
enables this since it allows to use
encode_replica_exception_for_rpc for handle_read_digest.
2023-06-15 15:52:50 +04:00
Pavel Emelyanov
c1c1752f88 s3/client: Replace skink flush semaphore with gate
Uploading sinks have internal semaphore limiting the maximum number of
uploading parts and pieces with the value of two. This approach has
several drawbacks.

1. The number is random. It could as well be three, four and any other

2. Jumbo upload in fact violates this parallelizm, because it applies to
   maximum number of pieces _and_ maximum number of parts in each piece
   that can be uploaded in parallels. Thus jumbo upload results in four
   parts in parallel.

3. Multiple uploads don't sync with each other, so uploading N objects
   would result in N * 2 (or even N * 4 with jumbo) uploads in parallel.

4. Single upload could benefit from using more sockets if no other
   uploads happen in parallel. IOW -- limit should be shard-wide, not
   single-upload-wide

Previous patches already put the per-shard parallelizm under (some)
control, so this semaphore is in fact used as a way to collect
background uploading fibers on final flush and thus can be replaced with
a gate.

As a side effect, this fixes an issue that writes-after-flush shouldn't
happen (see #13320) -- when flushed the upload gate is closed and
subsequent writes would hit gate-closed error.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-08 18:38:57 +03:00
Pavel Emelyanov
99b92f0ed8 s3/client: Configure different max-connections on http clients
After previous patch different sched groups got different http clients.
By default each client is started with 100 allowed connections. This can
be too much -- 100 * nr-sched-groups * smp::count can be quite huge
number. Also, different groups should have different parallelizm, e.g.
flush/compaction doesn't care that much about latency and can use fewer
sockets while query class is more welcome to have larger concurrency.

As a starter -- configure http clients with maximum shares/100 sockets.
Thus query class would have 10 and flush/compaction -- 1.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-08 18:35:59 +03:00
Pavel Emelyanov
81d1bfce2a s3/client: Maintain several http clients on-board
The intent is to isolate workloads from different sched groups from each
other and not let one sched group consume all sockets from the http
client thus affecting requests made by other sched groups.

The conention happens in the maximim number of socket an http client may
have (see scylladb/seastar#1652). If requests take time and client is
asked to make more and more it will eventually stop spawning new
connections and would get blocked internally waiting for running
requests to complete and put a socket back to pool. If a sched group
workload (e.g. -- memtable flush) consumes all the available sockets
then workload from another group (e.g. -- query) would be blocked thus
spoiling its latency (which is poor on its own, but still)

After this change S3 client maintains a sched_group:http_client map
thus making sure different sched groups don't clash with each other so
that e.g. query requests don't wait for flush/compaction to release a
socket.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-08 18:28:55 +03:00
Pavel Emelyanov
a8492a065b s3/client: Remove now unused http reference from sink and file
Now these two classes use client-> calls and don't need the http&
shortcut

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-08 18:28:30 +03:00