scylladb

Author	SHA1	Message	Date
Kefu Chai	82cac8e7cf	treewide: s/std::source_location/seastar::compat::source_location/ CWG 2631 (https://cplusplus.github.io/CWG/issues/2631.html) reports an issue on how the default argument is evaluated. This problem is more obvious when it comes to how `std::source_location::current()` is evaluated as a default argument, but not all compilers have the same behavior, see https://godbolt.org/z/PK865KdG4. Notably, clang-15 evaluates the default argument at the callee site, so we need to check the capability of the compiler and fall back to the one defined by util/source_location-compat.hh if the compiler suffers from CWG 2631. Clang-16 implemented CWG 2631 in https://reviews.llvm.org/D136554, but unfortunately this change was not backported to clang-15. Before switching over to clang-16, to use std::source_location::current() as the default parameter and get the behavior defined by CWG 2631, we have to use the compatibility layer provided by Seastar; otherwise we always end up with the source_location at the callee side, which is not interesting under most circumstances. So in this change, all places using the idiom of passing std::source_location::current() as the default parameter are changed to use seastar::compat::source_location::current(). Even though we have `#include "seastarx.h"` for opening the seastar namespace, the fully qualified name `seastar::compat::source_location::current()` is used to disambiguate it from the "namespace compat" defined elsewhere in scylladb. See also `09a3c63345`, where we used std::source_location as an alias of std::experimental::source_location if it was available, but this does not apply to the settings of our current toolchain, where we have GCC-12 and Clang-15. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14086	2023-05-30 15:10:12 +03:00
Avi Kivity	2303f08eea	utils: logalloc: correct asan_interface.h location It's a system header, so it deserves angle brackets. Closes #14036	2023-05-29 23:03:25 +03:00
Pavel Emelyanov	2eb88945ea	utils: Restore indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-26 18:53:14 +03:00
Pavel Emelyanov	4ebb812df0	utils: Coroutinize verify_owner_and_mode() There's a helper verification_error() that prints a warning and returns an exceptional future. It is converted into a void function that throws instead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-26 18:52:15 +03:00
Botond Dénes	5a14c3311a	Merge 'Break S3 upload 50GB file limit' from Pavel Emelyanov The current S3 uploading sink has an implicit limit on the final file size that comes from two places. First, the S3 protocol declares that upload part numbers range from 1 to 10000 (inclusive). Second, the uploading sink sends out parts once they grow above the S3 minimal part size, which is 5MB. Since sstables put data in 128kB (or smaller) portions, parts are almost exactly 5MB in size, so the total upload size cannot grow above ~50GB. That's too low. To break the limit, the new sink (called the jumbo sink) uses the UploadPartCopy S3 call, which splices several objects into one right on the server. The jumbo sink starts uploading parts into an intermediate temporary object called a piece and named ${original_object}_${piece_number}. When the number of parts in the current piece grows above the configured limit, the piece is finalized and upload-copied into the object as its next part, then deleted. This happens in the background; meanwhile a new piece is created and subsequent data is put into it. When the sink is flushed, the current piece is flushed as is and also squashed into the object. The new jumbo sink is capable of uploading ~500TB of data, which looks like enough. fixes: #13019 Closes #13577 * github.com:scylladb/scylladb: sstables: Switch data and index sink to use jumbo uploader s3/test: Tune-up multipart upload test alignment s3/test: Add jumbo upload test s3/client: Wait for background upload fiber on close-abort s3/client: Implement jumbo upload sink s3/client: Move memory buffers to upload_sink from base s3/client: Move last part upload out of finalize_upload() s3/client: Merge do_flush() with upload_part() s3/client: Rename upload_sink -> upload_sink_base	2023-05-25 11:44:06 +03:00
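The size limits quoted above follow from simple arithmetic. The sketch below assumes binary units (1 MiB = 2^20 bytes), a 5 MiB part size, and that a piece may grow to the same 10000-part maximum as the final object; the real jumbo sink makes the per-piece limit configurable, so the second figure is an upper bound under that assumption.

```cpp
#include <cassert>
#include <cstdint>

// Figures from the commit message; binary units are an assumption of this sketch.
constexpr std::uint64_t max_parts = 10'000;          // S3 part numbers run 1..10000
constexpr std::uint64_t min_part_size = 5ull << 20;  // 5 MiB minimal part size

// Regular sink: every part is ~min_part_size, so the object is capped at
// max_parts * min_part_size (~50 GB).
constexpr std::uint64_t regular_limit = max_parts * min_part_size;

// Jumbo sink: each part of the final object is a whole "piece" object that can
// itself hold up to max_parts parts, spliced in with UploadPartCopy (~500 TB).
constexpr std::uint64_t jumbo_limit = max_parts * regular_limit;
```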
Petr Gusev	79c6bf0885	clear_gently: remove noexcept for rvalue references overload We use this overload in vnode_erm, and one of the arguments is boost::icl::interval_map, whose move constructor is not noexcept.	2023-05-24 12:08:19 +04:00
Petr Gusev	e0bc98a217	sequenced_set: add extract_vector method Can be useful if we want to reuse the vector when we are done with this sequenced_set instance.	2023-05-21 11:33:38 +04:00
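A `sequenced_set` pairs an ordered container with a lookup structure, and an `extract_*` method lets the caller take ownership of one of them without copying. The sketch below is hypothetical (the real class layout and signatures in ScyllaDB may differ): it keeps insertion order in a `std::vector`, deduplicates through a `std::unordered_set`, and hands either one over by move. Making the methods `&&`-qualified is a design choice of this sketch, so that extraction reads as consuming the object.

```cpp
#include <cassert>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical sequenced_set: insertion order lives in a vector, membership
// checks go through an unordered_set.
template <typename T>
class sequenced_set {
    std::vector<T> _vec;
    std::unordered_set<T> _set;
public:
    // Appends v only if it is not already present.
    void push_back(const T& v) {
        if (_set.insert(v).second) {
            _vec.push_back(v);
        }
    }
    // Only callable on an rvalue: the caller signals it is done with *this,
    // so the internal containers can be moved out and reused.
    std::vector<T> extract_vector() && { return std::move(_vec); }
    std::unordered_set<T> extract_set() && { return std::move(_set); }
};
```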
Petr Gusev	700eb90ed8	stall_free.hh: add clear_gently for rvalues	2023-05-21 11:33:33 +04:00
Petr Gusev	4a127c3782	stall_free.hh: relax Container requirement We don't use the return value of erase, so we can allow it to return anything. We'll need this for ring_mapping, since boost::icl::interval_map::erase(it) returns void.	2023-05-19 22:11:09 +04:00
Pavel Emelyanov	908d0d2e6a	s3/client: Wait for background upload fiber on close-abort When uploading a part (and a piece) there can be one or more background fibers handling the upload. If the client needs to abort the operation, it calls .close() without flush()ing. In this case the S3 API Abort is made and the sink can be terminated. It's expected that background fibers would resolve on their own eventually, but that's not quite the case. First, they hold units of the semaphore, and the semaphore must still be alive by the time the units are returned. Second, the PUT (or copy) request can finish successfully and be sitting in the reactor queue waiting for its continuation to get scheduled. The continuation references the sink via the "this" capture to put the part etag. Finally, in case of piece uploading, the copy fiber needs _client at the end to issue the delete-object API call dropping the no longer needed part. That said, background fibers must be waited upon in .close() if the closing is aborting (if it's a successful close, then the fibers must have been picked up by the final flush() call). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-16 12:23:18 +03:00
Pavel Emelyanov	f9686926c2	s3/client: Implement jumbo upload sink The sink is also in charge of uploading large objects in parts, but this time each part is put with the help of the upload-part-copy API call, not the regular upload-part one. To make it work, the new sink inherits from the uploading base class, but instead of keeping memory_data_sink_buffers with parts it keeps a sink to upload a temporary intermediate object with parts. When the object is "full", i.e. the number of parts in it hits the limit, the object is flushed, then copied into the target object with the S3 API call, and then the intermediate object is deleted. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-16 12:23:18 +03:00
Pavel Emelyanov	8fa3294ae1	s3/client: Move memory buffers to upload_sink from base All the buffer manipulations now happen in the upload_sink class, and the respective member can be removed from the base class. The base class only touches the buffers in its upload_part() call, but that's unavoidable, as uploading a part implies sending its contents, which sit in the buffers. Now the base class can be re-used for uploading parts with the help of the copy-part API call (next patches). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-16 12:19:50 +03:00
Pavel Emelyanov	2ac5ecd659	s3/client: Move last part upload out of finalize_upload() There are two reasons for this change. First, it facilitates moving the memory_data_sink_buffers out of the base class, i.e. it continues the previous patch. Second, it fixes a corner case: if the final sink flush happens right after the previous part was sent for uploading, the finalization doesn't happen and sink closing aborts the upload even if it was successful. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-16 12:19:50 +03:00
Pavel Emelyanov	407b40c430	s3/client: Merge do_flush() with upload_part() The do_flush() helper is practically useless because what it does can be done by upload_part() itself. This merge also facilitates moving the memory_data_sink_buffers from the base class to the uploader class in the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-16 12:19:50 +03:00
Pavel Emelyanov	a88629227f	s3/client: Rename upload_sink -> upload_sink_base Another sink will appear that implements multipart upload with the help of the copy-part functionality. The current uploading code is going to be partially re-used, so this patch moves all of it into the base class in advance. Next patches will pick the needed parts. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-16 12:19:50 +03:00
Benny Halevy	a70b53b6e7	utils: tagged_integer: implement std::numeric_limits::{min,max} Add a respective unit test. It turns out that the primary std::numeric_limits template provides an implicit implementation for std::numeric_limits<utils::tagged_integer<Tag, ValueType>>, which apparently returns a default-constructed tagged_integer for min() and max(), and this broke `gms::heart_beat_state::force_highest_possible_version_unsafe()` since `4cdad8bc8b` (merged in `7f04d8231d`). Implement min/max correctly. Fixes #13801 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-15 10:19:39 +03:00
Benny Halevy	1b5d5205c8	test: add tagged_integer_test Add basic tests for tagged_integer arithmetic operations. Remove the const qualifier from `tagged_integer::operator[+-]=`, as these are add/sub-assign operators that need to modify the value in place. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-14 23:26:58 +03:00
Tomasz Grabiec	a91e83fad6	Merge "issue raft read barrier before pulling schema" from Gleb A schema pull may fail because the pull does not contain everything that is needed to instantiate a schema pointer; for instance, it does not contain a keyspace. This series changes the code to issue a raft read barrier before the pull, which guarantees that the keyspace is created before the actual schema pull is performed.	2023-05-14 14:14:24 +03:00
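A minimal sketch of the fix: a partial specialization of `std::numeric_limits` that forwards `min()` and `max()` to the underlying value type. The `tagged_integer` below is a simplified stand-in, not the real `utils::tagged_integer`; it also shows the compound-assignment operators without a `const` qualifier, as in the related test commit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Simplified stand-in for utils::tagged_integer (illustrative only).
template <typename Tag, typename ValueType>
class tagged_integer {
    ValueType _value{};
public:
    constexpr tagged_integer() = default;
    constexpr explicit tagged_integer(ValueType v) : _value(v) {}
    constexpr ValueType value() const { return _value; }
    // Compound assignment modifies *this, so it must not be const-qualified.
    constexpr tagged_integer& operator+=(tagged_integer o) { _value += o._value; return *this; }
    constexpr tagged_integer& operator-=(tagged_integer o) { _value -= o._value; return *this; }
};

// Without this specialization the primary std::numeric_limits template applies:
// is_specialized is false and min()/max() return a value-initialized
// tagged_integer, i.e. one holding 0.
namespace std {
template <typename Tag, typename ValueType>
struct numeric_limits<tagged_integer<Tag, ValueType>> : numeric_limits<ValueType> {
    static constexpr tagged_integer<Tag, ValueType> min() noexcept {
        return tagged_integer<Tag, ValueType>(numeric_limits<ValueType>::min());
    }
    static constexpr tagged_integer<Tag, ValueType> max() noexcept {
        return tagged_integer<Tag, ValueType>(numeric_limits<ValueType>::max());
    }
};
} // namespace std
```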
Avi Kivity	0a78995e2b	Merge 'Share s3 clients between sstables' from Pavel Emelyanov Currently an s3::client is created for each sstable::storage. It's later shared between the sstable's files and upload sink(s). Also foreign_sstable_open_info can produce a file from a handle, making a new standalone client. Coupled with seastar's http client spawning connections on demand, this makes it impossible to control the number of opened connections to the object storage server. In order to put some policy on top of that (as well as apply workload prioritization), s3 clients should be collected in one place and then shared by users. Since s3::client uses seastar::http::client under the hood which, in turn, can generate many connections on demand, it's enough to produce a single s3::client per configured endpoint on each shard and then share it between all the sstables, files and sinks. There's one difficulty, however, and solving it is most of what this PR does. The file handle that's used to transfer an sstable's file across shards should keep aboard all it needs to re-create a file on another shard. Since there's a single s3::client per shard, creating a file out of a handle should grab that shard's client somehow. The meaningful shard-local object that can help is the sstables_manager, and there are three ways to make use of it. All deal with the fact that sstables_manager-s are not sharded<> services, but are owned by the database independently on each shard. 1. walk the client -> sst.manager -> database -> container -> database -> sst.manager -> client chain by keeping its first half on the handle and unrolling the second half to produce a file 2. keep a sharded peering service referenced by the sstables_manager that's initialized in main and passed through the database constructor down to sstables_manager(s) 3. equip file_handle::to_file with the "context" argument and teach the sstables foreign info opener to push sstables_manager down to s3 file ... somehow This PR chooses the 2nd way and introduces the sstables::storage_manager main-local sharded peering service that maintains all the s3::clients. "While at it", the new manager takes over the object_storage_config updating facilities from the database (which is overloaded even without them). Later the manager will also be in charge of collecting and exporting S3 metrics. In order to limit the number of S3 connections, it also needs a patched seastar http::client; there's a PR already doing that, and once (if) it's merged there'll come one more fix on top. refs: #13458 refs: #13369 refs: scylladb/seastar#1652 Closes #13859 * github.com:scylladb/scylladb: s3: Pick client from manager via handle s3: Generalize s3 file handle s3: Live-update clients' configs sstables: Keep clients shared across sstables storage_manager: Rewrap config map sstables, database: Move object storage config maintenance onto storage_manager sstables: Introduce sharded<storage_manager>	2023-05-14 14:14:23 +03:00
Pavel Emelyanov	613acba5d0	s3: Pick client from manager via handle Add a global factory to the client that is cross-shard copyable and generates a client from the local storage_manager for a given endpoint. With that, the s3 file handle is fixed and also picks up shared s3 clients from the storage manager instead of creating its own one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-11 19:39:01 +03:00
Pavel Emelyanov	8ed9716f59	s3: Generalize s3 file handle Currently the s3 file handle tries to carry the client's info via an explicit host name and an endpoint config pointer. This is buggy: the latter pointer is shard-local and cannot be transferred across shards. This patch prepares the fix by abstracting the client handle part. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-11 19:39:01 +03:00
Pavel Emelyanov	63ff6744d8	s3: Live-update clients' configs Now that the client is accessible directly via the storage_manager, when the latter is requested to update its endpoint config, it can kick the client to do the same. The client, in turn, can only update the AWS creds info for now. The endpoint port and https usage are immutable for now. Also, updating the endpoint address is not possible, but for another reason: the endpoint itself is part of the keyspace configuration, and updating it in object_storage.yaml will have no effect on it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-11 19:39:01 +03:00
Gleb Natapov	091ec285fe	serialized_action: make serialized_action abortable Add an ability to abort waiting for a result of a specific trigger() invocation.	2023-05-11 16:31:23 +03:00
Nadav Har'El	e57252092c	Merge 'cql3: result_set, selector: change value type to managed_bytes_opt' from Avi Kivity CQL evolved several expression evaluation mechanisms: WHERE clause, selectors (the SELECT clause), and the LWT IF clause are just some examples. Most now use expressions, which use managed_bytes_opt as the underlying value representation, but selectors still use bytes_opt. This poses two problems: 1. bytes_opt generates large contiguous allocations when used with large blobs, impacting latency 2. trying to use expressions with bytes_opt will incur a copy, reducing performance To solve the problem, we harmonize the data types to managed_bytes_opt (#13216 notwithstanding). This is somewhat difficult since the sources of the values are views into a bytes_ostream. However, luckily bytes_ostream and managed_bytes_view are mostly compatible, so with a little effort this can be done. The series is neutral wrt performance: before: ``` 222118.61 tps ( 61.1 allocs/op, 12.1 tasks/op, 43092 insns/op, 0 errors) 224250.14 tps ( 61.1 allocs/op, 12.1 tasks/op, 43094 insns/op, 0 errors) 224115.66 tps ( 61.1 allocs/op, 12.1 tasks/op, 43092 insns/op, 0 errors) 223508.70 tps ( 61.1 allocs/op, 12.1 tasks/op, 43107 insns/op, 0 errors) 223498.04 tps ( 61.1 allocs/op, 12.1 tasks/op, 43087 insns/op, 0 errors) ``` after: ``` 220708.37 tps ( 61.1 allocs/op, 12.1 tasks/op, 43118 insns/op, 0 errors) 225168.99 tps ( 61.1 allocs/op, 12.1 tasks/op, 43081 insns/op, 0 errors) 222406.00 tps ( 61.1 allocs/op, 12.1 tasks/op, 43088 insns/op, 0 errors) 224608.27 tps ( 61.1 allocs/op, 12.1 tasks/op, 43102 insns/op, 0 errors) 225458.32 tps ( 61.1 allocs/op, 12.1 tasks/op, 43098 insns/op, 0 errors) ``` Though I expect that with some more effort we can eliminate some copies. Closes #13637 * github.com:scylladb/scylladb: cql3: untyped_result_set: switch to managed_bytes_view as the cell type cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt cql3: untyped_result_set: always own data types: abstract_type: add mixed-type versions of compare() and equal() utils/managed_bytes, serializer: add conversion between buffer_view<bytes_ostream> and managed_bytes_view utils: managed_bytes: add bidirectional conversion between bytes_opt and managed_bytes_opt utils: managed_bytes: add managed_bytes_view::with_linearized() utils: managed_bytes: mark managed_bytes_view::is_linearized() const	2023-05-10 15:01:45 +03:00
Kamil Braun	7d9ab44e81	Merge 'token_metadata: read remapping for write_both_read_new' from Gusev Petr When new nodes are added or existing nodes are deleted, the topology state machine needs to shunt reads from the old nodes to the new ones. This happens in the `write_both_read_new` state. The problem is that previously this state was not handled in any way in `token_metadata`, and the read nodes were only changed when the topology state machine reached the final 'owned' state. To handle `write_both_read_new`, an additional `interval_map`, similar to `pending_endpoints`, is maintained inside `token_metadata`. It maps the ranges affected by the ongoing topology change operation to the replicas which should be used for reading. When the topology state machine reaches the point where it needs to switch reads to the new topology, it passes `request_read_new=true` in a call to `update_pending_ranges`. This forces `update_pending_ranges` to compute the ranges based on the new topology and store them in the `interval_map`. On the data plane, when a read on the coordinator needs to decide which endpoints to use, it first consults this `interval_map` in `token_metadata`, and only if it doesn't contain a range for the current token does it use the normal endpoints from `effective_replication_map`. Closes #13376 * github.com:scylladb/scylladb: storage_proxy, storage_service: use new read endpoints storage_proxy: rename get_live_sorted_endpoints->get_endpoints_for_reading token_metadata: add unit test for endpoints_for_reading token_metadata: add endpoints for reading sequenced_set: add extract_set method token_metadata_impl: extract maybe_migration_endpoints helper function token_metadata_impl: introduce migration_info token_metadata_impl: refactor update_pending_ranges token_metadata: add unit tests token_metadata: fix indentation token_metadata_impl: return unique_ptr from clone functions	2023-05-10 10:03:30 +02:00
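The read path described above (consult the read-override map first, fall back to the normal replicas) can be sketched as follows. The sketch is hypothetical: `read_override_map` stands in for the `interval_map` in `token_metadata` using a `std::map` of half-open `[start, end)` token ranges and ignores ring wrap-around, and `pick_read_endpoints()` is an illustrative free function, not the actual storage_proxy API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using token = std::int64_t;
using endpoints = std::vector<std::string>;

// Hypothetical stand-in for the interval_map: start -> (end, replicas) for
// half-open ranges [start, end) affected by the ongoing topology change.
struct read_override_map {
    std::map<token, std::pair<token, endpoints>> ranges;

    void add(token start, token end, endpoints eps) {
        ranges[start] = {end, std::move(eps)};
    }

    // Returns the override replicas for t, if t falls inside a stored range.
    std::optional<endpoints> find(token t) const {
        auto it = ranges.upper_bound(t);  // first range starting after t
        if (it == ranges.begin()) {
            return std::nullopt;
        }
        --it;                             // last range starting at or before t
        if (t < it->second.first) {
            return it->second.second;
        }
        return std::nullopt;
    }
};

// Consult the override map first; use the normal replicas only if no range matches.
endpoints pick_read_endpoints(const read_override_map& overrides, token t,
                              const endpoints& normal) {
    if (auto eps = overrides.find(t)) {
        return *eps;
    }
    return normal;
}
```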
Avi Kivity	550aa01242	Merge 'Restore raft::internal::tagged_uint64 type' from Benny Halevy Change `f5f566bdd8` introduced tagged_integer and replaced raft::internal::tagged_uint64 with utils::tagged_integer. However, the idl type for raft::internal::tagged_uint64 was not marked as final, but utils::tagged_integer is, breaking the on-the-wire compatibility. This change restores the use of raft::internal::tagged_uint64 for the raft types and adds back an idl definition for it that is not marked as final, similar to the way raft::internal::tagged_id extends utils::tagged_uuid. Fixes #13752 Closes #13774 * github.com:scylladb/scylladb: raft, idl: restore internal::tagged_uint64 type raft: define term_t as a tagged uint64_t idl: gossip_digest: include required headers	2023-05-09 22:51:25 +03:00
Petr Gusev	b2e5d8c21c	sequenced_set: add extract_set method Can be useful if we want to reuse the set when we are done with this sequenced_set instance.	2023-05-09 13:56:38 +04:00
Benny Halevy	adfb79ba3e	raft, idl: restore internal::tagged_uint64 type Change `f5f566bdd8` introduced tagged_integer and replaced raft::internal::tagged_uint64 with utils::tagged_integer. However, the idl type for raft::internal::tagged_uint64 was not marked as final, but utils::tagged_integer is, breaking the on-the-wire compatibility. This change defines the different raft tagged_uint64 types in idl/raft_storage.idl.hh as non-final to restore the way they were serialized prior to `f5f566bdd8` Fixes #13752 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-09 12:38:20 +03:00
Botond Dénes	ab5fd0f750	Merge 's3: Provide timestamps in the s3 file implementation' from Raphael "Raph" Carvalho SSTable relies on st.st_mtime for providing creation time of data file, which in turn is used by features like tombstone compaction. Therefore, let's implement it. Fixes https://github.com/scylladb/scylladb/issues/13649. Closes #13713 * github.com:scylladb/scylladb: s3: Provide timestamps in the s3 file implementation s3: Introduce get_object_stats() s3: introduce get_object_header()	2023-05-08 11:43:41 +03:00
Raphael S. Carvalho	ad471e5846	s3: Provide timestamps in the s3 file implementation SSTable relies on st.st_mtime for providing creation time of data file, which in turn is used by features like tombstone compaction. Fixes #13649. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-07 19:51:12 -03:00
Raphael S. Carvalho	57661f0392	s3: Introduce get_object_stats() get_object_stats() will be used for retrieving content size and also last modified. The latter is required for filling st_mtim, etc, in the s3::client::readable_file::stat() method. Refs #13649. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-07 19:51:10 -03:00
Raphael S. Carvalho	da2ccc44a4	s3: introduce get_object_header() This allows other functions to reuse the code to retrieve the object header. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-07 19:49:52 -03:00
Kefu Chai	5fa459bd1a	treewide: do not include unused header Since #13452, we switched most of the call sites from std::regex to boost::regex. In this change, all occurrences of `#include <regex>` are dropped unless std::regex is used in the same source file. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13765	2023-05-07 19:01:29 +03:00
Kefu Chai	468460718a	utils: UUID: drop uint64_t_tri_compare() Functionality-wise, `uint64_t_tri_compare()` is identical to the three-way comparison operator, so there is no need to keep it. In this change, it is dropped in favor of <=>. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13794	2023-05-07 18:07:49 +03:00
Avi Kivity	11d651b606	utils/managed_bytes, serializer: add conversion between buffer_view<bytes_ostream> and managed_bytes_view The codebase evolved to have several different ways to hold a fragmented buffer: fragmented_temporary_buffer (for data received from the network; not relevant for this discussion); bytes_ostream (for fragmented data that is built incrementally; also used for a serialized result_set), and managed_bytes (used for lsa and serialized individual values in expression evaluation). One problem with this state of affairs is that using data in one fragmented form with functions that accept another fragmented form requires either a copy, or templating everything. The former is unpalatable for fast-path code, and the latter is undesirable for compile time and run-time code footprint. So we'd like to make the various forms compatible. In `53e0dc7530` ("bytes_ostream: base on managed_bytes") we changed bytes_ostream to have the same underlying data structure as managed_bytes, so all that remains is to add the right API. This is somewhat difficult as the data is hidden in multiple layers: ser::buffer_view<> is used to abstract a slice of bytes_ostream, and this is further abstracted by using iterators into bytes_ostream rather than directly using the internals. Likewise, it's impossible to construct a managed_bytes_view from the internals. Hack through all of these by adding extract_implementation() methods, and a build_managed_bytes_view_from_internals() helper. These are all used by new APIs buffer_view_to_managed_bytes_view() that extract the internals and put them back together again. Ideally we wouldn't need any of this, but unifying the type system in this area is quite an undertaking, so we need some shortcuts.	2023-05-07 17:17:34 +03:00
Avi Kivity	613f4b9858	utils: managed_bytes: add bidirectional conversion between bytes_opt and managed_bytes_opt Useful, rather than open-coding the conversions.	2023-05-07 17:16:38 +03:00
Avi Kivity	1e6ef5503c	utils: managed_bytes: add managed_bytes_view::with_linearized() Becomes useful in later patches. To avoid double-compiling the call to func(), use an immediately-invoked lambda to calculate the bytes_view we'll be calling func() with.	2023-05-07 17:16:38 +03:00
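The immediately-invoked lambda trick mentioned above can be sketched on a hypothetical fragmented view made of `std::string_view` fragments (the real `managed_bytes_view` is different). The lambda selects a contiguous view (the single fragment when already linearized, otherwise a copy into a local buffer), so `func` is invoked, and therefore instantiated, at only one call site.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical fragmented view: non-owning fragments of one logical buffer.
struct fragmented_view {
    std::vector<std::string_view> fragments;

    bool is_linearized() const { return fragments.size() <= 1; }

    // Calls func exactly once with a contiguous view of the data.
    template <typename Func>
    auto with_linearized(Func&& func) const {
        std::string buf;  // owns the copy when linearization is needed
        std::string_view v = [&]() -> std::string_view {
            if (is_linearized()) {
                return fragments.empty() ? std::string_view{} : fragments.front();
            }
            for (auto f : fragments) {
                buf.append(f);
            }
            return buf;
        }();
        return func(v);  // single call site for func
    }
};
```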
Avi Kivity	08ba0935e2	utils: managed_bytes: mark managed_bytes_view::is_linearized() const It's trivially const, mark it so.	2023-05-07 17:16:38 +03:00
Pavel Emelyanov	98b9c205bb	s3/client: Sign requests if configured If the endpoint config specifies the AWS key, secret and region, all S3 requests get signed. The signature should include all the x-amz-... headers and should contain at least three of them. This patch includes the x-amz-date, x-amz-content-sha256 and host headers in the signing list. The content can be left unsigned when sent over HTTPS, and this is what this patch does. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:23:37 +03:00
Pavel Emelyanov	3dd82485f6	s3/client: Add connection factory with DNS resolve and configurable HTTPS Existing seastar factories work on socket_address, but in S3 we have an endpoint name, which is a DNS name in the case of real S3. So this patch creates the http client for S3 with a custom connection factory that does two things. First, it resolves the provided endpoint name into an address. Second, it loads the trust file from the provided file path (or sets system trust if configured that way). Since s3 client creation is currently non-waiting code, the above initialization is spawned in a fiber, and this fiber is waited upon before creating the connection. This code probably deserves to live in seastar, but for now it can land next to utils/s3/client.cc. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:23:19 +03:00
Pavel Emelyanov	3bec5ea2ce	s3/client: Keep server port on config Currently the code temporarily assumes that the endpoint port is 9000. This is what tests' local minio is started with. This patch keeps the port number on endpoint config and makes test get the port number from minio starting code via environment. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:43 +03:00
Pavel Emelyanov	85f06ca556	s3/client: Construct it with config Similar to the previous patch, extend the s3::client constructor to get the endpoint config value next to the endpoint string. For now the configs are likely empty, but they are not used yet either. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:43 +03:00
Pavel Emelyanov	caf9e357c8	s3/client: Construct it with sstring endpoint Currently the client is constructed with a socket_address, which is prepared by the caller from the endpoint string. That's not flexible enough, because the s3 client needs to know the original endpoint string for two reasons. First, it needs to look up the endpoint config for potential AWS creds. Second, it needs this exact value as the Host: header in its http requests. So this patch just relaxes the client constructor to accept the endpoint string and hard-codes the 9000 port. The latter is temporary; this is how the local tests' minio is started, but the next patch will make it configurable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:43 +03:00
Pavel Emelyanov	2f6aa5b52e	code: Introduce conf/object_storage.yaml configuration file In order to access a real S3 bucket, the client should use signed requests over https. Partially this is due to security considerations, and partially it is unavoidable, because multipart uploading is banned for unsigned requests on S3. Also, signed requests over plain http require signing the payload as well, which is a bit troublesome, so it's better to stick to secure https and keep the payload unsigned. To prepare signed requests the code needs to know three things: - aws key - aws secret - aws region name The latter could be derived from the endpoint URL, but it's simpler to configure it explicitly, all the more so because S3 URLs can also omit the region name, and we may want to use such URLs some time. The proposed place to keep this configuration is the object_storage.yaml file with the format endpoints: - name: a.b.c port: 443 aws_key: 12345 aws_secret: abcdefghijklmnop ... When loaded, the map gets into db::config and later will be propagated down to the sstables code (see next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:15 +03:00
Benny Halevy	959a740dac	utils: to_string: get rid of utils::join Use `fmt::format("{}", fmt::join(...))` instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:59:58 +03:00
Benny Halevy	e6bcb1c8df	utils: to_string: get rid of to_string(std::initializer_list) It's unused. Just in case, add a unit test case for using the fmt library to format it (that includes fmt::to_string(std::initializer_list)). Note that the existing to_string implementation used square brackets to enclose the initializer_list but the new, standardized form uses curly braces. This doesn't break anything since to_string(initializer_list) wasn't used. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Benny Halevy	ba883859c7	utils: to_string: get rid of to_string(const Range&) Use fmt::to_string instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Benny Halevy	15c9f0f0df	utils: to_string: generalize range helpers As seen in https://github.com/scylladb/scylladb/issues/13146 the current implementation is not general enough to provide print helpers for all kinds of containers. Modernize the implementation using templates based on std::ranges::range and using fmt::join. Extend the unit test for formatting different types of ranges, boost::transformed ranges, deque. Fixes #13146 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Benny Halevy	45153b58bd	utils: chunked_vector: add std::ranges::range ctor To be used in next patch for constructing chunked_vector from an initializer_list. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Kefu Chai	37f1beade5	s3/client: do not allocate potentially big object on stack when compiling using GCC-13, it warns that: ``` /home/kefu/dev/scylladb/utils/s3/client.cc:224:9: error: stack usage might be 66352 bytes [-Werror=stack-usage=] 224 \| sstring parse_multipart_upload_id(sstring& body) { \| ^~~~~~~~~~~~~~~~~~~~~~~~~ ``` so it turns out that `rapidxml::xml_document<>` could be very large, let's allocate it on heap instead of on the stack to address this issue. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13722	2023-05-01 22:46:18 +03:00
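The shape of the fix, sketched with a hypothetical `big_parser` that stands in for `rapidxml::xml_document<>` (rapidxml embeds a static memory pool in the document object, which is what makes it large). Allocating it with `std::make_unique` keeps the big object out of the function's stack frame, which is what GCC's `-Wstack-usage` check measures.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string_view>

// Hypothetical parser with a large inline memory pool (illustrative only).
struct big_parser {
    std::array<char, 64 * 1024> pool{};
    std::size_t used = 0;

    // Copies as much of `text` as fits into the pool and returns the count.
    std::size_t load(std::string_view text) {
        used = std::min(text.size(), pool.size());
        text.copy(pool.data(), used);
        return used;
    }
};

// A local `big_parser p;` would reserve ~64 KiB in this stack frame;
// make_unique places the object on the heap instead.
std::size_t parse_on_heap(std::string_view text) {
    auto parser = std::make_unique<big_parser>();
    return parser->load(text);
}
```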


1452 Commits