scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 00:50:35 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	3bec5ea2ce	s3/client: Keep server port on config Currently the code temporarily assumes that the endpoint port is 9000. This is what tests' local minio is started with. This patch keeps the port number on endpoint config and makes test get the port number from minio starting code via environment. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:43 +03:00
Pavel Emelyanov	85f06ca556	s3/client: Construct it with config Similar to previous patch -- extend the s3::client constructor to get the endpoint config value next to the endpoint string. For now the configs are likely empty, but they are not used yet either. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:43 +03:00
Pavel Emelyanov	caf9e357c8	s3/client: Construct it with sstring endpoint Currently the client is constructed with a socket_address which is prepared by the caller from the endpoint string. That's not flexible enough, because the s3 client needs to know the original endpoint string for two reasons. First, it needs to look up the endpoint config for potential AWS creds. Second, it needs this exact value as the Host: header in its http requests. So this patch just relaxes the client constructor to accept the endpoint string and hard-codes the 9000 port. The latter is temporary, this is how local tests' minio is started, but next patch will make it configurable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:43 +03:00
Pavel Emelyanov	2f6aa5b52e	code: Introduce conf/object_storage.yaml configuration file In order to access real S3 bucket, the client should use signed requests over https. Partially this is due to security considerations, partially this is unavoidable, because multipart-uploading is banned for unsigned requests on the S3. Also, signed requests over plain http require signing the payload as well, which is a bit troublesome, so it's better to stick to secure https and keep payload unsigned. To prepare signed requests the code needs to know three things: - aws key - aws secret - aws region name The latter could be derived from the endpoint URL, but it's simpler to configure it explicitly, all the more so there's an option to use S3 URLs without region name in them we could want to use some time. To keep the described configuration the proposed place is the object_storage.yaml file with the format endpoints: - name: a.b.c port: 443 aws_key: 12345 aws_secret: abcdefghijklmnop ... When loaded, the map gets into db::config and later will be propagated down to sstables code (see next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:15 +03:00
Benny Halevy	959a740dac	utils: to_string: get rid of utils::join Use `fmt::format("{}", fmt::join(...))` instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:59:58 +03:00
Benny Halevy	e6bcb1c8df	utils: to_string: get rid of to_string(std::initializer_list) It's unused. Just in case, add a unit test case for using the fmt library to format it (that includes fmt::to_string(std::initializer_list)). Note that the existing to_string implementation used square brackets to enclose the initializer_list but the new, standardized form uses curly braces. This doesn't break anything since to_string(initializer_list) wasn't used. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Benny Halevy	ba883859c7	utils: to_string: get rid of to_string(const Range&) Use fmt::to_string instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Benny Halevy	15c9f0f0df	utils: to_string: generalize range helpers As seen in https://github.com/scylladb/scylladb/issues/13146 the current implementation is not general enough to provide print helpers for all kinds of containers. Modernize the implementation using templates based on std::ranges::range and using fmt::join. Extend the unit test to cover formatting different types of ranges, boost::transformed ranges, deque. Fixes #13146 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Benny Halevy	45153b58bd	utils: chunked_vector: add std::ranges::range ctor To be used in next patch for constructing chunked_vector from an initializer_list. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Kefu Chai	37f1beade5	s3/client: do not allocate potentially big object on stack when compiling using GCC-13, it warns that: ``` /home/kefu/dev/scylladb/utils/s3/client.cc:224:9: error: stack usage might be 66352 bytes [-Werror=stack-usage=] 224 | sstring parse_multipart_upload_id(sstring& body) { | ^~~~~~~~~~~~~~~~~~~~~~~~~ ``` so it turns out that `rapidxml::xml_document<>` could be very large, let's allocate it on heap instead of on the stack to address this issue. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13722	2023-05-01 22:46:18 +03:00
Kefu Chai	43e9910fa0	utils/chunked_managed_vector: use operator<=> when appropriate instead of crafting 4 operators manually, just delegate it to <=>. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13698	2023-04-28 15:59:08 +03:00
Kamil Braun	30cc07b40d	Merge 'Introduce tablets' from Tomasz Grabiec This PR introduces an experimental feature called "tablets". Tablets are a way to distribute data in the cluster, which is an alternative to the current vnode-based replication. Vnode-based replication strategy tries to evenly distribute the global token space shared by all tables among nodes and shards. With tablets, the aim is to start from a different side. Divide resources of replica-shard into tablets, with a goal of having a fixed target tablet size, and then assign those tablets to serve fragments of tables (also called tablets). This will allow us to balance the load in a more flexible manner, by moving individual tablets around. Also, unlike with vnode ranges, tablet replicas live on a particular shard on a given node, which will allow us to bind raft groups to tablets. Those goals are not yet achieved with this PR, but it lays the ground for this. Things achieved in this PR: - You can start a cluster and create a keyspace whose tables will use tablet-based replication. This is done by setting `initial_tablets` option: ``` CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3, 'initial_tablets': 8}; ``` All tables created in such a keyspace will be tablet-based. Tablet-based replication is a trait, not a separate replication strategy. Tablets don't change the spirit of replication strategy, it just alters the way in which data ownership is managed. In theory, we could use it for other strategies as well like EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy is augmented to support tablets. 
- You can create and drop tablet-based tables (no DDL language changes) - DML / DQL work with tablet-based tables Replicas for tablet-based tables are chosen from tablet metadata instead of token metadata Things which are not yet implemented: - handling of views, indexes, CDC created on tablet-based tables - sharding is done using the old method, it ignores the shard allocated in tablet metadata - node operations (topology changes, repair, rebuild) are not handling tablet-based tables - not integrated with compaction groups - tablet allocator piggy-backs on tokens to choose replicas. Eventually we want to allocate based on current load, not statically Closes #13387 * github.com:scylladb/scylladb: test: topology: Introduce test_tablets.py raft: Introduce 'raft_server_force_snapshot' error injection locator: network_topology_strategy: Support tablet replication service: Introduce tablet_allocator locator: Introduce tablet_aware_replication_strategy locator: Extract maybe_remove_node_being_replaced() dht: token_metadata: Introduce get_my_id() migration_manager: Send tablet metadata as part of schema pull storage_service: Load tablet metadata when reloading topology state storage_service: Load tablet metadata on boot and from group0 changes db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata() migration_notifier: Introduce before_drop_keyspace() migration_manager: Make prepare_keyspace_drop_announcement() return a future<> test: perf: Introduce perf-tablets test: Introduce tablets_test test: lib: Do not override table id in create_table() utils, tablets: Introduce external_memory_usage() db: tablets: Add printers db: tablets: Add persistence layer dht: Use last_token_of_compaction_group() in split_token_range_msb() locator: Introduce tablet_metadata dht: Introduce first_token() dht: Introduce next_token() storage_proxy: Improve trace-level logging locator: token_metadata: Fix confusing comment on ring_range() 
dht, storage_proxy: Abstract token space splitting Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries" db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms() db: Introduce get_non_local_vnode_based_strategy_keyspaces() service: storage_proxy: Avoid copying keyspace name in write handler locator: Introduce per-table replication strategy treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type locator: Introduce effective_replication_map locator: Rename effective_replication_map to vnode_effective_replication_map locator: effective_replication_map: Abstract get_pending_endpoints() db: Propagate feature_service to abstract_replication_strategy::validate_options() db: config: Introduce experimental "TABLETS" feature db: Log replication strategy for debugging purposes db: Log full exception on error in do_parse_schema_tables() db: keyspace: Remove non-const replication strategy getter config: Reformat	2023-04-27 09:40:18 +02:00
Kefu Chai	f5b05cf981	treewide: use defaulted operator!=() and operator==() in C++20, the compiler generates operator!=() if the corresponding operator==() is already defined, as the language now understands that the comparison is symmetric. fortunately, our operator!=() is always equivalent to `! operator==()`, which matches the behavior of the default generated operator!=(). so, in this change, all `operator!=` are removed. in addition to the defaulted operator!=, C++20 also brings us the defaulted operator==() -- the compiler is able to generate operator==() as a member-wise lexicographical comparison. under some circumstances, this is exactly what we need. so, in this change, if the operator==() is also implemented as a lexicographical comparison of all member variables of the class/struct in question, it is implemented using the default generated one by removing its body and marking the function as `default`. moreover, if the class happens to have other comparison operators which are implemented using lexicographical comparison, the default generated `operator<=>` is used in place of the defaulted `operator==`. sometimes, we fail to mark the operator== with the `const` specifier; in this change, to fulfill the requirements of the C++ standard, and to be more correct, the `const` specifier is added. also, to generate the defaulted operator==, the operand should be `const class_name&`, but that is not always the case: in the `version` class, we use `version` as the parameter type, so to fulfill the requirements of the C++ standard, the parameter type is changed to `const version&` instead. this does not change the semantics of the comparison operator, and is a more idiomatic way to pass a non-trivial struct as a function parameter. please note, because in C++20 both operator== and operator<=> are symmetric, some of the operators in `multiprecision` are removed. they are the symmetric form of another variant. 
if they were not removed, the compiler would, for instance, find an ambiguous overloaded operator '=='. this change is a cleanup to modernize the code base with C++20 features. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13687	2023-04-27 10:24:46 +03:00
Botond Dénes	3e92bcaa20	Merge 'utils: redesign reusable_buffer' from Michał Chojnowski Common compression libraries work on contiguous buffers. Contiguous buffers are a problem for the allocator. However, as long as they are short-lived, we can avoid the expensive allocations by reusing buffers across tasks. This idea is already applied to the compression of CQL frames, but with some deficiencies. `utils: redesign reusable_buffer` attempts to improve upon it in a few ways. See its commit message for an extended discussion. Compression buffer reuse also happens in the zstd SSTable compressor, but the implementation is misguided. Every `zstd_processor` instance reuses a buffer, but each instance has its own buffer. This is very bad, because a healthy database might have thousands of concurrent instances (because there is one for each sstable reader). Together, the buffers might require gigabytes of memory, and the reuse actually increases memory pressure significantly, instead of reducing it. `zstd: share buffers between compressor instances` aims to improve that by letting a single buffer be shared across all instances on a shard. Closes #13324 * github.com:scylladb/scylladb: zstd: share buffers between compressor instances utils: redesign reusable_buffer	2023-04-27 09:09:09 +03:00
Michał Chojnowski	bf26a8c467	utils: redesign reusable_buffer Large contiguous buffers put large pressure on the allocator and are a common source of reactor stalls. Therefore, Scylla avoids their use, replacing it with fragmented buffers whenever possible. However, the use of large contiguous buffers is impossible to avoid when dealing with some external libraries (e.g. some compression libraries, like LZ4). Fortunately, calls to external libraries are synchronous, so we can minimize the allocator impact by reusing a single buffer between calls. An implementation of such a reusable buffer has two conflicting goals: to allocate as rarely as possible, and to waste as little memory as possible. The bigger the buffer, the more likely that it will be able to handle future requests without reallocation, but also the more memory it ties up. If request sizes are repetitive, the near-optimal solution is to simply resize the buffer up to match the biggest seen request, and never resize down. However, if we anticipate pathologically large requests, which are caused by an application/configuration bug and are never repeated again after they are fixed, we might want to resize down after such pathological requests stop, so that the memory they took isn't tied up forever. The current implementation of reusable buffers handles this by resizing down to 0 every 100'000 requests. This patch attempts to solve a few shortcomings of the current implementation. 1. Resizing to 0 is too aggressive. During regular operation, we will surely need to resize it back to the previous size again. If something is allocated in the hole left by the old buffer, this might cause a stall. We prefer to resize down only after pathological requests. 2. When resizing, the current implementation allocates the new buffer before freeing the old one. This increases allocator pressure for no reason. 3. When resizing up, the buffer is resized to exactly the requested size. 
That is, if the current size is 1MiB, following requests of 1MiB+1B and 1MiB+2B will both cause a resize. It's preferable to limit the set of possible sizes so that every reset doesn't tend to cause multiple resizes of almost the same size. The natural set of sizes is powers of 2, because that's what the underlying buddy allocator uses. No waste is caused by rounding up the allocation to a power of 2. 4. The interval of 100'000 uses is both too low and too arbitrary. This is up for discussion, but I think that it's preferable to base the dynamics of the buffer on time, rather than the number of uses. It's more predictable to humans. The implementation proposed in this patch addresses these as follows: 1. Instead of resizing down to 0, we resize to the biggest size seen in the last period. As long as at least one maximal (up to a power of 2) "normal" request appears each period, the buffer will never have to be resized. 2. The capacity of the buffer is always rounded up to the nearest power of 2. 3. The resize down period is no longer measured in number of requests but in real time. Additionally, since a shared buffer in asynchronous code is quite a footgun, some rudimentary refcounting is added to assert that only one reference to the buffer exists at a time, and that the buffer isn't downsized while a reference to it exists. Fixes #13437	2023-04-26 22:09:17 +02:00
Botond Dénes	8765442f3f	Merge 'utils: add basic_xx_hasher' from Benny Halevy Consolidate `bytes_view_hasher` and abstract_replication_strategy `factory_key_hasher`, which are the same, into a reusable utils::basic_xx_hasher. To be used in a followup series for netw::msg_addr. Closes #13530 * github.com:scylladb/scylladb: utils: hashing: use simple_xx_hasher utils: hashing: add simple_xx_hasher utils: hashers: add HasherReturning concept hashing: move static_assert to source file	2023-04-25 09:53:47 +02:00
Pavel Emelyanov	9a9dbffce3	s3/client: Zeroify stat by default The s3::readable_file::stat() call returns a hand-crafted stat structure with some fields set to some sane values, most are constants. However, other fields remain not initialized which leads to troubles sometimes. Better to fill the stat with zeroes and later revisit it for more sane values. fixes: #13645 refs: #13649 Using designated initializers is not an option here, see PR #13499 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13650	2023-04-25 09:53:47 +02:00
Benny Halevy	f4fefec343	utils: hashing: add simple_xx_hasher And a respective unit test. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-24 14:06:43 +03:00
Benny Halevy	b638dddf1b	utils: hashers: add HasherReturning concept And a more specific HasherReturningBytes for hashers that return bytes in finalize(). HasherReturning will be used by the following patch also for simple hashers that return size_t from finalize(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-24 14:06:40 +03:00
Benny Halevy	a765472b8b	hashing: move static_assert to source file No need to check it inline in the header. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-24 12:23:03 +03:00
Tomasz Grabiec	5a24984147	utils, tablets: Introduce external_memory_usage()	2023-04-24 10:49:37 +02:00
Botond Dénes	864d27f9af	Merge 'clear_gently: handle null unique_ptr and optional values' from Benny Halevy This series adds handling of null std::unique_ptr to utils::clear_gently and handling of std::optional and seastar::optimized_optional (both engaged and disengaged cases). Also, unit tests were added to test the above cases. Fixes #13636 Closes #13638 * github.com:scylladb/scylladb: utils: clear_gently: add variants for optional values utils: clear_gently: do not clear null unique_ptr	2023-04-24 10:27:32 +03:00
Benny Halevy	002865018f	utils: clear_gently: add variants for optional values Implement clear_gently for std::optional<T> and seastar::optimized_optional<T> and respective unit tests. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 21:34:02 +03:00
Benny Halevy	12877ad026	utils: clear_gently: do not clear null unique_ptr Otherwise the null pointer is dereferenced. Add a unit test reproducing the issue and testing this fix. Fixes #13636 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 21:33:11 +03:00
Benny Halevy	d1817e9e1b	utils: move generation-number to gms Although get_generation_number implementation is completely generic, it is used exclusively to seed the gossip generation number. Following patches will define a strong gms::generation_id type and this function should return it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Benny Halevy	f5f566bdd8	utils: add tagged_integer A generic template for defining strongly typed integer types. Use it here to replace raft::internal::tagged_uint64. Will be used for defining gms generation and version as strong and distinguishable types in following patches. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Kefu Chai	a2aa133822	treewide: use std::lexicographical_compare_three_way the standard library offers `std::lexicographical_compare_three_way()`, and we never use the last two additional parameters which are not provided by `std::lexicographical_compare_three_way()`. there is no need to have the homebrew version of the trichotomic compare function. in this change, * all occurrences of `lexicographical_tri_compare()` are replaced with `std::lexicographical_compare_three_way()`. * `lexicographical_tri_compare()` is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13615	2023-04-21 14:28:18 +03:00
Botond Dénes	10c1f1dc80	Merge 'db: system_keyspace: use microsecond resolution for group0_history range tombstone' from Kamil Braun In `make_group0_history_state_id_mutation`, when adding a new entry to the group 0 history table, if the parameter `gc_older_than` is engaged, we create a range tombstone in the mutation which deletes entries older than the new one by `gc_older_than`. In particular if `gc_older_than = 0`, we want to delete all older entries. There was a subtle bug there: we were using millisecond resolution when generating the tombstone, while the provided state IDs used microsecond resolution. On a super fast machine it could happen that we managed to perform two schema changes in a single millisecond; this happened sometimes in `group0_test.test_group0_history_clearing_old_entries` on our new CI/promotion machines, causing the test to fail because the tombstone didn't clear the entry corresponding to the previous schema change when performing the next schema change (since they happened in the same millisecond). Use microsecond resolution to fix that. The consecutive state IDs used in group 0 mutations are guaranteed to be strictly monotonic at microsecond resolution (see `generate_group0_state_id` in service/raft/raft_group0_client.cc). Fixes #13594 Closes #13604 * github.com:scylladb/scylladb: db: system_keyspace: use microsecond resolution for group0_history range tombstone utils: UUID_gen: accept decimicroseconds in min_time_UUID	2023-04-21 14:08:56 +03:00
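The resolution mismatch can be illustrated with plain `std::chrono`. This is a simplified model, not the actual mutation code: `tombstone_covers` is a hypothetical helper that treats the tombstone bound as the newer timestamp truncated to a given resolution, and says an older entry is cleared only if it is strictly before that bound.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical model of the bug: the bound is `newer` truncated to Resolution.
template <typename Resolution>
bool tombstone_covers(std::chrono::microseconds older, std::chrono::microseconds newer) {
    auto bound = std::chrono::duration_cast<Resolution>(newer); // truncates toward zero
    return older < bound;
}
```

For two changes made within the same millisecond, the millisecond-resolution bound falls before the older entry and misses it, while the microsecond-resolution bound covers it.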
Kamil Braun	218a056825	utils: UUID_gen: accept decimicroseconds in min_time_UUID The function now accepts higher-resolution duration types, such as microsecond resolution timestamps. Will be used by the next commit.	2023-04-21 10:33:02 +02:00
Pavel Emelyanov	30b6f34a0b	s3/client: Explicitly set _upload_id empty when completing The upload_sink::_upload_id remains empty until upload starts, remains non-empty while it proceeds, then becomes empty again after it completes. The upload_started() method checks that, and on .close() a started upload is aborted. The final switch to empty is done by std::move()ing the upload id into the completion request, but it's better to use std::exchange() to emphasize the fact that the _upload_id becomes empty at that point for a reason. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13570	2023-04-20 17:32:08 +03:00
Nadav Har'El	5b792dde68	Merge 'Extend aws_sigv4 code to suit S3 client needs' from Pavel Emelyanov The AWS signature-generating code was moved from alternator some time ago as is. Now it's clear in which places it should be extended to work for the S3 client as well. The enhancements are - Support UNSIGNED-PAYLOAD to omit calculating checksums for request body - Include full URL path into the signature, not just hard-coded "/" string - Don't check datestamp expiration if not asked for This is a part of #13493 Closes #13535 * github.com:scylladb/scylladb: utils/aws: Brush up the aws_sigv4.hh header utils/aws: Export timepoint formatter utils/aws: Omit datestamp expiration checks when not needed utils/aws: Add canonical-uri argument utils/aws: Support unsigned-payload signatures	2023-04-18 16:33:52 +03:00
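A small sketch of why `std::exchange()` states the intent more clearly than `std::move()` here. The struct below is a hypothetical stand-in for the upload sink that models only the id handling, and it uses `std::string` where the real code presumably uses `sstring`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the upload sink; only the id handling is modelled.
struct upload_sink_sketch {
    std::string _upload_id;

    bool upload_started() const { return !_upload_id.empty(); }

    // Hands the id over to the completion request and leaves _upload_id
    // explicitly empty. A moved-from std::string is only guaranteed to be
    // in a valid but unspecified state, so std::exchange makes the reset visible.
    std::string take_upload_id() {
        return std::exchange(_upload_id, std::string());
    }
};
```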
Avi Kivity	7724223134	Merge 'utils: big_decimal: optimize big_decimal::compare() and use <=> operator' from Kefu Chai in this series, we use <=> operator to replace `big_decimal::compare()` for better readability. also, we trade the chained ternary expression with a more verbose if-else statement for better performance and readability. Closes #13478 * github.com:scylladb/scylladb: utils: big_decimal: replace compare() with <=> operator utils: big_decimal: optimize big_decimal::compare()	2023-04-17 14:33:53 +03:00
Pavel Emelyanov	d09d6adbf4	utils/aws: Brush up the aws_sigv4.hh header Add lost pragma-once directive. Remove the hashers.hh inclusion. It was carried in when the whole code was detached from alternator (`f5de0582c8`), but this header is not needed in the header, only in the .cc file which uses sha256_hasher. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:16:45 +03:00
Pavel Emelyanov	792490e095	utils/aws: Export timepoint formatter The format of timestamp for AWS requests is defined in documentation, there's already the code that prepares it in this form. This patch exports this method so that S3 client could use it in next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:14:45 +03:00
Pavel Emelyanov	706b60a0b0	utils/aws: Omit datestamp expiration checks when not needed The signing code is used in two ways -- by alternator to verify the arrived signed request and by S3 client to prepare the signed request. In the former case date expiration check is performed, but for the latter this is not required, because date stamp is most likely now (or close to it). So this patch makes the orig_datestamp argument optional, meaning that expiration checks can be omitted. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:14:45 +03:00
Pavel Emelyanov	c5ccef078a	utils/aws: Add canonical-uri argument Current signing code hard-codes the "/" as the URL, likely this just works for alternator. For S3 client the URL would include bucket and object name and should thus become the argument, not constant. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:14:45 +03:00
Pavel Emelyanov	8eabe9c4ef	utils/aws: Support unsigned-payload signatures For S3, signing the whole request payload can be too resource consuming. Fortunately, payload signing is only enforced if used with plain http, but with real S3 we're going to use signed requests over https only (see next patch why). That said, the patch turns body-content into an optional reference (i.e. a pointer) so that the signing code can inject the UNSIGNED-PAYLOAD mark instead of the payload signature and omit heavy payload signing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:14:45 +03:00
Pavel Emelyanov	7c7a3416c5	s3/client: Add comments about multipart upload completion message The message length is pre-calculated in advance to provide correct content-length request header. This math is not obvious and deserves a comment. Also, the final message preparation code is also implicitly checking if any part failed to upload. There's a comment in the upload_sink's upload_part() method about it, but the finalization place deserves one too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:08:34 +03:00
Pavel Emelyanov	3f86bed600	s3/client: Fix succeeded/failed part upload final checking When all parts upload complete the final message is prepared and sent out to the server. The preparation code is also responsible for checking if all parts uploaded OK by checking the part etag to be non-empty. In that check a misprint crept in -- the whole list is checked to be empty, not the individual etag itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:08:15 +03:00
Pavel Emelyanov	79379760e6	s3/client: Fix parts to start from 1 Docs say, that part numbers should start from 1, while the code follows the tradition and starts from 0. Minio is conveniently incompatible in this sense so test had been passing so far. On real S3 part number 0 ends up with failed request. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 10:43:12 +03:00
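The two fixes above (checking each part's etag rather than the whole list, and numbering parts from 1) can be sketched together. `collect_parts` is a hypothetical helper written for this illustration and does not mirror the real upload_sink code; it assumes, as the commit messages describe, that an empty etag marks a part that failed to upload.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds the (part number, etag) list for the completion
// message, or returns nullopt if any part failed to upload.
std::optional<std::vector<std::pair<unsigned, std::string>>>
collect_parts(const std::vector<std::string>& part_etags) {
    std::vector<std::pair<unsigned, std::string>> parts;
    unsigned part_number = 1;      // S3 part numbers start from 1, not 0
    for (const auto& etag : part_etags) {
        if (etag.empty()) {        // check each etag, not part_etags.empty()
            return std::nullopt;
        }
        parts.emplace_back(part_number++, etag);
    }
    return parts;
}
```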
Kefu Chai	6bb32efac0	utils: big_decimal: replace compare() with <=> operator now that we are using C++20, it'd be more convenient if we can use the <=> operator for comparing. the compiler synthesizes the relational operators (<, <=, > and >=) for us if the <=> operator is defined, so the code is more compact. in this change, `big_decimal::compare()` is replaced with `operator<=>`, and its caller is updated accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-15 12:52:30 +08:00
Kefu Chai	e991e6087e	utils: big_decimal: optimize big_decimal::compare() before this change, in the worst case, the underlying `number::compare()` gets called twice, as it is used by Boost::multiprecision to implement the comparison operators of `number`. but since we can have the result in one go, there is no need to perform the comparison multiple times. so, in this change, we just call `number::compare()` explicitly, and use it to implement `compare()`. this should save a call of `number::compare()`. also, the chained ternary expression is replaced with an if-else statement for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-15 12:52:30 +08:00
Pavel Emelyanov	b1501d4261	s3/client: Don't use designated initialization of sys stat struct It makes the compiler complain about mis-ordered initialization of st_nlink vs st_mode on different arches. Current code (st_nlink before st_mode) compiled fine on x86, but fails on ARM which wants st_mode to come before st_nlink. Changing the order would, apparently, break the x86 build with a similar message. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13499	2023-04-13 15:13:56 +03:00
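A portable alternative to designated initializers is to zero the structure and then assign fields one by one, which also matches the later "Zeroify stat by default" change. The helper below is a hypothetical sketch for POSIX systems; the field values are illustrative and not taken from the S3 client.

```cpp
#include <sys/stat.h>
#include <cassert>
#include <cstdint>

// Hypothetical helper; the chosen field values are illustrative only.
inline struct stat make_stat_sketch(std::uint64_t size) {
    struct stat st{};             // value-initialization zeroes every field
    st.st_mode = S_IFREG | 0644;  // plain assignments do not depend on member order
    st.st_nlink = 1;
    st.st_size = static_cast<off_t>(size);
    return st;
}
```

Because the fields are assigned by name after zeroing, the code does not care whether `st_nlink` is declared before or after `st_mode` on a given architecture.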
Botond Dénes	0c51f72ad6	Merge 'utils, mutation: replace operator<<(..) with fmt formatter' from Kefu Chai this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `tombstone` and `shadowable_tombstone` without the help of fmt::ostream. and their `operator<<(ostream,..)` are dropped, as there are no users of them anymore. Refs #13245 Closes #13474 * github.com:scylladb/scylladb: mutation: specialize fmt::formatter<tombstone> and fmt::formatter<shadowable_tombstone> utils: specialize fmt::formatter<optional<>>	2023-04-12 09:32:56 +03:00
Kefu Chai	ff202723c6	utils: big_decimal: specialize fmt::formatter<big_decimal> this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `big_decimal` without the help of `operator<<`. this operator is droppe in this change, as all its callers are now using fmtlib for formatting now. we might need to use fmtlib to implement `big_decimal::to_string()`, and use `fmt::to_string()` instead, but let's leave it for a follow-up change. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13479	2023-04-12 09:20:50 +03:00
Kefu Chai	c980bd54ad	utils: specialize fmt::formatter<optional<>> this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `optional<T>` without the help of `operator<<()`. this change also enables us to ditch more `operator<<()`s in future. as we are relying on `operator<<(ostream&, const optional<T>&)` for printing instances of `optional<T>`, and `operator<<(ostream&, const optional<T>&)` in turn uses the `operator<<(ostream&, const T&)`. so, the new specialization of `fmt::formatter<optional<>>` will remove yet another caller of these operators. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-12 10:57:03 +08:00
Kefu Chai	59579d5876	utils: fragment_range: specialize fmt::formatter<FragmentedView> this is part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print classes that fulfill the requirements of the `FragmentedView` concept without the help of the template function `to_hex()`. this function is dropped in this change, as all its callers now use fmtlib for formatting. the helper `fragment_to_hex()` is dropped as well, since its only caller was `to_hex()`. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13471	2023-04-11 16:09:38 +03:00
Botond Dénes	05b381bfa2	Merge 'Simple S3 storage for sstables' from Pavel Emelyanov The PR adds an sstables storage backend that keeps all component files as S3 objects, and a system.sstables_registry ownership table that keeps track of which sstable objects belong to the local node and what their names are. When a keyspace is configured with 'STORAGE = { 'type': 'S3' }' the respective class table object eventually gets the storage_options instance pointing to the target S3 endpoint and bucket. All the sstables created for that table attach the S3 storage implementation that maintains components' files as S3 objects. Writing to and reading from components is handled by the S3 client facilities from utils/. Changing the sstable state, that is, moving between normal, staging and quarantine states, is not yet implemented, but would eventually happen by updating entries in the sstables registry. To keep track of which node owns which objects, to provide bucket-wide uniqueness of object names and to maintain sstable state, the storage driver keeps records in the system.sstables_registry ownership table. The table maps sstable location and generation to the object format, version, status-state (*) and (!) unique identifier (some time soon this identifier is supposed to be replaced with UUID sstables generations). The component object name is thus s3://bucket/uuid/component_basename. The registry is also used on boot. The distributed loader picks up sstables from all the tables found in schema and for S3-backed keyspaces it lists entries in the registry to a) identify those and b) get their unique S3-side identifiers to open by name. (*) About sstable's status and state. The state field is the part of today's sstable path on disk -- staging, quarantine, normal (root table data dir), etc. Since S3 doesn't have a renaming facility, moving an sstable between those states is only possible by updating the entry in the registry.
This is not yet implemented in this set (#13017).

The status field tracks the sstable's transition through its creation and deletion. It starts with the 'creating' status, which corresponds to today's TemporaryTOC file. After being created and written to, the sstable moves into the 'sealed' status, which corresponds to today's normal sstable with the TOC file. To delete an sstable atomically it first moves into the 'removing' status, which is equivalent to being in the deletion-log for the on-disk sstable. Once removed from the bucket, the entry is removed from the registry.

To play with:

1. Start minio (installed by install-dependencies.sh)
```
export MINIO_ROOT_USER=${root_user}
export MINIO_ROOT_PASSWORD=${root_pass}
mkdir -p ${root_directory}
minio server ${root_directory}
```

2. Configure minio CLI, create anonymous bucket
```
mc config host rm local
mc config host add local http://127.0.0.1:9000 ${root_user} ${root_pass}
mc mb local/sstables
mc anonymous set public local/sstables
```

3. Start Scylla with object-storage feature enabled
```
scylla ... --experimental-features=keyspace-storage-options --workdir ${as_usual}
```

4. Create KS with S3 storage
```
create keyspace ... storage = { 'type': 'S3', 'endpoint': '127.0.0.1:9000', 'bucket': 'sstables' };
```

The S3 client has a logger named "s3"; it's useful to turn it on with `trace` verbosity.
Closes #12523 * github.com:scylladb/scylladb: test: Add object-storage test distributed_loader: Print storage type when populating sstable_directory: Add ownership table components lister sstable_directory: Make components_lister and API sstable_directory: Create components lister based on storage options sstables: Add S3 storage implementation system_keyspace: Add ownership table system_keyspace: Plug to user sstables manager too sstable: Make storage instance based on storage options sstable_directory: Keep storage_options aboard sstable: Virtualize the helper that gets on-disk stats for sstable sstable, storage: Virtualize data sink making for small components sstable, storage: Virtualize data sink making for Data and Index sstable/writer: Shuffle writer::init_file_writers() sstable: Make storage an API utils: Add S3 readable file impl for random reads utils: Add S3 data sink for multipart upload utils: Add S3 client with basic ops cql-pytest: Add option to run scylla over stable directory test.py: Equip it with minio server sstables: Detach write_toc() helper	2023-04-11 08:17:25 +03:00
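The status lifecycle and object naming described in the merge message above can be sketched roughly as follows. This is an illustrative Python model only, not the Scylla implementation (which is C++): the `Registry` class, its method names, and the `object_name` helper are hypothetical, and only the three statuses ('creating', 'sealed', 'removing') and the `s3://bucket/uuid/component_basename` layout come from the message itself.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Entry:
    status: str      # 'creating', 'sealed' or 'removing', as named in the commit message
    object_id: str   # unique identifier used in the S3 object name


class Registry:
    """Hypothetical in-memory stand-in for system.sstables_registry."""

    def __init__(self):
        self._rows = {}  # (location, generation) -> Entry

    def create(self, location, generation):
        # New sstables start as 'creating' (the analogue of TemporaryTOC).
        entry = Entry("creating", str(uuid.uuid4()))
        self._rows[(location, generation)] = entry
        return entry

    def seal(self, location, generation):
        # Fully written sstables become 'sealed' (the analogue of the TOC file).
        self._move(location, generation, "creating", "sealed")

    def start_removal(self, location, generation):
        # 'removing' plays the role of the on-disk deletion log.
        self._move(location, generation, "sealed", "removing")

    def forget(self, location, generation):
        # The entry is dropped only after the objects are gone from the bucket.
        entry = self._rows[(location, generation)]
        if entry.status != "removing":
            raise ValueError(f"cannot forget an sstable in status {entry.status!r}")
        del self._rows[(location, generation)]

    def status(self, location, generation):
        return self._rows[(location, generation)].status

    def _move(self, location, generation, expected, new):
        entry = self._rows[(location, generation)]
        if entry.status != expected:
            raise ValueError(f"expected {expected!r}, found {entry.status!r}")
        entry.status = new


def object_name(bucket, object_id, component_basename):
    # Layout taken from the commit message: s3://bucket/uuid/component_basename
    return f"s3://{bucket}/{object_id}/{component_basename}"
```

Because the status is just a field in a registry row, each transition is a single row update and needs no S3-side rename, which matches the reasoning given for the state field.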
Pavel Emelyanov	033fa107f8	utils: Add S3 readable file impl for random reads Sometimes an sstable is used for random read, sometimes -- for streamed read using the input stream. For both cases the file API can be provided, because S3 API allows random reads of arbitrary lengths. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	a4a64149a6	utils: Add S3 data sink for multipart upload Putting a large object into S3 using plain PUT is a bad choice -- one needs to collect the whole object in memory, then send it as a content-length request with plain body. Multipart upload puts less stress on memory, but it has its own limitation -- each part should be at least 5MB in size. For that reason using the file API doesn't work -- the file IO API operates with external memory buffers and the file impl would only have raw pointers to them. In order to collect a 5MB chunk in RAM the impl would have to copy the memory, which is not good. Unlike the file API, the data_sink API is more flexible, as it has temporary buffers at hand and can cache them in a zero-copy manner. Having said that, the S3 data_sink implementation is like this: * put(buffer): move the buffer into local cache, once the local cache grows above 5MB send out the part * flush: send out whatever is in cache, then send upload completion request * close: check that the upload finished (in flush), abort the upload otherwise User of the API may (actually should) wrap the sink with output_stream and use it as any other output_stream. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
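The put/flush/close contract listed in the commit message above can be sketched as a small buffering class. This is a minimal Python model under stated assumptions, not the Seastar `data_sink` or the actual S3 client: the `MultipartSink` name and the `send_part`, `complete` and `abort` callbacks are hypothetical placeholders for the real HTTP requests, and the threshold check (`>=`) is an assumption about what "grows above 5MB" means.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # minimum part size mentioned in the commit message


class MultipartSink:
    """Hypothetical stand-in for the S3 data sink; callbacks replace real requests."""

    def __init__(self, send_part, complete, abort, part_size=MIN_PART_SIZE):
        self._send_part = send_part   # called with the list of cached buffers
        self._complete = complete     # called once to finish the upload
        self._abort = abort           # called if the upload was never finished
        self._part_size = part_size
        self._bufs = []               # buffers are kept by reference, not copied
        self._cached = 0
        self._finished = False

    def put(self, buf):
        # Take ownership of the buffer; send a part once enough bytes are cached.
        self._bufs.append(buf)
        self._cached += len(buf)
        if self._cached >= self._part_size:
            self._send_cached()

    def flush(self):
        # Send whatever is left (the last part may be small), then complete.
        self._send_cached()
        self._complete()
        self._finished = True

    def close(self):
        # An upload that was never flushed must not be left dangling.
        if not self._finished:
            self._abort()

    def _send_cached(self):
        if self._bufs:
            self._send_part(self._bufs)
            self._bufs = []
            self._cached = 0
```

Keeping a list of buffers rather than concatenating them mirrors the zero-copy argument in the message: the sink only holds references until a part is large enough to send.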


1412 Commits