scylladb

Author	SHA1	Message	Date
Benny Halevy	a70b53b6e7	utils: tagged_integer: implement std::numeric_limits::{min,max} Add a respective unit test. It turns out that numeric_limits defines an implicit implementation for std::numeric_limits<utils::tagged_integer<Tag, ValueType>> which apparently returns a default-constructed tagged_integer for min() and max(), and this broke `gms::heart_beat_state::force_highest_possible_version_unsafe()` since `4cdad8bc8b` (merged in `7f04d8231d`). Implementing min/max correctly. Fixes #13801 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-15 10:19:39 +03:00
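The pitfall described in this commit can be reproduced in isolation. The sketch below is not the ScyllaDB code: `tagged_integer`, `naive_wrapper`, `version_tag` and `version_type` are simplified, hypothetical stand-ins. It relies only on the standard behaviour that the primary `std::numeric_limits<T>` template reports `is_specialized == false` and returns `T()` from `min()` and `max()`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical type with no numeric_limits specialization: the primary
// template silently answers min()/max() with a value-initialized object.
struct naive_wrapper {
    int32_t v = 0;
};

// Hypothetical, simplified stand-in for utils::tagged_integer.
template <typename Tag, typename ValueType>
class tagged_integer {
    ValueType _value{};
public:
    constexpr tagged_integer() = default;
    constexpr explicit tagged_integer(ValueType v) : _value(v) {}
    constexpr ValueType value() const { return _value; }
};

namespace std {
// Partial specialization that forwards min()/max() to the underlying type.
template <typename Tag, typename ValueType>
struct numeric_limits<tagged_integer<Tag, ValueType>> {
    static constexpr bool is_specialized = true;
    static constexpr tagged_integer<Tag, ValueType> min() noexcept {
        return tagged_integer<Tag, ValueType>(numeric_limits<ValueType>::min());
    }
    static constexpr tagged_integer<Tag, ValueType> max() noexcept {
        return tagged_integer<Tag, ValueType>(numeric_limits<ValueType>::max());
    }
};
} // namespace std

struct version_tag {};  // made-up tag for illustration
using version_type = tagged_integer<version_tag, int32_t>;
```

With the specialization in place, `std::numeric_limits<version_type>::max().value()` equals `std::numeric_limits<int32_t>::max()`, whereas `std::numeric_limits<naive_wrapper>::max().v` is 0.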
Benny Halevy	1b5d5205c8	test: add tagged_integer_test Add basic test for tagged_integer arithmetic operations. Remove const qualifier from `tagged_integer::operator[+-]=` as these are add/sub-assign operators that need to modify the value in place. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-14 23:26:58 +03:00
Tomasz Grabiec	a91e83fad6	Merge "issue raft read barrier before pulling schema" from Gleb Schema pull may fail because the pull does not contain everything that is needed to instantiate a schema pointer. For instance it does not contain a keyspace. This series changes the code to issue a raft read barrier before the pull, which will guarantee that the keyspace is created before the actual schema pull is performed.	2023-05-14 14:14:24 +03:00
Avi Kivity	0a78995e2b	Merge 'Share s3 clients between sstables' from Pavel Emelyanov Currently s3::client is created for each sstable::storage. It's later shared between sstable's files and upload sink(s). Also foreign_sstable_open_info can produce a file from a handle making a new standalone client. Coupled with the seastar's http client spawning connections on demand, this makes it impossible to control the number of open connections to object storage server. In order to put some policy on top of that (as well as apply workload prioritization) s3 clients should be collected in one place and then shared by users. Since s3::client uses seastar::http::client under the hood which, in turn, can generate many connections on demand, it's enough to produce a single s3::client per configured endpoint on each shard and then share it between all the sstables, files and sinks. There's one difficulty however, solving which is most of what this PR does. The file handle, that's used to transfer sstable's file across shards, should keep aboard all it needs to re-create a file on another shard. Since there's a single s3::client per shard, creation of a file out of a handle should grab that shard's client somehow. The meaningful shard-local object that can help is the sstables_manager and there are three ways to make use of it. All deal with the fact that sstables_manager-s are not sharded<> services, but are owned by the database independently on each shard. 1. walk the client -> sst.manager -> database -> container -> database -> sst.manager -> client chain by keeping its first half on the handle and unrolling the second half to produce a file 2. keep sharded peering service referenced by the sstables_manager that's initialized in main and passed through the database constructor down to sstables_manager(s) 3. equip file_handle::to_file with the "context" argument and teach sstables foreign info opener to push sstables_manager down to s3 file ... 
somehow This PR chooses the 2nd way and introduces the sstables::storage_manager main-local sharded peering service that maintains all the s3::clients. "While at it" the new manager gets the object_storage_config updating facilities from the database (it's overloaded even without it already). Later the manager will also be in charge of collecting and exporting S3 metrics. In order to limit the number of S3 connections it also needs a patch to seastar http::client; there's a PR already doing that, once (if) merged there'll come one more fix on top. refs: #13458 refs: #13369 refs: scylladb/seastar#1652 Closes #13859 * github.com:scylladb/scylladb: s3: Pick client from manager via handle s3: Generalize s3 file handle s3: Live-update clients' configs sstables: Keep clients shared across sstables storage_manager: Rewrap config map sstables, database: Move object storage config maintenance onto storage_manager sstables: Introduce sharded<storage_manager>	2023-05-14 14:14:23 +03:00
Pavel Emelyanov	613acba5d0	s3: Pick client from manager via handle Add the global-factory onto the client that is - cross-shard copyable - generates a client from local storage_manager by given endpoint With that the s3 file handle is fixed and also picks up shared s3 clients from the storage manager instead of creating its own one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-11 19:39:01 +03:00
Pavel Emelyanov	8ed9716f59	s3: Generalize s3 file handle Currently the s3 file handle tries to carry client's info via explicit host name and endpoint config pointer. This is buggy: the latter pointer is shard-local and cannot be transferred across shards. This patch prepares the fix by abstracting the client handle part. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-11 19:39:01 +03:00
Pavel Emelyanov	63ff6744d8	s3: Live-update clients' configs Now that the client is accessible directly via the storage_manager, when the latter is requested to update its endpoint config, it can kick the client to do the same. The latter, in turn, can only update the AWS creds info for now. The endpoint port and https usage are immutable for now. Also, updating the endpoint address is not possible, but for another reason -- the endpoint itself is part of the keyspace configuration and updating one in the object_storage.yaml will have no effect on it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-11 19:39:01 +03:00
Gleb Natapov	091ec285fe	serialized_action: make serialized_action abortable Add an ability to abort waiting for a result of a specific trigger() invocation.	2023-05-11 16:31:23 +03:00
Nadav Har'El	e57252092c	Merge 'cql3: result_set, selector: change value type to managed_bytes_opt' from Avi Kivity CQL evolved several expression evaluation mechanisms: WHERE clause, selectors (the SELECT clause), and the LWT IF clause are just some examples. Most now use expressions, which use managed_bytes_opt as the underlying value representation, but selectors still use bytes_opt. This poses two problems: 1. bytes_opt generates large contiguous allocations when used with large blobs, impacting latency 2. trying to use expressions with bytes_opt will incur a copy, reducing performance To solve the problem, we harmonize the data types to managed_bytes_opt (#13216 notwithstanding). This is somewhat difficult since the source of the values are views into a bytes_ostream. However, luckily bytes_ostream and managed_bytes_view are mostly compatible so with a little effort this can be done. The series is neutral wrt performance: before: ``` 222118.61 tps ( 61.1 allocs/op, 12.1 tasks/op, 43092 insns/op, 0 errors) 224250.14 tps ( 61.1 allocs/op, 12.1 tasks/op, 43094 insns/op, 0 errors) 224115.66 tps ( 61.1 allocs/op, 12.1 tasks/op, 43092 insns/op, 0 errors) 223508.70 tps ( 61.1 allocs/op, 12.1 tasks/op, 43107 insns/op, 0 errors) 223498.04 tps ( 61.1 allocs/op, 12.1 tasks/op, 43087 insns/op, 0 errors) ``` after: ``` 220708.37 tps ( 61.1 allocs/op, 12.1 tasks/op, 43118 insns/op, 0 errors) 225168.99 tps ( 61.1 allocs/op, 12.1 tasks/op, 43081 insns/op, 0 errors) 222406.00 tps ( 61.1 allocs/op, 12.1 tasks/op, 43088 insns/op, 0 errors) 224608.27 tps ( 61.1 allocs/op, 12.1 tasks/op, 43102 insns/op, 0 errors) 225458.32 tps ( 61.1 allocs/op, 12.1 tasks/op, 43098 insns/op, 0 errors) ``` Though I expect with some more effort we can eliminate some copies. 
Closes #13637 * github.com:scylladb/scylladb: cql3: untyped_result_set: switch to managed_bytes_view as the cell type cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt cql3: untyped_result_set: always own data types: abstract_type: add mixed-type versions of compare() and equal() utils/managed_bytes, serializer: add conversion between buffer_view<bytes_ostream> and managed_bytes_view utils: managed_bytes: add bidirectional conversion between bytes_opt and managed_bytes_opt utils: managed_bytes: add managed_bytes_view::with_linearized() utils: managed_bytes: mark managed_bytes_view::is_linearized() const	2023-05-10 15:01:45 +03:00
Kamil Braun	7d9ab44e81	Merge 'token_metadata: read remapping for write_both_read_new' from Gusev Petr When new nodes are added or existing nodes are deleted, the topology state machine needs to shunt reads from the old nodes to the new ones. This happens in the `write_both_read_new` state. The problem is that previously this state was not handled in any way in `token_metadata` and the read nodes were only changed when the topology state machine reached the final 'owned' state. To handle `write_both_read_new` an additional `interval_map` inside `token_metadata` is maintained similar to `pending_endpoints`. It maps the ranges affected by the ongoing topology change operation to replicas which should be used for reading. When topology state sm reaches the point when it needs to switch reads to a new topology, it passes `request_read_new=true` in a call to `update_pending_ranges`. This forces `update_pending_ranges` to compute the ranges based on new topology and store them to the `interval_map`. On the data plane, when a read on coordinator needs to decide which endpoints to use, it first consults this `interval_map` in `token_metadata`, and only if it doesn't contain a range for current token it uses normal endpoints from `effective_replication_map`. Closes #13376 * github.com:scylladb/scylladb: storage_proxy, storage_service: use new read endpoints storage_proxy: rename get_live_sorted_endpoints->get_endpoints_for_reading token_metadata: add unit test for endpoints_for_reading token_metadata: add endpoints for reading sequenced_set: add extract_set method token_metadata_impl: extract maybe_migration_endpoints helper function token_metadata_impl: introduce migration_info token_metadata_impl: refactor update_pending_ranges token_metadata: add unit tests token_metadata: fix indentation token_metadata_impl: return unique_ptr from clone functions	2023-05-10 10:03:30 +02:00
Avi Kivity	550aa01242	Merge 'Restore raft::internal::tagged_uint64 type' from Benny Halevy Change `f5f566bdd8` introduced tagged_integer and replaced raft::internal::tagged_uint64 with utils::tagged_integer. However, the idl type for raft::internal::tagged_uint64 was not marked as final, but utils::tagged_integer is, breaking the on-the-wire compatibility. This change restores the use of raft::internal::tagged_uint64 for the raft types and adds back an idl definition for it that is not marked as final, similar to the way raft::internal::tagged_id extends utils::tagged_uuid. Fixes #13752 Closes #13774 * github.com:scylladb/scylladb: raft, idl: restore internal::tagged_uint64 type raft: define term_t as a tagged uint64_t idl: gossip_digest: include required headers	2023-05-09 22:51:25 +03:00
Petr Gusev	b2e5d8c21c	sequenced_set: add extract_set method Can be useful if we want to reuse the set when we are done with this sequenced_set instance.	2023-05-09 13:56:38 +04:00
Benny Halevy	adfb79ba3e	raft, idl: restore internal::tagged_uint64 type Change `f5f566bdd8` introduced tagged_integer and replaced raft::internal::tagged_uint64 with utils::tagged_integer. However, the idl type for raft::internal::tagged_uint64 was not marked as final, but utils::tagged_integer is, breaking the on-the-wire compatibility. This change defines the different raft tagged_uint64 types in idl/raft_storage.idl.hh as non-final to restore the way they were serialized prior to `f5f566bdd8` Fixes #13752 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-09 12:38:20 +03:00
Botond Dénes	ab5fd0f750	Merge 's3: Provide timestamps in the s3 file implementation' from Raphael "Raph" Carvalho SSTable relies on st.st_mtime for providing creation time of data file, which in turn is used by features like tombstone compaction. Therefore, let's implement it. Fixes https://github.com/scylladb/scylladb/issues/13649. Closes #13713 * github.com:scylladb/scylladb: s3: Provide timestamps in the s3 file implementation s3: Introduce get_object_stats() s3: introduce get_object_header()	2023-05-08 11:43:41 +03:00
Raphael S. Carvalho	ad471e5846	s3: Provide timestamps in the s3 file implementation SSTable relies on st.st_mtime for providing creation time of data file, which in turn is used by features like tombstone compaction. Fixes #13649. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-07 19:51:12 -03:00
Raphael S. Carvalho	57661f0392	s3: Introduce get_object_stats() get_object_stats() will be used for retrieving content size and also last modified. The latter is required for filling st_mtim, etc, in the s3::client::readable_file::stat() method. Refs #13649. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-07 19:51:10 -03:00
Raphael S. Carvalho	da2ccc44a4	s3: introduce get_object_header() This allows other functions to reuse the code to retrieve the object header. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-07 19:49:52 -03:00
Kefu Chai	5fa459bd1a	treewide: do not include unused header since #13452, we switched most of the caller sites from std::regex to boost::regex. in this change, all occurrences of `#include <regex>` are dropped unless std::regex is used in the same source file. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13765	2023-05-07 19:01:29 +03:00
Kefu Chai	468460718a	utils: UUID: drop uint64_t_tri_compare() functionality-wise, `uint64_t_tri_compare()` is identical to the three-way comparison operator, so no need to keep it. in this change, it is dropped in favor of <=>. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13794	2023-05-07 18:07:49 +03:00
Avi Kivity	11d651b606	utils/managed_bytes, serializer: add conversion between buffer_view<bytes_ostream> and managed_bytes_view The codebase evolved to have several different ways to hold a fragmented buffer: fragmented_temporary_buffer (for data received from the network; not relevant for this discussion); bytes_ostream (for fragmented data that is built incrementally; also used for a serialized result_set), and managed_bytes (used for lsa and serialized individual values in expression evaluation). One problem with this state of affairs is that using data in one fragmented form with functions that accept another fragmented form requires either a copy, or templating everything. The former is unpalatable for fast-path code, and the latter is undesirable for compile time and run-time code footprint. So we'd like to make the various forms compatible. In `53e0dc7530` ("bytes_ostream: base on managed_bytes") we changed bytes_ostream to have the same underlying data structure as managed_bytes, so all that remains is to add the right API. This is somewhat difficult as the data is hidden in multiple layers: ser::buffer_view<> is used to abstract a slice of bytes_ostream, and this is further abstracted by using iterators into bytes_ostream rather than directly using the internals. Likewise, it's impossible to construct a managed_bytes_view from the internals. Hack through all of these by adding extract_implementation() methods, and a build_managed_bytes_view_from_internals() helper. These are all used by new APIs buffer_view_to_managed_bytes_view() that extract the internals and put them back together again. Ideally we wouldn't need any of this, but unifying the type system in this area is quite an undertaking, so we need some shortcuts.	2023-05-07 17:17:34 +03:00
Avi Kivity	613f4b9858	utils: managed_bytes: add bidirectional conversion between bytes_opt and managed_bytes_opt Useful, rather than open-coding the conversions.	2023-05-07 17:16:38 +03:00
Avi Kivity	1e6ef5503c	utils: managed_bytes: add managed_bytes_view::with_linearized() Becomes useful in later patches. To avoid double-compiling the call to func(), use an immediately-invoked lambda to calculate the bytes_view we'll be calling func() with.	2023-05-07 17:16:38 +03:00
Avi Kivity	08ba0935e2	utils: managed_bytes: mark managed_bytes_view::is_linearized() const It's trivially const, mark it so.	2023-05-07 17:16:38 +03:00
Pavel Emelyanov	98b9c205bb	s3/client: Sign requests if configured If the endpoint config specifies AWS key, secret and region, all the S3 requests get signed. Signature should have all the x-amz-... headers included and should contain at least three of them. This patch includes x-amz-date, x-amz-content-sha256 and host headers into the signing list. The content can be unsigned when sent over HTTPS, this is what this patch does. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:23:37 +03:00
Pavel Emelyanov	3dd82485f6	s3/client: Add connection factory with DNS resolve and configurable HTTPS Existing seastar's factories work on socket_address, but in S3 we have an endpoint name, which is a DNS name in the case of real S3. So this patch creates the http client for S3 with a custom connection factory that does two things. First, it resolves the provided endpoint name into an address. Second, it loads the trust-file from the provided file path (or sets system trust if configured that way). Since s3 client creation is no-waiting code currently, the above initialization is spawned in a fiber and before creating the connection this fiber is waited upon. This code probably deserves living in seastar, but for now it can land next to utils/s3/client.cc. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:23:19 +03:00
Pavel Emelyanov	3bec5ea2ce	s3/client: Keep server port on config Currently the code temporarily assumes that the endpoint port is 9000. This is what tests' local minio is started with. This patch keeps the port number on endpoint config and makes test get the port number from minio starting code via environment. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:43 +03:00
Pavel Emelyanov	85f06ca556	s3/client: Construct it with config Similar to previous patch -- extend the s3::client constructor to get the endpoint config value next to the endpoint string. For now the configs are likely empty, but they are yet unused too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:43 +03:00
Pavel Emelyanov	caf9e357c8	s3/client: Construct it with sstring endpoint Currently the client is constructed with a socket_address which is prepared by the caller from the endpoint string. That's not flexible enough, because the s3 client needs to know the original endpoint string for two reasons. First, it needs to look up the endpoint config for potential AWS creds. Second, it needs this exact value as the Host: header in its http requests. So this patch just relaxes the client constructor to accept the endpoint string and hard-codes the 9000 port. The latter is temporary, this is how local tests' minio is started, but next patch will make it configurable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:43 +03:00
Pavel Emelyanov	2f6aa5b52e	code: Introduce conf/object_storage.yaml configuration file In order to access real S3 bucket, the client should use signed requests over https. Partially this is due to security considerations, partially this is unavoidable, because multipart-uploading is banned for unsigned requests on the S3. Also, signed requests over plain http require signing the payload as well, which is a bit troublesome, so it's better to stick to secure https and keep payload unsigned. To prepare signed requests the code needs to know three things: - aws key - aws secret - aws region name The latter could be derived from the endpoint URL, but it's simpler to configure it explicitly, all the more so there's an option to use S3 URLs without region name in them we could want to use some time. To keep the described configuration the proposed place is the object_storage.yaml file with the format endpoints: - name: a.b.c port: 443 aws_key: 12345 aws_secret: abcdefghijklmnop ... When loaded, the map gets into db::config and later will be propagated down to sstables code (see next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:15 +03:00
Benny Halevy	959a740dac	utils: to_string: get rid of utils::join Use `fmt::format("{}", fmt::join(...))` instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:59:58 +03:00
Benny Halevy	e6bcb1c8df	utils: to_string: get rid of to_string(std::initializer_list) It's unused. Just in case, add a unit test case for using the fmt library to format it (that includes fmt::to_string(std::initializer_list)). Note that the existing to_string implementation used square brackets to enclose the initializer_list but the new, standardized form uses curly braces. This doesn't break anything since to_string(initializer_list) wasn't used. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Benny Halevy	ba883859c7	utils: to_string: get rid of to_string(const Range&) Use fmt::to_string instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Benny Halevy	15c9f0f0df	utils: to_string: generalize range helpers As seen in https://github.com/scylladb/scylladb/issues/13146 the current implementation is not general enough to provide print helpers for all kind of containers. Modernize the implementation using templates based on std::ranges::range and using fmt::join. Extend unit test for formatting different types of ranges, boost::transformed ranges, deque. Fixes #13146 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Benny Halevy	45153b58bd	utils: chunked_vector: add std::ranges::range ctor To be used in next patch for constructing chunked_vector from an initializer_list. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Kefu Chai	37f1beade5	s3/client: do not allocate potentially big object on stack when compiling using GCC-13, it warns that: ``` /home/kefu/dev/scylladb/utils/s3/client.cc:224:9: error: stack usage might be 66352 bytes [-Werror=stack-usage=] 224 | sstring parse_multipart_upload_id(sstring& body) { | ^~~~~~~~~~~~~~~~~~~~~~~~~ ``` so it turns out that `rapidxml::xml_document<>` could be very large, let's allocate it on heap instead of on the stack to address this issue. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13722	2023-05-01 22:46:18 +03:00
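The general shape of such a fix, sketched without rapidxml: `big_document` below is a hypothetical stand-in for a parser object that embeds a large memory pool, and `parse_on_heap` is an illustrative function, not the patched `parse_multipart_upload_id()`.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a large object with an embedded memory pool.
struct big_document {
    char pool[64 * 1024];
    std::size_t used = 0;
};

// Declaring `big_document doc;` as a local would put more than 64 KiB on the
// stack frame. Owning it through std::unique_ptr keeps only a pointer there.
std::size_t parse_on_heap() {
    auto doc = std::make_unique<big_document>();  // allocated on the heap
    doc->used = 42;                               // placeholder for real parsing work
    return doc->used;
}
```

The object is still released automatically when `doc` goes out of scope, so the function's cleanup behaviour is unchanged.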
Kefu Chai	43e9910fa0	utils/chunked_managed_vector: use operator<=> when appropriate instead of crafting 4 operators manually, just delegate it to <=>. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13698	2023-04-28 15:59:08 +03:00
Kamil Braun	30cc07b40d	Merge 'Introduce tablets' from Tomasz Grabiec This PR introduces an experimental feature called "tablets". Tablets are a way to distribute data in the cluster, which is an alternative to the current vnode-based replication. Vnode-based replication strategy tries to evenly distribute the global token space shared by all tables among nodes and shards. With tablets, the aim is to start from a different side. Divide resources of replica-shard into tablets, with a goal of having a fixed target tablet size, and then assign those tablets to serve fragments of tables (also called tablets). This will allow us to balance the load in a more flexible manner, by moving individual tablets around. Also, unlike with vnode ranges, tablet replicas live on a particular shard on a given node, which will allow us to bind raft groups to tablets. Those goals are not yet achieved with this PR, but it lays the ground for this. Things achieved in this PR: - You can start a cluster and create a keyspace whose tables will use tablet-based replication. This is done by setting `initial_tablets` option: ``` CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3, 'initial_tablets': 8}; ``` All tables created in such a keyspace will be tablet-based. Tablet-based replication is a trait, not a separate replication strategy. Tablets don't change the spirit of replication strategy, it just alters the way in which data ownership is managed. In theory, we could use it for other strategies as well like EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy is augmented to support tablets. 
- You can create and drop tablet-based tables (no DDL language changes) - DML / DQL work with tablet-based tables Replicas for tablet-based tables are chosen from tablet metadata instead of token metadata Things which are not yet implemented: - handling of views, indexes, CDC created on tablet-based tables - sharding is done using the old method, it ignores the shard allocated in tablet metadata - node operations (topology changes, repair, rebuild) are not handling tablet-based tables - not integrated with compaction groups - tablet allocator piggy-backs on tokens to choose replicas. Eventually we want to allocate based on current load, not statically Closes #13387 * github.com:scylladb/scylladb: test: topology: Introduce test_tablets.py raft: Introduce 'raft_server_force_snapshot' error injection locator: network_topology_strategy: Support tablet replication service: Introduce tablet_allocator locator: Introduce tablet_aware_replication_strategy locator: Extract maybe_remove_node_being_replaced() dht: token_metadata: Introduce get_my_id() migration_manager: Send tablet metadata as part of schema pull storage_service: Load tablet metadata when reloading topology state storage_service: Load tablet metadata on boot and from group0 changes db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata() migration_notifier: Introduce before_drop_keyspace() migration_manager: Make prepare_keyspace_drop_announcement() return a future<> test: perf: Introduce perf-tablets test: Introduce tablets_test test: lib: Do not override table id in create_table() utils, tablets: Introduce external_memory_usage() db: tablets: Add printers db: tablets: Add persistence layer dht: Use last_token_of_compaction_group() in split_token_range_msb() locator: Introduce tablet_metadata dht: Introduce first_token() dht: Introduce next_token() storage_proxy: Improve trace-level logging locator: token_metadata: Fix confusing comment on ring_range() 
dht, storage_proxy: Abstract token space splitting Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries" db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms() db: Introduce get_non_local_vnode_based_strategy_keyspaces() service: storage_proxy: Avoid copying keyspace name in write handler locator: Introduce per-table replication strategy treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type locator: Introduce effective_replication_map locator: Rename effective_replication_map to vnode_effective_replication_map locator: effective_replication_map: Abstract get_pending_endpoints() db: Propagate feature_service to abstract_replication_strategy::validate_options() db: config: Introduce experimental "TABLETS" feature db: Log replication strategy for debugging purposes db: Log full exception on error in do_parse_schema_tables() db: keyspace: Remove non-const replication strategy getter config: Reformat	2023-04-27 09:40:18 +02:00
Kefu Chai	f5b05cf981	treewide: use defaulted operator!=() and operator==() in C++20, the compiler generates operator!=() if the corresponding operator==() is already defined; the language now understands that the comparison is symmetric in the new standard. fortunately, our operator!=() is always equivalent to `! operator==()`, which matches the behavior of the default generated operator!=(). so, in this change, all `operator!=` are removed. in addition to the defaulted operator!=, C++20 also brings us the defaulted operator==() -- the compiler is able to generate the operator==() as the member-wise lexicographical comparison. under some circumstances, this is exactly what we need. so, in this change, if the operator==() is also implemented as a lexicographical comparison of all member variables of the class/struct in question, it is implemented using the default generated one by removing its body and marking the function as `default`. moreover, if the class happens to have other comparison operators which are implemented using lexicographical comparison, the default generated `operator<=>` is used in place of the defaulted `operator==`. sometimes, we fail to mark the operator== with the `const` specifier; in this change, to fulfil the need of the C++ standard, and to be more correct, the `const` specifier is added. also, to generate the defaulted operator==, the operand should be `const class_name&`, but it is not always the case: in the class of `version`, we use `version` as the parameter type; to fulfil the need of the C++ standard, the parameter type is changed to `const version&` instead. this does not change the semantics of the comparison operator, and is a more idiomatic way to pass a non-trivial struct as a function parameter. please note, because in C++20, both operator== and operator<=> are symmetric, some of the operators in `multiprecision` are removed. they are the symmetric form of another variant. 
if they were not removed, compiler would, for instance, find ambiguous overloaded operator '=='. this change is a cleanup to modernize the code base with C++20 features. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13687	2023-04-27 10:24:46 +03:00
Botond Dénes	3e92bcaa20	Merge 'utils: redesign reusable_buffer' from Michał Chojnowski Common compression libraries work on contiguous buffers. Contiguous buffers are a problem for the allocator. However, as long as they are short-lived, we can avoid the expensive allocations by reusing buffers across tasks. This idea is already applied to the compression of CQL frames, but with some deficiencies. `utils: redesign reusable_buffer` attempts to improve upon it in a few ways. See its commit message for an extended discussion. Compression buffer reuse also happens in the zstd SSTable compressor, but the implementation is misguided. Every `zstd_processor` instance reuses a buffer, but each instance has its own buffer. This is very bad, because a healthy database might have thousands of concurrent instances (because there is one for each sstable reader). Together, the buffers might require gigabytes of memory, and the reuse actually increases memory pressure significantly, instead of reducing it. `zstd: share buffers between compressor instances` aims to improve that by letting a single buffer be shared across all instances on a shard. Closes #13324 * github.com:scylladb/scylladb: zstd: share buffers between compressor instances utils: redesign reusable_buffer	2023-04-27 09:09:09 +03:00
Michał Chojnowski	bf26a8c467	utils: redesign reusable_buffer Large contiguous buffers put large pressure on the allocator and are a common source of reactor stalls. Therefore, Scylla avoids their use, replacing it with fragmented buffers whenever possible. However, the use of large contiguous buffers is impossible to avoid when dealing with some external libraries (e.g. some compression libraries, like LZ4). Fortunately, calls to external libraries are synchronous, so we can minimize the allocator impact by reusing a single buffer between calls. An implementation of such a reusable buffer has two conflicting goals: to allocate as rarely as possible, and to waste as little memory as possible. The bigger the buffer, the more likely that it will be able to handle future requests without reallocation, but also the more memory it ties up. If request sizes are repetitive, the near-optimal solution is to simply resize the buffer up to match the biggest seen request, and never resize down. However, if we anticipate pathologically large requests, which are caused by an application/configuration bug and are never repeated again after they are fixed, we might want to resize down after such pathological requests stop, so that the memory they took isn't tied up forever. The current implementation of reusable buffers handles this by resizing down to 0 every 100'000 requests. This patch attempts to solve a few shortcomings of the current implementation. 1. Resizing to 0 is too aggressive. During regular operation, we will surely need to resize it back to the previous size again. If something is allocated in the hole left by the old buffer, this might cause a stall. We prefer to resize down only after pathological requests. 2. When resizing, the current implementation allocates the new buffer before freeing the old one. This increases allocator pressure for no reason. 3. When resizing up, the buffer is resized to exactly the requested size. 
That is, if the current size is 1MiB, following requests of 1MiB+1B and 1MiB+2B will both cause a resize. It's preferable to limit the set of possible sizes so that every reset doesn't tend to cause multiple resizes of almost the same size. The natural set of sizes is powers of 2, because that's what the underlying buddy allocator uses. No waste is caused by rounding up the allocation to a power of 2. 4. The interval of 100'000 uses is both too low and too arbitrary. This is up for discussion, but I think that it's preferable to base the dynamics of the buffer on time, rather than the number of uses. It's more predictable to humans. The implementation proposed in this patch addresses these as follows: 1. Instead of resizing down to 0, we resize to the biggest size seen in the last period. As long as at least one maximal (up to a power of 2) "normal" request appears each period, the buffer will never have to be resized. 2. The capacity of the buffer is always rounded up to the nearest power of 2. 3. The resize down period is no longer measured in number of requests but in real time. Additionally, since a shared buffer in asynchronous code is quite a footgun, some rudimentary refcounting is added to assert that only one reference to the buffer exists at a time, and that the buffer isn't downsized while a reference to it exists. Fixes #13437	2023-04-26 22:09:17 +02:00
Botond Dénes	8765442f3f	Merge 'utils: add basic_xx_hasher' from Benny Halevy Consolidate `bytes_view_hasher` and abstract_replication_strategy `factory_key_hasher` which are the same into a reusable utils::basic_xx_hasher. To be used in a followup series for netw:msg_addr. Closes #13530 * github.com:scylladb/scylladb: utils: hashing: use simple_xx_hasher utils: hashing: add simple_xx_hasher utils: hashers: add HasherReturning concept hashing: move static_assert to source file	2023-04-25 09:53:47 +02:00
Pavel Emelyanov	9a9dbffce3	s3/client: Zeroify stat by default The s3::readable_file::stat() call returns a hand-crafted stat structure with some fields set to some sane values, most are constants. However, other fields remain not initialized which leads to troubles sometimes. Better to fill the stat with zeroes and later revisit it for more sane values. fixes: #13645 refs: #13649 Using designated initializers is not an option here, see PR #13499 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13650	2023-04-25 09:53:47 +02:00
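The idea of zero-filling the structure before setting selected fields can be illustrated with a small POSIX sketch. The function name `make_stat_sketch` and the particular fields it sets are assumptions for illustration only; the commit message does not list which fields `s3::readable_file::stat()` actually fills in.

```cpp
#include <cassert>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical helper: value-initialize the whole struct so that every
// field the code does not set explicitly is zero rather than garbage.
inline struct stat make_stat_sketch(off_t size) {
    struct stat st{};              // all members zeroed
    st.st_mode = S_IFREG | S_IRUSR; // assumed: regular, owner-readable file
    st.st_nlink = 1;                // assumed constant
    st.st_size = size;
    return st;
}
```

Any field not assigned here (for example `st_uid`) then reads as 0 instead of an indeterminate value.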
Benny Halevy	f4fefec343	utils: hashing: add simple_xx_hasher And a respective unit test. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-24 14:06:43 +03:00
Benny Halevy	b638dddf1b	utils: hashers: add HasherReturning concept And a more specific HasherReturningBytes for hashers that return bytes in finalize(). HasherReturning will be used by the following patch also for simple hashers that return size_t from finalize(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-24 14:06:40 +03:00
Benny Halevy	a765472b8b	hashing: move static_assert to source file No need to check it inline in the header. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-24 12:23:03 +03:00
Tomasz Grabiec	5a24984147	utils, tablets: Introduce external_memory_usage()	2023-04-24 10:49:37 +02:00
Botond Dénes	864d27f9af	Merge 'clear_gently: handle null unique_ptr and optional values' from Benny Halevy This series adds handling of null std::unique_ptr to utils::clear_gently and handling of std::optional and seastar::optimized_optional (both engaged and disengaged cases). Also, unit tests were added to test the above cases. Fixes #13636 Closes #13638 * github.com:scylladb/scylladb: utils: clear_gently: add variants for optional values utils: clear_gently: do not clear null unique_ptr	2023-04-24 10:27:32 +03:00
Benny Halevy	002865018f	utils: clear_gently: add variants for optional values Implement clear_gently for std::optional<T> and seastar::optimized_optional<T> and respective unit tests. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 21:34:02 +03:00
Benny Halevy	12877ad026	utils: clear_gently: do not clear null unique_ptr Otherwise the null pointer is dereferenced. Add a unit test reproducing the issue and testing this fix. Fixes #13636 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 21:33:11 +03:00
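The null and disengaged cases handled by this series can be illustrated with a synchronous analogue. The real `utils::clear_gently` is asynchronous and built on Seastar futures; the `clear_sketch` overloads below are hypothetical stand-ins that only demonstrate the guard against dereferencing an empty `std::unique_ptr` or `std::optional`.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Forward declarations so the overloads can call each other in any nesting.
template <typename T> void clear_sketch(std::vector<T>& v);
template <typename T> void clear_sketch(std::unique_ptr<T>& p);
template <typename T> void clear_sketch(std::optional<T>& o);

// Base case: clear a container.
template <typename T>
void clear_sketch(std::vector<T>& v) {
    v.clear();
}

// Only touch the pointee when the pointer is non-null.
template <typename T>
void clear_sketch(std::unique_ptr<T>& p) {
    if (p) {
        clear_sketch(*p);
    }
}

// Only touch the value when the optional is engaged.
template <typename T>
void clear_sketch(std::optional<T>& o) {
    if (o) {
        clear_sketch(*o);
    }
}
```

Without the `if (p)` check, calling the `unique_ptr` overload on a null pointer would dereference it, which is the bug the commit above describes.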
Benny Halevy	d1817e9e1b	utils: move generation-number to gms Although get_generation_number implementation is completely generic, it is used exclusively to seed the gossip generation number. Following patches will define a strong gms::generation_id type and this function should return it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00

1437 Commits