Fixes https://github.com/scylladb/scylladb/issues/14333
This commit replaces the documentation landing page with
the Open Source-only documentation landing page.
This change is required because there is now a separate landing
page for the ScyllaDB documentation, so the page was duplicated,
creating a bad user experience.
Closes #14343
The chunk size used in sstable compression can be set when creating a
table, using the "chunk_length_in_kb" parameter. It can be any power-of-two
multiple of 1 KB. Very large compression chunks are not useful - they
offer diminishing returns on compression ratio, and require very large
memory buffers and reading a very large amount of disk data just to
read a small row. In fact, small chunks are recommended - Scylla
defaults to 4 KB chunks, and Cassandra lowered their default from 64 KB
(in Cassandra 3) to 16 KB (in Cassandra 4).
Therefore, allowing arbitrarily large chunk sizes is just asking for
trouble. Today, a user can ask for a 1 GB chunk size, and crash or hang
Scylla when it runs out of memory. So in this patch we add a hard limit
of 128 KB for the chunk size - anything larger is refused.
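A minimal sketch of the kind of check this implies (hypothetical helper name, standalone code, not the actual ScyllaDB validation path):
```
// Hypothetical sketch: "chunk_length_in_kb" must be a power-of-two multiple of
// 1 KB and, after this patch, must not exceed 128 KB.
#include <stdexcept>
#include <string>

constexpr unsigned max_chunk_length_in_kb = 128;

void validate_chunk_length_in_kb(unsigned kb) {
    // a power of two has exactly one bit set
    bool power_of_two = kb != 0 && (kb & (kb - 1)) == 0;
    if (!power_of_two) {
        throw std::invalid_argument("chunk_length_in_kb must be a power of two");
    }
    if (kb > max_chunk_length_in_kb) {
        throw std::invalid_argument("chunk_length_in_kb must not exceed "
                + std::to_string(max_chunk_length_in_kb) + " KB");
    }
}
```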
Fixes #9933
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #14267
This reverts commit 562087beff.
The regressions introduced by the change that 562087beff reverted have
since been fixed, so let's revert that revert to resurrect the
uuid_sstable_identifier_enabled support.
Fixes #10459
This PR changes the system to respect the tablet-to-shard assignment recorded in tablet metadata (system.tablets):
1. The tablet allocator is changed to distribute tablets evenly across shards, taking into account currently allocated tablets in the system. Each tablet has equal weight; vnode load is ignored.
2. The CDC subsystem was not adjusted (not supported yet).
3. sstable sharding metadata reflects tablet boundaries.
4. Resharding is NOT supported yet (the node will abort on boot if there is a need to reshard tablet-based tables).
5. The system is NOT prepared to handle tablet migration / topology changes in a safe way.
6. sstable cleanup is not wired up properly yet.
After this PR, dht::shard_of() and schema::get_sharder() are deprecated. One should use table::shard_of() and effective_replication_map::get_sharder() instead.
To make life easier, support was added to obtain the table pointer from the schema pointer:
```
schema_ptr s;
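// table() is the table attached to this schema; its shard_of() uses that
// table's sharder, so it is correct for tablet-based tables as well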
s->table().shard_of(...)
```
Closes #13939
* github.com:scylladb/scylladb:
    locator: network_topology_strategy: Allocate shards to tablets
locator: Store node shard count in topology
service: topology: Extract topology updating to a lambda
test: Move test_tablets under topology_experimental
sstables: Add trace-level logging related to shard calculation
schema: Catch incorrect uses of schema::get_sharder()
dht: Rename dht::shard_of() to dht::static_shard_of()
treewide: Replace dht::shard_of() uses with table::shard_of() / erm::shard_of()
storage_proxy: Avoid multishard reader for tablets
storage_proxy: Obtain shard from erm in the read path
db, storage_proxy: Drop mutation/frozen_mutation ::shard_of()
forward_service: Use table sharder
alternator: Use table sharder
db: multishard: Obtain sharder from erm
sstable_directory: Improve trace-level logging
db: table: Introduce shard_of() helper
db: Use table sharder in compaction
sstables: Compute sstable shards using sharder from erm when loading
sstables: Generate sharding metadata using sharder from erm when writing
test: partitioner: Test split_range_to_single_shard() on tablet-like sharder
dht: Make split_range_to_single_shard() prepared for tablet sharder
sstables: Move compute_shards_for_this_sstable() to load()
dht: Take sharder externally in splitting functions
locator: Make sharder accessible through effective_replication_map
dht: sharder: Document guarantees about mapping stability
tablets: Implement tablet sharder
tablets: Include pending replica in get_shard()
dht: sharder: Introduce next_shard()
db: token_ring_table: Filter out tablet-based keyspaces
db: schema: Attach table pointer to schema
schema_registry: Fix SIGSEGV in learn() when concurrent with get_or_load()
schema_registry: Make learn(schema_ptr) attach entry to the target schema
test: lib: cql_test_env: Expose feature_service
test: Extract throttle object to separate header
Fixes #11017
When doing writes, the storage proxy creates types deriving from abstract_write_response_handler.
These are created in the various scheduling groups executing the write-inducing code. They
pick up a group-local reference to the various metrics used by SP. Normally, all code
using (and especially modifying) these metrics is executed in the same scheduling group.
However, if gossip sees a node go down, it will notify listeners, which eventually
call get_ep_stat and register_metrics.
This code (before this patch) uses the _active_ scheduling group to eventually add
metrics, using a local dict as a guard against double registrations. If, as described above,
we're called in a different scheduling group than the original one, this
can cause double registrations.
Fixed here by keeping a reference to the creating scheduling group and using it, not the
active one, when/if creating new metrics.
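A minimal sketch of the idea, assuming Seastar's current_scheduling_group() and with_scheduling_group() helpers (hypothetical struct and method names, not the actual storage_proxy code):
```
// Hypothetical sketch: remember the scheduling group the stats object was created
// in, and register metrics under it rather than under whatever group happens to be
// active (e.g. the one running a gossip "node down" notification).
// Header locations are approximate.
#include <seastar/core/scheduling.hh>
#include <seastar/core/future-util.hh>

struct write_stats_sketch {
    // captured once, in the group that owns these metrics
    seastar::scheduling_group _owner = seastar::current_scheduling_group();

    seastar::future<> register_metrics_for_endpoint() {
        // switch back to the owning group before touching the metric registry,
        // so the "already registered" bookkeeping is consulted consistently
        return seastar::with_scheduling_group(_owner, [] {
            // ... create/register the metrics here ...
        });
    }
};
```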
Closes #14294
Uses a simple algorithm for allocating shards which chooses the
least-loaded shard on a given node, encapsulated in load_sketch.
Takes load due to current tablet allocation into account.
Each tablet, new or allocated for other tables, is assumed to have an
equal load weight.
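A minimal illustration of the idea (hypothetical class, standard C++ only; the real load_sketch is more involved):
```
// Hypothetical sketch of the load_sketch idea: count tablet replicas already
// allocated to each shard of a node and always place the next one on the
// least-loaded shard. Every tablet is assumed to have equal weight.
#include <algorithm>
#include <cstddef>
#include <vector>

class load_sketch_sketch {
    std::vector<std::size_t> _tablets_per_shard; // index = shard id
public:
    explicit load_sketch_sketch(std::size_t shard_count)
        : _tablets_per_shard(shard_count, 0) {}

    // Seed with tablets already allocated on this node, for any table.
    void account_existing(std::size_t shard) { ++_tablets_per_shard[shard]; }

    // Pick the least-loaded shard for a new tablet and record the allocation.
    std::size_t allocate() {
        auto it = std::min_element(_tablets_per_shard.begin(), _tablets_per_shard.end());
        ++*it;
        return static_cast<std::size_t>(it - _tablets_per_shard.begin());
    }
};
```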
We still use it in many places in unit tests, which is ok because
those tables are vnode-based.
We want to catch incorrect uses in production, as they may lead to
hard-to-debug consistency problems.
This is in order to prevent new incorrect uses of dht::shard_of() from
being accidentally added. It also makes sure that all current uses are
caught by the compiler and require an explicit rename.
dht::shard_of() does not use the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
Currently, the coordinator splits the partition range at vnode (or
tablet) boundaries and then tries to merge adjacent ranges which
target the same replica. This is an optimization which makes less
sense with tablets, which are supposed to be of substantial size. If
we don't merge the ranges, then with tablets we can avoid using the
multishard reader on the replica side, since each tablet lives on a
single shard.
The main reason to avoid a multishard reader is avoiding its
complexity, and avoiding adapting it to work with tablet
sharding. Currently, the multishard reader implementation makes
several assumptions about shard assignment which do not hold with
tablets. It assumes that shards are assigned in a round-robin fashion.
dht::shard_of() does not use the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
dht::shard_of() does not use the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
schema::get_sharder() does not return the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
schema::get_sharder() does not return the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
This is not strictly necessary, as the multishard reader will later be
avoided altogether for tablet-based tables, but it is a step towards
converting all code to use erm->get_sharder() instead of
schema::get_sharder().
schema::get_sharder() does not use the correct sharder for
tablet-based tables. Code which is supposed to work with all kinds of
tables should obtain the sharder from erm::get_sharder().
We need to keep sharding metadata consistent with tablet mapping to
shards in order for node restart to detect that those sstables belong
to a single shard and that resharding is not necessary. Resharding of
sstables based on tablet metadata is not implemented yet and will
abort after this series.
Keeping sharding metadata accurate for tablets is only necessary until
compaction group integration is finished. After that, we can use the
sstable token range to determine the owning tablet and thus the owning
shard. Before that, we can't, because a single sstable may contain
keys from different tablets, and the whole key range may overlap with
keys which belong to other shards.
The function currently assumes that shard assignment for subsequent
tokens is round-robin, which will not be the case for tablets. This
can lead to an incorrect split calculation or an infinite loop.
Another assumption was that subsequent splits returned by the sharder
have distinct shards. This also doesn't hold for tablets, which may
return the same shard for subsequent tokens. This assumption was
embedded in the following line:
start_token = sharder.token_for_next_shard(end_token, shard);
If the range which starts with end_token is also owned by "shard",
token_for_next_shard() would skip over it.
Soon, compute_shards_for_this_sstable() will need to take a sharder object.
open_data() is called indirectly from sstable::load() and directly
after writing an sstable from various paths. The latter don't really
need to compute shards, since the field is already set by the writer. In
order to reduce code churn, move compute_shards_for_this_sstable() to
the load() path only so that only load() needs to take the sharder.
We need those functions to work with the tablet sharder, which is not
accessible through schema::get_sharder(). In order to propagate the
right sharder, those functions need to take it externally rather than
from the schema object. The sharder will come from the
effective_replication_map attached to the table object.
Those splitting functions are used when generating sharding metadata
of an sstable. We need to keep this sharding metadata consistent with
tablet mapping to shards in order for node restart to detect that
those sstables belong to a single shard and that resharding is not
necessary. Resharding of sstables based on tablet metadata is not
implemented yet and will abort after this series.
Keeping sharding metadata accurate for tablets is only necessary until
compaction group integration is finished. After that, we can use the
sstable token range to determine the owning tablet and thus the owning
shard. Before that, we can't, because a single sstable may contain
keys from different tablets, and the whole key range may overlap with
keys which belong to other shards.
For tablets, sharding depends on the replication map, so the scope of the
sharder should be effective_replication_map rather than the schema
object.
Existing users will be transitioned incrementally in later patches.
The logic was extracted from ring_position_range_sharder::next(), and
the latter was changed to rely on sharder::next_shard().
The tablet sharder will have a different implementation for
next_shard(). This way, ring_position_range_sharder can work with both
the current sharder and the tablet sharder.
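Purely illustrative sketch of the contract (placeholder types and names, not the actual dht::sharder interface):
```
// Illustration only: next_shard() reports the next point on the ring, after the
// given token, at which the owning shard changes, together with the shard that
// owns the range starting there. Both a round-robin (vnode) sharder and a tablet
// sharder can implement it, even though for tablets adjacent ranges may map to
// the same shard.
#include <cstdint>
#include <optional>

using token_t = std::int64_t;   // stand-in for dht::token
using shard_id = unsigned;

struct shard_boundary {
    token_t first_token;  // first token of the next shard-owned range
    shard_id shard;       // shard owning that range
};

struct sharder_like {
    virtual ~sharder_like() = default;
    virtual shard_id shard_of(token_t t) const = 0;
    // std::nullopt if the shard does not change again before the end of the ring
    virtual std::optional<shard_boundary> next_shard(token_t t) const = 0;
};
```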
Querying the virtual table system.token_ring fails if there is a
tablet-based table, due to an attempt to obtain a per-keyspace erm.
Fix by not showing such keyspaces.
This will make it easier to access table properties in places which
only have a schema_ptr. This is particularly useful when replacing
dht::shard_of() uses with s->table().shard_of(), now that sharding is
no longer static, but table-specific.
Also, it allows us to install a guard which catches invalid uses of
schema::get_sharder() on tablet-based tables.
It will be helpful for other uses as well. For example, we can now get
rid of the static_props hack.
The entry may exist, but its schema may not yet be loaded. learn()
didn't take that into account. This problem is not reachable in
production code, which currently always calls get_or_load() before
learn(), except during boot, but there's no concurrency at that point.
Exposed by a unit test added later.
System tables have static schemas and code uses those static schemas
instead of looking them up in the database. We want those schemas to
have a valid table() once the table is created, so we need to attach
the registry entry to the target schema rather than to a schema duplicate.
This PR implements the storage part of the cluster features on raft functionality, as described in the "Cluster features on raft v2" doc. These changes will be useful for later PRs that will implement the remaining parts of the feature.
Two new columns are added to `system.topology`:
- `supported_features set<text>` is a new clustering column which holds the features that a given node advertises as supported. It is first initialized when the node joins the cluster, and then updated every time the node reboots and its supported feature set changes.
- `enabled_features set<text>` is a new static column which holds the features that are considered enabled by the cluster. Unlike in the current gossip-based implementation, the features will not be enabled implicitly when all nodes support a feature, but rather via an explicit action of the topology coordinator.
These columns are reflected in the `topology_state_machine` structure and are populated when the topology state is loaded. Appropriate methods are added to the `topology_mutation_builder` and `topology_node_mutation_builder` in order to allow setting/modifying those columns.
During startup, nodes update their corresponding `supported_features` column to reflect their current feature set. For now it is done unconditionally, but in the future appropriate checks will be added which will prevent nodes from joining / starting their server for group 0 if they can't guarantee that they support all enabled features.
Closes #14232
* github.com:scylladb/scylladb:
storage_service: update supported cluster features in group0 on start
storage_service: add methods for features to topology mutation builder
storage_service: use explicit ::set overload instead of a template
storage_service: reimplement mutation builder setters
storage_service: introduce topology_mutation_builder_base
topology_state_machine: include information about features
system_keyspace: introduce deserialize_set_column
db/system_keyspace: add storage for cluster features managed in group 0
Now, when a node starts, it will update its `supported_features` row in
`system.topology` via `update_topology_with_local_metadata`.
At this point, the functionality behind cluster features on raft is
mostly incomplete and the state of the `supported_features` column does
not influence anything, so it's safe to update this column
unconditionally. In the future, the node will only join / start its group0
server if it is sure that it supports all enabled features and it can
safely update the `supported_features` parameter.
The newly added `supported_features` and `enabled_features` columns can
now be modified via topology mutation builders:
- `supported_features` can now be overwritten via a new overload of
`topology_node_mutation_builder::set`.
- `enabled_features` can now be extended (i.e. more elements can be
added to it) via `topology_mutation_builder::add_enabled_features`. As
the set of enabled features only grows, this should be sufficient.
The `topology_node_mutation_builder::set` function has an overload which
accepts any type which can be converted to string via `::format`. Its
presence can lead to easy mistakes which can only be detected at runtime
rather than at compile time. A concrete example: I wrote a function that
accepts an std::set<S> where S is convertible to sstring; it turns out
that std::string_view does not satisfy std::convertible_to<sstring>, and overload
resolution fell back to the catch-all overload.
This commit gets rid of the catch-all overload and replaces it with
explicit ones. Fortunately, it was used for only two enums, so it wasn't
much work.
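A self-contained illustration of the pitfall in plain C++ (std::string stands in for sstring; this is not the ScyllaDB code):
```
// A catch-all overload silently absorbs argument types the author did not intend,
// turning what should be a compile-time error into surprising runtime behavior.
#include <iostream>
#include <set>
#include <string>
#include <string_view>

struct builder {
    // intended overload: a set of strings is written as a set column
    void set(const char* cell, const std::set<std::string>& value) {
        std::cout << cell << ": set column with " << value.size() << " elements\n";
    }
    // catch-all overload: anything "formattable" is written as a single text value
    template <typename T>
    void set(const char* cell, const T& value) {
        std::cout << cell << ": formatted as a single value\n";
    }
};

int main() {
    builder b;
    std::set<std::string_view> views = {"a", "b"};
    // std::set<std::string_view> does not convert to std::set<std::string>,
    // so overload resolution silently picks the catch-all template.
    b.set("supported_features", views);
    // With the catch-all removed, this call would fail to compile instead.
}
```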
As promised in the previous commit which introduced
topology_mutation_builder_base, this commit adjusts existing setters of
topology mutation builder and topology node mutation builder to use
helper methods defined in the base class.
Note that the `::set` method for the unordered set of tokens now does
not delete the column when an empty value is set; instead, it just
writes an empty set. This semantic is arguably clearer given that we
have an explicit `::del` method, and it shouldn't affect the existing
implementation - we never intentionally insert an empty set of tokens.
Introduces `topology_mutation_builder_base` which will be a base class
for both topology mutation builder and topology node mutation builder.
Its purpose is to abstract away some of the details of setting/deleting/etc.
a column in the mutation; the actual topology (node) mutation builders will
only have to care about converting types and/or allowing only particular
columns to be set. The class uses CRTP: derived classes provide
access to the row being modified, the schema, and the timestamp.
For the sake of commit diff readability, this commit only introduces this
class and changes the builders to derive from it, but no setter
implementations are modified - this will be done in the next commit.
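A minimal CRTP sketch of the shape this describes (hypothetical names and a trivial "cell", not the actual ScyllaDB classes):
```
// The CRTP base implements the generic "write a cell" plumbing; each derived
// builder supplies the row it modifies and the timestamp, and only exposes
// typed setters for the columns it allows.
#include <cstdint>
#include <iostream>
#include <string>

template <typename Derived>
class mutation_builder_base {
protected:
    // Generic helper shared by all builders.
    void set_cell(const std::string& column, const std::string& value) {
        auto& self = static_cast<Derived&>(*this);
        std::cout << "row=" << self.row_id() << " ts=" << self.timestamp()
                  << " " << column << "=" << value << "\n";
    }
};

class node_mutation_builder : public mutation_builder_base<node_mutation_builder> {
    std::string _host_id;
    std::int64_t _ts;
public:
    node_mutation_builder(std::string host_id, std::int64_t ts)
        : _host_id(std::move(host_id)), _ts(ts) {}
    // Accessors the CRTP base relies on.
    const std::string& row_id() const { return _host_id; }
    std::int64_t timestamp() const { return _ts; }
    // Only specific, typed setters are exposed; type conversion happens here.
    node_mutation_builder& set_datacenter(const std::string& dc) {
        set_cell("datacenter", dc);
        return *this;
    }
};

int main() {
    node_mutation_builder(/*host_id=*/"node-1", /*ts=*/42).set_datacenter("dc1");
}
```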
There are three places in system_keyspace.cc which deserialize a column
holding a set of tokens and convert it to an unordered set of
dht::token. The deserialization process involves a small number of steps
that are the same in all of those places, so they can be
abstracted away.
This commit adds a `deserialize_set_column` function which takes care of
deserializing the column to `set_type_impl::native_type`, which can
then be passed to `decode_tokens`. The new function will also be useful for
decoding set columns with cluster features, which will be handled in the
next commit.
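A generic, standard-C++-only illustration of the refactoring idea (the types and helper names below are stand-ins, not the ScyllaDB API):
```
// One shared helper performs the "serialized cell -> native set elements" steps;
// callers only convert the elements to their own domain type (tokens, features, ...).
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Stand-ins for a serialized set cell and for set_type_impl::native_type.
using serialized_cell = std::string;          // comma-separated for this sketch
using native_set = std::vector<std::string>;  // deserialized elements

// The shared helper: the only place that knows how to split the cell into elements.
native_set deserialize_set_column(const serialized_cell& cell) {
    native_set out;
    std::size_t start = 0;
    while (start <= cell.size()) {
        auto end = cell.find(',', start);
        if (end == std::string::npos) { end = cell.size(); }
        if (end > start) { out.push_back(cell.substr(start, end - start)); }
        start = end + 1;
    }
    return out;
}

// A caller-specific conversion, e.g. tokens (int64_t stands in for dht::token).
std::unordered_set<std::int64_t> decode_tokens(const native_set& elements) {
    std::unordered_set<std::int64_t> tokens;
    for (const auto& e : elements) { tokens.insert(std::stoll(e)); }
    return tokens;
}

int main() {
    auto tokens = decode_tokens(deserialize_set_column("42,-7,1000"));
    return tokens.size() == 3 ? 0 : 1;
}
```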