* 'raft_group0_early_startup_v3' of https://github.com/ManManson/scylla:
main: allow joining raft group0 before waiting for gossiper to settle
service: raft_group0: make `join_group0` re-entrant
service: storage_service: add `join_group0` method
raft_group_registry: update gossiper state only on shard 0
raft: don't update gossiper state if raft is enabled early or not enabled at all
gms: feature_service: add `cluster_uses_raft_mgmt` accessor method
db: system_keyspace: add `bootstrap_needed()` method
db: system_keyspace: mark getter methods for bootstrap state as "const"
The feature represents the ability to store storage options
in keyspace metadata, as a map of options,
e.g. storage type, bucket, authentication details, etc.
Move the listener from feature service to the `raft_group_registry`.
Enable support for the `USES_RAFT_CLUSTER_MANAGEMENT`
feature when the former is enabled.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Also, change the signature of `support()` method to return
`future<>` since it's now a coroutine. Adjust existing call sites.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except to
licenses/README.md.
Closes #9937
gc_grace_seconds is a fragile and broken design inherited from
Cassandra. Deleted data can be resurrected if a cluster-wide repair is
not performed within gc_grace_seconds. This design pushes the job of
keeping the database consistent onto the user. In practice, it is very
hard to guarantee that repair finishes within gc_grace_seconds all the
time. For example, the repair workload has the lowest priority in the
system and can be slowed down by higher-priority workloads, so there is
no guarantee when a repair will finish. A gc_grace_seconds value that
used to work might stop working after the data volume in a cluster
grows. Users might also want to avoid running repair during a specific
period where latency is the top priority for their business.
To solve this problem, an automatic mechanism to protect data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.
In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:
1) GC a tombstone after gc_grace_seconds
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;
This is the default mode. If the user specifies no tombstone_gc
option, the old gc_grace_seconds-based GC is used.
2) Never GC a tombstone
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};
3) GC a tombstone immediately
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};
4) GC a tombstone after repair
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};
In addition to the 'mode' option, another option, 'propagation_delay_in_seconds',
is added. It defines the maximum time a write can be delayed before it
eventually arrives at a node.
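The purge decision implied by the four modes can be sketched as follows. This is a minimal illustration only; the names, signature, and structure are hypothetical, not the actual Scylla implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical sketch of the per-mode purge decision (not the Scylla code).
enum class tombstone_gc_mode { timeout, disabled, immediate, repair };

// Returns true if a tombstone written at `deletion_time` may be purged at `now`.
// `range_repaired_at` is the last time the token range covering the tombstone
// was repaired, if any (all times in seconds).
bool can_purge(tombstone_gc_mode mode, int64_t deletion_time, int64_t now,
               int64_t gc_grace_seconds,
               std::optional<int64_t> range_repaired_at) {
    switch (mode) {
    case tombstone_gc_mode::timeout:
        // Classic behavior: wait out gc_grace_seconds.
        return now >= deletion_time + gc_grace_seconds;
    case tombstone_gc_mode::disabled:
        return false;
    case tombstone_gc_mode::immediate:
        return true;
    case tombstone_gc_mode::repair:
        // Purge only once the covering range was repaired after the
        // tombstone was written, so deleted data cannot be resurrected.
        return range_repaired_at && *range_repaired_at > deletion_time;
    }
    return false;
}
```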
A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.
Tests: compaction_test.py, ninja test
Fixes #3560
[avi: resolve conflicts vs data_dictionary]
The patch adds the `SUPPORTS_RAFT_CLUSTER_MANAGEMENT`
and `USES_RAFT_CLUSTER_MANAGEMENT` gossiper features.
These features provide a way to organize the automatic
switch to raft-based cluster management.
The scheme is as follows:
1. Every new node declares support for raft-based cluster ops.
2. No node in the cluster can actually use raft for cluster
management until the `SUPPORTS*` feature is enabled
(i.e. understood by every node in the cluster).
3. After the first, `SUPPORTS*`, feature is enabled, the nodes
can declare support for the second, `USES*`, feature, which
means that a node can actually switch to raft-based cluster
ops.
The scheme ensures that even if some nodes are down while
transitioning to the new bootstrap mechanism, they can easily
switch to the new procedure without risking disruption of the
cluster.
The features are not actually wired to anything yet,
providing a framework for the integration with `raft_group0`
code, which is the subject of a follow-up series.
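The two-phase switch described above can be sketched in miniature. This is an illustrative model only; the helper names and the flat "feature set per node" representation are assumptions, not the Scylla code:

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

using feature_set = std::set<std::string>;

const std::string SUPPORTS = "SUPPORTS_RAFT_CLUSTER_MANAGEMENT";
const std::string USES = "USES_RAFT_CLUSTER_MANAGEMENT";

// A feature is cluster-enabled once every node advertises it.
bool cluster_enabled(const std::vector<feature_set>& nodes, const std::string& f) {
    return std::all_of(nodes.begin(), nodes.end(),
                       [&](const feature_set& s) { return s.count(f) > 0; });
}

// Phase two: a node advertises USES_* only after it has observed that
// SUPPORTS_* is enabled cluster-wide, i.e. every node understands raft ops.
void maybe_advertise_uses(std::vector<feature_set>& nodes) {
    if (!cluster_enabled(nodes, SUPPORTS)) {
        return; // some node (down or old) has not declared support yet
    }
    for (auto& s : nodes) {
        s.insert(USES);
    }
}
```

A single node that has not yet declared `SUPPORTS*` (e.g. one that was down during the upgrade) blocks the switch, which is exactly the safety property described above.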
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20211220081318.274315-1-pa.solodovnikov@scylladb.com>
This new feature will be used to determine whether the whole cluster
is ready to use the additional page_size field in max_result_size.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This will be used for re-enabling previously enabled cluster
features, which will be introduced in later patches.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Save each feature enabled through the feature_service
instance in the `system.scylla_local` under the
'enabled_features' key.
The features are persisted only if the underlying
query context used by `db::system_keyspace` is initialized.
Since the `system.scylla_local` table is essentially a
string->string map, use an ad-hoc method for serializing the
enabled features set: the same one the gossiper uses for
translating the supported features set via gossip.
The entry should be saved before we enable the feature so
that crash-after-enable is safe.
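A minimal sketch of such a string encoding, assuming a gossip-style comma-joined list of feature names (the helper names here are hypothetical, not the actual Scylla utilities):

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Join the enabled-features set into one comma-separated string, suitable
// for a string->string table such as system.scylla_local.
std::string serialize_features(const std::set<std::string>& features) {
    std::ostringstream out;
    bool first = true;
    for (const auto& f : features) {
        if (!first) {
            out << ',';
        }
        out << f;
        first = false;
    }
    return out.str();
}

// Inverse operation, used when re-enabling persisted features on startup.
std::set<std::string> deserialize_features(const std::string& s) {
    std::set<std::string> out;
    std::istringstream in(s);
    std::string item;
    while (std::getline(in, item, ',')) {
        if (!item.empty()) {
            out.insert(item);
        }
    }
    return out;
}
```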
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
This utility will also be used for de-serialization of
persisted enabled features, which will be introduced in a
later patch.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
This patch adds stubs for the UpdateTimeToLive and DescribeTimeToLive
operations to Alternator. These operations can enable, disable, or
inquire about the chosen expiration-time attribute.
Currently, the information about the chosen attribute is only saved, with
no actual expiration of any items taking place.
Some of the tests for the TTL feature start to pass, so their xfail tag
is removed.
Because this new feature is incomplete, it is not enabled unless
the "alternator-ttl" experimental feature is enabled. Moreover, for
these operations to be allowed, the entire cluster needs to support
this experimental feature, because all nodes need to participate in the
data expiration - if some old nodes don't support Alternator TTL, some
of the data they hold won't get expired... So we don't allow enabling
TTL until all the nodes in the cluster support this feature.
The implementation is in a new source file, alternator/ttl.cc. This
source file will continue to grow as we implement the expiration feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
These features have been around for over 2 years and every reasonable
deployment should have them enabled.
The only case where those features might not be enabled is when the user
has used enable_sstables_mc_format config flag to disable MC sstable
format. This case has been eliminated by removing
enable_sstables_mc_format config flag.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
When a node joins this feature (which it does immediately when upgrading
to a version that has this commit), it says: "I understand the new
generation storage format and the new identifier format". Thus, when the
feature becomes enabled - after all nodes have joined it - it means that
it's safe to create new generations using these new storage/ID formats.
To control the transition to the data variant of range scans. As there
is a difference in how the data and mutation variants calculate page
sizes, the transition to the former has to happen in a controlled
manner, when all nodes in the cluster support it, to avoid artificial
differences in page content and subsequently triggering false-positive
read repair.
Add new alternator-streams experimental flag for
alternator streams control.
CDC becomes GA and won't be guarded by an experimental flag any more.
Alternator Streams stay experimental so now they need to be controlled
by their own experimental flag.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Add new correct_idx_token_in_secondary_index feature, which will be used
to determine if all nodes in the cluster support new
token_column_computation. This column computation will replace
legacy_token_column_computation in secondary indexes. The legacy
computation was incorrect: it produced values whose unsigned comparison
(CQL bytes-type comparison) yields a different ordering than the signed
comparison of tokens. See issue:
https://github.com/scylladb/scylla/issues/7443
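The ordering mismatch can be demonstrated on a signed 64-bit token. The sketch below is illustrative only: whether the actual token_column_computation uses a sign-bit flip is an assumption, but the technique shows how a byte encoding can be made to sort consistently under unsigned byte-wise comparison:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Plain big-endian encoding of a signed token. Comparing these bytes
// unsigned (as CQL bytes comparison does) orders negative tokens AFTER
// positive ones -- the bug described above.
std::array<uint8_t, 8> encode_raw(int64_t token) {
    std::array<uint8_t, 8> b{};
    uint64_t u = static_cast<uint64_t>(token);
    for (int i = 7; i >= 0; --i) {
        b[i] = static_cast<uint8_t>(u & 0xff);
        u >>= 8;
    }
    return b;
}

// Flipping the sign bit before encoding makes byte-wise unsigned
// comparison agree with signed integer comparison of the tokens.
std::array<uint8_t, 8> encode_order_preserving(int64_t token) {
    return encode_raw(token ^ INT64_MIN);
}
```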
Checks for features introduced over 2 years ago were removed
in previous commits, so all that is left is removing the feature
bits themselves. Note that the feature strings are still sent
to other nodes just to be doubly sure, but the new code assumes
that all these features are implicitly enabled.
Nowadays the knowledge about known/supported features is
scattered between feature_service and storage_service. The
latter uses knowledge about the selected _sstables_format
to alter the "supported" set.
Encapsulate this knowledge inside the feature_service with
the help of "masked_features" -- those that shouldn't be
advertised to other nodes. The only maskable feature
today is UNBOUNDED_RANGE_TOMBSTONES. Nowadays it's
reported as supported only if the sstables format is MC.
With this patch it starts masked and gets unmasked once
the sstables format is selected to be MC, so behavior is
unchanged.
This will make it possible to move sstables_format from
storage service to anywhere else.
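The masked-set idea can be sketched as "advertised = known minus masked". The names below are hypothetical, not the actual feature_service API:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>

// Hypothetical sketch: features start masked and can be unmasked later,
// e.g. once the sstables format is known.
class feature_service_sketch {
    std::set<std::string> _known;   // all features this build understands
    std::set<std::string> _masked;  // features not to advertise (yet)
public:
    feature_service_sketch(std::set<std::string> known, std::set<std::string> masked)
        : _known(std::move(known)), _masked(std::move(masked)) {}

    void unmask(const std::string& f) {
        _masked.erase(f);
    }

    // The set advertised to other nodes: known minus masked.
    std::set<std::string> supported() const {
        std::set<std::string> out;
        std::set_difference(_known.begin(), _known.end(),
                            _masked.begin(), _masked.end(),
                            std::inserter(out, out.begin()));
        return out;
    }
};
```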
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The set of bool enable_something-s on feature_config duplicates
the disabled_features set on it, so remove the former and make
full use of the latter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This feature will ensure that caching can be switched
off per table only after the whole cluster supports it.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Always enable lightweight transactions. Remove the check for the command
line switch from the feature service, assuming LWT is always enabled.
Remove the check for LWT from Alternator.
Note that in order for the cluster to work with LWT, all nodes need
to support it.
Rename LWT to UNUSED in db/config.hh, to keep accepting lwt keyword in
--experimental-features command line option, but do nothing with it.
Changes in v2:
* remove enable_lwt feature flag, it's always there
Closes #6102
test: unit (dev, debug)
Message-Id: <20200401071149.41921-1-kostja@scylladb.com>
The previous patch made the LA format the default. We no longer need to
choose between writing the older KA format or LA, so the LA_SSTABLE
cluster feature has become unnecessary.
Unfortunately, we cannot completely remove this feature: since commit
4f3ce42163 we cannot remove cluster features,
because a node will refuse to join a cluster which already agreed on
features that it lacks - thinking it is an old node trying to join a
new cluster.
So the LA_SSTABLE feature flag remains, and we continue to advertise
that our node supports it. We just no longer care about what other
nodes advertised for it, so we can remove a bit of code that cared.
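The join check referenced above can be sketched roughly as follows. This is an illustrative model only; the function name and error text are assumptions, not the code from commit 4f3ce42163:

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Refuse to start if the cluster has already enabled a feature this node
// does not know -- removing a feature bit from a new build would therefore
// make it look like an old node and lock it out of upgraded clusters.
void check_can_join(const std::set<std::string>& cluster_enabled,
                    const std::set<std::string>& locally_known) {
    for (const auto& f : cluster_enabled) {
        if (!locally_known.count(f)) {
            throw std::runtime_error(
                "Feature '" + f + "' was enabled by the cluster "
                "but is unknown to this node");
        }
    }
}
```

This is why LA_SSTABLE must stay advertised forever even though no code checks it anymore.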
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200324232607.4215-3-nyh@scylladb.com>
"
This PR makes it possible to enable the usage of different partitioner for each table. If no table-specific partitioner is set for a given table then a default partitioner is used.
The PR is composed of the following parts:
- Introduction of schema::get_partitioner that still returns dht::global_partitioner
- Replacement of all the usage of dht::global_partitioner with schema::get_partitioner
- Making it possible to set table-specific partitioner in a schema_builder
- Remove all the places that were setting default partitioner except for main.cc (mostly tests)
- Move default partitioner from i_partitioner to schema.cc and hide it from the rest of the codebase
- Remove dht::global_partitioner
After this PR there's no such thing as global partitioner at all. There is only a default partitioner but it still has to be accessed through schema::get_partitioner.
There are some intermediate states in which i_partitioner is stored as shared_ptr in the schema but the final version keeps it by const&.
The PR does not enable the per-table partitioner end-to-end. Only the internals of a single node are covered. I still have to deal with:
- Making sure a table has the same partitioner on each node
- Allowing the user to set up a table-specific partitioner on a table
- Signaling the driver about which partitioner is used by a given table
- Persisting partitioner info for each table that does not use the default partitioner.
Fixes#5493
Tests: unit(dev, release, debug), dtest(byo)
"
* 'per_table_partitioner' of https://github.com/haaawk/scylla:
schema: drop optional from _partitioner field
make_multishard_combining_reader: stop taking partitioner
split_range_to_single_shard: stop taking partitioner as argument
tests: remove unused murmur3 includes
partitioner: move default_partitioner to schema.cc
partitioner: hide dht::default_partitioner
schema: include partitioner name in scylla tables mutation
schema: make it possible to set custom partitioner
scylla_tables: add partitioner column
schema_features: add PER_TABLE_PARTITIONERS feature
features: add PER_TABLE_PARTITIONERS feature
This new feature is required because we now allow
setting a partitioner per table. This influences
the digest of a table's schema, so we must not include
the partitioner name in the digest unless we know that
the whole cluster already supports per-table partitioners.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
In test_services.cc there is
gms::feature_service test_feature_service;
And the feature_service constructor has
, _lwt_feature(*this, features::LWT)
But features::LWT is a global sstring constructed in another file,
so the test's feature_service may be constructed before it (the
static initialization order problem).
Solve the problem by making the feature strings constexpr
std::string_view.
I found the issue while trying to benchmark the std::string switch.
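A minimal sketch of the fix: a constexpr std::string_view is initialized at compile time, so reading it from a constructor in another translation unit is safe regardless of dynamic-initialization order, unlike a dynamically initialized global sstring. The surrounding types here are illustrative, not the Scylla classes:

```cpp
#include <cassert>
#include <string_view>

namespace features {
    // Constant-initialized: usable from any translation unit, even during
    // dynamic initialization of other globals.
    constexpr std::string_view LWT = "LWT";
}

struct feature {
    std::string_view name;
    explicit feature(std::string_view n) : name(n) {}
};

// Safe even if this object lived in a different translation unit than
// features::LWT; with a global sstring the read could precede construction.
feature lwt_feature{features::LWT};
```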
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Acked-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200309225749.36661-1-espindola@scylladb.com>
The storage service no longer needs to mess with feature
config. It only needs two features to register itself on,
but this can be solved by the respective cluster_supports_foo
helpers.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now features are registered in a map of vectors, but
the vector is always one item long and is used to keep a
pointer to the feature, instead of the feature
itself.
Switch it to a map of reference_wrapper-s.
Before this patch we could register more than one
feature under the same name, now we can't. But this
seems to be OK, as we don't actually do this. To catch
violations of this restriction there's an assert() in the
feature_service::register_feature.
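A sketch of the resulting registry shape: one reference_wrapper per name, with an assert() rejecting duplicate registration. Names are hypothetical, not the actual feature_service internals:

```cpp
#include <cassert>
#include <functional>
#include <string_view>
#include <unordered_map>

struct feature {
    std::string_view name;
};

class feature_registry {
    // One feature per name; reference_wrapper avoids owning or copying.
    std::unordered_map<std::string_view, std::reference_wrapper<feature>> _map;
public:
    void register_feature(feature& f) {
        auto [it, inserted] = _map.emplace(f.name, std::ref(f));
        (void)it;
        // Registering two features under the same name is now a bug.
        assert(inserted && "feature registered twice under the same name");
    }

    feature& get(std::string_view name) {
        return _map.at(name);
    }
};
```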
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two masks -- supported and known. They differ only in the
unbounded_range_tombstones bit, which is set depending on the
sstables format in use.
Since the feature_service doesn't know anything about the sstables
format, the logic is inverted -- the feature service reports
back the known mask (all features), and storage_service clears
unbounded_range_tombstones if the sstables format is old -- otherwise
the mask is (hopefully) left intact.
And leave some temporary storage_service->feature links. The plan
is to make every subsystem that needs storage_service for features
stop doing so and switch to the feature_service.
The feature_service is a service without any dependencies; it will be
freed last, thus making the service dependency tree an actual tree,
not a graph with loops.
While at it -- make all const-s drop the _FEATURE suffix (currently
both naming styles exist) and const-qualify cluster_supports_lwt().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some features take db::config to find out whether to be enabled
or disabled. This creates unwanted dependency between database and
features, so split the features configuration explicitly. Also
this will make the "this is for testing env only" logic cleaner
and simpler to understand.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Feature lifetime is tied to storage_service lifetime, but features are now managed
by gossip. To avoid circular dependency, add a new feature_service service to manage
feature lifetime.
To work around the problem, the current code re-initializes features after
gossip is initialized. This patch does not fix this problem; it only makes it
possible to solve it by untying features from gossip.