scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	4b4ce015aa	system-keyspace: Keep UUID value when saving The set_local_host_id() accepts UUID references and starts to save it in local keyspace and in all shards' local cache. Before it was coroutinized the UUID was copied on captures and survived, after it it remains references. The problem is that callers pass local variables as arguments that go away "really soon". Fix it to accept UUID as value, it's short enough for safe and painless copy. fixes: #9425 tests: dtest.ReplaceAddress_rbo_enabled.replace_node_diff_ip(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20211004145421.32137-1-xemul@scylladb.com>	2021-10-04 18:21:44 +03:00
Pavel Emelyanov	9f5fd8b5c0	system_keyspace: Keep local_host_id on local_cache Some places in the code want to have future-less access to the host id, now they do it all by themselves. Local cache seems to be a better place (for the record -- some time ago the "better place" argument justified cached host id relocation from the storage_service onto the database). While at it -- add the future-less getter for the host_id to be used further. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-09-30 10:54:38 +03:00
Pavel Emelyanov	beb345c00a	code: Rename get_local_host_id() into load_...() There will appear the future-less method which better deserves the get_ prefix, so give the existing method the load_ one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-09-30 10:33:57 +03:00
Avi Kivity	e9ae9279e8	system_keyspace: reindent after conversion to class Conversion to class left indentation in ruins, but that can be easily fixed. 'git diff -w' reports no changes. Closes #9339	2021-09-14 08:49:24 +03:00
Avi Kivity	e70b9d4835	system_keyspace: convert from namespace to class All the namespace scope functions in system_keyspace have no place to store context, so they must store their context in global variables. This prevents conversion of those global variables to constructor-provided dependencies. Take the first step towards providing a place to store the context by converting system_keyspace to a class. All the functions are static, so no context is yet available, but we can de-static-ify them incrementally in the future and store the context in class members. Indentation is a mess, but can be easily fixed later.	2021-09-13 15:14:14 +03:00
Avi Kivity	115d6d8d4c	system_keyspace: prepare forward-declared members In anticipation of making system_keyspace a class instead of a namespace, rename any member that is currently forward-declared, since one can't forward-declare a class member. Each member is taken out of the system_keyspace namespace and gains a system_keyspace prefix. Aliases are added to reduce code churn. The result isn't lovely, but can be adjusted later.	2021-09-13 15:11:26 +03:00
Avi Kivity	c6ce81d6a0	system_keyspace: rearrange legacy subnamespace Merge two fragments together, in anticipation of making 'legacy' a struct instead of a namespace (when system_keyspace is a class, we can't nest a namespace inside it).	2021-09-13 15:10:15 +03:00
Avi Kivity	6d379ae6f9	system_keyspace: remove outdated java code This code has been rewritten and not removed, or is not needed. Remove it to reduce clutter.	2021-09-13 15:08:57 +03:00
Pavel Solodovnikov	8d3c0ee9b6	raft: new schema for storing raft snapshots Previously, the layout for storing raft snapshot descriptors contained a `config` field, which had `blob` data type. That means `raft::configuration` for the snapshot was serialized as a whole in binary form. It's convenient to implement and is the most compact form of representing the data, but: 1. Hard to debug due to the need to de-serialize the data. 2. Plants a time bomb wrt. changing data layout and also the documentation in the future. Remove the `config` field from `system.raft_snapshots` and extract it to a separate `system.raft_config` table to store the data in exploded form. Also, modify the schema of `system.raft_snapshots` table in the following way: add a `server_id` field as a part of composite partition key ((group_id, server_id)) to be able to start multiple raft servers belonging to one raft group on the same scylla node. Rename `id` field in `raft_snapshots` to `snapshot_id` so it's self-documenting. Remove `snapshot_id` from clustering key since a given server can have only one snapshot installed at a time. Note that the `raft::server_address` structure contains an opaque `info` member, which is `bytes`, but in the `raft_config` table we use `ip_addr inet` field, instead. We always know that the corresponding member field is going to contain an IP address (either v4 or v6) of a given raft server. 
So, now the snapshots schema looks like this: CREATE TABLE raft_snapshots ( group_id timeuuid, server_id uuid, snapshot_id uuid, idx int, term int, -- no `config` field here, moved to `raft_config` table PRIMARY KEY ((group_id, server_id)) ) CREATE TABLE raft_config ( group_id timeuuid, my_server_id uuid, server_id uuid, disposition text, -- can be either 'CURRENT' or 'PREVIOUS' can_vote bool, ip_addr inet, PRIMARY KEY ((group_id, my_server_id), server_id, disposition) ); This way it's much easier to extend the schema with new fields, very easy to debug and inspect via CQL, and it's much more descriptive in terms of self-documentation. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-08-27 09:24:46 +03:00
Pavel Solodovnikov	c0854a0f62	raft: create system tables only when `raft` experimental feature is set Also introduce a tiny function to return raft-enabled db config for cql testing. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210826091432.279532-1-pa.solodovnikov@scylladb.com>	2021-08-26 12:21:12 +03:00
Juliusz Stasiewicz	f8067d938d	storage_service: Pass the reference down to system_keyspace According to the policy of avoiding globals.	2021-07-20 14:18:24 +02:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Kamil Braun	4658adbe18	tree-wide: introduce cdc::generation_id_v2 This is a new type of CDC generation identifiers. Compared to old IDs, additionally to the timestamp it contains an UUID. These new identifiers will allow a safer and more efficient algorithm of introducing new generations into a cluster (introduced in a later commit). For now, nodes keep using the old identifier format when creating new generations and whenever they learn about a new CDC generation from gossip they assume that it also is stored in the v1 format. But they do know how to (de)serialize the second format and how to persist new identifiers in local tables.	2021-05-24 17:50:21 +02:00
Kamil Braun	99fd2244a3	tree-wide: introduce cdc::generation_id type This is a follow-up to the previous commit. Each CDC generation has a timestamp which denotes a logical point in time when this generation starts operating. That same timestamp is used to identify the CDC generation. We use this identification scheme to exchange CDC generations around the cluster. However, the fact that a generation's timestamp is used as an ID for this generation is an implementation detail of the currently used method of managing CDC generations. Places in the code that deal with the timestamp, e.g. functions which take it as an argument (such as handle_cdc_generation) are often interested in the ID aspect, not the "when does the generation start operating" aspect. They don't care that the ID is a `db_clock::time_point`. They may sometimes want to retrieve the time point given the ID (such as do_handle_cdc_generation when it calls `cdc::metadata::insert`), but they don't care about the fact that the time point actually IS the ID. In the future we may actually change the specific type of the ID if we modify the generation management algorithms. This commit is an intermediate step that will ease the transition in the future. It introduces a new type, `cdc::generation_id`. Inside it contains the timestamp, so: 1. if a piece of code doesn't care about the timestamp, it just passes the ID around 2. if it does care, it can simply access it using the `get_ts` function. The fact that `get_ts` simply accesses the ID's only field is an implementation detail. Using the occasion, we change the `do_handle_cdc_generation_intercept...` function to be a standard function, not a coroutine. It turns out that - depending on the shape of the passed-in argument - the function would sometimes miscompile (the compiled code would not copy the argument to the coroutine frame).	2021-04-07 13:47:13 +02:00
Kamil Braun	e486e0f759	tree-wide: rename "cdc streams timestamp" to "cdc generation id" Each CDC generation always has a timestamp, but the fact that the timestamp identifies the generation is an implementation detail. We abstract away from this detail by using a more generic naming scheme: a generation "identifier" (whatever that is - a timestamp or something else). It's possible that a CDC generation will be identified by more than a timestamp in the (near) future. The actual string gossiped by nodes in their application state is left as "CDC_STREAMS_TIMESTAMP" for backward compatibility. Some stale comments have been updated.	2021-04-06 13:15:31 +02:00
Kamil Braun	1019ff07cb	db: system_keyspace: group cdc functions in single place	2021-04-06 13:15:31 +02:00
Kamil Braun	9bdd000e97	cdc: rewrite streams to the new description table Nodes automatically ensure that the latest CDC generation's list of streams is present in the streams description table. When a new generation appears, we only need to update the table for this generation; old generations are already inserted. However, we've changed the description table (from `cdc_streams_descriptions` to `cdc_streams_descriptions_v2`). The existing mechanism only ensures that the latest generation appears in the new description table. This commit adds an additional procedure that rewrites the older generations as well, if we find that it is necessary to do so (i.e. when some CDC log tables may contain data in these generations).	2021-02-18 11:44:59 +01:00
Gleb Natapov	d8345c67d9	Consolidate system and non system keyspace creation The code that creates system keyspace open code a lot of things from database::create_keyspace(). The patch makes create_keyspace() suitable for both system and non system keyspaces and uses it to create system keyspaces as well. Message-Id: <20210209160506.1711177-1-gleb@scylladb.com>	2021-02-09 17:18:04 +01:00
Pavel Solodovnikov	cf5b8c4b79	raft: create `system.raft` and `system.raft_snapshots` tables System raft table will be used as a backend storage for implementing raft persistence module in Scylla. It combines both raft log, persisted vote and term, and snapshot info. The table is partitioned by group id, thus allowing multi-raft operation. The rest of the table structure mirrors the fields of corresponding core raft structures defined in `raft.hh`, such as `raft::log_entry`. The raft table stores only the latest snapshot id while the actual snapshot will be available in a separate table called `system.raft_snapshots`. The schema of `raft_snapshots` mirrors the fields of `raft::snapshot` structure. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-29 01:59:04 +03:00
Piotr Sarna	f293c59a46	system_keyspace: migrate helper functions to string_view Functions for checking if the keyspace is system/internal were based on sstring references, which is impractical compared to string views and may lead to unnecessary creation of sstring instances.	2021-01-04 09:47:01 +01:00
Pavel Emelyanov	fea4a5492f	system-keyspace: Remove dead code Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20201123151453.27341-1-xemul@scylladb.com>	2020-11-23 17:16:15 +02:00
Pavel Emelyanov	689fd029a1	query-context: Remove database from qctx No users of qctx::db are left. One global database reference less. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	fb20d9cd1e	system-keyspace: Remove dead code Not called anywhere. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	78298ec776	init: Use local messaging reference in main There are few places that initialize db and system_ks and need the messaging service. Pass the reference to it from main instead of using the global helpers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Calle Wilund	30a700c5b0	system_keyspace: Remove support for legacy truncation records Fixes #6341 Since scylla no longer supports upgrading from a version without the "new" (dedicated) truncation record table, we can remove support for these and the migration thereof. Make sure the above holds wherever this is committed. Note that this does not remove the "truncated_at" field in system.local.	2020-08-03 17:16:26 +03:00
Benny Halevy	e39fbe1849	compaction: move compaction uuid generation to compaction_info We'd like to use the same uuid both for printing compaction log messages and to update compaction_history. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-16 13:55:23 +03:00
Pavel Emelyanov	3c2066bd78	system_keyspace: Cleanup setup() from storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:17:31 +03:00
Gleb Natapov	97af6bb0bd	lwt: make load_paxos_state take partition_key_view instead of a reference Some callers have partition_key_view, but not partition_key, so they need to create a temporary and copy just to pass a reference. Change it by accepting a view.	2020-04-22 13:51:43 +03:00
Gleb Natapov	8a408ac5a8	lwt: remove entries from system.paxos table after successful learn stage The learning stage of PAXOS protocol leaves behind an entry in system.paxos table with the last learned value (which can be large). In case not all participants learned it successfully next round on the same key may complete the learning using this info. But if all nodes learned the value the entry does not serve useful purpose any longer. The patch adds another round, "prune", which is executed in background (limited to 1000 simultaneous instances) and removes the entry in case all nodes replied successfully to the "learn" round. It uses the ballot's timestamp to do the deletion, so not to interfere with the next round. Since deletion happens very close to previous writes it will likely happen in memtable and will never reach sstable, so that reduces memtable flush and compaction overhead. Fixes #5779 Message-Id: <20200330154853.GA31074@scylladb.com>	2020-03-30 21:02:14 +03:00
Pavel Emelyanov	4fa12f2fb8	header: De-bloat schema.hh The header sits in many other headers, but there's a handy schema_fwd.hh that's tiny and contains needed declarations for other headers. So replace schema.hh with schema_fwd.hh in most of the headers (and remove completely from some). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200303102050.18462-1-xemul@scylladb.com>	2020-03-03 11:34:00 +01:00
Pavel Emelyanov	6050c559a3	storage_service: Move get_local_tokens wrapper This wrapper just makes sure the system_keyspace::get_saved_tokens reports non empty result. Move them close together. As a side effect -- get rid of penultimate global storage_service reference from size_estimates_virtual_reader (the last one will be removed soon). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:31 +03:00
Avi Kivity	bed61b96a2	Merge "Move features from storage- into feature-service" from Pavel " There's a lot of code around that needs storage service purely to get the specific feature value (cluster_supports_<something> calls). This creates several circular dependencies, e.g. storage_service <-> migration_manager one and database <-> storage_service. Also features sit on storage_service, but register themselves on the feature_service and the former subscribes on them back which also looks strange. I propose to keep all the features on feature_service, this keeps the latter independent from other components, makes it possible to break one of the mentioned circular dependencies and heavily relax the other. Also the set helps us fighting the globals and, after it, the feature_service can be safely stopped at the very last moment. Tests: unit(dev), manual debug build start-stop " * 'br-features-to-service-5' of https://github.com/xemul/scylla: gossiper: Avoid string merge-split for nothing features: Stop on shutdown storage_service: Remove helpers storage_service: Prepare to switch from on-board feature helpers cql3: Check feature in .validate database: Use feature service storage_proxy: Use feature service migration_manager: Use feature service start: Pass needed feature as argument into migrate_truncation_records features: Unfriend storage_service features: Simplify feature registration features: Introduce known_feature_set features: Move disabled features set from storage_service features: Move schema_features helper features: Move all features from storage_service to feature_service storage_service: Use feature_config from _feature_service features: Add feature_config storage_service: Kill set_disabled_features gms: Move features stuff into own .cc file migration_manager: Move some fns into class	2020-02-09 19:22:07 +02:00
Pavel Emelyanov	74fd3466b5	start: Pass needed feature as argument into migrate_truncation_records As a nice side-effect this stops using global storage service instance by this function. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Kamil Braun	7fa30f6f34	db: add a system.cdc_local table with CDC generation timestamp This will be used to persist CDC streams generation timestamp proposed by a joining node in case the node crashes or restarts, similarly to the way tokens are persisted. The get_saved_cdc_streams_timestamp method retrieves the generation timestamp from the system table. It will be used by a restarting node. The update_cdc_streams_timestamp method saves CDC stream generation timestamp of the calling node in the system table. A joining node will persist the timestamp before it proposes it to other nodes.	2020-01-30 11:10:08 +01:00
Gleb Natapov	0fc48515d8	paxos: mark paxos table schema as "always sync" We want all writes to paxos table to be persisted on a storage before declared completed.	2020-01-15 12:15:42 +02:00
Piotr Sarna	36ec43a262	Merge "add table with connected cql clients" from Juliusz This change introduces system.clients table, which provides information about CQL clients connected. PK is the client's IP address, CK consists of outgoing port number and client_type (which will be extended in future to thrift/alternator/redis). Table supplies also shard_id and username. Other columns, like connection_stage, driver_name, driver_version..., are currently empty but exist for C* compatibility and future use. This is an ordinary table (i.e. non-virtual) and it's updated upon accepting connections. This is also why C*'s column request_count was not introduced. In case of abrupt DB stop, the table should not persist, so it's being truncated on startup. Resolves #4820	2020-01-14 10:01:07 +02:00
Piotr Jastrzebski	c08e6985cd	cdc: allow cluster rolling upgrade Addition of cdc column in scylla_tables changes how schema digests are calculated, and affect the ABI of schema update messages (adding a column changes other columns' indexes in frozen_mutation). To fix this, extend the schema_tables mechanism with support for the cdc column, and adjust schemas and mutations to remove that column when sending schemas during upgrade. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Juliusz Stasiewicz	7fdc8563bf	system_keyspace: Added infrastructure for table `system.clients' I used the following as a reference: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/virtual/ClientsTable.java At this moment there is only info about IP, client's outgoing port, client 'type' (i.e. CQL/thrift/alternator), shard ID and username. Column `request_count' is NOT present and CK consists of (`port', `client_type'), contrary to what C* has: (`port'). Code that notifies `system.clients` about new connections goes to top-level files `connection_notifier.`. Currently only CQL clients are observed, but enum `client_type` can be used in future to notify about connections with other protocols.	2019-12-17 11:31:28 +01:00
Vladimir Davydov	3d1d4b018f	paxos: remove unnecessary move constructor invocations invoke_on() guarantees that captures object won't be destroyed until the future returned by the invoked function is resolved so there's no need to move key, token, proposal for calling paxos_state::*_impl helpers.	2019-11-24 11:35:29 +02:00
Gleb Natapov	d1774693bf	lwt: Define state needed by paxos and persist it Paxos protocol relies on replicas having a state that persists over crashes/restarts. This patch defines such state and stores it in the database itself in the paxos table to make it persistent. The stored state is: in_progress_ballot - promised ballot proposal - accepted value proposal_ballot - the ballot of the accepted value most_recent_commit - most recently learned value most_recent_commit_at - the ballot of the most recently learned value	2019-10-27 23:21:51 +03:00
Kamil Braun	fb1e35f032	db: remove system_keyspace::update_local_tokens That was dead code.	2019-10-21 11:11:03 +02:00
Kamil Braun	1b0c8e5d99	db: improve documentation for update_tokens and get_saved_tokens in system_keyspace	2019-10-21 11:11:03 +02:00
Kamil Braun	8c8a17a0fe	db::system_keyspace::update_tokens: take tokens by const ref	2019-10-21 10:38:49 +02:00
Piotr Jastrzebski	caa6798f2c	system_keyspace: add storage_service param to setup Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	0211541d84	system_keyspace: add accessors for SCYLLA_LOCAL Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	4c205b733a	system_keyspace: Add scylla_local Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Rafael Ávila de Espíndola	d17083b483	Create a system.large_cells table This is analogous to the system.large_rows table, but holds individual cells, so it also needs the column name. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	66d8a0cf93	Create a system.large_rows table This is analogous to the system.large_partitions table, but holds individual rows, so it also needs the clustering key of the large rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-26 15:46:21 -08:00
Calle Wilund	4e657c0633	system_keyspace: Add waitable for trunc. migration For tests. Hooray for separation of concern.	2019-02-13 09:08:12 +00:00
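
Commit 4b4ce015aa above changes set_local_host_id() to take the UUID by value, because callers pass local variables that are gone before the asynchronous save finishes. The lifetime issue can be sketched in plain C++ without Seastar. Everything below is hypothetical and for illustration only: the host_id struct, the deferred_saver class, and its task queue stand in for the real UUID type and for the coroutine-based save, which are not reproduced here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a UUID (assumption, not the ScyllaDB type).
struct host_id {
    std::string value;
};

// Hypothetical saver that postpones the actual write, standing in for an
// asynchronous save that completes after the caller has returned.
class deferred_saver {
    std::vector<std::function<void()>> _pending;
    std::vector<std::string> _saved;
public:
    // The id is taken by value and moved into the task, so the task owns its
    // own copy. Taking `const host_id&` and capturing that reference instead
    // would leave the task pointing at a destroyed caller variable.
    void set_local_host_id(host_id id) {
        _pending.push_back([this, id = std::move(id)] {
            _saved.push_back(id.value);
        });
    }

    // Runs every queued save, then empties the queue.
    void run_pending() {
        for (auto& task : _pending) {
            task();
        }
        _pending.clear();
    }

    const std::vector<std::string>& saved() const { return _saved; }
};

// Queues a save from a local variable that is destroyed on return,
// mirroring the callers described in the commit message.
inline void queue_from_local(deferred_saver& saver) {
    host_id local{"00000000-0000-0000-0000-000000000001"};
    saver.set_local_host_id(local);
}

// Small self-check: the value is still intact when the save finally runs.
inline void check_value_survives() {
    deferred_saver saver;
    queue_from_local(saver);
    saver.run_pending();
    assert(saver.saved().size() == 1);
}
```

Calling queue_from_local() and then run_pending() saves the id even though the caller's variable no longer exists. The cost is one copy of a small object, which matches the commit's remark that a UUID is short enough to copy safely.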

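Commit 99fd2244a3 above introduces cdc::generation_id as a thin wrapper around the generation timestamp, so that code interested only in identity passes the ID around and code that needs the time calls get_ts. A minimal sketch of that idea follows. It assumes std::chrono::system_clock in place of db_clock, and the sketch namespace, the same_generation helper, and the comparison operators are hypothetical additions, not the ScyllaDB definitions.

```cpp
#include <cassert>
#include <chrono>

namespace sketch {

// Assumption: stands in for db_clock::time_point.
using time_point = std::chrono::system_clock::time_point;

// Wrapper that hides the fact that the ID is currently just a timestamp.
class generation_id {
    time_point _ts;
public:
    explicit generation_id(time_point ts) : _ts(ts) {}

    // Accessor for the few callers that need the starting time itself.
    time_point get_ts() const { return _ts; }

    bool operator==(const generation_id& other) const { return _ts == other._ts; }
    bool operator!=(const generation_id& other) const { return !(*this == other); }
};

// Hypothetical caller that cares only about identity, not about time.
inline bool same_generation(const generation_id& a, const generation_id& b) {
    return a == b;
}

// Small self-check of the wrapper's behaviour.
inline void check_generation_id() {
    time_point t(std::chrono::seconds(5));
    generation_id id(t);
    assert(id.get_ts() == t);
    assert(same_generation(id, generation_id(t)));
}

} // namespace sketch
```

Because the constructor is explicit, a bare time point cannot be passed where an ID is expected, so a later change of the ID's representation touches only the wrapper and the callers of get_ts.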

123 Commits