scylladb

Author	SHA1	Message	Date
Piotr Sarna	151d8f7c58	test: regenerate schema_change_test for storage options case Keyspace storage options series adds a new schema table: system_schema.scylla_keyspaces. The regenerated cases ensure that this new table is taken into account when the schema feature is available.	2022-04-08 09:17:01 +02:00
Piotr Sarna	4705a5fa42	test: improve output of schema_change_test regeneration Schema change test operates on pre-generated sstables, and sometimes this set of sstables needs to be regenerated. In order to make the regeneration process more ergonomic, the output is now directly copyable as valid C++ representation of UUIDs.	2022-04-08 09:17:01 +02:00
Nadav Har'El	7be3129458	cdc: don't need current keyspace to create the log table CDC registers to the table-creation hook (before_create_column_family) to add a second table - the CDC log table - to the same keyspace. The handler function (on_before_update_column_family() in cdc/log.cc) wants to retrieve the keyspace's definition, but that does NOT WORK if we create the keyspace and table in one operation (which is exactly what we intend to do in Alternator to solve issue #9868) - because at the time of the hook, the keyspace does not yet exist in the schema. It turns out that on_before_update_column_family() does not REALLY need the keyspace. It needed it to pass it on to make_create_table_mutations() but that function doesn't use the keyspace parameter passed to it! All it needs is the keyspace's name - which is in the schema anyway and doesn't need to be looked up. So in this patch we fix make_create_table_mutations() to not require the unused keyspace parameter - and fix the CDC code not to look for the keyspace that is no longer needed. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220215162342.622509-1-nyh@scylladb.com>	2022-02-16 08:38:56 +02:00
Kamil Braun	a664ac7ba5	treewide: require `group0_guard` when performing schema changes `announce` now takes a `group0_guard` by value. `group0_guard` can only be obtained through `migration_manager::start_group0_operation` and moved, it cannot be constructed outside `migration_manager`. The guard will be a method of ensuring linearizability for group 0 operations.	2022-01-24 15:20:35 +01:00
Kamil Braun	283ac7fefe	treewide: pass mutation timestamp from call sites into `migration_manager::prepare_*` functions The functions which prepare schema change mutations (such as `prepare_new_column_family_announcement`) would use internally generated timestamps for these mutations. When schema changes are managed by group 0 we want to ensure that timestamps of mutations applied through Raft are monotonic. We will generate these timestamps at call sites and pass them into the `prepare_` functions. This commit prepares the APIs.	2022-01-24 15:12:50 +01:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Gleb Natapov	f0a41c102a	test: move schema_change_test.cc to new schema announcement api	2022-01-13 23:10:18 +02:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Gleb Natapov	38e1f85959	migration_manager: drop view_ptr array from announce_column_family_update() No users pass it any longer.	2021-12-11 12:31:07 +02:00
Botond Dénes	3f4f408bcf	schema: add get_reversed() A variant of make_reversed() which goes through the schema registry, teaching the schema to the registry if necessary. This effectively caches the result of the reversing and as an added bonus double reversing yields the very same schema C++ object that was the starting point. Closes #9365	2021-09-22 18:55:25 +03:00
Tomasz Grabiec	83113d8661	Merge "raft: new schema for storing raft snapshots" from Pavel Solodovnikov Previously, the layout for storing raft snapshot descriptors contained a `config` field, which had `blob` data type. That means `raft::configuration` for the snapshot was serialized as a whole in binary form. It's convenient to implement and is the most compact form of representing the data, but: 1. Hard to debug due to the need to de-serialize the data. 2. Plants a time bomb wrt. changing data layout and also the documentation in the future. Remove the `config` field from `system.raft_snapshots` and extract it to a separate `system.raft_config` table to store the data in exploded form. Also, modify the schema of `system.raft_snapshots` table in the following way: add a `server_id` field as a part of composite partition key ((group_id, server_id)) to be able to start multiple raft servers belonging to one raft group on the same scylla node. Rename `id` field in `raft_snapshots` to `snapshot_id` so it's self-documenting. Remove `snapshot_id` from clustering key since a given server can have only one snapshot installed at a time. Note that the `raft::server_address` structure contains an opaque `info` member, which is `bytes`, but in the `raft_config` table we use `ip_addr inet` field, instead. We always know that the corresponding member field is going to contain an IP address (either v4 or v6) of a given raft server. 
So, now the snapshots schema looks like this: CREATE TABLE raft_snapshots ( group_id timeuuid, server_id uuid, snapshot_id uuid, idx int, term int, -- no `config` field here, moved to `raft_config` table PRIMARY KEY ((group_id, server_id)) ) CREATE TABLE raft_config ( group_id timeuuid, my_server_id uuid, server_id uuid, disposition text, -- can be either 'CURRENT' or 'PREVIOUS' can_vote bool, ip_addr inet, PRIMARY KEY ((group_id, my_server_id), server_id, disposition) ); This way it's much easier to extend the schema with new fields, very easy to debug and inspect via CQL, and it's much more descriptive in terms of self-documentation. Tests: unit(dev) * manmanson/raft_snapshots_new_schema_v2: test: adjust `schema_change_test` to include new `system.raft_config` table raft: new schema for storing raft snapshots raft: pass server id to `raft_sys_table_storage` instance	2021-09-10 20:41:59 +02:00
Botond Dénes	f200c8104a	schema: introduce make_reversed() `make_reversed()` creates a schema identical to the schema instance it is called on, with clustering order reversed. To distinguish the reverse schema from the original one, the node-id part of its version UUID is bit-flipped. This ensures that reversing a schema twice will result in the identical schema to the original one (although a different C++ object). This reversed schema will be used in reversed reads, so intermediate layers can be ignorant of the fact that the read happens in reverse.	2021-09-09 11:49:05 +03:00
Pavel Solodovnikov	b00443ab87	test: adjust `schema_change_test` to include new `system.raft_config` table Check that the new table uses null sharder. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-08-27 09:30:17 +03:00
Pavel Solodovnikov	c0854a0f62	raft: create system tables only when `raft` experimental feature is set Also introduce a tiny function to return raft-enabled db config for cql testing. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210826091432.279532-1-pa.solodovnikov@scylladb.com>	2021-08-26 12:21:12 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Piotr Sarna	7e6beabf27	migration_manager: allow table updates with timestamp In order to avoid needless schema disagreements, a way of announcing a schema change with fixed timestamp is added. That way, when nodes update schemas of their internal tables (e.g. during updates), it's possible for all nodes to use an identical timestamp for this operation, which in turn makes their digests identical.	2021-05-10 10:10:38 +02:00
Pavel Emelyanov	37c91c4c5c	tests: Use migration_manager from cql_test_env All the tests that need migration manager are run inside cql_test_env context and can use the migration manager from the env. For now this is still the global one, but next patch will change this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-23 17:13:24 +03:00
Avi Kivity	daeddda7cc	treewide: remove inclusions of storage_proxy.hh from headers storage_proxy.hh is huge and includes many headers itself, so remove its inclusions from headers and re-add smaller headers where needed (and storage_proxy.hh itself in source files that need it). Ref #1.	2021-04-20 21:23:00 +03:00
Pavel Solodovnikov	9d17a654a6	raft: use null_sharder for raft tables Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210201105300.110210-1-pa.solodovnikov@scylladb.com>	2021-02-01 18:52:04 +02:00
Kamil Braun	bf115e7d69	schema_tables: put schema tables on shard 0 We use a custom sharder for all schema tables: every table under the `system_schema` keyspace, plus `system.scylla_table_schema_history`. This sharder puts all data on shard 0. To achieve this, we hardcode the sharder in initial schema object definitions. Furthermore - since the sharder is not stored inside schema mutations yet - whenever we deserialize schema objects from mutations, we modify the sharder based on the schema's keyspace and table names. A regression test is added to ensure no one forgets to set the special sharder for newly added schema tables. This test assumes that all newly added schema tables will end up in the `system_schema` keyspace (other tables may go unnoticed, unfortunately). Closes #7947	2021-01-28 13:28:22 +02:00
Piotr Sarna	e26aa836a9	schema_change_test: skip distributed system tables in digest With previous design of the schema change test, a regeneration was necessary each time a new distributed system table was added. It was not the original purpose of the test to keep track of new distributed tables which simply propagate on their own, so the test case is now modified: internal distributed tables are not part of the schema digest anymore, which means that changes inside them will not cause mismatches. This change involves a one-shot regeneration of all digests, which due to historical reasons included internal distributed tables in the digest, but no further regenerations should ever be necessary when a new internal distributed table is added.	2021-01-04 10:24:40 +01:00
Gleb Natapov	d3aa17591c	migration_manager: drop announce_locally flag It looks like the history of the flag begins in Cassandra's https://issues.apache.org/jira/browse/CASSANDRA-7327 where it is introduced to speed up tests by not needing to start the gossiper. The thing is we always start gossiper in our cql tests, so the flag only introduces noise. And, of course, since we want to move schema to use raft it goes against the nature of the raft to be able to apply modification only locally, so we better get rid of the capability ASAP. Tests: units(dev, debug) Message-Id: <20201230111101.4037543-2-gleb@scylladb.com>	2021-01-03 13:58:09 +02:00
Pavel Emelyanov	89fd524c5a	schema-tables: Add database argument to make_update_table_mutations There are 3 callers of this helper (cdc, migration manager and tests) and all of them already have the database object at hands. The argument will be used by next patch to remove call for global storage proxy instance from make_update_indices_mutations. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:21:22 +03:00
Piotr Jastrzebski	e9072542c1	Mark CDC as GA Enable CDC by default. Rename CDC experimental feature to UNUSED_CDC to keep accepting cdc flag. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-12 12:36:13 +01:00
Kamil Braun	ff78a3c332	cdc: rename CDC description tables... again Commit `a6ad70d3da` changed the format of stream IDs: the lower 8 bytes were previously generated randomly, now some of them have semantics. In particular, the least significant byte contains a version (stream IDs might evolve with further releases). This is a backward-incompatible change: the code won't properly handle stream IDs with all lower 8 bytes generated randomly. To protect us from subtle bugs, the code has an assertion that checks the stream ID's version. This means that if an experimental user used CDC before the change and then upgraded, they might hit the assertion when a node attempts to retrieve a CDC generation with old stream IDs from the CDC description tables and then decode it. In effect, the user won't even be able to start a node. Similarly as with the case described in `d89b7a0548`, the simplest fix is to rename the tables. This fix must get merged in before CDC goes out of experimental. Now, if the user upgrades their cluster from a pre-rename version, the node will simply complain that it can't obtain the CDC generation instead of preventing the cluster from working. The user will be able to use CDC after running checkAndRepairCDCStreams. Since a new table is added to the system_distributed keyspace, the cluster's schema has changed, so sstables and digests need to be regenerated for schema_digest_test.	2020-08-31 11:33:14 +03:00
Piotr Jastrzebski	c001374636	codebase wide: replace count with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously, the `count` function was often used in various ways. `contains` not only expresses the intent of the code better but also does it in a more unified way. This commit replaces all the occurrences of `count` with `contains`. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>	2020-08-15 20:26:02 +03:00
Pavel Emelyanov	8618a02815	migration_manager: Remove db/schema_tables.hh inclusion into header The schema_tables.hh -> migration_manager.hh couple seems to work as one of "single header for everything" creating big bloat for many seemingly unrelated .hh's. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-17 17:54:43 +03:00
Kamil Braun	1b7f1806ac	test: improve comments on test_schema_digest_does_not_change This test tends to cause a lot of discussion resulting from not understanding what is actually being tested. Closes https://github.com/scylladb/scylla/issues/6582.	2020-06-05 14:30:02 +02:00
Kamil Braun	d89b7a0548	cdc: rename CDC description tables Commit `968177da04` has changed the schema of cdc_topology_description and cdc_description tables in the system_distributed keyspace. Unfortunately this was a backwards-incompatible change: these tables would always be created, irrespective of whether or not "experimental" was enabled. They just wouldn't be populated with experimental=off. If the user now tries to upgrade Scylla from a version before this change to a version after this change, it will work as long as CDC is protected by the experimental flag and the flag is off. However, if we drop the flag, or if the user turns experimental on, weird things will happen, such as nodes refusing to start because they try to populate cdc_topology_description while assuming a different schema for this table. The simplest fix for this problem is to rename the tables. This fix must get merged in before CDC goes out of experimental. If the user upgrades his cluster from a pre-rename version, he will simply have two garbage tables that he is free to delete after upgrading. sstables and digests need to be regenerated for schema_digest_test since this commit effectively adds new tables to the system_distributed keyspace. This doesn't result in schema disagreement because the table is announced to all nodes through the migration manager.	2020-06-05 09:59:16 +02:00
Piotr Dulikowski	38b7f1ad45	unit tests: register cdc extension before tests In the following commits, using cdc in tests will require registering cdc extension explicitly in db config.	2020-03-05 16:11:20 +01:00
Konstantin Osipov	ff3f9cb7cf	test: stop using BOOST_TEST_MESSAGE() for logging We use boost test logging primarily to generate nice XML xunit files used in Jenkins. These XML files can be bloated with messages from BOOST_TEST_MESSAGE(), hundreds of megabytes of build archives, on every build. Let's use seastar logger for test logging instead, reserving the use of boost log facilities for boost test markup information.	2020-03-05 11:38:11 +03:00
Piotr Jastrzebski	04fe18de0f	system_distributed_keyspace: add cdc-related tables The cdc_topology_description table will be used internally by nodes to send new CDC stream generations to other nodes. The cdc_description table is a user-facing table, used to inform users about new sets of CDC streams. Regenerate sstables and digests for schema_change_test. We don't need to protect this change by a schema feature: when a node creates these tables, it announces them to all other nodes. If schema agreement happens before this migration, all nodes will use a digest calculated without these tables. If it happens after, then all nodes will eventually know about these tables and use a digest calculated with these tables.	2020-01-30 11:10:08 +01:00
Pavel Emelyanov	9e4b41c32a	tests: Switch on migration notifier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Piotr Jastrzebski	396e35bf20	cdc: add schema_change test for cdc_options The original "test_schema_digest_does_not_change" test case ensures that schema digests will match for older nodes that do not support all the features yet (including computed columns). The additional case uses sstables generated after CDC was enabled and a table with CDC enabled is created, in order to make sure that the digest computed including CDC column does not change spuriously as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	caa0a4e154	tests: disable CDC in schema_change_tests Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Avi Kivity	c150f2e5d7	schema_tables, cdc: don't store empty cdc columns in scylla_tables An empty cdc column in scylla_tables is hashed differently from a missing column. This causes schema mismatch when a schema is propagated to another node, because the other node will redact the schema column completely if the cluster feature isn't enabled, and an empty value is hashed differently from a missing value. Store a tombstone instead. Tombstones are removed before digesting, so they don't affect the outcome. This change also undoes the changes in `386221da84` ("schema_tables: handle 'cdc' options") to schema_change_test test_merging_does_not_alter_tables_which_didnt_change. That change enshrined the breakage into the test, instead of fixing the root cause, which was that we added an extra mutation to the schema (for cdc options, which were disabled).	2020-01-05 14:36:18 +02:00
Konstantin Osipov	1c8736f998	tests: move all test source files to their new locations 1. Move tests to test (using singular seems to be a convention in the rest of the code base) 2. Move boost tests to test/boost, other (non-boost) unit tests to test/unit, tests which are expected to be run manually to test/manual. Update configure.py and test.py with new paths to tests.	2019-12-16 17:47:42 +03:00

37 Commits