Commit Graph

37 Commits

Author SHA1 Message Date
Piotr Sarna
151d8f7c58 test: regenerate schema_change_test for storage options case
Keyspace storage options series adds a new schema table:
system_schema.scylla_keyspaces. The regenerated cases ensure
that this new table is taken into account when the schema feature
is available.
2022-04-08 09:17:01 +02:00
Piotr Sarna
4705a5fa42 test: improve output of schema_change_test regeneration
Schema change test operates on pre-generated sstables, and sometimes
this set of sstables needs to be regenerated. In order to make the
regeneration process more ergonomic, the output is now directly
copyable as valid C++ representation of UUIDs.
2022-04-08 09:17:01 +02:00
Nadav Har'El
7be3129458 cdc: don't need current keyspace to create the log table
CDC registers to the table-creation hook (before_create_column_family)
to add a second table - the CDC log table - to the same keyspace.
The handler function (on_before_update_column_family() in cdc/log.cc)
wants to retrieve the keyspace's definition, but that does NOT WORK if
we create the keyspace and table in one operation (which is exactly what
we intend to do in Alternator to solve issue #9868) - because at the
time of the hook, the keyspace does not yet exist in the schema.

It turns out that on_before_update_column_family() does not REALLY need
the keyspace. It needed it to pass it on to make_create_table_mutations()
but that function doesn't use the keyspace parameter passed to it! All
it needs is the keyspace's name - which is in the schema anyway and
doesn't need to be looked up.

So in this patch we fix make_create_table_mutations() to not require the
unused keyspace parameter - and fix the CDC code not to look for the
keyspace that is no longer needed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220215162342.622509-1-nyh@scylladb.com>
2022-02-16 08:38:56 +02:00
Kamil Braun
a664ac7ba5 treewide: require group0_guard when performing schema changes
`announce` now takes a `group0_guard` by value. `group0_guard` can only
be obtained through `migration_manager::start_group0_operation` and
moved, it cannot be constructed outside `migration_manager`.

The guard will be a method of ensuring linearizability for group 0
operations.
2022-01-24 15:20:35 +01:00
Kamil Braun
283ac7fefe treewide: pass mutation timestamp from call sites into migration_manager::prepare_* functions
The functions which prepare schema change mutations (such as
`prepare_new_column_family_announcement`) would use internally
generated timestamps for these mutations. When schema changes are
managed by group 0 we want to ensure that timestamps of mutations
applied through Raft are monotonic. We will generate these timestamps at
call sites and pass them into the `prepare_` functions. This commit
prepares the APIs.
2022-01-24 15:12:50 +01:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Gleb Natapov
f0a41c102a test: move schema_change_test.cc to new schema announcement api 2022-01-13 23:10:18 +02:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Gleb Natapov
38e1f85959 migration_manager: drop view_ptr array from announce_column_family_update()
No users pass it any longer.
2021-12-11 12:31:07 +02:00
Botond Dénes
3f4f408bcf schema: add get_reversed()
A variant of make_reversed() which goes through the schema registry,
teaching the schema to the registry if necessary. This effectively
caches the result of the reversing and as an added bonus double
reversing yields the very same schema C++ object that was the starting
point.

Closes #9365
2021-09-22 18:55:25 +03:00
Tomasz Grabiec
83113d8661 Merge "raft: new schema for storing raft snapshots" from Pavel Solodovnikov
Previously, the layout for storing raft snapshot
descriptors contained a `config` field, which had `blob`
data type.

That means `raft::configuration` for the snapshot was serialized
as a whole in binary form. It's convenient to implement and
is the most compact form of representing the data, but:

1. Hard to debug due to the need to de-serialize the data.
2. Plants a time bomb wrt. changing data layout and also the
   documentation in the future.

Remove the `config` field from `system.raft_snapshots` and
extract it to a separate `system.raft_config` table to store
the data in exploded form.

Also, modify the schema of `system.raft_snapshots` table in
the following way: add a `server_id` field as a part of
composite partition key ((group_id, server_id)) to
be able to start multiple raft servers belonging to one raft
group on the same scylla node.

Rename `id` field in `raft_snapshots` to `snapshot_id` so
it's self-documenting.

Rename `snapshot_id` from clustering key since a given server
can have only one snapshot installed at a time.

Note that the `raft::server_address` stucture contains an opaque
`info` member, which is `bytes`, but in the `raft_config` table
we use `ip_addr inet` field, instead. We always know that the
corresponding member field is going to contain an IP address (either v4
or v6) of a given raft server.

So, now the snapshots schema looks like this:

    CREATE TABLE raft_snapshots (
        group_id timeuuid,
        server_id uuid,
        snapshot_id uuid,
        idx int,
        term int,
        -- no `config` field here, moved to `raft_config` table
        PRIMARY KEY ((group_id, server_id))
    )

    CREATE TABLE raft_config (
         group_id timeuuid,
         my_server_id uuid,
         server_id uuid,
         disposition text, -- can be either 'CURRENT` or `PREVIOUS'
         can_vote bool,
         ip_addr inet,
         PRIMARY KEY ((group_id, my_server_id), server_id, disposition)
    );

This way it's much easier to extend the schema with new fields,
very easy to debug and inspect via CQL, and it's much more descriptive
in terms of self-documentation.

Tests: unit(dev)

* manmanson/raft_snapshots_new_schema_v2:
  test: adjust `schema_change_test` to include new `system.raft_config` table
  raft: new schema for storing raft snapshots
  raft: pass server id to `raft_sys_table_storage` instance
2021-09-10 20:41:59 +02:00
Botond Dénes
f200c8104a schema: introduce make_reversed()
`make_revered()` creates a schema identical to the schema instance it is
called on, with clustering order reversed. To distinguish the reverse
schema from the original one, the node-id part of its version UUID is
bit-flipped. This ensures that reversing a schema twice will result in
the identical schema to the original one (although a different C++
object).

This reversed schema will be used in reversed reads, so intermediate
layers can be ignorant of the fact that the read happens in reverse.
2021-09-09 11:49:05 +03:00
Pavel Solodovnikov
b00443ab87 test: adjust schema_change_test to include new system.raft_config table
Check that the new table uses null sharder.

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-08-27 09:30:17 +03:00
Pavel Solodovnikov
c0854a0f62 raft: create system tables only when raft experimental feature is set
Also introduce a tiny function to return raft-enabled db config
for cql testing.

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210826091432.279532-1-pa.solodovnikov@scylladb.com>
2021-08-26 12:21:12 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Piotr Sarna
7e6beabf27 migration_manager: allow table updates with timestamp
In order to avoid needless schema disagreements, a way of announcing
a schema change with fixed timestamp is added.
That way, when nodes update schemas of their internal tables (e.g.
during updates), it's possible for all nodes to use an identical
timestamp for this operation, which in turn makes their digests
identical.
2021-05-10 10:10:38 +02:00
Pavel Emelyanov
37c91c4c5c tests: Use migration_manager from cql_test_env
All the tests that need migration manager are run inside
cql_test_env context and can use the migration manager
from the env. For now this is still the global one, but
next patch will change this.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Avi Kivity
daeddda7cc treewide: remove inclusions of storage_proxy.hh from headers
storage_proxy.hh is huge and includes many headers itself, so
remove its inclusions from headers and re-add smaller headers
where needed (and storage_proxy.hh itself in source files that
need it).

Ref #1.
2021-04-20 21:23:00 +03:00
Pavel Solodovnikov
9d17a654a6 raft: use null_sharder for raft tables
Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210201105300.110210-1-pa.solodovnikov@scylladb.com>
2021-02-01 18:52:04 +02:00
Kamil Braun
bf115e7d69 schema_tables: put schema tables on shard 0
We use a custom sharder for all schema tables: every table under
the `system_schema` keyspace, plus `system.scylla_table_schema_history`.
This sharder puts all data on shard 0.

To achieve this, we hardcode the sharder in initial schema object
definitions. Furthermore - since the sharder is not stored inside schema
mutations yet - whenever we deserialize schema objects from mutations,
we modify the sharder based on the schema's keyspace and table names.

A regression test is added to ensure no one forgets to set the special
sharder for newly added schema tables. This test assumes that all newly
added schema tables will end up in the `system_schema` keyspace (other
tables may go unnoticed, unfortunately).

Closes #7947
2021-01-28 13:28:22 +02:00
Piotr Sarna
e26aa836a9 schema_change_test: skip distributed system tables in digest
With previous design of the schema change test, a regeneration
was necessary each time a new distributed system table was added.
It was not the original purpose of the test to keep track of new
distributed tables which simply propagate on their own,
so the test case is now modified: internal distributed tables
are not part of the schema digest anymore, which means that
changes inside them will not cause mismatches.

This change involves a one-shot regeneration of all digests,
which due to historical reasons included internal distributed
tables in the digest, but no further regenerations should ever
be necessary when a new internal distributed table is added.
2021-01-04 10:24:40 +01:00
Gleb Natapov
d3aa17591c migration_manager: drop announce_locally flag
It looks like the history of the flag begins in Cassandra's
https://issues.apache.org/jira/browse/CASSANDRA-7327 where it is
introduced to speedup tests by not needing to start the gossiper.
The thing is we always start gossiper in our cql tests, so the flag only
introduce noise. And, of course, since we want to move schema to use raft
it goes against the nature of the raft to be able to apply modification only
locally, so we better get rid of the capability ASAP.

Tests: units(dev, debug)
Message-Id: <20201230111101.4037543-2-gleb@scylladb.com>
2021-01-03 13:58:09 +02:00
Pavel Emelyanov
89fd524c5a schema-tables: Add database argument to make_update_table_mutations
There are 3 callers of this helper (cdc, migration manager and tests)
and all of them already have the database object at hands.

The argument will be used by next patch to remove call for global
storage proxy instance from make_update_indices_mutations.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-12-11 21:21:22 +03:00
Piotr Jastrzebski
e9072542c1 Mark CDC as GA
Enable CDC by default.
Rename CDC experimental feature to UNUSED_CDC to keep accepting cdc
flag.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-12 12:36:13 +01:00
Kamil Braun
ff78a3c332 cdc: rename CDC description tables... again
Commit a6ad70d3da changed the format of
stream IDs: the lower 8 bytes were previously generated randomly, now
some of them have semantics. In particular, the least significant byte
contains a version (stream IDs might evolve with further releases).

This is a backward-incompatible change: the code won't properly handle
stream IDs with all lower 8 bytes generated randomly. To protect us from
subtle bugs, the code has an assertion that checks the stream ID's
version.

This means that if an experimental user used CDC before the change and
then upgraded, they might hit the assertion when a node attempts to
retrieve a CDC generation with old stream IDs from the CDC description
tables and then decode it.
In effect, the user won't even be able to start a node.

Similarly as with the case described in
d89b7a0548, the simplest fix is to rename
the tables. This fix must get merged in before CDC goes out of
experimental.

Now, if the user upgrades their cluster from a pre-rename version, the
node will simply complain that it can't obtain the CDC generation
instead of preventing the cluster from working. The user will be able to
use CDC after running checkAndRepairCDCStreams.

Since a new table is added to the system_distributed keyspace, the
cluster's schema has changed, so sstables and digests need to be
regenerated for schema_digest_test.
2020-08-31 11:33:14 +03:00
Piotr Jastrzebski
c001374636 codebase wide: replace count with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
`count` function was often used in various ways.

`contains` does not only express the intend of the code better but also
does it in more unified way.

This commit replaces all the occurences of the `count` with the
`contains`.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
2020-08-15 20:26:02 +03:00
Pavel Emelyanov
8618a02815 migration_manager: Remove db/schema_tables.hh inclustion into header
The schema_tables.hh -> migration_manager.hh couple seems to work as one
of "single header for everyhing" creating big blot for many seemingly
unrelated .hh's.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:54:43 +03:00
Kamil Braun
1b7f1806ac test: improve comments on test_schema_digest_does_not_change
This test tends to cause a lot of discussion resulting from
not understanding what is actually being tested.

Closes https://github.com/scylladb/scylla/issues/6582.
2020-06-05 14:30:02 +02:00
Kamil Braun
d89b7a0548 cdc: rename CDC description tables
Commit 968177da04 has changed the schema
of cdc_topology_description and cdc_description tables in the
system_distributed keyspace.

Unfortunately this was a backwards-incompatible change: these tables
would always be created, irrespective of whether or not "experimental"
was enabled. They just wouldn't be populated with experimental=off.

If the user now tries to upgrade Scylla from a version before this change
to a version after this change, it will work as long as CDC is protected
b the experimental flag and the flag is off.

However, if we drop the flag, or if the user turns experimental on,
weird things will happen, such as nodes refusing to start because they
try to populate cdc_topology_description while assuming a different schema
for this table.

The simplest fix for this problem is to rename the tables. This fix must
get merged in before CDC goes out of experimental.
If the user upgrades his cluster from a pre-rename version, he will simply
have two garbage tables that he is free to delete after upgrading.

sstables and digests need to be regenerated for schema_digest_test since
this commit effectively adds new tables to the system_distributed keyspace.
This doesn't result in schema disagreement because the table is
announced to all nodes through the migration manager.
2020-06-05 09:59:16 +02:00
Piotr Dulikowski
38b7f1ad45 unit tests: register cdc extension before tests
In the following commits, using cdc in tests will require registering
cdc extension explicitly in db config.
2020-03-05 16:11:20 +01:00
Konstantin Osipov
ff3f9cb7cf test: stop using BOOST_TEST_MESSAGE() for logging
We use boost test logging primarily to generate nice XML xunit
files used in Jenkins. These XML files can be bloated
with messages from BOOST_TEST_MESSAGE(), hundreds of megabytes
of build archives, on every build.

Let's use seastar logger for test logging instead, reserving
the use of boost log facilities for boost test markup information.
2020-03-05 11:38:11 +03:00
Piotr Jastrzebski
04fe18de0f system_distributed_keyspace: add cdc-related tables
The cdc_topology_description table will be used internally
by nodes to send new CDC stream generations to other nodes.

The cdc_description table is a user-facing table,
used to inform users about new sets of CDC streams.

Regenerate sstables and digests for schema_change_test.
We don't need to protect this change by a schema feature:
when a node creates these tables, it announces them
to all other nodes. If schema agreement happens before
this migration, all nodes will use a digest calculated
without these tables. If it happens after, then all nodes
will eventually know about these tables and use a digest
calculated with these tables.
2020-01-30 11:10:08 +01:00
Pavel Emelyanov
9e4b41c32a tests: Switch on migration notifier
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Piotr Jastrzebski
396e35bf20 cdc: add schema_change test for cdc_options
The original "test_schema_digest_does_not_change" test case ensures
that schema digests will match for older nodes that do not support
all the features yet (including computed columns).
The additional case uses sstables generated after CDC was enabled
and a table with CDC enabled is created,
in order to make sure that the digest computed
including CDC column does not change spuriously as well.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Piotr Jastrzebski
caa0a4e154 tests: disable CDC in schema_change_tests
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-01-05 14:39:23 +02:00
Avi Kivity
c150f2e5d7 schema_tables, cdc: don't store empty cdc columns in scylla_tables
An empty cdc column in scylla_tables is hashed differently from
a missing column. This causes schema mismatch when a schema is
propagated to another node, because the other node will redact
the schema column completely if the cluster feature isn't enabled,
and an empty value is hashed differently from a missing value.

Store a tombstone instead. Tombstones are removed before
digesting, so they don't affect the outcome.

This change also undoes the changes in 386221da84 ("schema_tables:
 handle 'cdc' options") to schema_change_test
test_merging_does_not_alter_tables_which_didnt_change. That change
enshrined the breakage into the test, instead of fixing the root cause,
which was that we added an an extra mutation to the schema (for
cdc options, which were disabled).
2020-01-05 14:36:18 +02:00
Konstantin Osipov
1c8736f998 tests: move all test source files to their new locations
1. Move tests to test (using singular seems to be a convention
   in the rest of the code base)
2. Move boost tests to test/boost, other
   (non-boost) unit tests to test/unit, tests which are
   expected to be run manually to test/manual.

Update configure.py and test.py with new paths to tests.
2019-12-16 17:47:42 +03:00