Commit Graph

30167 Commits

Author SHA1 Message Date
Botond Dénes
dec4e5659b test/boost/mutation_test: simplify test_compaction_data_stream_split test
This test has very elaborate infrastructure essentially duplicating
mutation, mutation::apply() and mutation::operator==. Drop all this
extra code and use mutations directly instead. This makes migrating the
test to v2 easier.
2022-02-21 12:29:24 +02:00
Botond Dénes
2941803da0 mutation_partition: do_compact(): do drop row tombstones covered by higher order tombstones
The comment on the public methods calling said method promises to do so
but doesn't actually follows through. This patch fixes this for row
tombstones, to mirror the behaviour of the mutation compactor. This is
especially important for tests that compare mutations compacted with
different methods.
2022-02-21 12:29:24 +02:00
Botond Dénes
f2e2b84038 multishard_mutation_query: migrate to v2
Mostly mechanical transformation. The main difference is in the detached
compaction state, from which we now get the range tombstone change,
instead of the range tombstone list. The code around this is a bit
awkward, will become simpler when compactor drops v1 support.
2022-02-21 12:29:24 +02:00
Botond Dénes
b330cba792 mutation_fragment_v2: range_tombstone_change: add memory_usage() 2022-02-21 12:29:24 +02:00
Botond Dénes
9e48237b86 evictable_reader_v2: terminate active range tombstones on reader recreation
Reader recreation messes with the continuity of the mutation fragment
stream because it breaks snapshot isolation. We cannot guarantee that a
range tombstone or even the partition started before will continue after
too. So we have to make sure to wrap up all loose threads when
recreating the reader. We already close uncontinued partitions. This
commit also takes care of closing any range tombstone started by
unconditionally emitting a null range tombstone. This is legal to do,
even if no range tombstone was in effect.
2022-02-21 12:29:24 +02:00
Botond Dénes
6db08ddeb2 evictable_reader_v2: restore handling of non-monotonically increasing positions
We thought that unlike v1, v2 will not need this. But it does.
Handled similarly to how v1 did it: we ensure each buffer represents
forward progress, when the last fragment in the buffer is a range
tombstone change:
* Ensure the content of the buffer represents progress w.r.t.
  _next_position_in_partition, thus ensuring the next time we recreate
  the reader it will continue from a later position.
* Continue reading until the next (peeked) fragment has a strictly
  larger position.

The code is just much nicer because it uses coroutines.
2022-02-21 12:29:24 +02:00
Botond Dénes
498d03836b evictable_reader_v2: simplify handling of reader recreation
The evictable reader has a handful of flags dictating what to do after
the reader is recreated: what to validate, what to drop, etc. We
actually need a single flag telling us if the reader was recreated or
not, all other things can be derived from existing fields.
This patch does exactly that. Furthermore it folds do_fill_buffer() into
fill_buffer() and replaces the awkward to use `should_drop_fragment()`
with `examine_first_fragments()`, which does a much better job of
encapsulating all validation and fragment dropping logic.
This code reorganization also fixes two bugs introduced by the v2
conversion:
* The loop in `do_fill_buffer()` could become infinite in certain
  circumstances due to a difference between the v1 and v2 versions of
  `is_end_of_stream()`.
* The position of the first non-dropped fragment is was not validated
  (this was integrated into the range tombstone trimming which was
  thrown out by the conversion).
2022-02-21 12:29:24 +02:00
Botond Dénes
d4ac473f7d mutation: counter_write_query: use v2 reader 2022-02-21 12:27:55 +02:00
Botond Dénes
fcda35d08e mutation: migrate consume() to v2
The underlying mutation format is still v1, so consume() ends up doing
an online conversion. This allows converting all downstream code to v2,
leaving the conversion close to the code that is yet to be migrated to
v2 native: the mutation itself.
2022-02-21 12:27:55 +02:00
Botond Dénes
1fa6537a2f mutation_fragment_v2,flat_mutation_reader_v2: mirror v1 concept organization
Currently all concepts are in mutation_fragment_v2.hh and
flat_mutation_reader_v2.hh. Organize concepts similar to how the v1 ones
are: move high-level consume concepts into
mutation_consumer_concepts.hh.
2022-02-21 12:27:55 +02:00
Botond Dénes
fb0e0ec7c1 mutation_reader: compacting_reader: require a v2 input reader
Before we add a v2 output option to the compactor, we want to get rid of
all the v1 inputs to make it simpler. This means that for a while the
compacting reader will be in a strange place of having a v2 input and a
v1 output. Hopefully, not for long.
2022-02-21 12:27:55 +02:00
Botond Dénes
45b36d91c6 db/view/view_builder: use v2 reader 2022-02-21 12:27:55 +02:00
Botond Dénes
bba20f5cce test/lib/flat_mutation_reader_assertions: adjust has_monotonic_positions() to v2 spec
The v2 spec allows for non-strictly monotonically increasing positions,
but has_monotonic_positions() tried to enforce it. Relax the check so it
conforms to the spec.
2022-02-21 12:27:55 +02:00
Nadav Har'El
f292d3d679 alternator: make schema modifications in CreateTable atomic
The Alternator CreateTable operation currently performs several schema-
changing operations separately - one by one: It creates a keyspace,
a table in that keyspace and possibly also multiple views, and it sets
tags on the table. A consequence of this is that concurrent CreateTable
and DeleteTable operations (for example) can result in unexpected errors
or inconsistent states - for example CreateTable wants to create the
table in the keyspace it just created, but a concurrent DeleteTable
deleted it. We have two issues about this problem (#6391 and #9868)
and three tests (test_table.py::test_concurrent_create_and_delete_table)
reproducing it.

In this patch we fix these problems by switching to the modern Scylla
schema-changing API: Instead of doing several schema-changing
operations one by one, we create a vector of schema mutation performing
all these operations - and then perform all these mutations together.

When the experimental Raft-based schema modifications is enabled, this
completely solves the races, and the tests begin to pass. However, if
the experimental Raft mode is not enabled, these tests continue to fail
because there is still no locking while applying the different schema
mutations (not even on a single node). So I put a special fixture
"fails_without_raft" on these tests - which means that the tests
xfail if run without raft, and expected to pass when run on Raft.

Indeed, after this patch
test/alternator/run --raft test_table.py::test_concurrent_create_and_delete_table

shows three passing tests (they also pass if we drastically improve the
number of iterations), while
test/alternator/run test_table.py::test_concurrent_create_and_delete_table

shows three xfailing tests.

All other Alternator tests pass as before with this patch, verifying
that the handling of new tables, new views, tags, and CDC log tables,
all happen correctly even after this patch.

A note about the implementation: Before this patch, the CreateTable code
used high-level functions like prepare_new_column_family_announcement().
These high-level functions become unusable if we write multiple schema
operations to one list of mutations, because for example this function
validates that the keyspace had already been created - when it hasn't
and that's the whole point. So instead we had to use lower-level
function like add_table_or_view_to_schema_mutation() and
before_create_column_family(). However, despite being lower level,
these functions were public so I think it's reasonable to use them,
and we probably have no other alternative.

Fixes #6391
Fixes #9868

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-02-18 09:03:52 +02:00
Nadav Har'El
46120ca4f4 Merge 'tools/scylla-sstable: change output of dump commands to JSON' from Botond Dénes
Replacing the previous text output with the exception of the dump-data
command. The text output was supposed to be human-friendly but it is not
really human friendlier than a well formatted JSON, the latter having
the additional advantage of being machine friendly too. Although the
text output already exists, having just one output format makes the code
much simpler and easier to maintain so we chose not to pay the higher
maintenance price for a format that is not expected to see much (if any)
use.
Although the JSON written by the tool is not formatted, it can easily be
formatted by e.g. piping it through `jq`. The latter also allows lookup
of specific field(s).
The JSON schema of each command is documented in the --help output of
the respective command (e.g. scylla sstable data-dump --help) .

We keep the text output of the dump-data command as this is using
scylla's built-in printer that we also use in logging and tests. Some
people might be used to this format, so leave it in: the code already
exists for it and lives in scylla core, so we don't need to maintain it
separately. The default output-format of dump-data is now JSON.

A smoke test suite is added for the dump commands too. The tests only
check that some output is present and that it is valid JSON.

Refs: #9882

Tests: unit(dev)

Also on: https://github.com/denesb/scylla.git scylla-sstable-json/v2

Changelog

v3:
* Rebase on recent master (which has the required seastar fixes for
  debug tests)

v2:
* Document the JSON schema of each command.
* Use the SAX-style API of rapidjson to generate streaming JSON, instead
  of hand-generating it.

Closes #10074

* github.com:scylladb/scylla:
  test/cql-pytest: add tests for scylla-sstable's dump commands
  test/cql-pytest: prepare for tool tests
  tools/schema_loader: auto-create the keyspace for all statements
  tools/scylla-sstable: change output of dump-scylla-metadata to json
  tools/scylla-sstable: change output of dump-statistics to json
  tools/scylla-sstable: change output of dump-summary to json
  tools/scylla-sstable: change output of dump-compression-info to json
  tools/scylla-sstable: change output of dump-index to json
  tools/scylla-sstable: add json support in --dump-data
  tools/scylla-sstable: add json_writer
  tools/scylla-sstable: use fmt::print in --dump-data
  tools/scylla-sstable: prepare --dump-data for multiple output formats
2022-02-18 07:52:19 +02:00
Botond Dénes
1e038b40cf test/cql-pytest: add tests for scylla-sstable's dump commands
The tests are smoke-tests: they mostly check that scylla doesn't crash
while dumping and it produces *some* output. When dumping json, the test
checks that it is valid json.
2022-02-17 15:24:24 +02:00
Botond Dénes
afab1a97c6 test/cql-pytest: prepare for tool tests
We want to add tool tests. These tests will have to invoke scylla
executable (as tools are hosted by the latter) and they want access to
the scylla data directories. Propagate the scylla path and data
directory used from `run` into the test suite via pytest request
parameters.
2022-02-17 15:24:24 +02:00
Botond Dénes
96082631c8 tools/schema_loader: auto-create the keyspace for all statements
Currently the keyspace is only auto-created for create type statements.
However the keyspace is needed even without UDTs being involved: for
example if the table contains a collection type. So auto-create the
keyspace unconditionally before preparing the first statement.

Also add a test-case with a create table statement which requires the
keyspace to be present at prepare time.
2022-02-17 15:24:24 +02:00
Botond Dénes
59ce247164 tools/scylla-sstable: change output of dump-scylla-metadata to json 2022-02-17 15:24:24 +02:00
Botond Dénes
2a7ed8212f tools/scylla-sstable: change output of dump-statistics to json 2022-02-17 15:24:24 +02:00
Botond Dénes
a617e66878 tools/scylla-sstable: change output of dump-summary to json 2022-02-17 14:17:11 +02:00
Botond Dénes
fb6b7c8036 tools/scylla-sstable: change output of dump-compression-info to json 2022-02-17 14:17:11 +02:00
Botond Dénes
f5c6d7e12e tools/scylla-sstable: change output of dump-index to json 2022-02-17 14:17:11 +02:00
Botond Dénes
bdbbda29c1 tools/scylla-sstable: add json support in --dump-data
But keep the old text output-format too. One can switch between the two
with the --output-format flag, which defaults to "json".
2022-02-17 14:17:11 +02:00
Botond Dénes
03bbf1b362 tools/scylla-sstable: add json_writer
Wrapping a rapidjson::Writer<> and mirrors the latter's API, providing
more convenient overloads for the Key() and String() methods, as well as
providing some extra, scylla-sstable specific methods too.
2022-02-17 14:17:11 +02:00
Botond Dénes
72f27c8782 tools/scylla-sstable: use fmt::print in --dump-data
The rest of the code is standardizing on fmt::print(), bring the code
for --dump-data in line.
2022-02-17 14:17:11 +02:00
Botond Dénes
ba2a61b2bc tools/scylla-sstable: prepare --dump-data for multiple output formats
Extract the actual dumping code into a separate class, which also
implements sstable_consumer interface. The dumping consumer now just
forwards calls to actual dumper through the abstract consumer interface,
allowing different concrete dumpers to be instantiated.
2022-02-17 14:17:11 +02:00
Piotr Dulikowski
adfd9d2f7a abstract_read_resolver::fail_request: make non-virtual
This method is not overrided by any of the derived classes, so it does
not need to be virtual.

(cherry picked from commit b7fb93dc46531bca8db535301a069df52991f9d9)
2022-02-17 12:34:37 +02:00
Michael Livshin
f8d4bafa5a to_string.hh: include <map>
The code uses `std::map`, so it should include the definition
explicitly.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-17 08:53:48 +02:00
Michael Livshin
a657dc9787 scylla-gdb.py: set source language to c++
When you interrupt a process in gdb using Ctrl-C or attach gdb to a
running process, usually gdb will show the current frame as
`syscall()` (no source information).  But in some less usual setups
gdb may happen to know that `syscall()` is implemented in assembly,
and even knows which line is current in which assembly file.

An unfortunate effect of gdb knowing that the current frame's source
language is assembly is that since assembly is not C++, gdb's
expression parser switches to "auto" while in the `syscall()` stack
frame.  And in the "auto" language explicit C++ global namespace
references like "::debug::the_database" are not syntactically valid,
which renders much of scylla-gdb.py unusable unless you remember to go
up the call stack before doing anything.

But since scylla-gdb.py is there to help debug Scylla, and Scylla is
written in C++, we can just set gdb source language to "c++" and avoid
the problem.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Message-Id: <20220216235301.1206341-1-michael.livshin@scylladb.com>
2022-02-17 08:43:59 +02:00
Botond Dénes
948bc359c2 Merge "ME sstable format support" from Michael Livshin
"
This series implements support for the ME sstable format (introduced
in C* 3.11.11).

Tests: unit(dev)
"

* tag 'me-sstable-format-v5' of https://github.com/cmm/scylla:
  sstables: validate originating host id
  sstable: add is_uploaded() predicate
  config: make the ME sstable format default
  scylla-gdb.py: recognize ME sstables
  sstables: store originating host id in stats metadata
  system_keyspace: cache local host id before flushing
  database_test: ensure host id continuity
  sstables_manager: add get_local_host_id() method and support
  sstables_manager: formalize inheritability
  system_keyspace, main: load (or create) local host id earlier
  sstable_3_x_test: test ME sstable format too
  add "ME_SSTABLE" cluster feature
  add "sstable_format" config
  add support for the ME sstable format
  scylla-sstable: add ability to dump optionals and utils::UUID
  sstables: add ability to write and parse optionals
  globalize sstables::write(..., utils::UUID)
2022-02-16 18:28:16 +02:00
Michael Livshin
79bf79ebd3 sstables: validate originating host id
Add an additional sstable validation step to check that originating
host id matches the local host id.

This is only done for ME-and-up sstables, which do not come from
upload/, and when the local host id is known.

When local host id is unknown, check that the sstable belongs to a
system keyspace, i.e. whether it is plausible that Scylla is still
booting up and hasn't loaded/generated the local host id yet.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
3511d7cd21 sstable: add is_uploaded() predicate
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
3bf1e137fc config: make the ME sstable format default
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
0ca58096cf scylla-gdb.py: recognize ME sstables
Also use the opportunity to unify two closely-related lists into a
dictionary.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
dd4e330cc5 sstables: store originating host id in stats metadata
With this change, ME sstables start carrying their originating host
id, which makes ME format feature-complete so it can be made default.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
0ccd56e036 system_keyspace: cache local host id before flushing
Later in this series the ME sstable format is made default, which
means that `system.local` will likely be written as ME.

Since, in ME, originating host id is a part of sstable stats metadata,
the local host id needs to either already be cached by the time
`system.local` is flushed, or to somehow be special-case-ignored when
flushing `system.local`.

The former (done here) is optimistic (cache before flush), but the
alternative would be an abstraction violation and would also cost a
little time upon each sstable write.

(Cache-before-flush could be undone by catching any exceptions during
flush and un-caching, but inability to `co_await` in catch clauses
makes the code look rather awkward.  And there is no need to bother
because bootstrap failures should be fatal anyway)

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
d8cc535297 database_test: ensure host id continuity
The "populate_from_quarantine_works" test case creates sstables with
one db config, then reads them with another.  Ensure that both configs
have the same host id so the sstables pass validation.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
3fef604075 sstables_manager: add get_local_host_id() method and support
Since ME sstable format includes originating host id in stats
metadata, local host id needs to be made available for writing and
validation.

Both Scylla server (where local host id comes from the `system.local`
table) and unit tests (where it is fabricated) must be accomodated.
Regardless of how the host id is obtained, it is stored in the db
config instance and accessed through `sstables_manager`.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
0895188851 sstables_manager: formalize inheritability
The class is already inherited from in tests (along with overriding a
non-virtual method), so this seems to be called for.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
7d2af177eb system_keyspace, main: load (or create) local host id earlier
We want it to be cached before any sstable is written, so do it right
after system_keyspace::minimal_setup().

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
387c882dc7 sstable_3_x_test: test ME sstable format too
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
d370558279 add "ME_SSTABLE" cluster feature
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
0b1447c702 add "sstable_format" config
Initialize it to "md" until ME format support is
complete (i.e. storing originating host id in sstable stats metadata
is implemented), so at present there is no observable change by
default.

Also declare "enable_sstables_md_format" unused -- the idea, going
forward, being that only "sstable_format" controls the written sstable
file format and that no more per-format enablement config options
shall be added.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
c96708d262 add support for the ME sstable format
The ME format has been introduced in Cassandra 3.11.11:

11952fae77/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java (L123)
d84c6e9810

It adds originating host id to sstable metadata in support of fixing
loss of commit log data when moving sstables between nodes:

https://issues.apache.org/jira/browse/CASSANDRA-16619

In Scylla:

* The supported way to ingest sstables is via upload/, where stored
  commit log replay position should be disregarded (but see
  https://github.com/scylladb/scylla/issues/10080).

* A later commit in this series implements originating host id
  validation for native ME sstables.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
3712a82ca7 scylla-sstable: add ability to dump optionals and utils::UUID
Needed for the ME sstable format.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
26bae0cd39 sstables: add ability to write and parse optionals
(that is, instances of `std::optional`).

The ME sstable format includes optional originating host id in stats
metadata.  We know how to write and parse uuids, but not how to write
and parse optionals.

The format is (used by C* in this case, and also happens to be
consistent with how booleans are serialized): first a boolean
indicating whether the contents are present (0 or 1, as a byte), then
the contents (if any).

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:23 +02:00
Michael Livshin
c00d272b16 globalize sstables::write(..., utils::UUID)
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:23 +02:00
Benny Halevy
b7b0c19fdc test: uuid: cement the assumption that default and null uuid are equal
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220216081623.830627-2-bhalevy@scylladb.com>
2022-02-16 10:19:47 +02:00
Benny Halevy
489e50ef3a utils: uuid: make operator bool explicit
Following up on 69fcc053bb

To prevent unintentional implicit conversions
e.g. to a number.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220216081623.830627-1-bhalevy@scylladb.com>
2022-02-16 10:19:47 +02:00