Commit Graph

30983 Commits

Author SHA1 Message Date
Botond Dénes
f527956cdb readers: remove v1 empty_reader
The only user is row level repair: it is replaced with
downgrade_to_v1(make_empty_flat_reader_v2()). The row level reader has
lots of downgrade_to_v1() calls, we will deal with these later all at
once.
Another use is the empty mutation source, this is trivially converted to
use the v2 variant.
2022-04-28 14:12:24 +03:00
Botond Dénes
ea37e9c04e readers: remove v1 delegating_reader
The only user is a test, which is hereby converted to use the v2
delegating reader.
2022-04-28 14:12:24 +03:00
Botond Dénes
70d019116f sstables/kl: make reader impl v2 native
The conversion is shallow: the meat of the logic remains v1, fragments
are converted to v2 right before being pushed into the buffer. This
approach is simple, surgical and is still better than a full
upgrade_to_v2().
2022-04-28 14:12:24 +03:00
Botond Dénes
a22b02c801 sstables/kl: return v2 reader from factory methods
This just moves the upgrade_to_v2() calls to the other side of said
factory methods, preparing the ground for converting the kl reader impl
to a native v2 one.
2022-04-28 14:12:24 +03:00
Botond Dénes
4b222e7f37 sstables: move mp_row_consumer_reader_k_l to kl/reader.cc
Its only user is in said file, so that is a better place for it.
2022-04-28 14:12:24 +03:00
Botond Dénes
4f77e74bd4 partition_snapshot_reader: convert implementation to native v2
The underlying mutation representation is still v1, so the
implementation still has to do conversion. This happens right above the
lsa reader component.
2022-04-28 14:12:12 +03:00
Botond Dénes
9c7455825b mutation_fragment_v2: range_tombstone_change: add minimal_memory_usage()
2022-04-28 14:11:51 +03:00
Avi Kivity
de0ee13f45 schema_tables: forward-declare user_function and user_aggregate
These bring in wasm.hh (though they really shouldn't) and make
everyone suffer. Forward declare instead and add missing includes
where needed.

Closes #10444
2022-04-28 07:22:02 +03:00
Botond Dénes
2c08468fcb Merge 'Make headers self-contained' from Avi Kivity
Minor fixlets to make `ninja dev-headers` pass.

Closes #10445

* github.com:scylladb/scylla:
  readers/from_mutations_v2.hh: make self-contained
  data_dictionary/storage_options.hh: make self-contained
2022-04-28 07:20:10 +03:00
Avi Kivity
a9812166cd replica, partition_snapshot_reader, keys: replace boost::any with std::any
Reduce #include load by standardizing on std::any.

In keys.cc, we just drop the unneeded include.

One instance of boost::any remains in config_file, due to a tie-in with
other boost components.

Closes #10441
2022-04-28 07:18:53 +03:00
Avi Kivity
3a81cb7cc3 readers/from_mutations_v2.hh: make self-contained
Due to an inline function, we need the definition of
flat_mutation_reader_v2.hh, so include it.
2022-04-27 15:55:16 +03:00
Avi Kivity
28406c2c56 data_dictionary/storage_options.hh: make self-contained
Add "seastarx.hh" so sstring works (rather than seastar::sstring).
2022-04-27 15:54:32 +03:00
Avi Kivity
333fdcb3f5 Update tools/java submodule (fix NodeProbe: Malformed IPv6 address at index)
* tools/java 9bc83b7a32...a4573759a2 (1):
  > CASSANDRA-17581 fix NodeProbe: Malformed IPv6 address at index

Fixes #10442.
2022-04-27 14:51:47 +03:00
Benny Halevy
e88871f4ec replica: database: move shard_of implementation to mutation layer
We don't need the database to determine the shard of the mutation,
only its schema. So move the implementation to the respective
definitions of mutation and frozen_mutation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10430
2022-04-27 14:40:24 +03:00
Nadav Har'El
f6ce7891a5 test/alternator: add test for key length limits
DynamoDB limits partition-key length to 2048 bytes and sort-key length
to 1024 bytes. Alternator currently has no such limits officially, but
if a user tries a key length of over 64 KB, the result will be an
"internal server error" as Alternator runs into Scylla's low-level key
length limit of 64 KB.

In this patch we add (mostly xfailing) tests confirming all the above
observations. The tests include extensive comments on what they are
testing and why. Some of these tests (specifically, the ones checking
what happens above 64 KB) should pass once Alternator is fixed. Other
tests - requiring that the limits be exactly what they are in DynamoDB -
may either not pass or change in the future, depending on what we decide
the limits should be in Alternator.

Refs #10347

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #10438
2022-04-26 18:09:19 +02:00
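The three limits named in this commit message can be lined up in a small sketch. This is an illustrative model only, not Alternator's or the test's code: the function name `classify_key_length` and the returned labels are hypothetical, and "64 KB" is assumed to mean 64 * 1024 bytes (the exact low-level boundary may differ slightly).

```python
# Limits quoted in the commit message above, in bytes.
DYNAMODB_PARTITION_KEY_LIMIT = 2048
DYNAMODB_SORT_KEY_LIMIT = 1024
# Assumption: "64 KB" is modeled as 64 * 1024 bytes.
SCYLLA_LOW_LEVEL_KEY_LIMIT = 64 * 1024


def classify_key_length(key: bytes, is_sort_key: bool) -> str:
    """Hypothetical helper: describe where a key of this length falls."""
    if is_sort_key:
        dynamodb_limit = DYNAMODB_SORT_KEY_LIMIT
    else:
        dynamodb_limit = DYNAMODB_PARTITION_KEY_LIMIT
    if len(key) <= dynamodb_limit:
        return "within DynamoDB limit"
    if len(key) <= SCYLLA_LOW_LEVEL_KEY_LIMIT:
        return "over DynamoDB limit, no official Alternator limit"
    return "over low-level limit, internal server error"


print(classify_key_length(b"x" * 2048, is_sort_key=False))
print(classify_key_length(b"x" * 2048, is_sort_key=True))
print(classify_key_length(b"x" * (64 * 1024 + 1), is_sort_key=False))
```

The middle band (above the DynamoDB limit but below 64 KB) is where the commit message says the final behavior is still undecided.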
Avi Kivity
582802825a treewide: use system-#include (angle brackets) for seastar
Seastar is an external library from Scylla's point of view so
we should use the angle bracket #include style. Most of the source
follows this, this patch fixes a few stragglers.

Also fix cases of #include which reached out to seastar's directory
tree directly, via #include "seastar/include/seastar/..." to
just refer to <seastar/...>.

Closes #10433
2022-04-26 14:46:42 +03:00
Takuya ASADA
48b6aec16a scripts: use "out()" function for all capture_output subprocesses
On acaf0bb we applied out() just for perftune.py because we had issue #10390
with this script.
But the issue can happen with other commands too, so let's apply it to all
commands that use capture_output.

related #10390

Closes #10414
2022-04-26 13:56:52 +03:00
Benny Halevy
01f41630a5 compaction: time_window_compaction_strategy: reset estimated_remaining_tasks when running out of candidates
_estimated_remaining_tasks gets updated via get_next_non_expired_sstables ->
get_compaction_candidates, but otherwise if we return earlier from
get_sstables_for_compaction, it does not get updated and may go out of sync.

Refs #10418
(to be closed when the fix reaches branch-4.6)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10419
2022-04-26 11:26:48 +03:00
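The stale-estimate problem described above can be pictured with a toy model. This is a sketch under stated assumptions, not the strategy's real logic: the class `ToyStrategy`, the `min_threshold` parameter, and the way the estimate is derived from the candidate count are all hypothetical. Only the shape of the fix (reset the estimate on the early-return path) follows the commit message.

```python
class ToyStrategy:
    """Hypothetical stand-in for a compaction strategy with a cached estimate."""

    def __init__(self, min_threshold=4):
        self.min_threshold = min_threshold  # assumed bucket size, illustration only
        self.estimated_remaining_tasks = 0

    def _get_compaction_candidates(self, candidates):
        # Normal path: the estimate is refreshed as a side effect.
        self.estimated_remaining_tasks = len(candidates) // self.min_threshold
        return candidates[: self.min_threshold]

    def get_sstables_for_compaction(self, candidates):
        if not candidates:
            # Early return: without this reset, the estimate from the
            # previous call would linger and go out of sync.
            self.estimated_remaining_tasks = 0
            return []
        return self._get_compaction_candidates(candidates)


strategy = ToyStrategy()
strategy.get_sstables_for_compaction(list(range(8)))
print(strategy.estimated_remaining_tasks)
strategy.get_sstables_for_compaction([])
print(strategy.estimated_remaining_tasks)
```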
Benny Halevy
055141fc2e multishard_mutation_query: do_query: stop ctx if lookup_readers fails
lookup_readers might fail after populating some readers
and those better be closed before returning the exception.

Fixes #10351

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10425
2022-04-26 11:11:52 +03:00
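The cleanup rule in this commit (close whatever was populated before the failure, then propagate the error) can be sketched in a few lines. This is a generic illustration, not the multishard query code: `FakeReader`, the list-based `ctx`, and the `fail_on` parameter are hypothetical names introduced for the example.

```python
class FakeReader:
    """Stand-in for a reader that must be closed explicitly."""

    def __init__(self, shard):
        self.shard = shard
        self.closed = False

    def close(self):
        self.closed = True


def lookup_readers(shards, ctx, fail_on=None):
    """Populate ctx one reader at a time; may fail part-way through."""
    for shard in shards:
        if shard == fail_on:
            raise RuntimeError(f"lookup failed on shard {shard}")
        ctx.append(FakeReader(shard))


def do_query(shards, ctx, fail_on=None):
    try:
        lookup_readers(shards, ctx, fail_on)
    except Exception:
        # Stop the context: close every reader populated so far,
        # then let the original exception propagate.
        for reader in ctx:
            reader.close()
        raise
    return ctx
```

With `fail_on=2` and shards `[0, 1, 2, 3]`, the readers for shards 0 and 1 end up closed and the `RuntimeError` still reaches the caller.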
Botond Dénes
bf1b6ced3c Merge "Make storage_service::bootstrap less if-y" from Pavel Emelyanov
"
The method in question performs node bootstrap in several different
modes (regular, replacing, RBNO), and several subsequent if-else
branches just duplicate each other. This set merges them, making the
code easier to read.
"

* 'br-less-branchy-bootstrap' of https://github.com/xemul/scylla:
  storage_service: Remove pointless check in replace-bootstrap
  storage_service: Generalize wait for range setup
  storage_service: Merge common if-else branches in bootstrap
  storage_service: Move tables bootstrap-ON upwards
2022-04-26 10:58:30 +03:00
Raphael S. Carvalho
d79fb9a12f docs: Update compaction controller doc
The doc is being updated to reflect the changes in the commit
d8833de3bb ("Redefine Compaction Backlog to tame
compaction aggressiveness").

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-04-26 10:50:45 +03:00
Benny Halevy
db676e9e4a replica: database: apply: make sure the schema is synced or throw internal error
Currently an exception is thrown in the apply stage
when the schema is not synced, but it is too late
since returning an error doesn't pinpoint which code
path was using an unsync'ed schema so move the check
earlier, before _apply_stage is called.

We need to make sure the schema is synced earlier
when the mutation is applied so call on_internal_error
to generate a backtrace in testing and still throw
an error in production.

Typically storage_proxy::mutate_locally implicitly
ensures the schema is synced by making a global_schema_ptr
for it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220424110057.3957597-1-bhalevy@scylladb.com>
2022-04-25 12:18:47 +02:00
Benny Halevy
bcd35af7cf replica: table: generate_and_propagate_view_updates: pass mutation to make_flat_mutation_reader_from_mutations_v2
With f5ef687acd
we can consume the single mutation directly,
so there's no need to pass it as a vector of size 1.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220424103826.3930895-1-bhalevy@scylladb.com>
2022-04-24 22:19:19 +03:00
Avi Kivity
728479a6ea Merge 'Fix map subscript crashes when map or subscript is null' from Nadav Har'El
In the filtering expression "WHERE m[?] = 2", our implementation was buggy when either the map, or the subscript, was NULL (and also when the latter was an UNSET_VALUE). Our code ended up dereferencing null objects, yielding bizarre errors when we were lucky, or crashes when we were less lucky - see examples of both in issues #10361, #10399, #10401. The existing test `test_null.py::test_map_subscript_null` reproduced all these bugs sporadically.

In this series we improve the test to reproduce the separate bugs separately, and also reproduce additional problems (like the UNSET_VALUE). We then **define** both `m[NULL]` and `NULL[2]` to result in NULL instead of the existing undefined (and buggy, and crashing) behavior. This new definition is consistent with our usual SQL-inspired tradition that NULL "wins" in expressions - e.g., `NULL < 2` is also defined as resulting in NULL.

However, this decision differs from Cassandra, where `m[NULL]` is considered an error but `NULL[2]` is allowed. We believe that making `m[NULL]` be a NULL instead of an error is more consistent, and moreover - necessary if we ever want to support more complicated expressions like `m[a]`, where the column `a` can be NULL for some rows and non-NULL for others, and it doesn't make sense to return an "invalid query" error in the middle of the scan.

Fixes #10361
Fixes #10399
Fixes #10401

Closes #10420

* github.com:scylladb/scylla:
  expressions: don't dereference invalid map subscript in filter
  expressions: fix invalid dereference in map subscript evaluation
  test/cql-pytest: improve tests for map subscripts and nulls
2022-04-24 21:16:10 +03:00
Avi Kivity
a4be927e23 Revert "memtable_list: futurize clear_and_add"
This reverts commit 2325c566d9. It
causes a use-after-free of a memtable.

Fixes #10421.
2022-04-24 21:09:48 +03:00
Asias He
953af38281 streaming: Allow drop table during streaming
Currently, if a table is dropped during streaming, the streaming would
fail with no_such_column_family error.

Since the table is dropped anyway, it makes more sense to ignore the
streaming result of the dropped table, whether it is successful or
failed.

This allows users to drop tables during node operations, e.g., bootstrap
or decommission a node.

This is especially useful for the cloud users where it is hard to
coordinate between a node operation by admin and user cql change.

This patch also fixes a possible use-after-free issue by not passing
the table reference object around.

Fixes #10395

Closes #10396
2022-04-24 17:43:20 +03:00
Tzach Livyatan
607ccf0393 Update doc project name to scylla dev
Closes #10342
2022-04-24 17:40:54 +03:00
Nadav Har'El
fbb2a41246 expressions: don't dereference invalid map subscript in filter
If we have the filter expression "WHERE m[?] = 2", the existing code
simply assumed that the subscript is an object of the right type.
However, while it should indeed be the right type (we already have code
that verifies that), there are two more options: It can also be a NULL,
or an UNSET_VALUE. Either of these cases causes the existing code to
dereference a non-object as an object, leading to bizarre errors (as
in issue #10361) or even crashes (as in issue #10399).

Cassandra returns an invalid request error in these cases: "Unsupported
unset map key for column m" or "Unsupported null map key for column m".
We decided to do things differently:

 * For NULL, we consider m[NULL] to result in NULL - instead of an error.
   This behavior is more consistent with other expressions that contain
   null - for example NULL[2] and NULL<2 both result in NULL as well.
   Moreover, if in the future we allow more complex expressions, such
   as m[a] (where a is a column), we can find the subscript to be null
   for some rows and non-null for other rows - and throwing an "invalid
   query" in the middle of the filtering doesn't make sense.

 * For UNSET_VALUE, we do consider this an error like Cassandra, and use
   the same error message as Cassandra. However, the current implementation
   checks for this error only when the expression is evaluated - not
   before. It means that if the scan is empty before the filtering, the
   error will not be reported and we'll silently return an empty result
   set. We currently consider this ok, but we can also change this in the
   future by binding the expression only once (today we do it on every
   evaluation) and validating it once after this binding.

Fixes #10361
Fixes #10399

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-04-24 16:05:34 +03:00
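The semantics chosen in this commit message can be modeled in a few lines. This is an illustrative sketch, not the expression evaluator: Python's `None` stands in for CQL NULL, `UNSET_VALUE` here is a hypothetical marker object, and `InvalidRequest` and `subscript` are names invented for the example. The order in which the unset check and the null check run when both apply is also an assumption.

```python
class _Unset:
    """Marker type standing in for CQL's UNSET_VALUE (hypothetical)."""


UNSET_VALUE = _Unset()


class InvalidRequest(Exception):
    """Stand-in for an invalid request error."""


def subscript(m, key, column="m"):
    """Evaluate m[key], with None standing in for CQL NULL."""
    if key is UNSET_VALUE:
        # An error, as in Cassandra, but raised only when the
        # expression is actually evaluated for a row.
        raise InvalidRequest(f"Unsupported unset map key for column {column}")
    if m is None or key is None:
        # NULL "wins": both NULL[key] and m[NULL] evaluate to NULL.
        return None
    # Assumption: a key that is absent from the map also yields NULL.
    return m.get(key)


def matches(rows, key, expected):
    """Toy filter for "WHERE m[?] = expected" over a list of map values."""
    return [m for m in rows if subscript(m, key) == expected]
```

Because the check happens per evaluated row, `matches([], UNSET_VALUE, 2)` returns an empty list without raising, which mirrors the empty-scan caveat noted in the last bullet above.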
Nadav Har'El
808a93d29b expressions: fix invalid dereference in map subscript evaluation
When we have a filter such as "WHERE m[2] = 3" (where m is a map
column), if a row had a null value for m, our expression evaluation
code incorrectly dereferenced an unset optional, and continued
processing the result of this dereference which resulted in undefined
behavior - sometimes we were lucky enough to get "marshaling error"
but other times Scylla crashed.

The fix is trivial - just check before dereferencing the optional value
of the map. We return null in that case, which means that we consider
the result of null[2] to be null. I think this is a reasonable approach
and fits our overall approach of making null dominate expressions (e.g.,
the value of "null < 2" is also null).

The test test_filtering.py::test_filtering_null_map_with_subscript,
which used to frequently fail with marshaling errors or crashes, now
passes every time so its "xfail" mark is removed.

Fixes #10417

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-04-24 14:58:56 +03:00
Nadav Har'El
189b8845fe test/cql-pytest: improve tests for map subscripts and nulls
The test test_null.py::test_map_subscript_null turned out to reproduce
multiple bugs related to using map subscripts in filtering expressions.
One was issue #10361 (m[null] resulted in a bizarre error) or #10399
(m[null] resulted in a crash), and a different issue was #10401 (m[2]
resulted in a bizarre error or a crash if m itself was null). Moreover,
the same test uncovered different bugs depending how it was run - alone
or with other tests - because it was using a shared table.

In this patch we introduce two separate tests in test_filtering.py
which are designed to reproduce these separate bugs instead of mixing
them into one test. The new tests also cover a few more corners which
the previous test (which focused on nulls) missed - such as UNSET_VALUE.

The two new tests (and the old test_map_subscript_null) pass on
Cassandra, so they still assume that the Cassandra behavior - that m[null]
should be an error - is the correct behavior. We may want to change
the desired behavior (e.g., to decide that m[null] be null, not an
error), and change the tests accordingly later - but for now the
tests follow Cassandra's behavior exactly, and pass on Cassandra
and fail on Scylla (so are marked xfail).

The bugs reproduced by these tests involve randomness or reading
uninitialized memory, so these tests sometimes pass, sometimes fail,
and sometimes even crash (as reported in #10399 and #10401). So to
reproduce these bugs run the tests multiple times. For example:

    test/cql-pytest/run --count 100 --runxfail
        test_filtering.py::test_filtering_null_map_with_subscript

Refs #10361
Refs #10399
Refs #10401

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-04-24 13:26:26 +03:00
Avi Kivity
8624718983 Merge "row_cache: update reader implementations to v2" from Botond
"
cache_flat_mutation_reader gets a native v2 implementation. The
underlying mutation representation is not changed: range deletions are
still stored as v1 range_tombstones in mutation_partition. These are
converted to range tombstone changes during reading.
This allows for separating the change of a native v2 reader
implementation and a native v2 in-memory storage format, enabling the
two to be done at separate times and incrementally.
This means there is still conversion ongoing when reading from cache and
when populating, but when reading from underlying, the stream can now be
passed through as-is without conversions.
Also, any future v2 related changes to the in-memory storage will now be
limited to the cache reader implementation itself.

In the process, the non-forwarding reader, whose only user is the cache,
is also converted to v2.
"

Performance results reported by Botond:

"
build/release/test/perf/perf_simple_query -c1 -m2G --flush --duration=20

BEFORE
median 130421.76 tps ( 71.1 allocs/op,  12.1 tasks/op,   47462 insns/op)
median absolute deviation: 319.64
maximum: 131028.33
minimum: 127502.55

AFTER
median 133297.41 tps ( 64.1 allocs/op,  12.2 tasks/op,   45406 insns/op)
median absolute deviation: 2964.24
maximum: 137581.56
minimum: 123739.4

Getting rid of those upgrade/downgrade was good for allocs and ops.
Curiously there is a 0.1 rise in number of tasks though.
"

* 'row-cache-readers-v2/v1' of https://github.com/denesb/scylla:
  row_cache: update reader implementations to v2
  range_tombstone_change_generator: flush(): add end_of_range
  readers/nonforwardable: convert to v2
  read_context: fix indentation
  read_context: coroutinize move_to_next_partition()
  row_cache: cache_entry::read(): return v2 reader
  row_cache: return v2 readers from make_reader*()
  readers/delegating_v2: s/make_delegating_reader_v2/make_delegating_reader/
2022-04-23 19:10:43 +03:00
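For readers who want the BEFORE/AFTER medians above as relative changes, the arithmetic can be sketched as follows. The numbers are copied from the report; the helper name `relative_change` is introduced here for illustration only.

```python
# Median figures copied from the BEFORE/AFTER report above.
before = {"tps": 130421.76, "allocs/op": 71.1, "tasks/op": 12.1, "insns/op": 47462}
after = {"tps": 133297.41, "allocs/op": 64.1, "tasks/op": 12.2, "insns/op": 45406}


def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


changes = {name: relative_change(before[name], after[name]) for name in before}

for name, pct in changes.items():
    print(f"{name:10s} {pct:+.2f}%")
```

This works out to roughly 2% more tps, about 10% fewer allocations per op and about 4% fewer instructions per op. Note that the absolute tps gain (about 2876) is slightly smaller than the AFTER run's median absolute deviation (2964.24), so the throughput figure on its own should probably be read with some caution.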
Botond Dénes
5e97fb9fc4 row_cache: update reader implementations to v2
cache_flat_mutation_reader gets a native v2 implementation. The
underlying mutation representation is not changed: range deletions are
still stored as v1 range_tombstones in mutation_partition. These are
converted to range tombstone changes during reading.
This allows for separating the change of a native v2 reader
implementation and a native v2 in-memory storage format, enabling the
two to be done at separate times and incrementally.
2022-04-21 14:57:04 +03:00
Botond Dénes
5cc5fd4d23 range_tombstone_change_generator: flush(): add end_of_range
Allow flushing all range tombstone changes, including those that have
a position equal to the passed in upper bound, when finishing off a
read-range, e.g. a clustering range from a slice.
2022-04-21 14:37:10 +03:00
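The boundary rule this flag adds can be shown with a toy buffer. This is a minimal sketch of the inclusive versus exclusive upper bound only, not the real generator (which also converts range tombstones into changes): `ChangeBuffer`, integer positions, and the tuple layout are assumptions made for the example.

```python
import bisect


class ChangeBuffer:
    """Toy buffer of (position, tombstone) changes, kept sorted by position."""

    def __init__(self):
        self._pending = []

    def add(self, position, tombstone):
        self._pending.append((position, tombstone))
        self._pending.sort(key=lambda change: change[0])

    def flush(self, upper_bound, end_of_range=False):
        """Return and drop pending changes positioned before upper_bound.

        With end_of_range=True, changes positioned exactly at upper_bound
        are flushed as well.
        """
        positions = [position for position, _ in self._pending]
        if end_of_range:
            split = bisect.bisect_right(positions, upper_bound)
        else:
            split = bisect.bisect_left(positions, upper_bound)
        flushed = self._pending[:split]
        self._pending = self._pending[split:]
        return flushed
```

With changes at positions 1, 5 and 9, `flush(5)` returns only the change at 1, while `flush(5, end_of_range=True)` on a fresh buffer returns the changes at 1 and 5.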
Botond Dénes
7626beb729 readers/nonforwardable: convert to v2
It has a single user, the row cache, which for now has to
upgrade/downgrade around the nonforwardable reader, but this will go
away in the next patches when the row cache readers are converted to v2
proper.
2022-04-21 14:34:00 +03:00
Botond Dénes
b061acb668 Merge 'Remove queue reader v1' from Mikołaj Sielużycki
The patchset embeds the mutation_fragment upgrading logic from v1 to v2 into the mutation_fragment_queue. This way the mutation fragments coming to the mutation_fragment_queue can be v1, but the underlying queue_reader receives mutation_fragment_v2, eliminating the last usage of queue_reader (v1). The last commit removes queue_reader, queue_reader_handle and associated factory functions.

tests: unit(dev), dtest(incremental_repair_test, read_repair_test, repair_additional_test, repair_test)

Closes #10371

* github.com:scylladb/scylla:
  readers: Remove queue_reader v1 and associated code.
  repair: Make mutation_fragment_queue internally upgrade fragments to v2
  repair: Make mutation_fragment_queue::impl a seastar::shared_ptr
2022-04-21 12:34:48 +03:00
Mikołaj Sielużycki
f74fd0dd80 readers: Remove queue_reader v1 and associated code.
2022-04-20 17:56:34 +02:00
Mikołaj Sielużycki
339b60e5b0 repair: Make mutation_fragment_queue internally upgrade fragments to v2
2022-04-20 17:55:58 +02:00
Mikołaj Sielużycki
eeb2b458de repair: Make mutation_fragment_queue::impl a seastar::shared_ptr
It makes mutation_fragment_queue copyable and makes the pointer to
pending mutation fragments in next commit stable. This allows moving the
mutation_fragment_queue without breaking the underlying
upgrading_consumer.
2022-04-20 17:51:58 +02:00
Botond Dénes
46481264e9 read_context: fix indentation
Broken by the previous patch (patches actually -- it was half-indent on
half-indent before that).
2022-04-20 10:59:09 +03:00
Botond Dénes
28f90728a3 read_context: coroutinize move_to_next_partition()
Makes the code more readable and the impending v2 transition less noisy.
2022-04-20 10:59:09 +03:00
Botond Dénes
2a0d7e8a1d row_cache: cache_entry::read(): return v2 reader
Push the conversion down one level. Soon we will make the cache flat
mutation reader a v2 reader; this keeps the related noise separate.
2022-04-20 10:59:09 +03:00
Botond Dénes
0b035c9099 row_cache: return v2 readers from make_reader*()
And adjust callers. The factory functions just sprinkle upgrade_to_v2()
on returned readers for now.
One test in row_cache_test.cc had to be disabled, because the upgrade to
v2 wrapper we now have over cache readers doesn't allow it to directly
control the reader's buffer size and so the test fails. There is a FIXME
left in the test code and the test will be re-enabled once a native v2
reader implementation allows us to get rid of the upgrade wrapper.
2022-04-20 10:59:09 +03:00
Botond Dénes
c3c71b3aa5 readers/delegating_v2: s/make_delegating_reader_v2/make_delegating_reader/
The argument type (v1 or v2 reader) is enough to disambiguate and
overloading the v1 method makes a transition to v2 more seamless.
2022-04-20 10:59:09 +03:00
Nadav Har'El
cc40685c28 test/cql-pytest: add test for filtering with IN restriction
It turns out that Cassandra does not allow IN restrictions together with
filtering, except, curiously, when the restriction is on a clustering key.
There is no real reason for this limitation - the error message even says
it is not *yet* supported.

Scylla, on the other hand, does support this case. Of course it's not
enough that we support it - we need to support it correctly... But we don't
have a full regression test that this support is correct - in
filtering_test.cc we test it with clustering and regular columns - but not
partition key columns.

So this patch adds a simple cql-pytest test that this sort of filtering
works in Scylla correctly for partition, clustering and regular columns
(and also confirms that these cases don't work, yet, on Cassandra).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220420075553.1008062-1-nyh@scylladb.com>
2022-04-20 09:56:22 +02:00
Konstantin Osipov
a3b790b413 test.py: add a dependency on python3-aiohttp and tabulate
Satisfy the build system requirements.

[avi: regenerate frozen toolchain]
2022-04-19 18:22:50 +03:00
Konstantin Osipov
097fbc7c5d .gitignore: ignore mypy_cache, the python lint cache
2022-04-19 16:48:47 +03:00
Pavel Emelyanov
41392a59bb storage_service: Remove pointless check in replace-bootstrap
The method in question is called in the branch where the replace address
is checked to be present, so there is no need for an extra explicit check.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-19 13:27:52 +03:00
Pavel Emelyanov
49481b1a21 storage_service: Generalize wait for range setup
Both the if is_replacing()/else branches call the gossiper waiting method as
their first step. This can be done once.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-19 13:27:52 +03:00
Pavel Emelyanov
d213e6ffd1 storage_service: Merge common if-else branches in bootstrap
There are three modes in there -- bootstrap, b.s. with RBNO and b.s. for
replacing. All three are checked two times in a row, but can be done
once.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-19 13:27:52 +03:00
Pavel Emelyanov
b0df3a32b4 storage_service: Move tables bootstrap-ON upwards
This call just places a boolean flag on all tables. It won't hurt if it lasts
while the node is performing pre-bootstrap checks, but it allows making
the whole method less branchy.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-19 13:27:52 +03:00