The only user is row level repair: it is replaced with
downgrade_to_v1(make_empty_flat_reader_v2()). The row level reader has
lots of downgrade_to_v1() calls, we will deal with these later all at
once.
Another use is the empty mutation source, this is trivially converted to
use the v2 variant.
The conversion is shallow: the meat of the logic remains v1, fragments
are converted to v2 right before being pushed into the buffer. This
approach is simple, surgical and is still better then a full
upgrade_to_v2().
This just moves the upgrade_to_v2() calls to the other side of said
factory methods, preparing the ground for converting the kl reader impl
to a native v2 one.
The underlying mutation representation is still v1, so the
implementation still has to do conversion. This happens right above the
lsa reader component.
These bring in wasm.hh (though they really shouldn't) and make
everyone suffer. Forward declare instead and add missing includes
where needed.
Closes#10444
Minor fixlets to make `ninja dev-headers` pass.
Closes#10445
* github.com:scylladb/scylla:
readers/from_mutations_v2.hh: make self-contained
data_dictionary/storage_options.hh: make self-contained
Reduce #include load by standardizing on std::any.
In keys.cc, we just drop the unneeded include.
One instance of boost::any remains in config_file, due to a tie-in with
other boost components.
Closes#10441
We don't need the database to determine the shard of the mutation,
only its schema. So move the implementation to the respecive
definitions of mutation and frozen_mutation.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#10430
DynamoDB limits partition-key length to 2048 bytes and sort-key length
to 1024 bytes. Alternator currently has no such limits officially, but
if a user tries a key length of over 64 KB, the result will be an
"internal server error" as Alternator runs into Scylla's low-level key
length limit of 64 KB.
In this patch we add (mostly xfailing) tests confirming all the above
observations. The tests include extensive comments on what they are
testing and why. Some of these tests (specifically, the ones checking
what happens above 64 KB) should pass once Alternator is fixed. Other
tests - requiring that the limits be exactly what they are in DynamoDB -
may either not pass or change in the future, depending on what we decide
the limits should be in Alternator.
Refs #10347
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#10438
Seastar is an external library from Scylla's point of view so
we should use the angle bracket #include style. Most of the source
follows this, this patch fixes a few stragglers.
Also fix cases of #include which reached out to seastar's directory
tree directly, via #include "seastar/include/sesatar/..." to
just refer to <seastar/...>.
Closes#10433
On acaf0bb we applied out() just for perftune.py because we had issue #10390
with this script.
But the issue can happen with other commands too, let's apply it to all
commands which uses capture_output.
related #10390Closes#10414
_estimated_remaining_tasks gets updated via get_next_non_expired_sstables ->
get_compaction_candidates, but otherwise if we return earlier from
get_sstables_for_compaction, it does not get updated and may go out of sync.
Refs #10418
(to be closed when the fix reaches branch-4.6)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#10419
lookup_readers might fail after populating some readers
and those better be closed before returning the exception.
Fixes#10351
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#10425
"
The method in question performs node bootstrap in several different
modes
(regular, replacing, rnbo) and several subsequent if-else branches just
duplicate each-other. This set merges them making the code easier to
read.
"
* 'br-less-branchy-bootstrap' of https://github.com/xemul/scylla:
storage_service: Remove pointless check in replace-bootstrap
storage_service: Generalize wait for range setup
storage_service: Merge common if-else branches in bootstrap
storage_service: Move tables bootstrap-ON upwards
The doc is being updated to reflect the changes in the commit
d8833de3bb ("Redefine Compaction Backlog to tame
compaction aggressiveness").
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently an exception is thrown in the apply stage
when the schema is not synced, but it is too late
since returning an error doesn't pinpoint which code
path was using an unsync'ed schema so move the check
earlier, before _apply_stage is called.
We need to make sure the schema is synced earlier
when the mutation is applied so call on_internal_error
to generate a backtrace in testing and still throw
an error in production.
Typically storage_proxy::mutate_locally implicitly
ensures the schema is synced by making a global_schema_ptr
for it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220424110057.3957597-1-bhalevy@scylladb.com>
In the filtering expression "WHERE m[?] = 2", our implementation was buggy when either the map, or the subscript, was NULL (and also when the latter was an UNSET_VALUE). Our code ended up dereferencing null objects, yielding bizarre errors when we were lucky, or crashes when we were less lucky - see examples of both in issues #10361, #10399, #10401. The existing test `test_null.py::test_map_subscript_null` reproduced all these bugs sporadically.
In this series we improve the test to reproduce the separate bugs separately, and also reproduce additional problems (like the UNSET_VALUE). We then **define** both `m[NULL]` and `NULL[2]` to result in NULL instead of the existing undefined (and buggy, and crashing) behavior. This new definition is consistent with our usual SQL-inspired tradition that NULL "wins" in expressions - e.g., `NULL < 2` is also defined as resulting in NULL.
However, this decision differs from Cassandra, where `m[NULL]` is considered an error but `NULL[2]` is allowed. We believe that making `m[NULL]` be a NULL instead of an error is more consistent, and moreover - necessary if we ever want to support more complicate expressions like `m[a]`, where the column `a` can be NULL for some rows and non-NULL for others, and it doesn't make sense to return an "invalid query" error in the middle of the scan.
Fixes#10361Fixes#10399Fixes#10401Closes#10420
* github.com:scylladb/scylla:
expressions: don't dereference invalid map subscript in filter
expressions: fix invalid dereference in map subscript evaluation
test/cql-pytest: improve tests for map subscripts and nulls
Currently, if a table is dropped during streaming, the streaming would
fail with no_such_column_family error.
Since the table is dropped anyway, it makes more sense to ignore the
streaming result of the dropped table, whether it is successful or
failed.
This allows users to drop tables during node operations, e.g., bootstrap
or decommission a node.
This is especially useful for the cloud users where it is hard to
coordinate between a node operation by admin and user cql change.
This patch also fixes a possible user after free issue by not passing
the table reference object around.
Fixes#10395Closes#10396
If we have the filter expression "WHERE m[?] = 2", the existing code
simply assumed that the subscript is an object of the right type.
However, while it should indeed be the right type (we already have code
that verifies that), there are two more options: It can also be a NULL,
or an UNSET_VALUE. Either of these cases causes the existing code to
dereference a non-object as an object, leading to bizarre errors (as
in issue #10361) or even crashes (as in issue #10399).
Cassandra returns a invalid request error in these cases: "Unsupported
unset map key for column m" or "Unsupported null map key for column m".
We decided to do things differently:
* For NULL, we consider m[NULL] to result in NULL - instead of an error.
This behavior is more consistent with other expressions that contain
null - for example NULL[2] and NULL<2 both result in NULL as well.
Moreover, if in the future we allow more complex expressions, such
as m[a] (where a is a column), we can find the subscript to be null
for some rows and non-null for other rows - and throwing an "invalid
query" in the middle of the filtering doesn't make sense.
* For UNSET_VALUE, we do consider this an error like Cassandra, and use
the same error message as Cassandra. However, the current implementation
checks for this error only when the expression is evaluated - not
before. It means that if the scan is empty before the filtering, the
error will not be reported and we'll silently return an empty result
set. We currently consider this ok, but we can also change this in the
future by binding the expression only once (today we do it on every
evaluation) and validating it once after this binding.
Fixes#10361Fixes#10399
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When we have an filter such as "WHERE m[2] = 3" (where m is a map
column), if a row had a null value for m, our expression evaluation
code incorrectly dereferences an unset optional, and continued
processing the result of this dereference which resulted in undefined
behavior - sometimes we were lucky enough to get "marshaling error"
but other times Scylla crashed.
The fix is trivial - just check before dereferencing the optional value
of the map. We return null in that case, which means that we consider
the result of null[2] to be null. I think this is a reasonable approach
and fits our overall approach of making null dominate expressions (e.g.,
the value of "null < 2" is also null).
The test test_filtering.py::test_filtering_null_map_with_subscript,
which used to frequently fail with marshaling errors or crashes, now
passes every time so its "xfail" mark is removed.
Fixes#10417
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The test test_null.py::test_map_subscript_null turned out to reproduce
multiple bugs related to using map subscripts in filtering expressions.
One was issue #10361 (m[null] resulted in a bizarre error) or #10399
(m[null] resulted in a crash), and a different issue was #10401 (m[2]
resulted in a bizarre error or a crash if m itself was null). Moreover,
the same test uncovered different bugs depending how it was run - alone
or with other tests - because it was using a shared table.
In this patch we introduce two separate tests in test_filtering.py
which are designed to reproduce these separate bugs instead of mixing
them into one test. The new tests also cover a few more corners which
the previous test (which focused on nulls) missed - such as UNSET_VALUE.
The two new tests (and the old test_map_subscript_null) pass on
Cassandra so still assume that the Cassandra behavior - that m[null]
should be an error - is the correct behavior. We may want to change
the desired behavior (e.g., to decide that m[null] be null, not an
error), and change the tests accordingly later - but for now the
tests follow Cassandra's behavior exactly, and pass on Cassandra
and fail on Scylla (so are marked xfail).
The bugs reproduced by these tests involve randomness or reading
uninitialized memory, so these tests sometimes pass, sometimes fail,
and sometimes even crash (as reported in #10399 and #10401). So to
reproduce these bugs run the tests multiple times. For example:
test/cql-pytest/run --count 100 --runxfail
test_filtering.py::test_filtering_null_map_with_subscript
Refs #10361
Refs #10399
Refs #10401
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
"
cache_flat_mutation_reader gets a native v2 implementation. The
underlying mutation representation is not changed: range deletions are
still stored as v1 range_tombstones in mutation_partition. These are
converted to range tombstone changes during reading.
This allows for separating the change of a native v2 reader
implementation and a native v2 in-memory storage format, enabling the
two to be done at separate times and incrementally.
This means there is still conversion ingoing when reading from cache and
when populating, but when reading from underlying, the stream can now be
passed through as-is without conversions.
Also, any future v2 related changes to the in-memory storage will now be
limited to the cache reader implementation itself.
In the process, the non-forwarding reader, whose only user is the cache,
is also converted to v2.
"
Performance results reported by Botond:
"
build/release/test/perf/perf_simple_query -c1 -m2G --flush --
duration=20
BEFORE
median 130421.76 tps ( 71.1 allocs/op, 12.1 tasks/op, 47462
insns/op)
median absolute deviation: 319.64
maximum: 131028.33
minimum: 127502.55
AFTER
median 133297.41 tps ( 64.1 allocs/op, 12.2 tasks/op, 45406
insns/op)
median absolute deviation: 2964.24
maximum: 137581.56
minimum: 123739.4
Getting rid of those upgrade/downgrade was good for allocs and ops.
Curiously there is a 0.1 rise in number of tasks though.
"
* 'row-cache-readers-v2/v1' of https://github.com/denesb/scylla:
row_cache: update reader implementations to v2
range_tombstone_change_generator: flush(): add end_of_range
readers/nonforwardable: convert to v2
read_context: fix indentation
read_context: coroutinize move_to_next_partition()
row_cache: cache_entry::read(): return v2 reader
row_cache: return v2 readers from make_reader*()
readers/delegating_v2: s/make_delegating_reader_v2/make_delegating_reader/
cache_flat_mutation_reader gets a native v2 implementation. The
underlying mutation representation is not changed: range deletions are
still stored as v1 range_tombstones in mutation_partition. These are
converted to range tombstone changes during reading.
This allows for separating the change of a native v2 reader
implementation and a native v2 in-memory storage format, enabling the
two to be done at separate times and incrementally.
Allowing to flush all range tombstone changes, including those that have
a position equal to the passed in upper bound, when finishing off a
read-range, e.g. a clustering range from a slice.
It has a single user, the row cache, which for now has to
upgrade/downgrade around the nonforwardable reader, but this will go
away in the next patches when the row cache readers are converted to v2
proper.
The patchset embeds the mutation_fragment upgrading logic from v1 to v2 into the mutation_fragment_queue. This way the mutation fragments coming to the mutation_fragment_queue can be v1, but the underlying query_reader receives mutation_fragment_v2, eliminating the last usage of query_reader (v1). The last commit removes query_reader, query_reader_handle and associated factory functions.
tests: unit(dev), dtest(incremental_repair_test, read_repair_test, repair_additional_test, repair_test)
Closes#10371
* github.com:scylladb/scylla:
readers: Remove queue_reader v1 and associated code.
repair: Make mutation_fragment_queue internally upgrade fragments to v2
repair: Make mutation_fragment_queue::impl a seastar::shared_ptr
It makes mutation_fragment_queue copyable and makes the pointer to
pending mutation fragments in next commit stable. This allows moving the
mutation_fragment_queue without breaking the underlying
upgrading_consumer.
And adjust callers. The factory functions just sprinkle upgrade_to_v2()
on returned readers for now.
One test in row_cache_test.cc had to be disabled, because the upgrade to
v2 wrapper we now have over cache readers doesn't allow it to directly
control the reader's buffer size and so the test fails. There is a FIXME
left in the test code and the test will be re-enabled once a native v2
reader implementation allows us to get rid of the upgrade wrapper.
It turns out that Cassandra does not allow IN restrictions together with
filtering, except, curiously, when the restriction is on a clustering key.
There is no real reason for this limitation - the error message even says
it is not *yet* supported.
Scylla, on the other hand, does support this case. Of course it's not
enough that we support it - we need to support it correctly... But we don't
have a full regression test that this support is correct - in
filtering_test.cc we test it with clustering and regular columns - but not
partition key columns.
So this patch adds a simple cql-pytest test that this sort of filtering
works in Scylla correctly for partition, clustering and regular columns
(and also confirms that these cases don't work, yet, on Cassandra).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220420075553.1008062-1-nyh@scylladb.com>
The method in question is called in the branch where the replace address
is checked to be present, no need in extra explicit check.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Both the if is_replacing()/else branches call gossiper wating method as
their first steps. Can be done once.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are three modes in there -- bootstrap, b.s. with RBNO and b.s. for
replacing. All three are checked two times in a row, but can be done
once.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This call just places a boolean flag on all. It won't hurt if it lasts
while the node is performing pre-bootstrap checks, but it allows making
the whole method less branchy.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>