Commit Graph

3039 Commits

Author SHA1 Message Date
Botond Dénes
e8f3d7dd13 sstables/index_reader: short-circuit fast-forward-to when at EOF
Attempting to call advance_to() on the index, after it is positioned at
EOF, can result in an assert failure, because the operation results in
an attempt to move backwards in the index-file (to read the last index
page, which was already read). This only happens if the index cache
entry belonging to the last index page is evicted, otherwise the advance
operation just looks-up said entry and returns it.
To prevent this, we add an early return conditioned on eof() to all the
partition-level advance-to methods.
A regression unit test reproducing the above described crash is also
added.
2022-05-05 14:42:37 +03:00
Botond Dénes
98f3d516a2 test/lib/random_schema: add a simpler overload for fixed partition count
Some tests want to generate a fixed amount of random partitions, make
their life easier.
2022-05-05 14:33:37 +03:00
Calle Wilund
78350a7e1b cdc: Ensure columns removed from log table are registered as dropped
If we are redefining the log table, we need to ensure any dropped
columns are registered in "dropped_columns" table, otherwise clients will not
be able to read data older than now.
Includes unit test.

Should probably be backported to all CDC enabled versions.

Fixes #10473
Closes #10474
2022-05-04 14:19:39 +02:00
Botond Dénes
4440d4b41a Merge "De-globalize gossiper" from Pavel Emelyanov
"
- Alternator gets gossiper for its proxy dependency

- Forward service method that takes global gossiper can re-use
  proxy method (forward -> proxy reference is already there)

- Table code is patched to require gossiper argument

- Snitch gets a dependency reference on snitch_ptr and some extra
  care for snitch driver vs snitch-ptr interaction and gossip test

- Cql test env should carry gossiper reference on-board

- Few places can re-use the existing local gossiper reference

- Scylla-gdb needs to get gossiper from debug namespace and needs
  _not_ to get feature service from gossiper
"

* 'br-gossiper-deglobal-2' of https://github.com/xemul/scylla:
  code: De-globalize gossiper
  scylla-gdb, main: Get feature service without gossiper help
  test: Use cql-test-env gossiper
  cql test env: Keep gossiper reference on board
  code: Use gossiper reference where possible
  snitch: Use local gossiper in drivers
  snitch: Keep gossiper reference
  test: Remove snitch from manual gossip test
  gossiper: Use container() instead of the global pointer
  main, cql_test_env: Start snitch later
  snitch: Move snitch_base::get_endpoint_info()
  forward service: Re-use proxy's helper with duplicated code
  table: Don't use global gossiper
  alternator: Don't use global gossiper
2022-05-03 15:56:07 +03:00
Nadav Har'El
6fb762630b cql-pytest: translate Cassandra's tests for SELECT operations
This is a translation of Cassandra's CQL unit test source file
    validation/operations/SelectTest.java into our our cql-pytest framework.

    This large test file includes 78 tests for various types of SELECT
    operations. Four additional tests require UDF in Java syntax,
    and were skipped.

    All 78 tests pass on Cassandra. 25 of the tests fail on Scylla
    reproducing 3 already known Scylla issues and 8 previously-unknown
    issues:

    Previously known issues:

      Refs #2962:  Collection column indexing
      Refs #4244:  Add support for mixing token, multi- and single-column
                   restrictions
      Refs #8627:  Cleanly reject updates with indexed values where
                   value > 64k

    Newly-discovered issues:

      Refs #10354: SELECT DISTINCT should allow filter on static columns,
                   not just partition keys
      Refs #10357: Spurious static row returned from query with filtering,
                   despite not matching filter
      Refs #10358: Comparison with UNSET_VALUE should produce an error
      Refs #10359: "CONTAINS NULL" and "CONTAINS KEY NULL" restrictions
                   should match nothing
      Refs #10361: Null or UNSET_VALUE subscript should generate an
                   invalid request error
      Refs #10366: Enforce Key-length limits during SELECT
      Refs #10443: SELECT with IN and ORDER BY orders rows per partition
                   instead of for the entire response
      Refs #10448: The CQL token() function should validate its parameters

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #10449
2022-05-03 11:45:05 +03:00
Pavel Emelyanov
e80adbade3 code: De-globalize gossiper
No code uses global gossiper instance, it can be removed. The main and
cql-test-env code now have their own real local instances.

This change also requires adding the debug:: pointer and fixing the
scylle-gdb.py to find the correct global location.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
b25fc29801 test: Use cql-test-env gossiper
There's yet another -test-env -- the alternator- one -- which needs
gossiper. It now uses global reference, but can grab gossiper reference
from the cql-test-env which partitipates in initialization.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
b0544ba7bd cql test env: Keep gossiper reference on board
The reference is already available at the env initialization, but it's
not kept on the env instance itself. Will be used by the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
4bea0b7491 code: Use gossiper reference where possible
Some places in the code has function-local gossiper reference but
continue to use global instance. Re-use the local reference (it's going
to become sharded<> instance soon).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
38c77d0d85 snitch: Keep gossiper reference
The reference is put on the snitch_ptr because this is the sharded<>
thing and because gossiper reference is the same for different snitch
drivers. Also, getting gossiper from snitch_ptr by driver will look
simpler than getting it from any base class.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
52fc4d6b22 test: Remove snitch from manual gossip test
It's not in use out there

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
2d32c47d0d main, cql_test_env: Start snitch later
Snitch depends on gossiper and system keyspace, so it needs to be
started after those two do.

fixes #10402

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:32 +03:00
Botond Dénes
53c66fe24a Merge "Make LCS reshape and major more efficient by picking the ideal output level" from Raphael S. Carvalho
"
Today, both operations are picking the highest level as the ideal level for
placing the output, but the size of input should be used instead.

The formula for calculating the ideal level is:
    ceil(log base(fan_out) of (total_input_size / max_fragment_size))

        where fan_out = 10 by default,
        total_input_size = total size of input data and
        max_fragment_size = maximum size for fragment (160M by default)

    such that 20 fragments will be placed at level 2, as level 1
    capacity is 10 fragments only.

By placing the output in the incorrect level, tons of backlog will be generated
for LCS because it will either have to promote or demote fragments until the
levels are properly balanced.
"

* 'optimize_lcs_major_and_reshape/v2' of https://github.com/raphaelsc/scylla:
  compaction: LCS: avoid needless work post major compaction completion
  compaction: LCS: avoid needless work post reshape completion
  compaction: LCS: extract calculation of ideal level for input
  compaction: LCS: Fix off-by-one in formula used to calculate ideal level
2022-05-02 10:16:09 +03:00
Avi Kivity
5169ce40ef Merge 'loading_cache: force minimum size of unprivileged ' from Piotr Grabowski
This series enforces a minimum size of the unprivileged section when
performing `shrink()` operation.

When the cache is shrunk, we still drop entries first from unprivileged
section (as before this commit), however, if this section is already small
(smaller than `max_size / 2`), we will drop entries from the privileged
section.

This is necessary, as before this change the unprivileged section could
be starved. For example if the cache could store at most 50 entries and
there are 49 entries in privileged section, after adding 5 entries (that would
go to unprivileged section) 4 of them would get evicted and only the 5th one
would stay. This caused problems with BATCH statements where all
prepared statements in the batch have to stay in cache at the same time
for the batch to correctly execute.

To correctly check if the unprivileged section might get too small after
dropping an entry, `_current_size` variable, which tracked the overall size
of cache, is changed to two variables: `_unprivileged_section_size` and
`_privileged_section_size`, tracking section sizes separately.

New tests are added to check this new behavior and bookkeeping of the section
sizes. A test is added, that sets up a CQL environment with a very small
prepared statement cache, reproduces issue in #10440 and stresses the cache.

Fixes #10440.

Closes #10456

* github.com:scylladb/scylla:
  loading_cache_test: test prepared stmts cache
  loading_cache: force minimum size of unprivileged
  loading_cache: extract dropping entries to lambdas
  loading_cache: separately track size of sections
  loading_cache: fix typo in 'privileged'
2022-05-01 19:36:35 +03:00
Eliran Sinvani
e0c7178e75 query_processor: remove default internal query caching behavior
When executing internal queries, it is important that the developer
will decide if to cache the query internally or not since internal
queries are cached indefinitely. Also important is that the programmer
will be aware if caching is going to happen or not.
The code contained two "groups" of `query_processor::execute_internal`,
one group has caching by default and the other doesn't.
Here we add overloads to eliminate default values for caching behaviour,
forcing an explicit parameter for the caching values.
All the call sites were changed to reflect the original caching default
that was there.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2022-05-01 08:33:55 +03:00
Avi Kivity
7f1e368e92 Merge 'replica/database: drop_column_family(): properly cleanup stale querier cache entries' from Botond Dénes
Said method has to evict all querier cache entries, belonging to the to-be-dropped table. This is already the case, but there was a window where new entries could sneak in, causing a stale reference to the table to be de-referenced later when they are evicted due to TTL. This window is now closed, the entries are evicted after the method has waited for all ongoing operations on said table to stop.

Fixes: #10450

Closes #10451

* github.com:scylladb/scylla:
  replica/database: drop_column_family(): drop querier cache entries after waiting for ops
  replica/database: finish coroutinizing drop_column_family()
  replica/database: make remove(const column_family&) private
2022-04-29 22:06:51 +03:00
Tomasz Grabiec
dbef83af71 Merge 'raft: fix startup hangs' from Kamil Braun
Fix hangs on Scylla node startup with Raft enabled that were caused by:
- a deadlock when enabling the USES_RAFT feature,
- a non-voter server forgetting who the leader is and not being able to forward a `modify_config` entry to become a voter.

Read the commit messages for details.

Fixes: #10379
Refs: #10355

Closes #10380

* github.com:scylladb/scylla:
  raft: actively search for a leader if it is not known for a tick duration
  raft: server: return immediately from `wait_for_leader` if leader is known
  service: raft: don't support/advertise USES_RAFT feature
2022-04-29 19:47:10 +02:00
Piotr Grabowski
6537dc6126 loading_cache_test: test prepared stmts cache
Add a new test that sets up a CQL environment with a very small prepared 
statements cache. The test reproduces a scenario described in #10440,
where a privileged section of prepared statement cache gets large
and that could possibly starve the unprivileged section, making it
impossible to execute BATCH statements. Additionally, at the end of the 
test, prepared statements/"simulated batches" with prepared statements 
are executed a random number of times, stressing the cache.

To create a CQL environment with small prepared cache, cql_test_config
is extended to allow setting custom memory_config value.
2022-04-29 19:22:55 +02:00
Piotr Grabowski
3f2224a47f loading_cache: force minimum size of unprivileged
This patch enforces a minimum size of unprivileged section when
performing shrink() operation.

When the cache is shrank, we still drop entries first from unprivileged
section (as before this commit), however if this section is already small
(smaller than max_size / 2), we will drop entries from the privileged
section.

For example if the cache could store at most 50 entries and there are 49 
entries in privileged section, after adding 5 entries (that would go to 
unprivileged section) 4 of them would get evicted and only the 5th one 
would stay. This caused problems with BATCH statements where all 
prepared statements in the batch have to stay in cache at the same time 
for the batch to correctly execute.

New tests are added to check this behavior and bookkeeping of section
sizes. 

Fixes #10440.
2022-04-29 19:19:04 +02:00
Raphael S. Carvalho
736c96cc6f compaction: LCS: avoid needless work post major compaction completion
That's done by picking the ideal level for the input, such
that LCS won't have to either promote or demote data, because
the output level is not the best candidate for having the
size of the output data.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-04-28 20:19:28 -03:00
Avi Kivity
aee94b7176 Merge "Convert remaining mutation sources to v2" from Botond
"
After the recent conversion of the row-cache, two v1 mutation sources
remained: the memtable and the kl sstable reader.
This series converts both to a native v2 implementation. The conversion
is shallow: both continue to read and process the underlying (v1) data
in v1, the fragments are converted to v2 right before being pushed to
the reader's buffer. This conversion is simple, surgical and low-risk.
It is also better than the upgrade_to_v2() used previously.

Following this, the remaining v1 reader implementations are removed,
with the exception of the downgrade_to_v1(), which is the only one left
at this point. Removing this requires converting all mutation sinks to
accept a v2 stream.

upgrade_to_v2() is now not used in any production code. It is still
needed to properly test downgrade_to_v1() (which is till used), so we
can't remove it yet. Instead it hidden as a private method of
mutation_source. This still allows for the above mentioned testing to
continue, while preventing anyone from being tempted to introduce new
usage.

tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/191
"

* 'convert-remaining-v1-mutation-sources/v2' of https://github.com/denesb/scylla:
  readers: make upgrade_to_v2() private
  test/lib/mutation_source_test: remove upgrade_to_v2 tests
  readers: remove v1 forwardable reader
  readers: remove v1 empty_reader
  readers: remove v1 delegating_reader
  sstables/kl: make reader impl v2 native
  sstables/kl: return v2 reader from factory methods
  sstables: move mp_row_consumer_reader_k_l to kl/reader.cc
  partition_snapshot_reader: convert implementation to native v2
  mutation_fragment_v2: range_tombstone_change: add minimal_memory_usage()
2022-04-28 20:31:23 +03:00
Nadav Har'El
ad7cc71748 Merge 'sstables: Fix deletion of partial SSTables' from Raphael "Raph" Carvalho
If SSTable write fails, it will leave a partial sst which contains
a temporary TOC in addition to other components partially written.
temporary TOC content is written upfront, to allow us from deleting
all partial components using the former content if write fails.

After commit e5fc4b6, partial sst cannot be deleted because it is
incorrectly assuming all files being deleted unconditionally has
TOC, but that's not true for partial files that need to be removed.
The consequence of this is that space of partial files cannot be
reclaimed, making it worse for Scylla to recover from ENOSPC,
which could happen by selecting a set of files for compaction with
higher chance of suceeeding given the free space.
Let's fix this by taking into account temp TOC for partial files.

Fixes #10410.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #10411

* github.com:scylladb/scylla:
  sstables: Fix deletion of partial SSTables
  sstables: Fix fsync_directory()
  sstables: Rename dirname() to a more descriptive name
2022-04-28 16:35:46 +03:00
Botond Dénes
272da51f80 test/lib/mutation_source_test: remove upgrade_to_v2 tests
We don't have any upgrade_to_v2() left in production code, so no need to
keep testing it. Removing it from this test paves the way for removing
it for good (not in this series).
2022-04-28 14:12:24 +03:00
Botond Dénes
7420fb9411 readers: remove v1 forwardable reader
No users.
2022-04-28 14:12:24 +03:00
Botond Dénes
f527956cdb readers: remove v1 empty_reader
The only user is row level repair: it is replaced with
downgrade_to_v1(make_empty_flat_reader_v2()). The row level reader has
lots of downgrade_to_v1() calls, we will deal with these later all at
once.
Another use is the empty mutation source, this is trivially converted to
use the v2 variant.
2022-04-28 14:12:24 +03:00
Botond Dénes
ea37e9c04e readers: remove v1 delegating_reader
The only user is a test, which is hereby converted to use the v2
delegating reader.
2022-04-28 14:12:24 +03:00
Botond Dénes
024ceec61e replica/database: drop_column_family(): drop querier cache entries after waiting for ops
Reads (part of operations) running concurrent to `drop_column_family()`
can create querier cache entries while we wait for them to finish in
`await_pending_ops()`. Move the cache entry eviction to after this, to
ensure such entries are also cleaned up before destroying the table
object.
This moves the `_querier_cache.evict_all_for_table()` from
`database::remove()` to `database::drop_column_family()`. With that the
former doesn't have to return `future<>` anymore. While at it (changing
the signature) also rename `column_family` -> `table`.

Also add a regression unit test.
2022-04-28 13:40:13 +03:00
Benny Halevy
e88871f4ec replica: database: move shard_of implementation to mutation layer
We don't need the database to determine the shard of the mutation,
only its schema. So move the implementation to the respecive
definitions of mutation and frozen_mutation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10430
2022-04-27 14:40:24 +03:00
Nadav Har'El
f6ce7891a5 test/alternator: add test for key length limits
DynamoDB limits partition-key length to 2048 bytes and sort-key length
to 1024 bytes. Alternator currently has no such limits officially, but
if a user tries a key length of over 64 KB, the result will be an
"internal server error" as Alternator runs into Scylla's low-level key
length limit of 64 KB.

In this patch we add (mostly xfailing) tests confirming all the above
observations. The tests include extensive comments on what they are
testing and why. Some of these tests (specifically, the ones checking
what happens above 64 KB) should pass once Alternator is fixed. Other
tests - requiring that the limits be exactly what they are in DynamoDB -
may either not pass or change in the future, depending on what we decide
the limits should be in Alternator.

Refs #10347

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #10438
2022-04-26 18:09:19 +02:00
Raphael S. Carvalho
791403e4bb sstables: Fix deletion of partial SSTables
If SSTable write fails, it will leave a partial sst which contains
a temporary TOC in addition to other components partially written.
temporary TOC content is written upfront, to allow us from deleting
all partial components using the former content if write fails.

After commit e5fc4b6, partial sst cannot be deleted because deletion
procedure is incorrectly assuming all SSTs being deleted unconditionally
have TOC, but partial SSTs only have TMP TOC instead.
That happens because parent_path() requires all path components to
exist due to its usage of fs::path::canonical.

The consequence of this is that space of partial files cannot be
reclaimed, making it worse for Scylla to recover from ENOSPC,
which could happen by selecting a set of files for compaction with
higher chance of suceeeding given the free space.

This is fixed by only calling parent_path() on TMP TOC, which is
guaranteed to exist prior to calling fsync_directory().

Fixes #10410.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-04-26 11:00:27 -03:00
Avi Kivity
582802825a treewide: use system-#include (angle brackets) for seastar
Seastar is an external library from Scylla's point of view so
we should use the angle bracket #include style. Most of the source
follows this, this patch fixes a few stragglers.

Also fix cases of #include which reached out to seastar's directory
tree directly, via #include "seastar/include/sesatar/..." to
just refer to <seastar/...>.

Closes #10433
2022-04-26 14:46:42 +03:00
Gleb Natapov
7f26a8eef5 raft: actively search for a leader if it is not known for a tick duration
For a follower to forward requests to a leader the leader must be known.
But there may be a situation where a follower does not learn about
a leader for a while. This may happen when a node becomes a follower while its
log is up-to-date and there are no new entries submitted to raft. In such
case the leader will send nothing to the follower and the only way to
learn about the current leader is to get a message from it. Until a new
entry is added to the raft's log a follower that does not know who the
leader is will not be able to add entries. Kind of a deadlock. Note that
the problem is specific to our implementation where failure detection is
done by an outside module. In vanilla raft a leader sends messages to
all followers periodically, so essentially it is never idle.

The patch solves this by broadcasting specially crafted append reject to all
nodes in the cluster on a tick in case a leader is not known. The leader
responds to this message with an empty append request which will cause the
node to learn about the leader. For optimisation purposes the patch
sends the broadcast only in case there is actually an operation that
waits for leader to be known.

Fixes #10379
2022-04-25 14:51:22 +02:00
Avi Kivity
728479a6ea Merge 'Fix map subscript crashes when map or subscript is null' from Nadav Har'El
In the filtering expression "WHERE m[?] = 2", our implementation was buggy when either the map, or the subscript, was NULL (and also when the latter was an UNSET_VALUE). Our code ended up dereferencing null objects, yielding bizarre errors when we were lucky, or crashes when we were less lucky - see examples of both in issues #10361, #10399, #10401. The existing test `test_null.py::test_map_subscript_null` reproduced all these bugs sporadically.

In this series we improve the test to reproduce the separate bugs separately, and also reproduce additional problems (like the UNSET_VALUE). We then **define** both `m[NULL]` and `NULL[2]` to result in NULL instead of the existing undefined (and buggy, and crashing) behavior. This new definition is consistent with our usual SQL-inspired tradition that NULL "wins" in expressions - e.g., `NULL < 2` is also defined as resulting in NULL.

However, this decision differs from Cassandra, where `m[NULL]` is considered an error but `NULL[2]` is allowed. We believe that making `m[NULL]` be a NULL instead of an error is more consistent, and moreover - necessary if we ever want to support more complicate expressions like `m[a]`, where the column `a` can be NULL for some rows and non-NULL for others, and it doesn't make sense to return an "invalid query" error in the middle of the scan.

Fixes #10361
Fixes #10399
Fixes #10401

Closes #10420

* github.com:scylladb/scylla:
  expressions: don't dereference invalid map subscript in filter
  expressions: fix invalid dereference in map subscript evaluation
  test/cql-pytest: improve tests for map subscripts and nulls
2022-04-24 21:16:10 +03:00
Nadav Har'El
fbb2a41246 expressions: don't dereference invalid map subscript in filter
If we have the filter expression "WHERE m[?] = 2", the existing code
simply assumed that the subscript is an object of the right type.
However, while it should indeed be the right type (we already have code
that verifies that), there are two more options: It can also be a NULL,
or an UNSET_VALUE. Either of these cases causes the existing code to
dereference a non-object as an object, leading to bizarre errors (as
in issue #10361) or even crashes (as in issue #10399).

Cassandra returns a invalid request error in these cases: "Unsupported
unset map key for column m" or "Unsupported null map key for column m".
We decided to do things differently:

 * For NULL, we consider m[NULL] to result in NULL - instead of an error.
   This behavior is more consistent with other expressions that contain
   null - for example NULL[2] and NULL<2 both result in NULL as well.
   Moreover, if in the future we allow more complex expressions, such
   as m[a] (where a is a column), we can find the subscript to be null
   for some rows and non-null for other rows - and throwing an "invalid
   query" in the middle of the filtering doesn't make sense.

 * For UNSET_VALUE, we do consider this an error like Cassandra, and use
   the same error message as Cassandra. However, the current implementation
   checks for this error only when the expression is evaluated - not
   before. It means that if the scan is empty before the filtering, the
   error will not be reported and we'll silently return an empty result
   set. We currently consider this ok, but we can also change this in the
   future by binding the expression only once (today we do it on every
   evaluation) and validating it once after this binding.

Fixes #10361
Fixes #10399

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-04-24 16:05:34 +03:00
Nadav Har'El
808a93d29b expressions: fix invalid dereference in map subscript evaluation
When we have an filter such as "WHERE m[2] = 3" (where m is a map
column), if a row had a null value for m, our expression evaluation
code incorrectly dereferences an unset optional, and continued
processing the result of this dereference which resulted in undefined
behavior - sometimes we were lucky enough to get "marshaling error"
but other times Scylla crashed.

The fix is trivial - just check before dereferencing the optional value
of the map. We return null in that case, which means that we consider
the result of null[2] to be null. I think this is a reasonable approach
and fits our overall approach of making null dominate expressions (e.g.,
the value of "null < 2" is also null).

The test test_filtering.py::test_filtering_null_map_with_subscript,
which used to frequently fail with marshaling errors or crashes, now
passes every time so its "xfail" mark is removed.

Fixes #10417

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-04-24 14:58:56 +03:00
Nadav Har'El
189b8845fe test/cql-pytest: improve tests for map subscripts and nulls
The test test_null.py::test_map_subscript_null turned out to reproduce
multiple bugs related to using map subscripts in filtering expressions.
One was issue #10361 (m[null] resulted in a bizarre error) or #10399
(m[null] resulted in a crash), and a different issue was #10401 (m[2]
resulted in a bizarre error or a crash if m itself was null). Moreover,
the same test uncovered different bugs depending how it was run - alone
or with other tests - because it was using a shared table.

In this patch we introduce two separate tests in test_filtering.py
which are designed to reproduce these separate bugs instead of mixing
them into one test. The new tests also cover a few more corners which
the previous test (which focused on nulls) missed - such as UNSET_VALUE.

The two new tests (and the old test_map_subscript_null) pass on
Cassandra so still assume that the Cassandra behavior - that m[null]
should be an error - is the correct behavior. We may want to change
the desired behavior (e.g., to decide that m[null] be null, not an
error), and change the tests accordingly later - but for now the
tests follow Cassandra's behavior exactly, and pass on Cassandra
and fail on Scylla (so are marked xfail).

The bugs reproduced by these tests involve randomness or reading
uninitialized memory, so these tests sometimes pass, sometimes fail,
and sometimes even crash (as reported in #10399 and #10401). So to
reproduce these bugs run the tests multiple times. For example:

    test/cql-pytest/run --count 100 --runxfail
        test_filtering.py::test_filtering_null_map_with_subscript

Refs #10361
Refs #10399
Refs #10401

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-04-24 13:26:26 +03:00
Avi Kivity
8624718983 Merge "row_cache: update reader implementations to v2" from Botond
"
cache_flat_mutation_reader gets a native v2 implementation. The
underlying mutation representation is not changed: range deletions are
still stored as v1 range_tombstones in mutation_partition. These are
converted to range tombstone changes during reading.
This allows for separating the change of a native v2 reader
implementation and a native v2 in-memory storage format, enabling the
two to be done at separate times and incrementally.
This means there is still conversion ingoing when reading from cache and
when populating, but when reading from underlying, the stream can now be
passed through as-is without conversions.
Also, any future v2 related changes to the in-memory storage will now be
limited to the cache reader implementation itself.

In the process, the non-forwarding reader, whose only user is the cache,
is also converted to v2.
"

Performance results reported by Botond:

"
build/release/test/perf/perf_simple_query -c1 -m2G --flush --
duration=20

BEFORE
median 130421.76 tps ( 71.1 allocs/op,  12.1 tasks/op,   47462
insns/op)
median absolute deviation: 319.64
maximum: 131028.33
minimum: 127502.55

AFTER
median 133297.41 tps ( 64.1 allocs/op,  12.2 tasks/op,   45406
insns/op)
median absolute deviation: 2964.24
maximum: 137581.56
minimum: 123739.4

Getting rid of those upgrade/downgrade was good for allocs and ops.
Curiously there is a 0.1 rise in number of tasks though.
"

* 'row-cache-readers-v2/v1' of https://github.com/denesb/scylla:
  row_cache: update reader implementations to v2
  range_tombstone_change_generator: flush(): add end_of_range
  readers/nonforwardable: convert to v2
  read_context: fix indentation
  read_context: coroutinize move_to_next_partition()
  row_cache: cache_entry::read(): return v2 reader
  row_cache: return v2 readers from make_reader*()
  readers/delegating_v2: s/make_delegating_reader_v2/make_delegating_reader/
2022-04-23 19:10:43 +03:00
Botond Dénes
5e97fb9fc4 row_cache: update reader implementations to v2
cache_flat_mutation_reader gets a native v2 implementation. The
underlying mutation representation is not changed: range deletions are
still stored as v1 range_tombstones in mutation_partition. These are
converted to range tombstone changes during reading.
This allows for separating the change of a native v2 reader
implementation and a native v2 in-memory storage format, enabling the
two to be done at separate times and incrementally.
2022-04-21 14:57:04 +03:00
Botond Dénes
b061acb668 Merge 'Remove queue reader v1' from Mikołaj Sielużycki
The patchset embeds the mutation_fragment upgrading logic from v1 to v2 into the mutation_fragment_queue. This way the mutation fragments coming to the mutation_fragment_queue can be v1, but the underlying query_reader receives mutation_fragment_v2, eliminating the last usage of query_reader (v1). The last commit removes query_reader, query_reader_handle and associated factory functions.

tests: unit(dev), dtest(incremental_repair_test, read_repair_test, repair_additional_test, repair_test)

Closes #10371

* github.com:scylladb/scylla:
  readers: Remove queue_reader v1 and associated code.
  repair: Make mutation_fragment_queue internally upgrade fragments to v2
  repair: Make mutation_fragment_queue::impl a seastar::shared_ptr
2022-04-21 12:34:48 +03:00
Mikołaj Sielużycki
339b60e5b0 repair: Make mutation_fragment_queue internally upgrade fragments to v2 2022-04-20 17:55:58 +02:00
Mikołaj Sielużycki
eeb2b458de repair: Make mutation_fragment_queue::impl a seastar::shared_ptr
It makes mutation_fragment_queue copyable and makes the pointer to
pending mutation fragments in next commit stable. This allows moving the
mutation_fragment_queue without breaking the underlying
upgrading_consumer.
2022-04-20 17:51:58 +02:00
Botond Dénes
0b035c9099 row_cache: return v2 readers from make_reader*()
And adjust callers. The factory functions just sprinkle upgrade_to_v2()
on returned readers for now.
One test in row_cache_test.cc had to be disabled, because the upgrade to
v2 wrapper we now have over cache readers doesn't allow it to directly
control the reader's buffer size and so the test fails. There is a FIXME
left in the test code and the test will be re-enabled once a native v2
reader implementation allows us to get rid of the upgrade wrapper.
2022-04-20 10:59:09 +03:00
Botond Dénes
c3c71b3aa5 readers/delegating_v2: s/make_delegating_reader_v2/make_delegating_reader/
The argument type (v1 or v2 reader) is enough to disambiguate and
overloading the v1 method makes a transition to v2 more seamless.
2022-04-20 10:59:09 +03:00
Nadav Har'El
cc40685c28 test/cql-pytest: add test for filtering with IN restriction
It turns out that Cassandra does not allow IN restrictions together with
filtering, except, curiously, when the restriction is on a clustering key.
There is no real reason for this limitation - the error message even says
it is not *yet* supported.

Scylla, on the other hand, does support this case. Of course it's not
enough that we support it - we need to support it correctly... But we don't
have a full regression test that this support is correct - in
filtering_test.cc we test it with clustering and regular columns - but not
partition key columns.

So this patch adds a simple cql-pytest test that this sort of filtering
works in Scylla correctly for partition, clustering and regular columns
(and also confirms that these cases don't work, yet, on Cassandra).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220420075553.1008062-1-nyh@scylladb.com>
2022-04-20 09:56:22 +02:00
Botond Dénes
75786c42cb Merge 'Add repair unit tests/v1' from Mikołaj Sielużycki
This patch series splits up parts of repair pipeline to allow unit testing
various bits of code without having to run full dtest suite. The reason why
repair pipeline has no unit tests is that by definition repair requires multiple
nodes, while unit test environment works only for a single node.

However, it is possible to explicitly define interfaces between various parts of the
pipeline, inject dependencies and test them individually. This patch series is focused
on taking repair_rows_on_wire (frozen mutation representation of changes coming from
another node) and flushing them to an sstable.

The commits are split into the following parts:
- pulling out classes to separate headers so that they can be included (potentially indirectly) from the test,
- pulling out repair_meta::to_repair_rows_list and part of repair_meta::flush_rows_in_working_row_buf so that they can be tested,
- refactoring repair_writer so that the actual writing logic can be injected as dependency,
- creating the unit test.

tests: unit(dev), dtest(incremental_repair_test, read_repair_test, repair_additional_test, repair_test)

Closes #10345

* github.com:scylladb/scylla:
  repair: Add unit test for flushing repair_rows_on_wire to disk.
  repair: Extract mutation_fragment_queue and repair_writer::impl interfaces.
  repair: Make parts of repair_writer interface private.
  repair: Rename inputs to flush_rows.
  repair: Make repair_meta::flush_rows a free function.
  repair: Split flush_rows_in_working_row_buf to two functions and make one static.
  repair: Rename inputs to to_repair_rows_list.
  repair: Make to_repair_rows_list a free function.
  repair: Make repair_meta::to_repair_rows_list a static function
  repair: Fix indentation in repair_writer.
  repair: Move repair_writer to separate header.
  repair: Move repair_row to a separate header.
  repair: Move repair_sync_boundary to a separate header.
  repair: Move decorated_key_with_hash to separate header.
  repair: Move row_repair hashing logic to separate class and file.
2022-04-14 18:17:03 +03:00
Botond Dénes
737cc798ca Merge "Add flat_mutation_reader_from_mutation_v2" from Benny Halevy
"
Optimize consuming from a single partition.

This gives us significant improvement with single, small mutations,
as shown with perf_mutation_readers, compared to the vector-based
flat_mutation_reader_from_mutations_v2.

These are expected to be common on the write path,
and can be optimized for view building.

results from: perf_mutation_readers -c1 --random-seed=840478750
(userspace cpu-frequency governer, 2.2GHz)

test                                      iterations      median         mad         min         max

Before:
combined.one_row                              720118   825.668ns     1.020ns   824.648ns   827.750ns

After:
combined.one_mutation                         881482   751.157ns     0.397ns   750.211ns   751.912ns
combined.one_row                              843270   756.553ns     0.303ns   755.889ns   757.911ns

The grand plan is to follow up
with make_flat_mutation_reader_from_frozen_mutation_v2
so that we can read directly from either a mutation
or frozen_mutation without having to unfreeze it e.g. in
table::push_view_replica_updates.

Test: unit(dev)
Perf: perf_mutation_readers(release)
"

* tag 'flat_mutation_reader_from_mutation-v3' of https://github.com/bhalevy/scylla:
  perf: perf_mutation_readers: add one_mutation case
  test: mutation_query_test: make make_source static
  mutation readers: refactor make_flat_mutation_reader_from_mutation*_v2
  mutation readers: add make_flat_mutation_reader_from_mutation_v2
  readers: delete slice_mutation.hh
  test: flat_mutation_reader_test: mock_consumer: add debug logging
  test: flat_mutation_reader_test: mock_consumer: make depth counter signed
2022-04-14 17:23:21 +03:00
Botond Dénes
fa75d58cf0 Merge "Make snitch start/stop code look classical" from Pavel Emelyanov
"
There's a generic way to start-stop services in scylla, that includes
5 "actions" (some are optional and/or implicit though)

  service_config cfg = ...
  sharded<service>.start(cfg)
  service.invoke_on_all(&service::start)
  service.invoke_on_all(&service::shutdown)
  service.invoke_on_all(&servuce::stop)
  sharded<service>.stop()

and most of the service out there conforms to that scheme. Not snitch
(spoiler: and not tracing), for which there's a couple of helpers that
do all that magic behind the scenes, "configuring" snitch is done with
the help of overloaded constructors. The latter is extra complicated
with the need to register snitch drivers in class-registry for each
constructor overload. Also there's an external shards synchronization
on stop.

This set brings snitch start/stop code to the described standard: the
create/stop helpers are removed, creation acceps the config structure,
per-shard start/stop (snitch has no drain for now) happens in the
simple invoke-on-all manner.

The intended side effect of this change is the ability to add explicit
dependencies to snitch (in the future, not in this set).

tests: unit(dev)
"

* 'br-snitch-config' of https://github.com/xemul/scylla:
  snitch: Remove create_snitch/stop_snitch
  snitch: Simplify stop (and pause_io)
  snitch: Move io_is_stopped to property-file driver
  snitch: Remove init_snitch_obj()
  snitch: Move instance creation into snitch_ptr constructor
  snitch: Make config-based construction of all drivers
  snitch: Declare snitch_ptr peering and rework container() method
  snitch: Introduce container() method
2022-04-14 16:56:32 +03:00
Benny Halevy
f5ef687acd perf: perf_mutation_readers: add one_mutation case
Measure performance of the single-mutation reader:
make_flat_mutation_reader_from_mutation_v2.

Comparable to the `one_row` case that consumes the
single mutation using the multi-mutatio reader:
make_flat_mutation_reader_from_mutations_v2

perf_mutation_readers shows ~20-30% improvement of
make_flat_mutation_reader_from_mutation_v2
the same single mutation, just given as a single-item vector
to make_flat_mutation_reader_from_mutations_v2.

test                                      iterations      median         mad         min         max

Before:
combined.one_row                              720118   825.668ns     1.020ns   824.648ns   827.750ns

After:
combined.one_mutation                         881482   751.157ns     0.397ns   750.211ns   751.912ns
combined.one_row                              843270   756.553ns     0.303ns   755.889ns   757.911ns

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-14 11:39:05 +03:00
Benny Halevy
a4b69fe7b6 test: mutation_query_test: make make_source static
No need for it to be public.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-14 11:15:19 +03:00
Benny Halevy
e85241d5b6 mutation readers: add make_flat_mutation_reader_from_mutation_v2
Optimize reading from a single partition.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-14 11:14:43 +03:00