Commit Graph

29248 Commits

Author SHA1 Message Date
Benny Halevy
3eabfad9fc test: database_test: add snapshot_with_quarantine_works
Test that snapshot includes quarantined sstables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:00:44 +02:00
Benny Halevy
11b54d44d9 test: database_test: add populate_from_quarantine_works
Test that we load quarantined sstables by
creating a dataset, moving a sstable to the quarantine dir,
and then reload the table and verify the dataset.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:00:44 +02:00
Benny Halevy
075962b45a distributed_loader: populate_keyspace: populate also from the quarantine dir
sstables in the quarantine subdirectory are part of the table.
They're just not eligible for non-scrub compaction.

Call populate_column_family also for the quarantine subdirectory,
allowing it to not exist.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:00:44 +02:00
Benny Halevy
f643dc90a9 distributed_loader: populate_column_family: add must_exist param
Check if the directory to be loaded exists.

Currently must_exist=true in all cases,
but it may be set to false when loading directories
that may not exist.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:00:44 +02:00
Benny Halevy
13e7b00f2e sstables: add is_quarantined
Quarantined sstables will reside in a "quarantine" subdirectory
and are also not eligible for regular compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:00:44 +02:00
Benny Halevy
07c5ddf182 sstables: add is_eligible_for_compaction
Currently compaction_manager tracks sstables
based on !requires_view_building() and similarly,
table::in_strategy_sstables picks up only sstables
that are not in staging.

is_eligible_for_compaction() generalizes this condition
in preparation for adding a quarantine subdirectory for
invalid sstables that should not be compacted as well.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:00:44 +02:00
Benny Halevy
bdc53880d4 sstables: define symbolic names for table subdirectories
Define the "staging", "upload", and "snapshots" subdirectory
names as named const expressions in the sstables namespace
rather than relying on their string representation,
that could lead to typo mistakes.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:00:44 +02:00
Pavel Emelyanov
db5678bb7f Merge "Kill unused code in compaction" from Raphael
tests: unit(dev).

* github.com/raphaelsc/scylla.git cleanups_for_compaction_12_03
  compaction_strategy: kill unused compaction_strategy_type::major
  compaction: Log skip of fully expired sstables
  compaction_strategy: kill unused can_compact_partial_runs()
  compaction: kill useless on_skipped_expired_sstable()
  compaction: merge _total_input_sstables and _ancestors
2021-12-03 19:22:08 +03:00
Raphael S. Carvalho
2f9f089eda compaction_strategy: kill unused compaction_strategy_type::major
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-03 12:27:10 -03:00
Raphael S. Carvalho
0e3d388ebb compaction: Log skip of fully expired sstables
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-03 12:25:48 -03:00
Raphael S. Carvalho
9725e5efa9 compaction_strategy: kill unused can_compact_partial_runs()
This strategy method was introduced unnecessarily. We assume it was
going to be needed, but turns out it was never needed, not even
for ICS. Also it's built on a wrong assumption as an output
sstable run being generated can never be compacted in parallel
as the non-overlapping requirement can be easily broken.
LCS for example can allow parallel compaction on different runs
(levels) but correctness cannto be guaranteed with same runs
are compacted in parallel.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-03 12:20:51 -03:00
Raphael S. Carvalho
7a7a2467fa compaction: kill useless on_skipped_expired_sstable()
It was introduced by commit 5206a97915 because fully expired sstable
wouldn't be registed and therefore could be never removed from backlog
tracker. This is no longer possible as table is now responsible for
removing all input sstables. So let's kill on_skipped_expired_sstable()
as it's now only boilerplate we don't need.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-03 12:19:29 -03:00
Raphael S. Carvalho
32c2534e91 compaction: merge _total_input_sstables and _ancestors
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-03 12:19:23 -03:00
Piotr Sarna
0bd139e81c Merge 'cql3: expr: detemplate and deinline find_in_expression()
... and count_if()' from Avi Kivity

The expression code provides some utilities to examine and manipulate
expressions at prepare time. These are not (or should not be) in the fast
path and so should be optimized for compile time and code footprint
rather than run time.

This series does so by detemplating and deinlining find_in_expression()
and count_if().

Closes #9712

* github.com:scylladb/scylla:
  cql3: expr: adjust indentation in recurse_until()
  cql3: expr: detemplate count_if()
  cql3: expr: detemplate count_if()
  cql3: expr: rewrite count_if() in terms of recurse_until()
  cql3: expr: deinline recurse_until()
  cql3: expr: detemplate find_in_expression
2021-12-03 15:41:07 +01:00
Piotr Sarna
3867ca2fd6 Merge 'cql3: Don't allow unset values inside UDT' from Jan Ciołek
Scylla doesn't support unset values inside UDT.
The old code used to convert `unset` to `null`, which seems incorrect.

There is an extra space in the error message to retain compatability with Cassandra.

Fixes: #9671

Closes #9724

* github.com:scylladb/scylla:
  cql-pytest: Enable test for UDT with unset values
  cql3: Don't allow unset values inside UDT
2021-12-03 15:36:55 +01:00
Jan Ciolek
3ae8752812 cql-pytest: Enable test for UDT with unset values
The test testUDTWithUnsetValues was marked as xfail,
but now the issue has been fixed and we can enable it.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-12-03 14:46:21 +01:00
Jan Ciolek
be14904416 cql3: Don't allow unset values inside UDT
Scylla doesn't support unset values inside UDT.
The old code used to convert unset to null, which seems incorrect.

There is an extra space in the error message to retain compatability with Cassandra.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-12-03 14:46:21 +01:00
Gleb Natapov
b954d91b4f migration_manager: co-routinize announce_column_family_drop function.
Message-Id: <20211202150531.1277448-30-gleb@scylladb.com>
2021-12-03 13:29:34 +02:00
Raphael S. Carvalho
4a02e312f6 compaction: increase disjoint tolerance in TWCS reshape
When reshaping TWCS table in relaxed mode, which is the case for
offstrategy and boot, disjoint tolerance is too strict, which can
lead those processes to do more work than needed.
Let's increase the tolerance to max threshold, which will limit the
amount of sstables opened in compaction to a reasonable amount.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211130132538.56285-1-raphaelsc@scylladb.com>
2021-12-03 06:38:42 +02:00
Raphael S. Carvalho
6ad630c095 scylla-gdb.py: fix unique ptr on newer libstdc++
unfortunately, correctness of std_unique_ptr and similar depends on
their implementation in libstdc++. let's support unique ptr on
newer systems while maintaining backward compatibility.

./test.py --mode=release scylla-gdb now passes to me, also verified
`scylla compaction-tasks` produces correct info.

Fixes #9677.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211202173534.359672-1-raphaelsc@scylladb.com>
2021-12-03 06:33:54 +02:00
Avi Kivity
3b82ef854d Merge "Some compaction manager cleanups" from Raphael
"
couple of preparatory changes for coroutinization of manager
"

* 'some_compaction_manager_cleanups_v5' of github.com:raphaelsc/scylla:
  compaction_manager: move check_for_cleanup into perform_cleanup()
  compaction_manager: replace get_total_size by one liner
  compaction_manager: make consistent usage of type and name table
  compaction_manager: simplify rewrite_sstables()
  compaction_manager: restore indentation
2021-12-02 19:53:13 +02:00
Pavel Emelyanov
5cfeac0c90 paxos: Drop forward declarations of seastar pointers
They will break compilation after next seastar update, but the
good news is that scylla compiles even without them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211202173643.1070-1-xemul@scylladb.com>
2021-12-02 19:49:03 +02:00
Konstantin Osipov
bdb924cdac cql3: co-routinize create_table_statement::announce_migration()
Message-Id: <20211202150531.1277448-4-gleb@scylladb.com>
2021-12-02 19:43:30 +02:00
Pavel Emelyanov
e4f35e2139 migration_manager: Eliminate storage service from passive announcing
Currently storage service acts as a glue between database schema value
and the migration manager "passive_announce" call. This interposing is
not required, migration manager can do all the management itself, and
the linkage can be done in main.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-02 19:43:30 +02:00
Pavel Emelyanov
a751a1117a migration_manager: Coroutinize drain()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-02 19:43:30 +02:00
Pavel Emelyanov
eb8e30f696 migration_manager: Rename stop to drain then bring it back
Because today's migration_manager::stop is called drain-time.
Keep the .stop for next patch, but since it's called when the
whole migration_manager stops, guard it against re-entrances.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-02 19:43:30 +02:00
Pavel Emelyanov
798f4b0e3f migration_manager: Sanitize (maybe_)schedule_schema_pull
Both calls are now private. Also the non-maybe one can become void
and handle pull exceptions by itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-02 19:43:30 +02:00
Pavel Emelyanov
421679e428 migration_manager: Schedule schema pulls upon gossip events
Move the calls from respective storage service notification callbacks.
One non-move change is that token metadata that was available on the
storage service should be taken from storage proxy, but this change
is aligned with future changes -- migration manager depends on proxy
and will get a local proxy reference some day.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-02 19:43:30 +02:00
Pavel Emelyanov
d4d0bd147e migration_manager: Subscribe on gossiper events
This is to start schema pulls upon on_join, on_alive and on_change ones
in the next patch. Migration manager already has gossiper reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-02 19:43:30 +02:00
Botond Dénes
259649c779 sstables/index_reader: improved diagnostics on missing index entry
Add the summary index and the bound's address to the error message, so
it can be correlated with other trace level logging when investigating a
problem.

Refs: #9446

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211202124955.542293-2-bdenes@scylladb.com>
2021-12-02 19:43:30 +02:00
Botond Dénes
f0b9519999 test/lib/exception_utils: add message_matches() predicate
Which checks the message against the given regex.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211202124955.542293-1-bdenes@scylladb.com>
2021-12-02 19:43:30 +02:00
Nadav Har'El
605a2de398 config: change default prometheus_address handling, again
In the very recent commit 3c0e703 fixing issue #8757, we changed the
default prometheus_address setting in scylla.yaml to "localhost", to
match the default listen_address in the same file. We explained in that
commit how this helped developers who use an unchanged scylla.yaml, and
how it didn't hurt pre-existing users who already had their own scylla.yaml.

However, it was quickly noted by Tzach and Amnon that there is one use case
that was hurt by that fix:

Our existing documentation, such as the installation guide
https://www.scylladb.com/download/?platform=centos ask the user to take
our initial scylla.yaml, and modify listen_address, rpc_address, seeds,
and cluster_name - and that's it. That document - and others - don't
tell the user to also override prometheus_address, so users will likely
forget to do so - and monitoring will not work for them.

So this patch includes a different solution to #8757.
What it does is:
1. The setting of prometheus_address in scylla.yaml is commented out.
2. In config.cc, prometheus_address defaults to empty.
3. In main.cc, if prometheus_address is empty (i.e., was not explicitly
   set by the user), the value of listen_address is used instead.

In other words, the idea is that prometheus_address, if not explicitly set
by the user, should default to listen_address - which is the address used
to listen to the internal Scylla inter-node protocol.

Because the documentation already tells the user to set listen_address
and to not leave it set to localhost, setting it will also open up
prometheus, thereby solving #9701. Meanwhile, developers who leave the
default listen_address=localhost will also get prometheus_address=localhost,
so the original #8757 is solved as well. Finally, for users who had an old
scylla.yaml where prometheus_address was explicitly set to something,
this setting will continue to be used. This was also a requirement of
issue #8757.

Fixes #9701.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211129155201.1000893-1-nyh@scylladb.com>
2021-12-02 19:43:30 +02:00
Avi Kivity
7cfd278c32 db: size_estimates_virtual_reader: convert to flat_mutation_reader_v2
As part of changing the codebase to flat_mutation_reader_v2,
change size_estimates_virtual_reader.

Since the bulk of the work is done by
make_flat_mutation_reader_from_mutations() (which is unchanged),
only glue code is affected. It is also not performance sensitive,
so the extra conversions are unimportant.

Test: unit (dev)

Closes #9707
2021-12-02 19:43:30 +02:00
Avi Kivity
b920f2500d db: virtual_table: convert chained_delegating_reader to v2
As part of changing the codebase to flat_mutation_reader_v2,
change chained_delegating_reader and its user virtual_table.

Since the reader does not process fragments (only forwarding
things around), only glue code is affected. It is also not
performance sensitive, so the extra conversions are unimportant.

Test: unit (dev)

Closes #9706
2021-12-02 19:43:30 +02:00
Raphael S. Carvalho
6d750d4f59 compaction_manager: move check_for_cleanup into perform_cleanup()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-02 14:39:31 -03:00
Raphael S. Carvalho
9aed7e9d67 compaction_manager: replace get_total_size by one liner
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-02 14:39:31 -03:00
Raphael S. Carvalho
760cfd93fb compaction_manager: make consistent usage of type and name table
new code in manager adopted name and type table, whereas historical
code still uses name and type column family. let's make it consistent
for newcomers to not get confused.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-02 14:39:27 -03:00
Raphael S. Carvalho
e460f72250 compaction_manager: simplify rewrite_sstables()
as rewrite_sstables() switched to coroutine, it can be simplified
by not using smart pointers to handle lifetime issues.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-02 08:15:41 -03:00
Raphael S. Carvalho
48124fc15a compaction_manager: restore indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-02 08:15:38 -03:00
Piotr Sarna
761c691149 alternator,ttl: simplify getting primary key column values
Key column values fetched during the TTL scan have a well-defined
order - primary columns come first. This assumption is now used
to simplify getting the values from rows during scans without
having to consult result metadata first.

Tests: unit(release)
Message-Id: <dcb19b8bab0dd02838693fe06d5a835ea2f378ff.1638357005.git.sarna@scylladb.com>
2021-12-02 10:29:41 +02:00
Piotr Sarna
337906bc1c alternator: precompute scan range parameters in a function
This commit addresses a very simple FIXME left in alternator TTL
implementation - it reduces the number of parameters passed
to scan_table_ranges() by enclosing the parameters in a separate
object.

Tests: unit(release)

Message-Id: <214afcd9d5c1968182ad98550105f82add216c80.1638354094.git.sarna@scylladb.com>
2021-12-02 10:04:05 +02:00
Raphael S. Carvalho
de165b864c repair: Enable off-strategy compaction for rebuild
Let's enable offstrategy for repair based rebuild, for it to take
advantage of offstrategy benefits, one of the most important
being compaction not acting aggressively, which is important
for both reducing operation time and delivering good latency
while the operation is running.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211130115957.13779-1-raphaelsc@scylladb.com>
2021-12-02 09:58:58 +02:00
David Garcia
954d5d5d63 Fix cql docs error
Closes #9613
2021-12-02 09:58:58 +02:00
Avi Kivity
ef3edcf848 test: refine test suite names exposed via xunit format
The test suite names seen by Jenkins are suboptimal: there is
no distinction between modes, and the ".cc" suffix of file names
is interpreted as a class name, which is converted to a tree node
that must be clicked to expand. Massage the names to remove
unnecessary information and add the mode.

Closes #9696
2021-12-02 09:58:58 +02:00
Avi Kivity
9edd86362a test: sstable_test: don't read compressed file size from closed file
We read the compressed file size from a file that was already closed,
resulting in EBADF on my machine. Not sure why it works for everyone
else.

Fix by reading the size using the path.

Closes #9675
2021-12-01 16:28:46 +02:00
Raphael S. Carvalho
f23e0d7f2d compaction_manager: Disconsider inactive tasks when filtering sstables
After commit 1f5b17f, overlapping can be introduced in level 1 because
procedure that filters out sstables from partial runs is considering
inactive tasks, so L1 sstables can be incorrectly filtered out from
next compaction attempt. When L0 is merged into L1, overlapping is
then introduced in L1 because old L1 sstables weren't considered in
L0 -> L1 compaction.

From now on, compaction_manager::get_candidates() will only consider
active tasks, to make sure actual partial runs are filtered out.

Fixes #9693.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129180459.125847-1-raphaelsc@scylladb.com>
2021-12-01 16:11:44 +02:00
Raphael S. Carvalho
9de7abdc80 compaction: LCS: Fix inefficiency when pushing SSTables to higher levels
To satisfy backlog controller, commit 28382cb25c changed LCS to
incrementally push sstables to highest level *when* there's nothing else
to be done.

That's overkill because controller will be satisfied with level L being
fanout times larger than L-1. No need to push everything to last level as
it's even worse than a major, because any file being promoted will
overlap with ~10 files in next level. At least, the cost is amortized by
multiple iterations, but terrible write amplification is still there.
Consequently, this reduces overall efficiency.
For example, it might happen that LCS in table A start pushing everything
to highest level, when table B needs resources for compaction to reduce its
backlog. Increased write amplification in A may prevent other tables
from reducing their backlog in a timely manner.

It's clear that LCS should stop promoting as soon as level L is 10x
larger than L-1, so strategy will still be satisfied while fixing the
inefficiency problem.

Now layout will look like as follow:
SSTables in each level: [0, 2, 15, 121]

Previously, it looked like once table stopped being written to:
SSTables in each level: [0, 0, 0, 138]

It's always good to have everything in a single run, but that comes
with a high write amplification cost which we cannot afford in steady
state. With this change, the layout will still be good enough to make
everybody happy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129143606.71257-1-raphaelsc@scylladb.com>
2021-12-01 16:10:25 +02:00
Gleb Natapov
f2ab5f4e60 raft service: insert a new raft instance into the servers' list only after it is started
RPC module starts to dispatching calls to a server the moment it is in the
servers' list, but until raft::server::start() completes the instance is
not fully created yet and is not ready to accept anything. Fix the code
that initialize new raft group to insert new raft instance into the list
only after it is started.

Message-Id: <YZTFFW9v0NlV7spR@scylladb.com>
2021-12-01 13:11:49 +01:00
Nadav Har'El
94eb5c55c8 Merge 'Loading cache improve eviction use policy' from Vladislav Zolotarov
This series introduces a new version of a loading_cache class.
The old implementation was susceptible to a "pollution" phenomena when frequently used entry can get evicted by an intensive burst of "used once" entries pushed into the cache.

The new version is going to have a privileged and unprivileged cache sections and there's a new loading_cache template parameter - SectionHitThreshold. The new cache algorithm goes as follows:
  * We define 2 dynamic cache sections which total size should not exceed the maximum cache size.
  * New cache entry is always added to the "unprivileged" section.
  * After a cache entry is read more than SectionHitThreshold times it moves to the second cache section.
  * Both sections' entries obey expiration and reload rules in the same way as before this patch.
  * When cache entries need to be evicted due to a size restriction "unprivileged" section's
    least recently used entries are evicted first.

More details may be found in #8674.

In addition, during a testing another issue was found in the authorized_prepared_statements_cache: #9590.
There is a patch that fixes it as well.

Closes #9708

* github.com:scylladb/scylla:
  loading_cache: account unprivileged section evictions
  loading_cache: implement a variation of least frequent recently used (LFRU) eviction policy
  authorized_prepared_statements_cache: always "touch" a corresponding cache entry when accessed
  loading_cache::timestamped::lru_entry: refactoring
  loading_cache.hh: rearrange the code (no functional change)
  loading_cache: use std::pmr::polymorphic_allocator
2021-12-01 13:13:53 +02:00
Calle Wilund
3e21fea2b6 test_streamts: test_streams_starting_sequence_number fix 'LastEvaluatedShardId' usage
It is not part of raw response, but of the 'StreamDescription' object.
Test fails internmittently depending on PK randomization.

Closes #9710
2021-12-01 11:05:40 +02:00