Test that we load quarantined sstables by
creating a dataset, moving a sstable to the quarantine dir,
and then reload the table and verify the dataset.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
sstables in the quarantine subdirectory are part of the table.
They're just not eligible for non-scrub compaction.
Call populate_column_family also for the quarantine subdirectory,
allowing it to not exist.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Check if the directory to be loaded exists.
Currently must_exist=true in all cases,
but it may be set to false when loading directories
that may not exist.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Quarantined sstables will reside in a "quarantine" subdirectory
and are also not eligible for regular compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently compaction_manager tracks sstables
based on !requires_view_building() and similarly,
table::in_strategy_sstables picks up only sstables
that are not in staging.
is_eligible_for_compaction() generalizes this condition
in preparation for adding a quarantine subdirectory for
invalid sstables that should not be compacted as well.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Define the "staging", "upload", and "snapshots" subdirectory
names as named const expressions in the sstables namespace
rather than relying on their string representation,
that could lead to typo mistakes.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This strategy method was introduced unnecessarily. We assume it was
going to be needed, but turns out it was never needed, not even
for ICS. Also it's built on a wrong assumption as an output
sstable run being generated can never be compacted in parallel
as the non-overlapping requirement can be easily broken.
LCS for example can allow parallel compaction on different runs
(levels) but correctness cannto be guaranteed with same runs
are compacted in parallel.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It was introduced by commit 5206a97915 because fully expired sstable
wouldn't be registed and therefore could be never removed from backlog
tracker. This is no longer possible as table is now responsible for
removing all input sstables. So let's kill on_skipped_expired_sstable()
as it's now only boilerplate we don't need.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
... and count_if()' from Avi Kivity
The expression code provides some utilities to examine and manipulate
expressions at prepare time. These are not (or should not be) in the fast
path and so should be optimized for compile time and code footprint
rather than run time.
This series does so by detemplating and deinlining find_in_expression()
and count_if().
Closes#9712
* github.com:scylladb/scylla:
cql3: expr: adjust indentation in recurse_until()
cql3: expr: detemplate count_if()
cql3: expr: detemplate count_if()
cql3: expr: rewrite count_if() in terms of recurse_until()
cql3: expr: deinline recurse_until()
cql3: expr: detemplate find_in_expression
Scylla doesn't support unset values inside UDT.
The old code used to convert `unset` to `null`, which seems incorrect.
There is an extra space in the error message to retain compatability with Cassandra.
Fixes: #9671Closes#9724
* github.com:scylladb/scylla:
cql-pytest: Enable test for UDT with unset values
cql3: Don't allow unset values inside UDT
The test testUDTWithUnsetValues was marked as xfail,
but now the issue has been fixed and we can enable it.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Scylla doesn't support unset values inside UDT.
The old code used to convert unset to null, which seems incorrect.
There is an extra space in the error message to retain compatability with Cassandra.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
When reshaping TWCS table in relaxed mode, which is the case for
offstrategy and boot, disjoint tolerance is too strict, which can
lead those processes to do more work than needed.
Let's increase the tolerance to max threshold, which will limit the
amount of sstables opened in compaction to a reasonable amount.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211130132538.56285-1-raphaelsc@scylladb.com>
unfortunately, correctness of std_unique_ptr and similar depends on
their implementation in libstdc++. let's support unique ptr on
newer systems while maintaining backward compatibility.
./test.py --mode=release scylla-gdb now passes to me, also verified
`scylla compaction-tasks` produces correct info.
Fixes#9677.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211202173534.359672-1-raphaelsc@scylladb.com>
"
couple of preparatory changes for coroutinization of manager
"
* 'some_compaction_manager_cleanups_v5' of github.com:raphaelsc/scylla:
compaction_manager: move check_for_cleanup into perform_cleanup()
compaction_manager: replace get_total_size by one liner
compaction_manager: make consistent usage of type and name table
compaction_manager: simplify rewrite_sstables()
compaction_manager: restore indentation
Currently storage service acts as a glue between database schema value
and the migration manager "passive_announce" call. This interposing is
not required, migration manager can do all the management itself, and
the linkage can be done in main.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Because today's migration_manager::stop is called drain-time.
Keep the .stop for next patch, but since it's called when the
whole migration_manager stops, guard it against re-entrances.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Both calls are now private. Also the non-maybe one can become void
and handle pull exceptions by itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Move the calls from respective storage service notification callbacks.
One non-move change is that token metadata that was available on the
storage service should be taken from storage proxy, but this change
is aligned with future changes -- migration manager depends on proxy
and will get a local proxy reference some day.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is to start schema pulls upon on_join, on_alive and on_change ones
in the next patch. Migration manager already has gossiper reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add the summary index and the bound's address to the error message, so
it can be correlated with other trace level logging when investigating a
problem.
Refs: #9446
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211202124955.542293-2-bdenes@scylladb.com>
In the very recent commit 3c0e703 fixing issue #8757, we changed the
default prometheus_address setting in scylla.yaml to "localhost", to
match the default listen_address in the same file. We explained in that
commit how this helped developers who use an unchanged scylla.yaml, and
how it didn't hurt pre-existing users who already had their own scylla.yaml.
However, it was quickly noted by Tzach and Amnon that there is one use case
that was hurt by that fix:
Our existing documentation, such as the installation guide
https://www.scylladb.com/download/?platform=centos ask the user to take
our initial scylla.yaml, and modify listen_address, rpc_address, seeds,
and cluster_name - and that's it. That document - and others - don't
tell the user to also override prometheus_address, so users will likely
forget to do so - and monitoring will not work for them.
So this patch includes a different solution to #8757.
What it does is:
1. The setting of prometheus_address in scylla.yaml is commented out.
2. In config.cc, prometheus_address defaults to empty.
3. In main.cc, if prometheus_address is empty (i.e., was not explicitly
set by the user), the value of listen_address is used instead.
In other words, the idea is that prometheus_address, if not explicitly set
by the user, should default to listen_address - which is the address used
to listen to the internal Scylla inter-node protocol.
Because the documentation already tells the user to set listen_address
and to not leave it set to localhost, setting it will also open up
prometheus, thereby solving #9701. Meanwhile, developers who leave the
default listen_address=localhost will also get prometheus_address=localhost,
so the original #8757 is solved as well. Finally, for users who had an old
scylla.yaml where prometheus_address was explicitly set to something,
this setting will continue to be used. This was also a requirement of
issue #8757.
Fixes#9701.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211129155201.1000893-1-nyh@scylladb.com>
As part of changing the codebase to flat_mutation_reader_v2,
change size_estimates_virtual_reader.
Since the bulk of the work is done by
make_flat_mutation_reader_from_mutations() (which is unchanged),
only glue code is affected. It is also not performance sensitive,
so the extra conversions are unimportant.
Test: unit (dev)
Closes#9707
As part of changing the codebase to flat_mutation_reader_v2,
change chained_delegating_reader and its user virtual_table.
Since the reader does not process fragments (only forwarding
things around), only glue code is affected. It is also not
performance sensitive, so the extra conversions are unimportant.
Test: unit (dev)
Closes#9706
new code in manager adopted name and type table, whereas historical
code still uses name and type column family. let's make it consistent
for newcomers to not get confused.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
as rewrite_sstables() switched to coroutine, it can be simplified
by not using smart pointers to handle lifetime issues.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Key column values fetched during the TTL scan have a well-defined
order - primary columns come first. This assumption is now used
to simplify getting the values from rows during scans without
having to consult result metadata first.
Tests: unit(release)
Message-Id: <dcb19b8bab0dd02838693fe06d5a835ea2f378ff.1638357005.git.sarna@scylladb.com>
This commit addresses a very simple FIXME left in alternator TTL
implementation - it reduces the number of parameters passed
to scan_table_ranges() by enclosing the parameters in a separate
object.
Tests: unit(release)
Message-Id: <214afcd9d5c1968182ad98550105f82add216c80.1638354094.git.sarna@scylladb.com>
Let's enable offstrategy for repair based rebuild, for it to take
advantage of offstrategy benefits, one of the most important
being compaction not acting aggressively, which is important
for both reducing operation time and delivering good latency
while the operation is running.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211130115957.13779-1-raphaelsc@scylladb.com>
The test suite names seen by Jenkins are suboptimal: there is
no distinction between modes, and the ".cc" suffix of file names
is interpreted as a class name, which is converted to a tree node
that must be clicked to expand. Massage the names to remove
unnecessary information and add the mode.
Closes#9696
We read the compressed file size from a file that was already closed,
resulting in EBADF on my machine. Not sure why it works for everyone
else.
Fix by reading the size using the path.
Closes#9675
After commit 1f5b17f, overlapping can be introduced in level 1 because
procedure that filters out sstables from partial runs is considering
inactive tasks, so L1 sstables can be incorrectly filtered out from
next compaction attempt. When L0 is merged into L1, overlapping is
then introduced in L1 because old L1 sstables weren't considered in
L0 -> L1 compaction.
From now on, compaction_manager::get_candidates() will only consider
active tasks, to make sure actual partial runs are filtered out.
Fixes#9693.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129180459.125847-1-raphaelsc@scylladb.com>
To satisfy backlog controller, commit 28382cb25c changed LCS to
incrementally push sstables to highest level *when* there's nothing else
to be done.
That's overkill because controller will be satisfied with level L being
fanout times larger than L-1. No need to push everything to last level as
it's even worse than a major, because any file being promoted will
overlap with ~10 files in next level. At least, the cost is amortized by
multiple iterations, but terrible write amplification is still there.
Consequently, this reduces overall efficiency.
For example, it might happen that LCS in table A start pushing everything
to highest level, when table B needs resources for compaction to reduce its
backlog. Increased write amplification in A may prevent other tables
from reducing their backlog in a timely manner.
It's clear that LCS should stop promoting as soon as level L is 10x
larger than L-1, so strategy will still be satisfied while fixing the
inefficiency problem.
Now layout will look like as follow:
SSTables in each level: [0, 2, 15, 121]
Previously, it looked like once table stopped being written to:
SSTables in each level: [0, 0, 0, 138]
It's always good to have everything in a single run, but that comes
with a high write amplification cost which we cannot afford in steady
state. With this change, the layout will still be good enough to make
everybody happy.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129143606.71257-1-raphaelsc@scylladb.com>
RPC module starts to dispatching calls to a server the moment it is in the
servers' list, but until raft::server::start() completes the instance is
not fully created yet and is not ready to accept anything. Fix the code
that initialize new raft group to insert new raft instance into the list
only after it is started.
Message-Id: <YZTFFW9v0NlV7spR@scylladb.com>
This series introduces a new version of a loading_cache class.
The old implementation was susceptible to a "pollution" phenomena when frequently used entry can get evicted by an intensive burst of "used once" entries pushed into the cache.
The new version is going to have a privileged and unprivileged cache sections and there's a new loading_cache template parameter - SectionHitThreshold. The new cache algorithm goes as follows:
* We define 2 dynamic cache sections which total size should not exceed the maximum cache size.
* New cache entry is always added to the "unprivileged" section.
* After a cache entry is read more than SectionHitThreshold times it moves to the second cache section.
* Both sections' entries obey expiration and reload rules in the same way as before this patch.
* When cache entries need to be evicted due to a size restriction "unprivileged" section's
least recently used entries are evicted first.
More details may be found in #8674.
In addition, during a testing another issue was found in the authorized_prepared_statements_cache: #9590.
There is a patch that fixes it as well.
Closes#9708
* github.com:scylladb/scylla:
loading_cache: account unprivileged section evictions
loading_cache: implement a variation of least frequent recently used (LFRU) eviction policy
authorized_prepared_statements_cache: always "touch" a corresponding cache entry when accessed
loading_cache::timestamped::lru_entry: refactoring
loading_cache.hh: rearrange the code (no functional change)
loading_cache: use std::pmr::polymorphic_allocator