Commit Graph

434 Commits

Author SHA1 Message Date
Piotr Dulikowski
6ab41d76e6 replica/table: adjust the view read-before-write to return static rows when needed
Adjusts the read-before-write query issued in
`table::do_push_view_replica_updates` so that, when needed, it requests
static columns and makes sure that the static row is present.
2022-12-06 11:21:16 +01:00
Pavel Emelyanov
abd3602b10 sstable_directory: Remove sstable creation callback
It's no longer used.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
db657a8d1c sstable_directory: Keep error handler generator
Yet another continuation of the previous patch -- the IO error handler
generator is also needed to create sstables.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
4281f4af42 sstable_directory: Keep schema_ptr
Continuation of the patch before the previous one. In order to create an
sstable without an external lambda, the directory code needs the schema.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
8df1bcb907 sstable_directory: Use directory semaphore from manager
After the previous patch the sstables_directory code no longer needs a
semaphore argument, because it can get one from the manager. This makes the
directory API shorter and simpler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
4da941e159 sstable_directory: Keep reference on manager
The sstables_directory accesses /var/lib/scylla/data in two ways -- it lists
files in it and opens sstables. The latter is abstracted with the help
of lambdas passed around, but the former (listing) is done by using
directory listers from utils.

Listing sstable components with a directory lister won't work for object
storage; the directory code will need to call some abstraction layer
instead. Opening sstables with the help of a lambda is a bit of
overkill; having the sstables manager at hand could make it much simpler.

That said, this patch makes sstables_directory reference sstables_manager
on start.

This change will also simplify directory semaphore usage (next patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
5e13ce2619 sstables_manager: Keep directory semaphore reference
Preparatory patch. The semaphore will be used by sstables_directory in
the next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:18 +03:00
Pavel Emelyanov
be8512d7cc sstables, code: Wrap directory semaphore with concurrency
Currently this is a sharded<semaphore> started/stopped in main and
referenced by database in order to be fed into sstables code. This
semaphore always comes with the "concurrency" parameter that limits the
parallel_for_each parallelism.

This patch wraps both together into directory_semaphore class. This
makes its usage simpler and will allow extending it in the future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 11:59:30 +03:00
Avi Kivity
02b66bb31a Merge 'Mark sstable::<directory accessing methods> private' from Pavel Emelyanov
One of the prerequisites to make sstables reside on object-storage is not to let the rest of the code "know" the filesystem path they are located on (because sometimes they will not be on any filesystem path). This patch makes the methods that can reveal this path back private so that later they can be abstracted out.

Closes #12182

* github.com:scylladb/scylladb:
  sstable: Mark some methods private
  test: Don't get sstable dir when known
  test: Use move_to_quarantine() helper
  test: Use sstable::filename() overload without dir name
  sstables: Reimplement batch directory sync after move
  table, tests: Make use of move_to_new_dir() default arg
  sstables: Remove fsync_directory() helper
  table: Simplify take_snapshot()'s collecting sstables names
2022-12-04 17:45:37 +02:00
Pavel Emelyanov
a702affd4d sstables: Reimplement batch directory sync after move
There's a table::move_sstables_from_staging() method that gets a bunch
of sstables and moves them from the staging subdir into the table's root
datadir. To avoid flushing the root dir for every sstable move, it asks
sstable::move_to_new_dir() not to flush, but collects staging dir names
and flushes them and the root dir altogether at the end.

In order to make it more friendly to object-storage and to remove one
more caller of sstable::get_dir() the delayed_commit_changes struct is
introduced. It collects _all_ the affected dir names in unordered_set,
then allows flushing them. By default the move_to_new_dir() doesn't
receive this object and flushes the directories instantly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:08:47 +03:00
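The batching idea in the commit above can be sketched in plain standard C++. This is a minimal illustration, not the ScyllaDB code: the member name `dirs`, the `sync_fn` callback, and the `commit()` helper are assumptions made for the example, and the actual file move is left out.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical, simplified model of the struct named in the commit.
struct delayed_commit_changes {
    std::unordered_set<std::string> dirs;   // every directory touched by a move
};

// Stand-in for a real directory fsync (assumption for this sketch).
using sync_fn = std::function<void(const std::string&)>;

// Moves one sstable (the rename itself is omitted here). Without a
// delayed_commit_changes object both directories are synced right away;
// with one, they are only recorded for a later flush.
void move_to_new_dir(const std::string& from, const std::string& to,
                     const sync_fn& sync_dir,
                     delayed_commit_changes* delay = nullptr) {
    if (delay) {
        delay->dirs.insert(from);
        delay->dirs.insert(to);
    } else {
        sync_dir(from);
        sync_dir(to);
    }
}

// Syncs each recorded directory exactly once, then forgets them.
void commit(delayed_commit_changes& delay, const sync_fn& sync_dir) {
    for (const auto& dir : delay.dirs) {
        sync_dir(dir);
    }
    delay.dirs.clear();
}
```

Because the names are kept in a set, moving many sstables between the same two directories results in one sync per directory rather than one per move.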
Pavel Emelyanov
1b42d5fce3 table, tests: Make use of move_to_new_dir() default arg
The method in question accepts a boolean that says whether or not it should
sync directories at the end. It's true in all cases but one, so there's a
default value for it. Make use of it.

Anticipating the suggestion to replace bool with bool_class -- next
patch will replace it with something else.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:07:16 +03:00
Pavel Emelyanov
80f5d7393f table: Simplify take_snapshot()'s collecting sstables names
The method in question "snapshots" all sstables it can find, then writes
their Datafile names into the manifest file. To get the list of file
names it iterates over sstables list again and does silly conversion of
full file path to file name with the help of the directory path length.

All this can be made much simpler by collecting component names
directly at the time the sstable is hardlinked.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:02:37 +03:00
Pavel Emelyanov
71179ff5ab distributed_loader: Use coroutine::lambda in sleeping coroutine
According to seastar/doc/lambda-coroutine-fiasco.md, a lambda that
co_awaits once loses its capture frame. In the distributed_loader
code there's at least one of that kind.

fixes: #12175

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12170
2022-12-02 13:06:33 +02:00
Avi Kivity
f73a51250c database: abort on illegal per partition rate limit operation
Without memory corruption it's not possible for the switch to
fall through, and the compiler will error if we forget to add
a case. The compiler however is obliged to consider that we might
store some other value in the variable.
2022-11-28 21:58:30 +02:00
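The reasoning in the commit above can be illustrated with a small sketch. The enum and function below are hypothetical and only show the pattern: every enumerator is handled in the switch, and an abort after it covers the case where the variable holds some other value.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical enum for illustration; the real code switches on a
// per-partition rate limit operation type.
enum class op_type { read, write };

int cost_of(op_type op) {
    switch (op) {
    case op_type::read:
        return 1;
    case op_type::write:
        return 2;
    }
    // Reached only if `op` holds a value that is not one of the enumerators,
    // which the language permits. Aborting gives the function a defined
    // outcome on that path instead of falling off its end.
    std::abort();
}
```

Leaving out a `default:` label is deliberate in this pattern: compilers can then warn (or error, if warnings are treated as errors) when a new enumerator is added without a matching case.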
Avi Kivity
77be69b600 database: don't compare signed and unsigned types
gcc warns it can lead to undefined behavior, though 2G entries
in a list of mutations are unlikely. Use the correct type for iteration.
2022-11-28 21:58:30 +02:00
Avi Kivity
7c66fdcad1 Merge 'Simplify sstable_directory configuration' from Pavel Emelyanov
When started, the sstable_directory is constructed with a bunch of booleans that control the way its process_sstable_dir method works. It's shorter and simpler to pass these booleans into the method directly, all the more so since there's another flag that's already passed like this.

Closes #12005

* github.com:scylladb/scylladb:
  sstable_directory: Move all RAII booleans onto flags
  sstable_directory: Convert sort-sstables argument to flags struct
  sstable_directory: Drop default filter
2022-11-23 16:16:04 +02:00
Pavel Emelyanov
22133a3949 sstable_directory: Move all RAII booleans onto flags
There's a bunch of booleans that control the behavior of sstable
directory scanning. Currently they are described as verbose
bool_class<>-es and are put into sstable_directory construction time.

However, these are not used outside of .process_sstable_dir() method and
moving them onto recently added flags struct makes the code much
shorter (29 insertions(+), 121 deletions(-))

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-22 18:30:00 +03:00
Pavel Emelyanov
7ca5e143d7 sstable_directory: Convert sort-sstables argument to flags struct
The sstable_directory::process_sstable_dir() accepts a boolean to
control its behavior when collecting sstables. Turn this boolean into a
structure of flags. The intention is to extend this flags set in the
future (next patch).

This boolean is true all the time, but one place sets it to false in a
"verbose" manner, like this:

        bool sort_sstables_according_to_owner = false;
        process_sstable_dir(directory, sort_sstables_according_to_owner).get();

the local variable is not used anymore. Using designated initializers
solves the verbosity in a nicer manner.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-22 18:19:23 +03:00
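A minimal sketch of the flags-struct approach described in the commit above, using C++20 designated initializers. The struct name, the second field, and the return value are assumptions made for the example; only `sort_sstables_according_to_owner` and `process_sstable_dir` appear in the commit text.

```cpp
#include <cassert>

// Hypothetical flags struct; defaults describe the common case.
struct process_flags {
    bool sort_sstables_according_to_owner = true;
    bool some_other_flag = false;   // hypothetical, stands for future flags
};

// Stand-in for process_sstable_dir(); it returns the flag it was given so
// the effect of the initializer can be observed.
bool process_sstable_dir(process_flags flags) {
    return flags.sort_sstables_according_to_owner;
}
```

A caller that needs the non-default behavior can then write `process_sstable_dir({ .sort_sstables_according_to_owner = false })`, which names the flag at the call site without a helper variable.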
Pavel Emelyanov
7c7017d726 sstable_directory: Drop default filter
It's used as default argument for .reshape() method, but callers specify
it explicitly. At the same time the filter is simple enough and is only
used in one place so that the caller can just use explicit lambda.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-22 18:19:23 +03:00
Pavel Emelyanov
2f9b7931af sstables: Delete log file in replay_pending_delete_log()
It's natural that the replayer cleans up after itself

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-21 13:16:22 +03:00
Pavel Emelyanov
bdc47b7717 sstables: Move deletion log manipulations to sstable_directory.cc
The deletion log concept uses the fact that files are on a POSIX
filesystem. Support for another storage type will have to reimplement
this place, so keep the FS-specific code in _directory.cc file.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-21 13:16:21 +03:00
Pavel Emelyanov
a61c96a627 sstables: Use fs::path in replay_pending_delete_log()
It's called by a code that has fs::path at hand and internally uses
helpers that need fs::path too, so no need to convert it back and forth.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-21 13:15:25 +03:00
Pavel Emelyanov
bc62ca46d4 lister: Make lister::dir_entry_types an enum_set
This type is currently an unordered_set, but only consists of at most
two elements. Making it an enum_set renders it into a size_t variable
and better describes the intention.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-17 19:01:45 +03:00
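The idea of replacing a hash set of at most two enum values with a bitmask can be sketched as follows. This is not ScyllaDB's enum_set; the enum and class below are hypothetical and show only the representation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical two-element enum standing in for the directory entry types.
enum class entry_type { regular = 0, directory = 1 };

// Hypothetical minimal bitmask set: one bit per enumerator in a size_t.
class entry_type_set {
    std::size_t _mask = 0;
    static std::size_t bit(entry_type t) {
        return std::size_t(1) << static_cast<unsigned>(t);
    }
public:
    void set(entry_type t) { _mask |= bit(t); }
    bool contains(entry_type t) const { return (_mask & bit(t)) != 0; }
};
```

The whole set fits into a single integer, so copying it into lambda captures costs no allocation, unlike an unordered_set.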
Pavel Emelyanov
c6021b57a1 database: Avoid useless local variable
It's used to run lister::scan_dir() with directory_entry_type::directory
only, but for that is copied around on lambda captures. It's simpler
just to use the value directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-17 19:00:49 +03:00
Raphael S. Carvalho
b88acffd66 replica: Allow one compaction_backlog_tracker for each compaction_group
Today, compaction_backlog_tracker is managed in each compaction_strategy
implementation. So every compaction strategy is managing its own
tracker and providing a reference to it through get_backlog_tracker().

But this prevents each group from having its own tracker, because
there's only a single compaction_strategy instance per table.
To remove this limitation, compaction_strategy impl will no longer
manage trackers but will instead provide an interface for trackers
to be created, such that each compaction group will be allowed to
have its own tracker, which will be managed by compaction manager.

On compaction strategy change, table will update each group with
the new tracker, which is created using the previously introduced
compaction_group_sstable_set_updater.

Now table's backlog will be the sum of all compaction_group backlogs.
The normalization factor is applied on the sum, so we don't have
to adjust each individual backlog to any factor.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:22:51 -03:00
Raphael S. Carvalho
90991bda69 replica: Refactor table::set_compaction_strategy for multiple groups
Refactor the function so that it can accommodate multiple compaction
groups.

To still provide strong exception guarantees, preparation and
execution of changes will be separated.

Once multiple groups are supported, each group will be prepared
first, and the noexcept execution will be done as a last step.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:37 -03:00
Raphael S. Carvalho
244efddb22 Fix exception safety when transferring ongoing charges to new backlog tracker
When setting a new strategy, the charges of the old tracker are transferred
to the new one.

The problem is that we're not reverting changes if an exception is
triggered before the new strategy is successfully set.

To fix this exception safety issue, let's copy the charges instead
of moving them. If an exception is triggered, the old tracker is still
the one used and remains intact.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:37 -03:00
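The copy-instead-of-move fix in the commit above follows a common pattern for exception safety. A minimal sketch, with a hypothetical `tracker` type and a `fail` flag that simulates an exception thrown before the new strategy is set:

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical simplified backlog tracker holding ongoing charges.
struct tracker {
    std::vector<int> charges;
};

// Prepares a replacement tracker. The old charges are copied, so if the
// preparation throws, `current` still owns everything it had before.
void replace_tracker(tracker& current, bool fail) {
    tracker fresh;
    fresh.charges = current.charges;     // copy, not std::move
    if (fail) {
        throw std::runtime_error("strategy setup failed");
    }
    current = std::move(fresh);          // final step, assumed not to throw
}
```

Had the charges been moved into `fresh` up front, the failure path would have left `current` empty.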
Raphael S. Carvalho
d1e2dbc592 replica: move_sstables_from_staging: Use tracker from group owning the SSTable
When moving SSTables from staging directory, we'll conditionally add
them to backlog tracker. As each group has its own tracker, a given
sstable will be added to the tracker of the group that owns it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:37 -03:00
Raphael S. Carvalho
9031dc3199 replica: Move table::backlog_tracker_adjust_charges() to compaction_group
Procedures that call this function happen to be in compaction_group,
so let's move it to group. Simplifies the change where the procedure
retrieves tracker from the group itself.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Raphael S. Carvalho
116459b69e replica: table::discard_sstables: Use compaction_group's backlog tracker
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Raphael S. Carvalho
b2d8545b15 replica: Disable backlog tracker in compaction_group::stop()
As we're moving backlog tracker to compaction group, we need to
stop the tracker there too. We're moving it a step earlier in
table::stop(), before sstables are cleared, but that's okay
because it's still done after the group was deregistered
from compaction manager, meaning no compactions are running.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Raphael S. Carvalho
91b0d772e2 replica: database_sstable_write_monitor: use compaction_group's backlog tracker
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Raphael S. Carvalho
f37a05b559 replica: Move table::do_add_sstable() to compaction_group
All callers of do_add_sstable() live in compaction_group, so it
should be moved into compaction_group too. It also makes easier
for the function to retrieve the backlog tracker from the group.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Raphael S. Carvalho
1ec0ef18a5 compaction/table_state: Introduce get_backlog_tracker()
This interface will be helpful for allowing replica::table, unit
tests and sstables::compaction to access the compaction group's tracker
which will be managed by the compaction manager, once we complete
the decoupling work.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Botond Dénes
725e5b119d Revert "replica: Pick new generation for SSTables being moved from staging dir"
This reverts commit ba6186a47f.

Said commit violates the widely held assumption that sstables
generations can be used as sstable identity. One known problem caused
by this is a potential OOO partition emitted when reading from sstables
(#11843). We now also have a better fix for #11789 (the bug this commit
was meant to fix): 4aa0b16852. So we can
revert without regressions.

Fixes: #11843

Closes #11886
2022-11-09 16:35:31 +02:00
Raphael S. Carvalho
a57724e711 Make off-strategy compaction wait for view building completion
Prior to off-strategy compaction, streaming / repair would place
staging files into main sstable set, and wait for view building
completion before they could be selected for regular compaction.

The reason for that is that view building relies on table providing
a mutation source without data in staging files. Had regular compaction
mixed staging data with non-staging one, table would have a hard time
providing the required mutation source.

After off-strategy compaction, staging files can be compacted
in parallel to view building. If off-strategy completes first, it
will place the output into the main sstable set. So a parallel view
building (on sstables used for off-strategy) may potentially get a
mutation source containing staging data from the off-strategy output.
That will mislead view builder as it won't be able to detect
changes to data in main directory.

To fix it, we'll do what we did before. Filter out staging files
from compaction, and trigger the operation only after we're done
with view building. We're piggybacking on off-strategy timer for
still allowing the off-strategy to only run at the end of the
node operation, to reduce the amount of compaction rounds on
the data introduced by repair / streaming.

Fixes #11882.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #11919
2022-11-08 08:53:58 +02:00
Benny Halevy
eb3a94e2bc table: perform_cleanup_compaction: flush memtable
We don't explicitly cleanup the memtable, while
it might hold tokens disowned by the current node.

Flush the memtable before performing cleanup compaction
to make sure all tokens in the memtable are cleaned up.

Note that non-owned ranges are invalidated in the cache
in compaction_group::update_main_sstable_list_on_compaction_completion
using desc.ranges_for_cache_invalidation.

Fixes #1239

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-06 19:41:40 +02:00
Benny Halevy
fc278be6c4 table: add perform_cleanup_compaction
Move the integration with compaction_manager
from the api layer to the table class so
it can also make sure the memtable is cleaned up in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-06 19:41:33 +02:00
Botond Dénes
4aa0b16852 Merge 'distributed_loader: detect highest generation before populating column families' from Benny Halevy
We should scan all sstables in the table directory and its
subdirectories to determine the highest sstable version and generation
before using it for creating new sstables (via reshard or reshape).

Otherwise, the generations of new sstables created when populating staging (via reshard or reshape) may collide with generations in the base directory, leading to https://github.com/scylladb/scylladb/issues/11789

Refs scylladb/scylladb#11789
Fixes scylladb/scylladb#11793

Closes #11795

* github.com:scylladb/scylladb:
  distributed_loader: populate_column_family: reindent
  distributed_loader: coroutinize populate_column_family
  distributed_loader: table_population_metadata: start: reindent
  distributed_loader: table_population_metadata: coroutinize start_subdir
  distributed_loader: table_population_metadata: start_subdir: reindent
  distributed_loader: pre-load all sstables metadata for table before populating it
2022-10-21 14:07:51 +03:00
Avi Kivity
6b0afb968d Merge 'reader_concurrency_semaphore: add set_resources()' from Botond Dénes
Allows changing the total or initial resources the semaphore has. After calling `set_resources()` the semaphore will look as if it had been created with the specified amount of resources.

Use the new method in `replica::database::revert_initial_system_read_concurrency_boost()` so it doesn't lead to strange semaphore diagnostics output. Currently the system semaphore has 90/100 count units when there are no reads against it, which has led to some confusion.

I also plan on using the new facility in enterprise.

Closes #11772

* github.com:scylladb/scylladb:
  replica/database: revert initial boost to system semaphore with set_resources()
  reader_concurrency_semaphore: add set_resources()
2022-10-19 18:04:20 +03:00
Raphael S. Carvalho
ba6186a47f replica: Pick new generation for SSTables being moved from staging dir
When moving a SSTable from staging to base dir, we reused the generation
under the assumption that no SSTable in base dir uses that same
generation. But that's not always true.

When reshaping staging dir, reshape compaction can pick a generation
taken by a SSTable in base dir. That's because staging dir is populated
first and it doesn't have awareness of generations in base dir yet.

When that happens, view building will fail to move SSTable in staging
which shares the same generation as another in base dir.

We could have played with the order of population, populating base dir
before staging dir, but the fragility wouldn't be gone. Not
future proof at all.
We can easily make this safe by picking a new generation for the SSTable
being moved from staging, making sure no clash will ever happen.

Fixes #11789.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #11790
2022-10-19 15:33:30 +03:00
Benny Halevy
4d7f0be929 distributed_loader: populate_column_family: reindent 2022-10-19 14:18:38 +03:00
Benny Halevy
030afaa934 distributed_loader: coroutinize populate_column_family
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-19 14:18:04 +03:00
Benny Halevy
0f23ee14c9 distributed_loader: table_population_metadata: start: reindent 2022-10-19 14:16:59 +03:00
Benny Halevy
39cec4f304 distributed_loader: table_population_metadata: coroutinize start_subdir
Calling it in a seastar thread was done to reduce code churn
and facilitate backporting.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-19 14:16:59 +03:00
Benny Halevy
5749a54cab distributed_loader: table_population_metadata: start_subdir: reindent 2022-10-19 14:16:59 +03:00
Benny Halevy
119c0f3983 distributed_loader: pre-load all sstables metadata for table before populating it
We should scan all sstables in the table directory and its
subdirectories to determine the highest sstable version and generation
before using it for creating new sstables (via reshard or reshape).

Fixes scylladb/scylladb#11793

Note: table_population_metadata::start_subdir is called
in a seastar thread to facilitate backporting to old versions
that do not support coroutines yet.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-19 14:16:57 +03:00
Botond Dénes
d85208a574 replica/database: revert initial boost to system semaphore with set_resources()
Unlike the current method (which uses consume()), this will also adjust the
initial resources, adjusting the semaphore as if it was created with the
reduced amount of resources in the first place. This fixes the confusing
90/100 count resources seen in diagnostics dump outputs.
2022-10-17 07:39:20 +03:00
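The difference between the two approaches in the commit above can be shown with a toy model. The class below is hypothetical and tracks count units only; it is not the reader_concurrency_semaphore API, just an illustration of why consuming units leaves a "90/100" reading while resetting the resources does not.

```cpp
#include <cassert>

// Hypothetical model: `initial` is what diagnostics report as the total,
// `available` is what is currently free.
class semaphore_model {
    int _initial;
    int _available;
public:
    explicit semaphore_model(int count) : _initial(count), _available(count) {}

    // Takes units away from the free pool only; the total stays the same.
    void consume(int n) { _available -= n; }

    // Rebases the total, keeping the amount currently in use unchanged.
    void set_resources(int count) {
        int in_use = _initial - _available;
        _initial = count;
        _available = count - in_use;
    }

    int initial() const { return _initial; }
    int available() const { return _available; }
};
```

With `consume(10)` on a 100-unit semaphore the model reports 90 available out of 100 even with no reads running; with `set_resources(90)` it reports 90 out of 90.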
Pavel Emelyanov
3e0b61d707 compaction_manager: Relax history paths
There's a virtual method on table_state to update the entry in system
keyspace. It's overkill meant to accommodate tests that don't want this.
With the new system_keyspace weak referencing it can be made simpler by
moving the updating call to the compaction_manager itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
Pavel Emelyanov
f9b57df471 database: Plug/unplug system_keyspace
There's a circular dependency between system_keyspace and database. The
former needs the latter because it needs to execute local requests via
query_processor. The latter needs the former via compaction manager and
large data handler: database depends on both, and these two need to
insert their entries into system keyspace.

To cut this loop the compaction manager and large data handler both get
a weak reference on the system keyspace. Once system keyspace starts, it
activates this reference via the database call. When system keyspace is
shut down on stop, it deactivates the reference.

Technically the weak reference is implemented by marking the system_k.s.
object as async_sharded_service, and the "reference" in question is the
shared_from_this() pointer. When compaction manager or large data
handler need to update a system keyspace's table, they both hold an
extra reference on the system keyspace until the entry is committed,
thus making sure that system_k.s. doesn't stop from under their feet. At
the same time, unplugging the reference on shutdown makes sure that no
new entry updates will appear and the system_k.s. will eventually be
released.

It's not a C++ classical reference, because system_keyspace starts after
and stops before database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
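The plug/unplug scheme in the commit above can be approximated in standard C++. The commit describes the real mechanism as async_sharded_service with shared_from_this(); the sketch below swaps in `std::shared_ptr` as a stand-in, and all type and method names in it are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the system keyspace service.
struct system_keyspace_model {
    int entries = 0;
};

// Hypothetical stand-in for a user such as the compaction manager.
class compaction_manager_model {
    std::shared_ptr<system_keyspace_model> _sys_ks;   // empty while unplugged
public:
    void plug(std::shared_ptr<system_keyspace_model> ks) { _sys_ks = std::move(ks); }
    void unplug() { _sys_ks.reset(); }

    // Returns true if an entry was written. The local copy keeps the
    // keyspace alive until the update finishes, even if unplug() runs
    // in between.
    bool update_history() {
        auto ks = _sys_ks;
        if (!ks) {
            return false;   // keyspace not started yet or already stopped
        }
        ++ks->entries;
        return true;
    }
};
```

After `unplug()`, no new updates can start, and once in-flight ones drop their copies the keyspace object can be released.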