Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.
mutation_reader remains in the readers/ module.
mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.
This is a step forward towards librarization or modularization of the
source base.
Closes#12788
Patch 55a8421e3d fixed an inefficiency when rebuilding
statistics with many compaction groups, but it incorrectly removed
the update for newly added SSTables. This patch restores it.
When a new SSTable is added to any of the groups, the stats are
incrementally updated (as before). On compaction completion,
statistics are still rebuilt by simply iterating through each
group, which keeps track of its own stats.
Unit tests are added to guarantee the stats are correct both after
compaction completion and memtable flush.
Fixes#12808.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#12834
We currently have two method families to generate partition keys:
* make_keys() in test/lib/simple_schema.hh
* token_generation_for_shard() in test/lib/sstable_utils.hh
Both work only for schemas with a single partition key column of `text` type and both generate keys of fixed size.
This is very restrictive and simplistic. Tests, which wanted anything more complicated than that had to rely on open-coded key generation.
Also, many tests started to rely on the simplistic nature of these keys, in particular two tests started failing because the new key generation method generated keys of varying size:
* sstable_compaction_test.sstable_run_based_compaction_test
* sstable_mutation_test.test_key_count_estimation
These two tests seems to depend on generated keys all being of the same size. This makes some sense in the case of the key count estimation test, but makes no sense at all to me in the case of the sstable run test.
Closes#12657
* github.com:scylladb/scylladb:
test/lib/sstable_utils: remove now unused token_generation_for_shard() and friends
test/lib/simple_schema: remove now unused make_keys() and friends
test: migrate to tests::generate_partition_key[s]()
test/lib/test_services: add table_for_tests::make_default_schema()
test/lib: add key_utils.hh
test/lib/random_schema.hh: value_generator: add min_size_in_bytes
Introduces task manager's compaction module. That's an initial
part of integration of compaction with task manager.
When fully integrated, task manager will allow user to track compaction
operations, check status and progress of each individual one. It will help
with creating an asynchronous version of rest api that forces any compaction.
Currently, users can see with /task_manager/list_modules api call that
compaction is one of the modules accessible through task manager.
They won't get any additional information though, since compaction
tasks are not created yet.
A shared_ptr to compaction module is kept in compaction manager.
Closes#12635
* github.com:scylladb/scylladb:
compaction: test: pass task_manager to compaction_manager in test environment
compaction: create and register task manager's module for compaction
tasks: add task_manager constructor without arguments
Makes sstable_set::all() interface robust, and introduces sstable_set::size() to avoid copies when retrieving set size.
Closes#12716
* github.com:scylladb/scylladb:
treewide: Use new sstable_set::size() wherever possible
sstables: Introduce sstable_set::size()
sstables: Fix fragility of sstable_set::all() interface
Each instance of compaction manager should have compaction module pointer
initialized. All contructors get task_manager reference with which
the module is created.
Preferred aternative to sstable_set->all()->size(), which may
involve of copy elements from a single set or multiple ones
if compound_sstable_set is used.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
We have enabled the command line options without changing a
single line of code, we only had to replace old include
with scylla_test_case.hh.
Next step is to add x-log-compaction-groups options, which will
determine the number of compaction groups to be used by all
instantiations of replica::table.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Tests using replica::table::add_sstable_and_update_cache() cannot
rely on the sstable being added to a single compaction group, if
the test was forced to run with multiple groups.
Additionally let's remove try_flush_memtable_to_sstable() which
is retricted to a single group, allowing the entire test to now
pass with multiple groups.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Use the newly introduced key generation facilities, instead of the the
old inflexible alternatives and hand-rolled code.
Most of the migrations are mechanic, but there are two tests that
were tricky to migrate:
* sstable_compaction_test.sstable_run_based_compaction_test
* sstable_mutation_test.test_key_count_estimation
These two tests seems to depend on generated keys all being of the same
size. This makes some sense in the case of the key count estimation
test, but makes no sense at all to me in the case of the sstable run
test.
change row purge condition for compacting_reader to remove all expired
rows to avoid read perfomance problems when there are many expired
tombstones in row cache
Refs #2252Closes#12565
The "compact" test case runs in sstables::test_env and additionally
wraps it with another instance provided by do_with_tmp_directory helper.
It's simpler to create the temp dir by hand and use outter env.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
... and use in some places of sstable_compaction_test. This will allow
getting rid of global test_db_config thing later
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This reverts commit ba6186a47f.
Said commit violates the widely held assumption that sstables
generations can be used as sstable identity. One known problem caused
this is potential OOO partition emitted when reading from sstables
(#11843). We now also have a better fix for #11789 (the bug this commit
was meant to fix): 4aa0b16852. So we can
revert without regressions.
Fixes: #11843Closes#11886
When moving a SSTable from staging to base dir, we reused the generation
under the assumption that no SSTable in base dir uses that same
generation. But that's not always true.
When reshaping staging dir, reshape compaction can pick a generation
taken by a SSTable in base dir. That's because staging dir is populated
first and it doesn't have awareness of generations in base dir yet.
When that happens, view building will fail to move SSTable in staging
which shares the same generation as another in base dir.
We could have played with order of population, populating base dir
first than staging dir, but the fragility wouldn't be gone. Not
future proof at all.
We can easily make this safe by picking a new generation for the SSTable
being moved from staging, making sure no clash will ever happen.
Fixes#11789.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#11790
What's contained in this series:
- Refactored compaction tests (and utilities) for integration with multiple groups
- The idea is to write a new class of tests that will stress multiple groups, whereas the existing ones will still stress a single group.
- Fixed a problem when cloning compound sstable set (cannot be triggered today so I didn't open a GH issue)
- Many changes in replica::table for allowing integration with multiple groups
Next:
- Introduce for_each_compaction_group() for iterating over groups wherever needed.
- Use for_each_compaction_group() in replica::table operations spanning all groups (API, readers, etc).
- Decouple backlog tracker from compaction strategy, to allow for backlog isolation across groups
- Introduce static option for defining number of compaction groups and implement function to map a token to its respective group.
- Testing infrastructure for multiple compaction groups (helpful when testing the dynamic behavior: i.e. merging / splitting).
Closes#11592
* github.com:scylladb/scylladb:
sstable_resharding_test: Switch to table_for_tests
replica: Move compacted_undeleted_sstables into compaction group
replica: Use correct compaction_group in try_flush_memtable_to_sstable()
replica: Make move_sstables_from_staging() robust and compaction group friendly
test: Rename column_family_for_tests to table_for_tests
sstable_compaction_test: Use column_family_for_tests::as_table_state() instead
test: Don't expose compound set in column_family_for_tests
test: Implement column_family_for_tests::table_state::is_auto_compaction_disabled_by_user()
sstable_compaction_test: Merge table_state_for_test into column_family_for_tests
sstable_compaction_test: use table_state_for_test itself in fully_expired_sstables()
sstable_compaction_test: Switch to table_state in compact_sstables()
sstable_compaction_test: Reduce boilerplate by switching to column_family_for_tests
Commit aba475fe1d accidentally fixed a race, which happens in
the following sequence of events:
1) storage service starts drain() via API for example
2) main's abort source is triggered, calling compaction_manager's do_stop()
via subscription.
2.1) do_stop() initiates the stop but doesn't wait for it.
2.2) compaction_manager's state is set to stopped, such that
compaction_manager::stop() called in defer_verbose_shutdown()
will wait for the stop and not start a new one.
3) drain() calls compaction_manager::drain() changing the state from
stopped to disabled.
4) main calls compaction_manager::stop() (as described in 2.2) and
incorrectly tries to stop the manager again, because the state was
changed in step 3.
aba475fe1d accidentally fixed this problem because drain() will no
longer take place if it detects the shutdown process was initiated
(it does so by ignoring drain request if abort source's subscription
was unlinked).
This shows us that looking at the state to determine if stop should be
performed is fragile, because once the state changes from A to B,
manager doesn't know the state was A. To make it robust, we can instead
check if the future that stores stop's promise is engaged, meaning that
the stop was already initiated and we don't have to start a new one.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#11711
That's important for multiple compaction groups. Once replica::table
supports multiple groups, there will be no table::as_table_state(),
so for testing table with a single group, we'll be relying on
column_family_for_tests::as_table_state().
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This change will make table_state_for_test the table_state of
column_family_for_tests. Today, an unit test has to keep a reference
to them both and logically couple them, but that's error prone.
This change is also important when replica::table supports multiple
compaction groups, so unit tests won't have to directly reference
the table_state of table, but rather use the one managed by
column_family_for_tests.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The switch is important once we have multiple compaction groups,
as a single table may own several groups. There will no longer be
a replica::table::as_table_state().
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Lots of boilerplate is reduced, and will also help to complete the
switch from replica::table to compaction::table_state in the unit
tests.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
get_sstable_set() as its name implies is not confined to the main
or maintenance set, nor to a specific compaction group, so let's
make it return the compound set which spans all groups, meaning
all sstables tracked by a table will be returned.
This is a regression introduced in 1e7a444. It affects the API
to return sstable list containing a partition key, as sstables
in maintenance would be missed, fooling users of the API like
tools that could trust the output.
Each compaction group is returning the main and maintenance set
in table_state's main_sstable_set() and maintenance_sstable_set(),
respectively.
Fixes#11681.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This commit is restricted to moving maintenance set into compaction_group.
Next, we'll introduce compound set into it.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Preparatory change for main sstable set to be moved into compaction
group. After that, tests can no longer direct access the main
set.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
column_family_test::add_sstable will soon be changed to run in a thread,
and it's not needed in this procedure, so let's remove its usage.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
and override it in table::table_state to get the tombstone_gc_state
from the table's compaction_manager.
It is going to be used in the next patched to pass the gc state
from the compaction_strategy down to sstables and compaction.
table_state_for_test was modified to just keep a null
tombstone_gc_state.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
All users of `column_family_test_config()`, get the semaphore parameter
for it from `sstable_test_env`. It is clear that the latter serves as
the storage space for stable objects required by the table config. This
patch just enshrines this fact by moving the config factory method to
`sstable_test_env`, so it can just get what it needs from members.
Currently they are copied for the get_sstables function
so this change reduces copies.
Also, it will allow further decoupling of compaction_manager
from replica::database, by letting the caller of
perform_cleanup and perform_sstable_upgrade get the
owned token ranges from db and pass it to the perform_*
functions in the following patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
over the rest api
Performing compaction scrub user did not know whether an operation
was aborted.
If compaction scrub is aborted, return status the user gets over
rest api is set to 1.
The series unifies memtable flush error handling into table::seal_active_memtable
following up on f6d9d6175f.
The goal here is to prevent an infinite retry loop as in #10498
by aborting on any error that is not bad_alloc.
Fixes#10498Closes#10691
* github.com:scylladb/scylla:
test: memtable_test: failed_flush_prevents_writes: notify_soft_pressure only once
test: memtable_test: failed_flush_prevents_writes: extend error injection
table: seal_active_memtable: abort if retried for too long
table: seal_active_memtable: abort on unexpected error
table: try_flush_memtable_to_sstable: propagate errors to seal_active_memtable
dirty_memory_manager: flush_when_needed: move error handling to flush_one/seal_active_memtable
dirty_memory_manager: flush_permit: add has_sstable_write_permit
dirty_memory_manager: flush_permit: release_sstable_write_permit: mark noexcept
dirty_memory_manager: flush_permit: make _sstable_write_permit optional
table: reindent seal_active_memtable
table: coroutinize seal_active_memtable
memtable_list: mark functions noexcept
commitlog: make discard_completed_segments and friends noexcept
dirty_memory_manager: flush_when_needed: target error handling at flush_one
database: delete unused seal_delayed_fn_type
dirty_memory_manager: mark functions noexcept
memtable: mark functions noexcept
memtable: memtable_encoding_stats_collector: mark functions noexcept
encoding_state: mark functions noexcept
logalloc: mark free functions noexcept
logalloc: allocating_section: mark functions noexcept
logalloc: allocating_section: guard: mark constructor noexcept
logalloc: reclaim_lock: mark functions noexcept
logalloc: tracker_reclaimer_lock: mark constructor noexcept
logalloc: mark shard_tracker noexcept
logalloc: region: mark functions const/noexcept
logalloc: basic_region_impl: mark functions noexcept
logalloc: region_impl: mark functions noexcept
utils: log_heap: mark functions noexcept
logalloc: region_impl: object_descriptor: mark functions noexcept
logalloc: region_group: mark functions noexcept
logalloc: tracker: mark functions const/noexcept
logalloc: tracker::impl: make region_occupancy and friends const
logalloc: tracker::impl: occupancy: get rid of reclaiming_lock
logalloc: tracker::impl: mark functions noexcept
logalloc: segment: mark functions const / noexcept
logalloc: segment_pool: add const variant of descriptor method
logalloc: segment_pool: move descriptor method to class definition
logalloc: segment_pool: mark functions const/noexcept
logalloc: segment_pool: delete unused free_or_restore_to_reserve method
utils: dynamic_bitset: mark functions noexcept
utils: dynamic_bitset: delete unused members
logalloc: segment_store, segment_pool: idx_from_segment: get a const segment* in const overload
logalloc: segment_store, segment_pool: return const segment* from segment_from_idx() const
logalloc: segment_store: make can_allocate_more_segments const
logalloc: segment_store: mark functions noexcept
logalloc: segment_descriptor: mark functions noexcept
logalloc: occupancy_stats: mark functions noexcept
min_max_tracker: mark functions noexcept
gc_clock, db_clock: mark functions noexcept
dirty_memory_manager: region_group: mark functions noexcept
dirty_memory_manager: region_group: make simple constructor noexcept
dirty_memory_manager: region_group_reclaimer mark functions noexcept
logalloc: lsa_buffer: mark functions noexcept
We know that sstable_run is supposed to contain disjoint files only,
but this assumption can temporarily break when switching strategies
as TWCS, for example, can incorrectly pick the same run id for
sstables in different windows during segregation. So when switching
from TWCS to ICS, it could happen a sstable_run won't contain disjoint
files. We should definitely fix TWCS and any other strategy doing
that, but sstable_run should have disjointness as actual invariant,
not be relaxed on it. Otherwise, we cannot build readers on this
assumption, so more complicated logic have to be added to merge
overlapping files.
After this patch, sstable_run will reject insertion of a file that
will cause the invariant to break, so caller will have to check
that and push that file into a different sstable run.
Closes#11116
And let seal_active_memtable decide about how to handle them
as now all flush error handling logic is implemented there.
In particular, unlike today, sstable write errors will
cause internal error rather than loop forever.
Also, check for shutdown earlier to ignore errors
like semaphore_broken that might happen when
the table is stopped.
Refs #10498
(The issue will be considered fixed when going
into maintenance mode on write errors rather than
throwing internal error and potentially retrying forever)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that submit() switched to table_state, compaction_reenabler
and friends can switch to table_state too.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
auto_compaction_disabled_by_user is a configuration that can be enabled
or disabled on a particular table. We're adding this interface to
avoid having to push the configuration for every compaction_state,
which would result in redundant information as the configuration
value is the same for all table states.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The idea is that we'll have a single on-completion interface for both
"in-strategy" and off-strategy compactions, so not to pollute table_state
with one interface for each.
replica::table::on_compaction_completion is being moved into private namespace.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
compaction_manager needs this interface when setting the sstable
creation lambda in compaction_descriptor, which is then forwarded
into the actual compaction procedure.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
in_strategy_sstables() doesn't have to be implemented in table, as it's
simply about main set with maintenance and staging files filtered out.
Also, let's make it switch to table_state as part of ongoing work.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>