The code assumes that if there is no transition_state there should be no
nodes that currently in transition in a state other then left_token_ring
state, but rebuild operation also creates such nodes, so add the check
for it as well.
The `system.group0_history` table provides useful descriptions for each
command committed to Raft group 0. One way of applying a command to
group 0 is by calling `migration_manager::announce`. This function has
the `description` parameter set to empty string by default. Some calls
to `announce` use this default value which causes `null` values in
`system.group0_history`. We want `system.group0_history` to have an
actual description for every command, so we change all default
descriptions to reasonable ones.
Going further, We remove the default value for the `description`
parameter of `migration_manager::announce` to avoid using it in the
future. Thanks to this, all commands in `system.group0_history` will
have a non-null description.
Fixes#13370Closes#14979
* github.com:scylladb/scylladb:
migration_manager: announce: remove the default value of description
test: always pass empty description to migration_manager::announce
migration_manager: announce: provide descriptions for all calls
Currently `sstable_requiring_cleanup` is updated using `compacting_sstable_registration`, but that mechanism is not used by offstrategy compaction, leading to #14304.
This series introduces `compaction_manager::on_compaction_completion` that intercepts the call
to the table::on_compaction_completion. This allows us to update `sstable_requiring_cleanup` right before the compacted sstables are deleted, making sure they are no leaked to `sstable_requiring_cleanup`, which would hold a reference to them until cleanup attempts to clean them up.
`cleanup_incremental_compaction_test` was adjusted to observe the sstables `on_delete` (by adding a new observer event) to detect the case where cleanup attempts to delete the leaked sstables and fails since they were already deleted from the file system by offstrategy compaction. The test fails with the fix and passes with it.
Fixes#14304Closes#14858
* github.com:scylladb/scylladb:
compaction_manager: on_compaction_completion: erase sstables from sstables_requiring_cleanup
compaction/leveled_compaction_strategy: ideal_level_for_input: special case max_sstable_size==0
sstable: add on_delete observer
compaction_manager: add on_compaction_completion
sstable_compaction_test: cleanup_incremental_compaction_test: verify sstables_requiring_cleanup is empty
before this change, we use generator expression to initialize CMAKE_CXX_FLAGS_RELEASE, this has two problems:
1. the generator expression is not expanded when setting a regular variable.
2. the ending ">" is missing in one of the generator expression.
3. the parameters are not separated with ";"
so address them, let's just
* use `add_compile_options()` directly, as the corresponding `mode.${build_mode}.cmake` is only included when the "${build_mode}" is used.
* add comma in between the command line options.
* add the missing closing ">"
Closes#14996
* github.com:scylladb/scylladb:
build: cmake: pass --gc-sections to ld not ar
build: cmake: use add_compile_options() in release build
when it comes to `regular_compaction_task_executor`, we repeat the
compaction until the compaction can not proceed, so after an iteration
of compaction completes successfully, the task can still continue with
yet another round of the compaction as it sees appropriate. so let's
update the comment to reflect this fact.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14891
quite a few member variables serves as the configuration for
a given compaction, they are immutable in the life cycle of it,
so for better readability, let's mark them `const`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14981
The only place that still calls it is static force_blocking_flush method. If can be made non-static already. Also, while at it, coroutinize some system_keyspace methods and fix a FIXME regarding replica::database access in it.
Closes#14984
* github.com:scylladb/scylladb:
code: Remove query-context.hh
code: Remove qctx
system_keyspace: Use system_keyspace's container() to flush
system_keyspace: Make force_blocking_flush() non-static
system_keyspace: Coroutinize update_tokens()
system_keyspace: Coroutinize save_truncation_record()
ar is not able to tell which sections to be GC'ed, hence it does
not care about --gc-sections, but ld does. let's add this option
to CMAKE_EXE_LINKER_FLAGS.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we use generator expression to initialize
CMAKE_CXX_FLAGS_RELEASE, this has two problems:
1. the generator expression is not expanded when setting
a regular variable.
2. the ending ">" is missing in one of the generator
expression.
3. the parameters are not separated with ";"
so address them, let's just
* use `add_compile_options()` directly, as the corresponding
`mode.${build_mode}.cmake` is only included when the
"${build_mode}" is used.
* add comma in between the command line options.
* add the missing closing ">"
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
In force_blocking_flush() there's an invoke-on-all invocation of
replica::database::flush() and a FIXME to get the replica database from
somewhere else rather than via query-processor -> data_dictionary.
Since now the force_blocking_flush() is non-static the invoke-on-all can
happen via system_keyspace's container and the database can be obtained
directly from the sys.ks. local instance
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Erase retired sstable from compaction_state::sstables_requiring_cleanup
also on_compaction_completion (in addition to
compacting_sstable_registration::release_compacting
for offstrategy compaction with piggybacked cleanup
or any other compaction type that doesn't use
compacting_sstable_registration.
Add cleanup_during_offstrategy_incremental_compaction_test
that is modeled after cleanup_incremental_compaction_test to check
that cleanup doesn't attempt to cleanup already-deleted
sstables that were left over by offstrategy compaction
in sstables_requiring_cleanup.
Fixes#14304
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Prevent div-by-zero byt returning const level 1
if max_sstable_size is zero, as configured by
cleanup_incremental_compaction_test, before it's
extended to cover also offstrategy compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Pass the call to the table on_compaction_completion
so we can manage the sstables requiring cleanup state
along the way.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We remove the default value for the description parameter of
migration_manager::announce to avoid using it in the future.
Thanks to this, all commands in system.group0_history will
have a non-null description.
In the next commit, we remove the default value for the
description parameter of migration_manager::announce to avoid
using it in the future. However, many calls to announce in tests
use the default value. We have to change it, but we don't really
care about descriptions in the tests, so we pass the empty string
everywhere.
The system.group0_history table provides useful descriptions
for each command committed to Raft group 0. One way of applying
a command to group 0 is by calling migration_manager::announce.
This function has the description parameter set to empty string
by default. Some calls to announce use this default value which
causes null values in system.group0_history. We want
system.group0_history to have an actual description for every
command, so we change all default descriptions to reasonable ones.
We can't provide a reasonable description to announce in
query_processor::execute_thrift_schema_command because this
function is called in multiple situations. To solve this issue,
we add the description parameter to this function and to
handler::execute_schema_command that calls it.
While in SQL DISTINCT applies to the result set, in CQL it applies
to the table being selected, and doesn't allow GROUP BY with clustering
keys. So reject the combination like Cassandra does.
While this is not an important issue to fix, it blocks un-xfailing
other issues, so I'm clearing it ahead of fixing those issues.
An issue is unmarked as xfail, and other xfails lose this issue
as a blocker.
Fixes#12479Closes#14970
Knowing that a server gained or lost leadership in group 0 is
sometimes useful for the purpose of debugging, so we log
information about these events on the INFO level.
Gaining and losing leadership are relatively rare events, so
this change shouldn't flood the logs.
Closes#14877
For `removenode`, we make a removed node a non-voter early. There is no
downside to it because the node is already dead. Moreover, it improves
availability in some situations.
For `decommission`, if we decommission a node when the number of nodes
is even, we make it a non-voter early to improve availability. All
majorities containing this node will remain majorities when we make this
node a non-voter and remove it from the set because the required size of
a majority decreases.
We don't change `decommission` when the number of nodes is odd since
this may reduce availability.
Fixes#13959Closes#14911
* github.com:scylladb/scylladb:
raft: make a decommissioning node a non-voter early
raft: topology_coordinator: implement step_down_as_nonvoter
raft: make a removed node a non-voter early
Rewrite test that checks whether task_manager/wait_task works properly.
The old version didn't work. Delete functions used in old version.
Closes#14959
* github.com:scylladb/scylladb:
test: rewrite wait_task test
test: move ThreadWrapper to rest_util.py
When deleting multiple sstables with the same prefix
the deletion atomicity is ensured by the pending_delete_log file,
so if scylla crashes in the middle, deletions will be replyed on
restart.
Therefore, we don't have to ensure atomicity of each individual
`unlink`. We just need to sync the directory once, before
removing the pending_delete_log file.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#14967
This makes it possible to remove remaining users of the global qctx.
The thing is that db::schema_tables code needs to get wasm's engine, alien runner and instance cache to build wasm context for the merged function or to drop it from cache in the opposite case. To get the wasm stuff, this code uses global qctx -> query_processor -> wasm chain. However, the functions (un)merging code already has the database reference at hand, and its natural to get wasm stuff from it, not from the q.p. which is not available
So this PR packs the wasm engine, runner and cache on sharded<wasm::manager> instance, makes the manager be referenced by both q.p. and database and removes the qctx from schema tables code
Closes#14933
* github.com:scylladb/scylladb:
schema_tables: Stop using qctx
database: Add wasm::manager& dependency
main, cql_test_env, wasm: Start wasm::manager earlier
wasm: Shuffle context::context()
wasm: Add manager::remove()
wasm: Add manager::precompile()
wasm: Move stop() out of query_processor
wasm: Make wasm sharded<manager>
query_processor: Wrap wasm stuff in a struct
The metrics are registered on-demand when load-balancer is invoked, so that only leader exports the metrics. When leader changes, the old leader will stop exporting.
The metrics are divided into two levels: per-dc and per-node. In prometheus, they will have appropriate labels for dc and host_id values.
Closes#14962
* github.com:scylladb/scylladb:
tablet_allocator: unregister metrics when leadership is lost
tablets: load_balancer: Export metrics
service, raft: Move balance_tablets() to tablet_allocator
tablet_allocator: Start even if tablets feature is not enabled
main, storage_service: Pass tablet allocator to storage_service
Before the patch, tablet metadata update was processed on local schema merge
before table changes.
When table is dropped, this means that for a while table will exist
without a corresponding tablet map. This can cause memtable flush for
this table to fail, resulting in intentional abort(). That's because
sstable writing attempts to access tablet map to generate sharding
metadata.
If auto_snapshot is enabled, this is much more likely to happen,
because we flush memtables on table drop.
To fix the problem, process tablet metadata after dropping tables, but
before creating tables.
Fixes#14943Closes#14954
There are two places in there that need qctx to get query_processor from
to, in turn, get wasm::manager from. Fortunately, both places have the
database reference at hand and can get the wasm::manager from it
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The dependency is needed by db::schema_tables to get wasm manager for
its needs. This patch prepares the ground. Now the wasm::manager is
shared between replica::database and cql3::query_processor
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It will be needed by replica::database and should be available that
early. It doesn't depend on anything and can be moved in the starting
order safely
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add a constructor that builds context out of const manager reference.
The existing one needs to get engine and instance cache and does it via
query_processor. This change lets removing those exports and finally --
drop the wasm::manager -> cql3::query_processor friendship
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is one of the users of query_processor's export of wasm::manager's
instance cache. Remove it in advance
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When the q.p. stops it also "stops" the wasm manager. Move this call
into main. The cql test env doesn't need this change, it stops the whole
sharded service which stops instances on its own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The wasm::manager is just cql3::wasm_context renamed. It now sits in
lang/wasm* and is started as a sharded service in main (and cql test
env). This move also needs some headers shuffling, but it's not severe
This change is required to make it possible for the wasm::manager to be
shared (by reference) between q.p. and replica::database further
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are three wasm-only fields on q.p. -- engine, cache and runner.
This patch groups them on a single wasm_context structure to make it
earier to manipulate them in the next patches
The 'friend' declaration it temporary, will go away soon
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently, feature service uses `system_keyspace::load_topology_state`
to load information about features from the `system.topology` table.
This function implicitly assumes that it is called after schema
commitlog replay and will correspond to the state of the topology state
machine after some command is applied.
However, feature check happens before the commitlog replay. If some
group 0 command consists of multiple mutations that are not applied
atomically, the `load_topology_state` function may fail to construct a
`service::topology` object based on the table state. Moreover, this
function not only checks `system.topology` but also
`system.cdc_generations_v3` - in the case of the issue, the entry that
was loaded from the this table didn't contain the `num_ranges`
parameter.
In order to fix this, the feature check code now uses
`load_topology_features_state` which only loads enabled and supported
features from `system.topology`. Only this information is really
necessary for the feature check, and it doesn't have any invariants to
check.
Fixes: #14944Closes#14955
* github.com:scylladb/scylladb:
feature_service: don't load whole topology state to check features
system_keyspace: separate loading topology_features from topology
topology_state_machine: extract features-related fields to a struct
untyped_result_set: add missing_column_exception
Hold a (newly added) group0_state_machine gate
that is closed and waited on in group0_state_machine::abort()
To prevent use-after-free when destroying the group0_state_machine
while transfer_snapshot runs.
Fixes#14907
Also, use an abort_source in group0_state_machine
to abort an ongoing transfer_snapshot operation
on group0_state_machine::abort()
Closes#14952
* github.com:scylladb/scylladb:
raft: group0_state_machine: transfer_snapshot: make abortable
raft: group0_state_machine: transfer_snapshot: hold gate