This is to replace full path sitting on this object eventually. For now
they have to co-exist, but state will be used to make_sstable()-s from
manager with its new API
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To add a sharded service to the cql_test_env one needs to patch it in 5 or 6 places
- add cql_test_env reference
- add cql_test_env constructor argument
- initialize the reference in initializer list
- add service variable to do_with method
- pass the variable to cql_test_env constructor
- (optionally) export it via cql_test_env public method
Steps 1 through 5 are annoying; things get much simpler if the steps look like
- add cql_test_env variable
- (optionally) export it via cql_test_env public method
This is what this PR does
refs: #2795
Closes #15028
* github.com:scylladb/scylladb:
cql_test_env: Drop local *this reference
cql_test_env: Drop local references
cql_test_env: Move most of the stuff in run_in_thread()
cql_test_env: Open-code env start/stop and remove both
cql_test_env: Keep other services as class variables
cql_test_env: Keep services as class variables
cql_test_env: Construct env early
cql_test_env: De-static fdpinger variable
cql_test_env: Define all services' variables early
cql_test_env: Keep group0_client pointer
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.
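
A rough sketch of the resulting control flow, in Python rather than C++ for
brevity. All class and function names here are hypothetical stand-ins, the
query state is modelled as a plain dict, and the bounded number of attempts is
an assumption of the sketch, not something this patch states:

```python
class ConcurrentModification(Exception):
    """Stand-in for service::group0_concurrent_modification."""


class Statement:
    """Hypothetical base class; ordinary statements need no guard."""

    def take_guard(self):
        return None

    def check_access(self):
        pass

    def validate(self):
        pass

    def execute(self, query_state):
        return "ok"


class FlakySchemaStatement(Statement):
    """Hypothetical DDL statement that loses the race a few times."""

    def __init__(self, failures):
        self.failures = failures
        self.guards_taken = 0

    def take_guard(self):
        self.guards_taken += 1
        return object()  # placeholder for a real group0 guard

    def execute(self, query_state):
        if self.failures > 0:
            self.failures -= 1
            raise ConcurrentModification()
        return "schema changed"


def process(stmt, max_attempts=10):
    """Take the guard first, then run every step under it, retrying on conflict."""
    for _ in range(max_attempts):
        # The guard is taken before check_access() and validate(), not just
        # around execute(), and travels to execute() inside the query state.
        query_state = {"guard": stmt.take_guard()}
        try:
            stmt.check_access()
            stmt.validate()
            return stmt.execute(query_state)
        except ConcurrentModification:
            continue  # a concurrent change won; retake the guard and retry
    raise ConcurrentModification()
```

The point of the sketch is only the ordering: the guard is obtained once per
attempt and covers all three calls, and the retry loop wraps the whole
sequence instead of execute() alone.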
Fixes: #13942
Message-ID: <ZNSWF/cHuvcd+g1t@scylladb.com>
The local auto& foo = env._foo references in run_in_thread() are no longer
needed; the code that uses foo can be switched to use _foo (this->_foo)
instead
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The do_with() method is static and cannot just access cql_test_env
variable's fields, so it uses local references instead. To simplify this,
most of the method's content is moved to non-static run_in_thread()
method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are more services on do_with() stack that are not referenced from
the cql_test_env. Move them to be class variables too
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now they are duplicated -- variables exist on do_with() stack and the
class references some of them. This patch makes it vice-versa -- all the
variables are on the cql_test_env and do_with() references them. The
latter will change soon
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Nowadays they are all scattered along the .do_with() function. Keeping
them in one early place makes it possible to relocate them onto the
cql_test_env later
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now a reference, but some time later it won't be possible to
initialize it at construction time, so turn it into a pointer
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test uses qualified ks.cf name to find the schema, but it's the only
test case that does it. There's no point in maintaining a dedicated
helper on the cql_test_env just for that
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This check is pointless. The subsequent call to find_column_family()
would call on_internal_error() in case schema is not found, and since
cql_test_env sets abort-on-internal-error to true, this would fail just
like that
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Surprisingly there's a dedicated helper for the check opposite to the
one fixed in the previous patch. Fix that one too
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Same as in previous patch, the cql_test_env::require_table_exists()
helper is exactly the same, but returns future and asserts on failures
for no gain
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The cql_test_env::require_keyspace_exists() performs exactly the same
check, but is future-returning function for no reason and it assert()s
on failure, that's less informative (not that it ever failed) than
BOOST_REQUIRE
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's like in previous patch, and for the same reason, but the change is
a bit more complicated because it uses resolved futures' results in a few
places, so it likely deserves a separate commit
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Those two use straight .then-s sequences, no point in keeping them that
long. Being threads makes next patches shorter and nicer
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
there is a chance that the default port of 9000 is already in use on the host
running the test. in that case, we should try to use another available
port.
so, in this change, we try ports in the range [9000, 9000+1000), and
use the first one which is not connectable.
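
a minimal sketch of such a probe, assuming a plain TCP connect attempt is an
acceptable test for "in use". the function name, the timeout and the error
raised when the range is exhausted are illustrative and not taken from the
actual test code:

```python
import socket


def first_free_port(host="127.0.0.1", start=9000, count=1000):
    """Return the first port in [start, start + count) that refuses connections."""
    for port in range(start, start + count):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            probe.settimeout(0.5)
            # connect_ex() returns 0 when something accepts the connection,
            # which means the port is already taken.
            if probe.connect_ex((host, port)) != 0:
                return port
    raise RuntimeError(f"no free port in [{start}, {start + count})")
```

note that a probe like this is inherently racy: another process can grab the
port between the check and the moment the server binds it.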
Fixes #14985
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14997
* github.com:scylladb/scylladb:
test: stop using HostRegistry in MinioServer
s3/client: check for available port before starting minio server
test.py schedules calls to cluster .uninstall() and .stop() making
double calls to it running at the same time. Mark the cluster as not
running early on.
While there, do the same for .stop_gracefully() for consistency.
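
A small sketch of why flipping the flag early helps, using a hypothetical
Cluster class (the attribute names and the counter are made up for
illustration and do not mirror the real test.py code):

```python
import asyncio


class Cluster:
    """Hypothetical cluster with an idempotent, concurrency-safe stop()."""

    def __init__(self):
        self.is_running = True
        self.stop_calls = 0

    async def stop(self):
        if not self.is_running:
            return
        # Flip the flag before the first await so that a second, concurrently
        # scheduled stop() sees it and returns immediately.
        self.is_running = False
        await asyncio.sleep(0)  # stands in for the slow server shutdown
        self.stop_calls += 1


async def stop_twice():
    cluster = Cluster()
    await asyncio.gather(cluster.stop(), cluster.stop())
    return cluster.stop_calls
```

If the flag were cleared only after the await, both calls would pass the
check and the shutdown body would run twice.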
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes #14987
since MinioServer finds a free port by itself, there is no need to
provide it with an IP address anymore -- we can always use
127.0.0.1.
so, in this change, we just drop the HostRegistry parameter passed
to the constructor of MinioServer, and pass the host address in place
of it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The `system.group0_history` table provides useful descriptions for each
command committed to Raft group 0. One way of applying a command to
group 0 is by calling `migration_manager::announce`. This function has
the `description` parameter set to empty string by default. Some calls
to `announce` use this default value which causes `null` values in
`system.group0_history`. We want `system.group0_history` to have an
actual description for every command, so we change all default
descriptions to reasonable ones.
Going further, we remove the default value for the `description`
parameter of `migration_manager::announce` to avoid using it in the
future. Thanks to this, all commands in `system.group0_history` will
have a non-null description.
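
The effect of dropping the default can be illustrated with a Python analogy.
The `announce` function below is a hypothetical stand-in with a made-up
signature, not the real `migration_manager::announce`; it only shows that a
required parameter turns a forgotten description into an immediate error
instead of a silent `null`:

```python
def announce(mutations, guard, *, description):
    """Hypothetical stand-in: the description has no default, so it is mandatory."""
    return {"mutations": list(mutations), "description": description}


def announce_without_description():
    """Return True if omitting the description is rejected."""
    try:
        announce([], None)
    except TypeError:
        return True
    return False
```

In C++ the same mistake is caught at compile time rather than at call time.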
Fixes #13370
Closes #14979
* github.com:scylladb/scylladb:
migration_manager: announce: remove the default value of description
test: always pass empty description to migration_manager::announce
migration_manager: announce: provide descriptions for all calls
there is a chance that the default port of 9000 is already in use on the
host running the test. in that case, we should try to use another
available port.
so, in this change, we try ports in the range [9000, 9000+1000),
and use the first one which is not connectable.
Fixes #14985
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Currently `sstable_requiring_cleanup` is updated using `compacting_sstable_registration`, but that mechanism is not used by offstrategy compaction, leading to #14304.
This series introduces `compaction_manager::on_compaction_completion` that intercepts the call to `table::on_compaction_completion`. This allows us to update `sstable_requiring_cleanup` right before the compacted sstables are deleted, making sure they are not leaked to `sstable_requiring_cleanup`, which would hold a reference to them until cleanup attempts to clean them up.
`cleanup_incremental_compaction_test` was adjusted to observe the sstables `on_delete` (by adding a new observer event) to detect the case where cleanup attempts to delete the leaked sstables and fails since they were already deleted from the file system by offstrategy compaction. The test fails without the fix and passes with it.
Fixes #14304
Closes #14858
* github.com:scylladb/scylladb:
compaction_manager: on_compaction_completion: erase sstables from sstables_requiring_cleanup
compaction/leveled_compaction_strategy: ideal_level_for_input: special case max_sstable_size==0
sstable: add on_delete observer
compaction_manager: add on_compaction_completion
sstable_compaction_test: cleanup_incremental_compaction_test: verify sstables_requiring_cleanup is empty
Erase retired sstable from compaction_state::sstables_requiring_cleanup
also on_compaction_completion (in addition to
compacting_sstable_registration::release_compacting)
for offstrategy compaction with piggybacked cleanup
or any other compaction type that doesn't use
compacting_sstable_registration.
Add cleanup_during_offstrategy_incremental_compaction_test
that is modeled after cleanup_incremental_compaction_test to check
that cleanup doesn't attempt to cleanup already-deleted
sstables that were left over by offstrategy compaction
in sstables_requiring_cleanup.
Fixes #14304
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In the next commit, we remove the default value for the
description parameter of migration_manager::announce to avoid
using it in the future. However, many calls to announce in tests
use the default value. We have to change it, but we don't really
care about descriptions in the tests, so we pass the empty string
everywhere.
While in SQL DISTINCT applies to the result set, in CQL it applies
to the table being selected, and doesn't allow GROUP BY with clustering
keys. So reject the combination like Cassandra does.
While this is not an important issue to fix, it blocks un-xfailing
other issues, so I'm clearing it ahead of fixing those issues.
An issue is unmarked as xfail, and other xfails lose this issue
as a blocker.
Fixes #12479
Closes #14970
Rewrite test that checks whether task_manager/wait_task works properly.
The old version didn't work. Delete functions used in old version.
Closes #14959
* github.com:scylladb/scylladb:
test: rewrite wait_task test
test: move ThreadWrapper to rest_util.py
This makes it possible to remove remaining users of the global qctx.
The thing is that db::schema_tables code needs to get wasm's engine, alien runner and instance cache to build wasm context for the merged function or to drop it from cache in the opposite case. To get the wasm stuff, this code uses global qctx -> query_processor -> wasm chain. However, the functions (un)merging code already has the database reference at hand, and it's natural to get wasm stuff from it, not from the q.p. which is not available.
So this PR packs the wasm engine, runner and cache on sharded<wasm::manager> instance, makes the manager be referenced by both q.p. and database and removes the qctx from schema tables code
Closes #14933
* github.com:scylladb/scylladb:
schema_tables: Stop using qctx
database: Add wasm::manager& dependency
main, cql_test_env, wasm: Start wasm::manager earlier
wasm: Shuffle context::context()
wasm: Add manager::remove()
wasm: Add manager::precompile()
wasm: Move stop() out of query_processor
wasm: Make wasm sharded<manager>
query_processor: Wrap wasm stuff in a struct
The metrics are registered on-demand when load-balancer is invoked, so that only leader exports the metrics. When leader changes, the old leader will stop exporting.
The metrics are divided into two levels: per-dc and per-node. In prometheus, they will have appropriate labels for dc and host_id values.
Closes #14962
* github.com:scylladb/scylladb:
tablet_allocator: unregister metrics when leadership is lost
tablets: load_balancer: Export metrics
service, raft: Move balance_tablets() to tablet_allocator
tablet_allocator: Start even if tablets feature is not enabled
main, storage_service: Pass tablet allocator to storage_service
Before the patch, tablet metadata update was processed on local schema merge
before table changes.
When a table is dropped, this means that for a while the table will exist
without a corresponding tablet map. This can cause memtable flush for
this table to fail, resulting in intentional abort(). That's because
sstable writing attempts to access tablet map to generate sharding
metadata.
If auto_snapshot is enabled, this is much more likely to happen,
because we flush memtables on table drop.
To fix the problem, process tablet metadata after dropping tables, but
before creating tables.
Fixes #14943
Closes #14954
The dependency is needed by db::schema_tables to get wasm manager for
its needs. This patch prepares the ground. Now the wasm::manager is
shared between replica::database and cql3::query_processor
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It will be needed by replica::database and should be available that
early. It doesn't depend on anything and can be moved in the starting
order safely
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The wasm::manager is just cql3::wasm_context renamed. It now sits in
lang/wasm* and is started as a sharded service in main (and cql test
env). This move also needs some headers shuffling, but it's not severe
This change is required to make it possible for the wasm::manager to be
shared (by reference) between q.p. and replica::database further
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All compaction task executors, except for regular compaction one,
become task manager compaction tasks.
Creating and starting of major_compaction_task_executor is modified
to be consistent with other compaction task executors.
Closes #14505
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py to cover compaction group tasks
compaction: turn custom_task_executor into compaction_task_impl
compaction: turn sstables_task_executor into sstables_compaction_task_impl
compaction: change sstables compaction tasks type
compaction: move table_upgrade_sstables_compaction_task_impl
compaction: pass task_info through sstables compaction
compaction: turn offstrategy_compaction_task_executor into offstrategy_compaction_task_impl
compaction: turn cleanup_compaction_task_executor into cleanup_compaction_task_impl
comapction: use optional task info in major compaction
compaction: use perform_compaction in compaction_manager::perform_major_compaction
While describing materialized view, print `synchronous_updates` option
only if the tag is present in schema's extensions map. Previously if the
key wasn't present, the default (false) value was printed.
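
The intended behaviour, sketched in Python with a hypothetical helper. The
extensions map is modelled as a plain dict and the output format is an
assumption made for illustration, not the exact DESCRIBE output:

```python
def describe_view_options(extensions):
    """Return the option lines to print for a view (hypothetical helper)."""
    lines = []
    # Print the option only when the tag is actually present; an absent tag
    # must not be rendered as an explicit "false".
    if "synchronous_updates" in extensions:
        value = "true" if extensions["synchronous_updates"] else "false"
        lines.append(f"synchronous_updates = {value}")
    return lines
```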
Fixes: #14924
Closes #14928
we wait for the same condition a couple of lines before, so there is no
need to check it again using `BOOST_CHECK_EQUAL()`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14921
before this change, if the object_store test fails, the tempdir
will be preserved. and if our CI test pipeline is used to perform
the test, the test job would scan for the artifacts, and if the
test in question fails, it would take over 1 hour to scan the tempdir.
to alleviate the pain, let's just keep the scylla logging file
whether the test fails or succeeds, so that jenkins can scan the
artifacts faster if the test fails.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14880
This PR implements the functionality of the raft-based cluster features
needed to safely manage and enable cluster features, according to the
cluster features on raft design doc.
Enabling features is a two phase process, performed by the topology
coordinator when it notices that there are no topology changes in
progress and there are some not-yet enabled features that are declared
to be supported by all nodes:
1. First, a global barrier is performed to make sure that all nodes saw
and persisted the same state of the `system.topology` table as the
coordinator and see the same supported features of all nodes. When
booting, nodes are now forbidden to revoke support for a feature if all
nodes declare support for it, so a successful barrier makes sure that
no node will restart and disable the features.
2. After a successful barrier, the features are marked as enabled in the
`system.topology` table.
The whole procedure is a group 0 operation and fails if the topology
table is modified in the meantime (e.g. some node changes its supported
features set).
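
The procedure above can be sketched roughly as follows. This is a Python
illustration under stated assumptions: the topology state is a plain dict, the
barrier is a callable returning a bool, and the function bodies are
hypothetical (only the name `calculate_not_yet_enabled_features` is borrowed
from the series, not its implementation):

```python
def calculate_not_yet_enabled_features(supported_by_node, enabled):
    """Features every node declares as supported that are not enabled yet."""
    if not supported_by_node:
        return set()
    common = set.intersection(*(set(f) for f in supported_by_node.values()))
    return common - set(enabled)


def try_enable_features(topology, global_barrier):
    """Two-phase enabling: barrier first, then mark the features as enabled."""
    pending = calculate_not_yet_enabled_features(
        topology["supported"], topology["enabled"])
    if topology["change_in_progress"] or not pending:
        return set()
    # Phase 1: every node must confirm it saw and persisted the same state.
    if not global_barrier():
        return set()
    # Phase 2: record the features as enabled.
    topology["enabled"] |= pending
    return pending
```

The sketch leaves out the group 0 aspect: in the real procedure a concurrent
change to the topology table makes the whole operation fail and start over.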
For now, the implementation relies on gossip shadow round check to
protect from nodes without all features joining the cluster. In a
followup, a new joining procedure will be implemented which involves the
topology coordinator and lets it verify joining node's cluster features
before the new node is added to group 0 and to the cluster.
A set of tests for the new implementation is introduced, containing the
same tests as for the non-raft-based cluster feature implementation plus
one additional test, specific to this implementation.
Closes #14722
* github.com:scylladb/scylladb:
test: topology_experimental_raft: cluster feature tests
test: topology: fix a skipped test
storage_service: add injection to prevent enabling features
storage_service: initialize enabled features from first node
topology_state_machine: add size(), is_empty()
group0_state_machine: enable features when applying cmds/snapshots
persistent_feature_enabler: attach to gossip only if not using raft
feature_service: enable and check raft cluster features on startup
storage_service: provide raft_topology_change_enabled flag from outside
storage_service: enable features in topology coordinator
storage_service: add barrier_after_feature_update
topology_coordinator: exec_global_command: make it optional to retake the guard
topology_state_machine: add calculate_not_yet_enabled_features