Commit Graph

5409 Commits

Author SHA1 Message Date
Petr Gusev
1ddc76ffd1 test_fencing: add test_fence_hints
The test makes a write through the first node with
the third node down, this causes a hint to be stored on the
first node for the third. We increment the version
and fence_version on the third node, restart it,
and expect to see a hint delivery failure
because of a version mismatch. Then we update the versions
of the first node and expect the hint to be successfully
delivered.
2023-08-22 15:48:40 +04:00
Petr Gusev
c434d26b36 test.py: add skip_mode decorator and fixture
Syntactic sugar for marking tests to be
skipped in a particular mode.

There is skip_in_debug/skip_in_release in suite.yaml,
but they can be applied only on the entire file,
which is unnatural and inconvenient. Also, they
don't allow specifying a reason why the test is skipped.

Separate dictionary skipped_funcs is needed since
we can't use pytest fixtures in decorators.
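
A minimal sketch of the registry idea described above, assuming a plain dictionary keyed by the test function. Only `skip_mode` and `skipped_funcs` are named in the commit message; `skip_reason`, the signature of `skip_mode` and the example test are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Maps a test function to (mode, reason). A module-level dictionary is used
# because a decorator runs at import time and cannot request pytest fixtures.
skipped_funcs: Dict[Callable, Tuple[str, str]] = {}


def skip_mode(mode: str, reason: str) -> Callable[[Callable], Callable]:
    """Record that the decorated test should be skipped in the given mode."""
    def wrap(func: Callable) -> Callable:
        skipped_funcs[func] = (mode, reason)
        return func  # the test itself is left unchanged
    return wrap


def skip_reason(func: Callable, current_mode: str) -> Optional[str]:
    """Hypothetical helper: return the reason if func is skipped in current_mode."""
    entry = skipped_funcs.get(func)
    if entry is not None and entry[0] == current_mode:
        return entry[1]
    return None


@skip_mode("release", "error injections are not supported in release mode")
def test_example() -> None:
    pass
```

A fixture that knows the current mode could then look up `skip_reason(request.function, mode)` and call `pytest.skip(reason)` when a reason is returned.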
2023-08-22 15:48:40 +04:00
Petr Gusev
a639d161e6 test.py: add mode fixture
Sometimes a test wants to know what mode
it is running in so that e.g. it can skip
itself in some of them.
2023-08-22 15:48:40 +04:00
Petr Gusev
0b7a90dff6 pylib: add ScyllaMetrics
This patch adds facilities to work
with Scylla metrics from test.py tests.
The new metrics property was added to
ManagerClient, its query method
sends a request to Scylla metrics
endpoint and returns an object
to conveniently access the result.
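
As a rough illustration of what such a result object might do, here is a sketch that parses Prometheus-style text output. It is not the ScyllaMetrics code: the class name `MetricsSnapshot`, its `get` method and the metric name in the usage note are all hypothetical, and the parser ignores escaping inside label values.

```python
import re
from typing import Dict, List, Optional, Tuple

# name{label="value",...} number   (labels are optional)
_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


class MetricsSnapshot:
    """Hypothetical wrapper around one response from a metrics endpoint."""

    def __init__(self, text: str) -> None:
        self.samples: List[Tuple[str, Dict[str, str], float]] = []
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and HELP/TYPE comments
            match = _LINE.match(line)
            if match is None:
                continue
            labels = dict(_LABEL.findall(match.group("labels") or ""))
            self.samples.append(
                (match.group("name"), labels, float(match.group("value"))))

    def get(self, name: str, labels: Optional[Dict[str, str]] = None) -> Optional[float]:
        """Sum all samples with this name whose labels include `labels`."""
        wanted = labels or {}
        matched = [value for sample_name, sample_labels, value in self.samples
                   if sample_name == name
                   and all(sample_labels.get(k) == v for k, v in wanted.items())]
        return sum(matched) if matched else None
```

With per-shard samples of a metric such as `example_requests_total`, `get("example_requests_total")` sums across shards, while passing `{"shard": "1"}` selects a single shard.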

ScyllaMetrics is copy-pasted from
test_shedding.py. It's difficult
to reuse code between 'new' and 'old'
styles of tests, we can't just import
pylib in 'old' tests because of some
problems with python search directories.
A past commit of mine that attempted
to solve this problem was rejected on review.
2023-08-22 14:31:04 +04:00
Petr Gusev
360453fd87 fencing: add simple data plane test
The test starts a three node cluster
and manually decrements the version on
the last node. It then tries to write
some data through the last node and
expects to get 'stale topology' exception.
2023-08-22 14:31:01 +04:00
Petr Gusev
5361de76f9 random_tables.py: add counter column type
We'll need it for fencing test.
2023-08-11 17:37:09 +04:00
Kamil Braun
8f658fb139 Merge 's3/client: check for available port before starting minio server' from Kefu Chai
there is a chance that the default port of 9000 is already in use on the host
running the test. in that case, we should try to use another available
port.

so, in this change, we try ports in the ranges of [9000, 9000+1000), and
use the first one which is not connectable.

Fixes #14985
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14997

* github.com:scylladb/scylladb:
  test: stop using HostRegistry in MinioServer
  s3/client: check for available port before starting minio server
2023-08-10 14:01:13 +02:00
Alejo Sanchez
e2122163f5 test/pylib: protect double call to cluster stop
test.py schedules calls to cluster .uninstall() and .stop() making
double calls to it running at the same time. Mark the cluster as not
running early on.
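
The "mark as not running early" idea can be sketched as follows. The `Cluster` class, its `is_running` flag and the `shutdowns` counter are hypothetical stand-ins, not the test/pylib code; the point is only that the flag flips before the first `await`.

```python
import asyncio


class Cluster:
    """Hypothetical stand-in for a test cluster with an async stop()."""

    def __init__(self) -> None:
        self.is_running = True
        self.shutdowns = 0  # counts how many times the real work ran

    async def stop(self) -> None:
        if not self.is_running:
            return
        # Flip the flag before the first await, so that a concurrent call
        # scheduled while we are suspended returns immediately.
        self.is_running = False
        await asyncio.sleep(0)  # stands in for stopping the servers
        self.shutdowns += 1


async def stop_twice() -> int:
    cluster = Cluster()
    await asyncio.gather(cluster.stop(), cluster.stop())
    return cluster.shutdowns
```

If the flag were cleared only after the servers had stopped, both calls would pass the check and the shutdown work would run twice.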

While there, do the same for .stop_gracefully() for consistency.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #14987
2023-08-10 13:37:49 +02:00
Kefu Chai
0c0a59bf62 test: stop using HostRegistry in MinioServer
since MinioServer finds a free port by itself, there is no need to
provide it an IP address anymore -- we can always use
127.0.0.1.

so, in this change, we just drop the HostRegistry parameter passed
to the constructor of MinioServer, and pass the host address in place
of it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-08-09 23:40:22 +08:00
Kamil Braun
59c410fb97 Merge 'migration_manager: announce: provide descriptions for all calls' from Patryk Jędrzejczak
The `system.group0_history` table provides useful descriptions for each
command committed to Raft group 0. One way of applying a command to
group 0 is by calling `migration_manager::announce`. This function has
the `description` parameter set to empty string by default. Some calls
to `announce` use this default value which causes `null` values in
`system.group0_history`. We want `system.group0_history` to have an
actual description for every command, so we change all default
descriptions to reasonable ones.

Going further, we remove the default value for the `description`
parameter of `migration_manager::announce` to avoid using it in the
future. Thanks to this, all commands in `system.group0_history` will
have a non-null description.

Fixes #13370

Closes #14979

* github.com:scylladb/scylladb:
  migration_manager: announce: remove the default value of description
  test: always pass empty description to migration_manager::announce
  migration_manager: announce: provide descriptions for all calls
2023-08-09 16:58:41 +02:00
Kefu Chai
29554b0fc6 s3/client: check for available port before starting minio server
there is a chance that the default port of 9000 is already in use on the
host running the test. in that case, we should try to use another
available port.

so, in this change, we try ports in the ranges of [9000, 9000+1000),
and use the first one which is not connectable.
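
The port scan described above could look roughly like this. It is a sketch, not the MinioServer code: the function names, the timeout and the injectable `probe` parameter are assumptions made for illustration.

```python
import socket
from typing import Callable, Optional


def is_connectable(host: str, port: int, timeout: float = 0.2) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def first_free_port(host: str, start: int = 9000, count: int = 1000,
                    probe: Callable[[str, int], bool] = is_connectable) -> Optional[int]:
    """Scan [start, start + count) and return the first port that is not connectable."""
    for port in range(start, start + count):
        if not probe(host, port):
            return port
    return None  # every port in the range is taken
```

A port that is not connectable at probe time can still be taken by another process before the server binds it, so a caller may want to retry on a bind failure.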

Fixes #14985
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-08-09 17:33:42 +08:00
Botond Dénes
108e510a23 Merge 'Update sstable_requiring_cleanup on compaction completion' from Benny Halevy
Currently `sstable_requiring_cleanup` is updated using `compacting_sstable_registration`, but that mechanism is not used by offstrategy compaction, leading to #14304.

This series introduces `compaction_manager::on_compaction_completion` that intercepts the call
to the table::on_compaction_completion. This allows us to update `sstable_requiring_cleanup` right before the compacted sstables are deleted, making sure they are not leaked to `sstable_requiring_cleanup`, which would hold a reference to them until cleanup attempts to clean them up.

`cleanup_incremental_compaction_test` was adjusted to observe the sstables `on_delete` (by adding a new observer event) to detect the case where cleanup attempts to delete the leaked sstables and fails since they were already deleted from the file system by offstrategy compaction. The test fails without the fix and passes with it.

Fixes #14304

Closes #14858

* github.com:scylladb/scylladb:
  compaction_manager: on_compaction_completion: erase sstables from sstables_requiring_cleanup
  compaction/leveled_compaction_strategy: ideal_level_for_input: special case max_sstable_size==0
  sstable: add on_delete observer
  compaction_manager: add on_compaction_completion
  sstable_compaction_test: cleanup_incremental_compaction_test: verify sstables_requiring_cleanup is empty
2023-08-09 11:03:45 +03:00
Pavel Emelyanov
f1515c610e code: Remove query-context.hh
The whole thing is unused now, so the header is no longer needed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:11:07 +03:00
Pavel Emelyanov
413d81ac16 code: Remove qctx
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:10:56 +03:00
Benny Halevy
7a7c8d0d23 compaction_manager: on_compaction_completion: erase sstables from sstables_requiring_cleanup
Erase retired sstables from compaction_state::sstables_requiring_cleanup
also on_compaction_completion (in addition to
compacting_sstable_registration::release_compacting)
for offstrategy compaction with piggybacked cleanup
or any other compaction type that doesn't use
compacting_sstable_registration.

Add cleanup_during_offstrategy_incremental_compaction_test
that is modeled after cleanup_incremental_compaction_test to check
that cleanup doesn't attempt to cleanup already-deleted
sstables that were left over by offstrategy compaction
in sstables_requiring_cleanup.

Fixes #14304

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-08 08:16:46 +03:00
Benny Halevy
ea64ae54f8 sstable_compaction_test: cleanup_incremental_compaction_test: verify sstables_requiring_cleanup is empty
Make sure that there are no sstables_requiring_cleanup
after cleanup compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-08 08:12:01 +03:00
Patryk Jędrzejczak
866c9a904d test: always pass empty description to migration_manager::announce
In the next commit, we remove the default value for the
description parameter of migration_manager::announce to avoid
using it in the future. However, many calls to announce in tests
use the default value. We have to change it, but we don't really
care about descriptions in the tests, so we pass the empty string
everywhere.
2023-08-07 14:38:11 +02:00
Avi Kivity
4f7e83a4d0 cql3: select_statement: reject DISTINCT with GROUP BY on clustering keys
While in SQL DISTINCT applies to the result set, in CQL it applies
to the table being selected, and doesn't allow GROUP BY with clustering
keys. So reject the combination like Cassandra does.

While this is not an important issue to fix, it blocks un-xfailing
other issues, so I'm clearing it ahead of fixing those issues.

An issue is unmarked as xfail, and other xfails lose this issue
as a blocker.

Fixes #12479

Closes #14970
2023-08-07 15:35:59 +03:00
Botond Dénes
fa4aec90e9 Merge 'test: tasks: Fix task_manager/wait_task test ' from Aleksandra Martyniuk
Rewrite test that checks whether task_manager/wait_task works properly.
The old version didn't work. Delete functions used in old version.

Closes #14959

* github.com:scylladb/scylladb:
  test: rewrite wait_task test
  test: move ThreadWrapper to rest_util.py
2023-08-07 09:04:29 +03:00
Avi Kivity
6c1e44e237 Merge 'Make replica::database and cql3::query_processor share wasm manager' from Pavel Emelyanov
This makes it possible to remove remaining users of the global qctx.

The thing is that db::schema_tables code needs to get wasm's engine, alien runner and instance cache to build wasm context for the merged function or to drop it from cache in the opposite case. To get the wasm stuff, this code uses global qctx -> query_processor -> wasm chain. However, the functions (un)merging code already has the database reference at hand, and it's natural to get wasm stuff from it, not from the q.p. which is not available.

So this PR packs the wasm engine, runner and cache on sharded<wasm::manager> instance, makes the manager be referenced by both q.p. and database and removes the qctx from schema tables code

Closes #14933

* github.com:scylladb/scylladb:
  schema_tables: Stop using qctx
  database: Add wasm::manager& dependency
  main, cql_test_env, wasm: Start wasm::manager earlier
  wasm: Shuffle context::context()
  wasm: Add manager::remove()
  wasm: Add manager::precompile()
  wasm: Move stop() out of query_processor
  wasm: Make wasm sharded<manager>
  query_processor: Wrap wasm stuff in a struct
2023-08-06 17:00:28 +03:00
Avi Kivity
412629a9a1 Merge 'Export tablet load-balancer metrics' from Tomasz Grabiec
The metrics are registered on-demand when load-balancer is invoked, so that only leader exports the metrics. When leader changes, the old leader will stop exporting.

The metrics are divided into two levels: per-dc and per-node. In prometheus, they will have appropriate labels for dc and host_id values.

Closes #14962

* github.com:scylladb/scylladb:
  tablet_allocator: unregister metrics when leadership is lost
  tablets: load_balancer: Export metrics
  service, raft: Move balance_tablets() to tablet_allocator
  tablet_allocator: Start even if tablets feature is not enabled
  main, storage_service: Pass tablet allocator to storage_service
2023-08-06 16:58:27 +03:00
Tomasz Grabiec
f26e65d4d4 tablets: Fix crash on table drop
Before the patch, tablet metadata update was processed on local schema merge
before table changes.

When table is dropped, this means that for a while table will exist
without a corresponding tablet map. This can cause memtable flush for
this table to fail, resulting in intentional abort(). That's because
sstable writing attempts to access tablet map to generate sharding
metadata.

If auto_snapshot is enabled, this is much more likely to happen,
because we flush memtables on table drop.

To fix the problem, process tablet metadata after dropping tables, but
before creating tables.

Fixes #14943

Closes #14954
2023-08-06 16:45:43 +03:00
Tomasz Grabiec
67c7aadded service, raft: Move balance_tablets() to tablet_allocator
The implementation will access metrics registered from tablet_allocator.
2023-08-05 21:48:08 +02:00
Tomasz Grabiec
5bfc8b0445 main, storage_service: Pass tablet allocator to storage_service
Tablet balancing will be done through tablet_allocator later.
2023-08-05 03:10:26 +02:00
Pavel Emelyanov
fa93ac9bfd database: Add wasm::manager& dependency
The dependency is needed by db::schema_tables to get wasm manager for
its needs. This patch prepares the ground. Now the wasm::manager is
shared between replica::database and cql3::query_processor

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Pavel Emelyanov
f4e7ffa0fc main, cql_test_env, wasm: Start wasm::manager earlier
It will be needed by replica::database and should be available that
early. It doesn't depend on anything and can be moved in the starting
order safely

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Pavel Emelyanov
243f2217dd wasm: Make wasm sharded<manager>
The wasm::manager is just cql3::wasm_context renamed. It now sits in
lang/wasm* and is started as a sharded service in main (and cql test
env). This move also needs some headers shuffling, but it's not severe

This change is required to make it possible for the wasm::manager to be
shared (by reference) between q.p. and replica::database further

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Aleksandra Martyniuk
629f893355 test: rewrite wait_task test
Rewrite test that checks whether task_manager/wait_task works properly.
The old version didn't work. Delete functions used in old version.
2023-08-04 13:34:58 +02:00
Aleksandra Martyniuk
9d2e55fd37 test: move ThreadWrapper to rest_util.py
Move ThreadWrapper to rest_util.py so it can be reused in different tests.
2023-08-04 13:29:03 +02:00
Botond Dénes
4d538e1363 Merge 'Task manager tasks covering compaction group compaction' from Aleksandra Martyniuk
All compaction task executors, except for regular compaction one,
become task manager compaction tasks.

Creating and starting of major_compaction_task_executor is modified
to be consistent with other compaction task executors.

Closes #14505

* github.com:scylladb/scylladb:
  test: extend test_compaction_task.py to cover compaction group tasks
  compaction: turn custom_task_executor into compaction_task_impl
  compaction: turn sstables_task_executor into sstables_compaction_task_impl
  compaction: change sstables compaction tasks type
  compaction: move table_upgrade_sstables_compaction_task_impl
  compaction: pass task_info through sstables compaction
  compaction: turn offstrategy_compaction_task_executor into offstrategy_compaction_task_impl
  compaction: turn cleanup_compaction_task_executor into cleanup_compaction_task_impl
  comapction: use optional task info in major compaction
  compaction: use perform_compaction in compaction_manager::perform_major_compaction
2023-08-04 10:11:00 +03:00
Michał Jadwiszczak
b92d47362f schema::describe: print 'synchronous_updates' only if it was specified
While describing materialized view, print `synchronous_updates` option
only if the tag is present in schema's extensions map. Previously if the
key wasn't present, the default (false) value was printed.

Fixes: #14924

Closes #14928
2023-08-04 09:52:37 +03:00
Kefu Chai
d8d91379e7 test: remove unnecessary check in compaction_manager_basic_test
we wait for the same condition a couple of lines before, so no need to
check it again using `BOOST_CHECK_EQUAL()`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14921
2023-08-04 09:26:22 +03:00
Kefu Chai
d4ee84ee1e s3/test: nuke tempdir but keep $tempdir/log
before this change, if the object_store test fails, the tempdir
will be preserved. and if our CI test pipeline is used to perform
the test, the test job would scan for the artifacts, and if the
test in question fails, it would take over 1 hour to scan the tempdir.

to alleviate the pain, let's just keep the scylla logging file
whether the test fails or succeeds, so that jenkins can scan the
artifacts faster if the test fails.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14880
2023-08-03 11:07:59 +03:00
Konstantin Osipov
df97135583 test.py: forward the optional property file when creating a server
To support multi-DC tests we need to provide a property
file when creating a server.
Forward it from the test client to test.py.

Closes #14683
2023-08-02 13:45:19 +02:00
Kamil Braun
b835acf853 Merge 'Cluster features on raft: topology coordinator + check on boot' from Piotr Dulikowski
This PR implements the functionality of the raft-based cluster features
needed to safely manage and enable cluster features, according to the
cluster features on raft design doc.

Enabling features is a two phase process, performed by the topology
coordinator when it notices that there are no topology changes in
progress and there are some not-yet enabled features that are declared
to be supported by all nodes:

1. First, a global barrier is performed to make sure that all nodes saw
   and persisted the same state of the `system.topology` table as the
   coordinator and see the same supported features of all nodes. When
   booting, nodes are now forbidden to revoke support for a feature if all
   nodes declare support for it, so a successful barrier makes sure that
   no node will restart and disable the features.
2. After a successful barrier, the features are marked as enabled in the
   `system.topology` table.

The whole procedure is a group 0 operation and fails if the topology
table is modified in the meantime (e.g. some node changes its supported
features set).
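
The set of features the coordinator would try to enable can be pictured with a small sketch. This is only an illustration of "supported by all nodes but not yet enabled"; the function name, the dictionary layout and the feature names in the usage note are hypothetical and not taken from the topology state machine.

```python
from typing import Dict, Set


def not_yet_enabled_features(supported_by_node: Dict[str, Set[str]],
                             enabled: Set[str]) -> Set[str]:
    """Features that every node declares as supported but that are not enabled yet."""
    if not supported_by_node:
        return set()
    # A feature qualifies only if it appears in every node's supported set.
    common = set.intersection(*supported_by_node.values())
    return common - enabled
```

For example, if every node supports `FEATURE_A` and `FEATURE_B` but only `FEATURE_A` is enabled, the result is `{"FEATURE_B"}`; a feature missing on even one node is left out.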

For now, the implementation relies on gossip shadow round check to
protect from nodes without all features joining the cluster. In a
followup, a new joining procedure will be implemented which involves the
topology coordinator and lets it verify joining node's cluster features
before the new node is added to group 0 and to the cluster.

A set of tests for the new implementation is introduced, containing the
same tests as for the non-raft-based cluster feature implementation plus
one additional test, specific to this implementation.

Closes #14722

* github.com:scylladb/scylladb:
  test: topology_experimental_raft: cluster feature tests
  test: topology: fix a skipped test
  storage_service: add injection to prevent enabling features
  storage_service: initialize enabled features from first node
  topology_state_machine: add size(), is_empty()
  group0_state_machine: enable features when applying cmds/snapshots
  persistent_feature_enabler: attach to gossip only if not using raft
  feature_service: enable and check raft cluster features on startup
  storage_service: provide raft_topology_change_enabled flag from outside
  storage_service: enable features in topology coordinator
  storage_service: add barrier_after_feature_update
  topology_coordinator: exec_global_command: make it optional to retake the guard
  topology_state_machine: add calculate_not_yet_enabled_features
2023-08-02 12:32:27 +02:00
Kefu Chai
d28c06b65b test: remove unused #include in sstable_*_test.cc
for faster build times and clear inter-module dependencies, we
should not #include headers not directly used. instead, we should
only #include the headers directly used by a certain compilation
unit.

in this change, the source files under "/compaction" directories
are checked using clangd, which identifies the cases where we have
an #include which is not directly used. all the #includes identified
by clangd are removed, except for "test/lib/scylla_test_case.hh"
as it brings some command line options used by scylla tests.

see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warning

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14922
2023-08-02 11:58:03 +03:00
Benny Halevy
949ea43034 topology: unindex_node: erase dc from datacenters when empty
In branch 5.2 we erase `dc` from `_datacenters` if there are
no more endpoints listed in `_dc_endpoints[dc]`.

This was lost unintentionally in f3d5df5448
and this commit restores that behavior, and fixes test_remove_endpoint.

Fixes scylladb/scylladb#14896

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14897
2023-08-02 09:08:24 +03:00
Piotr Dulikowski
d40bb0bacb test: topology_experimental_raft: cluster feature tests
Although the implementation of cluster features on raft is not complete
yet, it makes sense to add some tests for the existing implementation.
The `test_raft_cluster_features.py` file includes the same set of tests
as the file with non-raft-based cluster feature tests, plus one
additional test which checks that a node will not allow disabling a
feature if it sees that other nodes support it (even though the feature
is not enabled yet).
2023-08-01 18:54:58 +02:00
Piotr Dulikowski
435005b6a5 test: topology: fix a skipped test
The `test_partial_upgrade_can_be_finished_with_removenode` test does not
work because the `cql` variable is used before it is declared. It was
not noticed because the test is marked as skipped, and does not work for
the non-raft cluster feature implementation. The variable declaration is
moved higher and the test now works; it will be used to test the raft
cluster feature implementation.
2023-08-01 18:54:58 +02:00
Piotr Dulikowski
61a44e0bc0 storage_service: provide raft_topology_change_enabled flag from outside
Information about whether we are using topology changes on raft or not
will be soon necessary for the persistent feature enabler, so that it
can do some additional checks based on the local raft topology state.
2023-08-01 18:54:57 +02:00
Kamil Braun
8bb3732d66 Merge 'storage_service: raft_check_and_repair_cdc_streams: don't create a new generation if current one is optimal' from Patryk Jędrzejczak
We add the CDC generation optimality check in
`storage_service::raft_check_and_repair_cdc_streams` so that it doesn't
create new generations when unnecessary. Since
`generation_service::check_and_repair_cdc_streams` already has this
check, we extract it to the new `is_cdc_generation_optimal` function to
not duplicate the code.

After this change, multiple tasks could wait for a single generation
change. Calling `signal` on `topology_state_machine.event` wouldn't wake
them all. Moreover, we must ensure the topology coordinator wakes when
its logic expects it. Therefore, we change all `signal` calls on
`topology_state_machine.event` to `broadcast`.
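
The difference between waking one waiter and waking all of them can be shown with a Python analogy, using `asyncio.Condition` in place of the condition variable from the commit. `notify(1)` plays the role of `signal` and `notify_all()` the role of `broadcast`; the helper `count_woken` and the sleep intervals are illustrative.

```python
import asyncio


async def count_woken(wake_all: bool, waiters: int = 3) -> int:
    """Start several waiters on one condition and count how many get woken."""
    cond = asyncio.Condition()
    woken = 0

    async def waiter() -> None:
        nonlocal woken
        async with cond:
            await cond.wait()
            woken += 1

    tasks = [asyncio.create_task(waiter()) for _ in range(waiters)]
    await asyncio.sleep(0.05)  # let every waiter block on the condition
    async with cond:
        if wake_all:
            cond.notify_all()  # analogous to broadcast
        else:
            cond.notify(1)     # analogous to signal
    await asyncio.sleep(0.05)  # let the woken waiters run
    for task in tasks:
        task.cancel()          # clean up the waiters that were never woken
    await asyncio.gather(*tasks, return_exceptions=True)
    return woken
```

With three tasks waiting for the same generation change, the single notification leaves two of them blocked, which is the situation the switch to `broadcast` avoids.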

We delay the deletion of the `new_cdc_generation` request to the moment
when the topology transition reaches the `publish_cdc_generation` state.
We need this change to ensure the added CDC generation optimality check
in the next commit has an intended effect. If we didn't make it, it
would be possible that a task makes the `new_cdc_generation` request,
and then, after this request was removed but before committing the new
generation, another task also makes the `new_cdc_generation` request. In
such a scenario, two generations are created, but only one should. After
delaying the deletion of `new_cdc_generation` requests, the second
request would have no effect.

Additionally, we modify the `test_topology_ops.py` test in a way that
verifies the new changes. We call
`storage_service::raft_check_and_repair_cdc_streams` multiple times
concurrently and verify that exactly one generation has been created.

Fixes #14055

Closes #14789

* github.com:scylladb/scylladb:
  storage_service: raft_check_and_repair_cdc_streams: don't create a new generation if current one is optimal
  storage_service: delay deletion of the new_cdc_generation request
  raft topology: broadcast on topology_state_machine.event instead of signal
  cdc: implement the is_cdc_generation_optimal function
2023-08-01 12:10:00 +02:00
Kamil Braun
84bb75ea0a Merge 'service: migration_manager: change the prepare_ methods to functions' from Patryk Jędrzejczak
The `migration_manager` service is responsible for schema convergence in
the cluster - pushing schema changes to other nodes and pulling schema
when a version mismatch is observed. However, there is also a part of
`migration_manager` that doesn't really belong there - creating
mutations for schema updates. These are the functions with `prepare_`
prefix. They don't modify any state and don't exchange any messages.
They only need to read the local database.

We take these functions out of `migration_manager` and make them
separate functions to reduce the dependency of other modules (especially
`query_processor` and CQL statements) on `migration_manager`. Since all
of these functions only need access to `storage_proxy` (or even only
`replica::database`), doing such a refactor is not complicated. We just
have to add one parameter, either `storage_proxy` or `database` and both
of them are easily accessible in the places where these functions are
called.

This refactor makes `migration_manager` unneeded in a few functions:
- `alternator::executor::create_keyspace`,
- `cql3::statements::alter_type_statement::prepare_announcement_mutations`,
- `cql3::statements::schema_altering_statement::prepare_schema_mutations`,
- `cql3::query_processor::execute_thrift_schema_command`,
- `thrift::handler::execute_schema_command`.

We remove the `migration_manager&` parameter from all these functions.

Fixes #14339

Closes #14875

* github.com:scylladb/scylladb:
  cql3: query_processor::execute_thrift_schema_command: remove an unused parameter
  cql3: schema_altering_statement::prepare_schema_mutations: remove an unused parameter
  cql3: alter_type_statement::prepare_announcement_mutations: change parameters
  alternator: executor::create_keyspace: remove an unused parameter
  service: migration_manager: change the prepare_ methods to functions
2023-08-01 11:56:56 +02:00
Avi Kivity
dac93b2096 Merge 'Concurrent tablet migration and balancing' from Tomasz Grabiec
This change makes tablet load balancing more efficient by performing
migrations independently for different tablets, and making new load
balancing plans concurrently with active migrations.

The migration track is interrupted by pending topology change operations.

The coordinator executes the load balancer on edges of tablet state
machine transitions. This allows new migrations to be started as soon
as tablets finish streaming.

The load balancer is also continuously invoked as long as it produces
a non-empty plan. This is in order to saturate the cluster with
streaming. A single make_plan() call is still not saturating, due
to the way the algorithm is implemented.

Overload of shards is limited by the fact that load balancer algorithm tracks
streaming concurrency on both source and target shards of active
migrations and takes concurrency limit into account when producing new
migrations.

Closes #14851

* github.com:scylladb/scylladb:
  tablets: load_balancer: Remove double logging
  tests: tablets: Check that load balancing is interrupted by topology change
  tests: tablets: Add test for load balancing with active migrations
  tablets: Balance tablets concurrently with active migrations
  storage_service, tablets: Extract generate_migration_updates()
  storage_service, tablets: Move get_leaving_replica() to tablets.cc
  locator: tablets: Move std::hash definition earlier
  storage_service: Advance tablets independently
  topology_coordinator: Fix missed notification on abort
  tablets: Add formatter for tablet_migration_info
2023-07-31 16:44:33 +03:00
Botond Dénes
4a02865ea1 Merge 'Prevent invalidation of iterators over database::_column_families' from Aleksandra Martyniuk
Maps related to column families in database are extracted
to a column_families_data class. Access to them is possible only
through methods. All methods which may preempt hold rwlock
in relevant mode, so that the iterators can't become invalid.

Fixes: #13290

Closes #13349

* github.com:scylladb/scylladb:
  replica: make tables_metadata's attributes private
  replica: add methods to get a filtered copy of tables map
  replica: add methods to check if given table exists
  replica: add methods to get table or table id
  replica: api: return table_id instead of const table_id&
  replica: iterate safely over tables related maps
  replica: pass tables_metadata to phased_barrier_top_10_counts
  replica: add methods to safely add and remove table
  replica: wrap column families related maps into tables_metadata
  replica: futurize database::add_column_family and database::remove
2023-07-31 15:31:59 +03:00
Botond Dénes
72043a6335 Merge 'Avoid using qctx in schema_tables' column-mapping queries' from Pavel Emelyanov
There are three methods in system_keyspace namespace that run queries over `system.scylla_table_schema_history` table. For that they use qctx, which is not nice.

Fortunately, all the callers already have the system_keyspace& local variable or argument they can pass to those methods. Since the accessed table belongs to system keyspace, the latter declares the querying methods as "friends" to let them get private `query_processor& _qp` member

Closes #14876

* github.com:scylladb/scylladb:
  schema_tables: Extract query_processor from system_keyspace for querying
  schema_tables: Add system_keyspace& argument to ..._column_mapping() calls
  migration_manager: Add system_keyspace argument to get_schema_mapping()
2023-07-31 15:00:59 +03:00
Botond Dénes
781721218f Merge 'storage_service: refresh_sync_nodes: restrict to normal token owners' from Benny Halevy
It is possible that topology will contain nodes that are no longer normal token owners, so they don't need to be sync'ed with.

Fixes scylladb/scylladb#14793

Closes #14798

* github.com:scylladb/scylladb:
  storage_service: refresh_sync_nodes: restrict to reachable token owners
  storage_service: refresh_sync_nodes: fix log message
  locator: topology: node::state: make fine grained
2023-07-31 14:52:19 +03:00
Benny Halevy
d903d03bf8 locator: topology: node::state: make fine grained
Currently the node::state is coarse grained
so one cannot distinguish between e.g. a leaving
node due to decommission (where the node is used
for reading) vs. due to remove node (where the
node is not used for reading).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-07-31 10:33:48 +03:00
Kefu Chai
47e27dd2d2 test: wait until there is no pending tasks in compaction_manager_basic_test
before this change, after triggering the compaction,
compaction_manager_basic_test waits until the triggered compaction
completes. but the regular compaction is run in a loop which
does not stop until either the daemon is stopping, or there are no
more sstables to be compacted, or the compaction is disabled.

and we only get the input sstables for compaction after switching
to the "pending" state and acquiring the read lock of the
compaction_state. acquiring the read lock is implemented as
a coroutine, so there is a chance that the coroutine is suspended
and the execution switches to the test. in this case, the test
will find that even after the triggered compaction completes,
there are still one or more pending compactions. hence the test
fails.

to address this problem, instead of just waiting for the compaction
to complete, we also wait until the number of pending compaction tasks
is 0. so that even if the test manages to sneak into the time window,
it won't proceed and start checking the compaction manager's stats.

Fixes #14865
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14889
2023-07-31 10:29:18 +03:00
Nadav Har'El
04e5082d52 alternator: limit expression length and recursion depth
DynamoDB limits the length of all expressions (ConditionExpression, UpdateExpression,
ProjectionExpression, FilterExpression, KeyConditionExpression) to just
4096 bytes. Until now, Alternator did not enforce this limit, and we had
an xfailing test showing this.

But it turns out that not enforcing this limit can be dangerous: The user
can pass arbitrarily-long and arbitrarily nested expressions, such as:

    a<b and (a<b and (a<b and (a<b and (a<b and (a<b and (...))))))

or

    (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((

and those can cause recursive algorithms in Alternator's parser and
later when applying expressions to recurse very deeply, overflow the
stack, and crash.

This patch includes new tests that demonstrate how Scylla crashes during
parsing before enforcing the 4096-byte length limit on expressions.
The patch then enforces this length limit, and these tests stop crashing.
We also verify that deeply-nested expressions shorter than the 4096-byte
limit are apparently short enough for our recursion ability, and work
as expected.

Unfortunately, running these tests many times showed that the 4096-byte
limit is not low enough to avoid all crashes, so this patch needs to do
more:

The parsers created by ANTLR are recursive, and there is no way to limit
the depth of their recursion (i.e., nothing like YACC's YYMAXDEPTH).
Very deep recursion can overflow the stack and crash Scylla. After we
limited the length of expression strings to 4096 bytes this was *almost*
enough to prevent stack overflows. But unfortunately the tests revealed
that even limited to 4096 bytes, the expression can sometimes recurse
too deeply: Consider the expression "((((((....((((" with 4000 parentheses.
To realize this is a syntax error, the parser needs to do a recursive
call 4000 times. Or worse - because of other ANTLR limitations (see rants
in comments in expressions.g) it's actually 12000 recursive calls, and
each of these calls has a pretty large frame. In some cases, this
overflows the stack.

The solution used in this patch is not pretty, but works. We add to rules
in alternator/expressions.g that recurse (there are two of those - "value"
and "boolean_expression") an integer "depth" parameter, which we increase
when the rule recurses. Moreover, we add a so-called predicate
"{depth<MAX_DEPTH}?" that stops the parsing when this limit is reached.
When the parsing is stopped, the user will see a special kind of parse
error, saying "expression nested too deeply".
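
The depth-parameter technique can be illustrated with a toy hand-written recursive descent parser. This is not the ANTLR grammar from alternator/expressions.g: the grammar (`value := '(' value ')' | 'a'`), the function names, the exception type and the `MAX_DEPTH` value are all assumptions chosen for the sketch; only the 4096-byte limit and the error text come from the commit message.

```python
from typing import Tuple

MAX_EXPRESSION_LENGTH = 4096  # length limit named in the commit message
MAX_DEPTH = 400               # illustrative value, not Alternator's


class ExpressionError(ValueError):
    pass


def parse(text: str) -> int:
    """Parse value := '(' value ')' | 'a' and return the nesting depth."""
    if len(text) > MAX_EXPRESSION_LENGTH:
        raise ExpressionError("expression too long")
    depth, pos = _value(text, 0, 0)
    if pos != len(text):
        raise ExpressionError(f"syntax error at position {pos}")
    return depth


def _value(text: str, pos: int, depth: int) -> Tuple[int, int]:
    # Plays the role of the "{depth<MAX_DEPTH}?" predicate: stop before
    # the recursion can grow large enough to threaten the stack.
    if depth >= MAX_DEPTH:
        raise ExpressionError(f"expression nested too deeply at position {pos}")
    if pos < len(text) and text[pos] == "(":
        inner, pos = _value(text, pos + 1, depth + 1)  # recurse one level deeper
        if pos >= len(text) or text[pos] != ")":
            raise ExpressionError(f"syntax error at position {pos}")
        return inner + 1, pos + 1
    if pos < len(text) and text[pos] == "a":
        return 0, pos + 1
    raise ExpressionError(f"syntax error at position {pos}")
```

Here `parse("((a))")` succeeds, while a string of 4000 opening parentheses passes the length check but is rejected by the depth check long before it could recurse 4000 times.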

With this last modification to expressions.g, the tests for deeply-nested but
still-below-4096-bytes expressions
(test_limits.py::test_deeply_nested_expression_*) no longer fail sporadically
as they did without it.

While adding the "expression nested too deeply" case, I also made the
general syntax-error reporting in Alternator nicer: It no longer prints
the internal "expression_syntax_error" type name (an exception type will
only be printed if some sort of unexpected exception happens), and it
prints the character position where the syntax error (or too deep
nested expression) was recognized.

Fixes #14473

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14477
2023-07-31 08:57:54 +03:00
Tomasz Grabiec
96d06b58df tests: tablets: Check that load balancing is interrupted by topology change
We add a special mode of load balancing, enabled through error
injection, which causes it to continuously generate plans. This
should keep the topology coordinator continuously in the tablet
migration track.

We enable this mode in test_tablets.py:test_bootstrap before
bootstrapping nodes to see that bootstrap request interrupts
tablet migration track. If this were not the case, the
test would hang.
2023-07-31 01:45:23 +02:00