scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 01:20:39 +00:00

Author	SHA1	Message	Date
Petr Gusev	1ddc76ffd1	test_fencing: add test_fence_hints The test makes a write through the first node with the third node down, this causes a hint to be stored on the first node for the second. We increment the version and fence_version on the third node, restart it, and expect to see a hint delivery failure because of versions mismatch. Then we update the versions of the first node and expect hint to be successfully delivered.	2023-08-22 15:48:40 +04:00
Petr Gusev	c434d26b36	test.py: add skip_mode decorator and fixture Syntactic sugar for marking tests to be skipped in a particular mode. There is skip_in_debug/skip_in_release in suite.yaml, but they can be applied only on the entire file, which is unnatural and inconvenient. Also, they don't allow to specify a reason why the test is skipped. Separate dictionary skipped_funcs is needed since we can't use pytest fixtures in decorators.	2023-08-22 15:48:40 +04:00
Petr Gusev	a639d161e6	test.py: add mode fixture Sometimes a test wants to know what mode it is running in so that e.g. it can skip itself in some of them.	2023-08-22 15:48:40 +04:00
Petr Gusev	0b7a90dff6	pylib: add ScyllaMetrics This patch adds facilities to work with Scylla metrics from test.py tests. The new metrics property was added to ManagerClient; its query method sends a request to the Scylla metrics endpoint and returns an object for conveniently accessing the result. ScyllaMetrics is copy-pasted from test_shedding.py. It's difficult to reuse code between 'new' and 'old' styles of tests: we can't just import pylib in 'old' tests because of some problems with python search directories. A past commit of mine that attempted to solve this problem was rejected on review.	2023-08-22 14:31:04 +04:00
Petr Gusev	360453fd87	fencing: add simple data plane test The test starts a three node cluster and manually decrements the version on the last node. It then tries to write some data through the last node and expects to get 'stale topology' exception.	2023-08-22 14:31:01 +04:00
Petr Gusev	5361de76f9	random_tables.py: add counter column type We'll need it for fencing test.	2023-08-11 17:37:09 +04:00
Kamil Braun	8f658fb139	Merge 's3/client: check for available port before starting minio server' from Kefu Chai there is chance that the default port of 9000 has been used on the host running the test, in that case, we should try to use another available port. so, in this change, we try ports in the ranges of [9000, 9000+1000), and use the first one which is not connectable. Fixes #14985 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14997 * github.com:scylladb/scylladb: test: stop using HostRegistry in MinioServer s3/client: check for available port before starting minio server	2023-08-10 14:01:13 +02:00
Alejo Sanchez	e2122163f5	test/pylib: protect double call to cluster stop test.py schedules calls to cluster .uninstall() and .stop() making double calls to it running at the same time. Mark the cluster as not running early on. While there, do the same for .stop_gracefully() for consistency. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #14987	2023-08-10 13:37:49 +02:00
Kefu Chai	0c0a59bf62	test: stop using HostRegistry in MinioServer since MinioServer find a free port by itself, there is no need to provide it an IP address for it anymore -- we can always use 127.0.0.1. so, in this change, we just drop the HostRegistry parameter passed to the constructor of MinioServer, and pass the host address in place of it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-08-09 23:40:22 +08:00
Kamil Braun	59c410fb97	Merge 'migration_manager: announce: provide descriptions for all calls' from Patryk Jędrzejczak The `system.group0_history` table provides useful descriptions for each command committed to Raft group 0. One way of applying a command to group 0 is by calling `migration_manager::announce`. This function has the `description` parameter set to empty string by default. Some calls to `announce` use this default value which causes `null` values in `system.group0_history`. We want `system.group0_history` to have an actual description for every command, so we change all default descriptions to reasonable ones. Going further, We remove the default value for the `description` parameter of `migration_manager::announce` to avoid using it in the future. Thanks to this, all commands in `system.group0_history` will have a non-null description. Fixes #13370 Closes #14979 * github.com:scylladb/scylladb: migration_manager: announce: remove the default value of description test: always pass empty description to migration_manager::announce migration_manager: announce: provide descriptions for all calls	2023-08-09 16:58:41 +02:00
Kefu Chai	29554b0fc6	s3/client: check for available port before starting minio server there is chance that the default port of 9000 has been used on the host running the test, in that case, we should try to use another available port. so, in this change, we try ports in the ranges of [9000, 9000+1000), and use the first one which is not connectable. Fixes #14985 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-08-09 17:33:42 +08:00
Botond Dénes	108e510a23	Merge 'Update sstable_requiring_cleanup on compaction completion' from Benny Halevy Currently `sstable_requiring_cleanup` is updated using `compacting_sstable_registration`, but that mechanism is not used by offstrategy compaction, leading to #14304. This series introduces `compaction_manager::on_compaction_completion` that intercepts the call to the table::on_compaction_completion. This allows us to update `sstable_requiring_cleanup` right before the compacted sstables are deleted, making sure they are not leaked to `sstable_requiring_cleanup`, which would hold a reference to them until cleanup attempts to clean them up. `cleanup_incremental_compaction_test` was adjusted to observe the sstables `on_delete` (by adding a new observer event) to detect the case where cleanup attempts to delete the leaked sstables and fails since they were already deleted from the file system by offstrategy compaction. The test fails without the fix and passes with it. Fixes #14304 Closes #14858 * github.com:scylladb/scylladb: compaction_manager: on_compaction_completion: erase sstables from sstables_requiring_cleanup compaction/leveled_compaction_strategy: ideal_level_for_input: special case max_sstable_size==0 sstable: add on_delete observer compaction_manager: add on_compaction_completion sstable_compaction_test: cleanup_incremental_compaction_test: verify sstables_requiring_cleanup is empty	2023-08-09 11:03:45 +03:00
Pavel Emelyanov	f1515c610e	code: Remove query-context.hh The whole thing is unused now, so the header is no longer needed Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-08 11:11:07 +03:00
Pavel Emelyanov	413d81ac16	code: Remove qctx Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-08 11:10:56 +03:00
Benny Halevy	7a7c8d0d23	compaction_manager: on_compaction_completion: erase sstables from sstables_requiring_cleanup Erase retired sstable from compaction_state::sstables_requiring_cleanup also on_compaction_completion (in addition to compacting_sstable_registration::release_compacting) for offstrategy compaction with piggybacked cleanup or any other compaction type that doesn't use compacting_sstable_registration. Add cleanup_during_offstrategy_incremental_compaction_test that is modeled after cleanup_incremental_compaction_test to check that cleanup doesn't attempt to clean up already-deleted sstables that were left over by offstrategy compaction in sstables_requiring_cleanup. Fixes #14304 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-08 08:16:46 +03:00
Benny Halevy	ea64ae54f8	sstable_compaction_test: cleanup_incremental_compaction_test: verify sstables_requiring_cleanup is empty Make sure that there are no sstables_requiring_cleanup after cleanup compaction. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-08 08:12:01 +03:00
Patryk Jędrzejczak	866c9a904d	test: always pass empty description to migration_manager::announce In the next commit, we remove the default value for the description parameter of migration_manager::announce to avoid using it in the future. However, many calls to announce in tests use the default value. We have to change it, but we don't really care about descriptions in the tests, so we pass the empty string everywhere.	2023-08-07 14:38:11 +02:00
Avi Kivity	4f7e83a4d0	cql3: select_statement: reject DISTINCT with GROUP BY on clustering keys While in SQL DISTINCT applies to the result set, in CQL it applies to the table being selected, and doesn't allow GROUP BY with clustering keys. So reject the combination like Cassandra does. While this is not an important issue to fix, it blocks un-xfailing other issues, so I'm clearing it ahead of fixing those issues. An issue is unmarked as xfail, and other xfails lose this issue as a blocker. Fixes #12479 Closes #14970	2023-08-07 15:35:59 +03:00
Botond Dénes	fa4aec90e9	Merge 'test: tasks: Fix task_manager/wait_task test ' from Aleksandra Martyniuk Rewrite test that checks whether task_manager/wait_task works properly. The old version didn't work. Delete functions used in old version. Closes #14959 * github.com:scylladb/scylladb: test: rewrite wait_task test test: move ThreadWrapper to rest_util.py	2023-08-07 09:04:29 +03:00
Avi Kivity	6c1e44e237	Merge 'Make replica::database and cql3::query_processor share wasm manager' from Pavel Emelyanov This makes it possible to remove remaining users of the global qctx. The thing is that db::schema_tables code needs to get wasm's engine, alien runner and instance cache to build wasm context for the merged function or to drop it from cache in the opposite case. To get the wasm stuff, this code uses global qctx -> query_processor -> wasm chain. However, the functions (un)merging code already has the database reference at hand, and it's natural to get wasm stuff from it, not from the q.p. which is not available. So this PR packs the wasm engine, runner and cache on sharded<wasm::manager> instance, makes the manager be referenced by both q.p. and database and removes the qctx from schema tables code Closes #14933 * github.com:scylladb/scylladb: schema_tables: Stop using qctx database: Add wasm::manager& dependency main, cql_test_env, wasm: Start wasm::manager earlier wasm: Shuffle context::context() wasm: Add manager::remove() wasm: Add manager::precompile() wasm: Move stop() out of query_processor wasm: Make wasm sharded<manager> query_processor: Wrap wasm stuff in a struct	2023-08-06 17:00:28 +03:00
Avi Kivity	412629a9a1	Merge 'Export tablet load-balancer metrics' from Tomasz Grabiec The metrics are registered on-demand when load-balancer is invoked, so that only leader exports the metrics. When leader changes, the old leader will stop exporting. The metrics are divided into two levels: per-dc and per-node. In prometheus, they will have appropriate labels for dc and host_id values. Closes #14962 * github.com:scylladb/scylladb: tablet_allocator: unregister metrics when leadership is lost tablets: load_balancer: Export metrics service, raft: Move balance_tablets() to tablet_allocator tablet_allocator: Start even if tablets feature is not enabled main, storage_service: Pass tablet allocator to storage_service	2023-08-06 16:58:27 +03:00
Tomasz Grabiec	f26e65d4d4	tablets: Fix crash on table drop Before the patch, tablet metadata update was processed on local schema merge before table changes. When table is dropped, this means that for a while table will exist without a corresponding tablet map. This can cause memtable flush for this table to fail, resulting in intentional abort(). That's because sstable writing attempts to access tablet map to generate sharding metadata. If auto_snapshot is enabled, this is much more likely to happen, because we flush memtables on table drop. To fix the problem, process tablet metadata after dropping tables, but before creating tables. Fixes #14943 Closes #14954	2023-08-06 16:45:43 +03:00
Tomasz Grabiec	67c7aadded	service, raft: Move balance_tablets() to tablet_allocator The implementation will access metrics registered from tablet_allocator.	2023-08-05 21:48:08 +02:00
Tomasz Grabiec	5bfc8b0445	main, storage_service: Pass tablet allocator to storage_service Tablet balancing will be done through tablet_allocator later.	2023-08-05 03:10:26 +02:00
Pavel Emelyanov	fa93ac9bfd	database: Add wasm::manager& dependency The dependency is needed by db::schema_tables to get wasm manager for its needs. This patch prepares the ground. Now the wasm::manager is shared between replica::database and cql3::query_processor Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-04 19:47:50 +03:00
Pavel Emelyanov	f4e7ffa0fc	main, cql_test_env, wasm: Start wasm::manager earlier It will be needed by replica::database and should be available that early. It doesn't depend on anything and can be moved in the starting order safely Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-04 19:47:50 +03:00
Pavel Emelyanov	243f2217dd	wasm: Make wasm sharded<manager> The wasm::manager is just cql3::wasm_context renamed. It now sits in lang/wasm* and is started as a sharded service in main (and cql test env). This move also needs some headers shuffling, but it's not severe This change is required to make it possible for the wasm::manager to be shared (by reference) between q.p. and replica::database further Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-04 19:47:50 +03:00
Aleksandra Martyniuk	629f893355	test: rewrite wait_task test Rewrite test that checks whether task_manager/wait_task works properly. The old version didn't work. Delete functions used in old version.	2023-08-04 13:34:58 +02:00
Aleksandra Martyniuk	9d2e55fd37	test: move ThreadWrapper to rest_util.py Move ThreadWrapper to rest_util.py so it can be reused in different tests.	2023-08-04 13:29:03 +02:00
Botond Dénes	4d538e1363	Merge 'Task manager tasks covering compaction group compaction' from Aleksandra Martyniuk All compaction task executors, except for regular compaction one, become task manager compaction tasks. Creating and starting of major_compaction_task_executor is modified to be consistent with other compaction task executors. Closes #14505 * github.com:scylladb/scylladb: test: extend test_compaction_task.py to cover compaction group tasks compaction: turn custom_task_executor into compaction_task_impl compaction: turn sstables_task_executor into sstables_compaction_task_impl compaction: change sstables compaction tasks type compaction: move table_upgrade_sstables_compaction_task_impl compaction: pass task_info through sstables compaction compaction: turn offstrategy_compaction_task_executor into offstrategy_compaction_task_impl compaction: turn cleanup_compaction_task_executor into cleanup_compaction_task_impl comapction: use optional task info in major compaction compaction: use perform_compaction in compaction_manager::perform_major_compaction	2023-08-04 10:11:00 +03:00
Michał Jadwiszczak	b92d47362f	schema::describe: print 'synchronous_updates' only if it was specified While describing materialized view, print `synchronous_updates` option only if the tag is present in schema's extensions map. Previously if the key wasn't present, the default (false) value was printed. Fixes: #14924 Closes #14928	2023-08-04 09:52:37 +03:00
Kefu Chai	d8d91379e7	test: remove unnecessary check in compaction_manager_basic_test we wait for the same condition couple lines before, so no need to check it again using `BOOST_CHECK_EQUAL()`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14921	2023-08-04 09:26:22 +03:00
Kefu Chai	d4ee84ee1e	s3/test: nuke tempdir but keep $tempdir/log before this change, if the object_store test fails, the tempdir will be preserved. and if our CI test pipeline is used to perform the test, the test job would scan for the artifacts, and if the test in question fails, it would take over 1 hour to scan the tempdir. to alleviate the pain, let's just keep the scylla logging file no matter the test fails or succeeds. so that jenkins can scan the artifacts faster if the test fails. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14880	2023-08-03 11:07:59 +03:00
Konstantin Osipov	df97135583	test.py: forward the optional property file when creating a server To support multi-DC tests we need to provide a property file when creating a server. Forward it from the test client to test.py. Closes #14683	2023-08-02 13:45:19 +02:00
Kamil Braun	b835acf853	Merge 'Cluster features on raft: topology coordinator + check on boot' from Piotr Dulikowski This PR implements the functionality of the raft-based cluster features needed to safely manage and enable cluster features, according to the cluster features on raft design doc. Enabling features is a two phase process, performed by the topology coordinator when it notices that there are no topology changes in progress and there are some not-yet enabled features that are declared to be supported by all nodes: 1. First, a global barrier is performed to make sure that all nodes saw and persisted the same state of the `system.topology` table as the coordinator and see the same supported features of all nodes. When booting, nodes are now forbidden to revoke support for a feature if all nodes declare support for it, a successful barrier this makes sure that no node will restart and disable the features. 2. After a successful barrier, the features are marked as enabled in the `system.topology` table. The whole procedure is a group 0 operation and fails if the topology table is modified in the meantime (e.g. some node changes its supported features set). For now, the implementation relies on gossip shadow round check to protect from nodes without all features joining the cluster. In a followup, a new joining procedure will be implemented which involves the topology coordinator and lets it verify joining node's cluster features before the new node is added to group 0 and to the cluster. A set of tests for the new implementation is introduced, containing the same tests as for the non-raft-based cluster feature implementation plus one additional test, specific to this implementation. 
Closes #14722 * github.com:scylladb/scylladb: test: topology_experimental_raft: cluster feature tests test: topology: fix a skipped test storage_service: add injection to prevent enabling features storage_service: initialize enabled features from first node topology_state_machine: add size(), is_empty() group0_state_machine: enable features when applying cmds/snapshots persistent_feature_enabler: attach to gossip only if not using raft feature_service: enable and check raft cluster features on startup storage_service: provide raft_topology_change_enabled flag from outside storage_service: enable features in topology coordinator storage_service: add barrier_after_feature_update topology_coordinator: exec_global_command: make it optional to retake the guard topology_state_machine: add calculate_not_yet_enabled_features	2023-08-02 12:32:27 +02:00
Kefu Chai	d28c06b65b	test: remove unused #include in sstable_*_test.cc for faster build times and clear inter-module dependencies, we should not #includes headers not directly used. instead, we should only #include the headers directly used by a certain compilation unit. in this change, the source files under "/compaction" directories are checked using clangd, which identifies the cases where we have an #include which is not directly used. all the #includes identified by clangd are removed, except for "test/lib/scylla_test_case.hh" as it brings some command line options used by scylla tests. see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warning Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14922	2023-08-02 11:58:03 +03:00
Benny Halevy	949ea43034	topology: unindex_node: erase dc from datacenters when empty In branch 5.2 we erase `dc` from `_datacenters` if there are no more endpoints listed in `_dc_endpoints[dc]`. This was lost unintentionally in `f3d5df5448` and this commit restores that behavior, and fixes test_remove_endpoint. Fixes scylladb/scylladb#14896 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #14897	2023-08-02 09:08:24 +03:00
Piotr Dulikowski	d40bb0bacb	test: topology_experimental_raft: cluster feature tests Although the implementation of cluster features on raft is not complete yet, it makes sense to add some tests for the existing implementation. The `test_raft_cluster_features.py` file includes the same set of tests as the file with non-raft-based cluster feature tests, plus one additional test which checks that a node will not allow disabling a feature if it sees that other nodes support it (even though the feature is not enabled yet).	2023-08-01 18:54:58 +02:00
Piotr Dulikowski	435005b6a5	test: topology: fix a skipped test The `test_partial_upgrade_can_be_finished_with_removenode` test does not work because the `cql` variable is used before it is declared. It was not noticed because the test is marked as skipped, and does not work for the non-raft cluster feature implementation. The variable declaration is moved higher and the test now works; it will be used to test the raft cluster feature implementation.	2023-08-01 18:54:58 +02:00
Piotr Dulikowski	61a44e0bc0	storage_service: provide raft_topology_change_enabled flag from outside Information about whether we are using topology changes on raft or not will be soon necessary for the persistent feature enabler, so that it can do some additional checks based on the local raft topology state.	2023-08-01 18:54:57 +02:00
Kamil Braun	8bb3732d66	Merge 'storage_service: raft_check_and_repair_cdc_streams: don't create a new generation if current one is optimal' from Patryk Jędrzejczak We add the CDC generation optimality check in `storage_service::raft_check_and_repair_cdc_streams` so that it doesn't create new generations when unnecessary. Since `generation_service::check_and_repair_cdc_streams` already has this check, we extract it to the new `is_cdc_generation_optimal` function to not duplicate the code. After this change, multiple tasks could wait for a single generation change. Calling `signal` on `topology_state_machine.event` wouldn't wake them all. Moreover, we must ensure the topology coordinator wakes when its logic expects it. Therefore, we change all `signal` calls on `topology_state_machine.event` to `broadcast`. We delay the deletion of the `new_cdc_generation` request to the moment when the topology transition reaches the `publish_cdc_generation` state. We need this change to ensure the added CDC generation optimality check in the next commit has an intended effect. If we didn't make it, it would be possible that a task makes the `new_cdc_generation` request, and then, after this request was removed but before committing the new generation, another task also makes the `new_cdc_generation` request. In such a scenario, two generations are created, but only one should be. After delaying the deletion of `new_cdc_generation` requests, the second request would have no effect. Additionally, we modify the `test_topology_ops.py` test in a way that verifies the new changes. We call `storage_service::raft_check_and_repair_cdc_streams` multiple times concurrently and verify that exactly one generation has been created.
Fixes #14055 Closes #14789 * github.com:scylladb/scylladb: storage_service: raft_check_and_repair_cdc_streams: don't create a new generation if current one is optimal storage_service: delay deletion of the new_cdc_generation request raft topology: broadcast on topology_state_machine.event instead of signal cdc: implement the is_cdc_generation_optimal function	2023-08-01 12:10:00 +02:00
Kamil Braun	84bb75ea0a	Merge 'service: migration_manager: change the prepare_ methods to functions' from Patryk Jędrzejczak The `migration_manager` service is responsible for schema convergence in the cluster - pushing schema changes to other nodes and pulling schema when a version mismatch is observed. However, there is also a part of `migration_manager` that doesn't really belong there - creating mutations for schema updates. These are the functions with `prepare_` prefix. They don't modify any state and don't exchange any messages. They only need to read the local database. We take these functions out of `migration_manager` and make them separate functions to reduce the dependency of other modules (especially `query_processor` and CQL statements) on `migration_manager`. Since all of these functions only need access to `storage_proxy` (or even only `replica::database`), doing such a refactor is not complicated. We just have to add one parameter, either `storage_proxy` or `database` and both of them are easily accessible in the places where these functions are called. This refactor makes `migration_manager` unneeded in a few functions: - `alternator::executor::create_keyspace`, - `cql3::statements::alter_type_statement::prepare_announcement_mutations`, - `cql3::statements::schema_altering_statement::prepare_schema_mutations`, - `cql3::query_processor::execute_thrift_schema_command:`, - `thrift::handler::execute_schema_command`. We remove the `migration_manager&` parameter from all these functions. Fixes #14339 Closes #14875 * github.com:scylladb/scylladb: cql3: query_processor::execute_thrift_schema_command: remove an unused parameter cql3: schema_altering_statement::prepare_schema_mutations: remove an unused parameter cql3: alter_type_statement::prepare_announcement_mutations: change parameters alternator: executor::create_keyspace: remove an unused parameter service: migration_manager: change the prepare_ methods to functions	2023-08-01 11:56:56 +02:00
Avi Kivity	dac93b2096	Merge 'Concurrent tablet migration and balancing' from Tomasz Grabiec This change makes tablet load balancing more efficient by performing migrations independently for different tablets, and making new load balancing plans concurrently with active migrations. The migration track is interrupted by pending topology change operations. The coordinator executes the load balancer on edges of tablet state machine transitions. This allows new migrations to be started as soon as tablets finish streaming. The load balancer is also continuously invoked as long as it produces a non-empty plan. This is in order to saturate the cluster with streaming. A single make_plan() call is still not saturating, due to the way algorithm is implemented. Overload of shards is limited by the fact that load balancer algorithm tracks streaming concurrency on both source and target shards of active migrations and takes concurrency limit into account when producing new migrations. Closes #14851 * github.com:scylladb/scylladb: tablets: load_balancer: Remove double logging tests: tablets: Check that load balancing is interrupted by topology change tests: tablets: Add test for load balancing with active migrations tablets: Balance tablets concurrently with active migrations storage_service, tablets: Extract generate_migration_updates() storage_service, tablets: Move get_leaving_replica() to tablets.cc locator: tablets: Move std::hash definition earlier storage_service: Advance tablets independently topology_coordinator: Fix missed notification on abort tablets: Add formatter for tablet_migration_info	2023-07-31 16:44:33 +03:00
Botond Dénes	4a02865ea1	Merge 'Prevent invalidation of iterators over database::_column_families' from Aleksandra Martyniuk Maps related to column families in database are extracted to a column_families_data class. Access to them is possible only through methods. All methods which may preempt hold rwlock in relevant mode, so that the iterators can't become invalid. Fixes: #13290 Closes #13349 * github.com:scylladb/scylladb: replica: make tables_metadata's attributes private replica: add methods to get a filtered copy of tables map replica: add methods to check if given table exists replica: add methods to get table or table id replica: api: return table_id instead of const table_id& replica: iterate safely over tables related maps replica: pass tables_metadata to phased_barrier_top_10_counts replica: add methods to safely add and remove table replica: wrap column families related maps into tables_metadata replica: futurize database::add_column_family and database::remove	2023-07-31 15:31:59 +03:00
Botond Dénes	72043a6335	Merge 'Avoid using qctx in schema_tables' column-mapping queries' from Pavel Emelyanov There are three methods in system_keyspace namespace that run queries over `system.scylla_table_schema_history` table. For that they use qctx which's not nice. Fortunately, all the callers already have the system_keyspace& local variable or argument they can pass to those methods. Since the accessed table belongs to system keyspace, the latter declares the querying methods as "friends" to let them get private `query_processor& _qp` member Closes #14876 * github.com:scylladb/scylladb: schema_tables: Extract query_processor from system_keyspace for querying schema_tables: Add system_keyspace& argument to ..._column_mapping() calls migration_manager: Add system_keyspace argument to get_schema_mapping()	2023-07-31 15:00:59 +03:00
Botond Dénes	781721218f	Merge 'storage_service: refresh_sync_nodes: restrict to normal token owners' from Benny Halevy It is possible that topology will contain nodes that are no longer normal token owners, so they don't need to be sync'ed with. Fixes scylladb/scylladb#14793 Closes #14798 * github.com:scylladb/scylladb: storage_service: refresh_sync_nodes: restrict to reachable token owners storage_service: refresh_sync_nodes: fix log message locator: topology: node::state: make fine grained	2023-07-31 14:52:19 +03:00
Benny Halevy	d903d03bf8	locator: topology: node::state: make fine grained Currently the node::state is coarse grained so one cannot distinguish between e.g. a leaving node due to decommission (where the node is used for reading) vs. due to remove node (where the node is not used for reading). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-07-31 10:33:48 +03:00
Kefu Chai	47e27dd2d2	test: wait until there is no pending tasks in compaction_manager_basic_test before this change, after triggering the compaction, compaction_manager_basic_test waits until the triggered compaction completes. but the regular compaction is run in a loop which does not stop until either the daemon is stopping, or there are no more sstables to be compacted, or the compaction is disabled. and we only get the input sstables for compaction after switching to the "pending" state and acquiring the read lock of the compaction_state. acquiring the read lock is implemented as a coroutine, so there is a chance that the coroutine is suspended, and the execution switches to the test. in this case, the test will find that even after the triggered compaction completes, there are still one or more pending compactions. hence the test fails. to address this problem, instead of just waiting for the compaction to complete, we also wait until the number of pending compaction tasks is 0. so that even if the test manages to sneak into the time window, it won't proceed and start checking the compaction manager's stats. Fixes #14865 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14889	2023-07-31 10:29:18 +03:00
Nadav Har'El	04e5082d52	alternator: limit expression length and recursion depth DynamoDB limits the length of all expressions (ConditionExpression, UpdateExpression, ProjectionExpression, FilterExpression, KeyConditionExpression) to just 4096 bytes. Until now, Alternator did not enforce this limit, and we had an xfailing test showing this. But it turns out that not enforcing this limit can be dangerous: The user can pass arbitrarily-long and arbitrarily nested expressions, such as: a<b and (a<b and (a<b and (a<b and (a<b and (a<b and (...)))))) or ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( and those can cause recursive algorithms in Alternator's parser and later when applying expressions to recurse very deeply, overflow the stack, and crash. This patch includes new tests that demonstrate how Scylla crashes during parsing before enforcing the 4096-byte length limit on expressions. The patch then enforces this length limit, and these tests stop crashing. We also verify that deeply-nested expressions shorter than the 4096-byte limit are apparently short enough for our recursion ability, and work as expected. Unfortunately, running these tests many times showed that the 4096-byte limit is not low enough to avoid all crashes so this patch needs to do more: The parsers created by ANTLR are recursive, and there is no way to limit the depth of their recursion (i.e., nothing like YACC's YYMAXDEPTH). Very deep recursion can overflow the stack and crash Scylla. After we limited the length of expression strings to 4096 bytes this was almost enough to prevent stack overflows. But unfortunately the tests revealed that even limited to 4096 bytes, the expression can sometimes recurse too deeply: Consider the expression "((((((....((((" with 4000 parentheses. To realize this is a syntax error, the parser needs to do a recursive call 4000 times.
Or worse - because of other Antlr limitations (see rants in comments in expressions.g) it's actually 12000 recursive calls, and each of these calls have a pretty large frame. In some cases, this overflows the stack. The solution used in this patch is not pretty, but works. We add to rules in alternator/expressions.g that recurse (there are two of those - "value" and "boolean_expression") an integer "depth" parameter, which we increase when the rule recurses. Moreover, we add a so-called predicate "{depth<MAX_DEPTH}?" that stops the parsing when this limit is reached. When the parsing is stopped, the user will see a special kind of parse error, saying "expression nested too deeply". With this last modification to expressions.g, the tests for deeply-nested but still-below-4096-bytes expressions (test_limits.py::test_deeply_nested_expression_*) would not fail sporadically as they did without it. While adding the "expression nested too deeply" case, I also made the general syntax-error reporting in Alternator nicer: It no longer prints the internal "expression_syntax_error" type name (an exception type will only be printed if some sort of unexpected exception happens), and it prints the character position where the syntax error (or too deep nested expression) was recognized. Fixes #14473 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #14477	2023-07-31 08:57:54 +03:00
Tomasz Grabiec	96d06b58df	tests: tablets: Check that load balancing is interrupted by topology change We add a special mode of load balancing, enabled through error injection, which causes it to continuously generate plans. This should keep the topology coordinator continuously in the tablet migration track. We enable this mode in test_tablets.py:test_bootstrap before bootstrapping nodes to see that bootstrap request interrupts tablet migration track. If this would not be the case, the test will hang.	2023-07-31 01:45:23 +02:00


5409 Commits