scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 11:00:35 +00:00

Author	SHA1	Message	Date
Andrei Chekun	8d1d206aff	[test.py] Add uniqueness to the test name In CI test always executed with option --repeat=3 that leads to generate 3 test results with the same name. Junit plugin in CI cannot distinguish correctly the difference between these results. In case when we have two passes and one fail, the link to test result will sometimes be redirected to the incorrect one because the test name is the same. To fix this ReportPlugin added that will be responsible to modify the test case name during junit report generation adding to the test name mode and run id. Fixes: https://github.com/scylladb/scylladb/issues/17851 Fixes: https://github.com/scylladb/scylladb/issues/15973	2024-06-14 11:23:04 +02:00
Andrei Chekun	93b9b85c12	[test.py] Refactor alternator, nodetool, rest_api Make alternator, nodetool and rest_api test directories as python packages. Move scylla-gdb to scylla_gdb and make it python package.	2024-06-13 13:56:10 +02:00
Pavel Emelyanov	1b9cedb3f3	test: Reduce failure detector timeout for failed tablets migration test Most of the time this test spends waiting for a node to die. Helps 3x times Was real 9m21,950s user 1m11,439s sys 1m26,022s Now real 3m37,780s user 0m58,439s sys 1m13,698s refs: #17764 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19222	2024-06-11 09:55:06 +02:00
Raphael S. Carvalho	7b41630299	replica: Refresh mutation source when allocating tablet replicas Consider the following: 1) table A has N tablets and views 2) migration starts for a tablet of A from node 1 to 2. 3) migration is at write_both_read_old stage 4) coordinator will push writes to both nodes (pending and leaving) 5) A has view, so writes to it will also result in reads (table::push_view_replica_updates()) 6) tablet's update_effective_replication_map() is not refreshing tablet sstable set (for new tablet migrating in) 7) so read on step 5 is not being able to find sstable set for tablet migrating in Causes the following error: "tablets - SSTable set wasn't found for tablet 21 of table mview.users" which means loss of write on pending replica. The fix will refresh the table's sstable set (tablet_sstable_set) and cache's snapshot. It's not a problem to refresh the cache snapshot as long as the logical state of the data hasn't changed, which is true when allocating new tablet replicas. That's also done in the context of compactions for example. Fixes #19052. Fixes #19033. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#19099	2024-06-11 06:59:04 +03:00
Calle Wilund	51c53d8db6	main/minio_server.py: Respect any preexisting AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY vars Fixes scylladb/scylla-pkg#3845 Don't overwrite (or rather change) AWS credentials variables if already set in enclosing environment. Ensures EAR tests for AWS KMS can run properly in CI. v2: * Allow environment variables in reading obj storage config - allows CI to use real credentials in env without risking putting them into less secure files * Don't write credentials info from miniserver into config, instead use said environment vars to propagate creds. v3: * Fix python launch scripts to not clear environment, thus retaining above aws envs. Closes scylladb/scylladb#19086	2024-06-11 06:59:04 +03:00
Nadav Har'El	73dfa4143a	cql-pytest: translate Cassandra's tests for SELECT DISTINCT This is a translation of Cassandra's CQL unit test source file DistinctQueryPagingTest.java into our cql-pytest framework. The 5 tests did not reproduce any previously-unknown bug, but did provide additional reproducers for one already-known issue: Refs #10354: SELECT DISTINCT should allow filter on static columns, not just partition keys Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18971	2024-06-11 06:59:04 +03:00
Botond Dénes	7b2aad56c4	test/boost/sstable_datafile_test: remove unused semaphores The tests use the ones from test_env, the explicitly created ones are unused. Closes scylladb/scylladb#19167	2024-06-09 20:43:59 +03:00
Tomasz Grabiec	c8f71f4825	test: tablets: Fix flakiness of test_removenode_with_ignored_node due to read timeout The check query may be executed on a node which doesn't yet see that the downed server is down, as it is not shut down gracefully. The query coordinator can choose the down node as a CL=1 replica for read and time out. To fix, wait for all nodes to notice the node is down before executing the checking query. Fixes #17938 Closes scylladb/scylladb#19137	2024-06-09 19:39:57 +03:00
Avi Kivity	7b301f0cb9	Merge 'Encapsulate wasm and lua management in lang::manager service' from Pavel Emelyanov After wasm udf appeared, code in main, create_function_statement and schema_tables got some involvements into details of wasm engine management. Also, even prior to this, there was duplication in how function context is created by statement code and schema_tables code. This PR generalizes function context creation and encapsulates the management in sharded<lang::manager> service. Also it removes the wasm::startup_context thing and makes wasm start/stop be "classical" (see #2737) Closes scylladb/scylladb#19166 * github.com:scylladb/scylladb: code: Enlighten wasm headers usage lang: Unfriend wasm context from manager lang, cql3, schema_tables: Don't mess with db::config lang: Don't use db::config to create lua context lang: Don't use db::config to create wasm context lang: Drop manager::precompile() method cql3, schema_tables: Generalize function creation wasm: Replace startup_context with wasm_config lang: Add manager::start() method lang: Move manager to lang namespace lang: Move wasm::manager to its .cc/.hh files	2024-06-09 19:32:26 +03:00
Avi Kivity	b2a500a9a1	Merge 'alternator: keep TTL work in the maintenance scheduling group' from Botond Dénes Alternator has a custom TTL implementation. This is based on a loop, which scans existing rows in the table, then decides whether each row has reached its end-of-life and deletes it if it did. This work is done in the background, and therefore it uses the maintenance (streaming) scheduling group. However, it was observed that part of this work leaks into the statement scheduling group, competing with user workloads, negatively affecting its latencies. This was found to be caused by the reads and writes done on behalf of the alternator TTL, which loses its maintenance scheduling group when these have to go to a remote node. This is because the messaging service was not configured to recognize the streaming scheduling group, when statement verbs like read or writes are invoked. The messaging service currently recognizes two statement "tenants": the user tenant (statement scheduling group) and system (default scheduling group), as we used to have only user-initiated operations and system (internal) ones. With alternator TTL, there is now a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group). This series adds a streaming tenant to the messaging service configuration and it adds a test which confirms that with this change, alternator TTL is entirely contained in the maintenance scheduling group. Fixes: #18719 - [x] Scans executed on behalf of alternator TTL are running in the statement group, disturbing user-workloads, this PR has to be backported to fix this. Closes scylladb/scylladb#18729 * github.com:scylladb/scylladb: alternator, scheduler: test reproducing RPC scheduling group bug main: add maintenance tenant to messaging_service's scheduling config	2024-06-09 19:20:18 +03:00
Nadav Har'El	13cf6c543d	test/alternator: fix flaky test test_item_latency The Alternator test test_metrics.py::test_item_latency confirms that for several operation types (PutItem, GetItem, DeleteItem, UpdateItem) we did not forget to measure their latencies. The test checked that a latency was updated by checking that two metrics increases: scylla_alternator_op_latency_count scylla_alternator_op_latency_sum However, it turns out that the "sum" is only an approximate sum of all latencies, and when the total sum grows large it sometimes does not increase when a short latency is added to the statistics. When this happens, this test fails on the assertion that the "sum" increases after an operation. We saw this happening sometimes in CI runs. The simple fix is to stop checking _sum at all, and only verify that the _count increases - this is really an integer counter that unconditionally increases when a latency is added to the histogram. Don't worry that the strength of this test is reduced - this test was never meant to check the accuracy or correctness of the histograms - we should have different (and better) tests for that, unrelated to Alternator. The purpose of this test is only to verify that for some specific operation like PutItem, Alternator didn't forget to measure its latency and update the histogram. We want to avoid a bug like we had in counters in the past (#9406). Fixes #18847. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19080	2024-06-09 19:19:09 +03:00
Kefu Chai	f4706be8a8	test: test_topology_ops: adapt to tablets in `e7d4e080`, we reenabled the background writes in this test, but when running with tablets enabled, background writes are still disabled because of #17025, which was fixed last week. so we can enable background writes with tablets. in this change, * background writes are enabled with tablets. * increase the number of nodes by 1 so that we have enough nodes to fulfill the needs of tablets, which enforces that the number of replicas should always satisfy RF. * pass rf to `start_writes()` explicitly, so we have less magic numbers in the test, and make the data dependencies more obvious. Fixes #17589 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18707	2024-06-08 17:46:37 +02:00
Gleb Natapov	34cf5c81f6	group0, topology coordinator: run group0 and the topology coordinator in gossiper scheduling group Currently they both run in streaming group and it may become busy during repair/mv building and affect group0 functionality. Move it to the gossiper group where it should have more time to run. Fixes scylladb/scylladb#18863 Closes scylladb/scylladb#19138	2024-06-07 15:31:44 +02:00
Pavel Emelyanov	b854bf4b83	lang: Don't use db::config to create lua context Similarly to previous patch, lua context needs db::config for creation. It's better to get the configurables via lang::manager::config. One thing to note -- lua config carries updateable_values on board, but respective db::config options are _not_ LiveUpdate-able, so the lua config could just use simple data types. This patch keeps updateable values intact for brevity. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	783ccc0a74	lang: Don't use db::config to create wasm context The manager needs to get two "fuel" configurables from db::config in order to create context. Instead of carrying db config from callers, keep the options on existing lang::manager::config and use them. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	fe7ff7172d	wasm: Replace startup_context with wasm_config The lang::manager starts with the help of a context because it needs to have std::shared_ptr<> pointing to cross-shard shared wasm engine and runner thread. For that a context is created in advance, that then helps sharing the engine and runner across manager instances. This patch removes the "context" and replaces it with classical manager::config. With it, it's lang::manager who's now responsible for initializing itself. In order to have cross-shard engine and thread pointers, the start() method uses invoke_on_others() facility to share the pointer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Pavel Emelyanov	0dad72b736	lang: Add manager::start() method Just like any other sharded<> service, the lang::manager now starts and stops in a classical sequence of await sharded<manager>::start() defer([] { await sharded<manager>::stop() }) await sharded<manager>::invoke_on_all(&manager::start) For now the method is no-op, next patches will start using it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Pavel Emelyanov	f950469af5	lang: Move manager to lang namespace And, while at it, rename local variable to refer to it to as "manager" not "wasm". Query processor and database also have getters named "wasm()", these are not renamed yet to keep patch smaller (and those getters are going to be reworked further anyway). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Pavel Emelyanov	1dec79e97d	lang: Move wasm::manager to its .cc/.hh files It's going to become a facade in front of both -- wasm and lua, so keep it in files with language independent names. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Piotr Dulikowski	e18aeb2486	Merge 'mv: gossip the same backlog if a different backlog was sent in a response' from Wojciech Mitros Currently, there are 2 ways of sharing a backlog with other nodes: through a gossip mechanism, and with responses to replica writes. In gossip, we check each second if the backlog changed, and if it did we update other nodes with it. However if the backlog for this node changed on another node with a write response, the gossiped backlog is currently not updated, so if after the response the backlog goes back to the value from the previous gossip round, it will not get sent and the other node will stay with an outdated backlog - this can be observed in the following scenario: 1. Cluster starts, all nodes gossip their empty view update backlog to one another 2. On node N, `view_update_backlog_broker` (the backlog gossiper) performs an iteration of its backlog update loop, sees no change (backlog has been empty since the start), schedules the next iteration after 1s 3. Within the next 1s, coordinator (different than N) sends a write to N causing a remote view update (which we do not wait for). As a result, node N replies immediately with an increased view update backlog, which is then noted by the coordinator. 4. Still within the 1s, node N finishes the view update in the background, dropping its view update backlog to 0. 5. In the next and following iterations of `view_update_backlog_broker` on N, backlog is empty, as it was in step 2, so no change is seen and no update is sent due to the check ``` auto backlog = _sp.local().get_view_update_backlog(); if (backlog_published && backlog_published == backlog) { sleep_abortable(gms::gossiper::INTERVAL, _as).get(); continue; } ``` After this scenario happens, the coordinator keeps information about an increased view update backlog on N even though it's actually already empty. This patch fixes the issue by notifying the gossip that a different backlog was sent in a response, causing it to send an unchanged backlog to other nodes in the following gossip round. Fixes: https://github.com/scylladb/scylladb/issues/18461 Similarly to https://github.com/scylladb/scylladb/pull/18646, without admission control (https://github.com/scylladb/scylladb/pull/18334), this patch doesn't affect much, so I'm marking it as backport/none Tests: manual. Currently this patch only affects the length of MV flow control delay, which is not reliable to base a test on. A proper test will be added when MV admission control is added, so we'll be able to base the test on rejected requests Closes scylladb/scylladb#18663 github.com:scylladb/scylladb: mv: gossip the same backlog if a different backlog was sent in a response node_update_backlog: divide adding and fetching backlogs	2024-06-07 10:20:21 +02:00
Avi Kivity	cd553848c1	Merge 'auth-v2: use a single transaction in auth related statements ' from Marcin Maliszkiewicz Due to gradual raft introduction into statements code in cases when single statement modified more than one table or mutation producing function was composed out of simpler ones we violated transactional logic and statement execution was not atomic as whole. This patch changes that, so now either all changes resulting from statement execution are applied or none. Affected statements types are: - schema modification - auth modifications - service levels modifications Fixes https://github.com/scylladb/scylladb/issues/17738 Closes scylladb/scylladb#17910 * github.com:scylladb/scylladb: raft: rename mutations_collector to group0_batch raft: rename announce to commit cql3: raft: attach description to each mutations collector group auth: unify mutations_generator type auth: drop redundant 'this' keyword auth: remove no longer used code from standard_role_manager::legacy_modify_membership cql3: auth: use mutation collector for service levels statements cql3: auth: use mutation collector for alter role cql3: auth: use mutation collector for grant role and revoke role cql3: auth: use mutation collector for drop role and auto-revoke auth: add refactored modify_membership func in standard_role_manager auth: implement empty revoke_all in allow_all_authorizer auth: drop request_execution_exception handling from default_authorizer::revoke_all Revert "Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks" cql3: auth: use mutation collector for grant and revoke permissions cql3: extract changes_tablets function in alter_keyspace_statement cql3: auth: use mutation collector for create role statement auth: move create_role code into service auth: add a way to announce mutations having only client_state ref auth: add collect_mutations common helper auth: remove unused header in common.hh auth: add class for gathering mutations without immediate announce auth: cql3: use auth facade functions consistently on write path auth: remove unused is_enforcing function	2024-06-06 17:31:26 +03:00
Marcin Maliszkiewicz	63e6334a64	raft: rename mutations_collector to group0_batch	2024-06-06 13:26:34 +02:00
Kamil Braun	57e810c852	Merge 'Serialize repair with tablet migration' from Tomasz Grabiec We want to exclude repair with tablet migrations to avoid races between repair reads and writes with replica movement. Repair is not prepared to handle topology transitions in the middle. One reason why it's not safe is that repair may successfully write to a leaving replica post streaming phase and consider all replicas to be repaired, but in fact they are not, the new replica would not be repaired. Other kinds of races could result in repair failures. If repair writes to a leaving replica which was already cleaned up, such writes will fail, causing repair to fail. Excluding works by keeping effective_replication_map_ptr in a version which doesn't have table's tablets in transitions. That prevents later transitions from starting because topology coordinator's barrier will wait for that erm before moving to a stage later than allow_write_both_read_old, so before any requests start using the new topology. Also, if transitions are already running, repair waits for them to finish. A blocked tablet migration (e.g. due to down node) will block repair, whereas before it would fail. Once admin resolves the cause of blocked migration, repair will continue. Fixes #17658. Fixes #18561. Closes scylladb/scylladb#18641 * github.com:scylladb/scylladb: test: pylib: Do not block async reactor while removing directories repair: Exclude tablet migrations with tablet repair repair_service: Propagate topology_state_machine to repair_service main, storage_service: Move topology_state_machine outside storage_service storage_srvice, toplogy: Extract topology_state_machine::await_quiesced() tablet_scheduler: Make disabling of balancing interrupt shuffle mode tablet_scheduler: Log whether balancing is considered as enabled	2024-06-06 11:27:03 +02:00
Kamil Braun	256517b570	Merge 'tablets: Filter-out left nodes in get_natural_endpoints()' from Tomasz Grabiec The API already promises this, the comment on effective_replication_map says: "Excludes replicas which are in the left state". Tablet replicas on the replaced node are rebuilt after the node already left. We may no longer have the IP mapping for the left node so we should not include that node in the replica set. Otherwise, storage_proxy may try to use the empty IP and fail: storage_proxy - No mapping for :: in the passed effective replication map It's fine to not include it, because storage proxy uses keyspace RF and not replica list size to determine quorum. The node is not coming up, so no one should need to contact it. Users which need replica list stability should use the host_id-based API. Fixes #18843 Closes scylladb/scylladb#18955 * github.com:scylladb/scylladb: tablets: Filter-out left nodes in get_natural_endpoints() test: pylib: Extract start_writes() load generator utility	2024-06-06 11:23:27 +02:00
Wojciech Mitros	272e80fe0a	node_update_backlog: divide adding and fetching backlogs Currently, we only update the backlogs in node_update_backlog at the same time when we're fetching them. This is done using storage_proxy's method get_view_update_backlog, which is confusing because it's a getter with side-effects. Additionally, we don't always want to update the backlog when we're reading it (as in gossip which is only on shard 0) and we don't always want to read it when we're updating it (when we're not handling any writes but the backlog drops due to background work finish). This patch divides the node_view_backlog::add_fetch as well the storage_proxy::get_view_update_backlog both into two methods; one for updating and one for reading the backlog. This patch only replaces the places where we're currently using the view backlog getter, more situations where we should get/update the backlog should be considered in a following patch.	2024-06-06 10:45:13 +02:00
Botond Dénes	cd10beb89d	Merge 'Don't use db::config by gossiper' from Pavel Emelyanov All sharded<service>'s a supposed to have their own config and not use global db::config one. The service config, in turn, is to be created by main/cql_test_env/whatever out of db::config and, maybe, other data. Gossiper is almost there, but it still uses db::config in few places. Closes scylladb/scylladb#19051 * github.com:scylladb/scylladb: gossiper: Stop using db::config gossiper: Move force_gossip_generation on gossip_config gossiper: Move failure_detector_timeout_ms on gossip_config main: Fix indentation after previous patch main: Make gossiper config a sharded parameter main: Add local variable for set of seeds main: Add local variable for group0 id main: Add local variable for cluster_name	2024-06-06 09:12:51 +03:00
Nadav Har'El	b5fd854c77	cql-pytest: be more forgiving to ancient versions of Scylla We recently added to cql-pytest tests the ability to check if tablets are enabled or not (for some tablet-specific tests). When running tests against Cassandra or old pre-tablet versions of Scylla, this fact is detected and "False" is returned immediately. However, we still look at a system table which didn't exist on really ancient versions of Scylla, and tests couldn't run against such versions. The fix is trivial: if that system table is missing, just ignore the error and return False (i.e., no tablets). There were no tablets on such ancient versions of Scylla. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19098	2024-06-06 08:53:26 +03:00
Tomasz Grabiec	2c3f7c996f	test: pylib: Fetch all pages by default in run_async Fetching only the first page is not the intuitive behavior expected by users. This causes flakiness in some tests which generate variable amount of keys depending on execution speed and verify later that all keys were written using a single SELECT statement. When the amount of keys becomes larger than page size, the test fails. Fixes #18774 Closes scylladb/scylladb#19004	2024-06-05 18:07:24 +03:00
Tomasz Grabiec	5ca54a6e88	test: pylib: Do not block async reactor while removing directories This fixes a problem where suite cleanup schedules lots of uninstall() tasks for servers started in the suite, which schedules lots of tasks, which synchronously call rmtree(). These take over a minute to finish, which blocks other tasks for tests which are still executing. In particular, this was observed to cause ManagerClient.server_stop_gracefully() to time out. It has a timeout of 60 seconds. The server was stopped quickly, but the RESTful API response was not processed in time and the call timed out when it got the async reactor.	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	98323be296	repair: Exclude tablet migrations with tablet repair We want to exclude repair with tablet migrations to avoid races between repair reads and writes with replica movement. Repair is not prepared to handle topology transitions in the middle. One reason why it's not safe is that repair may successfully write to a leaving replica post streaming phase and consider all replicas to be repaired, but in fact they are not, the new replica would not be repaired. Other kinds of races could result in repair failures. If repair writes to a leaving replica which was already cleaned up, such writes will fail, causing repair to fail. Excluding works by keeping effective_replication_map_ptr in a version which doesn't have table's tablets in transitions. That prevents later transitions from starting because topology coordinator's barrier will wait for that erm before moving to a stage later than allow_write_both_read_old, so before any requests start using the new topology. Also, if transitions are already running, repair waits for them to finish. Fixes #17658. Fixes #18561.	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	c45ce41330	main, storage_service: Move topology_state_machine outside storage_service It will be propagated to repair_service to avoid cyclic dependency: storage_service <-> repair_service	2024-06-05 16:11:22 +02:00
Kamil Braun	18f5d6fd89	Merge 'Fail bootstrap if ip mapping is missing during double write stage' from Gleb Natapov If a node restarts just before it stores bootstrapping node's IP it will not have ID to IP mapping for bootstrapping node which may cause failure on a write path. Detect this and fail bootstrapping if it happens. Closes scylladb/scylladb#18927 * github.com:scylladb/scylladb: raft topology: fix indentation after previous commit raft topology: do not add bootstrapping node without IP as pending test: add test of bootstrap where the coordinator crashes just before storing IP mapping schema_tables: remove unused code	2024-06-05 11:15:15 +02:00
Raphael S. Carvalho	3983f69b2d	topology_experimental_raft/test_tablets: restore usage of check_with_down `e7246751b6` incorrectly dropped its usage in test_tablet_missing_data_repair. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#19092	2024-06-05 10:11:02 +02:00
Pavel Emelyanov	dcc083110d	gossiper: Stop using db::config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:19:47 +03:00
Marcin Maliszkiewicz	ac0e164a6b	raft: rename announce to commit Old wording was derived from existing code which originated from schema code. Name commit better describes what we do here.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	370a5b547e	cql3: raft: attach description to each mutations collector group This description is readable from raft log table. Previously single description was provided for the whole announce call but since it can contain mutations from various subsystems now description was moved to add_mutation(s)/add_generator function calls.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	a88b7fc281	cql3: auth: use mutation collector for service levels statements This is done to achieve single transaction semantics.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	97a5da5965	cql3: auth: use mutation collector for alter role This is done to achieve single transaction semantics.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	a12c8ebfce	cql3: auth: use mutation collector for grant role and revoke role This is done to achieve single transaction semantics.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	5ba7d1b116	cql3: auth: use mutation collector for drop role and auto-revoke The main theme of this commit is executing drop keyspace/table/aggregate/function statements in a single transaction together with auth auto-revoke logic. This is the logic which cleans related permissions after resource is deleted. It contains several parts which couldn't easily be split into separate commits mainly because mutation collector related paths can't be mixed together. It would require holding multiple guards which we don't support. Another reason is that with mutation collector the changes are announced in a single place, at the end of statement execution, if we'd announce something in the middle then it'd lead to raft concurrent modification infinite loop as it'd invalidate our guard taken at the beginning of statement execution. So this commit contains: - moving auto-revoke code to statement execution from migration_listener * only for auth-v2 flow, to not break the old one * it's now executed during statement execution and not merging schemas, which means it produces mutations once as it should and not on each node separately * on_before callback family wasn't used because I consider it much less readable code. Long term we want to remove auth_migration_listener. - adding mutation collector to revoke_all * auto-revoke uses this function so it had to be changed, auth::revoke_all free function wrapper was added as cql3 layer should not use underlying_authorizer() directly. - adding mutation collector to drop_role * because it depends on revoke_all and we can't mix old and new flows * we need to switch all functions auth::drop_role call uses * gradual use of previously introduced modify_membership, otherwise we would need to switch even more code in this commit	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	01fb43e35f	Revert "Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks" This reverts commit `80ed442be2`. This logic was replaced in previous commit by dynamic cast. Hopefully even this cast will be eliminated in the future.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	2a6cfbfb33	cql3: auth: use mutation collector for create role statement This is done to achieve single transaction semantics. grant_permissions_to_creator is logically part of create role but its change will be included in following commits as it spans multiple usages. Additionally, we disabled rollback during create role as it won't work and is not needed with single transaction logic.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	7e0a801f53	auth: add class for gathering mutations without immediate announce To achieve write atomicity across different tables we need to announce mutations in a single transaction. So instead of each function doing a separate announce we need to collect mutations and announce them once at the end.	2024-06-04 15:43:04 +02:00
Botond Dénes	d120f0d7d3	Merge 'tasks: introduce task manager's task folding' from Aleksandra Martyniuk Task manager's tasks stay in memory after they are finished. Moreover, even if a child task is unregistered from task manager, it is still alive since its parent keeps a foreign pointer to it. Also, when a task has finished successfully there is no point in keeping all of its descendants in memory. The patch introduces folding of task manager's tasks. Whenever a task which has a parent is finished it is unregistered from task manager and foreign_ptr to it (kept in its parent) is replaced with its status. Children's statuses of the task are dropped unless they or one of their descendants failed. So for each operation we keep a tree of tasks which contains: - a root task and its direct children (status if they are finished, a task otherwise); - running tasks and their direct children (same as above); - a statuses path from root to failed tasks. /task_manager/wait_task/ does not unregister tasks anymore. Refs: #16694. - [ ] Backport reason (please explain below if this patch should be backported or not) Requires backport to 6.0 as task number exploded with tablets. Closes scylladb/scylladb#18735 * github.com:scylladb/scylladb: docs: describe task folding test: rest_api: add test for task tree structure test: rest_api: modify new_test_module tasks: test: modify test_task methods api: task_manager: do not unregister task in /task_manager/wait_task/ tasks: unregister tasks with parents when they are finished tasks: fold finished tasks info their parents tasks: make task_manager::task::impl::finish_failed noexcept tasks: change _children type	2024-06-04 08:43:44 +03:00
Nadav Har'El	95db1c60d6	test/alternator: fix a test failing on Amazon DynamoDB The test test_table.py::test_concurrent_create_and_delete_table failed on Amazon DynamoDB because of a silly typo - "false" instead of "False". A function detecting Scylla tried to return false when noticing this isn't Scylla - but had a typo, trying to return "false" instead of "False". This patch fixes this typo, and the test now works on DynamoDB: test/alternator/run --aws test_table.py::test_concurrent_create_and_delete_table Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#17799	2024-06-02 22:25:56 +03:00
Tomasz Grabiec	7b1eea794b	test: perf: Add test for tablet load balancer effectiveness	2024-06-02 14:23:00 +02:00
Aleksandra Martyniuk	d7e80a6520	test: rest_api: add test for task tree structure Add test which checks whether the tasks are folded into their parent as expected.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	fc0796f684	test: rest_api: modify new_test_module Remove remaining test tasks when a test module is removed, so that a node could shutdown even if a test fails.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	a82a2f0624	tasks: unregister tasks with parents when they are finished Unregister children that are finished from task manager. They can be examined through their parents.	2024-05-31 10:27:09 +02:00
Nadav Har'El	c786621b4c	test/cql-pytest: reproduce bug of secondary index used before built This patch adds a test reproducing for the known issue #7963, where after adding a secondary-index to a table, queries might immediately start to use this index - even before it is built - and produce wrong results. The issue is still open and unfixed, so the new test is marked "xfail". Interestingly, even though Cassandra claims to have found and fixed a similar bug in 2015 (CASSANDRA-8505), this test also fails on Cassandra - trying a query right after CREATE INDEX and before it was fully built may cause the query to fail. Refs #7963 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18993	2024-05-31 10:05:00 +03:00
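
Note on 5ca54a6e88 (test: pylib: Do not block async reactor while removing directories): the commit message describes synchronous rmtree() calls stalling other asyncio tasks during suite cleanup. The sketch below is not the actual patch; it only illustrates the general technique of moving a blocking `shutil.rmtree` call onto a worker thread with `asyncio.to_thread`. The names `remove_tree` and `demo` are hypothetical and do not come from the repository.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path


async def remove_tree(path: Path) -> None:
    """Delete a directory tree without stalling the event loop (hypothetical helper)."""
    # shutil.rmtree is a blocking call, so it runs on a worker thread;
    # the event loop stays free to serve other tasks in the meantime.
    await asyncio.to_thread(shutil.rmtree, path, ignore_errors=True)


async def demo() -> bool:
    # Build a small throwaway directory standing in for a server workdir.
    workdir = Path(tempfile.mkdtemp())
    (workdir / "data").mkdir()
    (workdir / "data" / "sstable.db").write_bytes(b"x" * 1024)
    # A second task is scheduled alongside the removal to show that both
    # can make progress concurrently.
    other = asyncio.create_task(asyncio.sleep(0))
    await asyncio.gather(remove_tree(workdir), other)
    return workdir.exists()


still_there = asyncio.run(demo())
```

With many servers to uninstall, each removal would then occupy a thread rather than the event loop, so a pending REST response could still be processed before its timeout expires.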


6960 Commits
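
Note on e18aeb2486 (mv: gossip the same backlog if a different backlog was sent in a response): the five-step scenario in that commit message can be replayed with a toy model. This is a simplified simulation written for illustration, not ScyllaDB code; the `Node` class, the `response_sent` flag, and the backlog value 5 are all assumptions chosen to mirror the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    backlog: int = 0
    published: Optional[int] = None  # last backlog value sent through gossip
    response_sent: bool = False      # hypothetical flag: a different value left via a write response

    def reply_to_write(self, peer_view: dict) -> None:
        # A write response carries the current backlog to the coordinator.
        peer_view["N"] = self.backlog
        if self.backlog != self.published:
            self.response_sent = True

    def gossip_round(self, peer_view: dict, fixed: bool) -> None:
        unchanged = self.published is not None and self.published == self.backlog
        # Old behaviour: skip whenever the backlog equals the last gossiped value.
        # Modelled fix: do not skip if a different value was sent in a response.
        if unchanged and not (fixed and self.response_sent):
            return
        peer_view["N"] = self.backlog
        self.published = self.backlog
        self.response_sent = False


def run(fixed: bool) -> int:
    node, coordinator_view = Node(), {}
    node.gossip_round(coordinator_view, fixed)  # steps 1-2: empty backlog is gossiped
    node.backlog = 5                            # step 3: a write triggers a view update
    node.reply_to_write(coordinator_view)       # coordinator notes the increased backlog
    node.backlog = 0                            # step 4: background update finishes
    node.gossip_round(coordinator_view, fixed)  # step 5: next gossip iteration
    return coordinator_view["N"]


stale = run(fixed=False)  # coordinator keeps the outdated backlog
fresh = run(fixed=True)   # coordinator learns the backlog is empty again
```

In this model the unfixed run leaves the coordinator holding the increased backlog, while the fixed run re-sends the unchanged value and clears it.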