Commit Graph

6803 Commits

Author SHA1 Message Date
Artsiom Mishuta
efb079ec15 test/pylib: Introduce Manager broken state:
Waiting for all tasks does not guarantee
that the test will not spawn new tasks while we wait.

The Manager broken state prevents all future put requests in case of:
1) a failure during task waiting
2) the test continuing to create tasks in the test_after stage
2024-05-14 14:24:03 +02:00
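The broken-state guard described in this message can be sketched in Python. All names here (`TaskManager`, `put`, `wait_all`) are hypothetical illustrations, not the actual pylib API:

```python
import asyncio

class TaskManager:
    """Hypothetical sketch: once the manager is broken, new put requests
    are rejected, matching the two cases listed in the commit message."""

    def __init__(self):
        self.tasks: list[asyncio.Task] = []
        self.broken = False

    def put(self, coro):
        # Case 2 from the message: the test keeps creating tasks in the
        # test_after stage after the manager was marked broken.
        if self.broken:
            coro.close()  # avoid "coroutine was never awaited" warnings
            raise RuntimeError("manager is broken; put requests are rejected")
        task = asyncio.ensure_future(coro)
        self.tasks.append(task)
        return task

    async def wait_all(self):
        try:
            await asyncio.gather(*self.tasks)
        except Exception:
            # Case 1 from the message: a failure during task waiting.
            self.broken = True
            raise

async def demo() -> bool:
    mgr = TaskManager()
    mgr.put(asyncio.sleep(0))
    await mgr.wait_all()
    mgr.broken = True  # simulate a failure detected while waiting
    try:
        mgr.put(asyncio.sleep(0))
        return False
    except RuntimeError:
        return True
```

Note that gathering a snapshot of `self.tasks` is exactly why waiting alone is not enough: tasks added during the wait are not covered, hence the broken-state guard.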
Artsiom Mishuta
a8bab03c15 test/pylib: Wait for tasks from Tasks History:
To ensure the atomicity of tests and recycle clusters without any issues, it is crucial
that all active requests in ScyllaClusterManager are completed before proceeding further.
2024-05-14 14:24:03 +02:00
Artsiom Mishuta
2ee063c90c test/pylib: Introduce Tasks History:
Topology tests might spawn asynchronous tasks in parallel in ScyllaClusterManager.
Tasks history is introduced to be able to log and analyze all actions
against the cluster in case of failures.
2024-05-14 14:24:03 +02:00
Artsiom Mishuta
38125a0049 test/pylib: Introduce Stop Event
Introduce a stop event that interrupts node start in the
"wait for node started" state if someone wants to stop the node.
2024-05-14 14:24:03 +02:00
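The stop-event idea — interrupting a wait for node startup when a stop is requested — can be sketched with `asyncio.Event` (hypothetical names, not the actual pylib code):

```python
import asyncio

async def start_node(node_ready: asyncio.Event, stop_event: asyncio.Event) -> str:
    """Wait for the node to report started, but bail out early if a stop
    was requested while we were in the "wait for node started" state."""
    ready = asyncio.ensure_future(node_ready.wait())
    stop = asyncio.ensure_future(stop_event.wait())
    done, pending = await asyncio.wait(
        {ready, stop}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # don't leak the loser of the race
    return "stopped" if stop in done else "started"

async def demo() -> str:
    node_ready = asyncio.Event()  # never set: startup hangs forever
    stop_event = asyncio.Event()
    waiter = asyncio.ensure_future(start_node(node_ready, stop_event))
    await asyncio.sleep(0)        # let the waiter block on the events
    stop_event.set()              # someone wants to stop the node
    return await waiter           # returns "stopped", not hanging
```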
Artsiom Mishuta
4c2527efce test/pylib: Introduce Start-Stop Lock:
The methods stop, stop_gracefully, and start in ScyllaServer
are not designed for parallel execution.
To circumvent issues arising from concurrent calls,
a start_stop_lock has been introduced.
This lock ensures that these methods are executed sequentially.
2024-05-14 14:24:03 +02:00
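The start-stop lock can be sketched with `asyncio.Lock`. This is a minimal illustration of the locking idea only; the real ScyllaServer methods do actual process management:

```python
import asyncio

class ScyllaServerSketch:
    """Hypothetical sketch: start/stop guarded by a start_stop_lock so
    concurrent calls execute sequentially, as the commit describes."""

    def __init__(self):
        self.start_stop_lock = asyncio.Lock()
        self.running = False
        self.transitions: list[str] = []

    async def start(self):
        async with self.start_stop_lock:  # never interleaves with stop()
            self.transitions.append("start")
            await asyncio.sleep(0)        # stand-in for real startup work
            self.running = True

    async def stop(self):
        async with self.start_stop_lock:
            self.transitions.append("stop")
            await asyncio.sleep(0)
            self.running = False

async def demo() -> list[str]:
    server = ScyllaServerSketch()
    # Even when invoked concurrently, the lock forces one-at-a-time execution,
    # so transitions never interleave mid-operation.
    await asyncio.gather(server.start(), server.stop(), server.start())
    return server.transitions
```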
Andrei Chekun
76a766cab0 Migrate alternator tests to PythonTestSuite
As part of the unification process, alternator tests are migrated to PythonTestSuite instead of using RunTestSuite. The main idea is to have one suite, so it will be easier to maintain and introduce new features.
Introduce the prepare_sql option for suite.yaml to make it possible to run CQL statements as a precondition for the test suite.
Related: https://github.com/scylladb/scylladb/issues/18188

Closes scylladb/scylladb#18442
2024-05-13 13:23:29 +03:00
Avi Kivity
51d09e6a2a cql3: castas_fcts: do not rely on boost casting large multiprecision integers to floats behavior
In [1] a bug in casting large multiprecision integers to floats is documented (note that it
received two fixes; the most recent and relevant is [2]). Even with the fix, boost now
returns NaN instead of ±∞ as it did before [3].

Since we cannot rely on boost, detect the conditions that trigger the bug and return
the expected result.

The unit test is extended to cover large negative numbers.

Boost version behavior:
 - 1.78 - returns ±∞
 - 1.79 - terminates
 - 1.79 + fix - returns NaN

Fixes https://github.com/scylladb/scylladb/issues/18508

[1] https://github.com/boostorg/multiprecision/issues/553
[2] ea786494db
[3] https://github.com/boostorg/math/issues/1132

Closes scylladb/scylladb#18532
2024-05-13 13:18:28 +03:00
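The saturating semantics this fix restores — huge integers cast to float become ±∞ rather than NaN — can be illustrated in Python, which raises `OverflowError` instead of saturating. This is a sketch of the desired behavior, not the actual C++ fix:

```python
import math

def int_to_float_saturating(n: int) -> float:
    """Cast an arbitrarily large integer to float, saturating to ±infinity
    when the value exceeds the IEEE double range (the pre-1.79 boost behavior
    the commit reinstates)."""
    try:
        return float(n)
    except OverflowError:
        return math.inf if n > 0 else -math.inf

# Values beyond ~1.8e308 do not fit in a double and saturate:
# int_to_float_saturating(10**400) yields inf, and the large-negative
# case (covered by the extended unit test) yields -inf.
```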
Avi Kivity
cc8b4e0630 batchlog_manager, test: initialize delay configuration
In b4e66ddf1d (4.0) we added a new batchlog_manager configuration
named delay, but forgot to initialize it in cql_test_env. This somehow
worked, but doesn't with clang 18.

Fix it by initializing to 0 (there isn't a good reason to delay it).
Also provide a default to make it safer.

Closes scylladb/scylladb#18572
2024-05-13 07:57:35 +03:00
Avi Kivity
c8cc47df2d Merge 'replica: allocate storage groups dynamically' from Aleksandra Martyniuk
Allocate storage groups dynamically, i.e.:
- on table creation allocate only storage groups that are on this
  shard;
- allocate a storage group for tablet that is moved to this shard;
- deallocate storage group for tablet that is moved out of this shard.

Output of `./build/release/scylla perf-simple-query -c 1 --random-seed=2248493992` before change:
```
random-seed=2248493992
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
64933.90 tps ( 63.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42163 insns/op,        0 errors)
65865.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42155 insns/op,        0 errors)
66649.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
67029.60 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
68361.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42166 insns/op,        0 errors)

median 66649.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
median absolute deviation: 784.00
maximum: 68361.21
minimum: 64933.90
```

Output of `./build/release/scylla perf-simple-query -c 1 --random-seed=2248493992` after change:
```
random-seed=2248493992
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
63744.12 tps ( 63.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42153 insns/op,        0 errors)
66613.16 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42153 insns/op,        0 errors)
69667.39 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42184 insns/op,        0 errors)
67824.78 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42180 insns/op,        0 errors)
67244.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42174 insns/op,        0 errors)

median 67244.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42174 insns/op,        0 errors)
median absolute deviation: 631.05
maximum: 69667.39
minimum: 63744.12
```

Fixes: #16877.

Closes scylladb/scylladb#17664

* github.com:scylladb/scylladb:
  test: add test for back and forth tablets migration
  replica: allocate storage groups dynamically
  replica: refresh snapshot in compaction_group::cleanup
  replica: add rwlock to storage_group_manager
  replica: handle reads of non-existing tablets gracefully
  service: move to cleanup stage if allow_write_both_read_old fails
  replica: replace table::as_table_state
  compaction: pass compaction group id to reshape_compaction_group
  replica: open code get_compaction_group in perform_cleanup_compaction
  replica: drop single_compaction_group_if_available
2024-05-12 21:22:02 +03:00
Nadav Har'El
9813ec9446 Merge 'test: perf: add end-to-end benchmark for alternator' from Marcin Maliszkiewicz
The code is based on a similar idea to perf_simple_query. The main differences are:
  - it starts a full scylla process
  - it communicates with alternator via http (localhost)
  - it uses a richer table schema with all DynamoDB types instead of only strings

  The testing code runs in the same process as scylla so we can easily get various perf counters (tps, instructions, allocations, etc.).

  Results on my machine (with 1 vCPU):
  > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
  ...
  median 23402.59616090321
  median absolute deviation: 598.77
  maximum: 24014.41
  minimum: 19990.34

  > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
  ...
  median 16089.34211320635
  median absolute deviation: 552.65
  maximum: 16915.95
  minimum: 14781.97

  The above seems more realistic than the results from perf_simple_query, which are 96k and 49k tps (per core).

Related: https://github.com/scylladb/scylladb/issues/12518

Closes scylladb/scylladb#13121

* github.com:scylladb/scylladb:
  test: perf: alternator: add option to skip data pre-population
  perf-alternator-workloads: add operations-per-shard option
  test: perf: add global secondary indexes write workload for alternator
  test: perf: add option to continue after failed request
  test: perf: add read modify write workload for alternator (lwt)
  test: perf: add scan workload for alternator
  test: perf: add end-to-end benchmark for alternator
  test: perf: extract result aggregation logic to a separate struct
2024-05-12 18:15:29 +03:00
Kefu Chai
fd14b6f26b test/nodetool: do not accept 1 return code when passing --help to nodetool
In 906700d5, we started accepting 0 as well as the previous return code of
"nodetool <command> --help", because we needed to be prepared for
the newer seastar submodule while remaining compatible with older
seastar versions. Now that 305f1bd3 bumped the seastar
submodule, which picked up the change to return 0 when
handling the "--help" command line option in seastar, we can
drop the workaround.

So, in this change, we only accept "0" as the expected return code.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18627
2024-05-12 14:30:31 +03:00
Aleksandra Martyniuk
51fdda4199 test: add test for back and forth tablets migration 2024-05-10 15:08:56 +02:00
Aleksandra Martyniuk
b4371a0ea0 replica: allocate storage groups dynamically
Currently, empty storage_groups are allocated even for tablets that
are not on this shard.

Allocate storage groups dynamically, i.e.:
- on table creation allocate only storage groups that are on this
  shard;
- allocate a storage group for tablet that is moved to this shard;
- deallocate storage group for tablet that is cleaned up.

Stop compaction group before it's deallocated.

Add a flag to table::cleanup_tablet deciding whether to deallocate
sgs and use it in commitlog tests.
2024-05-10 15:08:21 +02:00
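The allocation policy described above can be sketched with a per-shard set of storage groups. Names and the data model are hypothetical; the real replica code manages actual compaction groups:

```python
class TableSketch:
    """Hypothetical sketch of dynamic storage-group allocation: groups
    exist only for tablets currently on this shard."""

    def __init__(self, this_shard: int, tablet_shards: dict[int, int]):
        # On table creation, allocate only groups for tablets on this shard.
        self.this_shard = this_shard
        self.storage_groups = {
            tablet for tablet, shard in tablet_shards.items()
            if shard == this_shard
        }

    def on_tablet_migrated_in(self, tablet: int):
        self.storage_groups.add(tablet)   # allocate for the incoming tablet

    def on_tablet_cleaned_up(self, tablet: int):
        # In the real code the compaction group is stopped before the
        # storage group is deallocated.
        self.storage_groups.discard(tablet)

# A 3-tablet table viewed from shard 0: tablet 2 lives on shard 1, so no
# empty group is allocated for it up front.
table = TableSketch(this_shard=0, tablet_shards={1: 0, 2: 1, 3: 0})
```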
Aleksandra Martyniuk
532653f118 replica: replace table::as_table_state
Replace table::as_table_state with table::try_get_table_state_with_static_sharding
which throws if a table does not use static sharding.
2024-05-10 14:56:38 +02:00
Botond Dénes
3286a6fa14 Merge 'Reload reclaimed bloom filters when memory is available' from Lakshmi Narayanan Sreethar
PR #17771 introduced a threshold for the total memory used by all bloom filters across SSTables. When the total usage surpasses the threshold, the largest bloom filter will be removed from memory, bringing the total usage back under the threshold. This PR adds support for reloading such reclaimed bloom filters back into memory when memory becomes available (i.e., within the 10% of available memory earmarked for the reclaimable components).

The SSTables manager now maintains a list of all SSTables whose bloom filter was removed from memory and attempts to reload them when an SSTable, whose bloom filter is still in memory, gets deleted. The manager reloads from the smallest to the largest bloom filter to maximize the number of filters being reloaded into memory.

Closes scylladb/scylladb#18186

* github.com:scylladb/scylladb:
  sstable_datafile_test: add testcase to test reclaim during reload
  sstable_datafile_test: add test to verify auto reload of reclaimed components
  sstables_manager: reload previously reclaimed components when memory is available
  sstables_manager: start a fiber to reload components
  sstable_directory_test: fix generation in sstable_directory_test_table_scan_incomplete_sstables
  sstable_datafile_test: add test to verify reclaimed components reload
  sstables: support reloading reclaimed components
  sstables_manager: add new intrusive set to track the reclaimed sstables
  sstable: add link and comparator class to support new intrusive set
  sstable: renamed intrusive list link type
  sstable: track memory reclaimed from components per sstable
  sstable: rename local variable in sstable::total_reclaimable_memory_size
2024-05-10 13:01:01 +03:00
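The reload policy described above — smallest filters first, to maximize how many fit back into the available memory — is a simple greedy pass. A sketch with hypothetical names (not the sstables_manager API):

```python
def reload_reclaimed_filters(reclaimed: dict[str, int], available: int) -> list[str]:
    """Greedy sketch of the reload order: iterate reclaimed bloom filters
    from smallest to largest and reload each one that still fits in the
    available memory budget."""
    reloaded = []
    for name, size in sorted(reclaimed.items(), key=lambda kv: kv[1]):
        if size <= available:
            available -= size
            reloaded.append(name)
        else:
            break  # every remaining filter is at least as large
    return reloaded

# With 100 units free, the two smallest filters fit but the largest does
# not; largest-first would have reloaded only one filter.
# reload_reclaimed_filters({"a": 70, "b": 20, "c": 60}, 100) -> ["b", "c"]
```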
Avi Kivity
37d32a5f8b Merge 'Cleanup inactive reads on tablet migration' from Botond Dénes
When a tablet is migrated away, any inactive read that might be reading from said tablet has to be dropped. Otherwise these inactive reads can prevent sstables from being removed, and those sstables can potentially survive until the tablet is migrated back and resurrect data.
This series introduces the fix as well as a reproducer test.

Fixes: https://github.com/scylladb/scylladb/issues/18110

Closes scylladb/scylladb#18179

* github.com:scylladb/scylladb:
  test: add test for cleaning up cached querier on tablet migration
  querier: allow injecting cache entry ttl by error injector
  replica/table: cleanup_tablet(): clear inactive reads for the tablet
  replica/database: introduce clear_inactive_reads_for_tablet()
  replica/database: introduce foreach_reader_concurrency_semaphore
  reader_concurrency_semaphore: add range param to evict_inactive_reads_for_table()
  reader_concurrency_semaphore: allow storing a range with the inactive reader
  reader_concurrency_semaphore: avoid detach() in inactive_read_handle::abandon()
2024-05-09 17:34:49 +03:00
Lakshmi Narayanan Sreethar
4d22c4b68b sstable_datafile_test: add testcase to test reclaim during reload
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 19:57:40 +05:30
Lakshmi Narayanan Sreethar
a080daaa94 sstable_datafile_test: add test to verify auto reload of reclaimed components
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:49:22 +05:30
Lakshmi Narayanan Sreethar
24064064e9 sstable_directory_test: fix generation in sstable_directory_test_table_scan_incomplete_sstables
The testcase uses an sstable whose mutation key and the generation are
owned by different shards. Due to this, when process_sstable_dir is
called, the sstable gets loaded into a different shard than the one that
was intended. This also means that the sstable and the sstable manager
end up in different shards.

The following patch will introduce a condition variable in sstables
manager which will be signalled from the sstables. If the sstable and
the sstable manager are in different shards, the signalling will cause
the testcase to fail in debug mode with this error: "Promise task was
set on shard x but made ready on shard y". So, fix it by supplying a
generation number owned by the same shard that owns the mutation key.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
69b2a127b0 sstable_datafile_test: add test to verify reclaimed components reload
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Marcin Maliszkiewicz
a1099791c4 test: perf: alternator: add option to skip data pre-population 2024-05-09 13:59:17 +02:00
Marcin Maliszkiewicz
fd416fac3b perf-alternator-workloads: add operations-per-shard option 2024-05-09 13:59:13 +02:00
Marcin Maliszkiewicz
5b8acf182a test: perf: add global secondary indexes write workload for alternator 2024-05-09 13:59:08 +02:00
Marcin Maliszkiewicz
43a64ac558 test: perf: add option to continue after failed request 2024-05-09 13:59:03 +02:00
Marcin Maliszkiewicz
70b5b5024b test: perf: add read modify write workload for alternator (lwt) 2024-05-09 13:58:58 +02:00
Marcin Maliszkiewicz
5b8e554431 test: perf: add scan workload for alternator 2024-05-09 13:58:54 +02:00
Marcin Maliszkiewicz
55030b1550 test: perf: add end-to-end benchmark for alternator
The code is based on a similar idea to perf_simple_query. The main differences are:
- it starts a full scylla process
- it communicates with alternator via http (localhost)
- it uses a richer table schema with all DynamoDB types instead of only strings

The testing code runs in the same process as scylla so we can easily get various perf counters (tps, instructions, allocations, etc.).

Results on my machine (with 1 vCPU):
> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
...
median 23402.59616090321
median absolute deviation: 598.77
maximum: 24014.41
minimum: 19990.34

> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
...
median 16089.34211320635
median absolute deviation: 552.65
maximum: 16915.95
minimum: 14781.97

The above seems more realistic than the results from perf_simple_query, which are 96k and 49k tps (per core).
2024-05-09 13:58:40 +02:00
Marcin Maliszkiewicz
6152223890 test: perf: extract result aggregation logic to a separate struct
It will be reused later by a new tool.
2024-05-09 13:58:29 +02:00
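The aggregation these benchmarks print (median, median absolute deviation, maximum, minimum) can be sketched in a few lines. The sample values below are the tps numbers from the perf-simple-query "after change" run quoted earlier, and they reproduce its summary line:

```python
from statistics import median

def aggregate(samples: list[float]) -> dict[str, float]:
    """Sketch of the per-run result aggregation: median, median absolute
    deviation (median of |sample - median|), maximum, and minimum."""
    med = median(samples)
    mad = median(abs(s - med) for s in samples)
    return {"median": med, "mad": round(mad, 2),
            "maximum": max(samples), "minimum": min(samples)}

# tps samples from the perf-simple-query run quoted above:
samples = [63744.12, 66613.16, 69667.39, 67824.78, 67244.21]
# aggregate(samples) reproduces the reported summary:
# median 67244.21, median absolute deviation 631.05,
# maximum 69667.39, minimum 63744.12
```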
Gleb Natapov
3b40d450e5 gossiper: try to locate an endpoint by the host id when applying state if search by IP fails
Even if there is no endpoint for the given IP, the state can still belong to an existing endpoint that
was restarted with a different IP, so let's try to locate the endpoint by host id as well. Do it in raft
topology mode only so as not to impact gossiper mode.

Also make the test more robust in detecting a wrong number of entries in
the peers table. Today it may miss that there is a wrong entry there
because the map will squash two entries for the same host id into one.

Fixes: scylladb/scylladb#18419
Fixes: scylladb/scylladb#18457
2024-05-09 13:14:54 +02:00
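The lookup order from this commit — try the IP first, fall back to the host id — can be sketched with two maps. The data model here is a hypothetical simplification, not the gossiper's:

```python
def find_endpoint(state_ip: str, state_host_id: str,
                  by_ip: dict[str, str], by_host_id: dict[str, str]):
    """Sketch: locate the endpoint for incoming state by IP, and if that
    fails fall back to the host id, so state from a node restarted with a
    different IP still reaches its existing endpoint."""
    endpoint = by_ip.get(state_ip)
    if endpoint is None:
        endpoint = by_host_id.get(state_host_id)
    return endpoint

# A node known under host id "h1" restarted with a new IP: the IP lookup
# misses, but the host-id fallback still finds the endpoint.
by_ip = {"10.0.0.5": "node-1"}        # stale IP mapping
by_host_id = {"h1": "node-1"}
```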
Kefu Chai
906700d523 test/nodetool: accept -1 returncode also when --help is invoked
In newer seastar, 0 is returned as the return code of the application
when handling `--help`. To prepare for this behavior, let's
accept it before updating the seastar submodule.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18574
2024-05-09 08:26:44 +03:00
Botond Dénes
155332ebf8 Merge 'Drain view_builder in generic drain (again)' from Pavel Emelyanov
Some time ago #16558 was merged, which moved the view builder drain into the generic drain. After this merge dtests started to fail from time to time, so the PR was reverted (see #18278). In #18295 the hang was found: the view builder drain was moved from "before" stopping the messaging service to "after" it, and view update write handlers in the proxy hung for a hard-coded timeout of 5 minutes without being aborted. Tests don't wait for 5 minutes; they kill scylla, then complain about it and fail.

This PR brings back the original PR as well as the necessary fix that cancels view update write handlers on stop.

Closes scylladb/scylladb#18408

* github.com:scylladb/scylladb:
  Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB"
  view: Abort pending view updates when draining
2024-05-09 08:26:44 +03:00
Botond Dénes
0438febdc9 Merge 'alternator: fix REST API access to an Alternator LSI' from Nadav Har'El
The name of the Scylla table backing an Alternator LSI looks like `basename:!lsiname`. Some REST API clients (including Scylla Manager), when they send a "!" character in the REST API request path, may decide to "URL encode" it - converting it to `%21`.

Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725) Scylla's REST API server forgets to do the URL decoding on the path part of the request, which leads to the REST API request failing to address the LSI table.

The first patch in this PR fixes the bug by using a new Seastar API introduced in https://github.com/scylladb/seastar/pull/2125 that does the URL decoding as appropriate. The second patch in the PR is a new test for this bug, which fails without the fix, and passes afterwards.

Fixes #5883.

Closes scylladb/scylladb#18286

* github.com:scylladb/scylladb:
  test/alternator: test addressing LSI using REST API
  REST API: stop using deprecated, buggy, path parameter
2024-05-09 08:26:43 +03:00
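The encoding mismatch at the heart of this bug can be reproduced with Python's `urllib.parse`; this illustrates the problem, not the Seastar API used in the fix:

```python
from urllib.parse import quote, unquote

# The Scylla table backing an Alternator LSI is named like:
lsi_table = "basename:!lsiname"

# Some REST clients percent-encode special characters in the request path;
# "!" becomes "%21" (and ":" becomes "%3A"):
encoded = quote(lsi_table, safe="")

# Without URL decoding, the path no longer matches the table name, so the
# request fails to address the LSI table...
assert encoded != lsi_table

# ...while decoding the path restores it, which is what the fix makes the
# REST API server do via the new Seastar API:
assert unquote(encoded) == lsi_table
```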
Benny Halevy
0156e97560 storage_proxy: cas: reject for tablets-enabled tables
Currently, LWT is not supported with tablets.
In particular the interaction between paxos and tablet
migration is not handled yet.

Therefore, it is better to outright reject LWT queries
for tablets-enabled tables rather than support them
in a flaky way.

This commit also marks tests that depend on LWT
as expected to fail.

Fixes scylladb/scylladb#18066

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18103
2024-05-09 08:26:43 +03:00
Botond Dénes
96a7ed7efb Merge 'sstables: add dead row count when issuing warning to system.large_partitions' from Ferenc Szili
This is the second half of the fix for issue #13968. The first half was already merged in PR #18346.

Scylla issues warnings for partitions containing more rows than a configured threshold. The warning is issued by inserting a row into the `system.large_partitions` table. This row contains the information about the partition for which the warning is issued: keyspace, table, sstable, partition key and size, compaction time and the number of rows in the partition. A previous PR #18346 also added range tombstone count to this row.

This change adds a new counter for dead rows to the large_partitions table.

This change also adds cluster feature protection for writing into these new counters. This is needed in case a cluster is in the process of being upgraded to this new version, after which an upgraded node writes data with the new schema into `system.large_partitions`, and finally a node is then rolled back to an old version. This node will then revert the schema to the old version, but the written sstables will still contain data with the new counters, causing any readers of this table to throw errors when they encounter these cells.

This is an enhancement, and backporting is not needed.

Fixes #13968

Closes scylladb/scylladb#18458

* github.com:scylladb/scylladb:
  sstable: added test for counting dead rows
  sstable: added docs for system.large_partitions.dead_rows
  sstable: added cluster feature for dead rows and range tombstones
  sstable: write dead_rows count to system.large_partitions
  sstable: added counter for dead rows
2024-05-09 08:26:43 +03:00
Kamil Braun
4dcae66380 Merge 'test: {auth,topology}: use manager.rolling_restart' from Piotr Dulikowski
Instead of performing a rolling restart by calling `restart` in a loop over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other nodes see the currently processed node as up or down before proceeding to the next step. Not doing so may lead to surprising behavior.

In particular, in scylladb/scylladb#18369, a test failed shortly after restarting three nodes. Because nodes were restarted one after another too fast, when the third node was restarted it didn't send a notification to the second node because it still didn't know that the second node was alive. This led the second node to notice that the third node restarted by observing that it incremented its generation in gossip (it restarted too fast to be marked as down by the failure detector). In turn, this caused the second node to send "third node down" and "third node up" notifications to the driver in quick succession, causing it to drop and reestablish all connections to that node. However, this happened _after_ the rolling restart finished and _after_ the test logic confirmed that all nodes were alive. When the notifications were sent to the driver, the test was executing some statements necessary for the test to pass - as they broke, the test failed.

Fixes: scylladb/scylladb#18369

Closes scylladb/scylladb#18379

* github.com:scylladb/scylladb:
  test: get rid of server-side server_restart
  test: util: get rid of the `restart` helper
  test: {auth,topology}: use manager.rolling_restart
2024-05-08 09:45:08 +02:00
Kamil Braun
03818c4aa9 direct_failure_detector: increase ping timeout and make it tunable
The direct failure detector design is simplistic. It sends pings
sequentially and times out listeners that reached the threshold (i.e.
didn't hear from a given endpoint for too long) in-between pings.

Given the sequential nature, the previous ping must finish so the next
ping can start. We timeout pings that take too long. The timeout was
hardcoded and set to 300ms. This is too low for wide-area setups --
latencies across the Earth can indeed go up to 300ms. 3 subsequent timed
out pings to a given node were sufficient for the Raft listener to "mark
server as down" (the listener used a threshold of 1s).

Increase the ping timeout to 600ms which should be enough even for
pinging the opposite side of Earth, and make it tunable.

Increase the Raft listener threshold from 1s to 2s. Without the
increased threshold, one timed out ping would be enough to mark the
server as down. Increasing it to 2s requires 3 timed out pings, which
makes it more robust in the presence of transient network hiccups.

In the future we'll most likely want to decrease the Raft listener
threshold again, if we use Raft for data path -- so leader elections
start quickly after leader failures. (Faster than 2s). To do that we'll
have to improve the design of the direct failure detector.

Ref: scylladb/scylladb#16410
Fixes: scylladb/scylladb#16607

---

I tested the change manually using `tc qdisc ... netem delay`, setting
network delay on local setup to ~300ms with jitter. Without the change,
the result is as observed in scylladb/scylladb#16410: interleaving
```
raft_group_registry - marking Raft server ... as dead for Raft groups
raft_group_registry - marking Raft server ... as alive for Raft groups
```
happening once every few seconds. The "marking as dead" happens whenever
we get 3 subsequent failed pings, which happens with a certain (high)
probability depending on the latency jitter. Then as soon as we get a
successful ping, we mark the server back as alive.

With the change, the phenomenon no longer appears.

Closes scylladb/scylladb#18443
2024-05-07 23:40:23 +02:00
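The arithmetic behind the old and new tunings above can be checked with a small sketch (a simplified model: pings are sent back to back and each timed-out ping burns the full ping timeout):

```python
def pings_to_mark_down(listener_threshold: float, ping_timeout: float) -> int:
    """How many sequential timed-out pings fit within the listener
    threshold before the server is marked down, under the simplified
    back-to-back ping model described in the commit message."""
    return int(listener_threshold // ping_timeout)

# Old tuning: 1s listener threshold, 300ms ping timeout -> 3 timed-out
# pings mark the server down.
# New tuning keeps that margin: 2s threshold, 600ms timeout -> still 3.
# Raising only the timeout (600ms) while keeping the 1s threshold would
# let a single timed-out ping mark the server down - hence both changed.
```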
Ferenc Szili
60bf846f68 sstable: added test for counting dead rows 2024-05-07 15:44:33 +02:00
Raphael S. Carvalho
2d9142250e Fix flakiness in test_tablet_load_and_stream due to premature gossiper abort on shutdown
Until https://github.com/scylladb/scylladb/issues/15356 is fixed, this
will be handled by explicitly closing the connection, so if scylla fails
to update gossiper state due to premature abort on shutdown, then we
won't be stuck in an endless reconnection attempt (later through
heartbeats (30s interval)), causing the test to time out.

Manifests in scylla logs like this:
gossip - failure_detector_loop: Got error in the loop, live_nodes={127.147.5.10, 127.147.5.16}: seastar::sleep_aborted (Sleep is aborted)
gossip - failure_detector_loop: Finished main loop
migration_manager - stopping migration service
storage_service - Shutting down native transport server
gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
cql_server_controller - CQL server stopped
...
gossip - My status = NORMAL
gossip - Announcing shutdown
gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
gossip - Sending a GossipShutdown to 127.147.5.10 with generation 1714449924
gossip - Sending a GossipShutdown to 127.147.5.16 with generation 1714449924
gossip - === Gossip round FAIL: seastar::abort_requested_exception (abort requested)

Refs #14746.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18484
2024-05-07 02:31:02 +02:00
Piotr Dulikowski
5459cfed6a Merge 'auth: don't run legacy migrations in auth-v2 mode' from Marcin Maliszkiewicz
We won't run:
- old pre auth-v1 migration code
- code creating auth-v1 tables

We will keep running:
- code creating default rows
- code creating the auth-v1 keyspace (needed due to a cqlsh legacy hack:
  cqlsh errors when executing `list roles` or `list users` if
  there is no system_auth keyspace, but it does support the case when
  the expected tables are missing)

Fixes https://github.com/scylladb/scylladb/issues/17737

Closes scylladb/scylladb#17939

* github.com:scylladb/scylladb:
  auth: don't run legacy migrations on auth-v2 startup
  auth: fix indent in password_authenticator::start
  auth: remove unused service::has_existing_legacy_users func
2024-05-06 19:53:35 +02:00
Piotr Dulikowski
92e5018ddb test: get rid of server-side server_restart
Restarting a node amounts to just shutting it down and then starting it
again. There is no good reason to have a dedicated endpoint in the
ScyllaClusterManager for restarting when it can be implemented by
calling two endpoints in sequence: stop and start - it's just code
duplication.

Remove the server_restart endpoint in ScyllaClusterManager and
reimplement it as two endpoint calls in the ManagerClient.
2024-05-06 12:54:53 +02:00
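The client-side composition this commit describes can be sketched as follows. The class and method names mirror the ones mentioned in the message but the bodies are hypothetical stand-ins for the real REST calls:

```python
import asyncio

class ManagerClientSketch:
    """Hypothetical sketch of the ManagerClient after this change:
    restart is just stop followed by start, with no dedicated
    server-side restart endpoint."""

    def __init__(self):
        self.calls: list[str] = []  # records which endpoints were hit

    async def server_stop(self, server_id: int):
        self.calls.append(f"stop {server_id}")   # stand-in for the REST call

    async def server_start(self, server_id: int):
        self.calls.append(f"start {server_id}")  # stand-in for the REST call

    async def server_restart(self, server_id: int):
        # Client-side composition of the two existing endpoints,
        # replacing the removed server_restart endpoint.
        await self.server_stop(server_id)
        await self.server_start(server_id)
```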
Piotr Dulikowski
8de2bda7ae test: util: get rid of the restart helper
We already have `ManagerClient.server_restart`, which can be used in its
place.
2024-05-06 12:24:40 +02:00
Piotr Dulikowski
897e603bf0 test: {auth,topology}: use manager.rolling_restart
Instead of performing a rolling restart by calling `restart` in a loop
over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other
nodes see the currently processed node as up or down before proceeding
to the next step. Not doing so may lead to surprising behavior.

In particular, in scylladb/scylladb#18369, a test failed shortly after
restarting three nodes. Because nodes were restarted one after another
too fast, when the third node was restarted it didn't send a
notification to the second node because it still didn't know that the
second node was alive. This led the second node to notice that the third
node restarted by observing that it incremented its generation in gossip
(it restarted too fast to be marked as down by the failure detector). In
turn, this caused the second node to send "third node down" and "third
node up" notifications to the driver in a quick succession, causing it
to drop and reestablish all connections to that node. However, this
happened _after_ rolling upgrade finished and _after_ the test logic
confirmed that all nodes were alive. When the notifications were sent to
the driver, the test was executing some statements necessary for the
test to pass - as they broke, the test failed.

Fixes: scylladb/scylladb#18369
2024-05-06 12:24:40 +02:00
Kamil Braun
ccbb9f5343 Merge 'topology_coordinator: clear obsolete generations earlier' from Patryk Jędrzejczak
We want to clear CDC generations that are no longer needed
(because all writes are already using a new generation) so they
don't take space and are not sent during snapshot transfers
(see e.g. https://github.com/scylladb/scylladb/issues/17545).

The condition used previously was that we clear generations which
were closed (i.e., a new generation started at this time) more than
24h ago. This is a safe choice, but too conservative: we could
easily end up with a large number of obsolete generations if we
boot multiple nodes within 24h (which is especially easy to do
with tablets).

Change this bound from 24h to `5s + ring_delay`. The choice is
explained in a comment in the code.

Additionally, improve `test_raft_snapshot_request`, which would
become flaky after the change, so it's no longer sensitive to such
changes.

The raft-based topology was experimental before 6.0, no need
to backport.

Ref: scylladb/scylladb#17545

Closes scylladb/scylladb#18497

* github.com:scylladb/scylladb:
  topology_coordinator: clear obsolete generations earlier
  test: test_raft_snapshot_request: improve the last assertion
  test: test_raft_snapshot_request: find raft leader after restart
  test: test_raft_shanpshot_request: simplify appended_command
2024-05-06 12:03:33 +02:00
Kamil Braun
1a50a524e7 Merge 'topology_coordinator: compute cluster size correctly during upgrade' from Piotr Dulikowski
During upgrade to raft topology, information about service levels is copied from the legacy tables in system_distributed to the raft-managed tables of group 0. system_distributed has RF=3, so if the cluster has only one or two nodes we should use a lower consistency level than ALL - and the current procedure does exactly that: it selects QUORUM in the case of two nodes and ONE in the case of a single node. The cluster size is determined based on the call to _gossiper.num_endpoints().

Despite its name, gossiper::num_endpoints() does not necessarily return the number of nodes in the cluster but rather the number of endpoint states in gossiper (this behavior is documented in a comment near the declaration of this function). In some cases, e.g. after gossiper-based nodetool remove, the state might be kept for some time after removal (3 days in this case).

The consequence of this is that gossiper::num_endpoints() might return more than the current number of nodes during upgrade, and that in turn might cause the migration of data from one table to another to fail - causing the upgrade procedure to get stuck if there are only one or two nodes in the cluster.

In order to fix this, use token_metadata::get_all_endpoints() as a measure of the cluster size.

Fixes: scylladb/scylladb#18198

Closes scylladb/scylladb#18261

* github.com:scylladb/scylladb:
  test: topology: test that upgrade succeeds after recent removal
  topology_coordinator: compute cluster size correctly during upgrade
2024-05-06 11:06:09 +02:00
Kamil Braun
16846bf5ce Merge 'Do not serialize removenode operation with api lock if topology over raft is enabled' from Gleb
With topology over raft, all operations are already serialized by the
coordinator anyway, so there is no need to synchronize removenode using the API lock.
All other operations are still synchronized since they cannot be executed in
parallel for the same node anyway.

* 'gleb/17681-fix' of github.com:scylladb/scylla-dev:
  storage_service: do not take API lock for removenode operation if topology coordinator is enabled
  test: return file mark from wait_for that points after the found string
2024-05-06 09:03:03 +02:00
Kefu Chai
5ca9a46a91 test/lib: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18515
2024-05-05 23:31:48 +03:00
Nadav Har'El
21557cfaa6 cql3: Fix invalid JSON parsing for JSON object with different key types
More than three years ago, in issue #7949, we noticed that trying to
set a `map<ascii, int>` from JSON input (i.e., using INSERT JSON or the
fromJson() function) fails - the ascii key is incorrectly parsed.
We fixed that issue in commit 75109e9519
but unfortunately, did not do our due diligence: We did not write enough
tests inspired by this bug, and failed to discover that actually we have
the same bug for many other key types, not just for "ascii". Specifically,
the following key types have exactly the same bug:

  * blob
  * date
  * inet
  * time
  * timestamp
  * timeuuid
  * uuid

Other types, like numbers or booleans, worked "by accident" - instead of
parsing them as a normal string, we asked the JSON parser to parse them
again after removing the quotes, and because unquoted numbers and
unquoted true/false happen to be valid JSON, this didn't fail.

The fix here is very simple - for all *native* types (i.e., not
collections or tuples), the encoding of the key in JSON is simply a
quoted string - and removing the quotes is all we need to do and there's
no need to run the JSON parser a second time. Only for more elaborate
types - collections and tuples - we need to run the JSON parser a
second time on the key string to build the more elaborate object.

This patch also includes tests for fromJson() reading a map with all
native key types, confirming that all the aforementioned key types
were broken before this patch, and that all key types (including the
numbers and booleans which worked even before this patch) work with it.

Fixes #18477.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18482
2024-05-05 15:42:43 +03:00
Kefu Chai
f2b1c47dfc test/boost: s/boost::range::random_shuffle/std::ranges::shuffle/
`boost::range::random_shuffle()` uses the deprecated
`std::random_shuffle()` under the hood, so let's use
`std::ranges::shuffle()` which is available since C++20.

this change should address the warning like:

```
[312/753] CXX build/debug/test/boost/counter_test.o
In file included from test/boost/counter_test.cc:17:
/usr/include/boost/range/algorithm/random_shuffle.hpp:106:13: warning: 'random_shuffle<__gnu_cxx::__normal_iterator<counter_shard *, std::vector<counter_shard>>>' is deprecated: use 'std::shuffle' instead [-Wdeprecated-declarations]
  106 |     detail::random_shuffle(boost::begin(rng), boost::end(rng));
      |             ^
test/boost/counter_test.cc:507:27: note: in instantiation of function template specialization 'boost::range::random_shuffle<std::vector<counter_shard>>' requested here
  507 |             boost::range::random_shuffle(shards);
      |                           ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_algo.h:4489:5: note: 'random_shuffle<__gnu_cxx::__normal_iterator<counter_shard *, std::vector<counter_shard>>>' has been explicitly marked
deprecated here
 4489 |     _GLIBCXX14_DEPRECATED_SUGGEST("std::shuffle")
      |     ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:1957:45: note: expanded from macro '_GLIBCXX14_DEPRECATED_SUGGEST'
 1957 | # define _GLIBCXX14_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT)
      |                                             ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:1941:19: note: expanded from macro '_GLIBCXX_DEPRECATED_SUGGEST'
 1941 |   __attribute__ ((__deprecated__ ("use '" ALT "' instead")))
      |                   ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18517
2024-05-05 15:39:57 +03:00
Kefu Chai
53b98a8610 test: string_format_test: disable test if {fmt} >= 10.0.0
{fmt} v10.0.0 introduces a formatter for `std::optional`, so there
is no need to test ours. Furthermore, the behavior of that formatter
differs from our homebrew one, so let's skip this test if
{fmt} v10.0.0 or later is used.

Refs #18508

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18509
2024-05-03 11:34:23 +03:00
Avi Kivity
8de81f8f91 Merge 'Unstall merge topology snapshot' from Benny Halevy
This series adds facilities to gently convert canonical mutations back to mutations
and to gently make canonical mutations or freeze mutations in a seastar thread.

Those are used in storage_service::merge_topology_snapshot to prevent reactor stalls
due to large mutations, as seen in the test_add_many_nodes_under_load dtest.

Also, migration_manager's migration_request handling was converted to use a seastar thread
and the above facilities to prevent reactor stalls with large schema mutations,
e.g., with a large number of tables, and/or when reading tablet mutations with
a large number of tablets in a table.

perf-simple-query --write results:
Before:
```
median 79151.53 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53289 insns/op,        0 errors)
```
After:
```
median 79716.73 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53314 insns/op,        0 errors)
```

Closes scylladb/scylladb#18290

* github.com:scylladb/scylladb:
  storage_proxy: add mutate_locally(vector<frozen_mutation_and_schema>) method
  raft: group0_state_machine: write_mutations_to_database: freeze mutations gently
  database: apply_in_memory: unfreeze_gently large mutations
  storage_service: get_system_mutations: make_canonical_mutation_gently
  tablets: read_tablet_mutations: make_canonical_mutation_gently
  schema_tables: convert_schema_to_mutations: make_canonical_mutation_gently
  schema_tables: redact_columns_for_missing_features: get input mutation using rvalue reference
  storage_service: merge_topology_snapshot: freeze_gently
  canonical_mutation: add make_canonical_mutation_gently
  frozen_mutation: move unfreeze_gently to async_utils
  mutation: add freeze_gently
  idl-compiler: generate async serialization functions for stub members
  raft: group0_state_machine: write_mutations_to_database: use to_mutation_gently
  storage_service: merge_topology_snapshot: co_await to_mutation_gently
  canonical_mutation: add to_mutation_gently
  idl-compiler: emit include directive in generated impl header file
  mutation_partition: add apply_gently
  collection_mutation: improve collection_mutation_view formatting
  mutation_partition: apply_monotonically: do not support schema upgrade
  test/perf: report also log_allocations/op
2024-05-02 23:24:38 +03:00