scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 01:20:39 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	846f0bd16e	sstables: Fix incremental selection with compound sstable set Incremental selection may not work properly for LCS and ICS due to a use-after-free bug in partitioned set which came into existence after compound set was introduced. The use-after-free happens because partitioned set wasn't taking into account that the next position can become the current position in the next iteration, which will be used by all selectors managed by compound set. So if next position is freed while it is being used as current position, subsequent selectors would find the current position freed, making them produce incorrect results. Fix this by moving ownership of next pos from incremental_selector_impl to incremental_selector, which makes it more robust as the latter knows better when the selection is done with the next pos. incremental_selector will still return ring_position_view to avoid copies. Fixes #8802. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210611130957.156712-1-raphaelsc@scylladb.com>	2021-06-13 16:45:07 +03:00
Tomasz Grabiec	7521301b72	Merge "raft: add tests for non-voters and fix related bugs" from Kostja Add test coverage inspired by etcd for non-voter servers, and fix issues discovered when testing. * scylla-dev/raft-learner-test-v4: raft: (testing) test non-voter can vote raft: (testing) test receiving a confchange in a snapshot raft: (testing) test voter-non-voter config change loop raft: (testing) test non-voter doesn't start election on election timeout raft: (testing) test what happens when a learner gets TimeoutNow raft: (testing) implement a test for a leader becoming non-voter raft: style fix raft: step down as a leader if converted to a non-voter raft: improve configuration consistency checks raft: (testing) test that non-voter stays in PIPELINE mode raft: (testing) always return fsm_debug in create_follower()	2021-06-12 21:36:47 +03:00
Nadav Har'El	9774c146cc	cql-pytest: add test for connecting with different SSL/TLS versions This is a reproducer for issue #8827, that checks that a client which tries to connect to Scylla with an unsupported version of SSL or TLS gets the expected error alert - not some sort of unexpected EOF. Issue #8827 is still open, so this test is still xfailing. However, I verified that with a fix for this issue, the test passes. The test also prints which protocol versions worked - so it also helps checking issue #8837 (about the ancient SSL protocol being allowed). Refs #8837 Refs #8827 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210610151714.1746330-1-nyh@scylladb.com>	2021-06-12 21:36:47 +03:00
Michael Livshin	2bbc293e22	tests: improve error reporting of test_env::reusable_sst() Distinguish the "no such sstable" case from any reading errors. While at it, coroutinize the function. Refs #8785. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Message-Id: <20210610113304.264922-1-michael.livshin@scylladb.com>	2021-06-11 19:06:43 +02:00
Konstantin Osipov	2be8a73c34	raft: (testing) test non-voter can vote When a non-voter is asked to vote, it must vote to preserve liveness. In Raft, servers respond to messages without consulting their current configuration, and the non-voter may not have the latest configuration when it is asked to vote.	2021-06-11 17:16:57 +03:00
Konstantin Osipov	eaf32f2c3c	raft: (testing) test receiving a confchange in a snapshot	2021-06-11 17:16:56 +03:00
Konstantin Osipov	d08ad76c24	raft: (testing) test voter-non-voter config change loop	2021-06-11 17:16:55 +03:00
Konstantin Osipov	6e4619fe87	raft: (testing) test non-voter doesn't start election on election timeout	2021-06-11 17:16:55 +03:00
Konstantin Osipov	c8ae13a392	raft: (testing) test what happens when a learner gets TimeoutNow Once learner receives TimeoutNow it becomes a candidate, discovers it can't vote, doesn't increase its term and converts back to a follower. Once entries arrive from a new leader it updates its term.	2021-06-11 17:16:55 +03:00
Konstantin Osipov	a972269630	raft: (testing) implement a test for a leader becoming non-voter	2021-06-11 17:16:55 +03:00
Konstantin Osipov	3e6fd5705b	raft: (testing) test that non-voter stays in PIPELINE mode Test that configuration changes preserve PIPELINE mode.	2021-06-11 17:07:39 +03:00
Konstantin Osipov	1dfe946c91	raft: (testing) always return fsm_debug in create_follower() create_follower() is a test helper, so it's OK to return a test-enabled FSM from it. This will be used in a subsequent patch/test case.	2021-06-11 12:24:43 +03:00
Alejo Sanchez	ff34a6515d	raft: replication test: fix elect_new_leader Recently, the logic of elect_new_leader was changed to allow the old leader to vote for the new candidate. But the implementation is wrong as it re-connects the old leader in all cases disregarding if the nodes were already disconnected. Check if both old leader and the requested new leader are connected first and only if it is the case then the old leader can participate in the election. There were occasional hangs in the loop of elect_new_leader because other nodes besides the candidate were ticked. This patch fixes the loop by removing ticks inside of it. The loop is needed to handle prevote corner cases (e.g. 2 nodes). While there, also wait log on all followers to avoid a previously dropped leader to be a dueling candidate. And update _leader only if it was changed. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20210609193945.910592-3-alejo.sanchez@scylladb.com>	2021-06-10 12:36:25 +02:00
Nadav Har'El	b26fcf5567	test/alternator: increase timeouts in test_tracing.py The query tracing tests in test/alternator's test_tracing.py had one timeout of 30 seconds to find the trace, and one unclearly-coded timeout for finding the right content for the trace. We recently saw both timeouts exceeded in tests, but only rarely and only in debug mode, in a run 100 times slower than normal. This patch increases both timeouts to 100 seconds. Whatever happens then, we win: If the test stops failing, we know the new timeout was enough. If the test continues to fail, we will be able to conclude that we have a real bug - e.g., perhaps one of the LWT operations has a bug causing it to hang indefinitely. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210608205026.1600037-1-nyh@scylladb.com>	2021-06-10 09:19:01 +03:00
Tomasz Grabiec	419ee84d86	Merge "sstable: validate first and last keys ordering" from Benny In #8772, an assert validating first token <= last token failed in leveled_manifest::overlapping. It is unclear how we got to that state, so add validation in sstable::set_first_and_last_keys() that the to-be-set first and last keys are well ordered. Otherwise, throw malformed_sstable_exception. set_first_and_last_keys is called both on the write path from the sstable writer before the sstable is sealed, and on the open/load path via update_info_for_opened_data(). This series also fixes issues with unit tests with regards to first/last keys so they won't fail the validation. Refs #8772 Test: unit(dev) DTest: next-gating(dev), materialized_views_test:TestMaterializedViews.interrupt_build_process_and_resharding_half_to_max_test(debug) * tag 'validate-first-and-last-keys-ordering-v1': sstable: validate first and last keys ordering test: lib: reusable_sst: save unexpected errors test: sstable_datafile_test: stcs_reshape_test: use token_generation_for_current_shard test: sstable_test: define primary key in schema for compressed sstable	2021-06-09 14:43:02 +02:00
Tomasz Grabiec	ce7a404f17	Merge "Cleanups/refactoring for Raft Group 0" from Kostja * scylla-dev/raft-group-0-part-1-rebase: raft: (service) pass Raft service into storage_service raft: (service) add comments for boot steps raft: add ordering for raft::server_address based on id raft: (internal) simplify construction of tagged_id raft: (internal) tagged_id minor improvements	2021-06-09 10:48:05 +02:00
Konstantin Osipov	267a8e99ad	raft: (service) pass Raft service into storage_service Raft group 0 initialization and configuration changes should be integrated with Scylla cluster assembly, happening when starting the storage service and joining the cluster. Prepare for this. Since Raft service depends on query processor, and query processor depends on storage service, to break a dependency loop split Raft initialization into two steps: starting an under-constructed instance of "sharded" Raft service, accepting an under-constructed instance of "sharded" query_processor, and then passed into storage service start function, and then the local state of Raft groups from system tables once query processor starts. Consistently abbreviate raft_services instance raft_svcs, as is the convention at Scylla. Update the tests.	2021-06-08 14:52:32 +03:00
Konstantin Osipov	d42d5aee8c	raft: (internal) simplify construction of tagged_id Make it easy to construct tagged_id from UUID.	2021-06-08 14:52:32 +03:00
Konstantin Osipov	c9a23e9b8a	raft: (internal) tagged_id minor improvements Introduce a syntax helper tagged_id::create_random_id(), used to create a new Raft server or group id. Provide a default ordering for tagged ids, for use in Raft leader discovery, which selects the smallest id for leader.	2021-06-08 14:52:32 +03:00
Nadav Har'El	355dbf2140	test/cql-pytest: option for running the tests over SSL This patch adds a "--ssl" option to test/cql-pytest's pytest, as well as to the run script test/cql-pytest/run. When "test/cql-pytest/run --ssl" is used, Scylla is started listening for encrypted connections on its standard port (9042) - using a temporary unsigned certificate. Then, the individual tests connect to this encrypted port using TLSv1.2 (Scylla doesn't support earlier versions of SSL) instead of TCP. This "--ssl" feature allows writing tests which stress various aspects of the connection (e.g., oversized requests - see PR #8800), and then running those tests in both TCP and SSL modes. Fixes #8811 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210607200329.1536234-1-nyh@scylladb.com>	2021-06-08 11:43:20 +02:00
Avi Kivity	3e3003fcc1	Merge 'cql3: limit the concurrency of indexed statements' from Piotr Sarna Indexed select statements fetch primary key information from their internal materialized views and then use it to query the base table. Unfortunately, the current mechanism for retrieving base table rows makes it easy to overwhelm the replicas with unbounded concurrency - the number of concurrent ops is increased exponentially until a short read is encountered, but it's not enough to cap the concurrency - if data is fetched row-by-row, then short reads usually don't occur and as a result it's easy to see concurrency of 1M or higher. In order to avoid overloading the replicas, the concurrency of indexed queries is now capped at 4096 and additionally throttled if enough results are already fetched. For paged queries it means that the query returns as soon as 1MB of data is ready, and for unpaged ones the concurrency will no longer be doubled as soon as the previous iteration fetched 1MB of results. The fixed 4096 value can be subject to debate, its reasoning is as follows: for 2KiB rows, so moderately large but not huge, they result in fetching 10MB of data, which is the granularity used by replicas. For 200B rows, which is rather small, the result would still be around 1MB. At the same time, 4096 separate tasks also means 4096 allocations, so increasing the number also strains the allocator. Fixes #8799 Tests: unit(release), manual: observing metrics of modified index_paging_test Closes #8814 * github.com:scylladb/scylla: cql3: limit the transitional result size for indexed queries cql3: return indexed pages after 1MB worth of data cql3: limit the concurrency of indexed statements	2021-06-07 18:00:51 +03:00
Gleb Natapov	01b6a2eb38	raft: randomized_nemesis_test: tick virtual clock less aggressively Currently each tick of the virtual clock immediately schedules the next one at the end of the task queue, but this is too aggressive. If a tick generates work that needs two tasks to be scheduled one after another, such an implementation will make the task queue grow to infinity. Considering that in debug mode even a ready future causes preemption, and task queue shuffling may cause two or more ticks to be executed without any other work done in between, it is very easy to get into such a situation. The patch changes the virtual clock to tick only when a shard is idle. Message-Id: <20210606140305.2930189-1-gleb@scylladb.com>	2021-06-07 16:54:56 +02:00
Piotr Sarna	df0d44486a	cql3: limit the transitional result size for indexed queries Unpaged indexed queries already have a concurrency limit of 4096, but now the concurrency is further limited by previous number of bytes fetched. Once this number reached 1MB, the concurrency will not be increased in consecutive queries to avoid overload.	2021-06-07 16:29:18 +02:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Tomasz Grabiec	50d64646cd	Merge "raft: replication test fixes and OOP refactor" from Alejo Feature requests, fixes, and OOP refactor of replication_test. Note: all known bugs and hangs are now fixed. A new helper class "raft_cluster" is created. Each move of a helper function to the class has its own commit. New helpers are provided To simplify code, for now only a single apply function can be set per raft_cluster. No tests were using in any other way. In the future, there could be custom apply functions per server dynamically assigned, if this becomes needed. * alejo/raft-tests-replication-02-v3-30: (66 commits) raft: replication test: wait for log for both index and term raft: replication test: reset network at construction raft: replication test: use lambda visitor for updates raft: replication test: move structs into class raft: replication test: move data structures to cluster class raft: replication test: remove shared pointers raft: replication test: move get_states() to raft_cluster raft: replication test: test_server inside raft_cluster raft: replication test: rpc declarative tests raft: replication test: add wait_log raft: replication test: add stop and reset server raft: replication test: disconnect 2 support raft: replication test: explicit node_id naming raft: replication test: move definitions up raft: replication test: no append entries support raft: replication test: fix helper parameter raft: replication test: stop servers out of config raft: replication test: wait log when removing leader from configuration raft: replication test: only manipulate servers in configuration raft: replication test: only cancel rearm ticker for removed server ...	2021-06-06 19:18:49 +03:00
Piotr Sarna	cb17aa1e53	Merge 'test/alternator: rewrite run script to share code with cql-pytest's run script' from Nadav Har'El In this small series, I rewrite test/alternator/run to Python using the utility functions developed for test/cql-pytest. In the future, we should do the same to test/redis/run and test/scylla-gdb/run. The benefit of this rewrite is less code duplication (all run scripts start with the same duplicate code to deal with temporary directories, to run Scylla IP addresses, etc.), but most importantly - in the future fixes we do to cql-pytest (e.g., parameters needed to start Scylla efficiently, how to shut down Scylla, etc.) will appear automatically in alternator test without needing to remember to change both. Another benefit is that test/alternator/run will now be Python, not a shell script. This should make it easier to integrate it into test.py (refs #6212) in the future - if we want to. Closes #8792 * github.com:scylladb/scylla: test/alternator: rewrite test/alternator/run script in Python test/cql-pytest: make test run code more general	2021-06-06 19:18:49 +03:00
Avi Kivity	872cd8f692	test: adjust copyright statement to use ScyllaDB rather than old name	2021-06-06 19:18:49 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Pavel Solodovnikov	2187a59089	treewide: move `service::cas_request` out from `storage_proxy.hh` And remove all remaining inclusions of `storage_proxy.hh` in the headers. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-06 19:18:49 +03:00
Pavel Solodovnikov	e0749d6264	treewide: some random header cleanups Eliminate not used includes and replace some more includes with forward declarations where appropriate. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-06 19:18:49 +03:00
Gleb Natapov	bb822c92ab	raft: change raft::rpc api to return void for most sending functions Most RAFT packets are sent very rarely during special phases of the protocol (like election or leader stepdown). The protocol itself does not care if a packet is sent or dropped, so returning futures from their send function does not serve any purpose. Change the raft's rpc interface to return void for all packet types but append_request. We still want to get a future from sending append_request for backpressure purposes since replication protocol is more efficient if there is no packet loss, so it is better to pause a sender than dropping packets inside the rpc. Rpc is still allowed to drop append_requests if overloaded.	2021-06-06 19:18:49 +03:00
Benny Halevy	3f9bad0f0a	test: compound_test: use tests::random For reproducibility. Test: compound_test(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210602061910.286893-2-bhalevy@scylladb.com>	2021-06-06 09:21:23 +03:00
Benny Halevy	40e032ff8b	test: compound_test: switch to seastar test framework Prepare for using tests::random instead of std::rand for reproducibility. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210602061910.286893-1-bhalevy@scylladb.com>	2021-06-06 09:21:23 +03:00
Calle Wilund	3b55ef36d1	cf_prop_defs: Fix extensions merge to handle removal Fixes #8773 When refactored for cdc, properties -> extensions merge was modified so it did not handle _removal_ (i.e. an extension function returning null -> no entry in new map). This causes certain enterprise extensions to not be able to disable themselves. Fixed by filtering existing extensions by property keywords. Unit test added. Closes #8774	2021-06-06 09:21:23 +03:00
Nadav Har'El	f22ed3ff5c	test/alternator: reduce very high timeout in one tracing test In test_tracing.py::test_slow_query_log, there was what looked like an innocent 30-second timeout, but this was in fact an 8 minute timeout - because it started with sleeping 1 second, then 2 seconds, then 3, ... until 30 seconds. Such a high timeout is frustrating when trying to debug failures in the test - which is only expected to take 2 seconds (and all of it because of an artificial timeout). So fix the loop to stop iterating after 60 seconds (a compromise between 30 seconds and 8 minutes...), sleeping a constant amount between iterations. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210601150631.1037158-1-nyh@scylladb.com>	2021-06-06 09:21:23 +03:00
Avi Kivity	100d6f4094	build: enable -Wunused-function Also drop a single violation in transport/server.cc. This helps prevent dead code from piling up. Three functions in row_cache_test that are not used in debug mode are moved near their user, and under the same ifdef, to avoid triggering the error. Closes #8767	2021-06-06 09:21:23 +03:00
Alejo Sanchez	3e91a8ca0d	raft: replication test: wait for log for both index and term Waiting on index alone does not guarantee correct leader log propagation. This patch also checks the term of the leader's last log entry. This was exposed by occasional problems with packet drops. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-04 08:38:19 -04:00
Alejo Sanchez	545893145e	raft: replication test: reset network at construction Reset network in constructor, not in unrelated function. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-04 08:18:32 -04:00
Alejo Sanchez	294dcfb204	raft: replication test: use lambda visitor for updates Process updates with a lambda visitor. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-04 08:18:31 -04:00
Nadav Har'El	0bb2e010f5	test/alternator: rewrite test/alternator/run script in Python We already wrote the test/cql-pytest/run script in Python in a way it can be reusable for the other test//run scripts. So this patch replaces the test/alternator/run shell script with Python code which does the same thing (safely runs Scylla with Alternator and pytest on it in a temporary directory and IP address), but sharing most of the code that cql-pytest uses. The benefit of reusing the test/cql-pytest/run.py library goes beyond shorter code - the main benefit will be that we can't forget to fix one of the test//run scripts (e.g., add more command line options or fix a bug) when fixing another one. To make the test/cql-pytest/run.py library reusable for running Alternator, I needed to generalize a few things in this patch (e.g., the way we check and wait for Scylla to boot with the different APIs we intend to check). There is also one bug-fix on how interrupts are handled (they are now better guaranteed to kill pytest) - and now fixing this bug benefits all runners using run.py (cql-pytest/run, cql-pytest/run-cassandra and alternator/run). In the future, we can port the runners which are still duplicate shell scripts - test/redis/run and test/scylla-gdb/run - to Python in a similar manner to what we did here for test/alternator/run. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-03 11:23:00 +03:00
Nadav Har'El	ef45fccdae	test/cql-pytest: make test run code more general Change the cql-pytest-specific run_cql_pytest() function to a more general function to run pytest in any directory. Will be useful for reusing the same code for other test runners (e.g., Alternator), and is also clearer. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-03 11:22:36 +03:00
Benny Halevy	7a4591119b	test: lib: reusable_sst: save unexpected errors reusable_sst tries opening an sstable using all sstable format versions in descending order. It is expected to see "file not found" if the actual sstable version is not the latest one. That said, we may hit other errors if the sstable is malformed in any way, so do not override this kind of error if "file not found" errors are hit after it, and return the unexpected error instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-02 12:25:29 +03:00
Benny Halevy	9452b99b40	test: sstable_datafile_test: stcs_reshape_test: use token_generation_for_current_shard Currently the test is using "first_key", "last_key" literals for the first and last keys and expects them to sort properly with the murmur3 partitioner. Also it does that for all generated sstables which is less interesting for reshape. Use token_generation_for_current_shard to generate random, properly ordered keys. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-02 12:25:29 +03:00
Benny Halevy	d5405dade7	test: sstable_test: define primary key in schema for compressed sstable Otherwise, the primary_key will be considered as composite, as its length does not equal 1. That hampers token calculation when decorating the first and last keys in the summary file. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-02 12:25:29 +03:00
Alejo Sanchez	a3fc974de9	raft: replication test: move structs into class Move auxiliary classes connection and hash_connection out of raft_cluster and into connected class. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 23:47:03 -04:00
Alejo Sanchez	5b688d42d7	raft: replication test: move data structures to cluster class Move state_machine, persistence, connection, hash_connection, connected, failure_detector, and rpc inside raft_cluster. This commit moves declaration of class raft_cluster up. (Minimize changed lines) Moves apply_fn definition from state_machine to raft_cluster. Fixes namespace in declarations Keeps static rpc::net outside for now to keep this commit simple. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 23:47:03 -04:00
Alejo Sanchez	1250d910ee	raft: replication test: remove shared pointers Following gleb, tomek, and kamil's suggestion, remove unnecessary use of lw_shared_ptr. This also solves the problem of constructing a lw_shared_ptr from a forward declaration (connected) in a subsequent patch. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 23:47:03 -04:00
Alejo Sanchez	aa1200ee50	raft: replication test: move get_states() to raft_cluster Move get_states() helper inside raft cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 23:47:03 -04:00
Alejo Sanchez	740545cdc5	raft: replication test: test_server inside raft_cluster Since there are no more external users of test_server, move it to raft_cluster and remove member access operator. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 23:47:03 -04:00
Alejo Sanchez	1ee4408869	raft: replication test: rpc declarative tests Convert rpc replication tests to declarative form. This will enable moving remaining parts inside raft_cluster. For test stability, add support for checking rpc config of a node eventually changes to the expected configuration. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 23:47:03 -04:00
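
Note on commit f22ed3ff5c above: the "8 minute" figure follows from summing the escalating sleeps (1 + 2 + ... + 30 seconds). The sketch below checks that arithmetic and shows one way a fixed-interval, deadline-bounded polling loop could look. It is an illustration only, not the code in test_tracing.py; the names escalating_wait_total and poll_until, and the injectable clock and sleep parameters, are hypothetical.

```python
import time


def escalating_wait_total(max_step: int) -> int:
    """Total seconds slept when sleeping 1, 2, ..., max_step seconds in turn."""
    return sum(range(1, max_step + 1))


def poll_until(check, timeout, interval, clock=time.monotonic, sleep=time.sleep):
    """Call check() every `interval` seconds until it returns True or
    `timeout` seconds have elapsed. Returns True on success, False on timeout.

    clock and sleep are injectable (hypothetical parameters) so the loop
    can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Sleeping 1..30 seconds adds up to 465 seconds, i.e. 7.75 minutes.
print(escalating_wait_total(30) / 60)
```

With a constant interval the worst-case wait is bounded by the timeout itself rather than growing quadratically with the number of iterations.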


1810 Commits
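
Note on commits 3e3003fcc1 and df0d44486a above: the messages describe doubling the concurrency of indexed queries, capping it at 4096, and no longer increasing it once the previous iteration fetched 1MB of results. A minimal sketch of that policy follows, under stated assumptions: the function name next_concurrency is hypothetical, 1MB is taken as 1 MiB, and this is not Scylla's C++ implementation.

```python
MAX_CONCURRENCY = 4096          # cap quoted in the commit message
RESULT_SIZE_THRESHOLD = 1 << 20  # assumed interpretation of "1MB" as 1 MiB


def next_concurrency(current: int, bytes_fetched_last: int) -> int:
    """Return the concurrency for the next iteration (hypothetical helper).

    The value doubles each round, never exceeds MAX_CONCURRENCY, and stops
    growing once the previous round already fetched enough data.
    """
    if bytes_fetched_last >= RESULT_SIZE_THRESHOLD:
        return current
    return min(current * 2, MAX_CONCURRENCY)


# Data fetched by 4096 concurrent single-row reads, per the sizing
# argument in the commit message (2KiB rows and 200B rows).
print(4096 * 2048)  # bytes for 2KiB rows
print(4096 * 200)   # bytes for 200B rows
```

The two products show where the rough "10MB" and "1MB" figures in the commit message come from: 4096 rows of 2KiB is 8 MiB, and 4096 rows of 200B is about 0.8 MB.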