scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 19:21:01 +00:00

Author	SHA1	Message	Date
Piotr Sarna	e257ec11c0	treewide: remove service level controller from query state ... since it's accessible through its member, client state.	2021-05-10 11:48:14 +02:00
Piotr Sarna	d1f2e8b469	treewide: propagate service level to client state ... since it's going to be used to set up per-service-level timeouts.	2021-05-10 11:48:14 +02:00
Piotr Sarna	e8d271fea9	db: add extracting service level info via CQL	2021-05-10 11:45:09 +02:00
Piotr Sarna	7e6beabf27	migration_manager: allow table updates with timestamp In order to avoid needless schema disagreements, a way of announcing a schema change with fixed timestamp is added. That way, when nodes update schemas of their internal tables (e.g. during updates), it's possible for all nodes to use an identical timestamp for this operation, which in turn makes their digests identical.	2021-05-10 10:10:38 +02:00
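The digest argument above can be illustrated with a small Python sketch. For illustration only, it assumes that a schema digest is a hash over the schema mutation contents including their write timestamp. `schema_digest` and the dict layout are hypothetical and are not Scylla's schema-digest code.

```python
import hashlib
import json

def schema_digest(columns: dict, timestamp: int) -> str:
    """Hypothetical digest: hash a schema mutation together with its write timestamp."""
    payload = json.dumps({"columns": columns, "ts": timestamp}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

update = {"id": "uuid", "name": "text"}

# Each node announces the same change using its own clock, so the digests diverge.
node_a = schema_digest(update, timestamp=1620634238000001)
node_b = schema_digest(update, timestamp=1620634238000042)

# Every node announces the change with one agreed, fixed timestamp, so the digests match.
AGREED_TS = 1620634238000000  # hypothetical fixed value
node_c = schema_digest(update, timestamp=AGREED_TS)
node_d = schema_digest(update, timestamp=AGREED_TS)

print(node_a == node_b)  # False: a needless schema disagreement
print(node_c == node_d)  # True
```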
Raphael S. Carvalho	8480839932	LCS/reshape: Don't reshape single sstable in level 0 with strict mode With strict mode, it could happen that a sstable alone in level 0 is selected for offstrategy compaction, which means that we could run into an infinite reshape process. This is fixed by respecting the offstrategy threshold. Unit test is added. Fixes #8573. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210506181324.49636-1-raphaelsc@scylladb.com>	2021-05-09 11:09:54 +03:00
Tomasz Grabiec	abe3d7d7d3	Merge 'storage_proxy: use small_vector for vectors of inet_address' from Avi Kivity storage_proxy uses std::vector<inet_address> for small lists of nodes - for replication (often 2-3 replicas per operation) and for pending operations (usually 0-1). These vectors require an allocation, sometimes more than one if reserve() is not used correctly. This series switches storage_proxy to use utils::small_vector instead, removing the allocations in the common case. Test results (perf_simple_query --smp 1 --task-quota-ms 10): ``` before: median 184810.98 tps ( 91.1 allocs/op, 20.1 tasks/op, 54564 insns/op) after: median 192125.99 tps ( 87.1 allocs/op, 20.1 tasks/op, 53673 insns/op) ``` 4 allocations and ~900 instructions are removed (the tps figure is also improved, but it is less reliable due to cpu frequency changes). The type change is unfortunately not contained in storage_proxy - the abstraction leaks to providers of replica sets and topology change vectors. This is sad but IMO the benefits make it worthwhile. I expect more such changes can be applied in storage_proxy, specifically std::unordered_set<gms::inet_address> and vectors of response handles. Closes #8592 * github.com:scylladb/scylla: storage_proxy, treewide: use utils::small_vector inet_address_vector:s storage_proxy, treewide: introduce names for vectors of inet_address utils: small_vector: add print operator for std::ostream hints: messages.hh: add missing #include	2021-05-06 18:00:54 +02:00
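utils::small_vector is a C++ type. The Python sketch below only models the idea behind it: a fixed inline capacity, with a heap buffer created only on overflow. The class name, the inline capacity of 3, and the allocation counter are illustrative assumptions and do not describe Scylla's implementation.

```python
class SmallVector:
    """Toy small-buffer vector. The first N elements live in a preallocated inline
    buffer. Only growth past N triggers a (counted) spill to 'heap' storage."""

    def __init__(self, inline_capacity: int):
        self._inline = [None] * inline_capacity  # stands in for in-object storage
        self._size = 0
        self._heap = None  # spill storage, created only when the inline buffer is full
        self.heap_allocations = 0  # counts only the initial spill, not later list growth

    def append(self, item):
        if self._heap is None and self._size < len(self._inline):
            self._inline[self._size] = item
        else:
            if self._heap is None:
                # First overflow: move the inline elements into a new heap buffer.
                self._heap = list(self._inline[: self._size])
                self.heap_allocations += 1
            self._heap.append(item)
        self._size += 1

    def __len__(self):
        return self._size

    def __iter__(self):
        if self._heap is not None:
            return iter(self._heap)
        return iter(self._inline[: self._size])

replicas = SmallVector(inline_capacity=3)
for addr in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
    replicas.append(addr)
print(replicas.heap_allocations)  # 0: the common case fits inline
replicas.append("10.0.0.4")
print(replicas.heap_allocations)  # 1: only oversized sets pay for an allocation
```

The trade-off is a larger object (inline slots are reserved even when unused) in exchange for no allocation when the common case of 2-3 replicas fits inline.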
Tomasz Grabiec	6aec8cc447	Merge "raft: fixes and improvements for snapshot transfer" from Gleb * scylla-dev/raft-snapshot-fixes-v4: raft: document that add entry may throw commit_status_unknown raft: test: add test of a leadership change during ongoing snapshot transfer raft: test: retry submitting an entry if it was dropped raft: test: wait for the log to be fully replicated on new leader only raft: drop waiters with outdated terms raft: make snapshot transfer abortable raft: accept snapshots transfer from multiple nodes simultaneously raft: do not send probes while transferring snapshot raft: handle messages sending errors raft: test: return error from rpc module if nodes are disconnected raft: fix a typo in a variable name	2021-05-06 17:44:22 +02:00
Gleb Natapov	3a1bff26dd	raft: test: add test of a leadership change during ongoing snapshot transfer	2021-05-06 11:34:31 +03:00
Gleb Natapov	612e0f08c4	raft: test: retry submitting an entry if it was dropped	2021-05-06 11:34:31 +03:00
Gleb Natapov	0b2c9c549a	raft: test: wait for the log to be fully replicated on new leader only When forcing a new leader, it should be enough to wait for the log to be fully replicated to that particular leader.	2021-05-06 11:34:31 +03:00
Gleb Natapov	6abe2772dc	raft: make snapshot transfer abortable A snapshot transfer may take a lot of time and meanwhile a leader doing it may lose the leadership. If that happens the ongoing snapshot transfer becomes obsolete since the snapshot will be rejected by the receiving node as coming from an old leader. Make snapshot transfer abortable and abort them when leader changes.	2021-05-06 11:34:31 +03:00
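Scylla's raft code is C++ on Seastar. This asyncio sketch only illustrates the general pattern of cancelling a long-running transfer when the node loses leadership. `send_snapshot`, the chunk count, and the timings are hypothetical.

```python
import asyncio

async def send_snapshot(chunks: int, sent: list) -> str:
    """Hypothetical long-running snapshot transfer, one chunk per await."""
    for i in range(chunks):
        await asyncio.sleep(0.01)  # stands in for network I/O
        sent.append(i)
    return "done"

async def leader_loop() -> tuple:
    sent = []
    transfer = asyncio.create_task(send_snapshot(chunks=100, sent=sent))
    await asyncio.sleep(0.05)  # ...meanwhile, this node loses leadership
    # The receiver would reject a snapshot from an old leader, so stop sending now.
    transfer.cancel()
    try:
        await transfer
        outcome = "completed"
    except asyncio.CancelledError:
        outcome = "aborted"
    return outcome, len(sent)

outcome, chunks_sent = asyncio.run(leader_loop())
print(outcome)  # aborted
```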
Gleb Natapov	d0ebd79deb	raft: test: return error from rpc module if nodes are disconnected Returning an error when nodes are disconnected more closely resembles what will happen in real networking.	2021-05-06 11:34:31 +03:00
Botond Dénes	c872a963b6	test: move reader_concurrency_semaphore related tests into separate file The mutation_reader_test is already one of our largest test files. Move the reader concurrency semaphore related tests to a new file, making them easier to find and making the mutation reader test a little smaller too.	2021-05-06 08:59:47 +03:00
Botond Dénes	5f217b6dee	test: mutation_reader_test: convert restricted reader tests to semaphore tests These two tests (restricted_reader_timeout and restricted_reader_max_queue_length) are testing the semaphore in reality, but through the restricted reader, which is distracting as it needlessly brings in an additional layer into the picture. Rewrite them to test the semaphore directly, getting much lighter in the process.	2021-05-06 08:57:12 +03:00
Avi Kivity	cea5493cb7	storage_proxy, treewide: introduce names for vectors of inet_address storage_proxy works with vectors of inet_addresses for replica sets and for topology changes (pending endpoints, dead nodes). This patch introduces new names for these (without changing the underlying type - it's still std::vector<gms::inet_address>). This is so that the following patch, that changes those types to utils::small_vector, will be less noisy and highlight the real changes that take place.	2021-05-05 18:36:48 +03:00
Gleb Natapov	745f63991f	raft: test: fix c&p error in a test Message-Id: <YJKBOwBX8hqHLxsB@scylladb.com>	2021-05-05 17:18:49 +02:00
Botond Dénes	992819b188	database: add get_unlimited_query_max_result_size() Similar to the already existing get_reader_concurrency_semaphore(), this method determines the appropriate max result size for the query class, which is deduced from the current scheduling group. This method shares its scheduling group -> query class association mechanism with the above mentioned semaphore getter.	2021-05-05 13:30:42 +03:00
Tomasz Grabiec	121eb32679	Merge 'test: perf: report instructions retired per operations' from Avi Kivity Instructions retired per op is a much more stable metric than time per op (inverse throughput), since it isn't much affected by changes in CPU frequency or other load on the test system (it's still somewhat affected, since a slower system will run more reactor polls per op). It's also less indicative of real performance, since it's possible for fewer instructions to execute in more time than more instructions, but that isn't an issue for comparative tests. This allows incremental changes to the code base to be compared with more confidence. Current results are around 55k instructions per read, and 52k for writes. Closes #8563 * github.com:scylladb/scylla: test: perf: tidy up executor_stats snapshot computation test: perf: report instructions retired per operations test: perf: add RAII wrapper around Linux perf_event_open() test: perf: make executor_stats_snapshot() a member function of executor	2021-05-05 00:54:08 +02:00
Tomasz Grabiec	b8665c459d	Merge "raft: replication test updates" from Alejo Cleanups, fixes, and configuration change support for replication tests. * alejo/raft-tests-replication-01-fixes-v13: raft: replication test: remove obsolete helper raft: replication test: add_entry with retries raft: replication test: support config change raft: replication test: add dummy command support raft: replication test: test both with and without prevote raft: replication test: make initial leader just default raft: replication test: create command helper raft: replication test: free elections as helper raft: replication test: fix election connectivity raft: replication test: fix custom election raft: replication test: add helpers for threshold and election raft: replication test: connectivity improvement raft: replication test: helper for server_address raft: replication test: use wait_log() raft: replication test: cycle leader more raft: replication test: fix a test description raft: replication test: remove multiple state machines raft: replication test: remove checksum raft: replication test: remove unused class param	2021-05-04 18:52:47 +02:00
Alejo Sanchez	27ad2a0f28	raft: replication test: remove obsolete helper As we are now serially adding commands with consecutive integers there is no need to build vectors of commands. Remove helper. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-04 11:01:07 -04:00
Alejo Sanchez	0a54fd848b	raft: replication test: add_entry with retries The current leader might have stepped down. Try again and learn if there's a new leader. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-04 11:00:46 -04:00
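A minimal Python sketch of the retry loop described above. The names (`add_entry_with_retries`, `NotALeader`, `FakeServer`) are hypothetical stand-ins for the test's C++ helpers. The sketch also assumes that a rejected submission carries a hint about the current leader, which is how it "learns" the new leader.

```python
class NotALeader(Exception):
    """Hypothetical rejection carrying a hint about who the leader is now."""
    def __init__(self, leader_hint):
        super().__init__(f"not a leader; try {leader_hint}")
        self.leader_hint = leader_hint

class FakeServer:
    def __init__(self, name, cluster):
        self.name, self.cluster = name, cluster
        self.log = []

    def add_entry(self, value):
        if self.cluster["leader"] != self.name:
            raise NotALeader(self.cluster["leader"])
        self.log.append(value)

def add_entry_with_retries(servers, leader, value, max_retries=5):
    """Submit to the presumed leader. On rejection, learn the new leader and retry."""
    for _ in range(max_retries):
        try:
            servers[leader].add_entry(value)
            return leader
        except NotALeader as e:
            leader = e.leader_hint
    raise RuntimeError("no stable leader after retries")

cluster = {"leader": "B"}  # "A" has stepped down since the test last looked
servers = {n: FakeServer(n, cluster) for n in "ABC"}
new_leader = add_entry_with_retries(servers, leader="A", value=1)
print(new_leader)  # B
```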
Nadav Har'El	df65d09e08	Merge ' cdc: log: fill cdc$deleted_ columns in pre-images ' from Piotr Grabowski Before this change, `cdc$deleted_` columns were all `NULL` in pre-images. Lack of such information made it hard to correctly interpret the pre-image rows, for example: ``` INSERT INTO tbl(pk, ck, v, v2) VALUES (1, 1, null, 1); INSERT INTO tbl(pk, ck, v2) VALUES (1, 1, 1); ``` For this example, pre-image generated for the second operation would look like this (in both `true` and `full` pre-image mode): ``` pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1 ``` `v=NULL` has two meanings: 1. If pre-image was in `true` mode, `v=NULL` describes that v was not affected (affected columns: pk, ck, v2). 2. If pre-image was in `full` mode, `v=NULL` describes that v was equal to `NULL` in the pre-image. Therefore, to properly decode pre-images you would need to know in which mode pre-image was configured on the CDC-enabled table at the moment this CDC log row was inserted. There is no way to determine such information (you can only check the current mode of pre-image). A solution to this problem is to fill in the `cdc$deleted_` columns for pre-images. After this PR, for the `INSERT` described above, CDC now generates the following log row: If in pre-image 'true' mode: ``` pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1 ``` If in pre-image 'full' mode: ``` pk=1, ck=1, v=NULL, cdc$deleted_v=true, v2=1 ``` A client library can now properly decode a pre-image row. If it sees a `NULL` value, it can now check the `cdc$deleted_` column to determine if this `NULL` value was a part of pre-image or it was omitted due to not being an affected column in the delta operation. No such change is necessary for the post-image rows, as those images are always generated in the `full` mode. Additional example of trouble decoding pre-images before this change: tbl2 - `true` pre-image mode, tbl3 - `full` pre-image mode: ``` INSERT INTO tbl2(pk, ck, v, v2) VALUES (1, 1, 5, 1); INSERT INTO tbl3(pk, ck, v, v2) VALUES (1, 1, null, 1); ``` ``` INSERT INTO tbl2(pk, ck, v2) VALUES (1, 1, 1); ``` generated pre-image: ``` pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1 ``` ``` INSERT INTO tbl3(pk, ck, v2) VALUES (1, 1, 1); ``` generated pre-image: ``` pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1 ``` Both pre-images look the same, but: 1. `v=NULL` in tbl2 describes v being omitted from the pre-image. 2. `v=NULL` in tbl3 describes v being `NULL` in the pre-image. Closes #8568 * github.com:scylladb/scylla: cdc: log: assert post_image is always in full mode cdc: tests: check cdc$deleted_ columns in images cdc: log: fill cdc$deleted_ columns in pre-images	2021-05-04 14:45:27 +03:00
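The following Python sketch shows a client-side decoding rule that follows the post-change behaviour described above. The function name and the `NOT_AFFECTED` sentinel are hypothetical. A real client would read `cdc$deleted_<column>` from the same log row as the value. The sketch covers only the NULL/true combinations the message describes.

```python
from typing import Any, Optional

NOT_AFFECTED = object()  # sentinel: column omitted from a 'true'-mode pre-image

def decode_preimage_value(value: Optional[Any], deleted: Optional[bool]) -> Any:
    """Interpret one column of a CDC pre-image row.

    value   -- the column value as read from the log row (None for NULL)
    deleted -- the matching cdc$deleted_<col> value (None for NULL)
    """
    if value is not None:
        return value          # an actual pre-image value
    if deleted:
        return None           # the column really was NULL before the operation
    return NOT_AFFECTED       # omitted: not an affected column in the delta

# Rows from the example above:
print(decode_preimage_value(None, None) is NOT_AFFECTED)  # 'true' mode: True
print(decode_preimage_value(None, True) is None)          # 'full' mode: True
```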
Piotr Grabowski	778fbb144f	cdc: tests: check cdc$deleted_ columns in images Add a test that checks whether the cdc$deleted_ columns are properly filled in the pre/post-image rows. This test checks tables with only atomic columns, tables with frozen collections and non-frozen collections. The test is performed with both 'true' pre-image mode and 'full' pre-image mode.	2021-05-04 12:33:15 +02:00
Calle Wilund	7e345e37e8	cql/cdc_batch_delete_postimage_test - rename test files + fix result The tests, when added, were not named kosher (with _test), which the runner, apparently quaintly, requires in order to pick them up (instead of the more sensible .cql). Thus, the test was never run beyond initial creation, and also bit-rotted slightly during behaviour changes. Renamed and re-resulted. Closes #8581	2021-05-04 12:47:33 +03:00
Alejo Sanchez	56e977ae69	raft: replication test: support config change Add support for configuration change on leader. Keep track of servers in config in test. Add a dummy entry to confirm configuration changed. If the add fails, because the old leader was not in the new config and stepped down, the config is considered changed, too. Add a test with some configuration changes. Add a test cycling every scenario for 1 of 4 nodes removed. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	8d8af92cbb	raft: replication test: add dummy command support Use a special value as dummy entry to be ignored when seen in state machine input. Ignore dummy entries for count. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
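A Python sketch of the idea. It assumes a reserved value of -1 as the dummy marker. The actual special value used by the test is not stated here, and `CountingStateMachine` is hypothetical.

```python
DUMMY = -1  # hypothetical reserved value; ordinary test commands are non-negative

class CountingStateMachine:
    def __init__(self):
        self.applied = []

    def apply(self, commands):
        for cmd in commands:
            if cmd == DUMMY:
                continue  # dummy entries only prove the log advanced; ignore them
            self.applied.append(cmd)

    @property
    def count(self):
        return len(self.applied)

sm = CountingStateMachine()
sm.apply([0, 1, DUMMY, 2])
print(sm.count)  # 3
```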
Alejo Sanchez	4aa52be7e5	raft: replication test: test both with and without prevote Before this change the default was prevote enabled. With this change each test is run with and without prevote. This doubles the number of test cases. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	e759e492c7	raft: replication test: make initial leader just default The test suite requires an initial leader and at the moment it's always just 0. Make it default and simplify code. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	eb5bbcdec7	raft: replication test: create command helper Factor out repeated code and make it available for other uses. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	eb94dd26dc	raft: replication test: free elections as helper Add a helper to run free elections and use it in partitioning. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	cb297a57df	raft: replication test: fix election connectivity If a leader was already disconnected, the election of a new leader could re-connect it. Save the original connectivity and restore it when done electing the new leader. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	0a5c605713	raft: replication test: fix custom election Use the new specific connectivity to manage old leader disconnection more specifically. This fixes having elections where the vote of the old leader is required for quorum. For example, take {A,B} where we want to switch the leader to B. For B to become a candidate it has to see A as down. Then A has to see B's vote request and vote for B. So, in the general case, the old leader first needs to be disconnected from all nodes, then the desired node is made a candidate, and then the old leader is connected only to the desired candidate (otherwise, other nodes would see the new candidate as disrupting a live leader). Also, there might be stray messages from the former leader. These could revert the candidate to follower. To handle this, this patch retries the process until the desired node becomes leader. The helper function elect_me_leader() is split and renamed to wait_until_candidate() and wait_election_done(). The former ticks until the node is a candidate and the latter waits until a candidate either becomes a leader or reverts to follower. The existing etcd test workaround of incrementing from n=2 to n=3 nodes is corrected back to original n=2. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	9909983e38	raft: replication test: add helpers for threshold and election Add 2 helper functions for making nodes reach timeout threshold and to elect a specific node. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	38526d7a2f	raft: replication test: connectivity improvement Replace simple full disconnect of a node with specific from -> to disconnection tracking. This will help with electing new leaders. Say there are {A,B,C} with A leader and we want to elect B. Before this patch, we would disconnect A, run an election with just {B,C}, and then re-connect A. If we have {A,B} and want to elect B, this won't work, as B needs 2/2+1 = 2 votes and A is disconnected, even if we made A step down. This patch corrects this shortcoming. (@gleb-cloudius) With this patch, we can specify other followers (not the previous or next leader) to not see the old leader, but the new and old leaders see each other just fine. In the example {A,B,C} above we can cut A<->B specifically. Also, this is closer to etcd testing and should help porting cases. NOTE: in the current test implementation failure_detector reports node.is_alive(other_node) if there is a connection both ways. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
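A Python sketch of directed from -> to disconnection tracking. The class and method names are hypothetical. Only the rule that a node is alive if the connection works both ways comes from the NOTE above.

```python
class Connectivity:
    """Directed link tracking: disconnect(a, b) drops messages from a to b only."""

    def __init__(self):
        self._cut = set()  # set of (from, to) pairs that are disconnected

    def disconnect(self, frm, to):
        self._cut.add((frm, to))

    def cut_both(self, a, b):
        self.disconnect(a, b)
        self.disconnect(b, a)

    def connect_all(self):
        self._cut.clear()

    def can_send(self, frm, to):
        return (frm, to) not in self._cut

    def is_alive(self, observer, other):
        # Mirrors the NOTE above: alive only if messages flow both ways.
        return self.can_send(observer, other) and self.can_send(other, observer)

# Hide the old leader A from follower C while A and B still see each other.
net = Connectivity()
net.cut_both("A", "C")
print(net.is_alive("B", "A"))  # True
print(net.is_alive("C", "A"))  # False

# A one-way cut is enough to make the peer look dead under the both-ways rule.
one_way = Connectivity()
one_way.disconnect("A", "C")
print(one_way.can_send("C", "A"), one_way.is_alive("C", "A"))  # True False
```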
Alejo Sanchez	f53dea432c	raft: replication test: helper for server_address A helper function to convert from local 0-based id to raft 1-based server_address. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	294e16cf8b	raft: replication test: use wait_log() Use wait_log() helper in leftover election code. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	355c8a052f	raft: replication test: cycle leader more For ported etcd test cycle leader, cycle some more. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	5b2c9a6c94	raft: replication test: fix a test description Fix replace_log_leaders_log_empty description comment. Reported by @kbraun Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	bbb56e2265	raft: replication test: remove multiple state machines Checksum was removed so undo support for multiple versions added in: test: add support for different state machines `43dc5e7dc2` NOTE: as there is a test with custom total_values, expected value cannot be static const anymore. (line 630) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	e77af8573b	raft: replication test: remove checksum Previously, entries were added in parallel and we needed to check if order was broken. Using a simple checksum was better than a hash as you could easily find the position it broke (we add consecutive numbers). Now order of entries is forced so it's not useful. This patch removes it. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	9335941b49	raft: replication test: remove unused class param persisted_snapshots is not used in state_machine class. Remove it. Reported by @kbraun Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Pavel Solodovnikov	4c351ff260	raft: switch `group_id` type from `uint64_t` to `utils::UUID` Introduce a tagged id struct for `group_id`. Raft code would want to generate quite a lot of unique raft groups in the future (e.g. tablets). UUID is designed exactly for that (e.g. larger capacity than `uint64_t`, obviously, and also has built-in procedures to generate random ids). Also, this is a preparation to make "raft group 0" use a random ID instead of a literal fixed `0` as a group id. The purpose is that every scylla cluster must have a unique ID for "raft group 0" since we don't want the nodes from some other cluster to disrupt the current cluster. This can happen if, for some reason, a foreign node happens to contact a node in our cluster. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210429170630.533596-3-pa.solodovnikov@scylladb.com>	2021-05-02 16:39:54 +03:00
Botond Dénes	26ae9555d1	test: multishard_mutation_query_test: fuzzy-test: don't consume resource up-front The fuzzy test consumes a large chunk of resource from the semaphore up-front to simulate a contested semaphore. This isn't an accurate simulation, because no permit will have more than 1 unit in reality. Furthermore, this can even cause a deadlock since `8aaa3a7`, as now we rely on all count units being available to make forward progress when memory is scarce. This patch just cuts out this part of the test; we now have a dedicated unit test for checking a heavily contested semaphore that does it properly, so there is no need to try to fix this clumsy attempt that is just making trouble at this point. Refs: #8493 Tests: release(multishard_mutation_query_test:fuzzy_test) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210429084458.40406-1-bdenes@scylladb.com>	2021-04-29 11:45:53 +03:00
Avi Kivity	2b252ef9b7	test: perf: tidy up executor_stats snapshot computation Now that executor_stats_snapshot() is a member function, we can move the capture of _count into invocations into it, capturing all the stats in one place.	2021-04-28 19:02:35 +03:00
Avi Kivity	863b49af03	test: perf: report instructions retired per operations Instructions retired per op is a much more stable metric than time per op (inverse throughput), since it isn't much affected by changes in CPU frequency or other load on the test system (it's still somewhat affected, since a slower system will run more reactor polls per op). It's also less indicative of real performance, since it's possible for fewer instructions to execute in more time than more instructions, but that isn't an issue for comparative tests. This allows incremental changes to the code base to be compared with more confidence.	2021-04-28 18:46:55 +03:00
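Per-operation figures such as allocs/op, tasks/op and insns/op can be derived by differencing two cumulative counter snapshots and dividing by the number of invocations. A Python sketch of that computation follows. `StatsSnapshot` and its fields are hypothetical, and the instructions counter here is a plain number rather than a perf_event_open() reading. The sample counters are made up so that the per-op results match the "before" figures quoted in the small_vector merge above.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class StatsSnapshot:
    """Cumulative counters captured at one instant (field names are illustrative)."""
    invocations: int
    allocations: int
    tasks: int
    instructions: int

    def __sub__(self, other):
        # Field-by-field difference between two snapshots.
        return StatsSnapshot(*(getattr(self, f.name) - getattr(other, f.name)
                               for f in fields(self)))

    def per_op(self):
        # Divide every counter by the number of operations in the interval.
        n = self.invocations
        return {f.name: getattr(self, f.name) / n
                for f in fields(self) if f.name != "invocations"}

before = StatsSnapshot(invocations=0, allocations=0, tasks=0, instructions=0)
after = StatsSnapshot(invocations=1000, allocations=91_100, tasks=20_100,
                      instructions=54_564_000)
per = (after - before).per_op()
for name, value in per.items():
    print(f"{name}/op: {value:5.1f}")
```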
Avi Kivity	0bc98caf3e	test: perf: add RAII wrapper around Linux perf_event_open() Make it easy to embed in other classes. A helper function is provided for the instructions retired counter.	2021-04-28 18:41:02 +03:00
Avi Kivity	498e6b9a64	test: perf: make executor_stats_snapshot() a member function of executor I'd like to add an instructions counter which isn't accessible via a global, so make the snapshot function a member. Out of respect to #1, define functions for getting the number of allocations and tasks processed, as they need heavy header files.	2021-04-28 18:38:35 +03:00
Avi Kivity	a43e896396	test: perf: don't truncate allocation/req and tasks/req report I used {:.0} to truncate to integer, but apparently that resulted in only one significant digit in the report, so 93.1 was reported as 90. Use the {:5.1f} format to avoid truncation, and even get an extra digit (we can have fractional tasks/op due to batching). Current result is 93.1 allocs/op, 20.1 tasks/op (which suggests a batch size of around 10). Closes #8550	2021-04-28 12:50:13 +02:00
Nadav Har'El	7d2df8a9bc	test/alternator,cql-pytest: fix resource leak on failure In the alternator and cql-pytest test frameworks, we have some convenient contextmanager-based functions that allow us to create a temporary resource (e.g., a table) that will be automatically deleted, for example: with create_stream_test_table(...) as table: test_something(table) However, our implementation of these functions wasn't safe. We had code looking like: table = ... yield table table.delete() The thinking was that the cleanup part (the table.delete()) would be called after the user's code. However, if the user's code threw (e.g., a failed assertion), the cleanup wasn't called... When the user's code throws, it looks as if the "yield" throws. So the correct code should look like: table = ... try: yield table finally: table.delete() Python's contextmanager documentation indeed gives this idiom in its example. This patch fixes all contextmanager implementations in our tests to do the cleanup even if the user's "with" block throws. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210428083748.552203-1-nyh@scylladb.com>	2021-04-28 10:51:02 +02:00
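The following is a self-contained version of the fixed idiom that runs outside the test frameworks. `FakeTable` and `create_test_table` are hypothetical stand-ins for real helpers such as create_stream_test_table().

```python
from contextlib import contextmanager

class FakeTable:
    """Hypothetical stand-in for a test table; records whether delete() ran."""
    def __init__(self, name):
        self.name = name
        self.deleted = False

    def delete(self):
        self.deleted = True

created = []

@contextmanager
def create_test_table(name):
    table = FakeTable(name)
    created.append(table)
    try:
        yield table          # an exception in the with-block is raised here...
    finally:
        table.delete()       # ...so the cleanup must sit in finally to always run

try:
    with create_test_table("t1") as table:
        raise AssertionError("simulated failed test assertion")
except AssertionError:
    pass

print(created[0].deleted)  # True: the table was cleaned up despite the failure
```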
Nadav Har'El	f50db50d10	test/cql-pytest: test for "WHERE v=NULL" in restrictions Issues #4476 and #8489, and also Cassandra's CASSANDRA-10715, all request that filtering with "WHERE v=NULL" should return the rows where the column v is unset. However, we made a deliberate decision to do something else: That "WHERE v=NULL" should match no row. Exactly like it does in SQL. This is what this test verifies - that "WHERE v=NULL" never matches any row - not even rows where "v" is unset. This test is expected to fail on Cassandra (so marked cassandra_bug), because in Cassandra the "WHERE v=NULL" restriction is forbidden, instead of succeeding and returning nothing. Although we differ here from Cassandra, after a lot of deliberation we decided that Scylla's behavior is the correct one, so this test verifies it. Refs #4776. Refs #8489. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210426183145.323301-1-nyh@scylladb.com>	2021-04-27 09:26:33 +03:00
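The SQL behaviour that the test verifies follows from three-valued logic: comparing anything with NULL yields UNKNOWN, and WHERE keeps only rows for which the predicate is TRUE. The Python sketch below models that rule. It treats NULL and unset columns both as None, which is a simplification, and `sql_eq` and `where_eq` are hypothetical helpers.

```python
from typing import Optional

def sql_eq(a: Optional[object], b: Optional[object]) -> Optional[bool]:
    """SQL three-valued equality: any comparison involving NULL is UNKNOWN (None)."""
    if a is None or b is None:
        return None
    return a == b

def where_eq(rows, column, value):
    # WHERE keeps a row only if the predicate is TRUE; UNKNOWN rows are dropped.
    return [r for r in rows if sql_eq(r.get(column), value) is True]

rows = [{"p": 1, "v": 10}, {"p": 2, "v": None}, {"p": 3}]  # p=3: v unset

print(where_eq(rows, "v", 10))    # [{'p': 1, 'v': 10}]
print(where_eq(rows, "v", None))  # []: "WHERE v=NULL" matches nothing
```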

1592 Commits