scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 04:56:58 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	1e7a444a8b	table: Wire compound sstable set From now own, _sstables becomes the compound set, and _main_sstables refer only to the main sstables of the table. In the near future, maintenance set will be introduced and will also be managed by the compound set. So add_sstable() and on_compaction_completion() are changed to explicitly insert and remove sstables from the main set. By storing compound set in _sstables, functions which used _sstables for creating reader, computing statistics, etc, will not have to be changed when we introduce the maintenance set, so code change is a lot minimized by this approach. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:46:06 -03:00
Raphael S. Carvalho	e4b5f5ba33	sstable_set: Introduce compound sstable set This new sstable set implementation is useful for combining operation of multiple sstable sets, which can still be referenced individually via its shared ptr reference. It will be used when maintenance set is introduced in table, so a compound set is required to allow both sets to have their operations efficiently combined. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:42:49 -03:00
Nadav Har'El	4a7d3175e9	test/alternator: make another test faster The slowest test in test_streams.py is test_list_streams_paged. It is meant to test the ListStreams operation with paging. The existing test repeated its test four times, for four different stream types. However, there is no reason to suspect that the ListStreams operation might somehow be different for the four stream types... We already have other tests which create streams of the four types, and uses these streams - we don't need the test for ListStreams to also test creating the four types. By doing this test just once, not four times, we can save around 1.5 seconds of test time. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210318073755.1784349-1-nyh@scylladb.com>	2021-03-18 11:24:18 +01:00
Nadav Har'El	79af728335	test/alternator: make tracing test a bit faster In the test test_tracing.py::test_tracing_all, we do some operations and then need to wait until they appear in the tracing table. The current code used an exponentially-increasing delay during this wait, starting with 0.1 seconds and then doubling the delay until we find what we're looking for. However, it turns out that the delay until the data appears in the table is deliberately chosen by Scylla - and is always around 2 seconds. In this case, an exponential delay is really bad - we will usually wait for around 1 seconds too long after the needed wait of 2 seconds. So in this patch we replace the exponential delay by a constant delay - we wait 0.3 seconds between each retry. This change makes the test test_tracing.py::test_tracing_all finish in a little over 2 seconds, instead of a little over 3 seconds before this patch. We cannot reduce this 2 second time any further unless we make the 2-second tracing delay configurable. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210318000040.1782933-1-nyh@scylladb.com>	2021-03-18 11:24:18 +01:00
Nadav Har'El	4e87f95b42	test/alternator: remove slow and unhelpful test The test test_table.py::test_table_streams_on creates tables with various stream types, and then immediately deletes them without testing anything. This is a slow test (taking almost a full second on my laptop), and is redundant because in test_streams.py we have tests which create tables with streams in the same way - but then actually test that things work with these streams. So this test might as well be removed, and this is what we do in this patch. Removing this test shaves another second from the Alternator test suite's run time. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210317230530.1780849-1-nyh@scylladb.com>	2021-03-18 11:24:18 +01:00
Nadav Har'El	879656e3e0	test/alternator: make a test faster, safer and more correct The test test_condition_expression.py::test_condition_expression_with_forbidden_rmw takes half a second to run (dev build, on my laptop), one of the slowest tests in Alternator's test suite. Part of the reason was that it needlessly set the same table to forbidden_rmw, multiple times. Instead of doing that, we switch to using the test_table_s_forbid_rmw fixture, which is a table like test_table_s but created just once in forbid_rmw mode. The result is a faster test (0.05 seconds instead of 0.5 seconds), but also safer if we ever want to run tests in parallel. It also fixes a bug in the test: At the end of the test, we intended to double-check that although the forbid_rmw table forbids read-modify-write operations, it does allow pure writes. Yet the test did this after clearing the forbid_rmw mode... So after this patch the test verifies this on the forbid_rmw table, as intended. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210317222703.1779992-1-nyh@scylladb.com>	2021-03-18 11:24:18 +01:00
Nadav Har'El	1c2e473e62	test/alternator: make a test faster The test test_condition_expression.py::test_condition_expression_with_permissive_write_isolation Currently takes (on my laptop, dev build) a full two seconds, one of the slowest tests. It is not surprising it is slow - it runs five other tests three times each (for three different write isolation modes), but it doesn't have to be this slow. Before this patch, for each of the five tests we switch the write isolation mode three times, and these switches involve schema changes and are fairly slow. So in this patch we reverse the loop - and switch the write isolation mode to the outer loop. This patch halves the runtime of this test - from two seconds to one. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210317221045.1779329-1-nyh@scylladb.com>	2021-03-18 11:24:18 +01:00
Benny Halevy	7862cad669	sstable_set: partitioned_sstable_set: clone: do clone all sstables The existing implementation wrongfully shares _all sstables rather than cloning it. This caused a use-after-free in `repair_meta::do_estimate_partitions_on_local_shard` when traversing a shared sstable_set, during which `table::make_reader_excluding_sstables` erased an entry. The erase should have happened on a cloned copy of the sstable_list, not on a shared copy. The regression was introduced in `c3b8757fa1`. Added a unit test that reproduces the share-on-copy issue for partitioned_stable_set (sstables::sstable_set). Fixes #8274 Test: unit(release, debug) DTest: materialized_views_test.py:TestMaterializedViews.simple_repair_test(debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210317145552.701559-1-bhalevy@scylladb.com>	2021-03-18 11:15:59 +02:00
Nadav Har'El	42169b2eef	Merge 'Alternator: add slow query logging' from Piotr Sarna This series adds slow query logging capability to alternator. Queries which last longer than the specified threshold are logged in `system_traces.node_slow_log` and traced. In order to be better prepared for https://github.com/scylladb/scylla/issues/2572, this series also expands the tracing API to allow custom key-value params and adds a custom `alternator_op` parameter to the slow node log. This information can also be deduced from the tracing session id by consulting the system_traces.events table, but https://github.com/scylladb/scylla/issues/2572 's assumption is that this tracing might not always be available in the future. This series comes with a simple test case which checks if operation logs indeed end up in `system_traces.node_slow_log`. Tests: unit(dev, alternator pytest) manual: verified that no operations are logged if slow query logging is disabled; verified that operations that take less time than the threshold are not logged; verified with test_batch.py::test_batch_write_item_large that a large-enough operation is indeed logged and traced. Fixes #8292 Example trace: ```cql cqlsh> select parameters, duration from system_traces.node_slow_log where start_time=b7a44589-8711-11eb-8053-14c6c5faf955; parameters \| duration ---------------------------------------------------------------------------------------------+---------- {'alternator_op': 'DeleteTable', 'query': '{"TableName": "alternator_Test_1615979572905"}'} \| 75732 ``` Closes #8298 * github.com:scylladb/scylla: alternator: add test for slow query logging alternator: allow enabling slow query logging tracing: allow providing a custom session record param	2021-03-18 11:15:59 +02:00
Piotr Sarna	efe734c575	alternator: add test for slow query logging The test checks whether slow queries are properly logged in the system_traces.node_slow_log system table. The test is deterministic because it uses the threshold of 0ms to qualify a query as slow, which effectively makes all queries "slow enough".	2021-03-17 13:24:26 +01:00
Dejan Mircevski	8db24fc03b	cql3/expr: Handle `IN ?` bound to null Previously, we crashed when the IN marker is bound to null. Throw invalid_request_exception instead. Fixes #8265 Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #8287	2021-03-17 09:59:22 +02:00
Avi Kivity	972ea9900c	Merge 'commitlog: Make pre-allocation drop O_DSYNC while pre-filling' from Calle Wilund Refs #7794 Iff we need to pre-fill segment file ni O_DSYNC mode, we should drop this for the pre-fill, to avoid issuing flushes until the file is filled. Done by temporarily closing, re-opening in "normal" mode, filling, then re-opening. Closes #8250 * github.com:scylladb/scylla: commitlog: Make pre-allocation drop O_DSYNC while pre-filling commitlog: coroutinize allocate_segment_ex	2021-03-17 09:59:22 +02:00
Tomasz Grabiec	40121621f6	Merge "Kill some get_local_migration_manager() calls" from Pavel Emelyanov There are a bunch of such calls in schema altering statements and there's currently no way to obtain the migration manager for such statements, so a relatively big rework needed. The solution in this set is -- all statements' execute() methods are called with query processor as first argument (now the storage proxy is there), query processor references and provides migration manager for statements. Those statements that need proxy can already get it from the query processor. Afterwards table_helper and thrift code can also stop using the global migration manager instance, since they both have query processor in needed places. While patching them a couple of calls to global storage proxy also go away. The new query processor -> migration manager dependency fits into current start-stop sequence: the migration manager is started early, the query processor is started after it. On stop the query processor remains alive, but the migration manager stops. But since no code currently (should) call get_local_migration_manager() it will _not_ call the query_processor::get_migration_manager() either, so this dangling reference is ugly, but safe. Another option could be to make storage proxy reference migration manager, but this dependency doesn't look correct -- migration manager is higher-level service than the storage proxy is, it is migration manager who currently calls storage proxy, but not the vice versa. * xemul/br-kill-some-migration-managers-2: cql3: Get database directly from query processor thrift: Use query_processor::get_migration_manager() table_helper: Use query_processor::get_migration_manager() cql3: Use query_processor::get_migration_manager() (lambda captures cases) cql3: Use query_processor::get_migration_manager() (alter_type statement) cql3: Use query_processor::get_migration_manager() (trivial cases) query_processor: Keep migration manager onboard cql3: Pass query processor to announce_migration:s cql3: Switch to qp (almost) in schema-altering-stmt cql3: Change execute()'s 1st arg to query_processor	2021-03-17 09:59:22 +02:00
Pavel Solodovnikov	93c565a1bf	raft: allow raft server to start with initial term 0 Prior to the fix there was an assert to check in `raft::server_impl::start` that the initial term is not 0. This restriction is completely artificial and can be lifted without any problems, which will be described below. The only place that is dependent on this corner case is in `server_impl::io_fiber`. Whenever term or vote has changed, they will be both set in `fsm::get_output`. `io_fiber` checks whether it needs to persist term and vote by validating that the term field is set (by actually executing a `term != 0` condition). This particular check is based on an unobvious fact that the term will never be 0 in case `fsm::get_output` saves term and vote values, indicating that they need to be persisted. Vote and term can change independently of each other, so that checking only for term obscures what is happening and why even more. In either case term will never be 0, because: 1. If the term has changed, then it's naturally greater than 0, since it's a monotonically increasing value. 2. If the vote has changed, it means that we received a vote request message. In such case we have already updated our term to the requester's term. Switch to using an explicit optional in `fsm_output` so that a reader don't have to think about the motivation behind this `if` and just checks that `term_and_vote` optional is engaged. Given the motivation described above, the corresponding assert(_fsm->get_current_term() != term_t(0)); in `server_impl::start` is removed. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-03-17 09:59:21 +02:00
Nadav Har'El	e344f74858	Merge 'logalloc: improve background reclaim shares management' from Avi Kivity The log structured allocator's background reclaimer tries to allocate CPU power proportional to memory demand, but a bug made that not happen. Fix the bug, add some logging, and future-proof the timer. Also, harden the test against overcommitted test machines. Fixes #8234. Test: logalloc_test(dev), 20 concurrent runs on 2 cores (1 hyperthread each) Closes #8281 * github.com:scylladb/scylla: test: logalloc_test: harden background reclain test against cpu overcommit logalloc: background reclaim: use default scheduling group for adjusting shares logalloc: background reclaim: log shares adjustment under trace level logalloc: background reclaim: fix shares not updated by periodic timer	2021-03-17 09:59:21 +02:00
Pavel Emelyanov	1de235f4da	query_processor: Keep migration manager onboard The query processor sits upper than the migration manager, in the services layering, it's started after and (will be) stopped before the migration manager. The migration manager is needed in schema altering statements which are called with query processor argument. They will later get the migration manager from the query processor. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-03-15 19:00:58 +03:00
Avi Kivity	65fea203d2	test: logalloc_test: harden background reclain test against cpu overcommit Use thread CPU time instead of real time to avoid an overcommitted machine from not being able to supply enough CPU for the test.	2021-03-15 13:54:49 +02:00
Alejo Sanchez	88063b6e3e	raft: tests: move common helpers to header Move common test helper functions and data structures to a common helpers.hh header. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-15 06:16:58 -04:00
Alejo Sanchez	6139ad6337	raft: tests: move boost tests to tests/raft Move raft boost tests to test/raft directory. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-15 06:16:58 -04:00
Calle Wilund	48ca01c3ab	commitlog: Make pre-allocation drop O_DSYNC while pre-filling Refs #7794 Iff we need to pre-fill segment file ni O_DSYNC mode, we should drop this for the pre-fill, to avoid issuing flushes until the file is filled. Done by temporarily closing, re-opening in "normal" mode, filling, then re-opening. v2: * More comment v3: * Add missing flush v4: * comment v5: * Split coroutine and fix into separate patches	2021-03-15 09:35:45 +00:00
Tomasz Grabiec	f2ecb4617e	Merge "raft: implement prevoting stage in leader election" from Gleb This is how PhD explain the need for prevoting stage: One downside of Raft's leader election algorithm is that a server that has been partitioned from the cluster is likely to cause a disruption when it regains connectivity. When a server is partitioned, it will not receive heartbeats. It will soon increment its term to start an election, although it won't be able to collect enough votes to become leader. When the server regains connectivity sometime later, its larger term number will propagate to the rest of the cluster (either through the server's RequestVote requests or through its AppendEntries response). This will force the cluster leader to step down, and a new election will have to take place to select a new leader. Prevoting stage is addressing that. In the Prevote algorithm, a candidate only increments its term if it first learns from a majority of the cluster that they would be willing to grant the candidate their votes (if the candidate's log is sufficiently up-to-date, and the voters have not received heartbeats from a valid leader for at least a baseline election timeout). The Prevote algorithm solves the issue of a partitioned server disrupting the cluster when it rejoins. While a server is partitioned, it won't be able to increment its term, since it can't receive permission from a majority of the cluster. Then, when it rejoins the cluster, it still won't be able to increment its term, since the other servers will have been receiving regular heartbeats from the leader. Once the server receives a heartbeat from the leader itself, it will return to the follower state(in the same term). In our implementation we have "stable leader" extension that prevents spurious RequestVote to dispose an active leader, but AppendEntries with higher term will still do that, so prevoting extension is also required. * scylla-dev/raft-prevote-v5: raft: store leader and candidate state in state variant raft: add boost tests for prevoting raft: implement prevoting stage in leader election raft: reset the leader on entering candidate state raft: use modern unordered_set::contains instead of find in become_candidate	2021-03-12 11:15:51 +01:00
Gleb Natapov	e231186a7b	raft: store leader and candidate state in state variant We already have server state dependant state in fsm, so there is no need to maintain "voters" and "tracker" optionals as well. The upside is that optional and variant sates cannot drift apart now.	2021-03-12 11:12:57 +02:00
Gleb Natapov	e17e7d57bd	raft: add boost tests for prevoting	2021-03-12 11:12:57 +02:00
Avi Kivity	486f6bf29c	Merge "sstables: move format specific reader code to kl/, mx/" from Botond " Currently the sstable reader code is scattered across several source files as following (paths are relative to sstables/): * partition.cc - generic reader code; * row.hh - format specific code related to building mutation fragments from cells; * mp_row_consumer.hh - format specific code related to parsing the raw byte stream; This is a strange organization scheme given that the generic sstable reader is a template and as such it doesn't itself depend on the other headers where the consumer and context implementations live. Yet these are all included in partition.cc just so the reader factory function can instantiate the sstable reader template with the format specific objects. This patchset reorganizes this code such that the generic sstable reader is exposed in a header. Furthermore, format specific code is moved to the kl/ and mx/ directories respectively. Each directory has a reader.hh with a single factory function which creates the reader, all the format specific code is hidden from sight. The added benefit is that now reader code specific to a format is centralized in the format specific folder, just like the writer code. This patchset only moves code around, no logical changes are made. Tests: unit(dev) " * 'sstable-reader-separation/v1' of https://github.com/denesb/scylla: sstables: get rid of mp_row_consumer.{hh,cc} sstables: get rid of row.hh sstables/mp_row_consumer.hh: remove unused struct new_mutation sstables: move mx specific context and consumer to mx/reader.cc sstables: move kl specific context and consumer to kl/reader.cc sstables: mv partition.cc sstable_mutation_reader.hh	2021-03-11 16:57:54 +02:00
Botond Dénes	3ba782bddd	sstables: get rid of row.hh Move stuff contained therein to `sstable_mutation_reader.{hh,cc}` which will serve as the collection point of utility stuff needed by all reader implementations.	2021-03-11 12:17:13 +02:00
Botond Dénes	4e3ae9d913	sstables: move kl specific context and consumer to kl/reader.cc Move all the kl format specific context and consumer code to kl/reader* and add a factory function `kl::make_reader()` which takes over the job of instantiating the `sstable_mutation_reader` with the kl specific context and consumer. Code which is used by test is moved to kl/reader_impl.hh, while code that can be hidden us moved to kl/reader.cc. Users who just want to create a reader only have to include kl/reader.hh.	2021-03-11 12:17:13 +02:00
Avi Kivity	c8f692e526	Merge 'cql3: Rewrite get_clustering_bounds() using expressions' from Dejan Mircevski Instead of using the `restrictions` class hierarchy, calculate the clustering slice using the `expr::expression` representation of the WHERE clause. This will allow us to eventually drop the `restrictions` hierarchy altogether. Tests: unit (dev, debug) Closes #8227 * github.com:scylladb/scylla: cql3: Make get_clustering_bounds() use expressions cql3/expr: Add is_multi_column() cql3/expr: Add more operators to needs_filtering cql3: Replace CK-bound mode with comparison_order cql3/expr: Make to_range globally visible cql3: Gather slice-defining WHERE expressions cql3: Add statement_restrictions::_where test: Add unit tests for get_clustering_bounds	2021-03-11 11:46:52 +02:00
Dejan Mircevski	990de02d28	cql3: Make get_clustering_bounds() use expressions Use expressions instead of _clustering_columns_restrictions. This is a step towards replacing the entire restrictions class hierarchy with expressions. Update some expected results in unit tests to reflect the new code. These new results are equivalent to the old ones in how storage_proxy::query() will process them (details: bound_view::from_range() returns the same result for an empty-prefix singular as for (-inf,+inf)). Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2021-03-10 21:25:43 -05:00
Dejan Mircevski	2525759027	test: Add unit tests for get_clustering_bounds ... as guardrails for the upcoming rewrite. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2021-03-10 21:17:26 -05:00
Benny Halevy	ff5b42a0fa	bytes_ostream: max_chunk_size: account for chunk header Currently, if the data_size is greater than max_chunk_size - sizeof(chunk), we end up allocating up to max_chunk_size + sizeof(chunk) bytes, exceeding buf.max_chunk_size(). This may lead to allocation failures, as seen in https://github.com/scylladb/scylla/issues/7950, where we couldn't allocate 131088 (= 128K + 16) bytes. This change adjusted the expose max_chunk_size() to be max_alloc_size (128KB) - sizeof(chunk) so that the allocated chunks would normally be allocated in 128KB chunks in the write() path. Added a unit test - test_large_placeholder that stresses the chunk allocation path from the write_place_holder(size) entry point to make sure it handles large chunk allocations correctly. Refs #7950 Refs #8081 Test: unit(release), bytes_ostream_test(debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210303143413.902968-1-bhalevy@scylladb.com>	2021-03-10 19:54:12 +02:00
Avi Kivity	5342d79461	Merge "Preparatory work in sstable_set for the upcoming compound_sstable_set_impl" from Raphael * 'preparatory_work_for_compound_set' of github.com:raphaelsc/scylla: sstable_set: move all() implementation into sstable_set_impl sstable_set: preparatory work to change sstable_set::all() api sstables: remove bag_sstable_set	2021-03-10 19:19:26 +02:00
Botond Dénes	cf28552357	mutation_test: test_mutation_diff_with_random_generator: compact input mutations This test checks that `mutation_partition::difference()` works correctly. One of the checks it does is: m1 + m2 == m1 + (m2 - m1). If the two mutations are identical but have compactable data, e.g. a shadowable tombstone shadowed by a row marker, the apply will collapse these, causing the above equality check to fail (as m2 - m1 is null). To prevent this, compact the two input mutations. Fixes: #8221 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210310141118.212538-1-bdenes@scylladb.com>	2021-03-10 16:28:14 +01:00
Raphael S. Carvalho	05b07c7161	sstable_set: preparatory work to change sstable_set::all() api users of sstable_set::all() rely on the set itself keeping a reference to the returned list, so user can iterate through the list assuming that it is alive all the way through. this will change in the future though, because there will be a compound set impl which will have to merge the all() of multiple managed sets, and the result is a temporary value. so even range-based loops on all() have to keep a ref to the returned list, to avoid the list from being prematurely destroyed. so the following code for (auto& sst : sstable_set.all()) { ...} becomes for (auto sstables = sstable_set.all(); auto& sst : sstables) { ... } Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-10 12:02:12 -03:00
Avi Kivity	746798fd56	Merge "sstables: get rid of data_consume_context" from Botond " This class is basically a wrapper around a unique pointer and a few short convenience methods, but is otherwise a distraction in trying to untangle the maze that is the sstable reader class hierachy. So this patchset folds it into its only real user: the sstable reader. " * 'data_consume_context_bye' of https://github.com/denesb/scylla: sstable: move data_consume_* factory methods to row.hh sstables: fold data_consume_context: into its users sstables: partition.cc: remove data_consume_* forward declarations	2021-03-10 16:45:32 +02:00
Botond Dénes	1aa2424dcf	sstable: move data_consume_* factory methods to row.hh	2021-03-10 15:40:50 +02:00
Botond Dénes	a06465a8f3	sstables: fold data_consume_context: into its users `data_consume_context` is a thin wrapper over the real context object and it does little more than forward method calls to it. The few methods doing more then mere forwarding can be folded into its single real user: `sstable_reader`.	2021-03-10 15:38:58 +02:00
Nadav Har'El	f41dac2a3a	alternator: avoid large contiguous allocation for request body Alternator request sizes can be up to 16 MB, but the current implementation had the Seastar HTTP server read the entire request as a contiguous string, and then processed it. We can't avoid reading the entire request up-front - we want to verify its integrity before doing any additional processing on it. But there is no reason why the entire request needs to be stored in one big contiguous allocation. This always a bad idea. We should use a non- contiguous buffer, and that's the goal of this patch. We use a new Seastar HTTPD feature where we can ask for an input stream, instead of a string, for the request's body. We then begin the request handling by reading lthe content of this stream into a vector<temporary_buffer<char>> (which we alias "chunked_content"). We then use this non-contiguous buffer to verify the request's signature and if successful - parse the request JSON and finally execute it. Beyond avoiding contiguous allocations, another benefit of this patch is that while parsing a long request composed of chunks, we free each chunk as soon as its parsing completed. This reduces the peak amount of memory used by the query - we no longer need to store both unparsed and parsed versions of the request at the same time. Although we already had tests with requests of different lengths, most of them were short enough to only have one chunk, and only a few had 2 or 3 chunks. So we also add a test which makes a much longer request (a BatchWriteItem with large items), which in my experiment had 17 chunks. The goal of this test is to verify that the new signature and JSON parsing code which needs to cross chunk boundaries work as expected. Fixes #7213. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210309222525.1628234-1-nyh@scylladb.com>	2021-03-10 09:22:34 +01:00
Tomasz Grabiec	c9c2beabc0	Merge "raft: replication tests as individual boost tests" from Alejo * alejo/raft-tests-replication-boost-5: raft: replication test: use Seastar random generator raft: replication test: rename drop_replication raft: replication test: change to Boost test raft: replication test: id helper functions raft: replication test: improve handling connectivity raft: replication test: parametrize snapshots raft: replication test: parametrize drop_replication raft: replication test: remove unused configuration raft: replication test: add license	2021-03-09 17:58:59 +01:00
Pavel Emelyanov	096e452db9	test: Fix exit condition of row_cache_test::test_eviction_from_invalidated The test populates the cache, then invalidates it, then tries to push huge (10x times the segment size) chunks into seastar memory hoping that the invalid entries will be evicted. The exit condition on the last stage is -- total memory of the region (sum of both -- used and free) becomes less than the size of one chunk. However, the condition is wrong, because cache usually contains a dummy entry that's not necessarily on lru and on some test iteration it may happen that evictable size < chunk size < evictable size + dummy size In this case test fails with bad_alloc being unable to evict the memory from under the dummy. fixes: #7959 tests: unit(row_cache_test), unit(the failing case with the triggering seed from the issue + 200 times more with random seeds) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210309134138.28099-1-xemul@scylladb.com>	2021-03-09 17:57:52 +01:00
Alejo Sanchez	f67b85e2b3	raft: replication test: use Seastar random generator Use the random generator provided by Seastar test suite. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-09 12:52:07 -04:00
Alejo Sanchez	1bf10a87c6	raft: replication test: rename drop_replication Rename drop_replication to packet_drops for readability. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-09 12:52:07 -04:00
Alejo Sanchez	6e193ee3bf	raft: replication test: change to Boost test Change test/raft directory to Boost test type. Run replication_test cases with their own test. RAFT_TEST_CASE macro creates 2 test cases, one with random 20% packet loss named name_drops. The directory test/raft is changed to host Boost tests instead of unit. While there improve the documentation. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-09 12:52:07 -04:00
Alejo Sanchez	8d9c797954	raft: replication test: id helper functions In raft the UUID 0 is a special case so server ids start at 1. Add two helper functions. Convert local 0-based id to raft 1-based UUID. And from UUID to raft_id. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-09 12:50:12 -04:00
Alejo Sanchez	0ffa450222	raft: replication test: improve handling connectivity Change global map of disconnected servers to a more intuitive class connected. The class is callable for the most common case connected(id). Methods connect(), disconnect(), and all() are provided for readability instead of directly calling map methods (insert, erase, clear). They also support both numerical (0 based) and server_id (UUID, 1 based) ids. The actual shared map is kept in a lw_shared_ptr. The class is passed around to be copy-constructed which is practically just creating a new lw_shared_ptr. Internally it tracks disconnected servers but externally it's more intuitive to use connect instead of disconnect. So it reads "connected id" and "not disconnected id", without double negatives. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-09 12:39:29 -04:00
Alejo Sanchez	7a644f37d3	raft: replication test: parametrize snapshots Snapshots and persisted snapshots created per test instead of globals. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-09 11:58:20 -04:00
Alejo Sanchez	f72e89fcfe	raft: replication test: parametrize drop_replication Pass drop_replication down instead of keeping it global. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-09 11:58:20 -04:00
Alejo Sanchez	5a03670f91	raft: replication test: remove unused configuration Remove test case configuration as it's not implemented yet. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-09 11:58:20 -04:00
Alejo Sanchez	efc6681cd6	raft: replication test: add license Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-09 11:58:20 -04:00
Gleb Natapov	2a41ad0b57	raft: add testing for non-voting members Add tests to check if quorum (for leader election and commit index purposes) is calculated correctly in the presence of non-voting members. Message-Id: <20210304101158.1237480-3-gleb@scylladb.com>	2021-03-09 13:51:09 +01:00
Gleb Natapov	dd6ba3d507	raft: add non-voting member support This patch adds a support for non-voting members. Non voting member is a member which vote is not counted for leader election purposes and commit index calculation purposes and it cannot become a leader. But otherwise it is a normal raft node. The state is needed to let new nodes to catch up their log without disturbing a cluster. All kind of transitions are allowed. A node may be added as a voting member directly or it may be added as non-voting and then changed to be voting one through additional configuration change. A node can be demoted from voting to non-voting member through a configuration change as well. Message-Id: <20210304101158.1237480-2-gleb@scylladb.com>	2021-03-09 13:47:48 +01:00

1 2 3 4 5 ...

1367 Commits