scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 20:46:56 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	c64a156c53	table: Introduce maintenance sstable set This new sstable set will hold sstables created by repair-based operations. A repair-based op creates 1 sstable per vrange (256), so sstables added to this new set are disjoint, therefore they can be efficiently read from using partitioned_sstable_set. Compound set is changed to include this new set, so sstables in this new set are automatically included when creating readers, computing statistics, and so on. This new set is not backlog tracked, so changes were needed to prevent a sstable in this set from being added or removed from the tracker. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:47:47 -03:00
Raphael S. Carvalho	1e7a444a8b	table: Wire compound sstable set From now on, _sstables becomes the compound set, and _main_sstables refers only to the main sstables of the table. In the near future, maintenance set will be introduced and will also be managed by the compound set. So add_sstable() and on_compaction_completion() are changed to explicitly insert and remove sstables from the main set. By storing compound set in _sstables, functions which used _sstables for creating reader, computing statistics, etc, will not have to be changed when we introduce the maintenance set, so the code change is greatly reduced by this approach. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:46:06 -03:00
Raphael S. Carvalho	42b309b43e	table: prepare make_reader_excluding_sstables() to work with compound sstable set Compound set will not be inserted or erased directly, so let's change this function to build a new set from scratch instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:42:50 -03:00
Raphael S. Carvalho	4e142458eb	table: prepare discard_sstables() to work with compound sstable set After compound set, discard_sstables() will have to prune each set individually and later refresh the compound set. So let's change the function to support multiple sstable sets, taking into account that a sstable set may not want to be backlog tracked. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:42:50 -03:00
Raphael S. Carvalho	d25822a030	table: extract add_sstable() common code into a function The purpose is to allow the code to be eventually reused by maintenance sstable set, which will be soon introduced. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:42:50 -03:00
Raphael S. Carvalho	e4b5f5ba33	sstable_set: Introduce compound sstable set This new sstable set implementation is useful for combining operation of multiple sstable sets, which can still be referenced individually via its shared ptr reference. It will be used when maintenance set is introduced in table, so a compound set is required to allow both sets to have their operations efficiently combined. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:42:49 -03:00
Raphael S. Carvalho	1261519266	reshape: STCS: preserve token contiguity when reshaping disjoint sstables When reshaping hundreds of disjoint sstables, like on bootstrap, contiguity wasn't being preserved because the heuristic for picking candidates didn't take into account their token range, which resulted in reshape messing with the contiguity that could otherwise be preserved by respecting the token order of the disjoint sstables. In other words, sstables with the smallest first tokens should be compacted first. By doing that, the contiguity is preserved even across size tiers, after reshape has completed its possible multiple rounds to get all the data in shape. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:36:18 -03:00
Nadav Har'El	4a7d3175e9	test/alternator: make another test faster The slowest test in test_streams.py is test_list_streams_paged. It is meant to test the ListStreams operation with paging. The existing test repeated its test four times, for four different stream types. However, there is no reason to suspect that the ListStreams operation might somehow be different for the four stream types... We already have other tests which create streams of the four types, and uses these streams - we don't need the test for ListStreams to also test creating the four types. By doing this test just once, not four times, we can save around 1.5 seconds of test time. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210318073755.1784349-1-nyh@scylladb.com>	2021-03-18 11:24:18 +01:00
Nadav Har'El	79af728335	test/alternator: make tracing test a bit faster In the test test_tracing.py::test_tracing_all, we do some operations and then need to wait until they appear in the tracing table. The current code used an exponentially-increasing delay during this wait, starting with 0.1 seconds and then doubling the delay until we find what we're looking for. However, it turns out that the delay until the data appears in the table is deliberately chosen by Scylla - and is always around 2 seconds. In this case, an exponential delay is really bad - we will usually wait for around 1 second too long after the needed wait of 2 seconds. So in this patch we replace the exponential delay by a constant delay - we wait 0.3 seconds between each retry. This change makes the test test_tracing.py::test_tracing_all finish in a little over 2 seconds, instead of a little over 3 seconds before this patch. We cannot reduce this 2 second time any further unless we make the 2-second tracing delay configurable. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210318000040.1782933-1-nyh@scylladb.com>	2021-03-18 11:24:18 +01:00
Nadav Har'El	4e87f95b42	test/alternator: remove slow and unhelpful test The test test_table.py::test_table_streams_on creates tables with various stream types, and then immediately deletes them without testing anything. This is a slow test (taking almost a full second on my laptop), and is redundant because in test_streams.py we have tests which create tables with streams in the same way - but then actually test that things work with these streams. So this test might as well be removed, and this is what we do in this patch. Removing this test shaves another second from the Alternator test suite's run time. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210317230530.1780849-1-nyh@scylladb.com>	2021-03-18 11:24:18 +01:00
Nadav Har'El	879656e3e0	test/alternator: make a test faster, safer and more correct The test test_condition_expression.py::test_condition_expression_with_forbidden_rmw takes half a second to run (dev build, on my laptop), one of the slowest tests in Alternator's test suite. Part of the reason was that it needlessly set the same table to forbidden_rmw, multiple times. Instead of doing that, we switch to using the test_table_s_forbid_rmw fixture, which is a table like test_table_s but created just once in forbid_rmw mode. The result is a faster test (0.05 seconds instead of 0.5 seconds), but also safer if we ever want to run tests in parallel. It also fixes a bug in the test: At the end of the test, we intended to double-check that although the forbid_rmw table forbids read-modify-write operations, it does allow pure writes. Yet the test did this after clearing the forbid_rmw mode... So after this patch the test verifies this on the forbid_rmw table, as intended. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210317222703.1779992-1-nyh@scylladb.com>	2021-03-18 11:24:18 +01:00
Nadav Har'El	1c2e473e62	test/alternator: make a test faster The test test_condition_expression.py::test_condition_expression_with_permissive_write_isolation currently takes (on my laptop, dev build) a full two seconds, making it one of the slowest tests. It is not surprising it is slow - it runs five other tests three times each (for three different write isolation modes), but it doesn't have to be this slow. Before this patch, for each of the five tests we switch the write isolation mode three times, and these switches involve schema changes and are fairly slow. So in this patch we reverse the loops - and move the write isolation mode switch to the outer loop. This patch halves the runtime of this test - from two seconds to one. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210317221045.1779329-1-nyh@scylladb.com>	2021-03-18 11:24:18 +01:00
Takuya ASADA	d9a625c842	scylla_setup: don't run node-exporter setup when it's not installed We need to run the package existence check before running the setup of node-exporter. Fixes #8276 Closes #8278	2021-03-18 11:24:18 +01:00
Avi Kivity	f038d1555c	Merge 'Add more context to configure.py' from Piotr Sarna This series makes configure.py output slightly more helpful in case of incorrect parameters passed to the compiler/linker. Closes #8267 * github.com:scylladb/scylla: configure: print more context if the linking attempt failed configure: provide more context on failed ./configure.py run configure: add verbose option to try_compile_and_link	2021-03-18 11:24:18 +01:00
Takuya ASADA	0424a41e30	tools/toolchain: stop ignoring error on install-dependencies.sh, run jmx/java script correctly We should run install-dependencies.sh with -e option to prevent ignoring error in the script. Also, need to add tools/jmx/install-dependencies.sh and tools/java/install-dependencies.sh, to fix 'No such file or directory' error on them. Fixes #8293 Closes #8294 [avi: did not regenerate toolchain image, since no new packages are installed]	2021-03-18 11:24:18 +01:00
Avi Kivity	b91d6776a0	Update tools/java submodule * tools/java fdc8fcc22c...7b66b7a0fc (1): > dist/redhat: add support SLES	2021-03-18 11:24:18 +01:00
Nadav Har'El	bd742f2951	Merge 'treewide: get rid of incorrect reinterpret casts' from Michał Chojnowski In some places we use the `reinterpret_cast<const net::packed<T>>(&x)` pattern to reinterpret memory. This is a violation of C++'s aliasing rules, which invokes undefined behaviour. The blessed way to correctly reinterpret memory is to copy it into a new object. Let's do that. Note: the reinterpret_cast way has no performance advantage. Compilers recognize the memory copy pattern and optimize it away. Closes #8241 * github.com:scylladb/scylla: treewide: get rid of unaligned_cast treewide: get rid of incorrect reinterpret casts	2021-03-18 11:24:18 +01:00
Benny Halevy	7862cad669	sstable_set: partitioned_sstable_set: clone: do clone all sstables The existing implementation wrongfully shares _all sstables rather than cloning it. This caused a use-after-free in `repair_meta::do_estimate_partitions_on_local_shard` when traversing a shared sstable_set, during which `table::make_reader_excluding_sstables` erased an entry. The erase should have happened on a cloned copy of the sstable_list, not on a shared copy. The regression was introduced in `c3b8757fa1`. Added a unit test that reproduces the share-on-copy issue for partitioned_stable_set (sstables::sstable_set). Fixes #8274 Test: unit(release, debug) DTest: materialized_views_test.py:TestMaterializedViews.simple_repair_test(debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210317145552.701559-1-bhalevy@scylladb.com>	2021-03-18 11:15:59 +02:00
Piotr Sarna	ea096de1b4	service, transport: avoid using private storage_service fields ... in the transport controller. Instead, simple getters would suffice. Message-Id: <582a71d0c1b61edf0107f5a2ef96536c395972d0.1615988516.git.sarna@scylladb.com>	2021-03-18 11:15:59 +02:00
Nadav Har'El	42169b2eef	Merge 'Alternator: add slow query logging' from Piotr Sarna This series adds slow query logging capability to alternator. Queries which last longer than the specified threshold are logged in `system_traces.node_slow_log` and traced. In order to be better prepared for https://github.com/scylladb/scylla/issues/2572, this series also expands the tracing API to allow custom key-value params and adds a custom `alternator_op` parameter to the slow node log. This information can also be deduced from the tracing session id by consulting the system_traces.events table, but https://github.com/scylladb/scylla/issues/2572 's assumption is that this tracing might not always be available in the future. This series comes with a simple test case which checks if operation logs indeed end up in `system_traces.node_slow_log`. Tests: unit(dev, alternator pytest) manual: verified that no operations are logged if slow query logging is disabled; verified that operations that take less time than the threshold are not logged; verified with test_batch.py::test_batch_write_item_large that a large-enough operation is indeed logged and traced. Fixes #8292 Example trace: ```cql cqlsh> select parameters, duration from system_traces.node_slow_log where start_time=b7a44589-8711-11eb-8053-14c6c5faf955; parameters | duration ---------------------------------------------------------------------------------------------+---------- {'alternator_op': 'DeleteTable', 'query': '{"TableName": "alternator_Test_1615979572905"}'} | 75732 ``` Closes #8298 * github.com:scylladb/scylla: alternator: add test for slow query logging alternator: allow enabling slow query logging tracing: allow providing a custom session record param	2021-03-18 11:15:59 +02:00
Avi Kivity	de45575ea9	Merge "Allow all supported compaction types to be stopped by nodetool stop" from Raphael " All compaction types can now be stopped with the nodetool stop command, example: nodetool stop SCRUB Supported types are: COMPACTION, CLEANUP, VALIDATION, SCRUB, INDEX_BUILD, RESHARD, UPGRADE, RESHAPE. " * 'stop_compaction_types_v2' of github.com:raphaelsc/scylla: compaction: Allow all supported compaction types to be stopped compaction: introduce function to map compaction name to respective type compaction: refactor mapping of compaction type to string compaction: move compaction_name() out of line	2021-03-18 11:15:59 +02:00
Botond Dénes	981699ae76	sstables: move promoted_index_blocks_reader into own header index_entry.hh (the current home of `promoted_index_blocks_reader`) is included in `sstables.hh` and thus in half our code-base. All that code really doesn't need the definition of the promoted index blocks reader which also pulls in the sstables parser mechanism. Move it into its own header and only include it where it is actually needed: the promoted index cursor implementations. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210317093654.34196-1-bdenes@scylladb.com>	2021-03-18 11:15:59 +02:00
Botond Dénes	5859195b36	sstables: mx/parser.hh: add missing include Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210317093806.34858-1-bdenes@scylladb.com>	2021-03-18 11:15:59 +02:00
Benny Halevy	2e7677f76b	sstables: sstable_set_impl: include mutation_reader.hh To make sstables/sstable_set_impl.hh self-sufficient mutation_reader.hh provides position_reader_queue, needed by time_series_sstable_set. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210317094223.590067-1-bhalevy@scylladb.com>	2021-03-18 11:15:59 +02:00
Michał Chojnowski	5c3385730b	treewide: get rid of unaligned_cast unaligned_cast violates strict aliasing rules. Replace it with safe equivalents.	2021-03-17 17:00:41 +01:00
Michał Chojnowski	4e35befcf2	treewide: get rid of incorrect reinterpret casts In some places we use the `reinterpret_cast<const net::packed<T>>(&x)` pattern to reinterpret memory. This is a violation of C++'s aliasing rules, which invokes undefined behaviour. The blessed way to correctly reinterpret memory is to copy it into a new object. Let's do that. Note: the reinterpret_cast way has no performance advantage. Compilers recognize the memory copy pattern and optimize it away.	2021-03-17 17:00:38 +01:00
Piotr Sarna	efe734c575	alternator: add test for slow query logging The test checks whether slow queries are properly logged in the system_traces.node_slow_log system table. The test is deterministic because it uses the threshold of 0ms to qualify a query as slow, which effectively makes all queries "slow enough".	2021-03-17 13:24:26 +01:00
Benny Halevy	6846319e65	partitioned_sstables_set: insert: propagate exception Do not swallow the caught exception. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210316170821.496218-1-bhalevy@scylladb.com>	2021-03-17 13:29:03 +02:00
Piotr Sarna	f9adee70d2	alternator: allow enabling slow query logging Alternator is now aware of the slow query logging configuration and can start tracing slow queries.	2021-03-17 11:20:42 +01:00
Piotr Sarna	5386739354	tracing: allow providing a custom session record param The mechanism of session record params is currently only used to store query strings and a couple more params like consistency level, but since we now have more frontends than just CQL and Thrift, it would be nice to also allow the users to put custom parameters in there. An immediate first user of this mechanism would be alternator, which is going to put the operation type under the "alternator_op" key. The operation type is not part of the query string due to how DynamoDB's protocol works - the op type is stored separately in the HTTP header. While it's possible to extract the operation type from the session_id, it might not be the case once #2572 is implemented.	2021-03-17 11:14:28 +01:00
Gleb Natapov	32d386d0d8	raft: fix use after free during logging in append_entries_reply() As the existing comment explains a progress can be deleted at the point of logging. The logging should only be done if the progress still exists. Message-Id: <YFDFVRQU1iVYhFdM@scylladb.com>	2021-03-17 09:59:22 +02:00
Dejan Mircevski	8db24fc03b	cql3/expr: Handle `IN ?` bound to null Previously, we crashed when the IN marker is bound to null. Throw invalid_request_exception instead. Fixes #8265 Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #8287	2021-03-17 09:59:22 +02:00
Avi Kivity	1afd6fbe06	hashing: appending_hash: convert from enable_if to concepts A little simpler to understand. Closes #8288	2021-03-17 09:59:22 +02:00
Piotr Sarna	7961a28835	Merge 'storage_proxy: Include counter writes in... ... `writes_coordinator_outside_replica_set`' from Juliusz Stasiewicz With this change, coordinator prefers himself as the "counter leader", so if another endpoint is chosen as the leader, we know that coordinator was not a member of replica set. With this guarantee we can increment `scylla_storage_proxy_coordinator_writes_coordinator_outside_replica_set` metric after electing different leader (that metric used to neglect the counter updates). The motivation for this change is to have more reliable way of counting non-token-aware queries. Fixes #4337 Closes #8282 * github.com:scylladb/scylla: storage_proxy: Include counter writes in `writes_coordinator_outside_replica_set` counters: Favor coordinator as leader	2021-03-17 09:59:22 +02:00
Avi Kivity	972ea9900c	Merge 'commitlog: Make pre-allocation drop O_DSYNC while pre-filling' from Calle Wilund Refs #7794 Iff we need to pre-fill segment file in O_DSYNC mode, we should drop this for the pre-fill, to avoid issuing flushes until the file is filled. Done by temporarily closing, re-opening in "normal" mode, filling, then re-opening. Closes #8250 * github.com:scylladb/scylla: commitlog: Make pre-allocation drop O_DSYNC while pre-filling commitlog: coroutinize allocate_segment_ex	2021-03-17 09:59:22 +02:00
Dejan Mircevski	992d5c6184	cql3/expr: Improve column printing Before this change, we would print an expression like this: ((ColumnDefinition{name=c, type=org.apache.cassandra.db.marshal.Int32Type, kind=CLUSTERING_COLUMN, componentIndex=0, droppedAt=-9223372036854775808}) = 0000007b) Now, we print the same expression like this: (c = 0000007b) Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #8285	2021-03-17 09:59:22 +02:00
Tomasz Grabiec	40121621f6	Merge "Kill some get_local_migration_manager() calls" from Pavel Emelyanov There are a bunch of such calls in schema altering statements and there's currently no way to obtain the migration manager for such statements, so a relatively big rework is needed. The solution in this set is -- all statements' execute() methods are called with query processor as first argument (now the storage proxy is there), query processor references and provides migration manager for statements. Those statements that need proxy can already get it from the query processor. Afterwards table_helper and thrift code can also stop using the global migration manager instance, since they both have query processor in needed places. While patching them a couple of calls to global storage proxy also go away. The new query processor -> migration manager dependency fits into current start-stop sequence: the migration manager is started early, the query processor is started after it. On stop the query processor remains alive, but the migration manager stops. But since no code currently (should) call get_local_migration_manager() it will _not_ call the query_processor::get_migration_manager() either, so this dangling reference is ugly, but safe. Another option could be to make storage proxy reference migration manager, but this dependency doesn't look correct -- migration manager is a higher-level service than the storage proxy is; it is the migration manager that currently calls storage proxy, not vice versa. * xemul/br-kill-some-migration-managers-2: cql3: Get database directly from query processor thrift: Use query_processor::get_migration_manager() table_helper: Use query_processor::get_migration_manager() cql3: Use query_processor::get_migration_manager() (lambda captures cases) cql3: Use query_processor::get_migration_manager() (alter_type statement) cql3: Use query_processor::get_migration_manager() (trivial cases) query_processor: Keep migration manager onboard cql3: Pass query processor to announce_migration:s cql3: Switch to qp (almost) in schema-altering-stmt cql3: Change execute()'s 1st arg to query_processor	2021-03-17 09:59:22 +02:00
Raphael S. Carvalho	2065e2c912	partitioned_sstable_set: adjust select_sstable_runs() to work with compound set compound set will select runs from all of its managed sets, so let's adjust select_sstable_runs() to only return runs which belong to it. without this adjustment, selection of runs would fail because function would try to unconditionally retrieve the run which may live somewhere else. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210312042255.111060-3-raphaelsc@scylladb.com>	2021-03-17 09:59:22 +02:00
Raphael S. Carvalho	02b2df1ea9	sstable_set: move select_sstable_runs() into partitioned_sstable_set after compound set is introduced, select_sstable_runs() will no longer work because the sstable runs live in sstable_set, but they should actually live in the sstable_set being written to. Given that runs is a concept that belongs only to strategies which use partitioned_sstable_set, let's move the implementation of select_sstable_runs() to it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210312042255.111060-2-raphaelsc@scylladb.com>	2021-03-17 09:59:22 +02:00
Avi Kivity	11308c05f4	Update tools/jmx submodule * tools/jmx 15c1d4f...9c687b5 (1): > dist/redhat: add support SLES	2021-03-17 09:59:22 +02:00
Calle Wilund	a0745f9498	messaging_service: Enforce dc/rack membership iff required for non-tls connections When internode_encryption is "rack" or "dc", we should enforce that incoming connections are from the appropriate address spaces iff answering on a non-tls socket. This is implemented by having two protocol handlers. One for tls/full notls, and one for mixed (needs checking) connections. The latter will ask snitch if remote address is kosher, and refuse the connection otherwise. Note: requires seastar patches: "rpc: Make is possible for rpc server instance to refuse connection" "RPC: (client) retain local address and use on stream creation" Note that ip-level checks are not exhaustive. If a user is also using "require_client_auth" with dc/rack tls setting we should warn him that there is a possibility that someone could spoof himself past the authentication. Closes #8051	2021-03-17 09:59:22 +02:00
Avi Kivity	bcd41cb32d	Merge 'Support installing our rpm to SLES' from Takuya ASADA Basically, SLES support was already done in `f20736d93d`, but only for the offline installer. This fixes a few more problems with installing our rpm on SLES. After this change, we can install our rpm on both CentOS/RHEL and SLES from a single image, like the unified deb. SLES uses its own package manager called 'zypper', but it does support yum repositories, so no change is required for the repo. Closes #8277 * github.com:scylladb/scylla: scylla_coredump_setup: support SLES scylla_setup: use rpm to check package availability for SLES dist: install optional packages for SLES	2021-03-17 09:59:22 +02:00
Tomasz Grabiec	cc0bb92afe	Merge "raft: provide a ticker for each raft server" from Pavel Solodovnikov Automatically initialize and start a timer in `raft_services::add_server` for each raft server instance created. The patch set also changes several other things in order for tickers to work: 1. A bug in `raft_sys_table_storage` which caused an exception if `raft::server::start` is called without any persisted state. 2. `raft_services::add_server` now automatically calls `raft::server::start()` since a server instance should be started before any of its methods can be called. 3. Raft servers can now start with initial term = 0. There was an artificial restriction which is now lifted. 4. Raft schema state machine now returns a ready future instead of throwing "not implemented" exception in `abort()`. * github.com/ManManson/scylla.git/raft_services_tickers_v9_next_rebase: raft/raft_services: provide a ticker for each raft server raft/raft_services: switch from plain `throw` to `on_internal_error` raft/raft_services: start server instance automatically in `add_server` raft: return ready future instead of throwing in schema_raft_state_machine raft: allow raft server to start with initial term 0 raft/raft_sys_table_storage: fix loading term/vote and snapshot from empty state	2021-03-17 09:59:22 +02:00
Nadav Har'El	e344f74858	Merge 'logalloc: improve background reclaim shares management' from Avi Kivity The log structured allocator's background reclaimer tries to allocate CPU power proportional to memory demand, but a bug made that not happen. Fix the bug, add some logging, and future-proof the timer. Also, harden the test against overcommitted test machines. Fixes #8234. Test: logalloc_test(dev), 20 concurrent runs on 2 cores (1 hyperthread each) Closes #8281 * github.com:scylladb/scylla: test: logalloc_test: harden background reclain test against cpu overcommit logalloc: background reclaim: use default scheduling group for adjusting shares logalloc: background reclaim: log shares adjustment under trace level logalloc: background reclaim: fix shares not updated by periodic timer	2021-03-17 09:59:21 +02:00
Pavel Solodovnikov	aaea8c6c7d	raft/raft_services: provide a ticker for each raft server Automatically initialize a ticker for each raft server instance when `raft_services::add_server` is called. A ticker is a timer which regularly calls `raft::server::tick` in order to tick its raft protocol state machine. Note that the timer should start after the server calls its `start()` method, because otherwise it would crash since fsm is not initialized yet. Currently, the tick interval is hardcoded to be 100ms. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-03-17 09:59:21 +02:00
Pavel Solodovnikov	1496a3559f	raft/raft_services: switch from plain `throw` to `on_internal_error` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-03-17 09:59:21 +02:00
Pavel Solodovnikov	975c9a8021	raft/raft_services: start server instance automatically in `add_server` Raft server instance cannot be used in any way prior to calling the `start()` method, which initializes its internal state, e.g. raft protocol state machine. Otherwise, it will likely result in a crash. Also, properly stop the servers on shutdown via `raft_services::stop_servers()`. In case some exception happened inside `add_server`, the `init` function will de-initialize what it already initialized, i.e. raft rpc verbs. This is important since otherwise it would break further initialization process and, what is more important, will prevent raft rpc verbs deinitialization. This will cause a crash in `messaging_service` uninit procedure, because raft rpc handlers would still be initialized. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-03-17 09:59:21 +02:00
Pavel Solodovnikov	0b3dba07bd	raft: return ready future instead of throwing in schema_raft_state_machine The current implementation throws an exception, which will cause a crash when stopping scylla. This will be used in the next patch. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-03-17 09:59:21 +02:00
Pavel Solodovnikov	93c565a1bf	raft: allow raft server to start with initial term 0 Prior to the fix there was an assert to check in `raft::server_impl::start` that the initial term is not 0. This restriction is completely artificial and can be lifted without any problems, as described below. The only place that is dependent on this corner case is in `server_impl::io_fiber`. Whenever term or vote has changed, they will be both set in `fsm::get_output`. `io_fiber` checks whether it needs to persist term and vote by validating that the term field is set (by actually executing a `term != 0` condition). This particular check is based on an unobvious fact that the term will never be 0 in case `fsm::get_output` saves term and vote values, indicating that they need to be persisted. Vote and term can change independently of each other, so that checking only for term obscures what is happening and why even more. In either case term will never be 0, because: 1. If the term has changed, then it's naturally greater than 0, since it's a monotonically increasing value. 2. If the vote has changed, it means that we received a vote request message. In such case we have already updated our term to the requester's term. Switch to using an explicit optional in `fsm_output` so that a reader doesn't have to think about the motivation behind this `if` and can just check that the `term_and_vote` optional is engaged. Given the motivation described above, the corresponding assert(_fsm->get_current_term() != term_t(0)); in `server_impl::start` is removed. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-03-17 09:59:21 +02:00
Pavel Solodovnikov	ae5f26adec	raft/raft_sys_table_storage: fix loading term/vote and snapshot from empty state When a raft server is started for the first time and there isn't any persisted state yet, provide default return values for `load_term_and_vote` and `load_snapshot`. The code currently does not handle this corner case correctly and fails with an exception. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-03-17 09:59:21 +02:00
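
The two "treewide" commits listed above (4e35befcf2 and 5c3385730b) say that the correct way to reinterpret memory is to copy it into a new object rather than cast the pointer. A minimal sketch of that idea follows. The helper names `read_unaligned` and `write_unaligned` are hypothetical and chosen only for illustration; they are not taken from the Scylla tree, and the real patches may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (illustration only, not Scylla's API):
// read a T from a possibly unaligned buffer by copying its bytes
// into a fresh object instead of casting the pointer.
template <typename T>
T read_unaligned(const void* src) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "memcpy-based reinterpretation needs a trivially copyable type");
    T value;
    std::memcpy(&value, src, sizeof(T));
    return value;
}

// Hypothetical counterpart: store a T into a possibly unaligned buffer.
template <typename T>
void write_unaligned(void* dst, const T& value) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "memcpy-based reinterpretation needs a trivially copyable type");
    std::memcpy(dst, &value, sizeof(T));
}
```

Calling `write_unaligned(buf + 1, v)` and then `read_unaligned<std::uint32_t>(buf + 1)` round-trips the value even though `buf + 1` is not suitably aligned for a 32-bit integer. As the commit message notes, compilers typically recognize this copy pattern and optimize it away.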

25560 Commits