scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-06 06:53:12 +00:00

Author	SHA1	Message	Date
Botond Dénes	d2ddaced4e	test/lib/reader_lifecycle_policy: get rid of lifecycle workarounds The lifecycle of the reader lifecycle policy and all the resources the reads use is now enclosed in that of the multishard reader thanks to its close() method. We can now remove all the workarounds we had in place to keep different resources alive until background reader cleanup finishes.	2021-06-16 11:29:36 +03:00
Botond Dénes	5a271e42a5	test/lib/reader_lifecycle_policy: destroy_reader(): stop the semaphore So that when this method returns the semaphore is safe to destroy. This in turn will enable us to get rid of all the machinery we have in place to deal with the semaphore having to out-live the lifecycle policy without a clear time as to when it can be safe to destroy.	2021-06-16 11:29:36 +03:00
Botond Dénes	c09c62a0fb	test/lib/reader_lifecycle_policy: use a more robust eviction mechanism The test reader lifecycle policy has a mode in which it wants to ensure all inactive readers are evicted, so tests can stress reader recreation logic. For this it currently employs a trick of creating a waiter on the semaphore. I don't even know how this even works (or if it even does) but it sure complicates the lifecycle policy code a lot. So switch to the much more reliable and simple method of creating the semaphore with a single count and no memory. This ensures that all inactive reads are immediately evicted, while still allowing a single read to be admitted at all times.	2021-06-16 11:29:36 +03:00
Botond Dénes	578a092e4a	reader_concurrency_semaphore: wait for all permits to be destroyed in stop() To prevent use-after-free resulting from any permit out-living the semaphore.	2021-06-16 11:29:36 +03:00
Botond Dénes	a10a6e253e	test/lib/reader_lifecycle_policy: fix indentation Left broken from the previous patch.	2021-06-16 11:29:36 +03:00
Botond Dénes	8c7447effd	mutation_reader: reader_lifecycle_policy::destroy_reader(): require to be called on native shard Currently shard_reader::close() (its caller) goes to the remote shard, copies back all fragments left there to the local shard, then calls `destroy_reader()`, which in the case of the multishard mutation query copies it all back to the native shard. This was required before because `shard_reader::stop()` (`close()`'s predecessor) couldn't wait on `smp::submit_to()`. But close can, so we can get rid of all this back-and-forth and just call `destroy_reader()` on the shard the reader lives on, just like we do with `create_reader()`.	2021-06-16 11:29:35 +03:00
Botond Dénes	4ecf061c90	reader_lifecycle_policy implementations: fix indentation Left broken from the previous patch.	2021-06-16 11:21:38 +03:00
Botond Dénes	a7e59d3e2c	mutation_reader: reader_lifecycle_policy::destroy_reader(): de-futurize reader parameter The shard reader is now able to wait on the stopped reader and pass the already stopped reader to `destroy_reader()`, so we can de-futurize the reader parameter of said method. The shard reader was already patched to pass a ready future so adjusting the call-site is trivial. The most prominent implementation, the multishard mutation query, can now also drop its `_dismantling_gate` which was put in place so it can wait on the background stopping of readers. A consequence of this move is that errors that might happen while stopping the reader are now handled in the shard reader, not in all lifecycle policy implementations.	2021-06-16 11:21:38 +03:00
Botond Dénes	13d7806b62	mutation_reader: shard_reader::close(): wait on the remote reader We now have a future<> returning close() method so we don't need to do the cleanup of the remote reader in the background, detaching it from the shard-reader under destruction. We can now wait for the cleanup properly before the shard reader is destroyed and just pass the stopped reader to reader_lifecycle_policy::destroy_reader(). This patch does the first part -- moving the cleanup to the foreground, the API change of said method will come in the next patch.	2021-06-16 11:21:38 +03:00
Botond Dénes	ab8d2a04a5	multishard_mutation_query: destroy remote parts in the foreground Currently the foreign fields of the reader meta are destroyed in the background via the foreign pointer's destructor (with one exception). This makes the already complicated life-cycle of these parts and their dependencies even harder to reason about, especially in tests, where even things like semaphores live only within the test. This patch makes sure to destroy all these remote fields in the foreground in either `save_reader()` or `stop()`, ensuring that once `stop()` returns, everything is cleaned up.	2021-06-16 11:21:38 +03:00
Botond Dénes	7552cc73cf	mutation_reader: shard_reader::close(): close _reader The reason we got away without closing _reader so far is that it is an `std::unique_ptr<evictable_reader>` which is a `flat_mutation_reader::impl` instance, without the `flat_mutation_reader` wrapper, which contains the validations for close.	2021-06-16 11:21:33 +03:00
Botond Dénes	98e5f0429b	mutation_reader: reader_lifecycle_policy::destroy_reader(): remove out-of-date comment About the multishard reader not being able to wait on the returned future. It can now via the `close()` method.	2021-06-15 15:23:32 +03:00
Tomasz Grabiec	9d49a26e79	Merge "raft: randomized_nemesis_test: tick servers less often than the network in basic_test" from Kamil Previously `ticker` would use a single function, `on_tick`, which it called in a loop with yields in-between. In `basic_test` we would use this to tick every object in synchrony. However, to closely simulate a production environment, we want the tick ratios to be different. For example Raft servers should be ticked rarely compared to the network. We may also want to give the Seastar reactor more space between the function calls (e.g. if they cause a bunch of work to be created for the reactor that needs more than one tick to complete). To support these use cases we first generalize `ticker` to take a set of functions with associated numbers. These numbers are the call periods of their corresponding functions: given {n, f}, `f` will be called each `n`th tick. We use this new functionality to tick Raft servers less often than the network in basic_test. This patchset effectively reverts `01b6a2eb38` which caused the ticker to call `on_tick` only when the Seastar reactor had no work to do. This approach is unfortunately incompatible with the approach taken there. We do want the ticker to race with other work, potentially producing more work while already scheduled work is executing, and we want to see in tests what happens when we adjust the ticking ratios of different subsystems. The previous approach also had a problem where if there was an infinite task loop executing, the ticker wouldn't ever tick. The previous fix was introduced since the ticker caused too much work to be produced (so the reactor couldn't keep up) due to ticking the Raft servers too often (after each yield). This commit deals with the problem in a different way, by ticking the servers rarely, which also resembles "real-life" scenarios better. 
* kbr/tick-network-often-v4: raft: randomized_nemesis_test: generalize `ticker` to take a set of functions raft: randomized_nemesis_test: split `environment::tick` into two functions raft: randomized_nemesis_test: fix potential use-after-free in basic_test	2021-06-15 01:54:57 +02:00
Kamil Braun	8f1caa6a90	raft: randomized_nemesis_test: generalize `ticker` to take a set of functions ... with associated calling periods and use the new API in `basic_test`. Previously `ticker` would use a single function, `on_tick`, which it called in a loop with yields in-between. In `basic_test` we would use this to tick every object in synchrony. However, to closely simulate a production environment, we may want the tick ratios to be different. For example Raft servers should be ticked rarely compared to the network. We may also want to give the Seastar reactor more space between the function calls (e.g. if they cause a bunch of work to be created for the reactor that needs more than one tick to complete). To support these use cases we generalize `ticker` to take a set of functions with associated numbers. These numbers are the call periods of their corresponding functions: given {n, f}, `f` will be called each `n`th tick. We also modify `basic_test` to use this new approach: we tick Raft servers once per 10 network ticks (in particular, once per 10 reactor yields). This commit effectively reverts `01b6a2eb38` which caused the ticker to call `on_tick` only when the Seastar reactor had no work to do. This approach is unfortunately incompatible with the approach taken there. We do want the ticker to race with other work, potentially producing more work while already scheduled work is executing, and we want to see in tests what happens when we adjust the ticking ratios of different subsystems. The previous approach also had a problem where if there was an infinite task loop executing, the ticker wouldn't ever tick. The previous fix was introduced since the ticker caused too much work to be produced (so the reactor couldn't keep up) due to ticking the Raft servers too often (after each yield). This commit deals with the problem in a different way, by ticking the servers rarely, which also resembles "real-life" scenarios better. 
With this change we must also wait a bit longer for the first node to elect itself as a leader at the beginning of the test.	2021-06-14 16:54:38 +02:00
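The `{n, f}` calling convention described above can be illustrated with a minimal synchronous C++ sketch. It assumes a hypothetical `ticker_sketch` class and omits the Seastar yields that the real test's `ticker` performs between ticks; only the period logic (call `f` on every `n`-th tick) is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, synchronous model of a ticker that takes {period, function}
// pairs: given {n, f}, f is called on every n-th tick.
class ticker_sketch {
public:
    using entry = std::pair<uint64_t, std::function<void()>>;

    explicit ticker_sketch(std::vector<entry> fs) : _fs(std::move(fs)) {
        for (const auto& e : _fs) {
            assert(e.first > 0 && "a period of 0 would divide by zero");
        }
    }

    // Advance by one tick and call every function whose period divides it.
    void tick() {
        ++_tick;
        for (auto& [n, f] : _fs) {
            if (_tick % n == 0) {
                f();
            }
        }
    }

private:
    std::vector<entry> _fs;
    uint64_t _tick = 0;
};
```

For example, a `ticker_sketch` built from `{{1, tick_network}, {10, tick_servers}}` and ticked 100 times calls `tick_network` 100 times and `tick_servers` 10 times, the once-per-10-network-ticks ratio used in `basic_test`.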
Kamil Braun	c0b80f1f8a	raft: randomized_nemesis_test: split `environment::tick` into two functions One for ticking the network and one for ticking the servers.	2021-06-14 16:54:38 +02:00
Kamil Braun	f42776aded	raft: randomized_nemesis_test: fix potential use-after-free in basic_test The test starts by waiting a certain number of ticks for the first node to elect itself as a leader. If this wait times out - i.e. the number of ticks passes before the node manages to elect itself - the future associated with the task which checks for the leader condition becomes discarded (it is passed to `with_timeout`) and the task may keep using the `environment` (which it has a reference to) even after the `environment` is destroyed. Furthermore, the aforementioned task is a coroutine which uses lambda captures in its body. Leaving `with_timeout` destroys the lambda object, causing the coroutine to refer to no-longer-existing captures. We fix the problems by: - making `environment` `weakly_referencable` and checking if it's alive before it's used inside the task, - not capturing anything in the lambda but passing whatever's needed as function arguments (so these things get allocated inside the coroutine frame).	2021-06-14 16:54:38 +02:00
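The liveness check from the first half of the fix can be sketched with the standard library's `std::weak_ptr`, used here as an analogue of Seastar's `weakly_referencable` (which the real fix uses; its API differs). The names `environment` and `background_step` are hypothetical stand-ins, and the coroutine-capture half of the fix is not modeled.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the test's environment.
struct environment {
    int ticks = 0;
};

// A background task that may outlive the environment holds only a weak
// reference and checks liveness before each use, instead of keeping a
// plain reference that could dangle.
bool background_step(const std::weak_ptr<environment>& env_ref) {
    if (auto env = env_ref.lock()) {  // still alive: safe to use
        ++env->ticks;
        return true;
    }
    return false;                     // already destroyed: do nothing
}
```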
Nadav Har'El	3645c7104b	Merge: Wrap alternator start-stop into controller Merged patch series by Pavel Emelyanov: Alternator start and stop code is sitting inside the main() and it's a big piece of code out there. Having it all in main complicates rework of start-stop sequences; it's much more handy to have it in alternator/. This set puts the mentioned code into a transport- and thrift-like controller model. While doing so, one more use of the global storage service goes away. * 'br-alternator-clientize' of https://github.com/xemul/scylla: alternator: Move start-stop code into controller alternator: Move the whole starting code into a sched group alternator: Dont capture db, use cfg alternator: Controller skeleton alternator: Controller basement alternator: Drop storage service from executor	2021-06-14 15:44:10 +03:00
Michael Livshin	15b0e5c4d2	sstables: count read range tombstones Refs #7749. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Message-Id: <20210602152210.17948-2-michael.livshin@scylladb.com>	2021-06-14 14:37:33 +02:00
Michael Livshin	9ef2317248	row_cache: count range tombstones processed during read Refs #7749. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Message-Id: <20210602152210.17948-1-michael.livshin@scylladb.com>	2021-06-14 14:29:05 +02:00
Nadav Har'El	6726fe79b6	Merge 'view: fix use-after-move when handling view update failures' from Piotr Sarna The code was susceptible to use-after-move if both local and remote updates were going to be sent. The whole routine for sending view updates is now rewritten to avoid use-after-move. Fixes #8830 Tests: unit(release), dtest(secondary_indexes_test.py:TestSecondaryIndexes.test_remove_node_during_index_build) Closes #8834 * github.com:scylladb/scylla: view: fix use-after-move when handling view update failures db,view: explicitly move the mutation to its helper function db,view: pass base token by value to mutate_MV	2021-06-14 13:15:35 +03:00
Alejo Sanchez	5c8092cf42	raft: fix election with disruptive candidate This patch also fixes rare hangs in debug mode for drops_04 without prevote. Branch URL: https://github.com/alecco/scylla/tree/raft-fixes-05-v2-dueling Tests: unit ({dev}), unit ({debug}), unit ({release}) Changes in v2: - Fixed commit message @kostja Without prevote, a node disconnected for long enough becomes a candidate. While disconnected, (A) keeps increasing its term. When it rejoins, it disrupts the current leader (C), which steps down due to the higher term in (A)'s append_entries_reply, and (C) also increases its term. Meanwhile, followers (B) and (D) don't know (C) stepped down but see it alive according to the current failure detector implementation, and (A) also has a shorter log than them. So they reject (A)'s vote requests (Raft 4.2.3 Disruptive servers). Then (C) rejects voting for (A) because (A) has a shorter log. And (C) becomes a candidate, but even though (A) votes for (C), the previous followers (B) and (D) ignore a vote request while leader (C) is still alive and the election timeout has not passed. (A) and (C) alone can't reach quorum (2/4 votes). So elections never succeed. This patch addresses this problem by making followers not ignore vote requests from the node they think is the current leader, even if the election timeout was not reached. As @kostja noted, if the failure detector considered a leader alive only as long as it sends heartbeats (append requests), this patch would no longer be needed. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20210611172734.254757-1-alejo.sanchez@scylladb.com>	2021-06-14 11:07:38 +02:00
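The rule change can be sketched as a single predicate. The struct `follower_view` and the function `should_ignore_vote_request` are hypothetical, not Scylla's Raft API; the sketch only encodes the disruptive-server rule from the Raft thesis (section 4.2.3) plus the exception this patch adds for the believed-current leader.

```cpp
#include <cassert>
#include <optional>

using server_id = int;

// Hypothetical snapshot of what a follower believes.
struct follower_view {
    std::optional<server_id> current_leader;  // leader this follower believes in
    bool leader_alive = false;                // as reported by the failure detector
    bool election_timeout_elapsed = false;
};

bool should_ignore_vote_request(const follower_view& f, server_id candidate) {
    // The patch's exception: a vote request from the node we consider the
    // leader means it has stepped down, so it must not be ignored.
    if (f.current_leader && *f.current_leader == candidate) {
        return false;
    }
    // Raft thesis 4.2.3 (disruptive servers): while a leader is believed to be
    // alive and the election timeout has not elapsed, ignore vote requests.
    return f.current_leader.has_value() && f.leader_alive && !f.election_timeout_elapsed;
}
```

In the scenario above, a follower such as (B) that believes (C) is the live leader still ignores (A) but now considers (C)'s vote request, so (B), (C) and (D) can form a quorum.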
Piotr Jastrzebski	1ed92e37f8	database: Fix warning about deprecated update_shares_for_class usage This patch fixes the following compilation warning: database.cc:430:33: warning: 'update_shares_for_class' is deprecated: Use io_priority_class.update_shares [-Wdeprecated-declarations] _inflight_update = engine().update_shares_for_class(_io_priority, uint32_t(shares)); Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Closes #8751	2021-06-14 10:42:22 +03:00
Piotr Sarna	8a049c9116	view: fix use-after-move when handling view update failures The code was susceptible to use-after-move if both local and remote updates were going to be sent. The whole routine for sending view updates is now rewritten to avoid use-after-move. Refs #8830 Tests: unit(release), dtest(secondary_indexes_test.py:TestSecondaryIndexes.test_remove_node_during_index_build)	2021-06-14 09:36:10 +02:00
Piotr Sarna	7cdbb7951a	db,view: explicitly move the mutation to its helper function The `apply_to_remote_endpoints` helper function used to take its `mut` parameter by reference, but then moved the value from it, which is confusing and prone to errors. Since the value is moved-from, let's pass it to the helper function as rvalue ref explicitly.	2021-06-14 09:34:40 +02:00
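A minimal sketch of this kind of signature change, with a hypothetical `mutation_stub` type and helper name: taking the parameter by rvalue reference makes the move visible at the call site, so a caller cannot silently lose its value by passing an lvalue.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct mutation_stub {  // hypothetical stand-in for a mutation
    std::string payload;
};

// Taking `mut` by rvalue reference documents that it is consumed; passing an
// lvalue without std::move() no longer compiles.
std::vector<mutation_stub> send_to_endpoints(mutation_stub&& mut, int endpoints) {
    std::vector<mutation_stub> sent;
    for (int i = 0; i + 1 < endpoints; ++i) {
        sent.push_back(mut);             // copies for all but the last endpoint
    }
    if (endpoints > 0) {
        sent.push_back(std::move(mut));  // the last endpoint takes ownership
    }
    return sent;
}
```

A call such as `send_to_endpoints(m, 3)` is rejected by the compiler; the caller must write `send_to_endpoints(std::move(m), 3)`, after which any later read of `m` is visibly a use-after-move.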
Piotr Sarna	88d4a66e90	db,view: pass base token by value to mutate_MV The base token is passed cross-continuations, so the current way of passing it by const reference probably only works because the token copying is cheap enough to optimize the reference out. Fix by explicitly taking the token by value.	2021-06-14 09:30:38 +02:00
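The hazard can be shown with a plain deferred-callback queue standing in for Seastar continuations; `token_stub`, `deferred` and `schedule_with_token` are hypothetical. Because the callback runs after the caller's frame may be gone, it must own a copy of the token.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

struct token_stub {  // hypothetical stand-in for a base token
    int64_t value;
};

// Callbacks that run later, like continuations.
std::vector<std::function<int64_t()>> deferred;

// Taking the token by value and capturing it by value means the callback owns
// its copy; a `const token_stub&` captured by reference could dangle once the
// caller's token goes out of scope.
void schedule_with_token(token_stub base_token) {
    deferred.push_back([base_token] { return base_token.value; });
}
```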
Nadav Har'El	6a8441ef03	Update seastar submodule * seastar 4506b878...813eee3e (12): > reactor: fix race with boost::barrier destructor during smp initialialization > Merge "Merge io-group and io-queue configs" from Pavel E > tests: add test for skipping data from a socket > tests: transform socket_test into a test suite > .gitignore: Add tags > tls: retain handshake error and return original problem on repeated failures > iostream: fix skipping from closed sockets > gitignore .cooking_memory > Merge 'metrics: Fix dtest->ulong conversion error' from Benny Halevy > io_priority_class: Make update_shares const > Remove <seastar/core/apply.hh> > smp: allow having multiple instances of the smp class The fix to make io_priority::update_shares() const will allow getting rid of one of the compilation warnings.	2021-06-14 10:27:14 +03:00
Nadav Har'El	061e43e9d4	Merge 'Fix some compilation warnings' from Piotr Jastrzębski Closes #8850 * github.com:scylladb/scylla: priority_manager: Fix warnings about deprecated register_one_priority_class usage main: Fix warning about deprecated usage of io_queue::capacity	2021-06-14 10:05:27 +03:00
Piotr Jastrzebski	831a60a6cd	priority_manager: Fix warnings about deprecated register_one_priority_class usage This patch fixes the following warnings: service/priority_manager.cc:30:36: warning: 'register_one_priority_class' is deprecated: Use io_priority_class::register_one [-Wdeprecated-declarations] : _commitlog_priority(engine().register_one_priority_class("commitlog", 1000)) service/priority_manager.cc:31:35: warning: 'register_one_priority_class' is deprecated: Use io_priority_class::register_one [-Wdeprecated-declarations] , _mt_flush_priority(engine().register_one_priority_class("memtable_flush", 1000)) service/priority_manager.cc:32:36: warning: 'register_one_priority_class' is deprecated: Use io_priority_class::register_one [-Wdeprecated-declarations] , _streaming_priority(engine().register_one_priority_class("streaming", 200)) service/priority_manager.cc:33:36: warning: 'register_one_priority_class' is deprecated: Use io_priority_class::register_one [-Wdeprecated-declarations] , _sstable_query_read(engine().register_one_priority_class("query", 1000)) service/priority_manager.cc:34:37: warning: 'register_one_priority_class' is deprecated: Use io_priority_class::register_one [-Wdeprecated-declarations] , _compaction_priority(engine().register_one_priority_class("compaction", 1000)) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2021-06-14 08:49:46 +02:00
Piotr Jastrzebski	3ec04433f7	main: Fix warning about deprecated usage of io_queue::capacity This patch fixes the following warning: main.cc:307:53: warning: 'capacity' is deprecated: modern I/O queues should use a property file [-Wdeprecated-declarations] auto capacity = engine().get_io_queue().capacity(); It's fine to just check --max-io-requests directly because seastar sets io_queue::capacity to the value of this parameter anyway. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2021-06-14 08:49:42 +02:00
Raphael S. Carvalho	846f0bd16e	sstables: Fix incremental selection with compound sstable set Incremental selection may not work properly for LCS and ICS due to a use-after-free bug in partitioned set which came into existence after compound set was introduced. The use-after-free happens because partitioned set wasn't taking into account that the next position can become the current position in the next iteration, which will be used by all selectors managed by compound set. So if next position is freed while it is being used as current position, subsequent selectors would find the current position freed, making them produce incorrect results. Fix this by moving ownership of next pos from incremental_selector_impl to incremental_selector, which makes it more robust as the latter knows better when the selection is done with the next pos. incremental_selector will still return ring_position_view to avoid copies. Fixes #8802. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210611130957.156712-1-raphaelsc@scylladb.com>	2021-06-13 16:45:07 +03:00
Kamil Braun	9e85921006	storage_proxy: remove a feedback loop from the speculative retry latency metric To handle a read request from a client, the coordinator node must send data and digest requests to replicas, reconcile the obtained results (by merging the obtained mutations and comparing digests), and possibly send more requests to replicas if the digests turned out to be different in order to perform read repair and preserve consistency of observed reads. In contrast to writes, where coordinators send their mutation write requests to all replicas in the replica set, for reads the coordinators send their requests only to as many replicas as is required to achieve the desired CL. For example, consider RF=3 and a CL=QUORUM read. Then the coordinator sends its request to a subset of 2 nodes out of the 3 possible replicas. The choice of the 2-node subset is random; the distribution used for the random roll is affected by certain things such as the "cache hitrate" metric. The details are not that relevant for this discussion. If not all of the initially chosen replicas answer within a certain time period, the coordinator may send an additional request to one more replica, hoping that this replica helps achieve the desired CL so the entire client request succeeds. This mechanism is called "speculative retry" and is enabled by default. This time period - call it `T` - is chosen based on keyspace configuration. The default value is "99.0PERCENTILE", which means that `T` is roughly equal to the 99th percentile of the latency distribution of previous requests (or at least the most recent requests; the algorithm uses an exponential decay strategy to make old requests less relevant for the metric).
The latencies used are the durations of whole coordinator read requests: each such duration measurement starts before the first replica request is sent and ends after the last replica request is answered, among the replica requests whose results were used for the reconciled result returned to the client (there may be more requests sent later "in the background" - they don't affect the client result and are not taken into account for the latency measurement). This strategy, however, gives an undesired effect which appears when a significant part of all requests require a speculative retry to succeed. To explain this effect it's best to consider a scenario which takes this to the extreme - where all requests require a speculative retry. Consider RF=3 and CL=QUORUM so each read request initially uses 2 replicas. Let {A, B, C} be the set of replicas. We run a uniformly distributed read workload. Initially the cluster operates normally. Roughly 1/3 of all requests go to replicas {A, B}, 1/3 go to {A, C}, and 1/3 go to {B, C}. The 99th percentile of read request latencies is 50ms. Suppose that the average round-trip latency between a coordinator and any replica is 10ms. Suddenly replica C is hard-killed: non-graceful shutdown, e.g. power outage. This means that other nodes are initially not aware that C is down, they must wait for the failure detector to convict C as unavailable which happens after a configurable amount of time. The current default is 20s, meaning that by default coordinators will still attempt to send requests to C for 20s after it is hard-killed. During this period the following happens: - About 2/3 of all requests - the ones which were routed to {A, C} and {B, C} - do not finish within 50ms because C does not answer. For these requests to finish, the coordinator performs a speculative retry to the third replica which finishes after ~10ms (the average round-trip latency). Thus the entire request, from the coordinator's POV, takes ~60ms. 
- Eventually (very quickly in fact - assuming there are many concurrent requests) the P99 latency rises to 60ms. - Furthermore, the requests which initially use {A, C} and {B, C} start taking more than 2/3 of all requests because they are stuck in the foreground longer than the {A, B} requests (since their latencies are higher). - These requests do not finish within 60ms. Thus coordinators perform speculative retries. Thus they finish after ~70ms. - Eventually the P99 latency rises to 70ms. - These bad requests take up an even larger portion of all requests. - These requests do not finish within 70ms. They finish after ~80ms. - Eventually the P99 latency rises to 80ms. - And so on. In metrics, we observe the following: - Latencies rise roughly linearly. They rise until they hit a certain limit; this limit comes from the fact that `T` is upper-bounded by the read request timeout parameter divided by 2. Thus if the read request timeout is `5s` and P99 latencies are `3s`, `T` will be `2.5s`, not `3s`. Thus eventually all requests will take about `2.5s + 10ms` to finish (`2.5s` until speculative retry happens, `10ms` for the last round-trip), unless the node is marked as DOWN before we reach that limit. - Throughput decreases roughly proportionally to the y = 1/x function, as expected from Little's law. Everything goes back to normal when nodes mark C as DOWN, which happens after ~20s by default as explained above. Then coordinators start routing all requests to {A, B} only. This does not happen for graceful shutdowns, where C announces to the cluster that it's shutting down before shutting down, causing other nodes to mark it as DOWN almost immediately. The root cause of the issue is a feedback loop in the metric used to calculate `T`: we perform a speculative retry after `T` -> P99 request latencies rise above `T + 10ms` -> `T` rises above `T + 10ms` -> etc. We fix the problem by changing the measurements used for calculating `T`.
Instead of measuring the entire coordinator read latency, we measure each replica request separately and take the maximum over these measurements. We only take into account the measurements for requests that actually contributed to the request's result. The previous statistic would also measure failed requests latencies. Now we measure only latencies of successful replica requests. Indeed this makes sense for the speculative retry use case; the idea behind speculative retry is that we assume that requests usually succeed within a certain time period, and we should perform the retry if they take longer than that. To measure this time period, taking failed requests into account doesn't make much sense. In the scenario above, for a request that initially goes to {A, C}, the following would happen after applying the fix: - We send the requests to A and C. - After ~10ms A responds. We record the ~10ms measurement. - After ~50ms we perform speculative retry, sending a request to B. - After ~10ms B responds. We record the ~10ms measurement. The maximum over recorded measurements is ~10ms, not ~60ms. The feedback loop is removed. Experiments show that the solution is effective: in scenarios like above, after C is killed, latencies only rise slightly by a constant amount and then maintain their level, as expected. Throughput also drops by a constant amount and maintains its level instead of continuously dropping with an asymptote at 0. Fixes #3746. Fixes #7342. Closes #8783	2021-06-13 16:19:11 +03:00
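The feedback loop, and why the new measurement breaks it, can be reproduced with a toy C++ model. Its assumptions are deliberately crude: every request in the P99 tail hits the dead replica and completes only after a speculative retry at `T` plus one round trip; `T` is capped at half the read timeout; and the per-replica P99 of successful responses is taken to be the pre-failure 50 ms. The numbers (10 ms round trip, 50 ms, 5 s timeout) come from the example above; the functions are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Toy parameters, taken from the example scenario.
struct toy_params {
    double rtt_ms = 10;            // coordinator <-> replica round trip
    double healthy_p99_ms = 50;    // P99 before the failure
    double read_timeout_ms = 5000; // T is capped at read_timeout / 2
};

// Old metric: T follows the P99 of whole coordinator read latencies, which
// for retried requests is T + rtt, so T feeds back into itself.
double next_t_old(double t, const toy_params& p) {
    return std::min(t + p.rtt_ms, p.read_timeout_ms / 2);
}

// New metric: T follows the P99 of the max over successful per-replica
// latencies. Replicas that answer are healthy, so this does not depend on T.
double next_t_new(double /*t*/, const toy_params& p) {
    return std::min(p.healthy_p99_ms, p.read_timeout_ms / 2);
}

// Apply one metric update per round, starting from the healthy P99.
template <typename Next>
double run(Next next, int rounds, const toy_params& p) {
    assert(rounds >= 0);
    double t = p.healthy_p99_ms;
    for (int i = 0; i < rounds; ++i) {
        t = next(t, p);
    }
    return t;
}
```

Under the old metric `T` climbs 50, 60, 70, 80 ms and so on until the 2.5 s cap; under the new one it stays flat, which matches the constant-offset behaviour reported in the experiments.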
Avi Kivity	d6f3a62c13	Merge 'Add option to forbid SimpleStrategy in CREATE/ALTER KEYSPACE' from Nadav Har'El This series adds a new configuration option - restrict_replication_simplestrategy - which can be used to restrict the ability to use SimpleStrategy in a CREATE KEYSPACE or ALTER KEYSPACE statement. This is part of a new effort (dubbed "safe mode") to allow an installation to restrict operations which are not recommended or dangerous (see issue #8586 for why SimpleStrategy is bad). The new restrict_replication_simplestrategy option has three values: "true", "false", and "warn": For the time being, the default is still "false", which means SimpleStrategy is not restricted, and can still be used freely. Setting a value of "true" means that SimpleStrategy is restricted - trying to create a keyspace with it will fail: cqlsh> CREATE KEYSPACE try1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': 1 }; ConfigurationException: SimpleStrategy replication class is not recommended, and forbidden by the current configuration. Please use NetworkToplogyStrategy instead. You may also override this restriction with the restrict_replication_simplestrategy=false configuration option. Trying to ALTER an existing keyspace to use SimpleStrategy will similarly fail. The value "warn" allows - like "false" - SimpleStrategy to be used, but produces a warning when used to create a keyspace. This warning appears in the CREATE/ALTER KEYSPACE statement's response (an interactive cqlsh user will see this warning), and also in Scylla's logs. For example: cqlsh> CREATE KEYSPACE try1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': 1 }; Warnings : SimpleStrategy replication class is not recommended, but was used for keyspace try1. The restrict_replication_simplestrategy configuration option can be changed to silence this warning or make it into an error.
Fixes #8586 Closes #8765 * github.com:scylladb/scylla: cql: create_keyspace_statement: move logger out of header file cql: allow restricting SimpleStrategy in ALTER KEYSPACE cql: allow restricting SimpleStrategy in CREATE KEYSPACE config: add configuration option restrict_replication_simplestrategy config: add "tri_mode_restriction" type of configurable value utils/enum_option.hh: add implicit converter to the underlying enum	2021-06-13 15:39:18 +03:00
Nadav Har'El	6f813bd3a1	cql: create_keyspace_statement: move logger out of header file Move the logger declaration from the header file into the only source file that uses it. This is just a small cleanup similar to what the previous patch did in alter_keyspace_statement.{cc,hh}. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-13 14:45:40 +03:00
Nadav Har'El	dea075c038	cql: allow restricting SimpleStrategy in ALTER KEYSPACE In the previous patch we made CREATE KEYSPACE honor the "restrict_replication_simplestrategy" option. In this patch we do the same for ALTER KEYSPACE. We use the same function check_restricted_replication_strategy() used in CREATE KEYSPACE for the logic of what to allow depending on the configuration, and what errors or warnings to generate. One of the non-self-explanatory changes in this patch is to execute(): Previously, alter_keyspace_statement inherited its execute() from schema_altering_statement. Now we need to override it to check if the operation is forbidden before running schema_altering_statement's execute() or to warn after it is run. In the previous patch we didn't need to add a new execute() for create_keyspace_statement because we already had one. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-13 14:45:40 +03:00
Nadav Har'El	b9539d7135	cql: allow restricting SimpleStrategy in CREATE KEYSPACE This patch uses the configuration option which we added in the previous patch, "restrict_replication_simplestrategy", to control whether a user can use the SimpleStrategy replication strategy in a CREATE KEYSPACE operation. The next patch will do the same for ALTER KEYSPACE. As a tri_mode_restriction, the restrict_replication_simplestrategy option has three values - "true", "false", and "warn": The value "false", which today is still the default, means that SimpleStrategy is not restricted, and can still be used freely. The value "true" means that SimpleStrategy is restricted - trying to create a keyspace with it will fail: cqlsh> CREATE KEYSPACE try1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': 1 }; ConfigurationException: SimpleStrategy replication class is not recommended, and forbidden by the current configuration. Please use NetworkToplogyStrategy instead. You may also override this restriction with the restrict_replication_simplestrategy=false configuration option. The value "warn" allows - like "false" - SimpleStrategy to be used, but produces a warning when used to create a keyspace. This warning appears in the CREATE KEYSPACE statement's response (an interactive cqlsh user will see this warning), and also in Scylla's logs. For example: cqlsh> CREATE KEYSPACE try1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': 1 }; Warnings : SimpleStrategy replication class is not recommended, but was used for keyspace try1. The restrict_replication_simplestrategy configuration option can be changed to silence this warning or make it into an error. Because we plan to use the same checks and the same error messages also for ALTER KEYSPACE (in the next patch), we encapsulate this logic in a function check_restricted_replication_strategy() which we will use for ALTER KEYSPACE as well.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-13 14:45:25 +03:00
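The three-way decision described above (silently allow, fail, or allow with a warning) can be sketched as follows. This is a hypothetical simplification, not Scylla's actual check_restricted_replication_strategy(): the enum `tri_mode`, the function `check_simple_strategy()`, and the use of `std::invalid_argument` in place of Scylla's ConfigurationException are all illustrative assumptions, and only the short class name "SimpleStrategy" is recognized.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical model of the option's three values:
//   allow <-> "false", forbid <-> "true", warn <-> "warn"
enum class tri_mode { allow, forbid, warn };

// Hypothetical sketch: throws if SimpleStrategy is forbidden, returns a
// warning to attach to the response if it should only be warned about,
// and returns std::nullopt when nothing needs to be reported.
std::optional<std::string> check_simple_strategy(const std::string& strategy_class,
                                                 const std::string& keyspace,
                                                 tri_mode mode) {
    if (strategy_class != "SimpleStrategy") {
        return std::nullopt;  // other strategies are not restricted by this option
    }
    switch (mode) {
    case tri_mode::allow:
        return std::nullopt;
    case tri_mode::forbid:
        // Stand-in for the ConfigurationException returned to the client.
        throw std::invalid_argument(
            "SimpleStrategy replication class is not recommended, and forbidden "
            "by the current configuration.");
    case tri_mode::warn:
        return "SimpleStrategy replication class is not recommended, but was used "
               "for keyspace " + keyspace + ".";
    }
    return std::nullopt;  // unreachable for valid enum values
}
```

Returning the warning instead of logging it keeps the check reusable: the caller (CREATE or ALTER KEYSPACE) decides whether to throw before execution or to attach the warning after it.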
Nadav Har'El	8a4ac6914a	config: add configuration option restrict_replication_simplestrategy This patch adds a configuration option to choose whether the SimpleStrategy replication strategy is restricted. It is a tri_mode_restriction, which can restrict this strategy (true), allow it (false), or just warn when it is used (warn). After this patch, the option exists but doesn't yet do anything. It will be used in the following two patches to restrict the CREATE KEYSPACE and ALTER KEYSPACE operations, respectively. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-13 14:45:16 +03:00
Nadav Har'El	a3d6f502ad	config: add "tri_mode_restriction" type of configurable value This patch adds a new type of configurable value for our command-line and YAML parsers - a "tri_mode_restriction" - which can be set to three values: "true", "false", or "warn". We will use this value type for many (but not all) of the restriction options that we plan to start adding in the following patches. Restriction options will allow users to ask Scylla to restrict (true), to not restrict (false) or to warn about (warn) certain dangerous or undesirable operations. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-13 14:44:20 +03:00
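A minimal sketch of parsing the textual form of such a value, assuming exactly the three spellings named in the commit ("true", "false", "warn"). The names `tri_mode_restriction` (used here as a plain enum) and `parse_tri_mode_restriction()` are hypothetical; Scylla's real config type and its parser hooks may look different.

```cpp
#include <optional>
#include <string_view>

// Hypothetical enum for the three modes; "true" forbids the operation,
// "false" allows it, "warn" allows it with a warning.
enum class tri_mode_restriction { allow, forbid, warn };

// Parse the value as it might appear on the command line or in YAML.
// Returns std::nullopt for any other spelling, so the caller can report
// a configuration error.
std::optional<tri_mode_restriction> parse_tri_mode_restriction(std::string_view s) {
    if (s == "true")  return tri_mode_restriction::forbid;
    if (s == "false") return tri_mode_restriction::allow;
    if (s == "warn")  return tri_mode_restriction::warn;
    return std::nullopt;
}
```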
Nadav Har'El	afacffc556	utils/enum_option.hh: add implicit converter to the underlying enum Add an implicit converter of the enum_option to the underlying enum it is holding. This is needed for using switch() on an enum_option. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-13 13:18:49 +03:00
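Why the converter enables switch(): a switch condition of class type is contextually converted to an integral or enumeration type, which requires a single non-explicit conversion function. The sketch below is a hypothetical, simplified wrapper templated directly on the enum; the real utils/enum_option.hh has different template parameters and also handles name-to-value mapping, which is omitted here. The enum `color` and the function `describe()` are illustrative only.

```cpp
#include <string>

// Hypothetical, simplified enum_option: just holds an enum value.
template <typename Enum>
class enum_option {
    Enum _value;
public:
    explicit enum_option(Enum v) : _value(v) {}
    // Implicit conversion to the underlying enum; this is what lets
    // an enum_option be used directly as a switch condition.
    operator Enum() const { return _value; }
};

enum class color { red, green };

std::string describe(enum_option<color> opt) {
    switch (opt) {  // converted via operator color()
    case color::red:   return "red";
    case color::green: return "green";
    }
    return "unknown";
}
```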
Avi Kivity	ec60f44b64	main: improve process file limit handling We check that the number of open files is sufficient for normal work (with lots of connections and sstables), but we can improve it a little. Systemd sets up a low file soft limit by default (so that select() doesn't break on file descriptors larger than 1023) and recommends[1] raising the soft limit to the more generous hard limit if the application doesn't use select(), as ours does not. Follow the recommendation and bump the limit. Note that this applies only to scylla started from the command line, as systemd integration already raises the soft limit. [1] http://0pointer.net/blog/file-descriptor-limits.html Closes #8756	2021-06-13 09:19:35 +03:00
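The recommendation amounts to copying the hard limit into the soft limit for RLIMIT_NOFILE. Below is a minimal POSIX sketch of that step, not Scylla's actual main.cc code (which also checks that the resulting limit is sufficient); `raise_nofile_soft_limit()` is a hypothetical name. On Linux an unprivileged process may always raise its soft limit up to the hard limit; other systems (e.g. macOS with an unlimited hard limit) may reject the call.

```cpp
#include <cstdio>
#include <sys/resource.h>

// Raise the soft limit on open file descriptors to the hard limit.
// Returns true on success, false if getrlimit()/setrlimit() failed.
bool raise_nofile_soft_limit() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        std::perror("getrlimit");
        return false;
    }
    if (lim.rlim_cur == lim.rlim_max) {
        return true;  // already at the hard limit, nothing to do
    }
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) {
        std::perror("setrlimit");
        return false;
    }
    return true;
}
```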
Tomasz Grabiec	7521301b72	Merge "raft: add tests for non-voters and fix related bugs" from Kostja Add test coverage inspired by etcd for non-voter servers, and fix issues discovered when testing.
* scylla-dev/raft-learner-test-v4:
  raft: (testing) test non-voter can vote
  raft: (testing) test receiving a confchange in a snapshot
  raft: (testing) test voter-non-voter config change loop
  raft: (testing) test non-voter doesn't start election on election timeout
  raft: (testing) test what happens when a learner gets TimeoutNow
  raft: (testing) implement a test for a leader becoming non-voter
  raft: style fix
  raft: step down as a leader if converted to a non-voter
  raft: improve configuration consistency checks
  raft: (testing) test that non-voter stays in PIPELINE mode
  raft: (testing) always return fsm_debug in create_follower()	2021-06-12 21:36:47 +03:00
Botond Dénes	cb208a56f2	docs/guides/debugging.md: expand section on libthread-db Fix a typo in enabling libthread-db debugging. Add a command-line snippet that enables libthread-db debugging on startup. Split the long wall of text about likely problems into separate per-problem subsections. Add a sub-section about a recently found Fedora bug(?) https://bugzilla.redhat.com/show_bug.cgi?id=1960867. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210603150607.378277-1-bdenes@scylladb.com>	2021-06-12 21:36:47 +03:00
Nadav Har'El	9774c146cc	cql-pytest: add test for connecting with different SSL/TLS versions This is a reproducer for issue #8827, which checks that a client that tries to connect to Scylla with an unsupported version of SSL or TLS gets the expected error alert - not some sort of unexpected EOF. Issue #8827 is still open, so this test is still xfailing. However, I verified that with a fix for this issue, the test passes. The test also prints which protocol versions worked - so it also helps check issue #8837 (about the ancient SSL protocol being allowed). Refs #8837 Refs #8827 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210610151714.1746330-1-nyh@scylladb.com>	2021-06-12 21:36:47 +03:00
Pavel Emelyanov	7b1f2d91a5	scylla-gdb: Remove maximum-request-size report The recent seastar update moved the variable again, so to have proper support for it we'd need 2 try-catch attempts and a default. Or 1 try-catch, but make sure the maintainer commits this patch AND the seastar update in one go, so that the intermediate variable doesn't creep into an intermediate commit. Or accept that the scylla-gdb test is not bisect-safe for a little while. Instead of making this complex choice, I suggest dropping the volatile variable from the script altogether. This thing is actually a constant derived from the latency goal and the io-properties.yaml file, so it can be calculated without gdb's help (unlike run-time bits like group rovers or numbers of queued/executing resources). To free developers from doing all this math by hand, there's an "ioinfo" tool that (when run with the correct options) prints the results of this math on the screen. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210610120151.1135-1-xemul@scylladb.com>	2021-06-11 19:06:43 +02:00
Michael Livshin	2bbc293e22	tests: improve error reporting of test_env::reusable_sst() Distinguish the "no such sstable" case from any reading errors. While at it, coroutinize the function. Refs #8785. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Message-Id: <20210610113304.264922-1-michael.livshin@scylladb.com>	2021-06-11 19:06:43 +02:00
Pavel Emelyanov	fbd98e6292	alternator: Move start-stop code into controller This move is not "just a move", but also includes:
- putting the whole thing into seastar::async()
- switching from locally captured dependencies to the controller's class members
- making smp_service_groups optional, because it doesn't have a default constructor and must somehow survive in the constructed controller until its start()
Also copy a few bits from main that can be generalized later:
- get_or_default() helper from main
- sharded_parameter lambda for cdc
- net family and preferred thing from main
(this also fixes the indentation broken by the previous patch) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-11 18:17:27 +03:00
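Holding a member that has no default constructor in std::optional, so the owning object can be constructed first and the member created later in start(), can be sketched like this. The classes `service_group` and `controller` below are hypothetical stand-ins for smp_service_group and the alternator controller; the real code creates the group asynchronously under Seastar, which is omitted here.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for a type without a default constructor.
class service_group {
    int _id;
public:
    explicit service_group(int id) : _id(id) {}
    int id() const { return _id; }
};

class controller {
    // Empty until start(); std::optional allows constructing the
    // controller before a service_group can be created.
    std::optional<service_group> _group;
public:
    controller() = default;
    void start(int id) { _group.emplace(id); }
    void stop() { _group.reset(); }
    bool running() const { return _group.has_value(); }
    int group_id() const {
        if (!_group) {
            throw std::logic_error("controller not started");
        }
        return _group->id();
    }
};
```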
Pavel Emelyanov	9e2ad77436	alternator: Move the whole starting code into a sched group The controller won't have the database_config at hand to get the sched group from. All other client services run the whole controller start in the needed sched group, so prepare the alternator controller for that. To make it compile (and while at it), also move up the sharded server and executor instances and the smp_service_group. All of these will migrate onto the controller in the next patch. (the indentation is deliberately left broken) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-11 18:11:02 +03:00
Pavel Emelyanov	f918a75572	alternator: Don't capture db, use cfg When .init()ing the server, one needs to provide the max_concurrent_requests_per_shard value from config. Instead of carrying the database around for it, use the db::config itself, which is at hand. All the shards share its instance anyway. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-11 18:09:16 +03:00
Pavel Emelyanov	4aad618409	alternator: Controller skeleton Add the controller class with all the needed dependencies. For now completely unused (thus a bunch of (void)-s here and there). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-11 18:08:37 +03:00
Pavel Emelyanov	316e9af234	alternator: Controller basement Add header and source file for transport- (and thrift-) like controller that'll do all the bookkeeping needed to start and stop this client service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-11 18:06:10 +03:00
Pavel Emelyanov	773d2fe2a4	alternator: Drop storage service from executor It's completely unused in it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-11 18:05:11 +03:00


26935 Commits