scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 19:46:48 +00:00

Author	SHA1	Message	Date
Piotr Sarna	8a049c9116	view: fix use-after-move when handling view update failures The code was susceptible to use-after-move if both local and remote updates were going to be sent. The whole routine for sending view updates is now rewritten to avoid use-after-move. Refs #8830 Tests: unit(release), dtest(secondary_indexes_test.py:TestSecondaryIndexes.test_remove_node_during_index_build)	2021-06-14 09:36:10 +02:00
Piotr Sarna	7cdbb7951a	db,view: explicitly move the mutation to its helper function The `apply_to_remote_endpoints` helper function used to take its `mut` parameter by reference, but then moved the value from it, which is confusing and prone to errors. Since the value is moved-from, let's pass it to the helper function as rvalue ref explicitly.	2021-06-14 09:34:40 +02:00
Piotr Sarna	88d4a66e90	db,view: pass base token by value to mutate_MV The base token is passed cross-continuations, so the current way of passing it by const reference probably only works because the token copying is cheap enough to optimize the reference out. Fix by explicitly taking the token by value.	2021-06-14 09:30:38 +02:00
Raphael S. Carvalho	846f0bd16e	sstables: Fix incremental selection with compound sstable set Incremental selection may not work properly for LCS and ICS due to a use-after-free bug in partitioned set which came into existence after compound set was introduced. The use-after-free happens because partitioned set wasn't taking into account that the next position can become the current position in the next iteration, which will be used by all selectors managed by compound set. So if the next position is freed while it is being used as the current position, subsequent selectors would find the current position freed, making them produce incorrect results. Fix this by moving ownership of next pos from incremental_selector_impl to incremental_selector, which makes it more robust as the latter knows better when the selection is done with the next pos. incremental_selector will still return ring_position_view to avoid copies. Fixes #8802. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210611130957.156712-1-raphaelsc@scylladb.com>	2021-06-13 16:45:07 +03:00
Kamil Braun	9e85921006	storage_proxy: remove a feedback loop from the speculative retry latency metric To handle a read request from a client, the coordinator node must send data and digest requests to replicas, reconcile the obtained results (by merging the obtained mutations and comparing digests), and possibly send more requests to replicas if the digests turned out to be different in order to perform read repair and preserve consistency of observed reads. In contrast to writes, where coordinators send their mutation write requests to all replicas in the replica set, for reads the coordinators send their requests only to as many replicas as is required to achieve the desired CL. For example consider RF=3 and a CL=QUORUM read. Then the coordinator sends its request to a subset of 2 nodes out of the 3 possible replicas. The choice of the 2-node subset is random; the distribution used for the random roll is affected by certain things such as the "cache hitrate" metric. The details are not that relevant for this discussion. If not all of the initially chosen replicas answer within a certain time period, the coordinator may send an additional request to one more replica, hoping that this replica helps achieve the desired CL so the entire client request succeeds. This mechanism is called "speculative retry" and is enabled by default. This time period - call it `T` - is chosen based on keyspace configuration. The default value is "99.0PERCENTILE", which means that `T` is roughly equal to the 99th percentile of the latency distribution of previous requests (or at least the most recent requests; the algorithm uses an exponential decay strategy to make old requests less relevant for the metric). 
The latencies used are the durations of whole coordinator read requests: each such duration measurement starts before the first replica request is sent and ends after the last replica request is answered, among the replica requests whose results were used for the reconciled result returned to the client (there may be more requests sent later "in the background" - they don't affect the client result and are not taken into account for the latency measurement). This strategy, however, gives an undesired effect which appears when a significant part of all requests require a speculative retry to succeed. To explain this effect it's best to consider a scenario which takes this to the extreme - where all requests require a speculative retry. Consider RF=3 and CL=QUORUM so each read request initially uses 2 replicas. Let {A, B, C} be the set of replicas. We run a uniformly distributed read workload. Initially the cluster operates normally. Roughly 1/3 of all requests go to replicas {A, B}, 1/3 go to {A, C}, and 1/3 go to {B, C}. The 99th percentile of read request latencies is 50ms. Suppose that the average round-trip latency between a coordinator and any replica is 10ms. Suddenly replica C is hard-killed: non-graceful shutdown, e.g. power outage. This means that other nodes are initially not aware that C is down, they must wait for the failure detector to convict C as unavailable which happens after a configurable amount of time. The current default is 20s, meaning that by default coordinators will still attempt to send requests to C for 20s after it is hard-killed. During this period the following happens: - About 2/3 of all requests - the ones which were routed to {A, C} and {B, C} - do not finish within 50ms because C does not answer. For these requests to finish, the coordinator performs a speculative retry to the third replica which finishes after ~10ms (the average round-trip latency). Thus the entire request, from the coordinator's POV, takes ~60ms. 
- Eventually (very quickly in fact - assuming there are many concurrent requests) the P99 latency rises to 60ms. - Furthermore, the requests which initially use {A, C} and {B, C} start taking more than 2/3 of all requests because they are stuck in the foreground longer than the {A, B} requests (since their latencies are higher). - These requests do not finish within 60ms. Thus coordinators perform speculative retries. Thus they finish after ~70ms. - Eventually the P99 latency rises to 70ms. - These bad requests take an even longer portion of all requests. - These requests do not finish within 70ms. They finish after ~80ms. - Eventually the P99 latency rises to 80ms. - And so on. In metrics, we observe the following: - Latencies rise roughly linearly. They rise until they hit a certain limit; this limit comes from the fact that `T` is upper-bounded by the read request timeout parameter divided by 2. Thus if the read request timeout is `5s` and P99 latencies are `3s`, `T` will be `2.5s`, not `3s`. Thus eventually all requests will take about `2.5s + 10ms` to finish (`2.5s` until speculative retry happens, `10ms` for the last round-trip), unless the node is marked as DOWN before we reach that limit. - Throughput decreases roughly proportionally to the y = 1/x function, as expected from Little's law. Everything goes back to normal when nodes mark C as DOWN, which happens after ~20s by default as explained above. Then coordinators start routing all requests to {A, B} only. This does not happen for graceful shutdowns, where C announces to the cluster that it's shutting down before shutting down, causing other nodes to mark it as DOWN almost immediately. The root cause of the issue is a feedback loop in the metric used to calculate `T`: we perform a speculative retry after `T` -> P99 request latencies rise above `T + 10ms` -> `T` rises above `T + 10ms` -> etc. We fix the problem by changing the measurements used for calculating `T`. 
Instead of measuring the entire coordinator read latency, we measure each replica request separately and take the maximum over these measurements. We only take into account the measurements for requests that actually contributed to the request's result. The previous statistic would also measure failed requests latencies. Now we measure only latencies of successful replica requests. Indeed this makes sense for the speculative retry use case; the idea behind speculative retry is that we assume that requests usually succeed within a certain time period, and we should perform the retry if they take longer than that. To measure this time period, taking failed requests into account doesn't make much sense. In the scenario above, for a request that initially goes to {A, C}, the following would happen after applying the fix: - We send the requests to A and C. - After ~10ms A responds. We record the ~10ms measurement. - After ~50ms we perform speculative retry, sending a request to B. - After ~10ms B responds. We record the ~10ms measurement. The maximum over recorded measurements is ~10ms, not ~60ms. The feedback loop is removed. Experiments show that the solution is effective: in scenarios like above, after C is killed, latencies only rise slightly by a constant amount and then maintain their level, as expected. Throughput also drops by a constant amount and maintains its level instead of continuously dropping with an asymptote at 0. Fixes #3746. Fixes #7342. Closes #8783	2021-06-13 16:19:11 +03:00
Avi Kivity	d6f3a62c13	Merge 'Add option to forbid SimpleStrategy in CREATE/ALTER KEYSPACE' from Nadav Har'El This series adds a new configuration option - restrict_replication_simplestrategy - which can be used to restrict the ability to use SimpleStrategy in a CREATE KEYSPACE or ALTER KEYSPACE statement. This is part of a new effort (dubbed "safe mode") to allow an installation to restrict operations which are un-recommended or dangerous (see issue #8586 for why SimpleStrategy is bad). The new restrict_replication_simplestrategy option has three values: "true", "false", and "warn": For the time being, the default is still "false", which means SimpleStrategy is not restricted, and can still be used freely. Setting a value of "true" means that SimpleStrategy is restricted - trying to create a keyspace with it will fail: cqlsh> CREATE KEYSPACE try1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': 1 }; ConfigurationException: SimpleStrategy replication class is not recommended, and forbidden by the current configuration. Please use NetworkToplogyStrategy instead. You may also override this restriction with the restrict_replication_simplestrategy=false configuration option. Trying to ALTER an existing keyspace to use SimpleStrategy will similarly fail. The value "warn" allows - like "false" - SimpleStrategy to be used, but produces a warning when used to create a keyspace. This warning appears in the CREATE/ALTER KEYSPACE statement's response (an interactive cqlsh user will see this warning), and also in Scylla's logs. For example: cqlsh> CREATE KEYSPACE try1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': 1 }; Warnings : SimpleStrategy replication class is not recommended, but was used for keyspace try1. The restrict_replication_simplestrategy configuration option can be changed to silence this warning or make it into an error. 
Fixes #8586 Closes #8765 * github.com:scylladb/scylla: cql: create_keyspace_statement: move logger out of header file cql: allow restricting SimpleStrategy in ALTER KEYSPACE cql: allow restricting SimpleStrategy in CREATE KEYSPACE config: add configuration option restrict_replication_simplestrategy config: add "tri_mode_restriction" type of configurable value utils/enum_option.hh: add implicit converter to the underlying enum	2021-06-13 15:39:18 +03:00
Nadav Har'El	6f813bd3a1	cql: create_keyspace_statement: move logger out of header file Move the logger declaration from the header file into the only source file that uses it. This is just a small cleanup similar to what the previous patch did in alter_keyspace_statement.{cc,hh}. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-13 14:45:40 +03:00
Nadav Har'El	dea075c038	cql: allow restricting SimpleStrategy in ALTER KEYSPACE In the previous patch we made CREATE KEYSPACE honor the "restrict_replication_simplestrategy" option. In this patch we do the same for ALTER KEYSPACE. We use the same function check_restricted_replication_strategy() used in CREATE KEYSPACE for the logic of what to allow depending on the configuration, and what errors or warnings to generate. One of the non-self-explanatory changes in this patch is to execute(): Previously, alter_keyspace_statement inherited its execute() from schema_altering_statement. Now we need to override it to check if the operation is forbidden before running schema_altering_statement's execute() or to warn after it is run. In the previous patch we didn't need to add a new execute() for create_keyspace_statement because we already had one. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-13 14:45:40 +03:00
Nadav Har'El	b9539d7135	cql: allow restricting SimpleStrategy in CREATE KEYSPACE This patch uses the configuration option which we added in the previous patch, "restrict_replication_simplestrategy", to control whether a user can use the SimpleStrategy replication strategy in a CREATE KEYSPACE operation. The next patch will do the same for ALTER KEYSPACE. As a tri_mode_restriction, the restrict_replication_simplestrategy option has three values - "true", "false", and "warn": The value "false", which today is still the default, means that SimpleStrategy is not restricted, and can still be used freely. The value "true" means that SimpleStrategy is restricted - trying to create a keyspace with it will fail: cqlsh> CREATE KEYSPACE try1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': 1 }; ConfigurationException: SimpleStrategy replication class is not recommended, and forbidden by the current configuration. Please use NetworkToplogyStrategy instead. You may also override this restriction with the restrict_replication_simplestrategy=false configuration option. The value "warn" allows - like "false" - SimpleStrategy to be used, but produces a warning when used to create a keyspace. This warning appears in the CREATE KEYSPACE statement's response (an interactive cqlsh user will see this warning), and also in Scylla's logs. For example: cqlsh> CREATE KEYSPACE try1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': 1 }; Warnings : SimpleStrategy replication class is not recommended, but was used for keyspace try1. The restrict_replication_simplestrategy configuration option can be changed to silence this warning or make it into an error. Because we plan to use the same checks and the same error messages also for ALTER KEYSPACE (in the next patch), we encapsulate this logic in a function check_restricted_replication_strategy() which we will use for ALTER KEYSPACE as well. 
Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-13 14:45:25 +03:00
Nadav Har'El	8a4ac6914a	config: add configuration option restrict_replication_simplestrategy This patch adds a configuration option to choose whether the SimpleStrategy replication strategy is restricted. It is a tri_mode_restriction, allowing to restrict this strategy (true), to allow it (false), or to just warn when it is used (warn). After this patch, the option exists but doesn't yet do anything. It will be used in the following two patches to restrict the CREATE KEYSPACE and ALTER KEYSPACE operations, respectively. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-13 14:45:16 +03:00
Nadav Har'El	a3d6f502ad	config: add "tri_mode_restriction" type of configurable value This patch adds a new type of configurable value for our command-line and YAML parsers - a "tri_mode_restriction" - which can be set to three values: "true", "false", or "warn". We will use this value type for many (but not all) of the restriction options that we plan to start adding in the following patches. Restriction options will allow users to ask Scylla to restrict (true), to not restrict (false) or to warn about (warn) certain dangerous or undesirable operations. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-13 14:44:20 +03:00
Nadav Har'El	afacffc556	utils/enum_option.hh: add implicit converter to the underlying enum Add an implicit converter of the enum_option to the underlying enum it is holding. This is needed for using switch() on an enum_option. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-13 13:18:49 +03:00
Avi Kivity	ec60f44b64	main: improve process file limit handling We check that the number of open files is sufficient for normal work (with lots of connections and sstables), but we can improve it a little. Systemd sets up a low file soft limit by default (so that select() doesn't break on file descriptors larger than 1023) and recommends[1] raising the soft limit to the more generous hard limit if the application doesn't use select(), as ours does not. Follow the recommendation and bump the limit. Note that this applies only to scylla started from the command line, as systemd integration already raises the soft limit. [1] http://0pointer.net/blog/file-descriptor-limits.html Closes #8756	2021-06-13 09:19:35 +03:00
Tomasz Grabiec	7521301b72	Merge "raft: add tests for non-voters and fix related bugs" from Kostja Add test coverage inspired by etcd for non-voter servers, and fix issues discovered when testing. * scylla-dev/raft-learner-test-v4: raft: (testing) test non-voter can vote raft: (testing) test receiving a confchange in a snapshot raft: (testing) test voter-non-voter config change loop raft: (testing) test non-voter doesn't start election on election timeout raft: (testing) test what happens when a learner gets TimeoutNow raft: (testing) implement a test for a leader becoming non-voter raft: style fix raft: step down as a leader if converted to a non-voter raft: improve configuration consistency checks raft: (testing) test that non-voter stays in PIPELINE mode raft: (testing) always return fsm_debug in create_follower()	2021-06-12 21:36:47 +03:00
Botond Dénes	cb208a56f2	docs/guides/debugging.md: expand section on libthread-db Fix a typo in enabling libthread-db debugging. Add command line snippet which can enable libthread-db debugging on startup. Split the long wall of text about likely problems into separate per-problem subsections. Add sub-section about recently found Fedora bug(?) https://bugzilla.redhat.com/show_bug.cgi?id=1960867. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210603150607.378277-1-bdenes@scylladb.com>	2021-06-12 21:36:47 +03:00
Nadav Har'El	9774c146cc	cql-pytest: add test for connecting with different SSL/TLS versions This is a reproducer for issue #8827, that checks that a client which tries to connect to Scylla with an unsupported version of SSL or TLS gets the expected error alert - not some sort of unexpected EOF. Issue #8827 is still open, so this test is still xfailing. However, I verified that with a fix for this issue, the test passes. The test also prints which protocol versions worked - so it also helps checking issue #8837 (about the ancient SSL protocol being allowed). Refs #8837 Refs #8827 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210610151714.1746330-1-nyh@scylladb.com>	2021-06-12 21:36:47 +03:00
Pavel Emelyanov	7b1f2d91a5	scylla-gdb: Remove maximum-request-size report The recent seastar update moved the variable again, so to have proper support for it we'd need to have 2 try-catch attempts and a default. Or 1 try-catch, but make sure the maintainer commits this patch AND the seastar update in one go, so that the intermediate variable doesn't creep into an intermediate commit. Or accept that the scylla-gdb test is not entirely bisect-safe. Instead of making this complex choice I suggest just dropping the volatile variable from the script altogether. This thing is actually a constant derived from the latency goal and io-properties.yaml file, so it can be calculated without gdb help (unlike run-time bits like group rovers or numbers of queued/executing resources). To free developers from doing all this math by hand there's an "ioinfo" tool that (when run with correct options) prints the results of this math on the screen. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210610120151.1135-1-xemul@scylladb.com>	2021-06-11 19:06:43 +02:00
Michael Livshin	2bbc293e22	tests: improve error reporting of test_env::reusable_sst() Distinguish the "no such sstable" case from any reading errors. While at it, coroutinize the function. Refs #8785. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Message-Id: <20210610113304.264922-1-michael.livshin@scylladb.com>	2021-06-11 19:06:43 +02:00
Konstantin Osipov	2be8a73c34	raft: (testing) test non-voter can vote When a non-voter is asked for a vote, it must vote to preserve liveness. In Raft, servers respond to messages without consulting their current configuration, and the non-voter may not have the latest configuration when it is asked to vote.	2021-06-11 17:16:57 +03:00
Konstantin Osipov	eaf32f2c3c	raft: (testing) test receiving a confchange in a snapshot	2021-06-11 17:16:56 +03:00
Konstantin Osipov	d08ad76c24	raft: (testing) test voter-non-voter config change loop	2021-06-11 17:16:55 +03:00
Konstantin Osipov	6e4619fe87	raft: (testing) test non-voter doesn't start election on election timeout	2021-06-11 17:16:55 +03:00
Konstantin Osipov	c8ae13a392	raft: (testing) test what happens when a learner gets TimeoutNow Once learner receives TimeoutNow it becomes a candidate, discovers it can't vote, doesn't increase its term and converts back to a follower. Once entries arrive from a new leader it updates its term.	2021-06-11 17:16:55 +03:00
Konstantin Osipov	a972269630	raft: (testing) implement a test for a leader becoming non-voter	2021-06-11 17:16:55 +03:00
Konstantin Osipov	ba046ed1ab	raft: style fix	2021-06-11 17:16:54 +03:00
Konstantin Osipov	b0a1ebc635	raft: step down as a leader if converted to a non-voter If the leader becomes a non-voter after a configuration change, step down and become a follower. Non-voting members are an extension to Raft, so the protocol spec does not define whether they can be leaders. I can not think of a reason why they can't, yet I also can not think of a reason why it's useful, so let's forbid this. We already do not allow non-voters to become candidates, and they ignore timeout_now RPC (leadership transfer), so they already can not be elected.	2021-06-11 17:16:50 +03:00
Konstantin Osipov	684e0d2a8c	raft: improve configuration consistency checks Isolate the checks for configuration transitions in a static function, to be able to unit test outside class server. Split the condition of transitioning to an empty configuration from the condition of transitioning into a configuration with no voters, to produce more user-friendly error messages. Allow to transfer leadership in a configuration when the only voter is the leader itself. This would be equivalent to syncing the leader log with the learner and converting the leader to the follower itself. This is safe, since the leader will re-elect itself quickly after an election timeout, and may be used to do a rolling restart of a cluster with only one voter. A test case follows.	2021-06-11 17:16:47 +03:00
Konstantin Osipov	3e6fd5705b	raft: (testing) test that non-voter stays in PIPELINE mode Test that configuration changes preserve PIPELINE mode.	2021-06-11 17:07:39 +03:00
Konstantin Osipov	1dfe946c91	raft: (testing) always return fsm_debug in create_follower() create_follower() is a test helper, so it's OK to return a test-enabled FSM from it. This will be used in a subsequent patch/test case.	2021-06-11 12:24:43 +03:00
Alejo Sanchez	ff34a6515d	raft: replication test: fix elect_new_leader Recently, the logic of elect_new_leader was changed to allow the old leader to vote for the new candidate. But the implementation is wrong as it re-connects the old leader in all cases disregarding if the nodes were already disconnected. Check if both old leader and the requested new leader are connected first and only if it is the case then the old leader can participate in the election. There were occasional hangs in the loop of elect_new_leader because other nodes besides the candidate were ticked. This patch fixes the loop by removing ticks inside of it. The loop is needed to handle prevote corner cases (e.g. 2 nodes). While there, also wait log on all followers to avoid a previously dropped leader to be a dueling candidate. And update _leader only if it was changed. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20210609193945.910592-3-alejo.sanchez@scylladb.com>	2021-06-10 12:36:25 +02:00
Alejo Sanchez	add12d801d	raft: log ignored prevote Add a log line for ignored prevote. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20210609193945.910592-2-alejo.sanchez@scylladb.com>	2021-06-10 12:33:34 +02:00
Benny Halevy	e0622ef461	compaction_manager: stop_ongoing_compactions: print reason for stopping Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210610084704.388215-1-bhalevy@scylladb.com>	2021-06-10 11:52:57 +03:00
Piotr Sarna	7506f44c77	cql3: use existing constant for max result in indexed statements Original code which introduced enforcing page limits for indexed statements created a new constant for max result size in bytes. Botond reported that we already have such a constant, so it's now used instead of reinventing it from scratch. Closes #8839	2021-06-10 11:08:54 +03:00
Nadav Har'El	b26fcf5567	test/alternator: increase timeouts in test_tracing.py The query tracing tests in test/alternator's test_tracing.py had one timeout of 30 seconds to find the trace, and one unclearly-coded timeout for finding the right content for the trace. We recently saw both timeouts exceeded in tests, but only rarely and only in debug mode, in a run 100 times slower than normal. This patch increases both timeouts to 100 seconds. Whatever happens then, we win: If the test stops failing, we know the new timeout was enough. If the test continues to fail, we will be able to conclude that we have a real bug - e.g., perhaps one of the LWT operations has a bug causing it to hang indefinitely. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210608205026.1600037-1-nyh@scylladb.com>	2021-06-10 09:19:01 +03:00
Benny Halevy	8ecc626c15	queue_reader_handle: mark copy constructor noexcept It is trivially so, as std::exception_ptr is nothrow default constructible. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210609135925.270883-2-bhalevy@scylladb.com>	2021-06-09 20:09:01 +03:00
Benny Halevy	3100cdcc65	queue_reader_handle: move-construct also _ex We're only moving the other reader without the other's exception (as it may already be abandoned or aborted). While at it, mark the constructor noexcept. Fixes #8833 Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210609135925.270883-1-bhalevy@scylladb.com>	2021-06-09 20:09:01 +03:00
Pavel Emelyanov	990db016e9	transport: Untie transport and database Both controller and server only need database to get config from. Since controller creation only happens in main() code which has the config itself, we may remove database mentioning from transport. Previous attempt was not to carry the config down to the server level, but it stepped on an updateable_value landmine -- the u._v. isn't copyable cross-shard (despite the docs) and to properly initialize server's max_concurrent_requests we need the config's named_value member itself. The db::config that flies through the stack is const reference, but its named_values do not get copied along the way -- the updateable value accepts both references and const references to subscribe on. tests: start-stop in debug mode Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210607135656.18522-1-xemul@scylladb.com>	2021-06-09 20:04:12 +03:00
Eliran Sinvani	9bfb2754eb	dist: rpm: Add specific versioning and python3 dependency The Red Hat packages were missing two things: first, the metapackage did not depend on the python3 package at all, and second, the scylla-server package dependencies didn't contain a version as part of the dependency, which can cause problems during upgrade. Doing both of the things listed here is a bit of an overkill, as either one of them separately would solve the problem described in #XXXX, but both should be applied in order to express the correct concept. Fixes #8829 Closes #8832	2021-06-09 20:02:43 +03:00
Asias He	0665d9c346	gossip: Handle nodes removed from live endpoints directly When a node is removed from the _live_endpoints list directly, e.g., a node being decommissioned, it is possible the node might not be marked as down in the gossiper::failure_detector_loop_for_node loop before the loop exits. When the gossiper::failure_detector_loop loop starts again, the node will not be considered because it is not present in the _live_endpoints list any more. As a result, the node will not be marked as down through the gossiper::failure_detector_loop_for_node loop. To fix, we mark the nodes that are removed from the _live_endpoints list as down in the gossiper::failure_detector_loop loop. Fixes #8712 Closes #8770	2021-06-09 15:02:25 +02:00
Tomasz Grabiec	419ee84d86	Merge "sstable: validate first and last keys ordering" from Benny In #8772, an assert validating first token <= last token failed in leveled_manifest::overlapping. It is unclear how we got to that state, so add validation in sstable::set_first_and_last_keys() that the to-be-set first and last keys are well ordered. Otherwise, throw malformed_sstable_exception. set_first_and_last_keys is called both on the write path from the sstable writer before the sstable is sealed, and on the open/load path via update_info_for_opened_data(). This series also fixes issues with unit tests with regards to first/last keys so they won't fail the validation. Refs #8772 Test: unit(dev) DTest: next-gating(dev), materialized_views_test:TestMaterializedViews.interrupt_build_process_and_resharding_half_to_max_test(debug) * tag 'validate-first-and-last-keys-ordering-v1': sstable: validate first and last keys ordering test: lib: reusable_sst: save unexpected errors test: sstable_datafile_test: stcs_reshape_test: use token_generation_for_current_shard test: sstable_test: define primary key in schema for compressed sstable	2021-06-09 14:43:02 +02:00
Avi Kivity	a57d8eef49	Merge 'streaming: make_streaming_consumer: close reader on errors' from Benny Halevy Currently, if e.g. find_column_family throws an error, as seen in #8776 when the table was dropped during repair, the reader is not closed. Use a coroutine to simplify error handling and close the reader if an exception is caught. Also, catch an error inside the lambda passed to make_interposer_consumer when making the shared_sstable for streaming, and close the reader there and return an exceptional future early, since the reader will not be moved to sst->write_components, which assumes ownership over it and closes it in all cases. Fixes #8776 Test: unit(dev) DTest: repair_additional_test.py:RepairAdditionalTest.repair_while_table_is_dropped_test (dev, debug) w/ https://github.com/scylladb/scylla/pull/8635#issuecomment-856661138 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #8782 * github.com:scylladb/scylla: streaming: make_streaming_consumer: close reader on errors streaming: make_streaming_consumer: coroutinize returned function	2021-06-09 15:02:36 +03:00
Tomasz Grabiec	ce7a404f17	Merge "Cleanups/refactoring for Raft Group 0" from Kostja * scylla-dev/raft-group-0-part-1-rebase: raft: (service) pass Raft service into storage_service raft: (service) add comments for boot steps raft: add ordering for raft::server_address based on id raft: (internal) simplify construction of tagged_id raft: (internal) tagged_id minor improvements	2021-06-09 10:48:05 +02:00
Avi Kivity	d2157dfea7	Merge 'locator: token_metadata: simplify `tokens_iterator`' from Michał Chojnowski `ring_range()`/`tokens_iterator` are more complicated than they need to be. The `include_min` parameter is not used anywhere, and `tokens_iterator` is pimplified without a good reason. Simplify that. Closes #8805 * github.com:scylladb/scylla: locator: token_metadata: depimplify tokens_iterator locator: token_metadata: remove _ring_pos from tokens_iterator_impl locator: token_metadata: remove tokens_end() locator: token_metadata: remove `include_min` from tokens_iterator_impl locator: token_metadata: remove the `include_min` parameter from `ring_range()`	2021-06-08 15:42:41 +03:00
Konstantin Osipov	267a8e99ad	raft: (service) pass Raft service into storage_service Raft group 0 initialization and configuration changes should be integrated with Scylla cluster assembly, happening when starting the storage service and joining the cluster. Prepare for this. Since Raft service depends on query processor, and query processor depends on storage service, to break a dependency loop split Raft initialization into two steps: starting an under-constructed instance of "sharded" Raft service, accepting an under-constructed instance of "sharded" query_processor, and then passed into storage service start function, and then the local state of Raft groups from system tables once query processor starts. Consistently abbreviate raft_services instance raft_svcs, as is the convention at Scylla. Update the tests.	2021-06-08 14:52:32 +03:00
Konstantin Osipov	959bd21cdb	raft: (service) add comments for boot steps	2021-06-08 14:52:32 +03:00
Konstantin Osipov	b81580f3c6	raft: add ordering for raft::server_address based on id	2021-06-08 14:52:32 +03:00
Konstantin Osipov	d42d5aee8c	raft: (internal) simplify construction of tagged_id Make it easy to construct tagged_id from UUID.	2021-06-08 14:52:32 +03:00
Konstantin Osipov	c9a23e9b8a	raft: (internal) tagged_id minor improvements Introduce a syntax helper tagged_id::create_random_id(), used to create a new Raft server or group id. Provide a default ordering for tagged ids, for use in Raft leader discovery, which selects the smallest id for leader.	2021-06-08 14:52:32 +03:00
Benny Halevy	5a8531c4c8	repair: get_sharder_for_tables: throw no_such_column_family Instead of std::runtime_error with a message that resembles no_such_column_family, throw a no_such_column_family given the keyspace and table uuid. The latter can be explicitly caught and handled if needed. Refs #8612 Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210608113605.91292-1-bhalevy@scylladb.com>	2021-06-08 14:45:44 +03:00
Nadav Har'El	355dbf2140	test/cql-pytest: option for running the tests over SSL This patch adds a "--ssl" option to test/cql-pytest's pytest, as well as to the run script test/cql-pytest/run. When "test/cql-pytest/run --ssl" is used, Scylla is started listening for encrypted connections on its standard port (9042) - using a temporary unsigned certificate. Then, the individual tests connect to this encrypted port using TLSv1.2 (Scylla doesn't support earlier versions of SSL) instead of TCP. This "--ssl" feature allows writing tests which stress various aspects of the connection (e.g., oversized requests - see PR #8800), and then running those tests in both TCP and SSL modes. Fixes #8811 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210607200329.1536234-1-nyh@scylladb.com>	2021-06-08 11:43:20 +02:00
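
Note on commit 9e85921006 (speculative retry latency metric): the feedback loop described in that message can be reproduced with a few lines of arithmetic. The sketch below is a toy model in Python, not Scylla code. All function names are hypothetical, and it assumes, as the commit message does, that P99 quickly converges to the latency of the requests that needed a retry and that `T` is capped at half the read timeout.

```python
def old_measurement(threshold_ms, rtt_ms):
    """Whole coordinator read latency when one initial replica never answers:
    wait `threshold_ms` for the speculative retry, then one more round trip."""
    return threshold_ms + rtt_ms


def new_measurement(successful_replica_latencies_ms):
    """Maximum over the replica requests that contributed to the result.
    Requests to the dead replica never succeed, so they are not recorded."""
    return max(successful_replica_latencies_ms)


def simulate_old_threshold(initial_ms, rtt_ms, read_timeout_ms, rounds):
    """Iterate T -> min(T + rtt, timeout / 2), the loop the commit removes.
    Assumes P99 tracks the slow requests (a simplification)."""
    cap = read_timeout_ms / 2
    threshold = initial_ms
    history = [threshold]
    for _ in range(rounds):
        threshold = min(old_measurement(threshold, rtt_ms), cap)
        history.append(threshold)
    return history


if __name__ == "__main__":
    # Old metric: T climbs by one round trip per step (50, 60, 70, 80 ms).
    print(simulate_old_threshold(50, 10, 5000, 3))
    # New metric: A and B each answer in ~10 ms, so the sample stays ~10 ms.
    print(new_measurement([10, 10]))
```

With the old measurement each sample depends on the previous `T`, so the threshold ratchets upward until it reaches the cap. With the new one the sample depends only on how long live replicas take to answer, so there is nothing for the loop to feed on.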

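
Note on commits a3d6f502ad, b9539d7135 and dea075c038 (tri_mode_restriction): the three-valued behaviour those messages describe is easy to state as code. This is a minimal Python sketch of the decision only, not the C++ implementation. The function name is taken from the log, but its Python signature, the enum class and the exception class here are assumptions made for illustration, and the messages are shortened from the cqlsh output quoted in the log.

```python
from enum import Enum


class TriModeRestriction(Enum):
    """The three values described for a restriction option."""
    FALSE = "false"  # operation is allowed
    TRUE = "true"    # operation is rejected
    WARN = "warn"    # operation is allowed, with a warning


class ConfigurationException(Exception):
    """Stand-in for the error an interactive cqlsh user would see."""


def check_restricted_replication_strategy(strategy_class, keyspace, mode):
    """Return a list of warnings, or raise if SimpleStrategy is forbidden.

    Hypothetical signature; only the function name appears in the log.
    """
    if strategy_class != "SimpleStrategy" or mode is TriModeRestriction.FALSE:
        return []
    if mode is TriModeRestriction.TRUE:
        raise ConfigurationException(
            "SimpleStrategy replication class is not recommended, "
            "and forbidden by the current configuration."
        )
    # Remaining case is WARN: allow the statement but report a warning.
    return [
        "SimpleStrategy replication class is not recommended, "
        f"but was used for keyspace {keyspace}."
    ]


if __name__ == "__main__":
    mode = TriModeRestriction("warn")  # parse the configured string value
    print(check_restricted_replication_strategy("SimpleStrategy", "try1", mode))
```

Keeping the check in one function mirrors the reason given in the log: CREATE KEYSPACE and ALTER KEYSPACE can then share the same rules and the same messages.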

26903 Commits