scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 17:10:35 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	5ecbc33be5	database.*: Remove unused headers The database.hh is the central recursive-headers knot -- it has ~50 includes. This patch leaves only 34 (it remains the champion though). Similar thing for database.cc. Both changes help the latter compile ~4% faster :) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210414183107.30374-1-xemul@scylladb.com>	2021-04-18 14:03:17 +03:00
Nadav Har'El	4cf21f3a0f	cql-pytest: update run-cassandra script for Java 11 This patch fixes cql-pytest/run-cassandra to work on systems which default to Java 11, including Fedora 33. Recent versions of Cassandra can run on Java 11 fine, but require a bunch of weird JVM options to work around its JPMS (Java Platform Module System) feature. Cassandra's start scripts require these options to be listed in conf/jvm11-server.options, which is read by the startup script cassandra.in.sh. Because our "run-cassandra" builds its own "conf" directory, we need to create a jvm11-server.options file in that directory. This is ugly, but unfortunately necessary if cql-pytest/run-cassandra is to run on systems defaulting to Java 11. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210406220039.195796-1-nyh@scylladb.com>	2021-04-14 13:16:00 +02:00
Asias He	9ea57dff21	gossip: Relax failure detector update We currently only update the failure detector for a node when a higher version of application state is received. Since gossip syn messages do not contain application state, we do not update the failure detector upon receiving gossip syn messages, even if a message received from a peer node implies that the peer is alive. This patch relaxes the failure detector update rule to update the failure detector for the sender of gossip messages directly. Refs #8296 Closes #8476	2021-04-14 13:16:00 +02:00
Tomasz Grabiec	320f6bf220	Merge 'test: perf: perf_simple_query: collect allocation and task statistics' from Avi Kivity Calculate and display the number of memory allocations and tasks executed per operation. Sample results (--smp 1): 180022.46 tps (90 allocs/op, 20 tasks/op) 178963.44 tps (90 allocs/op, 20 tasks/op) 178702.41 tps (90 allocs/op, 20 tasks/op) 177679.74 tps (90 allocs/op, 20 tasks/op) 179539.36 tps (90 allocs/op, 20 tasks/op) median 178963.44 tps (90 allocs/op, 20 tasks/op) median absolute deviation: 575.92 maximum: 180022.46 minimum: 177679.74 This allows less noisy tracking of how some changes impact performance. Closes #8425 * github.com:scylladb/scylla: test: perf: perf_simple_query: collect allocation and task statistics perf: deinline some functions in perf.hh	2021-04-14 13:16:00 +02:00
Kamil Braun	5c7ed7a83f	time_series_sstable_set: return partition start if some sstables were ck-filtered out When a particular partition exists in at least one sstable, the cache expects any single-partition query to this partition to return a `partition_start` fragment, even if the result is empty. In `time_series_sstable_set::create_single_key_sstable_reader` it could happen that all sstables containing data for the given query get filtered out and only sstables without the relevant partition are left, resulting in a reader which immediately returns end-of-stream (while it should return a `partition_start` and if not in forwarding mode, a `partition_end`). This commit fixes that. We do it by extending the reader queue (used by the clustering reader merger) with a `dummy_reader` which will be returned by the queue as the very first reader. This reader only emits a `partition_start` and, if not in forwarding mode, a `partition_end` fragment. Fixes #8447. Closes #8448	2021-04-14 13:16:00 +02:00
Calle Wilund	03590c8254	commitlog_test: Add test for deadlock in shutdown w. segment wait Refs #8438 Ensures shutting down (well behaved) works even if an allocating path is stuck waiting for a new segment - i.e. other aspect of Closes #8475	2021-04-14 13:16:00 +02:00
Avi Kivity	b756693e64	Merge "mutation_query: move query methods into table" from Botond " These methods are generic ways to query a mutation source. At least they used to be, but nowadays they are pretty specific to how tables are queried -- they use a querier cache to lookup queriers from and save them into. With the coming changes to how permits are obtained, they are about to get even more specific to tables. Instead of forcing the genericity and keep adding new parameters, this patchset bites the bullet and moves them to table. `data_query()` is inlined into `table::query()`, while `mutation_query()` is replaced with `table::mutation_query()`. The only other users besides table are tests and they are adjusted to use similarly named local methods that just combine the right querier with the right result builder. This combination is what the tests really want to test, as this is also what is used by the table methods behind the scenes. Tests: unit(release, debug) " * 'mutation-query-move-query-methods-into-table/v1' of https://github.com/denesb/scylla: mutation_query: remove now unused mutation_query() test: mutation_query_test: use local mutation_query() implementation database: mutation_query(): use table::mutation_query() table: add mutation_query() query: remove the now unused data_query() test: mutation_query_test: use local data_query() implementation table: query(): inline data_query() code into query() table: make query() a coroutine	2021-04-14 13:15:59 +02:00
Avi Kivity	e3db889057	Merge 'Introduce service levels' from Piotr Sarna This series introduces service level syntax borrowed from https://docs.scylladb.com/using-scylla/workload-prioritization/ , but without workload prioritization itself - just for the sake of using identical syntax to provide different parameters later. The new parameters may include: * per-service-level timeouts * oltp/olap declaration, which may change the way Scylla treats long requests - e.g. time them out (the oltp way) or keep them sustained with empty pages (the olap way) Refs #7617 Closes #7867 * github.com:scylladb/scylla: transport: initialize query state with service level controller main: add initializing service level data accessor service: make enable_shared_from_this inheritance public cql3: add SERVICE LEVEL syntax (without an underscore) unit test: Add unit test for per user sla syntax cql: Add support for service level cql queries auth: Add service_level resource for supporting in authorization of cql service_level cql: Support accessing service_level_controller from query state instantiate and initialize the service_level_controller qos: Add a standard implementation for service level data accessor qos: add waiting for the updater future service/qos: adding service level controller service_levels: Add documentation for distributed tables service/qos: adding service level table to the distributed keyspace service/qos: add common definitions auth: add support for role attributes	2021-04-12 17:34:43 +03:00
Eliran Sinvani	144fe02c23	unit test: Add unit test for per user sla syntax This commit adds the infrastructure needed to test per user sla, more specifically, a service level accessor that triggers the update_service_levels_from_distributed_data function upon any change to the distributed sla data. A test was added that indirectly consumes this infrastructure by changing the distributed service level data with cql queries. Message-Id: <23b2211e409446c4f4e3e57b00f78d9ff75fc978.1609249294.git.sarna@scylladb.com>	2021-04-12 16:31:26 +02:00
Eliran Sinvani	a88929da15	auth: Add service_level resource for supporting in authorization of cql service_level queries In order to be able to manage service_level configuration one must be authorized to do so, or be a superuser. This commit adds the support for service_levels resource. Since service_levels are relative, reconfiguring one service level is not localized to that service level and will affect the QOS for all of the service levels, so there is not much sense in granting permissions to manage individual service_levels. This is why only a root resource named service_levels, which represents all service levels, is used. This commit also implements the unit test additions for the newly introduced resource. Message-Id: <81ab16fa813b61be117155feea405da6266921e3.1609237687.git.sarna@scylladb.com>	2021-04-12 16:01:04 +02:00
Ivan Prisyazhnyy	0836efd830	tracing: test/boost/tracing: fix use after free fixes AddressSanitizer: stack-buffer-underflow on address 0x7ffd9a375820 at pc 0x555ac9721b4e bp 0x7ffd9a374e70 sp 0x7ffd9a374620 Backend registry holds a unique pointer to the backend implementation that must outlive the whole tracing lifetime until the shutdown call. So it must be captured/moved before the program exits its scope by passing out the lambda chain. Regarding deletion of the default destructor: moving an object requires a move constructor (for do_with) that is not implicitly provided if there is a user-defined destructor, even though its implementation is defaulted. Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Closes #8461	2021-04-12 16:44:07 +03:00
Kamil Braun	7ffb0d826b	clustering_order_reader_merger: handle empty readers The merger could return end-of-stream if some (but not all) of the underlying readers were empty (i.e. not even returning a `partition_start`). This could happen in places where it was used (`time_series_sstable_set::create_single_key_sstable_reader`) if we opened an sstable which did not have the queried partition but passed all the filters (specifically, the bloom filter returned a false positive for this sstable). The commit also extends the random tests for the merger to include empty readers and adds an explicit test case that catches this bug (in a limited scope: when we merge a single empty reader). It also modifies `test_twcs_single_key_reader_filtering` (regression test for #8432) because the time where the clustering key filter is invoked changes (some invocations move from the constructor of the merger to operator()). I checked manually that it still catches the bug when I reintroduce it. Fixes #8445. Closes #8446	2021-04-12 10:34:52 +03:00
Nadav Har'El	2932f20b40	cql-pytest: translate Cassandra's reproducers for issue #2963 This is a translation of Cassandra's CQL unit test source file validation/entities/SecondaryIndexOnStaticColumnTest.java into our cql-pytest framework. This test file checks various features of indexing (with secondary index) static rows. All these tests pass on Cassandra, but fail on Scylla because of issue #2963 - we do not yet support indexing of a static row. The failing tests currently fail as soon as they try to create the index, with the message: "Indexing static columns is not implemented yet." Refs #2963. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210411153014.311090-1-nyh@scylladb.com>	2021-04-12 08:11:35 +02:00
Nadav Har'El	989589b570	test/cql-pytest,alternator,redis: avoid an annoying warning This patch avoids an annoying warning Warning: Unknown config ini key: flake8-ignore when running one of the pytest-based test projects (cql-pytest, alternator and redis) on recent versions of pytest. In commit `2022da2405`, we added to the toplevel Scylla directory a "tox.ini" file with some intention to configure Python syntax checking. One of the configurations in this tox.ini is: [pytest] flake8-ignore = E501 It turns out that pytest, if a certain test directory does not have its own pytest.ini file, looks up in ancestor directory for various configuration files (the configuration file precedence is described in https://docs.pytest.org/en/stable/customize.html), and this includes this tox.ini configuration section. Recent versions of pytest complain about the "flake8-ignore" configuration parameter, which they don't recognize. This parameter may be ok (?) if you install a flake8 pytest plugin, but we do not require users to do this for running these tests. Moreover, whatever noble intentions this commit and its tox.ini had, nobody ever followed up on it. The three pytest-based test directories never adhered to flake8's recommended syntax, and never intended to do so. None of the developers of these tests use flake8, or seem to wish to do so. If this ever changes, we can change the pytest.ini or undo this commit and go back to a top-level tox.ini, but I don't see this happening anytime soon. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210411085708.300851-1-nyh@scylladb.com>	2021-04-12 08:04:06 +02:00
Tomasz Grabiec	305372820d	Merge "Make position_in_partition::tri_compare use strong_ordering" from Pavel Emelyanov There are some users of that tri_comparator which are also converted to strong_ordering. Most of the code using those is, in turn, already handling return values interchangeably. The bound_view::tri_compare, which is used by it, is still returning int. tests: unit(dev) * xemul/br-position-tri-compare: code: Relax position_in_partition::tri_compare users position_in_partition: Convert tri_compare to strong_ordering test: Convert clustering_fragment_summary::tri_cmp to strong_ordering repair: Convert repair_sync_boundary::tri_compare to strong_ordering view: Don't expect int from position_in_partition::tri_compare	2021-04-09 17:54:38 +02:00
Pavel Emelyanov	64074f45ce	code: Relax position_in_partition::tri_compare users There are some pieces left doing res <=> 0 with the res now being a strong_ordering itself. All these can be just dropped. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-09 18:20:39 +03:00
Pavel Emelyanov	a15f158661	test: Convert clustering_fragment_summary::tri_cmp to strong_ordering Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-09 18:20:39 +03:00
Botond Dénes	3dbb456fba	test: mutation_query_test: use local mutation_query() implementation Add a local `mutation_query()` variant, which only contains the pieces of logic the test really wants to test: invoking `mutation_querier::consume_page()` with a `reconcilable_result_builder`. This allows us to get rid of the now otherwise unused `mutation_query()`.	2021-04-09 13:40:27 +03:00
Botond Dénes	59ea36731b	test: mutation_query_test: use local data_query() implementation The test only wants to test result size calculation so it doesn't need the whole `data_query()` logic. Replace the call to `data_query()` with one to a local alternative which contains just the necessary bits -- invoking `data_querier::consume_page()` with the right result builder. This allows us to get rid of the now otherwise unused `data_query()`.	2021-04-09 13:40:27 +03:00
Pavel Emelyanov	4558eb3afc	partition_snapshot_row_cursor: Move cells hash creation to reader Right now call to .row() method may create hash on row's cells. It's counterintuitive to see a const method that transparently changes something it points to. Since the only caller of a row() who knows whether the hash creation is required is the cache reader, it's better to move the call to prepare_hash() into it. Other than making the .row() less surprising this also helps to get rid of the whole method by the next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-09 12:18:29 +03:00
Pavel Emelyanov	00caf5f219	partition_snapshot_row_cursor: Move read_partition into test The method in question is a test-only helper, there's no need to keep it as a part of the API. Another reason to move it is that the method is O(number of rows) and doesn't preempt while looping, but cursor code users try hard not to stall the reactor. So even though this method has meaningful semantics within the class, it will better be reinvented if needed in core code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-09 12:16:13 +03:00
Gleb Natapov	b9175edea4	raft: test: check that a server with id zero can be neither created nor added to a config Message-Id: <20210407134853.1964226-2-gleb@scylladb.com>	2021-04-08 17:07:18 +02:00
Kamil Braun	3687757115	sstables: fix TWCS single key reader sstable filter The filter passed to `min_position_reader_queue`, which was used by `clustering_order_reader_merger`, would incorrectly include sstables as soon as they passed through the PK (bloom) filter, and would include sstables which didn't pass the PK filter (if they passed the CK filter). Fortunately this wouldn't cause incorrect data to be returned, but it would cause sstables to be opened unnecessarily (these sstables would immediately return eof), resulting in a performance drop. This commit fixes the filter and adds a regression test which uses statistics to check how many times the CK filter was invoked. Fixes #8432. Closes #8433	2021-04-08 18:03:49 +03:00
Tomasz Grabiec	6d6f39a7b3	Merge "fixes for stepdown and quorum check" from Gleb The series contains code cleanups and fixes for stepdown process and quorum check code. Note this is re-send of already posted patches lumped together for convenience. * scylla-dev/raft-fixes-v1: raft: add test for check quorum on a leader raft: fix quorum check code for joint config and non-voting members raft: do not hang on waiting for entries on a leader that was removed from a cluster raft: add more tracing to stepdown code raft: use existing election_elapsed() function instead of redo the calculation raft: test: add test case for stepdown process raft: check that a node is still the leader after initiating stepdown process	2021-04-08 15:18:52 +02:00
Avi Kivity	202c631dee	test: perf: perf_simple_query: collect allocation and task statistics Calculate and display the number of memory allocations and tasks executed per operation. Sample results (--smp 1): 180022.46 tps (90 allocs/op, 20 tasks/op) 178963.44 tps (90 allocs/op, 20 tasks/op) 178702.41 tps (90 allocs/op, 20 tasks/op) 177679.74 tps (90 allocs/op, 20 tasks/op) 179539.36 tps (90 allocs/op, 20 tasks/op) median 178963.44 tps (90 allocs/op, 20 tasks/op) median absolute deviation: 575.92 maximum: 180022.46 minimum: 177679.74 This allows less noisy tracking of how some changes impact performance.	2021-04-07 17:54:48 +03:00
Avi Kivity	3a90df39c5	perf: deinline some functions in perf.hh Those functions were defined in a header, but not marked inline. This made including the header from two source files impossible, as the linker would complain about duplicate symbols. Rather than making them inline, put them in a new source file perf.cc as they don't need to be inline.	2021-04-07 17:51:58 +03:00
Avi Kivity	29a674cd94	test: perf: perf_fast_forward: report allocation rate and tasks These are more stable than cpu consumed across runs, and impact performance directly. Closes #8422	2021-04-07 15:41:43 +02:00
Piotr Sarna	8e808a56d2	Merge 'commitlog: Fix race and edge condition in delete_segments' from Calle Wilund Fixes #8363 Fixes #8376 Delete segments has two issues when running with size-limited commit log and strict adherence to said limit. 1.) It uses parallel processing, with deferral. This means that the disk usage variables it looks at might not be fully valid - i.e. we might have already issued a file delete that will reduce disk footprint such that a segment could instead be recycled, but since vars are (and should be) only updated _post_ delete, we don't know. 2.) It does not take into account edge conditions, when we only delete a single segment, and this segment is the border segment - i.e. the one pushing us over the limit, yet allocation is desperately waiting for recycling. In this case we should allow it to live on, and assume that next delete will reduce footprint. Note: to ensure exact size limit, make sure total size is a multiple of segment size. If we had an error in recycling (disk rename?), and no elements are available, we could have waiters hoping they will get segments. Abort the queue (not permanent, but wakes up waiters), and let them retry. Since we did deletions instead, disk footprint should allow for new allocs at least. Or more likely, everything is broken, but we will at least make more noise. Closes #8372 * github.com:scylladb/scylla: commitlog: Add signalling to recycle queue iff we fail to recycle commitlog: Fix race and edge condition in delete_segments commitlog: coroutinize delete_segments commitlog_test: Add test for deadlock in recycle waiter	2021-04-07 15:13:25 +02:00
Raphael S. Carvalho	8e0a1ca866	sstable_set: Implement compound_sstable_set's create_single_key_sstable_reader() The compound set isn't overriding create_single_key_sstable_reader(), so the default implementation is always called. Although the default impl will provide correct behavior, specialized ones which provide better perf, currently only available for TWCS, were being ignored. The compound set impl of the single key reader will basically combine single key readers of all sets managed by it. Fixes #8415. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210406205009.75020-1-raphaelsc@scylladb.com>	2021-04-07 12:36:30 +03:00
Nadav Har'El	da11cd99f7	Merge 'Add a (failing) test for picking secondary indexes in order' from Piotr Sarna Currently the heuristics for picking an index for a query are not very well defined. It would be best if we used statistics to pick the index which is likely to perform the fastest, but for starters we should at least let the user decide which index to pick by picking the first one by the order of restrictions passed to the query. The (failing) test case from this patch shows the expected results. Ref: #7969 Closes #8414 * github.com:scylladb/scylla: cql-pytest: add a failing test for index picking order cql3: add tracing used secondary index	2021-04-07 11:40:37 +03:00
Piotr Sarna	1f7b972db7	cql-pytest: add a failing test for index picking order Currently the heuristics for picking an index for a query are not very well defined. It would be best if we used statistics to pick the index which is likely to perform the fastest, but for starters we should at least let the user decide which index to pick by picking the first one by the order of restrictions passed to the query. The (failing) test case from this patch shows the expected results. Ref: #7969	2021-04-07 10:05:00 +02:00
Gleb Natapov	68d73bd4c8	raft: add test for check quorum on a leader	2021-04-07 10:15:33 +03:00
Gleb Natapov	bdb59307d3	raft: test: add test case for stepdown process Add the test for the case where the C_new entry is not the last one in a leader that has been removed from a cluster. In this case a leader will continue replication even after committing C_new and will start the stepdown process later, when at least one follower is fully synchronized.	2021-04-07 10:15:33 +03:00
Calle Wilund	813694b617	commitlog_test: Add test for deadlock in recycle waiter Not a very good test, mind you. Nothing to verify, just see if the test times out. But try to make it at least complete for failure report.	2021-04-06 16:38:14 +00:00
Tomasz Grabiec	4b10247a4f	Merge "raft: do not assert when receiving unexpected messages in a leader state" from Gleb * scylla-dev/raft-cleanup-v2: raft: test: add test that leader behaves as expected when it gets unexpected messages raft: do not assert when receiving unexpected messages in a leader state raft: use existing function to check if election timeout elapsed	2021-04-06 16:52:23 +02:00
Konstantin Osipov	c83cf1f965	uuid: switch the API to use std::chrono A follow up for the patch for #7611. This change was requested during review and moved out of #7611 to reduce its scope. The patch switches UUID_gen API from using plain integers to hold time units to units from std::chrono. For one, we plan to switch the entire code base to std::chrono units, to ensure type safety. Secondly, using std::chrono units allows us to increase code reuse with template metaprogramming and remove a few UUID_gen functions that became redundant as a result. * switch get_time_UUID(), unix_timestamp(), get_time_UUID_raw(), switch min_time_UUID(), max_time_UUID(), create_time_safe() to std::chrono * remove unused variant of from_unix_timestamp() * remove unused get_time_UUID_bytes(), create_time_unsafe(), redundant get_adjusted_timestamp() * inline get_raw_UUID_bytes() * collapse two similar implementations of get_time_UUID() * switch internal constants to std::chrono * remove unnecessary unique_ptr from UUID_gen::_instance Message-Id: <20210406130152.3237914-2-kostja@scylladb.com>	2021-04-06 17:12:54 +03:00
Nadav Har'El	0d0db05cf3	test/alternator: speed up two slow xfailing tests By far the two slowest Alternator tests when running a development build on my laptop are test_gsi.py::test_gsi_projection_include and test_gsi.py::test_gsi_projection_keys_only Each of those takes around 3.2 seconds, and the sum of just these two tests is as much as 10% (!) of all other 600 tests. The reason why these tests are slow is that they check scanning a GSI with projection. Scylla currently ignores the projection, so the scan returns the wrong value. Because this is a GSI, which supports only eventually-consistent reads, we need to retry the read - and did it for up to 3 seconds! But this retry only makes sense if the GSI read did not yet return the expected data. In these xfailing tests, however, we read a wrong item (with too many attributes) almost immediately, and this should indicate an immediate failure - no amount of retry would help. So in this patch we detect this case and fail the test immediately instead of wasting 3 seconds in retries. On my laptop with dev build, this patch reduces the time to run the entire Alternator test suite from 70 seconds to 63 seconds. Also, now that we never just waste time until the timeout, we can increase it to any number, and in this patch we increase it from 3 seconds to 5. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210317183918.1775383-1-nyh@scylladb.com>	2021-04-06 14:49:15 +02:00
Nadav Har'El	15cab90f7b	test/alternator: switch some fixture scopes from "session" to "module" In conftest.py we have several fixtures creating shared tables which many test files can share, so they are marked with the "session" scope - all the tests in the testing session may share the same instance. This is fine. Some of the test files have additional fixtures for creating special tables needed only in those files. Those were also, unnecessarily, marked with the "session" scope. This means that these temporary tables are only deleted at the very end of the test suite, even though they can be deleted at the end of the test file which needed them. This is exactly what the "module" fixture scope is, so this patch changes all the fixtures private to one test file to be "module". After this patch, the teardown of the last test in the suite goes down from 4 seconds to just 1.5 seconds (it's still long because there are still plenty of session-scoped fixtures in conftest.py). Another small benefit is that the peak disk usage of the test suite is lower, because some of the temporary tables are deleted sooner. This patch does not change any test functionality, and also does not make any test faster - it just changes the order of the fixture teardowns. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210317175036.1773774-1-nyh@scylladb.com>	2021-04-06 14:43:36 +02:00
Avi Kivity	40b60e8f09	Merge 'repair: Switch to use NODE_OPS_CMD for replace operation' from Asias He In commit `c82250e0cf` (gossip: Allow deferring advertise of local node to be up), the replacing node is changed to postpone responding to the gossip echo message to avoid other nodes sending read requests to the replacing node. It works as follows: 1) replacing node does not respond to echo messages, to avoid other nodes marking the replacing node as alive 2) replacing node advertises hibernate state so other nodes know the replacing node is replacing 3) replacing node responds to echo messages so other nodes can mark the replacing node as alive This is problematic because after step 2, the existing nodes in the cluster will start to send writes to the replacing node, but at this time it is possible that existing nodes haven't marked the replacing node as alive, thus failing the write request unnecessarily. For instance, we saw the following errors in issue #8013 (Cassandra stress fails to achieve consistency when only one of the nodes is down) ``` scylla: [shard 1] consistency - Live nodes 2 do not satisfy ConsistencyLevel (2 required, 1 pending, live_endpoints={127.0.0.2, 127.0.0.1}, pending_endpoints={127.0.0.3}) [shard 0] gossip - Fail to send EchoMessage to 127.0.0.3: std::runtime_error (Not ready to respond gossip echo message) c-s: java.io.IOException: Operation x10 on key(s) [4c4f4d37324c35304c30]: Error executing: (UnavailableException): Not enough replicas available for query at consistency QUORUM (2 required but only 1 alive ``` To solve this problem, we can do the replacing operation in multiple stages.
One solution is to introduce a new gossip status state as proposed here: gossip: Introduce STATUS_PREPARE_REPLACE #7416 1) replacing node does not respond to echo messages 2) replacing node advertises prepare_replace state (Remove replacing node from natural endpoint, but do not put in pending list yet) 3) replacing node responds to echo messages 4) replacing node advertises hibernate state (Put replacing node in pending list) Since we now have the node ops verb introduced in `829b4c1438` (repair: Make removenode safe by default), we can do the multiple stages without introducing a new gossip status state. This patch uses the NODE_OPS_CMD infrastructure to implement the replace operation. Improvements: 1) It solves the race between marking the replacing node alive and sending writes to the replacing node 2) The cluster reverts to the state before the replace operation automatically in case of error. As a result, it solves the issue where, if the replacing node fails in the middle of the operation, the replacing node stays in HIBERNATE status forever. 3) The gossip status of the node to be replaced is not changed until the replace operation is successful. HIBERNATE gossip status is not used anymore. 4) Users can now pass a list of dead nodes to ignore explicitly. Fixes #8013 Closes #8330 * github.com:scylladb/scylla: repair: Switch to use NODE_OPS_CMD for replace operation gossip: Add advertise_to_nodes gossip: Add helper to wait for a node to be up gossip: Add is_normal_ring_member helper	2021-04-04 12:54:09 +03:00
Gleb Natapov	10781037f5	raft: test: add test that leader behaves as expected when it gets unexpected messages	2021-04-04 11:33:35 +03:00
Avi Kivity	4739df2cb1	Merge 'cql3: remove linearizations in the write path' from Michał Chojnowski As a part of the effort of removing big, contiguous buffers from the codebase, cql3::raw_value should be made fragmented. Unfortunately a straightforward rewrite to a fragmented buffer type is not possible, because we want cql3::raw_value to be compatible with cql3::raw_value_view, and we want that view to be based on fragmented_temporary_buffer::view, so that it can be used to view data coming directly from seastar without copying. This patch makes cql3::raw_value fragmented by making cql3::raw_value_view a `variant` of managed_bytes_view and fragmented_temporary_buffer::view. Code users which depended on `cql3::raw_value` being `bytes`, and cql3::raw_value_view being `fragmented_temporary_buffer::view` underneath were adjusted to the new, dual representation, mainly through the `cql3::raw_value_view::with_value` visitor and deserialization/validation helpers added to `cql3::raw_value_view`. The second part of this series gets rid of linearizations occurring when processing compound types in the CQL layer. This is achieved by storing their elements in `managed_bytes` instead of `bytes` in the partially deserialized form (`lists::value`, `tuples::value`, etc.), outputting `managed_bytes` instead of `bytes` in functions which go from the partially deserialized form to the atomic cell format (for frozen types), and avoiding calling deserialize/serialize on individual elements when it's not necessary. (It's only necessary for CQLv2, because since CQLv3 the format on the wire is the same as our internal one). The above also forces some changes to `expression.cc`, and `restrictions`, mainly because `IN` clauses store their arguments as `lists` and `tuples`, and the code which handled this clause expected `bytes`. After this series, the path from prepared CQL statements to `atomic_cell_or_collection` is almost completely linearization-free.
The last remaining place is `collection_mutation_description`, where map keys are linearized to `bytes`. Closes #8160 * github.com:scylladb/scylla: cql3: update_parameters: remove unused version of make_cell for bytes_view types: collection: remove an unused version of pack_fragmented cql3: optimize the deserialization of collections cql3: maps, sets: switch the element type from bytes to managed_bytes cql3: expression: use managed_bytes instead of bytes where possible cql3: expr: expression: make the argument of to_range a forwarding reference cql3: don't linearize elements of lists, tuples, and user types cql3: values: add const managed_bytes& constructor to raw_value_view cql3: output managed_bytes instead of bytes in get_with_protocol_version types: collection: add versions of pack for fragmented buffers types: add write_collection_{value,size} for managed_bytes_mutable_view cql3: tuples, user_types: avoid linearization in from_serialized() and get() types: tuple: add build_value_fragmented cql3: update_parameters: add make_cell version for managed_bytes_view cql3: remove operation::make_*cell cql3: values: make raw_value fragmented cql3: values: remove raw_value_view::operator== cql3: switch users of cql3::raw_value_view to internals-independent API cql3: values: add an internals-independent API to raw_value_view utils: managed_bytes: add a managed_bytes constructor from FragmentedView utils: managed_bytes: add operator<< and to_hex for managed_bytes utils: fragment_range: add to_hex configure: remove unused link dependencies from UUID_test	2021-04-01 15:21:32 +03:00
Pavel Emelyanov	8bbe2eae5e	btree: Convert comparator to <=> It turned out that all the users of btree can already be converted to use safer std::strong_ordering. The only meaningful change here is the btree code itself -- no more ints there. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210330153648.27049-1-xemul@scylladb.com>	2021-04-01 12:56:08 +03:00
Michał Chojnowski	5984d6b2ce	cql3: values: remove raw_value_view::operator== It's only used in a single test, and there is no reason why it should ever be used anywhere else. So let's remove it from the public header and move it to that test.	2021-04-01 10:42:07 +02:00
Michał Chojnowski	b9322a6b71	cql3: switch users of cql3::raw_value_view to internals-independent API We want to change the internals of cql3::raw_value{_view}. However, users of cql3::raw_value and cql3::raw_value_view often use them by extracting the internal representation, which will be different after the planned change. This commit prepares us for the change by making all accesses to the value inside cql3::raw_value(_view) be done through helper methods which don't expose the internal representation publicly. After this commit we are free to change the internal representation of raw_value_{view} without messing up their users.	2021-04-01 10:42:04 +02:00
Michał Chojnowski	4715268e30	utils: managed_bytes: add operator<< and to_hex for managed_bytes We will need them to replace bytes with managed_bytes in some places in an upcoming patch. The change to configure.py is necessary because operator<< links to to_hex in bytes.cc.	2021-04-01 10:39:42 +02:00
Asias He	bdb95233e8	gossip: Add advertise_to_nodes gossiper::advertise_to_nodes() is added to allow responding to gossip echo messages with specified nodes and the current gossip generation number for the nodes. This helps avoid the restarted node being marked as alive during a pending replace operation. After this patch, when a node sends an echo message, the gossip generation number is sent in the echo message. Since the generation number changes after a restart, the receiver of the echo message can compare the generation number to tell if the node has restarted. Refs #8013	2021-04-01 09:38:54 +08:00
Piotr Jastrzebski	57c7964d6c	config: ignore enable_sstables_mc_format flag Don't allow users to disable MC sstables format any more. We would like to retire some old cluster features that have been around for years. Namely MC_SSTABLE and UNBOUNDED_RANGE_TOMBSTONES. To do this we first have to make sure that all existing clusters have them enabled. It is impossible to know that unless we stop supporting the enable_sstables_mc_format flag. Test: unit(dev) Refs #8352 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Closes #8360	2021-03-31 12:23:59 +03:00
Avi Kivity	d2921b5112	Merge 'Clean up > 2-year-old features' from Piotr Sarna Following the work started in `253a7640e`, a new batch of old features is assumed to be always available. They are all still announced via gossip, but the code assumes that the feature is always true, because we only support upgrades from a previous release, and the release window is considerably smaller than 2 years. Features picked this time via `git blame`, along with the date of their introduction: * `fe4afb1aa3` (Asias He 2018-09-05 14:52:10 +0800 109) static const sstring ROW_LEVEL_REPAIR = "ROW_LEVEL_REPAIR"; * `ff5e541335` (Calle Wilund 2019-02-05 13:06:07 +0000 110) static const sstring TRUNCATION_TABLE = "TRUNCATION_TABLE"; * `fefef7b9eb` (Tomasz Grabiec 2019-03-05 19:08:07 +0100 111) static const sstring CORRECT_STATIC_COMPACT_IN_MC = "CORRECT_STATIC_COMPACT_IN_MC"; Tests: unit(dev) Closes #8235 * github.com:scylladb/scylla: sstables,test: remove variables depending on old features gms: make CORRECT_STATIC_COMPACT_IN_MC ft unconditionally true sstables: stop relying on CORRECT_STATIC_COMPACT_IN_MC feature gms: make TRUNCATION_TABLE feature unconditionally true gms: make ROW_LEVEL_REPAIR feature unconditionally true repair: stop relying on ROW_LEVEL_REPAIR feature	2021-03-30 16:13:35 +03:00
Avi Kivity	8785dd62cb	tests: use kernel page cache Tests are short-lived and use a small amount of data. They are also often run repeatedly, and the data is deleted immediately after the test. This is a good scenario for using the kernel page cache, as it can cache read-only data from test to test, and avoid spilling write data to disk if it is deleted quickly. Acknowledge this by using the new --kernel-page-cache option for tests. This is expected to help on large machines, where the disk can be overloaded. Smaller machines with NVMe disks probably will not see a difference. Closes #8347	2021-03-30 12:04:55 +02:00
Piotr Sarna	6de2691bbd	sstables,test: remove variables depending on old features In order to maintain backward compatibility wrt. cluster features, two boolean variables were kept in sstable writers: - correctly_serialize_non_compound_range_tombstones - correctly_serialize_static_compact_in_mc Since these features are assumed to always be present now, the above variables are no longer needed and can be purged.	2021-03-30 09:37:41 +02:00

1485 Commits