scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 13:37:04 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	d88ae7edae	Merge 'migration_manager: retire global storage proxy refs' from Avi Kivity Replace get_storage_proxy() and get_local_storage_proxy() with constructor-provided references. Some unneeded cases were removed. Test: unit (dev) Closes #9816 * github.com:scylladb/scylla: migration_manager: replace uses of get_storage_proxy and get_local_storage_proxy with constructor-provided reference migration_manager: don't keep storage_proxy alive during schema_check verb mm: don't capture storage proxy shared_ptr during background schema merge mm: remove stats on schema version get	2021-12-17 17:53:08 +03:00
Raphael S. Carvalho	f508f54f3e	table: move min_compaction_threshold() and compaction_enforce_min_threshold() into table_state Compaction specific methods can be implemented in table_state only, as they aren't needed elsewhere. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211214191822.164223-1-raphaelsc@scylladb.com>	2021-12-17 10:00:31 +02:00
Avi Kivity	a97731a7e5	migration_manager: replace uses of get_storage_proxy and get_local_storage_proxy with constructor-provided reference A static helper also gained a storage_proxy parameter.	2021-12-16 21:05:47 +02:00
Pavel Emelyanov	b2a62d2b59	Merge 'db: range_tombstone_list: Deoverlap empty range tombstones' from Tomasz Grabiec Appending an empty range adjacent to an existing range tombstone would not deoverlap (by dropping the empty range tombstone), resulting in a different (non-canonical) result depending on the order of appending. Suppose that range tombstone [a, b] covers range tombstone [x, x), and [a, x) and [x, b) are range tombstones which correspond to [a, b] split around position x. Appending [a, x) then [x, b) then [x, x) would give [a, b) Appending [a, x) then [x, x) then [x, b) would give [a, x), [x, x), [x, b) The fix is to drop empty range tombstones in range_tombstone_list so that the result is canonical. Fixes #9661 Closes #9764 * github.com:scylladb/scylla: range_tombstone_list: Deoverlap adjacent empty ranges range_tombstone_list: Convert to work in terms of position_in_partition	2021-12-16 10:00:40 +03:00
Avi Kivity	d768e9fac5	cql3, related: switch to data_dictionary Stop using database (and including database.hh) for schema related purposes and use data_dictionary instead. data_dictionary::database::real_database() is called from several places, for these reasons: - calling yet-to-be-converted code - callers with a legitimate need to access data (e.g. system_keyspace) but with the ::database accessor removed from query_processor. We'll need to find another way to supply system_keyspace with data access. - to gain access to the wasm engine for testing whether user-defined functions compile. We'll have to find another way to do this as well. The change is a straightforward replacement. One case in modification_statement had to change a capture, but everything else was just a search-and-replace. Some files that lost "database.hh" gained "mutation.hh", which they previously had access to through "database.hh".	2021-12-15 13:54:23 +02:00
Avi Kivity	399e2895f1	test: cql_test_env: provide access to data_dictionary Allow tests to have access to the data_dictionary.	2021-12-15 13:54:18 +02:00
Avi Kivity	3ac622bdd8	Merge "Add v2 versions of make_forwardable() and make_flat_mutation_reader_from_fragments()" from Botond " These two readers are crucial for writing tests for any composable reader so we need v2 versions of them before we can convert and test the combined reader (for example). As these two readers are often used in situations where the payload they deliver is specially crafted for the test at hand, we keep their v1 versions too to avoid conversion meddling with the tests. Tests: unit(dev) " * 'forwarding-and-fragment-reader-v2/v1' of https://github.com/denesb/scylla: flat_mutation_reader_v2: add make_flat_mutation_reader_from_fragments() test/lib/mutation_source_test: don't force v1 reader in reverse run mutation_source: add native_version() getter flat_mutation_reader_v2: add make_forwardable() position_in_partition: add after_key(position_in_partition_view) flat_mutation_reader: make_forwardable(): fix indentation flat_mutation_reader: make_forwardable(): coroutinize reader	2021-12-14 20:43:09 +02:00
Benny Halevy	32d61a3d09	test: sstable_directory_test_table_lock_works: verify that truncate is blocked on the table lock The test in its current form is invalid, as database::remove removes the table's name from its listing as well as from the keyspace metadata, so it won't be found after that. That said, database::drop_column_family then proceeds to truncate and stop the table, after calling await_pending_ops, and the latter should indeed block on the lock taken by the test. This change modifies the test to create some sstables in the table's directory before starting the sstable_directory. Then, when executing "drop table" in the background, wait until the table is not found by db.find_column_family. That would fail the test before this change. See https://jenkins.scylladb.com/job/scylla-enterprise/job/next/1442/artifact/testlog/x86_64_debug/sstable_directory_test.sstable_directory_test_table_lock_works.4720.log ``` INFO 2021-12-13 14:00:17,298 [shard 0] schema_tables - Dropping ks.cf id=00487bc0-5c1d-11ec-9e3b-a44f824027ae version=b10c4994-31c7-3f5a-9591-7fedb0273c82 test/boost/sstable_directory_test.cc(453): fatal error: in "sstable_directory_test_table_lock_works": unexpected exception thrown by table_ok.get() ``` At this point, the test verifies again that the sstables are still on disk (and no truncate happened), and only after drop completed, the table should not exist on disk. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211214104407.2225080-1-bhalevy@scylladb.com>	2021-12-14 14:26:17 +02:00
Nadav Har'El	31eeb44d28	alternator: fix error on UpdateTable for non-existent table When the UpdateTable operation is called for a non-existent table, the appropriate error is ResourceNotFoundException, but before this patch we ran into an exception, which resulted in an ugly "internal server error". In this patch we use the existing get_table() function which most other operations use, and which does all the appropriate verifications and generates the appropriate Alternator api_error instead of letting internal Scylla exceptions escape to the user. This patch also includes a test for UpdateTable on a non-existent table, which used to fail before this patch and pass afterwards. We also add a test for DeleteTable in the same scenario, and see it didn't have this bug. As usual, both tests pass on DynamoDB, which confirms we generate the right error codes. Fixes #9747. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211206181605.1182431-1-nyh@scylladb.com>	2021-12-14 13:09:27 +01:00
Tomasz Grabiec	b228ddabb7	Merge "Move schema altering statement to raft" from Gleb The series is on top of "wire up schema raft state machine". It will apply without, but will not work obviously (when raft is disabled it does nothing anyway). This series does not provide any linearisability just yet though. It only uses raft as a means to distribute schema mutations. To achieve linearisability more work is needed. We need to at least make sure that schema mutations use monotonically increasing timestamps and, since schema altering statements are RMW, that no modifications to the schema were made between schema mutation creation and application. If there were, the operation needs to be restarted. * scylla-dev/gleb/raft-schema-v5: (59 commits) cql3: cleanup mutation creation code in ALTER TYPE cql3: use migration_manager::schema_read_barrier() before accessing a schema in altering statements cql3: bounce schema altering statement to shard 0 migration_manager: add is_raft_enabled() to check if raft is enabled on a cluster migration_manager: add schema_read_barrier() function migration_manager: make announce() raft aware migration_manager: co-routinize announce() function migration_manager: pass raft_gr to the migration manager migration_manager: drop view_ptr array from announce_column_family_update() mm: drop unused announce_ methods cql3: drop schema_altering_statement::announce_migration() cql3: drop has_prepare_schema_mutations() from schema altering statement cql3: drop announce_migration() usage from schema_altering_statement cql3: move DROP AGGREGATE statement to prepare_schema_mutations() api migration_manager: add prepare_aggregate_drop_announcement() function cql3: move DROP FUNCTION statement to prepare_schema_mutations() api migration_manager: add prepare_function_drop_announcement() function cql3: move CREATE AGGREGATE statement to prepare_schema_mutations() api migration_manager: add prepare_new_aggregate_announcement() function cql3: move CREATE FUNCTION statement to prepare_schema_mutations() api ...	2021-12-14 11:05:32 +01:00
Nadav Har'El	815324713e	test/alternator: add more tests for ADD operand mismatch The "ADD" operator in UpdateItem's AttributeUpdates supports a number of types (numbers, sets and strings), and should result in a ValidationException if the attribute's existing type is different from the type of the operand - e.g., trying to ADD a number to an attribute which has a set as a value. So far we only had partial testing for this (we tested the case where both operands are sets, but of different types) so this patch adds the missing tests. The new tests pass (on both Alternator and DynamoDB) - we don't have a bug there. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211213195023.1415248-1-nyh@scylladb.com>	2021-12-14 11:15:23 +02:00
Botond Dénes	425c0b0394	test/cql-pytest/nodetool.py: fix take_snapshot() for cassandra take_snapshot() contained copypasta from flush() for the nodetool variant. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211208110129.141592-1-bdenes@scylladb.com>	2021-12-14 11:15:23 +02:00
Tomasz Grabiec	78a6474982	range_tombstone_list: Deoverlap adjacent empty ranges Appending an empty range adjacent to an existing range tombstone would not deoverlap (by dropping the empty range tombstone), resulting in a different (non-canonical) result depending on the order of appending. Suppose that [a, b] covers [x, x) Appending [a, x) then [x, b) then [x, x) would give [a, b) Appending [a, x) then [x, x) then [x, b) would give [a, x), [x, x), [x, b) Fix by dropping empty range tombstones.	2021-12-13 21:31:36 +01:00
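The canonicalization described in the range_tombstone_list commits can be illustrated with a toy model. This is a sketch under stated assumptions, not Scylla's implementation: positions are plain integers, every tombstone is assumed to carry the same deletion timestamp (so adjacent ranges can always be merged), and `append_range` and `build` are hypothetical helpers.

```python
def append_range(ranges, start, end):
    """Append the half-open range [start, end) to a sorted, non-overlapping list.

    Toy model: all tombstones share one timestamp, so adjacent ranges merge.
    Empty ranges are dropped, which is the fix the commit message describes.
    """
    if start >= end:
        # An empty range such as [x, x) deletes nothing; dropping it keeps
        # the list canonical regardless of the order of appends.
        return ranges
    if ranges and ranges[-1][1] == start:
        # Adjacent to the previous range: extend it instead of adding an entry.
        ranges[-1] = (ranges[-1][0], end)
    else:
        ranges.append((start, end))
    return ranges


def build(appends):
    """Apply a sequence of appends to an empty list and return the result."""
    ranges = []
    for start, end in appends:
        append_range(ranges, start, end)
    return ranges


a, x, b = 0, 5, 10
order_1 = build([(a, x), (x, b), (x, x)])
order_2 = build([(a, x), (x, x), (x, b)])
print(order_1 == order_2)
```

With empty ranges dropped on append, both orderings from the commit message collapse to the single range [a, b) in this model.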
Nadav Har'El	41c7b2fb4b	test/cql-pytest run: fix inaccurate comment The code in test/cql-pytest/run.py can start Scylla (or Cassandra, or Redis, etc.) in a random IP address in 127.*.*.*. We explained in a comment that 127.0.0.* is used by CCM so we avoid it in case someone runs both dtest and test.py in parallel on the same machine. But this observation was not accurate: Although the original CCM did use only 127.0.0.*, in Scylla's CCM we added in 2017, in commit 00d3ba5562567ab83190dd4580654232f4590962, the ability to run multiple copies of CCM in parallel; CCM now uses 127.0.*.*, not just 127.0.0.*. So we need to correct this in the comment. Luckily, the code doesn't need to change! We already avoided the entire 127.0.*.* for simplicity, not just 127.0.0.*. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211212151339.1361451-1-nyh@scylladb.com>	2021-12-13 18:12:11 +02:00
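The address-selection idea in this commit message (pick a loopback address but stay out of the entire 127.0 block that CCM may use) could look roughly like the sketch below. The log does not show how run.py actually chooses its address, so `pick_loopback_address` and its octet ranges are assumptions made purely for illustration.

```python
import random


def pick_loopback_address(rng=random):
    """Return a random loopback address whose second octet is never 0.

    Hypothetical helper: keeping the second octet in 1..254 avoids the whole
    127.0 block, not just 127.0.0.
    """
    second = rng.randint(1, 254)  # never 0, so the 127.0 block is skipped
    third = rng.randint(0, 254)
    fourth = rng.randint(1, 254)
    return f"127.{second}.{third}.{fourth}"


print(pick_loopback_address())
```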
Avi Kivity	e44a28dce4	Merge "compaction: Allow data from different buckets (e.g. windows) to be compacted together" from Raphael " Today, data from different buckets (e.g. windows) cannot be compacted together because mutation compactor happens inside each consumer, where each consumer is done on behalf of a particular bucket. To solve this problem, mutation compaction process is being moved from consumer into producer, such that interposer consumer, which is responsible for segregation, will be fed with compacted data and forward it into the owner bucket. Fixes #9662. tests: unit(debug). " * 'compact_across_buckets_v2' of github.com:raphaelsc/scylla: tests: sstable_compaction_test: add test_twcs_compaction_across_buckets compaction: Move mutation compaction into producer for TWCS compaction: make enable_garbage_collected_sstable_writer() more precise	2021-12-12 15:07:15 +02:00
Gleb Natapov	e9fafea5c1	migration_manager: pass raft_gr to the migration manager Migration manager will use raft group zero to distribute schema changes.	2021-12-11 12:31:07 +02:00
Gleb Natapov	38e1f85959	migration_manager: drop view_ptr array from announce_column_family_update() No users pass it any longer.	2021-12-11 12:31:07 +02:00
Raphael S. Carvalho	7c90088152	tests: sstable_compaction_test: add test_twcs_compaction_across_buckets Verify that TWCS compaction can now compact data across time windows, like a tombstone which will cause all shadowed data to be purged once they're all compacted together. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-10 17:14:45 -03:00
Botond Dénes	39426b1aa3	flat_mutation_reader_v2: add make_flat_mutation_reader_from_fragments() The main difference compared to v1 (apart from having _v2 suffix at relevant places) is how slicing and reversing works. The v2 variant has native reverse support built-in because the reversing reader is not something we want to convert to v2. A native v2 mutation-source test is also added.	2021-12-10 15:48:49 +02:00
Botond Dénes	20e45987b5	test/lib/mutation_source_test: don't force v1 reader in reverse run Currently in the reverse run we wrap the test-provided mutation-source and create a v1 reader with it, forcing a conversion if the mutation-source has a v2 factory. Worse still, if the test is v2 native, there will be a double conversion. This patch fixes this by creating a wrapper mutation-source appropriate to the version of the underlying factory of the wrapped mutation-source.	2021-12-10 15:48:49 +02:00
Tomasz Grabiec	4d302dfa1a	Merge "Fix exception safety of rows insertion" from Pavel Emelyanov There are several places that (still) use throwing b-tree .insert_before() method and don't manage the inserted object lifetime. Some of those places also leave the leaked rows_entry on the LRU delaying the assertion failure by the time those entries get evicted (#9728) To prevent such surprises in the future, the set removes the non-safe inserters from the B-tree code. Actually most of this set is that removal plus preparations for reviewability. * xemul/br-rows-insertion-exception-safety-2: btree: Earnestly discourage from insertion of plain references row-cache: Handle exception (un)safety of rows_entry insertion partition_snapshot_row_cursor: Shuffle ensure_result creation mutation_partition: Use B-tree insertion sugar tests: Make B-tree tests use unique-ptrs for insertion	2021-12-10 13:55:18 +01:00
Pavel Emelyanov	5a405a4273	tests: Make B-tree tests use unique-ptrs for insertion The non-smart-pointers overloads are going away, prepare tests for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-12-10 12:35:12 +03:00
Nadav Har'El	03d67440ef	alternator: test additional metrics and fix another broken counter In issue #9406 we noticed that a counter for BatchGetItem operations was missing. When we fixed it, we added a test which checked this counter - but only this counter. It was left as a TODO to test the rest of the Alternator metrics, and this is what this patch does. Here we add a comprehensive test for all of the operations supported by Scylla and how they increase the appropriate operation counter. With this test we discovered a new bug: the DescribeTimeToLive operation incremented the UpdateTimeToLiveCounter :-( So in this patch we also include a fix for that bug, and the new test verifies that it is fixed. In addition to the operation counters, Alternator also has additional metrics, and we also added tests for some of them - but not all. The remaining untested metrics are listed in a TODO comment. Message-Id: <20211206154727.1170112-1-nyh@scylladb.com>	2021-12-10 08:08:54 +02:00
Benny Halevy	cca956bce2	database_test: snapshot_with_quarantine_works: get the list of sstables from table object Rather than the filesystem, to reduce flakiness. Also, add some test logging. Fixes #9763 Test: database_test(debug, release) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211209175144.854896-1-bhalevy@scylladb.com>	2021-12-09 21:01:25 +02:00
Benny Halevy	8728fd480d	database_test: do_with_some_data: get the return func future do_with_some_data runs a function in a seastar thread. It needs to get() the future func returns rather than propagating it. This solves a secondary failure due to abandoned future when the test case fails, as seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/4254/artifact/testlog/x86_64_debug/database_test.snapshot_with_quarantine_works.381.log ``` test/boost/database_test.cc(903): fatal error: in "snapshot_with_quarantine_works": critical check expected.empty() has failed WARN 2021-12-08 00:35:16,300 [shard 0] seastar - Exceptional future ignored: boost::execution_aborted, backtrace: 0x10935e50 0x16ff2d8d 0x16ff2a4d 0x16ff5033 0x16ff5ec2 0x162d4ce9 0x10a2bdb5 0x10a2bd24 0x10a54ca4 0x10a27cf3 0x10a22151 0x10a67c9d 0x10a67a78 0x163ac37e 0x163b29e9 0x163b7690 0x163b51c1 0x17c212df 0x17c1f097 0x17bf8b4c 0x17bf83f2 0x17bf82a2 0x17bf7d52 0x10f8bf5a 0x166db84b /lib64/libpthread.so.0+0x9298 /lib64/libc.so.6+0x100352 ... *** 1 abandoned failed future(s) detected Failing the test because fail was requested by --fail-on-abandoned-failed-futures ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211209174512.851945-1-bhalevy@scylladb.com>	2021-12-09 21:11:56 +03:00
Nadav Har'El	c6f2afb93d	Merge 'cql3: Allow to skip EQ restricted columns in ORDER BY' from Jan Ciołek In queries like: ```cql SELECT * FROM t WHERE p = 0 AND c1 = 0 ORDER BY (c1 ASC, c2 ASC) ``` we can skip the requirement to specify ordering for `c1` column. The `c1` column is restricted by an `EQ` restriction, so it can have at most one value anyway, there is no need to sort. This commit makes it possible to write just: ```cql SELECT * FROM t WHERE p = 0 AND c1 = 0 ORDER BY (c2 ASC) ``` I reorganized the ordering code, I feel that it's now clearer and easier to understand. It's possible to only introduce a small change to the existing code, but I feel like it becomes a bit too messy. I tried it out on the [`orderby_disorder_small`](https://github.com/cvybhu/scylla/commits/orderby_disorder_small) branch. The diff is a bit messy because I moved all ordering functions to one place, it's better to read [select_statement.cc](https://github.com/cvybhu/scylla/blob/orderby_disorder/cql3/statements/select_statement.cc#L1495-L1658) lines 1495-1658 directly. In the new code it would also be trivial to allow specifying columns in any order, we would just have to sort them. For now I commented out the code needed to do that, because the point of this PR was to fix #2247. Allowing this would require some more work changing the existing tests. Fixes: #2247 Closes #9518 * github.com:scylladb/scylla: cql-pytest: Enable test for skipping eq restricted columns in order by cql3: Allow to skip EQ restricted columns in ORDER BY cql3: Add has_eq_restriction_on_column function cql3: Reorganize orderings code	2021-12-09 21:11:56 +03:00
Jan Ciolek	13d367dada	cql-pytest: Enable test for skipping eq restricted columns in order by This test was marked as xfail, but now the functionality it tests has been implemented. In my opinion the expected error message makes no sense, the message was: "Order by currently only supports the ordering of columns following their declared order in the PRIMARY KEY" In cases where there was missing restriction on one column. This has been changed to: "Unsupported order by relation - column {} doesn't have an ordering or EQ relation." Because of that I had to modify the test to accept messages from both Scylla and Cassandra. The expected error message pattern is now "rder by", because that's the largest common part. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-12-09 14:59:47 +01:00
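The rule introduced by the ORDER BY commits (a clustering column may be omitted from ORDER BY when it has an EQ restriction) can be sketched as a small validation routine. This is a simplified model, not the code in select_statement.cc: `check_order_by` is a hypothetical name, ASC/DESC directions are ignored, and only the first error string is taken from the commit message.

```python
def check_order_by(clustering_columns, eq_restricted, order_by):
    """Validate ORDER BY columns against the clustering key (toy model).

    Walks the clustering columns in declared order. Each one must either be
    the next ORDER BY column or be restricted by EQ (and thus skippable).
    """
    remaining = list(order_by)
    for column in clustering_columns:
        if not remaining:
            break  # every ORDER BY column has been matched
        if remaining[0] == column:
            remaining.pop(0)
        elif column in eq_restricted:
            continue  # at most one value, so no ordering is needed
        else:
            raise ValueError(
                f"Unsupported order by relation - column {column} "
                "doesn't have an ordering or EQ relation."
            )
    if remaining:
        # Generic message for this sketch only; not Scylla's wording.
        raise ValueError(f"ORDER BY column {remaining[0]} could not be matched")


# WHERE p = 0 AND c1 = 0 ORDER BY (c2 ASC): c1 is skipped thanks to its EQ restriction.
check_order_by(["c1", "c2"], {"c1"}, ["c2"])
# Spelling out c1 as well remains valid.
check_order_by(["c1", "c2"], {"c1"}, ["c1", "c2"])
```

Without the EQ restriction on `c1`, the first call would raise with the error text quoted in the commit message.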
Mikołaj Sielużycki	504efe0607	table: Prevent resurrecting data from memtable on compaction Mutations are not guaranteed to come in the order of their timestamps. If there is an expired tombstone in the sstable and a repair inserts old data into memtable, the compaction would not consider memtable data and purge the tombstone leading to data resurrection. The solution is to disallow purging tombstones newer than min memtable timestamp.	2021-12-09 13:22:14 +01:00
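The purge rule stated in this commit message can be written as a one-line predicate. The sketch below is an illustration only: `can_purge_tombstone` is a hypothetical function, timestamps are plain integers, and treating a tombstone whose timestamp equals the minimum memtable timestamp as non-purgeable is a conservative assumption rather than something the message spells out.

```python
def can_purge_tombstone(tombstone_timestamp, tombstone_expired, min_memtable_timestamp):
    """Decide whether compaction may drop a tombstone (toy model).

    A tombstone must be expired, and it must be strictly older than every
    write still sitting in the memtable. Otherwise older data inserted by a
    repair could reappear once the tombstone is gone.
    """
    if not tombstone_expired:
        return False
    # Assumption: ties are treated as "not purgeable" to stay on the safe side.
    return tombstone_timestamp < min_memtable_timestamp


# A repair wrote data with timestamp 5 into the memtable; the expired
# tombstone with timestamp 10 still shadows it, so it must be kept.
print(can_purge_tombstone(10, True, 5))
```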
Mikołaj Sielużycki	7ce0ca040d	table: Add min_memtable_timestamp function to table	2021-12-09 13:14:38 +01:00
Nadav Har'El	e032f92c5c	Merge 'api/storage service: validate table names' from Benny Halevy This series fixes a couple issues around generating and handling of no_such_keyspace and no_such_column_family exceptions. First, it removes std::throw_with_nested around their throw sites in the respective database::find_* functions. Fixes #9753 And then, it introduces a `validate_tables` helper in api/storage_service.cc that generates a `bad_param_exception` in order to set the correct http response status if a non-existing table name is provided in the `cf` http request parameter. Fixes #9754 The series also adds a test for the REST API under test/rest_api that verifies the storage_service enable/disable auto_compaction api and checks the error codes for non-existing keyspace or table. Test: unit(dev) Closes #9755 * github.com:scylladb/scylla: api: storage_service: add parse_tables database: un-nest no_such_keyspace and no_such_column_family exceptions database: throw internal error when failing uuid returned by find_uuid database: find_uuid: throw no_such_column_family exception if ks/cf were not found test: rest_api: add storage_service test test: add basic rest api test test: cql-pytest: wait for rest api when starting scylla	2021-12-08 16:54:48 +02:00
Benny Halevy	ff63ad9f6e	api: storage_service: add parse_tables Splits and validates the cf parameter, containing an optional comma-separated list of table names. If any table is not found and a no_such_column_family exception is thrown, wrap it in a `bad_param_exception` so it will translate to `reply::status_type::bad_request` rather than `reply::status_type::internal_server_error`. With that, hide the split_cf function from api/api.hh since it was used only from api/storage_service and new use sites should use validate_tables instead. Fixes #9754 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-08 16:42:40 +02:00
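The split-validate-wrap pattern described for parse_tables can be sketched in Python. The real helper is C++ in api/storage_service.cc; the exception classes, the `known_tables` mapping and the `status` attribute below are hypothetical stand-ins, and returning an empty list for an empty parameter is an assumption.

```python
class NoSuchColumnFamily(Exception):
    """Stand-in for the internal 'table not found' error."""


class BadParam(Exception):
    """Stand-in for an error that maps to an HTTP 400 response."""
    status = 400


def find_table(known_tables, keyspace, name):
    # Internal lookup: raises the internal exception when the table is missing.
    if name not in known_tables.get(keyspace, ()):
        raise NoSuchColumnFamily(f"{keyspace}.{name}")
    return name


def parse_tables(known_tables, keyspace, cf_param):
    """Split a comma-separated table list and validate every name."""
    names = [n for n in cf_param.split(",") if n]
    try:
        return [find_table(known_tables, keyspace, n) for n in names]
    except NoSuchColumnFamily as e:
        # Re-raise as a client error so the caller answers 400 rather than 500.
        raise BadParam(f"table {e} not found") from e


tables = {"ks": {"t1", "t2"}}
print(parse_tables(tables, "ks", "t1,t2"))
```

Wrapping at the API boundary keeps the internal lookup unchanged while letting the HTTP layer report a client error for a bad `cf` value.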
Benny Halevy	5eb32aa57c	test: rest_api: add storage_service test FIXME: negative tests for not-found tables should result in a requests.codes.bad_request but currently result in requests.codes.internal_server_error. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-08 16:35:36 +02:00
Benny Halevy	26257cfa6d	test: add basic rest api test Test system/uptime_ms to start with. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-08 16:05:33 +02:00
Benny Halevy	01f2e8b391	test: cql-pytest: wait for rest api when starting scylla Some of the tests, like nodetool.py, use the scylla REST API. Add a check_rest_api function that queries http://<node_addr>:10000/ that is served once scylla starts listening on the API port and call it via run.wait_for_services. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-08 16:05:32 +02:00
Piotr Sarna	26288c1a86	test,alternator: make TTL tests less prone to false negatives On my local machine, a 3 second deadline proved to cause flakiness of test_ttl_expiration case, because its execution time is just around 3 seconds. This patch addresses the problem by bumping the local timeout to 10 (and 15 for test_ttl_expiration_long, since it's dangerously near the 10 second deadline on my machine as well). Moreover, some test cases short-circuited once they detected that all needed items expired, but other ones lacked it and always used their full time slot. Since 10 seconds is a little too long for a single test case, even one marked with --veryslow, this patch also adds a couple of other short-circuits. One exception is test_ttl_expiration_hash_wrong_type, which actually depends on the fact that we should wait for the whole loop to finish. Since this case was never flaky for me with the 3 second timeout, it's left as is. Theoretically, test_ttl_expiration also kind of depends on checking the condition more than once (because the TTL of one of the values is bumped on each iteration), but empirical evidence shows that multiple iterations always occur in this test case anyway - for me, it always spun at least 3 times. Tests: unit(release) Message-Id: <a0a479929dac37daace744e0a970567a8aa3b518.1638431933.git.sarna@scylladb.com>	2021-12-08 16:02:45 +02:00
Nadav Har'El	92e7fbe657	test/alternator: check correct error for unknown operation Add a short test verifying that Alternator responds with the correct error code (UnknownOperationException) when receiving an unknown or unsupported operation. The test passes on both AWS and Alternator, confirming that the behavior is the same. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211206125710.1153008-1-nyh@scylladb.com>	2021-12-08 13:56:38 +02:00
Botond Dénes	0aa4e5e726	test/cql-pytest: mv virtual_tables.py -> test_virtual_tables.py For consistency with the other tests. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211208102108.126492-1-bdenes@scylladb.com>	2021-12-08 12:23:22 +02:00
Tomasz Grabiec	2a36377bb3	Merge "test: raft: randomized_nemesis_test: introduce server stop/crash nemesis" from Kamil We begin by preparing the `persistence` class so that the storage can be reused across different Raft server instances: the test keeps a shared pointer to the storage so that when a server stops, a new server with the same ID can be reconstructed with this storage. We then modify `environment` so that server instances can be removed and replaced in middle of operations. Finally we prepare a nemesis operation which gracefully stops or immediately crashes a randomly picked server and run this operation periodically in `basic_generator_test`. One important change that changes the API of `raft::server` is included: the metrics are not automatically registered in `start()`. This is because metric registration modifies global data structures, which cannot be done twice with the same set of metrics (and we would do it when we restart a server with the same ID). Instead, `register_metrics()` is exposed in the `raft::server` interface to be called when running servers in production. * kbr/crashes-v3: raft: server: print the ID of aborted server test: raft: randomized_nemesis_test: run stop_crash nemesis in `basic_generator_test` test: raft: randomized_nemesis_test: introduce `stop_crash` operation test: raft: randomized_nemesis_test: environment: implement server `stop` and `crash` raft: server: don't register metrics in `start()` test: raft: randomized_nemesis_test: raft_server: return `stopped_error` when called during abort test: raft: randomized_nemesis_test: handle `raft::stopped_error` test: raft: randomized_nemesis_test: handle missing servers in `environment` call functions test: raft: randomized_nemesis_test: environment: split `new_server` into `new_node` and `start_server` test: raft: randomized_nemesis_test: remove `environment::get_server` test: raft: randomized_nemesis_test: construct `persistence_proxy` outside `raft_server<M>::create` test: raft: randomized_nemesis_test: persistence_proxy: store a shared pointer to `persistence` test: raft: randomized_nemesis_test: persistence: split into two classes test: raft: logical_timer: introduce `sleep_until`	2021-12-07 22:16:23 +01:00
Botond Dénes	2e5440bdf2	Merge 'Convert compaction to flat_mutation_reader_v2' from Raphael Carvalho Since sstable reader was already converted to flat_mutation_reader_v2, compaction layer can naturally be converted too. There are many dependencies that use v1. Those strictly needed like readers in sstable set, which links compaction to sstable reader, were converted to v2 in this series. For those that aren't essential we're relying on V1<-->V2 adaptors, and conversion work on them will be postponed. Those being postponed are: scrub specialized reader (needs a validator for mutation_fragment_v2), interposer consumer, combined reader which is used by incremental selector. incremental selector itself was converted to v2. tests: unit(debug). Closes #9725 * github.com:scylladb/scylla: compaction: update compaction::make_sstable_reader() to flat_mutation_reader_v2 sstable_set: update make_crawling_reader() to flat_mutation_reader_v2 sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2 sstable_set: update make_local_shard_sstable_reader() to flat_mutation_reader_v2 sstable_set: update incremental_reader_selector to flat_mutation_reader_v2	2021-12-07 15:17:38 +02:00
Raphael S. Carvalho	aebbe68239	sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-07 09:37:53 -03:00
Kamil Braun	45fe0d015d	test: raft: randomized_nemesis_test: run stop_crash nemesis in `basic_generator_test` There is a separate thread that periodically stops/crashes and restarts a randomly chosen server, so the nemesis runs concurrently with reconfigurations and network partitions.	2021-12-07 11:23:34 +01:00
Kamil Braun	f9073b864f	test: raft: randomized_nemesis_test: introduce `stop_crash` operation An operation which chooses a server randomly, randomly chooses whether to crash or gracefully stop it, performs the chosen operation, and restarts the server after a selected delay.	2021-12-07 11:23:34 +01:00
Kamil Braun	168390d4bb	test: raft: randomized_nemesis_test: environment: implement server `stop` and `crash` `stop` gracefully stops a running server, `crash` immediately "removes" it (from the point of view of the rest of the environment). We cannot simply destroy a running server. Read the comments in `crash` to understand how it's implemented.	2021-12-07 11:23:34 +01:00
Kamil Braun	429f87160b	test: raft: randomized_nemesis_test: raft_server: return `stopped_error` when called during abort Don't return `gate_closed_exception` which is an internal implementation detail and which callers don't expect.	2021-12-07 11:22:52 +01:00
Kamil Braun	c79dacc028	test: raft: randomized_nemesis_test: handle `raft::stopped_error` Include it in possible call result types. It will start appearing when we enable server aborts in the middle of the test.	2021-12-07 11:22:52 +01:00
Kamil Braun	25a8772306	test: raft: randomized_nemesis_test: handle missing servers in `environment` call functions `environment` functions for performing operations on Raft servers: `is_leader`, `call`, `reconfigure`, `get_configuration`, currently assume that a server is running on each node at all times and that it never changes. Prepare these functions for missing/restarting servers.	2021-12-07 11:22:51 +01:00
Kamil Braun	d281b2c0ea	test: raft: randomized_nemesis_test: environment: split `new_server` into `new_node` and `start_server` Soon it will be possible to stop a server and then start a completely new `raft::server` instance but which uses the same ID and persistence, simulating a server restart. For this we introduce the concept of a "node" which keeps the persistence alive (through a shared pointer). To start a server - using `start_server` - we must first create a node on which it will be running through `new_node`. `new_server` is now a short function which does these two things.	2021-12-07 11:22:51 +01:00
Kamil Braun	5c803ae1d0	test: raft: randomized_nemesis_test: remove `environment::get_server` To perform calls to servers in a Raft cluster, the test code would first obtain a reference to a server through `get_server` and then call the server directly. This will not be safe when we implement server crashes and restarts as servers will disappear in middle of operations; we don't want the test code to keep references to no-longer-existing servers. In the new API the test will call the `environment` to perform operations, giving it the server ID. `environment` will handle disappearing servers underneath.	2021-12-07 11:22:51 +01:00
Kamil Braun	0d64fbc39d	test: raft: randomized_nemesis_test: construct `persistence_proxy` outside `raft_server<M>::create`	2021-12-07 11:22:51 +01:00
Kamil Braun	4e8a86c6a1	test: raft: randomized_nemesis_test: persistence_proxy: store a shared pointer to `persistence` We want the test to be able to reuse `persistence` even after `persistence_proxy` is destroyed for simulating server restarts. We'll do it by having the test keep a shared pointer to `persistence`. To do that, instead of storing `persistence` by value and constructing it inside `persistence_proxy`, store it by `lw_shared_ptr` which is taken through the constructor (so `persistence` itself is now constructed outside of `persistence_proxy`).	2021-12-07 11:22:51 +01:00


2587 Commits