scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 04:06:59 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	aebbe68239	sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-07 09:37:53 -03:00
Nadav Har'El	d3abff9ea1	test/alternator: validate that TagResource needs a Tags parameter A short new test to verify that in the TagResource operation, the Tags parameter - specifying which tags to set - is required. The test passes on both AWS and Alternator - they both produce a ValidationException in this case (the specific human-readable error message is different, though, so we don't check it). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211206140541.1157574-1-nyh@scylladb.com>	2021-12-06 15:08:16 +01:00
Benny Halevy	9ed72cac95	test: sstable_compaction_test: add sstable_scrub_quarantine_mode_test For each quarantine mode: Validate sstables to quarantine one of them and then scrub with the given quarantine mode and verify the output whether the quarantined sstable was scrubbed or not. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:29:58 +02:00
Benny Halevy	60ff28932c	compaction_manager: perform_sstable_scrub: get the whole compaction_type_options::scrub So we can pass additional options on top of the scrub mode. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:21:37 +02:00
Benny Halevy	bbe275f37d	compaction: scrub_sstables_validate_mode: quarantine invalid sstables When invalid sstables are detected, move them to the quarantine subdirectory so they won't be selected for regular compaction. Refs #7658 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:14:16 +02:00
Benny Halevy	3eabfad9fc	test: database_test: add snapshot_with_quarantine_works Test that snapshot includes quarantined sstables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:00:44 +02:00
Benny Halevy	11b54d44d9	test: database_test: add populate_from_quarantine_works Test that we load quarantined sstables by creating a dataset, moving a sstable to the quarantine dir, and then reload the table and verify the dataset. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:00:44 +02:00
Benny Halevy	bdc53880d4	sstables: define symbolic names for table subdirectories Define the "staging", "upload", and "snapshots" subdirectory names as named const expressions in the sstables namespace rather than relying on their string representation, that could lead to typo mistakes. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:00:44 +02:00
Piotr Sarna	3867ca2fd6	Merge 'cql3: Don't allow unset values inside UDT' from Jan Ciołek Scylla doesn't support unset values inside UDT. The old code used to convert `unset` to `null`, which seems incorrect. There is an extra space in the error message to retain compatibility with Cassandra. Fixes: #9671 Closes #9724 * github.com:scylladb/scylla: cql-pytest: Enable test for UDT with unset values cql3: Don't allow unset values inside UDT	2021-12-03 15:36:55 +01:00
Jan Ciolek	3ae8752812	cql-pytest: Enable test for UDT with unset values The test testUDTWithUnsetValues was marked as xfail, but now the issue has been fixed and we can enable it. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-12-03 14:46:21 +01:00
Avi Kivity	3b82ef854d	Merge "Some compaction manager cleanups" from Raphael " couple of preparatory changes for coroutinization of manager " * 'some_compaction_manager_cleanups_v5' of github.com:raphaelsc/scylla: compaction_manager: move check_for_cleanup into perform_cleanup() compaction_manager: replace get_total_size by one liner compaction_manager: make consistent usage of type and name table compaction_manager: simplify rewrite_sstables() compaction_manager: restore indentation	2021-12-02 19:53:13 +02:00
Botond Dénes	259649c779	sstables/index_reader: improved diagnostics on missing index entry Add the summary index and the bound's address to the error message, so it can be correlated with other trace level logging when investigating a problem. Refs: #9446 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211202124955.542293-2-bdenes@scylladb.com>	2021-12-02 19:43:30 +02:00
Botond Dénes	f0b9519999	test/lib/exception_utils: add message_matches() predicate Which checks the message against the given regex. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211202124955.542293-1-bdenes@scylladb.com>	2021-12-02 19:43:30 +02:00
Raphael S. Carvalho	760cfd93fb	compaction_manager: make consistent usage of type and name table new code in manager adopted name and type table, whereas historical code still uses name and type column family. let's make it consistent for newcomers to not get confused. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-02 14:39:27 -03:00
Avi Kivity	9edd86362a	test: sstable_test: don't read compressed file size from closed file We read the compressed file size from a file that was already closed, resulting in EBADF on my machine. Not sure why it works for everyone else. Fix by reading the size using the path. Closes #9675	2021-12-01 16:28:46 +02:00
Nadav Har'El	94eb5c55c8	Merge 'Loading cache improve eviction use policy' from Vladislav Zolotarov This series introduces a new version of a loading_cache class. The old implementation was susceptible to a "pollution" phenomenon where a frequently used entry can get evicted by an intensive burst of "used once" entries pushed into the cache. The new version is going to have a privileged and unprivileged cache sections and there's a new loading_cache template parameter - SectionHitThreshold. The new cache algorithm goes as follows: * We define 2 dynamic cache sections which total size should not exceed the maximum cache size. * New cache entry is always added to the "unprivileged" section. * After a cache entry is read more than SectionHitThreshold times it moves to the second cache section. * Both sections' entries obey expiration and reload rules in the same way as before this patch. * When cache entries need to be evicted due to a size restriction "unprivileged" section's least recently used entries are evicted first. More details may be found in #8674. In addition, during testing another issue was found in the authorized_prepared_statements_cache: #9590. There is a patch that fixes it as well. Closes #9708 * github.com:scylladb/scylla: loading_cache: account unprivileged section evictions loading_cache: implement a variation of least frequent recently used (LFRU) eviction policy authorized_prepared_statements_cache: always "touch" a corresponding cache entry when accessed loading_cache::timestamped::lru_entry: refactoring loading_cache.hh: rearrange the code (no functional change) loading_cache: use std::pmr::polymorphic_allocator	2021-12-01 13:13:53 +02:00
Calle Wilund	3e21fea2b6	test_streamts: test_streams_starting_sequence_number fix 'LastEvaluatedShardId' usage It is not part of raw response, but of the 'StreamDescription' object. Test fails intermittently depending on PK randomization. Closes #9710	2021-12-01 11:05:40 +02:00
Avi Kivity	03755b362a	Merge 'compaction_manager api: stop ongoing compactions' from Benny Halevy This series extends `compaction_manager::stop_ongoing_compaction` so it can be used from the api layer for: - table::disable_auto_compaction - compaction_manager::stop_compaction Fixes #9313 Fixes #9695 Test: unit(dev) Closes #9699 * github.com:scylladb/scylla: compaction_manager: stop_compaction: wait for ongoing compactions to stop compaction_manager: stop_ongoing_compactions: log Stopping 0 tasks at debug level compaction_manager: unify stop_ongoing_compactions implementations compaction_manager: stop_ongoing_compactions: add compaction_type option compaction_manager: get_compactions: get a table* parameter table: disable_auto_compaction: stop ongoing compactions compaction_manager: make stop_ongoing_compactions public table: futurize disable_auto_compactions	2021-11-30 19:08:14 +02:00
Avi Kivity	595cc328b1	Merge 'cql3: Remove term, replace with expression' from Jan Ciołek This PR finally removes the `term` class and replaces it with `expression`. * There was some trouble with `lwt_cache_id` in `expr::function_call`. The current code works the following way: * for each `function_call` inside a `term` that describes a pk restriction, `prepare_context::add_pk_function_call` is called. * `add_pk_function_call` takes a `::shared_ptr<cql3::functions::function_call>`, sets its `cache_id` and pushes this shared pointer onto a vector of all collected function calls * Later when some condition is met we want to clear cache ids of all those collected function calls. To do this we iterate through shared pointers collected in `prepare_context` and clear cache id for each of them. This doesn't work with `expr::function_call` because it isn't kept inside a shared pointer. To solve this I put the `lwt_cache_id` inside a shared pointer and then `prepare_context` collects these shared pointers to cache ids. I also experimented with doing this without any shared pointers, maybe we could just walk through the expression and clear the cache ids ourselves. But the problem is that expressions are copied all the time, we could clear the cache in one place, but forget about a copy. Doing it using shared pointers more closely matches the original behaviour. The experiment is on the [term2-pr3-backup-altcache](https://github.com/cvybhu/scylla/tree/term2-pr3-backup-altcache) branch * `shared_ptr<term>` being `nullptr` could mean: * It represents a cql value `null` * That there is no value, like `std::nullopt` (for example in `attributes.hh`) * That it's a mistake, it shouldn't be possible A good way to distinguish between optional and mistake is to look for `my_term->bind_and_get()`, we then know that it's not an optional value. 
* On the other hand `raw_value` cast to bool means: * `false` - null or unset * `true` - some value, maybe empty I ran a simple benchmark on my laptop to see how performance is affected: ``` build/release/test/perf/perf_simple_query --smp 1 -m 1G --operations-per-shard 1000000 --task-quota-ms 10 ``` * On master (`a21b1fbb2f`) I get: ``` 176506.60 tps ( 77.0 allocs/op, 12.0 tasks/op, 45831 insns/op) median 176506.60 tps ( 77.0 allocs/op, 12.0 tasks/op, 45831 insns/op) median absolute deviation: 0.00 maximum: 176506.60 minimum: 176506.60 ``` * On this branch I get: ``` 172225.30 tps ( 75.1 allocs/op, 12.1 tasks/op, 46106 insns/op) median 172225.30 tps ( 75.1 allocs/op, 12.1 tasks/op, 46106 insns/op) median absolute deviation: 0.00 maximum: 172225.30 minimum: 172225.30 ``` Closes #9481 * github.com:scylladb/scylla: cql3: Remove remaining mentions of term cql3: Remove term cql3: Rename prepare_term to prepare_expression cql3: Make prepare_term return an expression instead of term cql3: expr: Add size check to evaluate_set cql3: expr: Add expr::contains_bind_marker cql3: expr: Rename find_atom to find_binop cql3: expr: Add find_in_expression cql3: Remove term in operations cql3: Remove term in relations cql3: Remove term in multi_column_restrictions cql3: Remove term in term_slice, rename to bounds_slice cql3: expr: Remove term in expression cql3: expr: Add evaluate_IN_list(expression, options) cql3: Remove term in column_condition cql3: Remove term in select_statement cql3: Remove term in update_statement cql3: Use internal cql format in insert_prepared_json_statement cache types: Add map_type_impl::serialize(range of <bytes, bytes>) cql3: Remove term in cql3/attributes cql3: expr: Add constant::view() method cql3: expr: Implement fill_prepare_context(expression) cql3: expr: add expr::visit that takes a mutable expression cql3: expr: Add receiver to expr::bind_variable	2021-11-30 16:39:39 +02:00
Avi Kivity	078f69c133	Merge "raft: (service) implement group 0 as a service" from Kostja " To ensure consistency of schema and topology changes, Scylla needs a linearizable storage for this data available at every member of the database cluster. The series introduces such storage as a service, available to all Scylla subsystems. Using this service, any other internal service such as gossip or migrations (schema) could persist changes to cluster metadata and expect this to be done in a consistent, linearizable way. The series uses the built-in Raft library to implement a dedicated Raft group, running on shard 0, which includes all members of the cluster (group 0), adds hooks to topology change events, such as adding or removing nodes of the cluster, to update group 0 membership, ensures the group is started when the server boots. The state machine for the group, i.e. the actual storage for cluster-wide information still remains a stub. Extending it to actually persist changes of schema or token ring is subject to a subsequent series. Another Raft related service was implemented earlier: Raft Group Registry. The purpose of the registry is to allow Scylla have an arbitrary number of groups, each with its own subset of cluster members and a relevant state machine, sharing a common transport. Group 0 is one (the first) group among many. 
" * 'raft-group-0-v12' of github.com:scylladb/scylla-dev: raft: (server) improve tracing raft: (metrics) fix spelling of waiters_awaken raft: make forwarding optional raft: (service) manage Raft configuration during topology changes raft: (service) break a dependency loop raft: (discovery) introduce leader discovery state machine system_keyspace: mark scylla_local table as always-sync commitlog system_keyspace: persistence for Raft Group 0 id and Raft Server Id raft: add a test case for adding entries on follower raft: (server) allow adding entries/modify config on a follower raft: (test) replace virtual with override in derived class raft: (server) fix a typo in exception message raft: (server) implement id() helper raft: (server) remove apply_dummy_entry() raft: (test) fix missing initialization in generator.hh	2021-11-30 16:24:51 +02:00
Eliran Sinvani	ddd7248b3b	testlib: close index_reader to avoid racing condition In order to avoid the race condition introduced in `9dce1e4` the index_reader should be closed prior to its destruction. This only exposes 4.4 and earlier releases to this specific race. However, it is always a good idea to first close the index reader and only then destroy it since it is most likely to be assumed by all developers that will change the reader index in the future. Ref #9704 (because only 4.4 and earlier releases are vulnerable). Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Closes #9705	2021-11-30 13:05:24 +01:00
Benny Halevy	b60d697084	table: futurize disable_auto_compactions So it can stop ongoing compaction and wait for them to complete. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-30 08:33:04 +02:00
Vlad Zolotarov	1a9c6d9fd3	loading_cache: implement a variation of least frequent recently used (LFRU) eviction policy This patch implements a simple variation of LFRU eviction policy: * We define 2 dynamic cache sections which total size should not exceed the maximum cache size. * New cache entry is always added to the "unprivileged" section. * After a cache entry is read more than SectionHitThreshold times it moves to the second cache section. * Both sections' entries obey expiration and reload rules in the same way as before this patch. * When cache entries need to be evicted due to a size restriction "unprivileged" section's least recently used entries are evicted first. Note: With a 2 sections cache it's not enough for a new entry to have the latest timestamp in order not to be evicted right after insertion: e.g. if all other entries are from the privileged section. And obviously we want to allow new cache entries to be added to a cache. Therefore we can no longer first add a new entry and then shrink the cache. Switching the order of these two operations resolves the culprit. Fixes #8674 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2021-11-29 21:45:21 -05:00
Nadav Har'El	d9c5c4eab6	test/alternator: tests for Select parameter in GSI and LSI We already have tests for the behavior of the "Select" parameter when querying a base table, but this patch adds additional tests for its behavior when querying a GSI or a LSI. There are some differences: Select=ALL_PROJECTED_ATTRIBUTES is not allowed for base tables, but is allowed - and in fact is the default - for GSI and LSI. Also, GSI may not allow ALL_ATTRIBUTES (which is the default for base tables) if only a subset of the attributes were projected. The new tests xfail because the Select and Projection features have not yet been implemented in Alternator. They pass in DynamoDB. After this patch we have (hopefully) complete test coverage of the Select feature, which will be helpful when we start implementing it. Refs #5058 (Select) Refs #5036 (Projection) Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211125100443.746917-1-nyh@scylladb.com>	2021-11-29 20:28:43 +01:00
Nadav Har'El	1c279118f4	test/alternator: more test cases for Select parameter Add to the existing tests for the Select parameter of the Query and Scan operations another check: That when Select is ALL_ATTRIBUTES or COUNT, specifying AttributesToGet or ProjectionExpression is forbidden - because the combination doesn't make sense. The expanded test continues to xfail on Alternator (because the Select parameter isn't yet implemented), and passes on DynamoDB. Strengthening the tests for this feature will be helpful when we decide to implement it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211125074128.741677-1-nyh@scylladb.com>	2021-11-29 20:28:25 +01:00
Tomasz Grabiec	3226c5bf9d	Merge 'sstables: mx: enable position fast-forwarding in reverse mode' from Kamil Braun Most of the machinery was already implemented since it was used when jumping between clustering ranges of a query slice. We need only perform one additional thing when performing an index skip during fast-forwarding: reset the stored range tombstone in the consumer (which may only be stored in fast-forwarding mode, so it didn't matter that it wasn't reset earlier). Comments were added to explain the details. As a preparation for the change, we extend the sstable reversing reader random schema test with a fast-forwarding test and include some minor fixes. Fixes #9427. Closes #9484 * github.com:scylladb/scylla: query-request: add comment about clustering ranges with non-full prefix key bounds sstables: mx: enable position fast-forwarding in reverse mode test: sstable_conforms_to_mutation_source_test: extend `test_sstable_reversing_reader_random_schema` with fast-forwarding test: sstable_conforms_to_mutation_source_test: fix `vector::erase` call test: mutation_source_test: extract `forwardable_reader_to_mutation` function test: random_schema: fix clustering column printing in `random_schema::cql`	2021-11-29 16:01:53 +01:00
Mikołaj Sielużycki	a88f7df195	memtable-sstable: Add compacting reader when flushing memtable. When memtable contains both mutations and tombstones that delete them, the output flushed to sstables contains both mutations. Inserting a compacting reader results in writing smaller sstables and saves compaction work later. Performance tests of this change have shown a regression in a common case where there are no deletes. A heuristic is employed to skip compaction unless there are tombstones in the memtable to minimise the impact of that issue.	2021-11-29 13:19:42 +01:00
Kamil Braun	ea6310961c	test: sstable_conforms_to_mutation_source_test: extend `test_sstable_reversing_reader_random_schema` with fast-forwarding The test would check whether the forward and reverse readers returned consistent results when created in non-forwarding mode with slicing. Do the same but using fast-forwarding instead of slicing. To do this we require a vector of `position_range`s. We also need a vector of `clustering_range`s for the existing test. We modify the existing `random_ranges` function to return `position_range`s instead of `clustering_range`s since `position_range`s are easier to reason about, especially when we consider non-full clustering key prefixes. A function is introduced to convert a `position_range` to a `clustering_range` for the existing test.	2021-11-29 11:10:46 +01:00
Nadav Har'El	1e2ecd282a	Merge 'Harden compaction manager remove' from Benny Halevy This series hardens compaction_manager::remove by: - add debug logging around task execution and stopping. - access compaction_state as lw_shared_ptr rather than via a raw pointer. - with that, detach it from `_compaction_state` in `compaction_manager::remove` right away, to prevent further use of it while compactions are stopped. - added write_lock in `remove` to make sure the lock is not held by any stray task. Test: unit(dev), sstable_compaction_test(debug) Dtest: alternator_tests.py:AlternatorTest.test_slow_query_logging (debug) Closes #9636 * github.com:scylladb/scylla: compaction_manager: add compaction_state when table is constructed compaction_manager: remove: fixup indentation compaction_manager: remove: detach compaction_state before stopping ongoing compactions compaction_manager: remove: serialize stop_ongoing_compactions and gate.close compaction_manager: task: keep a reference on compaction_state test: sstable_compaction_test: incremental_compaction_data_resurrection_test: stop table before it's destroyed. test: sstable_utils: compact_sstables: deregister compaction also on error path test: sstable_compaction_test: partial_sstable_run_filtered_out_test: deregiser_compaction also on error path test: compaction_manager_test: add debug logging to register/deregister compaction test: compaction_manager_test: deregister_compaction: erase by iterator test: compaction_manager_test: move methods out of line compaction_manager: compaction_state: use counter for compaction_disabled compaction_manager: task: delete move and copy constructors compaction_manager: add per-task debug log messages compaction_manager: stop_ongoing_compactions: log number of tasks to stop	2021-11-28 22:12:52 +02:00
Piotr Sarna	ecd122a1b0	Merge 'alternator: rudimentary implementation of TTL expiration service' from Nadav Har'El In this patch series we add an implementation of an expiration service to Alternator, which periodically scans the data in the table, looking for expired items and deleting them. We also continue to improve the TTL test suite to cover additional corner cases discovered during the development of the code. This implementation is good enough to make all existing tests but one, plus a few new ones, pass, but is still a very partial and inefficient implementation littered with FIXMEs throughout the code. Among other things, this initial implementation doesn't do anything reasonable about pacing of the scan or about multiple tables, it scans entire items instead of only the needed parts, and because each shard "owns" a different subset of the token ranges, if a node goes down, partitions which it "owns" will not get expired. The current tests cannot expose these problems, so we will need to develop additional tests for them. Because this implementation is very partial, the Alternator TTL continues to remain "experimental", cannot be used without explicitly enabling this experimental feature, and must not be used for any important deployment. Refs #5060 but doesn't close the issue (let's not close it until we have a reasonably complete implementation - not this partial one). 
Closes #9624 * github.com:scylladb/scylla: alternator: fix TTL expiration scanner's handling of floating point test/alternator: add TTL test for more data test/alternator: remove "xfail" tag from passing tests in test_ttl.py test/alternator: make test_ttl.py tests fast on Alternator alternator: initial implementation of TTL expiration service alternator: add another unwrap_number() variant alternator: add find_tag() function test/alternator: test another corner case of TTL setting test/alternator: test TTL expiration for table with sort key test/alternator: improve basic test for TTL expiration test/alternator: extract is_aws() function	2021-11-28 22:12:52 +02:00
Avi Kivity	25bd945a2c	Merge "reverse range scans: use the correct schema for result building" from Botond " Reverse queries have to use the reverse schema (query schema) for the read itself but the table schema for the result building, according to the established interface with the coordinator (half-reverse format). Range scans were using the query schema for both, which produced un-parseable reconcilable results for mutation range scans. This series fixes this and adds unit tests to cover this previously uncovered area. " Fixes #9673. * 'reverse-range-scan-test/v1' of https://github.com/denesb/scylla: test/boost/multishard_mutation_query_test: add reverse read test test/boost/multishard_mutation_query_test: add test for combinations of limits, paging and stateful test/boost/multishard_mutation_query_test: generalize read_partitions_with_paged_scan() test/boost/multishard_mutation_query_test: add read_all_partitions_one_by_one() overload with slice multishard_mutation_query: fix reverse scans partition_slice: init all fields in copy ctor partition_slice: operator<<: print the entire partition row limit partition_slice_builder: add with_partition_row_limit()	2021-11-28 14:18:28 +02:00
Avi Kivity	ec775ba292	Merge "Remove more gms::get(_local)?_gossiper() calls" from Pavel E " This set covers simple but diverse cases: - cache hitrate calculator - repair - system keyspace (virtual table) - dht code - transport event notifier All the places just require straightforward arguments passing. And a reparation in transport -- event notifier needs a backref to the owning server. Remaining after this set is the snitch<->gossiper interaction and the cache hitrate app state update from table code. tests: unit(dev) " * 'br-unglobal-gossiper-cont' of https://github.com/xemul/scylla: transport: Use server gossiper in event notifier transport: Keep backreference from event_notifier transport: Keep gossiper on server dht: Pass gossiper to range_streamer::add_ranges dht: Pass gossiper argument to bootstrap system_keyspace: Keep gossiper on cluster_status_table code: Carry gossiper down to virtual tables creation repair: Use local gossiper reference cache_hitrate_calculator: Keep reference on gossiper	2021-11-28 14:18:28 +02:00
Nadav Har'El	f1997be989	alternator: fix TTL expiration scanner's handling of floating point The expiration-time attribute used by Alternator's TTL feature has a numeric type, meaning that it may be a floating point number - not just an integer, and implemented as big_decimal which has a separate integer mantissa and exponent. Our code which checked expiration incorrectly looked only at the mantissa - resulting in incorrect handling of expiration times which have a fractional part - 123.4 was treated as 1234 instead of 123. This patch fixes the big_decimal handling in the expiration checking, and also adds to the test test_ttl.py::test_ttl_expiration check also for non-integer floating point as well as one with an exponent. The new tests pass on DynamoDB, and failed on Alternator before this patch - and pass with it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-11-25 22:01:37 +02:00
Nadav Har'El	84e0004ff6	test/alternator: add TTL test for more data The existing TTL tests use only tiny tables, so don't exercise the expiration-time scanner's use of paging. So in this patch we add another test with a much larger table (with 40,000 items). To verify that this test indeed checks paging, I stopped the scanner's iteration after one page, and saw that this test starts failing (but the smaller tests all pass). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-11-25 22:01:37 +02:00
Nadav Har'El	baea76c33b	test/alternator: remove "xfail" tag from passing tests in test_ttl.py Most tests in test_ttl.py now pass, so remove their "xfail" tag. The only remaining failing test is test_ttl_expiration_streams - which cannot yet pass because the expiration event is not yet marked. Note that the fact that almost all tests for Alternator's TTL feature now pass does not mean the feature is complete. The current implementation is very partial and inefficient, and only works reasonably in tests on a single node. The current tests cannot expose these problems, so we will need to develop additional tests for them. The tests will of course remain useful to see that as the implementation continues to improve, none of the tests that already work will break. The Alternator TTL continues to remain "experimental", cannot be used without explicitly enabling this experimental feature, and must not be used for any important deployment. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-11-25 22:01:37 +02:00
Nadav Har'El	0b97da5f46	test/alternator: make test_ttl.py tests fast on Alternator The tests for the TTL feature in test/alternator/test_ttl.py take a huge amount of time on DynamoDB - 10 to 30 minutes (!) - because it delays expiration of items a long time after their intended expiration times. We intend Scylla's implementation to have a configurable delay for the expiration scanner, which we will be able to configure to very short delays for tests. So these tests can be made much faster on Scylla. So in this patch we change all of the tests to finish much more quickly on Scylla. Many of the tests still fail, because the TTL feature is not implemented yet. Although after this change all the tests in test_ttl.py complete in a reasonable amount of time (around 3 seconds each), we still mark them as "veryslow" and the "--runveryslow" flag is needed to run them. We should consider changing this in the future, so that these tests will run as part of our default test suite. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-11-25 22:01:37 +02:00
Nadav Har'El	88f175d0a8	test/alternator: test another corner case of TTL setting Although it isn't terribly useful, an Alternator user can enable TTL with an expiration-time attribute set to a key attribute. Because expiration times should be numeric - not other types like strings - DynamoDB could warn the user when a chosen key attribute has a non-numeric type (since key attributes do have fixed types!). But DynamoDB doesn't warn about this - it simply expires nothing. This test verifies that it indeed does this. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-11-25 22:01:36 +02:00
Nadav Har'El	a982d161ad	test/alternator: test TTL expiration for table with sort key The basic test for TTL expiration, test_ttl.py::test_ttl_expiration, uses a table with only a partition key. Most of the item expiration logic is exactly the same for tables that also have a sort key, but the step of deleting the item is different, so let's add a test that verifies that also in this case, the expired item is properly deleted. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-11-25 22:01:36 +02:00
Nadav Har'El	69b4f53aa9	test/alternator: improve basic test for TTL expiration This patch improves test_ttl.py::test_ttl_expiration in two ways: First, it checks yet another case - that items that have the wrong type for the expiration-time column (e.g., a string) never get expired - even if that string happens to contain a number that looks like an expiration time. Second, instead of the huge 15-minute duration for this test, the test now has a configurable duration; We still need to use a very long duration on AWS, but in Scylla we expect to be able to configure the TTL scan frequency, and can finish this test in just a few seconds! We already have experimental code which makes this test pass in just 3 seconds. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-11-25 22:01:36 +02:00
Nadav Har'El	fd9a6cf851	test/alternator: extract is_aws() function Extract a boolean function is_aws() out of the "scylla_only" fixture, so it can be used in tests for other purposes. For example, in the next patch the TTL tests will use it to pick different timeouts on AWS (where TTL expiration has huge many-minute delays) and on Scylla (which can be configured to have very short delays). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-11-25 22:01:36 +02:00
Konstantin Osipov	6d28927550	raft: make forwarding optional In absence of abort_source or timeouts in Raft API, automatic bouncing can create too much noise during testing, especially during network failures. Add an option to disable follower bouncing feature, since randomized_nemesis_test has its own bouncing which handles timeouts correctly. Optionally disable forwarding in basic_generator_test.	2021-11-25 12:35:43 +03:00
Konstantin Osipov	c22f945f11	raft: (service) manage Raft configuration during topology changes Operations of adding or removing a node to Raft configuration are made idempotent: they do nothing if already done, and they are safe to resume after a failure. However, since topology changes are not transactional, if a bootstrap or removal procedure fails midway, Raft group 0 configuration may go out of sync with topology state as seen by gossip. In future we must change gossip to avoid making any persistent changes to the cluster: all changes to persistent topology state will be done exclusively through Raft Group 0. Specifically, instead of persisting the tokens by advertising them through gossip, the bootstrap will commit a change to a system table using Raft group 0. nodetool will switch from looking at gossip-managed tables to consulting with Raft Group 0 configuration or Raft-managed tables. Once this transformation is done, naturally, adding a node to Raft configuration (perhaps as a non-voting member at first) will become the first persistent change to ring state applied when a node joins; removing a node from the Raft Group 0 configuration will become the last action when removing a node. Until this is done, do our best to avoid a cluster state when a removed node or a node whose addition failed is stuck in Raft configuration, but the node is no longer present in gossip-managed system tables. In other words, keep gossip the primary source of truth. For this purpose, carefully choose the timing when we join and leave Raft group 0: Join the Raft group 0 only after we've advertised our tokens, so the cluster is aware of this node, it's visible in nodetool status, but before node state jumps to "normal", i.e. before it accepts queries. Since the operation is idempotent, invoke it on each restart. Remove the node from Group 0 before its tokens are removed from gossip-managed system tables. 
This guarantees that if removal from Raft group 0 fails for whatever reason, the node stays in the ring, so nodetool removenode and friends are re-tried. Add tracing.	2021-11-25 12:35:42 +03:00
Konstantin Osipov	8ee88a9d8a	raft: (discovery) introduce leader discovery state machine Introduce a special state machine used to find a leader of an existing Raft cluster or create a new cluster. This state machine should be used when a new Scylla node has no persisted Raft Group 0 configuration. The algorithm is initialized with a list of seed IP addresses, IP address of this server, and this server's Raft server id. The IP addresses are used to construct an initial list of peers. Then, the algorithm tries to contact each peer (excluding self) from its peer list and share the peer list with this peer, as well as get the peer's peer list. If this peer is already part of some Raft cluster, this information is also shared. On a response from a peer, the current peer's peer list is updated. The algorithm stops when all peers have exchanged peer information or one of the peers responds with id of a Raft group and Raft server address of the group leader. (If any of the peers fails to respond, the algorithm re-tries ad infinitum with a timeout). More formally, the algorithm stops when one of the following is true: - it finds an instance with initialized Raft Group 0, with a leader - all the peers have been contacted, and this server's Raft server id is the smallest among all contacted peers.	2021-11-25 11:50:38 +03:00
Konstantin Osipov	65e549946f	raft: add a test case for adding entries on follower	2021-11-25 11:50:38 +03:00
Konstantin Osipov	e3751068fe	raft: (server) allow adding entries/modify config on a follower Implement an RPC to forward add_entry calls from the follower to leader. Bounce & retry in case of not_a_leader. Do not retry in case of uncertainty - this can lead to adding duplicate entries. The feature is added to core Raft since it's needed by all current clients - both topology and schema changes. When forwarding an entry to a remote leader we may get back a term/index pair that conflicts (has the same index, but a higher term) with a local entry we're still waiting on. This can happen, e.g. because there was a leader change and the log was truncated, but we still haven't got the append_entries RPC from the new leader, still haven't truncated the log locally, still haven't aborted all the local waits for truncated entries. Only remove the offending entry from the wait list and abort it. There may be entries labeled with an older term to the right (with higher commit index) of the conflicting entry. However, finding them would require a linear scan. If we allow it, we may end up doing this linear scan for every conflicting entry during the transition period, which brings us to N^2 complexity of this step. At the same time, as soon as append_entries that commits a higher-term entry with the same index reaches the follower, the waits for the respective truncated entry will be aborted anyway (see notify_waiters() which sets dropped_entry exception), so the scan is unnecessary. Similarly to being able to add entries, allow modifying Raft group configuration on a follower. The implementation works the same way as adding entries - forwards the command to the leader. Now that add_entry() or modify_config never throw not_a_leader, they are more likely to throw timed_out_error, e.g. in case the network is partitioned. Previously it was only possible due to a semaphore wait timeout, and this scenario was not tested. 
Handle timed_out_error on RPC level to let the existing tests (specifically the randomized nemesis test) pass.	2021-11-25 11:50:38 +03:00
Konstantin Osipov	ae5dc8e980	raft: (test) replace virtual with override in derived class Clang 12 complains if use of override is inconsistent, so stick to it everywhere.	2021-11-25 11:50:38 +03:00
Konstantin Osipov	2763fdd3b7	raft: (test) fix missing initialization in generator.hh A missing initialization in poll_timeout of class interpreter could manifest itself as a sporadically failing randomized_nemesis_test. The test would prematurely run out of allowed limit of virtual clock ticks.	2021-11-25 11:50:38 +03:00
Pavel Emelyanov	ef1960d034	code: Carry gossiper down to virtual tables creation One of the tables needs gossiper and uses global one. This patch prepares the fix by patching the main -> register_virtual_tables stack with the gossiper reference. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-25 10:52:55 +03:00
Nadav Har'El	2bdc31f8a3	test/alternator: two more tests for unimplemented Select=COUNT This patch adds two more tests for the unimplemented Select=COUNT feature (which asks to only count queried items and not return the actual items). Because this feature has not yet been implemented in Alternator (Refs #5058), the new tests xfail. They pass on DynamoDB. The two tests added here are for the interaction of the Select=COUNT feature with filters - in one of the two supported syntaxes (QueryFilter and FilterExpression). We want to verify that even though the user doesn't need the content of the items (since only the counts were requested), they are still retrieved from disk as needed for doing proper filtering - but not returned. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211124225429.739744-1-nyh@scylladb.com>	2021-11-25 08:47:14 +01:00
Mikołaj Sielużycki	44f4ea38c5	test: Future-proof reader conversions tests. Query time must be fetched after populate. If compaction is executed during populate it may be executed with a timestamp later than query_time. This would cause the test's expected compaction and the compaction during populate to be executed at different time points, producing different results. The result would be sporadic test failures depending on relative timing of those operations. If no other mutations happen after populate, and query_time is later than the compaction time during population, we're guaranteed to have the same results. Message-Id: <20211123134808.105068-1-mikolaj.sieluzycki@scylladb.com>	2021-11-24 21:01:57 +01:00

2530 Commits