scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	4fa8159a13	test: sstable_compaction_test: remove needless usage of column_family_test::add_sstable column_family_test::add_sstable will soon be changed to run in a thread, and it's not needed in this procedure, so let's remove its usage. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Pavel Emelyanov	bbad3eac63	pylib: Cast port number config to int explicitly Otherwise it crashes some python versions. The cast was there before `a2dd64f68f` explicitly dropped one while moving the code between files. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11511	2022-09-09 18:08:08 +02:00
Kamil Braun	dba595d347	Merge 'Minimal implementation of Broadcast Tables' from Mikołaj Grzebieluch Broadcast tables are tables for which all statements are strongly consistent (linearizable), replicated to every node in the cluster and available as long as a majority of the cluster is available. If a user wants to store a “small” volume of metadata that is not modified “too often” but provides high resiliency against failures and strong consistency of operations, they can use broadcast tables. The main goal of the broadcast tables project is to solve problems which need to be solved when we eventually implement general-purpose strongly consistent tables: designing the data structure for the Raft command, ensuring that the commands are idempotent, handling snapshots correctly, and so on. In this MVP (Minimum Viable Product), statements are limited to simple SELECT and UPDATE operations on the built-in table. In the future, other statements and data types will be available but with this PR we can already work on features like idempotent commands or snapshotting. Snapshotting is not handled yet which means that restarting a node or performing too many operations (which would cause a snapshot to be created) will give incorrect results. In a follow-up, we plan to add end-to-end Jepsen tests (https://jepsen.io/). With this PR we can already simulate operations on lists and test linearizability in linear complexity. This can also test Scylla's implementation of persistent storage, failure detector, RPC, etc. Design doc: https://docs.google.com/document/d/1m1IW320hXtsGulzSTSHXkfcBKaG5UlsxOpm6LN7vWOc/edit?usp=sharing Closes #11164 * github.com:scylladb/scylladb: raft: broadcast_tables: add broadcast_kv_store test raft: broadcast_tables: add returning query result raft: broadcast_tables: add execution of intermediate language raft: broadcast_tables: add compilation of cql to intermediate language raft: broadcast_tables: add definition of intermediate language db: system_keyspace: add broadcast_kv_store table db: config: add BROADCAST_TABLES feature flag	2022-09-09 18:05:37 +02:00
Kamil Braun	0efdc45d59	Merge 'test.py: remove top level conftest and improve logging' from Alecco - To isolate the different pytest suites, remove the top level conftest and move needed contents to existing `test/pylib/cql_repl/conftest.py` and `test/topology/conftest.py`. - Add logging to CQL and Python suites. - Log driver version for CQL and topology tests. Closes #11482 * github.com:scylladb/scylladb: test.py: enable log capture for Python suite test.py: log driver name/version for cql/topology test.py: remove top level conftest.py	2022-09-08 16:25:24 +02:00
Mikołaj Grzebieluch	eb610c45fe	raft: broadcast_tables: add broadcast_kv_store test Test queries scylla with following statements: * SELECT value FROM system.broadcast_kv_store WHERE key = CONST; * UPDATE system.broadcast_kv_store SET value = CONST WHERE key = CONST; * UPDATE system.broadcast_kv_store SET value = CONST WHERE key = CONST IF value = CONST; where CONST is string randomly chosen from small set of random strings and half of conditional updates has condition with comparison to last written value.	2022-09-08 15:25:36 +02:00
Mikołaj Grzebieluch	82df8a9905	raft: broadcast_tables: add compilation of cql to intermediate language We decided to extend `cql_statement` hierarchy with `strongly_consistent_modification_statement` and `strongly_consistent_select_statement`. Statements operating on system.broadcast_kv_store will be compiled to these new subclasses if BROADCAST_TABLES flag is enabled. If the query is executed on a shard other than 0 it's bounced to that shard.	2022-09-08 15:25:36 +02:00
Botond Dénes	438aaf0b85	Merge 'Deglobalize repair history maps' from Benny Halevy Change `a8ad385ecd` introduced ``` thread_local std::unordered_map<utils::UUID, seastar::lw_shared_ptr<repair_history_map>> repair_history_maps; ``` We're trying to avoid global scoped variables as much as we can so this should probably be embedded in some sharded service. This series moves the thread-local `repair_history_maps` instances to `compaction_manager` and passes a reference to the shard compaction_manager to functions that need it for compact_for_query and compact_for_compaction. Since some paths don't need it and don't have access to the compactio_manager, the series introduced `utils::optional_reference<T>` that allows to pass nullopt. In this case, `get_gc_before_for_key` behaves in `tombstone_gc_mode::repair` as if the table wasn't repaired and tombstones are not garbage-collected. Fixes #11208 Closes #11366 * github.com:scylladb/scylladb: tombstone_gc: deglobalize repair_history_maps mutation_compactor: pass tombstone_gc_state to compact_mutation_state mutation_partition: compact_for_compaction_v2: get tombstone_gc_state mutation_partition: compact_for_compaction: get tombstone_gc_state mutation_readers: pass tombstone_gc_state to compating_reader sstables: get_gc_before_*: get tombstone_gc_state from caller compaction: table_state: add virtual get_tombstone_gc_state method db: view: get_tombstone_gc_state from compaction_manager db: view: pass base table to view_update_builder repair: row_level: repair_update_system_table_handler: get get_tombstone_gc_state for db compaction_manager replica: database: get_tombstone_gc_state from compaction_manager compaction_manager: add tombstone_gc_state replica: table: add get_compaction_manager function tombstone_gc: introduce tombstone_gc_state repair_service: simplify update_repair_time error handling tombstone_gc: update_repair_time: get table_id rather than schema_ptr tombstone_gc: delete unused forward declaration database: do not drop_repair_history_map_for_table in detach_column_family	2022-09-08 14:08:38 +03:00
Alejo Sanchez	c6a048827a	test.py: log driver name/version for cql/topology Log the python driver name and version to help debugging on third party machines. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-08 11:37:32 +02:00
Alejo Sanchez	a2dd64f68f	test.py: remove top level conftest.py Remove top level conftest so different suites have their own (as it was before). Move minimal functionality into existing test/pylib/cql_repl/conftest.py so cql tests can run on their own. Move param setting into test/topology/conftest.py. Use uuid module for unique keyspace name for cql tests. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-08 11:37:32 +02:00
Alejo Sanchez	d892d194fb	test.py: remove spurious after test check Before/after test checks are done per test case, there's no longer need to check after pytest finishes. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #11489	2022-09-08 11:33:37 +02:00
Kamil Braun	ff4430d8ea	test: topology: make imports friendlier for tools (such as `mypy`) When importing from `pylib`, don't modify `sys.path` but use the fact that both `test/` and `test/pylib/` directories contain an `__init__.py` file, so `test.pylib` is a valid module if we start with `test/` as the Python package root. Both `pytest` and `mypy` (and I guess other tools) understand this setup. Also add an `__init__.py` to `test/topology/` so other modules under the `test/` directory will be able to import stuff from `test/topology/` (i.e. from `test.topology.X import Y`). Closes #11467	2022-09-07 23:52:50 +03:00
Nadav Har'El	e5f6adf46c	test/alternator: improve tests for DescribeTable for indexes I created new issues for each missing field in DescribeTable's response for GSIs and LSIs, so in this patch we edit the xfail messages in the test to refer to these issues. Additionally, we only had a test for these fields for GSIs, so this patch also adds a similar test for LSIs. I turns out there is a difference between the two tests - the two fields IndexStatus and ProvisionedThroughput are returned for GSIs, but not for LSIs. Refs #7750 Refs #11466 Refs #11470 Refs #11471 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11473	2022-09-07 09:50:16 +02:00
Benny Halevy	d86810d22c	mutation_partition: compact_for_compaction_v2: get tombstone_gc_state To be passed down to compact_mutation_state in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:15 +03:00
Benny Halevy	0627667a06	mutation_partition: compact_for_compaction: get tombstone_gc_state And pass down to `do_compact`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:15 +03:00
Benny Halevy	7e4612d3aa	mutation_readers: pass tombstone_gc_state to compating_reader To be passed further done to `compact_mutation_state` in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:14 +03:00
Benny Halevy	2cd3fc2f36	compaction: table_state: add virtual get_tombstone_gc_state method and override it in table::table_state to get the tombstone_gc_state from the table's compaction_manager. It is going to be used in the next patched to pass the gc state from the compaction_strategy down to sstables and compaction. table_state_for_test was modified to just keep a null tombstone_gc_state. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:05:39 +03:00
Mikołaj Grzebieluch	5b1421cc33	db: config: add BROADCAST_TABLES feature flag Add experimental flag 'broadcast-tables' for enabling BROADCAST_TABLES feature. This feature requires raft group0, thus enabling it without RAFT will cause an error.	2022-09-05 11:11:08 +02:00
Botond Dénes	be9d1c4df4	sstables: crawling mx-reader: make on_out_of_clustering_range() no-op Said method currently emits a partition-end. This method is only called when the last fragment in the stream is a range tombstone change with a position after all clustered rows. The problem is that consume_partition_end() is also called unconditionally, resulting in two partition-end fragments being emitted. The fix is simple: make this method a no-op, there is nothing to do there. Also add two tests: one targeted to this bug and another one testing the crawling reader with random mutations generated for random schema. Fixes: #11421 Closes #11422	2022-09-04 20:02:50 +03:00
Kefu Chai	a5e696fab8	storage_service, test: drop unused storage_service_config this setting was removed back in `dcdd207349`, so despite that we are still passing `storage_service_config` to the ctor of `storage_service`, `storage_service::storage_service()` just drops it on the floor. in this change, `storage_service_config` class is removed, and all places referencing it are updated accordingly. Signed-off-by: Kefu Chai <tchaikov@gmail.com> Closes #11415	2022-08-31 19:49:13 +03:00
Avi Kivity	421557b40a	Merge "Provide DC/RACK when populating topology" from Pavel E " The topology object maintains all sort of node/DC/RACK mappings on board. When new entries are added to it the DC and RACK are taken from the global snitch instance which, in turn, checks gossiper, system keyspace and its local caches. This set make topology population API require DC and RACK via the call argument. In most of the cases the populating code is the storage service that knows exactly where to get those from. After this set it will be possible to remove the dependency knot consiting of snitch, gossiper, system keyspace and messaging. " * 'br-topology-dc-rack-info' of https://github.com/xemul/scylla: toplogy: Use the provided dc/rack info test: Provide testing dc/rack infos storage_service: Provide dc/rack for snitch reconfiguration storage_service: Provide dc/rack from system ks on start storage_service: Provide dc/rack from gossiper for replacement storage_service: Provide dc/rack from gossiper for remotes storage_service,dht,repair: Provide local dc/rack from system ks system_keyspace: Cache local dc-rack on .start() topology: Some renames after previous patch topology: Require entry in the map for update_normal_tokens() topology: Make update_endpoint() accept dc-rack info replication_strategy: Accept dc-rack as get_pending_address_ranges argument dht: Carry dc-rack over boot_strapper and range_streamer storage_service: Make replacement info a real struct	2022-08-31 12:53:06 +03:00
Nadav Har'El	a797512148	Merge 'Raft test topology start stopped servers' from Alecco Test teardown involves dropping the test keyspace. If there are stopped servers occasionally we would see timeouts. Start stopped servers after a test is finished (and passed). Revert previous commit making teardown async again. Closes #11412 * github.com:scylladb/scylladb: test.py: restart stopped servers before teardown... Revert "test.py: random tables make DDL queries async"	2022-08-30 22:48:47 +03:00
Pavel Emelyanov	e5e75ba43c	Merge 'scylla-gdb.py: bring scylla reads-stats up-to-date' from Botond Dénes Said command is broken since 4.6, as the type of `reader_concurrency_semaphore::_permit_list` was changed without an accompanying update to this command. This series updates said command and adds it to the list of tested commands so we notice if it breaks in the future. Closes #11389 * github.com:scylladb/scylladb: test/scylla-gdb: test scylla read-stats scylla-gdb.py: read_stats: update w.r.t. post 4.5 code scylla-gdb.py: improve string_view_printer implementation	2022-08-30 20:24:02 +03:00
Alejo Sanchez	df1ca57fda	test.py: restart stopped servers before teardown... for topology tests Test teardown involves dropping the test keyspace. If there are stopped servers occasionally we would see timeouts. Start stopped servers after a test is finished. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-30 11:40:40 +02:00
Alejo Sanchez	e5eac22a37	Revert "test.py: random tables make DDL queries async" This reverts commit `67c91e8bcd`. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-30 10:54:33 +02:00
Nadav Har'El	eed65dfc2d	Merge 'db: schema_tables: Make table creation shadow earlier concurrent changes' from Tomasz Grabiec Issuing two CREATE TABLE statements with a different name for one of the partition key columns leads to the following assertion failure on all replicas: scylla: schema.cc:363: schema::schema(const schema::raw_schema&, std::optional<raw_view_info>): Assertion `!def.id \|\| def.id == id - column_offset(def.kind)' failed. The reason is that once the create table mutations are merged, the columns table contains two entries for the same position in the partition key tuple. If the schemas were the same, or not conflicting in a way which leads to abort, the current behavior would be to drop the older table as if the last CREATE TABLE was preceded by a DROP TABLE. The proposed fix is to make CREATE TABLE mutation include a tombstone for all older schema changes of this table, effectively overriding them. The behavior will be the same as if the schemas were not different, older table will be dropped. Fixes #11396 Closes #11398 * github.com:scylladb/scylladb: db: schema_tables: Make table creation shadow earlier concurrent changes db: schema_tables: Fix formatting db: schema_mutations: Make operator<<() print all mutations schema_mutations: Make it a monoid by defining appropriate += operator	2022-08-29 14:21:07 +03:00
Tomasz Grabiec	ae8d2a550d	db: schema_tables: Make table creation shadow earlier concurrent changes Issuing two CREATE TABLE statements with a different name for one of the partition key columns leads to the following assertion failure on all replicas: scylla: schema.cc:363: schema::schema(const schema::raw_schema&, std::optional<raw_view_info>): Assertion `!def.id \|\| def.id == id - column_offset(def.kind)' failed. The reason is that once the create table mutations are merged, the columns table contains two entries for the same position in the partition key tuple. If the schemas were the same, or not conflicting in a way which leads to abort, the current behavior would be to drop the older table as if the last CREATE TABLE was preceded by a DROP TABLE. The proposed fix is to make CREATE TABLE mutation include a tombstone for all older schema changes of this table, effectively overriding them. The behavior will be the same as if the schemas were not different, older table will be dropped. Fixes #11396	2022-08-29 12:06:02 +02:00
Alejo Sanchez	67c91e8bcd	test.py: random tables make DDL queries async There are async timeouts for ALTER queries. Seems related to othe issues with the driver and async. Make these queries synchronous for now. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #11394	2022-08-28 10:38:39 +03:00
Pavel Emelyanov	10e8804417	test: Provide testing dc/rack infos There's a test that's sensitive to correct dc/rack info for testing entries. To populate them it uses global rack-inferring snitch instance or a special "testing" snitch. To make it continue working add a helper that would populate the topology properly (spoiler: next branch will replace it with explicitly populated topology object). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 10:00:04 +03:00
Pavel Emelyanov	4cbe6ee9f4	topology: Require entry in the map for update_normal_tokens() The method in question tries to be on the safest side and adds the enpoint for which it updates the tokens into the topology. From now on it's up to the caller to put the endpoint into topology in advance. So most of what this patch does is places topology.update_endpoint() into the relevant places of the code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:44:08 +03:00
Botond Dénes	4d33812a77	test/scylla-gdb: test scylla read-stats This command was not run before, allowing it to silently break.	2022-08-26 08:08:28 +03:00
Wojciech Mitros	49dba4f0c1	functions: fix dropping of a keyspace with an aggregate in it Currently, if a keyspace has an aggregate and the keyspace is dropped, the keyspace becomes corrupted and another keyspace with the same name cannot be created again This is caused by the fact that when removing an aggregate, we call create_aggregate() to get values for its name and signature. In the create_aggregate(), we check whether the row and final functions for the aggregate exist. Normally, that's not an issue, because when dropping an existing aggregate alone, we know that its UDFs also exist. But when dropping and entire keyspace, we first drop the UDFs, making us unable to drop the aggregate afterwards. This patch fixes this behavior by removing the create_aggregate() from the aggregate dropping implementation and replacing it with specific calls for getting the aggregate name and signature. Additionally, a test that would previously fail is added to cql-pytest/test_uda.py where we drop a keyspace with an aggregate. Fixes #11327 Closes #11375	2022-08-25 16:28:57 +02:00
Tomasz Grabiec	83850e247a	Merge 'raft: server: handle aborts when waiting for config entry to commit' from Kamil Braun Changing configuration involves two entries in the log: a 'joint configuration entry' and a 'non-joint configuration entry'. We use `wait_for_entry` to wait on the joint one. To wait on the non-joint one, we use a separate promise field in `server`. This promise wasn't connected to the `abort_source` passed into `set_configuration`. The call could get stuck if the server got removed from the configuration and lost leadership after committing the joint entry but before committing the non-joint one, waiting on the promise. Aborting wouldn't help. Fix this by subscribing to the `abort_source` in resolving the promise exceptionally. Furthermore, make sure that two `set_configuration` calls don't step on each other's toes by one setting the other's promise. To do that, reset the promise field at the end of `set_configuration` and check that it's not engaged at the beginning. Fixes #11288. Closes #11325 * github.com:scylladb/scylladb: test: raft: randomized_nemesis_test: additional logging raft: server: handle aborts when waiting for config entry to commit	2022-08-25 12:49:09 +02:00
Avi Kivity	df87949241	Merge "Remove batch tokens update helper" from Pavel E " On token_metadata there are two update_normal_tokens() overloads -- one updates tokens for a single endpoint, another one -- for a set (well -- std::map) of them. Other than updating the tokens both methods also may add an endpoint to the t.m.'s topology object. There's an ongoing effort in moving the dc/rack information from snitch to topology, and one of the changes made in it is -- when adding an entry to topology, the dc/rack info should be provided by the caller (which is in 99% of the cases is the storage service). The batched tokens update is extremely unfriendly to the latter change. Fortunately, this helper is only used by tests, the core code always uses fine-grained tokens updating. " * 'br-tokens-update-relax' of https://github.com/xemul/scylla: token_metadata: Indentation fix after prevuous patch token_metadata: Remove excessive empty tokens check token_metadata: Remove batch tokens updating method tests: Use one-by-one tokens updating method	2022-08-25 12:01:58 +02:00
Wojciech Mitros	9e6e8de38f	tests: prevent test_wasm from occasional failing Some cases in test_wasm.py assumed that all cases are ran in the same order every time and depended on values that should have been added to tables in previous cases. Because of that, they were sometimes failing. This patch removes this assumption by adding the missing inserts to the affected cases. Additionally, an assert that confirms low miss rate of udfs is more precise, a comment is added to explain it clearly. Closes #11367	2022-08-25 11:32:06 +03:00
Kamil Braun	90233551be	test: raft: randomized_nemesis_test: don't access failure detector service after it's stopped It could happen that we accessed failure detector service after it was stopped if a reconfiguration happened in the 'right' moment. This would resolve in an assertion failure. Fix this. Closes #11326	2022-08-25 11:32:06 +03:00
Tomasz Grabiec	1d0264e1a9	Merge 'Implement Raft upgrade procedure' from Kamil Braun Start with a cluster with Raft disabled, end up with a cluster that performs schema operations using group 0. Design doc: https://docs.google.com/document/d/1PvZ4NzK3S0ohMhyVNZZ-kCxjkK5URmz1VP65rrkTOCQ/ (TODO: replace this with .md file - we can do it as a follow-up) The procedure, on a high level, works as follows: - join group 0 - wait until every peer joined group 0 (peers are taken from `system.peers` table) - enter `synchronize` upgrade state, in which group 0 operations are disabled - wait until all members of group 0 entered `synchronize` state or some member entered the final state - synchronize schema by comparing versions and pulling if necessary - enter the final state (`use_new_procedures`), in which group 0 is used for schema operations. With the procedure comes a recovery mode in case the upgrade procedure gets stuck (and it may if we lose a node during recovery - the procedure, to correctly establish a single group 0 cluster, requires contacting every node). This recovery mode can also be used to recover clusters with group 0 already established if they permanently lose a majority of nodes - killing two birds with one stone. Details in the last commit message. Read the design doc, then read the commits in topological order for best reviewing experience. --- I did some manual tests: upgrading a cluster, using the cluster to add nodes, remove nodes (both with `decommission` and `removenode`), replacing nodes. Performing recovery. As a follow-up, we'll need to implement tests using the new framework (after it's ready). It will be easy to test upgrades and recovery even with a single Scylla version - we start with a cluster with the RAFT flag disabled, then rolling-restart while enabling the flag (and recovery is done through simple CQL statements). Closes #10835 * github.com:scylladb/scylladb: service/raft: raft_group0: implement upgrade procedure service/raft: raft_group0: extract `tracker` from `persistent_discovery::run` service/raft: raft_group0: introduce local loggers for group 0 and upgrade service/raft: raft_group0: introduce GET_GROUP0_UPGRADE_STATE verb service/raft: raft_group0_client: prepare for upgrade procedure service/raft: introduce `group0_upgrade_state` db: system_keyspace: introduce `load_peers` idl-compiler: introduce cancellable verbs message: messaging_service: cancellable version of `send_schema_check`	2022-08-25 11:32:06 +03:00
Pavel Emelyanov	1d437302a8	tests: Use one-by-one tokens updating method Tests are the only users of batch tokens updating "sugar" which actually makes things more complicated Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-24 08:24:21 +03:00
Avi Kivity	6ce5e9079c	Merge 'utils/logalloc: consolidate lsa state in shard tracker' from Botond Dénes Currently the state of LSA is scattered across a handful of global variables. This series consolidates all these into a single one: the shard tracker. Beyond reducing the number of globals (the less globals, the better) this paves the way for a planned de-globalization of the shard tracker itself. There is one separate global left, the static migrators registry. This is left as-is for now. Closes #11284 * github.com:scylladb/scylladb: utils/logalloc: remove reclaim_timer:: globals utils/logalloc: make s_sanitizer_report_backtrace global a member of tracker utils/logalloc: tracker_reclaimer_lock: get shard tracker via constructor arg utils/logalloc: move global stat accessors to tracker utils/logalloc: allocating_section: don't use the global tracker utils/logalloc: pass down tracker::impl reference to segment_pool utils/logalloc: move segment pool into tracker utils/logalloc: add tracker member to basic_region_impl utils/logalloc: make segment independent of segment pool	2022-08-23 18:51:14 +02:00
Tomasz Grabiec	9c4e32d2e2	Merge 'raft: server: drop waiters in `applier_fiber` instead of `io_fiber`' from Kamil Braun When `io_fiber` fetched a batch with a configuration that does not contain this node, it would send the entries committed in this batch to `applier_fiber` and proceed by any remaining entry dropping waiters (if the node was no longer a leader). If there were waiters for entries committed in this batch, it could either happen that `applier_fiber` received and processed those entries first, notifying the waiters that the entries were committed and/or applied, or it could happen that `io_fiber` reaches the dropping waiters code first, causing the waiters to be resolved with `commit_status_unknown`. The second scenario is undesirable. For example, when a follower tries to remove the current leader from the configuration using `modify_config`, if the second scenario happens, the follower will get `commit_status_unknown` - this can happen even though there are no node or network failures. In particular, this caused `randomized_nemesis_test.remove_leader_with_forwarding_finishes` to fail from time to time. Fix it by serializing the notifying and dropping of waiters in a single fiber - `applier_fiber`. We decided to move all management of waiters into `applier_fiber`, because most of that management was already there (there was already one `drop_waiters` call, and two `notify_waiters` calls). Now, when `io_fiber` observes that we've been removed from the config and no longer a leader, instead of dropping waiters, it sends a message to `applier_fiber`. `applier_fiber` will drop waiters when receiving that message. Improve an existing test to reproduce this scenario more frequently. Fixes #11235. Closes #11308 * github.com:scylladb/scylladb: test: raft: randomized_nemesis_test: more chaos in `remove_leader_with_forwarding_finishes` raft: server: drop waiters in `applier_fiber` instead of `io_fiber` raft: server: use `visit` instead of `holds_alternative`+`get`	2022-08-23 17:19:44 +02:00
Kamil Braun	e350e37605	service/raft: raft_group0: implement upgrade procedure A listener is created inside `raft_group0` for acting when the SUPPORTS_RAFT feature is enabled. The listener is established after the node enters NORMAL status (in `raft_group0::finish_setup_after_join()`, called at the end of `storage_service::join_cluster()`). The listener starts the `upgrade_to_group0` procedure. The procedure, on a high level, works as follows: - join group 0 - wait until every peer joined group 0 (peers are taken from `system.peers` table) - enter `synchronize` upgrade state, in which group 0 operations are disabled (see earlier commit which implemented this logic) - wait until all members of group 0 entered `synchronize` state or some member entered the final state - synchronize schema by comparing versions and pulling if necessary - enter the final state (`use_new_procedures`), in which group 0 is used for schema operations (only those for now). The devil lies in the details, and the implementation is ugly compared to this nice description; for example there are many retry loops for handling intermittent network failures. Read the code. `leave_group0` and `remove_group0` were adjusted to handle the upgrade procedure being run correctly; if necessary, they will wait for the procedure to finish. If the upgrade procedure gets stuck (and it may, since it requires all nodes to be available to contact them to correctly establish a single group 0 raft cluster); or if a running cluster permanently loses a majority of nodes, causing group 0 unavailability; the cluster admin is not left without help. We introduce a recovery mode, which allows the admin to completely get rid of traces of existing group 0 and restart the upgrade procedure - which will establish a new group 0. This works even in clusters that never upgraded but were bootstrapped using group 0 from scratch. To do that, the admin does the following on every node: - writes 'recovery' under 'group0_upgrade_state' key in `system.scylla_local` table, - truncates the `system.discovery` table, - truncates the `system.group0_history` table, - deletes group 0 ID and group 0 server ID from `system.scylla_local` (the keys are `raft_group0_id` and `raft_server_id` then the admin performs a rolling restart of their cluster. The nodes restart in a "group 0 recovery mode", which simply means that the nodes won't try to perform any group 0 operations. Then the admin calls `removenode` to remove the nodes that are down. Finally, the admin removes the `group0_upgrade_state` key from `system.scylla_local`, rolling-restarts the cluster, and the cluster should establish group 0 anew. Note that this recovery procedure will have to be extended when new stuff is added to group 0 - like topology change state. Indeed, observe that a minority of nodes aren't able to receive committed entries from a leader, so they may end up in inconsistent group 0 states. It wouldn't be safe to simply create group 0 on those nodes without first ensuring that they have the same state from which group 0 will start. Right now the state only consist of schema tables, and the upgrade procedure ensures to synchronize them, so even if the nodes started in inconsistent schema states, group 0 will correctly be established. (TODO: create a tracking issue? something needs to remind us of this whenever we extend group 0 with new stuff...)	2022-08-23 13:51:01 +02:00
Kamil Braun	b42dfbc0aa	test: raft: randomized_nemesis_test: additional logging Add some more logging to `randomized_nemesis_test` such as logging the start and end of a reconfiguration operation in a way that makes it easy to find one given the other in the logs.	2022-08-23 13:14:30 +02:00
Tomasz Grabiec	0e5b86d3da	Merge 'Optimize mutation consume of range tombstones in reverse' from Benny Halevy Reversing the whole range_tombstone_list into reversed_range_tombstones is inefficient and can lead to reactor stalls with a large number of range tombstones. Instead, iterate over the range_tombsotne_list in reverse direction and reverse each range_tombstone as we go, keeping the result in the optional cookie.reversed_rt member. While at it, this series contains some other cleanups on this path to improve the code readability and maybe make the compiler's life easier as for optimizing the cleaned-up code. Closes #11271 * github.com:scylladb/scylladb: mutation: consume_clustering_fragments: get rid of reversed_range_tombstones; mutation: consume_clustering_fragments: reindent mutation: consume_clustering_fragments: shuffle emit_rt logic around mutation: consume, consume_gently: simplify partition_start logic mutation: consume_clustering_fragments: pass iterators to mutation_consume_cookie ctor mutation: consume_clustering_fragments: keep the reversed schema in cookie mutation: clustering_iterators: get rid of current_rt mutation_test: test_mutation_consume_position_monotonicity: test also consume_gently	2022-08-23 10:05:39 +02:00
Botond Dénes	7d17d675af	utils/logalloc: move global stat accessors to tracker These are pretend free functions, accessing globals in the background, make them a member of the tracker instead, which everything needed locally to compute them. Callers still have to access these stats through the global tracker instance, but this can be changed to happen through a local instance. Soon....	2022-08-23 10:38:58 +03:00
Nadav Har'El	9c15659194	Merge 'test.py: bump timeout of async requests for topology' from Alecco Topology tests do async requests using the Python driver. The driver's API for async doesn't use the session timeout. Pass 60 seconds timeout (default is 10) to match the session's. Fixes https://github.com/scylladb/scylladb/issues/11289 Closes #11348 * github.com:scylladb/scylladb: test.py: bump schema agreement timeout for topology tests test.py: bump timeout of async requests for topology test.py: fix bad indent	2022-08-23 10:30:59 +03:00
Botond Dénes	331033adae	Merge 'Fix frozen mutation consume ordering' from Benny Halevy Currently, frozen_mutation is not consumed in position_in_partition order as all range tombstones are consumed before all rows. This violates the range_tombstone_generator invariants as its lower_bound needs to be monotonically increasing. Fix this by adding mutation_partition_view::accept_ordered and rewriting do_accept_gently to do the same, both making sure to consume the range tombstones and clustering rows in position_in_partition order, similar to the mutation consume_clustering_fragments function. Add a unit test that verifies that. Fixes #11198 Closes #11269 * github.com:scylladb/scylladb: mutation_partition_view: make mutation_partition_view_virtual_visitor stoppable frozen_mutation: consume and consume_gently in-order frozen_mutation: frozen_mutation_consumer_adaptor: rename rt to rtc frozen_mutation: frozen_mutation_consumer_adaptor: return early when flush returns stop_iteration::yes frozen_mutation: frozen_mutation_consumer_adaptor: consume static row unconditionally frozen_mutation: frozen_mutation_consumer_adaptor: flush current_row before rt_gen	2022-08-23 06:37:04 +03:00
Alejo Sanchez	01cac33472	test.py: bump schema agreement timeout for topology tests Increase the schema agreement timeout to match other timeouts. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-22 21:07:55 +02:00
Alejo Sanchez	f9d31112cf	test.py: bump timeout of async requests for topology Topology tests do async requests using the Python driver. The driver's API for async doesn't use the session timeout. Pass 60 seconds timeout (default is 10) to match the session's. This will hopefully will fix timeout failures on debug mode. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-22 21:07:03 +02:00
Mikołaj Sielużycki	b5380baf8a	frozen_mutation: consume and consume_gently in-order Currently, frozen_mutation is not consumed in position_in_partition order as all range tombstones are consumed before all rows. This violates the range_tombstone_generator invariants as its lower_bound needs to be monotonically increasing. Fix this by adding mutation_partition_view::accept_ordered and rewriting do_accept_gently to do the same, both making sure to consume the range tombstones and clustering rows in position_in_partition order, similar to the mutation consume_clustering_fragments function. Add a unit test that verifies that. Fixes #11198 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-22 20:12:20 +03:00
Kamil Braun	e0c6153adf	test: raft: randomized_nemesis_test: more chaos in `remove_leader_with_forwarding_finishes` Improve the randomness of this test, making it a bit easier to reproduce the scenarios that the test aims to catch. Increase timeouts a bit to account for this additional randomness.	2022-08-22 18:53:48 +02:00
Alejo Sanchez	87c233b36b	test.py: fix bad indent Fix leftover bad indent Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-22 14:29:54 +02:00

1 2 3 4 5 ...

3577 Commits