scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 08:23:29 +00:00

Author	SHA1	Message	Date
Avi Kivity	d3e5b37059	Revert "Merge 'Commitlog: Handle disk usage and disk footprint discrepancies, ensuring we flush when needed' from Calle Wilund" This reverts commit `e9c940dbbc`, reversing changes made to `6144656b25`. Since it was merged commitlog_test consistently times out in debug mode.	2021-05-27 21:16:26 +03:00
Wojciech Mitros	725c6aac81	test/perf: close test_env to pass an assert in sstables_manager destructor When destroying a perf_sstable_test_env, an assert in sstables_manager destructor fails, because it hasn't been closed. Fix by removing all references to sstables from perf_sstable_test_env, and then closing the test_env (as well as the sstables_manager). Fixes #8736 Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com> Closes #8737	2021-05-27 17:41:17 +03:00
Michał Chojnowski	5e9f741bb4	repair: remove range_split.hh Dead code since `80ebedd242`. Closes #8698	2021-05-27 17:21:37 +03:00
Avi Kivity	5f8484897b	Merge 'cdc: use a new internal table for exchanging generations' from Kamil Braun Reopening #8286 since the token metadata fix that allows `Everywhere` strategy tables to work with RBO (#8536) has been merged. --- Currently when a node wants to create and broadcast a new CDC generation it performs the following steps: 1. choose the generation's stream IDs and mapping (how this is done is irrelevant for the current discussion) 2. choose the generation's timestamp by taking the current time (according to its local clock) and adding 2 * ring_delay 3. insert the generation's data (mapping and stream IDs) into system_distributed.cdc_generation_descriptions, using the generation's timestamp as the partition key (we call this table the "old internal table" below) 4. insert the generation's timestamp into the "CDC_STREAMS_TIMESTAMP" application state. The timestamp spreads epidemically through the gossip protocol. When nodes see the timestamp, they retrieve the generation data from the old internal table. Unfortunately, due to the schema of the old internal table, where the entire generation data is stored in a single cell, step 3 may fail for sufficiently large generations (there is a size threshold for which step 3 will always fail - retrying the operation won't help). Also the old internal table lies in the system_distributed keyspace that uses SimpleStrategy with replication factor 3, which is also problematic; for example, when nodes restart, they must reach at least 2 out of these 3 specific replicas in order to retrieve the current generation (we write and read the generation data with QUORUM, unless we're a single-node cluster, where we use ONE). Until this happens, a restarting node can't coordinate writes to CDC-enabled tables. It would be better if the node could access the last known generation locally. 
The commit introduces a new table for broadcasting generation data with the following properties: - it uses a better schema that stores the data in multiple rows, each of manageable size - it resides in a new keyspace that uses EverywhereStrategy so the data will be written to every node in the cluster that has a token in the token ring - the data will be written using CL=ALL and read using CL=ONE; thanks to this, restarting node won't have to communicate with other nodes to retrieve the data of the last known generation. Note that writing with CL=ALL does not reduce availability: creating a new generation requires all nodes to be available anyway, because they must learn about the generation before their clocks go past the generation's timestamp; if they don't, partitions won't be mapped to stream IDs consistently across the cluster - the partition key is no longer the generation's timestamp. Because it was that way in the old internal table, it forced the algorithm to choose the timestamp before the generation data was inserted into the table. What if the inserting took a long time? It increased the chance that nodes would learn about the generation too late (after their clocks moved past its timestamp). With the new schema we will first insert the generation data using a randomly generated UUID as the partition key, then choose the timestamp, then gossip both the timestamp and the UUID. Observe that after a node learns about a generation broadcasted using this new method through gossip it will retrieve its data very quickly since it's one of the replicas and it can use CL=ONE as it was written using CL=ALL. The generation's timestamp and the UUID mentioned in the last point form a "generation identifier" for this new generation. For passing these new identifiers around, we introduce the cdc::generation_id_v2 type. Fixes #7961. 
--- For optimal review experience it is best to first read the updated design notes (you can read them rendered here: https://github.com/kbr-/scylla/blob/cdc-gen-table/docs/design-notes/cdc.md), specifically the ["Generation switching"](https://github.com/kbr-/scylla/blob/cdc-gen-table/docs/design-notes/cdc.md#generation-switching) section followed by the ["Internal generation descriptions table V1 and upgrade procedure"](https://github.com/kbr-/scylla/blob/cdc-gen-table/docs/design-notes/cdc.md#internal-generation-descriptions-table-v1-and-upgrade-procedure) section, then read the commits in topological order. dtest gating run (dev): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/1160/ unit tests (dev) passed locally Closes #8643 * github.com:scylladb/scylla: docs: update cdc.md with info about the new internal table sys_dist_ks: don't create old CDC generations table on service initialization sys_dist_ks: rename all_tables() to ensured_tables() cdc: when creating new generations, use format v2 if possible main: pass feature_service to cdc::generation_service gms: introduce CDC_GENERATIONS_V2 feature cdc: introduce retrieve_generation_data test: cdc: include new generations table in permissions test sys_dist_ks: increase timeout for create_cdc_desc sys_dist_ks: new table for exchanging CDC generations tree-wide: introduce cdc::generation_id_v2	2021-05-27 17:13:44 +03:00
Avi Kivity	e8e4456ec7	Merge 'Introduce per-service-level workload types and their first use-case - shedding in interactive workloads' from Piotr Sarna This draft extends and obsoletes #8123 by introducing a way of determining the workload type from service level parameters, and then using this context to qualify requests for shedding. The rough idea is that when the admission queue in the CQL server is hit, it might make more sense to start shedding surplus requests instead of accumulating them on the semaphore. The assumption is that interactive workloads are more interested in the success rate of as many requests as possible, and hanging on a semaphore reduces the chances for a request to succeed. Thus, it may make sense to shed some requests to reduce the load on this coordinator and let the existing requests finish. It's a draft, because I only performed local guided tests. #8123 was followed by some experiments on a multinode cluster which I want to rerun first. Closes #8680 * github.com:scylladb/scylla: test: add a case for conflicting workload types cql-pytest: add basic tests for service level workload types docs: describe workload types for service levels sys_dist_ks: fix redundant parsing in get_service_level sys_dist_ks: make get_service_level exception-safe transport: start shedding requests during potential overload client_state: hook workload type from service levels cql3: add listing service level workload type cql3: add persisting service level workload type qos: add workload_type service level parameter	2021-05-27 17:01:56 +03:00
Konstantin Osipov	52f7ff4ee4	raft: (testing) update copyright Incorrect copyright information was copy-pasted from another test file. Message-Id: <20210525183919.1395607-1-kostja@scylladb.com>	2021-05-27 15:47:49 +03:00
Piotr Sarna	99f356d764	test: add a case for conflicting workload types The test case verifies that if several workload types are effective for a single role, the conflict resolution is well defined.	2021-05-27 14:31:36 +02:00
Piotr Sarna	01b7e445f9	cql-pytest: add basic tests for service level workload types The test cases check whether it's possible to declare workload type for a service level and if its input is validated.	2021-05-27 14:31:36 +02:00
Pavel Emelyanov	d2442a1bb3	tests: Ditch storage_service_for_tests The purpose of the class in question is to start sharded storage service to make its global instance alive. I don't know when exactly it happened but no code that instantiates this wrapper really needs the global storage service. Ref: #2795 tests: unit(dev), perf_sstable(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210526170454.15795-1-xemul@scylladb.com>	2021-05-27 14:39:13 +03:00
Piotr Sarna	762e2f48f2	cql3: add listing service level workload type The workload type information is now presented in the output of LIST SERVICE LEVEL and LIST ALL SERVICE LEVELS statements.	2021-05-27 13:02:22 +02:00
Nadav Har'El	97e827e3e1	secondary index: fix regression in CREATE INDEX IF NOT EXISTS The recent commit `0ef0a4c78d` added helpful error messages in case an index cannot be created because the intended name of its materialized view is already taken - but accidentally broke the "CREATE INDEX IF NOT EXISTS" feature. The checking code was correct, but in the wrong place: we need to first check whether the index already exists and "IF NOT EXISTS" was chosen - and only do this new error checking if this is not the case. This patch also includes a cql-pytest test for reproducing this bug. The bug is also reproduced by the translated Cassandra unit tests cassandra_tests/validation/entities/secondary_index_test.py:: testCreateAndDropIndex and this is how I found this bug. After this patch, all these tests pass. Fixes #8717. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210526143635.624398-1-nyh@scylladb.com>	2021-05-27 09:10:41 +02:00
Avi Kivity	e2e723cc4c	build: enable -Wrange-loop-construct warning This warning triggers when a range for ("for (auto x : range)") causes non-trivial copies, prompting the developer to replace with a capture by reference. A few minor violations in the test suite are corrected. Closes #8699	2021-05-26 10:32:56 +03:00
Avi Kivity	e9c940dbbc	Merge 'Commitlog: Handle disk usage and disk footprint discrepancies, ensuring we flush when needed' from Calle Wilund Fixes #8270 If we have an allocation pattern where we leave large parts of segments "wasted" (typically because the segment has empty space, but cannot hold the mutation being added), we can have a disk usage that is below threshold, yet still get a disk _footprint_ that is over limit causing new segment allocation to stall. We need to take a few things into account: 1.) Need to include wasted space in the threshold check. Whether or not disk is actually used does not matter here. 2.) If we stall a segment alloc, we should just flush immediately. No point in waiting for the timer task. 3.) Need to adjust the thresholds a bit. Depending on sizes, we should probably consider starting to flush once we've used up enough space to be in the last available segment, so a new one is hopefully available by the time we hit the limit. Also fix edge case (for tests), when we have too few segments to have an active one (i.e. need to flush everything). Closes #8695 * github.com:scylladb/scylla: commitlog_test: Add test case for usage/disk size threshold mismatch commitlog: Flush all segments if we only have one. commitlog: Always force flush if segment allocation is waiting commitlog: Include segment wasted (slack) size in footprint check commitlog: Adjust (lower) usage threshold	2021-05-25 18:34:29 +03:00
Kamil Braun	c948573398	sys_dist_ks: don't create old CDC generations table on service initialization The old table won't be created in clusters that are bootstrapped after this commit. It will stay in clusters that were upgraded from a version before this commit. Note that a fully upgraded cluster doesn't automatically create a new generation in the new format. Even if the last generation was created before the upgrade, the cluster will keep using it. A new generation will be created in the new format when either: 1. a new node bootstraps (in the new version), 2. or the user runs checkAndRepairCdcStreams, which has a new check: if the current generation uses the old format, the command will decide that repair is needed, even if the generation is completely fine otherwise (also in the new version). During upgrade, while the CDC_GENERATIONS_V2 feature is still not enabled, the user may still bootstrap a node in the old version of Scylla or run checkAndRepairCdcStreams on a not-yet-upgraded node. In that case a new generation will be created in the old format, using the old table definitions.	2021-05-25 16:07:23 +02:00
Kamil Braun	4d3870b24b	main: pass feature_service to cdc::generation_service	2021-05-25 16:07:23 +02:00
Kamil Braun	f25e77c202	test: cdc: include new generations table in permissions test	2021-05-25 16:07:23 +02:00
Calle Wilund	a96433c684	commitlog_test: Add test case for usage/disk size threshold mismatch Refs #8270 Tries to simulate case where we mismatch segments usage with actual disk footprint and fail to flush enough to allow segment recycling	2021-05-25 12:43:12 +00:00
Avi Kivity	e391e4a398	test: serialized_action_test: prevent false-positive timeout in test_phased_barrier_reassignment test_phased_barrier_reassignment has a timeout to prevent the test from hanging on failure, but it occasionally triggers in debug mode since the timeout is quite low (1ms). Increase the timeout to prevent false positives. Since the timeout only expires if the test fails, it will have no impact on execution time. Ref #8613 Closes #8692	2021-05-25 11:20:18 +02:00
Raphael S. Carvalho	ee39eb9042	sstables: Fix slow off-strategy compaction on STCS tables Off-strategy compaction on a table using STCS is slow because of the needless write amplification of 2. That's because STCS reshape isn't taking advantage of the fact that sstables produced by a repair-based operation are disjoint. So the ~256 input sstables were compacted (in batches of 32) into larger sstables, which in turn were compacted into even larger ones. That write amp is very significant on large data sets, making the whole operation 2x slower. Fixes #8449. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210524213426.196407-1-raphaelsc@scylladb.com>	2021-05-25 11:24:42 +03:00
Piotr Sarna	95c6ec1528	Merge 'test/cql-pytest: clean up tests to run on Cassandra' from Nadav Har'El To keep our cql-pytest tests "correct", we should strive for them to pass on Cassandra - unless they are testing a Scylla-only feature or a deliberate difference between Scylla and Cassandra - in which case they should be marked "scylla-only" so that they are skipped when running on Cassandra. The following few small patches fix a few cases where our tests were failing on Cassandra. In one case this even found a bug in the test (a trivial Python mistake, but still). Closes #8694 * github.com:scylladb/scylla: test/cql-pytest: fix python mistake in an xfailing test test/cql-pytest: mark some tests with scylla-only test/cql-pytest: clean up test_create_large_static_cells_and_rows	2021-05-24 16:42:01 +02:00
Nadav Har'El	edc2c65552	Merge 'Fix service level negative timeouts' from Piotr Sarna This series fixes a minor validation issue with service level timeouts - negative values were not checked. This bug is benign because negative timeouts act just like a 0s timeout, but the original series claimed to validate against negative values, so it's hereby fixed. More importantly however, this series follows by enabling cql-pytest to run service level tests and provides a first batch of them, including a missing test case for negative timeouts. The idea is similar to what we already have in alternator test suite - authentication is unconditionally enabled, which doesn't affect any existing tests, but at the same time allows writing test cases which rely on authentication - e.g. service levels. Closes #8645 * github.com:scylladb/scylla: cql-pytest: introduce service level test suite cql-pytest: add enabling authentication by default qos: fix validating service level timeouts for negative values	2021-05-24 16:30:13 +03:00
Tomasz Grabiec	b1821c773f	Merge "raft: basic RPC module testing" from Pavel Solodovnikov Now RPC module has some basic testing coverage to make sure RPC configuration is updated appropriately on configuration changes (i.e. `add_server` and `remove_server` are called when appropriate). The test suite currently consists of the following test-cases: * Loading server instance with configuration from a snapshot. * Loading server instance with configuration from a log. * Configuration changes (remove + add node). * Leader elections don't lead to RPC configuration changes. * Voter <-> learner node transitions also don't change RPC configuration. * Reverting uncommitted configuration changes updates RPC configuration accordingly (two cases: revert to snapshot config or committed state from the log). A few more refactorings are made along the way to be able to reuse some existing functions from `replication_test` in `rpc_test` implementation. Please note, though, that there are still some functions that are borrowed from `replication_test` but not yet extracted to common helpers. This is mostly because RPC tests don't need all the complexity that `replication_test` has, thus, some helpers are copied in a reduced form. It would take some effort to refactor these bits to fit both `replication_test` and `rpc_test` without sacrificing convenience. This will probably be addressed in another series later. 
* manmanson/raft-rpc-tests-v9-alt3: raft: add tests for RPC module test: add CHECK_EVENTUALLY_EQUAL utility macro raft: replication_test: reset test rpc network between test runs raft: replication_test: extract tickers initialization into a separate func raft: replication_test: support passing custom `apply_fn` to `change_configuration()` raft: replication_test: introduce `test_server` aggregate struct raft: replication_test: support voter<->learner configuration changes raft: remove duplicate `create_command` function from `replication_test` raft: avoid 'using' statements in raft testing helpers header	2021-05-24 14:44:37 +02:00
Avi Kivity	50f3bbc359	Merge "treewide: various header cleanups" from Pavel S " The patch set is an assorted collection of header cleanups, e.g: * Reduce number of boost includes in header files * Switch to forward declarations in some places A quick measurement was performed to see if these changes provide any improvement in build times (ccache cleaned and existing build products wiped out). The results are posted below (`/usr/bin/time -v ninja dev-build`) for 24 cores/48 threads CPU setup (AMD Threadripper 2970WX). Before: Command being timed: "ninja dev-build" User time (seconds): 28262.47 System time (seconds): 824.85 Percent of CPU this job got: 3979% Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 2129888 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 1402838 Minor (reclaiming a frame) page faults: 124265412 Voluntary context switches: 1879279 Involuntary context switches: 1159999 Swaps: 0 File system inputs: 0 File system outputs: 11806272 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 After: Command being timed: "ninja dev-build" User time (seconds): 26270.81 System time (seconds): 767.01 Percent of CPU this job got: 3905% Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 2117608 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 1400189 Minor (reclaiming a frame) page faults: 117570335 Voluntary context switches: 1870631 Involuntary context switches: 1154535 Swaps: 0 File system inputs: 0 File system outputs: 11777280 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page 
size (bytes): 4096 Exit status: 0 The observed improvement is about 5% of total wall clock time for `dev-build` target. Also, all commits make sure that headers stay self-sufficient, which would help to further improve the situation in the future. " * 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla: transport: remove extraneous `qos/service_level_controller` includes from headers treewide: remove evidently unneded storage_proxy includes from some places service_level_controller: remove extraneous `service/storage_service.hh` include sstables/writer: remove extraneous `service/storage_service.hh` include treewide: remove extraneous database.hh includes from headers treewide: reduce boost headers usage in scylla header files cql3: remove extraneous includes from some headers cql3: various forward declaration cleanups utils: add missing <limits> header in `extremum_tracking.hh`	2021-05-24 14:24:20 +03:00
Nadav Har'El	5206665b15	test/cql-pytest: fix python mistake in an xfailing test The xfailing test cassandra_tests/validation/entities/collections_test.py:: testSelectionOfEmptyCollections had a Python mistake (using {} instead of set() for an empty set), which resulted in its failure when run against Cassandra. After this patch it passes on Cassandra and fails on Scylla - as expected (this is why it is marked xfail). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-05-24 13:14:54 +03:00
Nadav Har'El	f26b31e950	test/cql-pytest: mark some tests with scylla-only Tests which are known to test a Scylla-only feature (such as CDC) or to rely on a known difference between Scylla and Cassandra should be marked "scylla-only", so they are skipped when running the tests against Cassandra (test/cql-pytest/run-cassandra) instead of reporting errors. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-05-24 13:03:48 +03:00
Nadav Har'El	c8117584e3	test/cql-pytest: clean up test_create_large_static_cells_and_rows The test test_create_large_static_cells_and_rows had its own implementation of "nodetool flush" using Scylla's REST API. Now that we have a nodetool.flush() function for general use in cql-pytest, let's use it and save a bit of duplication. Another benefit is that now this test can be run (and pass) against Cassandra. To allow this test to run on Cassandra, I had to remove a "USING TIMEOUT" which wasn't necessary for this test, and is not a feature supported by Cassandra. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-05-24 12:31:51 +03:00
Asias He	425e3b1182	gossip: Introduce direct failure detector Currently, gossip uses the updates of the gossip heartbeat from gossip messages to decide if a node is up or down. This means if a node is actually down but the gossip messages are delayed in the network, the marking of node down can be delayed. For example, a node sends 20 gossip messages in 20 seconds before it is dead. Each message is delayed 15 seconds by the network for some reason. A node receives those delayed messages one after another. Those delayed messages will prevent this node from being marked as down, because a heartbeat update is received just before the threshold to mark a node down is triggered, which is around 20 seconds by default. As a result, this node will not be marked as down in 20 * 15 seconds = 300 seconds, much longer than the ~20 seconds node down detection time in normal cases. In this patch, a new failure detector is implemented. - Direct detection The existing failure detector can get gossip heartbeat updates indirectly. For example: Node A can talk to Node B Node B can talk to Node C Node A can not talk to Node C, due to network issues Node A will not mark Node C to be down because Node A can get heart beat of Node C from node B indirectly. This indirect detection is not very useful because when Node A decides if it should send requests to Node C, the requests from Node A to C will fail while Node A thinks it can communicate with Node C. This patch changes the failure detection to be direct. It uses the existing gossip echo message to detect directly. Gossip echo messages will be sent to peer nodes periodically. A peer node will be marked as down if a timeout threshold has been met. Since the failure detection is peer to peer, it avoids the delayed message issue mentioned above. - Parallel detection The old failure detector uses shard zero only. This new failure detector utilizes all the shards to perform the failure detection, each shard handling a subset of live nodes. 
For example, if the cluster has 32 nodes and each node has 16 shards, each shard will handle only 2 nodes. In a 16-node cluster where each node has 16 shards, each shard will handle only one peer node. A gossip message will be sent to peer nodes every 2 seconds. The extra echo message traffic produced compared to the old failure detector is negligible. - Deterministic detection Users can configure the failure_detector_timeout_in_ms to set the threshold to mark a node down. It is the maximum time between two successful echo messages before gossip marks a node down. It is easier to understand than the old phi_convict_threshold. - Compatible This patch only uses the existing gossip echo message. Nodes with or without this patch can work together. Fixes #8488 Closes #8036	2021-05-24 10:47:06 +03:00
Piotr Sarna	890ed201fd	Merge 'Enable -Wunused-private-field warning' from Avi Kivity The -Wunused-private-field was squelched when we switched to clang to make the change easier. But it is a useful warning, so re-enable it. It found a serious bug (#8682) and a few minor instances of waste. Closes #8683 * github.com:scylladb/scylla: build: enable -Wunused-private-field warning test: drop unused fields table: drop unused field database_sstable_write_monitor::_compaction_manager streaming: drop unused fields sstables: mx reader: drop unused _column_value_length field sstables: index_consumer: drop unused max_quantity field compaction: resharding_compaction: drop unused _shard field compaction: compaction_read_monitor: drop unused _compaction_manager field raft: raft_services: drop unused _gossiper field repair: drop unused _nr_peer_nodes field redis: drop unused fields _storage_proxy and _requests_blocked_memory mutation_rebuilder: drop unused field _remaining_limit db: data_listeners: remove unused field _db cql3: insert_json_statement: note bug with unused _if_not_exists cql3: authorized_prepared_statement_cache: drop unused field _logger auth: service_level_resource_view: drop unused field _resource	2021-05-24 09:21:10 +02:00
Gleb Natapov	b4d6bdb16e	raft: test: check that a leader does not send probes to a follower in the snapshot mode Message-Id: <YKTNN7vNGkQwTDX7@scylladb.com>	2021-05-23 01:06:12 +02:00
Avi Kivity	7e5a0b6fd0	test: drop unused fields Drop unused fields in various tests and test libraries.	2021-05-21 21:04:49 +03:00
Nadav Har'El	a2379b96b1	alternator test: test for large BatchGetItem This patch adds an Alternator test, test_batch_get_item_large, which checks a BatchGetItem with a moderately large (1.5 MB) response. The test passes - we do not have a bug in BatchGetItem - but it does reproduce issue #8522 - the long response is stored in memory as one long contiguous string and causes a warning about an over-sized allocation: WARN ... seastar_memory - oversized allocation: 2281472 bytes. Incidentally, this test also reproduces a second contiguous allocation problem - issue #8183 (in BatchWriteItem which we use in this test to set up the item to read). Refs #8522 Refs #8183 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210520161619.110941-1-nyh@scylladb.com>	2021-05-21 08:38:53 +02:00
Avi Kivity	eac6fb8d79	gdb: bypass unit test on non-x86 The gdb self-tests fail on aarch64 due to a failure to use thread-local variables. I filed [1] so it can get fixed. Meanwhile, disable the test so the build passes. It is sad, but the aarch64 build is not impacted by these failures. [1] https://sourceware.org/bugzilla/show_bug.cgi?id=27886 Closes #8672	2021-05-20 20:14:15 +03:00
Avi Kivity	30034371e7	Merge "Remove most of global pointers from repair" from Pavel " There are many global stuff in repair -- a bunch of pointers to sharded services, tracker, map of metas (maybe more). This set removes the first group, all those services had become main-local recently. Along the way a call to global storage proxy is dropped. To get there the repair_service is turned into a "classical" sharded<> service, gets all the needed dependencies by references from main and spreads them internally where needed. Tracker and other stuff is left global, but tracker is now the candidate for merging with the now sharded repair_service, since it emulates the sharded concept internally. Overall the change is - make repair_service sharded and put all dependencies on it at start - have sharded<repair_service> in API and storage service - carry the service reference down to repair_info and repair_meta constructions to give them the depedencies - use needed services in _info and _meta methods tests: unit(dev), dtest.repair(dev) " * 'br-repair-service' of https://github.com/xemul/scylla: (29 commits) repair: Drop most of globals from repair repair: Use local references in messaging handler checks repair: Use local references in create_writer() repair: Construct repair_meta with local references repair: Keep more stuff on repair_info repair: Kill bunch of global usages from insert_repair_meta repair: Pass repair service down to meta insertion repair: Keep local migration manager on repair_info repair: Move unused db captures repair: Remove unused ms captures repair: Construct repair_info with service repair: Loop over repair sharded container repair: Make sync_data_using_repair a method repair: Use repair from storage service repair: Keep repair on storage service repair: Make do_repair_start a method repair: Pass repair_service through the API until do_repair_start repair: Fix indentation after previous patch repair: Split sync_data_using_repair repair: Turn 
repair_range a repair_info method ...	2021-05-20 10:57:48 +03:00
Pavel Solodovnikov	238273d237	treewide: remove evidently unneded storage_proxy includes from some places Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 02:19:32 +03:00
Pavel Solodovnikov	0663aa6ca1	service_level_controller: remove extraneous `service/storage_service.hh` include Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 02:18:41 +03:00
Pavel Solodovnikov	fff7ef1fc2	treewide: reduce boost headers usage in scylla header files `dev-headers` target is also ensured to build successfully. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 01:33:18 +03:00
Piotr Sarna	223a59c09c	test: make rjson allocator test working in sanitize mode Following Nadav's advice, instead of ignoring the test in sanitize/debug modes, the allocator simply has a special path of failing sufficiently large allocation requests. With that, a problem with the address sanitizer is bypassed and other debug mode sanitizers can inspect and check if there are no more problems related to wrapping the original rapidjson allocator. Closes #8539	2021-05-20 00:42:47 +03:00
Pavel Solodovnikov	a66de8658b	raft: add tests for RPC module Now RPC module has some basic testing coverage to make sure RPC configuration is updated appropriately on configuration changes (i.e. `add_server` and `remove_server` are called when appropriate). The test suite currently consists of the following test-cases: * Loading server instance with configuration from a snapshot. * Loading server instance with configuration from a log. * Configuration changes (remove + add node). * Leader elections don't lead to RPC configuration changes. * Voter <-> learner node transitions also don't change RPC configuration. * Reverting uncommitted configuration changes updates RPC configuration accordingly (two cases: revert to snapshot config or committed state from the log). Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-19 23:14:04 +03:00
Pavel Solodovnikov	e030e291a8	test: add CHECK_EVENTUALLY_EQUAL utility macro It would be good to have a `CHECK` variant in addition to an existing `REQUIRE_EVENTUALLY_EQUAL` macro. Will be used in raft RPC tests. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-19 23:12:55 +03:00
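The difference between a `CHECK` and a `REQUIRE` flavour of an "eventually equal" assertion can be sketched outside C++. The Python analogue below is illustrative only and is not the macro from the commit: the helper names, the retry count and the delay are all assumptions. It polls a producer until it yields the expected value; the check variant records a failure and lets the test continue, while the require variant aborts the test immediately.

```python
import time


def eventually_equal(produce, expected, attempts=10, delay=0.01):
    """Poll produce() until it returns expected or the attempts run out."""
    for _ in range(attempts):
        if produce() == expected:
            return True
        time.sleep(delay)
    return False


class Checks:
    """Hypothetical holder for soft (check) and hard (require) assertions."""

    def __init__(self):
        self.failures = []

    def check_eventually_equal(self, produce, expected, **kwargs):
        # Soft assertion: remember the failure and keep running the test.
        ok = eventually_equal(produce, expected, **kwargs)
        if not ok:
            self.failures.append(f"value never became {expected!r}")
        return ok

    def require_eventually_equal(self, produce, expected, **kwargs):
        # Hard assertion: stop the test as soon as the condition fails.
        if not eventually_equal(produce, expected, **kwargs):
            raise AssertionError(f"value never became {expected!r}")


def make_counter():
    """Return a producer that yields 1, 2, 3, ... on successive calls."""
    state = {"n": 0}

    def produce():
        state["n"] += 1
        return state["n"]

    return produce
```

With `make_counter()` as the producer, waiting for `3` succeeds on the third poll, whereas waiting for `0` exhausts the attempts and either records a failure or raises, depending on the variant.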
Pavel Solodovnikov	2067cc75c6	raft: replication_test: reset test rpc network between test runs Currently, emulated rpc network is shared between all test cases in `replication_test.cc` (see static `rpc::net` map). Though, its value is not reset when executing a subsequent test case, which opens a possibility for heap-use-after-free bugs. Also, make all `send_*` functions in test rpc class to throw an error if a node being contacted is not in the network instead of past-the-end access. This allows to safely contact a non-existent node, which will be used in RPC tests later. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-19 23:06:29 +03:00
Avi Kivity	d8121961fa	Merge 'cql-pytest: add nodetool flush feature and use it in a test' from Nadav Har'El The first patch adds a nodetool-like capability to the cql-pytest framework. It is not meant to be used to test nodetool itself, but rather to give CQL tests the ability to use nodetool operations - currently only one operation - "nodetool flush". We try to use Scylla's REST API, if possible, and only fall back to using an external "nodetool" command when the REST API is not available - i.e., when testing Cassandra. The benefit of using the REST API is that we don't need to run the jmx server to test Scylla. The second patch is an example of using the new nodetool flush feature in a test that needs to flush data to reproduce a bug (which has already been fixed). Closes #8622 * github.com:scylladb/scylla: cql-pytest: reproducer for issue #8138 cql-pytest: add nodetool flush feature	2021-05-19 14:40:18 +03:00
Nadav Har'El	fd8d15a1a6	cql-pytest: reproducer for issue #8138 We add a reproducing test for issue #8138, where if we write to a TWCS table, scanning it would yield no rows - and worse - crash the debug build. This test requires "nodetool flush" to force the read to happen from sstables, hence the nodetool feature was implemented in the previous patch (on Scylla, it uses the REST API - not actually running nodetool or requiring JMX). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-05-19 13:58:14 +03:00
Nadav Har'El	49580a4701	cql-pytest: add nodetool flush feature This patch adds a nodetool-compatible capability to the cql-pytest framework. It is not meant to be used to test nodetool itself, but rather to give CQL tests the ability to use nodetool operations - currently one operation - "nodetool flush". Use it in a test as: import nodetool nodetool.flush(cql, table) I chose a functional API with parameters ("cql") instead of a fixture with an implied connection so that in the future we may allow multiple nodes and this API will allow sending nodetool requests to different nodes. However, multi-node support is not implemented yet, nor used in any of the existing tests. The implementation uses Scylla's REST API if available, or if not, falls back to using an external "nodetool" command (which can be overridden using the NODETOOL environment variable). This way, both cql-pytest/run (Scylla) and cql-pytest/run-cassandra (Cassandra) now correctly support these nodetool operations, and we still don't need to run JMX to test Scylla. The reason we want to support nodetool.flush() is to reproduce bugs that depend on data reaching disk. We already had such a reproducer in test_large_cells_rows.py - it too did something similar - but it was Scylla-only (using only the REST API). Instead of copying such code to multiple places, we better have a common nodetool.flush() function, as done in this patch. The test in test_large_cells_rows.py can later be changed to use the new function. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-05-19 13:55:25 +03:00
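The "REST API first, external command as fallback" behaviour described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `flush_table`, `rest_flush` and `run` are hypothetical names, the REST call is passed in as a callable so that no endpoint is assumed, and a `ConnectionError` stands in for "REST API not available". Only the `NODETOOL` environment variable and the `nodetool flush` operation come from the commit message.

```python
import os
import subprocess


def flush_table(qualified_table, rest_flush=None, run=subprocess.check_call):
    """Flush one table, preferring a REST call over an external nodetool.

    qualified_table is "keyspace.table". rest_flush is a hypothetical
    callable taking (keyspace, table); run executes an argv list.
    Returns which path was taken: "rest" or "nodetool".
    """
    keyspace, table = qualified_table.split(".", 1)

    if rest_flush is not None:
        try:
            rest_flush(keyspace, table)
            return "rest"
        except ConnectionError:
            # Assumed signal that no REST API is listening (e.g. Cassandra).
            pass

    # Fall back to an external command; NODETOOL overrides the default name.
    command = os.environ.get("NODETOOL", "nodetool")
    run([command, "flush", keyspace, table])
    return "nodetool"
```

Passing the runner and the REST callable as parameters keeps the sketch testable without a running cluster, in the same spirit as the commit's choice of a functional API with an explicit connection parameter.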
Pavel Emelyanov	28f01aadc9	allocation_strategy, code: Simplify alloc() Today's alloc() accepts migrate-fn, size and alignment. All the callers don't really need to provide anything special for the migrate-fn and are just happy with default alignof() for alignment. The simplification is in providing alloc() that only accepts size arg and does the rest itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-05-19 09:23:49 +03:00
Botond Dénes	dbb6851d4d	test/manual/sstable_scan_footprint: don't double close the semaphore The semaphore `stats_collector` references is the one obtained from the database object, which is already stopped by `database::stop()`, making the stop in `~stats_collector()` redundant, and even worse, as it triggers an assert failure. Remove it. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210518140913.276368-1-bdenes@scylladb.com>	2021-05-18 17:55:52 +03:00
Avi Kivity	16ff92745f	Merge 'perf: add alternator frontend to perf_simple_query' from Piotr Sarna The perf_simple_query tool is extended with another protocol aside from CQL - alternator. The alternative (pun intended) benchmark can be executed by using the `--alternator X` parameter, where X specifies one of the alternator's mandatory write isolation options: - "forbid_rmw" - forbids RMW (read-modify-write) requests - "unsafe" - never uses LWT (lightweight transactions), even for RMW - "always_use_lwt" - uses LWT even for non-RMW requests - "only_rmw_uses_lwt" - that one's rather self-explanatory Alternator cooperates with existing `--write` and `--delete` parameters. Aside from being able to check for improvements/regressions in the alternator module, it's also possible to check how different isolation levels influence the number of allocations and overall performance, or to compare alternator against CQL. Example output showing the difference in isolation levels: ```bash $ ./build/release/test/perf/perf_simple_query_g --smp 1 \ --write --alternator only_rmw_uses_lwt --default-log-level error random-seed=1235000092 Started alternator executor 10873.76 tps (202.9 allocs/op, 12.4 tasks/op, 369921 insns/op) 11096.09 tps (202.7 allocs/op, 12.1 tasks/op, 374792 insns/op) 11100.09 tps (203.0 allocs/op, 12.1 tasks/op, 376469 insns/op) 11068.98 tps (203.1 allocs/op, 12.1 tasks/op, 377132 insns/op) 11081.24 tps (203.2 allocs/op, 12.1 tasks/op, 377290 insns/op) median 11081.24 tps (203.2 allocs/op, 12.1 tasks/op, 377290 insns/op) median absolute deviation: 14.85 maximum: 11100.09 minimum: 10873.76 $ ./build/release/test/perf/perf_simple_query_g --smp 1 \ --random-seed 1235000092 --write --alternator always_use_lwt \ --default-log-level error random-seed=1235000092 Started alternator executor 3605.35 tps (877.4 allocs/op, 174.6 tasks/op, 986666 insns/op) 3555.71 tps (890.0 allocs/op, 174.4 tasks/op, 1006945 insns/op) 3530.20 tps (899.7 allocs/op, 174.1 tasks/op, 1021908 
insns/op) 3437.65 tps (908.2 allocs/op, 174.6 tasks/op, 1033992 insns/op) 3409.88 tps (913.2 allocs/op, 174.4 tasks/op, 1041240 insns/op) median 3530.20 tps (899.7 allocs/op, 174.1 tasks/op, 1021908 insns/op) median absolute deviation: 75.15 maximum: 3605.35 minimum: 3409.88 ``` Closes #8656 * github.com:scylladb/scylla: perf: add alternator frontend to perf_simple_query cdc: make metadata.hh self-sufficient test: add minimal alternator_test_env	2021-05-18 16:17:54 +03:00
Piotr Sarna	6c6ccda8a0	perf: add alternator frontend to perf_simple_query The perf_simple_query tool is extended with another protocol aside from CQL - alternator. The alternative (pun intended) benchmark can be executed by using the `--alternator X` parameter, where X specifies one of the alternator's mandatory write isolation options: - "forbid_rmw" - forbids RMW (read-modify-write) requests - "unsafe" - never uses LWT (lightweight transactions), even for RMW - "always_use_lwt" - uses LWT even for non-RMW requests - "only_rmw_uses_lwt" - that one's rather self-explanatory Alternator cooperates with existing --write and --delete parameters. Aside from being able to check for improvements/regressions in the alternator module, it's also possible to check how different isolation levels influence the number of allocations and overall performance, or to compare alternator against CQL. $ ./build/release/test/perf/perf_simple_query_g --smp 1 \ --write --alternator only_rmw_uses_lwt --default-log-level error random-seed=1235000092 Started alternator executor 10873.76 tps (202.9 allocs/op, 12.4 tasks/op, 369921 insns/op) 11096.09 tps (202.7 allocs/op, 12.1 tasks/op, 374792 insns/op) 11100.09 tps (203.0 allocs/op, 12.1 tasks/op, 376469 insns/op) 11068.98 tps (203.1 allocs/op, 12.1 tasks/op, 377132 insns/op) 11081.24 tps (203.2 allocs/op, 12.1 tasks/op, 377290 insns/op) median 11081.24 tps (203.2 allocs/op, 12.1 tasks/op, 377290 insns/op) median absolute deviation: 14.85 maximum: 11100.09 minimum: 10873.76 $ ./build/release/test/perf/perf_simple_query_g --smp 1 \ --random-seed 1235000092 --write --alternator always_use_lwt \ --default-log-level error random-seed=1235000092 Started alternator executor 3605.35 tps (877.4 allocs/op, 174.6 tasks/op, 986666 insns/op) 3555.71 tps (890.0 allocs/op, 174.4 tasks/op, 1006945 insns/op) 3530.20 tps (899.7 allocs/op, 174.1 tasks/op, 1021908 insns/op) 3437.65 tps (908.2 allocs/op, 174.6 tasks/op, 1033992 insns/op) 3409.88 tps (913.2 
allocs/op, 174.4 tasks/op, 1041240 insns/op) median 3530.20 tps (899.7 allocs/op, 174.1 tasks/op, 1021908 insns/op) median absolute deviation: 75.15 maximum: 3605.35 minimum: 3409.88	2021-05-18 15:10:31 +02:00
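The summary lines in the benchmark output above (median, median absolute deviation, maximum, minimum) can be reproduced from the five per-run tps figures. The sketch below uses the common definition of median absolute deviation, the median of the absolute distances from the median; whether perf_simple_query computes it exactly this way is an assumption, although the definition does yield the reported 14.85 and 75.15. The function name `summarize` is hypothetical.

```python
from statistics import median


def summarize(tps):
    """Return median, median absolute deviation, max and min of tps samples."""
    mid = median(tps)
    # Median absolute deviation: median distance of the samples from the median.
    mad = median([abs(x - mid) for x in tps])
    return {"median": mid, "mad": mad, "maximum": max(tps), "minimum": min(tps)}


# Per-run throughput copied from the two runs shown in the commit message.
only_rmw_uses_lwt = [10873.76, 11096.09, 11100.09, 11068.98, 11081.24]
always_use_lwt = [3605.35, 3555.71, 3530.20, 3437.65, 3409.88]

for name, samples in [("only_rmw_uses_lwt", only_rmw_uses_lwt),
                      ("always_use_lwt", always_use_lwt)]:
    stats = summarize(samples)
    print(f"{name}: median {stats['median']:.2f} tps, "
          f"median absolute deviation: {stats['mad']:.2f}, "
          f"maximum: {stats['maximum']:.2f}, minimum: {stats['minimum']:.2f}")
```

The median absolute deviation is less sensitive to a single slow run than the standard deviation, which is useful here because the first run of the `only_rmw_uses_lwt` benchmark is visibly below the other four.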
Piotr Sarna	b6d6247a74	test: add minimal alternator_test_env A minimal implementation of alternator test env, a younger cousin of cql_test_env, is implemented. Note that using this environment for unit tests is strongly discouraged in favor of the official test/alternator pytest suite. Still, alternator_test_env has its uses for microbenchmarks.	2021-05-18 15:10:31 +02:00
Botond Dénes	82bff1bcc6	test: cql_test_env: use proper scheduling groups Currently `cql_test_env` runs its `func` in the default (main) group and also leaves all scheduling groups in `dbcfg` default initialized to the same scheduling group. This results in every part of the system, normally isolated from each other, running in the same (default) scheduling group. Not a big problem on its own, as we are talking about tests, but this creates an artificial difference between the test and the real environment, which is ever more pronounced since certain query parameters are selected based on the current scheduling group. To bring cql test env just that little bit closer to the real thing, this patch creates all the scheduling groups main does (well almost) and configures `dbcfg` with them. Creating and destroying the scheduling group on each setup-teardown of cql test env breaks some internal seastar components which don't like seeing the same scheduling group with the same name but different id. So create the scheduling groups once on first access and keep them around until the test executable is running. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210514141614.128213-2-bdenes@scylladb.com>	2021-05-18 13:44:54 +03:00
Botond Dénes	300ee974f7	test: use with_cql_test_env_thread where needed Currently `with_cql_test_env()` is equivalent to `with_cql_test_env_thread()`, which resulted in many tests using the former while really needing the latter and getting away with it. This equivalence is incidental and will go away soon, so make sure all cql test env using tests that expect to be run in a thread use the appropriate variant. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210514141614.128213-1-bdenes@scylladb.com>	2021-05-18 13:44:52 +03:00

1697 Commits