scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 08:23:29 +00:00

Author	SHA1	Message	Date
Alejo Sanchez	52188016af	raft: replication test: create_server in raft_cluster Remove the global create_raft_server() and replace with a create_server() helper in replication_test(). This will allow not requiring the user of raft_cluster to create special objects. Note this does not move(apply) anymore as it's kept in raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 23:47:02 -04:00
Alejo Sanchez	1edcb6e647	raft: replication test: reset snapshots When stopping a server also delete snapshots and persisted snapshots. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 23:46:11 -04:00
Alejo Sanchez	453f19cf0e	raft: replication test: reset server helper Add a helper to reset a server in raft_cluster. Besides simplifying code and preventing errors, this will help move create_raft_server logic to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	d3b7f21b88	raft: replication test: pause tickers before stopping Pause tickers before stopping servers. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	30c9daafd2	raft: replication test: tick helper Move test tick handling to raft_cluster as helper method. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	2e61c507d2	raft: replication test: tickers on raft_cluster Move tickers to raft_cluster helper class. Ticker initialization and pause is done automatically at start_all() and stop_all(). Add temporary helpers to manage specific tickers. These might be removed later once proper node abort and reset are implemented. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	aea77871c4	raft: replication test: cluster tracking leader Track current leader inside helper class. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	ca8e55613e	raft: replication test: elect first leader in raft_cluster Run first leader election inside raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	322802308c	raft: replication test: use id 0 for rpc tests raft_cluster at the moment only allows sequential 0 based ids. The code was generating ids over this and causing problems for code changes. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	c1a6e81002	raft: replication test: fix partition wait log When partitioning, don't wait_log on servers outside configuration. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	6db730c500	raft: replication test: partition helper Add a partition handling helper to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	848c244932	raft: replication test: track in_configuration in raft_cluster Keep track of servers in configuration inside raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	16728b8966	raft: replication test: use cluster saved apply function Use apply function saved in cluster at creation time. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	3daed889b8	raft: replication test: change_configuration in raft_cluster Move change_configuration to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	102b8e71bb	raft: replication test: free_election in raft_cluster Move free_election to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	60d4d06861	raft: replication test: wait_log_all in raft_cluster Move wait_log_all to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	d1ba0fe719	raft: replication test: wait_log in raft_cluster Move wait_log to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	3e4871b884	raft: replication test: elect_new_leader in raft_cluster Move elect_new_leader to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	59b9642be5	raft: replication test: elapse_election in raft_cluster Move elapse_election to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	b3e2b54913	raft: replication test: move add_entry up Style. Move definition of add_entry and add_remaining_entries with the rest of raft_cluster definitions. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	8cd2abe72b	raft: replication test: remove spurious check Going forward the leader is always in configuration and up to date. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	2d51d1bbc5	raft: replication test: raft_cluster add_entries Move add_entries() to raft_cluster and provide a helper to add remaining entries. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	2a1e7a15a6	raft: replication test: calculate first value helper Helper to calculate what's the value number to be added after snapshot and leader initial log. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	e2f425e210	raft: replication test: initial state helper Move initial_state preparation to its own helper function. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	d2c0308a85	raft: replication test: move declarations up Move declarations near the top of the file for following refactors to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	a3700a6d0a	raft: replication test: move up set_config Move set_config above raft_cluster for a subsequent commit. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	57da05c986	raft: replication test: use disconnect() helper For rpc tests, use raft_cluster::disconnect() instead of the local connected reference. This removes connected object use outside raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	54c919b726	raft: replication test: add connectivity helpers Add connectivity helpers disconnect(server, except) and connect_all() so users of raft_cluster don't need to keep a connectivity object pointer. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	5e324f3438	raft: replication test: rpc with raft_cluster Use raft_cluster for rpc tests. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	752d53a909	raft: replication test: use parallel start/stop Start and stop servers in parallel. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	bcf5181697	raft: replication test: cluster class Use raft_cluster class to handle servers. First part of this change. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	5fc0a1251d	raft: replication test: helper uuid to local id Add a helper to convert from UUID to size_t id. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	7e93501d4c	raft: replication test: use optional Instead of tracking with a boolean use an optional for partition leader. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	ccb85bce02	raft: replication test: wait log on next leader only When there's a defined next leader, only wait for log propagation for this follower. Splits wait_log() into waiting for one follower with wait_log() and waiting for all followers with wait_log_all(). Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	2aa1646e35	raft: replication test: remove wait after adding entries Remove log wait after adding entries. It was added to handle some debug hangs but it is not good for testing. There are already wait logs at proper code locations. (e.g. elect_new_leader, partition) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	0216d0a7b0	raft: replication test: remove unused param elect_new_leader doesn't need to know configuration anymore. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	effcb7c5f6	raft: tests: move conversion helpers to header Move replication test helpers to header. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	7327cbd871	raft: replication test: use structs to avoid alias Use structs for test commands. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Raphael S. Carvalho	a7cdd846da	compaction: Prevent tons of compaction of fully expired sstable from happening in parallel Compaction manager can start tons of compaction of fully expired sstable in parallel, which may consume a significant amount of resources. This problem is caused by weight being released too early in compaction, after data is all compacted but before table is called to update its state, like replacing sstables and so on. Fully expired sstables aren't actually compacted, so the following can happen: - compaction 1 starts for expired sst A with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 2 starts for expired sst B with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 3 starts for expired sst C with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 1 is done updating table state, so it finally completes and releases all the resources. - compaction 2 is done updating table state, so it finally completes and releases all the resources. - compaction 3 is done updating table state, so it finally completes and releases all the resources. This happens because, with expired sstable, compaction will release weight faster than it will update table state, as there's nothing to be compacted. With my reproducer, it's very easy to reach 50 parallel compactions on a single shard, but that number can be easily worse depending on the number of sstables with fully expired data, across all tables. This high parallelism can happen with only a couple of tables, if there are many time windows with expired data, as they can be compacted in parallel. Prior to `55a8b6e3c9`, weight was released earlier in compaction, before last sstable was sealed, but right now, there's no need to release weight earlier. 
Weight can be released in a much simpler way, after the compaction is actually done. So such compactions will be serialized from now on. Fixes #8710. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210527165443.165198-1-raphaelsc@scylladb.com> [avi: drop now unneeded storage_service_for_tests]	2021-05-30 23:22:51 +03:00
Avi Kivity	791412b046	test: user_defined_function_test: raise Lua timeout user_defined_function_test fails sporadically in debug mode due to Lua timeout. Raise the timeout to avoid the failure, but not so much that the test that expects timeout becomes too slow. Fixes #8746. Closes #8747	2021-05-30 13:10:57 +03:00
Piotr Jastrzebski	76d7c761d1	schema: Stop using deprecated constructor This is another boring patch. One of schema constructors has been deprecated for many years now but was used in several places anyway. Usage of this constructor could lead to data corruption when using MX sstables because this constructor does not set schema version. MX reading/writing code depends on schema version. This patch replaces all the places the deprecated constructor is used with schema_builder equivalent. The schema_builder sets the schema version correctly. Fixes #8507 Test: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <4beabc8c942ebf2c1f9b09cfab7668777ce5b384.1622357125.git.piotr@scylladb.com>	2021-05-30 11:58:27 +03:00
Nadav Har'El	1507bbb35a	cql-pytest: increase default server-side timeouts Sometimes the cql-pytest tests run extremely slowly. This can be a combination of running the debug build (which is naturally slow) and a test machine which is overcommitted, or experiencing some transient swap storm or some similar event. We don't want tests, which we run on 100% reliable setups, to fail just because they run into timeouts in Scylla when they run very slowly. We already noticed this problem in the past, and increased the CQL client timeout in conftest.py from the default of 10 seconds to 120 seconds - the old default of 10 seconds was not enough for some long operations (such as creating a table with multiple views) when the test ran very slowly. However, this only fixed the client-side timeout. We also have a bunch of server-side timeouts, configured to all sorts of arbitrary (and fairly small) numbers. For example, the server has a "write request timeout" option, which defaults to just 2 seconds. We recently saw this timeout exceeded in a slow run which tried to do a very large write. So this patch configures all the configurable server-side timeouts we have to default to 300 seconds. This should be more than enough for even the slowest runs (famous last words...). This default is not a good idea on real multi-node clusters which are expected to deal with node loss, but this is not the case in cql-pytest. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210529213648.856503-1-nyh@scylladb.com>	2021-05-30 01:20:14 +03:00
Avi Kivity	d3e5b37059	Revert "Merge 'Commitlog: Handle disk usage and disk footprint discrepancies, ensuring we flush when needed' from Calle Wilund" This reverts commit `e9c940dbbc`, reversing changes made to `6144656b25`. Since it was merged commitlog_test consistently times out in debug mode.	2021-05-27 21:16:26 +03:00
Wojciech Mitros	725c6aac81	test/perf: close test_env to pass an assert in sstables_manager destructor When destroying a perf_sstable_test_env, an assert in sstables_manager destructor fails, because it hasn't been closed. Fix by removing all references to sstables from perf_sstable_test_env, and then closing the test_env (as well as the sstables_manager). Fixes #8736 Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com> Closes #8737	2021-05-27 17:41:17 +03:00
Michał Chojnowski	5e9f741bb4	repair: remove range_split.hh Dead code since `80ebedd242`. Closes #8698	2021-05-27 17:21:37 +03:00
Avi Kivity	5f8484897b	Merge 'cdc: use a new internal table for exchanging generations' from Kamil Braun Reopening #8286 since the token metadata fix that allows `Everywhere` strategy tables to work with RBO (#8536) has been merged. --- Currently when a node wants to create and broadcast a new CDC generation it performs the following steps: 1. choose the generation's stream IDs and mapping (how this is done is irrelevant for the current discussion) 2. choose the generation's timestamp by taking the current time (according to its local clock) and adding 2 * ring_delay 3. insert the generation's data (mapping and stream IDs) into system_distributed.cdc_generation_descriptions, using the generation's timestamp as the partition key (we call this table the "old internal table" below) 4. insert the generation's timestamp into the "CDC_STREAMS_TIMESTAMP" application state. The timestamp spreads epidemically through the gossip protocol. When nodes see the timestamp, they retrieve the generation data from the old internal table. Unfortunately, due to the schema of the old internal table, where the entire generation data is stored in a single cell, step 3 may fail for sufficiently large generations (there is a size threshold for which step 3 will always fail - retrying the operation won't help). Also the old internal table lies in the system_distributed keyspace that uses SimpleStrategy with replication factor 3, which is also problematic; for example, when nodes restart, they must reach at least 2 out of these 3 specific replicas in order to retrieve the current generation (we write and read the generation data with QUORUM, unless we're a single-node cluster, where we use ONE). Until this happens, a restarting node can't coordinate writes to CDC-enabled tables. It would be better if the node could access the last known generation locally. 
The commit introduces a new table for broadcasting generation data with the following properties: - it uses a better schema that stores the data in multiple rows, each of manageable size - it resides in a new keyspace that uses EverywhereStrategy so the data will be written to every node in the cluster that has a token in the token ring - the data will be written using CL=ALL and read using CL=ONE; thanks to this, a restarting node won't have to communicate with other nodes to retrieve the data of the last known generation. Note that writing with CL=ALL does not reduce availability: creating a new generation requires all nodes to be available anyway, because they must learn about the generation before their clocks go past the generation's timestamp; if they don't, partitions won't be mapped to stream IDs consistently across the cluster - the partition key is no longer the generation's timestamp. Because it was that way in the old internal table, it forced the algorithm to choose the timestamp before the generation data was inserted into the table. What if the inserting took a long time? It increased the chance that nodes would learn about the generation too late (after their clocks moved past its timestamp). With the new schema we will first insert the generation data using a randomly generated UUID as the partition key, then choose the timestamp, then gossip both the timestamp and the UUID. Observe that after a node learns through gossip about a generation broadcast using this new method, it will retrieve its data very quickly since it's one of the replicas and it can use CL=ONE as it was written using CL=ALL. The generation's timestamp and the UUID mentioned in the last point form a "generation identifier" for this new generation. For passing these new identifiers around, we introduce the cdc::generation_id_v2 type. Fixes #7961. 
--- For optimal review experience it is best to first read the updated design notes (you can read them rendered here: https://github.com/kbr-/scylla/blob/cdc-gen-table/docs/design-notes/cdc.md), specifically the ["Generation switching"](https://github.com/kbr-/scylla/blob/cdc-gen-table/docs/design-notes/cdc.md#generation-switching) section followed by the ["Internal generation descriptions table V1 and upgrade procedure"](https://github.com/kbr-/scylla/blob/cdc-gen-table/docs/design-notes/cdc.md#internal-generation-descriptions-table-v1-and-upgrade-procedure) section, then read the commits in topological order. dtest gating run (dev): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/1160/ unit tests (dev) passed locally Closes #8643 * github.com:scylladb/scylla: docs: update cdc.md with info about the new internal table sys_dist_ks: don't create old CDC generations table on service initialization sys_dist_ks: rename all_tables() to ensured_tables() cdc: when creating new generations, use format v2 if possible main: pass feature_service to cdc::generation_service gms: introduce CDC_GENERATIONS_V2 feature cdc: introduce retrieve_generation_data test: cdc: include new generations table in permissions test sys_dist_ks: increase timeout for create_cdc_desc sys_dist_ks: new table for exchanging CDC generations tree-wide: introduce cdc::generation_id_v2	2021-05-27 17:13:44 +03:00
Avi Kivity	e8e4456ec7	Merge 'Introduce per-service-level workload types and their first use-case - shedding in interactive workloads' from Piotr Sarna This draft extends and obsoletes #8123 by introducing a way of determining the workload type from service level parameters, and then using this context to qualify requests for shedding. The rough idea is that when the admission queue in the CQL server is hit, it might make more sense to start shedding surplus requests instead of accumulating them on the semaphore. The assumption is that interactive workloads are more interested in the success rate of as many requests as possible, and that hanging on a semaphore reduces the chances for a request to succeed. Thus, it may make sense to shed some requests to reduce the load on this coordinator and let the existing requests finish. It's a draft, because I only performed local guided tests. #8123 was followed by some experiments on a multinode cluster which I want to rerun first. Closes #8680 * github.com:scylladb/scylla: test: add a case for conflicting workload types cql-pytest: add basic tests for service level workload types docs: describe workload types for service levels sys_dist_ks: fix redundant parsing in get_service_level sys_dist_ks: make get_service_level exception-safe transport: start shedding requests during potential overload client_state: hook workload type from service levels cql3: add listing service level workload type cql3: add persisting service level workload type qos: add workload_type service level parameter	2021-05-27 17:01:56 +03:00
Konstantin Osipov	52f7ff4ee4	raft: (testing) update copyright An incorrect copyright information was copy-pasted from another test file. Message-Id: <20210525183919.1395607-1-kostja@scylladb.com>	2021-05-27 15:47:49 +03:00
Piotr Sarna	99f356d764	test: add a case for conflicting workload types The test case verifies that if several workload types are effective for a single role, the conflict resolution is well defined.	2021-05-27 14:31:36 +02:00
Piotr Sarna	01b7e445f9	cql-pytest: add basic tests for service level workload types The test cases check whether it's possible to declare workload type for a service level and if its input is validated.	2021-05-27 14:31:36 +02:00
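The weight-release race described in commit a7cdd846da can be illustrated with a small simulation. This is a minimal sketch, not Scylla's compaction manager. It makes a hypothetical simplification: a compaction's weight behaves like a per-weight lock, so at most one compaction holding weight W may run at a time. The table-state update is modeled as a short sleep. `WeightRegistry`, `compact_expired` and `run` are illustrative names only.

```python
import asyncio


class WeightRegistry:
    """Simplified, hypothetical model: one lock per weight, so at most one
    compaction may hold a given weight at any time."""
    def __init__(self):
        self._locks = {}

    def lock_for(self, weight):
        return self._locks.setdefault(weight, asyncio.Lock())


async def compact_expired(registry, weight, stats, release_early):
    lock = registry.lock_for(weight)
    await lock.acquire()
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    try:
        # Fully expired sstable: nothing to rewrite, so the data phase is instant.
        if release_early:
            lock.release()           # pre-fix ordering: weight freed before the table update
        await asyncio.sleep(0.01)    # stands in for updating table state
    finally:
        if not release_early:
            lock.release()           # post-fix ordering: weight freed once fully done
        stats["in_flight"] -= 1


async def run(n, release_early):
    """Start n expired-sstable compactions with the same weight; return peak concurrency."""
    registry = WeightRegistry()
    stats = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(compact_expired(registry, "W", stats, release_early)
                           for _ in range(n)))
    return stats["peak"]
```

With `release_early=True` (the pre-fix ordering), each compaction gets past the weight check while earlier ones are still updating table state. The peak concurrency therefore equals the number of compactions started. Releasing the weight only after completion serializes them, which matches the commit's stated outcome.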
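The ordering change in the CDC generations merge (5f8484897b) can be sketched in a similar way. In the old scheme, the timestamp must be chosen before the possibly slow write. In the new scheme, it is chosen after the write. Everything below is hypothetical:

- The table and the gossip state are plain dicts.
- The write cost is simulated by advancing a fake clock.
- `GenerationIdV2` only mirrors the idea of `cdc::generation_id_v2` (a timestamp plus a UUID).
- The application-state key `CDC_GENERATION_ID_V2` is made up.
- The new scheme is assumed to still add 2 * ring_delay to the current time.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationIdV2:
    """Illustrative stand-in for cdc::generation_id_v2: timestamp plus UUID."""
    ts: float
    gen_uuid: uuid.UUID


class FakeClock:
    """Manually advanced clock, so the effect of a slow write is deterministic."""
    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


class SlowTable(dict):
    """Dict whose writes advance the fake clock, mimicking a slow insert."""
    def __init__(self, clock, cost):
        super().__init__()
        self.clock = clock
        self.cost = cost

    def __setitem__(self, key, value):
        self.clock.t += self.cost
        super().__setitem__(key, value)


def create_generation_v1(table, gossip, data, ring_delay, now=time.time):
    # Old scheme: the timestamp is the partition key, so it must be chosen first.
    ts = now() + 2 * ring_delay
    table[ts] = data                        # a slow write eats into the margin
    gossip["CDC_STREAMS_TIMESTAMP"] = ts
    return ts


def create_generation_v2(table, gossip, data, ring_delay, now=time.time):
    # New scheme: write under a random UUID first, then choose the timestamp.
    key = uuid.uuid4()
    table[key] = data
    gen_id = GenerationIdV2(now() + 2 * ring_delay, key)
    gossip["CDC_GENERATION_ID_V2"] = gen_id  # hypothetical application-state key
    return gen_id
```

Consider `SlowTable(clock, 50.0)` with `ring_delay=30.0`. The v1 path leaves only 10 time units (60 - 50) between gossiping the generation and reaching its timestamp. The v2 path keeps the full 60.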

1739 Commits