scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 08:23:29 +00:00

Author	SHA1	Message	Date
Alejo Sanchez	52188016af	raft: replication test: create_server in raft_cluster Remove the global create_raft_server() and replace with a create_server() helper in replication_test(). This will allow not requiring the user of raft_cluster to create special objects. Note this does not move(apply) anymore as it's kept in raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 23:47:02 -04:00
Alejo Sanchez	1edcb6e647	raft: replication test: reset snapshots When stopping a server also delete snapshots and persisted snapshots. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 23:46:11 -04:00
Alejo Sanchez	453f19cf0e	raft: replication test: reset server helper Add a helper to reset a server in raft_cluster. Besides simplifying code and preventing errors, this will help move create_raft_server logic to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	d3b7f21b88	raft: replication test: pause tickers before stopping Pause tickers before stopping servers. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	30c9daafd2	raft: replication test: tick helper Move test tick handling to raft_cluster as helper method. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	2e61c507d2	raft: replication test: tickers on raft_cluster Move tickers to raft_cluster helper class. Ticker initialization and pause is done automatically at start_all() and stop_all(). Add temporary helpers to manage specific tickers. These might be removed later once proper node abort and reset are implemented. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	aea77871c4	raft: replication test: cluster tracking leader Track current leader inside helper class. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	ca8e55613e	raft: replication test: elect first leader in raft_cluster Run first leader election inside raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	322802308c	raft: replication test: use id 0 for rpc tests raft_cluster at the moment only allows sequential 0 based ids. The code was generating ids over this and causing problems for code changes. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	c1a6e81002	raft: replication test: fix partition wait log When partitioning, don't wait_log on servers outside configuration. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:20 -04:00
Alejo Sanchez	6db730c500	raft: replication test: partition helper Add a partition handling helper to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	848c244932	raft: replication test: track in_configuration in raft_cluster Keep track of servers in configuration inside raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	16728b8966	raft: replication test: use cluster saved apply function Use apply function saved in cluster at creation time. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	3daed889b8	raft: replication test: change_configuration in raft_cluster Move change_configuration to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	102b8e71bb	raft: replication test: free_election in raft_cluster Move free_election to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	60d4d06861	raft: replication test: wait_log_all in raft_cluster Move wait_log_all to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	d1ba0fe719	raft: replication test: wait_log in raft_cluster Move wait_log to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	3e4871b884	raft: replication test: elect_new_leader in raft_cluster Move elect_new_leader to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	59b9642be5	raft: replication test: elapse_election in raft_cluster Move elapse_election to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	b3e2b54913	raft: replication test: move add_entry up Style. Move definition of add_entry and add_remaining_entries with the rest of raft_cluster definitions. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	8cd2abe72b	raft: replication test: remove spurious check Going forward the leader is always in configuration and up to date. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	2d51d1bbc5	raft: replication test: raft_cluster add_entries Move add_entries() to raft_cluster and provide a helper to add remaining entries. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	2a1e7a15a6	raft: replication test: calculate first value helper Helper to calculate what's the value number to be added after snapshot and leader initial log. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	e2f425e210	raft: replication test: initial state helper Move initial_state preparation to its own helper function. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	d2c0308a85	raft: replication test: move declarations up Move declarations near the top of the file for following refactors to raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	a3700a6d0a	raft: replication test: move up set_config Move set_config above raft_cluster for a subsequent commit. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	57da05c986	raft: replication test: use disconnect() helper For rpc tests, use raft_cluster::disconnect() instead of the local connected reference. This removes connected object use outside raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	54c919b726	raft: replication test: add connectivity helpers Add connectivity helpers disconnect(server, except) and connect_all() so users of raft_cluster don't need to keep a connectivity object pointer. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	5e324f3438	raft: replication test: rpc with raft_cluster Use raft_cluster for rpc tests. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	752d53a909	raft: replication test: use parallel start/stop Start and stop servers in parallel. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	bcf5181697	raft: replication test: cluster class Use raft_cluster class to handle servers. First part of this change. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	5fc0a1251d	raft: replication test: helper uuid to local id Add a helper to convert from UUID to size_t id. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	7e93501d4c	raft: replication test: use optional Instead of tracking with a boolean use an optional for partition leader. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	ccb85bce02	raft: replication test: wait log on next leader only When there's a defined next leader, only wait for log propagation for this follower. Splits wait_log() into waiting for one follower with wait_log() and waiting for all followers with wait_log_all(). Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	2aa1646e35	raft: replication test: remove wait after adding entries Remove log wait after adding entries. It was added to handle some debug hangs but it is not good for testing. There are already wait logs at proper code locations. (e.g. elect_new_leader, partition) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	0216d0a7b0	raft: replication test: remove unused param elect_new_leader doesn't need to know configuration anymore. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	effcb7c5f6	raft: tests: move conversion helpers to header Move replication test helpers to header. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Alejo Sanchez	7327cbd871	raft: replication test: use structs to avoid alias Use structs for test commands. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-01 21:50:19 -04:00
Piotr Dulikowski	b0c22f2e39	repair: trigger repair abort_source only from shard 0 When user requests repair to be forcefully aborted, the `_abort_all_as` abort source could be modified from multiple shards in parallel by the `tracker::abort_all_repairs()` function, which can lead to undefined behavior and to a crash. This commit makes sure that `_abort_all_as` is used only from shard 0 when repair is aborted. Fixes #8693 Closes #8734	2021-05-31 15:57:31 +03:00
Avi Kivity	e96ff3d82d	dist: add new docker building process The new process has the following differences from the Dockerfile based image: - Using buildah commands instead of a Dockerfile. This is more flexible since we don't need to pack everything into a "build context" and transfer it to the container; instead we interact with the container as we build it. - Using packages instead of a remote yum repository. This makes it easy to create an image in one step (no need to create a repository, promote, then download the packages back via yum). It means that the image cannot be upgraded via yum, but container images are usually just replaced with a new version. - Build output is an OCI archive (e.g. a tarball), not a docker image in a local repository. This means the build process can later be integrated into ninja, since the artifact is just a file. The file can be uploaded into a repository or made available locally with skopeo. - Any build mode is supported, not just release. This can be used for quick(er) testing with dev mode. I plan to integrate it further into the build system, but currently this is blocked on a buildah bug [1]. [1] https://github.com/containers/buildah/issues/3262 Closes #8730	2021-05-31 10:05:22 +03:00
Nadav Har'El	2440569984	secondary index: fix error message which erroneously referred to "map" The value of a frozen collection may only be indexed (using a secondary index) in full - it is not allowed to index only the keys for example - "CREATE INDEX idx ON table (keys(v))" is not allowed. The error message referred to a frozen<map>, but the problem can happen on any frozen collection (e.g., a frozen set), not just a frozen map, so it can be confusing to a user who used a frozen set and got an error about a frozen map. So this patch fixes the error message to refer to a "frozen collection". Note that the Cassandra error message in this case is different - it reads: "Frozen collections are immutable and must be fully indexed". Fixes #8744. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210529094056.825117-1-nyh@scylladb.com>	2021-05-30 23:23:20 +03:00
Botond Dénes	cd6bbd37a4	utils/utf8.c: move includes outside of namespaces Including in the middle of a namespace is not a good practice. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210528142502.962947-1-bdenes@scylladb.com>	2021-05-30 23:23:20 +03:00
Raphael S. Carvalho	a7cdd846da	compaction: Prevent tons of compaction of fully expired sstable from happening in parallel Compaction manager can start tons of compaction of fully expired sstable in parallel, which may consume a significant amount of resources. This problem is caused by weight being released too early in compaction, after data is all compacted but before table is called to update its state, like replacing sstables and so on. Fully expired sstables aren't actually compacted, so the following can happen: - compaction 1 starts for expired sst A with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 2 starts for expired sst B with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 3 starts for expired sst C with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 1 is done updating table state, so it finally completes and releases all the resources. - compaction 2 is done updating table state, so it finally completes and releases all the resources. - compaction 3 is done updating table state, so it finally completes and releases all the resources. This happens because, with expired sstable, compaction will release weight faster than it will update table state, as there's nothing to be compacted. With my reproducer, it's very easy to reach 50 parallel compactions on a single shard, but that number can be easily worse depending on the amount of sstables with fully expired data, across all tables. This high parallelism can happen only with a couple of tables, if there are many time windows with expired data, as they can be compacted in parallel. Prior to `55a8b6e3c9`, weight was released earlier in compaction, before last sstable was sealed, but right now, there's no need to release weight earlier. Weight can be released in a much simpler way, after the compaction is actually done. So such compactions will be serialized from now on. Fixes #8710. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210527165443.165198-1-raphaelsc@scylladb.com> [avi: drop now unneeded storage_service_for_tests]	2021-05-30 23:22:51 +03:00
Benny Halevy	1c0769d789	table: clear: make exception safe It is currently possible that _memtables->add_memtable() will throw after _memtables->clear(), leaving the memtables list completely empty. However, we do rely on always having at least one allocated in the memtables list as active_memtable() references a lw_shared_ptr<memtable> at the back of the memtables vector, and it is expected to always be allocated via add_memtable() upon construction and after clear(). This change moves the implementation of this convention to memtable_list::clear() and makes the latter exception safe by first allocating the to-be-added empty memtable and only then clearing the vector. Refs #8749 Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210530100232.2104051-1-bhalevy@scylladb.com>	2021-05-30 13:22:52 +03:00
Avi Kivity	791412b046	test: user_defined_function_test: raise Lua timeout user_defined_function_test fails sporadically in debug mode due to Lua timeout. Raise the timeout to avoid the failure, but not so much that the test that expects timeout becomes too slow. Fixes #8746. Closes #8747	2021-05-30 13:10:57 +03:00
Piotr Jastrzebski	76d7c761d1	schema: Stop using deprecated constructor This is another boring patch. One of schema constructors has been deprecated for many years now but was used in several places anyway. Usage of this constructor could lead to data corruption when using MX sstables because this constructor does not set schema version. MX reading/writing code depends on schema version. This patch replaces all the places the deprecated constructor is used with schema_builder equivalent. The schema_builder sets the schema version correctly. Fixes #8507 Test: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <4beabc8c942ebf2c1f9b09cfab7668777ce5b384.1622357125.git.piotr@scylladb.com>	2021-05-30 11:58:27 +03:00
Nadav Har'El	1507bbb35a	cql-pytest: increase default server-side timeouts Sometimes the cql-pytest tests run extremely slowly. This can be a combination of running the debug build (which is naturally slow) and a test machine which is overcommitted, or experiencing some transient swap storm or some similar event. We don't want tests, which we run on 100% reliable setups, to fail just because they run into timeouts in Scylla when they run very slowly. We already noticed this problem in the past, and increased the CQL client timeout in conftest.py from the default of 10 seconds to 120 seconds - the old default of 10 seconds was not enough for some long operations (such as creating a table with multiple views) when the test ran very slowly. However, this only fixed the client-side timeout. We also have a bunch of server-side timeouts, configured to all sorts of arbitrary (and fairly small) numbers. For example, the server has a "write request timeout" option, which defaults to just 2 seconds. We recently saw this timeout exceeded in a slow run which tried to do a very large write. So this patch configures all the configurable server-side timeouts we have to default to 300 seconds. This should be more than enough for even the slowest runs (famous last words...). This default is not a good idea on real multi-node clusters which are expected to deal with node loss, but this is not the case in cql-pytest. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210529213648.856503-1-nyh@scylladb.com>	2021-05-30 01:20:14 +03:00
Avi Kivity	d23bebf5c2	Merge "Unexport storage service dependencies" from Pavel E " Right now storage service is used as a "provider" of other services -- database, feature service and tokens. This set unexports the first pair. This drops a bunch of calls for global storage service instances from the places that don't really need it. tests: unit(dev), start-stop " * 'br-pupate-storage-service' of https://github.com/xemul/scylla: storage-service: Don't export features api: Get features from proxy storage-service: Don't export database storage-service: Turn some global helpers into methods storage-service: Open-code simple config getters view: Get database from storage_proxy main: Use local database instance api: Use database from http_ctx	2021-05-29 20:52:47 +03:00
Pavel Emelyanov	598bbfab15	storage-service: Don't export features Now storage service uses the feature service instance internally and doesn't need to provide getter for it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-05-28 18:16:12 +03:00
Pavel Emelyanov	651568318d	api: Get features from proxy The reset_local_schema call needs proxy and feature service to do its job. Right now the features are retrieved from global storage service, but they are present on the proxy as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-05-28 18:15:15 +03:00
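
Commit 1c0769d789 above describes making memtable_list::clear() exception safe by allocating the replacement memtable before emptying the vector. The following is a minimal sketch of that ordering, not ScyllaDB's actual code: the types memtable_sketch and memtable_list_sketch are hypothetical stand-ins, std::shared_ptr is used in place of lw_shared_ptr, and the final step uses a vector swap rather than a literal clear() followed by an add.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a memtable; not ScyllaDB's real type.
struct memtable_sketch {
    int generation;
};

// Hypothetical, simplified list whose invariant is "never empty".
class memtable_list_sketch {
    std::vector<std::shared_ptr<memtable_sketch>> _memtables;
    int _next_generation = 0;

    // Allocation may throw (e.g. std::bad_alloc).
    std::shared_ptr<memtable_sketch> new_memtable() {
        return std::make_shared<memtable_sketch>(memtable_sketch{_next_generation++});
    }

public:
    memtable_list_sketch() { _memtables.push_back(new_memtable()); }

    void add_memtable() { _memtables.push_back(new_memtable()); }

    // Exception-safe clear: everything that can throw happens first,
    // while the current list is still intact.
    void clear() {
        std::vector<std::shared_ptr<memtable_sketch>> fresh;
        fresh.push_back(new_memtable()); // may throw; _memtables untouched
        _memtables.swap(fresh);          // does not throw; old memtables are
                                         // released when fresh goes out of scope
    }

    memtable_sketch& active_memtable() {
        assert(!_memtables.empty()); // the invariant this sketch protects
        return *_memtables.back();
    }

    std::size_t size() const { return _memtables.size(); }
};
```

If the allocation in clear() throws, the exception propagates with the old list unchanged, so active_memtable() remains valid. With the reverse order (empty the vector, then allocate), a failed allocation would leave the list empty, which is the failure mode the commit message describes.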

26744 Commits