scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-08 16:03:20 +00:00

Author	SHA1	Message	Date
Avi Kivity	d2b1d2f695	compaction_manager: reindent postponed_compactions_reevaluation()	2022-12-05 22:02:27 +02:00
Avi Kivity	1669025736	compaction_manager: coroutinize postponed_compactions_reevaluation() So much nicer.	2022-12-05 22:01:41 +02:00
Avi Kivity	d2c44cba77	compaction_manager: make postponed_compactions_reevaluation() return a future postponed_compactions_reevaluation() runs until compaction_manager is stopped, checking if it needs to launch new compactions. Make it return a future instead of stashing its completion somewhere. This makes is easier to convert it to a coroutine.	2022-12-05 21:58:48 +02:00
Botond Dénes	3d620378d4	Merge 'view: coroutinize maybe_mark_view_as_built' from Avi Kivity Simplifying it a little. Closes #12171 * github.com:scylladb/scylladb: view: reindent maybe_mark_view_as_built view: coroutinize maybe_mark_view_as_built	2022-12-05 13:43:34 +02:00
Asias He	c6087cf3a0	repair: Reduce repair reader eviction with diff shard count When repair master and followers have different shard count, the repair followers need to create multi-shard readers. Each multi-shard reader will create one local reader on each shard, N (smp::count) local readers in total. There is a hard limit on the number of readers who can work in parallel. When there are more readers than this limit. The readers will start to evict each other, causing buffers already read from disk to be dropped and recreating of readers, which is not very efficient. To optimize and reduce reader eviction overhead, a global reader permit is introduced which considers the multi-shard reader bloats. With this patch, at any point in time, the number of readers created by repair will not exceed the reader limit. Test Results: 1) with stream sem 10, repair global sem 10, 5 ranges in parallel, n1=2 shards, n2=8 shards, memory wanted =1 1.1) [asias@hjpc2 mycluster]$ time nodetool -p 7200 repair ks2 (repair on n2) [2022-11-23 17:45:24,770] Starting repair command #1, repairing 1 ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true) [2022-11-23 17:45:53,869] Repair session 1 [2022-11-23 17:45:53,869] Repair session 1 finished real 0m30.212s user 0m1.680s sys 0m0.222s 1.2) [asias@hjpc2 mycluster]$ time nodetool repair ks2 (repair on n1) [2022-11-23 17:46:07,507] Starting repair command #1, repairing 1 ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true) [2022-11-23 17:46:30,608] Repair session 1 [2022-11-23 17:46:30,608] Repair session 1 finished real 0m24.241s user 0m1.731s sys 0m0.213s 2) with stream sem 10, repair global sem no_limit, 5 ranges in parallel, n1=2 shards, n2=8 shards, memory wanted =1 2.1) [asias@hjpc2 mycluster]$ time nodetool -p 7200 repair ks2 (repair on n2) [2022-11-23 17:49:49,301] Starting repair command #1, repairing 1 ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true) [2022-11-23 17:52:01,414] Repair session 1 [2022-11-23 17:52:01,415] Repair session 1 finished real 2m13.227s user 0m1.752s sys 0m0.218s 2.2) [asias@hjpc2 mycluster]$ time nodetool repair ks2 (repair on n1) [2022-11-23 17:52:19,280] Starting repair command #1, repairing 1 ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true) [2022-11-23 17:52:42,387] Repair session 1 [2022-11-23 17:52:42,387] Repair session 1 finished real 0m24.196s user 0m1.689s sys 0m0.184s Comparing 1.1) and 2.1), it shows the eviction played a major role here. The patch gives 73s / 30s = 2.5X speed up in this setup. Comparing 1.1 and 1.2, it shows even if we limit the readers, starting on the lower shard is faster 30s / 24s = 1.25X (the total number of multishard readers is lower) Fixes #12157 Closes #12158	2022-12-05 10:47:36 +02:00
Botond Dénes	1e20095547	Update tools/java submodule * tools/java 1c06006447...ecab7cf7d6 (1): > Add VSCode files to gitignore	2022-12-05 09:54:51 +02:00
Botond Dénes	c4d72c8dd0	Merge 'cql3: select_statement: split and coroutinize process_results()' from Avi Kivity Split the simple (and common) case from the complex case, and coroutinize the latter. Hopefully this generates better code for the simple case, and it makes the complex case a little nicer. Closes #12194 * github.com:scylladb/scylladb: cql3: select_statement: reindent process_results_complex() cql3: select_statement: coroutinize process_results_complex() cql3: select_statement: split process_results() into fast path and complex path	2022-12-05 08:16:22 +02:00
Avi Kivity	a0a4711b74	snapshot: protect list operations against the lambda coroutine fiasco run_snapshot_list_operation() takes a continuation, so passing it a lambda coroutine without protection is dangerous. Protect the coroutine with coroutine::lambda so it doesn't lost its contents. Fixes #12192. Closes #12193	2022-12-05 08:14:39 +02:00
guy9	cb842b2729	Replacing the Docs top bar message from the LIVE event to the community forum announcement Closes #12189	2022-12-05 08:05:04 +02:00
Avi Kivity	0834bb0365	cql3: select_statement: reindent process_results_complex()	2022-12-04 21:36:17 +02:00
Avi Kivity	a63f98e3fc	cql3: select_statement: coroutinize process_results_complex() Not a huge gain, since it's just a do_with, but still a little better. Note the inner lambda is not a coroutine, so isn't susceptibe to the lambda coroutine fiasco.	2022-12-04 21:34:51 +02:00
Avi Kivity	7f29efa0ad	cql3: select_statement: split process_results() into fast path and complex path This will allow us to coroutinize the complex path without adding an allocation to the fast path.	2022-12-04 21:30:45 +02:00
Avi Kivity	02b66bb31a	Merge 'Mark sstable::<directory accessing methods> private' from Pavel Emelyanov One of the prerequisites to make sstables reside on object-storage is not to let the rest of the code "know" the filesystem path they are located on (because sometimes they will not be on any filesystem path). This patch makes the methods that can reveal this path back private so that later they can be abstracted out. Closes #12182 * github.com:scylladb/scylladb: sstable: Mark some methods private test: Don't get sstable dir when known test: Use move_to_quarantine() helper test: Use sstable::filename() overload without dir name sstables: Reimplement batch directory sync after move table, tests: Make use of move_to_new_dir() default arg sstables: Remove fsync_directory() helper table: Simplify take_snapshot()'s collecting sstables names	2022-12-04 17:45:37 +02:00
Kamil Braun	b551cd254c	test: test_raft_upgrade: fix test_recover_stuck_raft_upgrade flakiness The test enables an error injection inside the Raft upgrade procedure on one of the nodes which will cause the node to throw an exception before entering `synchronize` state. Then it restarts other nodes with Raft enabled, waits until they enter `synchronize` state, puts them in RECOVERY mode, removes the error-injected node and creates a new Raft group 0. As soon as the other nodes enter `synchronize`, the test disabled the error injection (the rest of the test was outside the `async with inject_error(...)` block). There was a small chance that we disabled the error injection before the node reached it. In that case the node also entered `synchronize` and the cluster managed to finish the upgrade procedure. We encountered this during next promotion. Eliminate this possibility by extending the scope of the `async with inject_error(...)` block, so that the RECOVERY mode steps on the other nodes are performed within that block. Closes #12162	2022-12-02 21:26:44 +01:00
Avi Kivity	94f18b5580	test: sstable_conforms_to_mutation_source: use do_with_async() where needed The test clearly needs a thread (it converts a reader to a mutation without waiting), so give it one. Closes #12178	2022-12-02 20:48:37 +01:00
Pavel Emelyanov	084522d9eb	sstable: Mark some methods private There are several class sstable methods that reveal internal directory path to caller. It's not object-storage-friendly. Fortunately, all the callers of those methods had been patched not to work with full paths, so these can be marked private. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:15:02 +03:00
Pavel Emelyanov	fb63850f2c	test: Don't get sstable dir when known The sstable_move_test creates sstables in its own temp directories and the requests these dirs' paths back from sstables. Test can come with the paths it has at hand, no need to call sstables for it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:13:58 +03:00
Pavel Emelyanov	4c742a658d	test: Use move_to_quarantine() helper Two places in tests move sstable to quarantine subdir by hand. There's the class sstable method that does the same, so use it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:13:19 +03:00
Pavel Emelyanov	d6244b7408	test: Use sstable::filename() overload without dir name The dir this place currently uses is the directory where the sstable was created, so dropping this argument would just render the same path. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:12:21 +03:00
Pavel Emelyanov	a702affd4d	sstables: Reimplement batch directory sync after move There's a table::move_sstables_from_staging() method that gets a bunch of sstables and moves them from staging subdit into table's root datadir. Not to flush the root dir for every sstable move, it asks the sstable::move_to_new_dir() not to flush, but collects staging dir names and flushes them and the root dir at the end altothether. In order to make it more friendly to object-storage and to remove one more caller of sstable::get_dir() the delayed_commit_changes struct is introduced. It collects _all_ the affected dir names in unordered_set, then allows flushing them. By default the move_to_new_dir() doesn't receive this object and flushes the directories instantly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:08:47 +03:00
Pavel Emelyanov	1b42d5fce3	table, tests: Make use of move_to_new_dir() default arg The method in question accepts boolean bit whether or not it should sync directories at the end. It's always true but in one case, so there's the default value for it. Make use of it. Anticipating the suggestion to replace bool with bool_class -- next patch will replace it with something else. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:07:16 +03:00
Pavel Emelyanov	339feb4205	sstables: Remove fsync_directory() helper The one effectively wraps existing seastar sync_directory() helper into two io_check-s. It's simpler just to call the latter directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:05:43 +03:00
Pavel Emelyanov	80f5d7393f	table: Simplify take_snapshot()'s collecting sstables names The method in question "snapshots" all sstables it can find, then writes their Datafile names into the manifest file. To get the list of file names it iterates over sstables list again and does silly conversion of full file path to file name with the help of the directory path length. This all can be made much simpler if just collecting component names directly at the time sstable is hardlinked. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:02:37 +03:00
Raphael S. Carvalho	d61b4f9dfb	compaction_manager: Delete compaction_state's move constructor compaction_state shouldn't be moved once emplaced. moving it could theoretically cause task's gate holder to have a dangling pointer to compaction_state's gate, but turns out gate's move ctor will actually fail under this assertion: assert(!_count && "gate reassigned with outstanding requests"); Cannot happen today, but let's make it more future proof. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12167	2022-12-02 20:56:57 +03:00
Tomasz Grabiec	1a6bf2e9ca	Merge 'service/raft: specialized verb for failure detector pinger' from Kamil Braun We used GOSSIP_ECHO verb to perform failure detection. Now we use a special verb DIRECT_FD_PING introduced for this purpose. There are multiple reasons to do so. One minor reason: we want to use the same connection as other Raft verbs: if we can't deliver Raft append_entries or vote messages somewhere, that endpoint should be marked dead; if we can, the endpoint should be marked alive. So putting pings on the same connection as the other Raft verbs is important when dealing with weird situations where some connections are available but others are not. Observe that in `do_get_rpc_client_idx`, we put the new verb in the right place. Another minor reason: we remove the awkward gossiper `echo_pinger` abstraction which required storing and updating gossiper generation numbers. This also removes one dependency from Raft service code to gossiper. Major reason 1: the gossip echo handler has a weird mechanism where a replacing node returns errors during the replace operation to some of the nodes. In Raft however, we want to mark servers as alive when they are alive, including a server running on a node that's replacing another node. Major reason 2, related to the previous one: when server B is replacing server A with the same IP, the failure detector will try to ping both servers. Both servers are mapped to the same IP by the address map, so pings to both servers will reach server B. We want server B to respond to the pings destined for server B, but not to pings destined for server A, so the sender can mark B alive but keep A marked dead. To do this, we include the destination's Raft ID in our RPCs. The destination compares the received ID with its own. If it's different, it returns a `wrong_destination` response, and the failure detector knows that the ping did not reach the destination (it reached someone else). Yet another reason: removes "Not ready to respond gossip echo message" log spam during replace. Closes #12107 * github.com:scylladb/scylladb: service/raft: specialized verb for failure detector pinger db: system_keyspace: de-staticize `{get,set}_raft_server_id` service/raft: make this node's Raft ID available early in group registry	2022-12-02 13:54:02 +01:00
Pavel Emelyanov	71179ff5ab	distributed_loader: Use coroutine::lambda in sleeping coroutine According to seastar/doc/lambda-coroutine-fiasco.md lambda that co_awaits once loses its capture frame. In distrobuted_loader code there's at least one of that kind. fixes: #12175 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12170	2022-12-02 13:06:33 +02:00
Pavel Emelyanov	1d91914166	sstables: Drop set_generation() method The method became unused since `70e5252a` (table: no longer accept online loading of SSTable files in the main directory) and the whole concept of reshuffling sstables was dropped later by `7351db7c` (Reshape upload files and reshard+reshape at boot). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12165	2022-12-01 22:17:10 +02:00
Avi Kivity	2978052113	view: reindent maybe_mark_view_as_built Several identation levels were harmed during the preparation of this patch.	2022-12-01 22:09:21 +02:00
Avi Kivity	ac2e2f8883	view: coroutinize maybe_mark_view_as_built Somewhat simplifies complicated logic.	2022-12-01 22:04:51 +02:00
Kamil Braun	cbdcc944b5	service/raft: specialized verb for failure detector pinger We used GOSSIP_ECHO verb to perform failure detection. Now we use a special verb DIRECT_FD_PING introduced for this purpose. There are multiple reasons to do so. One minor reason: we want to use the same connection as other Raft verbs: if we can't deliver Raft append_entries or vote messages somewhere, that endpoint should be marked dead; if we can, the endpoint should be marked alive. So putting pings on the same connection as the other Raft verbs is important when dealing with weird situations where some connections are available but others are not. Observe that in `do_get_rpc_client_idx`, we put the new verb in the right place. Another minor reason: we remove the awkward gossiper `echo_pinger` abstraction which required storing and updating gossiper generation numbers. This also removes one dependency from Raft service code to gossiper. Major reason 1: the gossip echo handler has a weird mechanism where a replacing node returns errors during the replace operation to some of the nodes. In Raft however, we want to mark servers as alive when they are alive, including a server running on a node that's replacing another node. Major reason 2, related to the previous one: when server B is replacing server A with the same IP, the failure detector will try to ping both servers. Both servers are mapped to the same IP by the address map, so pings to both servers will reach server B. We want server B to respond to the pings destined for server B, but not to pings destined for server A, so the sender can mark B alive but keep A marked dead. To do this, we include the destination's Raft ID in our RPCs. The destination compares the received ID with its own. If it's different, it returns a `wrong_destination` response, and the failure detector knows that the ping did not reach the destination (it reached someone else). Yet another reason: removes "Not ready to respond gossip echo message" log spam during replace.	2022-12-01 20:54:18 +01:00
Kamil Braun	02c64becdc	db: system_keyspace: de-staticize `{get,set}_raft_server_id` Part of the anti-globals war.	2022-12-01 20:54:18 +01:00
Kamil Braun	99fe580068	service/raft: make this node's Raft ID available early in group registry Raft ID was loaded or created late in the boot procedure, in `storage_service::join_token_ring`. Create it earlier, as soon as it's possible (when `system_keyspace` is started), pass it to `raft_group_registry::start` and store it inside `raft_group_registry`. We will use this Raft ID stored in group registry in following patches. Also this reduces the number of disk accesses for this node's Raft ID. It's now loaded from disk once, stored in `raft_group_registry`, then obtained from there when needed. This moves `raft_group_registry::start` a bit later in the startup procedure - after `system_keyspace` is started - but it doesn't make a difference.	2022-12-01 20:54:18 +01:00
Nadav Har'El	6fcb5302a6	alternator-test: xfail a flaky test exposing a known bug In a recent commit `757d2a4`, we removed the "xfail" mark from the test test_manual_requests.py::test_too_large_request_content_length because it started to pass on more modern versions of Python, with a urllib3 bug fixed. Unfortunately, the celebration was premature: It turns out that although the test now usually passes, it sometimes fails. This is caused by a Seastar bug scylladb/seastar#1325, which I opened #12166 to track in this project. So unfortunately we need to add the "xfail" mark back to this test. Note that although the test will now be marked "xfail", it will actually pass most of the time, so will appear as "xpass" to people run it. I put a note in the xfail reason string as a reminder why this is happening. Fixes #12143 Refs #12166 Refs scylladb/seastar#1325 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12169	2022-12-01 20:00:46 +02:00
Kamil Braun	3cd035d1b9	test/pylib: scylla_cluster: remove `ScyllaCluster.decommissioned` field The field was not used for anything. We can keep decommissioned server in `stopped` field. In fact it caused us a problem: since recently, we're using `ScyllaCluster.uninstall` to clean-up servers after test suite finishes (previously we were using `ScyllaServer.uninstall` directly). But `ScyllaCluster.uninstall` didn't look into the `decommissioned` field, so if a server got decommissioned, we wouldn't uninstall it, and it left us some unnecessary artifacts even for successful tests. This is now fixed. Closes #12163	2022-12-01 19:07:26 +02:00
Avi Kivity	a4b77a5691	Merge 'Cleanup sstables::test_env's manager usage' from Pavel Emelyanov Mainly this PR removes global db::config and feature service that are used by sstables::test_env as dependencies for embedded sstables_manager. Other than that -- drop unused methods, remove nested test_env-s and relax few cases that use two temp dirs at a time for no gain. Closes #12155 * github.com:scylladb/scylladb: test, utils: Use only one tempdir sstable_compaction_test: Dont create nested envs mutation_reader_test: Remove unused create_sstable() helper tests, lib: Move globals onto sstables::test_env tests: Use sstables::test_env.db_config() to access config features: Mark feature_config_from_db_config const sstable_3_x_test: Use env method to create sst sstable_3_x_test: Indentation fix after previous patch sstable_3_x_test: Use sstable::test_env test: Add config to sstable::test_env creation config: Add constexpr value for default murmur ignore bits	2022-12-01 17:47:25 +02:00
Pavel Emelyanov	4c6bfc078d	code: Use http::re(quest\|ply) instead of httpd:: ones Recent seastar update deprecated those from httpd namespace. fixes: #12142 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12161	2022-12-01 17:33:35 +02:00
Pavel Emelyanov	adc6ee7ea8	test, utils: Use only one tempdir There's a do_with_cloned_tmp_directory that makes two temp dirs to toss sstables between them. Make it go with just one, all the more so it would resemble existing manipulations aroung staging/ subdir Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:57 +03:00
Pavel Emelyanov	15a7b9cafa	sstable_compaction_test: Dont create nested envs The "compact" test case runs in sstables::test_env and additionally wraps it with another instance provided by do_with_tmp_directory helper. It's simpler to create the temp dir by hand and use outter env. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:56 +03:00
Pavel Emelyanov	69fe5fd054	mutation_reader_test: Remove unused create_sstable() helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:54 +03:00
Pavel Emelyanov	400bc2c11d	tests, lib: Move globals onto sstables::test_env There's a bunch of objects that are used by test_env as sstables_manager dependencies. Now when no other code needs those globals they better sit on the test_env next to the manager Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:36 +03:00
Pavel Emelyanov	6a294b9ad6	tests: Use sstables::test_env.db_config() to access config Currently some places use global test config, but it's going to be removed soon, so switch to using config from environment Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:30 +03:00
Pavel Emelyanov	b4e31ad359	features: Mark feature_config_from_db_config const It's in fact such. Other than that, next patch will call it with const config at hand and fail to compile without this fix Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:27 +03:00
Pavel Emelyanov	8178845ef3	sstable_3_x_test: Use env method to create sst Just to make it shorter and conform to other sst env tests Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:19 +03:00
Pavel Emelyanov	8d5d05012e	sstable_3_x_test: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:09 +03:00
Pavel Emelyanov	6628d801f2	sstable_3_x_test: Use sstable::test_env There are several cases there that construct sstables_manager by hand with the help of a bunch of global dependencies. It's nicer to use existing wrapper. (indentation left broken until next patch) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:38:46 +03:00
Pavel Emelyanov	1d8c76164f	test: Add config to sstable::test_env creation To make callers (tests) construct it with different options. In particular, one test will soon want to construct it with custom large data handler of its own. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:38:18 +03:00
Pavel Emelyanov	6d0c8fb6e2	config: Add constexpr value for default murmur ignore bits ... and use in some places of sstable_compaction_test. This will allow getting rid of global test_db_config thing later Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:38:15 +03:00
Botond Dénes	dbd00fd3e9	Merge 'Task manager shard repair tasks' from Aleksandra Martyniuk The PR introduces shard_repair_task_impl which represents a repair task that spans over a single shard repair. repair_info is replaced with shard_repair_task_impl, since both serve similar purpose. Closes #12066 * github.com:scylladb/scylladb: repair: reindent repair: replace repair_info with shard_repair_task_impl repair: move repair_info methods to shard_repair_task_impl repair: rename methods of repair_module repair: change type of repair_module::_repairs repair: keep a reference to shard_repair_task_impl in row_level_repair repair: move repair_range method to shard_repair_task_impl repair: make do_repair_ranges a method of shard_repair_task_impl repair: copy repair_info methods to shard_repair_task_impl repair: corutinize shard task creation repair: define run for shard_repair_task_impl repair: add shard_repair_task_impl	2022-12-01 10:04:31 +02:00
Kamil Braun	0f9d0dd86e	Merge 'raft: support IP address change' from Konstantin Osipov This is the core of dynamic IP address support in Raft, moving out the IP address sourcing from Raft Group 0 configuration to gossip. At start of Raft, the raft id <> IP address translation map is tuned into the gossiper notifications and learns IP addresses of Raft hosts from them. The series intentionally doesn't contain the part which speeds up the initial cluster assembly by persisting the translation cache and using more sources besides gossip (discovery, RPC) to show correctness of the approach. Closes #12035 * github.com:scylladb/scylladb: raft: (rpc) do not throw in case of a missing IP address in RPC raft: (address map) actively maintain ip <-> raft server id map	2022-11-30 15:40:18 +01:00
Aleksandra Martyniuk	78a6193c01	repair: reindent	2022-11-30 13:53:52 +01:00

1 2 3 4 5 ...

34101 Commits