scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-23 00:02:37 +00:00

Author	SHA1	Message	Date
Avi Kivity	acf8da2bce	Merge "flat_mutation_reader: keep timeout in permit" from Benny " This series moves the timeout parameter, that is passed to most f_m_r methods, into the reader_permit. This eliminates the need to pass the timeout around, as it's taken from the permit when needed. The permit timeout is updated in certain cases when the permit/reader is paused and retrieved later on for reuse. Following are perf_simple_query results showing ~1% reduction in insns/op and corresponding increase in tps. $ build/release/test/perf/perf_simple_query -c 1 --operations-per-shard 1000000 --task-quota-ms 10 Before: 102500.38 tps ( 75.1 allocs/op, 12.1 tasks/op, 45620 insns/op) After: 103957.53 tps ( 75.1 allocs/op, 12.1 tasks/op, 45372 insns/op) Test: unit(dev) DTest: repair_additional_test.py:RepairAdditionalTest.repair_abort_test (release) materialized_views_test.py:TestMaterializedViews.remove_node_during_mv_insert_3_nodes_test (release) materialized_views_test.py:InterruptBuildProcess.interrupt_build_process_with_resharding_half_to_max_test (release) migration_test.py:TTLWithMigrate.big_table_with_ttls_test (release) " * tag 'reader_permit-timeout-v6' of github.com:bhalevy/scylla: flat_mutation_reader: get rid of timeout parameter reader_concurrency_semaphore: use permit timeout for admission reader_concurrency_semaphore: adjust reactivated reader timeout multishard_mutation_query: create_reader: validate saved reader permit repair: row_level: read_mutation_fragment: set reader timeout flat_mutation_reader: maybe_timed_out: use permit timeout test: sstable_datafile_test: add sstable_reader_with_timeout reader_permit: add timeout member	2021-08-25 17:51:10 +03:00
Gleb Natapov	03a266d73b	raft: make read_barrier work on a follower as well as on a leader This patch implements a Raft extension that allows performing linearisable reads by accessing the local state machine. The extension is described in section 6.4 of the PhD. To sum it up, to perform a read barrier a follower needs to ask a leader for the last committed index that it knows about. The leader must make sure that it is still a leader before answering by communicating with a quorum. When the follower gets the index back, it waits for it to be applied and by that completes the read_barrier invocation. The patch adds three new RPCs: read_barrier, read_barrier_reply and execute_read_barrier_on_leader. The last one is the one a follower uses to ask a leader about a safe index it can read. The first two are used by a leader to communicate with a quorum.	2021-08-25 08:57:13 +03:00
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Nadav Har'El	d598a94b43	Merge: everywhere: mark deferred actions noexcept Merged patch series by Benny Halevy: Prepare for updating seastar submodule to a change that requires deferred actions to be noexcept (and return void). Test: unit(dev, debug) * tag 'deferred_action-noexcept-v1' of github.com:bhalevy/scylla: everywhere: make deferred actions noexcept cql3: prepare_context: mark methods noexcept commitlog: segment, segment_manager: mark methods noexcept everywhere: cleanup defer.hh includes	2021-08-23 11:16:17 +03:00
Avi Kivity	6221b90b89	secondary_index_manager: stop including expression.hh Use a forward declaration of cql3::expr::oper_t to reduce the number of translation units depending on expression.hh. Before: $ find build/dev -name '.d' | xargs cat | grep -c expression.hh 272 After: $ find build/dev -name '.d' | xargs cat | grep -c expression.hh 154 Some translation units adjust their includes to restore access to required headers. Closes #9229	2021-08-22 21:21:46 +03:00
Benny Halevy	4439e5c132	everywhere: cleanup defer.hh includes Get rid of unused includes of seastar/util/{defer,closeable}.hh and add a few that are missing from source files. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:39 +03:00
Pavel Solodovnikov	f98cb96506	raft: raft_sys_table_storage_test: don't use initializer lists inside loops and coroutines Workaround for Clang bug: https://bugs.llvm.org/show_bug.cgi?id=51515 When compiled on aarch64 with ASAN support and -Og/-Oz/-Os optimization level, `raft_sys_table_storage::do_store_log_entries` crashes during the tests. ASAN incorrectly reports `stack-use-after-return` on `std::vector` list initialization after initial coroutine suspension (initializer list's data pointer starts to point to garbage). The workaround is simple: don't use initializer lists in such case and replace with a series of `emplace_back` calls. Tests: unit(debug, aarch64) Fixes #9178 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210818102038.92509-1-pa.solodovnikov@scylladb.com>	2021-08-18 13:32:55 +03:00
Asias He	eaf4d2afb4	storage_service: Generate view update for load and stream Currently, the view will not be updated because the streaming reason is set to streaming::stream_reason::rebuild. On the receiver side, only streaming with the reason streaming::stream_reason::repair will trigger a view update. Change the stream reason to repair to trigger view updates for load and stream. This makes load_and_stream behave the same as nodetool refresh. Note: this is not very efficient, though. Consider RF = 3 and sst1, sst2, sst3 from the older cluster. When sst1 is loaded, it streams to 3 replica nodes; if we generate view updates, we will have 3 view updates for this replica (each of the peer nodes finds its peer and writes the view update to the peer). After loading sst2 and sst3, we will have 9 view updates in total for a single partition. If we create the view after the load and stream process, we will only have 3 view updates for a single partition. Fixes #9205 Closes #9213	2021-08-17 21:44:24 +03:00
Eliran Sinvani	47d3862b63	Service Level Controller: Add a listener API for service level config changes This change adds an API for registering a listener for service_level configuration changes. It notifies about removal, addition and change of a service level. The hidden assumption is that some listeners are going to create and/or manage service level specific resources, and this is what guided the timing of the call to the subscriber. Addition and change of a service level are notified before the actual change takes place; this guarantees that resource creation can take place before the service level or new config starts to be used. The deletion notification is called only after the deletion took place, and this guarantees that the service level can't be active and the resources created can be safely destroyed.	2021-08-16 11:38:59 +03:00
Asias He	97bb2e47ff	storage_service: Enable Repair Based Node Operations (RBNO) by default for replace We decided to enable repair based node operations by default for replace node operations. To do that, a new option --allowed-repair-based-node-ops is added. It lists the node operations that are allowed to enable repair based node operations. The operations can be bootstrap, replace, removenode, decommission and rebuild. By default, --allowed-repair-based-node-ops is set to contain "replace". Note, the existing option --enable-repair-based-node-ops is still in play. It is the global switch to enable or disable the feature. Examples: - To enable bootstrap and replace node ops: ``` scylla --enable-repair-based-node-ops true --allowed-repair-based-node-ops replace,bootstrap ``` - To disable any repair based node ops: ``` scylla --enable-repair-based-node-ops false ``` Closes #9197	2021-08-15 13:30:46 +03:00
Piotr Sarna	e1be04852b	migration_manager: add migrating user-defined aggregates User-defined aggregate creation and deletion can now be announced.	2021-08-13 11:14:12 +02:00
Piotr Sarna	ad2093539b	pagers: make a lambda mutable in fetch_page The lambda passed to with_thread_if_needed helper function relies on moving its captured parameters, so it's made mutable in order to avoid copying.	2021-08-13 11:13:43 +02:00
Piotr Sarna	260604d053	cql3: wrap handling paging result with with_thread_if_needed One of the pagers did not spawn a Seastar thread even if it was required by its underlying selectors - the behavior is now fixed.	2021-08-13 11:13:43 +02:00
Asias He	ce8fd051c9	storage_service: Fix argument in send_meta_data::do_receive The extra status print is not needed in the log. Fixes the following error: ERROR 2021-08-10 10:54:21,088 [shard 0] storage_service - service/storage_service.cc:3150 @do_receive: failed to log message: fmt='send_meta_data: got error code={}, from node={}, status={}': fmt::v7::format_error (argument not found) Fixes #9183 Closes #9189	2021-08-11 11:35:30 +02:00
Piotr Dulikowski	14b00610b2	storage_proxy: add functions for creating and waiting for hint sync pts Adds functions in storage_proxy which allow to create sync points and wait for them.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	4a35d138f6	Revert "storage_proxy: add functions for syncing with hints queue" This reverts commit `244738b0d5`. This commit removes create_hint_queue_sync_point and check_hint_queue_sync_point functions from storage_proxy, which were used to wait until local hints are sent out to particular nodes. Similar methods will be reintroduced later in this PR, with a completely different implementation.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	4604bb21c3	Revert "storage_proxy: implement verbs for hint sync points" This reverts commit `485036ac33`. This commit removes the handlers for HINT_SYNC_POINT_CREATE and HINT_SYNC_POINT_CHECK verbs. The upcoming HTTP API for waiting for hint replay will be restricted to waiting for hints on the node handling the request, so there is no need for new verbs.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	6c5d2fe0bf	Revert "storage_proxy: coordinate waiting for hints to be sent" This reverts commit `46075af7c4`. This commit removes the logic responsible for waiting for other nodes to replay their hints. The upcoming HTTP API for waiting for hint replay will be restricted to waiting for hints on the node handling the request, so there is no need for coordinating multiple nodes.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	afb4c85662	Revert "storage_proxy: stop waiting for hints replay when node goes down" This reverts commit `22e06ace2c`. The upcoming HTTP API for waiting for hint replay will be restricted to waiting for hints on the node handling the request, so we are removing all infrastructure related to coordinating hint waiting - therefore this commit needs to be reverted.	2021-08-09 09:06:23 +02:00
Piotr Dulikowski	035da96161	Revert "storage_proxy: add abort_source to wait_for_hints_to_be_replayed" This reverts commit `958a13577c`. The `wait_for_hints_to_be_replayed` function is going to be completely removed in this PR, so this commit needs to be reverted, too.	2021-08-09 09:06:23 +02:00
Tomasz Grabiec	8fe06ad681	storage_proxy: Fix result reconciliation for memory-limiter induced short reads This applies to the case when pages are broken by replicas based on memory limits (not row or partition limits). If replicas stop pages in the following places: replica1 = { row 1, <end-of-page> row 2 } replica2 = { row 3 } The coordinator will reconcile the first page as: { row 1, row 3 } and row 2 will not be emitted at all in the following pages. The coordinator should notice that replica1 returned a short read and ignore everything past row 1 from other replicas, but it doesn't. There is logic to do this trimming, but it is done in got_incomplete_information_across_partitions(), which is executed only for the partition for which row limits were exhausted. Fix by running the logic unconditionally. Fixes #9119 Tests: - unit (dev) - manual (2 node cluster, manual reproducer) Message-Id: <20210802231539.156350-1-tgrabiec@scylladb.com>	2021-08-05 11:28:52 +03:00
Asias He	9903eecc0f	storage_service: Close reader in load_and_stream We forgot to call reader.close() for the reader when the close API was introduced. Fixes #9146 Closes #9148	2021-08-05 09:27:19 +03:00
Tomasz Grabiec	cd56a4ec09	service: query_pagers: Reuse query_uuid across pages when paging locally Query pager was reusing query_uuid only when it had no local state (no _last_pkey), so querier cache was not used when paging locally. This bug affects performance of aggregate queries like count(*). Fixes #9127 Message-Id: <20210803003941.175099-1-tgrabiec@scylladb.com>	2021-08-03 22:52:05 +03:00
Nadav Har'El	6c27000b98	Merge 'Propagate exceptions without throwing' from Piotr Sarna NOTE: this series depends on a Seastar submodule update, currently queued in next: 0ed35c6af052ab291a69af98b5c13e023470cba3 In order to avoid needless throwing, exceptions are passed directly wherever possible. Two mechanisms which help with that are: 1. `make_exception_future<>` for futures 2. `co_return coroutine::exception(...)` for coroutines which return `future<T>` (the mechanism does not work for `future<>` without parameters, unfortunately) Tests: unit(release) Closes #9079 * github.com:scylladb/scylla: system_keyspace: pass exceptions without throwing sstables: pass exceptions without throwing storage_proxy: pass exceptions without throwing multishard_mutation_query: pass exceptions without throwing client_state: pass exceptions without throwing flat_mutation_reader: pass exceptions without throwing table: pass exceptions without throwing commitlog: pass exceptions without throwing compaction: pass exceptions without throwing database: pass exceptions without throwing	2021-08-01 16:47:47 +03:00
Pavel Emelyanov	f9132b582b	storage_service: Make it local There are 3 places that can now declare local instance: - main - cql_test_env - boost gossiper test The global pointer is saved in debug namespace for debugging. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-29 05:12:36 +03:00
Pavel Emelyanov	055025eaa9	storage_service: Remove (de)?init_storage_service() One of them just re-wraps arguments in std::ref and calls for global storage service. The other one is dead code which also calls the global s._s. Remove both and fix the only caller. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-29 05:12:36 +03:00
Pavel Emelyanov	2ffbe894b9	storage_service: Use container() in run_with(out)_api_lock Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-29 05:12:36 +03:00
Pavel Emelyanov	cd44a808be	storage_service: Unmark update_topology static And use container() to reshard to shard 0. This removes one more call for global storage service instance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-29 05:12:36 +03:00
Pavel Emelyanov	39db19191f	storage_service: Capture this when appropriate Some storage_service methods call for global storage service instance while they can enjoy "this" pointer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-29 05:12:36 +03:00
Piotr Sarna	4de751c8c8	storage_proxy: pass exceptions without throwing In order to avoid needless throwing, exceptions are passed directly wherever possible. Two mechanisms which help with that are: 1. make_exception_future<> for futures 2. co_return coroutine::exception(...) for coroutines which return future<T> (the mechanism does not work for future<> without parameters, unfortunately)	2021-07-26 17:05:15 +02:00
Piotr Sarna	101eb26171	client_state: pass exceptions without throwing In order to avoid needless throwing, exceptions are passed directly wherever possible. Two mechanisms which help with that are: 1. make_exception_future<> for futures 2. co_return coroutine::exception(...) for coroutines which return future<T> (the mechanism does not work for future<> without parameters, unfortunately)	2021-07-26 17:04:28 +02:00
Pavel Emelyanov	11a2709f10	storage_service: Replace globals with locals The node-ops verb handler is the lambda of storage-service and it can stop using global storage service instance for no extra charge. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-26 14:21:30 +03:00
Pavel Emelyanov	6e56671d9e	storage_service: Remove one extra hop of node-ops handler It's now clear that the verb handler goes to some "random" shard, then immediately switches to shard-0 and then does the handling. Avoid the extra hop and go to shard-0 right at once. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-26 14:21:30 +03:00
Pavel Emelyanov	b6315d3af7	storage_service: Fix indentation after previous patch And, while at it, s/ss/this/g and drop the ss variable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-26 14:21:30 +03:00
Pavel Emelyanov	f5fad311cf	storage_service: Move cross-shard hop up the stack The storage_service::node_ops_cmd_handler runs inside a huge invoke_on(0, ...) lambda. Make it be called on shard-0. This is the preparation for next two patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-26 14:21:30 +03:00
Pavel Emelyanov	a09586a237	repair, storage_service: Move nodeops reg/unreg to storage service The storage service is the verb sender, so it must be the verb registrator. Another goal of this patch is to allow removal of repair -> storage_service dependency. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-26 14:21:21 +03:00
Pavel Solodovnikov	bcbcc18aa1	raft: raft_sys_table_storage: fix broken `load_snapshot` and `load_term_and_vote` Loading snapshot id and term + vote involve selecting static fields from the "system.raft" table, constrained by a given group id. The code incorrectly assumes that, for example, `SELECT snapshot_id FROM raft WHERE group_id=?` in `load_snapshot` always returns only one row. This is not true, since this will return a row for each (pk, ck) combination, which is (group_id, index) for "system.raft" table. The same applies for the `load_term_and_vote`, which selects static `vote_term` and `vote` from "system.raft". This results in a crash at node startup when there is a non-empty raft log containing more than one entry for a given `group_id`. Restrict the selection to always return one row by applying `LIMIT 1` clause. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210723183232.742083-1-pa.solodovnikov@scylladb.com>	2021-07-25 02:01:34 +02:00
Tomasz Grabiec	b044db863f	Merge 'db/virtual_table: Streaming tables for large data + describe_ring example table' from Juliusz Stasiewicz This is the 2nd PR in series with the goal to finish the hackathon project authored by @tgrabiec, @kostja, @amnonh and @mmatczuk (improved virtual tables + function call syntax in CQL). This one introduces a new implementation of the virtual tables, the streaming tables, which are suitable for large amounts of data. This PR was created by @jul-stas and @StarostaGit Closes #8961 * github.com:scylladb/scylla: test/boost: run_mutation_source_tests on streaming virtual table system_keyspace: Introduce describe_ring table as virtual_table storage_service: Pass the reference down to system_keyspace endpoint_details: store `_host` as `gms::inet_address` queue_reader: implement next_partition() virtual_tables: Introduce streaming_virtual_table flat_mutation_reader: Add a new filtering reader factory method	2021-07-23 18:05:51 +02:00
Avi Kivity	aaf35b5ac2	Merge "Remove storage-service from transport (and a bit more)" from Pavel E " The cql-server -> storage-service dependency comes from the server's event_notifier which (un)subscribes on the lifecycle events that come from the storage service. To break this link the same trick as with migration manager notifications is used -- the notification engine is split out of the storage service and then is pushed directly into both -- the listeners (to (un)subscribe) and the storage service (to notify). tests: unit(dev), dtest(simple_boot_shutdown, dev) manual({ start/stop, with/without started transport, nodetool enable-/disablebinary } in various combinations, dev) " * 'br-remove-storage-service-from-transport' of https://github.com/xemul/scylla: transport.controller: Brushup cql_server declarations code: Remove storage-service header from irrelevant places storage_service: Remove (unlifecycle) subscribe methods transport: Use local notifier to (un)subscribe server transport: Keep lifecycle notifier sharded reference main: Use local lifecycle notifier to (un)subscribe listeners main, tests: Push notifier through storage service storage_service: Move notification core into dedicated class storage_service: Split lifecycle notification code transport, generic_server: Remove no longer used functionality transport: (Un)Subscribe cql_server::event_notifier from controller tests: Remove storage service from manual gossiper test	2021-07-22 19:27:45 +03:00
Pavel Emelyanov	c39f04fa6f	code: Remove storage-service header from irrelevant places Some .cc files over the code include the storage service for no real need. Drop the header and include (in some) what's really needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:50:19 +03:00
Pavel Emelyanov	e711bfbb7e	storage_service: Remove (unlifecycle) subscribe methods All the listeners now use main-local notifier instance directly and these methods become unused. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:49:35 +03:00
Pavel Emelyanov	8248bc9e33	main, tests: Push notifier through storage service Now it's time to move the lifecycle notifier from storage service to the main's scope. Next patches will remove the $lifecycle-subscriber -> storage_service dependency. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:45:51 +03:00
Pavel Emelyanov	6b3b01d9a6	storage_service: Move notification core into dedicated class Introduce the endpoint_lifecycle_notifier class that's in charge of keeping track of subscribers and notifying them. The subscribers will thus be able to set and unset their subscription without the need to mess with storage service at all. The storage_service for now keeps the notifier on board, but this is going to change in the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:44:02 +03:00
Pavel Emelyanov	7e8a032013	storage_service: Split lifecycle notification code This prepares the ground for moving the notification engine into own class like it was done for migration_notifier some time ago. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:43:14 +03:00
Benny Halevy	c5e08eb6e7	main: add deferred stop of batchlog_manager Stop the batchlog manager using a deferred action in main to make sure it is stopped after its start() method has been called, also if we bail out of main early due to exception. Change the bm.stop() calls in storage_service to just stop the replay loop using drain(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-07-20 20:24:11 +03:00
Juliusz Stasiewicz	a8b741efe2	endpoint_details: store `_host` as `gms::inet_address` In an upcoming commit I will add "system.describe_ring" table which uses endpoint's inet address as a part of CK and, therefore, needs to keep them sorted with `inet_addr_type::less`.	2021-07-20 14:00:54 +02:00
Piotr Sarna	38afef71b9	Merge 'Service Level Controller: Stop polling distributed data.. ... when decommissioned (reworked)' from Eliran Sinvani This is a rework of #8916 The polling loop of the service level controller queries a distributed table in order to detect configuration changes. If a node gets decommissioned, this loop continues to run until shutdown; if a node stays in the decommissioned mode without being shut down, the loop will fail to query the table and this will result in warnings and eventually errors in the log. This is not really harmful but it adds unnecessary noise to the log. The series below lays the infrastructure for observing storage service state changes, which is eventually used to break the loop upon preparation for decommissioning. Tests: Unit test (dev) Failing tests in jenkins. Fixes #8836 The previous merge (possibly due to conflict resolution) contained a misplaced get that caused an abort on shutdown. Closes #9035 * github.com:scylladb/scylla: Service Level Controller: Stop configuration polling loop upon leaving the cluster main: Stop using get_local_storage_service in main	2021-07-19 10:52:42 +02:00
Benny Halevy	a44c06d776	storage_proxy: query: log also errors If log trace level is enabled, log also error. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210712070509.24102-1-bhalevy@scylladb.com>	2021-07-16 16:12:05 +02:00
Botond Dénes	999169e535	database: make_streaming_reader(): require permit As a preparation for up-front admission, add a permit parameter to `make_streaming_reader()`, which will be the admitted permit once we switch to up-front admission. For now it has to be a non-admitted permit. A nice side-effect of this patch is that now permits will have a use-case specific description, instead of the generic "streaming" one.	2021-07-14 16:48:43 +03:00
Eliran Sinvani	ccdef39d21	Service Level Controller: Stop configuration polling loop upon leaving the cluster This change subscribes service_level_controller for nodes life cycle notifications and uses the notification of leaving the cluster for the current node to stop the configuration polling loop. If the loop continues to run, its queries will fail consistently since the nodes will not answer queries. It is worth mentioning that the queries failing in the current state of the code is harmless but noisy, since after 90 seconds, if the scylla process is not shut down, the failures will start to generate failure logs every 90 seconds, which is confusing for users. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2021-07-14 09:31:40 +03:00
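
The follower read barrier described in commit 03a266d73b can be sketched roughly as below. This is a minimal synchronous model under stated assumptions, not the Scylla implementation: `leader_state`, `follower_state`, `read_barrier_done` and the vector of peer acknowledgements are hypothetical names introduced for illustration. Only the name `execute_read_barrier_on_leader` comes from the commit message, and here it is a plain function rather than an RPC.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified leader view for one read-barrier round.
struct leader_state {
    std::size_t commit_index;     // last committed index the leader knows about
    std::vector<bool> peer_acks;  // did each other node confirm this leader in this round?
};

// Leader side: confirm leadership with a quorum before answering.
// Returns the index that is safe to read, or nothing if no quorum answered.
std::optional<std::size_t> execute_read_barrier_on_leader(const leader_state& l) {
    std::size_t votes = 1;  // the leader counts itself
    for (bool ack : l.peer_acks) {
        if (ack) {
            ++votes;
        }
    }
    const std::size_t cluster_size = l.peer_acks.size() + 1;
    if (votes * 2 <= cluster_size) {
        return std::nullopt;  // not a strict majority: leadership is unconfirmed
    }
    return l.commit_index;
}

// Hypothetical follower view: only the locally applied index matters here.
struct follower_state {
    std::size_t applied_index;
};

// Follower side: the barrier completes once the local state machine has
// applied everything up to the index returned by the leader.
bool read_barrier_done(const follower_state& f, std::size_t read_index) {
    return f.applied_index >= read_index;
}
```

In a real server the follower would wait asynchronously for `applied_index` to advance instead of polling a predicate; the predicate form is used here only to keep the sketch self-contained.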

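The short-read trimming that commit 8fe06ad681 makes unconditional can be illustrated with a small model. The sketch below is an assumption-laden simplification, not the storage_proxy code: rows are plain integers in clustering order, and `replica_page` and `reconcile_page` are hypothetical names. It shows only the rule stated in the commit message, namely that when a replica returns a short read, rows from other replicas past that replica's last row are left out of the current page.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <set>
#include <vector>

// Hypothetical model of what one replica returned for a page.
struct replica_page {
    std::vector<int> rows;  // row positions, in clustering order
    bool short_read;        // true if the replica ended the page early (e.g. memory limit)
};

// Merge one page across replicas. Rows past the last row of any short-read
// replica are dropped, because that replica may still hold rows in the gap;
// they are left for the following page.
std::vector<int> reconcile_page(const std::vector<replica_page>& pages) {
    std::optional<int> cutoff;
    for (const auto& p : pages) {
        if (p.short_read && !p.rows.empty()) {
            const int last = p.rows.back();
            cutoff = cutoff ? std::min(*cutoff, last) : last;
        }
    }
    std::set<int> merged;  // deduplicates and keeps rows ordered
    for (const auto& p : pages) {
        for (int r : p.rows) {
            if (!cutoff || r <= *cutoff) {
                merged.insert(r);
            }
        }
    }
    return std::vector<int>(merged.begin(), merged.end());
}
```

With the commit's example, replica1 returning row 1 as a short read and replica2 returning row 3, this model yields a page containing only row 1, so row 2 can still be emitted in the next page instead of being skipped.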

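One way to read the interaction of the two options in commit 97bb2e47ff is that an operation uses repair based node operations only when the global switch is on and the operation is in the allowed list. The sketch below encodes that reading. It is an assumption drawn from the commit message, not a copy of the Scylla logic, and `rbno_config` and `use_repair_based` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical holder for the two command-line options named in the commit.
struct rbno_config {
    bool enable_repair_based_node_ops;                    // --enable-repair-based-node-ops
    std::set<std::string> allowed_repair_based_node_ops;  // --allowed-repair-based-node-ops
};

// Assumed rule: the global switch must be on AND the operation must be listed.
bool use_repair_based(const rbno_config& cfg, const std::string& op) {
    return cfg.enable_repair_based_node_ops
        && cfg.allowed_repair_based_node_ops.count(op) > 0;
}
```

Under this reading, a configuration with the switch on and the allowed list set to "replace" selects the repair based path for replace but not for bootstrap, and turning the switch off disables it for every operation regardless of the list.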
2305 Commits