scylladb

Author	SHA1	Message	Date
Wojciech Mitros	813fef44d3	exceptions: make view update timeouts inherit from timed_out_error Currently, when generating and propagating view updates, if we notice that we've already exceeded the time limit, we throw an exception inheriting from `request_timeout_exception`, to later catch and log it when finishing request handling. However, when catching, we only check timeouts by matching the `timed_out_error` exception, so the exception thrown in the view update code is not registered as a timeout exception, but an unknown one. This can cause tests which were based on the log output to start failing, as in the past we were noticing the timeout at the end of the request handling and using the `timed_out_error` to keep processing it and now, even though we do notice the timeout even earlier, due to it's type we log an error to the log, instead of treating it as a regular timeout. In this patch we make the error thrown on timeout during view updates inherit from `timed_out_error` instead of the `request_timeout_exception` (it is also moved from the "exceptions" directory, where we define exceptions returned to the user). Aside from helping with the issue described above, we also improve our metrics, as the `request_timeout_exception` is also not checked for in the `is_timeout_exception` method, and because we're using it to check whether we should update write timeout metrics, they will only start getting updated after this patch. Fixes #19261 (cherry picked from commit `4aa7ada771`) Closes scylladb/scylladb#19262	2024-06-13 12:01:12 +03:00
Piotr Dulikowski	68eca3778c	Merge 'mv: throttle view update generation for large queries' from Wojciech Mitros This series is a reupload of #13792 with a few modifications, namely a test is added and the conflicts with recent tablet related changes are fixed. See https://github.com/scylladb/scylladb/issues/12379 and https://github.com/scylladb/scylladb/pull/13583 for a detailed description of the problem and discussions. This PR aims to extend the existing throttling mechanism to work with requests that internally generate a large amount of view updates, as suggested by @nyh. The existing mechanism works in the following way: * Client sends a request, we generate the view updates corresponding to the request and spawn background tasks which will send these updates to remote nodes * Each background task consumes some units from the `view_update_concurrency_semaphore`, but doesn't wait for these units, it's just for tracking * We keep track of the percent of consumed units on each node, this is called `view update backlog`. * Before sending a response to the client we sleep for a short amount of time. The amount of time to sleep for is based on the fullness of this `view update backlog`. For a well behaved client with limited concurrency this will limit the amount of incoming requests to a manageable level. This mechanism doesn't handle large DELETE queries. Deleting a partition is fast for the base table, but it requires us to generate a view update for every single deleted row. The number of deleted rows per single client request can be in the millions. Delaying response to the request doesn't help when a single request can generate millions of updates. To deal with this we could treat the view update generator just like any other client and force it to wait a bit of time before sending the next batch of updates. The amount of time to wait for is calculated just like in the existing throttling code, it's based on the fullness of `view update backlogs`. The new algorithm of view update generation looks something like this: ```c++ for(;;) { auto updates = generate_updates_batch_with_max_100_rows(); co_await seastar::sleep(calculate_sleep_time_from_backlogs()); spawn_background_tasks_for_updates(updates); } ``` Fixes: https://github.com/scylladb/scylladb/issues/12379 Closes scylladb/scylladb#16819 * github.com:scylladb/scylladb: test: add test for bad_allocs during large mv queries mv: throttle view update generation for large queries exceptions: add read_write_timeout_exception, a subclass of request_timeout_exception db/view: extract view throttling delay calculation to a global function view_update_generator: add get_storage_proxy() storage_proxy: make view backlog getters public	2024-05-16 08:22:54 +02:00
Wojciech Mitros	485eb7a64c	test: add test for bad_allocs during large mv queries This patch adds a test for reproducing issue #12379, which is being fixed in #16819. The test case works by creating a table with a materialized view, and then performing a partition delete query on it. At the same time, it uses injections to limit the memory to a level lower than usual, in order to increase the consistency of the test, and to limit its runtime. Before #16819, the test would exceed the limit and fail, and now the next allocation is throttled using a sleep.	2024-05-13 18:16:39 +02:00
Jan Ciolek	e0442d7bfa	mv: throttle view update generation for large queries For every mutation applied to the base table we have to generate the corresponding materialized view table updates. In case of simple requests, like INSERT or UPDATE, the number of view updates generated per base table mutation is limited to at most a few view table updates per base table update. The situation is different for DELETE queries, which can delete the whole partitions or clustering ranges. Range deletions are fast on the base table, but for the view table the situation is different. Deleting a single partition in the base table will generate as many singular view updates as there are rows in the deleted partition, which could potentially be in the millions. To prevent OOM view updates are generated in batches of at most 100 rows. There is a loop which generates the next batch of updates, spawns tasks to send them to remote nodes, generates another batch and so on. The problem is that there is no concurrency control - each batch is scheduled to be sent in the background, but the following batch is generated without waiting for the previously generated updates to be sent. This can lead to unbounded concurrency and OOM. To protect against this view update generation should be limited somehow. There is an existing mechanism for limiting view updates - throttling. We keep track of how many pending view updates there are, in the view backlog, and delay responses to the client based on this backlog's fullness. For a well behaved client with limited concurrency this will slow down the amount of incoming requests until it reaches an optimal point. This works for simple queries (INSERT, UPDATE, ...), but it doesn't do anything for range DELETEs. A DELETE is a single request that generates millions of view updates, delaying client response doesn't help. The throttling mechanism could be extend to cover this case - we could treat the DELETE request like any other client and force it to wait before sending more updates. This commit implements this approach - before sending the next batch of updates the generator is forced to sleep for a bit of time, calculated using the exisiting throttling equation. The more full the backlog gets the more the generator will have to sleep for, and hopefully this will prevent overloading the system with view updates. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2024-05-13 18:16:23 +02:00
Pavel Emelyanov	d47053266b	view: Abort pending view updates when draining When view builder is drained (it now happens very early, but next patch moves this into regular drain) it waits for all on-going view build steps to complete. This includes waiting for any outstanding proxy view writes to complete as well. View writes in proxy have very high timeout of 5 minutes but they are cancellable. However, canecelling of such writes happens in proxy's drain_on_shutdown() call which, in turn, happens pretty late on shutdown. Effectively, by the time it happens all view writes mush have completed already, so stop-time cancelling doesn't really work nowadays. Next patch makes view builder drain happen a bit later during shutdown, namely -- _after_ shutting down messaging service. When it happen that late, non-working view writes cancellation becomes critical, as view builder drain hangs for aforementioned 5 minutes. This patch explicitly cancels all view writes when view builder stops. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-02 08:16:12 +03:00
Pavel Emelyanov	2ee7c41139	view: Get concurrency semaphore via database, not table The _view_update_concurrency_sem field on database propagates itself via keyspace config down to table config and view_update_generator then grabs one via table:: helper. That's an overkil, view_update_generator has direct reference on the database and can get this semaphore from there. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-25 14:25:57 +03:00
Pavel Emelyanov	bc4552740f	view: Move view_update_generator methods' code Now when the two methods belong to another class, move the code itself to db/view , where the class itself resides. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-25 14:24:20 +03:00
Avi Kivity	7cb1c10fed	treewide: replace seastar::future::get0() with seastar::future::get() get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it. Replace with seastar::future::get(), which does the same thing.	2024-02-02 22:12:57 +08:00
Nadav Har'El	1bcaeb89c7	view: revert cleanup filter that doesn't work with tablets This patch reverts commit `10f8f13b90` from November 2022. That commit added to the "view update generator", the code which builds view updates for staging sstables, a filter that ignores ranges that do not belong to this node. However, 1. I believe this filter was never necessary, because the view update code already silently ignores base updates which do not belong to this replica (see get_view_natural_endpoint()). After all, the view update needs to know that this replica is the Nth owner of the base update to send its update to the Nth view replica, but if no such N exists, no view update is sent. 2. The code introduced for that filter used a per-keyspace replication map, which was ok for vnodes but no longer works for tablets, and causes the operation using it to fail. 3. The filter was used every time the "view update generator" was used, regardless of whether any cleanup is necessary or not, so every such operation would fail with tablets. So for example the dtest test_mvs_populating_from_existing_data fails with tablets: * This test has view building in parallel with automatic tablet movement. * Tablet movement is streaming. * When streaming happens before view building has finished, the streamed sstables get "view update generator" run on them. This causes the problematic code to be called. Before this patch, the dtest test_mvs_populating_from_existing_data fails when tablets are enabled. After this patch, it passes. Fixes #16598 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-14 13:24:44 +02:00
Nadav Har'El	0fe40f729e	mv: sleep a bit before view-update-generator restart The "view update generator" is responsible for generating view updates for staging sstables (such as coming from repair). If the processing fails, the code retries - immediately. If there is some persistent bug, such as issue #16598, we will have a tight loop of error messages, potentially a gigabyte of identical messages every second. In this patch we simply add a sleep of one second after view update generation fails before retrying. We can still get many identical error messages if there is some bug, but not more than one per second. Refs #16598. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-14 13:13:52 +02:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Pavel Emelyanov	3471f30b58	view_update_generator: Unplug from database later Patch `967ebacaa4` (view_update_generator: Move abort kicking to do_abort()) moved unplugging v.u.g from database from .stop() to .do_abort(). The latter call happens very early on stop -- once scylla receives SIGINT. However, database may still need v.u.g. plugged to flush views. This patch moves unplug to later, namely to .stop() method of v.u.g. which happens after database is drained and should no longer continue view updates. fixes: #16001 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#16091	2023-11-20 11:47:55 +02:00
Pavel Emelyanov	becd960ae8	view_update_generator: Add logging to do_abort() Just tell the logs that the guy is aborting refs: #10941 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-21 13:34:21 +03:00
Pavel Emelyanov	967ebacaa4	view_update_generator: Move abort kicking to do_abort() When v.u.g. stops is first aborts the generation background fiber by requesting abort on the internal abort source and signalling the fiber in case it's waiting. Right now v.u.g.::stop() is defer-scheduled last in main(), so this move doesn't change much -- when stop_signal fires, it will kick the v.u.g.::do_abort() just a bit earlier, there's nothing that would happen after it before real ::stop() is called that depends on it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-21 13:32:45 +03:00
Pavel Emelyanov	e34220ebb7	view_update_generator: Add early abort subscription Subscribe v.u.g. to the main's stop_signal. For now a no-op callback. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-21 13:32:45 +03:00
Aleksandra Martyniuk	cdbfa0b2f5	replica: iterate safely over tables related maps Loops over _column_families and _ks_cf_to_uuid which may preempt are protected by reader mode of rwlock so that iterators won't get invalid.	2023-07-25 17:13:04 +02:00
Aleksandra Martyniuk	52afd9d42d	replica: wrap column families related maps into tables_metadata As a preparation for ensuring access safety for column families related maps, add tables_metadata, access to members of which would be protected by rwlock.	2023-07-25 16:13:00 +02:00
Raphael S. Carvalho	1ff8645eaa	view_update_generator: Dump throughput and duration for view update from staging Very helpful for user to understand how fast view update generation is processing the staging sstables. Today, logs are completely silent on that. It's not uncommon for operators to peek into staging dir and deduce the throughput based on removal of files, which is terrible. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-06-26 21:58:23 -03:00
Pavel Emelyanov	66e43912d6	code: Switch to seastar API level 7 In that level no io_priority_class-es exist. Instead, all the IO happens in the context of current sched-group. File API no longer accepts prio class argument (and makes io_intent arg mandatory to impls). So the change consists of - removing all usage of io_priority_class - patching file_impl's inheritants to updated API - priority manager goes away altogether - IO bandwidth update is performed on respective sched group - tune-up scylla-gdb.py io_queues command The first change is huge and was made semi-autimatically by: - grep io_priority_class \| default_priority_class - remove all calls, found methods' args and class' fields Patching file_impl-s is smaller, but also mechanical: - replace io_priority_class& argument with io_intent* one - pass intent to lower file (if applicatble) Dropping the priority manager is: - git-rm .cc and .hh - sed out all the #include-s - fix configure.py and cmakefile The scylla-gdb.py update is a bit hairry -- it needs to use task queues list for IO classes names and shares, but to detect it should it checks for the "commitlog" group is present. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13963	2023-06-06 13:29:16 +03:00
Marcin Maliszkiewicz	99f8d7dcbe	db: view: use deferred_close for closing staging_sstable_reader When consume_in_thread throws the reader should still be closed. Related https://github.com/scylladb/scylla-enterprise/issues/2661 Closes #13398 Refs: scylladb/scylla-enterprise#2661 Fixes: #13413	2023-04-03 09:02:55 +03:00
Pavel Emelyanov	cc262d814b	view: Drop global storage_proxy usage from mutate_MV() Now the mutate_MV is the method of v.u.generator which has reference to the sharded<storage_proxy>. Few helper static wrappers are patched to get the needed proxy or database reference from the mutate_MV call. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 18:48:14 +03:00
Pavel Emelyanov	2652dffd89	view: Capture v.u.generator on view_updating_consumer lambda The consumer is in fact pushing the updates and _that_'s the component that would really need the view_update_generator at hand. The consumer is created from the generator itself so no troubles getting the pointer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 14:10:55 +03:00
Pavel Emelyanov	d5557ef0e2	view: Plug view update generator to database The database is low-level service and currently view update generator implicitly depend on it via storage proxy. However, database does need to push view updates with the help of mutate_MV helper, thus adding the dependency loop. This patch exploits the fact that view updates start being pushed late enough, by that time all other service, including proxy and view update generator, seem to be up and running. This allows a "weak dependency" from database to view update generator, like there's one from database to system keyspace already. So in this patch the v.u.g. puts the shared-from-this pointer onto the database at the time it starts. On stop it removes this pointer after database is drained and (hopefully) all view updates are pushed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 14:09:49 +03:00
Pavel Emelyanov	3fd12d6a0e	view: Add view_update_generator -> sharded<storage_proxy> dependency The generator will be responsible for spreading view updates with the help of mutate_MV helper. The latter needs storage proxy to operate, so the generator gets this dependency in advance. There's no need to change start/stop order at the moment, generator already starts after and stops before proxy. Also, services that have generator as dependency are not required by proxy (even indirectly) so no circular dependency is produced at this point. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 14:08:47 +03:00
Botond Dénes	156e5d346d	reader_permit: keep trace_state pointer on permit And propagate it down to where it is created. This will be used to add trace points for semaphore related events, but this will come in the next patches.	2023-03-22 04:58:01 -04:00
Pavel Emelyanov	f51762c72a	headers: Refine view_update_generator.hh and around The initial intent was to reduce the fanout of shared_sstable.hh through v.u.g.hh -> cql_test_env.hh chain, but it also resulted in some shots around v.u.g.hh -> database.hh inclusion. By and large: - v.u.g.hh doesn't need database.hh - cql_test_env.hh doesn't need v.u.g.hh (and thus -- the shared_sstable.hh) but needs database.hh instead - few other .cc files need v.u.g.hh directly as they pulled it via cql_test_env.hh before - add forward declarations in few other places Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12952	2023-02-22 09:32:30 +02:00
Benny Halevy	10f8f13b90	db: view_update_generator: always clean up staging sstables Since they are currently not cleaned up by cleanup compaction filter their tokens, processing only tokens owned by the current node (based on the keyspace replication strategy). Refs #9559 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 07:38:22 +02:00
Raphael S. Carvalho	ec79ac46c9	db/view: Add visibility to view updating of Staging SSTables Today, we're completely blind about the progress of view updating on Staging files. We don't know how long it will take, nor how much progress we've made. This patch adds visibility with a new metric that will inform the number of bytes to be processed from Staging files. Before any work is done, the metric tell us the total size to be processed. As view updating progresses, the metric value is expected to decrease, unless work is being produced faster than we can consume them. We're piggybacking on sstables::read_monitor, which allows the progress metric to be updated whenever the SSTable reader makes progress. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11751	2022-10-12 16:57:37 +03:00
Benny Halevy	81fa1ce9a1	Revert 'Compact staging sstables' This patch reverts the following patches merged in `78750c2e1a` "Merge 'Compact staging sstables' from Benny Halevy" > `597e415c38` "table: clone staging sstables into table dir" > `ce5bd505dc` "view_update_generator: discover_staging_sstables: reindent" > `59874b2837` "table: add get_staging_sstables" > `7536dd7f00` "distributed_loader: populate table directory first" The feature causes regressions seen with e.g. https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/41/testReport/materialized_views_test/TestMaterializedViews/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split011___test_base_replica_repair/ ``` AssertionError: Expected [[0, 0, 'a', 3.0]] from SELECT * FROM t_by_v WHERE v = 0, but got [] ``` Where views aren't updated properly. Apparently since `table::stream_view_replica_updates` doesn't exclude the staging sstables anymore and since they are cloned to the base table as new sstables it seems to the view builder that no view updates are required since there's no changes comparing to the base table. Reopens #9559 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #10890	2022-06-27 12:18:48 +03:00
Benny Halevy	597e415c38	table: clone staging sstables into table dir clone staging sstables so their content may be compacted while views are built. When done, the hard-linked copy in the staging subdirectory will be simply unlinked. Fixes #9559 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-23 16:55:27 +03:00
Benny Halevy	ce5bd505dc	view_update_generator: discover_staging_sstables: reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-23 16:55:27 +03:00
Benny Halevy	59874b2837	table: add get_staging_sstables We don't have to go over all sstables in the table to select the staging sstables out of them, we can get it directly from the _sstables_staging map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-23 16:55:27 +03:00
Benny Halevy	b8b14d76b3	view_update_generator: discover_staging_sstables: get shared table ptr earlier It's potentially a bit more efficient since t.get_sstables is called only once, while t.shared_from_this() is called per staging sstable. Also, prepare for the following patches that modify this function further. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-23 16:55:27 +03:00
Raphael S. Carvalho	aa667e590e	sstable_set: Fix partitioned_sstable_set constructor The sstable set param isn't being used anywhere, and it's also buggy as sstable run list isn't being updated accordingly. so it could happen that set contains sstables but run list is empty, introducing inconsistency. we're fortunate that the bug wasn't activated as it would've been a hard one to catch. found this while auditting the code. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220617203438.74336-1-raphaelsc@scylladb.com>	2022-06-21 11:58:13 +03:00
Botond Dénes	d0ea895671	readers: move multishard reader & friends to reader/multishard.cc Since the multishard reader family weighs more than 1K SLOC, it gets its own .cc file.	2022-03-30 15:42:51 +03:00
Botond Dénes	05c48ee0cc	db/view/view_updating_consumer: migrate to v2 Not a completely mechanical transition. The consumer has to generate its mutation via a mutation_rebuilder_v2 as mutation fragment v2 cannot be applied to mutations directly yet.	2022-02-21 12:29:24 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Michael Livshin	91d38ef2a9	view_update_generator: remove unneeded call to downgrade_to_v1() Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-01-11 10:49:26 +02:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Raphael S. Carvalho	aebbe68239	sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-07 09:37:53 -03:00
Pavel Emelyanov	0de69136d4	view_update_generator: Register staging sstables in constructor First, it's to fix the discarded future during the register. The future is not actually such, as it's always the no-op ready one as at that stage the view_update_generator is neither aborted nor is in throttling state. Second, this change is to keep database start-up code in main shorter and cleaner. Registering staging sstables belongs to the view_update_generator start code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-09-15 17:49:06 +03:00
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	e9aff2426e	everywhere: make deferred actions noexcept Prepare for updating seastar submodule to a change that requires deferred actions to be noexcept (and return void). Test: unit(dev, debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:52 +03:00
Benny Halevy	4439e5c132	everywhere: cleanup defer.hh includes Get rid of unused includes of seastar/util/{defer,closeable}.hh and add a few that are missing from source files. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:39 +03:00
Botond Dénes	1b7eea0f52	reader_concurrency_semaphore: admission: flip the switch This patch flips two "switches": 1) It switches admission to be up-front. 2) It changes the admission algorithm. (1) by now all permits are obtained up-front, so this patch just yanks out the restricted reader from all reader stacks and simultaneously switches all `obtain_permit_nowait()` calls to `obtain_permit()`. By doing this admission is now waited on when creating the permit. (2) we switch to an admission algorithm that adds a new aspect to the existing resource availability: the number of used/blocked reads. Namely it only admits new reads if in addition to the necessary amount of resources being available, all currently used readers are blocked. In other words we only admit new reads if all currently admitted reads requires something other than CPU to progress. They are either waiting on I/O, a remote shard, or attention from their consumers (not used currently). We flip these two switches at the same time because up-front admission means cache reads now need to obtain a permit too. For cache reads the optimal concurrency is 1. Anything above that just increases latency (without increasing throughput). So we want to make sure that if a cache reader hits it doesn't get any competition for CPU and it can run to completion. We admit new reads only if the read misses and has to go to disk. Another change made to accommodate this switch is the replacement of the replica side read execution stages which the reader concurrency semaphore as an execution stage. This replacement is needed because with the introduction of up-front admission, reads are not independent of each other any-more. One read executed can influence whether later reads executed will be admitted or not, and execution stages require independent operations to work well. By moving the execution stage into the semaphore, we have an execution stage which is in control of both admission and running the operations in batches, avoiding the bad interaction between the two.	2021-07-14 17:19:02 +03:00
Botond Dénes	f28b5018f2	view/view_update_generator: use obtain_reader_permit()	2021-07-14 16:48:43 +03:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Avi Kivity	a55b434a2b	treewide: extent copyright statements to present day	2021-06-06 19:18:49 +03:00
Benny Halevy	02d74e1530	view_update_generator: start: close staging_sstable_reader when done The staging_sstable_reader has to be closed before it's destroyed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Avi Kivity	5f4bf18387	Revert "Merge 'sstables: add versioning to the sstable_set ' from Wojciech Mitros" This reverts commit `31909515b3`, reversing changes made to `ef97adc72a`. It shows many serious regressions in dtest. Fixes #8197.	2021-03-02 13:21:22 +02:00

1 2

78 Commits