When dismantling readers, some of the to-be-dismantled readers might be
in a failed state. The code waiting on a reader to stop expects
failures, but it didn't do anything besides logging the failure and
bumping a counter. Code in the lower layers did not know how to deal
with a failed reader and would trigger `std::bad_variant_access` when
trying to process (save or clean up) it. To prevent this, reset the
state of failed readers to `inexistent_state` so code in the lower
layers doesn't attempt to process them further.
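A minimal sketch of the idea, with hypothetical names (`reader_meta`,
`reader_state`, `wait_for_stop`) standing in for the actual types:

```cpp
// Sketch only; names are illustrative, not the actual ones.
#include <seastar/core/future.hh>

enum class reader_state { inexistent, successful_lookup, evicted };

struct reader_meta {
    reader_state state = reader_state::inexistent;
    // ... reader handle, pending futures, etc.
};

seastar::future<> wait_for_stop(reader_meta& rm, seastar::future<> stop_fut) {
    return stop_fut.handle_exception([&rm] (std::exception_ptr) {
        // Log and bump the failure counter as before, but also reset the
        // state so the lower layers don't try to save or clean up this
        // reader and trip std::bad_variant_access on its state variant.
        rm.state = reader_state::inexistent;
    });
}
```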
When dismantling the combined buffer and the compaction state, we are no
longer guaranteed to have the reader each partition originated from. The
reader might have been evicted and not resumed, or resuming it might
have failed. In any case, we can no longer assume that the originating
reader of each partition will be present. If a reader no longer exists,
discard the partitions it emitted.
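A sketch of the discard logic, with illustrative names (`find_reader`
and `push_fragments_back` are stand-ins):

```cpp
// Push fragments back only if the originating shard reader still
// exists; otherwise discard them and account for the loss.
for (auto& part : dismantled_partitions) {
    auto* rm = find_reader(part.shard); // nullptr if evicted or failed
    if (!rm) {
        stats.add_discarded(part);      // see dismantle_buffer_stats below
        continue;                       // reader gone: drop the fragments
    }
    push_fragments_back(*rm, std::move(part.fragments));
}
```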
In the next patches we will add code that will have to discard some of
the dismantled partitions/fragments/bytes. Prepare the
`dismantle_buffer_stats` struct to track the discarded
partitions/fragments/bytes in addition to those that were successfully
dismantled.
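A hypothetical shape of the extended struct; the actual field names may
differ:

```cpp
struct dismantle_buffer_stats {
    // Successfully dismantled (pushed back to their originating readers).
    size_t partitions = 0;
    size_t fragments = 0;
    size_t bytes = 0;
    // Discarded because the originating reader no longer exists.
    size_t discarded_partitions = 0;
    size_t discarded_fragments = 0;
    size_t discarded_bytes = 0;
};
```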
Previously, readers were created only once, so `make_remote_reader()`
validated that no attempt was made to create a reader more than once.
This validation was done by checking that the reader-state is either
`inexistent` or `successful_lookup`. However, with the introduction of
pausing shard readers, it is now possible that a reader will have to be
created and then re-created several times, and this validation was not
updated to expect that.
Update the validation so it also accepts the reader-state `evicted`,
the state a reader will be in if it was evicted while paused.
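A sketch of the relaxed validation, with illustrative names:

```cpp
bool creating_reader_is_valid(reader_state s) {
    return s == reader_state::inexistent        // never created
        || s == reader_state::successful_lookup // restored from the cache
        || s == reader_state::evicted;          // evicted while paused
}
```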
Many headers don't really need to include database.hh; the include can
be replaced by forward declarations and/or by directly including the
headers that are actually needed. Some headers don't need the include at
all.
Each header was verified to be compilable on its own after the change,
by including it into an empty `.cc` file and compiling it. `.cc` files
that used to get `database.hh` through headers that no longer include it
were changed to include it themselves.
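For illustration, a header that only refers to `database` by reference
or pointer can get by with a forward declaration:

```cpp
// #include "database.hh"   // no longer needed here
class database;              // forward declaration suffices

void do_query(database& db); // fine: only a reference is used
```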
This API provides a way for the multishard reader to pause inactive
shard readers and later resume them when they are needed again. This
allows these paused shard readers to be evicted when the node is under
pressure.
How the readers are made evictable while paused is up to the clients.
Using this API in the `multishard_combining_reader` and implementing it
in the clients will be done in the next patches.
Provide default implementations for the new virtual methods to
facilitate gradual adoption, as sketched below.
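A sketch of what such hooks could look like; the method signatures below
are assumptions for illustration, not the actual API:

```cpp
#include <optional>

class flat_mutation_reader; // Scylla's reader type, forward-declared here

class reader_lifecycle_policy {
public:
    // Pause an inactive shard reader; how the client makes it evictable
    // while paused is up to the client.
    virtual void pause(flat_mutation_reader&& reader);
    // Resume a previously paused reader; a disengaged result means it was
    // evicted in the meantime and has to be re-created.
    virtual std::optional<flat_mutation_reader> try_resume();
};

// Default implementations are provided (sketch) so existing clients keep
// compiling and working unchanged while they adopt pause/resume gradually.
```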
The `read_context`, which handles creating, saving and looking up the
shard readers, has to deal with its `destroy_reader()` method being
called at any time, even before some other method has finished its work.
For example, it is valid to request that a reader be destroyed even
before the context has finished creating it.
This means that state transitions that take time can be interleaved with
another state transition request. To deal with this, the read context
uses `future_` states, states that mark an ongoing state transition.
This allows a state transition request that arrives in the middle of
another state transition to be attached as a continuation to the ongoing
transition and executed after it finishes. This, however, resulted in
complex code that has to handle readers being in all sorts of different
states when the `save_readers()` method is called.
To avoid all this complexity, exploit the fact that `destroy_reader()`
receives a `future<>` as its argument, which resolves when all previous
state transitions have finished. Use a gate to wait for all these
futures to resolve. This way we don't need the transitional states;
instead, in `save_readers()`, we only need to wait for the gate to
close. Thus the number of states `save_readers()` has to consider drops
drastically.
This has the theoretical drawback that saving the readers has to wait
for each of them to stop, but in practice the process finishes when the
last reader is saved anyway, so I don't expect this to result in any
slowdown.
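A sketch of the gate-based approach under these assumptions
(`do_save_readers()` is a hypothetical helper):

```cpp
#include <seastar/core/gate.hh>
#include <seastar/core/future.hh>

class read_context {
    seastar::gate _dismantling_gate;
public:
    // `stopped` resolves once all previous state transitions of this
    // reader have finished, so holding the gate open until then replaces
    // the transitional `future_*` states.
    void destroy_reader(seastar::future<> stopped) {
        (void)seastar::with_gate(_dismantling_gate,
                [stopped = std::move(stopped)] () mutable {
            return std::move(stopped);
        });
    }

    seastar::future<> save_readers() {
        // Instead of handling every in-flight transition state, just wait
        // for the gate to close.
        return _dismantling_gate.close().then([this] {
            return do_save_readers(); // hypothetical helper
        });
    }

private:
    seastar::future<> do_save_readers();
};
```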
It doesn't make sense for the multishard reader anyway, as it's only
used by the row-cache. We are about to introduce the pausing of inactive
shard readers, and it would require complex data structures and code to
maintain support for this feature that isn't even used. So drop it.
Currently, when stopping a reader fails, no attempt is made to save it,
and it is left in the `_readers` array as-is. This can lead to an
assertion failure, as the reader's state will contain futures that were
already waited on and that the cleanup code will attempt to wait on
again. To prevent this, when stopping a reader fails, reset it to
`inexistent_state` so that the cleanup code doesn't attempt to do
anything with it, as sketched below.
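A minimal sketch of the failure path, with illustrative names:

```cpp
// Clear the slot in `_readers` on a failed stop so the cleanup code
// won't re-wait futures that were already waited on.
return stop_reader(shard).handle_exception([this, shard] (std::exception_ptr) {
    ++_stats.failed_reader_stops;
    _readers[shard] = reader_meta{}; // back to inexistent_state
});
```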
Refs: #3830
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <a1afc1d3d74f196b772e6c218999c57c15ca05be.1539088164.git.bdenes@scylladb.com>
Add tracing for the following events:
1) Dismantling of the combined buffer.
2) Dismantling of the compaction state.
3) Cleaning up the readers.
(1) and (2) can have adverse effects on the performance of the query, so
it is important that details about the dismantled fragments are exposed
in the tracing data.
(3) is less critical, but it is still good to know how many readers were
created by the read (in case they aren't saved). Since normally (in
stateful queries) this will always be 0, only trace it when it is
non-zero (and thus interesting).
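A sketch of the kind of trace points this adds; the message wording and
variable names are illustrative:

```cpp
tracing::trace(trace_state,
        "Dismantled combined buffer: {} partitions, {} fragments, {} bytes",
        stats.partitions, stats.fragments, stats.bytes);
if (readers_cleaned_up) { // only traced when non-zero
    tracing::trace(trace_state, "Cleaned up {} shard readers", readers_cleaned_up);
}
```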
Currently the reader cleanup code, which ensures that the readers and
their dependent objects are destroyed in the correct order and in a
single smp::submit_to() message, is only run when an attempt is made to
save the readers. However, proper cleanup is needed not only then, but
also when the query is not stateful. Rename the current `cleanup()`
method to `stop()`, make it public, and call it from a `finally()` block
after the page is finalized to ensure readers are properly cleaned up at
all times.
Also make sure that failures in `stop()` are never propagated so that
a failure in the cleanup doesn't fail the read itself.
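A sketch of the control flow, assuming a hypothetical `do_query_page()`
and the logger name `dblog`:

```cpp
// stop() always runs, and its failures are swallowed so a cleanup error
// cannot fail the read itself.
return do_query_page(ctx).finally([ctx] {
    return ctx->stop().handle_exception([] (std::exception_ptr e) {
        dblog.warn("Failed to stop read context: {}", e);
    });
});
```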
Failing to create a reader (`do_make_remote_reader()`) can lead to a
deadlock if the reader is in any of the `future_*_state` states, as the
`then()` block is not executed and hence the promise of the first future
in the chain is never set. Avoid this by changing the `then()` to a
`then_wrapped()` and using `set_exception()` and `set_value()`
accordingly, so that the future is resolved on both the happy and the
error path.
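A sketch of the fix, with illustrative names (`reader_ptr`, `pr`):

```cpp
do_make_remote_reader(shard).then_wrapped([pr = std::move(pr)] (future<reader_ptr> f) mutable {
    if (f.failed()) {
        // Previously a failure here left `pr` unset, deadlocking anyone
        // whose future_*_state was chained on it.
        pr.set_exception(f.get_exception());
    } else {
        pr.set_value(f.get0());
    }
});
```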
Add badness counters that allow tracking problems. The following
counters are added:
1) multishard_query_unpopped_fragments
2) multishard_query_unpopped_bytes
3) multishard_query_failed_reader_stops
4) multishard_query_failed_reader_saves
The first pair of counters observes the amount of work range-scan
queries have to undo on each page. It is normal for these counters to be
non-zero, but sudden spikes in their values can indicate problems.
This undoing of work is needed for stateful range scans to work.
When stateful queries are enabled, the `multishard_combining_reader` is
dismantled and all unconsumed fragments in its own buffer and in the
buffers of any of its intermediate readers are pushed back into the
originating shard reader's buffer (via `unpop_mutation_fragment()`).
This also includes the `partition_start`, the `static_row` (if there is
one) and all extracted and active `range_tombstone` fragments. Together
these can add up to a substantial number of fragments.
(1) counts the number of fragments moved back, while (2) counts the
number of bytes. Monitoring quantity and size separately allows for
detecting edge cases like moving many small fragments or just a few huge
ones. The counters track the fragments/bytes moved back to readers
located on the shard they belong to.
The second pair of counters is added to detect problems around saving
readers. Since a failure to save a reader will not fail the read itself,
visibility into these failures has to be provided by other means.
(3) counts the number of times stopping a shard reader (waiting
on pending read-aheads and next-partitions) failed while (4)
counts the number of times inserting the reader into the `querier_cache`
failed.
Unlike the first two counters, which will almost certainly never be
zero, these latter two counters should always be zero. Any other value
indicates problems on the respective shards/nodes.
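For illustration, such counters would typically be registered with the
seastar metrics API roughly like this (the group name and stats member
names are assumptions):

```cpp
namespace sm = seastar::metrics;

// make_counter in current seastar; older code used make_derive.
_metrics.add_group("database", {
    sm::make_counter("multishard_query_unpopped_fragments", _stats.unpopped_fragments,
            sm::description("Fragments pushed back to the shard readers they came from")),
    sm::make_counter("multishard_query_unpopped_bytes", _stats.unpopped_bytes,
            sm::description("Bytes worth of fragments pushed back to shard readers")),
    sm::make_counter("multishard_query_failed_reader_stops", _stats.failed_reader_stops,
            sm::description("Failed shard reader stops; should always be zero")),
    sm::make_counter("multishard_query_failed_reader_saves", _stats.failed_reader_saves,
            sm::description("Failed shard reader saves; should always be zero")),
});
```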
This method allows querying a range or a set of ranges on all shards of
the node. Under the hood it uses the `multishard_combining_reader` to
execute the query.
It supports paging and stateful queries (saving and reusing the readers
between pages). All this is transparent to the client, who only needs to
supply the same `query::read_command::query_uuid` across the pages of
the query (and supply correct start positions on each page, matching the
stop position of the previous page).
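A hypothetical usage sketch; `query_on_all_shards` and
`set_start_position` are illustrative stand-ins for the actual method
and paging logic:

```cpp
auto cmd = make_lw_shared<query::read_command>(/* ... */);
cmd->query_uuid = utils::make_random_uuid(); // reuse the same UUID on every page

return db.query_on_all_shards(schema, *cmd, ranges).then([=] (auto page1) {
    // Next page: same query_uuid, start position = stop position of
    // page1, so the saved shard readers can be looked up and resumed.
    set_start_position(*cmd, page1); // hypothetical helper
    return db.query_on_all_shards(schema, *cmd, ranges);
});
```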