scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 13:37:04 +00:00

Author	SHA1	Message	Date
Amnon Heiman	ea8d52b11c	row_locking: change estimated histogram with time_estimated_histogram This patch changes the row locking latencies to use time_estimated_histogram. The change consists of changing the histogram definition and changing how values are inserted into the histogram. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-07-14 11:17:43 +03:00
Benny Halevy	d4615f4293	sstables: sstable_version_types: implement operator<=> Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200707061715.578604-1-bhalevy@scylladb.com>	2020-07-08 14:23:11 +03:00
Avi Kivity	b0698dfb38	Merge 'Rewrite CQL3 restriction representation' from dekimir " This is the first stage of replacing the existing restrictions code with a new representation. It adds a new class `expression` to replace the existing class `restriction`. Lots of the old code is deleted, though not all -- that will come in subsequent stages. Tests: unit (dev, debug restrictions_test), dtest (next-gating) " * dekimir-restrictions-rewrite: cql3/restrictions: Drop dead code cql3/restrictions: Use free functions instead of methods cql3/restrictions: Create expression objects cql3/restrictions: Add free functions over new classes cql3/restrictions: Add new representation	2020-07-08 10:22:17 +03:00
Dejan Mircevski	37ebe521e3	cql3/restrictions: Use free functions instead of methods Instead of `restriction` class methods, use the new free functions. Specific replacement actions are listed below. Note that class `restrictions` (plural) remains intact -- both its methods and its type hierarchy remain intact for now. Ensure full test coverage of the replacement code with new file test/boost/restrictions_test.cc and some extra testcases in test/cql/*. Drop some existing tests because they codify buggy behaviour (reference #6369, #6382). Drop others because they forbid relation combinations that are now allowed (eg, mixing equality and inequality, comparing to NULL, etc.). Here are some specific categories of what was replaced: - restriction::is_foo predicates are replaced by using the free function find_if; sometimes it is used transitively (see, eg, has_slice) - restriction::is_multi_column is replaced by dynamic casts (recall that the `restrictions` class hierarchy still exists) - utility methods is_satisfied_by, is_supported_by, to_string, and uses_function are replaced by eponymous free functions; note that restrictions::uses_function still exists - restriction::apply_to is replaced by free function replace_column_def - when checking infinite_bound_range_deletions, the has_bound is replaced by local free function bounded_ck - restriction::bounds and restriction::value are replaced by the more general free function possible_lhs_values - using free functions allows us to simplify the multi_column_restriction and token_restriction hierarchies; their methods merge_with and uses_function became identical in all subclasses, so they were moved to the base class - single_column_primary_key_restrictions<clustering_key>::needs_filtering was changed to reuse num_prefix_columns_that_need_not_be_filtered, which uses free functions Fixes #5799. Fixes #6369. Fixes #6371. Fixes #6372. Fixes #6382. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-07 23:08:09 +02:00
Avi Kivity	4c221855a1	Merge 'hinted handoff: fix commitlog memory leak' from Piotr D " When commitlog is recreated in hints manager, only shutdown() method is called, but not release(). Because of that, some internal commitlog objects (`segment_manager` and `segment`s) may be left pointing to each other through shared_ptr reference cycles, which may result in memory leak when the parent commitlog object is destroyed. This PR prevents memory leaks that may happen this way by calling release() after shutdown() from the hints manager. Fixes: #6409, Fixes #6776 " * piodul-fix-commitlog-memory-leak-in-hinted-handoff: hinted handoff: disable warnings about segments left on disk hinted handoff: release memory on commitlog termination	2020-07-07 21:36:14 +03:00
Piotr Dulikowski	b955793088	hinted handoff: disable warnings about segments left on disk When a mutation is written to the commitlog, a rp_handle object is returned which keeps a reference to commitlog segment. A segment is "dirty" when its reference count is not zero, otherwise it is "clean". When commitlog object is being destroyed, a warning is being printed for every dirty segment. On the other hand, clean segments are deleted. In case of standard mutation writing path, the rp_handle moves responsibility for releasing the reference to the memtable to which the mutation is written. When the memtable is flushed to disk, all references accumulated in the memtable are released. In this context, it makes sense to warn about dirty segments, because such segments contain mutations that are not written to sstables, and need to be replayed. However, hinted handoff uses a different workflow - it recreates a commitlog object periodically. When a hint is written to commitlog, the rp_handle reference is not released, so that segments with hints are not deleted when destroying the commitlog. When commitlog is created again, we get a list of saved segments with hints that we can try to send at a later time. Although this is intended behavior, now that releasing the hints commitlog is done properly, it causes the mentioned warning to periodically appear in the logs. This patch adds a parameter for the commitlog that allows to disable this warning. It is only used when creating hinted handoff commitlogs.	2020-07-07 19:40:42 +02:00
Piotr Dulikowski	002e6c4056	hinted handoff: release memory on commitlog termination When commitlog is recreated in hints manager, only shutdown() method is called, but not release(). Because of that, some internal commitlog objects (`segment_manager` and `segment`s) may be left pointing to each other through shared_ptr reference cycles, which may result in memory leak when the parent commitlog object is destroyed. This commit prevents memory leaks that may happen this way by calling release() after shutdown() from the hints manager. Fixes: #6409, #6776	2020-07-07 19:40:32 +02:00
Botond Dénes	5ebe2c28d1	db/view: view_update_generator: re-balance wait/signal on the register semaphore The view update generator has a semaphore to limit concurrency. This semaphore is waited on in `register_staging_sstable()` and later the unit is returned after the sstable is processed in the loop inside `start()`. This was broken by `4e64002`, which changed the loop inside `start()` to process sstables in per table batches, however didn't change the `signal()` call to return the amount of units according to the number of sstables processed. This can cause the semaphore units to dry up, as the loop can process multiple sstables per table but return just a single unit. This can also block callers of `register_staging_sstable()` indefinitely as some waiters will never be released as under the right circumstances the units on the semaphore can permanently go below 0. In addition to this, `4e64002` introduced another bug: table entries from the `_sstables_with_tables` are never removed, so they are processed every turn. If the sstable list is empty, there won't be any update generated but due to the unconditional `signal()` described above, this can cause the units on the semaphore to grow to infinity, allowing future staging sstables producers to register a huge amount of sstables, causing memory problems due to the amount of sstable readers that have to be opened (#6603, #6707). Both outcomes are equally bad. This patch fixes both issues and modifies the `test_view_update_generator` unit test to reproduce them and hence to verify that this doesn't happen in the future. Fixes: #6774 Refs: #6707 Refs: #6603 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200706135108.116134-1-bdenes@scylladb.com>	2020-07-07 08:53:00 +02:00
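The unit accounting described in the commit above can be illustrated with a toy model. This is a minimal Python sketch, not Scylla's C++ code: `UnitCounter` and `run` are hypothetical names, and the counter only tracks the unit balance instead of blocking the way a real semaphore would.

```python
class UnitCounter:
    """Toy stand-in for a semaphore: tracks available units, never blocks."""

    def __init__(self, units):
        self.units = units

    def wait(self, n=1):
        self.units -= n

    def signal(self, n=1):
        self.units += n


def run(batches, rebalanced, initial_units=5):
    """Register every sstable in each batch, then process the batch.

    rebalanced=False models the imbalance from the commit message:
    one unit is returned per batch, regardless of the batch size.
    rebalanced=True returns one unit per processed sstable.
    """
    sem = UnitCounter(initial_units)
    for batch in batches:
        for _ in batch:
            sem.wait()  # one unit taken per registered sstable
        if rebalanced:
            sem.signal(len(batch))  # return exactly what was taken
        else:
            sem.signal(1)  # unconditional single unit
    return sem.units


# Three sstables in one batch: two units are never returned.
print(run([["a", "b", "c"]], rebalanced=False))
# Empty batches processed every turn: units keep growing.
print(run([[], [], []], rebalanced=False))
# Balanced accounting ends where it started.
print(run([["a", "b", "c"], []], rebalanced=True))
```

In the unbalanced variant the final count drifts in either direction depending on batch sizes, which corresponds to the two outcomes the commit message describes.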
Wojciech Mitros	76038b8d8e	view: differentiate identical error messages and change them to warnings Modified log message in view_builder::calculate_shard_build_step to make it distinct from the one in view_builder::execute, changed their logging level to warning, since we're continuing even if we handle an exception. Fixes #4600	2020-07-06 20:50:34 +03:00
Botond Dénes	62c6859b69	db/view: view_update_generator: use partitioned sstable set And pass it to `make_range_sstable_reader()` when creating the reader, thus allowing the incremental selector created therein to exploit the fact that staging sstables are disjoint (in the case of repair and streaming at least). This should reduce the memory consumption of the staging reader considerably when reading from a lot of sstables.	2020-07-06 13:38:23 +03:00
Piotr Sarna	4cb79f04b0	treewide: replace libjsoncpp usage with rjson In order to eventually switch to a single JSON library, most of the libjsoncpp usage is dropped in favor of rjson. Unfortunately, one usage still remains: test/utils/test_repl utility heavily depends on the exact textual format of its output JSON files, so replacing a library results in all tests failing because of differences in formatting. It is possible to force rjson to print its documents in the exact matching format, but that's left for later, since the issue is not critical. It would be nice though if our test suite compared JSON documents with a real JSON parser, since there are more differences - e.g. libjsoncpp keeps children of the object sorted, while rapidjson uses an unordered data structure. This change should cause no change in semantics, it strives just to replace all usage of libjsoncpp with rjson.	2020-07-03 10:27:23 +02:00
Pavel Emelyanov	f045cec586	snap: Get rid of storage_service reference in schema.cc Now when the snapshot stopping is correctly handled, we may pull the database reference all the way down to the schema::describe(). One tricky place is in table::snapshot() -- the local db reference is pulled through an smp::submit_to call, but thanks to the shard checks in the place where it is needed the db is still "local". Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 20:28:25 +03:00
Pavel Emelyanov	9211df2cdf	snapshot: Make check_snapshot_not_exist a method Sanitation. It now can access the this->_db pointer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 20:26:15 +03:00
Pavel Emelyanov	ba47ef0397	snapshots: Move ops gate from storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 20:17:21 +03:00
Pavel Emelyanov	e439873319	snapshot: Move lock from storage_service For this de-static run_snapshot_*_operation (because we no longer have the static global to get the lock from) and make the snapshot_ctl be peering_sharded_service to call invoke_on. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 20:17:19 +03:00
Pavel Emelyanov	d674baacef	snapshot: Move all code into db::snapshot_ctl class This includes - rename namespace in snapshot-ctl.[cc|hh] - move methods from storage_service to snapshot_ctl - move snapshot_details struct - temporarily make storage_service._snapshot_lock and ._snapshot_ops public - replace two get_local_storage_service() occurrences with this._db The latter is not 100% clear as the code that does this references "this" from another shard, but the _db in question is the distributed object, so they are all the same on all instances. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 19:59:53 +03:00
Pavel Emelyanov	8d36607044	storage_service: Move all snapshot code into snapshot-ctl.cc This is plain move, no other modifications are made, even the "service" namespace is kept, only few broken indentation fixes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 19:54:15 +03:00
Pavel Emelyanov	d989d9c1c7	snapshots: Initial skeleton A placeholder for snapshotting code that will be moved into it from the storage_service. Also -- pass it through the API for future use. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 19:54:14 +03:00
Rafael Ávila de Espíndola	67c22c8697	commitlog::read_log_file: Don't discard a future This makes the code a bit easier to read as there are no discarded futures and no references to having to keep a subscription alive, which we don't with current seastar. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200527013120.179763-1-espindola@scylladb.com>	2020-06-24 17:22:29 +03:00
Rafael Ávila de Espíndola	64c8164e6c	everywhere: Update to seastar api v4 (when_all_succeed returning a tuple) We now just need to replace a few calls to then with then_unpack. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200618172100.111147-1-espindola@scylladb.com>	2020-06-23 19:40:18 +03:00
Avi Kivity	de38091827	priority_manager: merge streaming_read and streaming_write classes into one class Streaming is handled by just one group for CPU scheduling, so separating it into read and write classes for I/O is artificial, and inflates the resources we allow for streaming if both reads and writes happen at the same time. Merge both classes into one class ("streaming") and adjust callers. The merged class has 200 shares, so it reduces streaming bandwidth if both directions are active at the same time (which is rare; I think it only happens in view building).	2020-06-22 15:09:04 +03:00
Avi Kivity	7351db7cab	Merge "Reshape upload files and reshard+reshape at boot" from Glauber " This patchset adds a reshape operation to each compaction strategy; that is a strategy-specific way of detecting if SSTables are in-strategy or off-strategy, and in case they are offstrategy moving them to in-strategy. Often times the number of SSTables in a particular slice of the sstable set matters for that decision (number of SSTables in the same time window for TWCS, number of SSTables per tier for STCS, number of L0 SSTables for LCS). We want to be more lenient for operations that keep the node offline, like reshape at boot, but more forgiving for operations like upload, which run in maintenance mode. To accommodate for that the threshold for considering a slice of the SSTable set offstrategy is passed as a parameter. Once this patchset is applied, the upload directory will reshape the SSTables before moving them to the main directory (if needed). One side effect of it is that it is no longer necessary to take locks for the refresh operation nor disable writes in the table. With the infrastructure that we have built in the upload directory, we can apply the same set of steps to populate_column_family. Using the sstable_directory to scan the files we can reshard and reshape (usually if we resharded a reshape will be necessary) with the node still offline. This has the benefit of never adding shared SSTables to the table. Applying this patchset will unlock a host of cleanups: - we can get rid of all testing for shared sstables, sstable_need_rewrite, etc. - we can remove the resharding backlog tracker. and many others. Most cleanups are deferred for a later patchset, though.
" * 'reshard-reshape-v4' of github.com:glommer/scylla: distributed_loader: reshard before the node is made online distributed_loader: rework uploading of SSTables sstable_directory: add helper to reshape existing unshared sstables compaction_strategy: add method to reshape SSTables compaction: add a new compaction type, Reshape compaction: add a size and throught pretty printer. compaction: add default implementation for some pure functions tests: fix fragile database tests distributed_loader.cc: add a helper function to extract the highest SSTable version found distributed_loader.cc : extract highest_generation_seen code compaction_manager: rename run_resharding_job distributed_loader: assume populate_column_families is run in shard 0 api: do not allow user to meddle with auto compaction too early upload: use custom error handler for upload directory sstable_directory: fix debug message	2020-06-18 17:04:53 +03:00
Glauber Costa	e40aa042a7	distributed_loader: reshard before the node is made online This patch moves the resharding process to use the new directory_with_sstables_handler infrastructure. There is no longer a clear reshard step, and that just becomes a natural part of populate_column_family. In main.cc, a couple of changes are necessary to make that happen. The first one obviously is to stop calling reshard. We also need to make sure that: - The compaction manager is started much earlier, so we can register resharding jobs with it. - auto compactions are disabled in the populate method, so resharding doesn't have to fight for bandwidth with auto compactions. Now that we are resharding through the sstable_directory, the old resharding code can be deleted. There is also no need to deal with the resharding backlog either, because the SSTables are not yet added to the sstable set at this point. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:37:18 -04:00
Rafael Ávila de Espíndola	f6e407ecd2	everywhere: Prepare for seastar api v4 (when_all_succeed return value) The seastar api v4 changes the return type of when_all_succeed. This patch adds discard_result when that is the best solution to handle the change. This doesn't do the actual update to v4 since there are still a few issues left to fix in seastar. A patch doing just the update will follow. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200617233150.918110-1-espindola@scylladb.com>	2020-06-18 15:13:56 +03:00
Piotr Dulikowski	e5b2218ad4	hinted handoff: use bool instead of send_state_set After restart_segment was removed from send_state enum, send_state_set now has only one possible element: segment_replay_failed. This patch removes send_state_set and uses bool in its place instead.	2020-06-12 16:10:20 +02:00
Piotr Dulikowski	6b34bb1a43	hinted handoff: update replay position on commitlog failure Hints manager uses commitlog framework to store and replay hints. The commitlog::read_log_file function is used for replaying hints. It reads commitlog entries and passes them to a callback. In case of hints manager, the callback calls manager::send_one_hint function. In case something goes wrong during this process, sending of that file is attempted again later. If the error was caused by hints that failed to be sent (e.g. due to network error), then we also advance _last_not_complete_rp field to the position of the first hint that failed. In the next retry, we will start reading from the commitlog from that position. However, current logic does not account for the case when an error occurs in the commitlog::read_log_file function itself. If, coincidentally, all hints sent by send_one_hint succeed, then we won't advance the _last_not_complete_rp field and we may unnecessarily repeat sending some of the hints that succeeded. This patch adds the send_one_file_ctx::last_sent_rp field, which keeps track of the last commitlog position for which a hint was attempted to be sent. In case read_log_file throws an error but all send_one_hint calls succeed, then it will be used to update _last_not_complete_rp. This will reduce the amount of hints that are resent in this case to only one. Tests: - unit(dev) - dtest(hintedhandoff_additional_test, dev)	2020-06-12 16:10:20 +02:00
Piotr Dulikowski	d369b538f0	hinted handoff: remove rps_set, use first_failed_rp instead When sending hints from one file, rps_set is used to keep track of positions of hints that are currently sent. If sending of a hint fails, its position is not removed from rps_set. If some hints fail to be sent while handling a hints file, the lowest position from rps_set is used to calculate the position from where to start when sending of the file is retried. Keeping track of commitlog positions this way isn't necessary to calculate this position. This patch removes rps_set and replaces it with first_failed_rp - which is just a single std::optional<db::replay_position>. This value is updated when a hint send failure is detected. This simplifies calculation of starting position for the next retry, and allowed to remove some error handling logic related to an edge case when inserting to rps_set fails. - unit(dev) - dtest(hintedhandoff_additional_test, dev)	2020-06-12 16:10:19 +02:00
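The bookkeeping change in the commit above, a single optional position instead of a set of in-flight positions, can be sketched as follows. This is a simplified Python illustration under stated assumptions: replay positions are modeled as plain integers, sending is synchronous, and `send_hints` and `send` are hypothetical names rather than the hints manager's actual interface.

```python
from typing import Callable, Iterable, Optional


def send_hints(positions: Iterable[int],
               send: Callable[[int], bool]) -> Optional[int]:
    """Try to send every hint; return the lowest position that failed.

    Returns None when all hints were sent, meaning a retry has
    nothing left to resend from this file.
    """
    first_failed_rp: Optional[int] = None
    for rp in positions:
        if not send(rp):
            # Only the lowest failed position matters for the retry,
            # so a single optional value replaces a set of positions.
            if first_failed_rp is None or rp < first_failed_rp:
                first_failed_rp = rp
    return first_failed_rp


# Hints at positions 20 and 30 fail; a retry would restart from 20.
restart_from = send_hints([10, 20, 30, 40], lambda rp: rp not in {20, 30})
print(restart_from)
```

Because only one value is updated, there is no container insertion that could itself fail, which is the edge case the commit message says could be dropped.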
Rafael Ávila de Espíndola	555d8fe520	build: Be consistent about system versus regular headers We were not consistent about using '#include "foo.hh"' instead of '#include <foo.hh>' for scylla's own headers. This patch fixes that inconsistency and, to enforce it, changes the build to use -iquote instead of -I to find those headers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200608214208.110216-1-espindola@scylladb.com>	2020-06-10 15:49:51 +03:00
Piotr Sarna	3458bd2e32	db,view: fix outdated comments Some comments still referred to variable names which are no longer up-to-date. Follow-up for #6560. Message-Id: <2b857ccc900dd64f0d9379f5d6c87fd3aaa5d902.1591594042.git.sarna@scylladb.com>	2020-06-08 09:02:10 +03:00
Nadav Har'El	d6626c217a	merge: add error injection to mv Merged pull request https://github.com/scylladb/scylla/pull/6516 from Piotr Sarna: This series adds error injection points to materialized view paths: view update generation from staging sstables; view building; generating view updates from user writes. This series comes with a corresponding dtest pull request which adds some test cases based on error injection. Fixes #6488	2020-06-07 19:23:23 +03:00
Piotr Sarna	b3a6a33487	db,view: ensure that local updates are applied locally In current mutate_MV() code it's possible for a local endpoint to become a target for a network operation. That's the source of occasional `broken promise` benign error messages appearing, since the mutation is actually applied locally, so there's no point in creating a write response handler - the node will not send a response to itself via network. While at it, the code is deduplicated a little bit - with the paths simplified, it's easier to ensure that a local endpoint is never listed as a target for remote network operations. Fixes #5459 Tests: unit(dev), dtest(materialized_views_test.TestMaterializedViews.add_dc_during_mv_insert_test)	2020-06-07 19:10:03 +03:00
Kamil Braun	d89b7a0548	cdc: rename CDC description tables Commit `968177da04` has changed the schema of cdc_topology_description and cdc_description tables in the system_distributed keyspace. Unfortunately this was a backwards-incompatible change: these tables would always be created, irrespective of whether or not "experimental" was enabled. They just wouldn't be populated with experimental=off. If the user now tries to upgrade Scylla from a version before this change to a version after this change, it will work as long as CDC is protected by the experimental flag and the flag is off. However, if we drop the flag, or if the user turns experimental on, weird things will happen, such as nodes refusing to start because they try to populate cdc_topology_description while assuming a different schema for this table. The simplest fix for this problem is to rename the tables. This fix must get merged in before CDC goes out of experimental. If the user upgrades his cluster from a pre-rename version, he will simply have two garbage tables that he is free to delete after upgrading. sstables and digests need to be regenerated for schema_digest_test since this commit effectively adds new tables to the system_distributed keyspace. This doesn't result in schema disagreement because the table is announced to all nodes through the migration manager.	2020-06-05 09:59:16 +02:00
Piotr Sarna	76e89efc1a	db,view: add error injection points to view building ... in order to be able to test scenarios with failures.	2020-06-05 09:39:58 +02:00
Piotr Sarna	9d524a7a7e	db,view: add error injection points to view update generator ... in order to be able to test scenarios with failures.	2020-06-05 09:39:58 +02:00
Avi Kivity	a4c44cab88	treewide: update concepts language from the Concepts TS to C++20 Seastar recently lost support for the experimental Concepts Technical Specification (TS) and gained support for C++20 concepts. Re-enable concepts in Scylla by updating our use of concepts to the C++20 standard. This change: - peels off uses of the GCC6_CONCEPT macro - removes inclusions of <seastar/gcc6-concepts.hh> - replaces function-style concepts (no longer supported) with equation-style concepts - semicolons added and removed as needed - deprecated std::is_pod replaced by recommended replacement - updates return type constraints to use concepts instead of type names (either std::same_as or std::convertible_to, with std::same_as chosen when possible) No attempt is made to improve the concepts; this is a specification update only. Message-Id: <20200531110254.2555854-1-avi@scylladb.com>	2020-06-02 09:12:21 +03:00
Pavel Emelyanov	67d5fad65f	storage_service: Remove some inclusions of its header GC pass over .cc files. Some really do not need it, some need for features/gossiper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-01 09:08:40 +03:00
Pavel Emelyanov	ee31191e21	storage_service: Move get_generation_number to util/ This is purely utility helper routine. As a nice side effect the inclusion of storage_service.hh is removed from several unrelated places. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-01 09:08:40 +03:00
Avi Kivity	0c6bbc84cd	Merge "Classify queries based on their initiator, rather than their target" from Botond " Currently we classify queries as "system" or "user" based on the table they target. The class of a query determines how the query is treated, currently: timeout, limits for reverse queries and the concurrency semaphore. The catch is that users are also allowed to query system tables and when doing so they will bypass the limits intended for user queries. This has caused performance problems in the past, yet the reason we decided to finally address this is that we want to introduce a memory limit for unpaged queries. Internal (system) queries are all unpaged and we don't want to impose the same limit on them. This series uses scheduling groups to distinguish user and system workloads, based on the assumption that user workloads will run in the statement scheduling group, while system workloads will run in the main (or default) scheduling group, or perhaps something else, but in any case not in the statement one. Currently the scheduling group of reads and writes is lost when going through the messaging service, so to be able to use scheduling groups to distinguish user and system reads this series refactors the messaging service to retain this distinction across verb calls. Furthermore, we execute some system reads/writes as part of user reads/writes, such as auth and schema sync. These processes are tagged to run in the main group. This series also centralises query classification on the replica and moves it to a higher level. More specifically, queries are now classified -- the scheduling group they run in is translated to the appropriate query class specific configuration -- on the database level and the configuration is propagated down to the lower layers. Currently this query class specific configuration consists of the reader concurrency semaphore and the max memory limit for otherwise unlimited queries. 
A corollary of the semaphore being selected on the database level is that the read permit is now created before the read starts. A valid permit is now available during all stages of the read, enabling tracking the memory consumption of e.g. the memtable and cache readers. This change aligns nicely with the needs of more accurate reader memory tracking, which also wants a valid permit that is available in every layer. The series can be divided roughly into the following distinct patch groups: * 01-02: Give system read concurrency a boost during startup. * 03-06: Introduce user/system statement isolation to messaging service. * 07-13: Various infrastructure changes to prepare for using read permits in all stages of reads. * 14-19: Propagate the semaphore and the permit from database to the various table methods that currently create the permit. * 20-23: Migrate away from using the reader concurrency semaphore for waiting for admission, use the permit instead. * 24: Introduce `database::make_query_config()` and switch the database methods needing such a config to use it. * 25-31: Get rid of all uses of `no_reader_permit()`. * 32-33: Ban empty permits for good. * 34: querier_cache: use the queriers' permits to obtain the semaphore. Fixes: #5919 Tests: unit(dev, release, debug), dtest(bootstrap_test.py:TestBootstrap.start_stop_test_node), manual testing with a 2 node mixed cluster with extra logging.
" * 'query-class/v6' of https://github.com/denesb/scylla: (34 commits) querier_cache: get semaphore from querier reader_permit: forbid empty permits reader_permit: fix reader_resources::operator bool treewide: remove all uses of no_reader_permit() database: make_multishard_streaming_reader: pass valid permit to multi range reader sstables: pass valid permits to all internal reads compaction: pass a valid permit to sstable reads database: add compaction read concurrency semaphore view: use valid permits for reads from the base table database: use valid permit for counter read-before-write database: introduce make_query_class_config() reader_concurrency_semaphore: remove wait_admission and consume_resources() test: move away from reader_concurrency_semaphore::wait_admission() reader_permit: resource_units: introduce add() mutation_reader: restricted_reader: work in terms of reader_permit row_cache: pass a valid permit to underlying read memtable: pass a valid permit to the delegate reader table: require a valid permit to be passed to most read methods multishard_mutation_query: pass a valid permit to shard mutation sources querier: add reader_permit parameter and forward it to the mutation_source ...	2020-05-29 10:11:44 +03:00
Piotr Sarna	77e943e9a3	db,views: unify time points used for update generation Until now, view updates were generated with a bunch of random time points, because the interface was not adjusted for passing a single time point. The time points were used to determine whether cells were alive (e.g. because of TTL), so it's better to unify the process: 1. when generating view updates from user writes, a single time point is used for the whole operation 2. when generating view updates via the view building process, a single time point is used for each build step NOTE: I don't see any reliable and deterministic way of writing test scenarios which trigger problems with the old code. After #6488 is resolved and error injection is integrated into view.cc, tests can be added. Fixes #6429 Tests: unit(dev) Message-Id: <f864e965eb2e27ffc13d50359ad1e228894f7121.1590070130.git.sarna@scylladb.com>	2020-05-28 12:56:09 +03:00
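Why a single time point matters for TTL-based liveness can be shown with a small model. This is a hedged Python sketch, not the view update code: cells are modeled as a mapping from column name to an integer expiry time, and `live_cells` and `view_delta` are hypothetical helpers.

```python
def live_cells(cells, now):
    """Names of cells whose expiry time is still in the future at `now`."""
    return {name for name, expiry in cells.items() if expiry > now}


def view_delta(old_cells, new_cells, now):
    """Compare old and new row state using ONE time point for both sides."""
    before = live_cells(old_cells, now)
    after = live_cells(new_cells, now)
    return {"removed": before - after, "added": after - before}


row = {"v": 100}  # cell "v" expires at time 100

# With one shared time point, an unchanged row never yields a delta.
print(view_delta(row, row, now=99))
print(view_delta(row, row, now=100))

# With two different time points the same cell looks alive, then dead.
print(live_cells(row, 99) == live_cells(row, 100))
```

The last line prints `False`: evaluating the two sides of one operation at different times can make an unchanged cell appear to have expired midway, which is the kind of inconsistency a unified time point avoids.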
Botond Dénes	734e995639	database: add compaction read concurrency semaphore All reads will soon require a valid permit, including those done during compaction. To allow creating valid permits for these reads create a compaction specific semaphore. This semaphore is unlimited as compaction concurrency is managed by higher level layer, we use it just for resource usage accounting.	2020-05-28 11:34:35 +03:00
Botond Dénes	992e697dd5	view: use valid permits for reads from the base table View update generation involves reading existing values from the base table, which will soon require a valid permit to be passed to it, so make sure we create and pass a valid permit to these reads. We use `database::make_query_class_config()` to obtain the semaphore for the read which selects the appropriate user/system semaphore based on the scheduling group the base table write is running in.	2020-05-28 11:34:35 +03:00
Botond Dénes	e4c591aa67	database: introduce make_query_class_config() And use it to obtain any query-class specific configuration that was obtained from `table::config` before, such as the read concurrency semaphore and the max memory limit for unlimited queries. As all users of these items get these from the query class config now, we can remove them from `table::config`.	2020-05-28 11:34:35 +03:00
Botond Dénes	cc5137ffe3	table: require a valid permit to be passed to most read methods Now that the most prevalent users (range scan and single partition reads) all pass valid permits we require all users to do so and propagate the permit down towards `make_sstable_reader()`. The plan is to use this permit for restricting the sstable readers, instead of the semaphore the table is configured with. The various `make_streaming_*reader()` overloads keep using the internal semaphores, but they also create the permit before the read starts and pass it to `make_sstable_reader()`.	2020-05-28 11:34:35 +03:00
Nadav Har'El	c3da9f2bd4	alternator: add mandatory configurable write isolation mode Alternator supports four ways in which write operations can use quorum writes or LWT or both, which we called "write isolation policies". Until this patch, Alternator defaulted to the most generally safe policy, "always_use_lwt". This default could have been overridden for each table separately, but there was no way to change this default for all tables. This patch adds a "--alternator-write-isolation" configuration option which allows changing the default. Moreover, @dorlaor asked that users must explicitly choose this default mode, and not get "always_use_lwt" without noticing. The previous default, "always_use_lwt" supports any workload correctly but because it uses LWT for all writes it may be disappointingly slow for users who run write-only workloads (including most benchmarks) - such users might find the slow writes so disappointing that they will drop Scylla. Conversely, a default of "forbid_rmw" will be faster and still correct, but will fail on workloads which need read-modify-write operations - and surprise users that need these operations. So Dor asked that none of the write modes be made the default, and users must make an informed choice between the different write modes, rather than being disappointed by a default choice they weren't aware of. So after this patch, Scylla refuses to boot if Alternator is enabled but a "--alternator-write-isolation" option is missing. The patch also modifies the relevant documentation, adds the same option to our docker image, and modifies the test-running script test/alternator/run to run Scylla with the old default mode (always_use_lwt), which we need because we want to test RMW operations as well. Fixes #6452 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200524160338.108417-1-nyh@scylladb.com>	2020-05-27 08:40:05 +03:00
Tomasz Grabiec	1424543e11	Merge "Move sstables_format on sstable_manager" from Pavel Emelyanov The format is currently sitting in storage_service, but the previous set patched all the users not to call it, instead they use sstables_manager to get the highest supported format. So this set finalizes this effort and places the format on sstables_manager(s). The set introduces the db::sstables_format_selector, that - starts with the lowest format (ka) - reads one on start from system tables - subscribes on sstables-related features and bumps up the selection if the respective feature is enabled During its lifetime the selector holds a reference to the sharded<database> and updates the format on it, the database, in turn, propagates it further to sstables_managers. The managers start with the highest known format (mc) which is done for tests. * https://github.com/xemul/scylla br-move-sstables-format-4: storage_service: Get rid of one-line helpers system_keyspace: Cleanup setup() from storage_service format_selector: Log which format is being selected sstables_manager: Keep format on format_selector: Make it standalone format_selector: Move the code into db/ format_selector: Select format locally storage_service: Introduce format_selector storage_service: Split feature_enabled_listener::on_enabled storage_service: Tossing bits around features: Introduce and use masked features features: Get rid of per-features booleans	2020-05-27 08:40:05 +03:00
Avi Kivity	8d27e1b4a9	Merge 'Propagate tracing to materialized view update path' from Piotr S In order to improve materialized views' debuggability, tracing points are added to view update generation path. Example trace: ``` ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-04-27 13:13:46.834000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2020-04-27 13:13:46.834346 \| 127.0.0.1 \| 1 \| 127.0.0.1 Processing a statement [shard 0] \| 2020-04-27 13:13:46.834426 \| 127.0.0.1 \| 80 \| 127.0.0.1 Creating write handler for token: -3248873570005575792 natural: {127.0.0.1, 127.0.0.3} pending: {} [shard 0] \| 2020-04-27 13:13:46.834494 \| 127.0.0.1 \| 148 \| 127.0.0.1 Creating write handler with live: {127.0.0.3, 127.0.0.1} dead: {} [shard 0] \| 2020-04-27 13:13:46.834507 \| 127.0.0.1 \| 161 \| 127.0.0.1 Sending a mutation to /127.0.0.3 [shard 0] \| 2020-04-27 13:13:46.834519 \| 127.0.0.1 \| 173 \| 127.0.0.1 Executing a mutation locally [shard 0] \| 2020-04-27 13:13:46.834532 \| 127.0.0.1 \| 186 \| 127.0.0.1 View updates for ks.t require read-before-write - base table reader is created [shard 0] \| 2020-04-27 13:13:46.834570 \| 127.0.0.1 \| 224 \| 127.0.0.1 Reading key {{-3248873570005575792, pk{000400000002}}} from sstable /home/sarna/.ccm/scylla-1/node1/data/ks/t-162ef290887811eaa4bf000000000000/mc-1-big-Data.db [shard 0] \| 2020-04-27 13:13:46.834608 \| 127.0.0.1 \| 262 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-162ef290887811eaa4bf000000000000/mc-1-big-Index.db: scheduling bulk DMA read of size 8 at offset 0 [shard 0] \| 2020-04-27 13:13:46.834635 \| 127.0.0.1 \| 289 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-162ef290887811eaa4bf000000000000/mc-1-big-Index.db: finished bulk DMA read of size 8 at offset 0, 
successfully read 8 bytes [shard 0] \| 2020-04-27 13:13:46.834975 \| 127.0.0.1 \| 629 \| 127.0.0.1 Message received from /127.0.0.1 [shard 0] \| 2020-04-27 13:13:46.834988 \| 127.0.0.3 \| 11 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-162ef290887811eaa4bf000000000000/mc-1-big-Data.db: scheduling bulk DMA read of size 41 at offset 0 [shard 0] \| 2020-04-27 13:13:46.835015 \| 127.0.0.1 \| 669 \| 127.0.0.1 View updates for ks.t require read-before-write - base table reader is created [shard 0] \| 2020-04-27 13:13:46.835020 \| 127.0.0.3 \| 44 \| 127.0.0.1 Generated 1 view update mutations [shard 0] \| 2020-04-27 13:13:46.835080 \| 127.0.0.3 \| 104 \| 127.0.0.1 Sending view update for ks.t_v2_idx_index to 127.0.0.2, with pending endpoints = {}; base token = -3248873570005575792; view token = 3728482343045213994 [shard 0] \| 2020-04-27 13:13:46.835095 \| 127.0.0.3 \| 119 \| 127.0.0.1 Sending a mutation to /127.0.0.2 [shard 0] \| 2020-04-27 13:13:46.835105 \| 127.0.0.3 \| 129 \| 127.0.0.1 View updates for ks.t were generated and propagated [shard 0] \| 2020-04-27 13:13:46.835117 \| 127.0.0.3 \| 141 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-162ef290887811eaa4bf000000000000/mc-1-big-Data.db: finished bulk DMA read of size 41 at offset 0, successfully read 41 bytes [shard 0] \| 2020-04-27 13:13:46.835160 \| 127.0.0.1 \| 813 \| 127.0.0.1 Sending mutation_done to /127.0.0.1 [shard 0] \| 2020-04-27 13:13:46.835164 \| 127.0.0.3 \| 188 \| 127.0.0.1 Mutation handling is done [shard 0] \| 2020-04-27 13:13:46.835177 \| 127.0.0.3 \| 201 \| 127.0.0.1 Generated 1 view update mutations [shard 0] \| 2020-04-27 13:13:46.835215 \| 127.0.0.1 \| 869 \| 127.0.0.1 Locally applying view update for ks.t_v2_idx_index; base token = -3248873570005575792; view token = 3728482343045213994 [shard 0] \| 2020-04-27 13:13:46.835226 \| 127.0.0.1 \| 880 \| 127.0.0.1 Successfully applied local view update for 127.0.0.1 and 0 remote endpoints [shard 0] \| 2020-04-27 13:13:46.835253 \| 
127.0.0.1 \| 907 \| 127.0.0.1 View updates for ks.t were generated and propagated [shard 0] \| 2020-04-27 13:13:46.835256 \| 127.0.0.1 \| 910 \| 127.0.0.1 Got a response from /127.0.0.1 [shard 0] \| 2020-04-27 13:13:46.835274 \| 127.0.0.1 \| 928 \| 127.0.0.1 Delay decision due to throttling: do not delay, resuming now [shard 0] \| 2020-04-27 13:13:46.835276 \| 127.0.0.1 \| 930 \| 127.0.0.1 Mutation successfully completed [shard 0] \| 2020-04-27 13:13:46.835279 \| 127.0.0.1 \| 933 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-04-27 13:13:46.835286 \| 127.0.0.1 \| 941 \| 127.0.0.1 Message received from /127.0.0.3 [shard 0] \| 2020-04-27 13:13:46.835331 \| 127.0.0.2 \| 14 \| 127.0.0.1 Sending mutation_done to /127.0.0.3 [shard 0] \| 2020-04-27 13:13:46.835399 \| 127.0.0.2 \| 82 \| 127.0.0.1 Mutation handling is done [shard 0] \| 2020-04-27 13:13:46.835413 \| 127.0.0.2 \| 96 \| 127.0.0.1 Got a response from /127.0.0.2 [shard 0] \| 2020-04-27 13:13:46.835639 \| 127.0.0.3 \| 662 \| 127.0.0.1 Delay decision due to throttling: do not delay, resuming now [shard 0] \| 2020-04-27 13:13:46.835640 \| 127.0.0.3 \| 664 \| 127.0.0.1 Successfully applied view update for 127.0.0.2 and 1 remote endpoints [shard 0] \| 2020-04-27 13:13:46.835649 \| 127.0.0.3 \| 673 \| 127.0.0.1 Got a response from /127.0.0.3 [shard 0] \| 2020-04-27 13:13:46.835841 \| 127.0.0.1 \| 1495 \| 127.0.0.1 Request complete \| 2020-04-27 13:13:46.834944 \| 127.0.0.1 \| 944 \| 127.0.0.1 ``` Fixes #6175 Tests: unit(dev), manual * psarna-propagate_tracing_to_more_write_paths: db,view: add tracing to view update generation path treewide: propagate trace state to write path	2020-05-27 08:40:05 +03:00
Pavel Emelyanov	ccdee822e1	storage_service: Get rid of one-line helpers Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:17:31 +03:00
Pavel Emelyanov	3c2066bd78	system_keyspace: Cleanup setup() from storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:17:31 +03:00
Pavel Emelyanov	0598b3a858	format_selector: Log which format is being selected Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:17:31 +03:00
Pavel Emelyanov	89a1b09214	sstables_manager: Keep format on Make the database be the format_selector target, so when the format is selected it's set on the database, which in turn just forwards the selection into sstables managers. All users of the format are already patched to read it from those managers. The initial value for the format is the highest, which is needed by tests. When scylla starts the format is updated by format_selector, first after reading from system tables, then by selecting it from features. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:17:28 +03:00

1736 Commits