scylladb

Author	SHA1	Message	Date
Botond Dénes	9eab5bca27	query_*(): use the coordinator specified memory limit for unlimited queries It is important that all replicas participating in a read use the same memory limits to avoid artificial differences due to different amounts of results. The coordinator now passes down its own memory limit for reads, in the form of max_result_size (or max_size). For unpaged or reverse queries this has to be used now instead of the locally set max_memory_unlimited_query configuration item. To avoid the replicas accidentally using the local limit contained in the `query_class_config` returned from `database::make_query_class_config()`, we refactor the latter into `database::get_reader_concurrency_semaphore()`. Most of its callers were interested in the semaphore only anyway, and those that were interested in the limit as well should get it from the coordinator instead, so this refactoring is a win-win.	2020-07-28 18:00:29 +03:00
Botond Dénes	159d37053d	storage_proxy: use read_command::max_result_size to pass max result size around Use the recently added `max_result_size` field of `query::read_command` to pass the max result size around, including passing it to remote nodes. This means that the max result size will be sent along each read, instead of once per connection. As we want to select the appropriate `max_result_size` based on the type of the query as well as based on the query class (user or internal) the previous method won't do anymore. If the remote doesn't fill this field, the old per-connection value is used.	2020-07-28 18:00:29 +03:00
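The fallback described in the two commits above (a per-read limit sent by the coordinator, with the old per-connection value used when the field is not filled) could be sketched roughly as follows. This is only an illustration: `read_command_sketch`, `effective_max_result_size` and the idea of modelling the field as `std::optional` are assumptions, not ScyllaDB's actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for query::read_command; only the field of interest.
struct read_command_sketch {
    // Filled by coordinators that send the limit along with each read,
    // left empty by remotes that do not fill the field.
    std::optional<std::size_t> max_result_size;
};

// Prefer the coordinator's per-read limit and fall back to the value
// negotiated once per connection. The replica's local configuration is
// deliberately not consulted, so every replica of a read applies the
// same limit.
inline std::size_t effective_max_result_size(const read_command_sketch& cmd,
                                             std::size_t per_connection_limit) {
    return cmd.max_result_size.value_or(per_connection_limit);
}
```

The point of the design is that the limit travels with the read itself, so it can differ by query type and query class instead of being fixed for the lifetime of a connection.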
Botond Dénes	a64d9b8883	database: add get_statement_scheduling_group()	2020-07-28 18:00:29 +03:00
Botond Dénes	d5cc932a0b	database: query_mutations(): obtain the memory accounter inside Instead of requesting callers to do it and pass it as a parameter. This is in line with data_query().	2020-07-28 18:00:29 +03:00
Botond Dénes	517a941feb	query_class_config: move into the query namespace It belongs there, its name even starts with "query".	2020-07-28 18:00:29 +03:00
Botond Dénes	cd849ed40d	database: add make_restricted_range_sstable_reader() A variant of `make_range_sstable_reader()` that wraps the reader in a restricting reader, hence making it wait for admission on the read concurrency semaphore, before starting to actually read.	2020-07-20 11:23:39 +03:00
Amnon Heiman	186301aff8	per table metrics: change estimated_histogram to time_estimated_histogram This patch changes the per table latencies histograms: read, write, cas_prepare, cas_accept, and cas_learn. Beside changing the definition type and the insertion method, the API was changed to support the new metrics. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-07-14 11:17:43 +03:00
Raphael S. Carvalho	ce210a4420	table: simplify add_sstable() get_shards_for_this_sstable() can be called inside table::add_sstable() because the shards for an sstable are precomputed, so the call is completely exception safe. We want a central point for checking that table will no longer add shared SSTables to its sstable set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:23:32 -03:00
Raphael S. Carvalho	68b527f100	table: simplify update_stats_for_new_sstable() no longer need to conditionally track the SSTable metadata, as table will no longer accept shared SSTables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:22:04 -03:00
Raphael S. Carvalho	607c74dc95	table: remove unused open_sstable function Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:22:00 -03:00
Raphael S. Carvalho	60467a7e36	table: no longer keep track of sstables that need resharding Now that table will no longer accept shared SSTables, it no longer needs to keep track of them. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:21:38 -03:00
Raphael S. Carvalho	cd548c6304	table: Remove unused functions no longer used by resharding Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:21:36 -03:00
Pavel Emelyanov	f045cec586	snap: Get rid of storage_service reference in schema.cc Now that the snapshot stopping is correctly handled, we may pull the database reference all the way down to the schema::describe(). One tricky place is in table::snapshot() -- the local db reference is pulled through an smp::submit_to call, but thanks to the shard checks in the place where it is needed the db is still "local" Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 20:28:25 +03:00
Avi Kivity	e5be3352cf	database, streaming, messaging: drop streaming memtables Before Scylla 3.0, we used to send streaming mutations using individual RPC requests and flush them together using dedicated streaming memtables. This mechanism is no longer in use and all versions that use it have long reached end-of-life. Remove this code.	2020-06-25 15:25:54 +02:00
Glauber Costa	b34c0c2ff6	distributed_loader: rework uploading of SSTables Uploading of SSTables is problematic: for historical reasons it takes a lock that may have to wait for ongoing compactions to finish, then it disables writes in the table, and then it goes loading SSTables as if it knew nothing about them. With the sstable_directory infrastructure we can do much better: * we can reshard and reshape the SSTables in place, keeping the number of SSTables in check. Because this is a background process we can be fairly aggressive and set the reshape mode to strict. * we can then move the SSTables directly into the main directory. Because we know they are few in number we can call the more elegant add_sstable_and_invalidate_cache instead of the open coding currently done by load_new_sstables * we know they are not shared (if they were, we resharded them), simplifying the load process even further. The major change after this patch is applied is that all compactions (resharding and reshape) needed to make the SSTables in-strategy are done in the streaming class, which reduces the impact of this operation on the node. When the SSTables are loaded, subsequent reads will not suffer as we will not be adding shared SSTables in potentially high numbers, nor will we reshard in the compaction class. There is also no more need for a lock in the upload process so in the fast path where users are uploading a set of SSTables from a backup this should essentially be instantaneous. The lock, as well as the code to disable and enable table writes, is removed. A future improvement is to bypass the staging directory too, in which case the reshaping compaction would already generate the view updates. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:37:18 -04:00
Rafael Ávila de Espíndola	336d541f58	database: Use a flat_hash_map for _ks_cf_to_uuid Given that the key is a std::pair, we have to explicitly mark the hash and eq types as transparent for heterogeneous lookup to work. With that, pass std::string_view to a few functions that just check if a value is in the map. This increases the .text section by 11 KiB (0.03%). Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
Rafael Ávila de Espíndola	6da9eef25f	database: Use flat_hash_map for _keyspaces This changes the hash map used for _keyspaces. Using a flat_hash_map allows using std::string_view in has_keyspace thanks to the heterogeneous lookup support. This adds 200 KiB to .text, since this is the first use of absl and brings in files from the .a. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
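The two commits above rely on heterogeneous lookup: a map keyed by owned strings can be probed with `std::string_view` once its functors are marked transparent. The sketch below shows the same idea using only the standard library, so it swaps `absl::flat_hash_map` for `std::map` with a transparent comparator. The names `ks_cf_less`, `ks_cf_map` and `has_table` are hypothetical and the mapped type is reduced to an `int`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>
#include <utility>

// Owned key stored in the map, and the non-owning view used for lookups.
using owned_key = std::pair<std::string, std::string>;
using view_key = std::pair<std::string_view, std::string_view>;

struct ks_cf_less {
    // Marking the comparator transparent opts in to heterogeneous lookup.
    using is_transparent = void;

    // Compare any two pair-like keys by viewing both as string_view pairs,
    // so no std::string is constructed just to perform a lookup.
    template <typename A, typename B>
    bool operator()(const A& a, const B& b) const {
        return view_key(a.first, a.second) < view_key(b.first, b.second);
    }
};

// Hypothetical stand-in for a (keyspace, table) -> id map.
using ks_cf_map = std::map<owned_key, int, ks_cf_less>;

inline bool has_table(const ks_cf_map& m, std::string_view ks, std::string_view cf) {
    return m.find(view_key(ks, cf)) != m.end();
}
```

With a hash map the same opt-in has to be made on both the hash and the equality functor, which is the extra step the first commit message mentions for the `std::pair` key.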
Glauber Costa	e48ad3dc23	remove manifest_file filter from table. When we are scanning an sstable directory, we want to filter out the manifest file in most situations. The table class has a filter for that, but it is a static filter that doesn't depend on table for anything. We are better off removing it and putting in another independent location. While it seems wasteful to use a new header just for that, this header will soon be populated with the sstable_directory class. Tests: unit (dev) Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-08 16:06:00 -04:00
Raphael S. Carvalho	077b4ee97d	table: Don't remove a SSTable from the backlog tracker if not previously added After `7f1a215`, a sstable is only added to backlog tracker if sstable::shared() returns false. sstable::shared() can return true for a sstable that is actually owned by more than one shard, but it can also incorrectly return true for a sstable which wasn't made explicitly unshared through set_unshared(). A recent work of mine is getting rid of set_unshared() because a sstable has the knowledge to determine whether or not it's shared. The problem starts with a streaming sstable which hasn't had set_unshared() called for it, so it won't be added to backlog tracker, but it can be eventually removed from the tracker when that sstable is compacted. Also, it could happen that a shared sstable, which was resharded, will be removed from the tracker even though it wasn't previously added. When those problems happen, backlog tracker will have an incorrect account of total bytes, which leads it to producing incorrect backlogs that can potentially go negative. These problems are fixed by making every add / removal go through functions which take into account sstable::shared(). Fixes #6227. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200512220226.134481-2-raphaelsc@scylladb.com>	2020-06-03 17:35:22 +03:00
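The fix described above comes down to applying the same predicate on both the add and the remove path, so that an SSTable that was never accounted for is never subtracted. A minimal sketch of that idea follows, assuming that unshared SSTables are the ones being tracked. All type and function names here are hypothetical simplifications, not the real ScyllaDB classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for an sstable.
struct sstable_sketch {
    std::int64_t bytes;
    bool shared;
};

// Hypothetical tracker that only keeps a running byte total.
class backlog_tracker_sketch {
    std::int64_t _total_bytes = 0;
public:
    void add(const sstable_sketch& sst) { _total_bytes += sst.bytes; }
    void remove(const sstable_sketch& sst) { _total_bytes -= sst.bytes; }
    std::int64_t total_bytes() const { return _total_bytes; }
};

// Every add and every removal goes through these two helpers. Because both
// check the same condition, the total can no longer drift below zero.
inline void add_to_tracker(backlog_tracker_sketch& t, const sstable_sketch& sst) {
    if (!sst.shared) {
        t.add(sst);
    }
}

inline void remove_from_tracker(backlog_tracker_sketch& t, const sstable_sketch& sst) {
    if (!sst.shared) {
        t.remove(sst);
    }
}
```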
Avi Kivity	0c6bbc84cd	Merge "Classify queries based on their initiator, rather than their target" from Botond " Currently we classify queries as "system" or "user" based on the table they target. The class of a query determines how the query is treated, currently: timeout, limits for reverse queries and the concurrency semaphore. The catch is that users are also allowed to query system tables and when doing so they will bypass the limits intended for user queries. This has caused performance problems in the past, yet the reason we decided to finally address this is that we want to introduce a memory limit for unpaged queries. Internal (system) queries are all unpaged and we don't want to impose the same limit on them. This series uses scheduling groups to distinguish user and system workloads, based on the assumption that user workloads will run in the statement scheduling group, while system workloads will run in the main (or default) scheduling group, or perhaps something else, but in any case not in the statement one. Currently the scheduling group of reads and writes is lost when going through the messaging service, so to be able to use scheduling groups to distinguish user and system reads this series refactors the messaging service to retain this distinction across verb calls. Furthermore, we execute some system reads/writes as part of user reads/writes, such as auth and schema sync. These processes are tagged to run in the main group. This series also centralises query classification on the replica and moves it to a higher level. More specifically, queries are now classified -- the scheduling group they run in is translated to the appropriate query class specific configuration -- on the database level and the configuration is propagated down to the lower layers. Currently this query class specific configuration consists of the reader concurrency semaphore and the max memory limit for otherwise unlimited queries. 
A corollary of the semaphore being selected on the database level is that the read permit is now created before the read starts. A valid permit is now available during all stages of the read, enabling tracking the memory consumption of e.g. the memtable and cache readers. This change aligns nicely with the needs of more accurate reader memory tracking, which also wants a valid permit that is available in every layer. The series can be divided roughly into the following distinct patch groups: * 01-02: Give system read concurrency a boost during startup. * 03-06: Introduce user/system statement isolation to messaging service. * 07-13: Various infrastructure changes to prepare for using read permits in all stages of reads. * 14-19: Propagate the semaphore and the permit from database to the various table methods that currently create the permit. * 20-23: Migrate away from using the reader concurrency semaphore for waiting for admission, use the permit instead. * 24: Introduce `database::make_query_config()` and switch the database methods needing such a config to use it. * 25-31: Get rid of all uses of `no_reader_permit()`. * 32-33: Ban empty permits for good. * 34: querier_cache: use the queriers' permits to obtain the semaphore. Fixes: #5919 Tests: unit(dev, release, debug), dtest(bootstrap_test.py:TestBootstrap.start_stop_test_node), manual testing with a 2 node mixed cluster with extra logging. 
" * 'query-class/v6' of https://github.com/denesb/scylla: (34 commits) querier_cache: get semaphore from querier reader_permit: forbid empty permits reader_permit: fix reader_resources::operator bool treewide: remove all uses of no_reader_permit() database: make_multishard_streaming_reader: pass valid permit to multi range reader sstables: pass valid permits to all internal reads compaction: pass a valid permit to sstable reads database: add compaction read concurrency semaphore view: use valid permits for reads from the base table database: use valid permit for counter read-before-write database: introduce make_query_class_config() reader_concurrency_semaphore: remove wait_admission and consume_resources() test: move away from reader_concurrency_semaphore::wait_admission() reader_permit: resource_units: introduce add() mutation_reader: restricted_reader: work in terms of reader_permit row_cache: pass a valid permit to underlying read memtable: pass a valid permit to the delegate reader table: require a valid permit to be passed to most read methods multishard_mutation_query: pass a valid permit to shard mutation sources querier: add reader_permit parameter and forward it to the mutation_source ...	2020-05-29 10:11:44 +03:00
Piotr Sarna	77e943e9a3	db,views: unify time points used for update generation Until now, view updates were generated with a bunch of random time points, because the interface was not adjusted for passing a single time point. The time points were used to determine whether cells were alive (e.g. because of TTL), so it's better to unify the process: 1. when generating view updates from user writes, a single time point is used for the whole operation 2. when generating view updates via the view building process, a single time point is used for each build step NOTE: I don't see any reliable and deterministic way of writing test scenarios which trigger problems with the old code. After #6488 is resolved and error injection is integrated into view.cc, tests can be added. Fixes #6429 Tests: unit(dev) Message-Id: <f864e965eb2e27ffc13d50359ad1e228894f7121.1590070130.git.sarna@scylladb.com>	2020-05-28 12:56:09 +03:00
Botond Dénes	734e995639	database: add compaction read concurrency semaphore All reads will soon require a valid permit, including those done during compaction. To allow creating valid permits for these reads create a compaction specific semaphore. This semaphore is unlimited as compaction concurrency is managed by a higher-level layer; we use it just for resource usage accounting.	2020-05-28 11:34:35 +03:00
Botond Dénes	992e697dd5	view: use valid permits for reads from the base table View update generation involves reading existing values from the base table, which will soon require a valid permit to be passed to it, so make sure we create and pass a valid permit to these reads. We use `database::make_query_class_config()` to obtain the semaphore for the read which selects the appropriate user/system semaphore based on the scheduling group the base table write is running in.	2020-05-28 11:34:35 +03:00
Botond Dénes	e4c591aa67	database: introduce make_query_class_config() And use it to obtain any query-class specific configuration that was obtained from `table::config` before, such as the read concurrency semaphore and the max memory limit for unlimited queries. As all users of these items get these from the query class config now, we can remove them from `table::config`.	2020-05-28 11:34:35 +03:00
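The classification step described in the merge message and in the commits above (translate the scheduling group a read runs in into a semaphore plus a memory limit) might look roughly like the sketch below. Every name ending in `_sketch`, the enum values, and all numeric limits are hypothetical placeholders, not values taken from ScyllaDB.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for the scheduling group a request runs in.
enum class scheduling_group_sketch { statement, main_group, streaming };

// Hypothetical stand-in for a reader concurrency semaphore.
struct semaphore_sketch {
    int max_concurrent_reads;
};

// The query-class specific configuration: which semaphore to wait on and
// how much memory an otherwise unlimited query may use.
struct query_class_config_sketch {
    semaphore_sketch* semaphore;
    std::size_t max_memory_for_unlimited_query;
};

class database_sketch {
    semaphore_sketch _user_read_sem{100};   // placeholder concurrency
    semaphore_sketch _system_read_sem{10};  // placeholder concurrency
public:
    // Reads running in the statement group are treated as user reads,
    // regardless of which table they target. Anything else is treated as
    // a system read and is not subject to the user memory limit.
    query_class_config_sketch make_query_class_config(scheduling_group_sketch sg) {
        if (sg == scheduling_group_sketch::statement) {
            return {&_user_read_sem, std::size_t(1) << 20};
        }
        return {&_system_read_sem, std::numeric_limits<std::size_t>::max()};
    }
};
```

Classifying by initiator rather than by target table is what keeps a user query against a system table on the user semaphore and under the user limits.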
Botond Dénes	cc5137ffe3	table: require a valid permit to be passed to most read methods Now that the most prevalent users (range scan and single partition reads) all pass valid permits we require all users to do so and propagate the permit down towards `make_sstable_reader()`. The plan is to use this permit for restricting the sstable readers, instead of the semaphore the table is configured with. The various `make_streaming_*reader()` overloads keep using the internal semaphores, but they also create the permit before the read starts and pass it to `make_sstable_reader()`.	2020-05-28 11:34:35 +03:00
Botond Dénes	14743c4412	data_query, mutation_query: use query_class_config We want to move away from the current practice of selecting the relevant read concurrency semaphore inside `table` and instead want to pass it down from `database` so that we can pass down a semaphore that is appropriate for the class of the query. Use the recently created `query_class_config` struct for this. This is added as a parameter to `data_query`, `mutation_query` and propagated down to the point where we create the `querier` to execute the read. We are already propagating down a parameter down the same route -- max_memory_reverse_query -- which also happens to be part of `query_class_config`, so simply replace this parameter with a `query_class_config` one. As the lower layers are not prepared for a semaphore passed from above, make sure this semaphore is the same that is selected inside `table`. After the lower layers are prepared for a semaphore arriving from above, we will switch it to be the appropriate one for the class of the query.	2020-05-28 11:34:35 +03:00
Botond Dénes	e0b98ba921	database: give system reads a concurrency boost during startup In the next patches we will match reads to the appropriate reader concurrency semaphore based on the scheduling group they run in. This will result in a lot of system reads that are executed during startup and that were up to now (incorrectly) using the user read semaphore to switch to the system read semaphore. This latter has a much more constrained concurrency, which was observed to cause system reads to saturate and block on the semaphore, slowing down startup. To solve this, boost the concurrency of the system read semaphore during startup to match that of the user semaphore. This is ok, as during startup there are no user reads to compete with. After startup, before we start serving user reads the concurrency is reverted back to the normal value.	2020-05-28 10:40:08 +03:00
Tomasz Grabiec	1424543e11	Merge "Move sstables_format on sstable_manager" from Pavel Emelyanov The format is currently sitting in storage_service, but the previous set patched all the users not to call it, instead they use sstables_manager to get the highest supported format. So this set finalizes this effort and places the format on sstables_manager(s). The set introduces the db::sstables_format_selector, that - starts with the lowest format (ka) - reads one on start from system tables - subscribes on sstables-related features and bumps up the selection if the respective feature is enabled During its lifetime the selector holds a reference to the sharded<database> and updates the format on it, the database, in turn, propagates it further to sstables_managers. The managers start with the highest known format (mc) which is done for tests. * https://github.com/xemul/scylla br-move-sstables-format-4: storage_service: Get rid of one-line helpers system_keyspace: Cleanup setup() from storage_service format_selector: Log which format is being selected sstables_manager: Keep format on format_selector: Make it standalone format_selector: Move the code into db/ format_selector: Select format locally storage_service: Introduce format_selector storage_service: Split feature_enabled_listener::on_enabled storage_service: Tossing bits around features: Introduce and use masked features features: Get rid of per-features booleans	2020-05-27 08:40:05 +03:00
Pavel Emelyanov	89a1b09214	sstables_manager: Keep format on Make the database be the format_selector target, so when the format is selected it is set on database which in turn just forwards the selection into sstables managers. All users of the format are already patched to read it from those managers. The initial value for the format is the highest, which is needed by tests. When scylla starts the format is updated by format_selector, first after reading from system tables, then by selecting it from features. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:17:28 +03:00
Piotr Sarna	92aadb94e5	treewide: propagate trace state to write path In order to add tracing to places where it can be useful, e.g. materialized view updates and hinted handoff, tracing state is propagated to all applicable call sites.	2020-05-18 16:05:23 +02:00
Avi Kivity	beaeda5234	database: remove variadic future from query() and query_mutations() Variadic futures are deprecated; replace with future<std::tuple<...>>. Tests: unit (dev)	2020-05-17 18:45:38 +02:00
Glauber Costa	7423ccc318	compaction_manager: allow early aborts through abort sources. The shutdown process of compaction manager starts with an explicit call from the database object. However that can only happen once everything is already initialized. This works well today, but I am soon to change the resharding process to operate before the node is fully ready. One can still stop the database in this case, but reshardings will have to finish before the abort signal is processed. This patch passes the existing abort source to the construction of the compaction_manager and subscribes to it. If the abort source is triggered, the compaction manager will react to it firing and all compactions it manages will be stopped. We still want the database object to be able to wait for the compaction manager, since the database is the object that owns the lifetime of the compaction manager. To make that possible we'll use a future that is returned from stop(): no matter what triggered the abort, either an early abort during initial resharding or a database-level event like drain, everything will shut down in the right order. The abort source is passed to the database, which is responsible for constructing the compaction manager. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-05-13 16:51:25 -04:00
Avi Kivity	76d21a0c22	Merge 'Make it possible to turn caching off per table and stop caching CDC Log' from Piotr J. " We inherited from Origin a `caching` table parameter. It's a map of named caching parameters. Before this PR two caching parameters were expected: `keys` and `rows_per_partition`. So far we have been ignoring them. This PR adds a new caching parameter called `enabled` which can be set to `true` or `false` and controls the usage of the cache for the table. By default, it's set to `true` which reflects Scylla behavior before this PR. This new capability is used to disable caching for CDC Log table. It is desirable because CDC Log entries are not expected to be read often. They also put much more pressure on memory than entries in Base Table. This is caused by the fact that some writes to Base Table can override previous writes. Every write to CDC Log is unique and does not invalidate any previous entry. Fixes #6098 Fixes #6146 Tests: unit(dev, release), manual " * haaawk-dont_cache_cdc: cdc: Don't cache CDC Log table table: invalidate disabled cache on memtable flush table: Add cache_enabled member function cf_prop_defs: persist caching_options in schema property_definitions: add get that returns variant feature: add PER_TABLE_CACHING feature caching_options: add enabled parameter	2020-05-10 15:39:42 +03:00
Avi Kivity	5b971397aa	Revert "compaction_manager: allow early aborts through abort sources." This reverts commit `e8213fb5c3`. It results in an assertion failure in remove_index_file_test. Fixes #6413.	2020-05-10 12:32:18 +03:00
Raphael S. Carvalho	88d2486fca	sstables: Synchronize deletion of SSTables in resharding with other operations Input SSTables of resharding are deleted at the coordinator shard, not at the shards they belong to. We're not acquiring the deletion semaphore before removing those input SSTables from the SSTable set, so it could happen that resharding deletes those SSTables while another operation like snapshot, which acquires the semaphore, finds them deleted. Let's acquire the deletion semaphore so that the input SSTables will only be removed from the set when we're certain that nobody is relying on their existence anymore. Now resharding will only delete input SSTables after they're safely removed from the SSTable set of all shards they belong to. unit: test(dev). Fixes #6328. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200507233636.92104-1-raphaelsc@scylladb.com>	2020-05-10 10:50:32 +03:00
Ivan Prisyazhnyy	84e25e8ba4	api: support table auto compaction control The patch implements: - /storage_service/auto_compaction API endpoint - /column_family/autocompaction/{name} API endpoint These APIs allow controlling and querying the status of background compaction jobs for the existing tables. The implementation introduces the table::_compaction_disabled_by_user. Then the CompactionManager checks if it can push the background compaction job for the corresponding table. New members === table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const Test === Tests: unit(sstable_datafile_test autocompaction_control_test), manual $ ninja build/dev/test/boost/sstable_datafile_test $ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000 The test tries to submit a compaction job after playing with autocompaction control table switch. However, there is no reliable way to hook pending compaction task. The code assumed that with_scheduling_group() closure will never preempt execution of the stats check. Revert === Reverts commit `c8247ac`. In the previous version the execution sometimes resulted in the following error: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed This version adds a few sstables to the cf, starts the compaction and waits until it is finished. API change === - `/column_family/autocompaction/` always returned `true` in answer to the question of whether autocompaction is disabled (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). Now it answers the question of whether autocompaction for a specific table is enabled. The question logic is inverted. 
The patch to the JMX is required. However, the change is decent because all old values were invalid (it always reported all compactions are disabled). - `/column_family/autocompaction/` got support for POST/DELETE per table Fixes === Fixes #1488 Fixes #1808 Fixes #440 Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com>	2020-05-07 16:23:38 +03:00
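The three new `table` members and the inverted API answer described above can be pictured with the small sketch below. The member names match the commit message, but `table_sketch` and the two free functions are hypothetical illustrations of how a compaction manager and the API handler might consult the flag, not the actual implementation.

```cpp
#include <cassert>

// Hypothetical reduction of table to the one flag this commit introduces.
class table_sketch {
    bool _compaction_disabled_by_user = false;
public:
    void enable_auto_compaction() { _compaction_disabled_by_user = false; }
    void disable_auto_compaction() { _compaction_disabled_by_user = true; }
    bool is_auto_compaction_disabled_by_user() const {
        return _compaction_disabled_by_user;
    }
};

// Hypothetical check made before a background compaction job is pushed.
inline bool may_submit_background_compaction(const table_sketch& t) {
    return !t.is_auto_compaction_disabled_by_user();
}

// Hypothetical value reported by the GET endpoint after the API change:
// "is autocompaction enabled for this table?" rather than "is it disabled?".
inline bool reported_autocompaction_enabled(const table_sketch& t) {
    return !t.is_auto_compaction_disabled_by_user();
}
```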
Glauber Costa	e8213fb5c3	compaction_manager: allow early aborts through abort sources. The shutdown process of compaction manager starts with an explicit call from the database object. However that can only happen once everything is already initialized. This works well today, but I am soon to change the resharding process to operate before the node is fully ready. One can still stop the database in this case, but reshardings will have to finish before the abort signal is processed. This patch passes the existing abort source to the construction of the compaction_manager and subscribes to it. If the abort source is triggered, the compaction manager will react to it firing and all compactions it manages will be stopped. We still want the database object to be able to wait for the compaction manager, since the database is the object that owns the lifetime of the compaction manager. To make that possible we'll use a future that is returned from stop(): no matter what triggered the abort, either an early abort during initial resharding or a database-level event like drain, everything will shut down in the right order. The abort source is passed to the database, which is responsible for constructing the compaction manager. Tests: unit (dev), manual start+stop, manual drain + stop Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200506184749.98288-1-glauber@scylladb.com>	2020-05-07 13:24:47 +03:00
Piotr Jastrzebski	1a43849cd2	table: Add cache_enabled member function This function determines cache usage based both on table _config and dynamic schema information. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-06 18:39:01 +02:00
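Combining the static table configuration with the schema's `caching` map, whose `enabled` entry defaults to `true`, could be sketched as below. The struct, the function names, and the representation of the `caching` parameter as a string-to-string map are assumptions made for illustration only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical reduction of the static table config to one switch.
struct table_config_sketch {
    bool enable_cache = true;
};

// Hypothetical representation of the `caching` table parameter.
using caching_options_sketch = std::map<std::string, std::string>;

// `enabled` defaults to true when absent, matching the behaviour from
// before the parameter existed.
inline bool caching_enabled_in_schema(const caching_options_sketch& caching) {
    auto it = caching.find("enabled");
    return it == caching.end() || it->second == "true";
}

// The cache is used only if both the static config and the dynamic schema
// information allow it.
inline bool cache_enabled(const table_config_sketch& cfg,
                          const caching_options_sketch& caching) {
    return cfg.enable_cache && caching_enabled_in_schema(caching);
}
```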
Piotr Sarna	c66661c582	table: bypass cache when generating view updates from streaming There's no indication that data needed for generating view updates from staging sstables is going to be immediately useful for the user, and a large amount of it can push hot rows out of the cache, thus deteriorating performance. Fixes #6233 Tests: unit(dev)	2020-04-26 15:43:02 +03:00
Piotr Sarna	71ac6ebcc5	Merge 'prepare the view building generator to work through a compaction' from Glauber There is no reason to read a single SSTable at a time from the staging directory. Moving SSTables from staging directory essentially involves scanning input SSTables and creating new SSTables (albeit in a different directory). We have a mechanism that does that: compactions. In a follow up patch, I will introduce a new specialization of compaction that moves SSTables from staging (potentially compacting them if there are plenty). In preparation for that, some signatures have to be changed and the view_updating_consumer has to be more compaction friendly. Meaning: - Operating with an sstable vector - taking a table reference, not a database Because this code is a bit fragile and the reviewer set is fundamentally different from anything compaction related, I am sending this separately * glommer-view_build: staging: potentially read many SSTables at the same time view_build_test: make sure it works with smp > 1	2020-04-15 18:07:09 +02:00
Glauber Costa	4e6400293e	staging: potentially read many SSTables at the same time There is no reason to read a single SSTable at a time from the staging directory. Moving SSTables from staging directory essentially involves scanning input SSTables and creating new SSTables (albeit in a different directory). We have a mechanism that does that: compactions. In a follow up patch, I will introduce a new specialization of compaction that moves SSTables from staging (potentially compacting them if there are plenty). In preparation for that, some signatures have to be changed and the view_updating_consumer has to be more compaction friendly. Meaning: - Operating with an sstable vector - taking a table reference, not a database Because this code is a bit fragile and the reviewer set is fundamentally different from anything compaction related, I am sending this separately Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-15 11:26:44 -04:00
Konstantin Osipov	18b9bb57ac	lwt: rename metrics to match accepted terminology Rename inherited metrics cas_propose and cas_commit to cas_accept and cas_learn respectively. A while ago we made a decision to stick to widely accepted terms for Paxos rounds: prepare, accept, learn. The rest of the code is using these terms, so rename the metrics to avoid confusion/technical debt. While at it, rename a few internal methods and functions. Fixes #6169 Message-Id: <20200414213537.129547-1-kostja@scylladb.com>	2020-04-15 12:20:30 +02:00
Pekka Enberg	c8247aced6	Revert "api: support table auto compaction control" This reverts commit `1c444b7e1e`. The test it adds sometimes fails as follows: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed Ivan is working on a fix, but let's revert this commit so that the intermittent failure does not block the next promotion.	2020-04-11 17:56:02 +03:00
Ivan Prisyazhnyy	1c444b7e1e	api: support table auto compaction control This patch adds an API endpoint /column_family/autocompaction/{name} that listens for GET and POST requests to pick and control table background compactions. To implement that, the patch introduces a "_compaction_disabled_by_user" flag that determines whether CompactionManager is allowed to queue background compaction jobs. It introduces table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const to control auto compaction state. Fixes #1488 Fixes #1808 Fixes #440 Tests: unit(sstable_datafile_test autocompaction_control_test), manual	2020-04-08 21:18:38 +03:00
Glauber Costa	463d0ab37c	compaction: move rewrite_sstables to the compaction_manager There is no reason why the table code has to be aware of the efforts of rewriting (cleanup, scrub, upgrade) an SSTable versus compacting it. Rewrite is special, because we need to do it one SSTable at a time, without lumping it together. However, the compaction manager is totally capable of doing that itself. If we do that, the special "table::rewrite_sstables" can be killed. This code would maybe be better off as a thread, where we wouldn't need to keep state. However there are some methods like maybe_stop_on_error() that expect a future so I am leaving this be for now. This is a cleanup that can be done later. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200401162722.28780-2-glauber@scylladb.com>	2020-04-06 16:02:30 +03:00
Glauber Costa	87dd23db03	compaction: use a larger min_threshold during bootstrap, replace During bootstrap and replace operations the node can't take reads and we'd like to see the process ending ASAP. This is because until the process ends, we keep having to duplicate writes to an extended set. Not to mention, in the case of a cluster expansion users want to use the added capacity sooner rather than later. Streaming generates a lot of compaction activity, that competes with the bootstrap itself, slowing it down. Long term, we are moving to treat those compactions differently and maybe postpone them altogether. However for now we can reduce the amount of compactions by increasing the minimum threshold of SSTables that have to accumulate before they are selected for compactions. The default is 2, meaning we will trigger a compaction every time 2 SSTables of about the same size are found (for STCS, others follow a similar pattern). Until we have offstrategy infrastructure we don't want the compactions to stop happening altogether so the reads, when they start, don't suffer. This patch sets the minimum threshold to 16 (for the default max_threshold of 32), meaning we will generate a lot less compaction activity during streaming. Once streaming is done we revert it to its original. Unfortunately there isn't much we can do at the moment about decommission. During decommission the nodes receiving data are also taking reads and we don't want SSTables to accumulate. Fixes #5109 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-01 10:06:27 +03:00
Rafael Ávila de Espíndola	c5795e8199	everywhere: Replace engine().cpu_id() with this_shard_id() This is a bit simpler and might allow removing a few includes of reactor.hh. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200326194656.74041-1-espindola@scylladb.com>	2020-03-27 11:40:03 +03:00
Nadav Har'El	7922b9eb8f	materialized views: reduce recompilation when db/view/view.hh changes. Before this patch, when db/view/view.hh was modified, 89 source files had to be recompiled. After this patch, this number is down to 5. Most of the irrelevant source files got view.hh by including database.hh, which included view.hh just for the definition of statistics. So in this patch we split the view statistics to a separate header file, view_stats.hh, and database.hh only includes that. A few source files which included only database.hh and also needed view.hh (for materialized-view related functions) now need to include view.hh explicitly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200319121031.540-1-nyh@scylladb.com>	2020-03-19 15:46:14 +02:00
Pavel Emelyanov	96e3d0fa36	mutation_partition: Debloat header from others Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200317191051.12623-1-xemul@scylladb.com>	2020-03-18 11:53:36 +02:00
Piotr Jastrzebski	924ed7bb1c	make_multishard_combining_reader: stop taking partitioner The function already takes schema so there's no need for it to take partitioner. It can be obtained using schema::get_partitioner Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00


821 Commits