scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 11:30:36 +00:00

Author	SHA1	Message	Date
Benny Halevy	e39fbe1849	compaction: move compaction uuid generation to compaction_info We'd like to use the same uuid both for printing compaction log messages and to update compaction_history. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-16 13:55:23 +03:00
Amnon Heiman	186301aff8	per table metrics: change estimated_histogram to time_estimated_histogram This patch changes the per-table latency histograms: read, write, cas_prepare, cas_accept, and cas_learn. Besides changing the definition type and the insertion method, the patch changes the API to support the new metrics. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-07-14 11:17:43 +03:00
Amnon Heiman	ea8d52b11c	row_locking: change estimated histogram with time_estimated_histogram This patch changes the row locking latencies to use time_estimated_histogram. The change consists of changing the histogram definition and changing how values are inserted into the histogram. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-07-14 11:17:43 +03:00
Raphael S. Carvalho	1e9c5b5295	table: simplify table::discard_sstables() There is no longer any need for special code for shared SSTables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:23:40 -03:00
Raphael S. Carvalho	ce210a4420	table: simplify add_sstable() get_shards_for_this_sstable() can be called inside table::add_sstable() because the shards for an sstable are precomputed, so the call is completely exception safe. We want a central point for checking that table no longer adds shared SSTables to its sstable set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:23:32 -03:00
Raphael S. Carvalho	68b527f100	table: simplify update_stats_for_new_sstable() There is no longer any need to conditionally track the SSTable metadata, as table will no longer accept shared SSTables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:22:04 -03:00
Raphael S. Carvalho	607c74dc95	table: remove unused open_sstable function Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:22:00 -03:00
Raphael S. Carvalho	60467a7e36	table: no longer keep track of sstables that need resharding Now that table will no longer accept shared SSTables, it no longer needs to keep track of them. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:21:38 -03:00
Raphael S. Carvalho	cd548c6304	table: Remove unused functions no longer used by resharding Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:21:36 -03:00
Raphael S. Carvalho	68a4739a42	table: remove sstable::shared() condition from backlog tracker add/remove functions Now that table no longer accepts shared SSTables, those two functions can be simplified by removing the shared condition. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:21:34 -03:00
Raphael S. Carvalho	343efe797d	table: No longer accept a shared SSTable With off-strategy work on reshard on boot and refresh, table no longer needs to work with Shared SSTables. That will unlock a host of cleanups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:21:04 -03:00
Pavel Emelyanov	f045cec586	snap: Get rid of storage_service reference in schema.cc Now that snapshot stopping is correctly handled, we may pull the database reference all the way down to schema::describe(). One tricky place is in table::snapshot() -- the local db reference is pulled through an smp::submit_to call, but thanks to the shard checks in the place where it is needed, the db is still "local" Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 20:28:25 +03:00
Avi Kivity	e5be3352cf	database, streaming, messaging: drop streaming memtables Before Scylla 3.0, we used to send streaming mutations using individual RPC requests and flush them together using dedicated streaming memtables. This mechanism is no longer in use and all versions that use it have long reached end-of-life. Remove this code.	2020-06-25 15:25:54 +02:00
Avi Kivity	de38091827	priority_manager: merge streaming_read and streaming_write classes into one class Streaming is handled by just one group for CPU scheduling, so separating it into read and write classes for I/O is artificial, and inflates the resources we allow for streaming if both reads and writes happen at the same time. Merge both classes into one class ("streaming") and adjust callers. The merged class has 200 shares, so it reduces streaming bandwidth if both directions are active at the same time (which is rare; I think it only happens in view building).	2020-06-22 15:09:04 +03:00
Glauber Costa	b34c0c2ff6	distributed_loader: rework uploading of SSTables Uploading of SSTables is problematic: for historical reasons it takes a lock that may have to wait for ongoing compactions to finish, then it disables writes in the table, and then it goes loading SSTables as if it knew nothing about them. With the sstable_directory infrastructure we can do much better: * we can reshard and reshape the SSTables in place, keeping the number of SSTables in check. Because this is a background process we can be fairly aggressive and set the reshape mode to strict. * we can then move the SSTables directly into the main directory. Because we know they are few in number we can call the more elegant add_sstable_and_invalidate_cache instead of the open coding currently done by load_new_sstables * we know they are not shared (if they were, we resharded them), simplifying the load process even further. The major change after this patch is applied is that all compactions (resharding and reshape) needed to make the SSTables in-strategy are done in the streaming class, which reduces the impact of this operation on the node. When the SSTables are loaded, subsequent reads will not suffer as we will not be adding shared SSTables in potentially high numbers, nor will we reshard in the compaction class. There is also no more need for a lock in the upload process so in the fast path where users are uploading a set of SSTables from a backup this should essentially be instantaneous. The lock, as well as the code to disable and enable table writes, is removed. A future improvement is to bypass the staging directory too, in which case the reshaping compaction would already generate the view updates. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:37:18 -04:00
Glauber Costa	e48ad3dc23	remove manifest_file filter from table. When we are scanning an sstable directory, we want to filter out the manifest file in most situations. The table class has a filter for that, but it is a static filter that doesn't depend on table for anything. We are better off removing it and putting it in another independent location. While it seems wasteful to use a new header just for that, this header will soon be populated with the sstable_directory class. Tests: unit (dev) Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-08 16:06:00 -04:00
Piotr Sarna	64b8b77ac2	table: add error injection points to the materialized view path ... in order to be able to test scenarios with failures.	2020-06-05 09:39:58 +02:00
Avi Kivity	0c34e114e2	Merge "Upgrade to seastar api version 3" (make_file_output_stream returns future) from Rafael " The new seastar api changes make_file_output_stream and make_file_data_sink to return futures. This series includes a few refactoring patches and the actual transition. " * 'espindola/api-v3-v3' of https://github.com/espindola/scylla: table: Fix indentation everywhere: Move to seastar api level 3 sstables: Pass an output_stream to make_compressed_file_.*_format_output_stream sstables: Pass a data_sink to checksummed_file_writer's constructor sstables: Convert a file_writer constructor to a static make sstables: Move file_writer constructor out of line	2020-06-03 23:09:49 +03:00
Rafael Ávila de Espíndola	686f9220c1	table: Fix indentation It was broken by the previous commit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-03 10:32:46 -07:00
Rafael Ávila de Espíndola	e5876f6696	everywhere: Move to seastar api level 3 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-03 10:32:46 -07:00
Raphael S. Carvalho	077b4ee97d	table: Don't remove a SSTable from the backlog tracker if not previously added After `7f1a215`, a sstable is only added to backlog tracker if sstable::shared() returns true. sstable::shared() can return true for a sstable that is actually owned by more than one shard, but it can also incorrectly return true for a sstable which wasn't made explicitly unshared through set_unshared(). A recent work of mine is getting rid of set_unshared() because a sstable has the knowledge to determine whether or not it's shared. The problem starts with a streaming sstable which hasn't had set_unshared() called for it, so it won't be added to backlog tracker, but it can be eventually removed from the tracker when that sstable is compacted. Also, it could happen that a shared sstable, which was resharded, will be removed from the tracker even though it wasn't previously added. When those problems happen, backlog tracker will have an incorrect account of total bytes, which leads it to produce incorrect backlogs that can potentially go negative. These problems are fixed by making every add / removal go through functions which take into account sstable::shared(). Fixes #6227. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200512220226.134481-2-raphaelsc@scylladb.com>	2020-06-03 17:35:22 +03:00
Raphael S. Carvalho	fb6976f1b9	Make sure SSTables created by streaming are added to backlog tracker New SSTables are only added to backlog tracker if set_unshared() was called on their behalf. SSTables created for streaming are not being added to the tracker because make_streaming_sstable_for_write() doesn't call set_unshared(), nor does its caller. As a result, the backlog does not account for their existence, which means the backlog will be much lower than expected. This problem could be fixed by adding a set_unshared() call but it turns out we don't even need set_unshared() anymore. It was introduced when Scylla metadata didn't exist; now a SSTable has built-in knowledge of whether or not it's shared. Relying on every SSTable creator calling set_unshared() is bug prone. Let's get rid of it and let the SSTable itself say whether or not it's shared. If an imported SSTable has no Scylla metadata, Scylla will still be able to compute shards using token range metadata. Refs #6021. Refs #6227. Fixes #6441. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200512220226.134481-1-raphaelsc@scylladb.com>	2020-06-03 17:35:22 +03:00
Pavel Emelyanov	67d5fad65f	storage_service: Remove some inclusions of its header GC pass over .cc files. Some really do not need it, some need for features/gossiper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-01 09:08:40 +03:00
Avi Kivity	0c6bbc84cd	Merge "Classify queries based on their initiator, rather than their target" from Botond " Currently we classify queries as "system" or "user" based on the table they target. The class of a query determines how the query is treated, currently: timeout, limits for reverse queries and the concurrency semaphore. The catch is that users are also allowed to query system tables and when doing so they will bypass the limits intended for user queries. This has caused performance problems in the past, yet the reason we decided to finally address this is that we want to introduce a memory limit for unpaged queries. Internal (system) queries are all unpaged and we don't want to impose the same limit on them. This series uses scheduling groups to distinguish user and system workloads, based on the assumption that user workloads will run in the statement scheduling group, while system workloads will run in the main (or default) scheduling group, or perhaps something else, but in any case not in the statement one. Currently the scheduling group of reads and writes is lost when going through the messaging service, so to be able to use scheduling groups to distinguish user and system reads this series refactors the messaging service to retain this distinction across verb calls. Furthermore, we execute some system reads/writes as part of user reads/writes, such as auth and schema sync. These processes are tagged to run in the main group. This series also centralises query classification on the replica and moves it to a higher level. More specifically, queries are now classified -- the scheduling group they run in is translated to the appropriate query class specific configuration -- on the database level and the configuration is propagated down to the lower layers. Currently this query class specific configuration consists of the reader concurrency semaphore and the max memory limit for otherwise unlimited queries. 
A corollary of the semaphore being selected on the database level is that the read permit is now created before the read starts. A valid permit is now available during all stages of the read, enabling tracking the memory consumption of e.g. the memtable and cache readers. This change aligns nicely with the needs of more accurate reader memory tracking, which also wants a valid permit that is available in every layer. The series can be divided roughly into the following distinct patch groups: * 01-02: Give system read concurrency a boost during startup. * 03-06: Introduce user/system statement isolation to messaging service. * 07-13: Various infrastructure changes to prepare for using read permits in all stages of reads. * 14-19: Propagate the semaphore and the permit from database to the various table methods that currently create the permit. * 20-23: Migrate away from using the reader concurrency semaphore for waiting for admission, use the permit instead. * 24: Introduce `database::make_query_class_config()` and switch the database methods needing such a config to use it. * 25-31: Get rid of all uses of `no_reader_permit()`. * 32-33: Ban empty permits for good. * 34: querier_cache: use the queriers' permits to obtain the semaphore. Fixes: #5919 Tests: unit(dev, release, debug), dtest(bootstrap_test.py:TestBootstrap.start_stop_test_node), manual testing with a 2 node mixed cluster with extra logging. 
" * 'query-class/v6' of https://github.com/denesb/scylla: (34 commits) querier_cache: get semaphore from querier reader_permit: forbid empty permits reader_permit: fix reader_resources::operator bool treewide: remove all uses of no_reader_permit() database: make_multishard_streaming_reader: pass valid permit to multi range reader sstables: pass valid permits to all internal reads compaction: pass a valid permit to sstable reads database: add compaction read concurrency semaphore view: use valid permits for reads from the base table database: use valid permit for counter read-before-write database: introduce make_query_class_config() reader_concurrency_semaphore: remove wait_admission and consume_resources() test: move away from reader_concurrency_semaphore::wait_admission() reader_permit: resource_units: introduce add() mutation_reader: restricted_reader: work in terms of reader_permit row_cache: pass a valid permit to underlying read memtable: pass a valid permit to the delegate reader table: require a valid permit to be passed to most read methods multishard_mutation_query: pass a valid permit to shard mutation sources querier: add reader_permit parameter and forward it to the mutation_source ...	2020-05-29 10:11:44 +03:00
Piotr Sarna	77e943e9a3	db,views: unify time points used for update generation Until now, view updates were generated with a bunch of random time points, because the interface was not adjusted for passing a single time point. The time points were used to determine whether cells were alive (e.g. because of TTL), so it's better to unify the process: 1. when generating view updates from user writes, a single time point is used for the whole operation 2. when generating view updates via the view building process, a single time point is used for each build step NOTE: I don't see any reliable and deterministic way of writing test scenarios which trigger problems with the old code. After #6488 is resolved and error injection is integrated into view.cc, tests can be added. Fixes #6429 Tests: unit(dev) Message-Id: <f864e965eb2e27ffc13d50359ad1e228894f7121.1590070130.git.sarna@scylladb.com>	2020-05-28 12:56:09 +03:00
Botond Dénes	3cd2598ab3	reader_permit: forbid empty permits Remove `no_reader_permit()` and all ways to create empty (invalid) permits. All permits are guaranteed to be valid now and are only obtainable from a semaphore. `reader_permit::semaphore()` now returns a reference, as it is guaranteed to always have a valid semaphore reference.	2020-05-28 11:34:35 +03:00
Botond Dénes	992e697dd5	view: use valid permits for reads from the base table View update generation involves reading existing values from the base table, which will soon require a valid permit to be passed to it, so make sure we create and pass a valid permit to these reads. We use `database::make_query_class_config()` to obtain the semaphore for the read which selects the appropriate user/system semaphore based on the scheduling group the base table write is running in.	2020-05-28 11:34:35 +03:00
Botond Dénes	4409579352	mutation_reader: restricted_reader: work in terms of reader_permit We want to refactor all read resource tracking code to work through the `reader_permit`, so refactor the restricted reader to also do so.	2020-05-28 11:34:35 +03:00
Botond Dénes	fe024cecdc	row_cache: pass a valid permit to underlying read All readers are soon going to require a valid permit, so make sure we have a valid permit which we can pass to the underlying reader when creating it. This means `row_cache::make_reader()` now also requires a permit to be passed to it.	2020-05-28 11:34:35 +03:00
Botond Dénes	9ede82ebf8	memtable: pass a valid permit to the delegate reader All readers are soon going to require a valid permit, so make sure we have a valid permit which we can pass to the delegate reader when creating it. This means `memtable::make_flat_reader()` now also requires a permit to be passed to it. Internally the permit is stored in `scanning_reader`, which is used both for flushes and normal reads. In the former case a permit is not required.	2020-05-28 11:34:35 +03:00
Botond Dénes	cc5137ffe3	table: require a valid permit to be passed to most read methods Now that the most prevalent users (range scan and single partition reads) all pass valid permits we require all users to do so and propagate the permit down towards `make_sstable_reader()`. The plan is to use this permit for restricting the sstable readers, instead of the semaphore the table is configured with. The various `make_streaming_*reader()` overloads keep using the internal semaphores, but they also create the permit before the read starts and pass it to `make_sstable_reader()`.	2020-05-28 11:34:35 +03:00
Botond Dénes	14743c4412	data_query, mutation_query: use query_class_config We want to move away from the current practice of selecting the relevant read concurrency semaphore inside `table` and instead want to pass it down from `database` so that we can pass down a semaphore that is appropriate for the class of the query. Use the recently created `query_class_config` struct for this. This is added as a parameter to `data_query`, `mutation_query` and propagated down to the point where we create the `querier` to execute the read. We are already propagating a parameter down the same route -- max_memory_reverse_query -- which also happens to be part of `query_class_config`, so simply replace this parameter with a `query_class_config` one. As the lower layers are not prepared for a semaphore passed from above, make sure this semaphore is the same that is selected inside `table`. After the lower layers are prepared for a semaphore arriving from above, we will switch it to be the appropriate one for the class of the query.	2020-05-28 11:34:35 +03:00
Botond Dénes	0b4ec62332	flat_mutation_reader: flat_multi_range_reader: add reader_permit parameter Mutation sources will soon require a valid permit so make sure we have one and pass it to the mutation sources when creating the underlying readers. For now, pass no_reader_permit() on call sites, deferring the obtaining of a valid permit to later patches.	2020-05-28 11:34:35 +03:00
Piotr Sarna	18a37d0cb1	db,view: add tracing to view update generation path In order to improve materialized views' debuggability, tracing points are added to view update generation path. Sample info of an insert statement which resulted in producing local view updates which require read-before-write: activity \| timestamp \| source \| source_elapsed \| client ------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-04-19 12:02:48.420000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2020-04-19 12:02:48.420674 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2020-04-19 12:02:48.420753 \| 127.0.0.1 \| 79 \| 127.0.0.1 Creating write handler for token: -6715243485458697746 natural: {127.0.0.1} pending: {} [shard 0] \| 2020-04-19 12:02:48.420815 \| 127.0.0.1 \| 141 \| 127.0.0.1 Creating write handler with live: {127.0.0.1} dead: {} [shard 0] \| 2020-04-19 12:02:48.420824 \| 127.0.0.1 \| 149 \| 127.0.0.1 Executing a mutation locally [shard 0] \| 2020-04-19 12:02:48.420830 \| 127.0.0.1 \| 155 \| 127.0.0.1 View updates for ks.t1 require read-before-write - base table reader is created [shard 0] \| 2020-04-19 12:02:48.420862 \| 127.0.0.1 \| 188 \| 127.0.0.1 Generated 2 view update mutations [shard 0] \| 2020-04-19 12:02:48.420910 \| 127.0.0.1 \| 235 \| 127.0.0.1 Locally applying view update for ks.t1_v_idx_index; base token = -6715243485458697746; view token = -4156302194539278891 [shard 0] \| 2020-04-19 12:02:48.420918 \| 127.0.0.1 \| 243 \| 127.0.0.1 Successfully applied local view update for 127.0.0.1 and 0 remote endpoints [shard 0] \| 2020-04-19 12:02:48.420971 \| 127.0.0.1 \| 297 \| 127.0.0.1 View updates for ks.t1 were generated and propagated [shard 0] \| 2020-04-19 12:02:48.420973 \| 127.0.0.1 \| 299 \| 127.0.0.1 Got a response from /127.0.0.1 [shard 0] \| 
2020-04-19 12:02:48.420988 \| 127.0.0.1 \| 314 \| 127.0.0.1 Delay decision due to throttling: do not delay, resuming now [shard 0] \| 2020-04-19 12:02:48.420990 \| 127.0.0.1 \| 315 \| 127.0.0.1 Mutation successfully completed [shard 0] \| 2020-04-19 12:02:48.420994 \| 127.0.0.1 \| 320 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-04-19 12:02:48.421000 \| 127.0.0.1 \| 326 \| 127.0.0.1 Request complete \| 2020-04-19 12:02:48.420330 \| 127.0.0.1 \| 330 \| 127.0.0.1 Sample info for remote updates: activity \| timestamp \| source \| source_elapsed \| client --------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-04-26 16:19:47.691000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 1] \| 2020-04-26 16:19:47.691590 \| 127.0.0.1 \| 6 \| 127.0.0.1 Processing a statement [shard 1] \| 2020-04-26 16:19:47.692368 \| 127.0.0.1 \| 783 \| 127.0.0.1 Creating write handler for token: -3248873570005575792 natural: {127.0.0.3, 127.0.0.2} pending: {} [shard 1] \| 2020-04-26 16:19:47.694186 \| 127.0.0.1 \| 2598 \| 127.0.0.1 Creating write handler with live: {127.0.0.2, 127.0.0.3} dead: {} [shard 1] \| 2020-04-26 16:19:47.694283 \| 127.0.0.1 \| 2699 \| 127.0.0.1 Sending a mutation to /127.0.0.2 [shard 1] \| 2020-04-26 16:19:47.694591 \| 127.0.0.1 \| 3006 \| 127.0.0.1 Sending a mutation to /127.0.0.3 [shard 1] \| 2020-04-26 16:19:47.694862 \| 127.0.0.1 \| 3277 \| 127.0.0.1 Message received from /127.0.0.1 [shard 1] \| 2020-04-26 16:19:47.696358 \| 127.0.0.3 \| 40 \| 127.0.0.1 Message received from /127.0.0.1 [shard 1] \| 2020-04-26 16:19:47.696442 \| 127.0.0.2 \| 32 \| 127.0.0.1 View updates for ks.t require read-before-write - base table reader is created [shard 1] \| 2020-04-26 16:19:47.697762 \| 127.0.0.3 \| 1444 \| 127.0.0.1 View updates for ks.t 
require read-before-write - base table reader is created [shard 1] \| 2020-04-26 16:19:47.698120 \| 127.0.0.2 \| 1710 \| 127.0.0.1 Generated 1 view update mutations [shard 1] \| 2020-04-26 16:19:47.699107 \| 127.0.0.3 \| 2789 \| 127.0.0.1 Sending view update for ks.t_v2_idx_index to 127.0.0.4, with pending endpoints = {}; base token = -3248873570005575792; view token = 1634052884888577606 [shard 1] \| 2020-04-26 16:19:47.699345 \| 127.0.0.3 \| 3027 \| 127.0.0.1 Sending a mutation to /127.0.0.4 [shard 1] \| 2020-04-26 16:19:47.699614 \| 127.0.0.3 \| 3296 \| 127.0.0.1 Generated 1 view update mutations [shard 1] \| 2020-04-26 16:19:47.699824 \| 127.0.0.2 \| 3414 \| 127.0.0.1 Locally applying view update for ks.t_v2_idx_index; base token = -3248873570005575792; view token = 1634052884888577606 [shard 1] \| 2020-04-26 16:19:47.700012 \| 127.0.0.2 \| 3603 \| 127.0.0.1 View updates for ks.t were generated and propagated [shard 1] \| 2020-04-26 16:19:47.700059 \| 127.0.0.3 \| 3741 \| 127.0.0.1 Message received from /127.0.0.3 [shard 1] \| 2020-04-26 16:19:47.700958 \| 127.0.0.4 \| 37 \| 127.0.0.1 Successfully applied local view update for 127.0.0.2 and 0 remote endpoints [shard 1] \| 2020-04-26 16:19:47.701522 \| 127.0.0.2 \| 5112 \| 127.0.0.1 View updates for ks.t were generated and propagated [shard 1] \| 2020-04-26 16:19:47.701615 \| 127.0.0.2 \| 5206 \| 127.0.0.1 Sending mutation_done to /127.0.0.1 [shard 1] \| 2020-04-26 16:19:47.701913 \| 127.0.0.3 \| 5595 \| 127.0.0.1 Mutation handling is done [shard 1] \| 2020-04-26 16:19:47.702489 \| 127.0.0.3 \| 6171 \| 127.0.0.1 Got a response from /127.0.0.3 [shard 1] \| 2020-04-26 16:19:47.702667 \| 127.0.0.1 \| 11082 \| 127.0.0.1 Delay decision due to throttling: do not delay, resuming now [shard 1] \| 2020-04-26 16:19:47.702689 \| 127.0.0.1 \| 11105 \| 127.0.0.1 Mutation successfully completed [shard 1] \| 2020-04-26 16:19:47.702784 \| 127.0.0.1 \| 11200 \| 127.0.0.1 Sending mutation_done to /127.0.0.1 [shard 1] \| 
2020-04-26 16:19:47.703016 \| 127.0.0.2 \| 6606 \| 127.0.0.1 Done processing - preparing a result [shard 1] \| 2020-04-26 16:19:47.703054 \| 127.0.0.1 \| 11470 \| 127.0.0.1 Sending mutation_done to /127.0.0.3 [shard 1] \| 2020-04-26 16:19:47.703720 \| 127.0.0.4 \| 2800 \| 127.0.0.1 Mutation handling is done [shard 1] \| 2020-04-26 16:19:47.704527 \| 127.0.0.4 \| 3607 \| 127.0.0.1 Got a response from /127.0.0.4 [shard 1] \| 2020-04-26 16:19:47.704580 \| 127.0.0.3 \| 8262 \| 127.0.0.1 Delay decision due to throttling: do not delay, resuming now [shard 1] \| 2020-04-26 16:19:47.704606 \| 127.0.0.3 \| 8288 \| 127.0.0.1 Successfully applied view update for 127.0.0.4 and 1 remote endpoints [shard 1] \| 2020-04-26 16:19:47.704853 \| 127.0.0.3 \| 8535 \| 127.0.0.1 Mutation handling is done [shard 1] \| 2020-04-26 16:19:47.706092 \| 127.0.0.2 \| 9682 \| 127.0.0.1 Got a response from /127.0.0.2 [shard 1] \| 2020-04-26 16:19:47.709933 \| 127.0.0.1 \| 18348 \| 127.0.0.1 Request complete \| 2020-04-26 16:19:47.702582 \| 127.0.0.1 \| 11582 \| 127.0.0.1 Tests: unit(dev, debug)	2020-05-18 16:05:23 +02:00
Piotr Sarna	92aadb94e5	treewide: propagate trace state to write path In order to add tracing to places where it can be useful, e.g. materialized view updates and hinted handoff, tracing state is propagated to all applicable call sites.	2020-05-18 16:05:23 +02:00
Raphael S. Carvalho	c06cdcdb3c	table: Don't allow a shared SSTable to be selected for regular compaction After commit `88d2486fca`, removal of shared SSTables is not atomic anymore. They can be first removed from the list of shared SSTables and only later be removed from the SSTable set. That list is used to filter out shared SSTables from regular compaction candidates. So it can happen that regular compaction picks up a shared SSTable as a candidate after it was removed from that list but before it was removed from the set. To fix this, let's only remove a shared SSTable from the aforementioned list after it was successfully removed from the SSTable set, so that a shared SSTable cannot be selected for regular compaction anymore. Fixes #6439. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200512175224.114487-1-raphaelsc@scylladb.com>	2020-05-13 10:43:48 +03:00
Avi Kivity	76d21a0c22	Merge 'Make it possible to turn caching off per table and stop caching CDC Log' from Piotr J. " We inherited from Origin a `caching` table parameter. It's a map of named caching parameters. Before this PR two caching parameters were expected: `keys` and `rows_per_partition`. So far we have been ignoring them. This PR adds a new caching parameter called `enabled` which can be set to `true` or `false` and controls the usage of the cache for the table. By default, it's set to `true` which reflects Scylla behavior before this PR. This new capability is used to disable caching for CDC Log table. It is desirable because CDC Log entries are not expected to be read often. They also put much more pressure on memory than entries in Base Table. This is caused by the fact that some writes to Base Table can override previous writes. Every write to CDC Log is unique and does not invalidate any previous entry. Fixes #6098 Fixes #6146 Tests: unit(dev, release), manual " * haaawk-dont_cache_cdc: cdc: Don't cache CDC Log table table: invalidate disabled cache on memtable flush table: Add cache_enabled member function cf_prop_defs: persist caching_options in schema property_definitions: add get that returns variant feature: add PER_TABLE_CACHING feature caching_options: add enabled parameter	2020-05-10 15:39:42 +03:00
Raphael S. Carvalho	88d2486fca	sstables: Synchronize deletion of SSTables in resharding with other operations Input SSTables of resharding are deleted at the coordinator shard, not at the shards they belong to. We're not acquiring the deletion semaphore before removing those input SSTables from the SSTable set, so it could happen that resharding deletes those SSTables while another operation like snapshot, which acquires the semaphore, finds them deleted. Let's acquire the deletion semaphore so that the input SSTables will only be removed from the set when we're certain that nobody is relying on their existence anymore. Now resharding will only delete input SSTables after they're safely removed from the SSTable set of all shards they belong to. unit: test(dev). Fixes #6328. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200507233636.92104-1-raphaelsc@scylladb.com>	2020-05-10 10:50:32 +03:00
Ivan Prisyazhnyy	84e25e8ba4	api: support table auto compaction control The patch implements: - /storage_service/auto_compaction API endpoint - /column_family/autocompaction/{name} API endpoint These APIs make it possible to control and query the status of background compaction jobs for existing tables. The implementation introduces table::_compaction_disabled_by_user. The CompactionManager then checks whether it can push a background compaction job for the corresponding table. New members === table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const Test === Tests: unit(sstable_datafile_test autocompaction_control_test), manual $ ninja build/dev/test/boost/sstable_datafile_test $ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000 The test tries to submit a compaction job after toggling the per-table autocompaction switch. However, there is no reliable way to hook a pending compaction task. The code assumed that the with_scheduling_group() closure would never preempt execution of the stats check. Revert === Reverts commit `c8247ac`. In the previous version, execution sometimes resulted in the following error: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 \|\| cm->get_stats().active_tasks == 1 has failed This version adds a few sstables to the cf, starts the compaction and waits until it is finished. API change === - `/column_family/autocompaction/` always returned `true` when answering the question of whether autocompaction is disabled (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). Now it answers the question of whether autocompaction is enabled for a specific table. The logic of the question is inverted. 
The patch to the JMX is required. However, the change is decent because all old values were invalid (it always reported all compactions are disabled). - `/column_family/autocompaction/` got support for POST/DELETE per table Fixes === Fixes #1488 Fixes #1808 Fixes #440 Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com>	2020-05-07 16:23:38 +03:00
Avi Kivity	bef8e5e930	Merge "Don't invalidate row cache when adding GC SStable to SSTable Set" from Raphael " Garbage collected SSTables, created by incremental compaction process, are being added to the SSTable set using a function that invalidates row cache using the range of the SSTable itself. That's incorrect because data in GC SSTables come from preexisting SSTables in set, meaning the state of data isn't changed and so no need for invalidation at all. Incorrect invalidation like this is a source of read performance issues. This problem is fixed by including GC SSTables to the descriptor which is used to specify changes to the SSTable set, which is the correct thing to do given that a midway failure could leave the set in an incorrect state. Fixes #5956. Fixes #6275. tests: unit(dev) " * 'fix_issue_5956_v4' of github.com:raphaelsc/scylla: sstables/compaction: Don't invalidate row cache when adding GC SSTable to SSTable set sstables/compaction: Change meaning of compaction_completion_desc input and output fields sstables/compaction: Clean up code around garbage_collected_sstable_writer	2020-05-07 14:10:49 +03:00
Benny Halevy	b2f50224d9	table: database_sstable_write_monitor: revert charges in destructor We must unregister the monitor upon destruction to prevent use-after-free from `compaction_backlog_tracker::backlog` path. This is similar to ~compaction_read_monitor as implemented in commit `ca284174d0` Fixes #6385 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200506214419.569655-1-bhalevy@scylladb.com>	2020-05-07 10:39:39 +02:00
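The use-after-free pattern described in b2f50224d9 can be illustrated with a minimal sketch. The types below (`tracker_stub`, `write_monitor_stub`) are hypothetical stand-ins, not the real `compaction_backlog_tracker` or `database_sstable_write_monitor`; the sketch only assumes that a tracker keeps raw pointers to registered monitors and walks them when computing backlog, so a monitor has to unregister itself in its destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

struct write_monitor_stub;

// Hypothetical tracker: holds non-owning pointers to monitors of ongoing writes.
struct tracker_stub {
    std::unordered_set<const write_monitor_stub*> ongoing;
    std::size_t backlog() const;  // walks every registered monitor
};

// Hypothetical monitor: registers on construction, unregisters on destruction.
struct write_monitor_stub {
    tracker_stub& tracker;
    std::size_t bytes_written = 0;

    explicit write_monitor_stub(tracker_stub& t) : tracker(t) {
        tracker.ongoing.insert(this);
    }
    // Without this erase, backlog() would later dereference a dangling pointer.
    ~write_monitor_stub() { tracker.ongoing.erase(this); }

    write_monitor_stub(const write_monitor_stub&) = delete;
    write_monitor_stub& operator=(const write_monitor_stub&) = delete;
};

inline std::size_t tracker_stub::backlog() const {
    std::size_t total = 0;
    for (const auto* m : ongoing) {
        total += m->bytes_written;
    }
    return total;
}

// The tracker outlives the monitor; the backlog query after the monitor's
// scope ends is safe only because the destructor unregistered it.
inline std::size_t backlog_after_monitor_destroyed() {
    tracker_stub tracker;
    {
        write_monitor_stub m(tracker);
        m.bytes_written = 100;
        assert(tracker.backlog() == 100);
    }
    return tracker.backlog();
}
```

Tying the unregistration to the destructor means every exit path, including failure paths, removes the monitor before its storage goes away.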
Piotr Jastrzebski	38ede62a02	table: invalidate disabled cache on memtable flush table::update_cache has two branches of its logic. One when caching is enabled and the other when it's disabled. This patch adds unconditional cache invalidation to the second (disabled caching) branch. This is done for two purposes. First and foremost, it gives the guarantee that when we enable the cache later it will be in the right state and will be ready for usage. This is because any memtable flush that would logically invalidate the cache, actually physically does that too now. An additional benefit of this change is that disabled cache will be cleared during the next memtable flush that will happen after turning the switch off. Previously, the cache would also be emptied but it would take more time before all its elements are removed by eviction. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-06 18:39:01 +02:00
Piotr Jastrzebski	1a43849cd2	table: Add cache_enabled member function This function determines cache usage based both on table _config and dynamic schema information. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-06 18:39:01 +02:00
Raphael S. Carvalho	8f4458f1d5	sstables/compaction: Change meaning of compaction_completion_desc input and output fields input_sstables is renamed to old_sstables and is about old SSTables that should be deleted and removed from the SSTable set. output_sstables is renamed to new_sstables and is about new SSTable that should be added to the SSTable set, replacing the old ones. This will allow us, for example, to add auxiliary SSTables to SSTable set using the same call which replaces output SSTables by input SSTables in compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-05-05 12:03:08 -03:00
Glauber Costa	70e5252a5d	table: no longer accept online loading of SSTable files in the main directory Loading SSTables from the main directory is possible, to be compatible with Cassandra, but extremely dangerous and not recommended. From the beginning, we recommend using a separate, upload/ directory. In all this time, perhaps due to how the feature's usefulness is reduced in Cassandra due to the possible races, I have never seen anyone coming from Cassandra doing procedures involving refresh at all. Loading SSTables from the main directory forces us to disable writes to the table temporarily until the SSTables are sorted out. If we get rid of this, we can get rid of the disabling of the writes as well. We can't do it now because if we want to be nice to the odd user that may be using refresh through the main directory without our knowledge we should at least error out. This patch, then, does that: it errors out if SSTables are found in the main directory. It will not proceed with the refresh, and direct the user to the upload directory. The main loop in reshuffle_sstables is left in place structurally for now, but most of it is gone. The test for it is deleted. After a period of deprecation we can start ignoring these SSTables and get rid of the lock. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200429144511.13681-1-glauber@scylladb.com>	2020-05-03 08:40:38 +03:00
Rafael Ávila de Espíndola	95ee54f3cc	sstables: Call monitor->write_failed earlier. A writer is destroyed just before consume_in_thread returns, since the adapter takes ownership of it. The problem is that a monitor can keep a reference to a writer_offset_tracker that is owned by that writer. The monitor is accessed periodically via backlog_controller::_update_timer. This means we have to deregister from the list of ongoing writes before the writer is destroyed. If the write fails, the deregistration happens in write_failed, but it is currently called after the writer is destroyed. This patch moves the call to write_failed to the writer destructor as I could not find a convenient location to put it. Since the writer is destroyed in consume_in_thread, we could call it there, but then we also have to update consume. There is a similar problem with the case where the sstable is written correctly. That will be fixed in the next patch. Fixes #6221. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-04-27 08:58:31 -07:00
Rafael Ávila de Espíndola	95acfd1d58	sstables: Add write_failed to the write_monitor interface Only database_sstable_write_monitor needs it so far, but the call needs to be moved earlier, which requires calling it in code paths that don't know about database_sstable_write_monitor. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-04-27 08:58:31 -07:00
Piotr Sarna	c66661c582	table: bypass cache when generating view updates from streaming There's no indication that data needed for generating view updates from staging sstables is going to be immediately useful for the user, and a large amount of it can push hot rows out of the cache, thus deteriorating performance. Fixes #6233 Tests: unit(dev)	2020-04-26 15:43:02 +03:00
Glauber Costa	1f9c37fb5e	view_updating_consumer: move reference to a pointer It is currently not possible to wrap the view_updating_consumer in an std::optional. I intend to do it to allow for compactions to optionally generate view updates. The reason for that is that view_updating_consumer has a reference as a member, which makes the move assignment operator not be implicitly generated. This patch fixes it by keeping a pointer instead of a reference. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200421123648.8328-1-glauber@scylladb.com>	2020-04-22 10:05:35 +03:00
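The language rule behind 1f9c37fb5e can be shown with a small sketch. The types here (`table_stub`, `consumer_with_ref`, `consumer_with_ptr`) are hypothetical and much simpler than the real `view_updating_consumer`; they only illustrate that a reference member deletes the implicit assignment operators, while a pointer member keeps them, which is what re-assigning a `std::optional` needs.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>

struct table_stub { int id; };  // hypothetical stand-in for the referenced object

// Reference member: implicit copy and move assignment are defined as deleted.
struct consumer_with_ref {
    table_stub& t;
};

// Pointer member: implicit copy and move assignment are available.
struct consumer_with_ptr {
    table_stub* t;
};

static_assert(!std::is_move_assignable_v<consumer_with_ref>);
static_assert(std::is_move_assignable_v<consumer_with_ptr>);

// Re-seats an optional consumer and returns the id of the table it ends up
// pointing at. The second assignment relies on move assignment being usable.
inline int reassign_demo() {
    table_stub a{1};
    table_stub b{2};
    std::optional<consumer_with_ptr> c;
    c = consumer_with_ptr{&a};
    c = consumer_with_ptr{&b};
    assert(c.has_value());
    return c->t->id;
}
```

The trade-off of the pointer form is that it can be null, so code holding such a member has to guarantee it is set before use.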
Piotr Sarna	a6cf0bfa7d	table: switch to correct io_priority for streaming view updates The io_priority parameter used when generating view updates from streaming is used by the sstable reader, so it should use the I/O priority for streaming read operations, not streaming write operations. Fixes #6231 Tests: unit(dev)	2020-04-19 09:56:43 +03:00

160 Commits