scylladb

Author	SHA1	Message	Date
Botond Dénes	cc5137ffe3	table: require a valid permit to be passed to most read methods Now that the most prevalent users (range scan and single partition reads) all pass valid permits we require all users to do so and propagate the permit down towards `make_sstable_reader()`. The plan is to use this permit for restricting the sstable readers, instead of the semaphore the table is configured with. The various `make_streaming_*reader()` overloads keep using the internal semaphores as but they also create the permit before the read starts and pass it to `make_sstable_reader()`.	2020-05-28 11:34:35 +03:00
Botond Dénes	14743c4412	data_query, mutation_query: use query_class_config We want to move away from the current practice of selecting the relevant read concurrency semaphore inside `table` and instead want to pass it down from `database` so that we can pass down a semaphore that is appropriate for the class of the query. Use the recently created `query_class_config` struct for this. This is added as a parameter to `data_query`, `mutation_query` and propagated down to the point where we create the `querier` to execute the read. We are already propagating down a parameter down the same route -- max_memory_reverse_query -- which also happens to be part of `query_class_config`, so simply replace this parameter with a `query_class_config` one. As the lower layers are not prepared for a semaphore passed from above, make sure this semaphore is the same that is selected inside `table`. After the lower layers are prepared for a semaphore arriving from above, we will switch it to be the appropriate one for the class of the query.	2020-05-28 11:34:35 +03:00
Botond Dénes	e0b98ba921	database: give system reads a concurrency boost during startup In the next patches we will match reads to the appropriate reader concurrency semaphore based on the scheduling group they run in. This will result in a lot of system reads that are executed during startup and that were up to now (incorrectly) using the user read semaphore to switch to the system read semaphore. This latter has a much more constrained concurrency, which was observed to cause system reads to saturate and block on the semaphore, slowing down startup. To solve this, boost the concurrency of the system read semaphore during startup to match that of the user semaphore. This is ok, as during startup there are no user reads to compete with. After startup, before we start serving user reads the concurrency is reverted back to the normal value.	2020-05-28 10:40:08 +03:00
Tomasz Grabiec	1424543e11	Merge "Move sstables_format on sstable_manager" from Pavel Emelyanov The format is currently sitting in storage_service, but the previous set patched all the users not to call it, instead they use sstables_manager to get the highest supported format. So this set finalizes this effort and places the format on sstables_manager(s). The set introduces the db::sstables_format_selector, that - starts with the lowest format (ka) - reads one on start from system tables - subscribes on sstables-related features and bumps up the selection if the respective feature is enabled During its lifetime the selector holds a reference to the sharded<database> and updates the format on it, the database, in turn, propagates it further to sstables_managers. The managers start with the highest known format (mc) which is done for tests. * https://github.com/xemul/scylla br-move-sstables-format-4: storage_service: Get rid of one-line helpers system_keyspace: Cleanup setup() from storage_service format_selector: Log which format is being selected sstables_manager: Keep format on format_selector: Make it standalone format_selector: Move the code into db/ format_selector: Select format locally storage_service: Introduce format_selector storage_service: Split feature_enabled_listener::on_enabled storage_service: Tossing bits around features: Introduce and use masked features features: Get rid of per-features booleans	2020-05-27 08:40:05 +03:00
Pavel Emelyanov	89a1b09214	sstables_manager: Keep format on Make the database be the format_selector target, so when the format is selected its set on database which in turn just forwards the selection into sstables managers. All users of the format are already patched to read it from those managers. The initial value for the format is the highest, which is needed by tests. When scylla starts the format is updated by format_selector, first after reading from system tables, then by selectiing it from features. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:17:28 +03:00
Piotr Sarna	92aadb94e5	treewide: propagate trace state to write path In order to add tracing to places where it can be useful, e.g. materialized view updates and hinted handoff, tracing state is propagated to all applicable call sites.	2020-05-18 16:05:23 +02:00
Avi Kivity	beaeda5234	database: remove variadic future from query() and query_mutations() Variadic futures are deprecated; replace with future<std::tuple<...>>. Tests: unit (dev)	2020-05-17 18:45:38 +02:00
Glauber Costa	7423ccc318	compaction_manager: allow early aborts through abort sources. The shutdown process of compaction manager starts with an explicit call from the database object. However that can only happen everything is already initialized. This works well today, but I am soon to change the resharding process to operate before the node is fully ready. One can still stop the database in this case, but reshardings will have to finish before the abort signal is processed. This patch passes the existing abort source to the construction of the compaction_manager and subscribes to it. If the abort source is triggered, the compaction manager will react to it firing and all compactions it manages will be stopped. We still want the database object to be able to wait for the compaction manager, since the database is the object that owns the lifetime of the compaction manager. To make that possible we'll use a future that is return from stop(): no matter what triggered the abort, either an early abort during initial resharding or a database-level event like drain, everything will shut down in the right order. The abort source is passed to the database, who is responsible from constructing the compaction manager. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-05-13 16:51:25 -04:00
Avi Kivity	76d21a0c22	Merge 'Make it possible to turn caching off per table and stop caching CDC Log' from Piotr J. " We inherited from Origin a `caching` table parameter. It's a map of named caching parameters. Before this PR two caching parameters were expected: `keys` and `rows_per_partition`. So far we have been ignoring them. This PR adds a new caching parameter called `enabled` which can be set to `true` or `false` and controls the usage of the cache for the table. By default, it's set to `true` which reflects Scylla behavior before this PR. This new capability is used to disable caching for CDC Log table. It is desirable because CDC Log entries are not expected to be read often. They also put much more pressure on memory than entries in Base Table. This is caused by the fact that some writes to Base Table can override previous writes. Every write to CDC Log is unique and does not invalidate any previous entry. Fixes #6098 Fixes #6146 Tests: unit(dev, release), manual " * haaawk-dont_cache_cdc: cdc: Don't cache CDC Log table table: invalidate disabled cache on memtable flush table: Add cache_enabled member function cf_prop_defs: persist caching_options in schema property_definitions: add get that returns variant feature: add PER_TABLE_CACHING feature caching_options: add enabled parameter	2020-05-10 15:39:42 +03:00
Avi Kivity	5b971397aa	Revert "compaction_manager: allow early aborts through abort sources." This reverts commit `e8213fb5c3`. It results in an assertion failure in remove_index_file_test. Fixes #6413.	2020-05-10 12:32:18 +03:00
Raphael S. Carvalho	88d2486fca	sstables: Synchronize deletion of SSTables in resharding with other operations Input SSTables of resharding is deleted at the coordinator shard, not at the shards they belong to. We're not acquiring deletion semaphore before removing those input SSTables from the SSTable set, so it could happen that resharding deletes those SSTables while another operation like snapshot, which acquires the semaphore, find them deleted. Let's acquire the deletion semaphore so that the input SSTables will only be removed from the set, when we're certain that nobody is relying on their existence anymore. Now resharding will only delete input SStables after they're safely removed from the SSTable set of all shards they belong to. unit: test(dev). Fixes #6328. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200507233636.92104-1-raphaelsc@scylladb.com>	2020-05-10 10:50:32 +03:00
Ivan Prisyazhnyy	84e25e8ba4	api: support table auto compaction control The patch implements: - /storage_service/auto_compaction API endpoint - /column_family/autocompaction/{name} API endpoint Those APIs allow to control and request the status of background compaction jobs for the existing tables. The implementation introduces the table::_compaction_disabled_by_user. Then the CompactionManager checks if it can push the background compaction job for the corresponding table. New members === table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const Test === Tests: unit(sstable_datafile_test autocompaction_control_test), manual $ ninja build/dev/test/boost/sstable_datafile_test $ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000 The test tries to submit a compaction job after playing with autocompaction control table switch. However, there is no reliable way to hook pending compaction task. The code assumed that with_scheduling_group() closure will never preempt execution of the stats check. Revert === Reverts commit `c8247ac`. In previous version the execution sometimes resulted into the following error: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 \|\| cm->get_stats().active_tasks == 1 has failed This version adds a few sstables to the cf, starts the compaction and awaits until it is finished. API change === - `/column_family/autocompaction/` always returned `true` while answering to the question: if the autocompaction disabled (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). now it answers to the question: if the autocompaction for specific table is enabled. The question logic is inverted. The patch to the JMX is required. However, the change is decent because all old values were invalid (it always reported all compactions are disabled). - `/column_family/autocompaction/` got support for POST/DELETE per table Fixes === Fixes #1488 Fixes #1808 Fixes #440 Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com>	2020-05-07 16:23:38 +03:00
Glauber Costa	e8213fb5c3	compaction_manager: allow early aborts through abort sources. The shutdown process of compaction manager starts with an explicit call from the database object. However that can only happen everything is already initialized. This works well today, but I am soon to change the resharding process to operate before the node is fully ready. One can still stop the database in this case, but reshardings will have to finish before the abort signal is processed. This patch passes the existing abort source to the construction of the compaction_manager and subscribes to it. If the abort source is triggered, the compaction manager will react to it firing and all compactions it manages will be stopped. We still want the database object to be able to wait for the compaction manager, since the database is the object that owns the lifetime of the compaction manager. To make that possible we'll use a future that is return from stop(): no matter what triggered the abort, either an early abort during initial resharding or a database-level event like drain, everything will shut down in the right order. The abort source is passed to the database, who is responsible from constructing the compaction manager. Tests: unit (dev), manual start+stop, manual drain + stop Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200506184749.98288-1-glauber@scylladb.com>	2020-05-07 13:24:47 +03:00
Piotr Jastrzebski	1a43849cd2	table: Add cache_enabled member function This function determines cache usage based both on table _config and dynamic schema information. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-06 18:39:01 +02:00
Piotr Sarna	c66661c582	table: bypass cache when generating view updates from streaming There's no indication that data needed for generating view updates from staging sstables is going to be immediately useful for the user, and a large amount of it can push hot rows out of the cache, thus deteriorating performance. Fixes #6233 Tests: unit(dev)	2020-04-26 15:43:02 +03:00
Piotr Sarna	71ac6ebcc5	Merge 'prepare the view building generator to work through a compaction' from Glauber There is no reason to read a single SSTable at a time from the staging directory. Moving SSTables from staging directory essentially involves scanning input SSTables and creating new SSTables (albeit in a different directory). We have a mechanism that does that: compactions. In a follow up patch, I will introduce a new specialization of compaction that moves SSTables from staging (potentially compacting them if there are plenty). In preparation for that, some signatures have to be changed and the view_updating_consumer has to be more compaction friendly. Meaning: - Operating with an sstable vector - taking a table reference, not a database Because this code is a bit fragile and the reviewer set is fundamentally different from anything compaction related, I am sending this separately * glommer-view_build: staging: potentially read many SSTables at the same time view_build_test: make sure it works with smp > 1	2020-04-15 18:07:09 +02:00
Glauber Costa	4e6400293e	staging: potentially read many SSTables at the same time There is no reason to read a single SSTable at a time from the staging directory. Moving SSTables from staging directory essentially involves scanning input SSTables and creating new SSTables (albeit in a different directory). We have a mechanism that does that: compactions. In a follow up patch, I will introduce a new specialization of compaction that moves SSTables from staging (potentially compacting them if there are plenty). In preparation for that, some signatures have to be changed and the view_updating_consumer has to be more compaction friendly. Meaning: - Operating with an sstable vector - taking a table reference, not a database Because this code is a bit fragile and the reviewer set is fundamentally different from anything compaction related, I am sending this separately Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-15 11:26:44 -04:00
Konstantin Osipov	18b9bb57ac	lwt: rename metrics to match accepted terminology Rename inherited metrics cas_propose and cas_commit to cas_accept and cas_learn respectively. A while ago we made a decision to stick to widely accepted terms for Paxos rounds: prepare, accept, learn. The rest of the code is using these terms, so rename the metrics to avoid confusion/technical debt. While at it, rename a few internal methods and functions. Fixes #6169 Message-Id: <20200414213537.129547-1-kostja@scylladb.com>	2020-04-15 12:20:30 +02:00
Pekka Enberg	c8247aced6	Revert "api: support table auto compaction control" This reverts commit `1c444b7e1e`. The test it adds sometimes fails as follows: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 \|\| cm->get_stats().active_tasks == 1 has failed Ivan is working on a fix, but let's revert this commit to avoid blocking next promotion failing from time to time.	2020-04-11 17:56:02 +03:00
Ivan Prisyazhnyy	1c444b7e1e	api: support table auto compaction control This patch adds API endpoint /column_family/autocompaction/{name} that listen to GET and POST requests to pick and control table background compactions. To implement that the patch introduces "_compaction_disabled_by_user" flag that affects if CompactionManager is allowed to push background compactions jobs into the work. It introduces table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const to control auto compaction state. Fixes #1488 Fixes #1808 Fixes #440 Tests: unit(sstable_datafile_test autocompaction_control_test), manual	2020-04-08 21:18:38 +03:00
Glauber Costa	463d0ab37c	compaction: move rewrite_sstables to the compaction_manager There is no reason why the table code has to be aware of the efforts of rewriting (cleanup, scrub, upgrade) an SSTable versus compacting it. Rewrite is special, because we need to do it one SSTable at a time, without lumping it together. However, the compaction manager is totally capable of doing that itself. If we do that, the special "table::rewrite_sstables" can be killed. This code would maybe be better off as a thread, where we wouldn't need to keep state. However there are some methods like maybe_stop_on_error() that expect a future so I am leaving this be for now. This is a cleanup that can be done later. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200401162722.28780-2-glauber@scylladb.com>	2020-04-06 16:02:30 +03:00
Glauber Costa	87dd23db03	compaction: use a larger min_threshold during bootstrap, replace During bootstrap and replace operations the node can't take reads and we'd like to see the process ending ASAP. This is because until the process ends, we keep having to duplicate writes to an extended set. Not to mention, in the case of a cluster expansion users want to use the added capacity sooner rather than later. Streaming generates a lot of compaction activity, that competes with the bootstrap itself, slowing it down. Long term, we are moving to treat those compactions differently and maybe postpone them altogether. However for now we can reduce the amount of compactions by increasing the minimum threshold of SSTables that have to accumulate before they are selected for compactions. The default is 2, meaning we will trigger a compaction every time 2 SSTables of about the same size are found (for STCS, others follow a similar pattern). Until we have offstrategy infrastructure we don't want the compactions to stop happening altogether so the reads, when they start, don't suffer. This patch sets the minimum threshold to 16 (for the default max_threshold of 32), meaning we will generate a lot less compaction activity during streaming. Once streaming is done we revert it to its original. Unfortunately there isn't much we can do at the moment about decommission. During decommission the nodes receiving data are also taking reads and we don't want SSTables to accumulate. Fixes #5109 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-01 10:06:27 +03:00
Rafael Ávila de Espíndola	c5795e8199	everywhere: Replace engine().cpu_id() with this_shard_id() This is a bit simpler and might allow removing a few includes of reactor.hh. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200326194656.74041-1-espindola@scylladb.com>	2020-03-27 11:40:03 +03:00
Nadav Har'El	7922b9eb8f	materialized views: reduce recompilation when db/view/view.hh changes. Before this patch, when db/view/view.hh was modified, 89 source files had to be recompiled. After this patch, this number is down to 5. Most of the irrelevant source files got view.hh by including database.hh, which included view.hh just for the definition of statistics. So in this patch we split the view statistics to a separate header file, view_stats.hh, and database.hh only includes that. A few source files which included only database.hh and also needed view.hh (for materialized-view related functions) now need to include view.hh explicitly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200319121031.540-1-nyh@scylladb.com>	2020-03-19 15:46:14 +02:00
Pavel Emelyanov	96e3d0fa36	mutation_partition: Debloat header form others Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200317191051.12623-1-xemul@scylladb.com>	2020-03-18 11:53:36 +02:00
Piotr Jastrzebski	924ed7bb1c	make_multishard_combining_reader: stop taking partitioner The function already takes schema so there's no need for it to take partitioner. It can be obtained using schema::get_partitioner Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	54d24553bb	schema: get_partitioner return const& Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 13:33:53 +01:00
Pavel Emelyanov	4fa12f2fb8	header: De-bloat schema.hh The header sits in many other headers, but there's a handy schema_fwd.hh that's tiny and contains needed declarations for other headers. So replace shema.hh with schema_fwd.hh in most of the headers (and remove completely from some). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200303102050.18462-1-xemul@scylladb.com>	2020-03-03 11:34:00 +01:00
Avi Kivity	157fe4bd19	Merge "Remove default timeouts" from Botond " Timeouts defaulted to `db::no_timeout` are dangerous. They allow any modifications to the code to drop timeouts and introduce a source of unbounded request queue to the system. This series removes the last such default timeouts from the code. No problems were found, only test code had to be updated. tests: unit(dev) " * 'no-default-timeouts/v1' of https://github.com/denesb/scylla: database: database::query(), database::apply(): remove default timeouts database: table::query(): remove default timeout mutation_query: data_query(): remove default timeout mutation_query: mutation_query(): remove default timeout multishard_mutation_query: query_mutations_on_all_shards(): remove default timeout reader_concurrency_semaphore: wait_admission(): remove default timeout utils/logallog: run_when_memory_available(): remove default timeout	2020-03-01 17:29:17 +02:00
Rafael Ávila de Espíndola	ba453d832b	Pass string_view to keyspace_metadata::new_keyspace This avoids a few sstring copies. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	94d07fba07	Pass string_view to the keyspace_metadata constructor This avoids a few sstring copies when constructing keyspace_metadata. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	2b96abcece	Pass string_view to no_such_column_family's constructor With this we don't have to construct a sstring to construct a no_such_column_family. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Botond Dénes	1073094f04	database: database::query(), database::apply(): remove default timeouts	2020-02-27 19:14:12 +02:00
Botond Dénes	2c1ee7b9cd	database: table::query(): remove default timeout	2020-02-27 19:14:09 +02:00
Botond Dénes	75efa707ce	db/config: add config memory limit of otherwise unlimited queries We have a few kind of queries whose memory consumption is not limited at all. One of these is reverse queries, which reads entire partitions into memory, before reversing them. These partitions can be larger than memory and thus such a query can single-handedly cause OOM. This patch introduces a configuration for a memory limit for such queries. This will serve as a hard limit and queries which attempt to use more memory than this, will be aborted. The limit is propagated to table objects, with the intention of keeping system tables unlimited. These tables are usually small and initiators of system queries are not prepared for failures.	2020-02-27 18:11:54 +02:00
Piotr Sarna	5e07c00eeb	Merge 'Delete table snapshot' from Amnon This series adds an option to the API that supports deleting a specific table from a snapshot. The implementation works in a similar way to the option to specify specific keyspaces when deleting a snapshot. The motivation is to allow reducing disk-space when using the snapshot for backup. A dtest PR is sent to the dtest repository. Fixes #5658 Original PR #5805 Tests: (database_test) (dtest snapshot_test.py:TestSnapshot.test_cleaning_snapshot_by_cf) * amnonh/delete_table_snapshot: test/boost/database_test: adopt new clear_snapshot signature api/storage_service: Support specifying a table when deleting a snapshot storage_service: Add optional table name to clear snapshot * amnonh/delete_table_snapshot: test/boost/database_test: adopt new clear_snapshot signature api/storage_service: Support specifying a table when deleting a snapshot storage_service: Add optional table name to clear snapshot	2020-02-24 09:38:57 +01:00
Tomasz Grabiec	d0b6be0820	Merge "Don't return stale data by properly invalidating row cache after cleanup" from Raphael Row cache needs to be invalidated whenever data in sstables changes. Cleanup removes data from sstables which doesn't belong to the node anymore, which means cache must be invalidated on cleanup. Currently, stale data can be returned when a node re-owns ranges which data are still stored in the node's row cache, because cleanup didn't invalidate the cache." Fixes #4446. tests: - unit tests (dev mode) - dtests: update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test cleanup_test.py	2020-02-20 18:20:56 +01:00
Raphael S. Carvalho	fa16845353	database: Fix on_compaction_completion doc Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-19 19:30:34 -03:00
Raphael S. Carvalho	65b4fc8bcd	sstables/compaction: Introduce compaction_completion_desc This descriptor contain all information needed for table to be properly updated on compaction completion. A new member will be added to it soon, which will store ranges to be invalidated in row cache on behalf of cleanup compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-19 19:29:32 -03:00
Pavel Emelyanov	8435e93549	db: Move unbounded_range_tombstones listening from storage_service Now the database keeps reference on feature service, so we can listen on the feature in it directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-19 14:08:24 +03:00
Amnon Heiman	c3260bad25	storage_service: Add optional table name to clear snapshot There are cases when it is useful to delete specific table from a snapshot. An example is when a snapshot is used for backup. Backup can take a long period of time, during that time, each of the tables can be deleted once it was backup without waiting for the entire backup process to completed. This patch adds such an option to the database and to the storage_service wrapping method that calls it. If a table is specified a filter function is created that filter only the column family with that given name. This is similar to the filtering at the keyspace level. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-02-18 16:34:10 +02:00
Avi Kivity	6c7aa18238	Merge "Introduce schema::get_partitioner" from Piotr " Introduce schema::get_partitioner and use it instead of dht::global_partitioner. Fixes #5493 Tests: unit(dev, release, debug) " * 'per_table_partitioner_prep' of https://github.com/haaawk/scylla: (35 commits) cdc: stop using partitioners partitioner_test: stop calling set_global_partitioner storage_service: stop calling global_partitioner() mutation_writer_test: stop calling global_partitioner() schema: reduce number of global_partitioner() calls test_services: stop calling global_partitioner() sstable_utils: stop calling global_partitioner() sstable_resharding_test: stop depending on global partitioner sstable_mutation_test: stop calling global_partitioner() sstable_data_file_test: stop calling global_partitioner() random_schema: stop taking partitioner in constructor mutation_reader_test: stop calling global_partitioner() multishard_mutation_query_test: stop calling global_partitioner() row_level repair: stop calling global_partitioner() distribute_reader_and_consume_on_shards: don't take partitioner thrift: reduce global_partitioner() calls binary_search: stop calling global_partitioner() index_entry: stop calling global_partitioner() mc writer: stop calling global_partitioner() sstable: stop calling global_partitioner() ...	2020-02-17 18:12:53 +02:00
Piotr Dulikowski	01084a79b8	hh: send orphaned hints on HINT_MUTATION verb When replaying a hint with a destination node that is no longer in the cluster, it will be sent with cl=ALL to all its new replicas. Before this patch, the MUTATION verb was used, which causes such hints to be handled on the same connection and with the same priority as regular writes. This can cause problems when a large number of hints is orphaned and they are scheduled to be sent at once. Such situation may happen when replacing a dead node - all nodes that accumulated hints for the dead node will now send them with cl=ALL to their new replicas. This patch changes the verb used to send such hints to HINT_MUTATION. This verb is handled on a separate connection and with streaming scheduling group, which gives them similar priority to non-orphaned hints. Refs: #4712 Tests: unit(dev)	2020-02-17 14:45:22 +01:00
Tomasz Grabiec	76d1dd7ec6	Merge "nodetool scrub: implement validation and the skip-corrupted flag " from Botond Nodetool scrub rewrites all sstables, validating their data. If corrupt data is found the scrub is aborted. If the skip-corrupted flag is set, corrupt data is instead logged (just the keys) and skipped. The scrubbing algorithm itself is fairly simple, especially that we already have a mutation stream validator that we can use to validate the data. However currently scrub is piggy-backed on top of cleanup compaction. To implement this flag, we have to make scrub a separate compaction type and propagate down the flag. This required some massaging of the code: * Add support for more than two (cleanup or not) compaction types. * Allow passing custom options for each compaction type. * Allow stopping a compaction without the manager retrying it later. Additionally the validator itself needed some changes to allow different ways to handle errors, as needed by the scrub. Fixes: #5487 * https://github.com/denesb/nodetool-scrub-skip-corrupted/v7: table: cleanup_sstables(): only short-circuit on actual cleanup compaction: compaction_type: add Upgrade compaction: introduce compaction_options compaction: compaction_descriptor: use compaction options instead of cleanup flag compaction_manager: collect all cleanup related logic in perform_cleanup() sstables: compaction_stop_exception: add retry flag mutation_fragment_stream_validator: split into low-level and high-level API compaction: introduce scrub_compaction compaction_manager: scrub: don't piggy-back on upgrade_sstables() test: sstable_datafile_test: add scrub unit test	2020-02-17 15:28:07 +02:00
Piotr Jastrzebski	abd76e566f	dht::shard_of: stop calling global_partitioner() Take const schema& as a parameter of shard_of and use it to obtain partitioner instead of calling global_partitioner(). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:23:16 +01:00
Botond Dénes	8014c7124d	compaction_manager: collect all cleanup related logic in perform_cleanup() Currently the call chain for a cleanup collection looks like this: compaction_manager::perform_cleanup() compaction_manager::rewrite_sstables() table::cleanup_sstables() ... `perform_cleanup()` is essentially empty, immediately deferring to `rewrite_sstables()`. Cleanup related logic is scattered between the latter two methods on the call chain. These methods however recently started serving as generic methods for compactions that want to rewrite each sstable one-by-one, collecting cleanup related ifs in various places. The reason is historic, we first had cleanup, then bolted others on top, trying to share the underlying code as much as possible. It is time this is cleaned up (pun intended). Make `perform_cleanup()` the place where all cleanup related logic is, with the rest of the stack made truly generic.	2020-02-11 17:47:44 +02:00
Botond Dénes	b2dc5d4895	compaction: compaction_descriptor: use compaction options instead of cleanup flag Instead of the restrictive `cleanup` boolean flag, which allows for choosing between only two compaction types, use `compaction_options`, which in addition to allowing any number of compaction types to be selected, also allows seamlessly passing specific options to them.	2020-02-11 17:47:44 +02:00
Pavel Emelyanov	1a3f78a57d	database: Use own token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	de1dc59548	migration_manager: Refactor validation of new/updating ksm The goal is to have token_metadata reference intide the keyspace_metadata.validate method. This can be acheived by doing the validation through the database reference which is "at hands" in migration_manager. While at it, merge the validation with exists/not-exists checks done in the same places. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 18:10:38 +03:00
Avi Kivity	bed61b96a2	Merge "Move features from storage- into feature-service" from Pavel " There's a lot of code around that needs storage service purely to get the specific feature value (cluster_supports_<something> calls). This creates several circular dependencies, e.g. storage_service <-> migration_manager one and database <-> storage_servuce. Also features sit on storage_service, but register themselfs on the feature_service and the former subscribes on them back which also looks strange. I propose to keep all the features on feature_service, this keeps the latter intependent from other components, makes it possible to break one of the mentioned circle dependencyand heavily relax the other. Also the set helps us fighting the globals and, after it, the feature_service can be safely stopped at the very last moment. Tests: unit(dev), manual debug build start-stop " * 'br-features-to-service-5' of https://github.com/xemul/scylla: gossiper: Avoid string merge-split for nothing features: Stop on shutdown storage_service: Remove helpers storage_service: Prepare to switch from on-board feature helpers cql3: Check feature in .validate database: Use feature service storage_proxy: Use feature service migration_manager: Use feature service start: Pass needed feature as argument into migrate_truncation_records features: Unfriend storage_service features: Simplify feature registration features: Introduce known_feature_set features: Move disabled features set from storage_service features: Move schema_features helper features: Move all features from storage_service to feature_service storage_service: Use feature_config from _feature_service features: Add feature_config storage_service: Kill set_disabled_features gms: Move features stuff into own .cc file migration_manager: Move some fns into class	2020-02-09 19:22:07 +02:00

1 2 3 4 5 ...

797 Commits