scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 18:10:39 +00:00

Author	SHA1	Message	Date
Botond Dénes	6ca0464af5	mutation_fragment: add schema and permit We want to start tracking the memory consumption of mutation fragments. For this we need schema and permit during construction, and on each modification, so the memory consumption can be recalculated and passed to the permit. In this patch we just add the new parameters and go through the insane churn of updating all call sites. They will be used in the next patch.	2020-09-28 11:27:23 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet, this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Avi Kivity	3976066156	test: sstable_datafile_test: prepare for asynchronously closed sstables_manager sstables_manager will soon be closed asynchronously, with a future-returning close() function. To prepare for that, make the following changes - replace on-stack test_env with test_env::do_with() - use the variant of column_family_for_tests that accepts an sstables_manager - replace test_sstables_manager with an sstables_manager obtained from test_env These changes allow lifetime management of the sstables_manager used in the tests to be centralized in test_env. Since test_env now calls await_background_jobs on termination, those calls are dropped.	2020-09-23 20:55:12 +03:00
Avi Kivity	1c1a737eda	test: sstable_datafile_test: drop bad 'return' The pattern return function_returning_a_future().get(); is legal, but confusing. It returns an unexpected std::tuple<>. Here, it doesn't do any harm, but if we try to coerce the surrounding code into a signature (void ()), then that will fail. Remove the unneeded and unexpected return.	2020-09-23 20:55:06 +03:00
Avi Kivity	c27c2a06bb	test: sstable_datafile_test: reorder table stop in compaction_manager_test Stopping a table will soon close its sstables; so the next check will fail as the number of sstables for the table will be zero. Reorder the stop() call to make it safe. We don't need the stop() for the check, since the previous loop made sure compactions completed.	2020-09-23 20:55:03 +03:00
Pavel Emelyanov	a6e6856e1f	compaction: Keep database reference on cleanup options The database is available at both places that create the options -- tests and API perform_cleanup call. Options object doesn't over-survive the returned future, so it's safe to keep the reference on it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-21 14:58:40 +03:00
Rafael Ávila de Espíndola	6363716799	schema: Pass an rvalue to set_compaction_strategy_options This produces less code and makes sure every caller moves the value. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-19 14:02:35 -07:00
Raphael S. Carvalho	3be1420083	test: Check that TWCS properly performs size-tiered compaction on past windows Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-08-18 15:14:09 -03:00
Raphael S. Carvalho	f2b588cfc4	compaction/twcs: Make newest_bucket() non-static To fix #6928, newest_bucket() will have to access the class fields. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-08-17 12:29:34 -03:00
Raphael S. Carvalho	11df96718a	compaction: Prevent non-regular compaction from picking compacting SSTables After `8014c7124`, cleanup can potentially pick a compacting SSTable. Upgrade and scrub can also pick a compacting SSTable. The problem is that table::candidates_for_compaction() was badly named. It misleads the user into thinking that the SSTables returned are perfect candidates for compaction, but the manager still needs to filter out the compacting SSTables from the returned set. So it's being renamed. When the same SSTable is compacted in parallel, the strategy invariant can be broken, for example by overlapping being introduced in LCS, and deletion can also fail, as more than one compaction process would try to delete the same files. Let's fix scrub, cleanup and upgrade by calling the manager function which gets the correct candidates for compaction. Fixes #6938. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200811200135.25421-1-raphaelsc@scylladb.com>	2020-08-16 17:31:03 +03:00
Piotr Jastrzebski	c001374636	codebase wide: replace count with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the `count` function was often used in various ways. `contains` not only expresses the intent of the code better but also does so in a more unified way. This commit replaces all the occurrences of `count` with `contains`. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>	2020-08-15 20:26:02 +03:00
Rafael Ávila de Espíndola	bd2f9fc685	test: Move sstable_run_based_compaction_strategy_for_tests.hh to test/lib This is in preparation to moving the code to a .cc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-11 11:48:41 -07:00
Avi Kivity	3530e80ce1	Merge "Support md format" from Benny " This series adds support for the "md" sstable format. Support is based on the following: * do not use clustering based filtering in the presence of static row, tombstones. * Disabling min/max column names in the metadata for formats older than "md". * When updating the metadata, reset and disable min/max in the presence of range tombstones (like Cassandra does and until we process them accurately). * Fix the way we maintain min/max column names by: keeping whole clustering key prefixes as min/max rather than calculating min/max independently for each component, like Cassandra does in the "md" format. Fixes #4442 Tests: unit(dev), cql_query_test -t test_clustering_filtering* (debug) md migration_test dtest from git@github.com:bhalevy/scylla-dtest.git migration_test-md-v1 " * tag 'md-format-v4' of github.com:bhalevy/scylla: (27 commits) config: enable_sstables_md_format by default test: cql_query_test: add test_clustering_filtering unit tests table: filter_sstable_for_reader: allow clustering filtering md-format sstables table: create_single_key_sstable_reader: emit partition_start/end for empty filtered results table: filter_sstable_for_reader: adjust to md-format table: filter_sstable_for_reader: include non-scylla sstables with tombstones table: filter_sstable_for_reader: do not filter if static column is requested table: filter_sstable_for_reader: refactor clustering filtering conditional expression features: add MD_SSTABLE_FORMAT cluster feature config: add enable_sstables_md_format database: add set_format_by_config test: sstable_3_x_test: test both mc and md versions test: Add support for the "md" format sstables: mx/writer: use version from sstable for write calls sstables: mx/writer: update_min_max_components for partition tombstone sstables: metadata_collector: support min_max_components for range tombstones sstable: validate_min_max_metadata: drop outdated logic sstables: rename mc folder to mx sstables: may_contain_rows: always true for old formats sstables: add may_contain_rows ...	2020-08-11 13:29:11 +03:00
Piotr Jastrzebski	80e3923b3c	codebase wide: replace find(...) != end() with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the code pattern looked like: <collection>.find(<element>) != <collection>.end() In C++20 the same can be expressed with: <collection>.contains(<element>) This is not only more concise but also expresses the intent of the code more clearly. This commit replaces all the occurrences of the old pattern with the new approach. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>	2020-08-11 13:28:50 +03:00
Benny Halevy	bd4383a842	sstables: mx/writer: update_min_max_components for partition tombstone Partition tombstones represent an implicit clustering range that is unbound on both sides, so reflect that in min/max column names metadata using empty clustering key prefixes. If we don't do that, when using the sstable for filtering, we have no other way of distinguishing range tombstones from partition tombstones given the sstable metadata and we would need to include any sstable with tombstones, even if those are range tombstones, for which we can do a better filtering job, using the sstable min/max column names metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	68acae5873	sstables: metadata_collector: support min_max_components for range tombstones We essentially treat min/max column names as range bounds with min as incl_start and max as incl_end. By generating a bound_view for min/max column names on the fly, we can correctly track and compare also short clustering key prefixes that may be used as bounds for range tombstones. Extend the sstable_tombstone_metadata_check unit test to cover these cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Pekka Enberg	a37eaaa022	sstables: Add support for the "md" format enum value Add the sstable_version_types::md enum value and logically extend sstable_version_types comparisons to cover also the > sstable_version_types::mc cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	9f114d821a	sstables: keep whole clustering_key_prefix as min/max_column_names Currently we compare each min/max component independently. This may lead to suboptimal, inclusive clustering ranges that do not indicate any actual key we encountered. For example: ['a', 2], ['b', 1] will lead to min=['a', 1], max=['b', 2] instead of the keys themselves. This change keeps the min or max keys as a whole. It considers shorter clustering prefixes (that are possible with compact storage) as range tombstone bounds, so that a shorter key is considered less than the minimum if the latter has a common prefix, and greater than the maximum if the latter has a common prefix. Extend the min_max_clustering_key_test to test for this case. Previously {"a", "2"}, {"b", "1"} clustering keys would erroneously end up with min={"a", "1"} max={"b", "2"} while we want them to be min={"a", "2"} max={"b", "1"}. Adjust sstable_3_x_test to ignore original mc sstables that were previously computed with different min/max column names. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:03 +03:00
Avi Kivity	257c17a87a	Merge "Don't depend on seastar::make_(lw_)?shared idiosyncrasies" from Rafael " While working on another patch I was getting odd compiler errors saying that a call to ::make_shared was ambiguous. The reason was that seastar has both: template <typename T, typename... A> shared_ptr<T> make_shared(A&&... a); template <typename T> shared_ptr<T> make_shared(T&& a); The second variant doesn't exist in std::make_shared. This series drops the dependency in scylla, so that a future change can make seastar::make_shared a bit more like std::make_shared. " * 'espindola/make_shared' of https://github.com/espindola/scylla: Everywhere: Explicitly instantiate make_lw_shared Everywhere: Add a make_shared_schema helper Everywhere: Explicitly instantiate make_shared cql3: Add a create_multi_column_relation helper main: Return a shared_ptr from defer_verbose_shutdown	2020-08-02 19:51:24 +03:00
Botond Dénes	fe127a2155	sstables: clamp estimated_partitions to [1, +inf) in writers In some cases the estimated number of partitions can be 0, which, albeit a legitimate estimation result, breaks a lot of low-level sstable writer code, so some of these have assertions to ensure estimated partitions is > 0. To avoid hitting this assert all users of the sstable writers do the clamping, to ensure estimated partitions is at least 1. However leaving this to the callers is error prone, as #6913 has shown. As this clamping is standard practice, it is better to do it in the writers themselves, avoiding this problem altogether. This is exactly what this patch does. It also adds two unit tests, one that reproduces the crash in #6913, and another one that ensures all sstable writers are fine with estimated partitions being 0 now. Call sites previously doing the clamping are changed to not do it, it is unnecessary now as the writer does it itself. Fixes #6913 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200724120227.267184-1-bdenes@scylladb.com>	2020-07-27 09:19:37 +02:00
Rafael Ávila de Espíndola	e15c8ee667	Everywhere: Explicitly instantiate make_lw_shared seastar::make_lw_shared has a constructor taking a T&&. There is no such constructor in std::make_shared: https://en.cppreference.com/w/cpp/memory/shared_ptr/make_shared This means that we have to move from make_lw_shared(T(...) to make_lw_shared<T>(...) If we don't want to depend on the idiosyncrasies of seastar::make_lw_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-21 10:33:49 -07:00
Rafael Ávila de Espíndola	efeaded427	Everywhere: Add a make_shared_schema helper This replaces a lot of make_lw_shared(schema(...)) with make_shared_schema(...). This makes it easier to drop a dependency on the differences between seastar::make_shared and std::make_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-21 10:33:49 -07:00
Rafael Ávila de Espíndola	66d866427d	sstable_datafile_test: Use BOOST_REQUIRE_EQUAL This only works for types that can be printed, but produces a better error message if the check fails. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200716232700.521414-1-espindola@scylladb.com>	2020-07-17 11:58:58 +03:00
Raphael S. Carvalho	cf352e7c14	sstables: optimize procedure that checks if a sstable needs cleanup needs_cleanup() returns true if a sstable needs cleanup. Turns out it's very slow because it iterates through all the local ranges for all sstables in the set, making its complexity: O(num_sstables * local_ranges) We can optimize it by taking into account that abstract_replication_strategy documents that get_ranges() will return a list of ranges that is sorted and non-overlapping. Compaction for cleanup already takes advantage of that when checking if a given partition can be actually purged. So needs_cleanup() can be optimized into O(num_sstables * log(local_ranges)). With num_sstables=1000, RF=3, then local_ranges=256(num_tokens)*3, it means the max # of checks performed will go from 768000 to ~9584. Fixes #6730. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200629171355.45118-2-raphaelsc@scylladb.com>	2020-06-30 12:58:43 +03:00
Raphael S. Carvalho	8e47f61df7	compaction: Enable tombstone expiration based on the presence of the sstable set For tombstone expiration to proceed correctly without the risk of resurrecting data, the sstable set must be present. Regular compaction and derivatives provide the sstable set, so they're able to expire tombstones with no resurrection risk. Resharding, on the other hand, can run on any shard, not necessarily on the same shard that one of the input sstables belongs to, so it currently cannot provide a sstable set for tombstone expiration to proceed safely. That being said, let's only do expiration based on the presence of the set. This makes room for the sstable set to be fed to compaction via descriptor, allowing even resharding to do expiration. Currently, compaction thinks that sstable set can only come from the table, and that also needs to be changed for further flexibility. It's theoretically possible that a given resharding job will resurrect data if a fully expired SSTable is resharded at a shard which it doesn't belong to. Resharding will have no way to tell that expiring all that data will lead to resurrection because the relevant SSTables are at different shards. This is fixed by checking for fully expired sstables only on presence of the sstable set. Fixes #6600. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200605200954.24696-1-raphaelsc@scylladb.com>	2020-06-07 11:46:48 +03:00
Raphael S. Carvalho	fb6976f1b9	Make sure SSTables created by streaming are added to backlog tracker New SSTables are only added to backlog tracker if set_unshared() was called on their behalf. SSTables created for streaming are not being added to the tracker because make_streaming_sstable_for_write() doesn't call set_unshared(), nor does its caller. As a result, the backlog does not account for their existence, which means backlog will be much lower than expected. This problem could be fixed by adding a set_unshared() call but it turns out we don't even need set_unshared() anymore. It was introduced when Scylla metadata didn't exist, now a SSTable has built-in knowledge of whether or not it's shared. Relying on every SSTable creator calling set_unshared() is bug prone. Let's get rid of it and let the SSTable itself say whether or not it's shared. If an imported SSTable has no Scylla metadata, Scylla will still be able to compute shards using token range metadata. Refs #6021. Refs #6227. Fixes #6441. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200512220226.134481-1-raphaelsc@scylladb.com>	2020-06-03 17:35:22 +03:00
Avi Kivity	0c6bbc84cd	Merge "Classify queries based on their initiator, rather than their target" from Botond " Currently we classify queries as "system" or "user" based on the table they target. The class of a query determines how the query is treated, currently: timeout, limits for reverse queries and the concurrency semaphore. The catch is that users are also allowed to query system tables and when doing so they will bypass the limits intended for user queries. This has caused performance problems in the past, yet the reason we decided to finally address this is that we want to introduce a memory limit for unpaged queries. Internal (system) queries are all unpaged and we don't want to impose the same limit on them. This series uses scheduling groups to distinguish user and system workloads, based on the assumption that user workloads will run in the statement scheduling group, while system workloads will run in the main (or default) scheduling group, or perhaps something else, but in any case not in the statement one. Currently the scheduling group of reads and writes is lost when going through the messaging service, so to be able to use scheduling groups to distinguish user and system reads this series refactors the messaging service to retain this distinction across verb calls. Furthermore, we execute some system reads/writes as part of user reads/writes, such as auth and schema sync. These processes are tagged to run in the main group. This series also centralises query classification on the replica and moves it to a higher level. More specifically, queries are now classified -- the scheduling group they run in is translated to the appropriate query class specific configuration -- on the database level and the configuration is propagated down to the lower layers. Currently this query class specific configuration consists of the reader concurrency semaphore and the max memory limit for otherwise unlimited queries. A corollary of the semaphore being selected on the database level is that the read permit is now created before the read starts. A valid permit is now available during all stages of the read, enabling tracking the memory consumption of e.g. the memtable and cache readers. This change aligns nicely with the needs of more accurate reader memory tracking, which also wants a valid permit that is available in every layer. The series can be divided roughly into the following distinct patch groups: * 01-02: Give system read concurrency a boost during startup. * 03-06: Introduce user/system statement isolation to messaging service. * 07-13: Various infrastructure changes to prepare for using read permits in all stages of reads. * 14-19: Propagate the semaphore and the permit from database to the various table methods that currently create the permit. * 20-23: Migrate away from using the reader concurrency semaphore for waiting for admission, use the permit instead. * 24: Introduce `database::make_query_config()` and switch the database methods needing such a config to use it. * 25-31: Get rid of all uses of `no_reader_permit()`. * 32-33: Ban empty permits for good. * 34: querier_cache: use the queriers' permits to obtain the semaphore. Fixes: #5919 Tests: unit(dev, release, debug), dtest(bootstrap_test.py:TestBootstrap.start_stop_test_node), manual testing with a 2 node mixed cluster with extra logging. " * 'query-class/v6' of https://github.com/denesb/scylla: (34 commits) querier_cache: get semaphore from querier reader_permit: forbid empty permits reader_permit: fix reader_resources::operator bool treewide: remove all uses of no_reader_permit() database: make_multishard_streaming_reader: pass valid permit to multi range reader sstables: pass valid permits to all internal reads compaction: pass a valid permit to sstable reads database: add compaction read concurrency semaphore view: use valid permits for reads from the base table database: use valid permit for counter read-before-write database: introduce make_query_class_config() reader_concurrency_semaphore: remove wait_admission and consume_resources() test: move away from reader_concurrency_semaphore::wait_admission() reader_permit: resource_units: introduce add() mutation_reader: restricted_reader: work in terms of reader_permit row_cache: pass a valid permit to underlying read memtable: pass a valid permit to the delegate reader table: require a valid permit to be passed to most read methods multishard_mutation_query: pass a valid permit to shard mutation sources querier: add reader_permit parameter and forward it to the mutation_source ...	2020-05-29 10:11:44 +03:00
Raphael S. Carvalho	097a5e9e07	compaction: Disable garbage collected writer if interposer consumer is used GC writer, used for incremental compaction, cannot be currently used if interposer consumer is used. That's because compaction assumes that GC writer will be operated only by a single compaction writer at a given point in time. With interposer consumer, multiple writers will concurrently operate on the same GC writer, leading to a race condition which can potentially result in use-after-free. Let's disable GC writer if interposer consumer is enabled. We're not losing anything because GC writer is currently only needed on strategies which don't implement an interposer consumer. Resharding will always disable GC writer, which is the expected behavior because it doesn't support incremental compaction yet. The proper fix, which allows GC writer and interposer consumer to work together, will require more time to implement and test, and for that reason, I am postponing it as #6472 is a showstopper for the current release. Fixes #6472. tests: mode(dev). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200526195428.230472-1-raphaelsc@scylladb.com>	2020-05-29 08:26:43 +02:00
Botond Dénes	d68ac8bf18	treewide: remove all uses of no_reader_permit()	2020-05-28 11:34:35 +03:00
Botond Dénes	e4c591aa67	database: introduce make_query_class_config() And use it to obtain any query-class specific configuration that was obtained from `table::config` before, such as the read concurrency semaphore and the max memory limit for unlimited queries. As all users of these items get these from the query class config now, we can remove them from `table::config`.	2020-05-28 11:34:35 +03:00
Botond Dénes	cc5137ffe3	table: require a valid permit to be passed to most read methods Now that the most prevalent users (range scan and single partition reads) all pass valid permits we require all users to do so and propagate the permit down towards `make_sstable_reader()`. The plan is to use this permit for restricting the sstable readers, instead of the semaphore the table is configured with. The various `make_streaming_*reader()` overloads keep using the internal semaphores, but they also create the permit before the read starts and pass it to `make_sstable_reader()`.	2020-05-28 11:34:35 +03:00
Glauber Costa	e29701ca1c	compaction_manager: expand state to be able to differentiate between enabled and stopped We are having many issues with the stop code in the compaction_manager. Part of the reason is that the "stopped" state has its meaning overloaded to indicate both "compaction manager is not accepting compactions" and "compaction manager is not ready or destructed". In a later step we could default to enabled-at-start, but right now we maintain current behavior to minimize noise. It is only possible to stop the compaction manager once. It is possible to enable / disable the compaction manager many times. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-05-13 16:51:25 -04:00
Glauber Costa	70a89ab4ab	compaction: do not assume I/O priority class We shouldn't assume the I/O priority class for compactions. For instance, if we are dealing with offstrategy compactions we may want to use the maintenance group priority for them. For now, all compactions are put in the compaction class. Rewrite compactions (scrub, cleanup) could be maintenance, but we don't have clear access to the database object at this time to derive the equivalent CPU priority. This is planned to be changed in the future, and when we do change it, we'll adjust. Same goes for resharding: while we could change it at this point, we'd risk memory pressure since resharding is run online and sstables are shared until resharding is done. When we move it to offline execution we'll do it with maintenance priority. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200512002233.306538-3-glauber@scylladb.com>	2020-05-12 08:23:19 +03:00
Ivan Prisyazhnyy	84e25e8ba4	api: support table auto compaction control The patch implements: - /storage_service/auto_compaction API endpoint - /column_family/autocompaction/{name} API endpoint Those APIs allow to control and request the status of background compaction jobs for the existing tables. The implementation introduces the table::_compaction_disabled_by_user. Then the CompactionManager checks if it can push the background compaction job for the corresponding table. New members === table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const Test === Tests: unit(sstable_datafile_test autocompaction_control_test), manual $ ninja build/dev/test/boost/sstable_datafile_test $ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000 The test tries to submit a compaction job after playing with autocompaction control table switch. However, there is no reliable way to hook pending compaction task. The code assumed that with_scheduling_group() closure will never preempt execution of the stats check. Revert === Reverts commit `c8247ac`. In previous version the execution sometimes resulted into the following error: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed This version adds a few sstables to the cf, starts the compaction and awaits until it is finished. API change === - `/column_family/autocompaction/` always returned `true` while answering to the question: if the autocompaction disabled (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). now it answers to the question: if the autocompaction for specific table is enabled. The question logic is inverted. The patch to the JMX is required. However, the change is decent because all old values were invalid (it always reported all compactions are disabled). - `/column_family/autocompaction/` got support for POST/DELETE per table Fixes === Fixes #1488 Fixes #1808 Fixes #440 Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com>	2020-05-07 16:23:38 +03:00
Raphael S. Carvalho	a214ccdf89	sstables/compaction: Don't invalidate row cache when adding GC SSTable to SSTable set Garbage collected SSTable is incorrectly added to SSTable set with a function that invalidates row cache. This problem is fixed by adding GC SStable to set using mechanism which replaces old sstables with new sstables. Also, adding GC SSTable to set in a separate call is not correct. We should make sure that GC SSTable reaches the SSTable set at the same time its respective old (input) SSTable is removed from the set, and that's done using a single request call to table. Fixes #5956. Fixes #6275. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-05-05 12:03:19 -03:00
Raphael S. Carvalho	8f4458f1d5	sstables/compaction: Change meaning of compaction_completion_desc input and output fields input_sstables is renamed to old_sstables and is about old SSTables that should be deleted and removed from the SSTable set. output_sstables is renamed to new_sstables and is about new SSTable that should be added to the SSTable set, replacing the old ones. This will allow us, for example, to add auxiliary SSTables to SSTable set using the same call which replaces output SSTables by input SSTables in compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-05-05 12:03:08 -03:00
Glauber Costa	55f5ca39a9	sstable_test: rework test to use a thread The compaction_manager test lives inside a thread and it is not taking advantage of it, with continuations all over. One of the side effects of it is that the test is calling stop() twice on the compaction_manager. While this works today, it is not good practice. A change I am making is just about to break it. This patch converts the test to fully use .get() instead of chained continuations and in doing so also guarantees that the compaction manager will be RAII-stopped just once, from a defer object. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200503161420.8346-2-glauber@scylladb.com>	2020-05-03 19:54:04 +03:00
Pekka Enberg	c8247aced6	Revert "api: support table auto compaction control" This reverts commit `1c444b7e1e`. The test it adds sometimes fails as follows: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed Ivan is working on a fix, but let's revert this commit to avoid blocking the next promotion when the test fails from time to time.	2020-04-11 17:56:02 +03:00
Ivan Prisyazhnyy	1c444b7e1e	api: support table auto compaction control This patch adds API endpoint /column_family/autocompaction/{name} that listens to GET and POST requests to pick and control table background compactions. To implement that the patch introduces "_compaction_disabled_by_user" flag that affects if CompactionManager is allowed to push background compaction jobs into the work. It introduces table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const to control auto compaction state. Fixes #1488 Fixes #1808 Fixes #440 Tests: unit(sstable_datafile_test autocompaction_control_test), manual	2020-04-08 21:18:38 +03:00
Avi Kivity	e9e2b75a76	Merge "Allow Major compactions for TWCS" from Glauber " This patch makes major compaction aware of time buckets for TWCS. That means that calling a major compaction with TWCS will not bundle all SSTables together, but rather split them based on their timestamps. There are two motivations for this work: Telling users not to ever major compact is easier said than done: in practice due to a variety of circumstances it might end up being done in which case data will have a hard time expiring later. We are about to start working with offstrategy compactions, which are compactions that work in parallel with the main compactions. In those cases we may be converting SSTables from one format to another and it might be necessary to split a single big STCS SSTable into something that TWCS expects In order to achieve that, we start by changing the way resharding works: it will now work with a read interposer, similar to the one TWCS uses for streaming data. Once we do that, a lot of assumptions that exist in the compaction code can be simplified and supporting TWCS major compactions become a matter of simply enabling its interposer in the compaction code as well. There are many further simplifications that this work exposes: The compaction method create_new_sstable seems out of place. It is not used by resharding, and it seems duplicated for normal compactions. We could clean it up with more refactoring in a later patch. The whole logic of the feed_writer could be part of the consumer code. Testing details: scylla unit tests (dev, release) sstable_datafile_test (debug) dtests (resharding_test.py) manual scylla resharding Fixes #1431 " Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> * 'twcs-major-v3' of github.com:glommer/scylla: compaction: make major compaction time-aware with TWCS compaction: do resharding through an interposer mutation_writer: introduce shard_based splitting writer mutation_writer: factor out part of the code for the timestamp splitter compaction: abort if create_new_sstable is called from resharding	2020-04-06 12:54:08 +03:00
Avi Kivity	88ade3110f	treewide: replace calls to engine().some_api() with some_api() This removes the need to include reactor.hh, a source of compile time bloat. In some places, the call is qualified with seastar:: in order to resolve ambiguities with a local name. Includes are adjusted to make everything compile. We end up having 14 translation units including reactor.hh, primarily for deprecated things like reactor::at_exit(). Ref #1	2020-04-05 12:46:04 +03:00
Glauber Costa	098b215b0d	compaction: make major compaction time-aware with TWCS This patch makes major compaction aware of time buckets for TWCS. That means that calling a major compaction with TWCS will not bundle all SSTables together, but rather split them based on their timestamps. There are two motivations for this work: 1. Telling users not to ever major compact is easier said than done: in practice due to a variety of circumstances it might end up being done in which case data will have a hard time expiring later. 2. We are about to start working with offstrategy compactions, which are compactions that work in parallel with the main compactions. In those cases we may be converting SSTables from one format to another and it might be necessary to split a single big STCS SSTable into something that TWCS expects. With the motivation out of the way, let's talk about the implementation: The implementation is quite simple and builds upon the previous patches. It simply specializes the interposer implementation for regular compaction with a table-specific interposer. Fixes #1431 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-03 10:10:10 -04:00
Pekka Enberg	75b55cea88	Merge "Resharding through compact sstables" from Glauber " This patchseries is part of my effort to make resharding less special - and hopefully less problematic. The next steps are a bit heavy, so I'd like to, if possible, get this out of the way. After these two patches, there is no more need to ever call reshard_sstables: compact_sstables will do, and it will be able to recognize resharding compactions. To do that we need to unify the creator function, which is trivially done by adding a shard parameter to regular compactions as well: they can just ignore it. I have considered just making the compaction_descriptor have a virtual create() function and specializing it, but because we have to store the creator in the compaction object I decided to keep the virtual function for now. In a later cleanup step, if we can for instance store the entire compaction_descriptor object in the compaction object we could do that. Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Botond Dénes <bdenes@scylladb.com> Tests: unit tests (dev), dtest (resharding.py) " * 'resharding-through-compact-sstables' of github.com:glommer/scylla: resharding: get rid of special reshard_sstables compaction: enhance compaction_descriptor with creator and replace function	2020-04-02 14:43:35 +02:00
Glauber Costa	e8801cd77b	compaction: enhance compaction_descriptor with creator and replace function There are many differences between resharding and compaction that are artificial, arising more from the way we ended up implementing it than necessity. This patch attempts to pass the creator and replacer functions through the compaction_descriptor. There is a difference between the creator function for resharding and regular compaction: resharding has to pass the shard number on behalf of which the SSTable is created. However regular compactions can just ignore this. No need to have a special path just for this. After this is done, the constructor for the compaction object can be greatly simplified. In further patches I intend to simplify it a bit further, but some more cleanup has to happen first. To make that happen we have to construct a compaction_descriptor object inside the resharding function. This is temporary: resharding currently works with a descriptor, but at some point that descriptor is lost and broken into pieces to be passed to this function. The overarching goal of this work is exactly to be able to keep that descriptor for as long as possible, which should simplify things a lot. Callers are patched, but there are plenty for sstable_datafile_test.cc. For their benefit, a helper function is provided to keep the previous signature (test only). Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-03-31 19:41:25 -04:00
Piotr Jastrzebski	e72696a8e6	sharding_info: rename the class to sharder Also rename all variables that were named si or sinfo to sharder. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	7bd2b8d73f	schema: make it possible to set sharding_info per schema Previously schema::get_sharding_info was obtaining sharding_info from the partitioner but we want to remove sharding_info from the partitioner so we need a place in schema to store it there instead. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	dc2e060313	create_token_range_from_keys: use sharding info for shard_of Replace i_partitioner::shard_of with sharding_info::shard_of Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Glauber Costa	dd65f7dcbb	tests: move token_generation_for_shard to common code We now have a utils file for SSTables. This is potentially useful for other tests. As a matter of fact, this function is repeated right now for the resharding test. And to add insult to injury, the version in the resharding test has the parameters shard and number of tokens flipped, which, although extremely confusing, is the predictable outcome of such repetition. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-03-22 19:00:26 +02:00
Piotr Jastrzebski	7064f6b831	partitioner: hide dht::default_partitioner Remove last usage of this global outside i_partitioner.cc and hide it inside the compilation unit. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	54d24553bb	schema: get_partitioner return const& Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 13:33:53 +01:00

69 Commits