scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 20:05:10 +00:00

Author	SHA1	Message	Date
Botond Dénes	cc5137ffe3	table: require a valid permit to be passed to most read methods Now that the most prevalent users (range scan and single partition reads) all pass valid permits, we require all users to do so and propagate the permit down towards `make_sstable_reader()`. The plan is to use this permit for restricting the sstable readers, instead of the semaphore the table is configured with. The various `make_streaming_*reader()` overloads keep using the internal semaphores, but they also create the permit before the read starts and pass it to `make_sstable_reader()`.	2020-05-28 11:34:35 +03:00
Glauber Costa	e29701ca1c	compaction_manager: expand state to be able to differentiate between enabled and stopped We are having many issues with the stop code in the compaction_manager. Part of the reason is that the "stopped" state has its meaning overloaded to indicate both "compaction manager is not accepting compactions" and "compaction manager is not ready or destructed". In a later step we could default to enabled-at-start, but right now we maintain current behavior to minimize noise. It is only possible to stop the compaction manager once. It is possible to enable / disable the compaction manager many times. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-05-13 16:51:25 -04:00
Glauber Costa	70a89ab4ab	compaction: do not assume I/O priority class We shouldn't assume the I/O priority class for compactions. For instance, if we are dealing with offstrategy compactions we may want to use the maintenance group priority for them. For now, all compactions are put in the compaction class. Rewrite compactions (scrub, cleanup) could be maintenance, but we don't have clear access to the database object at this time to derive the equivalent CPU priority. This is planned to be changed in the future, and when we do change it, we'll adjust. The same goes for resharding: while we could change it at this point, we'd be risking memory pressure since resharding is run online and sstables are shared until resharding is done. When we move it to offline execution we'll do it with maintenance priority. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200512002233.306538-3-glauber@scylladb.com>	2020-05-12 08:23:19 +03:00
Ivan Prisyazhnyy	84e25e8ba4	api: support table auto compaction control The patch implements: - /storage_service/auto_compaction API endpoint - /column_family/autocompaction/{name} API endpoint These APIs allow controlling and querying the status of background compaction jobs for existing tables. The implementation introduces table::_compaction_disabled_by_user. The CompactionManager then checks whether it can push the background compaction job for the corresponding table. New members === table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const Test === Tests: unit(sstable_datafile_test autocompaction_control_test), manual $ ninja build/dev/test/boost/sstable_datafile_test $ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000 The test tries to submit a compaction job after toggling the table's autocompaction control switch. However, there is no reliable way to hook into a pending compaction task. The code assumed that the with_scheduling_group() closure would never preempt execution of the stats check. Revert === Reverts commit `c8247ac`. In the previous version, execution sometimes resulted in the following error: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed This version adds a few sstables to the cf, starts the compaction and waits until it is finished. API change === - `/column_family/autocompaction/` always returned `true` when answering the question of whether autocompaction is disabled (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). Now it answers whether autocompaction for the specific table is enabled, so the logic of the question is inverted. A corresponding patch to the JMX is required. However, the change is acceptable because all old values were invalid (it always reported that all compactions were disabled). - `/column_family/autocompaction/` gained support for POST/DELETE per table Fixes === Fixes #1488 Fixes #1808 Fixes #440 Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com>	2020-05-07 16:23:38 +03:00
Raphael S. Carvalho	a214ccdf89	sstables/compaction: Don't invalidate row cache when adding GC SSTable to SSTable set The garbage-collected SSTable is incorrectly added to the SSTable set with a function that invalidates the row cache. This problem is fixed by adding the GC SSTable to the set using the mechanism which replaces old sstables with new sstables. Also, adding the GC SSTable to the set in a separate call is not correct. We should make sure that the GC SSTable reaches the SSTable set at the same time its respective old (input) SSTable is removed from the set, and that's done using a single request call to table. Fixes #5956. Fixes #6275. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-05-05 12:03:19 -03:00
Raphael S. Carvalho	8f4458f1d5	sstables/compaction: Change meaning of compaction_completion_desc input and output fields input_sstables is renamed to old_sstables and is about old SSTables that should be deleted and removed from the SSTable set. output_sstables is renamed to new_sstables and is about new SSTables that should be added to the SSTable set, replacing the old ones. This will allow us, for example, to add auxiliary SSTables to the SSTable set using the same call which replaces input SSTables with output SSTables in compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-05-05 12:03:08 -03:00
Glauber Costa	55f5ca39a9	sstable_test: rework test to use a thread The compaction_manager test lives inside a thread but is not taking advantage of it, with continuations all over. One of the side effects is that the test calls stop() twice on the compaction_manager. While this works today, it is not good practice, and a change I am about to make will break it. This patch converts the test to fully use .get() instead of chained continuations, and in doing so also guarantees that the compaction manager will be RAII-stopped just once, from a defer object. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200503161420.8346-2-glauber@scylladb.com>	2020-05-03 19:54:04 +03:00
Pekka Enberg	c8247aced6	Revert "api: support table auto compaction control" This reverts commit `1c444b7e1e`. The test it adds sometimes fails as follows: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed Ivan is working on a fix, but let's revert this commit so that the intermittently failing test does not block the next promotion.	2020-04-11 17:56:02 +03:00
Ivan Prisyazhnyy	1c444b7e1e	api: support table auto compaction control This patch adds the API endpoint /column_family/autocompaction/{name}, which listens to GET and POST requests to query and control background compactions of a table. To implement that, the patch introduces the "_compaction_disabled_by_user" flag, which determines whether the CompactionManager is allowed to push background compaction jobs. It introduces table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const to control the auto compaction state. Fixes #1488 Fixes #1808 Fixes #440 Tests: unit(sstable_datafile_test autocompaction_control_test), manual	2020-04-08 21:18:38 +03:00
Avi Kivity	e9e2b75a76	Merge "Allow Major compactions for TWCS" from Glauber " This patch makes major compaction aware of time buckets for TWCS. That means that calling a major compaction with TWCS will not bundle all SSTables together, but rather split them based on their timestamps. There are two motivations for this work: Telling users not to ever major compact is easier said than done: in practice, due to a variety of circumstances, it might end up being done, in which case data will have a hard time expiring later. We are about to start working with offstrategy compactions, which are compactions that work in parallel with the main compactions. In those cases we may be converting SSTables from one format to another and it might be necessary to split a single big STCS SSTable into something that TWCS expects. In order to achieve that, we start by changing the way resharding works: it will now work with a read interposer, similar to the one TWCS uses for streaming data. Once we do that, a lot of assumptions that exist in the compaction code can be simplified, and supporting TWCS major compactions becomes a matter of simply enabling its interposer in the compaction code as well. There are many further simplifications that this work exposes: The compaction method create_new_sstable seems out of place. It is not used by resharding, and it seems duplicated for normal compactions. We could clean it up with more refactoring in a later patch. The whole logic of the feed_writer could be part of the consumer code. Testing details: scylla unit tests (dev, release) sstable_datafile_test (debug) dtests (resharding_test.py) manual scylla resharding Fixes #1431 " Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> * 'twcs-major-v3' of github.com:glommer/scylla: compaction: make major compaction time-aware with TWCS compaction: do resharding through an interposer mutation_writer: introduce shard_based splitting writer mutation_writer: factor out part of the code for the timestamp splitter compaction: abort if create_new_sstable is called from resharding	2020-04-06 12:54:08 +03:00
Avi Kivity	88ade3110f	treewide: replace calls to engine().some_api() with some_api() This removes the need to include reactor.hh, a source of compile time bloat. In some places, the call is qualified with seastar:: in order to resolve ambiguities with a local name. Includes are adjusted to make everything compile. We end up having 14 translation units including reactor.hh, primarily for deprecated things like reactor::at_exit(). Ref #1	2020-04-05 12:46:04 +03:00
Glauber Costa	098b215b0d	compaction: make major compaction time-aware with TWCS This patch makes major compaction aware of time buckets for TWCS. That means that calling a major compaction with TWCS will not bundle all SSTables together, but rather split them based on their timestamps. There are two motivations for this work: 1. Telling users not to ever major compact is easier said than done: in practice, due to a variety of circumstances, it might end up being done, in which case data will have a hard time expiring later. 2. We are about to start working with offstrategy compactions, which are compactions that work in parallel with the main compactions. In those cases we may be converting SSTables from one format to another and it might be necessary to split a single big STCS SSTable into something that TWCS expects. With the motivation out of the way, let's talk about the implementation: The implementation is quite simple and builds upon the previous patches. It simply specializes the interposer implementation for regular compaction with a table-specific interposer. Fixes #1431 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-03 10:10:10 -04:00
Pekka Enberg	75b55cea88	Merge "Resharding through compact sstables" from Glauber " This patch series is part of my effort to make resharding less special - and hopefully less problematic. The next steps are a bit heavy, so I'd like to, if possible, get this out of the way. After these two patches, there is no more need to ever call reshard_sstables: compact_sstables will do, and it will be able to recognize resharding compactions. To do that we need to unify the creator function, which is trivially done by adding a shard parameter to regular compactions as well: they can just ignore it. I have considered just making the compaction_descriptor have a virtual create() function and specializing it, but because we have to store the creator in the compaction object I decided to keep the virtual function for now. In a later cleanup step, if we can for instance store the entire compaction_descriptor object in the compaction object we could do that. Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Botond Dénes <bdenes@scylladb.com> Tests: unit tests (dev), dtest (resharding.py) " * 'resharding-through-compact-sstables' of github.com:glommer/scylla: resharding: get rid of special reshard_sstables compaction: enhance compaction_descriptor with creator and replace function	2020-04-02 14:43:35 +02:00
Glauber Costa	e8801cd77b	compaction: enhance compaction_descriptor with creator and replace function There are many differences between resharding and compaction that are artificial, arising more from the way we ended up implementing it than necessity. This patch attempts to pass the creator and replacer functions through the compaction_descriptor. There is a difference between the creator function for resharding and regular compaction: resharding has to pass the shard number on behalf of which the SSTable is created. However, regular compactions can just ignore this. No need to have a special path just for this. After this is done, the constructor for the compaction object can be greatly simplified. In further patches I intend to simplify it a bit further, but some more cleanup has to happen first. To make that happen we have to construct a compaction_descriptor object inside the resharding function. This is temporary: resharding currently works with a descriptor, but at some point that descriptor is lost and broken into pieces to be passed to this function. The overarching goal of this work is exactly to be able to keep that descriptor for as long as possible, which should simplify things a lot. Callers are patched, and there are plenty of them in sstable_datafile_test.cc. For their benefit, a helper function is provided to keep the previous signature (test only). Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-03-31 19:41:25 -04:00
Piotr Jastrzebski	e72696a8e6	sharding_info: rename the class to sharder Also rename all variables that were named si or sinfo to sharder. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	7bd2b8d73f	schema: make it possible to set sharding_info per schema Previously schema::get_sharding_info was obtaining sharding_info from the partitioner but we want to remove sharding_info from the partitioner so we need a place in schema to store it there instead. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	dc2e060313	create_token_range_from_keys: use sharding info for shard_of Replace i_partitioner::shard_of with sharding_info::shard_of Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Glauber Costa	dd65f7dcbb	tests: move token_generation_for_shard to common code We now have a utils file for SSTables. This is potentially useful for other tests. As a matter of fact, this function is repeated right now for the resharding test. And to add insult to injury, the version in the resharding test has the shard and number-of-tokens parameters flipped, which, although extremely confusing, is the predictable outcome of such repetition. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-03-22 19:00:26 +02:00
Piotr Jastrzebski	7064f6b831	partitioner: hide dht::default_partitioner Remove last usage of this global outside i_partitioner.cc and hide it inside the compilation unit. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	54d24553bb	schema: get_partitioner return const& Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 13:33:53 +01:00
Piotr Jastrzebski	08ebf1f69d	sstable_datafile_test: stop calling dht::global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 13:33:53 +01:00
Konstantin Osipov	ff3f9cb7cf	test: stop using BOOST_TEST_MESSAGE() for logging We use boost test logging primarily to generate nice XML xunit files used in Jenkins. These XML files can be bloated with messages from BOOST_TEST_MESSAGE(), hundreds of megabytes of build archives, on every build. Let's use seastar logger for test logging instead, reserving the use of boost log facilities for boost test markup information.	2020-03-05 11:38:11 +03:00
Avi Kivity	906784639d	Merge "Clean sstables from using global objects" from Pavel E " This set cleans sstable_writer_config and surrounding sstables code from using global storage_ and feature_ service-s and database by moving the configuration logic onto sstables_manager (that was supposed to do it since `eebc3701a5`). Most of the complexity is hidden around sstable_writer_config creation, this set makes the sstables_manager create this object with an explicit call. All the rest are consequences of this change. Tests: unit(debug), manual start-stop " * 'br-clean-sstables-manager-2' of https://github.com/xemul/scylla: sstables: Move get_highest_supported_format sstables: Remove global get_config() helper sstables: Use manager's config() in .new_sstable_component_file() sstable_writer_config: Extend with more db::config stuff sstables_manager: Don't use global helper to generate writer config sstable_writer_config: Sanitize out some features fields initialization sstable_writer_config: Factor out some field initialization sstables: Generate writer config via manager only sstables: Keep reference on manager test: Re-use existing global sstables_manager table: Pass sstable_writer_config into write_memtable_to_sstable	2020-03-03 18:33:01 +02:00
Botond Dénes	b6f8a6fbd3	test/boost: sstable_datafile_test: sstable_scrub_test: stop table `table` is not registered with the database, and hence will not be waited on during shutdown. Stop it explicitly to prevent any asynchronous operation on it racing with shutdown. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200302142845.569638-1-bdenes@scylladb.com>	2020-03-02 16:20:00 +01:00
Pavel Emelyanov	5adce3390c	sstables: Generate writer config via manager only The sstable_writer_config creation looks simple (just declare the struct instance) but behind the scenes references storage and feature services, messes with database config, etc. This patch teaches the sstables_manager to generate the writer config and makes the rest of the code use it. For future safety, by-hand creation of the sstable_writer_config is prohibited. The manager is referenced through table-s and sstable-s, but since both existing sstables_managers live on the database object, and table-s and sstable-s both live shorter than the database, this reference is safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:04 +03:00
Raphael S. Carvalho	65b4fc8bcd	sstables/compaction: Introduce compaction_completion_desc This descriptor contains all the information needed for the table to be properly updated on compaction completion. A new member will be added to it soon, which will store ranges to be invalidated in the row cache on behalf of cleanup compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-19 19:29:32 -03:00
Avi Kivity	6c7aa18238	Merge "Introduce schema::get_partitioner" from Piotr " Introduce schema::get_partitioner and use it instead of dht::global_partitioner. Fixes #5493 Tests: unit(dev, release, debug) " * 'per_table_partitioner_prep' of https://github.com/haaawk/scylla: (35 commits) cdc: stop using partitioners partitioner_test: stop calling set_global_partitioner storage_service: stop calling global_partitioner() mutation_writer_test: stop calling global_partitioner() schema: reduce number of global_partitioner() calls test_services: stop calling global_partitioner() sstable_utils: stop calling global_partitioner() sstable_resharding_test: stop depending on global partitioner sstable_mutation_test: stop calling global_partitioner() sstable_data_file_test: stop calling global_partitioner() random_schema: stop taking partitioner in constructor mutation_reader_test: stop calling global_partitioner() multishard_mutation_query_test: stop calling global_partitioner() row_level repair: stop calling global_partitioner() distribute_reader_and_consume_on_shards: don't take partitioner thrift: reduce global_partitioner() calls binary_search: stop calling global_partitioner() index_entry: stop calling global_partitioner() mc writer: stop calling global_partitioner() sstable: stop calling global_partitioner() ...	2020-02-17 18:12:53 +02:00
Tomasz Grabiec	76d1dd7ec6	Merge "nodetool scrub: implement validation and the skip-corrupted flag " from Botond Nodetool scrub rewrites all sstables, validating their data. If corrupt data is found, the scrub is aborted. If the skip-corrupted flag is set, corrupt data is instead logged (just the keys) and skipped. The scrubbing algorithm itself is fairly simple, especially since we already have a mutation stream validator that we can use to validate the data. However, currently scrub is piggy-backed on top of cleanup compaction. To implement this flag, we have to make scrub a separate compaction type and propagate the flag down. This required some massaging of the code: * Add support for more than two (cleanup or not) compaction types. * Allow passing custom options for each compaction type. * Allow stopping a compaction without the manager retrying it later. Additionally, the validator itself needed some changes to allow different ways to handle errors, as needed by the scrub. Fixes: #5487 * https://github.com/denesb/nodetool-scrub-skip-corrupted/v7: table: cleanup_sstables(): only short-circuit on actual cleanup compaction: compaction_type: add Upgrade compaction: introduce compaction_options compaction: compaction_descriptor: use compaction options instead of cleanup flag compaction_manager: collect all cleanup related logic in perform_cleanup() sstables: compaction_stop_exception: add retry flag mutation_fragment_stream_validator: split into low-level and high-level API compaction: introduce scrub_compaction compaction_manager: scrub: don't piggy-back on upgrade_sstables() test: sstable_datafile_test: add scrub unit test	2020-02-17 15:28:07 +02:00
Piotr Jastrzebski	8a9dc8b394	test_services: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	aae6240273	sstable_data_file_test: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	ca4a89d239	dht: add dht::decorate_key and replace all dht::global_partitioner().decorate_key with dht::decorate_key It is an improvement because dht::decorate_key takes schema and uses it to obtain partitioner instead of using global partitioner as it was before. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:06 +01:00
Avi Kivity	91c4409376	locator: token_metadata: remove unused include "query-request.hh" sstable_datafile_test.cc lost access to interval_map (via position_in_partition.hh), so it now includes that directly.	2020-02-14 20:46:25 +02:00
Botond Dénes	78624b5069	test: sstable_datafile_test: add scrub unit test	2020-02-13 15:02:37 +02:00
Botond Dénes	b2dc5d4895	compaction: compaction_descriptor: use compaction options instead of cleanup flag Instead of the restrictive `cleanup` boolean flag, which allows for choosing between only two compaction types, use `compaction_options`, which in addition to allowing any number of compaction types to be selected, also allows seamlessly passing specific options to them.	2020-02-11 17:47:44 +02:00
Piotr Jastrzebski	05e0451b27	token: change _data to int64_t Previously _data was stored as array of 8 bytes in network byte order. After this change it stores the same value in int64_t in host byte order. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	b569d127a0	token: change data to array<uint8_t, 8> It is safe to make such a change because we support only Murmur3Partitioner, which uses only tokens that are 8 bytes long. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:30:46 +01:00
Botond Dénes	936619a8d3	sstables/continuous_data_consumer: track buffers used for parsing Based on heap profiling, buffers used for storing half-parsed fields are a major contributor to the overall memory consumption of reads. This memory was completely "under the radar" before. Track it by using tracked `temporary_buffer` instances everywhere in `continuous_data_consumer`. As `continuous_data_consumer` is the basis for parsing all index and data files, adding the tracking here automatically covers all data, index and promoted index parsing. I'm almost convinced that there is a better place to store the `permit` than the three places now, but so far I was unable to completely decipher our data/index file parsing class hierarchy.	2020-01-28 08:13:16 +02:00
Botond Dénes	dfc8b2fc45	treewide: replace reader_resource_tracer with reader_permit The former was never really more than a reader_permit with one additional method. Currently using it doesn't even save one from any includes. Now that readers will be using reader_permit, we would have to pass both down to mutation_source. Instead, get rid of reader_resource_tracker and just use reader_permit. Instead of making it a last and optional parameter that is easy to ignore, make it a first-class parameter, right after schema, to signify that permits are now a prominent part of the reader API. This (mostly mechanical) patch essentially refactors mutation_source to ask for the reader_permit instead of the reader_resource_tracker and updates all usage sites.	2020-01-28 08:13:16 +02:00
Konstantin Osipov	1c8736f998	tests: move all test source files to their new locations 1. Move tests to test (using singular seems to be a convention in the rest of the code base) 2. Move boost tests to test/boost, other (non-boost) unit tests to test/unit, tests which are expected to be run manually to test/manual. Update configure.py and test.py with new paths to tests.	2019-12-16 17:47:42 +03:00

39 Commits