scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 19:10:42 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	f93912f344	Revert "Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations"" With #4446 fixed, this commit can be reverted. This reverts commit `454e7e0109`.	2020-02-20 10:55:50 -03:00
Raphael S. Carvalho	fb81f2aa7c	table: Fix stale data being returned due to lack of cache invalidation Row cache needs to be invalidated whenever data in sstables changes. Cleanup removes data from sstables which doesn't belong to the node anymore, which means cache must be invalidated on cleanup. Currently, stale data can be returned when a node re-owns ranges whose data is still stored in the node's row cache, because cleanup didn't invalidate the cache. To prevent data that belongs to the node from being purged from the row cache, cleanup will only invalidate the cache with a set of token ranges that will not overlap with any of the ranges owned by the node. update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test now passes. Fixes #4446. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-20 10:55:50 -03:00
Raphael S. Carvalho	65b4fc8bcd	sstables/compaction: Introduce compaction_completion_desc This descriptor contains all information needed for table to be properly updated on compaction completion. A new member will be added to it soon, which will store ranges to be invalidated in row cache on behalf of cleanup compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-19 19:29:32 -03:00
Avi Kivity	454e7e0109	Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations" This reverts commit `5e9925b9f0`. It causes data resurrection in simple_decommission_node_2_test. Fixes #5838.	2020-02-18 20:13:10 +02:00
Avi Kivity	6c7aa18238	Merge "Introduce schema::get_partitioner" from Piotr " Introduce schema::get_partitioner and use it instead of dht::global_partitioner. Fixes #5493 Tests: unit(dev, release, debug) " * 'per_table_partitioner_prep' of https://github.com/haaawk/scylla: (35 commits) cdc: stop using partitioners partitioner_test: stop calling set_global_partitioner storage_service: stop calling global_partitioner() mutation_writer_test: stop calling global_partitioner() schema: reduce number of global_partitioner() calls test_services: stop calling global_partitioner() sstable_utils: stop calling global_partitioner() sstable_resharding_test: stop depending on global partitioner sstable_mutation_test: stop calling global_partitioner() sstable_data_file_test: stop calling global_partitioner() random_schema: stop taking partitioner in constructor mutation_reader_test: stop calling global_partitioner() multishard_mutation_query_test: stop calling global_partitioner() row_level repair: stop calling global_partitioner() distribute_reader_and_consume_on_shards: don't take partitioner thrift: reduce global_partitioner() calls binary_search: stop calling global_partitioner() index_entry: stop calling global_partitioner() mc writer: stop calling global_partitioner() sstable: stop calling global_partitioner() ...	2020-02-17 18:12:53 +02:00
Tomasz Grabiec	76d1dd7ec6	Merge "nodetool scrub: implement validation and the skip-corrupted flag " from Botond Nodetool scrub rewrites all sstables, validating their data. If corrupt data is found the scrub is aborted. If the skip-corrupted flag is set, corrupt data is instead logged (just the keys) and skipped. The scrubbing algorithm itself is fairly simple, especially that we already have a mutation stream validator that we can use to validate the data. However currently scrub is piggy-backed on top of cleanup compaction. To implement this flag, we have to make scrub a separate compaction type and propagate down the flag. This required some massaging of the code: * Add support for more than two (cleanup or not) compaction types. * Allow passing custom options for each compaction type. * Allow stopping a compaction without the manager retrying it later. Additionally the validator itself needed some changes to allow different ways to handle errors, as needed by the scrub. Fixes: #5487 * https://github.com/denesb/nodetool-scrub-skip-corrupted/v7: table: cleanup_sstables(): only short-circuit on actual cleanup compaction: compaction_type: add Upgrade compaction: introduce compaction_options compaction: compaction_descriptor: use compaction options instead of cleanup flag compaction_manager: collect all cleanup related logic in perform_cleanup() sstables: compaction_stop_exception: add retry flag mutation_fragment_stream_validator: split into low-level and high-level API compaction: introduce scrub_compaction compaction_manager: scrub: don't piggy-back on upgrade_sstables() test: sstable_datafile_test: add scrub unit test	2020-02-17 15:28:07 +02:00
Piotr Jastrzebski	2d7532f87f	dht: add dht::get_token and replace all calls to dht::global_partitioner().get_token dht::get_token is better because it takes schema and uses it to obtain partitioner instead of using a global partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	ca4a89d239	dht: add dht::decorate_key and replace all dht::global_partitioner().decorate_key with dht::decorate_key It is an improvement because dht::decorate_key takes schema and uses it to obtain partitioner instead of using global partitioner as it was before. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:06 +01:00
Piotr Jastrzebski	abd76e566f	dht::shard_of: stop calling global_partitioner() Take const schema& as a parameter of shard_of and use it to obtain partitioner instead of calling global_partitioner(). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:23:16 +01:00
Asias He	5e9925b9f0	streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations The table::flush_streaming_mutations was used in the days when streaming data went to memtable. After switching to the new streaming, data goes to sstables directly in streaming, so the sstables generated in table::flush_streaming_mutations will be empty. It is unnecessary to invalidate the cache if no sstables are added. To avoid unnecessary cache invalidation, which pokes holes in the cache, skip calling _cache.invalidate() if the sstables set is empty. The steps are: - STREAM_MUTATION_DONE verb is sent when streaming is done with old or new streaming - table::flush_streaming_mutations is called in the verb handler - cache is invalidated for the streaming ranges In summary, this patch will avoid a lot of cache invalidation for streaming. Backports: 3.0 3.1 3.2 Fixes: #5769	2020-02-16 11:22:30 +02:00
Pavel Emelyanov	b11cf6e950	cql3/query_processor.hh: Debloat from other headers This gives ~30% less (251 jobs -> 181 jobs) recompile when touching it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200212225828.3374-1-xemul@scylladb.com>	2020-02-16 11:22:30 +02:00
Botond Dénes	8014c7124d	compaction_manager: collect all cleanup related logic in perform_cleanup() Currently the call chain for a cleanup collection looks like this: compaction_manager::perform_cleanup() compaction_manager::rewrite_sstables() table::cleanup_sstables() ... `perform_cleanup()` is essentially empty, immediately deferring to `rewrite_sstables()`. Cleanup related logic is scattered between the latter two methods on the call chain. These methods however recently started serving as generic methods for compactions that want to rewrite each sstable one-by-one, collecting cleanup related ifs in various places. The reason is historic, we first had cleanup, then bolted others on top, trying to share the underlying code as much as possible. It is time this is cleaned up (pun intended). Make `perform_cleanup()` the place where all cleanup related logic is, with the rest of the stack made truly generic.	2020-02-11 17:47:44 +02:00
Botond Dénes	b2dc5d4895	compaction: compaction_descriptor: use compaction options instead of cleanup flag Instead of the restrictive `cleanup` boolean flag, which allows for choosing between only two compaction types, use `compaction_options`, which in addition to allowing any number of compaction types to be selected, also allows seamlessly passing specific options to them.	2020-02-11 17:47:44 +02:00
Botond Dénes	0b53ccaecd	table: cleanup_sstables(): only short-circuit on actual cleanup Currently the cleanup call is short circuited if it is determined that cleanup is not needed for the sstable to-be-cleaned-up. This is undesired because not just cleanup uses this routine to rewrite sstables, sstable-upgrade and sstable-scrub also use it, and they want to go on with the cleanup compaction sstables even if all data in it belongs to the current node. Fix: #5699	2020-02-11 17:47:44 +02:00
Eliran Sinvani	8cfc2aad57	internalize storage proxy statistics metric registration The storage proxy statistics structure did not contain a method for registering the statistics for metric groups, instead, each user had to register some of the metrics by itself. There is no real reason for separating the metrics registration from the statistics data. There is even less justification for doing this only for part of the stats as is the case for those statistics. This commit internalizes the metrics registration in the storage_proxy stats structures. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2020-01-30 15:01:40 +01:00
Botond Dénes	dfc8b2fc45	treewide: replace reader_resource_tracker with reader_permit The former was never really more than a reader_permit with one additional method. Currently using it doesn't even save one from any includes. Now that readers will be using reader_permit we would have to pass down both to mutation_source. Instead get rid of reader_resource_tracker and just use reader_permit. Instead of making it a last and optional parameter that is easy to ignore, make it a first class parameter, right after schema, to signify that permits are now a prominent part of the reader API. This -- mostly mechanical -- patch essentially refactors mutation_source to ask for the reader_permit instead of reader_resource_tracker and updates all usage sites.	2020-01-28 08:13:16 +02:00
Amnon Heiman	028525daeb	database: add schema.cql file when creating a snapshot When creating a snapshot we need to add a schema.cql file in the snapshot directory that describes the table in that snapshot. This patch adds the file using the schema describe method. get_snapshot_details and manifest_json_filter were modified to ignore the schema.cql file. Fixes #4192 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:06:00 +02:00
Benny Halevy	718e9eb341	table: move_sstables_from_staging: fix use after free of shared_sstable Introduced in `4b3243f5b9` Reproducible with materialized_views_test:TestMaterializedViews.mv_populating_from_existing_data_during_node_remove_test and read_amplification_test:ReadAmplificationTest.no_read_amplification_on_repair_with_mv_test ==955382==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200023de18 at pc 0x00000051d788 bp 0x7f8a0563fcc0 sp 0x7f8a0563fcb0 READ of size 8 at 0x60200023de18 thread T1 (reactor-1) #0 0x51d787 in seastar::lw_shared_ptr<sstables::sstable>::lw_shared_ptr(seastar::lw_shared_ptr<sstables::sstable> const&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:289 #1 0x10ba189 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstabl e>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1530 #2 0x109c4f1 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstabl e>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1556 #3 0x106941a in do_for_each<__gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >, table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda( std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:618 #4 0x1069203 in operator() 
/local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:626 #5 0x10ba589 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36 #6 0x10ba668 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging (std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44 #7 0x10ba7c0 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging (std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563 ... 
0x60200023de18 is located 8 bytes inside of 16-byte region [0x60200023de10,0x60200023de20) freed by thread T1 (reactor-1) here: #0 0x7f8a153b796f in operator delete(void) (/lib64/libasan.so.5+0x11096f) #1 0x6ab4d1 in __gnu_cxx::new_allocator<seastar::lw_shared_ptr<sstables::sstable> >::deallocate(seastar::lw_shared_ptr<sstables::sstable>, unsigned long) /usr/include/c++/9/ext/new_allocator.h:128 #2 0x612052 in std::allocator_traits<std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::deallocate(std::allocator<seastar::lw_shared_ptr<sstables::sstable> >&, seastar::lw_shared_ptr<sstables::sstable>, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:470 #3 0x58fdfb in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::_M_deallocate(seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/bits/stl_vector.h:351 #4 0x52a790 in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~_Vector_base() /usr/include/c++/9/bits/stl_vector.h:332 #5 0x52a99b in std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~vector() /usr/include/c++/9/bits/stl_vector.h:680 #6 0xff60fa in ~<lambda> /local/home/bhalevy/dev/scylla/table.cc:2477 #7 0xff7202 in operator() /local/home/bhalevy/dev/scylla/table.cc:2496 #8 0x106af5b in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1573 #9 0x102f5d5 in futurize_apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1645 #10 0x102f9ee in operator()<seastar::semaphore_units<seastar::named_semaphore_exception_factory> > 
/local/home/bhalevy/dev/scylla/seastar/include/seastar/core/semaphore.hh:488 #11 0x109d2f1 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36 #12 0x109d42c in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44 #13 0x109d595 in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563 ... Fixes #5511 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191222214326.1229714-1-bhalevy@scylladb.com>	2019-12-23 15:20:41 +02:00
Benny Halevy	4b3243f5b9	table: move_sstables_from_staging_in_thread with _sstable_deletion_sem Hold the _sstable_deletion_sem while moving sstables from the staging directory so not to move them under the feet of table::snapshot. Fixes #5340 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	6efef84185	sstable: return future from move_to_new_dir distributed_loader::probe_file needlessly creates a seastar thread for it and the next patch will use it as part of a parallel_for_each loop to move a list of sstables (and sync the directories once at the end). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Piotr Sarna	79c3a508f4	table: Reduce read amplification in view update generation This commit makes sure that single-partition readers for read-before-write do not have fast-forwarding enabled, as it may lead to huge read amplification. The observed case was: 1. Creating an index. CREATE INDEX index1 ON myks2.standard1 ("C1"); 2. Running cassandra-stress in order to generate view updates. cassandra-stress write no-warmup n=1000000 cl=ONE -schema \ 'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \ keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors skip-read-validation -node 127.0.0.1; Without disabling fast-forwarding, single-partition readers were turned into scanning readers in cache, which resulted in reading 36GB (sic!) on a workload which generates less than 1GB of view updates. After applying the fix, the number dropped down to less than 1GB, as expected. Refs #5409 Fixes #4615 Fixes #5418	2019-12-05 11:58:34 +02:00
Avi Kivity	fd951a36e3	Merge "Let compaction wait on background deletions" from Benny " In several cases in distributed testing (dtest) we trigger compaction using nodetool compact assuming that when it is done, it is indeed really done. However, the way compaction is currently implemented in scylla, it may leave behind some background tasks to delete the old sstables that were compacted. This commit changes major compaction (triggered via the ss::force_keyspace_compaction api) so it would wait on the background deletes and will return only when they finish. Fixes #4909 Tests: unit(dev), nodetool_refresh_with_data_perms_test, test_nodetool_snapshot_during_major_compaction "	2019-12-04 11:18:41 +02:00
Piotr Sarna	9c5a5a5ac2	treewide: add names to semaphores By default, semaphore exceptions bring along very little context: either that a semaphore was broken or that it timed out. In order to make debugging easier without introducing significant runtime costs, a notion of named semaphore is added. A named semaphore is simply a semaphore with statically defined name, which is present in its errors, bringing valuable context. A semaphore defined as: auto sem = semaphore(0); will present the following message when it breaks: "Semaphore broken" However, a named semaphore: auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"}); will present a message with at least some debugging context: "Semaphore broken: io_concurrency_sem" It's not much, but it would really help in pinpointing bugs without having to inspect core dumps. At the same time, it does not incur any costs for normal semaphore operations (except for its creation), but instead only uses more CPU in case an error is actually thrown, which is considered rare and not to be on the hot path. Refs #4999 Tests: unit(dev), manual: hardcoding a failure in view building code	2019-11-26 15:14:21 +02:00
Benny Halevy	f9e93bba38	sstables: compaction: move cleanup parameter to compaction_descriptor Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191117165806.3234-1-bhalevy@scylladb.com>	2019-11-18 10:52:20 +01:00
Kamil Braun	a67e887dea	sstables: fix sstable file I/O CQL tracing when reading multiple files (#5285 ) CQL tracing would only report file I/O involving one sstable, even if multiple sstables were read from during the query. Steps to reproduce: create a table with NullCompactionStrategy insert row, flush memtables insert row, flush memtables restart Scylla tracing on select * from table The trace would only report DMA reads from one of the two sstables. Kudos to @denesb for catching this. Related issue: #4908	2019-11-17 00:38:37 -08:00
Piotr Dulikowski	59fbbb993f	memtables: add partition/row hit/miss counters Adds per-table metrics for counting partition and row reuse in memtables. New metrics are as follows: - memtable_partition_writes - number of write operations performed on partitions in memtables, - memtable_partition_hits - number of write operations performed on partitions that previously existed in a memtable, - memtable_row_writes - number of row write operations performed in memtables, - memtable_row_hits - number of row write operations that overwrote rows previously present in a memtable. Tests: unit(release)	2019-11-12 13:35:41 +01:00
Vladimir Davydov	b75862610e	paxos_state: account paxos round latency This patch adds the following per table stats: cas_prepare_latency cas_propose_latency cas_commit_latency They are equivalent to CasPropose, CasPrepare, CasCommit metrics exposed by Cassandra.	2019-10-29 19:26:18 +03:00
Kamil Braun	394c36835a	sstables: report sstable data file I/O in CQL tracing Use tracing::make_traced_file when creating an sstable input_stream. To achieve that, trace_state needs to be plumbed down through some functions.	2019-10-25 14:10:28 +02:00
Raphael S. Carvalho	7f1a2156c7	table: Don't account for shared SSTables in compaction backlog tracker We don't want to add shared sstables to table's backlog tracker because: 1) table's backlog tracker has only an influence on regular compaction 2) shared sstables are never regular compacted, they're worked by resharding which has its own backlog tracker. Such sstables belong to more than one shard, meaning that currently they're added to backlog tracker of all shards that own them. But the thing is that such sstables ends up being resharded in shard that may be completely random. So increasing backlog of all shards such sstables belong to, won't lead to faster resharding. Also, table's backlog tracker is supposed to deal only with regular compaction. Accounting for shared sstables in table's tracker may lead to incorrect speed up of regular compactions because the controller is not aware that some relevant part of the backlog is due to pending resharding. The fix is about ignoring sstables that will be resharded and let table's backlog tracker account only for sstables that can be worked on by regular compaction, and rely on resharding controlling itself with its own tracker. NOTE: this doesn't fix the resharding controlling issue completely, as described in #4952. We'll still need to throttle regular compaction on behalf of resharding. So subsequent work may be about: - move resharding to its own priority class, perhaps streaming. - make a resharding's backlog tracker accounts for sstables in all of its pending jobs, not only the ongoing ones (currently limited to 1 by shard). - limit compaction shares when resharding is in progress. THIS only fixes the issue in which controller for regular compaction shouldn't account sstables completely exclusive to resharding. Fixes #5077. Refs #4952. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190924022109.17400-1-raphaelsc@scylladb.com>	2019-10-13 10:14:13 +03:00
Tomasz Grabiec	b0e0f29b06	db: read: Filter-out sstables using its first and last keys Affects single-partition reads only. Refs #5113 When executing a query on the replica we do several things in order to narrow down the sstable set we read from. For tables which use LeveledCompactionStrategy, we store sstables in an interval set and we select only sstables whose partition ranges overlap with the queried range. Other compaction strategies don't organize the sstables and will select all sstables at this stage. The reasoning behind this is that for non-LCS compaction strategies the sstables' ranges will typically overlap and using interval sets in this case would not be effective and would result in quadratic (in sstable count) memory consumption. The assumption for overlap does not hold if the sstables come from repair or streaming, which generates non-overlapping sstables. At a later stage, for single-partition queries, we use the sstables' bloom filter (kept in memory) to drop sstables which surely don't contain given partition. Then we proceed to sstable indexes to narrow down the data file range. Tables which don't use LCS will do unnecessary I/O to read index pages for single-partition reads if the partition is outside of the sstable's range and the bloom filter is ineffective (Refs #5112). This patch fixes the problem by consulting sstable's partition range in addition to the bloom filter, so that the non-overlapping sstables will be filtered out with certainty and not depend on bloom filter's efficiency. It's also faster to drop sstables based on the keys than the bloom filter. Tests: - unit (dev) - manual using cqlsh Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190927122505.21932-1-tgrabiec@scylladb.com>	2019-09-28 19:42:57 +03:00
Benny Halevy	19b67d82c9	table::on_compaction_completion: fix indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:38 +03:00
Benny Halevy	8dd6e13468	table::on_compaction_completion: wait for background deletes Don't let background deletes accumulate uncontrollably. Fixes #4909 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:38 +03:00
Benny Halevy	da6645dc2c	table: refresh_snapshot before deleting any sstables The row cache must not hold references to any sstable we're about to delete. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:29 +03:00
Botond Dénes	136fc856c5	treewide: silence discarded future warnings for questionable discards This patch silences the remaining discarded future warnings, those where it cannot be determined with reasonable confidence that this was indeed the actual intent of the author, or that the discarding of the future could lead to problems. For all those places a FIXME is added, with the intent that these will be soon followed-up with an actual fix. I deliberately haven't fixed any of these, even if the fix seems trivial. It is too easy to overlook a bad fix mixed in with so many mechanical changes.	2019-08-26 19:28:43 +03:00
Botond Dénes	fddd9a88dd	treewide: silence discarded future warnings for legit discards This patch silences those future discard warnings where it is clear that discarding the future was actually the intent of the original author, and they did the necessary precautions (handling errors). The patch also adds some trivial error handling (logging the error) in some places, which were lacking this, but otherwise look ok. No functional changes.	2019-08-26 18:54:44 +03:00
Piotr Sarna	1ab07b80b4	database: assign proper io priority for streaming view updates Streamed view updates parasitized on writing io priority, which is reserved for user writes - it's now properly bound to streaming write priority.	2019-08-20 00:24:50 +02:00
Avi Kivity	77686ab889	Merge "Make SSTable cleanup run aware" from Raphael " Fixes #4663. Fixes #4718. " * 'make_cleanup_run_aware_v3' of https://github.com/raphaelsc/scylla: tests/sstable_datafile_test: Check cleaned sstable is generated with expected run id table: Make SSTable cleanup run aware compaction: introduce constants for compaction descriptor compaction: Make it possible to config the identifier of the output sstable run table: do not rely on undefined behavior in cleanup_sstables	2019-07-31 19:10:22 +03:00
Tomasz Grabiec	7604980d63	database: Add missing partition slicing on streaming reader recreation streaming_reader_lifecycle_policy::create_reader() was ignoring the partition_slice passed to it and always creating the reader for the full slice. That's wrong because create_reader() is called when recreating a reader after it's evicted. If the reader stopped in the middle of partition we need to start from that point. Otherwise, fragments in the mutation stream will appear duplicated or out of order, violating assumptions of the consumers. This was observed to result in repair writing incorrect sstables with duplicated clustering rows, which results in malformed_sstable_exception on read from those sstables. Fixes #4659. In v2: - Added an overload without partition_slice to avoid changing existing users which never slice Tests: - unit (dev) - manual (3 node ccm + repair) Backport: 3.1 Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>	2019-07-18 18:35:28 +03:00
Raphael S. Carvalho	332c2ff710	table: Make SSTable cleanup run aware The cleanup procedure will move any sstable out of its sstable run because sstables are cleaned up individually and they end up receiving a new run identifier, meaning a table may potentially end up with a new sstable run for each of the sstables cleaned. SStable cleanup needs to be run aware, so that the run structure is not messed up after the operation is done. Given that only one fragment or other, composing a sstable run, may need cleanup, it's better to keep them in their original sstable run. Fixes #4663. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 23:39:47 -03:00
Raphael S. Carvalho	0e732ed1cf	table: do not rely on undefined behavior in cleanup_sstables It shouldn't rely on argument evaluation order, which is ub. Fixes #4718. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-07-15 23:39:22 -03:00
Benny Halevy	0e4567c881	table: document _sstables_lock/_sstable_deletion_sem locking order Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-15 19:20:35 +03:00
Benny Halevy	6dad9baa1c	table: disable_sstable_write: acquire _sstable_deletion_sem `disable_sstable_write` needs to acquire `_sstable_deletion_sem` to properly synchronize with background deletions done by `on_compaction_completion` to ensure no sstables will be created or deleted during `reshuffle_sstables` after `storage_service::load_new_sstables` disables sstable writes. Fixes #4622 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Benny Halevy	bbbd749f70	table: uninline enable_sstable_write Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Benny Halevy	c6bad3f3c2	table: reshuffle_sstables: add log message To mark the point in time writes are disabled and scanning of the data directory is beginning. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Kamil Braun	d6736a304a	Add metric for failed memtable flushes Resolves #3316. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-10 11:30:10 +03:00
Dejan Mircevski	8dcb35913a	table: Avoid needless allocation of cell lockers All `table` instances currently unconditionally allocate a cell locker for counter cells, though not all need one. Since the lockers occupy quite a bit of memory (as reported in #4441), it's wasteful to allocate them when unneeded. Fixes #4441. Tests: unit (dev, debug) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190515190910.87931-1-dejan@scylladb.com>	2019-05-16 11:10:38 +03:00
Avi Kivity	add20eb9a6	table: fix potentially wrong schema when reading from zero sstables We use the schema during creation of the mutation_source rather than during the query itself. Likely they're the same, and since no rows are returned from a zero-sstable query, harmless. But gcc 9 complains. Fix by using the query's schema.	2019-05-07 09:56:30 +03:00
Benny Halevy	5a99023d4a	treewide: use lambda for io_check of *touch_directory To prepare for a seastar change that adds an optional file_permissions parameter to touch_directory and recursive_touch_directory. This change messes up the call to io_check since the compiler can't derive the Func&& argument. Therefore, use a lambda function instead to wrap the call to {recursive_,}touch_directory. Ref #4395 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190421085502.24729-1-bhalevy@scylladb.com>	2019-04-21 12:04:39 +03:00
Raphael S. Carvalho	d59f716e1c	table: fix wild disk usage stat after sstables are discarded by truncate Truncate would make disk usage stat go wild because it isn't updated when sstables are removed in table::discard_sstables(). Let's update the stat after sstables are removed from the sstable set. Fixes #3624. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190328154918.25404-1-raphaelsc@scylladb.com>	2019-04-01 13:55:11 +03:00
Avi Kivity	4b330b3911	Merge "introduce sstables manager" from Benny " This series introduces a rudimentary sstables manager that will be used for making and deleting sstables, and tracking thereof. The motivation for having a sstables manager is detailed in https://github.com/scylladb/scylla/issues/4149. The gist of it is that we need a proper way to manage the life cycle of sstables to solve potential races between compaction and various consumers of sstables, so they don't get deleted by compaction while being used. In addition, we plan to add global statistics methods like returning the total capacity used by all sstables. This patchset changes the way class sstable gets the large_data_handler. Rather than passing it separately for writing the sstable and when deleting sstables, we provide the large_data_handler when the sstable object is constructed and then use it when needed. Refs #4149 " * 'projects/sstables_manager/v3' of https://github.com/bhalevy/scylla: sstables: provide large_data_handler to constructor sstables_manager: default_sstable_buffer_size need not be a function sstables: introduce sstables_manager sstables: move shareable_components def to its own header tests: use global nop_lp_handler in test_services sstables: compress.hh: add missing include sstables: reorder entry_descriptor constructor params sstables: entry_descriptor: get rid of unused ctor sstables: make load_shared_components a method of sstable sstables: remove default params from sstable constructor database: add table::make_sstable helper distributed_loader: pass column_family to load_sstables_with_open_info distributed_loader: no need for forward declaration of load_sstables_with_open_info distributed_loader: reshard: use default params for make_sstable	2019-03-26 16:31:40 +02:00

88 Commits
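
Note on commit fb81f2aa7c: its message says cleanup invalidates the row cache only for token ranges that do not overlap any range owned by the node. The sketch below illustrates that idea in isolation and is not the code from the commit. It assumes tokens can be modelled as plain `int64_t` values and ranges as non-wrapping half-open intervals; `token_range` and `non_owned_ranges` are hypothetical names introduced here, not Scylla APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a token range: half-open interval [start, end).
struct token_range {
    int64_t start;
    int64_t end;
};

// Returns the parts of [ring_start, ring_end) that are not covered by any
// owned range. Under the assumptions above, these are the ranges a cleanup
// could invalidate without purging data the node still owns.
std::vector<token_range> non_owned_ranges(std::vector<token_range> owned,
                                          int64_t ring_start,
                                          int64_t ring_end) {
    // Sort by start so that gaps can be found in a single left-to-right pass.
    std::sort(owned.begin(), owned.end(),
              [](const token_range& a, const token_range& b) {
                  return a.start < b.start;
              });
    std::vector<token_range> gaps;
    int64_t cursor = ring_start;  // first token not yet known to be owned
    for (const auto& r : owned) {
        if (r.start > cursor) {
            // Everything between cursor and the next owned range is a gap.
            gaps.push_back({cursor, std::min(r.start, ring_end)});
        }
        cursor = std::max(cursor, r.end);
        if (cursor >= ring_end) {
            break;  // the rest of the ring is owned or out of bounds
        }
    }
    if (cursor < ring_end) {
        gaps.push_back({cursor, ring_end});  // trailing gap after the last owned range
    }
    return gaps;
}
```

For example, with owned ranges [0, 5), [10, 20) and [18, 30) on a ring modelled as [0, 40), the function returns [5, 10) and [30, 40). Overlapping owned ranges are handled by the running `cursor`, so they never produce a spurious gap.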