scylladb

Author	SHA1	Message	Date
Pavel Emelyanov	11c99fc41b	table: Don't use global gossiper The table::get_hit_rate needs gossiper to get hitrates state from. There's no way to carry gossiper reference on the table itself, so it's up to the callers of that method to provide it. Fortunately, there's only one caller -- the proxy -- but the call chain to carry the reference it not very short ... oh, well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:33:08 +03:00
Benny Halevy	bcd35af7cf	replica: table: generate_and_propagate_view_updates: pass mutation to make_flat_mutation_reader_from_mutations_v2 With `f5ef687acd` we can consume the single mutation directly, so there's n need to pass it as a vector of size 1. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220424103826.3930895-1-bhalevy@scylladb.com>	2022-04-24 22:19:19 +03:00
Avi Kivity	a4be927e23	Revert "memtable_list: futurize clear_and_add" This reverts commit `2325c566d9`. It causes a use-after-free of a memtable. Fixes #10421.	2022-04-24 21:09:48 +03:00
Botond Dénes	0b035c9099	row_cache: return v2 readers from make_reader*() And adjust callers. The factory functions just sprinkle upgrade_to_v2() on returned readers for now. One test in row_cache_test.cc had to be disabled, because the upgrade to v2 wrapper we now have over cache readers doesn't allow it to directly control the reader's buffer size and so the test fails. There is a FIXME left in the test code and the test will be re-enabled once a native v2 reader implementation allows us to get rid of the upgrade wrapper.	2022-04-20 10:59:09 +03:00
Botond Dénes	9338affb8e	replica/table: remove v1 reader factory methods	2022-04-01 13:52:08 +03:00
Avi Kivity	af07519928	Merge "Remove reader from mutations v1" from Botond " First migrate all users to the v2 variant, all of which are tests. However, to be able to properly migrate all tests off it, a v2 variant of the restricted reader is also needed. All restricted reader users are then migrated to the freshly introduced v2 variant and the v1 variant is removed. Users include: * replica::table::make_reader_v2() * streaming_virtual_table::as_mutation_source() * sstables::make_reader() * tests This allows us to get rid of a bunch of conversions on the query path, which was mostly v2 already. With a few tests we did kick the can down the road by wrapping the v2 reader in `downgrade_to_v1()`, but this series is long enough already. Tests: unit(dev), unit(boost/flat_mutation_reader_test:debug) " * 'remove-reader-from-mutations-v1/v3' of https://github.com/denesb/scylla: readers: remove now unused v1 reader from mutations test: move away from v1 reader from mutations test/boost/mutation_reader_test: use fragment_scatterer test/boost/mutation_fragment_test: extract fragment_scatterer into a separate hh test/boost: mutation_fragment_test: refactor fragment_scatterer readers: remove now unused v1 reversing reader test/boost/flat_mutation_reader_test: convert to v2 frozen_mutation: fragment_and_freeze(): convert to v2 frozen_mutation: coroutinize fragment_and_freeze() readers: migrate away from v1 reversing reader db/virtual_table: use v2 variant of reversing and forwardable readers replica/table: use v2 variant of reversing reader sstables/sstable: remove unused make_crawling_reader_v1() sstables/sstable: remove make_reader_v1() readers: add v2 variant of reversing reader readers/reversing: remove FIXME readers: reader from mutations: use mutation's own schema when slicing	2022-03-31 13:29:11 +03:00
Botond Dénes	c10d7bf9f8	replica/table: use v2 variant of reversing reader	2022-03-31 09:57:48 +03:00
Botond Dénes	b7954138ac	mutation_reader: move compacting reader into readers/	2022-03-30 15:42:51 +03:00
Botond Dénes	f8015d9c26	readers: move combined reader into readers/ Since the combined reader family weighs more than 1K SLOC, it gets its own .cc file.	2022-03-30 15:42:51 +03:00
Benny Halevy	2325c566d9	memtable_list: futurize clear_and_add Allow yielding to fix a reactor stall from table::clear. Fixes #10281 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220327141259.213688-1-bhalevy@scylladb.com>	2022-03-27 17:25:43 +03:00
Botond Dénes	9a44c26d7e	index/secondary_index_manager: switch to using data dictionary Instead of directly using replica::table.	2022-03-25 11:44:31 +02:00
Botond Dénes	eff941d22c	replica/table: add as_data_dictionary() To allow converting table instances to data_dictionary::table.	2022-03-25 11:44:31 +02:00
Botond Dénes	e12c543d3f	replica/table: migrate generate_and_propagate_view_updates() to v2	2022-03-17 10:51:25 +02:00
Botond Dénes	4b9219a209	replica/table: migrate populate_views() to v2	2022-03-17 10:51:05 +02:00
Botond Dénes	909be0b9d7	db/view: convert view_update_builder interface to v2 The constructor and the make_ factory method now take v2 readers. Immediate users are patched, with conversions if needed.	2022-03-17 10:50:50 +02:00
Raphael S. Carvalho	58e520ab1d	compaction: Move run_off_strategy_compaction() into compaction manager Compaction manager is calling back the table to run off-strategy compaction, but the logic clearly belongs to manager which should perform the operation independently and only call table to update its state with the result. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220315174504.107926-2-raphaelsc@scylladb.com>	2022-03-16 09:55:52 +02:00
Raphael S. Carvalho	1bae803a8b	table: Add maintenance_sstable_set() Let's expose maintenance set, to allow the implementation of off-strategy compaction to be moved into the compaction manager. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220315174504.107926-1-raphaelsc@scylladb.com>	2022-03-16 09:55:51 +02:00
Raphael S. Carvalho	fce9d869b4	compaction: Move table::compact_sstables() into compaction manager Table submits compaction request into manager, which in turn calls back table to run the compaction when the time has come, i.e.: table -> compaction manager -> table -> execute compaction But manager should not rely on table to run compaction, as compaction execution procedure sits one layer below the manager and should be accessed directly by it, i.e: table -> compaction manager -> execute compaction This makes code easier to understand and update_compaction_history() can now be noop for unit tests using table_state. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220311023410.250149-1-raphaelsc@scylladb.com>	2022-03-14 15:39:23 +02:00
Avi Kivity	e7fb71020b	Merge 'replica: Optimize empty_flat_reader out of the hot path' from Michał Chojnowski When row_cache::make_reader() and memtable::make_flat_reader() see that the query result is empty, they return empty_flat_reader, which is a trivial implementation of flat_mutation_reader. Even though empty_flat_reader doesn't do anything meaningful, it still needs to be created, handled in merging_reader and destroyed. Turns out this is costly. This patch series replaces hot path uses of empty_flat_reader with an empty optional. Performance effects: `perf_simple_query --smp 1` TPS: 138k -> 168k allocs/op: 80.2 -> 71.1 insns/op: 49.9k -> 45.1k `perf_simple_query --smp 1 --enable-cache=1 --flush` TPS: 125k -> 150k allocs/op: 79.2 -> 71.1 insns/op: 51.7k -> 47.2k For a cassandra-stress benchmark (localhost, 100% cache reads) this translates to a TPS increase from ~42k to ~48k per hyperthread. Note that this optimization is effective for single-partition reads where the queried partition is only in cache/sstables or only in memtables. Other queries (e.g. where the partition is in both cache in memtables and needs to be merged) are unaffected. Closes #10204 * github.com:scylladb/scylla: replica: Prefer row_cache::make_reader_opt() to row_cache::make_reader() row_cache: Add row_cache::make_reader_opt() replica: Prefer memtable::make_flat_reader_opt() to memtable::make_flat_reader() memtable: Add memtable::make_flat_reader_opt() [avi: adjust #include for readers/ split]	2022-03-14 14:07:00 +02:00
Mikołaj Sielużycki	1d84a254c0	flat_mutation_reader: Split readers by file and remove unnecessary includes. The flat_mutation_reader files were conflated and contained multiple readers, which were not strictly necessary. Splitting optimizes both iterative compilation times, as touching rarely used readers doesn't recompile large chunks of codebase. Total compilation times are also improved, as the size of flat_mutation_reader.hh and flat_mutation_reader_v2.hh have been reduced and those files are included by many file in the codebase. With changes real 29m14.051s user 168m39.071s sys 5m13.443s Without changes real 30m36.203s user 175m43.354s sys 5m26.376s Closes #10194	2022-03-14 13:20:25 +02:00
Michał Chojnowski	83efb508d6	replica: Prefer row_cache::make_reader_opt() to row_cache::make_reader() The former is significantly cheaper when there is nothing to be read.	2022-03-14 12:02:49 +01:00
Michał Chojnowski	f211ef9d71	replica: Prefer memtable::make_flat_reader_opt() to memtable::make_flat_reader() The former is significantly cheaper when there is nothing to be read.	2022-03-14 12:02:49 +01:00
Mikołaj Sielużycki	5920349357	row_cache: Make row_cache reader from sstables compacting. Reading data from sstables without compacting first puts unnecessary pressure on the cache. The mutation streams need to be resolved anyway before passing to subsequent consumers, so it's better to do it as close to the source as possible. Fixes: #3568 Closes #10188	2022-03-10 11:40:10 +02:00
Botond Dénes	d8fec08468	memtable-sstable: migrate to v2 variant of sstable writer API	2022-03-10 09:16:33 +02:00
Botond Dénes	7a37e30310	mutation_reader: convert compacting reader v2 Its input was already a v2 reader, now itself is also a v2 reader. With this commit, compaction.cc is finally v2 all-the-way.	2022-03-10 07:03:46 +02:00
Nadav Har'El	7cf2e5ee5c	Merge 'directory_lister: drop abort method and simplify close semantics' from Benny Halevy This series contains: - lister: move to utils - tidy up the clutter in the root dir Based on Avi's feedback to `[PATCH 1/1] utils: directory_lister: close: always abort queue` that was sent to the mailing list: - directory_lister: drop abort method - lister: do not require get after close to fail - test: lister_test: test_directory_lister_close simplify indentation - cosmetic cleanup Closes #10142 * github.com:scylladb/scylla: test: lister_test: test_directory_lister_close simplify indentation lister: do not require get after close to fail directory_lister: drop abort method lister: move to utils	2022-03-01 16:23:47 +02:00
Michael Livshin	34ed752885	memtable::make_flush_reader(): return flat_mutation_reader_v2 Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-02-28 17:11:54 +02:00
Michael Livshin	9bacce4359	memtable::make_flat_reader(): return flat_mutation_reader_v2 This is just a facade change. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-02-28 17:11:54 +02:00
Benny Halevy	ebbbf1e687	lister: move to utils There's nothing specific to scylla in the lister classes, they could (and maybe should) be part of the seastar library. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-02-28 12:36:03 +02:00
Raphael S. Carvalho	ddd693c6d7	compaction_backlog_tracker: Batch changes through a new replacement interface This new interface allows table to communicate multiple changes in the SSTable set with a single call, which is useful on compaction completion for example. With this new interface, the size tiered backlog tracker will be able to know when compaction completed, which will allow it to recompute tiers and their backlog contribution, if any. Without it, tiered tracker would have to recompute tiers for every change, which would be terribly expensive. The old remove/add interface are being removed in favor of the new one. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-02-24 15:34:16 -03:00
Raphael S. Carvalho	84d843697b	table: Disable backlog tracker when stopping table Backlog tracker is managed by compaction strategy, and we'd like to have it disabled in table::stop(), to make sure that all state is cleared. For example, a reference to a shared sstable, in the tracker implementation, could prevent the sstable manager from being stopped as it relies on all sstables managed by it being closed first. By calling tracker's disable() method, table::stop() will guarantee that state is cleared by completion. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-02-24 13:41:05 -03:00
Avi Kivity	cbba80914d	memtable: move to replica module and namespace Memtables are a replica-side entity, and so are moved to the replica module and namespace. Memtables are also used outside the replica, in two places: - in some virtual tables; this is also in some way inside the replica, (virtual readers are installed at the replica level, not the cooordinator), so I don't consider it a layering violation - in many sstable unit tests, as a convenient way to create sstables with known input. This is a layering violation. We could make memtables their own module, but I think this is wrong. Memtables are deeply tied into replica memory management, and trying to make them a low-level primitive (at a lower level than sstables) will be difficult. Not least because memtables use sstables. Instead, we should have a memtable-like thing that doesn't support merging and doesn't have all other funky memtable stuff, and instead replace the uses of memtables in sstable tests with some kind of make_flat_mutation_reader_from_unsorted_mutations() that does the sorting that is the reason for the use of memtables in tests (and live with the layering violation meanwhile). Test: unit (dev) Closes #10120	2022-02-23 09:05:16 +02:00
Botond Dénes	fb0e0ec7c1	mutation_reader: compacting_reader: require a v2 input reader Before we add a v2 output option to the compactor, we want to get rid of all the v1 inputs to make it simpler. This means that for a while the compacting reader will be in a strange place of having a v2 input and a v1 output. Hopefully, not for long.	2022-02-21 12:27:55 +02:00
Benny Halevy	19ea228cf8	replica: table: coroutinize move_sstables_from_staging Test: unit(dev) DTest: test_drop_mv_during_base_table_writes Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220214102911.1314022-1-bhalevy@scylladb.com>	2022-02-14 17:52:27 +02:00
Mikołaj Sielużycki	ee386213c2	compaction: Fix indentation in table::compact_sstables.	2022-02-09 12:19:23 +01:00
Mikołaj Sielużycki	ec91192525	compaction: Convert table::compact_sstables to coroutines.	2022-02-09 12:19:23 +01:00
Raphael S. Carvalho	d208d33636	Fix quadratic behavior and compaction inefficiency when adding new files With trigger_compaction() being called after each new sstable is added to the set, we'll get quadratic behavior because strategies like tiered will sort all the candidates before iterating on them, so complexity is ~ ((N - 1) * N * logN). Additionally, compaction may be inefficient as we're not waiting for the sstable set to settle, so table may end up missing files that would allow for more efficient jobs. The latter isn't a big problem because we have reshape running in an earlier phase, so data layout should satisfy the strategy almost. Boot is not affected by these problems because it temporarily disables auto compaction, so trigger_compaction() is a no-op for it. So refresh remains as the only one affected. Fixes #10046. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220208151154.72606-1-raphaelsc@scylladb.com>	2022-02-09 09:27:07 +02:00
Raphael S. Carvalho	755cec1199	table: Close reader if flush fails to peek into fragment An OOM failure while peeking into fragment, to determine if reader will produce any fragment, causes Scylla to abort as flat_mutation_reader expects reader to be closed before destroyed. Let's close it if peek() fails, to handle the scenario more gracefully. Fixes #10027. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220204031553.124848-1-raphaelsc@scylladb.com>	2022-02-04 12:48:36 +02:00
Nadav Har'El	b54e85088d	Merge 'snapshots: Fix snapshot-ctl to include snapshots of dropped tables' from Benny Halevy Snapshot-ctl methods fetch information about snapshots from column family objects. The problem with this is that we get rid of these objects once the table gets dropped, while the snapshots might still be present (the auto_snapshot option is specifically made to create this kind of situation). This commit switches from relying on column family interface to scanning every datadir that the database knows of in search for "snapshots" folders. This PR is a rebased version of #9539 (and slightly cleaned-up, cosmetically) and so it replaces the previous PR. Fixes #3463 Closes #7122 Closes #9884 * github.com:scylladb/scylla: snapshots: Fix snapshot-ctl to include snapshots of dropped tables table: snapshot: add debug messages	2022-02-04 12:34:19 +02:00
Benny Halevy	2a90896b79	table: snapshot: add debug messages Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-02-01 22:31:37 +02:00
Benny Halevy	e03b6eeff8	table: add perform_offstrategy_compaction Expose an async method to perform offstrategy- compaction, if needed. Returns a future<bool> that is resolved when offstrategy_compaction completes. The future value is true iff offstrategy compaction was required. To be used in a following patch by the storage_service api. Call it from `trigger_offstrategy_compaction` that triggers offstrategy compaction in the background and warn about ignored failures. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-01-30 20:09:35 +02:00
Nadav Har'El	7cb6250c40	Merge 'snapshot_ctl: true_snapshots_size: fix space accounting' from Benny Halevy This pull request fixes two preexisting issues related to snapshot_ctl::true_snapshots_size https://github.com/scylladb/scylla/issues/9897 https://github.com/scylladb/scylla/issues/9898 And adds a couple unit tests to tests the snapshot_ctl functionality. Test: unit(dev), database_test.{test_snapshot_ctl_details,test_snapshot_ctl_true_snapshots_size}(debug) Closes #9899 * github.com:scylladb/scylla: table: get_snapshot_details: count allocated_size snapshot_ctl: cleanup true_snapshots_size snpashot_ctl: true_snapshots_size: do not map_reduce across all shards	2022-01-19 11:57:15 +02:00
Benny Halevy	94c2272c8e	table: get_snapshot_details: count allocated_size Rather than the logical file sizes so to account for metadata overhead. Fixes #9898 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-01-19 08:10:57 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Benny Halevy	84e80f7b99	table: snapshot: handle error from seal_snapshot If seal_snapshot fails we currently do not signal the manifest_write semaphore and shards waiting for it will be blocked forever. Also, call manifest_write.wait in a `finally` clause rather than in a `then` clause, even though `my_work` future never fails at the moment, to make this future proof. Fixes #9936 Test: database_test(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220117181733.3706764-1-bhalevy@scylladb.com>	2022-01-18 12:17:01 +02:00
Mikołaj Sielużycki	f6d9d6175f	sstables: Harden bad_alloc handling during memtable flush. dirty_memory_manager monitors memory and triggers memtable flushing if there is too much pressure. If bad_alloc happens during the flush, it may break the loop and flushes won't be triggered automatically, leading to blocked writes as memory won't be automatically released. The solution is to add exception handling to the loop, so that the inner part always returns a non-exceptional future (meaning the loop will break only on node shutdown). try/catch is used around on_internal_error instead of on_internal_error_noexcept, as the latter doesn't have a version that accepts an exception pointer. To get the exception message from std::exception_ptr a rethrow is needed anyway, so this was a simpler approach. Fixes: #4174 Message-Id: <20220114082452.89189-1-mikolaj.sieluzycki@scylladb.com>	2022-01-14 16:09:21 +02:00
Botond Dénes	b6828e899a	Merge "Postpone reshape of SSTables created by repair" from Raphael " SSTables created by repair will potentially not conform to the compaction strategy layout goal. If node shuts down before off-strategy has a chance to reshape those files, node will be forced to reshape them on restart. That causes unexpected downtime. Turns out we can skip reshape of those files on boot, and allow them to be reshaped after node becomes online, as if the node never went down. Those files will go through same procedure as files created by repair-based ops. They will be placed in maintenance set, and be reshaped iteratively until ready for integration into the main set. " Fixes #9895. tests: UNIT(dev). * 'postpone_reshape_on_repair_originated_files' of https://github.com/raphaelsc/scylla: distributed_loader: postpone reshape of repair-originated sstables sstables: Introduce filter for sstable_directory::reshape table: add fast path when offstrategy is not needed sstables: add constant for repair origin	2022-01-14 14:05:09 +02:00
Botond Dénes	c727360eca	db: convert data listeners to v2 To remove yet another back-and-forth conversion in table::make_reader_v2(). Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20220114085551.565752-1-bdenes@scylladb.com>	2022-01-14 13:57:44 +02:00
Raphael S. Carvalho	ae3b589f12	table: Reduce off-strategy space requirement if multiple compaction rounds are required Off-strategy compaction works by iteratively reshaping the maintenance set until it's ready for integration into the main set. As repair-based ops produces disjoint sstables only, off-strategy compaction can complete the reshape in a single round. But if reshape ends up requiring more than one round, space requirement for off-strategy to succeed can be high. That's because we're only deleting input SSTables on completion. SSTables from maintenance set can be only deleted on completion as we can only merge maintenance set into main one once we're done reshaping[1]. But a SSTable that was created by a reshape and later used as a input in another reshape can be deleted immediately as its existence is not needed anywhere. [1] We don't update maintenance set after each reshape round, because that would mess with its disjointness. We also don't iteratively merge maintenance set into main set, as the data produced by a single round is potentially not ready for integration into main set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220111202950.111456-1-raphaelsc@scylladb.com>	2022-01-14 13:46:31 +02:00
Kamil Braun	168c6f47f9	replica: database: allow disabling optimized TWCS queries through compaction strategy options As requested from field engineering, add a way to disable the optimized TWCS query algorithm (use regular query path) just in case a bug or a performance regression shows up in production. To disable the optimized query path, add 'enable_optimized_twcs_queries': 'false' to compaction strategy options, e.g. ``` alter table ks.t with compaction = {'class': 'TimeWindowCompactionStrategy', 'enable_optimized_twcs_queries': 'false'}; ``` Setting the `enable_optimized_twcs_queries` key to anything other than `'false'` (note: a boolean `false` expands to a string `'false'`) or skipping it (re)enables the optimized query path. Note: the flag can be set in a cluster in the middle of upgrade. Nodes which do not understand it simply ignore it, but they do store it in their schema tables (they store the entire `compaction` map). After these nodes are upgraded, they will understand the flag and act accordingly. Note: in the situation above, some nodes may use the optimized path and some may use the regular path. This may happen also in a fully upgraded cluster when compaction options are changed concurrently to reads; there is a short period of time where the schema change propagates and some nodes got the flag but some didn't. These should not be a problem since the optimization does not change the returned read results (unless there is a bug). Generally, the flag is not intended for normal use, but for field engineers to disable it in case of a serious problem. Ref #6418. Closes #9900	2022-01-14 07:10:02 +02:00

1 2

60 Commits