scylladb

Author	SHA1	Message	Date
Benny Halevy	6f202cf48b	compaction_group, storage_group, table_state: add extended timestamp stats getters To return the minimum live timestamp and live row-marker timestamp across a compaction_group, storage_group, or table_state. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:05:57 +03:00
Benny Halevy	14d86a3a12	sstables, memtable: track live timestamps When garbage collecting tombstones, we care only about shadowing of live data. However, currently we track min/max timestamps of both live and dead data, but there is no problem with purging tombstones that shadow dead data (expired or shadowed by other tombstones in the sstable/memtable). Also, for shadowable tombstones, we track live row marker timestamps separately since, if the live row marker timestamp is greater than a shadowable tombstone timestamp, then the row marker would shadow the shadowable tombstone, thus exposing the cells in that row, even if their timestamps are smaller than the shadowable tombstone's. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:05:49 +03:00
Calle Wilund	e18a855abe	extensions: Add exception types for IO extensions and handle in memtable write path Fixes #19960 The write path for sstables/commitlog needs to handle the fact that IO extensions can generate errors, some of which should be considered retry-able, and some that should, similar to system IO errors, cause the node to go into isolate mode. One option would of course be for extensions to simply generate std::system_errors, with system_category and appropriate codes. But this is probably a bad idea, since it makes it more muddy at which level an error happened, as well as limits the expressibility of the error. This adds three distinct types (sharing a base) distinguishing permission, availability and configuration errors. These are treated akin to EACCES, ENOENT and EINVAL in the disk error handler and the memtable write loop. Tests updated to use and verify behaviour. Closes scylladb/scylladb#19961	2024-08-11 13:52:35 +03:00
Botond Dénes	1f4b9a5300	Merge 'compaction: drop compaction executors' possibility to bypass task manager' from Aleksandra Martyniuk If parent_info argument of compaction_manager::perform_compaction is std::nullopt, then created compaction executor isn't tracked by task manager. Currently, all compaction operations should be visible in task manager. Modify split methods to keep split executor in task manager. Get rid of the option to bypass task manager. Closes scylladb/scylladb#19995 * github.com:scylladb/scylladb: compaction: replace optional<task_info> with task_info param compaction: keep split executor in task manager	2024-08-11 10:26:43 +03:00
Avi Kivity	aa1270a00c	treewide: change assert() to SCYLLA_ASSERT() assert() is traditionally disabled in release builds, but not in scylladb. This hasn't caused problems so far, but the latest abseil release includes a commit [1] that causes a 1000 insn/op regression when NDEBUG is not defined. Clearly, we must move towards a build system where NDEBUG is defined in release builds. But we can't just define it blindly without vetting all the assert() calls, as some were written with the expectation that they are enabled in release mode. To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT() macro in utils/assert.hh. This macro is always defined and is not conditional on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release mode. [1] `66ef711d68` Closes scylladb/scylladb#20006	2024-08-05 08:23:35 +03:00
Aleksandra Martyniuk	c456a43173	compaction: replace optional<task_info> with task_info param compaction_manager::perform_compaction does not create task manager task for compaction if parent_info is set to std::nullopt. Currently, we always want to create task manager task for compaction. Remove optional from task info parameters which start compaction. Track all compactions with task manager.	2024-08-02 14:38:46 +02:00
Calle Wilund	91b1be6736	memtable_test: Add test for isolate behaviour on exceptions during flush Tests that certain exceptions thrown during flush to sstable do not crash the node, but do trigger io_error_handler and cause node isolation	2024-07-17 09:36:28 +00:00
Raphael S. Carvalho	ad5c5bca5f	replica: get rid of fragile compaction group intrusive list It was added to make integration of storage groups easier, but it's complicated since it's another source of truth and we could have problems if it becomes inconsistent with the group map. Fixes #18506. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-07-09 16:53:35 -03:00
Avi Kivity	fdc1449392	treewide: rename flat_mutation_reader_v2 to mutation_reader flat_mutation_reader_v2 was introduced in a pair of commits in 2021: `e3309322c3` "Clone flat_mutation_reader related classes into v2 variants" `08b5773c12` "Adapt flat_mutation_reader_v2 to the new version of the API" as a replacement for flat_mutation_reader, using range_tombstone_change instead of range_tombstone to represent range tombstones. See those commits for more information. The transition was incremental; the last use of the original flat_mutation_reader was removed in 2022 in commit `026f8cc1e7` "db: Use mutation_partition_v2 in mvcc" In turn, flat_mutation_reader was introduced in 2017 in commit `748205ca75` "Introduce flat_mutation_reader" to transition from a mutation_reader that nested rows within a partition in a separate stream, to a flat reader that streamed partitions and rows in the same stream. Here, we reclaim the original name and rename the awkward flat_mutation_reader_v2 to mutation_reader. Note that mutation_fragment_v2 remains since we still use the original for compatibility, sometimes. Some notes about the transition: - files were also renamed. In one case (flat_mutation_reader_test.cc), the rename target already existed, so we rename to mutation_reader_another_test.cc. - a namespace 'mutation_reader' with two definitions existed (in mutation_reader_fwd.hh). Its contents were folded into the mutation_reader class. As a result, a few #includes had to be adjusted. Closes scylladb/scylladb#19356	2024-06-21 07:12:06 +03:00
Kefu Chai	223fba3243	test: memtable_test: increase unspooled_dirty_soft_limit before this change, when performing memtable_test, we expect that the memtables of ks.cf are the only memtables being flushed. and we inject 4 failures in the code path of flush, and wait until 4 of them are triggered. but in the background, `dirty_memory_manager` performs flush on all tables when necessary. so, the total number of failures is not necessarily the total number of failures triggered when flushing ks.cf, some of them could be triggered when flushing system tables. that's why we have sporadic test failures from this test, as we might check `t.min_memtable_timestamp()` too soon. after this change, we increase `unspooled_dirty_soft_limit` setting, in order to disable `dirty_memory_manager`, so that the only flushes are those performed by the test. Fixes #19034 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-12 19:17:27 +08:00
Kefu Chai	2df4e9cfc2	test: memtable_test: replace BOOST_ASSERT with BOOST_REQUIRE before this change, we verify the behavior of design under test using `BOOST_ASSERT()`, which is a wrapper around `assert()`, so if a test fails, the test just aborts. this is not very helpful for postmortem debugging. after this change, we use `BOOST_REQUIRE` macro for verifying the behavior, so that Boost.Test prints out the condition if it does not hold when we test it. Refs #19034 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-12 19:17:27 +08:00
Kefu Chai	a439ebcfce	treewide: include fmt/ranges.h and/or fmt/std.h before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we include `fmt/ranges.h` and/or `fmt/std.h` for formatting the container types, like vector, map, optional and variant, using {fmt} instead of the homebrew formatter based on operator<<. with this change, the changes adding fmt::formatter and the changes using ostream formatter explicitly, we are allowed to drop `FMT_DEPRECATED_OSTREAM` macro. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-19 22:56:16 +08:00
Kefu Chai	1b859e484f	treewide: use fmt::to_string() to transform a UUID to std::string without `FMT_DEPRECATED_OSTREAM` macro, `UUID::to_sstring()` is implemented using its `fmt::formatter`, which is not available at the end of this header file where `UUID` is defined. at this moment, we still use `FMT_DEPRECATED_OSTREAM` and {fmt} v9, so we can still use `UUID::to_sstring()`, but in {fmt} v10, we cannot. so, in this change, we change all callers of `UUID::to_sstring()` to `fmt::to_string()`, so that we don't depend on `FMT_DEPRECATED_OSTREAM` and {fmt} v9 anymore. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-26 13:38:37 +08:00
Avi Kivity	7cb1c10fed	treewide: replace seastar::future::get0() with seastar::future::get() get0() dates back to the days when Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it. Replace with seastar::future::get(), which does the same thing.	2024-02-02 22:12:57 +08:00
Avi Kivity	02111d6754	memtable: consolidate _read_section, _allocating_section in a struct Those two members are passed from memtable_list to memtable. Since we wish to pass them from table, it becomes awkward to pass them as two separate variables as their contents are specific to memtable internals. Wrap them in a name that indicates their role (being table-wide shared data for memtables) and pass them as a unit.	2023-12-26 21:11:48 +02:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Pavel Emelyanov	eeee58def8	tests: Make use of make_memtable() helper There's one in the utils that creates lw_shared_ptr<memtable> and applies provided vector of mutations into it. Lots of other test cases do literally the same by hand. The make_memtable() assumes that the caller is sitting in the seastar thread, and all the test cases that can benefit from it already are. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-02 19:28:35 +03:00
Avi Kivity	7d5e22b43b	replica: memtable: don't forget memtable memory allocation statistics A memtable object contains two logalloc::allocating_section members that track memory allocation requirements during reads and writes. Because these are local to the memtable, each time we seal a memtable and create a new one, these statistics are forgotten. As a result we may have to re-learn the typical size of reads and writes, incurring a small performance penalty. The solution is to move the allocating_section object to the memtable_list container. The workload is the same across all memtables of the same table, so we don't lose discrimination here. The performance penalty may increase later if changes to memory reserve thresholds are logged with a backtrace, so this reduces the odds of incurring such a penalty. Closes scylladb/scylladb#15737	2023-10-18 17:43:33 +02:00
Patryk Jędrzejczak	866c9a904d	test: always pass empty description to migration_manager::announce In the next commit, we remove the default value for the description parameter of migration_manager::announce to avoid using it in the future. However, many calls to announce in tests use the default value. We have to change it, but we don't really care about descriptions in the tests, so we pass the empty string everywhere.	2023-08-07 14:38:11 +02:00
Patryk Jędrzejczak	3468cbd66b	service: migration_manager: change the prepare_ methods to functions The migration_manager service is responsible for schema convergence in the cluster - pushing schema changes to other nodes and pulling schema when a version mismatch is observed. However, there is also a part of migration_manager that doesn't really belong there - creating mutations for schema updates. These are the functions with prepare_ prefix. They don't modify any state and don't exchange any messages. They only need to read the local database. We take these functions out of migration_manager and make them separate functions to reduce the dependency of other modules (especially query_processor and CQL statements) on migration_manager. Since all of these functions only need access to storage_proxy (or even only replica::database), doing such a refactor is not complicated. We just have to add one parameter, either storage_proxy or database and both of them are easily accessible in the places where these functions are called.	2023-07-28 13:55:27 +02:00
Alejo Sanchez	a9350493e3	test/boost/memtable_test: split memtable sub-tests Split long-running memtable tests. At the cost of some verbosity, split these sub-tests for the long-running tests test_memtable_with_many_versions_conforms_to_mutation_source*. Refs #13905	2023-07-15 10:51:09 +02:00
Alejo Sanchez	520bd90008	test/boost/memtable_test: split test plain/reverse Split long running test test_memtable_with_many_versions_conforms_to_mutation_source to 2 tests for _plain and _reverse. Refs #13905 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #14447	2023-07-03 15:20:12 +03:00
Pavel Emelyanov	66e43912d6	code: Switch to seastar API level 7 In that level no io_priority_class-es exist. Instead, all the IO happens in the context of current sched-group. File API no longer accepts prio class argument (and makes io_intent arg mandatory to impls). So the change consists of - removing all usage of io_priority_class - patching file_impl's inheritors to updated API - priority manager goes away altogether - IO bandwidth update is performed on respective sched group - tune-up scylla-gdb.py io_queues command The first change is huge and was made semi-automatically by: - grep io_priority_class \| default_priority_class - remove all calls, found methods' args and class' fields Patching file_impl-s is smaller, but also mechanical: - replace io_priority_class& argument with io_intent* one - pass intent to lower file (if applicable) Dropping the priority manager is: - git-rm .cc and .hh - sed out all the #include-s - fix configure.py and cmakefile The scylla-gdb.py update is a bit hairy -- it needs to use task queues list for IO classes names and shares, but to detect whether it should, it checks whether the "commitlog" group is present. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13963	2023-06-06 13:29:16 +03:00
Avi Kivity	da5467c687	Merge 'Use implicit default prio class in tests' from Pavel Emelyanov There are several places in tests that either use default_priority_class() explicitly, or use some specific prio class obtained from priority manager. There's currently an ongoing work to remove all priority classes, this set makes the final patch a bit smaller and easier to review. In particular -- in many cases default_priority_class() is implicit and can be avoided by callers. Also, using any prio class by test is excessive, it can go with (implicit) default_priority_class. ref: #13963 Closes #13991 * github.com:scylladb/scylladb: test, memtable: Use default prio class test, memtable: Add default value for make_flush_reader() last arg test, view_build: Use default prio class test, sstables: Use implicit default prio class in dma_write() test, sstables: Use default sstable::get_writer()'s prio class arg	2023-05-23 18:46:52 +03:00
Pavel Emelyanov	f9ff5cdfdf	test, memtable: Use default prio class Similarly to previous patch with view-building -- using default class is OK for a unit test Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-23 10:21:27 +03:00
Pavel Emelyanov	daa808aa21	test, memtable: Add default value for make_flush_reader() last arg Many places call memtable::make_flush_reader() with default priority class. Make it a default-arg for the method, other reader making methods of memtable already have it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-23 10:20:37 +03:00
Jan Ciolek	d2ef55b12c	test: use NetworkTopologyStrategy in all unit tests As described in https://github.com/scylladb/scylladb/issues/8638, we're moving away from `SimpleStrategy`; in the future it will become deprecated. We should remove all uses of it and replace them with `NetworkTopologyStrategy`. This change replaces `SimpleStrategy` with `NetworkTopologyStrategy` in all unit tests, or at least in the ones where it was reasonable to do so. Some of the tests were written explicitly to test the `SimpleStrategy` strategy, or changing the keyspace from `SimpleStrategy` to `NetworkTopologyStrategy`. These tests were left intact. It's still a feature that is supported, even if it's slowly getting deprecated. The typical way to use `NetworkTopologyStrategy` is to specify a replication factor for each datacenter. This could be a bit cumbersome; we would have to fetch the list of datacenters, set the repfactors, etc. Luckily there is another way - we can just specify a replication factor to use for each existing datacenter, like this: ```cql CREATE KEYSPACE {} WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1}; ``` This makes the change rather straightforward - just replace all instances of `'SimpleStrategy'` with `'NetworkTopologyStrategy'`. Refs: https://github.com/scylladb/scylladb/issues/8638 Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> Closes #13990	2023-05-23 08:52:56 +03:00
Avi Kivity	0770b328c7	test: fix some mismatched signed/unsigned comparisons gcc likes to complain about signed/unsigned compares as they can yield surprising results. The fixes are trivial, so apply them.	2023-03-21 13:15:12 +02:00
Kefu Chai	60eac12db6	test: memtable_test: mark dummy variable for loop [[maybe_unused]] without C++23 `std::ranges::repeat_view`, it'd be cumbersome to implement a loop without a dummy variable. this change helps to silence the following warning: ``` test/boost/memtable_test.cc:1135:26: error: unused variable 'value' [-Werror,-Wunused-variable] for (int value : boost::irange<int>(0, num_flushes)) { ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:56:55 +08:00
Avi Kivity	69a385fd9d	Introduce schema/ module Schema related files are moved there. This excludes schema files that also interact with mutations, because the mutation module depends on the schema. Those files will have to go into a separate module. Closes #12858	2023-02-15 11:01:50 +02:00
Tomasz Grabiec	ccc8e47db1	Merge 'test/lib: introduce key_utils.hh' from Botond Dénes We currently have two method families to generate partition keys: * make_keys() in test/lib/simple_schema.hh * token_generation_for_shard() in test/lib/sstable_utils.hh Both work only for schemas with a single partition key column of `text` type and both generate keys of fixed size. This is very restrictive and simplistic. Tests which wanted anything more complicated than that had to rely on open-coded key generation. Also, many tests started to rely on the simplistic nature of these keys, in particular two tests started failing because the new key generation method generated keys of varying size: * sstable_compaction_test.sstable_run_based_compaction_test * sstable_mutation_test.test_key_count_estimation These two tests seem to depend on generated keys all being of the same size. This makes some sense in the case of the key count estimation test, but makes no sense at all to me in the case of the sstable run test. Closes #12657 * github.com:scylladb/scylladb: test/lib/sstable_utils: remove now unused token_generation_for_shard() and friends test/lib/simple_schema: remove now unused make_keys() and friends test: migrate to tests::generate_partition_key[s]() test/lib/test_services: add table_for_tests::make_default_schema() test/lib: add key_utils.hh test/lib/random_schema.hh: value_generator: add min_size_in_bytes	2023-02-06 18:11:32 +01:00
Avi Kivity	f73e2c992f	Merge 'Keep range tombstones with rows in memtables and cache' from Tomasz Grabiec This series switches memtable and cache to use a new representation for mutation data, called `mutation_partition_v2`. In this representation, range tombstone information is stored in the same tree as rows, attached to row entries. Each entry has a new tombstone field, which represents range tombstone part which applies to the interval between this entry and the previous one. See docs/dev/mvcc.md for more details about the format. The transient mutation object still uses the old model in order to avoid work needed to adapt old code to the new model. It may also be a good idea to live with two models, since the transient mutation has different requirements and thus different trade-offs can be made. Transient mutation doesn't need to support eviction and strong exception guarantees, so its algorithms and in-memory representation can be simpler. This allows us to incrementally evict range tombstone information. Before this series, range tombstones were accumulated and evicted only when the whole partition entry was evicted. This could lead to inefficient use of cache memory. Another advantage of the new representation is that reads don't have to look up range tombstone information in a different tree while reading. This leads to simpler and more efficient readers. There are several disadvantages too. Firstly, rows_entry is now larger by 16 bytes. Secondly, update algorithms are more complex because they need to deoverlap range tombstone information. Also, to handle preemption and provide strong exception guarantees, update algorithms may need to allocate sentinel entries, which adds complexity and reduces performance. The memtable reader was changed to use the same cursor implementation which cache uses, for improved code reuse and reducing risk of bugs due to discrepancy of algorithms which deal with MVCC. Remaining work: - performance optimizations to apply_monotonically() to avoid regressions - performance testing - preemption support in apply_to_incomplete (cache update from memtable) Fixes #2578 Fixes #3288 Fixes #10587 Closes #12048 * github.com:scylladb/scylladb: test: mvcc: Extend some scenarios with exhaustive consistency checks on eviction test: mvcc: Extract mvcc_container::allocate_in_region() row_cache, lru: Introduce evict_shallow() test: mvcc: Avoid copies of mutation under failure injection test: mvcc: Add missing logalloc::reclaim_lock to test_apply_is_atomic mutation_partition_v2: Avoid full scan when applying mutation to non-evictable Pass is_evictable to apply() tests: mutation_partition_v2: Introduce test_external_memory_usage_v2 mirroring the test for v1 tests: mutation: Fix test_external_memory_usage() to not measure mutation object footprint tests: mutation_partition_v2: Add test for exception safety of mutation merging tests: Add tests for the mutation_partition_v2 model mutation_partition_v2: Implement compact() cache_tracker: Extract insert(mutation_partition_v2&) mvcc, mutation_partition: Document guarantees in case merging succeeds mutation_partition_v2: Accept arbitrary preemption source in apply_monotonically() mutation_partition_v2: Simplify get_continuity() row_cache: Distinguish dummy insertion site in trace log db: Use mutation_partition_v2 in mvcc range_tombstone_change_merger: Introduce peek() readers: Extract range_tombstone_change_merger mvcc: partition_snapshot_row_cursor: Handle non-evictable snapshots mvcc: partition_snapshot_row_cursor: Support digest calculation mutation_partition_v2: Store range tombstones together with rows db: Introduce mutation_partition_v2 doc: Introduce docs/dev/mvcc.md db: cache_tracker: Introduce insert() variant which positions before existing entry in the LRU db: Print range_tombstone bounds as position_in_partition test: memtable_test: Relax test_segment_migration_during_flush test: cache_flat_mutation_reader: Avoid timestamp clash test: cache_flat_mutation_reader_test: Use monotonic timestamps when inserting rows test: mvcc: Fix sporadic failures due to compact_for_compaction() test: lib: random_mutation_generator: Produce partition tombstone less often test: lib: random_utils: Introduce with_probability() test: lib: Improve error message in has_same_continuity() test: mvcc: mvcc_container: Avoid UB in tracker() getter when there is no tracker test: mvcc: Insert entries in the tracker test: mvcc_test: Do not set dummy::no on non-clustering rows mutation_partition: Print full position in error report in append_clustered_row() db: mutation_cleaner: Extract make_region_space_guard() position_in_partition: Optimize equality check mvcc: Fix version merging state resetting mutation_partition: apply_resume: Mark operator bool() as explicit	2023-02-05 22:33:10 +02:00
Raphael S. Carvalho	3c5afb2d5c	test: Enable Scylla test command line options for boost tests We have enabled the command line options without changing a single line of code, we only had to replace old include with scylla_test_case.hh. Next step is to add x-log-compaction-groups options, which will determine the number of compaction groups to be used by all instantiations of replica::table. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-01 20:14:51 -03:00
Raphael S. Carvalho	2d2460046b	test: memtable_test: Fix it with multiple compaction groups With compaction groups, automatic flushing may not pick the user table. Fix it by using explicit flush. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-01 20:14:51 -03:00
Botond Dénes	4ad3ba52b0	test: migrate to tests::generate_partition_key[s]() Use the newly introduced key generation facilities instead of the old inflexible alternatives and hand-rolled code. Most of the migrations are mechanical, but there are two tests that were tricky to migrate: * sstable_compaction_test.sstable_run_based_compaction_test * sstable_mutation_test.test_key_count_estimation These two tests seem to depend on generated keys all being of the same size. This makes some sense in the case of the key count estimation test, but makes no sense at all to me in the case of the sstable run test.	2023-01-30 05:03:42 -05:00
Tomasz Grabiec	919ff433d1	tests: Add tests for the mutation_partition_v2 model	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	40719c600c	test: memtable_test: Relax test_segment_migration_during_flush Partition version merging can now insert sentinels, which may temporarily increase unspooled memory. It is no longer true that unspooled monotonically decreases, which the test verified. Relax it, and only verify that unspooled is smaller than real dirty.	2023-01-27 19:15:39 +01:00
Raphael S. Carvalho	ef8f542d75	replica: Adapt table::active_memtable() to compaction groups active_memtable() was fine for a single group, but with multiple groups, there will be one active memtable per group. Let's change the interface to reflect that. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:14 -03:00
Pavel Emelyanov	6075e01312	test/lib: Remove sstable_utils.hh from simple_schema.hh The latter is a pretty popular test/lib header that disseminates the former over a whole lot of unit tests. The former, in turn, naturally includes sstables.hh, thus making tons of unrelated tests depend on the sstables class unused by them. However, simple removal doesn't work because of the local_shard_only bool class definition in sstable_utils.hh used in simple_schema.hh. This thing, in turn, is used in key-making helpers that don't belong to sstable utils, so these are moved into simple_schema as well. When done, this affects the mutation_source_test.hh, which needs the local_shard_only bool class (and helps spreading the sstables.hh throughout more unrelated tests) and a bunch of .cc test sources that used sstable_utils.hh to indirectly include various headers they need. After patching, sstables.hh touches half as many tests. As a side effect, sstables_manager.hh is also depended on by half as many tests. Continuation of `9bdea110a6` Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12240	2022-12-08 15:37:33 +02:00
Avi Kivity	444de2831e	dirty_memory_manager: move to replica module It's a replica-side thing, so move it there. The related flush_permit and sstable_write_permit are moved alongside.	2022-12-06 22:24:17 +02:00
Avi Kivity	37c6b46d26	dirty_memory_manager: re-term "virtual dirty" to "unspooled dirty" The "virtual dirty" term is not very informative. "Virtual" means "not real", but it doesn't say in which way it isn't real. In this case, virtual dirty refers to real dirty memory, minus the portion of memtables that has been written to disk (but not yet sealed - in that case it would not be dirty in the first place). I chose to call "the portion of memtables that has been written to disk" as "spooled memory". At least the unique term will cause people to look it up and may be easier to remember. From that we have "unspooled memory". I plan to further change the accounting to account for spooled memory rather than unspooled, as that is a more natural term, but that is left for later. The documentation, config item, and metrics are adjusted. The config item is practically unused so it isn't worth keeping compatibility here.	2022-10-04 14:03:59 +03:00
Benny Halevy	0627667a06	mutation_partition: compact_for_compaction: get tombstone_gc_state And pass down to `do_compact`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:15 +03:00
Mikołaj Sielużycki	e0c6e1ef3c	table: Add test where compaction doesn't keep up with flush rate. The test simulates a situation where 2 threads issue flushes to 2 tables. Both issue small flushes, but one has injected reactor stalls. This can lead to a situation where lots of small sstables accumulate on disk, and, if compaction never has a chance to keep up, resources can be exhausted. (cherry picked from commit `b5684aa96d`) (cherry picked from commit `25407a7e41`)	2022-07-28 14:43:33 +03:00
Benny Halevy	bb9eddc67f	test: memtable_test: failed_flush_prevents_writes: notify_soft_pressure only once Now that memtable flush error handling was moved entirely to table::seal_active_memtable, we don't need to notify_soft_pressure to keep the retry going. The infinite retry loop should eventually either succeed or die (by isolating the node or aborting) on its own. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-27 14:06:59 +03:00
Benny Halevy	b5abbb971f	test: memtable_test: failed_flush_prevents_writes: extend error injection Inject errors into all seal_active_memtable distinct error handling sites. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-27 14:06:59 +03:00
Avi Kivity	419fe65259	Revert "Merge 'Block flush until compaction finishes if sstables accumulate' from Mikołaj Sielużycki" This reverts commit `aa8f135f64`, reversing changes made to `9a88bc260c`. The patch causes hangs during flush. Also reverts parts of `411231da75` that impacted the unit test. Fixes #10897.	2022-07-06 12:19:02 +03:00
Tomasz Grabiec	a6aef60b93	memtable: Fix missing range tombstones during reads under certain rare conditions There is a bug introduced in `e74c3c8` (4.6.0) which makes memtable reader skip a range tombstone for a certain pattern of deletions and under a certain sequence of events. _rt_stream contains the result of deoverlapping range tombstones which had the same position, which were sipped from all the versions. The result of deoverlapping may produce a range tombstone which starts later, at the same position as a more recent tombstone which has not been sipped from the partition version yet. If we consume the old range tombstone from _rt_stream and then refresh the iterators, the refresh will skip over the newer tombstone. The fix is to drop the logic which drains _rt_stream so that _rt_stream is always merged with partition versions. For the problem to trigger, there have to be multiple MVCC versions (at least 2) which contain deletions of the following form: [a, c] @ t0 [a, b) @ t1, [b, d] @ t2 c > b The proper sequence for such versions is (assuming d > c): [a, b) @ t1, [b, d] @ t2 Due to the bug, the reader will produce: [a, b) @ t1, [b, c] @ t0 The reader also needs to be preempted right before processing [b, d] @ t2 and iterators need to get invalidated so that lsa_partition_reader::do_refresh_state() is called and it skips over [b, d] @ t2. Otherwise, the reader will emit [b, d] @ t2 later. If it does emit the proper range tombstone, it's possible that it will violate fragment order in the stream if _rt_stream accumulated remainders (possible with 3 MVCC versions). The problem goes away once MVCC versions merge. Fixes #10913 Fixes #10830 Closes #10914	2022-06-29 19:02:23 +03:00
Kamil Braun	411231da75	test/boost: memtable_test: perform schema operations on shard 0 Will be a prerequisite with Raft enabled.	2022-06-23 16:14:41 +02:00
Botond Dénes	4bd4aa2e88	Merge 'memtable, cache: Eagerly compact data with tombstones' from Tomasz Grabiec When memtable receives a tombstone it can happen under some workloads that it covers data which is still in the memtable. Some workloads may insert and delete data within a short time frame. We could reduce the rate of memtable flushes if we eagerly drop tombstoned data. One workload which benefits is the raft log. It stores a row for each uncommitted raft entry. When entries are committed they are deleted. So the live set is expected to be short under normal conditions. Fixes #652. Closes #10807 * github.com:scylladb/scylla: memtable: Add counters for tombstone compaction memtable, cache: Eagerly compact data with tombstones memtable: Subtract from flushed memory when cleaning mvcc: Introduce apply_resume to hold state for partition version merging test: mutation: Compare against compacted mutations compacting_reader: Drop irrelevant tombstones mutation_partition: Extract deletable_row::compact_and_expire() mvcc: Apply mutations in memtable with preemption enabled test: memtable: Make failed_flush_prevents_writes() immune to background merging	2022-06-15 18:12:42 +03:00
Tomasz Grabiec	3bec1cc19f	test: memtable: Make failed_flush_prevents_writes() immune to background merging Before the change, the test artificially set the soft pressure condition hoping that the background flusher would flush the memtable. It won't happen if by the time the background flusher runs the LSA region is updated and soft pressure (which is not really there) is lifted. Once apply() becomes preemptible, background partition version merging can lift the soft pressure, making the memtable flush not occur and making the test fail. Fix by triggering soft pressure on retries. Fixes #10801 Refs #10793 (cherry picked from commit `0e78ad50ea`) Closes #10802	2022-06-15 14:33:19 +02:00
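
Commit 14d86a3a12 argues that tombstone purging only needs to care about live data, so tracking the minimum live timestamp separately from the minimum timestamp over all data lets more tombstones be purged. The following is a minimal C++ sketch of that idea, not Scylla's code: the names `timestamp_stats`, `update`, `update_row_marker` and `can_purge_regular` are hypothetical, and the purge rule (a regular tombstone is purgeable only if its timestamp is below the minimum live timestamp of the other overlapping sources) is an assumption. The live row-marker statistic is recorded, but the separate rule for shadowable tombstones is deliberately not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch: these names are illustrative, not Scylla's actual types.
using api_timestamp = std::int64_t;
constexpr api_timestamp no_timestamp = std::numeric_limits<api_timestamp>::max();

// Per-source (memtable or sstable) timestamp statistics.
struct timestamp_stats {
    api_timestamp min_timestamp = no_timestamp;                  // live and dead data
    api_timestamp min_live_timestamp = no_timestamp;             // live data only
    api_timestamp min_live_row_marker_timestamp = no_timestamp;  // live row markers only

    // Record a cell, tombstone or other timestamped atom.
    void update(api_timestamp ts, bool is_live) {
        min_timestamp = std::min(min_timestamp, ts);
        if (is_live) {
            min_live_timestamp = std::min(min_live_timestamp, ts);
        }
    }

    // Row markers also feed the separate live row-marker statistic, which the
    // commit uses for shadowable tombstones (that rule is not modeled here).
    void update_row_marker(api_timestamp ts, bool is_live) {
        update(ts, is_live);
        if (is_live) {
            min_live_row_marker_timestamp = std::min(min_live_row_marker_timestamp, ts);
        }
    }
};

// Assumed purge rule for regular tombstones: a tombstone shadows data whose
// timestamp is <= its own, so it can be dropped only if every live atom in
// the other overlapping sources is strictly newer.
bool can_purge_regular(api_timestamp tombstone_ts, const timestamp_stats& others) {
    return tombstone_ts < others.min_live_timestamp;
}
```

With only dead data at timestamp 10 and live data at 20, a tombstone at 15 becomes purgeable under this rule, whereas comparing against the all-data minimum (10) would keep it.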

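Commit e18a855abe adds three exception types with a shared base for IO-extension failures (permission, availability, configuration) and handles them like EACCES, ENOENT and EINVAL. The sketch below shows one possible way such a hierarchy could be mapped onto `std::errc` values for a disk error handler while keeping the types distinct. Every class and function name here is hypothetical, since the commit message does not name the real types, and the mapping helper is an assumed design rather than Scylla's handler.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical hierarchy: the commit message does not name the real types.
class io_extension_error : public std::runtime_error {
public:
    explicit io_extension_error(const std::string& msg) : std::runtime_error(msg) {}
    // The errno-like condition the disk error handler should treat this as.
    virtual std::errc handled_as() const noexcept = 0;
};

class io_extension_permission_error : public io_extension_error {
public:
    using io_extension_error::io_extension_error;
    std::errc handled_as() const noexcept override { return std::errc::permission_denied; }  // EACCES
};

class io_extension_availability_error : public io_extension_error {
public:
    using io_extension_error::io_extension_error;
    std::errc handled_as() const noexcept override { return std::errc::no_such_file_or_directory; }  // ENOENT
};

class io_extension_configuration_error : public io_extension_error {
public:
    using io_extension_error::io_extension_error;
    std::errc handled_as() const noexcept override { return std::errc::invalid_argument; }  // EINVAL
};

// Translate an exception into an error code for the handler.
// An empty error_code means "not an IO error".
std::error_code classify_io_error(const std::exception& e) {
    if (const auto* ext = dynamic_cast<const io_extension_error*>(&e)) {
        return std::make_error_code(ext->handled_as());
    }
    if (const auto* sys = dynamic_cast<const std::system_error*>(&e)) {
        return sys->code();
    }
    return {};
}
```

Keeping separate types preserves the information about where the error came from, which is the commit's argument against throwing plain `std::system_error`; the handler can still reuse errno-style decisions through the mapped code.
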

88 Commits