scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Botond Dénes	eb42213db4	compact_mutation: close active range tombstone on page end The compactor recently acquired the ability to consume a v2 stream. The v2 spec requires that all streams end with a null tombstone. `range_tombstone_assembler`, the component the compactor uses for converting the v2 input into its v1 output, enforces this with a check on `consume_end_of_partition()`. Normally the producer of the stream the compactor is consuming takes care of closing the active tombstone before the stream ends. The compactor (or its consumer), however, can decide to end consumption early, e.g. to cut the current page. When this happens, the compactor must close the tombstone itself. Furthermore, it has to keep this tombstone around to re-open it on the next page. This patch implements this mechanism, which was left out of `134601a15e`. It also adds a unit test which reproduces the problems caused by the missing mechanism. The compactor now tracks the last clustering position emitted. When the page ends, this position will be used as the position of the closing range tombstone change. This ensures the range tombstone only covers the actually emitted range. Fixes: #9907 Tests: unit(dev), dtest(paging_test.py, paging_additional_test.py) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20220114053215.481860-1-bdenes@scylladb.com>	2022-01-25 09:52:30 +02:00
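The page-end mechanism described above can be sketched with a toy model. Everything here is hypothetical: positions are plain ints, a tombstone is an optional timestamp (with `std::nullopt` standing in for the null tombstone), and `page_compactor_sketch` is not Scylla's `compact_mutation`. As the message describes, the sketch tracks the last emitted position, closes the active tombstone there when the page is cut, and re-opens it on the next page.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <variant>
#include <vector>

// Hypothetical simplified model: std::nullopt plays the role of the null tombstone.
using tombstone = std::optional<long>;

struct clustering_row { int pos; };
struct range_tombstone_change { int pos; tombstone t; };
using fragment = std::variant<clustering_row, range_tombstone_change>;

class page_compactor_sketch {
    std::size_t _row_limit;
    std::size_t _rows_in_page = 0;
    tombstone _active;              // currently open range tombstone, if any
    std::optional<int> _last_pos;   // last clustering position emitted
public:
    std::vector<fragment> out;      // what the downstream consumer sees

    explicit page_compactor_sketch(std::size_t row_limit) : _row_limit(row_limit) {}

    // Returns false when the page is full and consumption must stop early.
    bool consume(const fragment& f) {
        if (auto* rtc = std::get_if<range_tombstone_change>(&f)) {
            _active = rtc->t;       // a null tombstone closes the active one
            _last_pos = rtc->pos;
            out.push_back(f);
            return true;
        }
        const auto& row = std::get<clustering_row>(f);
        out.push_back(row);
        _last_pos = row.pos;
        return ++_rows_in_page < _row_limit;
    }

    // Page cut: close the active tombstone at the last emitted position, so the
    // page's stream ends with a null tombstone and covers only emitted data.
    void end_page() {
        if (_active && _last_pos) {
            out.push_back(range_tombstone_change{*_last_pos, std::nullopt});
        }
    }

    // Next page: re-open the tombstone that was still active at the cut.
    void start_new_page(int start_pos) {
        _rows_in_page = 0;
        out.clear();
        if (_active) {
            out.push_back(range_tombstone_change{start_pos, _active});
        }
    }
};
```

The sketch closes the tombstone exactly at the last emitted position, following the commit message; the real compactor works with `position_in_partition` values rather than ints.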
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable, standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	134601a15e	Merge "Convert input side of mutation compactor to v2" from Botond " With this series the mutation compactor can now consume a v2 stream. On the output side it still uses v1, so it can now act as an online v2->v1 converter. This allows us to push the v2->v1 conversion out as far as the compactor, usually the next-to-last component in a read pipeline, just before the final consumer. For reads this is as far as we can go, as the intra-node ABI and hence the result-sets built are v1. For compaction we could go further and eliminate conversion altogether, but this requires some further work on both the compactor and the sstable writer and so it is left to be done later. To summarize, this patchset enables a v2 input for the compactor and it updates compaction and single partition reads to use it. " * 'mutation-compactor-consume-v2/v1' of https://github.com/denesb/scylla: table: add make_reader_v2() querier: convert querier_cache and {data,mutation}_querier to v2 compaction: upgrade compaction::make_interposer_consumer() to v2 mutation_reader: remove unnecessary stable_flattened_mutations_consumer compaction/compaction_strategy: convert make_interposer_consumer() to v2 mutation_writer: migrate timestamp_based_splitting_writer to v2 mutation_writer: migrate shard_based_splitting_writer to v2 mutation_writer: add v2 clone of feed_writer and bucket_writer flat_mutation_reader_v2: add reader_consumer_v2 typedef mutation_reader: add v2 clone of queue_reader compact_mutation: make start_new_page() independent of mutation_fragment version compact_mutation: add support for consuming a v2 stream compact_mutation: extract range tombstone consumption into own method range_tombstone_assembler: add get_range_tombstone_change() range_tombstone_assembler: add get_current_tombstone()	2022-01-12 14:37:19 +02:00
Botond Dénes	aa3c943f4c	mutation_reader: remove unnecessary stable_flattened_mutations_consumer This wrapper was conceived to make the unmovable `compact_mutation` movable, because readers wanted movable consumers. But `compact_mutation` has been movable for years now, as all its unmovable bits were moved into an `lw_shared_ptr<>` member. So drop this unnecessary wrapper and its unnecessary usages.	2022-01-07 13:52:07 +02:00
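The pattern that made `compact_mutation` movable (unmovable state kept behind a shared pointer member) can be sketched as follows. This is a hypothetical illustration: `std::shared_ptr` stands in for Seastar's `lw_shared_ptr`, `std::mutex` stands in for whatever unmovable members the real class had, and `movable_compactor_sketch` is not a Scylla class.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <type_traits>
#include <utility>

// Hypothetical unmovable state: std::mutex is neither copyable nor movable.
struct unmovable_state {
    std::mutex m;
    int rows_seen = 0;
};

// The owner holds the unmovable part behind a shared pointer, so moving the
// owner only moves the pointer and the state itself never changes address.
class movable_compactor_sketch {
    std::shared_ptr<unmovable_state> _state = std::make_shared<unmovable_state>();
public:
    void consume_row() { ++_state->rows_seen; }
    int rows_seen() const { return _state->rows_seen; }
};

static_assert(!std::is_move_constructible_v<unmovable_state>);
static_assert(std::is_nothrow_move_constructible_v<movable_compactor_sketch>);
```

With the owner movable on its own, a wrapper whose only job was to provide movability adds nothing, which is the rationale the commit gives for removing it.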
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Avi Kivity	ae3a360725	database: Move database, keyspace, table classes to replica/ directory The database, keyspace, and table classes represent the replica-only part of the objects after which they are named. Reading from a table doesn't give you the full data, just the replica's view, and it is not consistent since reconciliation is applied on the coordinator. As a first step in acknowledging this, move the related files to a replica/ subdirectory.	2022-01-06 17:07:30 +02:00
Avi Kivity	53a83c4b1e	Merge "flat_mutation_reader: convert flat_mutation_reader_from_mutations to v2" from Botond " Like flat_mutation_reader_from_fragments, this reader is also heavily used by tests to compose a specific workload for readers above it. So instead of converting it, we add a v2 variant and leave the v1 variant in place. The v2 variant was written from scratch to have built-in support for reading in reverse. It is built on `mutation::consume()` to avoid duplicating the logic of consuming the contents of the mutation. To avoid stalls, `mutation::consume()` gets support for pausing and resuming the consumption of a mutation. Tests: unit(dev) " * 'flat_mutation_reader_from_mutations_v2/v2' of https://github.com/denesb/scylla: flat_mutation_reader: convert make_flat_mutation_reader_from_mutation() v2 flat_mutation_reader: extract mutation slicing into a function mutation: consume(): make it pausable/resumable mutation: consume(): restructure clustering iterator initialization test/boost/mutation_test: add rebuild test for mutation::consume()	2022-01-05 10:23:17 +02:00
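A pausable/resumable consume, as mentioned for `mutation::consume()`, can be sketched like this. The types are hypothetical: `mutation_sketch` holds only a vector of row keys, `stop_iteration` is a plain enum standing in for Seastar's type of the same name, and the returned `consume_cursor` is an assumed way of representing the resumption point, not Scylla's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class stop_iteration { no, yes };   // stand-in for seastar::stop_iteration

// Hypothetical model of a mutation's clustering rows.
struct mutation_sketch {
    std::vector<int> rows;
};

// Resumption point: the index of the next row to hand to the consumer.
struct consume_cursor {
    std::size_t next = 0;
    bool done(const mutation_sketch& m) const { return next == m.rows.size(); }
};

// Feeds rows to the consumer until it asks to stop or the rows run out. The
// returned cursor lets the caller continue later (e.g. after yielding, to
// avoid a stall) without re-walking rows that were already consumed.
template <typename Consumer>
consume_cursor consume(const mutation_sketch& m, Consumer& c, consume_cursor cur = {}) {
    while (!cur.done(m)) {
        if (c(m.rows[cur.next++]) == stop_iteration::yes) {
            break;
        }
    }
    return cur;
}
```

Keeping the position in a value the caller owns means the producer needs no hidden state between calls, so a reader can pause on any buffer-full condition and pick up where it left off.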
Asias He	a8ad385ecd	repair: Get rid of the gc_grace_seconds The gc_grace_seconds is a very fragile and broken design inherited from Cassandra. Deleted data can be resurrected if a cluster-wide repair is not performed within gc_grace_seconds. This design pushes the job of keeping the database consistent onto the user. In practice, it is very hard to guarantee that repair is always performed within gc_grace_seconds. For example, the repair workload has the lowest priority in the system and can be slowed down by higher-priority workloads, so there is no guarantee of when a repair will finish. A gc_grace_seconds value that used to work might stop working as the data volume in a cluster grows. Users might want to avoid running repair during a specific period where latency is the top priority for their business. To solve this problem, an automatic mechanism to protect against data resurrection is proposed and implemented. The main idea is to remove the tombstone only after the range that covers the tombstone is repaired. In this patch, a new table option, tombstone_gc, is added. The option configures the tombstone GC mode. For example: 1) GC a tombstone after gc_grace_seconds: cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'}; This is the default mode: if the user specifies no tombstone_gc option, the old gc_grace_seconds-based GC is used. 2) Never GC a tombstone: cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'}; 3) GC a tombstone immediately: cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'}; 4) GC a tombstone after repair: cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'}; In addition to the 'mode' option, another option, 'propagation_delay_in_seconds', is added. It defines the maximum time a write may be delayed before it eventually arrives at a node. A new gossip feature, TOMBSTONE_GC_OPTIONS, is added. The new tombstone_gc option can only be used after the whole cluster supports the new feature. A mixed cluster works without problems. Tests: compaction_test.py, ninja test Fixes #3560 [avi: resolve conflicts vs data_dictionary]	2022-01-04 19:48:14 +02:00
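The four tombstone_gc modes listed above can be summarized as a single purge decision. This is a hypothetical sketch, not Scylla's implementation: `can_purge`, `tombstone_gc_options_sketch` and `sketch_clock` are invented names, the numeric defaults are illustrative, and the 'repair' rule (purge once the covering range was repaired later than deletion time plus propagation delay) is an ASSUMPTION based on the commit's description.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using sketch_clock = std::chrono::system_clock;  // stand-in for Scylla's gc_clock
using seconds = std::chrono::seconds;

enum class tombstone_gc_mode { timeout, disabled, immediate, repair };

struct tombstone_gc_options_sketch {
    tombstone_gc_mode mode = tombstone_gc_mode::timeout;  // default mode
    seconds gc_grace{864000};           // illustrative; used by 'timeout'
    seconds propagation_delay{3600};    // illustrative; used by 'repair'
};

// Hypothetical decision function for whether a tombstone may be purged.
bool can_purge(const tombstone_gc_options_sketch& o,
               sketch_clock::time_point deletion_time,
               sketch_clock::time_point now,
               std::optional<sketch_clock::time_point> last_repair_of_range) {
    switch (o.mode) {
    case tombstone_gc_mode::disabled:
        return false;                                     // never GC
    case tombstone_gc_mode::immediate:
        return true;                                      // GC right away
    case tombstone_gc_mode::timeout:
        return deletion_time + o.gc_grace <= now;         // old gc_grace_seconds rule
    case tombstone_gc_mode::repair:
        // ASSUMED rule: only after a repair that covers delayed writes.
        return last_repair_of_range &&
               deletion_time + o.propagation_delay < *last_repair_of_range;
    }
    return false;
}
```

The point of 'repair' mode is visible in the signature: the decision depends on when the range was last repaired, not only on wall-clock time, so a slow or postponed repair delays GC instead of risking resurrection.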
Botond Dénes	5e547dcc8a	test/boost/mutation_test: add rebuild test for mutation::consume() In the next patches we will refactor mutation::consume(). Before doing that add another test, which rebuilds the consumed mutation, comparing it with the original.	2022-01-04 11:43:46 +02:00
Botond Dénes	64bb48855c	flat_mutation_reader: revamp flat_mutation_reader_from_mutations() Add schema parameter so that: * Caller has better control over schema -- especially relevant for reverse reads where it is not possible to follow the convention of passing the query schema which is reversed compared to that of the mutations. * Now that we don't depend on the mutations for the schema, we can lift the restriction on mutations not being empty: this leads to safer code. When the mutations parameter is empty, an empty reader is created. Add "make_" prefix to follow convention of similar reader factory functions. Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211115155614.363663-1-bdenes@scylladb.com>	2021-11-15 17:58:46 +02:00
Tomasz Grabiec	c4328ffc4d	tests: mutation_test: Add test for position_in_partition::reversed() Message-Id: <20210927154942.44236-1-tgrabiec@scylladb.com>	2021-09-28 13:09:39 +02:00
Botond Dénes	f02632aeb0	range_tombstone_accumulator: drop _reversed flag	2021-09-09 15:42:15 +03:00
Botond Dénes	f07805c3ef	test/boost/mutation_test: add test for mutation::consume() monotonicity In both forward and reverse modes.	2021-09-09 15:42:15 +03:00
Pavel Emelyanov	5515f7187d	range_tombstone, code: Add range_tombstone& getters Currently all the code operates on the range_tombstone class, and many of those places get the range tombstone in question from the range_tombstone_list. The next patches will make that list carry (and return) a new object called range_tombstone_entry, so all the code that expects to see the former there will need to be patched to get the range_tombstone from the _entry. This patch prepares the ground for that by introducing the range_tombstone& tombstone() { return *this; } getter on range_tombstone itself and patching all future users of the _entry to call .tombstone() right now. The next patch will remove those getters while adding the new range_tombstone_entry object, thus automatically converting all the patched places to use the entry properly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-09-03 19:34:45 +03:00
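The transitional getter trick can be illustrated with simplified stand-ins. Both structs below are hypothetical reductions (the real `range_tombstone` holds bounds and a tombstone, and the real `range_tombstone_entry` is more than a plain wrapper). The sketch shows that a call site written as `x.tombstone()` compiles unchanged whether it receives the object itself or the future entry wrapping it.

```cpp
#include <cassert>

// Simplified stand-in for range_tombstone.
struct range_tombstone {
    long timestamp = 0;
    // Transitional getter from the commit: returns the object itself, so call
    // sites can already be written as `x.tombstone()`.
    range_tombstone& tombstone() { return *this; }
};

// Hypothetical future entry type: it exposes the same accessor, returning the
// wrapped range_tombstone instead of itself.
struct range_tombstone_entry {
    range_tombstone _tombstone;
    range_tombstone& tombstone() { return _tombstone; }
};

// A call site patched ahead of time works with either element type.
template <typename ListElement>
long timestamp_of(ListElement& e) {
    return e.tombstone().timestamp;
}
```

Once the list starts returning entries and the self-returning getter is removed, the already-patched call sites need no further edits, which is what makes the two-step migration cheap.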
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Avi Kivity	0909e3c17d	treewide: remove redundant "x <=> 0" compares If x is of type std::strong_ordering, then "x <=> 0" is equivalent to x. These no-ops were inserted during #1449 fixes, but are now unnecessary. They have potential for harm, since they can hide an accidental change of the type of x to an arithmetic type, so remove them. Ref #1449.	2021-07-28 13:30:32 +03:00
Avi Kivity	70f481a1f0	test: mutation_test: convert internal tri-compare to std::strong_ordering Drop the temporary merge_container() overload we had to support tri-compares that returned int.	2021-07-28 13:30:07 +03:00
Avi Kivity	d86e529239	serialized_tri_compare: change to std::strong_ordering Also convert a user in mutation_test. Ref #1449.	2021-07-28 13:19:00 +03:00
Botond Dénes	2d2b9e7b36	test/boost: migrate off the global test reader semaphore	2021-07-08 16:53:38 +03:00
Botond Dénes	3679418e62	test/lib/test_services: migrate off the global test reader semaphore	2021-07-08 15:28:39 +03:00
Raphael S. Carvalho	1924e8d2b6	treewide: Move compaction code into a new top-level compaction dir Since compaction is layered on top of sstables, let's move all compaction code into a new top-level directory. This change will give me extra motivation to remove all layer violations, like sstable calling compaction-specific code, and compaction entanglement with other components like table and storage service. Next steps: - remove all layer violations - move compaction code in sstables namespace into a new one for compaction. - move compaction unit tests into its own file Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210707194058.87060-1-raphaelsc@scylladb.com>	2021-07-07 23:21:51 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Pavel Emelyanov	d2442a1bb3	tests: Ditch storage_service_for_tests The purpose of the class in question is to start sharded storage service to make its global instance alive. I don't know when exactly it happened but no code that instantiates this wrapper really needs the global storage service. Ref: #2795 tests: unit(dev), perf_sstable(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210526170454.15795-1-xemul@scylladb.com>	2021-05-27 14:39:13 +03:00
Avi Kivity	e2e723cc4c	build: enable -Wrange-loop-construct warning This warning triggers when a range for ("for (auto x : range)") causes non-trivial copies, prompting the developer to replace with a capture by reference. A few minor violations in the test suite are corrected. Closes #8699	2021-05-26 10:32:56 +03:00
Pavel Solodovnikov	fff7ef1fc2	treewide: reduce boost headers usage in scylla header files `dev-headers` target is also ensured to build successfully. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 01:33:18 +03:00
Benny Halevy	efe938cf1f	flat_mutation_reader: make sure to close reader passed to read_mutation_from_flat_mutation_reader Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	4b8dc7ac7e	flat_mutation_reader: make sure to close flat_mutation_reader_from_mutations Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:25:47 +03:00
Pavel Emelyanov	64074f45ce	code: Relax position_in_partition::tri_compare users There are some pieces left doing res <=> 0 with the res now being a strong_ordering itself. All these can be just dropped. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-09 18:20:39 +03:00
Pavel Emelyanov	a15f158661	test: Convert clustering_fragment_summary::tri_cmp to strong_ordering Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-09 18:20:39 +03:00
Avi Kivity	f0092ae475	test: mutation_test: prepare merge_container for std::strong_ordering The function merge_container() accepts a trichotomic comparator returning an int. As #1449 explains, this is dangerous as it could be mistaken for a less comparator. Switch to std::strong_ordering, but leave a compatible merge_container() in place as it is still needed (even after this series).	2021-03-18 12:40:05 +02:00
Botond Dénes	cf28552357	mutation_test: test_mutation_diff_with_random_generator: compact input mutations This test checks that `mutation_partition::difference()` works correctly. One of the checks it does is: m1 + m2 == m1 + (m2 - m1). If the two mutations are identical but have compactable data, e.g. a shadowable tombstone shadowed by a row marker, the apply will collapse these, causing the above equality check to fail (as m2 - m1 is null). To prevent this, compact the two input mutations. Fixes: #8221 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210310141118.212538-1-bdenes@scylladb.com>	2021-03-10 16:28:14 +01:00
Michał Chojnowski	5b79d6ca4c	test: mutation_test: remove an obsolete assertion Due to small value optimizations, the removed assertions are not true in general. Until now, atomic_cell did not use small value optimizations, but it will after upcoming changes.	2021-02-16 21:35:14 +01:00
Michał Chojnowski	aa60f28a09	test: mutation_test: initialize an uninitialized variable It was assumed to be zero-initialized, but C++ does not guarantee that. It has to be initialized explicitly.	2021-02-16 21:35:14 +01:00
Pavel Emelyanov	575c992a35	test: Bring test_apply_monotonically_is_monotonic back to work The idea of the monotonicity checking test is: try to apply one random partition to another random one, sequentially failing allocations. Each time an allocation fails (with a bad_alloc exception), check that the exception guarantee is respected, then apply (!) the very same two partitions to each other again. At the end of the test we are sure that an exception may pop up at any point of the application and it will be safe. This idea is currently flawed. When verifying the guarantee, the test moves the 2nd partition and leaves it empty for the next loop iteration. So right on the 2nd attempt, applying the partitions becomes a no-op: it doesn't fail and no more exceptions arise. Fix by restoring both partitions at the end of each check. Broken since `74db08165d`. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210129153641.5449-1-xemul@scylladb.com>	2021-01-29 18:47:15 +01:00
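The corrected test loop can be modeled with toy types. Everything here is a hypothetical stand-in: a "partition" is a map from key to timestamp, applying keeps the higher timestamp, and `failure_injector` plays the role of `memory::with_allocation_failures()`. The essential part is that both inputs are restored from pristine copies on every iteration. Reusing the drained `src` would turn every attempt after the first into a no-op.

```cpp
#include <cassert>
#include <map>
#include <new>

// Hypothetical model: key -> cell timestamp; apply keeps the higher timestamp.
using partition = std::map<int, long>;

// Stand-in fault injector: the N-th simulated allocation throws.
struct failure_injector {
    long countdown = -1;   // -1 disables injection
    void allocate() {
        if (countdown == 0) {
            throw std::bad_alloc();
        }
        if (countdown > 0) {
            --countdown;
        }
    }
};

// Moves entries from src into dst one at a time; an entry leaves src only
// after it is merged into dst, so dst + src always holds the full result.
void apply_monotonically(partition& dst, partition& src, failure_injector& inj) {
    while (!src.empty()) {
        auto it = src.begin();
        inj.allocate();                          // may throw before any change
        auto [pos, inserted] = dst.try_emplace(it->first, it->second);
        if (!inserted && pos->second < it->second) {
            pos->second = it->second;
        }
        src.erase(it);
    }
}

partition merged(partition a, const partition& b) {
    for (const auto& [k, v] : b) {
        auto [pos, inserted] = a.try_emplace(k, v);
        if (!inserted && pos->second < v) {
            pos->second = v;
        }
    }
    return a;
}

// Fails the 0th, 1st, 2nd, ... allocation in turn; returns how many failure
// points were exercised before an attempt completed.
int check_monotonic(const partition& orig_dst, const partition& orig_src) {
    const partition expected = merged(orig_dst, orig_src);
    for (long fail_at = 0; ; ++fail_at) {
        partition dst = orig_dst;   // restore BOTH inputs each iteration
        partition src = orig_src;
        failure_injector inj{fail_at};
        try {
            apply_monotonically(dst, src, inj);
        } catch (const std::bad_alloc&) {
            assert(merged(dst, src) == expected);   // nothing lost on failure
            continue;
        }
        assert(dst == expected);
        return static_cast<int>(fail_at);
    }
}
```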
Avi Kivity	aec231ba2e	Merge "Unify query paths" from Botond " Currently we have two parallel query paths: * database::query() -> table::query() -> data_query() * mutation::query() The former is used by single partition queries, the latter by range scans, as mutation::query() is used to convert reconcilable_result to query::result (which means it is also used in single partition queries if they trigger read repair). This is a rather unfortunate situation, as we have two parallel implementations of the query code, which means they are prone to diverge, and in fact they already have -- more on that later. This patchset aims to remedy this situation by retiring `mutation::query()` and migrating users to an implementation based on the "standard" query path, in other words one using the same building blocks as the `database::query()` path. This means using `compact_mutation` for compacting and `query_result_builder` for result building. These components, however, were created to work with `flat_mutation_reader`, and introducing a reader into this pipeline would mean that we'd have to make all the related APIs asynchronous, which would cause an insane amount of churn. To avoid this, this patchset adds an API-compatible `consume()` method to `mutation`, which can accept a `compact_mutation` instance as-is. This allows an elegant and succinct reimplementation. So far so good. As mentioned above, the two implementations have diverged over time, or have been different from the start. The difference manifests when calculating digests, more precisely in which tombstones are included in the digest. The retired `mutation::query()` path incorporates only non-purgeable tombstones in the digest. The standard query path, however, incorporates all tombstones, even those that can be purged. After some scrutiny, however, this difference proved to be completely theoretical, as the code path where it would matter -- converting a reconcilable result to a query result -- passes the min timestamp as the query time to the compaction, so nothing is compacted and hence the difference has no chance to manifest. This patch-set was motivated by the desire to provide a single solution to #7434, instead of two, one for each path. Tests: unit(release:v2, debug:v2, dev:v3) " * 'unified-query-path/v3' of https://github.com/denesb/scylla: mutation: remove now unused query() and query_compacted() treewide: use query_mutations() instead of mutation::query() mutation_test: test_query_digest: ensure digest is produced consistently mutation_query: introduce query_mutation() mutation_query: to_data_query_result(): migrate to standard query code mutation_query: move to_data_query_result() to mutation_partition.cc mutation: add consume() flat_mutation_reader: move mutation consumer concepts to separate header mutation compactor: query compaction: ignore purgeable tombstones	2021-01-27 15:58:47 +02:00
Avi Kivity	f58151d191	test: mutation_test: fix initialization order bug with thread local storage test_cell_external_memory_usage uses with_allocator() to observe how some types allocate memory. However, compiler reordering (observed with clang 11 on aarch64) can move the initialization of the various thread-local CQL type objects into the with_allocator() scope; so any managed object allocated as part of this initialization also gets measured, and the test fails. The code movement is legal, as far as I can tell. Fix this by initializing the type object early; use an atomic_thread_fence as an optimization barrier so the compiler doesn't eliminate or move the early initialization. Closes #7951	2021-01-26 11:14:42 +02:00
Botond Dénes	1a3ee71b39	treewide: use query_mutations() instead of mutation::query() We want to retire the latter.	2021-01-22 15:36:37 +02:00
Botond Dénes	a9d726c7ba	mutation_test: test_query_digest: ensure digest is produced consistently Before we retire the mutation::query() code, expand the digest test to check that the new code replacing it produces identical digest on all possible equivalent mutations.	2021-01-22 15:27:48 +02:00
Michał Chojnowski	f317b3c39f	mutation_test: use the correct preferred_max_contiguous_allocation in measuring_allocator measuring_allocator is a wrapper around standard_allocator, but it exposed the default preferred_max_contiguous_allocation, not the one from standard_allocator. Thus managed_bytes allocated in those two allocators had fragments of different size, and their total memory usage differed, causing test_external_memory_usage to fail if standard_allocator::preferred_max_contiguous_allocation was changed from the default. Fix that.	2021-01-08 14:16:08 +01:00
Avi Kivity	732d83dc0e	test: mutation_test: check both ranges when comparing summaries A copy/paste error means we ignore the termination of one of the ranges. Change the comma expression to a disjunction to avoid the unused value warning from clang. The code is not perfect, since if the two ranges are not the same size we'll invoke undefined behavior, but it is no worse than before (where we ignored the comparison completely).	2020-12-07 16:47:52 +02:00
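The bug shape and a size-safe alternative can be sketched as follows. The loop in the comment is an ASSUMED reconstruction from the commit message, not the actual test code. The helper uses the four-iterator `std::equal` (C++14), which returns false on a length mismatch instead of reading past the end of the shorter range. This is a swapped-in alternative, not what the commit did: the commit only replaced the comma with `||`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed shape of the original bug:
//   for (; it1 != end1, it2 != end2; ++it1, ++it2)  // comma: only it2 is checked
// Replacing `,` with `||` checks both ends, but still dereferences past the
// end of the shorter range when the sizes differ.

// Hypothetical helper: compares two summaries element-wise, size-safely.
template <typename Range, typename Eq>
bool summaries_equal(const Range& a, const Range& b, Eq eq) {
    return std::equal(a.begin(), a.end(), b.begin(), b.end(), eq);
}
```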
Avi Kivity	b406af2556	test: mutation_test: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:28 +03:00
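Two common workarounds for compilers that reject capturing structured bindings in lambdas are shown below. The functions are hypothetical examples, not code from mutation_test. The first follows the commit's approach and avoids structured bindings for the captured variable. The second keeps the bindings but copies them into init-captures, which introduce new ordinary variables. Capturing a structured binding directly, as in `[&] { use(key); }`, is ill-formed in C++17.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Workaround 1 (the commit's approach): no structured binding, capture the
// pair reference and use .first/.second inside the lambda.
std::vector<std::string> describe(const std::map<int, std::string>& m) {
    std::vector<std::string> out;
    for (const auto& entry : m) {
        auto render = [&entry] { return std::to_string(entry.first) + "=" + entry.second; };
        out.push_back(render());
    }
    return out;
}

// Workaround 2: keep the structured binding, but bind init-captures to it;
// the lambda then captures the new variables k and v, not the bindings.
std::vector<std::string> describe_with_init_capture(const std::map<int, std::string>& m) {
    std::vector<std::string> out;
    for (const auto& [key, value] : m) {
        auto render = [&k = key, &v = value] { return std::to_string(k) + "=" + v; };
        out.push_back(render());
    }
    return out;
}
```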
Botond Dénes	6ca0464af5	mutation_fragment: add schema and permit We want to start tracking the memory consumption of mutation fragments. For this we need the schema and permit during construction, and on each modification, so the memory consumption can be recalculated and passed to the permit. In this patch we just add the new parameters and go through the insane churn of updating all call sites. They will be used in the next patch.	2020-09-28 11:27:23 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet; this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Avi Kivity	0134e2f436	mutation_test: adjust for column_family_test_config accepting an sstables_manager Acquire a test_env and extract an sstables_manager from that, passing it to column_family_test_config, in preparation for losing the default constructor of column_family_test_config.	2020-09-23 20:55:11 +03:00
Piotr Sarna	fe5cd846b5	test: extend mutation_test for NULL values The test is extended for another possible corner case: [1, NULL, 2] vs [1, 2, NULL] should have different digests. Also, a check for legacy behavior is added.	2020-09-10 13:16:44 +02:00
Paweł Dziepak	287d0371fa	tests/mutation: add reproducer for #4567	2020-09-10 13:16:44 +02:00
Rafael Ávila de Espíndola	74db08165d	tests: Convert to using memory::with_allocation_failures Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200805155143.122396-1-espindola@scylladb.com>	2020-08-10 18:37:42 +03:00
Avi Kivity	257c17a87a	Merge "Don't depend on seastar::make_(lw_)?shared idiosyncrasies" from Rafael " While working on another patch I was getting odd compiler errors saying that a call to ::make_shared was ambiguous. The reason was that seastar has both: template <typename T, typename... A> shared_ptr<T> make_shared(A&&... a); template <typename T> shared_ptr<T> make_shared(T&& a); The second variant doesn't exist in std::make_shared. This series drops the dependency in scylla, so that a future change can make seastar::make_shared a bit more like std::make_shared. " * 'espindola/make_shared' of https://github.com/espindola/scylla: Everywhere: Explicitly instantiate make_lw_shared Everywhere: Add a make_shared_schema helper Everywhere: Explicitly instantiate make_shared cql3: Add a create_multi_column_relation helper main: Return a shared_ptr from defer_verbose_shutdown	2020-08-02 19:51:24 +03:00
Botond Dénes	6660a5df51	result_memory_accounter: remove default constructor If somebody wants to bypass proper memory accounting they should at the very least be forced to consider if that is indeed wise and think a second about the limit they want to apply.	2020-07-28 18:00:29 +03:00
Rafael Ávila de Espíndola	efeaded427	Everywhere: Add a make_shared_schema helper This replaces a lot of make_lw_shared(schema(...)) with make_shared_schema(...). This makes it easier to drop a dependency on the differences between seastar::make_shared and std::make_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-21 10:33:49 -07:00


64 Commits