scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Avi Kivity	ae3a360725	database: Move database, keyspace, table classes to replica/ directory The database, keyspace, and table classes represent the replica-only part of the objects after which they are named. Reading from a table doesn't give you the full data, just the replica's view, and it is not consistent since reconciliation is applied on the coordinator. As a first step in acknowledging this, move the related files to a replica/ subdirectory.	2022-01-06 17:07:30 +02:00
Avi Kivity	d01e1a774b	Merge 'Build performance: do not include the entire <seastar/net/ip.hh>' from Nadav Har'El The header file <seastar/net/ip.hh> is a large collection of unrelated stuff, and according to ClangBuildAnalyzer, takes 2 seconds to compile for every source file that included it - and unfortunately virtually all Scylla source files included it - through either "types.hh" or "gms/inet_address.hh". That's 2300 CPU seconds wasted. In this two-patch series we completely eliminate the inclusion of <seastar/net/ip.hh> from Scylla. We still need the ipv4_address, ipv6_address types (e.g., gms/inet_address.hh uses it to hold a node's IP address) so those were split (in a Seastar patch that is already in) from ip.hh into separate small header files that we can include. This patch reduces the entire build time (of build/dev/scylla) by 4% - reducing almost 10 sCPU minutes (!) from the build. Closes #9875 github.com:scylladb/scylla: build performance: do not include <seastar/net/ip.hh> build performance: speed up inclusion of <gm/inet_address.hh>	2022-01-05 17:55:07 +02:00
Nadav Har'El	6012f6f2b6	build performance: do not include <seastar/net/ip.hh> In a previous patch, we noticed that the header file <gm/inet_address.hh>, which is included, directly or indirectly, by most source files, includes <seastar/net/ip.hh> which is very slow to compile, and replaced it by the much faster-to-include <seastar/net/ipv[46]_address.hh>. However, we also included <seastar/net/ip.hh> in types.hh - and that too is included by almost every file, so the actual saving from the above patch was minimal. So in this patch we replace this include too. After this patch Scylla does not include <seastar/net/ip.hh> at all. According to ClangBuildAnalyzer, this reduces the average time to include types.hh (multiply this by 312 times!) from 4 seconds to 1.8 seconds, and reduces total build time (dev mode) by about 3%. Some of the source files were now missing some include directives, that were previously included in ip.hh - so we need to add those explicitly. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-01-05 17:29:21 +02:00
Avi Kivity	53a83c4b1e	Merge "flat_mutation_reader: convert flat_mutation_reader_from_mutations to v2" from Botond " Like flat_mutation_reader_from_fragments, this reader is also heavily used by tests to compose a specific workload for readers above it. So instead of converting it, we add a v2 variant and leave the v1 variant in place. The v2 variant was written from scratch to have built-in support for reading in reverse. It is built-on `mutation::consume()` to avoid duplicating the logic of consuming the contents of the mutation. To avoid stalls, `mutation::consume()` gets support for pausing and resuming consuming a mutation. Tests: unit(dev) " * 'flat_mutation_reader_from_mutations_v2/v2' of https://github.com/denesb/scylla: flat_mutation_reader: convert make_flat_mutation_reader_from_mutation() v2 flat_mutation_reader: extract mutation slicing into a function mutation: consume(): make it pausable/resumable mutation: consume(): restructure clustering iterator initialization test/boost/mutation_test: add rebuild test for mutation::consume()	2022-01-05 10:23:17 +02:00
Botond Dénes	62d82b8b0e	flat_mutation_reader: convert make_flat_mutation_reader_from_mutation() v2 Since this reader is also heavily used by tests to compose a specific workload for readers above it, we just add a v2 variant, instead of changing the existing v1 one. The v2 variant was written from scratch to have built-in support for reading in reverse. It is built-on `mutation::consume()` to avoid duplicating the logic of consuming the contents of the mutation. A v2 native unit test is also added.	2022-01-05 09:06:16 +02:00
Asias He	a8ad385ecd	repair: Get rid of the gc_grace_seconds The gc_grace_seconds is a very fragile and broken design inherited from Cassandra. Deleted data can be resurrected if cluster wide repair is not performed within gc_grace_seconds. This design pushes the job of keeping the database consistent to the user. In practice, it is very hard to guarantee repair is performed within gc_grace_seconds all the time. For example, repair workload has the lowest priority in the system which can be slowed down by the higher priority workload, so that there is no guarantee when a repair can finish. A gc_grace_seconds value that used to work might not work after data volume grows in a cluster. Users might want to avoid running repair during a specific period where latency is the top priority for their business. To solve this problem, an automatic mechanism to protect against data resurrection is proposed and implemented. The main idea is to remove the tombstone only after the range that covers the tombstone is repaired. In this patch, a new table option tombstone_gc is added. The option is used to configure tombstone gc mode. For example: 1) GC a tombstone after gc_grace_seconds cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ; This is the default mode. If no tombstone_gc option is specified by the user, the old gc_grace_seconds based gc will be used. 2) Never GC a tombstone cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'}; 3) GC a tombstone immediately cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'}; 4) GC a tombstone after repair cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'}; In addition to the 'mode' option, another option 'propagation_delay_in_seconds' is added. It defines the max time a write could possibly delay before it eventually arrives at a node. A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc option can only be used after the whole cluster supports the new feature. A mixed cluster works with no problem. Tests: compaction_test.py, ninja test Fixes #3560 [avi: resolve conflicts vs data_dictionary]	2022-01-04 19:48:14 +02:00
Botond Dénes	5e547dcc8a	test/boost/mutation_test: add rebuild test for mutation::consume() In the next patches we will refactor mutation::consume(). Before doing that add another test, which rebuilds the consumed mutation, comparing it with the original.	2022-01-04 11:43:46 +02:00
Avi Kivity	9e74556413	Merge 'Support reverse reads in the row cache natively' from Tomasz Grabiec This change makes row cache support reverse reads natively so that reversing wrappers are not needed when reading from cache and thus the read can be executed efficiently, with similar cost as the forward-order read. The database is serving reverse reads from cache by default after this. Before, it was bypassing cache by default after `703aed3277`. Refs: #1413 Tests: - unit [dev] - manual query with build/dev/scylla and cache tracing on Closes #9454 * github.com:scylladb/scylla: tests: row_cache: Extend test_concurrent_reads_and_eviction to run reverse queries row_cache: partition_snapshot_row_cursor: Print more details about the current version vector row_cache: Improve trace-level logging config: Use cache for reversed reads by default config: Adjust reversed_reads_auto_bypass_cache description row_cache: Support reverse reads natively mvcc: partition_snapshot: Support slicing range tombstones in reverse test: flat_mutation_reader_assertions: Consume expected range tombstones before end_of_partition row_cache: Log produced range tombstones test: Make produces_range_tombstone() report ck_ranges tests: lib: random_mutation_generator: Extract make_random_range_tombstone() partition_snapshot_row_cursor: Support reverse iteration utils: immutable-collection: Make movable intrusive_btree: Make default-initialized iterator cast to false	2021-12-29 16:53:25 +02:00
Botond Dénes	aba68c8f83	Merge "reader_concurrency_semaphore: convert to flat_mutation_reader_v2" from Michael " The second patch in this series is a mechanical conversion of reader_concurrency_semaphore to flat_mutation_reader_v2, and caller updates. The first patch is needed to pass the test suite, since without it a real reader version conversion would happen on every entry to and exit from reader_concurrency_semaphore, which is stressful (for example: mutation_reader_test.test_multishard_streaming_reader reaches 8191 conversions for a couple of readers, which somehow causes it to catch SIGSEGV in diverse and seemingly-random places). Note that in a real workload it is unreasonable to expect readers being parked in a reader_concurrency_semaphore to be pristine, so short-circuiting their version conversions will be impossible and this workaround will not really help. " * tag 'rcs-v2-v4' of https://github.com/cmm/scylla: reader_concurrency_semaphore: convert to flat_mutation_reader_v2 short-circuit flat mutation reader upgrades and downgrades	2021-12-22 15:08:31 +02:00
Michael Livshin	a1b8ba23d2	reader_concurrency_semaphore: convert to flat_mutation_reader_v2 Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2021-12-21 11:26:17 +02:00
Raphael S. Carvalho	64ec1c6ec6	table: Make sure major compaction doesn't miss data in memtable Make sure that major will compact data in all sstables and memtable, as tombstones sitting in memtable could shadow data in sstables. For example, a tombstone in memtable deleting a large partition could be missed in major, so space wouldn't be saved as expected. Additionally, write amplification is reduced as data in memtable won't have to travel through tiers once flushed. Fixes #9514. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211217160055.96693-2-raphaelsc@scylladb.com>	2021-12-21 07:21:34 +02:00
Botond Dénes	55bb70a878	Merge "Make sure TWCS per-window major includes all files" from Raphael " TWCS performs STCS on a window as long as it's the most recent one. From there on, TWCS will compact all files in the past window into a single file. With some moderate write load, it could happen that there's still some compaction activity in that past window, meaning that per-window major may miss some files being currently compacted. As a result, a past window may contain more than 1 file after all compaction activity is done on its behalf, which may increase read amplification. To avoid that, TWCS will now make sure that per-window major is serialized, to make sure no files are missed. Fixes #9553. tests: unit(dev). " * 'fix_twcs_per_window_major_v3' of https://github.com/raphaelsc/scylla: TWCS: Make sure major on past window is done on all its sstables TWCS: remove needless param for STCS options TWCS: kill unused param in newest_bucket() compaction: Implement strategy control and wire it compaction: Add interface to control strategy behavior.	2021-12-20 17:12:50 +02:00
Avi Kivity	e772fcbd57	Merge "Convert combined reader to v2" from Botond " Users are adjusted by sprinkling `upgrade_to_v2()` and `downgrade_to_v1()` where necessary (or removing any of these where possible). No attempt was made to optimize and reduce the amount of v1<->v2 conversions. This is left for follow-up patches to keep this set small. The combined reader is composed of 3 layers: 1. fragment producer - pop fragments from readers, return them in batches (each fragment in a batch having the same type and pos). 2. fragment merger - merge fragment batches into single fragments 3. reader implementation glue-code Converting layers (1) and (3) was mostly mechanical. The logic of merging range tombstone changes is implemented at layer (2), so the two different producer (layer 1) implementations we have share this logic. Tests: unit(dev) " * 'combined-reader-v2/v4' of https://github.com/denesb/scylla: test/boost/mutation_reader_test: add test_combined_reader_range_tombstone_change_merging mutation_reader: convert make_clustering_combined_reader() to v2 mutation_reader: convert position_reader_queue to v2 mutation_reader: convert make_combined_reader() overloads to v2 mutation_reader: combined_reader: convert reader_selector to v2 mutation_reader: convert combined reader to v2 mutation_reader: combined_reader: attach stream_id to mutation_fragments flat_mutation_reader_v2: add v2 version of empty reader test/boost/mutation_reader_test: clustering_combined_reader_mutation_source_test: fix end bound calculation	2021-12-20 14:01:03 +02:00
Botond Dénes	7f331cee01	test/boost/mutation_reader_test: add test_combined_reader_range_tombstone_change_merging Stressing the range tombstone change merging logic.	2021-12-20 09:29:05 +02:00
Botond Dénes	e1bbc4a480	mutation_reader: convert make_clustering_combined_reader() to v2 Just sprinkle the right amount of downgrade_to_v1() and upgrade_to_v2() at call sites; no attempt at optimization was made.	2021-12-20 09:29:05 +02:00
Botond Dénes	2364144b19	mutation_reader: convert position_reader_queue to v2 By removing the converting (v1->v2) constructor of `reader_and_upper_bound` and adjusting its users.	2021-12-20 09:29:05 +02:00
Botond Dénes	aeddcf50a1	mutation_reader: convert make_combined_reader() overloads to v2 Just sprinkle the right amount of downgrade_to_v1() and upgrade_to_v2() at call sites; no attempt at optimization was made.	2021-12-20 09:29:05 +02:00
Botond Dénes	1554b94b78	mutation_reader: combined_reader: convert reader_selector to v2	2021-12-20 09:29:05 +02:00
Tomasz Grabiec	1c80d7fec4	tests: row_cache: Extend test_concurrent_reads_and_eviction to run reverse queries	2021-12-19 22:43:52 +01:00
Tomasz Grabiec	63351483f0	row_cache: Support reverse reads natively Some implementation notes below. When iterating in reverse, _last_row is after the current entry (_next_row) in table schema order, not before like in the forward mode. Since there is no dummy row before all entries, reverse iteration must now be prepared for the fact that advancing _next_row may land not pointing at any row. The partition_snapshot_row_cursor maintains continuity() correctly in this case, and positions the cursor before all rows, so most of the code works unchanged. The only exception is in move_to_next_entry(), which now cannot assume that failure to advance to an entry means it can end a read. maybe_drop_last_entry() is not implemented in reverse mode, which may expose reverse-only workload to the problem of accumulating dummy entries. ensure_population_lower_bound() was not updating _last_row after inserting the entry in latest version. This was not a problem for forward reads because they do not modify the row in the partition snapshot represented by _last_row. They only need the row to be there in the latest version after the call. It's different for reversed reads, which change the continuity of the entry represented by _last_row, hence _last_row needs to have the iterator updated to point to the entry from the latest version, otherwise we'd set the continuity of the previous version entry which would corrupt the continuity.	2021-12-19 22:41:35 +01:00
Tomasz Grabiec	d0c367f44f	mvcc: partition_snapshot: Support slicing range tombstones in reverse	2021-12-19 22:41:35 +01:00
Tomasz Grabiec	757fc1275f	partition_snapshot_row_cursor: Support reverse iteration	2021-12-19 22:41:35 +01:00
Raphael S. Carvalho	f508f54f3e	table: move min_compaction_threshold() and compaction_enforce_min_threshold() into table_state Compaction specific methods can be implemented in table_state only, as they aren't needed elsewhere. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211214191822.164223-1-raphaelsc@scylladb.com>	2021-12-17 10:00:31 +02:00
Botond Dénes	f15f4952be	test/boost/mutation_reader_test: clustering_combined_reader_mutation_source_test: fix end bound calculation Currently the test assumes that fragments represent weakly monotonic upper bounds and therefore unconditionally overwrites the upper-bound on receiving each fragment. Range tombstones however violate this as a range tombstone with a smaller position (lower bound) may have a higher upper bound than some or all fragments that follow it in the stream. This causes test failures after converting the combined reader to v2, but not before, no idea why.	2021-12-16 14:57:49 +02:00
Pavel Emelyanov	b2a62d2b59	Merge 'db: range_tombstone_list: Deoverlap empty range tombstones' from Tomasz Grabiec Appending an empty range adjacent to an existing range tombstone would not deoverlap (by dropping the empty range tombstone) resulting in different (non-canonical) result depending on the order of appending. Suppose that range tombstone [a, b] covers range tombstone [x, x), and [a, x) and [x, b) are range tombstones which correspond to [a, b] split around position x. Appending [a, x) then [x, b) then [x, x) would give [a, b) Appending [a, x) then [x, x) then [x, b) would give [a, x), [x, x), [x, b) The fix is to drop empty range tombstones in range_tombstone_list so that the result is canonical. Fixes #9661 Closes #9764 * github.com:scylladb/scylla: range_tombstone_list: Deoverlap adjacent empty ranges range_tombstone_list: Convert to work in terms of position_in_partition	2021-12-16 10:00:40 +03:00
Avi Kivity	d768e9fac5	cql3, related: switch to data_dictionary Stop using database (and including database.hh) for schema related purposes and use data_dictionary instead. data_dictionary::database::real_database() is called from several places, for these reasons: - calling yet-to-be-converted code - callers with a legitimate need to access data (e.g. system_keyspace) but with the ::database accessor removed from query_processor. We'll need to find another way to supply system_keyspace with data access. - to gain access to the wasm engine for testing whether user-defined functions compile. We'll have to find another way to do this as well. The change is a straightforward replacement. One case in modification_statement had to change a capture, but everything else was just a search-and-replace. Some files that lost "database.hh" gained "mutation.hh", which they previously had access to through "database.hh".	2021-12-15 13:54:23 +02:00
Avi Kivity	3ac622bdd8	Merge "Add v2 versions of make_forwardable() and make_flat_mutation_reader_from_fragments()" from Botond " These two readers are crucial for writing tests for any composable reader so we need v2 versions of them before we can convert and test the combined reader (for example). As these two readers are often used in situations where the payload they deliver is specially crafted for the test at hand, we keep their v1 versions too to avoid conversion meddling with the tests. Tests: unit(dev) " * 'forwarding-and-fragment-reader-v2/v1' of https://github.com/denesb/scylla: flat_mutation_reader_v2: add make_flat_mutation_reader_from_fragments() test/lib/mutation_source_test: don't force v1 reader in reverse run mutation_source: add native_version() getter flat_mutation_reader_v2: add make_forwardable() position_in_partition: add after_key(position_in_partition_view) flat_mutation_reader: make_forwardable(): fix indentation flat_mutation_reader: make_forwardable(): coroutinize reader	2021-12-14 20:43:09 +02:00
Benny Halevy	32d61a3d09	test: sstable_directory_test_table_lock_works: verify that truncate is blocked on the table lock The test in its current form is invalid, as database::remove removes the table's name from its listing as well as from the keyspace metadata, so it won't be found after that. That said, database::drop_column_family then proceeds to truncate and stop the table, after calling await_pending_ops, and the latter should indeed block on the lock taken by the test. This change modifies the test to create some sstables in the table's directory before starting the sstable_directory. Then, when executing "drop table" in the background, wait until the table is not found by db.find_column_family. That would fail the test before this change. See https://jenkins.scylladb.com/job/scylla-enterprise/job/next/1442/artifact/testlog/x86_64_debug/sstable_directory_test.sstable_directory_test_table_lock_works.4720.log ``` INFO 2021-12-13 14:00:17,298 [shard 0] schema_tables - Dropping ks.cf id=00487bc0-5c1d-11ec-9e3b-a44f824027ae version=b10c4994-31c7-3f5a-9591-7fedb0273c82 test/boost/sstable_directory_test.cc(453): fatal error: in "sstable_directory_test_table_lock_works": unexpected exception thrown by table_ok.get() ``` At this point, the test verifies again that the sstables are still on disk (and no truncate happened), and only after drop completed, the table should not exist on disk. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211214104407.2225080-1-bhalevy@scylladb.com>	2021-12-14 14:26:17 +02:00
Tomasz Grabiec	b228ddabb7	Merge "Move schema altering statement to raft" from Gleb The series is on top of "wire up schema raft state machine". It will apply without, but will not work obviously (when raft is disabled it does nothing anyway). This series does not provide any linearisability just yet though. It only uses raft as a means to distribute schema mutations. To achieve linearisability more work is needed. We need to at least make sure that schema mutations use monotonically increasing timestamps and, since schema altering statements are RMW, no modifications to schema were done between schema mutation creation and application. If there were, the operation needs to be restarted. * scylla-dev/gleb/raft-schema-v5: (59 commits) cql3: cleanup mutation creation code in ALTER TYPE cql3: use migration_manager::schema_read_barrier() before accessing a schema in altering statements cql3: bounce schema altering statement to shard 0 migration_manager: add is_raft_enabled() to check if raft is enabled on a cluster migration_manager: add schema_read_barrier() function migration_manager: make announce() raft aware migration_manager: co-routinize announce() function migration_manager: pass raft_gr to the migration manager migration_manager: drop view_ptr array from announce_column_family_update() mm: drop unused announce_ methods cql3: drop schema_altering_statement::announce_migration() cql3: drop has_prepare_schema_mutations() from schema altering statement cql3: drop announce_migration() usage from schema_altering_statement cql3: move DROP AGGREGATE statement to prepare_schema_mutations() api migration_manager: add prepare_aggregate_drop_announcement() function cql3: move DROP FUNCTION statement to prepare_schema_mutations() api migration_manager: add prepare_function_drop_announcement() function cql3: move CREATE AGGREGATE statement to prepare_schema_mutations() api migration_manager: add prepare_new_aggregate_announcement() function cql3: move CREATE FUNCTION statement to prepare_schema_mutations() api ...	2021-12-14 11:05:32 +01:00
Tomasz Grabiec	78a6474982	range_tombstone_list: Deoverlap adjacent empty ranges Appending an empty range adjacent to an existing range tombstone would not deoverlap (by dropping the empty range tombstone) resulting in different (non-canonical) result depending on the order of appending. Suppose that [a, b] covers [x, x) Appending [a, x) then [x, b) then [x, x) would give [a, b) Appending [a, x) then [x, x) then [x, b) would give [a, x), [x, x), [x, b) Fix by dropping empty range tombstones.	2021-12-13 21:31:36 +01:00
Raphael S. Carvalho	8eace8fc49	TWCS: Make sure major on past window is done on all its sstables Once current window is sealed, TWCS is supposed to compact all its sstables into one. If there's ongoing compaction, it can happen that sstables are missed and therefore past windows will contain more than one sstable. Additionally, it could happen that major doesn't happen at all if under heavy load. All these problems are fixed by serializing major on past window and also postponing it if manager refuses to run the job now. Fixes #9553. Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-13 16:10:43 -03:00
Raphael S. Carvalho	2dc890d8e6	TWCS: remove needless param for STCS options STCS option can be retrieved from class member, as newest_bucket() is no longer a static function. let's get rid of it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-13 16:05:40 -03:00
Raphael S. Carvalho	41a5736aaf	TWCS: kill unused param in newest_bucket() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-13 16:05:36 -03:00
Raphael S. Carvalho	49f40c8791	compaction: Implement strategy control and wire it This implements strategy control interface for both manager and tests, and wire it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-13 16:05:23 -03:00
Avi Kivity	e44a28dce4	Merge "compaction: Allow data from different buckets (e.g. windows) to be compacted together" from Raphael " Today, data from different buckets (e.g. windows) cannot be compacted together because mutation compaction happens inside each consumer, where each consumer is done on behalf of a particular bucket. To solve this problem, the mutation compaction process is being moved from consumer into producer, such that the interposer consumer, which is responsible for segregation, will be fed with compacted data and forward it into the owner bucket. Fixes #9662. tests: unit(debug). " * 'compact_across_buckets_v2' of github.com:raphaelsc/scylla: tests: sstable_compaction_test: add test_twcs_compaction_across_buckets compaction: Move mutation compaction into producer for TWCS compaction: make enable_garbage_collected_sstable_writer() more precise	2021-12-12 15:07:15 +02:00
Gleb Natapov	38e1f85959	migration_manager: drop view_ptr array from announce_column_family_update() No users pass it any longer.	2021-12-11 12:31:07 +02:00
Raphael S. Carvalho	7c90088152	tests: sstable_compaction_test: add test_twcs_compaction_across_buckets Verify that TWCS compaction can now compact data across time windows, like a tombstone which will cause all shadowed data to be purged once they're all compacted together. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-10 17:14:45 -03:00
Botond Dénes	39426b1aa3	flat_mutation_reader_v2: add make_flat_mutation_reader_from_fragments() The main difference compared to v1 (apart from having _v2 suffix at relevant places) is how slicing and reversing works. The v2 variant has native reverse support built-in because the reversing reader is not something we want to convert to v2. A native v2 mutation-source test is also added.	2021-12-10 15:48:49 +02:00
Pavel Emelyanov	5a405a4273	tests: Make B-tree tests use unique-ptrs for insertion The non-smart-pointers overloads are going away, prepare tests for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-12-10 12:35:12 +03:00
Benny Halevy	cca956bce2	database_test: snapshot_with_quarantine_works: get the list of sstables from table object Rather than the filesystem, to reduce flakiness. Also, add some test logging. Fixes #9763 Test: database_test(debug, release) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211209175144.854896-1-bhalevy@scylladb.com>	2021-12-09 21:01:25 +02:00
Benny Halevy	8728fd480d	database_test: do_with_some_data: get the return func future do_with_some_data runs a function in a seastar thread. It needs to get() the future func returns rather than propagating it. This solves a secondary failure due to abandoned future when the test case fails, as seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/4254/artifact/testlog/x86_64_debug/database_test.snapshot_with_quarantine_works.381.log ``` test/boost/database_test.cc(903): fatal error: in "snapshot_with_quarantine_works": critical check expected.empty() has failed WARN 2021-12-08 00:35:16,300 [shard 0] seastar - Exceptional future ignored: boost::execution_aborted, backtrace: 0x10935e50 0x16ff2d8d 0x16ff2a4d 0x16ff5033 0x16ff5ec2 0x162d4ce9 0x10a2bdb5 0x10a2bd24 0x10a54ca4 0x10a27cf3 0x10a22151 0x10a67c9d 0x10a67a78 0x163ac37e 0x163b29e9 0x163b7690 0x163b51c1 0x17c212df 0x17c1f097 0x17bf8b4c 0x17bf83f2 0x17bf82a2 0x17bf7d52 0x10f8bf5a 0x166db84b /lib64/libpthread.so.0+0x9298 /lib64/libc.so.6+0x100352 ... *** 1 abandoned failed future(s) detected Failing the test because fail was requested by --fail-on-abandoned-failed-futures ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211209174512.851945-1-bhalevy@scylladb.com>	2021-12-09 21:11:56 +03:00
Mikołaj Sielużycki	504efe0607	table: Prevent resurrecting data from memtable on compaction Mutations are not guaranteed to come in the order of their timestamps. If there is an expired tombstone in the sstable and a repair inserts old data into memtable, the compaction would not consider memtable data and purge the tombstone leading to data resurrection. The solution is to disallow purging tombstones newer than min memtable timestamp.	2021-12-09 13:22:14 +01:00
Mikołaj Sielużycki	7ce0ca040d	table: Add min_memtable_timestamp function to table	2021-12-09 13:14:38 +01:00
Botond Dénes	2e5440bdf2	Merge 'Convert compaction to flat_mutation_reader_v2' from Raphael Carvalho Since sstable reader was already converted to flat_mutation_reader_v2, compaction layer can naturally be converted too. There are many dependencies that use v1. Those strictly needed like readers in sstable set, which links compaction to sstable reader, were converted to v2 in this series. For those that aren't essential we're relying on V1<-->V2 adaptors, and conversion work on them will be postponed. Those being postponed are: scrub specialized reader (needs a validator for mutation_fragment_v2), interposer consumer, combined reader which is used by incremental selector. incremental selector itself was converted to v2. tests: unit(debug). Closes #9725 * github.com:scylladb/scylla: compaction: update compaction::make_sstable_reader() to flat_mutation_reader_v2 sstable_set: update make_crawling_reader() to flat_mutation_reader_v2 sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2 sstable_set: update make_local_shard_sstable_reader() to flat_mutation_reader_v2 sstable_set: update incremental_reader_selector to flat_mutation_reader_v2	2021-12-07 15:17:38 +02:00
Raphael S. Carvalho	aebbe68239	sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-07 09:37:53 -03:00
Avi Kivity	79bcdc104e	Merge "Fix stateful multi-range scans" from Botond " Currently stateful (readers being saved and resumed on page boundaries) multi-range scans are broken in multiple ways. Trying to use them can result in anything from use-after-free (#6716) or getting corrupt data (#9718). Luckily no-one is doing such queries today, but this started to change recently as code such as Alternator TTL and distributed aggregate reads started using this. This series fixes both problems and adds a unit test too exercising this previously completely unused code-path. Fixes: #6716 Fixes: #9718 Tests: unit(dev, release, debug) " * 'fix-stateful-multi-range-scans/v1' of https://github.com/denesb/scylla: test/boost/multishard_mutation_query_test: add multi-range test test/boost/multishard_mutation_query_test: add multi-range support multishard_mutation_query: don't drop data during stateful multi-range reads multishard_combining_reader: reader_lifecycle_policy: allow saving read range on fast-forward	2021-12-07 12:19:56 +02:00
Avi Kivity	395b30bca8	mutation_reader: update make_filtering_reader() to flat_mutation_reader_v2 As part of the drive to move over to flat_mutation_reader_v2, update make_filtering_reader(). Since it doesn't examine range tombstones (only the partition_start, to filter the key) the entire patch is just glue code upgrading and downgrading users in the pipeline (or removing a conversion, in one case). Test: unit (dev) Closes #9723	2021-12-07 12:18:07 +02:00
Benny Halevy	9ed72cac95	test: sstable_compaction_test: add sstable_scrub_quarantine_mode_test For each quarantine mode: Validate sstables to quarantine one of them and then scrub with the given quarantine mode and verify from the output whether the quarantined sstable was scrubbed or not. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:29:58 +02:00
Benny Halevy	60ff28932c	compaction_manager: perform_sstable_scrub: get the whole compaction_type_options::scrub So we can pass additional options on top of the scrub mode. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:21:37 +02:00
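
The canonicalization described in commit 78a6474982 (dropping empty range tombstones so the result no longer depends on append order) can be illustrated with a toy model. This is a minimal Python sketch, not Scylla's range_tombstone_list: it assumes integer positions, half-open ranges, a single tombstone timestamp, and appends arriving in non-decreasing start order. The names `append_range` and `build` are hypothetical.

```python
from typing import Iterable, List, Tuple

# A half-open range [start, end) over integer positions (an assumption of this sketch).
Range = Tuple[int, int]


def append_range(ranges: List[Range], start: int, end: int) -> None:
    """Append [start, end) to a sorted, non-overlapping list, keeping it canonical."""
    if start >= end:
        # An empty range such as [x, x) covers nothing, so it is dropped.
        return
    if ranges and ranges[-1][1] == start:
        # Adjacent to the previous range: merge the two into one.
        ranges[-1] = (ranges[-1][0], end)
    else:
        ranges.append((start, end))


def build(appends: Iterable[Range]) -> List[Range]:
    """Apply a sequence of appends to an initially empty list."""
    out: List[Range] = []
    for start, end in appends:
        append_range(out, start, end)
    return out


a, x, b = 0, 5, 10
order_1 = build([(a, x), (x, b), (x, x)])  # [a, x) then [x, b) then [x, x)
order_2 = build([(a, x), (x, x), (x, b)])  # [a, x) then [x, x) then [x, b)
print(order_1, order_2)
```

Because the empty range is discarded before it can sit between two adjacent ranges, both append orders collapse to the same single range in this model.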


1411 Commits