scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	a7cdd846da	compaction: Prevent tons of compaction of fully expired sstable from happening in parallel Compaction manager can start tons of compaction of fully expired sstable in parallel, which may consume a significant amount of resources. This problem is caused by weight being released too early in compaction, after data is all compacted but before table is called to update its state, like replacing sstables and so on. Fully expired sstables aren't actually compacted, so the following can happen: - compaction 1 starts for expired sst A with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 2 starts for expired sst B with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 3 starts for expired sst C with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 1 is done updating table state, so it finally completes and releases all the resources. - compaction 2 is done updating table state, so it finally completes and releases all the resources. - compaction 3 is done updating table state, so it finally completes and releases all the resources. This happens because, with expired sstable, compaction will release weight faster than it will update table state, as there's nothing to be compacted. With my reproducer, it's very easy to reach 50 parallel compactions on a single shard, but that number can be easily worse depending on the amount of sstables with fully expired data, across all tables. This high parallelism can happen only with a couple of tables, if there are many time windows with expired data, as they can be compacted in parallel. Prior to `55a8b6e3c9`, weight was released earlier in compaction, before last sstable was sealed, but right now, there's no need to release weight earlier. 
Weight can be released in a much simpler way, after the compaction is actually done. So such compactions will be serialized from now on. Fixes #8710. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210527165443.165198-1-raphaelsc@scylladb.com> [avi: drop now unneeded storage_service_for_tests]	2021-05-30 23:22:51 +03:00
Avi Kivity	0acf5bfca6	build: enable -Wreturn-std-move Clang warns when "return std::move(x)" is needed to elide a copy, but the call to std::move() is missing. We disabled the warning during the migration to clang. This patch re-enables the warning and fixes the places it points out, usually by adding std::move() and in one place by converting the returned variable from a reference to a local, so normal copy elision can take place. Closes #8739	2021-05-27 21:16:26 +03:00
Raphael S. Carvalho	ee39eb9042	sstables: Fix slow off-strategy compaction on STCS tables Off-strategy compaction on a table using STCS is slow because of the needless write amplification of 2. That's because STCS reshape isn't taking advantage of the fact that sstables produced by a repair-based operation are disjoint. So the ~256 input sstables were compacted (in batches of 32) into larger sstables, which in turn were compacted into even larger ones. That write amp is very significant on large data sets, making the whole operation 2x slower. Fixes #8449. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210524213426.196407-1-raphaelsc@scylladb.com>	2021-05-25 11:24:42 +03:00
Benny Halevy	56d3cb514a	sstables: parse statistics: improve error handling Properly return malformed_sstable_exception if the statistics file fails to parse. Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210524113808.973951-1-bhalevy@scylladb.com>	2021-05-24 15:12:48 +03:00
Avi Kivity	50f3bbc359	Merge "treewide: various header cleanups" from Pavel S " The patch set is an assorted collection of header cleanups, e.g: * Reduce number of boost includes in header files * Switch to forward declarations in some places A quick measurement was performed to see if these changes provide any improvement in build times (ccache cleaned and existing build products wiped out). The results are posted below (`/usr/bin/time -v ninja dev-build`) for 24 cores/48 threads CPU setup (AMD Threadripper 2970WX). Before: Command being timed: "ninja dev-build" User time (seconds): 28262.47 System time (seconds): 824.85 Percent of CPU this job got: 3979% Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 2129888 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 1402838 Minor (reclaiming a frame) page faults: 124265412 Voluntary context switches: 1879279 Involuntary context switches: 1159999 Swaps: 0 File system inputs: 0 File system outputs: 11806272 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 After: Command being timed: "ninja dev-build" User time (seconds): 26270.81 System time (seconds): 767.01 Percent of CPU this job got: 3905% Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 2117608 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 1400189 Minor (reclaiming a frame) page faults: 117570335 Voluntary context switches: 1870631 Involuntary context switches: 1154535 Swaps: 0 File system inputs: 0 File system outputs: 11777280 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page 
size (bytes): 4096 Exit status: 0 The observed improvement is about 5% of total wall clock time for `dev-build` target. Also, all commits make sure that headers stay self-sufficient, which would help to further improve the situation in the future. " * 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla: transport: remove extraneous `qos/service_level_controller` includes from headers treewide: remove evidently unneded storage_proxy includes from some places service_level_controller: remove extraneous `service/storage_service.hh` include sstables/writer: remove extraneous `service/storage_service.hh` include treewide: remove extraneous database.hh includes from headers treewide: reduce boost headers usage in scylla header files cql3: remove extraneous includes from some headers cql3: various forward declaration cleanups utils: add missing <limits> header in `extremum_tracking.hh`	2021-05-24 14:24:20 +03:00
Avi Kivity	047b3f85d3	sstables: mx reader: drop unused _column_value_length field	2021-05-21 21:02:55 +03:00
Avi Kivity	32d9ba2fbb	sstables: index_consumer: drop unused max_quantity field	2021-05-21 21:02:16 +03:00
Avi Kivity	cb587aaa5c	compaction: resharding_compaction: drop unused _shard field	2021-05-21 21:01:54 +03:00
Avi Kivity	f62469b7c5	compaction: compaction_read_monitor: drop unused _compaction_manager field A constructor that now takes one argument is made explicit.	2021-05-21 21:00:47 +03:00
Pavel Solodovnikov	d7a77a993f	sstables/writer: remove extraneous `service/storage_service.hh` include Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 02:03:24 +03:00
Pavel Solodovnikov	c3a7b55507	treewide: remove extraneous database.hh includes from headers Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 01:59:14 +03:00
Pavel Solodovnikov	fff7ef1fc2	treewide: reduce boost headers usage in scylla header files `dev-headers` target is also ensured to build successfully. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 01:33:18 +03:00
Avi Kivity	6db826475d	Merge "Introduce segregate scrub mode" from Botond " The current scrub compaction has a serious drawback: while it is very effective at removing any corruptions it recognizes, it is very heavy-handed in its way of repairing such corruptions: it simply drops all data that is suspected to be corrupt. While this is the safest way to cleanse data, it might not be the best way from the point of view of a user who doesn't want to lose data, even at the risk of retaining some business-logic level corruption. Mind you, no database-level scrub can ever fully repair data from the business-logic point of view, they can only do so on the database-level. So in certain cases it might be desirable to have a less heavy-handed approach of cleansing the data, that tries as hard as it can to not lose any data. This series introduces a new scrub mode, with the goal of addressing this use-case: when the user doesn't want to lose any data. The new mode is called "segregate" and it works by segregating its input into multiple outputs such that each output contains a valid stream. This approach can fix any out-of-order data, be that on the partition or fragment level. Out-of-order partitions are simply written into a separate output. Out-of-order fragments are handled by injecting a partition-end/partition-start pair right before them, so that they are now in a separate (duplicate) partition, that will just be written into a separate output, just like a regular out-of-order partition. The reason this series is posted as an RFC is that although I consider the code stable and tested, there are some questions related to the UX. 
* First and foremost every scrub that does more than just discard data that is suspected to be corrupt (but even these to a certain degree) has to consider the possibility that it is rehabilitating corruptions, leaving them in the system without a warning, in the sense that the user won't see any more problems due to low-level corruptions and hence might think everything is alright, while data is still corrupt from the business logic point of view. It is very hard to draw a line between what scrub should and shouldn't do, yet there is a demand from users for scrub that can restore data without losing any of it. Note that anybody executing such a scrub is already in a bad shape, even if they can read their data (they often can't) it is already corrupt, scrub is not making anything worse here. * This series converts the previous `skip_corrupted` boolean into an enum, which now selects the scrub mode. This means that `skip_corrupted` cannot be combined with segregate to throw out what the former can't fix. This was chosen for simplicity: a bunch of flags, all interacting with each other, is very hard to see through in my opinion, while a linear mode selector is much easier to follow. * The new segregate mode goes all-in, by trying to fix even fragment-level disorder. Maybe it should only do it on the partition level, or maybe this should be made configurable, allowing the user to select what happens to data that cannot be fixed. 
Tests: unit(dev), unit(sstable_datafile_test:debug) " * 'sstable-scrub-segregate-by-partition/v1' of https://github.com/denesb/scylla: test: boost/sstable_datafile_test: add tests for segregate mode scrub api: storage_service/keyspace_scrub: expose new segregate mode sstables: compaction/scrub: add segregate mode mutation_fragment_stream_validator: add reset methods mutation_writer: add segregate_by_partition api: /storage_service/keyspace_scrub: add scrub mode param sstables: compaction/scrub: replace skip_corrupted with mode enum sstables: compaction/scrub: prevent infinite loop when last partition end is missing tests: boost/sstable_datafile_test: use the same permit for all fragments in scrub tests	2021-05-18 13:43:01 +03:00
Raphael S. Carvalho	10ae77966c	compaction_manager: Don't swallow exception in procedure used by reshape and resharding run_custom_job() was swallowing all exceptions, which is definitely wrong because failure in a resharding or reshape would be incorrectly interpreted as success, which means upper layer will continue as if everything is ok. For example, ignoring a failure in resharding could result in a shared sstable being left unresharded, so when that sstable reaches a table, scylla would abort as shared ssts are no longer accepted in the main sstable set. Let's allow the exception to be propagated, so failure will be communicated, and resharding and reshape will be all or nothing, as originally intended. Fixes #8657. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210515015721.384667-1-raphaelsc@scylladb.com>	2021-05-17 13:57:05 +02:00
Michael Livshin	357ab759ee	statistics: add global bloom filter memory gauge Refs #251. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2021-05-12 03:48:07 +03:00
Michael Livshin	5abeadde4d	statistics: add some sstable management metrics Add the following metrics, as part of #251: - open for writing (a.k.a. "created", unless I'm missing something?) - open for reading - deleted - currently open for reading/writing (gauges) Refs #251. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2021-05-12 03:48:07 +03:00
Michael Livshin	9a2b54fcf6	sstables: make the `_open` field more useful The field is hitherto only used in scylla-gdb.py. Let it store the open mode (if any). Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2021-05-12 03:48:07 +03:00
Michael Livshin	1f83251b2b	sstables: stats: noexcept all accessors Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2021-05-12 03:48:07 +03:00
Piotr Sarna	00e59a9823	sstables: disambiguate boost::find There are multiple functions named `find` in boost, so to avoid future clashes, this one is explicitly marked as belonging to boost::range.	2021-05-10 11:48:14 +02:00
Raphael S. Carvalho	8480839932	LCS/reshape: Don't reshape single sstable in level 0 with strict mode With strict mode, it could happen that a sstable alone in level 0 is selected for offstrategy compaction, which means that we could run into an infinite reshape process. This is fixed by respecting the offstrategy threshold. Unit test is added. Fixes #8573. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210506181324.49636-1-raphaelsc@scylladb.com>	2021-05-09 11:09:54 +03:00
Lauro Ramos Venancio	15f72f7c9e	TWCS: initialize _highest_window_seen The timestamp_type is an int64_t. So, it has to be explicitly initialized before using it. This missing initialization prevented the major compaction from happening when a time window finishes, as described in #8569. Fixes #8569 Signed-off-by: Lauro Ramos Venancio <lauro.venancio@incognia.com> Closes #8590	2021-05-05 17:31:05 +03:00
Botond Dénes	674a77ead0	sstables: compaction/scrub: add segregate mode In segregate mode scrub will segregate the content of input sstables into potentially multiple output sstables such that they respect partition level and fragment level monotonicity requirements. This can be used to fix data where partitions or even fragments are out-of-order or duplicated. In this case no data is lost and after the scrub each sstable contains valid data. Out-of-order partitions are fixed by simply being written into a separate output, compared to the last one compaction was writing into. Out-of-order fragments are fixed by injecting a partition-end/partition-start pair right before them, effectively moving them into a separate (duplicate) partition which is then treated in the above mentioned way. This mode can fix corruptions where partitions are out-of-order or duplicated. This mode cannot fix corruptions where partitions were merged, although data will be made valid from the database level, it won't be on the business-logic level.	2021-05-05 14:33:49 +03:00
Benny Halevy	ead96e21c3	compaction: size_tiered_compaction_strategy: get_buckets: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-05-05 14:26:37 +03:00
Benny Halevy	c1681cb9ea	compaction: size_tiered_compaction_strategy: get_buckets: don't let the bucket average drift too high SSTables are added in increasing size order so the bucket's average might drift upwards. Don't let it drift too high, to a point where the smallest SSTable might fall out of range. For example, here's a simulation run of the algorithm for these sstable sizes: [21, 123, 252, 363, 379, 394, 407, 428, 463, 467, 470, 523, 752, 774] the simulated compaction strategy options are: min_sstable_size = 4 bucket_low = 0.66667 bucket_high = 1.5 For each bucket, the following is printed: (avg * bucket_low) avg (avg * bucket_high) UNCHANGED: buckets={ ( 14.0) 21.0 ( 31.5): [21] ( 82.0) 123.0 ( 184.5): [123] ( 276.4) 414.6 ( 621.9): [252, 363, 379, 394, 407, 428, 463, 467, 470, 523] ( 508.7) 763.0 (1144.5): [752, 774] } IMPROVED: buckets={ ( 14.0) 21.0 ( 31.5): [21] ( 82.0) 123.0 ( 184.5): [123] ( 247.0) 370.5 ( 555.8): [252, 363, 379, 394, 407, 428] ( 320.5) 480.8 ( 721.1): [463, 467, 470, 523] ( 508.7) 763.0 (1144.5): [752, 774] } Fixes #8584 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-05-05 14:26:28 +03:00
Benny Halevy	d3aa5265ab	compaction: size_tiered_compaction_strategy: get_buckets: keep bucket average size as double precision floating point number Using integer division loses accuracy by rounding down the result. Each time we calculate: ``` auto total_size = bucket.size() * old_average_size; auto new_average_size = (total_size + size) / (bucket.size() + 1); ``` We accumulate the rounding error. total_size might be too small since old_average_size was previously rounded down, and then new_average_size is rounded down again. Rather than trying to compensate for the rounding errors by e.g. adding size / 2 to the dividend, simply keep the average as a double precision number. Note that we multiply old_average_size by options.bucket_{low,high}, that are double precision too so the size comparisons are already using FP instructions implicitly. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-05-05 14:26:25 +03:00
Benny Halevy	44b094f9a5	compaction: size_tiered_compaction_strategy: get_buckets: rename old_average_size to bucket_average_size Since now it became a reference used to update the bucket's average size after a new sstable is inserted into it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-05-05 14:26:20 +03:00
Benny Halevy	336a4dc0fd	compaction: size_tiered_compaction_strategy: get_buckets: consider only current bucket for each sstable Since the sstables are sorted in increasing size order there is no need to consider all buckets to find a matching one. Instead, just consider the most recently inserted bucket. Once we see a sstable size outside the allowed range for this bucket, create a new bucket and consider this one for the next sstable. Note, `old_average_size` should be renamed since this change turns it into a reference and it's assigned with the new average_size. This patch keeps the old name to reduce the churn. The following patch will do only the rename. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-05-05 14:26:05 +03:00
Botond Dénes	03728f5c26	sstables: compaction/scrub: replace skip_corrupted with mode enum We want to add more modes than the current two, so replace the current boolean mode selector with an enum which allows for easy extensions.	2021-05-05 12:03:42 +03:00
Botond Dénes	ba75115e20	sstables: compaction/scrub: prevent infinite loop when last partition end is missing Scrub compaction will add the missing last partition-end in a stream when allowed to modify the stream. This however can cause an infinite loop: 1) user calls fill_buffer() 2) process fragments until underlying is at EOS 3) add missing partition end 4) set EOS 5) user sees that last buffer wasn't empty 6) calls fill_buffer() again 7) goto (3) To prevent this cycle, break out of `fill_buffer()` early when both the scrub reader and the underlying is at EOS.	2021-05-05 12:03:42 +03:00
Pavel Emelyanov	13b07a3c58	sstables: Make checksum sink report buffer size from lower sink The checksum sink carries another sink on board and forwards the put buffers lower, so there's no point in making these two have different buffer sizes. This is what really happens now, but this change makes this more explicit and makes the checksumming code conform to the new output stream API. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-05-04 12:01:30 +03:00
Pavel Emelyanov	01b979beca	sstables: Report buffer size from compressed file sink This change just moves the place from which the output_stream knows the compression::uncompressed_chunk_length() value. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-05-04 12:01:27 +03:00
Botond Dénes	9fc3cba055	sstables: improve error message for invalid sstable paths The error message currently complains about "invalid version" and later says the reason is that the path is not recognized. This is confusing so change the error message to start with "invalid path" instead. It is the path that is invalid not the version after all. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210429092749.52659-1-bdenes@scylladb.com>	2021-04-29 12:50:48 +03:00
Asias He	60ba8eb9b8	sstables: Add debug info when create_sharding_metadata generates zero ranges The range passed to create_sharding_metadata is supposed to be owned or at least partially owned by the shard. Log keys, range and split ranges for debugging if the range does not belong to the shard. This is helpful for debugging "Failed to generate sharding metadata for foo.db" issues reported. Refs #7056 Closes #8557	2021-04-28 11:22:06 +03:00
Benny Halevy	3e7075a739	compaction: setup: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	90a7a8ff0e	compaction: close reader when done consuming Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	7d42a71310	mutation_reader: position_reader_queue: add close method Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	3c05529329	sstables: scrub_compaction: reader: close underlying reader Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00
Benny Halevy	75eed563bc	sstables: write_components: close reader when done Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00
Benny Halevy	8c585ccb5c	sstables: sstable_mutation_reader: implement close Close both the _index_reader and _context, if they are engaged. Warn and ignore any errors from close as it may be called either from the destructor or from f_m_r close. Call close() for closing in the background if needed when destroyed and warn about it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00
Benny Halevy	6a82e9f4be	sstables: index_reader: mark close noexcept We'd like that to simplify the soon-to-be-introduced sstable_mutation_reader::close error handling path. close_index_list can be marked noexcept since parallel_for_each is, with that index_reader::close can be marked noexcept too. Note that since reader close can not fail both lower and upper bounds are closed (since closing lower_bound cannot fail). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00
Avi Kivity	350f79c8ce	Merge 'sstables: remove large allocations when parsing cells' from Wojciech Mitros sstable cells are parsed into temporary_buffers, which causes large contiguous allocations for some cells. This is fixed by storing fragments of the cell value in a fragmented_temporary_buffer instead. To achieve this, this patch also adds new methods to the fragmented_temporary_buffer(size(), ostream& operator<<()) and adds methods to the underlying parser(primitive_consumer) for parsing byte strings into fragmented buffers. Fixes #7457 Fixes #6376 Closes #8182 * github.com:scylladb/scylla: primitive_consumer: keep fragments of parsed buffer in a small_vector sstables: add parsing of cell values into fragmented buffers sstables: add non-contiguous parsing of byte strings to the primitive_consumer utils: add ostream operator<<() for fragmented_temporary_buffer::view compound_type: extend serialize_value for all FragmentedView types	2021-04-22 15:38:10 +02:00
Avi Kivity	a063173ace	Merge "Fix unbounded memory usage and high write amplification in TWCS reshape" from Raphael " Memory usage is considerably reduced by making reshape switch to partitioned set, given that input sstables are disjoint. This will benefit reshape for all strategies, not only TWCS. Write amplification is reduced a lot by compacting all input sstables at once, which is possible given that unbounded memory usage is fixed too. With both these issues fixed, TWCS reshape will be much more efficient. tests: mode(dev). " * 'twcs_reshape_fixes' of github.com:raphaelsc/scylla: tests: sstables: Check that TWCS is able to reshape disjoint sstables efficiently TWCS: Reshape all sstables in a time window at once if they're disjoint sstables: Extract code to count amount of overlapping into a function LCS: reshape: Fix overlapping check when determining if a sstable set is disjoint compaction: Make reshape compaction always use partitioned_sstable_set compaction: Allow a compaction type to override the sstable_set for input sstables	2021-04-22 11:24:49 +03:00
Raphael S. Carvalho	d5fc2f3839	TWCS: Reshape all sstables in a time window at once if they're disjoint With repair-based operations, each window will have 256 disjoint sstables due to data segregation which produces N sstables for each vnode range, where N = # of existing windows. So each window ends up with one sstable per vnode range = 256. Given that reshape now unconditionally uses partitioned set's incremental selector, all the 256 sstables can be compacted at once as compaction essentially becomes a copy operation, where only one sstable will be opened at a time, making its memory usage very efficient. By compacting all sstables at once, write amplification is a lot reduced because each byte is now only rewritten once. Previously, with the initial set of 256 sstables, write amp could be up to 8, which makes reshape for TWCS very slow. Refs #8449. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-04-21 11:03:16 -03:00
Raphael S. Carvalho	0f7774a6f8	sstables: Extract code to count amount of overlapping into a function This function will be reused by TWCS reshape when checking if all sstables in a window are disjoint and can be all compacted together. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-04-21 11:03:16 -03:00
Raphael S. Carvalho	39ecddbd34	LCS: reshape: Fix overlapping check when determining if a sstable set is disjoint Wrong comparison operator is used when checking for overlapping. It would miss overlapping when last key of a sstable is equal to the first key of another sstable that comes next in the set, which is sorted by first key. Fixes #8531. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-04-21 11:03:07 -03:00
Piotr Sarna	2ad09d0bf8	Merge 'treewide: remove inclusions of storage_proxy.hh from headers' from Avi Kivity Reduce rebuilds and build time by removing unnecessary includes. Along the way, improve header sanity. Ref #1. Test: dev-headers, unit(dev). Closes #8524 * github.com:scylladb/scylla: treewide: remove inclusions of storage_proxy.hh from headers storage_proxy: unnest coordinator_query_result treewide: make headers self-sufficient utils: intrusive_btree: add missing #pragma once	2021-04-21 08:22:52 +02:00
Benny Halevy	7130e2e7ff	sstables: harden unlink Make sure that sstable::unlink will never fail. It will terminate in the unlikely case toc_filename throws (e.g. on bad_alloc), otherwise it ignores any other error and just warns about it. Make unlink a coroutine to simplify the implementation without introducing additional allocations. Note that remove_by_toc_name and maybe_delete_large_data_entries are executed asynchronously and concurrently. Waiting for them to finish is serialized by co_await, making sure that both are being waited on so as not to leave abandoned futures behind. Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210420135020.102733-1-bhalevy@scylladb.com>	2021-04-21 08:22:52 +02:00
Raphael S. Carvalho	678e4c0bb9	compaction: Make reshape compaction always use partitioned_sstable_set Reshape compaction potentially works with disjoint sstables, so it will benefit a lot from using partitioned_sstable_set, which is able to incrementally open the disjoint sstables. Without it, all sstables are opened at once, which means unbounded memory usage. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-04-20 15:39:51 -03:00
Avi Kivity	14a4173f50	treewide: make headers self-sufficient In preparation for some large header changes, fix up any headers that aren't self-sufficient by adding needed includes or forward declarations.	2021-04-20 21:23:00 +03:00
Raphael S. Carvalho	ad9bc808b9	compaction: Allow a compaction type to override the sstable_set for input sstables By default, compaction will pick a implementation of sstable_set as defined by the underlying compaction strategy. However, reshape compaction potentially works with disjoint sstables and will benefit a lot from always using partitioned set. For example, when reshaping a TWCS table, it's better to use the partitioned set rather than the time window set, as the former will be much more memory efficient by incrementally selecting sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-04-20 12:03:44 -03:00
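
The `get_buckets` commits listed above (336a4dc0fd, d3aa5265ab, c1681cb9ea) describe three changes: consider only the most recent bucket, keep the bucket average as a double, and refuse an sstable if the new average would push the bucket's smallest member out of range. The following is a minimal sketch of that bucketing logic as those messages describe it, not the actual Scylla code. It works on plain sizes rather than sstable objects, and the `stcs_options` struct and the free-function signature are assumptions made for illustration; the option names and values are the ones quoted in c1681cb9ea.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical options struct; field names follow the options quoted in
// the commit message (min_sstable_size, bucket_low, bucket_high).
struct stcs_options {
    double bucket_low = 0.66667;
    double bucket_high = 1.5;
    uint64_t min_sstable_size = 4;
};

// Groups sstable sizes into buckets of similar size (illustrative sketch).
std::vector<std::vector<uint64_t>> get_buckets(std::vector<uint64_t> sizes,
                                               const stcs_options& opt) {
    // Sizes are processed in increasing order, so only the most recently
    // created bucket can ever match the next size.
    std::sort(sizes.begin(), sizes.end());

    std::vector<std::vector<uint64_t>> buckets;
    double bucket_average_size = 0; // average of the most recent bucket

    for (uint64_t size : sizes) {
        if (!buckets.empty()) {
            auto& bucket = buckets.back();
            const bool similar = size > bucket_average_size * opt.bucket_low
                              && size < bucket_average_size * opt.bucket_high;
            const bool both_small = size < opt.min_sstable_size
                                 && bucket_average_size < opt.min_sstable_size;
            if (similar || both_small) {
                // Keep the running average in floating point so that
                // rounding errors do not accumulate.
                const double total = bucket.size() * bucket_average_size;
                const double new_average = (total + size) / (bucket.size() + 1);
                // Reject the sstable if the drifted average would leave the
                // smallest member of the bucket below the lower bound.
                if (size < opt.min_sstable_size
                        || bucket.front() > new_average * opt.bucket_low) {
                    bucket.push_back(size);
                    bucket_average_size = new_average;
                    continue;
                }
            }
        }
        // No match in the current bucket: start a new one.
        buckets.push_back({size});
        bucket_average_size = static_cast<double>(size);
    }
    return buckets;
}
```

Called with the sizes from c1681cb9ea (`21, 123, 252, 363, 379, 394, 407, 428, 463, 467, 470, 523, 752, 774`) and the default options above, this sketch yields the five buckets shown under "IMPROVED" in that message: 463 starts a new bucket because adding it would raise the average to about 383.7, whose lower bound (about 255.8) exceeds the bucket's smallest size, 252.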


2475 Commits