scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-20 16:40:35 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	58e7ad20eb	sstable/compaction: Use correct schema in the writing consumer Introduced in `2a437ab427`. regular_compaction::select_sstable_writer() creates the sstable writer when the first partition is consumed from the combined mutation fragment stream. It gets the schema directly from the table object. That may be a different schema than the one used by the readers if there was a concurrent schema alter duringthat small time window. As a result, the writing consumer attached to readers will interpret fragments using the wrong version of the schema. One effect of this is storing values of some columns under a different column. This patch replaces all column_family::schema() accesses with accesses to the _schema memeber which is obtained once per compaction and is the same schema which readers use. Fixes #4304. Tests: - manual tests with hard-coded schema change injection to reproduce the bug - build/dev/scylla boot - tests/sstable_mutation_test Message-Id: <1551698056-23386-1-git-send-email-tgrabiec@scylladb.com>	2019-03-04 13:27:19 +02:00
Paweł Dziepak	6d5c1a9813	sstables: compaction: don't access moved-from vector of sstables	2019-02-07 10:16:50 +00:00
Avi Kivity	468f8c7ee7	Merge "Print a warning if a row is too large" from Rafael " This is a first step in fixing #3988. " * 'espindola/large-row-warn-only-v4' of https://github.com/espindola/scylla: Rename large_partition_handler Print a warning if a row is too large Remove defaut parameter value Rename _threshold_bytes to _partition_threshold_bytes keys: add schema-aware printing for clustering_key_prefix	2019-02-03 13:57:42 +02:00
Raphael S. Carvalho	930f8caff9	sstables/compaction: Fix segfault when replacing expired sstable in incremental compaction Fully expired sstable is not added to compacting set, meaning it's not actually compacted, but it's kept in a list of sstables which incremental compaction uses to check if any sstable can be replaced. Incremental compaction was unconditionally removing expired sstable from compacting set, which led to segfault because end iterator was given. The fix is about changing sstable_set::erase() behavior to follow standard one for erase functions which will works if the target element is not present. Fixes #4085. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190130163100.5824-1-raphaelsc@scylladb.com>	2019-01-30 16:32:45 +00:00
Rafael Ávila de Espíndola	625080b414	Rename large_partition_handler Now that it also handles large rows, rename it to large_data_handler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 15:03:14 -08:00
Benny Halevy	1d483bc424	compaction: mc: re-calculate encoding_stats based on column stats When compacting several sstables, get and merge their encoding_stats for encoding the result. Introduce sstable::get_encoding_stats_for_compaction to return encoding_stats based on the sstable's column stats. Use encoding_stats_collector to keep track of the minimum encoding_stats values of all input sstables. Fixes #3971 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-16 17:59:59 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Raphael S. Carvalho	2058001f94	sstables/compaction: propagate sstable replacement to all compaction of a CF This is needed for parallel compaction to work with sstable run based approach. That's because regular compaction clones a set containing all sstables of its column family. So compaction A can potentially hold a reference to a compacting sstable of compaction B, so preventing compacting B from releasing its exhausted sstable. So all replacements are propagated to all compactions of a given column family, and compactions in turn, including the one which initiated the propagation, will do the replacement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:30 -02:00
Raphael S. Carvalho	953fdcc867	sstables: store cf pointer in compaction_info motivation is that we need a more efficient way to find compactions that belong to a given column family in compaction list. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:28 -02:00
Raphael S. Carvalho	e88d1d54b9	sstables/compaction_manager: prevent partial run from being selected for compaction Filter out sstable belonging to a partial run being generated by an ongoing compaction. Otherwise, that could lead to wrong decisions by the compaction strategy. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:22 -02:00
Raphael S. Carvalho	23884fe9f6	compaction: use same run identifier for sstables generated by same compaction SSTables composing the same run will share the same run identifier. Therefore, a new compaction strategy will be able to get all sstables belong to the same run from sstable_set, which now keeps track of existing runs. Same UUID is passed to writers of a given compaction. Otherwise, a new UUID is picked for every sstable created by compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:20 -02:00
Raphael S. Carvalho	3f309ebba9	sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor Motivation is that we want to release space for exhausted sstable and that will only happen when all references to it are gone and that backlog tracker takes the early replacement into account. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:13 -02:00
Raphael S. Carvalho	f6df949c1a	compaction: share sstable set with incremental reader selector By doing that, we'll be able to release exhausted sstable from both simulteaneously. That's achieved by sharing set containing input sstables with the incremental reader selector and removing exhausted sstables from shared set when the time has come. Step towards reducing disk requirement for compaction by making it delete sstable which all data is in a sealed new sstable. For that to happen, all references must be gone. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:10 -02:00
Raphael S. Carvalho	e5a0b05c15	sstables/compaction: release space earlier of exhausted input sstables Currently, compaction only replace input sstables at end of compaction, meaning compaction must be finished for all the space of those sstables to be released. What we can do instead is to delete earlier some input sstable under some conditions: 1) SStable data should be committed to a new, sealed output sstable, meaning it's exhausted. 2) Exhausted sstable mustn't overlap with a non-exhausted sstable because a tombstone in the exhausted could have been purged and the shadowed data in non-exhausted could be ressurected if system crashes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:07 -02:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Raphael S. Carvalho	1c5934c934	sstables: fix procedure to get fully expired sstables with MC format MC format lacks ancestors metadata, so we need to workaround it by using ancestors in metadata collector, which is only available for a sstable written during this instance. It works fine here because we only want to know if a sstable recently compacted has an ancestor which wasn't yet deleted. Fixes #3852. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <20181102154951.22950-1-raphaelsc@scylladb.com>	2018-11-06 09:28:37 +02:00
Avi Kivity	455f00e993	sstables: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	738e713edf	sstables: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Glauber Costa	51906f7144	compactions: log tokens that we decide not to write down to an SSTable May be important when debugging issues related to cleanups Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181015162643.7834-1-glauber@scylladb.com>	2018-10-15 19:28:00 +03:00
Avi Kivity	7c8143c3c4	Revert "compaction: demote compaction start/end messages to DEBUG level" This reverts commit `b443a9b930`. The compaction history table doesn't have enough information to be a replacement for this log message yet.	2018-10-03 13:13:37 +03:00
Botond Dénes	eb357a385d	flat_mutation_reader: make timeout opt-out rather than opt-in Currently timeout is opt-in, that is, all methods that even have it default it to `db::no_timeout`. This means that ensuring timeout is used where it should be is completely up to the author and the reviewrs of the code. As humans are notoriously prone to mistakes this has resulted in a very inconsistent usage of timeout, many clients of `flat_mutation_reader` passing the timeout only to some members and only on certain call sites. This is small wonder considering that some core operations like `operator()()` only recently received a timeout parameter and others like `peek()` didn't even have one until this patch. Both of these methods call `fill_buffer()` which potentially talks to the lower layers and is supposed to propagate the timeout. All this makes the `flat_mutation_reader`'s timeout effectively useless. To make order in this chaos make the timeout parameter a mandatory one on all `flat_mutation_reader` methods that need it. This ensures that humans now get a reminder from the compiler when they forget to pass the timeout. Clients can still opt-out from passing a timeout by passing `db::no_timeout` (the previous default value) but this will be now explicit and developers should think before typing it. There were suprisingly few core call sites to fix up. Where a timeout was available nearby I propagated it to be able to pass it to the reader, where I couldn't I passed `db::no_timeout`. Authors of the latter kind of code (view, streaming and repair are some of the notable examples) should maybe consider propagating down a timeout if needed. In the test code (the wast majority of the changes) I just used `db::no_timeout` everywhere. Tests: unit(release, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>	2018-09-20 11:31:24 +02:00
Avi Kivity	b443a9b930	compaction: demote compaction start/end messages to DEBUG level Compactions start and end all the time, especially with many shards, and don't contribute much to understanding what is going on these days. Compaction throughput is available through the metrics and other information is available via the compaction history table. Demote compaction start and end messages to DEBUG level to keep the log clean. Cleaning and resharding compactions are kept as INFO, at least for now, since they are manual operations and therefore rarer. Message-Id: <20180724132859.14109-1-avi@scylladb.com>	2018-07-25 09:53:39 +01:00
Botond Dénes	a8e795a16e	sstables_set::incremental_selector: use ring_position instead of token Currently `sstable_set::incremental_selector` works in terms of tokens. Sstables can be selected with tokens and internally the token-space is partitioned (in `partitioned_sstable_set`, used for LCS) with tokens as well. This is problematic for severeal reasons. The sub-range sstables cover from the token-space is defined in terms of decorated keys. It is even possible that multiple sstables cover multiple non-overlapping sub-ranges of a single token. The current system is unable to model this and will at best result in selecting unnecessary sstables. The usage of token for providing the next position where the intersecting sstables change [1] causes further problems. Attempting to walk over the token-space by repeatedly calling `select()` with the `next_position` returned from the previous call will quite possibly lead to an infinite loop as a token cannot express inclusiveness/exclusiveness and thus the incremental selector will not be able to make progress when the upper and lower bounds of two neighbouring intervals share the same token with different inclusiveness e.g. [t1, t2](t2, t3]. To solve these problems update incremental_selector to work in terms of ring position. This makes it possible to partition the token-space amoing sstables at decorated key granularity. It also makes it possible for select() to return a next_position that is guaranteed to make progress. partitioned_sstable_set now builds the internal interval map using the decorated key of the sstables, not just the tokens. incremental_selector::select() now uses `dht::ring_position_view` as both the selector and the next_position. ring_position_view can express positions between keys so it can also include information about inclusiveness/exclusiveness of the next interval guaranteeing forward progress. [1] `sstable_set::incremental_selector::selection::next_position`	2018-07-04 17:42:33 +03:00
Glauber Costa	7e3093709a	backlog: add level to write progress monitor For SSTables being written, we don't know their level yet. Add that information to the write monitor. New SSTables will always be at L0. Compacted SSTables will have their level determined by the compaction process. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-31 21:09:38 -04:00
Glauber Costa	b573a2ff61	backlog: keep track of maximum timestamp in write monitor For sealed SSTables we can get the maximum timestamp from the statistics component. But for partially written SSTables, the metadata is not yet available. One way to solve this would be to make the SSTable statistics available earlier. But we would end up with a maximum timestamp that potentially changes all the time as we write more cells. A better approach is to take note of what's the maximum timestamp in a memtable before we start to flush, and when time comes for us to flush we will use the progress manager to inform the consumers about the maximum timestamp. For SSTables being compacted, we can't know for sure what is the maximum timestamp as some entries could be TTLd already. But the maximum of all SSTables present in the compaction is a good enough estimation for this purposes. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 12:55:58 -04:00
Piotr Sarna	fe02c3d0e2	database, sstables, tests: add large_partition_handler This commit makes database, sstables and tests aware of which large_partition_handler they use. Proper large_partition_handler is retrievable from config information and is based on existing compaction_large_partition_warning_threshold_mb entry. Right now CQL TABLE variant of large_partition_handler is used in the database. Tests use a NOP version of large_partition_handler, which does not depend on CQL queries at all.	2018-05-04 14:38:13 +02:00
Vladimir Krivopalov	15ef4ca73c	Support for writing SSTables 3.0 ('mc') Data.db and Index.db files - rows only. This fix adds functionality for writing data in 'mc' format to Data.db file according to the SSTables 3.0 data format as described at https://github.com/scylladb/scylla/wiki/SSTables-3.0-Data-File-Format and Index.db file according to the specification at https://github.com/scylladb/scylla/wiki/SSTables-3.0-Index-File-Format The following cases are not supported yet: - writing counter cells - range tombstones In Index.db, end open markers are not written since range tombstones are not supported for data files yet. For #1969. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-04-26 14:34:20 -07:00
Raphael S. Carvalho	11940ca39e	sstables: Fix bloom filter size after resharding by properly estimating partition count We were feeding the total estimation partition count of an input shared sstable to the output unshared ones. So sstable writer thinks, from estimation, that each sstable created by resharding will have the same data amount as the shared sstable they are being created from. That's a problem because estimation is feeded to bloom filter creation which directly influences its size. So if we're resharding all sstables that belong to all shards, the disk usage taken by filter components will be multiplied by the number of shards. That becomes more of a problem with #3302. Partition count estimation for a shard S will now be done as follow: // // TE, the total estimated partition count for a shard S, is defined as // TE = Sum(i = 0...N) { Ei / Si }. // // where i is an input sstable that belongs to shard S, // Ei is the estimated partition count for sstable i, // Si is the total number of shards that own sstable i. Fixes #2672. Refs #3302. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180423151001.9995-1-raphaelsc@scylladb.com>	2018-04-23 18:11:20 +03:00
Avi Kivity	03c22ad524	Merge "Support for Cassandra 2.2 (LA) SSTable formats" from Daniel " These patches add support for C* 2.2 file(name) format. Namely: * It forces Scylla to write files in la format. * Adds storage-service feature for them. * cf and ks are determined from directory, not from file-name (for 2.2 format). * Adds some other fixes to make dtest happy. * Unit tests work with la format or with both formats. " * 'danfiala/filename-format-2.2-v4' of https://github.com/hagrid-the-developer/scylla: tests/sstables: Tests use la format or iterate over both formats. tests/sstables: Helper functions support 2.2 format directory structure. stables: Use 2.2 (la) format as a default format to store sstables if it is enabled by feature-bits. storage_service: Support la sstable storage format as a feature. sstables: make_descriptor accepts sstable-directory, because it is necessary to determine cf and ks in 2.2 format. sstables: Throw more detail exception for unknown item in reverse_map. sstables/compaction: Suppress NaN in a report of a throughput.	2018-03-19 17:49:44 +02:00
Daniel Fiala	c5eca593fc	sstables/compaction: Suppress NaN in a report of a throughput. * It causes failures in dtest. Signed-off-by: Daniel Fiala <daniel@scylladb.com>	2018-03-18 05:46:32 +01:00
Raphael S. Carvalho	aa75684ee7	sstables: Warn when an extra-large partition is written Based on https://issues.apache.org/jira/browse/CASSANDRA-9643 For compaction_large_partition_warning_threshold_mb option set to 1, follow an example output: WARN 2018-02-22 19:52:11,029 [shard 0] sstable - Writing large row system/local:{key: pk{00056c6f63616c}, token:-7564491331177403445} (1276758 bytes) Fixes #2209. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180306175912.19259-1-raphaelsc@scylladb.com>	2018-03-07 15:49:46 +00:00
Glauber Costa	956af9f099	database, main: set up scheduling_groups for our main tasks Set up scheduling groups for streaming, compaction, memtable flush, query, and commitlog. The background writer scheduling group is retired; it is split into the memtable flush and compaction groups. Comments from Glauber: This patch is based in a patch from Avi with the same subject, but the differences are signficant enough so that I reset authorship. In particular: 1) A bug/regression is fixed with the boundary calculations for the memtable controller sampling function. 2) A leftover is removed, where after flushing a memtable we would go back to the main group before going to the cache group again 3) As per Tomek's suggestion, now the submission of compactions themselves are run in the compaction scheduling group. Having that working is what changes this patch the most: we now store the scheduling group in the compaction manager and let the compaction manager itself enforce the scheduling group. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Avi Kivity	641aaba12c	database, sstables, compaction: convert use of thread_scheduling_group to seastar cpu scheduler thread_scheduling_groups are converted to plain scheduling_group. Due to differences in initialization (scheduling_group initializtion defers), we create the scheduling_groups in main.cc and propagate them to users via a new class database_config. The sstable writer loses its thread_scheduling_group parameter and instead inherits scheduling from its caller. Since shares are in the 1-1000 range vs. 0-1 for thread scheduling quotas, the flush controller was adjusted to return values within the higher ranges.	2018-02-07 17:19:29 -05:00
Duarte Nunes	cbbdfde979	sstables/compaction_backlog_tracker: Constify backlog() Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180111004914.25796-1-duarte@scylladb.com>	2018-01-11 13:20:57 +02:00
Glauber Costa	ca284174d0	infrastructure for backlog estimator for compaction work. This patch adds infrastucture in various points in the system to allow us to determine the amount of work present as backlog from compactions. What needs to be done can be explained in three major pieces: 1) Add hooks in the points where sstables are added or inserted to a column family (or more precisely, to a compaction_strategy object). 2) Add hooks in reads and write monitors that allows a compaction backlog estimator (tracker) to become aware of bytes that are partially written and compacted away. 3) Add a per-column family class (compaction_backlog_tracker) that can be used to track work that is done and relevant to compactions (like the two above), and a compaction manager to provide a system-wide backlog based on the response of the individual trackers. The definition of how much backlog one has is strategy-specific. The Null strategy is easy, as it never really has any backlog, and so is the major strategy - since what it really matters is the backlog of the underlying compaction strategy. Although backlogs are strategy-specific, they should be "compatible", in the sense that if a particular strategy has more work to do, it should yield a higher number than its counterparts. All the others are presented in this patch as unimplemented: they will always advertise a mild backlog that should yield a constant CPU-utilization if used alone. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-02 18:43:07 -05:00
Glauber Costa	d4109ebb80	compaction: control destruction of readers Compactions run from a seastar::thread, in run(). They will either fail or succeed, and from the point of view of ordering of destruction between the compaction object and its readers: - if compaction succeed, we have no control over who gets destructed first since both objects will be going out of scope. - if they fail, we will forceably destruct the compaction object, at which point the readers are still alive From the point of view of lifetime management, it would be nice to make sure that the compaction object outlives whichever other objects it needs during compaction. This nice to have will become paramount when we start adding read_monitors to the compaction object, that have to, themselves outlive the readers. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-02 18:43:06 -05:00
Raphael S. Carvalho	eff62bc61e	compaction_manager: serialize compaction of same size tier for different cfs Currently, compaction manager will serialize compaction of same size tier (or weight) if they belong to the same column family. However, it fails to do so if the compaction jobs belong to different column families. That can lead to an ungodly amount of running compaction which gets worse the higher the number of shards and active column families. The problem is that it may affect overall system performance due to excessive resource usage. It's easy to trigger it during bootstraping after loading node with new sstables or repairing, or if lots of cfs are being actively written. That being said, compaction jobs of same size tier are now serialized on a given shard, such that maximum number of compaction (system wise) is now: (SHARDS) * (SIZE TIERS) instead of: (SHARDS) * (COLUMN FAMILIES) * (SIZE TIERS) We'll work hard to release a size tier (weight) for a column family waiting on it as fast as possible, given that we wouldn't like to underutilize resources available for compaction. We want one starting after the other. Compaction for a column family that cannot run now because the size tier is taken, will be postponed. There's a worker that will be sleeping on a condition variable that will be signalled whenever a compaction completes. FIFO ordering is used on postponed list for fairness. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-17 17:42:48 -02:00
Raphael S. Carvalho	49f3cfe746	sstables: improve compact_sstables() interface Motivation is that a new field in the descriptor will be forwarded to compaction procedure without extending parameter list even more. Also beautifies the interface, making it concise and easier to play with. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-17 17:22:19 -02:00
Paweł Dziepak	73b3d02cc0	db: make make_range_sstable_reader() return flat reader	2017-12-13 12:01:03 +00:00
Paweł Dziepak	e12959616c	db: make column_family::make_sstable_reader() return a flat reader	2017-12-13 12:01:03 +00:00
Avi Kivity	d934ca55a7	Merge "SSTable resharding fixes" from Raphael "Didn't affect any release. Regression introduced in `301358e`. Fixes #3041" * 'resharding_fix_v4' of github.com:raphaelsc/scylla: tests: add sstable resharding test to test.py tests: fix sstable resharding test sstables: Fix resharding by not filtering out mutation that belongs to other shard db: introduce make_range_sstable_reader rename make_range_sstable_reader to make_local_shard_sstable_reader db: extract sstable reader creation from incremental_reader_selector db: reuse make_range_sstable_reader in make_sstable_reader	2017-12-07 16:42:48 +02:00
Raphael S. Carvalho	bad21ba444	sstables: Fix resharding by not filtering out mutation that belongs to other shard After `301358e`, sstable resharding stopped work because shared sstables would use a filtering reader, which excludes mutation that belong to other shards. That completely breaks which relies on compaction of mutations that belong to different shards. The fix is about using recently introduced non local shard reader. Fixes #3041. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-07 03:15:26 -02:00
Raphael S. Carvalho	d1b146baa6	rename make_range_sstable_reader to make_local_shard_sstable_reader Tomek says: "I think that the least surprising behavior for a function named like this is to read the sstables unfiltered (it just reads them), and the filtering should be indicated specially in the name or by accepting a parameter." Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-07 03:15:25 -02:00
Raphael S. Carvalho	809b30c4a2	sstables/compaction: do not actually compact fully expired sstables There's no need to actually compact a sstable which is fully expired and which deletion of all its data will not ressurect older data. For that, a sstable will only be considered fully expired if it doesn't contain data newer than its overlapping counterparts. That way, there could be a false negative, but never a false positive. Currently, a fully expired sstable would unnecessarily waste read bandwidth of disk. This will help a lot time series workloads in which data for a given time window is all deleted at once using TTL. Fixes #2620. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-06 19:52:33 -02:00
Raphael S. Carvalho	d2ab154f12	sstables: switch to const ref wherever possible Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-06 19:52:33 -02:00
Raphael S. Carvalho	d916c8cdad	sstables: use gc_clock::time_point for gc_before Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-06 19:52:33 -02:00
Raphael S. Carvalho	45c11865fa	sstables: change return value type of get_fully_expired_sstables unordered_set will allow us to quickly extract fully expired tables from a set of compacting sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-06 18:45:55 -02:00
Paweł Dziepak	b64dd21751	sstables: convert compaction to flat_mutation_reader	2017-11-23 18:14:31 +00:00
Duarte Nunes	baeec0935f	Replace query::full_slice with schema::full_slice() query::full_slice doesn't select any regular or static columns, which is at odds with the expectations of its users. This patch replaces it with the schema::full_slice() version. Refs #2885 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1507732800-9448-2-git-send-email-duarte@scylladb.com>	2017-10-17 11:25:53 +02:00
Botond Dénes	47e07b787e	restricted_mutation_reader: restrict based-on memory consumption Restrict readers based on their memory consumption, instead of the count of the top-level readers. To do this an interposer is installed at the input_stream level which tracks buffers emmited by the stream. This way we can have an accurate picture of the readers' actual memory consumption. New readers will consume 16k units from the semaphore up-front. This is to account their own memory-consumption, apart from the buffers they will allocate. Creating the reader will be deferred to when there are enough resources to create it. As before only new readers will be blocked on an exhausted semaphore, existing readers can continue to work.	2017-10-03 12:44:12 +03:00

1 2 3 4

163 Commits