scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 21:47:10 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	70e54cfe6e	compaction/lcs: add support for tombstone compaction LCS will choose its candidate by starting from highest level and getting sstable which has highest droppable tombstone ratio. Unlike STCS which needs to choose oldest sstable from biggest tier, LCS can choose the one with highest d__t__r because sstables in a given level don't overlap. Sstable picked up for tombstone removal compaction won't be demoted or promoted. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	138fda468f	tests: basic tombstone compaction test for size tiered Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	8fd80ac22c	compaction/stcs: add support for tombstone compaction Larger sstables are hard to find sstable peers and therefore are left uncompacted for a long time. Expired data and tombstones which can be purged will waste disk space meanwhile. sstable tracks droppable tombstone from which ratio can be calculated. If ratio is greater than threshold (0.2 by default), sstable will be eligible for compaction. Oldest sstables from biggest tiers are preferable because droppable data in them are more likely to satisfy the conditions for purge, like not shadowing data in another sstable. Subsequent patches will add support in leveled and date tiered strategies. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	ad24470972	tests: add test for estimation of droppable tombstone ratio Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	eb6d17b748	sstables: introduce function to estimate droppable tombstone ratio Function used to estimate ratio of droppable tombstone. A tombstone is considered droppable for cells expired before gc_before and regular tombstones older than gc_before. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	0d21129cc7	compaction_manager: periodically submit cfs for compaction This is useful for a column family which isn't generating new content and will have lots of expired data later on that can be purged. Compaction submission is NO-OP if there's nothing to do, so I think it's reasonable to do it at an interval of 1 hour. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:03 -03:00
Raphael S. Carvalho	719dbf547d	streaming_histogram: fix coding style Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:08:12 -03:00
Raphael S. Carvalho	6fb26d9f0c	tests: add streaming_histogram_test Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:08:12 -03:00
Raphael S. Carvalho	a65b9eb8b4	streaming_histogram: implement sum This function is used to estimate number of points in interval [-inf,b]. It will be useful for estimating droppable tombstone ratio in a given sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:08:12 -03:00
Raphael S. Carvalho	c01c659594	tests: add test for sstable with bad tombstone histogram Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:08:12 -03:00
Raphael S. Carvalho	06fabf9810	sstables: discard bad streaming histogram for future use Find bad histogram which had incorrect elements merged due to use of unordered map. The keys will be unordered. Histogram which size is less than max allowed will be correct because no entries needed to be merged, so we can avoid discarding those. This is important because histogram for tombstone will be used to estimate droppable tombstone ratio. If it's incorrectly high for many of existing sstables, we will needlessly compact lots of them. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:08:10 -03:00
Raphael S. Carvalho	7b532867ce	tests: add sstable tombstone histogram test Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 01:17:28 -03:00
Raphael S. Carvalho	f35bd66da4	streaming_histogram: fix update This bug was introduced when converting java code. Return value of map::erase() was used as if it were the value of the removed entry, but it's actually the number of removed entries. update() also relies on ordered keys, so map is used instead by histogram. In addition, histograms will be written in sorted order (like C* does) such that we can detect bad histograms, using disk_array. disk_array is also used from now on to read histograms. The conversion from array to map is fine because histograms for sstables are limited to 100 elements. Coming patch will detect bad histograms (generated only by us) and discard them, because we can't rely on their information. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 01:17:26 -03:00
Raphael S. Carvalho	d90f46000d	streaming_histogram: move it to utils It's not specific to sstables. May be needed somewhere else in the future. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-28 01:07:13 -03:00
Raphael S. Carvalho	fb9bc609c6	streaming_histogram: do not limit it to be used by sstables streaming histogram will later be placed in /utils, so we want it to use std::unordered_map<> instead of disk_hash<>. That also requires implementing serialization/deserialization functions for it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-27 16:51:52 -03:00
Raphael S. Carvalho	e224653d70	sstables: update tombstone_histogram for cells with expiration time That tombstone_histogram is used to determine droppable data ratio for a sstable, and unlike C*, we were only updating it for tombstones. We need to update it with expiration time of cells too, if any. Creation time (expiration - ttl) cannot be used because if ttl > gc_grace_seconds, the resulting sstable could be considered worth dropping by tombstone compaction before any data is actually expired. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-27 16:50:38 -03:00
Avi Kivity	08488a75e0	dist: tolerate sysctl failures sysctl may fail in a container environment if /proc is not virtualized properly. Fixes #1990 Message-Id: <20170625145930.31619-1-avi@scylladb.com>	2017-06-27 16:11:48 +02:00
Avi Kivity	ff7be8241f	Merge "Fix compilation issues in older environments" from Tomasz * 'tgrabiec/fix-compilation-issues' of github.com:cloudius-systems/seastar-dev: tests: streamed_mutation_test: Avoid using boost::size() on row ranges tests: row_cache: Remove unused method	2017-06-27 16:30:54 +03:00
Tomasz Grabiec	eb844a10e9	tests: streamed_mutation_test: Avoid using boost::size() on row ranges Fails to compile with libboost 1.55.	2017-06-27 15:27:13 +02:00
Tomasz Grabiec	e68925595c	tests: row_cache: Remove unused method	2017-06-27 14:10:37 +02:00
Vlad Zolotarov	6839a50677	db::commitlog: entry_writer add a virtual destructor Add a virtual destructor for a base class commitlog::entry_writer. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1498511180-18391-1-git-send-email-vladz@scylladb.com>	2017-06-27 10:17:10 +03:00
Takuya ASADA	1e86196ed5	dist/debian: unofficial support of Ubuntu non-LTS versions / Debian non-stable versions Currently our build script only supports Ubuntu 14.04/16.04 and Debian 8, this change extends support to Ubuntu non-LTS versions / Debian non-stable versions. Note that this is unofficial support, users should build the package for these distributions themselves. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1498491473-28691-1-git-send-email-syuu@scylladb.com>	2017-06-26 18:55:55 +03:00
Asias He	cc02a62756	repair: Prefer nodes in local dc when streaming When peer nodes have the same partition data, i.e., with the same checksum, we currently choose to stream from any of them randomly. To improve streaming performance, select the peer within the same DC. This patch is supposed to improve repair performance with multiple DC. Message-Id: <c6a345b6e8ed2b59f485e53c865241e463b44507.1498490831.git.asias@scylladb.com>	2017-06-26 18:34:21 +03:00
Avi Kivity	1170f56447	Merge "Speed up gossip dissemination in large cluster" from Asias Fixes #2528. * tag 'asias/gossip_talk_to_more_nodes/v3' of github.com:cloudius-systems/seastar-dev: gossip: Use vector for _live_endpoints gossip: Talk to more live nodes in each gossip round	2017-06-26 17:59:43 +03:00
Asias He	e31d4a3940	gossip: Use vector for _live_endpoints To speed up the random access in get_random_node. Switch to use vector instead of set.	2017-06-26 22:49:59 +08:00
Asias He	437899909d	gossip: Talk to more live nodes in each gossip round In large clusters with multiple DC deployment, it is observed that it takes long delay for gossip update to disseminate in the cluster. To speed up, talk to more live nodes in each gossip round. Fixes #2528	2017-06-26 22:49:59 +08:00
Nadav Har'El	6cf44f6817	Optimize column_family::make_sstable_reader() for one partition This patch does the same thing to column_family::make_sstable_reader() as commit `186f031` did to sstable::as_mutation_source(). Although usually one can fast_forward_to() on the result of a column_family::make_sstable_reader(), earlier we had an optimization where if a single partition was specified, it was read exactly, and fast_forward_to() was NOT allowed. With the mutation_reader::forwarding flag patch, when this flag was on - requesting fast_forward_to() - we disabled this optimization. This makes sense, but is not backward compatible with the code which previously assumes this optimization exists. In particular, column_family::data_query() does a single partition read but does not specify forwarding::no explicitly. So this patch returns this optimization, despite this meaning that we blatantly ignore the fwd_mr flag in that case. Fixes #2524. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170626141121.30322-1-nyh@scylladb.com>	2017-06-26 17:13:03 +03:00
Avi Kivity	9b21a9bfb6	Merge "Implement partial cache" from Tomasz and Piotr "This series enables cache to keep partial partitions. Reads no longer have to read whole partition from sstables in order to cache the result. The 10MB threshold for partition size in cache is lifted. Known issues: - There is no partial eviction yet, whole partitions are still evicted, and partition snapshots held by active reads are not evictable at all - Information about range continuity is not recorded if that would require inserting a dummy entry, or if previous entry doesn't belong to the latest snapshot - Cache update after memtable flush happening concurrently with reads may inhibit that reads' ability to populate cache (new issue) - Cache update from flushed memtables has partition granularity, so may cause latency problems with large partition - Schema is still tracked per-partition, so after schema changes reads may induce high latency due to whole partition needing to be converted atomically - Range tombstones are repeated in the stream for every range between cache entries they cover (new issue) - Populating scans for both small and large partitions (perf_fast_forward) experienced a 40% reduction of throughput, CPU bound How was this tested: - test.py --mode release - row_cache_stress_test -c1 -m1G - perf_fast_forward, passes except for the test case checking range continuity population which would require inserting a dummy entry (mentioned above) - perf_simple_query (-c1 -m1G --duration 32): before: 90k [ops/s] stdev: 4k [ops/s] after: 94k [ops/s] stdev: 2k [ops/s]" * tag 'tgrabiec/introduce-partial-cache-v8' of github.com:cloudius-systems/seastar-dev: (130 commits) tests: row_cache: Add test_tombstone_merging_in_partial_partition test case tests: Introduce row_cache_stress_test utils: Add helpers for dealing with nonwrapping_range<int> tests: simple_schema: Allow passing the tombstone to make_range_tombstone() tests: simple_schema: Accept value by reference tests: simple_schema: Make add_row() accept optional timestamp tests: simple_schema: Make new_timestamp() public tests: simple_schema: Introduce make_ckeys() tests: simple_schema: Introduce get_value(const clustered_row&) helper tests: simple_schema: Fix comment tests: simple_schema: Add missing include row_cache: Introduce evict() tests: Add cache_streamed_mutation_test tests: mutation_assertions: Allow expecting fragments mutation_fragment: Implement equality check tests: row_cache: Add test for population of random partitions tests: row_cache: Add test for partition tombstone population tests: row_cache: Test reading randomly populated partition tests: row_cache: Add test_single_partition_update() tests: row_cache: Add test_scan_with_partial_partitions ...	2017-06-26 14:54:37 +03:00
Avi Kivity	555621b537	Disentangle memtables from sstables Remove sstable::write_components(memtable), replacing it with a helper. Fixes #2354 Message-Id: <20170624142639.16662-1-avi@scylladb.com>	2017-06-26 09:37:11 +02:00
Avi Kivity	236a8370e4	Remove use of std::random_shuffle() It was removed in C++17. Replace with std::shuffle(). Message-Id: <20170626063809.7563-1-avi@scylladb.com>	2017-06-26 09:36:38 +02:00
Avi Kivity	c4ae2206c7	messaging: respect inter_dc_tcp_nodelay configuration parameter We respect it partially (client side only) for now. Fixes #6. Message-Id: <20170623172048.23103-1-avi@scylladb.com>	2017-06-24 21:49:27 +02:00
Duarte Nunes	2dfd7040eb	CMakeLists.txt: Add boost support Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170623172236.15507-1-duarte@scylladb.com>	2017-06-24 21:49:27 +02:00
Avi Kivity	801b5220d6	Merge seastar upstream * seastar 9e2b7ec...0ab7ae5 (4): > Update fmt submodule > rpc: add options to control tcp_nodelay > core: Fix compilation for older versions of Boost > tests/lowres_clock_test: Fix compilation issues	2017-06-24 20:47:52 +03:00
Tomasz Grabiec	b0bcf2be53	tests: row_cache: Add test_tombstone_merging_in_partial_partition test case	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	23c6f517cb	tests: Introduce row_cache_stress_test Runs readers, updates and eviction concurrently and verifies the following property of reads: - reads see all past writes - reads see no partial writes within a single partition	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	4b4aef789e	utils: Add helpers for dealing with nonwrapping_range<int>	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	5c9f87fb27	tests: simple_schema: Allow passing the tombstone to make_range_tombstone()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	edf4a3494c	tests: simple_schema: Accept value by reference	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	5f70df472f	tests: simple_schema: Make add_row() accept optional timestamp	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	53867c4328	tests: simple_schema: Make new_timestamp() public	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	51b5814ec2	tests: simple_schema: Introduce make_ckeys()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	074c67fe4d	tests: simple_schema: Introduce get_value(const clustered_row&) helper	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	8ffc776e06	tests: simple_schema: Fix comment	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	ecacd2e84a	tests: simple_schema: Add missing include	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	b56232b216	row_cache: Introduce evict()	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	c4e8effffa	tests: Add cache_streamed_mutation_test [tgrabiec: - extracted from a larger commit - removed coupling with how cache_streamed_mutation is created (the code went out of sync), used more stable make_reader(). it's simpler too. - replaced false/true literals with is_continuous/is_dummy where appropriate - dropped tests for cache::underlying (class is gone) - reused streamed_mutation_assertions, it has better error messages - fixed the tests to not create tombstones with missing timestamps - relaxed range tombstone assertions to only check information relevant for the query range - print cache on failure for improved debuggability ]	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	44fdee3f2e	tests: mutation_assertions: Allow expecting fragments	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	1f23130b07	mutation_fragment: Implement equality check	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	116bcb8b30	tests: row_cache: Add test for population of random partitions	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	930a1415fe	tests: row_cache: Add test for partition tombstone population	2017-06-24 18:06:11 +02:00

12410 Commits