scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 12:36:56 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	8334086441	lcs: remove quadratic behavior from L0 compaction L0 compaction triggers quadratic behavior when many newly created sstables are needed for promotion due to their size being relatively low to max sstable size parameter. So until L0 is worth promoting, the strategy will compact every new sstable with all the existing ones in L0. To fix it, let's do STCS on level 0 until it becomes worth promoting. Fixes #2432. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:35 -03:00
Avi Kivity	7b4412c3ce	Revert "Merge "improvements for leveled strategy manifest" from Raphael" This reverts commit `43a3e718e6`, reversing changes made to `3813e94b0a`. It contains some unrelated commits.	2017-07-11 11:12:53 +03:00
Raphael S. Carvalho	28ebe1807f	lcs: remove quadratic behavior from L0 compaction L0 compaction triggers quadratic behavior when many newly created sstables are needed for promotion due to their size being relatively low to max sstable size parameter. So until L0 is worth promoting, the strategy will compact every new sstable with all the existing ones in L0. To fix it, let's do STCS on level 0 until it becomes worth promoting. Fixes #2432. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-10 15:42:28 -03:00
Tomasz Grabiec	72e01b7fe8	tests: commitlog: Check there are no segments left on disk after clean shutdown Reproduces #2550. Message-Id: <1499358825-17855-2-git-send-email-tgrabiec@scylladb.com>	2017-07-09 19:25:27 +03:00
Raphael S. Carvalho	7f7758fb6f	tests/sstable: make sstable_expired_data_ratio more robust this change will stress histogram ability to return a good estimation after merging keys such that it doesn't grow beyond size limit. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170708205713.5958-1-raphaelsc@scylladb.com>	2017-07-09 10:33:10 +03:00
Piotr Jastrzebski	a4b6cfe8f0	row_cache: use continuity info in single partition queries If a query requests for a single partition that is inside a range that has already been queried, use the continuity info and don't go to disk when it's not needed. Fixes #2244. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <15bb3b5b03225e7402e3862da53b5e06d3f4fa74.1499345295.git.piotr@scylladb.com>	2017-07-07 10:29:19 +02:00
Piotr Jastrzebski	70f4b23876	row_cache_test: Add test to reproduce issue 2544 This test checks that cache should use continuity information for single partition queries inside a range that has already been queried. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <2ebd03ff5366e554d520f86da8054e0b9eff4178.1499345295.git.piotr@scylladb.com>	2017-07-07 10:29:19 +02:00
Tomasz Grabiec	a5fdff2ac2	row_cache: Add partition_ prefix to current counters In preparation for adding per-row counters.	2017-07-04 13:55:06 +02:00
Asias He	2a794db61b	tests: Add test_selective_token_range_sharder	2017-07-04 18:46:19 +08:00
Nadav Har'El	d95f908586	Fix test to use non-wrapping range The test put a wrapping range into a non-wrapping range variable. This was harmless at the time this test was written, but newer code may not be as forgiving so better use a non-wrapping range as intended. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170704103128.29689-1-nyh@scylladb.com>	2017-07-04 13:36:29 +03:00
Raphael S. Carvalho	b350352e6c	compaction: keep only one variant of size_tiered_most_interesting_bucket two variants of size_tiered_most_interesting_bucket existed to avoid copy, but subsequent work will make lcs use vector for each level of sstables, so let's only keep one variant. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-04 03:34:51 -03:00
Avi Kivity	5883e85da3	Merge "improve maintainability of compaction strategies" from Raphael "compaction_strategy.cc keeps the full implementation of size tiered, major, and null strategies, and partial implementation of leveled and date tiered strategies. It's a mess. In the future, we will also need space for time window strategy. The file is hard to read and maintain. My goal here is to improve maintainability of the strategies by putting each of them into its own header. NOTE: No semantic change is introduced here." * 'improve_compaction_strategy_maintainability' of github.com:raphaelsc/scylla: compaction_strategy: move dtcs to its existing header compaction_strategy: move lcs implementation to its own header compaction_strategy: move stcs implementation to its own header compaction_strategy: move compaction_strategy_impl to its own header	2017-07-03 11:39:30 +03:00
Avi Kivity	6895f6e603	sstable_datafile_test: fix sstable_expired_data_ratio failure A comment states that we want the file to be old enough, but sets a timestamp of max(), which is in the future. This may have passed because the conversion from numeric_limits<time_t>::max() to db_clock::time_point is not well defined (their dynamic range is different), so truncation may have converted the large number to a low one. Message-Id: <20170702082903.20879-1-avi@scylladb.com>	2017-07-02 20:22:51 +02:00
Raphael S. Carvalho	69a9ad468c	compaction_strategy: move dtcs to its existing header Goal is to improve maintainability. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-30 03:50:09 -03:00
Raphael S. Carvalho	ab335c8085	tests: more testing for tombstone compaction options Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	ce4dc15a20	tests: basic tombstone compaction test for date tiered Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	c400bf97b9	tests: basic test of tombstone compaction with lcs Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	138fda468f	tests: basic tombstone compaction test for size tiered Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	ad24470972	tests: add test for estimation of droppable tombstone ratio Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	6fb26d9f0c	tests: add streaming_histogram_test Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:08:12 -03:00
Raphael S. Carvalho	c01c659594	tests: add test for sstable with bad tombstone histogram Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:08:12 -03:00
Raphael S. Carvalho	7b532867ce	tests: add sstable tombstone histogram test Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 01:17:28 -03:00
Raphael S. Carvalho	fb9bc609c6	streaming_histogram: do not limit it to be used by sstables streaming histogram will later be placed in /utils, so we want it to use std::unordered_map<> instead of disk_hash<>. That also requires implementing serialization/deserialization functions for it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-27 16:51:52 -03:00
Tomasz Grabiec	eb844a10e9	tests: streamed_mutation_test: Avoid using boost::size() on row ranges Fails to compile with libboost 1.55.	2017-06-27 15:27:13 +02:00
Tomasz Grabiec	e68925595c	tests: row_cache: Remove unused method	2017-06-27 14:10:37 +02:00
Avi Kivity	9b21a9bfb6	Merge "Implement partial cache" from Tomasz and Piotr "This series enables cache to keep partial partitions. Reads no longer have to read whole partition from sstables in order to cache the result. The 10MB threshold for partition size in cache is lifted. Known issues: - There is no partial eviction yet, whole partitions are still evicted, and partition snapshots held by active reads are not evictable at all - Information about range continuity is not recorded if that would require inserting a dummy entry, or if previous entry doesn't belong to the latest snapshot - Cache update after memtable flush happening concurrently with reads may inhibit that reads' ability to populate cache (new issue) - Cache update from flushed memtables has partition granularity, so may cause latency problems with large partition - Schema is still tracked per-partition, so after schema changes reads may induce high latency due to whole partition needing to be converted atomically - Range tombstones are repeated in the stream for every range between cache entries they cover (new issue) - Populating scans for both small and large partitions (perf_fast_forward) experienced a 40% reduction of throughput, CPU bound How was this tested: - test.py --mode release - row_cache_stress_test -c1 -m1G - perf_fast_forward, passes except for the test case checking range continuity population which would require inserting a dummy entry (mentioned above) - perf_simple_query (-c1 -m1G --duration 32): before: 90k [ops/s] stdev: 4k [ops/s] after: 94k [ops/s] stdev: 2k [ops/s]" * tag 'tgrabiec/introduce-partial-cache-v8' of github.com:cloudius-systems/seastar-dev: (130 commits) tests: row_cache: Add test_tombstone_merging_in_partial_partition test case tests: Introduce row_cache_stress_test utils: Add helpers for dealing with nonwrapping_range<int> tests: simple_schema: Allow passing the tombstone to make_range_tombstone() tests: simple_schema: Accept value by reference tests: 
simple_schema: Make add_row() accept optional timestamp tests: simple_schema: Make new_timestamp() public tests: simple_schema: Introduce make_ckeys() tests: simple_schema: Introduce get_value(const clustered_row&) helper tests: simple_schema: Fix comment tests: simple_schema: Add missing include row_cache: Introduce evict() tests: Add cache_streamed_mutation_test tests: mutation_assertions: Allow expecting fragments mutation_fragment: Implement equality check tests: row_cache: Add test for population of random partitions tests: row_cache: Add test for partition tombstone population tests: row_cache: Test reading randomly populated partition tests: row_cache: Add test_single_partition_update() tests: row_cache: Add test_scan_with_partial_partitions ...	2017-06-26 14:54:37 +03:00
Avi Kivity	555621b537	Disentangle memtables from sstables Remove sstable::write_components(memtable), replacing it with a helper. Fixes #2354 Message-Id: <20170624142639.16662-1-avi@scylladb.com>	2017-06-26 09:37:11 +02:00
Avi Kivity	236a8370e4	Remove use of std::random_shuffle() It was removed in C++17. Replace with std::shuffle(). Message-Id: <20170626063809.7563-1-avi@scylladb.com>	2017-06-26 09:36:38 +02:00
Tomasz Grabiec	b0bcf2be53	tests: row_cache: Add test_tombstone_merging_in_partial_partition test case	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	23c6f517cb	tests: Introduce row_cache_stress_test Runs readers, updates and eviction concurrently and verifies the following property of reads: - reads see all past writes - reads see no partial writes within a single partition	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	5c9f87fb27	tests: simple_schema: Allow passing the tombstone to make_range_tombstone()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	edf4a3494c	tests: simple_schema: Accept value by reference	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	5f70df472f	tests: simple_schema: Make add_row() accept optional timestamp	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	53867c4328	tests: simple_schema: Make new_timestamp() public	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	51b5814ec2	tests: simple_schema: Introduce make_ckeys()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	074c67fe4d	tests: simple_schema: Introduce get_value(const clustered_row&) helper	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	8ffc776e06	tests: simple_schema: Fix comment	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	ecacd2e84a	tests: simple_schema: Add missing include	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	c4e8effffa	tests: Add cache_streamed_mutation_test [tgrabiec: - extracted from a larger commit - removed coupling with how cache_streamed_mutation is created (the code went out of sync), used more stable make_reader(). it's simpler too. - replaced false/true literals with is_continuous/is_dummy where appropriate - dropped tests for cache::underlying (class is gone) - reused streamed_mutation_assertions, it has better error messages - fixed the tests to not create tombstones with missing timestamps - relaxed range tombstone assertions to only check information relevant for the query range - print cache on failure for improved debuggability ]	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	44fdee3f2e	tests: mutation_assertions: Allow expecting fragments	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	116bcb8b30	tests: row_cache: Add test for population of random partitions	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	930a1415fe	tests: row_cache: Add test for partition tombstone population	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	9bfece6f82	tests: row_cache: Test reading randomly populated partition	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	0358334579	tests: row_cache: Add test_single_partition_update() [tgrabiec: Extracted from "row_cache: Introduce cache_streamed_mutation"]	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	8bb76e2f12	tests: row_cache: Add test_scan_with_partial_partitions	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	5a0ae55f6d	Introduce schema_upgrader	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	7ae40d7045	tests: Add test for update_invalidating()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	fb62dfab02	tests: mvcc: Introduce test_schema_upgrade_preserves_continuity	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	164989a574	tests: mvcc: Add test for partition_entry::apply_to_incomplete()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	bbfa52822e	row_cache: Switch readers to use per-entry snapshots Currently readers are always using the latest snapshot. This is fine for respecting write atomicity if partitions are fully continuous in cache (now), but will break write atomicity once partial population is allowed. Consider the following case: flush write(ck=1), write(ck=2) -> snapshot_1 cache reader 1 reads and inserts ck=1 @snapshot_1 flush write(ck=1), write(ck=2) -> snapshot_2 cache reader 2 reads and inserts ck=2 @snapshot_2 Because cache update is not atomic, it can happen that reader 2 will complete while the partition hasn't been updated yet for snapshot_2. In such case, after read 2 the partition would contain ck=1 from snapshot_1 and ck=2 from snapshot_2. It will match neither of the snapshots, and this could violate write atomicity. To solve this problem we conceptually assign each partition key in the ring to its current snapshot which it reflects. The update process gradually converts entries in ring order to the new snapshot. Reads will not be using the latest snapshot, but rather the current snapshot for the position in the ring they are at. There is a race between the update process and populating reads. Since after the update all entries must reflect the new snapshot, reads using the old snapshot cannot be allowed to insert data which can no longer be reached by the update process. Before this patch this race was prevented by the use of a phased_barrier, where readers would keep phased_barrier::operation alive between starting a read of a partition and inserting it into cache. Cache update was waiting for all prior operations before starting the update. Any later read which was not waited for would use the latest snapshot for reads, so the update process didn't have to fix anything up for such reads. After this change, later reads cannot always use the latest snapshot, they have to use the snapshot corresponding to given entry. 
So it's not enough for update() to wait for prior reads in order to prevent stale populations. The (simple) solution implemented in this patch is to detect the conflict and abandon population of given sub-range. In general, reads are allowed to populate given range only if it belongs to a single snapshot. Note that the range here is not the whole query range. For population of continuity, it is the range starting after the previous key and ending after the key being inserted. When populating a partition entry, the range is a singular range containing only the partition key. Readers switch to new snapshots automatically as they move across the ring. It's possible that the insertion of the partition doesn't conflict, but continuity does. In such case the entry will be inserted but continuity will not be set.	2017-06-24 18:06:11 +02:00
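
The population rule described in commit bbfa52822e (a read may populate a range only if that range belongs to a single snapshot) can be illustrated with a toy model. This is a rough sketch under stated assumptions, not ScyllaDB code: `toy_ring`, `population_decision`, `decide`, the integer keys, and the single `frontier` marking how far the update process has progressed are all hypothetical names and simplifications introduced here.

```cpp
#include <cstdint>

// Toy model (assumption, not ScyllaDB's types): the update process has
// converted every key below `frontier` to `new_phase`; keys at or above
// `frontier` still reflect `old_phase`.
struct toy_ring {
    std::uint64_t old_phase;
    std::uint64_t new_phase;
    int frontier;

    std::uint64_t phase_of(int key) const {
        return key < frontier ? new_phase : old_phase;
    }
};

struct population_decision {
    bool insert_entry;    // the singular range {key} matches the reader's snapshot
    bool set_continuity;  // the whole range (prev, key] matches the reader's snapshot
};

// Decide what a reader using snapshot `reader_phase` may populate when it
// inserts `key` after the previous cached key `prev` (requires prev < key).
population_decision decide(const toy_ring& ring, std::uint64_t reader_phase,
                           int prev, int key) {
    bool entry_ok = ring.phase_of(key) == reader_phase;
    // phase_of() is monotonic in this model, so checking both ends of
    // (prev, key] is enough to know the range lies in a single snapshot.
    bool range_ok = entry_ok && ring.phase_of(prev + 1) == reader_phase;
    return {entry_ok, range_ok};
}
```

With `toy_ring{1, 2, 10}`, a reader on snapshot 1 inserting key 12 after key 8 gets the entry inserted but no continuity, because the range (8, 12] straddles the frontier. That corresponds to the case in the commit message where the insertion of the partition does not conflict but continuity does.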

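The quadratic behavior described in commits 8334086441 and 28ebe1807f can be made concrete with a small counting sketch. It is a simplification under assumptions that are not taken from the commits: every flushed sstable has size 1, the size-tiered variant merges only runs of exactly equal-sized sstables, and the threshold of 4 is an illustrative default. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Old behavior (as described in the commit message): every new sstable is
// compacted together with everything already in L0, so each flush rewrites
// the whole level.
std::uint64_t units_written_compact_all(unsigned flushes) {
    std::uint64_t written = 0;
    std::uint64_t l0_size = 0;
    for (unsigned i = 0; i < flushes; ++i) {
        l0_size += 1;                 // new unit-sized sstable arrives
        if (i > 0) {
            written += l0_size;       // rewrite all of L0 into one sstable
        }
    }
    return written;
}

// Simplified size-tiered behavior: merge only when `threshold` sstables of
// the same size sit at the end of the list, then re-check for cascades.
std::uint64_t units_written_size_tiered(unsigned flushes, unsigned threshold = 4) {
    if (threshold < 2) {
        threshold = 2;                // a merge needs at least two inputs
    }
    std::uint64_t written = 0;
    std::vector<std::uint64_t> sizes; // sizes of sstables currently in L0
    for (unsigned i = 0; i < flushes; ++i) {
        sizes.push_back(1);
        while (sizes.size() >= threshold) {
            std::size_t n = sizes.size();
            bool equal = true;
            for (std::size_t j = n - threshold; j < n; ++j) {
                if (sizes[j] != sizes[n - 1]) {
                    equal = false;
                }
            }
            if (!equal) {
                break;
            }
            std::uint64_t merged = sizes[n - 1] * threshold;
            sizes.resize(n - threshold);
            sizes.push_back(merged);
            written += merged;        // only the merged bucket is rewritten
        }
    }
    return written;
}
```

For 16 flushes, the compact-everything variant rewrites 135 units while the size-tiered variant rewrites 32, which is the gap the commit aims to close by running STCS on level 0 until promotion is worthwhile.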

1514 Commits