scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 04:56:58 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	5857a4756d	Merge "Some fixes for performance regressions in perf_fast_forward" from Paweł These patches contain some minor fixes for performance regression reported by perf_fast_forward after partial cache was merged. The solution is still far from perfect, there is one case that still has 30% degradation, but there is some improvement so there is no reason to hold these changes back. Refs #2582. Some numbers: before - before cache changes were merged (`555621b537`) cache - at the commit that introduced the partial cache (`9b21a9bfb6`) after - recent master + this series (based on `e988121dbb`) Differences are shown relative to "before". Testing effectiveness of caching of large partition, single-key slicing reads: Large partitions, range [0, 500000], populating cache before cache after 1636840 1013688 1234606 -38% -25% Large partitions, range [0, 500000], reading from cache before cache after 2012615 3076812 3035423 +53% +51% Testing scanning small partitions with skips. reading small partitions (skip 0) before cache after 227060 165261 200639 -27% -11% skipping small partitions (skip 1) before cache after 29813 27312 38210 -8% +28% Testing slicing small partitions: slicing small partitions (offset 0, read 4096) before cache after 195282 149695 180497 -23% -8% * https://github.com/pdziepak/scylla.git perf_fast_forward-regression/v3: sstables: make sure that fill_buffer() actually fills buffer mutation_merger: improve handling of non-deferring fill_buffer()s partition_snapshot_row_cursor: avoid apply() in single-version cases sstables: introduce decorated_key_view ring_position_comparator: accept sstables::decorated_key_view sstable: keep a pre-computed token in summary_entry sstables: cache token in index entries index_reader: advance_and_check_if_present() use index_comparator ring_position_comparator: drop unused overloads cache_streamed_mutation: avoid moving clustering_row streamed_mutation: introduce consume_mutation_fragments_until() cache_streamed_mutation: use consumer based read_context reader rows_entry: make position() inlineable mutation_fragment: make destructor always_inline keys: introduce compound_wrapper::from_exploded_view() sstables: avoid copying key components compound_compat: explode: reserve some elements in a vector cache: short-circut static row logic if there are no static columns cache: use equality comparators instead of tri_compare sstables: avoid indirect calls to abstract_type::is_multi_cell() (cherry picked from commit `e9fc0b0491`)	2017-07-27 13:58:23 +02:00
Duarte Nunes	804793e291	tests/schema_change_test: Add test case for add+drop notification Reproduces #2616 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170725170622.4380-2-duarte@scylladb.com> (cherry picked from commit `472f32fb06`)	2017-07-26 16:26:59 +02:00
Paweł Dziepak	345a91d55d	tests/row_cache: test queries with no clustering ranges Reproducer for #2604. Message-Id: <20170725131220.17467-3-pdziepak@scylladb.com> (cherry picked from commit `79a1ad7a37`)	2017-07-25 15:37:32 +02:00
Paweł Dziepak	fda8b35cda	tests: do not overload the meaning of empty clustering range Empty clustering key range is perfectly valid and signifies that the reader is not interested in anything but the static row. Let's not make it mean anything else. Message-Id: <20170725131220.17467-2-pdziepak@scylladb.com> (cherry picked from commit `1ea507d6ae`)	2017-07-25 15:37:29 +02:00
Avi Kivity	e1a3052e76	tests: fix sstable_datafile_test build with boost 1.55 Boost 1.55 accidentally removed support for "range for" on recursive_directory_iterator (previous and latter versions do support it). Use old-style iteration instead. Message-Id: <20170724080128.8824-1-avi@scylladb.com> (cherry picked from commit `c21bb5ae05`)	2017-07-24 11:20:53 +03:00
Duarte Nunes	83cc640c6a	Merge 'Revert back to 1.7 schema layout in memory' from Tomasz "Fixes schema layout incompatibility in a mixed 1.7 and 2.0 cluster (#2555) by reverting back to using the old layout in memory and thus also in across-node requests. We still use the new v3 layout in schema tables (needed by drivers and external tools). Translations happen when converting to/from schema mutations." * tag 'tgrabiec/use-v2-schema-layout-in-memory-v2' of github.com:scylladb/seastar-dev: schema: Revert back to the 1.7 layout of static compact tables in memory schema: Use v3 column layout when converting to/from schema mutations schema: Encapsulate column layout translations in the v3_columns class (cherry picked from commit `1daf1bc4bb`)	2017-07-19 19:49:43 +03:00
Avi Kivity	a7dfdc0155	tests: move tmpdir to /tmp Reduces view_schema_test runtime to 5 seconds, from 53 seconds on an NVMe disk with write-back cache, and forever on a spinning disk. Message-Id: <20170716081653.10018-1-avi@scylladb.com> (cherry picked from commit `d9c64ef737`)	2017-07-17 08:47:17 +03:00
Avi Kivity	70be29173a	tests: copy the sstable with an unknown component to the data directory We will be creating links to those sstable's files, and those don't work if the data directory and the test sstable are on different devices. Copying the files to the same directory fixes the problem. Message-Id: <20170716090405.14307-1-avi@scylladb.com> (cherry picked from commit `9116dd91cb`)	2017-07-17 08:47:08 +03:00
Avi Kivity	00b9640b2c	Merge "Preserve table schema digest on schema tables migration" from Tomasz "Currently new nodes calculate digests based on v3 schema mutations, which are very different from v2 mutations. As a result they will use schemas with different table_schema_version that the old nodes. The old nodes will not recognize the version and will try to request its definition. That will fail, because old nodes don't understand v3 schema mutations. To fix this problem, let's preserve the digests during migration, so that they're the same on new and old nodes. This will allow requests to proceed as usual. This does not solve the problem of schema being changed during the rolling upgrade. This is not allowed, as it would bring the same problem back. Fixes #2549." * tag 'tgrabiec/use-consistent-schema-table-digests-v2' of github.com:cloudius-systems/seastar-dev: tests: Add test for concurrent column addition legacy_schema_migrator: Set digest to one compatible with the old nodes schema_tables: Persist table_schema_version schema_tables: Introduce system_schema.scylla_tables schema_tables: Simplify read_table_mutations() schema_tables: Resurrect v2 read_table_mutations() system_keyspace: Forward-declare legacy schemas legacy_schema_migrator: Take storage_proxy as dependency (cherry picked from commit `a397889c81`)	2017-07-11 17:23:21 +03:00
Tomasz Grabiec	de7cb7bfa4	tests: commitlog: Check there are no segments left on disk after clean shutdown Reproduces #2550. Message-Id: <1499358825-17855-2-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `72e01b7fe8`)	2017-07-09 19:25:44 +03:00
Avi Kivity	c9ed522fa8	Merge "Adjust row cache metrics for row granularity" from Tomasz * tag 'tgrabiec/row-cache-metrics-v2' of github.com:cloudius-systems/seastar-dev: row_cache: Switch _stats.hits/misses to row granularity row_cache: Rename num_entries() to partitions() for clarity row_cache: Track mispopulations also at row level row_cache: Track row insertions row_cache: Track row hits and misses row_cache: Make mispopulation counter also apply for continuity information row_cache: Add partition_ prefix to current counters misc_services: Switch to using reads_with[_no]_misses counters row_cache: Add metrics for operations on underlying reader row_cache: Add reader-related metrics row_cache: Remove dead code (cherry picked from commit `b1a0e37fcb`)	2017-07-04 15:21:00 +03:00
Avi Kivity	7893a3aad2	Merge "Use selective_token_range_sharder in repair" from Asias "This series introduces selective_token_range_sharder and uses it in repair to generate dht::token_range belongs to a specific shard." * tag 'asias/repair-selective_token_range_sharder-v3' of github.com:cloudius-systems/seastar-dev: repair: Use selective_token_range_sharder tests: Add test_selective_token_range_sharder dht: Add selective_token_range_sharder (cherry picked from commit `66e56511d6`)	2017-07-04 14:18:08 +03:00
Nadav Har'El	e467eef58d	Fix test to use non-wrapping range The test put a wrapping range into a non-wrapping range variable. This was harmless at the time this test was written, but newer code may not be as forgiving so better use a non-wrapping range as intended. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170704103128.29689-1-nyh@scylladb.com> (cherry picked from commit `d95f908586`)	2017-07-04 14:18:01 +03:00
Avi Kivity	3de701dbe1	Merge "Fix compilation issues in older environments" from Tomasz * 'tgrabiec/fix-compilation-issues' of github.com:cloudius-systems/seastar-dev: tests: streamed_mutation_test: Avoid using boost::size() on row ranges tests: row_cache: Remove unused method (cherry picked from commit `ff7be8241f`)	2017-06-27 16:31:42 +03:00
Avi Kivity	9b21a9bfb6	Merge "Implement partial cache" from Tomasz and Piotr "This series enables cache to keep partial partitions. Reads no longer have to read whole partition from sstables in order to cache the result. The 10MB threshold for partition size in cache is lifted. Known issues: - There is no partial eviction yet, whole partitions are still evicted, and partition snapshots held by active reads are not evictable at all - Information about range continuity is not recorded if that would require inserting a dummy entry, or if previous entry doesn't belong to the latest snapshot - Cache update after memtable flush happening concurrently with reads may inhibit that reads' ability to populate cache (new issue) - Cache update from flushed memtables has partition granularity, so may cause latency problems with large partition - Schema is still tracked per-partition, so after schema changes reads may induce high latency due to whole partition needing to be converted atomically - Range tombstones are repeated in the stream for every range between cache entries they cover (new issue) - Populating scans for both small and large partitions (perf_fast_forward) experienced a 40% reduction of throughput, CPU bound How was this tested: - test.py --mode release - row_cache_stress_test -c1 -m1G - perf_fast_forward, passes except for the test case checking range continuity population which would require inserting a dummy entry (mentioned above) - perf_simple_query (-c1 -m1G --duration 32): before: 90k [ops/s] stdev: 4k [ops/s] after: 94k [ops/s] stdev: 2k [ops/s]" * tag 'tgrabiec/introduce-partial-cache-v8' of github.com:cloudius-systems/seastar-dev: (130 commits) tests: row_cache: Add test_tombstone_merging_in_partial_partition test case tests: Introduce row_cache_stress_test utils: Add helpers for dealing with nonwrapping_range<int> tests: simple_schema: Allow passing the tombstone to make_range_tombstone() tests: simple_schema: Accept value by reference tests: simple_schema: Make add_row() accept optional timestamp tests: simple_schema: Make new_timestamp() public tests: simple_schema: Introduce make_ckeys() tests: simple_schema: Introduce get_value(const clustered_row&) helper tests: simple_schema: Fix comment tests: simple_schema: Add missing include row_cache: Introduce evict() tests: Add cache_streamed_mutation_test tests: mutation_assertions: Allow expecting fragments mutation_fragment: Implement equality check tests: row_cache: Add test for population of random partitions tests: row_cache: Add test for partition tombstone population tests: row_cache: Test reading randomly populated partition tests: row_cache: Add test_single_partition_update() tests: row_cache: Add test_scan_with_partial_partitions ...	2017-06-26 14:54:37 +03:00
Avi Kivity	555621b537	Disentable memtables from sstables Remove sstable::write_components(memtable), replacing it with a helper. Fixes #2354 Message-Id: <20170624142639.16662-1-avi@scylladb.com>	2017-06-26 09:37:11 +02:00
Avi Kivity	236a8370e4	Remove use of std::random_shuffle() It was removed in C++17. Replace with std::shuffle(). Message-Id: <20170626063809.7563-1-avi@scylladb.com>	2017-06-26 09:36:38 +02:00
Tomasz Grabiec	b0bcf2be53	tests: row_cache: Add test_tombstone_merging_in_partial_partition test case	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	23c6f517cb	tests: Introduce row_cache_stress_test Runs readers, updates and eviction concurrently and verifies the following property of reads: - reads see all past writes - reads see no partial writes within a single partition	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	5c9f87fb27	tests: simple_schema: Allow passing the tombstone to make_range_tombstone()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	edf4a3494c	tests: simple_schema: Accept value by reference	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	5f70df472f	tests: simple_schema: Make add_row() accept optional timestamp	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	53867c4328	tests: simple_schema: Make new_timestamp() public	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	51b5814ec2	tests: simple_schema: Introduce make_ckeys()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	074c67fe4d	tests: simple_schema: Introduce get_value(const clustered_row&) helper	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	8ffc776e06	tests: simple_schema: Fix comment	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	ecacd2e84a	tests: simple_schema: Add missing include	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	c4e8effffa	tests: Add cache_streamed_mutation_test [tgrabiec: - extracted from a larger commit - removed coupling with how cache_streamed_mutation is created (the code went out of sync), used more stable make_reader(). it's simpler too. - replaced false/true literals with is_continuous/is_dummy where appropraite - dropped tests for cache::underlying (class is gone) - reused streamed_mutation_assertions, it has better error messages - fixed the tests to not create tombstones with missing timestamps - relaxed range tombstone assertions to only check information relevant for the query range - print cache on failure for improved debuggability ]	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	44fdee3f2e	tests: mutation_assertions: Allow expecting fragments	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	116bcb8b30	tests: row_cache: Add test for population of random partitions	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	930a1415fe	tests: row_cache: Add test for partition tombstone population	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	9bfece6f82	tests: row_cache: Test reading randomly populated partition	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	0358334579	tests: row_cache: Add test_single_partition_update() [tgrabiec: Extracted from "row_cache: Introduce cache_streamed_mutation"]	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	8bb76e2f12	tests: row_cache: Add test_scan_with_partial_partitions	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	5a0ae55f6d	Introduce schema_upgrader	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	7ae40d7045	tests: Add test for update_invalidating()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	fb62dfab02	tests: mvcc: Introduce test_schema_upgrade_preserves_continuity	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	164989a574	tests: mvcc: Add test for partition_entry::apply_to_incomplete()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	bbfa52822e	row_cache: Switch readers to use per-entry snapshots Currently readers are always using the latest snapshot. This is fine for respecting write atomicity if partitions are fully continuous in cache (now), but will break write atomicity once partial population is allowed. Consider the following case: flush write(ck=1), write(ck=2) -> snapshot_1 cache reader 1 reads and inserts ck=1 @snapshot_1 flush write(ck=1), write(ck=2) -> snapshot_2 cache reader 2 reads and inserts ck=2 @snapshot_2 Because cache update is not atomic, it can happen that reader 2 will complete while the partition hasn't been updated yet for snapshot_2. In such case, after read 2 the partition would contain ck=1 from snapshot_1 and ck=2 from snapshot_2. It will match neither of the snapshots, and this could violate write atomicity. To solve this problem we conceptually assign each partition key in the ring to its current snapshot which it reflects. The update process gradually converts entries in ring order to the new snapshot. Reads will not be using the latest snapshot, but rather the current snapshot for the position in the ring they are at. There is a race between the update process and populating reads. Since after the update all entries must reflect the new snapshot, reads using the old snapshot cannot be allowed to insert data which can no longer be reached by the update process. Before this patch this race was prevented by the use of a phased_barrier, where readers would keep phased_barrier::operation alive between starting a read of a partition and inserting it into cache. Cache update was waiting for all prior operations before starting the update. Any later read which was not waited for would use the latest snapshot for reads, so the update process didn't have to fix anything up for such reads. After this change, later reads cannot always use the latest snapshot, they have to use the snapshot corresponding to given entry. So it's not enough for update() to wait for prior reads in order to prevent stale populations. The (simple) solution implemented in this patch is to detect the conflict and abandon population of given sub-range. In general, reads are allowed to populate given range only if it belongs to a single snapshot. Note that the range here is not the whole query range. For population of continuity, it is the range starting after the previous key and ending after the key being inserted. When populating a partition entry, the range is a singular range containing only the partition key. Readers switch to new snapshots automatically as they move across the ring. It's possible that the insertion of the partition doesn't conflict, but continuity does. In such case the entry will be inserted but continuity will not be set.	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	8ba6366610	row_cache: Switch to using snapshot_source Currently every time cache needs to create reader for missing data it obtains a reader which is most up to date. That reader includes writes from later populate phases, for which update() was not yet called. This will be problematic once we allow partitions to be partially populated, because different parts of the partition could be partially populated using readers using different sets of writes, and break write atomicity. The solution will be to always populate given partition using the same set of writes, using reader created from the current snapshot. The snapshot changes only on update(), with update() gradually converting each partition to the new snapshot.	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	e23c7e2f34	row_cache: Rework invalidate() implementation 1) Reduce duplication by delegating to more general overloads 2) Improve documentation to not mention effects in terms of population (detail) but rather write visibiliy 3) Rename clear() to invalidate() and merge with the range variant, it has the same semantics	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	bd023b6161	tests: Introduce memtable_snapshot_source Snapshottable in-memory mutation source for use in row_cache tests.	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	7f8620d4a7	tests: mutation_source: Relax expectations about range tombstones In preparation for having partial cache which trims range tombstones to the lower bound of the query.	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	3a9212e0f2	tests: mutation_assertions: Add ability to limit verification to given clustering_row_ranges Currently mutation sources are free to return range tombstones covering range which is larger than the query range. The cache mutation source will soon become more eager about trimming such tombstones. To cover up for such differences, allow telling the restrictions to only care about differences relevant for given clustering ranges.	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	f925b26241	tests: mutation_reader_assertions: Simplify	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	9380dd1ee3	mutation_source: make sure we never ignore fast forwarding mutation source sometimes ignore fast forwarding parameter so this change adds assertion to check that this parameter can be safely ignored. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	ac03331490	row_cache_test: improve test_sliced_read_row_presence Remove unused parameter and add checks to make sure all expected rows have been received. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	db053ef902	tests: Add test for continuity merging rules	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	2edf08d36a	tests: random_mutation_generator: Generate random continuity	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	8873a443db	tests: mutation: Generate mutations with continuity	2017-06-24 18:06:11 +02:00

1 2 3 4 5 ...

1503 Commits