scylladb

Author	SHA1	Message	Date
Avi Kivity	f70ece9f88	tests: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Botond Dénes	eb357a385d	flat_mutation_reader: make timeout opt-out rather than opt-in Currently timeout is opt-in, that is, all methods that even have it default it to `db::no_timeout`. This means that ensuring timeout is used where it should be is completely up to the author and the reviewers of the code. As humans are notoriously prone to mistakes this has resulted in a very inconsistent usage of timeout, many clients of `flat_mutation_reader` passing the timeout only to some members and only on certain call sites. This is small wonder considering that some core operations like `operator()()` only recently received a timeout parameter and others like `peek()` didn't even have one until this patch. Both of these methods call `fill_buffer()` which potentially talks to the lower layers and is supposed to propagate the timeout. All this makes the `flat_mutation_reader`'s timeout effectively useless. To make order in this chaos make the timeout parameter a mandatory one on all `flat_mutation_reader` methods that need it. This ensures that humans now get a reminder from the compiler when they forget to pass the timeout. Clients can still opt-out from passing a timeout by passing `db::no_timeout` (the previous default value) but this will be now explicit and developers should think before typing it. There were surprisingly few core call sites to fix up. Where a timeout was available nearby I propagated it to be able to pass it to the reader, where I couldn't I passed `db::no_timeout`. Authors of the latter kind of code (view, streaming and repair are some of the notable examples) should maybe consider propagating down a timeout if needed. In the test code (the vast majority of the changes) I just used `db::no_timeout` everywhere. Tests: unit(release, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>	2018-09-20 11:31:24 +02:00
Tomasz Grabiec	9a0548397c	tests: row_cache: Add test for eviction from invalidated partitions Message-Id: <1531933216-28026-1-git-send-email-tgrabiec@scylladb.com>	2018-07-18 21:06:36 +03:00
Tomasz Grabiec	1de5177175	tests: row_cache: Fix use-after-scope on partition_range passed to readers The partition_range must outlive the reader. Message-Id: <1531301583-15476-1-git-send-email-tgrabiec@scylladb.com>	2018-07-11 12:39:30 +03:00
Tomasz Grabiec	a91974af7a	tests: row_cache: Reduce concurrency limit to avoid bad_alloc The test uses random mutations. We saw it failing with bad_alloc from time to time. Reduce concurrency to reduce memory footprint. Message-Id: <20180611090304.16681-1-tgrabiec@scylladb.com>	2018-06-11 10:06:56 +01:00
Tomasz Grabiec	9975135110	row_cache: Make sure reader makes forward progress after each fill_buffer() If reader's buffer is small enough, or preemption happens often enough, fill_buffer() may not make enough progress to advance _lower_bound. If also iterators are constantly invalidated across fill_buffer() calls, the reader will not be able to make progress. See row_cache_test.cc::test_reading_progress_with_small_buffer_and_invalidation() for an exemplary scenario. Also reproduced in debug-mode row_cache_test.cc::test_concurrent_reads_and_eviction Message-Id: <1528283957-16696-1-git-send-email-tgrabiec@scylladb.com>	2018-06-06 16:01:52 +03:00
Avi Kivity	aab6b0ee27	Merge "Introduce new in-memory representation for cells" from Paweł " This is the first part of the first step of switching Scylla. It covers converting cells to the new serialisation format. The actual structure of the cells doesn't differ much from the original one with a notable exception of the fact that large values are now fragmented and linearisation needs to be explicit. Counters and collections still partially rely on their old, custom serialisation code and their handling is not optimal (although not significantly worse than it used to be). The new in-memory representation allows objects to be of varying size and makes it possible to provide deserialisation context so that we don't need to keep in each instance of an IMR type all the information needed to interpret it. The structure of IMR types is described in C++ using some metaprogramming with the hopes of making it much easier to modify the serialisation format than it would be in case of open-coded serialisation functions. Moreover, IMR types can own memory thanks to a limited support for destructors and movers (the latter are not exactly the same thing as C++ move constructors hence a different name). This makes it (relatively) easy to ensure that there is an upper bound on the size of all allocations. For now the only thing that is converted to the IMR are atomic_cells and collections which means that the reduction in the memory footprint is not as big as it can be, but introducing the IMR is a big step on its own and also paves the way towards complete elimination of unbounded memory allocations. The first part of this patchset contains miscellaneous preparatory changes to various parts of the Scylla codebase. They are followed by introduction of the IMR infrastructure. Then structure of cells is defined and all helper functions are implemented. Next are several treewide patches that mostly deal with propagating type information to the cell-related operations. 
Finally, atomic_cell and collections are switched to use the new IMR-based cell implementation. The IMR is described in much more detail in imr/IMR.md added in "imr: add IMR documentation". Refs #2031. Refs #2409. perf_simple_query -c4, medians of 30 results: ./perf_base ./perf_imr diff read 308790.08 309775.35 0.3% write 402127.32 417729.18 3.9% The same with 1 byte values: ./perf_base1 ./perf_imr1 diff read 314107.26 314648.96 0.2% write 463801.40 433255.96 -6.6% The memory footprint is reduced, but that is partially due to removal of small buffer optimisation (whether it will be restored depends on the exact measurements of the performance impact). Generally, this series was not expected to make a huge difference as this would require converting whole rows to the IMR. Memory footprint: Before: mutation footprint: - in cache: 1264 - in memtable: 986 After: mutation footprint: - in cache: 1104 - in memtable: 866 Tests: unit (release, debug) " * tag 'imr-cells/v3' of https://github.com/pdziepak/scylla: (37 commits) tests/mutation: add test for changing column type atomic_cell: switch to new IMR-based cell reperesentation atomic_cell: explicitly state when atomic_cell is a collection member treewide: require type for creating collection_mutation_view treewide: require type for comparing cells atomic_cell: introduce fragmented buffer value interface treewide: require type to compute cell memory usage treewide: require type to copy atomic_cell treewide: require type info for copying atomic_cell_or_collection treewide: require type for creating atomic_cell atomic_cell: require column_definition for creating atomic_cell views tests: test imr representation of cells types: provide information for IMR data: introduce cell data: introduce type_info imr/utils: add imr object holder imr: introduce concepts imr: add helper for allocating objects imr: allow creating lsa migrators for IMR objects imr: introduce placeholders ...	2018-05-31 19:21:15 +03:00
Tomasz Grabiec	b5e42bc6a0	tests: row_cache: Do not hang when only one of the readers throws Message-Id: <20180531122729.3314-1-tgrabiec@scylladb.com>	2018-05-31 18:00:22 +03:00
Paweł Dziepak	27014a23d7	treewide: require type info for copying atomic_cell_or_collection	2018-05-31 15:51:11 +01:00
Tomasz Grabiec	f6e21accc7	tests: cache: Take into account that update() may defer The test incorrectly assumed that once update() is started the cache will return only versions from last_generation. This will not hold once we start to defer during partition merging.	2018-05-30 14:41:40 +02:00
Tomasz Grabiec	f0c1edd672	cache: Destroy partition versions incrementally Instead of destroying whole partition_versions at once, we will do that gently using mutation_cleaner to avoid reactor stalls. Large deletions could happen when large partition gets invalidated, upgraded to a new schema, or when it's abandoned by a detached snapshot. Refs #3289.	2018-05-30 14:41:40 +02:00
Avi Kivity	7161244130	Merge seastar upstream * seastar 70aecca...ac02df7 (5): > Merge "Prefix preprocessor definitions" from Jesse > cmake: Do not enable warnings transitively > posix: prevent unused variable warning > build: Adjust DPDK options to fix compilation > io_scheduler: adjust property names DEBUG, DEFAULT_ALLOCATOR, and HAVE_LZ4_COMPRESS_DEFAULT macro references prefixed with SEASTAR_. Some may need to become Scylla macros.	2018-04-29 11:03:21 +03:00
Tomasz Grabiec	180a877db3	tests: cache: Add tests for row-level eviction	2018-03-07 16:52:59 +01:00
Tomasz Grabiec	9fab5068c6	tests: cache: Check that data is evictable after schema change	2018-03-07 16:52:59 +01:00
Tomasz Grabiec	f0e0c79a70	tests: cache: Move definitions to the top	2018-03-07 16:52:59 +01:00
Tomasz Grabiec	da901b93fc	cache: Track number of rows and row invalidations	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	381bf02f55	cache: Evict with row granularity Instead of evicting whole partitions, evicts whole rows. As part of this, invalidation of partition entries was changed to not evict from snapshots right away, but unlink them and let them be evicted by the reclaimer.	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	f2bdac2874	tests: cache: Do not depend on particular granularity of eviction	2018-03-06 11:50:28 +01:00
Tomasz Grabiec	c306c1050e	tests: cache: Make sure readers touch rows in test_eviction() With row-level eviction just creating a reader won't necessarily update the LRU.	2018-03-06 11:50:28 +01:00
Tomasz Grabiec	fb2107416b	tests: cache: Invoke partial eviction in test_concurrent_reads_and_eviction In hope of catching more issues.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	bd1e730053	tests: cache: Add test for merging and reading randomly populated versions	2018-03-06 11:32:09 +01:00
Tomasz Grabiec	1b959cb6e9	tests: cache: Take parameters by const&	2018-03-06 11:32:09 +01:00
Tomasz Grabiec	d9f0c1f097	tests: cache: Fix invalidate() not being waited for Probably responsible for occasional failures of subsequent assertion. Didn't manage to reproduce. Message-Id: <1520330967-584-1-git-send-email-tgrabiec@scylladb.com>	2018-03-06 12:14:04 +02:00
Tomasz Grabiec	9c3e56fb16	tests: row_cache: Improve test for snapshot consistency on eviction Reproduces https://github.com/scylladb/scylla/issues/3215. Message-Id: <1518710592-21925-1-git-send-email-tgrabiec@scylladb.com>	2018-02-15 16:48:23 +00:00
Tomasz Grabiec	b3415880b2	tests: row_cache: Add test for exception safety of updates from memtable	2018-02-15 10:13:02 +01:00
Avi Kivity	404172652e	Merge "Use xxHash for digest instead of MD5" from Duarte "This series changes digest calculation to use a faster algorithm (xxHash) and to also cache calculated cell hashes that can be kept in memory to speed up subsequent digest requests. The MD5 hash function has proved to be slow for large cell values: size = 256; elapsed = 4us size = 512; elapsed = 8us size = 1024; elapsed = 14us size = 2048; elapsed = 21us size = 4096; elapsed = 33us size = 8192; elapsed = 51us size = 16384; elapsed = 86us size = 32768; elapsed = 150us size = 65536; elapsed = 278us size = 131072; elapsed = 531us size = 262144; elapsed = 1032us size = 524288; elapsed = 2026us size = 1048576; elapsed = 4004us size = 2097152; elapsed = 7943us size = 4194304; elapsed = 15800us size = 8388608; elapsed = 31731us size = 16777216; elapsed = 64681us size = 33554432; elapsed = 130752us size = 67108864; elapsed = 263154us The xxHash is a non-cryptographic, 64bit (there's work in progress on the 128 version) hash that can be used to replace MD5. It performs much better: size = 256; elapsed = 2us size = 512; elapsed = 1us size = 1024; elapsed = 1us size = 2048; elapsed = 2us size = 4096; elapsed = 2us size = 8192; elapsed = 3us size = 16384; elapsed = 5us size = 32768; elapsed = 8us size = 65536; elapsed = 14us size = 131072; elapsed = 28us size = 262144; elapsed = 59us size = 524288; elapsed = 116us size = 1048576; elapsed = 226us size = 2097152; elapsed = 456us size = 4194304; elapsed = 935us size = 8388608; elapsed = 1848us size = 16777216; elapsed = 4723us size = 33554432; elapsed = 10507us size = 67108864; elapsed = 21622us Performance was tested using a 3 node cluster with 1 cpu and 8GB, and with the following cassandra-stress loaders. Measurements are for the read workload. 
sudo taskset -c 4-15 ./cassandra-stress write cl=ALL n=5000000 -schema 'replication(factor=3)' -col 'size=FIXED(1024) n=FIXED(4)' -mode native cql3 -rate threads=100 sudo taskset -c 4-15 ./cassandra-stress mixed cl=ALL 'ratio(read=1)' n=10000000 -pop 'dist=gauss(1..5000000,5000000,500000)' -col 'size=FIXED(1024) n=FIXED(4)' -mode native cql3 -rate threads=100 xxhash + caching: Results: op rate : 32699 [READ:32699] partition rate : 32699 [READ:32699] row rate : 32699 [READ:32699] latency mean : 3.0 [READ:3.0] latency median : 3.0 [READ:3.0] latency 95th percentile : 3.9 [READ:3.9] latency 99th percentile : 4.5 [READ:4.5] latency 99.9th percentile : 6.6 [READ:6.6] latency max : 24.0 [READ:24.0] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:05:05 END md5: Results: op rate : 25241 [READ:25241] partition rate : 25241 [READ:25241] row rate : 25241 [READ:25241] latency mean : 3.9 [READ:3.9] latency median : 3.9 [READ:3.9] latency 95th percentile : 5.1 [READ:5.1] latency 99th percentile : 5.8 [READ:5.8] latency 99.9th percentile : 8.0 [READ:8.0] latency max : 24.8 [READ:24.8] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:06:36 END This translates into a 21% improvement for this workload. 
Bigger cell values were also tested: sudo taskset -c 4-15 ./cassandra-stress write cl=ALL n=1000000 -schema 'replication(factor=3)' -col 'size=FIXED(4096) n=FIXED(4)' -mode native cql3 -rate threads=100 sudo taskset -c 4-15 ./cassandra-stress mixed cl=ALL 'ratio(read=1)' n=10000000 -pop 'dist=gauss(1..1000000,500000,100000)' -col 'size=FIXED(4096) n=FIXED(4)' -mode native cql3 -rate threads=100 xxhash + caching: Results: op rate : 19964 [READ:19964] partition rate : 19964 [READ:19964] row rate : 19964 [READ:19964] latency mean : 4.9 [READ:4.9] latency median : 4.6 [READ:4.6] latency 95th percentile : 7.2 [READ:7.2] latency 99th percentile : 11.5 [READ:11.5] latency 99.9th percentile : 13.6 [READ:13.6] latency max : 29.2 [READ:29.2] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:08:20 END md5: Results: op rate : 12773 [READ:12773] partition rate : 12773 [READ:12773] row rate : 12773 [READ:12773] latency mean : 7.7 [READ:7.7] latency median : 7.3 [READ:7.3] latency 95th percentile : 10.2 [READ:10.2] latency 99th percentile : 16.8 [READ:16.8] latency 99.9th percentile : 19.2 [READ:19.2] latency max : 71.5 [READ:71.5] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:13:02 END This translates into a 37% improvement for this workload. Fixes #2884 Tests: unit-tests (release), dtests (smp=2) Note: dtests are kinda broken in master (> 30 failures), so take the tests tag with a grain of himalayan salt." 
* 'xxhash/v5' of https://github.com/duarten/scylla: (29 commits) tests/row_cache_test: Test hash caching tests/memtable_test: Test hash caching tests/mutation_test: Use xxHash instead of MD5 for some tests tests/mutation_test: Test xx_hasher alongside md5_hasher schema: Remove unneeded include service/storage_proxy: Enable hash caching service/storage_service: Add and use xxhash feature message/messaging_service: Specify algorithm when requesting digest storage_proxy: Extract decision about digest algorithm to use cache_flat_mutation_reader: Pre-calculate cell hash partition_snapshot_reader: Pre-calculate cell hash query::partition_slice: Add option to specify when digest is requested row: Use cached hash for hash calculation mutation_partition: Replace hash_row_slice with appending_hash mutation_partition: Allow caching cell hashes mutation_partition: Force vector_storage internal storage size test.py: Increase memory for row_cache_stress_test atomic_cell_hash: Add specialization for atomic_cell_or_collection query-result: Use digester instead of md5_hasher range_tombstone: Replace feed_hash() member function with appending_hash ...	2018-02-08 18:24:58 +02:00
Tomasz Grabiec	c1b82e60e3	tests: row_cache: Add test for memtable readers surviving flush and eviction Reproduces https://github.com/scylladb/scylla/issues/3186	2018-02-06 14:24:19 +01:00
Duarte Nunes	992de302a2	tests/row_cache_test: Test hash caching Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Piotr Jastrzebski	7729bc5e7b	Remove unused mutation_reader_assertions Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	39ec13133f	row_cache: rename make_flat_reader to make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	0d76091a28	test_mvcc: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	425c1624cd	test_cache_population_and_clear_race: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	dc97acb778	test_cache_population_and_update_race: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	1bead9747a	test_continuity_flag_and_invalidate_race: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	4266b9759e	test_update_failure: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	d5366026b1	row_cache_test: use flat reader in verify_has Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	56b0157831	row_cache_test: use flat reader in has_key Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	06bca9f4d5	test_sliced_read_row_presence: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	6c3d9cdb9f	test_lru: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	a979869a15	test_update_invalidating: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	781d9a324d	test_scan_with_partial_partitions: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	f199aab1ad	test_cache_populates_partition_tombstone: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	9755f7677c	test_tombstone_merging_in_partial_partition: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	2e1b12b6ce	consume_all,populate_range: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	d08f4a40b2	test_readers_get_all_data_after_eviction: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	f99992261f	test_tombstones_are_not_missed_when_range_is_invalidated: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	50fb2a57b6	test_exception_safety_of_reads: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	f0af5a1321	test_exception_safety_of_transitioning_from_underlying_read_to_read_from_cache: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	98b97be19a	test_exception_safety_of_partition_scan: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	5010c082f6	test_concurrent_population_before_latest_version_iterator: use flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:44 +01:00
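
The opt-out timeout convention described in commit eb357a385d can be illustrated with a minimal C++17 sketch. Everything in the `sketch` namespace below (`reader`, `next()`, `timed_out`, and the `no_timeout` sentinel) is a hypothetical stand-in invented for illustration, not Scylla's `flat_mutation_reader` or `db::no_timeout`. It only shows the idea from the commit message: when the timeout parameter has no default value, a caller who forgets it gets a compile error, and skipping the timeout requires naming the sentinel explicitly.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not Scylla's real types.
namespace sketch {

using clock_type = std::chrono::steady_clock;
using time_point = clock_type::time_point;

// Explicit "no timeout" sentinel, playing the role the commit message
// assigns to db::no_timeout.
inline constexpr time_point no_timeout = time_point::max();

struct timed_out : std::runtime_error {
    timed_out() : std::runtime_error("read timed out") {}
};

class reader {
    std::vector<int> _items;
    std::size_t _pos = 0;
public:
    explicit reader(std::vector<int> items) : _items(std::move(items)) {}

    // The timeout has no default argument, so `r.next()` does not compile.
    // Callers must pass a deadline or spell out `no_timeout`.
    std::optional<int> next(time_point timeout) {
        if (timeout != no_timeout && clock_type::now() >= timeout) {
            throw timed_out();
        }
        if (_pos == _items.size()) {
            return std::nullopt; // end of stream
        }
        return _items[_pos++];
    }
};

} // namespace sketch
```

A call such as `r.next(sketch::no_timeout)` still opts out of the deadline, but the choice is now visible at the call site and in review, which is the trade-off the commit message argues for.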


175 Commits