scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 19:46:48 +00:00

Author	SHA1	Message	Date
Paweł Dziepak	210a390892	tests: add missing sstable for partition skipping test Commit `7dcd70124a` "tests/sstables: add test for fast forwarding reader" added a test for skipping parts of sstable. Unfortunately, it did not include the sstables it was trying to read. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 23:23:49 +01:00
Glauber Costa	1578d7363a	commitlog: rework blocking logic The current incarnation of commitlog establishes a maximum amount of writes that can be in-flight, and blocks new requests after that limit is reached. That is obviously something we must do, but the current approach to it is problematic for two main reasons: 1) It forces the requests that trigger a write to wait on the current write to finish. That is excessive; ideally we would wait for one particular write to finish, not necessarily the current one. That is made worse by the fact that when a write is followed by a flush (happens when we move to a new segment), then we must wait for all writes in that segment to finish. 2) It casts concurrency in terms of writes instead of memory, which makes the aforementioned problem a lot worse: if we have very big buffers in flight and we must wait for them to finish, that can take a long time, often in the order of seconds, causing timeouts. The approach taken by this patch is to replace the _write_semaphore with a request_controller. This data structure will account for the amount of memory used by the buffers and set a limit on it. New allocations will be held until we go below that limit, and will be released as soon as this happens. This guarantees that the latencies introduced by this mechanism are spread out a lot better among requests and will keep higher percentile latencies in check. To test this, I have run a workload that times out frequently. That workload uses 10 threads to write 100 partitions (to isolate from the effects of the memtable introduced latencies) in a loop and each partition is 2MB in size. After 10 minutes running this load, we are left with the following percentiles: latency mean : 51.9 [WRITE:51.9] latency median : 9.8 [WRITE:9.8] latency 95th percentile : 125.6 [WRITE:125.6] latency 99th percentile : 1184.0 [WRITE:1184.0] latency 99.9th percentile : 1991.2 [WRITE:1991.2] latency max : 2338.2 [WRITE:2338.2] After this patch: latency mean : 54.9 [WRITE:54.9] latency median : 43.5 [WRITE:43.5] latency 95th percentile : 126.9 [WRITE:126.9] latency 99th percentile : 253.9 [WRITE:253.9] latency 99.9th percentile : 364.6 [WRITE:364.6] latency max : 471.4 [WRITE:471.4] Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:56:36 -04:00
Glauber Costa	aec724bbda	commitlog: factor out code for checking mutation size In a subsequent patch, I'll use this code in a different place. To prepare for that, we move it out as a method. It also fits a lot better inside the segment manager, so move it there. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Glauber Costa	a50996f376	commitlog: calculate segment-independent size of mutations The goal is to calculate a size that is less than or equal to the segment-dependent size. This was originally written by Tomasz, and featured in his submission "commitlog: Handle overload more gracefully" Extracted here so it sits clearly in a different patch. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Glauber Costa	0b7c9fa17f	commitlog: remove _needed_size It is mostly an optimization, and while it makes sense in this context, it won't soon as we'll stop waiting for the current cycle specifically to finish. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Glauber Costa	6214bdeb66	commitlog: move segment_manager constructor outside the class definition We'll do that so we can, in following patches, use static members from the segment. Those are not defined at this point. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Glauber Costa	299877f432	commitlog: add a counter for pending allocations We track the amount of pending allocations but we don't really export it. It will be crucial when we stop tracking pending writes. This patch exports it through a method instead of the totals structure, so we can easily change it. Current code probing pending_allocations (the api code) is also converted to use the public method instead of the totals struct. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Avi Kivity	07c995ab3d	Merge "Fast forward mutation readers" from Paweł "This patchset enables mutation readers to be fast forwarded to a different partition range. The main reason for introducing such a feature is range queries served from cache. If the cache is partially populated in the requested range the reader will end up with multiple subranges that have to be read from the sstables. Originally, each of these subranges would require a new reader to be created, but with fast forwarding we can have just one sstable reader. This is better since there is a chance that buffers kept by the reader may be still useful after fast forwarding it. In this series there are also patches that clean up cache readers in order to make integration with fast forwarding easier. Namely, continuity flag is changed to store information about range before the entry which significantly simplifies the logic. Fixes #1299." * 'pdziepak/fast-forward-mutation-readers/v5' of github.com:cloudius-systems/seastar-dev: (24 commits) sstables: keep separate stream history for single and range reads sstables: drop sstable::{lower, upper}_bound() row_cache: rework cache to use fast forwarding reader row_cache: put cache entry flags in a struct row_cache: add do_find_or_create_entry() to reduce code duplication mutation_reader: forward fast_forward_to() calls tests/row_cache: add fast_forward_to() to throttled reader tests/row_cache: count mutations read from _underlying memtable: add support for fast_forward_to() drop key readers tests/mutation_reader: test fast forwarding combined reader database: enable fast forwarding of range_sstable_reader combined_mutation_reader: implement fast_forward_to() mutation_reader: make combined_reader public tests/sstables: add test for fast forwarding reader tests: add more helpers to mutation reader assertions sstables: enable fast forwarding for range readers mutation_reader: introduce fast_forward_to() sstables: implement mutation_reader::impl::fast_forward_to() sstables: introduce index_reader ...	2016-10-19 18:10:44 +03:00
Paweł Dziepak	ab0eeae82d	sstables: keep separate stream history for single and range reads Single partition and partition range reads are expected to behave considerably different so it is worth to have them use separate file stream history. This also makes reads use different history for each sstable which is also a good thing. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	20bfa1fa52	sstables: drop sstable::{lower, upper}_bound() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	5ff699e09f	row_cache: rework cache to use fast forwarding reader This uncomfortably large patch overhauls cache range reader so that it can take advantage of fast forwarding mutation readers. A significant change in the cache itself is that the continuity flag now is used to determine whether cache is contiguous between the previous entry and the current one. This allows for a significant simplification of the cache code and easier integration with reader fast forwarding. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	18acb0c0e6	row_cache: put cache entry flags in a struct Flags are easier to manage if they are in a single structure. In particular, default initialization and move constructors are simpler and less error prone. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	f248e23db5	row_cache: add do_find_or_create_entry() to reduce code duplication Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	bcd374c05d	mutation_reader: forward fast_forward_to() calls Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	0c24bbe639	tests/row_cache: add fast_forward_to() to throttled reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	69645455f3	tests/row_cache: count mutations read from _underlying Originally, cache tests checked how many times a mutation reader was created from the underlying mutation source to determine whether the continuity flag is working correctly. This is not going to work with fast forwarding mutation readers so the test is switched to count the number of mutations (+ end of stream markers) returned from underlying mutation readers, which is much less fragile. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	e14f8027d5	memtable: add support for fast_forward_to() Fast forwarding of memtable readers is needed only for unit tests which often use memtables as underlying data source for cache and the cache readers. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	6755a679f6	drop key readers key_readers weren't used since introduction of continuity flag to cache entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	5ac9babe97	tests/mutation_reader: test fast forwarding combined reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	7bebfb851f	database: enable fast forwarding of range_sstable_reader When fast forwarding a reader that combines sstable reader we must also remember that the set of sstables for the new range may be different than for the previous one. The reader introduced in this patch makes sure that we read from correct sstables. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	b7b7b2bd63	combined_mutation_reader: implement fast_forward_to() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	2c0cdd55fc	mutation_reader: make combined_reader public We want to be able to fast forward sstable readers. However, just implementing fast_forward_to() for combined_reader is not enough as the sstables we are reading from may need to change. Following patches are going to introduce a combined sstable reader that derives from combined_reader. To make that possible we first need to make combined_reader public. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	7dcd70124a	tests/sstables: add test for fast forwarding reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	5534dc2817	tests: add more helpers to mutation reader assertions Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	cf024975fe	sstables: enable fast forwarding for range readers Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	62c9492d33	mutation_reader: introduce fast_forward_to() This patch introduces the interface for fast forwarding mutation readers. The main user of this feature is going to be cache which, while serving range query, may need to read multiple small ranges from the sstables to populate itself with the missing entries. Fast forwarding is an alternative to recreating a reader with different range. Its main advantage is fact that it avoids dropping data that has already been read. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	c63e88d556	sstables: implement mutation_reader::impl::fast_forward_to() This patch allows sstable readers to be fast forwarded without making it necessary to recreate the reader (and dropping all buffers in the process). It is built on top of index_reader and ability of data_consume_context to be fast forwarded. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	a530762277	sstables: introduce index_reader index_reader is a helper that implements index lookups. Its goal is to avoid dropping read buffers if they still may be needed (for example to get end bound of the range or after fast forwarding the reader). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	f49a9e0d64	sstables: drop unused read_range_rows() overload That overload was used only by unit test and violated guarantee that partition range lives until mutation reader is done. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	0bc873ace5	sstables: add fast_forward_to() to continuous_data_consumer Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	25b91c51e2	sstables: add data_consume_rows_context::reset() reset() is going to be used to restore valid state after fast forwarding the reader. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	2124d08b88	sstables: add skip() to compressed_file_data_source Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	54069162f5	Merge "Add test for partition version list consistency after compaction" from Tomek	2016-10-18 11:03:25 +01:00
Tomasz Grabiec	308434f891	tests: memtable: Add test for partition version list consistency after compaction	2016-10-18 11:57:14 +02:00
Tomasz Grabiec	6548132423	lsa: Make logalloc::tracker::full_compaction() compact all reclaimable regions is_compactible() will pass on very small regions. full_compaction() is only used in tests to force objects to be moved due to compaction, so we want all reclaimable regions to be compacted.	2016-10-18 11:16:08 +02:00
Tomasz Grabiec	ecf85cbffb	mutation: Define + operation It's more convenient to write m1 + m2 in tests than to do more elaborate constructs with copy constructors and apply().	2016-10-18 11:16:08 +02:00
Tomasz Grabiec	fe387f8ba0	partition_version: Fix corruption of partition_version list The move constructor of partition_version was not invoking move constructor of anchorless_list_base_hook. As a result, when partition_version objects were moved, e.g. during LSA compaction, they were unlinked from their lists. This can make readers return invalid data, because not all versions will be reachable. It also causes leaks of the versions which are not directly attached to memtable entry. This will trigger assertion failure in LSA region destructor. This assertion triggers with row cache disabled. With cache enabled (default) all segments are merged into the cache region, which currently is not destroyed on shutdown, so this problem would go unnoticed. With cache disabled, memtable region is destroyed after memtable is flushed and after all readers stop using that memtable. Fixes #1753. Message-Id: <1476778472-5711-1-git-send-email-tgrabiec@scylladb.com>	2016-10-18 09:25:38 +01:00
Duarte Nunes	1d45f19c78	create_view_statement: Use cf_properties This patch uses cf_properties instead to add the missing attributes to the create_view_statement class. Fixes #1766 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-18 01:18:52 +00:00
Duarte Nunes	7c58b7e764	unimplemented: Add materialized views This patch adds the VIEWS element to the cause enum so we can mark failures due to incomplete support of materialized views. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-18 01:18:52 +00:00
Duarte Nunes	7c28ed3dfc	schema: Extract default compressor This patch extracts the definition of the default compressor into the compression_parameters class, so that the table and view creation statements don't have to explicitly deal with it. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-18 01:18:52 +00:00
Duarte Nunes	dc470e6a36	cql3: Extract cf_properties This patch extracts the cf_properties class, which contains common attributes of tables and materialized views. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-18 01:18:51 +00:00
Takuya ASADA	587d375e19	main: exit with 1 when verify_seastar_io_scheduler() failed Since we are exiting the Scylla process in engine().at_exit() using ::_exit(0), scylla always exits with 0, even when verify_seastar_io_scheduler() throws an exception. Because of this, systemd wrongly concludes that scylla-server.service was shut down successfully, so we need to pass the correct exit code to ::_exit() here. Fixes #1674 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1475065607-15486-1-git-send-email-syuu@scylladb.com>	2016-10-17 13:57:00 +03:00
Avi Kivity	163088c6af	Merge seastar upstream * seastar 207bf3d...ccd8649 (3): > Merge "Augment semaphore with non-blocking operations" from Glauber > Merge "More dynamic fstream patches" from Paweł > Merge "fstream: add dynamic adjustments based on stream history" from Paweł	2016-10-17 12:49:17 +03:00
Avi Kivity	65c27ccf21	bytes_ostream: make max_chunk_size() an inline function Fixes debug build looking for a variable definition and not finding it.	2016-10-17 11:49:33 +03:00
Avi Kivity	c0a1ad0b77	bytes_ostream: use larger allocations A 1MB response will require 2000 allocations with the current 512-byte chunk size. Increase it exponentially to reduce allocation count for larger responses (still respecting the upper limit). Message-Id: <1476369152-1245-1-git-send-email-avi@scylladb.com>	2016-10-16 10:05:48 +01:00
Tomasz Grabiec	d836e8f64b	tests: memtable: Add tests for flushing reader Message-Id: <1476454187-11462-1-git-send-email-tgrabiec@scylladb.com>	2016-10-14 15:11:06 +01:00
Tomasz Grabiec	63784fd921	db: Fix corruption of partition_entry Memory accounting code was attaching partition_snapshot to partition_entry in order to calculate the size of partition_version object. However, it is only allowed if partition_entry doesn't have any snapshot attached already. In this case it always has one, created by the flushing reader. Change the accounting code to reuse existing partition_snapshot reference. Fixes #1746 Message-Id: <1476449160-9252-1-git-send-email-tgrabiec@scylladb.com>	2016-10-14 15:10:48 +01:00
Paweł Dziepak	d08cffd3c7	lsa: avoid exceptions during segment_zone creation LSA tries to allocate zones as large as possible (while still leaving enough free space for the standard allocator). It uses the amount of free memory in order to guess how much it can get, but that obviously doesn't account for fragmentation and the allocation attempt may fail. This patch changes the LSA code so that it doesn't throw in case zone couldn't be created but just returns a null pointer which should be more performant if the LSA memory cannot grow any more. Fixes #1394. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1476435031-5601-1-git-send-email-pdziepak@scylladb.com>	2016-10-14 11:08:24 +02:00
Amnon Heiman	7829da13b4	scylla_setup: Reorder questions and actions The expected behaviour in the scylla_setup script is that a question will be followed by the answer. For example, after asking if scylla should be run as a service, the relevant actions will be taken before the following question. This patch addresses two such mis-orders: 1. scylla-housekeeping depends on scylla-server, so the setup should first set up the scylla-server service and only then ask about (and install if needed) scylla-housekeeping. 2. The node_exporter should be placed after the io_setup is done. Fixes #1739 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1476370098-25617-1-git-send-email-amnon@scylladb.com>	2016-10-13 18:29:36 +03:00
Pekka Enberg	3b4e6cdc5e	abstract_replication_strategy: Fix exception type if class not found Change abstract_replication_strategy::create_replication_strategy() to throw exceptions::configuration_error if the replication strategy class lookup fails, to make sure the error is converted to the correct CQL response. Fixes #1755 Message-Id: <1476361262-28723-1-git-send-email-penberg@scylladb.com>	2016-10-13 17:39:28 +03:00
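
The arithmetic behind commit c0a1ad0b77 ("bytes_ostream: use larger allocations") can be sketched roughly as follows. This is an illustration only, not the code from that commit: the function names, the doubling policy, and the 128 KiB cap used in the usage note are assumptions made for this sketch, and a real chunk carries header overhead that is ignored here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Allocations needed to hold `total` bytes when every chunk has the same size.
// (Hypothetical helper; ignores per-chunk header overhead.)
constexpr std::size_t fixed_chunk_allocations(std::size_t total, std::size_t chunk) {
    return (total + chunk - 1) / chunk;  // ceiling division
}

// Allocations needed when each new chunk is twice the previous one,
// capped at `max_chunk`. The doubling factor is an assumption of this sketch.
inline std::size_t growing_chunk_allocations(std::size_t total,
                                             std::size_t initial,
                                             std::size_t max_chunk) {
    assert(initial > 0 && max_chunk >= initial);
    std::size_t allocations = 0;
    std::size_t stored = 0;
    std::size_t chunk = initial;
    while (stored < total) {
        stored += chunk;                         // one more chunk allocated
        ++allocations;
        chunk = std::min(chunk * 2, max_chunk);  // grow, respecting the upper limit
    }
    return allocations;
}
```

With these helpers, `fixed_chunk_allocations(1 << 20, 512)` gives 2048, in line with the roughly 2000 allocations quoted in the commit message for a 1MB response, while `growing_chunk_allocations(1 << 20, 512, 128 * 1024)` gives 16 under the assumed 128 KiB cap.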


11716 Commits