scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 00:50:35 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	548f6066c5	tests: add test for sstable set's incremental selector Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-12-09 16:17:17 -02:00
Tomasz Grabiec	3511bf4a81	Merge branch 'tgrabiec/memtable-gentle-clearing' from seastar-dev.git When row cache is disabled, update_cache() will do nothing to the memtable. Active readers may keep the memtable alive for unbounded amount of time, preventing it from going away. This doesn't play well with virtual dirty accounting. Soon before calling update_cache(), the memory which was subtracted during flush is added back to the amount of virtual dirty memory. If there was write pressure all along, we will be at the dirty memory limit. When we give back subtracted memory this will put virtual dirty way above the limit. This will stall all writes until another memtable flush drags virtual dirty down or readers finally release the memtable. We want to prevent upward jumps of virtual dirty. First part of the fix is to ensure that as long as the memtable's region is in the dirty group, we will not revert flushed memory. This must happen synchronously from region's memory being removed from the group in order to prevent upward virtual dirty jumps. To make this easier, tracking of flushed memory was moved to the memtable object. Another part of the fix is to gradually clear the memtable when cache is disabled in a similar fashion as when it's moved to cache. This ensures that the actual memory held by memtable's region is released sooner than it dies. Refs #1879	2016-12-08 12:18:32 +01:00
Tomasz Grabiec	c3768fe4de	memtable: Pass dirty_memory_manager& to memtable constructor The implementation assumes that memtable's region group is owned by dirty_memory_manager, and tries to obtain a reference to it like this: boost::intrusive::get_parent_from_member(_region.group(), &dirty_memory_manager::_region_group)); This is undefined behavior when the region's group does not come from dirty manager. It's safer to be explicit about this dependency by taking a reference to dirty_memory_manager in the constructor.	2016-12-05 12:59:09 +01:00
Asias He	00d7a35949	utils: Put crc32 under utils namespace It conflicts with crc in zlib Message-Id: <1480918984-4117-2-git-send-email-asias@scylladb.com>	2016-12-05 11:48:29 +02:00
Asias He	86c2620b7a	gossip: Skip stopping if it is not started If an exception is triggered early in boot when doing an I/O operation, scylla will fail because the io checker calls storage service to stop transport services, and not all of them were initialized yet. Scylla was failing as follows: scylla: ./seastar/core/sharded.hh:439: Service& seastar::sharded<Service>::local() [with Service = gms::gossiper]: Assertion `local_is_initialized()' failed. Aborting on shard 0. Backtrace: 0x000000000048a2ca 0x000000000048a3d3 0x00007fc279e739ff 0x00007fc279ad6a27 0x00007fc279ad8629 0x00007fc279acf226 0x00007fc279acf2d1 0x0000000000c145f8 0x000000000110d1bc 0x000000000041bacd 0x00000000005520f1 0x00007fc279aeaf1f Aborted (core dumped) Refs #883. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Signed-off-by: Asias He <asias@scylladb.com> Message-Id: <963f7b0f5a7a8a1405728b414a7d7a6dccd70581.1479172124.git.asias@scylladb.com>	2016-12-05 09:42:37 +02:00
Tomasz Grabiec	c35e18ba12	tests: Fix use-after-free on commitlog Only shutdown() ensures all internal processes are complete. Call it before calling clear(). Message-Id: <1480495534-2253-1-git-send-email-tgrabiec@scylladb.com>	2016-11-30 11:03:26 +02:00
Raphael S. Carvalho	a16425833c	size_tiered: do not recreate bucket when it goes beyond max threshold Problem will cause size tiered to return small jobs when there are more than max_threshold sstables of similar size. For example, if max_threshold is 32, and there are 36 sstables of similar size, strategy will only return 4 sstables to be compacted. That's because we incorrectly create a new bucket when it meets the max threshold. What we should do is to allow buckets to grow beyond max threshold and trim them when selecting the most suitable one for compaction. Important to mention that estimation for size tiered will now work better when there are more than max_threshold sstables of similar size. Fixes #1901. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <080bad70d6cb86eaf52ac1bdd6765ac47aab5b03.1478316140.git.raphaelsc@scylladb.com>	2016-11-29 16:56:02 +02:00
Avi Kivity	28857e42e7	Merge " Virtualize size_estimates system table" from Duarte "We currently write the size_estimates system table for every schema on a periodic basis, currently set to 5 minutes, which can interfere with an ongoing workload. This patchset virtualizes it such that queries are intercepted and we calculate the results on the fly, only for the ranges the caller is interested in. Fixes #1616" * 'virtual-estimates/v4' of github.com:duarten/scylla: size_estimates_virtual_reader: Add unit test db: Delete size_estimates_recorder size_estimates: Add virtual reader column_family: Add support for virtual readers storage_service: get_local_tokens() returns a future nonwrapping_range: Add slice() function range: Find a sequence's lower and upper bounds system_keyspace: Build mutations for size estimates size_estimates: Store the token range as bytes range_estimates: Add schema murmur3_partitioner: Convert maximum_token to sstring	2016-11-28 10:12:59 +02:00
Paweł Dziepak	919825a2c7	Merge "Improve sharding in large clusters" from Avi "Clusters with a large number of nodes, or a low number of vnodes, and a high number of shards, or a combination, suffer from an aliasing problem: both vnodes and intra-node sharding consider the most significant bits to select the owning node and owning shard respectively. Since the same bits are used for both, a low number of vnodes leads to some shards being overcommitted relative to others. This series fixes the problem by sharding on bits 0:47 of the token (murmur3 partitioner only), leaving the most significant 12 bits for vnodes. Simulation shows that this value provides reasonable sharding for 100-node, 30-shard clusters. In order to prevent re-sharding sstables on each boot, token ranges for the range are stored in a new sub-component of the sstable Statistics component. With the default 12 ignored bits we have 4096 token ranges for non-Level-compacted SSTables, which takes some space but is still reasonable. Fixes #1277."	2016-11-23 11:25:53 +00:00
Avi Kivity	af16c0fac4	murmur3_partitioner: shard on the middle token bits, not most significant bits Sharding on the most significant token bits aliases with the vnode mechanism, which also uses the most significant bits; this requires a huge number of vnodes to achieve good sharding. This patch teaches the murmur3 partitioner to ignore the most significant N bits when calculating a token's shard, so we use token bits which still have some entropy. In effect, this changes the token range layout from shard 0 shard 1 ... shard S-1 to shard 0 shard 1 ... shard S-1 shard 0 shard 1 ... shard S-1 ... shard 0 shard 1 ... shard S-1 where the number of repetitions of the block is 2^(ignored msb bits). For compatibility, the default is zero ignored bits, matching the pre-patch state, until we wire things up.	2016-11-22 21:56:42 +02:00
Duarte Nunes	def2bc72b0	size_estimates_virtual_reader: Add unit test Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-11-21 11:15:05 +00:00
Gleb Natapov	9222a47fed	sstable test: add test for generated summary data Message-Id: <20161117155051.GV6765@scylladb.com>	2016-11-20 19:50:45 +02:00
Paweł Dziepak	b8d737ff0a	tests/row_cache_test: verify that eviction follows lru Refs #1847. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1479231555-28191-1-git-send-email-pdziepak@scylladb.com>	2016-11-15 18:57:54 +01:00
Duarte Nunes	e680587b8a	sstable_test: Be explicit about uncompressed tables After 7c28ed, the schemas defined in the test became compressed by default. This patch changes the test so that it is explicit about which schemas shouldn't define a compressor. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1478646530-5558-1-git-send-email-duarte@scylladb.com>	2016-11-09 11:21:59 +02:00
Piotr Jastrzebski	50b41f7d1d	Fix row_cache_test partition_range passed to row_cache::make_reader has to be kept alive as long as the resulting reader is used. Otherwise weird things start to happen. This used to work purely by luck. When I started changing the row_cache implementation I ran into very weird behaviors in this test. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <2c9e337dbbcf35f4e1394cad043eda10b8c2bd4a.1478602876.git.piotr@scylladb.com>	2016-11-08 13:28:53 +01:00
Calle Wilund	11baf37ab5	commitlog: Prevent exceptions in stream::produce from being set twice Fixes #1775 stream lacks a check "is_open", which is a bummer. We have to both prevent exception propagation and add a flag of our own to make sure exceptions in producer code reaches consumer, and does not simply get lost in the reactor. Message-Id: <1478508817-18854-1-git-send-email-calle@scylladb.com>	2016-11-07 11:41:33 +01:00
Paweł Dziepak	985d2f6d4a	Merge "Remove quadratic behavior from atomic sstable deletion" from Avi "The atomic sstable deletion provides exception safety at the cost of quadratic behavior in the number of sstables awaiting deletion. This causes high cpu utilization during startup. Change the code to avoid quadratic complexity, and add some unit tests. See #1812."	2016-11-04 15:48:04 +00:00
Avi Kivity	f75aceabc5	sstables: add unit tests for atomic deletion We simulate shards deleting sstables, but this is all happening on a single core, and no sstables are harmed during test execution.	2016-11-04 15:48:43 +02:00
Avi Kivity	1d77e3a03a	partitioner: add unit tests for token_for_next_shard() i_partitioner::token_for_next_shard() is an inverse for i_partitioner::shard_of(), test that this is so.	2016-11-03 19:10:20 +02:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Raphael S. Carvalho	53b7b7def3	sstables: handle unrecognized sstable component As in C*, unrecognized sstable components should be ignored when loading a sstable. At the moment, Scylla fails to do so and will not boot as a result. In addition, unknown components should be remembered when moving a sstable or changing its generation. Fixes #1780. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <b7af0c28e5b574fd577a7a1d28fb006ac197aa0a.1478025930.git.raphaelsc@scylladb.com>	2016-11-02 12:44:53 +02:00
Avi Kivity	7faf2eed2f	build: support for linking statically with boost Remove assumptions in the build system about dynamically linked boost unit tests. Includes seastar update which would have otherwise broken the build.	2016-10-26 08:51:21 +03:00
Piotr Jastrzebski	27726cecff	Clean up position_in_partition. Introduce position_in_partition_view and use it in position() method in mutation_fragment, range_tombstone, static_row and clustering_row. Clean up comparators in position_in_partition. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c65293c71a6aa23cf930ed317fb63df1fdc34fd1.1477399763.git.piotr@scylladb.com>	2016-10-25 15:13:20 +01:00
Paweł Dziepak	210a390892	tests: add missing sstable for partition skipping test Commit `7dcd70124a` "tests/sstables: add test for fast forwarding reader" added a test for skipping parts of sstable. Unfortunately, it did not include the sstables it was trying to read. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 23:23:49 +01:00
Paweł Dziepak	0c24bbe639	tests/row_cache: add fast_forward_to() to throttled reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	69645455f3	tests/row_cache: count mutations read from _underlying Originally, cache tests checked how many times a mutation reader was created from the underlying mutation source to determine whether the continuity flag is working correctly. This is not going to work with fast forwarding mutation readers, so the test is switched to count the number of mutations (+ end of stream markers) returned from underlying mutation readers, which is much less fragile. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	6755a679f6	drop key readers key_readers weren't used since introduction of continuity flag to cache entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	5ac9babe97	tests/mutation_reader: test fast forwarding combined reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	7dcd70124a	tests/sstables: add test for fast forwarding reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	5534dc2817	tests: add more helpers to mutation reader assertions Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	cf024975fe	sstables: enable fast forwarding for range readers Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	f49a9e0d64	sstables: drop unused read_range_rows() overload That overload was used only by unit test and violated guarantee that partition range lives until mutation reader is done. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	25b91c51e2	ssables: add data_consume_rows_context::reset() reset() is going to be used to restore valid state after fast forwarding the reader. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Tomasz Grabiec	308434f891	tests: memtable: Add test for partition version list consistency after compaction	2016-10-18 11:57:14 +02:00
Tomasz Grabiec	d836e8f64b	tests: memtable: Add tests for flushing reader Message-Id: <1476454187-11462-1-git-send-email-tgrabiec@scylladb.com>	2016-10-14 15:11:06 +01:00
Avi Kivity	9ac441d3b5	range: adjust split_after to allow split_point outside input range Make split_after() more generic by allowing split_point to be anywhere, not just within the input range. If the split_point is before, the entire range is returned; and if it is after, stdx::nullopt is returned. "before" and "after" are not well defined for wrap-around ranges, but we are phasing them out and soon there will not be wrapping_range::split_after() users. This is a prerequisite for converting partition_range and friends to nonwrapping_range. Message-Id: <1475765099-10657-1-git-send-email-avi@scylladb.com>	2016-10-06 17:54:44 +02:00
Avi Kivity	c94fb1bf12	build: reduce inclusions of messaging_service.hh Remove inclusions from header files (primary offender is fb_utilities.hh) and introduce new messaging_service_fwd.hh to reduce rebuilds when the messaging service changes. Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>	2016-10-05 11:46:49 +03:00
Avi Kivity	f8118d9fc2	Merge "Virtual dirty memory management" from Glauber "Description: ============ Scylla currently suffers from a brick wall behavior of the request throttler. Requests pile up until we reach the dirty memory limit, at which point we stop serving them until we have freed enough memory to allow for more requests. The problem is that freeing dirty memory means writing an SSTable to completion. That can take a long time, even if we are blessed with great disks. Those long waiting times can and will translate into timeouts. That is bad behavior. What this patch does is introduce one form of virtual dirty memory accounting. Instead of allowing 100 % of the dirty memory to be filled up until we stop accepting requests, we will do that when we reach 50 % of memory. However, instead of releasing requests only when an SSTable is fully written, we start releasing them when some memory was written. The practical effect of that, is that once we reach 50 % occupancy in our dirty memory region, we will bring the system from CPU speed to disk speed, and will start accepting requests only at the rate we are able to write memory back. Results ======= With this patchset running a load big enough to easily saturate the disk, (commitlog disabled to highlight the effects of the memtable writer), I am able to run scylla for many minutes, with timeouts occurring only when I run out of disk space, whereas without this patch a swarm of timeouts would start merely 2 seconds after the load started - and would never get stable. In V2, I have sent a set of graphs illustrating the performance of this solution. This version does not have any significant differences in that front. 
For details, please refer to https://groups.google.com/d/msg/scylladb-dev/iCvD-3Z-QqY/EM8KUh_MAQAJ Accuracy of the accounting: --------------------------- It is important for us to be as accurate as possible when accounting freed memory, since every byte we mark as freed may allow one or more requests to be executed. I have measured the accuracy of this approach (ignoring padding, object size for the mutation fragments) to be 99.83 % of used memory in the test workload I have run (large, 65k mutations). Memtables under this circumstance tend to have a very high occupancy ratio because throttle breeds idle, and idle breeds compact-on-idle. Known Issues: ------------- A lot of time can elapse between destroying the flush_reader and actually releasing memory. The release of memory only happens when the SSTable is fully sealed, and we have to flush the files, as well as finish writing all SSTable components at this point. This happened in practice with a buggy kernel that would result in flushes taking a long time. After that is fixed, this is just a theoretical problem and in practice it shouldn't matter given the time we expect those operations to take." * 'virtual-dirty-v6' of github.com:glommer/scylla: database: allow virtual dirty memory management streamed_mutation: make _buffer private add accounting of memory read to partition_snapshot_reader move partition_snapshot_reader code to header file LSA: allow a group to query its own region group memtables: split scanning reader in two sstables: use special reader for writing a memtable LSA: export information about object memory footprint LSA: export information about size of the throttle queue database: export virtual dirty bytes region group	2016-10-04 20:57:52 +03:00
Glauber Costa	28e3f2f6ee	LSA: export information about object memory footprint We allocate objects of a certain size, but we use a bit more memory to hold them. To get a clearer picture of how much memory an object will cost us, we need help from the allocator. This patch exports an interface that allows users to query a specific allocator to get that information. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
Avi Kivity	a51804eca8	Merge "token_restriction: Deal with minimum tokens" from Duarte "This patch set ensures we can correctly handle queries where the minimum token is specified." * 'min-token/v3' of github.com:duarten/scylla: cql_query_test: Add test case for min/max token bounds token_restriction: Deal with minimum tokens partitioner: Parse token from bytes	2016-10-02 12:32:40 +03:00
Raphael S. Carvalho	a8ab4b8f37	lcs: fix starvation at higher levels When max sstable size is increased, higher levels are suffering from starvation because we decide to compact a given level if the following calculation results in a number greater than 1.001: level_size(L) / max_size_for_level_l(L) Fixes #1720. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-30 14:09:49 -03:00
Raphael S. Carvalho	a3bf7558f2	lcs: fix broken token range distribution at higher levels Uniform token range distribution across sstables in a level > 1 was broken, because we were only choosing sstable with lowest first key, when compacting a level > 0. This resulted in performance problem because L1->L2 may have a huge overlap over time, for example. Last compacted key will now be stored for each level to ensure sort of "round robin" selection of sstables for compactions at level >= 1. That's also done by C*, and they were once affected by it as described in https://issues.apache.org/jira/browse/CASSANDRA-6284. Fixes #1719. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-30 14:09:16 -03:00
Duarte Nunes	72af476397	cql_query_test: Add test case for min/max token bounds This patch adds a test case for specifying the minimum and maximum tokens in a cql3 query. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-09-30 11:45:45 +00:00
Raphael S. Carvalho	0eaa0f46c9	sstables: store first and last decorated keys in sstable object leveled strategy uses heavily first and last decorated keys of a sstable to get overlapping sstables in a given level. By storing first and last decorated keys in sstable object, it's expected that performance of leveled strategy (not compaction) will be improved. We will set first and last keys in sstable when either loading or sealing it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <0abca819454ab4c088541bb49714f1f6a7dc4f42.1473959677.git.raphaelsc@scylladb.com>	2016-09-19 13:25:58 +02:00
Raphael S. Carvalho	dffb41f9d8	sstables: remove schema parameter from some sstable methods schema can now be found in the sstable object itself. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <0fa44fedbe784d924522d7eeca77c16294479c6e.1473959677.git.raphaelsc@scylladb.com>	2016-09-19 13:25:58 +02:00
Tomasz Grabiec	2282599394	tests: Add test for UUID type ordering Message-Id: <1473956716-5209-2-git-send-email-tgrabiec@scylladb.com>	2016-09-16 11:07:14 +01:00
Gleb Natapov	2e8b255741	Merge seastar upstream * seastar 0303e0c...e534401 (6): > Merge "enable rpc to work on non contiguous memory for receive" from Gleb > install-dependencies.sh: install python3 for Ubuntu/Debian, which requires for configure.py > fix tcp stuck when output_stream write more than 212992 bytes once. > scripts/posix_net_conf.sh: supress 'ls: cannot access /sys/class/net/<NIC>/device/msi_irqs/' error message > scripts/posix_net_conf.sh: fix 'command not found' error when specifies --cpu-mask > native_network_stack: Fix use after free/missing wait in dhcp Includes: "Remove utils::fragmented_input_stream and utils::input_stream in favor of seastar version" from Gleb.	2016-09-15 12:12:16 +03:00
Raphael S. Carvalho	2a426ab248	tests: add test to check tombstone metadata Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:49:35 -03:00
Raphael S. Carvalho	026853fabb	tests: add test to check composite validity Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:49:30 -03:00
Raphael S. Carvalho	1f31223f32	sstables: store schema in sstable object That will be needed for optimization that will store decorated keys in the sstable object, and also for a subsequent work that will detect wrong metadata (min/max column names) by looking at columns in the schema. As schema is stored in sstable, there's no longer a need to store ks and cf names in it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:49:17 -03:00
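
The layout change described in commit af16c0fac4 above (ignoring the most significant N token bits when picking a shard) can be illustrated with a small sketch. This is not the code from the patch: the function names `shard_of` and `layout`, the bias-to-unsigned step, and the multiply-and-shift mapping are assumptions chosen for illustration. The only behavior taken from the commit message is that the block "shard 0 ... shard S-1" repeats 2^(ignored msb bits) times across the token range.

```python
# Hypothetical sketch of sharding on token bits below the top N bits.
# Names and the exact arithmetic are illustrative assumptions, not Scylla's code.

MASK64 = (1 << 64) - 1


def shard_of(token, shard_count, ignore_msb_bits=0):
    """Map a signed 64-bit token to a shard, ignoring the top N bits."""
    # Bias the signed token into the unsigned range [0, 2^64).
    unsigned = (token + (1 << 63)) & MASK64
    # Shift the ignored most significant bits out of the 64-bit word.
    shifted = (unsigned << ignore_msb_bits) & MASK64
    # Scale the remaining bits onto [0, shard_count).
    return (shifted * shard_count) >> 64


def layout(shard_count, ignore_msb_bits, samples):
    """Shard owning the start of each of `samples` equal slices of the ring."""
    step = (1 << 64) // samples
    return [
        shard_of(i * step - (1 << 63), shard_count, ignore_msb_bits)
        for i in range(samples)
    ]


if __name__ == "__main__":
    print(layout(2, 0, 4))  # [0, 0, 1, 1]: one block over the whole ring
    print(layout(2, 1, 4))  # [0, 1, 0, 1]: the block repeats 2^1 times
```

With zero ignored bits the sketch reproduces the pre-patch layout, in which each shard owns one contiguous slice of the ring; each additional ignored bit doubles the number of repetitions of the block.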


1227 Commits