scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 04:37:00 +00:00

Author	SHA1	Message	Date
Duarte Nunes	1d45f19c78	create_view_statement: Use cf_properties This patch uses cf_properties instead to add the missing attributes to the create_view_statement class. Fixes #1766 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-18 01:18:52 +00:00
Duarte Nunes	7c58b7e764	unimplemented: Add materialized views This patch adds the VIEWS element to the cause enum so we can mark failures due to incomplete support of materialized views. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-18 01:18:52 +00:00
Duarte Nunes	7c28ed3dfc	schema: Extract default compressor This patch extracts the definition of the default compressor into the compression_parameters class, so that the table and view creation statements don't have to explicitly deal with it. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-18 01:18:52 +00:00
Duarte Nunes	dc470e6a36	cql3: Extract cf_properties This patch extracts the cf_properties class, which contains common attributes of tables and materialized views. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-18 01:18:51 +00:00
Takuya ASADA	587d375e19	main: exit with 1 when verify_seastar_io_scheduler() failed Since we exit the Scylla process in engine().at_exit() using ::_exit(0), scylla always exits with 0, even when verify_seastar_io_scheduler() throws an exception. Because of this, systemd mistakenly concludes that scylla-server.service was shut down successfully, so we need to pass the correct exit code to ::_exit() here. Fixes #1674 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1475065607-15486-1-git-send-email-syuu@scylladb.com>	2016-10-17 13:57:00 +03:00
Avi Kivity	163088c6af	Merge seastar upstream * seastar 207bf3d...ccd8649 (3): > Merge "Augment semaphore with non-blocking operations" from Glauber > Merge "More dynamic fstream patches" from Paweł > Merge "fstream: add dynamic adjustments based on stream history" from Paweł	2016-10-17 12:49:17 +03:00
Avi Kivity	65c27ccf21	bytes_ostream: make max_chunk_size() an inline function Fixes debug build looking for a variable definition and not finding it.	2016-10-17 11:49:33 +03:00
Avi Kivity	c0a1ad0b77	bytes_ostream: use larger allocations A 1MB response will require 2000 allocations with the current 512-byte chunk size. Increase it exponentially to reduce allocation count for larger responses (still respecting the upper limit). Message-Id: <1476369152-1245-1-git-send-email-avi@scylladb.com>	2016-10-16 10:05:48 +01:00
Tomasz Grabiec	d836e8f64b	tests: memtable: Add tests for flushing reader Message-Id: <1476454187-11462-1-git-send-email-tgrabiec@scylladb.com>	2016-10-14 15:11:06 +01:00
Tomasz Grabiec	63784fd921	db: Fix corruption of partition_entry Memory accounting code was attaching partition_snapshot to partition_entry in order to calculate the size of partition_version object. However, it is only allowed if partition_entry doesn't have any snapshot attached already. In this case it always has one, created by the flushing reader. Change the accounting code to reuse existing partition_snapshot reference. Fixes #1746 Message-Id: <1476449160-9252-1-git-send-email-tgrabiec@scylladb.com>	2016-10-14 15:10:48 +01:00
Paweł Dziepak	d08cffd3c7	lsa: avoid exceptions during segment_zone creation LSA tries to allocate zones as large as possible (while still leaving enough free space for the standard allocator). It uses the amount of free memory in order to guess how much it can get, but that obviously doesn't account for fragmentation and the allocation attempt may fail. This patch changes the LSA code so that it doesn't throw in case zone couldn't be created but just returns a null pointer which should be more performant if the LSA memory cannot grow any more. Fixes #1394. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1476435031-5601-1-git-send-email-pdziepak@scylladb.com>	2016-10-14 11:08:24 +02:00
Amnon Heiman	7829da13b4	scylla_setup: Reorder questions and actions The expected behaviour in the scylla_setup script is that a question will be followed by its action. For example, after asking whether scylla should be run as a service, the relevant actions will be taken before the following question. This patch addresses two such mis-orders: 1. scylla-housekeeping depends on scylla-server, so the setup should first set up the scylla-server service and only then ask about (and install if needed) scylla-housekeeping. 2. The node_exporter should be placed after the io_setup is done. Fixes #1739 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1476370098-25617-1-git-send-email-amnon@scylladb.com>	2016-10-13 18:29:36 +03:00
Pekka Enberg	3b4e6cdc5e	abstract_replication_strategy: Fix exception type if class not found Change abstract_replication_strategy::create_replication_strategy() to throw exceptions::configuration_error if replication strategy class lookup fails, to make sure the error is converted to the correct CQL response. Fixes #1755 Message-Id: <1476361262-28723-1-git-send-email-penberg@scylladb.com>	2016-10-13 17:39:28 +03:00
Tomasz Grabiec	e617bcd8a7	logalloc: disable abort on allocation failure in places in which it is benign Some places start big expecting allocation failure, then reduce the requested size. Let's not abort in such cases. Message-Id: <1476295120-32047-1-git-send-email-tgrabiec@scylladb.com>	2016-10-13 10:53:32 +03:00
Avi Kivity	13e9d4c8e3	Merge seastar upstream * seastar f937fb0...207bf3d (11): > Merge "iotune: gracefully exit on predictable exceptions" (Fixes #1623) > core/semaphore: Add semaphore_units::release() > Merge "rometheus API with grafana uses labels" from Amnon > core/thread: Fix stack alloc-dealloc mismatch > core/thread: Make jmp_buf_link::yield_at use the same time point as thread_scheduling_group > file: support for XFS on older kernels > reactor: fix bug when handling EBADF in flush_pending_aio() > prometheus CPU should start in 0 > Collectd: bytes ordering depends on the type > tests: Check that backtrace() doesn't corrupt signal mask > core/thread: Add stack guards to seastar thread stacks	2016-10-12 23:47:12 +03:00
Avi Kivity	63f053e9b7	storage_proxy: fix mutation reordering with wrapping ranges If we have a range query involving a wrapping range (i.e., from thrift), and mutations from both halves of the result are involved, then we will return the results in the wrong order (and potentially the wrong partitions) since we order by token, so the results from the second half of the wrapping range end up before the first. Fix by splitting the two queries, and merging the second half with lower priority compared to the first half. Note: this will be fixed in a better way once we have the sharding iterator, as then we can query sequentially. Fixes #1761. Message-Id: <1476262693-30162-1-git-send-email-avi@scylladb.com>	2016-10-12 15:59:16 +02:00
Avi Kivity	1506b06617	Merge "node_exporter service on ubuntu 16" from Amnon "This series address two issues that interfere with running the node_exporter as a service in ubuntu 16. 1. The service file should be packed in the deb file 2. When setting the node_exporter as a service it doesn't need to run with scylla use" * 'amnon/node_exporter_ubuntu_v2' of github.com:cloudius-systems/seastar-dev: node-exporter service: No need to run as scylla user debian package: Include the node_exporter service file	2016-10-12 12:11:18 +03:00
Amnon Heiman	1bd50789e0	node-exporter service: No need to run as scylla user The node-exporter does not need to run as the scylla user. It can run without scylla or without the scylla user being configured. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2016-10-11 12:44:27 +03:00
Amnon Heiman	d523bf56ed	debian package: Include the node_exporter service file This will include the node_exporter service script for ubuntu distribution with systemd support. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2016-10-11 12:44:14 +03:00
Avi Kivity	f6998bb260	Merge "Implement describe_splits_ex based on Cassandra" from Duarte "This patch-set re-implements the describe_splits_ex() verb to more closely follow Cassandra's implementation, on which some clients rely. Ref #1139 Ref #693" * 'describe-splits/v2' of github.com:duarten/scylla: thrift: Implement describe_splits_ex based on Cassandra storage_service: Implement get_splits() function sstables: Add function to get key samples sstables/key: Add to_partition_key function size_estimates_recorder: Increase estimate accuracy sstables: Get estimates for a particular range sstables/key: Make key::kind public	2016-10-11 11:13:35 +03:00
Takuya ASADA	0007f2d838	dist/common/sbin: add scylla_cpuset_setup and scylla_dev_mode_setup to /usr/sbin We haven't added symlinks to /usr/sbin for newly created scripts, so add them. Fixes #1702 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1474879711-31793-1-git-send-email-syuu@scylladb.com>	2016-10-11 11:02:14 +03:00
Takuya ASADA	ccad720bb1	dist/common/script/scylla_io_setup: handle comma correctly when parsing cpuset The script mistakenly split the value at "," when the cpuset list is separated by commas. Instead of matching possible patterns of the argument, let's pass all characters until reaching a space delimiter or the end of the line. Fixes #1716 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1476171037-32373-1-git-send-email-syuu@scylladb.com>	2016-10-11 10:42:32 +03:00
Duarte Nunes	d8cfc56376	thrift: Implement describe_splits_ex based on Cassandra This patch re-implements the describe_splits_ex() verb to more closely follow Cassandra's implementation, on which some clients rely. Ref #1139 Ref #693 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 22:32:10 +02:00
Duarte Nunes	01ab2081cd	storage_service: Implement get_splits() function This patch implements the get_splits() function in storage_service, used to split a particular token range in slices of approximately the specified size, using the sample keys and estimates of the CF's sstables. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 22:32:08 +02:00
Duarte Nunes	c36dbaf0f1	sstables: Add function to get key samples This patch implements the get_key_samples() function, on which a future patch will base an implementation of the describe_splits() thrift verb closer to Cassandra's. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 19:50:14 +02:00
Duarte Nunes	fc07b66678	sstables/key: Add to_partition_key function Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 19:50:11 +02:00
Duarte Nunes	c19c633299	size_estimates_recorder: Increase estimate accuracy This patch uses the estimated_keys_for_range() function to get better estimates. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 17:52:16 +02:00
Duarte Nunes	ceed09b23e	sstables: Get estimates for a particular range This patch adds the estimated_keys_for_range() function, which estimates the number of keys present between the specified range. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 17:52:15 +02:00
Duarte Nunes	8c223b31c8	sstables/key: Make key::kind public Needed to create synthetic keys without any value but with ordering properties. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 17:47:24 +02:00
Avi Kivity	b305d92a65	Merge "housekeeping: check version during setup" from Amnon "The version is taken from the installation rather than the API, a mode command line indicated that this is part of the setup and uuid is used for the interaction with the checkversion server." * 'amnon/check_version_on_startup_v3' of github.com:cloudius-systems/seastar-dev: scylla_setup: Check and report the scylla version scylla-housekeeping: check version during setup	2016-10-10 16:37:14 +03:00
Vlad Zolotarov	ab748e829d	docs: tracing.md: initial commit Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1475686745-20383-1-git-send-email-vladz@cloudius-systems.com>	2016-10-10 16:12:02 +03:00
Tomasz Grabiec	4357d0a6d9	db: Add counter for writes blocked on dirty memory There is already queue_length-requests_blocked_memory, but it's a gauge so does not reflect what happened between the sampling points. total_operations-requests_blocked_memory will allow to see if there were any (and how many) requests which were blocked by dirty memory. Message-Id: <1476098616-12682-1-git-send-email-tgrabiec@scylladb.com>	2016-10-10 14:25:22 +03:00
Pekka Enberg	3b75ff1496	docs/docker: Tag `--listen-address` as 1.4 feature The Docker Hub documentation is the same for all image versions. Tag `--listen-address` as 1.4 feature. Message-Id: <1475819164-7865-1-git-send-email-penberg@scylladb.com>	2016-10-10 13:26:16 +03:00
Vlad Zolotarov	006999f46c	api::storage_service::slow_query: don't use duration_cast in GET The slow_query_record_ttl() and slow_query_threshold() return the duration of the appropriate type already - no need for an additional cast. In addition there was a mistake in a cast of ttl. Fixes #1734 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1475669400-5925-1-git-send-email-vladz@cloudius-systems.com>	2016-10-09 18:09:13 +03:00
Takuya ASADA	469e9af1f4	dist/common/scripts/scylla_setup: use 'swapon -s' instead of 'swapon --show' Since Ubuntu 14.04 doesn't support the --show option, we need to avoid using it. Fixes #1740 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1475788340-22939-2-git-send-email-syuu@scylladb.com>	2016-10-09 18:05:14 +03:00
Takuya ASADA	8452045b85	dist/ubuntu: add realpath to dependency, requires for scylla_setup We need a dependency on realpath, since scylla_setup uses it. Fixes #1740. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1475788340-22939-1-git-send-email-syuu@scylladb.com>	2016-10-09 18:05:14 +03:00
Tomasz Grabiec	41e66ebce2	gdb: Introduce 'scylla heapprof' Presents current heap profile recording. Works in text mode or dumps to collapsed stacks format from which flame graph can be generated. To generate a flamegraph: (gdb) scylla heapprof --flame Wrote heapprof.stacks $ flamegraph.pl --colors mem < heapprof.stacks > heapprof.svg flamegraph.pl comes from: https://github.com/brendangregg/FlameGraph.git Text mode example: (gdb) scylla heapprof --min 100000000 All (274699676, #10213) \-- void* memory::cpu_pages::allocate_large_and_trim<memory::cpu_pages::allocate_large_aligned(unsigned int, unsigned int)::{lambda(unsigned int, unsigned int)#1}>(unsigned int, memory::cpu_pages::allocate_large_aligned(unsigned int, unsigned int)::{lambda(unsigned int, unsigned int)#1}) + 169 (268435456, #1) memory::allocate_large_aligned(unsigned long, unsigned long) + 87 memory::allocate_aligned(unsigned long, unsigned long) + 48 aligned_alloc + 9 logalloc::segment_zone::segment_zone() + 304 logalloc::segment_pool::allocate_segment() + 477 logalloc::segment_pool::segment_pool() + 304 __tls_init.part.801 + 72 logalloc::region_group::release_requests() + 1333 logalloc::region_group::add(logalloc::region_group*) + 514 The branches are formatted like this: -- <symbol> (<size>, #<count>) Where <size> is total size of live objects and <count> is total number of live objects, for all objects allocated from paths going through this node. Nodes which share the same <size> and <count> are stacked like this: -- <symbol_1> (<size>, #<count>) <symbol_2> <symbol_3> Message-Id: <1475583334-19524-1-git-send-email-tgrabiec@scylladb.com>	2016-10-09 10:54:08 +03:00
Glauber Costa	33e9c2bbdd	memtable: reduce sstable flush concurrency to one Limiting the concurrency of memtable flushes to 4 was a temporary workaround for the fact that we lacked good write behind support. Now that write behind is properly merged we can reduce the concurrency to what it should be, one. This means that memtable flushes will now be serialized, and only when one of them ends will the next one begin. Disk parallelism is obtained through the write-behind mechanism. Fixes #1373 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <528f9ef928b5101bed952df600eb8555c275497a.1475881100.git.glauber@scylladb.com>	2016-10-09 10:48:57 +03:00
Tomasz Grabiec	2a5a90f391	db: Do not timeout streaming readers There is a limit to concurrency of sstable readers on each shard. When this limit is exhausted (currently 100 readers) readers queue. There is a timeout after which queued readers are failed, equal to read_request_timeout_in_ms (5s by default). The reason we have the timeout here is primarily because the readers created for the purpose of serving a CQL request no longer need to execute after waiting longer than read_request_timeout_in_ms. The coordinator no longer waits for the result so there is no point in proceeding with the read. This timeout should not apply for readers created for streaming. The streaming client currently times out after 10 minutes, so we could wait at least that long. Timing out sooner makes streaming unreliable, which under high load may prevent streaming from completing. The change sets no timeout for streaming readers at replica level, similarly as we do for system tables readers. Fixes #1741. Message-Id: <1475840678-25606-1-git-send-email-tgrabiec@scylladb.com>	2016-10-07 15:41:04 +03:00
Raphael S. Carvalho	9175977a9d	cql3: fix build failure by defining out unused function Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <cba6207278ea945ee750d78b189320443843a288.1475793747.git.raphaelsc@scylladb.com>	2016-10-07 08:45:18 +03:00
Avi Kivity	9ac441d3b5	range: adjust split_after to allow split_point outside input range Make split_after() more generic by allowing split_point to be anywhere, not just within the input range. If the split_point is before, the entire range is returned; and if it is after, stdx::nullopt is returned. "before" and "after" are not well defined for wrap-around ranges, but we are phasing them out and soon there will not be wrapping_range::split_after() users. This is a prerequisite for converting partition_range and friends to nonwrapping_range. Message-Id: <1475765099-10657-1-git-send-email-avi@scylladb.com>	2016-10-06 17:54:44 +02:00
Raphael S. Carvalho	7ea4513595	database: trigger compaction after loading new sstables Scylla wasn't trying to compact new sstables uploaded via 'nodetool refresh'. Thus, all new sstables were left uncompacted until user issued 'nodetool flush' or a new sstable was written which would trigger compaction too. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <bbdf274c8bb49f4bedeefcb85da78a6fb61a1232.1475535203.git.raphaelsc@scylladb.com>	2016-10-06 18:26:49 +03:00
Raphael S. Carvalho	9c59ccc52a	storage_service: improve log message for refresh 'No new SSTables were found for keyspace1.standard1' was printed if user uploaded new sstables to upload dir instead, and that is confusing. We should instead print that if new sstables weren't found in both cf and cf/upload dirs. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <90386f6255407697434213227ae7ff0de7464f99.1475535203.git.raphaelsc@scylladb.com>	2016-10-06 18:26:32 +03:00
Raphael S. Carvalho	76862d0d9c	main: start compaction procedure after commit log is replayed Commit log replay is a synchronous operation in bootstrap, so services will only be started after it's completed. By starting compaction before, less bandwidth will be available to both and consequently boot will be slowed down. Fix is simply about moving compaction, which is an asynchronous operation after commitlog replay is over. Fixes #1620. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d2a173a4ee4d474317b970c6b39530e61067fea9.1475527955.git.raphaelsc@scylladb.com>	2016-10-06 18:25:24 +03:00
Nadav Har'El	ee7ec10b11	CQL parser: "CREATE MATERIALIZED VIEW" statement This patch adds the parsing for the "CREATE MATERIALIZED VIEW" statement, following Cassandra 3 syntax. For example: CREATE MATERIALIZED VIEW building_by_city AS SELECT * FROM buildings WHERE city IS NOT NULL PRIMARY KEY(city, name); It also adds the "IS NOT NULL" operator needed for this purpose. As in Cassandra, "IS NOT NULL" can only be used for materialized view creation, and not in a normal SELECT. It can only be used with the NULL operand (i.e., "IS NOT 3" will be a syntax error). The current implementation of this statement just does some sanity checking (such as to verify that "city" is a valid column name and that the "building" base table exists), complains that materialized views are not yet supported: SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message="Failed parsing statement: [CREATE MATERIALIZED VIEW building_by_city AS SELECT * FROM buildings WHERE city IS NOT NULL PRIMARY KEY(city, name);] reason: unsupported operation: Materialized views not yet supported"> As mentioned above, the "IS NOT NULL" restriction is not allowed in ordinary selects not creating a materialized views: SELECT * FROM buildings WHERE city IS NOT NULL; InvalidRequest: code=2200 [Invalid query] message="restriction 'city IS NOT null' is only supported in materialized view creation" Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1475742927-30695-1-git-send-email-nyh@scylladb.com>	2016-10-06 15:42:37 +03:00
Glauber Costa	7146776d7c	fix sstable tests by not using the flush_reader if no region_group The latest virtual dirty patches broke the SSTable tests. The reason for this is that those tests will flush synthetic memtables that do not have a region_group attached to it. Normally in cases like this we would just give the flush_reader an empty region group. However, the memtable class constructor takes a region_group pointer and that can be null according to the interface. So we must conditionally test it. If there isn't a region_group involved, the virtual dirty accounting should be disabled: after all, we won't even have the baseline memory to begin with. One of the approaches to fix this could be to just provide null accounter classes to be used as a surrogate for the accounting classes in this case. However, since this is mostly used for tests, a much simpler way is to just revert back to the scanning reader in that case. The scanning reader is similar enough to the flush_reader, except that it can handle partial ranges, slices, and delegate accesses to an sstable post-flush. We don't need any of that, but as argued above, there is no need to remove it either. Signed-off-by: Glauber Costa <glommer@scylladb.com> Message-Id: <1475667271-60806-1-git-send-email-glommer@scylladb.com>	2016-10-05 12:44:21 +01:00
Avi Kivity	c94fb1bf12	build: reduce inclusions of messaging_service.hh Remove inclusions from header files (primary offender is fb_utilities.hh) and introduce new messaging_service_fwd.hh to reduce rebuilds when the messaging service changes. Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>	2016-10-05 11:46:49 +03:00
Avi Kivity	f8118d9fc2	Merge "Virtual dirty memory management" from Glauber "Description: ============ Scylla currently suffers from a brick wall behavior of the request throttler. Requests pile up until we reach the dirty memory limit, at which point we stop serving them until we have freed enough memory to allow for more requests. The problem is that freeing dirty memory means writing an SSTable to completion. That can take a long time, even if we are blessed with great disks. Those long waiting times can and will translate into timeouts. That is bad behavior. What this patch does is introduce one form of virtual dirty memory accounting. Instead of allowing 100 % of the dirty memory to be filled up until we stop accepting requests, we will do that when we reach 50 % of memory. However, instead of releasing requests only when an SSTable is fully written, we start releasing them when some memory was written. The practical effect of that, is that once we reach 50 % occupancy in our dirty memory region, we will bring the system from CPU speed to disk speed, and will start accepting requests only at the rate we are able to write memory back. Results ======= With this patchset running a load big enough to easily saturate the disk, (commitlog disabled to highlight the effects of the memtable writer), I am able to run scylla for many minutes, with timeouts occurring only when I run out of disk space, whereas without this patch a swarm of timeouts would start merely 2 seconds after the load started - and would never get stable. In V2, I have sent a set of graphs illustrating the performance of this solution. This version does not have any significant differences in that front. 
For details, please refer to https://groups.google.com/d/msg/scylladb-dev/iCvD-3Z-QqY/EM8KUh_MAQAJ Accuracy of the accounting: --------------------------- It is important for us to be as accurate as possible when accounting freed memory, since every byte we mark as freed may allow one or more requests to be executed. I have measured the accuracy of this approach (ignoring padding, object size for the mutation fragments) to be 99.83 % of used memory in the test workload I have run (large, 65k mutations). Memtables under this circumstance tend to have a very high occupancy ratio because throttle breeds idle, and idle breeds compact-on-idle. Known Issues: ------------- A lot of time can elapse between destroying the flush_reader and actually releasing memory. The release of memory only happens when the SSTable is fully sealed, and we have to flush the files, as well as finish writing all SSTable components at this point. This happened in practice with a buggy kernel that would result in flushes taking a long time. After that is fixed, this is just a theoretical problem and in practice it shouldn't matter given the time we expect those operations to take." * 'virtual-dirty-v6' of github.com:glommer/scylla: database: allow virtual dirty memory management streamed_mutation: make _buffer private add accounting of memory read to partition_snapshot_reader move partition_snapshot_reader code to header file LSA: allow a group to query its own region group memtables: split scanning reader in two sstables: use special reader for writing a memtable LSA: export information about object memory footprint LSA: export information about size of the throttle queue database: export virtual dirty bytes region group	2016-10-04 20:57:52 +03:00
Avi Kivity	cc33c8b4ba	Merge seastar upstream * seastar 18f7bb8...f937fb0 (5): > Merge "Fix signal mask corruption" from Tomasz > core/memory: Avoid violating strict aliasing when accessing allocation sites > core/memory: Avoid indirection when storing allocation sites > core/memory: Add a way to disable abort on allocation failure in some scope > core/sharded: Allow mapper to take the service by non-const reference	2016-10-04 20:08:57 +03:00
Glauber Costa	f89a67c75c	database: allow virtual dirty memory management Scylla currently suffers from a brick wall behavior of the request throttler. Requests pile up until we reach the dirty memory limit, at which point we stop serving them until we have freed enough memory to allow for more requests. The problem is that freeing dirty memory means writing an SSTable to completion. That can take a long time, even if we are blessed with great disks. Those long waiting times can and will translate into timeouts. That is bad behavior. What this patch does is introduce one form of virtual dirty memory accounting. Instead of allowing 100 % of the dirty memory to be filled up until we stop accepting requests, we will do that when we reach 50 % of memory. However, instead of releasing requests only when an SSTable is fully written, we start releasing them when some memory was written. The practical effect of that is that once we reach 50 % occupancy in our dirty memory region, we will bring the system from CPU speed to disk speed, and will start accepting requests only at the rate we are able to write memory back. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
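
Note on c0a1ad0b77 (bytes_ostream: use larger allocations): the arithmetic in that commit message can be checked with a small sketch. This is not the bytes_ostream code. The doubling factor and the 128 KiB cap below are assumptions chosen for illustration, since the message only says the chunk size grows "exponentially" while "respecting the upper limit"; count_allocations is a hypothetical helper.

```python
def count_allocations(total_bytes, initial_chunk=512, max_chunk=None, growth=1):
    """Count how many chunks are needed to hold total_bytes.

    growth=1 models a fixed chunk size; growth=2 models doubling.
    max_chunk (hypothetical value, not taken from the commit) caps the growth.
    """
    chunk = initial_chunk
    remaining = total_bytes
    allocations = 0
    while remaining > 0:
        allocations += 1          # one allocation per chunk
        remaining -= chunk        # the chunk absorbs up to `chunk` bytes
        chunk *= growth           # grow the next chunk (no-op when growth=1)
        if max_chunk is not None:
            chunk = min(chunk, max_chunk)
    return allocations


one_mib = 1 << 20
# Fixed 512-byte chunks: roughly the "2000 allocations" quoted in the commit.
fixed = count_allocations(one_mib)
# Doubling chunks with an assumed 128 KiB cap.
growing = count_allocations(one_mib, max_chunk=128 * 1024, growth=2)
print(fixed, growing)
```

With fixed 512-byte chunks a 1 MiB response needs 2048 allocations, in line with the "2000" in the message; under the assumed doubling policy the count drops by about two orders of magnitude.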

10529 Commits