scylladb

Author	SHA1	Message	Date
Raphael S. Carvalho	3dc9294023	db: do not leak deleted sstable when deletion triggers an exception The leakage results in deleted sstables being opened until shutdown, and disk space isn't released. That's because column_family::rebuild_sstable_list() will not remove reference to deleted sstables if an exception was triggered in sstables::delete_atomically(). A sstable only has its files closed when its object is destructed. The exception happens when a major compaction is issued in parallel to a regular one, and one of them will be unable to delete a sstable already deleted by the other. That results in remove_by_toc_name() triggering boost::filesystem::filesystem_error because TOC and temporary TOC don't exist. We wouldn't have seen this problem if major compaction were going through compaction manager, but remove_by_toc_name() and rebuild_sstable_list() should be made resilient. Fixes #1840. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d43b2e78f9658e2c3c5bbb7f813756f18874bf92.1479390842.git.raphaelsc@scylladb.com>	2016-11-17 17:46:36 +02:00
Glauber Costa	f08162e181	database: rework memtable flush logic The way we currently flush memtables, we seal the current one but wait on a semaphore for the actual flush to proceed. This is pointless, because if the flush is not proceeding we'll use up memory for the new entries anyway, be they in a newly opened memtable or not. As a matter of fact, by opening a new memtable we are forgoing coalescing opportunities. After recent changes to the flush paths, we are now in a position to do differently. We move the semaphore earlier, and if we can't acquire it we keep appending to the current memtable. For explicit flushes, we'll queue and prioritize them over memory-based flushes. This has the nice property of potentially coalescing various flushes for the same CF into one. Coalescing flushes for the same CF is particularly helpful for commitlog-initiated flushes that can't complete within the flush period. What we see currently is that under heavy load the commitlog will keep sealing memtables, adding to the existing load. Another interesting property of this approach is that we can keep the disk utilization higher, by allowing a new flush to start before the memtable is fully sealed. By design, every time a memtable is finished flushing it will call revert_potentially_cleaned_up_memory() to revert the virtual memory charges. That is the perfect moment for us to act. It indicates that all the data flushing part is done. The way we'll do it is by keeping the semaphore_units alive for this memtable. When the flush ends, we destroy that object. This will effectively trigger the next flush if there is a next flush that can be initiated. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-11-16 21:20:58 -05:00
Glauber Costa	895e838ac0	get rid of max_memtable_size After recent changes to the memtable code, there is no reason for us to uphold a maximum memtable size. Now that we only flush one memtable at a time anyway, and also have soft limit notifications from the region_group_reclaimer, we can just set the soft limit to the target size and let all of that be handled by the dirty_memory_manager. It does have the added property that we'll be flushing when we globally reach the soft limit threshold. In conditions in which we have multiple CF writes fighting for memory, that guarantees that we will start flushing much earlier than the hard limit. The threshold is set to 1/4 of dirty memory. While in theory we would prefer the memtables to go as big as 1/2 of dirty memory, in my experiments I have found 1/4 to be a better fit, at least for the moment. The reason for such behavior is that in situations where we have slow disks, setting the soft limit to 1/2 of dirty will put us in a situation in which we may not have finished writing down the memtable when we hit the limit, and then throttle. When we set the threshold to 1/4 of dirty, we don't throttle at all. This behavior could potentially be fixed by not doing the full memtable-based throttling after we do the commitlog throttling, but that is not something realistic for the moment. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-11-16 21:20:24 -05:00
Glauber Costa	da738a6cd1	database: remove outdated comment Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-11-16 21:20:23 -05:00
Glauber Costa	919de98aa5	database: uphold virtual dirty for system tables. Currently the virtual dirty mechanism is not properly set for system tables. We haven't divided the system table allowance by two, which means it won't start throttling earlier as it was supposed to. In practice, this has little effect because system table requests are very well behaved, their sizes well known, and they tend to be force-flushed. But we should be consistent. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-11-16 21:20:23 -05:00
Gleb Natapov	27e041606b	fix LOCAL_ONE printout Message-Id: <20161109125307.GH7766@scylladb.com>	2016-11-09 12:53:55 +00:00
Tomasz Grabiec	c1a7e2090e	Revert "database: change find_column_families signature so it returns a lw_shared_ptr" This reverts commit `f3528ede65`.	2016-11-04 10:48:21 +01:00
Tomasz Grabiec	3b5ccda70e	Revert "database: refactor code so apply_in_memory() is called only once" This reverts commit `3f825f593d`.	2016-11-04 10:48:18 +01:00
Tomasz Grabiec	6366eb5cf8	Revert "correctly calculate latencies for writes" This reverts commit `a382f10fc4`.	2016-11-04 10:48:02 +01:00
Tomasz Grabiec	a5ee87611a	Revert "database: when querying, move latency counter instead of copying" This reverts commit `8840a5a593`.	2016-11-04 10:47:58 +01:00
Glauber Costa	8840a5a593	database: when querying, move latency counter instead of copying It is comprised of two time points. Let's move it instead of copying it. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <c7c155c77780e188bfbe05881c81ce86456016d5.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Glauber Costa	a382f10fc4	correctly calculate latencies for writes Right now we are calculating latencies only when we are about to add an item to the memtable. That's incorrect and misleading, for two reasons. First, it leaves the commitlog latencies out. But second, it is done after the memtable wall effect is applied, which means we are not counting throttle time either in the memtables or in the commitlog. To do that, we'll start the latency_counter object as soon as possible and move it all the way to apply_in_memory(). That should span the entire write operation. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <4e424780d290fd5938046060df2b17e2b470b717.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Glauber Costa	3f825f593d	database: refactor code so apply_in_memory() is called only once There are two variants of apply_in_memory() being called in do_apply(): with and without the commitlog. The main differences are that when the commitlog is involved, we need to wait for its future to complete before moving to apply_in_memory. That can easily be factored out by providing an always-ready future if we don't have the commitlog enabled, and waiting on that. The second, is that the commitlog version can cause apply_in_memory to generate an exception if there is replay position reordering. However, there is no harm in appending the exception handler to both versions. In one of them it's an impossible exception, but that's fine. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <8cee0cad9b1930a057a24e095f0a655069ae8be2.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Glauber Costa	f3528ede65	database: change find_column_families signature so it returns a lw_shared_ptr There are places in which we need to use the column family object many times, with deferring points in between. Because the column family may have been destroyed in the deferring point, we need to go and find it again. If we use lw_shared_ptr, however, we'll be able to at least guarantee that the object will be alive. Some users will still need to check, if they want to guarantee that the column family wasn't removed. But others that only need to make sure we don't access an invalid object will be able to avoid the cost of re-finding it just fine. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Raphael S. Carvalho	d11e839520	db: make refresh resilient to permission denied error User may forget to set permission of new sstables in upload dir before refreshing them, and that will result in shutdown. io_checker is now able to work with a custom handler, so all we have to do is to whitelist EACCES. Fixes #1709. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-10-27 16:50:40 -02:00
Raphael S. Carvalho	a3e065da9b	db: make it possible to use custom error handler with io checker By default, io checker will cause Scylla to shut down if it finds specific system errors. Right now, io checker isn't flexible enough to allow a specialized handler. For example, we don't want Scylla to shut down if there's a permission problem when uploading new files from upload dir. This desired flexibility is made possible here by allowing a handler parameter to io check functions and also changing existing code to take advantage of it. That's a step towards fixing #1709. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-10-27 15:54:21 -02:00
Raphael S. Carvalho	bc2d351c25	sstables: remove duplicated declaration of remove_by_toc_name Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-10-26 11:21:27 -02:00
Raphael S. Carvalho	fa308c079c	database: fix collectd metrics for clustering key filter Same instance name was used for exported metrics, which is definitely wrong. Checked it works properly now via collectd exporter. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <471a36706113af60aeba86fb56a365feb4dab31a.1477086706.git.raphaelsc@scylladb.com>	2016-10-22 09:51:18 +03:00
Paweł Dziepak	6755a679f6	drop key readers key_readers weren't used since introduction of continuity flag to cache entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	7bebfb851f	database: enable fast forwarding of range_sstable_reader When fast forwarding a reader that combines sstable reader we must also remember that the set of sstables for the new range may be different than for the previous one. The reader introduced in this patch makes sure that we read from correct sstables. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Tomasz Grabiec	4357d0a6d9	db: Add counter for writes blocked on dirty memory There is already queue_length-requests_blocked_memory, but it's a gauge so does not reflect what happened between the sampling points. total_operations-requests_blocked_memory will allow to see if there were any (and how many) requests which were blocked by dirty memory. Message-Id: <1476098616-12682-1-git-send-email-tgrabiec@scylladb.com>	2016-10-10 14:25:22 +03:00
Glauber Costa	33e9c2bbdd	memtable: reduce sstable flush concurrency to one Limiting the concurrency of memtable flushes to 4 was a temporary workaround for the fact that we lacked good write behind support. Now that write behind is properly merged we can reduce the concurrency to what it should be, one. This means that memtable flushes will now be serialized, and only when one of them ends will the next one begin. Disk parallelism is obtained through the write-behind mechanism. Fixes #1373 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <528f9ef928b5101bed952df600eb8555c275497a.1475881100.git.glauber@scylladb.com>	2016-10-09 10:48:57 +03:00
Tomasz Grabiec	2a5a90f391	db: Do not timeout streaming readers There is a limit to concurrency of sstable readers on each shard. When this limit is exhausted (currently 100 readers) readers queue. There is a timeout after which queued readers are failed, equal to read_request_timeout_in_ms (5s by default). The reason we have the timeout here is primarily because the readers created for the purpose of serving a CQL request no longer need to execute after waiting longer than read_request_timeout_in_ms. The coordinator no longer waits for the result so there is no point in proceeding with the read. This timeout should not apply for readers created for streaming. The streaming client currently times out after 10 minutes, so we could wait at least that long. Timing out sooner makes streaming unreliable, which under high load may prevent streaming from completing. The change sets no timeout for streaming readers at replica level, similarly as we do for system tables readers. Fixes #1741. Message-Id: <1475840678-25606-1-git-send-email-tgrabiec@scylladb.com>	2016-10-07 15:41:04 +03:00
Raphael S. Carvalho	7ea4513595	database: trigger compaction after loading new sstables Scylla wasn't trying to compact new sstables uploaded via 'nodetool refresh'. Thus, all new sstables were left uncompacted until user issued 'nodetool flush' or a new sstable was written which would trigger compaction too. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <bbdf274c8bb49f4bedeefcb85da78a6fb61a1232.1475535203.git.raphaelsc@scylladb.com>	2016-10-06 18:26:49 +03:00
Avi Kivity	f8118d9fc2	Merge "Virtual dirty memory management" from Glauber "Description: ============ Scylla currently suffers from a brick wall behavior of the request throttler. Requests pile up until we reach the dirty memory limit, at which point we stop serving them until we have freed enough memory to allow for more requests. The problem is that freeing dirty memory means writing an SSTable to completion. That can take a long time, even if we are blessed with great disks. Those long waiting times can and will translate into timeouts. That is bad behavior. What this patch does is introduce one form of virtual dirty memory accounting. Instead of allowing 100 % of the dirty memory to be filled up until we stop accepting requests, we will do that when we reach 50 % of memory. However, instead of releasing requests only when an SSTable is fully written, we start releasing them when some memory was written. The practical effect of that, is that once we reach 50 % occupancy in our dirty memory region, we will bring the system from CPU speed to disk speed, and will start accepting requests only at the rate we are able to write memory back. Results ======= With this patchset running a load big enough to easily saturate the disk, (commitlog disabled to highlight the effects of the memtable writer), I am able to run scylla for many minutes, with timeouts occurring only when I run out of disk space, whereas without this patch a swarm of timeouts would start merely 2 seconds after the load started - and would never get stable. In V2, I have sent a set of graphs illustrating the performance of this solution. This version does not have any significant differences in that front. For details, please refer to https://groups.google.com/d/msg/scylladb-dev/iCvD-3Z-QqY/EM8KUh_MAQAJ Accuracy of the accounting: --------------------------- It is important for us to be as accurate as possible when accounting freed memory, since every byte we mark as freed may allow one or more requests to be executed. I have measured the accuracy of this approach (ignoring padding, object size for the mutation fragments) to be 99.83 % of used memory in the test workload I have run (large, 65k mutations). Memtables under this circumstance tend to have a very high occupancy ratio because throttle breeds idle, and idle breeds compact-on-idle. Known Issues: ------------- A lot of time can be elapsed between destroying the flush_reader and actually releasing memory. The release of memory only happens when the SSTable is fully sealed, and we have to flush the files, as well as finish writing all SSTable components at this point. This happened in practice with a buggy kernel that would result in flushes taking a long time. After that is fixed, this is just a theoretical problem and in practice it shouldn't matter given the time we expect those operations to take." * 'virtual-dirty-v6' of github.com:glommer/scylla: database: allow virtual dirty memory management streamed_mutation: make _buffer private add accounting of memory read to partition_snapshot_reader move partition_snapshot_reader code to header file LSA: allow a group to query its own region group memtables: split scanning reader in two sstables: use special reader for writing a memtable LSA: export information about object memory footprint LSA: export information about size of the throttle queue database: export virtual dirty bytes region group	2016-10-04 20:57:52 +03:00
Glauber Costa	f89a67c75c	database: allow virtual dirty memory management Scylla currently suffers from a brick wall behavior of the request throttler. Requests pile up until we reach the dirty memory limit, at which point we stop serving them until we have freed enough memory to allow for more requests. The problem is that freeing dirty memory means writing an SSTable to completion. That can take a long time, even if we are blessed with great disks. Those long waiting times can and will translate into timeouts. That is bad behavior. What this patch does is introduce one form of virtual dirty memory accounting. Instead of allowing 100 % of the dirty memory to be filled up until we stop accepting requests, we will do that when we reach 50 % of memory. However, instead of releasing requests only when an SSTable is fully written, we start releasing them when some memory was written. The practical effect of that is that once we reach 50 % occupancy in our dirty memory region, we will bring the system from CPU speed to disk speed, and will start accepting requests only at the rate we are able to write memory back. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
Raphael S. Carvalho	747b42299c	database: remove unused code Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <95e1ed590c9e45d15f19a84824a4dce05aefdab8.1475528611.git.raphaelsc@scylladb.com>	2016-10-04 09:26:43 +03:00
Raphael S. Carvalho	a3bf7558f2	lcs: fix broken token range distribution at higher levels Uniform token range distribution across sstables in a level > 1 was broken, because we were only choosing sstable with lowest first key, when compacting a level > 0. This resulted in performance problem because L1->L2 may have a huge overlap over time, for example. Last compacted key will now be stored for each level to ensure sort of "round robin" selection of sstables for compactions at level >= 1. That's also done by C*, and they were once affected by it as described in https://issues.apache.org/jira/browse/CASSANDRA-6284. Fixes #1719. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-30 14:09:16 -03:00
Glauber Costa	f5fd6bd714	LSA: export information about size of the throttle queue Also add information about for how long has the oldest been sitting in the queue. This is part of the backpressure work to allow us to throttle incoming requests if we won't have memory to process them. Shortages can happen in all sorts of places, and it is useful when designing and testing the solutions to know where they are, and how bad they are. This counter is named for consistency after similar counters from transport/. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-09-27 12:09:08 -04:00
Glauber Costa	aa6a96d09b	database: export virtual dirty bytes region group Currently, we export the region group where memtables are placed as dirty bytes. Upcoming patches will optimistically mark some bytes in this region as free, a scheme we know as "virtual dirty". We are still interested in knowing the real state of the dirty region, so we will keep track of the bytes virtually freed and split the counters in two. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-09-27 12:09:08 -04:00
Asias He	b505e34062	database: Introduce make_streaming_reader The make_streaming_reader returns a combined mutation reader that reads mutations from sstables and memtable. The memtable reader handles memtable flushing automatically so no special handling is needed here. It will be used by streaming soon.	2016-09-26 16:02:48 +08:00
Raphael S. Carvalho	67343798cf	api: implement api to return sstable count per level 'nodetool cfstats' wasn't showing per-level sstable count because the API wasn't implemented. Fixes #1119. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <0dcdf9196eaec1692003fcc8ef18c77d0834b2c6.1474410770.git.raphaelsc@scylladb.com>	2016-09-21 09:13:40 +03:00
Raphael S. Carvalho	dffb41f9d8	sstables: remove schema parameter from some sstable methods schema can now be found in the sstable object itself. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <0fa44fedbe784d924522d7eeca77c16294479c6e.1473959677.git.raphaelsc@scylladb.com>	2016-09-19 13:25:58 +02:00
Calle Wilund	f126cf769a	column_family: Ensure flush() waits for all previous flushes + self Fixes #1577 Message-Id: <1472569952-4066-1-git-send-email-calle@scylladb.com>	2016-09-14 11:00:41 +01:00
Tomasz Grabiec	a498da1987	database: Ignore spaces in initial_token list Currently we get boost::lexical_cast on startup if initial_token has a list which contains spaces after commas, e.g.: initial_token: -1100081313741479381, -1104041856484663086, ... Fixes #1664. Message-Id: <1473840915-5682-1-git-send-email-tgrabiec@scylladb.com>	2016-09-14 11:58:13 +03:00
Raphael S. Carvalho	b9f67351da	db: expose clustering filter info via collectd That's needed to observe behavior of clustering filter, and to check if it's worthwhile for a specific workload. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 11:32:23 -03:00
Raphael S. Carvalho	a2dc88889d	db: enable clustering optimization only on dtcs Leveled strategy will not benefit from this strategy because there's only a few sstables that will contain a given partition key, which means that a clustering key that belongs to a specific partition key can only be in a few sstables as well. Date tiered strategy is the one that will actually benefit the most from this optimization. Size tiered may benefit from it too if clustering key isn't overwritten, but it will not use the clustering optimization. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 11:31:07 -03:00
Raphael S. Carvalho	8d03ccd604	sstables: optimize reads with clustering filter If user specifies a clustering filter, it's possible to filter out sstable based on its metadata that tracks min/max clustering value. For example, if sstable stores clustering key from 'a' through 'c', it's possible to filter out that sstable if user asks for data with clustering key greater than 'c'. That's done by comparing each component separately because clustering key may be composite. Further information can be found here: https://issues.apache.org/jira/browse/CASSANDRA-5514 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:51:50 -03:00
Raphael S. Carvalho	004617839d	database: check bloom filter of all sstables earlier All sstables will now have bloom filter checked in a single pass before the reader iterates through all candidates. It's possible that we will need to futurize the procedure if it holds cpu for too long. This change is also a step towards the optimization that will rule out sstables based on clustering filter. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:50:08 -03:00
Raphael S. Carvalho	1f31223f32	sstables: store schema in sstable object That will be needed for optimization that will store decorated keys in the sstable object, and also for a subsequent work that will detect wrong metadata (min/max column names) by looking at columns in the schema. As schema is stored in sstable, there's no longer a need to store ks and cf names in it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:49:17 -03:00
Glauber Costa	dc5d8e33af	Revert "row_cache: update sstable histograms on cache hits" This reverts commit `1726b1d0cc`. Reverting this patch turns our SSTable access counter into a miss counter only. The estimated histogram always starts its first bucket at 1, so by marking cache accesses we will be wrongly feeding "1" into the buckets. Notice that this is not yet ideal: nodetool is supposed to show a histogram of all reads, and by doing this we are changing its meaning slightly. Workloads that serve mostly from cache will be distorted towards their misses. The real solution is to use a different histogram, but we will need to enforce a newer version of nodetool for that: the current issue is that nodetool expects an EstimatedHistogram in a specific format in the other side. Conflicts: row_cache.hh Message-Id: <a599fa9e949766e7c9697450ae34fc28e881e90a.1472742276.git.glauber@scylladb.com> Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-09-01 18:07:31 +03:00
Duarte Nunes	ba374da043	database: Trace sstable accesses This patch traces when we read from an sstable, be it a key range or a single one. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-09-01 12:04:32 +02:00
Duarte Nunes	030db65c62	database: Accept a trace_state_ptr This patch changes the database and column_family types so a trace_state_ptr can be passed in when querying. This enables tracing of the inner components. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-09-01 12:04:28 +02:00
Glauber Costa	1726b1d0cc	row_cache: update sstable histograms on cache hits If we have a cache hit, we still need to update our sstable histogram - noting that we have touched 0 SSTables. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-08-31 15:14:22 -04:00
Glauber Costa	ce24fd05fe	database: keep statistics on SSTables touched per read That is done for single partition queries only - mimicking what Cassandra does on that matter. For this to be correct, we also need to update this histogram on cache hits - in which case we update the read as having touched 0 SSTables. That will be done on a separate patch. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-08-31 15:14:21 -04:00
Piotr Jastrzebski	3607d99269	Remove clustering_key_filtering_context. Remove clustering_key_filter_factory and clustering_key_filtering_context. Use partition_slice directly with a static get_ranges method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 20:31:55 +02:00
Raphael S. Carvalho	108fd1fade	database: close file in lister After listing is done, let's close file. This fixes no bug. It's only an improvement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <2f52d297bcf6a6b6e3429912c28f17e6b37f8842.1471381607.git.raphaelsc@scylladb.com>	2016-08-17 11:01:44 +03:00
Glauber Costa	b361dee488	database: memtables pending flushes tell us nothing We have two counters that track how many memtable flushes are in progress, and how much memory they are pinning. The problem is, after we have revamped the code to limit the amount of flushes in progress, those counters became useless: as they live inside the semaphore side, they will only be incremented once we have passed the semaphore. One wouldn't notice if working with CPU-bound problems, where memtables don't pile. But as soon as they do, those counters will always show the same numbers: the depth of the semaphore, which doesn't mean much. The problem is poised to become much worse: once we enable write behind in full and set the semaphore's depth to one, that's the number we'll see here all the time. The fix is to move the counters outside the semaphore, which will bring back its old semantics. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <c5ae6903e170f3f356cdda7ed78a4c9ba8d5f024.1471370504.git.glauber@scylladb.com>	2016-08-17 10:54:15 +03:00
Paweł Dziepak	8a386a51bd	Merge "Don't cache wide partitions" from Piotr "When reading a partition try to read it all but once more bytes are read than a given limit we decide that partition is wide and we don't cache it. Instead we retry the read with clustering key filtering applied."	2016-07-21 10:24:25 +01:00
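
The "virtual dirty" accounting described in commits f8118d9fc2 and f89a67c75c above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Scylla's implementation: the class `dirty_memory_tracker` and all of its members are hypothetical names, and it assumes a single flush in progress at a time (as in commit 33e9c2bbdd). Writes are admitted while virtual dirty memory stays below 50 % of the budget, and bytes already written by an in-progress flush count as freed before the SSTable is complete.

```cpp
#include <algorithm>
#include <cstddef>

// Toy model of "virtual dirty" accounting. Every name here is hypothetical
// and chosen for illustration; none of it is taken from the Scylla sources.
class dirty_memory_tracker {
    std::size_t _limit;                // total dirty memory budget, in bytes
    std::size_t _real_dirty = 0;       // bytes actually held by memtables
    std::size_t _virtually_freed = 0;  // bytes already written by the current flush
public:
    explicit dirty_memory_tracker(std::size_t limit) : _limit(limit) {}

    std::size_t real_dirty() const { return _real_dirty; }
    std::size_t virtual_dirty() const { return _real_dirty - _virtually_freed; }

    // Writes are admitted only while virtual dirty is below half of the budget.
    bool can_accept_write() const { return virtual_dirty() < _limit / 2; }

    void on_write(std::size_t bytes) { _real_dirty += bytes; }

    // The flush wrote `bytes` of memtable data to disk. The memory is still
    // held, but it is counted as free for throttling purposes.
    void on_flush_progress(std::size_t bytes) {
        _virtually_freed = std::min(_virtually_freed + bytes, _real_dirty);
    }

    // The flush finished: the memtable's memory is really released and the
    // virtual charges for it are reverted.
    void on_flush_complete(std::size_t memtable_bytes) {
        _real_dirty -= std::min(memtable_bytes, _real_dirty);
        _virtually_freed = 0;
    }
};
```

With a budget of 1000 bytes, a tracker that has absorbed 500 bytes of writes stops admitting requests; once the flush reports 100 bytes written, virtual dirty drops to 400 and writes are admitted again, at roughly the rate the disk drains memory.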

645 Commits