scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-19 16:15:07 +00:00

Author	SHA1	Message	Date
Avi Kivity	d9700a2826	storage_proxy: don't query concurrently needlessly during range queries storage_proxy has an optimization where it tries to query multiple token ranges concurrently to satisfy very large requests (an optimization which is likely meaningless when paging is enabled, as it always should be). However, the rows-per-range code severely underestimates the number of rows per range, resulting in a large number of "read-ahead" internal queries being performed, the results of most of which are discarded. Fix by disabling this code. We should likely remove it completely, but let's start with a band-aid that can be backported. Fixes #1863. Message-Id: <20161120165741.2488-1-avi@scylladb.com> (cherry picked from commit `6bdb8ba31d`)	2016-11-21 18:19:59 +02:00
Glauber Costa	d2438059a7	database: keep a pointer to the memtable list in a memtable We currently pass a region group to the memtable, but after so many recent changes, that is a bit too low level. This patch changes that so we pass a memtable list instead. Doing that also has a couple of advantages. Mainly, during flush we must get from a memtable to a memtable_list. Currently we do that by going from the memtable to a column family through the schema, and from there to the memtable_list. That, however, involves calling virtual functions in a derived class, because a single column family could have both streaming and normal memtables. If we pass a memtable_list to the memtable, we can keep a pointer, and when needed get the memtable_list directly. Not only does that get rid of the inheritance for aesthetic reasons, but that inheritance is not even correct anymore. Since the introduction of the big streaming memtables, we now have a plethora of lists per column family and this traversal is totally wrong. We haven't noticed before because we were flushing the memtables based on their individual sizes, but it has been wrong all along for edge cases in which we would have to resort to size-based flush. This could be the case, for instance, with various plan_ids in flight at the same time. At this point, there is no more reason to keep the derived classes for the dirty_memory_manager. I'm only keeping them around to reduce clutter, although they are useful for the specialized constructors and to communicate to the reader exactly what they are. But those can be removed in a follow up patch if we want. The old memtable constructor signature is kept around for the benefit of two tests in memtable_tests which have their own flush logic. In the future we could do something like we do for the SSTable tests, and have a proxy class that is friends with the memtable class. That too, is left for the future. 
Fixes #1870 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <811ec9e8e123dc5fc26eadbda82b0bae906657a9.1479743266.git.glauber@scylladb.com> (cherry picked from commit `0ca8c3f162`)	2016-11-21 18:18:56 +02:00
Glauber Costa	4098831ebc	commitlog: wait for pending allocations to finish before closing gate. allocations may enter the gate, so it would be wise for us to wait for them. Fixes #1860 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <53cd6996c1cbd8b38bab3b03604bd11e5c20beda.1479650012.git.glauber@scylladb.com> (cherry picked from commit `21c1e2b48c`)	2016-11-20 20:00:32 +02:00
Glauber Costa	4539b8403a	database: fix direct flushes of non-durable column families. If a Column Family is non-durable, then its flushes will never create a memtable flush reader. Our current flush logic depends on that being created and destroyed to release the semaphore permits on the flush. We will remove the permits ourselves if there is an exception, but not under normal circumstances. Given this issue, however, it would be more adequate to always try to remove the permits after we flush. If the permits were already removed by the flush reader, then this test will just see that the permit is not in the map and return. But if it is still there, then it is removed. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <049334c3b4bef620af2c7c045e6c84347dcf9013.1479498026.git.glauber@scylladb.com> (cherry picked from commit `1933349654`)	2016-11-18 21:33:22 +01:00
Raphael S. Carvalho	558f535fcb	db: do not leak deleted sstable when deletion triggers an exception The leakage results in deleted sstables being opened until shutdown, and disk space isn't released. That's because column_family::rebuild_sstable_list() will not remove reference to deleted sstables if an exception was triggered in sstables::delete_atomically(). A sstable only has its files closed when its object is destructed. The exception happens when a major compaction is issued in parallel to a regular one, and one of them will be unable to delete a sstable already deleted by the other. That results in remove_by_toc_name() triggering boost::filesystem::filesystem_error because TOC and temporary TOC don't exist. We wouldn't have seen this problem if major compaction were going through compaction manager, but remove_by_toc_name() and rebuild_sstable_list() should be made resilient. Fixes #1840. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d43b2e78f9658e2c3c5bbb7f813756f18874bf92.1479390842.git.raphaelsc@scylladb.com> (cherry picked from commit `3dc9294023`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <760f96d81de0bab7507bb4f52c06b30f21e82577.1479420770.git.raphaelsc@scylladb.com>	2016-11-18 13:10:46 +02:00
Glauber Costa	3d45d0d339	fix shutdown and exception conditions for flush logic This patch addresses post-merge follow up comments by Tomek. Basically, what we do is: - we don't need to signal() from remove_from_flush_manager(), because the explicit flushes no longer wait on the condition variable. So we don't. - We now wait on the stop() flushes (regardless of their return status) so we can make sure that the _flush_queue will indeed be done with. - we acquire the semaphore before shutting down the dirty_memory_manager to make sure that there are no pending flushes - the flush manager that holds the semaphore has to match in the exception handler Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <a23ab5098934546c660a08de64cd9294bb3a2008.1479400239.git.glauber@scylladb.com> (cherry picked from commit `461778918b`)	2016-11-18 11:53:21 +02:00
Avi Kivity	affc0d9138	Merge "get rid of memtable size parameter and rework flush logic" from Glauber "This patchset allows Scylla to determine the size of a memtable instead of relying on the user-provided memtable_cleanup_threshold. It does that by allowing the region_group to specify a soft limit which will trigger the allocation as early as it is reached. Given that, we'll keep the memtables in memory for as long as it takes to reach that limit, regardless of the individual size of any single one of them. That limit is set to 1/4 of dirty memory. That's the same as last submission, except this time I have run some experiments to gauge behavior of that versus 1/2 of dirty memory, which was a preferred theoretical value. After that is done, the flush logic is reworked to guarantee that flushes are not initiated if we already have one memtable under flush. That allows us to better take advantage of coalescing opportunities with new requests and prevents the pending memtable explosion that is ultimately responsible for Issue 1817. I have run mainly two workloads with this. The first one is a local RF=1 workload with large partitions, sized 128kB and 100 threads. 
The results are: Before: op rate : 632 [WRITE:632] partition rate : 632 [WRITE:632] row rate : 632 [WRITE:632] latency mean : 157.8 [WRITE:157.8] latency median : 115.5 [WRITE:115.5] latency 95th percentile : 486.7 [WRITE:486.7] latency 99th percentile : 534.8 [WRITE:534.8] latency 99.9th percentile : 599.0 [WRITE:599.0] latency max : 722.6 [WRITE:722.6] Total partitions : 189667 [WRITE:189667] Total errors : 0 [WRITE:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:05:00 END After: op rate : 951 [WRITE:951] partition rate : 951 [WRITE:951] row rate : 951 [WRITE:951] latency mean : 104.8 [WRITE:104.8] latency median : 102.5 [WRITE:102.5] latency 95th percentile : 155.8 [WRITE:155.8] latency 99th percentile : 177.8 [WRITE:177.8] latency 99.9th percentile : 686.4 [WRITE:686.4] latency max : 1081.4 [WRITE:1081.4] Total partitions : 285324 [WRITE:285324] Total errors : 0 [WRITE:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:05:00 END The other workload was the workload described in #1817. And the result is that we now have a load that is very stable around 100k ops/s and hardly any timeouts, instead of the 1.4 baseline of wild variations around 100k ops/s and lots of timeouts, or the deep reduction of 1.5-rc1." * 'issue-1817-v4' of github.com:glommer/scylla: database: rework memtable flush logic get rid of max_memtable_size pass a region to dirty_memory_manager accounting API memtable: add a method to expose the region_group logalloc: allow region group reclaimer to specify a soft limit database: remove outdated comment database: uphold virtual dirty for system tables. (cherry picked from commit `5d067eebf2`)	2016-11-17 14:41:23 +02:00
Gleb Natapov	3c68504e54	sstables: fix ad-hoc summary creation If the sstable Summary is not present, Scylla does not refuse to boot but instead creates summary information on the fly. There is a bug in this code though. The Summary file is a map between keys and offsets into the Index file, but the code creates a map between keys and Data file offsets instead. Fix it by keeping the offset of an index entry in the index_entry structure and using it during Summary file creation. Fixes #1857. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20161116165421.GA22296@scylladb.com> (cherry picked from commit `ae0a2935b4`)	2016-11-17 11:45:29 +02:00
Raphael S. Carvalho	e9b26d547d	main: fix exception handling when initializing data or commitlog dirs Exception handling was broken because after io checker, storage_io_error exception is wrapped around system error exceptions. Also the message when handling exception wasn't precise enough for all cases. For example, lack of permission to write to existing data directory. Fixes #883. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <b2dc75010a06f16ab1b676ce905ae12e930a700a.1478542388.git.raphaelsc@scylladb.com> (cherry picked from commit `9a9f0d3a0f`)	2016-11-16 15:12:48 +02:00
Raphael S. Carvalho	8510389188	sstables: handle unrecognized sstable component As in C*, unrecognized sstable components should be ignored when loading a sstable. At the moment, Scylla fails to do so and will not boot as a result. In addition, unknown components should be remembered when moving a sstable or changing its generation. Fixes #1780. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <b7af0c28e5b574fd577a7a1d28fb006ac197aa0a.1478025930.git.raphaelsc@scylladb.com> (cherry picked from commit `53b7b7def3`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <e30115e089a4c3c3fb4aad956645c9d006c2ee55.1479141101.git.raphaelsc@scylladb.com>	2016-11-16 15:11:05 +02:00
Amnon Heiman	ea61a8b410	API: cache_capacity should use uint for summing Using integer as a type for the map_reduce causes numeric overflow. Fixes #1801 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1479299425-782-1-git-send-email-amnon@scylladb.com> (cherry picked from commit `a4be7afbb0`)	2016-11-16 15:03:15 +02:00
Paweł Dziepak	bd694d845e	partition_version: make sure that snapshot is destroyed under LSA Snapshot destructor may free some objects managed by the LSA. That's why partition_snapshot_reader destructor explicitly destroys the snapshot it uses. However, it was possible that an exception thrown by _read_section prevented that from happening, making the snapshot destroyed implicitly without the current allocator set to LSA. Refs #1831. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1478778570-2795-1-git-send-email-pdziepak@scylladb.com> (cherry picked from commit `f16d6f9c40`)	2016-11-16 14:34:11 +02:00
Paweł Dziepak	01c01d9ac4	query_pagers: distinct queries do not have clustering keys Query pager needs to handle results that contain partitions with possibly multiple clustering rows quite differently than results with just one row per partition (for example a page may end in the middle of a partition). However, the logic dealing with partitions with clustering rows doesn't work correctly for SELECT DISTINCT queries, which are much more similar to the ones for schemas without clustering key. The solution is to set _has_clustering_keys to false in case of SELECT DISTINCT queries regardless of the schema which will make pager correctly expect each partition to return at most one row. Fixes #1822. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1478612486-13421-1-git-send-email-pdziepak@scylladb.com> (cherry picked from commit `055d78ee4c`)	2016-11-16 10:17:34 +01:00
Paweł Dziepak	ed39e8c235	row_cache: touch entries read during range queries Fixes #1847. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1479230809-27547-1-git-send-email-pdziepak@scylladb.com> (cherry picked from commit `999dafbe57`)	2016-11-15 20:34:40 +00:00
Avi Kivity	c57835e7b5	Merge "Fixes for histogram and moving average calculations" from Glauber "JMX metrics were found to be either not showing, or showing absurd values. Turns out there were multiple things wrong with them. The patches were sent separately but conflict with one another. This series is a collection of the patches needed to fix the issues we saw. Fixes #1832, #1836, #1837" (cherry picked from commit `bf20aa722b`)	2016-11-13 11:42:53 +02:00
Amnon Heiman	13baa04056	API: fix a typo in storage_proxy This patch fixes a typo in the URL definition that prevented the JMX metric from finding it. Fixes #1821 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1478563869-20504-1-git-send-email-amnon@scylladb.com> (cherry picked from commit `c8082ccadb`)	2016-11-13 09:25:14 +02:00
Glauber Costa	298de37cef	histogram: moving averages: fix inverted parameters moving_averages constructor is defined like this: moving_average(latency_counter::duration interval, latency_counter::duration tick_interval) But when it is time to initialize them, we do this: ... {tick_interval(), std::chrono::minutes(1)} ... As it can be seen, the interval and tick interval are inverted. This leads to the metrics being assigned bogus values. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <d83f09eed20ea2ea007d120544a003b2e0099732.1478798595.git.glauber@scylladb.com> (cherry picked from commit `d3f11fbabf`)	2016-11-11 10:15:32 +02:00
Paweł Dziepak	91e5e50647	Merge "Remove quadratic behavior from atomic sstable deletion" from Avi "The atomic sstable deletion provides exception safety at the cost of quadratic behavior in the number of sstables awaiting deletion. This causes high cpu utilization during startup. Change the code to avoid quadratic complexity, and add some unit tests. See #1812." (cherry picked from commit `985d2f6d4a`)	2016-11-08 22:46:01 +02:00
Pekka Enberg	08b1ff53dd	release: prepare for 1.5.rc1	2016-11-02 13:39:53 +02:00
Pekka Enberg	0485289741	cql3: Fix selecting same column multiple times Under the hood, the selectable::add_and_get_index() function deliberately filters out duplicate columns. This causes simple_selector::get_output_row() to return a row with all duplicate columns filtered out, which triggers an assertion because of row mismatch with metadata (which contains the duplicate columns). The fix is rather simple: just make selection::from_selectors() use selection_with_processing if the number of selectors and column definitions don't match -- like Apache Cassandra does. Fixes #1367 Message-Id: <1477989740-6485-1-git-send-email-penberg@scylladb.com> (cherry picked from commit `e1e8ca2788`)	2016-11-01 09:33:19 +00:00
Avi Kivity	b3504e5482	Update seastar submodule * seastar 57a17ca...25137c2 (2): > reactor: improve task quota timer resolution > future: prioritise continuations that can run immediately Fixes #1794.	2016-10-28 14:17:26 +03:00
Avi Kivity	6cdb1256bb	Update seastar submodule * seastar e2c2bbc...57a17ca (1): > rpc: Avoid using zero-copy interface of output_stream (Fixes #1786)	2016-10-28 14:11:47 +03:00
Pekka Enberg	39b0da51a3	auth: Fix resource level handling We use `data_resource` class in the CQL parser, which lets users refer to a table resource without specifying a keyspace. This asserts out in get_level() for no good reason as we already know the intended level based on the constructor. Therefore, change `data_resource` to track the level like upstream Cassandra does and use that. Fixes #1790 Message-Id: <1477599169-2945-1-git-send-email-penberg@scylladb.com> (cherry picked from commit `b54870764f`)	2016-10-27 23:37:50 +03:00
Glauber Costa	0656e66f5f	auth: always convert string to upper case before comparing We store all auth perm strings in upper case, but the user might very well pass this in lower case. We could use a standard key comparator / hash here, but since the strings tend to be small, the new sstring will likely be allocated in the stack here and this approach yields significantly less code. Fixes #1791. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <51df92451e6e0a6325a005c19c95eaa55270da61.1477594199.git.glauber@scylladb.com> (cherry picked from commit `ef3c7ab38e`)	2016-10-27 22:11:02 +03:00
Avi Kivity	185fbb8abc	Merge "Cache fixes" from Paweł "5ff699e09fcbd62611e78b9de601f6c8636ab2f0 ("row_cache: rework cache to use fast forwarding reader") brought some significant changes to the row cache implementation. Unfortunately, "significant changes" often translates to "more bugs" and this time was no different. This series contains fixes for the problems introduced in that rework and makes failing dtest bootstrap_test.py:TestBootstrap.local_quorum_bootstrap_test pass again." * 'pdziepak/cache-fixes/v1' of github.com:cloudius-systems/seastar-dev: row_cache: avoid dereferencing invalid iterator row_cache: set _first_element flag correctly row_cache: fix clearing continuity flag at eviction (cherry picked from commit `72d78ffa7e`)	2016-10-27 11:45:20 +03:00
Tomasz Grabiec	4ed3d350cc	Update seastar submodule * seastar ab1531e...e2c2bbc (3): > rpc: do not assume underling semaphore type > rpc: fix default resource limit > rpc: Move _connected flag to protocol::connection	2016-10-26 10:00:52 +02:00
Tomasz Grabiec	72d4a26c43	Update seastar submodule * seastar f8e4e93...ab1531e (1): > rpc: Fix crash during connection teardown	2016-10-26 09:49:41 +02:00
Tomasz Grabiec	b582525ad8	Merge seastar upstream (This time for real) * seastar 69acec1...f8e4e93 (1): > rpc: Do not close client connection on error response for a timed out request Refs #1778	2016-10-25 13:53:01 +02:00
Tomasz Grabiec	5ca372e852	Merge seastar upstream * seastar 69acec1...f8e4e93 (1): > rpc: Do not close client connection on error response for a timed out request Refs #1778	2016-10-25 13:45:58 +02:00
Avi Kivity	fc8210a875	tests: fix tests with boost 1.60 In boost 1.60, the executable's command-line arguments are expected to be separated from the boost command-line arguments by '--'. Detect this requirement and comply with it. Message-Id: <1477212424-3831-1-git-send-email-avi@scylladb.com>	2016-10-24 09:36:56 +02:00
Avi Kivity	37f112b610	dist: add python3-yaml to Ubuntu dependencies for blocktune	2016-10-23 16:42:13 +03:00
Avi Kivity	7d50d6df9b	blocktune: fix syntax error in exception handling	2016-10-23 16:40:00 +03:00
Avi Kivity	e261a380a9	dist: add PyYAML dependency to rpm (for blocktune)	2016-10-23 10:36:29 +03:00
Raphael S. Carvalho	fa308c079c	database: fix collectd metrics for clustering key filter Same instance name was used for exported metrics, which is definitely wrong. Checked it works properly now via collectd exporter. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <471a36706113af60aeba86fb56a365feb4dab31a.1477086706.git.raphaelsc@scylladb.com>	2016-10-22 09:51:18 +03:00
Glauber Costa	a13c410749	commitlog: cycle based on total size, not on mutation size We calculate two sizes during the allocation: "size", which is the in-segment size of this mutation, and "s", which is that plus the overhead. cycle() must be called with the latter, not the former, as doing otherwise may lead to buffer overflows. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <ccf346d8d0ebb44a1ba9fd069653bab0d7be0a61.1477063157.git.glauber@scylladb.com>	2016-10-21 18:57:41 +03:00
Glauber Costa	d9875784a1	commitlog: do not wait on pending operations for batch mode This was explicitly mentioned in my set as gone in one of the versions. Somehow it came back in the final version - sorry about that. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <2a0eba28cd74267d1a1fdcf1aef2901cc74ffc9f.1477059963.git.glauber@scylladb.com>	2016-10-21 17:27:16 +03:00
Vlad Zolotarov	f75a350a8f	service::storage_proxy: use global_trace_state_ptr when using invoke_on When trace_state may migrate to a different shard a global_trace_state_ptr has to be used. This patch completes the patch below: commit `7e180c7bd3` Author: Vlad Zolotarov <vladz@cloudius-systems.com> Date: Tue Sep 20 19:09:27 2016 +0300 tracing: introduce the tracing::global_trace_state_ptr class Fixes #1770 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1476993537-27388-1-git-send-email-vladz@cloudius-systems.com>	2016-10-21 11:34:13 +03:00
Avi Kivity	e3ae54f0fe	Merge "Rework commitlog to avoid timeouts" from Glauber "This patchset reworks the commitlog logic to better handle conditions in which we are getting requests faster than the disk can handle. It does this by building a wall around the commitlog and only allowing allocations to proceed when we are under the desired memory threshold. The main advantage of that is that we can now easily set the commitlog to work at disk speed, more or less allowing a "one byte in for each byte out" approach instead of depending on the current cycle to finish. As a result, max latencies are greatly reduced. Testing Results =============== To test this, I have run a workload that times out frequently. That workload uses 10 threads to write 100 partitions (to isolate from the effects of the memtable introduced latencies) in a loop and each partition is 2MB in size. After 10 minutes running this load, we are left with the following percentiles: latency mean : 51.9 [WRITE:51.9] latency median : 9.8 [WRITE:9.8] latency 95th percentile : 125.6 [WRITE:125.6] latency 99th percentile : 1184.0 [WRITE:1184.0] latency 99.9th percentile : 1991.2 [WRITE:1991.2] latency max : 2338.2 [WRITE:2338.2] After this patch: latency mean : 54.9 [WRITE:54.9] latency median : 43.5 [WRITE:43.5] latency 95th percentile : 126.9 [WRITE:126.9] latency 99th percentile : 253.9 [WRITE:253.9] latency 99.9th percentile : 364.6 [WRITE:364.6] latency max : 471.4 [WRITE:471.4] I have run this with larger sizes as well, and it generally performs much better than the baseline version. For sizes up to 5MB, I have seen no timeouts in my setup. After that, I see some timeouts. Buffer splitting is expected to make this better. Aside from performance testing, this was also tested with batch and periodic mode for various requests sizes."	2016-10-20 16:44:39 +03:00
Glauber Costa	d5618c6ace	commitlog: add total_operations type for requests_blocked_memory Current tracker for pending allocations is a queue_size GAUGE. Add a total_operations version so we have more insight on what's going on. It will be called requests_blocked_memory for consistency with other subsystems that track similar things. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-20 09:25:38 -04:00
Avi Kivity	db2f5e6be1	blocktune: wire up blocktune on startup Message-Id: <1476357027-15014-3-git-send-email-avi@scylladb.com>	2016-10-20 13:24:05 +03:00
Avi Kivity	098d02ad1a	scylla-blocktune: introduce scylla-blocktune is a script that parses scylla.yaml and tunes the data file and commitlog directories it references. Tuning includes: - set the I/O scheduler to noop - disable merging - tune dependent devices (like RAID members) Message-Id: <1476357027-15014-2-git-send-email-avi@scylladb.com>	2016-10-20 13:24:05 +03:00
Avi Kivity	fad34eef6c	scylla_raid_setup: don't mess with read-ahead It doesn't affect O_DIRECT reads, and it's not persistent. Message-Id: <1476269082-2473-2-git-send-email-avi@scylladb.com>	2016-10-20 13:23:38 +03:00
Avi Kivity	a837da06ef	scylla_raid_setup: increase chunk size The current chunk size of 256 gives a 50% probability of a 128k read or write getting split into two accesses. This reduces efficiency and increases latency. Change the chunk size to 1MB, with a 12% probability of cross-member access. Message-Id: <1476269082-2473-1-git-send-email-avi@scylladb.com>	2016-10-20 13:23:38 +03:00
Takuya ASADA	80e3d8286c	dist/ami: fix incorrect /etc/fstab entry on CentOS7 base image There was incorrect rootfs entry on /etc/fstab: /dev/sda1 / xfs defaults,noatime 1 1 This causes boot error when updated to new kernel. (see: https://github.com/scylladb/scylla/issues/1597#issuecomment-250243187) So replaced the entry to UUID=<uuid> / xfs defaults,noatime 1 1 Also all recent security updates applied. Fixes #1597 Fixes #1707 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1475094957-9464-1-git-send-email-syuu@scylladb.com>	2016-10-20 11:48:24 +03:00
Takuya ASADA	5f602752a5	dist/ubuntu: backport g++-5 from Debian 9(stretch) to Debian 8(jessie) Since Debian 8(jessie) does not provide g++-5, we frequently got compile errors because we are using an older compiler. To fix the problem, backport g++-5 from Debian 9(stretch). Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1476694318-10640-3-git-send-email-syuu@scylladb.com>	2016-10-20 11:41:02 +03:00
Takuya ASADA	7d67504b56	dist/ubuntu: use VERSION_ID from /etc/os-release instead of 'lsb_release -r' On Debian, lsb_release -r returns the version number something like '8.6'. However, on this script we want to check major version only. Therefore we can use VERSION_ID from /etc/os-release which only contains major version number. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1476694318-10640-2-git-send-email-syuu@scylladb.com>	2016-10-20 11:41:02 +03:00
Avi Kivity	0da2f64cfb	Merge seastar upstream * seastar ccd8649...69acec1 (2): > app/iotune: add --smp option > rpc: Add missing adjustment of snd_buf::size Fixes #1767. Fixes #1768.	2016-10-20 11:16:40 +03:00
Paweł Dziepak	210a390892	tests: add missing sstable for partition skipping test Commit `7dcd70124a` "tests/sstables: add test for fast forwarding reader" added a test for skipping parts of sstable. Unfortunately, it did not include the sstables it was trying to read. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 23:23:49 +01:00
Glauber Costa	1578d7363a	commitlog: rework blocking logic The current incarnation of commitlog establishes a maximum amount of writes that can be in-flight, and blocks new requests after that limit is reached. That is obviously something we must do, but the current approach to it is problematic for two main reasons: 1) It forces the requests that trigger a write to wait on the current write to finish. That is excessive; ideally we would wait for one particular write to finish, not necessarily the current one. That is made worse by the fact that when a write is followed by a flush (happens when we move to a new segment), then we must wait for all writes in that segment to finish. 2) It casts concurrency in terms of writes instead of memory, which makes the aforementioned problem a lot worse: if we have very big buffers in flight and we must wait for them to finish, that can take a long time, often in the order of seconds, causing timeouts. The approach taken by this patch is to replace the _write_semaphore with a request_controller. This data structure will account the amount of memory used by the buffers and set a limit on it. New allocations will be held until we go below that limit, and will be released as soon as this happens. This guarantees that the latencies introduced by this mechanism are spread out a lot better among requests and will keep higher percentile latencies in check. To test this, I have run a workload that times out frequently. That workload uses 10 threads to write 100 partitions (to isolate from the effects of the memtable introduced latencies) in a loop and each partition is 2MB in size. 
After 10 minutes running this load, we are left with the following percentiles: latency mean : 51.9 [WRITE:51.9] latency median : 9.8 [WRITE:9.8] latency 95th percentile : 125.6 [WRITE:125.6] latency 99th percentile : 1184.0 [WRITE:1184.0] latency 99.9th percentile : 1991.2 [WRITE:1991.2] latency max : 2338.2 [WRITE:2338.2] After this patch: latency mean : 54.9 [WRITE:54.9] latency median : 43.5 [WRITE:43.5] latency 95th percentile : 126.9 [WRITE:126.9] latency 99th percentile : 253.9 [WRITE:253.9] latency 99.9th percentile : 364.6 [WRITE:364.6] latency max : 471.4 [WRITE:471.4] Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:56:36 -04:00
Glauber Costa	aec724bbda	commitlog: factor out code for checking mutation size In a subsequent patch, I'll use this code in a different place. To prepare for that, we move it out as a method. It also fits a lot better inside the segment manager, so move it there. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
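
The "histogram: moving averages: fix inverted parameters" entry (298de37cef) in the log above describes a constructor whose two parameters share one type, so swapped arguments compile silently. A minimal sketch of one way to turn such a swap into a compile error follows. The wrapper types `window` and `tick`, the stand-in `moving_average` class, and the 60 s / 5 s values are all assumptions made for illustration; none of them is taken from Scylla or from the patch.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical strong types (not Scylla's API): each wraps a duration so the
// two constructor parameters can no longer be interchanged by accident.
struct window { std::chrono::seconds value; };
struct tick   { std::chrono::seconds value; };

// Stand-in for a moving-average helper; only the constructor shape matters.
class moving_average {
    std::chrono::seconds _interval;
    std::chrono::seconds _tick_interval;
public:
    constexpr moving_average(window interval, tick tick_interval)
        : _interval(interval.value), _tick_interval(tick_interval.value) {}

    constexpr std::chrono::seconds interval() const { return _interval; }
    constexpr std::chrono::seconds tick_interval() const { return _tick_interval; }

    // Number of ticks that fit in one averaging window.
    constexpr std::chrono::seconds::rep ticks_per_window() const {
        return _interval.count() / _tick_interval.count();
    }
};

// A one-minute window sampled every five seconds (illustrative values).
// Writing moving_average(tick{...}, window{...}) here would not compile.
constexpr moving_average make_one_minute_average() {
    return moving_average(window{std::chrono::minutes(1)},
                          tick{std::chrono::seconds(5)});
}

// Runtime sanity check of the illustrative values.
inline void check_one_minute_average() {
    assert(make_one_minute_average().ticks_per_window() == 12);
}
```

The design choice is that the compiler, rather than a reviewer, checks argument order; the cost is a little extra ceremony at each call site.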

10609 Commits
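
The "scylla_raid_setup: increase chunk size" entry (a837da06ef) in the log above quotes a 50% chance of a 128k access being split with a chunk size of 256, and about 12% with 1MB. Those figures are consistent with a simple model, sketched below under two assumptions that the commit message does not state: the chunk size of 256 is in KiB, and an access starts at a uniformly random offset within a chunk. Under that model, an access no larger than the chunk crosses a boundary with probability request size / chunk size. The function name is hypothetical.

```cpp
#include <cassert>

// Simplified model (an assumption, not taken from the patch): the request
// starts at a uniformly random offset within a chunk and is no larger than
// the chunk, so it crosses a chunk boundary with probability request / chunk.
constexpr double split_probability(double request_kib, double chunk_kib) {
    return request_kib / chunk_kib;
}

// Check the two chunk sizes mentioned in the commit message.
inline void check_chunk_sizes() {
    assert(split_probability(128, 256) == 0.5);     // old chunk size: 50%
    assert(split_probability(128, 1024) == 0.125);  // 1MB chunk: 12.5%
}
```

The 12.5% result matches the "12%" in the commit message after rounding down.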