scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-23 08:12:08 +00:00

Author	SHA1	Message	Date
Calle Wilund	b12b65db92	commitlog/replayer: Bugfix: minimum rp broken, and cl reader offset too The previous fix removed the additional insertion of "min rp" per source shard based on whether we had processed existing CF:s or not (i.e. if a CF does not exist as sstable at all, we must tag it as zero-rp, and make whole shard for it start at same zero). This is bad in itself, because it can cause data loss. It does not cause crashing however. But it did uncover another, old old lingering bug, namely the commitlog reader initiating its stream wrongly when reading from an actual offset (i.e. not processing the whole file). We opened the file stream from the file offset, then tried to read the file header and magic number from there -> boom, error. Also, rp-to-file mapping was potentially suboptimal due to using bucket iterator instead of actual range. I.e. three fixes: * Reinstate min position guarding for unencountered CF:s * Fix stream creating in CL reader * Fix segment map iterator use. v2: * Fix typo Message-Id: <1490611637-12220-1-git-send-email-calle@scylladb.com>	2017-03-28 10:32:28 +02:00
Calle Wilund	c3a510a08d	commitlog_replayer: Do proper const-lookup of min positions for shards Fixes #2173 Per-shard min positions can be unset if we never collected any sstable/truncation info for it, yet replay segments of that id. Wrap the lookups to handle "missing data -> default", which should have been there in the first place. Message-Id: <1490185101-12482-1-git-send-email-calle@scylladb.com>	2017-03-22 17:57:09 +02:00
Calle Wilund	078589c508	commitlog_replayer: Make replay parallel per shard Fixes #2098 Replay previously did all segments in parallel on shard 0, which caused heavy memory load. To reduce this and spread footprint across shards, instead do X segments per shard, sequential per shard. v2: * Fixed whitespace errors Message-Id: <1489503382-830-1-git-send-email-calle@scylladb.com>	2017-03-15 13:07:17 +02:00
Paweł Dziepak	374c8a56ac	commitlog: avoid copying column_mapping It is safe to copy column_mapping across shards. Such guarantee comes at the cost of performance. This patch makes commitlog_entry_writer use IDL generated writer to serialise commitlog_entry so that column_mapping is not copied. This also simplifies commitlog_entry itself. Performance difference tested with: perf_simple_query -c4 --write --duration 60 (medians) before after diff write 79434.35 89247.54 +12.3%	2017-02-27 17:05:58 +00:00
Gleb Natapov	2dc56013f8	commitlog: handle cycle() error Do not ignore a future<> returned by cycle() since it will produce a warning in case of an error. Log it instead. Message-Id: <20170219151811.GN11471@scylladb.com>	2017-02-22 19:15:14 +01:00
Calle Wilund	e20b804a65	commitlog/database: Add "release" method to ensure we free segments On database stop, we do flush memtables and clean up commit log segment usage. However, since we never actually destroy the distributed<database>, we don't actually free the commitlog either, and thus never clear out the remaining (clean) segments. Thus we leave perfectly clean segments on disk. This just adds a "release" method to commitlog, and calls it from database::stop, after flushing CF:s. Message-Id: <1485784950-17387-1-git-send-email-calle@scylladb.com>	2017-02-21 18:17:47 +01:00
Calle Wilund	ff8f82f21c	scylla tls: Add option support for client auth and tls opts Refs #1813 (fixes scylla part) Added require_client_auth and priority_string options to server_encryption_options/client_encryption_options and process them. Allows TLS method/algo specification. Also enabled enforcing known cert authentication for both node-to-node and client communication.	2017-02-06 09:45:09 +00:00
Paweł Dziepak	9f1ebd4f7c	idl/mutation: add counter serialisation logic	2017-02-02 10:35:14 +00:00
Amnon Heiman	45b6070832	Merge seastar upstream * seastar 397685c...c1dbd89 (13): > lowres_clock: drop cache-line alignment for _timer > net/packet: add missing include > Merge "Adding histogram and description support" from Amnon > reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&' > Set the option '--server' of tests/tcp_sctp_client to be required > core/memory: Remove superfluous assignment > core/memory: Remove dead code > core/reactor: Use logger instead of cerr > fix inverted logic in overprovision parameter > rpc: fix timeout checking condition > rpc: use lowres_clock instead of high resolution one > semaphore: make semaphore's clock configurable > rpc: detect timedout outgoing packets earlier Includes treewide change to accommodate rpc changing its timeout clock to lowres_clock. Includes fixup from Amnon: collectd api should use the metrics getters As part of a preparation for the change in the metrics layer, this changes the way the collectd api uses the metrics value to use the getters instead of calling the member directly. This will be important when the internal implementation changes from union to variant. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>	2017-02-01 14:39:08 +02:00
Tomasz Grabiec	634761dbba	commitlog: Fix default limit for size on disk The per-node limit will be total memory divided by number of shards instead of just total memory. For example, when Scylla is started with -c16 -m16G, the commit log will induce flushes on given shard when unflushed data exceeds on that shard 62MB instead of 1GB. Fixes #2046. Message-Id: <1485874534-10939-1-git-send-email-tgrabiec@scylladb.com>	2017-01-31 17:12:59 +02:00
Vlad Zolotarov	dcdd98ccc1	db::commitlog::commitlog: move collectd counters registration to the metrics registration layer Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-01-10 16:24:54 -05:00
Tomasz Grabiec	059a1a4f22	db: Fix commitlog replay to not drop cell mutations with older schema column_mapping is not safe to access across shards, because data_type is not safe to access. One of the manifestation of this is that abstract_type::is_value_compatible_with() always fails if the two types belong to different shards. During replay, column_mapping lives on the replaying shard, and is used by converting_mutation_partition_applier against the schema on the target shard. Since types in the mapping will be considered incompatible with types in the schema, all cells will be dropped. Fix by using column_mapping in a safe way, by copying it to the target shard if necessary. Each shard maintains its own cache of column mappings. Fixes #1924. Message-Id: <1481310463-13868-1-git-send-email-tgrabiec@scylladb.com>	2016-12-13 12:19:32 +02:00
Glauber Costa	9b5e6d6bd8	commitlog: correctly report requests blocked The semaphore future may be unavailable for many reasons. Specifically, if the task quota is depleted right between sem.wait() and the .then() clause in get_units() the resulting future won't be available. That is particularly visible if we decrease the task quota, since those events will be more frequent: we can in those cases clearly see this counter going up, even though there aren't more requests pending than usual. This patch improves the situation by replacing that check. We now verify whether or not there are waiters in the semaphore. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <113c0d6b43cd6653ce972541baf6920e5765546b.1481222621.git.glauber@scylladb.com>	2016-12-09 15:02:26 +02:00
Tomasz Grabiec	f7197dabf8	commitlog: Fix replay to not delete dirty segments The problem is that replay will unlink any segments which were on disk at the time the replay starts. However, some of those segments may have been created by current node since the boot. If a segment is part of reserve for example, it will be unlinked by replay, but we will still use that segment to log mutations. Those mutations will not be visible to replay after a crash though. The fix is to record preexisting segments before any new segments will have a chance to be created and use that as the replay list. Introduced in `abe7358767`. dtest failure: commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup Message-Id: <1481117436-6243-1-git-send-email-tgrabiec@scylladb.com>	2016-12-07 15:54:47 +02:00
Asias He	00d7a35949	utils: Put crc32 under utils namespace It conflicts with crc in zlib Message-Id: <1480918984-4117-2-git-send-email-asias@scylladb.com>	2016-12-05 11:48:29 +02:00
Glauber Costa	99a5a77234	prevent commitlog replay position reordering during reserve refill When requests hit the commitlog, each of them will be assigned a replay position, which we expect to be ordered. If reorders happen, the request will be discarded and re-applied. Although this is supposed to be rare, it does increase our latencies, especially when big requests are involved. Processing big requests is expensive and if we have to do it twice that adds to the cost. The commitlog is supposed to issue replay positions in order, and it could be that the code that adds them to the memtables will reorder them. However, there is one instance in which the commitlog will not keep its side of the bargain. That happens when the reserve is exhausted, and we are allocating a segment directly at the same time the reserve is being replenished. The following sequence of events with its deferring points will illustrate it: on_timer: return this->allocate_segment(false). // defer here // then([this](sseg_ptr s) { At this point, the segment id is already allocated. new_segment(): if (_reserve_segments.empty()) { [ ... ] return allocate_segment(true).then ... At this point, we have a new segment that has an id that is higher than the previous id allocated. Then we resume the execution from the deferring point in on_timer(): i = _reserve_segments.emplace(i, std::move(s)); The next time we need to allocate a segment, we'll pick it from the reserve. But the segment in the reserve has an id that is lower than the id that we have already used. Reorders are bad, but this one is particularly bad: because the reorder happens with the segment id side of the replay position, that means that every request that falls into that segment will have to be reinserted. This bug can be a bit tricky to reproduce. To make it more common, we can artificially add a sleep() fiber after the allocate_segment(false) in on_timer(). 
If we do that, we'll see a sea of reinsertions going on in the logs (if dblog is set to debug). Applying this patch (keeping the sleep) will make them all disappear. We do this by rewriting the reserve logic, so that the segments always come from the reserve. If we draw from a single pool all the time, there is no chance of reordering happening. To make that more amenable, we'll have the reserve filler always running in the background and take it out of the timer code. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <49eb7edfcafaef7f1fdceb270639a9a8b50cfce7.1480531446.git.glauber@scylladb.com>	2016-12-01 13:20:46 +01:00
Tomasz Grabiec	31645e2c4a	commitlog: Allow allocations to be timed out	2016-11-29 16:40:58 +01:00
Glauber Costa	353a4cd2d4	commitlog: sync segments before acquiring semaphore on shutdown. Sync all segments before acquiring the semaphore, otherwise waiting may have to wait for the timer to kick in and push them down. Note that we can't guarantee that no other requests were executed in the mean time, so we have to sync again. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <aea019fe49820acce5d2b55dd5ec31e975b3436c.1480388674.git.glauber@scylladb.com>	2016-11-29 11:07:28 +02:00
Tomasz Grabiec	96c7764458	Revert "prevent commitlog replay position reordering during reserve refill" This reverts commit `0e9b75d406`. commitlog_test fails with this: Running 14 test cases... ERROR 2016-11-28 20:48:00,565 [shard 0] commitlog - Segment reserve is full! Ignoring and trying to continue, but shouldn't happen ERROR 2016-11-28 20:48:00,578 [shard 0] commitlog - Segment reserve is full! Ignoring and trying to continue, but shouldn't happen ERROR 2016-11-28 20:48:10,591 [shard 0] commitlog - Segment reserve is full! Ignoring and trying to continue, but shouldn't happen ERROR 2016-11-28 20:48:20,601 [shard 0] commitlog - Segment reserve is full! Ignoring and trying to continue, but shouldn't happen tests/commitlog_test.cc(203): fatal error in "test_commitlog_discard_completed_segments": critical check dn <= nn failed ERROR 2016-11-28 20:48:20,645 [shard 0] commitlog - Segment reserve is full! Ignoring and trying to continue, but shouldn't happen ERROR 2016-11-28 20:48:20,837 [shard 0] commitlog - Segment reserve is full! Ignoring and trying to continue, but shouldn't happen WARN 2016-11-28 20:48:20,838 [shard 0] commitlog - Exception in segment reservation: std::system_error (error system:2, No such file or directory) ERROR 2016-11-28 20:48:20,952 [shard 0] commitlog - Segment reserve is full! Ignoring and trying to continue, but shouldn't happen ERROR 2016-11-28 20:48:31,064 [shard 0] commitlog - Segment reserve is full! Ignoring and trying to continue, but shouldn't happen ERROR 2016-11-28 20:48:31,083 [shard 0] commitlog - Segment reserve is full! Ignoring and trying to continue, but shouldn't happen ERROR 2016-11-28 20:48:31,098 [shard 0] commitlog - Segment reserve is full! Ignoring and trying to continue, but shouldn't happen ERROR 2016-11-28 20:48:31,111 [shard 0] commitlog - Segment reserve is full! Ignoring and trying to continue, but shouldn't happen ERROR 2016-11-28 20:48:31,113 [shard 0] commitlog - Segment reserve is full! 
Ignoring and trying to continue, but shouldn't happen WARN 2016-11-28 20:48:31,116 [shard 0] commitlog - Could not allocate 16388 k bytes output buffer (16388 k required) *** 1 failure detected in test suite "tests/commitlog_test.cc" WARN 2016-11-28 20:48:31,117 [shard 0] commitlog - Exception in segment reservation: std::system_error (error system:2, No such file or directory)	2016-11-28 20:52:13 +01:00
Glauber Costa	0e9b75d406	prevent commitlog replay position reordering during reserve refill When requests hit the commitlog, each of them will be assigned a replay position, which we expect to be ordered. If reorders happen, the request will be discarded and re-applied. Although this is supposed to be rare, it does increase our latencies, especially when big requests are involved. Processing big requests is expensive and if we have to do it twice that adds to the cost. The commitlog is supposed to issue replay positions in order, and it could be that the code that adds them to the memtables will reorder them. However, there is one instance in which the commitlog will not keep its side of the bargain. That happens when the reserve is exhausted, and we are allocating a segment directly at the same time the reserve is being replenished. The following sequence of events with its deferring points will illustrate it: on_timer: return this->allocate_segment(false). // defer here // then([this](sseg_ptr s) { At this point, the segment id is already allocated. new_segment(): if (_reserve_segments.empty()) { [ ... ] return allocate_segment(true).then ... At this point, we have a new segment that has an id that is higher than the previous id allocated. Then we resume the execution from the deferring point in on_timer(): i = _reserve_segments.emplace(i, std::move(s)); The next time we need to allocate a segment, we'll pick it from the reserve. But the segment in the reserve has an id that is lower than the id that we have already used. Reorders are bad, but this one is particularly bad: because the reorder happens with the segment id side of the replay position, that means that every request that falls into that segment will have to be reinserted. This bug can be a bit tricky to reproduce. To make it more common, we can artificially add a sleep() fiber after the allocate_segment(false) in on_timer(). 
If we do that, we'll see a sea of reinsertions going on in the logs (if dblog is set to debug). Applying this patch (keeping the sleep) will make them all disappear. We do this by rewriting the reserve logic, so that the segments always come from the reserve. If we draw from a single pool all the time, there is no chance of reordering happening. To make that more amenable, we'll have the reserve filler always running in the background and take it out of the timer code. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <2606b97df39997bcf3af84a23adf17e094ffb0b8.1480107174.git.glauber@scylladb.com>	2016-11-28 19:26:26 +01:00
Glauber Costa	0b8b5abf16	commitlog: acquire semaphore earlier Recently we have changed our shutdown strategy to wait for the _request_controller semaphore to make sure no other allocations are in-flight. That was done to fix an actual issue. The problem is that this wasn't done early enough. We acquire the semaphore after we have already marked ourselves as _shutdown and released the timer. That means that if there is an allocation in flight that needs to use a new segment, it will never finish - and we'll therefore never acquire the semaphore. Fix it by acquiring it first. At this point the allocations will all be done and gone, and then we can shutdown everything else. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <5c2a2f20e3832b6ea37d6541897519a9307294ed.1479765782.git.glauber@scylladb.com>	2016-11-21 22:19:32 +00:00
Glauber Costa	21c1e2b48c	commitlog: wait for pending allocations to finish before closing gate. allocations may enter the gate, so it would be wise for us to wait for them. Fixes #1860 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <53cd6996c1cbd8b38bab3b03604bd11e5c20beda.1479650012.git.glauber@scylladb.com>	2016-11-20 19:45:33 +02:00
Glauber Costa	60b7d35f15	commitlog: close file after read, and not at stop There are other code paths that may interrupt the read in the middle and bypass stop. It's safer this way. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <8c32ca2777ce2f44462d141fd582848ac7cf832d.1479477360.git.glauber@scylladb.com>	2016-11-18 14:09:33 +00:00
Glauber Costa	59a41cf7f1	commitlog: use read ahead for replay requests Aside from putting the requests in the commitlog class, read ahead will help us going through the file faster. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-11-17 14:09:54 -05:00
Glauber Costa	aa375cd33d	commitlog: use commitlog priority for replay Right now replay is being issued with the standard seastar priority. The rationale for that at the time is that it is an early event that doesn't really share the disk with anybody. That is largely untrue now that we start compactions on boot. Compactions may fight for bandwidth with the commitlog, and with such low priority the commitlog is guaranteed to lose. Fixes #1856 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-11-17 14:09:02 -05:00
Glauber Costa	4d3d774757	commitlog: close replay file Replay file is opened, so it should be closed. We're not seeing any problems arising from this, but they may happen. Enabling read ahead in this stream makes them happen immediately. Fix it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-11-17 12:35:24 -05:00
Calle Wilund	11baf37ab5	commitlog: Prevent exceptions in stream::produce from being set twice Fixes #1775 stream lacks a check "is_open", which is a bummer. We have to both prevent exception propagation and add a flag of our own to make sure exceptions in producer code reaches consumer, and does not simply get lost in the reactor. Message-Id: <1478508817-18854-1-git-send-email-calle@scylladb.com>	2016-11-07 11:41:33 +01:00
Tomasz Grabiec	c1a7e2090e	Revert "database: change find_column_families signature so it returns a lw_shared_ptr" This reverts commit `f3528ede65`.	2016-11-04 10:48:21 +01:00
Glauber Costa	f3528ede65	database: change find_column_families signature so it returns a lw_shared_ptr There are places in which we need to use the column family object many times, with deferring points in between. Because the column family may have been destroyed in the deferring point, we need to go and find it again. If we use lw_shared_ptr, however, we'll be able to at least guarantee that the object will be alive. Some users will still need to check, if they want to guarantee that the column family wasn't removed. But others that only need to make sure we don't access an invalid object will be able to avoid the cost of re-finding it just fine. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Raphael S. Carvalho	a3e065da9b	db: make it possible to use custom error handler with io checker By default, io checker will cause Scylla to shut down if it finds specific system errors. Right now, io checker isn't flexible enough to allow a specialized handler. For example, we don't want Scylla to shut down if there's a permission problem when uploading new files from upload dir. This desired flexibility is made possible here by allowing a handler parameter to io check functions and also changing existing code to take advantage of it. That's a step towards fixing #1709. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-10-27 15:54:21 -02:00
Glauber Costa	a13c410749	commitlog: cycle based on total size, not on mutation size We calculate two sizes during the allocation: "size", which is the in-segment size of this mutation, and "s", which is that plus the overhead. cycle() must be called with the latter, not the former, as doing otherwise may lead to buffer overflows. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <ccf346d8d0ebb44a1ba9fd069653bab0d7be0a61.1477063157.git.glauber@scylladb.com>	2016-10-21 18:57:41 +03:00
Glauber Costa	d9875784a1	commitlog: do not wait on pending operations for batch mode This was explicitly mentioned in my set as gone in one of the versions. Somehow it came back in the final version - sorry about that. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <2a0eba28cd74267d1a1fdcf1aef2901cc74ffc9f.1477059963.git.glauber@scylladb.com>	2016-10-21 17:27:16 +03:00
Glauber Costa	d5618c6ace	commitlog: add total_operations type for requests_blocked_memory Current tracker for pending allocations is a queue_size GAUGE. Add a total_operations version so we have more insight on what's going on. It will be called requests_blocked_memory for consistency with other subsystems that track similar things. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-20 09:25:38 -04:00
Glauber Costa	1578d7363a	commitlog: rework blocking logic The current incarnation of commitlog establishes a maximum amount of writes that can be in-flight, and blocks new requests after that limit is reached. That is obviously something we must do, but the current approach to it is problematic for two main reasons: 1) It forces the requests that trigger a write to wait on the current write to finish. That is excessive; ideally we would wait for one particular write to finish, not necessarily the current one. That is made worse by the fact that when a write is followed by a flush (happens when we move to a new segment), then we must wait for all writes in that segment to finish. 2) It casts concurrency in terms of writes instead of memory, which makes the aforementioned problem a lot worse: if we have very big buffers in flight and we must wait for them to finish, that can take a long time, often in the order of seconds, causing timeouts. The approach taken by this patch is to replace the _write_semaphore with a request_controller. This data structure will account the amount of memory used by the buffers and set a limit on it. New allocations will be held until we go below that limit, and will be released as soon as this happens. This guarantees that the latencies introduced by this mechanism are spread out a lot better among requests and will keep higher percentile latencies in check. To test this, I have run a workload that times out frequently. That workload uses 10 threads to write 100 partitions (to isolate from the effects of the memtable introduced latencies) in a loop and each partition is 2MB in size. 
After 10 minutes running this load, we are left with the following percentiles: latency mean : 51.9 [WRITE:51.9] latency median : 9.8 [WRITE:9.8] latency 95th percentile : 125.6 [WRITE:125.6] latency 99th percentile : 1184.0 [WRITE:1184.0] latency 99.9th percentile : 1991.2 [WRITE:1991.2] latency max : 2338.2 [WRITE:2338.2] After this patch: latency mean : 54.9 [WRITE:54.9] latency median : 43.5 [WRITE:43.5] latency 95th percentile : 126.9 [WRITE:126.9] latency 99th percentile : 253.9 [WRITE:253.9] latency 99.9th percentile : 364.6 [WRITE:364.6] latency max : 471.4 [WRITE:471.4] Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:56:36 -04:00
Glauber Costa	aec724bbda	commitlog: factor out code for checking mutation size In a subsequent patch, I'll use this code in a different place. To prepare for that, we move it out as a method. It also fits a lot better inside the segment manager, so move it there. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Glauber Costa	a50996f376	commitlog: calculate segment-independent size of mutations Goal is to calculate a size that is lesser or equal than the segment-dependent size. This was originally written by Tomasz, and featured in his submission "commitlog: Handle overload more gracefully" Extracted here so it sits clearly in a different patch. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Glauber Costa	0b7c9fa17f	commitlog: remove _needed_size It is mostly an optimization, and while it makes sense in this context, it won't soon as we'll stop waiting for the current cycle specifically to finish. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Glauber Costa	6214bdeb66	commitlog: move segment_manager constructor outside the class definition We'll do that so we can, in following patches, use static members from the segment. Those are not defined at this point. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Glauber Costa	299877f432	commitlog: add a counter for pending allocations We track the amount of pending allocations but we don't really export it. It will be crucial when we stop tracking pending writes. This patch exports it through a method instead of the totals structure, so we can easily change it. Current code probing pending_allocations (the api code) is also converted to use the public method instead of the totals struct. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Gleb Natapov	32989d1e66	Merge seastar upstream * seastar 2b55789...5b7252d (3): > Merge "rpc: serialize large messages into fragmented memory" from Gleb > Merge "Print backtrace on SIGSEGV and SIGABRT" from Tomasz > test_runner: avoid nested optionals Includes patch from Gleb to adapt to seastar changes.	2016-09-28 17:34:16 +03:00
Nadav Har'El	fe1ba753ce	Avoid semaphore's default initial value The fact that Seastar's semaphore has a default initializer of 1 if not explicitly initialized is confusing and unexpected and recently led to two bugs. So ScyllaDB should not rely on this default behavior, and specify the initial value of each semaphore explicitly. In several cases in the ScyllaDB code, the explicit initialization was missing, and this patch adds it. In one case (rate_limiter) I even think the default of 1 was a bit strange, and 0 makes more sense. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1474530745-23951-1-git-send-email-nyh@scylladb.com>	2016-09-24 19:25:02 +03:00
Calle Wilund	0f9e868839	commitlog: Use exception propagation in flush_queue (for batch) Fixes: #1490 While periodic mode is an all-bets-off crap-shoot as far as knowing if data actually reached disk or not, batch mode is supposed to be somewhat more reliable/deterministic. Thus, if we get an exception writing/flushing the current buffer, we should propagate exceptions to all execution paths involved in this buffer. Thus, adding a mutation to commit log in batch, will now, if an error is generated, result in an exception to the caller, which should be interpreted as "data might not have been persisted". The failing segment is then closed, and we happily hope things will get better in the next. Which they probably won't. Missing: registration of some sort of "error-handling policy", similar to origin, which can either kill transports or shut down process. (A reasonable guess is that disk errors in commit log are not gonna be recoverable).	2016-08-03 14:49:43 +00:00
Calle Wilund	14b0fe23c5	commitlog: Ensure we don't end up in a loop when we must wait for alloc Continuation reordering could cause us to repeatedly see the segment-local flag var even though actual write/sync ops are done. Can cause wild recursion without actual delayed continuation -> SOE. Fix by also checking queue status, since this is the wait object.	2016-07-11 07:45:36 +00:00
Avi Kivity	2a46410f4a	Change sstable_list from a map to a set sstable_list is now a map<generation, sstable>; change it to a set in preparation for replacing it with sstable_set. The change simplifies a lot of code; the only casualty is the code that computes the highest generation number.	2016-07-03 10:26:57 +03:00
Duarte Nunes	dfbf68cd24	commitlog: Define operator<< in namespace db Needed for compilation with gcc6. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1466852874-8448-1-git-send-email-duarte@scylladb.com>	2016-06-26 10:05:28 +03:00
Calle Wilund	2b812a392a	commitlog_replayer: Fix calculation of global min pos per shard If a CF does not have any sstables at all, we should treat it as having a replay position of zero. However, since we also must deal with potential re-sharding, we cannot just set shard->uuid->zero initially, because we don't know what shards existed. Go through all CF:s post map-reduce, and for every shard where a CF does not have an RP-mapping (no sstables found), set the global min pos (for shard) to zero. Fixes #1372 Message-Id: <1465991864-4211-1-git-send-email-calle@scylladb.com>	2016-06-21 10:05:05 +03:00
Calle Wilund	7cdea1b889	commitlog: Use flush queue for write/flush ordering, improve batch Using an ordering mechanism better than rw-locks for write/flush means we can wait for pending write in batch mode, and coalesce data from more than one mutation into a chunk. It also means we can wait for a specific read+flush pair (based on file position). Downside is that we will not do parallel writes in batch mode (unless we run out of buffer), which might underutilize the disk bandwidth. Upside is that running in batch mode (i.e. per-write consistency) now has way better bandwidth, and also, at least with high mutation rate, better average latency. Message-Id: <1465990064-2258-1-git-send-email-calle@scylladb.com>	2016-06-20 13:09:16 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Gleb Natapov	70575699e4	commitlog, sstables: enlarge XFS extent allocation for large files With big rows I see contention in XFS allocations which cause reactor thread to sleep. Commitlog is a main offender, so enlarge extent to commitlog segment size for big files (commitlog and sstable Data files). Message-Id: <20160404110952.GP20957@scylladb.com>	2016-04-04 14:15:00 +03:00
Paweł Dziepak	c8159eca52	commitlog: make sure that segment destructor doesn't throw Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-31 16:42:56 +01:00
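
The reordering described in commits `0e9b75d406` and `99a5a77234` can be reproduced in miniature. The following is a minimal sketch under stated assumptions, not ScyllaDB code: `segment_pool`, `ids_with_direct_allocation`, `ids_from_reserve_only` and `is_ordered` are hypothetical names, and the deferring points are modelled as plain statement order rather than Seastar continuations.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for the segment manager; ids grow monotonically.
struct segment_pool {
    std::uint64_t next_id = 1;
    std::deque<std::uint64_t> reserve;

    // The id is assigned at allocation time, before the segment is usable.
    std::uint64_t allocate() { return next_id++; }
};

// Old scheme: the timer allocates a reserve segment and defers; meanwhile the
// write path finds the reserve empty and allocates a segment directly.
std::vector<std::uint64_t> ids_with_direct_allocation() {
    segment_pool pool;
    std::vector<std::uint64_t> used;
    std::uint64_t pending = pool.allocate();  // on_timer: id taken, then deferred
    used.push_back(pool.allocate());          // new_segment(): reserve empty, higher id used first
    pool.reserve.push_back(pending);          // on_timer resumes: lower id enters the reserve
    used.push_back(pool.reserve.front());     // next new_segment() picks the lower id
    pool.reserve.pop_front();
    return used;
}

// Reworked scheme: segments are only ever drawn from the reserve, so the order
// of use matches the order of allocation.
std::vector<std::uint64_t> ids_from_reserve_only() {
    segment_pool pool;
    std::vector<std::uint64_t> used;
    for (int i = 0; i < 2; ++i) {
        pool.reserve.push_back(pool.allocate());  // background filler
    }
    while (!pool.reserve.empty()) {
        used.push_back(pool.reserve.front());
        pool.reserve.pop_front();
    }
    return used;
}

// True when segment ids were handed to writers in non-decreasing order.
bool is_ordered(const std::vector<std::uint64_t>& ids) {
    return std::is_sorted(ids.begin(), ids.end());
}
```

In this model `ids_with_direct_allocation()` hands out the higher id before the lower one, while `ids_from_reserve_only()` keeps them in order, which is the property the commit message argues for when it says that drawing from a single pool leaves no chance of reordering.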

187 Commits
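
The arithmetic behind commit `634761dbba` (default limit for size on disk) can be written out as follows. This is a sketch based on one reading of the commit message, assuming the per-shard threshold is the per-node limit divided by the shard count; the function names are hypothetical and do not appear in ScyllaDB.

```cpp
#include <cstdint>

constexpr std::uint64_t MiB = 1024ull * 1024ull;
constexpr std::uint64_t GiB = 1024ull * MiB;

// Before the fix (assumed): the per-node limit was the total memory, so each
// shard's threshold was total_memory / shards.
constexpr std::uint64_t per_shard_limit_before(std::uint64_t total_memory, unsigned shards) {
    return total_memory / shards;
}

// After the fix (assumed): the per-node limit is total_memory / shards, and each
// shard again takes an equal share of that.
constexpr std::uint64_t per_shard_limit_after(std::uint64_t total_memory, unsigned shards) {
    return (total_memory / shards) / shards;
}

// The -c16 -m16G example from the commit message, using nominal sizes.
static_assert(per_shard_limit_before(16 * GiB, 16) == 1 * GiB, "old threshold");
static_assert(per_shard_limit_after(16 * GiB, 16) == 64 * MiB, "new threshold");
```

With nominal sizes the new threshold comes out at 64 MiB. The commit message quotes 62MB, presumably because the memory actually available to Scylla is slightly below the nominal 16G.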