scylladb

Author	SHA1	Message	Date
Glauber Costa	aa375cd33d	commitlog: use commitlog priority for replay Right now replay is being issued with the standard seastar priority. The rationale for that at the time was that it is an early event that doesn't really share the disk with anybody. That is largely untrue now that we start compactions on boot. Compactions may fight for bandwidth with the commitlog, and with such low priority the commitlog is guaranteed to lose. Fixes #1856 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-11-17 14:09:02 -05:00
Glauber Costa	4d3d774757	commitlog: close replay file Replay file is opened, so it should be closed. We're not seeing any problems arising from this, but they may happen. Enabling read ahead in this stream makes them happen immediately. Fix it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-11-17 12:35:24 -05:00
Calle Wilund	11baf37ab5	commitlog: Prevent exceptions in stream::produce from being set twice Fixes #1775 stream lacks an "is_open" check, which is a bummer. We have to both prevent exception propagation and add a flag of our own to make sure exceptions in producer code reach the consumer, and do not simply get lost in the reactor. Message-Id: <1478508817-18854-1-git-send-email-calle@scylladb.com>	2016-11-07 11:41:33 +01:00
Tomasz Grabiec	c1a7e2090e	Revert "database: change find_column_families signature so it returns a lw_shared_ptr" This reverts commit `f3528ede65`.	2016-11-04 10:48:21 +01:00
Glauber Costa	f3528ede65	database: change find_column_families signature so it returns a lw_shared_ptr There are places in which we need to use the column family object many times, with deferring points in between. Because the column family may have been destroyed at the deferring point, we need to go and find it again. If we use lw_shared_ptr, however, we'll be able to at least guarantee that the object will be alive. Some users will still need to check if they want to guarantee that the column family wasn't removed. But others, which only need to make sure we don't access an invalid object, will be able to avoid the cost of re-finding it. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Raphael S. Carvalho	a3e065da9b	db: make it possible to use custom error handler with io checker By default, io checker will cause Scylla to shut down if it finds specific system errors. Right now, io checker isn't flexible enough to allow a specialized handler. For example, we don't want Scylla to shut down if there's a permission problem when uploading new files from the upload dir. This desired flexibility is made possible here by allowing a handler parameter to io check functions and also changing existing code to take advantage of it. That's a step towards fixing #1709. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-10-27 15:54:21 -02:00
Glauber Costa	a13c410749	commitlog: cycle based on total size, not on mutation size We calculate two sizes during the allocation: "size", which is the in-segment size of this mutation, and "s", which is that plus the overhead. cycle() must be called with the latter, not the former, as doing otherwise may lead to buffer overflows. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <ccf346d8d0ebb44a1ba9fd069653bab0d7be0a61.1477063157.git.glauber@scylladb.com>	2016-10-21 18:57:41 +03:00
Glauber Costa	d9875784a1	commitlog: do not wait on pending operations for batch mode This wait was explicitly listed as removed in one of the versions of my patch set. Somehow it came back in the final version - sorry about that. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <2a0eba28cd74267d1a1fdcf1aef2901cc74ffc9f.1477059963.git.glauber@scylladb.com>	2016-10-21 17:27:16 +03:00
Glauber Costa	d5618c6ace	commitlog: add total_operations type for requests_blocked_memory The current tracker for pending allocations is a queue_size GAUGE. Add a total_operations version so we have more insight into what's going on. It will be called requests_blocked_memory for consistency with other subsystems that track similar things. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-20 09:25:38 -04:00
Glauber Costa	1578d7363a	commitlog: rework blocking logic The current incarnation of commitlog establishes a maximum number of writes that can be in flight, and blocks new requests after that limit is reached. That is obviously something we must do, but the current approach to it is problematic for two main reasons: 1) It forces the requests that trigger a write to wait on the current write to finish. That is excessive; ideally we would wait for one particular write to finish, not necessarily the current one. That is made worse by the fact that when a write is followed by a flush (which happens when we move to a new segment), we must wait for all writes in that segment to finish. 2) It casts concurrency in terms of writes instead of memory, which makes the aforementioned problem a lot worse: if we have very big buffers in flight and we must wait for them to finish, that can take a long time, often on the order of seconds, causing timeouts. The approach taken by this patch is to replace the _write_semaphore with a request_controller. This data structure will account for the amount of memory used by the buffers and set a limit on it. New allocations will be held until we go below that limit, and will be released as soon as this happens. This guarantees that the latencies introduced by this mechanism are spread out a lot better among requests and will keep higher percentile latencies in check. To test this, I ran a workload that times out frequently. That workload uses 10 threads to write 100 partitions (to isolate from the effects of memtable-introduced latencies) in a loop and each partition is 2MB in size. 
Before this patch, after 10 minutes of running this load, we were left with the following percentiles: latency mean : 51.9 [WRITE:51.9] latency median : 9.8 [WRITE:9.8] latency 95th percentile : 125.6 [WRITE:125.6] latency 99th percentile : 1184.0 [WRITE:1184.0] latency 99.9th percentile : 1991.2 [WRITE:1991.2] latency max : 2338.2 [WRITE:2338.2] After this patch: latency mean : 54.9 [WRITE:54.9] latency median : 43.5 [WRITE:43.5] latency 95th percentile : 126.9 [WRITE:126.9] latency 99th percentile : 253.9 [WRITE:253.9] latency 99.9th percentile : 364.6 [WRITE:364.6] latency max : 471.4 [WRITE:471.4] Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:56:36 -04:00
Glauber Costa	aec724bbda	commitlog: factor out code for checking mutation size In a subsequent patch, I'll use this code in a different place. To prepare for that, we move it out as a method. It also fits a lot better inside the segment manager, so move it there. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Glauber Costa	a50996f376	commitlog: calculate segment-independent size of mutations The goal is to calculate a size that is less than or equal to the segment-dependent size. This was originally written by Tomasz, and featured in his submission "commitlog: Handle overload more gracefully". It is extracted here so it sits clearly in a separate patch. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Glauber Costa	0b7c9fa17f	commitlog: remove _needed_size It is mostly an optimization, and while it makes sense in this context, it won't as soon as we stop waiting for the current cycle specifically to finish. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Glauber Costa	6214bdeb66	commitlog: move segment_manager constructor outside the class definition We'll do that so we can, in following patches, use static members from the segment. Those are not defined at this point. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Glauber Costa	299877f432	commitlog: add a counter for pending allocations We track the amount of pending allocations but we don't really export it. It will be crucial when we stop tracking pending writes. This patch exports it through a method instead of the totals structure, so we can easily change it. Current code probing pending_allocations (the api code) is also converted to use the public method instead of the totals struct. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-19 13:49:47 -04:00
Gleb Natapov	32989d1e66	Merge seastar upstream * seastar 2b55789...5b7252d (3): > Merge "rpc: serialize large messages into fragmented memory" from Gleb > Merge "Print backtrace on SIGSEGV and SIGABRT" from Tomasz > test_runner: avoid nested optionals Includes patch from Gleb to adapt to seastar changes.	2016-09-28 17:34:16 +03:00
Nadav Har'El	fe1ba753ce	Avoid semaphore's default initial value The fact that Seastar's semaphore has a default initializer of 1 if not explicitly initialized is confusing and unexpected and recently led to two bugs. So ScyllaDB should not rely on this default behavior, and should specify the initial value of each semaphore explicitly. In several cases in the ScyllaDB code, the explicit initialization was missing, and this patch adds it. In one case (rate_limiter) I even think the default of 1 was a bit strange, and 0 makes more sense. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1474530745-23951-1-git-send-email-nyh@scylladb.com>	2016-09-24 19:25:02 +03:00
Calle Wilund	0f9e868839	commitlog: Use exception propagation in flush_queue (for batch) Fixes: #1490 While periodic mode is an all-bets-off crap-shoot as far as knowing if data actually reached disk or not, batch mode is supposed to be somewhat more reliable/deterministic. Thus, if we get an exception writing/flushing the current buffer, we should propagate exceptions to all execution paths involved in this buffer. As a result, adding a mutation to the commit log in batch mode will now, if an error is generated, result in an exception to the caller, which should be interpreted as "data might not have been persisted". The failing segment is then closed, and we happily hope things will get better in the next one. Which they probably won't. Missing: registration of some sort of "error-handling policy", similar to origin, which can either kill transports or shut down the process. (A reasonable guess is that disk errors in the commit log are not gonna be recoverable).	2016-08-03 14:49:43 +00:00
Calle Wilund	14b0fe23c5	commitlog: Ensure we don't end up in a loop when we must wait for alloc Continuation reordering could cause us to repeatedly see the segment-local flag variable even though the actual write/sync ops are done. This can cause wild recursion without any actual delayed continuation, leading to a stack overflow (SOE). Fix by also checking the queue status, since the queue is the wait object.	2016-07-11 07:45:36 +00:00
Avi Kivity	2a46410f4a	Change sstable_list from a map to a set sstable_list is now a map<generation, sstable>; change it to a set in preparation for replacing it with sstable_set. The change simplifies a lot of code; the only casualty is the code that computes the highest generation number.	2016-07-03 10:26:57 +03:00
Duarte Nunes	dfbf68cd24	commitlog: Define operator<< in namespace db Needed for compilation with gcc6. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1466852874-8448-1-git-send-email-duarte@scylladb.com>	2016-06-26 10:05:28 +03:00
Calle Wilund	2b812a392a	commitlog_replayer: Fix calculation of global min pos per shard If a CF does not have any sstables at all, we should treat it as having a replay position of zero. However, since we also must deal with potential re-sharding, we cannot just set shard->uuid->zero initially, because we don't know what shards existed. Go through all CFs post map-reduce, and for every shard where a CF does not have an RP-mapping (no sstables found), set the global min pos (for that shard) to zero. Fixes #1372 Message-Id: <1465991864-4211-1-git-send-email-calle@scylladb.com>	2016-06-21 10:05:05 +03:00
Calle Wilund	7cdea1b889	commitlog: Use flush queue for write/flush ordering, improve batch Using an ordering mechanism better than rw-locks for write/flush means we can wait for pending write in batch mode, and coalesce data from more than one mutation into a chunk. It also means we can wait for a specific read+flush pair (based on file position). Downside is that we will not do parallel writes in batch mode (unless we run out of buffer), which might underutilize the disk bandwidth. Upside is that running in batch mode (i.e. per-write consistency) now has way better bandwidth, and also, at least with high mutation rate, better average latency. Message-Id: <1465990064-2258-1-git-send-email-calle@scylladb.com>	2016-06-20 13:09:16 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Gleb Natapov	70575699e4	commitlog, sstables: enlarge XFS extent allocation for large files With big rows I see contention in XFS allocations which causes the reactor thread to sleep. The commitlog is the main offender, so enlarge the extent to the commitlog segment size for big files (commitlog and sstable Data files). Message-Id: <20160404110952.GP20957@scylladb.com>	2016-04-04 14:15:00 +03:00
Paweł Dziepak	c8159eca52	commitlog: make sure that segment destructor doesn't throw Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-31 16:42:56 +01:00
Avi Kivity	417bcb122d	commitlog: ignore commitlog segments generated by Cassandra-derived tools Cassandra-derived tools (such as sstable2json) may write commitlog segments that Scylla cannot recognize. Since we now write them with a distinct name, we can recognize the name and ignore these segments, as we know the data they contain is not interesting. Fixes #1112. Message-Id: <1459356904-20699-1-git-send-email-avi@scylladb.com>	2016-03-31 16:01:08 +03:00
Glauber Costa	d536846433	commitlog: initialize sync period with actual sync period commitlog's sync period is initialized as the batch period, and not as the sync period itself as it should be. I've found this by code inspection, but unless I am missing something really fundamental, this seems to be completely wrong. It's been working fine because, as I have checked, both variables have the same default value. But as soon as anyone changes one of them, the behavior won't be as expected. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <2e7c565242fe5d4481a3ee8b0ba425ef14f5e42a.1459252783.git.glauber@scylladb.com>	2016-03-29 15:21:02 +03:00
Benoît Canet	3b1d3d977d	exceptions: Shutdown communications on non file I/O errors Apply the same treatment to non file filesystem I/O errors. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:54 +02:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. Scylla cannot fix I/O errors; the only solution is to shut down the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Calle Wilund	0c3322befd	commitlog: Ensure segment survives whole flush call Must keep the shared pointer alive. On the other hand, the shared pointer copy in the main continuation of cycle() is not needed. Message-Id: <1456931988-5876-3-git-send-email-calle@scylladb.com>	2016-03-02 18:22:13 +02:00
Calle Wilund	f1c4e3eb3d	commitlog: Clear reserve segments in orphan_all Otherwise they will keep the segment_manager alive (leak). Fixes jenkins ASan errors. Message-Id: <1456931988-5876-2-git-send-email-calle@scylladb.com>	2016-03-02 18:22:09 +02:00
Calle Wilund	a556f665c0	commitlog: Take segment_manager locks first in write/flush While it is formally better to take a local lock first and only then contend for a global one, in this case it is arguably better to ensure we get a gate exception synchronously (early) instead of potentially in a continuation. The old version might cause us to do a gate::leave even though we never entered. And since we should really only have one active (contending) segment per shard anyway, it should not matter. Message-Id: <1456931988-5876-1-git-send-email-calle@scylladb.com>	2016-03-02 18:22:05 +02:00
Paweł Dziepak	bdc23ae5b5	remove db/serializer.hh includes Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 09:07:09 +00:00
Calle Wilund	e667dcc3d0	commitlog: Make segment->segment_manager relation shared pointer The segment->segment_manager pointer has, until now, been a raw pointer, which in a way is sensible, since making circular shared pointer relations is in general bad. However, since the code and life cycle of segments have evolved quite a bit since that initial relation was defined, becoming both more and then suddenly, in a sense, less, asynchronous over time, the usage of the relation is in fact more consistent with a shared pointer, in that a segment needs to access its manager to properly do things like write and flush. These two ops in particular depend on accessing the segment manager in a way that might be fine even using raw pointers, if it were not, again, for that little annoying thing of continuation reordering. So, let's just make the relation a shared pointer, solving the issue of whether the manager is alive when a segment accesses it. If it has been "released" (shut down), the existing mechanisms (gate) will then trigger and prevent any actual _actions_ from taking place. And we don't have to complicate anything else even more. The only "big" change is that we need to explicitly orphan all segments in the commitlog destructor (segment_manager is essentially a p-impl). This fixes some spurious crashes in nightly unit tests. Fixes #966. Message-Id: <1456838735-17108-1-git-send-email-calle@scylladb.com>	2016-03-01 16:48:28 +02:00
Paweł Dziepak	dec63eac6e	commitlog: add commitlog entry move constructor Default move constructor and assignment didn't handle reference to mutation (_mutation) properly. Fixes #935. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1456760905-23478-1-git-send-email-pdziepak@scylladb.com>	2016-02-29 18:10:15 +02:00
Calle Wilund	dc136a6a1c	commitlog: Fix reserve counter overflow Fixes #482 See code comment. Reserve segment allocation count sum can temporarily overflow due to continuation delay/reordering, if we manage to reach the on_timer code before the finally clauses from the previous reserve allocation invocation have been processed. However, since these are benign overflows (just indicating even more that we don't need to do anything right now) simply capping the count should be fine. Avoids assert in boost irange. Message-Id: <1456740679-4537-1-git-send-email-calle@scylladb.com>	2016-02-29 14:56:24 +02:00
Avi Kivity	efabb1a1d8	commitlog: fix buffer size calculation We were adding bool(buffer), instead of buffer.size(); exposed by making temporary_buffer::operator bool explicit.	2016-02-24 13:38:05 +02:00
Paweł Dziepak	89b75a02d4	commitlog: use IDL-based serialization for entries Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-02-19 23:11:59 +00:00
Paweł Dziepak	f548c75200	commitlog: move implementation to *.cc file Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-02-19 23:11:59 +00:00
Pekka Enberg	86173fb8cc	db/commitlog: Fix debug log format string in commitlog_replayer::recover() I saw the following Boost format string related warning during commitlog replay: INFO [shard 0] commitlog_replayer - Replaying node3/commitlog/CommitLog-1-72057594289748293.log, node3/commitlog/CommitLog-1-90071992799230277.log, node3/commitlog/CommitLog-1-108086391308712261.log, node3/commitlog/CommitLog-1-251820357.log, node3/commitlog/CommitLog-1-54043195780266309.log, node3/commitlog/CommitLog-1-36028797270784325.log, node3/commitlog/CommitLog-1-126100789818194245.log, node3/commitlog/CommitLog-1-18014398761302341.log, node3/commitlog/CommitLog-1-126100789818194246.log, node3/commitlog/CommitLog-1-251820358.log, node3/commitlog/CommitLog-1-18014398761302342.log, node3/commitlog/CommitLog-1-36028797270784326.log, node3/commitlog/CommitLog-1-54043195780266310.log, node3/commitlog/CommitLog-1-72057594289748294.log, node3/commitlog/CommitLog-1-90071992799230278.log, node3/commitlog/CommitLog-1-108086391308712262.log WARN [shard 0] commitlog_replayer - error replaying: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::io::too_many_args> > (boost::too_many_args: format-string referred to less arguments than were passed) While inspecting the code, I noticed that one of the error loggers is missing an argument. As I don't know how the original failure was triggered, I wasn't able to verify that that was the only one, though. Message-Id: <1453893301-23128-1-git-send-email-penberg@scylladb.com>	2016-01-27 13:40:19 +02:00
Calle Wilund	e6b792b2ff	commitlog bugfix: Fix batch mode The last series accidentally broke batch mode. With the new, fancy, potentially blocking ways, we need to treat batch mode differently, since in this case, sync should always come _after_ alloc-write. The previous patch caused an infinite loop. Broke jenkins. Message-Id: <1453821077-2385-1-git-send-email-calle@scylladb.com>	2016-01-26 17:13:14 +02:00
Glauber Costa	3f94070d4e	use auto&& instead of auto& for priority classes. At Avi's request; he reminds us that auto& is better suited for situations in which we are assigning to the variable in question. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>	2016-01-26 17:00:20 +02:00
Calle Wilund	89dc0f7be3	commitlog: wait for writes (if needed) on new segment as well Also check closed status in allocate, since alloc queue waiting could lead to us re-allocating in a segment that gets closed in between queue enter and us running the continuation. Message-Id: <1453811471-1858-1-git-send-email-calle@scylladb.com>	2016-01-26 15:05:12 +02:00
Calle Wilund	f2c5315d33	commitlog: Add write/flush limits Configured on start (for now - and dummy values at that). When the shard write/flush count reaches the limit, any incoming ops will queue until previous ones finish. Consequently, if an allocation op forces a write, which blocks, any other incoming allocations will also queue up to provide back pressure.	2016-01-26 10:19:24 +00:00
Calle Wilund	7628a4dfe0	commitlog: Add some feedback/measurement methods Suitable to derive "back pressure" from.	2016-01-26 09:47:14 +00:00
Calle Wilund	4f5bd4b64b	commitlog: split write/flush counters	2016-01-26 09:47:14 +00:00
Calle Wilund	215c8b60bf	commitlog: minor cleanup - remove red squiggles in eclipse	2016-01-26 09:42:26 +00:00
Glauber Costa	b63611e148	mark I/O operations with priority classes After this patch, our I/O operations will be tagged with a specific priority class. There are five available classes, defined in the previous patch: 1) memtable flush 2) commitlog writes 3) streaming mutation 4) SSTable compaction 5) CQL query Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Calle Wilund	59bf54d59a	commitlog_replayer: Modify logging to match origin more closely * Match origin log messages - Demote per-file printouts to "debug" level. * Print an all-files stat summary for the whole replay (begin/summary) - At info level, like origin Prompted by a dtest that expects origin log output. Message-Id: <1453216558-18359-1-git-send-email-calle@scylladb.com>	2016-01-19 17:19:52 +02:00
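
Commit 11baf37ab5 describes guarding a stream's producer with a flag of its own so that a producer failure reaches the consumer exactly once and never escapes into the reactor. The following is a minimal, single-threaded sketch of that idea; `guarded_producer` and its methods are hypothetical names, not Seastar's stream API, and plain synchronous calls stand in for futures.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <utility>

// Hypothetical sketch, not Seastar's stream/subscription API: a producer-side
// guard that never lets a producer exception escape, and records only the
// first failure for the consumer.
class guarded_producer {
public:
    // Run one production step. Any exception is captured instead of
    // propagating (in an async reactor it could otherwise simply be lost).
    template <typename Fn>
    void produce(Fn&& fn) {
        if (_failed) {
            return;  // already failed: do not produce or set an error again
        }
        try {
            std::forward<Fn>(fn)();
        } catch (...) {
            set_exception(std::current_exception());
        }
    }

    // Our own "is open" flag: only the first exception is recorded.
    bool set_exception(std::exception_ptr ep) {
        if (_failed) {
            return false;
        }
        _failed = true;
        _error = std::move(ep);
        return true;
    }

    bool failed() const { return _failed; }

    // Consumer side: rethrow the recorded failure, if any.
    void check() const {
        if (_error) {
            std::rethrow_exception(_error);
        }
    }

private:
    bool _failed = false;
    std::exception_ptr _error;
};
```

Once the first step throws, later steps are skipped and further `set_exception` calls report `false`, so the error slot is never filled twice.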
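
Commit a13c410749 distinguishes the in-segment payload size ("size") from the size including overhead ("s") and requires the cycle decision to use the latter. A minimal sketch, assuming a hypothetical fixed per-entry overhead of 12 bytes (not the real commitlog value) and hypothetical names `segment_buffer` and `needs_cycle`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed per-entry overhead (e.g. a size field plus a checksum);
// the real commitlog overhead is not reproduced here.
constexpr std::size_t entry_overhead = 12;

struct segment_buffer {
    std::size_t capacity;
    std::size_t used = 0;

    // Remaining room in the buffer.
    std::size_t free_space() const { return capacity - used; }
};

// Must the buffer be cycled (written out) before appending a mutation whose
// serialized payload is `size` bytes? The decision uses the total on-disk
// size s = size + overhead; testing `size` alone would accept entries whose
// overhead then runs past the end of the buffer.
bool needs_cycle(const segment_buffer& buf, std::size_t size) {
    const std::size_t s = size + entry_overhead;
    return s > buf.free_space();
}

// The buggy variant, for comparison: sizes the check by the payload only.
bool needs_cycle_buggy(const segment_buffer& buf, std::size_t size) {
    return size > buf.free_space();
}
```

With 10 bytes free, a 5-byte payload passes the buggy check even though 17 bytes would be written.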
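
Commit 1578d7363a replaces a write-count semaphore with a request_controller that accounts for buffer memory and holds new allocations until usage drops below a limit. The sketch below models that memory-based admission control synchronously, with callbacks instead of futures. `memory_controller` is a hypothetical name, not Scylla's class; the FIFO admission order is an assumption, and the `blocked_total()` counter only loosely mirrors the requests_blocked_memory total from d5618c6ace.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical, single-threaded sketch of memory-based admission control.
class memory_controller {
public:
    explicit memory_controller(std::size_t limit) : _limit(limit) {}

    // Ask for `bytes` of buffer memory. `on_admit` runs immediately if the
    // memory fits and nobody is queued ahead; otherwise it runs later, from
    // release(). Assumes a single request never exceeds the limit.
    void request(std::size_t bytes, std::function<void()> on_admit) {
        assert(bytes <= _limit);
        if (_waiters.empty() && _used + bytes <= _limit) {
            _used += bytes;
            on_admit();
        } else {
            ++_blocked_total;  // counts every request that had to wait
            _waiters.push_back({bytes, std::move(on_admit)});
        }
    }

    // A buffer of `bytes` was written out and its memory freed: admit
    // queued requests in FIFO order for as long as they fit.
    void release(std::size_t bytes) {
        _used -= bytes;
        while (!_waiters.empty() && _used + _waiters.front().bytes <= _limit) {
            auto w = std::move(_waiters.front());
            _waiters.pop_front();
            _used += w.bytes;
            w.on_admit();
        }
    }

    std::size_t used() const { return _used; }
    std::size_t waiting() const { return _waiters.size(); }
    std::size_t blocked_total() const { return _blocked_total; }

private:
    struct waiter {
        std::size_t bytes;
        std::function<void()> on_admit;
    };
    std::size_t _limit;
    std::size_t _used = 0;
    std::size_t _blocked_total = 0;
    std::deque<waiter> _waiters;
};
```

Because a waiter is released as soon as enough memory is freed, rather than when one specific write completes, waiting time is bounded by memory turnover and not by the slowest in-flight buffer.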
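
Commit 2a46410f4a notes that turning sstable_list from a map keyed by generation into a set costs only the highest-generation computation. A minimal sketch of why, using simplified stand-in types (a set of shared pointers ordered by pointer value; the real sstable types differ):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>

// Simplified stand-ins; the real sstable and sstable_list types differ.
struct sstable {
    int64_t generation;
};
using shared_sstable = std::shared_ptr<sstable>;
using sstable_list = std::set<shared_sstable>;  // ordered by pointer, not generation

// With map<generation, sstable> the highest generation was simply the last
// key. Once the container is a set of sstables, it takes a linear scan.
int64_t highest_generation(const sstable_list& ssts) {
    int64_t max_gen = 0;  // assumption: 0 when there are no sstables
    for (const auto& sst : ssts) {
        max_gen = std::max(max_gen, sst->generation);
    }
    return max_gen;
}
```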
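
Commit 2b812a392a computes the global minimum replay position per shard only after the map-reduce, treating a CF with no RP-mapping on a shard as position zero. A minimal sketch of that post-processing step, assuming simplified types: `cf_id` is a string instead of a UUID, a replay position is a single integer, and `min_pos_per_shard` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <set>
#include <string>

using shard_id = unsigned;
using cf_id = std::string;         // stand-in for a table UUID
using replay_position = uint64_t;  // simplified; a real RP is (segment id, offset)

// Result of the map-reduce over sstables: per shard, the replay position
// already covered by sstables for each CF that has any sstables there.
using rp_map = std::map<shard_id, std::map<cf_id, replay_position>>;

// Global minimum replay position per shard. A CF with no entry on a shard
// (no sstables found) counts as position zero, so that shard is replayed
// from the beginning. Assumes all_cfs is non-empty.
std::map<shard_id, replay_position>
min_pos_per_shard(const rp_map& rps, const std::set<cf_id>& all_cfs) {
    std::map<shard_id, replay_position> result;
    for (const auto& [shard, per_cf] : rps) {
        replay_position min_rp = std::numeric_limits<replay_position>::max();
        for (const auto& cf : all_cfs) {
            auto it = per_cf.find(cf);
            min_rp = std::min(min_rp, it == per_cf.end() ? replay_position{0} : it->second);
        }
        result[shard] = min_rp;
    }
    return result;
}
```

Filling in zeros only after the shards are known avoids guessing, up front, which shards existed before a possible re-sharding.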
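
Commits e667dcc3d0 and f1c4e3eb3d make each segment hold a shared pointer to its segment_manager and require the manager to orphan all of its segments, reserve segments included, to avoid a leak. The sketch below shows that ownership with `std::shared_ptr`; the structs are simplified stand-ins, and `active`, `reserve`, `new_segment`, and this `orphan_all` body are assumptions for illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Simplified stand-ins: segments hold a shared_ptr to their manager, so the
// manager stays alive while any segment still needs it (e.g. an in-flight
// write or flush).
struct segment_manager;

struct segment {
    std::shared_ptr<segment_manager> manager;
};

struct segment_manager : std::enable_shared_from_this<segment_manager> {
    std::vector<std::shared_ptr<segment>> active;
    std::vector<std::shared_ptr<segment>> reserve;

    // Must be called on a manager owned by a shared_ptr.
    std::shared_ptr<segment> new_segment(bool as_reserve = false) {
        auto s = std::make_shared<segment>();
        s->manager = shared_from_this();
        (as_reserve ? reserve : active).push_back(s);
        return s;
    }

    // Drop the manager -> segment edge of the reference cycle. Forgetting
    // one list (say, the reserve segments) leaves a cycle that leaks.
    void orphan_all() {
        active.clear();
        reserve.clear();
    }
};
```

A segment still referenced elsewhere keeps the manager alive until it is done, which is exactly the property the raw pointer lacked.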
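
Commit dec63eac6e adds a move constructor because the defaulted one mishandled the reference to the mutation (_mutation). The sketch below shows the hazard with a simplified entry that either owns its mutation or borrows one; `mutation` is a `std::string` stand-in, and the owning/borrowing layout is an assumption, not a copy of the real commitlog entry.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>

using mutation = std::string;  // stand-in for a real mutation type

// Simplified entry that either owns its mutation or borrows one owned
// elsewhere; _mutation always refers to the one to serialize.
class entry {
    std::optional<mutation> _storage;
    const mutation& _mutation;

public:
    // Owning: the reference points into our own storage.
    explicit entry(mutation m) : _storage(std::move(m)), _mutation(*_storage) {}

    // Borrowing: the reference points at the caller's object.
    explicit entry(std::reference_wrapper<const mutation> m) : _mutation(m.get()) {}

    // A defaulted move constructor would copy the reference, leaving it bound
    // to the *source's* storage, which dangles once the source is destroyed.
    // Re-point it at our own storage when we own the mutation.
    entry(entry&& o) noexcept
        : _storage(std::move(o._storage))
        , _mutation(_storage ? *_storage : o._mutation) {}

    const mutation& get() const { return _mutation; }
};
```

Note that a reference member also makes the defaulted move assignment deleted, so this sketch deliberately offers only move construction.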
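
Commit dc136a6a1c caps the reserve segment count instead of treating its temporary overflow as an error. A minimal sketch of the capping idea; `reserve_to_allocate` and its parameters are hypothetical names, and the split into "available" and "pending" counts is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// How many reserve segments to allocate now. Because of continuation
// reordering, `available + pending` can briefly exceed `target` (earlier
// allocations finished but their bookkeeping has not run yet). That overflow
// is benign, since it only means nothing needs doing, so cap the count rather
// than letting `target - count` wrap around to a huge unsigned value.
std::size_t reserve_to_allocate(std::size_t target, std::size_t available, std::size_t pending) {
    const std::size_t count = std::min(available + pending, target);
    return target - count;
}
```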
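
Commit efabb1a1d8 fixes a size sum that added `bool(buffer)` instead of `buffer.size()`, a bug that surfaced once `temporary_buffer::operator bool` became explicit. A minimal sketch with a hypothetical `buffer` stand-in (not `seastar::temporary_buffer`):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for seastar::temporary_buffer: operator bool means
// "non-empty", size() is the byte count.
struct buffer {
    std::size_t len = 0;
    std::size_t size() const { return len; }
    explicit operator bool() const { return len != 0; }
};

std::size_t total_size(const std::vector<buffer>& bufs) {
    std::size_t total = 0;
    for (const auto& b : bufs) {
        // total += b;  // compiled with an implicit operator bool, adding 0 or 1;
        //              // with the explicit operator it is a compile error
        total += b.size();
    }
    return total;
}
```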


163 Commits