scylladb

Author	SHA1	Message	Date
Gleb Natapov	60a851d3a5	commitlog: always flush segments atomically with writing db::commitlog::segment::batch_cycle() assumes that after a write for a certain position completes (as reported by _pending_ops.wait_for_pending()) it will also be flushed, but this is true only if writing and flushing are atomic wrt _pending_ops lock. It usually is unless flush_after is set to false when cycle() is called. In this case only writing is done under the lock. This is exactly what happens when a segment is closed. Flush is skipped because zero header is added after the last entry and then flushed, but this optimization breaks batch_cycle() assumption. Fix it by flushing after the write atomically even if a segment is being closed. Fixes #5496 Message-Id: <20191224115814.GA6398@scylladb.com>	2019-12-24 14:52:23 +02:00
Tomasz Grabiec	aa173898d6	Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz Selected semaphores' names are now included in exception messages in case of timeout or when admission queue overflows. Resolves #5281	2019-12-05 14:19:56 +01:00
Juliusz Stasiewicz	430b2ad19d	commitlog+region_group: timeout exceptions with names `segment_manager' now uses a decorated version of `timed_out_error' with hardcoded name. On the other hand `region_group' uses named `on_request_expiry' within its `expiring_fifo'.	2019-12-03 19:07:19 +01:00
Botond Dénes	4054ba0c45	serialization: accept any CharOutputIterator Not just bytes::output_iterator. Allow writing into streams other than just `bytes`. In fact we should be very careful with writing into `bytes` as they require potentially large contiguous allocations. The `write()` method is now templatized also on the type of its first argument, which now accepts any CharOutputIterator. Due to our poor usage of namespace this now collides with `write` defined inside `db/commitlog/commitlog.cc`. Luckily, the latter doesn't really have to be templatized on the data type it reads from, and de-templatizing it resolves the clash.	2019-12-02 10:10:31 +02:00
Rafael Ávila de Espíndola	6160b9017d	commitlog: make sure a file is closed If allocate or truncate throws, we have to close the file. Fixes #4877 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191114174810.49004-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Avi Kivity	623071020e	commitlog: change variadic stream in read_log_file to future<struct> Since seastar::streams are based on future/promise, variadic streams suffer the same fate as variadic futures - deprecation and eventual removal. This patch therefore replaces a variadic stream in commitlog::read_log_file() with a non-variadic stream, via a helper struct. Tests: unit (dev)	2019-10-29 19:25:12 +01:00
Rafael Ávila de Espíndola	4d0916a094	commitlog: Handle gate_closed_exception Before this patch, if the _gate is closed, with_gate throws and forward_to is not executed. When the promise<> p is destroyed it marks its _task as a broken promise. What happens next depends on the branch. On master, we warn when the shared_future is destroyed, so this patch changes the warning from a broken_promise to a gate closed. On 3.1, we warn when the promises in shared_future::_peers are destroyed since they no longer have a future attached: The future that was attached was the "auto f" just before the with_gate call, and it is destroyed when with_gate throws. The net result is that this patch fixes the warning in 3.1. I will send a patch to seastar to make the warning on master more consistent with the warning in 3.1. Fixes #4394 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190917211915.117252-1-espindola@scylladb.com>	2019-09-17 23:41:21 +02:00
Botond Dénes	136fc856c5	treewide: silence discarded future warnings for questionable discards This patch silences the remaining discarded future warnings, those where it cannot be determined with reasonable confidence that this was indeed the actual intent of the author, or that the discarding of the future could lead to problems. For all those places a FIXME is added, with the intent that these will be soon followed up with an actual fix. I deliberately haven't fixed any of these, even if the fix seems trivial. It is too easy to overlook a bad fix mixed in with so many mechanical changes.	2019-08-26 19:28:43 +03:00
Botond Dénes	fddd9a88dd	treewide: silence discarded future warnings for legit discards This patch silences those future discard warnings where it is clear that discarding the future was actually the intent of the original author, and they did the necessary precautions (handling errors). The patch also adds some trivial error handling (logging the error) in some places, which were lacking this, but otherwise look ok. No functional changes.	2019-08-26 18:54:44 +03:00
Rafael Ávila de Espíndola	636e2470b1	Always close commitlog files We were using segment::_closed to decide whether _file was already closed. Unfortunately they are not exactly the same thing. As far as I understand it, segments can be closed and reused without actually closing the file. Found with a seastar patch that asserts on destroying an open append_challenged_posix_file_impl. Fixes #4745. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190721171332.7995-1-espindola@scylladb.com>	2019-07-22 10:08:57 +03:00
Calle Wilund	f317d7a975	commitlog: Simplify commitlog extension iteration Fixes #4640 Iterating extensions in commitlog.cc should mimic that in sstables.cc, i.e. a simple future-chain. Should also use same order for read and write open, as we should preserve transformation stack order. Message-Id: <20190702150028.18042-1-calle@scylladb.com>	2019-07-02 18:37:44 +03:00
Calle Wilund	1e37e1d40c	commitlog: Add optional use of O_DSYNC mode Refs #3929 Optionally enables O_DSYNC mode for segment files, and when enabled ignores actual flushing and just barriers any ongoing writes. Iff using O_DSYNC mode, we will not only truncate the file to max size, but also do an actual initial write of zeros to it, since XFS (intended target) has observably less good behaviour on non-physical file blocks. Once written (and maybe recycled) we should have rather satisfying throughput on writes. Note that the O_DSYNC behaviour is hidden behind a default disabled option. While user should probably seldom worry about this, we should add some sort of logic in main/init that unless specified by user, evaluates the commitlog disk and sets this to true if it is using XFS and looks ok. This is because using O_DSYNC on things like EXT4 etc has quite horrible performance. All above statements about performance and O_DSYNC behaviour are based on a sampling of benchmark results (modified fsqual) on a statistically non-significant selection of disks. However, at least there the observed behaviour is a rather large difference between ::fallocate:ed disk area vs. actually written using O_DSYNC on XFS, and O_DSYNC on EXT4. Note also that measurements on O_DSYNC vs. no O_DSYNC does not take into account the wall-clock time of doing manual disk flush. This is intentionally ignored, since in the commitlog case, at least using periodic mode, flushes are relatively rare. Message-Id: <20190520120331.10229-1-calle@scylladb.com>	2019-05-20 15:10:48 +03:00
Benny Halevy	d9136f96f3	commitlog: descriptor: skip leading path from filename std::regex_match of the leading path may run out of stack with long paths in debug build. Using rfind instead to look up the last '/' in the pathname and skip it if found. Fixes #4464 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190505144133.4333-1-bhalevy@scylladb.com>	2019-05-05 17:51:56 +03:00
Vlad Zolotarov	1cba4a54bb	commitlog: introduce a segment_error Introduce a common base class for all errors that indicate that the current segment has "issues". This allows a laconic "catch" clause for all such errors. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-09 15:31:13 -04:00
Glauber Costa	043d102ab6	commitlog: fix typo in error message maxiumum -> maximum Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190326191108.7573-1-glauber@scylladb.com>	2019-03-26 21:32:56 +02:00
Avi Kivity	da0a25859b	Merge "Improvements to commitlog logs" from Paweł " This series contains minor improvements to commitlog log messages that have helped investigating #4231, but are not specific to that bug. " * tag 'improve-commitlog-logs/v1' of https://github.com/pdziepak/scylla: commitlog: use consistent chunk offsets in logs commitlog: provide more information in logs commitlog: remove unnecessary comment	2019-03-04 14:52:46 +02:00
Paweł Dziepak	00b33de25c	commitlog: use consistent chunk offsets in logs Logs in commitlog writer use offset in the file of the chunk header to identify chunks. However, the replayer is using offset after the header for the same purpose. This causes unnecessary confusion suggesting that the replayer is reading at the wrong position. This patch changes the replayer so that it reports chunk header offsets.	2019-03-04 12:15:50 +00:00
Paweł Dziepak	813b00a1a6	commitlog: provide more information in logs This commit adds some more information to the logs, motivated by experience investigating #4231. * size of each write * position of each write * log message for final write	2019-03-04 12:15:50 +00:00
Paweł Dziepak	1a657e9c5f	commitlog: remove unnecessary comment	2019-03-04 12:15:50 +00:00
Paweł Dziepak	434023425d	commitlog: write the correct buffer size Commitlog files contain multiple chunks. Each chunk starts as a single (possibly fragmented) buffer. The size of that buffer in memory may be larger than the size in the file. cycle() was incorrectly using the in-memory size to write the whole buffer to the file. That sometimes caused data corruption, since a smaller on-file size was used to compute the offset of the next chunk and there could be multiple chunk writes happening at the same time. This patch solves the issue by ensuring that only the actual on-file size of the chunk is written.	2019-03-04 10:25:48 +00:00
Calle Wilund	4a52ed7884	commitlog: Accept recycled (not yet re-used) segments in replay Refs #4085 Changes commitlog descriptor to both accept "Recycled-Commitlog..." file names, and preserve said name in the descriptor. This ensures we pick up the not-yet-used recycled segments left from a crash for replay. The replay in turn will simply ignore the recycled files, and post actual replay they will be deleted as needed. Message-Id: <20190129123311.16050-1-calle@scylladb.com>	2019-02-12 12:23:55 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Duarte Nunes	b7517183fa	db/commitlog: Use fragmented buffers to read entries Leverage fragmented_temporary_buffer when reading commit log entries, avoiding large allocations. Refs #4020 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	0e50a9bc6d	db/commitlog: Implement skip in terms of input buffer skipping This simplifies the code and allows to get rid of the overload of advance() taking a temporary_buffer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Calle Wilund	55f10ffc43	commitlog: Recycle used segments instead of delete + new file Refs #3929 When deleting a segment, IFF we have not yet filled up all reserves, instead of actually deleting the file, put it on a "recycle" list. Next segment allocation will instead of creating a new one simply rename the segment and reuse the file and its allocated space. We rename the file twice: Once on adding to recycle list, with special prefix so we don't mix up actual replayable segments and these. Second when we actually re-use the file (also to ensure consecutive names). Note that we limit the amount of recyclables, so a really stressed application which somehow fills up the replenish queue might cause us to still drop the segments. Could skip this but risk getting too many files on disk. Replay should be safe, since all entries are guarded by CRC based on the file ID (i.e. file name). Thus replaying a recycled segment will simply cause a CRC error in the main header and be ignored (see previous patch). Segments that are fully synced will have terminating zero-header (see previous patch) so we know when to stop processing a recycled file. If a file is the result of a mid-write crash, we will generate a CRC processing error as "normally" in this case, when hitting partially written block or coming to an old/new chunk boundary. v2: * Sync dir on rename * auto -> const sstring& * Allow recycling files as long as we're within disk space limits v3: * Use special names for files waiting for reuse	2018-12-10 09:09:07 +00:00
Calle Wilund	b13b6ef6a0	commitlog: Terminate all segments with a zero chunk Writes a final chunk header of zero to the file on close, to mark end-of-segment. This allows us to gracefully stop replay processing of a segment file even if it was not zeroed from the beginning (maybe recycled - hint hint).	2018-12-10 09:09:07 +00:00
Calle Wilund	b35af84599	commitlog_replay: Enforce file name based id matching When reading the header chunk of a commitlog file, check the stored id value against the id derived from the file name, and ignore if mismatched. This is a prerequisite for re-using renamed commitlog files, as we can then fail-fast should one such be left on disk, instead of trying to replay it. We also check said id via the CRC check for each chunk parsed. If we find a chunk with mismatched id, we will get a CRC error for the chunk, and replay will terminate (albeit not gracefully).	2018-12-10 09:09:07 +00:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Vlad Zolotarov	a89188de07	commitlog::read_log_file(): set the read I/O priority class explicitly Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Paweł Dziepak	4469f76e7c	commitlog: switch to fragmented buffers So far commitlog was using contiguous buffers for storing the data that is about to be written to disk. It was able to coalesce small writes so that multiple small mutations would use the same buffer, but if a mutation was large the commitlog would attempt to allocate a single, appropriately large buffer. This excessively stresses the memory allocator and may cause memory fragmentation to become an issue. The solution is to use fixed-size buffers of 128 kB, which is the standard buffer size in Scylla and keep large values fragmented.	2018-09-18 17:22:59 +01:00
Paweł Dziepak	7c1add6769	commitlog: drop buffer pools Buffer pools were added in `7191a130bb` "Commitlog: recycle buffers to reduce fragmentation." They introduce a lot of complexity and will become unnecessary once the code is switched to use fixed-size 128kB buffers.	2018-09-18 17:22:59 +01:00
Paweł Dziepak	9fee8b8d76	commitlog: drop recovery from bad alloc If a node cannot allocate a 128 kB buffer, it is already in a very bad shape, so there isn't much value in trying to recover by attempting smaller allocations and it just adds more complexity to the segment allocation. It actually may be better to let some requests fail and give the node a chance to recover rather than trying to use every last byte of free memory and end up with bad_alloc in a noexcept context.	2018-09-18 17:22:59 +01:00
Paweł Dziepak	fe48aaae46	commitlog: use memory_output_stream memory_output_stream deals with all required pointer arithmetic and allows easy transition to fragmented buffers.	2018-09-18 17:22:59 +01:00
Gleb Natapov	cc47f6c69d	Provide available memory size to commitlog during creation	2018-06-11 15:34:13 +03:00
Calle Wilund	62c3b4c429	commitlog: Ensure file objects are closed before object free Fixes #3446 Previously, only shutdown-synced objects were actually closed, which is wrong. This introduces yet another queue, processed together with the deletion objects, which ensures we explicitly close all objects that have been discarded. Message-Id: <20180521140456.32100-1-calle@scylladb.com>	2018-05-22 14:52:06 +03:00
Glauber Costa	596a525950	commitlog: don't move pointer to segment We are currently moving the pointer we acquired to the segment inside the lambda in which we'll handle the cycle. The problem is, we also use that same pointer inside the exception handler. If an exception happens we'll access it and we'll crash. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180518125820.10726-1-glauber@scylladb.com>	2018-05-18 17:25:18 +02:00
Calle Wilund	bb1a2c6c2e	db::commitlog: Add commitlog/hints file io extension To allow on-disk data to be augmented.	2018-03-26 11:58:27 +00:00
Calle Wilund	2bc98aebaf	db::commitlog: Do segment delete async + force replay delete go via CL Refs #2858 Push segment files to be deleted to a pending list, and process at intervals or flush-requests (or shutdown). Note that we do _not_ indiscriminately do deletes in non-anchored tasks, because we need to guarantee that finished segments are fully deleted and gone on CL shutdown, not to be mistaken for replayables. Also make sure we delete segments replayed via commitlog call, so IFF we add metadata processing for CL, we can clear it out.	2018-03-26 11:58:27 +00:00
Duarte Nunes	f665f1ab97	db/commitlog: Close the segment file Operations on a segment's underlying append_challenged_posix_file_impl, such as truncate(), schedule asynchronous operations when they are executed, which capture the file object. To synchronize with them and prevent use-after-free, we need to call close() and only delete the segment and file when the returned future resolves. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180216235754.24257-1-duarte@scylladb.com>	2018-02-19 13:09:41 +00:00
Duarte Nunes	7004f6c7ff	db/commitlog: Actually prevent new requests during shutdown When shutting down the commitlog we try to block all new requests by acquiring all available resources. We were, however, letting go of the semaphore permits too early, before closing the gate and shutting down the active segments. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180216234826.24111-1-duarte@scylladb.com>	2018-02-19 13:09:26 +00:00
Glauber Costa	80c4a211d8	consolidate timeout_clock At the moment, various different subsystems use their different ideas of what a timeout_clock is. This makes it a bit harder to pass timeouts between them because although most are actually a lowres_clock, that is not guaranteed to be the case. As a matter of fact, the timeout for restricted reads is expressed as nanoseconds, which is not a valid duration in the lowres_clock. As a first step towards fixing this, we'll consolidate all of the existing timeout_clocks in one, now called db::timeout_clock. Other things that tend to be expressed in terms of that clock--like the fact that the maximum time_point means no timeout and a semaphore that wait()s with that resolution are also moved to the common header. In the upcoming patch we will fix the restricted reader timeouts to be expressed in terms of the new timeout_clock. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-11 12:07:41 -05:00
Vlad Zolotarov	af70c0a709	db::commitlog: truncate segments to their actual sizes during shutdown Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:05:48 -05:00
Vlad Zolotarov	033af6c950	db::commitlog: allow defining a metrics category name Add a new field to db::commitlog::config that would define the metrics category name. If not given - metrics are not going to be registered. Set it to "commitlog" in db::commitlog::config(const db::config&). Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:05:47 -05:00
Vlad Zolotarov	878d58d23a	db/commitlog/commitlog::descriptor: add a filename_prefix parameter This parameter is used when creating a new segment. Its default value is descriptor::FILENAME_PREFIX. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:05:47 -05:00
Vlad Zolotarov	719b1fb24f	db::commitlog::descriptor::descriptor(filename): pass a filename as a const ref Avoid an unneeded copy by passing the file name as a reference. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:05:47 -05:00
Michael Munday	5158b3f484	utils::crc: introduce process_le/be(T) methods Replace the oblique process(T) overloads for integer types with explicit process_le/be(T) methods that would interpret the given integer as a stream of bytes using the corresponding endianness. For instance process_le(0x11223344) would treat this integer as the following array of bytes: {0x44, 0x33, 0x22, 0x11}. process_be(0x11223344) on the other hand would treat this integer as if it's {0x11, 0x22, 0x33, 0x44}. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-08 10:12:21 -05:00
Tzach Livyatan	12fb975282	Fix typos in metrics description Fixes #2658 Signed-off-by: Tzach Livyatan <tzach@scylladb.com> Message-Id: <20170803121732.19640-1-tzach@scylladb.com>	2017-08-28 10:48:28 +03:00
Tomasz Grabiec	6555a2f50b	commitlog: Discard active but unused segments on shutdown So that they are not left on disk even though we did a clean shutdown. First part of the fix is to ensure that closed segments are recognized as not allocating (_closed flag). Not doing this prevents them from being collected by discard_unused_segments(). Second part is to actually call discard_unused_segments() on shutdown after all segments were shut down, so that those whose position are cleared can be removed. Fixes #2550. Message-Id: <1499358825-17855-1-git-send-email-tgrabiec@scylladb.com>	2017-07-09 19:25:22 +03:00
Calle Wilund	2913241df1	memtable/commitlog: Change bookkeep to track individual segments Use per CF-id reference count instead, and use handles as result of add operations. These must either be explicitly released or stored (rp_set), or they will release the corresponding replay_position upon destruction. Note: this does _not_ remove the replay positioning ordering requirement for mutations. It just removes it as a means to track segment liveness.	2017-06-07 12:07:01 +00:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
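
Commit 5158b3f484 in the list above describes process_le/process_be in terms of the byte sequence each one feeds to the checksum. The byte ordering can be sketched as follows. This is only an illustration in Python, not Scylla's C++ code: the helper names bytes_le and bytes_be, the 4-byte default width, and the use of zlib.crc32 as the checksum are all assumptions made for the example.

```python
import zlib


def bytes_le(value: int, width: int = 4) -> bytes:
    """Serialize an unsigned integer least-significant byte first."""
    return value.to_bytes(width, "little")


def bytes_be(value: int, width: int = 4) -> bytes:
    """Serialize an unsigned integer most-significant byte first."""
    return value.to_bytes(width, "big")


def process_le(crc: int, value: int, width: int = 4) -> int:
    """Feed the little-endian bytes of value into a running CRC-32.

    zlib.crc32 is a stand-in checksum chosen for this sketch.
    """
    return zlib.crc32(bytes_le(value, width), crc)


def process_be(crc: int, value: int, width: int = 4) -> int:
    """Feed the big-endian bytes of value into a running CRC-32."""
    return zlib.crc32(bytes_be(value, width), crc)


if __name__ == "__main__":
    print(bytes_le(0x11223344).hex())  # 44332211
    print(bytes_be(0x11223344).hex())  # 11223344
```

Because the two functions feed different byte sequences, they generally produce different checksums for the same integer, which is why naming the byte order explicitly at each call site avoids ambiguity.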


198 Commits