scylladb

Author	SHA1	Message	Date
Duarte Nunes	dfbf68cd24	commitlog: Define operator<< in namespace db Needed for compilation with gcc6. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1466852874-8448-1-git-send-email-duarte@scylladb.com>	2016-06-26 10:05:28 +03:00
Calle Wilund	7cdea1b889	commitlog: Use flush queue for write/flush ordering, improve batch Using an ordering mechanism better than rw-locks for write/flush means we can wait for pending write in batch mode, and coalesce data from more than one mutation into a chunk. It also means we can wait for a specific read+flush pair (based on file position). Downside is that we will not do parallel writes in batch mode (unless we run out of buffer), which might underutilize the disk bandwidth. Upside is that running in batch mode (i.e. per-write consistency) now has way better bandwidth, and also, at least with high mutation rate, better average latency. Message-Id: <1465990064-2258-1-git-send-email-calle@scylladb.com>	2016-06-20 13:09:16 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Gleb Natapov	70575699e4	commitlog, sstables: enlarge XFS extent allocation for large files With big rows I see contention in XFS allocations which cause reactor thread to sleep. Commitlog is a main offender, so enlarge extent to commitlog segment size for big files (commitlog and sstable Data files). Message-Id: <20160404110952.GP20957@scylladb.com>	2016-04-04 14:15:00 +03:00
Paweł Dziepak	c8159eca52	commitlog: make sure that segment destructor doesn't throw Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-31 16:42:56 +01:00
Avi Kivity	417bcb122d	commitlog: ignore commitlog segments generated by Cassandra-derived tools Cassandra-derived tools (such as sstable2json) may write commitlog segments, that Scylla cannot recognize. Since we now write them with a distinct name, we can recognize the name and ignore these segments, as we know the data they contain is not interesting. Fixes #1112. Message-Id: <1459356904-20699-1-git-send-email-avi@scylladb.com>	2016-03-31 16:01:08 +03:00
Glauber Costa	d536846433	commitlog: initialize sync period with actual sync period commitlog's sync period is initialized as the batch period, and not as the sync period itself as it should be. I've found this by code inspection, but unless I am missing something really fundamental, this seems to be completely wrong. It's been working fine because in our defaults, I have checked that both variables default to the same value. But it seems to me that as long as anyone would change one of them, the behavior wouldn't be as expected. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <2e7c565242fe5d4481a3ee8b0ba425ef14f5e42a.1459252783.git.glauber@scylladb.com>	2016-03-29 15:21:02 +03:00
Benoît Canet	3b1d3d977d	exceptions: Shutdown communications on non file I/O errors Apply the same treatment to non file filesystem I/O errors. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:54 +02:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. I/O errors cannot be fixed by Scylla; the only solution is to shutdown the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Calle Wilund	0c3322befd	commitlog: Ensure segment survives whole flush call Must keep shared pointer alive. Likewise though, the shared pointer copy in cycle main continuation is not needed. Message-Id: <1456931988-5876-3-git-send-email-calle@scylladb.com>	2016-03-02 18:22:13 +02:00
Calle Wilund	f1c4e3eb3d	commitlog: Clear reserve segments in orphan_all Otherwise they will keep the segment_manager alive (leak). Fixes jenkins ASan errors. Message-Id: <1456931988-5876-2-git-send-email-calle@scylladb.com>	2016-03-02 18:22:09 +02:00
Calle Wilund	a556f665c0	commitlog: Take segment_manager locks first in write/flush While it is formally better to take a local lock first and then first contend for a global, in this case it is arguably better to ensure we get a gate exception synchronously (early) instead of potentially in a continuation. Old version might cause us to do a gate::leave even while never entered. And since we should really only have one active (contending) segment per shard anyway, it should not matter. Message-Id: <1456931988-5876-1-git-send-email-calle@scylladb.com>	2016-03-02 18:22:05 +02:00
Calle Wilund	e667dcc3d0	commitlog: Make segment->segment_manager relation shared pointer The segment->segment_manager pointer has, until now, been a raw pointer, which in a way is sensible, since making circular shared pointer relations is in general bad. However, since the code and life cycle of segments has evolved quite a bit since that initial relation was defined, becoming both more and then suddenly, in a sense, less, asynchronous over time, the usage of the relation is in fact more consistent with a shared pointer, in that a segment needs to access its manager to properly do things like write and flush. These two ops in particular depend on accessing the segment manager in a way that might be fine even using raw pointers, if it was not again for that little annoying thing of continuation reordering. So, lets just make the relation a shared pointer, solving the issue of whether the manager is alive when a segment accesses it. If it has been "released" (shut down), the existing mechanisms (gate) will then trigger and prevent any actual _actions_ from taking place. And we don't have to complicate anything else even more. Only "big" change is that we need to explicitly orphan all segments in commitlog destructor (segment_manager is essentially a p-impl). This fixes some spurious crashes in nightly unit tests. Fixes #966. Message-Id: <1456838735-17108-1-git-send-email-calle@scylladb.com>	2016-03-01 16:48:28 +02:00
Calle Wilund	dc136a6a1c	commitlog: Fix reserve counter overflow Fixes #482 See code comment. Reserve segment allocation count sum can temporarily overflow due to continuation delay/reordering, if we manage to reach the on_timer code before finally clauses from previous reserve allocation invocation has processed. However, since these are benign overflows (just indicating even more that we don't need to do anything right now) simply capping the count should be fine. Avoids assert in boost irange. Message-Id: <1456740679-4537-1-git-send-email-calle@scylladb.com>	2016-02-29 14:56:24 +02:00
Avi Kivity	efabb1a1d8	commitlog: fix buffer size calculation We were adding bool(buffer), instead of buffer.size(); exposed by making temporary_buffer::operator bool explicit.	2016-02-24 13:38:05 +02:00
Calle Wilund	e6b792b2ff	commitlog bugfix: Fix batch mode Last series accidentally broke batch mode. With new, fancy, potentially blocking ways, we need to treat batch mode differently, since in this case, sync should always come _after_ alloc-write. Previous patch caused infinite loop. Broke jenkins. Message-Id: <1453821077-2385-1-git-send-email-calle@scylladb.com>	2016-01-26 17:13:14 +02:00
Glauber Costa	3f94070d4e	use auto&& instead of auto& for priority classes. By Avi's request, who reminds us that auto& is more suited for situations in which we are assigning to the variable in question. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>	2016-01-26 17:00:20 +02:00
Calle Wilund	89dc0f7be3	commitlog: wait for writes (if needed) on new segment as well Also check closed status in allocate, since alloc queue waiting could lead to us re-allocating in a segment that gets closed in between queue enter and us running the continuation. Message-Id: <1453811471-1858-1-git-send-email-calle@scylladb.com>	2016-01-26 15:05:12 +02:00
Calle Wilund	f2c5315d33	commitlog: Add write/flush limits Configured on start (for now - and dummy values at that). When shard write/flush count reaches limit, any incoming ops will queue until previous ones finish. Consequently, if an allocation op forces a write, which blocks, any other incoming allocations will also queue up to provide back pressure.	2016-01-26 10:19:24 +00:00
Calle Wilund	7628a4dfe0	commitlog: Add some feedback/measurement methods Suitable to derive "back pressure" from.	2016-01-26 09:47:14 +00:00
Calle Wilund	4f5bd4b64b	commitlog: split write/flush counters	2016-01-26 09:47:14 +00:00
Calle Wilund	215c8b60bf	commitlog: minor cleanup - remove red squiggles in eclipse	2016-01-26 09:42:26 +00:00
Glauber Costa	b63611e148	mark I/O operations with priority classes After this patch, our I/O operations will be tagged into a specific priority class. The available classes are 5, and were defined in the previous patch: 1) memtable flush 2) commitlog writes 3) streaming mutation 4) SSTable compaction 5) CQL query Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Pekka Enberg	6cc02242f6	Merge "Multi schema support in commit log" from Paweł "This series adds support for multiple schema versions to the commit log. All segments contain column mappings of all schema versions used by the mutations contained in the segment, which are necessary in order to be able to read frozen mutations and upgrade them to the current schema version."	2016-01-18 10:11:26 +02:00
Paweł Dziepak	55d342181a	commitlog: do not skip entries inside a chunk All entries inside a chunk need to be read since any of them may contain column mapping. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-01-13 10:23:00 +01:00
Paweł Dziepak	a877905bd4	commitlog: allow adding entries using commitlog_entry_writer Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-01-13 10:17:45 +01:00
Paweł Dziepak	434c02cdfa	commitlog: keep track of schema versions Each segment chunk should contain column mappings for all schema versions used by the mutations it contains. In order to avoid duplication db::commitlog::segment remembers all schema versions already written in current chunk. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-01-13 10:13:41 +01:00
Paweł Dziepak	9d74268234	commitlog: introduce entry_writer Current commitlog interface requires writers to specify the size of a new entry which cannot depend on the segment to which the entry is written. If column mappings are going to be stored in the commitlog that's not enough since we don't know whether column mapping needs to be written until we know in which segment the entry is going to be stored. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-01-13 10:13:26 +01:00
Calle Wilund	7f4985a017	commit log reader bugfix: Fix tried to read entries across chunk bounds read_entry did not verify that current chunk has enough data left for a minimal entry. Thus we could try to read an entry from the slack left in a chunk, and get lost in the file (pos > next, skip very much -> eof). And also give false errors about corruption. Message-Id: <1452517700-599-1-git-send-email-calle@scylladb.com>	2016-01-12 10:29:07 +02:00
Glauber Costa	74fbd8fac0	do not call open_file_dma directly We have an API that wraps open_file_dma which we use in some places, but in many other places we call the reactor version directly. This patch changes the latter to match the former. It will have the added benefit of allowing us to make easier changes to these interfaces if needed. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <29296e4ec6f5e84361992028fe3f27adc569f139.1451950408.git.glauber@scylladb.com>	2016-01-05 10:37:57 +02:00
Calle Wilund	43929d0ec1	commitlog: Add some comments about the IO flow Documentation.	2015-12-16 13:13:31 +02:00
Tomasz Grabiec	c0ac7b3a73	commitlog: Wrap subscription in a unique_ptr<> to make it nothrow movable future<> will require nothrow move constructible types.	2015-12-07 09:50:28 +01:00
Tomasz Grabiec	657841922a	Mark move constructors noexcept when possible	2015-12-07 09:50:27 +01:00
Glauber Costa	5e8249f062	commitlog: fix bug preventing flushing with default max_size value The config file expresses this number in MB, while total_memory() gives us a quantity in bytes. This causes the commitlog not to flush until we reach really skyhigh numbers. While we need this fix for the short term before we cook another release, I will note that for the mid/long term, it would be really helpful to stop representing memory amounts as integers, and use an explicit C++ type for those. That would have prevented this bug. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-12-04 09:29:19 +02:00
Calle Wilund	262f44948d	commitlog: Add get_flush_count method (for testing)	2015-11-23 15:42:45 +01:00
Calle Wilund	2fe2320490	commitlog: Make reading segments with crc/data errors non-fatal Parser object now attempts to skip past/terminate parsing on corrupted entries/chunks (as detected by invalid sizes/crc:s). The amount of data skipped is kept track of (as well as we can estimate - pre-allocation makes it tricky), and at the end of parsing/reporting, IFF errors occurred, an exception detailing the failures is thrown (since subscription has little mechanism to deal with this otherwise). Thus a caller can decide how to deal with data corruption, but will be given as many entries as possible.	2015-11-23 15:42:45 +01:00
Glauber Costa	00c12319f1	config: change type for commitlog maximum size config option This patch substitutes uint64_t for uint32_t as the type for commitlog_total_space_in_mb. Moving to 64 is not strictly needed, since even a signed 32-bit type would allow us to easily handle 2TB. But since we store that in the commitlog as a 64-bit value, let's match it. Moving from unsigned to signed, however, allows us to represent negative numbers. With that in place, we can change the semantics of the value slightly, so as to allow a negative number to mean "all memory". The reason behind this is that the default value "8GB" is an artifact of the JVM. We don't need that, and in many-shards configuration, each shard flushes the commitlog way too often, since 8GB / many_shards = small_number. 8GB also happens to be a popular heap size for C* in the JVM. For us, we would like to equate that (at least) with the amount of memory. The problem is how to do that without introducing new options or changing the semantics of existing options too radically. The proposed solution will allow us to still parse C* yaml files, since those will always have positive numbers, while introducing our own defaults. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-11-15 10:29:23 +02:00
Calle Wilund	85b8d65374	commitlog: Change file format to include magic marker Allows us to fail fast if someone tries to replay an Origin commit log. WARNING: This changes the file format, and there is no good way for me to check if a CL is "old" scylla, or Origin (since "version" is the same). So either "old" scylla files also fail, or we never fail (until later, and worse). Thus, if upgrading from older to this patch, likewise, ensure to have cleaned out all commit logs first.	2015-11-10 17:11:06 +01:00
Calle Wilund	5299cece4c	commitlog: Make "shutdown" do flushing + hard sync of pending ops * Do close + fsync on all segments * Make sure all pending cycle/sync ops are guarded with a gate, and explicitly wait for this gate on shutdown to make sure we don't leave hanging flushes in the task queue. * Fix bug where "commitlog::clear" did not in fact shut down the CL, due to "_shutdown" being already set. Note: This is (at least currently) not an issue for anything else than tests, since we don't shutdown the normal server "properly", i.e. the CL itself will not go away, and hanging tasks are ok, as long as the sync-all is done (which it was previously). But, to make tests predictable, and future-proof the CL, this is better.	2015-10-26 14:50:54 +01:00
Calle Wilund	05de462fa9	commitlog: Make flush/segment delete slightly more defensive + test tolerant Fix for (mainly) test failures (use-after free) I.e. test case test_commitlog_delete_when_over_disk_limit causes use-after free because test shuts down before a pending flush is done, and the segment manager is actually gone -> crash writing stats. Now, we could make the stats a shared pointer, but we should never allow an operation to outlive the segment_manager. In normal op, we _almost_ guarantee this with the shutdown() call, but technically, we could have a flush continuation trailing somewhere. * Make sure we never delete segments from segment_manager until they are fully flushed * Make test disposal method "clear" be more defensive in flushing and clearing out segments	2015-10-22 15:19:24 +03:00
Calle Wilund	786d66cacf	commitlog: Fix use-after-free Remove "finally". Just use a then_wrapped. Which it was originally, before "handle_exception" was introduced to seastar. Oh, the irony...	2015-10-20 09:56:40 +03:00
Tomasz Grabiec	19d7d30e67	Replace references to 'urchin' with 'scylla'	2015-10-19 11:08:05 +03:00
Avi Kivity	849464670c	commitlog: make new segments more xfs-friendly xfs doesn't like writes beyond eof (exactly at eof is fine), and due to continuation reordering, we sometimes do that. Fix by pre-truncating the segment to its maximum size.	2015-10-14 17:32:59 +03:00
Calle Wilund	206acd8b5b	commitlog: Make reader handle pre-allocated files Silently ignore, and assume eof if reading zeroed file or chunk header data Reading entries already deal with this.	2015-10-14 17:32:23 +03:00
Calle Wilund	2729d5dd71	commitlog: ensure file size remains <= max_size Re-check file size overflow after each cycle() call (new buffer), otherwise we could write more, in the case we are storing a mutation larger than current buffer size (current pos + sizeof(mut) < max_size, but after cycle required by sizeof(mut) > buf_remain, the former might not be true anymore).	2015-10-14 17:32:22 +03:00
Calle Wilund	199b72c6f3	commitlog: fix reader "offset" handling broken + ensure exceptions propagate Must ensure we find a chunk/entry boundary still even when run with a start offset, since file navigation is chunk based. Was not observed as broken previously because 1.) We did not run with offsets 2.) The exception never reached caller. Also make the reader silently ignore empty files.	2015-10-07 08:54:49 +02:00
Calle Wilund	024041c752	commitlog: make log message slightly more informative/correct	2015-10-07 08:54:49 +02:00
Calle Wilund	4941d91063	Commitlog: add some more verbosity	2015-09-22 12:57:33 +02:00
Calle Wilund	a10745cf0e	Commitlog: Delay timer by period/ncpus for each cpu To avoid having all shards doing sync at the same time.	2015-09-21 13:30:35 +02:00
Calle Wilund	dcabf8c1d2	Commitlog: Pre-allocate "reserve" segments Refs #356 Pre-allocates N segments from timer task. N is "adaptive" in that it is increased (to a max) every time segment acquisition is forced to allocate a new instead of picking from pre-alloc (reserve) list. The idea is that it is easier to adapt how many segments we consume per timer quanta than the timer quanta itself. Also does disk pressure check and flush from timer task now. Note that the check is still only done max once every new segment. Some logging cleanup/betterment also to make behaviour easier to trace. Reserve segments start out at zero length, and are still deleted when finished. This is because otherwise we'd still have to clear the file to be able to properly parse it later (given that it can be a "half" file due to power fail etc). This might need revisiting as well. With this patch, there should be no case (except flush starvation) where "add_mutation" actually waits for a (potentially) blocking op (disk). Note that since the amount of reserve is increased as needed, there will be occasional cases where a new segment is created in the alloc path until the system finds equilibrium. But this should only be during a brief warmup. v2: Fixed timestamp not being reset on reserve acquire	2015-09-21 13:04:39 +02:00

113 Commits