138 Commits

Author SHA1 Message Date
Paweł Dziepak
c8159eca52 commitlog: make sure that segment destructor doesn't throw
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-31 16:42:56 +01:00
Avi Kivity
417bcb122d commitlog: ignore commitlog segments generated by Cassandra-derived tools
Cassandra-derived tools (such as sstable2json) may write commitlog segments,
that Scylla cannot recognize.  Since we now write them with a distinct name,
we can recognize the name and ignore these segments, as we know the data they
contain is not interesting.

Fixes #1112.
Message-Id: <1459356904-20699-1-git-send-email-avi@scylladb.com>
2016-03-31 16:01:08 +03:00
Glauber Costa
d536846433 commitlog: initialize sync period with actual sync period
commitlog's sync period is initialized as the batch period, and not as the
sync period itself as it should be.

I've found this by code inspection, but unless I am missing something
really fundamental, this seems to be completely wrong. It's been working
fine because in our defaults, I have checked that both variables default to
the same value. But it seems to me that as long as anyone would change one
of them, the behavior wouldn't be as expected.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <2e7c565242fe5d4481a3ee8b0ba425ef14f5e42a.1459252783.git.glauber@scylladb.com>
2016-03-29 15:21:02 +03:00
Benoît Canet
3b1d3d977d exceptions: Shutdown communications on non file I/O errors
Apply the same treatment to non file filesystem I/O errors.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:54 +02:00
Benoît Canet
1fb9a48ac5 exception: Optionally shutdown communication on I/O errors.
I/O errors cannot be fixed by Scylla; the only solution
is to shut down the database communications.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:52 +02:00
Calle Wilund
0c3322befd commitlog: Ensure segment survives whole flush call
Must keep shared pointer alive.
Likewise though, the shared pointer copy in cycle main continuation
is not needed.

Message-Id: <1456931988-5876-3-git-send-email-calle@scylladb.com>
2016-03-02 18:22:13 +02:00
Calle Wilund
f1c4e3eb3d commitlog: Clear reserve segments in orphan_all
Otherwise they will keep the segment_manager alive (leak).
Fixes jenkins ASan errors.

Message-Id: <1456931988-5876-2-git-send-email-calle@scylladb.com>
2016-03-02 18:22:09 +02:00
Calle Wilund
a556f665c0 commitlog: Take segment_manager locks first in write/flush
While it is formally better to take a local lock first and
then contend for a global one, in this case it is arguably
better to ensure we get a gate exception synchronously (early)
instead of potentially in a continuation. Old version might
cause us to do a gate::leave even while never entered.

And since we should really only have one active (contending)
segment per shard anyway, it should not matter.

Message-Id: <1456931988-5876-1-git-send-email-calle@scylladb.com>
2016-03-02 18:22:05 +02:00
Paweł Dziepak
bdc23ae5b5 remove db/serializer.hh includes
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 09:07:09 +00:00
Calle Wilund
e667dcc3d0 commitlog: Make segment->segment_manager relation shared pointer
The segment->segment_manager pointer has, until now, been a raw pointer,
which in a way is sensible, since making circular shared pointer
relations is in general bad. However, since the code and life cycle
of segments has evolved quite a bit since that initial relation
was defined, becoming both more and then suddenly, in a sense,
less, asynchronous over time, the usage of the relation is in fact
more consistent with a shared pointer, in that a segment needs to
access its manager to properly do things like write and flush.

These two ops in particular depend on accessing the segment manager
in a way that might be fine even using raw pointers, if it was not
again for that little annoying thing of continuation reordering.

So, let's just make the relation a shared pointer, solving the issue
of whether the manager is alive when a segment accesses it. If it
has been "released" (shut down), the existing mechanisms (gate)
will then trigger and prevent any actual _actions_ from taking
place. And we don't have to complicate anything else even more.

Only "big" change is that we need to explicitly orphan all
segments in commitlog destructor (segment_manager is essentially
a p-impl).

This fixes some spurious crashes in nightly unit tests.

Fixes #966.

Message-Id: <1456838735-17108-1-git-send-email-calle@scylladb.com>
2016-03-01 16:48:28 +02:00
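The lifetime argument in e667dcc3d0 can be illustrated without Seastar. The sketch below is a simplified model, not the commitlog code: `manager_sketch`, `segment_sketch` and the `shut_down` flag (standing in for the gate) are hypothetical names, and a vector of `std::function` stands in for queued continuations.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for segment_manager; `shut_down` models a closed gate.
struct manager_sketch {
    bool shut_down = false;
    int writes = 0;
};

// Hypothetical stand-in for a segment that holds its manager by shared pointer.
class segment_sketch : public std::enable_shared_from_this<segment_sketch> {
    std::shared_ptr<manager_sketch> _manager;  // shared, not raw
public:
    explicit segment_sketch(std::shared_ptr<manager_sketch> m)
        : _manager(std::move(m)) {}

    // Queues a deferred write. The captured `self` keeps the segment alive,
    // and the segment in turn keeps the manager alive until the task is gone.
    void queue_write(std::vector<std::function<void()>>& tasks) {
        tasks.push_back([self = shared_from_this()] {
            if (!self->_manager->shut_down) {
                ++self->_manager->writes;
            }
        });
    }
};
```

With a raw pointer, a task that runs after the owner dropped the manager would touch freed memory; here the task finds a live manager, sees the shutdown flag, and does nothing.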
Paweł Dziepak
dec63eac6e commitlog: add commitlog entry move constructor
Default move constructor and assignment didn't handle reference to
mutation (_mutation) properly.

Fixes #935.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1456760905-23478-1-git-send-email-pdziepak@scylladb.com>
2016-02-29 18:10:15 +02:00
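A minimal sketch of the problem dec63eac6e describes, assuming an entry that either owns its payload or refers to an external one. The class `entry_sketch` and its members are hypothetical and use `std::string` in place of a mutation; the real commitlog entry type may be laid out differently.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical entry: owns its payload in _storage, or refers to an external one.
class entry_sketch {
    std::optional<std::string> _storage;  // engaged only when the entry owns the payload
    const std::string& _payload;          // bound to *_storage or to an external string

    static const std::string& pick(const std::optional<std::string>& own,
                                   const std::string& fallback) {
        return own ? *own : fallback;
    }
public:
    explicit entry_sketch(const std::string& external) : _payload(external) {}
    explicit entry_sketch(std::string&& owned)
        : _storage(std::move(owned)), _payload(*_storage) {}

    // A defaulted move constructor would copy the reference as-is, leaving it
    // bound to the storage inside the moved-from object. Rebind it explicitly.
    entry_sketch(entry_sketch&& other) noexcept
        : _storage(std::move(other._storage))
        , _payload(pick(_storage, other._payload)) {}

    const std::string& payload() const { return _payload; }
    bool owns_payload() const { return _storage.has_value(); }
};
```

Members are initialized in declaration order, so `_storage` is already in place when `_payload` is bound.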
Calle Wilund
dc136a6a1c commitlog: Fix reserve counter overflow
Fixes #482

See code comment. Reserve segment allocation count sum can temporarily
overflow due to continuation delay/reordering, if we manage to reach the
on_timer code before the finally clauses from the previous reserve allocation
invocation have been processed. However, since these are benign overflows
(just indicating even more that we don't need to do anything right now)
simply capping the count should be fine.
Avoids assert in boost irange.

Message-Id: <1456740679-4537-1-git-send-email-calle@scylladb.com>
2016-02-29 14:56:24 +02:00
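The capping idea in dc136a6a1c can be sketched as follows. The function and parameter names are hypothetical; the point is only that clamping the "already have" count to the target keeps the subtraction from wrapping around when in-flight allocations are counted late.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: how many reserve segments still need to be allocated.
// `ready + in_flight` may temporarily exceed `wanted`; capping it avoids an
// unsigned wrap-around that would otherwise produce a huge (invalid) range.
constexpr std::size_t missing_reserve(std::size_t wanted,
                                      std::size_t ready,
                                      std::size_t in_flight) {
    const std::size_t have = std::min(ready + in_flight, wanted);
    return wanted - have;
}
```

An overshoot simply yields zero, which matches the commit's reading that an overflow only means there is nothing to do right now.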
Avi Kivity
efabb1a1d8 commitlog: fix buffer size calculation
We were adding bool(buffer), instead of buffer.size(); exposed by making
temporary_buffer::operator bool explicit.
2016-02-24 13:38:05 +02:00
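The class of bug fixed in efabb1a1d8 is easy to reproduce in isolation. The two buffer types below are hypothetical stand-ins, not Seastar's temporary_buffer; they differ only in whether operator bool is explicit.

```cpp
#include <cstddef>

// Hypothetical buffer with an implicit conversion to bool.
struct implicit_buffer {
    std::size_t len;
    constexpr std::size_t size() const { return len; }
    constexpr operator bool() const { return len != 0; }
};

// Same buffer, but the conversion is explicit.
struct explicit_buffer {
    std::size_t len;
    constexpr std::size_t size() const { return len; }
    constexpr explicit operator bool() const { return len != 0; }
};

// Compiles, but adds 0 or 1: the buffer converts to bool, then to an integer.
constexpr std::size_t buggy_total(std::size_t header, implicit_buffer b) {
    return header + b;
}

// With the explicit operator, `header + b` no longer compiles,
// which forces the intended call to size().
constexpr std::size_t fixed_total(std::size_t header, explicit_buffer b) {
    return header + b.size();
}
```

Making the conversion explicit turns the silent miscalculation into a compile error, which is how the commit message says the bug was exposed.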
Paweł Dziepak
89b75a02d4 commitlog: use IDL-based serialization for entries
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:11:59 +00:00
Paweł Dziepak
f548c75200 commitlog: move implementation to *.cc file
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:11:59 +00:00
Pekka Enberg
86173fb8cc db/commitlog: Fix debug log format string in commitlog_replayer::recover()
I saw the following Boost format string related warning during commitlog
replay:

  INFO  [shard 0] commitlog_replayer - Replaying node3/commitlog/CommitLog-1-72057594289748293.log, node3/commitlog/CommitLog-1-90071992799230277.log, node3/commitlog/CommitLog-1-108086391308712261.log, node3/commitlog/CommitLog-1-251820357.log, node3/commitlog/CommitLog-1-54043195780266309.log, node3/commitlog/CommitLog-1-36028797270784325.log, node3/commitlog/CommitLog-1-126100789818194245.log, node3/commitlog/CommitLog-1-18014398761302341.log, node3/commitlog/CommitLog-1-126100789818194246.log, node3/commitlog/CommitLog-1-251820358.log, node3/commitlog/CommitLog-1-18014398761302342.log, node3/commitlog/CommitLog-1-36028797270784326.log, node3/commitlog/CommitLog-1-54043195780266310.log, node3/commitlog/CommitLog-1-72057594289748294.log, node3/commitlog/CommitLog-1-90071992799230278.log, node3/commitlog/CommitLog-1-108086391308712262.log
  WARN  [shard 0] commitlog_replayer - error replaying: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::io::too_many_args> > (boost::too_many_args: format-string referred to less arguments than were passed)

While inspecting the code, I noticed that one of the error loggers is
missing an argument. As I don't know how the original failure triggered,
I wasn't able to verify that that was the only one, though.

Message-Id: <1453893301-23128-1-git-send-email-penberg@scylladb.com>
2016-01-27 13:40:19 +02:00
Calle Wilund
e6b792b2ff commitlog bugfix: Fix batch mode
Last series accidentally broke batch mode.
With new, fancy, potentially blocking ways, we need to treat
batch mode differently, since in this case, sync should always
come _after_ alloc-write.
Previous patch caused infinite loop. Broke jenkins.

Message-Id: <1453821077-2385-1-git-send-email-calle@scylladb.com>
2016-01-26 17:13:14 +02:00
Glauber Costa
3f94070d4e use auto&& instead of auto& for priority classes.
By Avi's request, who reminds us that auto& is more suited for situations
in which we are assigning to the variable in question.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>
2016-01-26 17:00:20 +02:00
Calle Wilund
89dc0f7be3 commitlog: wait for writes (if needed) on new segment as well
Also check closed status in allocate, since alloc queue waiting could
lead to us re-allocating in a segment that gets closed in between
queue enter and us running the continuation.

Message-Id: <1453811471-1858-1-git-send-email-calle@scylladb.com>
2016-01-26 15:05:12 +02:00
Calle Wilund
f2c5315d33 commitlog: Add write/flush limits
Configured on start (for now - and dummy values at that). 
When the shard write/flush count reaches the limit, any incoming ops will queue
until previous ones finish.

Consequently, if an allocation op forces a write, which blocks, any 
other incoming allocations will also queue up to provide back pressure.
2016-01-26 10:19:24 +00:00
Calle Wilund
7628a4dfe0 commitlog: Add some feedback/measurement methods
Suitable to derive "back pressure" from.
2016-01-26 09:47:14 +00:00
Calle Wilund
4f5bd4b64b commitlog: split write/flush counters
2016-01-26 09:47:14 +00:00
Calle Wilund
215c8b60bf commitlog: minor cleanup - remove red squiggles in eclipse
2016-01-26 09:42:26 +00:00
Glauber Costa
b63611e148 mark I/O operations with priority classes
After this patch, our I/O operations will be tagged into a specific priority class.

The available classes are 5, and were defined in the previous patch:

 1) memtable flush
 2) commitlog writes
 3) streaming mutation
 4) SSTable compaction
 5) CQL query

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Calle Wilund
59bf54d59a commitlog_replayer: Modify logging to more match origin
* Match origin log messages
  - Demote per-file printouts to "debug" level.
* Print an all-files stat summary for whole replay (begin/summary)
  - At info level, like origin

Prompted by dtest that expects origin log output.

Message-Id: <1453216558-18359-1-git-send-email-calle@scylladb.com>
2016-01-19 17:19:52 +02:00
Pekka Enberg
6cc02242f6 Merge "Multi schema support in commit log" from Paweł
"This series adds support for multiple schema versions to the commit log.
 All segments contain column mappings of all schema versions used by the
 mutations contained in the segment, which are necessary in order to be
 able to read frozen mutations and upgrade them to the current schema
 version."
2016-01-18 10:11:26 +02:00
Paweł Dziepak
218898b297 commitlog: upgrade mutations during commitlog replay
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:50:26 +01:00
Paweł Dziepak
661849dbc3 commitlog: learn about schema versions during replay
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:50:23 +01:00
Paweł Dziepak
55d342181a commitlog: do not skip entries inside a chunk
All entries inside a chunk need to be read since any of them may
contain column mapping.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:23:00 +01:00
Paweł Dziepak
18d0a57bf4 commitlog: use commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:20:06 +01:00
Paweł Dziepak
a877905bd4 commitlog: allow adding entries using commitlog_entry_writer
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:17:45 +01:00
Paweł Dziepak
0254c3e30b commitlog: add commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:49 +01:00
Paweł Dziepak
434c02cdfa commitlog: keep track of schema versions
Each segment chunk should contain column mappings for all schema
versions used by the mutations it contains. In order to avoid
duplication db::commitlog::segment remembers all schema versions already
written in current chunk.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:41 +01:00
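The bookkeeping described in 434c02cdfa amounts to a per-chunk set of versions already written. A minimal sketch, assuming schema versions can be represented as strings; `chunk_version_tracker` and its methods are hypothetical names, not the db::commitlog::segment interface.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical tracker for schema versions written in the current chunk.
class chunk_version_tracker {
    std::unordered_set<std::string> _known_versions;
public:
    // Returns true when the column mapping for `version` has not been written
    // in this chunk yet, and records the version as written.
    bool needs_column_mapping(const std::string& version) {
        return _known_versions.insert(version).second;
    }

    // A new chunk starts with no column mappings, so forget everything.
    void start_new_chunk() { _known_versions.clear(); }
};
```

The first mutation using a given version in a chunk carries the mapping; later ones in the same chunk do not, and a new chunk starts from scratch.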
Paweł Dziepak
9d74268234 commitlog: introduce entry_writer
Current commitlog interface requires writers to specify the size of a
new entry which cannot depend on the segment to which the entry is
written.
If column mappings are going to be stored in the commitlog that's not
enough since we don't know whether column mapping needs to be written
until we know in which segment the entry is going to be stored.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:26 +01:00
Calle Wilund
7f4985a017 commit log reader bugfix: Fix tried to read entries across chunk bounds
read_entry did not verify that current chunk has enough data left
for a minimal entry. Thus we could try to read an entry from the slack
left in a chunk, and get lost in the file (pos > next, skip very much
-> eof). And also give false errors about corruption.
Message-Id: <1452517700-599-1-git-send-email-calle@scylladb.com>
2016-01-12 10:29:07 +02:00
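The missing check in 7f4985a017 can be sketched as a bounds test made before each entry read. The minimal entry size used here is a hypothetical value chosen for illustration; the commit message does not state the real one.

```cpp
#include <cstddef>

// Hypothetical minimal on-disk entry size, for illustration only.
constexpr std::size_t min_entry_size = 12;

// True when the current chunk still has room for at least a minimal entry
// starting at `pos`. Anything less is slack and should be skipped, not parsed.
constexpr bool can_read_entry(std::size_t pos, std::size_t chunk_end) {
    return pos <= chunk_end && chunk_end - pos >= min_entry_size;
}
```

Comparing `pos` with `chunk_end` first keeps the unsigned subtraction from wrapping if the reader is already past the chunk.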
Tomasz Grabiec
036974e19b Make mutation interfaces support multiple versions
Schema is tracked in memtable and cache per-entry. Entries are
upgraded lazily on access. Incoming mutations are upgraded to table's
current schema on given shard.

Mutating nodes need to keep schema_ptr alive in case schema version is
requested by target node.
2016-01-11 10:34:51 +01:00
Glauber Costa
74fbd8fac0 do not call open_file_dma directly
We have an API that wraps open_file_dma which we use in some places, but in
many other places we call the reactor version directly.

This patch changes the latter to match the former. It will have the added benefit
of allowing us to make easier changes to these interfaces if needed.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <29296e4ec6f5e84361992028fe3f27adc569f139.1451950408.git.glauber@scylladb.com>
2016-01-05 10:37:57 +02:00
Calle Wilund
43929d0ec1 commitlog: Add some comments about the IO flow
Documentation.
2015-12-16 13:13:31 +02:00
Tomasz Grabiec
c0ac7b3a73 commitlog: Wrap subscription in a unique_ptr<> to make it nothrow movable
future<> will require nothrow move constructible types.
2015-12-07 09:50:28 +01:00
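Why wrapping in a unique_ptr helps in c0ac7b3a73 can be checked with type traits. The type `may_throw_on_move` is a hypothetical stand-in for a type whose move constructor is not noexcept; it is not the actual subscription type.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical type whose move constructor is allowed to throw.
struct may_throw_on_move {
    may_throw_on_move() = default;
    may_throw_on_move(may_throw_on_move&&) noexcept(false) {}
};

// The type itself does not satisfy a nothrow-move requirement...
static_assert(!std::is_nothrow_move_constructible<may_throw_on_move>::value,
              "moving the object itself may throw");

// ...but a unique_ptr to it does, because moving the pointer only
// transfers ownership and never runs the pointee's move constructor.
static_assert(std::is_nothrow_move_constructible<
                  std::unique_ptr<may_throw_on_move>>::value,
              "moving the owning pointer cannot throw");
```

The cost is one heap allocation and an extra indirection per wrapped object.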
Tomasz Grabiec
657841922a Mark move constructors noexcept when possible
2015-12-07 09:50:27 +01:00
Glauber Costa
5e8249f062 commitlog: fix bug preventing flushing with default max_size value
The config file expresses this number in MB, while total_memory() gives us
a quantity in bytes. This causes the commitlog not to flush until we reach
really skyhigh numbers.

While we need this fix for the short term before we cook another release,
I will note that for the mid/long term, it would be really helpful to stop
representing memory amounts as integers, and use an explicit C++ type for
those. That would have prevented this bug.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-12-04 09:29:19 +02:00
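The explicit-type idea suggested in 5e8249f062 might look like the sketch below. The types `bytes` and `megabytes` and the helper `should_flush` are hypothetical; the commit itself only proposes the approach.

```cpp
#include <cstdint>

// Hypothetical unit types: a plain integer can no longer be passed where a
// specific unit is expected.
struct bytes { std::uint64_t value; };
struct megabytes { std::uint64_t value; };

// The only way to go from megabytes to bytes is an explicit conversion.
constexpr bytes to_bytes(megabytes mb) {
    return bytes{mb.value * 1024 * 1024};
}

// Both sides must be in bytes; should_flush(bytes{...}, megabytes{...})
// does not compile.
constexpr bool should_flush(bytes used, bytes limit) {
    return used.value >= limit.value;
}
```

With such types, comparing a byte count against a value read in MB becomes a compile error rather than a limit that is never reached.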
Calle Wilund
262f44948d commitlog: Add get_flush_count method (for testing)
2015-11-23 15:42:45 +01:00
Calle Wilund
76b43fbf74 commitlog_replayer: Handle replay data errors as non-fatal
Discern fatal and non-fatal exceptions, and handle data corruption
by adding to stats and reporting it, but continue processing.

Note that "invalid_argument", i.e. attempting to replay origin/old
segments, is still considered fatal, as it is probably better to
signal this strongly to the user/admin.
2015-11-23 15:42:45 +01:00
Calle Wilund
2fe2320490 commitlog: Make reading segments with crc/data errors non-fatal
Parser object now attempts to skip past/terminate parsing on corrupted
entries/chunks (as detected by invalid sizes/crc:s). The amount of data
skipped is kept track of (as well as we can estimate - pre-allocation
makes it tricky), and at the end of parsing/reporting, IFF errors
occurred, an exception detailing the failures is thrown (since
subscription has little mechanism to deal with this otherwise).

Thus a caller can decide how to deal with data corruption, but will be
given as many entries as possible.
2015-11-23 15:42:45 +01:00
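The "deliver what you can, report at the end" behaviour of 2fe2320490 can be sketched synchronously. Everything here is a simplified assumption: `raw_entry`, `corrupt_data_error` and `read_all` are hypothetical names, and a boolean stands in for the real size/CRC validation.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parsed entry; `checksum_ok` stands in for a real CRC check.
struct raw_entry {
    std::string data;
    bool checksum_ok;
};

// Hypothetical error type carrying how much data had to be skipped.
struct corrupt_data_error : std::runtime_error {
    std::size_t bytes_skipped;
    explicit corrupt_data_error(std::size_t n)
        : std::runtime_error("skipped " + std::to_string(n) + " corrupt bytes")
        , bytes_skipped(n) {}
};

// Hands every valid entry to `consume`, skips corrupt ones, and throws once
// at the end if anything was skipped.
inline void read_all(const std::vector<raw_entry>& entries,
                     const std::function<void(const std::string&)>& consume) {
    std::size_t skipped = 0;
    for (const auto& e : entries) {
        if (e.checksum_ok) {
            consume(e.data);
        } else {
            skipped += e.data.size();
        }
    }
    if (skipped != 0) {
        throw corrupt_data_error(skipped);
    }
}
```

The caller has received every readable entry by the time the exception arrives, so it can decide whether the corruption is fatal.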
Glauber Costa
00c12319f1 config: change type for commitlog maximum size config option
This patch substitutes int64_t for uint32_t as the type for
commitlog_total_space_in_mb.  Moving to 64 is not strictly needed, since even a
signed 32-bit type would allow us to easily handle 2TB. But since we store that
in the commitlog as a 64-bit value, let's match it.

Moving from unsigned to signed, however, allows us to represent negative
numbers.  With that in place, we can change the semantics of the value
slightly, so as to allow a negative number to mean "all memory".

The reason behind this, is that the default value "8GB", is an artifact of the
JVM.  We don't need that, and in many-shards configuration, each shard flushes
the commitlog way too often, since 8GB / many_shards = small_number.

8GB also happens to be a popular heap size for C* in the JVM. For us, we would
like to equate that (at least) with the amount of memory. The problem is how to
do that without introducing new options or changing the semantics of existing
options too radically.

The proposed solution will allow us to still parse C* yaml files, since those
will always have positive numbers, while introducing our own defaults.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-15 10:29:23 +02:00
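The semantics proposed in 00c12319f1 can be sketched as a small helper. The function name and the assumption that both the configured size and the memory figure are node-wide totals split evenly across shards are illustrative, not taken from the Scylla source.

```cpp
#include <cstdint>

// Hypothetical helper: resolve the per-shard commitlog size limit in bytes.
// A negative configured value means "use all memory"; a non-negative value is
// taken as megabytes, as in a Cassandra yaml file.
constexpr std::uint64_t commitlog_limit_per_shard(std::int64_t total_space_in_mb,
                                                  std::uint64_t total_memory_bytes,
                                                  unsigned shards) {
    const std::uint64_t total = total_space_in_mb < 0
        ? total_memory_bytes
        : static_cast<std::uint64_t>(total_space_in_mb) * 1024 * 1024;
    return total / shards;
}
```

For example, 8192 MB over 32 shards leaves 256 MiB per shard, while a negative value on a 64 GiB machine leaves 2 GiB per shard.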
Calle Wilund
43712a583d commitlog_replayer: Special case exception from "old/origin file"
And write some nice informative stuff.
2015-11-10 17:14:22 +01:00
Calle Wilund
85b8d65374 commitlog: Change file format to include magic marker
Allows us to fail fast if someone tries to replay an Origin commit log.

WARNING: This changes the file format, and there is no good way for me to
check if a CL is "old" scylla, or Origin (since "version" is the same). So
either "old" scylla files also fail, or we never fail (until later, and
worse). Thus, if upgrading from older to this patch, likewise, ensure to
have cleaned out all commit logs first.
2015-11-10 17:11:06 +01:00
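A fail-fast check of the kind 85b8d65374 introduces might be sketched like this. The marker value, its position at the very start of the header and the big-endian encoding are all assumptions made for illustration; the commit message does not specify them.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical marker value, for illustration only.
constexpr std::uint32_t segment_magic = 0x53434c47;

// Reads a big-endian 32-bit integer from four bytes.
constexpr std::uint32_t read_be32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// True when the header is long enough and starts with the expected marker.
// A file without the marker is rejected before any entry is parsed.
constexpr bool has_magic(const unsigned char* header, std::size_t len) {
    return len >= 4 && read_be32(header) == segment_magic;
}
```

A file that starts directly with a version field fails this check, which is also why older files without the marker are rejected, as the warning in the commit notes.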
Calle Wilund
5299cece4c commitlog: Make "shutdown" do flushing + hard sync of pending ops
* Do close + fsync on all segments
* Make sure all pending cycle/sync ops are guarded with a gate, and
  explicitly wait for this gate on shutdown to make sure we don't
  leave hanging flushes in the task queue.
* Fix bug where "commitlog::clear" did not in fact shut down the CL,
  due to "_shutdown" being already set.

Note: This is (at least currently) not an issue for anything else than tests,
since we don't shutdown the normal server "properly", i.e. the CL itself
will not go away, and hanging tasks are ok, as long as the sync-all is done
(which it was previously). But, to make tests predictable, and future-proof
the CL, this is better.
2015-10-26 14:50:54 +01:00
Calle Wilund
05de462fa9 commitlog: Make flush/segment delete slightly more defensive + test tolerant
Fix for (mainly) test failures (use-after free)
I.e. test case test_commitlog_delete_when_over_disk_limit causes
use-after free because test shuts down before a pending flush is done,
and the segment manager is actually gone -> crash writing stats.
Now, we could make the stats a shared pointer, but we should never
allow an operation to outlive the segment_manager.
In normal op, we _almost_ guarantee this with the shutdown() call,
but technically, we could have a flush continuation trailing somewhere.

* Make sure we never delete segments from segment_manager until they are
  fully flushed
* Make test disposal method "clear" be more defensive in flushing and
  clearing out segments
2015-10-22 15:19:24 +03:00
Calle Wilund
786d66cacf commitlog: Fix use-after-free
Remove "finally". Just use a then_wrapped. Which it was originally, before
"handle_exception" was introduced to seastar. Oh, the irony...
2015-10-20 09:56:40 +03:00