Commit Graph

145 Commits

Author SHA1 Message Date
Calle Wilund
4ab03e98cf commitlog: Ensure we don't end up in a loop when we must wait for alloc
Continuation reordering could cause us to repeatedly see the
segment-local flag var even though actual write/sync ops are done.
Can cause wild recursion without actual delayed continuation ->
SOE.

Fix by also checking queue status, since this is the wait object.

Message-Id: <1468234873-13581-1-git-send-email-calle@scylladb.com>
2016-07-11 14:12:38 +03:00
Avi Kivity
2a46410f4a Change sstable_list from a map to a set
sstable_list is now a map<generation, sstable>; change it to a set
in preparation for replacing it with sstable_set.  The change simplifies
a lot of code; the only casualty is the code that computes the highest
generation number.
2016-07-03 10:26:57 +03:00
Duarte Nunes
dfbf68cd24 commitlog: Define operator<< in namespace db
Needed for compilation with gcc6.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1466852874-8448-1-git-send-email-duarte@scylladb.com>
2016-06-26 10:05:28 +03:00
Calle Wilund
2b812a392a commitlog_replayer: Fix calculation of global min pos per shard
If a CF does not have any sstables at all, we should treat it
as having a replay position of zero. However, since we also
must deal with potential re-sharding, we cannot just set
shard->uuid->zero initially, because we don't know what shards
existed.

Go through all CF:s post map-reduce, and for every shard where
a CF does not have an RP-mapping (no sstables found), set the
global min pos (for shard) to zero.

Fixes #1372

Message-Id: <1465991864-4211-1-git-send-email-calle@scylladb.com>
2016-06-21 10:05:05 +03:00
Calle Wilund
7cdea1b889 commitlog: Use flush queue for write/flush ordering, improve batch
Using an ordering mechanism better than rw-locks for write/flush
means we can wait for pending write in batch mode, and coalesce
data from more than one mutation into a chunk.

It also means we can wait for a specific read+flush pair (based on
file position).

Downside is that we will not do parallel writes in batch mode (unless
we run out of buffer), which might underutilize the disk bandwidth.

Upside is that running in batch mode (i.e. per-write consistency)
now has way better bandwidth, and also, at least with high mutation
rate, better average latency.

Message-Id: <1465990064-2258-1-git-send-email-calle@scylladb.com>
2016-06-20 13:09:16 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Gleb Natapov
70575699e4 commitlog, sstables: enlarge XFS extent allocation for large files
With big rows I see contention in XFS allocations which cause reactor
thread to sleep. Commitlog is a main offender, so enlarge extent to
commitlog segment size for big files (commitlog and sstable Data files).

Message-Id: <20160404110952.GP20957@scylladb.com>
2016-04-04 14:15:00 +03:00
Paweł Dziepak
c8159eca52 commitlog: make sure that segment destructor doesn't throw
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-31 16:42:56 +01:00
Avi Kivity
417bcb122d commitlog: ignore commitlog segments generated by Cassandra-derived tools
Cassandra-derived tools (such as sstable2json) may write commitlog segments,
that Scylla cannot recognize.  Since we now write them with a distinct name,
we can recognize the name and ignore these segments, as we know the data they
contain is not interesting.

Fixes #1112.
Message-Id: <1459356904-20699-1-git-send-email-avi@scylladb.com>
2016-03-31 16:01:08 +03:00
Glauber Costa
d536846433 commitlog: initialize sync period with actual sync period
commitlog's sync period is initialized as the batch period, and not as the
sync period itself as it should be.

I've found this by code inspection, but unless I am missing something
really fundamental, this seems to be completely wrong. It's been working
fine because in our defaults, I have checked that both variables default to
the same value. But it seems to me that as long as anyone would change one
of them, the behavior wouldn't be as expected.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <2e7c565242fe5d4481a3ee8b0ba425ef14f5e42a.1459252783.git.glauber@scylladb.com>
2016-03-29 15:21:02 +03:00
Benoît Canet
3b1d3d977d exceptions: Shutdown communications on non file I/O errors
Apply the same treatment to non file filesystem I/O errors.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:54 +02:00
Benoît Canet
1fb9a48ac5 exception: Optionally shutdown communication on I/O errors.
I/O errors cannot be fixed by Scylla the only solution
is to shutdown the database communications.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:52 +02:00
Calle Wilund
0c3322befd commitlog: Ensure segment survives whole flush call
Must keep shared pointer alíve.
Likewise though, the shared pointer copy in cycle main continuation
is not needed.

Message-Id: <1456931988-5876-3-git-send-email-calle@scylladb.com>
2016-03-02 18:22:13 +02:00
Calle Wilund
f1c4e3eb3d commitlog: Clear reserve segments in orphan_all
Otherwise they will keep the segment_manager alive (leak).
Fixes jenkins ASan errors.

Message-Id: <1456931988-5876-2-git-send-email-calle@scylladb.com>
2016-03-02 18:22:09 +02:00
Calle Wilund
a556f665c0 commitlog: Take segment_manager locks first in write/flush
While is is formally better to take a local lock first and
then first contend for a global, in this case it is arguably
better to ensure we get a gate exception synchronously (early)
instead of potentially in a continuation. Old version might
cause us to do a gate::leave even while never entered.

And since we should really only have one active (contending)
segment per shard anyway, it should not matter.

Message-Id: <1456931988-5876-1-git-send-email-calle@scylladb.com>
2016-03-02 18:22:05 +02:00
Paweł Dziepak
bdc23ae5b5 remove db/serializer.hh includes
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 09:07:09 +00:00
Calle Wilund
e667dcc3d0 commitlog: Make segment->segment_manager relation shared pointer
The segment->segment_manager pointer has, until now, been a raw pointer,
which in a way is sensible, since making circular shared pointer
relations is in general bad. However, since the code and life cycle
of segments has evolved quite a bit since that initial relation
was defined, becoming both more and then suddenly, in a sense,
less, asynchronous over time, the usage of the relation is in fact
more consistent with a shared pointer, in that a segment needs to
access its manager to properly do things like write and flush.

These two ops in particular depend on accessing the segment manager
in a way that might be fine even using raw pointers, if it was not
again for that little annoying thing of continuation reordering.

So, lets just make the relation a shared pointer, solving the issue
of whether the manager is alive when a segment accesses it. If it
has been "released" (shut down), the existing mechanisms (gate)
will then trigger and prevent any actual _actions_ from taking
place. And we don't have to complicate anything else even more.

Only "big" change is that we need to explicitly orphan all
segments in commitlog destructor (segment_manager is essentially
a p-impl).

This fixes some spurious crashes in nightly unit tests.

Fixes #966.

Message-Id: <1456838735-17108-1-git-send-email-calle@scylladb.com>
2016-03-01 16:48:28 +02:00
Paweł Dziepak
dec63eac6e commitlog: add commitlog entry move constructor
Default move constructor and assignment didn't handle reference to
mutation (_mutation) properly.

Fixes #935.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1456760905-23478-1-git-send-email-pdziepak@scylladb.com>
2016-02-29 18:10:15 +02:00
Calle Wilund
dc136a6a1c commitlog: Fix reserve counter overflow
Fixes #482

See code comment. Reserve segment allocation count sum can temporarily
overflow due to continuation delay/reordering, if we manage to reach the
on_timer code before finally clauses from previous reserve allocation
invocation has processed. However, since these are benign overflows
(just indicating even more that we don't need to do anything right now)
simply capping the count should be fine.
Avoids assert in boost irange.

Message-Id: <1456740679-4537-1-git-send-email-calle@scylladb.com>
2016-02-29 14:56:24 +02:00
Avi Kivity
efabb1a1d8 commitlog: fix buffer size calculation
We were adding bool(buffer), instead of buffer.size(); exposed by making
temporary_buffer::operator bool explicit.
2016-02-24 13:38:05 +02:00
Paweł Dziepak
89b75a02d4 commitlog: use IDL-based serialization for entries
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:11:59 +00:00
Paweł Dziepak
f548c75200 commitlog: move implementation to *.cc file
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:11:59 +00:00
Pekka Enberg
86173fb8cc db/commitlog: Fix debug log format string in commitlog_replayer::recover()
I saw the following Boost format string related warning during commitlog
replay:

  INFO  [shard 0] commitlog_replayer - Replaying node3/commitlog/CommitLog-1-72057594289748293.log, node3/commitlog/CommitLog-1-90071992799230277.log, node3/commitlog/CommitLog-1-108086391308712261.log, node3/commitlog/CommitLog-1-251820357.log, node3/commitlog/CommitLog-1-54043195780266309.log, node3/commitlog/CommitLog-1-36028797270784325.log, node3/commitlog/CommitLog-1-126100789818194245.log, node3/commitlog/CommitLog-1-18014398761302341.log, node3/commitlog/CommitLog-1-126100789818194246.log, node3/commitlog/CommitLog-1-251820358.log, node3/commitlog/CommitLog-1-18014398761302342.log, node3/commitlog/CommitLog-1-36028797270784326.log, node3/commitlog/CommitLog-1-54043195780266310.log, node3/commitlog/CommitLog-1-72057594289748294.log, node3/commitlog/CommitLog-1-90071992799230278.log, node3/commitlog/CommitLog-1-108086391308712262.log
  WARN  [shard 0] commitlog_replayer - error replaying: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::io::too_many_args> > (boost::too_many_args: format-string referred to less arguments than were passed)

While inspecting the code, I noticed that one of the error loggers is
missing an argument. As I don't know how the original failure triggered,
I wasn't able to verify that that was the only one, though.

Message-Id: <1453893301-23128-1-git-send-email-penberg@scylladb.com>
2016-01-27 13:40:19 +02:00
Calle Wilund
e6b792b2ff commitlog bugfix: Fix batch mode
Last series accidently broke batch mode.
With new, fancy, potentitally blocking ways, we need to treat
batch mode differently, since in this case, sync should always
come _after_ alloc-write.
Previous patch caused infinite loop. Broke jenkins.

Message-Id: <1453821077-2385-1-git-send-email-calle@scylladb.com>
2016-01-26 17:13:14 +02:00
Glauber Costa
3f94070d4e use auto&& instead of auto& for priority classes.
By Avi's request, who reminds us that auto& is more suited for situations
in which we are assigning to the variable in question.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>
2016-01-26 17:00:20 +02:00
Calle Wilund
89dc0f7be3 commitlog: wait for writes (if needed) on new segment as well
Also check closed status in allocate, since alloc queue waiting could
lead to us re-allocating in a segment that gets closed in between
queue enter and us running the continuation.

Message-Id: <1453811471-1858-1-git-send-email-calle@scylladb.com>
2016-01-26 15:05:12 +02:00
Calle Wilund
f2c5315d33 commitlog: Add write/flush limits
Configured on start (for now - and dummy values at that). 
When shard write/flush count reaches limit, and incoming ops will queue
until previous ones finish. 

Consequently, if an allocation op forces a write, which blocks, any 
other incoming allocations will also queue up to provide back pressure.
2016-01-26 10:19:24 +00:00
Calle Wilund
7628a4dfe0 commitlog: Add some feedback/measurement methods
Suitable to derive "back pressure" from.
2016-01-26 09:47:14 +00:00
Calle Wilund
4f5bd4b64b commitlog: split write/flush counters 2016-01-26 09:47:14 +00:00
Calle Wilund
215c8b60bf commitlog: minor cleanup - remove red squiggles in eclipse 2016-01-26 09:42:26 +00:00
Glauber Costa
b63611e148 mark I/O operations with priority classes
After this patch, our I/O operations will be tagged into a specific priority class.

The available classes are 5, and were defined in the previous patch:

 1) memtable flush
 2) commitlog writes
 3) streaming mutation
 4) SSTable compaction
 5) CQL query

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Calle Wilund
59bf54d59a commitlog_replayer: Modify logging to more match origin
* Match origin log messages
  - Demote per-file printouts to "debug" level.
* Print an all-files stat summary for whole replay (begin/summary)
  - At info level, like origin

Prompted by dtest that expects origin log output.

Message-Id: <1453216558-18359-1-git-send-email-calle@scylladb.com>
2016-01-19 17:19:52 +02:00
Pekka Enberg
6cc02242f6 Merge "Multi schema support in commit log" from Paweł
"This series adds support for multiple schema versions to the commit log.
 All segments contain column mappings of all schema versions used by the
 mutations contained in the segment, which are necessary in order to be
 able to read frozen mutations and upgrade them to the current schema
 version."
2016-01-18 10:11:26 +02:00
Paweł Dziepak
218898b297 commitlog: upgrade mutations during commitlog replay
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:50:26 +01:00
Paweł Dziepak
661849dbc3 commitlog: learn about schema versions during replay
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:50:23 +01:00
Paweł Dziepak
55d342181a commitlog: do not skip entries inside a chunk
All entries inside a chunk needs to be read since any of them may
contain column mapping.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:23:00 +01:00
Paweł Dziepak
18d0a57bf4 commitlog: use commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:20:06 +01:00
Paweł Dziepak
a877905bd4 commitlog: allow adding entries using commitlog_entry_writer
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:17:45 +01:00
Paweł Dziepak
0254c3e30b commitlog: add commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:49 +01:00
Paweł Dziepak
434c02cdfa commitlog: keep track of schema versions
Each segment chunk should contain column mappings for all schema
versions used by the mutations it contains. In order to avoid
duplication db::commitlog::segment remembers all schema versions already
written in current chunk.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:41 +01:00
Paweł Dziepak
9d74268234 commitlog: introduce entry_writer
Current commitlog interface requires writers to specify the size of a
new entry which cannot depend on the segment to which the entry is
written.
If column mappings are going to be stored in the commitlog that's not
enough since we don't know whether column mapping needs to be written
until we known in which segment the entry is going to be stored.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:26 +01:00
Calle Wilund
7f4985a017 commit log reader bugfix: Fix tried to read entries across chunk bounds
read_entry did not verify that current chunk has enough data left
for a minimal entry. Thus we could try to read an entry from the slack
left in a chunk, and get lost in the file (pos > next, skip very much
-> eof). And also give false errors about corruption.
Message-Id: <1452517700-599-1-git-send-email-calle@scylladb.com>
2016-01-12 10:29:07 +02:00
Tomasz Grabiec
036974e19b Make mutation interfaces support multiple versions
Schema is tracked in memtable and cache per-entry. Entries are
upgraded lazily on access. Incoming mutations are upgraded to table's
current schema on given shard.

Mutating nodes need to keep schema_ptr alive in case schema version is
requested by target node.
2016-01-11 10:34:51 +01:00
Glauber Costa
74fbd8fac0 do not call open_file_dma directly
We have an API that wraps open_file_dma which we use in some places, but in
many other places we call the reactor version directly.

This patch changes the latter to match the former. It will have the added benefit
of allowing us to make easier changes to these interfaces if needed.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <29296e4ec6f5e84361992028fe3f27adc569f139.1451950408.git.glauber@scylladb.com>
2016-01-05 10:37:57 +02:00
Calle Wilund
43929d0ec1 commitlog: Add some comments about the IO flow
Documentation.
2015-12-16 13:13:31 +02:00
Tomasz Grabiec
c0ac7b3a73 commitlog: Wrap subscription in a unique_ptr<> to make it nothrow movable
future<> will require nothrow move constructible types.
2015-12-07 09:50:28 +01:00
Tomasz Grabiec
657841922a Mark move constructors noexcept when possible 2015-12-07 09:50:27 +01:00
Glauber Costa
5e8249f062 commitlog: fix but preventing flushing with default max_size value
The config file expresses this number in MB, while total_memory() gives us
a quantity in bytes. This causes the commitlog not to flush until we reach
really skyhigh numbers.

While we need this fix for the short term before we cook another release,
I will note that for the mid/long term, it would be really helpful to stop
representing memory amounts as integers, and use an explicit C++ type for
those. That would have prevented this bug.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-12-04 09:29:19 +02:00
Calle Wilund
262f44948d commitlog: Add get_flush_count method (for testing) 2015-11-23 15:42:45 +01:00
Calle Wilund
76b43fbf74 commitlog_replayer: Handle replay data errors as non-fatal
Discern fatal and non-fatal excceptions, and handle data corruption 
by adding to stats, resporting it, but continue processing.

Note that "invalid_arguement", i.e. attempting to replay origin/old
segments are still considered fatal, as it is probably better to 
signal this strongly to user/admin
2015-11-23 15:42:45 +01:00