Commit Graph

92 Commits

Author SHA1 Message Date
Calle Wilund
215c8b60bf commitlog: minor cleanup - remove red squiggles in Eclipse 2016-01-26 09:42:26 +00:00
Glauber Costa
b63611e148 mark I/O operations with priority classes
After this patch, our I/O operations will be tagged with a specific priority class.

There are five available classes, defined in the previous patch:

 1) memtable flush
 2) commitlog writes
 3) streaming mutation
 4) SSTable compaction
 5) CQL query

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Pekka Enberg
6cc02242f6 Merge "Multi schema support in commit log" from Paweł
"This series adds support for multiple schema versions to the commit log.
 All segments contain column mappings of all schema versions used by the
 mutations contained in the segment, which are necessary in order to be
 able to read frozen mutations and upgrade them to the current schema
 version."
2016-01-18 10:11:26 +02:00
Paweł Dziepak
55d342181a commitlog: do not skip entries inside a chunk
All entries inside a chunk need to be read, since any of them may
contain a column mapping.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:23:00 +01:00
Paweł Dziepak
a877905bd4 commitlog: allow adding entries using commitlog_entry_writer
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:17:45 +01:00
Paweł Dziepak
434c02cdfa commitlog: keep track of schema versions
Each segment chunk should contain column mappings for all schema
versions used by the mutations it contains. To avoid duplication,
db::commitlog::segment remembers all schema versions already
written in the current chunk.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:41 +01:00
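The per-chunk bookkeeping described above can be sketched as a small set that is cleared whenever a new chunk starts. This is a hypothetical illustration, not the actual db::commitlog::segment code: the chunk_schema_tracker type is invented, and schema versions are assumed to fit in a plain integer.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical stand-in for a schema version identifier.
using schema_version = std::uint64_t;

// Hypothetical helper: remembers which schema versions already have a column
// mapping in the current chunk, so each mapping is written at most once there.
class chunk_schema_tracker {
    std::unordered_set<schema_version> _written;
public:
    // Returns true if the column mapping for `v` must be written before the
    // entry, and records it as written for the rest of the chunk.
    bool needs_mapping(schema_version v) {
        return _written.insert(v).second;
    }

    // A new chunk must carry its own mappings again.
    void new_chunk() { _written.clear(); }
};
```

Only the first entry of a given schema version in a chunk pays for the mapping; later entries in the same chunk skip it.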
Paweł Dziepak
9d74268234 commitlog: introduce entry_writer
The current commitlog interface requires writers to specify the size of a
new entry, and that size cannot depend on the segment to which the entry is
written.
If column mappings are going to be stored in the commitlog, that's not
enough, since we don't know whether a column mapping needs to be written
until we know in which segment the entry is going to be stored.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:26 +01:00
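A rough sketch of why an entry writer helps: the size is only asked for once the target segment is known, so it can include or omit the column mapping. segment_view, entry_writer and mutation_entry_writer here are hypothetical names chosen for illustration; the real interface may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

using schema_version = std::uint64_t;  // hypothetical

// Hypothetical view of the segment an entry will land in.
struct segment_view {
    std::set<schema_version> mappings_in_chunk;  // mappings already written
};

// Hypothetical writer interface: size() takes the target segment, so the
// answer may differ from one segment to another.
class entry_writer {
public:
    virtual ~entry_writer() = default;
    virtual std::size_t size(const segment_view& seg) const = 0;
    virtual void write(segment_view& seg, std::vector<char>& out) const = 0;
};

// Example writer: a mutation payload, preceded by a column mapping only if
// the segment's current chunk does not already contain one for this version.
class mutation_entry_writer : public entry_writer {
    schema_version _version;
    std::size_t _mapping_size;
    std::size_t _mutation_size;
public:
    mutation_entry_writer(schema_version v, std::size_t mapping, std::size_t mutation)
        : _version(v), _mapping_size(mapping), _mutation_size(mutation) {}

    std::size_t size(const segment_view& seg) const override {
        bool need_mapping = seg.mappings_in_chunk.count(_version) == 0;
        return _mutation_size + (need_mapping ? _mapping_size : 0);
    }

    void write(segment_view& seg, std::vector<char>& out) const override {
        std::size_t n = size(seg);  // compute before updating the segment state
        seg.mappings_in_chunk.insert(_version);
        out.insert(out.end(), n, '\0');  // placeholder bytes for the sketch
    }
};
```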
Calle Wilund
7f4985a017 commit log reader bugfix: Fix attempts to read entries across chunk bounds
read_entry did not verify that the current chunk has enough data left
for a minimal entry. Thus we could try to read an entry from the slack
left in a chunk and get lost in the file (pos > next, skip very much
-> eof), and also report false corruption errors.
Message-Id: <1452517700-599-1-git-send-email-calle@scylladb.com>
2016-01-12 10:29:07 +02:00
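The missing check amounts to comparing the bytes left in the chunk against the smallest possible entry. A minimal sketch; min_entry_size and can_read_entry are hypothetical, and the value 12 is only an assumed example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimum on-disk entry size (e.g. size field + checksum);
// the real value depends on the entry format.
constexpr std::size_t min_entry_size = 12;

// Returns true if an entry may start at `pos` in a chunk ending at `chunk_end`
// (exclusive). Otherwise the remaining bytes are slack and the reader should
// move on to the next chunk instead of decoding them as an entry header.
bool can_read_entry(std::size_t pos, std::size_t chunk_end) {
    return pos < chunk_end && chunk_end - pos >= min_entry_size;
}
```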
Glauber Costa
74fbd8fac0 do not call open_file_dma directly
We have an API that wraps open_file_dma which we use in some places, but in
many other places we call the reactor version directly.

This patch changes the latter to use the wrapper as well. As an added benefit,
this makes it easier to change these interfaces if needed.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <29296e4ec6f5e84361992028fe3f27adc569f139.1451950408.git.glauber@scylladb.com>
2016-01-05 10:37:57 +02:00
Calle Wilund
43929d0ec1 commitlog: Add some comments about the IO flow
Documentation.
2015-12-16 13:13:31 +02:00
Tomasz Grabiec
c0ac7b3a73 commitlog: Wrap subscription in a unique_ptr<> to make it nothrow movable
future<> will require nothrow move constructible types.
2015-12-07 09:50:28 +01:00
Tomasz Grabiec
657841922a Mark move constructors noexcept when possible 2015-12-07 09:50:27 +01:00
Glauber Costa
5e8249f062 commitlog: fix bug preventing flushing with default max_size value
The config file expresses this number in MB, while total_memory() gives us
a quantity in bytes. This causes the commitlog not to flush until usage
reaches extremely high values.

While we need this fix for the short term before we cook another release,
I will note that for the mid/long term, it would be really helpful to stop
representing memory amounts as integers, and use an explicit C++ type for
those. That would have prevented this bug.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-12-04 09:29:19 +02:00
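Following the suggestion above to use explicit C++ types for memory amounts, here is a minimal sketch. The bytes and megabytes types and the total_memory() stand-in are hypothetical, not Scylla's or Seastar's API. Because the unit is part of the type, assigning a byte count to a megabytes variable does not compile, so the conversion has to be spelled out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unit types.
struct bytes { std::uint64_t value; };
struct megabytes { std::uint64_t value; };

constexpr bytes to_bytes(megabytes m) { return bytes{m.value << 20}; }
constexpr megabytes to_megabytes(bytes b) { return megabytes{b.value >> 20}; }

// Hypothetical stand-in for total_memory(), returning a typed quantity (8 GiB).
constexpr bytes total_memory() { return bytes{std::uint64_t(8) << 30}; }

// `megabytes m = total_memory();` would be a compile error here,
// so the default limit must convert explicitly.
constexpr megabytes default_limit() { return to_megabytes(total_memory()); }
```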
Calle Wilund
262f44948d commitlog: Add get_flush_count method (for testing) 2015-11-23 15:42:45 +01:00
Calle Wilund
2fe2320490 commitlog: Make reading segments with crc/data errors non-fatal
Parser object now attempts to skip past/terminate parsing on corrupted
entries/chunks (as detected by invalid sizes/crc:s). The amount of data
skipped is tracked (as well as we can estimate it; pre-allocation
makes this tricky), and at the end of parsing/reporting, IFF errors
occurred, an exception detailing the failures is thrown (since
subscription has little other mechanism to deal with this).

Thus a caller can decide how to deal with data corruption, but will be
given as many entries as possible.
2015-11-23 15:42:45 +01:00
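The "deliver what you can, then throw" behaviour can be sketched as follows. This is an illustration under assumptions: raw_entry, corruption_error and parse_entries are hypothetical, and a toy checksum stands in for the real CRC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical decoded entry: payload plus the checksum stored with it.
struct raw_entry {
    std::string payload;
    std::uint32_t stored_checksum;
};

// Toy checksum standing in for the real CRC.
std::uint32_t checksum(const std::string& s) {
    std::uint32_t c = 0;
    for (unsigned char ch : s) c = c * 31 + ch;
    return c;
}

// Hypothetical error carrying the amount of data skipped.
struct corruption_error : std::runtime_error {
    std::size_t bytes_skipped;
    explicit corruption_error(std::size_t n)
        : std::runtime_error("corrupt entries skipped: " + std::to_string(n) + " bytes"),
          bytes_skipped(n) {}
};

// Passes every valid entry to `sink`, skips corrupt ones while counting the
// bytes lost, and throws only after all entries have been processed.
template <typename Sink>
void parse_entries(const std::vector<raw_entry>& entries, Sink&& sink) {
    std::size_t skipped = 0;
    for (const auto& e : entries) {
        if (checksum(e.payload) != e.stored_checksum) {
            skipped += e.payload.size();
            continue;
        }
        sink(e.payload);
    }
    if (skipped != 0) {
        throw corruption_error(skipped);
    }
}
```

The caller still receives every good entry and can then decide whether the exception is fatal.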
Glauber Costa
00c12319f1 config: change type for commitlog maximum size config option
This patch substitutes int64_t for uint32_t as the type for
commitlog_total_space_in_mb. Moving to 64 bits is not strictly needed, since even a
signed 32-bit type would allow us to easily handle 2TB. But since we store that
in the commitlog as a 64-bit value, let's match it.

Moving from unsigned to signed, however, allows us to represent negative
numbers. With that in place, we can change the semantics of the value
slightly, so as to allow a negative number to mean "all memory".

The reason behind this is that the default value, "8GB", is an artifact of the
JVM. We don't need that, and in a many-shard configuration, each shard flushes
the commitlog way too often, since 8GB / many_shards = small_number.

8GB also happens to be a popular heap size for C* in the JVM. For us, we would
like to equate that (at least) with the amount of memory. The problem is how to
do that without introducing new options or changing the semantics of existing
options too radically.

The proposed solution will allow us to still parse C* yaml files, since those
will always have positive numbers, while introducing our own defaults.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-15 10:29:23 +02:00
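A sketch of how the signed option could be interpreted: a negative value means "use the amount of memory", and a positive one is an explicit size in MB, as found in C* yaml files. The function and parameter names are hypothetical. The sketch also splits the total evenly across shards, in the spirit of commit 841dd32a8a further down in this log, and it assumes total_memory_bytes is the memory of the whole node.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: derive the per-shard commitlog size limit in bytes.
std::uint64_t per_shard_commitlog_limit(std::int64_t total_space_in_mb,
                                        std::uint64_t total_memory_bytes,
                                        unsigned shard_count) {
    std::uint64_t total_bytes = total_space_in_mb < 0
        ? total_memory_bytes                                          // "all memory"
        : static_cast<std::uint64_t>(total_space_in_mb) * 1024 * 1024; // MB -> bytes
    return total_bytes / shard_count;
}
```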
Calle Wilund
85b8d65374 commitlog: Change file format to include magic marker
Allows us to fail fast if someone tries to replay an Origin commit log.

WARNING: This changes the file format, and there is no good way for me to
check if a CL is "old" scylla or Origin (since "version" is the same). So
either "old" scylla files also fail, or we never fail (until later, and
worse). Thus, when upgrading from an older version to this patch, make sure to
clean out all commit logs first.
2015-11-10 17:11:06 +01:00
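Failing fast on a foreign file comes down to checking a marker at the start of the segment before trusting anything else. A minimal sketch. The header layout, the segment_magic value and read_header are all hypothetical and are not the real Scylla format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical header: 32-bit magic followed by a 32-bit version.
// The magic value is an arbitrary illustrative constant.
constexpr std::uint32_t segment_magic = 0x53434c47;

struct segment_header {
    std::uint32_t magic;
    std::uint32_t version;
};

// Rejects files that do not start with our marker (e.g. an Origin commit log,
// or a Scylla one written before the marker existed).
segment_header read_header(const std::vector<unsigned char>& file) {
    if (file.size() < sizeof(segment_header)) {
        throw std::runtime_error("segment too short for header");
    }
    segment_header h;
    std::memcpy(&h, file.data(), sizeof h);
    if (h.magic != segment_magic) {
        throw std::runtime_error("not a scylla commitlog segment");
    }
    return h;
}
```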
Calle Wilund
5299cece4c commitlog: Make "shutdown" do flushing + hard sync of pending ops
* Do close + fsync on all segments
* Make sure all pending cycle/sync ops are guarded with a gate, and
  explicitly wait for this gate on shutdown to make sure we don't
  leave hanging flushes in the task queue.
* Fix bug where "commitlog::clear" did not in fact shut down the CL,
  due to "_shutdown" being already set.

Note: This is (at least currently) not an issue for anything other than tests,
since we don't shut down the normal server "properly", i.e. the CL itself
will not go away, and hanging tasks are ok, as long as the sync-all is done
(which it was previously). But to make tests predictable, and to future-proof
the CL, this is better.
2015-10-26 14:50:54 +01:00
Calle Wilund
05de462fa9 commitlog: Make flush/segment delete slightly more defensive + test tolerant
Fix for (mainly) test failures (use-after-free).
I.e. test case test_commitlog_delete_when_over_disk_limit causes a
use-after-free because the test shuts down before a pending flush is done,
and the segment manager is already gone -> crash when writing stats.
Now, we could make the stats a shared pointer, but we should never
allow an operation to outlive the segment_manager.
In normal operation, we _almost_ guarantee this with the shutdown() call,
but technically, we could have a flush continuation trailing somewhere.

* Make sure we never delete segments from segment_manager until they are
  fully flushed
* Make test disposal method "clear" be more defensive in flushing and
  clearing out segments
2015-10-22 15:19:24 +03:00
Calle Wilund
786d66cacf commitlog: Fix use-after-free
Remove "finally" and just use then_wrapped, which is what it was originally,
before "handle_exception" was introduced to seastar. Oh, the irony...
2015-10-20 09:56:40 +03:00
Tomasz Grabiec
19d7d30e67 Replace references to 'urchin' with 'scylla' 2015-10-19 11:08:05 +03:00
Avi Kivity
849464670c commitlog: make new segments more xfs-friendly
xfs doesn't like writes beyond eof (exactly at eof is fine), and due
to continuation reordering, we sometimes do that.

Fix by pre-truncating the segment to its maximum size.
2015-10-14 17:32:59 +03:00
Calle Wilund
206acd8b5b commitlog: Make reader handle pre-allocated files
Silently ignore zeroed file or chunk header data and assume eof.
Reading entries already deals with this.
2015-10-14 17:32:23 +03:00
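Space that has been pre-allocated but never written reads back as zeros, so a reader can treat an all-zero chunk header as the end of data rather than as corruption. A minimal sketch; header_status and classify_chunk_header are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical result: either go on parsing the header, or stop at eof.
enum class header_status { parse, eof };

// An all-zero header means nothing was ever written at this position.
header_status classify_chunk_header(const std::vector<unsigned char>& header) {
    bool all_zero = std::all_of(header.begin(), header.end(),
                                [](unsigned char b) { return b == 0; });
    return all_zero ? header_status::eof : header_status::parse;
}
```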
Calle Wilund
2729d5dd71 commitlog: ensure file size remains <= max_size
Re-check for file size overflow after each cycle() call (new buffer).
Otherwise we could write past max_size when storing a mutation larger
than the current buffer size: current pos + sizeof(mut) < max_size may
hold, but after the cycle required because sizeof(mut) > buf_remain, it
might no longer be true.
2015-10-14 17:32:22 +03:00
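A simplified model of why the second check is needed. The message does not say why cycling moves the position, so this sketch assumes that flushed buffers are padded to a write alignment. segment_sketch, the alignment value and the field names are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical write alignment: flushed buffers are padded up to it.
constexpr std::size_t alignment = 4096;

struct segment_sketch {
    std::size_t max_size;
    std::size_t file_pos = 0;  // where the next buffer will be written
    std::size_t buf_used = 0;  // bytes used in the in-memory buffer
    std::size_t buf_cap = 0;   // capacity of the in-memory buffer

    std::size_t position() const { return file_pos + buf_used; }

    // Flush the current buffer and start a new one that fits `needed` bytes.
    // Padding to the alignment can move the position forward.
    void cycle(std::size_t needed, std::size_t default_buf) {
        file_pos += (buf_used + alignment - 1) / alignment * alignment;
        buf_used = 0;
        buf_cap = std::max(needed, default_buf);
    }

    // Returns false if the entry does not fit and a new segment is needed.
    bool allocate(std::size_t size, std::size_t default_buf) {
        if (position() + size > max_size) {
            return false;
        }
        if (size > buf_cap - buf_used) {
            cycle(size, default_buf);
            // Re-check: the test above ran before cycle() moved the position.
            if (position() + size > max_size) {
                return false;
            }
        }
        buf_used += size;
        return true;
    }
};
```

For example, with max_size 8192, a 100-byte entry followed by an 8000-byte one passes the first check (100 + 8000 <= 8192). After the cycle, however, the position is 4096, so only the re-check catches the overflow.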
Calle Wilund
199b72c6f3 commitlog: fix broken reader "offset" handling + ensure exceptions propagate
We must still find a chunk/entry boundary even when run
with a start offset, since file navigation is chunk based.
This was not observed as broken previously because
1.) We did not run with offsets
2.) The exception never reached the caller.

Also make the reader silently ignore empty files.
2015-10-07 08:54:49 +02:00
Calle Wilund
024041c752 commitlog: make log message slightly more informative/correct 2015-10-07 08:54:49 +02:00
Calle Wilund
4941d91063 Commitlog: add some more verbosity 2015-09-22 12:57:33 +02:00
Calle Wilund
a10745cf0e Commitlog: Delay timer by period/ncpus for each cpu
To avoid having all shards doing sync at the same time.
2015-09-21 13:30:35 +02:00
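Staggering the timer spreads the shards' syncs evenly across one period. A minimal sketch, assuming the offset is applied when each shard first arms its timer; initial_timer_delay is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>

// Shard k of n starts its periodic sync timer k * period / n later than
// shard 0, so the shards do not all sync at the same moment.
std::chrono::milliseconds initial_timer_delay(std::chrono::milliseconds period,
                                              unsigned shard_id, unsigned ncpus) {
    return period / ncpus * shard_id;
}
```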
Calle Wilund
dcabf8c1d2 Commitlog: Pre-allocate "reserve" segments
Refs #356

Pre-allocates N segments from timer task. N is "adaptive" in that it is
increased (to a max) every time segment acquisition is forced to allocate
a new segment instead of picking one from the pre-alloc (reserve) list. The idea
is that it is easier to adapt how many segments we consume per timer quantum
than the timer quantum itself.

Also does disk pressure check and flush from timer task now. Note that the
check is still done at most once per new segment.

Some logging cleanup/betterment also to make behaviour easier to trace.

Reserve segments start out at zero length, and are still deleted when finished.
This is because otherwise we'd still have to clear the file to be able to
properly parse it later (given that it can be a "half" file due to power fail
etc). This might need revisiting as well.

With this patch, there should be no case (except flush starvation) where
"add_mutation" actually waits for a (potentially) blocking op (disk).
Note that since the amount of reserve is increased as needed, there will
be occasional cases where a new segment is created in the alloc path
until the system finds equilibrium. But this should only be during a brief
warmup.

v2: Fixed timestamp not being reset on reserve acquire
2015-09-21 13:04:39 +02:00
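The adaptive reserve can be sketched as a queue that a timer refills up to a target. The target grows every time the allocation path finds the queue empty and has to create a segment synchronously. segment_reserve, segment_handle and the method names are hypothetical and are not the commitlog's actual types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical segment handle for the sketch.
struct segment_handle { int id; };

class segment_reserve {
    std::deque<segment_handle> _reserve;
    std::size_t _target = 0;  // how many segments the timer keeps ready
    std::size_t _max;
    int _next_id = 0;
public:
    explicit segment_reserve(std::size_t max) : _max(max) {}

    // Called from the timer task, off the write path.
    void replenish() {
        while (_reserve.size() < _target) {
            _reserve.push_back(segment_handle{_next_id++});
        }
    }

    // Called on the allocation path.
    segment_handle acquire() {
        if (!_reserve.empty()) {
            segment_handle s = _reserve.front();
            _reserve.pop_front();
            return s;
        }
        // Forced to allocate inline: grow the reserve target, up to the max.
        _target = std::min(_target + 1, _max);
        return segment_handle{_next_id++};
    }

    std::size_t target() const { return _target; }
};
```

Adapting the count rather than the timer period is the design choice the message describes. During warmup a few inline allocations still happen, until the target stops growing.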
Avi Kivity
d5cf0fb2b1 Add license notices 2015-09-20 10:43:39 +03:00
Avi Kivity
dcdc925b86 Revert "Commitlog: Pre-allocate "reserve" segments"
This reverts commit cbf3b63853, due to
reports of increased latency (instead of the opposite).
2015-09-19 09:26:39 +03:00
Calle Wilund
cbf3b63853 Commitlog: Pre-allocate "reserve" segments
Refs #356

Pre-allocates N segments from timer task. N is "adaptive" in that it is
increased (to a max) every time segment acquisition is forced to allocate
a new segment instead of picking one from the pre-alloc (reserve) list. The idea
is that it is easier to adapt how many segments we consume per timer quantum
than the timer quantum itself.

Also does disk pressure check and flush from timer task now. Note that the
check is still done at most once per new segment.

Some logging cleanup/betterment also to make behaviour easier to trace.

Reserve segments start out at zero length, and are still deleted when finished.
This is because otherwise we'd still have to clear the file to be able to
properly parse it later (given that it can be a "half" file due to power fail
etc). This might need revisiting as well.

With this patch, there should be no case (except flush starvation) where
"add_mutation" actually waits for a (potentially) blocking op (disk).
Note that since the amount of reserve is increased as needed, there will
be occasional cases where a new segment is created in the alloc path
until the system finds equilibrium. But this should only be during a brief
warmup.
2015-09-17 19:54:28 +03:00
Calle Wilund
b512192b3b Commitlog: Fix some timing/latency issues with sync
Refs #356

* Move sync time setting to sync initiate to help prevent double syncs
* Change add_mutation to only do explicit sync with wait if time elapsed
  since last is 2x sync window
* Do not wait for sync when moving to new segment in alloc path
* Initialize _sync_time properly.
* Add some tracing log messages to help debug
2015-09-16 20:07:25 +03:00
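The add_mutation rule above can be sketched as a single time comparison. Because the sync time is recorded when a sync is initiated, a sync that is already in flight also counts. should_wait_for_sync is a hypothetical name, and the choice of >= over > is an assumption.

```cpp
#include <cassert>
#include <chrono>

using clock_type = std::chrono::steady_clock;

// A write waits for an explicit sync only if the last sync was initiated at
// least two sync windows ago; otherwise it relies on the periodic timer.
bool should_wait_for_sync(clock_type::time_point now,
                          clock_type::time_point last_sync_initiated,
                          clock_type::duration sync_window) {
    return now - last_sync_initiated >= 2 * sync_window;
}
```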
Calle Wilund
456246dfd5 Commitlog: Add a gate + shutdown method
* Gate ensures we don't add data into a segment after close
* Shutdown closes all segments for business and prohibits new segments
2015-09-08 11:53:41 +02:00
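A gate, in the sense of seastar::gate, counts operations in flight and refuses new ones once it is closed. The single-threaded sketch below, simple_gate, is hypothetical and synchronous. Seastar's gate additionally lets close() return a future that resolves once all in-flight operations have left.

```cpp
#include <cassert>
#include <stdexcept>

// Minimal synchronous gate: enter() before touching the segment, leave()
// when done; after close(), new entries are rejected.
class simple_gate {
    unsigned _count = 0;
    bool _closed = false;
public:
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        ++_count;
    }
    void leave() { --_count; }
    void close() { _closed = true; }
    bool idle() const { return _count == 0; }  // no operations in flight
};
```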
Calle Wilund
d666c747e3 Commitlog: Just add some more verbosity 2015-09-08 11:16:38 +02:00
Calle Wilund
256c0550bf Commitlog: Only delete segments on disk if they are marked clean
For #293 - i.e. allow more or less coherent shutdown/destruction of the
commitlog while retaining disk data.
(tests still clear stuff explicitly).
2015-09-07 20:32:01 +02:00
Calle Wilund
4ed95b7020 Commitlog: Add sync_all_segments()
For #293 - allows explicit flush to disk (not close!) of all active segments
2015-09-07 20:31:59 +02:00
Calle Wilund
d614143f5e Commitlog/database: Fixup series "Commit log flush request on disk overflow"
Also at seastar-dev: calle/commitlog_flush_v3
(And, yes, this time I _did_ update the remote!)

Refs #262

Commit of original series was done on a stale version (v2) due to the author's
inability to multitask and update git repos.

v3:
* Removed future<> return value from callbacks. I.e. flush callback is now
  only fully synchronous over actual call
2015-09-07 21:29:19 +03:00
Calle Wilund
fdb921afb2 Commitlog: Add flushing of segment CF:s on disk overflow
* Do not throw away commitlog segments on disk size overflow.
  Issue a flush request (i.e. calculate the RP we want to free up to,
  and issue a request for each dirty CF).
  "Abstracted" as a registerable callback, i.e. it is the DB's responsibility
  to actually do something with it.
2015-09-07 13:21:43 +02:00
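The callback arrangement can be sketched as follows: the commitlog only computes the replay position it wants to free up to and notifies registered handlers, and the database decides how to flush. replay_position, cf_id, flush_requester and add_flush_handler are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical replay position: segment id plus offset.
struct replay_position {
    std::uint64_t segment_id;
    std::uint32_t offset;
};

using cf_id = int;  // hypothetical column family identifier
using flush_handler = std::function<void(cf_id, replay_position)>;

class flush_requester {
    std::vector<flush_handler> _handlers;
public:
    void add_flush_handler(flush_handler h) { _handlers.push_back(std::move(h)); }

    // On disk overflow: ask each handler to flush every dirty CF up to `up_to`.
    // Segments are not thrown away here; acting on the request is up to the DB.
    void on_disk_overflow(const std::vector<cf_id>& dirty_cfs, replay_position up_to) {
        for (auto& h : _handlers) {
            for (cf_id cf : dirty_cfs) {
                h(cf, up_to);
            }
        }
    }
};
```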
Calle Wilund
841dd32a8a Commitlog: divide max on-disk-size by num cpus
To try to keep the resulting total (across all shards) at the configured limit
2015-09-07 13:13:46 +02:00
Calle Wilund
d95101664d Commitlog: Don't throw exceptions on unrecognized files in CL dir 2015-09-01 14:23:03 +02:00
Calle Wilund
1814f89730 Commitlog: Add some more metrics + accessors for json API
Fixes #99

Adding missing commitlog metrics to the rest API.

v2: Mis-send (clumsy fingers)
v3: Use map_reduce0 + subroutine for nicer code
v4: rebased on current master
v5: rebased yet again.

Since the _second_ file in this previous patch set was committed, and is
dependent on this very change below to even compile, some expediency might be
warranted.
2015-09-01 10:15:33 +03:00
Calle Wilund
9ba84e458a Commitlog: Handle partial writes in segment::cycle
* Fixes #247
* Re-introduce test_allocation_failure, but allow for the "failure" to not
  happen. I.e. if run with low memory settings, the test will check that
  allocation failure is graceful. With lots of memory it will check partial
  write.
2015-08-31 20:02:05 +03:00
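Handling a partial write means re-issuing the write for whatever remains until the whole buffer is on disk. A minimal sketch with a hypothetical write primitive that may write fewer bytes than requested. It treats a zero-byte write as an error, so the loop cannot spin forever.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical write primitive: writes up to `len` bytes at `offset` and
// returns how many were actually written.
using write_fn = std::function<std::size_t(std::size_t offset, const char* data, std::size_t len)>;

// Keep writing the remainder until all `len` bytes are written.
void write_fully(const write_fn& write, std::size_t offset, const char* data, std::size_t len) {
    std::size_t done = 0;
    while (done < len) {
        std::size_t n = write(offset + done, data + done, len - done);
        if (n == 0) {
            throw std::runtime_error("write made no progress");
        }
        done += n;
    }
}
```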
Calle Wilund
bbf82e80d0 Commitlog: Allow skipping X bytes in commit log reader
Also refactor reader into named methods for debugging sanity.
2015-08-31 14:29:49 +02:00
Calle Wilund
da9ea641e5 Commitlog: Handle full paths in descriptor file name parse. 2015-08-31 14:29:48 +02:00
Calle Wilund
02d2bef1f2 Commitlog: Expose convenience method "list_existing_segments" 2015-08-31 14:29:48 +02:00
Calle Wilund
19052b3c09 Commitlog: Expose list_existing_descriptors 2015-08-31 14:29:48 +02:00
Calle Wilund
e068ffb5a5 Commitlog: Make file reader provide replay_position for entries 2015-08-31 14:29:47 +02:00
Calle Wilund
41b1ad8600 Commitlog: Make descriptor type visible/usable from outside 2015-08-31 14:29:47 +02:00
Calle Wilund
ea38b223bd Commitlog: change the ID generation scheme
* Make it more like origin, i.e. based on wall clock time of app start
* Encode shard ID in the RP segment ID, to ensure RP:s and segment names
  are unique per shard
2015-08-31 14:29:46 +02:00
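One way to encode the shard in the segment ID is to reserve the high bits for the shard and seed the remaining counter from the wall clock at startup. The 12/52 bit split and all names below are hypothetical; the real layout may differ.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical layout: top 12 bits = shard id, low 52 bits = counter.
constexpr unsigned shard_bits = 12;
constexpr unsigned counter_bits = 64 - shard_bits;
constexpr std::uint64_t counter_mask = (std::uint64_t(1) << counter_bits) - 1;

std::uint64_t make_segment_id(unsigned shard, std::uint64_t counter) {
    return (std::uint64_t(shard) << counter_bits) | (counter & counter_mask);
}

unsigned shard_of(std::uint64_t id) { return unsigned(id >> counter_bits); }

// Counter seed taken from the wall clock (ms since epoch) at app start.
std::uint64_t initial_counter() {
    using namespace std::chrono;
    return std::uint64_t(duration_cast<milliseconds>(
        system_clock::now().time_since_epoch()).count());
}
```

Two shards that start with the same counter still produce distinct IDs, because the shard bits differ.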