"Initial implementation/transposition of commit log replay.
* Changes replay position to be shard aware
* Commit log segment ID:s now follow basically the same scheme as origin;
max(previous ID, wall clock time in ms) + shard info (for us)
* SStables now use the DB definition of replay_position.
* Stores and propagates (compaction) flush replay positions in sstables
* If CL segments are left over from a previous run, they, and existing
sstables are inspected for high water mark, and then replayed from
those marks to amend mutations potentially lost in a crash
* Note that CPU count change is "handled" in so much that shard matching is
per _previous_ runs shards, not current.
Known limitations:
* Mutations deserialized from old CL segments are _not_ fully validated
against existing schemas.
* System::truncated_at (not currently used) does not handle sharding afaik,
so watermark ID:s coming from there are dubious.
* Mutations that fail to apply (invalid, broken) are not placed in blob files
like origin. Partly because I am lazy, but also partly because our serial
format differs, and we currently have no tools to do anything useful with it
* No replay filtering (Origin allows a system property to designate a filter
file, detailing which keyspace/cf:s to replay). Partly because we have no
system properties.
There is no unit test for the commit log replayer (yet).
Because I could not really come up with a good one given the test
infrastructure that exists (tricky to kill stuff just "right").
The functionality is verified by manual testing, i.e. running scylla,
building up data (cassandra-stress), kill -9 + restart.
This of course does not really fully validate whether the resulting DB is
100% valid compared to the one at k-9, but at least it verified that replay
took place, and mutations where applied.
(Note that origin also lacks validity testing)"
* Make it more like origin, i.e. based on wall clock time of app start
* Encode shard ID in the, RP segement ID, to ensure RP:s and segement names
are unique per shard
recently, "file" started to use a shared_ptr internally, and is already
copy-able and reference counted, and there is no reason to use
lw_shared_ptr<file>. This patch cleans up a few remaining places where
lw_shared_ptr<file> was used.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
This adds a method to return a vector with full-path to the active
segment names. It will be used by the API.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Should fix use-after-free when a frozen_mutation is applied to the
local shard.
Includes two adjustments to urchin collectd usage from Calle:
- Updated thrift collectd registration to use proper move semantics
- Commitlog: Fix collectd registration to use move semantics + test
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
- # segments
- # allocting segments
- # unused segments
- # allocations
- # cycles (disk writes)
- # flush
- # total bytes allocated
- # total bytes disk slack (due to dma blocks)
Counters are per-commitlog (shard). Can be extended to be per-segment also,
but would be transient and probably not much more useful.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
Generic read-all-stream from a commit log segmen file.
Provides a byte view for each data entry, doing CRC checks and padding skips.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
Use a semaphore as gatekeeper to make sure we finish creating a segment
before starting a new one.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
* A commitlog is created in "work" dirs when initing the db
from a datadir. However, since we have neither disk data storage,
nor replay capability yet (and no real db config), the settings
are basically to just write in-memory serialization, write them to
disk and then discard them. So in fact, pointless. But at least using
the log...
* Moved the actual "apply" of mutation into database. If a commitlog
is active, add an entry to it before applying mutation.
Commit log segment "finish_and_get_new" should not call "new_segment"
explicitly, since more than one invocation might get there on the same
flush condition (segment full).
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
Implements a cassandra-file-compatible segmented log
of "mutations". Handles "batch" and "periodic" mode like
stock version. Also includes "dirty" management for the
interaction log/memtable.
Supports:
* add
* sync/flush
* clear dirty bits (thus discarding segments)
Many more estoric stock functions not yet implemented.
Missing: Storage management. Does not deal with total
size on disk of segments yet. Nor does it have any provisions
for dealing with active buffer bloat should async writes stall.
[avi: adjust for future<>::rescue() removal]
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>