Also at seastar-dev: calle/commitlog_flush_v3
(And, yes, this time I _did_ update the remote!)
Refs #262
The original series was committed from a stale version (v2) due to the author's
inability to multitask and update git repos.
v3:
* Removed the future<> return value from callbacks, i.e. the flush callback is
now fully synchronous, but only over the actual call
* Do not throw away commitlog segments on disk size overflow.
Instead, issue a flush request (i.e. calculate the RP we want to free up to,
and for all dirty CF:s, issue a request).
"Abstracted" as a registerable callback, i.e. it is the DB's responsibility
to actually do something with it.
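A minimal sketch of the overflow-to-flush-request idea, assuming hypothetical types (commitlog_sketch, segment, flush_handler, on_disk_overflow) that are not the actual Scylla declarations. Segments are walked oldest first until the remaining size fits the limit; the RP at the end of the last segment to free, plus every CF dirty in those segments, is handed to the registered handlers.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified types; not the actual Scylla declarations.
using cf_id = uint64_t;

struct replay_position {
    uint64_t id = 0;   // segment ID
    uint32_t pos = 0;  // offset within the segment
};

// Plain synchronous callback (no future<> return value).
using flush_handler = std::function<void(cf_id, replay_position)>;

struct segment {
    uint64_t id;
    uint64_t size_on_disk;
    std::set<cf_id> dirty; // CFs with unflushed mutations in this segment
};

class commitlog_sketch {
    std::vector<segment> _segments; // oldest first
    std::vector<flush_handler> _handlers;
    uint64_t _max_disk_size;
public:
    explicit commitlog_sketch(uint64_t max_disk_size) : _max_disk_size(max_disk_size) {}
    void add_segment(segment s) { _segments.push_back(std::move(s)); }
    void add_flush_handler(flush_handler h) { _handlers.push_back(std::move(h)); }

    // On overflow, do not delete segments. Compute the RP we want to free
    // up to and hand a flush request for every dirty CF to the handlers.
    void on_disk_overflow() {
        uint64_t total = 0;
        for (const auto& s : _segments) total += s.size_on_disk;

        std::set<cf_id> dirty;
        replay_position rp;
        for (const auto& s : _segments) {
            if (total <= _max_disk_size) break; // enough would be freed
            total -= s.size_on_disk;
            dirty.insert(s.dirty.begin(), s.dirty.end());
            // Everything up to the end of this segment should be flushed.
            rp = replay_position{s.id, std::numeric_limits<uint32_t>::max()};
        }
        for (cf_id cf : dirty) {
            for (const auto& h : _handlers) h(cf, rp);
        }
    }
};
```

Because the handler is synchronous, the commitlog only knows the request was delivered, not that the flush has finished; when and how to flush stays the DB's decision.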
Fixes #99
Adding missing commitlog metrics to the rest API.
v2: Mis-send (clumsy fingers)
v3: Use map_reduce0 + subroutine for nicer code
v4: rebased on current master
v5: rebased yet again.
Since the _second_ file in this previous patch set was committed, and depends
on this very change below to even compile, some expediency might be
warranted.
* Fixes #247
* Re-introduce test_allocation_failure, but allow for the "failure" to not
happen, i.e. if run with low memory settings, the test checks that
allocation failure is graceful. With lots of memory it checks a partial
write.
* Make it more like origin, i.e. based on wall clock time of app start
* Encode shard ID in the RP segment ID, to ensure RP:s and segment names
are unique per shard
Origin
* Note: removed commitlog_test:test_allocation_failure because with
segments limited to 4GB (and hence mutations limited to 2GB), actually
forcing a failure is neither guaranteed nor even likely.
"Initial implementation/transposition of commit log replay.
* Changes replay position to be shard aware
* Commit log segment ID:s now follow basically the same scheme as origin;
max(previous ID, wall clock time in ms) + shard info (for us)
* SStables now use the DB definition of replay_position.
* Stores and propagates (compaction) flush replay positions in sstables
* If CL segments are left over from a previous run, they and the existing
sstables are inspected for high water marks, and the segments are then
replayed from those marks to amend mutations potentially lost in a crash
* Note that a CPU count change is "handled" insofar as shard matching is
per the _previous_ run's shards, not the current ones.
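A minimal sketch of a shard-aware segment ID and replay filter, assuming a hypothetical layout with the shard number in the top 10 bits and a millisecond-based value in the rest. next_segment_id uses max(previous ID + 1, now); the +1 is an assumption to keep IDs strictly increasing when the clock has not advanced. should_replay and hwm_map are illustrative names, and keying the high water mark by shard only (not per CF) is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Assumed layout: top 10 bits = shard, lower 54 bits = time-based base ID.
constexpr unsigned shard_bits = 10;
constexpr unsigned ts_bits = 64 - shard_bits;
constexpr uint64_t ts_mask = (uint64_t(1) << ts_bits) - 1;

struct replay_position {
    uint64_t id = 0;   // shard-tagged segment ID
    uint32_t pos = 0;  // offset within the segment
    unsigned shard() const { return unsigned(id >> ts_bits); }
    uint64_t base_id() const { return id & ts_mask; }
    bool operator<(const replay_position& o) const {
        return id != o.id ? id < o.id : pos < o.pos;
    }
};

// Next ID: max(previous base ID + 1, wall clock ms), tagged with the shard.
uint64_t next_segment_id(uint64_t prev_base, uint64_t now_ms, unsigned shard) {
    uint64_t base = std::max(prev_base + 1, now_ms) & ts_mask;
    return (uint64_t(shard) << ts_bits) | base;
}

// High water marks found in sstables, keyed by the *previous* run's shard.
using hwm_map = std::map<unsigned, replay_position>;

// Replay an entry only if it lies beyond the high water mark of its shard.
bool should_replay(const hwm_map& hwm, const replay_position& rp) {
    auto it = hwm.find(rp.shard());
    return it == hwm.end() || it->second < rp;
}
```

Since the shard is read back out of the RP itself, matching follows the shard that wrote the segment, which is why a changed CPU count is only "handled" in the limited sense described above.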
Known limitations:
* Mutations deserialized from old CL segments are _not_ fully validated
against existing schemas.
* System::truncated_at (not currently used) does not handle sharding afaik,
so watermark ID:s coming from there are dubious.
* Mutations that fail to apply (invalid, broken) are not placed in blob files
like in origin. Partly because I am lazy, but also partly because our serial
format differs, and we currently have no tools to do anything useful with it.
* No replay filtering (Origin allows a system property to designate a filter
file, detailing which keyspace/cf:s to replay). Partly because we have no
system properties.
There is no unit test for the commit log replayer (yet), because I could not
really come up with a good one given the existing test infrastructure
(it is tricky to kill stuff just "right").
The functionality is verified by manual testing, i.e. running scylla,
building up data (cassandra-stress), kill -9 + restart.
This of course does not fully validate whether the resulting DB is
100% valid compared to the one at the time of the kill -9, but it at least
verified that replay took place and mutations were applied.
(Note that origin also lacks validity testing)"
Recently, "file" started to use a shared_ptr internally; it is already
copyable and reference counted, so there is no reason to use
lw_shared_ptr<file>. This patch cleans up a few remaining places where
lw_shared_ptr<file> was used.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
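A minimal sketch of why the extra wrapper becomes redundant, using a hypothetical stand-in (file_impl, file) rather than Seastar's real classes: once the handle holds a std::shared_ptr internally, copying the handle already shares and reference-counts the underlying object.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the object that does the actual I/O.
struct file_impl {
    std::string name;
};

// Copyable handle: copies share one file_impl through a reference-counted
// std::shared_ptr, so passing `file` by value is cheap and no extra
// shared-pointer wrapper (such as lw_shared_ptr<file>) is needed.
class file {
    std::shared_ptr<file_impl> _impl;
public:
    explicit file(std::string name)
        : _impl(std::make_shared<file_impl>(file_impl{std::move(name)})) {}
    const std::string& name() const { return _impl->name; }
    long use_count() const { return _impl.use_count(); }
};
```

Wrapping such a handle in another shared pointer would only add a second reference count and an extra indirection.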
This adds a method that returns a vector with the full paths of the active
segments. It will be used by the API.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
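A minimal sketch of such a method, assuming hypothetical bookkeeping (segment_info with an `active` flag) and a hypothetical function name; it only shows joining the commitlog directory with each active segment's file name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical bookkeeping for one segment.
struct segment_info {
    std::string file_name;
    bool active; // still in use by the commitlog
};

// Returns the full path of every active segment under `dir`.
std::vector<std::string> get_active_segment_names(const std::string& dir,
                                                  const std::vector<segment_info>& segments) {
    std::vector<std::string> paths;
    for (const auto& s : segments) {
        if (s.active) {
            paths.push_back(dir + "/" + s.file_name);
        }
    }
    return paths;
}
```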
Should fix use-after-free when a frozen_mutation is applied to the
local shard.
Includes two adjustments to urchin collectd usage from Calle:
- Updated thrift collectd registration to use proper move semantics
- Commitlog: Fix collectd registration to use move semantics + test
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
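A minimal sketch of why metric registrations want move semantics, using a hypothetical move-only RAII `registration` type and a global counter in place of the real collectd API: copying would unregister twice, so ownership must be moved into the container that keeps the registrations alive.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical global registry size, for illustration only.
static int live_registrations = 0;

// Move-only RAII handle: registers on construction, unregisters on
// destruction. Copies are deleted so a metric is never unregistered twice.
class registration {
    bool _owns = false;
public:
    explicit registration(std::string /*metric name*/) : _owns(true) { ++live_registrations; }
    registration(const registration&) = delete;
    registration& operator=(const registration&) = delete;
    registration(registration&& o) noexcept : _owns(std::exchange(o._owns, false)) {}
    registration& operator=(registration&& o) noexcept {
        if (this != &o) {
            release();
            _owns = std::exchange(o._owns, false);
        }
        return *this;
    }
    ~registration() { release(); }
private:
    void release() {
        if (_owns) {
            --live_registrations;
            _owns = false;
        }
    }
};

struct metrics_owner {
    std::vector<registration> _regs;
    void setup() {
        std::vector<registration> regs;
        regs.emplace_back("segments");
        regs.emplace_back("allocations");
        _regs = std::move(regs); // ownership transfers; nothing is copied
    }
};
```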
- # segments
- # allocating segments
- # unused segments
- # allocations
- # cycles (disk writes)
- # flush
- # total bytes allocated
- # total bytes disk slack (due to dma blocks)
Counters are per commitlog (i.e. per shard). They could be extended to be
per segment as well, but would then be transient and probably not much more useful.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
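A minimal sketch of per-shard counters matching the list above, with hypothetical field and helper names. It also shows where "disk slack" comes from under the assumption that each write is rounded up to the DMA alignment: the padding bytes are counted as slack.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-commitlog (per-shard) counters.
struct commitlog_stats {
    uint64_t segments = 0;            // # segments
    uint64_t allocating_segments = 0; // # allocating segments
    uint64_t unused_segments = 0;     // # unused segments
    uint64_t allocations = 0;         // # allocations
    uint64_t cycles = 0;              // # cycles (disk writes)
    uint64_t flushes = 0;             // # flushes
    uint64_t bytes_allocated = 0;     // total bytes allocated
    uint64_t bytes_slack = 0;         // total bytes of disk slack (DMA padding)
};

void account_allocation(commitlog_stats& st, uint64_t bytes) {
    st.allocations++;
    st.bytes_allocated += bytes;
}

// Rounds a write up to the DMA alignment and counts the padding as slack.
// Returns the number of bytes actually written to disk.
uint64_t account_write(commitlog_stats& st, uint64_t bytes, uint64_t alignment) {
    uint64_t aligned = (bytes + alignment - 1) / alignment * alignment;
    st.cycles++;
    st.bytes_slack += aligned - bytes;
    return aligned;
}
```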
Generic read-all stream from a commit log segment file.
Provides a byte view for each data entry, doing CRC checks and padding skips.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
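A minimal sketch of a read-all loop with CRC checks and padding skips. The entry layout ([u32 size][u32 CRC-32 of payload][payload], with a zero size marking padding to the next block boundary) is a hypothetical format chosen for illustration, not the actual segment format; read_all and the writer helpers are likewise illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
uint32_t crc32(const uint8_t* p, size_t n) {
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; ++i) {
        c ^= p[i];
        for (int k = 0; k < 8; ++k) {
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
        }
    }
    return ~c;
}

uint32_t read_u32(const uint8_t* p) { // little-endian
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Calls `fn` with a view of each payload; returns false on a CRC mismatch
// or a truncated entry. `block` must be > 0.
bool read_all(const std::vector<uint8_t>& buf, size_t block,
              const std::function<void(const uint8_t*, size_t)>& fn) {
    size_t off = 0;
    while (off + 8 <= buf.size()) {
        uint32_t size = read_u32(buf.data() + off);
        if (size == 0) {
            off = (off / block + 1) * block; // skip padding to next block
            continue;
        }
        if (off + 8 + size > buf.size()) return false;
        const uint8_t* data = buf.data() + off + 8;
        if (crc32(data, size) != read_u32(buf.data() + off + 4)) return false;
        fn(data, size);
        off += 8 + size;
    }
    return true;
}

// Writer helpers, only to build input for the reader above.
void append_u32(std::vector<uint8_t>& b, uint32_t v) {
    for (int i = 0; i < 4; ++i) b.push_back(uint8_t(v >> (8 * i)));
}

void append_entry(std::vector<uint8_t>& b, const std::string& payload) {
    auto p = reinterpret_cast<const uint8_t*>(payload.data());
    append_u32(b, uint32_t(payload.size()));
    append_u32(b, crc32(p, payload.size()));
    b.insert(b.end(), p, p + payload.size());
}

void pad_to_block(std::vector<uint8_t>& b, size_t block) {
    while (b.size() % block != 0) b.push_back(0);
}
```

Handing out a pointer/length view instead of a copy keeps the reader allocation-free; the caller decides whether to deserialize or copy each entry.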