Route request to CPU 0. _operation_mode is not replicated to other CPUS.
Without this:
$ curl -X GET --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/operation_mode"
returns "NORMAL" and "STARTING" randomly.
Only cpu 0 instance of gossip has the correct information, route request
to cpu 0.
Fix a bug where
$ curl -X GET --header "Accept: application/json"
"http://172.31.5.77:10000/storage_service/gossiping"
returns true and false randomly.
Also at seastar-dev: calle/commitlog_flush_v3
(And, yes, this time I _did_ update the remote!)
Refs #262
Commit of original series was done on stale version (v2) due to authors
inability to multitask and update git repos.
v3:
* Removed future<> return value from callbacks. I.e. flush callback is now
only fully syncronous over actual call
If several mutation in a batch throw exceptions have_cl.broken() will be
called more then once. Fix this by dropping ad hoc have_cl and use
parallel_for_each() that does the same thing that current code is doing.
Fixes#297
"Fixes #262
Handles CL disk size exceeding configured max size by calling flush handlers
for each dirty CF id / high replay_position mark. (Instead of uncontrolled
delete as previously).
* Increased default max disk size to 8GB. Same as Origin/scylla.yaml (so no
real change, but synced).
* Divide the max disk size by cpus (so sum of all shards == max)
* Abstract flush callbacks in CL
* Handler in DB that initiates memtable->sstable writes when called.
Note that the flush request is done "syncronously" in new_segment() (i.e.
when getting a new segment and crossing threshold). This is however more or
less congruent with Origin, which will do a request-sync in the corresponding
case.
Actual dealing with the request should at least in production code however be
done async, and in DB it is, i.e. we initiate sstable writes. Hopefully
they finish soon, and CL segments will be released (before next segment is
allocated).
If the flush request does _not_ eventually result in any CF:s becoming
clean and segments released we could potentially be issuing flushes
repeatedly, but never more often than on every new segment."
Changes:
tests: improve rpc timeout test
rpc: add unregister_handler
future: Fix assertion failure in case schedule() throws
rpc: fix rpc timeout
build: disable -fsanitize=vptr if it is available and broken
* Do not throw away commitlog segments on disk size overflow.
Issue a flush request (i.e. calculate RP we want to free unto,
and for all dirty CF:s, do a request).
"Abstracted" as registerable callback. I.e. DB:s responsibility
to actually do something with it.
Currently, we use a 128k buffer for creation of data and index
files, but for other components we use a 4k buffer size.
Let's also use a 128k buffer for the other components.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Right now, gossip returns hard coded cluster and partitioner name.
sstring get_cluster_name() {
// FIXME: DatabaseDescriptor.getClusterName()
return "my_cluster_name";
}
sstring get_partitioner_name() {
// FIXME: DatabaseDescriptor.getPartitionerName()
return "my_partitioner_name";
}
Fix it by setting the correct name from configure option.
With this
cqlsh 127.0.0.$i -e "SELECT * from system.local;
returns correct cluster_name.
Fixes#291
Instead of accepting a column resolver callable, accept a schema and
column_kind or column_selector. Makes the interface easier to use and
enables us to move implementation to .cc file.
See #259.
When transferring mutations between memtable and cache, lsa sometimes
runs out of memory. This solves the first two points, keeping reserve
filled up and adjusting the amount of reserve based on execution
history.
In some cases region may be in a state where it is not empty and
nothing could be evicted from it. For example when creating the first
entry, reclaimer may get invoked during creation before it gets
linked. We therefore can't rely on emptiness as a stop condition for
reclamation, the evction function shall signal us if it made forward
progress.
Related to #259. In some cases we need to allocate memory and hold
reclaim lock at the same time. If that region holds most of the
reclaimable memory, allocations inside that code section may
fail. allocating_section is a work-around of the problem. It learns
how big reserves shold be from past execution of critical section and
tries to ensure proper reserves before entering the section.
Some times we may close an empty active segment, if all data in it was
evicted. Normally segments are removed as soon as the last object in
it is freed, but if the segment is already empty when closed, noone is
supposed to call free on it. Such segments would be quickly reclaimed
during compaction, but it's possible that we will destroy the region
before they're reclaimed by compaction. Currently we would fail on an
assertion which checks that there are no segments. This change fixes
the problem by handling empty closed segments when region is
destroyed.
Memory releasing is invoked from destructors so should not throw. As a
consequence it should not allocate memory, so emergency segment pool
was switched from std::deque<> to an alloc-free intrusive stack.