"Adds a small utility queue and through this enforces memtable flush ordering
such that a flush may _run_ unchecked, however the "post" operation may
execute once all "lower numbered" (i.e. lower replay position) post ops
has finished.
This means that:
a.) Callbacks to commitlog are now guaranteed to fulfill ordering criteria
b.) Calling column_family::flush() and waiting for the result will also
wait for any previously initiated flushes to finish. But not those
initiated _after_."
Small utility to order operation->post operation
so that the "post" step is guaranteed to only be run
when all "post"-ops for lower valued keys (T) have been completed
This is a generalized utility mainly to be testable.
Before:
$ nodetool info
ID : a5adfbbf-cfd8-4c88-ab6b-6a34ccc2857c
Gossip active : false
After:
$ nodetool info
ID : a5adfbbf-cfd8-4c88-ab6b-6a34ccc2857c
Gossip active : true
Fix#354.
* seastar 78e3924...a2523ae (7):
> core: fix pipe unread
> Merge 'xfs-extents'
> Merge "separate-dma-alignment"
> output_stream: wait for stream to be taken out of poller in case final flush returns exception.
> reactor: Use more widely compatible xfs include
> readme: Add xfslibs-dev to Ubuntu deps
> pipe: add unread() operation
The first problem is the while loop around the code that processes prestate.
That's wrong because there may be a need to read more data before continuing
to process a prestate.
The second problem is the code assumption that a prestate will be processed
at once, and then unconditionally process the current state.
Both problems are likely to happen when reading a large buffer because more
than one read may be required.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
I was mildly annoyed by seeing two warnings about the same directory not
being XFS, when the sstable directory and the commitlog directory are the
same one (I don't know if this is typical, but this is what I do in all
my tests...). So I wrote this trivial patch to make sure not to test the
same directory twice.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
"With this, a new node can stream data from existing nodes when joins the cluster.
I tested with the following:
1) stat a node 1
2) insert data into node 1
3) start node 2
I can see from the logger that data is streamed correctly from node 1
to node 2."
Add code to actually stream data from other nodes during bootstrap.
I tested with the following:
1) stat a node 1
2) insert data into node 1
3) start node 2
I can see from the logger that data is streamed correctly from node 1
to node 2.
One version returns only the ranges
std::vector<range<token>>
Another version returns a map
std::unordered_map<range<token>, std::unordered_set<inet_address>>
which is converted from
std::unordered_multimap<range<token>, inet_address>
They are needed by token_metadata::pending_endpoints_for,
storage_service::get_all_ranges_with_strict_sources_for and
storage_service::decommission.
Given the current token_metadata and the new token which will be
inserted into the ring after bootstrap, calculate the ranges this new
node will be responsible for.
This is needed by boot_strapper::bootstrap().
"This series adds EC2Snich.
Since both GossipingPropertyFileSnitch and EC2SnitchXXX snitches family
are using the same property file it was logical to share the corresponding
code. Most of this series does just that... "
While trying to debug an unrelated bug, I was annoyed by the fact that parsing
caching options keep throwing exceptions all the time. Those exceptions have no
reason to happen: we try to convert the value to a number, and if we fail we
fall back to one of the two blessed strings.
We could just as easily just test for those strings beforehand and avoid all of
that.
While we're on it, the exception message should show the value of "r", not "k".
Signed-off-by: Glauber Costa <glommer@scylladb.com>
Currently, we are calculating truncated_at during truncate() independently for
each shard. It will work if we're lucky, but it is fairly easy to trigger cases
in which each shard will end up with a slightly different time.
The main problem here, is that this time is used as the snapshot name when auto
snapshots are enabled. Previous to my last fixes, this would just generate two
separate directories in this case, which is wrong but not severe.
But after the fix, this means that both shards will wait for one another to
synchronize and this will hang the database.
Fix this by making sure that the truncation time is calculated before
invoke_on_all in all needed places.
Signed-off-by: Glauber Costa <glommer@scylladb.com>