* seastar a2523ae...8207f2c (3):
> rwlock: provide lock / unlock semantics
> with_lock: run a function under a lock
> rwlock: add documentation to the rwlock module
Fixes spurious failures in test_commitlog_discard_completed_segments
* Do explicit sync on all segments to prevent async flushes from keeping
segments alive.
* Use counter instead of actual file counting to avoid racing with
pre-allocation of segments
The Fedora base image has changed, so we need to add the "hostname" utility,
which is used by the Docker-specific launch script, to our image.
Fixes Scylla startup.
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
"Fixes the crashes in debug mode with the flush queue test, and
simplifies and cleans up the queue itself.
Aforementioned crashes happened due to reordering with the signalling
loop in the previous version: a task completing could race with a reordered
loop continuation over who would get to signal and remove an item.
Rewritten to use much simpler promise chaining instead (which also allows
the return value to propagate from the pre-op to the post-op), ensuring only
one actor modifies the queue entry."
The previous version looped over post-op execution and signaling of waiters.
This could race with an op just finishing if task reordering happened.
This version simplifies the code significantly (and raises the question of why
it was not written like this in the first place... shame on me) by simply
building a promise-dependency chain between the _previous_ queue item and the
next instead.
The code also now handles propagation of the return value from the "Func"
pre-op to the "Post" op, with exceptions handled automatically.
xfs doesn't like writes beyond eof (exactly at eof is fine), and due
to continuation reordering, we sometimes do that.
Fix by pre-truncating the segment to its maximum size.
Re-check file size overflow after each cycle() call (new buffer),
otherwise we could write more than allowed when storing a mutation
larger than the current buffer size (current pos + sizeof(mut) < max_size,
but after the cycle required by sizeof(mut) > buf_remain, the former might
not be true anymore).
"Adds a small utility queue and through it enforces memtable flush ordering,
such that a flush may _run_ unchecked, but its "post" operation may only
execute once all "lower numbered" (i.e. lower replay position) post-ops
have finished.
This means that:
a.) Callbacks to commitlog are now guaranteed to fulfill the ordering criteria
b.) Calling column_family::flush() and waiting for the result will also
    wait for any previously initiated flushes to finish. But not those
    initiated _after_."
Small utility to order operation -> post-operation,
so that the "post" step is guaranteed to only run
when all "post" ops for lower valued keys (T) have completed.
This is a generalized utility, mainly so it can be tested on its own.
Before:
$ nodetool info
ID : a5adfbbf-cfd8-4c88-ab6b-6a34ccc2857c
Gossip active : false
After:
$ nodetool info
ID : a5adfbbf-cfd8-4c88-ab6b-6a34ccc2857c
Gossip active : true
Fixes #354.
* seastar 78e3924...a2523ae (7):
> core: fix pipe unread
> Merge 'xfs-extents'
> Merge "separate-dma-alignment"
> output_stream: wait for stream to be taken out of poller in case final flush returns exception.
> reactor: Use more widely compatible xfs include
> readme: Add xfslibs-dev to Ubuntu deps
> pipe: add unread() operation
The first problem is the while loop around the code that processes the
prestate. That's wrong because more data may need to be read before the
prestate can be processed further.
The second problem is that the code assumes a prestate is always processed
in one go, and then unconditionally processes the current state.
Both problems are likely to occur when reading a large buffer, because more
than one read may be required.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
I was mildly annoyed by seeing two warnings about the same directory not
being XFS when the sstable directory and the commitlog directory are the
same one (I don't know if this is typical, but it is what I do in all my
tests...). So I wrote this trivial patch to make sure we don't test the
same directory twice.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
"With this, a new node can stream data from existing nodes when it joins
the cluster.
I tested with the following:
1) start node 1
2) insert data into node 1
3) start node 2
I can see from the logger that data is streamed correctly from node 1
to node 2."
Add code to actually stream data from other nodes during bootstrap.