"
This series optimises the read path by replacing some usages of
std::vector with utils::small_vector. The motivation for this change was
the observation that the profiler points to memory allocation functions
as the place where we spend the most time, and while they have a large
number of callers, storage allocation for some vectors was close to the
top. The gains are not huge, since the problem is a lot of small things
adding up rather than a single slow thing, but we need to start with
something.
Unfortunately, the performance of boost::container::small_vector is
quite disappointing, so a new implementation of small_vector was
introduced.
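For readers unfamiliar with the idea, here is a minimal sketch of what a
small vector does: the first N elements live in a buffer inside the
object, and only growing past N touches the allocator. This is not
utils::small_vector (which handles arbitrary T, copying, moving and much
more); the name tiny_small_vector and the restriction to trivially
copyable element types are assumptions made to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical illustration of the small-vector idea, NOT utils::small_vector.
// Elements are stored inline until the inline capacity N is exhausted.
template <typename T, std::size_t N>
class tiny_small_vector {
    static_assert(N > 0, "inline capacity must be non-zero");
    static_assert(std::is_trivially_copyable_v<T>, "sketch handles trivially copyable T only");

    alignas(T) unsigned char _inline[N * sizeof(T)];
    T* _begin = reinterpret_cast<T*>(_inline);
    std::size_t _size = 0;
    std::size_t _capacity = N;

    void grow() {
        std::size_t new_capacity = _capacity * 2;
        void* p = std::malloc(new_capacity * sizeof(T));
        if (!p) {
            throw std::bad_alloc();
        }
        std::memcpy(p, _begin, _size * sizeof(T));
        if (!is_inline()) {
            std::free(_begin);  // only free heap storage, never the inline buffer
        }
        _begin = static_cast<T*>(p);
        _capacity = new_capacity;
    }

public:
    tiny_small_vector() = default;
    // Copying would need to re-point _begin at the new inline buffer; omitted here.
    tiny_small_vector(const tiny_small_vector&) = delete;
    tiny_small_vector& operator=(const tiny_small_vector&) = delete;
    ~tiny_small_vector() {
        if (!is_inline()) {
            std::free(_begin);
        }
    }

    bool is_inline() const { return _begin == reinterpret_cast<const T*>(_inline); }
    std::size_t size() const { return _size; }
    T& operator[](std::size_t i) { return _begin[i]; }

    void push_back(const T& value) {
        if (_size == _capacity) {
            grow();  // the only place where the allocator is involved
        }
        std::memcpy(static_cast<void*>(_begin + _size), &value, sizeof(T));
        ++_size;
    }
};
```

With N chosen to cover the common case (for example, the handful of
column ids in a typical partition_slice), most instances never allocate.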
perf_simple_query -c4 --duration 60, medians:

          ./perf_before   ./perf_after   diff
  read        343086.80      360720.53   5.1%

Tests: unit(release, small_vector in debug)
"
* tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla:
partition_slice: use small_vector for column_ids
mutation_fragment_merger: use small_vector
auth: use small_vector in resource
auth: avoid list-initialisation of vectors
idl: serialiser: add serialiser for utils::small_vector
idl: serialiser: deduplicate vector serialisers
utils: introduce small_vector
intrusive_set_external_comparator: make iterator nothrow move constructible
mutation_fragment_merger: value-initialise iterator
"
This is a backport of CASSANDRA-8236.
Before this patch, scylla sends the node UP event to cql clients when it
sees a new node join the cluster, i.e., when the new node's status
becomes NORMAL. The problem is that, at this point, the cql server might
not be ready yet. Once the client receives the UP event, it tries to
connect to the new node's cql port and fails.
To fix this, a new application_state::RPC_READY is introduced. A new
node sets RPC_READY to false at the very beginning, when it starts
gossip, and sets RPC_READY to true once the cql server is ready.
RPC_READY is a bad name, but I think it is better to follow Cassandra.
Nodes with and without this patch are expected to work together without
problems.
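A minimal sketch of the rule this introduces, roughly: the UP
notification to cql clients is gated on both the gossip status and the
advertised RPC_READY flag. The peer_view struct and
should_send_up_event() are hypothetical stand-ins, not Scylla's actual
types or functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of what a node knows about a peer.
struct peer_view {
    std::string status;  // gossip STATUS, e.g. "BOOT" or "NORMAL"
    bool cql_ready;      // derived from application_state::RPC_READY
};

// Before the patch, UP was sent as soon as status became NORMAL.
// With RPC_READY, it is deferred until the peer's cql server is ready too.
bool should_send_up_event(const peer_view& p) {
    return p.status == "NORMAL" && p.cql_ready;
}
```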
Refs #3843
"
* 'asias/node_up_down.upstream.v4.1' of github.com:scylladb/seastar-dev:
storage_service: Use cql_ready facility
storage_service: Handle application_state::RPC_READY
storage_service: Add notify_cql_change
storage_service: Add debug log in notify_joined
storage_service: Add extra check in notify_joined
storage_service: Add notify_joined
storage_service: Add debug log in notify_up
storage_service: Add extra check in notify_up
storage_service: Add notify_up
storage_service: Make notify_left log debug level
storage_service: Introduce notify_left
storage_service: Add debug log in notify_down
storage_service: Introduce notify_down
storage_service: Add set_cql_ready
gossip: Add gossiper::is_cql_ready
gms: Add endpoint_state::is_cql_ready
gms: Add application_state::RPC_READY
gms: Introduce cql_ready in versioned_value
At this point the cql_ready facility is ready. To use it, advertise the
RPC_READY application state in the following cases:
- When a node boots, set it to false
- When the cql server is ready, set it to true
- When the cql server is down, set it to false
- A new scylla node always sends application_state::RPC_READY = false when
the node boots and sends application_state::RPC_READY = true when the cql
server is up
- An old scylla node that does not support application_state::RPC_READY
never has application_state::RPC_READY in its endpoint_state, so we can
only assume its cql server is up; we therefore return true here if
application_state::RPC_READY is not present
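A simplified sketch of the backward-compatible lookup described in the
last bullet. endpoint_state and application_state here are hypothetical
stand-ins for the real gms classes, and the "1"/"0" string encoding of
the RPC_READY value is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, heavily simplified stand-ins for the gms types.
enum class application_state { STATUS, RPC_READY };

struct endpoint_state {
    std::map<application_state, std::string> states;

    bool is_cql_ready() const {
        auto it = states.find(application_state::RPC_READY);
        if (it == states.end()) {
            // Old node that never advertises RPC_READY: assume cql is up.
            return true;
        }
        return it->second == "1";  // assumed encoding of "true"
    }
};
```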
_ck_blocks_header is a 64-bit variable, so the mask should be 64 bits too.
Otherwise, a shift in the range 32-63 will produce wrong results.
Fix by using a 64-bit mask.
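To illustrate the fix with a hypothetical test_bit() helper (the actual
code around _ck_blocks_header is not shown here):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: test bit `n` (0..63) of a 64-bit word.
bool test_bit(std::uint64_t word, unsigned n) {
    // Buggy form: `word & (1u << n)`. `1u` is only 32 bits wide, so for
    // n in 32..63 the shift is undefined behaviour (what ubsan reports),
    // and its 32-bit result can never have bits 32..63 set anyway.
    // Fixed form: widen the constant to 64 bits before shifting.
    std::uint64_t mask = std::uint64_t(1) << n;
    return (word & mask) != 0;
}
```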
Found by Fedora 29's ubsan.
Fixes #3973.
Message-Id: <20181209120549.21371-1-avi@scylladb.com>
"
Refs #3929
Enables re-use of commitlog segments.
First, it ensures we never succeed in playing back a commitlog
segment whose name does not match the IDs in the actual
file data, by determining the expected id from the file name.
This also handles partially written reused files, as
each chunk header's CRC depends on the ID, and will
fail once we hit any leftovers.
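A sketch of the file-name check, assuming segment names of the form
CommitLog-<version>-<id>.log. The helper names id_from_file_name() and
header_matches_file() are hypothetical, not the actual commitlog API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical parser for names assumed to look like "CommitLog-<version>-<id>.log".
std::optional<std::uint64_t> id_from_file_name(const std::string& name) {
    const std::string suffix = ".log";
    if (name.size() <= suffix.size() ||
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return std::nullopt;
    }
    auto dash = name.rfind('-');
    if (dash == std::string::npos) {
        return std::nullopt;
    }
    // The id is everything between the last '-' and the ".log" suffix.
    std::string digits = name.substr(dash + 1, name.size() - suffix.size() - dash - 1);
    if (digits.empty() || digits.find_first_not_of("0123456789") != std::string::npos) {
        return std::nullopt;
    }
    return std::stoull(digits);
}

// Replay only trusts a segment whose header id matches the id in its name.
bool header_matches_file(const std::string& name, std::uint64_t header_id) {
    auto expected = id_from_file_name(name);
    return expected && *expected == header_id;
}
```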
Second, instead of actually deleting segments when they are
finished, we rename them and put them into a recycle list.
Allocating a new file will then prioritize this list
before creating a new file.
Note that since consumption and release of segments can
be somewhat unbalanced, this does not guarantee that
we will use recycled files in every case where it
might be possible, simply because of timing. It does,
however, give a good chance of it.
We limit recycled files based on the max disk size
setting, so, depending on timing, disk usage can
potentially grow more than it would without recycling,
but not uncontrollably.
While all this might theoretically improve disk
writes in some cases, it is far from a magic bullet.
No real performance testing has been done yet, only
functional testing.
"
* 'calle/commitlog-reuse' of github.com:scylladb/seastar-dev:
commitlog: Recycle used segments instead of delete + new file
commitlog: Terminate all segments with a zero chunk
commitlog_replay: Enforce file name based id matching
Refs #3929
When deleting a segment, if (and only if) we have not yet filled up all
reserves, put the file on a "recycle" list instead of actually deleting
it. The next segment allocation will, instead of creating a new file,
simply rename a recycled segment and reuse the file and its allocated
space.
We rename the file twice: once when adding it to the recycle list, with
a special prefix so we don't mix up actual replayable segments and
these, and a second time when we actually reuse the file (also to
ensure consecutive names).
Note that we limit the number of recyclable files, so a heavily
stressed application that somehow fills up the replenish queue might
still cause us to drop segments. We could skip this limit, but would
then risk getting too many files on disk.
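A filesystem-free model of the recycle list described above. The names
segment_pool, release() and allocate(), the "Recycled-" prefix and the
count-based limit are all assumptions for illustration; the real code
bounds recycling by disk size, renames files on disk and syncs the
directory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical model: only file names are tracked, nothing touches disk.
class segment_pool {
    std::deque<std::string> _recycled;  // files waiting for reuse
    std::size_t _max_recycled;          // bound so disk usage stays controlled
    std::uint64_t _next_id = 1;

public:
    std::size_t deleted = 0;            // segments actually dropped

    explicit segment_pool(std::size_t max_recycled) : _max_recycled(max_recycled) {}

    static std::string segment_name(std::uint64_t id) {
        return "CommitLog-1-" + std::to_string(id) + ".log";
    }

    // First rename: give a finished segment a special prefix so replay
    // never mistakes it for a live segment.
    void release(const std::string& name) {
        if (_recycled.size() < _max_recycled) {
            _recycled.push_back("Recycled-" + name);
        } else {
            ++deleted;                  // over the limit: delete as before
        }
    }

    // Second rename: reuse a recycled file under the next consecutive
    // name, or fall back to creating a new file.
    std::string allocate(bool& reused) {
        std::string name = segment_name(_next_id++);
        reused = !_recycled.empty();
        if (reused) {
            _recycled.pop_front();      // real code: rename(front, name)
        }
        return name;
    }

    std::size_t recycled_count() const { return _recycled.size(); }
};
```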
Replay should be safe, since all entries are guarded by CRC based
on the file ID (i.e. file name). Thus replaying a recycled segment
will simply cause a CRC error in the main header and be ignored (see
previous patch).
Segments that are fully synced will have terminating zero-header (see
previous patch) so we know when to stop processing a recycled file.
If a file is the result of a mid-write crash, we will generate a CRC
processing error, as we normally would in this case, when hitting a
partially written block or an old/new chunk boundary.
v2:
* Sync dir on rename
* auto -> const sstring&
* Allow recycling files as long as we're within disk space limits
v3:
* Use special names for files waiting for reuse
Writes a final chunk header of zero to the file on close, to mark
end-of-segment.
This allows us to gracefully stop replay processing of a segment file
even if it was not zeroed from the beginning (maybe recycled - hint
hint).
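A sketch of why the terminating zero header helps, using a hypothetical
chunk_header model in which size == 0 marks end-of-segment; the real
on-disk chunk layout is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical chunk model: `size` is the payload size, 0 is the end marker.
struct chunk_header {
    std::uint32_t size;
};

// Returns how many chunks were replayed. Stops cleanly at a zero header,
// so stale chunks left over in a recycled file are never parsed.
std::size_t replay(const std::vector<chunk_header>& chunks) {
    std::size_t replayed = 0;
    for (const auto& h : chunks) {
        if (h.size == 0) {
            break;  // graceful end-of-segment
        }
        ++replayed;
    }
    return replayed;
}
```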
When reading the header chunk of a commitlog file, check the stored id
value against the id derived from the file name, and ignore the file if
they do not match. This is a prerequisite for reusing renamed commitlog
files, as we can then fail fast should such a file be left on disk,
instead of trying to replay it.
We also check the id via the CRC check for each chunk parsed. If we
find a chunk with a mismatched id, we will get a CRC error for the
chunk, and replay will terminate (albeit not gracefully).
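A sketch of how folding the segment id into each chunk's checksum makes
a reused file with leftover data fail verification. It uses plain
CRC-32 and a little-endian id encoding as assumptions; Scylla's actual
checksum routine and on-disk layout may differ, and chunk_crc() and
verify_chunk() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), chainable like zlib's crc32().
std::uint32_t crc32_update(std::uint32_t crc, const unsigned char* p, std::size_t n) {
    crc = ~crc;
    for (std::size_t i = 0; i < n; ++i) {
        crc ^= p[i];
        for (int k = 0; k < 8; ++k) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// The checksum covers the segment id first, then the chunk payload.
std::uint32_t chunk_crc(std::uint64_t segment_id, const std::string& payload) {
    unsigned char id_bytes[8];
    for (int i = 0; i < 8; ++i) {
        id_bytes[i] = static_cast<unsigned char>(segment_id >> (8 * i));  // little-endian
    }
    std::uint32_t crc = crc32_update(0, id_bytes, sizeof id_bytes);
    return crc32_update(crc, reinterpret_cast<const unsigned char*>(payload.data()), payload.size());
}

// A chunk written under one id fails verification when read under another.
bool verify_chunk(std::uint64_t expected_id, const std::string& payload, std::uint32_t stored) {
    return chunk_crc(expected_id, payload) == stored;
}
```

Because the id is part of every checksum, a leftover chunk written under
a file's previous id is rejected even if its payload is intact.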
The newer version of node_exporter comes with important bug fixes. This
is especially important for I3.metal, which is not supported by the
older version of node_exporter.
The dashboards can now support both the new and the old version of
node_exporter.
Fixes #3927
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20181210085251.23312-1-amnon@scylladb.com>
"
Make major compaction aware of the compaction strategy, by using an
approach suited to each strategy's needs.
Refs #1431.
"
* 'compaction_strategy_aware_major_compaction_v2' of github.com:raphaelsc/scylla:
tests: add test for compaction-strategy-aware major compaction
compaction: implement major compaction heuristic for leveled strategy
compaction: introduce notion of compaction-strategy-aware major compaction
The rh_entry address is captured inside the timeout callback lambda, so
the structure must not be moved after it is created. Change the code to
create rh_entry in-place instead of moving it into the map.
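A sketch of the in-place pattern, with a hypothetical entry type
standing in for rh_entry and std::unordered_map standing in for whatever
map the real code uses. Node-based maps keep element addresses stable
across insertions and rehashes, so capturing the entry's address after
try_emplace() is safe until the element is erased.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>

// Hypothetical stand-in for rh_entry: its timeout callback refers back
// to the entry itself, so the entry must never move once it is set.
struct entry {
    int timeouts = 0;
    std::function<void()> on_timeout;

    entry() = default;
    entry(const entry&) = delete;             // also suppresses the implicit move
    entry& operator=(const entry&) = delete;
};

using entry_map = std::unordered_map<int, entry>;

entry& register_entry(entry_map& entries, int id) {
    // Construct the value directly inside the map. The buggy variant built
    // a local entry, captured its address, then moved it into the map,
    // leaving the lambda pointing at the moved-from local.
    entry& e = entries.try_emplace(id).first->second;
    e.on_timeout = [&e] { ++e.timeouts; };
    return e;
}
```

Deleting the copy constructor turns an accidental "build it on the
stack, then move it into the map" into a compile error.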
Fixes #3972.
Message-Id: <20181206164043.GN25283@scylladb.com>
The results vector should be populated vertically, not horizontally.
This was responsible for an assertion failure with --cache-enabled:
void result_collector::add(test_result_vector): Assertion `rs.size() == results.size()' failed.
Introduced in 3fc78a25bf.
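One reading of "vertically", sketched with hypothetical types (the real
result_collector is not shown in this message): each run produces one
value per test case, and add() appends value i to column i, which keeps
rs.size() == results.size() true for every run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape: one value per test case per run.
using test_result_vector = std::vector<double>;

struct result_collector {
    std::vector<std::vector<double>> results;  // one column per test case

    explicit result_collector(std::size_t n_tests) : results(n_tests) {}

    void add(const test_result_vector& rs) {
        assert(rs.size() == results.size());
        // Vertical population: value i of this run goes into column i.
        for (std::size_t i = 0; i < rs.size(); ++i) {
            results[i].push_back(rs[i]);
        }
    }
};
```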
Message-Id: <1544105835-24530-2-git-send-email-tgrabiec@scylladb.com>