In preparation to make run_with_compaction_disabled() a non-template,
we want to remove any non-copyable captures (so the function can be
an std::function, which requires copyability). Move the flush within
the compaction disabled region. This changes the behavior, but it shouldn't
matter.
"These patches make Scylla refuse to load counter sstables that may
contain unsupported counter shards. They are recognised by the lack of
the Scylla component.
Fixes #2766."
* tag 'reject-non-scylla-counter-sstables/v1' of https://github.com/pdziepak/scylla:
db: reject non-Scylla counter sstables in flush_upload_dir
db: disallow loading non-Scylla counter sstables
sstable: add has_scylla_component()
A keyspace can be deleted while write is ongoing, so the object cannot
be used after defer point. The keyspace reference is only used to check
how many replies a write operation should wait for and this can be
precalculated during write handler creation.
Fixes#2777
Message-Id: <20170911084436.GG24167@scylladb.com>
"This series tries to improve the bootstrap of a node in a large cluster by
improving how gossip applies the gossip node state. In #2404, the joining node
failed to bootstrap, because it did not see the seed node when
storage_service::bootstrap ran. After this series, we apply the whole gossip
state contained in the gossip ack/ack2 message before applying the next one,
and we apply the state of the seed node earlier than non-seed node so we can
have the seed node's state faster. We also add some randomness to the order of
applying gossip node state to prevent some of the nodes' state are always
applied earlier than the others.
This series improves apply_state_locally for large cluster:
- Tune the order of applying endpoint_state
- Serialize apply_state_locally
- Avoid copying of the gossip state map
Fixes#2404"
* tag 'asias/gossip_issue_2404_v2' of github.com:scylladb/seastar-dev:
gossip: Avoid copying with apply_state_locally
gossip: Serialize apply_state_locally
gossip: Tune the order of applying endpoint_state in apply_state_locally
gossip: Introduce is_seed helper
gossip: Pass const endpoint_state& in notify_failure_detector
gossip: Pass reference in notify_failure_detector
Move the std::map<inet_address, endpoint_state> map from the gossip
ack/ack2 message directly and move it around in apply_state_locally to
avoid copying the map.
apply_state_locally will be called when gossip ack/ack2
message is received. It will use the std::map<inet_address,
endpoint_state>& map to update the endpoint state.
However, we can receive multiple such gossip ack/ack2 messages from
multiple peer nodes in parallel. Currently, we process them in parallel.
It is better to apply all the states from one node then move to apply
all the states from another node than interleaving. Because it is more
important to have the state of the whole cluster than to have a bit
newer state from another peer (if it is newer), especially when the node
boots up and runs its first round of gossip exchange.
After this patch, we apply the whole gossip state contained in the
gossip ack/ack2 message before applying the next one.
We currently always apply the endpoint_state in the order of the
endpoint ip address. This is not good because some of the endpoint's
state is always applied earlier than the others.
In large cluster, the number of endpoints can be large, it takes time to
apply all of them. To make it more fair, we apply the endpoint_state
randomly.
Apply the seed node's state earlier because in bootstrap, we will check
if we have seen the seed node in storage_service::bootstrap. In #2404,
the bootstrap failed because, the joining node hasn't apply the seed
node's state when storage_service::bootstrap runs.
* seastar 85ca12d...31b925d (19):
> net/byteorder: fix 64 bit ntohq and htonq on big endian machines
> core, util: fix compilation on non-x86 processors
> core/memory: Fix SIGSEGV in small_pool::add_more_objects()
> log: remove debug leftovers
> Merge "TLS state machine fixes" from Calle
> logger: allow adjusting the timestamp style for stdout logs
> thread: make thread_context::s_main portable
> core: add seastar::cache_line_size constant
> Add detach() to input_stream and output_stream
> Install dependencies for Arch Linux.
> tls: Guard non-established sockets in sesrefs + more explicit close + states
> tls: Make vec_push fully exception safe
> basic_sstring: resize uses sstring
> Merge "Add and correct unit tests" from Jesse
> tcp: enforce 1-byte maximum segment invariant with zero window
> tcp: verify 1-byte maximum segment invariant during send with zero window
> memory: reduce small_pool vulnerability to fragmentation further
> Prometheus: avoid merging all metrics family
> net: Fix possible NULL pointer dereference.
validate_request_failure() assumed that the future returned by execute_cql()
is always ready, which doesn't have to be the case, and caused aborts
in debug mode build.
Message-Id: <1504701342-13300-1-git-send-email-tgrabiec@scylladb.com>
Scylla already refuses to load counter sstables that do not have Scylla
component. However, if this happens because of 'nodetool refresh'
command the existing protection will trigger after sstables have been
moved to the data directory. This is too later, so an additional check
is added when the upload directory is scanned.
has_scylla_component() is going to be used to verify that an sstable has
been generated by a recent version of Scylla. This would make it
possible to reject sstables that may be unsafe to load (e.g. sstables
containing legacy counter shards).
"This series implements the missing API to terminate all repairs.
For example:
$ curl -X POST --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/force_terminate_repair"
With the new stream_plan::abort() api we can now abort the stream
session assocaited with the repair as well.
On top of this, we can support termination of single repair job instead all
jobs.
Fixes#2105"
* tag 'asisas/repair_abort_v4' of github.com:scylladb/seastar-dev:
repair: Support termination of repair jobs
repair: Track repair_info
repair: Intorduce repair id to repair_info map
api: Add force_terminate_repair API
streaming: Add abort to stream_plan
streaming: Add abort_all_stream_sessions for stream_coordinator
streaming: Introduce streaming::abort()
streaming: Make stream_manager and coordinator message debug level
streaming: Check if _stream_result is valid
streaming: Log peer address in on_error
streaming: Introduce received_failed_complete_message
"Scylla 1.7.4 and older use incorrect ordering of counter shards, this
was fixed in 0d87f3dd7d ("utils::UUID:
operator< should behave as comparison of hex strings/bytes"). However,
that patch was not backported to 1.7 branch until very recently. This
means that versions 1.7.4 and older emit counter shards in an incorrect
order and expect them to be so. This is particularly bad when dealing
with imported correct sstables in which case some shards may become
duplicated.
The solution implemented in this patch is to allow any order of counter
shards and automaticly merge all duplicates. The code is written in a
way so that the correct ordering is expected in the fast path in order
not to excessively punish unaffected deployments.
A new feature flag CORRECT_COUNTER_ORDER is introduced to allow seamless
upgrade from 1.7.4 to later Scylla versions. If that feature is not
available Scylla still writes sstables and sends on-wire counters using
the old ordering so that it can be correctly understood by 1.7.4, once
the flag becomes available Scylla switches to the correct order.
Fixes #2752."
* tag 'fix-upgrade-with-counters/v2' of https://github.com/pdziepak/scylla:
tests/counter: verify counter_id ordering
counter: check that utils::UUID uses int64_t
mutation_partition_serializer: use old counter ordering if necessary
mutation_partition_view: do not expect counter shards to be sorted
sstables: write counter shards in the order expected by the cluster
tests/sstables: add storage_service_for_tests to counter write test
tests/sstables: add test for reading wrong-order counter cells
sstables: do not expect counter shards to be sorted
storage_service: introduce CORRECT_COUNTER_ORDER feature
tests/counter: test 1.7.4 compatible shard ordering
counters: add helper for retrieving shards in 1.7.4 order
tests/counter: add tests for 1.7.4 counter shard order
counters: add counter id comparator compatible with Scylla 1.7.4
tests/counter: verify order of counter shards
tests/counter: add test for sorting and deduplicating shards
counters: add function for sorting and deduplicating counter cells
counters: add counter_id::operator>
Until the cluster is fully upgraded from a version that uses the
incorrect counter shard ordering it is essential to keep using it lest
the old nodes corrupt the data upon receiving mutations with a counter
shard ordering they do not expect.
If the feature signaling that we have switched to the correct ordering
of counter shards is not enabled it means that the user still can do a
rollback to a version that expects wrong ordering. In order to avoid any
disasters when that happens write sstables using the 1.7.4 order until
we know for sure that it is no longer needed.
Scylla 1.7.4 used incorrect ordering of counter shards. In order to fix
this problem a new feature is introduced that will be used to determine
when nodes with that bug fixed can start sending counter shard in the
correct order.
Due to a bug in an implementation of UUID less compare some Scylla
versions sort counter shards in an incorrect order. Moreover, when
dealing with imported correct data the inconsistencies in ordering
caused some counter shards to become duplicated.
* 'tgrabiec/make-row-cache-update-exception-safe' of github.com:scylladb/seastar-dev:
row_cache: Improve safety of cache updates
row_cache: Extract invalidate_sync()
memtable: Mark mark_flushed() as noexcept
database: Add non-throwing try_trigger_compaction()
database: Make add_sstable() have strong exception guarantees
row_cache: Don't require presence checker to be supplied externally
database: Supply presence checker in sstable snapshots
mutation_source: Introduce mutation_source::make_partition_presence_checker()
mutation_reader: Move definitions up in the header
mutation_reader: Use constructor delegation to reduce code duplication
row_cache: Make populate() preserve continuity
row_cache: Allow marking as fully continuous on construction
database: Add missing serialization of sstable set udpate and cache invalidation
Cache imposes requirements on how updates to the on-disk mutation source
are made:
1) each change to the on-disk muation source must be followed
by cache synchronization reflecting that change
2) The two must be serialized with other synchronizations
3) must have strong failure guarantees (atomicity)
Because of that, sstable list update and cache synchronization must be
done under a lock, and cache synchronization cannot fail to synchronize.
Normally cache synchronization achieves no-failure thing by wiping the
cache (which is noexcept) in case failure is detect. There are some
setup steps hoever which cannot be skipped, e.g. taking a lock
followed by switching cache to use the new snapshot. That truly cannot
fail. The lock inside cache synchronizers is redundant, since the
user needs to take it anyway around the combined operation.
In order to make ensuring strong exception guarantees easier, and
making the cache interface easier to use correctly, this patch moves
the control of the combined update into the cache. This is done by
having cache::update() et al accept a callback (external_updater)
which is supposed to perform modiciation of the underlying mutation
source when invoked.
This is in-line with the layering. Cache is layered on top of the
on-disk mutation source (it wraps it) and reading has to go through
cache. After the patch, modification also goes through cache. This way
more of cache's requirements can be confined to its implementation.
The failure semantics of update() and other synchronizers needed to
change due to strong exception guaratnees. Now if it fails, it means
the update was not performed, neither to the cache nor to the
underlying mutation source.
The database::_cache_update_sem goes away, serialization is done
internally by the cache.
The external_updater needs to have strong exception guarantees. This
requirement is not new. It is however currently violated in some
places. This patch marks those callbacks as noexcept and leaves a
FIXME. Those should be fixed, but that's not in the scope of this
patch. Aborting is still better than corrupting the state.
Fixes#2754.
Also fixes the following test failure:
tests/row_cache_test.cc(949): fatal error: in "test_update_failure": critical check it->second.equal(*s, mopt->partition()) has failed
which started to trigger after commit 318423d50b. Thread stack
allocation may fail, in which case we did not do the necessary
invalidation.