Currently this is done only in
storage_service::get_mutable_token_metadata_ptr
but it needs to be done here as well for code paths
calling mutate_token_metadata directly.
Currently, this it is only called from network_topology_strategy_test.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220130152157.2596086-1-bhalevy@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Currently, we first delete all existing token mappings
for the endpoint from _token_to_endpoint_map and then
we add all updated token mappings for it and set should_sort_tokens
if the token is newly inserted, but since we removed all
existing mappings for the endpoint unconditionally, we
will sort the tokens even if the token existed and
its ownership did not change.
This is worthwhile since there are scenarios where
none of the token ownership change. Searching and
erasing tokens from the tokens unordered_set runs
at constant time on average so doing it for n tokens
is O(n), while sorting the tokens is O(n*log(n)).
Test: unit(dev)
DTest: replace_address_test.py::TestReplaceAddress::test_serve_writes_during_bootstrap(dev,debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220117101242.122512-2-bhalevy@scylladb.com>
It's currently used only by unit tests
and it is dangerous to use on a populated token_metadata
as update_normal_tokens assumes that the set of tokens
owned by the given endpoint is compelte, i.e. previous
tokens owned by the endpoint are no longer owned by it,
but the single-token update_normal_token interface
seems commulative (and has no documentation whatsoever).
It is better to remove this interface and calculate a
complete map of endpoint->tokens from the tests.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220117101242.122512-1-bhalevy@scylladb.com>
This is a consolidation of #9714 and #9709 PRs by @elcallio that were reviewed by @asias
The last comment on those was that they should be consolidated in order not to create a security degradation for
ec2 setups.
For some cases it is impossible to determine dc or rack association for nodes on outgoing connections.
One example is when some IPs are hidden behind Nat layer.
In some cases this creates problems where one side of the connection is aware of the rack/dc association where the
other doesn't.
The solution here is a two stage one:
1. First add a gossip reverse lookup that will help us determine the rack/dc association for a broader (hopefully all) range
of setups and NAT situations.
2. When this fails - be more strict about downgrading a node which tries to ensure that both sides of the connection will at least
downgrade the connection instead of just fail to start when it is not possible for one side to determine rack/dc association.
Fixes#9653
/cc @elcallio @asias
Closes#9822
* github.com:scylladb/scylla:
messaging_service: Add reverse mapping of private ip -> public endpoint
production_snitch_base: Do reverse lookup of endpoint for info
messaging_service: Make dc/rack encryption check for connection more strict
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
In a previous patch, we noticed that the header file <gm/inet_address.hh>,
which is included, directly or indirectly, by most source files,
includes <seastar/net/ip.hh> which is very slow to compile, and
replaced it by the much faster-to-include <seastar/net/ipv[46]_address.hh>.
However, we also included <seastar/net/ip.hh> in types.hh - and that
too is included by almost every file, so the actual saving from the
above patch was minimal. So in this patch we replace this include too.
After this patch Scylla does not include <seastar/net/ip.hh> at all.
According to ClangBuildAnalyzer, this reduces the average time to include
types.hh (multiply this by 312 times!) from 4 seconds to 1.8 seconds,
and reduces total build time (dev mode) by about 3%.
Some of the source files were now missing some include directives, that
were previously included in ip.hh - so we need to add those explicitly.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Refs #9709
Refs #9653
If we don't find immediate info about an endpoint, check if
we're being asked about a "private" ip for the endpoint.
If so, give info for this.
The next patch will disable stopping the keyspaces
in database shutdown due to #9684.
This will leave outstanding e_r_m:s when the factory
is destroyed. They must be unregistered from the factory
so they won't try to submit_background_work()
to gently clear their contents.
Support that temporarily until shutdown is fixed
to ensure they are no outstanding e_r_m:s when
the factory is destroyed, at which point this
can turn into an internal error.
Refs #8995
Refs #9684
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211127083348.146649-1-bhalevy@scylladb.com>
Prevent reactor stalls by gently clearing the replication_map
and token_metadata_ptr when the effective_replication_map is
destroyed.
This is done in the background, protected by the
effective_replication_map_factory::stop() method.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Calculating a new effective_replication_map on each shard
is expensive. To try to save that, use the factory key to
look up an e_r_m on shard 0 and if found, use to to clone
its replication map and use that to make the shard-local
e_r_m copy.
In the future, we may want to improve that in 2 ways:
- instead of always going to shard 0, use hash(key) % smp::count
to create the first copy.
- make full copies only on NUMA nodes and keep a shared pointer
on all other shards.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The effective_replication_map_factory keeps nakes pointers
to outstanding effective_replication_map:s.
These are kept valid using a shared effective_replication_map_ptr.
When the last shared ptr reference is dropped the effective_replication_map
object is destroyed, therefore the raw pointer to it in the factory
must be erased.
This now happens in ~effective_replication_map when the object
is marked as registered.
Registration happens when effective_replication_map_factory inserts
the newly created effective_replication_map to its _replication_maps
map, and the factory calles effective_replication_map::set_factory..
Note that effective_replication_map may be created temporarily
and not be inserted to the factory's map, therefore erase
is called only when required.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Make a factory key using the replication_strategy type
and config options, plus the token_metadata ring version
and use it to search an already-registred effective_replication_map.
If not found, calculate a new create_effective_replication_map
and register it using the above key.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So a effective_replication_map_ptr can be generated
using a raw pointer by effective_replication_map_factory.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To be used to locate the effective_replication_map
in the to-be-introduced effective_replication_map_factory.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It will be used further to create shared copies
of effective_replication_map based on replication_strategy
type and config options.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Accessing tm.sorted_tokens().back() causes undefined behavior
if tm.sorted_tokens is empty.
Check that first and throw/abort using on_internal_error
in this case.
This will prevent the segfault but it doesn't fix the root cause
which is getting here with empty token_metadata. That will be fixed
by the following patch.
Refs #9494
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211019075710.1626808-1-bhalevy@scylladb.com>
And use it from cql3 check_restricted_replication_strategy and
keyspace_metadata ctor that defined their own `replication_class_strategy`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Equivalent to abstract_replication_strategy get_range_addresses,
yet synchronous, as it uses the precalculated map.
Call it from storage_service::get_new_source_ranges
and range_streamer::get_all_ranges_with_sources_for.
Consequently, get_new_source_ranges and removenode_add_ranges
can become synchronous too.
Unfortunately we can't entirely get rid of
abstract_replication_strategy::get_range_addresses
as it's still needed by
range_streamer::get_all_ranges_with_strict_sources_for.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is not used any more.
Methods either use the token_metadata_ptr in the
effective_replication_map, or receive an ad-hoc
token_metadata.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Prepare for deleting the _shared_token_metadata member.
All we need for recognized_options is the topology
(for network_topology_strategy).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that abstract_replication_strategy methods are all async
clone_only_token_map_sync, and update_normal_tokens_sync
are unused.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is no longer in use.
And with it, the virtual calculate_natural_endpoint_sync method
of which it was the only caller.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Remaining callers of get_address_ranges and get_pending_address_ranges
are all either from a seastar thread or from a coroutine
so we can make the methods always async and drop the
can_yield param.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
All remaining use sites are called in a seastar thread
so we drop the can_yield param and make get_range_addresses
always async.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is called only from repair, in a thread,
so it can be made always async and the need_preempt param
can be dropped.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Provide a sync get_ranges method by effective_replication_map
that uses the precalculated map to get all token ranges owned by or
replicated on a given endpoint.
Reuse do_get_ranges as common infrastructure for all
3 cases: get_ranges, get_primary_ranges, and get_primary_ranges_within_dc.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that all falvors of get_natural_endpoints methods
were moved to effective_replication_map,
do_get_natural_endpoints and its overrides are unused.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Setting the ring version backwards means it got out of sync.
Possibly concurrent updates weren't serialized properly
using token_metadata_lock / mutate_token_metadata.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For generating unique _ring_version.
Currently when we clone a mutable token_metadata_ptr
it remains with the same _ring_version
and the ring version is updated only when the topology changes.
To be able to distinguish these traqnsient copies
from the ones that got applied, be stricter about
the ring version and change it to a unique number
using a static counter.
Next patch will update the ring version
(and consequently invalidate the cached_endpoints
on the replication strategy) every time the token_metadata
changes, not only when the topology changes.
Note that the _cached_endpoints will go away
once the transition to effective_replication_map
is finished, so this will not degrade performance.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
implementation
Now that all users of it were converted to use the
effective_replication_map, the legacy
abstract_replication_strategy::get_natural_endpoints method
can be deleted.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Every time the token_metadata changes we need to update the
effective_replication_map on all non-system keyspaces.
Do that in replicate_to_all_cores after the updated token_metadata
has been replicated to all cores.
We first prepare and clone the token_metadata, then prepare
and clone the new effective_replication_maps. Any failure
at this stage is recoverable, handle via rollback and the exception
is returned.
Note that any failure to _apply_ the pending token_metadata or the
effective_replication_map will cause scylla to abort.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Serialize the metadata changes with
keyspace create, update, or drop.
This will become necessary in the following patch
when we update the effective_replication_map
on all keyspaces and we want instances on all shards
end up with the same replication map.
Note that storage_service::keyspace_changed is called
from the scheme_merge path so it already holds
the merge_lock.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
effective_replication_map holds the full replication_map
resulting from applying the effective replication strategy
over the given token_metadata and replication_strategy_config_options.
It is calculated once, in make_effective_replication_map(), and then it
can be used for retrieving the endpoints/token_ranges synchronously
from the precalculated map.
A new virtual get_natural_endpoints(const token&, const effective_replication_map&)
method has been added to abstract_replication_strategy so that
local_strategy and everywhere_replication_strategy can override it as they may be
needed before the token_metadata is established.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Enable creating shared_ptr<BaseClass> in nonstatic_class_registry
using BaseClass::ptr_type and use that for
abstract_replication_strategy.
While at it, also clean up compressor with that respect
to define compressor::ptr_type as shared_ptr<compressor>
thus simplifying compressor_registry.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And with that rename calculate_natural_endpoints(const token& search_token, const token_metadata&, can_yield)
to do_calculate_natural_endpoints and make it protected,
With this patch, all its external users call the async version, so
rename it back to calculate_natural_endpoints, and make
calculate_natural_endpoints_sync private since it's being called
only within abstract_replication_strategy.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
calculate_natural_endpoints_sync and _async are both provided
temporarily until all users of them are converted to use
the async version which will remain.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Extract natural_endpoints_tracker out of calculate_natural_endpoints
so we easily split the function to sync and async variants.
Test: network_topology_strategy_test(dev, debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>