Commit Graph

393 Commits

Author SHA1 Message Date
Benny Halevy
3cee0f8bd9 shared_token_metadata: mutate_token_metadata: bump cloned copy ring_version
Currently this is done only in
storage_service::get_mutable_token_metadata_ptr
but it needs to be done here as well for code paths
calling mutate_token_metadata directly.

Currently, this it is only called from network_topology_strategy_test.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220130152157.2596086-1-bhalevy@scylladb.com>
2022-01-30 18:15:08 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Benny Halevy
17e006106b token_metadata: update_normal_tokens: avoid unneeded sort when token ownership doesn't change
Currently, we first delete all existing token mappings
for the endpoint from _token_to_endpoint_map and then
we add all updated token mappings for it and set should_sort_tokens
if the token is newly inserted, but since we removed all
existing mappings for the endpoint unconditionally, we
will sort the tokens even if the token existed and
its ownership did not change.

This is worthwhile since there are scenarios where
none of the token ownership change.  Searching and
erasing tokens from the tokens unordered_set runs
at constant time on average so doing it for n tokens
is O(n), while sorting the tokens is O(n*log(n)).

Test: unit(dev)
DTest: replace_address_test.py::TestReplaceAddress::test_serve_writes_during_bootstrap(dev,debug)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220117101242.122512-2-bhalevy@scylladb.com>
2022-01-17 12:18:42 +02:00
Benny Halevy
25977db7b4 token_metadata: remove update_normal_token entry point
It's currently used only by unit tests
and it is dangerous to use on a populated token_metadata
as update_normal_tokens assumes that the set of tokens
owned by the given endpoint is compelte, i.e. previous
tokens owned by the endpoint are no longer owned by it,
but the single-token update_normal_token interface
seems commulative (and has no documentation whatsoever).

It is better to remove this interface and calculate a
complete map of endpoint->tokens from the tests.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220117101242.122512-1-bhalevy@scylladb.com>
2022-01-17 12:18:42 +02:00
Pavel Solodovnikov
badbfd521c locator: reconnectable_snitch_helper: coroutinize reconnect
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2022-01-11 09:29:12 +03:00
Pavel Solodovnikov
5dcfb94d5a gms: i_endpoint_state_change_subscriber: make callbacks to return futures
Coroutinize a few simple callbacks in the process.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2022-01-11 09:29:12 +03:00
Avi Kivity
57188de09e Merge 'Make dc/rack encryption work for some cases where Nat hides ednpoint ips' from Eliran Sinvani
This is a consolidation of #9714 and #9709 PRs by @elcallio that were reviewed by @asias
The last comment on those was that they should be consolidated in order not to create a security degradation for
ec2 setups.
For some cases it is impossible to determine dc or rack association for nodes on outgoing connections.
One example is when some IPs are hidden behind Nat layer.
In some cases this creates problems where one side of the connection is aware of the rack/dc association where the
other doesn't.
The solution here is a two stage one:
1. First add a gossip reverse lookup that will help us determine the rack/dc association for a broader (hopefully all) range
 of setups and NAT situations.
2. When this fails - be more strict about downgrading a node which tries to ensure that both sides of the connection will at least
 downgrade the connection instead of just fail to start when it is not possible for one side to determine rack/dc association.

Fixes #9653
/cc @elcallio @asias

Closes #9822

* github.com:scylladb/scylla:
  messaging_service: Add reverse mapping of private ip -> public endpoint
  production_snitch_base: Do reverse lookup of endpoint for info
  messaging_service: Make dc/rack encryption check for connection more strict
2022-01-09 16:40:49 +02:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Avi Kivity
ae3a360725 database: Move database, keyspace, table classes to replica/ directory
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.

As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
2022-01-06 17:07:30 +02:00
Nadav Har'El
6012f6f2b6 build performance: do not include <seastar/net/ip.hh>
In a previous patch, we noticed that the header file <gm/inet_address.hh>,
which is included, directly or indirectly, by most source files,
includes <seastar/net/ip.hh> which is very slow to compile, and
replaced it by the much faster-to-include <seastar/net/ipv[46]_address.hh>.

However, we also included <seastar/net/ip.hh> in types.hh - and that
too is included by almost every file, so the actual saving from the
above patch was minimal. So in this patch we replace this include too.
After this patch Scylla does not include <seastar/net/ip.hh> at all.

According to ClangBuildAnalyzer, this reduces the average time to include
types.hh (multiply this by 312 times!) from 4 seconds to 1.8 seconds,
and reduces total build time (dev mode) by about 3%.

Some of the source files were now missing some include directives, that
were previously included in ip.hh - so we need to add those explicitly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-01-05 17:29:21 +02:00
Calle Wilund
4df008adcc production_snitch_base: Do reverse lookup of endpoint for info
Refs #9709
Refs #9653

If we don't find immediate info about an endpoint, check if
we're being asked about a "private" ip for the endpoint.
If so, give info for this.
2021-12-20 06:20:46 +02:00
Benny Halevy
044e4a6b72 token_metadata: delete private constructor
It is not used.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211205174306.450536-1-bhalevy@scylladb.com>
2021-12-05 19:49:29 +02:00
Benny Halevy
93367ba55f effective_replication_map_factory: temporarily unregister outstanding maps when destroyed
The next patch will disable stopping the keyspaces
in database shutdown due to #9684.

This will leave outstanding e_r_m:s when the factory
is destroyed. They must be unregistered from the factory
so they won't try to submit_background_work()
to gently clear their contents.

Support that temporarily until shutdown is fixed
to ensure they are no outstanding e_r_m:s when
the factory is destroyed, at which point this
can turn into an internal error.

Refs #8995
Refs #9684

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211127083348.146649-1-bhalevy@scylladb.com>
2021-11-29 11:59:44 +02:00
Benny Halevy
9d2631daaf token_metadata: calculate_pending_ranges_for_leaving: maybe yield
We see long stalls as reported in
https://github.com/scylladb/scylla/issues/8030#issuecomment-974783526

everywhere_replication_strategy::calculate_natural_endpoints
is synchronous and doesn't yield, so add maybe_yield() calls
when looping over many token ranges.

Refs #8030

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211121090339.3955278-1-bhalevy@scylladb.com>
Message-Id: <20211121102606.76700-1-bhalevy@scylladb.com>
2021-11-22 10:48:25 +02:00
Benny Halevy
eed3e95704 effective_replication_map: clear_gently when destroyed
Prevent reactor stalls by gently clearing the replication_map
and token_metadata_ptr when the effective_replication_map is
destroyed.

This is done in the background, protected by the
effective_replication_map_factory::stop() method.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:52:41 +02:00
Benny Halevy
866e1b8479 effective_replication_map_factory: try cloning replication map from shard 0
Calculating a new effective_replication_map on each shard
is expensive.  To try to save that, use the factory key to
look up an e_r_m on shard 0 and if found, use to to clone
its replication map and use that to make the shard-local
e_r_m copy.

In the future, we may want to improve that in 2 ways:
- instead of always going to shard 0, use hash(key) % smp::count
to create the first copy.
- make full copies only on NUMA nodes and keep a shared pointer
on all other shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:52:41 +02:00
Benny Halevy
6754e6ca2b effective_replication_map: erase from factory when destroyed
The effective_replication_map_factory keeps nakes pointers
to outstanding effective_replication_map:s.
These are kept valid using a shared effective_replication_map_ptr.

When the last shared ptr reference is dropped the effective_replication_map
object is destroyed, therefore the raw pointer to it in the factory
must be erased.

This now happens in ~effective_replication_map when the object
is marked as registered.

Registration happens when effective_replication_map_factory inserts
the newly created effective_replication_map to its _replication_maps
map, and the factory calles effective_replication_map::set_factory..

Note that effective_replication_map may be created temporarily
and not be inserted to the factory's map, therefore erase
is called only when required.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:52:20 +02:00
Benny Halevy
8a6fbe800f effective_replication_map_factory: add create_effective_replication_map
Make a factory key using the replication_strategy type
and config options, plus the token_metadata ring version
and use it to search an already-registred effective_replication_map.

If not found, calculate a new create_effective_replication_map
and register it using the above key.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Benny Halevy
ecba37dbfd effective_replication_map: enable_lw_shared_from_this
So a effective_replication_map_ptr can be generated
using a raw pointer by effective_replication_map_factory.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Benny Halevy
f4f41e2908 effective_replication_map: define factory_key
To be used to locate the effective_replication_map
in the to-be-introduced effective_replication_map_factory.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Benny Halevy
3fed73e7c2 locator: add effective_replication_map_factory
It will be used further to create shared copies
of effective_replication_map based on replication_strategy
type and config options.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Avi Kivity
36919a4ed7 locator: replace seastar::sprint() with fmt::format()
sprint() is obsolete.
2021-10-27 17:02:00 +03:00
Benny Halevy
dc091fc952 effective_replication_map, abstract_replication_strategy: get_ranges: call on_internal_error in empty sorted_tokens case
Accessing tm.sorted_tokens().back() causes undefined behavior
if tm.sorted_tokens is empty.

Check that first and throw/abort using on_internal_error
in this case.

This will prevent the segfault but it doesn't fix the root cause
which is getting here with empty token_metadata.  That will be fixed
by the following patch.

Refs #9494

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211019075710.1626808-1-bhalevy@scylladb.com>
2021-10-19 18:52:59 +03:00
Benny Halevy
e4dc81ec04 abstract_replication_strategy: add to_qualified_class_name
And use it from cql3 check_restricted_replication_strategy and
keyspace_metadata ctor that defined their own `replication_class_strategy`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-18 12:13:25 +03:00
Benny Halevy
17296cba4b effective_replication_map: add get_range_addresses
Equivalent to abstract_replication_strategy get_range_addresses,
yet synchronous, as it uses the precalculated map.

Call it from storage_service::get_new_source_ranges
and range_streamer::get_all_ranges_with_sources_for.

Consequently, get_new_source_ranges and removenode_add_ranges
can become synchronous too.

Unfortunately we can't entirely get rid of
abstract_replication_strategy::get_range_addresses
as it's still needed by
range_streamer::get_all_ranges_with_strict_sources_for.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
8c85197c6c abstract_replication_strategy: get rid of shared_token_metadata member and ctor param
It is not used any more.

Methods either use the token_metadata_ptr in the
effective_replication_map, or receive an ad-hoc
token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
91f2fd5f2c abstract_replication_strategy: recognized_options: pass const topology&
Prepare for deleting the _shared_token_metadata member.
All we need for recognized_options is the topology
(for network_topology_strategy).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
4d2561ff75 abstract_replication_strategy: precacluate get_replication_factor for effective_replication_map
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
d953e7b01a token_metadata: get rid of now-unused sync methods
Now that abstract_replication_strategy methods are all async
clone_only_token_map_sync, and update_normal_tokens_sync
are unused.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
bdce6f93ca abstract_replication_strategy: get rid of do_calculate_natural_endpoints
It is no longer in use.

And with it, the virtual calculate_natural_endpoint_sync method
of which it was the only caller.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
cbe58345b9 abstract_replication_strategy: futurize get_*address_ranges
Remaining callers of get_address_ranges and get_pending_address_ranges
are all either from a seastar thread or from a coroutine
so we can make the methods always async and drop the
can_yield param.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
91581ba23a abstract_replication_strategy: futurize get_range_addresses
All remaining use sites are called in a seastar thread
so we drop the can_yield param and make get_range_addresses
always async.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
3040e0a038 abstract_replication_strategy: futurize get_ranges(inet_address ep, token_metadata_ptr)
It is called only from repair, in a thread,
so it can be made always async and the need_preempt param
can be dropped.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
dfdc8d4ddb abstract_replication_strategy: move get_ranges and get_primary_ranges* to effective_replication_map
Provide a sync get_ranges method by effective_replication_map
that uses the precalculated map to get all token ranges owned by or
replicated on a given endpoint.

Reuse do_get_ranges as common infrastructure for all
3 cases: get_ranges, get_primary_ranges, and get_primary_ranges_within_dc.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:09:51 +03:00
Benny Halevy
0e5bb94e84 abstract_replication_strategy: get rid of cached_endpoints
Now that do_get_natural_endpoints is gone, the cached
endpoints are no longer in use.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:15:34 +03:00
Benny Halevy
25227ab5ea all replication strategies: get rid of do_get_natural_endpoints
Now that all falvors of get_natural_endpoints methods
were moved to effective_replication_map,
do_get_natural_endpoints and its overrides are unused.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:13:51 +03:00
Benny Halevy
aab363753f abstract_replication_strategy: move get_natural_endpoints_without_node_being_replaced to effective_replication_map
Use the precalculated endpoints map there
as well as the token_metadata_ptr.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:10:01 +03:00
Benny Halevy
bb0ea0b1c0 shared_token_metadata: set: check version monotonicity
Setting the ring version backwards means it got out of sync.
Possibly concurrent updates weren't serialized properly
using token_metadata_lock / mutate_token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:03:51 +03:00
Benny Halevy
43160abaec token_metadata: use static ring version
For generating unique _ring_version.

Currently when we clone a mutable token_metadata_ptr
it remains with the same _ring_version
and the ring version is updated only when the topology changes.

To be able to distinguish these traqnsient copies
from the ones that got applied, be stricter about
the ring version and change it to a unique number
using a static counter.

Next patch will update the ring version
(and consequently invalidate the cached_endpoints
on the replication strategy) every time the token_metadata
changes, not only when the topology changes.

Note that the _cached_endpoints will go away
once the transition to effective_replication_map
is finished, so this will not degrade performance.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:03:17 +03:00
Benny Halevy
685f5e7704 token_metadata: get rid of copy constructor and assignment operator
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:00:55 +03:00
Benny Halevy
d74ecfbc29 abstract_replication_strategy: get rid of legacy get_natural_endpoints
implementation

Now that all users of it were converted to use the
effective_replication_map, the legacy
abstract_replication_strategy::get_natural_endpoints method
can be deleted.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:58:18 +03:00
Benny Halevy
4b838197e2 storage_service: update keyspaces effective_replication_map on token_metadata change
Every time the token_metadata changes we need to update the
effective_replication_map on all non-system keyspaces.

Do that in replicate_to_all_cores after the updated token_metadata
has been replicated to all cores.

We first prepare and clone the token_metadata, then prepare
and clone the new effective_replication_maps.  Any failure
at this stage is recoverable, handle via rollback and the exception
is returned.

Note that any failure to _apply_ the pending token_metadata or the
effective_replication_map will cause scylla to abort.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:05:28 +03:00
Benny Halevy
3393df45eb token_metadata, storage_service: unify token_metadata_lock and merge_lock.
Serialize the metadata changes with
keyspace create, update, or drop.

This will become necessary in the following patch
when we update the effective_replication_map
on all keyspaces and we want instances on all shards
end up with the same replication map.

Note that storage_service::keyspace_changed is called
from the scheme_merge path so it already holds
the merge_lock.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:01:25 +03:00
Benny Halevy
1e1d7d7df5 abstract_replication_strategy: introduce effective_replication_map
effective_replication_map holds the full replication_map
resulting from applying the effective replication strategy
over the given token_metadata and replication_strategy_config_options.

It is calculated once, in make_effective_replication_map(), and then it
can be used for retrieving the endpoints/token_ranges synchronously
from the precalculated map.

A new virtual get_natural_endpoints(const token&, const effective_replication_map&)
method has been added to abstract_replication_strategy so that
local_strategy and everywhere_replication_strategy can override it as they may be
needed before the token_metadata is established.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:53:03 +03:00
Benny Halevy
d96a67eb57 abstract_replication_strategy: use shared_ptr in registry
Enable creating shared_ptr<BaseClass> in nonstatic_class_registry
using BaseClass::ptr_type and use that for
abstract_replication_strategy.

While at it, also clean up compressor with that respect
to define compressor::ptr_type as shared_ptr<compressor>
thus simplifying compressor_registry.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
a1c573e6d3 abstract_replication_strategy: make calculate_natural_endpoints_sync private
And with that rename calculate_natural_endpoints(const token& search_token, const token_metadata&, can_yield)
to do_calculate_natural_endpoints and make it protected,

With this patch, all its external users call the async version, so
rename it back to calculate_natural_endpoints, and make
calculate_natural_endpoints_sync private since it's being called
only within abstract_replication_strategy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
a1098c0094 replication strategies: calculate_natural_endpoints: split into sync and async variants
calculate_natural_endpoints_sync and _async are both provided
temporarily until all users of them are converted to use
the async version which will remain.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
32c7314b80 network_topology_strategy: refactor calculate_natural_endpoints
Extract natural_endpoints_tracker out of calculate_natural_endpoints
so we easily split the function to sync and async variants.

Test: network_topology_strategy_test(dev, debug)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
416531cce7 network_topology_strategy: use rslogger to debug-log configuration
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
330d9772d4 abstract_replication_strategy: move logger to locator namespace
To be used by network_topology_strategy and later, by
effective_replication_map_registry.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00