Commit Graph

28630 Commits

Author SHA1 Message Date
Benny Halevy
dfdc8d4ddb abstract_replication_strategy: move get_ranges and get_primary_ranges* to effective_replication_map
Provide a sync get_ranges method by effective_replication_map
that uses the precalculated map to get all token ranges owned by or
replicated on a given endpoint.

Reuse do_get_ranges as common infrastructure for all
3 cases: get_ranges, get_primary_ranges, and get_primary_ranges_within_dc.
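The sketch below shows one way a single helper can serve all three cases. It is minimal and hypothetical. It assumes a simplified replication map from each ring token to its ordered replica list, with the first entry treated as the primary replica. `token_range`, `replication_map` and these free functions are illustrative stand-ins, not Scylla's actual types or signatures. A within-DC variant would only need another predicate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified types, for illustration only.
using token = int64_t;
using endpoint = std::string;

// The ring range (start, end].
struct token_range {
    token start;
    token end;
};

// Hypothetical precalculated map: ring token -> ordered replicas of the range
// ending at that token. The first replica is treated as the primary one.
using replication_map = std::map<token, std::vector<endpoint>>;

// Shared helper: collect every range whose replica list satisfies `pred`.
// The range ending at the first token wraps around from the last token.
template <typename Pred>
std::vector<token_range> do_get_ranges(const replication_map& map, Pred pred) {
    std::vector<token_range> ret;
    if (map.empty()) {
        return ret;
    }
    token prev = map.rbegin()->first;
    for (const auto& [tok, replicas] : map) {
        if (pred(replicas)) {
            ret.push_back({prev, tok});
        }
        prev = tok;
    }
    return ret;
}

// All ranges owned by or replicated on `ep`.
std::vector<token_range> get_ranges(const replication_map& map, const endpoint& ep) {
    return do_get_ranges(map, [&](const std::vector<endpoint>& replicas) {
        return std::find(replicas.begin(), replicas.end(), ep) != replicas.end();
    });
}

// Only the ranges for which `ep` is the primary replica.
std::vector<token_range> get_primary_ranges(const replication_map& map, const endpoint& ep) {
    return do_get_ranges(map, [&](const std::vector<endpoint>& replicas) {
        return !replicas.empty() && replicas.front() == ep;
    });
}
```

Because the map is computed once, every lookup here is a synchronous walk over it, and no replication strategy is re-evaluated.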

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:09:51 +03:00
Benny Halevy
5483269dfb compaction_manager: pass owned_ranges via cleanup/upgrade options
So they can be easily computed using an async task
before constructing the compaction object
in a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:17:46 +03:00
Benny Halevy
0e5bb94e84 abstract_replication_strategy: get rid of cached_endpoints
Now that do_get_natural_endpoints is gone, the cached
endpoints are no longer in use.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:15:34 +03:00
Benny Halevy
25227ab5ea all replication strategies: get rid of do_get_natural_endpoints
Now that all flavors of get_natural_endpoints methods
were moved to effective_replication_map,
do_get_natural_endpoints and its overrides are unused.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:13:51 +03:00
Benny Halevy
facd5035f1 storage_proxy: use effective_replication_map token_metadata_ptr along with endpoints
Wherever get_natural_endpoints_without_node_being_replaced is used, use the
same token_metadata that the endpoints were computed from.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:11:43 +03:00
Benny Halevy
aab363753f abstract_replication_strategy: move get_natural_endpoints_without_node_being_replaced to effective_replication_map
Use the precalculated endpoints map there
as well as the token_metadata_ptr.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:10:01 +03:00
Benny Halevy
548719aac1 storage_service: bootstrap: add log messages
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:07:59 +03:00
Benny Halevy
08fef2a702 storage_service: get_mutable_token_metadata_ptr: always invalidate_cached_rings
We should invalidate the cached rings every time the
token metadata changes, not only on topology changes,
so that cached token/replication mappings are invalidated
when the modified token_metadata is committed.

Currently we can do without it (apparently)
but this will become a requirement for keeping
versions of the effective_replication_map
in a registry, indexed by the token_metadata ring version,
among other things.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:05:57 +03:00
Benny Halevy
bb0ea0b1c0 shared_token_metadata: set: check version monotonicity
Setting the ring version backwards means it got out of sync.
Possibly concurrent updates weren't serialized properly
using token_metadata_lock / mutate_token_metadata.
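Here is a minimal sketch of such a check. It assumes a simplified `token_metadata` that carries only a ring version. Both class shapes are hypothetical and synchronous, unlike Scylla's sharded, future-based code. Equal versions are accepted and only a backwards move is rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for token_metadata: only the ring version matters here.
struct token_metadata {
    int64_t ring_version = 0;
};
using token_metadata_ptr = std::shared_ptr<const token_metadata>;

// Hypothetical shared holder whose set() refuses a ring version that moves
// backwards, since that indicates unserialized concurrent updates.
class shared_token_metadata {
    token_metadata_ptr _shared = std::make_shared<token_metadata>();
public:
    void set(token_metadata_ptr tmptr) {
        if (tmptr->ring_version < _shared->ring_version) {
            throw std::runtime_error("ring version went backwards: "
                + std::to_string(_shared->ring_version) + " -> "
                + std::to_string(tmptr->ring_version));
        }
        _shared = std::move(tmptr);
    }
    token_metadata_ptr get() const { return _shared; }
};
```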

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:03:51 +03:00
Benny Halevy
43160abaec token_metadata: use static ring version
For generating unique _ring_version.

Currently when we clone a mutable token_metadata_ptr
it remains with the same _ring_version
and the ring version is updated only when the topology changes.

To be able to distinguish these transient copies
from the ones that got applied, be stricter about
the ring version and change it to a unique number
using a static counter.

Next patch will update the ring version
(and consequently invalidate the cached_endpoints
on the replication strategy) every time the token_metadata
changes, not only when the topology changes.

Note that the _cached_endpoints will go away
once the transition to effective_replication_map
is finished, so this will not degrade performance.
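This is a hypothetical sketch of the static-counter idea. Every instance, including a clone, draws a fresh ring version from one class-wide counter, so a transient clone never shares a version with the instance that was applied. The sketch uses a process-wide `std::atomic`, and `_tokens`, `add_token()` and `tokens()` are illustrative. The real counter's scope and the rest of the class may differ. Copying is deleted to match the neighboring commit, so clone() is the only way to duplicate.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

class token_metadata {
    // Hypothetical class-wide source of unique ring versions.
    static inline std::atomic<int64_t> _static_ring_version{0};
    std::vector<int64_t> _tokens;                 // illustrative payload
    int64_t _ring_version = next_version();       // fresh on every construction
public:
    token_metadata() = default;
    token_metadata(const token_metadata&) = delete;
    token_metadata& operator=(const token_metadata&) = delete;
    token_metadata(token_metadata&&) = default;   // a move keeps the version
    token_metadata& operator=(token_metadata&&) = default;

    // A clone copies the data but gets a new, unique ring version.
    token_metadata clone() const {
        token_metadata ret;
        ret._tokens = _tokens;
        return ret;
    }
    void add_token(int64_t t) {
        _tokens.push_back(t);
        invalidate_cached_rings();
    }
    void invalidate_cached_rings() { _ring_version = next_version(); }
    int64_t get_ring_version() const { return _ring_version; }
    const std::vector<int64_t>& tokens() const { return _tokens; }
private:
    static int64_t next_version() { return ++_static_ring_version; }
};
```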

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:03:17 +03:00
Benny Halevy
685f5e7704 token_metadata: get rid of copy constructor and assignment operator
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:00:55 +03:00
Benny Halevy
d74ecfbc29 abstract_replication_strategy: get rid of legacy get_natural_endpoints implementation

Now that all users of it were converted to use the
effective_replication_map, the legacy
abstract_replication_strategy::get_natural_endpoints method
can be deleted.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:58:18 +03:00
Benny Halevy
4afe8cad3c repair: use effective_replication_map to get_natural_endpoints
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:57:16 +03:00
Benny Halevy
cddd16f22d db: view: use effective_replication_map to get_natural_endpoints
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:55:50 +03:00
Benny Halevy
96aa6161d8 db: hints manager: use effective_replication_map to get_natural_endpoints
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:54:52 +03:00
Benny Halevy
c10a439f6c storage_service: optimize get_effective_replication_map multi-usage
Currently, to get_natural_endpoints for multiple tokens, we call
find_keyspace and then get_effective_replication_map on the _same_
keyspace for every token.

Get the effective_replication_map once in these cases
and use it for each token.
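The sketch below shows the hoisting pattern under simplified, hypothetical types (`keyspace`, `effective_replication_map`, and a map of keyspaces by name). It performs one lookup outside the loop and reuses the result for every token. Holding the returned `shared_ptr` also keeps all tokens resolved against one consistent map. That side effect is an observation about the sketch, not a claim about the patch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical simplified types, for illustration only.
using token = long;
using endpoint = std::string;

struct effective_replication_map {
    std::map<token, std::vector<endpoint>> replicas;  // precalculated
    // The owner of `t` is the first ring token >= t, wrapping to the first one.
    std::vector<endpoint> get_natural_endpoints(token t) const {
        if (replicas.empty()) {
            return {};
        }
        auto it = replicas.lower_bound(t);
        if (it == replicas.end()) {
            it = replicas.begin();
        }
        return it->second;
    }
};

struct keyspace {
    std::shared_ptr<const effective_replication_map> erm;
    std::shared_ptr<const effective_replication_map> get_effective_replication_map() const {
        return erm;
    }
};

std::vector<std::vector<endpoint>> get_endpoints_for_tokens(
        const std::map<std::string, keyspace>& keyspaces,
        const std::string& ks_name,
        const std::vector<token>& tokens) {
    // Look up the keyspace and its replication map once, outside the loop.
    auto erm = keyspaces.at(ks_name).get_effective_replication_map();
    std::vector<std::vector<endpoint>> ret;
    ret.reserve(tokens.size());
    for (token t : tokens) {
        ret.push_back(erm->get_natural_endpoints(t));
    }
    return ret;
}
```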

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:53:18 +03:00
Benny Halevy
fdaa891332 storage_service, sstables_loader: use effective_replication_map to get_natural_endpoints
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:50:27 +03:00
Benny Halevy
4b838197e2 storage_service: update keyspaces effective_replication_map on token_metadata change
Every time the token_metadata changes we need to update the
effective_replication_map on all non-system keyspaces.

Do that in replicate_to_all_cores after the updated token_metadata
has been replicated to all cores.

We first prepare and clone the token_metadata, then prepare
and clone the new effective_replication_maps. Any failure
at this stage is recoverable: it is handled via rollback and the
exception is returned.

Note that any failure to _apply_ the pending token_metadata or the
effective_replication_map will cause scylla to abort.
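The prepare-then-apply structure can be sketched in a single thread as follows. In this hypothetical sketch, a vector of `shard_state` stands in for Seastar shards, and plain `int`s stand in for token_metadata and effective_replication_map. A throw while preparing leaves every shard untouched. The apply step runs in a `noexcept` lambda, so any exception there calls `std::terminate`, which echoes the "abort on apply failure" policy described above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical per-shard state; ints stand in for token_metadata and
// effective_replication_map so the two-phase structure stays visible.
struct shard_state {
    std::shared_ptr<const int> token_metadata;
    std::shared_ptr<const int> replication_map;
};

void replicate_to_all_shards(std::vector<shard_state>& shards, int new_tm,
                             const std::function<int(int)>& make_erm) {
    // Phase 1: prepare and clone everything on the side. If anything throws,
    // `pending` is discarded, the shards are left untouched (rollback), and
    // the exception propagates to the caller.
    std::vector<shard_state> pending;
    pending.reserve(shards.size());
    for (std::size_t i = 0; i < shards.size(); ++i) {
        auto tm = std::make_shared<int>(new_tm);
        auto erm = std::make_shared<int>(make_erm(*tm));
        pending.push_back({std::move(tm), std::move(erm)});
    }
    // Phase 2: apply. The lambda is noexcept, so an exception here would call
    // std::terminate instead of leaving shards with different maps.
    [&]() noexcept {
        for (std::size_t i = 0; i < shards.size(); ++i) {
            shards[i] = std::move(pending[i]);
        }
    }();
}
```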

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:05:28 +03:00
Benny Halevy
3393df45eb token_metadata, storage_service: unify token_metadata_lock and merge_lock.
Serialize the metadata changes with
keyspace create, update, or drop.

This will become necessary in the following patch
when we update the effective_replication_map
on all keyspaces and we want instances on all shards
to end up with the same replication map.

Note that storage_service::keyspace_changed is called
from the schema merge path so it already holds
the merge_lock.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:01:25 +03:00
Benny Halevy
4cba7195ee storage_service: coroutinize mutate_token_metadata
And fold with_token_metadata_lock into it, as it's
its only caller.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:59:58 +03:00
Benny Halevy
045806cae7 storage_service: replicate_to_all_cores: use local pending_token_metadata_ptr
Rather than a _pending_token_metadata_ptr member in the storage_service
class. This is much easier now that the function has been converted to a
coroutine.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:58:30 +03:00
Benny Halevy
52f48f47f6 storage_service: coroutinize replicate_to_all_cores
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:57:05 +03:00
Benny Halevy
991a6a8664 keyspace: update_effective_replication_map
And use it to get_natural_endpoints.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:55:34 +03:00
Benny Halevy
970b0a50b5 keyspace: futurize create_replication_strategy
And functions that use it, like:
keyspace::update_from
database::update_keyspace
database::create_in_memory_keyspace

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:53:41 +03:00
Benny Halevy
eb752c3f69 test: network_topology_strategy_test: use effective_replication_map to get_natural_endpoints
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:53:09 +03:00
Benny Halevy
1e1d7d7df5 abstract_replication_strategy: introduce effective_replication_map
effective_replication_map holds the full replication_map
resulting from applying the effective replication strategy
over the given token_metadata and replication_strategy_config_options.

It is calculated once, in make_effective_replication_map(), and then it
can be used for retrieving the endpoints/token_ranges synchronously
from the precalculated map.

A new virtual get_natural_endpoints(const token&, const effective_replication_map&)
method has been added to abstract_replication_strategy so that
local_strategy and everywhere_replication_strategy can override it as they may be
needed before the token_metadata is established.
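This is a hypothetical sketch of the shape described above. The map is computed once and held by `effective_replication_map`. The strategy's virtual `get_natural_endpoints(token, const effective_replication_map&)` looks it up by default, and a local strategy overrides it so that it never needs the ring. All types are simplified stand-ins, and the base class is left instantiable only to keep the example short.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified types, for illustration only.
using token = long;
using endpoint = std::string;
using endpoint_set = std::vector<endpoint>;

class abstract_replication_strategy;

// Holds the replication map computed once from strategy + token metadata.
struct effective_replication_map {
    std::shared_ptr<const abstract_replication_strategy> rs;
    std::map<token, endpoint_set> replication_map;  // precalculated
    endpoint_set get_natural_endpoints(token t) const;
};

class abstract_replication_strategy {
public:
    virtual ~abstract_replication_strategy() = default;
    // Default: synchronous lookup in the precalculated map.
    virtual endpoint_set get_natural_endpoints(token t, const effective_replication_map& erm) const {
        if (erm.replication_map.empty()) {
            return {};
        }
        auto it = erm.replication_map.lower_bound(t);
        if (it == erm.replication_map.end()) {
            it = erm.replication_map.begin();  // wrap around the ring
        }
        return it->second;
    }
};

// Ignores the ring entirely, so it works before token metadata exists.
class local_strategy : public abstract_replication_strategy {
    endpoint _self;
public:
    explicit local_strategy(endpoint self) : _self(std::move(self)) {}
    endpoint_set get_natural_endpoints(token, const effective_replication_map&) const override {
        return {_self};
    }
};

inline endpoint_set effective_replication_map::get_natural_endpoints(token t) const {
    return rs->get_natural_endpoints(t, *this);
}
```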

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:53:03 +03:00
Benny Halevy
d96a67eb57 abstract_replication_strategy: use shared_ptr in registry
Enable creating shared_ptr<BaseClass> in nonstatic_class_registry
using BaseClass::ptr_type and use that for
abstract_replication_strategy.

While at it, also clean up compressor in that respect
by defining compressor::ptr_type as shared_ptr<compressor>,
thus simplifying compressor_registry.
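The idea of letting each base class choose its own pointer type can be sketched as a small registry template. This is a hypothetical simplification, not Scylla's `nonstatic_class_registry`. `class_registry`, `register_class` and `create` are illustrative names, and the `compressor` here is a toy hierarchy.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical registry: the pointer type it hands out comes from
// BaseClass::ptr_type, so the same template can serve shared_ptr-based
// and unique_ptr-based hierarchies.
template <typename BaseClass, typename... Args>
class class_registry {
public:
    using ptr_type = typename BaseClass::ptr_type;
    using creator_type = std::function<ptr_type(Args...)>;

    void register_class(const std::string& name, creator_type creator) {
        _classes[name] = std::move(creator);
    }
    ptr_type create(const std::string& name, Args... args) const {
        auto it = _classes.find(name);
        if (it == _classes.end()) {
            throw std::out_of_range("unknown class: " + name);
        }
        return it->second(std::forward<Args>(args)...);
    }
private:
    std::unordered_map<std::string, creator_type> _classes;
};

// Toy hierarchy that opts into shared ownership.
struct compressor {
    using ptr_type = std::shared_ptr<compressor>;
    virtual ~compressor() = default;
    virtual std::string name() const = 0;
};

struct lz4_compressor : compressor {
    std::string name() const override { return "lz4"; }
};
```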

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
4511c9acdb database.hh: convert ifdef block to pragma once
Besides #pragma once being more modern and more efficient for
the compiler, the #ifndef block confuses my editor, which
greys out the whole block.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
a1c573e6d3 abstract_replication_strategy: make calculate_natural_endpoints_sync private
And with that, rename calculate_natural_endpoints(const token& search_token, const token_metadata&, can_yield)
to do_calculate_natural_endpoints and make it protected.

With this patch, all external users of calculate_natural_endpoints call the
async version, so rename calculate_natural_endpoints_async back to
calculate_natural_endpoints, and make calculate_natural_endpoints_sync
private since it is only called within abstract_replication_strategy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
a1098c0094 replication strategies: calculate_natural_endpoints: split into sync and async variants
calculate_natural_endpoints_sync and _async are both provided
temporarily until all users of them are converted to use
the async version which will remain.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
32c7314b80 network_topology_strategy: refactor calculate_natural_endpoints
Extract natural_endpoints_tracker out of calculate_natural_endpoints
so we can easily split the function into sync and async variants.

Test: network_topology_strategy_test(dev, debug)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
416531cce7 network_topology_strategy: use rslogger to debug-log configuration
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
330d9772d4 abstract_replication_strategy: move logger to locator namespace
To be used by network_topology_strategy and later, by
effective_replication_map_registry.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
7401d03e8c abstract_replication_strategy: define replication_map
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Benny Halevy
5001d261d4 abstract_replication_strategy: define replication_strategy_config_options
To be used for searching effective replication strategy instances.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Pavel Solodovnikov
8b917f7c99 db: mark --experimental option deprecated
The documentation for --experimental config option states
that it enables all experimental features, but this is no
longer true, i.e.: raft feature is not enabled with it and
should be explicitly enabled via `--experimental-features=raft`
switch (we don't want to enable it by default alongside
other features).

Since the flag doesn't do what it's intended to, we should
mark it as "deprecated", because documenting each exception
(there could be more than only raft in the future) will be
a burden and docs will constantly go out-of-sync with the
code.

Adjust the description for the option to reflect that, mark
it "deprecated" and suggest using --experimental-features, instead.

Fixes: #9467

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20211012093005.20871-2-pa.solodovnikov@scylladb.com>
2021-10-12 13:22:12 +03:00
Pavel Solodovnikov
162f1899e8 db: update the list of supported experimental features
`raft` and `alternator-streams` features were missing
from the description for `experimental-features` config
flag.

Update `scylla.yaml` template comments to reflect that, too.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20211012093005.20871-1-pa.solodovnikov@scylladb.com>
2021-10-12 13:22:11 +03:00
Avi Kivity
0d48c39cb3 Merge 'tools/scylla-sstable: allow opening sstables from any path' from Botond Dénes
Currently it is required that sstables (in particular la/mx ones) are located at a valid path. This is because `sstables::entry_descriptor::make_descriptor()` extracts the keyspace and table names from the sstable dir components. This PR relaxes this by using a newly introduced `sstables::entry_descriptor::make_descriptor()` overload which allows the caller to specify keyspace and table names, so they need not be extracted from the path.

Tests: unit(dev), manual(testing that `scylla-sstable` can indeed load sstables from invalid path)

Closes #9466

* github.com:scylladb/scylla:
  tools/scylla-sstable: allow loading sstables from any path
  sstables: entry_descriptor::make_descriptor(): add overload with provided ks/cf
2021-10-12 12:50:11 +03:00
Takuya ASADA
06c28585f9 dist: raise fs.file-max and fs.nr_open to enough size for scylla
Currently, we configure LimitNOFILE on scylla-server.service, but we
don't configure fs.nr_open and fs.file-max.
When fs.nr_open or fs.file-max are smaller than LimitNOFILE, we may fail
to allocate FDs.
To fix this issue, raise fs.file-max and fs.nr_open to a large enough
size for scylla.

Fixes #9461

Closes #9461
2021-10-12 12:47:35 +03:00
Botond Dénes
cc65c9d0da compaction: scrub/segregate: adjust partition-estimate as buckets accumulate
Scrub compaction in segregate mode can split the input sstable into as
many as hundreds or even thousands of output sstables in the extreme
case. But even at a few dozen output sstables, most of these will only
have a few partitions with a few rows. These sstables however will still
have their bloom filter allocated according to the original
partition-count estimate, causing memory bloat or even OOM in the
extreme case.
This patch solves this by aggressively adjusting the partition count
downwards after the second bucket has been created. Each subsequent
bucket will halve the partition estimate, which will quickly reach 1.

Fixes: #9463

Closes #9464
2021-10-12 12:44:42 +03:00
Botond Dénes
d535346a6e tools/scylla-sstable: allow loading sstables from any path
Currently it is required that sstables (in particular la/mx ones) are
located at a valid path. This is required because
`sstables::entry_descriptor::make_descriptor()` extracts the keyspace
and table names from the sstable dir components.
This patch relaxes this by using the freshly introduced
`sstables::entry_descriptor::make_descriptor()` overload which allows
the caller to specify keyspace and table names.
2021-10-12 11:47:58 +03:00
Botond Dénes
1b7b3a81e6 sstables: entry_descriptor::make_descriptor(): add overload with provided ks/cf
This way, the keyspace and table names need not be extracted from the
sstable dir path, which allows la/mx sstables at non-standard paths to be
opened. This will be used by the `scylla-sstable` tool which wants to be
flexible about where the sstables it opens are located.
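The two variants can be contrasted in the hypothetical sketch below, which uses `std::filesystem`. The path-derived version assumes a `<...>/<keyspace>/<table>-<uuid>/<filename>` layout and fails when the path does not follow it. The overload takes the names from the caller, so the directory no longer matters. `entry_descriptor_sketch` and both functions are illustrative, not the real `sstables::entry_descriptor` API.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, reduced descriptor: only the fields relevant here.
struct entry_descriptor_sketch {
    std::string ks;
    std::string cf;
    std::string filename;
};

// Path-derived variant; assumes <...>/<keyspace>/<table>-<uuid>/<filename>.
entry_descriptor_sketch make_descriptor(const std::filesystem::path& p) {
    auto table_dir = p.parent_path();
    auto ks_dir = table_dir.parent_path();
    std::string table_dir_name = table_dir.filename().string();
    auto dash = table_dir_name.find('-');
    if (ks_dir.filename().empty() || dash == std::string::npos) {
        throw std::invalid_argument("cannot infer keyspace/table from path: " + p.string());
    }
    return {ks_dir.filename().string(), table_dir_name.substr(0, dash), p.filename().string()};
}

// Overload with caller-provided names: works for sstables at any path.
entry_descriptor_sketch make_descriptor(const std::filesystem::path& p,
                                        std::string ks, std::string cf) {
    return {std::move(ks), std::move(cf), p.filename().string()};
}
```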
2021-10-12 11:43:23 +03:00
Nadav Har'El
e4bc97349c cql-pytest: XFAILing test was fixed by a Python driver fix
Issue #8203 describes a bug in a long scan which returns a lot of empty
pages (e.g., because most of the results are filtered out). We have two
cql-pytest test cases that reproduced this bug - one for a whole-table
scan and one for a single-partition scan.

It turned out that the bug was not in the Scylla server, but actually in
the Python driver which incorrectly stopped the iteration after an empty
page even though this page did contain the "more pages" flag.
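The difference between the buggy and the correct client loop can be sketched language-neutrally. It is shown in C++ here, although the affected driver is Python. `page` and `fetch_all` are hypothetical. An empty page is not the end of the result; only the absence of a paging state is.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical page: possibly empty rows plus an optional paging state
// meaning "more pages follow".
struct page {
    std::vector<int> rows;
    std::optional<std::string> paging_state;
};

// `fetch` receives the previous paging state (if any) and returns the next page.
template <typename Fetch>
std::vector<int> fetch_all(Fetch fetch) {
    std::vector<int> all;
    std::optional<std::string> state;
    do {
        page p = fetch(state);
        all.insert(all.end(), p.rows.begin(), p.rows.end());
        // Stopping because p.rows is empty would be the bug; the correct
        // stop condition is the absence of a paging state.
        state = std::move(p.paging_state);
    } while (state);
    return all;
}
```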

This driver bug was already fixed in the Datastax driver (see
6ed53d9f70) and in the Scylla fork of the driver (see
1d9077d3f4).

So in this patch we drop the XFAIL, and if the driver is not new enough
to contain this fix - the test is skipped.

Since our Jenkins machines have the latest Scylla fork of the driver and
it already contains this fix, these tests will not be skipped - and will
run and should pass. Developers who run these tests on their development
machine will see these tests either passing or skipped - depending on
which version of the driver they have installed.

Closes #8203

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211011113848.698935-1-nyh@scylladb.com>
2021-10-12 10:04:02 +02:00
Nadav Har'El
33f8ec09df Merge 'treewide: improve compatibility with gcc 11' from Avi Kivity
Our source base drifted away from gcc compatibility; this mostly
restores the ability to build with gcc. An important exception is
coroutines that have an initializer list [1]; this still doesn't work.

We aim to switch back to gcc 11 if/when this gives us better
C++ compatibility and performance.

Test: unit (dev)

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056

Closes #9459

* github.com:scylladb/scylla:
  test: radix_tree_printer: avoid template specialization in class context
  test: raft: avoid ignored variable errors
  test: reader_concurrency_semaphore_test: isolate from namespace of source_location
  test: cql_query_test: drop unused lambda assert_replication_not_contains
  test: commitlog_test: don't use deprecated seastar::unaligned_cast
  test: adjust signed/unsigned comparisons in loops and boost tests
  build: silence some gcc 11 warnings
  sstables: processing_result_generator: make coroutine support palatable for C++20 compilers
  managed_bytes: avoid compile-time loop in converting constructor
  service: service_level_controller: drop unused variable sl_compare
  raft: disambiguate promise name in raft::active_read
  locator: azure_snitch: use full type name in definition of globals
  cql3: statements: create_service_level_statement: don't ignore replace_defaults()
  cql3: statement_restrictions: adjust call to std::vector deduction guide
  types: remove recursive constraint in deserialize_value
  cql3: restrictions: relax constraint on visitor_with_binary_operator_content
  treewide: handle switch statements that return
  cql3: expr: correct type of captured map value_type
  cdc: adjust type of streams_count
  alternator: disambiguate attrs_to_get in table_requests
2021-10-11 16:54:01 +03:00
Nadav Har'El
5e4c60e19a Merge: Unload storage service from irrelevant APIs
Merged patch series from Pavel Emelyanov:

There's a long-term (well, likely mid-term already) goal to keep a
single role for the storage_service, namely -- managing the state of
a node in the ring, and once that happens, to rename it to stop people
from loading new stuff into storage_service. There are at least three
REST API endpoints that stand in the way.

1. load_new_ss_tables. This part is moved to a new sharded sstables
   loader that wraps existing distributed_loader

2. view_build_statuses. Statuses are maintained by view_builder, so they must
   be retrieved from the same place

3. enable_|disable_auto_compaction. This is purely a database knob, as it
   used to be some time ago

This change also removes view_update_generator from storage_service's list
of dependencies and leaves system_distributed_keyspace as a start-only
dependency (another not yet published branch makes use of it and
removes s.d.ks from storage service altogether).

branch: https://github.com/xemul/scylla/tree/br-unload-storage-service-api-3
tests: unit(dev)
refs: #5489

* 'br-unload-storage-service-api-3' of github.com:xemul/scylla:
  storage_service, api: Move set-tables-autocompaction back into API
  api: Fix indentation after previous patch
  api, database, storage_service: Unify auto-compaction toggle
  api: Remove storage service from new APIs
  view_builder: Accept view_build_statuses
  storage_service: Move view_build_statuses code
  api, storage_service: Keep view builder API handlers separate
  storage_service: Remove view update generator from
  sstables_loader: Accept the sstables loading code
  storage_service: Move the sstables loading code
  storage_service, api: Keep sstables loading API handlers separate
  sstables_loader: Introduce
  distributed_loader, utils: Move verify_owner_and_mode
  distributed_loader: Fix methods visibility
2021-10-11 15:22:06 +03:00
Kamil Braun
339b9bc38a sstables: mx: partition_reversing_data_source: close internal data consumers
`partition_reversing_data_source` uses `continuous_data_consumer`s
internally (`partition_header_context`, `row_body_skipping_context`)
which hold `input_stream`s opened to sstable data files. These
`input_stream`s must be closed before destruction. Right now they would
sometimes cause "Assertion `_reads_in_progress == 0' failed" on
destruction.

Close the `continuous_data_consumer`s before they are destroyed so they
can close their `input_stream`s.
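The ownership rule can be sketched synchronously. Seastar's `close()` returns a future, which this sketch omits. All three classes are hypothetical stand-ins. The stream asserts on destruction if it was never closed, mirroring the `_reads_in_progress == 0` assertion. The data source closes its internal consumers, and each consumer closes its stream, before anything is destroyed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stream that must be closed before destruction.
class input_stream {
    bool _closed = false;
public:
    ~input_stream() { assert(_closed && "input_stream destroyed without close()"); }
    void close() { _closed = true; }
    bool is_closed() const { return _closed; }
};

// A consumer owns a stream and forwards close() to it.
class data_consumer {
    input_stream _in;
public:
    void close() { _in.close(); }
    bool is_closed() const { return _in.is_closed(); }
};

// The owner closes its internal consumers before they are destroyed.
class reversing_data_source {
    std::unique_ptr<data_consumer> _header_ctx = std::make_unique<data_consumer>();
    std::unique_ptr<data_consumer> _skipping_ctx = std::make_unique<data_consumer>();
public:
    void close() {
        _header_ctx->close();
        _skipping_ctx->close();
    }
    bool all_closed() const {
        return _header_ctx->is_closed() && _skipping_ctx->is_closed();
    }
};
```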

Fixes #9444.

Closes #9451
2021-10-11 12:35:54 +02:00
Pavel Emelyanov
f0b5ab1c61 storage_service, api: Move set-tables-autocompaction back into API
The global autocompaction toggle is no longer tied to the storage
service. It naturally belongs to the database, but is small and
tidy enough that it can be placed into the api/ dir itself rather than
polluting database methods.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-10-11 11:13:59 +03:00
Pavel Emelyanov
fece1a2f9f api: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-10-11 11:13:56 +03:00
Pavel Emelyanov
c5128eea67 api, database, storage_service: Unify auto-compaction toggle
There are two knobs here -- global and per-table one. Both were added
without any synchronisation, but the former one was later fixed to
become serialized and not to be available "too early".

This patch unifies both toggles to be serialized with each other and
not to be enabled too early.

The justification for this change is to move the global toggle out
of the storage service, as it really belongs to the database, not the
storage service. Respectively, the current synchronization, that depends
on storage service internals, should be replaced with something else.
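Serializing both toggles and refusing them "too early" can be sketched as follows. The sketch is hypothetical. It uses `std::mutex` where Seastar code would more likely use a semaphore, and `autocompaction_control` with all of its methods is illustrative. One lock covers both the global and the per-table toggle, and both are rejected until the owner marks itself ready.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical control object: one lock serializes both toggles, and both
// throw until mark_ready() has been called.
class autocompaction_control {
    std::mutex _mutex;
    bool _ready = false;
    std::map<std::string, bool> _enabled;  // per-table state
public:
    void add_table(const std::string& table) {
        std::lock_guard guard(_mutex);
        _enabled[table] = false;
    }
    void mark_ready() {
        std::lock_guard guard(_mutex);
        _ready = true;
    }
    // Per-table toggle.
    void set_table(const std::string& table, bool on) {
        std::lock_guard guard(_mutex);
        check_ready();
        _enabled.at(table) = on;
    }
    // Global toggle, serialized with the per-table one via the same lock.
    void set_all(bool on) {
        std::lock_guard guard(_mutex);
        check_ready();
        for (auto& entry : _enabled) {
            entry.second = on;
        }
    }
    bool is_enabled(const std::string& table) {
        std::lock_guard guard(_mutex);
        return _enabled.at(table);
    }
private:
    void check_ready() const {
        if (!_ready) {
            throw std::runtime_error("auto-compaction toggle not available yet");
        }
    }
};
```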

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-10-11 11:12:39 +03:00
Pavel Emelyanov
c53c74258a api: Remove storage service from new APIs
The APIs that had been recently switched to using relevant services no
longer need the storage service reference capture, so remove it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-10-11 11:11:52 +03:00