Provide a synchronous get_ranges method in effective_replication_map
that uses the precalculated map to get all token ranges owned by or
replicated on a given endpoint.
Reuse do_get_ranges as common infrastructure for all
3 cases: get_ranges, get_primary_ranges, and get_primary_ranges_within_dc.
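A minimal sketch of the shared-helper idea, assuming heavily simplified hypothetical types: an int64 token, a string endpoint, and a std::map from each ring token to the replica list (primary first) of the range ending at that token. Only the names get_ranges, get_primary_ranges, get_primary_ranges_within_dc and do_get_ranges come from the text above. The signatures, the filter callback and the dc_of callback are illustrative, not Scylla's real API. Each public variant only supplies a different predicate, and the synchronous ring walk is shared.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified types, not Scylla's real ones.
using token = std::int64_t;
using endpoint = std::string;
using token_range = std::pair<token, token>;  // half-open ring range (start, end]
// Precalculated map: ring token -> replicas of the range ending at it (primary first).
using replication_map = std::map<token, std::vector<endpoint>>;
using replica_filter = std::function<bool(const std::vector<endpoint>&)>;

// Shared helper: one synchronous pass over the ring, keeping every range whose
// replica list satisfies the filter. The first range wraps around from the last token.
std::vector<token_range> do_get_ranges(const replication_map& rm, const replica_filter& should_add) {
    std::vector<token_range> ret;
    if (rm.empty()) {
        return ret;
    }
    token prev = rm.rbegin()->first;
    for (const auto& [tok, replicas] : rm) {
        if (should_add(replicas)) {
            ret.emplace_back(prev, tok);
        }
        prev = tok;
    }
    return ret;
}

// Ranges owned by or replicated on ep.
std::vector<token_range> get_ranges(const replication_map& rm, const endpoint& ep) {
    return do_get_ranges(rm, [&] (const std::vector<endpoint>& replicas) {
        return std::find(replicas.begin(), replicas.end(), ep) != replicas.end();
    });
}

// Ranges for which ep is the first (primary) replica.
std::vector<token_range> get_primary_ranges(const replication_map& rm, const endpoint& ep) {
    return do_get_ranges(rm, [&] (const std::vector<endpoint>& replicas) {
        return !replicas.empty() && replicas.front() == ep;
    });
}

// Ranges for which ep is the first replica located in ep's own datacenter.
std::vector<token_range> get_primary_ranges_within_dc(const replication_map& rm, const endpoint& ep,
        const std::function<std::string(const endpoint&)>& dc_of) {
    const std::string local_dc = dc_of(ep);
    return do_get_ranges(rm, [&] (const std::vector<endpoint>& replicas) {
        for (const auto& r : replicas) {
            if (dc_of(r) == local_dc) {
                return r == ep;
            }
        }
        return false;
    });
}
```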
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So they can be easily computed using an async task
before constructing the compaction object
in a following patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that all flavors of the get_natural_endpoints methods
have been moved to effective_replication_map,
do_get_natural_endpoints and its overrides are unused.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We should invalidate the cached rings every time the
token metadata changes, not only on topology changes,
so that cached token/replication mappings are invalidated
when the modified token_metadata is committed.
Currently we can apparently do without it,
but it will become a requirement for keeping
versions of the effective_replication_map
in a registry indexed by the token_metadata ring version,
among other things.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Setting the ring version backwards means it got out of sync.
This may mean that concurrent updates weren't properly serialized
using token_metadata_lock / mutate_token_metadata.
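A sketch of such a check, assuming a hypothetical ring_version_holder class in place of token_metadata. The text does not say whether the real code throws or aborts on a backwards version; this sketch throws and leaves the stored version unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for token_metadata's ring-version bookkeeping.
class ring_version_holder {
    std::int64_t _ring_version = 0;
public:
    std::int64_t get_ring_version() const { return _ring_version; }

    // Moving the version backwards means two updates raced without being
    // serialized, so refuse it loudly instead of silently going out of sync.
    void set_ring_version(std::int64_t v) {
        if (v < _ring_version) {
            throw std::runtime_error("ring version moved backwards: " +
                    std::to_string(_ring_version) + " -> " + std::to_string(v));
        }
        _ring_version = v;
    }
};
```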
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For generating unique _ring_version.
Currently, when we clone a mutable token_metadata_ptr,
the clone keeps the same _ring_version,
and the ring version is updated only when the topology changes.
To be able to distinguish these transient copies
from the ones that got applied, be stricter about
the ring version and change it to a unique number
using a static counter.
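A sketch of the static-counter scheme under stated assumptions: the class and method names are hypothetical, and std::atomic is used only so the sketch is safe with plain threads (Scylla's per-shard model may not need it). A clone keeps its source's version until it is modified, at which point it draws a fresh number that no other copy can share.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical token_metadata-like object with a process-wide unique ring version.
class versioned_metadata {
    // One counter shared by all instances, so no two bumps hand out the same number.
    static inline std::atomic<std::int64_t> _static_ring_version{0};
    std::int64_t _ring_version = 0;
public:
    std::int64_t ring_version() const { return _ring_version; }

    // Called whenever this copy of the ring is mutated.
    void invalidate_cached_rings() {
        _ring_version = ++_static_ring_version;
    }

    // A clone starts with the source's version; it diverges once modified.
    versioned_metadata clone() const { return *this; }
};
```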
The next patch will update the ring version
(and consequently invalidate the cached_endpoints
on the replication strategy) every time the token_metadata
changes, not only when the topology changes.
Note that the _cached_endpoints will go away
once the transition to effective_replication_map
is finished, so this will not degrade performance.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
implementation
Now that all users of it were converted to use the
effective_replication_map, the legacy
abstract_replication_strategy::get_natural_endpoints method
can be deleted.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, we call find_keyspace and then
get_effective_replication_map on the _same_ keyspace
to get_natural_endpoints for multiple tokens.
Get the effective_replication_map once in these cases
and use it for each token.
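A sketch of the pattern, assuming hypothetical simplified keyspace and effective_replication_map types: the shared_ptr to the immutable map is fetched once, so every token in the batch is resolved against the same snapshot, even if a newer map is published in the meantime. The ring lookup rule (first ring token at or after the key, wrapping around) is also an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

using token = std::int64_t;
using endpoint = std::string;

// Hypothetical immutable snapshot of the replication map.
struct effective_replication_map {
    std::map<token, std::vector<endpoint>> replicas;  // ring token -> replicas

    std::vector<endpoint> get_natural_endpoints(token t) const {
        assert(!replicas.empty());
        auto it = replicas.lower_bound(t);
        return it != replicas.end() ? it->second : replicas.begin()->second;  // wrap around
    }
};
using erm_ptr = std::shared_ptr<const effective_replication_map>;

struct keyspace {
    erm_ptr erm;
    erm_ptr get_effective_replication_map() const { return erm; }
};

// Fetch the keyspace's map once, then resolve every token against that one
// snapshot instead of looking the keyspace and its map up again per token.
std::vector<std::vector<endpoint>> natural_endpoints_for(const keyspace& ks, const std::vector<token>& tokens) {
    const erm_ptr erm = ks.get_effective_replication_map();
    std::vector<std::vector<endpoint>> ret;
    ret.reserve(tokens.size());
    for (token t : tokens) {
        ret.push_back(erm->get_natural_endpoints(t));
    }
    return ret;
}
```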
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Every time the token_metadata changes we need to update the
effective_replication_map on all non-system keyspaces.
Do that in replicate_to_all_cores after the updated token_metadata
has been replicated to all cores.
We first prepare and clone the token_metadata, then prepare
and clone the new effective_replication_maps. Any failure
at this stage is recoverable: it is handled via rollback and
the exception is returned.
Note that any failure to _apply_ the pending token_metadata or the
effective_replication_map will cause scylla to abort.
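The prepare/apply split can be sketched with a vector of per-shard snapshot pointers standing in for shards. All names are hypothetical, and the prepare_hook parameter exists only to inject failures. The property being illustrated is that everything that can throw happens before anything is published.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical: each "shard" holds a pointer to an immutable snapshot.
template <typename T>
using shard_slots = std::vector<std::shared_ptr<const T>>;

// Returns nullptr on success, or the exception raised during the prepare phase.
template <typename T>
std::exception_ptr replicate_to_all_shards(shard_slots<T>& shards, const T& new_value,
                                           const std::function<void(std::size_t)>& prepare_hook) {
    // Phase 1 (prepare/clone): build private copies only. Nothing is
    // published yet, so any failure is recovered by discarding them.
    shard_slots<T> pending(shards.size());
    try {
        for (std::size_t i = 0; i < shards.size(); ++i) {
            prepare_hook(i);  // may throw, e.g. on allocation failure
            pending[i] = std::make_shared<T>(new_value);
        }
    } catch (...) {
        return std::current_exception();  // rollback: `pending` is simply dropped
    }
    // Phase 2 (apply): only non-throwing pointer moves. A failure at this point
    // would leave shards inconsistent, which is why the text says scylla aborts.
    for (std::size_t i = 0; i < shards.size(); ++i) {
        shards[i] = std::move(pending[i]);
    }
    return nullptr;
}
```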
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Serialize the metadata changes with
keyspace create, update, or drop.
This will become necessary in the following patch
when we update the effective_replication_map
on all keyspaces and we want instances on all shards
to end up with the same replication map.
Note that storage_service::keyspace_changed is called
from the schema merge path, so it already holds
the merge_lock.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than a _pending_token_metadata_ptr member in the storage_service
class. This is now much easier, now that the function has been converted
to a coroutine.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And functions that use it, like:
keyspace::update_from
database::update_keyspace
database::create_in_memory_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
effective_replication_map holds the full replication_map
resulting from applying the effective replication strategy
over the given token_metadata and replication_strategy_config_options.
It is calculated once, in make_effective_replication_map(), and then it
can be used for retrieving the endpoints/token_ranges synchronously
from the precalculated map.
A new virtual get_natural_endpoints(const token&, const effective_replication_map&)
method has been added to abstract_replication_strategy so that
local_strategy and everywhere_replication_strategy can override it as they may be
needed before the token_metadata is established.
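A simplified sketch of the virtual-override arrangement with hypothetical types: the base class resolves a token by a synchronous lookup in the precalculated map, while a local-strategy stand-in ignores the map entirely, which is why it can answer before any token_metadata exists. The class names and the ring lookup rule are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using token = std::int64_t;
using endpoint = std::string;

// Hypothetical: the full replication map, calculated once up front.
struct effective_replication_map {
    std::map<token, std::vector<endpoint>> replicas;  // ring token -> replicas
};

class abstract_replication_strategy {
public:
    virtual ~abstract_replication_strategy() = default;
    // Default: a synchronous lookup in the precalculated map; a token belongs
    // to the first ring token >= it, wrapping around past the last one.
    virtual std::vector<endpoint> get_natural_endpoints(token t, const effective_replication_map& erm) const {
        assert(!erm.replicas.empty());
        auto it = erm.replicas.lower_bound(t);
        return it != erm.replicas.end() ? it->second : erm.replicas.begin()->second;
    }
};

// Uses the default lookup.
class simple_strategy_sketch : public abstract_replication_strategy {};

// Overrides the lookup: always "this node", so it works even before any
// token_metadata (and thus any populated map) exists.
class local_strategy_sketch : public abstract_replication_strategy {
    endpoint _self;
public:
    explicit local_strategy_sketch(endpoint self) : _self(std::move(self)) {}
    std::vector<endpoint> get_natural_endpoints(token, const effective_replication_map&) const override {
        return {_self};
    }
};
```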
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Enable creating shared_ptr<BaseClass> in nonstatic_class_registry
using BaseClass::ptr_type and use that for
abstract_replication_strategy.
While at it, also clean up compressor in that respect
to define compressor::ptr_type as shared_ptr<compressor>
thus simplifying compressor_registry.
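A sketch of a registry parameterized on BaseClass::ptr_type. simple_class_registry, compressor_sketch and lz4_compressor_sketch are hypothetical stand-ins for nonstatic_class_registry and compressor; only the ptr_type convention comes from the text. The registry never names a smart-pointer type itself, so a base class that declares shared_ptr gets shared_ptr back.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical registry: it creates whatever smart pointer the base class
// declares as BaseClass::ptr_type, instead of hard-coding one.
template <typename BaseClass>
class simple_class_registry {
public:
    using ptr_type = typename BaseClass::ptr_type;
    using creator_type = std::function<ptr_type()>;

    void register_class(const std::string& name, creator_type creator) {
        _creators[name] = std::move(creator);
    }

    ptr_type create(const std::string& name) const {
        auto it = _creators.find(name);
        if (it == _creators.end()) {
            throw std::out_of_range("unknown class: " + name);
        }
        return it->second();
    }

private:
    std::unordered_map<std::string, creator_type> _creators;
};

// A base class opting into shared ownership via ptr_type.
class compressor_sketch {
public:
    using ptr_type = std::shared_ptr<compressor_sketch>;
    virtual ~compressor_sketch() = default;
    virtual std::string name() const = 0;
};

class lz4_compressor_sketch : public compressor_sketch {
public:
    std::string name() const override { return "lz4"; }
};
```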
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Besides the replacement being more modern and more efficient for
the compiler, this #ifndef block confuses my editor,
which greys out the whole block.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And with that, rename calculate_natural_endpoints(const token& search_token, const token_metadata&, can_yield)
to do_calculate_natural_endpoints and make it protected.
With this patch, all its external users call the async version, so
rename it back to calculate_natural_endpoints, and make
calculate_natural_endpoints_sync private since it's being called
only within abstract_replication_strategy.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
calculate_natural_endpoints_sync and _async are both provided
temporarily until all users of them are converted to use
the async version which will remain.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Extract natural_endpoints_tracker out of calculate_natural_endpoints
so we can easily split the function into sync and async variants.
Test: network_topology_strategy_test(dev, debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The documentation for --experimental config option states
that it enables all experimental features, but this is no
longer true, i.e.: raft feature is not enabled with it and
should be explicitly enabled via `--experimental-features=raft`
switch (we don't want to enable it by default alongside
other features).
Since the flag doesn't do what it's intended to, we should
mark it as "deprecated", because documenting each exception
(there could be more than only raft in the future) will be
a burden and docs will constantly go out-of-sync with the
code.
Adjust the description for the option to reflect that, mark
it "deprecated" and suggest using --experimental-features, instead.
Fixes: #9467
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20211012093005.20871-2-pa.solodovnikov@scylladb.com>
Currently it is required that sstables (in particular la/mx ones) are located at a valid path. This is required because `sstables::entry_descriptor::make_descriptor()` extracts the keyspace and table names from the sstable dir components. This PR relaxes this requirement by using a newly introduced `sstables::entry_descriptor::make_descriptor()` overload which allows the caller to specify keyspace and table names, so they no longer need to be extracted from the path.
Tests: unit(dev), manual(testing that `scylla-sstables` can indeed load sstables from invalid path)
Closes #9466
* github.com:scylladb/scylla:
tools/scylla-sstable: allow loading sstables from any path
sstables: entry_descriptor::make_descriptor(): add overload with provided ks/cf
Currently, we configure LimitNOFILE on scylla-server.service, but we
don't configure fs.nr_open and fs.file-max.
When fs.nr_open or fs.file-max are smaller than LimitNOFILE, we may fail
to allocate FDs.
To fix this issue, raise fs.file-max and fs.nr_open to values large
enough for scylla.
Fixes #9461
Closes #9461
Scrub compaction in segregate mode can split the input sstable into as
many as hundreds or even thousands of output sstables in the extreme
case. But even at a few dozen output sstables, most of these will only
have a few partitions with a few rows. These sstables however will still
have their bloom filter allocated according to the original
partition-count estimate, causing memory bloat or even OOM in the
extreme case.
This patch solves this by aggressively adjusting the partition count
downwards after the second bucket has been created. Each subsequent
bucket will halve the partition estimate, which will quickly reach 1.
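A sketch of the adjustment policy as described: the first two buckets keep the original estimate, and each later bucket gets half of the previous one, floored at 1. The class name and the exact bucket at which halving starts are assumptions based on the wording above, not the actual scrub code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper that hands out the partition estimate used to size
// each new output bucket's bloom filter.
class partition_estimate_adjuster {
    std::uint64_t _estimate;
    unsigned _buckets_created = 0;
public:
    explicit partition_estimate_adjuster(std::uint64_t original)
        : _estimate(std::max<std::uint64_t>(original, 1)) {}

    std::uint64_t on_new_bucket() {
        // Buckets 1 and 2 keep the original estimate; every later one halves it.
        if (++_buckets_created > 2) {
            _estimate = std::max<std::uint64_t>(_estimate / 2, 1);
        }
        return _estimate;
    }
};
```

Since the estimate halves per bucket, even an original estimate in the millions drops to 1 after a few dozen buckets, which bounds the bloom filter memory of the long tail of tiny sstables.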
Fixes: #9463
Closes #9464
Currently it is required that sstables (in particular la/mx ones) are
located at a valid path. This is required because
`sstables::entry_descriptor::make_descriptor()` extracts the keyspace
and table names from the sstable dir components.
This patch relaxes this by using the freshly introduced
`sstables::entry_descriptor::make_descriptor()` overload which allows
the caller to specify keyspace and table names, so they no longer
need to be extracted from the sstable dir path. This
practically allows for la/mx sstables at non-standard paths to be
opened. This will be used by the `scylla-sstable` tool which wants to be
flexible about where the sstables it opens are located.
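A simplified sketch of the two overloads, assuming the <ks>/<cf>-<uuid>/<file> directory layout and ignoring everything else a real sstable file name encodes. make_descriptor_sketch and entry_descriptor_sketch are hypothetical names, not the real sstables API.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical, heavily simplified descriptor.
struct entry_descriptor_sketch {
    std::string ks;
    std::string cf;
    std::string filename;
};

// Path-based variant: ks and cf come from the directory layout
// <data_dir>/<ks>/<cf>-<uuid>/<file>, so the file must sit at such a path.
entry_descriptor_sketch make_descriptor_sketch(const fs::path& p) {
    const std::string table_dir = p.parent_path().filename().string();
    const std::string ks_dir = p.parent_path().parent_path().filename().string();
    const auto dash = table_dir.find('-');
    if (ks_dir.empty() || dash == std::string::npos) {
        throw std::invalid_argument("cannot infer keyspace/table from " + p.string());
    }
    return {ks_dir, table_dir.substr(0, dash), p.filename().string()};
}

// Overload: the caller provides ks and cf, so any directory will do.
entry_descriptor_sketch make_descriptor_sketch(const fs::path& p, std::string ks, std::string cf) {
    return {std::move(ks), std::move(cf), p.filename().string()};
}
```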
Issue #8203 describes a bug in a long scan which returns a lot of empty
pages (e.g., because most of the results are filtered out). We have two
cql-pytest test cases that reproduced this bug - one for a whole-table
scan and one for a single-partition scan.
It turned out that the bug was not in the Scylla server, but actually in
the Python driver which incorrectly stopped the iteration after an empty
page even though this page did contain the "more pages" flag.
This driver bug was already fixed in the Datastax driver (see
6ed53d9f70)
and in the Scylla fork of the driver (see
1d9077d3f4).
So in this patch we drop the XFAIL, and if the driver is not new enough
to contain this fix - the test is skipped.
Since our Jenkins machines have the latest Scylla fork of the driver and
it already contains this fix, these tests will not be skipped - and will
run and should pass. Developers who run these tests on their development
machine will see these tests either passing or skipped - depending on
which version of the driver they have installed.
Closes #8203
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211011113848.698935-1-nyh@scylladb.com>
Our source base drifted away from gcc compatibility; this mostly
restores the ability to build with gcc. An important exception is
coroutines that have an initializer list [1]; this still doesn't work.
We aim to switch back to gcc 11 if/when this gives us better
C++ compatibility and performance.
Test: unit (dev)
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056
Closes #9459
* github.com:scylladb/scylla:
test: radix_tree_printer: avoid template specialization in class context
test: raft: avoid ignored variable errors
test: reader_concurrency_semaphore_test: isolate from namespace of source_location
test: cql_query_test: drop unused lambda assert_replication_not_contains
test: commitlog_test: don't use deprecated seastar::unaligned_cast
test: adjust signed/unsigned comparisons in loops and boost tests
build: silence some gcc 11 warnings
sstables: processing_result_generator: make coroutine support palatable for C++20 compilers
managed_bytes: avoid compile-time loop in converting constructor
service: service_level_controller: drop unused variable sl_compare
raft: disambiguate promise name in raft::active_read
locator: azure_snitch: use full type name in definition of globals
cql3: statements: create_service_level_statement: don't ignore replace_defaults()
cql3: statement_restrictions: adjust call to std::vector deduction guide
types: remove recursive constraint in deserialize_value
cql3: restrictions: relax constraint on visitor_with_binary_operator_content
treewide: handle switch statements that return
cql3: expr: correct type of captured map value_type
cdc: adjust type of streams_count
alternator: disambiguate attrs_to_get in table_requests
Merged patch series from Pavel Emelyanov:
There's a long-term (well, likely mid-term already) goal to keep a
single role for the storage_service, namely managing the state of
a node in the ring. Once that happens, rename it to stop people
from loading new stuff into storage_service. There are at least three
REST API endpoints that stand in the way.
1. load_new_ss_tables. This part is moved to a new sharded sstables
loader that wraps existing distributed_loader
2. view_build_statuses. Statuses are maintained by view_builder, so they
must be retrieved from there
3. enable_|disable_auto_compaction. This is purely a database knob, as it
used to be some time ago
This change also removes view_update_generator from storage_service's list
of dependencies and leaves system_distributed_keyspace as the
start-only one (another, not yet published, branch makes use of it and
removes s.d.ks from storage service entirely).
branch: https://github.com/xemul/scylla/tree/br-unload-storage-service-api-3
tests: unit(dev)
refs: #5489
* 'br-unload-storage-service-api-3' of github.com:xemul/scylla:
storage_service, api: Move set-tables-autocompaction back into API
api: Fix indentation after previous patch
api, database, storage_service: Unify auto-compaction toggle
api: Remove storage service from new APIs
view_builder: Accept view_build_statuses
storage_service: Move view_build_statuses code
api, storage_service: Keep view builder API handlers separate
storage_service: Remove view update generator from
sstables_loader: Accept the sstables loading code
storage_service: Move the sstables loading code
storage_service, api: Keep sstables loading API handlers separate
sstables_loader: Introduce
distributed_loader, utils: Move verify_owner_and_mode
distributed_loader: Fix methods visibility
`partition_reversing_data_source` uses `continuous_data_consumer`s
internally (`partition_header_context`, `row_body_skipping_context`)
which hold `input_stream`s opened to sstable data files. These
`input_stream`s must be closed before destruction. Right now they would
sometimes cause "Assertion `_reads_in_progress == 0' failed" on
destruction.
Close the `continuous_data_consumer`s before they are destroyed so they
can close their `input_stream`s.
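The invariant can be sketched synchronously with hypothetical stand-in classes. In Seastar, close() returns a future that must be waited on, which this sketch ignores. The stream's destructor asserts that it was closed, mirroring the `_reads_in_progress == 0` assertion quoted above, so the owner closes each consumer before dropping it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an input stream that must be closed before it is
// destroyed (here: no reads may still be in progress).
class input_stream_sketch {
    int _reads_in_progress = 1;  // pretend a read-ahead is still outstanding
public:
    void close() { _reads_in_progress = 0; }
    ~input_stream_sketch() { assert(_reads_in_progress == 0); }
};

// A consumer owning a stream; closing the consumer closes its stream.
class consumer_sketch {
    input_stream_sketch _input;
public:
    void close() { _input.close(); }
};

// The owner closes every consumer before destroying it.
class reversing_source_sketch {
    std::vector<std::unique_ptr<consumer_sketch>> _consumers;
public:
    void add_consumer() { _consumers.push_back(std::make_unique<consumer_sketch>()); }
    std::size_t consumer_count() const { return _consumers.size(); }
    void close() {
        for (auto& c : _consumers) {
            c->close();
        }
        _consumers.clear();  // safe now: every stream was closed first
    }
};
```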
Fixes #9444.
Closes #9451
The global autocompaction toggle is no longer tied to the storage
service. It naturally belongs to the database, but is small and
tidy enough not to pollute database methods and can be placed into
the api/ dir itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two knobs here -- global and per-table one. Both were added
without any synchronisation, but the former one was later fixed to
become serialized and not to be available "too early".
This patch makes both toggles serialized with each other and
not available too early.
The justification for this change is to move the global toggle out
of the storage service, as it really belongs to the database, not the
storage service. Accordingly, the current synchronization, which depends
on storage service internals, should be replaced with something else.
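One way to express "serialized with each other and not available too early", sketched with a std::mutex and a readiness flag. The class, its methods, and the rule that a table is enabled only when both the global toggle and its per-table toggle allow it are hypothetical; the real code runs on Seastar and would use its own primitives.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical: both toggles share one lock and one "ready" gate.
class autocompaction_toggles_sketch {
    std::mutex _lock;
    bool _ready = false;
    bool _global_enabled = false;
    std::unordered_map<std::string, bool> _per_table;

    void check_ready() const {
        if (!_ready) {
            throw std::runtime_error("too early to toggle auto-compaction");
        }
    }
public:
    void mark_ready() {
        std::lock_guard<std::mutex> g(_lock);
        _ready = true;
    }
    void set_global(bool enabled) {
        std::lock_guard<std::mutex> g(_lock);
        check_ready();
        _global_enabled = enabled;
    }
    void set_table(const std::string& table, bool enabled) {
        std::lock_guard<std::mutex> g(_lock);
        check_ready();
        _per_table[table] = enabled;
    }
    // Hypothetical rule: global must be on, and the table must not be switched off.
    bool is_enabled(const std::string& table) {
        std::lock_guard<std::mutex> g(_lock);
        auto it = _per_table.find(table);
        return _global_enabled && (it == _per_table.end() || it->second);
    }
};
```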
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The APIs that had been recently switched to using relevant services no
longer need the storage service reference capture, so remove it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>