The metric currently_open_for_writing, meant to report the number of sstables opened for writing,
holds the same value as total_open_for_writing. That means we never actually
decrement the counter, so the metric is bogus.
Move the accounting to sstable_writer: the writer uses the sstable object only to open
the files, which are then extracted from it, and the same sstable object is later
reused in read-only mode.
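The essence of the fix is to tie the decrement to the writer's lifetime instead of the sstable object's. A minimal sketch, assuming hypothetical names (sstable_write_stats, open_for_writing_tracker) rather than Scylla's actual sstable_writer code:

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two sstable metrics discussed above.
struct sstable_write_stats {
    std::uint64_t total_open_for_writing = 0;     // only ever grows
    std::uint64_t currently_open_for_writing = 0; // must go back down
};

// Hypothetical RAII helper owned by the writer: constructing it counts a
// newly opened sstable, destroying it (when the writer is done) uncounts it.
class open_for_writing_tracker {
    sstable_write_stats& _stats;
public:
    explicit open_for_writing_tracker(sstable_write_stats& s) : _stats(s) {
        ++_stats.total_open_for_writing;
        ++_stats.currently_open_for_writing;
    }
    ~open_for_writing_tracker() { --_stats.currently_open_for_writing; }
    open_for_writing_tracker(const open_for_writing_tracker&) = delete;
    open_for_writing_tracker& operator=(const open_for_writing_tracker&) = delete;
};
```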
Fixes #9455.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211013134812.177398-1-raphaelsc@scylladb.com>
It was auto-expanded only if the strategy name
was the short "NetworkTopologyStrategy" name.
Fixes #9302.
Closes #9304.
* 'prepare_options' of https://github.com/bhalevy/scylla:
cql3: keyspace prepare_options: expand replication_factor also for fully qualified NetworkTopologyStrategy
abstract_replication_strategy: add to_qualified_class_name
TWCS can reshape at most 32 sstables spanning multiple windows in a
single compaction round. When there are more than 32 such sstables,
which ones are compacted together is random.
If sstables with overlapping windows are compacted together, then
write amplification can be reduced, because we may be able to push
all the data to a window W in a single compaction round, so we won't
have to perform another compaction round later in W to reduce
its number of files. This also helps reduce the number
of transient file descriptors opened: TWCS reshape
first reshapes all sstables spanning multiple windows, so if
all windows temporarily grow large in number of files,
there's a risk that file descriptors will be exhausted.
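One simple way to prefer overlapping windows is to sort the multi-window candidates by their first window before taking at most 32 of them. The sketch below illustrates that heuristic only; sstable_desc and pick_overlapping are hypothetical, and this is not claimed to be the selection logic the patch implements:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified sstable descriptor: the inclusive range of TWCS
// window indexes that its data spans.
struct sstable_desc {
    std::int64_t first_window;
    std::int64_t last_window;
};

// Pick at most `max_sstables` multi-window sstables for one reshape round.
// Sorting by first window puts sstables that cover the same windows next to
// each other, so the chosen prefix tends to overlap instead of being random.
std::vector<sstable_desc> pick_overlapping(std::vector<sstable_desc> candidates,
                                           std::size_t max_sstables = 32) {
    std::sort(candidates.begin(), candidates.end(),
              [](const sstable_desc& a, const sstable_desc& b) {
                  return a.first_window < b.first_window;
              });
    if (candidates.size() > max_sstables) {
        candidates.resize(max_sstables);
    }
    return candidates;
}
```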
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211013203046.233540-3-raphaelsc@scylladb.com>
After a4053dbb72, data segregation is postponed to offstrategy, so the reshape
procedure is called with disjoint sstables which belong to different
windows. Let's extend the optimization for disjoint sstables to those which
span more than one window. This way, write amplification is reduced
for offstrategy compaction, as all disjoint sstables will be compacted
at once.
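As an illustration only, assuming "disjoint" means that the sstables' token ranges do not overlap, a check that a set of sstables can be treated as one disjoint set might look like this (token_range_desc and all_disjoint are hypothetical names):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an sstable's key range (first and last token).
struct token_range_desc {
    std::int64_t first_token;
    std::int64_t last_token;
};

// Returns true if no two sstables overlap in token range. Under the
// assumption above, such a set can be compacted in a single round.
bool all_disjoint(std::vector<token_range_desc> ssts) {
    std::sort(ssts.begin(), ssts.end(),
              [](const token_range_desc& a, const token_range_desc& b) {
                  return a.first_token < b.first_token;
              });
    for (std::size_t i = 1; i < ssts.size(); ++i) {
        if (ssts[i].first_token <= ssts[i - 1].last_token) {
            return false; // neighbors in sorted order overlap
        }
    }
    return true;
}
```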
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211013203046.233540-2-raphaelsc@scylladb.com>
It was auto-expanded only if the strategy name
was the short "NetworkTopologyStrategy" name.
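A minimal sketch of the fix idea: compare qualified class names so that both spellings are recognized before expanding replication_factor into per-DC entries. The to_qualified_class_name here is a hypothetical stand-in that prefixes short names with org.apache.cassandra.locator. (the Java package of Cassandra's replication strategies); the options map and the rule that an explicit per-DC entry is kept are simplifications for illustration:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: qualify a short strategy class name; names that
// already contain a '.' are assumed to be fully qualified.
std::string to_qualified_class_name(const std::string& name) {
    static const std::string prefix = "org.apache.cassandra.locator.";
    return name.find('.') == std::string::npos ? prefix + name : name;
}

using options_map = std::map<std::string, std::string>;

// Expand a blanket replication_factor into one entry per datacenter, but only
// for NetworkTopologyStrategy, whichever way its name was spelled.
options_map expand_replication_factor(const std::string& strategy_class,
                                      options_map options,
                                      const std::vector<std::string>& dcs) {
    if (to_qualified_class_name(strategy_class) !=
        to_qualified_class_name("NetworkTopologyStrategy")) {
        return options;
    }
    auto it = options.find("replication_factor");
    if (it == options.end()) {
        return options;
    }
    const std::string rf = it->second;
    options.erase(it);
    for (const auto& dc : dcs) {
        options.emplace(dc, rf); // emplace keeps an explicit per-DC entry
    }
    return options;
}
```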
Fixes #9302
Test: cql_query_test.test_rf_expand(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And use it from cql3 check_restricted_replication_strategy and
keyspace_metadata ctor that defined their own `replication_class_strategy`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This mini series contains two fixes that are bundled together since the
second one assumes that the first one exists (otherwise it would not really
fix anything). The two problems were:
1. When certain operations are called on a service level controller
which doesn't have its data accessor set, it can lead to a crash,
since some operations will still try to dereference the accessor
pointer.
2. The CQL environment test initialized the accessor with a
sharded<system_distributed_data>&, however this sharded instance
itself was not initialized (sharded::start wasn't called), so the
same operations that were unsafe due to the null dereference would now
crash when the accessor tries to access an uninitialized sharded instance.
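The first fix amounts to guarding every use of the accessor. A minimal sketch, with hypothetical names (data_accessor, service_level_controller_sketch, get_service_levels) rather than the actual controller code:

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical data-accessor interface.
struct data_accessor {
    virtual ~data_accessor() = default;
    virtual std::string get_service_levels() const = 0;
};

class service_level_controller_sketch {
    std::shared_ptr<data_accessor> _accessor; // may legitimately be unset
public:
    void set_accessor(std::shared_ptr<data_accessor> a) { _accessor = std::move(a); }

    // Returns nullopt instead of dereferencing a null accessor.
    std::optional<std::string> service_levels() const {
        if (!_accessor) {
            return std::nullopt;
        }
        return _accessor->get_service_levels();
    }
};
```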
Closes #9468
* github.com:scylladb/scylla:
CQL test environment: Fix bad initialization order
Service Level Controller: Fix possible dereference of a null pointer
Before this patch, if Scylla crashes during some test in test/alternator,
all tests after it will fail because they can't connect to Scylla - and we
can get a report on hundreds of failures without a clear sign of where the
real problem was.
This patch introduces an autouse fixture (i.e., a fixture automatically
used by every test) which tries to run a do-nothing health-check request
after each test. If this health-check request fails, we conclude that
Scylla crashed and report the test in which this happened - and exit
pytest instead of failing a hundred more tests.
The failure report looks something like this:
```
! _pytest.outcomes.Exit: Scylla appears to have crashed in test test_batch.py::test_batch_get_item !
```
And the entire test run fails.
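The control flow of such a fixture can be sketched as a tiny test runner. This is a C++ illustration of the logic only, not the actual pytest fixture; named_test and run_with_health_checks are hypothetical, and test bodies are assumed not to throw:

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical test descriptor.
struct named_test {
    std::string name;
    std::function<void()> body;
};

// Runs the tests in order and performs a do-nothing health check after each.
// Returns the name of the test after which the check failed (the run stops
// there), or std::nullopt if every health check passed.
std::optional<std::string> run_with_health_checks(
        const std::vector<named_test>& tests,
        const std::function<bool()>& health_check) {
    for (const auto& t : tests) {
        t.body();
        if (!health_check()) {
            return t.name; // "Scylla appears to have crashed in test ..."
        }
    }
    return std::nullopt;
}
```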
These extra health checks are not free, but they come fairly close to
being free: In my tests I measured less than 0.1 seconds slowdown of
the entire test suite (which has 618 tests) caused by the extra health
checks.
Fixes #9489
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211017123222.217559-1-nyh@scylladb.com>
Issue #9467 deprecated the blanket "--experimental" option which we
used to enable all experimental Scylla features for testing, and
suggests that individual experimental features should be enabled
instead.
So this is what we do in this patch for the Scylla-running scripts
in test/alternator and test/cql-pytest: We need to enable UDF for
the CQL tests, and to enable Alternator Streams and Alternator TTL
for the Alternator tests.
Refs #9467
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211012110312.719654-2-nyh@scylladb.com>
Earlier we added experimental (and very incomplete) support for
Alternator's TTL feature, but forgot to set a *name* for this
experimental feature. As a result, this feature can be enabled only with
the blanket "--experimental" option and not with a specific
"--experimental-features=..." option.
Since issue #9467 deprecated the blanket "--experimental" option
and users are encouraged to only enable specific experimental
features, it is important that we have a name for it.
So the name chosen in this patch is "alternator-ttl".
Eventually this feature might evolve beyond Alternator-only,
but for now, I think it's a good name and we'll probably
graduate the experimental Alternator TTL feature before
supporting CQL, so it will be a new experimental feature
anyway.
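Conceptually, giving the feature a name means adding an entry to the map from option strings to features. A simplified sketch: the enum, the map entries other than "cdc", "lwt" and "alternator-ttl", and the function names are assumptions for illustration, not Scylla's actual configuration code:

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified feature enum. Graduated features keep their old
// names but map to UNUSED.
enum class experimental_feature { UNUSED, UDF, ALTERNATOR_STREAMS, ALTERNATOR_TTL };

const std::map<std::string, experimental_feature>& experimental_feature_names() {
    static const std::map<std::string, experimental_feature> names{
        {"cdc", experimental_feature::UNUSED},
        {"lwt", experimental_feature::UNUSED},
        {"udf", experimental_feature::UDF},
        {"alternator-streams", experimental_feature::ALTERNATOR_STREAMS},
        {"alternator-ttl", experimental_feature::ALTERNATOR_TTL}, // the new name
    };
    return names;
}

// Without an entry in the map, a feature cannot be enabled by name at all.
std::optional<experimental_feature> parse_experimental_feature(const std::string& name) {
    const auto& names = experimental_feature_names();
    auto it = names.find(name);
    if (it == names.end()) {
        return std::nullopt;
    }
    return it->second;
}
```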
Refs #9467.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211012110312.719654-1-nyh@scylladb.com>
The warning was disabled during the migration to clang, but now it
appears unnecessary (perhaps clang added support for the attributes
it did not have then). It is valuable for detecting misspelled
attributes, so enable it again.
Closes #9480
The help string of the "--experimental-features" command-line option
lists the available experimental features, to help a user who might
want to enable them. But this help string was manually written, and has
since drifted from reality:
* Two of the listed "experimental" features, cdc and lwt, have actually
graduated from being experimental long ago. Although technically a user
may still use the words "cdc" and "lwt" in the "experimental-features"
parameter, doing so is pointless, and worse: This text in the help
string can mislead a user into thinking that these two features are
still experimental - while they are not!
* One experimental feature - alternator-ttl - is missing from this list.
Instead of updating the help string text now - and needing to do this
again and again in the future as we change experimental features - what
this patch does is to construct the list of features automatically from
the map of supported feature names - excluding any features which map
to UNUSED.
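Deriving the list from the same map the option parser uses keeps the two from drifting apart. A minimal sketch with a hypothetical enum and function name; only the rule "skip names that map to UNUSED" comes from the description above:

```cpp
#include <map>
#include <string>

// Hypothetical, simplified feature enum; UNUSED marks graduated features
// whose names are still accepted for backward compatibility.
enum class feature { UNUSED, UDF, ALTERNATOR_STREAMS, ALTERNATOR_TTL };

// Builds the comma-separated list shown in the help string from the
// name->feature map, skipping graduated (UNUSED) names.
std::string list_experimental_features(const std::map<std::string, feature>& names) {
    std::string out;
    for (const auto& [name, f] : names) {
        if (f == feature::UNUSED) {
            continue; // don't advertise a graduated feature as experimental
        }
        if (!out.empty()) {
            out += ", ";
        }
        out += name;
    }
    return out;
}
```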
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211013122635.132582-1-nyh@scylladb.com>
"
The current api design of abstract_replication_strategy
provides a can_yield parameter to calls that may stall
when traversing the token metadata in O(n^2) and even
in O(n) for a large number of token ranges.
But, to use this option the caller must run in a seastar thread.
It can't be used if the caller runs a coroutine or plain
async tasks.
Rather than keep adding threads (e.g. in storage_service::load_and_stream
or storage_service::describe_ring), the series offers an infrastructure
change: precalculating the token->endpoints map once, using an async task,
and keeping the results in an `effective_replication_map` object.
The latter can be used for efficient and stall-free calls, like
get_natural_endpoints, or get_ranges/get_primary_range, replacing their
equivalents in abstract_replication_strategy, and dropping the public
abstract_replication_strategy::calculate_natural_endpoints and its
internal cached_endpoints map.
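The idea of precalculating can be illustrated with a toy ring: compute the replica list for every ring token once, then answer get_natural_endpoints with a plain map lookup. In this sketch the placement rule (the owning token plus the next distinct nodes clockwise, SimpleStrategy-like), the type aliases and the class name are assumptions, and the precalculation is a plain loop rather than an async task that yields:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using token = std::int64_t;
using endpoint = std::string;

class replication_map_sketch {
    std::map<token, std::vector<endpoint>> _replicas; // keyed by ring token
public:
    // Precondition: ring is non-empty.
    replication_map_sketch(const std::map<token, endpoint>& ring, std::size_t rf) {
        const std::vector<std::pair<token, endpoint>> nodes(ring.begin(), ring.end());
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            std::vector<endpoint> eps;
            // Walk clockwise from this token, collecting distinct endpoints.
            for (std::size_t j = 0; j < nodes.size() && eps.size() < rf; ++j) {
                const endpoint& ep = nodes[(i + j) % nodes.size()].second;
                if (std::find(eps.begin(), eps.end(), ep) == eps.end()) {
                    eps.push_back(ep);
                }
            }
            _replicas.emplace(nodes[i].first, std::move(eps));
        }
    }

    // Synchronous, stall-free lookup: a search token belongs to the first
    // ring token >= it, wrapping around to the first ring token.
    const std::vector<endpoint>& get_natural_endpoints(token t) const {
        auto it = _replicas.lower_bound(t);
        if (it == _replicas.end()) {
            it = _replicas.begin();
        }
        return it->second;
    }
};
```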
Besides the performance benefits:
1. The current calls require running a thread to yield.
Precalculating the map (using an async task) allows us to use synchronous calls
without stalling the reactor.
2. The replication maps can and should be shared
between keyspaces that use the same replication strategy.
(Will be sent as a follow-up to the series)
The bigger benefits (courtesy of Avi Kivity) are laying the groundwork for:
1. atomic replication metadata - an operation can capture a replication map once, and then use consistent information from the map without worrying that it changes under its feet. We may even be able to s/inet_address/replica_ptr/ later.
2. establish boundaries on the use of replication information - by making a replication map not visible, and observing when its reference count drops to zero, we can tell when the new replication map is fully in use. When we start writing to a new node we'll be able to locate a point in time where all writes that were not aware of the new node were completed (this is the point where we should start streaming).
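The capture-once pattern of point 1, and the reference-count observation of point 2, can be sketched with std::shared_ptr and std::weak_ptr on a single shard. keyspace_sketch and replication_map_snapshot are hypothetical; Scylla's own pointer types and bookkeeping may differ:

```cpp
#include <memory>
#include <utility>

// Hypothetical immutable snapshot of replication information.
struct replication_map_snapshot {
    long version;
};

// Hypothetical keyspace: it only ever swaps the pointer and never mutates a
// published snapshot, so holders of an old pointer keep a consistent view.
class keyspace_sketch {
    std::shared_ptr<const replication_map_snapshot> _erm;
public:
    explicit keyspace_sketch(std::shared_ptr<const replication_map_snapshot> m)
        : _erm(std::move(m)) {}
    std::shared_ptr<const replication_map_snapshot> get_effective_replication_map() const {
        return _erm;
    }
    void set_effective_replication_map(std::shared_ptr<const replication_map_snapshot> m) {
        _erm = std::move(m);
    }
};
```

An operation captures the map once at its start; a weak_ptr to an old map tells when its last user has finished.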
Notes:
* The get_natural_endpoints method that uses the effective_replication_map
is still provided as an abstract_replication_strategy virtual method
so that local_strategy can override it and provide natural endpoints
for any search token, even in the absence of token_metadata, when
called early on, before token_metadata has been established.
The effective_replication_map materializes the replication strategy
over a given set of replication strategy options and token_metadata.
Whenever either of those changes for a keyspace, we make a new
effective_replication_map and keep it in the keyspace for later use.
Methods that depend on an ad-hoc token_metadata (e.g. during
node operations like bootstrap or replace) are still provided
by abstract_replication_strategy.
TODO:
- effective_replication_map registry
- Move pending ranges from token_metadata to replication map
- get rid of abstract_replication_strategy::get_range_addresses(token_metadata&)
- calculate replication map and use it instead.
Test: unit(dev, debug)
Dtest: next-gating, bootstrap_test.py update_cluster_layout_tests.py alternator_tests.py -a 'dtest-full,!dtest-heavy' (release)
"
* tag 'effective_replication_strategy-v6' of github.com:bhalevy/scylla: (44 commits)
effective_replication_map: add get_range_addresses
abstract_replication_strategy: get rid of shared_token_metadata member and ctor param
abstract_replication_strategy: recognized_options: pass const topology&
abstract_replication_strategy: precalculate get_replication_factor for effective_replication_map
token_metadata: get rid of now-unused sync methods
abstract_replication_strategy: get rid of do_calculate_natural_endpoints
abstract_replication_strategy: futurize get_*address_ranges
abstract_replication_strategy: futurize get_range_addresses
abstract_replication_strategy: futurize get_ranges(inet_address ep, token_metadata_ptr)
abstract_replication_strategy: move get_ranges and get_primary_ranges* to effective_replication_map
compaction_manager: pass owned_ranges via cleanup/upgrade options
abstract_replication_strategy: get rid of cached_endpoints
all replication strategies: get rid of do_get_natural_endpoints
storage_proxy: use effective_replication_map token_metadata_ptr along with endpoints
abstract_replication_strategy: move get_natural_endpoints_without_node_being_replaced to effective_replication_map
storage_service: bootstrap: add log messages
storage_service: get_mutable_token_metadata_ptr: always invalidate_cached_rings
shared_token_metadata: set: check version monotonicity
token_metadata: use static ring version
token_metadata: get rid of copy constructor and assignment operator
...
Make a reader that reads from a memtable in reverse order.
This draft PR includes two commits, out of which only the second is
relevant for review.
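Reading in reverse involves two parts: visiting the clustering ranges last-to-first and walking rows backwards inside each range. A simplified sketch over a std::map standing in for a partition, with a hypothetical ck_range of inclusive integer bounds; it does not model range tombstones or partition snapshots:

```cpp
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical clustering range with inclusive integer bounds (start <= end).
struct ck_range {
    int start;
    int end;
};

// Returns row keys in reverse order: the (ascending, non-overlapping) ranges
// are visited last-to-first, and each range is walked from high to low keys.
std::vector<int> read_reversed(const std::map<int, std::string>& rows,
                               const std::vector<ck_range>& ranges) {
    std::vector<int> out;
    for (auto r = ranges.rbegin(); r != ranges.rend(); ++r) {
        auto stop = std::make_reverse_iterator(rows.lower_bound(r->start));
        for (auto it = std::make_reverse_iterator(rows.upper_bound(r->end)); it != stop; ++it) {
            out.push_back(it->first);
        }
    }
    return out;
}
```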
Described in #9133.
Refs #1413.
Closes #9174
* github.com:scylladb/scylla:
partition_snapshot_reader: pop_range_tombstone returns reference (instead of value) when possible.
memtable: enable native reversing
partition_snapshot_reader: reverse ck_range when needed by Reversing
memtable, partition_snapshot_reader: read from partition in reverse
partition_snapshot_reader: rows_position and rows_iter_type supporting reverse iteration
partition_snapshot_reader: split responsibility of ck_range
partition_snapshot_reader: separate _schema into _query_schema and _partition_schema
query: reverse clustering_range
test: cql_query_test: fix test_query_limit for reversed queries
Our scylla.yaml contains a comment listing the available experimental
features, supposedly helping a user who might want to enable them.
I think the usefulness of this comment is dubious, but as long as we
have one, let's at least make it accurate:
* Two of the listed "experimental" features, cdc and lwt, have actually
graduated from being experimental long ago. Although technically a user
may still use the words "cdc" and "lwt" in the "experimental-features"
list, doing so is pointless, and worse: This comment suggests that these
two features are still experimental - while they are not!
* One experimental feature - alternator-ttl - is missing from this list.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211013083247.13223-1-nyh@scylladb.com>
Equivalent to abstract_replication_strategy get_range_addresses,
yet synchronous, as it uses the precalculated map.
Call it from storage_service::get_new_source_ranges
and range_streamer::get_all_ranges_with_sources_for.
Consequently, get_new_source_ranges and removenode_add_ranges
can become synchronous too.
Unfortunately we can't entirely get rid of
abstract_replication_strategy::get_range_addresses
as it's still needed by
range_streamer::get_all_ranges_with_strict_sources_for.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is not used any more.
Methods either use the token_metadata_ptr in the
effective_replication_map, or receive an ad-hoc
token_metadata.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Prepare for deleting the _shared_token_metadata member.
All we need for recognized_options is the topology
(for network_topology_strategy).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that abstract_replication_strategy methods are all async,
clone_only_token_map_sync and update_normal_tokens_sync
are unused.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is no longer in use.
Remove it, and with it the virtual calculate_natural_endpoint_sync method,
of which it was the only caller.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Remaining callers of get_address_ranges and get_pending_address_ranges
are all either from a seastar thread or from a coroutine
so we can make the methods always async and drop the
can_yield param.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
All remaining use sites are called in a seastar thread
so we drop the can_yield param and make get_range_addresses
always async.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is called only from repair, in a thread,
so it can be made always async and the need_preempt param
can be dropped.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Provide a sync get_ranges method in effective_replication_map
that uses the precalculated map to get all token ranges owned by or
replicated on a given endpoint.
Reuse do_get_ranges as common infrastructure for all
3 cases: get_ranges, get_primary_ranges, and get_primary_ranges_within_dc.
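The shared-helper structure can be sketched as a single predicate-driven scan over a precalculated map. In this hypothetical sketch each range is identified by its end token and the first replica is treated as the primary one; get_primary_ranges_within_dc is left out:

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

using token = std::int64_t;
using endpoint = std::string;

// Hypothetical precalculated map: range end token -> ordered replica list.
using replica_map = std::map<token, std::vector<endpoint>>;

// Common helper: collect the ranges whose replica list satisfies `pred`.
std::vector<token> do_get_ranges(const replica_map& m,
                                 const std::function<bool(const std::vector<endpoint>&)>& pred) {
    std::vector<token> out;
    for (const auto& [end_token, replicas] : m) {
        if (pred(replicas)) {
            out.push_back(end_token);
        }
    }
    return out;
}

// Ranges owned by or replicated on `ep`.
std::vector<token> get_ranges(const replica_map& m, const endpoint& ep) {
    return do_get_ranges(m, [&](const std::vector<endpoint>& r) {
        return std::find(r.begin(), r.end(), ep) != r.end();
    });
}

// Ranges for which `ep` is the first (primary) replica.
std::vector<token> get_primary_ranges(const replica_map& m, const endpoint& ep) {
    return do_get_ranges(m, [&](const std::vector<endpoint>& r) {
        return !r.empty() && r.front() == ep;
    });
}
```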
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So they can be easily computed using an async task
before constructing the compaction object
in a following patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that all flavors of get_natural_endpoints methods
were moved to effective_replication_map,
do_get_natural_endpoints and its overrides are unused.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We should invalidate the cached rings every time the
token metadata changes, not only on topology changes,
so that cached token/replication mappings are invalidated
when the modified token_metadata is committed.
Currently we can (apparently) do without it,
but this will become a requirement for keeping
versions of the effective_replication_map
in a registry, indexed by the token_metadata ring version,
among other things.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Setting the ring version backwards means it got out of sync.
Possibly concurrent updates weren't serialized properly
using token_metadata_lock / mutate_token_metadata.
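A minimal sketch of such a check, with hypothetical type names; here the violation throws std::logic_error, which may differ from what shared_token_metadata::set actually does:

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical token metadata carrying only its ring version.
struct token_metadata_sketch {
    std::int64_t ring_version;
};

// Hypothetical holder of the current token metadata. set() refuses to move
// the ring version backwards, which would indicate unserialized updates.
class shared_token_metadata_sketch {
    std::shared_ptr<const token_metadata_sketch> _tm;
public:
    explicit shared_token_metadata_sketch(std::shared_ptr<const token_metadata_sketch> tm)
        : _tm(std::move(tm)) {}

    void set(std::shared_ptr<const token_metadata_sketch> tm) {
        if (tm->ring_version < _tm->ring_version) {
            throw std::logic_error("ring version went backwards");
        }
        _tm = std::move(tm);
    }

    std::int64_t ring_version() const { return _tm->ring_version; }
};
```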
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For generating unique _ring_version.
Currently when we clone a mutable token_metadata_ptr
it remains with the same _ring_version
and the ring version is updated only when the topology changes.
To be able to distinguish these transient copies
from the ones that got applied, be stricter about
the ring version and change it to a unique number
using a static counter.
Next patch will update the ring version
(and consequently invalidate the cached_endpoints
on the replication strategy) every time the token_metadata
changes, not only when the topology changes.
Note that the _cached_endpoints will go away
once the transition to effective_replication_map
is finished, so this will not degrade performance.
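The effect of the static counter can be shown with a hypothetical class: clones start with their source's version, but every later change draws a fresh, globally unique number, so a transient copy never shares a version with the one that got applied:

```cpp
#include <cstdint>

// Hypothetical token metadata that stamps every ring change with a number
// drawn from a counter shared by all instances.
class token_metadata_versioned {
    static inline std::int64_t _static_ring_version = 0; // shared counter
    std::int64_t _ring_version = 0;
public:
    std::int64_t get_ring_version() const { return _ring_version; }

    // A clone keeps the version of its source until it is modified.
    token_metadata_versioned clone() const { return *this; }

    // Any change takes a new unique version instead of a per-instance one.
    void invalidate_cached_rings() { _ring_version = ++_static_ring_version; }
};
```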
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that all users of it were converted to use the
effective_replication_map, the legacy
abstract_replication_strategy::get_natural_endpoints method
can be deleted.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, we call find_keyspace and then
get_effective_replication_map on the _same_ keyspace
to get_natural_endpoints for multiple tokens.
Get the effective_replication_map once in these cases
and use it for each token.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Every time the token_metadata changes we need to update the
effective_replication_map on all non-system keyspaces.
Do that in replicate_to_all_cores after the updated token_metadata
has been replicated to all cores.
We first prepare and clone the token_metadata, then prepare
and clone the new effective_replication_maps. Any failure
at this stage is recoverable: it is handled via rollback, and the exception
is returned.
Note that any failure to _apply_ the pending token_metadata or the
effective_replication_map will cause scylla to abort.
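The prepare-then-apply split can be sketched with a vector standing in for the shards. All names are hypothetical; the sketch propagates an exception where Scylla would return an exceptional future, and relies on noexcept to terminate if applying ever throws:

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an effective_replication_map.
struct erm_stub {
    long version;
};

// Hypothetical per-shard state holding the currently applied map.
struct shard_state {
    std::shared_ptr<const erm_stub> erm;
};

using prepare_fn = std::function<std::shared_ptr<const erm_stub>(std::size_t shard)>;

void update_all_shards(std::vector<shard_state>& shards, const prepare_fn& prepare) {
    // Phase 1: prepare a new map per shard without touching live state. If
    // prepare() throws, `pending` is simply discarded (the rollback) and the
    // exception propagates to the caller.
    std::vector<std::shared_ptr<const erm_stub>> pending;
    pending.reserve(shards.size());
    for (std::size_t i = 0; i < shards.size(); ++i) {
        pending.push_back(prepare(i));
    }
    // Phase 2: apply by swapping pointers. The lambda is noexcept, so any
    // exception here would call std::terminate, i.e. the process aborts.
    auto apply = [&]() noexcept {
        for (std::size_t i = 0; i < shards.size(); ++i) {
            shards[i].erm = std::move(pending[i]);
        }
    };
    apply();
}
```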
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Serialize the metadata changes with
keyspace create, update, or drop.
This will become necessary in the following patch
when we update the effective_replication_map
on all keyspaces and we want the instances on all shards
to end up with the same replication map.
Note that storage_service::keyspace_changed is called
from the schema merge path, so it already holds
the merge_lock.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than a _pending_token_metadata_ptr member in the storage_service
class. This is much easier now that the function has been converted to a
coroutine.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And functions that use it, like:
keyspace::update_from
database::update_keyspace
database::create_in_memory_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>