Commit Graph

38625 Commits

Author SHA1 Message Date
Benny Halevy
cfecb68245 compaction_manager: stop: close compaction_state:s gates
Make sure the compaction_state:s are idle before
they are destroyed. Although all tasks are stopped
in stop_ongoing_compactions, make sure there is
fiber holding the compaction_state gate.

compaction_manager::remove now needs to close the
compaction_state gate and to stop_ongoing_compactions
only if the gate is not closed yet.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
96055414c7 compaction_manager: gracefully handle gate close
Check if the compaction_state gate is closed
along with _state != state::enabled and return early
in this case.

At this point entering the gate is guaranteed to succeed.
So enter the gate before calling `perform_compaction`
keeping the std::optional<gate_holder> throughout
the compaction task.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
a5b7f1a275 task_manager: task: start: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
f9a7635390 task_manager: module: make_task: enter gate when the task is created
Passing the gate_closed_exception to the task promise in start()
ends up with abandoned exception since no-one is waiting
for it.

Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.

Fixes scylladb/scylladb#15211

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
51792d2292 task_manaer: module: stop: request abort
Have a private about_source for every module
and request abort on stop() to signal all outstanding
tasks to abort (especially when they are sleeping
for the task_ttl).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
d7205db863 task_manager: task::impl: subscribe to module about_source
Rather to the top-level task_manager about_source,
to provide separation between task_manager modules
so each one can be aborted and stopped independentally
of the others (in the next patch).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
062684eb1f test: compaction_manager_stop_and_drain_race_test: stop compaction and task managers
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
b9127f55ac test: simple_backlog_controller_test: stop compaction and task managers
The compaction_manager and task_manager should
be orderly stopped before they are destroyed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Piotr Smaroń
eb46f1bd17 guardrails: restrict replication factor (RF)
Replacing `minimum_keyspace_rf` config option with 4 config options:
`{minimum,maximum}_replication_factor_{warn,fail}_threshold`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on RF when creating/altering a keyspace.
The reason to rather replace than extend `minimum_keyspace_rf` config
option is to be aligned with Cassandra, which did the same, and has the
same parameters' names.
Only min soft limit is enabled by default and it is set to 3, which means
that we'll generate a CQL warning whenever RF is set to either 1 or 2.
RF's value of 0 is always allowed and means that there will not be any
replicas on a given DC. This was agreed with PM.
Because we don't allow to change guardrails' values when scylla is
running (per PM), there're no tests provided with this PR, and dtests will be
provided separately.
Exceeding guardrails' thresholds will be tracked by metrics.

Resolves #8619
Refs #8892 (the RF part, not the replication-strategy part)

Closes #14262
2023-09-04 19:22:17 +03:00
Patryk Jędrzejczak
06c0c6a9d9 raft_group0: start leadership monitor fiber during restart
Currently, we start group 0 leadership monitor fiber only during
a normal bootstrap. However, we should also do it when we restart
a node (either with or without upgrading it to Raft).

Fixes #15166

Closes #15204
2023-09-04 10:41:50 +02:00
Kefu Chai
dd59b90999 open-coredump: pass the content of "image" file not its path to dbuild
in a4eb3c6e0f, we passed the path of
"image" to `dbuild`, but that was wrong. we should pass its content
to this script. so in this change, it is fixed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15247
2023-09-04 07:34:44 +03:00
Avi Kivity
78fc3b5f56 config: rename stream_plan_ranges_percentage to *_fraction
The value is specified as a fraction between 0 and 1, so don't
mislead users into specifying a value between 0 and 100.

Closes #15261
2023-09-03 23:24:29 +03:00
Avi Kivity
9a3d57256a Merge 'config: add index_cache_fraction' from Michał Chojnowski
Index caching was disabled by default because it caused performance regressions
for some small-partition workloads. See https://github.com/scylladb/scylladb/issues/11202.

However, it also means that there are workloads which could benefit from the
index cache, but (by default) don't.

As a compromise, we can set a default limit on the memory usage of index cache,
which should be small enough to avoid catastrophic regressions in
small-partition workloads, but big enough to accommodate workloads where
index cache is obviously beneficial.

This series adds such a configurable limit, sets it to to 0.2 of total cache memory by default,
and re-enables index caching by default.

Fixes #15118

Closes #14994

* github.com:scylladb/scylladb:
  test: boost/cache_algorithm_test: add cache_algorithm_test
  sstables: partition_index_cache: deglobalize stats
  utils: cached_file: deglobalize cached_file metrics
  db: config: enable index caching by default
  config: add index_cache_fraction
  utils: lru: add move semantics to list links
2023-09-03 19:39:31 +03:00
Dawid Medrek
a5448fade9 Simplify service_set constructor in init.hh
Leverage the comma operator with a parameter pack.

Closes #15246
2023-09-03 18:14:44 +03:00
Aleksandra Martyniuk
cf37ab96f4 api: task_manager: fix indentation
Closes #15173
2023-09-02 08:18:59 +03:00
Michał Chojnowski
bcc235ad5f test: boost/cache_algorithm_test: add cache_algorithm_test
The tests added in this patch validate that index_cache_fraction
does what it's supposed to do.
2023-09-01 22:34:41 +02:00
Michał Chojnowski
f00bed9429 sstables: partition_index_cache: deglobalize stats
Move partition_index_cache stats from a thread_local variable
to cache_tracker. After the change, partition_index_cache
receives a reference to the stats via constructor, instead of
referencing a global.

This is needed so that cache_tracker can know the memory usage
of index caches (for cache eviction purposes) without relying on
globals.

But it also makes sense even without that motive.
2023-09-01 22:34:41 +02:00
Michał Chojnowski
c7d9d35030 utils: cached_file: deglobalize cached_file metrics
Move cached_file metrics from a thread_local variable
to cache_tracker.

This is needed so that cache_tracker can know
the memory usage of index caches (for purposes
of cache eviction) without relying on globals.

But it also makes sense even without that motive.
2023-09-01 22:34:41 +02:00
Michał Chojnowski
023accf246 db: config: enable index caching by default
Index caching was disabled by default because it caused performance regressions
for some small-partition workloads. See #11202.

However, it also means that there are workloads which could benefit from the
index cache, but (by default) don't.

As a compromise, we can set a default limit on the memory usage of index cache,
which should be small enough to avoid catastrophical regressions in
small-partition workloads, but big enough to accomodate workloads where
index cache is obviously beneficial.

This patch sets such a limit to 0.2 of total cache memory, and re-enables
index caching by default.
2023-09-01 22:34:23 +02:00
Michał Chojnowski
50b429f255 config: add index_cache_fraction
Adds a configurable upper limit to memory usage by index caches.
See the source code comments added in this patch for more details.

This patch shouldn't change visible behaviour, because the limit is set to 1.0
by default, so it is never triggerred. We will change the default in a future
patch.
2023-09-01 22:34:23 +02:00
Michał Chojnowski
6a7ce6781e utils: lru: add move semantics to list links
Before the patch, fixing list links is done manually in the move constructor of
`evictable`. After the patch, it is done by the move constructors of the links
themselves.

This makes for slightly cleaner code, especially after we add more links in an
upcoming patch.
2023-09-01 22:34:23 +02:00
Tomasz Grabiec
7b65d4d947 Merge 'Gossiper: provide strong exception safety for endpoint state changes' from Benny Halevy
This series ensures that endpoint state changes (for each single endpoint) are applied to the gossiper endpoint_state_map as a whole and on all shards.
Any failure in the process will keep the existing endpoint state intact.

Note that verbs that modify the endpoint states of multiple endpoints may still succeed to modify some of them before hitting an error and those changes are committed to the endpoint_state_map, so we don't ensure atomicity when updating multiple endpoints' states.

Fixes scylladb/scylladb#14794
Fixes scylladb/scylladb#14799

Closes #15073

* github.com:scylladb/scylladb:
  gossiper: move endpoint_state by value to apply it
  gossiper: replicate: make exception safe
  gms: pass endpoint_state_ptr to endpoint_state change subscribers
  gossiper: modify endpoint state only via replicate
  gossiper: keep and serve shared endpoint_state_ptr in map
  gossiper: get_max_endpoint_state_version: get state by reference
  api/failure_detector: get_all_endpoint_states: reduce allocations
  cdc/generation: get_generation_id_for: get endpoint_state&
  gossiper: add for_each_endpoint_state helpers
  gossiper: add num_endpoints
  gossiper: add my_endpoint_state
2023-09-01 12:23:19 +02:00
Kamil Braun
117dedab19 Merge 'Cluster features on raft: topology coordinator + check on boot followups' from Piotr Dulikowski
This PR collects followups described in #14972:

- The `system.topology` table is now flushed every time feature-related
  columns are modified. This is done because of the feature check that
  happens before the schema commitlog is replayed.
- The implementation now guarantees that, if all nodes support some
  feature as described by the `supported_features` column, then support
  for that feature will not be revoked by any node.  Previously, in an
  edge case where a node is the last one to add support for some feature
  `X` in `supported_features` column, crashes before applying/persisting
  it and then restarts without supporting `X`, it would be allowed to boot
  anyway and would revoke support for the `X` in `system.topology`.
  The existing behavior, although counterintuitive, was safe - the
  topology coordinator is responsible for explicitly marking features as
  enabled, and in order to enable a feature it needs to perform a special
  kind of a global barrier (`barrier_after_feature_update`) which only
  succeeds after the node has updated its features column - so there is no
  risk of enabling an unsupported feature.  In order to make the behavior
  less confusing, the node now will perform a second check when it tries
  to update its `supported_features` column in `system.topology`.
- The `barrier_after_feature_update` is removed and the regular global
  `barrier` topology command is used instead. The `barrier` handler now
  performs a feature check if the node did not have a chance to verify and
  update its cluster features for the second time.

JOIN_NODE rpc will be sent separately as it is a big item on its own.

Fixes: #14972

Closes #15168

* github.com:scylladb/scylladb:
  test: topology{_experimental_raft}: don't stop gracefully in feature tests
  storage_service: remove _topology_updated_with_local_metadata
  topology_coordinator: remove barrier_after_feature_update
  topology_coordinator: perform feature check during barrier
  storage_service: repeat the feature check after read barrier
  feature_service: introduce unsupported_feature_exception
  feature_service: move startup feature check to a separate function
  topology_coordinator: account for features to enable in should_preempt_balancing
  group0_state_machine: flush system.topology when updating features columns
2023-09-01 11:52:26 +02:00
Nadav Har'El
5625624533 doc/dev: add document about analyzing build time
Add a document describing in detail how to use clang's "-ftime-trace"
option, and the ClangBuildAnalyzer tool, to find the source files,
header files and templates which slow down Scylla's build the most.

I've used this tool in the past to reduce Scylla build time - see
commits:

   fa7a302130 (reduced 6.5%)
   f84094320d (reduced 0.1%)
   6ebf32f4d7 (reduced 1%)
   d01e1a774b (reduced 4%)

I'm hoping that documenting how to use this tool will allow other
developers to suggest similar commits.

Refs #1.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15209
2023-09-01 11:33:36 +03:00
Botond Dénes
b6986668d4 Merge 'sstables: change make_descriptor() to accept fs::path' from Kefu Chai
to lower the programmer's cognitive load. as programmer might want
to pass the full path as the `fname` when calling
`make_descriptor(sstring sstdir, sstring fname)`, but this overload
only accepts the filename component as its second parameter. a
single `path` parameter would be easier to work with.

Refs #15187

Closes #15188

* github.com:scylladb/scylladb:
  sstable: add samples of fname to be matched by regex
  sstables: change make_descriptor() to accept fs::path
  sstables: switch entry_descriptor(sstring..) to std::string_view
  sstables: change make_descriptor() to accept fs::path
2023-09-01 10:55:25 +03:00
Kefu Chai
a4eb3c6e0f open-coredump: use the dbuild script in current branch
the dbuild script provided by the branch being debugged might not
include the recent fixes included by current branch from which
`open-coredump.sh` is launched.

so, instead of using the dbuild script in the repo being debugged,
let's use the dbuild provided by current branch. also, wrap the
dbuild command line. for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15240
2023-09-01 07:27:21 +03:00
Botond Dénes
34d94fb549 test/cql-pytest/test_tools.py: improve tempdir usage for scrub tests
Scrub tests use a lot of temporary directories. This is suspected to
cause problems in some cases. To improve the situation, this patch:
* Creates a single root temporary directory for all scrub tests
* All further fixtures create their files/directories inside this root
  dir.
* All scrub tests create their temporary directories within this root
  dir.
* All temporary directories now use an appropriate "prefix", so we can
  tell which temporary directory is part of the problem if a test fails.

Refs: #14309

Closes #15117
2023-09-01 07:17:49 +03:00
Gleb Natapov
55f047f33f raft: drop assert in server_impl::apply_snapshot for a condition that may happen
server_impl::apply_snapshot() assumes that it cannot receive a snapshots
from the same host until the previous one is handled and usually this is
true since a leader will not send another snapshot until it gets
response to a previous one. But it may happens that snapshot sending
RPC fails after the snapshot was sent, but before reply is received
because of connection disconnect. In this case the leader may send
another snapshot and there is no guaranty that the previous one was
already handled, so the assumption may break.

Drop the assert that verifies the assumption and return an error in this
case instead.

Fixes: #15222

Message-ID: <ZO9JoEiHg+nIdavS@scylladb.com>
2023-09-01 07:17:49 +03:00
Pavel Emelyanov
91cc544b05 Update seastar submodule
* seastar 0784da87...6e80e84a (29):
  > Revert "shared_token_bucket: Make duration->tokens conversion more solid"
  > Merge 'chunked_fifo: let incremetal operator return iterator not basic_iterator' from Kefu Chai
  > memory: diable transparent hugepages if --overprovisioned is specified
Ref https://github.com/scylladb/scylladb/issues/15095
  > http/exception: s/<TAB>/    /
  > install-dependencies.sh: re-add protobuf
  > Merge 'Keep capacity on fair_queue_entry' from Pavel Emelyanov
  > Merge 'Fix server-side RPC stream shutdown' from Pavel Emelyanov
Fixes https://github.com/scylladb/scylladb/issues/13100
  > smp: make service management semaphore thread local
  > tls_test: abort_accept() after getting server socket
  > Merge 'Print more IO info with ioinfo app' from Pavel Emelyanov
  > rpc: Fix client-side stream registration race
Ref https://github.com/scylladb/scylladb/issues/13100
  > tests: perf: shard_token_bucket: avoid capturing unused variables in lambdas
  > build: pass -DBoost_NO_CXX98_FUNCTION_BASE to C++ compiler
  > reactor: Drop some dangling friend declarations
  > fair_queue: Do not re-evaluate request capacity twice
  > build: do not use serial number file when signing a cert
  > shared_token_bucket: Make duration->tokens conversion more solid
  > tests: Add perf test for shard_token_bucket
  > Merge 'Make make_file_impl() less yielding' from Pavel Emelyanov
  > fair_queue: Remove individual requests counting
  > reactor, linux-aio: print value of aio-max-nr on error
  > Merge 'build, net: disable implicit fallthough' from Kefu Chai
  > shared_token_bucket: Fix duration_for() underflow
  > rpc: Generalize get_stats_internal() method
  > doc/building-dpdk.md: fix invalid file path of README-DPDK.md
  > install-dependencies: add centos9
  > Merge 'log: report scheduling group along with shard id' from Kefu Chai
  > dns: handle exception in do_sendv for udp
  > Merge 'Add a stall detector histogram' from Amnon Heiman

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #15218
2023-09-01 07:17:49 +03:00
Alexey Novikov
87fa7d0381 compact and remove expired range tombstones from cache on read
during read from cache compact and expire range tombstones
remove expired empty rows from cache

Refs #2252
Fixes #6033

Closes #14463
2023-09-01 07:17:49 +03:00
Kefu Chai
94a056bda8 sstable: add samples of fname to be matched by regex
for better readability of code

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-01 07:44:06 +08:00
Kefu Chai
a29838f9e1 sstables: change make_descriptor() to accept fs::path
change another overload of `make_descriptor()` to accept `fs::path`,
in the same spirit of a previous change in this area. so we have
a more consistent API for creating sstable descriptor. and this
new API is simpler to use.

Refs #15187
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-01 07:44:06 +08:00
Kefu Chai
50c1d9aee7 sstables: switch entry_descriptor(sstring..) to std::string_view
so its callers don't need to construct a temporary `sstring` if
the parameter's type is not `sstring`. for instance, before
this change, `entry_descriptor::make_descriptor(const std::filesystem::path...)`
would have to construct two temporary instances of `sstring`
for calling this function.

after this change, it does not have to do so.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-01 07:44:06 +08:00
Kefu Chai
6656707164 sstables: change make_descriptor() to accept fs::path
to lower the programmer's cognitive load. as programmer might want
to pass the full path as the `fname` when calling
`make_descriptor(sstring sstdir, sstring fname)`, but this overload
only accepts the filename component as its second parameter. a
single `path` parameter would be easier to work with.

Refs #15187
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-01 07:44:06 +08:00
Piotr Dulikowski
5471330ee7 test: topology{_experimental_raft}: don't stop gracefully in feature tests
The current cluster feature tests are stopping nodes in a graceful way.
Doing it gracefully isn't strictly necessary for the test scenarios and
we can switch `server_stop_gracefully` calls to `server_stop`. This only
became possible after a previous commit which causes `system.topology`
table to be flushed when cluster feature columns are modified, and will
server as a good test for it.
2023-08-31 16:46:11 +02:00
Piotr Dulikowski
62167d584b storage_service: remove _topology_updated_with_local_metadata
After removing barrier_after_feature_update, the flag is no longer
needed by anybody. The field in storage_service is removed and
do_update_topology_with_local_metadata is inlined.
2023-08-31 16:46:11 +02:00
Piotr Dulikowski
3d7cf3bfe6 topology_coordinator: remove barrier_after_feature_update
The `barrier_after_feature_update` was introduced as a variant of the
`barrier` command, meant to be used by the topology coordinator when
enabling a feature. It was meant to give more guarantees to the topology
coordinator than the regular barrier, but the regular barrier has been
adjusted in the previous commits so that it can be used instead of the
special barrier.

This commit gets rid of `barrier_after_feature_update` and replaces its
uses with `barrier`.
2023-08-31 16:46:11 +02:00
Piotr Dulikowski
1b62abfc42 topology_coordinator: perform feature check during barrier
Due to the possible situation where a node applies a command that
advertises support for a feature but crashes before applying it, there
is a period of time where a node might have its group 0 server running
but does not support all of the features. Currently, we solve the issue
by using a special `barrier_after_feature_update` which will not succeed
until the node makes sure to update its `supported_features` column (or,
since the previous commit, shuts down if it doesn't support all required
features).

However, we can make it work with regular barrier after adjusting it
slightly. In case the local metadata was not updated yet, it will
perform a feature check. This will make sure that the global barrier
issued by the topology coordinator before enabling features will not
succeed if the problematic situation occurs.
2023-08-31 16:46:11 +02:00
Piotr Dulikowski
3edfd29c86 storage_service: repeat the feature check after read barrier
We would like to guarantee the following property: if all nodes have
some feature X in their `supported_features` column in
`system.topology`, then it's no longer possible for anybody to revoke
support for it. Currently, it is not guaranteed because the following
can happen:

1. A node commits a command that updates its `supported_features`,
   marking feature X as supported. It is the last node to do so and now
   all nodes support X.
2. Node crashes before applying the command locally.
3. Node is downgraded not to support X and restarted.
4. The feature check in `enable_features_on_startup` passes because it
   happens before starting the group 0 server.
5. The `supported_features` column is updated in
   `update_topology_with_local_metadata`, removing support for X.

Even though the guarantee does not hold, it's not a problem because the
`barrier_after_metadata_update` is required to succeed on all nodes
before topology coordinator moves to enable a feature, and - as the name
suggests - it requires `update_topology_with_local_metadata` to finish.

However, choosing to give this guarantee makes it simpler to reason
about how cluster features on raft work and removes some pathological
cases (e.g. trying to downgrade some other node after step 1 will fail,
but will be again possible after step 5). Therefore, this commit adds a
second check to `update_topology_with_local_metadata` which disallows
removing support for a feature that is supported by everybody - and
stops the boot process if necessary.
2023-08-31 16:46:11 +02:00
Piotr Dulikowski
aa5401383f feature_service: introduce unsupported_feature_exception
The new `unsupported_feature_exception` is introduced so that the
exception thrown by `check_features` can be caught in a type-safe way.
2023-08-31 16:46:10 +02:00
Piotr Dulikowski
8286a2c369 feature_service: move startup feature check to a separate function
The logic responsible for checking supported features agains the
currently enabled features (and features that are unsafe to disable) is
moved to a separate function, `check_features`. Currently, it is only
used from `enable_features_on_startup`, but more checks against features
in raft will be added in the commits that follow.
2023-08-31 16:45:40 +02:00
Benny Halevy
98fd9fcc11 gossiper: move endpoint_state by value to apply it
Save a copy of the applied endpoint state by moving
the value towards replicate.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 09:35:15 +03:00
Benny Halevy
38c2347a3c gossiper: replicate: make exception safe
First replicate the new endpoint_state on all shards
before applying the replicated endpoint_state objects
to _endpoint_state_map.

Fixes scylladb/scylladb#14794

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 09:35:15 +03:00
Benny Halevy
c16ec870da gms: pass endpoint_state_ptr to endpoint_state change subscribers
Now that the endpoint_state isn't change in place
we do not need to copy it to each subscriber.
We can rather just pass the lw_shared_ptr holding
a snapshot of it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 09:35:15 +03:00
Benny Halevy
1d04242a90 gossiper: modify endpoint state only via replicate
And restrict the accessor methods to return const pointers
or refrences.

With that, the endpoint_state_ptr:s held in the _endpoint_state_map
point to immutable endpoint_state objects - with one exception:
the endpoint_state update_timestamp may be updated in place,
but the endpoint_state_map is immutable.

replicate() replaces the endpoint_state_ptr in the map
with a new one to maintain immutability.

A later change will also make this exception safe so
replicate will guarantee strong exception safety so that all shards
are updated or none of them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 09:35:15 +03:00
Benny Halevy
d00e49a1bb gossiper: keep and serve shared endpoint_state_ptr in map
This commit changes the interface to
using endpoint_state_ptr = lw_shared_ptr<const endpoint_state>
so that users can get a snapshot of the endpoint_state
that they must not modify in-place anyhow.
While internally, gossiper still has the legacy helpers
to manage the endpoint_state.

Fixes scylladb/scylladb#14799

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 09:34:36 +03:00
Benny Halevy
f33a6d37f2 gossiper: get_max_endpoint_state_version: get state by reference
No need to copy the endpoint_state since the function
is synchronous.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 09:33:42 +03:00
Benny Halevy
f1a88c01a2 api/failure_detector: get_all_endpoint_states: reduce allocations
reserve the result vector based on the known
number of endpoints and then move-construct each entry
rather than copying it.
Also, use refrences to traverse the application_state_map
rather than copying each of them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 08:34:42 +03:00
Benny Halevy
1e0d19b89d cdc/generation: get_generation_id_for: get endpoint_state&
No need to lookup the application_state again using the
endpoint, as both callers already have a reference to
the endpoint_state handy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 08:34:23 +03:00
Benny Halevy
4f5ffc7719 gossiper: add for_each_endpoint_state helpers
Before changing _endpoint_state_map to hold a
lw_shared_ptr<endpoint_state>, provide synchronous helpers
for users to traverse all endpoint_states with no need
to copy them (as long as the called func does not yield).

With that, gossiper::get_endpoint_states() can be made private.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 08:32:31 +03:00