Commit Graph

4858 Commits

Author SHA1 Message Date
Kamil Braun
101c1d50f0 Merge 'fix nodetool status to show zero-token nodes' from Abhinav Kumar Jha
Currently, nodetool status doesn’t display information about zero-token nodes. For example, if the administrator spins up 5 nodes, 2 of which are zero-token nodes, nodetool status shows only the 3 non-zero-token nodes.

This commit fixes the issue by using the “/storage_service/host_id” API and adding the appropriate logic in scylla-nodetool.cc to support zero-token nodes.

A test is also added in nodetool/test_status.py to verify this logic. The test fails without this commit’s zero-token node support, which confirms the fix.

This PR fixes a bug, so it needs to be backported. Only 6.2 needs the backport,
since earlier versions don't support zero-token nodes.

Fixes: scylladb/scylladb#19849
Fixes: scylladb/scylladb#17857

Closes scylladb/scylladb#20909

* github.com:scylladb/scylladb:
  fix nodetool status to show zero-token nodes
  test: move `wait_for_first_completed` to pylib/util.py
  token_metadata: rename endpoint_to_host_id_map getter and add support for joining nodes
2024-10-28 12:19:36 +01:00
Kefu Chai
24d14b601b treewide: s/boost::adaptors::map_values/std::views::values/
now that we are allowed to use C++23, we have the luxury of using
`std::views::values`.

in this change, we:

- replace `boost::adaptors::map_values` with `std::views::values`
- update affected code to work with `std::views::values`
- the places where we use `boost::join()` are not changed, because
  we cannot use `std::views::concat` yet. this helper is only
  available in C++26.

this reduces the dependency on boost for better maintainability, and
leverages standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21265
2024-10-27 21:32:45 +02:00
Abhinav
c00d40b239 fix nodetool status to show zero-token nodes
Currently, nodetool status doesn’t display information about zero-token
nodes. For example, if the administrator spins up 5 nodes, 2 of which are
zero-token nodes, nodetool status shows only the 3 non-zero-token nodes.

This commit fixes the issue by using the “/storage_service/host_id” API
and adding the appropriate logic in scylla-nodetool.cc to support zero-token nodes.

Robust topology tests are added, which spin up scylla nodes and confirm the nodetool
status output for various cases, providing good coverage.
A test is also added in nodetool/test_status.py to verify this logic. These tests fail
without this commit’s zero-token node support, which confirms the fix.

The test `test_status_keyspace_joining_node` has been removed. This test is
based on the case where host_id=None, which is impossible. Since we now use
host_id_map for node discovery in nodetool, nodes with "host_id=None"
go undetected. Since this case is impossible anyway, we can get rid of the test.

This PR fixes a bug, so it needs to be backported. Only 6.2 needs the backport,
since earlier versions don't support zero-token nodes.

Fixes: scylladb/scylladb#19849
2024-10-25 13:28:09 +05:30
Kamil Braun
f5c60e538d Merge 'cql/tablets: fix retrying ALTER tablets KEYSPACE' from Piotr Smaron
ALTER tablets-enabled KEYSPACES (KS) may fail due to
`group0_concurrent_modification`, in which case it's repeated by a `for`
loop surrounding the code. But because raft's `add_entry` consumes the
raft's guard (by `std::move`'ing the guard object), retries of ALTER KS
will use a moved-from guard object, which is UB, potentially a crash.
The fix is to remove the aforementioned `for` loop altogether and rethrow the exception, as the `rf_change` event
will be repeated by the topology state machine if it receives the
concurrent modification exception, because the event will remain present
in the global requests queue, hence it's going to be executed as the
very next event.
Note: refactor is implemented in the follow-up commit.

Fixes: scylladb/scylladb#21102

Should be backported to every 6.x branch, as it may lead to a crash.

Closes scylladb/scylladb#21121

* github.com:scylladb/scylladb:
  test: add UT to test retrying ALTER tablets KEYSPACE
  cql/tablets: fix indentation in `rf_change` event handler
  cql/tablets: fix retrying ALTER tablets KEYSPACE
2024-10-23 10:01:21 +02:00
Piotr Smaron
522bede8ec test: add UT to test retrying ALTER tablets KEYSPACE
The newly added testcase is based on the already existing
`test_alter_dropped_tablets_keyspace`.
A new error injection is created, which stops the ALTER execution just
before the changes are submitted to RAFT. In the meantime, a new schema
change is performed using the 2nd node in the cluster, thus causing the
1st node to retry the ALTER statement.
2024-10-22 18:22:01 +02:00
Piotr Smaron
3f4c8a30e3 cql/tablets: fix indentation in rf_change event handler
Just moved the code that previously was under a `for` loop by 1 tab, i.e. 4 spaces, to the left.
2024-10-22 18:22:01 +02:00
Piotr Smaron
de511f56ac cql/tablets: fix retrying ALTER tablets KEYSPACE
ALTER tablets-enabled KEYSPACES (KS) may fail due to
`group0_concurrent_modification`, in which case it's repeated by a `for`
loop surrounding the code. But because raft's `add_entry` consumes the
raft's guard (by `std::move`'ing the guard object), retries of ALTER KS
will use a moved-from guard object, which is UB, potentially a crash.
The fix is to remove the aforementioned `for` loop altogether and rethrow the exception, as the `rf_change` event
will be repeated by the topology state machine if it receives the
concurrent modification exception, because the event will remain present
in the global requests queue, hence it's going to be executed as the
very next event.
`topology_coordinator::handle_topology_coordinator_error` handling the
case of `group0_concurrent_modification` has been extended with logging
in order not to write catch-log-throw boilerplate.
Note: refactor is implemented in the follow-up commit.
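
The hazard and the shape of the fix can be sketched in isolation. Everything
below is a hypothetical stand-in: `guard` models the group0 guard,
`concurrent_modification` models `group0_concurrent_modification`, and this
`add_entry` only mimics the fact that the real call consumes the guard.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the group0 guard: valid until moved from.
struct guard {
    std::unique_ptr<int> token = std::make_unique<int>(1);
    bool valid() const { return token != nullptr; }
};

// Hypothetical stand-in for group0_concurrent_modification.
struct concurrent_modification : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Mimics add_entry: takes ownership of the guard and may report a conflict.
void add_entry(guard g, bool conflict) {
    guard consumed = std::move(g);
    if (conflict) {
        throw concurrent_modification("concurrent modification");
    }
}

// Shape of the fix: no local retry loop. The guard is moved exactly once and
// a conflict propagates to the caller, which retries with a fresh guard.
void apply_change(guard g, bool conflict) {
    add_entry(std::move(g), conflict);
}
```

A retry loop around `add_entry(std::move(g), ...)` would hand the callee an
already moved-from guard on its second iteration, which is the bug described
above.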

Fixes: scylladb/scylladb#21102
2024-10-22 18:22:00 +02:00
Benny Halevy
04d741bcbb storage_service: on_change: update_peer_info only if peer info changed
Return an optional peer_info from get_peer_info_for_update
when the `app_state_map` arg does not change peer_info,
so that we can skip calling update_peer_info, if it didn't
change.
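
A minimal sketch of the pattern, assuming heavily simplified types: the
`peer_info` and `app_state_change` structs and the signature below are
hypothetical and do not match the real `storage_service` code.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins for the real types.
struct peer_info {
    std::optional<std::string> data_center;
    std::optional<std::string> rack;
};
struct app_state_change {
    std::optional<std::string> data_center;
    std::optional<std::string> rack;
};

// Returns a populated peer_info only when the change touches a tracked
// field, and std::nullopt otherwise, so the caller can skip the update.
std::optional<peer_info> get_peer_info_for_update(const app_state_change& change) {
    if (!change.data_center && !change.rack) {
        return std::nullopt;
    }
    return peer_info{change.data_center, change.rack};
}
```

The caller then writes `if (auto info = get_peer_info_for_update(change)) { ... }`
and calls the update function only inside that branch.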

Fixes scylladb/scylladb#20991
Refs scylladb/scylladb#16376

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#21152
2024-10-22 10:26:08 +02:00
Kefu Chai
6ead5a4696 treewide: move log.hh into utils/log.hh
the log.hh under the root of the tree was created to keep backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to the `utils` directory, as it is based solely
on seastar, and can be used by all subsystems.

in this change, we move log.hh into utils/log.hh so that it is more
modularized. this also improves readability: when one sees
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-10-22 06:54:46 +03:00
Kefu Chai
5cd619a60c treewide: s/boost::adaptors::map_keys/std::views::keys/
now that we are allowed to use C++23, we have the luxury of using
`std::views::keys`.

in this change, we:

- replace `boost::adaptors::map_keys` with `std::views::keys`
- update affected code to work with `std::views::keys`

this reduces the dependency on boost for better maintainability, and
leverages standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21198
2024-10-21 12:47:52 +03:00
Kefu Chai
d28d64f7fe service: remove extraneous space in #pragma once
to be more consistent with the rest of the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21188
2024-10-20 20:27:38 +03:00
Avi Kivity
c3be2489ce treewide: drop includes of <boost/range/adaptors.hpp>
This includes way too much, including <boost/regex.hpp>, which is huge.
Drop includes of adaptors.hpp and replace by what is needed.

Closes scylladb/scylladb#21187
2024-10-20 17:17:11 +03:00
Kamil Braun
f02afefd34 Merge 'raft: consider the gossiper state then sending the group0 state id' from Emil Maskovsky
Skip the advertisement of the group0 state id in case the gossiper is
not active (ready).

Sending the application state when the gossiper is not active caused
a warning to be shown in the log about the local endpoint not being
found in the gossiper endpoint state map on a (graceful) node restart.

The local endpoint is initialized on the gossiper startup, so we skip
the state id advertisement until the startup is finished.

Fixes: scylladb/scylladb#21117

No backport: Fixes an issue that is currently only present in master

Closes scylladb/scylladb#21119

* github.com:scylladb/scylladb:
  raft: consider the gossiper state then sending the group0 state id
  raft: add the test for GROUP0_STATE_ID gossip application state
2024-10-17 13:41:15 +03:00
Emil Maskovsky
e082fef32c raft: remove the group0 state id handler stop check
The stop assertion check in the group0 state id handler was triggering
under some circumstances (stopping server during restart). In that case
it might be that the stop is initiated before the server is fully
initialized, and then the handler destructor is being called without
calling to the `stop()` method first. This is a valid scenario.

The whole `stop()` in the group0 state id handler is not necessary,
as the only operation being done is cancelling the timer which is done
by the timer destructor automatically anyway.

There is the concern of a currently running timer callback, but it
doesn't preempt (not async) so the timer shouldn't be destroyed before
the callback finishes.

Fixes: scylladb/scylladb#21074

Closes scylladb/scylladb#21127
2024-10-17 13:41:15 +03:00
Emil Maskovsky
3f1af268c2 raft: consider the gossiper state then sending the group0 state id
Skip the advertisement of the group0 state id in case the gossiper is
not active (ready).

Sending the application state when the gossiper is not active caused
a warning to be shown in the log about the local endpoint not being
found in the gossiper endpoint state map on a (graceful) node restart.

The local endpoint is initialized on the gossiper startup, so we skip
the state id advertisement until the startup is finished.

Fixes: scylladb/scylladb#21117
2024-10-16 19:26:25 +02:00
Kefu Chai
32f508d450 raft: fix typo in logging message
s/miminum/minimum/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21073
2024-10-16 06:33:43 +03:00
Avi Kivity
d59038fa93 storage_proxy: convert boost range algorithms to std::ranges
Standardize on a single range library.

The changes are mostly mechanical. The only exception is boost::join,
which has no analog in std::ranges (rightly so, since it cannot be
implemented efficiently). A variety of tricks were used to convert it:

 - use std::ranges::join() on an std::array of std::span (when the
   inputs were all contiguous)
 - copy to a utils::small_vector (when it is expected that there will
   be no allocation)
 - use a small_vector of pointers and iterate+dereference that

Closes scylladb/scylladb#21082
2024-10-15 16:52:27 +02:00
Tomasz Grabiec
3e438d23e1 Merge 'Check system.tablets update before putting it into the table' from Pavel Emelyanov
Having tablet metadata with more than 1 pending replica will prevent this metadata from being (re)loaded due to sanity check on load. This patch fails the operation which tries to save the wrong metadata with a similar sanity check. For that, changes submitted to raft are validated, and if it's topology_change that affects system.tablets, the new "replicas" and "new_replicas" values are checked similarly to how they will be on (re)load.

fixes #20043

Closes scylladb/scylladb#21020

* github.com:scylladb/scylladb:
  tablets: Validate system.tablets update
  group0_client: Introduce change validation
  group0_client: Add shared_token_metadata dependency
2024-10-15 00:38:59 +02:00
Kamil Braun
96070bb5b3 Merge 'storage_proxy: Add conditions checking to avoid UB in speculating read executors.' from Sergey Zolotukhin
During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking:

- Condition checking in speculating read executors for the number of replicas.
- Checking the consistency of the Effective Replication Map in filter_for_query(): the map is considered incorrect if the list of replicas contains a node from a data center whose replication factor is 0.

Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs.

Refs scylladb/scylladb#20625

As this issue applies to released versions and can affect clients, we need backports to 6.0, 6.1, 6.2.

Closes scylladb/scylladb#20851

* github.com:scylladb/scylladb:
  Add conditions checking for get_read_executor
  Avoid an extra call to block_for in db::filter_for_query.
  Improve code readability in consistency_level.cc and storage_proxy.cc
  tools: Add build_info header with functions providing build type information
  tests: Add tests for alter table with RF=1 to RF=0
2024-10-11 15:02:02 +02:00
Kamil Braun
4d99cd2055 Merge 'raft: fast tombstone GC for group0-managed tables' from Emil Maskovsky
Add the gossip state for broadcasting the nodes state_id.

Implemented the Group0 state broadcaster (based on the gossip) that will broadcast the state id of each node and check the minimal state id for the tombstone GC.

When there is a change in the tombstone GC minimal state id, the state broadcaster will update the tombstone GC time for the group0-managed tables.

The main component of the change is the newly added `group0_state_id_handler` that keeps track, broadcasts and receives the last group0 state_ids across all nodes and sets the tombstone GC deletion time accordingly:
* on each group0 change applied, the state_id handler broadcasts the state_id as a gossip state (only if the value has changed)
* the handler checks for the node state ids every refresh period (configurable, 1h by default)
* on every check, the handler figures out the lowest state_id (timeuuid), which is the state_id that all of the nodes already have
* the timestamp of this minimum state_id is then used to set the tombstone GC deletion time
* the tombstone GC calculation then uses that deletion time to provide the GC time back to the callers, e.g. when doing the compaction
* (as the time for tombstone GC calculation has 1s granularity, we actually deduct 1s from the determined timestamp, because some newer mutations may have been received in the same second and not yet distributed across the nodes)
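
The periodic check in the list above can be sketched as below. This is a
simplified model under stated assumptions: node ids are plain strings, each
state_id is reduced to a timestamp in whole seconds, and `group0_gc_time`,
`members` and `reported` are hypothetical names.

```cpp
#include <algorithm>
#include <chrono>
#include <map>
#include <optional>
#include <set>
#include <string>

using timestamp = std::chrono::seconds;

// `members` models the group0 configuration; `reported` models the state_id
// timestamps learned through gossip (both hypothetical).
std::optional<timestamp> group0_gc_time(const std::set<std::string>& members,
                                        const std::map<std::string, timestamp>& reported) {
    std::optional<timestamp> lowest;
    for (const auto& node : members) {
        const auto it = reported.find(node);
        if (it == reported.end()) {
            return std::nullopt; // a member has not reported yet: skip this round
        }
        lowest = lowest ? std::min(*lowest, it->second) : it->second;
    }
    if (!lowest) {
        return std::nullopt; // empty configuration: nothing to compute
    }
    // Deduct 1s to account for the 1s granularity of the GC calculation.
    return *lowest - std::chrono::seconds(1);
}
```

Returning `std::nullopt` models the rule that no GC is done until every
known node has provided its state_id.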

This change introduces a new flag to the static schema descriptor (`is_group0_table`) that is being checked for this newly added mode in the tombstone GC. We also add a check (in non-release builds only) on every group0 modification that the table has this flag set.

The group0 tombstone GC handling is similar to the "repair" tombstone GC mode in a sense (that the tombstone GC time is determined according to a reconciliation action), however it is not explicitly visible to (nor editable by) the user. And also the tombstone GC calculation is much simpler than the "repair" mode calculation - for example, we always use the whole range (as opposed to the "repair" mode that can have specific repair times set for specific ranges).

We use the group0 configuration to determine the set of nodes (both current and previous in case of joint configuration) - we need to make sure that we account for all the group0 nodes (if any node didn't provide the state_id yet, the current check round will be skipped, i.e. no GC will be done until all known nodes provide their state_id timestamp value).

Also note that the group0 state_id handling works on all nodes independently, i.e. each node might have its own (possibly different) state depending on the gossip application state propagation. This is however not a problem, as some nodes might be behind, but they will catch up eventually, and this solution has the benefit of being distributed (as opposed to having a central point to handle the state, like for example the topology coordinator that has been considered in the early stages of the design).

Fixes: scylladb/scylla#15607

New feature, should not be backported.

Closes scylladb/scylladb#20394

* github.com:scylladb/scylladb:
  raft: add the check for the group0 tables
  raft: fast tombstone GC for group0-managed tables
  tombstone_gc: refactor the repair map
  raft: flag the group0-managed tables
  gossip: broadcast the group0 state id
  raft/test: add test for the group0 tombstone GC
  treewide: code cleanup and refactoring
2024-10-11 11:52:27 +02:00
Sergey Zolotukhin
c373edab2d Add conditions checking for get_read_executor
During the investigation of scylladb/scylladb#20282, it was discovered that
implementations of speculating read executors have undefined behavior
when called with an incorrect number of read replicas. This PR
introduces two levels of condition checking:

- Condition checking in speculating read executors for the number of replicas.
- Checking the consistency of the Effective Replication Map in
  get_endpoints_for_reading(): the map is considered incorrect if the number of
  read replica nodes is higher than the replication factor. The check is
  applied only when built in non-release mode.

Please note: This PR does not fix the issue found in scylladb/scylladb#20282;
it only adds condition checks to prevent undefined behavior in cases of
inconsistent inputs.

Refs scylladb/scylladb#20625
2024-10-11 09:38:25 +02:00
Sergey Zolotukhin
ad93cf5753 Improve code readability in consistency_level.cc and storage_proxy.cc
Add const correctness and rename some variables to improve code readability.
2024-10-11 09:38:25 +02:00
Pavel Emelyanov
1863ccd900 tablets: Validate system.tablets update
Implement change validation for raft topology_change command. For now
the only check is that the "pending replicas" contains at most one
entry. The check mirrors similar one in `process_one_row` function.

If not passed, this prevents system.tablets from being updated with the
mutation(s) that will not be loaded later.
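
A minimal sketch of such a check, assuming that "pending" means the replicas
present in the new set but absent from the current one. The `replica` alias
and the function name `validate_tablet_update` are hypothetical; the real
validation works on mutations of system.tablets.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <stdexcept>
#include <vector>

using replica = int; // hypothetical stand-in for a tablet replica

// Rejects an update whose new replica set adds more than one replica that
// is not in the current set (assumed definition of "pending replicas").
void validate_tablet_update(const std::set<replica>& replicas,
                            const std::set<replica>& new_replicas) {
    std::vector<replica> pending;
    std::set_difference(new_replicas.begin(), new_replicas.end(),
                        replicas.begin(), replicas.end(),
                        std::back_inserter(pending));
    if (pending.size() > 1) {
        throw std::invalid_argument("too many pending replicas");
    }
}
```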

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-10 12:39:58 +03:00
Pavel Emelyanov
e5bf376cbc group0_client: Introduce change validation
Add validate_change() methods (well, a template and an overload) that
are called by prepare_command() and are supposed to validate the
proposed change before it hits persistent storage

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-10 12:31:52 +03:00
Pavel Emelyanov
f09fe4f351 group0_client: Add shared_token_metadata dependency
It will be needed later to get tablet_metadata from.
The dependency is "OK", shared_token_metadata is low-level sharded
service. Client already references db::system_keyspace, which in turn
references replica::database which, finally, references token_metadata

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-10 12:27:46 +03:00
Pavel Emelyanov
7163fbcef5 Merge 'utils: replace dependency on boost ranges with <ranges>' from Avi Kivity
To avoid depending on two similar libraries (boost ranges and std <ranges>), replace
uses of the former with the latter. This series tackles the utils/ directory.

Code cleanup, no backport.

Closes scylladb/scylladb#20997

* github.com:scylladb/scylladb:
  utils: logalloc: replace boost with std
  utils: lsa: chunked_managed_vector: replace boost with std
  utils: config_file: replace boost with std
  utils: loading_cache: replace boost with std
  utils: fragment_range: replace boost with std
  utils: error_injector: replace boost with std
  utils: crc: replace boost for_each with built-in range for
  utils: class_registrator: replace boost with std
  utils: chunked_vector: replace boost with std
  utils: observable: replace boost with std
2024-10-09 16:04:48 +03:00
Piotr Smaron
e0c1a51642 cql/tablets: handle MVs in ALTER tablets KEYSPACE
ALTERing tablets-enabled KEYSPACES (KS) didn't account for materialized
views (MV), and only produced tablets mutations changing tables.
With this patch we're producing tablets mutations for both tables and
MVs, hence when e.g. we change the replication factor (RF) of a KS, both the
tables' RFs and MVs' RFs are updated along with tablets replicas.
The `test_tablet_rf_change` testcase has been extended to also verify
that MVs' tablets replicas are updated when RF changes.

Fixes: #20240

Closes scylladb/scylladb#21007
2024-10-09 10:51:18 +02:00
Emil Maskovsky
0c9308cf48 raft: add the check for the group0 tables
Added the runtime check to ensure that all the tables that are used with
the group0 commands are marked as group0 tables.
2024-10-08 21:08:11 +02:00
Emil Maskovsky
a03e98d6e8 raft: fast tombstone GC for group0-managed tables
Set the tombstone GC time for group0-managed tables to the minimal state
id of the group0 nodes.

The check is being done based on a timer, iterating through each node
(according to the group0 topology configuration) and taking the minimum
across all nodes.

This minimum timestamp is then used to set the tombstone GC time
for the tombstone GC of all the group0-managed tables.

Fixes: scylladb/scylla#15607
2024-10-08 21:07:30 +02:00
Emil Maskovsky
baea9cfa67 gossip: broadcast the group0 state id
Implemented the group0 state_id handler (based on the gossip) that will
broadcast the group0 state id of each node.

This will be used to set the tombstone GC time for the group0 tables.
2024-10-08 20:53:54 +02:00
Emil Maskovsky
a840949ea0 treewide: code cleanup and refactoring
Fix the clang-tidy warnings, code cleanup and improvements.

Applied the clang format to the updated places.
2024-10-08 20:53:54 +02:00
Kamil Braun
1b9337bf99 Merge 'Wait for all users of group0 server to complete before destroying it' from Gleb Natapov
Group0 server is often used in asynchronous contexts, but we do not wait
for those uses to complete before destroying the server. We already have
a shutdown gate for it, so let's use it in those async functions.

Also make sure to signal group0 abort source if initialization fails.

Fixes scylladb/scylladb#20701

Backport to 6.2 since it contains af83c5e53e and it made the race easier to hit, so tests became flaky.

Closes scylladb/scylladb#20891

* github.com:scylladb/scylladb:
  group: hold group0 shutdown gate during async operations
  group0: Stop group0 if node initialization fails
2024-10-08 13:46:54 +02:00
Gleb Natapov
d62fbd795b storage_proxy: make sure there is no end iterator in _live_iterators array
storage_proxy::cancellable_write_handlers_list::update_live_iterators
assumes that iterators in _live_iterators can be dereferenced, but
the code does not make any attempt to make sure this is the case. The
iterator can be the end iterator which cannot be dereferenced.

The patch makes sure that there is no end iterator in _live_iterators.

Fixes scylladb/scylladb#20874

Closes scylladb/scylladb#20977
2024-10-08 13:16:27 +03:00
Avi Kivity
b259389a3e utils: observable: replace boost with std
2024-10-07 21:11:07 +03:00
Gleb Natapov
e642f0a86d group: hold group0 shutdown gate during async operations
Wait for all outstanding async work that uses group0 to complete before
destroying group0 server.

Fixes scylladb/scylladb#20701
2024-10-06 17:20:52 +03:00
Gleb Natapov
ba22493a69 group0: Stop group0 if node initialization fails
Commit af83c5e53e moved aborting of group0 into the storage service
drain function. But it is not called if the node fails during initialization
(if it failed to join the cluster, for instance). So let's abort on both
paths (but only once).
2024-10-06 17:20:52 +03:00
Botond Dénes
07094c3e44 Merge 'replica: Fix tombstone GC during tablet split preparation' from Raphael "Raph" Carvalho
During split prepare phase, there will be more than 1 compaction group with
overlapping token range for a given replica.

Assume tablet 1 has sstable A containing deleted data, and sstable B containing
a tombstone that shadows data in A.

Then split starts:
1) sstable B is split first, and moved from main (unsplit) group to a
split-ready group
2) now compaction runs in split-ready group before sstable A is split

tombstone GC logic today only looks at the underlying group, so compaction in step
2 will ignore the deleted data in A, since it belongs to another group (the
unsplit one), and so the tombstone can be purged incorrectly.

To fix it, compaction will now work with all uncompacting sstables that belong
to the same replica, since tombstone GC requires all sstables that possibly
contain shadowed data to be available for correct decision to be made.

Fixes https://github.com/scylladb/scylladb/issues/20044.

Branches 6.0, 6.1 and 6.2 are vulnerable, so backport is needed.

Closes scylladb/scylladb#20939

* github.com:scylladb/scylladb:
  replica: Fix tombstone GC during tablet split preparation
  service: Improve error handling for split
2024-10-04 10:29:42 +03:00
Raphael S. Carvalho
bcd358595f service: Improve error handling for split
Retry wasn't really happening since the loop was broken and sleep
part was skipped on error. Also, we were treating abort of split
during shutdown as if it were an actual error and that confused
longevity tests that parse for logs with error level. The fix is
about demoting the level of logs when we know the exception comes
from shutdown.

Fixes #20890.
2024-10-02 11:23:44 -03:00
Sergey Zolotukhin
6398b7548c config: Add a warning about use of IP address for join topology and replace
operations.

When the '--ignore-dead-nodes-for-replace' config option contains
IP addresses, a warning will be logged, notifying the user that
using IP addresses with this option is deprecated and will no
longer be supported in the next release.

Fixes scylladb/scylladb#19218
2024-10-02 11:56:59 +02:00
Sergey Zolotukhin
3b9033423d utils: Optimizations for utils::split_comma_separated_list and usage of host_id_or_endpoint lists
- utils::split_comma_separated_list now accepts a reference to sstring instead
  of a copy to avoid extra memory allocations. Additionally, the results of
  trimming are moved to the resulting vector instead of being copied.
- service/storage_service removenode, raft_removenode, find_raft_nodes_from_hoeps,
  parse_node_list and api/storage_service::set_storage_service were changed to use
  std::vector<host_id_or_endpoint> instead of std::list<host_id_or_endpoint> as
  std::vector is a more cache-friendly structure, resulting in better performance.
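
The first point can be sketched with standard types. This is a hypothetical
re-implementation that uses `std::string` instead of `sstring`; dropping
empty items is a choice made for this sketch and is not claimed to match the
real function.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch: takes the input by const reference (no copy of the
// whole list) and moves each trimmed item into the result vector.
std::vector<std::string> split_comma_separated_list(const std::string& input) {
    std::vector<std::string> result;
    std::string_view rest(input);
    while (true) {
        const auto comma = rest.find(',');
        const std::string_view item = rest.substr(0, comma);
        const auto first = item.find_first_not_of(" \t");
        if (first != std::string_view::npos) {
            const auto last = item.find_last_not_of(" \t");
            std::string trimmed(item.substr(first, last - first + 1));
            result.push_back(std::move(trimmed)); // moved, not copied
        }
        if (comma == std::string_view::npos) {
            break;
        }
        rest.remove_prefix(comma + 1);
    }
    return result;
}
```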
2024-10-02 11:56:59 +02:00
Benny Halevy
5a0f3889e0 treewide: use std::ranges sort functions rather than boost
Using the standard library is preferred over boost.

In cql3/expr/expression.cc to_sorted_vector got more of a
face-lift and was modernized to also use std::unique
and, while at it, to move its input range into the uniquely sorted
result vector.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-10-01 14:19:05 +03:00
Pavel Emelyanov
1dfe780457 cql: Check that CREATEing tablets/vnodes is consistent with the CLI
There are two bits that control whether the replication strategy for a
keyspace will use tablets or not -- the configuration option and CQL
parameter. This patch tunes its parsing to implement the logic shown
below:

    if (strategy.supports_tablets) {
         if (cql.with_tablets == on) {
             if (cfg.enable_tablets) {
                 return create_keyspace_with_tablets();
             } else {
                 throw "tablets are not enabled";
             }
         } else if (cql.with_tablets == off) {
              return create_keyspace_without_tablets();
         } else { // cql.with_tablets is not specified
              if (cfg.enable_tablets) {
                  return create_keyspace_with_tablets();
              } else {
                  return create_keyspace_without_tablets();
              }
         }
     } else { // strategy doesn't support tablets
         if (cql.with_tablets == on) {
             throw "invalid cql parameter";
         } else if (cql.with_tablets == off) {
             return create_keyspace_without_tablets();
         } else { // cql.with_tablets is not specified
             return create_keyspace_without_tablets();
         }
     }
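
The same decision table can be written as a compilable sketch. The names
`ks_kind` and `choose_keyspace_kind` are hypothetical, the tri-state CQL
parameter is modeled as `std::optional<bool>`, and the exception types are
placeholders.

```cpp
#include <optional>
#include <stdexcept>

enum class ks_kind { tablets, vnodes };

// with_tablets: std::nullopt = not specified in CQL, true = on, false = off.
ks_kind choose_keyspace_kind(bool strategy_supports_tablets,
                             bool cfg_enable_tablets,
                             std::optional<bool> with_tablets) {
    if (!strategy_supports_tablets) {
        if (with_tablets.value_or(false)) {
            throw std::invalid_argument("invalid cql parameter");
        }
        return ks_kind::vnodes; // off or unspecified
    }
    if (!with_tablets) {
        // Not specified in CQL: follow the configuration option.
        return cfg_enable_tablets ? ks_kind::tablets : ks_kind::vnodes;
    }
    if (!*with_tablets) {
        return ks_kind::vnodes; // explicitly off
    }
    if (!cfg_enable_tablets) {
        throw std::runtime_error("tablets are not enabled");
    }
    return ks_kind::tablets; // explicitly on and enabled in the config
}
```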

closes: #20088

In order to enable tablets "by default" for NetworkTopologyStrategy
there's explicit check near ks_prop_defs::get_initial_tablets(), that's
not very nice. It needs more care to fix it, e.g. provide feature
service reference to abstract_replication_strategy constructor. But
since ks_prop_defs code already hijacks options specifically for that
strategy type (see prepare_options() helper), it's OK for now.

There's also #20768 misbehavior that's preserved in this patch, but
should be fixed eventually as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20779
2024-10-01 10:54:29 +02:00
Avi Kivity
884297ae2e raft_group0_client: uninclude "raft_group0_registry.hh"
Reduce unnecessary recompilations.
2024-09-28 17:25:11 +03:00
Avi Kivity
67cdd0d389 raft_group_registry: extract raft_timeout
It is a vocabulary term that shouldn't need the registry to be visible.
Extract it to a new header.
2024-09-28 17:25:03 +03:00
Avi Kivity
93afc77307 raft_group0_client: uninclude "mutation/mutation.hh"
Lighten the dependency load. Some constructors and destructors
are uninlined to avoid the header depending on the mutation class.
2024-09-28 16:31:53 +03:00
Avi Kivity
5d68efe0bd raft_group0_client: uninclude "db/system_keyspace.hh"
It doesn't need it apart from a forward declaration.

Files that lost necessary includes are adjusted, and some users
of auth_version_t are redirected to the definition outside system_keyspace.
2024-09-28 16:31:53 +03:00
Kamil Braun
9224e48d6b Merge 'Populate raft address map from gossiper on raft configuration change' from Gleb Natapov
For each new node added to the raft config populate its ID to IP mapping
in raft address map from the gossiper. The mapping may have expired if a
node is added to the raft configuration long after it first appears in
the gossiper.

Fixes scylladb/scylladb#20600

Backport to all supported versions since the bug may cause bootstrapping failure.

Closes scylladb/scylladb#20601

* github.com:scylladb/scylladb:
  test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join
  group0: make sure that address map has an entry for each new node in the raft configuration
2024-09-26 12:41:25 +02:00
Lakshmi Narayanan Sreethar
7beea03196 build: cmake: link cql3 library to the service library
After commit d16ea0af, compiling the server using cmake fails with the
following error:

```
FAILED: service/CMakeFiles/service.dir/Dev/qos/service_level_controller.cc.o
...
/home/Scylla/scylladb/cql3/util.hh:21:10: fatal error: 'cql3/CqlParser.hpp' file not found
   21 | #include "cql3/CqlParser.hpp"
      |          ^~~~~~~~~~~~~~~~~~~~
1 error generated.
```

Fix it by linking the cql3 library to the service library.

Closes scylladb/scylladb#20805
2024-09-26 09:17:30 +03:00
Gleb Natapov
9e4cd32096 test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join
2024-09-25 17:10:09 +03:00
Kamil Braun
7d8f1d251a Merge 'Mark node as being replaced earlier' from Gleb Natapov
Before 17f4a151ce the node was marked as
being replaced in the join_group0 state, before it actually joins group0,
so by the time it actually joins and starts transferring the snapshot/log no
traffic is sent to it. The commit changed this to mark the node as
being replaced after the snapshot/log is already transferred, so
traffic can reach the node while it still has not caught up with the
leader, and this may cause problems since the state is not complete.
Mark the node as being replaced earlier, but still add the new node to
the topology later as the commit above intended.

Fixes: scylladb/scylladb#20629

Needs to be backported since this is a regression.

Closes scylladb/scylladb#20743

* github.com:scylladb/scylladb:
  test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts
  topology coordinator:: mark node as being replaced earlier
  topology coordinator: do metadata barrier before calling finish_accepting_node() during replace
2024-09-25 15:46:12 +02:00