390 Commits

Author SHA1 Message Date
Patryk Jędrzejczak
4c8dba15f1 Merge 'strong_consistency/state_machine: ensure and upgrade mutations schema' from Michał Jadwiszczak
This patch fixes 2 issues within strong consistency state machine:
- it might happen that apply is called before the schema is delivered to the node
- on the other hand, the apply may be called after the schema was changed and purged from the schema registry

The first problem is fixed by doing `group0.read_barrier()` before applying the mutations.
The second one is solved by upgrading the mutations using column mappings in case the version of the mutations' schema is older.

Fixes SCYLLADB-428

Strong consistency is in experimental phase, no need to backport.

Closes scylladb/scylladb#28546

* https://github.com/scylladb/scylladb:
  test/cluster/test_strong_consistency: add reproducer for old schema during apply
  test/cluster/test_strong_consistency: add reproducer for missing schema during apply
  test/cluster/test_strong_consistency: extract common function
  raft_group_registry: allow to drop append entries requests for specific raft group
  strong_consistency/state_machine: find and hold schemas of applying mutations
  strong_consistency/state_machine: pull necessary dependencies
  db/schema_tables: add `get_column_mapping_if_exists()`
2026-03-09 09:49:22 +01:00
Dawid Mędrek
5feed00caa Merge 'raft: read_barrier: update local commit_idx to read_idx when it's safe' from Patryk Jędrzejczak
When the local entry with `read_idx` belongs to the current term, it's
safe to update the local `commit_idx` to `read_idx`.

The motivation for this change is to speed up read barriers. `wait_for_apply`
executed at the end of `read_barrier` is delayed until the follower learns
that the entry with `read_idx` is committed. It usually happens quickly in
the `read_quorum` message. However, non-voters don't receive this message,
so they have to wait for `append_entries`. If no new entries are being
added, `append_entries` can come only from `fsm::tick_leader()`. For group0,
this happens once every 100ms.

The issue above significantly slows down cluster setups in tests. Nodes
join group0 as non-voters, and then they are met with several read barriers
just after a write to group0. One example is `global_token_metadata_barrier`
in `write_both_read_new` performed just after `update_topology_state` in
`write_both_read_old`.

I tested the performance impact of this change with the following test:
```python
for _ in range(10):
    await manager.servers_add(3)
```
It consistently takes 44-45s with the change and 50-51s without the change
in dev mode.

No backport:
- non-critical performance improvement mostly relevant in tests,
- the change requires some soak time in master.

Closes scylladb/scylladb#28891

* github.com:scylladb/scylladb:
  raft: server: fix the repeating typo
  raft: clarify the comment about read_barrier_reply
  raft: read_barrier: update local commit_idx to read_idx when it's safe
  raft: log: clarify the specification of term_for
2026-03-06 18:50:08 +01:00
Michał Jadwiszczak
6aef4d3541 test/cluster/test_strong_consistency: add reproducer for missing schema during apply 2026-03-05 13:50:16 +01:00
Patryk Jędrzejczak
f1978d8a22 raft: server: fix the repeating typo 2026-03-05 13:06:08 +01:00
Patryk Jędrzejczak
5a43695f6a raft: clarify the comment about read_barrier_reply
The comment could be misleading. It could suggest that the returned index is
already safe to read. That's not necessarily true. The entry with the
returned index could, for example, be dropped by the leader if the leader's
entry with this index had a different term.
2026-03-05 13:06:08 +01:00
Patryk Jędrzejczak
1ae2ae50a6 raft: read_barrier: update local commit_idx to read_idx when it's safe
When the local entry with `read_idx` belongs to the current term, it's
safe to update the local `commit_idx` to `read_idx`. The argument for
safety is in the new comment above `maybe_update_commit_idx_for_read`.

The motivation for this change is to speed up read barriers. `wait_for_apply`
executed at the end of `read_barrier` is delayed until the follower learns
that the entry with `read_idx` is committed. It usually happens quickly in
the `read_quorum` message. However, non-voters don't receive this message,
so they have to wait for `append_entries`. If no new entries are being
added, `append_entries` can come only from `fsm::tick_leader()`. For group0,
this happens once every 100ms.

The issue above significantly slows down cluster setups in tests. Nodes
join group0 as non-voters, and then they are met with several read barriers
just after a write to group0. One example is `global_token_metadata_barrier`
in `write_both_read_new` performed just after `update_topology_state` in
`write_both_read_old`.

Writing a test for this change would be difficult, so we trust the nemesis
tests to do the job. They have already found consistency issues in read
barriers. See #10578.
2026-03-05 13:06:08 +01:00
Patryk Jędrzejczak
1cbd0da519 raft: log: clarify the specification of term_for
When `idx > last_idx()`, the function does an out-of-bounds access to `_log`.
This may look contradictory to the current specification.
2026-03-05 13:06:07 +01:00
Dawid Mędrek
d44fc00c4c raft: Make methods throw stopped_error if server aborted
After the previous changes in `raft::server::{add_entry, modify_config}`
(cf. SCYLLADB-841), we also go through other methods of `raft::server`
and verify that they handle the aborted state properly.

I found two methods that do not:

(A) `wait_for_leader`
(B) `wait_for_state_change`

What happened before these changes?

In case (A), the dangerous scenario occurred when `_leader_promise` was
empty on entering the function. In that case, we would construct the
promise and wait on the corresponding future. However, if the server
had been already aborted before the call, the future would never
resolve and we'd be effectively stuck.

Case (B) is fully analogous: instead of `_leader_promise`, we'd work
with `_stte_change_promise`.

There's probably a more proper solution to this problem, but since I'm
not familiar with the internal code of Raft, I fix it this way. We can
improve it further in the future.

We provide two simple validation tests. They verify that after aborting
a `raft::server`, the calls:

* do not hang (the tests would time out otherwise),
* throw raft::stopped_error.

Fixes SCYLLADB-841
2026-03-04 16:28:11 +01:00
Dawid Mędrek
c200d6ab4f raft: Throw stopped_error if server aborted
Before the change, calling `add_entry` or `modify_config` on an already
aborted Raft server could result in an error `not_a_leader` containing
a null server ID. It was possible precisely when forwarding was
disabled in the server configuration.

`not_a_leader` is supposed to return the ID of the current leader,
so that was wrong. Furthermore, the description of the function
specified that if a server is aborted, then it should throw
`stopped_error`.

We fix that issue. A few small reproducer tests were provided to verify
that the functions behave correctly with and without forwarding enabled.

Refs SCYLLADB-841
2026-03-04 16:28:08 +01:00
Dawid Mędrek
e4f2b62019 raft: Describe exception types for wait_for_state_change and wait_for_leader
The methods of `raft::server` are abortable and if the passed
`abort_source` is triggered, they throw `raft::request_aborted`.
We document that.

Although `raft::server` is an interface, this is consistent with
the descriptions of its other methods.
2026-02-19 12:47:14 +01:00
Dawid Mędrek
c36623baad raft: Await instead of returning future in wait_for_state_change
The `try-catch` expression is pretty much useless in its current form.
If we return the future, the awaiting will only be performed by the
caller, completely circumventing the exception handling.

As a result, instead of handling `raft::request_aborted` with a proper
error message, the user will face `seastar::abort_requested_exception`
whose message is cryptic at best. It doesn't even point to the root
of the problem.

Fixes SCYLLADB-665
2026-02-19 12:47:14 +01:00
Petr Gusev
801bf82a34 raft.hh: make server::wait_for_leader() public
When a strongly consistent request arrives at a node, we
need to know which replica is the leader, since such requests
are generally executed only on the leader. If a leader has
not yet been elected, we must wait. This commit exposes
wait_for_leader() so it can be used for that purpose.

We cannot rely solely on wait_for_state_change(), because it does not
trigger when some other node becomes a leader.
2026-01-21 14:56:01 +01:00
Piotr Smaron
cb3b96b8f4 raft: correct lease->least typo in a comment
Funny, when researching if our raft implementation relies on the 'lease'
mechanism, I noticed this typo.

Closes scylladb/scylladb#27803
2025-12-22 14:39:55 +02:00
Pavel Emelyanov
54edb44b20 code: Stop using seastar::compat::source_location
And switch to std::source_location.
Upcoming seastar update will deprecate its compatibility layer.

The patch is

  for f in $(git grep -l 'seastar::compat::source_location'); do
    sed -e 's/seastar::compat::source_location/std::source_location/g' -i $f;
  done

and removal of few header includes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#27309
2025-11-27 19:10:11 +02:00
Radosław Cybulski
d589e68642 Add precompiled headers to CMakeLists.txt
Add precompiled header support to CMakeLists.txt and configure.py -
it improves compilation time by approximately 10%.

New header `stdafx.hh` is added, don't include it manually -
the compiler will include it for you. The header contains includes from
external libraries used by Scylla - seastar, standard library,
linux headers and zlib.

The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER`
or configure.py --disable-precompiled-header to disable.

The feature should be disabled, when trying to check headers - otherwise
you might get false negatives on missing includes from seastar / abseil and so on.

Note: following configuration needs to be added to ccache.conf:

    sloppiness = pch_defines,time_macros,include_file_mtime,include_file_ctime

Closes scylladb/scylladb#26617
2025-11-21 12:27:41 +02:00
Emil Maskovsky
6769c313c2 raft: small fixes for voters code
Minor cleanups and improvements to voter-related code.

No backport: cleanup only, no functional changes.

Closes scylladb/scylladb#26559
2025-10-16 18:41:08 +02:00
Alex Dathskovsky
5e89a78c8f raft: refactor can_vote logic and type
This PR refactors the can_vote function in the Raft algorithms for improved clarity and maintainability by providing safer strong boolean types to the raft algorithm.

Fixes: #21937

Backport: No backport required

Closes scylladb/scylladb#25787
2025-09-24 13:55:05 +02:00
Radosław Cybulski
c242234552 Revert "build: add precompiled headers to CMakeLists.txt"
This reverts commit 01bb7b629a.

Closes scylladb/scylladb#25735
2025-09-03 09:46:00 +03:00
Radosław Cybulski
01bb7b629a build: add precompiled headers to CMakeLists.txt
Add precompiled header support to CMakeLists.txt and configure.py -
it improves compilation time by approximately 10%.

New header `stdafx.hh` is added, don't include it manually -
the compiler will include it for you. The header contains includes from
external libraries used by Scylla - seastar, standard library,
linux headers and zlib.

The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER`
or configure.py --disable-precompiled-header to disable.

The feature should be disabled, when trying to check headers - otherwise
you might get false negatives on missing includes from seastar / abseil and so on.

Note: following configuration needs to be added to ccache.conf:

    sloppiness = pch_defines,time_macros

Closes #25182
2025-08-27 21:37:54 +03:00
Benny Halevy
9724d87e86 raft: server_impl: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:48 +03:00
Petr Gusev
043291a2b4 fms: extract entry_size to log_entry::get_size
We intend to reuse it in subsequent commit.
2025-02-12 14:33:41 +01:00
Botond Dénes
b2a03e03f7 Merge 'raft: Handle non-critical config update errors in when changing voter status.' from Sergey Zolotukhin
When a node is bootstrapped and joined a cluster as a non-voter and changes it's role to a voter, errors can occur while committing a new Raft record, for instance, if the Raft leader changes during this time. These errors are not critical and should not cause a node crash, as the action can be retried.

Fixes scylladb/scylladb#20814

Backport: This issue occurs frequently and disrupts the CI workflow to some extent. Backports are needed for versions 6.1 and 6.2.

Closes scylladb/scylladb#22253

* github.com:scylladb/scylladb:
  raft: refactor `remove_from_raft_config` to use a timed `modify_config` call.
  raft: Refactor functions using `modify_config` to use a common wrapper for retrying.
  raft: Handle non-critical config update errors in when changing status to voter.
  test: Add test to check that a node does not fail on unknown commit status error when starting up.
  raft: Add run_op_with_retry in raft_group0.
2025-01-16 11:00:47 +02:00
Sergey Zolotukhin
16053a86f0 test: Add test to check that a node does not fail on unknown commit status
error when starting up.

Test that a node is starting successfully if while joining a cluster and becoming a voter, it
receives an unknown commit status error.

Test for scylladb/scylladb#20814
2025-01-14 17:12:06 +01:00
Avi Kivity
9ff6473691 error_injection: replace boost::lexical_cast with std::from_chars
Replace boost with a standard facility; this reduces dependencies
as lexical_cast depends on boost ranges.

As a side effect the exception error message is improved.
2025-01-09 11:14:51 +02:00
Kefu Chai
df36985fc3 raft: do not include unused headers
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21838
2024-12-19 14:57:22 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Patryk Jędrzejczak
74dad7d1eb raft: improve logs for abort while waiting for apply
New logs allow us to easily distinguish two cases in which
waiting for apply times out:
- the node didn't receive the entry it was waiting for,
- the node received the entry but didn't apply it in time.

Distinguishing these cases simplifies reasoning about failures.
The first case indicates that something went wrong on the leader.
The second case indicates that something went wrong on the node
on which waiting for apply timed out.

As it turns out, many different bugs result in the `read_barrier`
(which calls `wait_for_apply`) timeout. This change should help
us in debugging bugs like these.

We want to backport this change to all supported branches so that
it helps us in all tests.

Closes scylladb/scylladb#21855
2024-12-10 14:23:39 +01:00
Kefu Chai
48c8d24345 treewide: drop support for fmt < v10
since fedora 38 is EOL. and fedora 39 comes with fmt v10.0.0, also,
we've switched to the build image based on fedora 40, which ships
fmt-devel v10.2.1, there is no need to support fmt < 10.

in this change, we drop the support fmt < 10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21847
2024-12-09 20:42:38 +02:00
Kefu Chai
a5ee0c896b treewide: migrate from boost::adaptors::filtered to std::views::filter
Modernize the codebase by replacing Boost range adaptors with C++23 standard library views,
reducing external dependencies and leveraging modern C++ language features.

Key Changes:
- Replace `boost::adaptors::filtered` with `std::views::filter`
- Remove `#include <boost/range/adaptor/filtered.hpp>`
- Utilize standard library range views

Motivation:
- Reduce project's external dependency footprint
- Leverage standard library's range and view capabilities
- Improve long-term code maintainability
- Align with modern C++ best practices

Implementation Challenges and Considerations:
1. Range Conversion and Move Semantics
   - `std::ranges::to` adaptor requires rvalue references
   - Necessitated updates to variable and parameter constness
   - Example: `cql3/restrictions/statement_restrictions.cc` modified to remove `const`
     from `common` to enable efficient range conversion

2. Range Iteration and Mutation
   - Range views may mutate internal state during iteration
   - Cannot pass ranges by const reference in some scenarios
   - Solution: Pass ranges by rvalue reference to explicitly indicate
     state invalidation

Limitations:
- One instance of `boost::adaptors::filtered` temporarily preserved
  due to lack of a C++23 alternative for `boost::join()`
- A comprehensive replacement will be addressed in a follow-up change

This change is part of our ongoing effort to modernize the codebase,
reducing external dependencies and adopting modern C++ practices.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21648
2024-11-26 14:26:50 +02:00
Nadav Har'El
f23800181a Merge 'Align Metric Family Descriptions' from Amnon Heiman
Metrics families (e.g., all metrics with the same name but with different labels) should have the same description.
The metric layer does not enforce that. Instead, it will use the first description provided.
It's a minor issue but the results are different than what you expect.

No need to backport.

Closes scylladb/scylladb#19947

* github.com:scylladb/scylladb:
  service/storage_proxy.cc  All metric groups should have the same description
  raft/server.cc: All metric groups should have the same description
2024-11-17 16:49:57 +02:00
Gleb Natapov
f7e7e61fa7 raft: add more information to start_read_barrier error
Add requester into to the error about requester being out of config.
Also fix a typo while we are at it.

Message-ID: <ZwVDtOty2cWy3vqD@scylladb.com>
2024-10-09 16:24:34 +02:00
Sergey Zolotukhin
612a141660 raft: Fix race condition on override_snapshot_thresholds.
When the server_impl::applier_fiber is paused by a co_await at line raft/server.cc:1375:
```
co_await override_snapshot_thresholds();
```
a new snapshot may be applied, which updates the actual values of the log's last applied
and snapshot indexes. As a result, the new snapshot index could become higher than the
old value stored in _applied_idx at line raft/server.cc:1365, leading to an assertion
failure in log::last_conf_for().
Since error injection is disabled in release builds, this issue does not affect production releases.

This issue was introduced in the following commit
9dfa041fe1,
when error injection was added to override the log snapshot configuration parameters.

How to reproduce:

1. Build debug version of randomized_nemesis_test
```
ninja-build build/debug/test/raft/randomized_nemesis_test
```
2. Run
```
parallel --halt now,fail=1 -j20 'build/debug/test/raft/randomized_nemesis_test \
--run_test=test_frequent_snapshotting  -- -c2 -m2G --overprovisioned --unsafe-bypass-fsync 1 \
--kernel-page-cache 1 --blocked-reactor-notify-ms 2000000  --default-log-level \
trace > tmp/logs/eraseme_{}.log  2>&1 && rm tmp/logs/eraseme_{}.log' ::: {1..1000}
```

Fixes scylladb/scylladb#20363

Closes scylladb/scylladb#20555
2024-09-12 16:19:27 +02:00
Kefu Chai
3e84d43f93 treewide: use seastar::format() or fmt::format() explicitly
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.

that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:

```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
  265 |     return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
      |            ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
 4290 |     format(format_string<_Args...> __fmt, _Args&&... __args)
      |     ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
  143 | format(fmt::format_string<A...> fmt, A&&... a) {
      | ^
```

in this change, we

change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
  `seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
  because, `sstring::operator std::basic_string` would incur a deep
  copy.

we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-11 23:21:40 +03:00
Abhi
9b09439065 raft: Add descriptions for requested abort errors
Fixes: scylladb/scylladb#18902

Closes scylladb/scylladb#20291
2024-09-10 17:56:29 +02:00
Sergey Zolotukhin
13b3d3a795 raft: Ensure const correctness in applier_fiber.
Add 'const' to non mutable varibales in server_impl::applier_fiber() function.
2024-08-20 15:24:00 +02:00
Sergey Zolotukhin
c3e52ab942 raft: Invoke store_snapshot_descriptor with actually preserved items.
- raft_sys_table_storage::store_snapshot_descriptor now receives a number of
preserved items in the log, rather than _config.snapshot_trailing value;
- Incorrect check for truncated number of items in store_snapshot_descriptor
was removed.

Fixes scylladb/scylladb#16817
Fixes scylladb/scylladb#20080
2024-08-20 15:22:49 +02:00
Sergey Zolotukhin
922e035629 raft: Use raft_server_set_snapshot_thresholds in tests.
Replace raft_server_snapshot_reduce_threshold with raft_server_set_snapshot_thresholds in tests
as raft_server_set_snapshot_thresholds fully covers the functionality of raft_server_snapshot_reduce_threshold.
2024-08-20 15:08:49 +02:00
Sergey Zolotukhin
00a1d3e305 raft: Fix indentation in server.cc 2024-08-20 15:08:45 +02:00
Sergey Zolotukhin
9dfa041fe1 raft: Add raft_server_set_snapshot_thresholds injection.
Use error injection to allow overriding following snapshot threshold settings:
- snapshot_threshold
- snapshot_threshold_log_size
- snapshot_trailing
- snapshot_trailing_size
2024-08-20 14:15:50 +02:00
Laszlo Ersek
4f1f207be1 raft/server: clean up index_t usage
With implicit conversion of tagged integers to untagged ones going away,
explicitly tag (or untag, as necessary) the operands of the following
operations, in "raft/server.cc":

- addition of tagged and untagged (both should be tagged)

- subscripting an array by tagged (should be untagged)

- comparing a size-like threshold against tagged (should be untagged)

- exposing tagged via gauges (should be untagged)

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
1b134d52ac raft/tracker: don't drop out of index_t space for subtraction
Tagged integers support subtraction; use it.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
b6233209d9 raft/fsm: clean up index_t and term_t usage
With implicit conversion of tagged integers to untagged ones going away,
explicitly tag (or untag, as necessary) the operands of the following
operations, in "raft/fsm.cc":

- addition of tagged and untagged (both should be tagged)

- comparison (relop) between tagged an untagged (both should be tagged)

- subscripting or sizing an array by tagged (should be untagged)

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
5b9a4428c6 raft/log: clean up index_t usage
With implicit conversion of tagged integers to untagged ones going away,
explicitly tag (or untag, as necessary) the operands of the following
operations, in raft/log.{cc,h}:

- addition of tagged and untagged (both should be tagged)

- comparison (relop) between tagged an untagged (both should be tagged)

- subscripting an array, or offsetting an iterator, by tagged (should be
  untagged)

- comparing an array bound against tagged (should be untagged)

- subtracting tagged from an array bound (should be untagged)

Note: these files mix uniform initialization syntax (index_t{...}) with
constructor call syntax (index_t()), with the former being more frequent.
Stick with the former here too, for consistency.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Amnon Heiman
23b62540dd raft/server.cc: All metric groups should have the same description 2024-07-31 10:20:39 +03:00
Gleb Natapov
9ebdb23002 raft: add more raft metrics to make debug easier 2024-07-01 10:55:22 +02:00
Kamil Braun
a441d06d6c raft: fsm: add details to on_internal_error_noexcept message
If we receive a message in the same term but from a different leader
than we expect, we print:
```
Got append request/install snapshot/read_quorum from an unexpected leader
```
For some reason the message did not include the details (who the leader
was and who the sender was) which requires almost zero effort and might
be useful for debugging. So let's include them.

Ref: scylladb/scylla-enterprise#4276

Closes scylladb/scylladb#19238
2024-06-12 17:29:42 +03:00
Yaniv Michael Kaul
82875095e9 Raft: improve descriptions of metrics
1. Fixed a single typo (send -> sent)
2. Rephrase 'How many' to 'Number of' and use less passive tense.
3. Be more specific in the description of the different metrics insteda of the more generic descriptions.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#19067
2024-06-06 15:18:47 +03:00
Kefu Chai
0b0e661a85 build: bring abseil submodule back
because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689,
the rebuilt abseil package provided by fedora has different settings
than the ones if the tree is built with the sanitizer enabled. this
inconsistency leads to a crash.

to address this problem, we have to reinstate the abseil submodule, so
we can built it with the same compiler options with which we build the
tree.

in this change

* Revert "build: drop abseil submodule, replace with distribution abseil"
* update CMake building system with abseil header include settings
* bump up the abseil submodule to the latest LTS branch of abseil:
  lts_2024_01_16
* update scylla-gdb.py to adapt to the new structure of
  flat_hash_map

This reverts commit 8635d24424.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18511
2024-05-05 23:31:09 +03:00
Kefu Chai
168ade72f8 treewide: replace formatter<std::string_view> with formatter<string_view>
in in {fmt} before v10, it provides the specialization of `fmt::formatter<..>`
for `std::string_view` as well as the specialization of `fmt::formatter<..>`
for `fmt::string_view` which is an implementation builtin in {fmt} for
compatibility of pre-C++17. and this type is used even if the code is
compiled with C++ stadandard greater or equal to C++17. also, before v10,
the `fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. after v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using `format_as()` machinery, so it's
`format()` method does not actually accept `std::string_view`, it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.

this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would have following compilation failure:

```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
  254 |     return formatter<std::string_view>::format(it->second, ctx);
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
 2759 |   FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
      |                      ^      ~~~~~~~~~~~~
```

because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::format<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood -- we do not
use the specialization created by `FMT_FORMAT_AS` which inherit
from `formatter<fmt::string_view>`, hopefully this can improve
the compilation speed a little bit. also, this change addresses
the build failure with {fmt} v10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18299
2024-04-19 07:44:07 +03:00