Coverage handling is divided into 3 steps:
1. Generation of profiling data from a run of an instrumented file
(which this patch doesn't cover)
2. Processing of profiling data, which involves indexing the profile and
producing the data in some format that can be manipulated and
unified.
3. Generate some reporting based on this data.
The following patch is aiming to deal with the last two steps by providing a
cli and a library for this end.
This patch adds two libraries:
1. `coverage_utils.py` which is a library for manipulating coverage
data, it also contains a cli for the (assumed) most common operations
that are needed in order to eventually generate coverage reporting.
2. `lcov_utils.py` - which is a library to deal with lcov format data,
which is a textual form containing a source dependant coverage data.
An example of such manipulation can be `coverage diff` operation
which produces a set like difference operation. cov_a - cov_b = diff
where diff is an lcov formated file containing coverage data for code
cov_a that is not covered at all in cov_b.
The libraries and cli main goal is to provide a unified way to handle
coverage data in a way that can be easily scriptable and extensible.
This will pave the way for automating the coverage reporting and
processing in test.py and in jenkins piplines (for example to also
process dtest or sct coverage reporting)
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
We aim to support code coverage reporting as part of our development
process, to this end, we will need the ability to "route" the dumped
profiles from scylla and unit test to a predetermined location.
We can consider profile data as logged data that should persist after
tests have been run.
For this we add two supported options to test.py:
--coverage - which means that all suits on all modes will participate in
coverage.
--coverage-mode - which can be used to "turn on" coverage support only
for some of the modes in this run.
The strategy chosen is to save the profile data in
`tmpdir`/mode/coverage/%m.profraw (ref:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program)
This means that for every suite the profiling data of each object is
going to be merged into the same file (llvm claims to lock the file so
concurrency is fine).
More resolution than the suite level seems to not give us anything
useful (at least not at the moment). Moreover, it can also be achieved
by running a single test.
Data in the suite level will help us to detect suits that don't generate
coverage data at all and to fix this or to skip generating the profiles
for them.
Also added support of 'coverage' parameter in the `suite.yaml` file,
which can be used to disable coverage for a specific suite, this
parameter defaults to True but if a suite is known to not generate
profiles or the suite profile data is not needed or obfuscate the result
it can be set to false in order to cancel profiles routing and
processing for this suite.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Service level controller updates itself in interval. However the interval time is hardcoded in main to 10 seconds and it leads to long sleeps in some of the tests.
This patch moves this value to `service_levels_interval_ms` command line option and sets this value to 0.5s in cql-pytest.
Closesscylladb/scylladb#16394
* github.com:scylladb/scylladb:
test:cql-pytest: change service levels intervals in tests
configure service levels interval
Refs #16757
Allows waiting for all previous and pending segment deletes to finish.
Useful if a caller of `discard_completed_segments` (i.e. a memtable
flush target) not only wants to ensure segments are clean and released,
but thoroughly deleted/recycled, and hence no treat to resurrecting
data on crash+restart.
Test included.
Closesscylladb/scylladb#16801
To enable tablets replication one needs to turn on the (experimental) feature and specify the `initial_tablets: N` option when creating a keyspace. We want tablets to become default in the future and allow users to explicitly opt it out if they want to.
This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is
* `CREATE KEYSPACE ...` will turn tablets on or off based on cluster feature being enabled/disabled
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }` will turn tablets off regardless of what
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }` will try to enable tablets with default configuration
* `CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }` is now the replacement for `REPLICATION = { ... 'initial_tablets': <int> }` thing
fixes: #16319Closesscylladb/scylladb#16364
* github.com:scylladb/scylladb:
code: Enable tablets if cluster feature is enabled
test: Turn off tablets feature by default
test: Move test_tablet_drain_failure_during_decommission to another suite
test/tablets: Enable tables for real on test keyspace
test/tablets: Make timestamp local
cql3: Add feature service to as_ks_metadata_update()
cql3: Add feature service to ks_prop_defs::as_ks_metadata()
cql3: Add feature service to get_keyspace_metadata()
cql: Add tablets on/off switch to CREATE KEYSPACE
cql: Move initial_tablets from REPLICATION to TABLETS in DDL
network_topology_strategy: Estimate initial_tablets if 0 is set
This change is intended to remove the dependency to
operator<<(std::ostream&, const std::unordered_set<seastar::sstring>&)
from test/boost/cql_auth_query_test.cc.
It prepares the test for removal of the templated helpers.
Such removal is one of goals of the referenced issue that is linked below.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#16758
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we
* define a formatter for `auth::resource` and friends,
* update their callers of `operator<<` to use `fmt::print()`.
* drop `operator<<`, as they are not used anymore.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16765
because the CMake-generated build.ninja is located under build/,
and it puts the `scylla` executable at build/$CMAKE_BUILD_TYPE/scylla,
instead of at build/$scylla_build_mode/scylla, so let's adapt to this
change accordingly.
we will promote this change to a shared place if we have similar
needs in other tests as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16775
If the TABLETS map is missing in the CREATE KEYSPACE statement the
tablets are anyway enabled if the respective cluster feature is enabled.
To opt-out keyspaces one may use TABLETS = { 'enabled': false } syntax.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will make per-keyspace initial_tables option really
optional and turn tablets ON when the feature is ON. This will break all
other tests' assumptions, that they are testing vnodes replication. So
turn the feature off by default, tests that do need tables will need to
explicitly enable this feature on their own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In its current location it will be started with 3 pre-created scylla
nodes with default features ON. Next patch will exclude `tablets` from
the default list, so the test needs to create servers on its own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When started cql_test_env creates a test keyspace. Some tablets test
cases create a table in this keyspace, but misuse the whole feature. The
thing is that while tablets feature is ON in those test cases, the
keyspace itself doesn _not_ have the initial_tables option and thus
tablets are not enabled for the ks' table for real. Currently test cases
work just because this table is only used as a transparent table ID
placeholder. If turning on tablets for the keyspace, several test cases
would get broken for two reasons.
First, the tables map will no longer be empty on test start.
Second, applying changes to tablet metadata may not be visible, becase
test case uses "ranom" timestamp, that can be less that the initial
metadata mutations' timestamp.
This patch fixes all three places:
1. enables tables for the test keyspace
2. removes assumption that the initial metadata is empty
3. uses large enough timestamp for subsequent mutations
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the user can do
CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }
to turn tablets off. It will be useful in the future to opt-out keyspace
from tablets when they will be turned on by default based on cluster
features only.
Also one can do just
CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }
and let Scylla select the initial tablets value by its own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch changes the syntax of enabling tablets from
CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }
to be
CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }
and updates all tests accordingly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test runs two bootstraps and checks that there is no cleanup
in between. Then it runs a decommission and checks that cleanup runs
automatically and then it runs one more decommission and checks that no
cleanup runs again. Second part checks manual cleanup triggering. It
adds a node, triggers cleanup through the REST API, checks that is runs,
decommissions a node and check that the cleanup did not run again.
This test creates a 5 node cluster with 2 down nodes (A and B). After
that it creates a queue of 3 topology operation: bootstrap, removenode
A and removenode B with ignore_nodes=A. Check that all operation
manage to complete. Then it downs one node and creates a queue with
two requests: bootstrap and decommission. Since none can proceed both
should be canceled.
Introduce new REST API "/storage_service/cleanup_all"
that, when triggered, instructs the topology coordinator to initiate
cluster wide cleanup on all dirty nodes. It is done by introducing new
global command "global_topology_request::cleanup".
In the next patch we want to abort topology operations if there is no
enough live nodes to perform them. This will break tests that do a
topology operation right after restarting a node since a topology
coordinator may still not see the restarted node as alive. Fix all those
tests to wait between restart and a topology operation until UP state
propagates.
The loop in `id2ip` lambda makes problems if we are applying an old raft
log that contains long-gone nodes. In this case, we may never receive
the `IP` for a node and stuck in the loop forever. In this series we
replace the loop with an if - we just don't update the `host_id <-> ip`
mapping in the `token_metadata.topology` if we don't have an `IP` yet.
The PR moves `host_id -> IP` resolution to the data plane, now it
happens each time the IP-based methods of `erm` are called. We need this
because IPs may not be known at the time the erm is built. The overhead
of `raft_address_map` lookup is added to each data plane request, but it
should be negligible. In this PR `erm/resolve_endpoints` continues to
treat missing IP for `host_id` as `internal_error`, but we plan to relax
this in the follow-up (see this PR first comment).
Closesscylladb/scylladb#16639
* github.com:scylladb/scylladb:
raft ips: rename gossiper_state_change_subscriber_proxy -> raft_ip_address_updater
gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes
storage_service: topology_state_load: remove IP waiting loop
storage_service: sync_raft_topology_nodes: add target_node parameter
storage_service: sync_raft_topology_nodes: move loops to the end
storage_service: sync_raft_topology_nodes: rename extract process_left_node and process_transition_node
storage_service: sync_raft_topology_nodes: rename add_normal_node -> process_normal_node
storage_service: sync_raft_topology_nodes: move update_topology up
storage_service: topology_state_load: remove clone_async/clear_gently overhead
storage_service: fix indentation
storage_service: extract sync_raft_topology_nodes
storage_service: topology_state_load: move remove_endpoint into mutate_token_metadata
address_map: move gossiper subscription logic into storage_service
topology_coordinator: exec_global_command: small refactor, use contains + reformat
storage_service: wait_for_ip for new nodes
storage_service.idl.hh: fix raft_topology_cmd.command declaration
erm: for_each_natural_endpoint_until: use is_vnode == true
erm: switch the internal data structures to host_id-s
erm: has_pending_ranges: switch to host_id
When a node changes its IP we need to store the mapping in
system.peers and update token_metadata.topology and erm
in-memory data structures.
The test_change_ip was improved to verify this new
behaviour. Before this patch the test didn't check
that IPs used for data requests are updated on
IP change. In this commit we add the read/write check.
It fails on insert with 'node unavailable'
error without the fix.
We are going to remove the IP waiting loop from topology_state_load
in subsequent commits. An IP for a given host_id may change
after this function has been called by raft. This means we need
to subscribe to the gossiper notifications and call it later
with a new id<->ip mapping.
In this preparatory commit we move the existing address_map
update logic into storage_service so that in later commits
we can enhance it with topology_state_load call.
This change is intended to remove the dependency to
operator<<(std::ostream&, const std::unordered_set<T>&)
from auth_resource_test.cc.
It prepares the test for removal of the templated helpers
from utils/to_string.hh, which is one of goals of the
referenced issue that is linked below.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#16754
This patch adds a reproducer test for the memory leak described in
issue #16493: If a table is repeatedly created and dropped, memory
is leaked by task tracking. Although this "leak" can be temporary
if task_ttl_in_seconds is properly configured, it may still use too
much memory if tables are too frequently created and dropped.
The test here shows that (before #16493 was fixed) as little as
100 tables created and deleted can cause Scylla to run out of
memory.
The problem is severely exacerbated when tablets are used which is
why the test here uses tablets. Before the fix for #16493 (a Seastar
patch, scylladb/seastar#2023), this test of 100 iterations always
failed (with test/cql-pytest/run's default memory allowance).
After the fix, the test doesn't fail in 100 iterations - and even
if increased manually to 10,000 iterations it doesn't fail.
The new test uses the initial_tablets feature, so requires Scylla to be
run with the "tablets" experimental option turned on. This is not
currently the default of test.py or test/cql-pytest/run, so I turned
it on manually to check this test. I also checked that the test is
correctly skipped if tablets are not turned on.
Refs #16493
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16717
Make compaction tasks internal. Drop all internal tasks without parents
immediately after they are done.
Fixes: #16735
Refs: #16694.
Closesscylladb/scylladb#16698
* github.com:scylladb/scylladb:
compaction: make regular compaction tasks internal
tasks: don't keep internal root tasks after they complete
this is to mimic the formatting of `human_readable_value`, and to prepare for consolidating these two formatters, so we don't have two pretty printers in the tree.
Closesscylladb/scylladb#16726
* github.com:scylladb/scylladb:
utils/pretty_printers: add "I" specifier support
utils/pretty_printers: use the formatting of to_hr_size()
before this change, "{:d}" is used for formatting `test_data` y
bptree_stress_test.cc. but the "d" specifier is only used for
formatting integers, not for formatting `test_data` or generic
data types, so this fails when the test is compiled with {fmt} v10,
like:
```
In file included from /home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:20:
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:294:35: error: call to consteval function 'fmt::basic_format_string<char, test_data &, test_data &>::basic_format_string<char[31], 0>' is not a constant expression
294 | fmt::print(std::cout, "Iterator broken, {:d} != {:d}\n", val, *_fwd);
| ^
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:267:20: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::forward_check' requested here
267 | return forward_check();
| ^
/home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:92:35: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::step' requested here
92 | if (!itc->step()) {
| ^
/usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression
2322 | if (!in(arg_type, set)) throw_format_error("invalid format specifier");
| ^
```
in this change, instead of specifying "{:d}", let's just use "{}",
which works for both integer and `test_data`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16727
Store schema_ptr in reader permit instead of storing a const pointer to
schema to ensure that the schema doesn't get changed elsewhere when the
permit is holding on to it. Also update the constructors and all the
relevant callers to pass down schema_ptr instead of a raw pointer.
Fixes#16180
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#16658
this is to mimic the formatting of `human_readable_value`, and
to prepare for consolidating these two formatters, so we don't have
two pretty printers in the tree.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
keep the precision of 4 digits, for instance, so that we format
"8191" as "8191" instead of as "8 Ki". this is modeled after
the behavior of `to_hr_size()`. for better user experience.
and also prepares to consolidate these two formatters.
tests are updated to exercise both IEC and SI notations.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertUpdateIfConditionStaticsTest.java into our
cql-pytest framework.
This test file checks various LWT conditional updates which involve
static columns or UDTs (there are separate test file for LWT conditional
updates that do not involve static columns).
This test did not uncover any new bugs, but demonstrates yet again
several places where we intentionally deviated from Cassandra's behavior,
forcing me to add "is_scylla" checks in many of the checks to allow
them to pass on both Scylla and Cassanda. These deviations are known,
intentional and some are documented in docs/kb/lwt-differences.rst but
not all, so it's worth listing here the ones re-discovered by this test:
1. On a successful conditional write, Cassandra returns just True, Scylla
also returns the old contents of the row. This difference is officially
documented in docs/kb/lwt-differences.rst.
2. On a batch request, Scylla always returns a row per statement,
Cassandra doesn't - it often returns just a single failed row,
or just True if the whole batch succeeded. This difference is
officially documented in docs/kb/lwt-differences.rst.
3. In a DELETE statement with a condition, in the returned row
Cassandra lists the deleted column first - while Scylla lists
the static column first (as in any other row). This difference
is probably inconsequential, because columns also have names
so their order in the response usually doesn't matter.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16643
The recently-added test test_fromjson_timestamp_submilli demonstrated a
difference between Scylla's and Cassandra's parsing timestamps in JSON:
Trying to use too many (more than 3) digits of precision is forbidden
in Scylla, but ignored in Cassandra. So we marked the test "xfail",
suggesting we think it's a Scylla bug that should be fixed in the future.
However, it turns out that we already had a different test,
test_type_timestamp_from_string_overprecise, which showed the same
difference in a different context (without JSON). In that older test,
the decision was to consider this a Cassandra bug, not Scylla bug -
because Cassandra seemingly allows the sub-millisecond timestap but
in reality drops the extra precision.
So we need to be consistent in the tests - this is either a Scylla bug
or a Cassandra bug, we can't make once choice in one test and another
in a different test :-) So let's accept our older decision, and consider
Scylla's behavior the correct one in this case.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16586
when stopping the ManagerClient, it would be better to close
all connected connector, otherwise aiohttp complains like:
```
13:57:53.763 ERROR> Unclosed connector
connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x7f939d2ca5f0>, 96672.211256817)]']
connector: <aiohttp.connector.UnixConnector object at 0x7f939d2da890>
```
this warning message is printed to the console, and it is distracting
when testing manually.
so, in this change, let's close the client connecting to unix domain
socket.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16675
this target is used by test.py for enumerating unit tests
* test/CMakeLists.txt: append executable's full path to
`scylla_tests`. add `unit_test_list` target printing
`scylla_tests`, please note, `cmake -E echo` does not
support the `-e` option of `echo`, and ninja does not
support command line with newline in it, we have to use
`echo` to print the list of tests.
* test/{boost,raft,unit}/CMakeLists.txt: set scylla_tests
only if $PWD/suite.yaml exists. we could hardwire this
logic in these files, as it is known that this file
exists in these directory, but this is still put this way,
so that it serves as a comment explaining that the reason
why we update scylla_tests here but not somewhere else
where we also use `add_scylla_test()` function is just
suite.yaml exists here.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16702
Regular compaction tasks are internal.
Adjust test_compaction_task accordingly: modify test_regular_compaction_task,
delete test_running_compaction_task_abort (relying on regular compaction)
which checks are already achived by test_not_created_compaction_task_abort.
Rename the latter.
The default error handler throws an exception, which means scylla-sstable will exit with exception if there is any problem in the configuration. Not even ScyllaDB itself is this harsh -- it will just log a warning for most errors. A tool should be much more lenient. So this patch passes an error handler which just logs all errors with debug level.
If reading an sstable fails, the user is expected to investigate turning debug-level logging on. When they do so, they will see any problems while reading the configuration (if it is relevant, e.g. when using EAR).
Fixes: #16538Closesscylladb/scylladb#16657
* github.com:scylladb/scylladb:
tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()
tools/scylla-sstable: allow always passing --scylla-yaml-file option
The removenode operation is defined to succeed only if the node
being removed is dead. Currently, we reject this operation on the
initiator side (in `storage_service::raft_removenode`) when the
failure detector considers the node being removed alive. However,
it is possible that even if the initiator considers the node dead,
the topology coordinator will consider it alive when handling the
topology request. For example, the topology coordinator can use
a bigger failure detector timeout, or the node being removed can
suddenly resurrect.
This PR makes the topology coordinator reject removenode if the
node being removed is considered alive. It also adds
`test_remove_alive_node` that verifies this change.
Fixesscylladb/scylladb#16109Closesscylladb/scylladb#16584
* github.com:scylladb/scylladb:
test: add test_remove_alive_node
topology_coordinator: reject removenode if the removed node is alive
test: ManagerClient: remove unused wait_for_host_down
test: remove_node: wait until the node being removed is dead
The default error handler throws an exception, which means
scylla-sstable will exit with exception if there is any problem in the
configuration. Not even ScyllaDB itself is this harsh -- it will just
log a warning for most errors. A tool should be much more lenient. So
this patch passes an error handler which just logs all errors with debug
level.
If reading an sstable fails, the user is expected to investigate turning
debug-level logging on. When they do so, they will see any problems
while reading the configuration (if it is relevant, e.g. when using EAR).
Fixes: #16538
The previous commit has fixed 5 bugs of the same type - incorrectly
passing the default nullptr to one of the changed functions. At
least some of these bugs wouldn't appear if there was no default
value. It's much harder to make this kind of a bug if you have to
write "nullptr". It's also much easier to detect it in review.
Moreover, these default values are rarely used outside tests.
Keeping them is just not worth the time spent on debugging.
The test test_filter_expression.py::test_filter_expression_precedence
is flaky - and can fail very rarely (so far we've only actually seen it
fail once). The problem is that the test generates items with random
clustering keys, chosen as an integer between 1 and 1 million, and there
is a chance (roughly 2/10,000) that two of the 20 items happen to have the
same key, so one of the items is "lost" and the comparison we do to the
expected truth fails.
The solution is to just use sequential keys, not random keys.
There is nothing to gain in this test by using random keys.
To make this test bug easy to reproduce, I temporarily changed
random_i()'s range from 1,000,000 to 3, and saw the test failing every
single run before this patch. After this patch - no longer using
random_i() for the keys - the test doesn't fail any more.
Fixes#16647
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16649
Bootstrap cannot proceed if cdc generation propagation to all nodes
fails, so the patch series handles the error by rolling the ongoing
topology operation back.
* 'gleb/raft-cdc-failure' of github.com:scylladb/scylla-dev:
test: add test to check failure handling in cdc generation commit
storage_service: topology coordinator: rollback on failure to commit cdc generation
Currently, `add_saved_endpoint` is called from two paths: One, is when
loading states from system.peers in the join path (join_cluster,
join_token_ring), when `_raft_topology_change_enabled` is false, and the
other is from `storage_service::topology_state_load` when raft topology
changes are enabled.
In the later path, from `topology_state_load`, `add_saved_endpoint` is
called only if the endpoint_state does not exist yet. However, this is
checked without acquiring the endpoint_lock and so it races with the
gossiper, and once `add_saved_endpoint` acquires the lock, the endpoint
state may already be populated.
Since `add_saved_endpoint` applies local information about the endpoint
state (e.g. tokens, dc, rack), it uses the local heart_beat_version,
with generation=0 to update the endpoint states, and that is
incompatible with changes applies via gossip that will carry the
endpoint's generation and version, determining the state's update order.
This change makes sure that the endpoint state is never update in
`add_saved_endpoint` if it has non-zero generation. An internal error
exception is thrown if non-zero generation is found, and in the only
call site that might reach that state, in
`storage_service::topology_state_load`, the caller acquires the
endpoint_lock for checking for the existence of the endpoint_state,
calling `add_saved_endpoint` under the lock only if the endpoint_state
does not exist.
Fixes#16429Closesscylladb/scylladb#16432
* github.com:scylladb/scylladb:
gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
storage_service: topology_state_load: lock endpoint for add_saved_endpoint
raft_group_registry: move on_alive error injection to gossiper
* seastar e0d515b6...70349b74 (33):
> util/log: drop unused function
> util/log, rpc, core: use compile-time formatting with fmtlib >= 8.0
> Fix edge case in memory sampler at OOM
> exp/geo distribution benchmark
> Additional allocation tests
> Remove null pointer check on free hot path
> Optimize final part of allocation hot path
> Optimize zero size checking in allocator
> memory: Optimize free fast path
> memory: Optimize small alloc alloation path
> memory: Limit alloc_sites size
> memory: Add general comment about sampling strategy
> memory: Use probabilistic sampler
> util: Adapt memory sampler to seastar
> util: Import Android Memory Sampler
> memory: Use separate small pool for tracking sampled allocations
> memory: Support enabling memory profiling at runtime
> util/source_location-compat: mark `source_location::current()` consteval
> build: use new behavior defined by CMP0155 when building C++ modules
> circleci: build with C++20 modules enabled
> seastar.cc: replace cryptopp with gnutls when building seastar modules
> alien: include used header
> seastar.cc: include used headers in the global purview
> docker: install clang-tools-17
> net/tcp: generate a random src_port hashed to current shard if smp::count > 1
> net, websocket: replace Crypto++ calls with GnuTLS
> README-DPDK.md: point user to DPDK's quick start guide
> reactor: print fatal error using logger as well
> Avoid ping-pong in spinlock::lock
> memory: Add allocator perf tests
> memory: Add a basic sized deletion test
> Prometheus: Disable Prometheus protobuf with a configuration
> treewide: bring back prometheus protobuf support
* test/manual/sstable_scan_footprint_test: update to adapt to the
breaking change of "memory: Use probabilistic sampler" in seastar
Closesscylladb/scylladb#16610
Boost::dynamic_linking was introduced as a compatibility target
which adds "BOOST_ALL_DYN_LINK" macro on Win32 platform. but since
Scylla only runs on Linux, there is no need to link against this
library.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16544
we format `std::variant<std::monostate, seastar::timed_out_error,
raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown,
raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>`
in this source file. and currently, we format `std::variant<..>` using
the default-generated `fmt::formatter` from `operator<<`, so in order to
format it using {fmt}'s compile-time check enabled, we have to make the
`operator<<` overload for `std::variant<...>` visible from the caller
sites which format `std::variant<...>` using {fmt}.
in this change, the `operator<<` for `std::variant<...>` is moved to
from the middle of the source file to the top of it, so that it can
be found when the compiler looks up for a matched `fmt::formatter`
for `std::variant<...>`.
please note, we cannot use the `fmt::formatter` provided by `fmt/std.h`,
as its specialization for `std::variant` requires that all the types
of the variant is `is_formattable`. but the default generated formatter
for type `T` is not considered as the proof that `T` is formattable.
this should address the FTBFS with the latest seastar like:
```
/usr/include/fmt/core.h:2743:12: error: call to deleted constructor of 'conditional_t<has_formatter<mapped_type, context>::value, formatter<mapped_type, char_type>, fallback_formatter<stripped_type, char_type>>' (aka 'fmt::detail::fallback_formatter<std::variant<std::monostate, seastar::timed_out_error, raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown, raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>>')
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16616
we are using `seastar::format()` to format `append_entry` in
`append_reg_model`, so we have to provide a `fmt::formatter` for
these callers which format `append_entry`.
despite that, with FMT_DEPRECATED_OSTREAM, the formatter is defined
by fmt v9, we don't have it since fmt v10. so this change prepares us
for fmt v10.
Refs https://github.com/scylladb/scylladb/issues/13245Closesscylladb/scylladb#16614
* github.com:scylladb/scylladb:
test: randomized_nemesis_test: add formatter for append_entry
test: randomized_nemesis_test: move append_reg_model::entry out
In 7d5e22b43b ("replica: memtable: don't forget memtable
memory allocation statistics") we taught memtable_list to remember
learned memory allocation reserves so a new memtable inherits these
statistics from an older memtable. Share it now further across tablets
that belong to the same table as well. This helps the statistics be more
accurate for tablets that are migrated in, as they can share existing
tablet's memory allocation history.
Closesscylladb/scylladb#16571
* github.com:scylladb/scylladb:
table, memtable: share log-structured allocator statistics across all memtables in a table
memtable: consolidate _read_section, _allocating_section in a struct