Commit Graph

6125 Commits

Author SHA1 Message Date
Eliran Sinvani
00a55abdd6 code coverage: Add libraries for coverage handling
Coverage handling is divided into 3 steps:
1. Generation of  profiling data from a run of an instrumented file
   (which this patch doesn't cover)
2. Processing of profiling data, which involves indexing the profile and
   producing the data in some format that can be manipulated and
   unified.
3. Generate some reporting based on this data.

The following patch is aiming to deal with the last two steps by providing a
cli and a library for this end.
This patch adds two libraries:
1. `coverage_utils.py` which is a library for manipulating coverage
   data, it also contains a cli for the (assumed) most common operations
   that are needed in order to eventually generate coverage reporting.
2. `lcov_utils.py` - which is a library to deal with lcov format data,
   which is a textual form containing a source dependant coverage data.
   An example of such manipulation can be `coverage diff` operation
   which produces a set like difference operation. cov_a - cov_b = diff
   where diff is an lcov formated file containing coverage data for code
   cov_a that is not covered at all in cov_b.

The libraries and cli main goal is to provide a unified way to handle
coverage data in a way that can be easily scriptable and extensible.

This will pave the way for automating the coverage reporting and
processing in test.py and in jenkins piplines (for example to also
process dtest or sct coverage reporting)

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Eliran Sinvani
f4b6c9074a test.py: support --coverage and --coverage-mode
We aim to support code coverage reporting as part of our development
process, to this end, we will need the ability to "route" the dumped
profiles from scylla and unit test to a predetermined location.
We can consider profile data as logged data that should persist after
tests have been run.

For this we add two supported options to test.py:
--coverage - which means that all suits on all modes will participate in
             coverage.
--coverage-mode - which can be used to "turn on" coverage support only
                  for some of the modes in this run.

The strategy chosen is to save the profile data in
`tmpdir`/mode/coverage/%m.profraw (ref:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program)
This means that for every suite the profiling data of each object is
going to be merged into the same file (llvm claims to lock the file so
concurrency is fine).
More resolution than the suite level seems to not give us anything
useful (at least not at the moment). Moreover, it can also be achieved
by running a single test.
Data in the suite level will help us to detect suits that don't generate
coverage data at all and to fix this or to skip generating the profiles
for them.

Also added support of  'coverage' parameter in the `suite.yaml` file,
which can be used to disable coverage for a specific suite, this
parameter defaults to True but if a suite is known to not generate
profiles or the suite profile data is not needed or obfuscate the result
it can be set to false in order to cancel profiles routing and
processing for this suite.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Botond Dénes
f22fc88a64 Merge 'Configure service levels interval' from Michał Jadwiszczak
Service level controller updates itself in interval. However the interval time is hardcoded in main to 10 seconds and it leads to long sleeps in some of the tests.

This patch moves this value to `service_levels_interval_ms` command line option and sets this value to 0.5s in cql-pytest.

Closes scylladb/scylladb#16394

* github.com:scylladb/scylladb:
  test:cql-pytest: change service levels intervals in tests
  configure service levels interval
2024-01-17 12:24:49 +02:00
Calle Wilund
af0772d605 commitlog: Add wait_for_pending_deletes
Refs #16757

Allows waiting for all previous and pending segment deletes to finish.
Useful if a caller of `discard_completed_segments` (i.e. a memtable
flush target) not only wants to ensure segments are clean and released,
but thoroughly deleted/recycled, and hence no treat to resurrecting
data on crash+restart.

Test included.

Closes scylladb/scylladb#16801
2024-01-17 09:30:55 +02:00
Tomasz Grabiec
49026dc319 Merge 'Turn on tablets on keyspace by default when the feature is enabled' from Pavel Emelyanov
To enable tablets replication one needs to turn on the (experimental) feature and specify the `initial_tablets: N` option when creating a keyspace. We want tablets to become default in the future and allow users to explicitly opt it out if they want to.

This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is

* `CREATE KEYSPACE ...` will turn tablets on or off based on cluster feature being enabled/disabled
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }` will turn tablets off regardless of what
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }` will try to enable tablets with default configuration
* `CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }` is now the replacement for `REPLICATION = { ... 'initial_tablets': <int> }` thing

fixes: #16319

Closes scylladb/scylladb#16364

* github.com:scylladb/scylladb:
  code: Enable tablets if cluster feature is enabled
  test: Turn off tablets feature by default
  test: Move test_tablet_drain_failure_during_decommission to another suite
  test/tablets: Enable tables for real on test keyspace
  test/tablets: Make timestamp local
  cql3: Add feature service to as_ks_metadata_update()
  cql3: Add feature service to ks_prop_defs::as_ks_metadata()
  cql3: Add feature service to get_keyspace_metadata()
  cql: Add tablets on/off switch to CREATE KEYSPACE
  cql: Move initial_tablets from REPLICATION to TABLETS in DDL
  network_topology_strategy: Estimate initial_tablets if 0 is set
2024-01-16 00:15:10 +01:00
Patryk Wrobel
aec0db1b96 cql_auth_query_test.cc: do not rely on templated operator<<
This change is intended to remove the dependency to
operator<<(std::ostream&, const std::unordered_set<seastar::sstring>&)
from test/boost/cql_auth_query_test.cc.

It prepares the test for removal of the templated helpers.
Such removal is one of goals of the referenced issue that is linked below.

Refs: #13245

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16758
2024-01-15 13:30:05 +02:00
Kefu Chai
fc97d91f1a auth: add fmt::format for auth::resource and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we

* define a formatter for `auth::resource` and friends,
* update their callers of `operator<<` to use `fmt::print()`.
* drop `operator<<`, as they are not used anymore.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16765
2024-01-15 13:26:39 +02:00
Kefu Chai
218334eaf5 test/nodetool: use build/$CMAKE_BUILD_TYPE when appropriate
because the CMake-generated build.ninja is located under build/,
and it puts the `scylla` executable at build/$CMAKE_BUILD_TYPE/scylla,
instead of at build/$scylla_build_mode/scylla, so let's adapt to this
change accordingly.

we will promote this change to a shared place if we have similar
needs in other tests as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16775
2024-01-15 12:52:35 +02:00
Pavel Emelyanov
dd892b0d8a code: Enable tablets if cluster feature is enabled
If the TABLETS map is missing in the CREATE KEYSPACE statement the
tablets are anyway enabled if the respective cluster feature is enabled.

To opt-out keyspaces one may use TABLETS = { 'enabled': false } syntax.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
4838eeb201 test: Turn off tablets feature by default
Next patches will make per-keyspace initial_tables option really
optional and turn tablets ON when the feature is ON. This will break all
other tests' assumptions, that they are testing vnodes replication. So
turn the feature off by default, tests that do need tables will need to
explicitly enable this feature on their own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
ae7da54f88 test: Move test_tablet_drain_failure_during_decommission to another suite
In its current location it will be started with 3 pre-created scylla
nodes with default features ON. Next patch will exclude `tablets` from
the default list, so the test needs to create servers on its own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
46b36d8c07 test/tablets: Enable tables for real on test keyspace
When started cql_test_env creates a test keyspace. Some tablets test
cases create a table in this keyspace, but misuse the whole feature. The
thing is that while tablets feature is ON in those test cases, the
keyspace itself doesn _not_ have the initial_tables option and thus
tablets are not enabled for the ks' table for real. Currently test cases
work just because this table is only used as a transparent table ID
placeholder. If turning on tablets for the keyspace, several test cases
would get broken for two reasons.

First, the tables map will no longer be empty on test start.

Second, applying changes to tablet metadata may not be visible, becase
test case uses "ranom" timestamp, that can be less that the initial
metadata mutations' timestamp.

This patch fixes all three places:

1. enables tables for the test keyspace
2. removes assumption that the initial metadata is empty
3. uses large enough timestamp for subsequent mutations

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
2376b699e0 test/tablets: Make timestamp local
Just to make next patching simpler

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
6cb3055059 cql: Add tablets on/off switch to CREATE KEYSPACE
Now the user can do

  CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }

to turn tablets off. It will be useful in the future to opt-out keyspace
from tablets when they will be turned on by default based on cluster
features only.

Also one can do just

  CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }

and let Scylla select the initial tablets value by its own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:11 +03:00
Pavel Emelyanov
941f6d8fca cql: Move initial_tablets from REPLICATION to TABLETS in DDL
This patch changes the syntax of enabling tablets from

  CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }

to be

  CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }

and updates all tests accordingly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:04:48 +03:00
Gleb Natapov
f8b90aeb14 test: add test for automatic cleanup procedure
The test runs two bootstraps and checks that there is no cleanup
in between.  Then it runs a decommission and checks that cleanup runs
automatically and then it runs one more decommission and checks that no
cleanup runs again.  Second part checks manual cleanup triggering. It
adds a node, triggers cleanup through the REST API, checks that is runs,
decommissions a node and check that the cleanup did not run again.
2024-01-14 15:45:53 +02:00
Gleb Natapov
5882855669 test: add test for topology requests queue management
This test creates a 5 node cluster with 2 down nodes (A and B). After
that it creates a queue of 3 topology operation: bootstrap, removenode
A and removenode B with ignore_nodes=A. Check that all operation
manage to complete.  Then it downs one node and creates a queue with
two requests: bootstrap and decommission. Since none can proceed both
should be canceled.
2024-01-14 15:45:53 +02:00
Gleb Natapov
97ab3f6622 storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator
Introduce new REST API "/storage_service/cleanup_all"
that, when triggered, instructs the topology coordinator to initiate
cluster wide cleanup on all dirty nodes. It is done by introducing new
global command "global_topology_request::cleanup".
2024-01-14 15:45:53 +02:00
Gleb Natapov
0e68073b22 test: use servers_see_each_other when needed
In the next patch we want to abort topology operations if there is no
enough live nodes to perform them. This will break tests that do a
topology operation right after restarting a node since a topology
coordinator may still not see the restarted node as alive. Fix all those
tests to wait between restart and a topology operation until UP state
propagates.
2024-01-14 14:44:07 +02:00
Gleb Natapov
455ffaf5d8 test: add servers_see_each_other helper
The helper makes sure that all nodes in the cluster see each other as
alive.
2024-01-14 14:44:07 +02:00
Kamil Braun
4e18f8b453 Merge 'topology_state_load: stop waiting for IP-s' from Petr Gusev
The loop in `id2ip` lambda makes problems if we are applying an old raft
log that contains long-gone nodes. In this case, we may never receive
the `IP` for a node and stuck in the loop forever. In this series we
replace the loop with an if - we just don't update the `host_id <-> ip`
mapping in the `token_metadata.topology` if we don't have an `IP` yet.

The PR moves `host_id -> IP` resolution to the data plane, now it
happens each time the IP-based methods of `erm` are called. We need this
because IPs may not be known at the time the erm is built. The overhead
of `raft_address_map` lookup is added to each data plane request, but it
should be negligible. In this PR `erm/resolve_endpoints` continues to
treat missing IP for `host_id` as `internal_error`, but we plan to relax
this in the follow-up (see this PR first comment).

Closes scylladb/scylladb#16639

* github.com:scylladb/scylladb:
  raft ips: rename gossiper_state_change_subscriber_proxy -> raft_ip_address_updater
  gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes
  storage_service: topology_state_load: remove IP waiting loop
  storage_service: sync_raft_topology_nodes: add target_node parameter
  storage_service: sync_raft_topology_nodes: move loops to the end
  storage_service: sync_raft_topology_nodes: rename extract process_left_node and process_transition_node
  storage_service: sync_raft_topology_nodes: rename add_normal_node -> process_normal_node
  storage_service: sync_raft_topology_nodes: move update_topology up
  storage_service: topology_state_load: remove clone_async/clear_gently overhead
  storage_service: fix indentation
  storage_service: extract sync_raft_topology_nodes
  storage_service: topology_state_load: move remove_endpoint into mutate_token_metadata
  address_map: move gossiper subscription logic into storage_service
  topology_coordinator: exec_global_command: small refactor, use contains + reformat
  storage_service: wait_for_ip for new nodes
  storage_service.idl.hh: fix raft_topology_cmd.command declaration
  erm: for_each_natural_endpoint_until: use is_vnode == true
  erm: switch the internal data structures to host_id-s
  erm: has_pending_ranges: switch to host_id
2024-01-12 18:46:51 +01:00
Petr Gusev
6e7bbc94f4 gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes
When a node changes its IP we need to store the mapping in
system.peers and update token_metadata.topology and erm
in-memory data structures.

The test_change_ip was improved to verify this new
behaviour. Before this patch the test didn't check
that IPs used for data requests are updated on
IP change. In this commit we add the read/write check.
It fails on insert with 'node unavailable'
error without the fix.
2024-01-12 18:28:57 +04:00
Petr Gusev
15b8e565ed address_map: move gossiper subscription logic into storage_service
We are going to remove the IP waiting loop from topology_state_load
in subsequent commits. An IP for a given host_id may change
after this function has been called by raft. This means we need
to subscribe to the gossiper notifications and call it later
with a new id<->ip mapping.

In this preparatory commit we move the existing address_map
update logic into storage_service so that in later commits
we can enhance it with topology_state_load call.
2024-01-12 15:37:50 +04:00
Michał Jadwiszczak
013487e1e1 test:cql-pytest: change service levels intervals in tests
Set the interval to 0.5s to reduce required sleep time.
2024-01-12 10:28:28 +01:00
Patryk Wrobel
87545e40c7 test/boost/auth_resource_test.cc: do not rely on templated operator<<
This change is intended to remove the dependency to
operator<<(std::ostream&, const std::unordered_set<T>&)
from auth_resource_test.cc.

It prepares the test for removal of the templated helpers
from utils/to_string.hh, which is one of goals of the
referenced issue that is linked below.

Refs: #13245

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16754
2024-01-12 10:48:01 +02:00
Nadav Har'El
5c7e029012 test/cql-pytest: add reproducer for task-tracking memory leak
This patch adds a reproducer test for the memory leak described in
issue #16493: If a table is repeatedly created and dropped, memory
is leaked by task tracking. Although this "leak" can be temporary
if task_ttl_in_seconds is properly configured, it may still use too
much memory if tables are too frequently created and dropped.
The test here shows that (before #16493 was fixed) as little as
100 tables created and deleted can cause Scylla to run out of
memory.

The problem is severely exacerbated when tablets are used which is
why the test here uses tablets. Before the fix for #16493 (a Seastar
patch, scylladb/seastar#2023), this test of 100 iterations always
failed (with test/cql-pytest/run's default memory allowance).
After the fix, the test doesn't fail in 100 iterations - and even
if increased manually to 10,000 iterations it doesn't fail.

The new test uses the initial_tablets feature, so requires Scylla to be
run with the "tablets" experimental option turned on. This is not
currently the default of test.py or test/cql-pytest/run, so I turned
it on manually to check this test. I also checked that the test is
correctly skipped if tablets are not turned on.

Refs #16493

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16717
2024-01-12 09:37:32 +02:00
Botond Dénes
697ebef149 Merge 'tasks: compaction: drop regular compaction tasks after they are finished' from Aleksandra Martyniuk
Make compaction tasks internal. Drop all internal tasks without parents
immediately after they are done.

Fixes: #16735
Refs: #16694.

Closes scylladb/scylladb#16698

* github.com:scylladb/scylladb:
  compaction: make regular compaction tasks internal
  tasks: don't keep internal root tasks after they complete
2024-01-11 12:10:44 +02:00
Botond Dénes
ac69473bac Merge 'utils/pretty_printers: add "I" specifier support' from Kefu Chai
this is to mimic the formatting of `human_readable_value`, and to prepare for consolidating these two formatters, so we don't have two pretty printers in the tree.

Closes scylladb/scylladb#16726

* github.com:scylladb/scylladb:
  utils/pretty_printers: add "I" specifier support
  utils/pretty_printers: use the formatting of to_hr_size()
2024-01-11 10:54:14 +02:00
Kefu Chai
0c2ef5de54 test/unit/bptree_validation: use "{}" for formatting test_data
before this change, "{:d}" is used for formatting `test_data` y
bptree_stress_test.cc. but the "d" specifier is only used for
formatting integers, not for formatting `test_data` or generic
data types, so this fails when the test is compiled with {fmt} v10,
like:

```
In file included from /home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:20:
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:294:35: error: call to consteval function 'fmt::basic_format_string<char, test_data &, test_data &>::basic_format_string<char[31], 0>' is not a constant expression
  294 |             fmt::print(std::cout, "Iterator broken, {:d} != {:d}\n", val, *_fwd);
      |                                   ^
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:267:20: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::forward_check' requested here
  267 |             return forward_check();
      |                    ^
/home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:92:35: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::step' requested here
   92 |                         if (!itc->step()) {
      |                                   ^
/usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression
 2322 |       if (!in(arg_type, set)) throw_format_error("invalid format specifier");
      |                               ^
```

in this change, instead of specifying "{:d}", let's just use "{}",
which works for both integer and `test_data`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16727
2024-01-11 10:53:33 +02:00
Kefu Chai
6c06751640 cdc: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16725
2024-01-11 09:13:37 +02:00
Lakshmi Narayanan Sreethar
76f0d5e35b reader_permit: store schema_ptr instead of raw schema pointer
Store schema_ptr in reader permit instead of storing a const pointer to
schema to ensure that the schema doesn't get changed elsewhere when the
permit is holding on to it. Also update the constructors and all the
relevant callers to pass down schema_ptr instead of a raw pointer.

Fixes #16180

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16658
2024-01-11 08:37:56 +02:00
Kefu Chai
f11a53856d utils/pretty_printers: add "I" specifier support
this is to mimic the formatting of `human_readable_value`, and
to prepare for consolidating these two formatters, so we don't have
two pretty printers in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 14:33:47 +08:00
Kefu Chai
7d627b328f utils/pretty_printers: use the formatting of to_hr_size()
keep the precision of 4 digits, for instance, so that we format
"8191" as "8191" instead of as "8 Ki". this is modeled after
the behavior of `to_hr_size()`. for better user experience.
and also prepares to consolidate these two formatters.

tests are updated to exercise both IEC and SI notations.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 14:33:03 +08:00
Nadav Har'El
39dd2a2690 cql-pytest: translated Cassandra's test for LWT with static column
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertUpdateIfConditionStaticsTest.java into our
cql-pytest framework.

This test file checks various LWT conditional updates which involve
static columns or UDTs (there are separate test file for LWT conditional
updates that do not involve static columns).

This test did not uncover any new bugs, but demonstrates yet again
several places where we intentionally deviated from Cassandra's behavior,
forcing me to add "is_scylla" checks in many of the checks to allow
them to pass on both Scylla and Cassanda. These deviations are known,
intentional and some are documented in docs/kb/lwt-differences.rst but
not all, so it's worth listing here the ones re-discovered by this test:

    1. On a successful conditional write, Cassandra returns just True, Scylla
       also returns the old contents of the row. This difference is officially
       documented in docs/kb/lwt-differences.rst.

    2. On a batch request, Scylla always returns a row per statement,
       Cassandra doesn't - it often returns just a single failed row,
       or just True if the whole batch succeeded. This difference is
       officially documented in docs/kb/lwt-differences.rst.

    3. In a DELETE statement with a condition, in the returned row
       Cassandra lists the deleted column first - while Scylla lists
       the static column first (as in any other row). This difference
       is probably inconsequential, because columns also have names
       so their order in the response usually doesn't matter.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16643
2024-01-10 12:14:06 +02:00
Nadav Har'El
b1a441ba56 test/cql-pytest: correct xfail status of timestamp parser
The recently-added test test_fromjson_timestamp_submilli demonstrated a
difference between Scylla's and Cassandra's parsing timestamps in JSON:
Trying to use too many (more than 3) digits of precision is forbidden
in Scylla, but ignored in Cassandra. So we marked the test "xfail",
suggesting we think it's a Scylla bug that should be fixed in the future.

However, it turns out that we already had a different test,
test_type_timestamp_from_string_overprecise, which showed the same
difference in a different context (without JSON). In that older test,
the decision was to consider this a Cassandra bug, not Scylla bug -
because Cassandra seemingly allows the sub-millisecond timestap but
in reality drops the extra precision.

So we need to be consistent in the tests - this is either a Scylla bug
or a Cassandra bug, we can't make once choice in one test and another
in a different test :-) So let's accept our older decision, and consider
Scylla's behavior the correct one in this case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16586
2024-01-10 12:12:26 +02:00
Kefu Chai
317af97e41 test/pylib: shutdown unix RESTful client
when stopping the ManagerClient, it would be better to close
all connected connector, otherwise aiohttp complains like:

```
13:57:53.763 ERROR> Unclosed connector
connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x7f939d2ca5f0>, 96672.211256817)]']
connector: <aiohttp.connector.UnixConnector object at 0x7f939d2da890>
```

this warning message is printed to the console, and it is distracting
when testing manually.

so, in this change, let's close the client connecting to unix domain
socket.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16675
2024-01-10 11:07:14 +02:00
Kefu Chai
0dc7db54d1 build: cmake: add "unit_test_list" target
this target is used by test.py for enumerating unit tests

* test/CMakeLists.txt: append executable's full path to
  `scylla_tests`. add `unit_test_list` target printing
  `scylla_tests`, please note, `cmake -E echo` does not
  support the `-e` option of `echo`, and ninja does not
  support command line with newline in it, we have to use
  `echo` to print the list of tests.
* test/{boost,raft,unit}/CMakeLists.txt: set scylla_tests
  only if $PWD/suite.yaml exists. we could hardwire this
  logic in these files, as it is known that this file
  exists in these directory, but this is still put this way,
  so that it serves as a comment explaining that the reason
  why we update scylla_tests here but not somewhere else
  where we also use `add_scylla_test()` function is just
  suite.yaml exists here.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16702
2024-01-10 08:43:04 +02:00
Aleksandra Martyniuk
6b87778ef2 compaction: make regular compaction tasks internal
Regular compaction tasks are internal.

Adjust test_compaction_task accordingly: modify test_regular_compaction_task,
delete test_running_compaction_task_abort (relying on regular compaction)
which checks are already achived by test_not_created_compaction_task_abort.
Rename the latter.
2024-01-09 13:13:54 +01:00
Pavel Emelyanov
cdf5124003 Merge 'tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()' from Botond Dénes
The default error handler throws an exception, which means scylla-sstable will exit with exception if there is any problem in the configuration. Not even ScyllaDB itself is this harsh -- it will just log a warning for most errors. A tool should be much more lenient. So this patch passes an error handler which just logs all errors with debug level.
If reading an sstable fails, the user is expected to investigate turning debug-level logging on. When they do so, they will see any problems while reading the configuration (if it is relevant, e.g. when using EAR).

Fixes: #16538

Closes scylladb/scylladb#16657

* github.com:scylladb/scylladb:
  tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()
  tools/scylla-sstable: allow always passing --scylla-yaml-file option
2024-01-09 14:28:49 +03:00
Kamil Braun
d4f4b58f3a Merge 'topology_coordinator: reject removenode if the removed node is alive' from Patryk Jędrzejczak
The removenode operation is defined to succeed only if the node
being removed is dead. Currently, we reject this operation on the
initiator side (in `storage_service::raft_removenode`) when the
failure detector considers the node being removed alive. However,
it is possible that even if the initiator considers the node dead,
the topology coordinator will consider it alive when handling the
topology request. For example, the topology coordinator can use
a bigger failure detector timeout, or the node being removed can
suddenly resurrect.

This PR makes the topology coordinator reject removenode if the
node being removed is considered alive. It also adds
`test_remove_alive_node` that verifies this change.

Fixes scylladb/scylladb#16109

Closes scylladb/scylladb#16584

* github.com:scylladb/scylladb:
  test: add test_remove_alive_node
  topology_coordinator: reject removenode if the removed node is alive
  test: ManagerClient: remove unused wait_for_host_down
  test: remove_node: wait until the node being removed is dead
2024-01-08 12:39:23 +01:00
Botond Dénes
9119bcbd67 tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()
The default error handler throws an exception, which means
scylla-sstable will exit with exception if there is any problem in the
configuration. Not even ScyllaDB itself is this harsh -- it will just
log a warning for most errors. A tool should be much more lenient. So
this patch passes an error handler which just logs all errors with debug
level.
If reading an sstable fails, the user is expected to investigate turning
debug-level logging on. When they do so, they will see any problems
while reading the configuration (if it is relevant, e.g. when using EAR).

Fixes: #16538
2024-01-08 02:18:15 -05:00
Patryk Jędrzejczak
df2034ebd7 server, raft_group0_client: remove the default nullptr values
The previous commit has fixed 5 bugs of the same type - incorrectly
passing the default nullptr to one of the changed functions. At
least some of these bugs wouldn't appear if there was no default
value. It's much harder to make this kind of a bug if you have to
write "nullptr". It's also much easier to detect it in review.

Moreover, these default values are rarely used outside tests.
Keeping them is just not worth the time spent on debugging.
2024-01-05 18:45:50 +01:00
Nadav Har'El
94580df1c5 test/alternator: fix flaky test in test_filter_expression.py
The test test_filter_expression.py::test_filter_expression_precedence
is flaky - and can fail very rarely (so far we've only actually seen it
fail once). The problem is that the test generates items with random
clustering keys, chosen as an integer between 1 and 1 million, and there
is a chance (roughly 2/10,000) that two of the 20 items happen to have the
same key, so one of the items is "lost" and the comparison we do to the
expected truth fails.

The solution is to just use sequential keys, not random keys.
There is nothing to gain in this test by using random keys.

To make this test bug easy to reproduce, I temporarily changed
random_i()'s range from 1,000,000 to 3, and saw the test failing every
single run before this patch. After this patch - no longer using
random_i() for the keys - the test doesn't fail any more.

Fixes #16647

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16649
2024-01-04 21:36:40 +02:00
Kamil Braun
bf068dd023 Merge handle error in cdc generation propagation during bootstrap from Gleb
Bootstrap cannot proceed if cdc generation propagation to all nodes
fails, so the patch series handles the error by rolling the ongoing
topology operation back.

* 'gleb/raft-cdc-failure' of github.com:scylladb/scylla-dev:
  test: add test to check failure handling in cdc generation commit
  storage_service: topology coordinator: rollback on failure to commit cdc generation
2024-01-04 15:38:51 +01:00
Kamil Braun
f942bf4a1f Merge 'Do not update endpoint state via gossiper::add_saved_endpoint once it was updated via gossip' from Benny Halevy
Currently, `add_saved_endpoint` is called from two paths:  One, is when
loading states from system.peers in the join path (join_cluster,
join_token_ring), when `_raft_topology_change_enabled` is false, and the
other is from `storage_service::topology_state_load` when raft topology
changes are enabled.

In the later path, from `topology_state_load`, `add_saved_endpoint` is
called only if the endpoint_state does not exist yet.  However, this is
checked without acquiring the endpoint_lock and so it races with the
gossiper, and once `add_saved_endpoint` acquires the lock, the endpoint
state may already be populated.

Since `add_saved_endpoint` applies local information about the endpoint
state (e.g. tokens, dc, rack), it uses the local heart_beat_version,
with generation=0 to update the endpoint states, and that is
incompatible with changes applies via gossip that will carry the
endpoint's generation and version, determining the state's update order.

This change makes sure that the endpoint state is never update in
`add_saved_endpoint` if it has non-zero generation.  An internal error
exception is thrown if non-zero generation is found, and in the only
call site that might reach that state, in
`storage_service::topology_state_load`, the caller acquires the
endpoint_lock for checking for the existence of the endpoint_state,
calling `add_saved_endpoint` under the lock only if the endpoint_state
does not exist.

Fixes #16429

Closes scylladb/scylladb#16432

* github.com:scylladb/scylladb:
  gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
  storage_service: topology_state_load: lock endpoint for add_saved_endpoint
  raft_group_registry: move on_alive error injection to gossiper
2024-01-04 14:47:10 +01:00
Kefu Chai
cf932888de Update seastar submodule
* seastar e0d515b6...70349b74 (33):
  > util/log: drop unused function
  > util/log, rpc, core: use compile-time formatting with fmtlib >= 8.0
  > Fix edge case in memory sampler at OOM
  > exp/geo distribution benchmark
  > Additional allocation tests
  > Remove null pointer check on free hot path
  > Optimize final part of allocation hot path
  > Optimize zero size checking in allocator
  > memory: Optimize free fast path
  > memory: Optimize small alloc alloation path
  > memory: Limit alloc_sites size
  > memory: Add general comment about sampling strategy
  > memory: Use probabilistic sampler
  > util: Adapt memory sampler to seastar
  > util: Import Android Memory Sampler
  > memory: Use separate small pool for tracking sampled allocations
  > memory: Support enabling memory profiling at runtime
  > util/source_location-compat: mark `source_location::current()` consteval
  > build: use new behavior defined by CMP0155 when building C++ modules
  > circleci: build with C++20 modules enabled
  > seastar.cc: replace cryptopp with gnutls when building seastar modules
  > alien: include used header
  > seastar.cc: include used headers in the global purview
  > docker: install clang-tools-17
  > net/tcp: generate a random src_port hashed to current shard if smp::count > 1
  > net, websocket: replace Crypto++ calls with GnuTLS
  > README-DPDK.md: point user to DPDK's quick start guide
  > reactor: print fatal error using logger as well
  > Avoid ping-pong in spinlock::lock
  > memory: Add allocator perf tests
  > memory: Add a basic sized deletion test
  > Prometheus: Disable Prometheus protobuf with a configuration
  > treewide: bring back prometheus protobuf support
* test/manual/sstable_scan_footprint_test: update to adapt to the
  breaking change of "memory: Use probabilistic sampler" in seastar

Closes scylladb/scylladb#16610
2024-01-04 09:36:53 +02:00
Kefu Chai
50cf62e186 build: cmake: do not link against Boost::dynamic_linking
Boost::dynamic_linking was introduced as a compatibility target
which adds "BOOST_ALL_DYN_LINK" macro on Win32 platform. but since
Scylla only runs on Linux, there is no need to link against this
library.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16544
2024-01-04 08:06:19 +02:00
Kefu Chai
2ad532df43 test: randomized_nemesis_test: move std::variant formatter up
we format `std::variant<std::monostate, seastar::timed_out_error,
raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown,
raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>`
in this source file. and currently, we format `std::variant<..>` using
the default-generated `fmt::formatter` from `operator<<`, so in order to
format it using {fmt}'s compile-time check enabled, we have to make the
`operator<<` overload for `std::variant<...>` visible from the caller
sites which format `std::variant<...>` using {fmt}.

in this change, the `operator<<` for `std::variant<...>` is moved to
from the middle of the source file to the top of it, so that it can
be found when the compiler looks up for a matched `fmt::formatter`
for `std::variant<...>`.

please note, we cannot use the `fmt::formatter` provided by `fmt/std.h`,
as its specialization for `std::variant` requires that all the types
of the variant is `is_formattable`. but the default generated formatter
for type `T` is not considered as the proof that `T` is formattable.

this should address the FTBFS with the latest seastar like:

```
 /usr/include/fmt/core.h:2743:12: error: call to deleted constructor of 'conditional_t<has_formatter<mapped_type, context>::value, formatter<mapped_type, char_type>, fallback_formatter<stripped_type, char_type>>' (aka 'fmt::detail::fallback_formatter<std::variant<std::monostate, seastar::timed_out_error, raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown, raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>>')
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16616
2024-01-03 16:38:25 +01:00
Avi Kivity
20531872a7 Merge 'test: randomized_nemesis_test: add formatter for append_entry' from Kefu Chai
we are using `seastar::format()` to format `append_entry` in
`append_reg_model`, so we have to provide a `fmt::formatter` for
these callers which format `append_entry`.

despite that, with FMT_DEPRECATED_OSTREAM, the formatter is defined
by fmt v9, we don't have it since fmt v10. so this change prepares us
for fmt v10.

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#16614

* github.com:scylladb/scylladb:
  test: randomized_nemesis_test: add formatter for append_entry
  test: randomized_nemesis_test: move append_reg_model::entry out
2024-01-03 15:06:33 +02:00
Tomasz Grabiec
715e062d4a Merge 'table, memtable: share log structured allocator statistics across all tablets in a table' from Avi Kivity
In 7d5e22b43b ("replica: memtable: don't forget memtable
memory allocation statistics") we taught memtable_list to remember
learned memory allocation reserves so a new memtable inherits these
statistics from an older memtable. Share it now further across tablets
that belong to the same table as well. This helps the statistics be more
accurate for tablets that are migrated in, as they can share existing
tablet's memory allocation history.

Closes scylladb/scylladb#16571

* github.com:scylladb/scylladb:
  table, memtable: share log-structured allocator statistics across all memtables in a table
  memtable: consolidate _read_section, _allocating_section in a struct
2024-01-03 14:03:40 +01:00