Commit Graph

11801 Commits

Author SHA1 Message Date
Kefu Chai
0c2ef5de54 test/unit/bptree_validation: use "{}" for formatting test_data
before this change, "{:d}" is used for formatting `test_data` y
bptree_stress_test.cc. but the "d" specifier is only used for
formatting integers, not for formatting `test_data` or generic
data types, so this fails when the test is compiled with {fmt} v10,
like:

```
In file included from /home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:20:
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:294:35: error: call to consteval function 'fmt::basic_format_string<char, test_data &, test_data &>::basic_format_string<char[31], 0>' is not a constant expression
  294 |             fmt::print(std::cout, "Iterator broken, {:d} != {:d}\n", val, *_fwd);
      |                                   ^
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:267:20: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::forward_check' requested here
  267 |             return forward_check();
      |                    ^
/home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:92:35: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::step' requested here
   92 |                         if (!itc->step()) {
      |                                   ^
/usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression
 2322 |       if (!in(arg_type, set)) throw_format_error("invalid format specifier");
      |                               ^
```

in this change, instead of specifying "{:d}", let's just use "{}",
which works for both integer and `test_data`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16727
2024-01-11 10:53:33 +02:00
Kefu Chai
6c06751640 cdc: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16725
2024-01-11 09:13:37 +02:00
Lakshmi Narayanan Sreethar
76f0d5e35b reader_permit: store schema_ptr instead of raw schema pointer
Store schema_ptr in reader permit instead of storing a const pointer to
schema to ensure that the schema doesn't get changed elsewhere when the
permit is holding on to it. Also update the constructors and all the
relevant callers to pass down schema_ptr instead of a raw pointer.

Fixes #16180

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16658
2024-01-11 08:37:56 +02:00
Kefu Chai
f11a53856d utils/pretty_printers: add "I" specifier support
this is to mimic the formatting of `human_readable_value`, and
to prepare for consolidating these two formatters, so we don't have
two pretty printers in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 14:33:47 +08:00
Kefu Chai
7d627b328f utils/pretty_printers: use the formatting of to_hr_size()
keep the precision of 4 digits, for instance, so that we format
"8191" as "8191" instead of as "8 Ki". this is modeled after
the behavior of `to_hr_size()`. for better user experience.
and also prepares to consolidate these two formatters.

tests are updated to exercise both IEC and SI notations.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 14:33:03 +08:00
Nadav Har'El
39dd2a2690 cql-pytest: translated Cassandra's test for LWT with static column
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertUpdateIfConditionStaticsTest.java into our
cql-pytest framework.

This test file checks various LWT conditional updates which involve
static columns or UDTs (there are separate test file for LWT conditional
updates that do not involve static columns).

This test did not uncover any new bugs, but demonstrates yet again
several places where we intentionally deviated from Cassandra's behavior,
forcing me to add "is_scylla" checks in many of the checks to allow
them to pass on both Scylla and Cassanda. These deviations are known,
intentional and some are documented in docs/kb/lwt-differences.rst but
not all, so it's worth listing here the ones re-discovered by this test:

    1. On a successful conditional write, Cassandra returns just True, Scylla
       also returns the old contents of the row. This difference is officially
       documented in docs/kb/lwt-differences.rst.

    2. On a batch request, Scylla always returns a row per statement,
       Cassandra doesn't - it often returns just a single failed row,
       or just True if the whole batch succeeded. This difference is
       officially documented in docs/kb/lwt-differences.rst.

    3. In a DELETE statement with a condition, in the returned row
       Cassandra lists the deleted column first - while Scylla lists
       the static column first (as in any other row). This difference
       is probably inconsequential, because columns also have names
       so their order in the response usually doesn't matter.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16643
2024-01-10 12:14:06 +02:00
Nadav Har'El
b1a441ba56 test/cql-pytest: correct xfail status of timestamp parser
The recently-added test test_fromjson_timestamp_submilli demonstrated a
difference between Scylla's and Cassandra's parsing timestamps in JSON:
Trying to use too many (more than 3) digits of precision is forbidden
in Scylla, but ignored in Cassandra. So we marked the test "xfail",
suggesting we think it's a Scylla bug that should be fixed in the future.

However, it turns out that we already had a different test,
test_type_timestamp_from_string_overprecise, which showed the same
difference in a different context (without JSON). In that older test,
the decision was to consider this a Cassandra bug, not Scylla bug -
because Cassandra seemingly allows the sub-millisecond timestap but
in reality drops the extra precision.

So we need to be consistent in the tests - this is either a Scylla bug
or a Cassandra bug, we can't make once choice in one test and another
in a different test :-) So let's accept our older decision, and consider
Scylla's behavior the correct one in this case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16586
2024-01-10 12:12:26 +02:00
Kefu Chai
317af97e41 test/pylib: shutdown unix RESTful client
when stopping the ManagerClient, it would be better to close
all connected connector, otherwise aiohttp complains like:

```
13:57:53.763 ERROR> Unclosed connector
connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x7f939d2ca5f0>, 96672.211256817)]']
connector: <aiohttp.connector.UnixConnector object at 0x7f939d2da890>
```

this warning message is printed to the console, and it is distracting
when testing manually.

so, in this change, let's close the client connecting to unix domain
socket.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16675
2024-01-10 11:07:14 +02:00
Kefu Chai
0dc7db54d1 build: cmake: add "unit_test_list" target
this target is used by test.py for enumerating unit tests

* test/CMakeLists.txt: append executable's full path to
  `scylla_tests`. add `unit_test_list` target printing
  `scylla_tests`, please note, `cmake -E echo` does not
  support the `-e` option of `echo`, and ninja does not
  support command line with newline in it, we have to use
  `echo` to print the list of tests.
* test/{boost,raft,unit}/CMakeLists.txt: set scylla_tests
  only if $PWD/suite.yaml exists. we could hardwire this
  logic in these files, as it is known that this file
  exists in these directory, but this is still put this way,
  so that it serves as a comment explaining that the reason
  why we update scylla_tests here but not somewhere else
  where we also use `add_scylla_test()` function is just
  suite.yaml exists here.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16702
2024-01-10 08:43:04 +02:00
Aleksandra Martyniuk
6b87778ef2 compaction: make regular compaction tasks internal
Regular compaction tasks are internal.

Adjust test_compaction_task accordingly: modify test_regular_compaction_task,
delete test_running_compaction_task_abort (relying on regular compaction)
which checks are already achived by test_not_created_compaction_task_abort.
Rename the latter.
2024-01-09 13:13:54 +01:00
Pavel Emelyanov
cdf5124003 Merge 'tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()' from Botond Dénes
The default error handler throws an exception, which means scylla-sstable will exit with exception if there is any problem in the configuration. Not even ScyllaDB itself is this harsh -- it will just log a warning for most errors. A tool should be much more lenient. So this patch passes an error handler which just logs all errors with debug level.
If reading an sstable fails, the user is expected to investigate turning debug-level logging on. When they do so, they will see any problems while reading the configuration (if it is relevant, e.g. when using EAR).

Fixes: #16538

Closes scylladb/scylladb#16657

* github.com:scylladb/scylladb:
  tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()
  tools/scylla-sstable: allow always passing --scylla-yaml-file option
2024-01-09 14:28:49 +03:00
Kamil Braun
d4f4b58f3a Merge 'topology_coordinator: reject removenode if the removed node is alive' from Patryk Jędrzejczak
The removenode operation is defined to succeed only if the node
being removed is dead. Currently, we reject this operation on the
initiator side (in `storage_service::raft_removenode`) when the
failure detector considers the node being removed alive. However,
it is possible that even if the initiator considers the node dead,
the topology coordinator will consider it alive when handling the
topology request. For example, the topology coordinator can use
a bigger failure detector timeout, or the node being removed can
suddenly resurrect.

This PR makes the topology coordinator reject removenode if the
node being removed is considered alive. It also adds
`test_remove_alive_node` that verifies this change.

Fixes scylladb/scylladb#16109

Closes scylladb/scylladb#16584

* github.com:scylladb/scylladb:
  test: add test_remove_alive_node
  topology_coordinator: reject removenode if the removed node is alive
  test: ManagerClient: remove unused wait_for_host_down
  test: remove_node: wait until the node being removed is dead
2024-01-08 12:39:23 +01:00
Botond Dénes
9119bcbd67 tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()
The default error handler throws an exception, which means
scylla-sstable will exit with exception if there is any problem in the
configuration. Not even ScyllaDB itself is this harsh -- it will just
log a warning for most errors. A tool should be much more lenient. So
this patch passes an error handler which just logs all errors with debug
level.
If reading an sstable fails, the user is expected to investigate turning
debug-level logging on. When they do so, they will see any problems
while reading the configuration (if it is relevant, e.g. when using EAR).

Fixes: #16538
2024-01-08 02:18:15 -05:00
Patryk Jędrzejczak
df2034ebd7 server, raft_group0_client: remove the default nullptr values
The previous commit has fixed 5 bugs of the same type - incorrectly
passing the default nullptr to one of the changed functions. At
least some of these bugs wouldn't appear if there was no default
value. It's much harder to make this kind of a bug if you have to
write "nullptr". It's also much easier to detect it in review.

Moreover, these default values are rarely used outside tests.
Keeping them is just not worth the time spent on debugging.
2024-01-05 18:45:50 +01:00
Nadav Har'El
94580df1c5 test/alternator: fix flaky test in test_filter_expression.py
The test test_filter_expression.py::test_filter_expression_precedence
is flaky - and can fail very rarely (so far we've only actually seen it
fail once). The problem is that the test generates items with random
clustering keys, chosen as an integer between 1 and 1 million, and there
is a chance (roughly 2/10,000) that two of the 20 items happen to have the
same key, so one of the items is "lost" and the comparison we do to the
expected truth fails.

The solution is to just use sequential keys, not random keys.
There is nothing to gain in this test by using random keys.

To make this test bug easy to reproduce, I temporarily changed
random_i()'s range from 1,000,000 to 3, and saw the test failing every
single run before this patch. After this patch - no longer using
random_i() for the keys - the test doesn't fail any more.

Fixes #16647

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16649
2024-01-04 21:36:40 +02:00
Kamil Braun
bf068dd023 Merge handle error in cdc generation propagation during bootstrap from Gleb
Bootstrap cannot proceed if cdc generation propagation to all nodes
fails, so the patch series handles the error by rolling the ongoing
topology operation back.

* 'gleb/raft-cdc-failure' of github.com:scylladb/scylla-dev:
  test: add test to check failure handling in cdc generation commit
  storage_service: topology coordinator: rollback on failure to commit cdc generation
2024-01-04 15:38:51 +01:00
Kamil Braun
f942bf4a1f Merge 'Do not update endpoint state via gossiper::add_saved_endpoint once it was updated via gossip' from Benny Halevy
Currently, `add_saved_endpoint` is called from two paths:  One, is when
loading states from system.peers in the join path (join_cluster,
join_token_ring), when `_raft_topology_change_enabled` is false, and the
other is from `storage_service::topology_state_load` when raft topology
changes are enabled.

In the later path, from `topology_state_load`, `add_saved_endpoint` is
called only if the endpoint_state does not exist yet.  However, this is
checked without acquiring the endpoint_lock and so it races with the
gossiper, and once `add_saved_endpoint` acquires the lock, the endpoint
state may already be populated.

Since `add_saved_endpoint` applies local information about the endpoint
state (e.g. tokens, dc, rack), it uses the local heart_beat_version,
with generation=0 to update the endpoint states, and that is
incompatible with changes applies via gossip that will carry the
endpoint's generation and version, determining the state's update order.

This change makes sure that the endpoint state is never update in
`add_saved_endpoint` if it has non-zero generation.  An internal error
exception is thrown if non-zero generation is found, and in the only
call site that might reach that state, in
`storage_service::topology_state_load`, the caller acquires the
endpoint_lock for checking for the existence of the endpoint_state,
calling `add_saved_endpoint` under the lock only if the endpoint_state
does not exist.

Fixes #16429

Closes scylladb/scylladb#16432

* github.com:scylladb/scylladb:
  gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
  storage_service: topology_state_load: lock endpoint for add_saved_endpoint
  raft_group_registry: move on_alive error injection to gossiper
2024-01-04 14:47:10 +01:00
Botond Dénes
9f0bd62d78 test/cql-pytest: test_tools.py: add schema-loading tests for MV/SI 2024-01-04 03:20:17 -05:00
Botond Dénes
58d5339baa test/cql-pytest: test_tools.py: extract some fixture logic to functions
Namely, the fixture for preparing an sstable and the fixture for
producing a reference dump (from an sstable). In the next patch we will
add more similar fixtures, this patch enables them to share their core
logic, without repeating code.
2024-01-04 03:20:17 -05:00
Botond Dénes
f7d59b3af0 test/cql-pytest: test_tools.py: extract common schema-loading facilities into base-class
In the next patch, we want to add schema-load tests specific to views
and indexes. Best to place these into a separate class, so extract the
to-be-shared parts into a common base-class.
2024-01-04 03:20:17 -05:00
Botond Dénes
276bb16013 test/boost/schema_loader_test: add test for mvs and indexes 2024-01-04 03:20:17 -05:00
Kefu Chai
cf932888de Update seastar submodule
* seastar e0d515b6...70349b74 (33):
  > util/log: drop unused function
  > util/log, rpc, core: use compile-time formatting with fmtlib >= 8.0
  > Fix edge case in memory sampler at OOM
  > exp/geo distribution benchmark
  > Additional allocation tests
  > Remove null pointer check on free hot path
  > Optimize final part of allocation hot path
  > Optimize zero size checking in allocator
  > memory: Optimize free fast path
  > memory: Optimize small alloc alloation path
  > memory: Limit alloc_sites size
  > memory: Add general comment about sampling strategy
  > memory: Use probabilistic sampler
  > util: Adapt memory sampler to seastar
  > util: Import Android Memory Sampler
  > memory: Use separate small pool for tracking sampled allocations
  > memory: Support enabling memory profiling at runtime
  > util/source_location-compat: mark `source_location::current()` consteval
  > build: use new behavior defined by CMP0155 when building C++ modules
  > circleci: build with C++20 modules enabled
  > seastar.cc: replace cryptopp with gnutls when building seastar modules
  > alien: include used header
  > seastar.cc: include used headers in the global purview
  > docker: install clang-tools-17
  > net/tcp: generate a random src_port hashed to current shard if smp::count > 1
  > net, websocket: replace Crypto++ calls with GnuTLS
  > README-DPDK.md: point user to DPDK's quick start guide
  > reactor: print fatal error using logger as well
  > Avoid ping-pong in spinlock::lock
  > memory: Add allocator perf tests
  > memory: Add a basic sized deletion test
  > Prometheus: Disable Prometheus protobuf with a configuration
  > treewide: bring back prometheus protobuf support
* test/manual/sstable_scan_footprint_test: update to adapt to the
  breaking change of "memory: Use probabilistic sampler" in seastar

Closes scylladb/scylladb#16610
2024-01-04 09:36:53 +02:00
Kefu Chai
50cf62e186 build: cmake: do not link against Boost::dynamic_linking
Boost::dynamic_linking was introduced as a compatibility target
which adds "BOOST_ALL_DYN_LINK" macro on Win32 platform. but since
Scylla only runs on Linux, there is no need to link against this
library.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16544
2024-01-04 08:06:19 +02:00
Kefu Chai
2ad532df43 test: randomized_nemesis_test: move std::variant formatter up
we format `std::variant<std::monostate, seastar::timed_out_error,
raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown,
raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>`
in this source file. and currently, we format `std::variant<..>` using
the default-generated `fmt::formatter` from `operator<<`, so in order to
format it using {fmt}'s compile-time check enabled, we have to make the
`operator<<` overload for `std::variant<...>` visible from the caller
sites which format `std::variant<...>` using {fmt}.

in this change, the `operator<<` for `std::variant<...>` is moved to
from the middle of the source file to the top of it, so that it can
be found when the compiler looks up for a matched `fmt::formatter`
for `std::variant<...>`.

please note, we cannot use the `fmt::formatter` provided by `fmt/std.h`,
as its specialization for `std::variant` requires that all the types
of the variant is `is_formattable`. but the default generated formatter
for type `T` is not considered as the proof that `T` is formattable.

this should address the FTBFS with the latest seastar like:

```
 /usr/include/fmt/core.h:2743:12: error: call to deleted constructor of 'conditional_t<has_formatter<mapped_type, context>::value, formatter<mapped_type, char_type>, fallback_formatter<stripped_type, char_type>>' (aka 'fmt::detail::fallback_formatter<std::variant<std::monostate, seastar::timed_out_error, raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown, raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>>')
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16616
2024-01-03 16:38:25 +01:00
Avi Kivity
20531872a7 Merge 'test: randomized_nemesis_test: add formatter for append_entry' from Kefu Chai
we are using `seastar::format()` to format `append_entry` in
`append_reg_model`, so we have to provide a `fmt::formatter` for
these callers which format `append_entry`.

despite that, with FMT_DEPRECATED_OSTREAM, the formatter is defined
by fmt v9, we don't have it since fmt v10. so this change prepares us
for fmt v10.

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#16614

* github.com:scylladb/scylladb:
  test: randomized_nemesis_test: add formatter for append_entry
  test: randomized_nemesis_test: move append_reg_model::entry out
2024-01-03 15:06:33 +02:00
Tomasz Grabiec
715e062d4a Merge 'table, memtable: share log structured allocator statistics across all tablets in a table' from Avi Kivity
In 7d5e22b43b ("replica: memtable: don't forget memtable
memory allocation statistics") we taught memtable_list to remember
learned memory allocation reserves so a new memtable inherits these
statistics from an older memtable. Share it now further across tablets
that belong to the same table as well. This helps the statistics be more
accurate for tablets that are migrated in, as they can share existing
tablet's memory allocation history.

Closes scylladb/scylladb#16571

* github.com:scylladb/scylladb:
  table, memtable: share log-structured allocator statistics across all memtables in a table
  memtable: consolidate _read_section, _allocating_section in a struct
2024-01-03 14:03:40 +01:00
Kefu Chai
3ef0345b7f test/nodetool: log response from mock server when handling JSONDecodeError
it's observed that the mock server could return something not decodable
as JSON. so let's print out the response in the logging message in this case.
this should help us to understand the test failure better if it surfaces again.

Refs #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16543
2024-01-03 14:46:10 +02:00
Kefu Chai
0484ac46af test: randomized_nemesis_test: add formatter for append_entry
we are using `seastar::format()` to format `append_entry` in
`append_reg_model`, so we have to provide a `fmt::formatter` for
these callers which format `append_entry`.

despite that, with FMT_DEPRECATED_OSTREAM, the formatter is defined
by fmt v9, we don't have it since fmt v10. so this change prepares us
for fmt v10.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-03 08:38:43 +08:00
Kefu Chai
32e55731ab test: randomized_nemesis_test: move append_reg_model::entry out
this change prepares for adding fmt::formatter for append_entry.
as we are using its formatter in the inline member functions of
`append_reg_model`. but its `fmt::formatter` can only be specialized out of
this class. and we don't have access to `format_as()` yet in {fmt} 9.1.0
which is shipped along with fedora38, which is in turn used for
our base build image.

so, in this change, `append_reg_model::entry` is extracted and renamed
to `append_entry`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-03 08:38:43 +08:00
Sylwia Szunejko
91a5a41313 add a way to negotiate generation of the tablet info for drivers
Tablets metadata is quite expensive to generate (each data_value is
an allocation), so an old driver (without support for tablets) will
generate huge amounts of such notifications. This commit adds a way
to negotiate generation of the notification: a new driver will ask
for them, and an old driver won't get them. It uses the
OPTIONS/SUPPORTED/STARTUP protocol described in native_protocol_v4.spec.

Closes scylladb/scylladb#16611
2024-01-02 20:00:50 +02:00
Kamil Braun
7f6955b883 Merge 'test: make use of concurrent bootstrap' from Patryk Jędrzejczak
In #16102, we added a test for concurrent bootstrap in the raft-based
topology. This test was running in CI for some time and
never failed. Now, we can believe that concurrent bootstrap is not
bugged or at least the probability of a failure is very low. Therefore,
we can safely make use of it in all tests using the raft-based topology.

This PR:
- makes all initial servers start concurrently in topology tests,
- replaces all multiple `server_add` calls with a single `servers_add`
  call in tests using the raft-based topology,
- removes no longer needed `test_concurrent_bootstrap`.

The changes listed above:
- make running tests a bit faster due to concurrent bootstraps,
- make multiple tests test concurrent bootstrap previously tested by
  a single test.

Fixes scylladb/scylladb#15423

Closes scylladb/scylladb#16384

* github.com:scylladb/scylladb:
  test: test_different_group0_ids: fix comments
  test: remove test_concurrent_bootstrap
  test: replace multiple server_add calls with servers_add
  test: ScyllaCluster: start all initial servers concurrently
  test: ManagerClient: servers_add: specify consistent-topology-changes assumption
2024-01-02 15:11:18 +01:00
Sylwia Szunejko
467d466f7e put all tablet info into one field of custom_payload and update docs
Previously, the tablet information was sent to the drivers
in two pieces within the custom_payload. We had information
about the replicas under the `tablet_replicas` key and token range
information under `token_range`. These names were quite generic
and might have caused problems for other custom_payload users.
Additionally, dividing the information into two pieces raised
the question of what to do if one key is present while the other
is missing.

This commit changes the serialization mechanism to pack all information
under one specific name, `tablets-routing-v1`.

From: Sylwia Szunejko <sylwia.szunejko@scylladb.com>

Closes scylladb/scylladb#16148
2024-01-02 14:35:37 +02:00
Patryk Jędrzejczak
215534d527 test: test_different_group0_ids: fix comments
The test disables consistent topology changes, not cluster
management.
2024-01-02 12:19:33 +01:00
Patryk Jędrzejczak
466723a74f test: remove test_concurrent_bootstrap
This test only adds 3 nodes concurrently to the empty cluster.
After making many other tests use ManagerClient.servers_add, it
serves no purpose.

We had added this test before we decided to use
ManagerClient.servers_add in many tests to avoid multiple failures
in CI if it turned out that the concurrent bootstrap is flaky with
high frequency there. This test was running in CI for some time and
never failed. Now, we can believe that concurrent bootstrap is not
bugged or at least the probability of a failure is very low.
2024-01-02 12:19:33 +01:00
Patryk Jędrzejczak
a8513bd41b test: replace multiple server_add calls with servers_add
ManagerClient.servers_add can be used in every test that uses
consistent topology changes. We replace all multiple server_add
calls in such tests with a single servers_add call to make these
tests faster and simplify their code. Additionally, these
servers_add calls will test concurrent bootstraps for free.
2024-01-02 12:19:33 +01:00
Patryk Jędrzejczak
debf1db3ef test: ScyllaCluster: start all initial servers concurrently
Starting all initial servers concurrently makes tests in suites
with initial_size > 1 run a bit faster. Additionally, these tests
test concurrent bootstraps for free.

add_servers can be called only if the cluster uses consistent
topology changes. We can use this function unconditionally in
install_and_start because every suite uses consistent topology
changes by default. The only way to not use it is by adding all
servers with a config that contains experimental_features without
consistent-topology-changes.
2024-01-02 12:19:33 +01:00
Patryk Jędrzejczak
16b0eeb3d6 test: ManagerClient: servers_add: specify consistent-topology-changes assumption
ManagerClient.servers_add can be called only if the cluster uses
consistent topology changes. We add this specification to the
leading comment.
2024-01-02 12:19:31 +01:00
Benny Halevy
5abf556399 gms: endpoint_state: define application_state_map
Have a central definition for the map held
in the endpoint_state (before changing it to
std::unordered_map).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
db434e8cb5 raft_group_registry: move on_alive error injection to gossiper
Move the `raft_group_registry::on_alive` error injection point
to `gossiper::real_mark_alive` so it can delay marking the endpoint as
alive, and calling the `on_alive` callback, but without holding
the endpoint_lock.

Note that the entry for this endpoint in `_pending_mark_alive_endpoints`
still blocks marking it as alive until real_mark_alive completes.

Fixes #16506

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 15:28:54 +02:00
Patryk Jędrzejczak
da37e82fb9 test: add test_remove_alive_node
We add a test for the Raft-based topology's new feature - rejecting
the removenode operation on the topology coordinator side if the
node being removed is considered alive by the failure detector.

Additionally, the test tests a case when the removenode operation
is rejected on the initiator side.
2023-12-29 17:12:46 +01:00
Patryk Jędrzejczak
cf955094c1 test: ManagerClient: remove unused wait_for_host_down
The previous commit removed the only call to wait_for_host_down.
Moreover, this function is identical to server_not_sees_other_server.
We can safely remove it.
2023-12-29 17:12:46 +01:00
Patryk Jędrzejczak
7038a033f2 test: remove_node: wait until the node being removed is dead
In the following commits, we make the topology coordinator reject
removenode requests if the node being removed is considered alive
by the gossiper. Before making this change, we need to adapt the
testing framework so that we don't have flaky removenode operations
that fail because the node being removed hasn't been marked as dead
yet. We achieve this by waiting until all other running nodes see
the node being removed as dead in all removenode operations.

Some tests are simplified after this change because they don't have
to call server_not_sees_other_server anymore.
2023-12-29 17:12:45 +01:00
Nadav Har'El
91636f6d21 test/cql-pytest: reproducer of slightly too strict parser of timestamp
Scylla refuses the timestamp format "2014-01-01 12:15:45.0000000Z" that
has 6 digits of precision for the fractional second, and only allows
3 digits of precision. This restriction makes sense - after all CQL
timestamp columns (note - this is NOT "using timestamp"!) only have
millisecond precision. Nevertheless, Cassandra does not have this
restriction and does allow these over-precise timestamps. In this patch
we add a test that demonstrates this difference.

Curiously, in the past Scylla *generated* this forbidden timestamp
format when outputting the timestamp to a string (e.g. toJson()),
which it then couldn't read back! This was issue #16575.
Today Scylla no longer generates this forbidden timestamp format.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16576
2023-12-28 19:01:25 +02:00
Avi Kivity
6394854f04 Merge 'Some cleanups in tests for tablets + MV ' from Nadav Har'El
This small series improves two things in the multi-node tests for tablet supports in materialized views:

1. The test for Alternator LSI, which "sometimes" could reproduce the bug by creating 10-node cluster with a random tablet distribution, is replaced by a reliable 2-node cluster which controls the tablet distribution. The new test also confirms that tablets are actually enabled in Alternator (reviewers of the original test noted it would be easy to pass the test if tablets were accidentally not enabled... :-)).
2. Simplify the tablet lookup code in the test to not go through a "table id", and lookup the table's (or view's) name directly (requires a full-table of the tablets table, but that's entirely reasonable in a test).

The third patch in this series also fixes a comment typo discovered in a previous review.

Closes scylladb/scylladb#16440

* github.com:scylladb/scylladb:
  materialized views: fix typo in comment
  test_mv_tablets: simplify lookup of tablets
  alternator, tablets: improve Alternator LSI tablets test
2023-12-27 20:18:14 +02:00
Avi Kivity
3ce0576a31 Merge 'Sanitize keyspace_metadata creation' from Pavel Emelyanov
The amount of arguments needed to create ks metadata object is pretty large and there are many different ways it can be and it is created over the code. This set simplifies it for the most typical patterns.

closes: #16447
closes: #16449

Closes scylladb/scylladb#16565

* github.com:scylladb/scylladb:
  schema_tables: Use new_keyspace() sugar
  keyspace_metadata: Drop vector-of-schemas argument from new_keyspace()
  keyspace_metadata: Add default value for new_keyspace's durable_writes
  keyspace_metadata: Pack constructors with default arguments
2023-12-27 17:15:04 +02:00
Botond Dénes
1647b29cba tools/schema_loader: add db::config parameter to all load methods
So that a single centrally managed db::config instance can be shared by
all code requiring it, instead of creating local instances where needed.
This is required to load schema from encrypted schema-tables, and it
also helps memory consumption a bit (db::config consumes a lot of
memory).

Fixes: #16480

Closes scylladb/scylladb#16495
2023-12-27 16:28:38 +02:00
Nadav Har'El
e6dc9bca0d Merge 'Profile dumping rest api support' from Eliran Sinvani
This change is motivated by wanting to have code coverage reporting support.
Currently the only way to get a profile dump in ScyllaDB is stopping it with SIGTERM, however, this doesn't
suite all cases, more specifically:
1. In dtest, when some of the tests intentionally abruptly kill a node
2. In test.py, where we would like to distinguish (at least for now), graceful shutdown of ScyllaDB testing and
teardown procedures (which currently kills the nodes).

This mini series adds two changes:
1. It adds the support for profile dumping in ScyllaDB with rest api ('/system/dump_profile')
2. It adds the support for this API in test.py and also adds a call for it as part of the node stop procedure in a permissive way that will not fail the teardown or test if the call doesn't succeed for whatever reason - after this change, all current
test.py suits except for pylib_test (expected) dumps profiles if instrumented and will be able to participate in coverage
reporting.

Refs #16323

Closes scylladb/scylladb#16557

* github.com:scylladb/scylladb:
  test.py: Dump coverage profile before killing a node
  rest api: Add an api for profile dumping
2023-12-27 12:06:39 +02:00
Eliran Sinvani
e49b3ffc89 test.py: Dump coverage profile before killing a node
Up until now the only way to get a coverage profile was to shut down the
ScyllaDB nodes gracefully (using SIGTERM), this means that the coverage
profile was lost for every node that was killed abruptly (SIGKILL).
This in turn would have been requiring us to shut down all nodes
gracefully which is not something we set out to do.
Here we use the rest API for dumping the coverage profile which will
cause the most minimal impact possible on the test runs.
If the dumping fails (due to the node doesn't support the API or due to
a real error in dumping we ignore it as it is not part of the system we
would like to test.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-12-27 07:17:26 +02:00
Avi Kivity
02111d6754 memtable: consolidate _read_section, _allocating_section in a struct
Those two members are passed from memtable_list to memtable. Since we
wish to pass them from table, it becomes awkward to pass them as two
separate variables as their contents are specific to memtable internals.

Wrap them in a name that indicates their role (being table-wide shared
data for memtables) and pass them as a unit.
2023-12-26 21:11:48 +02:00
Nadav Har'El
fc71c34597 Merge 'select statement: verify EXECUTE permissions only for non native functions' from Eliran Sinvani
Commit 62458b8e4f introduced the enforcement of EXECUTE permissions of functions in cql select. However, according to the reference in #12869, the permissions should be enforced only on UDFs and UDAs.
The code does not distinguish between the two so the permissions are also unintenionally enforced also on native function. This commit introduce the distinction and only enforces the permissions on non native functions.

Fixes #16526

Manually verified (before and after change) with the reproducer supplied in #16526 and also with some the `min` and `max` native functions.
Also added test that checks for regression on native functions execution and verified that it fails on authorization before
the fix and passes after the fix.

Closes scylladb/scylladb#16556

* github.com:scylladb/scylladb:
  test.py: Add test for native functions permissions
  select statement: verify EXECUTE permissions only for non native functions
2023-12-26 18:14:21 +02:00