scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 01:50:35 +00:00

Author	SHA1	Message	Date
Eliran Sinvani	00a55abdd6	code coverage: Add libraries for coverage handling Coverage handling is divided into 3 steps: 1. Generation of profiling data from a run of an instrumented file (which this patch doesn't cover) 2. Processing of profiling data, which involves indexing the profile and producing the data in some format that can be manipulated and unified. 3. Generate some reporting based on this data. The following patch is aiming to deal with the last two steps by providing a cli and a library for this end. This patch adds two libraries: 1. `coverage_utils.py` which is a library for manipulating coverage data, it also contains a cli for the (assumed) most common operations that are needed in order to eventually generate coverage reporting. 2. `lcov_utils.py` - which is a library to deal with lcov format data, which is a textual form containing a source dependant coverage data. An example of such manipulation can be `coverage diff` operation which produces a set like difference operation. cov_a - cov_b = diff where diff is an lcov formated file containing coverage data for code cov_a that is not covered at all in cov_b. The libraries and cli main goal is to provide a unified way to handle coverage data in a way that can be easily scriptable and extensible. This will pave the way for automating the coverage reporting and processing in test.py and in jenkins piplines (for example to also process dtest or sct coverage reporting) Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2024-01-18 11:11:34 +02:00
Eliran Sinvani	f4b6c9074a	test.py: support --coverage and --coverage-mode We aim to support code coverage reporting as part of our development process, to this end, we will need the ability to "route" the dumped profiles from scylla and unit test to a predetermined location. We can consider profile data as logged data that should persist after tests have been run. For this we add two supported options to test.py: --coverage - which means that all suits on all modes will participate in coverage. --coverage-mode - which can be used to "turn on" coverage support only for some of the modes in this run. The strategy chosen is to save the profile data in `tmpdir`/mode/coverage/%m.profraw (ref: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program) This means that for every suite the profiling data of each object is going to be merged into the same file (llvm claims to lock the file so concurrency is fine). More resolution than the suite level seems to not give us anything useful (at least not at the moment). Moreover, it can also be achieved by running a single test. Data in the suite level will help us to detect suits that don't generate coverage data at all and to fix this or to skip generating the profiles for them. Also added support of 'coverage' parameter in the `suite.yaml` file, which can be used to disable coverage for a specific suite, this parameter defaults to True but if a suite is known to not generate profiles or the suite profile data is not needed or obfuscate the result it can be set to false in order to cancel profiles routing and processing for this suite. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2024-01-18 11:11:34 +02:00
Botond Dénes	f22fc88a64	Merge 'Configure service levels interval' from Michał Jadwiszczak Service level controller updates itself in interval. However the interval time is hardcoded in main to 10 seconds and it leads to long sleeps in some of the tests. This patch moves this value to `service_levels_interval_ms` command line option and sets this value to 0.5s in cql-pytest. Closes scylladb/scylladb#16394 * github.com:scylladb/scylladb: test:cql-pytest: change service levels intervals in tests configure service levels interval	2024-01-17 12:24:49 +02:00
Calle Wilund	af0772d605	commitlog: Add wait_for_pending_deletes Refs #16757 Allows waiting for all previous and pending segment deletes to finish. Useful if a caller of `discard_completed_segments` (i.e. a memtable flush target) not only wants to ensure segments are clean and released, but thoroughly deleted/recycled, and hence no treat to resurrecting data on crash+restart. Test included. Closes scylladb/scylladb#16801	2024-01-17 09:30:55 +02:00
Tomasz Grabiec	49026dc319	Merge 'Turn on tablets on keyspace by default when the feature is enabled' from Pavel Emelyanov To enable tablets replication one needs to turn on the (experimental) feature and specify the `initial_tablets: N` option when creating a keyspace. We want tablets to become default in the future and allow users to explicitly opt it out if they want to. This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is * `CREATE KEYSPACE ...` will turn tablets on or off based on cluster feature being enabled/disabled * `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }` will turn tablets off regardless of what * `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }` will try to enable tablets with default configuration * `CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }` is now the replacement for `REPLICATION = { ... 'initial_tablets': <int> }` thing fixes: #16319 Closes scylladb/scylladb#16364 * github.com:scylladb/scylladb: code: Enable tablets if cluster feature is enabled test: Turn off tablets feature by default test: Move test_tablet_drain_failure_during_decommission to another suite test/tablets: Enable tables for real on test keyspace test/tablets: Make timestamp local cql3: Add feature service to as_ks_metadata_update() cql3: Add feature service to ks_prop_defs::as_ks_metadata() cql3: Add feature service to get_keyspace_metadata() cql: Add tablets on/off switch to CREATE KEYSPACE cql: Move initial_tablets from REPLICATION to TABLETS in DDL network_topology_strategy: Estimate initial_tablets if 0 is set	2024-01-16 00:15:10 +01:00
Patryk Wrobel	aec0db1b96	cql_auth_query_test.cc: do not rely on templated operator<< This change is intended to remove the dependency to operator<<(std::ostream&, const std::unordered_set<seastar::sstring>&) from test/boost/cql_auth_query_test.cc. It prepares the test for removal of the templated helpers. Such removal is one of goals of the referenced issue that is linked below. Refs: #13245 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#16758	2024-01-15 13:30:05 +02:00
Kefu Chai	fc97d91f1a	auth: add fmt::format for auth::resource and friends before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we * define a formatter for `auth::resource` and friends, * update their callers of `operator<<` to use `fmt::print()`. * drop `operator<<`, as they are not used anymore. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16765	2024-01-15 13:26:39 +02:00
Kefu Chai	218334eaf5	test/nodetool: use build/$CMAKE_BUILD_TYPE when appropriate because the CMake-generated build.ninja is located under build/, and it puts the `scylla` executable at build/$CMAKE_BUILD_TYPE/scylla, instead of at build/$scylla_build_mode/scylla, so let's adapt to this change accordingly. we will promote this change to a shared place if we have similar needs in other tests as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16775	2024-01-15 12:52:35 +02:00
Pavel Emelyanov	dd892b0d8a	code: Enable tablets if cluster feature is enabled If the TABLETS map is missing in the CREATE KEYSPACE statement the tablets are anyway enabled if the respective cluster feature is enabled. To opt-out keyspaces one may use TABLETS = { 'enabled': false } syntax. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	4838eeb201	test: Turn off tablets feature by default Next patches will make per-keyspace initial_tables option really optional and turn tablets ON when the feature is ON. This will break all other tests' assumptions, that they are testing vnodes replication. So turn the feature off by default, tests that do need tables will need to explicitly enable this feature on their own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	ae7da54f88	test: Move test_tablet_drain_failure_during_decommission to another suite In its current location it will be started with 3 pre-created scylla nodes with default features ON. Next patch will exclude `tablets` from the default list, so the test needs to create servers on its own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	46b36d8c07	test/tablets: Enable tables for real on test keyspace When started cql_test_env creates a test keyspace. Some tablets test cases create a table in this keyspace, but misuse the whole feature. The thing is that while tablets feature is ON in those test cases, the keyspace itself doesn _not_ have the initial_tables option and thus tablets are not enabled for the ks' table for real. Currently test cases work just because this table is only used as a transparent table ID placeholder. If turning on tablets for the keyspace, several test cases would get broken for two reasons. First, the tables map will no longer be empty on test start. Second, applying changes to tablet metadata may not be visible, becase test case uses "ranom" timestamp, that can be less that the initial metadata mutations' timestamp. This patch fixes all three places: 1. enables tables for the test keyspace 2. removes assumption that the initial metadata is empty 3. uses large enough timestamp for subsequent mutations Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	2376b699e0	test/tablets: Make timestamp local Just to make next patching simpler Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:12 +03:00
Pavel Emelyanov	6cb3055059	cql: Add tablets on/off switch to CREATE KEYSPACE Now the user can do CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false } to turn tablets off. It will be useful in the future to opt-out keyspace from tablets when they will be turned on by default based on cluster features only. Also one can do just CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true } and let Scylla select the initial tablets value by its own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:12:11 +03:00
Pavel Emelyanov	941f6d8fca	cql: Move initial_tablets from REPLICATION to TABLETS in DDL This patch changes the syntax of enabling tablets from CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> } to be CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> } and updates all tests accordingly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-15 13:04:48 +03:00
Gleb Natapov	f8b90aeb14	test: add test for automatic cleanup procedure The test runs two bootstraps and checks that there is no cleanup in between. Then it runs a decommission and checks that cleanup runs automatically and then it runs one more decommission and checks that no cleanup runs again. Second part checks manual cleanup triggering. It adds a node, triggers cleanup through the REST API, checks that is runs, decommissions a node and check that the cleanup did not run again.	2024-01-14 15:45:53 +02:00
Gleb Natapov	5882855669	test: add test for topology requests queue management This test creates a 5 node cluster with 2 down nodes (A and B). After that it creates a queue of 3 topology operation: bootstrap, removenode A and removenode B with ignore_nodes=A. Check that all operation manage to complete. Then it downs one node and creates a queue with two requests: bootstrap and decommission. Since none can proceed both should be canceled.	2024-01-14 15:45:53 +02:00
Gleb Natapov	97ab3f6622	storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator Introduce new REST API "/storage_service/cleanup_all" that, when triggered, instructs the topology coordinator to initiate cluster wide cleanup on all dirty nodes. It is done by introducing new global command "global_topology_request::cleanup".	2024-01-14 15:45:53 +02:00
Gleb Natapov	0e68073b22	test: use servers_see_each_other when needed In the next patch we want to abort topology operations if there is no enough live nodes to perform them. This will break tests that do a topology operation right after restarting a node since a topology coordinator may still not see the restarted node as alive. Fix all those tests to wait between restart and a topology operation until UP state propagates.	2024-01-14 14:44:07 +02:00
Gleb Natapov	455ffaf5d8	test: add servers_see_each_other helper The helper makes sure that all nodes in the cluster see each other as alive.	2024-01-14 14:44:07 +02:00
Kamil Braun	4e18f8b453	Merge 'topology_state_load: stop waiting for IP-s' from Petr Gusev The loop in `id2ip` lambda makes problems if we are applying an old raft log that contains long-gone nodes. In this case, we may never receive the `IP` for a node and stuck in the loop forever. In this series we replace the loop with an if - we just don't update the `host_id <-> ip` mapping in the `token_metadata.topology` if we don't have an `IP` yet. The PR moves `host_id -> IP` resolution to the data plane, now it happens each time the IP-based methods of `erm` are called. We need this because IPs may not be known at the time the erm is built. The overhead of `raft_address_map` lookup is added to each data plane request, but it should be negligible. In this PR `erm/resolve_endpoints` continues to treat missing IP for `host_id` as `internal_error`, but we plan to relax this in the follow-up (see this PR first comment). Closes scylladb/scylladb#16639 * github.com:scylladb/scylladb: raft ips: rename gossiper_state_change_subscriber_proxy -> raft_ip_address_updater gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes storage_service: topology_state_load: remove IP waiting loop storage_service: sync_raft_topology_nodes: add target_node parameter storage_service: sync_raft_topology_nodes: move loops to the end storage_service: sync_raft_topology_nodes: rename extract process_left_node and process_transition_node storage_service: sync_raft_topology_nodes: rename add_normal_node -> process_normal_node storage_service: sync_raft_topology_nodes: move update_topology up storage_service: topology_state_load: remove clone_async/clear_gently overhead storage_service: fix indentation storage_service: extract sync_raft_topology_nodes storage_service: topology_state_load: move remove_endpoint into mutate_token_metadata address_map: move gossiper subscription logic into storage_service topology_coordinator: exec_global_command: small refactor, use contains + reformat storage_service: wait_for_ip for new nodes storage_service.idl.hh: fix raft_topology_cmd.command declaration erm: for_each_natural_endpoint_until: use is_vnode == true erm: switch the internal data structures to host_id-s erm: has_pending_ranges: switch to host_id	2024-01-12 18:46:51 +01:00
Petr Gusev	6e7bbc94f4	gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes When a node changes its IP we need to store the mapping in system.peers and update token_metadata.topology and erm in-memory data structures. The test_change_ip was improved to verify this new behaviour. Before this patch the test didn't check that IPs used for data requests are updated on IP change. In this commit we add the read/write check. It fails on insert with 'node unavailable' error without the fix.	2024-01-12 18:28:57 +04:00
Petr Gusev	15b8e565ed	address_map: move gossiper subscription logic into storage_service We are going to remove the IP waiting loop from topology_state_load in subsequent commits. An IP for a given host_id may change after this function has been called by raft. This means we need to subscribe to the gossiper notifications and call it later with a new id<->ip mapping. In this preparatory commit we move the existing address_map update logic into storage_service so that in later commits we can enhance it with topology_state_load call.	2024-01-12 15:37:50 +04:00
Michał Jadwiszczak	013487e1e1	test:cql-pytest: change service levels intervals in tests Set the interval to 0.5s to reduce required sleep time.	2024-01-12 10:28:28 +01:00
Patryk Wrobel	87545e40c7	test/boost/auth_resource_test.cc: do not rely on templated operator<< This change is intended to remove the dependency to operator<<(std::ostream&, const std::unordered_set<T>&) from auth_resource_test.cc. It prepares the test for removal of the templated helpers from utils/to_string.hh, which is one of goals of the referenced issue that is linked below. Refs: #13245 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#16754	2024-01-12 10:48:01 +02:00
Nadav Har'El	5c7e029012	test/cql-pytest: add reproducer for task-tracking memory leak This patch adds a reproducer test for the memory leak described in issue #16493: If a table is repeatedly created and dropped, memory is leaked by task tracking. Although this "leak" can be temporary if task_ttl_in_seconds is properly configured, it may still use too much memory if tables are too frequently created and dropped. The test here shows that (before #16493 was fixed) as little as 100 tables created and deleted can cause Scylla to run out of memory. The problem is severely exacerbated when tablets are used which is why the test here uses tablets. Before the fix for #16493 (a Seastar patch, scylladb/seastar#2023), this test of 100 iterations always failed (with test/cql-pytest/run's default memory allowance). After the fix, the test doesn't fail in 100 iterations - and even if increased manually to 10,000 iterations it doesn't fail. The new test uses the initial_tablets feature, so requires Scylla to be run with the "tablets" experimental option turned on. This is not currently the default of test.py or test/cql-pytest/run, so I turned it on manually to check this test. I also checked that the test is correctly skipped if tablets are not turned on. Refs #16493 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16717	2024-01-12 09:37:32 +02:00
Botond Dénes	697ebef149	Merge 'tasks: compaction: drop regular compaction tasks after they are finished' from Aleksandra Martyniuk Make compaction tasks internal. Drop all internal tasks without parents immediately after they are done. Fixes: #16735 Refs: #16694. Closes scylladb/scylladb#16698 * github.com:scylladb/scylladb: compaction: make regular compaction tasks internal tasks: don't keep internal root tasks after they complete	2024-01-11 12:10:44 +02:00
Botond Dénes	ac69473bac	Merge 'utils/pretty_printers: add "I" specifier support' from Kefu Chai this is to mimic the formatting of `human_readable_value`, and to prepare for consolidating these two formatters, so we don't have two pretty printers in the tree. Closes scylladb/scylladb#16726 * github.com:scylladb/scylladb: utils/pretty_printers: add "I" specifier support utils/pretty_printers: use the formatting of to_hr_size()	2024-01-11 10:54:14 +02:00
Kefu Chai	0c2ef5de54	test/unit/bptree_validation: use "{}" for formatting test_data before this change, "{:d}" is used for formatting `test_data` y bptree_stress_test.cc. but the "d" specifier is only used for formatting integers, not for formatting `test_data` or generic data types, so this fails when the test is compiled with {fmt} v10, like: ``` In file included from /home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:20: /home/kefu/dev/scylladb/test/unit/bptree_validation.hh:294:35: error: call to consteval function 'fmt::basic_format_string<char, test_data &, test_data &>::basic_format_string<char[31], 0>' is not a constant expression 294 \| fmt::print(std::cout, "Iterator broken, {:d} != {:d}\n", val, *_fwd); \| ^ /home/kefu/dev/scylladb/test/unit/bptree_validation.hh:267:20: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::forward_check' requested here 267 \| return forward_check(); \| ^ /home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:92:35: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::step' requested here 92 \| if (!itc->step()) { \| ^ /usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression 2322 \| if (!in(arg_type, set)) throw_format_error("invalid format specifier"); \| ^ ``` in this change, instead of specifying "{:d}", let's just use "{}", which works for both integer and `test_data`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16727	2024-01-11 10:53:33 +02:00
Kefu Chai	6c06751640	cdc: not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16725	2024-01-11 09:13:37 +02:00
Lakshmi Narayanan Sreethar	76f0d5e35b	reader_permit: store schema_ptr instead of raw schema pointer Store schema_ptr in reader permit instead of storing a const pointer to schema to ensure that the schema doesn't get changed elsewhere when the permit is holding on to it. Also update the constructors and all the relevant callers to pass down schema_ptr instead of a raw pointer. Fixes #16180 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#16658	2024-01-11 08:37:56 +02:00
Kefu Chai	f11a53856d	utils/pretty_printers: add "I" specifier support this is to mimic the formatting of `human_readable_value`, and to prepare for consolidating these two formatters, so we don't have two pretty printers in the tree. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-11 14:33:47 +08:00
Kefu Chai	7d627b328f	utils/pretty_printers: use the formatting of to_hr_size() keep the precision of 4 digits, for instance, so that we format "8191" as "8191" instead of as "8 Ki". this is modeled after the behavior of `to_hr_size()`. for better user experience. and also prepares to consolidate these two formatters. tests are updated to exercise both IEC and SI notations. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-11 14:33:03 +08:00
Nadav Har'El	39dd2a2690	cql-pytest: translated Cassandra's test for LWT with static column This is a translation of Cassandra's CQL unit test source file validation/operations/InsertUpdateIfConditionStaticsTest.java into our cql-pytest framework. This test file checks various LWT conditional updates which involve static columns or UDTs (there are separate test file for LWT conditional updates that do not involve static columns). This test did not uncover any new bugs, but demonstrates yet again several places where we intentionally deviated from Cassandra's behavior, forcing me to add "is_scylla" checks in many of the checks to allow them to pass on both Scylla and Cassanda. These deviations are known, intentional and some are documented in docs/kb/lwt-differences.rst but not all, so it's worth listing here the ones re-discovered by this test: 1. On a successful conditional write, Cassandra returns just True, Scylla also returns the old contents of the row. This difference is officially documented in docs/kb/lwt-differences.rst. 2. On a batch request, Scylla always returns a row per statement, Cassandra doesn't - it often returns just a single failed row, or just True if the whole batch succeeded. This difference is officially documented in docs/kb/lwt-differences.rst. 3. In a DELETE statement with a condition, in the returned row Cassandra lists the deleted column first - while Scylla lists the static column first (as in any other row). This difference is probably inconsequential, because columns also have names so their order in the response usually doesn't matter. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16643	2024-01-10 12:14:06 +02:00
Nadav Har'El	b1a441ba56	test/cql-pytest: correct xfail status of timestamp parser The recently-added test test_fromjson_timestamp_submilli demonstrated a difference between Scylla's and Cassandra's parsing timestamps in JSON: Trying to use too many (more than 3) digits of precision is forbidden in Scylla, but ignored in Cassandra. So we marked the test "xfail", suggesting we think it's a Scylla bug that should be fixed in the future. However, it turns out that we already had a different test, test_type_timestamp_from_string_overprecise, which showed the same difference in a different context (without JSON). In that older test, the decision was to consider this a Cassandra bug, not Scylla bug - because Cassandra seemingly allows the sub-millisecond timestap but in reality drops the extra precision. So we need to be consistent in the tests - this is either a Scylla bug or a Cassandra bug, we can't make once choice in one test and another in a different test :-) So let's accept our older decision, and consider Scylla's behavior the correct one in this case. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16586	2024-01-10 12:12:26 +02:00
Kefu Chai	317af97e41	test/pylib: shutdown unix RESTful client when stopping the ManagerClient, it would be better to close all connected connector, otherwise aiohttp complains like: ``` 13:57:53.763 ERROR> Unclosed connector connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x7f939d2ca5f0>, 96672.211256817)]'] connector: <aiohttp.connector.UnixConnector object at 0x7f939d2da890> ``` this warning message is printed to the console, and it is distracting when testing manually. so, in this change, let's close the client connecting to unix domain socket. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16675	2024-01-10 11:07:14 +02:00
Kefu Chai	0dc7db54d1	build: cmake: add "unit_test_list" target this target is used by test.py for enumerating unit tests * test/CMakeLists.txt: append executable's full path to `scylla_tests`. add `unit_test_list` target printing `scylla_tests`, please note, `cmake -E echo` does not support the `-e` option of `echo`, and ninja does not support command line with newline in it, we have to use `echo` to print the list of tests. * test/{boost,raft,unit}/CMakeLists.txt: set scylla_tests only if $PWD/suite.yaml exists. we could hardwire this logic in these files, as it is known that this file exists in these directory, but this is still put this way, so that it serves as a comment explaining that the reason why we update scylla_tests here but not somewhere else where we also use `add_scylla_test()` function is just suite.yaml exists here. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16702	2024-01-10 08:43:04 +02:00
Aleksandra Martyniuk	6b87778ef2	compaction: make regular compaction tasks internal Regular compaction tasks are internal. Adjust test_compaction_task accordingly: modify test_regular_compaction_task, delete test_running_compaction_task_abort (relying on regular compaction) which checks are already achived by test_not_created_compaction_task_abort. Rename the latter.	2024-01-09 13:13:54 +01:00
Pavel Emelyanov	cdf5124003	Merge 'tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()' from Botond Dénes The default error handler throws an exception, which means scylla-sstable will exit with exception if there is any problem in the configuration. Not even ScyllaDB itself is this harsh -- it will just log a warning for most errors. A tool should be much more lenient. So this patch passes an error handler which just logs all errors with debug level. If reading an sstable fails, the user is expected to investigate turning debug-level logging on. When they do so, they will see any problems while reading the configuration (if it is relevant, e.g. when using EAR). Fixes: #16538 Closes scylladb/scylladb#16657 * github.com:scylladb/scylladb: tools/scylla-sstable: pass error handler to utils::config_file::read_from_file() tools/scylla-sstable: allow always passing --scylla-yaml-file option	2024-01-09 14:28:49 +03:00
Kamil Braun	d4f4b58f3a	Merge 'topology_coordinator: reject removenode if the removed node is alive' from Patryk Jędrzejczak The removenode operation is defined to succeed only if the node being removed is dead. Currently, we reject this operation on the initiator side (in `storage_service::raft_removenode`) when the failure detector considers the node being removed alive. However, it is possible that even if the initiator considers the node dead, the topology coordinator will consider it alive when handling the topology request. For example, the topology coordinator can use a bigger failure detector timeout, or the node being removed can suddenly resurrect. This PR makes the topology coordinator reject removenode if the node being removed is considered alive. It also adds `test_remove_alive_node` that verifies this change. Fixes scylladb/scylladb#16109 Closes scylladb/scylladb#16584 * github.com:scylladb/scylladb: test: add test_remove_alive_node topology_coordinator: reject removenode if the removed node is alive test: ManagerClient: remove unused wait_for_host_down test: remove_node: wait until the node being removed is dead	2024-01-08 12:39:23 +01:00
Botond Dénes	9119bcbd67	tools/scylla-sstable: pass error handler to utils::config_file::read_from_file() The default error handler throws an exception, which means scylla-sstable will exit with exception if there is any problem in the configuration. Not even ScyllaDB itself is this harsh -- it will just log a warning for most errors. A tool should be much more lenient. So this patch passes an error handler which just logs all errors with debug level. If reading an sstable fails, the user is expected to investigate turning debug-level logging on. When they do so, they will see any problems while reading the configuration (if it is relevant, e.g. when using EAR). Fixes: #16538	2024-01-08 02:18:15 -05:00
Patryk Jędrzejczak	df2034ebd7	server, raft_group0_client: remove the default nullptr values The previous commit has fixed 5 bugs of the same type - incorrectly passing the default nullptr to one of the changed functions. At least some of these bugs wouldn't appear if there was no default value. It's much harder to make this kind of a bug if you have to write "nullptr". It's also much easier to detect it in review. Moreover, these default values are rarely used outside tests. Keeping them is just not worth the time spent on debugging.	2024-01-05 18:45:50 +01:00
Nadav Har'El	94580df1c5	test/alternator: fix flaky test in test_filter_expression.py The test test_filter_expression.py::test_filter_expression_precedence is flaky - and can fail very rarely (so far we've only actually seen it fail once). The problem is that the test generates items with random clustering keys, chosen as an integer between 1 and 1 million, and there is a chance (roughly 2/10,000) that two of the 20 items happen to have the same key, so one of the items is "lost" and the comparison we do to the expected truth fails. The solution is to just use sequential keys, not random keys. There is nothing to gain in this test by using random keys. To make this test bug easy to reproduce, I temporarily changed random_i()'s range from 1,000,000 to 3, and saw the test failing every single run before this patch. After this patch - no longer using random_i() for the keys - the test doesn't fail any more. Fixes #16647 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16649	2024-01-04 21:36:40 +02:00
Kamil Braun	bf068dd023	Merge `handle error in cdc generation propagation during bootstrap` from Gleb Bootstrap cannot proceed if cdc generation propagation to all nodes fails, so the patch series handles the error by rolling the ongoing topology operation back. * 'gleb/raft-cdc-failure' of github.com:scylladb/scylla-dev: test: add test to check failure handling in cdc generation commit storage_service: topology coordinator: rollback on failure to commit cdc generation	2024-01-04 15:38:51 +01:00
Kamil Braun	f942bf4a1f	Merge 'Do not update endpoint state via gossiper::add_saved_endpoint once it was updated via gossip' from Benny Halevy Currently, `add_saved_endpoint` is called from two paths: One, is when loading states from system.peers in the join path (join_cluster, join_token_ring), when `_raft_topology_change_enabled` is false, and the other is from `storage_service::topology_state_load` when raft topology changes are enabled. In the later path, from `topology_state_load`, `add_saved_endpoint` is called only if the endpoint_state does not exist yet. However, this is checked without acquiring the endpoint_lock and so it races with the gossiper, and once `add_saved_endpoint` acquires the lock, the endpoint state may already be populated. Since `add_saved_endpoint` applies local information about the endpoint state (e.g. tokens, dc, rack), it uses the local heart_beat_version, with generation=0 to update the endpoint states, and that is incompatible with changes applies via gossip that will carry the endpoint's generation and version, determining the state's update order. This change makes sure that the endpoint state is never update in `add_saved_endpoint` if it has non-zero generation. An internal error exception is thrown if non-zero generation is found, and in the only call site that might reach that state, in `storage_service::topology_state_load`, the caller acquires the endpoint_lock for checking for the existence of the endpoint_state, calling `add_saved_endpoint` under the lock only if the endpoint_state does not exist. Fixes #16429 Closes scylladb/scylladb#16432 * github.com:scylladb/scylladb: gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found storage_service: topology_state_load: lock endpoint for add_saved_endpoint raft_group_registry: move on_alive error injection to gossiper	2024-01-04 14:47:10 +01:00
Kefu Chai	cf932888de	Update seastar submodule * seastar e0d515b6...70349b74 (33): > util/log: drop unused function > util/log, rpc, core: use compile-time formatting with fmtlib >= 8.0 > Fix edge case in memory sampler at OOM > exp/geo distribution benchmark > Additional allocation tests > Remove null pointer check on free hot path > Optimize final part of allocation hot path > Optimize zero size checking in allocator > memory: Optimize free fast path > memory: Optimize small alloc alloation path > memory: Limit alloc_sites size > memory: Add general comment about sampling strategy > memory: Use probabilistic sampler > util: Adapt memory sampler to seastar > util: Import Android Memory Sampler > memory: Use separate small pool for tracking sampled allocations > memory: Support enabling memory profiling at runtime > util/source_location-compat: mark `source_location::current()` consteval > build: use new behavior defined by CMP0155 when building C++ modules > circleci: build with C++20 modules enabled > seastar.cc: replace cryptopp with gnutls when building seastar modules > alien: include used header > seastar.cc: include used headers in the global purview > docker: install clang-tools-17 > net/tcp: generate a random src_port hashed to current shard if smp::count > 1 > net, websocket: replace Crypto++ calls with GnuTLS > README-DPDK.md: point user to DPDK's quick start guide > reactor: print fatal error using logger as well > Avoid ping-pong in spinlock::lock > memory: Add allocator perf tests > memory: Add a basic sized deletion test > Prometheus: Disable Prometheus protobuf with a configuration > treewide: bring back prometheus protobuf support * test/manual/sstable_scan_footprint_test: update to adapt to the breaking change of "memory: Use probabilistic sampler" in seastar Closes scylladb/scylladb#16610	2024-01-04 09:36:53 +02:00
Kefu Chai	50cf62e186	build: cmake: do not link against Boost::dynamic_linking Boost::dynamic_linking was introduced as a compatibility target which adds "BOOST_ALL_DYN_LINK" macro on Win32 platform. but since Scylla only runs on Linux, there is no need to link against this library. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16544	2024-01-04 08:06:19 +02:00
Kefu Chai	2ad532df43	test: randomized_nemesis_test: move std::variant formatter up we format `std::variant<std::monostate, seastar::timed_out_error, raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown, raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>` in this source file. and currently, we format `std::variant<..>` using the default-generated `fmt::formatter` from `operator<<`, so in order to format it using {fmt}'s compile-time check enabled, we have to make the `operator<<` overload for `std::variant<...>` visible from the caller sites which format `std::variant<...>` using {fmt}. in this change, the `operator<<` for `std::variant<...>` is moved to from the middle of the source file to the top of it, so that it can be found when the compiler looks up for a matched `fmt::formatter` for `std::variant<...>`. please note, we cannot use the `fmt::formatter` provided by `fmt/std.h`, as its specialization for `std::variant` requires that all the types of the variant is `is_formattable`. but the default generated formatter for type `T` is not considered as the proof that `T` is formattable. this should address the FTBFS with the latest seastar like: ``` /usr/include/fmt/core.h:2743:12: error: call to deleted constructor of 'conditional_t<has_formatter<mapped_type, context>::value, formatter<mapped_type, char_type>, fallback_formatter<stripped_type, char_type>>' (aka 'fmt::detail::fallback_formatter<std::variant<std::monostate, seastar::timed_out_error, raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown, raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>>') ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16616	2024-01-03 16:38:25 +01:00
Avi Kivity	20531872a7	Merge 'test: randomized_nemesis_test: add formatter for append_entry' from Kefu Chai we are using `seastar::format()` to format `append_entry` in `append_reg_model`, so we have to provide a `fmt::formatter` for these callers which format `append_entry`. despite that, with FMT_DEPRECATED_OSTREAM, the formatter is defined by fmt v9, we don't have it since fmt v10. so this change prepares us for fmt v10. Refs https://github.com/scylladb/scylladb/issues/13245 Closes scylladb/scylladb#16614 * github.com:scylladb/scylladb: test: randomized_nemesis_test: add formatter for append_entry test: randomized_nemesis_test: move append_reg_model::entry out	2024-01-03 15:06:33 +02:00
Tomasz Grabiec	715e062d4a	Merge 'table, memtable: share log structured allocator statistics across all tablets in a table' from Avi Kivity In `7d5e22b43b` ("replica: memtable: don't forget memtable memory allocation statistics") we taught memtable_list to remember learned memory allocation reserves so a new memtable inherits these statistics from an older memtable. Share it now further across tablets that belong to the same table as well. This helps the statistics be more accurate for tablets that are migrated in, as they can share existing tablet's memory allocation history. Closes scylladb/scylladb#16571 * github.com:scylladb/scylladb: table, memtable: share log-structured allocator statistics across all memtables in a table memtable: consolidate _read_section, _allocating_section in a struct	2024-01-03 14:03:40 +01:00

1 2 3 4 5 ...

6125 Commits