scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-06 15:03:06 +00:00

Author	SHA1	Message	Date
Benny Halevy	062684eb1f	test: compaction_manager_stop_and_drain_race_test: stop compaction and task managers Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-05 09:17:25 +03:00
Benny Halevy	b9127f55ac	test: simple_backlog_controller_test: stop compaction and task managers The compaction_manager and task_manager should be orderly stopped before they are destroyed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-05 09:17:25 +03:00
Avi Kivity	9a3d57256a	Merge 'config: add index_cache_fraction' from Michał Chojnowski Index caching was disabled by default because it caused performance regressions for some small-partition workloads. See https://github.com/scylladb/scylladb/issues/11202. However, it also means that there are workloads which could benefit from the index cache, but (by default) don't. As a compromise, we can set a default limit on the memory usage of index cache, which should be small enough to avoid catastrophic regressions in small-partition workloads, but big enough to accommodate workloads where index cache is obviously beneficial. This series adds such a configurable limit, sets it to 0.2 of total cache memory by default, and re-enables index caching by default. Fixes #15118 Closes #14994 * github.com:scylladb/scylladb: test: boost/cache_algorithm_test: add cache_algorithm_test sstables: partition_index_cache: deglobalize stats utils: cached_file: deglobalize cached_file metrics db: config: enable index caching by default config: add index_cache_fraction utils: lru: add move semantics to list links	2023-09-03 19:39:31 +03:00
Michał Chojnowski	bcc235ad5f	test: boost/cache_algorithm_test: add cache_algorithm_test The tests added in this patch validate that index_cache_fraction does what it's supposed to do.	2023-09-01 22:34:41 +02:00
Michał Chojnowski	f00bed9429	sstables: partition_index_cache: deglobalize stats Move partition_index_cache stats from a thread_local variable to cache_tracker. After the change, partition_index_cache receives a reference to the stats via constructor, instead of referencing a global. This is needed so that cache_tracker can know the memory usage of index caches (for cache eviction purposes) without relying on globals. But it also makes sense even without that motive.	2023-09-01 22:34:41 +02:00
Michał Chojnowski	c7d9d35030	utils: cached_file: deglobalize cached_file metrics Move cached_file metrics from a thread_local variable to cache_tracker. This is needed so that cache_tracker can know the memory usage of index caches (for purposes of cache eviction) without relying on globals. But it also makes sense even without that motive.	2023-09-01 22:34:41 +02:00
Kamil Braun	117dedab19	Merge 'Cluster features on raft: topology coordinator + check on boot followups' from Piotr Dulikowski This PR collects followups described in #14972: - The `system.topology` table is now flushed every time feature-related columns are modified. This is done because of the feature check that happens before the schema commitlog is replayed. - The implementation now guarantees that, if all nodes support some feature as described by the `supported_features` column, then support for that feature will not be revoked by any node. Previously, in an edge case where a node is the last one to add support for some feature `X` in `supported_features` column, crashes before applying/persisting it and then restarts without supporting `X`, it would be allowed to boot anyway and would revoke support for the `X` in `system.topology`. The existing behavior, although counterintuitive, was safe - the topology coordinator is responsible for explicitly marking features as enabled, and in order to enable a feature it needs to perform a special kind of a global barrier (`barrier_after_feature_update`) which only succeeds after the node has updated its features column - so there is no risk of enabling an unsupported feature. In order to make the behavior less confusing, the node now will perform a second check when it tries to update its `supported_features` column in `system.topology`. - The `barrier_after_feature_update` is removed and the regular global `barrier` topology command is used instead. The `barrier` handler now performs a feature check if the node did not have a chance to verify and update its cluster features for the second time. JOIN_NODE rpc will be sent separately as it is a big item on its own. Fixes: #14972 Closes #15168 * github.com:scylladb/scylladb: test: topology{_experimental_raft}: don't stop gracefully in feature tests storage_service: remove _topology_updated_with_local_metadata topology_coordinator: remove barrier_after_feature_update topology_coordinator: perform feature check during barrier storage_service: repeat the feature check after read barrier feature_service: introduce unsupported_feature_exception feature_service: move startup feature check to a separate function topology_coordinator: account for features to enable in should_preempt_balancing group0_state_machine: flush system.topology when updating features columns	2023-09-01 11:52:26 +02:00
Botond Dénes	34d94fb549	test/cql-pytest/test_tools.py: improve tempdir usage for scrub tests Scrub tests use a lot of temporary directories. This is suspected to cause problems in some cases. To improve the situation, this patch: * Creates a single root temporary directory for all scrub tests * All further fixtures create their files/directories inside this root dir. * All scrub tests create their temporary directories within this root dir. * All temporary directories now use an appropriate "prefix", so we can tell which temporary directory is part of the problem if a test fails. Refs: #14309 Closes #15117	2023-09-01 07:17:49 +03:00
Alexey Novikov	87fa7d0381	compact and remove expired range tombstones from cache on read during read from cache compact and expire range tombstones remove expired empty rows from cache Refs #2252 Fixes #6033 Closes #14463	2023-09-01 07:17:49 +03:00
Piotr Dulikowski	5471330ee7	test: topology{_experimental_raft}: don't stop gracefully in feature tests The current cluster feature tests are stopping nodes in a graceful way. Doing it gracefully isn't strictly necessary for the test scenarios and we can switch `server_stop_gracefully` calls to `server_stop`. This only became possible after a previous commit which causes `system.topology` table to be flushed when cluster feature columns are modified, and will serve as a good test for it.	2023-08-31 16:46:11 +02:00
Kamil Braun	0ee23b260e	Merge 'raft topology: add and deprecate support for --ignore-dead-nodes with IPs' from Patryk Jędrzejczak We want to stop supporting IPs for `--ignore-dead-nodes` in `raft_removenode` and `--ignore-dead-nodes-for-replace` for `raft_replace`. However, we shouldn't remove these features without the deprecation period because the original `removenode` and `replace` operations still support them. So, we add them for now. The `IP -> Raft ID` translation is done through the new `raft_address_map::find_by_addr` member function. We update the documentation to inform about the deprecation of the IP support for `--ignore-dead-nodes`. Fixes #15126 Closes #15156 * github.com:scylladb/scylladb: docs: inform about deprecating IP support for --ignore-dead-nodes raft topology: support IPs for --ignore-dead-nodes raft_address_map: introduce find_by_addr	2023-08-30 10:41:23 +02:00
Botond Dénes	3e7ec6cc83	Merge 'Move cell assertion from cql_test_env to cql_assertions' from Pavel Emelyanov The cql_test_env has a virtual require_column_has_value() helper that better fits cql_assertions crowd. Also, the helper in question duplicates some existing code, so it can also be made shorter (and one class table helper gets removed afterwards) Closes #15208 * github.com:scylladb/scylladb: cql_assertions: Make permit from env table: Remove find_partition_slow() helper sstable_compaction_test: Do not re-decorate key cql_test_env: Move .require_column_has_value cql_test_env: Use table.find_row() shortcut	2023-08-30 08:34:05 +03:00
Kamil Braun	0bff96a611	Merge 'gossip: add group0_id attribute to gossip_digest_syn' from Mikołaj Grzebieluch Motivation: The user can bootstrap 3 different clusters and then connect them (#14448). When these clusters start gossiping, their token rings will be merged, but there will be 3 different group 0s in there. It results in a corrupted cluster. We need to prevent such situations from happening in clusters which don't use Raft-based topology. ------- Gossiper service sets its group0 id on startup if it is stored in `scylla_local` or sets it during joining group0. Send group0_id (if it is set) when the node tries to initiate the gossip round. When a node gets gossip_digest_syn it checks if its group0 id equals the local one and if not, the message is discarded. Fixes #14448 Performed manual tests with the following scenario: 1. setup a cluster of two nodes (one compiled with and one without this patch) 2. setup a new node 3. create a basic keyspace and table 4. execute simple select and insert queries Tested 4 scenarios: the seed node was with or without this patch, and the third node was with or without this patch. These tests didn't detect any errors. Closes #15004 * github.com:scylladb/scylladb: tests: raft: cluster of nodes with different group0 ids gossip: add group0_id attribute to gossip_digest_syn	2023-08-29 16:41:29 +02:00
Pavel Emelyanov	137c7116dc	cql_assertions: Make permit from env To call table::find_row() one needs to provide a permit. Tests have short and neat helper to create one from cql_test_env Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-29 16:01:29 +03:00
Pavel Emelyanov	0a727a9b2e	sstable_compaction_test: Do not re-decorate key The is_partition_dead() local helper accepts partition key argument and decorates it. However, its caller gets partition key from decorated key itself, and can just pass it along Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-29 15:38:41 +03:00
Pavel Emelyanov	4e9f380608	cql_test_env: Move .require_column_has_value This env helper is only used by tests (from cql_query_test) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-29 15:38:33 +03:00
Pavel Emelyanov	7597663ef5	cql_test_env: Use table.find_row() shortcut The require_column_has_value() finds the cell in three steps -- finds partition, then row, then cell. The class table already has a method to facilitate row finding by partition and clustering key Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-29 15:37:27 +03:00
Kamil Braun	ebc9056237	Merge 'Restore storage_service -> cdc_generation_service dependency' from Pavel Emelyanov The main goal of this PR is to stop cdc_generation_service from calling system_keyspace::bootstrap_complete(). The reason why it's there is that gen. service doesn't want to handle generation before node joined the ring or after it was decommissioned. The cleanup is done with the help of storage_service->cdc_generation_service explicit dependency brought back and this, in turn, suddenly freed the raft and API code from the need to carry cdc gen. service reference around. Closes #15047 * github.com:scylladb/scylladb: cdc: Remove bootstrap state assertion from after_join() cdc: Rework gen. service check for bootstrap state api: Don't carry cdc gen. service over storage_service: Use local cdc gen. service in join_cluster() storage_service: Remove cdc gen. service from raft_state_monitor_fiber() raft: Do not carry cdc gen. service over storage_service: Use local cdc gen. service in topo calls storage_service: Bring cdc_generation_service dependency back	2023-08-29 14:10:06 +02:00
Mikołaj Grzebieluch	bac8aa38d9	tests: raft: cluster of nodes with different group0 ids The reproducer for #14448. The test starts two nodes with different group0_ids. The second node is restarted and tries to join the cluster consisting of the first node. gossip_digest_syn message should be rejected by the first node, so the second node will not be able to join the cluster. This test uses repair-based node operations to make this test easier. If the second node successfully joins the cluster, their tokens metadata will be merged and the repair service will allow to decommission the second node. If not - decommissioning the second node will fail with an exception "zero replica after the removal" thrown by the repair service.	2023-08-29 11:09:15 +02:00
Mikołaj Grzebieluch	2230abc9b2	gossip: add group0_id attribute to gossip_digest_syn Gossiper service sets its group0 id on startup if it is stored in `scylla_local` or sets it during joining group0. Send group0_id (if it is set) when the node tries to initiate the gossip round. When a node gets gossip_digest_syn it checks if its group0 id equals the local one and if not, the message is discarded. Fixes #14448.	2023-08-29 11:09:15 +02:00
Botond Dénes	57deeb5d39	Merge 'gossiper: add get_unreachable_members_synchronized and use over api' from Benny Halevy Modeled after get_live_members_synchronized, get_unreachable_members_synchronized calls replicate_live_endpoints_on_change to synchronize the state of unreachable_members on all shards. Fixes #12261 Fixes #15088 Also, add rest_api unit test for those apis Closes #15093 * github.com:scylladb/scylladb: test: rest_api: add test_gossiper gossiper: add get_unreachable_members_synchronized	2023-08-29 10:43:22 +03:00
Pavel Emelyanov	a61454be00	storage_service: Use local cdc gen. service in join_cluster() The method in question accepts cdc_generation_service ref argument from main and cql_test_env, but storage service now has local cdc gen. service reference, so this argument and its propagation down the stack can be removed Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-29 09:36:58 +03:00
Pavel Emelyanov	933ea0afe6	storage_service: Bring cdc_generation_service dependency back It sort of reverts the `5a97ba7121` commit, because storage service now uses the cdc generation service to serve raft topo updates which, in turn, takes the cdc gen. service all over the raft code _just_ to make it as an argument to storage service topo calls. Also there's API carrying cdc gen. service for the single call and also there's an implicit need to kick cdc gen. service on decommission which also needs storage service to reference cdc gen. after boot is complete Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-29 09:36:58 +03:00
Mikołaj Grzebieluch	a031a14249	tests: add asynchronous log browsing functionality Add a class that handles log file browsing with the following features: * mark: returns "a mark" to the current position of the log. * wait_for: asynchronously checks if the log contains the given message. * grep: returns a list of lines matching the regular expression in the log. Add a new endpoint in `ManagerClient` to obtain the scylla logfile path. Fixes #14782 Closes #14834	2023-08-25 14:19:09 +02:00
Patryk Jędrzejczak	b2755755f4	raft topology: support IPs for --ignore-dead-nodes We want to stop supporting IPs for --ignore-dead-nodes in raft_removenode and --ignore-dead-nodes-for-replace for raft_replace. However, we shouldn't remove these features without the deprecation period because the original removenode and replace operations still support them. So, we add them for now. Additionally, we modify test_raft_ignore_nodes.py so that it verifies the added IP support.	2023-08-25 12:33:45 +02:00
Patryk Jędrzejczak	9806bddf75	test: fix a test case in raft_address_map_test The test didn't test what it was supposed to test. It would pass even if set_nonexpiring() didn't insert a new entry. Closes #15157	2023-08-25 12:11:33 +02:00
Patryk Jędrzejczak	59df5ce7e4	raft_address_map: introduce find_by_addr In the following commit, we add IP support for --ignore-dead-nodes in raft_removenode and raft_replace. To implement it, we need a way to translate IPs to Raft IDs. The solution is to add a new member function -- find_by_addr -- to raft_address_map that does the IP->ID translation. The IP support for --ignore-dead-nodes will be deprecated and find_by_addr shouldn't be called for other reasons, so it always logs a warning. We also add some unit tests for find_by_addr.	2023-08-24 15:10:43 +02:00
Raphael S. Carvalho	d6cc752718	test: Fix flakiness in sstable_compaction_test.autocompaction_control_test It's possible that compaction task is preempted after completion and before reevaluation, causing pending_tasks to be > 1. Let's only exit the loop if there are no pending tasks, and also reduce 100ms sleep which is an eternity for this test. Fixes #14809. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #15059	2023-08-24 13:37:06 +03:00
Benny Halevy	672ec66769	test: rest_api: add test_gossiper Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-24 11:37:12 +03:00
Botond Dénes	1609c76d62	tools/scylla-sstable: scrub: don't quarantine sstables after validate Scylla sstable promises to never mutate its input sstables. This promise was broken by `scylla sstable scrub --scrub-mode=validate`, because validate moves invalid input sstables into quarantine. This is unexpected and caused occasional failures in the scrub tests in test_tools.py. Fix by propagating a flag down to `scrub_sstables_validate_mode()` in `compaction.cc`, specifying whether validate should quarantine invalid sstables, then set this flag to false in `scylla-sstable.cc`. The existing test for validate-mode scrub is amended to check that the sstable is not mutated. The test now fails before the fix and passes afterwards. Fixes: #14309 Closes #15139	2023-08-23 21:53:12 +03:00
Nadav Har'El	5530c529c2	test/cql-pytest: regression test for old bug with CAST(f AS TEXT) precision When casting a float or double column to a string with `CAST(f AS TEXT)`, Scylla is expected to print the number with enough digits so that reading that string back to a float or double restores the original number exactly. This expectation isn't documented anywhere, but makes sense, and is what Cassandra does. Before commit `71bbd7475c`, this wasn't the case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision, which was a bit under enough for a float (which can have 7 decimal digits of precision), but very much not enough for a double (which can need 15 digits). The origin of this magic "6 digits" number was that Scylla uses seastar::to_sstring() to print the float and double values, and before the aforementioned commit those functions used sprintf with the "%g" format - which always prints 6 decimal digits of precision! After that commit, to_sstring() now uses a different approach (based on fmt) to print the float and double values, that prints all significant digits. This patch adds a regression test for this bug: We write float and double values to the database, cast them to text, and then recover the float or double number from that text - and check that we get back exactly the same float or double object. The test fails before the aforementioned commit, and passes after it. It also passes on Cassandra. Refs #15127 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #15131	2023-08-23 16:06:52 +03:00
Botond Dénes	139ba553b8	Merge 'sstable, test: log sstable name and pk when capping local_deletion_time ' from Kefu Chai in this series, we also print the sstable name and pk when writing a tombstone whose local_deletion_time (ldt for short) is greater than INT32_MAX which cannot be represented by an uint32_t. Fixes #15015 Closes #15107 * github.com:scylladb/scylladb: sstable/writer: log sstable name and pk when capping ldt test: sstable_compaction_test: add a test for capped tombstone ldt	2023-08-23 09:29:54 +03:00
Kamil Braun	169d19e5b0	Merge 'raft topology: support --ignore-dead-nodes in removenode and replace' from Patryk Jędrzejczak We add support for `--ignore-dead-nodes` in `raft_removenode` and `--ignore-dead-nodes-for-replace` in `raft_replace`. For now, we allow passing only host ids of the ignored nodes. Supporting IPs is currently impossible because `raft_address_map` doesn't provide a mapping from IP to a host id. The main steps of the implementation are as follows: - add the `ignore_nodes` column to `system.topology`, - set the `ignore_nodes` value of the topology mutation in `raft_removenode` and `raft_replace`, - extend `service::request_param` with alternative types that allow storing a set of ids of the ignored nodes, - load `ignore_nodes` from `system.topology` into `request_param` in `system_keyspace::load_topology_state`, - add `ignore_nodes` to `exclude_nodes` in `topology_coordinator::exec_global_command`, - pass `ignore_nodes` to `replace_with_repair` and `remove_with_repair` in `storage_service::raft_topology_cmd_handler`. Additionally, we add `test_raft_ignore_nodes.py` with two tests that verify the added changes. Fixes #15025 Closes #15113 * github.com:scylladb/scylladb: test: add test_raft_ignore_nodes test: ManagerClient.remove_node: allow List[HostId] for ignore_dead raft topology: pass ignore_nodes to {replace, remove}_with_repair raft topology: exec_global_command: add ignore_nodes to exclude_nodes raft topology: exec_global_command: change type of exclude_nodes topology_state_machine: extend request_param with a set of raft ids raft topology: set ignore_nodes in raft_removenode and raft_replace utils: introduce split_comma_separated_list raft topology: add the ignore_nodes column to system.topology	2023-08-22 18:04:59 +02:00
Kamil Braun	cdc3cd2b79	Merge 'raft: add fencing tests' from Petr Gusev In this PR a simple test for fencing is added. It exercises the data plane, meaning if it somehow happens that the node has a stale topology version, then requests from this node will get an error 'stale topology'. The test just decrements the node version manually through CQL, so it's quite artificial. To test a more real-world scenario we need to allow the topology change fiber to sometimes skip unavailable nodes. Now the algorithm fails and retries indefinitely in this case. The PR also adds some logs, and removes one seemingly redundant topology version increment, see the commit messages for details. Closes #14901 * github.com:scylladb/scylladb: test_fencing: add test_fence_hints test.py: output the skipped tests test.py: add skip_mode decorator and fixture test.py: add mode fixture hints: add debug log for dropped hints hints: send_one_hint: extend the scope of file_send_gate holder pylib: add ScyllaMetrics hints manager: add send_errors counter token_metadata: add debug logs fencing: add simple data plane test random_tables.py: add counter column type raft topology: don't increment version when transitioning to node_state::normal	2023-08-22 16:28:21 +02:00
Piotr Grabowski	17e3e367ca	test: use more frequent reconnection policy The default reconnection policy in Python Driver is an exponential backoff (with jitter) policy, which starts at 1 second reconnection interval and ramps up to 600 seconds. This is a problem in tests (refs #15104), especially in tests that restart or replace nodes. In such a scenario, a node can be unavailable for an extended period of time and the driver will try to reconnect to it multiple times, eventually reaching very long reconnection interval values, exceeding the timeout of a test. Fix the issue by using an exponential reconnection policy with a maximum interval of 4 seconds. A smaller value was not chosen, as each retry clutters the logs with reconnection exception stack trace. Fixes #15104 Closes #15112	2023-08-22 15:40:39 +02:00
Patryk Jędrzejczak	b044ee535f	test: add test_raft_ignore_nodes We add two tests verifying that --ignore-dead-nodes in raft_removenode and --ignore-dead-nodes-for-replace in raft_replace are handled correctly. We need a 7-cluster to have a Raft majority. Therefore, these tests are quite slow, and we want to run them only in the dev mode.	2023-08-22 14:19:21 +02:00
Patryk Jędrzejczak	6818d13f7d	test: ManagerClient.remove_node: allow List[HostId] for ignore_dead ManagerClient.remove_node allows passing ignore_dead only as List[IPAddress]. However, raft_removenode currently supports only host ids. To write a test that passes ignore_dead to ManagerClient.remove_node in the Raft topology mode, we allow passing ignore_dead as List[HostId]. Note that we don't want to use List[IPAddress \| HostId] because mixing IP addresses and host ids fails anyway. See ss::remove_node.set(...) in api::set_storage_service.	2023-08-22 14:19:09 +02:00
Petr Gusev	1ddc76ffd1	test_fencing: add test_fence_hints The test makes a write through the first node with the third node down, this causes a hint to be stored on the first node for the second. We increment the version and fence_version on the third node, restart it, and expect to see a hint delivery failure because of versions mismatch. Then we update the versions of the first node and expect hint to be successfully delivered.	2023-08-22 15:48:40 +04:00
Petr Gusev	c434d26b36	test.py: add skip_mode decorator and fixture Syntactic sugar for marking tests to be skipped in a particular mode. There is skip_in_debug/skip_in_release in suite.yaml, but they can be applied only on the entire file, which is unnatural and inconvenient. Also, they don't allow to specify a reason why the test is skipped. Separate dictionary skipped_funcs is needed since we can't use pytest fixtures in decorators.	2023-08-22 15:48:40 +04:00
Petr Gusev	a639d161e6	test.py: add mode fixture Sometimes a test wants to know what mode it is running in so that e.g. it can skip itself in some of them.	2023-08-22 15:48:40 +04:00
Petr Gusev	0b7a90dff6	pylib: add ScyllaMetrics This patch adds facilities to work with Scylla metrics from test.py tests. The new metrics property was added to ManagerClient, its query method sends a request to Scylla metrics endpoint and returns an object to conveniently access the result. ScyllaMetrics is copy-pasted from test_shedding.py. It's difficult to reuse code between 'new' and 'old' styles of tests, we can't just import pylib in 'old' tests because of some problems with python search directories. A past commit of mine that attempted to solve this problem was rejected on review.	2023-08-22 14:31:04 +04:00
Petr Gusev	360453fd87	fencing: add simple data plane test The test starts a three node cluster and manually decrements the version on the last node. It then tries to write some data through the last node and expects to get 'stale topology' exception.	2023-08-22 14:31:01 +04:00
Nadav Har'El	a963b59495	test/cql-pytest: add reproducer for IN not working with secondary index We already have a test for issue #13533, where an "IN" doesn't work with a secondary index (the secondary index isn't used in that case, and instead inefficient filtering is required). Recently a user noticed the same problem also exists for local secondary indexes - and this patch includes a reproducing test. The new test is marked xfail, as the issue is still unfixed. The new test is Scylla-only because local secondary index is a Scylla-only extension that doesn't exist in Cassandra. Refs #13533. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #15106	2023-08-22 07:25:32 +03:00
Nadav Har'El	18e8e62798	cql-pytest: translate Cassandra's tests for SELECT with LIMIT This is a translation of Cassandra's CQL unit test source file validation/operations/SelectLimitTest.java into our cql-pytest framework. The tests reproduce two already-known bugs: Refs #9879: Using PER PARTITION LIMIT with aggregate functions should fail as Invalid query Refs #10357: Spurious static row returned from query with filtering, despite not matching filter And also helped discover two new issues: Refs #15099: Incorrect sort order when combining IN, and ORDER BY Refs #15109: PER PARTITION LIMIT should be rejected if SELECT DISTINCT is used Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #15114	2023-08-21 22:29:11 +03:00
Avi Kivity	ce43effc21	Merge "fix rebuild with consistent topology management" From Gleb Natapov " The series fixes a bogus assertion during topology state load and adds a test that runs rebuild to make sure the code will not regress again. Fixes #14958 " * 'gleb/rebuilding_fix_v1' of github.com:scylladb/scylla-dev: test: add rebuild test system_keyspace: fix assertion for missing transition_state	2023-08-21 16:00:42 +03:00
Kefu Chai	8cc215db96	test: randomized_nemesis_test: do not brace around scalars Clang and GCC's warning option of `-Wbraced-scalar-init` warns at seeing superfluous use of braces, like: ``` /home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:2187:32: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init] .snapshot_threshold{1}, ^~~ ``` usually, this does not hurt. but by taking the braces out, we have a more readable piece of code, and less warnings. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15086	2023-08-21 15:57:06 +03:00
Kefu Chai	0bc99c7f49	test: sstable_compaction_test: add a test for capped tombstone ldt local_deletion_time (ldt for short) is a timestamp used for the purpose of purging the tombstone after gc_grace_seconds. if its value is greater than INT32_MAX, it is capped when being written to sstable. this is very likely a signal of bad configuration or even a bug in scylla. so we keep track of it with a metric named "scylla_sstables_capped_tombstone_deletion_time". in this change, a test is added to verify that the metric is updated upon seeing a tombstone with this abnormal ldt. because we validate the consistency before and after compaction in tests, this change adds a parameter to disable this check, otherwise, because capping the ldt changes the mutation, the validation would fail the test. Refs #15015 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-08-21 19:25:32 +08:00
Petr Gusev	9176a3341a	test_topology_smp: more logs for debug/aarch64 The test is flaky on CI in debug builds on aarch64 (#14752), here we sprinkle more logs for debug/aarch64 hoping it'll help to debug it. Ref #14752 Closes #14822	2023-08-21 10:03:09 +03:00
Kefu Chai	1aa01d63d4	test: randomized_nemesis_test: mark direct_fd_{pinger,clock} final `raft_server` in test/raft/randomized_nemesis_test.cc manages instances of direct_fd_pinger and direct_fd_clock with unique_ptr<>. this unique_ptr<> deletes these managed instances using delete. but since these two classes have virtual methods, the compiler feels nervous when deleting them. because these two classes have virtual functions, but they do not have virtual destructor. in other words, in theory, these pointers could be pointing derived classes of them, and deleting them could lead to leak. so to silence the warning and to prevent potential issues, let's just mark these two classes final. this should address the warning like: ``` In file included from /home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:9: In file included from /home/kefu/dev/scylladb/seastar/include/seastar/core/reactor.hh:24: In file included from /home/kefu/dev/scylladb/seastar/include/seastar/core/aligned_buffer.hh:24: In file included from /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/memory:78: /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:99:2: error: delete called on non-final 'direct_fd_pinger<int>' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor] delete __ptr; ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:404:4: note: in instantiation of member function 'std::default_delete<direct_fd_pinger<int>>::operator()' requested here get_deleter()(std::move(__ptr)); ^ /home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:1400:5: note: in instantiation of member function 'std::unique_ptr<direct_fd_pinger<int>>::~unique_ptr' requested here ~raft_server() { ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:99:2: note: in instantiation of member function 'raft_server<ExReg>::~raft_server' requested here delete __ptr; ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:404:4: note: in instantiation of member function 'std::default_delete<raft_server<ExReg>>::operator()' requested here get_deleter()(std::move(__ptr)); ^ /home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:1704:24: note: in instantiation of member function 'std::unique_ptr<raft_server<ExReg>>::~unique_ptr' requested here ._server = nullptr, ^ /home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:1742:19: note: in instantiation of member function 'environment<ExReg>::new_node' requested here auto id = new_node(first, std::move(cfg)); ^ /home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:2113:39: note: in instantiation of member function 'environment<ExReg>::new_server' requested here auto leader_id = co_await env.new_server(true); ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15084	2023-08-20 21:26:08 +03:00
Pavel Emelyanov	6bc30f1944	system_keyspace: De-bloat .setup() from messing with system.local On boot several manipulations with system.local are performed. 1. The host_id value is selected from it with key = local If not found, system_keyspace generates a new host_id, inserts the new value into the table and returns back 2. The cluster_name is selected from it with key = local Then it's system_keyspace that either checks that the name matches the one from db::config, or inserts the db::config value into the table 3. The row with key = local is updated with various info like versions, listen, rpc and bcast addresses, dc, rack, etc. Unconditionally All three steps are scattered over main, p.1 is called directly, p.2 and p.3 are executed via system_keyspace::setup() that happens rather late. Also there's some touch of this table from the cql_test_env startup code. The proposal is to collect this setup into one place and execute it early -- as soon as the system.local table is populated. This frees the system_keyspace code from the logic of selecting host id and cluster name leaving it to main and keeps it with only select/insert work. refs: #2795 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #15082	2023-08-20 21:24:31 +03:00
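
Commit 5530c529c2 above describes a precision bug where `CAST(f AS TEXT)` printed only 6 significant digits because the underlying formatting used `%g`. The following is a minimal Python sketch of that effect, not Scylla's or seastar's actual code: the sample value `x` and the helper `roundtrips` are hypothetical, and Python's `repr` stands in for a "print all significant digits" formatter.

```python
def roundtrips(value: float, text: str) -> bool:
    """Return True if parsing `text` recovers exactly `value`."""
    return float(text) == value


# Hypothetical sample double with more than 6 significant digits.
x = 0.1234567890123456

# "%g" without an explicit precision keeps only 6 significant digits.
six_digits = "%g" % x

# Python's repr() emits the shortest string that parses back to the same double.
shortest = repr(x)

print(six_digits, roundtrips(x, six_digits))
print(shortest, roundtrips(x, shortest))
```

The regression test described in that commit follows the same idea: write a value, cast it to text, parse the text back, and compare with the original.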


5498 Commits