Commit Graph

5498 Commits

Author SHA1 Message Date
Benny Halevy
062684eb1f test: compaction_manager_stop_and_drain_race_test: stop compaction and task managers
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
b9127f55ac test: simple_backlog_controller_test: stop compaction and task managers
The compaction_manager and task_manager should
be orderly stopped before they are destroyed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Avi Kivity
9a3d57256a Merge 'config: add index_cache_fraction' from Michał Chojnowski
Index caching was disabled by default because it caused performance regressions
for some small-partition workloads. See https://github.com/scylladb/scylladb/issues/11202.

However, it also means that there are workloads which could benefit from the
index cache, but (by default) don't.

As a compromise, we can set a default limit on the memory usage of index cache,
which should be small enough to avoid catastrophic regressions in
small-partition workloads, but big enough to accommodate workloads where
index cache is obviously beneficial.

This series adds such a configurable limit, sets it to to 0.2 of total cache memory by default,
and re-enables index caching by default.

Fixes #15118

Closes #14994

* github.com:scylladb/scylladb:
  test: boost/cache_algorithm_test: add cache_algorithm_test
  sstables: partition_index_cache: deglobalize stats
  utils: cached_file: deglobalize cached_file metrics
  db: config: enable index caching by default
  config: add index_cache_fraction
  utils: lru: add move semantics to list links
2023-09-03 19:39:31 +03:00
Michał Chojnowski
bcc235ad5f test: boost/cache_algorithm_test: add cache_algorithm_test
The tests added in this patch validate that index_cache_fraction
does what it's supposed to do.
2023-09-01 22:34:41 +02:00
Michał Chojnowski
f00bed9429 sstables: partition_index_cache: deglobalize stats
Move partition_index_cache stats from a thread_local variable
to cache_tracker. After the change, partition_index_cache
receives a reference to the stats via constructor, instead of
referencing a global.

This is needed so that cache_tracker can know the memory usage
of index caches (for cache eviction purposes) without relying on
globals.

But it also makes sense even without that motive.
2023-09-01 22:34:41 +02:00
Michał Chojnowski
c7d9d35030 utils: cached_file: deglobalize cached_file metrics
Move cached_file metrics from a thread_local variable
to cache_tracker.

This is needed so that cache_tracker can know
the memory usage of index caches (for purposes
of cache eviction) without relying on globals.

But it also makes sense even without that motive.
2023-09-01 22:34:41 +02:00
Kamil Braun
117dedab19 Merge 'Cluster features on raft: topology coordinator + check on boot followups' from Piotr Dulikowski
This PR collects followups described in #14972:

- The `system.topology` table is now flushed every time feature-related
  columns are modified. This is done because of the feature check that
  happens before the schema commitlog is replayed.
- The implementation now guarantees that, if all nodes support some
  feature as described by the `supported_features` column, then support
  for that feature will not be revoked by any node.  Previously, in an
  edge case where a node is the last one to add support for some feature
  `X` in `supported_features` column, crashes before applying/persisting
  it and then restarts without supporting `X`, it would be allowed to boot
  anyway and would revoke support for the `X` in `system.topology`.
  The existing behavior, although counterintuitive, was safe - the
  topology coordinator is responsible for explicitly marking features as
  enabled, and in order to enable a feature it needs to perform a special
  kind of a global barrier (`barrier_after_feature_update`) which only
  succeeds after the node has updated its features column - so there is no
  risk of enabling an unsupported feature.  In order to make the behavior
  less confusing, the node now will perform a second check when it tries
  to update its `supported_features` column in `system.topology`.
- The `barrier_after_feature_update` is removed and the regular global
  `barrier` topology command is used instead. The `barrier` handler now
  performs a feature check if the node did not have a chance to verify and
  update its cluster features for the second time.

JOIN_NODE rpc will be sent separately as it is a big item on its own.

Fixes: #14972

Closes #15168

* github.com:scylladb/scylladb:
  test: topology{_experimental_raft}: don't stop gracefully in feature tests
  storage_service: remove _topology_updated_with_local_metadata
  topology_coordinator: remove barrier_after_feature_update
  topology_coordinator: perform feature check during barrier
  storage_service: repeat the feature check after read barrier
  feature_service: introduce unsupported_feature_exception
  feature_service: move startup feature check to a separate function
  topology_coordinator: account for features to enable in should_preempt_balancing
  group0_state_machine: flush system.topology when updating features columns
2023-09-01 11:52:26 +02:00
Botond Dénes
34d94fb549 test/cql-pytest/test_tools.py: improve tempdir usage for scrub tests
Scrub tests use a lot of temporary directories. This is suspected to
cause problems in some cases. To improve the situation, this patch:
* Creates a single root temporary directory for all scrub tests
* All further fixtures create their files/directories inside this root
  dir.
* All scrub tests create their temporary directories within this root
  dir.
* All temporary directories now use an appropriate "prefix", so we can
  tell which temporary directory is part of the problem if a test fails.

Refs: #14309

Closes #15117
2023-09-01 07:17:49 +03:00
Alexey Novikov
87fa7d0381 compact and remove expired range tombstones from cache on read
during read from cache compact and expire range tombstones
remove expired empty rows from cache

Refs #2252
Fixes #6033

Closes #14463
2023-09-01 07:17:49 +03:00
Piotr Dulikowski
5471330ee7 test: topology{_experimental_raft}: don't stop gracefully in feature tests
The current cluster feature tests are stopping nodes in a graceful way.
Doing it gracefully isn't strictly necessary for the test scenarios and
we can switch `server_stop_gracefully` calls to `server_stop`. This only
became possible after a previous commit which causes `system.topology`
table to be flushed when cluster feature columns are modified, and will
server as a good test for it.
2023-08-31 16:46:11 +02:00
Kamil Braun
0ee23b260e Merge 'raft topology: add and deprecate support for --ignore-dead-nodes with IPs' from Patryk Jędrzejczak
We want to stop supporting IPs for `--ignore-dead-nodes` in
`raft_removenode` and `--ignore-dead-nodes-for-replace` for
`raft_replace`. However, we shouldn't remove these features without the
deprecation period because the original `removenode` and `replace`
operations still support them. So, we add them for now.

The `IP -> Raft ID` translation is done through the new
`raft_address_map::find_by_addr` member function.

We update the documentation to inform about the deprecation of the IP
support for `--ignore-dead-nodes`.

Fixes #15126

Closes #15156

* github.com:scylladb/scylladb:
  docs: inform about deprecating IP support for --ignore-dead-nodes
  raft topology: support IPs for --ignore-dead-nodes
  raft_address_map: introduce find_by_addr
2023-08-30 10:41:23 +02:00
Botond Dénes
3e7ec6cc83 Merge 'Move cell assertion from cql_test_env to cql_assertions' from Pavel Emelyanov
The cql_test_env has a virtual require_column_has_value() helper that better fits cql_assertions crowd. Also, the helper in question duplicates some existing code, so it can also be made shorter (and one class table helper gets removed afterwards)

Closes #15208

* github.com:scylladb/scylladb:
  cql_assertions: Make permit from env
  table: Remove find_partition_slow() helper
  sstable_compaction_test: Do not re-decorate key
  cql_test_env: Move .require_column_has_value
  cql_test_env: Use table.find_row() shortcut
2023-08-30 08:34:05 +03:00
Kamil Braun
0bff96a611 Merge 'gossip: add group0_id attribute to gossip_digest_syn' from Mikołaj Grzebieluch
Motivation:

The user can bootstrap 3 different clusters and then connect them
(#14448). When these clusters start gossiping, their token rings will be
merged, but there will be 3 different group 0s in there. It results in a
corrupted cluster.

We need to prevent such situations from happening in clusters which
don't use Raft-based topology.

-------

Gossiper service sets its group0 id on startup if it is stored in
`scylla_local` or sets it during joining group0.

Send group0_id (if it is set) when the node tries to initiate the gossip
round. When a node gets gossip_digest_syn it checks if its group0 id
equals the local one and if not, the message is discarded.

Fixes #14448

Performed manual tests with the following scenario:
1. setup a cluster of two nodes (one compiled with and one without this patch)
2. setup a new node
3. create a basic keyspace and table
4. execute simple select and insert queries

Tested 4 scenarios: the seed node was with or without this patch, and the third node was with or without this patch.
These tests didn't detect any errors.

Closes #15004

* github.com:scylladb/scylladb:
  tests: raft: cluster of nodes with different group0 ids
  gossip: add group0_id attribute to gossip_digest_syn
2023-08-29 16:41:29 +02:00
Pavel Emelyanov
137c7116dc cql_assertions: Make permit from env
To call table::find_row() one needs to provide a permit. Tests have
short and neat helper to create one from cql_test_env

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 16:01:29 +03:00
Pavel Emelyanov
0a727a9b2e sstable_compaction_test: Do not re-decorate key
The is_partition_dead() local helper accepts partition key argument and
decorates it. Howerver, its caller gets partition key from decorated key
itself, and can just pass it along

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 15:38:41 +03:00
Pavel Emelyanov
4e9f380608 cql_test_env: Move .require_column_has_value
This env helper is only used by tests (from cql_query_test)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 15:38:33 +03:00
Pavel Emelyanov
7597663ef5 cql_test_env: Use table.find_row() shortcut
The require_column_has_value() finds the cell in three steps -- finds
partition, then row, then cell. The class table already has a method to
facilitate row finding by partition and clustering key

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 15:37:27 +03:00
Kamil Braun
ebc9056237 Merge 'Restore storage_service -> cdc_generation_service dependency' from Pavel Emelyanov
The main goal of this PR is to stop cdc_generation_service from calling
system_keyspace::bootstrap_complete(). The reason why it's there is that
gen. service doesn't want to handle generation before node joined the
ring or after it was decommissioned. The cleanup is done with the help
of storage_service->cdc_generation_service explicit dependency brought
back and this, in turn, suddenly freed the raft and API code from the
need to carry cdc gen. service reference around.

Closes #15047

* github.com:scylladb/scylladb:
  cdc: Remove bootstrap state assertion from after_join()
  cdc: Rework gen. service check for bootstrap state
  api: Don't carry cdc gen. service over
  storage_service: Use local cdc gen. service in join_cluster()
  storage_service: Remove cdc gen. service from raft_state_monitor_fiber()
  raft: Do not carry cdc gen. service over
  storage_service: Use local cdc gen. service in topo calls
  storage_service: Bring cdc_generation_service dependency back
2023-08-29 14:10:06 +02:00
Mikołaj Grzebieluch
bac8aa38d9 tests: raft: cluster of nodes with different group0 ids
The reproducer for #14448.

The test starts two nodes with different group0_ids. The second node
is restarted and tries to join the cluster consisting of the first node.
gossip_digest_syn message should be rejected by the first node, so
the second node will not be able to join the cluster.

This test uses repair-based node operations to make this test easier.
If the second node successfully joins the cluster, their tokens metadata
will be merged and the repair service will allow to decommission the second node.
If not - decommissioning the second node will fail with an exception
"zero replica after the removal" thrown by the repair service.
2023-08-29 11:09:15 +02:00
Mikołaj Grzebieluch
2230abc9b2 gossip: add group0_id attribute to gossip_digest_syn
Gossiper service sets its group0 id on startup if it is stored in `scylla_local`
or sets it during joining group0.

Send group0_id (if it is set) when the node tries to initiate the gossip round.
When a node gets gossip_digest_syn it checks if its group0 id equals the local
one and if not, the message is discarded.

Fixes #14448.
2023-08-29 11:09:15 +02:00
Botond Dénes
57deeb5d39 Merge 'gossiper: add get_unreachable_members_synchronized and use over api' from Benny Halevy
Modeled after get_live_members_synchronized,
get_unreachable_members_synchronized calls
replicate_live_endpoints_on_change to synchronize
the state of unreachable_members on all shards.

Fixes #12261
Fixes #15088

Also, add rest_api unit test for those apis

Closes #15093

* github.com:scylladb/scylladb:
  test: rest_api: add test_gossiper
  gossiper: add get_unreachable_members_synchronized
2023-08-29 10:43:22 +03:00
Pavel Emelyanov
a61454be00 storage_service: Use local cdc gen. service in join_cluster()
The method in question accepts cdc_generation_service ref argument from
main and cql_test_env, but storage service now has local cdcv gen.
service reference, so this argument and its propagation down the stack
can be removed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 09:36:58 +03:00
Pavel Emelyanov
933ea0afe6 storage_service: Bring cdc_generation_service dependency back
It sort of reverts the 5a97ba7121 commit, because storage service now
uses the cdc generation service to serve raft topo updates which, in
turn, takes the cdc gen. service all over the raft code _just_ to make
it as an argument to storage service topo calls.

Also there's API carrying cdc gen. service for the single call and also
there's an implicit need to kick cdc gen. service on decommission which
also needs storage service to reference cdc gen. after boot is complete

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 09:36:58 +03:00
Mikołaj Grzebieluch
a031a14249 tests: add asynchronous log browsing functionality
Add a class that handles log file browsing with the following features:
* mark: returns "a mark" to the current position of the log.
* wait_for: asynchronously checks if the log contains the given message.
* grep: returns a list of lines matching the regular expression in the log.

Add a new endpoint in `ManagerClient` to obtain the scylla logfile path.

Fixes #14782

Closes #14834
2023-08-25 14:19:09 +02:00
Patryk Jędrzejczak
b2755755f4 raft topology: support IPs for --ignore-dead-nodes
We want to stop supporting IPs for --ignore-dead-nodes in
raft_removenode and --ignore-dead-nodes-for-replace for
raft_replace. However, we shouldn't remove these features without
the deprecation period because the original removenode and
replace operations still support them. So, we add them for now.

Additionally, we modify test_raft_ignore_nodes.py so that it
verifies the added IP support.
2023-08-25 12:33:45 +02:00
Patryk Jędrzejczak
9806bddf75 test: fix a test case in raft_address_map_test
The test didn't test what it was supposed to test. It would pass
even if set_nonexpiring() didn't insert a new entry.

Closes #15157
2023-08-25 12:11:33 +02:00
Patryk Jędrzejczak
59df5ce7e4 raft_address_map: introduce find_by_addr
In the following commit, we add IP support for --ignore-dead-nodes
in raft_removenode and raft_replace. To implement it, we need
a way to translate IPs to Raft IDs. The solution is to add a new
member function -- find_by_addr -- to raft_address_map that
does the IP->ID translation.

The IP support for --ignore-dead-nodes will be deprecated and
find_by_addr shouldn't be called for other reasons, so it always
logs a warning.

We also add some unit tests for find_by_addr.
2023-08-24 15:10:43 +02:00
Raphael S. Carvalho
d6cc752718 test: Fix flakiness in sstable_compaction_test.autocompaction_control_test
It's possible that compaction task is preempted after completion and
before reevaluation, causing pending_tasks to be > 1.

Let's only exit the loop if there are no pending tasks, and also
reduce 100ms sleep which is an eternity for this test.

Fixes #14809.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #15059
2023-08-24 13:37:06 +03:00
Benny Halevy
672ec66769 test: rest_api: add test_gossiper
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-24 11:37:12 +03:00
Botond Dénes
1609c76d62 tools/scylla-sstable: scrub: don't qurantine sstables after validate
Scylla sstable promises to *never* mutate its input sstables. This
promise was broken by `scylla sstable scrub --scrub-mode=validate`,
because validate moves invalid input sstables into qurantine. This is
unexpected and caused occasional failures in the scrub tests in
test_tools.py. Fix by propagating a flag down to
`scrub_sstables_validate_mode()` in `compaction.cc`, specifying whether
validate should qurantine invalid sstables, then set this flag to false
in `scylla-sstable.cc`. The existing test for validate-mode scrub is
ammended to check that the sstable is not mutated. The test now fails
before the fix and passes afterwards.

Fixes: #14309

Closes #15139
2023-08-23 21:53:12 +03:00
Nadav Har'El
5530c529c2 test/cql-pytest: regression test for old bug with CAST(f AS TEXT) precision
When casting a float or double column to a string with `CAST(f AS TEXT)`,
Scylla is expected to print the number with enough digits so that reading
that string back to a float or double restores the original number
exactly. This expectation isn't documented anywhere, but makes sense,
and is what Cassandra does.

Before commit 71bbd7475c, this wasn't the
case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision,
which was a bit under enough for a float (which can have 7 decimal digits
of precision), but very much not enough for a double (which can need 15
digits). The origin of this magic "6 digits" number was that Scylla uses
seastar::to_sstring() to print the float and double values, and before
the aforementioned commit those functions used sprintf with the "%g"
format - which always prints 6 decimal digits of precision! After that
commit, to_sstring() now uses a different approach (based on fmt) to
print the float and double values, that prints all significant digits.

This patch adds a regression test for this bug: We write float and double
values to the database, cast them to text, and then recover the float
or double number from that text - and check that we get back exactly the
same float or double object. The test *fails* before the aforementioned
commit, and passes after it. It also passes on Cassandra.

Refs #15127

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15131
2023-08-23 16:06:52 +03:00
Botond Dénes
139ba553b8 Merge 'sstable, test: log sstable name and pk when capping local_deletion_time ' from Kefu Chai
in this series, we also print the sstable name and pk when writing a tombstone whose local_deletion_time (ldt for short) is greater than INT32_MAX which cannot be represented by an uint32_t.

Fixes #15015

Closes #15107

* github.com:scylladb/scylladb:
  sstable/writer: log sstable name and pk when capping ldt
  test: sstable_compaction_test: add a test for capped tombstone ldt
2023-08-23 09:29:54 +03:00
Kamil Braun
169d19e5b0 Merge 'raft topology: support --ignore-dead-nodes in removenode and replace' from Patryk Jędrzejczak
We add support for `--ignore-dead-nodes` in `raft_removenode` and
`--ignore-dead-nodes-for-replace` in `raft_replace`. For now, we allow
passing only host ids of the ignored nodes. Supporting IPs is currently
impossible because `raft_address_map` doesn't provide a mapping from IP
to a host id.

The main steps of the implementation are as follows:
- add the `ignore_nodes` column to `system.topology`,
- set the `ignore_nodes` value of the topology mutation in `raft_removenode` and `raft_replace`,
- extend `service::request_param` with alternative types that allow storing a set of ids of the ignored nodes,
- load `ignore_nodes` from `system.topology` into `request_param` in `system_keyspace::load_topology_state`,
- add `ignore_nodes` to `exclude_nodes` in `topology_coordinator::exec_global_command`,
- pass `ignore_nodes` to `replace_with_repair` and `remove_with_repair` in `storage_service::raft_topology_cmd_handler`.

Additionally, we add `test_raft_ignore_nodes.py` with two tests that verify the added changes.

Fixes #15025

Closes #15113

* github.com:scylladb/scylladb:
  test: add test_raft_ignore_nodes
  test: ManagerClient.remove_node: allow List[HostId] for ignore_dead
  raft topology: pass ignore_nodes to {replace, remove}_with_repair
  raft topology: exec_global_command: add ignore_nodes to exclude_nodes
  raft topology: exec_global_command: change type of exclude_nodes
  topology_state_machine: extend request_param with a set of raft ids
  raft topology: set ignore_nodes in raft_removenode and raft_replace
  utils: introduce split_comma_separated_list
  raft topology: add the ignore_nodes column to system.topology
2023-08-22 18:04:59 +02:00
Kamil Braun
cdc3cd2b79 Merge 'raft: add fencing tests' from Petr Gusev
In this PR a simple test for fencing is added. It exercises the data
plane, meaning if it somehow happens that the node has a stale topology
version, then requests from this node will get an error 'stale
topology'. The test just decrements the node version manually through
CQL, so it's quite artificial. To test a more real-world scenario we
need to allow the topology change fiber to sometimes skip unavailable
nodes. Now the algorithm fails and retries indefinitely in this case.

The PR also adds some logs, and removes one seemingly redundant topology
version increment, see the commit messages for details.

Closes #14901

* github.com:scylladb/scylladb:
  test_fencing: add test_fence_hints
  test.py: output the skipped tests
  test.py: add skip_mode decorator and fixture
  test.py: add mode fixture
  hints: add debug log for dropped hints
  hints: send_one_hint: extend the scope of file_send_gate holder
  pylib: add ScyllaMetrics
  hints manager: add send_errors counter
  token_metadata: add debug logs
  fencing: add simple data plane test
  random_tables.py: add counter column type
  raft topology: don't increment version when transitioning to node_state::normal
2023-08-22 16:28:21 +02:00
Piotr Grabowski
17e3e367ca test: use more frequent reconnection policy
The default reconnection policy in Python Driver is an exponential
backoff (with jitter) policy, which starts at 1 second reconnection
interval and ramps up to 600 seconds.

This is a problem in tests (refs #15104), especially in tests that restart
or replace nodes. In such a scenario, a node can be unavailable for an
extended period of time and the driver will try to reconnect to it
multiple times, eventually reaching very long reconnection interval
values, exceeding the timeout of a test.

Fix the issue by using a exponential reconnection policy with a maximum
interval of 4 seconds. A smaller value was not chosen, as each retry
clutters the logs with reconnection exception stack trace.

Fixes #15104

Closes #15112
2023-08-22 15:40:39 +02:00
Patryk Jędrzejczak
b044ee535f test: add test_raft_ignore_nodes
We add two tests verifying that --ignore-dead-nodes in
raft_removenode and --ignore-dead-nodes-for-replace in
raft_replace are handled correctly.

We need a 7-cluster to have a Raft majority. Therefore, these
tests are quite slow, and we want to run them only in the dev mode.
2023-08-22 14:19:21 +02:00
Patryk Jędrzejczak
6818d13f7d test: ManagerClient.remove_node: allow List[HostId] for ignore_dead
ManagerClient.remove_node allows passing ignore_dead only as
List[IPAddress]. However, raft_removenode currently supports
only host ids. To write a test that passes ignore_dead to
ManagerClient.remove_node in the Raft topology mode, we allow
passing ignore_dead as List[HostId].

Note that we don't want to use List[IPAddress | HostId] because
mixing IP addresses and host ids fails anyway. See
ss::remove_node.set(...) in api::set_storage_service.
2023-08-22 14:19:09 +02:00
Petr Gusev
1ddc76ffd1 test_fencing: add test_fence_hints
The test makes a write through the first node with
the third node down, this causes a hint to be stored on the
first node for the second. We increment the version
and fence_version on the third node, restart it,
and expect to see a hint delivery failure
because of versions mismatch. Then we update the versions
of the first node and expect hint to be successfully
delivered.
2023-08-22 15:48:40 +04:00
Petr Gusev
c434d26b36 test.py: add skip_mode decorator and fixture
Syntactic sugar for marking tests to be
skipped in a particular mode.

There is skip_in_debug/skip_in_release in suite.yaml,
but they can be applied only on the entire file,
which is unnatural and inconvenient. Also, they
don't allow to specify a reason why the test is skipped.

Separate dictionary skipped_funcs is needed since
we can't use pytest fixtures in decorators.
2023-08-22 15:48:40 +04:00
Petr Gusev
a639d161e6 test.py: add mode fixture
Sometimes a test wants to know what mode
it is running in so that e.g. it can skip
itself in some of them.
2023-08-22 15:48:40 +04:00
Petr Gusev
0b7a90dff6 pylib: add ScyllaMetrics
This patch adds facilities to work
with Scylla metrics from test.py tests.
The new metrics property was added to
ManagerClient, its query method
sends a request to Scylla metrics
endpoint and returns and object
to conveniently access the result.

ScyllaMetrics is copy-pasted from
test_shedding.py. It's difficult
to reuse code between 'new' and 'old'
styles of tests, we can't just import
pylib in 'old' tests because of some
problems with python search directories.
A past commit of mine that attempted
to solve this problem was rejected on review.
2023-08-22 14:31:04 +04:00
Petr Gusev
360453fd87 fencing: add simple data plane test
The test starts a three node cluster
and manually decrements the version on
the last node. It then tries to write
some data through the last node and
expects to get 'stale topology' exception.
2023-08-22 14:31:01 +04:00
Nadav Har'El
a963b59495 test/cql-pytest: add reproducer for IN not working with secondary index
We already have a test for issue #13533, where an "IN" doesn't work with
a secondary index (the secondary index isn't used in that case, and
instead inefficient filtering is required). Recently a user noticed the
same problem also exists for local secondary indexes - and this patch
includes a reproducing test. The new test is marked xfail, as the issue is
still unfixed. The new test is Scylla-only because local secondary index
is a Scylla-only extension that doesn't exist in Cassandra.

Refs #13533.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15106
2023-08-22 07:25:32 +03:00
Nadav Har'El
18e8e62798 cql-pytest: translate Cassandra's tests for SELECT with LIMIT
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectLimitTest.java into our cql-pytest framework.

The tests reproduce two already-known bugs:

Refs #9879:  Using PER PARTITION LIMIT with aggregate functions should
             fail as Invalid query
Refs #10357: Spurious static row returned from query with filtering,
             despite not matching filter

And also helped discover two new issues:

Refs #15099: Incorrect sort order when combining IN, and ORDER BY
Refs #15109: PER PARTITION LIMIT should be rejected if SELECT DISTINCT
             is used

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15114
2023-08-21 22:29:11 +03:00
Avi Kivity
ce43effc21 Merge "fix rebuild with consistent topology management" From Gleb Natapov
"
The series fixes bogus asserting during topology state load and add a
test that runs rebuild to make sure the code will not regress again.

Fixes #14958
"

* 'gleb/rebuilding_fix_v1' of github.com:scylladb/scylla-dev:
  test: add rebuild test
  system_keyspace: fix assertion for missing transition_state
2023-08-21 16:00:42 +03:00
Kefu Chai
8cc215db96 test: randomized_nemesis_test: do not brace around scalars
Clang and GCC's warning option of `-Wbraced-scalar-init` warns
at seeing superfluous use of braces, like:
```
/home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:2187:32: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init]
            .snapshot_threshold{1},
                               ^~~

```
usually, this does not hurt. but by taking the braces out, we have
a more readable piece of code, and less warnings.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15086
2023-08-21 15:57:06 +03:00
Kefu Chai
0bc99c7f49 test: sstable_compaction_test: add a test for capped tombstone ldt
local_delection_time (short for ldt) is a timestamp used for the
purpose of purging the tombstone after gc_grace_seconds. if its value
is greater than INT32_MAX, it is capped when being written to sstable.
this is very likely a signal of bad configuration or a even a bug in
scylla. so we keep track of it with a metric named
"scylla_sstables_capped_tombstone_deletion_time".

in this change, a test is added to verify that the metric is updated
upon seeing a tombstone with this abnormal ldt.

because we validate the consistency before and after compaction in
tests, this change adds a parameter to disable this check, otherwise,
because capping the ldt changes the mutation, the validation would
fail the test.

Refs #15015
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-08-21 19:25:32 +08:00
Petr Gusev
9176a3341a test_topology_smp: more logs for debug/aarch64
The test is flaky on CI in debug builds
on aarch64 (#14752), here we sprinkle more
logs for debug/aarch64 hoping it'll help to
debug it.

Ref #14752

Closes #14822
2023-08-21 10:03:09 +03:00
Kefu Chai
1aa01d63d4 test: randomized_nemesis_test: mark direct_fd_{pinger,clock} final
`raft_server` in test/raft/randomized_nemesis_test.cc manages
instances of direct_fd_pinger and direct_fd_clock with unique_ptr<>.
this unique_ptr<> deletes these managed instances using delete.
but since these two classes have virtual methods, the compiler feels
nervous when deleting them. because these two classes have virtual
functions, but they do not have virtual destructor. in other words,
in theory, these pointers could be pointing derived classes of them,
and deleting them could lead to leak.

so to silence the warning and to prevent potential issues, let's
just mark these two classes final.

this should address the warning like:

```
In file included from /home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:9:
In file included from /home/kefu/dev/scylladb/seastar/include/seastar/core/reactor.hh:24:
In file included from /home/kefu/dev/scylladb/seastar/include/seastar/core/aligned_buffer.hh:24:
In file included from /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/memory:78:
/usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:99:2: error: delete called on non-final 'direct_fd_pinger<int>' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor]
        delete __ptr;
        ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:404:4: note: in instantiation of member function 'std::default_delete<direct_fd_pinger<int>>::operator()' requested here
          get_deleter()(std::move(__ptr));
          ^
/home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:1400:5: note: in instantiation of member function 'std::unique_ptr<direct_fd_pinger<int>>::~unique_ptr' requested here
    ~raft_server() {
    ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:99:2: note: in instantiation of member function 'raft_server<ExReg>::~raft_server' requested here
        delete __ptr;
        ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:404:4: note: in instantiation of member function 'std::default_delete<raft_server<ExReg>>::operator()' requested here
          get_deleter()(std::move(__ptr));
          ^
/home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:1704:24: note: in instantiation of member function 'std::unique_ptr<raft_server<ExReg>>::~unique_ptr' requested here
            ._server = nullptr,
                       ^
/home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:1742:19: note: in instantiation of member function 'environment<ExReg>::new_node' requested here
        auto id = new_node(first, std::move(cfg));
                  ^
/home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:2113:39: note: in instantiation of member function 'environment<ExReg>::new_server' requested here
        auto leader_id = co_await env.new_server(true);
                                      ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15084
2023-08-20 21:26:08 +03:00
Pavel Emelyanov
6bc30f1944 system_keyspace: De-bloat .setup() from messing with system.local
On boot several manipulations with system.local are performed.

1. The host_id value is selected from it with key = local

   If not found, system_keyspace generates a new host_id, inserts the
   new value into the table and returns back

2. The cluster_name is selected from it with key = local

   Then it's system_keyspace that either checks that the name matches
   the one from db::config, or inserts the db::config value into the
   table

3. The row with key = local is updated with various info like versions,
   listen, rpc and bcast addresses, dc, rack, etc. Unconditionally

All three steps are scattered over main, p.1 is called directly, p.2 and
p.3 are executed via system_keyspace::setup() that happens rather late.
Also there's some touch of this table from the cql_test_env startup code.

The proposal is to collect this setup into one place and execute it
early -- as soon as the system.local table is populated. This frees the
system_keyspace code from the logic of selecting host id and cluster
name leaving it to main and keeps it with only select/insert work.

refs: #2795

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #15082
2023-08-20 21:24:31 +03:00