Compare commits

..

13 Commits

Author SHA1 Message Date
Benny Halevy
211eb7c32b test/cluster/dtest: wide_rows_test.py: use prepared statements
Replace string-formatted CQL queries in loops with prepared
statements and bind parameters. This avoids repeated query parsing
on the server side and eliminates CQL injection risk from string
interpolation.

Functions converted:
- test_column_index_stress: INSERT (100k iterations) and SELECT (10k)
- create_large_partition_data: UPDATE with TIMESTAMP
- create_large_row_data: UPDATE per column
- create_too_many_rows_data: UPDATE for columns and collections
- delete_too_many_rows_data: DELETE for columns and collections
- create_large_row_static_data: INSERT
- set_ttl_on_few_rows_in_partition: SELECT and UPDATE with TTL
- set_ttl_on_few_large_rows: SELECT and UPDATE with TTL
2026-04-28 11:40:16 +03:00
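The conversion this commit describes is the standard prepare-once, bind-per-iteration pattern from the Python cassandra-driver. A minimal self-contained sketch (a stub session stands in for the real driver, and the table/column names are illustrative, not taken from wide_rows_test.py):

```python
# Sketch of the prepared-statement conversion described above.  StubSession
# stands in for the cassandra-driver session; the real prepare()/execute()
# calls have the same shape.
class StubSession:
    def __init__(self):
        self.parsed = 0     # distinct CQL strings the "server" had to parse
        self.executed = 0

    def prepare(self, cql):
        self.parsed += 1    # parsing happens once, at prepare time
        return ("prepared", cql)

    def execute(self, stmt, params=()):
        self.executed += 1  # bound executions reuse the parsed statement


session = StubSession()
insert = session.prepare("INSERT INTO ks.wide (pk, ck, v) VALUES (0, ?, ?)")
for i in range(1000):
    session.execute(insert, (i, f"value-{i}"))

assert session.parsed == 1        # one parse instead of 1000
assert session.executed == 1000
```

Because bind parameters are never spliced into the query text, this also removes the CQL-injection risk the commit message mentions.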
Benny Halevy
3a640a9ff0 test/cluster/dtest: wide_rows_test.py: randomize compaction strategy
Replace parametrized compaction strategy (4 strategies × 31 tests =
124 test cases) with random selection per test. This reduces the
test count to 31 while still covering all strategies over time.

Add --compaction-strategy option to allow reproducing failures with
a specific strategy, e.g.:
  ./test.py --mode=dev test/cluster/dtest/wide_rows_test.py \
    --pytest-arg="--compaction-strategy=LeveledCompactionStrategy"
2026-04-28 11:40:16 +03:00
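The selection logic can be sketched as follows; the function name is hypothetical, but the strategy names and the pin-or-randomize behavior match the option described above:

```python
import random

# Per-test compaction strategy selection, mirroring --compaction-strategy.
STRATEGIES = [
    "SizeTieredCompactionStrategy",
    "LeveledCompactionStrategy",
    "TimeWindowCompactionStrategy",
    "IncrementalCompactionStrategy",
]


def pick_compaction_strategy(pinned=None):
    # Honor an explicit --compaction-strategy value (for reproducing
    # failures); otherwise randomize, so the 31 tests still cover all
    # four strategies over repeated runs.
    return pinned if pinned else random.choice(STRATEGIES)


assert pick_compaction_strategy("LeveledCompactionStrategy") == "LeveledCompactionStrategy"
assert pick_compaction_strategy() in STRATEGIES
```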
Benny Halevy
8288f87beb test/cluster/dtest: wide_rows_test.py: reduce TTL sleep time
Reduce TTL from 60 to 1 second and sleep time from ttl+5 to ttl+1
in set_ttl_on_few_rows_in_partition() and set_ttl_on_few_large_rows().
The original 60-second TTL was unnecessarily high, adding over a
minute of idle wait time per TTL test invocation.
2026-04-28 11:40:16 +03:00
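The arithmetic behind the saved wall-clock time (helper name hypothetical; the test itself sleeps for this duration after writing with `USING TTL`):

```python
# Wait just past TTL expiry rather than a fixed large margin.
def ttl_wait_seconds(ttl, slack=1):
    return ttl + slack


assert ttl_wait_seconds(60, 5) == 65   # old: over a minute of idle wait
assert ttl_wait_seconds(1) == 2        # new: two seconds per invocation
```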
Benny Halevy
77c03354f4 test/cluster/dtest: wide_rows_test.py: fix key_appearance accumulation
In validate_entities_recognized_as_large(), key_appearance was
overwritten on each loop iteration instead of being accumulated.
This meant that for entity_type == "cell" in multi-node clusters,
entities_count only reflected the last node's count rather than
the total across all nodes. Fix by using += to accumulate.

Update expected_entity_number in test_large_cell_in_materialized_view
to account for RF=3 replication (each cell appears on all 3 nodes).

Bug inherited from scylla-dtest.
2026-04-28 11:40:16 +03:00
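The bug in miniature, with made-up per-node counts: `=` keeps only the last node's value, while the fixed `+=` totals across nodes.

```python
def total_entities(per_node_counts):
    buggy = 0
    fixed = 0
    for count in per_node_counts:
        buggy = count    # overwritten on each iteration (the old behavior)
        fixed += count   # accumulated across nodes (the fix)
    return buggy, fixed


# With RF=3, each large cell is reported by all three replicas, so the
# expected total is 3x the per-node count.
assert total_entities([5, 5, 5]) == (5, 15)
```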
Benny Halevy
49df9242f7 test/cluster/dtest: wide_rows_test.py: scope compact to test keyspace/table
Pass KEYSPACE_NAME and TABLE_NAME to cluster.compact() instead of
compacting all keyspaces. This avoids unnecessary compaction of
system tables, making tests faster.

Also convert remaining nodetool("compact ...") calls to use
cluster.compact() for consistency.
2026-04-28 11:40:16 +03:00
Benny Halevy
8c82c6646b test/cluster/dtest: wide_rows_test.py: fix expect_warning mutation across nodes
In validate_log_warnings(), expect_warning was reassigned inside the
per-node loop, so if the first node set it to False (due to no
sstables on disk), all subsequent nodes would inherit that value
regardless of their own state.

Use a local variable (node_expect_warning) instead of mutating the
function parameter.
2026-04-28 11:40:16 +03:00
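A sketch of the fix, with an illustrative node shape: computing a per-node local keeps the `expect_warning` parameter intact, so one node's state cannot leak into later iterations.

```python
def validate_log_warnings(nodes, expect_warning):
    # Buggy version reassigned the parameter inside the loop
    # (expect_warning = expect_warning and ...), so the first node's
    # state leaked into every later iteration.
    results = []
    for node in nodes:
        node_expect_warning = expect_warning and node["has_sstables"]
        results.append(node_expect_warning)
    return results


nodes = [{"has_sstables": False}, {"has_sstables": True}]
# The second node's expectation is unaffected by the first node's state.
assert validate_log_warnings(nodes, True) == [False, True]
```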
Benny Halevy
d76d0b8a16 test/cluster/dtest: wide_rows_test.py: remove dead code
Remove validation_small_entity() and get_large_entity_info() methods.
These are not called by any test in the migrated file.
get_large_entity_info() also had a bug where the CQL query used
escaped braces ({{keyspace_name}}) instead of actual parameter
substitution, so it would have queried for literal '{keyspace_name}'.
2026-04-28 11:40:16 +03:00
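The brace bug is a standard f-string pitfall: doubled braces are an escape that yields literal braces, so the query filtered on the literal string `'{keyspace_name}'` instead of the variable's value. A two-line demonstration (query fragment illustrative):

```python
keyspace_name = "ks1"
buggy = f"WHERE keyspace_name = '{{keyspace_name}}'"  # braces escaped
fixed = f"WHERE keyspace_name = '{keyspace_name}'"    # actual substitution

assert buggy == "WHERE keyspace_name = '{keyspace_name}'"
assert fixed == "WHERE keyspace_name = 'ks1'"
```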
Benny Halevy
690672e4cb test/cluster/dtest: wide_rows_test.py: cosmetic cleanups
Fix typos: aproximately, quering, colection, table_nam, the the.
Fix grammar: 'verify the they didn't recognized as large'.
Use idiomatic 'not in' instead of 'not x in'.
Remove unused variable assignment and commented-out debug line.
Remove unnecessary f-string prefix.
Fix '/n' to use actual newline in error message formatting.
Fix extra trailing quotes in exception messages.
Remove redundant variable assignment (maximum_primary_key_value).
2026-04-28 11:40:13 +03:00
Benny Halevy
85079d7c7a test/cluster/dtest: migrate wide_rows_test.py from scylla-dtest
Adapt wide_rows_test.py to work with the in-tree cluster test
framework:
- Replace dtest imports with in-tree equivalents
- Replace self.cluster.flush() + self.cluster.wait_for_compactions()
  with self.cluster.compact() since nodetool compact handles flush
  and waiting internally
- Add inline wait_for_view() helper (replaces async version)
- Replace node.status with is_running() check
- Add copyright header

Remove from skip_in_dev now that all tests pass.
2026-04-28 11:39:47 +03:00
Benny Halevy
70f8fcbe67 test/cluster/dtest: cache ScyllaNode hostid
Cache the host ID in ScyllaNode._hostid so that hostid() returns
the cached value when the node is stopped.  Without this,
watch_log_for_death() fails with a timeout because it tries to
query the stopped node's API to get its host ID for the log
pattern match.
2026-04-28 11:36:08 +03:00
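A minimal sketch of the caching behavior described above (the class and the `"host-1"` id are illustrative stand-ins, not the real ScyllaNode or its manager API):

```python
class StubNode:
    def __init__(self):
        self._hostid = None
        self.running = True

    def _query_api(self):
        # Stands in for manager.get_host_id(); only works while running.
        if not self.running:
            raise RuntimeError("API unreachable: node is stopped")
        return "host-1"

    def hostid(self):
        # Cache on first lookup so a stopped node still reports the id
        # it had while running (e.g. for log pattern matching).
        if not self._hostid:
            self._hostid = self._query_api()
        return self._hostid


node = StubNode()
assert node.hostid() == "host-1"   # fetched and cached while running
node.running = False
assert node.hostid() == "host-1"   # served from cache after stop
```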
Benny Halevy
1d6403ddad test/cluster/dtest: add ScyllaCluster.compact() method
Add compact() method to ScyllaCluster, delegating to
ScyllaNode.compact() on each running node. Accepts optional
keyspace and tables parameters to allow scoping compaction to
specific keyspaces/tables.

Also fix ScyllaNode.compact() to use list[str] for tables
parameter and extend() instead of +=, so that passing a single
table name as a string does not iterate over its characters.
2026-04-28 11:36:08 +03:00
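Why the `list[str]` type fix matters: `+=` on a list with a bare string iterates the string's characters, which is exactly the bug described above.

```python
# Old behavior: a single table name passed as a string.
cmd = ["compact", "ks"]
cmd += "t1"
assert cmd == ["compact", "ks", "t", "1"]   # each character appended!

# Fixed behavior: tables is list[str] and extend() is guarded.
cmd = ["compact", "ks"]
tables = ["t1"]
if tables:
    cmd.extend(tables)
assert cmd == ["compact", "ks", "t1"]       # one element per table name
```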
Benny Halevy
9a24be2fe9 test/cluster/dtest: add assertion helpers for wide_rows_test
Add assert_equal_more_with_deviation() and assert_less_equal_lists()
to tools/assertions.py.  These are needed by the wide_rows_test.py
migration from scylla-dtest.
2026-04-28 11:36:08 +03:00
Benny Halevy
5c93ccb6d8 test/cluster/dtest: copy wide_rows_test.py verbatim from scylla-dtest
Copy wide_rows_test.py as-is from scylla-dtest. The test is added
to run_in_dev but also skip_in_dev in test_config.yaml since it
requires functional changes to work with the in-tree test
framework. The next commit will make the necessary changes and
remove it from skip_in_dev.
2026-04-28 11:36:08 +03:00
14 changed files with 1515 additions and 23 deletions


@@ -45,7 +45,7 @@ Example:
 .. code-block:: console

-   nodetool removenode 675ed9f4-6564-6dbd-ca08-43fddce952de
+   nodetool removenode 675ed9f4-6564-6dbd-can8-43fddce952gy

 To only mark the node as permanently down without doing actual removal, use :doc:`nodetool excludenode </operating-scylla/nodetool-commands/excludenode>`:

@@ -79,6 +79,6 @@ Example:
 .. code-block:: console

-   nodetool removenode --ignore-dead-nodes 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c,125ed9f4-7777-1db0-aac8-43fddce9123e 675ed9f4-6564-6dbd-ca08-43fddce952de
+   nodetool removenode --ignore-dead-nodes 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c,125ed9f4-7777-1dbn-mac8-43fddce9123e 675ed9f4-6564-6dbd-can8-43fddce952gy

 .. include:: nodetool-index.rst


@@ -74,7 +74,7 @@ Procedure
    -- Address Load Tokens Owns (effective) Host ID Rack
    UN 192.168.1.201 112.82 KB 256 32.7% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c B1
    UN 192.168.1.202 91.11 KB 256 32.9% 125ed9f4-7777-1dbn-mac8-43fddce9123e B1
-   UJ 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-ca08-43fddce952de B1
+   UJ 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-can8-43fddce952gy B1

 Nodes in the cluster finished streaming data to the new node:

@@ -86,7 +86,7 @@ Procedure
    -- Address Load Tokens Owns (effective) Host ID Rack
    UN 192.168.1.201 112.82 KB 256 32.7% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c B1
    UN 192.168.1.202 91.11 KB 256 32.9% 125ed9f4-7777-1dbn-mac8-43fddce9123e B1
-   UN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-ca08-43fddce952de B1
+   UN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-can8-43fddce952gy B1

 #. When the new node status is Up Normal (UN), run the :doc:`nodetool cleanup </operating-scylla/nodetool-commands/cleanup>` command on all nodes in the cluster except for the new node that has just been added. Cleanup removes keys that were streamed to the newly added node and are no longer owned by the node.


@@ -192,7 +192,7 @@ Adding new nodes
    -- Address Load Tokens Owns Host ID Rack
    UN 192.168.1.10 500 MB 256 33.3% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c RACK0
    UN 192.168.1.11 500 MB 256 33.3% 125ed9f4-7777-1dbn-mac8-43fddce9123e RACK1
-   UN 192.168.1.12 500 MB 256 33.3% 675ed9f4-6564-6dbd-ca08-43fddce952de RACK2
+   UN 192.168.1.12 500 MB 256 33.3% 675ed9f4-6564-6dbd-can8-43fddce952gy RACK2
    UJ 192.168.2.10 250 MB 256 ? a1b2c3d4-5678-90ab-cdef-112233445566 RACK0

 **Example output after bootstrap completes:**

@@ -205,7 +205,7 @@ Adding new nodes
    -- Address Load Tokens Owns Host ID Rack
    UN 192.168.1.10 400 MB 256 25.0% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c RACK0
    UN 192.168.1.11 400 MB 256 25.0% 125ed9f4-7777-1dbn-mac8-43fddce9123e RACK1
-   UN 192.168.1.12 400 MB 256 25.0% 675ed9f4-6564-6dbd-ca08-43fddce952de RACK2
+   UN 192.168.1.12 400 MB 256 25.0% 675ed9f4-6564-6dbd-can8-43fddce952gy RACK2
    UN 192.168.2.10 400 MB 256 25.0% a1b2c3d4-5678-90ab-cdef-112233445566 RACK0

 #. For tablets-enabled clusters, wait for tablet load balancing to complete.


@@ -163,5 +163,5 @@ This example shows how to install and configure a three-node cluster using Gossi
    -- Address Load Tokens Owns (effective) Host ID Rack
    UN 192.168.1.201 112.82 KB 256 32.7% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c 43
    UN 192.168.1.202 91.11 KB 256 32.9% 125ed9f4-7777-1dbn-mac8-43fddce9123e 44
-   UN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-ca08-43fddce952de 45
+   UN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-can8-43fddce952gy 45


@@ -19,7 +19,7 @@ Prerequisites
    -- Address Load Tokens Owns (effective) Host ID Rack
    UN 192.168.1.201 112.82 KB 256 32.7% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c B1
    UN 192.168.1.202 91.11 KB 256 32.9% 125ed9f4-7777-1dbn-lac8-23fddce9123e B1
-   UN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-ca08-43fddce952de B1
+   UN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-can8-43fddce952gy B1

    Datacenter: ASIA-DC
    Status=Up/Down

@@ -165,7 +165,7 @@ Procedure
    -- Address Load Tokens Owns (effective) Host ID Rack
    UN 192.168.1.201 112.82 KB 256 32.7% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c B1
    UN 192.168.1.202 91.11 KB 256 32.9% 125ed9f4-7777-1dbn-mac8-43fddce9123e B1
-   UN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-ca08-43fddce952de B1
+   UN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-can8-43fddce952gy B1

    Datacenter: EUROPE-DC
    Status=Up/Down


@@ -18,7 +18,7 @@ Removing a Running Node
    -- Address Load Tokens Owns (effective) Host ID Rack
    UN 192.168.1.201 112.82 KB 256 32.7% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c B1
    UN 192.168.1.202 91.11 KB 256 32.9% 125ed9f4-7777-1dbn-mac8-43fddce9123e B1
-   UN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-ca08-43fddce952de B1
+   UN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-can8-43fddce952gy B1

 #. If the node status is **Up Normal (UN)**, run the :doc:`nodetool decommission </operating-scylla/nodetool-commands/decommission>` command
    to remove the node you are connected to. Using ``nodetool decommission`` is the recommended method for cluster scale-down operations. It prevents data loss

@@ -75,7 +75,7 @@ command providing the Host ID of the node you are removing. See :doc:`nodetool r
 .. code-block:: console

-   nodetool removenode 675ed9f4-6564-6dbd-ca08-43fddce952de
+   nodetool removenode 675ed9f4-6564-6dbd-can8-43fddce952gy

 The ``nodetool removenode`` command notifies other nodes that the token range it owns needs to be moved and
 the nodes should redistribute the data using streaming. Using the command does not guarantee the consistency of the rebalanced data if


@@ -23,7 +23,7 @@ Prerequisites
    -- Address Load Tokens Owns (effective) Host ID Rack
    UN 192.168.1.201 112.82 KB 256 32.7% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c B1
    DN 192.168.1.202 91.11 KB 256 32.9% 125ed9f4-7777-1dbn-mac8-43fddce9123e B1
-   DN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-ca08-43fddce952de B1
+   DN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-can8-43fddce952gy B1

 Login to one of the nodes in the cluster with (UN) status, collect the following info from the node:


@@ -29,7 +29,7 @@ Down (DN), and the node can be replaced.
    -- Address Load Tokens Owns (effective) Host ID Rack
    UN 192.168.1.201 112.82 KB 256 32.7% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c B1
    UN 192.168.1.202 91.11 KB 256 32.9% 125ed9f4-7777-1dbn-mac8-43fddce9123e B1
-   DN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-ca08-43fddce952de B1
+   DN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-can8-43fddce952gy B1

 Remove the Data
 ==================

@@ -72,7 +72,7 @@ Procedure
 For example (using the Host ID of the failed node from above):

-``replace_node_first_boot: 675ed9f4-6564-6dbd-ca08-43fddce952de``
+``replace_node_first_boot: 675ed9f4-6564-6dbd-can8-43fddce952gy``

 #. Start the new node.

@@ -90,7 +90,7 @@ Procedure
    -- Address Load Tokens Owns (effective) Host ID Rack
    UN 192.168.1.201 112.82 KB 256 32.7% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c B1
    UN 192.168.1.202 91.11 KB 256 32.9% 125ed9f4-7777-1dbn-mac8-43fddce9123e B1
-   DN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-ca08-43fddce952de B1
+   DN 192.168.1.203 124.42 KB 256 32.6% 675ed9f4-6564-6dbd-can8-43fddce952gy B1

 ``192.168.1.203`` is the dead node.

@@ -121,7 +121,7 @@ Procedure
    /192.168.1.203
    generation:1553759866
    heartbeat:2147483647
-   HOST_ID:675ed9f4-6564-6dbd-ca08-43fddce952de
+   HOST_ID:675ed9f4-6564-6dbd-can8-43fddce952gy
    STATUS:shutdown,true
    RELEASE_VERSION:3.0.8
    X3:3

@@ -178,7 +178,7 @@ In this case, the node's data will be cleaned after restart. To remedy this, you
 .. code-block:: none

-   echo 'replace_node_first_boot: 675ed9f4-6564-6dbd-ca08-43fddce952de' | sudo tee --append /etc/scylla/scylla.yaml
+   echo 'replace_node_first_boot: 675ed9f4-6564-6dbd-can8-43fddce952gy' | sudo tee --append /etc/scylla/scylla.yaml

 #. Run the following command to re-setup RAID


@@ -227,6 +227,11 @@ class ScyllaCluster:
     def flush(self) -> None:
         self.nodetool("flush")

+    def compact(self, keyspace: str = "", tables: list[str] | None = None) -> None:
+        for node in self.nodelist():
+            if node.is_running():
+                node.compact(keyspace=keyspace, tables=tables)
+
     @staticmethod
     def debug(message: str) -> None:
         logger.debug(message)


@@ -111,6 +111,7 @@ class ScyllaNode:
         self.data_center = server.datacenter
         self.rack = server.rack
+        self._hostid = None
         self._smp_set_during_test = None
         self._smp = None
         self._memory = None

@@ -465,6 +466,9 @@ class ScyllaNode:
         if wait_for_binary_proto:
             self.wait_for_binary_interface(from_mark=self.mark)

+        if not self._hostid:
+            self.hostid()
+
         if wait_other_notice:
             timeout = self.cluster.default_wait_other_notice_timeout
             for node, mark in marks:

@@ -647,11 +651,12 @@ class ScyllaNode:
             cmd.append(table)
         self.nodetool(" ".join(cmd), **kwargs)

-    def compact(self, keyspace: str = "", tables: str | None = ()) -> None:
+    def compact(self, keyspace: str = "", tables: list[str] | None = None) -> None:
         compact_cmd = ["compact"]
         if keyspace:
             compact_cmd.append(keyspace)
-        compact_cmd += tables
+        if tables:
+            compact_cmd.extend(tables)
         self.nodetool(" ".join(compact_cmd))

     def drain(self, block_on_log: bool = False) -> None:

@@ -824,10 +829,13 @@ class ScyllaNode:
         assert timeout is None, "argument `timeout` is not supported"  # not used in scylla-dtest
         assert force_refresh is None, "argument `force_refresh` is not supported"  # not used in scylla-dtest
-        try:
-            return self.cluster.manager.get_host_id(server_id=self.server_id)
-        except Exception as exc:
-            self.error(f"Failed to get hostid: {exc}")
+        if not self._hostid:
+            try:
+                self._hostid = self.cluster.manager.get_host_id(server_id=self.server_id)
+            except Exception as exc:
+                self.error(f"Failed to get hostid: {exc}")
+        return self._hostid

     def rmtree(self, path: str | Path) -> None:
         """Delete a directory content without removing the directory.


@@ -34,6 +34,7 @@ def pytest_addoption(parser: Parser) -> None:
     parser.addoption("--experimental-features", type=lambda s: s.split(","), action="store", help="Pass experimental features <feature>,<feature> to enable", default=None)
     parser.addoption("--tablets", action=argparse.BooleanOptionalAction, default=False, help="Whether to enable tablets support (default: %(default)s)")
     parser.addoption("--force-gossip-topology-changes", action="store_true", default=False, help="force gossip topology changes in a fresh cluster")
+    parser.addoption("--compaction-strategy", action="store", default=None, help="Compaction strategy to use in tests that support it (e.g. wide_rows_test.py). One of LeveledCompactionStrategy, SizeTieredCompactionStrategy, TimeWindowCompactionStrategy, or IncrementalCompactionStrategy. If not set, a random strategy is chosen per test.")


 def pytest_configure(config: Config) -> None:


@@ -263,3 +263,25 @@ def assert_lists_equal_ignoring_order(list1, list2, sort_key=None):
     sorted_list2 = sorted(normalized_list2, key=lambda elm: str(elm[sort_key]))

     assert sorted_list1 == sorted_list2
+
+
+def assert_equal_more_with_deviation(actual, expect, deviation_perc):
+    """
+    Assert that actual is within the inclusive interval [expect, expect + deviation_perc%].
+    @param actual Value inspected
+    @param expect Beginning of the expected interval
+    @param deviation_perc Allowed percent increase above expect
+    """
+    deviation_high = (expect * (100 + deviation_perc)) / 100
+    assert expect <= actual <= deviation_high, f"Expect result interval {expect}..{deviation_high}, received {actual}"
+
+
+def assert_less_equal_lists(actual_list, expected_list, msg=None):
+    """
+    Assert that actual_list is a subset of expected_list.
+    @param actual_list Inspected list
+    @param expected_list List that is supposed to include actual_list
+    @param msg Optional custom error message (default: None)
+    """
+    standardMsg = msg or f"{actual_list} not less than or equal to {expected_list}"
+    assert set(actual_list) <= set(expected_list), standardMsg

File diff suppressed because it is too large.


@@ -48,5 +48,6 @@ run_in_dev:
   - dtest/commitlog_test
   - dtest/cfid_test
   - dtest/rebuild_test
+  - dtest/wide_rows_test
 run_in_debug:
   - random_failures/test_random_failures