Compare commits

...

10 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
ae0208e35c Use permissions: {} (no token permissions) instead of contents: read
Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Agent-Logs-Url: https://github.com/scylladb/scylladb/sessions/11b4f0e2-dd65-47f3-9d02-0c01e28fcd99
2026-03-23 16:59:37 +00:00
copilot-swe-agent[bot]
b418e7a489 Initial plan 2026-03-23 16:58:39 +00:00
Yaniv Kaul
2c5727753a .github/workflows/trigger-scylla-ci.yaml:3: Potential fix for code scanning alert no. 184: Workflow does not contain permissions
Reduce permissions to 'read'.

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2026-03-23 16:46:06 +02:00
Yaniv Michael Kaul
051107f5bc scylla-gdb: fix sstable-summary crash on ms-format sstables
The 'scylla sstable-summary' GDB command crashes with
'ValueError: Argument "count" should be greater than zero' when
inspecting ms-format (trie-based) sstables. This happens because
ms-format sstables don't populate the traditional summary structure,
leaving all fields zeroed out, which causes gdb.read_memory() to be
called with a zero count.

Fix by:
- Adding zero-length guards to sstring.to_hex() and sstring.as_bytes()
  to return early when the data length is zero, consistent with the
  existing guard in managed_bytes.get().
- Adding the same guard to scylla_sstable_summary.to_hex().
- Detecting ms-format sstables (version >= ms, i.e. 5 or later) early in
  scylla_sstable_summary.invoke() and printing an informative message
  instead of attempting to read the unpopulated summary.
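
A minimal, gdb-free sketch of the guard described above, for illustration only. `FakeInferior` is a hypothetical stand-in for `gdb.Inferior` whose `read_memory` rejects a zero count the way the reported error suggests. `summary_is_populated` is a hypothetical helper that mirrors the `>= ms_version` check added to `scylla_sstable_summary.invoke()`.

```python
class FakeInferior:
    """Hypothetical stand-in for gdb.Inferior (not the real gdb object)."""

    def __init__(self, memory: bytes) -> None:
        self.memory = memory

    def read_memory(self, address: int, length: int) -> bytes:
        # Mimic the reported failure mode: a zero count is rejected.
        if length <= 0:
            raise ValueError('Argument "count" should be greater than zero')
        return self.memory[address:address + length]


def to_hex(inf: FakeInferior, data: int, size: int) -> str:
    # Zero-length guard: a zeroed-out summary field has size 0,
    # so return early instead of calling read_memory(..., 0).
    if size == 0:
        return ''
    return bytes(inf.read_memory(data, size)).hex()


def summary_is_populated(version: int, ms_version: int) -> bool:
    # Like the `>= ms_version` check, treat ms and any later format
    # as having no populated summary.
    return version < ms_version


inf = FakeInferior(b"\x01\xab\xff")
print(to_hex(inf, 0, 3))           # 01abff
print(repr(to_hex(inf, 0, 0)))     # ''
print(summary_is_populated(5, 5))  # False
```

Comparing against the enum value looked up at runtime, rather than a literal 5, keeps the check correct if the numbering of sstable versions ever changes.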

Fixes: SCYLLADB-1180

Closes scylladb/scylladb#29162
2026-03-23 12:44:47 +02:00
Piotr Szymaniak
c8e7e20c5c test/cluster: retry create_table on transient schema agreement timeout
In test_index_requires_rf_rack_valid_keyspace, the create_table call
for a plain tablet-based table can fail with 'Unable to reach schema
agreement' after the server's 10s timeout is exceeded. This happens
when schema gossip propagation across the 4-node cluster takes longer
than expected after a sequence of rapid schema changes earlier in the
test.

Add a retry (up to 2 attempts) on schema agreement errors for this
specific create_table call rather than increasing the server-side
timeout.
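
The retry policy described above could also be factored into a small helper. This is a hedged sketch, not part of the ScyllaDB test framework: `retry_on_message` and `flaky_create_table` are hypothetical names. Like the inline loop in the diff, it retries only when the error text contains a given substring (here `'schema agreement'`) and re-raises on the last attempt.

```python
from typing import Callable, List, Type, TypeVar

T = TypeVar("T")


def retry_on_message(fn: Callable[[], T], needle: str, attempts: int = 2,
                     exc_type: Type[BaseException] = Exception) -> T:
    """Call fn(), retrying only when the raised error's text contains `needle`."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except exc_type as e:
            # Unrelated errors propagate immediately; a matching error
            # propagates only once the last attempt has been used up.
            if needle not in str(e) or attempt == attempts - 1:
                raise
    raise AssertionError("unreachable")


calls: List[int] = []


def flaky_create_table() -> str:
    # Hypothetical operation: fails once with a transient error, then succeeds.
    calls.append(1)
    if len(calls) == 1:
        raise RuntimeError("Unable to reach schema agreement")
    return "created"


print(retry_on_message(flaky_create_table, "schema agreement"))  # created
print(len(calls))  # 2
```

Matching on the message keeps genuine failures, such as the expected validation errors later in the test, from being silently retried.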

Fixes: SCYLLADB-1135

Closes scylladb/scylladb#29132
2026-03-23 10:45:30 +02:00
Yaniv Kaul
fb1f995d6b .github/workflows/backport-pr-fixes-validation.yaml: workflow does not contain permissions (Potential fix for code scanning alert no. 139)
Potential fix for https://github.com/scylladb/scylladb/security/code-scanning/139.

To fix the problem, explicitly restrict the `GITHUB_TOKEN` permissions
for this workflow/job so it has only what is needed. The script reads PR
data and repository info (which is covered by `contents: read`/default
read scopes) and posts a comment via `github.rest.issues.createComment`,
which requires `issues: write`. No other write scopes (e.g., `contents:
write`, `pull-requests: write`) are necessary.

The best fix without changing functionality is to add a `permissions`
block scoped to this job (or at the workflow root). Since we only see a
single job here, we’ll add it under `check-fixes-prefix`. Concretely, in
`.github/workflows/backport-pr-fixes-validation.yaml`, between the
`runs-on: ubuntu-latest` line (line 10) and `steps:` (line 11), add:

```yaml
    permissions:
      contents: read
      issues: write
```

This keeps the token minimally privileged while still allowing the script
to create issue/PR comments.

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Closes scylladb/scylladb#27810
2026-03-23 10:30:01 +02:00
Piotr Smaron
32225797cd dtest: fix flaky test_writes_schema_recreated_while_node_down
`read_barrier(session2)` was supposed to ensure `node2` has caught up on schema
before a CL=ALL write. But `patient_cql_connection(node2)` creates a
cluster-aware driver session (`TokenAwarePolicy(DCAwareRoundRobinPolicy())`)
that can route the barrier CQL statement to any node, not necessarily `node2`.
If the barrier runs on `node1` or `node3` (which already have the new schema),
it is a no-op for `node2`, which stays stale; hence the observed `WriteFailure`.
The fix is to switch to `patient_exclusive_cql_connection(node2)`,
which uses `WhiteListRoundRobinPolicy([node2_ip])` to pin all CQL to `node2`.
This is already the established pattern used by other tests in the same file.

Fixes: SCYLLADB-1139

No need to backport yet; the failure appeared only on master.

Closes scylladb/scylladb#29151
2026-03-23 10:25:54 +02:00
Michał Chojnowski
f29525f3a6 test/boost/cache_algorithm_test: disable sstable compression to avoid giant index pages
The test intentionally creates huge index pages.
But since 5e7fb08bf3,
the index reader allocates a block of memory for a whole index page,
instead of incrementally allocating small pieces during index parsing.
This giant allocation causes the test to fail spuriously in CI sometimes.

Fix this by disabling sstable compression on the test table,
which puts a hard cap of 2000 keys per index page.

Fixes: SCYLLADB-1152

Closes scylladb/scylladb#29152
2026-03-23 09:57:11 +02:00
Raphael S. Carvalho
05b11a3b82 sstables_loader: use new sstable add path
Use add_new_sstable_and_update_cache() when attaching SSTables
downloaded by the node-scoped local loader.

This is the correct variant for new SSTables: it can unlink the
SSTable on failure to add it, and it can split the SSTable if a
tablet split is in progress. The older
add_sstable_and_update_cache() helper is intended for preexisting
SSTables that are already stable on disk.

Additionally, downloaded SSTables are now left unsealed (TemporaryTOC)
until they are successfully added to the table's SSTable set. The
download path (download_fully_contained_sstables) passes
leave_unsealed=true to create_stream_sink, and attach_sstable opens
the SSTable with unsealed_sstable=true and seals it only inside the
on_add callback — matching the pattern used by stream_blob.cc and
storage_service.cc for tablet streaming.

This prevents a data-resurrection hazard: previously, if the process
crashed between download and attach_sstable, or if attach_sstable
failed mid-loop, sealed (TOC) SSTables would remain in the table
directory and be reloaded by distributed_loader on restart. With
TemporaryTOC, sstable_directory automatically cleans them up on
restart instead.
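
The crash-safety argument can be illustrated with a simplified filesystem model. This is a sketch under stated assumptions, not ScyllaDB's implementation. Component files are named `<prefix>-<Component>`; real sstable file names carry more fields, and prefixes here are assumed to contain no `-`. Sealing is modeled as an atomic rename of `TemporaryTOC.txt` to `TOC.txt`. `add_to_table` is a hypothetical stand-in for `add_new_sstable_and_update_cache()` that runs the `on_add` callback only once it has accepted the sstable.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

TEMP_TOC = "TemporaryTOC.txt"
TOC = "TOC.txt"


def write_unsealed(table_dir: Path, prefix: str, components: Dict[str, bytes]) -> None:
    # The temporary TOC is written first: after a crash at any later point,
    # this marker already identifies the sstable as incomplete.
    (table_dir / f"{prefix}-{TEMP_TOC}").write_text("\n".join(components))
    for name, data in components.items():
        (table_dir / f"{prefix}-{name}").write_bytes(data)


def seal(table_dir: Path, prefix: str) -> None:
    # A single atomic rename flips the sstable from "temporary" to "sealed".
    os.replace(table_dir / f"{prefix}-{TEMP_TOC}", table_dir / f"{prefix}-{TOC}")


def attach(table_dir: Path, prefix: str,
           add_to_table: Callable[[str, Callable[[], None]], None]) -> None:
    # Seal only from inside the on_add callback, i.e. after the table accepted it.
    add_to_table(prefix, lambda: seal(table_dir, prefix))


def restart_cleanup(table_dir: Path) -> List[str]:
    # On restart, every sstable still carrying a temporary TOC is deleted
    # instead of being loaded, so it cannot resurrect data.
    removed = []
    for marker in sorted(table_dir.glob(f"*-{TEMP_TOC}")):
        prefix = marker.name[: -len(f"-{TEMP_TOC}")]
        for f in list(table_dir.glob(f"{prefix}-*")):
            f.unlink()
        removed.append(prefix)
    return removed


def sealed_sstables(table_dir: Path) -> List[str]:
    return sorted(p.name[: -len(f"-{TOC}")] for p in table_dir.glob(f"*-{TOC}"))


with tempfile.TemporaryDirectory() as d:
    table_dir = Path(d)
    parts = {"Data.db": b"rows", "Index.db": b"idx"}
    write_unsealed(table_dir, "sst1", parts)
    write_unsealed(table_dir, "sst2", parts)
    # Hypothetical table that accepts the sstable, then runs the callback.
    attach(table_dir, "sst1", lambda prefix, on_add: on_add())
    # Simulate a crash before sst2 is attached: it keeps its TemporaryTOC.
    print(restart_cleanup(table_dir))  # ['sst2']
    print(sealed_sstables(table_dir))  # ['sst1']
```

The same model covers a failed add: if `add_to_table` raises before invoking `on_add`, the sstable is never renamed to `TOC.txt` and is removed on the next restart.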

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1085

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#29072
2026-03-23 10:33:04 +03:00
Piotr Szymaniak
f511264831 alternator/test: fix test_ttl_with_load_and_decommission flaky Connection refused error
The native Scylla nodetool reports ECONNREFUSED as 'Connection refused',
not as 'ConnectException' (which is the Java nodetool format). Add
'Connection refused' to the valid_errors list so that transient
connection failures during concurrent decommission/bootstrap topology
changes are properly tolerated.

Fixes SCYLLADB-1167

Closes scylladb/scylladb#29156
2026-03-22 11:01:45 +02:00
8 changed files with 47 additions and 8 deletions

```diff
@@ -8,6 +8,9 @@ on:
 jobs:
   check-fixes-prefix:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      issues: write
     steps:
       - name: Check PR body for "Fixes" prefix patterns
         uses: actions/github-script@v7
```

```diff
@@ -1,5 +1,7 @@
 name: Trigger Scylla CI Route
+permissions: {}
 on:
   issue_comment:
     types: [created]
```

```diff
@@ -952,6 +952,8 @@ class sstring:
     @staticmethod
     def to_hex(data, size):
+        if size == 0:
+            return ''
         inf = gdb.selected_inferior()
         return bytes(inf.read_memory(data, size)).hex()
@@ -974,6 +976,8 @@ class sstring:
         return self.ref['u']['external']['str']
     def as_bytes(self):
+        if len(self) == 0:
+            return b''
         inf = gdb.selected_inferior()
         return bytes(inf.read_memory(self.data(), len(self)))
@@ -5636,6 +5640,8 @@ class scylla_sstable_summary(gdb.Command):
         self.inf = gdb.selected_inferior()
     def to_hex(self, data, size):
+        if size == 0:
+            return ''
         return bytes(self.inf.read_memory(data, size)).hex()
     def invoke(self, arg, for_tty):
@@ -5647,6 +5653,10 @@ class scylla_sstable_summary(gdb.Command):
             sst = seastar_lw_shared_ptr(arg).get().dereference()
         else:
             sst = arg
+        ms_version = int(gdb.parse_and_eval('sstables::sstable_version_types::ms'))
+        if int(sst['_version']) >= ms_version:
+            gdb.write("sstable uses ms format (trie-based index); summary is not populated.\n")
+            return
         summary = seastar_lw_shared_ptr(sst['_components']['_value']).get().dereference()['summary']
         gdb.write("header: {}\n".format(summary['header']))
```

```diff
@@ -221,10 +221,16 @@ private:
         sst->set_sstable_level(0);
         auto units = co_await sst_manager.dir_semaphore().get_units(1);
         sstables::sstable_open_config cfg {
+            .unsealed_sstable = true,
             .ignore_component_digest_mismatch = db.get_config().ignore_component_digest_mismatch(),
         };
         co_await sst->load(table.get_effective_replication_map()->get_sharder(*table.schema()), cfg);
-        co_await table.add_sstable_and_update_cache(sst);
+        co_await table.add_new_sstable_and_update_cache(sst, [&sst_manager, sst] (sstables::shared_sstable loading_sst) -> future<> {
+            if (loading_sst == sst) {
+                auto writer_cfg = sst_manager.configure_writer(loading_sst->get_origin());
+                co_await loading_sst->seal_sstable(writer_cfg.backup);
+            }
+        });
     }
     future<>
@@ -295,7 +301,8 @@ private:
                 sstables::sstable_state::normal,
                 sstables::sstable::component_basename(
                     _table.schema()->ks_name(), _table.schema()->cf_name(), descriptor.version, gen, descriptor.format, it->first),
-                sstables::sstable_stream_sink_cfg{.last_component = std::next(it) == components.cend()});
+                sstables::sstable_stream_sink_cfg{.last_component = std::next(it) == components.cend(),
+                                                  .leave_unsealed = true});
             auto out = co_await sstable_sink->output(foptions, stream_options);
             input_stream src(co_await [this, &it, sstable, f = files.at(it->first)]() -> future<input_stream<char>> {
```

```diff
@@ -62,7 +62,11 @@ SEASTAR_TEST_CASE(test_index_doesnt_flood_cache_in_small_partition_workload) {
     // cfg.db_config->index_cache_fraction.set(1.0);
     return do_with_cql_env_thread([] (cql_test_env& e) {
         // We disable compactions because they cause confusing cache mispopulations.
-        e.execute_cql("CREATE TABLE ks.t(pk blob PRIMARY KEY) WITH compaction = { 'class' : 'NullCompactionStrategy' };").get();
+        // We disable compression because the sstable writer targets a specific
+        // (*compressed* data file size : summary file size) ratio,
+        // so the number of keys per index page becomes hard to control,
+        // and might be arbitrarily large.
+        e.execute_cql("CREATE TABLE ks.t(pk blob PRIMARY KEY) WITH compaction = { 'class' : 'NullCompactionStrategy' } AND compression = {'sstable_compression': ''};").get();
         auto insert_query = e.prepare("INSERT INTO ks.t(pk) VALUES (?)").get();
         auto select_query = e.prepare("SELECT * FROM t WHERE pk = ?").get();
@@ -154,7 +158,11 @@ SEASTAR_TEST_CASE(test_index_is_cached_in_big_partition_workload) {
     // cfg.db_config->index_cache_fraction.set(0.0);
     return do_with_cql_env_thread([] (cql_test_env& e) {
         // We disable compactions because they cause confusing cache mispopulations.
-        e.execute_cql("CREATE TABLE ks.t(pk bigint, ck bigint, v blob, primary key (pk, ck)) WITH compaction = { 'class' : 'NullCompactionStrategy' };").get();
+        // We disable compression because the sstable writer targets a specific
+        // (*compressed* data file size : summary file size) ratio,
+        // so the number of keys per index page becomes hard to control,
+        // and might be arbitrarily large.
+        e.execute_cql("CREATE TABLE ks.t(pk bigint, ck bigint, v blob, primary key (pk, ck)) WITH compaction = { 'class' : 'NullCompactionStrategy' } AND compression = {'sstable_compression': ''};").get();
         auto insert_query = e.prepare("INSERT INTO ks.t(pk, ck, v) VALUES (?, ?, ?)").get();
         auto select_query = e.prepare("SELECT * FROM t WHERE pk = ? AND ck = ?").get();
```

```diff
@@ -691,7 +691,7 @@ class TesterAlternator(BaseAlternator):
                 random.choice(nodes_for_maintenance).compact()
             except NodetoolError as exc:
                 error_message = str(exc)
-                valid_errors = ["ConnectException", "status code 404 Not Found"]
+                valid_errors = ["ConnectException", "Connection refused", "status code 404 Not Found"]
                 if not any(err in error_message for err in valid_errors):
                     raise
```

```diff
@@ -353,7 +353,7 @@ class TestSchemaManagement(Tester):
         logger.debug("Restarting node2")
         node2.start(wait_for_binary_proto=True)
-        session2 = self.patient_cql_connection(node2)
+        session2 = self.patient_exclusive_cql_connection(node2)
         read_barrier(session2)
         rows = session.execute(SimpleStatement("SELECT * FROM cf", consistency_level=ConsistencyLevel.ALL))
@@ -382,7 +382,7 @@ class TestSchemaManagement(Tester):
         logger.debug("Restarting node2")
         node2.start(wait_for_binary_proto=True)
-        session2 = self.patient_cql_connection(node2)
+        session2 = self.patient_exclusive_cql_connection(node2)
         read_barrier(session2)
         session.execute(SimpleStatement("INSERT INTO cf (p, v) VALUES (2, '2')", consistency_level=ConsistencyLevel.ALL))
```

```diff
@@ -808,7 +808,16 @@ async def test_index_requires_rf_rack_valid_keyspace(manager: ManagerClient):
     # Create a table with tablets and no indexes, then add a GSI - the update should fail
     table_name = unique_table_name()
-    create_table_with_index(alternator, table_name, index_type=None, initial_tablets='1')
+    # The server waits 10s for schema agreement after creating a table,
+    # which may not be enough after a sequence of rapid schema changes
+    # on a multi-node cluster (see SCYLLADB-1135). Retry if needed.
+    for attempt in range(2):
+        try:
+            create_table_with_index(alternator, table_name, index_type=None, initial_tablets='1')
+            break
+        except ClientError as e:
+            if 'schema agreement' not in str(e) or attempt == 1:
+                raise
     with pytest.raises(ClientError, match=expected_err_update_add_gsi):
         alternator.meta.client.update_table(
             TableName=table_name,
```