This patch fixes "test/cqlpy/run --release 2025.1" which fails as
follows on all tests with indexes or views:
Secondary indexes are not supported on base tables with tablets
test/cqlpy/run can run cqlpy (and alternator) tests on various official
releases of Scylla which it knows how to download. When running old
versions of Scylla, we need to change the configuration options to those
that were needed on specific versions.
On new versions of Scylla we need to pass
--experimental-features=views-with-tablets
to be able to test materialized views, but in older versions we need to
remove that parameter because it didn't exist. We incorrectly removed it
for any versions 2025.1 or earlier, but that's incorrect - it just needs
to be removed for versions strictly earlier than 2025.1 - it is needed
for 2025.1 (I tested it is indeed needed even in the earliers RCs).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#24144
Much easier to avoid sstable collisions. Makes it possible to scrub
multiple sstables, with multiple calls to scylla-sstable, reusing the
same output directory. Previously, each new call to scylla-sstable
scrub, would start from generation 0, guaranteeing collision.
Remove the unit test for generation clash -- with UUID generations, this
is no longer possible to reproduce in practice.
Refs: #21387Closesscylladb/scylladb#23990
The cqlpy and alternator test frameworks use a single Scylla node started
once for all tests to run on. In the distant past, we had a problem where
if one test caused Scylla to crash, the result was a confusing report of
hundreds of failed tests - all tests after the crash "failed" and it wasn't
easy to find which test really caused the crash.
Our old solution to this problem was to have an autouse fixture (called
cql_test_connection or dynamodb_test_connection) which tested the
connection at the end of each test, and if it detected Scylla has
crashed - it used pytest.exit() to report the error and have pytest
exit and therefore stop running any further tests (which would have
led to all of them testing).
This approach had two problems:
1. The pytest.exit() caused the entire cqlpy suite to report a failure,
but but not the individual test - the individual test might have
failed as well, but that isn't guaranteed and in any case this test's
output is missing the informative message that Scylla crashed during
the test. This was fine when for each cqlpy failure we had two separate
error logs in Jenkins - the specific failed function, and the failed
file - but when we recently got rid of the suplication by removing the
second one, we no longer see the "Scylla crashed" messages any more.
2. Exiting pytest will be the wrong thing to do if the same pytest
run could run tests from different test suites. We don't do this
today, but we plan to support this approach soon.
This patch fixes both problems by replacing the pytest.exit() call by
setting a "scylla_crashed" flag and using pytest.fail(). The pytest.fail()
causes the current test - the one which caused Scylla to crash - to be
reported as an "ERROR" and the "Scylla crashed" message will correctly
appear in this test's log. The flag will cause all other tests in the
same test suite to be skip()ed. But other tests in other directories,
depending on different fixtures, might continue to run normally.
Fixes#23287
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#23307
Both test.py and test/cqlpy/run run many test functions against the same
Scylla process. In the resulting log file, it is hard to understand which
log messages are related to which test. In this patch, we log a message
(using the "/system/log" REST API) every time a test is started or ends.
The messages look like this:
INFO 2025-04-22 15:10:44,625 [shard 1:strm] api - /system/log:
test/cqlpy: Starting test_lwt.py::test_lwt_missing_row_with_static
...
INFO 2025-04-22 15:10:44,631 [shard 0:strm] api - /system/log:
test/cqlpy: Ended test_lwt.py::test_lwt_missing_row_with_static
We already had a similar feature in test/alternator, added three years
ago in commit b0371b6bf8. The implementation
is similar but not identical due to different available utility functions,
and in any case it's very simple.
While at it, this patch also fixes the has_rest_api() to timeout after
one second. Without this, if the REST API is blocked in a way that
a connection attempt just hangs, the tests can hang. With the new
timeout, the test will hang for a second, realize the REST API is
not available, and remember this decision (the next tests will not
wait one second again). We had the same bug in Alternator, and fixed
it in 758f8f01d7. This one second "pause"
will only happen if the REST API port is blocked - in the more typical
case the REST API port is just not listening but not blocked, and the
failure will be noticed immediately and won't wait a whole second.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#23857
In this PR, we adjust tests in the cqlpy test suite so they
only use RF-rack-valid keyspaces. After that, we enable
the configuration option `rf_rack_valid_keyspaces` in the
suite by default.
Refs scylladb/scylladb#23428
Backport: backporting to 2025.1 so we can test the option there too.
Closesscylladb/scylladb#23489
* github.com:scylladb/scylladb:
test/cqlpy: Enable rf_rack_valid_keyspaces by default
test: Move test_alter_tablet_keyspace_rf to cluster suite
test/cqlpy: Adjust tests to RF-rack-valid keyspaces
test/cqlpy/cassandra_tests: Adjust to RF-rack-valid keyspaces
A recent commit 370707b111 (re)introduced
a timeout for every group0 Raft operation. This timeout was set to 60
seconds, which, paraphrasing Bill Gates, "ought to be enough for anybody".
However, one of the things we do as a group0 operation is schema
changes, and we already noticed a few years ago, see commit
0b2cf21932, that in some extremely
overloaded test machines where tests run hundreds of times (!) slower
than usual, a single big schema operation - such as Alternator's
DeleteTable deleting a table and multiple of its CDC or view tables -
sometimes takes more than 60 seconds. The above fix changed the
client's timeout to wait for 300 seconds instead of 60 seconds,
but now we also need to increase our Raft timeout, or the server can
time out. We've seen this happening recently making some tests flaky
in CI (issue #23543).
So let's make this timeout configurable, as a new configuration option
group0_raft_op_timeout_in_ms. This option defaults to 60000 (i.e,
60 seconds), the same as the existing default. The test framework
overrides this default with a a higher 300 second timeout, matching
the client-side timeout.
Before this patch, this timeout was already configurable in a strange
way, using injections. But this was a misstep: We already have more
than a dozen timeouts configurable through the normal configration,
and this one should have been configured in the same way. There is
nothing "holy" about the default of 60 seconds we chose, and who
knows maybe in the future we might need to tweek it in the field,
just like we made the other timeouts tweakable. Injections cannot
be used in release mode, but configuration options can.
Fixes#23543
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#23717
We move the test `test_alter_tablet_keyspace_rf` from the cqlpy to the
cluster test suite. The reason behind the change is that the test cannot
be run with `rf_rack_valid_keyspaces` turned on in the configuration.
During the test, we make the keyspace RF-rack-invalid multiple times.
Since RF-rack-validity is a very strong constraint, adjust the test
otherwise is impossible.
By moving it to the cluster test suite, we're able to change the
configuration of the node used in the test, and so the test can work
again.
We adjust three existing Cassandra tests so that they don't create
RF-rack-invalid keyspaces. We modify the replication factor used
in the problematic tests. The changes don't affect the tests as
the value of the RF is unrelated to what they verify. Thanks to
that, we can run them now even with enforced RF-rack-valid keyspaces.
The drawback is that the modified ALTER statements do not modify
the RF at all. However, since the tests seem to verify that the code
responsible for VALIDATING a request works as intended, that should
have little to no impact on them.
These tests seem to be hitting the io-uring bug in the kernel from
time-to-time, making CI flaky. Force the use of the AIO backend in these
tests, as a workaround until fixed kernels (>=6.8.13) are available.
Fixes: #23517Fixes: #23546Closesscylladb/scylladb#23648
That map became redundant once we added
object_storage_endpoints in the config, this patch removes
it and switches all the user code to use the new option.
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
This change also removes the `object_storage.yaml` file
altogether and adds tests for fetching the endpoints
via the `v2/config/object_storage_endpoints` REST api.
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Before this patch, granting a user MODIFY permissions on ALL KEYSPACES allowed the user to write to system tables, where the user could also set himself to "superuser" granting him all other permissions. After this patch, MODIFY permissions on ALL KEYSPACES is limited only to non-system keyspaces.
Fixes: scylladb/scylladb#23218Closesscylladb/scylladb#23219
Tests in `test_service_level_api` were written before
scylladb/scylladb#16585 and they were doing 10s sleeps to wait for
service level controller to update its configuration. Now performing
a read barrier is sufficient to ensure SL configuration is up-to-date,
which significantly reduces tests time (from ~60s to ~2-3s).
Moreover, there was flakiness in the `test_switch_tenants` test.
Until now, the test waited up to 60s for the connections to update
their scheduling groups. However, it is difficult to determine
how long the process might take because a connection may be blocked
while waiting for the next request to be processed,
and the scheduling group will be updated only after a request is processed
(see `generic_server::connection::process_until_tenant_switch()`).
To address this issue, 100 simple queries are executed so that
connections on all shards process at least one request
and update their scheduling groups.
Fixesscylladb/scylladb#22768Closesscylladb/scylladb#23381
Filter out sstables which don't have a TOC or have a temporary TOC. Such
sstables are incomplete and can dissapear if the compaction which writes
them is interrupted.
When we rename columns in a table which has materialized views depending
on it, we need to also rename them in the materialized views' WHERE
clauses.
Currently, we do that by creating a new WHERE clause after each rename,
with the updated column. This is later converted to a mutation that
overwrites the WHERE clause. After multiple renames, we have multiple
mutations, each overwriting the WHERE clause with one column renamed.
As a result, the final WHERE clause is one of the modified clauses with
one column renamed.
Instead, we should prepare one new WHERE clause which includes all the
renamed columns. This patch accomplishes this by processing all the
column renames first, and only preparing the new view schema with the
new WHERE clause afterwards.
This patch also includes a test reproducer for this scenario.
Fixesscylladb/scylladb#22194Closesscylladb/scylladb#23152
scylla-sstable: Enable support for S3-stored sstables
Minimal implementation of what was mentioned in this [issue](https://github.com/scylladb/scylladb/issues/20532)
This update allows Scylla to work with sstables stored on AWS S3. Users can specify the fully qualified location of the sstable using the format: `s3://bucket/prefix/sstable_name`. One should have `object_storage_config_file` referenced in the `scylla.yaml` as described in docs/operating-scylla/admin.rst
ref: https://github.com/scylladb/scylladb/issues/20532
fixes: https://github.com/scylladb/scylladb/issues/20535
No backport needed since the S3 functionality was never released
Closesscylladb/scylladb#22321
* github.com:scylladb/scylladb:
tests: Add Tests for Scylla-SSTable S3 Functionality
docs: Update Scylla Tools Documentation for S3 SSTable Support
scylla-sstable: Enable Support for S3 SSTables
s3: Implement S3 Fully Qualified Name Manipulation Functions
object_storage: Refactor `object_storage.yaml` parsing logic
Issue #6058 complained that "DESCRIBE TABLE" or "DESCRIBE KEYSPACE" list
a secondary index as materialized view (the view used to back the index
in Scylla's implementation of secondary indexes). This patch adds a test
to verify that this issue no longer exists in server-side describe - so we
can mark the issue as fixed.
While preparing this test, I noticed that Scylla and Cassandra behave
differently on whether DESC TABLE should list materialized views or not,
so this patch also includes a test for that as well - and I opened
issue #23014 on Scylla and CASSANDRA-20365 on Cassandra to further
discuss that new issue.
Fixes#6058
Refs #23014.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#23015
Scylla inherited a 48-character limit on the length of table (and
keyspace) names from Cassandra 3. It turns out that Cassandra 4 and
5 unintentionally dropped this limit (see history lesson in
CASSANDRA-20425), and now Cassandra accepts longer table names.
Some Cassandra users are using such longer names and disappointed
that Scylla doesn't allow them.
This patch includes tests for this feature. One test tries a
48-character table name - it passes on Scylla and all versions
of Cassandra. A second test tries a 100-character table name - this
one passes on Cassandra version 4 and above (but not on 3), and
fails on Scylla so marked "xfail". A third test tries a 500-character
table name. This one fails badly on Cassandra (see CASSANDRA-20389),
but passes on Scylla today. This test is important because we need to
be sure that it continues to pass on Scylla even after the Scylla is
fixed to allow the 100-character test.
Refs #4480 - an issue we already have about supporting longer names
Note on the test implementation:
Ideally, the test for a particular table-name length shouldn't just
create the table - it should also make sure we can write table to it
and flush it, i.e., that sstables can get written correctly. But in
practice, these complications are not needed, because in modern Scylla
it is the directory name which contains the table's name, and the
individual sstable files do not contain the table's name. Just creating
the table already creates the long directory name, so that is the part
that needs to be tested. If we created this directory successfully,
later creating the short-named sstables inside it can't fail.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#23229
Before this patch we silently allowed and ignored PER PARTITION LIMIT.
While using aggregate functions in conjunction with PER PARTITION LIMIT
can make sense, we want to disable it until we can offer proper
implementation, see #9879 for discussion.
We want to match Cassandra, and for queries with aggregate functions it
behaves as follows:
- it silently ignores PER PARTITION LIMIT if GROUP BY is present, which
matches our previous implementation.
- rejects PER PARTITION LIMIT when GROUP BY is *not* present.
This patch adds rejection of the second group.
Fixes#9879Closesscylladb/scylladb#23086
Extended existing Scylla Tools tests to cover the new functionality of
reading SSTables from S3. This ensures that the new S3 integration is
thoroughly tested and performs as expected.
The scylla-sstable dump-* command suite has proven invaluable in many investigations. In certain cases however, I found that `dump-data` is quite cumbersome. An example would be trying to find certain values in an sstable, or trying to read the content of system tables when a node is down. For these cases, `dump-data` is very cumbersome: one has to trudge through tons of uninteresting metadata and do compaction in their heads. This PR introduces the new scylla-sstable query command, specifically targeted at situations like this: it allows executing queries on sstables, exposing to the user all the power of CQL, to tailor the output as they see fit.
Select everything from a table:
$ scylla sstable query --system-schema /path/to/data/system_schema/keyspaces-*/*-big-Data.db
keyspace_name | durable_writes | replication
-------------------------------+----------------+-------------------------------------------------------------------------------------
system_replicated_keys | true | ({class : org.apache.cassandra.locator.EverywhereStrategy})
system_auth | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 1})
system_schema | true | ({class : org.apache.cassandra.locator.LocalStrategy})
system_distributed | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 3})
system | true | ({class : org.apache.cassandra.locator.LocalStrategy})
ks | true | ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
system_traces | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 2})
system_distributed_everywhere | true | ({class : org.apache.cassandra.locator.EverywhereStrategy})
Select everything from a single SSTable, use the JSON output (filtered through [jq](https://jqlang.github.io/jq/) for better readability):
$ scylla sstable query --system-schema --output-format=json /path/to/data/system_schema/keyspaces-*/me-3gm7_127s_3ndxs28xt4llzxwqz6-big-Data.db | jq
[
{
"keyspace_name": "system_schema",
"durable_writes": true,
"replication": {
"class": "org.apache.cassandra.locator.LocalStrategy"
}
},
{
"keyspace_name": "system",
"durable_writes": true,
"replication": {
"class": "org.apache.cassandra.locator.LocalStrategy"
}
}
]
Select a specific field in a specific partition using the command-line:
$ scylla sstable query --system-schema --query "select replication from scylla_sstable.keyspaces where keyspace_name='ks'" ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
replication
-------------------------------------------------------------------------------------
({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
Select a specific field in a specific partition using ``--query-file``:
$ echo "SELECT replication FROM scylla_sstable.keyspaces WHERE keyspace_name='ks';" > query.cql
$ scylla sstable query --system-schema --query-file=./query.cql ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
replication
-------------------------------------------------------------------------------------
({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
New functionality: no backport needed.
Closesscylladb/scylladb#22007
* github.com:scylladb/scylladb:
docs/operating-scylla: document scylla-sstable query
test/cqlpy/test_tools.py: add tests for scylla-sstable query
test/cqlpy/test_tools.py: make scylla_sstable() return table name also
scylla-sstable: introduce the query command
tools/utils: get_selected_operation(): use std::string for operation_options
utils/rjson: streaming_writer: add RawValue()
cql3/type_json: add to_json_type()
test/lib/cql_test_env: introduce do_with_cql_env_noreentrant_in_thread()
In commit 2463e524ed, Scylla's default changed
from starting with one tablet per shard to starting 10 per shard. The
functional tests don't need more tablets and it can only slow down the
tests, so the patch added --tablets-initial-scale-factor=1 to test/*/suite.yaml
but forgot to add it to test/cqlpy/run.py (to affect test/cqlpy/run) so
this patch does this now.
This patch should *only* be about making tests faster, although to be
honest, I don't see any measurable improvement in test speed (10 isn't
so many). But, unfortunately, this is only part of the story. Over time
we allowed a few cqlpy tests to be written in a way that relies on having
only a small number of tablets or even exactly one tablet per shard (!).
These tests are buggy and should be fixed - see issues #23115 and #23116
as examples. But adding the option --tablets-initial-scale-factor=1 also
to run.py will make these bugs not affect test/cqlpy/run in the same way
as it doesn't affect test.py.
These buggy tests will still break with `pytest cqlpy` against a Scylla
you ran yourself manually, so eventually will still need to fix those
test bugs.
Refs #23115
Refs #23116Closesscylladb/scylladb#23125
The cqlpy test test_compaction.py::test_compactionstats_after_major_compaction
was written to assume we have just one tablet per shard - if there are many
tablets compaction splitting the data, the test scenario might not need
compaction in the way that the test assumes it does.
Recently (commit 2463e524ed) Scylla's default
was changed to have 10 tablets per shard - not one. This broke this test.
The same commit modified test/cqlpy/suite.yaml, but that affects only test.py
and not test/cqlpy/run, and also not manual runs against a manually-installed
Scylla. If this test absolutely requires a keyspace with 1 and not 10
tablets, then it should create one explicitly. So this is what this test
does (but only if tablets are in use; if vnodes are used that's fine
too).
Before this patch,
test/cqlpy/run test_compaction.py::test_compactionstats_after_major_compaction
fails. After the patch, it passes.
Fixes#23116Closesscylladb/scylladb#23121
The currently used versions of "wasmtime", "idna", "cap-std" and
"cap-primitives" packages had low to moderate security issues.
In this patch we update the dependencies to versions with these
issues fixed.
The update was performed by changing the "wasmtime" (and "wasmtime-wasi")
version in rust/wasmtime_bindings/Cargo.toml and updating rust/Cargo.lock
using the "cargo update" command with the affected package. To fix an
issue with different dependencies having different versions of
sub-dependencies, the package "smallvec" was also updated to "1.13.1".
After the dependency update, the Rust code also needed to be updated
because of the slightly changed API. One Wasm test case needed to be
updated, as it was actually using an incorrect Wat module and not
failing before. The crate also no longer allows multiple tables in
Wasm modules by default - it is now enabled by setting the "gc" crate
feature and configuring the Engine with config.wasm_reference_types(true).
Fixes https://github.com/scylladb/scylladb/issues/23127Closesscylladb/scylladb#23128
LIMIT and PER PARTITION LIMIT limit the number of rows returned or taken
into consideration by a query. It makes no logical sense to have this
value at less than 1. Cassandra also has this requirement.
This patch ensures that the limit value is strictly positive and adds
an explicit test for it - it was only tested in a test ported from
Cassandra, that is disabled due to other issues.
Closesscylladb/scylladb#23013
This series achieves two things:
1) changes default number of tablet replicas per shard to be 10 in order to reduce load imbalance between shards
This will result in new tables having at least 10 tablet replicas per
shard by default.
We want this to reduce tablet load imbalance due to differences in
tablet count per shard, where some shards have 1 tablet and some
shards have 2 tablets. With higher tablet count per shard, this
difference-by-one is less relevant.
Fixes https://github.com/scylladb/scylladb/issues/21967
2) introduces a global goal for tablet replica count per shard and adds logic to tablet scheduler to respect it by controlling per-table tablet count
The per-shard goal is enforced by controlling average per-shard tablet replica
count in a given DC, which is controlled by per-table tablet
count. This is effective in respecting the limit on individual shards
as long as tablet replicas are distributed evenly between shards.
There is no attempt to move tablets around in order to enforce limits
on individual shards in case of imbalance between shards.
If the average per-shard tablet count exceeds the limit, all tables
which contribute to it (have replicas in the DC) are scaled down
by the same factor. Due to rounding up to the nearest power of 2,
we may overshoot the per-shard goal by at most a factor of 2.
The scaling is applied after computing desired tablet count due to
all other factors: per-table tablet count hints, defaults, average tablet size.
If different DCs want different scale factors of a given table, the
lowest scale factor is chosen for a given table.
When creating a new table, its tablet count is determined by tablet
scheduler using the scheduler logic, as if the table was already created.
So any scaling due to per-shard tablet count goal is reflected immediately
when creating a table. It may however still take some time for the system
to shrink existing tables. We don't reject requests to create new tables.
Fixes#21458Closesscylladb/scylladb#22522
* github.com:scylladb/scylladb:
config, tablets: Allow tablets_initial_scale_factor to be a fraction
test: tablets_test: Test scaling when creating lots of tables
test: tablets_test: Test tablet count changes on per-table option and config changes
test: tablets_test: Add support for auto-split mode
test: cql_test_env: Expose db config
config: Make tablets_initial_scale_factor live-updateable
tablets: load_balancer: Pick initial_scale_factor from config
tablets, load_balancer: Fix and improve logging of resize decisions
tablets, load_balancer: Log reason for target tablet count
tablets: load_balancer: Move hints processing to tablet scheduler
tablets: load_balancer: Scale down tablet count to respect per-shard tablet count goal
tablets: Use scheduler's make_sizing_plan() to decide about tablet count of a new table
tablets: load_balancer: Determine desired count from size separately from count from options
tablets: load_balancer: Determine resize decision from target tablet count
tablets: load_balancer: Allow splits even if table stats not available
tablets: load_balancer: Extract make_sizing_plan()
tablets: Add formatter for resize_decision::way_type
tablets: load_balancer: Simplify resize_urgency_cmp()
tablets: load_balancer: Keep config items as instance members
locator: network_topology_strategy: Simplify calculate_initial_tablets_from_topology()
tablets: Change the meaning of initial_scale to mean min-avg-tablets-per-shard
tablets: Set default initial tablet count scale to 10
tablets: network_topology_stragy: Coroutinize calculate_initial_tablets_from_topology()
tablets: load_balancer: Extract get_schema_and_rs()
tablets: load_balancer: Drop test_mode
Altering a keyspace (that has tablets enabled) without changing
tablets attributes, i.e. no `AND tablets = {...}` results in incorrect
"Update Keyspace..." log message being printed. The printed log
contains "tablets={"enabled":false}".
Refs https://github.com/scylladb/scylladb/issues/22261Closesscylladb/scylladb#22324
Before this patch we silently allowed and ignored PER PARTITION LIMIT.
SELECT DISTINCT requires all the partition key columns, which means that
setting PER PARTITION LIMIT is redundant - only one result will be
returned from every partition anyway.
Cassandra behaves the same way, so this patch also ensures
compatibility.
Fixesscylladb/scylladb#15109Closesscylladb/scylladb#22950
This will result in new tables having at least 10 tablet replicas per
shard by default.
We want this to reduce tablet load imbalance due to differences in
tablet count per shard, where some shards have 1 tablet and some
shards have 2 tablets. With higher tablet count per shard, this
difference-by-one is less relevant.
Fixes#21967
In some tests, we explicity set the initial scale to 1 as some of the
existing tests assume 1 compaction group per shard.
test.py uses a lower default. Having many tablets per shard slows down
certain topology operations like decommission/replace/removenode,
where the running time is proportional to tablet count, not data size,
because constant cost (latency) of migration dominates. This latency
is due to group0 operations and barriers. This is especially
pronounced in debug mode. Scheduler allows at most 2 migrations per
shard, so this latency becomes a determining factor for decommission
speed.
To avoid this problem in tests, we use lower default for tablet count per
shard, 2 in debug/dev mode and 4 in release mode. Alternatively, we
could compensate by allowing more concurrency when migrating small
tablets, but there's no infrastructure for that yet.
I observed that with 10 tablets per shard, debug-mode
topology_custom.mv/test_mv_topology_change starts to time-out during
removenode (30 s).
Recently, when running Alternator tests we get hundreds of warnings like
the following from basically all test files:
/usr/lib/python3.12/site-packages/botocore/crt/auth.py:59:
DeprecationWarning: datetime.datetime.utcnow() is deprecated and
scheduled for removal in a future version. Use timezone-aware objects
to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
/usr/local/lib/python3.12/site-packages/pytest_elk_reporter.py:299:
DeprecationWarning: datetime.datetime.utcnow() is deprecated and
scheduled for removal in a future version. Use timezone-aware objects
to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
These warnings all come from two libraries that we use in the tests -
botocore is used by Alternator tests, and elk reporter is a plugin that
we don't actually use, but it is installed by dtest and we often see
it in our runs as well. These warnings have zero interest to us - not
only do we not care if botocore uses some deprecated Python APIs and
will need to be updated in the future, all these warnings are hiding
*real* warnings about deprecated things we actually use in our own
test code.
The patch modifies test/pytest.ini (used by all our Python tests,
including but not limited to Alternator tests) to ignore deprecation
warnings from *inside* these two libraries, botocore and elk_reporter.
After this patch, test/alternator/run finishes without any warnings
at all. test/cqlpy does still have a few warnings left, which earlier
were hidden by the thousands of spammy warning eliminated in this patch.
We fix one of these warnings in this patch:
ResultSet indexing support will be removed in 4.0.
Consider using ResultSet.one()
by doing exactly what the warning recommended.
Some deprecation warnings in test/cqlpy remain in calls to
get_query_trace(). The "blame" for these warning is misplaced - this
function is part of the cassandra driver, but Python seems to think it's
part of our test code so I can't avoid them with the pytest.ini trick,
I'm not sure why. So I don't know yet how to eliminate these last warnings.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#22881
This patch adds to the fetch_scylla.py script, used by the "--release"
option of test/{cqlpy,alternator}/run, the ability to download the new
2025.1 releases.
In the new single-stream releases, the number looks like the old
Scylla Enterprise releases, but the location of the artifacts in the
S3 bucket look like the old open-source releases (without the word
"-enterprise" in the paths). So this patch introduces a new "if"
for the (major >= 2025) case.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#22778
Given two sets of equivalent types, return the set
intersection.
This is a generic function which adapts to the actual
input type.
A unit test is added.
Closesscylladb/scylladb#22763
The script fetch_scylla.py is used by the "--release" option of
test/cqlpy/run and test/alternator/run to fetch a given release of
Scylla. The release is fetched from S3, and the script assumed that the
user properly set up $HOME/.aws/config and $HOME/.aws/credentials
to determine the source of that download and the credentials to do this.
But this is unnecessary - Scylla's "downloads.scylladb.com" bucket
actually allows **anonymous** downloads, and this is what we should use.
After this patch, fetch_scylla.py (and the "--release" option of the
run scripts) work correctly even for a user that doesn't have $HOME/.aws
set up at all.
This fix is especially important to new developers, who might not even
have AWS credentials to put into these files.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#22773
test_complex_null_values is currently flaky, causing many failures
in CI. The reason for the failures is unclear, and a fix might not
be simple, so because UDFs are experimental, for now let's skip
this test until the corresponding issue is fixed.
Refs scylladb/scylladb#22799Closesscylladb/scylladb#22818
Oversized materialized view and index names are rejected;
Materialized view names with invalid symbols are rejected.
fixes: #20755Closesscylladb/scylladb#21746
Developers are expected to run new cqlpy tests against Cassandra - to
verify that the new test itself is correct. Usually there is no need
to run the entire cqlpy test suite against Cassandra, but when users do
this, it isn't confidence-inspiring to see hundreds of tests failing.
In this patch I fix many but not all of these failures.
Refs #11690 (which will remain open until we fix all the failures on
Cassandra)
* Fixed the "compact_storage" fixture recently introduced to enable the
deprecated feature in Scylla for the tests. This fixture was broken on
Cassandra and caused all compact-storage related tests to fail
on Cassandra.
* Marked all tests in test_tombstone_limit.py as scylla_only - as they
check the Scylla-only query_tombstone_page_limit configuration option.
* Marked all tests in test_service_level_api.py as scylla_only - as they
check the Scylla-only service levels feature.
* Marked a test specific to the Scylla-only IncrementalCompactionStrategy
as scylla_only. Some tests mix STCS and ICS testing in one test - this
is a mistake and isn't fixed in this patch.
* Various tests in test_tablets.py forgot to use skip_without_tablets
to skip them on Cassandra or older Scylla that doesn't have the
tablets feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
x
Closesscylladb/scylladb#22561
This patch removes expansion of "SELECT *" in DESC MATERIALIZED VIEW.
Instead of explicitly printing each column, DESC command will now just
use SELECT *, if view was created with it. Also, adds a correspodning test.
Fixes#21154Closesscylladb/scylladb#21962
This series extends the table schema with per-table tablet options.
The options are used as hints for initial tablet allocation on table creation and later for resize (split or merge) decisions,
when the table size changes.
* New feature, no backport required
Closesscylladb/scylladb#22090
* github.com:scylladb/scylladb:
tablets: resize_decision: get rid of initial_decision
tablet_allocator: consider tablet options for resize decision
tablet_allocator: load_balancer: table_size_desc: keep target_tablet_size as member
network_topology_strategy: allocate_tablets_for_new_table: consider tablet options
network_topology_strategy: calculate_initial_tablets_from_topology: precalculate shards per dc using for_each_token_owner
network_topology_strategy: calculate_initial_tablets_from_topology: set default rf to 0
cql3: data_dictionary: format keyspace_metadata: print "enabled":true when initial_tablets=0
cql3/create_keyspace_statement: add deprecation warning for initial tablets
test: cqlpy: test_tablets: add tests for per-table tablet options
schema: add per-table tablet options
feature_service: add TABLET_OPTIONS cluster schema feature
It's possible to modify 'memtable_flush_period_in_ms' option only and as
single option, not with any other options together
Refs #20999Fixes#21223Closesscylladb/scylladb#22536
This pull request is an implementation of vector data type similar to one used by Apache Cassandra.
The patch contains:
- implementation of vector_type_impl class
- necessary functionalities similar to other data types
- support for serialization and deserialization of vectors
- support for Lua and JSON format
- valid CQL syntax for `vector<>` type
- `type_parser` support for vectors
- expression adjustments such as:
- add `collection_constructor::style_type::vector`
- rename `collection_constructor::style_type::list` to `collection_constructor::style_type::list_or_vector`
- vector type encoding (for drivers)
- unit tests
- cassandra compatibility tests
- necessary documentation
Co-authored-by: @janpiotrlakomy
Fixes https://github.com/scylladb/scylladb/issues/19455Closesscylladb/scylladb#22488
* github.com:scylladb/scylladb:
docs: add vector type documentation
cassandra_tests: translate tests covering the vector type
type_codec: add vector type encoding
boost/expr_test: add vector expression tests
expression: adjust collection constructor list style
expression: add vector style type
test/boost: add vector type cql_env boost tests
test/boost: add vector type_parser tests
type_parser: support vector type
cql3: add vector type syntax
types: implement vector_type_impl
Test specifying of per-table tablet options on table creation
and alter table.
Also, add a negative test for atempting to use tablet options
with vnodes (that should fail).
And add a basic test for testing tablet options also with
materialized views.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Unlike with vnodes, each tablet is served only by a single
shard, and it is associated with a memtable that, when
flushed, it creates sstables which token-range is confined
to the tablet owning them.
On one hand, this allows for far better agility and elasticity
since migration of tablets between nodes or shards does not
require rewriting most if not all of the sstables, as required
with vnodes (at the cleanup phase).
Having too few tablets might limit performance due not
being served by all shards or by imbalance between shards
caused by quantization. The number of tabelts per table has to be
a power of 2 with the current design, and when divided by the
number of shards, some shards will serve N tablets, while others
may serve N+1, and when N is small N+1/N may be significantly
larger than 1. For example, with N=1, some shards will serve
2 tablet replicas and some will serve only 1, causing an imbalance
of 100%.
Now, simply allocating a lot more tablets for each table may
theoretically address this problem, but practically:
a. Each tablet has memory overhead and having too many tablets
in the system with many tables and many tablets for each of them
may overwhelm the system's and cause out-of-memory errors.
b. Too-small tablets cause a proliferation of small sstables
that are less efficient to acces, have higher metadata overhead
(due to per-sstable overhead), and might exhaust the system's
open file-descriptors limitations.
The options introduced in this change can help the user tune
the system in two ways:
1. Sizing the table to prevent unnecessary tablet splits
and migrations. This can be done when the table is created,
or later on, using ALTER TABLE.
2. Controlling min_per_shard_tablet_count to improve
tablet balancing, for hot tables.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>