We added this filter after detecting a bug in the Raft-based
topology. We weren't sending `barrier_and_drain` commands to a
decommissioning node that could still be coordinating requests.
It could cause stale topology exceptions on replicas if the
decommissioning node sent a request with an old topology version
after normal nodes received the new fence version.
This bug has been fixed in the previous commit, so we remove the
filter.
Loading schemas of views and indexes was not supported, with either `--schema-file`, or when loading schema from schema sstables.
This PR addresses both:
* When loading schema from CQL (file), `CREATE MATERIALIZED VIEW` and `CREATE INDEX` statements are now also processed correctly.
* When loading schema from schema tables, `system_schema.views` is also processed, when the table has no corresponding entry in `system_schema.tables`.
Tests are also added.
Fixes: #16492Closesscylladb/scylladb#16517
* github.com:scylladb/scylladb:
test/cql-pytest: test_tools.py: add schema-loading tests for MV/SI
test/cql-pytest: test_tools.py: extract some fixture logic to functions
test/cql-pytest: test_tools.py: extract common schema-loading facilities into base-class
tools/schema_loader: load_schema_from_schema_tables(): add support for MV/SI schemas
tools/schema_loader: load_one_schema_from_file(): add support for view/index schemas
test/boost/schema_loader_test: add test for mvs and indexes
tools/schema_loader: load_schemas(): implement parsing views/indexes from CQL
replica/database: extract existing_index_names and get_available_index_name
tools/schema_loader: make real_db.tables the only source of truth on existing tables
tools/schema_loader: table(): store const keyspace&
tools/schema_loader: make database,keyspace,table non-movable
cql3/statements/create_index_statement: build_index_schema(): include index metadata in returned value
cql3/statements/create_index_statement: make build_index_schema() public
cql3/statements/create_index_statement: relax some method's dependence on qp
cql3/statements/create_view_statement: make prepare_view() public
Alternator incorrectly refuses an empty tag value for TagResource, but DynamoDB does allow this case and it's useful (note that an empty tag key is rightly forbidden). So this short series fixes this case, and adds additional tests for TagResource which covers this case and other cases we forgot to cover in tests.
Fixes#16904.
Closesscylladb/scylladb#16910
* github.com:scylladb/scylladb:
test/alternator: add more tests for TagResource
alternator: allow empty tag value
There are currently two options how to "request" the number of initial tables for a table
1. specify it explicitly when creating a keyspace
2. let scylla calculate it on its own
Both are not very nice. The former doesn't take cluster layout into consideration. The latter does, but starts with one tablet per shard, which can be too low if the amount of data grows rapidly.
Here's a (maybe temporary) proposal to facilitate at least perf tests -- the --tablets-initial-scale-factor option that enhances the option number two above by multiplying the calculated number of tablets by the configured number. This is what we currently do to run perf tests by patching scylla, with the option it going to be more convenient.
Closesscylladb/scylladb#16919
* github.com:scylladb/scylladb:
config: Add --tablets-initial-scale-factor
tablet_allocator: Add initial tablets scale to config
tablet_allocator: Add config
Issue #16904 discovered that Alternator refuses to allow an empty tag
value while it's useful (and DynamoDB allows it). This brought to my
attention that our test coverage of the TagResource operation was lacking.
So this patch adds more tests for some corner cases of TagResource which
we missed, including the allowed lengths of tag keys and values.
These tests reproduce #16904 (the case of empty tag value) and also #16908
(allowing and correctly counting unicode letters), and also add
regression testing to cases which we already handled correctly.
As usual, all the new tests also pass on DynamoDB.
Refs #16904
Refs #16908
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This PR fixes test_tablet_missing_data_repair and enable the test again.
If a node is not UP yet, repair in the test will be a partial repair. The partial repair will not repair all the data which cause the check of rows after repair to fail. Check nodes see each other as UP before repair.
Closesscylladb/scylladb#16930
* github.com:scylladb/scylladb:
test: Enable test_tablet_missing_data_repair again
test: Wait for nodes to be up when repair
test: Check repair status in ScyllaRESTAPIClient
When allocating tablets for table for the frist time their initial count
is calculated so that each shard in a cluster gets one tablet. It may
happen that more than one initial tablet per shard is better, e.g. perf
tests typically rely on that.
It's possible to specify the initial tablets count when creating a
keyspace, this number doesn't take the cluster topology into
consideration and may also be not very nice.
As a temporary solution (e.g. for perf tests) we may add a configurable
that scales the initial number of calculated tablets by some factor
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Tablet allocator is a sharded service, that starts in main, it's worth
equipping it with a config. Next patches will fill it with some payload
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The name of the keyspace being part of the partition key is not useful,
the table_id already uniquely identifies the table. The keyspace name
being part of the key, means that code wanting to interact with this
table, often has to resolve the table id, just to be able to provide the
keyspace name. This is counter productive, so make the keyspace_name
just a static column instead, just like table_name already is.
Fixes: #16377Closesscylladb/scylladb#16881
Before the patch we called `gossiper.remove_endpoint` for IP-s of the
left nodes. The problem is that in replace-with-same-ip scenario we
called `gossiper.remove_endpoint` for IP which is used by the new,
replacing node. The `gossiper.remove_endpoint` method puts the IP into
quarantine, which means gossiper will ignore all events about this IP
for `quarantine_delay` (one minute by default). If we immediately
replace just replaced node with the same IP again, the bootstrap will
fail since the gossiper events are blocked for this IP, and we won't be
able to resolve an IP for the new host_id.
Another problem was that we called gossiper.remove_endpoint method,
which doesn't remove an endpoint from `_endpoint_state_map`, only from
live and unreachable lists. This means the IP will keep circulating in
the gossiper message exchange between cluster nodes until full cluster
restart.
This patch fixes both of these problems. First, we rely on the fact that
when topology coordinator moves the `being_replaced` node to the left
state, the IP of the `replacing` node is known to all nodes. This means
before removing an IP from the gossiper we can check if this IP is
currently used by another node in the current raft topology. This is
done by constructing the `used_ips` map based on normal and transition
nodes. This map is cached to avoid quadratic behaviour.
Second, we call `gossiper.force_remove_endpoint`, not
`gossiper.remove_endpoint`. This function removes and IP from
`_endpoint_state_map`, as well as from live and unreachable lists.
Closesscylladb/scylladb#16820
* github.com:scylladb/scylladb:
get_peer_info_for_update: update only required fields in raft topology mode
get_peer_info_for_update: introduce set_field lambda
storage_service::on_change: fix indent
storage_service::on_change: skip handle_state functions in raft topology mode
test_replace_different_ip: check old IP is removed from gossiper
test_replace: check two replace with same IP one after another
storage_service: sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes
Skip test_tablet_missing_data_repair, it is failing a lot breaking
promotion and CI. Can't revert because the PR introducing it was already
piled on. So disable while investigated.
Refs: #16859Closesscylladb/scylladb#16879
Standard containers don't have constructors that take ranges;
instead people use boost::copy_range or C++23 std::ranges::to.
Make the API more uniform by removing this special constructor.
The only caller, in a test, is adjusted.
Closesscylladb/scylladb#16905
Running test/cql-pytest/run now defaults to enabling the "tablets"
experimental feature when running Scylla - and tests detect this and
use this feature as appropriate. This is the correct default going
forward, but in the short term it would be nice to also have an
option to easily do a manual test run *without* tablets.
So this patch adds a "--vnodes" option to the test/cql-pytest/run
script. This option causes "run" to run Scylla without enabling the
"tablets" experimental feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16896
Issue #16392 describes a bug where when a base table is altered, it's
materialized views prepared statements are not invalidated which in turn
causes them to return missing data.
This test reproduces this bug and serves as a regression test for this
problem.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
In this commit we modify the existing
test_replace_different_ip. We add the check that the old
IP is not contained in alive or down lists, which
means it's completely wiped from gossiper. This test is failing
without the force_remove_endpoint fix from
a previous commit. We also check that the state of
local system.peers table is correct.
This mini-set includes code coverage support for ScyllaDB, it provides:
1. Support for building ScyllaDB with coverage support.
2. Utilities for processing coverage profiling data
3. test.py support for generation and processing of coverage profiling into an lcov trace files which can later be used to produce HTML or textual coverage reports.
Refs #16323Closesscylladb/scylladb#16784
* github.com:scylladb/scylladb:
Add code coverage documentation
test.py: support code coverage
code coverage: Add libraries for coverage handling
test.py: support --coverage and --coverage-mode
configure.py support coverage profiles on standrad build modes
Currently if topology coordinator gets stuck in a CI test run it's hard to debug this (e.g. scylladb/scylladb#16708). We can add a lot of logging inside topology coordinator code to aid debugging, without spamming the logs -- these are relatively rare control plane events.
Closesscylladb/scylladb#16749
* github.com:scylladb/scylladb:
test/pylib: scylla_cluster: enable raft_topology=debug level by default
raft topology: increase level of some TRACE messages
raft topology: log when entering transition states
raft topology: don't include null ID in exclude_nodes
raft topology: INFO log when executing global commands and updating topology state
storage_service: separate logger for raft topology
For tests that cover functionality, which doesn't yet work with tablets.
These tests and the respective functionality they test, are expected to
be fixed soon, and then these fixtures will be removed.
When tablets are enabled on a keyspace, they cannot be altered to simple
replication strategy anymore.
These keyspaces are testing exactly that, so disable tablets on the
initial keyspace create statements.
This test expects a keyspace with local storage option, to not have a
row in system_schema.scylla_keyspace. With tablets enabled by default,
this won't be the case. Adjust the test to check for the specific
storage-related columns instead.
Recently, in commit 49026dc319, the
way to choose the number of tablets in a new keyspace changed.
This broke the test we had for a memory leak when many tablets were
used, which saw the old syntax wasn't recognized and assumed Scylla
is running without tablet support - so the test was skipped.
Let's fix the syntax. After this patch the test passes if the tablets
experimental feature is enabled, and only skipped if it isn't.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add keyspace_has_tablets() utility function, which, given a keyspace,
returns whether it is using tablets or not.
In addition, 3 new fixtures are added:
* has_tablets - does scylla has tablets by default?
* xfail_tablets - the test is marked xfail, when tablets are enabled by
default.
* skip_with_tablets - the test is skipped when tablets are enabled by
default, because it might crash with tablets.
We expect the latter two to be removed soon(ish), as we make all test,
and the functionality they test work with tablets.
This is a test case for the problem, described in the
previous commit. Before that fix the second replace
failed since it couldn't resolve an IP for the new host_id.
This short series prevents the creation of compaction tasks when we know in advance that they have nothing to do.
This is possible in the clean path by:
- improve the detection of candidates for cleanup by skipping sstables that require cleanup but are already being compacted
- checking that list of sstables selected for cleanup isn't empty before creating the cleanup task
For upgrade sstables, and generally when rewriting all sstable: launch the task only if the list off candidate sstables isn't empty.
For regular compaction, when triggered via `table::add_sstable_and_update_cache`, we currently trigger compaction (by calling `submit`) on all compaction groups while the sstable is added only to one of them.
Also, it is typically called for maintenance sstables that are awaiting offstrategy compaction, in which case we can skip calling `submit` entirely since the caller triggers offstrategy compaction at a later stage.
Refs scylladb/scylladb#15673
Refs scylladb/scylladb#16694Fixesscylladb/scylladb#16803Closesscylladb/scylladb#16808
* github.com:scylladb/scylladb:
table: add_sstable_and_update_cache: trigger compaction only in compaction group
compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
Allows selectively enabling higher logging levels for just raft-topology
related things, without doing it for the entire storage_service (which
includes things like gossiper callbacks).
Also gets rid of the redundant "raft topology:" prefix which was also
not included everywhere.
test.py already support the routing of coverage data into a
predetermined folder under the `tmpdir` logs folder. This patch extends
on that and leverage the code coverage processing libraries to produce
test coverage lcov files and a coverage summary at the end of the run.
The reason for not generating the full report (which can be achieved
with a one liner through the `coverage_utils.py` cli) is that it is
assumed that unit testing is not necessarily the "last stop" in the
testing process and it might need to be joined with other coverage
information that is created at other testing stages (for example dtest).
The result of this patch is that when running test.py with one of the
coverage options (`--coverage` / `--mode-coverage`) it will perform
another step of processing and aggregating the profiling information
created.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Coverage handling is divided into 3 steps:
1. Generation of profiling data from a run of an instrumented file
(which this patch doesn't cover)
2. Processing of profiling data, which involves indexing the profile and
producing the data in some format that can be manipulated and
unified.
3. Generate some reporting based on this data.
The following patch is aiming to deal with the last two steps by providing a
cli and a library for this end.
This patch adds two libraries:
1. `coverage_utils.py` which is a library for manipulating coverage
data, it also contains a cli for the (assumed) most common operations
that are needed in order to eventually generate coverage reporting.
2. `lcov_utils.py` - which is a library to deal with lcov format data,
which is a textual form containing a source dependant coverage data.
An example of such manipulation can be `coverage diff` operation
which produces a set like difference operation. cov_a - cov_b = diff
where diff is an lcov formated file containing coverage data for code
cov_a that is not covered at all in cov_b.
The libraries and cli main goal is to provide a unified way to handle
coverage data in a way that can be easily scriptable and extensible.
This will pave the way for automating the coverage reporting and
processing in test.py and in jenkins piplines (for example to also
process dtest or sct coverage reporting)
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
We aim to support code coverage reporting as part of our development
process, to this end, we will need the ability to "route" the dumped
profiles from scylla and unit test to a predetermined location.
We can consider profile data as logged data that should persist after
tests have been run.
For this we add two supported options to test.py:
--coverage - which means that all suits on all modes will participate in
coverage.
--coverage-mode - which can be used to "turn on" coverage support only
for some of the modes in this run.
The strategy chosen is to save the profile data in
`tmpdir`/mode/coverage/%m.profraw (ref:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program)
This means that for every suite the profiling data of each object is
going to be merged into the same file (llvm claims to lock the file so
concurrency is fine).
More resolution than the suite level seems to not give us anything
useful (at least not at the moment). Moreover, it can also be achieved
by running a single test.
Data in the suite level will help us to detect suits that don't generate
coverage data at all and to fix this or to skip generating the profiles
for them.
Also added support of 'coverage' parameter in the `suite.yaml` file,
which can be used to disable coverage for a specific suite, this
parameter defaults to True but if a suite is known to not generate
profiles or the suite profile data is not needed or obfuscate the result
it can be set to false in order to cancel profiles routing and
processing for this suite.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
The test verifies repair brings the missing rows to the owner.
- Shutdown part of the nodes in the cluster
- Insert data
- Start all nodees
- Run repair
- Shutdown part of the nodes
- Check all data is present
Service level controller updates itself in interval. However the interval time is hardcoded in main to 10 seconds and it leads to long sleeps in some of the tests.
This patch moves this value to `service_levels_interval_ms` command line option and sets this value to 0.5s in cql-pytest.
Closesscylladb/scylladb#16394
* github.com:scylladb/scylladb:
test:cql-pytest: change service levels intervals in tests
configure service levels interval
Refs #16757
Allows waiting for all previous and pending segment deletes to finish.
Useful if a caller of `discard_completed_segments` (i.e. a memtable
flush target) not only wants to ensure segments are clean and released,
but thoroughly deleted/recycled, and hence no treat to resurrecting
data on crash+restart.
Test included.
Closesscylladb/scylladb#16801
To enable tablets replication one needs to turn on the (experimental) feature and specify the `initial_tablets: N` option when creating a keyspace. We want tablets to become default in the future and allow users to explicitly opt it out if they want to.
This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is
* `CREATE KEYSPACE ...` will turn tablets on or off based on cluster feature being enabled/disabled
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }` will turn tablets off regardless of what
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }` will try to enable tablets with default configuration
* `CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }` is now the replacement for `REPLICATION = { ... 'initial_tablets': <int> }` thing
fixes: #16319Closesscylladb/scylladb#16364
* github.com:scylladb/scylladb:
code: Enable tablets if cluster feature is enabled
test: Turn off tablets feature by default
test: Move test_tablet_drain_failure_during_decommission to another suite
test/tablets: Enable tables for real on test keyspace
test/tablets: Make timestamp local
cql3: Add feature service to as_ks_metadata_update()
cql3: Add feature service to ks_prop_defs::as_ks_metadata()
cql3: Add feature service to get_keyspace_metadata()
cql: Add tablets on/off switch to CREATE KEYSPACE
cql: Move initial_tablets from REPLICATION to TABLETS in DDL
network_topology_strategy: Estimate initial_tablets if 0 is set
This change is intended to remove the dependency to
operator<<(std::ostream&, const std::unordered_set<seastar::sstring>&)
from test/boost/cql_auth_query_test.cc.
It prepares the test for removal of the templated helpers.
Such removal is one of goals of the referenced issue that is linked below.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#16758
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we
* define a formatter for `auth::resource` and friends,
* update their callers of `operator<<` to use `fmt::print()`.
* drop `operator<<`, as they are not used anymore.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16765
because the CMake-generated build.ninja is located under build/,
and it puts the `scylla` executable at build/$CMAKE_BUILD_TYPE/scylla,
instead of at build/$scylla_build_mode/scylla, so let's adapt to this
change accordingly.
we will promote this change to a shared place if we have similar
needs in other tests as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16775
If the TABLETS map is missing in the CREATE KEYSPACE statement the
tablets are anyway enabled if the respective cluster feature is enabled.
To opt-out keyspaces one may use TABLETS = { 'enabled': false } syntax.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>