In c5668d99, a new source file row_cache.cc was added to the `db` target,
but with an extraneous trailing comma. In CMake's target_sources(),
source files should be space-separated - any comma is interpreted as
part of the filename, causing build failures like:
```
CMake Error at db/CMakeLists.txt:2 (target_sources):
Cannot find source file:
row_cache.cc,
```
Fix the issue by removing the trailing comma.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22754
Add the possibility to run boost and unit tests with pytest
test.py should follow the next paradigm - the ability to run all test cases sequentially by ONE pytest command.
With this paradigm, to have the better performance, we can split this 1 command into 2,3,4,5,100,200... whatever we want
It's a new functionality that does not touch test.py way of executing the boost and unit tests.
It supports the main features of test.py way of execution: automatic discovery of modes, repeats.
There is an additional requirement to execute tests in parallel: pytest-xdist. To install it, execute `pip install pytest-xdist`
To run test with pytest execute `pytest test/boost`. To execute only one file, provide the path filename `pytest test/boost/aggregate_fcts_test.cc` since it's a normal path, autocompletion will work on the terminal. To provide a specific mode, use the next parameter `--mode dev`, if parameter will not be provided pytest will try to use `ninja mode_list` to find out the compiled modes.
Parallel execution controlled by pyest-xdist and the parameter `-n 12`.
The useful command to discover the tests in the file or directory is `pytest --collect-only -q --mode dev test/boost/aggregate_fcts_test.cc`. That will return all test functions in the file. To execute only one function from the test, you can invoke the output from the previous command, but suffix for mode should be skipped, for example output will be `test/boost/aggregate_fcts_test.cc::test_aggregate_avg.dev`, so to execute this specific test function, please use the next command `pytest --mode dev test/boost/aggregate_fcts_test.cc::test_aggregate_avg`
There is a parameter `--repeat` that used to repeat the test case several times in the same way as test.py did.
It's not possible to run both boost and unit tests directories with one command, so we need to provide explicitly which directory should be executed. Like this `pytest --mode dev test/unit` or `pytest --mode dev test/boost`
Fixes: https://github.com/scylladb/qa-tasks/issues/1775Closesscylladb/scylladb#21108
* github.com:scylladb/scylladb:
test.py: Add possibility to run ldap tests from pytest
test.py: Add the possibility to run unit tests from pytest
test.py: Add the possibility to run boost test from pytest
test.py: Add discovery for C++ tests for pytest
test.py: Modify s3 server mock
test.py: Add method to get environment variables from MinIO wrapper
test.py: Move get configured modes to common lib
When upgrading for example from `2024.1` to `2025.1` the package name is
not identical casuing the upgrade command to fail:
```
Command: 'sudo DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade scylla -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"'
Exit code: 100
Stdout:
Selecting previously unselected package scylla.
Preparing to unpack .../6-scylla_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb ...
Unpacking scylla (2025.1.0~dev-0.20250118.1ef2d9d07692-1) ...
Errors were encountered while processing:
/tmp/apt-dpkg-install-JbOMav/0-scylla-conf_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/1-scylla-python3_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/2-scylla-server_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/3-scylla-kernel-conf_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/4-scylla-node-exporter_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/5-scylla-cqlsh_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
Stderr:
E: Sub-process /usr/bin/dpkg returned an error code (1)
```
Adding `Obsoletes` (for rpm) and `Replaces` (for deb)
Fixes: https://github.com/scylladb/scylladb/issues/22420Closesscylladb/scylladb#22457
* tools/python3 8415caf4...3e0b8932 (2):
> reloc: collect package files correctly if the package has an optional dependency
> dist: support smooth upgrade from enterprise to source availalbe
Closesscylladb/scylladb#22517
This series extends the table schema with per-table tablet options.
The options are used as hints for initial tablet allocation on table creation and later for resize (split or merge) decisions,
when the table size changes.
* New feature, no backport required
Closesscylladb/scylladb#22090
* github.com:scylladb/scylladb:
tablets: resize_decision: get rid of initial_decision
tablet_allocator: consider tablet options for resize decision
tablet_allocator: load_balancer: table_size_desc: keep target_tablet_size as member
network_topology_strategy: allocate_tablets_for_new_table: consider tablet options
network_topology_strategy: calculate_initial_tablets_from_topology: precalculate shards per dc using for_each_token_owner
network_topology_strategy: calculate_initial_tablets_from_topology: set default rf to 0
cql3: data_dictionary: format keyspace_metadata: print "enabled":true when initial_tablets=0
cql3/create_keyspace_statement: add deprecation warning for initial tablets
test: cqlpy: test_tablets: add tests for per-table tablet options
schema: add per-table tablet options
feature_service: add TABLET_OPTIONS cluster schema feature
`set_notify_handler()` is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. This latter was not disabling the pre-existing timeout of the permit (if any) and this would lead to premature eviction of the cache entry if the timeout was shorter than TTL (which his typical).
Disable the timeout before setting the TTL to prevent premature eviction.
Fixes: https://github.com/scylladb/scylladb/issues/22629
Backport required to all active releases, they are all affected.
Closesscylladb/scylladb#22701
* github.com:scylladb/scylladb:
reader_concurrency_semaphore: set_notify_handler(): disable timeout
reader_permit: mark check_abort() as const
Add posibility to run ldap tests with pytest.
LDAP server will be created for each worker if xdist will be used.
For one thread one LDAP server will be used for all tests.
Add the possibility to run boost test from pytest.
Boost facade based on code from https://github.com/pytest-dev/pytest-cpp, but enhanced and rewritten to suite better.
Code based on https://github.com/pytest-dev/pytest-cpp. Updated, customized, enhanced to suit current needs.
Modify generate report to not modify the names, since it will break
xdist way of working. Instead modification will be done in post collect
but before executing the tests.
Add the possibility to return environment as a dict to use it later it subprocess created by xdist, without starting another s3 mock server for each thread.
Add method to retrieve MinIO server wrapper environment variables for
later processing.
This change will allow to sharing connection information with other
processes and allow reusing the server across multiple tests.
This patch series contains improvements to our GitHub license header check workflow.
The first patch grants necessary write permissions to the workflow, allowing it to comment directly on pull requests when license header issues are found. This addresses a permissions-related error that previously prevented the workflow from creating comments.
The second patch optimizes the workflow by skipping the license check step when no relevant files have been modified in the pull request. This prevents unnecessary workflow failures that occurred when the check was run without any files to analyze.
Together, these changes make the license header checking process more robust and efficient. The workflow now properly communicates findings through PR comments and avoids running unnecessary checks.
---
no need to backport, as the workflow updated by this change only exists in master.
Closesscylladb/scylladb#22736
* github.com:scylladb/scylladb:
.github: grant write permissions for PR comments in license check workflow
.github: skip license check when no relevant files changed
It's possible to modify 'memtable_flush_period_in_ms' option only and as
single option, not with any other options together
Refs #20999Fixes#21223Closesscylladb/scylladb#22536
Grant write permissions to the check-license-header workflow to enable
commenting on pull requests. This fixes the "Resource not accessible by
integration" HTTP error that occurred when the workflow attempted to
create comments.
The permission is required according to GitHub's API documentation for
creating issue comments.
see also https://docs.github.com/en/rest/issues/comments?apiVersion=2022-11-28#create-an-issue-comment
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Skip the license header check step in `check-license-header.yaml` workflow
when no files with configured extensions were changed in the pull request.
Previously, the workflow would fail in this case since the --files
argument requires at least one file path:
```
check-license.py: error: argument --files: expected at least one argument
```
Add `if` condition to only run the check when steps.changed-files.outputs.files
is not empty.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Until now this action checked if we have a `backport/none` or `backport/x.y` label only, since we moved to the source available and the releases like 2025.1 don't match this regex this action keeps failing
Closesscylladb/scylladb#22734
set_notify_handler() is called after a querier was inserted into the
querier cache. It has two purposes: set a callback for eviction and set
a TTL for the cache entry. This latter was not disabling the
pre-existing timeout of the permit (if any) and this would lead to
premature eviction of the cache entry if the timeout was shorter than
TTL (which his typical).
Disable the timeout before setting the TTL to prevent premature
eviction.
Fixes: #scylladb/scylladb#22629
This patch addresses an issue where the buffer offset becomes incorrect when a request is retried. The new request uses an offset that has already been advanced, causing misalignment. This fix ensures the buffer offset is correctly reset, preventing such errors.
Closesscylladb/scylladb#22729
Before this change, it was ensured that a default superuser is created
before serving CQL. However, the mechanism didn't wait for default
password initialization, so effectively, for a short period, customer
couldn't authenticate as the superuser properily. The purpose of this
change is to improve the superuser initialization mechanism to wait for
superuser default password, just as for the superuser creation.
This change:
- Introduce authenticator::ensure_superuser_is_created() to allow
waiting for complete initialization of super user authentication
- Implement ensure_superuser_is_created in password_authenticator, so
waiting for superuser password initialization is possible
- Implement ensure_superuser_is_create in transitional_authenticator,
so the implementation from password_authenticator is used
- Implement no-op ensure_superuser_is_create for other authenticators
- Extend service::ensure_superuser_is_created to wait for superuser
initialization in authenticator, just as it was implemented earlier
for role_manager
- Add injected error (sleep) in password_authenticator::start to
reproduce a case of delayed password creation
- Implement test_delayed_deafult_password to verify the correctness of the fix
- Ensure superuser is created in single_node_cql_env::run_in_thread to
make single_node_cql more similar to scylla_main in main.cc
Fixesscylladb/scylladb#20566
Backport not needed - a minor bugfix
Closesscylladb/scylladb#22532
* github.com:scylladb/scylladb:
test: implement test_auth_password_ensured
test: implement connect_driver argument in ManagerClient::server_add
auth: ensure default superuser password is set before serving CQL
auth: added password_authenticator_start_pause injected error
This pull request is an implementation of vector data type similar to one used by Apache Cassandra.
The patch contains:
- implementation of vector_type_impl class
- necessary functionalities similar to other data types
- support for serialization and deserialization of vectors
- support for Lua and JSON format
- valid CQL syntax for `vector<>` type
- `type_parser` support for vectors
- expression adjustments such as:
- add `collection_constructor::style_type::vector`
- rename `collection_constructor::style_type::list` to `collection_constructor::style_type::list_or_vector`
- vector type encoding (for drivers)
- unit tests
- cassandra compatibility tests
- necessary documentation
Co-authored-by: @janpiotrlakomy
Fixes https://github.com/scylladb/scylladb/issues/19455Closesscylladb/scylladb#22488
* github.com:scylladb/scylladb:
docs: add vector type documentation
cassandra_tests: translate tests covering the vector type
type_codec: add vector type encoding
boost/expr_test: add vector expression tests
expression: adjust collection constructor list style
expression: add vector style type
test/boost: add vector type cql_env boost tests
test/boost: add vector type_parser tests
type_parser: support vector type
cql3: add vector type syntax
types: implement vector_type_impl
Do not merge tablets if that would drop the tablet_count
below the minimum provided by hints.
Split tablets if the current tablet_count is less than
the minimum tablet count calculated using the table's tablet options.
TODO: override min_tablet_count if the tablet count per shard
is greater than the maximum allowed. In this case
the tables tablet counts should be scaled down proportionally.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In column_family.cc and storage_service.cc there exist a bunch of helpers that parse and/or validate ks/cf names, and different endpoints use different combinations of those, duplicating the functionality of each other and generating some mess. This PR cleans the endpoints from column_family.cc that parse and validate fully qualified table name (the '$ks:$cf' string).
A visible "improvement" is that `validate_table()` helper usage in the api/ directory is narrowed down to storage_service.cc file only (with the intent to remove that helper completely), and the aforementioned `for_tables_on_all_shards()` helper becomes shorter and tiny bit faster, because it doesn't perform some re-lookups of tables, that had been performed by validation sanity checks before it.
There's more to be done in those helpers, this PR wraps only one part of this mess.
Below is the list of endpoints this PR affects and the tests that validate the changes:
|endpoint|test|
|-|-|
|column_family/autocompaction|rest_api/test_column_family::test_column_family_auto_compaction_table|
|column_family/tombstone_gc|rest_api/test_column_family::test_column_family_tombstone_gc_api|
|column_family/compaction_strategy|rest_api/test_column_family/test_column_family_compaction_strategy|
|compaction_manager/stop_keyspace_compaction/|rest_api/test_compaction_manager::{test_compaction_manager_stop_keyspace_compaction,test_compaction_manager_stop_keyspace_compaction_tables}|
Closesscylladb/scylladb#21533
* github.com:scylladb/scylladb:
api: Hide parse_tables() helper
api: Use parse_table_infos() in stop_keyspace_compaction handler
api: Re-use parse_table_info() in column_family API
api: Make get_uuid() return table_info (and rename)
api: Remove keyspace argument from for_table_on_all_shards()
api: Switch for_table_on_all_shards() to use table_info-s
api: Hide validate_table() helper
api: Tables vector is never empty now in for_table_on_all_shards()
api: Move vectors of tables, not copy
api: Add table validation to set_compaction_strategy_class endpoint
api: Use get_uuid() to validate_table() in column family API
api: Use parse_table_infos() in column family API
these unused includes were identified by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.
also, took this opportunity to remove an unused namespace alias. and
add an include which is used actually. please note,
`std::ranges::pop_heap()` and friends are actually provided by
`<algorithm>` not `<ranges>`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22716
Before fix of scylladb#20566, CQL was served irrespectively of default
superuser password creation, which led to an incorrect product behavior
and sporadic test failures. This test verifies race condition of serving
CQL and creating default superuser password. Injected failure is used to
ensure CQL use is attempted before default superuser password creation,
however, the attempt is expected to fail because scylladb#20566 is
fixed. Following that, the injected error is notified, so CQL driver can
be started correctly. Finally, CREATE USER query is executed to confirm
successful superuser authentication.
This change:
- Implement test_auth_password_ensured.py
The test starts a server without expecting CQL serving, because
expected_server_up_state=ServerUpState.HOST_ID_QUERIED and
connect_driver=False. Error password_authenticator_start_pause is
injected to block superuser password setup during server startup.
Next, the test waits for a log to confirm that the code implementing
injected error is reached. When the server startup procedure is
unfinished, some operations might not complete on a first try, so
waiting for driver connection is wrapped in repeat_if_host_unavailable.
This commit introduces connect_driver argument in
ManagerClient::server_add. The argument allow skipping CQL driver
initialization part during server start. Starting a server without
the driver is necessary to implement some test scenarios related
to system initialization.
After stopping a server, ManagerClient::server_start can be used to
start the server again, so connect_driver argument is also added here to
allow preventing connecting the driver after a server restart.
This change:
- Implement connect_driver argument in ManagerClient::server_add
- Implement connect_driver argument in ManagerClient::server_start
Before this change, it was ensured that a default superuser is created
before serving CQL. However, the mechanism didn't wait for default
password initialization, so effectively, for a short period, customer
couldn't authenticate as the superuser properily. The purpose of this
change is to improve the superuser initialization mechanism to wait for
superuser default password, just as for the superuser creation.
This change:
- Introduce authenticator::ensure_superuser_is_created() to allow
waiting for complete initialization of super user authentication
- Implement ensure_superuser_is_created in password_authenticator, so
waiting for superuser password initialization is possible
- Implement ensure_superuser_is_create in transitional_authenticator,
so the implementation from password_authenticator is used
- Implement no-op ensure_superuser_is_create for other authenticators
- Modify service::ensure_superuser_is_created to wait for superuser
initialization in authenticator, just as it was implemented earlier
for role_manager
Fixesscylladb/scylladb#20566
This change:
- Implement password_authenticator_start_pause injected error to allow
deterministic blocking of default superuser password creation
This change facilitates manual testing of system behavior when default
superuser password is being initialized. Moreover, this mechanism will
be used in next commits to implement a test to verify a fix for
erroneous CQL serving before default superuser password creation.
this workflow checks the first 10 lines for
"LicenseRef-ScyllaDB-Source-Available-1.0" in newly introduced files
when a new pull request is created against "master" or "next".
if "LicenseRef-ScyllaDB-Source-Available-1.0" is not found, the
workflow fails. for the sake of simplicity, instead of parsing the
header for SPDX License ID, we just check to see if the
"LicenseRef-ScyllaDB-Source-Available-1.0" is included.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22065
Before this change, it was possible to change non-liveupdatable config
parameter without process restart. This erroneous behavior not only
contradicts the documentation but is potentially dangerous, as various
components theoretically might not be prepared for a change of
configuration parameter value without a restart. The issue came from
a fact that liveupdatability verification check was skipped for default
configuration parameters (those without its initial values
in configuration file during process start).
This change:
- Introduce _initialization_completed member in config_file
- Set _initialization_completed=true when config file is processed on
server start
- Verify config_file's initialization status during config update - if
config_file was initialized, prevent from further changes of
non-liveupdatable parameters
- Implement ScyllaRESTAPIClient::get_config() that obtains a current
value of given configuration parameter via /v2/config REST API
- Implement test to confirm that only liveupdatable parameters are
changed when SIGHUP is sent after configuration file change
Function set_initialization_completed() is called only once in main.cc,
and the effect is expected to be visible in all shards, as a side effect
of cfg->broadcast_to_all_shards() that is called shortly after. The same
technique was already used for enable_3_1_0_compatibility_mode() call.
Fixesscylladb/scylladb#5382
No backport - minor fix.
Closesscylladb/scylladb#22655
* github.com:scylladb/scylladb:
test: SIGHUP doesn't change non-liveupdatable configuration
test: implement ScyllaRESTAPIClient::get_config()
config: prevent SIGHUP from changing non-liveupdatable parameters
config: remove unused set_value_on_all_shards(const YAML::Node&)
This update introduces four types of credential providers:
1. Environment variables
2. Configuration file
3. AWS STS
4. EC2 Metadata service
The first two providers should only be used for testing and local runs. **They must NEVER be used in production.**
The last two providers are intended for use on real EC2 instances:
- **AWS STS**: Preferred method for obtaining temporary credentials using IAM roles.
- **EC2 Metadata Service**: Should be used as a last resort.
Additionally, a simple credentials provider chain is created. It queries each provider sequentially until valid credentials are obtained. If all providers fail, it returns an empty result.
fixes: #21828Closesscylladb/scylladb#21830
* github.com:scylladb/scylladb:
docs: update the `object_storage.md` and `admin.rst`
aws creds: add STS and Instance Metadata service credentials providers
aws creds: add env. and file credentials providers
s3 creds: move credentials out of endpoint config
Rather than target_max_tablet_size. We need both the target
as well as max and min tablet sizes, so there is no sense
in keeping the max and deriving the target and the minimum
for the max value.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Use the keyspace initial_tablets for min_tablet_count, if the latter
isn't set, then take the maximum of the option-based tablet counts:
- min_tablet_count
- and expected_data_size_in_gb / target_tablet_size
- min_per_shard_tablet_count (via
calculate_initial_tablets_from_topology)
If none of the hints produce a positive tablet_count,
fall back to calculate_initial_tablets_from_topology * initial_scale.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Current implementation is inefficient as it calls
get_datacenter_token_owners_ips and then find_node(ep)
while for_each_node easily provides a host_id for
is_normal_token_owner.
Then, since we're interested only in datacenters
configure with a replication factor (but it still might be 0),
simply iterate over the dc->rf map.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, if a datacenter has no replication_factor option
we consider its replication factor to be 1 in
calculate_initial_tablets_from_topology, but since
we're not going to have any replica on it, it should
be 0.
This is very minor since in the worst case, it
will pessimize the calculation and calculate a value
for initial_tablets that's higher than it could be.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Keyspace `initial` tablets option is deprecated
and may be removed in the future.
Rather than relying on `initial`:0 to always enabled
tablets, explicitly print "enabled":true when tablets
are enabled and initial_tablets=0, same as keyspace_metadata::describe.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Per-table hints should be used instead.
Note: the warning is produced by check_against_restricted_replication_strategies
which is called also from alter_keyspace_statement.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Test specifying of per-table tablet options on table creation
and alter table.
Also, add a negative test for atempting to use tablet options
with vnodes (that should fail).
And add a basic test for testing tablet options also with
materialized views.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Unlike with vnodes, each tablet is served only by a single
shard, and it is associated with a memtable that, when
flushed, it creates sstables which token-range is confined
to the tablet owning them.
On one hand, this allows for far better agility and elasticity
since migration of tablets between nodes or shards does not
require rewriting most if not all of the sstables, as required
with vnodes (at the cleanup phase).
Having too few tablets might limit performance due not
being served by all shards or by imbalance between shards
caused by quantization. The number of tabelts per table has to be
a power of 2 with the current design, and when divided by the
number of shards, some shards will serve N tablets, while others
may serve N+1, and when N is small N+1/N may be significantly
larger than 1. For example, with N=1, some shards will serve
2 tablet replicas and some will serve only 1, causing an imbalance
of 100%.
Now, simply allocating a lot more tablets for each table may
theoretically address this problem, but practically:
a. Each tablet has memory overhead and having too many tablets
in the system with many tables and many tablets for each of them
may overwhelm the system's and cause out-of-memory errors.
b. Too-small tablets cause a proliferation of small sstables
that are less efficient to acces, have higher metadata overhead
(due to per-sstable overhead), and might exhaust the system's
open file-descriptors limitations.
The options introduced in this change can help the user tune
the system in two ways:
1. Sizing the table to prevent unnecessary tablet splits
and migrations. This can be done when the table is created,
or later on, using ALTER TABLE.
2. Controlling min_per_shard_tablet_count to improve
tablet balancing, for hot tables.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For example, nodes which are being decommissioned should not be
consider as available capacity for new tables. We don't allocate
tablets on such nodes.
Would result in higher per-shard load then planned.
Closesscylladb/scylladb#22657
in order to reduce the external header dependency, let's switch to
the standardlized std::ranges::min_element().
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22572
This config item controls how many CPU-bound reads are allowed to run in
parallel. The effective concurrency of a single CPU core is 1, so
allowing more than one CPU-bound reads to run concurrently will just
result in time-sharing and both reads having higher latency.
However, restricting concurrency to 1 means that a CPU bound read that
takes a lot of time to complete can block other quick reads while it is
running. Increase this default setting to 2 as a compromise between not
over-using time-sharing, while not allowing such slow reads to block the
queue behind them.
Fixes: #22450Closesscylladb/scylladb#22679