The scylla-sstable dump-* command suite has proven invaluable in many investigations. In certain cases however, I found that `dump-data` is quite cumbersome. An example would be trying to find certain values in an sstable, or trying to read the content of system tables when a node is down. For these cases, `dump-data` is very cumbersome: one has to trudge through tons of uninteresting metadata and do compaction in their heads. This PR introduces the new scylla-sstable query command, specifically targeted at situations like this: it allows executing queries on sstables, exposing to the user all the power of CQL, to tailor the output as they see fit.
Select everything from a table:
$ scylla sstable query --system-schema /path/to/data/system_schema/keyspaces-*/*-big-Data.db
keyspace_name | durable_writes | replication
-------------------------------+----------------+-------------------------------------------------------------------------------------
system_replicated_keys | true | ({class : org.apache.cassandra.locator.EverywhereStrategy})
system_auth | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 1})
system_schema | true | ({class : org.apache.cassandra.locator.LocalStrategy})
system_distributed | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 3})
system | true | ({class : org.apache.cassandra.locator.LocalStrategy})
ks | true | ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
system_traces | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 2})
system_distributed_everywhere | true | ({class : org.apache.cassandra.locator.EverywhereStrategy})
Select everything from a single SSTable, use the JSON output (filtered through [jq](https://jqlang.github.io/jq/) for better readability):
$ scylla sstable query --system-schema --output-format=json /path/to/data/system_schema/keyspaces-*/me-3gm7_127s_3ndxs28xt4llzxwqz6-big-Data.db | jq
[
{
"keyspace_name": "system_schema",
"durable_writes": true,
"replication": {
"class": "org.apache.cassandra.locator.LocalStrategy"
}
},
{
"keyspace_name": "system",
"durable_writes": true,
"replication": {
"class": "org.apache.cassandra.locator.LocalStrategy"
}
}
]
Select a specific field in a specific partition using the command-line:
$ scylla sstable query --system-schema --query "select replication from scylla_sstable.keyspaces where keyspace_name='ks'" ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
replication
-------------------------------------------------------------------------------------
({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
Select a specific field in a specific partition using ``--query-file``:
$ echo "SELECT replication FROM scylla_sstable.keyspaces WHERE keyspace_name='ks';" > query.cql
$ scylla sstable query --system-schema --query-file=./query.cql ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
replication
-------------------------------------------------------------------------------------
({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
New functionality: no backport needed.
Closesscylladb/scylladb#22007
* github.com:scylladb/scylladb:
docs/operating-scylla: document scylla-sstable query
test/cqlpy/test_tools.py: add tests for scylla-sstable query
test/cqlpy/test_tools.py: make scylla_sstable() return table name also
scylla-sstable: introduce the query command
tools/utils: get_selected_operation(): use std::string for operation_options
utils/rjson: streaming_writer: add RawValue()
cql3/type_json: add to_json_type()
test/lib/cql_test_env: introduce do_with_cql_env_noreentrant_in_thread()
Previously, variables were marked as const, causing std::move() calls to
be redundant as reported by GCC warnings. This change either removes
const qualifiers or marks related lambdas as mutable, allowing the
compiler to properly utilize move constructors for better performance.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23066
Replace boost::accumulate() calls with std::ranges facilities. This
change reduces external dependencies and modernizes the codebase.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23062
Fixes#22314
Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them.
Bundles together the setup of "always on" schema extensions into a single call, and uses this from the three (3) init points.
Could have opted for static reg via `configurables`, but since we are moving to a single code base, the need for this is going away, hence explicit init seems more in line.
Closesscylladb/scylladb#22327
* github.com:scylladb/scylladb:
tools: Add standard extensions and propagate to schema load
cql_test_env: Use add all extensions instead of inidividually
main: Move extensions adding to function
tomstone_gc: Make validate work for tools
While generally better to reduce inline code, here we get
rid of the clustering_interval_set.hh dependency, which in turns
depends on boost interval_set, a large dependency.
incremental_compaction_test.cc is adjusted for a missing header.
Closesscylladb/scylladb#22957
This series achieves two things:
1) changes default number of tablet replicas per shard to be 10 in order to reduce load imbalance between shards
This will result in new tables having at least 10 tablet replicas per
shard by default.
We want this to reduce tablet load imbalance due to differences in
tablet count per shard, where some shards have 1 tablet and some
shards have 2 tablets. With higher tablet count per shard, this
difference-by-one is less relevant.
Fixes https://github.com/scylladb/scylladb/issues/21967
2) introduces a global goal for tablet replica count per shard and adds logic to tablet scheduler to respect it by controlling per-table tablet count
The per-shard goal is enforced by controlling average per-shard tablet replica
count in a given DC, which is controlled by per-table tablet
count. This is effective in respecting the limit on individual shards
as long as tablet replicas are distributed evenly between shards.
There is no attempt to move tablets around in order to enforce limits
on individual shards in case of imbalance between shards.
If the average per-shard tablet count exceeds the limit, all tables
which contribute to it (have replicas in the DC) are scaled down
by the same factor. Due to rounding up to the nearest power of 2,
we may overshoot the per-shard goal by at most a factor of 2.
The scaling is applied after computing desired tablet count due to
all other factors: per-table tablet count hints, defaults, average tablet size.
If different DCs want different scale factors of a given table, the
lowest scale factor is chosen for a given table.
When creating a new table, its tablet count is determined by tablet
scheduler using the scheduler logic, as if the table was already created.
So any scaling due to per-shard tablet count goal is reflected immediately
when creating a table. It may however still take some time for the system
to shrink existing tables. We don't reject requests to create new tables.
Fixes#21458Closesscylladb/scylladb#22522
* github.com:scylladb/scylladb:
config, tablets: Allow tablets_initial_scale_factor to be a fraction
test: tablets_test: Test scaling when creating lots of tables
test: tablets_test: Test tablet count changes on per-table option and config changes
test: tablets_test: Add support for auto-split mode
test: cql_test_env: Expose db config
config: Make tablets_initial_scale_factor live-updateable
tablets: load_balancer: Pick initial_scale_factor from config
tablets, load_balancer: Fix and improve logging of resize decisions
tablets, load_balancer: Log reason for target tablet count
tablets: load_balancer: Move hints processing to tablet scheduler
tablets: load_balancer: Scale down tablet count to respect per-shard tablet count goal
tablets: Use scheduler's make_sizing_plan() to decide about tablet count of a new table
tablets: load_balancer: Determine desired count from size separately from count from options
tablets: load_balancer: Determine resize decision from target tablet count
tablets: load_balancer: Allow splits even if table stats not available
tablets: load_balancer: Extract make_sizing_plan()
tablets: Add formatter for resize_decision::way_type
tablets: load_balancer: Simplify resize_urgency_cmp()
tablets: load_balancer: Keep config items as instance members
locator: network_topology_strategy: Simplify calculate_initial_tablets_from_topology()
tablets: Change the meaning of initial_scale to mean min-avg-tablets-per-shard
tablets: Set default initial tablet count scale to 10
tablets: network_topology_stragy: Coroutinize calculate_initial_tablets_from_topology()
tablets: load_balancer: Extract get_schema_and_rs()
tablets: load_balancer: Drop test_mode
Replace boost::find() calls with std::ranges::find() and std::ranges::contains()
to leverage modern C++ standard library features. This change reduces external
dependencies and modernizes the codebase.
The following changes were made:
- Replaced boost::find() with std::ranges::find() where index/iterator is needed
- Used std::ranges::contains() for simple element presence checks
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22920
The config variable `components_memory_reclaim_threshold` limits the
memory available to the sstable bloom filters. Any change to its value
is not immediately propagated to the sstable manager, despite it being
a LiveUpdate variable. The updated value takes effect only when a new
sstable is created or deleted.
This PR first refactors the reclaim and reload logic into a single
background fiber. It then updates the sstable manager to subscribe to
changes in the `components_memory_reclaim_threshold` configuration value
and immediately triggers the reclaim/reload fiber when a change is
detected.
Fixes#21947
This is an improvement and does not need to be backported.
Closesscylladb/scylladb#22725
* github.com:scylladb/scylladb:
sstables_manager: trigger reclaim/reload on `components_memory_reclaim_threshold` update
sstables_manager: maybe_reclaim_components: yield between iterations
sstables_manager: rename `increment_total_reclaimable_memory_and_maybe_reclaim()`
sstables_manager: move reclaim logic into `components_reclaim_reload_fiber()`
sstables_manager: rename `_sstable_deleted_event` condition variable
sstables_manager: rename `components_reloader_fiber()`
sstables_manager: fix `maybe_reclaim_components()` indentation
sstables_manager: reclaim components memory until usage falls below threshold
sstables_manager: introduce `get_components_memory_reclaim_threshold()`
sstables_manager: extract `maybe_reclaim_components()`
sstables_manager: fix `maybe_reload_components()` indentation
sstables_manager: extract out `maybe_reload_components()`
This variant of do_with_cql_env(), forgoes the reentrancy support in the
regular do_with_cql_env() variants, and re-uses the caller's exsting
seastar thread. This is an optimized version for callers which don't
need reentrancy and already have a thread.
Both, repair and streaming depend on view builder, but since the builder is started too late, both keep sharded<> reference on it and apply `if (view_builder.local_is_initialized())` safety checks.
However, view builder can do its sharded start much earlier, there's currently nothing that prevents it from doing so. This PR moves view builder start up together with some other of its dependencies, and relaxes the way repair and streaming use their view-builder references, in particular -- removes those ugly initialization checks.
refs: scylladb/scylladb#2737Closesscylladb/scylladb#22676
* github.com:scylladb/scylladb:
streaming: Relax streaming::make_streamig_consumer() view builder arg
streaming: Keep non-sharded view_builder dependency reference
streaming: Remove view_builder.local_is_initialized() checks
repair: Keep non-sharded view_builder dependency reference
repair: Remove view_builder.local_is_initialized() checks
main: Start sharded<view_builder> earlier
test/cql_env: Move stream manager start lower
This commit eliminates unused boost header includes from the tree.
Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22857
The view_builder service is needed by repair service, but is started
after it. It's OK in a sense that repair service holds a sharded
reference on it and checks whether local_is_initialized() before using
it, which is not nice.
Fortunately, starting sharded view buidler can be done early enough,
because most of its dependencies would be already started by that time.
Two exceptions are -- view_update_generator and
system_distributed_keyspace. Both can be moved up too with the same
justification.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is to keep it in-sync with main code, where stream manager is
started after storage_proxy's and query_processor's remotes. This
doesn't change nothing for now, but next patches will move other
services around main/cql_test_env and early start of stream manager in
cql_test_env will be problematic.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Renamed the aboved mentioned method to `increment_total_reclaimable_memory()`
as it doesn't directly reclaim memory anymore.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Use host_id in a children list of a task in task manager to indicate
a node on which the child was created.
Move TASKS_CHILDREN_REQUEST to IDL. Send it by host_id.
Fixes: https://github.com/scylladb/scylladb/issues/22284.
Ip to host_id transition; backport isn't needed.
Closesscylladb/scylladb#22487
* github.com:scylladb/scylladb:
tasks: drop task_manager::config::broadcast_address as it's unused
tasks: replace ip with host_id in task_identity
api: task_manager: pass gossiper to api::set_task_manager
tasks: keep host_id in task_manager
tasks: move tasks_get_children to IDL
This PR converts boost load balancer tests in preparation for load balancer changes
which add per-table tablet hints. After those changes, load balancer consults with the replication
strategy in the database, so we need to create proper schema in the
database. To do that, we need proper topology for replication
strategies which use RF > 1, otherwise keyspace creation will fail.
Topology is created in tests via group0 commands, which is abstracted by
the new `topology_builder` class.
Tests cannot modify token_metadata only in memory now as it needs to be
consistent with the schema and on-disk metadata. That's why modifications to
tablet metadata are now made under group0 guard and save back metadata to disk.
Closesscylladb/scylladb#22648
* github.com:scylladb/scylladb:
test: tablets: Drop keyspace after do_test_load_balancing_merge_colocation() scenario
tests: tablets: Set initial tablets to 1 to exit growing mode
test: tablets_test: Create proper schema in load balancer tests
test: lib: Introduce topology_builder
test: cql_test_env: Expose topology_state_machine
topology_state_machine: Introduce lock transition
Will be used by load balancer tests which need more than a single-node
topology, and which want to create proper schema in the database which
depends on that topology, in particular creating keyspaces with
replication factor > 1.
We need to do that because load balancer will use replication strategy
from the database as part of plan making.
This pull request is an implementation of vector data type similar to one used by Apache Cassandra.
The patch contains:
- implementation of vector_type_impl class
- necessary functionalities similar to other data types
- support for serialization and deserialization of vectors
- support for Lua and JSON format
- valid CQL syntax for `vector<>` type
- `type_parser` support for vectors
- expression adjustments such as:
- add `collection_constructor::style_type::vector`
- rename `collection_constructor::style_type::list` to `collection_constructor::style_type::list_or_vector`
- vector type encoding (for drivers)
- unit tests
- cassandra compatibility tests
- necessary documentation
Co-authored-by: @janpiotrlakomy
Fixes https://github.com/scylladb/scylladb/issues/19455Closesscylladb/scylladb#22488
* github.com:scylladb/scylladb:
docs: add vector type documentation
cassandra_tests: translate tests covering the vector type
type_codec: add vector type encoding
boost/expr_test: add vector expression tests
expression: adjust collection constructor list style
expression: add vector style type
test/boost: add vector type cql_env boost tests
test/boost: add vector type_parser tests
type_parser: support vector type
cql3: add vector type syntax
types: implement vector_type_impl
This update introduces four types of credential providers:
1. Environment variables
2. Configuration file
3. AWS STS
4. EC2 Metadata service
The first two providers should only be used for testing and local runs. **They must NEVER be used in production.**
The last two providers are intended for use on real EC2 instances:
- **AWS STS**: Preferred method for obtaining temporary credentials using IAM roles.
- **EC2 Metadata Service**: Should be used as a last resort.
Additionally, a simple credentials provider chain is created. It queries each provider sequentially until valid credentials are obtained. If all providers fail, it returns an empty result.
fixes: #21828Closesscylladb/scylladb#21830
* github.com:scylladb/scylladb:
docs: update the `object_storage.md` and `admin.rst`
aws creds: add STS and Instance Metadata service credentials providers
aws creds: add env. and file credentials providers
s3 creds: move credentials out of endpoint config
This commit entirely removes credentials from the endpoint configuration. It also eliminates all instances of manually retrieving environment credentials. Instead, the construction of file and environment credentials has been moved to their respective providers. Additionally, a new aws_credentials_provider_chain class has been introduced to support chaining of multiple credential providers.
Keep host_id of a node in task manager. If host_id wasn't resolved
yet, task manager will keep an empty id.
It's a preparation for the following changes.
This commit refactors the way AWS credentials are managed in Scylla. Previously, credentials were included in the endpoint configuration. However, since credentials and endpoint configurations serve different purposes and may have different lifetimes, it’s more logical to manage them separately. Moving forward, credentials will be completely removed from the endpoint_config to ensure clear separation of concerns.
Introduce `defer_verbose_shutdown` in `cql_test_env` which logs
a message before and after shutting down a service, distinguishing
between success and failure.
The function is similar to the one in `main` but skips special error
handling logic applicable only to the main Scylla binary. The purpose
of the `cql_test_env` version of this function is only more verbose
logging. If necessary it can be extended in the future with additional
logic.
I estimated the impact on the size of produced log files using
`cdc_test` as an example:
```
$ build/dev/test/boost/combined_tests --run_test=cdc_test -- --smp=2 \
>logfile 2>&1
$ du -b logfile
```
the result before this commit: 1964064 bytes, after: 2196432 bytes,
so estimated ~12% increase of log file size for boost tests that use
`cql_test_env`, assuming that the number of logs printed by each test is
similar to the logs printed by `cdc_test` (but I believe `cdc_test` is
one of the less verbose tests so this is an overestimate).
The motivation for this change is easier debugging of shutdown issues.
When investigating scylladb/scylladb#21983, where an exception is
thrown somewhere during the shutdown procedure, I found it hard to
pinpoint the service from which the exception originates. This change
will make it easier to debug issues like that by wrapping shutdown of
each service in a pair of messages logged when shutdown starts and when
it finishes (including when it fails). We should get more details on
this issue when it reproduces again in CI after this commit is merged
into `master`. (I failed to reproduce it locally with 1000 runs.)
Ref scylladb/scylladb#21983Closesscylladb/scylladb#22566
Like mentioned in the previous commit, this changes introduce usage
of vector style type and adjusts the functions using list style type
to distinguish vectors from lists.
Rename collection constructor style list to list_or_vector.
This series exposes a Clock template parameter for loading_cache so that the test could use
the manual_clock rather than the lowres_clock, since relying on the latter is flaky.
In addition, the test load function is simplified to sleep some small random time and co_return the expected string,
rather than reading it from a real file, since the latter's timing might also be flaky, and it out-of-scope for this test.
Fixes#20322
* The test was flaky forever, so backport is required for all live versions.
Closesscylladb/scylladb#22064
* github.com:scylladb/scylladb:
tests: loading_cache_test: use manual_clock
utils: loading_cache: make clock_type a template parameter
test: loading_cache_test: use function-scope loader
test: loading_cache_test: simlute loader using sleep
test: lib: eventually: add sleep function param
test: lib: eventually: make *EVENTUALLY_EQUAL inline functions
Relying on a real-time clock like lowres_clock
can be flaky (in particular in debug mode).
Use manual_clock instead to harden the test against
timing issues.
Fixes#20322
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
rather then macros.
This is a first cleanup step before adding a sleep function
parameter to support also manual_clock.
Also, add a call to BOOST_REQUIRE_EQUAL/BOOST_CHECK_EQUAL,
respectively, to make an error more visible in the test log
since those entry points print the offending values
when not equal.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Introduces a comprehensive audit system to track database operations for security
and compliance purposes. This change includes:
Core Components:
- New audit subsystem for logging database operations
- Service level integration for proper resource management
- CQL statement tracking with operation categories
- Login process integration for tenant management
Key Features:
- Configurable audit logging (syslog/table)
- Operation categorization (QUERY/DML/DDL/DCL/AUTH/ADMIN)
- Selective auditing by keyspace/table
- Password sanitization in audit logs
- Service level shares support (1-1000) for workload prioritization
- Proper lifecycle management and cleanup
I ran the dtests for audit (manually enabled) and they pass.
The in-repo tests pass.
Notably, there should be no non-whitespace changes between this and scylla-enterprise
Fixesscylladb/scylla-enterprise#4999Closesscylladb/scylladb#22147
* github.com:scylladb/scylladb:
audit: Add shares support to service level management
audit: Add service level support to CQL login process
audit: Add support to CQL statements
audit: Integrate audit subsystem into Scylla main process
audit: Add documentation for the audit subsystem
audit: Add the audit subsystem
Now that all topology related code uses host ids there is not point to
maintain ip to id (and back) mappings in the token metadata. After the
patch the mapping will be maintained in the gossiper only. The rest of
the system will use host ids and in rare cases where translation is
needed (mostly for UX compatibility reasons) the translation will be
done using gossiper.
Fixes: scylladb/scylla#21777
* 'gleb/drop-ip-from-tm-v3' of github.com:scylladb/scylla-dev: (57 commits)
hint manager: do not translate ip to id in case hint manager is stopped already
locator: token_metadata: drop update_host_id() function that does nothing now
locator: topology: drop indexing by ips
repair: drop unneeded code
storage_service: use host_id to look for a node in on_alive handler
storage_proxy: translate ips to ids in forward array using gossiper
locator: topology: remove unused functions
storage_service: check for outdated ip in on_change notification in the peers table
storage_proxy: translate id to ip using address map in tablets's describe_ring code instead of taking one from the topology
topology coordinator: change connection dropping code to work on host ids
cql3: report host id instead of ip in error during SELECT FROM MUTATION_FRAGMENTS query
locator: drop unused function from tablet_effective_replication_map
api: view_build_statuses: do not use IP from the topology, but translate id to ip using address map instead
locator: token_metadata: remove unused ip based functions
locator: network_topology_strategy: use host_id based function to check number of endpoints in dcs
gossiper: drop get_unreachable_token_owners functions
storage_service: use gossiper to map ip to id in node_ops operations
storage_service: fix indentation after the last patch
storage_service: drop loops from node ops replace_prepare handling since there can be only one replacing node
token_metadata: drop no longer used functions
...
We want to drop ips from token_metadata so move to use host id based
counterparts. Messaging service gets a function that maps from ips to id
when is starts listening.
This change integrates service level functionality into the CQL authentication and connection handling:
- Add scheduling_group_name to client_data to track service level assignments
- Extend SASL challenge interface to expose authenticated username
- Modify connection processing to support tenant switching:
- Add switch_tenant() method to handle scheduling group changes
- Add process_until_tenant_switch() to handle request processing boundaries
- Implement no_tenant() default executor
- Add execute_under_tenant_type for scheduling group management
- Update connection lifecycle to properly handle service level changes:
- Initialize connections with default scheduling group
- Support dynamic scheduling group updates when service levels change
- Ensure proper cleanup of scheduling group assignments
The changes enable proper scheduling group assignment and management based on
authenticated users' service levels, while maintaining backward compatibility
for connections without service level assignments.
This change introduces a new audit subsystem that allows tracking and logging of database operations for security and compliance purposes. Key features include:
- Configurable audit logging to either syslog or a dedicated system table (audit.audit_log)
- Selective auditing based on:
- Operation categories (QUERY, DML, DDL, DCL, AUTH, ADMIN)
- Specific keyspaces
- Specific tables
- New configuration options:
- audit: Controls audit destination (none/syslog/table)
- audit_categories: Comma-separated list of operation categories to audit
- audit_tables: Specific tables to audit
- audit_keyspaces: Specific keyspaces to audit
- audit_unix_socket_path: Path for syslog socket
- audit_syslog_write_buffer_size: Buffer size for syslog writes
The audit logs capture details including:
- Operation timestamp
- Node and client IP addresses
- Operation category and query
- Username
- Success/failure status
- Affected keyspace and table names
In this PR, we pair draining the view builder with its start.
To better understand what was done and why, let's first look at the
situation before this commit and the context of it:
(a) The following things happened in order:
1. The view builder would be constructed.
2. Right after that, a deferred lambda would be created to stop the
view builder during shutdown.
3. group0_service would be started.
4. A deferred lambda stopping group0_service would be created right
after that.
5. The view builder would be started.
(b) Because the view builder depends on group0_client, it couldn't be
started before starting group0_service. On the other hand, other
services depend on the view builder, e.g. the stream manager. That
makes changing the order of initialization a difficult problem,
so we want to avoid doing that unless we're sure it's the right
choice.
(c) Since the view builder uses group0_client, there was a possibility
of running into a segmentation fault issue in the following
scenario:
1. A call to `view_builder::mark_view_build_success()` is issued.
2. We stop group0_service.
3. `view_builder::mark_view_build_success()` calls
`announce_with_raft()`, which leads to a use-after-free because
group0_service has already been destroyed.
This very scenario took place in scylladb/scylladb#20772.
Initially, we decided to solve the issue by initializing
group0_service a bit earlier (scylladb/scylladb@7bad8378c7).
Unfortunately, it led to other issues described in scylladb/scylladb#21534,
so we revert that patch. These changes are the second attempt
to the problem where we want to solve it in a safer manner.
The solution we came up with is to pair the start of the view builder
with a deferred lambda that deinitializes it by calling
`view_builder::drain()`. No other component of the system should be
able to use the view builder anymore, so it's safe to do that.
Furthermore, that pairing makes the analysis of
initialization/deinitialization order much easier. We also solve the
aformentioned use-after-free issue because the view builder itself
will no longer attempt to use group0_client.
Note that we still pair a deferred lambda calling `view_builder::stop()`
with the construction of the view builder; that function will also call
`view_builder::drain()`. Another notable thing is `view_builder::drain()`
may be called earlier by `storage_service::do_drain()`. In other words,
these changes cover the situation when Scylla runs into a problem when
starting up.
Backport: The patch I'm reverting made it to 6.2, so we want to backport this one there too.
Fixesscylladb/scylladb#20772Fixesscylladb/scylladb#21534Closesscylladb/scylladb#21909
* github.com:scylladb/scylladb:
test/topology_custom: Add test for Scylla with disabled view building
main, view: Pair view builder drain with its start
Revert "main,cql_test_env: start group0_service before view_builder"
In these changes, we pair draining the view builder with its start.
To better understand what was done and why, let's first look at the
situation before this commit and the context of it:
(a) The following things happened in order:
1. The view builder would be constructed.
2. Right after that, a deferred lambda would be created to stop the
view builder during shutdown.
3. group0_service would be started.
4. A deferred lambda stopping group0_service would be created right
after that.
5. The view builder would be started.
(b) Because the view builder depends on group0_client, it couldn't be
started before starting group0_service. On the other hand, other
services depend on the view builder, e.g. the stream manager. That
makes changing the order of initialization a difficult problem,
so we want to avoid doing that unless we're sure it's the right
choice.
(c) Since the view builder uses group0_client, there was a possibility
of running into a segmentation fault issue in the following
scenario:
1. A call to `view_builder::mark_view_build_success()` is issued.
2. We stop group0_service.
3. `view_builder::mark_view_build_success()` calls
`announce_with_raft()`, which leads to a use-after-free because
group0_service has already been destroyed.
This very scenario took place in scylladb/scylladb#20772.
Initially, we decided to solve the issue by initializing
group0_service a bit earlier (scylladb/scylladb@7bad8378c7).
Unfortunately, it led to other issues described in scylladb/scylladb#21534.
We reverted that change in the previous commit. These changes are the
second attempt to the problem where we want to solve it in a safer manner.
The solution we came up with is to pair the start of the view builder
with a deferred lambda that deinitializes it by calling
`view_builder::drain()`. No other component of the system should be
able to use the view builder anymore, so it's safe to do that.
Furthermore, that pairing makes the analysis of
initialization/deinitialization order much easier. We also solve the
aformentioned use-after-free issue because the view builder itself
will no longer attempt to use group0_client.
Note that we still pair a deferred lambda calling `view_builder::stop()`
with the construction of the view builder; that function will also call
`view_builder::drain()`. Another notable thing is `view_builder::drain()`
may be called earlier by `storage_service::do_drain()`. In other words,
these changes cover the situation when Scylla runs into a problem when
starting up.
Fixesscylladb/scylladb#20772
The patch solved a problem related to an initialization order
(scylladb/scylladb#20772), but we ran into another one: scylladb/scylladb#21534.
After moving the initialization of group0_service, it ended up being destroyed
AFTER the CDC generation service would. Since CDC generations are accessed
in `storage_service::topology_state_load()`:
```
for (const auto& gen_id : _topology_state_machine._topology.committed_cdc_generations) {
rtlogger.trace("topology_state_load: process committed cdc generation {}", gen_id);
co_await _cdc_gens.local().handle_cdc_generation(gen_id);
```
we started getting the following failure:
```
Service &seastar::sharded<cdc::generation_service>::local() [Service = cdc::generation_service]: Assertion `local_is_initialized()' failed.
```
We're reverting the patch to go back to a more stable version of Scylla
and in the following commit, we'll solve the original issue in a more
systematic way.
This reverts commit 7bad8378c7.
Fixes https://github.com/scylladb/scylla-enterprise/issues/5016#issuecomment-2558464631
EAR - encryption at rest. Allows on-disk file encryption of sstables and commitlog data.
Introduces OpenSSL based file level encrypted storage, managed via a set of providers
ranging from local files to cloud KMS providers.
For a more comprehensive explanation, see the included docs (or if possible, original
source tree).
Manual bulk merge of EAR feature from enterprise repo to main scylla repo.
Breaks some features apart, but main EAR is still a humongous commit, because to separate this
I would have to mess with code incrementally, adding time and risk.
This PR includes the local file gen tool, tests and also p11 validation.
Note: CI will not execute the full tests unless master CI is set to provide the same environment
as the enterprise one. Not sure about the status of this ATM.
Note: Includes code to compile against cryptsoft kmipc SDK, but not the SDK. If you happen to
check out this tree in the scylla folder and configure, it will be linked against and KMIP functionality
will be enabled, otherwise not.
Closesscylladb/scylladb#22233
* github.com:scylladb/scylladb:
docs: Add EAR docs
main/build: Add p11-kit and initialize
tools: Add local-file-key-generator tool
tests: Add EAR tests
tmpdir: shorten test tempdir path
EAR: port the ear feature from enterprise
cql_test_env: Add optional query timeout
schema/migration_manager: Add schema validate
sstables: add get_shared_components accessor
config/config_file: Add exports and definitions of config_type_for<>
Instantiated only on shard 0.
Currently, only subscribe from unit test
Manual unit test using loop mount was added.
Note that the test requires sudo access
and root access to /dev/loop, so it cannot
run in rootless podman instance, and it'd
fail with Permission denied.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#21523
now that we are allowed to use C++23. we now have the luxury of using
`std::views::reverse`.
- replace `boost::adaptors::transformed` with `std::views::transform`
- remove unused `#include <boost/range/adaptor/reversed.hpp>`
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Replace usages of `boost::algorithm::join()` with `fmt::join()` to improve
performance and reduce dependency on Boost. `fmt::join()` allows direct
formatting of ranges and tuples with custom separators without creating
intermediate strings.
When formatting comma-separated values into another string, fmt::join()
avoids the overhead of temporary string creation that
`boost::algorithm::join()` requires. This change also helps streamline
our dependencies by leveraging the existing fmt library instead of
Boost.Algorithm.
To avoid the ambiguity, some caller sites were updated to call
`seastar::format()` explicitly.
See also
- boost::algorithm::join():
https://www.boost.org/doc/libs/1_87_0/doc/html/string_algo/reference.html#doxygen.join_8hpp
- fmt::join():
https://fmt.dev/11.0/api/#ranges-api
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22082