scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 01:50:35 +00:00

Author	SHA1	Message	Date
Benny Halevy	736f89b31a	tablets: enforce tablets using tablets_mode_for_new_keyspaces=enforced config option `tablets_mode_for_new_keyspaces=enforced` enables tablets by default for new keyspaces, like `tablets_mode_for_new_keyspaces=enabled`. However, it does not allow opting out when creating new keyspaces by setting `tablets = {'enabled': false}`. Refs scylladb/scylla-enterprise#4355 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `62aeba759b`)	2025-04-08 08:35:14 +03:00
Benny Halevy	a49e27ac8f	db/config: add tablets_mode_for_new_keyspaces option The new option deprecates the existing `enable_tablets` option. It will be extended in the next patch with a 3rd value: "enforced", which will enable tablets by default for new keyspaces but without the possibility to opt out using the `tablets = {'enabled': false}` keyspace schema option. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `c62865df90`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-04-08 08:08:47 +03:00
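The interaction between `tablets_mode_for_new_keyspaces` and the per-keyspace `tablets = {'enabled': ...}` option described in the two commits above can be sketched as follows. This is a minimal illustration, not ScyllaDB code: the `disabled` value name, the helper `keyspace_uses_tablets`, the opt-in behaviour under `disabled`, and the choice to raise an error on a rejected opt-out are all assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class TabletsMode(Enum):
    DISABLED = "disabled"   # assumed name for the "off by default" value
    ENABLED = "enabled"
    ENFORCED = "enforced"


def keyspace_uses_tablets(mode: TabletsMode, requested: Optional[bool]) -> bool:
    """Decide whether a new keyspace uses tablets (hypothetical helper).

    `requested` models the keyspace option tablets = {'enabled': ...};
    None means the CREATE KEYSPACE statement did not mention it.
    """
    if mode is TabletsMode.ENFORCED:
        if requested is False:
            # Assumption: the opt-out is rejected with an error.
            raise ValueError("tablets cannot be disabled when the mode is 'enforced'")
        return True
    if requested is not None:
        # Under 'enabled' or 'disabled' the keyspace option wins (assumed).
        return requested
    return mode is TabletsMode.ENABLED
```

With `enabled`, `keyspace_uses_tablets(TabletsMode.ENABLED, False)` returns `False` (the opt-out is honoured), while the same request under `enforced` raises.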
Dawid Mędrek	c56e47f72f	db/hints: Cancel draining when stopping node Draining hints may occur in one of the two scenarios: * a node leaves the cluster and the local node drains all of the hints saved for that node, * the local node is being decommissioned. Draining may take some time and the hint manager won't stop until it finishes. It's not a problem when decommissioning a node, especially because we want the cluster to retain the data stored in the hints. However, it may become a problem when the local node started draining hints saved for another node and now it's being shut down. There are two reasons for that: * Generally, in situations like that, we'd like to be able to shut down nodes as fast as possible. The data stored in the hints won't disappear from the cluster yet since we can restart the local node. * Draining hints may introduce flakiness in tests. Replaying hints doesn't have the highest priority and it's reflected in the scheduling groups we use as well as the explicitly enforced throughput. If there are a large number of hints to be replayed, it might affect our tests. It's already happened, see: scylladb/scylladb#21949. To solve those problems, we change the semantics of draining. It will behave as before when the local node is being decommissioned. However, when the local node is only being stopped, we will immediately cancel all ongoing draining processes and stop the hint manager. To amend for that, when we start a node and it initializes a hint endpoint manager corresponding to a node that's already left the cluster, we will begin the draining process of that endpoint manager right away. That should ensure all data is retained, while possibly speeding up the shutdown process. There's a small trade-off to it, though. If we stop a node, we can then remove it. It won't have a chance to replay hints it might've before these changes, but that's an edge case. We expect this commit to bring more benefit than harm. 
We also provide tests verifying that the implementation works as intended. Fixes scylladb/scylladb#21949 Closes scylladb/scylladb#22811 (cherry picked from commit `0a6137218a`) Closes scylladb/scylladb#23370	2025-04-03 09:09:05 +02:00
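The changed draining semantics in the commit above boil down to two decisions: what to do with in-progress draining at shutdown, and what to do at startup with hints for a node that has already left. The sketch below only restates those rules; the enum and function names are hypothetical and do not correspond to the hint manager's real interface.

```python
from enum import Enum


class ShutdownReason(Enum):
    STOP = "stop"                  # the local node is only being stopped
    DECOMMISSION = "decommission"  # the local node is leaving the cluster


def draining_action_on_shutdown(reason: ShutdownReason) -> str:
    """Return what happens to ongoing hint draining (illustrative labels)."""
    if reason is ShutdownReason.DECOMMISSION:
        # Unchanged behaviour: finish draining so the cluster retains the data.
        return "finish_draining"
    # New behaviour: cancel immediately so the node can stop quickly.
    return "cancel_draining"


def endpoint_action_on_startup(endpoint_still_in_cluster: bool) -> str:
    """Return what a freshly initialized hint endpoint manager does."""
    if endpoint_still_in_cluster:
        return "replay_normally"
    # Compensates for a cancelled drain: start draining right away.
    return "start_draining"
```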
Tomasz Grabiec	51ee15f02d	Merge '[Backport 2025.1] tablets: Make load balancing capacity-aware' from Tomasz Grabiec Before this patch, the load balancer was equalizing tablet count per shard, so it achieved balance assuming that: 1) tablets have the same size 2) shards have the same capacity That can cause imbalance of utilization if shards have different capacity, which can happen in heterogeneous clusters with different instance types. One of the causes for capacity difference is that larger instances run with fewer shards due to vCPUs being dedicated to IRQ handling. This makes those shards have more disk capacity, and more CPU power. After this patch, the load balancer equalizes shard's storage utilization, so it no longer assumes that shards have the same capacity. It still assumes that each tablet has equal size. So it's a middle step towards full size-aware balancing. One consequence is that to be able to balance, the load balancer needs to know about every node's capacity, which is collected with the same RPC which collects load_stats for average tablet size. This is not a significant setback because migrations cannot proceed anyway if nodes are down due to barriers. We could make intra-node migration scheduling work without capacity information, but it's pointless due to the above, so not implemented. Also, per-shard goal for tablet count is still the same for all nodes in the cluster, so nodes with less capacity will be below limit and nodes with more capacity will be slightly above limit. This shouldn't be a significant problem in practice; we could compensate for this by increasing the limit. 
Fixes #23042 * github.com:scylladb/scylladb: tablets: Make load balancing capacity-aware topology_coordinator: Fix confusing log message topology_coordinator: Refresh load stats after adding a new node topology_coordinator: Allow capacity stats to be refreshed with some nodes down topology_coordinator: Refactor load status refreshing so that it can be triggered from multiple places test: boost: tablets_test: Always provide capacity in load_stats test: perf_load_balancing: Set node capacity test: perf_load_balancing: Convert to topology_builder config, disk_space_monitor: Allow overriding capacity via config storage_service, tablets: Collect per-node capacity in load_stats test: tablets_test: Add support for auto-split mode test: cql_test_env: Expose db config Closes scylladb/scylladb#23443 * github.com:scylladb/scylladb: Merge 'tablets: Make load balancing capacity-aware' from Tomasz Grabiec test: tablets_test: Add support for auto-split mode test: cql_test_env: Expose db config	2025-04-01 20:31:05 +02:00
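To make the difference between count-based and capacity-aware balancing in the merge above concrete, here is a small worked sketch. It is not the load balancer's algorithm: the greedy placement, the function names, and the example capacities are assumptions chosen for illustration. Like the patch, it assumes every tablet has the same size.

```python
from fractions import Fraction


def utilization(tablet_count: int, tablet_size: int, capacity: int) -> Fraction:
    """Fraction of a shard's storage capacity used by its tablets."""
    return Fraction(tablet_count * tablet_size, capacity)


def place_by_utilization(capacities, tablets: int, tablet_size: int):
    """Greedily give each tablet to the shard that ends up least utilized."""
    counts = [0] * len(capacities)
    for _ in range(tablets):
        target = min(
            range(len(capacities)),
            key=lambda i: utilization(counts[i] + 1, tablet_size, capacities[i]),
        )
        counts[target] += 1
    return counts


# Hypothetical cluster: two shards with 100 and 200 GiB, 30 tablets of 5 GiB.
capacities = [100, 200]
by_count = [15, 15]                                  # equal tablet count
by_capacity = place_by_utilization(capacities, 30, 5)

print([float(utilization(c, 5, cap)) for c, cap in zip(by_count, capacities)])
print(by_capacity,
      [float(utilization(c, 5, cap)) for c, cap in zip(by_capacity, capacities)])
```

Equalizing counts leaves the smaller shard at 75% and the larger at 37.5% utilization, whereas equalizing utilization places 10 and 20 tablets so both shards sit at 50%.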
Patryk Jędrzejczak	39c20144e5	Merge '[Backport 2025.1] raft topology: Add support for raft topology init to happen before group0 initialization' from Scylladb[bot] In the current scenario, the problem discovered is that there is a time gap between group0 creation and raft_initialize_discovery_leader call. Because of that, the group0 snapshot/apply entry enters wrong values from the disk (null) and updates the in-memory variables to wrong values. During the above time gap, the in-memory variables have wrong values and perform absurd actions. This PR removes the variable `_manage_topology_change_kind_from_group0` which was used earlier as a workaround for correctly handling `topology_change_kind` variable, it was brittle and had some bugs (causing issues like scylladb/scylladb#21114). The reason for this bug is that _manage_topology_change_kind used to block reading from disk and was enabled after group0 initialization and starting raft server for the restart case. Similarly, it was hard to manage `topology_change_kind` using `_manage_topology_change_kind_from_group0` correctly in a bug-free manner. Post `_manage_topology_change_kind_from_group0` removal, careful management of `topology_change_kind` variable was needed for maintaining correct `topology_change_kind` in all scenarios. So this PR also performs a refactoring to populate all init data to system tables even before group0 creation (via `raft_initialize_discovery_leader` function). Now because `raft_initialize_discovery_leader` happens before the group 0 creation, we write mutations directly to system tables instead of a group 0 command. Hence, post group0 creation, the node can read the correct values from system tables and correct values are maintained throughout. Added a new function `initialize_done_topology_upgrade_state` which takes care of updating the correct upgrade state to system tables before starting group0 server. 
This ensures that the node can read the correct values from system tables and correct values are maintained throughout. By moving `raft_initialize_discovery_leader` logic to happen before starting group0 server, and not as group0 command post server start, we also get rid of the potential problem of init group0 command not being the 1st command on the server. Hence ensuring full integrity as expected by programmer. This PR fixes a bug. Hence we need to backport it. Fixes: scylladb/scylladb#21114 - (cherry picked from commit `4748125a48`) - (cherry picked from commit `e491950c47`) - (cherry picked from commit `623e01344b`) - (cherry picked from commit `d7884cf651`) Parent PR: #22484 Closes scylladb/scylladb#22966 * https://github.com/scylladb/scylladb: storage_service: Remove the variable _manage_topology_change_kind_from_group0 storage_service: fix indentation after the previous commit raft topology: Add support for raft topology system tables initialization to happen before group0 initialization service/raft: Refactor mutation writing helper functions.	2025-04-01 11:46:15 +02:00
Avi Kivity	cff90755d8	Merge 'tablets: Make load balancing capacity-aware' from Tomasz Grabiec Before this patch, the load balancer was equalizing tablet count per shard, so it achieved balance assuming that: 1) tablets have the same size 2) shards have the same capacity That can cause imbalance of utilization if shards have different capacity, which can happen in heterogeneous clusters with different instance types. One of the causes for capacity difference is that larger instances run with fewer shards due to vCPUs being dedicated to IRQ handling. This makes those shards have more disk capacity, and more CPU power. After this patch, the load balancer equalizes shard's storage utilization, so it no longer assumes that shards have the same capacity. It still assumes that each tablet has equal size. So it's a middle step towards full size-aware balancing. One consequence is that to be able to balance, the load balancer needs to know about every node's capacity, which is collected with the same RPC which collects load_stats for average tablet size. This is not a significant setback because migrations cannot proceed anyway if nodes are down due to barriers. We could make intra-node migration scheduling work without capacity information, but it's pointless due to the above, so not implemented. Also, per-shard goal for tablet count is still the same for all nodes in the cluster, so nodes with less capacity will be below limit and nodes with more capacity will be slightly above limit. This shouldn't be a significant problem in practice; we could compensate for this by increasing the limit. 
Refs #23042 Closes scylladb/scylladb#23079 * github.com:scylladb/scylladb: tablets: Make load balancing capacity-aware topology_coordinator: Fix confusing log message topology_coordinator: Refresh load stats after adding a new node topology_coordinator: Allow capacity stats to be refreshed with some nodes down topology_coordinator: Refactor load status refreshing so that it can be triggered from multiple places test: boost: tablets_test: Always provide capacity in load_stats test: perf_load_balancing: Set node capacity test: perf_load_balancing: Convert to topology_builder config, disk_space_monitor: Allow overriding capacity via config storage_service, tablets: Collect per-node capacity in load_stats (cherry picked from commit `b1d9f80d85`)	2025-03-25 23:16:35 +01:00
Dawid Mędrek	864528eb9b	db/config: Introduce RF-rack-valid keyspaces We introduce a new term in the glossary: RF-rack-valid keyspace. We also highlight in our user documentation that all keyspaces must remain RF-rack-valid throughout their lifetime, and failing to guarantee that may result in data inconsistencies or other issues. We base that information on our experience with materialized views in keyspaces using tablets, even though they remain an experimental feature. Along with the new term, we introduce a new configuration option called `rf_rack_valid_keyspaces`, which, when enabled, will enforce that all keyspaces remain RF-rack-valid. That functionality will be implemented in upcoming commits. For now, we materialize the restriction in the form of a named requirement: a function verifying that the passed keyspace is RF-rack-valid. The option is disabled by default. That will change once we adjust the existing tests to the new semantics. Once that is done, the option will first be enabled by default, and then it will be removed. Fixes scylladb/scylladb#20356 (cherry picked from commit `32879ec0d5`)	2025-03-21 12:27:04 +00:00
Abhinav Jha	98977e9465	raft topology: Add support for raft topology system tables initialization to happen before group0 initialization In the current scenario, the topology_change_kind variable was being handled using the _manage_topology_change_kind_from_group0 variable. This method was brittle and had some bugs (e.g. for restart case, it led to a time gap between group0 server start and topology_change_kind being managed via group0). Post _manage_topology_change_kind_from_group0 removal, careful management of topology_change_kind variable was needed for maintaining correct topology_change_kind in all scenarios. So this PR also performs a refactoring to populate all init data to system tables even before group0 creation (via raft_initialize_discovery_leader function). Now because raft_initialize_discovery_leader happens before the group 0 creation, we write mutations directly to system tables instead of a group 0 command. Hence, post group0 creation, the node can read the correct values from system tables and correct values are maintained throughout. Added a new function initialize_done_topology_upgrade_state which takes care of updating the correct upgrade state to system tables before starting group0 server. This ensures that the node can read the correct values from system tables and correct values are maintained throughout. By moving raft_initialize_discovery_leader logic to happen before starting group0 server, and not as group0 command post server start, we also get rid of the potential problem of init group0 command not being the 1st command on the server. Hence ensuring full integrity as expected by programmer. Fixes: scylladb/scylladb#21114 (cherry picked from commit `e491950c47`)	2025-02-20 21:21:31 +00:00
Pavel Emelyanov	aa5cb15166	Merge 'Alternator: implement UpdateTable operation to add or delete GSI' from Nadav Har'El In this series we implement the UpdateTable operation to add a GSI to an existing table, or remove a GSI from a table. As the individual commit messages will explain, this required changing how Alternator stores materialized view keys - instead of insisting that these keys must be real columns (that is not the case when adding a GSI to an existing table), the materialized view can now take as its key any Alternator attribute serialized inside the ":attrs" map holding all non-key attributes. Fixes #11567. We also fix the IndexStatus and Backfilling attributes returned by DescribeTable - as DynamoDB API users use this API to discover when a newly added GSI completed its "backfilling" (what we call "view building") stage. Fixes #11471. This series should not be backported lightly - it's a new feature and required fairly large and intrusive changes that can introduce bugs to use cases that don't even use Alternator or its UpdateTable operations - every user of CQL materialized views or secondary indexes, as well as Alternator GSI or LSI, will use modified code. It should be backported to 2025.1, though - this version was actually branched long after this PR was sent, and it provides a feature that was promised for 2025.1. 
Closes scylladb/scylladb#21989 * github.com:scylladb/scylladb: alternator: fix view build on oversized GSI key attribute mv: clean up do_delete_old_entry test/alternator: unflake test for IndexStatus test/alternator: work around unrelated bug causing test flakiness docs/alternator: adding a GSI is no longer an unimplemented feature test/alternator: remove xfail from all tests for issue 11567 alternator: overhaul implementation of GSIs and support UpdateTable mv: support regular_column_transformation key columns in view alternator: add new materialized-view computed column for item in map build: in cmake build, schema needs alternator build: build tests with Alternator alternator: add function serialized_value_if_type() mv: introduce regular_column_transformation, a new type of computed column alternator: add IndexStatus/Backfilling in DescribeTable alternator: add "LimitExceededException" error type docs/alternator: document two more unimplemented Alternator features (cherry picked from commit `529ff3efa5`) Closes scylladb/scylladb#22826	2025-02-18 19:05:21 +02:00
Botond Dénes	ff7e93ddd5	db/config: reader_concurrency_semaphore_cpu_concurrency: bump default to 2 This config item controls how many CPU-bound reads are allowed to run in parallel. The effective concurrency of a single CPU core is 1, so allowing more than one CPU-bound read to run concurrently will just result in time-sharing and both reads having higher latency. However, restricting concurrency to 1 means that a CPU-bound read that takes a lot of time to complete can block other quick reads while it is running. Increase this default setting to 2 as a compromise between not over-using time-sharing, while not allowing such slow reads to block the queue behind them. Fixes: #22450 Closes scylladb/scylladb#22679 (cherry picked from commit `3d12451d1f`) Closes scylladb/scylladb#22722	2025-02-13 09:40:25 +02:00
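The trade-off described in the commit above can be illustrated with a toy model of one CPU core. This is only a sketch under simplifying assumptions: all reads arrive at time 0, admitted reads share the core equally, and waiting reads are admitted in FIFO order. The function `completion_times` is hypothetical and unrelated to the real reader concurrency semaphore.

```python
from collections import deque


def completion_times(cpu_demands_ms, concurrency):
    """Completion time of each read on one core with equal time-sharing."""
    queue = deque(enumerate(cpu_demands_ms))  # reads waiting for admission
    running = {}                              # read index -> remaining CPU ms
    done = [0.0] * len(cpu_demands_ms)
    now = 0.0
    while queue or running:
        # Admit waiting reads up to the concurrency limit.
        while queue and len(running) < concurrency:
            idx, need = queue.popleft()
            running[idx] = float(need)
        # The admitted read with the least remaining work finishes next.
        share = len(running)
        finished = min(running, key=running.get)
        step = running[finished]   # CPU ms each admitted read receives
        now += step * share        # wall-clock time grows with time-sharing
        for idx in running:
            running[idx] -= step
        del running[finished]
        done[finished] = now
    return done


# A slow read needing 100 ms of CPU is queued ahead of a quick 1 ms read.
print(completion_times([100, 1], concurrency=1))
print(completion_times([100, 1], concurrency=2))
```

In this model, with a limit of 1 the quick read completes only after 101 ms because it waits behind the slow one. With a limit of 2 it completes after 2 ms, at the cost of the slow read finishing at 101 ms instead of 100 ms.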
Michael Litvak	58eda6670f	view_builder: fix loop in view builder when tokens are moved The view builder builds a view by going over the entire token ring, consuming the base table partitions, and generating view updates for each partition. A view is considered as built when we complete a full cycle of the token ring. Suppose we start to build a view at a token F. We will consume all partitions with tokens starting at F until the maximum token, then go back to the minimum token and consume all partitions until F, and then we detect that we pass F and complete building the view. This happens in the view builder consumer in `check_for_built_views`. The problem is that we check if we pass the first token F with the condition `_step.current_token() >= it->first_token` whenever we consume a new partition or the current_token goes back to the minimum token. But suppose that we don't have any partitions with a token greater than or equal to the first token (this could happen if the partition with token F was moved to another node for example), then this condition will never be satisfied, and we don't detect correctly when we pass F. Instead, we go back to the minimum token, building the same token ranges again, in a possibly infinite loop. To fix this we add another step when reaching the end of the reader's stream. When this happens it means we don't have any more fragments to consume until the end of the range, so we advance the current_token to the end of the range, simulating a partition, and check for built views in that range. Fixes scylladb/scylladb#21829 Closes scylladb/scylladb#22493 (cherry picked from commit `6d34125eb7`) Closes scylladb/scylladb#22607	2025-02-02 22:29:52 +02:00
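The loop described in the commit above can be reproduced with a simplified model of the ring walk. This is a sketch, not the view builder's code: `build_view`, the `wrapped` flag, and the `max_passes` guard are assumptions introduced so the example terminates. Only the comparison `current_token >= first_token` and the end-of-stream step come from the commit message.

```python
MIN_TOKEN = -(2**63)      # illustrative bounds of the token ring
MAX_TOKEN = 2**63 - 1


def build_view(partition_tokens, first_token, check_at_end_of_stream, max_passes=5):
    """Return how many ring passes the build needs, or None if it never ends."""
    tokens = sorted(partition_tokens)
    start = first_token
    wrapped = False       # True once we have gone back to the minimum token
    for passes in range(1, max_passes + 1):
        for current_token in (t for t in tokens if t >= start):
            if wrapped and current_token >= first_token:
                return passes             # passed F again: the view is built
            # ... view updates for this partition would be generated here ...
        if check_at_end_of_stream:
            # The fix: at end of stream, advance to the end of the range,
            # simulating a partition, and run the same check.
            current_token = MAX_TOKEN
            if wrapped and current_token >= first_token:
                return passes
        start = MIN_TOKEN                 # go back to the minimum token
        wrapped = True
    return None                           # gave up: would loop forever
```

With partitions at tokens 10, 50 and 90 and `first_token=40`, the build completes on the second pass either way. With partitions only at 10 and 20 (nothing at or above F), the model without the end-of-stream check never completes, while the fixed variant completes on the second pass.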
Asias He	4018dc7f0d	Introduce file stream for tablet File based stream is a new feature that optimizes tablet movement significantly. It streams the entire SSTable files without deserializing SSTable files into mutation fragments and re-serializing them back into SSTables on receiving nodes. As a result, less data is streamed over the network, and less CPU is consumed, especially for data models that contain small cells. The following patches are imported from the scylla enterprise: ) Merge 'Introduce file stream for tablet' from Asias He This patch uses Seastar RPC stream interface to stream sstable files on network for tablet migration. It streams sstables instead of mutation fragments. The file based stream has multiple advantages over the mutation streaming. - No serialization or deserialization for mutation fragments - No need to read and process each mutation fragment - On wire data is more compact and smaller In the test below, a significant speed up is observed. Two nodes, 1 shard per node, 1 initial_tablets: - Start node 1 - Insert 10M rows of data with c-s - Bootstrap node 2 Node 1 will migrate data to node 2 with the file stream. 
Test results: 1) File stream: bytes on wire = 1132006250 bytes, bw = 836MB/s [shard 0:stre] stream_blob - stream_sstables[eadaa8e0-a4f2-4cc6-bf10-39ad1ce106b0] Finished sending sstable_nr=2 files_nr=18 files={} range=(-1,9223372036854775807] bytes_sent=1132006250 stream_bw=836MB/s [shard 0:stre] storage_service - Streaming for tablet migration of a4f68900-568a-11ee-b7b9-c2b13945eed2:1 took 1.08004s seconds 2) Mutation stream: bytes on wire = 3030004736 bytes, bw = 125410.87 KiB/s = 128MB/s [shard 0:stre] stream_session - [Stream #406dc8b0-56b5-11ee-bc2d-000bf4871058] Streaming plan for Tablet migration-ks1-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=2958989 KiB, 125410.87 KiB/s [shard 0:stre] storage_service - Streaming for tablet migration of a4f68900-568a-11ee-b7b9-c2b13945eed2:1 took 23.5992s seconds Test Summary: File stream v.s. Mutation stream improvements - Stream bandwidth = 836 / 128 (MB/s) = 6.53X - Stream time = 23.60 / 1.08 (Seconds) = 21.85X - Stream bytes on wire = 3030004736 / 1132006250 (Bytes)= 2.67X Closes scylladb/scylla-enterprise#3438 github.com:scylladb/scylla-enterprise: tests: Add file_stream_test streaming: Implement file stream for tablet ) streaming: Use new take_storage_snapshot interface The new take_storage_snapshot returns a file object instead of a file name. This allows the file stream sender to read from the file even if the file is deleted by compaction. Closes scylladb/scylla-enterprise#3728 ) streaming: Protect unsupported file types for file stream Currently, we assume the file streamed over the stream_blob rpc verb is a sstable file. This patch rejects the unsupported file types on the receiver side. This allows us to stream more file types later using the current file stream infrastructure without worrying about old nodes processing the new file types in the wrong way. 
- The file_ops::noop is renamed to file_ops::stream_sstables to be explicit about the file types - A missing test_file_stream_error_injection is added to the idl Fixes: #3846 Tests: test_unsupported_file_ops Closes scylladb/scylla-enterprise#3847 ) idl: Add service::session_id id to idl It will be used in the next patch. Refs #3907 ) streaming: Protect file stream with topology_guard Similar to "storage_service, tablets: Use session to guard tablet streaming", this patch protects file stream with topology_guard. Fixes #3907 ) streaming: Take service topology_guard under the try block Taking the service::topology_guard could throw. Currently, it throws outside the try block, so the rpc sink will not be closed, causing the following assertion: ``` scylla: seastar/include/seastar/rpc/rpc_impl.hh:815: virtual seastar::rpc::sink_impl<netw::serializer, streaming::stream_blob_cmd_data>::~sink_impl() [Serializer = netw::serializer, Out = <streaming::stream_blob_cmd_data>]: Assertion `this->_con->get()->sink_closed()' failed. ``` To fix, move more code including the topology_guard taking code to the try block. Fixes https://github.com/scylladb/scylla-enterprise/issues/4106 Closes scylladb/scylla-enterprise#4110 ) Merge 'Preserve original SSTable state with file based tablet migration' from Raphael "Raph" Carvalho We're not preserving the SSTable state across file based migration, so staging SSTables for example are being placed into main directory, and consequently, we're mixing staging and non-staging data, losing the ability to continue from where the old replica left off. It's expected that the view update backlog is transferred from old into new replica, as migration doesn't wait for leaving replica to complete view update work (which can take long). Elasticity is preferred. So this fix guarantees that the state of the SSTable will be preserved by propagating it in form of subdirectory (each subdirectory is statically mapped with a particular state). 
The staging sstables aren't being registered into view update generator yet, as that's supposed to be fixed in OSS (more details can be found at https://github.com/scylladb/scylladb/issues/19149). Fixes #4265. Closes scylladb/scylla-enterprise#4267 * github.com:scylladb/scylla-enterprise: tablet: Preserve original SSTable state with file based tablet migration sstables: Add get method for sstable state ) sstable: (Re-)add shareabled_components getter ) Merge 'File streaming sstables: Use sstable source/sink to transfer snapshots' from Calle Wilund Fixes #4246 Alternative approach/better separation of concern, transport vs. sstable layer. Builds on #4472, but fancier. Ensures we transfer and pre-process scylla metadata for streamed file blobs first, then properly apply receiving nodes local config by using a source and sink layer exported from sstables, which handles things like ordering, metadata filtering (on source) as well as handling metadata and proper IO paths when writing data on receiver node (sink). This implementation maintains the statelessness of the current design, and the delegated sink side will re-read and re-write the metadata for each component processed. This is a little wasteful, but the meta is small, and it is less error prone than trying to do caching cross-shards etc. The transport is isolated from the knowledge. This is an alternative/complement to #4436 and #4472, fixing the underlying issue. Note that while the layers/API:s here allows easy fixing of other fundamental problems in the feature (such as destination location etc), these are not included in the PR, to keep it as close to the current behaviour as possible. 
Closes scylladb/scylla-enterprise#4646 * github.com:scylladb/scylla-enterprise: raft_tests: Copy/add a topology test with encryption file streaming: Use sstable source/sink to transfer snapshots sstables: Add source and sink objects + producers for transfering a snapshot sstable::types: Add remove accessor for extension info in metadata ) The change for error injection in merge commit 966ea5955dd8760: File streaming now has "stream_mutation_fragments" error injection points so test_table_dropped_during_streaming works with file streaming. ) doc: document file-based streaming This commit adds a description of the file-based streaming feature to the documentation. It will be displayed in the docs using the scylladb_include_flag directive after https://github.com/scylladb/scylladb/pull/20182 is merged, backported to branch-6.0, and, in turn, branch-2024.2. Refs https://github.com/scylladb/scylla-enterprise/issues/4585 Refs https://github.com/scylladb/scylla-enterprise/issues/4254 Closes scylladb/scylla-enterprise#4587 ) doc: move File-based streaming to the Tablets source file-based-streaming This commit moves the description of file-based streaming from a common include file to the regular doc source file where tablets are described. Closes scylladb/scylla-enterprise#4652 ) streaming: sstable_stream_sink_impl: abort: prevent null pointer dereference Closes scylladb/scylladb#22467	2025-01-26 12:51:59 +02:00
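The improvement factors in the test summary of the commit above follow directly from the figures quoted in its logs. The short sketch below recomputes them; the variable names are chosen for this example only. Note that the bytes-on-wire ratio rounds to 2.68, which the message shows truncated as 2.67X.

```python
# Figures copied from the log lines quoted in the commit message.
file_stream_bytes = 1132006250
file_stream_seconds = 1.08004
file_stream_bw_mb_s = 836

mutation_stream_bytes = 3030004736
mutation_stream_seconds = 23.5992
mutation_stream_bw_kib_s = 125410.87

# 125410.87 KiB/s expressed in MB/s (10^6 bytes), about 128 MB/s.
mutation_stream_bw_mb_s = mutation_stream_bw_kib_s * 1024 / 1e6

bandwidth_gain = file_stream_bw_mb_s / 128                  # about 6.53x
time_gain = mutation_stream_seconds / file_stream_seconds   # about 21.85x
bytes_gain = mutation_stream_bytes / file_stream_bytes      # about 2.68x

print(f"bandwidth {bandwidth_gain:.2f}x, time {time_gain:.2f}x, bytes {bytes_gain:.2f}x")
```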
Avi Kivity	59d3a66d18	Revert "Introduce file stream for tablet" This reverts commit `8208688178`. It was contributed from enterprise, but is too different from the original for me to merge back.	2025-01-22 09:42:20 +02:00
Nadav Har'El	3e16b80014	Merge 'Reject create table with compact storage' from Benny Halevy As discussed in https://github.com/scylladb/scylladb/issues/12263#issuecomment-1853576813, compact storage tables are deprecated. Yet, there is nothing in the code that prevents users from creating such tables. This patch adds a live-updateable config option: `enable_create_table_with_compact_storage`, set to `false` by default, that requires users to opt in to create new tables WITH COMPACT STORAGE. Refs scylladb/scylladb#12263, scylladb/scylladb#16375 * Since this guardrail is an enhancement, no backport is needed Closes scylladb/scylladb#16403 * github.com:scylladb/scylladb: docs: ddl: document the deprecation of compact tables test: enable_create_table_with_compact_storage for tests that need it config: add enable_create_table_with_compact_storage	2025-01-20 22:02:02 +02:00
Benny Halevy	88ae067ddb	everywhere: add skeletal support for the in_memory_tables feature Forward-ported from scylla-enterprise. Note that the feature has been deprecated and the implementation is provided only for backward compatibility with pre-existing features and schema. Tested manually after adding the following to feature_service: ``` gms::feature workload_prioritization { *this, "WORKLOAD_PRIORITIZATION"sv }; ``` Launched a single-node cluster running 2023.1.10 ``` cqlsh> create KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> create TABLE ks.test ( pk int PRIMARY KEY, val int ) WITH compaction = {'class': 'InMemoryCompactionStrategy'}; ``` log: ``` Scylla version 2023.1.10-0.20241227.21cffccc1ccd with build-id bd65b8399cb13b713a87e57fe333cfcabfd50be7 starting ... ... INFO 2024-12-27 19:45:16,563 [shard 0] migration_manager - Create new ColumnFamily: org.apache.cassandra.config.CFMetaData@0x600000f1b400[cfId=5529c630-c47a-11ef-bd1d-4295734ce5a8,ksName=ks,cfName=test,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type),comment=,readRepairChance=0,dcLocalReadRepairChance=0,tombstoneGcOptions={"mode":"timeout","propagation_delay_in_seconds":"3600"},gcGraceSeconds=864000,keyValidator=org.apache.cassandra.db.marshal.Int32Type,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata=[ColumnDefinition{name=pk, type=org.apache.cassandra.db.marshal.Int32Type, kind=PARTITION_KEY, componentIndex=0, droppedAt=-9223372036854775808}, ColumnDefinition{name=val, type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, componentIndex=null, droppedAt=-9223372036854775808}],compactionStrategyClass=class 
org.apache.cassandra.db.compaction.InMemoryCompactionStrategy,compactionStrategyOptions={enabled=true},compressionParameters={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=0.01,memtableFlushPeriod=0,caching={"keys":"ALL","rows_per_partition":"ALL"},cdc={},defaultTimeToLive=0,minIndexInterval=128,maxIndexInterval=2048,speculativeRetry=99.0PERCENTILE,triggers=[],isDense=false,in_memory=false,version=5529c631-c47a-11ef-bd1d-4295734ce5a8,droppedColumns={},collections={},indices={}] INFO 2024-12-27 19:45:16,564 [shard 0] schema_tables - Creating ks.test id=5529c630-c47a-11ef-bd1d-4295734ce5a8 version=ec88d510-6aff-344a-914d-541d37081440 ``` Upgraded to this branch and started scylla. Verified that ks.test was successfuly loaded: log: ``` INFO 2024-12-27 19:48:58,115 [shard 0:main] init - Scylla version 6.3.0~dev-0.20241227.a64c6dfc153e with build-id f9496134a09cf2e55d3865b9e9ff499f672aa7da starting ... ... WARN 2024-12-27 19:53:02,948 [shard 1:main] CompactionStrategy - InMemoryCompactionStrategy is no longer supported. Defaulting to NullCompactionStrategy. ... 
INFO 2024-12-27 19:53:02,948 [shard 0:main] database - Keyspace ks: Reading CF test id=5529c630-c47a-11ef-bd1d-4295734ce5a8 version=ec88d510-6aff-344a-914d-541d37081440 storage=/home/bhalevy/scylladb/data/ks/test-5529c630c47a11efbd1d4295734ce5a8 ``` Then, tested: ``` cqlsh> describe KEYSPACE ks; CREATE KEYSPACE ks WITH replication = {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true AND tablets = {'enabled': false}; CREATE TABLE ks.test ( pk int, val int, PRIMARY KEY (pk) ) WITH bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'} AND comment = '' AND compaction = {'class': 'InMemoryCompactionStrategy'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND speculative_retry = '99.0PERCENTILE'; cqlsh> alter TABLE ks.test with compaction = {'class': 'SizeTieredCompactionStrategy'}; cqlsh> describe KEYSPACE ks; CREATE KEYSPACE ks WITH replication = {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true AND tablets = {'enabled': false}; CREATE TABLE ks.test ( pk int, val int, PRIMARY KEY (pk) ) WITH bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'} AND comment = '' AND compaction = {'class': 'SizeTieredCompactionStrategy'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND speculative_retry = '99.0PERCENTILE' AND tombstone_gc = {'mode': 'timeout', 'propagation_delay_in_seconds': '3600'}; ``` log: ``` INFO 2024-12-27 19:56:40,465 [shard 0:stmt] 
migration_manager - Update table 'ks.test' From org.apache.cassandra.config.CFMetaData@0x60000362d800[cfId=5529c630-c47a-11ef-bd1d-4295734ce5a8,ksName==ks,cfName=test,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type),comment=,tombstoneGcOptions={"mode":"timeout","propagation_delay_in_seconds":"3600"},gcGraceSeconds=864000,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata=[ColumnDefinition{name=pk, type=org.apache.cassandra.db.marshal.Int32Type, kind=PARTITION_KEY, componentIndex=0, droppedAt=-9223372036854775808}, ColumnDefinition{name=val, type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, componentIndex=null, droppedAt=-9223372036854775808}],compactionStrategyClass=class org.apache.cassandra.db.compaction.InMemoryCompactionStrategy,compactionStrategyOptions={enabled=true},compressionParameters={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=0.01,memtableFlushPeriod=0,caching={"keys":"ALL","rows_per_partition":"ALL"},cdc={},defaultTimeToLive=0,minIndexInterval=128,maxIndexInterval=2048,speculativeRetry=99.0PERCENTILE,triggers=[],isDense=false,version=ec88d510-6aff-344a-914d-541d37081440,droppedColumns={},collections={},indices={}] To org.apache.cassandra.config.CFMetaData@0x60000336e000[cfId=5529c630-c47a-11ef-bd1d-4295734ce5a8,ksName==ks,cfName=test,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type),comment=,tombstoneGcOptions={"mode":"timeout","propagation_delay_in_seconds":"3600"},gcGraceSeconds=864000,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata=[ColumnDefinition{name=pk, type=org.apache.cassandra.db.marshal.Int32Type, kind=PARTITION_KEY, componentIndex=0, droppedAt=-9223372036854775808}, ColumnDefinition{name=val, type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, componentIndex=null, 
droppedAt=-9223372036854775808}],compactionStrategyClass=class org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,compactionStrategyOptions={enabled=true},compressionParameters={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=0.01,memtableFlushPeriod=0,caching={"keys":"ALL","rows_per_partition":"ALL"},cdc={},defaultTimeToLive=0,minIndexInterval=128,maxIndexInterval=2048,speculativeRetry=99.0PERCENTILE,triggers=[],isDense=false,version=ecccf010-c47b-11ef-b52c-622f2f0e87c4,droppedColumns={},collections={},indices={}] INFO 2024-12-27 19:56:40,466 [shard 0: gms] schema_tables - Altering ks.test id=5529c630-c47a-11ef-bd1d-4295734ce5a8 version=ecccf010-c47b-11ef-b52c-622f2f0e87c4 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#22068	2025-01-20 16:55:17 +02:00
Asias He	8208688178	Introduce file stream for tablet File based stream is a new feature that optimizes tablet movement significantly. It streams the entire SSTable files without deserializing SSTable files into mutation fragments and re-serializing them back into SSTables on receiving nodes. As a result, less data is streamed over the network, and less CPU is consumed, especially for data models that contain small cells. The following patches are imported from the scylla enterprise: ) Merge 'Introduce file stream for tablet' from Asias He This patch uses Seastar RPC stream interface to stream sstable files on network for tablet migration. It streams sstables instead of mutation fragments. The file based stream has multiple advantages over the mutation streaming. - No serialization or deserialization for mutation fragments - No need to read and process each mutation fragment - On wire data is more compact and smaller In the test below, a significant speed up is observed. Two nodes, 1 shard per node, 1 initial_tablets: - Start node 1 - Insert 10M rows of data with c-s - Bootstrap node 2 Node 1 will migrate data to node 2 with the file stream. 
Test results: 1) File stream: bytes on wire = 1132006250 bytes, bw = 836MB/s [shard 0:stre] stream_blob - stream_sstables[eadaa8e0-a4f2-4cc6-bf10-39ad1ce106b0] Finished sending sstable_nr=2 files_nr=18 files={} range=(-1,9223372036854775807] bytes_sent=1132006250 stream_bw=836MB/s [shard 0:stre] storage_service - Streaming for tablet migration of a4f68900-568a-11ee-b7b9-c2b13945eed2:1 took 1.08004s seconds 2) Mutation stream: bytes on wire = 3030004736 bytes, bw = 125410.87 KiB/s = 128MB/s [shard 0:stre] stream_session - [Stream #406dc8b0-56b5-11ee-bc2d-000bf4871058] Streaming plan for Tablet migration-ks1-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=2958989 KiB, 125410.87 KiB/s [shard 0:stre] storage_service - Streaming for tablet migration of a4f68900-568a-11ee-b7b9-c2b13945eed2:1 took 23.5992s seconds Test Summary: File stream v.s. Mutation stream improvements - Stream bandwidth = 836 / 128 (MB/s) = 6.53X - Stream time = 23.60 / 1.08 (Seconds) = 21.85X - Stream bytes on wire = 3030004736 / 1132006250 (Bytes)= 2.67X Closes scylladb/scylla-enterprise#3438 github.com:scylladb/scylla-enterprise: tests: Add file_stream_test streaming: Implement file stream for tablet ) streaming: Use new take_storage_snapshot interface The new take_storage_snapshot returns a file object instead of a file name. This allows the file stream sender to read from the file even if the file is deleted by compaction. Closes scylladb/scylla-enterprise#3728 ) streaming: Protect unsupported file types for file stream Currently, we assume the file streamed over the stream_blob rpc verb is a sstable file. This patch rejects the unsupported file types on the receiver side. This allows us to stream more file types later using the current file stream infrastructure without worrying about old nodes processing the new file types in the wrong way. 
- The file_ops::noop is renamed to file_ops::stream_sstables to be explicit about the file types - A missing test_file_stream_error_injection is added to the idl Fixes: #3846 Tests: test_unsupported_file_ops Closes scylladb/scylla-enterprise#3847 ) idl: Add service::session_id id to idl It will be used in the next patch. Refs #3907 ) streaming: Protect file stream with topology_guard Similar to "storage_service, tablets: Use session to guard tablet streaming", this patch protects file stream with topology_guard. Fixes #3907 ) streaming: Take service topology_guard under the try block Taking the service::topology_guard could throw. Currently, it throws outside the try block, so the rpc sink will not be closed, causing the following assertion: ``` scylla: seastar/include/seastar/rpc/rpc_impl.hh:815: virtual seastar::rpc::sink_impl<netw::serializer, streaming::stream_blob_cmd_data>::~sink_impl() [Serializer = netw::serializer, Out = <streaming::stream_blob_cmd_data>]: Assertion `this->_con->get()->sink_closed()' failed. ``` To fix, move more code including the topology_guard taking code to the try block. Fixes https://github.com/scylladb/scylla-enterprise/issues/4106 Closes scylladb/scylla-enterprise#4110 ) Merge 'Preserve original SSTable state with file based tablet migration' from Raphael "Raph" Carvalho We're not preserving the SSTable state across file based migration, so staging SSTables for example are being placed into main directory, and consequently, we're mixing staging and non-staging data, losing the ability to continue from where the old replica left off. It's expected that the view update backlog is transferred from old into new replica, as migration doesn't wait for leaving replica to complete view update work (which can take long). Elasticity is preferred. So this fix guarantees that the state of the SSTable will be preserved by propagating it in form of subdirectory (each subdirectory is statically mapped with a particular state). 
The staging sstables aren't being registered into view update generator yet, as that's supposed to be fixed in OSS (more details can be found at https://github.com/scylladb/scylladb/issues/19149). Fixes #4265. Closes scylladb/scylla-enterprise#4267 * github.com:scylladb/scylla-enterprise: tablet: Preserve original SSTable state with file based tablet migration sstables: Add get method for sstable state ) sstable: (Re-)add shareabled_components getter ) Merge 'File streaming sstables: Use sstable source/sink to transfer snapshots' from Calle Wilund Fixes #4246 Alternative approach/better separation of concern, transport vs. sstable layer. Builds on #4472, but fancier. Ensures we transfer and pre-process scylla metadata for streamed file blobs first, then properly apply receiving nodes local config by using a source and sink layer exported from sstables, which handles things like ordering, metadata filtering (on source) as well as handling metadata and proper IO paths when writing data on receiver node (sink). This implementation maintains the statelessness of the current design, and the delegated sink side will re-read and re-write the metadata for each component processed. This is a little wasteful, but the meta is small, and it is less error prone than trying to do caching cross-shards etc. The transport is isolated from the knowledge. This is an alternative/complement to #4436 and #4472, fixing the underlying issue. Note that while the layers/API:s here allows easy fixing of other fundamental problems in the feature (such as destination location etc), these are not included in the PR, to keep it as close to the current behaviour as possible. 
Closes scylladb/scylla-enterprise#4646 * github.com:scylladb/scylla-enterprise: raft_tests: Copy/add a topology test with encryption file streaming: Use sstable source/sink to transfer snapshots sstables: Add source and sink objects + producers for transfering a snapshot sstable::types: Add remove accessor for extension info in metadata ) The change for error injection in merge commit 966ea5955dd8760: File streaming now has "stream_mutation_fragments" error injection points so test_table_dropped_during_streaming works with file streaming. ) doc: document file-based streaming This commit adds a description of the file-based streaming feature to the documentation. It will be displayed in the docs using the scylladb_include_flag directive after https://github.com/scylladb/scylladb/pull/20182 is merged, backported to branch-6.0, and, in turn, branch-2024.2. Refs https://github.com/scylladb/scylla-enterprise/issues/4585 Refs https://github.com/scylladb/scylla-enterprise/issues/4254 Closes scylladb/scylla-enterprise#4587 ) doc: move File-based streaming to the Tablets source file-based-streaming This commit moves the description of file-based streaming from a common include file to the regular doc source file where tablets are described. Closes scylladb/scylla-enterprise#4652 ) streaming: sstable_stream_sink_impl: abort: prevent null pointer dereference Closes scylladb/scylladb#22034	2025-01-20 16:43:21 +02:00
Benny Halevy	f3ab00e61c	test: enable_create_table_with_compact_storage for tests that need it Now enable_create_table_with_compact_storage can be set to `false` by default in db/config. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-01-20 08:14:37 +02:00
Benny Halevy	0110eb0506	config: add enable_create_table_with_compact_storage As discussed in https://github.com/scylladb/scylladb/issues/12263#issuecomment-1853576813, compact storage tables are deprecated. Yet, there is nothing in the code that prevents users from creating such tables. This patch adds a live-updateable config option: `enable_create_table_with_compact_storage` that requires users to opt in in order to create new tables WITH COMPACT STORAGE. The option is currently set to `true` by default in db/config to reduce the churn to tests and to `false` in scylla.yaml, for new clusters. TODO: once regression tests that use compact storage are converted to enable the option, change the default in db/config to false. A unit test was added to test/cql-pytest that checks that the respective cql query fails as expected with the default option or when it is explicitly set to `false`, and that the query succeeds when the option is set to `true`. Note that `check_restricted_table_properties` already returns an optional warning, but it is only logged, not returned in the `prepared_statement`. Fixing that is out of the scope of this patch. See https://github.com/scylladb/scylladb/issues/20945 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-01-20 08:03:25 +02:00
Piotr Dulikowski	6aa962f5f4	Merge 'Add audit subsystem for database operations' from Paweł Zakrzewski Introduces a comprehensive audit system to track database operations for security and compliance purposes. This change includes: Core Components: - New audit subsystem for logging database operations - Service level integration for proper resource management - CQL statement tracking with operation categories - Login process integration for tenant management Key Features: - Configurable audit logging (syslog/table) - Operation categorization (QUERY/DML/DDL/DCL/AUTH/ADMIN) - Selective auditing by keyspace/table - Password sanitization in audit logs - Service level shares support (1-1000) for workload prioritization - Proper lifecycle management and cleanup I ran the dtests for audit (manually enabled) and they pass. The in-repo tests pass. Notably, there should be no non-whitespace changes between this and scylla-enterprise Fixes scylladb/scylla-enterprise#4999 Closes scylladb/scylladb#22147 * github.com:scylladb/scylladb: audit: Add shares support to service level management audit: Add service level support to CQL login process audit: Add support to CQL statements audit: Integrate audit subsystem into Scylla main process audit: Add documentation for the audit subsystem audit: Add the audit subsystem	2025-01-17 13:14:55 +01:00
Kamil Braun	89ee2a6834	Merge 'drop ip addresses from token metadata' from Gleb Now that all topology related code uses host ids there is no point in maintaining ip to id (and back) mappings in the token metadata. After the patch the mapping will be maintained in the gossiper only. The rest of the system will use host ids and in rare cases where translation is needed (mostly for UX compatibility reasons) the translation will be done using gossiper. Fixes: scylladb/scylla#21777 * 'gleb/drop-ip-from-tm-v3' of github.com:scylladb/scylla-dev: (57 commits) hint manager: do not translate ip to id in case hint manager is stopped already locator: token_metadata: drop update_host_id() function that does nothing now locator: topology: drop indexing by ips repair: drop unneeded code storage_service: use host_id to look for a node in on_alive handler storage_proxy: translate ips to ids in forward array using gossiper locator: topology: remove unused functions storage_service: check for outdated ip in on_change notification in the peers table storage_proxy: translate id to ip using address map in tablets's describe_ring code instead of taking one from the topology topology coordinator: change connection dropping code to work on host ids cql3: report host id instead of ip in error during SELECT FROM MUTATION_FRAGMENTS query locator: drop unused function from tablet_effective_replication_map api: view_build_statuses: do not use IP from the topology, but translate id to ip using address map instead locator: token_metadata: remove unused ip based functions locator: network_topology_strategy: use host_id based function to check number of endpoints in dcs gossiper: drop get_unreachable_token_owners functions storage_service: use gossiper to map ip to id in node_ops operations storage_service: fix indentation after the last patch storage_service: drop loops from node ops replace_prepare handling since there can be only one replacing node token_metadata: drop no longer used functions ...	2025-01-17 11:00:52 +01:00
Avi Kivity	d6f7f873d0	utils: config_file: don't use extern fully specialized variable templates Declaring-but-not-defining a fully specialized template is a great way to cut dependencies between users and providers, but unfortunately not supported for variable templates. Clang 18 does support it, but apparently it is a misinterpretation of the standard, and was removed in clang 19. We started using this non-feature in `7ed89266b3`. The fix is to use function templates. This is more verbose as each specialization needs to define a static variable to return, but is fully supported. Closes scylladb/scylladb#22299	2025-01-17 11:06:50 +03:00
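The fix described in `d6f7f873d0` can be illustrated with a minimal sketch. The names below (`type_descriptor`, `descriptor_for`) are hypothetical and are not taken from `utils/config_file`; the sketch only shows the general pattern of declaring a function template, declaring its explicit specializations for users, and defining each specialization with a static variable to return.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-type descriptor; not the real ScyllaDB type.
struct type_descriptor {
    std::string name;
};

// What users see: a primary function template that is declared but never
// defined, plus declarations of the explicit specializations.
template <typename T>
const type_descriptor& descriptor_for();

template <> const type_descriptor& descriptor_for<int>();
template <> const type_descriptor& descriptor_for<std::string>();

// What the provider defines (normally in a separate .cc file): each
// specialization owns a function-local static and returns a reference to it.
template <>
const type_descriptor& descriptor_for<int>() {
    static const type_descriptor d{"integer"};
    return d;
}

template <>
const type_descriptor& descriptor_for<std::string>() {
    static const type_descriptor d{"string"};
    return d;
}

// Small usage check: every call returns the same static object.
inline bool returns_same_object() {
    const bool same = &descriptor_for<int>() == &descriptor_for<int>();
    assert(same);
    return same;
}
```

This is more verbose than a variable template, as the commit message notes, but declaring and later defining explicit specializations of a function template is standard C++.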
Gleb Natapov	a40e810442	hint manager: do not translate ip to id in case hint manager is stopped already Since we do not stop storage proxy on shutdown this code can be called during shutdown when address map is no longer usable.	2025-01-16 16:37:08 +02:00
Gleb Natapov	122d58b4ad	api: view_build_statuses: do not use IP from the topology, but translate id to ip using address map instead	2025-01-16 16:37:07 +02:00
Gleb Natapov	7c4c485651	host_id_or_endpoint: use gossiper to resolve ip to id and back mappings host_id_or_endpoint is a helper class that holds either an id or an ip and translates one into the other on demand. Use gossiper to do the translation there instead of token_metadata since we want to drop ip-based APIs from the latter.	2025-01-16 16:37:07 +02:00
Gleb Natapov	ae8dc595e1	hints: move id to ip translation into store_hint() function Also use gossiper to translate instead of token_metadata since we want to get rid of ip-based APIs there.	2025-01-16 16:37:06 +02:00
Gleb Natapov	8a0fea5fef	locator: topology: drop is_me ip overload along with remaining users	2025-01-16 16:37:06 +02:00
Gleb Natapov	8433947932	locator: topology: remove get_location overload that works on ip and its last users	2025-01-16 16:37:06 +02:00
Gleb Natapov	315db647dd	consistency_level: drop templates since the same types of ranges are used by all the callers	2025-01-16 16:37:06 +02:00
Gleb Natapov	8c85350d4b	db/virtual_tables: use host id from the gossiper endpoint state in cluster_status table The state always has host id now, so there is no point in looking it up in the token metadata.	2025-01-15 16:30:28 +02:00
Gleb Natapov	844cb090bf	view: do not use get_endpoint_for_host_id_if_known to check if a node is part of the topology Check directly in the topology instead.	2025-01-15 16:30:28 +02:00
Gleb Natapov	f685c7d0af	hints: use gossiper to map ip to id in wait_for_sync_point We want to drop ips from token_metadata so move to a different API to map ip to id.	2025-01-15 16:30:28 +02:00
Gleb Natapov	4d7c05ad82	hints: move create_hint_sync_point function to host ids One of its callers is in the RESTful API, which gets ips from the user, so we convert ips to ids inside the API handler using gossiper before calling the function. We need to deprecate the ip-based API and move to a host-id-based one.	2025-01-15 16:30:28 +02:00
Gleb Natapov	0d4d066fe3	hints: simplify can_send() function Since there is gossiper::is_alive version that works on host_id now there is no need to convert _ep_key to ip which simplifies the code a lot.	2025-01-15 16:30:28 +02:00
Botond Dénes	25fbe488ef	Merge 'view_builder: write status to tables before starting to build' from Michael Litvak When adding a new view for building, first write the status to the system tables and then add the view building step that will start building it. Otherwise, if we start building it before the status is written to the table, it may happen that we complete building the view, write the SUCCESS status, and then overwrite it with the STARTED status. The view_build_status table will remain in an incorrect state indicating the view building is not complete. Fixes #20638 The PR contains a few additional small fixes in separate commits related to the view build status table. It addresses flakiness issues in tests that use the view build status table to determine when view building is complete. The table may be in an incorrect state due to these issues, having a row with status STARTED when it actually finished building the view, which will cause us to wait in `wait_for_view` until it times out. For testing I used a test similar to `test_view_build_status_with_replace_node`, but it only creates the views and calls `wait_for_view`. Without these commits it failed in 4/1024 runs, and with the commits it passed 2048/2048. Backport to fix the bugs that affect previous versions and improve CI stability Closes scylladb/scylladb#22307 * github.com:scylladb/scylladb: view_builder: hold semaphore during entire startup view_builder: pass view name by value to write_view_build_status view_builder: write status to tables before starting to build	2025-01-15 15:01:17 +02:00
Paweł Zakrzewski	384641194a	audit: Add the audit subsystem This change introduces a new audit subsystem that allows tracking and logging of database operations for security and compliance purposes. Key features include: - Configurable audit logging to either syslog or a dedicated system table (audit.audit_log) - Selective auditing based on: - Operation categories (QUERY, DML, DDL, DCL, AUTH, ADMIN) - Specific keyspaces - Specific tables - New configuration options: - audit: Controls audit destination (none/syslog/table) - audit_categories: Comma-separated list of operation categories to audit - audit_tables: Specific tables to audit - audit_keyspaces: Specific keyspaces to audit - audit_unix_socket_path: Path for syslog socket - audit_syslog_write_buffer_size: Buffer size for syslog writes The audit logs capture details including: - Operation timestamp - Node and client IP addresses - Operation category and query - Username - Success/failure status - Affected keyspace and table names	2025-01-15 11:10:35 +01:00
Kefu Chai	7215d4bfe9	utils: do not include unused headers these unused includes were identified by clang-include-cleaner. after auditing these source files, all of the reports have been confirmed. please note, because quite a few source files relied on `utils/to_string.hh` to pull in the specialization of `fmt::formatter<std::optional<T>>`, after removing `#include <fmt/std.h>` from `utils/to_string.hh`, we have to include `fmt/std.h` directly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2025-01-14 07:56:39 -05:00
Michael Litvak	7a6aec1a6c	view_builder: hold semaphore during entire startup Guard the whole view builder startup routine by holding the semaphore until it's done instead of releasing it early, so that it's not intercepted by migration notifications.	2025-01-14 12:31:29 +02:00
Michael Litvak	1104411f83	view_builder: pass view name by value to write_view_build_status The function write_view_build_status takes two lambda functions and chooses which of them to run depending on the upgrade state. It might run both of them. The parameters ks_name and view_name should be passed by value instead of by reference because they are moved inside each lambda function. Otherwise, if both lambdas are run, the second call operates on invalid values that were moved.	2025-01-14 12:31:29 +02:00
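The hazard described in `1104411f83` can be sketched in isolation. The code below is not the real `write_view_build_status`; `write_status_both_paths` and the two lambdas are hypothetical stand-ins. It shows why a callable that moves its argument is safe to invoke twice when it receives the argument by value: each call gets its own copy, so the second call does not see a moved-from string.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a function that may run two writers with the
// same view name (for example, one per upgrade state).
inline std::vector<std::string> write_status_both_paths(const std::string& view_name) {
    std::vector<std::string> written;

    // Each writer takes the name BY VALUE and moves it into storage.
    auto write_legacy = [&written](std::string name) {
        written.push_back(std::move(name));
    };
    auto write_current = [&written](std::string name) {
        written.push_back(std::move(name));
    };

    // Both writers run. Because the parameter is a by-value copy, the move
    // inside the first writer cannot invalidate what the second one receives.
    write_legacy(view_name);
    write_current(view_name);

    assert(written.size() == 2);
    return written;
}
```

Had the writers taken `std::string&` and moved from it, the second call would operate on a moved-from string, whose contents are valid but unspecified.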
Michael Litvak	b1be2d3c41	view_builder: write status to tables before starting to build When adding a new view for building, first write the status to the system tables and then add the view building step that will start building it. Otherwise, if we start building it before the status is written to the table, it may happen that we complete building the view, write the SUCCESS status, and then overwrite it with the STARTED status. The view_build_status table will remain in incorrect state indicating the view building is not complete. Fixes scylladb/scylladb#20638	2025-01-14 12:31:20 +02:00
Avi Kivity	814942505f	Merge 'Introduce Encryption-at-Rest (EAR) for sstables and commitlog' from Calle Wilund Fixes https://github.com/scylladb/scylla-enterprise/issues/5016#issuecomment-2558464631 EAR - encryption at rest. Allows on-disk file encryption of sstables and commitlog data. Introduces OpenSSL based file level encrypted storage, managed via a set of providers ranging from local files to cloud KMS providers. For a more comprehensive explanation, see the included docs (or if possible, original source tree). Manual bulk merge of EAR feature from enterprise repo to main scylla repo. Breaks some features apart, but main EAR is still a humongous commit, because to separate this I would have to mess with code incrementally, adding time and risk. This PR includes the local file gen tool, tests and also p11 validation. Note: CI will not execute the full tests unless master CI is set to provide the same environment as the enterprise one. Not sure about the status of this ATM. Note: Includes code to compile against cryptsoft kmipc SDK, but not the SDK. If you happen to check out this tree in the scylla folder and configure, it will be linked against and KMIP functionality will be enabled, otherwise not. Closes scylladb/scylladb#22233 * github.com:scylladb/scylladb: docs: Add EAR docs main/build: Add p11-kit and initialize tools: Add local-file-key-generator tool tests: Add EAR tests tmpdir: shorten test tempdir path EAR: port the ear feature from enterprise cql_test_env: Add optional query timeout schema/migration_manager: Add schema validate sstables: add get_shared_components accessor config/config_file: Add exports and definitions of config_type_for<>	2025-01-12 16:10:46 +02:00
Benny Halevy	8d2ff8a915	utils: add disk_space_monitor Instantiated only on shard 0. Currently, only subscribe from unit test Manual unit test using loop mount was added. Note that the test requires sudo access and root access to /dev/loop, so it cannot run in rootless podman instance, and it'd fail with Permission denied. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#21523	2025-01-12 14:51:15 +02:00
Piotr Smaron	288f9b2b15	Introduce LDAP role manager & saslauthd authenticator This PR extends authentication with 2 mechanisms: - a new role_manager subclass, which allows managing users via LDAP server, - a new authenticator, which delegates plaintext authentication to a running saslauthd daemon. The features have been ported from the enterprise repository with their test.py tests and the documentation as part of changing license to source available. Fixes: scylladb/scylla-enterprise#5000 Fixes: scylladb/scylla-enterprise#5001 Closes scylladb/scylladb#22030	2025-01-12 14:50:29 +02:00
Michael Litvak	2a8ff478f0	view_builder: register listener for new views before reading views When starting the view builder, we find all existing views in `calculate_shard_build_step` and then register a listener for new views. Between these steps we may yield and create a new view, then we miss initializing the view build step for the new view, and we won't start building it. To fix this we first register the listener and then read existing views, so a view can't be missed. Fixes scylladb/scylladb#20338 Closes scylladb/scylladb#22184	2025-01-09 13:18:28 +02:00
Calle Wilund	7ed89266b3	config/config_file: Add exports and definitions of config_type_for<> Required for implementors. Other than config.cc.	2025-01-08 12:50:03 +00:00
Michał Chojnowski	9f639b176f	db/config: increase the default value of internode_compression_zstd_min_message_size from 0 to 1024 Usually, the smaller the message, the higher the CPU cost per each network byte saved by compression, so it often makes sense to reserve heavier compression for bigger messages (where it can make the biggest impact for a given CPU budget) and use lighter compression for smaller messages. There is a knob -- internode_compression_zstd_min_message_size -- which excludes RPC messages below a certain size from being compressed with zstd. We arbitrarily set its default to 0 bytes before. Now we want to arbitrarily set it to 1024 bytes. This is based purely on intuition and isn't backed by any solid data. Fixes scylladb/scylla-enterprise#4731 Closes scylladb/scylla-enterprise#4990 Closes scylladb/scylladb#22204	2025-01-07 18:14:01 +02:00
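A size threshold of the kind described in `9f639b176f` might look like the sketch below. This is an illustration only, not ScyllaDB's RPC compression code: the enum, the function name, and the assumption that a message exactly at the threshold is eligible for zstd are all hypothetical. Only the option name and the 0 and 1024 byte defaults come from the commit message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical choice between a heavier and a lighter algorithm.
enum class compression_choice { lighter, zstd };

// Mirrors the new default of internode_compression_zstd_min_message_size.
constexpr std::size_t default_zstd_min_message_size = 1024;

// Messages below the threshold skip zstd and fall back to the lighter
// algorithm. Treating the boundary as inclusive is an assumption.
constexpr compression_choice choose_compression(
        std::size_t message_size,
        std::size_t zstd_min_message_size = default_zstd_min_message_size) {
    return message_size >= zstd_min_message_size
            ? compression_choice::zstd
            : compression_choice::lighter;
}

// With the old default of 0, every message was eligible for zstd.
static_assert(choose_compression(100, 0) == compression_choice::zstd);

inline void check_small_message_uses_lighter() {
    assert(choose_compression(100) == compression_choice::lighter);
}
```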
Wojciech Mitros	d04f376227	mv: add an experimental feature for creating views using tablets We still have a number of issues to be solved for views with tablets. Until they are fixed, we should prevent users from creating them, and use the vnode-based views instead. This patch prepares the feature for enabling views with tablets. The feature is disabled by default, but currently it has no effect. After all tests are adjusted to use the feature, we should depend on the feature for deciding whether we can create materialized views in tablet-enabled keyspaces. The unit tests are adjusted to enable this feature explicitly, and it's also added to the scylla sstable tool config - this tool treats all tables as if they were tablet-based (surprisingly, with SimpleStrategy), so for it to work on views, the new feature must be enabled. Refs scylladb/scylladb#21832 Closes scylladb/scylladb#21833	2025-01-07 15:52:36 +01:00
Asias He	d719f423e5	config: Enable enable_small_table_optimization_for_rbno by default Since the problematic dtests now run with enable_small_table_optimization_for_rbno turned off, we can enable the flag by default. https://github.com/scylladb/scylla-dtest/pull/5383 Refs: #19131 Closes scylladb/scylladb#21861	2025-01-07 16:20:36 +02:00
Michael Litvak	0617564123	db/commitlog: make the commit log hard limit mandatory mark the config parameter --commitlog-use-hard-size-limit as deprecated so the default 'true' is always used, making the hard limit mandatory. Fixes scylladb/scylladb#16471 Closes scylladb/scylladb#21804	2025-01-07 15:03:56 +02:00
Botond Dénes	b3f8c4faa7	Merge 'node_ops: filter topology_requests entries shown by node_ops_virtual_task' from Aleksandra Martyniuk node_ops_virtual_task does not filter the entries of system.topology_request and so it creates statuses of operations that aren't node ops. Filter the entries used by node_ops_virtual_task. With this change, the status of a bootstrap of the first node will not be visible. Fixes: https://github.com/scylladb/scylladb/issues/22008. Needs backport to 6.2 that introduced node_ops_virtual_task Closes scylladb/scylladb#22009 * github.com:scylladb/scylladb: test: truncate the table before node ops task checks node_ops: rename a method that get node ops entries node_ops: filter topology_requests entries	2025-01-07 14:17:01 +02:00
Kefu Chai	353b522ca0	treewide: migrate from boost::adaptors::reversed to std::views::reverse now that we are allowed to use C++23, we have the luxury of using `std::views::reverse`. - replace `boost::adaptors::transformed` with `std::views::transform` - remove unused `#include <boost/range/adaptor/reversed.hpp>` this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2025-01-07 13:22:00 +02:00

4125 Commits