scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 17:10:35 +00:00

Author	SHA1	Message	Date
Anna Stuchlik	d303edbc39	doc: remove copyright from Cassandra Stress This commit removes the Apache copyright note from the Cassandra Stress page. It's a follow-up to https://github.com/scylladb/scylladb/pull/21723, which missed that update (see https://github.com/scylladb/scylladb/pull/21723#discussion_r1944357143). Cassandra Stress is a separate tool with its own repository and documentation, so the copyright information on the page is incorrect. Fixes https://github.com/scylladb/scylladb/issues/23240 Closes scylladb/scylladb#24219	2025-05-26 09:35:30 +02:00
Pavel Emelyanov	c0796244bb	nodetool: Add refresh --skip-cleanup option The option "conflicts" with load-and-stream. Tests and doc included. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-05-13 19:07:38 +03:00
Kefu Chai	46f7ff6cfc	docs: nodetool: reference "nodetool task" page * Rewrite the documentation for the "nodetool restore" command. * Clarify the relationship between the `--nowait` flag and asynchronous operation. * Reference the "nodetool task" page for managing background tasks. Fixes scylladb#21888 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22023	2025-05-12 15:37:22 +03:00
Pavel Emelyanov	eb5b52f598	Merge 'main: make DC and rack immutable after bootstrap' from Piotr Dulikowski Changing DC or rack on a node which was already bootstrapped is, in the case of vnodes, very unsafe (almost guaranteed to cause data loss or unavailability), and is outright not supported if the cluster has tablet-backed keyspaces. Moreover, the possibility of doing that makes it impossible to uphold some of the invariants promised by the RF-rack-valid flag, which is eventually going to become unconditionally enabled. Get rid of the above problems by removing the possibility of changing the DC / rack of a node. A node will now fail to start if its snitch reports a different DC or rack than the one that was reported during the first boot. Fixes: scylladb/scylladb#23278 Fixes: scylladb/scylladb#22869 Marking for backport to 2025.1, as this is a necessary part of the RF-rack-valid saga Closes scylladb/scylladb#23800 * github.com:scylladb/scylladb: doc: changing topology when changing snitches is no longer supported test: cluster: introduce test_no_dc_rack_change storage_service: don't update DC/rack in update_topology_with_local_metadata main: make dc and rack immutable after bootstrap test: cluster: remove test_snitch_change	2025-04-21 15:52:55 +03:00
Piotr Dulikowski	325a89638c	doc: changing topology when changing snitches is no longer supported Update the "How to Switch Snitches" document to indicate that changing topology (i.e. changing a node's DC or rack) while changing the snitch is no longer supported. Remove a note which said that switching snitches is not supported with tablets. It was introduced because of the concern that switching a snitch might change the DC or rack of the node, for which our current tablet load balancer is completely unprepared. Now that changing DC/rack is forbidden, there doesn't seem to be anything related to snitches which could cause trouble for tablets.	2025-04-17 16:22:58 +02:00
Aleksandra Martyniuk	9769d7a564	docs: nodetool: update repair and add tablet-repair docs	2025-04-08 09:13:14 +02:00
Pavel Emelyanov	2ee9cec1d3	Merge 'Remove object_storage.yaml and move the endpoints to scylla.yaml' from Robert Bindar Move `object_storage.yaml` endpoints to `scylla.yaml` This change also removes the `object_storage.yaml` file altogether and adds tests for fetching the endpoints via the `v2/config/object_storage_endpoints` REST API. Also, the `object_storage_config_file` option is moved to a deprecated state as it's no longer needed. This PR depends on #22951; reviewers should review patch 393e1ac0ec066475ca94094265a5f88dbbdb1a1f Refs https://github.com/scylladb/scylladb/issues/22428 Closes scylladb/scylladb#22952 * github.com:scylladb/scylladb: Remove db::config::object_storage_config Move `object_storage.yaml` endpoints to `scylla.yaml`	2025-04-01 16:01:44 +03:00
Michał Chojnowski	36be9d1c9b	docs: add user-facing documentation for SSTable compression with shared dicts	2025-04-01 00:07:31 +02:00
Robert Bindar	e3a3508960	Move `object_storage.yaml` endpoints to `scylla.yaml` This change also removes the `object_storage.yaml` file altogether and adds tests for fetching the endpoints via the `v2/config/object_storage_endpoints` REST api. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2025-03-31 13:39:39 +03:00
Kefu Chai	1ab2b7e7a0	tree: fix misspellings these two misspellings were flagged by codespell. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#23357	2025-03-19 09:13:20 +02:00
Anna Stuchlik	dbbf9e19e4	doc: remove the outdated info on seeds-info This commit removes the outdated information about seed nodes. We no longer need it in the docs, as a) the documentation is versioned, and b) the ScyllaDB Open Source 4.3 and ScyllaDB Enterprise 2021.1 versions mentioned in the docs are no longer supported. In addition, some clarification has been added to the existing sections. Fixes https://github.com/scylladb/scylladb/issues/22400 Closes scylladb/scylladb#23282	2025-03-17 13:53:48 +03:00
Ernest Zaslavsky	112b4c8764	docs: Update Scylla Tools Documentation for S3 SSTable Support Updated the Scylla Tools documentation to include changes related to the enhanced support for S3-stored SSTables. This update ensures that the documentation accurately reflects the latest functionality and improvements.	2025-03-09 09:50:37 +02:00
Anna Stuchlik	9ac0aa7bba	doc: zero-token nodes and Arbiter DC This commit adds documentation for zero-token nodes and an explanation of how to use them to set up an arbiter DC to prevent a quorum loss in multi-DC deployments. The commit adds two documents: - The one in Architecture describes zero-token nodes. - The other in Cluster Management explains how to use them. We need separate documents because zero-token nodes may be used for other purposes in the future. In addition, the documents are cross-linked, and the link is added to the Create a ScyllaDB Cluster - Multi Data Centers (DC) document. Refs https://github.com/scylladb/scylladb/pull/19684 Fixes https://github.com/scylladb/scylladb/issues/20294 Closes scylladb/scylladb#21348	2025-03-07 16:39:02 +01:00
Avi Kivity	28906c9261	Merge 'scylla-sstable: introduce the query command' from Botond Dénes The scylla-sstable dump-* command suite has proven invaluable in many investigations. In certain cases, however, I found that `dump-data` is quite cumbersome. An example would be trying to find certain values in an sstable, or trying to read the content of system tables when a node is down. For these cases, `dump-data` is very cumbersome: one has to trudge through tons of uninteresting metadata and do compaction in their heads. This PR introduces the new scylla-sstable query command, specifically targeted at situations like this: it allows executing queries on sstables, exposing to the user all the power of CQL, to tailor the output as they see fit. Select everything from a table: $ scylla sstable query --system-schema /path/to/data/system_schema/keyspaces-/-big-Data.db keyspace_name | durable_writes | replication -------------------------------+----------------+------------------------------------------------------------------------------------- system_replicated_keys | true | ({class : org.apache.cassandra.locator.EverywhereStrategy}) system_auth | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 1}) system_schema | true | ({class : org.apache.cassandra.locator.LocalStrategy}) system_distributed | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 3}) system | true | ({class : org.apache.cassandra.locator.LocalStrategy}) ks | true | ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1}) system_traces | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 2}) system_distributed_everywhere | true | ({class : org.apache.cassandra.locator.EverywhereStrategy}) Select everything from a single SSTable, using the JSON output (filtered through [jq](https://jqlang.github.io/jq/) for better readability): $ scylla sstable query --system-schema --output-format=json /path/to/data/system_schema/keyspaces-/me-3gm7_127s_3ndxs28xt4llzxwqz6-big-Data.db | jq [ { "keyspace_name": "system_schema", "durable_writes": true, "replication": { "class": "org.apache.cassandra.locator.LocalStrategy" } }, { "keyspace_name": "system", "durable_writes": true, "replication": { "class": "org.apache.cassandra.locator.LocalStrategy" } } ] Select a specific field in a specific partition using the command-line: $ scylla sstable query --system-schema --query "select replication from scylla_sstable.keyspaces where keyspace_name='ks'" ./scylla-workdir/data/system_schema/keyspaces-/-Data.db replication ------------------------------------------------------------------------------------- ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1}) Select a specific field in a specific partition using ``--query-file``: $ echo "SELECT replication FROM scylla_sstable.keyspaces WHERE keyspace_name='ks';" > query.cql $ scylla sstable query --system-schema --query-file=./query.cql ./scylla-workdir/data/system_schema/keyspaces-/-Data.db replication ------------------------------------------------------------------------------------- ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1}) New functionality: no backport needed. Closes scylladb/scylladb#22007 github.com:scylladb/scylladb: docs/operating-scylla: document scylla-sstable query test/cqlpy/test_tools.py: add tests for scylla-sstable query test/cqlpy/test_tools.py: make scylla_sstable() return table name also scylla-sstable: introduce the query command tools/utils: get_selected_operation(): use std::string for operation_options utils/rjson: streaming_writer: add RawValue() cql3/type_json: add to_json_type() test/lib/cql_test_env: introduce do_with_cql_env_noreentrant_in_thread()	2025-03-06 13:42:45 +02:00
Petr Hála	f3c3eb6ae3	doc: Fix object_storage_config_file option It needs to use underscores, not dashes. Closes scylladb/scylladb#23161	2025-03-06 10:30:51 +03:00
Benny Halevy	55dbf5493c	docs: document the views-with-tablets experimental feature Refs scylladb/scylladb#22217 Fixes scylladb/scylladb#22893 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#22896	2025-02-24 17:23:08 +01:00
Anna Stuchlik	d0a48c5661	doc: remove the reference to the 6.2 version This commit removes the OSS version name, which is irrelevant and confusing for 2025.1 and later users. Also, it updates the warning to avoid specifying the release when the deprecated feature will be removed. Fixes https://github.com/scylladb/scylladb/issues/22839 Closes scylladb/scylladb#22936	2025-02-24 15:02:11 +02:00
Anna Stuchlik	a28bbc22bd	doc: remove references to Enterprise This commit removes the redundant references to Enterprise, which are no longer valid. Fixes https://github.com/scylladb/scylladb/issues/22927 Closes scylladb/scylladb#22930	2025-02-20 11:24:34 +02:00
Dusan Malusev	4e6ea232d2	docs: add instruction for installing cassandra-stress Signed-off-by: Dusan Malusev <dusan.malusev@scylladb.com> Closes scylladb/scylladb#21723	2025-02-19 09:25:16 +02:00
Botond Dénes	2e062e4e10	docs/operating-scylla: document scylla-sstable query	2025-02-18 07:37:05 -05:00
Calle Wilund	342df0b1a8	network_topology_strategy/alter ks: Remove dc:s from options once rf=0 Fixes #22688 If we set a dc rf to zero, the options map will still retain a dc=0 entry. If this dc is decommissioned, any further ALTERs of the keyspace will fail, because the union of new/old options will now contain an unknown keyword. Change alter ks options processing to simply remove any dc with rf=0 on alter, and treat this as an implicit dc=0 in nw-topo strategy. This means we change the reallocate_tablets routine to not rely on the strategy object's dc mapping, but on the full replica topology info for dc:s to consider for reallocation. Since we verify the input on attribute processing, the amount of rf/tablets moved should still be legal. v2: * Update docs as well. v3: * Simplify dc processing * Reintroduce options empty check, but do early in ks_prop_defs * Clean up unit test somewhat Closes scylladb/scylladb#22693	2025-02-15 20:32:22 +02:00
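The option-merging rule described in the commit above can be illustrated with a short Python sketch. It is not ScyllaDB's implementation (which is C++); the function names `merge_nts_options` and `rf_for` are hypothetical, and the sketch assumes the options map holds only per-DC replication factors as strings, with no `class` key.

```python
# Hypothetical sketch of the ALTER KEYSPACE rule from #22688; not ScyllaDB code.
# Assumption: the options map holds only "<dc>": "<rf>" entries (no 'class' key).


def merge_nts_options(old: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """Union old and new per-DC options, then drop every DC whose RF is 0."""
    merged = dict(old)
    merged.update(new)  # values given in the ALTER statement win
    # A DC with rf=0 is removed instead of being kept as a "dc": "0" entry,
    # so a later ALTER never carries a key for a decommissioned DC.
    return {dc: rf for dc, rf in merged.items() if int(rf) != 0}


def rf_for(options: dict[str, str], dc: str) -> int:
    """A DC missing from the options map is treated as an implicit rf=0."""
    return int(options.get(dc, "0"))


if __name__ == "__main__":
    opts = merge_nts_options({"dc1": "3", "dc2": "3"}, {"dc2": "0"})
    print(opts)                 # {'dc1': '3'}
    print(rf_for(opts, "dc2"))  # 0
```

Once dc2 is decommissioned, a further `merge_nts_options(opts, {"dc1": "2"})` yields `{'dc1': '2'}` without ever mentioning dc2, which is the failure mode the commit avoids.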
Pavel Emelyanov	951625ca13	Merge 's3 client: add aws credentials providers' from Ernest Zaslavsky This update introduces four types of credential providers: 1. Environment variables 2. Configuration file 3. AWS STS 4. EC2 Metadata service The first two providers should only be used for testing and local runs. They must NEVER be used in production. The last two providers are intended for use on real EC2 instances: - AWS STS: Preferred method for obtaining temporary credentials using IAM roles. - EC2 Metadata Service: Should be used as a last resort. Additionally, a simple credentials provider chain is created. It queries each provider sequentially until valid credentials are obtained. If all providers fail, it returns an empty result. fixes: #21828 Closes scylladb/scylladb#21830 * github.com:scylladb/scylladb: docs: update the `object_storage.md` and `admin.rst` aws creds: add STS and Instance Metadata service credentials providers aws creds: add env. and file credentials providers s3 creds: move credentials out of endpoint config	2025-02-06 11:12:37 +03:00
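The provider chain described in the merge above ("queries each provider sequentially until valid credentials are obtained"; empty result if all fail) can be sketched in Python as follows. This is an illustration under stated assumptions, not ScyllaDB's C++ code: `Credentials`, `env_provider` and `resolve` are hypothetical names, the environment variable names are the conventional AWS ones and are assumed here, and the "empty result" is modeled as `None`.

```python
# Hypothetical sketch of a sequential credentials-provider chain; not ScyllaDB code.
import os
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    access_key_id: str
    secret_access_key: str
    session_token: str = ""


# A provider returns credentials, or None when it has nothing to offer.
Provider = Callable[[], Optional[Credentials]]


def env_provider() -> Optional[Credentials]:
    # Assumption: conventional AWS variable names; for testing and local runs only.
    key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return Credentials(key, secret, os.environ.get("AWS_SESSION_TOKEN", ""))
    return None


def resolve(chain: list[Provider]) -> Optional[Credentials]:
    """Query each provider in order and return the first valid credentials."""
    for provider in chain:
        try:
            creds = provider()
        except Exception:
            creds = None  # a failing provider is skipped, like an empty one
        if creds is not None and creds.access_key_id and creds.secret_access_key:
            return creds
    return None  # every provider failed: the "empty result"
```

In a real deployment the chain would end with the STS and EC2 metadata providers, which this sketch does not model.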
Ernest Zaslavsky	29e60288de	docs: update the `object_storage.md` and `admin.rst` Added additional options and best practices for AWS authentication.	2025-02-05 14:57:19 +02:00
Aleksandra Martyniuk	683176d3db	tasks: add shard, start_time, and end_time to task_stats task_stats contains short info about a task. To get a list of task_stats in the module, one needs to request /task_manager/list_module_tasks/{module}. To make identification and navigation between tasks easier, extend task_stats to contain shard, start_time, and end_time. Closes scylladb/scylladb#22351	2025-02-04 12:11:24 +02:00
Aleksandra Martyniuk	477ad98b72	nodetool: tasks: print empty string for start_time/end_time if unspecified If start_time/end_time is unspecified for a task, task_manager API returns epoch. Nodetool prints the value in task status. Fix nodetool tasks commands to print empty string for start_time/end_time if it isn't specified. Modify nodetool tasks status docs to show empty end_time. Fixes: #22373. Closes scylladb/scylladb#22370	2025-01-30 11:29:36 +02:00
Botond Dénes	d8b8a6c5fc	Merge 'api: task_manager: do not unregister finish task when its status is queried' from Aleksandra Martyniuk Currently, when the status of a task is queried and the task is already finished, it gets unregistered. Getting the status shouldn't be a one-time operation. Stop removing the task after its status is queried. Adjust tests not to rely on this behavior. Add task_manager/drain API and nodetool tasks drain command to remove finished tasks in the module. Fixes: https://github.com/scylladb/scylladb/issues/21388. It's a fix to task_manager API, should be backported to all branches Closes scylladb/scylladb#22310 * github.com:scylladb/scylladb: api: task_manager: do not unregister tasks on get_status api: task_manager: add /task_manager/drain	2025-01-30 11:27:44 +02:00
Kefu Chai	ce2d235c88	docs: correct typo of "abd" to "and" Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22442	2025-01-28 14:11:02 +03:00
Anna Stuchlik	b2a718547f	doc: remove Enterprise labels and directives This PR removes the now-redundant Enterprise labels and directives from the ScyllaDB documentation. Fixes https://github.com/scylladb/scylladb/issues/22432 Closes scylladb/scylladb#22434	2025-01-27 16:01:48 +02:00
Anna Stuchlik	1d5ef3dddb	doc: enable the FIPS note in the ScyllaDB docs This commit moves the information about FIPS out of the '.. only:: enterprise' directive. As a result, the information will now show in the doc in the ScyllaDB repo (previously, the directive included the note in the Enterprise docs only). Refs https://github.com/scylladb/scylla-enterprise/issues/5020 Closes scylladb/scylladb#22374	2025-01-27 15:48:54 +02:00
Calle Wilund	bae5b44b97	docs: Remove configuration_encryptor Fixes #21993 Removes the configuration_encryptor mention from the docs. The tool itself (Java) is not included in the main branch Java tools, so it need not be removed from there; only the mention in the docs is removed. Closes scylladb/scylladb#22427	2025-01-27 15:45:18 +02:00
Aleksandra Martyniuk	18cc79176a	api: task_manager: do not unregister tasks on get_status Currently, /task_manager/task_status_recursive/{task_id} and /task_manager/task_status/{task_id} unregister the queried task if it has already finished. The status should not disappear after being queried. Do not unregister a finished task when its status or recursive status is queried.	2025-01-27 11:23:45 +01:00
Aleksandra Martyniuk	e37d1bcb98	api: task_manager: add /task_manager/drain In the following patches, get_status won't be unregistering finished tasks. However, tests need a way to drop a task, so that they can work only with the tasks for operations that were invoked by these tests. Add /task_manager/drain/{module} to unregister all finished tasks from the module. Add the respective nodetool command.	2025-01-27 11:23:45 +01:00
Anna Stuchlik	e340d6a452	doc: remove Open Source references in the docs Fixes https://github.com/scylladb/scylladb/issues/22325 Closes scylladb/scylladb#22377	2025-01-20 16:43:21 +02:00
Paweł Zakrzewski	702e727e33	audit: Add documentation for the audit subsystem Adds detailed documentation covering the new audit subsystem: - Add new audit.md design document explaining: - Core concepts and design decisions - CQL extensions for audit management - Implementation details and trigger evaluation - Prior art references from other databases - Add user-facing documentation: - New auditing.rst guide with configuration and usage details - Integration with security documentation index - Updates to cluster management procedures - Updates to security checklist The documentation covers all aspects of the audit system including: - Configuration options and storage backends (syslog/table) - Audit categories (DCL/DDL/AUTH/DML/QUERY/ADMIN) - Permission model and security considerations - Failure handling and logging - Example configurations and output formats This ensures users have complete guidance for setting up and using the new audit capabilities.	2025-01-15 11:10:35 +01:00
Nadav Har'El	15c252fd8f	Merge 'docs: Update documentation on CREATE ROLE WITH HASHED PASSWORD' from Dawid Mędrek As part of #18750, we added a CQL statement CREATE ROLE WITH SALTED HASH that prevented hashing a password when creating a role, effectively leading to inserting a hash given by the user directly into the database. In #21350, we noticed that Cassandra had implemented a CQL statement of similar semantics but different syntax. We decided to rename Scylla's statement to be compatible with Cassandra. Unfortunately, we didn't notice one more difference between what we had in Scylla and what was part of Cassandra. Scylla's statement was originally supposed to only be used when restoring the schema and the user needn't be aware of its existence at all: the database produced a sequence of CQL statements that the user saved to a file and when a need to restore the schema arose, they would execute the contents of the file. That's why, although we documented the feature, we did so only in the necessary places. Those that weren't related to the backup & restore procedure were deliberately skipped. Cassandra, on the other hand, added the statement for a different purpose (for details, see the relevant issue) and it was supposed to be used by the user by design. The statement is also documented as such. Since we want to preserve compatibility with Cassandra, we document the statement and its semantics in the user documentation, explicitly stating that it can be used by the user. We also add a test verifying that logging in works correctly. Fixes scylladb/scylladb#21691 Backport: not needed. The relevant code didn't make it to 6.2 or any previous version of OSS. Closes scylladb/scylladb#21752 * github.com:scylladb/scylladb: docs: Update documentation on CREATE ROLE WITH HASHED PASSWORD test/boost: Add test for creating roles with hashed passwords	2025-01-14 15:33:30 +02:00
Kefu Chai	f8885a4afd	dist/docker,docs: replace "--experimental" with "--experimental-features" The "--experimental" option was removed in commit `f6cca741ea`. Using this deprecated option now causes Scylla to fail with the error: ``` error: the argument ('on') for option '--experimental-features' is invalid ``` So, in this change, let's update the docker entry point script to use `--experimental-features` command line option instead. The related document is updated accordingly. Fixes scylladb/scylladb#22207 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22283	2025-01-14 07:56:38 -05:00
Geoff Montee	25e8478051	docs: rest.rst: use latest docker tag to view Swagger UI for REST API Closes scylladb/scylladb#21681	2025-01-14 07:56:38 -05:00
Botond Dénes	686a997c04	Merge 'Complete implementation of configuring IO bandwidth limits' from Pavel Emelyanov In Scylla there are two options that control the IO bandwidth limit -- the /storage_service/(compaction|stream)_throughput REST API endpoints. The endpoints are partially implemented and have no counterparts in nodetool. This set implements the missing bits and adds tests for the new functionality. Closes scylladb/scylladb#21877 * github.com:scylladb/scylladb: nodetool: Implement [gs]etstreamthroughput commands nodetool: Implement [gs]etcompationthroughput commands test: Add validation of how IO-updating endpoints work api: Implement /storage_service/(stream|compaction)_throughput endpoints api: Disqualify const config reference api: Implement /storage_service/stream_throughput endpoint api: Move stream throughput set/get endpoints from storage service block api: Move set_compaction_throughput_mb_per_sec to config block util: Include fmt/ranges.h in config_file.hh	2025-01-14 07:56:38 -05:00
Geoff Montee	c8ca2bd212	docs: operating-scylla/admin-tools/virtual-tables.rst: fix link to virtual tables Closes scylladb/scylladb#22198	2025-01-14 08:45:49 +02:00
Botond Dénes	f899f0e411	tools/scylla-sstable: dump-statistics: fix handling of {min,max}_column_names Said fields in statistics are of type `disk_array<uint32_t, disk_string<uint16_t>>` and currently are handled as array of regular strings. However these fields store exploded clustering keys, so the elements store binary data and converting to string can yield invalid UTF-8 characters that certain JSON parsers (jq, or python's json) can choke on. Fix this by treating them as binary and using `to_hex()` to convert them to string. This requires some massaging of the json_dumper: passing field offset to all visit() methods and using a caller-provided disk-string to sstring converter to convert disk strings to sstring, so in the case of statistics, these fields can be intercepted and properly handled. While at it, the type of these fields is also fixed in the documentation. Before: "min_column_names": [ "��Z��\u0011�\u0012ŷ4^��<", "�2y\u0000�}\u007f" ], "max_column_names": [ "��Z��\u0011�\u0012ŷ4^��<", "}��B\u0019l%^" ], After: "min_column_names": [ "9dd55a92bc8811ef12c5b7345eadf73c", "80327900e2827d7f" ], "max_column_names": [ "9dd55a92bc8811ef12c5b7345eadf73c", "7df79242196c255e" ], Fixes: #22078 Closes scylladb/scylladb#22225	2025-01-13 09:19:04 +03:00
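The problem fixed above, binary key components rendered as text producing invalid UTF-8 in JSON, can be reproduced with a few lines of Python. The sketch takes the second `min_column_names` entry from the "After" output, shows that its raw bytes are not valid UTF-8, and renders them as hex the way the fix's `to_hex()` conversion does. The helper `render_column_name` is a hypothetical name, not part of scylla-sstable.

```python
# Sketch: why binary clustering-key components need hex, not text, in JSON output.
import json


def render_column_name(raw: bytes) -> str:
    # Same idea as the fix: emit the raw bytes as lowercase hex.
    return raw.hex()


# Second min_column_names entry from the "After" output above.
raw = bytes.fromhex("80327900e2827d7f")

try:
    raw.decode("utf-8")  # treating the bytes as text fails
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

print(render_column_name(raw))  # 80327900e2827d7f
print(json.dumps({"min_column_names": [render_column_name(raw)]}))
```

Hex is lossless (`bytes.fromhex` recovers the original bytes) and always valid JSON, which is why it suits exploded clustering keys better than a best-effort string conversion.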
Avi Kivity	814942505f	Merge 'Introduce Encryption-at-Rest (EAR) for sstables and commitlog' from Calle Wilund Fixes https://github.com/scylladb/scylla-enterprise/issues/5016#issuecomment-2558464631 EAR - encryption at rest. Allows on-disk file encryption of sstables and commitlog data. Introduces OpenSSL based file level encrypted storage, managed via a set of providers ranging from local files to cloud KMS providers. For a more comprehensive explanation, see the included docs (or if possible, original source tree). Manual bulk merge of EAR feature from enterprise repo to main scylla repo. Breaks some features apart, but main EAR is still a humongous commit, because to separate this I would have to mess with code incrementally, adding time and risk. This PR includes the local file gen tool, tests and also p11 validation. Note: CI will not execute the full tests unless master CI is set to provide the same environment as the enterprise one. Not sure about the status of this ATM. Note: Includes code to compile against cryptsoft kmipc SDK, but not the SDK. If you happen to check out this tree in the scylla folder and configure, it will be linked against and KMIP functionality will be enabled, otherwise not. Closes scylladb/scylladb#22233 * github.com:scylladb/scylladb: docs: Add EAR docs main/build: Add p11-kit and initialize tools: Add local-file-key-generator tool tests: Add EAR tests tmpdir: shorten test tempdir path EAR: port the ear feature from enterprise cql_test_env: Add optional query timeout schema/migration_manager: Add schema validate sstables: add get_shared_components accessor config/config_file: Add exports and definitions of config_type_for<>	2025-01-12 16:10:46 +02:00
Piotr Smaron	288f9b2b15	Introduce LDAP role manager & saslauthd authenticator This PR extends authentication with 2 mechanisms: - a new role_manager subclass, which allows managing users via LDAP server, - a new authenticator, which delegates plaintext authentication to a running saslauthd daemon. The features have been ported from the enterprise repository with their test.py tests and the documentation as part of changing license to source available. Fixes: scylladb/scylla-enterprise#5000 Fixes: scylladb/scylla-enterprise#5001 Closes scylladb/scylladb#22030	2025-01-12 14:50:29 +02:00
Calle Wilund	8e828f608d	docs: Add EAR docs Merge docs relating to EAR.	2025-01-09 10:40:47 +00:00
Kefu Chai	23729beeb5	docs: remove "ScyllaDB Enterprise" labels remove the "ScyllaDB Enterprise" labels in document. because there is no need to differentiate ScyllaDB Enterprise from its OSS variant, let's stop adding the "ScyllaDB Enterprise" labels to enterprise-only features. this helps to reduce the confusion. as we are still in the process of porting the enterprise features to this repo, this change does not fix scylladb/scylladb#22175. we will review the document again when completing the migration. we also take this opportunity to stop referencing "Enterprise" in the changed paragraph. Refs scylladb/scylladb#22175 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22177	2025-01-08 09:02:52 +02:00
Raphael S. Carvalho	c973254362	Introduce incremental compaction strategy (ICS) ICS is a compaction strategy that inherits size-tiered properties -- therefore it's write-optimized too -- but fixes its space overhead of 100% caused by input files being released only on completion. That's achieved with the concept of an sstable run (similar in concept to LCS levels), which breaks a large sstable into fixed-size chunks (1G by default), known as run fragments. ICS picks similar-sized runs for compaction, and fragments of those runs can be released incrementally as they're compacted, reducing the space overhead to about (number_of_input_runs * 1G). This allows users to increase the storage density of nodes (from 50% to ~80%), reducing the cost of ownership. NOTE: test_system_schema_version_is_stable adjusted to account for batchlog using IncrementalCompactionStrategy contains: compaction/: added incremental_compaction_strategy.cc (.hh), incremental_backlog_tracker.cc (.hh) compaction/CMakeLists.txt: include ICS cc files configure.py: changes for ICS files, includes test db/legacy_schema_migrator.cc / db/schema_tables.cc: fallback to ICS when strategy is not supported db/system_keyspace: pick ICS for some system tables schema/schema.hh: ICS becomes default test/boost: Add incremental_compaction_test.cc test/boost/sstable_compaction_test.cc: ICS related changes test/cqlpy/test_compaction_strategy_validation.py: ICS related changes docs/architecture/compaction/compaction-strategies.rst: changes to ICS section docs/cql/compaction.rst: changes to ICS section docs/cql/ddl.rst: adds reference to ICS options docs/getting-started/system-requirements.rst: updates sentence mentioning ICS docs/kb/compaction.rst: changes to ICS section docs/kb/garbage-collection-ics.rst: add file docs/kb/index.rst: add reference to <garbage-collection-ics> docs/operating-scylla/procedures/tips/production-readiness.rst: add ICS section some relevant commits throughout the ICS history: commit
434b97699b39c570d0d849d372bf64f418e5c692 Merge: 105586f747 30250749b8 Author: Paweł Dziepak <pdziepak@scylladb.com> Date: Tue Mar 12 12:14:23 2019 +0000 Merge "Introduce Incremental Compaction Strategy (ICS)" from Raphael " Introduce new compaction strategy which is essentially like size tiered but will work with the existing incremental compaction. Thus incremental compaction strategy. It works like size tiered, but each element composing a tier is an sstable run, meaning that the compaction strategy will look for N similar-sized sstable runs to compact, not just individual sstables. Parameters: * "sstable_size_in_mb": defines the maximum sstable (fragment) size composing an sstable run, which directly impacts the disk space requirement which is improved with incremental compaction. The lower the value the lower the space requirement for compaction because fragments involved will be released more frequently. * all others available in size tiered compaction strategy HOWTO ===== To change an existing table to use it, do: ALTER TABLE mykeyspace.mytable WITH compaction = {'class' : 'IncrementalCompactionStrategy'}; Set fragment size: ALTER TABLE mykeyspace.mytable WITH compaction = {'class' : 'IncrementalCompactionStrategy', 'sstable_size_in_mb' : 1000 } " commit 94ef3cd29a196bedbbeb8707e20fe78a197f30a1 Merge: dca89ce7a5 e08ef3e1a3 Author: Avi Kivity <avi@scylladb.com> Date: Tue Sep 8 11:31:52 2020 +0300 Merge "Add feature to limit space amplification in Incremental Compaction" from Raphael " A new option, space_amplification_goal (SAG), is being added to ICS. This option will allow ICS users to set a goal on the space amplification (SA). It's not supposed to be an upper bound on the space amplification, but rather, a goal. This new option will be disabled by default as it doesn't benefit write-only (no overwrites) workloads and could severely hurt write performance.
The strategy is free to delay triggering this new behavior, in order to increase overall compaction efficiency. The graph below shows how this feature works in practice for different values of space_amplification_goal: https://user-images.githubusercontent.com/1409139/89347544-60b7b980-d681-11ea-87ab-e2fdc3ecb9f0.png When the strategy finds that space amplification crossed space_amplification_goal, it will work on reducing the SA by doing a cross-tier compaction on the two largest tiers. This feature works only on the two largest tiers, because taking others into account could hurt the compaction efficiency, which is based on the fact that the more similar-sized sstables are compacted together, the higher the compaction efficiency will be. With SAG enabled, min_threshold only plays an important role on the smallest tiers, given that the second-largest tier could be compacted into the largest tier for a space_amplification_goal value < 2. By making the options space_amplification_goal and min_threshold independent, users will be able to tune write amplification and space amplification based on their needs. The lower the space_amplification_goal, the higher the write amplification, but by increasing the min threshold, the write amplification can be decreased to a desired amount. " commit 7d90911c5fb3fa891ad64a62147c3a6ca26d61b1 Author: Raphael S. Carvalho <raphaelsc@scylladb.com> Date: Sat Oct 16 13:41:46 2021 -0300 compaction: ICS: Add garbage collection Today, ICS lacks an approach to persist expired tombstones in a timely manner, which is a problem because accumulation of tombstones is known to affect latency considerably. For an expired tombstone to be purged, it has to reach the top of the LSM tree and hope that older overlapping data wasn't introduced at the bottom. The conditions are there and must be satisfied to avoid data resurrection.
STCS today has an inefficient garbage collection approach, because it only picks a single sstable that satisfies the tombstone density threshold and file staleness. That's a problem because overlapping data, either on the same tier or on smaller tiers, will prevent tombstones from being purged. Also, nothing is done to push the tombstones to the top of the tree so that the conditions are eventually satisfied. Thanks to incremental compaction, ICS can more easily achieve efficient GC by doing cross-tier compaction of the relevant tiers. The trigger is file staleness and tombstone density, whose thresholds can be configured with tombstone_compaction_interval and tombstone_threshold, respectively. If ICS finds a tier that meets both conditions, then that tier and the larger[1] and closest-in-size[2] tier are compacted together. [1]: A larger tier is picked because we want tombstones to eventually reach the top of the tree. [2]: It also has to be the closest-in-size tier, as the smaller the size difference, the higher the efficiency of the compaction. We want to minimize write amplification as much as possible. The staleness condition is there to prevent the same file from being picked over and over again in a short interval. With this approach, ICS continuously works to purge garbage without hurting overall efficiency in a steady state, as same-tier compactions are prioritized. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211016164146.38010-1-raphaelsc@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#22063	2025-01-04 15:43:52 +02:00
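The two ICS tier-selection rules described in these commits (cross-tier compaction of the two largest tiers once space amplification exceeds space_amplification_goal, and GC compaction of a stale, tombstone-dense tier together with the closest-in-size larger tier) can be sketched roughly as follows. This is a minimal Python illustration, not ScyllaDB's C++ implementation: the `Tier` type, its fields, and the space amplification formula (total size divided by the size of the largest tier) are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tier:
    """Hypothetical model of an ICS tier: a group of similar-sized sstable runs."""
    size: int                 # total bytes in the tier
    tombstone_density: float  # assumed metric: fraction of droppable tombstones
    age_seconds: float        # assumed metric: time since the tier's files were written


def space_amplification(tiers: List[Tier]) -> float:
    # Assumption: SA = total data size / size of the largest tier.
    largest = max(t.size for t in tiers)
    return sum(t.size for t in tiers) / largest


def pick_sag_compaction(tiers: List[Tier], goal: Optional[float]) -> Optional[Tuple[Tier, Tier]]:
    # SAG is disabled by default, modeled here as goal=None.
    if goal is None or len(tiers) < 2:
        return None
    if space_amplification(tiers) <= goal:
        return None
    # Cross-tier compaction of the two largest tiers only.
    by_size = sorted(tiers, key=lambda t: t.size, reverse=True)
    return by_size[0], by_size[1]


def pick_gc_compaction(tiers: List[Tier], tombstone_threshold: float,
                       tombstone_compaction_interval: float) -> Optional[Tuple[Tier, Tier]]:
    # Assumption: scan from the smallest tier upward and take the first match.
    for tier in sorted(tiers, key=lambda t: t.size):
        stale = tier.age_seconds >= tombstone_compaction_interval
        dense = tier.tombstone_density >= tombstone_threshold
        if not (stale and dense):
            continue
        # Pair it with the closest-in-size larger tier so tombstones move up the tree.
        larger = [t for t in tiers if t.size > tier.size]
        if larger:
            return tier, min(larger, key=lambda t: t.size)
    return None


tiers = [Tier(100, 0.0, 10), Tier(400, 0.5, 100_000), Tier(1000, 0.0, 10)]
print(space_amplification(tiers))  # 1.5
a, b = pick_sag_compaction(tiers, goal=1.25)
print(a.size, b.size)  # 1000 400
gc = pick_gc_compaction(tiers, tombstone_threshold=0.2, tombstone_compaction_interval=86_400)
print(gc[0].size, gc[1].size)  # 400 1000
```

The commit messages do not say which qualifying tier wins when several meet the GC conditions, or what happens when the largest tier qualifies; this sketch picks the smallest qualifying tier and skips one with no larger partner, both purely as assumptions.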
Piotr Dulikowski	07b162fb5b	docs: add documentation for workload prioritization The doc pages were slightly adjusted during migration not to mention Scylla Enterprise and to fix some whitespace issues.	2025-01-02 07:13:34 +01:00
Avi Kivity	76cf5148e1	Merge 'message: introduce advanced rpc compression' from Michał Chojnowski This is a forward port (from scylla-enterprise) of additional compression options (zstd, dictionaries shared across messages) for inter-node network traffic. It works as follows: After the patch, messaging_service (Scylla's interface for all inter-node communication) compresses its network traffic with compressors managed by the new advanced_rpc_compression::tracker. Those compressors compress with lz4, but can also be configured to use zstd as long as a CPU usage limit isn't crossed. A precomputed compression dictionary can be fed to the tracker. Each connection handled by the tracker will then start a negotiation with the other end to switch to this dictionary, and when it succeeds, the connection will start being compressed using that dictionary. All traffic going through the tracker is passed as a single merged "stream" through dict_sampler. dictionary_service has access to the dict_sampler. On chosen nodes (in the "usual" configuration: the Raft leader), it uses the sampler to maintain a random multi-megabyte sample of the sampler's stream. Every several minutes, it copies the sample, trains a compression dictionary on it (by calling zstd's training library via the alien_worker thread) and publishes the new dictionary to system.dicts via Raft's write_mutation command. This update triggers (eventually) a callback on all nodes, which feeds the new dictionary to advanced_rpc_compression::tracker, and this switches (eventually) all inter-node connections to this dictionary. 
Closes scylladb/scylladb#22032 * github.com:scylladb/scylladb: messaging_service: use advanced_rpc_compression::tracker for compression message/dictionary_service: introduce dictionary_service service: make Raft group 0 aware of system.dicts db/system_keyspace: add system.dicts utils: add advanced_rpc_compressor utils: add dict_trainer utils: introduce reservoir_sampling utils: introduce alien_worker utils: add stream_compressor	2024-12-31 15:02:57 +02:00
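The "random multi-megabyte sample" that dictionary_service maintains over the merged traffic stream corresponds to the classic reservoir sampling technique (the merge lists a `utils: introduce reservoir_sampling` commit). Below is a minimal Python sketch of Algorithm R over fixed-size pages of that stream; splitting the stream into pages, the page size, and the helper names `pages` and `reservoir_sample` are assumptions for illustration, not Scylla's API.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def pages(messages: Iterable[bytes], page_size: int) -> Iterator[bytes]:
    """Re-chunk a stream of messages into fixed-size pages (an incomplete tail is dropped)."""
    buf = bytearray()
    for msg in messages:
        buf += msg
        while len(buf) >= page_size:
            yield bytes(buf[:page_size])
            del buf[:page_size]


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: a uniform random sample of k items from a stream of unknown length."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


rng = random.Random(42)
traffic = (bytes([i % 256]) * 10 for i in range(1000))  # 10,000 bytes of fake traffic
sample = reservoir_sample(pages(traffic, 64), k=16, rng=rng)
print(len(sample), len(sample[0]))  # 16 64
```

Reservoir sampling fits this use because the sampler never needs to know how much traffic will pass through: memory stays bounded at k pages while every page seen so far has an equal chance of being in the sample.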
Michał Chojnowski	fdb2d2209c	messaging_service: use advanced_rpc_compression::tracker for compression This patch sets up an `alien_worker`, `advanced_rpc_compression::tracker`, `dict_sampler` and `dictionary_service` in `main()`, and wires them to each other and to `messaging_service`. `messaging_service` compresses its network traffic with compressors managed by the `advanced_rpc_compression::tracker`. All this traffic is passed as a single merged "stream" through `dict_sampler`. `dictionary_service` has access to `dict_sampler`. On chosen nodes (by default: the Raft leader), it uses the sampler to maintain a random multi-megabyte sample of the sampler's stream. Every several minutes, it copies the sample, trains a compression dictionary on it (by calling zstd's training library via the `alien_worker` thread) and publishes the new dictionary to `system.dicts` via Raft. This update triggers a callback into `advanced_rpc_compression::tracker` on all nodes, which updates the dictionary used by the compressors it manages.	2024-12-27 10:17:58 +01:00
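The core idea of compressing each message with a dictionary that both ends share can be illustrated with the preset-dictionary support (`zdict`) in Python's standard `zlib` module. This is a swapped-in technique: the commit describes lz4/zstd compressors and zstd dictionary training, whereas here the "dictionary" is simply concatenated sample messages, and the helper names and sample messages are hypothetical.

```python
import zlib
from typing import List


def build_dictionary(samples: List[bytes]) -> bytes:
    # Naive stand-in for dictionary training: concatenate sampled messages.
    # Deflate can only reference the last 32 KiB of a preset dictionary.
    return b"".join(samples)[-32768:]


def compress_with_dict(msg: bytes, zdict: bytes) -> bytes:
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(msg) + c.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


samples = [b'{"verb":"MUTATION","keyspace":"ks","table":"events","shard":%d}' % i for i in range(8)]
zdict = build_dictionary(samples)
msg = b'{"verb":"MUTATION","keyspace":"ks","table":"events","shard":3}'
blob = compress_with_dict(msg, zdict)
assert decompress_with_dict(blob, zdict) == msg
print(len(zlib.compress(msg, 9)), len(blob))  # plain vs. dictionary-compressed size
```

A small message gains little from compression on its own, but with a dictionary built from similar traffic it can be encoded as a back-reference into the dictionary. Both sides must hold identical dictionary bytes, which is why each connection negotiates the switch to a new dictionary with the other end before using it.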
Pavel Emelyanov	a24dc02255	api: New "scope" API param to load-and-stream calls There are two such calls: POST /storage_service/keyspace, which loads and streams new sstables from /upload, and POST /storage_service/restore, which does the same but gets sstables from object storage. The new optional parameter allows users to tune the behavior of the streaming phase. The test/pylib client part is also updated here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-12-23 19:28:05 +03:00
Dawid Mędrek	461a6b129c	docs: Update documentation on CREATE ROLE WITH HASHED PASSWORD As part of #18750, we added a CQL statement, CREATE ROLE WITH SALTED HASH, that skipped hashing the password when creating a role, effectively inserting a hash given by the user directly into the database. In #21350, we noticed that Cassandra had implemented a CQL statement with similar semantics but different syntax, so we renamed Scylla's statement to be compatible with Cassandra. Unfortunately, we didn't notice one more difference between what we had in Scylla and what was part of Cassandra. Scylla's statement was originally meant to be used only when restoring the schema, and the user didn't need to be aware of its existence at all: the database produced a sequence of CQL statements that the user saved to a file, and when a need to restore the schema arose, they would execute the contents of the file. That's why, although we documented the feature, we did so only in the necessary places; those unrelated to the backup & restore procedure were deliberately skipped. Cassandra, on the other hand, added the statement for a different purpose (for details, see the relevant issue), and it was intended to be used by the user by design. The statement is also documented as such. Since we want to preserve compatibility with Cassandra, we document the statement and its semantics in the user documentation, explicitly stating that it can be used by the user. Fixes scylladb/scylladb#21691	2024-12-17 13:43:36 +01:00

327 Commits