scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 17:10:35 +00:00

Author	SHA1	Message	Date
Botond Dénes	83ea1877ab	Merge 'scylla-sstable: add native S3 support' from Ernest Zaslavsky scylla-sstable: Enable support for S3-stored sstables Minimal implementation of what was mentioned in this [issue](https://github.com/scylladb/scylladb/issues/20532) This update allows Scylla to work with sstables stored on AWS S3. Users can specify the fully qualified location of the sstable using the format: `s3://bucket/prefix/sstable_name`. One should have `object_storage_config_file` referenced in the `scylla.yaml` as described in docs/operating-scylla/admin.rst ref: https://github.com/scylladb/scylladb/issues/20532 fixes: https://github.com/scylladb/scylladb/issues/20535 No backport needed since the S3 functionality was never released Closes scylladb/scylladb#22321 * github.com:scylladb/scylladb: tests: Add Tests for Scylla-SSTable S3 Functionality docs: Update Scylla Tools Documentation for S3 SSTable Support scylla-sstable: Enable Support for S3 SSTables s3: Implement S3 Fully Qualified Name Manipulation Functions object_storage: Refactor `object_storage.yaml` parsing logic	2025-03-14 15:05:52 +02:00
Kefu Chai	9f411f9962	tools/scylla-nodetool: refactor to use std::tie() for cleaner code Replace explicit pair member access with std::tie() throughout scylla-nodetool. This simplifies the code by eliminating repetitive pair.first/pair.second references and makes the codebase more maintainable and readable. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#23250	2025-03-13 11:56:07 +02:00
Ernest Zaslavsky	17e3c01f4e	scylla-sstable: Enable Support for S3 SSTables Configure the sstable manager to correctly handle storage options based on the input type (local or S3-stored sstables). This tweak allows for mixing both storage types within a single call, improving flexibility and functionality.	2025-03-09 09:50:36 +02:00
Avi Kivity	28906c9261	Merge 'scylla-sstable: introduce the query command' from Botond Dénes The scylla-sstable dump-* command suite has proven invaluable in many investigations. In certain cases however, I found that `dump-data` is quite cumbersome. An example would be trying to find certain values in an sstable, or trying to read the content of system tables when a node is down. For these cases, `dump-data` is very cumbersome: one has to trudge through tons of uninteresting metadata and do compaction in their heads. This PR introduces the new scylla-sstable query command, specifically targeted at situations like this: it allows executing queries on sstables, exposing to the user all the power of CQL, to tailor the output as they see fit. Select everything from a table: $ scylla sstable query --system-schema /path/to/data/system_schema/keyspaces-/-big-Data.db keyspace_name | durable_writes | replication -------------------------------+----------------+------------------------------------------------------------------------------------- system_replicated_keys | true | ({class : org.apache.cassandra.locator.EverywhereStrategy}) system_auth | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 1}) system_schema | true | ({class : org.apache.cassandra.locator.LocalStrategy}) system_distributed | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 3}) system | true | ({class : org.apache.cassandra.locator.LocalStrategy}) ks | true | ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1}) system_traces | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 2}) system_distributed_everywhere | true | ({class : org.apache.cassandra.locator.EverywhereStrategy}) Select everything from a single SSTable, use the JSON output (filtered through [jq](https://jqlang.github.io/jq/) for better readability): $ scylla sstable query --system-schema --output-format=json /path/to/data/system_schema/keyspaces-/me-3gm7_127s_3ndxs28xt4llzxwqz6-big-Data.db | jq [ { "keyspace_name": "system_schema", "durable_writes": true, "replication": { "class": "org.apache.cassandra.locator.LocalStrategy" } }, { "keyspace_name": "system", "durable_writes": true, "replication": { "class": "org.apache.cassandra.locator.LocalStrategy" } } ] Select a specific field in a specific partition using the command-line: $ scylla sstable query --system-schema --query "select replication from scylla_sstable.keyspaces where keyspace_name='ks'" ./scylla-workdir/data/system_schema/keyspaces-/-Data.db replication ------------------------------------------------------------------------------------- ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1}) Select a specific field in a specific partition using ``--query-file``: $ echo "SELECT replication FROM scylla_sstable.keyspaces WHERE keyspace_name='ks';" > query.cql $ scylla sstable query --system-schema --query-file=./query.cql ./scylla-workdir/data/system_schema/keyspaces-/-Data.db replication ------------------------------------------------------------------------------------- ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1}) New functionality: no backport needed. Closes scylladb/scylladb#22007 github.com:scylladb/scylladb: docs/operating-scylla: document scylla-sstable query test/cqlpy/test_tools.py: add tests for scylla-sstable query test/cqlpy/test_tools.py: make scylla_sstable() return table name also scylla-sstable: introduce the query command tools/utils: get_selected_operation(): use std::string for operation_options utils/rjson: streaming_writer: add RawValue() cql3/type_json: add to_json_type() test/lib/cql_test_env: introduce do_with_cql_env_noreentrant_in_thread()	2025-03-06 13:42:45 +02:00
Botond Dénes	5d63ef4d15	Merge 'scylla sstable: Add standard extensions and propagate to schema load ' from Calle Wilund Fixes #22314 Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them. Bundles together the setup of "always on" schema extensions into a single call, and uses this from the three (3) init points. Could have opted for static reg via `configurables`, but since we are moving to a single code base, the need for this is going away, hence explicit init seems more in line. Closes scylladb/scylladb#22327 * github.com:scylladb/scylladb: tools: Add standard extensions and propagate to schema load cql_test_env: Use add all extensions instead of inidividually main: Move extensions adding to function tomstone_gc: Make validate work for tools	2025-02-26 13:52:47 +02:00
Yaron Kaikov	e6227f9a25	install-dependencies.sh: update node_exporter to 1.9.0 Update node_exporter to 1.9.0 to resolve the following CVEs: https://github.com/advisories/GHSA-49gw-vxvf-fc2g https://github.com/advisories/GHSA-8xfx-rj4p-23jm https://github.com/advisories/GHSA-crqm-pwhx-j97f https://github.com/advisories/GHSA-j7vj-rw65-4v26 Fixes: https://github.com/scylladb/scylladb/issues/22884 regenerate frozen toolchain with optimized clang from * https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz * https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz Closes scylladb/scylladb#22987	2025-02-24 13:49:36 +02:00
Botond Dénes	aba4d07c62	tools/utils: configure_tool_mode: set auto_handle_sigint_sigterm = false Disable seastar's built-in handlers for SIGINT and SIGTERM and thus fall back to the OS's default handlers, which terminate the process. This makes tool applications interruptible by SIGINT and SIGTERM. The default handler just terminates the tool app immediately and doesn't allow for cleanup, but this is fine: the tools have no important data to save or any critical cleanup to do before exiting. Fixes: scylladb/scylladb#16954 Closes scylladb/scylladb#22838	2025-02-17 23:28:18 +02:00
Botond Dénes	5d09182ce5	scylla-sstable: introduce the query command Allows querying the content of sstables. Simple queries can be constructed on the command-line. More advanced queries can be passed in a file. The output can be text (similar to CQLSH) or json (similar to SELECT JSON). Uses a cql_test_env behind the scenes to set-up a query pipeline. The queried sstables are not registered into cql_test_env, instead they are queried via the virtual-table interface. This is to isolate the sstables from any accidental modifications cql_test_env might want to do to them.	2025-02-17 08:01:39 -05:00
Botond Dénes	5e76dd90a9	tools/utils: get_selected_operation(): use std::string for operation_options tool_app_template::run() calls get_selected_operation() to obtain the operation (command) the user selected. To do this, get_selected_operation() does a CLI pre-parsing pass, with a minimal boost::program_options, so things like mixed positional/non-positional args are correctly handled. This code uses `sstring` for generic operation-options. The problem is that boost doesn't allow values with spaces inside for non-std::string types. This therefore prevents such values from being used for any option downstream, because parsing would fail at this stage. Change the type to std::string to solve this problem.	2025-02-17 08:01:39 -05:00
Botond Dénes	87e8e00de6	tools/scylla-nodetool: netstats: don't assume both senders and receivers The code currently assumes that a session has both sender and receiver streams, but it is possible to have just one or the other. Change the test to include this scenario and remove this assumption from the code. Fixes: #22770 Closes scylladb/scylladb#22771	2025-02-15 20:32:22 +02:00
Kefu Chai	7ff0d7ba98	tree: Remove unused boost headers This commit eliminates unused boost header includes from the tree. Removing these unnecessary includes reduces dependencies on the external Boost.Adapters library, leading to faster compile times and a slightly cleaner codebase. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22857	2025-02-15 20:32:22 +02:00
Botond Dénes	c57492bd73	Update tools/java submodule * tools/java 807e991d...4f1353ba (1): > dist: support smooth upgrade from enterprise to source availalbe Refs scylladb/scylladb#22820	2025-02-13 12:32:07 +02:00
Nadav Har'El	e6dcb605cb	Merge 'Fix typos' from Dmitriy Rokhfeld (TripleChecker) Hey, our tool caught a few typos in your repository. Also, here is your site's error report: https://triplechecker.com/s/Dza11H/scylladb.com Hope it's helpful! Closes scylladb/scylladb#22787 * github.com:scylladb/scylladb: Fix typos Fix typos	2025-02-13 11:14:29 +02:00
TripleChecker	8d64be94e2	Fix typos	2025-02-13 01:54:08 +02:00
Avi Kivity	770dc37f0f	tools: toolchain: prepare: fix optimized_clang archive printout prepare helpfully prints out the path where optimized clang is stored, but a couple of typos mean it prints out an empty string. Fix that. Closes scylladb/scylladb#22714	2025-02-11 11:50:01 +02:00
TripleChecker	e72e6fadeb	Fix typos	2025-02-11 00:17:43 +02:00
Avi Kivity	cf72c31617	treewide: improve bash error reporting bash error handling and reporting is atrocious. Without -e it will just ignore errors. With -e it will stop on errors, but not report where the error happened (apart from exiting itself with an error code). Improve that with the `trap ERR` command. Note that this won't be invoked on intentional error exit with `exit 1`. We apply this on every bash script that contains -e or that it appears trivial to set it in. Non-trivial scripts without -e are left unmodified, since they might intentionally invoke failing scripts. Closes scylladb/scylladb#22747	2025-02-10 18:28:52 +03:00
Evgeniy Naydanov	06793978c1	test.py: new Python dependencies for dtest->test.py migration 3rd-party library which provides compatibility between sync and async code: universalasync A few deps from scylla-dtest: deepdiff cryptography boto3-stubs[dynamodb] [avi: regenerate frozen toolchain with optimized clang from https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz ] Closes scylladb/scylladb#22497	2025-02-10 10:52:27 +02:00
Botond Dénes	be23ebf20f	Update tools/python3 submodule * tools/python3 8415caf4...3e0b8932 (2): > reloc: collect package files correctly if the package has an optional dependency > dist: support smooth upgrade from enterprise to source availalbe Closes scylladb/scylladb#22517	2025-02-08 21:54:42 +02:00
Avi Kivity	d3b8c9f5ef	build: update frozen toolchain to Fedora 41 with clang 19 Update from clang 18 to clang 19. perf-simple-query reports: clang 18 278102.35 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 36056 insns/op, 16560 cycles/op, 0 errors) 288801.19 tps ( 63.0 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 36018 insns/op, 16004 cycles/op, 0 errors) 287795.23 tps ( 63.0 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 36039 insns/op, 15995 cycles/op, 0 errors) 290495.86 tps ( 63.0 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 36027 insns/op, 15939 cycles/op, 0 errors) 293116.10 tps ( 63.0 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 36020 insns/op, 15780 cycles/op, 0 errors) clang 19 284742.08 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 35517 insns/op, 16419 cycles/op, 0 errors) 297974.97 tps ( 63.0 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 35497 insns/op, 15926 cycles/op, 0 errors) 279527.99 tps ( 63.0 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 35513 insns/op, 16724 cycles/op, 0 errors) 298229.61 tps ( 63.0 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 35494 insns/op, 15892 cycles/op, 0 errors) 297982.67 tps ( 63.0 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 35494 insns/op, 15819 cycles/op, 0 errors) So the update delivers a nice performance improvement. Optimized clang regenerated and stored in https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz Script to prepare optimized clang updated, and upstreamed patch dropped. Closes scylladb/scylladb#22380	2025-02-08 17:18:17 +02:00
Avi Kivity	f3751f0eba	tools: toolchain: dbuild: don't use `which` command The `which` command is typically not installed on cloud OS images and so requires the user to remember to install it (or to be prompted by a failure to install it). Replace it with the built-in `type` that is always there. Wrap it in a function to make it clear what it does. Closes scylladb/scylladb#22594	2025-02-05 17:18:05 +03:00
Aleksandra Martyniuk	683176d3db	tasks: add shard, start_time, and end_time to task_stats task_stats contains short info about a task. To get a list of task_stats in the module, one needs to request /task_manager/list_module_tasks/{module}. To make identification and navigation between tasks easier, extend task_stats to contain shard, start_time, and end_time. Closes scylladb/scylladb#22351	2025-02-04 12:11:24 +02:00
Avi Kivity	6913f054e7	Update tools/cqlsh submodule The driver update makes cqlsh work well with Python 3.13. * tools/cqlsh 52c6130...02ec7c5 (18): > chore(deps): update dependency scylla-driver to v3.28.2 > dist: support smooth upgrade from enterprise to source availalbe > github action: fix downloading of artifacts > chore(deps): update docker/setup-buildx-action action to v3 > chore(deps): update docker/login-action action to v3 > chore(deps): update docker/build-push-action action to v6 > chore(deps): update docker/setup-qemu-action action to v3 > chore(deps): update peter-evans/dockerhub-description action to v4 > upload actions: update the usage for multiple artifacts > chore(deps): update actions/download-artifact action to v4.1.8 > chore(deps): update dependency scylla-driver to v3.28.0 > chore(deps): update pypa/cibuildwheel action to v2.22.0 > chore(deps): update actions/checkout action to v4 > chore(deps): update python docker tag to v3.13 > chore(deps): update actions/upload-artifact action to v4 > github actions: update it to work > add option to output driver debug > Add renovate.json (#107) Closes scylladb/scylladb#22593	2025-02-04 12:06:54 +02:00
Aleksandra Martyniuk	477ad98b72	nodetool: tasks: print empty string for start_time/end_time if unspecified If start_time/end_time is unspecified for a task, task_manager API returns epoch. Nodetool prints the value in task status. Fix nodetool tasks commands to print empty string for start_time/end_time if it isn't specified. Modify nodetool tasks status docs to show empty end_time. Fixes: #22373. Closes scylladb/scylladb#22370	2025-01-30 11:29:36 +02:00
Botond Dénes	d8b8a6c5fc	Merge 'api: task_manager: do not unregister finish task when its status is queried' from Aleksandra Martyniuk Currently, when the status of a task is queried and the task is already finished, it gets unregistered. Getting the status shouldn't be a one-time operation. Stop removing the task after its status is queried. Adjust tests not to rely on this behavior. Add task_manager/drain API and nodetool tasks drain command to remove finished tasks in the module. Fixes: https://github.com/scylladb/scylladb/issues/21388. It's a fix to task_manager API, should be backported to all branches Closes scylladb/scylladb#22310 * github.com:scylladb/scylladb: api: task_manager: do not unregister tasks on get_status api: task_manager: add /task_manager/drain	2025-01-30 11:27:44 +02:00
Botond Dénes	b70dccb638	sstables: disk_types: disk_set_of_tagged_union: boost::variant -> std::variant In the spirit of using standard-library types, instead of boost ones where possible. Although a disk type, it is serialized/deserialized with custom code, so the change shouldn't cause any changes in the disk representation.	2025-01-27 09:29:26 -05:00
Aleksandra Martyniuk	e37d1bcb98	api: task_manager: add /task_manager/drain In the following patches, get_status won't be unregistering finished tasks. However, tests need a way to drop a task, so that they operate only on the tasks for operations that these tests invoked. Add /task_manager/drain/{module} to unregister all finished tasks from the module. Add the respective nodetool command.	2025-01-27 11:23:45 +01:00
Botond Dénes	2428f22d3e	Update tools/python3 submodule * tools/python3 fbf12d02...8415caf4 (1): > dist: Support FIPS mode	2025-01-17 09:17:29 +02:00
Calle Wilund	48fda00f12	tools: Add standard extensions and propagate to schema load Fixes #22314 Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them.	2025-01-15 12:10:23 +00:00
Botond Dénes	686a997c04	Merge 'Complete implementation of configuring IO bandwidth limits' from Pavel Emelyanov In Scylla there are two options that control IO bandwidth limit -- the /storage_service/(compaction|stream)_throughput REST API endpoints. The endpoints are partially implemented and have no counterparts in the nodetool. This set implements the missing bits and adds tests for new functionality. Closes scylladb/scylladb#21877 * github.com:scylladb/scylladb: nodetool: Implement [gs]etstreamthroughput commands nodetool: Implement [gs]etcompationthroughput commands test: Add validation of how IO-updating endpoints work api: Implement /storage_service/(stream|compaction)_throughput endpoints api: Disqualify const config reference api: Implement /storage_service/stream_throughput endpoint api: Move stream throughput set/get endpoints from storage service block api: Move set_compaction_throughput_mb_per_sec to config block util: Include fmt/ranges.h in config_file.hh	2025-01-14 07:56:38 -05:00
Botond Dénes	f899f0e411	tools/scylla-sstable: dump-statistics: fix handling of {min,max}_column_names Said fields in statistics are of type `disk_array<uint32_t, disk_string<uint16_t>>` and currently are handled as array of regular strings. However these fields store exploded clustering keys, so the elements store binary data and converting to string can yield invalid UTF-8 characters that certain JSON parsers (jq, or python's json) can choke on. Fix this by treating them as binary and using `to_hex()` to convert them to string. This requires some massaging of the json_dumper: passing field offset to all visit() methods and using a caller-provided disk-string to sstring converter to convert disk strings to sstring, so in the case of statistics, these fields can be intercepted and properly handled. While at it, the type of these fields is also fixed in the documentation. Before: "min_column_names": [ "��Z��\u0011�\u0012ŷ4^��<", "�2y\u0000�}\u007f" ], "max_column_names": [ "��Z��\u0011�\u0012ŷ4^��<", "}��B\u0019l%^" ], After: "min_column_names": [ "9dd55a92bc8811ef12c5b7345eadf73c", "80327900e2827d7f" ], "max_column_names": [ "9dd55a92bc8811ef12c5b7345eadf73c", "7df79242196c255e" ], Fixes: #22078 Closes scylladb/scylladb#22225	2025-01-13 09:19:04 +03:00
Botond Dénes	a21ecc3253	tools/scylla-sstable: also try reading scylla.yaml from /etc/scylla scylla-sstable tries to read scylla.yaml via the following sequence: 1) Use the user-provided location, if provided (--scylla-yaml-file parameter) 2) Use the environment variables SCYLLA_HOME and/or SCYLLA_CONF if set 3) Use the default location ./conf/scylla.yaml Step 3 is fine on dev machines, where the binaries are usually invoked from scylla.git, which does have conf/scylla.yaml, but it doesn't work on production machines, where the default location for scylla.yaml is /etc/scylla/scylla.yaml. To reduce friction when used on production machines, add another fallback in case (3) fails, which tries to read scylla.yaml from the /etc/scylla/scylla.yaml location. Fixes: scylladb/scylladb#22202 Closes scylladb/scylladb#22241	2025-01-13 09:11:29 +03:00
Avi Kivity	814942505f	Merge 'Introduce Encryption-at-Rest (EAR) for sstables and commitlog' from Calle Wilund Fixes https://github.com/scylladb/scylla-enterprise/issues/5016#issuecomment-2558464631 EAR - encryption at rest. Allows on-disk file encryption of sstables and commitlog data. Introduces OpenSSL based file level encrypted storage, managed via a set of providers ranging from local files to cloud KMS providers. For a more comprehensive explanation, see the included docs (or if possible, original source tree). Manual bulk merge of EAR feature from enterprise repo to main scylla repo. Breaks some features apart, but main EAR is still a humongous commit, because to separate this I would have to mess with code incrementally, adding time and risk. This PR includes the local file gen tool, tests and also p11 validation. Note: CI will not execute the full tests unless master CI is set to provide the same environment as the enterprise one. Not sure about the status of this ATM. Note: Includes code to compile against cryptsoft kmipc SDK, but not the SDK. If you happen to check out this tree in the scylla folder and configure, it will be linked against and KMIP functionality will be enabled, otherwise not. Closes scylladb/scylladb#22233 * github.com:scylladb/scylladb: docs: Add EAR docs main/build: Add p11-kit and initialize tools: Add local-file-key-generator tool tests: Add EAR tests tmpdir: shorten test tempdir path EAR: port the ear feature from enterprise cql_test_env: Add optional query timeout schema/migration_manager: Add schema validate sstables: add get_shared_components accessor config/config_file: Add exports and definitions of config_type_for<>	2025-01-12 16:10:46 +02:00
Yaron Kaikov	6f30d26f2a	Update tools/cqlsh submodule * tools/cqlsh b09bc793...52c61306 (3): > cleanup: remove un-used Dockerfiles > .github/workflows/build-push.yml: update to newer macos images > cython: fix the usage of cython Closes scylladb/scylladb#22250	2025-01-12 16:06:30 +02:00
Calle Wilund	f901beec87	tools: Add local-file-key-generator tool For generating key files for local provider	2025-01-09 10:40:47 +00:00
Wojciech Mitros	d04f376227	mv: add an experimental feature for creating views using tablets We still have a number of issues to be solved for views with tablets. Until they are fixed, we should prevent users from creating them, and use the vnode-based views instead. This patch prepares the feature for enabling views with tablets. The feature is disabled by default, but currently it has no effect. After all tests are adjusted to use the feature, we should depend on the feature for deciding whether we can create materialized views in tablet-enabled keyspaces. The unit tests are adjusted to enable this feature explicitly, and it's also added to the scylla sstable tool config - this tool treats all tables as if they were tablet-based (surprisingly, with SimpleStrategy), so for it to work on views, the new feature must be enabled. Refs scylladb/scylladb#21832 Closes scylladb/scylladb#21833	2025-01-07 15:52:36 +01:00
Avi Kivity	748d30a34d	tools: toolchain: simplify non-emulated build procedure Avoid using temporary names and instead treat the final image tag as a temporary. The new procedure is more or less remote-final := local-x86_64 local-aarch64 += remote-final remote-final := local-aarch64 (which now contains the x86_64 image too) Closes scylladb/scylladb#21981	2025-01-07 16:17:29 +02:00
Kefu Chai	353b522ca0	treewide: migrate from boost::adaptors::reversed to std::views::reverse Now that we are allowed to use C++23, we have the luxury of using `std::views::reverse`. - replace `boost::adaptors::reversed` with `std::views::reverse` - remove unused `#include <boost/range/adaptor/reversed.hpp>` This change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2025-01-07 13:22:00 +02:00
Botond Dénes	173fad296a	tools/schema_loader.cc: remove duplicate include of short_streams.hh Closes scylladb/scylladb#21982	2025-01-07 13:03:17 +02:00
Kefu Chai	e4463b11af	treewide: replace boost::algorithm::join() with fmt::join() Replace usages of `boost::algorithm::join()` with `fmt::join()` to improve performance and reduce dependency on Boost. `fmt::join()` allows direct formatting of ranges and tuples with custom separators without creating intermediate strings. When formatting comma-separated values into another string, fmt::join() avoids the overhead of temporary string creation that `boost::algorithm::join()` requires. This change also helps streamline our dependencies by leveraging the existing fmt library instead of Boost.Algorithm. To avoid the ambiguity, some caller sites were updated to call `seastar::format()` explicitly. See also - boost::algorithm::join(): https://www.boost.org/doc/libs/1_87_0/doc/html/string_algo/reference.html#doxygen.join_8hpp - fmt::join(): https://fmt.dev/11.0/api/#ranges-api Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22082	2025-01-07 12:45:05 +02:00
Pavel Emelyanov	a24dc02255	api: New "scope" API param to load-and-stream calls There are two of those -- the POST /storage_service/keyspace that loads and streams new sstables from /upload and POST /storage_service/restore that does the same, but gets sstables from object store. The new optional parameter allows users to tune the streaming phase behavior. The test/pylib client part is also updated here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-12-23 19:28:05 +03:00
Avi Kivity	a4440392d7	build: update dependencies for features to be ported from enterprise ldap/slapd/toxiproxy/cyrus-sasl - for ldap authentication and authorization git-lfs/bolt - for profile-guided optimization lz4-static - for dictionary based network compression jwt - for Oauth/GCP connectivity (for key management) openkmip - for kmip testing fipscheck - for FIPS validation Frozen toolchain regenerated, with optimized clang from https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-x86_64.tar.gz	2024-12-19 14:26:31 +02:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Pavel Emelyanov	3081ce24cd	nodetool: Implement [gs]etstreamthroughput commands They exist in the original documentation, but are not yet implemented. Now it's possible to do it. It is slightly more complex than its compaction counterpart, in the sense that the get method reports megabits/s by default and has an option to convert to MiB/s. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-12-13 14:39:47 +03:00
Pavel Emelyanov	67089fd5a1	nodetool: Implement [gs]etcompationthroughput commands They exist in the original documentation, but are not yet implemented. Now it's possible to do it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-12-13 14:39:47 +03:00
Avi Kivity	1bac6b75dc	Merge 'Reserve IOCBs for tool applications' from Botond Dénes Artifact tests have been failing since the switch to the native nodetool, because ScyllaDB doesn't leave any IOCBs for tools. On some setups it will consume all of them and then nodetool and any other native app will refuse to start because it will fail to allocate IOCBs. This PR fixes this by making use of the freshly introduced `--reserve-io-control-blocks` seastar option, to reserve IOCBs for tool applications. Since the `linux-aio` and `epoll` reactor backends require quite a bit of these, we enable the `io_uring` reactor backend and switch tools to use this backend instead. The `io_uring` reactor backend needs just 2 IOCBs to function, so the reserve of 10 IOCBs set up in this PR is good for running 5 tool applications in parallel, which should be more than enough. Fixes: https://github.com/scylladb/scylladb/issues/19185 The problem this PR fixes has a manual workaround (and is rare to begin with), no backport needed. Closes scylladb/scylladb#21527 * github.com:scylladb/scylladb: main: configure a reserve IOCB for scylla-nodetool and friends configure: enable the io_uring backend main: use configure seastar defaults via app_template::seastar_options	2024-12-09 19:22:19 +02:00
Botond Dénes	f55dc71c3f	Merge 'Use checksummed input streams in `validate_checksums()`' from Nikos Dragazis With commits `ed7d352e7d` and `bb1867c7c7`, we now have input streams for both compressed and uncompressed SSTables that provide seamless checksum and digest checking. The code for these was based on `validate_checksums()`, which implements its own validation logic over raw streams. This has led to some duplicate code. This PR deduplicates the uncompressed case by modifying `validate_checksums()` to use a checksummed input stream instead of a raw stream. The same cannot be done for compressed SSTables though. The reason is that `validate_checksums()` needs to examine the whole data file, even if an invalid chunk is encountered. In the checksummed case we support that by offloading the error handling logic from the data source via a function parameter. In the compressed data source we cannot do that because it needs to return decompressed data and decompression may fail if the data are invalid. This PR also enables `validate_checksums()` to partially verify SSTables with just the per-chunk checksums if the digest is missing. In more detail, this PR consists of: * Port of some integrity checks from `do_validate_uncompressed()` to the checksummed data source. It should now be able to detect corruption due to truncated or appended chunks (expected number of chunks is retrieved from the CRC component). * Introduction of `error_handler` parameter in checksummed data source and `data_stream()`. * Refactoring of `validate_checksums()`. The JSON response of `sstable validate-checksums` was also modified to report a missing digest. * Tests for `validate_checksums()` against SSTables with truncated data, appended data, invalid digests, or no digest. Refs #19058. This PR is a hybrid of cleanup and feature. No backport is needed. 
Closes scylladb/scylladb#20933 * github.com:scylladb/scylladb: tools/scylla-sstable: Rename valid_checksums -> valid test: Check validate_checksums() with missing digest sstables: Allow validate_checksums() to report missing digests sstables: Refactor validate_checksums() to use checksummed data stream sstables: Add error_handler parameter to data_stream() sstables: Add error handler in checksummed data source sstables: Check for excessive chunks in checksummed data source sstables: Check for premature EOF in checksummed data source test: test_validate_checksums: Check SSTable with invalid digest test: test_validate_checksums: Check SSTable with appended data test: test_validate_checksums: Complement test for truncated SSTable	2024-12-04 10:46:18 +02:00
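The key idea in the merge above is that validation keeps scanning the whole data file after a bad chunk, because error handling is delegated to a caller-supplied function. The following is a minimal Python sketch of that pattern, not the ScyllaDB C++ code: `validate_chunks`, its parameters, and the use of `zlib.crc32` as the per-chunk checksum are all assumptions made for illustration.

```python
import zlib
from typing import Callable, Sequence


def validate_chunks(data: bytes, chunk_size: int, expected: Sequence[int],
                    on_error: Callable[[str], None]) -> bool:
    """Check every chunk against its expected checksum.

    Problems are reported through on_error instead of raising, so the
    scan always covers the whole input. Hypothetical sketch only.
    """
    valid = True
    n_chunks = (len(data) + chunk_size - 1) // chunk_size  # ceiling division
    if n_chunks < len(expected):
        # Fewer chunks than the checksum list promises: truncated data.
        on_error(f"premature EOF: {n_chunks} chunks, expected {len(expected)}")
        valid = False
    elif n_chunks > len(expected):
        # More chunks than the checksum list promises: appended data.
        on_error(f"excess data: {n_chunks} chunks, expected {len(expected)}")
        valid = False
    for i in range(min(n_chunks, len(expected))):
        chunk = data[i * chunk_size:(i + 1) * chunk_size]
        if zlib.crc32(chunk) != expected[i]:
            on_error(f"chunk {i}: checksum mismatch")
            valid = False  # keep going; later chunks are still examined
    return valid


good = b"abcdefgh"
checksums = [zlib.crc32(good[:4]), zlib.crc32(good[4:])]
errors: list[str] = []
print(validate_chunks(b"abcXefgh", 4, checksums, errors.append), errors)
```

Passing `errors.append` as the handler collects every problem in one pass, which mirrors the reason given in the commit message for moving error handling out of the data source.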
Benny Halevy	d5d4307a20	scylla-sstable: dump-summary: print also first and last tokens To help scylla-manager restore to map sstables to nodes or tablets, print also the tokens of the sstable first and last keys. For example, the json output will now look like this: ``` $ build/dev/scylla sstable dump-summary /tmp/scylla-344593/data/ks/t-52a92590afd011ef9b68ba86378ed63b/me-3glp_0tm9_00uv52doobo0bvk2t7-big-Data.db \| jq { "sstables": { "/tmp/scylla-344593/data/ks/t-52a92590afd011ef9b68ba86378ed63b/me-3glp_0tm9_00uv52doobo0bvk2t7-big-Data.db": { "header": { "min_index_interval": 128, "size": 1, "memory_size": 16, "sampling_level": 128, "size_at_full_sampling": 0 }, "positions": [ 4 ], "entries": [ { "key": { "token": "2008715943680221220", "raw": "000400000064", "value": "100" }, "position": 0 } ], "first_key": { "token": "2008715943680221220", "raw": "000400000064", "value": "100" }, "last_key": { "token": "9010454139840013625", "raw": "000400000003", "value": "3" } } } } ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#21735	2024-12-04 10:16:13 +02:00
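A consumer of this output, such as a restore tool, could read the first and last tokens roughly as follows. This is a sketch under assumptions: `token_ranges` is a hypothetical helper, and the sample is the JSON from the commit message trimmed to the two fields of interest. Tokens are emitted as JSON strings, so they are converted to integers here.

```python
import json

# Sample trimmed from the commit message above; only the fields used below are kept.
SAMPLE = """
{
  "sstables": {
    "/tmp/scylla-344593/data/ks/t-52a92590afd011ef9b68ba86378ed63b/me-3glp_0tm9_00uv52doobo0bvk2t7-big-Data.db": {
      "first_key": {"token": "2008715943680221220", "raw": "000400000064", "value": "100"},
      "last_key": {"token": "9010454139840013625", "raw": "000400000003", "value": "3"}
    }
  }
}
"""


def token_ranges(dump: dict) -> dict[str, tuple[int, int]]:
    """Map each sstable path to its (first_token, last_token) pair.

    Hypothetical helper for illustration only.
    """
    return {
        path: (int(summary["first_key"]["token"]),
               int(summary["last_key"]["token"]))
        for path, summary in dump["sstables"].items()
    }


for path, (first, last) in token_ranges(json.loads(SAMPLE)).items():
    print(path, first, last)
```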
Botond Dénes	ca956c0180	configure: enable the io_uring backend To be used by the tool apps -- also change the backend selected in tools::utils::configure_tool_mode(). We keep using the more mature AIO backend in ScyllaDB itself, so main.cc sets the linux_aio backend as the default one (the user can still change this, same as before).	2024-12-04 02:55:31 -05:00
Avi Kivity	841481c202	Merge "move storage proxy and adjacent services to identify hosts by ids" from Gleb " This rather large patch series moves storage proxy and some adjacent services (like migration manager) to use host ids to identify nodes rather than ips. Messaging service gains a capability to address nodes by host ids (which allows dropping translations from topology coordinator code that worked on host ids already) and also makes sure that a node with incorrect host id will reject a message (can happen during address changes). The series gets rid of the raft address map completely and replaces it with the gossiper address map which is managed by the gossiper since translation is now done in the layer below raft. Fixes: scylladb/scylladb#6403 perf-simple-query -- smp 1 -m 1G output Before: enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 64336.82 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41291 insns/op, 24485 cycles/op, 0 errors) 62669.58 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41277 insns/op, 24695 cycles/op, 0 errors) 69172.12 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41326 insns/op, 24463 cycles/op, 0 errors) 56706.60 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41143 insns/op, 24513 cycles/op, 0 errors) 56416.65 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41186 insns/op, 24851 cycles/op, 0 errors) throughput: mean=61860.35 standard-deviation=5395.48 median=62669.58 median-absolute-deviation=5153.75 maximum=69172.12 minimum=56416.65 instructions_per_op: mean=41244.62 standard-deviation=76.90 median=41276.94 median-absolute-deviation=58.55 maximum=41326.19 minimum=41142.80 cpu_cycles_per_op: mean=24601.35 standard-deviation=167.39 median=24512.64 median-absolute-deviation=116.65 maximum=24851.45 minimum=24462.70 After: enable-cache=1 Running test with 
config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 65237.35 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 40733 insns/op, 23145 cycles/op, 0 errors) 59283.09 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40624 insns/op, 23948 cycles/op, 0 errors) 70851.03 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40625 insns/op, 23027 cycles/op, 0 errors) 70549.61 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40650 insns/op, 23266 cycles/op, 0 errors) 68634.96 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40622 insns/op, 22935 cycles/op, 0 errors) throughput: mean=66911.21 standard-deviation=4814.60 median=68634.96 median-absolute-deviation=3638.40 maximum=70851.03 minimum=59283.09 instructions_per_op: mean=40650.89 standard-deviation=47.55 median=40624.60 median-absolute-deviation=27.11 maximum=40733.37 minimum=40622.33 cpu_cycles_per_op: mean=23264.16 standard-deviation=402.12 median=23145.29 median-absolute-deviation=237.63 maximum=23947.96 minimum=22934.59 CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13531/ SCT (longevity-100gb-4h with nemesis_selector: ['topology_changes']): https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/gleb/job/move-to-host-id/3/ Tested mixed cluster manually. 
" * 'gleb/move-to-host-id-v2' of github.com:scylladb/scylla-dev: (55 commits) group0: drop unused field from replace_info struct test: rename raft_address_map_test to address_map_test and move if from raft tests raft_address_map: remove raft address map topology coordinator: do not modify expire state for left/new nodes any more in raft address map topology coordinator: drop expiring entries in gossiper address map on error injections since raft one is no longer used group0: drop raft address map dependency from raft_rpc group0: move raft_ticker_type definition from raft_address_map.hh storage_service: do not update raft address map on gossiper events group0: drop raft address map dependency from raft_server_with_timeouts group0: move group0 upgrade code to host ids repair: drop raft address map dependency group0: remove unused raft address map getter from raft_group0 group0: drop raft address map from group0_state_machine dependency since it is not used there any more group0: remove dependency on raft address map from group0_state_id_handler gossiper: add get_application_state_ptr that searches by host_id gossiper: change get_live_token_owners to return host ids view: move view building to host id hints: use host id to send hints storage_proxy: remove id_vector_to_addr since it is no longer used db: consistency_level: change is_sufficient_live_nodes to work on host ids ...	2024-12-03 18:18:48 +02:00
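The Before/After means quoted in the merge above can be compared with a short calculation. The sketch below uses only the mean values from the commit message; `relative_change` is a hypothetical helper. The throughput standard deviations reported there (5395.48 and 4814.60) are of the same order as the difference in means, so the throughput figure is indicative at best.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0


# Mean values copied from the perf-simple-query output in the commit message.
MEANS = {
    "throughput (tps)": (61860.35, 66911.21),
    "instructions_per_op": (41244.62, 40650.89),
    "cpu_cycles_per_op": (24601.35, 23264.16),
}

for name, (before, after) in MEANS.items():
    print(f"{name}: {relative_change(before, after):+.2f}%")
```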

1038 Commits