scylladb

Author	SHA1	Message	Date
Avi Kivity	28906c9261	Merge 'scylla-sstable: introduce the query command' from Botond Dénes The scylla-sstable dump-* command suite has proven invaluable in many investigations. In certain cases however, I found that `dump-data` is quite cumbersome. An example would be trying to find certain values in an sstable, or trying to read the content of system tables when a node is down. For these cases, `dump-data` is very cumbersome: one has to trudge through tons of uninteresting metadata and do compaction in their heads. This PR introduces the new scylla-sstable query command, specifically targeted at situations like this: it allows executing queries on sstables, exposing to the user all the power of CQL, to tailor the output as they see fit. Select everything from a table: $ scylla sstable query --system-schema /path/to/data/system_schema/keyspaces-/-big-Data.db keyspace_name \| durable_writes \| replication -------------------------------+----------------+------------------------------------------------------------------------------------- system_replicated_keys \| true \| ({class : org.apache.cassandra.locator.EverywhereStrategy}) system_auth \| true \| ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 1}) system_schema \| true \| ({class : org.apache.cassandra.locator.LocalStrategy}) system_distributed \| true \| ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 3}) system \| true \| ({class : org.apache.cassandra.locator.LocalStrategy}) ks \| true \| ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1}) system_traces \| true \| ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 2}) system_distributed_everywhere \| true \| ({class : org.apache.cassandra.locator.EverywhereStrategy}) Select everything from a single SSTable, use the JSON output (filtered through [jq](https://jqlang.github.io/jq/) for better readability): $ scylla sstable query 
--system-schema --output-format=json /path/to/data/system_schema/keyspaces-/me-3gm7_127s_3ndxs28xt4llzxwqz6-big-Data.db \| jq [ { "keyspace_name": "system_schema", "durable_writes": true, "replication": { "class": "org.apache.cassandra.locator.LocalStrategy" } }, { "keyspace_name": "system", "durable_writes": true, "replication": { "class": "org.apache.cassandra.locator.LocalStrategy" } } ] Select a specific field in a specific partition using the command-line: $ scylla sstable query --system-schema --query "select replication from scylla_sstable.keyspaces where keyspace_name='ks'" ./scylla-workdir/data/system_schema/keyspaces-/-Data.db replication ------------------------------------------------------------------------------------- ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1}) Select a specific field in a specific partition using ``--query-file``: $ echo "SELECT replication FROM scylla_sstable.keyspaces WHERE keyspace_name='ks';" > query.cql $ scylla sstable query --system-schema --query-file=./query.cql ./scylla-workdir/data/system_schema/keyspaces-/-Data.db replication ------------------------------------------------------------------------------------- ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1}) New functionality: no backport needed. Closes scylladb/scylladb#22007 github.com:scylladb/scylladb: docs/operating-scylla: document scylla-sstable query test/cqlpy/test_tools.py: add tests for scylla-sstable query test/cqlpy/test_tools.py: make scylla_sstable() return table name also scylla-sstable: introduce the query command tools/utils: get_selected_operation(): use std::string for operation_options utils/rjson: streaming_writer: add RawValue() cql3/type_json: add to_json_type() test/lib/cql_test_env: introduce do_with_cql_env_noreentrant_in_thread()	2025-03-06 13:42:45 +02:00
Botond Dénes	5d63ef4d15	Merge 'scylla sstable: Add standard extensions and propagate to schema load ' from Calle Wilund Fixes #22314 Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them. Bundles together the setup of "always on" schema extensions into a single call, and uses this from the three (3) init points. Could have opted for static reg via `configurables`, but since we are moving to a single code base, the need for this is going away, hence explicit init seems more in line. Closes scylladb/scylladb#22327 * github.com:scylladb/scylladb: tools: Add standard extensions and propagate to schema load cql_test_env: Use add all extensions instead of individually main: Move extensions adding to function tombstone_gc: Make validate work for tools	2025-02-26 13:52:47 +02:00
Botond Dénes	aba4d07c62	tools/utils: configure_tool_mode: set auto_handle_sigint_sigterm = false Disable seastar's built-in handlers for SIGINT and SIGTERM and thus fall back to the OS's default handlers, which terminate the process. This makes tool applications interruptible by SIGINT and SIGTERM. The default handler just terminates the tool app immediately and doesn't allow for cleanup, but this is fine: the tools have no important data to save or any critical cleanup to do before exiting. Fixes: scylladb/scylladb#16954 Closes scylladb/scylladb#22838	2025-02-17 23:28:18 +02:00
Botond Dénes	5e76dd90a9	tools/utils: get_selected_operation(): use std::string for operation_options tool_app_template::run() calls get_selected_operation() to obtain the operation (command) the user selected. To do this, get_selected_operation() does a CLI pre-parsing pass, with a minimal boost::program_options, so things like mixed positional/non-positional args are correctly handled. This code uses `sstring` for generic operation-options. The problem is that boost doesn't allow values with spaces inside for non-std::string types. This therefore prevents such values from being used for any option downstream, because parsing would fail at this stage. Change the type to std::string to solve this problem.	2025-02-17 08:01:39 -05:00
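The following is a minimal, standard-library-only sketch of why a stream-based conversion can reject values containing spaces while a plain `std::string` copy does not. It is an emulation written for illustration, not Boost code: the `word` type, `strict_parse()` and `strict_parse_string()` are hypothetical names, and the assumption that the rejection comes from a "whole input must be consumed" rule is not stated in the commit.

```cpp
#include <cassert>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical string-like type standing in for a non-std::string option type.
struct word {
    std::string value;
};

// Formatted extraction reads a single whitespace-delimited token.
inline std::istream& operator>>(std::istream& in, word& w) {
    return in >> w.value;
}

// Strict stream-based conversion: succeeds only if the whole input is consumed.
template <typename T>
std::optional<T> strict_parse(const std::string& text) {
    std::istringstream in(text);
    T out;
    if (!(in >> out)) {
        return std::nullopt;  // nothing could be extracted
    }
    if (in.peek() != std::char_traits<char>::eof()) {
        return std::nullopt;  // leftover input, e.g. everything after a space
    }
    return out;
}

// For std::string no conversion is needed: the text is copied verbatim.
inline std::optional<std::string> strict_parse_string(const std::string& text) {
    return text;
}
```

With these helpers, `strict_parse<word>("select 1")` yields an empty optional, whereas `strict_parse_string("select 1")` keeps the full value, which is the behaviour the switch to `std::string` relies on.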
Calle Wilund	48fda00f12	tools: Add standard extensions and propagate to schema load Fixes #22314 Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them.	2025-01-15 12:10:23 +00:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Botond Dénes	ca956c0180	configure: enable the io_uring backend To be used by the tool apps -- also change the backend selected in tools::utils::configure_tool_mode(). We keep using the more mature AIO backend in ScyllaDB itself, so main.cc sets the linux_aio backend as the default one (the user can still change this, same as before).	2024-12-04 02:55:31 -05:00
Kefu Chai	bab12e3a98	treewide: migrate from boost::adaptors::transformed to std::views::transform now that we are allowed to use C++23. we now have the luxury of using `std::views::transform`. in this change, we: - replace `boost::adaptors::transformed` with `std::views::transform` - use `fmt::join()` when appropriate where `boost::algorithm::join()` is not applicable to a range view returned by `std::views::transform`. - use `std::ranges::fold_left()` to accumulate the range returned by `std::views::transform` - use `std::ranges::fold_left()` to get the maximum element in the range returned by `std::views::transform` - use `std::ranges::min()` to get the minimal element in the range returned by `std::views::transform` - use `std::ranges::equal()` to compare the range views returned by `std::views::transform` - remove unused `#include <boost/range/adaptor/transformed.hpp>` - use `std::ranges::subrange()` instead of `boost::make_iterator_range()`, to feed `std::views::transform()` a view range. to reduce the dependency on boost for better maintainability, and leverage standard library features for better long-term support. this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. limitations: there are still a couple of places where we are still using `boost::adaptors::transformed` due to the lack of a C++23 alternative for `boost::join()` and `boost::adaptors::uniqued`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21700	2024-12-03 09:41:32 +02:00
Kefu Chai	3e84d43f93	treewide: use seastar::format() or fmt::format() explicitly before this change, we rely on `using namespace seastar` to use `seastar::format()` without qualifying the `format()` with its namespace. this works fine until we changed the parameter type of format string `seastar::format()` from `const char*` to `fmt::format_string<...>`. this change practically invited `seastar::format()` to the club of `std::format()` and `fmt::format()`, where all members accept a templated parameter as its `fmt` parameter. and `seastar::format()` is not the best candidate anymore. although argument-dependent lookup (ADL for short) favors the function which is in the same namespace as its parameter, `using namespace` makes `seastar::format()` more competitive, so both `std::format()` and `seastar::format()` are considered as the candidates. that is what is happening in scylladb in quite a few caller sites of `format()`, hence ADL is not able to tell which function is the winner in the name lookup: ``` /__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous 265 \| return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id()); \| ^~~~~~ /usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>] 4290 \| format(format_string<_Args...> __fmt, _Args&&... __args) \| ^ /__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>] 143 \| format(fmt::format_string<A...> fmt, A&&... 
a) { \| ^ ``` in this change, we change all `format()` to either `fmt::format()` or `seastar::format()` with the following rules: - if the caller expects an `sstring` or `std::string_view`, change to `seastar::format()` - if the caller expects an `std::string`, change to `fmt::format()`. because, `sstring::operator std::basic_string` would incur a deep copy. we will need another change to enable scylladb to compile with the latest seastar. namely, to pass the format string as a templated parameter down to helper functions which format their parameters. to minimize the scope of this change, let's include that change when bumping up the seastar submodule. as that change will depend on the seastar change. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-11 23:21:40 +03:00
Aleksandra Martyniuk	fb160afaf6	nodetool: add suboperations support Modify nodetool methods so that they support suboperations.	2024-08-29 13:53:39 +02:00
Aleksandra Martyniuk	c6f8a0116a	nodetool: prepare operation related classes for suboperations Modify operation and add operation_action class so that information about suboperations is stored. It's a preparation for adding suboperations support to nodetool.	2024-08-29 13:53:39 +02:00
Avi Kivity	aa1270a00c	treewide: change assert() to SCYLLA_ASSERT() assert() is traditionally disabled in release builds, but not in scylladb. This hasn't caused problems so far, but the latest abseil release includes a commit [1] that causes a 1000 insn/op regression when NDEBUG is not defined. Clearly, we must move towards a build system where NDEBUG is defined in release builds. But we can't just define it blindly without vetting all the assert() calls, as some were written with the expectation that they are enabled in release mode. To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT() macro in utils/assert.hh. This macro is always defined and is not conditional on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release mode. [1] `66ef711d68` Closes scylladb/scylladb#20006	2024-08-05 08:23:35 +03:00
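To make the distinction concrete, here is a minimal sketch of an assertion macro that stays active regardless of `NDEBUG`. `ALWAYS_ASSERT` and `checked_div()` are hypothetical stand-ins; the actual definition of `SCYLLA_ASSERT()` in utils/assert.hh is not shown in this log and may differ.

```cpp
#include <cassert>   // standard assert(), which compiles away when NDEBUG is defined
#include <cstdio>
#include <cstdlib>

// Hypothetical always-on assertion: it does not consult NDEBUG, so the
// condition is evaluated and checked in every build mode.
#define ALWAYS_ASSERT(cond)                                            \
    do {                                                               \
        if (!(cond)) {                                                 \
            std::fprintf(stderr, "%s:%d: assertion failed: %s\n",      \
                         __FILE__, __LINE__, #cond);                   \
            std::abort();                                              \
        }                                                              \
    } while (0)

// Example use: the divisor check survives even in an NDEBUG build.
inline int checked_div(int a, int b) {
    ALWAYS_ASSERT(b != 0);
    return a / b;
}
```

Because the macro always evaluates its argument, code that relies on side effects inside the condition keeps working after `NDEBUG` is enabled, which is the property the vetting effort described in the commit is concerned with.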
Kefu Chai	a439ebcfce	treewide: include fmt/ranges.h and/or fmt/std.h before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we include `fmt/ranges.h` and/or `fmt/std.h` for formatting the container types, like vector, map, optional and variant using {fmt} instead of the homebrew formatter based on operator<<. with this change, the changes adding fmt::formatter and the changes using ostream formatter explicitly, we are allowed to drop `FMT_DEPRECATED_OSTREAM` macro. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-19 22:56:16 +08:00
Botond Dénes	12516b0861	tools/utils: make finding the operation command line option more flexible Currently all scylla-tools assume that the operation/command is in argv[1]. This is not very flexible, because most programs allow global options (that are not dependent on the current operation/command) to be passed before the operation name on the command line. Notably C*'s nodetool is one such program and indeed scripts and tests using nodetool do utilize this. This patch makes this more flexible. Instead of looking at argv[1], do an initial option parsing with boost::program_options to locate the operation parameter. This initial parser knows about the global options, and the operation positional argument. It allows for unrecognized positional and non-positional arguments, but only after the command. With this, any combination of global options + operation is allowed, in any order.	2024-03-20 02:11:47 -04:00
Botond Dénes	7ae98c586a	tools/utils: get_selected_operation(): remove alias param This method has a single caller, who always passes "operation". Just hard-code this into the method, no need to keep a param for it.	2024-03-20 02:11:47 -04:00
Botond Dénes	28e7eecf0b	tools: add constant with current help command-line arguments Unfortunately, we have code in scylla-nodetool.cc which needs to know what are the current help options available. Soon, there will be more code like this in tools/utils.cc, so centralize this list in a const static tool_app_template member.	2024-03-20 02:11:47 -04:00
Botond Dénes	94dac43b2f	tools/utils: configure tools to use the epoll reactor backend The default AIO backend requires AIO blocks. On production systems, all available AIO blocks could have been already taken by ScyllaDB. Even though the tools only require a single unit, we have seen cases where not even that is available, ScyllaDB having siphoned all of the available blocks. We could try to ensure all deployments have some spare blocks, but it is just less friction to not have to deal with this problem at all, by just using the epoll backend. We don't care about performance in the case of the tools anyway, so long as they are not unreasonably slow. And since these tools are replacing legacy tools written in Java, the bar is low. Closes scylladb/scylladb#17438	2024-02-21 11:58:09 +02:00
Avi Kivity	7cb1c10fed	treewide: replace seastar::future::get0() with seastar::future::get() get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it. Replace with seastar::future::get(), which does the same thing.	2024-02-02 22:12:57 +08:00
Botond Dénes	76492407ab	tools/utils: tool_app_template: handle the case of no args Currently, tool_app_template::run_async() crashes when invoked with empty argv (with just argv[0] populated). This can happen if the tool app is invoked without any further args, e.g. just invoking `scylla nodetool`. The crash happens because of the unconditional dereferencing of argv[1] to get the current operation. To fix, add an early-exit for this case, just printing a usage message and exiting with exit code 2.	2023-12-19 04:08:33 -05:00
Botond Dénes	975c11a54b	tools/utils: tool_app_template: remove "scylla-" prefix from app name In other words, have all tools pass their name without the "scylla-" prefix to `tool_app_template::config::name`. E.g., replace "scylla-nodetool" with just "nodetool". Patch all usages to re-add the prefix if needed. The app name is just more flexible this way, some users might want the name without the "scylla-" prefix (in the next patch).	2023-12-19 04:04:57 -05:00
Calle Wilund	6de4e7af21	tools: Add db config + extensions to tool app run Initializes extensions for tools runs, allowing potentially more interaction with, say, sstables in some versions of scylla.	2023-10-30 10:20:53 +00:00
Botond Dénes	adb65e18a1	tools/scylla-*: use operation_option for positional options Use operation_option to describe positional options. The structure used before -- app_template::positional_option -- was not a good fit for this, as it was designed to store a description that is immediately passed to the boost::program_options subsystem and then discarded. As such, it had a raw pointer member, which was expected to be immediately wrapped by boost::shared_ptr<> by boost::program_options. This produced memory leaks for tools, for options that ended up not being used. To avoid this altogether, use operation_option, converting to the app_template::positional_option at the last moment.	2023-10-03 02:05:30 -04:00
Botond Dénes	c252ff4f03	tools/utils: add support for operation aliases Some operations may have additional names, beyond their "main". Add support for this.	2023-10-03 02:05:30 -04:00
Botond Dénes	caeddb9c88	tools/utils: return a distinct error-code on unknown operation Currently, the tools loosely follow the following convention on error-codes: * return 1 if the error is with any of the command-line arguments * return 2 on other errors This patch changes the returned error-code on unknown operation/command to 100 (instead of the previous 1). The intent is to allow any wrapper script to determine that the tool failed because the operation is unrecognized and not because of something else. In particular this should enable us to write a wrapper script for scylla-nodetool, which dispatches commands still un-implemented in scylla-nodetool, to the java nodetool. Note that the tool will still print an error message on an unknown operation. So such wrapper script would have to make sure to not let this bleed-through when it decides to forward the operation. Closes scylladb/scylladb#15517	2023-09-25 20:56:44 +03:00
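A wrapper that dispatches on this convention could be sketched as below. Only the numeric codes (1, 2 and 100) come from the commit message; the `next_step` enum, the constant name and `classify_exit_code()` are hypothetical, and the actual wrapper script is not part of this log.

```cpp
#include <cassert>

// What a wrapper should do after the native tool exits.
enum class next_step {
    done,                 // tool succeeded
    report_failure,       // tool recognized the operation but failed
    forward_to_fallback,  // operation unknown: hand over to the legacy tool
};

// Exit code reserved for "unknown operation", per the commit message.
constexpr int exit_unknown_operation = 100;

constexpr next_step classify_exit_code(int code) {
    if (code == 0) {
        return next_step::done;
    }
    if (code == exit_unknown_operation) {
        return next_step::forward_to_fallback;
    }
    return next_step::report_failure;  // 1: bad arguments, 2: other errors
}

static_assert(classify_exit_code(100) == next_step::forward_to_fallback);
static_assert(classify_exit_code(1) == next_step::report_failure);
```

A wrapper taking the `forward_to_fallback` path would also need to suppress the error message the native tool printed, as the commit message points out.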
Botond Dénes	4dd373b8d3	tools/utils: tool_app_template::run_async(): also detect --help* as --help Don't try to lookup the current operation if the first argument is --help*. This allows --help-seastar and --help-loggers to work.	2023-09-14 05:25:14 -04:00
Botond Dénes	2d26613f28	tools: move operation-options to the operations themselves Currently, operation-options are declared in a single global list, then operations refer to the options they support via name. This system was born at a time, when scylla-sstable had a lot of shared options between its operations, so it was desirable to declare them centrally and only add references to individual operations, to reduce duplication. However, as the dust settled, only 2 options are shared by 2 operations each. This is a very low benefit. Up to now the cost was also very low -- shared options meant the same in all operations that used them. However this is about to change and this system becomes very awkward to use as soon as multiple operations want to have an option with the same name, but slightly (or very) different meaning/semantics. So this patch moves the options to the operations themselves. Each will declare the list of options it supports, without having to reference some common list. This also removes an entire (although very uncommon) class of bugs: option-name referring to nonexistent option. Closes #14898	2023-07-31 20:16:41 +03:00
Kefu Chai	1c525c02a3	tools/utils: use std::shift_left() when appropriate instead of using a loop of std::swap(), let's use std::shift_left() when appropriate. simpler and more readable this way. moreover, the pattern of looking for a command and consume it from the command line resembles what we have in main(), so let's use similar logic to handle both of them. probably we can consolidate them in future. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14888	2023-07-31 09:46:52 +03:00
Botond Dénes	cbcb20f0f9	tools/utils: make get_selected_operation() and configure_tool_mode() private Their only user is in tools/utils.cc, so move them there, into an anonymous namespace.	2023-07-28 08:41:34 -04:00
Botond Dénes	89d7d80fce	tools: extract tool app skeleton to utils.hh The skeleton of the two existing scylla-native tools (scylla-types and scylla-sstable) is very similar. By skeleton, I mean all the boilerplate around creating and configuring a seastar::app_template, representing operations/command and their options, and presenting and selecting these. To facilitate code-sharing and quick development of any new tools, extract this skeleton from scylla-sstable.cc into tools/utils.hh, in the form of a new tool_app_template, which wraps a seastar::app_template and centralizes all the boilerplate logic in a single place. The extracted code is not a simple copy-paste, although many elements are simply copied. The original code is not removed yet.	2023-07-28 08:30:53 -04:00
Botond Dénes	6a0db84706	tools: use standard allocator Use the new seastar option to instruct seastar to not initialize and use the seastar allocator, relying on the standard allocator instead. Configure LSA with the standard allocator based segment store backend: * scylla-types reserves 1MB for LSA -- in theory nothing here should use LSA, but just in case... * scylla-sstable reserves 100MB for LSA, to avoid excessive thrashing in the sstable index caches. With this, tools now should allocate memory on demand, without reserving a large chunk of (or all of) the available memory, as regular seastar apps do.	2022-09-16 13:07:01 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Botond Dénes	015d09a926	tools: utils: add configure_tool_mode() Which configures seastar to act more appropriately for a tool app. I.e. don't act as if it owns the place, taking over all system resources. These tools are often run on a developer machine, or even next to a running scylla instance, we want them to be the least intrusive possible. Also use the new tool mode in the existing tools. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211220143104.132327-1-bdenes@scylladb.com>	2022-01-05 15:33:57 +02:00

32 Commits