Commit Graph

639 Commits

Author SHA1 Message Date
Aleksandra Martyniuk
0c6a3f568a compaction: delete default_compaction_progress_monitor
default_compaction_progress_monitor returns a reference to a static
object. So, it should be read-only, but its users need to modify it.

Delete default_compaction_progress_monitor and use one's own
compaction_progress_monitor instance where it's needed.

Closes scylladb/scylladb#15800
2023-10-23 16:03:34 +03:00
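To illustrate why a shared static default is a problem, here is a minimal sketch using a hypothetical `progress_monitor` type (not Scylla's actual `compaction_progress_monitor`). Every caller of the static default writes into the same object, so unrelated tasks see each other's progress. An instance owned by each caller keeps the state isolated.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a progress monitor; not Scylla's real type.
struct progress_monitor {
    uint64_t processed = 0;
};

// Anti-pattern: every caller receives a reference to the same static object.
progress_monitor& default_progress_monitor() {
    static progress_monitor instance;
    return instance;
}

// A task that reports its progress into whichever monitor it is handed.
void run_task(progress_monitor& monitor, uint64_t units) {
    for (uint64_t i = 0; i < units; ++i) {
        ++monitor.processed;
    }
}
```

With the shared default, two tasks of 3 and 4 units leave the counter at 7. With one `progress_monitor` per caller, each task reports only its own progress.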
Botond Dénes
4b57c2bf18 tools/scylla-nodetool: implement compactionhistory command 2023-10-20 10:55:38 -04:00
Botond Dénes
a212ddc5b1 tools/scylla-nodetool: implement stop command 2023-10-20 10:04:56 -04:00
Israel Fruchter
41c80929eb Update tools/cqlsh submodule
* tools/cqlsh 66ae7eac...426fa0ea (8):
  > Updated Scylla Driver[Issue scylladb/scylla-cqlsh#55]
  > copyutil: closing the local end of pipes after processes starts
  > setup.py: specify Cython language_level explicitly
  > setup.py: pass extensions as a list
  > setup.py: reindent block in else branch
  > setup.py: early return in get_extension()
  > reloc: install build==0.10.0
  > reloc: add --verbose option to build_reloc.sh

Fixes: https://github.com/scylladb/scylla-cqlsh/issues/37

Closes scylladb/scylladb#15685
2023-10-11 17:29:23 +03:00
Avi Kivity
854188a486 Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes
Currently, a mutation query on the replica side will not respond with a result that doesn't have at least one live row. This causes problems if there are a lot of dead rows or partitions before we reach a live row, all of which stem from the fact that the resulting reconcilable_result will be large:

1. Large allocations.  Serialization of reconcilable_result causes large allocations for storing result rows in std::deque
2. Reactor stalls. Serialization of reconcilable_result on the replica side and on the coordinator side causes reactor stalls. This impacts not only the query at hand. For 1M dead rows, freezing takes 130ms and unfreezing takes 500ms. The coordinator does multiple freezes and unfreezes. The reactor stall on the coordinator side is >5s.
3. Too large repair mutations. If reconciliation works on large pages, repair may fail due to too large mutation size. 1M dead rows is already too much: Refs https://github.com/scylladb/scylladb/issues/9111.

This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows.

This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones, like regular queries do.

My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition):
* Node1: 1 live row, 1M dead rows
* Node2: 1M dead rows, 1 live row

This was designed to trigger reconciliation right from the very start of the query.

Before:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

After:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

Non-reconciling queries have almost identical durations (variations of a few ms can be observed between runs). Note how in the after case, the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than 1/4 of the before case).

Refs https://github.com/scylladb/scylladb/issues/7929
Refs https://github.com/scylladb/scylladb/issues/3672
Refs https://github.com/scylladb/scylladb/issues/7933
Fixes https://github.com/scylladb/scylladb/issues/9111

Closes scylladb/scylladb#15414

* github.com:scylladb/scylladb:
  test/topology_custom: add test_read_repair.py
  replica/mutation_dump: detect end-of-page in range-scans
  tools/scylla-sstable: write: abort parser thread if writing fails
  test/pylib: add REST methods to get node exe and workdir paths
  test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction}
  service/storage_proxy: add trace points for the actual read executor type
  service/storage_proxy: add trace points for read-repair
  storage_proxy: Add more trace-level logging to read-repair
  database: Fix accounting of small partitions in mutation query
  database, storage_proxy: Reconcile pages with no live rows incrementally
2023-10-05 22:39:34 +03:00
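The core idea of this merge, charging dead rows against the page's memory limit so a long run of tombstones still ends the page, can be sketched as follows. This is a simplified model, not Scylla's reader or memory accounter: the `row` and `page` types and `build_page()` are hypothetical, and a row is reduced to its size and liveness. The previous behaviour corresponds to ignoring the limit until a live row is found.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row model: only the serialized size and liveness matter here.
struct row {
    std::size_t size;
    bool live;
};

struct page {
    std::vector<row> rows;
    std::size_t next_index;  // where the next page resumes
    bool last;               // true once the input is exhausted
};

// Build one page starting at `start`. Every row, live or dead, is charged
// against `max_bytes`, so a page can end without containing any live row.
// At least one row is always emitted so that paging makes progress.
page build_page(const std::vector<row>& rows, std::size_t start, std::size_t max_bytes) {
    page p{{}, start, false};
    std::size_t used = 0;
    while (p.next_index < rows.size()) {
        const row& r = rows[p.next_index];
        if (!p.rows.empty() && used + r.size > max_bytes) {
            break;  // page is full, even if it holds only dead rows
        }
        used += r.size;
        p.rows.push_back(r);
        ++p.next_index;
    }
    p.last = p.next_index == rows.size();
    return p;
}
```

For ten dead rows followed by one live row, each 10 bytes, a 30-byte limit yields four small pages instead of one page holding every tombstone.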
Avi Kivity
197b7590df Update tools/jmx submodule
* tools/jmx d107758...8d15342 (2):
  > Revert "install-dependencies.sh: do not install weak dependencies"
  > install-dependencies.sh: do not install weak dependencies Especially for Java, we really do not need the tens of packages and MBs it adds, just because Java apps can be built and use sound and graphics and whatnot.
2023-10-05 22:36:54 +03:00
Avi Kivity
ee57f69b17 Update tools/java submodule
* tools/java 9dddad27bf...3c09ab97a9 (1):
  > nodetool: parse and forward -h|--host to nodetool
2023-10-05 22:35:58 +03:00
Botond Dénes
96787ec0a5 Merge 'Do not keep excessive info on sstables::entry_descriptor' from Pavel Emelyanov
The descriptor in question is used to parse an sstable's file path and return the result. Besides the "relevant" info, the parser also extracts the sstable directory and the keyspace+table names. However, almost no code needs those strings, and the need to construct a descriptor with them makes some places obscurely use empty strings.

The PR removes the sstable's directory, keyspace and table names from the descriptor and, while at it, simplifies the sstable directory code that makes a descriptor out of a real sstable object by (!) parsing its Data file path back.

Closes scylladb/scylladb#15617

* github.com:scylladb/scylladb:
  sstables: Make descriptor from sstable without parsing
  sstables: Do not keep directory, keyspace and table names on descriptor
  sstables: Make tuple inside helper parser method
  sstables: Do not use ks.cf pair from descriptor
  sstables: Return tuple from parse_path() without ks.cf hints
  sstables: Rename make_descriptor() to parse_path()
2023-10-05 15:15:23 +03:00
Pavel Emelyanov
14ee59fb04 sstables: Do not use ks.cf pair from descriptor
There's only one place that needs the ks.cf pair from the parsed descriptor
-- the sstables loader from tools/. This code already has ks.cf from the
tuple returned after parsing and can use them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-05 12:21:01 +03:00
Pavel Emelyanov
62d71d398f sstables: Return tuple from parse_path() without ks.cf hints
There are two path parsers. One of them accepts keyspace and table names
and the other one doesn't. The latter is then supposed to parse the
ks.cf pair from the path and put it on the descriptor. This patch makes this
method return ks.cf so that later it will be possible to remove these
strings from the descriptor itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-05 12:21:00 +03:00
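A sketch of returning the ks.cf pair from path parsing rather than storing it on the descriptor. The helper `parse_ks_cf()` is hypothetical, not the real `parse_path()`, and it assumes the `<keyspace>/<table>-<uuid>/<component>` directory layout visible in the sstable path quoted in the scylla-sstable log message further down this page.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <tuple>

// Hypothetical helper, assuming a path laid out as
//   <data_dir>/<keyspace>/<table>-<table_uuid>/<sstable component file>
// Returns (keyspace, table). The table directory name is cut at its last
// '-'; if there is no '-', the whole directory name is used.
std::tuple<std::string, std::string> parse_ks_cf(const std::filesystem::path& component) {
    const std::filesystem::path table_dir = component.parent_path();
    const std::string table_dir_name = table_dir.filename().string();
    const std::string keyspace = table_dir.parent_path().filename().string();
    const std::string table = table_dir_name.substr(0, table_dir_name.rfind('-'));
    return {keyspace, table};
}
```

A caller such as a tools/ loader can then take the names from the returned tuple, leaving the descriptor free of them.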
Pavel Emelyanov
d56f9db121 sstables: Rename make_descriptor() to parse_path()
The method really parses provided path, so the existing name is pretty
confusing. It's extra confusing in the table::get_snapshot_details()
where it's just called and the return value is simply ignored.

Naming it "parse_..." makes it clear what the method is for.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-05 11:04:07 +03:00
Avi Kivity
6d5823e8f5 Regenerate frozen toolchain for new Python driver
Update to scylla-driver 3.26.3.

Closes scylladb/scylladb#15629
2023-10-05 10:09:53 +03:00
Botond Dénes
62cdc36a74 tools/scylla-nodetool: implement help operation
Nodetool considers "help" to be just another operation. So implement it
as such. The usual --help and --help <command> is also supported.
2023-10-04 05:27:09 -04:00
Botond Dénes
1efabca515 tools/scylla-nodetool: implement the traceprobability commands
gettraceprobability and settraceprobability
2023-10-04 05:27:09 -04:00
Botond Dénes
25d41f72c4 tools/scylla-nodetool: implement the gossip commands
disablegossip, enablegossip and statusgossip
2023-10-04 05:27:09 -04:00
Botond Dénes
5bc25dbebe tools/scylla-nodetool: implement the binary commands
disablebinary, enablebinary and statusbinary
2023-10-04 05:27:09 -04:00
Botond Dénes
2ac1705c90 tools/scylla-nodetool: implement backup related commands
disablebackup, enablebackup and statusbackup
2023-10-04 05:27:09 -04:00
Botond Dénes
91e62413c8 tools/scylla-nodetool: implement version command 2023-10-04 05:27:09 -04:00
Botond Dénes
4f66e0208b tools/scylla-nodetool: compact: remove --partition argument
This argument is not recognized by the current nodetool either. It is
mentioned only in our documentation, but it should be removed from there
too.
2023-10-04 05:08:33 -04:00
Botond Dénes
2ddf28b8e5 tools/scylla-nodetool: scylla_rest_client: add support delete method 2023-10-04 05:07:03 -04:00
Botond Dénes
7dc77d03af tools/scylla-nodetool: get rid of check_json_type()
This check is redundant. Originally it was intended to work around
rapidjson using an assert by default to check that the fields have the
expected type. But it turns out we already configure rapidjson to use a
plain exception in utils/rjson.hh, so check_json_type() is not needed
for graceful error handling.
2023-10-03 02:05:30 -04:00
Botond Dénes
fdecea5480 tools/scylla-nodetool: log more details for failed requests
Instead of the unhelpful "Unexpected reply status", log what the request
was and what the response status code is.
2023-10-03 02:05:30 -04:00
Botond Dénes
adb65e18a1 tools/scylla-*: use operation_option for positional options
Use operation_option to describe positional options. The structure used
before -- app_template::positional_option -- was not a good fit for
this, as it was designed to store a description that is immediately
passed to the boost::program_options subsystem and then discarded.
As such, it had a raw pointer member, which was expected to be
immediately wrapped by boost::shared_ptr<> by boost::program_options.
This produced memory leaks for tools, for options that ended up not
being used. To avoid this altogether, use operation_option, converting
to the app_template::positional_option at the last moment.
2023-10-03 02:05:30 -04:00
Botond Dénes
c252ff4f03 tools/utils: add support for operation aliases
Some operations may have additional names beyond their "main" one. Add
support for this.
2023-10-03 02:05:30 -04:00
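Alias support amounts to matching a requested name against an operation's main name and its extra names. The sketch below uses a hypothetical `operation` struct and `find_operation()` helper; it is not the real tools/utils API, and the alias in the usage is illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical operation descriptor: a main name plus optional aliases.
struct operation {
    std::string name;
    std::vector<std::string> aliases;
};

// Return the operation whose main name or any alias equals `requested`.
std::optional<operation> find_operation(const std::vector<operation>& ops, const std::string& requested) {
    for (const auto& op : ops) {
        if (op.name == requested ||
            std::find(op.aliases.begin(), op.aliases.end(), requested) != op.aliases.end()) {
            return op;
        }
    }
    return std::nullopt;
}
```

Looking up an alias returns the operation under its main name, so the rest of the tool only ever deals with one canonical name.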
Botond Dénes
caeddb9c88 tools/utils: return a distinct error-code on unknown operation
Currently, the tools loosely follow the following convention on
error-codes:
* return 1 if the error is with any of the command-line arguments
* return 2 on other errors

This patch changes the returned error-code on unknown operation/command
to 100 (instead of the previous 1). The intent is to allow any wrapper
script to determine that the tool failed because the operation is
unrecognized and not because of something else. In particular this
should enable us to write a wrapper script for scylla-nodetool, which
dispatches commands not yet implemented in scylla-nodetool to the Java
nodetool.
Note that the tool will still print an error message on an unknown
operation. So such a wrapper script would have to make sure not to let
this bleed through when it decides to forward the operation.

Closes scylladb/scylladb#15517
2023-09-25 20:56:44 +03:00
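The convention above can be written down as a small enum, together with the decision a wrapper would make. The names are hypothetical, and `ok = 0` assumes the usual success status, which the commit message does not state. A real wrapper would also have to suppress the native tool's error message before forwarding.

```cpp
#include <cassert>

// Exit codes as described in the commit message; `ok = 0` is assumed.
enum class tool_exit : int {
    ok = 0,
    bad_arguments = 1,
    other_error = 2,
    unknown_operation = 100,
};

// Hypothetical wrapper decision: forward to the Java nodetool only when the
// native tool failed because it does not recognize the operation.
bool should_forward_to_java_nodetool(int native_exit_code) {
    return native_exit_code == static_cast<int>(tool_exit::unknown_operation);
}
```

Using a dedicated value such as 100 keeps "unknown operation" distinct from argument errors (1), which a wrapper must not forward.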
Botond Dénes
e723fb3017 tools/scylla-sstable: write: abort parser thread if writing fails
Currently if writing the sstable fails, e.g. because the input data is
out-of-order, the json parser thread hangs because its output is no
longer consumed. This results in the entire application just freezing.
Fix this by aborting the parsing thread explicitly in the
json_mutation_stream_parser destructor. If the parser thread exited
successfully, this will be a no-op, but on the error path, this will
ensure that the parser thread doesn't hang.
2023-09-22 02:53:15 -04:00
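The hang and its fix can be reproduced with a plain producer/consumer pair. This is a sketch using `std::thread`, not Scylla's `json_mutation_stream_parser` (which is built on Seastar). The hypothetical `parser_stream` class pushes parsed items into a bounded queue. If the consumer stops early, the producer blocks on the full queue forever, so the destructor sets an abort flag and wakes it before joining.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical bounded channel between a parser thread (producer) and a
// writer (consumer).
class parser_stream {
    std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<int> _queue;
    std::size_t _capacity;
    bool _aborted = false;
    bool _done = false;
    std::thread _parser;  // declared last: starts after all other members exist

    // Producer: waits while the queue is full, unless aborted.
    void parse(int items) {
        for (int i = 0; i < items; ++i) {
            std::unique_lock lock(_mutex);
            _cv.wait(lock, [&] { return _aborted || _queue.size() < _capacity; });
            if (_aborted) {
                return;
            }
            _queue.push_back(i);
            _cv.notify_all();
        }
        std::lock_guard lock(_mutex);
        _done = true;
        _cv.notify_all();
    }

public:
    parser_stream(int items, std::size_t capacity)
        : _capacity(capacity), _parser([this, items] { parse(items); }) {}

    // Consumer: returns false once the parser has finished and the queue is empty.
    bool next(int& out) {
        std::unique_lock lock(_mutex);
        _cv.wait(lock, [&] { return _done || !_queue.empty(); });
        if (_queue.empty()) {
            return false;
        }
        out = _queue.front();
        _queue.pop_front();
        _cv.notify_all();
        return true;
    }

    // Without the abort, join() would hang whenever the consumer stopped
    // early while the parser was blocked on a full queue.
    ~parser_stream() {
        {
            std::lock_guard lock(_mutex);
            _aborted = true;
        }
        _cv.notify_all();
        _parser.join();
    }
};
```

On the success path the producer has already finished, so setting the flag changes nothing. This mirrors the "no-op unless writing failed" property described above.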
Avi Kivity
47a1dc8d01 Update seastar submodule
* seastar 576ee47d...bab1625c (13):
  > build: s/{dpdk_libs}/${dpdk_libs}/
  > build: build with dpdk v23.07
  > scripts: Fix escaping of regexes in addr2line
  > linux-aio: print more specific error when setup_aio fails
  > linux-aio: correct the error message raised when io_setup() fails
  > build: reenable -Warray-bound compiling option
  > build: error out if find_program() fails
  > build: enable systemtap only if it is available
  > build: check if libucontext is necessary for using ucontext functions
  > smp: reference correct variable when fetch_or()
  > build: use target_compile_definitions() for adding -D...
  > http/client: pass tls_options to tls::connect()
  > Merge 'build, process: avoid using stdout or stderr as C++ identifiers' from Kefu Chai

Frozen toolchain regenerated for new Seastar dependencies.

configure.py adjusted for new Seastar arch names.

Closes scylladb/scylladb#15476
2023-09-20 10:43:40 +02:00
Botond Dénes
bb7121a1fb Merge 'tools/scylla-nodetools: do not create unowned bpo::value ' from Kefu Chai
in other words, do not create a bpo::value unless it is transferred to an
option_description.

`boost::program_options::value()` creates a new typed_value<T> object
without holding it with a shared_ptr. boost::program_options expects the
developer to construct a `bpo::option_description` from it right away,
and `boost::program_options::option_description` takes ownership
of the `typed_value<T>*` raw pointer and manages its life cycle with
a shared_ptr. but before it is passed to a `bpo::option_description`,
the pointer created by `boost::program_options::value()` is still
a raw pointer.

before this change, we initialize `operations_with_func` as global
variables using `boost::program_options::value()`. but unfortunately,
we don't always initialize a `bpo::option_description` from it --
we only do this on demand when the corresponding subcommand is
called.

so, if the corresponding subcommand is not called, the created
`typed_value<T>` objects are leaked. hence LeakSanitizer warns us.

after this change, we create the option map as a static
local variable in a function so it is created on demand as well.
as an alternative, we could initialize the options map as a local
variable where it is used. but to be more consistent with how
`global_option` is specified, and to colocate them in a single
place, let's keep the existing code layout.

this change is quite similar to 374bed8c3d

Fixes https://github.com/scylladb/scylladb/issues/15429

Closes scylladb/scylladb#15430

* github.com:scylladb/scylladb:
  tools/scylla-nodetools: reindent
  tools/scylla-nodetools: do not create unowned bpo::value
2023-09-18 11:09:46 +03:00
Kefu Chai
a03dc92cb5 tools/scylla-nodetools: reindent
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-18 13:57:37 +08:00
Kefu Chai
ed41c725f3 tools/scylla-nodetools: do not create unowned bpo::value
in other words, do not create a bpo::value unless it is transferred to an
option_description.

`boost::program_options::value()` creates a new typed_value<T> object
without holding it with a shared_ptr. boost::program_options expects the
developer to construct a `bpo::option_description` from it right away,
and `boost::program_options::option_description` takes ownership
of the `typed_value<T>*` raw pointer and manages its life cycle with
a shared_ptr. but before it is passed to a `bpo::option_description`,
the pointer created by `boost::program_options::value()` is still
a raw pointer.

before this change, we initialize `operations_with_func` as global
variables using `boost::program_options::value()`. but unfortunately,
we don't always initialize a `bpo::option_description` from it --
we only do this on demand when the corresponding subcommand is
called.

so, if the corresponding subcommand is not called, the created
`typed_value<T>` objects are leaked. hence LeakSanitizer warns us.

after this change, we create the option map as a static
local variable in a function so it is created on demand as well.
as an alternative, we could initialize the options map as a local
variable where it is used. but to be more consistent with how
`global_option` is specified, and to colocate them in a single
place, let's keep the existing code layout.

this change is quite similar to 374bed8c3d

Fixes #15429
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-18 13:57:37 +08:00
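The lazy-initialization part of this change, replacing a global with a function-local static, can be sketched with standard containers only. The sketch does not use boost::program_options. `get_operation_options()`, its contents and the construction counter are hypothetical and exist only to show when the map gets built.

```cpp
#include <cassert>
#include <map>
#include <string>

// Counts how many times the option map has been built (for illustration).
int option_map_constructions = 0;

// Hypothetical option table. A namespace-scope global would be built at
// program start whether or not any subcommand needs it; a function-local
// static is built on the first call only, and exactly once.
const std::map<std::string, std::string>& get_operation_options() {
    static const std::map<std::string, std::string> options = [] {
        ++option_map_constructions;
        return std::map<std::string, std::string>{
            {"compact", "illustrative description"},
            {"version", "illustrative description"},
        };
    }();
    return options;
}
```

Nothing is constructed until the first call, and later calls reuse the same map. C++ also guarantees that initializing a function-local static is thread-safe.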
Avi Kivity
67a0c865cf tools: toolchain: prepare: don't overwrite existing images
The docker/podman tooling is destructive: it will happily
overwrite images locally and on the server. If a maintainer
forgets to update tools/toolchain/image, this can result
in losing an older toolchain container image.

To prevent that, check that the image name is new.

Closes scylladb/scylladb#15397
2023-09-18 08:35:01 +03:00
Kefu Chai
1e6b2eb4c8 tools/scylla-nodetool: mark format string as constexpr
this change changes `const` to `constexpr`, because the string literal
defined here is not only immutable, but also initialized at
compile-time, and can be used by constexpr expressions and functions.

this change is introduced to reduce the size of the change when moving
to compile-time format strings in the future. so far, seastar::format() does
not use the compile-time format string, but we have patches pending on
review implementing this. and the author of this change has local
branches implementing the changes on scylla side to support compile-time
format string, which practically replaces most of the `format()` calls
with `seastar::format()`.

without this change, if we use the compile-time format check, the compiler
fails like:

```
/home/kefu/dev/scylladb/tools/scylla-nodetool.cc:276:44: error: call to consteval function 'fmt::basic_format_string<char, const char *const &, seastar::basic_sstring<char, unsigned int, 15>>::basic_format_string<const char *, 0>' is not a constant expression
            .description = seastar::format(description_template, app_name, boost::algorithm::join(operations | boost::adaptors::transformed([] (const auto& op) {
                                           ^
/usr/include/fmt/core.h:3148:67: note: read of non-constexpr variable 'description_template' is not allowed in a constant expression
  FMT_CONSTEVAL FMT_INLINE basic_format_string(const S& s) : str_(s) {
                                                                  ^
/home/kefu/dev/scylladb/tools/scylla-nodetool.cc:276:44: note: in call to 'basic_format_string(description_template)'
            .description = seastar::format(description_template, app_name, boost::algorithm::join(operations | boost::adaptors::transformed([] (const auto& op) {
                                           ^
/home/kefu/dev/scylladb/tools/scylla-nodetool.cc:258:16: note: declared here
    const auto description_template =
               ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15432
2023-09-15 19:28:38 +03:00
Kefu Chai
6c75dc4be8 tools/scylla-nodetool: do not compare unsigned with int
change the loop variable to `int` to silence warnings like

```
/home/kefu/.local/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_BROKEN_SOURCE_LOCATION -DSEASTAR_DEBUG -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -I/home/kefu/dev/scylladb/build/cmake/gen -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -march=westmere  -Og -g -gz -std=gnu++20 -fvisibility=hidden -U_FORTIFY_SOURCE -DSEASTAR_SSTRING -Wno-error=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT tools/CMakeFiles/tools.dir/scylla-nodetool.cc.o -MF tools/CMakeFiles/tools.dir/scylla-nodetool.cc.o.d -o tools/CMakeFiles/tools.dir/scylla-nodetool.cc.o -c /home/kefu/dev/scylladb/tools/scylla-nodetool.cc
/home/kefu/dev/scylladb/tools/scylla-nodetool.cc:215:28: error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare]
    for (unsigned i = 0; i < argc; ++i) {
                         ~ ^ ~~~~
```

`i` is used as the index into a plain C-style array, so it's perfectly fine
to use a signed integer as the index in this case. as per the C++ standard,

> The expression E1[E2] is identical (by definition) to *((E1)+(E2))

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15431
2023-09-15 19:28:14 +03:00
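A minimal sketch of the resulting loop shape. `join_args()` is a hypothetical helper. Because `argc` is an `int`, an `int` index keeps both sides of the comparison signed, and `argv[i]` is simply `*(argv + i)`.

```cpp
#include <cassert>
#include <string>

// argc is an int, so an int loop index avoids -Wsign-compare; indexing
// argv with a signed value is well defined since argv[i] == *(argv + i).
std::string join_args(int argc, const char* const argv[]) {
    std::string joined;
    for (int i = 0; i < argc; ++i) {
        if (i > 0) {
            joined += ' ';
        }
        joined += argv[i];
    }
    return joined;
}
```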
Botond Dénes
b87660f90c tools/scylla-sstable: log where schema was obtained from
Currently, we only log anything about what was tried w.r.t. obtaining
the schema if it failed. Add a log message to the success path too, so
in case the wrong schema was successfully loaded, the user can find the
problem.
The log message is printed with debug-level, so it doesn't disturb
output by default.

Fixes: #15384

Closes #15417
2023-09-14 23:09:30 +03:00
Avi Kivity
d9a453e72e Merge 'Introduce a scylla-native nodetool' from Botond Dénes
This series introduces a scylla-native nodetool. It is invokable via the main scylla executable, like the other native tools we have. It uses seastar's new `http::client` to connect to the specified node and execute the desired commands.
For now a single command is implemented: `nodetool compact`, invokable as `scylla nodetool compact`. Once all the boilerplate is added to create a new tool, implementing a single command is not too bad in terms of code bloat. Certainly not as clean as a Python implementation would be, but good enough. The advantages of a C++ implementation are that all of us in the core team know C++ and that it is shipped right as part of the scylla executable.

Closes #14841

* github.com:scylladb/scylladb:
  test: add nodetool tests
  test.py: add ToolTestSuite and ToolTest
  tools/scylla-nodetool: implement compact operation
  tools/scylla-nodetool: implement basic scylla_rest_api_client
  tools: introduce scylla-nodetool
  utils: export dns_connection_factory from s3/client.cc to http.hh
  utils/s3/client: pass logger to dns_connection_factory in constructor
  tools/utils: tool_app_template::run_async(): also detect --help* as --help
2023-09-14 17:20:40 +03:00
Benny Halevy
a5a22fe5b7 tools/scylla-sstable: load_sstables: handle load errors
Currently, exceptions thrown from `sst->load` are unhandled,
resulting in, e.g.:
```
ERROR 2023-09-12 08:02:58,124 [shard 0:main] seastar - Exiting on unhandled exception: std::runtime_error (SSTable /home/bhalevy/.dtest/dtest-dxg4xdxg/test/node1/data/ks/cf-a3009f20512911ee8000d81cd2da3fd7/me-3g9b_0e0x_39vtt1y2rcqrffz55j-big-Data.db uses org.apache.cassandra.dht.Murmur3Partitioner partitioner which is different than com.scylladb.dht.CDCPartitioner partitioner used by the database)
```

Log the errors and exit the tool with non-zero status
in this case.

Fixes #15359

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #15376
2023-09-14 14:27:38 +03:00
Botond Dénes
60dc2e9303 tools/scylla-nodetool: implement compact operation
Equivalent of nodetool compact.
The following arguments are accepted:
* split-output,s (unused)
* user-defined (error is raised)
* start-token,st (unused)
* end-token,et (unused)
* partition (unused)

The partition argument is mentioned only in our doc; our nodetool
doesn't recognize it. I added it nevertheless (it is ignored).
Split-output doesn't work with our current nodetool; attempting to use
it will result in an error. The option is parsed, but an error is raised
if it is used.
2023-09-14 05:25:14 -04:00
Botond Dénes
d67e22b876 tools/scylla-nodetool: implement basic scylla_rest_api_client
Add --host and --port parameters, parse and resolve these and
establish a connection to the provided host.
Add simple get() and post() methods, parsing the returned data as json.

Add the following compatibility arguments:
* password,pw
* password-file,pwf
* username,u
* print-port,pp

These are parsed and silently ignored, as they are specific to JMX and
aren't needed when connecting to the REST API.
Since boost program options supports neither multi-char short-form
switches nor the -short=value syntax, the argv has to be massaged
into a form which boost program options can digest. This is achieved by
substituting all incompatible option formats and syntax with the
equivalent boost program options compatible ones.
This mechanism is also used to make sure -h is translated to --host, not
--help. The help message is unfortunately still ambiguous, displaying
both with -h. This will be addressed in a follow-up.
2023-09-14 05:25:14 -04:00
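The argv massaging described above can be sketched as a lookup table plus a rewrite pass. `massage_argv()` and the table are hypothetical. The short and long option names come from this page (the compatibility options here, plus start-token/end-token from the compact commit). Unknown arguments pass through untouched, and `-h` maps to `--host`, not `--help`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical translation table: nodetool-style (possibly multi-char)
// short options -> long options that boost::program_options accepts.
const std::map<std::string, std::string> short_to_long = {
    {"-h", "--host"},
    {"-pw", "--password"},
    {"-pwf", "--password-file"},
    {"-u", "--username"},
    {"-pp", "--print-port"},
    {"-st", "--start-token"},
    {"-et", "--end-token"},
};

// Rewrite "-pw" -> "--password" and "-st=5" -> "--start-token=5";
// arguments not found in the table are passed through unchanged.
std::vector<std::string> massage_argv(const std::vector<std::string>& args) {
    std::vector<std::string> out;
    out.reserve(args.size());
    for (const auto& arg : args) {
        const auto eq = arg.find('=');
        const std::string key = arg.substr(0, eq);  // whole arg if no '='
        const auto it = short_to_long.find(key);
        if (it == short_to_long.end()) {
            out.push_back(arg);
        } else if (eq == std::string::npos) {
            out.push_back(it->second);
        } else {
            out.push_back(it->second + arg.substr(eq));  // keep "=value"
        }
    }
    return out;
}
```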
Botond Dénes
eb1beca1b6 tools: introduce scylla-nodetool
This patch only introduces the bare skeleton of the tool, plus the wiring
into main.
No operations are added yet, they will be added in later patches.
2023-09-14 05:25:14 -04:00
Botond Dénes
4dd373b8d3 tools/utils: tool_app_template::run_async(): also detect --help* as --help
Don't try to look up the current operation if the first argument is
--help*. This allows --help-seastar and --help-loggers to work.
2023-09-14 05:25:14 -04:00
Kamil Braun
bff9cedef9 Merge 'system_keyspace: remove flushes when writing to system tables' from Petr Gusev
There are several system tables with strict durability requirements.
This means that if we have written to such a table, we want to be sure
that the write won't be lost in case of node failure. We currently
accomplish this by accompanying each write to these tables with
`db.flush()` on all shards. This is expensive, since it causes all the
memtables to be written to sstables, which causes a lot of disk writes.
These overheads can become painful during node startup, when we write the
current boot state to `system.local`/`system.scylla_local` or during
topology change, when `update_peer_info`/`update_tokens` write to
`system.peers`.

In this series we remove flushes on writes to the `system.local`,
`system.peers`, `system.scylla_local` and `system.cdc_local` tables and
start using schema commitlog for durability.

Fixes: #15133

Closes #15279

* github.com:scylladb/scylladb:
  system_keyspace: switch CDC_LOCAL to schema commitlog
  system_keyspace: scylla_local: use schema commitlog
  database.cc: make _uses_schema_commitlog optional
  system_keyspace: drop load phases
  database.hh: add_column_family: add readonly parameter
  schema_tables: merge_tables_and_views: delay events until tables/views are created on all shards
  system_keyspace: switch system.peers to schema commitlog
  system_keyspace: switch system.local to schema commitlog
  main.cc: move schema commitlog replay earlier
  sstables_format_selector: extract listener
  sstables_format_selector: wrap when_enabled with seastar::async
  main.cc: inline and split system_keyspace.setup
  system_keyspace: refactor save_system_schema function
  system_keyspace: move initialize_virtual_tables into virtual_tables.hh
  system_keyspace: remove unused parameter
  config.cc: drop db::config::host_id
  main.cc:: extract local_info initialization into function
  schema.cc: check static_props for sanity
  system_keyspace: set null sharder when configuring schema commitlog
  system_keyspace: rename static variables
  system_keyspace: remove redundant wait_for_sync_to_commitlog
2023-09-14 10:39:20 +02:00
Kefu Chai
25457fca38 Update tools/cqlsh submodule
* tools/cqlsh 66ae7eac...e651e12e (6):
  > setup.py: specify Cython language_level explicitly
  > setup.py: pass extensions as a list
  > setup.py: reindent block in else branch
  > setup.py: early return in get_extension()
  > reloc: install build==0.10.0
  > reloc: add --verbose option to build_reloc.sh

Closes #15401
2023-09-14 10:30:07 +02:00
Yaniv Kaul
6c67c270c8 Update node exporter to v1.6.1
Fixes: https://github.com/scylladb/scylladb/issues/15044

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes #15045

[avi: toolchain regenerated; also pulls in clang-16.0.6-3]

Ref #15090

Closes #15392
2023-09-14 01:04:14 +03:00
Petr Gusev
b90011294d config.cc: drop db::config::host_id
In this refactoring commit we remove the db::config::host_id
field, as it's hacky and duplicates token_metadata::get_my_id.

Some tests want a specific host_id, so we add it to cql_test_config
and use it in cql_test_env.

We can't pass host_id to sstables_manager by value since it's
initialized in the database constructor and host_id is not loaded yet.
We also prefer not to make a dependency on shared_token_metadata
since in this case we would have to create artificial
shared_token_metadata in many tools and tests where sstables_manager
is used. So we pass a function that returns host_id to
sstables_manager constructor.
2023-09-13 23:00:15 +04:00
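Passing a function instead of a value lets the manager be constructed before the host ID is known. In this sketch, `sstables_manager_sketch` and the string-based `host_id` alias are hypothetical stand-ins, not Scylla's types. The callback is only invoked when the ID is actually needed.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-ins; not Scylla's real types.
using host_id = std::string;

class sstables_manager_sketch {
    std::function<host_id()> _get_host_id;
public:
    // Constructed before the host ID is loaded, so store a callback rather
    // than a value, and call it lazily.
    explicit sstables_manager_sketch(std::function<host_id()> get_host_id)
        : _get_host_id(std::move(get_host_id)) {}

    host_id current_host_id() const { return _get_host_id(); }
};
```

Tools and tests that have no token metadata can pass a lambda returning a fixed ID, which avoids creating an artificial shared_token_metadata just to satisfy the constructor.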
Botond Dénes
7e7101c180 Revert "Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes"
This reverts commit 628e6ffd33, reversing
changes made to 45ec76cfbf.

The test included with this PR is flaky and often breaks CI.
Revert while a fix is found.

Fixes: #15371
2023-09-13 10:45:37 +03:00
Kefu Chai
571fab4179 build: cmake: build cqlsh as a submodule
since we also redistribute cqlsh, let's package it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-12 18:18:31 +08:00
Kefu Chai
111d20958e build: cmake: build python3 dist tarball with arch postfix
now that `configure.py` always generates the python3 dist tarball with
an ${arch} postfix, let's mirror this behavior, as `build_unified.sh`
uses this naming convention.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-12 18:18:31 +08:00
Kefu Chai
34e3302c01 dbuild: use --userns option when using podman
instead of fabricating an `/etc/passwd` manually, we can just
leave it to podman to add an entry in `/etc/passwd` in the container,
as podman allows us to map the user's account to the same UID in the
container. see
https://docs.podman.io/en/stable/markdown/options/userns.container.html.

this is not only a cosmetic change, it also avoids the permission denied
failure when accessing `/etc/passwd` in the container when selinux is
enabled. without this change, we would otherwise need to add the
selinux label to the bind volume with the ':Z' option to address failures
like:

```
type=AVC msg=audit(1693449115.261:2599): avc:  denied  { open } for  pid=2298247 comm="bash" path="/etc/passwd" dev="tmpfs" ino=5931 scontext=system_u:system_r:container_t:s0:c252,c259 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
type=AVC msg=audit(1693449115.263:2600): avc:  denied  { open } for  pid=2298249 comm="id" path="/etc/passwd" dev="tmpfs" ino=5931 scontext=system_u:system_r:container_t:s0:c252,c259 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
```

found in `/var/log/audit/audit.log`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15230
2023-09-11 21:41:48 +03:00
Avi Kivity
b8a655f55e Update tools/python3 submodule
* tools/python3 45fbd05...3e833f1 (1):
  > install.sh: replace <tab> with spaces
2023-09-11 21:38:02 +03:00
Avi Kivity
628e6ffd33 Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes
Currently, a mutation query on the replica side will not respond with a result that doesn't have at least one live row. This causes problems if there are a lot of dead rows or partitions before we reach a live row, all of which stem from the fact that the resulting reconcilable_result will be large:

1. Large allocations.  Serialization of reconcilable_result causes large allocations for storing result rows in std::deque
2. Reactor stalls. Serialization of reconcilable_result on the replica side and on the coordinator side causes reactor stalls. This impacts not only the query at hand. For 1M dead rows, freezing takes 130ms and unfreezing takes 500ms. The coordinator does multiple freezes and unfreezes. The reactor stall on the coordinator side is >5s.
3. Too large repair mutations. If reconciliation works on large pages, repair may fail due to too large mutation size. 1M dead rows is already too much: Refs https://github.com/scylladb/scylladb/issues/9111.

This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows.

This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones, like regular queries do.

My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition):
* Node1: 1 live row, 1M dead rows
* Node2: 1M dead rows, 1 live row

This was designed to trigger reconciliation right from the very start of the query.

Before:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

After:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

Non-reconciling queries have almost identical durations (variations of a few ms can be observed between runs). Note how in the after case, the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than 1/4 of the before case).

Refs https://github.com/scylladb/scylladb/issues/7929
Refs https://github.com/scylladb/scylladb/issues/3672
Refs https://github.com/scylladb/scylladb/issues/7933
Fixes https://github.com/scylladb/scylladb/issues/9111

Closes #14923

* github.com:scylladb/scylladb:
  test/topology_custom: add test_read_repair.py
  replica/mutation_dump: detect end-of-page in range-scans
  tools/scylla-sstable: write: abort parser thread if writing fails
  test/pylib: add REST methods to get node exe and workdir paths
  test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction}
  service/storage_proxy: add trace points for the actual read executor type
  service/storage_proxy: add trace points for read-repair
  storage_proxy: Add more trace-level logging to read-repair
  database: Fix accounting of small partitions in mutation query
  database, storage_proxy: Reconcile pages with no live rows incrementally
2023-09-11 19:20:19 +03:00