1306 Commits

Author SHA1 Message Date
Botond Dénes
bf6186ed7e Update tools/java submodule
* tools/java 9f63a96f...585b30fd (1):
  > cassandra-stress: add support for using RackAwareRoundRobinPolicy
2023-07-20 18:13:32 +03:00
Botond Dénes
8916aa311e Merge 'build: cmake: build: cmake: build submodules ' from Kefu Chai
this series enables CMake to build submodules. it helps developers to build, for instance, the java tools on demand.

Closes #14751

* github.com:scylladb/scylladb:
  build: cmake: build submodules
  build: cmake: generate version files with add_custom_command()
2023-07-19 12:04:29 +03:00
Botond Dénes
665f69b80d tools,mutation: extract the low-level json utilities into mutation/json.hh
Soon, we will want to convert mutation fragments into json inside the
scylla codebase, not just in tools. To avoid scylla-core code having to
include tools/ (and link against it), move the low-level json utilities
into mutation/.
2023-07-19 01:28:28 -04:00
Botond Dénes
36bca5a6af tools/json_writer: fold SstableKey() overloads into callers
These are very simple methods, and we want to make the low level writers
not depend on knowing the sstable type.
2023-07-19 01:28:28 -04:00
Botond Dénes
043b0f316f tools/json_writer: allow writing metadata and value separately
The values of cells are potentially very large and thus, when presenting
row content as json in SELECT * FROM MUTATION_FRAGMENTS($table) queries,
we want to separate metadata and cell values into separate columns, so
users can opt out from the potentially big values being included too.
To support this use-case, write(row) and its downstream write methods
get a new `include_value` flag, which defaults to true. When set to
false, cell values will not be included in the json output. At the same
time, new methods are added to convert only cell values of a row to
json.
2023-07-19 01:28:28 -04:00
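
The `include_value` flag described in 043b0f316f can be pictured with a minimal sketch. The `cell` struct and the `write_cell()` / `write_cell_value()` functions below are hypothetical stand-ins, not the actual tools/json_writer API, and JSON escaping is left out for brevity.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for a cell: metadata plus a potentially large value.
struct cell {
    bool is_live;
    std::int64_t timestamp;
    std::string value;
};

// Writes the cell as a JSON object. Metadata is always written; the value
// is written only when include_value is true (the default).
// No JSON escaping is performed in this sketch.
std::string write_cell(const cell& c, bool include_value = true) {
    std::ostringstream os;
    os << "{\"is_live\":" << (c.is_live ? "true" : "false")
       << ",\"timestamp\":" << c.timestamp;
    if (include_value) {
        os << ",\"value\":\"" << c.value << "\"";
    }
    os << "}";
    return os.str();
}

// Counterpart that converts only the value of the cell to JSON.
std::string write_cell_value(const cell& c) {
    return "\"" + c.value + "\"";
}
```

Calling `write_cell(c, false)` yields the metadata column, while `write_cell_value(c)` yields the value column, so a query can skip the second one when values are large.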
Botond Dénes
1df004db8c tools/json_writer: split mutation_fragment_json_writer in two classes
1) mutation_partition_json_writer - containing all the low level
   utilities for converting sub-fragment level mutation components (such
   as rows, tombstones, etc.) and their components into json;
2) mutation_fragment_stream_json_writer - containing all the high level
   logic for converting mutation fragment streams to json;

The latter uses the former behind the scenes. The goal is to enable
reuse of converting mutation-fragments into json, without being forced
to work around differences in how the mutation fragments are represented
in json, on the higher level.
2023-07-19 01:28:28 -04:00
Botond Dénes
0a5b67d6d9 tools/json_writer: allow passing custom std::ostream to json_writer
To allow for use-cases where the user wants to write the json into a
string.
2023-07-19 01:28:28 -04:00
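
As a rough illustration of the use-case in 0a5b67d6d9, the sketch below shows a writer that takes an `std::ostream&` and is pointed at an `std::ostringstream` to capture the json in a string. `simple_json_writer` and `pair_to_string()` are hypothetical names, and the real json_writer interface may look quite different.

```cpp
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical minimal writer; it writes to std::cout unless another
// stream is passed in.
class simple_json_writer {
    std::ostream& _os;
public:
    explicit simple_json_writer(std::ostream& os = std::cout) : _os(os) {}

    // Emits {"key":"value"} without escaping (sketch only).
    void write_pair(const std::string& key, const std::string& value) {
        _os << "{\"" << key << "\":\"" << value << "\"}";
    }
};

// Captures the JSON in a string by handing the writer an std::ostringstream.
std::string pair_to_string(const std::string& key, const std::string& value) {
    std::ostringstream os;
    simple_json_writer writer(os);
    writer.write_pair(key, value);
    return os.str();
}
```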
Kefu Chai
959bfae665 build: cmake: build submodules
this mirrors what we have in the `build.ninja` generated by
`configure.py`. with this change, we can build for instance,
`dist-tool-tar` from the `build.ninja` generated by CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-19 13:08:35 +08:00
Avi Kivity
bfaac3a239 Merge 'Make replace sstables implementations exception safe' from Benny Halevy
This is the first phase of providing strong exception safety guarantees by the generic `compaction_backlog_tracker::replace_sstables`.

The goal is for the replace_sstables of every compaction strategy's backlog tracker to provide strong exception safety guarantees (i.e. it may throw an exception but must revert on error any intermediate changes it made, restoring the tracker to the pre-update state).

Once this series is merged and ICS replace_sstables is also made strongly exception safe (using infrastructure from size_tiered_backlog_tracker introduced here), `compaction_backlog_tracker::replace_sstables` may allow exceptions to propagate back to the caller rather than disabling the backlog tracker on errors.

Closes #14104

* github.com:scylladb/scylladb:
  leveled_compaction_backlog_tracker: replace_sstables: provide strong exception safety guarantees
  time_window_backlog_tracker: replace_sstables: provide strong exception safety guarantees
  size_tiered_backlog_tracker: replace_sstables: provide strong exception safety guarantees
  size_tiered_backlog_tracker: provide static calculate_sstables_backlog_contribution
  size_tiered_backlog_tracker: make log4 helper static
  size_tiered_backlog_tracker: define struct sstables_backlog_contribution
  size_tiered_backlog_tracker: update_sstables: update total_bytes only if set changed
  compaction_backlog_tracker: replace_sstables: pass old and new sstables vectors by ref
  compaction_backlog_tracker: replace_sstables: add FIXME comments about strong exception safety
2023-07-17 12:32:27 +03:00
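
The strong exception safety guarantee discussed in bfaac3a239 can be sketched as follows. This is a simplified illustration, not the code from the series: `toy_backlog_tracker` is a hypothetical class, and it gets the guarantee by working on copies and committing at the end, whereas the series talks about reverting intermediate changes on error.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified tracker: sstable name -> size in bytes.
class toy_backlog_tracker {
    std::map<std::string, std::uint64_t> _sstables;
    std::uint64_t _total_bytes = 0;
public:
    using sstable = std::pair<std::string, std::uint64_t>;

    // Strong guarantee: either the whole replacement is applied, or an
    // exception is thrown and the tracker is left exactly as it was.
    void replace_sstables(const std::vector<sstable>& old_ssts,
                          const std::vector<sstable>& new_ssts) {
        // Do all work that can throw on copies of the state.
        auto sstables = _sstables;
        auto total_bytes = _total_bytes;
        for (const auto& s : old_ssts) {
            auto it = sstables.find(s.first);
            if (it == sstables.end()) {
                throw std::invalid_argument("unknown sstable: " + s.first);
            }
            total_bytes -= it->second;
            sstables.erase(it);
        }
        for (const auto& s : new_ssts) {
            if (!sstables.emplace(s.first, s.second).second) {
                throw std::invalid_argument("duplicate sstable: " + s.first);
            }
            total_bytes += s.second;
        }
        // Commit: the swap and the integer assignment cannot throw.
        _sstables.swap(sstables);
        _total_bytes = total_bytes;
    }

    std::uint64_t total_bytes() const noexcept { return _total_bytes; }
    std::size_t count() const noexcept { return _sstables.size(); }
};
```

With this shape, a caller can let the exception propagate and keep using the tracker, because a failed call leaves no partial update behind.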
Harsh Soni
78c8e92170 dbuild: fix ulimits hard value for docker on osx
Docker-on-osx cannot parse "unlimited" as the hard limit value of ulimit, so, hardcode it to a fixed value.

Closes #14295
2023-07-17 10:30:39 +03:00
Tomasz Grabiec
f2ed9fcd7e schema_mutations, migration_manager: Ignore empty partitions in per-table digest
Schema digest is calculated by querying for mutations of all schema
tables, then compacting them so that all tombstones in them are
dropped. However, even if the mutation becomes empty after compaction,
we still feed its partition key. If the same mutations were compacted
prior to the query, because the tombstones expire, we won't get any
mutation at all and won't feed the partition key. So schema digest
will change once an empty partition of some schema table is compacted
away.

Tombstones expire 7 days after schema change which introduces them. If
one of the nodes is restarted after that, it will compute a different
table schema digest on boot. This may cause performance problems. When
sending a request from coordinator to replica, the replica needs
schema_ptr of the exact schema version requested by the coordinator. If it
doesn't know that version, it will request it from the coordinator and
perform a full schema merge. This adds latency to every such request.
Schema versions which are not referenced are currently kept in cache
for only 1 second, so if request flow has low-enough rate, this
situation results in perpetual schema pulls.

After ae8d2a550d, it is more likely to
run into this situation, because table creation generates tombstones
for all schema tables relevant to the table, even the ones which
will be otherwise empty for the new table (e.g. computed_columns).

This change introduces a cluster feature which when enabled will change
digest calculation to be insensitive to expiry by ignoring empty
partitions in digest calculation. When the feature is enabled,
schema_ptrs are reloaded so that the window of discrepancy during
transition is short and no rolling restart is required.

A similar problem was fixed for per-node digest calculation in
18f484cc753d17d1e3658bcb5c73ed8f319d32e8. Per-table digest calculation
was not fixed at that time because we didn't persist enabled features
and they were not enabled early-enough on boot for us to depend on
them in digest calculation. Now they are enabled before non-system
tables are loaded so digest calculation can rely on cluster features.

Fixes #4485.
2023-07-03 23:06:55 +02:00
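
The effect of ignoring empty partitions, as described in f2ed9fcd7e, can be shown with a toy digest. Everything here is an assumption made for illustration: the `partition` struct, the `table_digest()` function, and the use of FNV-1a as the hash are stand-ins and do not reflect how scylla actually computes schema digests.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a compacted schema-table mutation.
struct partition {
    std::string key;
    std::vector<std::string> rows; // empty once all tombstones are dropped
};

// FNV-1a, used here only as a simple stand-in hash.
inline void feed(std::uint64_t& h, const std::string& s) {
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    // Feed a separator so that ("ab","c") and ("a","bc") hash differently.
    h ^= 0xff;
    h *= 1099511628211ULL;
}

// With ignore_empty set, a partition that compacted down to nothing does
// not contribute its key, so the digest no longer depends on whether that
// partition was already compacted away on disk.
std::uint64_t table_digest(const std::vector<partition>& parts, bool ignore_empty) {
    std::uint64_t h = 14695981039346656037ULL;
    for (const auto& p : parts) {
        if (ignore_empty && p.rows.empty()) {
            continue;
        }
        feed(h, p.key);
        for (const auto& r : p.rows) {
            feed(h, r);
        }
    }
    return h;
}
```

Feeding the same table once with an empty `computed_columns` partition and once without it gives matching digests only when `ignore_empty` is true, which mirrors the before-expiry and after-expiry situations in the commit message.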
Avi Kivity
d88dfa0ad2 tools: scylla-sstable: fix stack overflow due to multiple db::config placed on the stack
db::config is pretty large (~32k) and there are four of them, blowing the stack. Fix by
allocating them on the heap.

It's not clear why this shows up on my system (clang 16) and not in the frozen toolchain.
Perhaps clang 16 is less able to reuse stack space.

Closes #14464
2023-07-01 09:21:05 +03:00
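
The fix in d88dfa0ad2 amounts to moving large objects from the stack to the heap. A minimal sketch of that idea, with a hypothetical `big_config` struct standing in for db::config and a hypothetical `make_configs()` helper:

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a large configuration object (~32 KiB).
struct big_config {
    std::array<char, 32 * 1024> storage{};
};

// Allocates `count` configs on the heap; only the small owning pointers
// live in the vector, so the caller's stack frame stays small.
std::vector<std::unique_ptr<big_config>> make_configs(std::size_t count) {
    std::vector<std::unique_ptr<big_config>> configs;
    configs.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        configs.push_back(std::make_unique<big_config>());
    }
    return configs;
}
```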
Benny Halevy
1a8cc84981 compaction_backlog_tracker: replace_sstables: pass old and new sstables vectors by ref
To facilitate rollback on the error handling path,
to provide strong exception safety guarantees.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 13:27:18 +03:00
Tomasz Grabiec
ad983ac23d sstables: Compute sstable shards using sharder from erm when loading
schema::get_sharder() does not use the correct sharder for
tablet-based tables.  Code which is supposed to work with all kinds of
tables should obtain the sharder from erm::get_sharder().
2023-06-21 00:58:24 +02:00
Israel Fruchter
3889e9040c Update tools/cqlsh submodule
* tools/cqlsh 6e1000f1...2254e920 (2):
  > test: add support for testing cloud bundle option
  > Fix cloudconf handling

Closes #14259
2023-06-20 00:10:53 +03:00
Nadav Har'El
a66c407bf1 Merge 'scylla-sstable: add scrub operation' from Botond Dénes
Exposing scrub compaction to the command-line. Allows for offline scrub of sstables, in cases where online scrubbing (via scylla itself) is not possible or not desired. One such case recently was an sstable from a backup which turned out to be corrupt, `nodetool refresh --load-and-stream` refusing to load it.

Fixes: #14203

Closes #14260

* github.com:scylladb/scylladb:
  docs/operating-scylla/admin-tools: scylla-sstable: document scrub operation
  test/cql-pytest: test_tools.py: add test for scylla sstable scrub
  tools/scylla-sstable: add scrub operation
  tools/scylla-sstable: write operation: add none to valid validation levels
  tools/scylla-sstable: handle errors thrown by the operation
  test/cql-pytest: add option to omit scylla's output from the test output
  tools/scylla-sstable: s/option/operation_option/
  tool/scylla-sstable: add missing comments
2023-06-19 15:40:51 +03:00
Botond Dénes
c294f2480c tools/scylla-sstable: add scrub operation
Exposing scrub compaction to the command-line. Scrubbed sstables are
written into a directory specified by the `--output-directory` command
line parameter. This directory is expected to be empty, to avoid
clashes with any pre-existing sstables. This can be overridden by the
user if they wish.
2023-06-16 06:20:14 -04:00
Botond Dénes
84aeb21297 tools/scylla-sstable: write operation: add none to valid validation levels
This validation level was added recently, but scylla sstable write
didn't know about it yet, fix that.
2023-06-16 06:20:14 -04:00
Botond Dénes
34f1827ffc tools/scylla-sstable: handle errors thrown by the operation
Instead of letting the runtime catch them. Also, make sure all exceptions
thrown due to bad arguments are instances of `std::invalid_argument`,
these are now reported differently from other, runtime errors.
Remove the now extraneous `error:` prefix from all exception messages.
2023-06-16 06:20:14 -04:00
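
The error handling described in 34f1827ffc can be sketched roughly as below. The exit code values, the message prefixes, and the `run_operation()` helper are assumptions for illustration; the log does not say how the tool actually reports the two kinds of errors.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Hypothetical exit codes; the actual tool's codes are not given in the log.
constexpr int exit_ok = 0;
constexpr int exit_runtime_failure = 1;
constexpr int exit_bad_arguments = 2;

// Runs the operation and maps exceptions to exit codes instead of letting
// them escape to the runtime. Argument errors are reported differently
// from other errors.
int run_operation(const std::function<void()>& operation) {
    try {
        operation();
        return exit_ok;
    } catch (const std::invalid_argument& e) {
        std::cerr << "error processing arguments: " << e.what() << '\n';
        return exit_bad_arguments;
    } catch (const std::exception& e) {
        std::cerr << "error running operation: " << e.what() << '\n';
        return exit_runtime_failure;
    }
}
```

The `std::invalid_argument` handler has to come first, since the generic `std::exception` handler would otherwise catch argument errors as well.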
Botond Dénes
21d9fbe875 tools/scylla-sstable: s/option/operation_option/
A future include will bring in a type with a similar name, resulting in
a name clash. Avoid by renaming to something more specific.
2023-06-16 06:20:14 -04:00
Botond Dénes
f31bf152aa tool/scylla-sstable: add missing comments
Separating entries in the operation list (pretty hard to visually
separate without comments).
2023-06-16 06:20:14 -04:00
Kamil Braun
26cd3b9b78 data_dictionary: add get_version
The `replica::database` version simply calls `get_version`
on the real database.

The `schema_loader` version throws `bad_function_call`.
2023-06-15 09:48:54 +02:00
Botond Dénes
4191e97d19 Update tools/java submodule
* tools/java 0cbfeb03...9f63a96f (1):
  > s/egrep/grep -E/
2023-06-14 12:29:59 +03:00
Pavel Emelyanov
66e43912d6 code: Switch to seastar API level 7
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).

So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritors to updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command

The first change is huge and was made semi-automatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields

Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicable)

Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile

The scylla-gdb.py update is a bit hairy -- it needs to use task queues
list for IO classes names and shares, but to detect whether it should, it
checks whether the "commitlog" group is present.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13963
2023-06-06 13:29:16 +03:00
Israel Fruchter
1ce739b020 Update tools/cqlsh submodule
* tools/cqlsh 8769c4c2...6e1000f1 (5):
  > build: erase uid/gid information from tar archives
  > Add github action to update the dockerhub description
  > cqlsh: Add extension handler for "scylla_encryption_options"
  > requirements.txt: update python-driver==3.26.0
  > Add support for arm64 docker image

Closes #13878
2023-06-04 19:56:52 +03:00
Kefu Chai
82cac8e7cf treewide: s/std::source_location/seastar::compat::source_location/
CWG 2631 (https://cplusplus.github.io/CWG/issues/2631.html) reports
an issue on how the default argument is evaluated. this problem is
more obvious when it comes to how `std::source_location::current()`
is evaluated as a default argument. but not all compilers have the
same behavior, see https://godbolt.org/z/PK865KdG4.

notably, clang-15 evaluates the default argument at the callee
site. so we need to check the capability of compiler and fall back
to the one defined by util/source_location-compat.hh if the compiler
suffers from CWG 2631. and clang-16 implemented CWG2631 in
https://reviews.llvm.org/D136554. But unfortunately, this change
was not backported to clang-15.

until we switch over to clang-16, to use std::source_location::current()
as the default parameter and get the behavior defined by CWG2631,
we have to use the compatibility layer provided by Seastar. otherwise
we always end up having the source_location at the callee side, which
is not interesting under most circumstances.

so in this change, all places using the idiom of passing
std::source_location::current() as the default parameter are changed
to use seastar::compat::source_location::current(). despite that
we have `#include "seastarx.h"` for opening the seastar namespace,
to disambiguate the "namespace compat" defined somewhere in scylladb,
the fully qualified name of
`seastar::compat::source_location::current()` is used.

see also 09a3c63345, where we used
std::source_location as an alias of std::experimental::source_location
if it was available. but this does not apply to the settings of our
current toolchain, where we have GCC-12 and Clang-15.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14086
2023-05-30 15:10:12 +03:00
Botond Dénes
a35758607a Update tools/java submodule
* tools/java eb3c43f8...0cbfeb03 (1):
  > nodetool: add `--primary-replica-only` option to `refresh`
2023-05-29 23:03:25 +03:00
Botond Dénes
fc24685b4d Update tools/jmx submodule
* tools/jmx 1fd23b60...d1077582 (1):
  > Support `--primary-replica-only` option from `nodetool refresh`
2023-05-29 23:03:25 +03:00
Botond Dénes
2526b232f1 Merge 'Remove explicit default_priority_class() usage from sstable aux methods' from Pavel Emelyanov
There are a few places in sstables/ code that require the caller to specify a priority class to pass along to file stream options. All these callers use the default class, so it makes little sense to keep it. This change makes the sched classes unification mega patch a bit smaller.

ref: #13963

Closes #13996

* github.com:scylladb/scylladb:
  sstables: Remove default prio class from rewrite_statistics()
  sstables: Remove prio class from validate_checksums subs
  sstables: Remove always default io-prio from validate_checksums()
2023-05-24 09:23:24 +03:00
Pavel Emelyanov
7396d9d291 sstables: Remove always default io-prio from validate_checksums()
All calls to sstables::validate_checksums() happen with explicitly
default priority class. Just hard-code it as such in the method

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 13:54:31 +03:00
Pavel Emelyanov
2bb024c948 index_reader: Introduce and use default arguments to constructor
Most creators of index_reader construct it with default prio class,
null trace pointer and use_caching::yes. Assigning implicit defaults to
constructor arguments keeps the code shorter and easier to read.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 11:29:04 +03:00
Kefu Chai
03be1f438c sstables: move get_components_lister() into sstable_directory
sstables_manager::get_component_lister() is used by sstable_directory.
and almost all the "ingredients" used to create a component lister
are located in sstable_directory. among the other things, the two
implementations of `components_lister` are located right in
`sstable_directory`. there is no need to outsource this to
sstables_manager just for accessing the system_keyspace, which is
already exposed as a public function of `sstables_manager`. so let's
move this helper into sstable_directory as a member function.

with this change, we can even go further by moving the
`components_lister` implementations into the same .cc file.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13853
2023-05-18 08:43:35 +03:00
Botond Dénes
3ea521d21b Update tools/jmx submodule
* tools/jmx f176bcd1...1fd23b60 (1):
  > select-java: query java version using -XshowSettings
2023-05-16 18:04:35 +03:00
Botond Dénes
3256afe263 Update tools/jmx submodule
* tools/jmx 5f988945...f176bcd1 (1):
  > sstableinfo: change the type of generation to string

Refs: #13834
2023-05-15 09:59:40 +03:00
Botond Dénes
6bc5c4acf6 tools/scylla-sstables: write validation result to stdout
Currently the validate command uses the logger to output the result of
validation. This is inconsistent with other commands which all write
their output to stdout and log any additional information/errors to
stderr. This patch updates the validate command to do the same. While at
it, remove the "Validating..." message, it is not useful.
2023-05-04 03:13:07 -04:00
Botond Dénes
3e52f0681e tools/scylla-sstable: move away from scrub_validate_mode_validate_reader()
Use sstable::validate() directly instead. Since sstables have to be
validated individually, this means the operation loses the `--merge`
option.
2023-05-02 09:42:42 -04:00
Avi Kivity
a1b99d457f Update tools/jmx submodule (error handling when jdk not available)
* tools/jmx fdd0474...5f98894 (1):
  > install.sh: bail out if jdk is not available
2023-04-25 14:20:57 +02:00
Wojciech Mitros
b0fa59b260 build: add tools for optimizing the Wasm binaries and translating to wat
After the addition of the rust-std-static-wasm32-wasi target, we're
able to compile the Rust programs to Wasm binaries. However, we're still
only able to handle the Wasm UDFs in the Text format, so we need a tool
to translate the .wasm files to .wat. Additionally, the .wasm files
generated by default are unnecessarily large, which can be helped
using wasm-opt and wasm-strip.
The tool for translating wasm to wat (wasm2wat), and the tool for
stripping the wasm binaries (wasm-strip) are included in the `wabt`
package, and the optimization tool (wasm-opt) is included in the
`binaryen` package. Both packages are added to install-dependencies.sh

Closes #13282

[avi: regenerate frozen toolchain]

Closes #13605
2023-04-25 09:53:47 +02:00
Botond Dénes
fcd7f6ac5f Update tools/java submodule
* tools/java c9be8583...eb3c43f8 (1):
  > Use EstimatedHistogram in metricPercentilesAsArray
2023-04-21 14:31:38 +03:00
Avi Kivity
342cdb2a63 Update tools/jmx submodule (split Depends line)
* tools/jmx 15fd4ca...fdd0474 (1):
  > dist/debian: split Depends into multiple lines
2023-04-20 15:11:33 +03:00
Avi Kivity
6ca1b14488 Update tools/jmx submodule (drop java 8 on debian)
* tools/jmx 3316f7a...15fd4ca (1):
  > dist/debian: drop dependencies on jdk-8
2023-04-19 19:51:03 +03:00
Botond Dénes
ad065aaa62 Update tools/jmx submodule
* tools/jmx e9bfaabd...3316f7a9 (2):
  > select-java: avoid exec multiple paths
  > select-java: extract function out
2023-04-19 11:18:19 +03:00
Botond Dénes
de67978211 Update tools/jmx submodule
* tools/jmx 826da61d...e9bfaabd (1):
  > metrics: revert 'metrics: EstimatedHistogram::getValues() returns bucketOffsets'
2023-04-17 15:42:11 +03:00
Tomasz Grabiec
952b455310 Merge ' tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes
scylla-sstable currently has two ways to obtain the schema:

    * via a `schema.cql` file.
    * load schema definition from memory (only works for system tables).

This meant that for most cases it was necessary to export the schema into a CQL format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates from. Yet in many cases the sstable is inspected on the same host where it originates from. In these cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool checks the usual locations of the scylla data dir, the scylla configuration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a schema.cql is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.

This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tool is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.

Example:

```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```

As seen above, subdirectories like quarantine, staging etc are also supported.

Fixes: https://github.com/scylladb/scylladb/issues/10126

Closes #13448

* github.com:scylladb/scylladb:
  test/cql-pytest: test_tools.py: add tests for schema loading
  test/cql-pytest: add no_autocompaction_context
  docs: scylla-sstable.rst: remove accidentally added copy-pasta
  docs: scylla-sstable.rst: remove paragraph with schema limitations
  docs: scylla-sstable.rst: update schema section
  test/cql-pytest: nodetool.py: add flush_keyspace()
  tools/scylla-sstable: reform schema loading mechanism
  tools/schema_loader: add load_schema_from_schema_tables()
  db/schema_tables: expose types schema
2023-04-14 16:46:26 +02:00
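
The path-based auto-detection mentioned in 952b455310 could look roughly like the sketch below. It assumes the `<data_dir>/<keyspace>/<table>-<uuid>/[<subdir>/]<sstable file>` layout seen in the example path of that merge. The `sstable_location` struct, the `guess_location()` function, and the hard-coded subdirectory list are hypothetical and are not taken from the tool's source.

```cpp
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical result of the detection.
struct sstable_location {
    fs::path data_dir;
    std::string keyspace;
    std::string table;
};

// Guesses <data_dir>/<keyspace>/<table>-<uuid>/[<subdir>/]<sstable file>
// from the path alone; returns std::nullopt if the layout does not match.
// Only lexical path operations are used, the filesystem is not touched.
std::optional<sstable_location> guess_location(const fs::path& sstable_path) {
    fs::path table_dir = sstable_path.parent_path();
    // Step over a known subdirectory (illustrative list, not exhaustive).
    const std::string leaf = table_dir.filename().string();
    if (leaf == "quarantine" || leaf == "staging") {
        table_dir = table_dir.parent_path();
    }
    // The table directory is expected to be named <table>-<uuid>.
    const std::string dir_name = table_dir.filename().string();
    const auto dash = dir_name.rfind('-');
    if (dash == std::string::npos || dash == 0) {
        return std::nullopt;
    }
    const fs::path keyspace_dir = table_dir.parent_path();
    if (keyspace_dir.filename().empty()) {
        return std::nullopt;
    }
    return sstable_location{keyspace_dir.parent_path(),
                            keyspace_dir.filename().string(),
                            dir_name.substr(0, dash)};
}
```

When a guess like this fails, the merge description says the tool falls back to the usual data directory locations, the configuration file and environment variables; that fallback chain is not modelled here.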
Botond Dénes
38d6635afd Update tools/java submodule
* tools/java eddef023...c9be8583 (1):
  > README.md: drop cqlsh from README.md
2023-04-14 11:53:16 +03:00
Botond Dénes
7586491e1e Update tools/jmx/ submodule
* tools/jmx/ 57c16938...826da61d (4):
  > install.sh: do not create /usr/scylla/jmx in nonroot mode
  > install.sh: remove "echo done"
  > reloc-pkg: rename symlinks/scylla-jmx to select-java
  > install.sh: select java executable at runtime
2023-04-14 11:47:54 +03:00
Botond Dénes
4eb1bb460a Update tools/python3 submodule
* tools/python3 d2f57dd9...30b8fc21 (1):
  > create-relocatable-package.py: fix timestamp of executable files
2023-04-14 11:39:17 +03:00
Botond Dénes
50ee4033a9 Update tools/jmx submodule
* tools/jmx 602329c9...57c16938 (1):
  > install.sh: replace tab with spaces
2023-04-12 13:28:23 +03:00
Botond Dénes
ffec1e5415 tools/scylla-sstable: reform schema loading mechanism
So far, schema had to be provided via a schema.cql file, a file which
contains the CQL definition of the table. This is flexible but annoying
at the same time. Many times sstables the tool operates on are located
in their table directory in a scylla data directory, where the schema
tables are also available. To mitigate this, an alternative method to
load the schema from memory was added which works for system tables.
In this commit we extend this to work for all kinds of tables: by
auto-detecting where the scylla data directory is, and loading the
schema tables from disk.
2023-04-12 03:14:43 -04:00
Botond Dénes
fd4c2f2077 tools/schema_loader: add load_schema_from_schema_tables()
Allows loading the schema for the designated keyspace and table, from
the system table sstables located on disk. The sstable files are opened
read-only.
2023-04-12 03:14:43 -04:00