db::config is pretty large (~32k) and there are four of them, blowing the stack. Fix by
allocating them on the heap.
It's not clear why this shows up on my system (clang 16) and not in the frozen toolchain.
Perhaps clang 16 is less able to reuse stack space.
Closes#14464
schema::get_sharder() does not use the correct sharder for
tablet-based tables. Code which is supposed to work with all kinds of
tables should obtain the sharder from erm::get_sharder().
Exposing scrub compaction to the command-line. Allows for offline scrub of sstables, in cases where online scrubbing (via scylla itself) is not possible or not desired. One such case recently was an sstable from a backup which turned out to be corrupt, `nodetool refresh --load-and-stream` refusing to load it.
Fixes: #14203Closes#14260
* github.com:scylladb/scylladb:
docs/operating-scylla/admin-tools: scylla-sstable: document scrub operation
test/cql-pytest: test_tools.py: add test for scylla sstable scrub
tools/scylla-sstable: add scrub operation
tools/scylla-sstable: write operation: add none to valid validation levels
tools/scylla-sstable: handle errors thrown by the operation
test/cql-pytest: add option to omit scylla's output from the test output
tools/scylla-sstable: s/option/operation_option/
tool/scylla-sstable: add missing comments
Exposing scrub compaction to the command-line. Scrubbed sstables are
written into a directory specified by the `--output-directory` command
line parameter. This directory is expected to be empty, to avoid
clashes with any pre-existing sstables. This can be overriden by the
user if they wish.
Instead of letting the runtime catch them. Also, make sure all exception
throw due to bad arguments are instances of `std::invalid_argument`,
these are now reported differently from other, runtime errors.
Remove the now extraneous `error:` prefix from all exception messages.
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).
So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritants to updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command
The first change is huge and was made semi-autimatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields
Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicatble)
Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile
The scylla-gdb.py update is a bit hairry -- it needs to use task queues
list for IO classes names and shares, but to detect it should it checks
for the "commitlog" group is present.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13963
* tools/cqlsh 8769c4c2...6e1000f1 (5):
> build: erase uid/gid information from tar archives
> Add github action to update the dockerhub description
> cqlsh: Add extension handler for "scylla_encryption_options"
> requirements.txt: update python-driver==3.26.0
> Add support for arm64 docker image
Closes#13878
CWG 2631 (https://cplusplus.github.io/CWG/issues/2631.html) reports
an issue on how the default argument is evaluated. this problem is
more obvious when it comes to how `std::source_location::current()`
is evaluated as a default argument. but not all compilers have the
same behavior, see https://godbolt.org/z/PK865KdG4.
notebaly, clang-15 evaluates the default argument at the callee
site. so we need to check the capability of compiler and fall back
to the one defined by util/source_location-compat.hh if the compiler
suffers from CWG 2631. and clang-16 implemented CWG2631 in
https://reviews.llvm.org/D136554. But unfortunately, this change
was not backported to clang-15.
before switching over to clang-16, for using std::source_location::current()
as the default parameter and expect the behavior defined by CWG2631,
we have to use the compatible layer provided by Seastar. otherwise
we always end up having the source_location at the callee side, which
is not interesting under most circumstances.
so in this change, all places using the idiom of passing
std::source_location::current() as the default parameter are changed
to use seastar::compat::source_location::current(). despite that
we have `#include "seastarx.h"` for opening the seastar namespace,
to disambiguate the "namespace compat" defined somewhere in scylladb,
the fully qualified name of
`seastar::compat::source_location::current()` is used.
see also 09a3c63345, where we used
std::source_location as an alias of std::experimental::source_location
if it was available. but this does not apply to the settings of our
current toolchain, where we have GCC-12 and Clang-15.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14086
There are few places in sstables/ code that require caller to specify priority class to pass it along to file stream options. All these callers use default class, so it makes little sense to keep it. This change makes the sched classes unification mega patch a bit smaller.
ref: #13963Closes#13996
* github.com:scylladb/scylladb:
sstables: Remove default prio class from rewrite_statistics()
sstables: Remove prio class from validate_checksums subs
sstables: Remove always default io-prio from validate_checksums()
All calls to sstables::validate_checksums() happen with explicitly
default priority class. Just hard-code it as such in the method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Most of creators of index_reader construct it with default prio class,
null trace pointer and use_caching::yes. Assigning implicit defaults to
constructor arguments keeps the code shorter and easier to read.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
sstables_manager::get_component_lister() is used by sstable_directory.
and almost all the "ingredients" used to create a component lister
are located in sstable_directory. among the other things, the two
implementations of `components_lister` are located right in
`sstable_directory`. there is no need to outsource this to
sstables_manager just for accessing the system_keyspace, which is
already exposed as a public function of `sstables_manager`. so let's
move this helper into sstable_directory as a member function.
with this change, we can even go further by moving the
`components_lister` implementations into the same .cc file.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13853
Currently the validate command uses the logger to output the result of
validation. This is inconsistent with other commands which all write
their output to stdout and log any additional information/errors to
stderr. This patch updates the validate command to do the same. While at
it, remove the "Validating..." message, it is not useful.
After the addition of the rust-std-static-wasm32-wasi target, we're
able to compile the Rust programs to Wasm binaries. However, we're still
only able to handle the Wasm UDFs in the Text format, so we need a tool
to translate the .wasm files to .wat. Additionally, the .wasm files
generated by default are unnecessarily large, which can be helped
using wasm-opt and wasm-strip.
The tool for translating wasm to wat (wasm2wat), and the tool for
stripping the wasm binaries (wasm-strip) are included in the `wabt`
package, and the optimization tool (wasm-opt) is included in the
`binaryen` package. Both packages are added to install-dependencies.sh
Closes#13282
[avi: regenerate frozen toolchain]
Closes#13605
scylla-sstable currently has two ways to obtain the schema:
* via a `schema.cql` file.
* load schema definition from memory (only works for system tables).
This meant that for most cases it was necessary to export the schema into a CQL format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates form. Yet in many cases the sstable is inspected on the same host where it originates from. In this cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool check the usual locations of the scylla data dir, the scylla confguration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a schema.cql is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.
This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tools is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.
Example:
```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```
As seen above, subdirectories like qurantine, staging etc are also supported.
Fixes: https://github.com/scylladb/scylladb/issues/10126Closes#13448
* github.com:scylladb/scylladb:
test/cql-pytest: test_tools.py: add tests for schema loading
test/cql-pytest: add no_autocompaction_context
docs: scylla-sstable.rst: remove accidentally added copy-pasta
docs: scylla-sstable.rst: remove paragraph with schema limitations
docs: scylla-sstable.rst: update schema section
test/cql-pytest: nodetool.py: add flush_keyspace()
tools/scylla-sstable: reform schema loading mechanism
tools/schema_loader: add load_schema_from_schema_tables()
db/schema_tables: expose types schema
So far, schema had to be provided via a schema.cql file, a file which
contains the CQL definition of the table. This is flexible but annoying
at the same time. Many times sstables the tool operates on are located
in their table directory in a scylla data directory, where the schema
tables are also available. To mitigate this, an alternative method to
load the schema from memory was added which works for system tables.
In this commit we extend this to work for all kind of tables: by
auto-detecting where the scylla data directory is, and loading the
schema tables from disk.
Allows loading the schema for the designated keyspace and table, from
the system table sstables located on disk. The sstable files opened for
read only.
SSTable summary is one of the components fully loaded into memory that may have a significant footprint.
This series reduces the summary footprint by reducing the amount of token information that we need to keep
in memory for each summary entry.
Of course, the benefit of this size optimization is proportional to the amount of summary entries, which
in turn is proportional to the number of partitions in a SSTable.
Therefore we can say that this optimization will benefit the most tables which have tons of small-sized
partitions, which will result in big summaries.
Results:
```
BEFORE
[1000000 pkeys] data size: 4035888890, summary -> memory footprint: 5843232, entries: 88158
[10000000 pkeys] data size: 40368888890, summary -> memory footprint: 55787128, entries: 844925
AFTER
[1000000 pkeys] data size: 4035888890, summary -> memory footprint: 4351536, entries: 88158
[10000000 pkeys] data size: 40368888890, summary -> memory footprint: 42211984, entries: 844925
```
That shows a 25% reduction in footprint, for both 1 and 10 million pkeys.
Closes#13447
* github.com:scylladb/scylladb:
sstables: Store raw token into summary entries
sstables: Don't store token data into summary's memory pool
This patch adds storage options lw-ptr to sstables_manager::make_sstable
and makes the storage instance creation depend on the options. For local
it just creates the filesystem storage instance, for S3 -- throws, but
next patch will fix that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Scylla stores a dht::token into each summary entry, for convenience.
But that costs us 16 bytes for each summary entry. That's because
dht::token has a kind field in addition to data, both 64 bits.
With 1kk partitions, each averaging 4k bytes, summary may end up
with ~90k summary entries. So dht::token only will add ~1.5M to the
memory footprint of summary.
We know summary samples index keys, therefore all tokens in all
summary entries cannot have any token kind other than 'key'.
Therefore, we can save 8 bytes for each summary entry by storing
a 64-bit raw token and converting it back into token whenever
needed.
Memory footprint of summary entries in a summary goes from
sizeof(summary_entry) * entries.size(): 1771520
to
sizeof(summary_entry) * entries.size(): 1417216
which is explained by the 8 bytes reduction per summary entry.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This reverts commit 32fff17e19, reversing
changes made to 164afe14ad.
This series proved to be problematic, the new test introduced by it
failing quite often. Revert it until the problems are tracked down and
fixed.
When loading a schema from disk, only the `tables` and `columns` tables
are required to have an entry to the loaded schema. All the others are
optional. Yet the schema loader expects all the tables to have a
corresponding entry, which leads to errors when trying to load a schema
which doesn't. Relax the loader to only require existing entries in the
two mandatory tables and not the others.
Closes#13393
The forward_service.hh and raft_group0_client.hh can be replaced with
forward declarations. Few other files need their previously indirectly
included headers back.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13384
Preparing for #10459, this series defines sstables::generation_type::int_t
as `int64_t` at the moment and use that instead of naked `int64_t` variables
so it can be changed in the future to hold e.g. a `std::variant<int64_t, sstables::generation_id>`.
sstables::new_generation was defined to generation new, unique generations.
Currently it is based on incrementing a counter, but it can be extended in the future
to manufacture UUIDs.
The unit tests are cleaned up in this series to minimize their dependency on numeric generations.
Basically, they should be used for loading sstables with hard coded generation numbers stored under `test/resource/sstables`.
For all the rest, the tests should use existing and mechanisms introduced in this series such as generation_factory, sst_factory and smart make_sstable methods in sstable_test_env and table_for_tests to generate new sstables with a unique generation, and use the abstract sst->generation() method to get their generation if needed, without resorting the the actual value it may hold.
Closes#12994
* github.com:scylladb/scylladb:
everywhere: use sstables::generation_type
test: sstable_test_env: use make_new_generation
sstable_directory::components_lister::process: fixup indentation
sstables: make highest_generation_seen return optional generation
replica: table: add make_new_generation function
replica: table: move sstable generation related functions out of line
test: sstables: use generation_type::int_t
sstables: generation_type: define int_t
`scylla-sstable` currently has two ways to obtain the schema:
* via a `schema.cql` file.
* load schema definition from memory (only works for system tables).
This meant that for most cases it was necessary to export the schema into a `CQL` format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates form. Yet in many cases the sstable *is* inspected on the same host where it originates from. In this cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool check the usual locations of the scylla data dir, the scylla confguration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a `schema.cql` is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.
This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tools is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.
Example:
```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```
As seen above, subdirectories like `qurantine`, `staging` etc are also supported.
Fixes: https://github.com/scylladb/scylladb/issues/10126Closes#13075
* github.com:scylladb/scylladb:
docs/operating-scylla/admin-tools: scylla-sstable.rst: update schema section
test/cql-pytest: test_tools.py: add test for schema loading
test/cql-pytest: nodetool.py: add flush_keyspace()
tools/scylla-sstable: reform schema loading mechanism
tools/schema_loader: add load_schema_from_schema_tables()
db/schema_tables: expose types schema
this helps to use OpenJDK 11 instead of OpenJDK 8 for running scylla-jmx,
in hope to alleviate the pain of the crashes found in the JRE shipped along
with OpenJDK 8, as it is aged, and only security fixes are included now.
* tools/jmx 88d9bdc...48e1699 (3):
> Merge 'dist/redhat: support jre 11 instead of jre 8' from Kefu Chai
> install.sh: point java to /usr/bin/java
> Merge 'use OpenJDK 11 instead of OpenJDK 8' from Kefu Chai
Refs https://github.com/scylladb/scylla-jmx/issues/194
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13356