Writing into an sstable component output stream should be done with care. In particular, flushing can happen only once, right before closing the stream. Flushing the stream in between several writes is not going to work, because the file stream would step on unaligned IO and the S3 upload stream would send the completion message to the server and lose any subsequent write.
Most of the file_writer users already obey that and flush the writer once right before closing it. The do_write_simple() is extra careful about exception handling, but that's overkill (see first patch).
It's better to make the file_writer API explicitly lack the ability to flush itself, by flushing the stream when closing the writer.
Closes #13338
* github.com:scylladb/scylladb:
sstables: Move writer flush into close (and remove it)
sstables: Relax exception handling in do_write_simple
This test currently uses `test/lib/test_table.hh` to generate data for its test cases. This data generation facility is used by no other tests. Worse, it is redundant as we already have a random data generator with fixed schema, in `test/lib/mutation_source_test.hh`. So in this series, we migrate the test cases in said test file to random schema and its random data generation facilities. These are used by several other test cases and using random schema allows us to cover a wider (quasi-infinite) number of possibilities.
After migrating all tests away from it, `test/lib/test_table.hh` is removed.
This series also reduces the runtime of `fuzzy_test` drastically. It should now run in a few minutes or even in seconds (depending on the machine).
Fixes: #12944
Closes #12574
* github.com:scylladb/scylladb:
test/lib: rm test_table.hh
test/boost/multishard_mutation_query_test: migrate other tests to random schema
test/boost/multishard_mutation_query_test: use ks keyspace
test/boost/multishard_mutation_query_test: improve test pager
test/boost/multishard_mutation_query_test: refactor fuzzy_test
test/boost: add multishard_mutation_query_test more memory
types/user: add get_name() accessor
test/lib/random_schema: add create_with_cql()
test/lib/random_schema: fix udt handling
test/lib/random_schema: type_generator(): also generate frozen types
test/lib/random_schema: type_generator(): make static column generation conditional
test/lib/random_schema: type_generator(): don't generate duration_type for keys
test/lib/random_schema: generate_random_mutations(): add overload with seed
test/lib/random_schema: generate_random_mutations(): respect range tombstone count param
test/lib/random_schema: generate_random_mutations(): add yields
test/lib/random_schema: generate_random_mutations(): fix indentation
test/lib/random_schema: generate_random_mutations(): coroutinize method
test/lib/random_schema: generate_random_mutations(): expand comment
Every tracker insertion has to have a corresponding removal or eviction
(otherwise the number of rows in the tracker will be misaccounted).
If we add the row to the tracker before adding it to the tree,
and the tree insertion fails (with bad_alloc), this contract will be violated.
Fix that.
Note: the problem is currently irrelevant because an exception during
sentinel insertion will abort the program anyway.
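A minimal sketch of the ordering described above, not the actual cache code. `tracker`, `rows_tree`, `fail_next_insert` and `add_row()` are hypothetical stand-ins: the step that can throw (tree insertion) runs first, so a failure never leaves a tracker insertion without a matching removal.

```cpp
#include <cstddef>
#include <new>
#include <set>

// Hypothetical stand-in for the row tracker: it only counts rows.
struct tracker {
    std::size_t rows = 0;
    void insert() noexcept { ++rows; }
};

// Hypothetical stand-in for the rows tree; insertion may fail with bad_alloc.
struct rows_tree {
    std::set<int> rows;
    bool fail_next_insert = false;  // lets a caller simulate an allocation failure

    void insert(int key) {
        if (fail_next_insert) {
            throw std::bad_alloc();
        }
        rows.insert(key);
    }
};

// Insert into the tree first (may throw), then into the tracker (cannot throw).
// If the tree insertion fails, the tracker has not been touched.
inline void add_row(rows_tree& tree, tracker& t, int key) {
    tree.insert(key);
    t.insert();
}
```

With the opposite order, a failed tree insertion would leave the tracker counting a row that is not in the tree.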
Closes #13336
Compaction group is responsible for deleting SSTables of "in-strategy"
compactions, i.e. regular, major, cleanup, etc.
Both in-strategy and off-strategy compaction have their completion
handled using the same compaction group interface, which is
compaction_group::table_state::on_compaction_completion(...,
sstables::offstrategy offstrategy)
So it's important to bring symmetry there, by moving the responsibility
of deleting off-strategy input, from manager to group.
Another important advantage is that off-strategy deletion is now throttled
and gated, allowing for better control, e.g. table waiting for deletion
on shutdown.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #13432
this is the 15th changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals:
- to enable developers to use CMake for building scylla, so they can use tools (CLion for instance) with CMake integration for a better developer experience
- to enable us to tweak the dependencies in a simpler way. a well-defined cross module / subsystem dependency is a prerequisite for building this project with the C++20 modules.
also, i just found that the scylla executable built with the cmake building system segfaults in master HEAD, like so:
```
AddressSanitizer:DEADLYSIGNAL
=================================================================
==3974496==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0x7ffd48549f70 sp 0x7ffd48549728 T0)
==3974496==Hint: pc points to the zero page.
==3974496==The signal is caused by a READ memory access.
==3974496==Hint: address points to the zero page.
#0 0x0 (<unknown module>)
#1 0x14e785a5 in wasmtime_runtime::traphandlers::unix::trap_handler::h1f510afc2968497f /home/kefu/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/wasmtime-runtime-5.0.1/src/traphandlers/unix.rs:159:9
#2 0x7f3462e5eb9f (/lib64/libc.so.6+0x3db9f) (BuildId: 6107835fa7d4725691b2b7f6aaee7abe09f493b2)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (<unknown module>)
==3974496==ABORTING
Aborting on shard 0.
Backtrace:
0xd16c38a
0x13c5aab0
0x13b9821e
0x13c2fdc7
/lib64/libc.so.6+0x3db9f
/lib64/libc.so.6+0x8eb93
/lib64/libc.so.6+0x3daed
/lib64/libc.so.6+0x2687e
0xd1e5f8a
0xd1e3d34
0xd1ca059
0xd1c5e29
0xd1c5605
0x14e785a5
/lib64/libc.so.6+0x3db9f
```
decoded:
```
__interceptor_backtrace at ??:?
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /home/kefu/dev/scylladb/seastar/include/seastar/util/backtrace.hh:60
seastar::backtrace_buffer::append_backtrace() at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:778
(inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:808
seastar::print_with_backtrace(char const*, bool) at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:820
(inlined by) seastar::sigabrt_action() at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:3882
(inlined by) operator() at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:3858
(inlined by) __invoke at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:3854
/lib64/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=6107835fa7d4725691b2b7f6aaee7abe09f493b2, for GNU/Linux 3.2.0, not stripped
__GI___sigaction at :?
__pthread_kill_implementation at ??:?
__GI_raise at :?
__GI_abort at :?
__sanitizer::Abort() at ??:?
__sanitizer::Die() at ??:?
__asan::ScopedInErrorReport::~ScopedInErrorReport() at ??:?
__asan::ReportDeadlySignal(__sanitizer::SignalContext const&) at ??:?
__asan::AsanOnDeadlySignal(int, void*, void*) at ??:?
wasmtime_runtime::traphandlers::unix::trap_handler at /home/kefu/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/wasmtime-runtime-5.0.1/src/traphandlers/unix.rs:159
__GI___sigaction at :?
```
this led me to this change. but unfortunately, this changeset does not address the segfault. will continue the investigation in my free cycles.
Closes #13434
* github.com:scylladb/scylladb:
build: cmake: include cxx.h with relative path
build: cmake: set stack frame limits
build: cmake: pass -fvisibility=hidden to compiler
build: cmake: use -O0 on aarch64, otherwise -Og
S3 client cannot perform anonymous multipart uploads into any real S3
buckets regardless of their configuration. Since multipart upload is
an essential part of the sstables backend, we need to implement
authorisation support for the client early.
(side note): with minio, anonymous multipart upload works; with aws s3,
anonymous PUT and DELETE can be configured. It's exactly the combination
of aws + multipart upload that needs authorization.
Fortunately, the signature generation and signature checking code is
symmetrical and we have the checking option already in alternator :) So
what this patch does is just move the alternator::get_signature()
helper into utils/. A sad side effect of that is that all tests now need to
link with gnutls :( which is used to compute the hash value itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #13428
This PR reverts the scylla sstable schema loading improvements as they fail in CI every other run. I am already working on fixes for these but I am not sure I understand all the failures so it is best to revert and re-post the series later.
Fixes: #13404
Fixes: #13410
Closes #13419
* github.com:scylladb/scylladb:
Revert "Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes"
Revert "tools/schema_loader: don't require results from optional schema tables"
before this change, the wasm binding source files include the
cxxbridge header file of `cxx.h` with its full path.
to better mirror the behavior of configure.py, let's just
include this header file with relative path.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
* transpose include(mode.common) and include (mode.${build_mode}),
so the former can reference the value defined by the latter.
* set stack_usage_threshold for supported build modes.
please note, this compiler option (-Wstack-usage=<bytes>) is only
supported by GCC so far.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this addresses an oversight in b234c839e4,
which is supposed to mirror the behavior of `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Related: https://github.com/scylladb/scylla-enterprise/issues/2770
This commit adds the upgrade guide from ScyllaDB Open Source 5.2
to ScyllaDB Enterprise 2023.1.
This commit does not cover metric updates (the metrics file has no
content, which needs to be added in another PR).
As this is an upgrade guide, this commit must be merged to master and
backported to branch-5.2 and branch-2023.1 in scylla-enterprise.git.
Closes #13294
Task manager compaction tasks that cover compaction group
compaction need access to compaction_manager::tasks.
To avoid circular dependency and be able to rely on forward
declaration, task needs to be moved out of compaction manager.
To avoid naming confusion, compaction_manager::task is renamed.
Closes #13226
* github.com:scylladb/scylladb:
compaction: use compaction namespace in compaction_manager.cc
compaction: rename compaction::task
compaction: move compaction_manager::task out of compaction manager
compaction: move sstable_task definition to source file
This reverts commit 32fff17e19, reversing
changes made to 164afe14ad.
This series proved to be problematic, the new test introduced by it
failing quite often. Revert it until the problems are tracked down and
fixed.
There are two occasions in scylla_cluster
where we read the node logs, and in both of
them we read the entire file in memory.
This is not efficient and may cause an OOM.
In the first case we need the last line of the
log file, so we seek to the end and move backwards
looking for a newline character.
In the second case we look through the
log file to find the expected_error.
The readlines() method returns a Python
list object, which means it reads the entire
file in memory. It's sufficient to just remove
it since iterating over the file instance
already yields lines lazily one by one.
This is a follow-up for #13134.
Closes #13399
this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `position_in_partition` and `partition_region` without using ostream<<. also, this change removes `operator<<(ostream, const position_in_partition_view&)` , `operator<<(ostream, const partition_region&)` along with their callers.
Refs #13245
Closes #13391
* github.com:scylladb/scylladb:
mutation: drop operator<< for position_in_partition and friends
partition_snapshot_row_cursor: do not use operator<< when printing position
mutation: specialize fmt::formatter<position_in_partition>
mutation: specialize fmt::formatter<partition_region>
as alien::run_on() requires the function to be noexcept, let's
make this explicit. also, this paves the road to the type constraint
added to `alien::run_on()`. the type constraint will enforce this
requirement on the function passed to `alien::run_on()`.
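a rough sketch of what such a compile-time check can look like, not seastar's actual declaration. `run_nothrow()` is a hypothetical stand-in for `alien::run_on()`, and the sketch uses a `static_assert` so it also builds as C++17; a C++20 requires-clause on the same trait would express it as a type constraint.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a function that, like alien::run_on(), must only
// be handed callables that are declared noexcept.
template <typename Func>
void run_nothrow(Func&& func) noexcept {
    // Reject callables that may throw at compile time instead of relying on
    // std::terminate() at run time.
    static_assert(std::is_nothrow_invocable_v<Func>,
                  "the function passed to run_nothrow() must be noexcept");
    std::forward<Func>(func)();
}
```

a lambda written as `[]() noexcept { ... }` passes the check, while the same lambda without `noexcept` fails to compile.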
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13375
That's courtesy of 153813d3b8, which annotates Seastar smart pointer classes with Clang's consumed attributes, to help Clang statically spot use-after-move bugs.
Closes #13386
* github.com:scylladb/scylladb:
replica: Fix use-after-move in table::make_streaming_reader
index/built_indexes_virtual_reader.hh: Fix use-after-move
db/view/build_progress_virtual_reader: Fix use-after-move
sstables: Fix use-after-move when making reader in reverse mode
Courtesy of clang-tidy:
row_cache.cc:1191:28: warning: 'entry' used after it was moved [bugprone-use-after-move]
_partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{_schema});
^
row_cache.cc:1191:60: note: move occurred here
_partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{_schema});
^
row_cache.cc:1191:28: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
_partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{*_schema});
The use-after-move is UB; whether it actually happens depends on the
evaluation order of the arguments. We haven't hit it yet as clang
evaluates them left-to-right.
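A minimal sketch of the usual way out of this class of bug, not the actual row_cache.cc change. `entry` and `insert_entry()` are hypothetical stand-ins: everything that depends on the object is read into a local before the object is moved, so argument evaluation order stops mattering.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a cache entry that is keyed by one of its own fields.
struct entry {
    std::string key;
    std::string payload;
};

// Copy the key out of `e` first, then move `e`. The two arguments passed to
// emplace() no longer read from and move out of the same object.
inline void insert_entry(std::map<std::string, entry>& partitions, entry e) {
    std::string key = e.key;  // taken while `e` is still intact
    partitions.emplace(std::move(key), std::move(e));
}
```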
Fixes #13400.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #13401
When loading a schema from disk, only the `tables` and `columns` tables
are required to have an entry for the loaded schema. All the others are
optional. Yet the schema loader expects all the tables to have a
corresponding entry, which leads to errors when trying to load a schema
which doesn't. Relax the loader to only require existing entries in the
two mandatory tables and not the others.
Closes #13393
Variant used by
streaming/stream_transfer_task.cc: , reader(cf.make_streaming_reader(cf.schema(), std::move(permit_), prs))
as full slice is retrieved after schema is moved (clang evaluates
left-to-right), the stream transfer task can be potentially working
on a stale slice for a particular set of partitions.
static report:
In file included from replica/dirty_memory_manager.cc:6:
replica/database.hh:706:83: error: invalid invocation of method 'operator->' on object 'schema' while it is in the 'consumed' state [-Werror,-Wconsumed]
return make_streaming_reader(std::move(schema), std::move(permit), range, schema->full_slice());
Fixes #13397.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
static report:
./index/built_indexes_virtual_reader.hh:228:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed]
_db.find_column_family(s->ks_name(), system_keyspace::v3::BUILT_VIEWS),
Fixes #13396.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
use-after-move in ctor, which potentially leads to a failure
when locating the table from the moved schema object.
static report:
In file included from db/system_keyspace.cc:51:
./db/view/build_progress_virtual_reader.hh:202:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed]
_db.find_column_family(s->ks_name(), system_keyspace::v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS),
Fixes #13395.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
static report:
sstables/mx/reader.cc:1705:58: error: invalid invocation of method 'operator*' on object 'schema' while it is in the 'consumed' state [-Werror,-Wconsumed]
legacy_reverse_slice_to_native_reverse_slice(*schema, slice.get()), pc, std::move(trace_state), fwd, fwd_mr, monitor);
Fixes #13394.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
in order to prepare for dropping the `operator<<()` for `position_in_partition_view`,
let's use fmtlib to print `position()`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print
- position_in_partition
- position_in_partition_view
- position_in_partition_view::printer
without the help of fmt::ostream. their `operator<<(ostream,..)` are
reimplemented using fmtlib accordingly to ease the review.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `partition_region` with the help of fmt::ostream.
to help with the review process, the corresponding `to_string()` is
dropped, and its callers now switch over to `fmt::to_string()` in
this change as well. to use `fmt::to_string()` helps with consolidating
all places to use fmtlib for printing/formatting.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
sleep_abortable() is aborted on success, which causes a sleep_aborted
exception to be thrown. This causes scylla to throw every 100ms for
each pinged node. Throwing may reduce performance if it happens often.
Also, it spams the logs if --logger-log-level exception=trace is enabled.
Avoid this by swallowing the exception on cancellation.
Fixes #13278.
Closes #13279
When adding extra columns in a test, make them value columns. Name them
with the "v_" prefix and use the value column number counter.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes #13271
To allow tests with custom clusters, allow configuration of initial
cluster size of 0.
Add a proof-of-concept test to be removed later.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes #13342
generation_for_sharded_test is not used by any of these sstable
tests, so let's drop it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13388
Add task manager task implementations that cover
offstrategy keyspace compaction, which can be started through the
/storage_service/keyspace_compaction/ api.
Top level task covers the whole compaction and creates child
tasks on each shard.
Closes #12713
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py to test offstrategy compaction
compaction: create task manager's task for offstrategy keyspace compaction on one shard
compaction: create task manager's task for offstrategy keyspace compaction
compaction: create offstrategy_compaction_task_impl
The forward_service.hh and raft_group0_client.hh can be replaced with
forward declarations. A few other files need their previously indirectly
included headers back.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #13384
The commitlog api originally implied that the commitlog_directory would contain files from a single commitlog instance. This is checked in segment_manager::list_descriptors: if it encounters a file with an unknown prefix, an exception occurs in `commitlog::descriptor::descriptor`, which is logged with the `WARN` level.
A new schema commitlog was added recently, which shares the filesystem directory with the main commitlog. This causes warnings to be emitted on each boot. This patch solves the warnings problem by moving the schema commitlog to a separate directory. In addition, the user can employ the new `schema_commitlog_directory` parameter to move the schema commitlog to another disk drive.
This is expected to be released in 5.3.
As #13134 (raft tables->schema commitlog) is also scheduled for 5.3, and it already requires a clean rolling restart (no cl segments to replay), we don't need to specifically handle upgrade here.
Fixes: #11867
Closes #13263
* github.com:scylladb/scylladb:
commitlog: use separate directory for schema commitlog
schema commitlog: fix commitlog_total_space_in_mb initialization
The commitlog api originally implied that
the commitlog_directory would contain files
from a single commitlog instance. This is
checked in segment_manager::list_descriptors:
if it encounters a file with an unknown
prefix, an exception occurs in
commitlog::descriptor::descriptor, which is
logged with the WARN level.
A new schema commitlog was added recently,
which shares the filesystem directory with
the main commitlog. This causes warnings
to be emitted on each boot. This patch
solves the warnings problem by moving
the schema commitlog to a separate directory.
In addition, the user can employ the new
schema_commitlog_directory parameter to move
the schema commitlog to another disk drive.
By default, the schema commitlog directory is
nested in the commitlog_directory. This can help
avoid problems during an upgrade if the
commitlog_directory in the custom scylla.yaml
is located on a separate disk partition.
This is expected to be released in 5.3.
As #13134 (raft tables->schema commitlog)
is also scheduled for 5.3, and it already
requires a clean rolling restart (no cl
segments to replay), we don't need to
specifically handle upgrade here.
Fixes: #11867
It seems there was a typo here, which caused
commitlog_total_space_in_mb to always be zero
and the schema commitlog to be effectively
unlimited in size.
Preparing for #10459, this series defines sstables::generation_type::int_t
as `int64_t` at the moment and uses that instead of naked `int64_t` variables
so it can be changed in the future to hold e.g. a `std::variant<int64_t, sstables::generation_id>`.
sstables::new_generation was defined to generate new, unique generations.
Currently it is based on incrementing a counter, but it can be extended in the future
to manufacture UUIDs.
The unit tests are cleaned up in this series to minimize their dependency on numeric generations.
Basically, numeric generations should only be used for loading sstables with hard coded generation numbers stored under `test/resource/sstables`.
For all the rest, the tests should use existing mechanisms and those introduced in this series, such as generation_factory, sst_factory and smart make_sstable methods in sstable_test_env and table_for_tests, to generate new sstables with a unique generation, and use the abstract sst->generation() method to get their generation if needed, without resorting to the actual value it may hold.
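The idea can be sketched roughly as follows. This is a simplified illustration, not the code from the series: only the names `generation_type` and `int_t` come from the description above, while the members and the `counting_generation_factory` class are assumptions made for the example.

```cpp
#include <cstdint>

// Simplified illustration: callers hold a generation_type and never a naked
// integer, so the underlying representation can change later.
class generation_type {
public:
    using int_t = std::int64_t;

    explicit constexpr generation_type(int_t value) noexcept : _value(value) {}
    constexpr int_t value() const noexcept { return _value; }

    friend constexpr bool operator==(generation_type a, generation_type b) noexcept {
        return a._value == b._value;
    }
    friend constexpr bool operator!=(generation_type a, generation_type b) noexcept {
        return !(a == b);
    }
    friend constexpr bool operator<(generation_type a, generation_type b) noexcept {
        return a._value < b._value;
    }

private:
    int_t _value;
};

// Hypothetical counter-based source of unique generations. A UUID-based one
// could replace it without touching code that only passes generation_type around.
class counting_generation_factory {
    generation_type::int_t _last = 0;

public:
    generation_type operator()() noexcept { return generation_type(++_last); }
};
```

Tests written against such a factory only compare generations with each other and never depend on a specific number.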
Closes #12994
* github.com:scylladb/scylladb:
everywhere: use sstables::generation_type
test: sstable_test_env: use make_new_generation
sstable_directory::components_lister::process: fixup indentation
sstables: make highest_generation_seen return optional generation
replica: table: add make_new_generation function
replica: table: move sstable generation related functions out of line
test: sstables: use generation_type::int_t
sstables: generation_type: define int_t
The wasm engine is moved from replica::database to the query_processor.
The wasm instance cache and compilation thread runner were already there,
but now they're also initialized in the query_processor constructor.
By moving the initialization to the constructor, we can now
be certain that all wasm-related objects (wasm instance cache,
compilation thread runner, and wasm engine, which was already
passed in the constructor) are initialized when we try to use
them because we have to use the query processor to access them
anyway.
The change is also motivated by the fact that we're planning
to take Wasm UDFs out of experimental, after which they should
stop getting special treatment.
Closes #13311
* github.com:scylladb/scylladb:
wasm: move wasm initialization to query_processor constructor
wasm: return wasm instance cache as a reference instead of a pointer
wasm: move wasm engine to query_processor
Currently, aggregate functions are implemented in a stateful manner.
The accumulator is stored internally in an aggregate_function::aggregate,
requiring each query to instantiate new instances (see
aggregate_function_selector's constructor, and note how it's called
from selector::new_instance()).
This makes aggregates hard to use in expressions, since expressions
are stateless (with state only provided to evaluate()). To facilitate
migration towards stateless expressions, we define a
stateless_aggregate_function (modeled after user-defined aggregates,
which are already stateless). This new struct defines the aggregate
in terms of three scalar functions: one to aggregate a new input into
an accumulator (provided in the first parameter), one to finalize an
accumulator into a result, and one to reduce two accumulators for
parallelized aggregation.
All existing native aggregate functions are converted to the new model, and
the old interface is removed. This series does not yet convert selectors to
expressions, but it does remove one of the obstacles.
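To illustrate the three-function shape described above, here is a simplified sketch for an integer avg(). It is not the actual stateless_aggregate_function definition: `avg_state`, `stateless_avg`, `make_avg()` and `run()` are hypothetical names, and plain `std::function` members stand in for scalar functions.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical accumulator: a plain value that is passed in and returned,
// never stored inside the aggregate itself.
struct avg_state {
    std::int64_t sum = 0;
    std::int64_t count = 0;
};

struct stateless_avg {
    // Fold one input into an accumulator (the accumulator is the first parameter).
    std::function<avg_state(avg_state, std::int64_t)> aggregate;
    // Turn an accumulator into the final result.
    std::function<std::int64_t(const avg_state&)> finalize;
    // Merge two accumulators produced by independent runs (e.g. per shard).
    std::function<avg_state(const avg_state&, const avg_state&)> reduce;
};

inline stateless_avg make_avg() {
    return stateless_avg{
        [](avg_state acc, std::int64_t v) {
            acc.sum += v;
            acc.count += 1;
            return acc;
        },
        [](const avg_state& acc) {
            return acc.count == 0 ? std::int64_t(0) : acc.sum / acc.count;
        },
        [](const avg_state& a, const avg_state& b) {
            return avg_state{a.sum + b.sum, a.count + b.count};
        },
    };
}

// The caller owns the state; the aggregate definition can be shared by all queries.
inline avg_state run(const stateless_avg& f, const std::vector<std::int64_t>& values) {
    avg_state acc;
    for (auto v : values) {
        acc = f.aggregate(acc, v);
    }
    return acc;
}
```

Because the accumulator lives with the caller, two partial runs can be combined with `reduce` and finalized once, which is what parallelized aggregation needs.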
Performance evaluation: I created a table with a million ints on a single-node cluster, and ran the avg() function on them. I measured the number of instructions executed with `perf stat -p $(pgrep scylla) -e instructions` while the query was running. The query executed from cache; memtables were flushed beforehand. The instruction count per row increased from roughly 49k to roughly 52k, indicating 3k extra instructions per row. While 3k instructions to execute a function is huge, it is currently dwarfed by other overhead (and will be even less important in a cluster, where CL>1 will cause non-coordinator code to run multiple times).
Closes #13105
* github.com:scylladb/scylladb:
cql3/selection, forward_service: use stateless_aggregate_function directly
db: functions: fold stateless_aggregate_function_adapter into aggregate_function
cql3: functions: simplify accumulator_for template
cql3: functions: base user-defined aggregates on stateless aggregates
cql3: functions: drop native_aggregate_function
cql3: functions: reimplement count(column) statelessly
cql3: functions: reimplement avg() statelessly
cql3: functions: reimplement sum() statelessly
cql3: functions: change wide accumulator type to varint
cql3: functions: unreverse types for min/max
cql3: functions: rename make_{min,max}_dynamic_function
cql3: functions: reimplement min/max statelessly
cql3: functions: reimplement count(*) statelessly
cql3: functions: simplify creating native functions even more
cql3: functions: add helpers for automating marshalling for scalar functions
types: fix big_decimal constructor from literal 0
cql3: functions: add helper class for internal scalar functions
db: functions: add stateless aggregate functions
db, cql3: move scalar_function from cql3/functions to db/functions
`scylla-sstable` currently has two ways to obtain the schema:
* via a `schema.cql` file.
* load schema definition from memory (only works for system tables).
This meant that for most cases it was necessary to export the schema into a `CQL` format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates from. Yet in many cases the sstable *is* inspected on the same host where it originates from. In these cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool checks the usual locations of the scylla data dir, the scylla configuration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a `schema.cql` is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.
This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag; the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tool is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.
Example:
```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```
As seen above, subdirectories like `quarantine`, `staging` etc are also supported.
Fixes: https://github.com/scylladb/scylladb/issues/10126
Closes #13075
* github.com:scylladb/scylladb:
docs/operating-scylla/admin-tools: scylla-sstable.rst: update schema section
test/cql-pytest: test_tools.py: add test for schema loading
test/cql-pytest: nodetool.py: add flush_keyspace()
tools/scylla-sstable: reform schema loading mechanism
tools/schema_loader: add load_schema_from_schema_tables()
db/schema_tables: expose types schema