This reverts commit 32fff17e19, reversing
changes made to 164afe14ad.
This series proved to be problematic, the new test introduced by it
failing quite often. Revert it until the problems are tracked down and
fixed.
When loading a schema from disk, only the `tables` and `columns` tables
are required to have an entry to the loaded schema. All the others are
optional. Yet the schema loader expects all the tables to have a
corresponding entry, which leads to errors when trying to load a schema
which doesn't. Relax the loader to only require existing entries in the
two mandatory tables and not the others.
Closes#13393
The forward_service.hh and raft_group0_client.hh can be replaced with
forward declarations. Few other files need their previously indirectly
included headers back.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13384
Preparing for #10459, this series defines sstables::generation_type::int_t
as `int64_t` at the moment and use that instead of naked `int64_t` variables
so it can be changed in the future to hold e.g. a `std::variant<int64_t, sstables::generation_id>`.
sstables::new_generation was defined to generation new, unique generations.
Currently it is based on incrementing a counter, but it can be extended in the future
to manufacture UUIDs.
The unit tests are cleaned up in this series to minimize their dependency on numeric generations.
Basically, they should be used for loading sstables with hard coded generation numbers stored under `test/resource/sstables`.
For all the rest, the tests should use existing and mechanisms introduced in this series such as generation_factory, sst_factory and smart make_sstable methods in sstable_test_env and table_for_tests to generate new sstables with a unique generation, and use the abstract sst->generation() method to get their generation if needed, without resorting the the actual value it may hold.
Closes#12994
* github.com:scylladb/scylladb:
everywhere: use sstables::generation_type
test: sstable_test_env: use make_new_generation
sstable_directory::components_lister::process: fixup indentation
sstables: make highest_generation_seen return optional generation
replica: table: add make_new_generation function
replica: table: move sstable generation related functions out of line
test: sstables: use generation_type::int_t
sstables: generation_type: define int_t
`scylla-sstable` currently has two ways to obtain the schema:
* via a `schema.cql` file.
* load schema definition from memory (only works for system tables).
This meant that for most cases it was necessary to export the schema into a `CQL` format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates form. Yet in many cases the sstable *is* inspected on the same host where it originates from. In this cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool check the usual locations of the scylla data dir, the scylla confguration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a `schema.cql` is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.
This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tools is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.
Example:
```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```
As seen above, subdirectories like `qurantine`, `staging` etc are also supported.
Fixes: https://github.com/scylladb/scylladb/issues/10126Closes#13075
* github.com:scylladb/scylladb:
docs/operating-scylla/admin-tools: scylla-sstable.rst: update schema section
test/cql-pytest: test_tools.py: add test for schema loading
test/cql-pytest: nodetool.py: add flush_keyspace()
tools/scylla-sstable: reform schema loading mechanism
tools/schema_loader: add load_schema_from_schema_tables()
db/schema_tables: expose types schema
this helps to use OpenJDK 11 instead of OpenJDK 8 for running scylla-jmx,
in hope to alleviate the pain of the crashes found in the JRE shipped along
with OpenJDK 8, as it is aged, and only security fixes are included now.
* tools/jmx 88d9bdc...48e1699 (3):
> Merge 'dist/redhat: support jre 11 instead of jre 8' from Kefu Chai
> install.sh: point java to /usr/bin/java
> Merge 'use OpenJDK 11 instead of OpenJDK 8' from Kefu Chai
Refs https://github.com/scylladb/scylla-jmx/issues/194
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13356
* tools/cqlsh b9a606f...8769c4c (11):
> dist: redhat: provide only a single version
> pylib/setup, requirement.txt: remove Six
> setup: do not support python2
> install.sh: install files with correct permission in struct umask settings
> Remove unneed LC_ALL=en_US.UTF-8
> Support using other driver (datastax or older scylla ones)
> Fix RPM based downgrade command on scylla-cqlsh
> gitignore: ignore pylib/cqlshlib/__pycache__
> dist/redhat: add a proper changelog entry
> github actions: enable starting on tags
> Add support for building docker image
* tools/python3 279b6c1...d2f57dd (3):
> dist: redhat: provide only a single version
> SCYLLA-VERSION-GEN: use -gt when comparing values
> SCYLLA-VERSION-GEN: remove unnecessary bashism
So far, schema had to be provided via a schema.cql file, a file which
contains the CQL definition of the table. This is flexible but annoying
at the same time. Many times sstables the tool operates on are located
in their table directory in a scylla data directory, where the schema
tables are also available. To mitigate this, an alternative method to
load the schema from memory was added which works for system tables.
In this commit we extend this to work for all kind of tables: by
auto-detecting where the scylla data directory is, and loading the
schema tables from disk.
Allows loading the schema for the designated keyspace and table, from
the system table sstables located on disk. The sstable files opened for
read only.
There was an attempt to cut feature-service -> system-keyspace dependency (#13172) which turned out to require more changes. Here's a preparation squeezing from this future work.
This set
- leaves only batch-enabling API in feature service
- keeps the need for async context in feature service
- narrows down system keyspace features API to only load and store records
- relaxes features updating logic in sys.ks.
- cosmetic
Closes#13264
* github.com:scylladb/scylladb:
feature_service: Indentation fix after previous patch
feature_service: Move async context into enable()
system_keyspace: Refactor local features load/save helpers
feature_service: Mark supported_feature_set() const
feature_service: Remove single feature enabling method
boot: Enable features in batch
gossiper: Enable features in batch
Convert all users to use sstables::generation_type::int_t.
Further patches will continue to convert most to
using sstables::generation_type instead so we can
abstract the value type.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And propagate it down to where it is created. This will be used to add
trace points for semaphore related events, but this will come in the
next patches.
Callers don't need to know that enabling features has this requirement
Indentation is deliberately left broken (until next patch)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In the future, when testing WASM UDFs, we will only store the Rust
source codes of them, and compile them to WASM. To be able to
do that, we need rust standard library for the wasm32-wasi target,
which is available as an RPM called rust-std-static-wasm32-wasi.
Closes#12896
[avi: regenerate toolchain]
Closes#13258
It will be needed by S3 driver to parse multipart-upload messages from
server
refs: #12523
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13158
[avi: regenerate toolchain]
Closes#13192
These two are static binaries, so no need in yum/apt-installing them with dependencies.
Just download with curl and put them into /urs/local/bin with X-bit set.
This is needed for future object-storage work in order to run unit tests against minio.
refs: #12523
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
[avi: regenerate frozen toolchain]
Closes#13064Closes#13099
despite that RapidJSON is a header-only library, we still need to
find it and "link" against it for adding the include directory.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`load_one_schema()` and `load_schemas_from_file()` are dropped,
as they are neither used by `scylla-sstable` or tested by
`schema_loader_test.cc` . the latter tests `load_schemas()`, which
is quite the same as `load_one_schema_from_file()`, but is more
permissive in the sense that it allows zero schema or more than
one schema in the specified path.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13003
we should never return a reference to local variable.
so in this change, a reference to a static variable is returned
instead. this should address following warning from Clang 17:
```
/home/kefu/dev/scylladb/tools/schema_loader.cc:146:16: error: returning reference to local temporary object [-Werror,-Wreturn-stack-address]
return {};
^~
```
Fixes#12875
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#12876
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.
Closes#12858
to replace tabs with spaces, for better readability if the editor
fails to render tabs with the right tabstop setting.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#12839
The leak sanitizer has a bug [1] where, if it detects a leak, it
forks something, and before that, it closes all files (instead of
using close_range like a good citizen).
Docker tends to create containers with the NOFILE limit (number of
open files) set to 1 billion.
The resulting 1 billion close() system calls is incredibly slow.
Work around that problem by passing the host NOFILE limit.
[1] https://github.com/llvm/llvm-project/issues/59112Closes#12638
Although the number of keyspaces should mostly be 1 here, and thus the
chance of two tables from different keyspaces colliding is miniscule, it
is not zero. Better be safe than sorry, so match the keyspace name too
when looking up a table.
Closes#12627
The dbuild README has an example how to enable ccache, and required
modifying the PATH. Since recently, our docker image includes
required commands (cxxbridge) in /usr/local/bin, so the build will
fail if that directory isn't also in the path - so add it in the
example.
Also use the opportunity to fix the "/home/nyh" in one example to
"$HOME".
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12631
Introduce a new "script" operation, which loads a script from the specified path, then feeds the mutation fragment stream to it. The script can then extract, process and present information from the sstable as it wishes.
For now only Lua scripts are supported for the simple reason that Lua is easy to write bindings for, it is simple and lightweight and more importantly we already have Lua included in the Scylla binary as it is used as the implementation language for UDF/UDA. We might consider WASM support in the future, but for now we don't have any language support in WASM available.
Example:
```lua
function new_stats(key)
return {
partition_key = key,
total = 0,
partition = 0,
static_row = 0,
clustering_row = 0,
range_tombstone_change = 0,
};
end
total_stats = new_stats(nil);
function inc_stat(stats, field)
stats[field] = stats[field] + 1;
stats.total = stats.total + 1;
total_stats[field] = total_stats[field] + 1;
total_stats.total = total_stats.total + 1;
end
function on_new_sstable(sst)
max_partition_stats = new_stats(nil);
if sst then
current_sst_filename = sst.filename;
else
current_sst_filename = nil;
end
end
function consume_partition_start(ps)
current_partition_stats = new_stats(ps.key);
inc_stat(current_partition_stats, "partition");
end
function consume_static_row(sr)
inc_stat(current_partition_stats, "static_row");
end
function consume_clustering_row(cr)
inc_stat(current_partition_stats, "clustering_row");
end
function consume_range_tombstone_change(crt)
inc_stat(current_partition_stats, "range_tombstone_change");
end
function consume_partition_end()
if current_partition_stats.total > max_partition_stats.total then
max_partition_stats = current_partition_stats;
end
end
function on_end_of_sstable()
if current_sst_filename then
print(string.format("Stats for sstable %s:", current_sst_filename));
else
print("Stats for stream:");
end
print(string.format("\t%d fragments in %d partitions - %d static rows, %d clustering rows and %d range tombstone changes",
total_stats.total,
total_stats.partition,
total_stats.static_row,
total_stats.clustering_row,
total_stats.range_tombstone_change));
print(string.format("\tPartition with max number of fragments (%d): %s - %d static rows, %d clustering rows and %d range tombstone changes",
max_partition_stats.total,
max_partition_stats.partition_key,
max_partition_stats.static_row,
max_partition_stats.clustering_row,
max_partition_stats.range_tombstone_change));
end
```
Running this script wilt yield the following:
```
$ scylla sstable script --script-file fragment-stats.lua --system-schema system_schema.columns /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/me-1-big-Data.db
Stats for sstable /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f//me-1-big-Data.db:
397 fragments in 7 partitions - 0 static rows, 362 clustering rows and 28 range tombstone changes
Partition with max number of fragments (180): system - 0 static rows, 179 clustering rows and 0 range tombstone changes
```
Fixes: https://github.com/scylladb/scylladb/issues/9679Closes#11649
* github.com:scylladb/scylladb:
tools/scylla-sstable: consume_reader(): improve pause heuristincs
test/cql-pytest/test_tools.py: add test for scylla-sstable script
tools: add scylla-sstable-scripts directory
tools/scylla-sstable: remove custom operation
tools/scylla-sstable: add script operation
tools/sstable: introduce the Lua sstable consumer
dht/i_partitioner.hh: ring_position_ext: add weight() accessor
lang/lua: export Scylla <-> lua type conversion methods
lang/lua: use correct lib name for string lib
lang/lua: fix type in aligned_used_data (meant to be user_data)
lang/lua: use lua_State* in Scylla type <-> Lua type conversions
tools/sstable_consumer: more consistent method naming
tools/scylla-sstable: extract sstable_consumer interface into own header
tools/json_writer: add accessor to underlying writer
tools/scylla-sstable: fix indentation
tools/scylla-sstable: export mutation_fragment_json_writer declaration
tools/scylla-sstable: mutation_fragment_json_writer un-implement sstable_consumer
tools/scylla-sstable: extract json writing logic from json_dumper
tools/scylla-sstable: extract json_writer into its own header
tools/scylla-sstable: use json_writer::DataKey() to write all keys
tools/scylla-types: fix use-after-free on main lambda captures
The CQL binary protocol version 3 was introduced in 2014. All Scylla
version support it, and Cassandra versions 2.1 and newer.
Versions 1 and 2 have 16-bit collection sizes, while protocol 3 and newer
use 32-bit collection sizes.
Unfortunately, we implemented support for multiple serialization formats
very intrusively, by pushing the format everywhere. This avoids the need
to re-serialize (sometimes) but is quite obnoxious. It's also likely to be
broken, since it's almost untested and it's too easy to write
cql_serialization_format::internal() instead of propagating the client
specified value.
Since protocols 1 and 2 are obsolete for 9 years, just drop them. It's
easy to verify that they are no longer in use on a running system by
examining the `system.clients` table before upgrade.
Fixes#10607Closes#12432
* github.com:scylladb/scylladb:
treewide: drop cql_serialization_format
cql: modification_statement: drop protocol check for LWT
transport: drop cql protocol versions 1 and 2
The consume loop had some heuristics in place to determine whether after
pausing, the consumer wishes to skip just the partition or the remaining
content of the sstable. This heuristics was flawed so replace it with a
non-heuristic method: track the last consumed fragment and look at this
to determine what should be done.
To be the home of example scripts for scylla-sstable. For now only a
README.md is added describing the directory's purpose and with links to
useful resources.
One example script is added in this patch, more will come later.
Loads the script from the specified path, then feeds the mutation
fragment stream to it. For now only Lua scripts are supported for the
simple reason that Lua is easy to write bindings for, it is simple and
lightweight and more importantly we already have Lua included in the
Scylla binary as it is used as the implementation language for UDF/UDA.
We might consider WASM support in the future, but for now we don't have
any language support in WASM available.
The Lua sstable consumer loads a script from the specified path then
feeds the mutation fragment stream to the script via the
sstable_consumer methods, each method of which the script is allowed to
define, effectively overloading the virtual method in Lua.
This allows for very wide and flexible customization opportunities for
what to extract from sstables and how to process and present them,
without the need to recompile the scylla-sstable tool.
So it can be used in code outside scylla-sstable.cc. This source file is
quite large already, and as we have yet another large chunk of code to
add, we want to add it in a separate file.