Commit Graph

82 Commits

Author SHA1 Message Date
Piotr Smaron
5bfabff9a0 cql: adjust warning about tablets
Made it shorter, simpler and mentioned also that counters aren't
supported with tablets.

Fixes: #18876
2024-07-09 18:01:37 +02:00
Marcin Maliszkiewicz
c13fea371c cql3: always return created event in create ks/table/type/view statement
In case multiple clients issue concurrently CREATE KEYSPACE IF NOT EXISTS
and later USE KEYSPACE it can happen that schema in driver's session is
out of sync because it synces when it receives special message from
CREATE KEYSPACE response.

Similar situation occurs with other schema change statements.

In this patch we fix only create keyspace/table/type/view statements
by always sending created event. Behavior of any other schema altering
statements remains unchanged.
2024-06-07 10:36:40 +02:00
Marcin Maliszkiewicz
281c06ba2e cql3: extract create ks/table/type/view event code
So that the code in subsequent commit is cleaner.

Create function/aggregate code was not changed as it
would require bigger refactor.
2024-06-07 10:07:50 +02:00
Marcin Maliszkiewicz
63e6334a64 raft: rename mutations_collector to group0_batch 2024-06-06 13:26:34 +02:00
Marcin Maliszkiewicz
0573fee2a9 cql3: auth: use mutation collector for grant and revoke permissions
This is done to achieve single transaction semantics.

The change includes auto-grant feature. In particular
for schema related auto-grant we don't use normal
mutation collector announce path but follow migration manager,
this may be unified in the future.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
7f5d259b54 cql3: statements: co-routinize auth related statements 2024-05-21 10:37:26 +02:00
Michał Jadwiszczak
7dc0d068c0 cql3/statements: pass query_options to prepare_schema_mutations()
The object is needed to get timestamp from attributes (in a case when
the statement was prepared with parameter marker).
2024-04-25 21:27:40 +02:00
Nadav Har'El
6873a4772f tablets: add warning on CREATE KEYSPACE
The CDC feature is not supported on a table that uses tablets
(Refs #16317), so if a user creates a keyspace with tablets enabled
they may be surprised later (perhaps much later) when they try to enable
CDC on the table and can't.

The LWT feature always had issue Refs #5251, but it has become potentially
more common with tablets.

So it was proposed that as long as we have missing features (like CDC or
LWT), every time a keyspace is created with tablets it should output a
warning (a bona-fide CQL warning, not a log message) that some features
are missing, and if you need them you should consider re-creating the
keyspace without tablets.

This patch does this. It was surprisingly hard and ugly to find a place
in the code that can check the tablet-ness of a keyspace while it is
still being created, but I think I found a reasonable solution.

The warning text in this patch is the following (obviously, it can
be improved later, as we perhaps find more missing features):

   "Tables in this keyspace will be replicated using tablets, and will
    not support the CDC feature (issue #16317) and LWT may suffer from
    issue #5251 more often. If you want to use CDC or LWT, please drop
    this keyspace and re-create it without tablets, by adding AND TABLETS
    = {'enabled': false} to the CREATE KEYSPACE statement."

This patch also includes a test - that checks that this warning is is
indeed generated when a keyspace is created with tablets (either by default
or explicitly), and not generated if the keyspace is created without
tablets.

Obviously, this entire patch - the warning and its test - can be reverted
as soon as we support CDC (and all other features) on tablets.

Fixes #16807

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-02-15 15:51:47 +02:00
Kefu Chai
2dbf044b91 cql3: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16791
2024-01-16 16:43:17 +02:00
Pavel Emelyanov
4dede19e4f cql3: Add feature service to ks_prop_defs::as_ks_metadata()
To call prepare_options() with tablets feature state later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
267770bf0f cql3: Add feature service to get_keyspace_metadata()
To be passed down to ks_prop_defs::as_ks_metadata()

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
562fcf0c19 locator: Keep optional initial_tablets on r.s. params
Now all the callers have it at hands (spoiler: not yet initialized, but
still) so the params can also have it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:02:41 +03:00
Pavel Emelyanov
a943bd927b locator: Call create_replication_strategy() with r.s. params
Previous patch added params to r.s. classes' constructors, but callers
don't construct those directly, instead they use the create_r.s.()
wrapper. This patch adds params to the wrapper too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:54:59 +03:00
Pavel Emelyanov
5e69415387 guardrails: Do not validate initial_tablets as replication factor
When checking replication strategy options the code assumes (and it's
stated in the preceeding code comment) that all options are replication
factors. Nowadays it's no longer so, the initial_tablets one is not
replication factor and should be skipped

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16335
2023-12-09 15:56:41 +02:00
Pavel Emelyanov
f2a99ad30a replica: Move storage options validation to sstables manager
Currently the cql statement .validate() callback is responsible for
checking if the non-local storage options are allowed with the
respective feature. Next patch will need to extend this check to also
validate the details of the provided storage options, but doing it at
cql level doesn't seem correct -- it's "too far" from query processor
down to sstables manager.

Good news is that there's a lower-level validation of the new keyspace,
namely the database::validate_new_keyspace() call. Move the storage
options validation into sstables manager, while at it, reimplement it
as a visitor to facilitate further extentions and plug the new
validation to the aforementioned database::validate_new_keyspace().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 15:24:59 +03:00
Piotr Smaroń
8c464b2ddb guardrails: restrict replication strategy (RS)
Replacing `restrict_replication_simplestrategy` config option with
2 config options: `replication_strategy_{warn,fail}_list`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on replication strategy when creating/altering a keyspace.
The reason to rather replace than extend `restrict_replication_simplestrategy` config
option is that it was not used and we wanted to generalize it.
Only soft guardrail is enabled by default and it is set to SimpleStrategy,
which means that we'll generate a CQL warning whenever replication strategy
is set to SimpleStrategy. For new cloud deployments we'll move
SimpleStrategy from warn to the fail list.
Guardrails violations will be tracked by metrics.

Resolves #5224
Refs #8892 (the replication strategy part, not the RF part)

Closes scylladb/scylladb#15399
2023-10-31 18:34:41 +03:00
Piotr Smaroń
eb46f1bd17 guardrails: restrict replication factor (RF)
Replacing `minimum_keyspace_rf` config option with 4 config options:
`{minimum,maximum}_replication_factor_{warn,fail}_threshold`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on RF when creating/altering a keyspace.
The reason to rather replace than extend `minimum_keyspace_rf` config
option is to be aligned with Cassandra, which did the same, and has the
same parameters' names.
Only min soft limit is enabled by default and it is set to 3, which means
that we'll generate a CQL warning whenever RF is set to either 1 or 2.
RF's value of 0 is always allowed and means that there will not be any
replicas on a given DC. This was agreed with PM.
Because we don't allow to change guardrails' values when scylla is
running (per PM), there're no tests provided with this PR, and dtests will be
provided separately.
Exceeding guardrails' thresholds will be tracked by metrics.

Resolves #8619
Refs #8892 (the RF part, not the replication-strategy part)

Closes #14262
2023-09-04 19:22:17 +03:00
Gleb Natapov
4ffc39d885 cql3: Extend the scope of group0_guard during DDL statement execution
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.

Fixes: #13942

Message-ID: <ZNsynXayKim2XAFr@scylladb.com>
2023-08-17 15:52:48 +03:00
Avi Kivity
d57a951d48 Revert "cql3: Extend the scope of group0_guard during DDL statement execution"
This reverts commit 70b5360a73. It generates
a failure in group0_test .test_concurrent_group0_modifications in debug
mode with about 4% probability.

Fixes #15050
2023-08-15 00:26:45 +03:00
Gleb Natapov
70b5360a73 cql3: Extend the scope of group0_guard during DDL statement execution
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.

Fixes: #13942

Message-ID: <ZNSWF/cHuvcd+g1t@scylladb.com>
2023-08-13 14:19:39 +03:00
Patryk Jędrzejczak
ffc3c1302e cql3: schema_altering_statement::prepare_schema_mutations: remove an unused parameter
After changing the prepare_ methods of migration_manager to
functions, the migration_manager& parameter of
schema_altering_statement::prepare_schema_mutations has been
unused by all classes inheriting from schema_altering_statement.
2023-08-01 10:07:31 +02:00
Patryk Jędrzejczak
3468cbd66b service: migration_manager: change the prepare_ methods to functions
The migration_manager service is responsible for schema convergence
in the cluster - pushing schema changes to other nodes and pulling
schema when a version mismatch is observed. However, there is also
a part of migration_manager that doesn't really belong there -
creating mutations for schema updates. These are the functions with
prepare_ prefix. They don't modify any state and don't exchange any
messages. They only need to read the local database.

We take these functions out of migration_manager and make them
separate functions to reduce the dependency of other modules
(especially query_processor and CQL statements) on
migration_manager. Since all of these functions only need access
to storage_proxy (or even only replica::database), doing such a
refactor is not complicated. We just have to add one parameter,
either storage_proxy or database and both of them are easily
accessible in the places where these functions are called.
2023-07-28 13:55:27 +02:00
Kamil Braun
2606c190af cql3: statements: pass migration_manager& explicitly to prepare_schema_mutations
We want to stop relying on `qp.get_migration_manager()`, so we can make
the function private in the future. This in turn is a prerequisite for
splitting `query_processor` initialization into two phases, where the
first phase will only allow local queries (and won't require
`migration_manager`).
2023-06-15 09:48:54 +02:00
Nadav Har'El
5984db047d Merge 'mv: forbid IS NOT NULL on columns outside the primary key' from Jan Ciołek
statement_restrictions: forbid IS NOT NULL on columns outside the primary key

IS NOT NULL is currently allowed only when creating materialized views.
It's used to convey that the view will not include any rows that would make the view's primary key columns NULL.

Generally materialized views allow to place restrictions on the primary key columns, but restrictions on the regular columns are forbidden. The exception was IS NOT NULL - it was allowed to write regular_col IS NOT NULL. The problem is that this restriction isn't respected, it's just silently ignored (see #10365).

Supporting IS NOT NULL on regular columns seems to be as hard as supporting any other restrictions on regular columns.
It would be a big effort, and there are some reasons why we don't support them.

For now let's forbid such restrictions, it's better to fail than be wrong silently.

Throwing a hard error would be a breaking change.
To avoid breaking existing code the reaction to an invalid IS NOT NULL restrictions is controlled by the `strict_is_not_null_in_views` flag.

This flag can have the following values:
* `true` - strict checking. Having an `IS NOT NULL` restriction on a column that doesn't belong to the view's primary key causes an error to be thrown.
* `warn` - allow invalid `IS NOT NULL` restrictions, but throw a warning. The invalid restrictions are silently ignored.
* `false` - allow invalid `IS NOT NULL` restricitons, without any warnings or errors. The invalid restrictions are silently ignored.

The default values for this flag are `warn` in `db::config` and `true` in scylla.yaml.

This way the existing clusters will have `warn` by default, so they'll get a warning if they try to create such an invalid view.

New clusters with fresh scylla.yaml will have the flag set to `true`, as scylla.yaml overwrites the default value in `db::config`.
New clusters will throw a hard error for invalid views, but in older existing clusters it will just be a warning.
This way we can maintain backwards compatibility, but still move forward by rejecting invalid queries on new clusters.

Fixes: #10365

Closes #13013

* github.com:scylladb/scylladb:
  boost/restriction_test: test the strict_is_not_null_in_views flag
  docs/cql/mv: columns outside of view's primary key can't be restricted
  cql-pytest: enable test_is_not_null_forbidden_in_filter
  statement_restrictions: forbid IS NOT NULL on columns outside the primary key
  schema_altering_statement: return warnings from prepare_schema_mutations()
  db/config: add strict_is_not_null_in_views config option
  statement_restrictions: add get_not_null_columns()
  test: remove invalid IS NOT NULL restrictions from tests
2023-06-07 12:12:19 +03:00
Jan Ciolek
a8cc5ed491 schema_altering_statement: return warnings from prepare_schema_mutations()
Validation of a CREATE MATERIALIZED VIEW statement takes place inside
the prepare_schema_mutations() method.
I would like to generate warnings during this validation, but there's
currently no way to pass them.

Let's add one more return value - a vector of CQL warnings generated
during the execution of this statement.

A new alias is added to make it clear what the function is returning:
```c++
// A vector of CQL warnings generated during execution of a statement.
using cql_warnings_vec = std::vector<sstring>;
```

Later the warnings will be sent to the user by the function
schema_altering_statment::execute(), which is the only caller
of prepare_schema_mutations().

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-06-07 02:30:07 +02:00
Avi Kivity
27f7cc4032 Revert "Merge 'cql: update permissions when creating/altering a function/keyspace' from Wojciech Mitros"
This reverts commit 52e4edfd5e, reversing
changes made to d2d53fc1db. The associated test
fails with about 10% probablity, which blocks other work.

Fixes #13919
Reopens #13747
2023-05-29 23:03:25 +03:00
Wojciech Mitros
1d099644d4 cql: grant permissions on functions when creating a keyspace/function
When a user creates a function, they should have all permissions on
this function.
Similarly, when a user creates a keyspace, they should have all
permissions on functions in the keyspace.
This patch introduces GRANTs on the missing permissions.
2023-05-12 10:56:29 +02:00
Wojciech Mitros
dd20621d71 cql: pass a reference to query processor in grant_permissions_to_creator
In the following patch, the grant_permissions_to_creator method is going
to be also used to grant permissions on a newly created function. The
function resource may contain user-defined types which need the
query processor to be prepared, so we add a reference to it in advance
in this patch for easier review.
2023-05-12 10:56:29 +02:00
Botond Dénes
de402878e4 cql3: s/std::regex/boost::regex/
The former is prone to producing stack-overflow as it uses recursion in
it match implementation.

The migration is entirely mechanical.
2023-04-06 09:50:32 -04:00
Nadav Har'El
cdedc79050 cql: add configurable restriction of minimum RF
We have seen users unintentionally use RF=1 or RF=2 for a keyspace.
We would like to have an option for a minimal RF that is allowed.

Cassandra recently added, in Cassandra 4.1 (see apache/cassandra@5fdadb2
and https://issues.apache.org/jira/browse/CASSANDRA-14557), exactly such
a option, called "minimum_keyspace_rf" - so we chose to use the same option
name in Scylla too. This means that unlike the previous "safe mode"
options, the name of this option doesn't start with "restrict_".

The value of the minimum_keyspace_rf option is a number, and lower
replication factors are rejected with an error like:

  cqlsh> CREATE KEYSPACE x WITH REPLICATION = { 'class' : 'SimpleStrategy',
         'replication_factor': 2 };

  ConfigurationException: Replication factor replication_factor=2 is
  forbidden by the current configuration setting of minimum_keyspace_rf=3.
  Please increase replication factor, or lower minimum_keyspace_rf set in
  the configuration.

This restriction applies to both CREATE KEYSPACE and ALTER KEYSPACE
operations. It applies to both SimpleStrategy and NetworkTopologyStrategy,
for all DCs or a specific DC. However, a replication factor of zero (0)
is *not* forbidden - this is the way to explicitly request not to
replicate (at all, or in a specific DC).

For the time being, minimum_keyspace_rf=0 is still the default, which
means that any replication factor is allowed, as before. We can easily
change this default in a followup patch.

Note that in the current implementation, trying to use RF below
minimum_keyspace_rf is always an error - we don't have a syntax
to make into just a warning. In any case the error message explains
exactly which configuration option is responsible for this restriction.

Fixes #8891.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #9830
2023-03-07 19:04:06 +02:00
Kefu Chai
0cb842797a treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:57:18 +02:00
Avi Kivity
c5e4bf51bd Introduce mutation/ module
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.

mutation_reader remains in the readers/ module.

mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.

This is a step forward towards librarization or modularization of the
source base.

Closes #12788
2023-02-14 11:19:03 +02:00
Avi Kivity
5937b1fa23 treewide: remove empty comments in top-of-files
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.

Closes #10562
2022-05-13 07:11:58 +02:00
Avi Kivity
19ab3edd77 gms: feature_service: remove variable/helper function duplication
Each feature has a private variable and a public accessor. Since the
accessor effectively makes the variable public, avoid the intermediary
and make the variable public directly.

To ease mechanical translation, the variable name is chosen as
the function name (without the cluster_supports_ prefix).

References throughout the codebase are adjusted.
2022-05-04 18:59:56 +03:00
Piotr Sarna
58529591a9 database,cql3: add STORAGE option to keyspaces
The STORAGE option is designed to hold a map of options
used for customizing storage for given keyspace.
The option is kept in a system_schema.scylla_keyspaces table.
The option is only available if the whole cluster is aware
of it - guarded by a cluster feature.

Example of the table contents:
```
cassandra@cqlsh> select * from system_schema.scylla_keyspaces;

 keyspace_name | storage_options                                | storage_type
---------------+------------------------------------------------+--------------
           ksx | {'bucket': '/tmp/xx', 'endpoint': 'localhost'} |           S3
```
2022-04-08 09:17:01 +02:00
Kamil Braun
283ac7fefe treewide: pass mutation timestamp from call sites into migration_manager::prepare_* functions
The functions which prepare schema change mutations (such as
`prepare_new_column_family_announcement`) would use internally
generated timestamps for these mutations. When schema changes are
managed by group 0 we want to ensure that timestamps of mutations
applied through Raft are monotonic. We will generate these timestamps at
call sites and pass them into the `prepare_` functions. This commit
prepares the APIs.
2022-01-24 15:12:50 +01:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Gleb Natapov
459539e812 migration_manager: do not allow creating keyspace with arbitrary timestamp
This was needed to fix issue #2129 which was only manifest itself with
auto_bootstrap set to false. The option is ignored now and we always
wait for schema to synch during boot.
2022-01-12 16:33:15 +02:00
Pavel Emelyanov
d32de22ee8 cql3: Get data dictionary directly from query_processor
After previous patches there's a whole bunch of places that do

  qp.proxy().data_dictionary()

while the data_dictionary is present on the query processor itself
and there's a public method to get one. So use it everywhere.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-23 11:28:44 +03:00
Pavel Emelyanov
ec101e8b56 create_keyspace_statement: Do not use proxy.shared_from_this()
The prepare_schema_mutations is not sleeping method, so there's no
point in getting call-local shared pointer on proxy. Plain reference
is more than enough.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-23 11:28:44 +03:00
Pavel Emelyanov
2ca8a580d9 create_|alter_keyspace_statement: Make check_restricted_replication_strategy() accept query_processor
Patch the check_restricted_replication_strategy() and its callers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-23 10:54:28 +03:00
Pavel Emelyanov
b990ca5550 cql3: Make .validate() and .check_access() accept query_processor
This is mostly a sed script that replaces methods' first argument
plus fixes of compiler-generated errors.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-23 10:53:44 +03:00
Avi Kivity
d768e9fac5 cql3, related: switch to data_dictionary
Stop using database (and including database.hh) for schema related
purposes and use data_dictionary instead.

data_dictionary::database::real_database() is called from several
places, for these reasons:

 - calling yet-to-be-converted code
 - callers with a legitimate need to access data (e.g. system_keyspace)
   but with the ::database accessor removed from query_processor.
   We'll need to find another way to supply system_keyspace with
   data access.
 - to gain access to the wasm engine for testing whether used
   defined functions compile. We'll have to find another way to
   do this as well.

The change is a straightforward replacement. One case in
modification_statement had to change a capture, but everything else
was just a search-and-replace.

Some files that lost "database.hh" gained "mutation.hh", which they
previously had access to through "database.hh".
2021-12-15 13:54:23 +02:00
Avi Kivity
3945acaa2d data_dictionary: move keyspace_metadata to data_dictionary
Like user_types_metadata, keyspace_metadata does not grant
data access, just metadata, and so belongs in data_dictionary.
2021-12-15 13:52:21 +02:00
Gleb Natapov
730171f4df cql3: drop schema_altering_statement::announce_migration()
It is no longer used.
2021-12-11 12:31:07 +02:00
Gleb Natapov
456d2e28c3 cql3: move CREATE KEYSPACE statement to prepare_schema_mutations() api 2021-12-11 12:31:07 +02:00
Benny Halevy
e4dc81ec04 abstract_replication_strategy: add to_qualified_class_name
And use it from cql3 check_restricted_replication_strategy and
keyspace_metadata ctor that defined their own `replication_class_strategy`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-18 12:13:25 +03:00
Benny Halevy
8c85197c6c abstract_replication_strategy: get rid of shared_token_metadata member and ctor param
It is not used any more.

Methods either use the token_metadata_ptr in the
effective_replication_map, or receive an ad-hoc
token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Avi Kivity
219fdcd8da Merge 'tools: introduce scylla-sstable' from Botond Dénes
A tool which can be used to examine the content of sstable(s) and
execute various operations on them. The currently supported operations
are:
* dump - dumps the content of the sstable(s), similar to sstabledump;
* dump-index - dumps the content of the sstable index(es), similar to scylla-sstable-index;
* writetime-histogram - generates a histogram of all the timestamps in
  the sstable(s);
* custom - a hackable operation for the expert user (until scripting
  support is implemented);
* validate - validate the content of the sstable(s) with the mutation
  fragment stream validator, same as scrub in validate mode;

The sstables to-be-examined are passed as positional command line
arguments. Sstables will be processed by the selected operation
one-by-one (can be changed with `--merge`). Any number of sstables can
be passed but mind the open file limits. Pass the full path to the data
component of the sstables (*-Data.db). For now it is required that the
sstable is found at a valid data path:

    /path/to/datadir/{keyspace_name}/{table_name}-{table_id}/

The schema to read the sstables is read from a `schema.cql` file. This
should contain the keyspace and table definitions, as well as any UDTs
used.
Filtering the sstable(s) to process only certain partition(s) is supported
via the `--partition` and `--partitions-file` command line flags.
Partition keys are expected to be in the hexdump format used by scylla
(hex representation of the raw buffer).
Operations write their output to stdout, or file(s). The tool logs to
stderr, with a logger called `scylla-sstable-crawler`.

Examples:

    # dump the content of the sstable
    $ scylla-sstable-crawler --dump /path/to/md-123456-big-Data.db

    # dump the content of the two sstable(s) as a unified stream
    $ scylla-sstable-crawler --dump --merge /path/to/md-123456-big-Data.db /path/to/md-123457-big-Data.db

    # generate a joint histogram for the specified partition
    $ scylla-sstable-crawler --writetime-histogram --partition={{myhexpartitionkey}} /path/to/md-123456-big-Data.db

    # validate the specified sstables
    $ scylla-sstable-crawler --validate /path/to/md-123456-big-Data.db /path/to/md-123457-big-Data.db

Future plans:
* JSON output for dump.
* A simple way of generating `schema.cql` for any schema, other than copying it from snapshots, or copying from `cqlsh`. None of these generate a complete output.
* Relax sstable path checks, so sstables can be loaded from any path.
* Add scripting support (Lua), allowing custom operations to be written
  in a scripting language.

Refs: #9241

Closes #9271

* github.com:scylladb/scylla:
  tools: remove scylla-sstable-index
  tools: introduce scylla-sstable
  tools: extract finding selected operation (handler) into function
  tools: add schema_loader
  cql3: query_processor: add parse_statements()
  cql3: statements/create_type: expose create_type()
  cql3: statements/create_keyspace: add get_keyspace_metadata()
2021-09-09 19:24:06 +03:00
Botond Dénes
6b224b76b9 cql3: statements/create_keyspace: add get_keyspace_metadata() 2021-09-07 10:37:25 +03:00