Commit Graph

3006 Commits

Author SHA1 Message Date
Pavel Emelyanov
9628d07adb Put storage_service.hh on a diet
By removing unneeded headers inclusions. At the cost of few more forward
declarations and a couple of extra includes in other .cc files.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13552
2023-04-18 14:53:17 +03:00
Tomasz Grabiec
a8f8f9f0ea Merge 'raft topology: store shard_count and ignore_msb in topology' from Kamil Braun
Add new columns to the `system.topology` table: `shard_count` and `ignore_msb`. When a node bootstraps or restarts and observes that the values stored in `topology` are different than the local values, it updates them. This is done in the `update_topology_with_local_metadata` function (the 'metadata' here being the two values).

Additional flag persisted in `system.scylla_local` is used to safely avoid performing read barriers when the values didn't change on node restart. A comment in `update_topology_with_local_metadata` explains why this flag is needed.

An example use case where `shard_count` and `ignore_msb` are needed is creating CDC generations.

Fixes: #13508

Closes #13521

* github.com:scylladb/scylladb:
  raft topology: update `release_version` in topology on restart
  raft topology: store `shard_count` and `ignore_msb` in topology
2023-04-18 01:18:50 +02:00
Kamil Braun
f9051dccaa raft topology: store shard_count and ignore_msb in topology
Add new columns to the `system.topology` table: `shard_count` and
`ignore_msb`. When a node bootstraps or restarts and observes that the
values stored in `topology` are different than the local values, it
updates them. This is done in the `update_topology_with_local_metadata`
function (the 'metadata' here being the two values).

Additional flag persisted in `system.scylla_local` is used to safely
avoid performing read barriers when the values didn't change on node
restart. A comment in `update_topology_with_local_metadata` explains why
this flag is needed.

An example use case where `shard_count` and `ignore_msb` are needed is
creating CDC generations.

Fixes: #13508
2023-04-17 10:45:30 +02:00
Botond Dénes
4c37dc5507 Merge 'keys: specialize fmt::formatter<partition_key> and friends' from Kefu Chai
this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print following classes without the help of `operator<<`.

- partition_key_view
- partition_key
- partition_key::with_schema_wrapper
- key_with_schema
- clustering_key_prefix
- clustering_key_prefix::with_schema_wrapper

the corresponding `operator<<()` are dropped dropped in this change,
as all its callers are now using fmtlib for formatting now. the helper
of `print_key()` is removed, as its only caller is
`operator<<(std::ostream&, const
clustering_key_prefix::with_schema_wrapper&)`.

the reason why all these operators are replaced in one go is that
we have a template function of `key_to_str()` in `db/large_data_handler.cc`.
this template function is actually the caller of operator<< of
`partition_key::with_schema_wrapper` and
`clustering_key_prefix::with_schema_wrapper`.
so, in order to drop either of these two operator<<, we need to remove
both of them, so that we can switch over to `fmt::to_string()` in this
template function.

Refs scylladb#13245

Closes #13513

* github.com:scylladb/scylladb:
  keys: consolidate the formatter for partition_keys
  keys: specialize fmt::formatter<partition_key> and friends
2023-04-17 10:27:31 +03:00
Tomasz Grabiec
952b455310 Merge ' tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes
scylla-sstable currently has two ways to obtain the schema:

    * via a `schema.cql` file.
    * load schema definition from memory (only works for system tables).

This meant that for most cases it was necessary to export the schema into a CQL format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates form. Yet in many cases the sstable is inspected on the same host where it originates from. In this cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool check the usual locations of the scylla data dir, the scylla confguration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a schema.cql is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.

This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tools is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.

Example:

```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```

As seen above, subdirectories like qurantine, staging etc are also supported.

Fixes: https://github.com/scylladb/scylladb/issues/10126

Closes #13448

* github.com:scylladb/scylladb:
  test/cql-pytest: test_tools.py: add tests for schema loading
  test/cql-pytest: add no_autocompaction_context
  docs: scylla-sstable.rst: remove accidentally added copy-pasta
  docs: scylla-sstable.rst: remove paragraph with schema limitations
  docs: scylla-sstable.rst: update schema section
  test/cql-pytest: nodetool.py: add flush_keyspace()
  tools/scylla-sstable: reform schema loading mechanism
  tools/schema_loader: add load_schema_from_schema_tables()
  db/schema_tables: expose types schema
2023-04-14 16:46:26 +02:00
Kefu Chai
3738fcbe05 keys: specialize fmt::formatter<partition_key> and friends
this is a part of a series to migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print following classes without the help of `operator<<`.

- partition_key_view
- partition_key
- partition_key::with_schema_wrapper
- key_with_schema
- clustering_key_prefix
- clustering_key_prefix::with_schema_wrapper

the corresponding `operator<<()` are dropped dropped in this change,
as all its callers are now using fmtlib for formatting now. the helper
of `print_key()` is removed, as its only caller is
`operator<<(std::ostream&, const
clustering_key_prefix::with_schema_wrapper&)`.

the reason why all these operators are replaced in one go is that
we have a template function of `key_to_str()` in `db/large_data_handler.cc`.
this template function is actually the caller of operator<< of
`partition_key::with_schema_wrapper` and
`clustering_key_prefix::with_schema_wrapper`.
so, in order to drop either of these two operator<<, we need to remove
both of them, so that we can switch over to `fmt::to_string()` in this
template function.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-14 13:21:30 +08:00
Pavel Emelyanov
097cea11b2 view: Remove unused view_ptr reference
After previous patch the value_getter::_view becomes unused and can be
dropped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-13 16:51:27 +03:00
Pavel Emelyanov
821c8b19a6 view: Carry backing-secondary-index bit via view builder
When view builder constructs it populates itself with view updates.
Later the updates may instantiate the value_getter-s which, in turn,
would need to check if the view is backing secondary index.

Good news is that when view builder constructs it has all the
information at hand needed to evaluate this "backing" bit. It's then
propagated down to value_getter via corresponding view_updates.

The getter's _view field becomes unused after this change and is
(void)-ed to make this patch compile.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-13 16:48:36 +03:00
Pavel Emelyanov
e8b5022343 view: Keep backing-seconday-index bool on value_getter
The getter needs to check if the view is backing a secondary index.
Currentl it's done inside the handle_computed_column() method, but it's
more convenient if this bit is known during construction, so move it
there. There are no places that can change this property between
view_getter is created and the method in question is called.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-13 16:45:59 +03:00
Botond Dénes
5d0c0ae0c4 Merge 'token_metadata: use topology nodes for endpoint_to_host_id map' from Benny Halevy
Currently, token_metadata_impl maintains a "shadow" endpoint to host_id map on top of the maps in topology.
This series first reimplements the functions that currently use this map to use topology instead.
Then the important users of `get_endpoint_to_host_id_map_for_reading`: node_ops_ctl and view_builder
and converted to use a new `topology::for_each_node` function to process all nodes in topology directly, without going through `get_endpoint_to_host_id_map_for_reading`.

Closes #13476

* github.com:scylladb/scylladb:
  view_builder: view_build_statuses: use topology::for_each_node
  storage_service: node_ops_ctl: refresh_sync_nodes: use topology::for_each_node
  topology: add for_each_node
  token_metadata: get endpoint to node map from topology
2023-04-12 10:33:02 +03:00
Botond Dénes
63b266a988 db/schema_tables: expose types schema 2023-04-12 02:43:53 -04:00
Benny Halevy
535b71eba3 view_builder: view_build_statuses: use topology::for_each_node
Instead of tmptr->get_endpoint_to_host_id_map_for_reading.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-11 18:14:51 +03:00
Benny Halevy
e635aa30d6 token_metadata: get endpoint to node map from topology
Don't maintain a "shadow" endpoint_to_host_id_map in token_metadata_impl.

Instead, get the nodes_by_endpoint map from topology
and use it to build the endpoint_to_host_id_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-11 15:48:30 +03:00
Botond Dénes
a8e59d9fb2 Merge 'Metrics relabel from file' from Amnon Heiman
This series adds an option to read the relabel config from file.

Most of Scylla's metrics are reported per-shard, some times they are also reported per scheduling
groups or per tables.  With modern hardware, this can quickly grow to a large number of metrics that
overload Scylla and the  collecting server.

One of the main issues around metrics reduction is that many of the metrics are only
helpful in certain situations.

For example, Scylla monitoring only looks at a subset of the metrics. So in large deployments
it would be helpful to scrap only those.

An option to do that, would be to mark all dashboards related metrics with a label value, and then Prometheus
will request only metrics with that label value.

There are two main limitations to scrap by label values:
1. some of the metrics we want to report are in seastar, so we'll need to label them somehow (we cannot just add random labels to seastar metrics)
2. things change, new metrics are introduce and we may want them, it's not practicall to re-compile and wait
for a new release whenever we want to change a label just for monitoring.

It will be best to have the option to add metrics freely and choose at runtime what to report.

This series make use of Seastar API to perform metrics manipulation dynamically. It includes adding, removing, and changing labels and also enable and disable metrics, and enable and disable the skip_when_empty option.

After this series the configuration could be used with:
```--relabel-config-file conf.yaml```

The general logic and format follows Prometheus metrics_relabel_config configuration.

Where the configuration file looks like:
```
$ cat conf.yaml
relabel_configs:
  - source_labels: [shard]
    action: drop
    target_label: shard
    regex: (2)
  - source_labels: [shard]
    action: replace
    target_label: level
    replacement: $1
    regex: (.*3)

```

Closes #12687

* github.com:scylladb/scylladb:
  main: Load metrics relabel config from a file if it exists
  Add relabel from file support.
2023-04-11 12:47:09 +03:00
Botond Dénes
dba1d36aa6 Merge 'alternator: fix isolation of concurrent modifications to tags' from Nadav Har'El
Alternator's implementation of TagResource, UntagResource and UpdateTimeToLive (the latter uses tags to store the TTL configuration) was unsafe for concurrent modifications - some of these modifications may be lost. This short series fixes the bug, and also adds (in the last patch) a test that reproduces the bug and verifies that it's fixed.

The cause of the incorrect isolation was that we separately read the old tags and wrote the modified tags. In this series we introduce a new function, `modify_tags()` which can do both under one lock, so concurrent tag operations are serialized and therefore isolated as expected.

Fixes #6389.

Closes #13150

* github.com:scylladb/scylladb:
  test/alternator: test concurrent TagResource / UntagResource
  db/tags: drop unsafe update_tags() utility function
  alternator: isolate concurrent modification to tags
  db/tags: add safe modify_tags() utility functions
  migration_manager: expose access to storage_proxy
2023-04-11 11:17:23 +03:00
Botond Dénes
05b381bfa2 Merge 'Simple S3 storage for sstables' from Pavel Emelyanov
The PR adds sstables storage backend that keeps all component files as S3 objects and system.sstables_registry ownership table that keeps track of what sstables objects belong to local node and their names.

When a keyspace is configured with 'STORAGE = { 'type': 'S3' }' the respective class table object eventually gets the storage_options instance pointing to the target S3 endpoint and bucket. All the sstables created for that table attach the S3 storage implementation that maintains components' files as S3 objects. Writing to and reading from components is handled by the S3 client facilities from utils/. Changing the sstable state, which is -- moving between normal, staging and quarantine states -- is not yet implemented, but would eventually happen by updating entries in the sstables registry.

To keep track of which node owns which objects, to provide bucket-wide uniqueness of object names and to maintain sstable state the storage driver keeps records in the system.sstables_registry ownership table. The table maps sstable location and generation to the object format, version, status-state (*) and (!) unique identifier (some time soon this identifier is supposed to be replaced with UUID sstables generations). The component object name is thus s3://bucket/uuid/component_basename. The registry is also used on boot. The distributed loader picks up sstables from all the tables found in schema and for S3-backed keyspaces it lists entries in the registry to a) identify those and b) get their unique S3-side identifiers to open by name.

(*) About sstable's status and state.

The state field is the part of today's sstable path on disk -- staging, quarantine, normal (root table data dir), etc. Since S3 doesn't have the renaming facility, moving sstable between those states is only possible by updating the entry in the registry. This is not yet implemented in this set (#13017)

The status field tracks sstable' transition through its creation-deletion. It first starts with 'creating' status which corresponds to the today's TemporaryTOC file. After being created and written to the sstable moves into 'sealed' state which corresponds to the today's normal sstable being with the TOC file. To delete sstable atomically it first moves into 'removing' state which is equivalent to being in the deletion-log for the on-disk sstable. Once removed from the bucket, the entry is removed from the registry.

To play with:

1. Start minio (installed by install-dependencies.sh)
```
export MINIO_ROOT_USER=${root_user}
export MINIO_ROOT_PASSWORD=${root_pass}
mkdir -p ${root_directory}
minio server ${root_directory}
```

2. Configure minio CLI, create anonymous bucket
```
mc config host rm local
mc config host add local http://127.0.0.1:9000 ${root_user} ${root_pass}
mc mb local/sstables
mc anonymous set public local/sstables
```

3. Start Scylla with object-storage feature enabled
``` scylla ... --experimental-features=keyspace-storage-options --workdir ${as_usual}```

4. Create KS with S3 storage
``` create keyspace ... storage = { 'type': 'S3', 'endpoint': '127.0.0.1:9000', 'bucket': 'sstables' };```

The S3 client has a logger named "s3", it's useful to use on with `trace` verbosity.

Closes #12523

* github.com:scylladb/scylladb:
  test: Add object-storage test
  distributed_loader: Print storage type when populating
  sstable_directory: Add ownership table components lister
  sstable_directory: Make components_lister and API
  sstable_directory: Create components lister based on storage options
  sstables: Add S3 storage implementation
  system_keyspace: Add ownership table
  system_keyspace: Plug to user sstables manager too
  sstable: Make storage instance based on storage options
  sstable_directory: Keep storage_options aboard
  sstable: Virtualize the helper that gets on-disk stats for sstable
  sstable, storage: Virtualize data sink making for small components
  sstable, storage: Virtualize data sink making for Data and Index
  sstable/writer: Shuffle writer::init_file_writers()
  sstable: Make storage an API
  utils: Add S3 readable file impl for random reads
  utils: Add S3 data sink for multipart upload
  utils: Add S3 client with basic ops
  cql-pytest: Add option to run scylla over stable directory
  test.py: Equip it with minio server
  sstables: Detach write_toc() helper
2023-04-11 08:17:25 +03:00
Pavel Emelyanov
08e9046d07 system_keyspace: Add ownership table
The schema is

CREATE TABLE system.sstables (
    location text,
    generation bigint,
    format text,
    status text,
    uuid uuid,
    version text,
    PRIMARY KEY (location, generation)
)

A sample entry looks like:

 location                                                            | generation | format | status | uuid                                 | version
---------------------------------------------------------------------+------------+--------+--------+--------------------------------------+---------
 /data/object_storage_ks/test_table-d096a1e0ad3811ed85b539b6b0998182 |          2 |    big | sealed | d0a743b0-ad38-11ed-85b5-39b6b0998182 |      me

The uuid field points to the "folder" on the storage where the sstable
components are. Like this:

s3
`- test_bucket
   `- f7548f00-a64d-11ed-865a-0c1fbc116bb3
      `- Data.db
       - Index.db
       - Filter.db
       - ...

It's not very nice that the whole /var/lib/... path is in fact used as
location, it needs the PR #12707 to fix this place.

Also, the "status" part is not yet fully functional, it only supports
three options:

- creating -- the same as TemporaryTOC file exists on disk
- sealed -- default state
- deleting -- the analogy for the deletion log on disk

The latter needs support from the distributed_loader, which's not yet
there. In fact, distributes_loader also needs to be patched to actualy
select entries from this table on load. Also it needs the mentioned
PR #12707 to support staging and quarantine sstables.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:44:28 +03:00
Benny Halevy
cc42f00232 view: view_builder: start: demote sleep_aborted log error
This is not really an error, so print it in debug log_level
rather than error log_level.

Fixes #13374

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13462
2023-04-09 22:49:06 +03:00
Nadav Har'El
d26bb8c12d Merge 'tree: migrate from std::regex to boost::regex' from Botond Dénes
Except for where usage of `std::regex` is required by 3rd party library interfaces.
As demonstrated countless times, std::regex's practice of using recursion for pattern matching can result in stack overflow, especially on AARCH64. The most recent incident happened after merging https://github.com/scylladb/scylladb/pull/13075, which (indirectly) uses `sstables::make_entry_descriptor()` to test whether a certain path is a valid scylla table path in a trial-and-error manner. This resulted in stacks blowing up in AARCH64.
To prevent this, use the already tried and tested method of switching from `std::regex` to `boost::regex`. Don't wait until each of the `std::regex` sites explode, replace them all preemptively.

Refs: https://github.com/scylladb/scylladb/issues/13404

Closes #13452

* github.com:scylladb/scylladb:
  test: s/std::regex/boost::regex/
  utils: s/std::regex/boost::regex/
  db/commitlog: s/std::regex/boost::regex/
  types: s/std::regex/boost::regex/
  index: s/std::regex/boost::regex/
  duration.cc: s/std::regex/boost::regex/
  cql3: s/std::regex/boost::regex/
  thrift: s/std::regex/boost::regex/
  sstables: use s/std::regex/boost::regex/
2023-04-09 18:47:41 +03:00
Amnon Heiman
990545f616 Add relabel from file support.
This patch adds a configuration with an optional file name for
relabeling metrics.  It also adds a function that accepts a file name
and loads the relabel config from a file.

An example for such a file:
```
$cat conf.yml
relabel_configs:
  - source_labels: [shard]
    action: drop
    target_label: shard
    regex: (2)
  - source_labels: [shard]
    action: replace
    target_label: level
    replacement: $1
    regex: (.*3)
```

update_relabel_config_from_file throws an exception on failure, it's up
to the caller to decide what to do in such cases.
2023-04-09 09:10:02 +03:00
Botond Dénes
52e66e38e7 db/commitlog: s/std::regex/boost::regex/
The former is prone to producing stack-overflow as it uses recursion in
it match implementation.

The migration is entirely mechanical.
2023-04-06 09:51:24 -04:00
Botond Dénes
c65bd01174 Merge 'Debloat system_keyspace.hh (and a bit of .cc)' from Pavel Emelyanov
The system_keyspace.hh now includes raft stuff, topology changes stuff, task_manager stuff, etc. It's going to include tablets.hh (but maybe not). Anything that deals with system keyspace, and includes system_keyspace.hh, would transitively pull these too. This header is becoming a central hub for all the features.

This PR removes all the headers from system_keyspace.hh that correspond to other "subsystems" keeping only generic mutations/querying and seastar ones.

Closes #13450

* github.com:scylladb/scylladb:
  system_keyspace.hh: Remove unneeded headers
  system_keyspace: Move topology_mutation_builder to storage_service
  system_keyspace: Move group0_upgrade_state conversions to group0 code
2023-04-06 16:39:20 +03:00
Botond Dénes
0a46a574e6 Merge 'Topology: introduce nodes' from Benny Halevy
As a first step towards using host_id to identify nodes instead of ip addresses
this series introduces a node abstraction, kept in topology,
indexed by both host_id and endpoint.

The revised interface also allows callers to handle cases where nodes
are not found in the topology more gracefully by introducing `find_node()` functions
that look up nodes by host_id or inet_address and also get a `must_exist` parameter
that, if false (the default parameter value) would return nullptr if the node is not found.
If true, `find_node` throws an internal error, since this indicates a violation of an internal
assumption that the node must exist in the topology.

Callers that may handle missing nodes, should use the more permissive flavor
and handle the !find_node() case gracefully.

Closes #11987

* github.com:scylladb/scylladb:
  topology: add node state
  topology: remove dead code
  locator: add class node
  topology: rename update_endpoint to add_or_update_endpoint
  topology: define get_{rack,datacenter} inline
  shared_token_metadata: mutate_token_metadata: replicate to all shards
  locator: endpoint_dc_rack: refactor default_location
  locator: endpoint_dc_rack: define default operator==
  test: storage_proxy_test: provide valid endpoint_dc_rack
2023-04-06 13:47:22 +03:00
Pavel Emelyanov
18333b4225 system_keyspace.hh: Remove unneeded headers
Now this header can replace lots of used types with plain forward
declarations

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-06 12:37:00 +03:00
Pavel Emelyanov
1af373cf0a system_keyspace: Move topology_mutation_builder to storage_service
The latter is the only user of the class. This keeps system keyspace
code free from unrelated logic and from raft::server_id type.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-06 12:36:02 +03:00
Pavel Emelyanov
45de375126 system_keyspace: Move group0_upgrade_state conversions to group0 code
In order to keep system keyspace free from group0 logic and from the
service::group0_upgrade_state type

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-06 12:35:07 +03:00
Nadav Har'El
aeabfcb93f Merge 'Revert scylla sstable schema improvements' from Botond Dénes
This PR reverts the scylla sstable schema loading improvements as they fail in CI every other run. I am already working on fixes for these but I am not sure I understand all the failures so it is best to revert and re-post the series later.

Fixes: #13404
Fixes: #13410

Closes #13419

* github.com:scylladb/scylladb:
  Revert "Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes"
  Revert "tools/schema_loader: don't require results from optional schema tables"
2023-04-04 18:22:14 +03:00
Botond Dénes
54c0a387a2 Revert "Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes"
This reverts commit 32fff17e19, reversing
changes made to 164afe14ad.

This series proved to be problematic, the new test introduced by it
failing quite often. Revert it until the problems are tracked down and
fixed.
2023-04-03 13:54:00 +03:00
Marcin Maliszkiewicz
99f8d7dcbe db: view: use deferred_close for closing staging_sstable_reader
When consume_in_thread throws the reader should still be closed.

Related https://github.com/scylladb/scylla-enterprise/issues/2661

Closes #13398
Refs: scylladb/scylla-enterprise#2661
Fixes: #13413
2023-04-03 09:02:55 +03:00
Botond Dénes
36e53d571c Merge 'Treewide use-after-move bug fixes' from Raphael "Raph" Carvalho
That's courtersy of 153813d3b8, which annotates Seastar smart pointer classes with Clang's consumed attributes, to help Clang to statically spot use-after-move bugs.

Closes #13386

* github.com:scylladb/scylladb:
  replica: Fix use-after-move in table::make_streaming_reader
  index/built_indexes_virtual_reader.hh: Fix use-after-move
  db/view/build_progress_virtual_reader: Fix use-after-move
  sstables: Fix use-after-move when making reader in reverse mode
2023-04-03 06:57:54 +03:00
Benny Halevy
f3d5df5448 locator: add class node
And keep per node information (idx, host_id, endpoint, dc_rack, is_pending)
in node objects, indexed by topology on several indices like:
idx, host_id, endpoint, current/pending, per dc, per dc/rack.

The node index is a shorthand identifier for the node.

node* and index are valid while the respective topology instance is valid.
To be used, the caller must hold on to the topology / token_metadata object
(e.g. via a token_metadata_ptr or effective_replication_map)

Refs #6403

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

topology: add node idx

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:13:02 +03:00
Raphael S. Carvalho
1ecba373d6 db/view/build_progress_virtual_reader: Fix use-after-move
use-after-free in ctor, which potentially leads to a failure
when locating table from moved schema object.

static report
In file included from db/system_keyspace.cc:51:
./db/view/build_progress_virtual_reader.hh:202:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed]
                _db.find_column_family(s->ks_name(), system_keyspace::v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS),

Fixes #13395.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-31 08:40:30 -03:00
Tomasz Grabiec
4d6443e030 Merge 'Schema commitlog separate dir' from Gusev Petr
The commitlog api originally implied that the commitlog_directory would contain files from a single commitlog instance. This is checked in segment_manager::list_descriptors, if it encounters a file with an unknown prefix, an exception occurs in `commitlog::descriptor::descriptor`, which is logged with the `WARN` level.

A new schema commitlog was added recently, which shares the filesystem directory with the main commitlog. This causes warnings to be emitted on each boot. This patch solves the warnings problem by moving the schema commitlog to a separate directory. In addition, the user can employ the new `schema_commitlog_directory` parameter to move the schema commitlog to another disk drive.

This is expected to be released in 5.3.
As #13134 (raft tables->schema commitlog) is also scheduled for 5.3, and it already requires a clean rolling restart (no cl segments to replay), we don't need to specifically handle upgrade here.

Fixes: #11867

Closes #13263

* github.com:scylladb/scylladb:
  commitlog: use separate directory for schema commitlog
  schema commitlog: fix commitlog_total_space_in_mb initialization
2023-03-30 23:48:58 +02:00
Petr Gusev
0152c000bb commitlog: use separate directory for schema commitlog
The commitlog api originally implied that
the commitlog_directory would contain files
from a single commitlog instance. This is
checked in segment_manager::list_descriptors,
if it encounters a file with an unknown
prefix, an exception occurs in
commitlog::descriptor::descriptor, which is
logged with the WARN level.

A new schema commitlog was added recently,
which shares the filesystem directory with
the main commitlog. This causes warnings
to be emitted on each boot. This patch
solves the warnings problem by moving
the schema commitlog to a separate directory.
In addition, the user can employ the new
schema_commitlog_directory parameter to move
the schema commitlog to another disk drive.

By default, the schema commitlog directory is
nested in the commitlog_directory. This can help
avoid problems during an upgrade if the
commitlog_directory in the custom scylla.yaml
is located on a separate disk partition.

This is expected to be released in 5.3.
As #13134 (raft tables->schema commitlog)
is also scheduled for 5.3, and it already
requires a clean rolling restart (no cl
segments to replay), we don't need to
specifically handle upgrade here.

Fixes: #11867
2023-03-30 21:55:50 +04:00
Pavel Emelyanov
92318fdeae Merge 'Initialize Wasm together with query_processor' from Wojciech Mitros
The wasm engine is moved from replica::database to the query_processor.
The wasm instance cache and compilation thread runner were already there,
but now they're also initialized in the query_processor constructor.

By moving the initialization to the constructor, we can now
be certain that all wasm-related objects (wasm instance cache,
compilation thread runner, and wasm engine, which was already
passed in the constructor) are initialized when we try to use
them because we have to use the query processor to access them
anyway.

The change is also motivated by the fact that we're planning
to take Wasm UDFs out of experimental, after which they should
stop getting special treatment.

Closes #13311

* github.com:scylladb/scylladb:
  wasm: move wasm initialization to query_processor constructor
  wasm: return wasm instance cache as a reference instead of a pointer
  wasm: move wasm engine to query_processor
2023-03-30 14:30:23 +03:00
Nadav Har'El
59ab9aac44 Merge 'functions: reframe aggregate functions in terms of scalar functions' from Avi Kivity
Currently, aggregate functions are implemented in a statefull manner.
The accumulator is stored internally in an aggregate_function::aggregate,
requiring each query to instantiate new instances (see
aggregate_function_selector's constructor, and note how it's called
from selector::new_instance()).

This makes aggregates hard to use in expressions, since expressions
are stateless (with state only provided to evaluate()). To facilitate
migration towards stateless expressions, we define a
stateless_aggregate_function (modeled after user-defined aggregates,
which are already stateless). This new struct defines the aggregate
in terms of three scalar functions: one to aggregate a new input into
an accumulator (provided in the first parameter), one to finalize an
accumulator into a result, and one to reduce two accumulators for
parallelized aggregation.

All existing native aggregate functions are converted to the new model, and
the old interface is removed. This series does not yet convert selectors to
expressions, but it does remove one of the obstacles.

Performance evaluation: I created a table with a million ints on a single-node cluster, and ran the avg() function on them. I measured the number of instructions executed with `perf stat -p $(pgrep scylla) -e instructions` while the query was running. The query executed from cache, memtables were flushed beforehand. The instruction count per row increased from roughly 49k to roughly 52k, indicating 3k extra instructions per row. While 3k instructions to execute a function is huge, it is currently dwarfed by other overhead (and will be even less important in a cluster where it CL>1 will cause non-coordinator code to run multiple times).

Closes #13105

* github.com:scylladb/scylladb:
  cql3/selection, forward_service: use use stateless_aggregate_function directly
  db: functions: fold stateless_aggregate_function_adapter into aggregate_function
  cql3: functions: simplify accumulator_for template
  cql3: functions: base user-defined aggregates on stateless aggregates
  cql3: functions: drop native_aggregate_function
  cql3: functions: reimplement count(column) statelessly
  cql3: functions: reimplement avg() statelessly
  cql3: functions: reimplement sum() statelessly
  cql3: functions: change wide accumulator type to varint
  cql3: functions: unreverse types for min/max
  cql3: functions: rename make_{min,max}_dynamic_function
  cql3: functions: reimplement min/max statelessly
  cql3: functions: reimplement count(*) statelessly
  cql3: functions: simplify creating native functions even more
  cql3: functions: add helpers for automating marshalling for scalar functions
  types: fix big_decimal constructor from literal 0
  cql3: functions: add helper class for internal scalar functions
  db: functions: add stateless aggregate functions
  db, cql3: move scalar_function from cql3/functions to db/functions
2023-03-30 13:58:47 +03:00
Nadav Har'El
32fff17e19 Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes
`scylla-sstable` currently has two ways to obtain the schema:
* via a `schema.cql` file.
* load schema definition from memory (only works for system tables).

This meant that for most cases it was necessary to export the schema into a `CQL` format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates form. Yet in many cases the sstable *is* inspected on the same host where it originates from. In this cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool check the usual locations of the scylla data dir, the scylla confguration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a `schema.cql` is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.

This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tools is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.

Example:
```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```
As seen above, subdirectories like `qurantine`, `staging` etc are also supported.

Fixes: https://github.com/scylladb/scylladb/issues/10126

Closes #13075

* github.com:scylladb/scylladb:
  docs/operating-scylla/admin-tools: scylla-sstable.rst: update schema section
  test/cql-pytest: test_tools.py: add test for schema loading
  test/cql-pytest: nodetool.py: add flush_keyspace()
  tools/scylla-sstable: reform schema loading mechanism
  tools/schema_loader: add load_schema_from_schema_tables()
  db/schema_tables: expose types schema
2023-03-30 09:35:59 +03:00
Botond Dénes
972b24a969 Merge 'Break the proxy -> database -> [views] -> proxy loop' from Pavel Emelyanov
... and drop usage of global storage proxy from several places of mutate_MV().

This is the last dependency loop around storage proxy left as long as the last user of the global storage proxy. The trouble is that while proxy naturally depends on database, the database SUDDENLY requires proxy to push view updates from the guts of database::do_apply().

Similar loop existed in a form of database -> { large_data_handler, compaction manager } -> system keyspace -> database and it was cut in 917fdb9e53 (Cut database-system_keyspace circular dependency) by introducing a soft dependency link from l. d. handler / compaction manager to system keyspace. The similar solution is proposed here.

The database instance gets a soft dependency (shared_ptr) to view_update_generator instance. On start the link is nullptr and pushing view updates is not possible until view_updates_generator starts and plugs itself to the database. The plugging happens naturally, because v.u.generator needs proxy as explicit dependency and, thus, can reach database via proxy. This (seems to) works because tables that need view updates don't start being mutated until late enough, as late as v.u.generator starts.

As a nice side effect this allows removing a bunch of global storage proxy usages from mutate_MV() which opens a pretty  short way towards de-globalizing proxy (after it only qctx, tracing and schema registry will be left).

Closes #13367

* github.com:scylladb/scylladb:
  view: Drop global storage_proxy usage from mutate_MV()
  view: Make mutate_MV() method of view_update_generator
  table: Carry v.u.generator down to populate_views()
  table: Carry v.u.generator down to do_push_view_replica_updates()
  view: Keep v.u.generator shared pointer on view_builder::consumer
  view: Capture v.u.generator on view_updating_consumer lambda
  view: Plug view update generator to database
  view: Add view_builder -> view_update_generator dependency
  view: Add view_update_generator -> sharded<storage_proxy> dependency
2023-03-30 08:29:29 +03:00
Pavel Emelyanov
cc262d814b view: Drop global storage_proxy usage from mutate_MV()
Now the mutate_MV is the method of v.u.generator which has reference to
the sharded<storage_proxy>. Few helper static wrappers are patched to
get the needed proxy or database reference from the mutate_MV call.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 18:48:14 +03:00
Pavel Emelyanov
7cabdc54a6 view: Make mutate_MV() method of view_update_generator
Nowadays its a static helper, but internally it depends on storage
proxy, so it grabs its global instance. Making it a method of view
update generator makes it possible to use the proxy dependency from the
generator.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 18:48:14 +03:00
Pavel Emelyanov
e78e64a920 table: Carry v.u.generator down to populate_views()
The method is called by view_builder::consumer when building a view and
the consumer already has stable dependency reference on the view updates
generator.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 18:48:13 +03:00
Kefu Chai
f789d8d3cd config: mark query timeouts live update-able
in this change, following query timeouts config options are marked
live update-able:

- range_request_timeout_in_ms
- read_request_timeout_in_ms
- counter_write_request_timeout_in_ms
- cas_contention_timeout_in_ms
- truncate_request_timeout_in_ms
- write_request_timeout_in_ms
- request_timeout_in_ms

as per https://github.com/scylladb/scylladb/issues/10172,

> Many users would like to set the driver timers based on server timers.
> For example: expire a read timeout before or after the server read time
> out.

with this change, these options are *marked* live-updateable, but since
they are cached by their consumers locally, so we will have another commit
to update the local copies when these options get updated.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:02 +08:00
Pavel Emelyanov
a95d3446fd table: Carry v.u.generator down to do_push_view_replica_updates()
The latter is the place where mutate_MV is called and it needs the
view updates generator nearby.

The call-stack starts at database::do_apply(). As was described in one
of the previous patches, applying mutations that need updating views
happen late enough, so if the view updates generator is not plugged to
the database yet, it's OK to bail out with exception. If it's plugged,
it's carried over thus keeping the generator instance alive and waited
for on its stop.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 14:12:01 +03:00
Pavel Emelyanov
ddc8c8b019 view: Keep v.u.generator shared pointer on view_builder::consumer
This is another mutations consumer that pushes view updates forward and
thus also needs the view updates generator pointer. It gets one from the
view builder that already has the dependency on generator.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 14:11:30 +03:00
Pavel Emelyanov
2652dffd89 view: Capture v.u.generator on view_updating_consumer lambda
The consumer is in fact pushing the updates and _that_'s the component
that would really need the view_update_generator at hand. The consumer
is created from the generator itself so no troubles getting the pointer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 14:10:55 +03:00
Pavel Emelyanov
d5557ef0e2 view: Plug view update generator to database
The database is low-level service and currently view update generator
implicitly depend on it via storage proxy. However, database does need
to push view updates with the help of mutate_MV helper, thus adding the
dependency loop.

This patch exploits the fact that view updates start being pushed late
enough, by that time all other service, including proxy and view update
generator, seem to be up and running. This allows a "weak dependency"
from database to view update generator, like there's one from database
to system keyspace already.

So in this patch the v.u.g. puts the shared-from-this pointer onto the
database at the time it starts. On stop it removes this pointer after
database is drained and (hopefully) all view updates are pushed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 14:09:49 +03:00
Pavel Emelyanov
3455b1aed8 view: Add view_builder -> view_update_generator dependency
The builder will need generator for view_builder::consumer in one of the
next patches.

The builder is a standalone service that starts one of the latest and no
other services need builder as their dependency.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 14:08:47 +03:00
Pavel Emelyanov
3fd12d6a0e view: Add view_update_generator -> sharded<storage_proxy> dependency
The generator will be responsible for spreading view updates with the
help of mutate_MV helper. The latter needs storage proxy to operate, so
the generator gets this dependency in advance.

There's no need to change start/stop order at the moment, generator
already starts after and stops before proxy. Also, services that have
generator as dependency are not required by proxy (even indirectly) so
no circular dependency is produced at this point.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 14:08:47 +03:00
Avi Kivity
6977df5539 cql3/selection, forward_service: use use stateless_aggregate_function directly
Now that stateless_aggregate_function is directly exposed by
aggregate_function, we can use it directly, avoiding the intermediary
aggregate_function::aggregate, which is removed.
2023-03-28 23:49:34 +03:00
Avi Kivity
58eb21aa5d db: functions: fold stateless_aggregate_function_adapter into aggregate_function
Now that all aggregate functions are derived from
stateless_aggregate_function_adapter, we can just fold its functionality
into the base class. This exposes stateless_aggregate_function to
all users of aggregate_function, so they can begin to benefit from
the transformation, though this patch doesn't touch those users.

The aggregate_function base class is partiallly devirtualized since
there is just a single implementation now.
2023-03-28 23:47:11 +03:00