Commit Graph

37531 Commits

Author SHA1 Message Date
Benny Halevy
4e5bfe2c18 size_tiered_backlog_tracker: make log4 helper static
It is completely generic.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 13:30:43 +03:00
Benny Halevy
5d6c2b0d12 size_tiered_backlog_tracker: define struct sstables_backlog_contribution
Encapsulate the contribution-related members in
struct contribution, to be used for strong exception safety.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 13:29:38 +03:00
Benny Halevy
bf69584ccc size_tiered_backlog_tracker: update_sstables: update total_bytes only if set changed
Although replace_sstables is supposed to be called
only once per {old_ssts, new_ssts} it is safer
to update `_total_bytes` with `sst->data_size()`
only if the sst was inserted/erased successfully.
Otherwise _total_bytes may go out of sync with the
contents of _all.

That said, the next step should be to refer to the
compaction_group's main sstable set directly rather
than maintaining a "shadow" set in the tracker.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 13:28:50 +03:00
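
The guard described in the commit above can be sketched as follows. This is a minimal illustration, not the tracker's actual code: `fake_sstable`, `shadow_tracker`, and `replace()` are hypothetical names, and only the "adjust the total only when the set really changed" idea is taken from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an sstable; only the data size matters here.
struct fake_sstable {
    std::uint64_t size;
    std::uint64_t data_size() const { return size; }
};
using sst_ptr = std::shared_ptr<fake_sstable>;

// Hypothetical tracker that keeps a "shadow" set plus a running byte total.
class shadow_tracker {
    std::unordered_set<sst_ptr> _all;
    std::uint64_t _total_bytes = 0;
public:
    void replace(const std::vector<sst_ptr>& old_ssts, const std::vector<sst_ptr>& new_ssts) {
        for (const auto& sst : old_ssts) {
            // erase() returns the number of removed elements (0 or 1 for a set).
            if (_all.erase(sst)) {
                _total_bytes -= sst->data_size();
            }
        }
        for (const auto& sst : new_ssts) {
            // insert().second is true only when the element was newly added.
            if (_all.insert(sst).second) {
                _total_bytes += sst->data_size();
            }
        }
    }
    std::uint64_t total_bytes() const { return _total_bytes; }
    std::size_t size() const { return _all.size(); }
};
```

With this shape, calling `replace()` twice with the same arguments leaves the total unchanged the second time, so the total cannot drift away from the contents of the set.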
Benny Halevy
1a8cc84981 compaction_backlog_tracker: replace_sstables: pass old and new sstables vectors by ref
To facilitate rollback on the error handling path,
to provide strong exception safety guarantees.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 13:27:18 +03:00
Benny Halevy
0877e7a846 compaction_backlog_tracker: replace_sstables: add FIXME comments about strong exception safety
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 12:51:48 +03:00
Kamil Braun
be5b61b870 Merge 'cql3: expr: break up expression.hh header' from Avi Kivity
It's very annoying to add a declaration to expression.hh and watch
the whole world get recompiled. Improve that by moving less-common
functions to a new header expr-utils.hh. Move the evaluation machinery
to a new header evaluate.hh. The remaining definitions in expression.hh
should not change as often, and thus cause less frequent recompiles.

Closes #14346

* github.com:scylladb/scylladb:
  cql3: expr: break up expression.hh header
  cql3: expr: restrictions.hh: protect against double inclusions
  cql3: constants: deinline
  cql3: statement_restrictions: deinline
  cql3: deinline operation::fill_prepare_context()
2023-06-23 10:19:28 +02:00
Nadav Har'El
0a1283c813 Merge 'cql3:statements:describe_statement: check pointer after casting to UDF/UDA' from Michał Jadwiszczak
There was a bug in describe_statement. When executing `DESC FUNCTION <uda name>` or `DESC AGGREGATE <udf name>`, Scylla crashed because the function was found (`functions::find()` searches both UDFs and UDAs) but was of the wrong kind, and the pointer wasn't checked after the cast.

Added a test for this.

Fixes: #14360

Closes #14332

* github.com:scylladb/scylladb:
  cql-pytest:test_describe: add test for filtering UDF and UDA
  cql3:statements:describe_statement: check pointer to UDF/UDA
2023-06-22 20:54:25 +03:00
Michał Jadwiszczak
d3d9a15505 cql-pytest:test_describe: add test for filtering UDF and UDA
2023-06-22 18:08:45 +02:00
Michał Jadwiszczak
d498451cdf cql3:statements:describe_statement: check pointer to UDF/UDA
While looking for specific UDF/UDA, result of
`functions::functions::find()` needs to be filtered out based on
function's type.

Fixes: #14360
2023-06-22 18:08:16 +02:00
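
The class of bug fixed above, using the result of a downcast without checking it, can be illustrated with a small sketch. The types `db_function`, `user_function`, `user_aggregate` and the helper `describe_udf()` are hypothetical and much simpler than Scylla's real classes; the point is only that a lookup returning a base pointer may hand back the "other" kind, so the cast result has to be tested.

```cpp
#include <memory>
#include <optional>
#include <string>

// Hypothetical class hierarchy; the real Scylla types differ.
struct db_function {
    virtual ~db_function() = default;
    std::string name;
};
struct user_function : db_function {};
struct user_aggregate : db_function {};

// Returns a description only if `f` really is a UDF.
// A lookup that searches both UDFs and UDAs may return either kind,
// so the pointer is checked after the cast instead of being dereferenced blindly.
std::optional<std::string> describe_udf(const std::shared_ptr<db_function>& f) {
    auto udf = std::dynamic_pointer_cast<user_function>(f);
    if (!udf) {
        return std::nullopt;  // not found, or found but of the wrong kind
    }
    return "FUNCTION " + udf->name;
}
```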
Avi Kivity
b858a4669d cql3: expr: break up expression.hh header
Adding a function declaration to expression.hh causes many
recompilations. Reduce that by:

 - moving some restrictions-related definitions to
   the existing expr/restrictions.hh
 - moving evaluation related names to a new header
   expr/evaluate.hh
 - moving utilities to a new header
   expr/expr-utils.hh

expression.hh contains only expression definitions and the most
basic and common helpers, like printing.
2023-06-22 14:21:03 +03:00
Avi Kivity
25c351a4f6 cql3: expr: restrictions.hh: protect against double inclusions
Add #pragma once. Right now it's safe as it only has declarations
(which can be repeated), but soon it will have a definition.
2023-06-22 14:19:43 +03:00
Avi Kivity
7302088274 cql3: constants: deinline
To reduce future header fan-in, deinline all non-trivial functions.
While these are on the hot path, they can't be inlined anyway as they're
virtual, and they're quite heavy anyway.
2023-06-22 14:19:43 +03:00
Avi Kivity
6c0f8a73c5 cql3: statement_restrictions: deinline
Reduce future header fan-in by deinlining functions. These are
all on the prepare path.
2023-06-22 14:19:43 +03:00
Avi Kivity
3834a1fd7c cql3: deinline operation::fill_prepare_context()
To reduce operation.hh include fan-in, deinline fill_prepare_context().
It's not performance sensitive as it's on the prepare phase.
2023-06-22 14:19:43 +03:00
Kamil Braun
23a60df92d Merge 'cql3: expr: simplify evaluate()' from Avi Kivity
Make evaluate()'s body more regular, then exploit it by
replacing the long list of branches with a lambda template.

Closes #14306

* github.com:scylladb/scylladb:
  cql3: expr: simplify evaluate()
  cql3: expr: standardize evaluate() branches to call do_evaluate()
  cql3: expr: rename evaluate(ExpressionElement) to do_evaluate()
2023-06-22 12:18:36 +02:00
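
The idea in the merge above (make every branch call the same function, then collapse the branches into one generic lambda) can be sketched like this. The element types `constant`, `negation`, and `addition` are hypothetical and far simpler than the real cql3::expr types; only the `evaluate()` / `do_evaluate()` naming follows the commit titles.

```cpp
#include <variant>

// Hypothetical expression elements, much simpler than cql3::expr's.
struct constant { int value; };
struct negation { int operand; };
struct addition { int lhs; int rhs; };
using expression = std::variant<constant, negation, addition>;

// One do_evaluate() overload per element type.
inline int do_evaluate(const constant& c) { return c.value; }
inline int do_evaluate(const negation& n) { return -n.operand; }
inline int do_evaluate(const addition& a) { return a.lhs + a.rhs; }

// Because every branch now has the same shape, a single generic lambda
// replaces a hand-written list of per-type branches.
inline int evaluate(const expression& e) {
    return std::visit([](const auto& element) { return do_evaluate(element); }, e);
}
```

Adding a new element type then only requires a new `do_evaluate()` overload; the dispatcher itself does not change.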
Kamil Braun
563d466de1 Merge 'cql3: select_statement: coroutinize indexed statement's do_execute()' from Avi Kivity
Improves readability, and probably a little faster too.

Closes #14311

* github.com:scylladb/scylladb:
  cql3: select_statement: reindent indexed_table_select_statement::do_execute
  cql3: select_statement: simplify inner lambda in indexed_table_select_statement::do_execute()
  cql3: select_statement: coroutinize indexed_table_select_statement::do_execute()
2023-06-22 12:10:45 +02:00
Botond Dénes
55e09dbdc0 Merge 'doc: move cloud deployment instruction to docs -v2' from Anna Stuchlik
This is V2 of https://github.com/scylladb/scylladb/pull/14108

This commit moves the installation instructions for the cloud from the [website](https://www.scylladb.com/download/) to the docs.

The scope:

* Added new files with instructions for AWS, GCP, and Azure.
* Added the new files to the index.
* Updated the "Install ScyllaDB" page to create the "Cloud Deployment" section.
* Added new bookmarks in other files to create stable links, for example, ".. _networking-ports:"
* Moved common files to the new "installation-common" directory. This step is required to exclude the open source-only files in the Enterprise repository.

In addition:
- The Configuration Reference file was moved out of the installation section (it's not about installation at all)
- The links to creating a cluster were removed from the installation page (as not related).

Related: https://github.com/scylladb/scylla-docs/issues/4091

Closes #14153

* github.com:scylladb/scylladb:
  doc: remove the rpm-info file (What is in each RPM) from the installation section
  doc: move cloud deployment instruction to docs -v2
2023-06-22 12:58:30 +03:00
Avi Kivity
32b27d6a08 cql3: expr: change evaluation_input vector components to take spans
Spans are slightly cleaner, slightly faster (as they avoid an indirection),
and allow for replacing some of the arguments with small_vector:s.

Closes #14313
2023-06-22 11:28:01 +02:00
Anna Stuchlik
950ef5195e Merge branch 'master' into anna-install-cloud-v2
2023-06-22 10:03:29 +02:00
Botond Dénes
e1c2de4fb8 Merge 'forward_service: fix forgetting case-sensitivity in aggregates ' from Jan Ciołek
There was a bug that caused aggregates to fail when used on case-sensitive columns.

For example:
```cql
SELECT SUM("SomeColumn") FROM ks.table;
```
would fail, with a message saying that there is no column "somecolumn".

This is because the case-sensitivity got lost on the way.

For non case-sensitive column names we convert them to lowercase, but for case sensitive names we have to preserve the name as originally written.

The problem was in `forward_service` - we took a column name and created a non case-sensitive `column_identifier` out of it.
This converted the name to lowercase, and later such column couldn't be found.

To fix it, let's make the `column_identifier` case-sensitive.
It will preserve the name, without converting it to lowercase.

Fixes: https://github.com/scylladb/scylladb/issues/14307

Closes #14340

* github.com:scylladb/scylladb:
  service/forward_service.cc: make case-sensitivity explicit
  cql-pytest/test_aggregate: test case-sensitive column name in aggregate
  forward_service: fix forgetting case-sensitivity in aggregates
2023-06-22 08:25:33 +03:00
Botond Dénes
320159c409 Merge 'Compaction group major compaction task' from Aleksandra Martyniuk
Task manager task covering compaction group major
compaction.

Uses multiple inheritance on already existing
major_compaction_task_executor to keep track of
the operation with task manager.

Closes #14271

* github.com:scylladb/scylladb:
  test: extend test_compaction_task.py
  test: use named variable for task tree depth
  compaction: turn major_compaction_task_executor into major_compaction_task_impl
  compaction: take gate holder out of task executor
  compaction: extend signature of some methods
  tasks: keep shared_ptr to impl in task
  compaction: rename compaction_task_executor methods
2023-06-22 08:15:17 +03:00
Avi Kivity
8576502c48 Merge 'raft topology: ban left nodes from the cluster' from Kamil Braun
Use the new Seastar functionality for storing references to connections to implement banning hosts that have left the cluster (either decommissioned or using removenode) in raft-topology mode. Any attempts at communication from those nodes will be rejected.

This works not only for nodes that restart, but also for nodes that were running behind a network partition when we removed them. Even when the partition resolves, the existing nodes will effectively firewall off that node.

Some changes to the decommission algorithm had to be introduced for it to work with node banning. As a side effect a pre-existing problem with decommission was fixed. Read the "introduce `left_token_ring` state" and "prepare decommission path for node banning" commits for details.

Closes #13850

* github.com:scylladb/scylladb:
  test: pylib: increase checking period for `get_alive_endpoints`
  test: add node banning test
  test: pylib: manager_client: `get_cql()` helper
  test: pylib: ScyllaCluster: server pause/unpause API
  raft topology: ban left nodes
  raft topology: skip `left_token_ring` state during `removenode`
  raft topology: prepare decommission path for node banning
  raft topology: introduce `left_token_ring` state
  raft topology: `raft_topology_cmd` implicit constructor
  messaging_service: implement host banning
  messaging_service: exchange host IDs and map them to connections
  messaging_service: store the node's host ID
  messaging_service: don't use parameter defaults in constructor
  main: move messaging_service init after system_keyspace init
2023-06-21 20:16:45 +03:00
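
As a rough illustration of the banning mechanism described above: if connections are tracked per host ID, banning a host can both drop its existing connections and reject new ones. Everything in this sketch is hypothetical (`ban_list`, string host IDs, integer connection handles); it does not reflect the actual messaging_service or Seastar interfaces.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical types: host IDs are strings and connections are plain integers.
using host_id = std::string;
using connection_id = int;

class ban_list {
    std::unordered_set<host_id> _banned;
    std::unordered_multimap<host_id, connection_id> _connections;
public:
    // Records the connection and returns true, or returns false if the peer is banned.
    bool accept(const host_id& peer, connection_id conn) {
        if (_banned.count(peer)) {
            return false;
        }
        _connections.emplace(peer, conn);
        return true;
    }
    // Bans the host and forgets its tracked connections.
    // Returns how many connections were dropped.
    std::size_t ban(const host_id& peer) {
        _banned.insert(peer);
        return _connections.erase(peer);
    }
};
```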
Anna Stuchlik
c65abb06cd doc: update the OSS docs landing page
Fixes https://github.com/scylladb/scylladb/issues/14333

This commit replaces the documentation landing page with
the Open Source-only documentation landing page.

This change is required because there is now a separate landing
page for the ScyllaDB documentation, so the page is duplicated,
creating a bad user experience.

Closes #14343
2023-06-21 17:06:48 +03:00
Jan Ciołek
16c21d7252 service/forward_service.cc: make case-sensitivity explicit
Make it explicit that the boolean argument determines case-sensitivity. It emphasizes its importance.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-06-21 16:02:41 +02:00
Jan Ciolek
854b0301be cql-pytest/test_aggregate: test case-sensitive column name in aggregate
There was a bug which made aggregates fail when used with case-sensitive
column names.
Add a test to make sure that this doesn't happen in the future.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-06-21 14:49:24 +02:00
Jan Ciolek
7fca350075 forward_service: fix forgetting case-sensitivity in aggregates
There was a bug that caused aggregates to fail when
used on case-sensitive columns.

For example:
```cql
SELECT SUM("SomeColumn") FROM ks.table;
```
would fail, with a message saying that there
is no column "somecolumn".

This is because the case-sensitivity got lost on the way.

For non case-sensitive column names we convert them to lowercase,
but for case sensitive names we have to preserve the name
as originally written.

The problem was in `forward_service` - we took a column name
and created a non case-sensitive `column_identifier` out of it.
This converted the name to lowercase, and later such column
couldn't be found.

To fix it, let's make the `column_identifier` case-sensitive.
It will preserve the name, without converting it to lowercase.

Fixes: https://github.com/scylladb/scylladb/issues/14307

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-06-21 14:37:42 +02:00
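
The effect of the case-sensitivity flag discussed above can be shown with a small sketch. `simple_column_identifier` and its `keep_case` parameter are hypothetical stand-ins, not the real `column_identifier` API; the sketch only demonstrates why building the identifier without preserving case turns "SomeColumn" into a name that no longer matches the schema.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a column identifier.
struct simple_column_identifier {
    std::string text;

    // keep_case == false lowercases the name, as is done for unquoted identifiers.
    // keep_case == true preserves the name exactly as written (quoted identifiers).
    simple_column_identifier(std::string raw, bool keep_case) : text(std::move(raw)) {
        if (!keep_case) {
            std::transform(text.begin(), text.end(), text.begin(),
                           [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        }
    }
};
```

Naming the boolean at the call site (rather than passing a bare `true` or `false`) is what the follow-up commit "make case-sensitivity explicit" is about.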
Nadav Har'El
8a9de08510 sstable: limit compression chunk size to 128 KB
The chunk size used in sstable compression can be set when creating a
table, using the "chunk_length_in_kb" parameter. It can be any power-of-two
multiple of 1KB. Very large compression chunks are not useful - they
offer diminishing returns on compression ratio, and require very large
memory buffers and reading a very large amount of disk data just to
read a small row. In fact, small chunks are recommended - Scylla
defaults to 4 KB chunks, and Cassandra lowered their default from 64 KB
(in Cassandra 3) to 16 KB (in Cassandra 4).

Therefore, allowing arbitrarily large chunk sizes is just asking for
trouble. Today, a user can ask for a 1 GB chunk size, and crash or hang
Scylla when it runs out of memory. So in this patch we add a hard limit
of 128 KB for the chunk size - anything larger is refused.

Fixes #9933

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14267
2023-06-21 14:26:02 +03:00
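
A check along the lines described above might look like the following sketch. The function name `is_valid_chunk_length_kb` and the constant name are hypothetical; only the two rules (power-of-two multiple of 1 KB, at most 128 KB) come from the commit message.

```cpp
#include <cstdint>

// Upper bound taken from the commit message; the constant name is hypothetical.
constexpr std::uint32_t max_chunk_length_kb = 128;

// True when the value is a power of two (in KB) no larger than the limit.
inline bool is_valid_chunk_length_kb(std::uint32_t kb) {
    const bool power_of_two = kb != 0 && (kb & (kb - 1)) == 0;
    return power_of_two && kb <= max_chunk_length_kb;
}
```

Under these rules a 1 GB request (1024 * 1024 KB) is rejected even though it is a power of two.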
Kefu Chai
f014ccf369 Revert "Revert "Merge 'treewide: add uuid_sstable_identifier_enabled support' from Kefu Chai""
This reverts commit 562087beff.

The regressions introduced by the reverted change have been fixed.
So let's revert this revert to resurrect the
uuid_sstable_identifier_enabled support.

Fixes #10459
2023-06-21 13:02:40 +03:00
Avi Kivity
e233f471b8 Merge 'Respect tablet shard assignment' from Tomasz Grabiec
This PR changes the system to respect shard assignment to tablets in tablet metadata (system.tablets):
1. The tablet allocator is changed to distribute tablets evenly across shards taking into account currently allocated tablets in the system. Each tablet has equal weight. vnode load is ignored.
2. CDC subsystem was not adjusted (not supported yet)
3. sstable sharding metadata reflects tablet boundaries
5. resharding is NOT supported yet (the node will abort on boot if there is a need to reshard tablet-based tables)
6. The system is NOT prepared to handle tablet migration / topology changes in a safe way.
7. Sstable cleanup is not wired properly yet

After this PR, dht::shard_of() and schema::get_sharder() are deprecated. One should use table::shard_of() and effective_replication_map::get_sharder() instead.

To make life easier, support was added to obtain the table pointer from the schema pointer:

```
schema_ptr s;
s->table().shard_of(...)
```

Closes #13939

* github.com:scylladb/scylladb:
  locator: network_topology_startegy: Allocate shards to tablets
  locator: Store node shard count in topology
  service: topology: Extract topology updating to a lambda
  test: Move test_tablets under topology_experimental
  sstables: Add trace-level logging related to shard calculation
  schema: Catch incorrect uses of schema::get_sharder()
  dht: Rename dht::shard_of() to dht::static_shard_of()
  treewide: Replace dht::shard_of() uses with table::shard_of() / erm::shard_of()
  storage_proxy: Avoid multishard reader for tablets
  storage_proxy: Obtain shard from erm in the read path
  db, storage_proxy: Drop mutation/frozen_mutation ::shard_of()
  forward_service: Use table sharder
  alternator: Use table sharder
  db: multishard: Obtain sharder from erm
  sstable_directory: Improve trace-level logging
  db: table: Introduce shard_of() helper
  db: Use table sharder in compaction
  sstables: Compute sstable shards using sharder from erm when loading
  sstables: Generate sharding metadata using sharder from erm when writing
  test: partitioner: Test split_range_to_single_shard() on tablet-like sharder
  dht: Make split_range_to_single_shard() prepared for tablet sharder
  sstables: Move compute_shards_for_this_sstable() to load()
  dht: Take sharder externally in splitting functions
  locator: Make sharder accessible through effective_replication_map
  dht: sharder: Document guarantees about mapping stability
  tablets: Implement tablet sharder
  tablets: Include pending replica in get_shard()
  dht: sharder: Introduce next_shard()
  db: token_ring_table: Filter out tablet-based keyspaces
  db: schema: Attach table pointer to schema
  schema_registry: Fix SIGSEGV in learn() when concurrent with get_or_load()
  schema_registry: Make learn(schema_ptr) attach entry to the target schema
  test: lib: cql_test_env: Expose feature_service
  test: Extract throttle object to separate header
2023-06-21 10:20:41 +03:00
Calle Wilund
f18e967939 storage_proxy: Make split_stats resilient to being called from different scheduling group
Fixes #11017

When doing writes, storage proxy creates objects of types deriving from abstract_write_response_handler.
These are created in the various scheduling groups executing the write-inducing code. They
pick up a group-local reference to the various metrics used by SP. Normally all code
using (and esp. modifying) these metrics is executed in the same scheduling group.
However, if gossip sees a node go down, it will notify listeners, which eventually
call get_ep_stat and register_metrics.
This code (before this patch) uses the _active_ scheduling group to eventually add
metrics, using a local dict as a guard against double registrations. If, as described above,
we're called in a different scheduling group than the original one, however, this
can cause double registrations.

Fixed here by keeping a reference to the creating scheduling group and using it, not
the active one, when/if creating new metrics.

Closes #14294
2023-06-21 10:08:27 +03:00
Tomasz Grabiec
ebdebb982b locator: network_topology_startegy: Allocate shards to tablets
Uses a simple algorithm for allocating shards which chooses
the least-loaded shard on a given node, encapsulated in load_sketch.

Takes load due to current tablet allocation into account.

Each tablet, new or allocated for other tables, is assumed to have an
equal load weight.
2023-06-21 00:58:25 +02:00
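
The allocation rule described above (pick the least-loaded shard, counting every tablet with equal weight) can be sketched as follows. `node_load` is a hypothetical class loosely inspired by the `load_sketch` name in the commit message; it is not the real implementation, and breaking ties by lowest shard index is an assumption made here.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-node load tracker; assumes the node has at least one shard.
class node_load {
    std::vector<std::size_t> _tablets_per_shard;
public:
    // `current` holds the number of already-allocated tablets on each shard.
    explicit node_load(std::vector<std::size_t> current)
        : _tablets_per_shard(std::move(current)) {}

    // Picks the least-loaded shard (lowest index on ties) and
    // records one more tablet on it, since every tablet has equal weight.
    std::size_t next_shard() {
        auto it = std::min_element(_tablets_per_shard.begin(), _tablets_per_shard.end());
        ++*it;
        return static_cast<std::size_t>(it - _tablets_per_shard.begin());
    }
};
```

Seeding the tracker with the existing per-shard counts is what lets new allocations take tablets of other tables into account.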
Tomasz Grabiec
e110167a2a locator: Store node shard count in topology
Will be needed by tablet allocator.
2023-06-21 00:58:25 +02:00
Tomasz Grabiec
dd968e16bf service: topology: Extract topology updating to a lambda
Reduces code duplication.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
6defcb7bd5 test: Move test_tablets under topology_experimental
Tablets will rely on shard_count information in topology, which is set
only when using experimental raft-based topology.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
34f28aa0cb sstables: Add trace-level logging related to shard calculation
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
f6625e16ee schema: Catch incorrect uses of schema::get_sharder()
We still use it in many places in unit tests, which is ok because
those tables are vnode-based.

We want to catch incorrect uses in production as they may lead to
hard-to-debug consistency problems.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
29cbdb812b dht: Rename dht::shard_of() to dht::static_shard_of()
This is in order to prevent new incorrect uses of dht::shard_of() to
be accidentally added. Also, makes sure that all current uses are
caught by the compiler and require an explicit rename.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
21198e8470 treewide: Replace dht::shard_of() uses with table::shard_of() / erm::shard_of()
dht::shard_of() does not use the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
fb0bdcec0c storage_proxy: Avoid multishard reader for tablets
Currently, the coordinator splits the partition range at vnode (or
tablet) boundaries and then tries to merge adjacent ranges which
target the same replica. This is an optimization which makes less
sense with tablets, which are supposed to be of substantial size. If
we don't merge the ranges, then with tablets we can avoid using the
multishard reader on the replica side, since each tablet lives on a
single shard.

The main reason to avoid a multishard reader is avoiding its
complexity, and avoiding adapting it to work with tablet
sharding. Currently, the multishard reader implementation makes
several assumptions about shard assignment which do not hold with
tablets. It assumes that shards are assigned in a round-robin fashion.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
10e05eec66 storage_proxy: Obtain shard from erm in the read path
dht::shard_of() does not use the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
e48ec6fed3 db, storage_proxy: Drop mutation/frozen_mutation ::shard_of()
dht::shard_of() does not use the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
d4497a058e forward_service: Use table sharder
schema::get_sharder() does not return the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
ab94e74774 alternator: Use table sharder
schema::get_sharder() does not return the correct sharder for tablet-based tables.
Code which is supposed to work with all kinds of tables should use erm::get_sharder().
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
d92287f997 db: multishard: Obtain sharder from erm
This is not strictly necessary, as the multishard reader will be later
avoided altogether for tablet-based tables, but it is a step towards
converting all code to use the erm->get_sharder() instead of
schema::get_sharder().
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
18f567385c sstable_directory: Improve trace-level logging
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
34ba8a6a53 db: table: Introduce shard_of() helper
Saves some boilerplate code.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
36da062bcb db: Use table sharder in compaction
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
ad983ac23d sstables: Compute sstable shards using sharder from erm when loading
schema::get_sharder() does not use the correct sharder for
tablet-based tables.  Code which is supposed to work with all kinds of
tables should obtain the sharder from erm::get_sharder().
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
17d6163548 sstables: Generate sharding metadata using sharder from erm when writing
We need to keep sharding metadata consistent with tablet mapping to
shards in order for node restart to detect that those sstables belong
to a single shard and that resharding is not necessary. Resharding of
sstables based on tablet metadata is not implemented yet and will
abort after this series.

Keeping sharding metadata accurate for tablets is only necessary until
compaction group integration is finished. After that, we can use the
sstable token range to determine the owning tablet and thus the owning
shard. Before that, we can't, because a single sstable may contain
keys from different tablets, and the whole key range may overlap with
keys which belong to other shards.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
36e12020b9 test: partitioner: Test split_range_to_single_shard() on tablet-like sharder
2023-06-21 00:58:24 +02:00