Commit Graph

383 Commits

Author SHA1 Message Date
Tomasz Grabiec
434ecdee0e service: migration_manager: Run group0 barrier in gossip scheduling group
Fixes two issues.

One is potential priority inversion. The barrier will be executed
using scheduling group of the first fiber which triggers it, the rest
will block waiting on it. For example, CQL statements which need to
sync the schema on replica side can block on the barrier triggered by
streaming. That's undesirable. This is theoretical, not proved in the
field.

The second problem is blocking the error path. This barrier is called
from the streaming error handling path. If the streaming concurrency
semaphore is exhausted, and streaming fails due to timeout on
obtaining the permit in check_needs_view_update_path(), the error path
will block too because it will also attempt to obtain the permit as
part of the group0 barrier. Running it in the gossip scheduling group
prevents this.

Fixes #24925

(cherry picked from commit ee2fa58bd6)
2025-07-17 17:24:33 +00:00
Wojciech Mitros
f1fd053572 base_info: remove the lw_shared_ptr variant
The base_dependent_view_info is no longer needed to be shared or
modified in the view_info, so we no longer need to keep it as
a shared pointer.

(cherry picked from commit d77f11d436)
2025-05-27 21:40:23 +02:00
Wojciech Mitros
d9fe006a20 schema_registry: store base info instead of base schema for view entries
In the following patch we plan to remove the base schema from the base_info
to make the base_info immutable. To do that, we first prepare the schema
registry for the change; we need to be able to create view schemas from
frozen schemas there and frozen schemas have no information about the base
table. Unless we do this change, after base schemas are removed from the
base info, we'll no longer be able to load a view schema to the schema registry
without looking up the base schema in the database.

This change also required some updates to schema building:
* we add a method for unfreezing a view schema with base info instead of
a base schema
* we make it possible to use schema_builder with a base info instead of
a base schema
* we add a method for creating a view schema from mutations with a base info
instead of a base schema
* we add a view_info constructor withat base info instead of a base schema
* we update the naming in schema_registry to reflect the usage of base info
instead of base schema

(cherry picked from commit 05fce91945)
2025-05-27 21:40:23 +02:00
Tomasz Grabiec
8059090a29 Merge 'Cache base info for view schemas in the schema registry' from Wojciech Mitros
Currently, when we load a frozen schema into the registry, we lose
the base info if the schema was of a view. Because of that, in various
places we need to set the base info again, and in some codepaths we
may miss it completely, which may make us unable to process some
requests (for example, when executing reverse queries on views).
Even after setting the base info, we may still lose it if the schema
entry gets deactivated due to all `schema_ptr`s temporarily dying.

To fix this, this patch adds the base schema to the registry, alongside
the view schema. We store just the frozen base schema, so that we can
transfer it across shards. With the base schema, we can now set the base
info when returning the schema from the registry. As a result, we can now
assume that all view schemas returned by the registry have base_info set.

In this series we also make sure that the view schemas in the registry are
kept up-to-date in regards to base schema changes.

Fixes https://github.com/scylladb/scylladb/issues/21354

This issue is a bug, so adding backport labels 6.1 and 6.2

Closes scylladb/scylladb#21862

* github.com:scylladb/scylladb:
  test: add test for schema registry maintaining base info for views
  schema_registry: avoid setting base info when getting the schema from registry
  schema_registry: update cached base schemas when updating a view
  schema_registry: cache base schemas for views
  db: set base info before adding schema to registry
2025-01-21 00:17:54 +01:00
Benny Halevy
88ae067ddb everywhere: add skeletal support for the in_memory_tables feature
Forward-ported from scylla-enterprise.
Note that the feature has been deprecated and the implementation
is provided only for backward compatibility with pre-existing
features and schema.

Tested manually after adding the following to feature_service:
```
    gms::feature workload_prioritization { *this, "WORKLOAD_PRIORITIZATION"sv };
```

Launched a single-node cluster running 2023.1.10
```
cqlsh> create KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> create TABLE ks.test ( pk int PRIMARY KEY, val int ) WITH compaction = {'class': 'InMemoryCompactionStrategy'};
```

log:
```
Scylla version 2023.1.10-0.20241227.21cffccc1ccd with build-id bd65b8399cb13b713a87e57fe333cfcabfd50be7 starting ...
...
INFO  2024-12-27 19:45:16,563 [shard 0] migration_manager - Create new ColumnFamily: org.apache.cassandra.config.CFMetaData@0x600000f1b400[cfId=5529c630-c47a-11ef-bd1d-4295734ce5a8,ksName=ks,cfName=test,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type),comment=,readRepairChance=0,dcLocalReadRepairChance=0,tombstoneGcOptions={"mode":"timeout","propagation_delay_in_seconds":"3600"},gcGraceSeconds=864000,keyValidator=org.apache.cassandra.db.marshal.Int32Type,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata=[ColumnDefinition{name=pk, type=org.apache.cassandra.db.marshal.Int32Type, kind=PARTITION_KEY, componentIndex=0, droppedAt=-9223372036854775808}, ColumnDefinition{name=val, type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, componentIndex=null, droppedAt=-9223372036854775808}],compactionStrategyClass=class org.apache.cassandra.db.compaction.InMemoryCompactionStrategy,compactionStrategyOptions={enabled=true},compressionParameters={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=0.01,memtableFlushPeriod=0,caching={"keys":"ALL","rows_per_partition":"ALL"},cdc={},defaultTimeToLive=0,minIndexInterval=128,maxIndexInterval=2048,speculativeRetry=99.0PERCENTILE,triggers=[],isDense=false,in_memory=false,version=5529c631-c47a-11ef-bd1d-4295734ce5a8,droppedColumns={},collections={},indices={}]
INFO  2024-12-27 19:45:16,564 [shard 0] schema_tables - Creating ks.test id=5529c630-c47a-11ef-bd1d-4295734ce5a8 version=ec88d510-6aff-344a-914d-541d37081440
```

Upgraded to this branch and started scylla.
Verified that ks.test was successfuly loaded:

log:
```
INFO  2024-12-27 19:48:58,115 [shard 0:main] init - Scylla version 6.3.0~dev-0.20241227.a64c6dfc153e with build-id f9496134a09cf2e55d3865b9e9ff499f672aa7da starting ...
...
WARN  2024-12-27 19:53:02,948 [shard 1:main] CompactionStrategy - InMemoryCompactionStrategy is no longer supported. Defaulting to NullCompactionStrategy.
...
INFO  2024-12-27 19:53:02,948 [shard 0:main] database - Keyspace ks: Reading CF test id=5529c630-c47a-11ef-bd1d-4295734ce5a8 version=ec88d510-6aff-344a-914d-541d37081440 storage=/home/bhalevy/scylladb/data/ks/test-5529c630c47a11efbd1d4295734ce5a8
```

Then, tested:
```
cqlsh> describe KEYSPACE ks;

CREATE KEYSPACE ks WITH replication = {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true AND tablets = {'enabled': false};

CREATE TABLE ks.test (
    pk int,
    val int,
    PRIMARY KEY (pk)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'InMemoryCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND speculative_retry = '99.0PERCENTILE';

cqlsh> alter TABLE ks.test with compaction = {'class': 'SizeTieredCompactionStrategy'};
cqlsh> describe KEYSPACE ks;

CREATE KEYSPACE ks WITH replication = {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true AND tablets = {'enabled': false};

CREATE TABLE ks.test (
    pk int,
    val int,
    PRIMARY KEY (pk)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND speculative_retry = '99.0PERCENTILE'
    AND tombstone_gc = {'mode': 'timeout', 'propagation_delay_in_seconds': '3600'};
```

log:
```
INFO  2024-12-27 19:56:40,465 [shard 0:stmt] migration_manager - Update table 'ks.test' From org.apache.cassandra.config.CFMetaData@0x60000362d800[cfId=5529c630-c47a-11ef-bd1d-4295734ce5a8,ksName==ks,cfName=test,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type),comment=,tombstoneGcOptions={"mode":"timeout","propagation_delay_in_seconds":"3600"},gcGraceSeconds=864000,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata=[ColumnDefinition{name=pk, type=org.apache.cassandra.db.marshal.Int32Type, kind=PARTITION_KEY, componentIndex=0, droppedAt=-9223372036854775808}, ColumnDefinition{name=val, type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, componentIndex=null, droppedAt=-9223372036854775808}],compactionStrategyClass=class org.apache.cassandra.db.compaction.InMemoryCompactionStrategy,compactionStrategyOptions={enabled=true},compressionParameters={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=0.01,memtableFlushPeriod=0,caching={"keys":"ALL","rows_per_partition":"ALL"},cdc={},defaultTimeToLive=0,minIndexInterval=128,maxIndexInterval=2048,speculativeRetry=99.0PERCENTILE,triggers=[],isDense=false,version=ec88d510-6aff-344a-914d-541d37081440,droppedColumns={},collections={},indices={}] To org.apache.cassandra.config.CFMetaData@0x60000336e000[cfId=5529c630-c47a-11ef-bd1d-4295734ce5a8,ksName==ks,cfName=test,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type),comment=,tombstoneGcOptions={"mode":"timeout","propagation_delay_in_seconds":"3600"},gcGraceSeconds=864000,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata=[ColumnDefinition{name=pk, type=org.apache.cassandra.db.marshal.Int32Type, kind=PARTITION_KEY, componentIndex=0, droppedAt=-9223372036854775808}, ColumnDefinition{name=val, type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, componentIndex=null, droppedAt=-9223372036854775808}],compactionStrategyClass=class org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,compactionStrategyOptions={enabled=true},compressionParameters={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=0.01,memtableFlushPeriod=0,caching={"keys":"ALL","rows_per_partition":"ALL"},cdc={},defaultTimeToLive=0,minIndexInterval=128,maxIndexInterval=2048,speculativeRetry=99.0PERCENTILE,triggers=[],isDense=false,version=ecccf010-c47b-11ef-b52c-622f2f0e87c4,droppedColumns={},collections={},indices={}]
INFO  2024-12-27 19:56:40,466 [shard 0: gms] schema_tables - Altering ks.test id=5529c630-c47a-11ef-bd1d-4295734ce5a8 version=ecccf010-c47b-11ef-b52c-622f2f0e87c4
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#22068
2025-01-20 16:55:17 +02:00
Gleb Natapov
36ccc897e8 gossiper: change get_live_members and all its users to work on host ids 2025-01-16 16:37:06 +02:00
Gleb Natapov
7a3237c687 messaging_service: drop get_raw_version and knows_version
The are unused. The version is always fixed.
2025-01-16 16:37:06 +02:00
Calle Wilund
511326882a schema/migration_manager: Add schema validate
Validates schema before announce. To ensure all extensions
are happy.
2025-01-08 12:50:03 +00:00
Wojciech Mitros
3094ff7cbe schema_registry: avoid setting base info when getting the schema from registry
After the previous patches, the view schemas returned by schema registry
always have their base info set. As such, we no longer need to set it after
getting the view schema from the registry. This patch removes these
unnecessary updates.
2024-12-30 14:56:18 +01:00
Wojciech Mitros
dfe3810f64 schema_registry: cache base schemas for views
Currently, when we load a frozen schema into the registry, we lose
the base info if the schema was of a view. Because of that, in various
places we need to set the base info again, and in some codepaths we
may miss it completely, which may make us unable to process some
requests (for example, when executing reverse queries on views).
Even after setting the base info, we may still lose it if the schema
entry gets deactivated.

To fix this, this patch adds the base schema to the registry, alongside
the view schema. With the base schema, we can now set the base
info when returning the schema from the registry. As a result, we can now
assume that all view schemas returned by the registry have base_info set.

To store the base schema, the loader methods now have to return the base
schema alongside the view schema. At the same time, when loading into
the registry, we need to check whether we're loading a view schema, and if
so, we need to also provide the base schema. When inserting a regular table
schema, the base schema should be a disengaged optional.
2024-12-30 14:56:17 +01:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Gleb Natapov
e479ba88af repair: move repair code to use host ids instead of host ips
The patch is rather large, but it is a straightforward conversion from
one type to another.
2024-12-15 11:31:11 +02:00
Gleb Natapov
cd9b349886 migration_manager: move to use host ids instead of ips
Users also amended to pass ids instead of ips.
2024-12-02 10:31:12 +02:00
Kefu Chai
5e391eee25 treewide: use coroutine::parallel_for_each(range) when appropriate
`coroutine::parallel_for_each` accepts both a range and a pair of
iterators. let's use the former when appropriate. it is simpler this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21684
2024-11-27 21:00:47 +02:00
Kamil Braun
9dc8926252 Merge 'a bunch of cleanups and enhancements to various services' from Gleb
Mostly no functional changes here except in patch 3.

* 'gleb/cleanups' of github.com:scylladb/scylla-dev:
  migration_manager: move migration manager verbs to the IDL
  storage_proxy: remove unused function
  storage_proxy: co-routinize handle_paxos_prepare
  storage_proxy: co-routinise handle_paxos_prune
  service: raft: no need to sync schema if the cluster is in raft topology mode
  messaging_service: co-routinize messaging_service::stop_client
  gossiper: rename apply_state_locally_without_listener_notification to apply_state_locally_in_shadow_round
2024-11-26 14:18:00 +01:00
Kefu Chai
a5ee0c896b treewide: migrate from boost::adaptors::filtered to std::views::filter
Modernize the codebase by replacing Boost range adaptors with C++23 standard library views,
reducing external dependencies and leveraging modern C++ language features.

Key Changes:
- Replace `boost::adaptors::filtered` with `std::views::filter`
- Remove `#include <boost/range/adaptor/filtered.hpp>`
- Utilize standard library range views

Motivation:
- Reduce project's external dependency footprint
- Leverage standard library's range and view capabilities
- Improve long-term code maintainability
- Align with modern C++ best practices

Implementation Challenges and Considerations:
1. Range Conversion and Move Semantics
   - `std::ranges::to` adaptor requires rvalue references
   - Necessitated updates to variable and parameter constness
   - Example: `cql3/restrictions/statement_restrictions.cc` modified to remove `const`
     from `common` to enable efficient range conversion

2. Range Iteration and Mutation
   - Range views may mutate internal state during iteration
   - Cannot pass ranges by const reference in some scenarios
   - Solution: Pass ranges by rvalue reference to explicitly indicate
     state invalidation

Limitations:
- One instance of `boost::adaptors::filtered` temporarily preserved
  due to lack of a C++23 alternative for `boost::join()`
- A comprehensive replacement will be addressed in a follow-up change

This change is part of our ongoing effort to modernize the codebase,
reducing external dependencies and adopting modern C++ practices.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21648
2024-11-26 14:26:50 +02:00
Gleb Natapov
a1de06d90f migration_manager: move migration manager verbs to the IDL 2024-11-24 11:02:03 +02:00
Benny Halevy
4b21cca443 treewide: always allow tablets keyspaces
With the tablets feature always enabled (Unless gossip toopology
changes are forced), the enable_tablets option now controls only
the default for newly created keyspaces.

Even when set to `false`, tablets are still enabled as a
feature and the user may explicitly enable tablets
using `CREATE KEYSPACE <name> WITH tablets = {'enabled': true}`

Note: best viewed with `git show -w`

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-11-07 13:57:39 +02:00
Avi Kivity
73b1f66b70 Revert "Merge 'Allow explicitly enabling or disabling tablets when creating a new keyspace' from Benny Halevy"
This reverts commit c286434e4c, reversing
changes made to 6712fcc316.

The commit causes memtable_test to be very flaky in debug mode.
Specifically, subtests test_exceptions_in_flush_on_sstable_open
and test_exceptions_in_flush_on_sstable_write).
2024-10-30 00:55:29 +02:00
Benny Halevy
b0e12cb40d treewide: always allow tablets keyspaces
With the tablets feature always enabled (Unless gossip toopology
changes are forced), the enable_tablets option now controls only
the default for newly created keyspaces.

Even when set to `false`, tablets are still enabled as a
feature and the user may explicitly enable tablets
using `CREATE KEYSPACE <name> WITH tablets = {'enabled': true}`

Note: best viewed with `git show -w`

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-10-24 10:18:42 +03:00
Marcin Maliszkiewicz
9792d720c9 db: move schema merging code into a separate unit
It's mostly self containted and it's easier to
maintain reasonably sized files. Also splitting
better shows boundaries between schema and
schema merging code.
2024-09-23 12:01:36 +02:00
Kefu Chai
3e84d43f93 treewide: use seastar::format() or fmt::format() explicitly
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.

that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:

```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
  265 |     return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
      |            ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
 4290 |     format(format_string<_Args...> __fmt, _Args&&... __args)
      |     ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
  143 | format(fmt::format_string<A...> fmt, A&&... a) {
      | ^
```

in this change, we

change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
  `seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
  because, `sstring::operator std::basic_string` would incur a deep
  copy.

we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-11 23:21:40 +03:00
Patryk Jędrzejczak
ed55261650 treewide: distinguish all nodes from all token owners
In one of the following patches, we introduce support for zero-token
nodes. From that point, getting all nodes and getting all token
owners isn't equivalent. In this patch, we ensure that we consider
only token owners when we want to consider only token owners (for
example, in the replication logic), and we consider all nodes when
we want to consider all nodes (for example, in the topology logic).

The main purpose of this patch is to make the PR introducing
zero-token nodes easier to review. The patch that introduces
zero-token nodes is already complicated. We don't want trivial
changes from this patch to make noise there.

This patch introduces changes needed for zero-token nodes only in the
Raft-based topology and in the recovery mode. Zero-token nodes are
unsupported in the gossip-based topology outside recovery.

Some functions added to `token_metadata` and `topology` are
inefficient because they compute a new data structure in every call.
They are never called in the hot path, so it's not a serious problem.
Nevertheless, we should improve it somehow. Note that it's not
obvious how to do it because we don't want to make `token_metadata`
store topology-related data. Similarly, we don't want to make
`topology` store token-related data. We can think of an improvement
in a follow-up.

We don't remove unused `topology::get_datacenter_rack_nodes` and
`topology::get_datacenter_nodes`. These function can be useful in the
future. Also, `topology::_dc_nodes` is used internally in `topology`.
2024-08-29 10:37:07 +02:00
Botond Dénes
2cec0d8dd1 service/migration_listener: update_tablet_metadata(): add hint parameter
The hint contains information related to what exactly changed, allowing
listeners to do partial updates, instead of reloading all metadata on
each notification.
2024-08-11 09:53:19 -04:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Emil Maskovsky
2dbe9ef2f2 raft: use the abort source reference in raft group0 client interface
Most callers of the raft group0 client interface are passing a real
source instance, so we can use the abort source reference in the client
interface. This change makes the code simpler and more consistent.
2024-07-31 09:18:54 +02:00
Benny Halevy
5f6c411656 migration_manager: remove dead code
Well, even after 10 years, the c++ compilers still
do not compile Java...

And having that legacy code laying around
not only it doesn't help anyone understand what's
going on, but on the contrary, it's confusing and distracting.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-27 20:30:33 +03:00
Kefu Chai
ad649be1bf treewide: drop thrift support
thrift support was deprecated since ScyllaDB 5.2

> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.

so let's drop it. in this change,

* thrift protocol support is dropped
* all references to thrift support in document are dropped
* the "thrift_version" column in system.local table is
  preserved for backward compatibility, as we could load
  from an existing system.local table which still contains
  this clolumn, so we need to write this column as well.
* "/storage_service/rpc_server" is only preserved for
  backward compatibility with java-based nodetool.
* `rpc_port` and `start_rpc` options are preserved, but
  they are marked as "Unused". so that the new release
  of scylladb can consume existing scylla.yaml configurations
  which might contain these settings. by making them
  deprecated, user will be able get warned, and update
  their configurations before we actually remove them
  in the next major release.

Fixes #3811
Fixes #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 06:44:59 +08:00
Pavel Emelyanov
83d491af02 config: Remove experimental TABLETS feature
... and replace it with boolean enable_tablets option. All the places
in the code are patched to check the latter option instead of the former
feature.

The option is OFF by default, but the default scylla.yaml file sets this
to true, so that newly installed clusters turn tablets ON.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18898
2024-05-30 18:03:51 +03:00
Piotr Smaron
7081215552 Parameterize migration_manager::announce by type to allow executing different raft commands
Since ALTER KS requires creating topology_change raft command, some
functions need to be extended to handle it. RAFT commands are recognized
by types, so some functions are just going to be parameterized by type,
i.e. made into templates.
These templates are instantiated already, so that only 1 instances of
each template exists across the whole code base, to avoid compiling it
in each translation unit.
2024-05-28 13:55:11 +02:00
Kefu Chai
125464f2d9 migration_manager: do not reference moved-away smart pointer
this change is inspired by clang-tidy. it warns like:
```
[752/852] Building CXX object service/CMakeFiles/service.dir/migration_manager.cc.o
Warning: /home/runner/work/scylladb/scylladb/service/migration_manager.cc:891:71: warning: 'view' used after it was moved [bugprone-use-after-move]
  891 |             db.get_notifier().before_create_column_family(*keyspace, *view, mutations, ts);
      |                                                                       ^
/home/runner/work/scylladb/scylladb/service/migration_manager.cc:886:86: note: move occurred here
  886 |             auto mutations = db::schema_tables::make_create_view_mutations(keyspace, std::move(view), ts);
      |                                                                                      ^
```
in which,  `view` is an instance of view_ptr which is a type with the
semantics of shared pointer, it's backed by a member variable of
`seastar::lw_shared_ptr<const schema>`, whose move-ctor actually resets
the original instance. so we are actually accessing the moved-away
pointer in

```c++
db.get_notifier().before_create_column_family(*keyspace, *view, mutations, ts)
```

so, in this change, instead of moving away from `view`, we create
a copy, and pass the copy to
`db::schema_tables::make_create_view_mutations()`. this should be fine,
as the behavior of `db::schema_tables::make_create_view_mutations()`
does not rely on if the `view` passed to it is a moved away from it or not.

the change which introduced this use-after-move was 88a5ddabce

Refs 88a5ddabce
Fixes #18837
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18838
2024-05-26 12:04:00 +03:00
Avi Kivity
3bead8cea0 feature: grandfather PER_TABLE_PARTITIONERS
The PER_TABLE_PARTITIONERS feature was added in 90df9a44ce (2020; 4.0)
and can now be assumed to be always present. We also remove the associated
schema_feature.
2024-05-18 00:15:07 +03:00
Avi Kivity
c7d7ca2c23 feature: grandfather CDC
The CDC feature was made non-experimental in e9072542c1 (2020; 4.4)
and can now be assumed to be always present. We also remove the corresponding
schema_feature.
2024-05-17 20:41:20 +03:00
Avi Kivity
82ad2913ca feature: grandfather DIGEST_INSENSITIVE_TO_EXPIRY
The DIGEST_INSENSITIVE_TO_EXPIRY feature was added in 9de071d214 (2019; 3.2)
and can now be assumed to be always present. We enable the corresponding
schema_feature unconditionally.

We do not remove the corresponding schema feature, because it can be disabled
when the related TABLE_DIGEST_INSENSITIVE_TO_EXPIRY is present.
2024-05-17 20:41:19 +03:00
Avi Kivity
b5f6021a6b feature: grandfather VIEW_VIRTUAL_COLUMNS
The VIEW_VIRTUAL_COLUMNS feature was added in a108df09f9 (2019; 3.1)
and can now be assumed to be always present.

The corresponding schema_feature is removed. Note schema_features are not sent
over the wire. A digest calculation without VIEW_VIRTUAL_COLUMNS is no longer tested.
2024-05-17 20:41:19 +03:00
Tomasz Grabiec
c6c8347493 migration_manager: Pull all of group0 state on repair
Current code uses non-raft path to pull the schema, which violates
group0 linearizability because the node will have latest schema but
miss group0 updates of other system tables. In particular,
system.tablets. This manifests as repair errors due to missing
tablet_map for a given table when trying to access it. Tablet map is
always created together with the table in the same group0 command.

When a node is bootstrapping, repair calls sync_schema() to make
sure local schema is up to date. This races with group0 catch up,
and if sync_schema() wins, repair may fail on misssing tablet map.

Fix by making sync_schema() do a group0 read barrier when in raft
mode.

Fixes #18002

Closes scylladb/scylladb#18175
2024-04-17 16:21:05 +02:00
Kefu Chai
1b859e484f treewide: use fmt::to_string() to transform a UUID to std::string
without `FMT_DEPRECATED_OSTREAM` macro, `UUID::to_sstring()` is
implemented using its `fmt::formatter`, which is not available
at the end of this header file where `UUID` is defined. at this moment,
we still use `FMT_DEPRECATED_OSTREAM` and {fmt} v9, so we can
still use `UUID::to_sstring()`, but in {fmt} v10, we cannot.

so, in this change, we change all callers of `UUID::to_sstring()`
to `fmt::to_string()`, so that we don't depend on
`FMT_DEPRECATED_OSTREAM` and {fmt} v9 anymore.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-26 13:38:37 +08:00
Petr Gusev
ae0ec19537 migration_manager: use raft_timeout{}
Checking all the call sites of the migration manager shows
that all of them are initiated by user requests,
not background activities. Therefore, we add a global
raft_timeout{} here.
2024-03-21 16:35:48 +04:00
Benny Halevy
358e92e645 migration_manager: notify before_drop_column_family before dropping views
Call the before_drop_column_family notifications
before dropping the views to allow the tablet_allocator
to delete the view's tablets.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 20:14:56 +02:00
Gleb Natapov
77907b97f1 migration_manager: fix indentation after the previous patch. 2024-02-29 12:39:48 +02:00
Gleb Natapov
4a3c79625f messaging_service: process migration_request rpc on shard 0
Commit 0c376043eb added access to group0
semaphore which can be done on shard0 only. Unlike all other group0 rpcs
(that already always forwarded to shard0) migration_request does not
since it is an rpc that what reused from non raft days. The patch adds
the missing jump to shard0 before executing the rpc.
2024-02-29 12:39:48 +02:00
Gleb Natapov
0c376043eb migration_manager: take group0 lock during raft snapshot taking
Group0 state machine access atomicity is guaranteed by a mutex in group0
client. A code that reads or writes the state needs to hold the log. To
transfer schema part of the snapshot we used existing "migration
request" verb which did not follow the rule. Fix the code to take group0
lock before accessing schema in case the verb is called as part of
group0 snapshot transfer.

Fixes scylladb/scylladb#16821
2024-02-27 11:15:17 +01:00
Kefu Chai
7e84e03f52 gms: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

because the removal of `#include "unimplemented.hh"`,
`service/migration_manager.cc` misses the definition of
`unimplemented::cause::VALIDATION`, so include the header where it is
used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16654
2024-01-05 13:37:08 +02:00
Benny Halevy
ad8a9104d8 endpoint_state subscriptions: batch on_change notification
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.

In particular, thise allows storage_service::on_change
to update_peer_info once for all changed states.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Nadav Har'El
8181e28731 secondary index: fix view creation when using tablets
In commit 88a5ddabce, we fixed materialized
view creation to support tablets. We added to the function called to
create materialized views in CQL, prepare_new_view_announcement()
a missing call to the on_before_create_column_family() notifier that
creates tablets for this new view.

Unfortunately, We have the same problem when creating a secondary index,
because it does not use prepare_new_view_announcement(), and instead uses
a generic function to "update" the base table, which in some cases ends
up creating new views when a new index is requested. In this path, the
notifier did not get called to the notifier, so we must add it here too.
Unfortunately, the notifiers must run in a Seastar thread, which means
that yet another function now needs to run in a Seastar thread.

Before this patch, creating a secondary index in a table using tablets
fails with "Tablet map not found for table <uuid>". With this patch,
it works.

The patch also includes tests for creating a regular and local secondary
index. Both tests fail (with the aforementioned error) before this
patch, and pass with it.

Fixes #16396

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Kamil Braun
6a4106edf3 migration_manager: don't attach empty system.scylla_local mutation in migration request handler
In effb9fb3cb migration request handler
(called when a node requests schema pull) was extended with a
`system.scylla_local` mutation:
```
        cm.emplace_back(co_await self._sys_ks.local().get_group0_schema_version());
```

This mutation is empty if the GROUP0_SCHEMA_VERSIONING feature is
disabled.

Nevertheless, it turned out to cause problems during upgrades.
The following scenario shows the problem:

We upgrade from 5.2 to enterprise version with the aforementioned patch.
In 5.2, `system.scylla_local` does not use schema commitlog.
After the first node upgrades to the enterprise version, it immediately
on boot creates a new enterprise-only table
(`system_replicated_keys.encrypted_keys`) -- the specific table is not
important, only the fact that a schema change is performed.
This happens before the restarting node notices other nodes being UP, so
the schema change is not immediately pushed to the other nodes.
Instead, soon after boot, the other non-upgraded nodes pull the schema
from the upgraded node.
The upgraded node attaches a `system.scylla_local` mutation to the
vector of returned mutations.
The non-upgraded nodes try to apply this vector of mutations. Because
some of these mutations are for tables that already use schema
commitlog, while the `system.scylla_local` table does not use schema
commitlog, this triggers the following error (even though the mutation
is empty):
```
    Cannot apply atomically across commitlog domains: system.scylla_local, system_schema.keyspaces
```

Fortunately, the fix is simple -- instead of attaching an empty
mutation, do not attach a mutation at all if the handler of migration
request notices that group0_schema_version is not present.

Note that group0_schema_version is only present if the
GROUP0_SCHEMA_VERSIONING feature is enabled, which happens only after
the whole upgrade finishes.

Refs: scylladb/scylladb#16414

Not using "Fixes" because the issue will only be fixed once this PR is
merged to `master` and the commit is cherry-picked onto next-enterprise.

Closes scylladb/scylladb#16416
2023-12-14 22:58:13 +01:00
Kamil Braun
26cbd28883 Merge 'token_metadata: switch to host_id' from Petr Gusev
In this PR we refactor `token_metadata` to use `locator::host_id` instead of `gms::inet_address` for node identification in its internal data structures. Main motivation for these changes is to make raft state machine deterministic. The use of IPs is a problem since they are distributed through gossiper and can't be used reliably. One specific scenario is outlined [in this comment](https://github.com/scylladb/scylladb/pull/13655#issuecomment-1521389804) - `storage_service::topology_state_load` can't resolve host_id to IP when we are applying old raft log entries, containing host_id-s of the long-gone nodes.

The refactoring is structured as follows:
  * Turn `token_metadata` into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a `get_new()` method, which can be used to access the new version.
  * Go over all places which write to the old version and make the corresponding writes to the new version through `get_new()`. When this stage is finished we can use any version of the `token_metadata` for reading.
  * Go over all the places which read `token_metadata` and switch them to the new version.
  * Make `host_id`-based `token_metadata` default, drop `inet_address`-based version, change `token_metadata` back to non-template.

These series [depends](1745a1551a) on RPC sender `host_id` being present in RPC `clent_info` for `bootstrap` and `replace` node_ops commands. This feature was added in [this commit](95c726a8df) and released in `5.4`. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially first to `5.4` (or the corresponding Enterprise version) then to the version with these changes (`5.5` or `6.0`) should be fine. If for some reason they upgrade from a version without `host_id` in RPC `clent_info` to the version with these changes and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with an error `Coordinator host_id not found` if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can finish the upgrade first to version 5.4 or later, or start bootstrap/replace with an upgraded node. Note that removenode and decommission do not depend on coordinator host_id so they can be started in the middle of upgrade from any node.

Closes scylladb/scylladb#15903

* github.com:scylladb/scylladb:
  topology: remove_endpoint: remove inet_address overload
  token_metadata: topology: cleanup add_or_update_endpoint
  token_metadata: add_replacing_endpoint: forbid replacing node with itself
  topology: drop key_kind, host_id is now the primary key
  dc_rack_fn: make it non-template
  token_metadata: drop the template
  shared_token_metadata: switch to the new token_metadata
  gossiper: use new token_metadata
  database: get_token_metadata -> new token_metadata
  erm: switch to the new token_metadata
  storage_service: get_token_metadata -> token_metadata2
  storage_service: get_token_to_endpoint_map: use new token_metadata
  api/token_metadata: switch to new version
  storage_service::on_change: switch to new token_metadata
  cdc: switch to token_metadata2
  calculate_natural_endpoints: fix indentation
  calculate_natural_endpoints: switch to token_metadata2
  storage_service: get_changed_ranges_for_leaving: use new token_metadata
  decommission_with_repair, removenode_with_repair -> new token_metadata
  rebuild_with_repair, replace_with_repair: use new token_metadata
  bootstrap: use new token_metadata
  tablets: switch to token_metadata2
  calculate_effective_replication_map: use new token_metadata
  calculate_natural_endpoints: fix formatting
  abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
  network_topology_strategy_test: update new token_metadata
  storage_service: on_alive: update new token_metadata
  storage_service: handle_state_bootstrap: update new token_metadata
  storage_service: snitch_reconfigured: update new token_metadata
  storage_service: leave_ring: update new token_metadata
  storage_service: node_ops_cmd_handler: update new token_metadata
  storage_service: node_ops_cmd_handler: add coordinator_host_id
  storage_service: bootstrap: update new token_metadata
  storage_service: join_token_ring: update new token_metadata
  storage_service: excise: update new token_metadata
  storage_service: join_cluster: update new token_metadata
  storage_service: on_remove: update new token_metadata
  storage_service: handle_state_normal: fill new token_metadata
  storage_service: topology_state_load: fill new token_metadata
  storage_service: adjust update_topology_change_info to update new token_metadata
  topology: set self host_id on the new topology
  locator::topology: allow being_replaced and replacing nodes to have the same IP
  token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
  token_metadata: get_host_id: exception -> on_internal_error
  token_metadata: add get_all_ips method
  token_metadata: support host_id-based version
  token_metadata: make it a template with NodeId=inet_address/host_id NodeId is used in all internal token_metadata data structures, that previously used inet_address. We choose topology::key_kind based on the value of the template parameter.
  locator: make dc_rack_fn a template
  locator/topology: add key_kind parameter
  token_metadata: topology_change_info: change field types to token_metadata_ptr
  token_metadata: drop unused method get_endpoint_to_token_map_for_reading
2023-12-13 16:35:52 +01:00
Petr Gusev
799f747c8f shared_token_metadata: switch to the new token_metadata 2023-12-12 23:19:54 +04:00
Tomasz Grabiec
effb9fb3cb Merge 'Don't calculate hashes for schema versions in Raft mode' from Kamil Braun
When performing a schema change through group 0, extend the schema mutations with a version that's persisted and then used by the nodes in the cluster in place of the old schema digest, which becomes horribly slow as we perform more and more schema changes (#7620).

If the change is a table create or alter, also extend the mutations with a version for this table to be used for `schema::version()`s instead of having each node calculate a hash which is susceptible to bugs (#13957).

When performing a schema change in Raft RECOVERY mode we also extend schema mutations which forces nodes to revert to the old way of calculating schema versions when necessary.

We can only introduce these extensions if all of the cluster understands them, so protect this code by a new cluster/schema feature, `GROUP0_SCHEMA_VERSIONING`.

Fixes: #7620
Fixes: #13957

---

This is a reincarnation of PR scylladb/scylladb#15331. The previous PR was reverted due to a bug it unmasked; the bug has now been fixed (scylladb/scylladb#16139). Some refactors from the previous PR were already merged separately, so this one is a bit smaller.

I have checked with @Lorak-mmk's reproducer (https://github.com/Lorak-mmk/udt_schema_change_reproducer -- many thanks for it!) that the originally exposed bug is no longer reproducing on this PR, and that it can still be reproduced if I revert the aforementioned fix on top of this PR.

Closes scylladb/scylladb#16242

* github.com:scylladb/scylladb:
  docs: describe group 0 schema versioning in raft docs
  test: add test for group 0 schema versioning
  feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode
  schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0
  migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations
  schema_tables: use schema version from group 0 if present
  migration_manager: store `group0_schema_version` in `scylla_local` during schema changes
  system_keyspace: make `get/set_scylla_local_param` public
  feature_service: add `GROUP0_SCHEMA_VERSIONING` feature
2023-12-11 12:17:57 +01:00
Kamil Braun
defcf9915c migration_manager: add committed_by_group0 flag to system.scylla_tables mutations
As described in #13957, when creating or altering a table in group 0
mode, we don't want each node to calculate `schema::version()`s
independently using a hash algorithm. Instead, we want to all nodes to
use a single version for that table, commited by the group 0 command.

There's even a column ready for this in `system.scylla_tables` --
`version`. This column is currently being set for system tables, but
it's not being used for user tables.

Similarly to what we did with global schema version in earlier commits,
the obvious thing to do would be to include a live cell for the `version`
column in the `system.scylla_tables` mutation when we perform the schema
change in Raft mode, and to include a tombstone when performing it
outside of Raft mode, for the RECOVERY case.

But it's not that simple because as it turns out, we're *already*
sending a `version` live cell (and also a tombstone, with timestamp
decremented by 1) in all `system.scylla_tables` mutations. But then we
delete that cell when doing schema merge (which begs the question
why were we sending it in the first place? but I digress):
```
        // We must force recalculation of schema version after the merge, since the resulting
        // schema may be a mix of the old and new schemas.
        delete_schema_version(mutation);
```
the above function removes the `version` cell from the mutation.

So we need another way of distinguishing the cases of schema change
originating from group 0 vs outside group 0 (e.g. RECOVERY).

The method I chose is to extend `system.scylla_tables` with a boolean
column, `committed_by_group0`, and extend schema mutations to set
this column.

In the next commit we'll decide whether or not the `version` cell should
be deleted based on the value of this new column.
2023-12-08 17:46:31 +01:00