This patch changes the syntax of enabling tablets from
CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }
to be
CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }
and updates all tests accordingly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We are going to remove the IP waiting loop from topology_state_load
in subsequent commits. An IP for a given host_id may change
after this function has been called by raft. This means we need
to subscribe to the gossiper notifications and call it later
with a new id<->ip mapping.
In this preparatory commit we move the existing address_map
update logic into storage_service so that in later commits
we can enhance it with topology_state_load call.
Store schema_ptr in reader permit instead of storing a const pointer to
schema to ensure that the schema doesn't get changed elsewhere when the
permit is holding on to it. Also update the constructors and all the
relevant callers to pass down schema_ptr instead of a raw pointer.
Fixes#16180
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#16658
Tablets metadata is quite expensive to generate (each data_value is
an allocation), so an old driver (without support for tablets) will
generate huge amounts of such notifications. This commit adds a way
to negotiate generation of the notification: a new driver will ask
for them, and an old driver won't get them. It uses the
OPTIONS/SUPPORTED/STARTUP protocol described in native_protocol_v4.spec.
Closesscylladb/scylladb#16611
It's only testing code that wants to call new_keyspace with existing
schemas, all the other callers either construct the ks metadata
directly, or use convenience new_keyspace with explicitly empty schemas.
By and large it's nicer if new_keyspace() doesn't requires this
argument.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The object in question fully describes the keyspace to be created and,
among other things, contains replication strategy options. Next patches
move the "initial_tablets" option out of those options and keep it
separately, so the ks metadata should also carry this option separately.
This patch is _just_ extending the metadata creation API, in fact the
new field is unused (write-only) so all the places that need to provide
this data keep it disengaged and are explicitly marked with FIXME
comment. Next patches will fix that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
One of the unfortunate anti-features of cql_test_env (the framework used
in our CQL tests that are written in C++) is that it needs to repeat
various bizarre initializations steps done in main.cc, otherwise various
requests work incorrectly. One of these steps that main.cc is to initialize
various "schema extensions" which some of the Scylla features need to work
correctly.
We remembered to initialize some schema extensions in cql_test_env, but
forgot others. The one I will need in the following patch is the "tags"
extension, which we need to mark materialized views used by local
secondary indexes as "synchronous_updates" - without this patch the LSI
tests in secondary_index_test.cc will crash.
In addition to adding the missing extension, this patch also replaces
the segmentation-fault crash when it's missing (caused by a dynamic
cast failure) by a clearer on_internal_error() - so if we ever have
this bug again, it will be easier to debug.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
It enables interaction with the node through CQL protocol without authentication. It gives full-permission access.
The maintenance socket is available by Unix domain socket with file permissions `755`, thus it is not accessible from outside of the node and from other POSIX groups on the node.
It is created before the node joins the cluster.
To set up the maintenance socket, use the `maintenance-socket` option when starting the node.
* If set to `ignore` maintenance socket will not be created.
* If set to `workdir` maintenance socket will be created in `<node's workdir>/cql.m`.
* Otherwise maintenance socket will be created in the specified path.
The default value is `ignore`.
* With python driver
```python
from cassandra.cluster import Cluster
from cassandra.connection import UnixSocketEndPoint
from cassandra.policies import HostFilterPolicy, RoundRobinPolicy
socket = "<node's workdir>/cql.m"
cluster = Cluster([UnixSocketEndPoint(socket)],
# Driver tries to connect to other nodes in the cluster, so we need to filter them out.
load_balancing_policy=HostFilterPolicy(RoundRobinPolicy(), lambda h: h.address == socket))
session = cluster.connect()
```
Merge note: apparently cqlsh does not support unix domain sockets; it
will have to be fixed in a follow-up.
Closesscylladb/scylladb#16172
* github.com:scylladb/scylladb:
test.py: add maintenance socket test
test.py: enable maintenance socket in tests by default
docs: add maintenance socket documentation
main: add maintenance socket
main: refactor initialization of cql controller and auth service
auth/service: don't create system_auth keyspace when used by maintenance socket
cql_controller: maintenance socket: fix indentation
cql_controller: add option to start maintenance socket
db/config: add maintenance_socket_enabled bool class
auth: add maintenance_socket_role_manager
db/config: add maintenance_socket variable
The maintenance socket is created before joining the cluster. When maintenance auth service
is started it creates system_auth keyspace if it's missing. It is not synchronized
with other nodes, because this node hasn't joined the group0 yet. Thus a node has
a mismatched schema and is unable to join the cluster.
The maintenance socket doesn't use role management, thus the problem is solved
by not creating system_auth keyspace when maintenance auth service is created.
The logic of regular CQL port's auth service won't be changed. For the maintenance
socket will be created a new separate auth service.
We make `consistent_cluster_management` mandatory in 5.5. This
option will be always unused and assumed to be true.
Additionally, we make `override_decommission` deprecated, as this option
has been supported only with `consistent_cluster_management=false`.
Making `consistent_cluster_management` mandatory also simplifies
the code. Branches that execute only with
`consistent_cluster_management` disabled are removed.
We also update documentation by removing information irrelevant in 5.5.
Fixesscylladb/scylladb#15854
Note about upgrades: this PR does not introduce any more limitations
to the upgrade procedure than there are already. As in
scylladb/scylladb#16254, we can upgrade from the first version of Scylla
that supports the schema commitlog feature, i.e. from 5.1 (or
corresponding Enterprise release) or later. Assuming this PR ends up in
5.5, the documented upgrade support is from 5.4. For corresponding
Enterprise release, it's from 2023.x (based on 5.2), so all requirements
are met.
Closesscylladb/scylladb#16334
* github.com:scylladb/scylladb:
docs: update after making consistent_cluster_management mandatory
system_keyspace, main, cql_test_env: fix indendations
db: config: make consistent_cluster_management mandatory
test: boost: schema_change_test: replace disable_raft_schema_config
db: config: make override_decommission deprecated
db: config: make force_schema_commit_log deprecated
Reduce code duplication by defining each metric just once, instead of three times, by having the semaphore register metrics by itself. This also makes the lifecycle of metrics contained in that of the semaphore. This is important on enterprise where semaphores are added and removed, together with service levels.
We don't want all semaphores to export metrics, so a new parameter is introduced and all call-sites make a call whether they opt-in or not.
Fixes: https://github.com/scylladb/scylladb/issues/16402Closesscylladb/scylladb#16383
* github.com:scylladb/scylladb:
database, reader_concurrency_sempaphore: deduplicate reader_concurrency_sempaphore metrics
reader_concurrency_semaphore: add register_metrics constructor parameter
sstables: name sstables_manager
Code that executed only when consistent_cluster_management=false is
removed. In particular, after this patch:
- raft_group0 and raft_group_registry are always enabled,
- raft_group0::status_for_monitoring::disabled becomes unused,
- topology tests can only run with consistent_cluster_management.
In the following commits, we make consistent cluster management
mandatory. This will make disable_raft_schema_config unusable,
so we need to get rid of it. However, we don't want to remove
tests that use it.
The idea is to use the Raft RECOVERY mode instead of disabling
consistent cluster management directly.
To be used in the next patch to control whether the semaphore registers
and exports metrics or not. We want to move metric registration to the
semaphore but we don't want all semaphores to export metrics. The
decision on whether a semaphore should or shouldn't export metrics
should be made on a case-by-case basis so this new parameter has no
default value (except for the for_tests constructor).
Soon, the reader_concurrency_semaphore will require a unique
and meaningful name in order to label its metrics. To prepare
for that, name sstable_manager instances. This will be used
to generate a name for sstable_manager's reader_concurrency_semaphore.
Make host_id parameter non-optional and
move it to the beginning of the arguments list.
Delete unused overloads of add_or_update_endpoint.
Delete unused overload of token_metadata::update_topology
with inet_address argument.
With this commit, we begin the next stage of the
refactoring - updating the new version of the token_metadata
in all places where the old version is currently being updated.
In this commit we assign host_id of this node, both in main.cc
and in boost tests.
This is needed for rpc calls to work in the tests. With this patch, by
default, messaging_service does not listen as it was before.
This is useful for file stream for tablet test.
utils::fb_utilities is a global in-memory registry for storing and retrieving broadcast_address and broadcat_rpc_address.
As part of the effort to get rid of all global state, this series gets rid of fb_utilities.
This will eventually allow e.g. cql_test_env to instantiate multiple scylla server nodes, each serving on its own address.
Closesscylladb/scylladb#16250
* github.com:scylladb/scylladb:
treewide: get rid of now unused fb_utilities
tracing: use locator::topology rather than fb_utilities
streaming: use locator::topology rather than fb_utilities
raft: use locator::topology/messaging rather than fb_utilities
storage_service: use locator::topology rather than fb_utilities
storage_proxy: use locator::topology rather than fb_utilities
service_level_controller: use locator::topology rather than fb_utilities
misc_services: use locator::topology rather than fb_utilities
migration_manager: use messaging rather than fb_utilities
forward_service: use messaging rather than fb_utilities
messaging_service: accept broadcast_addr in config rather than via fb_utilities
messaging_service: move listen_address and port getters inline
test: manual: modernize message test
table: use gossiper rather than fb_utilities
repair: use locator::topology rather than fb_utilities
dht/range_streamer: use locator::topology rather than fb_utilities
db/view: use locator::topology rather than fb_utilities
database: use locator::topology rather than fb_utilities
db/system_keyspace: use topology via db rather than fb_utilities
db/system_keyspace: save_local_info: get broadcast addresses from caller
db/hints/manager: use locator::topology rather than fb_utilities
db/consistency_level: use locator::topology rather than fb_utilities
api: use locator::topology rather than fb_utilities
alternator: ttl: use locator::topology rather than fb_utilities
gossiper: use locator::topology rather than fb_utilities
gossiper: add get_this_endpoint_state_ptr
test: lib: cql_test_env: pass broadcast_address in cql_test_config
init: get_seeds_from_db_config: accept broadcast_address
locator: replication strategies: use locator::topology rather than fb_utilities
locator: topology: add helpers to retrieve this host_id and address
snitch: pass broadcast_address in snitch_config
snitch: add optional get_broadcast_address method
locator: ec2_multi_region_snitch: keep local public address as member
ec2_multi_region_snitch: reindent load_config
ec2_multi_region_snitch: coroutinize load_config
ec2_snitch: reindent load_config
ec2_snitch: coroutinize load_config
thrift: thrift_validation: use std::numeric_limits rather than fb_utilities
Storage service uses group0 internally, but group0 is create long after
storage service is initialized and passed to it using ss::set_group0()
function. What it means is that during shutdown group0 is destroyed
before ss::stop() is called and thus storage service is left with a
dangling reference. Fix it by introducing a function that cancels all
group0 operations and waits for background fibers to complete. For that
we need separate abort source for group0 operation which the patch
series also introduces.
* 'gleb/group0-ss-shutdown' of github.com:scylladb/scylla-dev:
storage_service: topology coordinator: ignore abort_requested_exception in background fibers
storage_service: fix de-initialization order between storage service and group0_service
For getting rid of fb_utilities.
In the future, that could be used to instantiate
multiple scylla node instances.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Using consistent cluster management and not using schema commitlog
ends with a bad configuration throw during bootstrap. Soon, we
will make consistent cluster management mandatory. This forces us
to also make schema commitlog mandatory, which we do in this patch.
A booting node decides to use schema commitlog if at least one of
the two statements below is true:
- the node has `force_schema_commitlog=true` config,
- the node knows that the cluster supports the `SCHEMA_COMMITLOG`
cluster feature.
The `SCHEMA_COMMITLOG` cluster feature has been added in version
5.1. This patch is supposed to be a part of version 6.0. We don't
support a direct upgrade from 5.1 to 6.0 because it skips two
versions - 5.2 and 5.4. So, in a supported upgrade we can assume
that the version which we upgrade from has schema commitlog. This
means that we don't need to check the `SCHEMA_COMMITLOG` feature
during an upgrade.
The reasoning above also applies to Scylla Enterprise. Version
2024.2 will be based on 6.0. Probably, we will only support
an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even
if we support an upgrade from 2023.x, this patch won't break
anything because 2023.1 is based on 5.2, which has schema
commitlog. Upgrades from 2022.x definitely won't be supported.
When we populate a new cluster, we can use the
`force_schema_commitlog=true` config to use schema commitlog
unconditionally. Then, the cluster feature check is irrelevant.
This check could fail because we initiate schema commitlog before
we learn about the features. The `force_schema_commitlog=true`
config is especially useful when we want to use consistent cluster
management. Failing feature checks would lead to crashes during
initial bootstraps. Moreover, there is no point in creating a new
cluster with `consistent_cluster_management=true` and
`force_schema_commitlog=false`. It would just cause some initial
bootstraps to fail, and after successful restarts, the result would
be the same as if we used `force_schema_commitlog=true` from the
start.
In conclusion, we can unconditionally use schema commitlog without
any checks in 6.0 because we can always safely upgrade a cluster
and start a new cluster.
Apart from making schema commitlog mandatory, this patch adds two
changes that are its consequences:
- making the unneeded `force_schema_commitlog` config unused,
- deprecating the `SCHEMA_COMMITLOG` feature, which is always
assumed to be true.
Closesscylladb/scylladb#16254
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Storage service uses group0 internally, but group0 is create long after
storage service is initialized and passed to it using ss::set_group0()
function. But what it means is that during shutdown group0 is destroyed
before ss::stop() is called and thus storage service is left with a
dangling reference. Fix it by introducing a function that cancels all
group0 operations and waits for background fibers to complete. For that
we need separate abort source for group0 operation which the patch also
introduces.
Nowadays if memtable gets flushed into misconfigured S3 storage, the flush fails and aborts the whole scylla process. That's not very elegant. First, because upon restart garbage collecting non-sealed sstables would fail again. Second, because re-configuring an endpoint can be done runtime, scylla re-reads this config upon HUP signal.
Flushing memtable restarts when seeing ENOSPC/EDQUOT errors from on-disk sstables. This PR extends this to handle misconfigured S3 endpoints as well.
fixes: #13745Closesscylladb/scylladb#15635
* github.com:scylladb/scylladb:
test: Add object_store test to validate config reloading works
test: Add config update facility to test cluster
test: Make S3_Server export config file as pathlib.Path
config: Make object storage config updateable_value_source
memtable: Extend list of checking codes
sstables/storage/s3: Fix missing TOC status check
s3/client: Map http exceptions into storage_io_error
exceptions: Extend storage_io_error construction options
There is a need for sending tablet info to the drivers so they can be tablet aware. For the best performance we want to get this info lazily only when it is needed.
The info is send when driver asks about the information that the specific tablet contains and it is directed to the wrong node/shard so it could use that information for every subsequent query. If we send the query to the wrong node/shard, we want to send the RESULT message with additional information about the tablet (replicas and token range) in custom_payload.
Mechanism for sending custom_payload added.
Sending custom_payload tested using three node cluster and cqlsh queries. I used RF=1 so choosing wrong node was testable.
I also manually tested it with the python-driver and confirmed that the tablet info can be deserialized properly.
Automatic tests added.
Closesscylladb/scylladb#15410
* github.com:scylladb/scylladb:
docs: add documentation about sending tablet info to protocol extensions
Add tests for sending tablet info
cql3: send tablet if wrong node/shard is used during modification statement
cql3: send tablet if wrong node/shard is used during select statement
locator: add function to check locality
locator: add function to check if host is local
transport: add function to add tablet info to the result_message
transport: add support for setting custom payload
This PR contains two patches which get rid of unnecessary sleeps on cql_test_env teardown greatly reducing run time of tests.
Reduces run time of `build/dev/test/boost/schema_change_test` from 90s to 6s.
Closesscylladb/scylladb#16111
* github.com:scylladb/scylladb:
test: cql_test_env: Interrupt all components on cql_test_env teardown
tests: cql_test_env: Skip gossip shutdown sleep
Currently storage service starts too early and its initialization is split into several steps. This PR makes storage service start "late enough" and makes its initialization (minimally required before joining cluster) happen in on place.
refs: #2795
refs: #2737Closesscylladb/scylladb#16103
* github.com:scylladb/scylladb:
storage_service: Drop (un)init_messaging_service_part() pair
storage_service: Init/Deinit RPC handlers in constructor/stop
storage_service: Dont capture container() on RPC handler
storage_service: Use storage_service::_sys_dist_ks in some places
storage_service: Add explicit dependency on system dist. keyspace
storage_service: Rurn query processor pointer into reference
storage_service: Add explicity query_processor dependency
main: Start storage service later
Now its plain updateable_value, but without the ..._source object the
updateable_value is just a no-op value holder. In order for the
observers to operate there must be the value source, updating it would
update the attached updateable values _and_ notify the observers.
In order for the config to be the u.v._source, config entries should be
comparable to each other, thus the <=> operator for it
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This should interrupt all sleeps in component teardown.
Before this patch, there was a 1s sleep on gossiper shutdown, which I
don't know where it comes from. After the patch there is no such
sleep.
The wrapper just calls the test-only core write_memtable_to_sstable()
overload, tests can do it on their own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This effectively reverts bc051387c5 (storage_service: Remove sys_dist_ks
from storage_service dependencies) since now storage service needs the
sys. disk. ks not only cluster join time. Next patch will make more use
of it as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now set via a dedicated call that happens after query processor is
started. Now query processor is started before storage service and the
latter can get the q.p. local reference via constructor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This class only provides a .run() method which allocates a task and
calls sstables::test_env::perform_compaction(). This can be done in a
helper method, no need for the whole class for it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Take it from compaction_manager_test::run() which is simplified overwite
of the compaction_manager::perform_compaction().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method is the simplified rewrite of the compaction_manager's
perform_compaction() one, but it makes task registration and
unregistration to hard way. Keep it shorter and simpler resembling the
compaction_manager's prototype.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the one sitting in utils is only called from its peer in compaction
test. Things get simpler if they get merged.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>