Commit Graph

316 Commits

Author SHA1 Message Date
Botond Dénes
5a7af93c7c db/config: introduce reader_concurrency_semahore_cpu_concurrency
To allow increasing the semaphore's CPU concurrency, which is currently
hard-limited to 1. Not wired yet.

(cherry picked from commit c7317be09a)
2024-07-08 08:06:28 +03:00
Pavel Emelyanov
5811df4d4b config: Mark tablets feature as unused
This features used to be there for a while, but then it was removed by
83d491af02. This patch partially takes it
back, but maps to UNUSED, so that if met in config, it's warned, but
other features are parsed as well.

refs: #18968

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit b2520b8185)
2024-06-12 18:35:32 +00:00
Lakshmi Narayanan Sreethar
85805f6472 db/config.cc: increment components_memory_reclaim_threshold config default
Incremented the components_memory_reclaim_threshold config's default
value to 0.2 as the previous value was too strict and caused unnecessary
eviction in otherwise healthy clusters.

Fixes #18607

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 3d7d1fa72a)

Closes scylladb/scylladb#19014
2024-06-03 12:19:16 +03:00
Pavel Emelyanov
62a23fd86a config: Remove experimental TABLETS feature
... and replace it with boolean enable_tablets option. All the places
in the code are patched to check the latter option instead of the former
feature.

The option is OFF by default, but the default scylla.yaml file sets this
to true, so that newly installed clusters turn tablets ON.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 83d491af02)

Closes scylladb/scylladb#19012
2024-06-03 12:16:41 +03:00
Kefu Chai
617e532859 db: config: drop operator<<() for error_injection_at_startup
it is not used anymore, so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18701
2024-05-16 15:10:57 +03:00
Asias He
952dfc6157 repair: Introduce repair_partition_count_estimation_ratio config option
In commit 642f9a1966 (repair: Improve
estimated_partitions to reduce memory usage), a 10% hard coded
estimation ratio is used.

This patch introduces a new config option to specify the estimation
ratio of partitions written by repair out of the total partitions.

It is set to 0.1 by default.

Fixes #18615

Closes scylladb/scylladb#18634
2024-05-13 15:16:55 +03:00
Kamil Braun
03818c4aa9 direct_failure_detector: increase ping timeout and make it tunable
The direct failure detector design is simplistic. It sends pings
sequentially and times out listeners that reached the threshold (i.e.
didn't hear from a given endpoint for too long) in-between pings.

Given the sequential nature, the previous ping must finish so the next
ping can start. We timeout pings that take too long. The timeout was
hardcoded and set to 300ms. This is too low for wide-area setups --
latencies across the Earth can indeed go up to 300ms. 3 subsequent timed
out pings to a given node were sufficient for the Raft listener to "mark
server as down" (the listener used a threshold of 1s).

Increase the ping timeout to 600ms which should be enough even for
pinging the opposite side of Earth, and make it tunable.

Increase the Raft listener threshold from 1s to 2s. Without the
increased threshold, one timed out ping would be enough to mark the
server as down. Increasing it to 2s requires 3 timed out pings which
makes it more robust in presence of transient network hiccups.

In the future we'll most likely want to decrease the Raft listener
threshold again, if we use Raft for data path -- so leader elections
start quickly after leader failures. (Faster than 2s). To do that we'll
have to improve the design of the direct failure detector.

Ref: scylladb/scylladb#16410
Fixes: scylladb/scylladb#16607

---

I tested the change manually using `tc qdisc ... netem delay`, setting
network delay on local setup to ~300ms with jitter. Without the change,
the result is as observed in scylladb/scylladb#16410: interleaving
```
raft_group_registry - marking Raft server ... as dead for Raft groups
raft_group_registry - marking Raft server ... as alive for Raft groups
```
happening once every few seconds. The "marking as dead" happens whenever
we get 3 subsequent failed pings, which is happens with certain (high)
probability depending on the latency jitter. Then as soon as we get a
successful ping, we mark server back as alive.

With the change, the phenomenon no longer appears.

Closes scylladb/scylladb#18443
2024-05-07 23:40:23 +02:00
Kamil Braun
d8313dda43 Merge 'db: config: move consistent-topology-changes out of experimental and make it the default for new clusters' from Patryk Jędrzejczak
We move consistent cluster management out of experimental and
make it the default for new clusters in 6.0. In code, we make the
`consistent-topology-changes` flag unused and assumed to be true.

In 6.0, the topology upgrade procedure will be manual and
voluntary, so some clusters will still be using the gossip-based
topology even though they support the raft-based topology.
Therefore, we need to continue testing the gossip-based topology.
This is possible by using the `force-gossip-topology-changes` flag
introduced in scylladb/scylladb#18284.

Ref scylladb/scylladb#17802

Closes scylladb/scylladb#18285

* github.com:scylladb/scylladb:
  docs: raft.rst: update after removing consistent-topology-changes
  treewide: fix indentation after the previous patch
  db: config: make consistent-topology-changes unused
  test: lib: single_node_cql_env: restart a node in noninitial run_in_thread calls
  test: test_read_required_hosts: run with force-gossip-topology-changes
  storage_service: join_cluster: replace force_gossip_based_join with force-gossip-topology-changes
  storage_service: join_token_ring: fix finish_setup_after_join calls
2024-04-26 14:45:29 +02:00
Avi Kivity
c2b8ca7d71 Merge 'cql3: statements: change default tombstone_gc mode for tablets' from Aleksandra Martyniuk
Repair may miss some tablets that migrated across nodes.
So if tombstones expire after some timeout, then we can
have data resurrection.

Set default tombstone_gc mode to "repair" for tables which
use tablets (if repair is required).

Fixes: #16627.

Closes scylladb/scylladb#18013

* github.com:scylladb/scylladb:
  test: check default value of tombstone_gc
  test: topology: move some functions to util.py
  cql3: statements: change default tombstone_gc mode for tablets
2024-04-25 19:18:37 +03:00
Amnon Heiman
dfea50a7e9 db/config.cc add metric family config from file
Metric family config lets a user configure the metric family aggregate labels.
This patch modifies the existing relable-config from file to accept
metric family config.

Similar to the existing relable_config, it adds a metric_family_configs
section.  For example, the following configuration demonstrates changing
aggregate labels by name and regular expression.

```
metric_family_configs:
 - name: storage_service
   aggregate_labels: [shard]
 - regex: (storage_proxy.*)
   aggregate_labels: [shard, scheduling_group_name]
```

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes scylladb/scylladb#18339
2024-04-25 16:03:39 +03:00
Patryk Jędrzejczak
3a34bb18cd db: config: make consistent-topology-changes unused
We make the `consistent-topology-changes` experimental feature
unused and assumed to be true in 6.0. We remove code branches that
executed if `consistent-topology-changes` was disabled.
2024-04-25 14:33:21 +02:00
Aleksandra Martyniuk
58f72f9019 cql3: statements: change default tombstone_gc mode for tablets
Currently, if tombstone_gc mode isn't specified for a table,
then "timeout" is used by default. With tablets, running
"nodetool repair -pr" may miss a tablet if it migrated across
the nodes. Then, if we expire tombstones for ranges that
weren't repaired, we may get data resurrection.

Set default tombstone_gc mode value for DDLs that don't
specify it. It's set to "repair" for tables which use tablets
unless they use local replication strategy or rf = 1.
Otherwise it's set to "timeout".
2024-04-24 10:42:10 +02:00
Patryk Jędrzejczak
14911051ee db: config: introduce force-gossip-topology-changes
We are going to make the `consistent-topology-changes` experimental
feature unused in 6.0. However, the topology upgrade procedure will
be manual and voluntary, so some 6.0 clusters will be using the
gossip-based topology. Therefore, we need to continue testing the
gossip-based topology. The solution is introducing a new flag,
`force-gossip-topology-changes`, that will enforce the gossip-based
topology in a fresh cluster.

In this patch, we only introduce the parameter without any effect.
Here is the explanation. Making `consistent-topology-changes` unused
and introducing `force-gossip-topology-changes` requires adjustments
in scylla-dtest. We want to merge changes to scylladb and scylla-dtest
in a way that ensures all tests are run correctly during the whole
process. If we merged all changes to scylladb first, before merging
the scylla-dtest changes, all tests would run with the raft-based
topology and the ones excluded in the raft-based topology would fail.
We also can't merge all changes to scylla-dtest first. However, we
can follow this plan:
1. scylladb: merge this patch
2. scylla-dtest: start using `force-gossip-topology-changes`
   in jobs that run without the raft-based topology
3. scylladb: merge the rest of the changes
4. scylla-dtest: merge the rest of the changes

Ref scylladb/scylladb#17802

Closes scylladb/scylladb#18284
2024-04-23 09:42:46 +02:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Lakshmi Narayanan Sreethar
e8026197d2 db/config: add a new variable to limit memory used by table components
A new configuration variable, components_memory_reclaim_threshold, has
been added to configure the maximum allowed percentage of available
memory for all SSTable components in a shard. If the total memory usage
exceeds this threshold, it will be reclaimed from the components to
bring it back under the limit. Currently, only the memory used by the
bloom filters will be restricted.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Petr Gusev
49a4220fea error_injection: pass injection parameters at startup
Injection parameters can be used in the lambda passed to
inject_with_handler method to take some values from
the test. However, there was no way to set values to these
parameters on node startup, only through
the error injection REST api. Therefore, we couldn't rely
on this when inject_with_handler is used during
node startup, it could trigger before we call the api
from the test.

In this commit with solve this problem by allowing these
parameters to be assigned through scylla.yaml config.

The defer.hh header was added to error_injection.hh to fix
compilation after adding error_injection.hh to config.hh,
defer function is used in error_injection.hh.
2024-03-19 20:17:02 +04:00
Botond Dénes
5e37c1465f db/config: introduce query_page_size_in_bytes
Regulates the page size in bytes via config, instead of the currently
used hard-coded constant. Allows tests to configure lower limits so they
can work with smaller data-sets when testing paging related
functionality.
Not wired yet.
2024-02-27 02:14:45 -05:00
Tomasz Grabiec
ef9e5e64a3 locator: token_metadata: Introduce topology barrier stall detector
When topology barrier is blocked for longer than configured threshold
(2s), stale versions are marked as stalled and when they get released
they report backtrace to the logs. This should help to identify what
was holding for token metadata pointer for too long.

Example log:

  token_metadata - topology version 30 held for 299.159 [s] past expiry, released at:  0x2397ae1 0x23a36b6 ...

Closes scylladb/scylladb#17427
2024-02-21 15:05:34 +02:00
Avi Kivity
93af3dd69b Merge 'Maintenance socket: set filesystem permissions to 660' from Mikołaj Grzebieluch
Set filesystem permissions for the maintenance socket to 660 (previously it was 755) to allow a scyllaadm's group to connect.
Split the logic of creating sockets into two separate functions, one for each case: when it is a regular cql controller or used by maintenance_socket.

Fixes https://github.com/scylladb/scylladb/issues/16487.

Closes scylladb/scylladb#17113

* github.com:scylladb/scylladb:
  maintenance_socket: add option to set owning group
  transport/controller: get rid of magic number for socket path's maximal length
  transport/controller: set unix_domain_socket_permissions for maintenance_socket
  transport/controller: pass unix_domain_socket_permissions to generic_server::listen
  transport/controller: split configuring sockets into separate functions
2024-02-20 15:09:54 +02:00
Mikołaj Grzebieluch
182cfebe40 maintenance_socket: add option to set owning group
Option `maintenance-socket-group` sets the owning group of the maintenance socket.
If not set, the group will be the same as the user running the scylla node.
2024-02-19 10:21:00 +01:00
Kefu Chai
3dfb0f86f1 db: add formatter for error_injection_at_startup
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `error_injection_at_startup`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17211
2024-02-08 19:40:48 +02:00
Pavel Emelyanov
7c5c89ba8d Revert "Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel"
This reverts commit 370fbd346c, reversing
changes made to 0912d2a2c6.

This makes scylla-manager mis-interpret the data_file_directories
somehow, issue #17078
2024-01-31 15:08:14 +03:00
Avi Kivity
c8397f0287 Merge 'Implement tablet splitting' from Raphael "Raph" Carvalho
The motivation for tablet resizing is that we want to keep the average tablet size reasonable, such that load rebalancing can remain efficient. Too large tablet makes migration inefficient, therefore slowing down the balancer.

If the avg size grows beyond the upper bound (split threshold), then balancer decides to split. Split spans all tablets of a table, due to power-of-two constraint.

Likewise, if the avg size decreases below the lower bound (merge threshold), then merge takes place in order to grow the avg size. Merge is not implemented yet, although this series lays foundation for it to be impĺemented later on.

A resize decision can be revoked if the avg size changes and the decision is no longer needed. For example, let's say table is being split and avg size drops below the target size (which is 50% of split threshold and 100% of merge one). That means after split, the avg size would drop below the merge threshold, causing a merge after split, which is wasteful, so it's better to just cancel the split.

Tablet metadata gains 2 new fields for managing this:
resize_type: resize decision type, can be either of "merge", "split", or "none".
resize_seq_number: a sequence number that works as the global identifier of the decision (monotonically increasing, increased by 1 on every new decision emitted by the coordinator).

A new RPC was implemented to pull stats from each table replica, such that load balancer can calculate the avg tablet size and know the "split status", for a given table. Avg size is aggregated carefully while taking RF of each DC into account (which might differ).
When a table is done splitting its storage, it loads (mirror) the resize_seq_number from tablet metadata into its local state (in another words, my split status is ready). If a table is split ready, coordinator will see that table's seq number is the same as the one in tablet metadata. Helps to distinguish stale decisions from the latest one (in case decisions are revoked and re-emited later on). Also, it's aggregated carefully, by taking the minimum among all replicas, so coordinator will only update topology when all replicas are ready.

When load balancer emits split decision, replicas will listen to need to split with a "split monitor" that is awakened once a table has replication metadata updated and detects the need for split (i.e. resize_type field is "split").
The split monitor will start splitting of compaction groups (using mechanism introduced here: 081f30d149) for the table. And once splitting work is completed, the table updates its local state as having completed split.

When coordinator pulls the split status of all replicas for a table via RPC, the balancer can see whether that table is ready for "finalizing" the decision, which is about updating tablet metadata to split each tablet into two. Once table replicas have their replication metadata updated with the new tablet count, they can update appropriately their set of compaction groups (that were previously split in the preparation step).

Fixes #16536.

Closes scylladb/scylladb#16580

* github.com:scylladb/scylladb:
  test/topology_experimental_raft: Add tablet split test
  replica: Bypass reshape on boot with tablets temporarily
  replica: Fix table::compaction_group_for_sstable() for tablet streaming
  test/topology_experimental_raft: Disable load balancer in test fencing
  replica: Remap compaction groups when tablet split is finalized
  service: Split tablet map when split request is finalized
  replica: Update table split status if completed split compaction work
  storage_service: Implement split monitor
  topology_cordinator: Generate updates for resize decisions made by balancer
  load_balancer: Introduce metrics for resize decisions
  db: Make target tablet size a live-updateable config option
  load_balancer: Implement resize decisions
  service: Wire table_resize_plan into migration_plan
  service: Introduce table_resize_plan
  tablet_mutation_builder: Add set_resize_decision()
  topology_coordinator: Wire load stats into load balancer
  storage_service: Allow tablet split and migration to happen concurrently
  topology_coordinator: Periodically retrieve table_load_stats
  locator: Introduce topology::get_datacenter_nodes()
  storage_service: Implement table_load_stats RPC
  replica: Expose table_load_stats in table
  replica: Introduce storage_group::live_disk_space_used()
  locator: Introduce table_load_stats
  tablets: Add resize decision metadata to tablet metadata
  locator: Introduce resize_decision
2024-01-31 13:59:56 +02:00
Piotr Smaroń
35ba037724 config: fix a typo in --role-manager's description
Closes scylladb/scylladb#17063
2024-01-30 16:13:33 +02:00
Patryk Wrobel
dc8d5ffaf6 db::config: keep dir paths unchanged
This change is intended to ensure, that
db::config fields related to directories
are not changed. To achieve that a member
function called setup_directories() is
removed.

The responsibility for directories paths
has been moved to utils::directories,
which may generate default paths if the
configuration does not provide a specific
value.

Fixes: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:20:41 +01:00
Raphael S. Carvalho
638e6e30cb db: Make target tablet size a live-updateable config option
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Mikołaj Grzebieluch
e6a83b9819 db/config: add maintenance mode flag 2024-01-25 15:27:53 +01:00
Nadav Har'El
df6c9828ef Merge 'Add protobuf and Native histogram support' from Amnon Heiman
Native histograms (also known as sparse histograms) are an experimental Prometheus feature.
They use protobuf as the reporting layer.
Native histograms hold the benefits of high resolution at a lower resource cost.

This series allows sending histograms in a native histogram format over protobuf.
By default, protobuf support is disabled. To use protobuf with native histograms, the command line flag prometheus_allow_protobuf should be set to true, and the Prometheus server should send the accept header with protobuf.

Fixes #12931

Closes scylladb/scylladb#16737

* github.com:scylladb/scylladb:
  main.cc: Add prometheus_allow_protobuf command line
  histogram_metrics_helper: support native histogram
  config: Add prometheus_allow_protobuf flag
2024-01-24 21:24:50 +02:00
Kefu Chai
c978d1b3f8 config: s/re-use/reuse/
this misspelling is identified by codespell.
per m-w, reuse is a word per-se, and we don't need the hyphen for
addressing the ambiguity in the use cases, like, recover and re-cover.
see also https://www.merriam-webster.com/dictionary/reuse

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16962
2024-01-24 15:19:03 +02:00
Botond Dénes
26d814d8be Merge 'Configure initial tablets count scaling' from Pavel Emelyanov
There are currently two options how to "request" the number of initial tables for a table

1. specify it explicitly when creating a keyspace
2. let scylla calculate it on its own

Both are not very nice. The former doesn't take cluster layout into consideration. The latter does, but starts with one tablet per shard, which can be too low if the amount of data grows rapidly.

Here's a (maybe temporary) proposal to facilitate at least perf tests -- the --tablets-initial-scale-factor option that enhances the option number two above by multiplying the calculated number of tablets by the configured number. This is what we currently do to run perf tests by patching scylla, with the option it going to be more convenient.

Closes scylladb/scylladb#16919

* github.com:scylladb/scylladb:
  config: Add --tablets-initial-scale-factor
  tablet_allocator: Add initial tablets scale to config
  tablet_allocator: Add config
2024-01-23 13:25:12 +02:00
Amnon Heiman
fc9bd2de03 config: Add prometheus_allow_protobuf flag
Native histograms (also known as sparse histograms) are an experimental
Prometheus feature. They use protobuf as the reporting layer.  The
prometheus_allow_protobuf flag allows the user to enable protobuf
protocol. When this flag is set to true, and the Prometheus server sends
in the request that it accepts protobuf, the result will be in protobuf
protocol.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-01-23 13:12:07 +02:00
Pavel Emelyanov
d1d4620af8 config: Add --tablets-initial-scale-factor
Previous patch taught tablets allocator to multiply the initial tablets
count by some value. This patch makes this factor configurable

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-22 19:18:18 +03:00
David Garcia
f3eeba8cc6 docs: parse config.cc properties as rst text
This enhancement formats descriptions in config.cc using the standard markup language reStructuredText (RST).

By doing so, it improves the rendering of these descriptions in the documentation, allowing you to use various directives like admonitions, code blocks, ordered lists, and more.

Closes scylladb/scylladb#16311
2024-01-22 16:40:18 +02:00
Michał Jadwiszczak
f6a464ad81 configure service levels interval
So far the service levels interval, responsible for updating SL configuration,
was hardcoded in main.
Now it's extracted to `service_levels_interval_ms` option.
2024-01-12 10:28:24 +01:00
Nadav Har'El
7c5092cb8f test: add missing "tags" schema extension to cql_test_env
One of the unfortunate anti-features of cql_test_env (the framework used
in our CQL tests that are written in C++) is that it needs to repeat
various bizarre initializations steps done in main.cc, otherwise various
requests work incorrectly. One of these steps that main.cc is to initialize
various "schema extensions" which some of the Scylla features need to work
correctly.

We remembered to initialize some schema extensions in cql_test_env, but
forgot others. The one I will need in the following patch is the "tags"
extension, which we need to mark materialized views used by local
secondary indexes as "synchronous_updates" - without this patch the LSI
tests in secondary_index_test.cc will crash.

In addition to adding the missing extension, this patch also replaces
the segmentation-fault crash when it's missing (caused by a dynamic
cast failure) by a clearer on_internal_error() - so if we ever have
this bug again, it will be easier to debug.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Kamil Braun
6fcaec75db Merge 'Add maintenance socket' from Mikołaj Grzebieluch
It enables interaction with the node through CQL protocol without authentication. It gives full-permission access.
The maintenance socket is available by Unix domain socket with file permissions `755`, thus it is not accessible from outside of the node and from other POSIX groups on the node.
It is created before the node joins the cluster.

To set up the maintenance socket, use the `maintenance-socket` option when starting the node.

* If set to `ignore` maintenance socket will not be created.
* If set to `workdir` maintenance socket will be created in `<node's workdir>/cql.m`.
* Otherwise maintenance socket will be created in the specified path.

The default value is `ignore`.

* With python driver

```python
from cassandra.cluster import Cluster
from cassandra.connection import UnixSocketEndPoint
from cassandra.policies import HostFilterPolicy, RoundRobinPolicy

socket = "<node's workdir>/cql.m"
cluster = Cluster([UnixSocketEndPoint(socket)],
                  # Driver tries to connect to other nodes in the cluster, so we need to filter them out.
                  load_balancing_policy=HostFilterPolicy(RoundRobinPolicy(), lambda h: h.address == socket))
session = cluster.connect()
```

Merge note: apparently cqlsh does not support unix domain sockets; it
will have to be fixed in a follow-up.

Closes scylladb/scylladb#16172

* github.com:scylladb/scylladb:
  test.py: add maintenance socket test
  test.py: enable maintenance socket in tests by default
  docs: add maintenance socket documentation
  main: add maintenance socket
  main: refactor initialization of cql controller and auth service
  auth/service: don't create system_auth keyspace when used by maintenance socket
  cql_controller: maintenance socket: fix indentation
  cql_controller: add option to start maintenance socket
  db/config: add maintenance_socket_enabled bool class
  auth: add maintenance_socket_role_manager
  db/config: add maintenance_socket variable
2023-12-20 19:04:40 +02:00
Mikołaj Grzebieluch
e682e362a3 db/config: add maintenance_socket variable
If set to "ignore", maintenance socket will be disabled.
If set to "workdir", maintenance socket will be opened on <scylla's
workdir>/cql.m.
Otherwise it will be opened on path provided by maintenance_socket
variable.

It is set by default to 'ignore'.
2023-12-18 11:42:05 +01:00
Patryk Jędrzejczak
5ebfbf42bc db: config: make consistent_cluster_management mandatory
Code that executed only when consistent_cluster_management=false is
removed. In particular, after this patch:
- raft_group0 and raft_group_registry are always enabled,
- raft_group0::status_for_monitoring::disabled becomes unused,
- topology tests can only run with consistent_cluster_management.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
a54f9052fc db: config: make override_decommission deprecated
The override_decommission option is supported only when
consistent_cluster_management is disabled. In the following commit,
we make consistent_cluster_management mandatory, which makes
overwrite_decommission unusable.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
571db3c983 db: config: make force_schema_commit_log deprecated
In scylladb/scylladb#16254, we made force_schema_commit_log unused.
After this change, if someone passes this option as the command line
argument, the boot fails. This behavior is undesired. We only want
this option to be ignored. We can achieve this effect by making it
deprecated.
2023-12-14 16:53:46 +01:00
Botond Dénes
d2a88cd8de Merge 'Typos: fix typos in code' from Yaniv Kaul
Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255

Closes scylladb/scylladb#16289

* github.com:scylladb/scylladb:
  Update unified/build_unified.sh
  Update main.cc
  Update dist/common/scripts/scylla-housekeeping
  Typos: fix typos in code
2023-12-06 07:36:41 +02:00
Yaniv Kaul
ae2ab6000a Typos: fix typos in code
Fixes some more typos as found by codespell run on the code.
In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255
2023-12-05 15:18:11 +02:00
Patryk Jędrzejczak
c8ee7d4499 db: make schema commitlog feature mandatory
Using consistent cluster management and not using schema commitlog
ends with a bad configuration throw during bootstrap. Soon, we
will make consistent cluster management mandatory. This forces us
to also make schema commitlog mandatory, which we do in this patch.

A booting node decides to use schema commitlog if at least one of
the two statements below is true:
- the node has `force_schema_commitlog=true` config,
- the node knows that the cluster supports the `SCHEMA_COMMITLOG`
  cluster feature.

The `SCHEMA_COMMITLOG` cluster feature has been added in version
5.1. This patch is supposed to be a part of version 6.0. We don't
support a direct upgrade from 5.1 to 6.0 because it skips two
versions - 5.2 and 5.4. So, in a supported upgrade we can assume
that the version which we upgrade from has schema commitlog. This
means that we don't need to check the `SCHEMA_COMMITLOG` feature
during an upgrade.

The reasoning above also applies to Scylla Enterprise. Version
2024.2 will be based on 6.0. Probably, we will only support
an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even
if we support an upgrade from 2023.x, this patch won't break
anything because 2023.1 is based on 5.2, which has schema
commitlog. Upgrades from 2022.x definitely won't be supported.

When we populate a new cluster, we can use the
`force_schema_commitlog=true` config to use schema commitlog
unconditionally. Then, the cluster feature check is irrelevant.
This check could fail because we initiate schema commitlog before
we learn about the features. The `force_schema_commitlog=true`
config is especially useful when we want to use consistent cluster
management. Failing feature checks would lead to crashes during
initial bootstraps. Moreover, there is no point in creating a new
cluster with `consistent_cluster_management=true` and
`force_schema_commitlog=false`. It would just cause some initial
bootstraps to fail, and after successful restarts, the result would
be the same as if we used `force_schema_commitlog=true` from the
start.

In conclusion, we can unconditionally use schema commitlog without
any checks in 6.0 because we can always safely upgrade a cluster
and start a new cluster.

Apart from making schema commitlog mandatory, this patch adds two
changes that are its consequences:
- making the unneeded `force_schema_commitlog` config unused,
- deprecating the `SCHEMA_COMMITLOG` feature, which is always
  assumed to be true.

Closes scylladb/scylladb#16254
2023-12-04 21:02:16 +02:00
Piotr Smaroń
5fd30578d7 config: introduce value_status::Deprecated
Current mechanism to deprecate config options is implemented in a hacky
way in `main.cpp` and doesn't account for existing
`db::config/boost::po` API controlling lifetime of config options, hence
it's being replaced in this PR by adding yet another `value_status`
enumerator: `Deprecated`, so that deprecation of config options is
controlled in one place in `config.cc`,i.e. when specifying config options.
Motivation: https://docs.google.com/document/d/18urPG7qeb7z7WPpMYI2V_lCOkM5YGKsEU78SDJmt8bM/edit?usp=sharing

With this change, if a `Deprecated` config option is specified as
1. a command line parameter, scylla will run and log:
```
WARN  2023-11-25 23:37:22,623 [shard 0:main] init - background-writer-scheduling-quota option ignored (deprecated)
```
(Previously it was only a message printed to standard output, not a
scylla log of warn level).
2. an option in `scylla.yaml`, scylla will run and log:
```
WARN  2023-11-27 23:55:13,534 [shard 0:main] init - Option is deprecated : background_writer_scheduling_quota
```

Fixes #15887
Incorporates dropped https://github.com/scylladb/scylladb/pull/15928

Closes scylladb/scylladb#16184
2023-11-30 08:52:57 +03:00
Avi Kivity
8e9d3af431 Merge 'Commitlog: complete prerequisites and enforce hard limit by default' from Eliran Sinvani
This miniset, completes the prerequisites for enabling commitlog hard limit on by default.
Namely, start flushing and evacuating segments halfway to the limit in order to never hit it under normal circumstances.
It is worth mentioning that hitting the limit is an exceptional condition which it's root cause need to be resolved, however,
once we do hit the limit, the performance impact that is inflicted as a result of this enforcement is irrelevant.

Tests: unit tests.
          LWT write test (#9331)
A whitebox testing has been performed by @wmitros , the test aimed at putting as much pressure as possible on the commitlog segments by using a write pattern that rewrites the partitions in the memtable keeping it at ~85% occupancy so the dirty memory manager will not kick in. The test compared 3 configurations:
1. The default configuration
2. Hard limit on (without changing the flush threshold)
3. the changes in this PR applied.
The last exhibited the "best" behavior in terms of metrics, the graphs were the flattest and less jaggy from the others.

Closes scylladb/scylladb#10974

* github.com:scylladb/scylladb:
  commitlog: enforce commitlog size hard limit by default
  commitlog: set flush threshold to half of the  limit size
  commitlog: unfold flush threshold assignment
2023-11-29 20:55:53 +02:00
Benny Halevy
66ba983fe0 compaction_manager: flush_all_tables before major compaction
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6

However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).

Flushing all sstables in the database release
all references to commitlog segments and there
it maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.

However, flushing all tables too frequently might
result in tiny sstables.  Since when flushing all
keyspaces using `nodetool flush` the `force_keyspace_compaction`
api is invoked for keyspace successively, we need a mechanism
to prevent too frequent flushes by major compaction.

Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).

In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.

Fixes scylladb/scylladb#15777

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Kefu Chai
6749d963ed config: define formatter for db::seed_provider_type
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define a formatter for db::seed_provider_type.

please note, we are still formatting vector<db::seed_provider_type>
with the helper provided by seastar/core/sstring.hh, which uses
operator<<() to print the elements in the vector being printed.
so we have to keep the operator<< formatter before disabling
the generic formatter for vector<T>.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16138
2023-11-23 11:04:35 +02:00
Eliran Sinvani
bfa839ce92 commitlog: enforce commitlog size hard limit by default
Since the commitlog size hard limit is a failsafe mechanism,
we don't expect to ever hit it. If we do hit the limit, it means
that we have an exceptional condition in the system. Hence, the
impact of enforcing the commitlog hard limit is irrelevant.
Here we enforce the limit by default.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-11-22 08:48:28 +02:00
Piotr Smaroń
8c464b2ddb guardrails: restrict replication strategy (RS)
Replacing `restrict_replication_simplestrategy` config option with
2 config options: `replication_strategy_{warn,fail}_list`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on replication strategy when creating/altering a keyspace.
The reason to rather replace than extend `restrict_replication_simplestrategy` config
option is that it was not used and we wanted to generalize it.
Only soft guardrail is enabled by default and it is set to SimpleStrategy,
which means that we'll generate a CQL warning whenever replication strategy
is set to SimpleStrategy. For new cloud deployments we'll move
SimpleStrategy from warn to the fail list.
Guardrails violations will be tracked by metrics.

Resolves #5224
Refs #8892 (the replication strategy part, not the RF part)

Closes scylladb/scylladb#15399
2023-10-31 18:34:41 +03:00
David Garcia
1121a4df04 docs: add groups to reference docs
fix: comment

Closes scylladb/scylladb#15592
2023-10-04 11:42:36 +03:00