scylladb/docs/dev/service_levels.md
2022-06-24 18:07:08 +01:00


## Service Level Distributed Data
There are two system tables that are used to facilitate the service level feature.
### Service Level Attachment Table
```
CREATE TABLE system_auth.role_attributes (
role text,
attribute_name text,
attribute_value text,
PRIMARY KEY (role, attribute_name))
```
The table was created with generality in mind, but its purpose is to record
information about roles. The columns have the following meanings:
- *role* - the name of the role that the attribute belongs to.
- *attribute_name* - the name of the attribute for the role.
- *attribute_value* - the value of the specified attribute.

For the service level, the relevant attribute name is `service_level`.
For example, to find out which `service_level` is attached to role `r`,
one can run the following query:
```
SELECT * FROM system_auth.role_attributes WHERE role='r' and attribute_name='service_level'
```
### Service Level Configuration Table
```
CREATE TABLE system_distributed.service_levels (
service_level text PRIMARY KEY,
timeout duration,
workload_type text)
```
The table is used to store and distribute the service level configuration.
The columns have the following meanings:
- *service_level* - the name of the service level.
- *timeout* - the timeout for operations performed by users under this service level.
- *workload_type* - the type of workload declared for this service level (unspecified, interactive or batch).

Example contents:
```
select * from system_distributed.service_levels ;
 service_level | timeout | workload_type
---------------+---------+---------------
            sl |   500ms |   interactive
```
### Service Level Timeout
A service level timeout assigns a default timeout to all operations performed under a particular service level.
It takes precedence over the default timeout values from the `scylla.yaml` configuration
file, but it can still be superseded by a per-query timeout (a query issued with the `USING TIMEOUT` directive).
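
The precedence described above can be sketched in a few lines of Python. This is only an illustration of the ordering, not Scylla's implementation; the function name `resolve_timeout` and its parameters are hypothetical, and `None` stands for "no value set".

```python
from datetime import timedelta
from typing import Optional


def resolve_timeout(
    config_default: timedelta,
    service_level_timeout: Optional[timedelta] = None,
    per_query_timeout: Optional[timedelta] = None,
) -> timedelta:
    """Pick the timeout for one operation (hypothetical helper)."""
    # A per-query timeout (USING TIMEOUT) supersedes everything else.
    if per_query_timeout is not None:
        return per_query_timeout
    # Otherwise the service level timeout overrides the scylla.yaml default.
    if service_level_timeout is not None:
        return service_level_timeout
    # Nothing more specific is set, so the configured default applies.
    return config_default
```

Calling `resolve_timeout(timedelta(seconds=5), timedelta(milliseconds=50))` returns the 50ms service level timeout, while passing a third argument would make the per-query value win.
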
In order to set a timeout for a service level, create or alter it with proper parameters, e.g.:
```
create service level sl with timeout = 50ms;
list all service levels;
 service_level | timeout
---------------+---------
            sl |    50ms
```
Restoring the default timeout value (from scylla.yaml file) can be done by setting the service level timeout value to null:
```
alter service level sl with timeout = null;
list all service levels;
 service_level | timeout
---------------+---------
            sl |    null
```
#### Combining service level timeouts from multiple roles
A single role may be granted multiple other roles, which means that more than one service level may be in effect
for a particular user. When that happens, the effective timeout is the minimum of all timeouts that apply. Example:
- role1: `timeout = 1s`
- role2: `timeout = 50ms`
- role3: `timeout = 2s`
- role4: `timeout = 10ms`

The granting hierarchy is as follows, with role1 inheriting from role2, which in turn
inherits from role3 and role4:

    role4   role3
        \   /
        role2
        /
    role1

With these roles granted, the effective timeouts are:
- role1: `timeout = 10ms`
- role2: `timeout = 10ms`
- role3: `timeout = 2s`
- role4: `timeout = 10ms`
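
The example above can be reproduced with a short Python sketch that walks the grant hierarchy and takes the minimum timeout it finds. The data structures (`own_timeout`, `granted`) and the function name `effective_role_timeout` are assumptions made for illustration; they do not mirror Scylla's internal code.

```python
from datetime import timedelta

# Timeouts set directly on each role's service level (values from the example).
own_timeout = {
    "role1": timedelta(seconds=1),
    "role2": timedelta(milliseconds=50),
    "role3": timedelta(seconds=2),
    "role4": timedelta(milliseconds=10),
}

# granted[r] lists the roles granted to r, i.e. the roles r inherits from.
granted = {
    "role1": ["role2"],
    "role2": ["role3", "role4"],
    "role3": [],
    "role4": [],
}


def effective_role_timeout(role, own_timeout, granted):
    """Return the minimum timeout over the role and every role it inherits from."""
    seen = set()
    stack = [role]
    candidates = []
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against visiting a role twice
        seen.add(current)
        timeout = own_timeout.get(current)
        if timeout is not None:  # roles without a timeout do not contribute
            candidates.append(timeout)
        stack.extend(granted.get(current, []))
    return min(candidates) if candidates else None


effective = {r: effective_role_timeout(r, own_timeout, granted) for r in own_timeout}
```

Here role1 and role2 both reach role4 through the hierarchy, so they end up with its 10ms timeout, while role3 and role4 inherit from nobody and keep their own values.
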
### Workload types
A service level can declare a workload type. Three values are currently available:
1. unspecified - generic workload without any specific characteristics; default
2. interactive - workload sensitive to latency, expected to have high/unbounded concurrency,
   with dynamic characteristics, OLTP;
   example: users clicking on a website and generating events with their clicks
3. batch - workload for processing large amounts of data, not sensitive to latency, expected to have
   fixed concurrency, OLAP, ETL;
   example: processing billions of historical sales records to generate useful statistics

Declaring a workload type provides more context for Scylla to decide how to handle the sessions.
For instance, if a coordinator node receives requests at a higher rate than it can handle,
it will make different decisions depending on the declared workload type:
- for batch workloads it makes sense to apply backpressure - the concurrency is assumed to be fixed,
  so delaying a reply will likely also reduce the rate at which new requests are sent;
- for interactive workloads, backpressure would only waste resources - delaying a reply does not
  decrease the rate of incoming requests, so it's reasonable for the coordinator to start shedding
  surplus requests.

If multiple workload types are applicable for a role, the combination is straightforward if:
- all the applicable workload types are identical
- some of the service levels do not have any workload type specified

Otherwise, i.e. if a role has several different workload types declared,
the conflicts are resolved as follows:
- `X` vs `unspecified` -> `X`
- `batch` vs `interactive` -> `batch` - under the assumption that `batch` is safer, because it would not trigger load shedding as eagerly as `interactive`
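
The resolution rules above can be written as a small pairwise function that is then folded over all applicable workload types. This is a minimal sketch of the stated rules only; the function names are hypothetical, and workload types are represented as plain strings for simplicity.

```python
from functools import reduce


def combine_workload_types(a: str, b: str) -> str:
    """Resolve two workload types according to the rules listed above."""
    if a == b:
        return a
    # `X` vs `unspecified` -> `X`
    if a == "unspecified":
        return b
    if b == "unspecified":
        return a
    # The only remaining case is `batch` vs `interactive`, where `batch` wins.
    return "batch"


def effective_workload_type(types) -> str:
    """Fold the pairwise rule over every workload type that applies to a role."""
    # Starting from "unspecified" makes an empty input resolve to the default.
    return reduce(combine_workload_types, types, "unspecified")
```

For example, `effective_workload_type(["interactive", "unspecified", "batch"])` resolves to `batch`, because `unspecified` yields to any declared type and `batch` takes priority over `interactive`.
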