scylladb/docs/dev/service_levels.md
2022-06-24 18:07:08 +01:00

Service Level Distributed Data

There are two system tables that are used to facilitate the service level feature.

Service Level Attachment Table

    CREATE TABLE system_auth.role_attributes (
        role text,
        attribute_name text,
        attribute_value text,
        PRIMARY KEY (role, attribute_name));

The table was created with generality in mind, but its purpose is to record information about roles. The columns are:

  • role - the name of the role that the attribute belongs to
  • attribute_name - the name of the attribute for the role
  • attribute_value - the value of the specified attribute

For the service level feature, the relevant attribute name is service_level. For example, to find out which service level is attached to role r, run the following query:

SELECT * FROM system_auth.role_attributes WHERE role='r' AND attribute_name='service_level';

Service Level Configuration Table

    CREATE TABLE system_distributed.service_levels (
        service_level text PRIMARY KEY,
        timeout duration,
        workload_type text);

The table is used to store and distribute the service level configuration. The columns are:

  • service_level - the name of the service level
  • timeout - the timeout for operations performed by users under this service level
  • workload_type - the type of workload declared for this service level (unspecified, interactive or batch)

For example:

select * from system_distributed.service_levels;

 service_level | timeout | workload_type
---------------+---------+---------------
            sl |   500ms |   interactive

Service Level Timeout

A service level timeout assigns a default timeout to all operations performed under a particular service level.

A service level timeout takes precedence over the default timeout values from the scylla.yaml configuration file, but it can still be superseded by a per-query timeout (a query issued with the USING TIMEOUT directive).
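
The precedence order described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Scylla's implementation: the function name, the parameter names and the use of milliseconds are assumptions made for the example.

```python
from typing import Optional


def effective_timeout_ms(per_query_ms: Optional[int],
                         service_level_ms: Optional[int],
                         config_default_ms: int) -> int:
    """Pick the timeout that applies to a single operation.

    Hypothetical helper: all names here are illustrative only.
    """
    # A per-query timeout (USING TIMEOUT) supersedes everything else.
    if per_query_ms is not None:
        return per_query_ms
    # Otherwise the service level timeout, if one is set, wins.
    if service_level_ms is not None:
        return service_level_ms
    # A null service level timeout falls back to the scylla.yaml default.
    return config_default_ms
```

For instance, with a hypothetical configuration default of 5000 ms and a service level timeout of 50 ms, the function returns 50 unless the query carries its own timeout.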

To set a timeout for a service level, create or alter it with the timeout parameter, e.g.:

create service level sl with timeout = 50ms;
list all service levels;

 service_level | timeout 
---------------+---------
            sl |    50ms

To restore the default timeout value (from the scylla.yaml file), set the service level timeout to null:

alter service level sl with timeout = null;
list all service levels;

 service_level | timeout 
---------------+---------
            sl |    null

Combining service level timeouts from multiple roles

A single role may be granted multiple other roles, which means that more than one service level may be in effect for a particular user. For timeouts, the values are combined by taking the minimum of all effective timeouts. Example:

    role1: timeout = 1s
    role2: timeout = 50ms
    role3: timeout = 2s
    role4: timeout = 10ms

The granting hierarchy is as follows, with role1 inheriting from role2, which in turn inherits from role3 and role4:

    role4   role3
        \   /
        role2
        /
    role1

With the roles granted as described, the effective timeouts are:

    role1: timeout = 10ms
    role2: timeout = 10ms
    role3: timeout = 2s
    role4: timeout = 10ms
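
The minimum rule can be checked against this example with a short Python sketch. It is a model of the rule only, not the actual Scylla code: the dictionaries, the function name and the millisecond units are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional

# Timeouts (in milliseconds) declared directly on each role in the example.
OWN_TIMEOUT_MS: Dict[str, Optional[int]] = {
    "role1": 1000, "role2": 50, "role3": 2000, "role4": 10,
}

# Which roles are granted to which role (role1 inherits from role2, etc.).
GRANTED: Dict[str, List[str]] = {
    "role1": ["role2"],
    "role2": ["role3", "role4"],
    "role3": [],
    "role4": [],
}


def effective_timeout_ms(role: str,
                         own: Dict[str, Optional[int]],
                         granted: Dict[str, List[str]]) -> Optional[int]:
    """Return the minimum timeout over a role and every role it inherits from."""
    candidates: List[int] = []
    seen = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against visiting a role twice
        seen.add(current)
        timeout = own.get(current)
        if timeout is not None:
            candidates.append(timeout)
        stack.extend(granted.get(current, []))
    # None means no timeout is set anywhere in the hierarchy.
    return min(candidates) if candidates else None
```

Calling `effective_timeout_ms("role1", OWN_TIMEOUT_MS, GRANTED)` walks role1, role2, role3 and role4 and returns 10, while role3, which inherits from no one, keeps its own 2000.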

Workload types

A workload type can be declared for a service level. Three values are currently available:

  1. unspecified - generic workload without any specific characteristics; default
  2. interactive - workload sensitive to latency, expected to have high/unbounded concurrency, with dynamic characteristics, OLTP; example: users clicking on a website and generating events with their clicks
  3. batch - workload for processing large amounts of data, not sensitive to latency, expected to have fixed concurrency, OLAP, ETL; example: processing billions of historical sales records to generate useful statistics

Declaring a workload type gives Scylla more context for deciding how to handle sessions. For instance, if a coordinator node receives requests at a rate higher than it can handle, it will make different decisions depending on the declared workload type:

  • for batch workloads it makes sense to apply backpressure - the concurrency is assumed to be fixed, so delaying a reply will likely also reduce the rate at which new requests are sent;
  • for interactive workloads, backpressure would only waste resources - delaying a reply does not decrease the rate of incoming requests, so it's reasonable for the coordinator to start shedding surplus requests.
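
As a rough model of the two cases above, the decision can be written as a small lookup in Python. The enum, the function name and the `None` result for other workload types are assumptions made for this sketch; the text only states the behaviour for batch and interactive workloads.

```python
from enum import Enum
from typing import Optional


class OverloadAction(Enum):
    BACKPRESSURE = "delay replies to slow the client down"
    SHED = "reject surplus requests"


def overload_action(workload_type: str) -> Optional[OverloadAction]:
    """Hypothetical mapping from workload type to overload handling."""
    if workload_type == "batch":
        # Fixed concurrency: a delayed reply also delays the next request.
        return OverloadAction.BACKPRESSURE
    if workload_type == "interactive":
        # Unbounded concurrency: delaying replies does not reduce the load.
        return OverloadAction.SHED
    # The description above does not cover other workload types.
    return None
```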

If multiple workload types apply to a role, the combination is unambiguous when:

  • all the applicable workload types are identical
  • some of the service levels do not have any workload types specified

Otherwise, if a role has different workload types declared, the conflicts are resolved as follows:

  • X vs unspecified -> X
  • batch vs interactive -> batch - under the assumption that batch is safer, because it would not trigger load shedding as eagerly as interactive
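
These resolution rules can be expressed as a pairwise combination folded over all workload types that apply to a role. The Python below is a sketch under that reading, with hypothetical function names; it is not taken from the Scylla source.

```python
from functools import reduce
from typing import Iterable

VALID_TYPES = {"unspecified", "interactive", "batch"}


def combine_workload_types(a: str, b: str) -> str:
    """Resolve a conflict between two workload types."""
    if a not in VALID_TYPES or b not in VALID_TYPES:
        raise ValueError(f"unknown workload type: {a!r} or {b!r}")
    if a == b:
        return a            # identical types never conflict
    if a == "unspecified":
        return b            # X vs unspecified -> X
    if b == "unspecified":
        return a
    return "batch"          # batch vs interactive -> batch


def effective_workload_type(types: Iterable[str]) -> str:
    """Fold the pairwise rule over every workload type that applies to a role."""
    return reduce(combine_workload_types, types, "unspecified")
```

For example, `effective_workload_type(["interactive", "unspecified", "batch"])` yields "batch", and an empty list yields "unspecified".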