scylladb/docs/dev/service_levels.md at c73d0ffbaa94de1ac8f4d024e073015537d07d6d

Files

Marcin Maliszkiewicz 2ab143fb40 db: auth: move auth tables to system keyspace

Separate keyspace which also behaves as system brings
little benefit while creating some compatibility problems
like schema digest mismatch during rollback. So we decided
to move auth tables into system keyspace.

Fixes https://github.com/scylladb/scylladb/issues/18098

Closes scylladb/scylladb#18769

2024-05-26 22:30:42 +03:00

6.2 KiB

Raw Blame History

Service Level Distributed Data

There are two system tables that are used to facilitate the service level feature.

Service Level Attachment Table

    CREATE TABLE system.role_attributes (
    role text,
    attribute_name text,
    attribute_value text,
    PRIMARY KEY (role, attribute_name))

The table was created with generality in mind, but its purpose is to record information about roles. The table columns meaning are: role - the name of the role that the attribute belongs to. attribute_name - the name of the attribute for the role. attribute_value - the value of the specified attribute.

For the service level, the relevant attribute name is service_level. So for example in order to find out which service_level is attached to role r one can run the following query:

SELECT * FROM  system.role_attributes WHERE role='r' and attribute_name='service_level'

Service Level Configuration Table

    CREATE TABLE system_distributed.service_levels (
    service_level text PRIMARY KEY,
    timeout duration,
    workload_type text)

The table is used to store and distribute the service levels configuration. The table column names meanings are: service_level - the name of the service level. timeout - timeout for operations performed by users under this service level workload_type - type of workload declared for this service level (unspecified, interactive or batch)

select * from system_distributed.service_levels ;

 service_level | timeout | workload_type
---------------+---------+---------------
            sl |   500ms |   interactive

Service Level Timeout

Service level timeout can be used to assign a default timeout value for all operations for a particular service level.

Service level timeout takes precedence over default timeout values from scylla.yaml configuration file, but it can still be superseded by per-query timeouts (issuing a query with USING TIMEOUT directive).

In order to set a timeout for a service level, create or alter it with proper parameters, e.g.:

create service level sl with timeout = 50ms;
list all service levels;

 service_level | timeout 
---------------+---------
            sl |    50ms

Restoring the default timeout value (from scylla.yaml file) can be done by setting the service level timeout value to null:

alter service level sl with timeout = null;
list all service levels;

 service_level | timeout 
---------------+---------
            sl |    null

Combining service level timeouts from multiple roles

A single role may be granted multiple other roles, which also means that more than one service level may be in effect for a particular user. In case of timeouts, multiple timeout values are combined by using a minimum of all effective timeouts. Example:

role1: timeout = 1s role2: timeout = 50ms role3: timeout = 2s role4: timeout = 10ms

The granting hierarchy is as follows, with role1 inheriting from role2, which in turn inherits from role3 and role4:

  role4  role3
     \   /
     role2
      /
   role1

With the following roles granted, here are the effective timeouts for the roles:

role1: timeout = 10ms role2: timeout = 10ms role3: timeout = 2s role4: timeout = 10ms

Workload types

It's possible to declare a workload type for a service level, currently out of three available values:

unspecified - generic workload without any specific characteristics; default
interactive - workload sensitive to latency, expected to have high/unbounded concurrency, with dynamic characteristics, OLTP; example: users clicking on a website and generating events with their clicks
batch - workload for processing large amounts of data, not sensitive to latency, expected to have fixed concurrency, OLAP, ETL; example: processing billions of historical sales records to generate useful statistics

Declaring a workload type provides more context for Scylla to decide how to handle the sessions. For instance, if a coordinator node receives requests with a rate higher than it can handle, it will make different decisions depending on the declared workload type:

for batch workloads it makes sense to apply backpressure - the concurrency is assumed to be fixed, so delaying a reply will likely also reduce the rate at which new requests are sent;
for interactive workloads, backpressure would only waste resources - delaying a reply does not decrease the rate of incoming requests, so it's reasonable for the coordinator to start shedding surplus requests.

If multiple workload types are applicable for a role, it makes sense if:

all the applicable workload types are identical
some of the service levels do not have any workload types specified

Otherwise, e.g. if a role has multiple workload types declared, the conflicts are resolved as follows:

X vs unspecified -> X
batch vs interactive -> batch - under the assumption that batch is safer, because it would not trigger load shedding as eagerly as interactive

Effective service level

Actual values of service level's options may come from different service levels, not only from the one user is assigned with. This can be achieved by assigning one role to another.

For instance: There are 2 roles: role1 and role2. Role1 is assigned with sl1 (timeout = 2s, workload_type = interactive) and role2 is assigned with sl2 (timeout = 10s, workload_type = batch). Then, if we grant role1 to role2, the user with role2 will have 2s timeout (from sl1) and batch workload type (from sl2).

To see detail how the options are merged, check combining service levels section.

To facilitate insight into which values come from which service level, there is LIST EFFECTIVE SERVICE LEVEL OF <role_name> command.

The command displays a table with: option name, effective service level the value comes from and the option value.

> LIST EFFECTIVE SERVICE LEVEL OF role2;

 service_level_option | effective_service_level | value
----------------------+-------------------------+-------------
        workload_type |                     sl2 |       batch
              timeout |                     sl1 |          2s

6.2 KiB Raw Blame History