scylladb/docs/dev/object_storage.md
Robert Bindar e3a3508960 Move object_storage.yaml endpoints to scylla.yaml
This change also removes the `object_storage.yaml` file
altogether and adds tests for fetching the endpoints
via the `v2/config/object_storage_endpoints` REST api.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2025-03-31 13:39:39 +03:00


# Keeping sstables on S3
One of the ways to use object storage is to keep sstables directly on it as objects.
## Enabling the feature
Currently, the object-storage backend works if `keyspace-storage-options` is listed
in `experimental_features` in `scylla.yaml`, for example:
```yaml
experimental_features:
- keyspace-storage-options
```
It can also be enabled with the `--experimental-features=keyspace-storage-options`
command line option when launching Scylla.
## Configuring AWS S3 access
You can define endpoint details in the `scylla.yaml` file. For example:
```yaml
object_storage_endpoints:
  - name: s3.us-east-1.amazonaws.com
    port: 443
    https: true
    aws_region: us-east-1
```
### Local/Development Environment
In a local or development environment, you usually need to set authentication tokens in environment variables to ensure the client works properly. For instance:
```sh
export AWS_ACCESS_KEY_ID=EXAMPLE_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=EXAMPLE_SECRET_ACCESS_KEY
```
Additionally, you may set `AWS_SESSION_TOKEN`, although this is not typically necessary for local or development environments:
```sh
export AWS_ACCESS_KEY_ID=EXAMPLE_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=EXAMPLE_SECRET_ACCESS_KEY
export AWS_SESSION_TOKEN=EXAMPLE_TEMPORARY_SESSION_TOKEN
```
### Important Note
The examples above are intended for development or local environments. You should *never* use this approach in production. The Scylla S3 client will first attempt to access credentials from environment variables. If it fails to obtain credentials, it will then try to retrieve them from the AWS Security Token Service (STS) or the EC2 Instance Metadata Service.
For the EC2 Instance Metadata Service to function correctly, no additional configuration is required. However, STS requires the IAM Role ARN to be defined in the `scylla.yaml` file, as shown below:
```yaml
object_storage_endpoints:
  - name: s3.us-east-1.amazonaws.com
    port: 443
    https: true
    aws_region: us-east-1
    iam_role_arn: arn:aws:iam::123456789012:instance-profile/my-instance-instance-profile
```
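The lookup order described above can be pictured as a simple fallback chain. The sketch below is illustrative only and is not Scylla's implementation: the function name `pick_credential_source` and its return labels are hypothetical, and trying STS before the EC2 Instance Metadata Service whenever `iam_role_arn` is set is an assumption, since the text does not state the order between those two.

```python
import os


def pick_credential_source(env, iam_role_arn=None):
    """Return a label for the credential source suggested by the text above.

    env          -- mapping of environment variables (e.g. os.environ)
    iam_role_arn -- value of iam_role_arn from scylla.yaml, if configured
    """
    # Environment variables are consulted first.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    # STS needs the role ARN from scylla.yaml (ordering is an assumption).
    if iam_role_arn:
        return "sts"
    # The instance metadata service needs no extra configuration.
    return "instance-metadata"


if __name__ == "__main__":
    print(pick_credential_source(os.environ))
```

A helper like this can be handy in deployment scripts to flag a production node that still has development credentials exported in its environment.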
## Creating a keyspace
The sstable location is keyspace-scoped. To create a keyspace with S3
storage, use `CREATE KEYSPACE` with `STORAGE = { 'type': 'S3', 'endpoint': '$endpoint_name', 'bucket': '$bucket' }`
parameters, where `$endpoint_name` must match the `name`
of an endpoint configured under `object_storage_endpoints` in `scylla.yaml`.

In the following example, an endpoint named "s3.us-east-2.amazonaws.com" is
defined in `scylla.yaml`, and this endpoint is used when creating the
keyspace "ks".

In `scylla.yaml`:
```yaml
object_storage_endpoints:
  - name: s3.us-east-2.amazonaws.com
    port: 443
    https: true
    aws_region: us-east-2
```
and when creating the keyspace:
```cql
CREATE KEYSPACE ks
    WITH REPLICATION = {
        'class' : 'NetworkTopologyStrategy',
        'replication_factor' : 1
    }
    AND STORAGE = {
        'type' : 'S3',
        'endpoint' : 's3.us-east-2.amazonaws.com',
        'bucket' : 'bucket-for-testing'
    };
```
```
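When keyspaces are created from a script, the requirement that the `endpoint` value match a configured `name` can be checked before the statement is sent. The following is a minimal sketch under stated assumptions: `create_keyspace_cql` is a hypothetical helper, the list of configured endpoint names is passed in by hand rather than read from `scylla.yaml`, and the inputs are trusted (no quoting or escaping is done).

```python
def create_keyspace_cql(keyspace, endpoint, bucket, configured_endpoints,
                        replication_factor=1):
    """Build a CREATE KEYSPACE statement that keeps sstables on S3.

    configured_endpoints -- the `name` values listed under
    object_storage_endpoints in scylla.yaml (supplied by the caller).
    """
    # The endpoint must match the name of a configured endpoint.
    if endpoint not in configured_endpoints:
        raise ValueError(f"endpoint {endpoint!r} is not configured in scylla.yaml")
    return (
        f"CREATE KEYSPACE {keyspace}\n"
        "WITH REPLICATION = {\n"
        "    'class' : 'NetworkTopologyStrategy',\n"
        f"    'replication_factor' : {replication_factor}\n"
        "}\n"
        "AND STORAGE = {\n"
        "    'type' : 'S3',\n"
        f"    'endpoint' : '{endpoint}',\n"
        f"    'bucket' : '{bucket}'\n"
        "};"
    )


if __name__ == "__main__":
    print(create_keyspace_cql(
        "ks",
        "s3.us-east-2.amazonaws.com",
        "bucket-for-testing",
        configured_endpoints=["s3.us-east-2.amazonaws.com"],
    ))
```

The returned string can be pasted into `cqlsh` or passed to whichever CQL driver is in use.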
# Copying sstables to S3 (backup)
Sstables can be uploaded from the data/ directory to S3 via the API. Doing it this
way keeps all the resources the operation needs (disk IO bandwidth and IOPS, CPU
time, networking bandwidth) under Seastar's control, so the regular Scylla workload
is not randomly affected.
The API endpoint name is `/storage_service/backup` and its Swagger description can be
found [here](./api/api-doc/storage_service.json). Accepted parameters are:
* *keyspace*: the keyspace to copy sstables from
* *table*: the table to copy sstables from
* *snapshot*: the snapshot name to copy sstables from
* *endpoint*: the `name` of an endpoint configured under `object_storage_endpoints` in `scylla.yaml`
* *bucket*: bucket name to put sstables' files in
* *prefix*: prefix to put sstables' files under
Currently only snapshot backup is possible, so one first needs to take a [snapshot](docs/kb/snapshots.rst).
All tables in a keyspace are uploaded. The destination object names will look like
`s3://bucket/some/prefix/to/store/data/.../sstable`
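A backup request built from the parameters listed above might look like the sketch below. It rests on several assumptions that are not stated in this document: that the REST API listens on `http://localhost:10000`, that the call is a `POST`, and that the parameters are passed in the query string. `backup_request` is a hypothetical helper, and the Swagger description remains the authoritative reference.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def backup_request(keyspace, table, snapshot, endpoint, bucket, prefix,
                   api_base="http://localhost:10000"):
    """Build (but do not send) a request to /storage_service/backup."""
    query = urlencode({
        "keyspace": keyspace,
        "table": table,
        "snapshot": snapshot,
        "endpoint": endpoint,   # must name an endpoint configured in scylla.yaml
        "bucket": bucket,
        "prefix": prefix,
    })
    # POST and the api_base default are assumptions, see the text above.
    return Request(f"{api_base}/storage_service/backup?{query}", method="POST")


if __name__ == "__main__":
    req = backup_request("ks", "cf", "my-snapshot",
                         "s3.us-east-2.amazonaws.com",
                         "bucket-for-testing", "some/prefix")
    print(req.get_method(), req.full_url)
    # To actually send it to a running node:
    # with urlopen(req) as resp:
    #     print(resp.status, resp.read().decode())
```

Building the request separately from sending it makes it easy to inspect the final URL before touching a live node.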