Because the concept of pushing reading range does not work for the wrapping
we do (i.e. encryption), there is no point having it here. We need to do
said range handling higher up.
Also, must allow multi-layered wrapping.
For efficiency, the cardinality of the bloom filter
(i.e. the number of partition keys which will be written into the sstable)
has to be known before elements are inserted into the filter.
In some cases (e.g. memtables flush) this number is known exactly.
But in others (e.g. repair) it can only be estimated,
and the estimation might be very wrong, leading to an oversized filter.
Because of that, some time ago we added a piece of logic
(ran after the sstable is written, but before it's sealed)
which looks at the actual number of written partitions,
compares it to the initial estimate (on which the size of the bloom
filter was based on), and if the difference is unacceptably large,
it rewrites the bloom filter from partition keys contained in Index.db.
But the idea to rebuild the bloom filters from index files
isn't going to work with BTI indexes, because they don't store
whole partition keys. If we want sstables which don't have Index.db
files, we need some other way to deal with oversized filters.
Partition keys can be recovered from Data.db,
but that would often be way too expensive.
This patch adds another way. We introduce a new component file,
TemporaryHashes. This component, if written at all,
contains the 16-byte murmur hash for every partition key, in order,
and can be used in place of Index to reconstruct the bloom filter.
(Our bloom filters are actually built from the set of murmur hashes of
partition keys. The first step of inserting a partition key into a
filter is hashing the key. Remembering the hashes is sufficient
to build the filter later, without looking at partition keys again.)
As of this patch, if the Index component is not being written,
we don't allocate and populate a bloom filter during the Data.db write.
Instead, we write the murmur hashes to TemporaryHashes, and only
later, after the Data write finishes, we allocate the optimal-size,
bloom filter, we read the hashes back from TemporaryHashes,
and we populate the filter with them.
That is suboptimal.
Writing the hashes to disk (or worse, to S3) and reading
them back is more expensive than building the bloom filter
during the main Data pass.
So ideally it should be avoided in cases where we know
in advance that the partition key count estimate is good enough.
(Which should be the case in flushes and compactions).
But we defer that to a future patch.
(Such a change would involve passing some flag to the sstable writer
if the cardinality estimate is trustworthy, and not creating
TemporaryHashes if the estimate is trustworthy).
BTI indexes are made up of Partition.db and Rows.db files.
In this patch we introduce the corresponding component types.
In Cassandra, BTI is a separate "sstable format", with a new set
of versions. (I.e. `bti-da`, as opposed to `big-me`).
In this patch series, we are doing something different:
we are introducing version `ms`, which is like `me`, except with
`Index.db` and `Summary.db` replaced with `Partitions.db` and `Rows.db`.
With a setup like that, Scylla won't yet be able to read Cassandra's
BTI (`da`) files, because this patch doesn't teach Scylla
about `da`.
(But the way to that is open. It would just require first implementing
several other things which changed between `me` and `da`).
(And, naturally Cassandra will reject `ms` sstables.
But this isn't the first time we are breaking file
compatibility with Cassandra to some degree.
Other examples include encryption and dictionary compression).
Note: Partitions.db and Rows.db contain prefixes of keys,
which is sensitive information, so they have to be encrypted.
The latter is recommended in seastar, and the former was left as
compatibility alias. Latest seastar explicitly marks it as deprecated so
once the submodule is updated, compilation logs will explode.
Most of the patch is generated with
for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done
for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done
and a small manual change in test/perf/perf.hh
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#26136
Fixes#22106
Moves the shared compress components to sstables, and rename to
match class type.
Adjust includes, removing redundant/unneeded ones where possible.
Closesscylladb/scylladb#25103
This PR introduces a new Key Provider to support Azure Key Vault as a Key Management System (KMS) for Encryption at Rest. The core design principle is the same as in the AWS and GCP key providers - an externally provided Vault key that is used to protect local data encryption keys (a process known as "key wrapping").
In more detail, this patch series consists of:
* Multiple Azure credential sources, offering a variety of authentication options (Service Principals, Managed Identities, environment variables, Azure CLI).
* The Azure host - the Key Vault endpoint bridge.
* The Azure Key Provider - the interface for the Azure host.
* Unit tests using real Azure resources (credentials and Vault keys).
* Log filtering logic to not expose sensitive data in the logs (plaintext keys, credentials, access tokens).
This is part of the overall effort to support Azure deployments.
Testing done:
* Unit tests.
* Manual test on an Azure VM with a Managed Identity.
* Manual test with credentials from Azure CLI.
* Manual test of `--azure-hosts` cmdline option.
* Manual test of log filtering.
Remaining items:
- [x] Create necessary Azure resources for CI.
- [x] Merge pipeline changes (https://github.com/scylladb/scylla-pkg/pull/5201).
Closes https://github.com/scylladb/scylla-enterprise/issues/1077.
New feature. No backport is needed.
Closesscylladb/scylladb#23920
* github.com:scylladb/scylladb:
docs: Document the Azure Key Provider
test: Add tests for Azure Key Provider
pylib: Add mock server for Azure Key Vault
encryption: Define and enable Azure Key Provider
encryption: azure: Delegate hosts to shard 0
encryption: Add Azure host cache
encryption: Add config options for Azure hosts
encryption: azure: Add override options
encryption: azure: Add retries for transient errors
encryption: azure: Implement init()
encryption: azure: Implement get_key_by_id()
encryption: azure: Add id-based key cache
encryption: azure: Implement get_or_create_key()
encryption: azure: Add credentials in Azure host
encryption: azure: Add attribute-based key cache
encryption: azure: Add skeleton for Azure host
encryption: Templatize get_{kmip,kms,gcp}_host()
encryption: gcp: Fix typo in docstring
utils: azure: Get access token with default credentials
utils: azure: Get access token from Azure CLI
utils: azure: Get access token from IMDS
utils: azure: Get access token with SP certificate
utils: azure: Get access token with SP secret
utils: rest: Add interface for request/response redaction logic
utils: azure: Declare all Azure credential types
utils: azure: Define interface for Azure credentials
utils: Introduce base64url_{encode,decode}
Do not use default, instead list all fall-through components
explicitly, so if we add a new one, the developer doing that
will be forced to consider what to do here.
Eliminate the `default` case from the switch in
`encryption_file_io_extension::wrap_sink`, and explicitly
handle all `component_type` values within the switch statement.
fixes: https://github.com/scylladb/scylladb/issues/23724Closesscylladb/scylladb#24987
Define the Azure Key Provider to connect the core EaR business logic
with the Azure-based Key Management implementation (Azure host).
Introduce "AzureKeyProviderFactory" as a new `key_provider` value in the
configuration.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
The encryption context maintains a cache per host type per thread.
Add a cache for the Azure host as well. Initialize the cache with Azure
hosts from the configuration, while registering the extensions for
encryption.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Add `make_data_or_index_source` to the `storage` interface, implement it
for `filesystem_storage` storage which just creates `data_source` from a
file and for the `s3_storage` create a (maybe) decrypting source from s3
make_download_source.
This change should solve performance improvement for reading large objects
from S3 and should not affect anything for the `filesystem_storage`.
utils::loading cache has a timer that can, if we're unlucky, be runnnig
while the encryption context/extensions referencing the various host
objects containing them are destroyed in the case of unit testing.
Add a stop phase in encryption context shutdown closing the caches.
schema_extension allows making invisible changes to system_schema
that evade upgrade rollback tests. They appear in system_schema
as an encoded blob which reduces serviceability, as they cannot
be read.
Deprecate it and point users to adding explicit columns in scylla_tables.
We could probably make use of the data structure, after we teach it
to encode its payload into proper named and typed columns instead of
using IDL.
Closesscylladb/scylladb#23151
Adds exception handler + cleanup for the case where we have a
bad config/env vars (hint minio) or similar, such that we fail
with exception during setting up the EAR context.
In a normal startup, this is ok. We will report the exception,
and the do a exit(1).
In tests however, we don't and active context will instead be
freed quite proper, in which case we need to call stop to ensure
we don't crash on shared pointer destruction on wrong shard.
Doing so will hide the real issue from whomever runs the test.
This commit eliminates unused boost header includes from the tree.
Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22857
Fixes#22401
In the fix for scylladb/scylla-enterprise#892, the extraction and check for sstable component encryption mask was copied
to a subroutine for description purposes, but a very important 1 << <value> shift was somehow
left on the floor.
Without this, the check for whether we actually contain a component encrypted can be wholly
broken for some components.
Closesscylladb/scylladb#22398
"sie" is the short for "system info encryption". it is a wrapper around
a `opts` map so we can get the individual option by providing a default
value via an `optional<>` return value. but "sie" could be difficult to
understand without more context. and it is used like a function -- we
get the individual option using its operator().
so, in order to improve the readability, in this change, we rename it
to "get_opt".
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
these misspellings are identified by codespell. they are either in
comment or logging messages. let's fix them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Bulk transfer of EAR functionality. Includes all providers etc.
Could maybe break up into smaller blocks, but once it gets down to
the core of it, would require messing with code instead of just moving.
So this is it.
Note: KMIP support is disabled unless you happen to have the kmipc
SDK in your scylla dir.
Adds optional encryption of sstables and commitlog, using block
level file encryption. Provides key sourcing from various sources,
such as local files or popular KMS systems.