seaweedfs/weed
Chris Lu 0a91b57f16 fix(s3): encrypt SSE-S3 KEK at rest with AES-GCM wrapping (#8880)
* fix(s3): encrypt SSE-S3 KEK at rest using passphrase-derived wrapping key

* fix(s3): surface KEK migration failures instead of silently dropping them

The legacy-plaintext -> encrypted-at-rest path used to swallow both
wrapKEK and updateKEKContent errors. An operator who configured a
passphrase but had a filer permission issue (or a wrap failure) saw
nothing in the logs and the KEK stayed on disk in plaintext, with the
migration retried on every restart and silently failing every time.

Log each failure path explicitly so the unmigrated state is visible.
The server still starts with the in-memory key loaded — refusing to
boot here would be worse than the warning.

Addresses gemini and coderabbit reviews on PR #8880.

* fix(s3): use a per-installation random salt for KEK wrapping HKDF

The original implementation hardcoded
"seaweedfs-sse-s3-kek-wrapping-v1" as the HKDF salt for the KEK
wrapping key. Two SeaweedFS installations using the same passphrase
therefore derived byte-identical wrapping keys, which allowed
precomputation (rainbow-table) attacks against weaker passphrases.

Generate a random 32-byte salt every time wrapKEK runs and embed it
in the on-disk payload alongside the AES-GCM ciphertext. The new
format is base64(magic("SWv2") || salt || nonce || ciphertext+tag);
unwrapKEK detects the magic and reads the salt out of the payload.
KEKs wrapped under the legacy fixed-salt format still unwrap cleanly
and are opportunistically re-wrapped into v2 on the next load so
operators get the stronger format without manual migration.

Addresses gemini review on PR #8880.
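
A minimal sketch of the v2 layout described above, assuming AES-256-GCM
with a 12-byte nonce and HKDF-SHA256 implemented with the standard library.
The HKDF info string, the nonce size, and the function signatures are
assumptions made for illustration; they are not taken from the SeaweedFS
source, and the legacy v1 fallback is left out.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
)

const (
	magicV2   = "SWv2" // magic prefix named in the commit message
	saltSize  = 32     // per-wrap random salt, as described above
	nonceSize = 12     // default AES-GCM nonce size (assumption)
)

// deriveKey is HKDF-SHA256 (RFC 5869) limited to a single 32-byte output
// block. The info string is a placeholder, not the project's real value.
func deriveKey(passphrase string, salt []byte) []byte {
	extract := hmac.New(sha256.New, salt)
	extract.Write([]byte(passphrase))
	prk := extract.Sum(nil)

	expand := hmac.New(sha256.New, prk)
	expand.Write([]byte("kek-wrapping-sketch"))
	expand.Write([]byte{0x01})
	return expand.Sum(nil)
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// wrapKEK returns base64(magic || salt || nonce || ciphertext+tag).
func wrapKEK(passphrase string, kek []byte) (string, error) {
	salt := make([]byte, saltSize)
	nonce := make([]byte, nonceSize)
	if _, err := rand.Read(salt); err != nil {
		return "", err
	}
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	gcm, err := newGCM(deriveKey(passphrase, salt))
	if err != nil {
		return "", err
	}
	payload := append([]byte(magicV2), salt...)
	payload = append(payload, nonce...)
	payload = gcm.Seal(payload, nonce, kek, nil) // appends ciphertext+tag
	return base64.StdEncoding.EncodeToString(payload), nil
}

// unwrapKEK checks the magic, reads the salt and nonce out of the payload,
// re-derives the wrapping key, and authenticates the ciphertext.
func unwrapKEK(passphrase, encoded string) ([]byte, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, err
	}
	header := len(magicV2) + saltSize + nonceSize
	if len(raw) < header || string(raw[:len(magicV2)]) != magicV2 {
		return nil, errors.New("not a v2-wrapped KEK")
	}
	salt := raw[len(magicV2) : len(magicV2)+saltSize]
	nonce := raw[len(magicV2)+saltSize : header]
	gcm, err := newGCM(deriveKey(passphrase, salt))
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, nonce, raw[header:], nil)
}

func main() {
	wrapped, err := wrapKEK("example-passphrase", make([]byte, 32))
	if err != nil {
		panic(err)
	}
	kek, err := unwrapKEK("example-passphrase", wrapped)
	fmt.Println(len(kek), err)
}
```

Because the salt is drawn fresh on every wrap, two wraps of the same KEK
under the same passphrase yield different payloads and different derived
wrapping keys.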

* fix(s3): plumb KEK passphrase from env into the global key manager

InitializeGlobalSSES3KeyManager used to ignore the kekPassphrase
field entirely — the global manager was always constructed with the
empty constructor, so the encrypted-at-rest code path never engaged
in production. Read the passphrase from WEED_S3_SSE_KEK_PASSPHRASE
and apply it before InitializeWithFiler so the load path picks up the
encrypted format. Log a warning when the env var is unset to make the
plaintext fallback visible to operators upgrading from earlier builds.

Adds SetKEKPassphrase as the public seam used by the global init and
by tests, plus regression tests for the wrap/unwrap round-trip,
random-salt independence across managers sharing a passphrase, and
the no-passphrase fallback that preserves the legacy hex-decode path.

Addresses coderabbit review on PR #8880.

* fix(s3): drop redundant base64 decode in KEK migration check

unwrapKEK already does the base64 decode and the magic-prefix check;
isV2WrappedKEK was repeating both passes purely so the migration
branch in loadSuperKeyFromFiler could ask "was this v1 or v2". Have
unwrapKEK return the version flag directly and delete the redundant
helper. Single decode pass per load.

Addresses gemini review on PR #8880.

* fix(s3): updateKEKContent must overwrite, not create

filer_pb.MkFile maps to CreateEntry, which is O_EXCL: it fails with
ErrEntryAlreadyExists when the file already exists. Both KEK migration
paths (legacy v1→v2 rewrap and plaintext→encrypted) call
updateKEKContent against the entry they just read, so MkFile errored
out every time and the migrations only ran in memory while the on-disk
KEK stayed in its old format. The previous commit logged the failure
loudly but the result was the same: a pre-existing deployment never
got migrated.

Switch updateKEKContent to LookupDirectoryEntry + UpdateEntry so the
overwrite actually persists. Surface the lookup/update errors so
the caller's existing "migration failed; KEK still on disk in old
form" warnings fire on the right cases.

Addresses coderabbit critical review on PR #8880.

* chore(s3): drop unused generateAndSaveSuperKeyToFiler

Master removed this as dead code in #8913 and I reintroduced it during
the merge resolution thinking the migration paths still needed it.
On second look it has no callers in this branch — every KEK-creation
path on PR #8880 goes through the existing reader code that handles
"file not present, generate" inline. Drop the duplicate.

Addresses gemini medium review on PR #8880.

* fix(s3): updateKEKContent honours km.kekPath instead of hardcoded path

The migration write path was always pointing at /etc/s3/sse_kek even
when the manager was configured with an operator-overridden kekPath.
Split km.kekPath at the last "/" so the lookup + UpdateEntry land on
the same file the read path used. Defaults match defaultKEKPath when
kekPath is unset.

Addresses gemini medium review on PR #8880.
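
A sketch of splitting the configured path at the last "/" into a directory
and a file name. The helper name splitKEKPath and its handling of root-level
paths are assumptions; the default value is the path quoted in the message
above.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultKEKPath is the path the commit message says was previously hardcoded.
const defaultKEKPath = "/etc/s3/sse_kek"

// splitKEKPath (hypothetical helper) returns the directory and entry name
// for a KEK path, falling back to defaultKEKPath when none is configured.
func splitKEKPath(kekPath string) (dir, name string) {
	if kekPath == "" {
		kekPath = defaultKEKPath
	}
	idx := strings.LastIndex(kekPath, "/")
	switch {
	case idx < 0:
		// No separator at all: treat the value as a name under the root.
		return "/", kekPath
	case idx == 0:
		// File directly under the root, e.g. "/kek".
		return "/", kekPath[1:]
	default:
		return kekPath[:idx], kekPath[idx+1:]
	}
}

func main() {
	dir, name := splitKEKPath("/custom/keys/kek")
	fmt.Println(dir, name)
}
```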

* fix(s3): KEK passphrase via Viper key, with env-var fallback

The KEK passphrase was read straight from os.Getenv, but every other
SSE-S3 secret (s3.sse.kek, s3.sse.key) goes through Viper so an
operator can set them in security.toml or via WEED_ env vars
interchangeably. Add s3.sse.kek.passphrase to the same path; the
existing SSES3KEKPassphraseEnv lookup stays as a fallback so
deployments wired before this commit keep working.

Addresses gemini medium review on PR #8880.
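
The lookup order described above, config key first and environment variable
second, can be sketched without Viper. The map stands in for the parsed
configuration and resolveKEKPassphrase is a hypothetical helper; only the
key name s3.sse.kek.passphrase and the variable WEED_S3_SSE_KEK_PASSPHRASE
come from the commit message.

```go
package main

import (
	"fmt"
	"os"
)

const (
	kekPassphraseKey = "s3.sse.kek.passphrase"      // config key from the message
	kekPassphraseEnv = "WEED_S3_SSE_KEK_PASSPHRASE" // env var from the message
)

// resolveKEKPassphrase prefers the configured value and falls back to the
// environment. getenv is injected so the lookup can be tested in isolation.
func resolveKEKPassphrase(config map[string]string, getenv func(string) string) string {
	if v := config[kekPassphraseKey]; v != "" {
		return v
	}
	return getenv(kekPassphraseEnv)
}

func main() {
	config := map[string]string{} // stand-in for values read from security.toml
	if resolveKEKPassphrase(config, os.Getenv) == "" {
		fmt.Println("warning: no KEK passphrase set; KEK is stored in plaintext")
	}
}
```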
2026-05-04 19:21:41 -07:00