mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-07-26 01:53:02 +00:00

T

Chris LuandGitHub 525900dfe4 fix(s3api): backfill multipart SSE-S3 metadata at completion (#9224 )

* fix(s3api): backfill missing per-chunk SSE-S3 metadata at completion

When a part of an SSE-S3 multipart upload lands with SseType=NONE on
its chunks (e.g. a transient failure to apply SSE-S3 setup in
PutObjectPart), the completed object inherits NONE-tagged chunks and
detectPrimarySSEType then misses the chunked SSE-S3 encryption. The
read path falls through to the unencrypted serve and GET returns
ciphertext, producing the SHA mismatch reported in #8908.

Recover at completion using the base IV and key data the upload
directory recorded at CreateMultipartUpload:

  - extractMultipartSSES3Info validates upload-entry metadata up
    front and hard-fails completion if the base IV or key data are
    malformed; serializing chunk metadata we then could not decrypt
    is worse than rejecting the upload.
  - completedMultipartChunk re-derives a per-chunk IV from baseIV +
    chunk.Offset (matching what putToFiler would have written) and
    serializes per-chunk SSE-S3 metadata when the chunk has no tag.
    Existing per-chunk metadata is left alone; we cannot recover an
    already-derived IV from the upload-entry alone.

The IV formula intentionally has no partNumber term: putToFiler
hardcodes partOffset=0 when it calls handleSSES3MultipartEncryption
for every part, so each chunk's encryption IV is
calculateIVWithOffset(baseIV, chunk.Offset_part_local).
PartOffsetMultiplier is defined in s3_constants but is not consumed
by the encryption path. Adopting (partNumber-1)*PartOffsetMultiplier
+ chunk.Offset would produce IVs that fail to decrypt the bytes on
disk - a stronger failure mode than the bug being fixed. Tests pin
this:

  - TestCompletedMultipartChunkBackfilledIVDecryptsActualCiphertext
    runs the round trip across the encryption boundary: encrypt
    parts with CreateSSES3EncryptedReaderWithBaseIV (the call
    putToFiler uses), drop chunk metadata to reproduce #8908,
    backfill, decrypt with backfilled IV, assert plaintext intact.
  - TestCompletedMultipartChunkRejectsPartNumberMultiplierFormula
    constructs the IV the partNumber formula would produce and
    shows it does not decrypt the actual ciphertext.

This commit covers the chunk-level recovery only. The companion
fix for the object-level Extended attributes (SeaweedFSSSES3Key /
X-Amz-Server-Side-Encryption) follows separately.

* fix(s3api): backfill canonical SSE-S3 attributes onto multipart object

The previous commit ensures every chunk of an SSE-S3 multipart upload
carries SseType=SSE_S3 with a per-chunk IV, so the multipart-direct
read path can decrypt. The completed object's Extended map can still
miss the canonical pair detectPrimarySSEType and IsSSES3EncryptedInternal
look at:

  - X-Amz-Server-Side-Encryption (the AmzServerSideEncryption header
    detectPrimarySSEType reads on inline / small-object reads)
  - x-seaweedfs-sse-s3-key (SeaweedFSSSES3Key, required by
    IsSSES3EncryptedInternal and by the read-path key lookup)

When a part of the upload was written by a path that did not set
those (the same #8908 race that produced the NONE chunks),
copySSEHeadersFromFirstPart finds nothing to copy and the final entry
ends up with only the multipart-init keys (SeaweedFSSSES3Encryption /
BaseIV / KeyData). The read path then mis-detects the object as
unencrypted.

applyMultipartSSES3HeadersFromUploadEntry writes the canonical pair
from the multipart-init metadata in all three completion paths
(versioned, suspended, non-versioned), only when the keys are missing
so a healthy first part still wins. extractMultipartSSES3Info already
ran in prepareMultipartCompletionState, so the data is reused without
re-decoding.

Tests: TestApplyMultipartSSES3HeadersFromUploadEntry covers backfill,
do-not-clobber, and nil-info no-op cases.

* fix(s3api): drop double IV adjustment in SSE-KMS chunk view decrypt

decryptSSEKMSChunkView was pre-adjusting the SSE-KMS chunk IV
(calculateIVWithOffset(baseIV, ChunkOffset)) and then handing the
adjusted IV to CreateSSEKMSDecryptedReader, which itself runs
calculateIVWithOffset(IV, ChunkOffset) on whatever it receives. The
offset was being applied twice for any chunk with a non-zero
ChunkOffset, corrupting the keystream for range reads that cross
multipart chunk boundaries.

Pass the raw SSE-KMS key (with base IV and the original ChunkOffset
field) into CreateSSEKMSDecryptedReader so the offset is applied
exactly once, and remove the now-dead intra-block skip that was
compensating for the double adjustment.

Add an anti-test inside TestSSEKMSDecryptChunkView_RequiresOffsetAdjustment
that decrypts the same ciphertext with a deliberately double-adjusted
IV and asserts the output is corrupted, so any regression that
re-introduces the double application fails the unit test.

* test(s3): cover multipart SSE across chunk-spanning parts and ranges

Adds an integration subtest "Multipart Parts Larger Than Internal
Chunks Across SSE Types" to TestSSEMultipartUploadIntegration that
exercises the end-to-end S3 path for the bugs fixed in this branch:

  - Two-part multipart upload with each part larger than the 8MB
    internal SeaweedFS chunk, so each part itself spans multiple
    underlying chunks.
  - Subtests for SSE-C, SSE-KMS, explicit SSE-S3, and bucket-default
    SSE-S3 - the four paths multipart parts can take through the SSE
    pipeline.
  - Each subtest does a full GET (verifying every byte and the
    response Content-Length / SSE response headers) plus a 129-byte
    range read straddling the 8MB internal chunk boundary, which is
    the path that produced the SSE-KMS double-IV corruption (fix in
    the previous commit) and the SSE-S3 chunk-tag loss (fix in the
    earlier commits).

Factored the request shape behind multipartSSEOptions /
uploadAndVerifyMultipartSSEObject so all four SSE flavors share the
same upload+verify code; only the SSE-specific input/output
configuration differs per subtest.

* test(s3): abort orphan multipart uploads on test failure

Address coderabbit nitpick on uploadAndVerifyMultipartSSEObject. The
helper used require.NoError after CreateMultipartUpload, UploadPart
and CompleteMultipartUpload, so a failure in any of those (or in the
later GET / range read on a still-incomplete upload) called t.Fatal
without aborting the in-flight MPU, leaving an orphan upload in the
bucket. Harmless in CI where the data dir is wiped on shutdown, but a
real annoyance when iterating locally and a textbook AWS S3 caveat in
production.

Register a t.Cleanup that calls AbortMultipartUpload unless a
"completed" flag was set right after a successful
CompleteMultipartUpload. Use context.Background for the abort call
since the parent ctx may already be cancelled at cleanup time, and
t.Logf the abort error rather than failing the test so the original
failure remains visible in the run output.

2026-04-25 23:06:37 -07:00

.github

fix(helm): gate S3 TLS cert args on httpsPort to stop probe failures (#9202 ) (#9206 )

2026-04-23 15:00:07 -07:00

.superset

chore: remove ~50k lines of unreachable dead code (#8913 )

2026-04-03 16:04:27 -07:00

cmd

Move SQL engine and PostgreSQL server to their own binaries (#8417 )

2026-02-23 16:27:08 -08:00

docker

ci(pjdfstest): cache docker layers via GHA to avoid apt mirror flakes (#9106 )

2026-04-16 12:50:19 -07:00

k8s/charts

fix(helm): gate S3 TLS cert args on httpsPort to stop probe failures (#9202 ) (#9206 )

2026-04-23 15:00:07 -07:00

note

docs(note): add production-setup slide deck

2026-04-23 02:36:58 -07:00

other

peer chunk sharing 1/8: proto definitions (#9130 )

2026-04-18 20:02:55 -07:00

postgres-examples

Message Queue: Add sql querying (#7185 )

2025-09-09 01:01:03 -07:00

seaweed-volume

build(deps): bump rustls-webpki from 0.103.10 to 0.103.13 in /seaweed-volume (#9216 )

2026-04-24 11:44:20 -07:00

seaweedfs-rdma-sidecar

build(deps): bump rand from 0.9.2 to 0.9.4 in /seaweedfs-rdma-sidecar/rdma-engine (#9065 )

2026-04-13 22:51:00 -07:00

snap

move to https://github.com/seaweedfs/seaweedfs

2022-07-29 00:17:28 -07:00

sw-block/design

doc: P14 S8 final bounded close — evidence matrix + P15 handoff (#9142 )

2026-04-20 02:24:44 -07:00

telemetry

fix(telemetry): use correct TopologyId field in integration test (#8714 )

2026-03-20 22:15:05 -07:00

test

fix(s3api): backfill multipart SSE-S3 metadata at completion (#9224 )

2026-04-25 23:06:37 -07:00

unmaintained

go fix

2026-02-20 18:42:00 -08:00

util

util: added gostd script

2019-04-30 03:23:20 +00:00

weed

fix(s3api): backfill multipart SSE-S3 metadata at completion (#9224 )

2026-04-25 23:06:37 -07:00

.gitignore

Rust volume server implementation with CI (#8539 )

2026-03-26 17:24:35 -07:00

backers.md

chore: add nimbus web services to backers.md (#4769 )

2023-08-20 15:31:23 -07:00

CODE_OF_CONDUCT.md

add code of conduct (#4109 )

2023-01-05 11:01:22 -08:00

go.mod

build(deps): bump github.com/Azure/go-ntlmssp from 0.1.0 to 0.1.1 (#9205 )

2026-04-23 15:02:17 -07:00

go.sum

build(deps): bump github.com/Azure/go-ntlmssp from 0.1.0 to 0.1.1 (#9205 )

2026-04-23 15:02:17 -07:00

install.sh

Rust volume server implementation with CI (#8539 )

2026-03-26 17:24:35 -07:00

LICENSE

Update LICENSE, fix copyright license year (#6405 )

2025-01-01 01:55:42 -08:00

Makefile

(fix): Add templ install step in admin-generate (#8997 )

2026-04-08 19:23:18 -07:00

README.md

Remove AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY exports

2026-01-14 13:09:11 -08:00

SECURITY.md

Add security policy for vulnerability reporting

2026-04-17 09:51:21 -07:00

VOLUME_SERVER_RUST_PLAN.md

Rust volume server implementation with CI (#8539 )

2026-03-26 17:24:35 -07:00

README.md

SeaweedFS

SeaweedFS is an independent Apache-licensed open source project with its ongoing development made possible entirely thanks to the support of these awesome backers. If you'd like to grow SeaweedFS even stronger, please consider joining our sponsors on Patreon.

Your support will be really appreciated by me and other supporters!

Quick Start

Quick Start with weed mini

The easiest way to get started with SeaweedFS for development and testing:

Download the latest binary from https://github.com/seaweedfs/seaweedfs/releases and unzip a single binary file weed or weed.exe.

Example:

# remove quarantine on macOS
# xattr -d com.apple.quarantine  ./weed

./weed mini -dir=/data

This single command starts a complete SeaweedFS setup with:

Master UI: http://localhost:9333
Volume Server: http://localhost:9340
Filer UI: http://localhost:8888
S3 Endpoint: http://localhost:8333
WebDAV: http://localhost:7333
Admin UI: http://localhost:23646

Perfect for development, testing, learning SeaweedFS, and single node deployments!

Quick Start for S3 API on Docker

docker run -p 8333:8333 chrislusf/seaweedfs server -s3

Quick Start with Single Binary

Download the latest binary from https://github.com/seaweedfs/seaweedfs/releases and unzip a single binary file weed or weed.exe. Or run go install github.com/seaweedfs/seaweedfs/weed@latest.
export AWS_ACCESS_KEY_ID=admin ; export AWS_SECRET_ACCESS_KEY=key as the admin credentials to access the object store.
Run weed server -dir=/some/data/dir -s3 to start one master, one volume server, one filer, and one S3 gateway. The difference with weed mini is that weed mini can auto configure based on the single host environment, while weed server requires manual configuration and are designed for production use.

Also, to increase capacity, just add more volume servers by running weed volume -dir="/some/data/dir2" -master="<master_host>:9333" -port=8081 locally, or on a different machine, or on thousands of machines. That is it!

Introduction

SeaweedFS is a simple and highly scalable distributed file system. There are two objectives:

to store billions of files!
to serve the files fast!

SeaweedFS started as a blob store to handle small files efficiently. Instead of managing all file metadata in a central master, the central master only manages volumes on volume servers, and these volume servers manage files and their metadata. This relieves concurrency pressure from the central master and spreads file metadata into volume servers, allowing faster file access (O(1), usually just one disk read operation).

There is only 40 bytes of disk storage overhead for each file's metadata. It is so simple with O(1) disk reads that you are welcome to challenge the performance with your actual use cases.

SeaweedFS started by implementing Facebook's Haystack design paper. Also, SeaweedFS implements erasure coding with ideas from f4: Facebook’s Warm BLOB Storage System, and has a lot of similarities with Facebook’s Tectonic Filesystem and Google's Colossus File System

On top of the blob store, optional Filer can support directories and POSIX attributes. Filer is a separate linearly-scalable stateless server with customizable metadata stores, e.g., MySql, Postgres, Redis, Cassandra, HBase, Mongodb, Elastic Search, LevelDB, RocksDB, Sqlite, MemSql, TiDB, Etcd, CockroachDB, YDB, etc.

SeaweedFS can transparently integrate with the cloud. With hot data on local cluster, and warm data on the cloud with O(1) access time, SeaweedFS can achieve both fast local access time and elastic cloud storage capacity. What's more, the cloud storage access API cost is minimized. Faster and cheaper than direct cloud storage!

System	File Metadata	File Content Read	POSIX	REST API	Optimized for large number of small files
SeaweedFS	lookup volume id, cacheable	O(1) disk seek		Yes	Yes
SeaweedFS Filer	Linearly Scalable, Customizable	O(1) disk seek	FUSE	Yes	Yes
GlusterFS	hashing		FUSE, NFS
Ceph	hashing + rules		FUSE	Yes
MooseFS	in memory		FUSE		No
MinIO	separate meta file for each file			Yes	No

SeaweedFS	comparable to Ceph	advantage
Master	MDS	simpler
Volume	OSD	optimized for small files
Filer	Ceph FS	linearly scalable, Customizable, O(1) or O(logN)

README.md Unescape Escape

SeaweedFS

Sponsor SeaweedFS via Patreon

Gold Sponsors

Table of Contents

Quick Start

Quick Start with weed mini

Quick Start for S3 API on Docker

Quick Start with Single Binary

Introduction

Features

Additional Blob Store Features

Filer Features

Kubernetes

Example: Using Seaweed Blob Store

Start Master Server

Start Volume Servers

Write A Blob

Save Blob Id

Read a Blob

Rack-Aware and Data Center-Aware Replication

Allocate Blob Key on Specific Data Center

Other Features

Blob Store Architecture

Master Server and Volume Server

Write and Read files

Saving memory

Tiered Storage to the cloud

SeaweedFS Filer

Compared to Other File Systems

Compared to HDFS

Compared to GlusterFS, Ceph

Compared to GlusterFS

Compared to MooseFS

Compared to Ceph

Compared to MinIO

Dev Plan

Installation Guide

Disk Related Topics

Hard Drive Performance

Solid State Disk

Benchmark

Run WARP and launch a mixed benchmark.

Enterprise

License

Stargazers over time

README.md