mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-05-23 18:21:28 +00:00

Go to file

Chris Lu d0a09ea178 fix(s3): honor ChecksumAlgorithm on presigned URL uploads (#9076 )

* fix(s3): honor ChecksumAlgorithm on presigned URL uploads

AWS SDK presigners hoist x-amz-sdk-checksum-algorithm (and related
checksum headers) into the signed URL's query string, so servers must
read either location. detectRequestedChecksumAlgorithm only looked at
request headers, so presigned PUTs with ChecksumAlgorithm set validated
and stored no additional checksum, and HEAD/GET never returned the
x-amz-checksum-* header.

Read these parameters from headers first, then fall back to a
case-insensitive query-string lookup. Apply the same fallback when
comparing an object-level checksum value against the computed one.

Fixes #9075

* test(s3): presigned URL checksum integration tests (#9075)

Adds test/s3/checksum with end-to-end coverage for flexible-checksum
behavior on presigned URL uploads. Tests generate a presigned PUT URL
with ChecksumAlgorithm set, upload the body with a plain http.Client
(bypassing AWS SDK middleware so the server must honor the query-string
hoisted x-amz-sdk-checksum-algorithm), then HEAD/GET with
ChecksumMode=ENABLED and assert the stored x-amz-checksum-* header.

Covers SHA256, SHA1, and a negative control with no checksum requested.
Wires the new directory into s3-go-tests.yml as its own CI job.

* perf(s3): parse presigned query once in detectRequestedChecksumAlgorithm

Previously, each header fallback called getHeaderOrQuery, which re-parsed
r.URL.Query() and allocated a new map on every invocation — up to eight
times per PutObject request. Parse the raw query at most once per request
(only when non-empty) and pass the pre-parsed url.Values into a new
lookupHeaderOrQuery helper.

Also drops a redundant strings.ToLower allocation in the case-insensitive
query key scan (strings.EqualFold already handles ASCII case folding).

Addresses review feedback from gemini-code-assist on PR #9076.

* test(s3): honor credential env vars and add presigned upload timeout

- init() now reads S3_ACCESS_KEY/S3_SECRET_KEY (and AWS_ACCESS_KEY_ID /
  AWS_SECRET_ACCESS_KEY / AWS_REGION fallbacks) so that
  `make test-with-server ACCESS_KEY=... SECRET_KEY=...` no longer
  authenticates with hardcoded defaults while the server has been
  started with different credentials.
- uploadViaPresignedURL uses a dedicated http.Client with a 30s timeout
  instead of http.DefaultClient, so a stalled server fails fast in CI
  instead of blocking until the suite's global timeout fires.

Addresses review feedback from coderabbitai on PR #9076.

* test(s3): pass S3_PORT and credentials through to checksum tests

- 'make test' now exports S3_ENDPOINT, S3_ACCESS_KEY, and S3_SECRET_KEY
  derived from the Makefile variables so the Go test process talks to
  the same endpoint/credentials that start-server was launched with.
- start-server cleans up the background SeaweedFS process and PID file
  when the readiness poll times out, preventing stale port conflicts on
  subsequent runs.

Addresses review feedback from coderabbitai on PR #9076.

* ci(s3): raise checksum tests step timeout

make test-with-server builds weed_binary, waits up to 90s for readiness,
then runs go test -timeout=10m. The previous 12-minute step timeout only
had ~2 minutes of headroom over the Go timeout, risking the Actions
runner killing the step before tests reported a real failure.

Bumps the job timeout from 15 to 20 minutes and the step timeout from
12 to 16 minutes, matching other S3 integration jobs.

Addresses review feedback from coderabbitai on PR #9076.

* perf(s3): thread pre-parsed query through putToFiler hot path

Parse the request's query string once at the top of putToFiler and
reuse the resulting url.Values for both the checksum-algorithm detection
and the expected-checksum verification. Previously, the verification
path called getHeaderOrQuery which re-parsed r.URL.Query() again on
every PutObject, defeating the previous commit's single-parse goal.

- Add parseRequestQuery + detectRequestedChecksumAlgorithmQ (the
  pre-parsed-query variant). detectRequestedChecksumAlgorithm is now a
  thin wrapper used by callers that do a single lookup.
- putToFiler parses once and threads the result through both call sites.
- Remove getHeaderOrQuery and update the unit test to use
  lookupHeaderOrQuery directly.

Addresses follow-up review from gemini-code-assist on PR #9076.

* test(s3): check io.ReadAll error in uploadViaPresignedURL helper

* test(s3): drop SHA1 presigned test case

The AWS SDK v2 presigner signs a Content-MD5 header at presign time
for SHA1 PutObject requests even when no body is attached (the MD5 of
the empty payload gets baked into the signed headers). Uploading the
real body via a plain http.Client then trips SeaweedFS's MD5 validation
and returns BadDigest — an SDK/presigner quirk, not a SeaweedFS bug.

The SHA256 positive case already exercises the server-side
query-hoisted algorithm path that issue #9075 is about, and the unit
tests in weed/s3api cover each algorithm's header mapping. Drop the
SHA1 integration case rather than chase SDK-specific workarounds.

* test(s3): provide real Content-MD5 to presigned checksum test

AWS SDK v2's flexible-checksum middleware signs a Content-MD5 header at
presign time. There is no body to hash at that point, so it seeds the
header with MD5 of the empty payload. When the real body is then PUT
with a plain http.Client, SeaweedFS's server-side Content-MD5
verification correctly rejects the upload with BadDigest.

Pre-compute the MD5 of the test body and thread it into
PutObjectInput.ContentMD5 so the signed Content-MD5 matches the body
that will actually be uploaded. The test still exercises the
server-side path that reads X-Amz-Sdk-Checksum-Algorithm from the
query string (the fix that PR #9076 is validating).

* test(s3): send the signed Content-MD5 header on presigned upload

uploadViaPresignedURL now accepts an extraHeaders map so callers can
thread through headers that the presigner signed but the raw http
request would otherwise omit. The SHA256 test passes the Content-MD5
it computed, matching what the presigner baked into the signature.

Fixes SignatureDoesNotMatch seen in CI after the previous commit set
ContentMD5 on the presign input without sending the corresponding
header on the actual upload.

* test(s3): build presigned URL with the raw v4 signer

The AWS SDK v2 s3.PresignClient runs the flexible-checksum middleware
for any PutObject input that carries ChecksumAlgorithm. That middleware
injects a Content-MD5 header at presign time, and with no body present
it seeds MD5-of-empty. Any subsequent upload of a non-empty body
through a plain http.Client then trips SeaweedFS's Content-MD5
verification and returns BadDigest — not the code path that issue
#9075 is about.

Replace the PresignClient usage in the integration test with a direct
call to v4.Signer.PresignHTTP, building a canonical URL whose query
string already contains x-amz-sdk-checksum-algorithm=SHA256. This is
exactly the shape of URL a browser/curl client would receive from any
presigner that hoists the algorithm header, and it exercises the
server-side fix from PR #9076 without dragging in SDK-specific
middleware quirks.

* test(s3): set X-Amz-Expires on presigned URL before signing

v4.Signer.PresignHTTP does not add X-Amz-Expires on its own — the
caller has to seed it into the request's query string so the signer
includes it in the canonical query and the server accepts the
presigned URL. Without it, SeaweedFS correctly returns
AuthorizationQueryParametersError.

Also adds a .gitignore for the make-managed test volume data, log
file, and PID file so local `make test-with-server` runs do not leave
artifacts tracked by git.

Verified by running the integration tests locally:
  make test-with-server → both presigned checksum tests PASS.

2026-04-14 21:52:49 -07:00

.claude

fix(s3): honor ChecksumAlgorithm on presigned URL uploads (#9076 )

2026-04-14 21:52:49 -07:00

.github

fix(s3): honor ChecksumAlgorithm on presigned URL uploads (#9076 )

2026-04-14 21:52:49 -07:00

.superset

chore: remove ~50k lines of unreachable dead code (#8913 )

2026-04-03 16:04:27 -07:00

cmd

Move SQL engine and PostgreSQL server to their own binaries (#8417 )

2026-02-23 16:27:08 -08:00

docker

build(docker): upgrade all Alpine packages in final image (#9070 )

2026-04-14 02:08:15 -07:00

k8s/charts

fix(helm): skip s3 ServiceMonitor when only filer.s3 is enabled (#9081 )

2026-04-14 19:04:51 -07:00

note

Update Wiki images (#8069 )

2026-01-20 14:12:14 -08:00

other

feat: pass expected_data_size from clients for size-aware assignment (#9032 )

2026-04-11 11:30:47 -07:00

postgres-examples

Message Queue: Add sql querying (#7185 )

2025-09-09 01:01:03 -07:00

seaweed-volume

admin: report file and delete counts for EC volumes (#9060 )

2026-04-13 21:10:36 -07:00

seaweedfs-rdma-sidecar

build(deps): bump rand from 0.9.2 to 0.9.4 in /seaweedfs-rdma-sidecar/rdma-engine (#9065 )

2026-04-13 22:51:00 -07:00

snap

move to https://github.com/seaweedfs/seaweedfs

2022-07-29 00:17:28 -07:00

telemetry

fix(telemetry): use correct TopologyId field in integration test (#8714 )

2026-03-20 22:15:05 -07:00

test

fix(s3): honor ChecksumAlgorithm on presigned URL uploads (#9076 )

2026-04-14 21:52:49 -07:00

unmaintained

go fix

2026-02-20 18:42:00 -08:00

util

util: added gostd script

2019-04-30 03:23:20 +00:00

weed

fix(s3): honor ChecksumAlgorithm on presigned URL uploads (#9076 )

2026-04-14 21:52:49 -07:00

.gitignore

Rust volume server implementation with CI (#8539 )

2026-03-26 17:24:35 -07:00

backers.md

chore: add nimbus web services to backers.md (#4769 )

2023-08-20 15:31:23 -07:00

CODE_OF_CONDUCT.md

add code of conduct (#4109 )

2023-01-05 11:01:22 -08:00

go.mod

[nfs] Add NFS (#9067 )

2026-04-14 20:48:24 -07:00

go.sum

[nfs] Add NFS (#9067 )

2026-04-14 20:48:24 -07:00

install.sh

Rust volume server implementation with CI (#8539 )

2026-03-26 17:24:35 -07:00

LICENSE

Update LICENSE, fix copyright license year (#6405 )

2025-01-01 01:55:42 -08:00

Makefile

(fix): Add templ install step in admin-generate (#8997 )

2026-04-08 19:23:18 -07:00

README.md

Remove AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY exports

2026-01-14 13:09:11 -08:00

VOLUME_SERVER_RUST_PLAN.md

Rust volume server implementation with CI (#8539 )

2026-03-26 17:24:35 -07:00

SeaweedFS

SeaweedFS is an independent Apache-licensed open source project with its ongoing development made possible entirely thanks to the support of these awesome backers. If you'd like to grow SeaweedFS even stronger, please consider joining our sponsors on Patreon.

Your support will be really appreciated by me and other supporters!

Quick Start

Quick Start with weed mini

The easiest way to get started with SeaweedFS for development and testing:

Download the latest binary from https://github.com/seaweedfs/seaweedfs/releases and unzip a single binary file weed or weed.exe.

Example:

# remove quarantine on macOS
# xattr -d com.apple.quarantine  ./weed

./weed mini -dir=/data

This single command starts a complete SeaweedFS setup with:

Master UI: http://localhost:9333
Volume Server: http://localhost:9340
Filer UI: http://localhost:8888
S3 Endpoint: http://localhost:8333
WebDAV: http://localhost:7333
Admin UI: http://localhost:23646

Perfect for development, testing, learning SeaweedFS, and single node deployments!

Quick Start for S3 API on Docker

docker run -p 8333:8333 chrislusf/seaweedfs server -s3

Quick Start with Single Binary

Download the latest binary from https://github.com/seaweedfs/seaweedfs/releases and unzip a single binary file weed or weed.exe. Or run go install github.com/seaweedfs/seaweedfs/weed@latest.
export AWS_ACCESS_KEY_ID=admin ; export AWS_SECRET_ACCESS_KEY=key as the admin credentials to access the object store.
Run weed server -dir=/some/data/dir -s3 to start one master, one volume server, one filer, and one S3 gateway. The difference with weed mini is that weed mini can auto configure based on the single host environment, while weed server requires manual configuration and are designed for production use.

Also, to increase capacity, just add more volume servers by running weed volume -dir="/some/data/dir2" -master="<master_host>:9333" -port=8081 locally, or on a different machine, or on thousands of machines. That is it!

Introduction

SeaweedFS is a simple and highly scalable distributed file system. There are two objectives:

to store billions of files!
to serve the files fast!

SeaweedFS started as a blob store to handle small files efficiently. Instead of managing all file metadata in a central master, the central master only manages volumes on volume servers, and these volume servers manage files and their metadata. This relieves concurrency pressure from the central master and spreads file metadata into volume servers, allowing faster file access (O(1), usually just one disk read operation).

There is only 40 bytes of disk storage overhead for each file's metadata. It is so simple with O(1) disk reads that you are welcome to challenge the performance with your actual use cases.

SeaweedFS started by implementing Facebook's Haystack design paper. Also, SeaweedFS implements erasure coding with ideas from f4: Facebook’s Warm BLOB Storage System, and has a lot of similarities with Facebook’s Tectonic Filesystem and Google's Colossus File System

On top of the blob store, optional Filer can support directories and POSIX attributes. Filer is a separate linearly-scalable stateless server with customizable metadata stores, e.g., MySql, Postgres, Redis, Cassandra, HBase, Mongodb, Elastic Search, LevelDB, RocksDB, Sqlite, MemSql, TiDB, Etcd, CockroachDB, YDB, etc.

SeaweedFS can transparently integrate with the cloud. With hot data on local cluster, and warm data on the cloud with O(1) access time, SeaweedFS can achieve both fast local access time and elastic cloud storage capacity. What's more, the cloud storage access API cost is minimized. Faster and cheaper than direct cloud storage!

System	File Metadata	File Content Read	POSIX	REST API	Optimized for large number of small files
SeaweedFS	lookup volume id, cacheable	O(1) disk seek		Yes	Yes
SeaweedFS Filer	Linearly Scalable, Customizable	O(1) disk seek	FUSE	Yes	Yes
GlusterFS	hashing		FUSE, NFS
Ceph	hashing + rules		FUSE	Yes
MooseFS	in memory		FUSE		No
MinIO	separate meta file for each file			Yes	No

SeaweedFS	comparable to Ceph	advantage
Master	MDS	simpler
Volume	OSD	optimized for small files
Filer	Ceph FS	linearly scalable, Customizable, O(1) or O(logN)

README.md Unescape Escape

SeaweedFS

Sponsor SeaweedFS via Patreon

Gold Sponsors

Table of Contents

Quick Start

Quick Start with weed mini

Quick Start for S3 API on Docker

Quick Start with Single Binary

Introduction

Features

Additional Blob Store Features

Filer Features

Kubernetes

Example: Using Seaweed Blob Store

Start Master Server

Start Volume Servers

Write A Blob

Save Blob Id

Read a Blob

Rack-Aware and Data Center-Aware Replication

Allocate Blob Key on Specific Data Center

Other Features

Blob Store Architecture

Master Server and Volume Server

Write and Read files

Saving memory

Tiered Storage to the cloud

SeaweedFS Filer

Compared to Other File Systems

Compared to HDFS

Compared to GlusterFS, Ceph

Compared to GlusterFS

Compared to MooseFS

Compared to Ceph

Compared to MinIO

Dev Plan

Installation Guide

Disk Related Topics

Hard Drive Performance

Solid State Disk

Benchmark

Run WARP and launch a mixed benchmark.

Enterprise

License

Stargazers over time

README.md