mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-21 17:21:34 +00:00
* feat(security): hot-reload HTTPS certs for master/volume/filer/webdav/admin S3 and filer already use a refreshing pemfile provider for their HTTPS cert, so rotated certificates (e.g. from k8s cert-manager) are picked up without a restart. Master, volume, webdav, and admin, however, passed cert/key paths straight to ServeTLS/ListenAndServeTLS and loaded once at startup — rotating those certs required a pod restart. Add a small helper NewReloadingServerCertificate in weed/security that wraps pemfile.Provider and returns a tls.Config.GetCertificate closure, then wire it into the four remaining HTTPS entry points. httpdown now also calls ServeTLS when TLSConfig carries a GetCertificate/Certificates but CertFile/KeyFile are empty, so volume server can pre-populate TLSConfig. A unit test exercises the rotation path (write cert, rotate on disk, assert the callback returns the new cert) with a short refresh window. * refactor(security): route filer/s3 HTTPS through the shared cert reloader Before: filer.go and s3.go each kept a *certprovider.Provider on the options struct plus a duplicated GetCertificateWithUpdate method. Both were loading pemfile themselves. Behaviorally they already reloaded, but the logic was duplicated two ways and neither path was shared with the newly-added master/volume/webdav/admin wiring. After: both use security.NewReloadingServerCertificate like the other servers. The per-struct certProvider field and GetCertificateWithUpdate method are removed, along with the now-unused certprovider and pemfile imports. Net: -32 lines, one code path for all HTTPS cert reloading. No behavior change — the refresh window, cache, and handshake contract are identical (the helper wraps the same pemfile.NewProvider). * feat(security): hot-reload HTTPS client certs for mount/backup/upload/etc The HTTP client in weed/util/http/client loaded the mTLS client cert once at startup via tls.LoadX509KeyPair. That left every long-lived HTTPS client process (weed mount, backup, filer.copy, filer→volume, s3→filer/volume) unable to pick up a rotated client cert without a restart — even though the same cert-manager setup was already rotating the server side fine. Swap the client cert loader for a tls.Config.GetClientCertificate callback backed by the same refreshing pemfile provider. New TLS handshakes pick up the rotated cert; in-flight pooled connections keep their old cert and drop as normal transport churn happens. To keep this reusable from both server and client TLS code without an import cycle (weed/security already imports weed/util/http/client for LoadHTTPClientFromFile), extract the pemfile wrapper into a new weed/security/certreload subpackage. weed/security keeps its thin NewReloadingServerCertificate wrapper. The existing unit test moves with the implementation. gRPC mTLS was already handled by security.LoadServerTLS / LoadClientTLS; this PR does not change any gRPC paths. MQ broker, MQ agent, Kafka gateway, and FUSE mount control plane are gRPC-only and therefore already rotate. CA bundles (ClientCAs / RootCAs / grpc.ca) are still loaded once — noted as a known limitation in the wiki. * fix(security): address PR review feedback on cert reloader Bots (gemini-code-assist + coderabbit) flagged three real issues and a couple of nits. Addressing them here: 1. KeyMaterial used context.Background(). The grpc pemfile provider's KeyMaterial blocks until material arrives or the context deadline expires; with Background() a slow disk could hang the TLS handshake indefinitely. Switched both the server and client callbacks to use hello.Context() / cri.Context() so a stuck read is bounded by the handshake timeout. 2. Admin server loaded TLS inside the serve goroutine. If the cert was bad, the goroutine returned but startAdminServer kept blocking on <-ctx.Done() with no listener, making the process look healthy with nothing bound. Moved TLS setup to run before the goroutine starts and propagate errors via fmt.Errorf; also captures the provider and defers Close(). 3. HTTP client discarded the certprovider.Provider from NewClientGetCertificate. That leaked the refresh goroutine, and NewHttpClientWithTLS had a worse case where a CA-file failure after provider creation orphaned the provider entirely. Added a certProvider field and a Close() method on HTTPClient, and made the constructors close the provider on subsequent error paths. 4. Server-side paths (master/volume/filer/s3/webdav/admin) now retain the provider. filer and webdav run ServeTLS synchronously, so a plain defer works. master/volume/s3 dispatch goroutines and return while the server keeps running, so they hook Close() into grace.OnInterrupt. 5. Test: certreload_test now tolerates transient read/parse errors during file rotation (writeSelfSigned rewrites cert before key) and reports the last error only if the deadline expires. No user-visible behavior change for the happy path. * test(tls): add end-to-end HTTPS cert rotation integration test Boots a real `weed master` with HTTPS enabled, captures the leaf cert served at TLS handshake time, atomically rewrites the cert/key files on disk (the same rename-in-place pattern kubelet does when it swaps a cert-manager Secret), and asserts that a subsequent TLS handshake observes the rotated leaf — with no process restart, no SIGHUP, no reloader sidecar. Verifies the full path: on-disk change → pemfile refresh tick → provider.KeyMaterial → tls.Config.GetCertificate → server TLS handshake. Runtime is ~1s by exposing the reloader's refresh window as an env var (WEED_TLS_CERT_REFRESH_INTERVAL) and setting it to 500ms for the test. The same env var is user-facing — documented in the wiki — so operators running short-lived certs (Vault, cert-manager with duration: 24h, etc.) can tighten the rotation-pickup window without a rebuild. Defaults to 5h to preserve prior behavior. security.CredRefreshingInterval is kept for API compatibility but now aliases certreload.DefaultRefreshInterval so the same env controls both gRPC mTLS and HTTPS reload. * ci(tls): wire the TLS rotation integration test into GitHub Actions Mirrors the existing vacuum-integration-tests.yml shape: Ubuntu runner, Go 1.25, build weed, run `go test` in test/tls_rotation, upload master logs on failure. 10-minute job timeout; the test itself finishes in about a second because WEED_TLS_CERT_REFRESH_INTERVAL is set to 500ms inside the test. Runs on every push to master and on every PR to master. * fix(tls): address follow-up PR review comments Three new comments on the integration test + volume shutdown path: 1. Test: peekServerCert was swallowing every dial/handshake error, which meant waitForCert's "last err: <nil>" fatal message lost all diagnostic value. Thread errors back through: peekServerCert now returns (*x509.Certificate, error), and waitForCert records the latest error so a CI flake points at the actual cause (master didn't come up, handshake rejected, CA pool mismatch, etc.). 2. Test: set HOME=<tempdir> on the master subprocess. Viper today registers the literal path "$HOME/.seaweedfs" without env expansion, so a developer's ~/.seaweedfs/security.toml is accidentally invisible — the test was relying on that. Pinning HOME is belt-and-braces against a future viper upgrade that does expand env vars. 3. volume.go: startClusterHttpService's provider close was registered via grace.OnInterrupt, which fires on SIGTERM but NOT on the v.shutdownCtx.Done() path used by mini / integration tests. The pemfile refresh goroutine leaked in that shutdown path. Now the helper returns a close func and the caller invokes it on BOTH shutdown paths for parity. Also add MinVersion: TLS 1.2 to the test's tls.Config to quiet the ast-grep static-analysis nit — zero-risk since the pool only trusts our in-memory CA. Test runs clean 3/3.
173 lines
5.9 KiB
Go
173 lines
5.9 KiB
Go
package command
|
|
|
|
import (
|
|
"context"
|
|
"crypto/tls"
|
|
"fmt"
|
|
"net/http"
|
|
"os"
|
|
"os/user"
|
|
"strconv"
|
|
"time"
|
|
|
|
"github.com/seaweedfs/seaweedfs/weed/util/version"
|
|
|
|
"github.com/seaweedfs/seaweedfs/weed/glog"
|
|
"github.com/seaweedfs/seaweedfs/weed/pb"
|
|
"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
|
|
"github.com/seaweedfs/seaweedfs/weed/security"
|
|
weed_server "github.com/seaweedfs/seaweedfs/weed/server"
|
|
"github.com/seaweedfs/seaweedfs/weed/util"
|
|
)
|
|
|
|
var (
|
|
webDavStandaloneOptions WebDavOption
|
|
)
|
|
|
|
type WebDavOption struct {
|
|
filer *string
|
|
ipBind *string
|
|
filerRootPath *string
|
|
port *int
|
|
collection *string
|
|
replication *string
|
|
disk *string
|
|
tlsPrivateKey *string
|
|
tlsCertificate *string
|
|
cacheDir *string
|
|
cacheSizeMB *int64
|
|
maxMB *int
|
|
// shutdownCtx, when non-nil, tells startWebDav to gracefully shut down the
|
|
// HTTP server once the ctx is cancelled. Used by weed mini; nil for
|
|
// standalone weed webdav.
|
|
shutdownCtx context.Context
|
|
}
|
|
|
|
func init() {
|
|
cmdWebDav.Run = runWebDav // break init cycle
|
|
webDavStandaloneOptions.filer = cmdWebDav.Flag.String("filer", "localhost:8888", "filer server address")
|
|
webDavStandaloneOptions.ipBind = cmdWebDav.Flag.String("ip.bind", "", "ip address to bind to. Default listen to all.")
|
|
webDavStandaloneOptions.port = cmdWebDav.Flag.Int("port", 7333, "webdav server http listen port")
|
|
webDavStandaloneOptions.collection = cmdWebDav.Flag.String("collection", "", "collection to create the files")
|
|
webDavStandaloneOptions.replication = cmdWebDav.Flag.String("replication", "", "replication to create the files")
|
|
webDavStandaloneOptions.disk = cmdWebDav.Flag.String("disk", "", "[hdd|ssd|<tag>] hard drive or solid state drive or any tag")
|
|
webDavStandaloneOptions.tlsPrivateKey = cmdWebDav.Flag.String("key.file", "", "path to the TLS private key file")
|
|
webDavStandaloneOptions.tlsCertificate = cmdWebDav.Flag.String("cert.file", "", "path to the TLS certificate file")
|
|
webDavStandaloneOptions.cacheDir = cmdWebDav.Flag.String("cacheDir", os.TempDir(), "local cache directory for file chunks")
|
|
webDavStandaloneOptions.cacheSizeMB = cmdWebDav.Flag.Int64("cacheCapacityMB", 0, "local cache capacity in MB")
|
|
webDavStandaloneOptions.maxMB = cmdWebDav.Flag.Int("maxMB", 4, "split files larger than the limit")
|
|
webDavStandaloneOptions.filerRootPath = cmdWebDav.Flag.String("filer.path", "/", "use this remote path from filer server")
|
|
}
|
|
|
|
var cmdWebDav = &Command{
|
|
UsageLine: "webdav -port=7333 -filer=<ip:port>",
|
|
Short: "start a webdav server that is backed by a filer",
|
|
Long: `start a webdav server that is backed by a filer.
|
|
|
|
`,
|
|
}
|
|
|
|
func runWebDav(cmd *Command, args []string) bool {
|
|
|
|
util.LoadSecurityConfiguration()
|
|
|
|
listenAddress := fmt.Sprintf("%s:%d", *webDavStandaloneOptions.ipBind, *webDavStandaloneOptions.port)
|
|
glog.V(0).Infof("Starting Seaweed WebDav Server %s at %s", version.Version(), listenAddress)
|
|
|
|
return webDavStandaloneOptions.startWebDav()
|
|
|
|
}
|
|
|
|
func (wo *WebDavOption) startWebDav() bool {
|
|
|
|
// detect current user
|
|
uid, gid := uint32(0), uint32(0)
|
|
if u, err := user.Current(); err == nil {
|
|
if parsedId, pe := strconv.ParseUint(u.Uid, 10, 32); pe == nil {
|
|
uid = uint32(parsedId)
|
|
}
|
|
if parsedId, pe := strconv.ParseUint(u.Gid, 10, 32); pe == nil {
|
|
gid = uint32(parsedId)
|
|
}
|
|
}
|
|
|
|
// parse filer grpc address
|
|
filerAddress := pb.ServerAddress(*wo.filer)
|
|
|
|
grpcDialOption := security.LoadClientTLS(util.GetViper(), "grpc.client")
|
|
|
|
var cipher bool
|
|
// connect to filer
|
|
for {
|
|
err := pb.WithGrpcFilerClient(false, 0, filerAddress, grpcDialOption, func(client filer_pb.SeaweedFilerClient) error {
|
|
resp, err := client.GetFilerConfiguration(context.Background(), &filer_pb.GetFilerConfigurationRequest{})
|
|
if err != nil {
|
|
return fmt.Errorf("get filer %s configuration: %v", filerAddress, err)
|
|
}
|
|
cipher = resp.Cipher
|
|
return nil
|
|
})
|
|
if err != nil {
|
|
glog.V(2).Infof("wait to connect to filer %s grpc address %s", *wo.filer, filerAddress.ToGrpcAddress())
|
|
time.Sleep(time.Second)
|
|
} else {
|
|
glog.V(0).Infof("connected to filer %s grpc address %s", *wo.filer, filerAddress.ToGrpcAddress())
|
|
break
|
|
}
|
|
}
|
|
|
|
ws, webdavServer_err := weed_server.NewWebDavServer(&weed_server.WebDavOption{
|
|
Filer: filerAddress,
|
|
FilerRootPath: *wo.filerRootPath,
|
|
GrpcDialOption: grpcDialOption,
|
|
Collection: *wo.collection,
|
|
Replication: *wo.replication,
|
|
DiskType: *wo.disk,
|
|
Uid: uid,
|
|
Gid: gid,
|
|
Cipher: cipher,
|
|
CacheDir: util.ResolvePath(*wo.cacheDir),
|
|
CacheSizeMB: *wo.cacheSizeMB,
|
|
MaxMB: *wo.maxMB,
|
|
})
|
|
if webdavServer_err != nil {
|
|
glog.Fatalf("WebDav Server startup error: %v", webdavServer_err)
|
|
}
|
|
|
|
httpS := &http.Server{Handler: ws.Handler}
|
|
|
|
listenAddress := fmt.Sprintf("%s:%d", *wo.ipBind, *wo.port)
|
|
webDavListener, err := util.NewListener(listenAddress, time.Duration(10)*time.Second)
|
|
if err != nil {
|
|
glog.Fatalf("WebDav Server listener on %s error: %v", listenAddress, err)
|
|
}
|
|
|
|
if wo.shutdownCtx != nil {
|
|
go func() {
|
|
<-wo.shutdownCtx.Done()
|
|
httpS.Shutdown(context.Background())
|
|
}()
|
|
}
|
|
|
|
if *wo.tlsPrivateKey != "" {
|
|
glog.V(0).Infof("Start Seaweed WebDav Server %s at https %s", version.Version(), listenAddress)
|
|
getCert, certProvider, err := security.NewReloadingServerCertificate(*wo.tlsCertificate, *wo.tlsPrivateKey)
|
|
if err != nil {
|
|
glog.Fatalf("WebDav Server failed to load TLS certificate: %v", err)
|
|
}
|
|
defer certProvider.Close()
|
|
httpS.TLSConfig = &tls.Config{GetCertificate: getCert}
|
|
if err = httpS.ServeTLS(webDavListener, "", ""); err != nil && err != http.ErrServerClosed {
|
|
glog.Fatalf("WebDav Server Fail to serve: %v", err)
|
|
}
|
|
} else {
|
|
glog.V(0).Infof("Start Seaweed WebDav Server %s at http %s", version.Version(), listenAddress)
|
|
if err = httpS.Serve(webDavListener); err != nil && err != http.ErrServerClosed {
|
|
glog.Fatalf("WebDav Server Fail to serve: %v", err)
|
|
}
|
|
}
|
|
|
|
return true
|
|
|
|
}
|