mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-22 17:51:30 +00:00
* feat(security): hot-reload HTTPS certs for master/volume/filer/webdav/admin

  S3 and filer already use a refreshing pemfile provider for their HTTPS
  cert, so rotated certificates (e.g. from k8s cert-manager) are picked up
  without a restart. Master, volume, webdav, and admin, however, passed
  cert/key paths straight to ServeTLS/ListenAndServeTLS and loaded once at
  startup, so rotating those certs required a pod restart.

  Add a small helper NewReloadingServerCertificate in weed/security that
  wraps pemfile.Provider and returns a tls.Config.GetCertificate closure,
  then wire it into the four remaining HTTPS entry points. httpdown now
  also calls ServeTLS when TLSConfig carries a GetCertificate/Certificates
  but CertFile/KeyFile are empty, so volume server can pre-populate
  TLSConfig.

  A unit test exercises the rotation path (write cert, rotate on disk,
  assert the callback returns the new cert) with a short refresh window.

* refactor(security): route filer/s3 HTTPS through the shared cert reloader

  Before: filer.go and s3.go each kept a *certprovider.Provider on the
  options struct plus a duplicated GetCertificateWithUpdate method. Both
  were loading pemfile themselves. Behaviorally they already reloaded, but
  the logic was duplicated two ways and neither path was shared with the
  newly-added master/volume/webdav/admin wiring.

  After: both use security.NewReloadingServerCertificate like the other
  servers. The per-struct certProvider field and GetCertificateWithUpdate
  method are removed, along with the now-unused certprovider and pemfile
  imports. Net: -32 lines, one code path for all HTTPS cert reloading.

  No behavior change: the refresh window, cache, and handshake contract
  are identical (the helper wraps the same pemfile.NewProvider).

* feat(security): hot-reload HTTPS client certs for mount/backup/upload/etc

  The HTTP client in weed/util/http/client loaded the mTLS client cert
  once at startup via tls.LoadX509KeyPair. That left every long-lived
  HTTPS client process (weed mount, backup, filer.copy, filer→volume,
  s3→filer/volume) unable to pick up a rotated client cert without a
  restart, even though the same cert-manager setup was already rotating
  the server side fine.

  Swap the client cert loader for a tls.Config.GetClientCertificate
  callback backed by the same refreshing pemfile provider. New TLS
  handshakes pick up the rotated cert; in-flight pooled connections keep
  their old cert and drop as normal transport churn happens.

  To keep this reusable from both server and client TLS code without an
  import cycle (weed/security already imports weed/util/http/client for
  LoadHTTPClientFromFile), extract the pemfile wrapper into a new
  weed/security/certreload subpackage. weed/security keeps its thin
  NewReloadingServerCertificate wrapper. The existing unit test moves with
  the implementation.

  gRPC mTLS was already handled by security.LoadServerTLS / LoadClientTLS;
  this PR does not change any gRPC paths. MQ broker, MQ agent, Kafka
  gateway, and FUSE mount control plane are gRPC-only and therefore
  already rotate.

  CA bundles (ClientCAs / RootCAs / grpc.ca) are still loaded once; this
  is noted as a known limitation in the wiki.

* fix(security): address PR review feedback on cert reloader

  Bots (gemini-code-assist + coderabbit) flagged three real issues and a
  couple of nits. Addressing them here:

  1. KeyMaterial used context.Background(). The grpc pemfile provider's
     KeyMaterial blocks until material arrives or the context deadline
     expires; with Background() a slow disk could hang the TLS handshake
     indefinitely. Switched both the server and client callbacks to use
     hello.Context() / cri.Context() so a stuck read is bounded by the
     handshake timeout.

  2. Admin server loaded TLS inside the serve goroutine. If the cert was
     bad, the goroutine returned but startAdminServer kept blocking on
     <-ctx.Done() with no listener, making the process look healthy with
     nothing bound. Moved TLS setup to run before the goroutine starts and
     propagate errors via fmt.Errorf; also captures the provider and
     defers Close().

  3. HTTP client discarded the certprovider.Provider from
     NewClientGetCertificate. That leaked the refresh goroutine, and
     NewHttpClientWithTLS had a worse case where a CA-file failure after
     provider creation orphaned the provider entirely. Added a
     certProvider field and a Close() method on HTTPClient, and made the
     constructors close the provider on subsequent error paths.

  4. Server-side paths (master/volume/filer/s3/webdav/admin) now retain
     the provider. filer and webdav run ServeTLS synchronously, so a plain
     defer works. master/volume/s3 dispatch goroutines and return while
     the server keeps running, so they hook Close() into
     grace.OnInterrupt.

  5. Test: certreload_test now tolerates transient read/parse errors
     during file rotation (writeSelfSigned rewrites cert before key) and
     reports the last error only if the deadline expires.

  No user-visible behavior change for the happy path.

* test(tls): add end-to-end HTTPS cert rotation integration test

  Boots a real `weed master` with HTTPS enabled, captures the leaf cert
  served at TLS handshake time, atomically rewrites the cert/key files on
  disk (the same rename-in-place pattern kubelet does when it swaps a
  cert-manager Secret), and asserts that a subsequent TLS handshake
  observes the rotated leaf, with no process restart, no SIGHUP, no
  reloader sidecar.

  Verifies the full path: on-disk change → pemfile refresh tick →
  provider.KeyMaterial → tls.Config.GetCertificate → server TLS handshake.

  Runtime is ~1s by exposing the reloader's refresh window as an env var
  (WEED_TLS_CERT_REFRESH_INTERVAL) and setting it to 500ms for the test.
  The same env var is user-facing (documented in the wiki), so operators
  running short-lived certs (Vault, cert-manager with duration: 24h, etc.)
  can tighten the rotation-pickup window without a rebuild. Defaults to 5h
  to preserve prior behavior.

  security.CredRefreshingInterval is kept for API compatibility but now
  aliases certreload.DefaultRefreshInterval so the same env controls both
  gRPC mTLS and HTTPS reload.

* ci(tls): wire the TLS rotation integration test into GitHub Actions

  Mirrors the existing vacuum-integration-tests.yml shape: Ubuntu runner,
  Go 1.25, build weed, run `go test` in test/tls_rotation, upload master
  logs on failure. 10-minute job timeout; the test itself finishes in
  about a second because WEED_TLS_CERT_REFRESH_INTERVAL is set to 500ms
  inside the test.

  Runs on every push to master and on every PR to master.

* fix(tls): address follow-up PR review comments

  Three new comments on the integration test + volume shutdown path:

  1. Test: peekServerCert was swallowing every dial/handshake error, which
     meant waitForCert's "last err: <nil>" fatal message lost all
     diagnostic value. Thread errors back through: peekServerCert now
     returns (*x509.Certificate, error), and waitForCert records the
     latest error so a CI flake points at the actual cause (master didn't
     come up, handshake rejected, CA pool mismatch, etc.).

  2. Test: set HOME=<tempdir> on the master subprocess. Viper today
     registers the literal path "$HOME/.seaweedfs" without env expansion,
     so a developer's ~/.seaweedfs/security.toml is accidentally
     invisible; the test was relying on that. Pinning HOME is
     belt-and-braces against a future viper upgrade that does expand env
     vars.

  3. volume.go: startClusterHttpService's provider close was registered
     via grace.OnInterrupt, which fires on SIGTERM but NOT on the
     v.shutdownCtx.Done() path used by mini / integration tests. The
     pemfile refresh goroutine leaked in that shutdown path. Now the
     helper returns a close func and the caller invokes it on BOTH
     shutdown paths for parity.

  Also add MinVersion: TLS 1.2 to the test's tls.Config to quiet the
  ast-grep static-analysis nit; this is zero-risk since the pool only
  trusts our in-memory CA. Test runs clean 3/3.
332 lines
11 KiB
Go
package security

import (
	"context"
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"slices"
	"strings"

	"github.com/spf13/viper"

	"github.com/seaweedfs/seaweedfs/weed/glog"
	"github.com/seaweedfs/seaweedfs/weed/security/certreload"
	"github.com/seaweedfs/seaweedfs/weed/util"
	util_http_client "github.com/seaweedfs/seaweedfs/weed/util/http/client"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/credentials/tls/certprovider/pemfile"
	"google.golang.org/grpc/security/advancedtls"
)

// CredRefreshingInterval is the refresh cadence for gRPC mTLS certs.
// Shares its source of truth with certreload.DefaultRefreshInterval so
// a single WEED_TLS_CERT_REFRESH_INTERVAL env var tunes both gRPC and
// HTTPS cert reload.
var CredRefreshingInterval = certreload.DefaultRefreshInterval

// Authenticator restricts gRPC peers by the common name of their leaf
// certificate: a peer passes if its CN ends with AllowedWildcardDomain
// or appears in AllowedCommonNames.
type Authenticator struct {
	AllowedWildcardDomain string
	AllowedCommonNames    map[string]bool
}

// SNIStrippingTransportCredentials wraps another TransportCredentials
// and strips the port from the authority in ClientHandshake to prevent
// advancedtls from using the full "host:port" as ServerName in SNI.
type SNIStrippingTransportCredentials struct {
	creds credentials.TransportCredentials
}

func (s *SNIStrippingTransportCredentials) ClientHandshake(ctx context.Context, authority string, rawConn net.Conn) (net.Conn, credentials.AuthInfo, error) {
	// An authority without a port makes SplitHostPort fail; pass it through unchanged.
	host, _, err := net.SplitHostPort(authority)
	if err == nil {
		authority = host
	}
	return s.creds.ClientHandshake(ctx, authority, rawConn)
}

func (s *SNIStrippingTransportCredentials) ServerHandshake(rawConn net.Conn) (net.Conn, credentials.AuthInfo, error) {
	return s.creds.ServerHandshake(rawConn)
}

func (s *SNIStrippingTransportCredentials) Info() credentials.ProtocolInfo {
	return s.creds.Info()
}

func (s *SNIStrippingTransportCredentials) Clone() credentials.TransportCredentials {
	return &SNIStrippingTransportCredentials{creds: s.creds.Clone()}
}

func (s *SNIStrippingTransportCredentials) OverrideServerName(serverNameOverride string) error {
	return s.creds.OverrideServerName(serverNameOverride)
}

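The authority handling in ClientHandshake can be tried in isolation. The sketch below is a standalone illustration, not part of this package; `stripPort` is a hypothetical name, and the host names are made up.

```go
package main

import (
	"fmt"
	"net"
)

// stripPort applies the same rule as ClientHandshake: drop the port when the
// authority parses as host:port, otherwise return the authority untouched.
func stripPort(authority string) string {
	if host, _, err := net.SplitHostPort(authority); err == nil {
		return host
	}
	return authority
}

func main() {
	for _, a := range []string{
		"master.example.com:19333", // host:port, port is stripped
		"master.example.com",       // no port, SplitHostPort errors, kept as-is
		"[::1]:19333",              // bracketed IPv6, brackets and port removed
	} {
		fmt.Printf("%q -> %q\n", a, stripPort(a))
	}
}
```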
// LoadServerTLS builds gRPC server credentials for component from the
// <component>.cert, <component>.key and grpc.ca settings. It returns nil
// options when TLS is not configured or when setup fails (the failure is
// logged as a warning). The second return value is always nil.
func LoadServerTLS(config *util.ViperProxy, component string) (grpc.ServerOption, grpc.ServerOption) {
	if config == nil {
		return nil, nil
	}

	serverOptions := pemfile.Options{
		CertFile:        config.GetString(component + ".cert"),
		KeyFile:         config.GetString(component + ".key"),
		RefreshDuration: CredRefreshingInterval,
	}
	if serverOptions.CertFile == "" || serverOptions.KeyFile == "" {
		return nil, nil
	}

	serverIdentityProvider, err := pemfile.NewProvider(serverOptions)
	if err != nil {
		glog.Warningf("pemfile.NewProvider(%v) %v failed: %v", serverOptions, component, err)
		return nil, nil
	}

	serverRootOptions := pemfile.Options{
		RootFile:        config.GetString("grpc.ca"),
		RefreshDuration: CredRefreshingInterval,
	}
	serverRootProvider, err := pemfile.NewProvider(serverRootOptions)
	if err != nil {
		glog.Warningf("pemfile.NewProvider(%v) failed: %v", serverRootOptions, err)
		return nil, nil
	}

	// Build server credentials from the refreshing identity and root providers.
	options := &advancedtls.Options{
		IdentityOptions: advancedtls.IdentityCertificateOptions{
			IdentityProvider: serverIdentityProvider,
		},
		RootOptions: advancedtls.RootCertificateOptions{
			RootProvider: serverRootProvider,
		},
		RequireClientCert: true,
		VerificationType:  advancedtls.CertVerification,
	}
	options.MinTLSVersion, err = TlsVersionByName(config.GetString("tls.min_version"))
	if err != nil {
		glog.Warningf("tls min version parse failed, %v", err)
		return nil, nil
	}
	options.MaxTLSVersion, err = TlsVersionByName(config.GetString("tls.max_version"))
	if err != nil {
		glog.Warningf("tls max version parse failed, %v", err)
		return nil, nil
	}
	options.CipherSuites, err = TlsCipherSuiteByNames(config.GetString("tls.cipher_suites"))
	if err != nil {
		glog.Warningf("tls cipher suite parse failed, %v", err)
		return nil, nil
	}
	allowedCommonNames := config.GetString(component + ".allowed_commonNames")
	allowedWildcardDomain := config.GetString("grpc.allowed_wildcard_domain")
	if allowedCommonNames != "" || allowedWildcardDomain != "" {
		allowedCommonNamesMap := make(map[string]bool)
		for _, s := range strings.Split(allowedCommonNames, ",") {
			// strings.Split("", ",") yields one empty element; skip empty
			// entries so an unset list (or a stray comma) does not admit
			// certificates whose common name is empty.
			if s == "" {
				continue
			}
			allowedCommonNamesMap[s] = true
		}
		auther := Authenticator{
			AllowedCommonNames:    allowedCommonNamesMap,
			AllowedWildcardDomain: allowedWildcardDomain,
		}
		options.AdditionalPeerVerification = auther.Authenticate
	} else {
		// No CN restrictions configured: accept any peer that passed chain verification.
		options.AdditionalPeerVerification = func(params *advancedtls.HandshakeVerificationInfo) (*advancedtls.PostHandshakeVerificationResults, error) {
			return &advancedtls.PostHandshakeVerificationResults{}, nil
		}
	}
	ta, err := advancedtls.NewServerCreds(options)
	if err != nil {
		glog.Warningf("advancedtls.NewServerCreds(%v) failed: %v", options, err)
		return nil, nil
	}
	return grpc.Creds(ta), nil
}

// LoadClientTLSFromFile reads the security config at configFile and returns
// the gRPC dial option for component. Relative PEM paths in the config are
// interpreted relative to the directory that contains configFile.
func LoadClientTLSFromFile(configFile string, component string) (grpc.DialOption, error) {
	v := viper.New()
	v.SetConfigFile(configFile)
	if err := v.ReadInConfig(); err != nil {
		return nil, fmt.Errorf("failed to read security config %s: %v", configFile, err)
	}
	// Resolve relative PEM paths against the config file's directory.
	configDir := filepath.Dir(configFile)
	for _, key := range []string{"grpc.ca", component + ".cert", component + ".key"} {
		p := v.GetString(key)
		if p != "" && !filepath.IsAbs(p) {
			v.Set(key, filepath.Join(configDir, p))
		}
	}
	return LoadClientTLS(&util.ViperProxy{Viper: v}, component), nil
}

// LoadClientTLS returns the gRPC dial option for component. It falls back to
// insecure (plaintext) credentials when config is nil, when any of the cert,
// key or CA paths is empty, or when building the TLS credentials fails.
func LoadClientTLS(config *util.ViperProxy, component string) grpc.DialOption {
	if config == nil {
		return grpc.WithTransportCredentials(insecure.NewCredentials())
	}

	certFileName, keyFileName, caFileName := config.GetString(component+".cert"), config.GetString(component+".key"), config.GetString("grpc.ca")
	if certFileName == "" || keyFileName == "" || caFileName == "" {
		return grpc.WithTransportCredentials(insecure.NewCredentials())
	}

	clientOptions := pemfile.Options{
		CertFile:        certFileName,
		KeyFile:         keyFileName,
		RefreshDuration: CredRefreshingInterval,
	}
	clientProvider, err := pemfile.NewProvider(clientOptions)
	if err != nil {
		glog.Warningf("pemfile.NewProvider(%v) failed %v", clientOptions, err)
		return grpc.WithTransportCredentials(insecure.NewCredentials())
	}
	clientRootOptions := pemfile.Options{
		RootFile:        caFileName,
		RefreshDuration: CredRefreshingInterval,
	}
	clientRootProvider, err := pemfile.NewProvider(clientRootOptions)
	if err != nil {
		glog.Warningf("pemfile.NewProvider(%v) failed: %v", clientRootOptions, err)
		return grpc.WithTransportCredentials(insecure.NewCredentials())
	}
	options := &advancedtls.Options{
		IdentityOptions: advancedtls.IdentityCertificateOptions{
			IdentityProvider: clientProvider,
		},
		AdditionalPeerVerification: func(params *advancedtls.HandshakeVerificationInfo) (*advancedtls.PostHandshakeVerificationResults, error) {
			return &advancedtls.PostHandshakeVerificationResults{}, nil
		},
		RootOptions: advancedtls.RootCertificateOptions{
			RootProvider: clientRootProvider,
		},
		VerificationType: advancedtls.CertVerification,
	}
	ta, err := advancedtls.NewClientCreds(options)
	if err != nil {
		glog.Warningf("advancedtls.NewClientCreds(%v) failed: %v", options, err)
		return grpc.WithTransportCredentials(insecure.NewCredentials())
	}
	// Strip the port from the authority before it is used as the SNI server name.
	wrapped := &SNIStrippingTransportCredentials{creds: ta}
	return grpc.WithTransportCredentials(wrapped)
}

// LoadHTTPClientFromFile creates an HTTP client using the https.client TLS
// settings from the given security config file. Returns nil if HTTPS is not
// enabled in the config. This is used by filer.sync to create per-cluster
// HTTP clients when clusters use different certificates.
func LoadHTTPClientFromFile(configFile string) (*util_http_client.HTTPClient, error) {
	v := viper.New()
	v.SetConfigFile(configFile)
	if err := v.ReadInConfig(); err != nil {
		return nil, fmt.Errorf("failed to read security config %s: %v", configFile, err)
	}

	if !v.GetBool("https.client.enabled") {
		return nil, nil
	}

	configDir := filepath.Dir(configFile)
	resolvePath := func(key string) string {
		p := v.GetString(key)
		if p != "" && !filepath.IsAbs(p) {
			return filepath.Join(configDir, p)
		}
		return p
	}

	return util_http_client.NewHttpClientWithTLS(
		resolvePath("https.client.cert"),
		resolvePath("https.client.key"),
		resolvePath("https.client.ca"),
		v.GetBool("https.client.insecure_skip_verify"),
		util_http_client.AddDialContext,
	)
}

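// LoadClientTLSHTTP returns a server-side tls.Config that requires and
// verifies client certificates. Despite the parameter name, clientCertFile is
// read as a PEM bundle of CA certificates and becomes the ClientCAs pool.
// The process exits via glog.Fatal if the file cannot be read or parsed.
func LoadClientTLSHTTP(clientCertFile string) *tls.Config {
	clientCerts, err := os.ReadFile(clientCertFile)
	if err != nil {
		glog.Fatal(err)
	}
	certPool := x509.NewCertPool()
	ok := certPool.AppendCertsFromPEM(clientCerts)
	if !ok {
		glog.Fatalf("Error processing client certificate in %s\n", clientCertFile)
	}

	return &tls.Config{
		ClientCAs:  certPool,
		ClientAuth: tls.RequireAndVerifyClientCert,
	}
}
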
// Authenticate is an advancedtls AdditionalPeerVerification hook. It accepts
// the peer when the leaf certificate's common name ends with
// AllowedWildcardDomain or is a key of AllowedCommonNames, and rejects it
// otherwise.
func (a Authenticator) Authenticate(params *advancedtls.HandshakeVerificationInfo) (*advancedtls.PostHandshakeVerificationResults, error) {
	if a.AllowedWildcardDomain != "" && strings.HasSuffix(params.Leaf.Subject.CommonName, a.AllowedWildcardDomain) {
		return &advancedtls.PostHandshakeVerificationResults{}, nil
	}
	if _, ok := a.AllowedCommonNames[params.Leaf.Subject.CommonName]; ok {
		return &advancedtls.PostHandshakeVerificationResults{}, nil
	}
	err := fmt.Errorf("Authenticate: invalid subject client common name: %s", params.Leaf.Subject.CommonName)
	glog.Error(err)
	return nil, err
}

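The two checks in Authenticate can be exercised without a TLS handshake. The sketch below reproduces only the matching logic on plain strings; `allowed` is a hypothetical function and the names are examples.

```go
package main

import (
	"fmt"
	"strings"
)

// allowed mirrors Authenticate's decision: suffix match against the wildcard
// domain first, then an exact lookup in the set of allowed common names.
func allowed(cn, wildcardDomain string, names map[string]bool) bool {
	if wildcardDomain != "" && strings.HasSuffix(cn, wildcardDomain) {
		return true
	}
	_, ok := names[cn]
	return ok
}

func main() {
	names := map[string]bool{"filer01": true}

	fmt.Println(allowed("volume01.example.com", ".example.com", names)) // suffix match
	fmt.Println(allowed("filer01", ".example.com", names))              // exact match
	fmt.Println(allowed("volume01.example.org", ".example.com", names)) // neither

	// The match is a plain string suffix, so a wildcard value without a
	// leading dot also matches unrelated names that merely end the same way.
	fmt.Println(allowed("notexample.com", "example.com", nil))
}
```

Because the comparison is a string suffix rather than a DNS label match, a value such as `.example.com` (with the leading dot) is the safer way to express "any host under example.com".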
// FixTlsConfig applies the tls.min_version, tls.max_version and
// tls.cipher_suites settings to config, returning the first parse error.
func FixTlsConfig(viper *util.ViperProxy, config *tls.Config) error {
	var err error
	config.MinVersion, err = TlsVersionByName(viper.GetString("tls.min_version"))
	if err != nil {
		return err
	}
	config.MaxVersion, err = TlsVersionByName(viper.GetString("tls.max_version"))
	if err != nil {
		return err
	}
	config.CipherSuites, err = TlsCipherSuiteByNames(viper.GetString("tls.cipher_suites"))
	return err
}

// TlsVersionByName maps a version name such as "TLS 1.2" to its crypto/tls
// constant. An empty name yields 0, which leaves the choice to crypto/tls.
func TlsVersionByName(name string) (uint16, error) {
	switch name {
	case "":
		return 0, nil
	case "SSLv3":
		return tls.VersionSSL30, nil
	case "TLS 1.0":
		return tls.VersionTLS10, nil
	case "TLS 1.1":
		return tls.VersionTLS11, nil
	case "TLS 1.2":
		return tls.VersionTLS12, nil
	case "TLS 1.3":
		return tls.VersionTLS13, nil
	default:
		return 0, fmt.Errorf("invalid tls version %s", name)
	}
}

// TlsCipherSuiteByNames converts a comma-separated list of cipher suite names
// into their IDs. Names are looked up in tls.CipherSuites(), so suites that
// crypto/tls lists only under InsecureCipherSuites() are rejected. An empty
// list yields nil, which leaves the choice to crypto/tls.
func TlsCipherSuiteByNames(cipherSuiteNames string) ([]uint16, error) {
	cipherSuiteNames = strings.TrimSpace(cipherSuiteNames)
	if cipherSuiteNames == "" {
		return nil, nil
	}
	names := strings.Split(cipherSuiteNames, ",")
	cipherSuites := tls.CipherSuites()
	cipherIds := make([]uint16, 0, len(names))
	for _, name := range names {
		name = strings.TrimSpace(name)
		index := slices.IndexFunc(cipherSuites, func(suite *tls.CipherSuite) bool {
			return name == suite.Name
		})
		if index == -1 {
			return nil, fmt.Errorf("invalid tls cipher suite name %s", name)
		}
		cipherIds = append(cipherIds, cipherSuites[index].ID)
	}
	return cipherIds, nil
}