mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-14 05:41:29 +00:00
* fix(mount): sanitize non-UTF-8 filenames; keep marshal errors per-request (#9139)

A single file with invalid-UTF-8 bytes in its name (e.g. a GNOME Trash "partial" like \x10\x98=\\\x8a\x7f.trashinfo.9a51454f.partial) made every FUSE-initiated filer RPC fail with:

    rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8

and then produced an avalanche of "connection is closing" errors on unrelated LookupEntry / ReadDirAll / UpdateEntry calls, causing the volume-server QPS dips reported in #9139.

Root cause is twofold:

1. Proto3 `string` fields require valid UTF-8, but the FUSE kernel passes raw name bytes. Create/Mknod/Mkdir/Unlink/Rmdir/Rename/Lookup/Link/Symlink all forwarded those bytes directly into CreateEntryRequest.Name, DeleteEntryRequest.Name, StreamRenameEntryRequest.{Old,New}Name and Entry.Name. saveDataAsChunk also copied the FullPath into AssignVolumeRequest.Path unchecked.
2. When the marshal failed, shouldInvalidateConnection treated the resulting codes.Internal as a connection problem and dropped the shared cached ClientConn, canceling every other in-flight RPC on it.

Fix:

- Add sanitizeFuseName (strings.ToValidUTF8 with '?' replacement, matching util.FullPath.DirAndName) and make checkName return the sanitized name. Apply at every FUSE entry point that passes a name to the filer RPC, including Unlink/Rmdir (which did not previously call checkName) and both oldName/newName in Rename. Add a backstop scrub for AssignVolumeRequest.Path so async flush paths cannot reintroduce invalid bytes from a pre-sanitization cached FullPath.
- In weed/pb.shouldInvalidateConnection, detect client-side marshal errors via the gRPC library's "error while marshaling" prefix and return false: the connection is healthy, only the request is bad.

Refs: https://github.com/seaweedfs/seaweedfs/issues/9139#issuecomment-4301184231

* fix(mount,util): use '_' for invalid-UTF-8 replacement (URL-safe)

Sanitized filenames flow downstream into HTTP URLs (volume-server uploads, filer HTTP API, S3/WebDAV gateways). '?' is the URL query-string delimiter and would split the path the first time the name lands in one, so swap every invalid-UTF-8 replacement to '_'. This covers the two pre-existing sites in weed/util/fullpath.go as well, keeping all paths sanitized the same way.

* refactor(pb): detect client-side marshal errors via errors.As, not substring

Replace the raw `strings.Contains(err.Error(), ...)` check with a type-based carve-out: use errors.As against the `GRPCStatus() *Status` interface to pull the original Status out of any fmt.Errorf("...: %w") wrapping, then match the library-owned "grpc:" prefix on that Status's Message.

Why not errors.Is against a proto-level sentinel: gRPC's encode() collapses the inner proto error with "%v" (stringification) before wrapping it in a Status, so the original error type does not survive into the caller. The Status itself is the structural signal that does survive.

Why not status.FromError: when the caller wraps the Status error with fmt.Errorf("...: %w", ...), status.FromError rewrites Status.Message with the full err.Error() of the outermost wrapper, which defeats a prefix check on the library-owned message. errors.As gives us the original Status whose Message is still verbatim from the gRPC library.

A new test asserts that a plain errors.New("grpc: error while marshaling: …") (i.e. the same text attached to something that is NOT a gRPC status) does not short-circuit invalidation, so we never silently keep a cached connection alive based on a coincidental substring match.

* refactor(util): centralize UTF-8 sanitization; add FullPath.Sanitized

Addresses review feedback on PR #9207.

Nitpick: every invalid-UTF-8 replacement across the codebase (DirAndName, Name, mount.sanitizeFuseName, the weedfs_write.go backstop) now goes through a single util.SanitizeUTF8Name helper, so the replacement char ('_', URL-safe) is chosen in one place.

Outside-diff: three proto fields took raw FullPath strings that could break marshaling if an entry ever carried invalid UTF-8 (CreateEntryRequest.Directory in Mkdir, DeleteEntryRequest.Directory in Unlink, AssignVolumeRequest.Path in command_fs_merge_volumes). The reviewer's suggested fix (using DirAndName()) would have silently changed Directory from parent to grandparent, because DirAndName sanitizes only the trailing component. Added FullPath.Sanitized(), which scrubs every component, and applied it at the three sites. Exposure is narrow in practice (FUSE-boundary sanitization and the gRPC-side isClientSideMarshalError carve-out already cover the #9139 cascade), but the defense-in-depth is cheap and consistent with the existing AssignVolume backstop.

New tests in weed/util/fullpath_test.go document:

- SanitizeUTF8Name: valid UTF-8 passes through unchanged; invalid bytes become '_' (not '?', which is URL-special).
- FullPath.Sanitized: scrubs bytes in any component, not just the last.
- FullPath.DirAndName: dir remains raw on purpose; callers needing a clean full path must use Sanitized(). The test pins this behavior so it is not accidentally "fixed" in a way that changes the (dir, name) semantics callers depend on.
164 lines
4.9 KiB
Go
package util

import (
	"path"
	"path/filepath"
	"strings"
	"unicode/utf8"
)

type FullPath string

// invalidUTF8Replacement is the single-byte replacement used everywhere a name
// or path from an untrusted source (kernel FUSE input, external clients, store
// imports) may contain bytes that are not valid UTF-8. Proto3 `string` fields
// require valid UTF-8, so any such bytes must be substituted before the value
// enters a gRPC request; otherwise marshaling fails for the whole RPC.
//
// '_' is URL-safe: these sanitized strings also flow into HTTP URLs
// (volume-server uploads, filer HTTP API, S3/WebDAV gateways). A '?' would be
// interpreted as the query-string delimiter the first time the name lands in
// a URL, splitting the path.
const invalidUTF8Replacement = "_"

// SanitizeUTF8Name replaces each run of invalid-UTF-8 bytes in s with a single
// invalidUTF8Replacement (the strings.ToValidUTF8 semantics). For the common,
// valid-UTF-8 case the input is returned unchanged with no allocation. Use
// this for any byte sequence that will be assigned to a proto string field
// (names, paths) from an untrusted source; centralising the replacement keeps
// the chosen character consistent across the codebase.
func SanitizeUTF8Name(s string) string {
	if utf8.ValidString(s) {
		return s
	}
	return strings.ToValidUTF8(s, invalidUTF8Replacement)
}

// NewFullPath joins dir and name, first trimming one trailing '/' from name.
func NewFullPath(dir, name string) FullPath {
	name = strings.TrimSuffix(name, "/")
	return FullPath(dir).Child(name)
}

// DirAndName splits fp into its parent directory and last component. Only the
// name is sanitized; dir is returned raw on purpose, so callers that need a
// clean full path must use Sanitized. A path with no separator yields ("/", "").
func (fp FullPath) DirAndName() (string, string) {
	dir, name := filepath.Split(string(fp))
	name = SanitizeUTF8Name(name)
	if dir == "/" {
		return dir, name
	}
	if len(dir) < 1 {
		return "/", ""
	}
	return dir[:len(dir)-1], name
}

// Name returns the last path component, with any invalid-UTF-8 bytes replaced
// via SanitizeUTF8Name so the result is always safe to place in a proto
// string field or HTTP URL.
func (fp FullPath) Name() string {
	_, name := filepath.Split(string(fp))
	return SanitizeUTF8Name(name)
}

// Sanitized returns the full path with invalid-UTF-8 bytes in every
// component, not just the last, replaced via SanitizeUTF8Name. Use this
// before assigning the path to a proto string field (e.g. Directory,
// AssignVolumeRequest.Path) when the path may have been produced from
// sources that do not enforce UTF-8 (cache populated from an external
// store, legacy metadata, shell traversals of existing filer entries).
func (fp FullPath) Sanitized() string {
	return SanitizeUTF8Name(string(fp))
}

// IsLongerFileName reports whether the (sanitized) last component is longer
// than maxFilenameLength bytes. A limit of 0 disables the check.
func (fp FullPath) IsLongerFileName(maxFilenameLength uint32) bool {
	if maxFilenameLength == 0 {
		return false
	}
	return uint32(len(fp.Name())) > maxFilenameLength
}

// Child appends name to fp with exactly one '/' between them; a single
// leading '/' on name is dropped.
func (fp FullPath) Child(name string) FullPath {
	dir := string(fp)
	noPrefix := name
	if strings.HasPrefix(name, "/") {
		noPrefix = name[1:]
	}
	if strings.HasSuffix(dir, "/") {
		return FullPath(dir + noPrefix)
	}
	return FullPath(dir + "/" + noPrefix)
}

// AsInode returns an in-memory-only inode number derived from a hash of the
// path and unixTime.
func (fp FullPath) AsInode(unixTime int64) uint64 {
	inode := uint64(HashStringToLong(string(fp)))
	inode = inode + uint64(unixTime)*37
	return inode
}

// Split returns the path components, skipping the root; "" and "/" yield an
// empty slice.
func (fp FullPath) Split() []string {
	if fp == "" || fp == "/" {
		return []string{}
	}
	return strings.Split(string(fp)[1:], "/")
}

// Join joins names with filepath.Join and converts the result to forward
// slashes.
func Join(names ...string) string {
	return filepath.ToSlash(filepath.Join(names...))
}

// JoinPath is Join returning a FullPath.
func JoinPath(names ...string) FullPath {
	return FullPath(Join(names...))
}

// IsUnder reports whether fp lies below other in the directory tree. Every
// path is considered to be under "/".
func (fp FullPath) IsUnder(other FullPath) bool {
	if other == "/" {
		return true
	}
	return strings.HasPrefix(string(fp), string(other)+"/")
}

// IsEqualOrUnder reports whether candidate is equal to or a descendant of
// other using proper directory boundaries (not a plain string prefix check).
// Empty strings always return false.
func IsEqualOrUnder(candidate, other string) bool {
	candidatePath := NormalizePath(candidate)
	otherPath := NormalizePath(other)
	if candidatePath == "" || otherPath == "" {
		return false
	}
	return candidatePath == otherPath || candidatePath.IsUnder(otherPath)
}

// NormalizePath trims a trailing slash and returns a FullPath.
// Empty input returns "" (callers should treat this as "no path").
func NormalizePath(p string) FullPath {
	if p == "" {
		return ""
	}
	trimmed := strings.TrimSuffix(p, "/")
	if trimmed == "" {
		return "/"
	}
	return FullPath(trimmed)
}

// StringSplit behaves like strings.Split, except that an empty input yields
// nil instead of a one-element slice holding "".
func StringSplit(separatedValues string, sep string) []string {
	if separatedValues == "" {
		return nil
	}
	return strings.Split(separatedValues, sep)
}

// CleanWindowsPath normalizes Windows-style backslashes to forward slashes.
// This handles paths from Windows clients where paths use backslashes.
func CleanWindowsPath(p string) string {
	return strings.ReplaceAll(p, "\\", "/")
}

// CleanWindowsPathBase normalizes Windows-style backslashes to forward slashes
// and returns the base name of the path.
func CleanWindowsPathBase(p string) string {
	return path.Base(strings.ReplaceAll(p, "\\", "/"))
}
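A standalone sketch of the sanitization helpers in the listing. SanitizeUTF8Name, DirAndName and Sanitized are copied (with FullPath redeclared) only so the program runs outside package util; the sample paths and the `volume.example` host are made up. It shows that strings.ToValidUTF8 collapses a run of invalid bytes into one '_', that DirAndName leaves the directory raw while Sanitized scrubs every component, and why '?' would be a poor replacement once a name reaches a URL.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
	"strings"
	"unicode/utf8"
)

type FullPath string

const invalidUTF8Replacement = "_"

// The three functions below are copies of the listing, for a standalone run.
func SanitizeUTF8Name(s string) string {
	if utf8.ValidString(s) {
		return s
	}
	return strings.ToValidUTF8(s, invalidUTF8Replacement)
}

func (fp FullPath) DirAndName() (string, string) {
	dir, name := filepath.Split(string(fp))
	name = SanitizeUTF8Name(name)
	if dir == "/" {
		return dir, name
	}
	if len(dir) < 1 {
		return "/", ""
	}
	return dir[:len(dir)-1], name
}

func (fp FullPath) Sanitized() string {
	return SanitizeUTF8Name(string(fp))
}

func main() {
	// A run of consecutive invalid bytes collapses to a single '_'.
	fmt.Printf("%q\n", SanitizeUTF8Name("a\xff\xfeb")) // "a_b"

	// DirAndName scrubs only the last component; Sanitized scrubs all of them.
	fp := FullPath("/dir\xff/name\xfe")
	dir, name := fp.DirAndName()
	fmt.Println(utf8.ValidString(dir), name) // false name_
	fmt.Println(fp.Sanitized())              // /dir_/name_

	// Why not '?': in a URL path it starts the query string and splits the name.
	u, err := url.Parse("http://volume.example/" + strings.ToValidUTF8("a\xffb", "?"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q %q\n", u.Path, u.RawQuery) // "/a" "b"
}
```

The raw `dir` from DirAndName is exactly what the commit's tests pin; a caller that feeds it to a proto Directory field would still need Sanitized on the parent path.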
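IsEqualOrUnder's doc comment contrasts directory-boundary matching with a plain string-prefix check. The sketch below copies NormalizePath, IsUnder and IsEqualOrUnder from the listing into a standalone program (the `/buckets/photos` paths are only illustrative) and compares them with strings.HasPrefix on a sibling directory whose name shares a prefix.

```go
package main

import (
	"fmt"
	"strings"
)

type FullPath string

// Copies of the listing's IsUnder, NormalizePath and IsEqualOrUnder.
func (fp FullPath) IsUnder(other FullPath) bool {
	if other == "/" {
		return true
	}
	return strings.HasPrefix(string(fp), string(other)+"/")
}

func NormalizePath(p string) FullPath {
	if p == "" {
		return ""
	}
	trimmed := strings.TrimSuffix(p, "/")
	if trimmed == "" {
		return "/"
	}
	return FullPath(trimmed)
}

func IsEqualOrUnder(candidate, other string) bool {
	candidatePath := NormalizePath(candidate)
	otherPath := NormalizePath(other)
	if candidatePath == "" || otherPath == "" {
		return false
	}
	return candidatePath == otherPath || candidatePath.IsUnder(otherPath)
}

func main() {
	// A plain prefix check treats the sibling "photos2" as a descendant.
	fmt.Println(strings.HasPrefix("/buckets/photos2/a.jpg", "/buckets/photos")) // true
	fmt.Println(IsEqualOrUnder("/buckets/photos2/a.jpg", "/buckets/photos"))    // false
	// Trailing slashes are normalized on both sides.
	fmt.Println(IsEqualOrUnder("/buckets/photos/a.jpg", "/buckets/photos/")) // true
	fmt.Println(IsEqualOrUnder("/buckets/photos/", "/buckets/photos"))       // true
	// Empty input is "no path", never a match.
	fmt.Println(IsEqualOrUnder("", "/")) // false
}
```

Appending "/" to other before the prefix test is what enforces the boundary; the equality branch covers the case where candidate and other name the same directory.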