seaweedfs/.github/workflows/helm_ci.yml
Chris Lu 2ed95d7ea9 helm: decouple JWT signing from cert-manager mTLS (fixes #9506) (#9508)
* helm(security): decouple JWT signing from cert-manager mTLS

The filer needs jwt.filer_signing.key to register the IAM gRPC service the
Admin UI Users tab calls (PR #9442). The chart only rendered security.toml
under enableSecurity, which also pulls in cert-manager for mTLS — much heavier
than the Admin UI needs. Operators on Helm without cert-manager have no way
to flip the JWT key on, so the Users tab fails with Unimplemented after
upgrading past 4.24.

Introduce seaweedfs.securityConfigEnabled, true when enableSecurity OR any
explicit jwtSigning toggle (volumeRead/filerWrite/filerRead) is set. The
configmap renders under that helper; the [grpc.*]/[https.*] sections inside
stay gated on enableSecurity. Each pod template splits the security-config
mount onto the helper and keeps the cert volume mounts on enableSecurity.

volumeWrite is intentionally excluded from the helper trigger because it
defaults to true; including it would silently start mounting security.toml on
every fresh install. With this change, enableSecurity=false + defaults
renders nothing (unchanged), enableSecurity=true renders the full toml
(unchanged), and enableSecurity=false + filerWrite=true renders just the
[jwt.*] sections so the Admin UI works without mTLS.
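
The gating rules above can be modelled outside Helm. The sketch below is a plain-Python illustration, not the chart's Go template: `security_config_enabled` and `rendered_sections` are hypothetical names, the section labels are shorthand for groups of stanzas in security.toml, and the `defaults` dict assumes that every toggle other than volumeWrite is off by default.

```python
def security_config_enabled(enable_security, jwt_signing):
    """Model of the helper: enableSecurity OR any explicit JWT toggle.

    volumeWrite is deliberately ignored because it defaults to true.
    """
    triggers = ("volumeRead", "filerWrite", "filerRead")
    return bool(enable_security) or any(jwt_signing.get(k) for k in triggers)


def rendered_sections(enable_security, jwt_signing):
    """Return which groups of security.toml sections would render."""
    if not security_config_enabled(enable_security, jwt_signing):
        return []                      # no ConfigMap at all
    sections = ["jwt"]                 # [jwt.*] renders whenever the ConfigMap does
    if enable_security:
        sections += ["grpc", "https"]  # still gated on enableSecurity
    return sections


# Assumed defaults: only volumeWrite is on out of the box.
defaults = {"volumeWrite": True, "volumeRead": False,
            "filerWrite": False, "filerRead": False}

print(rendered_sections(False, defaults))                          # []
print(rendered_sections(True, defaults))                           # ['jwt', 'grpc', 'https']
print(rendered_sections(False, {**defaults, "filerWrite": True}))  # ['jwt']
```

The three calls correspond to the three cases listed in the paragraph: nothing rendered, the full toml, and the JWT-only toml.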

Fixes #9506.

* helm(security): trim verbose comments

* helm(security): handle null securityConfig in helper

Address review feedback: (.Values.global.seaweedfs.securityConfig).jwtSigning
errored if a user explicitly set securityConfig: null in their values. Drop
into intermediate $sec/$jwt with default dict at each step so a missing or
nulled-out parent is tolerated.
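
As a rough analogy for the null handling, the same pattern in Python looks like the sketch below. It is not the template code: `naive_jwt_signing` and `jwt_signing` are hypothetical helpers, and the local names `sec` and `jwt` only mirror the roles of `$sec` and `$jwt`.

```python
def naive_jwt_signing(values):
    """Direct chained access: fails as soon as any parent is None."""
    return values["global"]["seaweedfs"]["securityConfig"]["jwtSigning"]


def jwt_signing(values):
    """Step through each level, substituting an empty dict for None or missing."""
    glob = values.get("global") or {}
    seaweedfs = glob.get("seaweedfs") or {}
    sec = seaweedfs.get("securityConfig") or {}  # role of $sec
    jwt = sec.get("jwtSigning") or {}            # role of $jwt
    return jwt


nulled = {"global": {"seaweedfs": {"securityConfig": None}}}

try:
    naive_jwt_signing(nulled)
except TypeError as exc:
    print("naive lookup failed:", type(exc).__name__)

print("tolerant lookup:", jwt_signing(nulled))
```

With `securityConfig` set to null, the chained lookup raises while the stepwise lookup returns an empty mapping, so every toggle reads as unset.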

* helm(ci): cover IAM gRPC decoupling (issue #9506)

Five regression assertions exercised against the rendered chart so a
future change cannot silently re-couple jwt.filer_signing to mTLS:

1. defaults render no security-config ConfigMap (preserves baseline)
2. filerWrite=true alone renders [jwt.filer_signing] with no [grpc.*]
3. filerWrite=true mounts security-config on filer + admin without
   pulling in cert volumes — the actual fix for the Admin UI Users tab
4. enableSecurity=true still produces the full toml with [grpc.master]
5. securityConfig=null and securityConfig.jwtSigning=null both render
   cleanly (gemini-code-assist review nit, applied chart-wide)

Patch a pre-existing direct-access in filer-statefulset.yaml that
crashed on securityConfig=null, surfaced by the new null assertion.

* helm(ci): drop issue numbers from comments

* helm(ci): install pyyaml; assert [jwt.signing] in mTLS path

Address coderabbit review:

- The new IAM gRPC test block uses `import yaml` but ran before the
  later `pip install pyyaml -q` step that the security+S3 block
  performs. CI happens to pass because the runner image carries
  PyYAML, but make the dependency explicit so a future runner change
  cannot silently break the regression test.

- The enableSecurity=true assertion only checked for [grpc.master].
  Also assert [jwt.signing] so a refactor that drops the volume-side
  JWT stanza from the mTLS path fails the test instead of slipping
  through.
2026-05-14 23:43:24 -07:00


name: "helm: lint and test charts"
on:
  push:
    branches: [ master ]
    paths: ['k8s/**']
  pull_request:
    branches: [ master ]
    paths: ['k8s/**']
permissions:
  contents: read
jobs:
  lint-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd
        with:
          fetch-depth: 0
      - name: Set up Helm
        uses: azure/setup-helm@v5
        with:
          version: v3.18.4
      - uses: actions/setup-python@v6
        with:
          python-version: '3.10'
          check-latest: true
      - name: Set up chart-testing
        uses: helm/chart-testing-action@v2.8.0
      - name: Run chart-testing (list-changed)
        id: list-changed
        run: |
          changed=$(ct list-changed --target-branch ${{ github.event.repository.default_branch }} --chart-dirs k8s/charts)
          if [[ -n "$changed" ]]; then
            # ::set-output is deprecated; write step outputs to $GITHUB_OUTPUT instead.
            echo "changed=true" >> "$GITHUB_OUTPUT"
          fi
      - name: Run chart-testing (lint)
        run: ct lint --target-branch ${{ github.event.repository.default_branch }} --all --validate-maintainers=false --chart-dirs k8s/charts
      - name: Verify template rendering
        run: |
          set -e
          CHART_DIR="k8s/charts/seaweedfs"
          echo "=== Testing default configuration ==="
          helm template test $CHART_DIR > /tmp/default.yaml
          echo "✓ Default configuration renders successfully"
          echo "=== Testing with S3 enabled ==="
          helm template test $CHART_DIR --set s3.enabled=true > /tmp/s3.yaml
          grep -q "kind: Deployment" /tmp/s3.yaml && grep -q "seaweedfs-s3" /tmp/s3.yaml
          echo "✓ S3 deployment renders correctly"
          echo "=== Testing with all-in-one mode ==="
          helm template test $CHART_DIR --set allInOne.enabled=true > /tmp/allinone.yaml
          grep -q "seaweedfs-all-in-one" /tmp/allinone.yaml
          echo "✓ All-in-one deployment renders correctly"
          echo "=== Testing with security enabled ==="
          helm template test $CHART_DIR --set global.seaweedfs.enableSecurity=true > /tmp/security.yaml
          grep -q "security-config" /tmp/security.yaml
          echo "✓ Security configuration renders correctly"
          echo ""
          echo "=== Testing IAM gRPC opt-in path ==="
          # Regression test: the filer registers the IAM gRPC service the
          # Admin UI Users tab calls only when jwt.filer_signing.key is in
          # security.toml. Operators must be able to enable that without
          # the cert-manager mTLS bundle.
          # Install PyYAML explicitly: this block runs before the later
          # security+S3 block that does the same install, and we don't
          # want to rely on the runner image shipping it.
          pip install pyyaml -q
          python3 - "$CHART_DIR" <<'PYEOF'
          import subprocess, sys, yaml
          chart = sys.argv[1]
          def render(values):
              args = ["helm", "template", "test", chart]
              for k, v in values.items():
                  args += ["--set", f"{k}={v}"]
              return subprocess.check_output(args, text=True)
          def docs(manifest):
              return [d for d in yaml.safe_load_all(manifest) if d]
          def configmap(manifest, name):
              for d in docs(manifest):
                  if d.get("kind") == "ConfigMap" and d["metadata"]["name"] == name:
                      return d
              return None
          def workload_mounts(manifest, name):
              for d in docs(manifest):
                  if d.get("kind") not in ("Deployment", "StatefulSet"):
                      continue
                  if d["metadata"]["name"] != name:
                      continue
                  pod = d["spec"]["template"]["spec"]
                  vols = {v["name"] for v in pod.get("volumes", [])}
                  mounts = set()
                  for c in pod.get("containers", []):
                      for vm in c.get("volumeMounts", []):
                          mounts.add(vm["name"])
                  return vols, mounts
              return None, None
          failed = []
          # Case 1: defaults. The chart historically rendered nothing
          # security-related; preserve that so this PR is non-breaking on
          # existing installs.
          out = render({})
          if configmap(out, "test-seaweedfs-security-config") is not None:
              failed.append("defaults: security ConfigMap should not render")
          else:
              print("✓ defaults: no security-config ConfigMap (unchanged)")
          # Case 2: filerWrite=true alone is the documented opt-in for
          # the Admin UI Users tab. Configmap must render with
          # [jwt.filer_signing] and NO [grpc.*] sections (cert paths
          # only exist with mTLS).
          out = render({
              "global.seaweedfs.securityConfig.jwtSigning.filerWrite": "true",
              "admin.enabled": "true",
          })
          cm = configmap(out, "test-seaweedfs-security-config")
          if cm is None:
              failed.append("filerWrite=true: security ConfigMap missing")
          else:
              toml = cm["data"]["security.toml"]
              if "[jwt.filer_signing]" not in toml:
                  failed.append("filerWrite=true: security.toml missing [jwt.filer_signing]")
              if "[grpc" in toml:
                  failed.append("filerWrite=true: security.toml unexpectedly has [grpc.*] (would need cert mounts)")
              if "[jwt.filer_signing]" in toml and "[grpc" not in toml:
                  print("✓ filerWrite=true: security.toml has [jwt.filer_signing], no [grpc.*]")
          # Case 3: filer + admin pods must MOUNT the security ConfigMap
          # under filerWrite=true so the JWT key reaches both processes.
          # Cert volumes must NOT be present (no mTLS).
          for wl in ("test-seaweedfs-filer", "test-seaweedfs-admin"):
              vols, mounts = workload_mounts(out, wl)
              if vols is None:
                  failed.append(f"filerWrite=true: workload {wl} not found")
                  continue
              if "security-config" not in vols or "security-config" not in mounts:
                  failed.append(f"filerWrite=true: {wl} does not mount security-config (IAM gRPC would still fail)")
              else:
                  print(f"✓ filerWrite=true: {wl} mounts security-config")
              cert_vols = {v for v in vols if v.endswith("-cert")}
              if cert_vols:
                  failed.append(f"filerWrite=true: {wl} unexpectedly has cert volumes {sorted(cert_vols)}")
          # Case 4: enableSecurity=true must still render the full toml
          # with both [jwt.signing] and [grpc.*]. Guards against the
          # decoupling change accidentally regressing the mTLS path.
          out = render({"global.seaweedfs.enableSecurity": "true"})
          cm = configmap(out, "test-seaweedfs-security-config")
          if cm is None:
              failed.append("enableSecurity=true: security ConfigMap missing")
          else:
              toml = cm["data"]["security.toml"]
              missing = [s for s in ("[jwt.signing]", "[grpc.master]") if s not in toml]
              if missing:
                  failed.append(f"enableSecurity=true: security.toml missing {missing}")
              else:
                  print("✓ enableSecurity=true: security.toml has [jwt.signing] + [grpc.*] preserved")
          # Case 5: helper must tolerate explicit nulls (gemini-code-assist
          # PR review). securityConfig=null was the parens-pattern crash
          # the helper review caught.
          for null_path in ("global.seaweedfs.securityConfig",
                            "global.seaweedfs.securityConfig.jwtSigning"):
              try:
                  out = render({null_path: "null"})
              except subprocess.CalledProcessError as e:
                  failed.append(f"{null_path}=null: render failed: {e.output[:200] if e.output else e}")
                  continue
              if configmap(out, "test-seaweedfs-security-config") is not None:
                  failed.append(f"{null_path}=null: should not render configmap")
              else:
                  print(f"✓ {null_path}=null: render tolerates explicit null")
          if failed:
              print("\nFAIL:", file=sys.stderr)
              for f in failed:
                  print(f" - {f}", file=sys.stderr)
              sys.exit(1)
          PYEOF
          echo "✓ IAM gRPC decoupling tests passed"
          echo "=== Testing with monitoring enabled ==="
          helm template test $CHART_DIR \
            --set global.seaweedfs.monitoring.enabled=true \
            --set global.seaweedfs.monitoring.gatewayHost=prometheus \
            --set global.seaweedfs.monitoring.gatewayPort=9091 > /tmp/monitoring.yaml
          echo "✓ Monitoring configuration renders correctly"
          echo "=== Testing with PVC storage ==="
          helm template test $CHART_DIR \
            --set master.data.type=persistentVolumeClaim \
            --set master.data.size=10Gi \
            --set master.data.storageClass=standard > /tmp/pvc.yaml
          grep -q "PersistentVolumeClaim" /tmp/pvc.yaml
          echo "✓ PVC configuration renders correctly"
          echo "=== Testing with custom replicas ==="
          helm template test $CHART_DIR \
            --set master.replicas=3 \
            --set filer.replicas=2 \
            --set volume.replicas=3 > /tmp/replicas.yaml
          echo "✓ Custom replicas configuration renders correctly"
          echo "=== Testing filer with S3 gateway ==="
          helm template test $CHART_DIR \
            --set filer.s3.enabled=true \
            --set filer.s3.enableAuth=true > /tmp/filer-s3.yaml
          echo "✓ Filer S3 gateway renders correctly"
          echo "=== Testing SFTP enabled ==="
          helm template test $CHART_DIR --set sftp.enabled=true > /tmp/sftp.yaml
          grep -q "seaweedfs-sftp" /tmp/sftp.yaml
          echo "✓ SFTP deployment renders correctly"
          echo "=== Testing ingress configurations ==="
          helm template test $CHART_DIR \
            --set master.ingress.enabled=true \
            --set filer.ingress.enabled=true \
            --set s3.enabled=true \
            --set s3.ingress.enabled=true > /tmp/ingress.yaml
          grep -q "kind: Ingress" /tmp/ingress.yaml
          echo "✓ Ingress configurations render correctly"
          echo "=== Testing COSI driver ==="
          helm template test $CHART_DIR --set cosi.enabled=true > /tmp/cosi.yaml
          grep -q "seaweedfs-cosi" /tmp/cosi.yaml
          echo "✓ COSI driver renders correctly"
          echo ""
          echo "=== Testing long release name: service names match DNS references ==="
          # Use a release name that, combined with chart name "seaweedfs", exceeds 63 chars.
          # fullname = "my-very-long-release-name-that-will-cause-truncation-seaweedfs" (65 chars before trunc)
          LONG_RELEASE="my-very-long-release-name-that-will-cause-truncation"
          # --- Normal mode: master + filer-client services vs helper-produced addresses ---
          helm template "$LONG_RELEASE" $CHART_DIR \
            --set s3.enabled=true \
            --set global.seaweedfs.createBuckets[0].name=test > /tmp/longname.yaml
          # Extract Service names from metadata
          MASTER_SVC=$(awk '/kind: Service/{found=1} found && /^ *name:/{print $2; found=0}' /tmp/longname.yaml \
            | grep -- '-master$')
          FILER_CLIENT_SVC=$(awk '/kind: Service/{found=1} found && /^ *name:/{print $2; found=0}' /tmp/longname.yaml \
            | grep -- '-filer-client$')
          # Extract the hostname from WEED_CLUSTER_SW_MASTER in post-install-bucket-hook
          MASTER_ADDR=$(grep 'WEED_CLUSTER_SW_MASTER' -A1 /tmp/longname.yaml \
            | grep 'value:' | head -1 | sed 's/.*value: *"\{0,1\}\([^":]*\).*/\1/')
          FILER_ADDR=$(grep 'WEED_CLUSTER_SW_FILER' -A1 /tmp/longname.yaml \
            | grep 'value:' | head -1 | sed 's/.*value: *"\{0,1\}\([^":]*\).*/\1/')
          # Extract the hostname from S3 deployment -filer= argument
          S3_FILER_HOST=$(grep '\-filer=' /tmp/longname.yaml \
            | head -1 | sed 's/.*-filer=\([^:]*\).*/\1/')
          # The address helpers produce "<svc>.<namespace>:<port>"; extract just the svc name
          MASTER_ADDR_SVC=$(echo "$MASTER_ADDR" | cut -d. -f1)
          FILER_ADDR_SVC=$(echo "$FILER_ADDR" | cut -d. -f1)
          S3_FILER_SVC=$(echo "$S3_FILER_HOST" | cut -d. -f1)
          echo " master Service.name: $MASTER_SVC"
          echo " cluster.masterAddress svc: $MASTER_ADDR_SVC"
          echo " filer-client Service.name: $FILER_CLIENT_SVC"
          echo " cluster.filerAddress svc: $FILER_ADDR_SVC"
          echo " S3 -filer= svc: $S3_FILER_SVC"
          [ "$MASTER_SVC" = "$MASTER_ADDR_SVC" ] || { echo "FAIL: master service name mismatch"; exit 1; }
          [ "$FILER_CLIENT_SVC" = "$FILER_ADDR_SVC" ] || { echo "FAIL: filer-client service name mismatch"; exit 1; }
          [ "$FILER_CLIENT_SVC" = "$S3_FILER_SVC" ] || { echo "FAIL: S3 -filer= does not match filer-client service"; exit 1; }
          echo "✓ Normal mode: service names match DNS references with long release name"
          # --- All-in-one mode: all-in-one service vs both helper addresses ---
          helm template "$LONG_RELEASE" $CHART_DIR \
            --set allInOne.enabled=true \
            --set global.seaweedfs.createBuckets[0].name=test > /tmp/longname-aio.yaml
          AIO_SVC=$(awk '/kind: Service/{found=1} found && /^ *name:/{print $2; found=0}' /tmp/longname-aio.yaml \
            | grep -- '-all-in-one$')
          AIO_MASTER_ADDR_SVC=$(grep 'WEED_CLUSTER_SW_MASTER' -A1 /tmp/longname-aio.yaml \
            | grep 'value:' | head -1 | sed 's/.*value: *"\{0,1\}\([^":]*\).*/\1/' | cut -d. -f1)
          AIO_FILER_ADDR_SVC=$(grep 'WEED_CLUSTER_SW_FILER' -A1 /tmp/longname-aio.yaml \
            | grep 'value:' | head -1 | sed 's/.*value: *"\{0,1\}\([^":]*\).*/\1/' | cut -d. -f1)
          echo " all-in-one Service.name: $AIO_SVC"
          echo " cluster.masterAddress svc: $AIO_MASTER_ADDR_SVC"
          echo " cluster.filerAddress svc: $AIO_FILER_ADDR_SVC"
          [ "$AIO_SVC" = "$AIO_MASTER_ADDR_SVC" ] || { echo "FAIL: all-in-one master address mismatch"; exit 1; }
          [ "$AIO_SVC" = "$AIO_FILER_ADDR_SVC" ] || { echo "FAIL: all-in-one filer address mismatch"; exit 1; }
          echo "✓ All-in-one mode: service names match DNS references with long release name"
          echo ""
          echo "=== Testing security+S3: no blank lines in shell command blocks ==="
          # Render the three manifests that include seaweedfs.s3.tlsArgs:
          # filer-statefulset, s3-deployment, all-in-one-deployment
          helm template test $CHART_DIR \
            --set global.seaweedfs.enableSecurity=true \
            --set filer.s3.enabled=true \
            --set s3.enabled=true > /tmp/security-s3.yaml
          helm template test $CHART_DIR \
            --set global.seaweedfs.enableSecurity=true \
            --set allInOne.enabled=true \
            --set allInOne.s3.enabled=true > /tmp/security-aio.yaml
          pip install pyyaml -q
          python3 - /tmp/security-s3.yaml /tmp/security-aio.yaml <<'PYEOF'
          import yaml, sys
          errors = []
          for path in sys.argv[1:]:
              with open(path) as f:
                  docs = list(yaml.safe_load_all(f))
              for doc in docs:
                  if not doc or doc.get("kind") not in ("Deployment", "StatefulSet"):
                      continue
                  name = doc["metadata"]["name"]
                  for c in doc["spec"]["template"]["spec"].get("containers", []):
                      cmd = c.get("command", [])
                      if len(cmd) >= 3 and cmd[0] == "/bin/sh" and cmd[1] == "-ec":
                          script = cmd[2]
                          for i, line in enumerate(script.splitlines(), 1):
                              if line.strip() == "":
                                  errors.append(f"{path}: {name}/{c['name']} has blank line at script line {i}")
          if errors:
              for e in errors:
                  print(f"FAIL: {e}", file=sys.stderr)
              print("Rendered with: global.seaweedfs.enableSecurity=true, filer.s3.enabled=true, s3.enabled=true, allInOne.enabled=true", file=sys.stderr)
              sys.exit(1)
          print("✓ No blank lines in security+S3 command blocks")
          PYEOF
          echo ""
          echo "=== Testing security+S3: -cert.file/-key.file gated on httpsPort (issue #9202) ==="
          # Regression test: when enableSecurity=true but *.httpsPort is 0 (the default),
          # the chart must NOT emit -cert.file / -key.file to the S3 frontend. Passing
          # them promotes weed s3's main -port to HTTPS (see weed/command/s3.go), which
          # makes the HTTP readinessProbe spam "TLS handshake error ... client sent an
          # HTTP request to an HTTPS server" into the pod log.
          #
          # When *.httpsPort > 0, both -port.https and cert/key args MUST be emitted
          # together so the opt-in HTTPS listener actually has credentials.
          python3 - "$CHART_DIR" <<'PYEOF'
          import subprocess, sys, yaml
          chart = sys.argv[1]
          def render(values):
              args = ["helm", "template", "test", chart]
              for k, v in values.items():
                  args += ["--set", f"{k}={v}"]
              return subprocess.check_output(args, text=True)
          def script_of(manifest, kind_name):
              for doc in yaml.safe_load_all(manifest):
                  if not doc or doc.get("kind") not in ("Deployment", "StatefulSet"):
                      continue
                  if doc["metadata"]["name"] != kind_name:
                      continue
                  for c in doc["spec"]["template"]["spec"]["containers"]:
                      cmd = c.get("command", [])
                      if len(cmd) >= 3 and cmd[0] == "/bin/sh" and cmd[1] == "-ec":
                          return cmd[2]
              raise AssertionError(f"no container script for {kind_name}")
          cases = [
              # (values, workload-name, httpsPort-set?, arg-prefix)
              ({"global.seaweedfs.enableSecurity": "true",
                "s3.enabled": "true"},
               "test-seaweedfs-s3", False, ""),
              ({"global.seaweedfs.enableSecurity": "true",
                "s3.enabled": "true",
                "s3.httpsPort": "8443"},
               "test-seaweedfs-s3", True, ""),
              ({"global.seaweedfs.enableSecurity": "true",
                "filer.s3.enabled": "true"},
               "test-seaweedfs-filer", False, "s3."),
              ({"global.seaweedfs.enableSecurity": "true",
                "filer.s3.enabled": "true",
                "filer.s3.httpsPort": "8444"},
               "test-seaweedfs-filer", True, "s3."),
              ({"global.seaweedfs.enableSecurity": "true",
                "allInOne.enabled": "true",
                "allInOne.s3.enabled": "true"},
               "test-seaweedfs-all-in-one", False, "s3."),
              ({"global.seaweedfs.enableSecurity": "true",
                "allInOne.enabled": "true",
                "allInOne.s3.enabled": "true",
                "allInOne.s3.httpsPort": "8445"},
               "test-seaweedfs-all-in-one", True, "s3."),
          ]
          failed = False
          for values, name, https_on, prefix in cases:
              script = script_of(render(values), name)
              cert_flag = f"-{prefix}cert.file="
              key_flag = f"-{prefix}key.file="
              https_flag = f"-{prefix}port.https="
              has_cert = cert_flag in script
              has_key = key_flag in script
              has_https = https_flag in script
              label = f"{name} (httpsPort {'set' if https_on else 'unset'})"
              if https_on:
                  if not (has_cert and has_key and has_https):
                      print(f"FAIL: {label}: expected {cert_flag}, {key_flag}, {https_flag} all present "
                            f"(got cert={has_cert} key={has_key} https={has_https})", file=sys.stderr)
                      failed = True
                  else:
                      print(f"✓ {label}: cert/key/https args emitted together")
              else:
                  if has_cert or has_key or has_https:
                      print(f"FAIL: {label}: expected none of {cert_flag}/{key_flag}/{https_flag}; "
                            f"main S3 -port would silently become HTTPS and break HTTP probes "
                            f"(got cert={has_cert} key={has_key} https={has_https})", file=sys.stderr)
                      failed = True
                  else:
                      print(f"✓ {label}: no TLS args emitted, main -port stays HTTP")
              # bash -n: pin down that the rendered script parses. Guards against
              # a future helper change that leaves a dangling `\` with nothing
              # after it (every current caller already exits cleanly because
              # bash treats trailing `\<newline><EOF>` as line-continuation to
              # an empty line — but keep the contract explicit).
              parse = subprocess.run(["bash", "-n"], input=script, text=True,
                                     capture_output=True)
              if parse.returncode != 0:
                  print(f"FAIL: {label}: bash -n rejected rendered script: {parse.stderr.strip()}",
                        file=sys.stderr)
                  failed = True
          sys.exit(1 if failed else 0)
          PYEOF
          echo "✅ All template rendering tests passed!"
      - name: Create kind cluster
        uses: helm/kind-action@v1.14.0
      - name: Run chart-testing (install)
        run: ct install --target-branch ${{ github.event.repository.default_branch }} --all --chart-dirs k8s/charts