seaweedfs/.github/workflows/helm_ci.yml
Chris Lu 76f361fa77 fix(helm): gate S3 TLS cert args on httpsPort to stop probe failures (#9202) (#9206)
* fix(helm): gate S3 TLS cert args on httpsPort to stop probe failures (#9202)

With `global.seaweedfs.enableSecurity=true` and the default `s3.httpsPort=0`,
the chart was unconditionally passing `-cert.file` / `-key.file` to the S3
frontend. In `weed/command/s3.go`, when `tlsPrivateKey != ""` and
`portHttps == 0`, the server promotes its main `-port` (8333 by default) into
an HTTPS listener. The pod's readiness / liveness probes still use
`scheme: HTTP`, so every kubelet probe produces

    http: TLS handshake error from <node-ip>:<port>: client sent an HTTP
    request to an HTTPS server

in the pod log, as reported in #9202. `enableSecurity=true` is supposed to
activate security.toml / gRPC mTLS, not silently flip the S3 HTTP port to
HTTPS.

Move the `seaweedfs.s3.tlsArgs` include inside the `if httpsPort` guard in
all three templates that wire up an S3 frontend (standalone S3 deployment,
filer with S3 sub-server, all-in-one deployment). The TLS cert args are now
emitted only when the user explicitly opts into an HTTPS port; the main
`-port` stays HTTP so probes work.
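
The gated form looks roughly like this (an illustrative sketch of the container
command in the standalone s3-deployment template; the exact flag list,
indentation helper, and value paths in the chart may differ):

    {{- if .Values.s3.httpsPort }}
    -port.https={{ .Values.s3.httpsPort }} \
    {{- include "seaweedfs.s3.tlsArgs" . }}
    {{- end }}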

Also add a regression test to `.github/workflows/helm_ci.yml` that renders
all three templates with and without `httpsPort` and asserts the cert/key/
`-port.https` args are emitted together or not at all.

* test(helm): add bash -n parse check to the S3 TLS-gating regression test

Addresses gemini-code-assist review comment on #9206 flagging a potential
"dangling backslash" shell-syntax risk in the rendered all-in-one command
script when httpsPort is set but most S3/SFTP args are defaulted off. In
practice bash -n accepts a trailing `\<newline><EOF>` (it's line-continuation
to an empty line), so no current rendering is broken. This locks that contract
down in CI so that a future helper change that leaves a dangling backslash, or
any other shell-syntax regression in the rendered command, fails loudly instead
of silently shipping broken pods.
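
The trailing-backslash behavior can be checked directly (a quick illustration
of the `bash -n` contract described above, not part of the CI change itself):

```shell
# A script ending in backslash+newline parses: bash joins the final line
# with the (empty) line after it, so `bash -n` exits 0.
printf 'echo ok \\\n' | bash -n && echo "accepted: trailing backslash"

# A genuinely broken script, e.g. an unterminated quote, is rejected.
printf "echo 'oops\n" | bash -n 2>/dev/null || echo "rejected: unterminated quote"
```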
2026-04-23 15:00:07 -07:00


name: "helm: lint and test charts"

on:
  push:
    branches: [ master ]
    paths: ['k8s/**']
  pull_request:
    branches: [ master ]
    paths: ['k8s/**']

permissions:
  contents: read
jobs:
  lint-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd
        with:
          fetch-depth: 0
      - name: Set up Helm
        uses: azure/setup-helm@v5
        with:
          version: v3.18.4
      - uses: actions/setup-python@v6
        with:
          python-version: '3.10'
          check-latest: true
      - name: Set up chart-testing
        uses: helm/chart-testing-action@v2.8.0
      - name: Run chart-testing (list-changed)
        id: list-changed
        run: |
          changed=$(ct list-changed --target-branch ${{ github.event.repository.default_branch }} --chart-dirs k8s/charts)
          if [[ -n "$changed" ]]; then
            echo "changed=true" >> "$GITHUB_OUTPUT"
          fi
      - name: Run chart-testing (lint)
        run: ct lint --target-branch ${{ github.event.repository.default_branch }} --all --validate-maintainers=false --chart-dirs k8s/charts
      - name: Verify template rendering
        run: |
          set -e
          CHART_DIR="k8s/charts/seaweedfs"
          echo "=== Testing default configuration ==="
          helm template test $CHART_DIR > /tmp/default.yaml
          echo "✓ Default configuration renders successfully"
          echo "=== Testing with S3 enabled ==="
          helm template test $CHART_DIR --set s3.enabled=true > /tmp/s3.yaml
          grep -q "kind: Deployment" /tmp/s3.yaml && grep -q "seaweedfs-s3" /tmp/s3.yaml
          echo "✓ S3 deployment renders correctly"
          echo "=== Testing with all-in-one mode ==="
          helm template test $CHART_DIR --set allInOne.enabled=true > /tmp/allinone.yaml
          grep -q "seaweedfs-all-in-one" /tmp/allinone.yaml
          echo "✓ All-in-one deployment renders correctly"
          echo "=== Testing with security enabled ==="
          helm template test $CHART_DIR --set global.seaweedfs.enableSecurity=true > /tmp/security.yaml
          grep -q "security-config" /tmp/security.yaml
          echo "✓ Security configuration renders correctly"
          echo "=== Testing with monitoring enabled ==="
          helm template test $CHART_DIR \
            --set global.seaweedfs.monitoring.enabled=true \
            --set global.seaweedfs.monitoring.gatewayHost=prometheus \
            --set global.seaweedfs.monitoring.gatewayPort=9091 > /tmp/monitoring.yaml
          echo "✓ Monitoring configuration renders correctly"
          echo "=== Testing with PVC storage ==="
          helm template test $CHART_DIR \
            --set master.data.type=persistentVolumeClaim \
            --set master.data.size=10Gi \
            --set master.data.storageClass=standard > /tmp/pvc.yaml
          grep -q "PersistentVolumeClaim" /tmp/pvc.yaml
          echo "✓ PVC configuration renders correctly"
          echo "=== Testing with custom replicas ==="
          helm template test $CHART_DIR \
            --set master.replicas=3 \
            --set filer.replicas=2 \
            --set volume.replicas=3 > /tmp/replicas.yaml
          echo "✓ Custom replicas configuration renders correctly"
          echo "=== Testing filer with S3 gateway ==="
          helm template test $CHART_DIR \
            --set filer.s3.enabled=true \
            --set filer.s3.enableAuth=true > /tmp/filer-s3.yaml
          echo "✓ Filer S3 gateway renders correctly"
          echo "=== Testing SFTP enabled ==="
          helm template test $CHART_DIR --set sftp.enabled=true > /tmp/sftp.yaml
          grep -q "seaweedfs-sftp" /tmp/sftp.yaml
          echo "✓ SFTP deployment renders correctly"
          echo "=== Testing ingress configurations ==="
          helm template test $CHART_DIR \
            --set master.ingress.enabled=true \
            --set filer.ingress.enabled=true \
            --set s3.enabled=true \
            --set s3.ingress.enabled=true > /tmp/ingress.yaml
          grep -q "kind: Ingress" /tmp/ingress.yaml
          echo "✓ Ingress configurations render correctly"
          echo "=== Testing COSI driver ==="
          helm template test $CHART_DIR --set cosi.enabled=true > /tmp/cosi.yaml
          grep -q "seaweedfs-cosi" /tmp/cosi.yaml
          echo "✓ COSI driver renders correctly"
          echo ""
          echo "=== Testing long release name: service names match DNS references ==="
          # Use a release name that, combined with chart name "seaweedfs", exceeds 63 chars.
          # fullname = "my-very-long-release-name-that-will-cause-truncation-seaweedfs" (65 chars before trunc)
          LONG_RELEASE="my-very-long-release-name-that-will-cause-truncation"
          # --- Normal mode: master + filer-client services vs helper-produced addresses ---
          helm template "$LONG_RELEASE" $CHART_DIR \
            --set s3.enabled=true \
            --set global.seaweedfs.createBuckets[0].name=test > /tmp/longname.yaml
          # Extract Service names from metadata
          MASTER_SVC=$(awk '/kind: Service/{found=1} found && /^ *name:/{print $2; found=0}' /tmp/longname.yaml \
            | grep -- '-master$')
          FILER_CLIENT_SVC=$(awk '/kind: Service/{found=1} found && /^ *name:/{print $2; found=0}' /tmp/longname.yaml \
            | grep -- '-filer-client$')
          # Extract the hostname from WEED_CLUSTER_SW_MASTER in post-install-bucket-hook
          MASTER_ADDR=$(grep 'WEED_CLUSTER_SW_MASTER' -A1 /tmp/longname.yaml \
            | grep 'value:' | head -1 | sed 's/.*value: *"\{0,1\}\([^":]*\).*/\1/')
          FILER_ADDR=$(grep 'WEED_CLUSTER_SW_FILER' -A1 /tmp/longname.yaml \
            | grep 'value:' | head -1 | sed 's/.*value: *"\{0,1\}\([^":]*\).*/\1/')
          # Extract the hostname from the S3 deployment's -filer= argument
          S3_FILER_HOST=$(grep '\-filer=' /tmp/longname.yaml \
            | head -1 | sed 's/.*-filer=\([^:]*\).*/\1/')
          # The address helpers produce "<svc>.<namespace>:<port>"; extract just the svc name
          MASTER_ADDR_SVC=$(echo "$MASTER_ADDR" | cut -d. -f1)
          FILER_ADDR_SVC=$(echo "$FILER_ADDR" | cut -d. -f1)
          S3_FILER_SVC=$(echo "$S3_FILER_HOST" | cut -d. -f1)
          echo "  master Service.name:       $MASTER_SVC"
          echo "  cluster.masterAddress svc: $MASTER_ADDR_SVC"
          echo "  filer-client Service.name: $FILER_CLIENT_SVC"
          echo "  cluster.filerAddress svc:  $FILER_ADDR_SVC"
          echo "  S3 -filer= svc:            $S3_FILER_SVC"
          [ "$MASTER_SVC" = "$MASTER_ADDR_SVC" ] || { echo "FAIL: master service name mismatch"; exit 1; }
          [ "$FILER_CLIENT_SVC" = "$FILER_ADDR_SVC" ] || { echo "FAIL: filer-client service name mismatch"; exit 1; }
          [ "$FILER_CLIENT_SVC" = "$S3_FILER_SVC" ] || { echo "FAIL: S3 -filer= does not match filer-client service"; exit 1; }
          echo "✓ Normal mode: service names match DNS references with long release name"
          # --- All-in-one mode: all-in-one service vs both helper addresses ---
          helm template "$LONG_RELEASE" $CHART_DIR \
            --set allInOne.enabled=true \
            --set global.seaweedfs.createBuckets[0].name=test > /tmp/longname-aio.yaml
          AIO_SVC=$(awk '/kind: Service/{found=1} found && /^ *name:/{print $2; found=0}' /tmp/longname-aio.yaml \
            | grep -- '-all-in-one$')
          AIO_MASTER_ADDR_SVC=$(grep 'WEED_CLUSTER_SW_MASTER' -A1 /tmp/longname-aio.yaml \
            | grep 'value:' | head -1 | sed 's/.*value: *"\{0,1\}\([^":]*\).*/\1/' | cut -d. -f1)
          AIO_FILER_ADDR_SVC=$(grep 'WEED_CLUSTER_SW_FILER' -A1 /tmp/longname-aio.yaml \
            | grep 'value:' | head -1 | sed 's/.*value: *"\{0,1\}\([^":]*\).*/\1/' | cut -d. -f1)
          echo "  all-in-one Service.name:   $AIO_SVC"
          echo "  cluster.masterAddress svc: $AIO_MASTER_ADDR_SVC"
          echo "  cluster.filerAddress svc:  $AIO_FILER_ADDR_SVC"
          [ "$AIO_SVC" = "$AIO_MASTER_ADDR_SVC" ] || { echo "FAIL: all-in-one master address mismatch"; exit 1; }
          [ "$AIO_SVC" = "$AIO_FILER_ADDR_SVC" ] || { echo "FAIL: all-in-one filer address mismatch"; exit 1; }
          echo "✓ All-in-one mode: service names match DNS references with long release name"
          echo ""
          echo "=== Testing security+S3: no blank lines in shell command blocks ==="
          # Render the three manifests that include seaweedfs.s3.tlsArgs:
          #   filer-statefulset, s3-deployment, all-in-one-deployment
          helm template test $CHART_DIR \
            --set global.seaweedfs.enableSecurity=true \
            --set filer.s3.enabled=true \
            --set s3.enabled=true > /tmp/security-s3.yaml
          helm template test $CHART_DIR \
            --set global.seaweedfs.enableSecurity=true \
            --set allInOne.enabled=true \
            --set allInOne.s3.enabled=true > /tmp/security-aio.yaml
          pip install pyyaml -q
          python3 - /tmp/security-s3.yaml /tmp/security-aio.yaml <<'PYEOF'
          import yaml, sys
          errors = []
          for path in sys.argv[1:]:
              with open(path) as f:
                  docs = list(yaml.safe_load_all(f))
              for doc in docs:
                  if not doc or doc.get("kind") not in ("Deployment", "StatefulSet"):
                      continue
                  name = doc["metadata"]["name"]
                  for c in doc["spec"]["template"]["spec"].get("containers", []):
                      cmd = c.get("command", [])
                      if len(cmd) >= 3 and cmd[0] == "/bin/sh" and cmd[1] == "-ec":
                          script = cmd[2]
                          for i, line in enumerate(script.splitlines(), 1):
                              if line.strip() == "":
                                  errors.append(f"{path}: {name}/{c['name']} has blank line at script line {i}")
          if errors:
              for e in errors:
                  print(f"FAIL: {e}", file=sys.stderr)
              print("Rendered with: global.seaweedfs.enableSecurity=true, filer.s3.enabled=true, s3.enabled=true, allInOne.enabled=true", file=sys.stderr)
              sys.exit(1)
          print("✓ No blank lines in security+S3 command blocks")
          PYEOF
          echo ""
          echo "=== Testing security+S3: -cert.file/-key.file gated on httpsPort (issue #9202) ==="
          # Regression test: when enableSecurity=true but *.httpsPort is 0 (the default),
          # the chart must NOT emit -cert.file / -key.file to the S3 frontend. Passing
          # them promotes weed s3's main -port to HTTPS (see weed/command/s3.go), which
          # makes the HTTP readinessProbe spam "TLS handshake error ... client sent an
          # HTTP request to an HTTPS server" into the pod log.
          #
          # When *.httpsPort > 0, both -port.https and cert/key args MUST be emitted
          # together so the opt-in HTTPS listener actually has credentials.
          python3 - "$CHART_DIR" <<'PYEOF'
          import subprocess, sys, yaml
          chart = sys.argv[1]
          def render(values):
              args = ["helm", "template", "test", chart]
              for k, v in values.items():
                  args += ["--set", f"{k}={v}"]
              return subprocess.check_output(args, text=True)
          def script_of(manifest, kind_name):
              for doc in yaml.safe_load_all(manifest):
                  if not doc or doc.get("kind") not in ("Deployment", "StatefulSet"):
                      continue
                  if doc["metadata"]["name"] != kind_name:
                      continue
                  for c in doc["spec"]["template"]["spec"]["containers"]:
                      cmd = c.get("command", [])
                      if len(cmd) >= 3 and cmd[0] == "/bin/sh" and cmd[1] == "-ec":
                          return cmd[2]
              raise AssertionError(f"no container script for {kind_name}")
          cases = [
              # (values, workload-name, httpsPort-set?, arg-prefix)
              ({"global.seaweedfs.enableSecurity": "true",
                "s3.enabled": "true"},
               "test-seaweedfs-s3", False, ""),
              ({"global.seaweedfs.enableSecurity": "true",
                "s3.enabled": "true",
                "s3.httpsPort": "8443"},
               "test-seaweedfs-s3", True, ""),
              ({"global.seaweedfs.enableSecurity": "true",
                "filer.s3.enabled": "true"},
               "test-seaweedfs-filer", False, "s3."),
              ({"global.seaweedfs.enableSecurity": "true",
                "filer.s3.enabled": "true",
                "filer.s3.httpsPort": "8444"},
               "test-seaweedfs-filer", True, "s3."),
              ({"global.seaweedfs.enableSecurity": "true",
                "allInOne.enabled": "true",
                "allInOne.s3.enabled": "true"},
               "test-seaweedfs-all-in-one", False, "s3."),
              ({"global.seaweedfs.enableSecurity": "true",
                "allInOne.enabled": "true",
                "allInOne.s3.enabled": "true",
                "allInOne.s3.httpsPort": "8445"},
               "test-seaweedfs-all-in-one", True, "s3."),
          ]
          failed = False
          for values, name, https_on, prefix in cases:
              script = script_of(render(values), name)
              cert_flag = f"-{prefix}cert.file="
              key_flag = f"-{prefix}key.file="
              https_flag = f"-{prefix}port.https="
              has_cert = cert_flag in script
              has_key = key_flag in script
              has_https = https_flag in script
              label = f"{name} (httpsPort {'set' if https_on else 'unset'})"
              if https_on:
                  if not (has_cert and has_key and has_https):
                      print(f"FAIL: {label}: expected {cert_flag}, {key_flag}, {https_flag} all present "
                            f"(got cert={has_cert} key={has_key} https={has_https})", file=sys.stderr)
                      failed = True
                  else:
                      print(f"✓ {label}: cert/key/https args emitted together")
              else:
                  if has_cert or has_key or has_https:
                      print(f"FAIL: {label}: expected none of {cert_flag}/{key_flag}/{https_flag}; "
                            f"main S3 -port would silently become HTTPS and break HTTP probes "
                            f"(got cert={has_cert} key={has_key} https={has_https})", file=sys.stderr)
                      failed = True
                  else:
                      print(f"✓ {label}: no TLS args emitted, main -port stays HTTP")
              # bash -n: pin down that the rendered script parses. Guards against
              # a future helper change that leaves a dangling `\` with nothing
              # after it (every current caller already exits cleanly because
              # bash treats trailing `\<newline><EOF>` as line-continuation to
              # an empty line — but keep the contract explicit).
              parse = subprocess.run(["bash", "-n"], input=script, text=True,
                                     capture_output=True)
              if parse.returncode != 0:
                  print(f"FAIL: {label}: bash -n rejected rendered script: {parse.stderr.strip()}",
                        file=sys.stderr)
                  failed = True
          sys.exit(1 if failed else 0)
          PYEOF
          echo "✅ All template rendering tests passed!"
      - name: Create kind cluster
        uses: helm/kind-action@v1.14.0
      - name: Run chart-testing (install)
        run: ct install --target-branch ${{ github.event.repository.default_branch }} --all --chart-dirs k8s/charts