mirror of
https://github.com/vmware-tanzu/velero.git
synced 2026-01-05 13:05:17 +00:00
@@ -1,407 +1,79 @@
|
||||
# Wildcard Namespace Support Design
|
||||
|
||||
# Wildcard namespace includes/excludes support for backups and restores
|
||||
|
||||
## Abstract
|
||||
This proposal introduces wildcard pattern support for namespace inclusion and exclusion in Velero backups (e.g., `prod-*`, `*-staging`).
|
||||
The implementation uses lazy evaluation within the existing `ShouldInclude()` method to resolve wildcards on-demand with request-scoped caching.
|
||||
Based on [Issue #1874](https://github.com/vmware-tanzu/velero/issues/1874).
|
||||
One to two sentences that describes the goal of this proposal and the problem being solved by the proposed change.
|
||||
The reader should be able to tell by the title, and the opening paragraph, if this document is relevant to them.
|
||||
|
||||
Velero currently does not have any support for wildcard characters in the namespace spec.
|
||||
It fully expects the namespaces to be string literals.
|
||||
|
||||
The only and notable exception is the "*" character by it's lonesome, which acts as an include all and ignore excludes option.
|
||||
Internally Velero treats not specifying anything as the "*" case.
|
||||
|
||||
This document details the approach to implementing wildcard namespaces, while keeping the "*" characters purpose intact for legacy purposes.
|
||||
|
||||
## Background
|
||||
- Currently, Velero users must explicitly list each namespace for backup operations
|
||||
- In environments with many namespaces following naming conventions (e.g., `prod-app`, `prod-db`, `prod-cache`), this becomes:
|
||||
- Cumbersome to maintain
|
||||
- Error-prone to manage
|
||||
- Users have requested wildcard support to enable patterns like `--include-namespaces "prod-*"`
|
||||
This was raised in Issue [#1874](https://github.com/vmware-tanzu/velero/issues/1874)
|
||||
|
||||
|
||||
## Goals
|
||||
- Enable wildcard pattern support for namespace includes and excludes in Velero backup specifications
|
||||
- Maintain optimal performance with lazy evaluation and request-scoped caching
|
||||
- Preserve original wildcard patterns in backup specifications for audit and readability purposes
|
||||
- A short list of things which will be accomplished by implementing this proposal.
|
||||
- Two things is ok.
|
||||
- Three is pushing it.
|
||||
- More than three goals suggests that the proposal's scope is too large.
|
||||
|
||||
- Add support for wildcard namespaces in --include-namespaces and --exclude-namespaces
|
||||
- Ensure legacy "*" support is not affected
|
||||
|
||||
## Non Goals
|
||||
- Support for complex regex patterns beyond basic glob-style wildcards (`*`)
|
||||
- Persistent caching of namespace resolution across backup requests
|
||||
- Real-time namespace discovery that changes during backup execution
|
||||
- A short list of items which are:
|
||||
- a. out of scope
|
||||
- b. follow on items which are deliberately excluded from this proposal.
|
||||
|
||||
- Completely rethinking the way "*" is treated and allowing it to work with wildcard excludes.
|
||||
|
||||
|
||||
## High-Level Design
|
||||
One to two paragraphs that describe the high level changes that will be made to implement this proposal.
|
||||
|
||||
**Core Approach:** We're making the existing concrete type (`*IncludesExcludes`) polymorphic so we can substitute our new lazy evaluation type (`*LazyNamespaceIncludesExcludes`) without changing any calling code.
|
||||
Points of interest are two funcs within the utility layer, in file `velero/pkg/backup/item_collector.go`
|
||||
- [collectNamespaces](https://github.com/vmware-tanzu/velero/blob/1535afb45e33a3d3820088e4189800a21ba55293/pkg/backup/item_collector.go#L742)
|
||||
- [getNamespacesToList](https://github.com/vmware-tanzu/velero/blob/1535afb45e33a3d3820088e4189800a21ba55293/pkg/backup/item_collector.go#L638)
|
||||
|
||||
- Implementation at **backup request level** within the `ShouldInclude()` method
|
||||
- Uses lazy evaluation with `LazyNamespaceIncludesExcludes` wrapper
|
||||
- On-demand namespace resolution with thread-safe caching
|
||||
- First call triggers Kubernetes API namespace enumeration and wildcard resolution
|
||||
- Results cached for subsequent calls within the same backup request
|
||||
collectNamespaces gets all the active namespaces and matches it against the user spec for included namespaces (r.backupRequest.Backup.Spec.IncludedNamespaces)
|
||||
This is an ideal point where wildcard expansion can take place.
|
||||
The implementation would mean that just like "*", namespaces with wildcard symbols would also be passed through without an existence check.
|
||||
The resolved namespaces are stored in new status fields on the backup.
|
||||
|
||||
## Detailed Design
|
||||
A detailed design describing how the changes to the product should be made.
|
||||
|
||||
### Polymorphic Interface Approach
|
||||
The names of types, fields, interfaces, and methods should be agreed on here, not debated in code review.
|
||||
The same applies to changes in CRDs, YAML examples, and so on.
|
||||
|
||||
The key insight is that all existing backup code already calls the same 4 methods on namespace filtering:
|
||||
- `ShouldInclude(namespace string) bool` - Core filtering logic
|
||||
- `IncludesString() string` - Logging display
|
||||
- `ExcludesString() string` - Logging display
|
||||
- `IncludeEverything() bool` - Optimization checks
|
||||
Ideally the changes should be made in sequence so that the work required to implement this design can be done incrementally, possibly in parallel.
|
||||
|
||||
By creating a `NamespaceIncludesExcludesInterface` with these methods, we can:
|
||||
1. **Standard case**: Use existing `*IncludesExcludes` (no wildcards)
|
||||
2. **Wildcard case**: Use new `*LazyNamespaceIncludesExcludes` (with K8s API enumeration)
|
||||
|
||||
**No calling code changes needed** - the interface abstraction handles everything.
|
||||
|
||||
**Cache Scope:** Single backup request only - automatic cleanup when request completes.
|
||||
|
||||
### Implementation Strategy
|
||||
|
||||
**Location:** `pkg/util/collections/includes_excludes.go`
|
||||
- New interface defining the 4 required methods
|
||||
- `LazyNamespaceIncludesExcludes` struct embedding `*IncludesExcludes` for fallback
|
||||
- Lazy resolution with thread-safe caching using mutex
|
||||
- Special case handling for lone `*` to preserve existing efficient behavior
|
||||
|
||||
**Integration:** `pkg/backup/backup.go`
|
||||
- Wildcard detection logic determines which implementation to return
|
||||
- Lone `*` pattern → standard `IncludesExcludes` (preserve current behavior)
|
||||
- Any other wildcards → lazy `LazyNamespaceIncludesExcludes`
|
||||
|
||||
**Type Updates:** Change struct fields from concrete `*IncludesExcludes` to interface type
|
||||
- `pkg/backup/request.go` - Request struct field type
|
||||
- `pkg/backup/item_collector.go` - Function parameter types
|
||||
|
||||
### Performance Characteristics
|
||||
- **First `ShouldInclude()` call:** ~500ms (K8s API namespace enumeration + wildcard resolution)
|
||||
- **Subsequent calls:** ~1ms (cached lookup with read lock)
|
||||
- **Memory overhead:** Minimal (resolved namespace list stored once per backup request)
|
||||
- **Concurrency:** Full concurrent read access to cached results
|
||||
|
||||
## Namespace Discovery Timing and Behavior
|
||||
|
||||
### Snapshot Timing
|
||||
**Wildcard patterns are resolved at backup start time** and remain fixed for the entire backup duration. This provides:
|
||||
- **Consistent behavior**: All resources in a backup come from the same namespace set
|
||||
- **Predictable results**: Backup contents don't change mid-execution
|
||||
- **Performance**: No repeated namespace enumeration during backup processing
|
||||
|
||||
### Runtime Namespace Changes
|
||||
When namespaces are created or deleted during backup execution:
|
||||
|
||||
**Newly Created Namespaces:**
|
||||
- If `prod-new` is created after backup starts, it will **NOT** be included even if it matches `prod-*`
|
||||
- The resolved namespace list is fixed at backup start time
|
||||
|
||||
**Deleted Namespaces:**
|
||||
- If a namespace matching the pattern is deleted during backup, the backup continues
|
||||
- Resources already processed from that namespace remain in the backup
|
||||
- Subsequent resource enumeration for that namespace may result in "not found" errors (handled gracefully)
|
||||
- This should ideally fail so that the user can re-run it without a namespace being deleted while a backup is started which is rare.
|
||||
|
||||
**User Expectations:**
|
||||
This behavior should be explicitly documented with examples:
|
||||
```
|
||||
# At backup start: namespaces [prod-app, prod-db] exist
|
||||
velero backup create --include-namespaces "prod-*"
|
||||
|
||||
# During backup: prod-cache namespace is created
|
||||
# Result: prod-cache is NOT included in this backup
|
||||
# Recommendation: Run another backup to capture newly created namespaces
|
||||
1. Add new status fields to the backup CRD to store expanded wildcard namespaces
|
||||
```
|
||||
|
||||
## Pattern Complexity and Validation
|
||||
|
||||
### Supported Patterns
|
||||
**Basic Wildcard Support (`*` only):**
|
||||
- `prefix-*` - Matches namespaces starting with "prefix-"
|
||||
- `*-suffix` - Matches namespaces ending with "-suffix"
|
||||
- `*-middle-*` - Matches namespaces containing "-middle-"
|
||||
- `*` - Special case: matches all namespaces (preserves current behavior)
|
||||
|
||||
### Unsupported Patterns
|
||||
**Not supported in initial implementation:**
|
||||
- `?` for single character matching (e.g., `prod-?-app`)
|
||||
- Character classes (e.g., `prod-[abc]-app`)
|
||||
- Regex patterns (e.g., `prod-\d+-app`)
|
||||
|
||||
### Pattern Validation
|
||||
**Creation-time validation:**
|
||||
- Invalid patterns containing unsupported characters will be rejected at backup creation
|
||||
- Validation occurs in CLI and API server admission controller
|
||||
- Clear error messages guide users to supported patterns
|
||||
|
||||
**Example validation errors:**
|
||||
```bash
|
||||
# Unsupported pattern
|
||||
velero backup create --include-namespaces "prod-?-app"
|
||||
# Error: Pattern 'prod-?-app' contains unsupported character '?'. Only '*' wildcards are supported.
|
||||
|
||||
# Valid patterns
|
||||
velero backup create --include-namespaces "prod-*,*-staging"
|
||||
# Success: Patterns validated successfully
|
||||
```
|
||||
2. Create a util package for wildcard expansion
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Kubernetes API Failures
|
||||
**Namespace enumeration failures:**
|
||||
- If initial namespace list API call fails → backup fails with clear error message
|
||||
- Transient failures are retried using standard Kubernetes client retry logic
|
||||
- No fallback to cached/partial data to ensure consistent behavior
|
||||
|
||||
**Error response example:**
|
||||
```
|
||||
Error: Failed to enumerate namespaces for wildcard resolution: unable to connect to Kubernetes API
|
||||
Backup creation aborted. Please verify cluster connectivity and try again.
|
||||
```
|
||||
|
||||
### Zero Namespace Matches
|
||||
**When wildcard patterns match no namespaces:**
|
||||
- **Behavior**: Warning logged, backup proceeds with empty namespace set
|
||||
- **User notification**: Warning in backup status and logs
|
||||
- **Rationale**: Allows for valid scenarios (e.g., temporary namespace absence)
|
||||
|
||||
**Warning example:**
|
||||
```
|
||||
Warning: Wildcard pattern 'prod-*' matched 0 namespaces. Backup will include no namespaces from this pattern.
|
||||
```
|
||||
|
||||
### Dry-Run Support
|
||||
**Preview functionality:**
|
||||
```bash
|
||||
# New flag to preview wildcard resolution
|
||||
velero backup create my-backup --include-namespaces "prod-*" --dry-run=wildcards
|
||||
|
||||
# Output:
|
||||
Wildcard pattern 'prod-*' would include namespaces: [prod-app, prod-db, prod-cache]
|
||||
Wildcard pattern '*-staging' would include namespaces: [app-staging, db-staging]
|
||||
Total namespaces: 5
|
||||
```
|
||||
|
||||
## Restore Operations
|
||||
|
||||
### Wildcard Behavior During Restore
|
||||
**Restore uses namespaces captured at backup time:**
|
||||
- Wildcard patterns in backup specs are **not** re-evaluated during restore
|
||||
- Restore operates on the concrete namespace list that was resolved during backup
|
||||
- This ensures restore consistency even if cluster namespace state has changed
|
||||
|
||||
**Implementation approach:**
|
||||
1. **Backup metadata storage**: Store both original patterns and resolved namespace lists
|
||||
2. **Restore processing**: Use resolved namespace lists, ignore original patterns
|
||||
3. **Audit trail**: Both patterns and resolved lists visible in backup metadata
|
||||
|
||||
**Example scenario:**
|
||||
```yaml
|
||||
# Original backup spec
|
||||
includedNamespaces: ["prod-*"]
|
||||
|
||||
# Stored in backup metadata
|
||||
resolvedNamespaces: ["prod-app", "prod-db"]
|
||||
originalPatterns: ["prod-*"]
|
||||
|
||||
# During restore (even if prod-cache now exists)
|
||||
# Only prod-app and prod-db are restored
|
||||
```
|
||||
|
||||
### Disaster Recovery Scenarios
|
||||
**Cross-cluster restore behavior:**
|
||||
- Restore attempts to create resources in target namespaces
|
||||
- If target namespaces don't exist, Velero creates them (existing behavior)
|
||||
- Wildcard patterns are not re-evaluated against target cluster
|
||||
|
||||
## Scheduled Backups
|
||||
|
||||
### Namespace State Changes Between Runs
|
||||
**Each scheduled backup run performs fresh wildcard resolution:**
|
||||
- Pattern `prod-*` may include different namespaces in each backup run
|
||||
- This allows scheduled backups to automatically capture newly created namespaces
|
||||
- **Trade-off**: Backup contents may vary between runs vs. automatic inclusion of new resources
|
||||
|
||||
**Storage implications:**
|
||||
- Varying namespace sets between runs may affect deduplication efficiency
|
||||
- Each backup stores its own resolved namespace list independently
|
||||
|
||||
**Example behavior:**
|
||||
```
|
||||
# Monday backup: prod-* matches [prod-app, prod-db]
|
||||
# Tuesday: prod-cache namespace created
|
||||
# Tuesday backup: prod-* matches [prod-app, prod-db, prod-cache]
|
||||
```
|
||||
|
||||
**User expectations:**
|
||||
- Document that scheduled backups automatically include newly matching namespaces
|
||||
- Provide guidance on namespace naming conventions for predictable backup behavior
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit Tests
|
||||
**Pattern matching tests:**
|
||||
```go
|
||||
func TestWildcardPatterns(t *testing.T) {
|
||||
tests := []struct {
|
||||
pattern string
|
||||
namespace string
|
||||
expected bool
|
||||
}{
|
||||
{"prod-*", "prod-app", true},
|
||||
{"prod-*", "staging-app", false},
|
||||
{"*-staging", "app-staging", true},
|
||||
{"*-test-*", "app-test-db", true},
|
||||
}
|
||||
// ... test implementation
|
||||
}
|
||||
```
|
||||
|
||||
**Edge cases:**
|
||||
- Empty pattern list
|
||||
- Pattern with no matches
|
||||
- Pattern matching single namespace
|
||||
- Multiple overlapping patterns
|
||||
- Special case lone `*` behavior
|
||||
|
||||
### Integration Tests
|
||||
**Kubernetes cluster scenarios:**
|
||||
- Create namespaces, verify wildcard resolution
|
||||
- Test namespace creation/deletion during backup
|
||||
- Verify thread safety with concurrent backup operations
|
||||
- Error scenarios (API failures, network issues)
|
||||
|
||||
**Concurrency testing:**
|
||||
- Multiple concurrent `ShouldInclude()` calls
|
||||
- Thread safety verification
|
||||
- Cache hit ratio measurement
|
||||
|
||||
## Example Usage
|
||||
|
||||
### CLI Usage
|
||||
```bash
|
||||
# Single wildcard pattern
|
||||
velero backup create prod-backup --include-namespaces "prod-*"
|
||||
|
||||
# Multiple patterns
|
||||
velero backup create env-backup --include-namespaces "prod-*,staging-*,dev-*"
|
||||
|
||||
# Mixed literal and wildcard
|
||||
velero backup create mixed-backup --include-namespaces "prod-*,kube-system,monitoring"
|
||||
|
||||
# Exclude patterns
|
||||
velero backup create no-test --include-namespaces "*" --exclude-namespaces "*-test,*-temp"
|
||||
|
||||
# Preview before creating
|
||||
velero backup create my-backup --include-namespaces "prod-*" --dry-run=wildcards
|
||||
```
|
||||
|
||||
### Backup Specification YAML
|
||||
```yaml
|
||||
apiVersion: velero.io/v1
|
||||
kind: Backup
|
||||
metadata:
|
||||
name: production-backup
|
||||
namespace: velero
|
||||
spec:
|
||||
# Wildcard patterns in includedNamespaces
|
||||
includedNamespaces:
|
||||
- "prod-*" # All namespaces starting with "prod-"
|
||||
- "production-*" # All namespaces starting with "production-"
|
||||
- "critical-app" # Literal namespace (mixed with wildcards)
|
||||
|
||||
# Wildcard patterns in excludedNamespaces
|
||||
excludedNamespaces:
|
||||
- "*-test" # Exclude any test namespaces
|
||||
- "*-temp" # Exclude any temporary namespaces
|
||||
|
||||
# Other backup configuration
|
||||
storageLocation: default
|
||||
volumeSnapshotLocations:
|
||||
- default
|
||||
includeClusterResources: false
|
||||
```
|
||||
|
||||
### Stored Backup Metadata
|
||||
```yaml
|
||||
# What gets stored in backup metadata
|
||||
apiVersion: velero.io/v1
|
||||
kind: Backup
|
||||
metadata:
|
||||
name: production-backup
|
||||
status:
|
||||
# Original user patterns preserved for audit
|
||||
originalIncludePatterns: ["prod-*", "production-*", "critical-app"]
|
||||
originalExcludePatterns: ["*-test", "*-temp"]
|
||||
|
||||
# Resolved concrete namespace lists (used for restore)
|
||||
resolvedIncludedNamespaces: ["prod-app", "prod-db", "production-web", "critical-app"]
|
||||
resolvedExcludedNamespaces: ["prod-app-test", "staging-temp"]
|
||||
|
||||
# Resolution timestamp
|
||||
namespaceResolutionTime: "2024-01-15T10:30:00Z"
|
||||
```
|
||||
3. If required, expand wildcards and replace the request's includes and excludes with expanded namespaces
|
||||
4. Populate the expanded namespace status field with the namespaces.
|
||||
|
||||
## Alternatives Considered
|
||||
|
||||
### CLI-Level Resolution
|
||||
**Problem:** Resolving wildcards during `velero backup create` command
|
||||
|
||||
**Why rejected:**
|
||||
- **Lost User Intent:** Backup specs store resolved lists instead of original patterns
|
||||
- **Audit Trail Issues:** Original wildcard intent not visible when examining backup specifications
|
||||
- **CLI Complexity:** CLI requires cluster access and namespace enumeration capabilities
|
||||
|
||||
### Server-Level (Controller) Resolution
|
||||
**Problem:** Resolving wildcards in backup controller with persistent caching
|
||||
|
||||
**Why rejected:**
|
||||
- **Architectural Complexity:** Requires additional API schema changes for storing resolved namespace lists
|
||||
- **Cache Management:** Need cache invalidation, storage, and lifecycle management
|
||||
- **Limited Benefit:** Performance gain only applies to narrow controller reconciliation retry scenarios
|
||||
- **State Management:** Introduces persistent state maintained across backup lifecycle
|
||||
|
||||
### Request-Level (ShouldInclude) Resolution
|
||||
**Chosen Approach:** Lazy evaluation within backup request processing
|
||||
|
||||
**Benefits:**
|
||||
- **Preserved Intent:** Original wildcard patterns remain in backup specifications
|
||||
- **Optimal Performance:** First resolution (~500ms), subsequent calls (~1ms) with request-scoped caching
|
||||
- **Clean Architecture:** No persistent state, no API schema changes, minimal code changes
|
||||
- **Thread Safety:** Proper mutex usage for concurrent worker access
|
||||
- **Scoped Lifetime:** Cache automatically cleaned up when backup request completes
|
||||
If there are alternative high level or detailed designs that were not pursued they should be called out here with a brief explanation of why they were not pursued.
|
||||
|
||||
## Security Considerations
|
||||
- Implementation requires Velero service account to have `list` permissions on namespace resources
|
||||
- Aligns with existing Velero RBAC requirements
|
||||
- No additional privileges or security surface area introduced
|
||||
|
||||
## Addressing Implementation Concerns
|
||||
|
||||
### Multiple Pattern Support
|
||||
Multiple wildcards work naturally: `--include-namespaces "prod-*,staging-*,dev-*"` - each pattern evaluated independently during lazy resolution.
|
||||
|
||||
### Mixed Literal and Wildcard Detection
|
||||
Simple approach: strings containing `*` are wildcards, others use existing literal namespace logic. Zero breaking changes for existing validation.
|
||||
|
||||
### Include/Exclude Conflict Detection
|
||||
Runtime resolution simplifies conflicts - wildcards resolve to actual namespace lists first, then standard include/exclude precedence applies.
|
||||
|
||||
### Backward Compatibility
|
||||
Lazy evaluation triggers only when wildcards detected. Non-wildcard backups have zero overhead and identical behavior to current implementation.
|
||||
|
||||
## Special Consideration: Existing `*` Behavior
|
||||
|
||||
**Current Velero Behavior:** `--include-namespaces "*"` (the CLI default) means "include all namespaces" and uses special logic that doesn't enumerate namespaces - it simply bypasses namespace filtering entirely.
|
||||
|
||||
**Potential Breaking Change:** Our wildcard implementation would treat `*` as a glob pattern, resolving it to a specific list of namespaces at backup start time, which changes the behavior from "include everything" to "include these specific namespaces."
|
||||
|
||||
**Required Solution:** Special-case handling for the lone `*` pattern to preserve existing behavior by using original `IncludesExcludes` logic instead of wildcard resolution.
|
||||
|
||||
This ensures that `--include-namespaces "*"` continues to work exactly as before, while enabling new wildcard patterns like `prod-*`, `*-staging`, etc.
|
||||
If this proposal has an impact to the security of the product, its users, or data stored or transmitted via the product, they must be addressed here.
|
||||
|
||||
## Compatibility
|
||||
- Full backward compatibility with existing backup specifications using literal namespace lists
|
||||
- No changes required to CLI commands, existing backups, or restore operations
|
||||
A discussion of any compatibility issues that need to be considered
|
||||
|
||||
## Implementation
|
||||
The implementation consists of approximately 200 lines of new code across four files:
|
||||
- `pkg/util/collections/includes_excludes.go`: Core lazy evaluation logic (~150 lines)
|
||||
- `pkg/backup/backup.go`: Wildcard detection logic (~20 lines)
|
||||
- `pkg/backup/request.go`: Interface type usage (~5 lines)
|
||||
- `pkg/backup/item_collector.go`: Compatibility method calls (~25 lines)
|
||||
A description of the implementation, timelines, and any resources that have agreed to contribute.
|
||||
|
||||
## Open Issues
|
||||
None.
|
||||
A discussion of issues relating to this proposal for which the author does not know the solution. This section may be omitted if there are none.
|
||||
|
||||
Reference in New Issue
Block a user