New design doc

Signed-off-by: Joseph <jvaikath@redhat.com>
2026-01-05 13:05:17 +00:00 · 2025-08-01 13:25:44 -04:00
parent 69e307918b
commit c0699c443b
1 changed files with 48 additions and 376 deletions
--- a/design/wildcard-namespace-support-design.md
+++ b/design/wildcard-namespace-support-design.md
@@ -1,407 +1,79 @@
-# Wildcard Namespace Support Design
+
+# Wildcard namespace includes/excludes support for backups and restores

 ## Abstract
-This proposal introduces wildcard pattern support for namespace inclusion and exclusion in Velero backups (e.g., `prod-*`, `*-staging`).
-The implementation uses lazy evaluation within the existing `ShouldInclude()` method to resolve wildcards on-demand with request-scoped caching.
-Based on [Issue #1874](https://github.com/vmware-tanzu/velero/issues/1874).
+Velero does not currently support wildcard characters in the namespace include/exclude spec.
+Every entry is expected to be a literal namespace name.
+
+The one notable exception is a lone "*", which acts as an "include all namespaces, ignore excludes" option.
+Internally, Velero treats an empty include list the same as "*".
+
+This document details an approach to implementing wildcard namespaces while preserving the existing meaning of the lone "*" for backward compatibility.
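+
+To make the distinction concrete, the sketch below separates the legacy include-all case from entries that would need expansion. It is illustrative only: the function names are hypothetical, and treating `*`, `?` and `[` as wildcard characters is an assumption borrowed from Go's `path.Match` syntax, since this proposal does not yet pin down the accepted syntax.
+
+```go
+package main
+
+import "strings"
+
+// isLegacyIncludeAll reports whether an include list selects every namespace
+// under the current rules as described above: an empty list, or a list with a
+// lone "*" entry. (Hypothetical helper, simplified for illustration.)
+func isLegacyIncludeAll(includes []string) bool {
+	if len(includes) == 0 {
+		return true
+	}
+	for _, ns := range includes {
+		if ns == "*" {
+			return true
+		}
+	}
+	return false
+}
+
+// isWildcardPattern reports whether an entry would need expansion. The lone
+// "*" is deliberately excluded so that it keeps its legacy meaning. The set of
+// metacharacters is an assumption based on Go's path.Match syntax.
+func isWildcardPattern(ns string) bool {
+	return ns != "*" && strings.ContainsAny(ns, "*?[")
+}
+```
+
+With these helpers, `prod-*` is a pattern to expand, while a lone `*` (or an empty list) keeps today's include-all behavior and `kube-system` stays a plain literal.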

 ## Background
- Currently, Velero users must explicitly list each namespace for backup operations
- In environments with many namespaces following naming conventions (e.g., `prod-app`, `prod-db`, `prod-cache`), this becomes:
-  - Cumbersome to maintain
-  - Error-prone to manage
- Users have requested wildcard support to enable patterns like `--include-namespaces "prod-*"`
+This was requested in issue [#1874](https://github.com/vmware-tanzu/velero/issues/1874).
+

 ## Goals
- Enable wildcard pattern support for namespace includes and excludes in Velero backup specifications
- Maintain optimal performance with lazy evaluation and request-scoped caching
- Preserve original wildcard patterns in backup specifications for audit and readability purposes
+- Support wildcard patterns in `--include-namespaces` and `--exclude-namespaces`
+- Ensure the legacy behavior of a lone "*" is unaffected

 ## Non Goals
- Support for complex regex patterns beyond basic glob-style wildcards (`*`)
- Persistent caching of namespace resolution across backup requests
- Real-time namespace discovery that changes during backup execution
+- Redesigning how "*" is treated, including making it work with wildcard excludes.
+

 ## High-Level Design

-**Core Approach:** We're making the existing concrete type (`*IncludesExcludes`) polymorphic so we can substitute our new lazy evaluation type (`*LazyNamespaceIncludesExcludes`) without changing any calling code.
+The changes center on two functions in `velero/pkg/backup/item_collector.go`:
+- [collectNamespaces](https://github.com/vmware-tanzu/velero/blob/1535afb45e33a3d3820088e4189800a21ba55293/pkg/backup/item_collector.go#L742)
+- [getNamespacesToList](https://github.com/vmware-tanzu/velero/blob/1535afb45e33a3d3820088e4189800a21ba55293/pkg/backup/item_collector.go#L638)

- Implementation at **backup request level** within the `ShouldInclude()` method
- Uses lazy evaluation with `LazyNamespaceIncludesExcludes` wrapper
- On-demand namespace resolution with thread-safe caching
- First call triggers Kubernetes API namespace enumeration and wildcard resolution
- Results cached for subsequent calls within the same backup request
+`collectNamespaces` lists all active namespaces and matches them against the user-specified included namespaces (`r.backupRequest.Backup.Spec.IncludedNamespaces`).
+This makes it a natural place to perform wildcard expansion.
+Just as with "*", entries containing wildcard characters would be passed through without the existence check applied to literal namespace names.
+The resolved namespaces are stored in new status fields on the Backup.
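+
+A rough sketch of what the matching step could look like, assuming the caller already has the names of all active namespaces in hand. `matchActiveNamespaces` is a hypothetical helper, not existing Velero code, and `path.Match` from the Go standard library is used as the glob matcher by assumption (namespace names cannot contain `/`, so its path-separator rule does not get in the way).
+
+```go
+package main
+
+import (
+	"path"
+	"sort"
+	"strings"
+)
+
+// matchActiveNamespaces resolves an include spec against active namespace
+// names. Literal entries keep an existence check; entries containing glob
+// metacharacters (including the lone "*") skip it and are matched instead.
+func matchActiveNamespaces(spec, active []string) (matched, missing []string, err error) {
+	exists := make(map[string]bool, len(active))
+	for _, ns := range active {
+		exists[ns] = true
+	}
+
+	seen := make(map[string]bool)
+	add := func(ns string) {
+		if !seen[ns] { // de-duplicate overlapping patterns and literals
+			seen[ns] = true
+			matched = append(matched, ns)
+		}
+	}
+
+	for _, entry := range spec {
+		if !strings.ContainsAny(entry, "*?[") {
+			// Literal namespace name: it must exist in the cluster.
+			if exists[entry] {
+				add(entry)
+			} else {
+				missing = append(missing, entry)
+			}
+			continue
+		}
+		// Wildcard entry: no existence check, match against every active name.
+		for _, ns := range active {
+			ok, matchErr := path.Match(entry, ns)
+			if matchErr != nil {
+				return nil, nil, matchErr // malformed pattern, e.g. "prod-["
+			}
+			if ok {
+				add(ns)
+			}
+		}
+	}
+
+	sort.Strings(matched)
+	return matched, missing, nil
+}
+```
+
+For example, with active namespaces `prod-app`, `prod-db` and `kube-system`, the spec `["prod-*", "kube-system", "gone"]` yields `kube-system`, `prod-app`, `prod-db` as matches and reports `gone` as missing. How a missing literal is surfaced would stay whatever the existing existence check does today.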

 ## Detailed Design

-### Polymorphic Interface Approach

-The key insight is that all existing backup code already calls the same 4 methods on namespace filtering:
- `ShouldInclude(namespace string) bool` - Core filtering logic
- `IncludesString() string` - Logging display
- `ExcludesString() string` - Logging display  
- `IncludeEverything() bool` - Optimization checks

-By creating a `NamespaceIncludesExcludesInterface` with these methods, we can:
-1. **Standard case**: Use existing `*IncludesExcludes` (no wildcards)
-2. **Wildcard case**: Use new `*LazyNamespaceIncludesExcludes` (with K8s API enumeration)
-
-**No calling code changes needed** - the interface abstraction handles everything.
-
-**Cache Scope:** Single backup request only - automatic cleanup when request completes.
-
-### Implementation Strategy
-
-**Location:** `pkg/util/collections/includes_excludes.go`
- New interface defining the 4 required methods
- `LazyNamespaceIncludesExcludes` struct embedding `*IncludesExcludes` for fallback
- Lazy resolution with thread-safe caching using mutex
- Special case handling for lone `*` to preserve existing efficient behavior
-
-**Integration:** `pkg/backup/backup.go`  
- Wildcard detection logic determines which implementation to return
- Lone `*` pattern → standard `IncludesExcludes` (preserve current behavior)
- Any other wildcards → lazy `LazyNamespaceIncludesExcludes`
-
-**Type Updates:** Change struct fields from concrete `*IncludesExcludes` to interface type
- `pkg/backup/request.go` - Request struct field type
- `pkg/backup/item_collector.go` - Function parameter types
-
-### Performance Characteristics
- **First `ShouldInclude()` call:** ~500ms (K8s API namespace enumeration + wildcard resolution)
- **Subsequent calls:** ~1ms (cached lookup with read lock)
- **Memory overhead:** Minimal (resolved namespace list stored once per backup request)
- **Concurrency:** Full concurrent read access to cached results
-
-## Namespace Discovery Timing and Behavior
-
-### Snapshot Timing
-**Wildcard patterns are resolved at backup start time** and remain fixed for the entire backup duration. This provides:
- **Consistent behavior**: All resources in a backup come from the same namespace set
- **Predictable results**: Backup contents don't change mid-execution
- **Performance**: No repeated namespace enumeration during backup processing
-
-### Runtime Namespace Changes
-When namespaces are created or deleted during backup execution:
-
-**Newly Created Namespaces:**
- If `prod-new` is created after backup starts, it will **NOT** be included even if it matches `prod-*`
- The resolved namespace list is fixed at backup start time
-
-**Deleted Namespaces:**
- If a namespace matching the pattern is deleted during backup, the backup continues
- Resources already processed from that namespace remain in the backup
- Subsequent resource enumeration for that namespace may result in "not found" errors (handled gracefully)
-  - This should ideally fail so  that the user can re-run it without a namespace being deleted while a backup is started which is rare.
-
-**User Expectations:**
-This behavior should be explicitly documented with examples:
-```
-# At backup start: namespaces [prod-app, prod-db] exist
-velero backup create --include-namespaces "prod-*"
-
-# During backup: prod-cache namespace is created
-# Result: prod-cache is NOT included in this backup
-# Recommendation: Run another backup to capture newly created namespaces
+1. Add new status fields to the backup CRD to store expanded wildcard namespaces
-```

-## Pattern Complexity and Validation
-
-### Supported Patterns
-**Basic Wildcard Support (`*` only):**
- `prefix-*` - Matches namespaces starting with "prefix-"
- `*-suffix` - Matches namespaces ending with "-suffix"
- `*-middle-*` - Matches namespaces containing "-middle-"
- `*` - Special case: matches all namespaces (preserves current behavior)
-
-### Unsupported Patterns
-**Not supported in initial implementation:**
- `?` for single character matching (e.g., `prod-?-app`)
- Character classes (e.g., `prod-[abc]-app`)
- Regex patterns (e.g., `prod-\d+-app`)
-
-### Pattern Validation
-**Creation-time validation:**
- Invalid patterns containing unsupported characters will be rejected at backup creation
- Validation occurs in CLI and API server admission controller
- Clear error messages guide users to supported patterns
-
-**Example validation errors:**
-```bash
-# Unsupported pattern
-velero backup create --include-namespaces "prod-?-app"
-# Error: Pattern 'prod-?-app' contains unsupported character '?'. Only '*' wildcards are supported.
-
-# Valid patterns
-velero backup create --include-namespaces "prod-*,*-staging"
-# Success: Patterns validated successfully
-```
+2. Create a util package for wildcard expansion

-## Error Handling
-
-### Kubernetes API Failures
-**Namespace enumeration failures:**
- If initial namespace list API call fails → backup fails with clear error message
- Transient failures are retried using standard Kubernetes client retry logic
- No fallback to cached/partial data to ensure consistent behavior
-
-**Error response example:**
-```
-Error: Failed to enumerate namespaces for wildcard resolution: unable to connect to Kubernetes API
-Backup creation aborted. Please verify cluster connectivity and try again.
-```
-
-### Zero Namespace Matches
-**When wildcard patterns match no namespaces:**
- **Behavior**: Warning logged, backup proceeds with empty namespace set
- **User notification**: Warning in backup status and logs
- **Rationale**: Allows for valid scenarios (e.g., temporary namespace absence)
-
-**Warning example:**
-```
-Warning: Wildcard pattern 'prod-*' matched 0 namespaces. Backup will include no namespaces from this pattern.
-```
-
-### Dry-Run Support
-**Preview functionality:**
-```bash
-# New flag to preview wildcard resolution
-velero backup create my-backup --include-namespaces "prod-*" --dry-run=wildcards
-
-# Output:
-Wildcard pattern 'prod-*' would include namespaces: [prod-app, prod-db, prod-cache]
-Wildcard pattern '*-staging' would include namespaces: [app-staging, db-staging]
-Total namespaces: 5
-```
-
-## Restore Operations
-
-### Wildcard Behavior During Restore
-**Restore uses namespaces captured at backup time:**
- Wildcard patterns in backup specs are **not** re-evaluated during restore
- Restore operates on the concrete namespace list that was resolved during backup
- This ensures restore consistency even if cluster namespace state has changed
-
-**Implementation approach:**
-1. **Backup metadata storage**: Store both original patterns and resolved namespace lists
-2. **Restore processing**: Use resolved namespace lists, ignore original patterns
-3. **Audit trail**: Both patterns and resolved lists visible in backup metadata
-
-**Example scenario:**
-```yaml
-# Original backup spec
-includedNamespaces: ["prod-*"]
-
-# Stored in backup metadata
-resolvedNamespaces: ["prod-app", "prod-db"] 
-originalPatterns: ["prod-*"]
-
-# During restore (even if prod-cache now exists)
-# Only prod-app and prod-db are restored
-```
-
-### Disaster Recovery Scenarios
-**Cross-cluster restore behavior:**
- Restore attempts to create resources in target namespaces
- If target namespaces don't exist, Velero creates them (existing behavior)
- Wildcard patterns are not re-evaluated against target cluster
-
-## Scheduled Backups
-
-### Namespace State Changes Between Runs
-**Each scheduled backup run performs fresh wildcard resolution:**
- Pattern `prod-*` may include different namespaces in each backup run
- This allows scheduled backups to automatically capture newly created namespaces
- **Trade-off**: Backup contents may vary between runs vs. automatic inclusion of new resources
-
-**Storage implications:**
- Varying namespace sets between runs may affect deduplication efficiency
- Each backup stores its own resolved namespace list independently
-
-**Example behavior:**
-```
-# Monday backup: prod-* matches [prod-app, prod-db]
-# Tuesday: prod-cache namespace created
-# Tuesday backup: prod-* matches [prod-app, prod-db, prod-cache]
-```
-
-**User expectations:**
- Document that scheduled backups automatically include newly matching namespaces
- Provide guidance on namespace naming conventions for predictable backup behavior
-
-## Testing Strategy
-
-### Unit Tests
-**Pattern matching tests:**
-```go
-func TestWildcardPatterns(t *testing.T) {
-    tests := []struct {
-        pattern   string
-        namespace string
-        expected  bool
-    }{
-        {"prod-*", "prod-app", true},
-        {"prod-*", "staging-app", false},
-        {"*-staging", "app-staging", true},
-        {"*-test-*", "app-test-db", true},
-    }
-    // ... test implementation
-}
-```
-
-**Edge cases:**
- Empty pattern list
- Pattern with no matches
- Pattern matching single namespace
- Multiple overlapping patterns
- Special case lone `*` behavior
-
-### Integration Tests
-**Kubernetes cluster scenarios:**
- Create namespaces, verify wildcard resolution
- Test namespace creation/deletion during backup
- Verify thread safety with concurrent backup operations
- Error scenarios (API failures, network issues)
-
-**Concurrency testing:**
- Multiple concurrent `ShouldInclude()` calls
- Thread safety verification
- Cache hit ratio measurement
-
-## Example Usage
-
-### CLI Usage
-```bash
-# Single wildcard pattern
-velero backup create prod-backup --include-namespaces "prod-*"
-
-# Multiple patterns
-velero backup create env-backup --include-namespaces "prod-*,staging-*,dev-*"
-
-# Mixed literal and wildcard
-velero backup create mixed-backup --include-namespaces "prod-*,kube-system,monitoring"
-
-# Exclude patterns
-velero backup create no-test --include-namespaces "*" --exclude-namespaces "*-test,*-temp"
-
-# Preview before creating
-velero backup create my-backup --include-namespaces "prod-*" --dry-run=wildcards
-```
-
-### Backup Specification YAML
-```yaml
-apiVersion: velero.io/v1
-kind: Backup
-metadata:
-  name: production-backup
-  namespace: velero
-spec:
-  # Wildcard patterns in includedNamespaces
-  includedNamespaces:
-  - "prod-*"           # All namespaces starting with "prod-"
-  - "production-*"     # All namespaces starting with "production-"
-  - "critical-app"     # Literal namespace (mixed with wildcards)
-  
-  # Wildcard patterns in excludedNamespaces  
-  excludedNamespaces:
-  - "*-test"           # Exclude any test namespaces
-  - "*-temp"           # Exclude any temporary namespaces
-  
-  # Other backup configuration
-  storageLocation: default
-  volumeSnapshotLocations:
-  - default
-  includeClusterResources: false
-```
-
-### Stored Backup Metadata
-```yaml
-# What gets stored in backup metadata
-apiVersion: velero.io/v1
-kind: Backup
-metadata:
-  name: production-backup
-status:
-  # Original user patterns preserved for audit
-  originalIncludePatterns: ["prod-*", "production-*", "critical-app"]
-  originalExcludePatterns: ["*-test", "*-temp"]
-  
-  # Resolved concrete namespace lists (used for restore)
-  resolvedIncludedNamespaces: ["prod-app", "prod-db", "production-web", "critical-app"]
-  resolvedExcludedNamespaces: ["prod-app-test", "staging-temp"]
-  
-  # Resolution timestamp
-  namespaceResolutionTime: "2024-01-15T10:30:00Z"
-```
+3. If the spec contains wildcard patterns, expand them and replace the request's includes and excludes with the expanded namespaces
+4. Populate the new status fields with the expanded namespaces
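+
+The Go sketch below walks through these steps under a few assumptions: `WildcardStatus` and its field names are hypothetical placeholders for the new status fields from step 1, `expandList` stands in for the util package from step 2, glob matching uses `path.Match`, and a lone `*` is passed through untouched. It also shows one edge case step 3 has to handle: because Velero treats an empty include list as `*`, a wildcard that matches no namespaces must not be replaced with an empty list. Returning an error is only one possible choice; this document does not settle it yet.
+
+```go
+package main
+
+import (
+	"errors"
+	"path"
+	"sort"
+	"strings"
+)
+
+// WildcardStatus is a hypothetical stand-in for the new Backup status fields;
+// the real field names would be agreed on in this document.
+type WildcardStatus struct {
+	ExpandedIncludedNamespaces []string
+	ExpandedExcludedNamespaces []string
+}
+
+var errNoMatches = errors.New("wildcard include patterns matched no namespaces")
+
+// isPattern reports whether an entry needs expansion; the lone "*" keeps its
+// legacy include-all meaning and is never expanded.
+func isPattern(e string) bool {
+	return e != "*" && strings.ContainsAny(e, "*?[")
+}
+
+// expandList replaces pattern entries with the active namespaces they match and
+// passes literal names and "*" through. Lists without patterns are returned
+// unchanged, which is the "if the spec contains wildcard patterns" check.
+func expandList(entries, active []string) ([]string, error) {
+	needed := false
+	for _, e := range entries {
+		if isPattern(e) {
+			needed = true
+			break
+		}
+	}
+	if !needed {
+		return entries, nil
+	}
+	seen := make(map[string]bool)
+	var out []string
+	keep := func(ns string) {
+		if !seen[ns] {
+			seen[ns] = true
+			out = append(out, ns)
+		}
+	}
+	for _, e := range entries {
+		if !isPattern(e) {
+			keep(e)
+			continue
+		}
+		for _, ns := range active {
+			ok, err := path.Match(e, ns)
+			if err != nil {
+				return nil, err
+			}
+			if ok {
+				keep(ns)
+			}
+		}
+	}
+	sort.Strings(out)
+	return out, nil
+}
+
+// resolveWildcards computes the replacement includes/excludes for the request
+// and the values to record in the status fields.
+func resolveWildcards(includes, excludes, active []string) ([]string, []string, WildcardStatus, error) {
+	inc, err := expandList(includes, active)
+	if err != nil {
+		return nil, nil, WildcardStatus{}, err
+	}
+	// An empty include list means "*" to Velero, so patterns that match
+	// nothing must not silently become a back-up-everything request.
+	if len(includes) > 0 && len(inc) == 0 {
+		return nil, nil, WildcardStatus{}, errNoMatches
+	}
+	exc, err := expandList(excludes, active)
+	if err != nil {
+		return nil, nil, WildcardStatus{}, err
+	}
+	status := WildcardStatus{ExpandedIncludedNamespaces: inc, ExpandedExcludedNamespaces: exc}
+	return inc, exc, status, nil
+}
+```
+
+With active namespaces `prod-app`, `prod-db`, `prod-app-test` and `staging-app`, includes `["prod-*"]` and excludes `["*-test"]` expand to `prod-app`, `prod-app-test`, `prod-db` and `prod-app-test` respectively, and both lists are recorded in the status. Expanded excludes that end up empty are harmless, since an empty exclude list excludes nothing.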

 ## Alternatives Considered
-
-### CLI-Level Resolution
-**Problem:** Resolving wildcards during `velero backup create` command
-
-**Why rejected:**
- **Lost User Intent:** Backup specs store resolved lists instead of original patterns
- **Audit Trail Issues:** Original wildcard intent not visible when examining backup specifications
- **CLI Complexity:** CLI requires cluster access and namespace enumeration capabilities
-
-### Server-Level (Controller) Resolution  
-**Problem:** Resolving wildcards in backup controller with persistent caching
-
-**Why rejected:**
- **Architectural Complexity:** Requires additional API schema changes for storing resolved namespace lists
- **Cache Management:** Need cache invalidation, storage, and lifecycle management
- **Limited Benefit:** Performance gain only applies to narrow controller reconciliation retry scenarios
- **State Management:** Introduces persistent state maintained across backup lifecycle
-
-### Request-Level (ShouldInclude) Resolution
-**Chosen Approach:** Lazy evaluation within backup request processing
-
-**Benefits:**
- **Preserved Intent:** Original wildcard patterns remain in backup specifications
- **Optimal Performance:** First resolution (~500ms), subsequent calls (~1ms) with request-scoped caching
- **Clean Architecture:** No persistent state, no API schema changes, minimal code changes
- **Thread Safety:** Proper mutex usage for concurrent worker access
- **Scoped Lifetime:** Cache automatically cleaned up when backup request completes
+If there are alternative high level or detailed designs that were not pursued they should be called out here with a brief explanation of why they were not pursued.

 ## Security Considerations
- Implementation requires Velero service account to have `list` permissions on namespace resources
- Aligns with existing Velero RBAC requirements
- No additional privileges or security surface area introduced
-
-## Addressing Implementation Concerns
-
-### Multiple Pattern Support
-Multiple wildcards work naturally: `--include-namespaces "prod-*,staging-*,dev-*"` - each pattern evaluated independently during lazy resolution.
-
-### Mixed Literal and Wildcard Detection
-Simple approach: strings containing `*` are wildcards, others use existing literal namespace logic. Zero breaking changes for existing validation.
-
-### Include/Exclude Conflict Detection
-Runtime resolution simplifies conflicts - wildcards resolve to actual namespace lists first, then standard include/exclude precedence applies.
-
-### Backward Compatibility
-Lazy evaluation triggers only when wildcards detected. Non-wildcard backups have zero overhead and identical behavior to current implementation.
-
-## Special Consideration: Existing `*` Behavior
-
-**Current Velero Behavior:** `--include-namespaces "*"` (the CLI default) means "include all namespaces" and uses special logic that doesn't enumerate namespaces - it simply bypasses namespace filtering entirely.
-
-**Potential Breaking Change:** Our wildcard implementation would treat `*` as a glob pattern, resolving it to a specific list of namespaces at backup start time, which changes the behavior from "include everything" to "include these specific namespaces."
-
-**Required Solution:** Special-case handling for the lone `*` pattern to preserve existing behavior by using original `IncludesExcludes` logic instead of wildcard resolution.
-
-This ensures that `--include-namespaces "*"` continues to work exactly as before, while enabling new wildcard patterns like `prod-*`, `*-staging`, etc.
+If this proposal has an impact to the security of the product, its users, or data stored or transmitted via the product, they must be addressed here.

 ## Compatibility
- Full backward compatibility with existing backup specifications using literal namespace lists
- No changes required to CLI commands, existing backups, or restore operations
+A discussion of any compatibility issues that need to be considered

 ## Implementation
-The implementation consists of approximately 200 lines of new code across four files:
- `pkg/util/collections/includes_excludes.go`: Core lazy evaluation logic (~150 lines)
- `pkg/backup/backup.go`: Wildcard detection logic (~20 lines)  
- `pkg/backup/request.go`: Interface type usage (~5 lines)
- `pkg/backup/item_collector.go`: Compatibility method calls (~25 lines)
+A description of the implementation, timelines, and any resources that have agreed to contribute.

 ## Open Issues
-None. 
+A discussion of issues relating to this proposal for which the author does not know the solution. This section may be omitted if there are none.