at-container-registry/docs/HOLD_MULTIPART.md

Hold Service Multipart Upload Architecture

Overview

The hold service supports multipart uploads through two modes:

  1. S3Native - Uses S3's native multipart API with presigned URLs (optimal)
  2. Buffered - Buffers parts in hold service memory, assembles on completion (fallback)

This dual-mode approach enables the hold service to work with:

  • S3-compatible storage with presigned URL support (S3, Storj, MinIO, etc.)
  • S3-compatible storage WITHOUT presigned URL support
  • Filesystem storage
  • Any storage driver supported by distribution

Current State

What Works

  • S3 Native Mode with presigned URLs: Fully working! Direct uploads to S3 via presigned URLs
  • Buffered mode with S3: Tested and working with DISABLE_PRESIGNED_URLS=true
  • Filesystem storage: Tested and working! Buffered mode with filesystem driver
  • AppView multipart client: Implements chunked uploads via multipart API
  • MultipartManager: Session tracking, automatic cleanup, thread-safe operations
  • Automatic fallback: Falls back to buffered mode when S3 unavailable or disabled
  • ETag normalization: Handles quoted/unquoted ETags from S3
  • Route handler: /multipart-parts/{uploadID}/{partNumber} endpoint added and tested

All Three Modes Complete! 🎉

All three multipart upload modes are fully implemented, tested, and working in production.

Bugs Fixed 🔧

  • Missing S3 parts in complete: For S3Native mode, parts uploaded directly to S3 weren't being recorded. Fixed by storing parts from request in HandleCompleteMultipart before calling CompleteMultipartUploadWithManager.
  • Malformed XML error from S3: S3 requires ETags to be quoted in CompleteMultipartUpload XML. Added normalizeETag() function to ensure quotes are present.
  • Route missing: /multipart-parts/{uploadID}/{partNumber} not registered in cmd/hold/main.go. Fixed by adding route handler with path parsing.
  • MultipartMgr access: Field was private, preventing route handler access. Fixed by exporting as MultipartMgr.
  • DISABLE_PRESIGNED_URLS not logged: initS3Client() didn't check the flag before initializing. Fixed with early return check and proper logging.
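
The ETag normalization fix above can be sketched as a small helper. The document names `normalizeETag()` but not its body, so this implementation is an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeETag ensures an ETag is wrapped in double quotes, as S3 requires
// in CompleteMultipartUpload XML. The function name comes from the document;
// the body is a sketch.
func normalizeETag(etag string) string {
	etag = strings.TrimSpace(etag)
	if etag == "" {
		return etag
	}
	if !strings.HasPrefix(etag, `"`) {
		etag = `"` + etag
	}
	if !strings.HasSuffix(etag, `"`) || len(etag) < 2 {
		etag = etag + `"`
	}
	return etag
}

func main() {
	fmt.Println(normalizeETag(`d41d8cd98f00b204e9800998ecf8427e`))
	fmt.Println(normalizeETag(`"already-quoted"`)) // left unchanged
}
```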

Architecture

Three Modes of Operation

Mode 1: S3 Native Multipart WORKING

Docker → AppView → Hold → S3 (presigned URLs)
                    ↓
              Returns presigned URL
                    ↓
Docker ──────────→ S3 (direct upload)

Flow:

  1. AppView: POST /start-multipart → Hold starts S3 multipart, returns uploadID
  2. AppView: POST /part-presigned-url → Hold returns S3 presigned URL
  3. Docker → S3: Direct upload via presigned URL
  4. AppView: POST /complete-multipart → Hold calls S3 CompleteMultipartUpload

Advantages:

  • No data flows through hold service
  • Minimal bandwidth usage
  • Fast uploads

Mode 2: S3 Proxy Mode (Buffered) WORKING

Docker → AppView → Hold → S3 (via driver)
                    ↓
              Buffers & proxies
                    ↓
                   S3

Flow:

  1. AppView: POST /start-multipart → Hold creates buffered session
  2. AppView: POST /part-presigned-url → Hold returns proxy URL
  3. Docker → Hold: PUT /multipart-parts/{uploadID}/{part} → Hold buffers
  4. AppView: POST /complete-multipart → Hold uploads to S3 via driver

Use Cases:

  • S3 provider doesn't support presigned URLs
  • S3 API fails to generate presigned URL
  • Fallback from Mode 1

Mode 3: Filesystem Mode WORKING

Docker → AppView → Hold (filesystem driver)
                    ↓
              Buffers & writes
                    ↓
              Local filesystem

Flow: Same as Mode 2, but writes to filesystem driver instead of S3 driver.

Use Cases:

  • Development/testing with local filesystem
  • Small deployments without S3
  • Air-gapped environments

Implementation: pkg/hold/multipart.go

Core Components

MultipartManager

type MultipartManager struct {
    sessions map[string]*MultipartSession
    mu       sync.RWMutex
}

Responsibilities:

  • Track active multipart sessions
  • Clean up abandoned uploads (>24h inactive)
  • Thread-safe session access
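
A minimal sketch of the thread-safe access described above, assuming hypothetical `Put`/`Get`/`Delete` method names (the document only shows the struct):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Minimal stand-in for the session type described later in this document.
type MultipartSession struct {
	UploadID     string
	LastActivity time.Time
}

// MultipartManager guards the session map with a mutex, matching the struct
// shown above. Method names here are illustrative.
type MultipartManager struct {
	sessions map[string]*MultipartSession
	mu       sync.RWMutex
}

func NewMultipartManager() *MultipartManager {
	return &MultipartManager{sessions: make(map[string]*MultipartSession)}
}

// Put registers a session under its upload ID.
func (m *MultipartManager) Put(s *MultipartSession) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.sessions[s.UploadID] = s
}

// Get returns a session and refreshes its activity timestamp, which the
// cleanup pass uses to spot abandoned uploads.
func (m *MultipartManager) Get(uploadID string) (*MultipartSession, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	s, ok := m.sessions[uploadID]
	if ok {
		s.LastActivity = time.Now()
	}
	return s, ok
}

// Delete removes a completed or aborted session.
func (m *MultipartManager) Delete(uploadID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.sessions, uploadID)
}

func main() {
	mgr := NewMultipartManager()
	mgr.Put(&MultipartSession{UploadID: "u1"})
	_, ok := mgr.Get("u1")
	fmt.Println(ok) // true
}
```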

MultipartSession

type MultipartSession struct {
    UploadID     string                  // Unique ID for this upload
    Digest       string                  // Target blob digest
    Mode         MultipartMode           // S3Native or Buffered
    S3UploadID   string                  // S3 upload ID (S3Native only)
    Parts        map[int]*MultipartPart  // Buffered parts (Buffered only)
    CreatedAt    time.Time
    LastActivity time.Time
}

State Tracking:

  • S3Native: Tracks S3 upload ID and part ETags
  • Buffered: Stores part data in memory

MultipartPart

type MultipartPart struct {
    PartNumber int       // Part number (1-indexed)
    Data       []byte    // Part data (Buffered mode only)
    ETag       string    // S3 ETag or computed hash
    Size       int64
}

Key Methods

StartMultipartUploadWithManager

func (s *HoldService) StartMultipartUploadWithManager(
    ctx context.Context,
    digest string,
    manager *MultipartManager,
) (string, MultipartMode, error)

Logic:

  1. Try S3 native multipart via s.startMultipartUpload()
  2. If successful → Create S3Native session
  3. If fails or no S3 client → Create Buffered session
  4. Return uploadID and mode

GetPartUploadURL

func (s *HoldService) GetPartUploadURL(
    ctx context.Context,
    session *MultipartSession,
    partNumber int,
    did string,
) (string, error)

Logic:

  • S3Native mode: Generate S3 presigned URL via s.getPartPresignedURL()
  • Buffered mode: Return proxy endpoint /multipart-parts/{uploadID}/{part}

CompleteMultipartUploadWithManager

func (s *HoldService) CompleteMultipartUploadWithManager(
    ctx context.Context,
    session *MultipartSession,
    manager *MultipartManager,
) error

Logic:

  • S3Native: Call s.completeMultipartUpload() with S3 API
  • Buffered: Assemble parts in order, write via storage driver
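
The buffered branch's "assemble parts in order" step can be sketched as follows; the real method writes through the storage driver rather than returning a byte slice:

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// assembleParts concatenates buffered parts in part-number order, failing if
// any part is missing. Parts are 1-indexed and must be contiguous.
func assembleParts(parts map[int][]byte) ([]byte, error) {
	nums := make([]int, 0, len(parts))
	for n := range parts {
		nums = append(nums, n)
	}
	sort.Ints(nums)
	var buf bytes.Buffer
	for i, n := range nums {
		if n != i+1 { // gap detected: a part was never uploaded
			return nil, fmt.Errorf("missing part %d", i+1)
		}
		buf.Write(parts[n])
	}
	return buf.Bytes(), nil
}

func main() {
	blob, err := assembleParts(map[int][]byte{2: []byte("world"), 1: []byte("hello ")})
	fmt.Println(string(blob), err) // hello world <nil>
}
```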

HandleMultipartPartUpload (New Endpoint)

func (s *HoldService) HandleMultipartPartUpload(
    w http.ResponseWriter,
    r *http.Request,
    uploadID string,
    partNumber int,
    did string,
    manager *MultipartManager,
)

New HTTP endpoint: PUT /multipart-parts/{uploadID}/{partNumber}

Purpose: Receive part uploads in Buffered mode

Logic:

  1. Validate session exists and is in Buffered mode
  2. Authorize write access
  3. Read part data from request body
  4. Store in session with computed ETag (SHA256)
  5. Return ETag in response header

Integration Plan

Phase 1: Migrate to pkg/hold (COMPLETE)

  • Extract code from cmd/hold/main.go to pkg/hold/
  • Create isolated multipart.go implementation
  • Update cmd/hold/main.go to import pkg/hold
  • Test existing functionality works

Phase 2: Add Buffered Mode Support (COMPLETE)

  • Add MultipartManager to HoldService
  • Update handlers to use *WithManager methods
  • Add DISABLE_PRESIGNED_URLS environment variable for testing
  • Implement presigned URL disable checks in all methods
  • Fixed: Record S3 parts from request in HandleCompleteMultipart
  • Fixed: ETag normalization (add quotes for S3 XML)
  • Test S3 native mode with presigned URLs WORKING
  • Add route in cmd/hold/main.go COMPLETE
  • Export MultipartMgr field for route handler access COMPLETE
  • Test DISABLE_PRESIGNED_URLS=true with S3 storage WORKING
  • Test filesystem storage with buffered multipart WORKING

Phase 3: Update AppView

  • Detect hold capabilities (presigned vs proxy)
  • Fallback to buffered mode when presigned fails
  • Handle /multipart-parts/ proxy URLs

Phase 4: Capability Discovery

  • Add capability endpoint: GET /capabilities
  • Return: {"multipart": "native|buffered|both", "storage": "s3|filesystem"}
  • AppView uses capabilities to choose upload strategy

Testing Strategy

Unit Tests

  • MultipartManager session lifecycle
  • Part buffering and assembly
  • Concurrent part uploads (thread safety)
  • Session cleanup (expired uploads)

Integration Tests

S3 Native Mode:

  • Start multipart → get presigned URLs → upload parts → complete WORKING
  • Verify no data flows through hold service (only ~1KB API calls)
  • Test abort cleanup

Buffered Mode (S3 with DISABLE_PRESIGNED_URLS):

  • Start multipart → get proxy URLs → upload parts → complete WORKING
  • Verify parts assembled correctly
  • Test missing part detection
  • Test abort cleanup

Buffered Mode (Filesystem):

  • Start multipart → get proxy URLs → upload parts → complete WORKING
  • Verify parts assembled correctly WORKING
  • Verify blobs written to filesystem WORKING
  • Test missing part detection
  • Test abort cleanup

Load Tests

  • Concurrent multipart uploads (multiple sessions)
  • Large blobs (100MB+, many parts)
  • Memory usage with many buffered parts

Performance Considerations

Memory Usage (Buffered Mode)

  • Parts stored in memory until completion
  • Docker typically uses 5MB chunks (S3 minimum)
  • 100MB image = ~20 parts = ~100MB RAM during upload
  • Multiple concurrent uploads multiply memory usage

Mitigation:

  • Session cleanup (24h timeout)
  • Consider disk-backed buffering for large parts (future optimization)
  • Monitor memory usage and set limits

Network Bandwidth

  • S3Native: Minimal (only API calls)
  • Buffered: Full blob data flows through hold service
  • Filesystem: Always buffered (no presigned URL option)

Configuration

Environment Variables

Current (S3 only):

STORAGE_DRIVER=s3
S3_BUCKET=my-bucket
S3_ENDPOINT=https://s3.amazonaws.com
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

Filesystem:

STORAGE_DRIVER=filesystem
STORAGE_ROOT_DIR=/var/lib/atcr/hold

Automatic Mode Selection

No configuration needed - hold service automatically:

  1. Tries S3 native multipart if S3 client exists
  2. Falls back to buffered mode if S3 unavailable or fails
  3. Always uses buffered mode for filesystem driver

Security Considerations

Authorization

  • All multipart operations require write authorization
  • Buffered mode: Check auth on every part upload
  • S3Native: Auth only on start/complete (presigned URLs have embedded auth)

Resource Limits

  • Max upload size: Controlled by storage backend
  • Max concurrent uploads: Limited by memory
  • Session timeout: 24 hours (configurable)

Attack Vectors

  • Memory exhaustion: Attacker uploads many large parts
    • Mitigation: Session limits, cleanup, auth
  • Incomplete uploads: Attacker starts but never completes
    • Mitigation: 24h timeout, cleanup goroutine
  • Part flooding: Upload many tiny parts
    • Mitigation: S3 enforces a 10,000-part limit; the same cap could be added to buffered mode

Future Enhancements

Disk-Backed Buffering

Instead of memory, buffer parts to temporary disk location:

  • Reduces memory pressure
  • Supports larger uploads
  • Requires cleanup on completion/abort

Parallel Part Assembly

For large uploads, assemble parts in parallel:

  • Stream parts to writer as they arrive
  • Reduce memory footprint
  • Faster completion

Chunked Completion

For very large assembled blobs:

  • Stream to storage driver in chunks
  • Avoid loading entire blob in memory
  • Use io.Copy() with buffer

Multi-Backend Support

  • Azure Blob Storage multipart
  • Google Cloud Storage resumable uploads
  • Backblaze B2 large file API

Implementation Complete

The buffered multipart mode is fully implemented with the following components:

Route Handler (cmd/hold/main.go:47-73):

  • Endpoint: PUT /multipart-parts/{uploadID}/{partNumber}
  • Parses URL path to extract uploadID and partNumber
  • Delegates to service.HandleMultipartPartUpload()
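
The URL parsing step might look like this sketch; the actual code in cmd/hold/main.go is not shown in this document, so details are assumed:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePartPath extracts uploadID and partNumber from a
// /multipart-parts/{uploadID}/{partNumber} path.
func parsePartPath(path string) (uploadID string, partNumber int, err error) {
	rest, ok := strings.CutPrefix(path, "/multipart-parts/")
	if !ok {
		return "", 0, fmt.Errorf("unexpected path %q", path)
	}
	segs := strings.Split(rest, "/")
	if len(segs) != 2 || segs[0] == "" {
		return "", 0, fmt.Errorf("expected {uploadID}/{partNumber}, got %q", rest)
	}
	partNumber, err = strconv.Atoi(segs[1])
	if err != nil || partNumber < 1 { // part numbers are 1-indexed
		return "", 0, fmt.Errorf("invalid part number %q", segs[1])
	}
	return segs[0], partNumber, nil
}

func main() {
	id, part, err := parsePartPath("/multipart-parts/u-123/7")
	fmt.Println(id, part, err) // u-123 7 <nil>
}
```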

Exported Manager (pkg/hold/service.go:20):

  • Field MultipartMgr is now exported for route handler access
  • All handlers updated to use s.MultipartMgr

Configuration Check (pkg/hold/s3.go:20-25):

  • initS3Client() checks DISABLE_PRESIGNED_URLS flag before initializing
  • Logs clear message when presigned URLs are disabled
  • Prevents misleading "S3 presigned URLs enabled" message

Testing Multipart Modes

Test 1: S3 Native Mode (presigned URLs) TESTED

export STORAGE_DRIVER=s3
export S3_BUCKET=your-bucket
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
# Do NOT set DISABLE_PRESIGNED_URLS

# Start hold service
./bin/atcr-hold

# Push an image
docker push atcr.io/yourdid/test:latest

# Expected logs:
# "✅ S3 presigned URLs enabled"
# "Started S3 native multipart: uploadID=... s3UploadID=..."
# "Completed multipart upload: digest=... uploadID=... parts=..."

Status: Working - Direct uploads to S3, minimal bandwidth through hold service

Test 2: Buffered Mode with S3 (forced proxy) TESTED

export STORAGE_DRIVER=s3
export S3_BUCKET=your-bucket
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export DISABLE_PRESIGNED_URLS=true  # Force buffered mode

# Start hold service
./bin/atcr-hold

# Push an image
docker push atcr.io/yourdid/test:latest

# Expected logs:
# "⚠️  S3 presigned URLs DISABLED by config (DISABLE_PRESIGNED_URLS=true)"
# "Presigned URLs disabled (DISABLE_PRESIGNED_URLS=true), using buffered mode"
# "Stored part: uploadID=... part=1 size=..."
# "Assembled buffered parts: uploadID=... parts=... totalSize=..."
# "Completed buffered multipart: uploadID=... size=... written=..."

Status: Working - Parts buffered in hold service memory, assembled and written to S3 via driver

Test 3: Filesystem Mode (always buffered) TESTED

export STORAGE_DRIVER=filesystem
export STORAGE_ROOT_DIR=/tmp/atcr-hold-test
# DISABLE_PRESIGNED_URLS not needed (filesystem never has presigned URLs)

# Start hold service
./bin/atcr-hold

# Push an image
docker push atcr.io/yourdid/test:latest

# Expected logs:
# "Storage driver is filesystem (not S3), presigned URLs disabled"
# "Started buffered multipart: uploadID=..."
# "Stored part: uploadID=... part=1 size=..."
# "Assembled buffered parts: uploadID=... parts=... totalSize=..."
# "Completed buffered multipart: uploadID=... size=... written=..."

# Verify blobs written under the configured STORAGE_ROOT_DIR:
ls -lh /tmp/atcr-hold-test/docker/registry/v2/blobs/sha256/
# Or, for a containerized deployment with STORAGE_ROOT_DIR=/var/lib/atcr/hold:
docker exec atcr-hold ls -lh /var/lib/atcr/hold/docker/registry/v2/blobs/sha256/

Status: Working - Parts buffered in memory, assembled, and written to filesystem via driver

Note: Initial HEAD requests will show "Path not found" errors - this is normal! Docker checks if blobs exist before uploading. The errors occur for blobs that haven't been uploaded yet. After upload, subsequent HEAD checks succeed.
