HTTP Response Compression Best Practices in RustFS

Overview

This document outlines best practices for HTTP response compression in RustFS, based on lessons learned from fixing the NoSuchKey error response regression (Issue #901) and the whitelist-based compression redesign (Issue #902).

Whitelist-Based Compression (Issue #902)

Design Philosophy

After Issue #901, we identified that the blacklist approach (compress everything except known problematic types) was still causing issues with browser downloads showing "unknown file size". In Issue #902, we redesigned the compression system using a whitelist approach aligned with MinIO's behavior:

  1. Compression is disabled by default - Opt-in rather than opt-out
  2. Only explicitly configured content types are compressed - Preserves Content-Length for all other responses
  3. Fine-grained configuration - Control via file extensions, MIME types, and size thresholds
  4. Skip already-encoded content - Avoid double compression

Configuration Options

RustFS provides flexible compression configuration via environment variables and command-line arguments:

| Environment Variable | CLI Argument | Default | Description |
| --- | --- | --- | --- |
| RUSTFS_COMPRESS_ENABLE | --compress-enable | false | Enable/disable compression |
| RUSTFS_COMPRESS_EXTENSIONS | --compress-extensions | "" | File extensions to compress (e.g., .txt,.log,.csv) |
| RUSTFS_COMPRESS_MIME_TYPES | --compress-mime-types | text/*,application/json,... | MIME types to compress (supports wildcards) |
| RUSTFS_COMPRESS_MIN_SIZE | --compress-min-size | 1000 | Minimum file size (bytes) for compression |
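
For orientation, here is a minimal sketch of how these settings might be gathered into the CompressionConfig struct described under Architecture below. The field names, the from_env helper, and the parsing details are assumptions for illustration; the actual definitions live in rustfs/src/server/compress.rs and crates/config/src/constants/compress.rs.

// Illustrative sketch only: field names and the env-parsing helper are
// assumptions, not the exact RustFS definitions.
#[derive(Debug, Clone)]
pub struct CompressionConfig {
    pub enabled: bool,
    pub extensions: Vec<String>, // e.g. [".txt", ".log", ".csv"]
    pub mime_types: Vec<String>, // e.g. ["text/*", "application/json"]
    pub min_size: u64,           // minimum body size in bytes
}

impl CompressionConfig {
    /// Builds a config from the documented environment variables,
    /// falling back to the documented defaults where parsing fails.
    pub fn from_env() -> Self {
        // Split a comma-separated env var into a trimmed list of entries
        let list = |key: &str| -> Vec<String> {
            std::env::var(key)
                .unwrap_or_default()
                .split(',')
                .map(|s| s.trim().to_string())
                .filter(|s| !s.is_empty())
                .collect()
        };
        Self {
            enabled: matches!(
                std::env::var("RUSTFS_COMPRESS_ENABLE").as_deref(),
                Ok("on") | Ok("true") | Ok("1")
            ),
            extensions: list("RUSTFS_COMPRESS_EXTENSIONS"),
            // The real default MIME list is a non-empty text/* set (see table above);
            // it is elided here.
            mime_types: list("RUSTFS_COMPRESS_MIME_TYPES"),
            min_size: std::env::var("RUSTFS_COMPRESS_MIN_SIZE")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(1000),
        }
    }
}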

Usage Examples

# Enable compression for text files and JSON
RUSTFS_COMPRESS_ENABLE=on \
RUSTFS_COMPRESS_EXTENSIONS=.txt,.log,.csv,.json,.xml \
RUSTFS_COMPRESS_MIME_TYPES=text/*,application/json,application/xml \
RUSTFS_COMPRESS_MIN_SIZE=1000 \
rustfs /data

# Or using command-line arguments
rustfs /data \
  --compress-enable \
  --compress-extensions ".txt,.log,.csv" \
  --compress-mime-types "text/*,application/json" \
  --compress-min-size 1000

Implementation Details

The CompressionPredicate implements intelligent compression decisions:

impl Predicate for CompressionPredicate {
    fn should_compress<B>(&self, response: &Response<B>) -> bool {
        // 1. Check if compression is enabled
        if !self.config.enabled { return false; }

        // 2. Never compress error responses
        let status = response.status();
        if status.is_client_error() || status.is_server_error() { return false; }

        // 3. Skip already-encoded content (gzip, br, deflate, etc.)
        if has_content_encoding(response) { return false; }

        // 4. Check minimum size threshold
        if content_length(response) < self.config.min_size { return false; }

        // 5. Check whitelist: extension OR MIME type must match
        if matches_extension(response) || matches_mime_type(response) {
            return true;
        }

        // 6. Default: don't compress (whitelist approach)
        false
    }
}
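
The matches_mime_type check above is what makes wildcard entries such as text/* work (see the configuration table). Below is a minimal, self-contained sketch of that kind of matching; it is illustrative and not the actual RustFS helper.

// Illustrative wildcard matcher: "text/*" matches "text/plain; charset=utf-8".
// Not the actual RustFS helper, just the general technique.
fn mime_matches(pattern: &str, content_type: &str) -> bool {
    // Drop parameters such as "; charset=utf-8" and compare case-insensitively
    let content_type = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    let pattern = pattern.trim().to_ascii_lowercase();

    if let Some(prefix) = pattern.strip_suffix("/*") {
        // Wildcard: compare only the main type ("text", "application", ...)
        content_type
            .split('/')
            .next()
            .map(|main_type| main_type == prefix)
            .unwrap_or(false)
    } else {
        // Exact match otherwise
        content_type == pattern
    }
}

#[test]
fn test_mime_wildcards() {
    assert!(mime_matches("text/*", "text/plain; charset=utf-8"));
    assert!(mime_matches("application/json", "application/json"));
    assert!(!mime_matches("text/*", "application/octet-stream"));
}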

Benefits of Whitelist Approach

| Aspect | Blacklist (Old) | Whitelist (New) |
| --- | --- | --- |
| Default behavior | Compress most content | No compression |
| Content-Length | Often removed | Preserved for unmatched types |
| Browser downloads | "Unknown file size" | Accurate file size shown |
| Configuration | Complex exclusion rules | Simple inclusion rules |
| MinIO compatibility | Different behavior | Aligned behavior |

Key Principles

1. Never Compress Error Responses

Rationale: Error responses are typically small (100-500 bytes) and need to be transmitted accurately. Compression can:

  • Introduce Content-Length header mismatches
  • Add unnecessary overhead for small payloads
  • Potentially corrupt error details during buffering

Implementation:

// Always check status code first
if status.is_client_error() || status.is_server_error() {
    return false; // Don't compress
}

Affected Status Codes:

  • 4xx Client Errors (400, 403, 404, etc.)
  • 5xx Server Errors (500, 502, 503, etc.)

2. Size-Based Compression Threshold

Rationale: Compression costs CPU time and adds buffering latency. For very small responses:

  • Compression overhead > space savings
  • May actually increase payload size
  • Adds latency without benefit

Recommended Threshold: 1000 bytes minimum (configurable via RUSTFS_COMPRESS_MIN_SIZE)

Implementation:

if let Some(length) = response
    .headers()
    .get(CONTENT_LENGTH)
    .and_then(|value| value.to_str().ok())
    .and_then(|value| value.parse::<u64>().ok())
{
    if length < self.config.min_size {
        return false; // Don't compress small responses
    }
}

3. Skip Already-Encoded Content

Rationale: If the response already has a Content-Encoding header (e.g., gzip, br, deflate, zstd), the content is already compressed. Re-compressing provides no benefit and may cause issues:

  • Double compression wastes CPU cycles
  • May corrupt data or increase size
  • Breaks decompression on client side

Implementation:

// Skip if content is already encoded (e.g., gzip, br, deflate, zstd)
if let Some(content_encoding) = response.headers().get(CONTENT_ENCODING) {
    if let Ok(encoding) = content_encoding.to_str() {
        let encoding_lower = encoding.to_lowercase();
        // "identity" means no encoding, so we can still compress
        if encoding_lower != "identity" && !encoding_lower.is_empty() {
            debug!("Skipping compression for already encoded response: {}", encoding);
            return false;
        }
    }
}

Common Content-Encoding Values:

  • gzip - GNU zip compression
  • br - Brotli compression
  • deflate - Deflate compression
  • zstd - Zstandard compression
  • identity - No encoding (compression allowed)

4. Maintain Observability

Rationale: Compression decisions can affect debugging and troubleshooting. Always log when compression is skipped.

Implementation:

debug!(
    "Skipping compression for error response: status={}",
    status.as_u16()
);

Log Analysis:

# Monitor compression decisions
RUST_LOG=rustfs::server::http=debug ./target/release/rustfs

# Look for patterns
grep "Skipping compression" logs/rustfs.log | wc -l

Common Pitfalls

Compressing All Responses Blindly

// BAD - No filtering
.layer(CompressionLayer::new())

Problem: Compresses error responses (risking Content-Length mismatches) and strips Content-Length from streamed downloads, so browsers show "unknown file size"

Using Blacklist Approach

// BAD - Blacklist approach (compress everything except...)
fn should_compress<B>(&self, response: &Response<B>) -> bool {
    // Skip images, videos, archives...
    if is_already_compressed_type(content_type) { return false; }
    true  // Compress everything else
}

Problem: Removes Content-Length for many file types, causing "unknown file size" in browsers

Using Whitelist-Based Predicate

// GOOD - Whitelist approach with configurable predicate
.layer(CompressionLayer::new().compress_when(CompressionPredicate::new(config)))

Ignoring Content-Encoding Header

// BAD - May double-compress already compressed content
fn should_compress<B>(&self, response: &Response<B>) -> bool {
    matches_mime_type(response)  // Missing Content-Encoding check
}

Problem: Double compression wastes CPU and may corrupt data

Comprehensive Checks

// GOOD - Multi-criteria whitelist decision
fn should_compress<B>(&self, response: &Response<B>) -> bool {
    // 1. Must be enabled
    if !self.config.enabled { return false; }

    // 2. Skip error responses
    let status = response.status();
    if status.is_client_error() || status.is_server_error() { return false; }

    // 3. Skip already-encoded content
    if has_content_encoding(response) { return false; }

    // 4. Check minimum size
    if get_content_length(response) < self.config.min_size { return false; }

    // 5. Must match whitelist (extension OR MIME type)
    matches_extension(response) || matches_mime_type(response)
}

Performance Considerations

CPU Usage

  • Compression CPU Cost: ~1-5ms for typical responses
  • Benefit: 70-90% size reduction for text/json
  • Break-even: Responses > 512 bytes on fast networks

Network Latency

  • Savings: Proportional to size reduction
  • Break-even: ~256 bytes on typical connections
  • Diminishing Returns: Below 128 bytes

Memory Usage

  • Buffer Size: Usually 4-16KB per connection
  • Trade-off: Memory vs. bandwidth
  • Recommendation: Profile in production
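
To reason about these trade-offs concretely, the sketch below compares the estimated transfer time saved against the CPU spent compressing. The inputs (compression ratio, bandwidth, CPU cost) are assumptions you should replace with measurements from your own environment; nothing here is a RustFS constant.

// Back-of-the-envelope break-even check (illustrative, not part of RustFS).
// Returns true when the estimated transfer-time saving exceeds the CPU cost.
fn compression_worthwhile(
    body_bytes: f64,        // uncompressed response size
    compression_ratio: f64, // e.g. 0.3 => compressed size is 30% of original
    bandwidth_bytes_per_s: f64,
    cpu_cost_s: f64,        // measured compression cost for this size
) -> bool {
    let bytes_saved = body_bytes * (1.0 - compression_ratio);
    let transfer_time_saved = bytes_saved / bandwidth_bytes_per_s;
    transfer_time_saved > cpu_cost_s
}

fn main() {
    // Example: 100 KiB JSON, 70% size reduction, 100 Mbit/s link, 2 ms CPU cost
    let worthwhile = compression_worthwhile(100.0 * 1024.0, 0.3, 12_500_000.0, 0.002);
    println!("compress? {worthwhile}");
}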

Testing Guidelines

Unit Tests

Test compression predicate logic:

#[test]
fn test_should_not_compress_errors() {
    // Config with compression enabled (assumes other fields have sensible defaults)
    let config = CompressionConfig { enabled: true, ..Default::default() };
    let predicate = CompressionPredicate::new(config);
    let response = Response::builder()
        .status(404)
        .body(())
        .unwrap();

    assert!(!predicate.should_compress(&response));
}

#[test]
fn test_should_not_compress_small_responses() {
    let config = CompressionConfig { enabled: true, ..Default::default() };
    let predicate = CompressionPredicate::new(config);
    let response = Response::builder()
        .status(200)
        .header(CONTENT_LENGTH, "100") // below the default 1000-byte threshold
        .body(())
        .unwrap();

    assert!(!predicate.should_compress(&response));
}
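
A positive case is equally important: a response that is large enough, not an error, not already encoded, and whitelisted should be compressed. The config construction below reuses the enabled and min_size fields shown earlier; the mime_types field name is an assumption for illustration.

#[test]
fn test_should_compress_whitelisted_json() {
    // `enabled` and `min_size` appear in the predicate sketch above;
    // `mime_types` is a hypothetical field name used for illustration.
    let config = CompressionConfig {
        enabled: true,
        mime_types: vec!["application/json".to_string()],
        min_size: 1000,
        ..Default::default()
    };
    let predicate = CompressionPredicate::new(config);
    let response = Response::builder()
        .status(200)
        .header(CONTENT_TYPE, "application/json")
        .header(CONTENT_LENGTH, "4096")
        .body(())
        .unwrap();

    assert!(predicate.should_compress(&response));
}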

Integration Tests

Test actual S3 API responses:

#[tokio::test]
async fn test_error_response_not_truncated() {
    let response = client
        .get_object()
        .bucket("test")
        .key("nonexistent")
        .send()
        .await;

    // Should get proper error, not truncation error
    match response.unwrap_err() {
        SdkError::ServiceError(err) => {
            assert!(err.err().is_no_such_key());
        }
        other => panic!("Expected ServiceError, got {:?}", other),
    }
}

Monitoring and Alerts

Metrics to Track

  1. Compression Ratio: compressed_size / original_size
  2. Compression Skip Rate: skipped_count / total_count (see the instrumentation sketch after this list)
  3. Error Response Size Distribution
  4. CPU Usage During Compression
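
The sketch below shows one way to record the skip-rate counters, assuming the metrics facade crate (0.22+ API); the metric names match the alert rules that follow, but the instrumentation itself is illustrative and not existing RustFS code.

// Sketch: count compression decisions so the skip rate can be scraped by
// Prometheus. Assumes the `metrics` facade crate (0.22+ API); metric names
// mirror the alert rules below.
use metrics::counter;

fn record_compression_decision(compressed: bool) {
    counter!("http_responses_total").increment(1);
    if !compressed {
        counter!("http_compression_skipped_total").increment(1);
    }
}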

Alert Conditions

# Prometheus alert rules
- alert: HighCompressionSkipRate
  expr: |
    rate(http_compression_skipped_total[5m]) 
    / rate(http_responses_total[5m]) > 0.5
  annotations:
    summary: "More than 50% of responses skipping compression"

- alert: LargeErrorResponses
  expr: |
    histogram_quantile(0.95, 
      rate(http_error_response_size_bytes_bucket[5m])) > 1024
  annotations:
    summary: "Error responses larger than 1KB"

Migration Guide

Migrating from Blacklist to Whitelist Approach

If you're upgrading from an older RustFS version with blacklist-based compression:

  1. Compression is now disabled by default

    • Set RUSTFS_COMPRESS_ENABLE=on to enable
    • Existing deployments keep uncompressed responses (and intact Content-Length headers) until they explicitly opt in
  2. Configure your whitelist

    # Example: Enable compression for common text formats
    RUSTFS_COMPRESS_ENABLE=on
    RUSTFS_COMPRESS_EXTENSIONS=.txt,.log,.csv,.json,.xml,.html,.css,.js
    RUSTFS_COMPRESS_MIME_TYPES=text/*,application/json,application/xml,application/javascript
    RUSTFS_COMPRESS_MIN_SIZE=1000
    
  3. Verify browser downloads

    • Check that file downloads show accurate file sizes
    • Verify Content-Length headers are preserved for non-compressed content

Updating Existing Code

If you're adding compression to an existing service:

  1. Start with compression disabled (default)
  2. Define your whitelist: Identify content types that benefit from compression
  3. Set appropriate thresholds: Start with 1KB minimum size
  4. Enable and monitor: Watch CPU, latency, and download behavior

Rollout Strategy

  1. Stage 1: Deploy to canary (5% traffic)

    • Monitor for 24 hours
    • Check error rates and latency
    • Verify browser download behavior
  2. Stage 2: Expand to 25% traffic

    • Monitor for 48 hours
    • Validate compression ratios
    • Check Content-Length preservation
  3. Stage 3: Full rollout (100% traffic)

    • Continue monitoring for 1 week
    • Document any issues
    • Fine-tune whitelist based on actual usage

Architecture

Module Structure

The compression functionality is organized in a dedicated module for maintainability:

rustfs/src/server/
├── compress.rs        # Compression configuration and predicate
├── http.rs            # HTTP server (uses compress module)
└── mod.rs             # Module declarations

Key Components

  1. CompressionConfig - Stores compression settings parsed from environment/CLI
  2. CompressionPredicate - Implements tower_http::compression::predicate::Predicate
  3. Configuration Constants - Defined in crates/config/src/constants/compress.rs

References

  1. Issue #901: NoSuchKey error response regression
  2. Issue #902: Whitelist-based compression redesign
  3. Google Web Fundamentals - Text Compression
  4. AWS Best Practices - Response Compression

Last Updated: 2025-12-13
Maintainer: RustFS Team