HTTP Response Compression Best Practices in RustFS
Overview
This document outlines best practices for HTTP response compression in RustFS, based on lessons learned from fixing the NoSuchKey error response regression (Issue #901) and the whitelist-based compression redesign (Issue #902).
Whitelist-Based Compression (Issue #902)
Design Philosophy
After Issue #901, we identified that the blacklist approach (compress everything except known problematic types) was still causing issues with browser downloads showing "unknown file size". In Issue #902, we redesigned the compression system using a whitelist approach aligned with MinIO's behavior:
- Compression is disabled by default - Opt-in rather than opt-out
- Only explicitly configured content types are compressed - Preserves Content-Length for all other responses
- Fine-grained configuration - Control via file extensions, MIME types, and size thresholds
- Skip already-encoded content - Avoid double compression
Configuration Options
RustFS provides flexible compression configuration via environment variables and command-line arguments:
| Environment Variable | CLI Argument | Default | Description |
|---|---|---|---|
| `RUSTFS_COMPRESS_ENABLE` | `--compress-enable` | `false` | Enable/disable compression |
| `RUSTFS_COMPRESS_EXTENSIONS` | `--compress-extensions` | `""` | File extensions to compress (e.g., `.txt,.log,.csv`) |
| `RUSTFS_COMPRESS_MIME_TYPES` | `--compress-mime-types` | `text/*,application/json,...` | MIME types to compress (supports wildcards) |
| `RUSTFS_COMPRESS_MIN_SIZE` | `--compress-min-size` | `1000` | Minimum file size (bytes) for compression |
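For orientation, the sketch below shows one way these options could map onto a configuration struct. The field names, the `from_env` helper, and the simplified defaults are illustrative assumptions for this document, not the actual RustFS API (the real type lives in `rustfs/src/server/compress.rs`).

```rust
use std::env;

/// Illustrative shape of the compression settings; field names and parsing
/// rules are assumptions, not the real CompressionConfig.
#[derive(Debug, Clone)]
pub struct CompressionConfig {
    pub enabled: bool,
    pub extensions: Vec<String>, // e.g. [".txt", ".log", ".csv"]
    pub mime_types: Vec<String>, // e.g. ["text/*", "application/json"]
    pub min_size: u64,           // bytes
}

impl CompressionConfig {
    /// Read the four RUSTFS_COMPRESS_* variables, falling back to the defaults
    /// listed in the table above (the built-in MIME default is omitted here).
    pub fn from_env() -> Self {
        let list = |key: &str| -> Vec<String> {
            env::var(key)
                .unwrap_or_default()
                .split(',')
                .map(str::trim)
                .filter(|s| !s.is_empty())
                .map(str::to_string)
                .collect()
        };
        Self {
            enabled: matches!(
                env::var("RUSTFS_COMPRESS_ENABLE").as_deref(),
                Ok("on") | Ok("true") | Ok("1")
            ),
            extensions: list("RUSTFS_COMPRESS_EXTENSIONS"),
            mime_types: list("RUSTFS_COMPRESS_MIME_TYPES"),
            min_size: env::var("RUSTFS_COMPRESS_MIN_SIZE")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(1000),
        }
    }
}
```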
Usage Examples
```bash
# Enable compression for text files and JSON
RUSTFS_COMPRESS_ENABLE=on \
RUSTFS_COMPRESS_EXTENSIONS=.txt,.log,.csv,.json,.xml \
RUSTFS_COMPRESS_MIME_TYPES=text/*,application/json,application/xml \
RUSTFS_COMPRESS_MIN_SIZE=1000 \
rustfs /data

# Or using command-line arguments
rustfs /data \
  --compress-enable \
  --compress-extensions ".txt,.log,.csv" \
  --compress-mime-types "text/*,application/json" \
  --compress-min-size 1000
```
Implementation Details
The CompressionPredicate implements intelligent compression decisions:
```rust
impl Predicate for CompressionPredicate {
    fn should_compress<B>(&self, response: &Response<B>) -> bool {
        // 1. Check if compression is enabled
        if !self.config.enabled { return false; }
        // 2. Never compress error responses
        let status = response.status();
        if status.is_client_error() || status.is_server_error() { return false; }
        // 3. Skip already-encoded content (gzip, br, deflate, etc.)
        if has_content_encoding(response) { return false; }
        // 4. Check minimum size threshold
        if get_content_length(response) < self.config.min_size { return false; }
        // 5. Check whitelist: extension OR MIME type must match
        if matches_extension(response) || matches_mime_type(response) {
            return true;
        }
        // 6. Default: don't compress (whitelist approach)
        false
    }
}
```
Benefits of Whitelist Approach
| Aspect | Blacklist (Old) | Whitelist (New) |
|---|---|---|
| Default behavior | Compress most content | No compression |
| Content-Length | Often removed | Preserved for unmatched types |
| Browser downloads | "Unknown file size" | Accurate file size shown |
| Configuration | Complex exclusion rules | Simple inclusion rules |
| MinIO compatibility | Different behavior | Aligned behavior |
Key Principles
1. Never Compress Error Responses
Rationale: Error responses are typically small (100-500 bytes) and need to be transmitted accurately. Compression can:
- Introduce Content-Length header mismatches
- Add unnecessary overhead for small payloads
- Potentially corrupt error details during buffering
Implementation:
```rust
// Always check status code first
if status.is_client_error() || status.is_server_error() {
    return false; // Don't compress
}
```
Affected Status Codes:
- 4xx Client Errors (400, 403, 404, etc.)
- 5xx Server Errors (500, 502, 503, etc.)
2. Size-Based Compression Threshold
Rationale: Compression has overhead in terms of CPU and potentially network roundtrips. For very small responses:
- Compression overhead > space savings
- May actually increase payload size
- Adds latency without benefit
Recommended Threshold: 1000 bytes minimum (configurable via RUSTFS_COMPRESS_MIN_SIZE)
Implementation:
```rust
// Parse Content-Length and skip compression below the configured threshold
if let Some(length) = response
    .headers()
    .get(CONTENT_LENGTH)
    .and_then(|value| value.to_str().ok())
    .and_then(|value| value.parse::<u64>().ok())
{
    if length < self.config.min_size {
        return false; // Don't compress small responses
    }
}
```
3. Skip Already-Encoded Content
Rationale: If the response already has a Content-Encoding header (e.g., gzip, br, deflate, zstd), the content
is already compressed. Re-compressing provides no benefit and may cause issues:
- Double compression wastes CPU cycles
- May corrupt data or increase size
- Breaks decompression on client side
Implementation:
```rust
// Skip if content is already encoded (e.g., gzip, br, deflate, zstd)
if let Some(content_encoding) = response.headers().get(CONTENT_ENCODING) {
    if let Ok(encoding) = content_encoding.to_str() {
        let encoding_lower = encoding.to_lowercase();
        // "identity" means no encoding, so we can still compress
        if encoding_lower != "identity" && !encoding_lower.is_empty() {
            debug!("Skipping compression for already encoded response: {}", encoding);
            return false;
        }
    }
}
```
Common Content-Encoding Values:
- `gzip` - GNU zip compression
- `br` - Brotli compression
- `deflate` - Deflate compression
- `zstd` - Zstandard compression
- `identity` - No encoding (compression allowed)
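The `has_content_encoding` check referenced in the predicate earlier can be factored out of the inline logic above. A minimal sketch (the helper name follows the pseudocode in this document, not necessarily the exact RustFS function):

```rust
use http::{header::CONTENT_ENCODING, Response};

/// Returns true when the response already carries a real Content-Encoding
/// (anything other than an empty value or "identity"), in which case the
/// predicate should skip compression.
fn has_content_encoding<B>(response: &Response<B>) -> bool {
    response
        .headers()
        .get(CONTENT_ENCODING)
        .and_then(|value| value.to_str().ok())
        .map(|encoding| {
            let encoding = encoding.to_ascii_lowercase();
            !encoding.is_empty() && encoding != "identity"
        })
        .unwrap_or(false)
}
```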
4. Maintain Observability
Rationale: Compression decisions can affect debugging and troubleshooting. Always log when compression is skipped.
Implementation:
```rust
debug!(
    "Skipping compression for error response: status={}",
    status.as_u16()
);
```
Log Analysis:
```bash
# Monitor compression decisions
RUST_LOG=rustfs::server::http=debug ./target/release/rustfs

# Look for patterns
grep "Skipping compression" logs/rustfs.log | wc -l
```
Common Pitfalls
❌ Compressing All Responses Blindly
```rust
// BAD - No filtering
.layer(CompressionLayer::new())
```
Problem: Can cause Content-Length mismatches with error responses and browser download issues
❌ Using Blacklist Approach
```rust
// BAD - Blacklist approach (compress everything except...)
fn should_compress(&self, response: &Response<B>) -> bool {
    // Skip images, videos, archives...
    if is_already_compressed_type(content_type) { return false; }
    true // Compress everything else
}
```
Problem: Removes Content-Length for many file types, causing "unknown file size" in browsers
✅ Using Whitelist-Based Predicate
```rust
// GOOD - Whitelist approach with configurable predicate
.layer(CompressionLayer::new().compress_when(CompressionPredicate::new(config)))
```
❌ Ignoring Content-Encoding Header
```rust
// BAD - May double-compress already compressed content
fn should_compress(&self, response: &Response<B>) -> bool {
    matches_mime_type(response) // Missing Content-Encoding check
}
```
Problem: Double compression wastes CPU and may corrupt data
✅ Comprehensive Checks
```rust
// GOOD - Multi-criteria whitelist decision
fn should_compress(&self, response: &Response<B>) -> bool {
    // 1. Must be enabled
    if !self.config.enabled { return false; }
    // 2. Skip error responses
    let status = response.status();
    if status.is_client_error() || status.is_server_error() { return false; }
    // 3. Skip already-encoded content
    if has_content_encoding(response) { return false; }
    // 4. Check minimum size
    if get_content_length(response) < self.config.min_size { return false; }
    // 5. Must match whitelist (extension OR MIME type)
    matches_extension(response) || matches_mime_type(response)
}
```
Performance Considerations
CPU Usage
- Compression CPU Cost: ~1-5ms for typical responses
- Benefit: 70-90% size reduction for text/json
- Break-even: Responses > 512 bytes on fast networks
Network Latency
- Savings: Proportional to size reduction
- Break-even: ~256 bytes on typical connections
- Diminishing Returns: Below 128 bytes
Memory Usage
- Buffer Size: Usually 4-16KB per connection
- Trade-off: Memory vs. bandwidth
- Recommendation: Profile in production
Testing Guidelines
Unit Tests
Test compression predicate logic:
```rust
#[test]
fn test_should_not_compress_errors() {
    let predicate = ShouldCompress;
    let response = Response::builder()
        .status(404)
        .body(())
        .unwrap();
    assert!(!predicate.should_compress(&response));
}

#[test]
fn test_should_not_compress_small_responses() {
    let predicate = ShouldCompress;
    let response = Response::builder()
        .status(200)
        .header(CONTENT_LENGTH, "100")
        .body(())
        .unwrap();
    assert!(!predicate.should_compress(&response));
}
```
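A third case worth covering is the already-encoded path from principle 3. A sketch in the same style as the tests above (reusing the same predicate value; the exact type name in RustFS may differ):

```rust
#[test]
fn test_should_not_compress_already_encoded_responses() {
    let predicate = ShouldCompress;
    // A large, whitelisted content type that is nevertheless pre-compressed
    // must be left alone to avoid double compression.
    let response = Response::builder()
        .status(200)
        .header(CONTENT_TYPE, "application/json")
        .header(CONTENT_ENCODING, "gzip")
        .header(CONTENT_LENGTH, "204800")
        .body(())
        .unwrap();
    assert!(!predicate.should_compress(&response));
}
```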
Integration Tests
Test actual S3 API responses:
```rust
#[tokio::test]
async fn test_error_response_not_truncated() {
    let response = client
        .get_object()
        .bucket("test")
        .key("nonexistent")
        .send()
        .await;

    // Should get proper error, not truncation error
    match response.unwrap_err() {
        SdkError::ServiceError(context) => {
            assert!(context.err().is_no_such_key());
        }
        other => panic!("Expected ServiceError, got {:?}", other),
    }
}
```
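Issue #902 was specifically about downloads losing their size information, so it is also worth asserting that Content-Length survives for content outside the whitelist. A sketch using the same S3 client as above (bucket/key names and the uploaded payload are illustrative, and `content_length()` returning `Option<i64>` assumes a recent aws-sdk-s3 release):

```rust
#[tokio::test]
async fn test_content_length_preserved_for_unmatched_types() {
    // Binary content that is not on the compression whitelist.
    let payload = vec![0u8; 64 * 1024];
    client
        .put_object()
        .bucket("test")
        .key("blob.bin")
        .content_type("application/octet-stream")
        .body(payload.clone().into())
        .send()
        .await
        .unwrap();

    let response = client
        .get_object()
        .bucket("test")
        .key("blob.bin")
        .send()
        .await
        .unwrap();

    // The exact size must be reported so browsers can show download progress.
    assert_eq!(response.content_length(), Some(payload.len() as i64));
}
```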
Monitoring and Alerts
Metrics to Track
- Compression Ratio: `compressed_size / original_size`
- Compression Skip Rate: `skipped_count / total_count`
- Error Response Size Distribution
- CPU Usage During Compression
Alert Conditions
```yaml
# Prometheus alert rules
- alert: HighCompressionSkipRate
  expr: |
    rate(http_compression_skipped_total[5m])
      / rate(http_responses_total[5m]) > 0.5
  annotations:
    summary: "More than 50% of responses skipping compression"

- alert: LargeErrorResponses
  expr: |
    histogram_quantile(0.95,
      rate(http_error_response_size_bytes_bucket[5m])) > 1024
  annotations:
    summary: "Error responses larger than 1KB"
```
Migration Guide
Migrating from Blacklist to Whitelist Approach
If you're upgrading from an older RustFS version with blacklist-based compression:
1. Compression is now disabled by default
   - Set `RUSTFS_COMPRESS_ENABLE=on` to enable
   - This ensures backward compatibility for existing deployments

2. Configure your whitelist

   ```bash
   # Example: Enable compression for common text formats
   RUSTFS_COMPRESS_ENABLE=on
   RUSTFS_COMPRESS_EXTENSIONS=.txt,.log,.csv,.json,.xml,.html,.css,.js
   RUSTFS_COMPRESS_MIME_TYPES=text/*,application/json,application/xml,application/javascript
   RUSTFS_COMPRESS_MIN_SIZE=1000
   ```

3. Verify browser downloads
   - Check that file downloads show accurate file sizes
   - Verify Content-Length headers are preserved for non-compressed content
Updating Existing Code
If you're adding compression to an existing service:
- Start with compression disabled (default)
- Define your whitelist: Identify content types that benefit from compression
- Set appropriate thresholds: Start with 1KB minimum size
- Enable and monitor: Watch CPU, latency, and download behavior (a minimal wiring sketch follows below)
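Putting these steps together, the sketch below shows one way a whitelist predicate could be wired into a tower/axum service. The `WhitelistPredicate` type, its fields, and the axum router are illustrative assumptions for this example only; RustFS's actual wiring lives in `rustfs/src/server/http.rs` and `compress.rs`.

```rust
use axum::{routing::get, Router};
use http::{header, Response};
use http_body::Body;
use tower_http::compression::{predicate::Predicate, CompressionLayer};

/// Hypothetical stand-in for RustFS's CompressionPredicate, condensed to the
/// whitelist rules described in this document.
#[derive(Clone)]
struct WhitelistPredicate {
    enabled: bool,
    min_size: u64,
    mime_prefixes: Vec<&'static str>, // e.g. "text/", "application/json"
}

impl Predicate for WhitelistPredicate {
    fn should_compress<B>(&self, response: &Response<B>) -> bool
    where
        B: Body,
    {
        let status = response.status();
        // Disabled, error, or already-encoded responses are never compressed
        // (unlike the full implementation, this also skips "identity" for brevity).
        if !self.enabled
            || status.is_client_error()
            || status.is_server_error()
            || response.headers().contains_key(header::CONTENT_ENCODING)
        {
            return false;
        }
        // Enforce the minimum size threshold when Content-Length is known.
        if let Some(len) = response
            .headers()
            .get(header::CONTENT_LENGTH)
            .and_then(|v| v.to_str().ok())
            .and_then(|v| v.parse::<u64>().ok())
        {
            if len < self.min_size {
                return false;
            }
        }
        // Whitelist: only compress when the Content-Type matches a known prefix.
        response
            .headers()
            .get(header::CONTENT_TYPE)
            .and_then(|v| v.to_str().ok())
            .map(|ct| self.mime_prefixes.iter().any(|p| ct.starts_with(p)))
            .unwrap_or(false)
    }
}

fn router() -> Router {
    let predicate = WhitelistPredicate {
        enabled: true,
        min_size: 1000,
        mime_prefixes: vec!["text/", "application/json"],
    };
    Router::new()
        .route("/health", get(|| async { "ok" }))
        .layer(CompressionLayer::new().compress_when(predicate))
}
```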
Rollout Strategy
1. Stage 1: Deploy to canary (5% traffic)
   - Monitor for 24 hours
   - Check error rates and latency
   - Verify browser download behavior

2. Stage 2: Expand to 25% traffic
   - Monitor for 48 hours
   - Validate compression ratios
   - Check Content-Length preservation

3. Stage 3: Full rollout (100% traffic)
   - Continue monitoring for 1 week
   - Document any issues
   - Fine-tune whitelist based on actual usage
Related Documentation
Architecture
Module Structure
The compression functionality is organized in a dedicated module for maintainability:
```text
rustfs/src/server/
├── compress.rs   # Compression configuration and predicate
├── http.rs       # HTTP server (uses compress module)
└── mod.rs        # Module declarations
```
Key Components
- `CompressionConfig` - Stores compression settings parsed from environment/CLI
- `CompressionPredicate` - Implements `tower_http::compression::predicate::Predicate`
- Configuration Constants - Defined in `crates/config/src/constants/compress.rs`
References
- Issue #901: NoSuchKey error response regression
- Issue #902: Whitelist-based compression redesign
- Google Web Fundamentals - Text Compression
- AWS Best Practices - Response Compression
Last Updated: 2025-12-13
Maintainer: RustFS Team