rustfs/docs/compression-best-practices.md
HTTP Response Compression Best Practices in RustFS

Overview

This document outlines best practices for HTTP response compression in RustFS, based on lessons learned from fixing the NoSuchKey error response regression (Issue #901).

Key Principles

1. Never Compress Error Responses

Rationale: Error responses are typically small (100-500 bytes) and need to be transmitted accurately. Compression can:

  • Introduce Content-Length header mismatches
  • Add unnecessary overhead for small payloads
  • Potentially corrupt error details during buffering

Implementation:

```rust
// Always check the status code first
if status.is_client_error() || status.is_server_error() {
    return false; // Don't compress error responses
}
```

Affected Status Codes:

  • 4xx Client Errors (400, 403, 404, etc.)
  • 5xx Server Errors (500, 502, 503, etc.)

2. Size-Based Compression Threshold

Rationale: Compression costs CPU time and adds buffering overhead. For very small responses:

  • Compression overhead > space savings
  • May actually increase payload size
  • Adds latency without benefit

Recommended Threshold: 256 bytes minimum

Implementation:

```rust
if let Some(value) = response.headers().get(CONTENT_LENGTH) {
    // Header values are bytes; parse to u64, ignoring malformed values
    if let Some(length) = value.to_str().ok().and_then(|s| s.parse::<u64>().ok()) {
        if length < 256 {
            return false; // Don't compress small responses
        }
    }
}
```

3. Maintain Observability

Rationale: Compression decisions can affect debugging and troubleshooting. Always log when compression is skipped.

Implementation:

```rust
debug!(
    "Skipping compression for error response: status={}",
    status.as_u16()
);
```

Log Analysis:

```bash
# Monitor compression decisions
RUST_LOG=rustfs::server::http=debug ./target/release/rustfs

# Look for patterns
grep "Skipping compression" logs/rustfs.log | wc -l
```

Common Pitfalls

Compressing All Responses Blindly

```rust
// BAD - No filtering
.layer(CompressionLayer::new())
```

Problem: Can cause Content-Length mismatches with error responses.

Using Intelligent Predicates

```rust
// GOOD - Filter based on status and size
.layer(CompressionLayer::new().compress_when(ShouldCompress))
```

Ignoring Content-Length Header

```rust
// BAD - Only checking status
fn should_compress<B>(&self, response: &Response<B>) -> bool {
    !response.status().is_client_error()
}
```

Problem: May compress tiny responses unnecessarily, and still compresses 5xx error responses.

Checking Both Status and Size

```rust
// GOOD - Multi-criteria decision
fn should_compress<B>(&self, response: &Response<B>) -> bool {
    // Check status: never compress 4xx/5xx
    let status = response.status();
    if status.is_client_error() || status.is_server_error() {
        return false;
    }

    // Check size against the 256-byte threshold
    if get_content_length(response) < 256 {
        return false;
    }

    true
}
```
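The same two checks can be exercised without any HTTP framework. The sketch below is illustrative, working on a plain status code and an optional parsed Content-Length; it is not RustFS's actual `ShouldCompress` type:

```rust
// Minimal, framework-free sketch of the compression decision.
// `status` is the HTTP status code; `content_length` is the parsed
// Content-Length header, if present (streaming responses may omit it).
fn should_compress(status: u16, content_length: Option<u64>) -> bool {
    // Never compress error responses (4xx/5xx)
    if status >= 400 {
        return false;
    }
    // Skip responses known to be below the 256-byte threshold
    if let Some(len) = content_length {
        if len < 256 {
            return false;
        }
    }
    // Unknown length (e.g. streaming): allow compression
    true
}
```

Note that when the length is unknown the sketch opts in to compression; a production predicate might additionally inspect Content-Type.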

Performance Considerations

CPU Usage

  • Compression CPU Cost: ~1-5ms for typical responses
  • Benefit: 70-90% size reduction for text/json
  • Break-even: Responses > 512 bytes on fast networks

Network Latency

  • Savings: Proportional to size reduction
  • Break-even: ~256 bytes on typical connections
  • Diminishing Returns: Below 128 bytes
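A rough back-of-envelope check makes the break-even concrete. All inputs below (compression ratio, link throughput, CPU cost) are illustrative assumptions, not measured RustFS figures:

```rust
// Estimates whether compression saves more transfer time than it
// costs in CPU. All inputs are illustrative assumptions.
fn compression_pays_off(
    size_bytes: f64,        // uncompressed response size
    compressed_ratio: f64,  // compressed_size / original_size, e.g. 0.2
    link_bytes_per_ms: f64, // effective throughput; 125.0 ~= 1 Mbit/s
    cpu_cost_ms: f64,       // time spent compressing
) -> bool {
    let saved_bytes = size_bytes * (1.0 - compressed_ratio);
    let saved_ms = saved_bytes / link_bytes_per_ms;
    saved_ms > cpu_cost_ms
}
```

Under these assumptions, a 10 KB JSON body at 1 Mbit/s easily repays 1 ms of CPU, while a 100-byte error body does not.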

Memory Usage

  • Buffer Size: Usually 4-16KB per connection
  • Trade-off: Memory vs. bandwidth
  • Recommendation: Profile in production

Testing Guidelines

Unit Tests

Test compression predicate logic:

```rust
use http::{header::CONTENT_LENGTH, Response};

#[test]
fn test_should_not_compress_errors() {
    let predicate = ShouldCompress;
    let response = Response::builder()
        .status(404)
        .body(())
        .unwrap();

    assert!(!predicate.should_compress(&response));
}

#[test]
fn test_should_not_compress_small_responses() {
    let predicate = ShouldCompress;
    let response = Response::builder()
        .status(200)
        .header(CONTENT_LENGTH, "100")
        .body(())
        .unwrap();

    assert!(!predicate.should_compress(&response));
}
```

Integration Tests

Test actual S3 API responses:

```rust
#[tokio::test]
async fn test_error_response_not_truncated() {
    let response = client
        .get_object()
        .bucket("test")
        .key("nonexistent")
        .send()
        .await;

    // Should get a proper NoSuchKey error, not a truncation/parse error
    match response.unwrap_err() {
        SdkError::ServiceError(context) => {
            assert!(context.err().is_no_such_key());
        }
        other => panic!("Expected ServiceError, got {other:?}"),
    }
}
```

Monitoring and Alerts

Metrics to Track

  1. Compression Ratio: compressed_size / original_size
  2. Compression Skip Rate: skipped_count / total_count
  3. Error Response Size Distribution
  4. CPU Usage During Compression

Alert Conditions

```yaml
# Prometheus alert rules
- alert: HighCompressionSkipRate
  expr: |
    rate(http_compression_skipped_total[5m])
    / rate(http_responses_total[5m]) > 0.5
  annotations:
    summary: "More than 50% of responses skipping compression"

- alert: LargeErrorResponses
  expr: |
    histogram_quantile(0.95,
      rate(http_error_response_size_bytes_bucket[5m])) > 1024
  annotations:
    summary: "Error responses larger than 1KB"
```

Migration Guide

Updating Existing Code

If you're adding compression to an existing service:

  1. Start Conservative: Only compress responses > 1KB
  2. Monitor Impact: Watch CPU and latency metrics
  3. Lower Threshold Gradually: Test with smaller thresholds
  4. Always Exclude Errors: Never compress 4xx/5xx
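One way to support "start conservative, lower gradually" is to make the threshold configurable rather than hard-coded. A sketch (the setting name and default are hypothetical, not an actual RustFS option):

```rust
// Parses a threshold from an optional config/env string, falling back
// to a conservative 1024-byte default for the initial rollout.
fn compression_threshold(raw: Option<&str>) -> u64 {
    raw.and_then(|v| v.parse().ok()).unwrap_or(1024)
}
```

The raw value could come from, say, an environment variable such as `RUSTFS_COMPRESS_MIN_SIZE` (an illustrative name), letting operators lower the threshold during rollout without a rebuild.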

Rollout Strategy

  1. Stage 1: Deploy to canary (5% traffic)
     • Monitor for 24 hours
     • Check error rates and latency
  2. Stage 2: Expand to 25% traffic
     • Monitor for 48 hours
     • Validate compression ratios
  3. Stage 3: Full rollout (100% traffic)
     • Continue monitoring for 1 week
     • Document any issues

References

  1. Issue #901: NoSuchKey error response regression
  2. Google Web Fundamentals - Text Compression
  3. AWS Best Practices - Response Compression

Last Updated: 2025-11-24
Maintainer: RustFS Team