mirror of https://github.com/rustfs/rustfs.git synced 2026-01-17 09:40:32 +00:00

Copilot 6da5766ea2 Implement adaptive buffer sizing with workload profiles - Production-Ready Implementation (#878 )

* Initial plan

* Add workload profiles module with adaptive buffer sizing

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Add comprehensive adaptive buffer sizing documentation

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Add implementation summary documentation

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Implement Phase 2: Opt-In Usage for buffer profiles

- Add RUSTFS_BUFFER_PROFILE_ENABLE and RUSTFS_BUFFER_PROFILE config options
- Implement global buffer configuration with enable/disable flag
- Add WorkloadProfile::from_name() for parsing profile names
- Update put_object, put_object_extract, and upload_part to use opt-in function
- Add init_buffer_profile_system() in main.rs for initialization
- Add comprehensive tests for opt-in functionality
- Update documentation with Phase 2 usage examples

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Implement Phase 3: Default Enablement for buffer profiles

- Change default to enabled with GeneralPurpose profile
- Replace --buffer-profile-enable with --buffer-profile-disable for opt-out
- Update init_buffer_profile_system() to reflect Phase 3 logic
- Add comprehensive Phase 3 documentation
- Create MIGRATION_PHASE3.md guide for smooth transition
- Add test for Phase 3 default behavior
- Update IMPLEMENTATION_SUMMARY.md to mark Phase 3 as complete

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Implement Phase 4: Full Integration with deprecated legacy and metrics

- Deprecate get_adaptive_buffer_size() function (backward compatible)
- Implement profile-only buffer sizing in get_buffer_size_opt_in()
- Add performance metrics collection (optional, via metrics feature)
- Even disabled mode now uses GeneralPurpose profile (no hardcoded values)
- Add comprehensive Phase 4 documentation and guide
- Add test_phase4_full_integration() test
- Update IMPLEMENTATION_SUMMARY.md to mark Phase 4 as complete

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Finalize adaptive buffer sizing: remove deprecated code and improve documentation

English:
- Remove deprecated get_adaptive_buffer_size() function
- Remove deprecated test cases (test_adaptive_buffer_size, Phase 3 legacy comparisons)
- Improve all code comments to be clear and professional English
- Add comprehensive bilingual implementation summary (COMPLETE_SUMMARY.md)
- Update buffer sizing comments in put_object, put_object_extract, upload_part
- Enhance init_buffer_profile_system documentation
- Clean up phase-specific references in comments

This commit completes the adaptive buffer sizing implementation by:
1. Removing all deprecated legacy code and tests
2. Improving code documentation quality
3. Providing comprehensive bilingual summary

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fmt

* fix

* fix

* fix

* fix

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>

2025-11-18 13:32:02 +08:00

Adaptive Buffer Sizing Optimization

RustFS automatically adjusts I/O buffer sizes based on file size and workload type, balancing performance, memory usage, and security.

Overview

The adaptive buffer sizing system provides:

Automatic buffer size selection based on file size
Workload-specific optimizations for different use cases
Special environment support (Kylin, NeoKylin, Unity OS, etc.)
Memory pressure awareness with configurable limits
Unknown file size handling for streaming scenarios

Workload Profiles

GeneralPurpose (Default)

Balanced performance and memory usage for general-purpose workloads.

Buffer Sizing:

Small files (< 1MB): 64KB buffer
Medium files (1MB-100MB): 256KB buffer
Large files (≥ 100MB): 1MB buffer

Best for:

General file storage
Mixed workloads
Default configuration when workload type is unknown

AiTraining

Optimized for AI/ML training workloads with large sequential reads.

Buffer Sizing:

Small files (< 10MB): 512KB buffer
Medium files (10MB-500MB): 2MB buffer
Large files (≥ 500MB): 4MB buffer

Best for:

Machine learning model files
Training datasets
Large sequential data processing
Maximum throughput requirements

DataAnalytics

Optimized for data analytics with mixed read-write patterns.

Buffer Sizing:

Small files (< 5MB): 128KB buffer
Medium files (5MB-200MB): 512KB buffer
Large files (≥ 200MB): 2MB buffer

Best for:

Data warehouse operations
Analytics workloads
Business intelligence
Mixed access patterns

WebWorkload

Optimized for web applications dominated by small-file operations.

Buffer Sizing:

Small files (< 512KB): 32KB buffer
Medium files (512KB-10MB): 128KB buffer
Large files (≥ 10MB): 256KB buffer

Best for:

Web assets (images, CSS, JavaScript)
Static content delivery
CDN origin storage
High concurrency scenarios

IndustrialIoT

Optimized for industrial IoT with real-time streaming requirements.

Buffer Sizing:

Small files (< 1MB): 64KB buffer
Medium files (1MB-50MB): 256KB buffer
Large files (≥ 50MB): 512KB buffer (capped for memory constraints)

Best for:

Sensor data streams
Real-time telemetry
Edge computing scenarios
Low latency requirements
Memory-constrained devices

SecureStorage

Security-first configuration with strict memory limits for compliance.

Buffer Sizing:

Small files (< 1MB): 32KB buffer
Medium files (1MB-50MB): 128KB buffer
Large files (≥ 50MB): 256KB buffer (strict limit)

Best for:

Compliance-heavy environments
Secure government systems (Kylin, NeoKylin, UOS)
Financial services
Healthcare data storage
Memory-constrained secure environments

Auto-Detection: This profile is automatically selected when running on Chinese secure operating systems:

Kylin
NeoKylin
UOS (Unity OS)
OpenKylin
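
The threshold tables of the six profiles above can be collected in one place. The following is a minimal sketch, not the RustFS implementation: the `Profile` enum, `tiers`, and `buffer_size_for` are hypothetical names, and unknown (negative) sizes return `None` because the per-profile defaults for unknown sizes are not listed in this section.

```rust
/// Built-in profiles from the tables above (hypothetical stand-in type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    GeneralPurpose,
    AiTraining,
    DataAnalytics,
    WebWorkload,
    IndustrialIoT,
    SecureStorage,
}

const KB: usize = 1024;
const MB: i64 = 1024 * 1024;

impl Profile {
    /// Tiers as (exclusive upper bound on file size, buffer size), both in bytes.
    /// The last tier uses i64::MAX so it covers every remaining size.
    pub fn tiers(self) -> [(i64, usize); 3] {
        match self {
            Profile::GeneralPurpose => [(MB, 64 * KB), (100 * MB, 256 * KB), (i64::MAX, 1024 * KB)],
            Profile::AiTraining => [(10 * MB, 512 * KB), (500 * MB, 2048 * KB), (i64::MAX, 4096 * KB)],
            Profile::DataAnalytics => [(5 * MB, 128 * KB), (200 * MB, 512 * KB), (i64::MAX, 2048 * KB)],
            Profile::WebWorkload => [(512 * 1024, 32 * KB), (10 * MB, 128 * KB), (i64::MAX, 256 * KB)],
            Profile::IndustrialIoT => [(MB, 64 * KB), (50 * MB, 256 * KB), (i64::MAX, 512 * KB)],
            Profile::SecureStorage => [(MB, 32 * KB), (50 * MB, 128 * KB), (i64::MAX, 256 * KB)],
        }
    }

    /// Buffer size for a file of known size; None for a negative (unknown) size.
    pub fn buffer_size_for(self, file_size: i64) -> Option<usize> {
        if file_size < 0 {
            return None;
        }
        let tiers = self.tiers();
        let size = tiers
            .iter()
            .find(|(limit, _)| file_size < *limit)
            .map_or(tiers[2].1, |&(_, buf)| buf);
        Some(size)
    }
}
```

Because each bound is exclusive, a file of exactly 1MB under GeneralPurpose falls into the medium tier (256KB), which matches the "1MB-100MB" range stated above.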

Usage

Using Default Configuration

The system automatically uses the GeneralPurpose profile by default:

// The buffer size is calculated automatically from the file size,
// using the GeneralPurpose profile by default.
// Note: this is the legacy entry point, deprecated as of Phase 4.
let buffer_size = get_adaptive_buffer_size(file_size);

Using Specific Workload Profile

use rustfs::config::workload_profiles::WorkloadProfile;

// For AI/ML workloads
let buffer_size = get_adaptive_buffer_size_with_profile(
    file_size,
    Some(WorkloadProfile::AiTraining)
);

// For web workloads
let buffer_size = get_adaptive_buffer_size_with_profile(
    file_size,
    Some(WorkloadProfile::WebWorkload)
);

// For secure storage
let buffer_size = get_adaptive_buffer_size_with_profile(
    file_size,
    Some(WorkloadProfile::SecureStorage)
);

Auto-Detection Mode

The system can automatically detect the runtime environment:

// Auto-detects OS environment or falls back to GeneralPurpose
let buffer_size = get_adaptive_buffer_size_with_profile(file_size, None);

Custom Configuration

For specialized requirements, create a custom configuration:

use rustfs::config::workload_profiles::{BufferConfig, WorkloadProfile};

let custom_config = BufferConfig {
    min_size: 16 * 1024,        // 16KB minimum
    max_size: 512 * 1024,       // 512KB maximum
    default_unknown: 128 * 1024, // 128KB for unknown sizes
    thresholds: vec![
        (1024 * 1024, 64 * 1024),       // < 1MB: 64KB
        (50 * 1024 * 1024, 256 * 1024), // 1MB-50MB: 256KB
        (i64::MAX, 512 * 1024),         // >= 50MB: 512KB
    ],
};

let profile = WorkloadProfile::Custom(custom_config);
let buffer_size = get_adaptive_buffer_size_with_profile(file_size, Some(profile));

Phase 3: Default Enablement

⚡ NEW: Workload profiles are now enabled by default!

Starting with Phase 3, adaptive buffer sizing with workload profiles is enabled by default using the GeneralPurpose profile. This improves performance out of the box while maintaining full backward compatibility.

Default Behavior

# Phase 3: Profile-aware buffer sizing enabled by default with GeneralPurpose profile
./rustfs /data

This now automatically uses intelligent buffer sizing based on file size and workload characteristics.

Changing the Workload Profile

# Use a different profile (AI/ML workloads)
export RUSTFS_BUFFER_PROFILE=AiTraining
./rustfs /data

# Or via command-line
./rustfs --buffer-profile AiTraining /data

# Use web workload profile
./rustfs --buffer-profile WebWorkload /data

Opt-Out (Legacy Behavior)

If you need the exact behavior from PR #869 (fixed algorithm), you can disable profiling:

# Disable buffer profiling (revert to PR #869 behavior)
export RUSTFS_BUFFER_PROFILE_DISABLE=true
./rustfs /data

# Or via command-line
./rustfs --buffer-profile-disable /data

Available Profile Names

The following profile names are supported (case-insensitive):

Profile Name	Aliases	Description
`GeneralPurpose`	`general`	Default balanced configuration (same as PR #869 for most files)
`AiTraining`	`ai`	Optimized for AI/ML workloads
`DataAnalytics`	`analytics`	Mixed read-write patterns
`WebWorkload`	`web`	Small file intensive operations
`IndustrialIoT`	`iot`	Real-time streaming
`SecureStorage`	`secure`	Security-first, memory constrained
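
Case-insensitive matching of the names and aliases in the table above could look like the sketch below. It is an illustration only: `Profile` and `parse_profile_name` are hypothetical names, and returning `None` for unrecognized input is an assumption, since this document does not say how RustFS treats unknown profile names.

```rust
/// Hypothetical stand-in for the built-in workload profiles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    GeneralPurpose,
    AiTraining,
    DataAnalytics,
    WebWorkload,
    IndustrialIoT,
    SecureStorage,
}

/// Parses a profile name or alias, ignoring ASCII case and surrounding whitespace.
/// ASSUMPTION: unknown names yield None; the real fallback is not documented here.
pub fn parse_profile_name(name: &str) -> Option<Profile> {
    match name.trim().to_ascii_lowercase().as_str() {
        "generalpurpose" | "general" => Some(Profile::GeneralPurpose),
        "aitraining" | "ai" => Some(Profile::AiTraining),
        "dataanalytics" | "analytics" => Some(Profile::DataAnalytics),
        "webworkload" | "web" => Some(Profile::WebWorkload),
        "industrialiot" | "iot" => Some(Profile::IndustrialIoT),
        "securestorage" | "secure" => Some(Profile::SecureStorage),
        _ => None,
    }
}
```

Lowercasing the input once keeps the match arms short and gives the same result for `AiTraining`, `aitraining`, and `AI`.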

Behavior Summary

Phase 3 Default (Enabled):

Uses workload-aware buffer sizing with GeneralPurpose profile
Provides same buffer sizes as PR #869 for most scenarios
Allows easy switching to specialized profiles
Buffer sizes: 64KB, 256KB, 1MB based on file size (GeneralPurpose)

With RUSTFS_BUFFER_PROFILE_DISABLE=true:

Uses the exact original adaptive buffer sizing from PR #869
For users who want guaranteed legacy behavior
Buffer sizes: 64KB, 256KB, 1MB based on file size

With Different Profiles:

AiTraining: 512KB, 2MB, 4MB - maximize throughput
WebWorkload: 32KB, 128KB, 256KB - optimize concurrency
SecureStorage: 32KB, 128KB, 256KB - compliance-focused
And more...

Migration Examples

Phase 2 → Phase 3 Migration:

# Phase 2 (Opt-In): Had to explicitly enable
export RUSTFS_BUFFER_PROFILE_ENABLE=true
export RUSTFS_BUFFER_PROFILE=GeneralPurpose
./rustfs /data

# Phase 3 (Default): Enabled automatically
./rustfs /data  # ← Same behavior, no configuration needed!

Using Different Profiles:

# AI/ML workloads - larger buffers for maximum throughput
export RUSTFS_BUFFER_PROFILE=AiTraining
./rustfs /data

# Web workloads - smaller buffers for high concurrency
export RUSTFS_BUFFER_PROFILE=WebWorkload
./rustfs /data

# Secure environments - compliance-focused
export RUSTFS_BUFFER_PROFILE=SecureStorage
./rustfs /data

Reverting to Legacy Behavior:

# If you encounter issues or need exact PR #869 behavior
export RUSTFS_BUFFER_PROFILE_DISABLE=true
./rustfs /data

Phase 4: Full Integration (Current Implementation)

🚀 NEW: Profile-only implementation with performance metrics!

Phase 4 represents the final stage of the adaptive buffer sizing system, providing a unified, profile-based approach with optional performance monitoring.

Key Features

1. Deprecated Legacy Function
   - get_adaptive_buffer_size() is now deprecated
   - Maintained for backward compatibility only
   - All new code uses the workload profile system
2. Profile-Only Implementation
   - Single entry point: get_buffer_size_opt_in()
   - All buffer sizes come from workload profiles
   - Even "disabled" mode uses GeneralPurpose profile (no hardcoded values)
3. Performance Metrics (Optional)
   - Built-in metrics collection with metrics feature flag
   - Tracks buffer size selections
   - Monitors buffer-to-file size ratios
   - Helps optimize profile configurations

Unified Buffer Sizing

// Phase 4: Single, unified implementation
fn get_buffer_size_opt_in(file_size: i64) -> usize {
    // Enabled by default (Phase 3)
    // Uses workload profiles exclusively
    // Optional metrics collection
    unimplemented!("body elided in this overview")
}
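
The profile-only behavior described above, including the rule that "disabled" mode still resolves to GeneralPurpose, can be sketched as follows. This is not the RustFS source: `BufferSettings`, `buffer_size_opt_in`, and `profile_buffer_size` are hypothetical names, the settings are passed explicitly instead of being read from global state, and the buffer returned for unknown sizes is a labeled placeholder because the real per-profile default is not given in this document.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    GeneralPurpose,
    AiTraining,
}

/// Hypothetical stand-in for the global buffer configuration.
#[derive(Debug, Clone, Copy)]
pub struct BufferSettings {
    pub enabled: bool,
    pub profile: Profile,
}

const KB: usize = 1024;
const MB: i64 = 1024 * 1024;

/// Tier lookup using the documented GeneralPurpose and AiTraining tables.
pub fn profile_buffer_size(profile: Profile, file_size: i64) -> usize {
    // ASSUMPTION: the second tuple element is a placeholder for `default_unknown`.
    let (tiers, assumed_unknown) = match profile {
        Profile::GeneralPurpose => (
            [(MB, 64 * KB), (100 * MB, 256 * KB), (i64::MAX, 1024 * KB)],
            256 * KB,
        ),
        Profile::AiTraining => (
            [(10 * MB, 512 * KB), (500 * MB, 2048 * KB), (i64::MAX, 4096 * KB)],
            2048 * KB,
        ),
    };
    if file_size < 0 {
        return assumed_unknown;
    }
    tiers
        .iter()
        .find(|(limit, _)| file_size < *limit)
        .map_or(tiers[2].1, |&(_, buf)| buf)
}

/// Every path goes through a profile; disabling only forces GeneralPurpose.
pub fn buffer_size_opt_in(settings: &BufferSettings, file_size: i64) -> usize {
    let profile = if settings.enabled {
        settings.profile
    } else {
        Profile::GeneralPurpose
    };
    profile_buffer_size(profile, file_size)
}
```

With this shape there is no separate hardcoded fallback table: the disabled branch selects a profile instead of bypassing the profile lookup.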

Performance Monitoring

When compiled with the metrics feature flag:

# Build with metrics support
cargo build --features metrics

# Run and collect metrics
./rustfs /data

# Metrics collected:
# - buffer_size_bytes: Histogram of selected buffer sizes
# - buffer_size_selections: Counter of buffer size calculations
# - buffer_to_file_ratio: Ratio of buffer size to file size
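
The three metrics listed above could be derived from each buffer selection roughly as shown below. This is an in-memory sketch for illustration and does not use whatever metrics backend RustFS integrates with; `BufferMetrics` and `record` are hypothetical names, and skipping the ratio for unknown or empty files is an assumption.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for the three documented metrics.
#[derive(Debug, Default)]
pub struct BufferMetrics {
    /// buffer_size_selections: how many times a buffer size was calculated.
    pub selections: u64,
    /// buffer_size_bytes: count of selections per exact buffer size.
    pub size_histogram: BTreeMap<usize, u64>,
    /// buffer_to_file_ratio: one sample per selection with a known, non-zero size.
    pub ratios: Vec<f64>,
}

impl BufferMetrics {
    /// Records one buffer selection for a file of `file_size` bytes (-1 if unknown).
    pub fn record(&mut self, file_size: i64, buffer_size: usize) {
        self.selections += 1;
        *self.size_histogram.entry(buffer_size).or_insert(0) += 1;
        // ASSUMPTION: the ratio is undefined for unknown or empty files, so skip it.
        if file_size > 0 {
            self.ratios.push(buffer_size as f64 / file_size as f64);
        }
    }
}
```

A ratio close to or above 1.0 means the buffer is as large as the file itself, which is one signal that a profile's small-file tier may be oversized for the workload.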

Migration from Phase 3

No action required! Phase 4 is fully backward compatible with Phase 3:

# Phase 3 usage continues to work
./rustfs /data
export RUSTFS_BUFFER_PROFILE=AiTraining
./rustfs /data

# Phase 4 adds deprecation warnings for direct legacy function calls
# (if you have custom code calling get_adaptive_buffer_size)

What Changed

Aspect	Phase 3	Phase 4
Legacy Function	Active	Deprecated (still works)
Implementation	Hybrid (legacy fallback)	Profile-only
Metrics	None	Optional via feature flag
Buffer Source	Profiles or hardcoded	Profiles only

Benefits

1. Simplified Codebase
   - Single implementation path
   - Easier to maintain and optimize
   - Consistent behavior across all scenarios
2. Better Observability
   - Optional metrics for performance monitoring
   - Data-driven profile optimization
   - Production usage insights
3. Future-Proof
   - No legacy code dependencies
   - Easy to add new profiles
   - Extensible for future enhancements

Code Example

Phase 3 (Still Works):

// Enabled by default
let buffer_size = get_buffer_size_opt_in(file_size);

Phase 4 (Recommended):

// Same call, but now with optional metrics and profile-only implementation
let buffer_size = get_buffer_size_opt_in(file_size);
// Metrics automatically collected if feature enabled

Deprecated (Backward Compatible):

// This still works but generates deprecation warnings
#[allow(deprecated)]
let buffer_size = get_adaptive_buffer_size(file_size);

Enabling Metrics

Add to Cargo.toml:

[dependencies]
rustfs = { version = "*", features = ["metrics"] }

Or build with feature flag:

cargo build --features metrics --release

Metrics Dashboard

When metrics are enabled, you can visualize:

Buffer Size Distribution: Most common buffer sizes used
Profile Effectiveness: How well profiles match actual workloads
Memory Efficiency: Buffer-to-file size ratios
Usage Patterns: File size distribution and buffer selection trends

Use your preferred metrics backend (Prometheus, InfluxDB, etc.) to collect and visualize these metrics.

Phase 2: Opt-In Usage (Previous Implementation)

Note: Phase 2 documentation is kept for historical reference. The current version uses Phase 4 (Full Integration).

Starting from Phase 2 of the migration path, workload profiles can be enabled via environment variables or command-line arguments.

Environment Variables

Enable workload profiling using these environment variables:

# Enable buffer profiling (opt-in)
export RUSTFS_BUFFER_PROFILE_ENABLE=true

# Set the workload profile
export RUSTFS_BUFFER_PROFILE=AiTraining

# Start RustFS
./rustfs /data

Command-Line Arguments

Alternatively, use command-line flags:

# Enable buffer profiling with AI training profile
./rustfs --buffer-profile-enable --buffer-profile AiTraining /data

# Enable buffer profiling with web workload profile
./rustfs --buffer-profile-enable --buffer-profile WebWorkload /data

# Disable buffer profiling (use legacy behavior)
./rustfs /data

Behavior

When RUSTFS_BUFFER_PROFILE_ENABLE=false (default in Phase 2):

Uses the original adaptive buffer sizing from PR #869
No breaking changes to existing deployments
Buffer sizes: 64KB, 256KB, 1MB based on file size

When RUSTFS_BUFFER_PROFILE_ENABLE=true:

Uses the configured workload profile
Allows for workload-specific optimizations
Buffer sizes vary based on the selected profile

Configuration Validation

All buffer configurations are validated to ensure correctness:

let config = BufferConfig { /* ... */ };
config.validate()?; // Returns Err if invalid

Validation Rules:

min_size must be > 0
max_size must be >= min_size
default_unknown must be between min_size and max_size
Thresholds must be in ascending order
Buffer sizes in thresholds must be within [min_size, max_size]
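
The five validation rules above translate fairly directly into code. The sketch below mirrors the field names used in the custom configuration example, but the `String` error type and the choice to require strictly ascending thresholds are assumptions, not the actual RustFS implementation.

```rust
#[derive(Debug, Clone)]
pub struct BufferConfig {
    pub min_size: usize,
    pub max_size: usize,
    pub default_unknown: usize,
    /// (exclusive upper bound on file size, buffer size), both in bytes.
    pub thresholds: Vec<(i64, usize)>,
}

impl BufferConfig {
    /// Checks the five documented rules in order and reports the first violation.
    pub fn validate(&self) -> Result<(), String> {
        if self.min_size == 0 {
            return Err("min_size must be > 0".into());
        }
        if self.max_size < self.min_size {
            return Err("max_size must be >= min_size".into());
        }
        if self.default_unknown < self.min_size || self.default_unknown > self.max_size {
            return Err("default_unknown must be between min_size and max_size".into());
        }
        // ASSUMPTION: "ascending" is read as strictly ascending (no duplicate bounds).
        if self.thresholds.windows(2).any(|pair| pair[0].0 >= pair[1].0) {
            return Err("thresholds must be in ascending order".into());
        }
        if self
            .thresholds
            .iter()
            .any(|&(_, buf)| buf < self.min_size || buf > self.max_size)
        {
            return Err("threshold buffer sizes must be within [min_size, max_size]".into());
        }
        Ok(())
    }
}
```

Checking the size bounds before the thresholds means the last rule can rely on `min_size <= max_size` already holding.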

Environment Detection

The system automatically detects special operating system environments by reading /etc/os-release on Linux systems:

if let Some(profile) = WorkloadProfile::detect_os_environment() {
    // Returns SecureStorage profile for Kylin, NeoKylin, UOS, etc.
    let buffer_size = profile.config().calculate_buffer_size(file_size);
}

Detected Environments:

Kylin (麒麟)
NeoKylin (中标麒麟)
UOS / Unity OS (统信)
OpenKylin (开放麒麟)
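
Detection based on /etc/os-release could work along the lines of the sketch below. The matching rule is an assumption made for illustration (a case-insensitive substring check on the `ID` and `NAME` fields); the function names are hypothetical, and the real `detect_os_environment()` may inspect different fields or use stricter matching.

```rust
use std::fs;

/// Substrings that identify the listed distributions.
/// "kylin" covers Kylin, NeoKylin, and OpenKylin; "uos" covers UOS.
/// ASSUMPTION: substring matching is a simplification and can over-match.
const SECURE_OS_MARKERS: [&str; 2] = ["kylin", "uos"];

/// Returns true if the os-release text names one of the secure distributions.
pub fn looks_like_secure_os(os_release: &str) -> bool {
    os_release
        .lines()
        .filter_map(|line| line.split_once('='))
        .filter(|(key, _)| matches!(*key, "ID" | "NAME"))
        .map(|(_, value)| value.trim().trim_matches('"').to_ascii_lowercase())
        .any(|value| SECURE_OS_MARKERS.iter().any(|marker| value.contains(marker)))
}

/// Reads /etc/os-release; a missing or unreadable file counts as "not detected".
pub fn running_on_secure_os() -> bool {
    fs::read_to_string("/etc/os-release")
        .map(|text| looks_like_secure_os(&text))
        .unwrap_or(false)
}
```

Keeping the parsing separate from the file read makes the matching rule testable on any platform, including ones without /etc/os-release.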

Performance Considerations

Memory Usage

Different profiles have different memory footprints:

Profile	Min Buffer	Max Buffer	Typical Memory
GeneralPurpose	64KB	1MB	Low-Medium
AiTraining	512KB	4MB	High
DataAnalytics	128KB	2MB	Medium
WebWorkload	32KB	256KB	Low
IndustrialIoT	64KB	512KB	Low
SecureStorage	32KB	256KB	Low

Throughput Impact

Larger buffers generally provide better throughput for large files by reducing system call overhead:

Small buffers (32-64KB): Lower memory, more syscalls, suitable for many small files
Medium buffers (128-512KB): Balanced approach for mixed workloads
Large buffers (1-4MB): Maximum throughput, best for large sequential reads

Concurrency Considerations

For high-concurrency scenarios (e.g., WebWorkload):

Smaller buffers reduce per-connection memory
Allows more concurrent connections
Better overall system resource utilization
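
The trade-off between throughput and concurrency can be made concrete with some arithmetic. The helpers below are an illustrative sketch with hypothetical names; they assume one buffer per connection and treat each buffer fill as a single read, which is an approximation of real system call behavior.

```rust
/// Number of buffer-sized reads needed to consume `file_size` bytes.
/// Panics if `buffer_size` is zero. Requires Rust 1.73+ for `div_ceil`.
pub fn reads_needed(file_size: u64, buffer_size: u64) -> u64 {
    file_size.div_ceil(buffer_size)
}

/// Aggregate buffer memory, assuming each connection holds exactly one buffer.
pub fn total_buffer_memory(connections: u64, buffer_size: u64) -> u64 {
    connections * buffer_size
}
```

For a 1 GiB file, a 64KB buffer needs 16,384 reads while a 4MB buffer needs 256. In the other direction, 10,000 concurrent connections hold about 312.5 MiB of buffers at 32KB each, but roughly 39 GiB at 4MB each, which is why the WebWorkload profile keeps its buffers small.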

Best Practices

1. Choose the Right Profile

Select the profile that matches your primary workload:

// AI/ML training
WorkloadProfile::AiTraining

// Web application
WorkloadProfile::WebWorkload

// General purpose storage
WorkloadProfile::GeneralPurpose

2. Monitor Memory Usage

In production, monitor memory consumption:

// For memory-constrained environments, use smaller buffers
WorkloadProfile::SecureStorage  // or IndustrialIoT

3. Test Performance

Benchmark your specific workload to verify the profile choice:

# Run performance tests with different profiles
cargo test --release -- --ignored performance_tests

4. Consider File Size Distribution

If you know your typical file sizes:

Mostly small files (< 1MB): Use WebWorkload or SecureStorage
Mostly large files (> 100MB): Use AiTraining or DataAnalytics
Mixed sizes: Use GeneralPurpose

5. Compliance Requirements

For regulated environments:

// Automatically uses SecureStorage on detected secure OS
let config = RustFSBufferConfig::with_auto_detect();

// Or explicitly set SecureStorage
let config = RustFSBufferConfig::new(WorkloadProfile::SecureStorage);

Integration Examples

S3 Put Object

async fn put_object(&self, req: S3Request<PutObjectInput>) -> S3Result<S3Response<PutObjectOutput>> {
    // A missing content length is passed on as -1 (unknown size)
    let size = req.input.content_length.unwrap_or(-1);

    // Use workload-aware buffer sizing
    let buffer_size = get_adaptive_buffer_size_with_profile(
        size,
        Some(WorkloadProfile::GeneralPurpose)
    );

    // `body` is the request body stream (its extraction from `req` is omitted here)
    let body = tokio::io::BufReader::with_capacity(
        buffer_size,
        StreamReader::new(body)
    );

    // Process upload...
}

Multipart Upload

async fn upload_part(&self, req: S3Request<UploadPartInput>) -> S3Result<S3Response<UploadPartOutput>> {
    // A missing content length is passed on as -1 (unknown size)
    let size = req.input.content_length.unwrap_or(-1);

    // For large multipart uploads, consider using AiTraining profile
    let buffer_size = get_adaptive_buffer_size_with_profile(
        size,
        Some(WorkloadProfile::AiTraining)
    );

    // `body_stream` is the request body stream (its extraction from `req` is omitted here)
    let body = tokio::io::BufReader::with_capacity(
        buffer_size,
        StreamReader::new(body_stream)
    );

    // Process part upload...
}

Troubleshooting

High Memory Usage

If experiencing high memory usage:

Switch to a more conservative profile:

WorkloadProfile::WebWorkload  // or SecureStorage

Set explicit memory limits in custom configuration:

let config = BufferConfig {
    min_size: 16 * 1024,
    max_size: 128 * 1024,  // Cap at 128KB
    // ...
};

Low Throughput

If experiencing low throughput for large files:

Use a more aggressive profile:

WorkloadProfile::AiTraining  // or DataAnalytics

Increase buffer sizes in custom configuration:

let config = BufferConfig {
    max_size: 4 * 1024 * 1024,  // 4MB max buffer
    // ...
};

Streaming/Unknown Size Handling

For chunked transfers or streaming:

// Pass -1 for unknown size
let buffer_size = get_adaptive_buffer_size_with_profile(-1, None);
// Returns the profile's default_unknown size

Technical Implementation

Algorithm

The buffer size is selected based on file size thresholds:

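pub fn calculate_buffer_size(&self, file_size: i64) -> usize {
    // A negative size means the length is unknown (e.g. streaming uploads)
    if file_size < 0 {
        return self.default_unknown;
    }

    // Thresholds are in ascending order: the first bound above file_size wins
    for (threshold, buffer_size) in &self.thresholds {
        if file_size < *threshold {
            return (*buffer_size).clamp(self.min_size, self.max_size);
        }
    }

    // No threshold matched: fall back to the largest allowed buffer
    self.max_size
}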

Thread Safety

All configuration structures are:

Immutable after creation
Safe to share across threads
Cloneable for per-thread customization
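
Because a configuration is never mutated after creation, it can be shared across threads without locking. The sketch below illustrates this with `Arc`; the struct repeats the fields from the custom configuration example and the algorithm shown above, while `sizes_from_threads` is a hypothetical helper written for this illustration.

```rust
use std::sync::Arc;
use std::thread;

#[derive(Debug, Clone)]
pub struct BufferConfig {
    pub min_size: usize,
    pub max_size: usize,
    pub default_unknown: usize,
    pub thresholds: Vec<(i64, usize)>,
}

impl BufferConfig {
    /// Same selection logic as in the Algorithm section.
    pub fn calculate_buffer_size(&self, file_size: i64) -> usize {
        if file_size < 0 {
            return self.default_unknown;
        }
        for (threshold, buffer_size) in &self.thresholds {
            if file_size < *threshold {
                return (*buffer_size).clamp(self.min_size, self.max_size);
            }
        }
        self.max_size
    }
}

/// Computes one buffer size per file size, each on its own thread,
/// with all threads reading the same immutable configuration.
pub fn sizes_from_threads(config: BufferConfig, file_sizes: Vec<i64>) -> Vec<usize> {
    let shared = Arc::new(config);
    let handles: Vec<_> = file_sizes
        .into_iter()
        .map(|size| {
            let cfg = Arc::clone(&shared);
            thread::spawn(move || cfg.calculate_buffer_size(size))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

Cloning the `Arc` only bumps a reference count; a thread that needs its own variant can call `clone()` on the underlying `BufferConfig` and modify the copy.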

Performance Overhead

Configuration lookup: O(n) where n = number of thresholds (typically 2-4)
Negligible overhead compared to I/O operations
Configuration can be cached per-connection

Migration Guide

From PR #869

The original get_adaptive_buffer_size function is preserved for backward compatibility, but it is deprecated as of Phase 4:

// Old code (still works, but generates a deprecation warning as of Phase 4)
let buffer_size = get_adaptive_buffer_size(file_size);

// New code (recommended)
let buffer_size = get_adaptive_buffer_size_with_profile(
    file_size,
    Some(WorkloadProfile::GeneralPurpose)
);

Upgrading Existing Code

Identify workload type for each use case
Replace get_adaptive_buffer_size with get_adaptive_buffer_size_with_profile
Choose appropriate profile
Test performance impact

License

Licensed under the Apache License, Version 2.0.
