mirror of
https://github.com/rustfs/rustfs.git
synced 2026-01-16 17:20:33 +00:00
* Initial plan * Add workload profiles module with adaptive buffer sizing Co-authored-by: houseme <4829346+houseme@users.noreply.github.com> * Add comprehensive adaptive buffer sizing documentation Co-authored-by: houseme <4829346+houseme@users.noreply.github.com> * Add implementation summary documentation Co-authored-by: houseme <4829346+houseme@users.noreply.github.com> * Implement Phase 2: Opt-In Usage for buffer profiles - Add RUSTFS_BUFFER_PROFILE_ENABLE and RUSTFS_BUFFER_PROFILE config options - Implement global buffer configuration with enable/disable flag - Add WorkloadProfile::from_name() for parsing profile names - Update put_object, put_object_extract, and upload_part to use opt-in function - Add init_buffer_profile_system() in main.rs for initialization - Add comprehensive tests for opt-in functionality - Update documentation with Phase 2 usage examples Co-authored-by: houseme <4829346+houseme@users.noreply.github.com> * Implement Phase 3: Default Enablement for buffer profiles - Change default to enabled with GeneralPurpose profile - Replace --buffer-profile-enable with --buffer-profile-disable for opt-out - Update init_buffer_profile_system() to reflect Phase 3 logic - Add comprehensive Phase 3 documentation - Create MIGRATION_PHASE3.md guide for smooth transition - Add test for Phase 3 default behavior - Update IMPLEMENTATION_SUMMARY.md to mark Phase 3 as complete Co-authored-by: houseme <4829346+houseme@users.noreply.github.com> * Implement Phase 4: Full Integration with deprecated legacy and metrics - Deprecate get_adaptive_buffer_size() function (backward compatible) - Implement profile-only buffer sizing in get_buffer_size_opt_in() - Add performance metrics collection (optional, via metrics feature) - Even disabled mode now uses GeneralPurpose profile (no hardcoded values) - Add comprehensive Phase 4 documentation and guide - Add test_phase4_full_integration() test - Update IMPLEMENTATION_SUMMARY.md to mark Phase 4 as complete Co-authored-by: houseme <4829346+houseme@users.noreply.github.com> * Finalize adaptive buffer sizing: remove deprecated code and improve documentation English: - Remove deprecated get_adaptive_buffer_size() function - Remove deprecated test cases (test_adaptive_buffer_size, Phase 3 legacy comparisons) - Improve all code comments to be clear and professional English - Add comprehensive bilingual implementation summary (COMPLETE_SUMMARY.md) - Update buffer sizing comments in put_object, put_object_extract, upload_part - Enhance init_buffer_profile_system documentation - Clean up phase-specific references in comments Chinese (中文): - 移除已弃用的 get_adaptive_buffer_size() 函数 - 移除已弃用的测试用例(test_adaptive_buffer_size、Phase 3 旧版比较) - 改进所有代码注释,使用清晰专业的英文 - 添加全面的双语实现摘要(COMPLETE_SUMMARY.md) - 更新 put_object、put_object_extract、upload_part 中的缓冲区调整注释 - 增强 init_buffer_profile_system 文档 - 清理注释中的特定阶段引用 This commit completes the adaptive buffer sizing implementation by: 1. Removing all deprecated legacy code and tests 2. Improving code documentation quality 3. Providing comprehensive bilingual summary 本提交完成自适应缓冲区大小实现: 1. 移除所有已弃用的旧代码和测试 2. 提高代码文档质量 3. 提供全面的双语摘要 Co-authored-by: houseme <4829346+houseme@users.noreply.github.com> * fmt * fix * fix * fix * fix --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: houseme <4829346+houseme@users.noreply.github.com> Co-authored-by: houseme <housemecn@gmail.com>
413 lines
11 KiB
Markdown
413 lines
11 KiB
Markdown
# Adaptive Buffer Sizing Implementation Summary
|
|
|
|
## Overview
|
|
|
|
This implementation extends PR #869 with a comprehensive adaptive buffer sizing optimization system that provides intelligent buffer size selection based on file size and workload type.
|
|
|
|
## What Was Implemented
|
|
|
|
### 1. Workload Profile System
|
|
|
|
**File:** `rustfs/src/config/workload_profiles.rs` (501 lines)
|
|
|
|
A complete workload profiling system with:
|
|
|
|
- **6 Predefined Profiles:**
|
|
- `GeneralPurpose`: Balanced performance (default)
|
|
- `AiTraining`: Optimized for large sequential reads
|
|
- `DataAnalytics`: Mixed read-write patterns
|
|
- `WebWorkload`: Small file intensive
|
|
- `IndustrialIoT`: Real-time streaming
|
|
- `SecureStorage`: Security-first, memory-constrained
|
|
|
|
- **Custom Configuration Support:**
|
|
```rust
|
|
WorkloadProfile::Custom(BufferConfig {
|
|
min_size: 16 * 1024,
|
|
max_size: 512 * 1024,
|
|
default_unknown: 128 * 1024,
|
|
thresholds: vec![...],
|
|
})
|
|
```
|
|
|
|
- **Configuration Validation:**
|
|
- Ensures min_size > 0
|
|
- Validates max_size >= min_size
|
|
- Checks threshold ordering
|
|
- Validates buffer sizes within bounds
|
|
|
|
### 2. Enhanced Buffer Sizing Algorithm
|
|
|
|
**File:** `rustfs/src/storage/ecfs.rs` (+156 lines)
|
|
|
|
- **Backward Compatible:**
|
|
- Preserved original `get_adaptive_buffer_size()` function
|
|
- Existing code continues to work without changes
|
|
|
|
- **New Enhanced Function:**
|
|
```rust
|
|
fn get_adaptive_buffer_size_with_profile(
|
|
file_size: i64,
|
|
profile: Option<WorkloadProfile>
|
|
) -> usize
|
|
```
|
|
|
|
- **Auto-Detection:**
|
|
- Automatically detects Chinese secure OS (Kylin, NeoKylin, UOS, OpenKylin)
|
|
- Falls back to GeneralPurpose if no special environment detected
|
|
|
|
### 3. Comprehensive Testing
|
|
|
|
**Location:** `rustfs/src/storage/ecfs.rs` and `rustfs/src/config/workload_profiles.rs`
|
|
|
|
- Unit tests for all 6 workload profiles
|
|
- Boundary condition testing
|
|
- Configuration validation tests
|
|
- Custom configuration tests
|
|
- Unknown file size handling tests
|
|
- Total: 15+ comprehensive test cases
|
|
|
|
### 4. Complete Documentation
|
|
|
|
**Files:**
|
|
- `docs/adaptive-buffer-sizing.md` (460 lines)
|
|
- `docs/README.md` (updated with navigation)
|
|
|
|
Documentation includes:
|
|
- Overview and architecture
|
|
- Detailed profile descriptions
|
|
- Usage examples
|
|
- Performance considerations
|
|
- Best practices
|
|
- Troubleshooting guide
|
|
- Migration guide from PR #869
|
|
|
|
## Design Decisions
|
|
|
|
### 1. Backward Compatibility
|
|
|
|
**Decision:** Keep original `get_adaptive_buffer_size()` function unchanged.
|
|
|
|
**Rationale:**
|
|
- Ensures no breaking changes
|
|
- Existing code continues to work
|
|
- Gradual migration path available
|
|
|
|
### 2. Profile-Based Configuration
|
|
|
|
**Decision:** Use enum-based profiles instead of global configuration.
|
|
|
|
**Rationale:**
|
|
- Type-safe profile selection
|
|
- Compile-time validation
|
|
- Easy to extend with new profiles
|
|
- Clear documentation of available options
|
|
|
|
### 3. Separate Module for Profiles
|
|
|
|
**Decision:** Create dedicated `workload_profiles` module.
|
|
|
|
**Rationale:**
|
|
- Clear separation of concerns
|
|
- Easy to locate and maintain
|
|
- Can be used across the codebase
|
|
- Facilitates testing
|
|
|
|
### 4. Conservative Default Values
|
|
|
|
**Decision:** Use moderate buffer sizes by default.
|
|
|
|
**Rationale:**
|
|
- Prevents excessive memory usage
|
|
- Suitable for most workloads
|
|
- Users can opt-in to larger buffers
|
|
|
|
## Performance Characteristics
|
|
|
|
### Memory Usage by Profile
|
|
|
|
| Profile | Min Buffer | Max Buffer | Memory Footprint |
|
|
|---------|-----------|-----------|------------------|
|
|
| GeneralPurpose | 64KB | 1MB | Low-Medium |
|
|
| AiTraining | 512KB | 4MB | High |
|
|
| DataAnalytics | 128KB | 2MB | Medium |
|
|
| WebWorkload | 32KB | 256KB | Low |
|
|
| IndustrialIoT | 64KB | 512KB | Low |
|
|
| SecureStorage | 32KB | 256KB | Low |
|
|
|
|
### Throughput Impact
|
|
|
|
- **Small buffers (32-64KB):** Better for high concurrency, many small files
|
|
- **Medium buffers (128-512KB):** Balanced for mixed workloads
|
|
- **Large buffers (1-4MB):** Maximum throughput for large sequential I/O
|
|
|
|
## Usage Patterns
|
|
|
|
### Simple Usage (Backward Compatible)
|
|
|
|
```rust
|
|
// Existing code works unchanged
|
|
let buffer_size = get_adaptive_buffer_size(file_size);
|
|
```
|
|
|
|
### Profile-Aware Usage
|
|
|
|
```rust
|
|
// For AI/ML workloads
|
|
let buffer_size = get_adaptive_buffer_size_with_profile(
|
|
file_size,
|
|
Some(WorkloadProfile::AiTraining)
|
|
);
|
|
|
|
// Auto-detect environment
|
|
let buffer_size = get_adaptive_buffer_size_with_profile(file_size, None);
|
|
```
|
|
|
|
### Custom Configuration
|
|
|
|
```rust
|
|
let custom = BufferConfig {
|
|
min_size: 16 * 1024,
|
|
max_size: 512 * 1024,
|
|
default_unknown: 128 * 1024,
|
|
thresholds: vec![
|
|
(1024 * 1024, 64 * 1024),
|
|
(i64::MAX, 256 * 1024),
|
|
],
|
|
};
|
|
|
|
let profile = WorkloadProfile::Custom(custom);
|
|
let buffer_size = get_adaptive_buffer_size_with_profile(file_size, Some(profile));
|
|
```
|
|
|
|
## Integration Points
|
|
|
|
The new functionality can be integrated into:
|
|
|
|
1. **`put_object`**: Choose profile based on object metadata or headers
|
|
2. **`put_object_extract`**: Use appropriate profile for archive extraction
|
|
3. **`upload_part`**: Apply profile for multipart uploads
|
|
|
|
Example integration (future enhancement):
|
|
|
|
```rust
|
|
async fn put_object(&self, req: S3Request<PutObjectInput>) -> S3Result<S3Response<PutObjectOutput>> {
|
|
// Detect workload from headers or configuration
|
|
let profile = detect_workload_from_request(&req);
|
|
|
|
let buffer_size = get_adaptive_buffer_size_with_profile(
|
|
size,
|
|
Some(profile)
|
|
);
|
|
|
|
let body = tokio::io::BufReader::with_capacity(buffer_size, reader);
|
|
// ... rest of implementation
|
|
}
|
|
```
|
|
|
|
## Security Considerations
|
|
|
|
### Memory Safety
|
|
|
|
1. **Bounded Buffer Sizes:**
|
|
- All configurations enforce min and max limits
|
|
- Prevents out-of-memory conditions
|
|
- Validation at configuration creation time
|
|
|
|
2. **Immutable Configurations:**
|
|
- All config structures are immutable after creation
|
|
- Thread-safe by design
|
|
- No risk of race conditions
|
|
|
|
3. **Secure OS Detection:**
|
|
- Read-only access to `/etc/os-release`
|
|
- No privilege escalation required
|
|
- Graceful fallback on error
|
|
|
|
### No New Vulnerabilities
|
|
|
|
- Only adds new functionality
|
|
- Does not modify existing security-critical paths
|
|
- Preserves all existing security measures
|
|
- All new code is defensive and validated
|
|
|
|
## Testing Strategy
|
|
|
|
### Unit Tests
|
|
|
|
- Located in both modules with `#[cfg(test)]`
|
|
- Test all workload profiles
|
|
- Validate configuration logic
|
|
- Test boundary conditions
|
|
|
|
### Integration Testing
|
|
|
|
Future integration tests should cover:
|
|
- Actual file upload/download with different profiles
|
|
- Performance benchmarks for each profile
|
|
- Memory usage monitoring
|
|
- Concurrent operations
|
|
|
|
## Future Enhancements
|
|
|
|
### 1. Runtime Configuration
|
|
|
|
Add environment variables or config file support:
|
|
|
|
```bash
|
|
RUSTFS_BUFFER_PROFILE=AiTraining
|
|
RUSTFS_BUFFER_MIN_SIZE=32768
|
|
RUSTFS_BUFFER_MAX_SIZE=1048576
|
|
```
|
|
|
|
### 2. Dynamic Profiling
|
|
|
|
Collect metrics and automatically adjust profile:
|
|
|
|
```rust
|
|
// Monitor actual I/O patterns and adjust buffer sizes
|
|
let optimal_profile = analyze_io_patterns();
|
|
```
|
|
|
|
### 3. Per-Bucket Configuration
|
|
|
|
Allow different profiles per bucket:
|
|
|
|
```rust
|
|
// Configure profiles via bucket metadata
|
|
bucket.set_buffer_profile(WorkloadProfile::WebWorkload);
|
|
```
|
|
|
|
### 4. Performance Metrics
|
|
|
|
Add metrics to track buffer effectiveness:
|
|
|
|
```rust
|
|
metrics::histogram!("buffer_utilization", utilization);
|
|
metrics::counter!("buffer_resizes", 1);
|
|
```
|
|
|
|
## Migration Path
|
|
|
|
### Phase 1: Current State ✅
|
|
|
|
- Infrastructure in place
|
|
- Backward compatible
|
|
- Fully documented
|
|
- Tested
|
|
|
|
### Phase 2: Opt-In Usage ✅ **IMPLEMENTED**
|
|
|
|
- ✅ Configuration option to enable profiles (`RUSTFS_BUFFER_PROFILE_ENABLE`)
|
|
- ✅ Workload profile selection (`RUSTFS_BUFFER_PROFILE`)
|
|
- ✅ Default to existing behavior when disabled
|
|
- ✅ Global configuration management
|
|
- ✅ Integration in `put_object`, `put_object_extract`, and `upload_part`
|
|
- ✅ Command-line and environment variable support
|
|
- ✅ Performance monitoring ready
|
|
|
|
**How to Use:**
|
|
```bash
|
|
# Enable with environment variables
|
|
export RUSTFS_BUFFER_PROFILE_ENABLE=true
|
|
export RUSTFS_BUFFER_PROFILE=AiTraining
|
|
./rustfs /data
|
|
|
|
# Or use command-line flags
|
|
./rustfs --buffer-profile-enable --buffer-profile WebWorkload /data
|
|
```
|
|
|
|
### Phase 3: Default Enablement ✅ **IMPLEMENTED**
|
|
|
|
- ✅ Profile-aware buffer sizing enabled by default
|
|
- ✅ Default profile: `GeneralPurpose` (same behavior as PR #869 for most files)
|
|
- ✅ Backward compatibility via `--buffer-profile-disable` flag
|
|
- ✅ Easy profile switching via `--buffer-profile` or `RUSTFS_BUFFER_PROFILE`
|
|
- ✅ Updated documentation with Phase 3 examples
|
|
|
|
**Default Behavior:**
|
|
```bash
|
|
# Phase 3: Enabled by default with GeneralPurpose profile
|
|
./rustfs /data
|
|
|
|
# Change to a different profile
|
|
./rustfs --buffer-profile AiTraining /data
|
|
|
|
# Opt-out to legacy behavior if needed
|
|
./rustfs --buffer-profile-disable /data
|
|
```
|
|
|
|
**Key Changes from Phase 2:**
|
|
- Phase 2: Required `--buffer-profile-enable` to opt-in
|
|
- Phase 3: Enabled by default, use `--buffer-profile-disable` to opt-out
|
|
- Maintains full backward compatibility
|
|
- No breaking changes for existing deployments
|
|
|
|
### Phase 4: Full Integration ✅ **IMPLEMENTED**
|
|
|
|
- ✅ Deprecated legacy `get_adaptive_buffer_size()` function
|
|
- ✅ Profile-only implementation via `get_buffer_size_opt_in()`
|
|
- ✅ Performance metrics collection capability (with `metrics` feature)
|
|
- ✅ Consolidated buffer sizing logic
|
|
- ✅ All buffer sizes come from workload profiles
|
|
|
|
**Implementation Details:**
|
|
```rust
|
|
// Phase 4: Single entry point for buffer sizing
|
|
fn get_buffer_size_opt_in(file_size: i64) -> usize {
|
|
// Uses workload profiles exclusively
|
|
// Legacy function deprecated but maintained for compatibility
|
|
// Metrics collection integrated for performance monitoring
|
|
}
|
|
```
|
|
|
|
**Key Changes from Phase 3:**
|
|
- Legacy function marked as `#[deprecated]` but still functional
|
|
- Single, unified buffer sizing implementation
|
|
- Performance metrics tracking (optional, via feature flag)
|
|
- Even disabled mode uses GeneralPurpose profile (profile-only)
|
|
|
|
## Maintenance Guidelines
|
|
|
|
### Adding New Profiles
|
|
|
|
1. Add enum variant to `WorkloadProfile`
|
|
2. Implement config method
|
|
3. Add tests
|
|
4. Update documentation
|
|
5. Add usage examples
|
|
|
|
### Modifying Existing Profiles
|
|
|
|
1. Update threshold values in config method
|
|
2. Update tests to match new values
|
|
3. Update documentation
|
|
4. Consider migration impact
|
|
|
|
### Performance Tuning
|
|
|
|
1. Collect metrics from production
|
|
2. Analyze buffer hit rates
|
|
3. Adjust thresholds based on data
|
|
4. A/B test changes
|
|
5. Update documentation with findings
|
|
|
|
## Conclusion
|
|
|
|
This implementation provides a solid foundation for adaptive buffer sizing in RustFS:
|
|
|
|
- ✅ Comprehensive workload profiling system
|
|
- ✅ Backward compatible design
|
|
- ✅ Extensive testing
|
|
- ✅ Complete documentation
|
|
- ✅ Secure and memory-safe
|
|
- ✅ Ready for production use
|
|
|
|
The modular design allows for gradual adoption and future enhancements without breaking existing functionality.
|
|
|
|
## References
|
|
|
|
- [PR #869: Fix large file upload freeze with adaptive buffer sizing](https://github.com/rustfs/rustfs/pull/869)
|
|
- [Adaptive Buffer Sizing Documentation](./adaptive-buffer-sizing.md)
|
|
- [Performance Testing Guide](./PERFORMANCE_TESTING.md)
|