# RustFS Performance Testing Guide
This document describes the recommended tools and workflows for benchmarking RustFS and analyzing performance bottlenecks.
## Overview
RustFS exposes several complementary tooling options:
1. **Profiling**: collect CPU samples through the built-in `pprof` endpoints.
2. **Load testing**: drive concurrent requests with dedicated client utilities.
3. **Monitoring and analysis**: inspect collected metrics to locate hotspots.
## Prerequisites
### 1. Enable profiling support
Set the profiling environment variable before launching RustFS:
```bash
export RUSTFS_ENABLE_PROFILING=true
./rustfs
```
### 2. Install required tooling
Make sure the following dependencies are available:
```bash
# Base tools
curl # HTTP requests
jq # JSON processing (optional)
# Analysis tools
go # Go toolchain, provides `go tool pprof` (optional, needed to analyze protobuf output)
python3 # Python load-testing scripts
# macOS users
brew install curl jq go python3
# Ubuntu/Debian users
sudo apt-get install curl jq golang-go python3
```
## Performance Testing Methods
### Method 1: Use the dedicated profiling script (recommended)
The repository ships with a helper script for common profiling flows:
```bash
# Show command help
./scripts/profile_rustfs.sh help
# Check profiler status
./scripts/profile_rustfs.sh status
# Capture a 30 second flame graph
./scripts/profile_rustfs.sh flamegraph
# Download protobuf-formatted samples
./scripts/profile_rustfs.sh protobuf
# Collect both formats
./scripts/profile_rustfs.sh both
# Provide custom arguments
./scripts/profile_rustfs.sh -d 60 -u http://192.168.1.100:9000 both
```
### Method 2: Run the Python end-to-end tester
A Python utility combines background load generation with profiling:
```bash
# Launch the integrated test harness
python3 test_load.py
```
The script will:
1. Launch multi-threaded S3 operations as load.
2. Pull profiling samples in parallel.
3. Produce a flame graph for investigation.
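For orientation, the sketch below shows one way such a harness can be wired together. It is an illustrative outline rather than the contents of `test_load.py`; it assumes the `requests` package is installed and that `S3LoadTester` can be imported from `test_load` with the interface shown later in this guide.
```python
# Illustrative outline only -- the real logic lives in test_load.py.
import threading

import requests  # assumed to be installed; any HTTP client works
from test_load import S3LoadTester  # hypothetical import path

PROFILE_URL = "http://127.0.0.1:9000/rustfs/admin/debug/pprof/profile"


def capture_flamegraph(seconds: int = 30, out_path: str = "rustfs_profile.svg") -> None:
    """Pull a flame graph from the pprof endpoint while the load is running."""
    resp = requests.get(
        PROFILE_URL,
        params={"seconds": seconds, "format": "flamegraph"},
        timeout=seconds + 30,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)


# 1. Start multi-threaded S3 operations in the background.
tester = S3LoadTester(
    endpoint="http://127.0.0.1:9000",
    access_key="rustfsadmin",
    secret_key="rustfsadmin",
)
load = threading.Thread(
    target=tester.run_load_test,
    kwargs={"num_threads": 4, "operations_per_thread": 10},
)
load.start()

# 2. Sample the profiler while the workload is active.
capture_flamegraph(seconds=30)

# 3. Wait for the load to finish; the SVG is ready for investigation.
load.join()
```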
### Method 3: Simple shell-based load test
For quick smoke checks, a lightweight bash script is also provided:
```bash
# Execute a lightweight benchmark
./simple_load_test.sh
```
## Profiling Output Formats
### 1. Flame graph (SVG)
- **Purpose**: Visualize CPU time distribution.
- **File name**: `rustfs_profile_TIMESTAMP.svg`
- **How to view**: Open the SVG in a browser.
- **Interpretation tips**:
  - Width reflects CPU time per function.
  - Height illustrates call-stack depth.
  - Click to zoom into specific frames.
```bash
# Example: open the file in a browser
open profiles/rustfs_profile_20240911_143000.svg
```
### 2. Protobuf samples
- **Purpose**: Feed data to the `go tool pprof` command.
- **File name**: `rustfs_profile_TIMESTAMP.pb`
- **Tooling**: `go tool pprof`
```bash
# Analyze the protobuf output
go tool pprof profiles/rustfs_profile_20240911_143000.pb
# Common pprof commands
(pprof) top # Show hottest call sites
(pprof) list func # Display annotated source for a function
(pprof) web # Launch the web UI (requires graphviz)
(pprof) png # Export a call graph as PNG
(pprof) help # List available commands
```
## API Usage
### Check profiling status
```bash
curl "http://127.0.0.1:9000/rustfs/admin/debug/pprof/status"
```
Sample response:
```json
{
"enabled": "true",
"sampling_rate": "100"
}
```
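If you want to gate automated runs on this check, a minimal sketch using only the Python standard library might look like this (endpoint as shown above):
```python
# Minimal sketch: abort early if profiling is not enabled.
import json
import urllib.request

STATUS_URL = "http://127.0.0.1:9000/rustfs/admin/debug/pprof/status"

with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
    status = json.load(resp)

if status.get("enabled") != "true":
    raise SystemExit("Profiling is disabled; set RUSTFS_ENABLE_PROFILING=true and restart RustFS")

print(f"Profiling enabled, sampling rate: {status.get('sampling_rate')}")
```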
### Capture profiling data
```bash
# Fetch a 30-second flame graph
curl "http://127.0.0.1:9000/rustfs/admin/debug/pprof/profile?seconds=30&format=flamegraph" \
  -o profile.svg
# Fetch protobuf output
curl "http://127.0.0.1:9000/rustfs/admin/debug/pprof/profile?seconds=30&format=protobuf" \
  -o profile.pb
```
**Parameters**
- `seconds`: Duration between 1 and 300 seconds.
- `format`: Output format (`flamegraph`/`svg` or `protobuf`/`pb`).
## Load Testing Scenarios
### 1. S3 API workload
Use the Python harness to exercise a complete S3 workflow:
```python
# Basic configuration
tester = S3LoadTester(
    endpoint="http://127.0.0.1:9000",
    access_key="rustfsadmin",
    secret_key="rustfsadmin"
)

# Execute the load test: four threads, ten operations each
tester.run_load_test(num_threads=4, operations_per_thread=10)
```
Each iteration performs:
1. Upload a 1 MB object.
2. Download the object.
3. Delete the object.
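For reference, a single iteration can be sketched directly against `boto3` (an assumed client library; the actual harness may use something else):
```python
# Sketch of one load iteration: upload a 1 MB object, read it back, delete it.
# Assumes boto3 is installed and the target bucket already exists.
import os

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:9000",
    aws_access_key_id="rustfsadmin",
    aws_secret_access_key="rustfsadmin",
)

bucket, key = "load-test-bucket", "object-0"  # illustrative names
payload = os.urandom(1024 * 1024)  # 1 MB of random data

s3.put_object(Bucket=bucket, Key=key, Body=payload)            # 1. upload
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()    # 2. download
assert len(body) == len(payload)
s3.delete_object(Bucket=bucket, Key=key)                       # 3. delete
```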
### 2. Custom load scenarios
```bash
# Create a test bucket
curl -X PUT "http://127.0.0.1:9000/test-bucket"
# Concurrent uploads
for i in {1..10}; do
echo "test data $i" | curl -X PUT "http://127.0.0.1:9000/test-bucket/object-$i" -d @- &
done
wait
# Concurrent downloads
for i in {1..10}; do
curl "http://127.0.0.1:9000/test-bucket/object-$i" > /dev/null &
done
wait
```
## Profiling Best Practices
### 1. Environment preparation
- Confirm that `RUSTFS_ENABLE_PROFILING=true` is set.
- Use an isolated benchmark environment to avoid interference.
- Reserve disk space for generated profile artifacts.
### 2. Data collection tips
- **Warm-up**: Run a light workload for 5-10 minutes before sampling.
- **Sampling window**: Capture 30-60 seconds under steady load.
- **Multiple samples**: Take several runs to compare results.
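Applied together, these tips translate into something like the following sketch (standard library only; the warm-up length, window, and output paths are illustrative):
```python
# Sketch: warm-up delay followed by several back-to-back sampling windows.
import os
import time
import urllib.request

PROFILE_URL = (
    "http://127.0.0.1:9000/rustfs/admin/debug/pprof/profile"
    "?seconds=60&format=flamegraph"
)

os.makedirs("profiles", exist_ok=True)

time.sleep(5 * 60)  # warm-up: let the workload run for a few minutes first

for run in range(3):  # multiple samples to compare across runs
    with urllib.request.urlopen(PROFILE_URL, timeout=120) as resp:
        with open(f"profiles/warmup_run_{run}.svg", "wb") as f:
            f.write(resp.read())
```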
### 3. Analysis focus areas
When inspecting flame graphs, pay attention to:
1. **The widest frames**: most CPU time consumed.
2. **Flat plateaus**: likely bottlenecks.
3. **Deep call stacks**: recursion or complex logic.
4. **Unexpected syscalls**: I/O stalls or allocation churn.
### 4. Common issues
- **Lock contention**: Investigate frames under `std::sync`.
- **Memory allocation**: Search for `alloc`-related frames.
- **I/O wait**: Review filesystem or network call stacks.
- **Serialization overhead**: Look for JSON/XML parsing hotspots.
## Troubleshooting
### 1. Profiling disabled
Error: `{"enabled":"false"}`
**Fix**:
```bash
export RUSTFS_ENABLE_PROFILING=true
# Restart RustFS
```
### 2. Connection refused
Error: `Connection refused`
**Checklist**:
- Confirm RustFS is running.
- Ensure the port number is correct (default 9000).
- Verify firewall rules.
### 3. Oversized profile output
If artifacts become too large:
- Shorten the capture window (e.g., 15-30 seconds).
- Reduce load-test concurrency.
- Prefer protobuf output instead of SVG.
## Configuration Parameters
### Environment variables
| Variable | Default | Description |
|------|--------|------|
| `RUSTFS_ENABLE_PROFILING` | `false` | Enable profiling support |
| `RUSTFS_URL` | `http://127.0.0.1:9000` | RustFS endpoint |
| `PROFILE_DURATION` | `30` | Profiling duration in seconds |
| `OUTPUT_DIR` | `./profiles` | Output directory |
### Script arguments
```bash
./scripts/profile_rustfs.sh [OPTIONS] [COMMAND]

OPTIONS:
  -u, --url URL           RustFS URL
  -d, --duration SECONDS  Profile duration
  -o, --output DIR        Output directory

COMMANDS:
  status      Check profiler status
  flamegraph  Collect a flame graph
  protobuf    Collect protobuf samples
  both        Collect both formats (default)
```
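The environment variables and flags can also be combined from a small wrapper; the sketch below reads the variables listed above and passes them to `profile_rustfs.sh` explicitly (whether the script also reads these variables on its own is not assumed here):
```python
# Sketch: drive scripts/profile_rustfs.sh from the documented environment variables.
import os
import subprocess

url = os.environ.get("RUSTFS_URL", "http://127.0.0.1:9000")
duration = os.environ.get("PROFILE_DURATION", "30")
out_dir = os.environ.get("OUTPUT_DIR", "./profiles")

subprocess.run(
    ["./scripts/profile_rustfs.sh", "-u", url, "-d", duration, "-o", out_dir, "both"],
    check=True,  # raise if the script exits with a non-zero status
)
```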
## Output Locations
- **Script output**: `./profiles/`
- **Python script**: `/tmp/rustfs_profiles/`
- **File naming**: `rustfs_profile_TIMESTAMP.{svg|pb}`
## Example Workflow
1. **Launch RustFS**

   ```bash
   RUSTFS_ENABLE_PROFILING=true ./rustfs
   ```

2. **Verify profiling availability**

   ```bash
   ./scripts/profile_rustfs.sh status
   ```

3. **Start a load test**

   ```bash
   python3 test_load.py &
   ```

4. **Collect samples**

   ```bash
   ./scripts/profile_rustfs.sh -d 60 both
   ```

5. **Inspect the results**

   ```bash
   # Review the flame graph
   open profiles/rustfs_profile_*.svg

   # Or analyze the protobuf output
   go tool pprof profiles/rustfs_profile_*.pb
   ```
Following this workflow helps you understand RustFS performance characteristics, locate bottlenecks, and implement targeted optimizations.