# RustFS Performance Testing Guide
This document describes the recommended tools and workflows for benchmarking RustFS and analyzing performance bottlenecks.
## Overview
RustFS exposes several complementary tooling options:
1. **Profiling**: collect CPU samples through the built-in `pprof` endpoints.
2. **Load testing**: drive concurrent requests with dedicated client utilities.
3. **Monitoring and analysis**: inspect collected metrics to locate hotspots.
## Prerequisites
### 1. Enable profiling support
Set the profiling environment variable before launching RustFS:
```bash
export RUSTFS_ENABLE_PROFILING=true
./rustfs
```
### 2. Install required tooling
Make sure the following dependencies are available:
```bash
# Base tools
curl # HTTP requests
jq # JSON processing (optional)
# Analysis tools
go # Go toolchain, provides `go tool pprof` (optional, needed to analyze protobuf output)
python3 # Python load-testing scripts
# macOS users
brew install curl jq go python3
# Ubuntu/Debian users
sudo apt-get install curl jq golang-go python3
```
## Performance Testing Methods
### Method 1: Use the dedicated profiling script (recommended)
The repository ships with a helper script for common profiling flows:
```bash
# Show command help
./scripts/profile_rustfs.sh help
# Check profiler status
./scripts/profile_rustfs.sh status
# Capture a 30 second flame graph
./scripts/profile_rustfs.sh flamegraph
# Download protobuf-formatted samples
./scripts/profile_rustfs.sh protobuf
# Collect both formats
./scripts/profile_rustfs.sh both
# Provide custom arguments
./scripts/profile_rustfs.sh -d 60 -u http://192.168.1.100:9000 both
```
### Method 2: Run the Python end-to-end tester
A Python utility combines background load generation with profiling:
```bash
# Launch the integrated test harness
python3 test_load.py
```
The script will:
1. Launch multi-threaded S3 operations as load.
2. Pull profiling samples in parallel.
3. Produce a flame graph for investigation.
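For orientation, the sketch below shows one way such a harness can be wired together. It is an illustrative outline rather than the contents of `test_load.py`; it assumes the `requests` package is installed and that `S3LoadTester` can be imported from `test_load` with the interface shown later in this guide.
```python
# Illustrative outline only -- the real logic lives in test_load.py.
import threading

import requests  # assumed to be installed; any HTTP client works
from test_load import S3LoadTester  # hypothetical import path

PROFILE_URL = "http://127.0.0.1:9000/rustfs/admin/debug/pprof/profile"


def capture_flamegraph(seconds: int = 30, out_path: str = "rustfs_profile.svg") -> None:
    """Pull a flame graph from the pprof endpoint while the load is running."""
    resp = requests.get(
        PROFILE_URL,
        params={"seconds": seconds, "format": "flamegraph"},
        timeout=seconds + 30,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)


# 1. Start multi-threaded S3 operations in the background.
tester = S3LoadTester(
    endpoint="http://127.0.0.1:9000",
    access_key="rustfsadmin",
    secret_key="rustfsadmin",
)
load = threading.Thread(
    target=tester.run_load_test,
    kwargs={"num_threads": 4, "operations_per_thread": 10},
)
load.start()

# 2. Sample the profiler while the workload is active.
capture_flamegraph(seconds=30)

# 3. Wait for the load to finish; the SVG is ready for investigation.
load.join()
```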
### Method 3: Simple shell-based load test
For quick smoke checks, a lightweight bash script is also provided:
```bash
# Execute a lightweight benchmark
./simple_load_test.sh
```
## Profiling Output Formats
### 1. Flame graph (SVG)
- **Purpose**: Visualize CPU time distribution.
- **File name**: `rustfs_profile_TIMESTAMP.svg`
- **How to view**: Open the SVG in a browser.
- **Interpretation tips**:
  - Width reflects CPU time per function.
  - Height illustrates call-stack depth.
  - Click to zoom into specific frames.
```bash
# Example: open the file in a browser
open profiles/rustfs_profile_20240911_143000.svg
```
### 2. Protobuf samples
- **Purpose**: Feed data to the `go tool pprof` command.
- **File name**: `rustfs_profile_TIMESTAMP.pb`
- **Tooling**: `go tool pprof`
```bash
# Analyze the protobuf output
go tool pprof profiles/rustfs_profile_20240911_143000.pb
# Common pprof commands
(pprof) top # Show hottest call sites
(pprof) list func # Display annotated source for a function
(pprof) web # Launch the web UI (requires graphviz)
(pprof) png # Export a call graph as PNG
(pprof) help # List available commands
```
## API Usage
### Check profiling status
```bash
curl "http://127.0.0.1:9000/rustfs/admin/debug/pprof/status"
```
Sample response:
```json
{
"enabled": "true",
"sampling_rate": "100"
}
```
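If you want to gate automated runs on this check, a minimal sketch using only the Python standard library might look like this (endpoint as shown above):
```python
# Minimal sketch: abort early if profiling is not enabled.
import json
import urllib.request

STATUS_URL = "http://127.0.0.1:9000/rustfs/admin/debug/pprof/status"

with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
    status = json.load(resp)

if status.get("enabled") != "true":
    raise SystemExit("Profiling is disabled; set RUSTFS_ENABLE_PROFILING=true and restart RustFS")

print(f"Profiling enabled, sampling rate: {status.get('sampling_rate')}")
```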
### Capture profiling data
```bash
# Fetch a 30-second flame graph
curl "http://127.0.0.1:9000/rustfs/admin/debug/pprof/profile?seconds=30&format=flamegraph" \
  -o profile.svg
# Fetch protobuf output
curl "http://127.0.0.1:9000/rustfs/admin/debug/pprof/profile?seconds=30&format=protobuf" \
  -o profile.pb
```
**Parameters**
- `seconds`: Duration between 1 and 300 seconds.
- `format`: Output format (`flamegraph`/`svg` or `protobuf`/`pb`).
## Load Testing Scenarios
### 1. S3 API workload
Use the Python harness to exercise a complete S3 workflow:
```python
# Basic configuration
tester = S3LoadTester(
    endpoint="http://127.0.0.1:9000",
    access_key="rustfsadmin",
    secret_key="rustfsadmin"
)

# Execute the load test: four threads, ten operations each
tester.run_load_test(num_threads=4, operations_per_thread=10)
```
Each iteration performs:
1. Upload a 1 MB object.
2. Download the object.
3. Delete the object.
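For reference, a single iteration can be sketched directly against `boto3` (an assumed client library; the actual harness may use something else):
```python
# Sketch of one load iteration: upload a 1 MB object, read it back, delete it.
# Assumes boto3 is installed and the target bucket already exists.
import os

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:9000",
    aws_access_key_id="rustfsadmin",
    aws_secret_access_key="rustfsadmin",
)

bucket, key = "load-test-bucket", "object-0"  # illustrative names
payload = os.urandom(1024 * 1024)  # 1 MB of random data

s3.put_object(Bucket=bucket, Key=key, Body=payload)            # 1. upload
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()    # 2. download
assert len(body) == len(payload)
s3.delete_object(Bucket=bucket, Key=key)                       # 3. delete
```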
### 2. Custom load scenarios
```bash
# Create a test bucket
curl -X PUT "http://127.0.0.1:9000/test-bucket"
# Concurrent uploads
for i in {1..10}; do
echo "test data $i" | curl -X PUT "http://127.0.0.1:9000/test-bucket/object-$i" -d @- &
done
wait
# Concurrent downloads
for i in {1..10}; do
curl "http://127.0.0.1:9000/test-bucket/object-$i" > /dev/null &
done
wait
```
## Profiling Best Practices
### 1. Environment preparation
- Confirm that `RUSTFS_ENABLE_PROFILING=true` is set.
- Use an isolated benchmark environment to avoid interference.
- Reserve disk space for generated profile artifacts.
### 2. Data collection tips
- **Warm-up**: Run a light workload for 5-10 minutes before sampling.
- **Sampling window**: Capture 30-60 seconds under steady load.
- **Multiple samples**: Take several runs to compare results.
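Applied together, these tips translate into something like the following sketch (standard library only; the warm-up length, window, and output paths are illustrative):
```python
# Sketch: warm-up delay followed by several back-to-back sampling windows.
import os
import time
import urllib.request

PROFILE_URL = (
    "http://127.0.0.1:9000/rustfs/admin/debug/pprof/profile"
    "?seconds=60&format=flamegraph"
)

os.makedirs("profiles", exist_ok=True)

time.sleep(5 * 60)  # warm-up: let the workload run for a few minutes first

for run in range(3):  # multiple samples to compare across runs
    with urllib.request.urlopen(PROFILE_URL, timeout=120) as resp:
        with open(f"profiles/warmup_run_{run}.svg", "wb") as f:
            f.write(resp.read())
```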
### 3. Analysis focus areas
When inspecting flame graphs, pay attention to:
1. **The widest frames**: most CPU time consumed.
2. **Flat plateaus**: likely bottlenecks.
3. **Deep call stacks**: recursion or complex logic.
4. **Unexpected syscalls**: I/O stalls or allocation churn.
### 4. Common issues
- **Lock contention**: Investigate frames under `std::sync`.
- **Memory allocation**: Search for `alloc`-related frames.
- **I/O wait**: Review filesystem or network call stacks.
- **Serialization overhead**: Look for JSON/XML parsing hotspots.
## Troubleshooting
### 1. Profiling disabled
Error: `{"enabled":"false"}`
**Fix**:
```bash
export RUSTFS_ENABLE_PROFILING=true
# Restart RustFS
```
### 2. Connection refused
Error: `Connection refused`
**Checklist**:
- Confirm RustFS is running.
- Ensure the port number is correct (default 9000).
- Verify firewall rules.
### 3. Oversized profile output
If artifacts become too large:
- Shorten the capture window (e.g., 15-30 seconds).
- Reduce load-test concurrency.
- Prefer protobuf output instead of SVG.
## Configuration Parameters
### Environment variables
| Variable | Default | Description |
|------|--------|------|
| `RUSTFS_ENABLE_PROFILING` | `false` | Enable profiling support |
| `RUSTFS_URL` | `http://127.0.0.1:9000` | RustFS endpoint |
| `PROFILE_DURATION` | `30` | Profiling duration in seconds |
| `OUTPUT_DIR` | `./profiles` | Output directory |
### Script arguments
```bash
./scripts/profile_rustfs.sh [OPTIONS] [COMMAND]

OPTIONS:
  -u, --url URL           RustFS URL
  -d, --duration SECONDS  Profile duration
  -o, --output DIR        Output directory

COMMANDS:
  status      Check profiler status
  flamegraph  Collect a flame graph
  protobuf    Collect protobuf samples
  both        Collect both formats (default)
```
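The environment variables and flags can also be combined from a small wrapper; the sketch below reads the variables listed above and passes them to `profile_rustfs.sh` explicitly (whether the script also reads these variables on its own is not assumed here):
```python
# Sketch: drive scripts/profile_rustfs.sh from the documented environment variables.
import os
import subprocess

url = os.environ.get("RUSTFS_URL", "http://127.0.0.1:9000")
duration = os.environ.get("PROFILE_DURATION", "30")
out_dir = os.environ.get("OUTPUT_DIR", "./profiles")

subprocess.run(
    ["./scripts/profile_rustfs.sh", "-u", url, "-d", duration, "-o", out_dir, "both"],
    check=True,  # raise if the script exits with a non-zero status
)
```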
## Output Locations
- **Script output**: `./profiles/`
- **Python script**: `/tmp/rustfs_profiles/`
- **File naming**: `rustfs_profile_TIMESTAMP.{svg|pb}`
## Example Workflow
1. **Launch RustFS**

   ```bash
   RUSTFS_ENABLE_PROFILING=true ./rustfs
   ```

2. **Verify profiling availability**

   ```bash
   ./scripts/profile_rustfs.sh status
   ```

3. **Start a load test**

   ```bash
   python3 test_load.py &
   ```

4. **Collect samples**

   ```bash
   ./scripts/profile_rustfs.sh -d 60 both
   ```

5. **Inspect the results**

   ```bash
   # Review the flame graph
   open profiles/rustfs_profile_*.svg

   # Or analyze the protobuf output
   go tool pprof profiles/rustfs_profile_*.pb
   ```
Following this workflow helps you understand RustFS performance characteristics, locate bottlenecks, and implement targeted optimizations.