build: update docker config and refine s3s region handling (#1976)

Co-authored-by: houseme <housemecn@gmail.com>
This commit is contained in:
heihutu
2026-02-27 01:21:12 +08:00
committed by GitHub
parent eb07f084cb
commit d983638391
43 changed files with 1009 additions and 852 deletions

View File

@@ -1,261 +1,131 @@
# RustFS Docker Images
# RustFS Docker Infrastructure
This directory contains Docker configuration files and supporting infrastructure for building and running RustFS container images.
This directory contains the complete Docker infrastructure for building, deploying, and monitoring RustFS. It provides ready-to-use configurations for development, testing, and production-grade observability.
## 📁 Directory Structure
## 📂 Directory Structure
```
rustfs/
├── Dockerfile # Production image (Alpine + pre-built binaries)
├── Dockerfile.source # Development image (Debian + source build)
├── docker-buildx.sh # Multi-architecture build script
├── Makefile # Build automation with simplified commands
└── .docker/ # Supporting infrastructure
├── observability/ # Monitoring and observability configs
├── compose/ # Docker Compose configurations
├── mqtt/ # MQTT broker configs
└── openobserve-otel/ # OpenObserve + OpenTelemetry configs
```
| Directory | Description | Status |
| :--- | :--- | :--- |
| **[`observability/`](observability/README.md)** | **[RECOMMENDED]** Full-stack observability (Prometheus, Grafana, Tempo, Loki). | ✅ Production-Ready |
| **[`compose/`](compose/README.md)** | Specialized setups (e.g., 4-node distributed cluster testing). | ⚠️ Testing Only |
| **[`mqtt/`](mqtt/README.md)** | EMQX Broker configuration for MQTT integration testing. | 🧪 Development |
| **[`openobserve-otel/`](openobserve-otel/README.md)** | Alternative lightweight observability stack using OpenObserve. | 🔄 Alternative |
## 🎯 Image Variants
---
### Core Images
## 📄 Root Directory Files
| Image | Base OS | Build Method | Size | Use Case |
|-------|---------|--------------|------|----------|
| `production` (default) | Alpine 3.18 | GitHub Releases | Smallest | Production deployment |
| `source` | Debian Bookworm | Source build | Medium | Custom builds with cross-compilation |
| `dev` | Debian Bookworm | Development tools | Large | Interactive development |
The following files in the project root are essential for Docker operations:
## 🚀 Usage Examples
### Build Scripts & Dockerfiles
### Quick Start (Production)
| File | Description | Usage |
| :--- | :--- | :--- |
| **`docker-buildx.sh`** | **Multi-Arch Build Script**<br>Automates building and pushing Docker images for `amd64` and `arm64`. Supports release and dev channels. | `./docker-buildx.sh --push` |
| **`Dockerfile`** | **Production Image (Alpine)**<br>Lightweight image using musl libc. Downloads pre-built binaries from GitHub Releases. | `docker build -t rustfs:latest .` |
| **`Dockerfile.glibc`** | **Production Image (Ubuntu)**<br>Standard image using glibc. Useful if you need specific dynamic libraries. | `docker build -f Dockerfile.glibc .` |
| **`Dockerfile.source`** | **Development Image**<br>Builds RustFS from source code. Includes build tools. Ideal for local development and CI. | `docker build -f Dockerfile.source .` |
### Docker Compose Configurations
| File | Description | Usage |
| :--- | :--- | :--- |
| **`docker-compose.yml`** | **Main Development Setup**<br>Comprehensive setup with profiles for development, observability, and proxying. | `docker compose up -d`<br>`docker compose --profile observability up -d` |
| **`docker-compose-simple.yml`** | **Quick Start Setup**<br>Minimal configuration running a single RustFS instance with 4 volumes. Perfect for first-time users. | `docker compose -f docker-compose-simple.yml up -d` |
---
## 🌟 Observability Stack (Recommended)
Located in: [`.docker/observability/`](observability/README.md)
We provide a comprehensive, industry-standard observability stack designed for deep insights into RustFS performance. This is the recommended setup for both development and production monitoring.
### Components
- **Metrics**: Prometheus (Collection) + Grafana (Visualization)
- **Traces**: Tempo (Storage) + Jaeger (UI)
- **Logs**: Loki
- **Ingestion**: OpenTelemetry Collector
### Key Features
- **Full Persistence**: All metrics, logs, and traces are saved to Docker volumes, ensuring no data loss on restarts.
- **Correlation**: Seamlessly jump between Logs, Traces, and Metrics in Grafana.
- **High Performance**: Optimized configurations for batching, compression, and memory management.
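The correlation feature is typically wired up through Grafana datasource provisioning. A minimal, illustrative fragment linking Loki log lines to Tempo traces is sketched below (field names follow Grafana's provisioning schema; the regex and datasource UID are assumptions, not the exact values shipped in this stack):

```yaml
# Illustrative only: derive a TraceID field from log lines and link it to Tempo
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: "trace_id=(\\w+)"
          url: "$${__value.raw}"
          datasourceUid: tempo
```

The real provisioning files live under `observability/grafana/provisioning/`.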
### Quick Start
```bash
# Default production image (Alpine + GitHub Releases)
docker run -p 9000:9000 rustfs/rustfs:latest
# Specific version
docker run -p 9000:9000 rustfs/rustfs:1.2.3
cd .docker/observability
docker compose up -d
```
### Complete Tag Strategy Examples
---
## 🧪 Specialized Environments
Located in: [`.docker/compose/`](compose/README.md)
These configurations are tailored for specific testing scenarios that require complex topologies.
### Distributed Cluster (4-Nodes)
Simulates a real-world distributed environment with 4 RustFS nodes running locally.
```bash
# Stable Releases
docker run rustfs/rustfs:1.2.3 # Main version (production)
docker run rustfs/rustfs:1.2.3-production # Explicit production variant
docker run rustfs/rustfs:1.2.3-source # Source build variant
docker run rustfs/rustfs:latest # Latest stable
# Prerelease Versions
docker run rustfs/rustfs:1.3.0-alpha.2 # Specific alpha version
docker run rustfs/rustfs:alpha # Latest alpha
docker run rustfs/rustfs:beta # Latest beta
docker run rustfs/rustfs:rc # Latest release candidate
# Development Versions
docker run rustfs/rustfs:dev # Latest main branch development
docker run rustfs/rustfs:dev-13e4a0b # Specific commit
docker run rustfs/rustfs:dev-latest # Latest development
docker run rustfs/rustfs:main-latest # Main branch latest
docker compose -f .docker/compose/docker-compose.cluster.yaml up -d
```
### Development Environment
### Integrated Observability Test
A self-contained environment running 4 RustFS nodes alongside the full observability stack. Useful for end-to-end testing of telemetry.
```bash
# Quick setup using Makefile (recommended)
make docker-dev-local # Build development image locally
make dev-env-start # Start development container
# Manual Docker commands
docker run -it -v $(pwd):/workspace -p 9000:9000 rustfs/rustfs:latest-dev
# Build from source locally
docker build -f Dockerfile.source -t rustfs:custom .
# Development with hot reload
docker-compose up rustfs-dev
docker compose -f .docker/compose/docker-compose.observability.yaml up -d
```
## 🏗️ Build Arguments and Scripts
---
### Using Makefile Commands (Recommended)
## 📡 MQTT Integration
The easiest way to build images is via the simplified Make targets:
Located in: [`.docker/mqtt/`](mqtt/README.md)
Provides an EMQX broker for testing RustFS MQTT features.
### Quick Start
```bash
# Development images (build from source)
make docker-dev-local # Build for local use (single arch)
make docker-dev # Build multi-arch (for CI/CD)
make docker-dev-push REGISTRY=xxx # Build and push to registry
# Production images (using pre-built binaries)
make docker-buildx # Build multi-arch production images
make docker-buildx-push # Build and push production images
make docker-buildx-version VERSION=v1.0.0 # Build specific version
# Development environment
make dev-env-start # Start development container
make dev-env-stop # Stop development container
make dev-env-restart # Restart development container
# Help
make help-docker # Show all Docker-related commands
cd .docker/mqtt
docker compose up -d
```
- **Dashboard**: [http://localhost:18083](http://localhost:18083) (Default: `admin` / `public`)
- **MQTT Port**: `1883`
### Using docker-buildx.sh (Advanced)
---
For direct script usage and advanced scenarios:
## 👁️ Alternative: OpenObserve
Located in: [`.docker/openobserve-otel/`](openobserve-otel/README.md)
For users preferring a lightweight, all-in-one solution, we support OpenObserve. It combines logs, metrics, and traces into a single binary and UI.
### Quick Start
```bash
# Build latest version for all architectures
./docker-buildx.sh
# Build and push to registry
./docker-buildx.sh --push
# Build specific version
./docker-buildx.sh --release v1.2.3
# Build and push specific version
./docker-buildx.sh --release v1.2.3 --push
cd .docker/openobserve-otel
docker compose up -d
```
### Manual Docker Builds
---
All images support dynamic version selection:
## 🔧 Common Operations
### Cleaning Up
To stop all containers and remove volumes (**WARNING**: deletes all persisted data):
```bash
# Build production image with latest release
docker build --build-arg RELEASE="latest" -t rustfs:latest .
# Build from source with specific target
docker build -f Dockerfile.source \
--build-arg TARGETPLATFORM="linux/amd64" \
-t rustfs:source .
# Development build
docker build -f Dockerfile.source -t rustfs:dev .
docker compose down -v
```
## 🔧 Binary Download Sources
### Unified GitHub Releases
The production image downloads pre-built binaries from GitHub Releases for reliability and transparency:
- **production** → GitHub Releases API with automatic latest detection
- **Checksum verification** → SHA256SUMS validation when available
- **Multi-architecture** → Supports amd64 and arm64
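As a sketch of how the release selection described above could work inside the Dockerfile (the artifact file name below is illustrative, not the verified release asset name):

```bash
# RELEASE is the build arg: "latest" or a tag such as v1.2.3.
RELEASE="${RELEASE:-latest}"
if [ "$RELEASE" = "latest" ]; then
  # GitHub's "latest" redirect resolves the newest stable release automatically
  URL="https://github.com/rustfs/rustfs/releases/latest/download/rustfs-linux-x86_64.zip"
else
  URL="https://github.com/rustfs/rustfs/releases/download/${RELEASE}/rustfs-linux-x86_64.zip"
fi
echo "$URL"
```

Passing `--build-arg RELEASE=v1.2.3` therefore pins the image to an exact release.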
### Source Build
The source variant compiles from source code with advanced features:
- 🔧 **Cross-compilation** → Supports multiple target platforms via `TARGETPLATFORM`
- **Build caching** → sccache for faster compilation
- 🎯 **Optimized builds** → Release optimizations with LTO and symbol stripping
## 📋 Architecture Support
All variants support multi-architecture builds:
- **linux/amd64** (x86_64)
- **linux/arm64** (aarch64)
Architecture is automatically detected during build using Docker's `TARGETARCH` build argument.
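A sketch of the detection pattern (the amd64/arm64-to-triple mapping is standard; the variable names are illustrative):

```bash
# Under docker buildx, TARGETARCH is injected automatically (amd64 / arm64);
# the default here only makes the snippet runnable on its own.
TARGETARCH="${TARGETARCH:-amd64}"
case "$TARGETARCH" in
  amd64) CPU_ARCH="x86_64" ;;
  arm64) CPU_ARCH="aarch64" ;;
  *) echo "unsupported architecture: $TARGETARCH" >&2; exit 1 ;;
esac
echo "resolved CPU architecture: $CPU_ARCH"
```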
## 🔐 Security Features
- **Checksum Verification**: Production image verifies SHA256SUMS when available
- **Non-root User**: All images run as user `rustfs` (UID 1000)
- **Minimal Runtime**: Production image only includes necessary dependencies
- **Secure Defaults**: No hardcoded credentials or keys
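The checksum verification follows the standard `sha256sum` pattern. Here is a self-contained demo of that pattern (the file names are placeholders; the production Dockerfile applies the same check to the downloaded release binary):

```bash
# Create a stand-in "binary", record its digest, then verify it.
workdir="$(mktemp -d)"
cd "$workdir"
printf 'demo binary contents\n' > rustfs
sha256sum rustfs > SHA256SUMS
sha256sum --check SHA256SUMS   # prints "rustfs: OK" on success
```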
## 🛠️ Development Workflow
### Quick Start with Makefile (Recommended)
### Viewing Logs
To follow logs for a specific service:
```bash
# 1. Start development environment
make dev-env-start
# 2. Your development container is now running with:
# - Port 9000 exposed for RustFS
# - Port 9010 exposed for admin console
# - Current directory mounted as /workspace
# 3. Stop when done
make dev-env-stop
docker compose logs -f [service_name]
```
### Manual Development Setup
### Checking Status
To see the status of all running containers:
```bash
# Build development image from source
make docker-dev-local
# Or use traditional Docker commands
docker build -f Dockerfile.source -t rustfs:dev .
# Run with development tools
docker run -it -v $(pwd):/workspace -p 9000:9000 rustfs:dev bash
# Or use docker-compose for complex setups
docker-compose up rustfs-dev
docker compose ps
```
### Common Development Tasks
```bash
# Build and test locally
make build # Build binary natively
make docker-dev-local # Build development Docker image
make test # Run tests
make fmt # Format code
make clippy # Run linter
# Get help
make help # General help
make help-docker # Docker-specific help
make help-build # Build-specific help
```
## 🚀 CI/CD Integration
The project uses GitHub Actions for automated multi-architecture Docker builds:
### Automated Builds
- **Tags**: Automatic builds triggered on version tags (e.g., `v1.2.3`)
- **Main Branch**: Development builds with `dev-latest` and `main-latest` tags
- **Pull Requests**: Test builds without registry push
### Build Variants
Each build creates three image variants:
- `rustfs/rustfs:v1.2.3` (production - Alpine-based)
- `rustfs/rustfs:v1.2.3-source` (source build - Debian-based)
- `rustfs/rustfs:v1.2.3-dev` (development - Debian-based with tools)
### Manual Builds
Trigger custom builds via GitHub Actions:
```bash
# Use workflow_dispatch to build specific versions
# Available options: latest, main-latest, dev-latest, v1.2.3, dev-abc123
```
## 📦 Supporting Infrastructure
The `.docker/` directory contains supporting configuration files:
- **observability/** - Prometheus, Grafana, OpenTelemetry configs
- **compose/** - Multi-service Docker Compose setups
- **mqtt/** - MQTT broker configurations
- **openobserve-otel/** - Log aggregation and tracing setup
See individual README files in each subdirectory for specific usage instructions.

View File

@@ -1,80 +1,44 @@
# Docker Compose Configurations
# Specialized Docker Compose Configurations
This directory contains specialized Docker Compose configurations for different use cases.
This directory contains specialized Docker Compose configurations for specific testing scenarios.
## ⚠️ Important Note
**For Observability:**
We **strongly recommend** using the new, fully integrated observability stack located in `../observability/`. It provides a production-ready setup with Prometheus, Grafana, Tempo, Loki, and OpenTelemetry Collector, all with persistent storage and optimized configurations.
The `docker-compose.observability.yaml` in this directory is kept for legacy reference or specific minimal testing needs but is **not** the primary recommended setup.
## 📁 Configuration Files
This directory contains specialized Docker Compose configurations and their associated Dockerfiles, keeping related files organized together.
### Cluster Testing
### Main Configuration (Root Directory)
- **`docker-compose.cluster.yaml`**
- **Purpose**: Simulates a 4-node RustFS distributed cluster.
- **Use Case**: Testing distributed storage logic, consensus, and failover.
- **Nodes**: 4 RustFS instances.
- **Storage**: Uses local HTTP endpoints.
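A hypothetical sketch of what one node's service definition could look like (the `RUSTFS_VOLUMES` value and the brace-expansion endpoint syntax are assumptions for illustration; see the actual `docker-compose.cluster.yaml` for the real settings):

```yaml
services:
  node1:
    build:
      context: ../..
      dockerfile: Dockerfile.source
    environment:
      # Assumed endpoint syntax -- each node addresses all four peers over HTTP
      - RUSTFS_VOLUMES=http://node{1...4}:9000/data/rustfs{0...3}
    ports:
      - "9001:9000"
```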
- **`../../docker-compose.yml`** - **Default Production Setup**
- Complete production-ready configuration
- Includes RustFS server + full observability stack
- Supports multiple profiles: `dev`, `observability`, `cache`, `proxy`
- Recommended for most users
### Legacy / Minimal Observability
### Specialized Configurations
- **`docker-compose.cluster.yaml`** - **Distributed Testing**
- 4-node cluster setup for testing distributed storage
- Uses local compiled binaries
- Simulates multi-node environment
- Ideal for development and cluster testing
- **`docker-compose.observability.yaml`** - **Observability Focus**
- Specialized setup for testing observability features
- Includes OpenTelemetry, Jaeger, Prometheus, Loki, Grafana
- Uses `../../Dockerfile.source` for builds
- Perfect for observability development
- **`docker-compose.observability.yaml`**
- **Purpose**: A minimal observability setup.
- **Status**: **Deprecated**. Please use `../observability/docker-compose.yml` instead.
## 🚀 Usage Examples
### Production Setup
```bash
# Start main service
docker-compose up -d
# Start with development profile
docker-compose --profile dev up -d
# Start with full observability
docker-compose --profile observability up -d
```
### Cluster Testing
```bash
# Build and start 4-node cluster (run from project root)
cd .docker/compose
docker-compose -f docker-compose.cluster.yaml up -d
# Or run directly from project root
docker-compose -f .docker/compose/docker-compose.cluster.yaml up -d
```
### Observability Testing
To start a 4-node cluster for distributed testing:
```bash
# Start observability-focused environment (run from project root)
cd .docker/compose
docker-compose -f docker-compose.observability.yaml up -d
# Or run directly from project root
docker-compose -f .docker/compose/docker-compose.observability.yaml up -d
# From project root
docker compose -f .docker/compose/docker-compose.cluster.yaml up -d
```
## 🔧 Configuration Overview
### (Deprecated) Minimal Observability
| Configuration | Nodes | Storage | Observability | Use Case |
|---------------|-------|---------|---------------|----------|
| **Main** | 1 | Volume mounts | Full stack | Production |
| **Cluster** | 4 | HTTP endpoints | Basic | Testing |
| **Observability** | 4 | Local data | Advanced | Development |
## 📝 Notes
- Always ensure you have built the required binaries before starting cluster tests
- The main configuration is sufficient for most use cases
- Specialized configurations are for specific testing scenarios
```bash
# From project root
docker compose -f .docker/compose/docker-compose.observability.yaml up -d
```

View File

@@ -13,65 +13,126 @@
# limitations under the License.
services:
# --- Observability Stack ---
tempo-init:
image: busybox:latest
command: [ "sh", "-c", "chown -R 10001:10001 /var/tempo" ]
volumes:
- tempo-data:/var/tempo
user: root
networks:
- rustfs-network
restart: "no"
tempo:
image: grafana/tempo:latest
user: "10001"
command: [ "-config.file=/etc/tempo.yaml" ]
volumes:
- ../../.docker/observability/tempo.yaml:/etc/tempo.yaml:ro
- tempo-data:/var/tempo
ports:
- "3200:3200" # tempo
- "4317" # otlp grpc
- "4318" # otlp http
restart: unless-stopped
networks:
- rustfs-network
otel-collector:
image: otel/opentelemetry-collector-contrib:0.129.1
image: otel/opentelemetry-collector-contrib:latest
environment:
- TZ=Asia/Shanghai
volumes:
- ../../.docker/observability/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
- ../../.docker/observability/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro
ports:
- 1888:1888
- 8888:8888
- 8889:8889
- 13133:13133
- 4317:4317
- 4318:4318
- 55679:55679
- "1888:1888" # pprof
- "8888:8888" # Prometheus metrics for Collector
- "8889:8889" # Prometheus metrics for application indicators
- "13133:13133" # health check
- "4317:4317" # OTLP gRPC
- "4318:4318" # OTLP HTTP
- "55679:55679" # zpages
networks:
- rustfs-network
depends_on:
- tempo
- jaeger
- prometheus
- loki
jaeger:
image: jaegertracing/jaeger:2.8.0
image: jaegertracing/jaeger:latest
environment:
- TZ=Asia/Shanghai
- SPAN_STORAGE_TYPE=badger
- BADGER_EPHEMERAL=false
- BADGER_DIRECTORY_VALUE=/badger/data
- BADGER_DIRECTORY_KEY=/badger/key
- COLLECTOR_OTLP_ENABLED=true
volumes:
- jaeger-data:/badger
ports:
- "16686:16686"
- "14317:4317"
- "14318:4318"
- "16686:16686" # Web UI
- "14269:14269" # Admin/Metrics
networks:
- rustfs-network
prometheus:
image: prom/prometheus:v3.4.2
image: prom/prometheus:latest
environment:
- TZ=Asia/Shanghai
volumes:
- ../../.docker/observability/prometheus.yml:/etc/prometheus/prometheus.yml
- ../../.docker/observability/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- prometheus-data:/prometheus
ports:
- "9090:9090"
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--web.enable-otlp-receiver'
- '--web.enable-remote-write-receiver'
- '--enable-feature=promql-experimental-functions'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/usr/share/prometheus/console_libraries'
- '--web.console.templates=/usr/share/prometheus/consoles'
networks:
- rustfs-network
loki:
image: grafana/loki:3.5.1
image: grafana/loki:latest
environment:
- TZ=Asia/Shanghai
volumes:
- ../../.docker/observability/loki-config.yaml:/etc/loki/local-config.yaml
- ../../.docker/observability/loki-config.yaml:/etc/loki/local-config.yaml:ro
- loki-data:/loki
ports:
- "3100:3100"
command: -config.file=/etc/loki/local-config.yaml
networks:
- rustfs-network
grafana:
image: grafana/grafana:12.0.2
image: grafana/grafana:latest
ports:
- "3000:3000" # Web UI
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
- GF_SECURITY_ADMIN_USER=admin
- TZ=Asia/Shanghai
- GF_INSTALL_PLUGINS=grafana-pyroscope-datasource
- GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH=/var/lib/grafana/dashboards/home.json
networks:
- rustfs-network
volumes:
- ../../.docker/observability/grafana/provisioning:/etc/grafana/provisioning:ro
- ../../.docker/observability/grafana/dashboards:/var/lib/grafana/dashboards:ro
depends_on:
- prometheus
- tempo
- loki
# --- RustFS Cluster ---
node1:
build:
@@ -86,9 +147,11 @@ services:
- RUSTFS_OBS_LOGGER_LEVEL=debug
platform: linux/amd64
ports:
- "9001:9000" # Map port 9001 of the host to port 9000 of the container
- "9001:9000"
networks:
- rustfs-network
depends_on:
- otel-collector
node2:
build:
@@ -103,9 +166,11 @@ services:
- RUSTFS_OBS_LOGGER_LEVEL=debug
platform: linux/amd64
ports:
- "9002:9000" # Map port 9002 of the host to port 9000 of the container
- "9002:9000"
networks:
- rustfs-network
depends_on:
- otel-collector
node3:
build:
@@ -120,9 +185,11 @@ services:
- RUSTFS_OBS_LOGGER_LEVEL=debug
platform: linux/amd64
ports:
- "9003:9000" # Map port 9003 of the host to port 9000 of the container
- "9003:9000"
networks:
- rustfs-network
depends_on:
- otel-collector
node4:
build:
@@ -137,9 +204,17 @@ services:
- RUSTFS_OBS_LOGGER_LEVEL=debug
platform: linux/amd64
ports:
- "9004:9000" # Map port 9004 of the host to port 9000 of the container
- "9004:9000"
networks:
- rustfs-network
depends_on:
- otel-collector
volumes:
prometheus-data:
tempo-data:
loki-data:
jaeger-data:
networks:
rustfs-network:

.docker/mqtt/README.md Normal file
View File

@@ -0,0 +1,30 @@
# MQTT Broker (EMQX)
This directory contains the configuration for running an EMQX MQTT broker, which can be used for testing RustFS's MQTT integration.
## 🚀 Quick Start
To start the EMQX broker:
```bash
docker compose up -d
```
## 📊 Access
- **Dashboard**: [http://localhost:18083](http://localhost:18083)
- **Default Credentials**: `admin` / `public`
- **MQTT Port**: `1883`
- **WebSocket Port**: `8083`
## 🛠️ Configuration
The `docker-compose.yml` file sets up a single-node EMQX instance.
- **Persistence**: Data is not persisted by default (for testing).
- **Network**: Uses the default bridge network.
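For reference, a minimal single-node EMQX service matching the ports listed above could look like this (the image tag is an assumption; consult the shipped `docker-compose.yml` for the authoritative version):

```yaml
services:
  emqx:
    image: emqx/emqx:latest
    ports:
      - "1883:1883"    # MQTT
      - "8083:8083"    # MQTT over WebSocket
      - "18083:18083"  # Dashboard
```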
## 📝 Notes
- This setup is intended for development and testing purposes.
- For production deployments, please refer to the official [EMQX Documentation](https://www.emqx.io/docs/en/latest/).

.docker/nginx/nginx.conf Normal file
View File

@@ -0,0 +1,82 @@
# Copyright 2024 RustFS Team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
worker_processes auto;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
error_log /var/log/nginx/error.log warn;
sendfile on;
keepalive_timeout 65;
# RustFS Server Block
server {
listen 80;
server_name localhost;
# Redirect HTTP to HTTPS (optional, uncomment if SSL is configured)
# return 301 https://$host$request_uri;
location / {
proxy_pass http://rustfs:9000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# S3 specific headers
proxy_set_header X-Amz-Date $http_x_amz_date;
proxy_set_header Authorization $http_authorization;
# Disable buffering for large uploads
proxy_request_buffering off;
client_max_body_size 0;
}
location /rustfs/console {
proxy_pass http://rustfs:9001;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
# SSL Configuration (Example)
# server {
# listen 443 ssl;
# server_name localhost;
#
# ssl_certificate /etc/nginx/ssl/server.crt;
# ssl_certificate_key /etc/nginx/ssl/server.key;
#
# location / {
# proxy_pass http://rustfs:9000;
# ...
# }
# }
}

.docker/nginx/ssl/.keep Normal file
View File

.docker/observability/.gitignore vendored Normal file
View File

@@ -0,0 +1,5 @@
jaeger-data/*
loki-data/*
prometheus-data/*
tempo-data/*
grafana-data/*

View File

@@ -1,109 +1,85 @@
# Observability
# RustFS Observability Stack
This directory contains the observability stack for the application. The stack is composed of the following components:
This directory contains the comprehensive observability stack for RustFS, designed to provide deep insights into application performance, logs, and traces.
- Prometheus v3.2.1
- Grafana 11.6.0
- Loki 3.4.2
- Jaeger 2.4.0
- Otel Collector 0.120.0 (0.121.0 removes the Loki exporter)
## Components
## Prometheus
The stack is composed of the following best-in-class open-source components:
Prometheus is a monitoring and alerting toolkit. It scrapes metrics from instrumented jobs, either directly or via an
intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to
either aggregate and record new time series from existing data or generate alerts. Grafana or other API consumers can be
used to visualize the collected data.
- **Prometheus** (v2.53.1): The industry standard for metric collection and alerting.
- **Grafana** (v11.1.0): The leading platform for observability visualization.
- **Loki** (v3.1.0): A horizontally-scalable, highly-available, multi-tenant log aggregation system.
- **Tempo** (v2.5.0): A high-volume, minimal dependency distributed tracing backend.
- **Jaeger** (v1.59.0): Distributed tracing system (configured as a secondary UI/storage).
- **OpenTelemetry Collector** (v0.104.0): A vendor-agnostic implementation for receiving, processing, and exporting telemetry data.
## Grafana
## Architecture
Grafana is a multi-platform open-source analytics and interactive visualization web application. It provides charts,
graphs, and alerts for the web when connected to supported data sources.
1. **Telemetry Collection**: Applications send OTLP (OpenTelemetry Protocol) data (Metrics, Logs, Traces) to the **OpenTelemetry Collector**.
2. **Processing & Exporting**: The Collector processes the data (batching, memory limiting) and exports it to the respective backends:
- **Traces** -> **Tempo** (Primary) & **Jaeger** (Secondary/Optional)
- **Metrics** -> **Prometheus** (via scraping the Collector's exporter)
- **Logs** -> **Loki**
3. **Visualization**: **Grafana** connects to all backends (Prometheus, Tempo, Loki, Jaeger) to provide a unified dashboard experience.
## Loki
## Features
Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It is
designed to be very cost-effective and easy to operate. It does not index the contents of the logs, but rather a set of
labels for each log stream.
- **Full Persistence**: All data (Metrics, Logs, Traces) is persisted to Docker volumes, ensuring no data loss on restart.
- **Correlation**: Seamless navigation between Metrics, Logs, and Traces in Grafana.
- Jump from a Metric spike to relevant Traces.
- Jump from a Trace to relevant Logs.
- **High Performance**: Optimized configurations for batching, compression, and memory management.
- **Standardized Protocols**: Built entirely on OpenTelemetry standards.
## Jaeger
## Quick Start
Jaeger is a distributed tracing system released as open source by Uber Technologies. It is used for monitoring and
troubleshooting microservices-based distributed systems, including:
### Prerequisites
- Distributed context propagation
- Distributed transaction monitoring
- Root cause analysis
- Service dependency analysis
- Performance / latency optimization
- Docker
- Docker Compose
## Otel Collector
### Deploy
The OpenTelemetry Collector offers a vendor-agnostic implementation on how to receive, process, and export telemetry
data. It removes the need to run, operate, and maintain multiple agents/collectors in order to support open-source
observability data formats (e.g. Jaeger, Prometheus, etc.) sending to one or more open-source or commercial back-ends.
## How to use
To deploy the observability stack, run the following command:
- Docker (latest version):
Run the following command to start the entire stack:
```bash
docker compose -f docker-compose.yml -f docker-compose.override.yml up -d
docker compose up -d
```
- Docker Compose v2.0.0 or earlier:
### Access Dashboards
| Service | URL | Credentials | Description |
| :--- | :--- | :--- | :--- |
| **Grafana** | [http://localhost:3000](http://localhost:3000) | `admin` / `admin` | Main visualization hub. |
| **Prometheus** | [http://localhost:9090](http://localhost:9090) | - | Metric queries and status. |
| **Jaeger UI** | [http://localhost:16686](http://localhost:16686) | - | Secondary trace visualization. |
| **Tempo** | [http://localhost:3200](http://localhost:3200) | - | Tempo status/metrics. |
## Configuration
### Data Persistence
Data is stored in the following Docker volumes:
- `prometheus-data`: Prometheus metrics
- `tempo-data`: Tempo traces (WAL and Blocks)
- `loki-data`: Loki logs (Chunks and Rules)
- `jaeger-data`: Jaeger traces (Badger DB)
To clear all data:
```bash
docker-compose -f docker-compose.yml -f docker-compose.override.yml up -d
docker compose down -v
```
To access the Grafana dashboard, navigate to `http://localhost:3000` in your browser. The default username and password
are `admin` and `admin`, respectively.
To access the Jaeger dashboard, navigate to `http://localhost:16686` in your browser.
To access the Prometheus dashboard, navigate to `http://localhost:9090` in your browser.
## How to stop
To stop the observability stack, run the following command:
```bash
docker compose -f docker-compose.yml -f docker-compose.override.yml down
```
## How to remove data
To remove the data generated by the observability stack, run the following command:
```bash
docker compose -f docker-compose.yml -f docker-compose.override.yml down -v
```
## How to configure
To configure the observability stack, modify the `docker-compose.override.yml` file. The file contains the following
```yaml
services:
prometheus:
environment:
- PROMETHEUS_CONFIG_FILE=/etc/prometheus/prometheus.yml
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
grafana:
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
volumes:
- ./grafana/provisioning:/etc/grafana/provisioning
```
The `prometheus` service mounts the `prometheus.yml` file to `/etc/prometheus/prometheus.yml`. The `grafana` service
mounts the `grafana/provisioning` directory to `/etc/grafana/provisioning`. You can modify these files to configure the
observability stack.
### Customization
- **Prometheus**: Edit `prometheus.yml` to add scrape targets or alerting rules.
- **Grafana**: Dashboards and datasources are provisioned from the `grafana/` directory.
- **Collector**: Edit `otel-collector-config.yaml` to modify pipelines, processors, or exporters.
## Troubleshooting
- **Service Health**: Check the health of services using `docker compose ps`.
- **Logs**: View logs for a specific service using `docker compose logs -f <service_name>`.
- **Otel Collector**: Check `http://localhost:13133` for health status and `http://localhost:1888/debug/pprof/` for profiling.

View File

@@ -1,27 +1,85 @@
## Deploying the Observability Stack
# RustFS Observability Stack
The OpenTelemetry Collector provides a vendor-neutral solution for receiving, processing, and exporting telemetry data. It removes the need to run and maintain multiple agents/collectors to support the various open-source observability data formats (e.g.
Jaeger, Prometheus).
This directory contains the comprehensive observability stack for RustFS, designed to provide deep insights into application performance, logs, and traces.
### Quick Deployment
## Components
1. Enter the `.docker/observability` directory
2. Run the following command to start the services:
The stack is composed of the following best-in-class open-source components:
- **Prometheus** (v2.53.1): The industry standard for metric collection and alerting.
- **Grafana** (v11.1.0): The leading platform for observability visualization.
- **Loki** (v3.1.0): A horizontally scalable, highly available, multi-tenant log aggregation system.
- **Tempo** (v2.5.0): A high-volume, minimal-dependency distributed tracing backend.
- **Jaeger** (v1.59.0): Distributed tracing system (configured as a secondary UI/storage).
- **OpenTelemetry Collector** (v0.104.0): A vendor-agnostic implementation for receiving, processing, and exporting telemetry data.
## Architecture
1. **Telemetry Collection**: Applications send OTLP (OpenTelemetry Protocol) data (Metrics, Logs, Traces) to the **OpenTelemetry Collector**.
2. **Processing & Exporting**: The Collector processes the data (batching, memory limiting) and exports it to the respective backends:
   - **Traces** -> **Tempo** (Primary) & **Jaeger** (Secondary/Optional)
   - **Metrics** -> **Prometheus** (via scraping the Collector's exporter)
   - **Logs** -> **Loki**
3. **Visualization**: **Grafana** connects to all backends (Prometheus, Tempo, Loki, Jaeger) to provide a unified dashboard experience.
## Features
- **Full Persistence**: All data (Metrics, Logs, Traces) is persisted to Docker volumes, ensuring no data loss on restart.
- **Correlation**: Seamless navigation between Metrics, Logs, and Traces in Grafana.
  - Jump from a metric spike to the relevant traces.
  - Jump from a trace to the relevant logs.
- **High Performance**: Optimized configurations for batching, compression, and memory management.
- **Standardized Protocols**: Built entirely on OpenTelemetry standards.
## Quick Start
### Prerequisites
- Docker
- Docker Compose
### Deploy
Run the following command to start the entire stack:
```bash
docker compose -f docker-compose.yml up -d
docker compose up -d
```
### Accessing the Monitoring Panels
### Accessing the Dashboards
Once the services are up, the dashboards are available at the following addresses:
| Service | URL | Credentials | Description |
| :--- | :--- | :--- | :--- |
| **Grafana** | [http://localhost:3000](http://localhost:3000) | `admin` / `admin` | Main visualization hub. |
| **Prometheus** | [http://localhost:9090](http://localhost:9090) | - | Metric queries and status. |
| **Jaeger UI** | [http://localhost:16686](http://localhost:16686) | - | Secondary trace visualization. |
| **Tempo** | [http://localhost:3200](http://localhost:3200) | - | Tempo status/metrics. |
- Grafana: `http://localhost:3000` (default username/password: `admin`/`admin`)
- Jaeger: `http://localhost:16686`
- Prometheus: `http://localhost:9090`
## Configuration
## Configuring Observability
### Data Persistence
```shell
export RUSTFS_OBS_ENDPOINT="http://localhost:4317" # OpenTelemetry Collector address
```
Data is stored in the following Docker volumes:
- `prometheus-data`: Prometheus metrics
- `tempo-data`: Tempo traces (WAL and Blocks)
- `loki-data`: Loki logs (Chunks and Rules)
- `jaeger-data`: Jaeger traces (Badger DB)
To clear all data:
```bash
docker compose down -v
```
### Customization
- **Prometheus**: Edit `prometheus.yml` to add scrape targets or alerting rules.
- **Grafana**: Dashboards and datasources are provisioned from the `grafana/` directory.
- **Collector**: Edit `otel-collector-config.yaml` to modify pipelines, processors, or exporters.
## Troubleshooting
- **Service Health**: Check the health of services using `docker compose ps`.
- **Logs**: View logs for a specific service using `docker compose logs -f <service_name>`.
- **Otel Collector**: Check `http://localhost:13133` for health status and `http://localhost:1888/debug/pprof/` for profiling.

View File

@@ -14,6 +14,8 @@
services:
# --- Tracing ---
tempo-init:
image: busybox:latest
command: [ "sh", "-c", "chown -R 10001:10001 /var/tempo" ]
@@ -26,24 +28,104 @@ services:
tempo:
image: grafana/tempo:latest
user: "10001" # The container must be started with root to execute chown in the script
command: [ "-config.file=/etc/tempo.yaml" ] # This is passed as a parameter to the entry point script
user: "10001"
command: [ "-config.file=/etc/tempo.yaml" ]
volumes:
- ./tempo.yaml:/etc/tempo.yaml:ro
- ./tempo-data:/var/tempo
ports:
- "3200:3200" # tempo
- "24317:4317" # otlp grpc
- "24318:4318" # otlp http
- "4317" # otlp grpc
- "4318" # otlp http
restart: unless-stopped
networks:
- otel-network
healthcheck:
test: [ "CMD", "wget", "--spider", "-q", "http://localhost:3200/metrics" ]
test: [ "CMD-SHELL", "wget --spider -q http://localhost:3200/metrics || exit 1" ]
interval: 10s
timeout: 5s
retries: 5
start_period: 40s
jaeger:
image: jaegertracing/jaeger:latest
environment:
- TZ=Asia/Shanghai
- SPAN_STORAGE_TYPE=badger
- BADGER_EPHEMERAL=false
- BADGER_DIRECTORY_VALUE=/badger/data
- BADGER_DIRECTORY_KEY=/badger/key
- COLLECTOR_OTLP_ENABLED=true
volumes:
- ./jaeger-data:/badger
ports:
- "16686:16686" # Web UI
- "14269:14269" # Admin/Metrics
- "4317"
- "4318"
networks:
- otel-network
healthcheck:
test: [ "CMD-SHELL", "wget --spider -q http://localhost:14269 || exit 1" ]
interval: 10s
timeout: 5s
retries: 5
start_period: 20s
# --- Metrics ---
prometheus:
image: prom/prometheus:latest
environment:
- TZ=Asia/Shanghai
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
- ./prometheus-data:/prometheus
ports:
- "9090:9090"
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--web.enable-otlp-receiver'
- '--web.enable-remote-write-receiver'
- '--enable-feature=promql-experimental-functions'
- '--storage.tsdb.min-block-duration=2h'
- '--storage.tsdb.max-block-duration=2h'
- '--log.level=info'
- '--storage.tsdb.retention.time=30d'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/usr/share/prometheus/console_libraries'
- '--web.console.templates=/usr/share/prometheus/consoles'
restart: unless-stopped
networks:
- otel-network
healthcheck:
test: [ "CMD-SHELL", "wget --spider -q http://localhost:9090/-/healthy || exit 1" ]
interval: 10s
timeout: 5s
retries: 3
start_period: 15s
# --- Logging ---
loki:
image: grafana/loki:latest
environment:
- TZ=Asia/Shanghai
volumes:
- ./loki-config.yaml:/etc/loki/local-config.yaml:ro
- ./loki-data:/loki
ports:
- "3100:3100"
command: -config.file=/etc/loki/local-config.yaml
networks:
- otel-network
healthcheck:
test: [ "CMD-SHELL", "wget --spider -q http://localhost:3100/metrics || exit 1" ]
interval: 15s
timeout: 10s
retries: 5
start_period: 60s
# --- Collection ---
otel-collector:
image: otel/opentelemetry-collector-contrib:latest
@@ -62,94 +144,33 @@ services:
networks:
- otel-network
depends_on:
jaeger:
condition: service_started
tempo:
condition: service_started
prometheus:
condition: service_started
loki:
condition: service_started
- tempo
- jaeger
- prometheus
- loki
healthcheck:
test: [ "CMD", "wget", "--spider", "-q", "http://localhost:13133" ]
test: [ "CMD-SHELL", "wget --spider -q http://localhost:13133 || exit 1" ]
interval: 10s
timeout: 5s
retries: 3
start_period: 20s
# --- Visualization ---
jaeger:
image: jaegertracing/jaeger:latest
environment:
- TZ=Asia/Shanghai
- SPAN_STORAGE_TYPE=memory
- COLLECTOR_OTLP_ENABLED=true
ports:
- "16686:16686" # Web UI
- "14317:4317" # OTLP gRPC
- "14318:4318" # OTLP HTTP
- "18888:8888" # collector
networks:
- otel-network
healthcheck:
test: [ "CMD", "wget", "--spider", "-q", "http://localhost:16686" ]
interval: 10s
timeout: 5s
retries: 3
prometheus:
image: prom/prometheus:latest
environment:
- TZ=Asia/Shanghai
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
- ./prometheus-data:/prometheus
ports:
- "9090:9090"
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--web.enable-otlp-receiver' # Enable OTLP
- '--web.enable-remote-write-receiver' # Enable remote write
- '--enable-feature=promql-experimental-functions' # Enable info()
- '--storage.tsdb.min-block-duration=15m' # Minimum block duration
- '--storage.tsdb.max-block-duration=1h' # Maximum block duration
- '--log.level=info'
- '--storage.tsdb.retention.time=30d'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/usr/share/prometheus/console_libraries'
- '--web.console.templates=/usr/share/prometheus/consoles'
restart: unless-stopped
networks:
- otel-network
healthcheck:
test: [ "CMD", "wget", "--spider", "-q", "http://localhost:9090/-/healthy" ]
interval: 10s
timeout: 5s
retries: 3
loki:
image: grafana/loki:latest
environment:
- TZ=Asia/Shanghai
volumes:
- ./loki-config.yaml:/etc/loki/local-config.yaml:ro
ports:
- "3100:3100"
command: -config.file=/etc/loki/local-config.yaml
networks:
- otel-network
healthcheck:
test: [ "CMD", "wget", "--spider", "-q", "http://localhost:3100/ready" ]
interval: 10s
timeout: 5s
retries: 3
grafana:
image: grafana/grafana:latest
ports:
- "3000:3000" # Web UI
- "3000:3000"
volumes:
- ./grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
- ./grafana/provisioning:/etc/grafana/provisioning
- ./grafana/dashboards:/var/lib/grafana/dashboards
- ./grafana-data:/var/lib/grafana
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
- GF_SECURITY_ADMIN_USER=admin
- TZ=Asia/Shanghai
- GF_INSTALL_PLUGINS=grafana-pyroscope-datasource
- GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH=/var/lib/grafana/dashboards/home.json
restart: unless-stopped
networks:
- otel-network
@@ -158,7 +179,7 @@ services:
- tempo
- loki
healthcheck:
test: [ "CMD", "wget", "--spider", "-q", "http://localhost:3000/api/health" ]
test: [ "CMD-SHELL", "wget --spider -q http://localhost:3000/api/health || exit 1" ]
interval: 10s
timeout: 5s
retries: 3
@@ -166,11 +187,14 @@ services:
volumes:
prometheus-data:
tempo-data:
loki-data:
jaeger-data:
grafana-data:
networks:
otel-network:
driver: bridge
name: "network_otel_config"
name: "network_otel"
ipam:
config:
- subnet: 172.28.0.0/16

View File

@@ -0,0 +1 @@
*

View File

@@ -1,3 +1,17 @@
# Copyright 2024 RustFS Team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: 1
datasources:
@@ -7,102 +21,77 @@ datasources:
access: proxy
orgId: 1
url: http://prometheus:9090
basicAuth: false
isDefault: false
isDefault: true
version: 1
editable: false
jsonData:
httpMethod: GET
exemplarTraceIdDestinations:
- name: trace_id
datasourceUid: tempo
- name: Tempo
type: tempo
uid: tempo
access: proxy
orgId: 1
url: http://tempo:3200
basicAuth: false
isDefault: true
isDefault: false
version: 1
editable: false
apiVersion: 1
uid: tempo
jsonData:
httpMethod: GET
serviceMap:
datasourceUid: prometheus
streamingEnabled:
search: true
tracesToLogsV2:
# Field with an internal link pointing to a logs data source in Grafana.
# datasourceUid value must match the uid value of the logs data source.
datasourceUid: 'loki'
spanStartTimeShift: '-1h'
spanEndTimeShift: '1h'
tags: [ 'job', 'instance', 'pod', 'namespace' ]
filterByTraceID: false
tracesToLogs:
datasourceUid: loki
tags: [ 'job', 'instance', 'pod', 'namespace', 'service.name' ]
mappedTags: [ { key: 'service.name', value: 'app' } ]
spanStartTimeShift: '1s'
spanEndTimeShift: '-1s'
filterByTraceID: true
filterBySpanID: false
customQuery: true
query: 'method="$${__span.tags.method}"'
tracesToMetrics:
datasourceUid: 'prometheus'
spanStartTimeShift: '-1h'
spanEndTimeShift: '1h'
tags: [ { key: 'service.name', value: 'service' }, { key: 'job' } ]
datasourceUid: prometheus
tags: [ { key: 'service.name' }, { key: 'job' } ]
queries:
- name: 'Sample query'
query: 'sum(rate(traces_spanmetrics_latency_bucket{$$__tags}[5m]))'
tracesToProfiles:
datasourceUid: 'grafana-pyroscope-datasource'
tags: [ 'job', 'instance', 'pod', 'namespace' ]
profileTypeId: 'process_cpu:cpu:nanoseconds:cpu:nanoseconds'
customQuery: true
query: 'method="$${__span.tags.method}"'
serviceMap:
datasourceUid: 'prometheus'
- name: 'Service-Level Latency'
query: 'sum(rate(traces_spanmetrics_latency_bucket{$$__tags}[5m])) by (le)'
- name: 'Service-Level Calls'
query: 'sum(rate(traces_spanmetrics_calls_total{$$__tags}[5m]))'
- name: 'Service-Level Errors'
query: 'sum(rate(traces_spanmetrics_calls_total{status_code="ERROR", $$__tags}[5m]))'
nodeGraph:
enabled: true
search:
hide: false
traceQuery:
timeShiftEnabled: true
spanStartTimeShift: '-1h'
spanEndTimeShift: '1h'
spanBar:
type: 'Tag'
tag: 'http.path'
streamingEnabled:
search: true
- name: Loki
type: loki
uid: loki
orgId: 1
url: http://loki:3100
isDefault: false
version: 1
editable: false
jsonData:
derivedFields:
- datasourceUid: tempo
matcherRegex: 'trace_id=(\w+)'
name: 'TraceID'
url: '$${__value.raw}'
- name: Jaeger
type: jaeger
uid: Jaeger
uid: jaeger
url: http://jaeger:16686
basicAuth: false
access: proxy
readOnly: false
isDefault: false
editable: false
jsonData:
tracesToLogsV2:
# Field with an internal link pointing to a logs data source in Grafana.
# datasourceUid value must match the uid value of the logs data source.
datasourceUid: 'loki'
spanStartTimeShift: '1h'
spanEndTimeShift: '-1h'
tags: [ 'job', 'instance', 'pod', 'namespace' ]
filterByTraceID: false
tracesToLogs:
datasourceUid: loki
tags: [ 'job', 'instance', 'pod', 'namespace', 'service.name' ]
mappedTags: [ { key: 'service.name', value: 'app' } ]
spanStartTimeShift: '1s'
spanEndTimeShift: '-1s'
filterByTraceID: true
filterBySpanID: false
customQuery: true
query: 'method="$${__span.tags.method}"'
tracesToMetrics:
datasourceUid: 'Prometheus'
spanStartTimeShift: '1h'
spanEndTimeShift: '-1h'
tags: [ { key: 'service.name', value: 'service' }, { key: 'job' } ]
queries:
- name: 'Sample query'
query: 'sum(rate(traces_spanmetrics_latency_bucket{$$__tags}[5m]))'
nodeGraph:
enabled: true
traceQuery:
timeShiftEnabled: true
spanStartTimeShift: '1h'
spanEndTimeShift: '-1h'
spanBar:
type: 'None'

View File

@@ -31,29 +31,19 @@ service:
host: 0.0.0.0
port: 8888
logs:
level: debug
# TODO Initialize telemetry tracer once OTEL released new feature.
# https://github.com/open-telemetry/opentelemetry-collector/issues/10663
level: info
extensions:
healthcheckv2:
use_v2: true
http:
# pprof:
# endpoint: 0.0.0.0:1777
# zpages:
# endpoint: 0.0.0.0:55679
jaeger_query:
storage:
traces: some_store
traces_archive: another_store
traces: badger_store
ui:
config_file: ./cmd/jaeger/config-ui.json
log_access: true
# The maximum duration that is considered for clock skew adjustments.
# Defaults to 0 seconds, which means it's disabled.
max_clock_skew_adjust: 0s
grpc:
endpoint: 0.0.0.0:16685
@@ -62,26 +52,16 @@ extensions:
jaeger_storage:
backends:
some_store:
memory:
max_traces: 1000000
max_events: 100000
another_store:
memory:
max_traces: 1000000
metric_backends:
some_metrics_storage:
prometheus:
endpoint: http://prometheus:9090
normalize_calls: true
normalize_duration: true
badger_store:
badger:
ephemeral: false
directory_key: /badger/key
directory_value: /badger/data
span_store_ttl: 72h
remote_sampling:
# You can either use file or adaptive sampling strategy in remote_sampling
# file:
# path: ./cmd/jaeger/sampling-strategies.json
adaptive:
sampling_store: some_store
sampling_store: badger_store
initial_sampling_probability: 0.1
http:
grpc:
@@ -103,12 +83,8 @@ receivers:
processors:
batch:
metadata_keys: [ "span.kind", "http.method", "http.status_code", "db.system", "db.statement", "messaging.system", "messaging.destination", "messaging.operation","span.events","span.links" ]
# Adaptive Sampling Processor is required to support adaptive sampling.
# It expects remote_sampling extension with `adaptive:` config to be enabled.
adaptive_sampling:
exporters:
jaeger_storage_exporter:
trace_storage: some_store
trace_storage: badger_store

View File

@@ -0,0 +1 @@
*

View File

@@ -17,16 +17,16 @@ auth_enabled: false
server:
http_listen_port: 3100
grpc_listen_port: 9096
log_level: debug
log_level: info
grpc_server_max_concurrent_streams: 1000
common:
instance_addr: 127.0.0.1
path_prefix: /tmp/loki
path_prefix: /loki
storage:
filesystem:
chunks_directory: /tmp/loki/chunks
rules_directory: /tmp/loki/rules
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
ring:
kvstore:
@@ -66,17 +66,3 @@ ruler:
frontend:
encoding: protobuf
# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
#
# Statistics help us better understand how Loki is used, and they show us performance
# levels for most users. This helps us prioritize features and documentation.
# For more information on what's sent, look at
# https://github.com/grafana/loki/blob/main/pkg/analytics/stats.go
# Refer to the buildReport method to see what goes into a report.
#
# If you would like to disable reporting, uncomment the following lines:
#analytics:
# reporting_enabled: false

View File

@@ -0,0 +1 @@
*

View File

@@ -15,69 +15,70 @@
receivers:
otlp:
protocols:
grpc: # OTLP gRPC receiver
grpc:
endpoint: 0.0.0.0:4317
http: # OTLP HTTP receiver
http:
endpoint: 0.0.0.0:4318
processors:
batch: # Batch processor to improve throughput
timeout: 5s
send_batch_size: 1000
metadata_keys: [ ]
metadata_cardinality_limit: 1000
batch:
timeout: 1s
send_batch_size: 1024
memory_limiter:
check_interval: 1s
limit_mib: 512
limit_mib: 1024
spike_limit_mib: 256
transform/logs:
log_statements:
- context: log
statements:
# Extract Body as attribute "message"
- set(attributes["message"], body.string)
# Retain the original Body
- set(attributes["log.body"], body.string)
exporters:
otlp/traces: # OTLP exporter for trace data
endpoint: "http://jaeger:4317" # OTLP gRPC endpoint for Jaeger
otlp/tempo:
endpoint: "tempo:4317"
tls:
insecure: true # TLS is disabled in the development environment and a certificate needs to be configured in the production environment.
compression: gzip # Enable compression to reduce network bandwidth
insecure: true
compression: gzip
retry_on_failure:
enabled: true # Enable retry on failure
initial_interval: 1s # Initial interval for retry
max_interval: 30s # Maximum interval for retry
max_elapsed_time: 300s # Maximum elapsed time for retry
enabled: true
initial_interval: 1s
max_interval: 30s
max_elapsed_time: 300s
sending_queue:
enabled: true # Enable sending queue
num_consumers: 10 # Number of consumers
queue_size: 5000 # Queue size
otlp/tempo: # OTLP exporter for trace data
endpoint: "http://tempo:4317" # OTLP gRPC endpoint for tempo
enabled: true
num_consumers: 10
queue_size: 5000
otlp/jaeger:
endpoint: "jaeger:4317"
tls:
insecure: true # TLS is disabled in the development environment and a certificate needs to be configured in the production environment.
compression: gzip # Enable compression to reduce network bandwidth
insecure: true
compression: gzip
retry_on_failure:
enabled: true # Enable retry on failure
initial_interval: 1s # Initial interval for retry
max_interval: 30s # Maximum interval for retry
max_elapsed_time: 300s # Maximum elapsed time for retry
enabled: true
initial_interval: 1s
max_interval: 30s
max_elapsed_time: 300s
sending_queue:
enabled: true # Enable sending queue
num_consumers: 10 # Number of consumers
queue_size: 5000 # Queue size
prometheus: # Prometheus exporter for metrics data
endpoint: "0.0.0.0:8889" # Prometheus scraping endpoint
send_timestamps: true # Send timestamp
metric_expiration: 5m # Metric expiration time
enabled: true
num_consumers: 10
queue_size: 5000
prometheus:
endpoint: "0.0.0.0:8889"
send_timestamps: true
metric_expiration: 5m
resource_to_telemetry_conversion:
enabled: true # Enable resource to telemetry conversion
otlphttp/loki: # Loki exporter for log data
enabled: true
otlphttp/loki:
endpoint: "http://loki:3100/otlp"
tls:
insecure: true
compression: gzip # Enable compression to reduce network bandwidth
compression: gzip
extensions:
health_check:
endpoint: 0.0.0.0:13133
@@ -85,13 +86,14 @@ extensions:
endpoint: 0.0.0.0:1888
zpages:
endpoint: 0.0.0.0:55679
service:
extensions: [ health_check, pprof, zpages ] # Enable extension
extensions: [ health_check, pprof, zpages ]
pipelines:
traces:
receivers: [ otlp ]
processors: [ memory_limiter, batch ]
exporters: [ otlp/traces, otlp/tempo ]
exporters: [ otlp/tempo, otlp/jaeger ]
metrics:
receivers: [ otlp ]
processors: [ batch ]
@@ -102,20 +104,13 @@ service:
exporters: [ otlphttp/loki ]
telemetry:
logs:
level: "debug" # Collector log level
encoding: "json" # Log encoding: console or json
level: "info"
encoding: "json"
metrics:
level: "detailed" # Can be basic, normal, detailed
level: "normal"
readers:
- periodic:
exporter:
otlp:
protocol: http/protobuf
endpoint: http://otel-collector:4318
- pull:
exporter:
prometheus:
host: '0.0.0.0'
port: 8888

View File

@@ -17,27 +17,40 @@ global:
evaluation_interval: 15s
external_labels:
cluster: 'rustfs-dev' # Label to identify the cluster
relica: '1' # Replica identifier
replica: '1' # Replica identifier
scrape_configs:
- job_name: 'otel-collector-internal'
- job_name: 'otel-collector'
static_configs:
- targets: [ 'otel-collector:8888' ] # Scrape metrics from Collector
scrape_interval: 10s
- job_name: 'rustfs-app-metrics'
static_configs:
- targets: [ 'otel-collector:8889' ] # Application indicators
scrape_interval: 15s
metric_relabel_configs:
- source_labels: [ __name__ ]
regex: 'go_.*'
action: drop # Drop Go runtime metrics if not needed
- job_name: 'tempo'
static_configs:
- targets: [ 'tempo:3200' ] # Scrape metrics from Tempo
- job_name: 'jaeger'
static_configs:
- targets: [ 'jaeger:8888' ] # Jaeger admin port
- targets: [ 'jaeger:14269' ] # Jaeger admin port (14269 is standard for admin/metrics)
- job_name: 'loki'
static_configs:
- targets: [ 'loki:3100' ]
- job_name: 'prometheus'
static_configs:
- targets: [ 'localhost:9090' ]
otlp:
# Recommended attributes to be promoted to labels.
promote_resource_attributes:
- service.instance.id
- service.name
@@ -56,10 +69,8 @@ otlp:
- k8s.pod.name
- k8s.replicaset.name
- k8s.statefulset.name
# Ingest OTLP data keeping all characters in metric/label names.
translation_strategy: NoUTF8EscapingWithSuffixes
storage:
# OTLP is a push-based protocol, Out of order samples is a common scenario.
tsdb:
out_of_order_time_window: 30m

View File

@@ -1,18 +1,21 @@
stream_over_http_enabled: true
# Copyright 2024 RustFS Team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
server:
http_listen_port: 3200
log_level: info
query_frontend:
search:
duration_slo: 5s
throughput_bytes_slo: 1.073741824e+09
metadata_slo:
duration_slo: 5s
throughput_bytes_slo: 1.073741824e+09
trace_by_id:
duration_slo: 5s
distributor:
receivers:
otlp:
@@ -25,10 +28,6 @@ distributor:
ingester:
max_block_duration: 5m # cut the headblock when this much time passes. this is being set for demo purposes and should probably be left alone normally
compactor:
compaction:
block_retention: 1h # overall Tempo trace retention. set for demo purposes
metrics_generator:
registry:
external_labels:
@@ -49,9 +48,3 @@ storage:
path: /var/tempo/wal # where to store the wal locally
local:
path: /var/tempo/blocks
overrides:
defaults:
metrics_generator:
processors: [ service-graphs, span-metrics, local-blocks ] # enables metrics generator
generate_native_histograms: both

View File

@@ -5,71 +5,57 @@
English | [中文](README_ZH.md)
This directory contains the configuration files for setting up an observability stack with OpenObserve and OpenTelemetry
Collector.
This directory contains the configuration for an **alternative** observability stack using OpenObserve.
### Overview
## ⚠️ Note
This setup provides a complete observability solution for your applications:
For the **recommended** observability stack (Prometheus, Grafana, Tempo, Loki), please see `../observability/`.
- **OpenObserve**: A modern, open-source observability platform for logs, metrics, and traces.
- **OpenTelemetry Collector**: Collects and processes telemetry data before sending it to OpenObserve.
## 🌟 Overview
### Setup Instructions
OpenObserve is a lightweight, all-in-one observability platform that handles logs, metrics, and traces in a single binary. This setup is ideal for:
- Resource-constrained environments.
- Quick setup and testing.
- Users who prefer a unified UI.
1. **Prerequisites**:
- Docker and Docker Compose installed
- Sufficient memory resources (minimum 2GB recommended)
## 🚀 Quick Start
### 1. Start Services
2. **Starting the Services**:
```bash
cd .docker/openobserve-otel
docker compose -f docker-compose.yml up -d
docker compose up -d
```
3. **Accessing the Dashboard**:
- OpenObserve UI: http://localhost:5080
- Default credentials:
- Username: root@rustfs.com
- Password: rustfs123
### 2. Access Dashboard
### Configuration
- **URL**: [http://localhost:5080](http://localhost:5080)
- **Username**: `root@rustfs.com`
- **Password**: `rustfs123`
#### OpenObserve Configuration
## 🛠️ Configuration
The OpenObserve service is configured with:
### OpenObserve
- Root user credentials
- Data persistence through a volume mount
- Memory cache enabled
- Health checks
- Exposed ports:
- 5080: HTTP API and UI
- 5081: OTLP gRPC
- **Persistence**: Data is persisted to a Docker volume.
- **Ports**:
- `5080`: HTTP API and UI
- `5081`: OTLP gRPC
#### OpenTelemetry Collector Configuration
### OpenTelemetry Collector
The collector is configured to:
- **Receivers**: OTLP (gRPC `4317`, HTTP `4318`)
- **Exporters**: Sends data to OpenObserve.
- Receive telemetry data via OTLP (HTTP and gRPC)
- Collect logs from files
- Process data in batches
- Export data to OpenObserve
- Manage memory usage
## 🔗 Integration
### Integration with Your Application
Configure your application to send OTLP data to the collector:
To send telemetry data from your application, configure your OpenTelemetry SDK to send data to:
- **Endpoint**: `http://localhost:4318` (HTTP) or `localhost:4317` (gRPC)
- OTLP gRPC: `localhost:4317`
- OTLP HTTP: `localhost:4318`
For example, in a Rust application using the `rustfs-obs` library:
Example for RustFS:
```bash
export RUSTFS_OBS_ENDPOINT=http://localhost:4318
export RUSTFS_OBS_SERVICE_NAME=yourservice
export RUSTFS_OBS_SERVICE_VERSION=1.0.0
export RUSTFS_OBS_ENVIRONMENT=development
export RUSTFS_OBS_SERVICE_NAME=rustfs-node-1
```
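
As a sketch, wiring these variables into a small launcher script could look like this (the endpoint and service name values are examples, not required values):

```shell
# Point the RustFS OpenTelemetry exporter at the local collector's OTLP HTTP port.
export RUSTFS_OBS_ENDPOINT="http://localhost:4318"
export RUSTFS_OBS_SERVICE_NAME="rustfs-node-1"

echo "OTLP endpoint: ${RUSTFS_OBS_ENDPOINT}"
```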

View File

@@ -5,71 +5,57 @@
[English](README.md) | 中文
## 中文
本目录包含使用 OpenObserve 的**替代**可观测性技术栈配置。
本目录包含搭建 OpenObserve 和 OpenTelemetry Collector 可观测性栈的配置文件。
## ⚠️ 注意
### 概述
对于**推荐**的可观测性技术栈Prometheus, Grafana, Tempo, Loki请参阅 `../observability/`
此设置为应用程序提供了完整的可观测性解决方案:
## 🌟 概览
- **OpenObserve**:现代化、开源的可观测性平台,用于日志、指标和追踪。
- **OpenTelemetry Collector**:收集和处理遥测数据,然后将其发送到 OpenObserve
OpenObserve 是一个轻量级、一体化的可观测性平台,在一个二进制文件中处理日志、指标和追踪。此设置非常适合:
- 资源受限的环境
- 快速设置和测试。
- 喜欢统一 UI 的用户。
### 设置说明
## 🚀 快速开始
1. **前提条件**
- 已安装 Docker 和 Docker Compose
- 足够的内存资源(建议至少 2GB
### 1. 启动服务
2. **启动服务**
```bash
cd .docker/openobserve-otel
docker compose -f docker-compose.yml up -d
docker compose up -d
```
3. **访问仪表板**
- OpenObserve UIhttp://localhost:5080
- 默认凭据:
- 用户名root@rustfs.com
- 密码rustfs123
### 2. 访问仪表盘
### 配置
- **URL**: [http://localhost:5080](http://localhost:5080)
- **用户名**: `root@rustfs.com`
- **密码**: `rustfs123`
#### OpenObserve 配置
## 🛠️ 配置
OpenObserve 服务配置:
### OpenObserve
- 根用户凭据
- 通过卷挂载实现数据持久化
- 启用内存缓存
- 健康检查
- 暴露端口:
- 5080HTTP API 和 UI
- 5081OTLP gRPC
- **持久化**: 数据持久化到 Docker 卷。
- **端口**:
- `5080`: HTTP API 和 UI
- `5081`: OTLP gRPC
#### OpenTelemetry Collector 配置
### OpenTelemetry Collector
收集器配置为:
- **接收器**: OTLP (gRPC `4317`, HTTP `4318`)
- **导出器**: 将数据发送到 OpenObserve。
- 通过 OTLPHTTP 和 gRPC接收遥测数据
- 从文件中收集日志
- 批处理数据
- 将数据导出到 OpenObserve
- 管理内存使用
## 🔗 集成
### 与应用程序集成
配置您的应用程序将 OTLP 数据发送到收集器:
要从应用程序发送遥测数据,将 OpenTelemetry SDK 配置为发送数据到:
- **端点**: `http://localhost:4318` (HTTP) 或 `localhost:4317` (gRPC)
- OTLP gRPC:`localhost:4317`
- OTLP HTTP:`localhost:4318`
例如,在使用 `rustfs-obs` 库的 Rust 应用程序中:
RustFS 示例:
```bash
export RUSTFS_OBS_ENDPOINT=http://localhost:4317
export RUSTFS_OBS_SERVICE_NAME=yourservice
export RUSTFS_OBS_SERVICE_VERSION=1.0.0
export RUSTFS_OBS_ENVIRONMENT=development
export RUSTFS_OBS_ENDPOINT=http://localhost:4318
export RUSTFS_OBS_SERVICE_NAME=rustfs-node-1
```

Cargo.lock generated
View File

@@ -530,9 +530,9 @@ dependencies = [
[[package]]
name = "async-compression"
version = "0.4.40"
version = "0.4.41"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7d67d43201f4d20c78bcda740c142ca52482d81da80681533d33bf3f0596c8e2"
checksum = "d0f9ee0f6e02ffd7ad5816e9464499fba7b3effd01123b515c41d1697c43dad1"
dependencies = [
"compression-codecs",
"compression-core",
@@ -8119,7 +8119,7 @@ checksum = "9774ba4a74de5f7b1c1451ed6cd5285a32eddb5cccb8cc655a4e50009e06477f"
[[package]]
name = "s3s"
version = "0.13.0-alpha.3"
source = "git+https://github.com/s3s-project/s3s.git?rev=61b96d11de81c508ba5361864676824f318ef65c#61b96d11de81c508ba5361864676824f318ef65c"
source = "git+https://github.com/s3s-project/s3s.git?rev=218000387f4c3e67ad478bfc4587931f88b37006#218000387f4c3e67ad478bfc4587931f88b37006"
dependencies = [
"arc-swap",
"arrayvec",

View File

@@ -105,7 +105,7 @@ rustfs-protocols = { path = "crates/protocols", version = "0.0.5" }
# Async Runtime and Networking
async-channel = "2.5.0"
async-compression = { version = "0.4.40" }
async-compression = { version = "0.4.41" }
async-recursion = "1.1.1"
async-trait = "0.1.89"
axum = "0.8.8"
@@ -235,7 +235,7 @@ rumqttc = { version = "0.25.1" }
rustix = { version = "1.1.4", features = ["fs"] }
rust-embed = { version = "8.11.0" }
rustc-hash = { version = "2.1.1" }
s3s = { version = "0.13.0-alpha.3", features = ["minio"], git = "https://github.com/s3s-project/s3s.git", rev = "61b96d11de81c508ba5361864676824f318ef65c" }
s3s = { version = "0.13.0-alpha.3", features = ["minio"], git = "https://github.com/s3s-project/s3s.git", rev = "218000387f4c3e67ad478bfc4587931f88b37006" }
serial_test = "3.4.0"
shadow-rs = { version = "1.7.0", default-features = false }
siphasher = "1.0.2"

View File

@@ -1,3 +1,17 @@
# Copyright 2024 RustFS Team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
FROM alpine:3.23 AS build
ARG TARGETARCH

View File

@@ -1,3 +1,17 @@
# Copyright 2024 RustFS Team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
FROM ubuntu:24.04 AS build
ARG TARGETARCH

View File

@@ -1,4 +1,18 @@
# syntax=docker/dockerfile:1.6
# Copyright 2024 RustFS Team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Multi-stage Dockerfile for RustFS - LOCAL DEVELOPMENT ONLY
#
# IMPORTANT: This Dockerfile builds RustFS from source for local development and testing.

View File

@@ -150,8 +150,8 @@ pub const DEFAULT_CONSOLE_ADDRESS: &str = concat!(":", DEFAULT_CONSOLE_PORT);
/// Default region for rustfs
/// This is the default region for rustfs.
/// It is used to identify the region of the application.
/// Default value: cn-east-1
pub const RUSTFS_REGION: &str = "cn-east-1";
/// Default value: rustfs-global-0
pub const RUSTFS_REGION: &str = "rustfs-global-0";
/// Default log filename for rustfs
/// This is the default log filename for rustfs.

View File

@@ -42,7 +42,6 @@ lazy_static! {
static ref GLOBAL_RUSTFS_PORT: OnceLock<u16> = OnceLock::new();
static ref globalDeploymentIDPtr: OnceLock<Uuid> = OnceLock::new();
pub static ref GLOBAL_OBJECT_API: OnceLock<Arc<ECStore>> = OnceLock::new();
pub static ref GLOBAL_LOCAL_DISK: Arc<RwLock<Vec<Option<DiskStore>>>> = Arc::new(RwLock::new(Vec::new()));
pub static ref GLOBAL_IsErasure: RwLock<bool> = RwLock::new(false);
pub static ref GLOBAL_IsDistErasure: RwLock<bool> = RwLock::new(false);
pub static ref GLOBAL_IsErasureSD: RwLock<bool> = RwLock::new(false);
@@ -57,8 +56,8 @@ lazy_static! {
     pub static ref GLOBAL_LocalNodeName: String = "127.0.0.1:9000".to_string();
     pub static ref GLOBAL_LocalNodeNameHex: String = rustfs_utils::crypto::hex(GLOBAL_LocalNodeName.as_bytes());
     pub static ref GLOBAL_NodeNamesHex: HashMap<String, ()> = HashMap::new();
-    pub static ref GLOBAL_REGION: OnceLock<String> = OnceLock::new();
-    pub static ref GLOBAL_LOCAL_LOCK_CLIENT: OnceLock<Arc<dyn rustfs_lock::client::LockClient>> = OnceLock::new();
+    pub static ref GLOBAL_REGION: OnceLock<s3s::region::Region> = OnceLock::new();
+    pub static ref GLOBAL_LOCAL_LOCK_CLIENT: OnceLock<Arc<dyn LockClient>> = OnceLock::new();
     pub static ref GLOBAL_LOCK_CLIENTS: OnceLock<HashMap<String, Arc<dyn LockClient>>> = OnceLock::new();
     pub static ref GLOBAL_BUCKET_MONITOR: OnceLock<Arc<Monitor>> = OnceLock::new();
 }
@@ -243,20 +242,20 @@ type TypeLocalDiskSetDrives = Vec<Vec<Vec<Option<DiskStore>>>>;
 /// Set the global region
 ///
 /// # Arguments
-/// * `region` - The region string to set globally
+/// * `region` - The Region instance to set globally
 ///
 /// # Returns
 /// * None
-pub fn set_global_region(region: String) {
+pub fn set_global_region(region: s3s::region::Region) {
     GLOBAL_REGION.set(region).unwrap();
 }
 /// Get the global region
 ///
 /// # Returns
-/// * `Option<String>` - The global region string, if set
+/// * `Option<s3s::region::Region>` - The global region, if set
 ///
-pub fn get_global_region() -> Option<String> {
+pub fn get_global_region() -> Option<s3s::region::Region> {
     GLOBAL_REGION.get().cloned()
 }
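Both functions are thin wrappers around a write-once `OnceLock`. A std-only sketch of the same pattern (using `String` as a stand-in for `s3s::region::Region`, so nothing here depends on the s3s crate):

```rust
use std::sync::OnceLock;

// Stand-in for the GLOBAL_REGION static from the diff above.
static GLOBAL_REGION: OnceLock<String> = OnceLock::new();

/// Set the global region; panics if called twice, matching the
/// `GLOBAL_REGION.set(region).unwrap()` behavior in the diff.
fn set_global_region(region: String) {
    GLOBAL_REGION.set(region).unwrap();
}

/// Get the global region, if set.
fn get_global_region() -> Option<String> {
    GLOBAL_REGION.get().cloned()
}

fn main() {
    assert_eq!(get_global_region(), None);
    set_global_region("rustfs-global-0".to_string());
    assert_eq!(get_global_region().as_deref(), Some("rustfs-global-0"));
}
```

As in the diff, a second `set_global_region` call would panic on the `unwrap()`, which is why the server sets the region exactly once at startup.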


@@ -51,10 +51,10 @@ rustfs-utils = { workspace = true, features = ["path"] }
tokio-util.workspace = true
pollster.workspace = true
reqwest = { workspace = true }
url = { workspace = true }
moka = { workspace = true }
openidconnect = { workspace = true }
http = { workspace = true }
url = { workspace = true }
[dev-dependencies]
pollster.workspace = true


@@ -179,7 +179,7 @@ fn format_with_color(w: &mut dyn std::io::Write, now: &mut DeferredNow, record:
     let binding = std::thread::current();
     let thread_name = binding.name().unwrap_or("unnamed");
     let thread_id = format!("{:?}", std::thread::current().id());
-    writeln!(
+    write!(
         w,
         "[{}] {} [{}] [{}:{}] [{}:{}] {}",
         now.now().format(flexi_logger::TS_DASHES_BLANK_COLONS_DOT_BLANK),
@@ -200,7 +200,7 @@ fn format_for_file(w: &mut dyn std::io::Write, now: &mut DeferredNow, record: &R
     let binding = std::thread::current();
     let thread_name = binding.name().unwrap_or("unnamed");
     let thread_id = format!("{:?}", std::thread::current().id());
-    writeln!(
+    write!(
         w,
         "[{}] {} [{}] [{}:{}] [{}:{}] {}",
         now.now().format(flexi_logger::TS_DASHES_BLANK_COLONS_DOT_BLANK),


@@ -1,4 +1,17 @@
 #!/usr/bin/env bash
+# Copyright 2024 RustFS Team
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 set -e


@@ -1,3 +1,17 @@
+# Copyright 2024 RustFS Team
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 version: "3.9"
 services:


@@ -106,7 +106,37 @@ services:
     profiles:
       - dev
-  # OpenTelemetry Collector
+  # --- Observability Stack ---
+  tempo-init:
+    image: busybox:latest
+    command: [ "sh", "-c", "chown -R 10001:10001 /var/tempo" ]
+    volumes:
+      - tempo_data:/var/tempo
+    user: root
+    networks:
+      - rustfs-network
+    restart: "no"
+    profiles:
+      - observability
+  tempo:
+    image: grafana/tempo:latest
+    user: "10001"
+    command: [ "-config.file=/etc/tempo.yaml" ]
+    volumes:
+      - ./.docker/observability/tempo.yaml:/etc/tempo.yaml:ro
+      - tempo_data:/var/tempo
+    ports:
+      - "3200:3200" # tempo
+      - "4317" # otlp grpc
+      - "4318" # otlp http
+    restart: unless-stopped
+    networks:
+      - rustfs-network
+    profiles:
+      - observability
   otel-collector:
     image: otel/opentelemetry-collector-contrib:latest
     container_name: otel-collector
@@ -115,32 +145,45 @@ services:
     volumes:
       - ./.docker/observability/otel-collector-config.yaml:/etc/otelcol-contrib/otel-collector.yml:ro
     ports:
-      - "4317:4317" # OTLP gRPC receiver
-      - "4318:4318" # OTLP HTTP receiver
-      - "8888:8888" # Prometheus metrics
-      - "8889:8889" # Prometheus exporter metrics
+      - "1888:1888" # pprof
+      - "8888:8888" # Prometheus metrics for Collector
+      - "8889:8889" # Prometheus metrics for application indicators
+      - "13133:13133" # health check
+      - "4317:4317" # OTLP gRPC
+      - "4318:4318" # OTLP HTTP
+      - "55679:55679" # zpages
     networks:
       - rustfs-network
     restart: unless-stopped
     profiles:
       - observability
+    depends_on:
+      - tempo
+      - jaeger
+      - prometheus
+      - loki
   # Jaeger for tracing
   jaeger:
-    image: jaegertracing/all-in-one:latest
+    image: jaegertracing/jaeger:latest
     container_name: jaeger
-    ports:
-      - "16686:16686" # Jaeger UI
-      - "14250:14250" # Jaeger gRPC
     environment:
       - TZ=Asia/Shanghai
+      - SPAN_STORAGE_TYPE=badger
+      - BADGER_EPHEMERAL=false
+      - BADGER_DIRECTORY_VALUE=/badger/data
+      - BADGER_DIRECTORY_KEY=/badger/key
+      - COLLECTOR_OTLP_ENABLED=true
+    volumes:
+      - jaeger_data:/badger
+    ports:
+      - "16686:16686" # Web UI
+      - "14269:14269" # Admin/Metrics
     networks:
       - rustfs-network
     restart: unless-stopped
     profiles:
       - observability
   # Prometheus for metrics
   prometheus:
     image: prom/prometheus:latest
     container_name: prometheus
@@ -152,17 +195,35 @@ services:
     command:
       - "--config.file=/etc/prometheus/prometheus.yml"
       - "--storage.tsdb.path=/prometheus"
-      - "--web.console.libraries=/etc/prometheus/console_libraries"
-      - "--web.console.templates=/etc/prometheus/consoles"
+      - "--web.console.libraries=/usr/share/prometheus/console_libraries"
+      - "--web.console.templates=/usr/share/prometheus/consoles"
       - "--storage.tsdb.retention.time=200h"
       - "--web.enable-lifecycle"
+      - "--web.enable-otlp-receiver"
+      - "--web.enable-remote-write-receiver"
     networks:
       - rustfs-network
     restart: unless-stopped
     profiles:
       - observability
+  loki:
+    image: grafana/loki:latest
+    container_name: loki
+    environment:
+      - TZ=Asia/Shanghai
+    volumes:
+      - ./.docker/observability/loki-config.yaml:/etc/loki/local-config.yaml:ro
+      - loki_data:/loki
+    ports:
+      - "3100:3100"
+    command: -config.file=/etc/loki/local-config.yaml
+    networks:
+      - rustfs-network
+    restart: unless-stopped
+    profiles:
+      - observability
   # Grafana for visualization
   grafana:
     image: grafana/grafana:latest
     container_name: grafana
@@ -171,6 +232,7 @@ services:
     environment:
       - GF_SECURITY_ADMIN_USER=admin
       - GF_SECURITY_ADMIN_PASSWORD=admin
+      - GF_INSTALL_PLUGINS=grafana-pyroscope-datasource
     volumes:
       - grafana_data:/var/lib/grafana
       - ./.docker/observability/grafana/provisioning:/etc/grafana/provisioning:ro
@@ -180,6 +242,10 @@ services:
     restart: unless-stopped
     profiles:
       - observability
+    depends_on:
+      - prometheus
+      - tempo
+      - loki
   # NGINX reverse proxy (optional)
   nginx:
@@ -228,6 +294,12 @@ volumes:
     driver: local
   grafana_data:
     driver: local
+  tempo_data:
+    driver: local
+  loki_data:
+    driver: local
+  jaeger_data:
+    driver: local
   logs:
     driver: local
   cargo_registry:


@@ -324,7 +324,10 @@ impl Operation for ListTargetsArns {
             .clone()
             .ok_or_else(|| s3_error!(InvalidRequest, "region not found"))?;
-        let data_target_arn_list: Vec<_> = active_targets.iter().map(|id| id.to_arn(&region).to_string()).collect();
+        let data_target_arn_list: Vec<_> = active_targets
+            .iter()
+            .map(|id| id.to_arn(region.as_str()).to_string())
+            .collect();
         let data = serde_json::to_vec(&data_target_arn_list)
             .map_err(|e| s3_error!(InternalError, "failed to serialize targets: {}", e))?;


@@ -242,7 +242,7 @@ impl Operation for AdminOperation {
 #[derive(Debug, Clone)]
 pub struct Extra {
     pub credentials: Option<s3s::auth::Credentials>,
-    pub region: Option<String>,
+    pub region: Option<s3s::region::Region>,
     pub service: Option<String>,
 }


@@ -30,6 +30,7 @@ use crate::storage::*;
 use futures::StreamExt;
 use http::StatusCode;
 use metrics::counter;
+use rustfs_config::RUSTFS_REGION;
 use rustfs_ecstore::bucket::{
     lifecycle::bucket_lifecycle_ops::validate_transition_tier,
     metadata::{
@@ -57,6 +58,7 @@ use rustfs_targets::{
 use rustfs_utils::http::RUSTFS_FORCE_DELETE;
 use rustfs_utils::string::parse_bool;
 use s3s::dto::*;
+use s3s::region::Region;
 use s3s::xml;
 use s3s::{S3Error, S3ErrorCode, S3Request, S3Response, S3Result, s3_error};
 use std::{fmt::Display, sync::Arc};
@@ -73,8 +75,8 @@ fn to_internal_error(err: impl Display) -> S3Error {
S3Error::with_message(S3ErrorCode::InternalError, format!("{err}"))
}
fn resolve_notification_region(global_region: Option<String>, request_region: Option<String>) -> String {
global_region.unwrap_or_else(|| request_region.unwrap_or_default())
fn resolve_notification_region(global_region: Option<Region>, request_region: Option<Region>) -> Region {
global_region.unwrap_or_else(|| request_region.unwrap_or_else(|| Region::new(RUSTFS_REGION.into()).expect("valid region")))
}
#[derive(Debug, Clone, PartialEq, Eq)]
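The new signature makes the precedence explicit: the configured global region wins, then the per-request region, then the compiled-in default. A std-only sketch of that fallback chain (plain `String`s standing in for `s3s::region::Region`; `RUSTFS_REGION` mirrors the default changed earlier in this commit):

```rust
const RUSTFS_REGION: &str = "rustfs-global-0"; // default introduced by this commit

// Mirrors resolve_notification_region: global wins, then request, then default.
fn resolve_region(global: Option<String>, request: Option<String>) -> String {
    global.unwrap_or_else(|| request.unwrap_or_else(|| RUSTFS_REGION.to_string()))
}

fn main() {
    assert_eq!(resolve_region(Some("us-east-1".into()), Some("ap-southeast-1".into())), "us-east-1");
    assert_eq!(resolve_region(None, Some("ap-southeast-1".into())), "ap-southeast-1");
    assert_eq!(resolve_region(None, None), RUSTFS_REGION);
}
```

The three assertions correspond one-to-one to the three unit tests updated further down in this hunk.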
@@ -165,7 +167,7 @@ impl DefaultBucketUsecase {
         self.context.clone()
     }
-    fn global_region(&self) -> Option<String> {
+    fn global_region(&self) -> Option<Region> {
         self.context.as_ref().and_then(|context| context.region().get())
     }
@@ -431,7 +433,7 @@ impl DefaultBucketUsecase {
         if let Some(region) = self.global_region() {
             return Ok(S3Response::new(GetBucketLocationOutput {
-                location_constraint: Some(BucketLocationConstraint::from(region)),
+                location_constraint: Some(BucketLocationConstraint::from(region.to_string())),
             }));
         }
@@ -1230,9 +1232,9 @@ impl DefaultBucketUsecase {
         let event_rules =
             event_rules_result.map_err(|e| s3_error!(InvalidArgument, "Invalid ARN in notification configuration: {e}"))?;
         warn!("notify event rules: {:?}", &event_rules);
+        let region_clone = region.clone();
         notify
-            .add_event_specific_rules(&bucket, &region, &event_rules)
+            .add_event_specific_rules(&bucket, region_clone.as_str(), &event_rules)
             .await
             .map_err(|e| s3_error!(InternalError, "Failed to add rules: {e}"))?;
@@ -1800,20 +1802,23 @@ mod tests {
     #[test]
     fn resolve_notification_region_prefers_global_region() {
-        let region = resolve_notification_region(Some("us-east-1".to_string()), Some("ap-southeast-1".to_string()));
+        let binding = resolve_notification_region(Some("us-east-1".parse().unwrap()), Some("ap-southeast-1".parse().unwrap()));
+        let region = binding.as_str();
         assert_eq!(region, "us-east-1");
     }
     #[test]
     fn resolve_notification_region_falls_back_to_request_region() {
-        let region = resolve_notification_region(None, Some("ap-southeast-1".to_string()));
+        let binding = resolve_notification_region(None, Some("ap-southeast-1".parse().unwrap()));
+        let region = binding.as_str();
         assert_eq!(region, "ap-southeast-1");
     }
     #[test]
-    fn resolve_notification_region_defaults_to_empty() {
-        let region = resolve_notification_region(None, None);
-        assert!(region.is_empty());
+    fn resolve_notification_region_defaults_value() {
+        let binding = resolve_notification_region(None, None);
+        let region = binding.as_str();
+        assert_eq!(region, RUSTFS_REGION);
     }
     #[tokio::test]


@@ -75,7 +75,7 @@ pub trait EndpointsInterface: Send + Sync {
 /// Region interface for application-layer use-cases.
 pub trait RegionInterface: Send + Sync {
-    fn get(&self) -> Option<String>;
+    fn get(&self) -> Option<s3s::region::Region>;
 }
 /// Tier config interface for application-layer and admin handlers.
@@ -190,7 +190,7 @@ impl EndpointsInterface for EndpointsHandle {
 pub struct RegionHandle;
 impl RegionInterface for RegionHandle {
-    fn get(&self) -> Option<String> {
+    fn get(&self) -> Option<s3s::region::Region> {
         get_global_region()
     }
 }


@@ -27,6 +27,7 @@ use crate::storage::options::{
 use crate::storage::*;
 use bytes::Bytes;
 use futures::StreamExt;
+use rustfs_config::RUSTFS_REGION;
 use rustfs_ecstore::StorageAPI;
 use rustfs_ecstore::bucket::quota::checker::QuotaChecker;
 use rustfs_ecstore::bucket::{
@@ -164,7 +165,7 @@ impl DefaultMultipartUsecase {
         self.context.as_ref().and_then(|context| context.bucket_metadata().handle())
     }
-    fn global_region(&self) -> Option<String> {
+    fn global_region(&self) -> Option<s3s::region::Region> {
         self.context.as_ref().and_then(|context| context.region().get())
     }
@@ -422,12 +423,12 @@ impl DefaultMultipartUsecase {
             }
         }
-        let region = self.global_region().unwrap_or_else(|| "us-east-1".to_string());
+        let region = self.global_region().unwrap_or_else(|| RUSTFS_REGION.parse().unwrap());
         let output = CompleteMultipartUploadOutput {
             bucket: Some(bucket.clone()),
             key: Some(key.clone()),
             e_tag: obj_info.etag.clone().map(|etag| to_s3s_etag(&etag)),
-            location: Some(region.clone()),
+            location: Some(region.to_string()),
             server_side_encryption: server_side_encryption.clone(),
             ssekms_key_id: ssekms_key_id.clone(),
             checksum_crc32: checksum_crc32.clone(),
@@ -448,7 +449,7 @@ impl DefaultMultipartUsecase {
             bucket: Some(bucket.clone()),
             key: Some(key.clone()),
             e_tag: obj_info.etag.clone().map(|etag| to_s3s_etag(&etag)),
-            location: Some(region),
+            location: Some(region.to_string()),
             server_side_encryption,
             ssekms_key_id,
             checksum_crc32,


@@ -276,7 +276,7 @@ pub fn get_condition_values(
     header: &HeaderMap,
     cred: &Credentials,
     version_id: Option<&str>,
-    region: Option<&str>,
+    region: Option<s3s::region::Region>,
     remote_addr: Option<std::net::SocketAddr>,
 ) -> HashMap<String, Vec<String>> {
     let username = if cred.is_temp() || cred.is_service_account() {
@@ -362,7 +362,7 @@ pub fn get_condition_values(
     }
     if let Some(lc) = region
-        && !lc.is_empty()
+        && !lc.as_str().is_empty()
     {
         args.insert("LocationConstraint".to_owned(), vec![lc.to_string()]);
     }
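The guard above only records a `LocationConstraint` condition when the region renders to a non-empty string. A std-only sketch of that conditional insert (an `Option<String>` standing in for `Option<s3s::region::Region>`):

```rust
use std::collections::HashMap;

// Insert LocationConstraint only when a non-empty region is present,
// mirroring the guard in get_condition_values.
fn add_location_constraint(args: &mut HashMap<String, Vec<String>>, region: Option<String>) {
    if let Some(lc) = region {
        if !lc.as_str().is_empty() {
            args.insert("LocationConstraint".to_owned(), vec![lc.to_string()]);
        }
    }
}

fn main() {
    let mut args = HashMap::new();
    add_location_constraint(&mut args, Some("rustfs-global-0".to_string()));
    add_location_constraint(&mut args, Some(String::new())); // empty region: skipped
    add_location_constraint(&mut args, None); // no region: skipped
    assert_eq!(args.get("LocationConstraint"), Some(&vec!["rustfs-global-0".to_string()]));
    assert_eq!(args.len(), 1);
}
```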


@@ -15,8 +15,10 @@
 use clap::Parser;
 use clap::builder::NonEmptyStringValueParser;
 use const_str::concat;
+use rustfs_config::RUSTFS_REGION;
 use std::path::PathBuf;
+use std::string::ToString;
 shadow_rs::shadow!(build);
 pub mod workload_profiles;
@@ -191,8 +193,10 @@ pub struct Config {
     /// tls path for rustfs API and console.
     pub tls_path: Option<String>,
     /// License key for enterprise features
     pub license: Option<String>,
+    /// Region for the server, used for signing and other region-specific behavior
     pub region: Option<String>,
/// Enable KMS encryption for server-side encryption
@@ -280,6 +284,9 @@ impl Config {
             .trim()
             .to_string();
+        // Region is optional, but if not set, we should default to "rustfs-global-0" for signing compatibility with AWS S3 clients
+        let region = region.or_else(|| Some(RUSTFS_REGION.to_string()));
         Ok(Config {
             volumes,
             address,
@@ -329,15 +336,3 @@ impl std::fmt::Debug for Config {
             .finish()
     }
 }
-// lazy_static::lazy_static! {
-//     pub(crate) static ref OPT: OnceLock<Opt> = OnceLock::new();
-// }
-// pub fn init_config(opt: Opt) {
-//     OPT.set(opt).expect("Failed to set global config");
-// }
-// pub fn get_config() -> &'static Opt {
-//     OPT.get().expect("Global config not initialized")
-// }


@@ -93,14 +93,15 @@ pub(crate) fn init_update_check() {
 /// * `buckets` - A vector of bucket names to process
 #[instrument(skip_all)]
 pub(crate) async fn add_bucket_notification_configuration(buckets: Vec<String>) {
-    let region_opt = rustfs_ecstore::global::get_global_region();
-    let region = match region_opt {
-        Some(ref r) if !r.is_empty() => r,
-        _ => {
+    let global_region = rustfs_ecstore::global::get_global_region();
+    let region = global_region
+        .as_ref()
+        .filter(|r| !r.as_str().is_empty())
+        .map(|r| r.as_str())
+        .unwrap_or_else(|| {
             warn!("Global region is not set; attempting notification configuration for all buckets with an empty region.");
             ""
-        }
-    };
+        });
     for bucket in buckets.iter() {
         let has_notification_config = metadata_sys::get_notification_config(bucket).await.unwrap_or_else(|err| {
             warn!("get_notification_config err {:?}", err);
@@ -368,7 +369,7 @@ pub async fn init_ftp_system() -> Result<Option<tokio::sync::broadcast::Sender<(
     // Create FTP server with protocol storage client
     let fs = crate::storage::ecfs::FS::new();
     let storage_client = ProtocolStorageClient::new(fs);
-    let server: FtpsServer<crate::protocols::ProtocolStorageClient> = FtpsServer::new(config, storage_client).await?;
+    let server: FtpsServer<ProtocolStorageClient> = FtpsServer::new(config, storage_client).await?;
     // Log server configuration
     info!(
@@ -451,7 +452,7 @@ pub async fn init_ftps_system() -> Result<Option<tokio::sync::broadcast::Sender<
     // Create FTPS server with protocol storage client
     let fs = crate::storage::ecfs::FS::new();
     let storage_client = ProtocolStorageClient::new(fs);
-    let server: FtpsServer<crate::protocols::ProtocolStorageClient> = FtpsServer::new(config, storage_client).await?;
+    let server: FtpsServer<ProtocolStorageClient> = FtpsServer::new(config, storage_client).await?;
     // Log server configuration
     info!(


@@ -164,7 +164,10 @@ async fn run(config: config::Config) -> Result<()> {
     let readiness = Arc::new(GlobalReadiness::new());
     if let Some(region) = &config.region {
-        rustfs_ecstore::global::set_global_region(region.clone());
+        let region = region
+            .parse()
+            .map_err(|e| Error::other(format!("invalid region {}: {e}", region)))?;
+        rustfs_ecstore::global::set_global_region(region);
     }
     let server_addr = parse_and_resolve_address(config.address.as_str()).map_err(Error::other)?;
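With `set_global_region` now taking a typed value, an invalid `--region` string fails startup instead of propagating into signing and ARN construction. A std-only sketch of the validate-then-set flow (`Region` here is a hypothetical newtype with an assumed validation rule, standing in for `s3s::region::Region`):

```rust
use std::str::FromStr;

/// Hypothetical stand-in for `s3s::region::Region`: a validated region name.
#[derive(Debug, Clone, PartialEq)]
struct Region(String);

impl FromStr for Region {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumed rule for this sketch: non-empty, lowercase alphanumeric plus '-'.
        if !s.is_empty() && s.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-') {
            Ok(Region(s.to_string()))
        } else {
            Err(format!("invalid region {s:?}"))
        }
    }
}

fn main() {
    // Mirrors the `config.region` handling: parse, mapping a bad value into a startup error.
    let configured = Some("rustfs-global-0".to_string());
    if let Some(region) = &configured {
        let region: Region = region
            .parse()
            .map_err(|e| format!("invalid region {region}: {e}"))
            .expect("startup fails on an invalid region");
        assert_eq!(region.0, "rustfs-global-0");
    }
    assert!("Bad Region!".parse::<Region>().is_err());
}
```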


@@ -44,7 +44,7 @@ pub(crate) struct ReqInfo {
     pub bucket: Option<String>,
     pub object: Option<String>,
     pub version_id: Option<String>,
-    pub region: Option<String>,
+    pub region: Option<s3s::region::Region>,
 }
#[derive(Clone, Copy)]
@@ -352,7 +352,7 @@ pub async fn authorize_request<T>(req: &mut S3Request<T>, action: Action) -> S3R
         &req.headers,
         &rustfs_credentials::Credentials::default(),
         req_info.version_id.as_deref(),
-        req.region.as_deref(),
+        req.region.clone(),
         remote_addr,
     );
     let bucket_name = req_info.bucket.as_deref().unwrap_or("");