diff --git a/.docker/README.md b/.docker/README.md index 6c553495..05d0210f 100644 --- a/.docker/README.md +++ b/.docker/README.md @@ -1,261 +1,131 @@ -# RustFS Docker Images +# RustFS Docker Infrastructure -This directory contains Docker configuration files and supporting infrastructure for building and running RustFS container images. +This directory contains the complete Docker infrastructure for building, deploying, and monitoring RustFS. It provides ready-to-use configurations for development, testing, and production-grade observability. -## 📁 Directory Structure +## 📂 Directory Structure -``` -rustfs/ -├── Dockerfile # Production image (Alpine + pre-built binaries) -├── Dockerfile.source # Development image (Debian + source build) -├── docker-buildx.sh # Multi-architecture build script -├── Makefile # Build automation with simplified commands -└── .docker/ # Supporting infrastructure - ├── observability/ # Monitoring and observability configs - ├── compose/ # Docker Compose configurations - ├── mqtt/ # MQTT broker configs - └── openobserve-otel/ # OpenObserve + OpenTelemetry configs -``` +| Directory | Description | Status | +| :--- | :--- | :--- | +| **[`observability/`](observability/README.md)** | **[RECOMMENDED]** Full-stack observability (Prometheus, Grafana, Tempo, Loki). | ✅ Production-Ready | +| **[`compose/`](compose/README.md)** | Specialized setups (e.g., 4-node distributed cluster testing). | ⚠️ Testing Only | +| **[`mqtt/`](mqtt/README.md)** | EMQX Broker configuration for MQTT integration testing. | 🧪 Development | +| **[`openobserve-otel/`](openobserve-otel/README.md)** | Alternative lightweight observability stack using OpenObserve. 
| 🔄 Alternative | -## 🎯 Image Variants +--- -### Core Images +## 📄 Root Directory Files -| Image | Base OS | Build Method | Size | Use Case | -|-------|---------|--------------|------|----------| -| `production` (default) | Alpine 3.18 | GitHub Releases | Smallest | Production deployment | -| `source` | Debian Bookworm | Source build | Medium | Custom builds with cross-compilation | -| `dev` | Debian Bookworm | Development tools | Large | Interactive development | +The following files in the project root are essential for Docker operations: -## 🚀 Usage Examples +### Build Scripts & Dockerfiles -### Quick Start (Production) +| File | Description | Usage | +| :--- | :--- | :--- | +| **`docker-buildx.sh`** | **Multi-Arch Build Script**
Automates building and pushing Docker images for `amd64` and `arm64`. Supports release and dev channels. | `./docker-buildx.sh --push` | +| **`Dockerfile`** | **Production Image (Alpine)**
Lightweight image using musl libc. Downloads pre-built binaries from GitHub Releases. | `docker build -t rustfs:latest .` | +| **`Dockerfile.glibc`** | **Production Image (Ubuntu)**
Standard image using glibc. Useful if you need specific dynamic libraries. | `docker build -f Dockerfile.glibc .` | +| **`Dockerfile.source`** | **Development Image**
Builds RustFS from source code. Includes build tools. Ideal for local development and CI. | `docker build -f Dockerfile.source .` | +### Docker Compose Configurations + +| File | Description | Usage | +| :--- | :--- | :--- | +| **`docker-compose.yml`** | **Main Development Setup**
Comprehensive setup with profiles for development, observability, and proxying. | `docker compose up -d`
`docker compose --profile observability up -d` | +| **`docker-compose-simple.yml`** | **Quick Start Setup**
Minimal configuration running a single RustFS instance with 4 volumes. Perfect for first-time users. | `docker compose -f docker-compose-simple.yml up -d` | + +--- + +## 🌟 Observability Stack (Recommended) + +Located in: [`.docker/observability/`](observability/README.md) + +We provide a comprehensive, industry-standard observability stack designed for deep insights into RustFS performance. This is the recommended setup for both development and production monitoring. + +### Components +- **Metrics**: Prometheus (Collection) + Grafana (Visualization) +- **Traces**: Tempo (Storage) + Jaeger (UI) +- **Logs**: Loki +- **Ingestion**: OpenTelemetry Collector + +### Key Features +- **Full Persistence**: All metrics, logs, and traces are saved to Docker volumes, ensuring no data loss on restarts. +- **Correlation**: Seamlessly jump between Logs, Traces, and Metrics in Grafana. +- **High Performance**: Optimized configurations for batching, compression, and memory management. + +### Quick Start ```bash -# Default production image (Alpine + GitHub Releases) -docker run -p 9000:9000 rustfs/rustfs:latest - -# Specific version -docker run -p 9000:9000 rustfs/rustfs:1.2.3 +cd .docker/observability +docker compose up -d ``` -### Complete Tag Strategy Examples +--- +## 🧪 Specialized Environments + +Located in: [`.docker/compose/`](compose/README.md) + +These configurations are tailored for specific testing scenarios that require complex topologies. + +### Distributed Cluster (4-Nodes) +Simulates a real-world distributed environment with 4 RustFS nodes running locally. 
```bash -# Stable Releases -docker run rustfs/rustfs:1.2.3 # Main version (production) -docker run rustfs/rustfs:1.2.3-production # Explicit production variant -docker run rustfs/rustfs:1.2.3-source # Source build variant -docker run rustfs/rustfs:latest # Latest stable - -# Prerelease Versions -docker run rustfs/rustfs:1.3.0-alpha.2 # Specific alpha version -docker run rustfs/rustfs:alpha # Latest alpha -docker run rustfs/rustfs:beta # Latest beta -docker run rustfs/rustfs:rc # Latest release candidate - -# Development Versions -docker run rustfs/rustfs:dev # Latest main branch development -docker run rustfs/rustfs:dev-13e4a0b # Specific commit -docker run rustfs/rustfs:dev-latest # Latest development -docker run rustfs/rustfs:main-latest # Main branch latest +docker compose -f .docker/compose/docker-compose.cluster.yaml up -d ``` -### Development Environment - +### Integrated Observability Test +A self-contained environment running 4 RustFS nodes alongside the full observability stack. Useful for end-to-end testing of telemetry. ```bash -# Quick setup using Makefile (recommended) -make docker-dev-local # Build development image locally -make dev-env-start # Start development container - -# Manual Docker commands -docker run -it -v $(pwd):/workspace -p 9000:9000 rustfs/rustfs:latest-dev - -# Build from source locally -docker build -f Dockerfile.source -t rustfs:custom . - -# Development with hot reload -docker-compose up rustfs-dev +docker compose -f .docker/compose/docker-compose.observability.yaml up -d ``` -## 🏗️ Build Arguments and Scripts +--- -### Using Makefile Commands (Recommended) +## 📡 MQTT Integration -The easiest way to build images using simplified commands: +Located in: [`.docker/mqtt/`](mqtt/README.md) +Provides an EMQX broker for testing RustFS MQTT features. 
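If the `mosquitto` command-line clients are installed, the broker can be exercised with a quick publish once it is running (see the Quick Start below); the topic name here is purely illustrative:

```bash
# Publish a test message to the local EMQX broker (default MQTT port 1883).
# The topic is illustrative; use whatever topic RustFS is configured to publish on.
if command -v mosquitto_pub >/dev/null 2>&1; then
  mosquitto_pub -h localhost -p 1883 -t 'rustfs/test' -m 'hello' \
    && result="published" \
    || result="broker not reachable on localhost:1883"
else
  result="mosquitto clients not installed"
fi
echo "mqtt check: ${result}"
# To watch messages interactively: mosquitto_sub -h localhost -p 1883 -t 'rustfs/#' -v
```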
+ +### Quick Start ```bash -# Development images (build from source) -make docker-dev-local # Build for local use (single arch) -make docker-dev # Build multi-arch (for CI/CD) -make docker-dev-push REGISTRY=xxx # Build and push to registry - -# Production images (using pre-built binaries) -make docker-buildx # Build multi-arch production images -make docker-buildx-push # Build and push production images -make docker-buildx-version VERSION=v1.0.0 # Build specific version - -# Development environment -make dev-env-start # Start development container -make dev-env-stop # Stop development container -make dev-env-restart # Restart development container - -# Help -make help-docker # Show all Docker-related commands +cd .docker/mqtt +docker compose up -d ``` +- **Dashboard**: [http://localhost:18083](http://localhost:18083) (Default: `admin` / `public`) +- **MQTT Port**: `1883` -### Using docker-buildx.sh (Advanced) +--- -For direct script usage and advanced scenarios: +## 👁️ Alternative: OpenObserve +Located in: [`.docker/openobserve-otel/`](openobserve-otel/README.md) + +For users preferring a lightweight, all-in-one solution, we support OpenObserve. It combines logs, metrics, and traces into a single binary and UI. + +### Quick Start ```bash -# Build latest version for all architectures -./docker-buildx.sh - -# Build and push to registry -./docker-buildx.sh --push - -# Build specific version -./docker-buildx.sh --release v1.2.3 - -# Build and push specific version -./docker-buildx.sh --release v1.2.3 --push +cd .docker/openobserve-otel +docker compose up -d ``` -### Manual Docker Builds +--- -All images support dynamic version selection: +## 🔧 Common Operations +### Cleaning Up +To stop all containers and remove volumes (**WARNING**: deletes all persisted data): ```bash -# Build production image with latest release -docker build --build-arg RELEASE="latest" -t rustfs:latest . 
- -# Build from source with specific target -docker build -f Dockerfile.source \ - --build-arg TARGETPLATFORM="linux/amd64" \ - -t rustfs:source . - -# Development build -docker build -f Dockerfile.source -t rustfs:dev . +docker compose down -v ``` -## 🔧 Binary Download Sources - -### Unified GitHub Releases - -The production image downloads from GitHub Releases for reliability and transparency: - -- ✅ **production** → GitHub Releases API with automatic latest detection -- ✅ **Checksum verification** → SHA256SUMS validation when available -- ✅ **Multi-architecture** → Supports amd64 and arm64 - -### Source Build - -The source variant compiles from source code with advanced features: - -- 🔧 **Cross-compilation** → Supports multiple target platforms via `TARGETPLATFORM` -- ⚡ **Build caching** → sccache for faster compilation -- 🎯 **Optimized builds** → Release optimizations with LTO and symbol stripping - -## 📋 Architecture Support - -All variants support multi-architecture builds: - -- **linux/amd64** (x86_64) -- **linux/arm64** (aarch64) - -Architecture is automatically detected during build using Docker's `TARGETARCH` build argument. - -## 🔐 Security Features - -- **Checksum Verification**: Production image verifies SHA256SUMS when available -- **Non-root User**: All images run as user `rustfs` (UID 1000) -- **Minimal Runtime**: Production image only includes necessary dependencies -- **Secure Defaults**: No hardcoded credentials or keys - -## 🛠️ Development Workflow - -### Quick Start with Makefile (Recommended) - +### Viewing Logs +To follow logs for a specific service: ```bash -# 1. Start development environment -make dev-env-start - -# 2. Your development container is now running with: -# - Port 9000 exposed for RustFS -# - Port 9010 exposed for admin console -# - Current directory mounted as /workspace - -# 3. 
Stop when done -make dev-env-stop +docker compose logs -f [service_name] ``` -### Manual Development Setup - +### Checking Status +To see the status of all running containers: ```bash -# Build development image from source -make docker-dev-local - -# Or use traditional Docker commands -docker build -f Dockerfile.source -t rustfs:dev . - -# Run with development tools -docker run -it -v $(pwd):/workspace -p 9000:9000 rustfs:dev bash - -# Or use docker-compose for complex setups -docker-compose up rustfs-dev +docker compose ps ``` - -### Common Development Tasks - -```bash -# Build and test locally -make build # Build binary natively -make docker-dev-local # Build development Docker image -make test # Run tests -make fmt # Format code -make clippy # Run linter - -# Get help -make help # General help -make help-docker # Docker-specific help -make help-build # Build-specific help -``` - -## 🚀 CI/CD Integration - -The project uses GitHub Actions for automated multi-architecture Docker builds: - -### Automated Builds - -- **Tags**: Automatic builds triggered on version tags (e.g., `v1.2.3`) -- **Main Branch**: Development builds with `dev-latest` and `main-latest` tags -- **Pull Requests**: Test builds without registry push - -### Build Variants - -Each build creates three image variants: - -- `rustfs/rustfs:v1.2.3` (production - Alpine-based) -- `rustfs/rustfs:v1.2.3-source` (source build - Debian-based) -- `rustfs/rustfs:v1.2.3-dev` (development - Debian-based with tools) - -### Manual Builds - -Trigger custom builds via GitHub Actions: - -```bash -# Use workflow_dispatch to build specific versions -# Available options: latest, main-latest, dev-latest, v1.2.3, dev-abc123 -``` - -## 📦 Supporting Infrastructure - -The `.docker/` directory contains supporting configuration files: - -- **observability/** - Prometheus, Grafana, OpenTelemetry configs -- **compose/** - Multi-service Docker Compose setups -- **mqtt/** - MQTT broker configurations -- **openobserve-otel/** - Log 
aggregation and tracing setup - -See individual README files in each subdirectory for specific usage instructions. diff --git a/.docker/compose/README.md b/.docker/compose/README.md index 600a6aad..06a91bee 100644 --- a/.docker/compose/README.md +++ b/.docker/compose/README.md @@ -1,80 +1,44 @@ -# Docker Compose Configurations +# Specialized Docker Compose Configurations -This directory contains specialized Docker Compose configurations for different use cases. +This directory contains specialized Docker Compose configurations for specific testing scenarios. + +## ⚠️ Important Note + +**For Observability:** +We **strongly recommend** using the new, fully integrated observability stack located in `../observability/`. It provides a production-ready setup with Prometheus, Grafana, Tempo, Loki, and OpenTelemetry Collector, all with persistent storage and optimized configurations. + +The `docker-compose.observability.yaml` in this directory is kept for legacy reference or specific minimal testing needs but is **not** the primary recommended setup. ## 📁 Configuration Files -This directory contains specialized Docker Compose configurations and their associated Dockerfiles, keeping related files organized together. +### Cluster Testing -### Main Configuration (Root Directory) +- **`docker-compose.cluster.yaml`** + - **Purpose**: Simulates a 4-node RustFS distributed cluster. + - **Use Case**: Testing distributed storage logic, consensus, and failover. + - **Nodes**: 4 RustFS instances. + - **Storage**: Uses local HTTP endpoints. 
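Schematically, each node service in `docker-compose.cluster.yaml` follows the same pattern; the sketch below is illustrative only (the image tag and exact settings come from the compose file itself), with just the host port changing per node:

```yaml
services:
  node1:
    image: rustfs/rustfs:latest      # illustrative; the file may build from source instead
    environment:
      - RUSTFS_OBS_LOGGER_LEVEL=debug
    ports:
      - "9001:9000"                  # host 9001 -> container 9000; node2..node4 use 9002..9004
    networks:
      - rustfs-network
```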
-- **`../../docker-compose.yml`** - **Default Production Setup** - - Complete production-ready configuration - - Includes RustFS server + full observability stack - - Supports multiple profiles: `dev`, `observability`, `cache`, `proxy` - - Recommended for most users +### Legacy / Minimal Observability -### Specialized Configurations - -- **`docker-compose.cluster.yaml`** - **Distributed Testing** - - 4-node cluster setup for testing distributed storage - - Uses local compiled binaries - - Simulates multi-node environment - - Ideal for development and cluster testing - -- **`docker-compose.observability.yaml`** - **Observability Focus** - - Specialized setup for testing observability features - - Includes OpenTelemetry, Jaeger, Prometheus, Loki, Grafana - - Uses `../../Dockerfile.source` for builds - - Perfect for observability development +- **`docker-compose.observability.yaml`** + - **Purpose**: A minimal observability setup. + - **Status**: **Deprecated**. Please use `../observability/docker-compose.yml` instead. 
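Once the cluster is up (see the usage examples below), a minimal reachability check against the mapped host ports can be sketched as follows; ports 9001-9004 are assumptions matching the per-node mappings used elsewhere in this repository:

```bash
# Probe each node's assumed host port (9001-9004 -> container port 9000) using
# bash's built-in /dev/tcp, so no extra client tools are required.
unreachable=0
for port in 9001 9002 9003 9004; do
  if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
    echo "node on port ${port}: listening"
  else
    echo "node on port ${port}: not reachable"
    unreachable=$((unreachable + 1))
  fi
done
echo "${unreachable} of 4 nodes unreachable"
```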
## 🚀 Usage Examples -### Production Setup - -```bash -# Start main service -docker-compose up -d - -# Start with development profile -docker-compose --profile dev up -d - -# Start with full observability -docker-compose --profile observability up -d -``` - ### Cluster Testing -```bash -# Build and start 4-node cluster (run from project root) -cd .docker/compose -docker-compose -f docker-compose.cluster.yaml up -d - -# Or run directly from project root -docker-compose -f .docker/compose/docker-compose.cluster.yaml up -d -``` - -### Observability Testing +To start a 4-node cluster for distributed testing: ```bash -# Start observability-focused environment (run from project root) -cd .docker/compose -docker-compose -f docker-compose.observability.yaml up -d - -# Or run directly from project root -docker-compose -f .docker/compose/docker-compose.observability.yaml up -d +# From project root +docker compose -f .docker/compose/docker-compose.cluster.yaml up -d ``` -## 🔧 Configuration Overview +### (Deprecated) Minimal Observability -| Configuration | Nodes | Storage | Observability | Use Case | -|---------------|-------|---------|---------------|----------| -| **Main** | 1 | Volume mounts | Full stack | Production | -| **Cluster** | 4 | HTTP endpoints | Basic | Testing | -| **Observability** | 4 | Local data | Advanced | Development | - -## 📝 Notes - -- Always ensure you have built the required binaries before starting cluster tests -- The main configuration is sufficient for most use cases -- Specialized configurations are for specific testing scenarios +```bash +# From project root +docker compose -f .docker/compose/docker-compose.observability.yaml up -d +``` diff --git a/.docker/compose/docker-compose.observability.yaml b/.docker/compose/docker-compose.observability.yaml index 08127078..e6495b4c 100644 --- a/.docker/compose/docker-compose.observability.yaml +++ b/.docker/compose/docker-compose.observability.yaml @@ -13,65 +13,126 @@ # limitations under the License. 
services: + # --- Observability Stack --- + + tempo-init: + image: busybox:latest + command: [ "sh", "-c", "chown -R 10001:10001 /var/tempo" ] + volumes: + - tempo-data:/var/tempo + user: root + networks: + - rustfs-network + restart: "no" + + tempo: + image: grafana/tempo:latest + user: "10001" + command: [ "-config.file=/etc/tempo.yaml" ] + volumes: + - ../../.docker/observability/tempo.yaml:/etc/tempo.yaml:ro + - tempo-data:/var/tempo + ports: + - "3200:3200" # tempo + - "4317" # otlp grpc + - "4318" # otlp http + restart: unless-stopped + networks: + - rustfs-network + otel-collector: - image: otel/opentelemetry-collector-contrib:0.129.1 + image: otel/opentelemetry-collector-contrib:latest environment: - TZ=Asia/Shanghai volumes: - - ../../.docker/observability/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml + - ../../.docker/observability/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro ports: - - 1888:1888 - - 8888:8888 - - 8889:8889 - - 13133:13133 - - 4317:4317 - - 4318:4318 - - 55679:55679 + - "1888:1888" # pprof + - "8888:8888" # Prometheus metrics for Collector + - "8889:8889" # Prometheus metrics for application indicators + - "13133:13133" # health check + - "4317:4317" # OTLP gRPC + - "4318:4318" # OTLP HTTP + - "55679:55679" # zpages networks: - rustfs-network + depends_on: + - tempo + - jaeger + - prometheus + - loki + jaeger: - image: jaegertracing/jaeger:2.8.0 + image: jaegertracing/jaeger:latest environment: - TZ=Asia/Shanghai + - SPAN_STORAGE_TYPE=badger + - BADGER_EPHEMERAL=false + - BADGER_DIRECTORY_VALUE=/badger/data + - BADGER_DIRECTORY_KEY=/badger/key + - COLLECTOR_OTLP_ENABLED=true + volumes: + - jaeger-data:/badger ports: - - "16686:16686" - - "14317:4317" - - "14318:4318" + - "16686:16686" # Web UI + - "14269:14269" # Admin/Metrics networks: - rustfs-network + prometheus: - image: prom/prometheus:v3.4.2 + image: prom/prometheus:latest environment: - TZ=Asia/Shanghai volumes: - - 
../../.docker/observability/prometheus.yml:/etc/prometheus/prometheus.yml + - ../../.docker/observability/prometheus.yml:/etc/prometheus/prometheus.yml:ro + - prometheus-data:/prometheus ports: - "9090:9090" + command: + - '--config.file=/etc/prometheus/prometheus.yml' + - '--web.enable-otlp-receiver' + - '--web.enable-remote-write-receiver' + - '--enable-feature=promql-experimental-functions' + - '--storage.tsdb.path=/prometheus' + - '--web.console.libraries=/usr/share/prometheus/console_libraries' + - '--web.console.templates=/usr/share/prometheus/consoles' networks: - rustfs-network + loki: - image: grafana/loki:3.5.1 + image: grafana/loki:latest environment: - TZ=Asia/Shanghai volumes: - - ../../.docker/observability/loki-config.yaml:/etc/loki/local-config.yaml + - ../../.docker/observability/loki-config.yaml:/etc/loki/local-config.yaml:ro + - loki-data:/loki ports: - "3100:3100" command: -config.file=/etc/loki/local-config.yaml networks: - rustfs-network + grafana: - image: grafana/grafana:12.0.2 + image: grafana/grafana:latest ports: - "3000:3000" # Web UI environment: - GF_SECURITY_ADMIN_PASSWORD=admin + - GF_SECURITY_ADMIN_USER=admin - TZ=Asia/Shanghai + - GF_INSTALL_PLUGINS=grafana-pyroscope-datasource + - GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH=/var/lib/grafana/dashboards/home.json networks: - rustfs-network volumes: - ../../.docker/observability/grafana/provisioning:/etc/grafana/provisioning:ro - ../../.docker/observability/grafana/dashboards:/var/lib/grafana/dashboards:ro + depends_on: + - prometheus + - tempo + - loki + + # --- RustFS Cluster --- node1: build: @@ -86,9 +147,11 @@ services: - RUSTFS_OBS_LOGGER_LEVEL=debug platform: linux/amd64 ports: - - "9001:9000" # Map port 9001 of the host to port 9000 of the container + - "9001:9000" networks: - rustfs-network + depends_on: + - otel-collector node2: build: @@ -103,9 +166,11 @@ services: - RUSTFS_OBS_LOGGER_LEVEL=debug platform: linux/amd64 ports: - - "9002:9000" # Map port 9002 of the host to 
port 9000 of the container + - "9002:9000" networks: - rustfs-network + depends_on: + - otel-collector node3: build: @@ -120,9 +185,11 @@ services: - RUSTFS_OBS_LOGGER_LEVEL=debug platform: linux/amd64 ports: - - "9003:9000" # Map port 9003 of the host to port 9000 of the container + - "9003:9000" networks: - rustfs-network + depends_on: + - otel-collector node4: build: @@ -137,9 +204,17 @@ services: - RUSTFS_OBS_LOGGER_LEVEL=debug platform: linux/amd64 ports: - - "9004:9000" # Map port 9004 of the host to port 9000 of the container + - "9004:9000" networks: - rustfs-network + depends_on: + - otel-collector + +volumes: + prometheus-data: + tempo-data: + loki-data: + jaeger-data: networks: rustfs-network: diff --git a/.docker/mqtt/README.md b/.docker/mqtt/README.md new file mode 100644 index 00000000..f0d05559 --- /dev/null +++ b/.docker/mqtt/README.md @@ -0,0 +1,30 @@ +# MQTT Broker (EMQX) + +This directory contains the configuration for running an EMQX MQTT broker, which can be used for testing RustFS's MQTT integration. + +## 🚀 Quick Start + +To start the EMQX broker: + +```bash +docker compose up -d +``` + +## 📊 Access + +- **Dashboard**: [http://localhost:18083](http://localhost:18083) +- **Default Credentials**: `admin` / `public` +- **MQTT Port**: `1883` +- **WebSocket Port**: `8083` + +## 🛠️ Configuration + +The `docker-compose.yml` file sets up a single-node EMQX instance. + +- **Persistence**: Data is not persisted by default (for testing). +- **Network**: Uses the default bridge network. + +## 📝 Notes + +- This setup is intended for development and testing purposes. +- For production deployments, please refer to the official [EMQX Documentation](https://www.emqx.io/docs/en/latest/). 
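## ✅ Health Check

A quick way to confirm the broker is healthy is EMQX's built-in `emqx ctl` admin tool, run inside the container. The service name `emqx` is an assumption here; match it to the name used in `docker-compose.yml`:

```bash
# Query node status from inside the broker container. Falls back to a message
# instead of failing when Docker or the container is unavailable.
status="$(docker compose exec emqx emqx ctl status 2>&1)" \
  || status="broker container not running (or the service name differs)"
echo "${status}"
```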
diff --git a/.docker/nginx/nginx.conf b/.docker/nginx/nginx.conf new file mode 100644 index 00000000..2e574e04 --- /dev/null +++ b/.docker/nginx/nginx.conf @@ -0,0 +1,82 @@ +# Copyright 2024 RustFS Team +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +worker_processes auto; +pid /var/run/nginx.pid; + +events { + worker_connections 1024; +} + +http { + include /etc/nginx/mime.types; + default_type application/octet-stream; + + log_format main '$remote_addr - $remote_user [$time_local] "$request" ' + '$status $body_bytes_sent "$http_referer" ' + '"$http_user_agent" "$http_x_forwarded_for"'; + + access_log /var/log/nginx/access.log main; + error_log /var/log/nginx/error.log warn; + + sendfile on; + keepalive_timeout 65; + + # RustFS Server Block + server { + listen 80; + server_name localhost; + + # Redirect HTTP to HTTPS (optional, uncomment if SSL is configured) + # return 301 https://$host$request_uri; + + location / { + proxy_pass http://rustfs:9000; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + + # S3 specific headers + proxy_set_header X-Amz-Date $http_x_amz_date; + proxy_set_header Authorization $http_authorization; + + # Disable buffering for large uploads + proxy_request_buffering off; + client_max_body_size 0; + } + + location /rustfs/console { + proxy_pass http://rustfs:9001; + proxy_set_header Host $host; + 
proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + } + + # SSL Configuration (Example) + # server { + # listen 443 ssl; + # server_name localhost; + # + # ssl_certificate /etc/nginx/ssl/server.crt; + # ssl_certificate_key /etc/nginx/ssl/server.key; + # + # location / { + # proxy_pass http://rustfs:9000; + # ... + # } + # } +} diff --git a/.docker/nginx/ssl/.keep b/.docker/nginx/ssl/.keep new file mode 100644 index 00000000..e69de29b diff --git a/.docker/observability/.gitignore b/.docker/observability/.gitignore new file mode 100644 index 00000000..02b805c3 --- /dev/null +++ b/.docker/observability/.gitignore @@ -0,0 +1,5 @@ +jaeger-data/* +loki-data/* +prometheus-data/* +tempo-data/* +grafana-data/* \ No newline at end of file diff --git a/.docker/observability/README.md b/.docker/observability/README.md index e24e928a..43b45cec 100644 --- a/.docker/observability/README.md +++ b/.docker/observability/README.md @@ -1,109 +1,85 @@ -# Observability +# RustFS Observability Stack -This directory contains the observability stack for the application. The stack is composed of the following components: +This directory contains the comprehensive observability stack for RustFS, designed to provide deep insights into application performance, logs, and traces. -- Prometheus v3.2.1 -- Grafana 11.6.0 -- Loki 3.4.2 -- Jaeger 2.4.0 -- Otel Collector 0.120.0 # 0.121.0 remove loki +## Components -## Prometheus +The stack is composed of the following best-in-class open-source components: -Prometheus is a monitoring and alerting toolkit. It scrapes metrics from instrumented jobs, either directly or via an -intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to -either aggregate and record new time series from existing data or generate alerts. Grafana or other API consumers can be -used to visualize the collected data. 
+- **Prometheus** (v2.53.1): The industry standard for metric collection and alerting. +- **Grafana** (v11.1.0): The leading platform for observability visualization. +- **Loki** (v3.1.0): A horizontally-scalable, highly-available, multi-tenant log aggregation system. +- **Tempo** (v2.5.0): A high-volume, minimal dependency distributed tracing backend. +- **Jaeger** (v1.59.0): Distributed tracing system (configured as a secondary UI/storage). +- **OpenTelemetry Collector** (v0.104.0): A vendor-agnostic implementation for receiving, processing, and exporting telemetry data. -## Grafana +## Architecture -Grafana is a multi-platform open-source analytics and interactive visualization web application. It provides charts, -graphs, and alerts for the web when connected to supported data sources. +1. **Telemetry Collection**: Applications send OTLP (OpenTelemetry Protocol) data (Metrics, Logs, Traces) to the **OpenTelemetry Collector**. +2. **Processing & Exporting**: The Collector processes the data (batching, memory limiting) and exports it to the respective backends: + - **Traces** -> **Tempo** (Primary) & **Jaeger** (Secondary/Optional) + - **Metrics** -> **Prometheus** (via scraping the Collector's exporter) + - **Logs** -> **Loki** +3. **Visualization**: **Grafana** connects to all backends (Prometheus, Tempo, Loki, Jaeger) to provide a unified dashboard experience. -## Loki +## Features -Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It is -designed to be very cost-effective and easy to operate. It does not index the contents of the logs, but rather a set of -labels for each log stream. +- **Full Persistence**: All data (Metrics, Logs, Traces) is persisted to Docker volumes, ensuring no data loss on restart. +- **Correlation**: Seamless navigation between Metrics, Logs, and Traces in Grafana. + - Jump from a Metric spike to relevant Traces. + - Jump from a Trace to relevant Logs. 
+- **High Performance**: Optimized configurations for batching, compression, and memory management. +- **Standardized Protocols**: Built entirely on OpenTelemetry standards. -## Jaeger +## Quick Start -Jaeger is a distributed tracing system released as open source by Uber Technologies. It is used for monitoring and -troubleshooting microservices-based distributed systems, including: +### Prerequisites -- Distributed context propagation -- Distributed transaction monitoring -- Root cause analysis -- Service dependency analysis -- Performance / latency optimization +- Docker +- Docker Compose -## Otel Collector +### Deploy -The OpenTelemetry Collector offers a vendor-agnostic implementation on how to receive, process, and export telemetry -data. It removes the need to run, operate, and maintain multiple agents/collectors in order to support open-source -observability data formats (e.g. Jaeger, Prometheus, etc.) sending to one or more open-source or commercial back-ends. - -## How to use - -To deploy the observability stack, run the following command: - -- docker latest version +Run the following command to start the entire stack: ```bash -docker compose -f docker-compose.yml -f docker-compose.override.yml up -d +docker compose up -d ``` -- docker compose v2.0.0 or before +### Access Dashboards + +| Service | URL | Credentials | Description | +| :--- | :--- | :--- | :--- | +| **Grafana** | [http://localhost:3000](http://localhost:3000) | `admin` / `admin` | Main visualization hub. | +| **Prometheus** | [http://localhost:9090](http://localhost:9090) | - | Metric queries and status. | +| **Jaeger UI** | [http://localhost:16686](http://localhost:16686) | - | Secondary trace visualization. | +| **Tempo** | [http://localhost:3200](http://localhost:3200) | - | Tempo status/metrics. 
| + +## Configuration + +### Data Persistence + +Data is stored in the following Docker volumes: + +- `prometheus-data`: Prometheus metrics +- `tempo-data`: Tempo traces (WAL and Blocks) +- `loki-data`: Loki logs (Chunks and Rules) +- `jaeger-data`: Jaeger traces (Badger DB) + +To clear all data: ```bash -docker-compose -f docker-compose.yml -f docker-compose.override.yml up -d +docker compose down -v ``` -To access the Grafana dashboard, navigate to `http://localhost:3000` in your browser. The default username and password -are `admin` and `admin`, respectively. - -To access the Jaeger dashboard, navigate to `http://localhost:16686` in your browser. - -To access the Prometheus dashboard, navigate to `http://localhost:9090` in your browser. - -## How to stop - -To stop the observability stack, run the following command: - -```bash -docker compose -f docker-compose.yml -f docker-compose.override.yml down -``` - -## How to remove data - -To remove the data generated by the observability stack, run the following command: - -```bash -docker compose -f docker-compose.yml -f docker-compose.override.yml down -v -``` - -## How to configure - -To configure the observability stack, modify the `docker-compose.override.yml` file. The file contains the following - -```yaml -services: - prometheus: - environment: - - PROMETHEUS_CONFIG_FILE=/etc/prometheus/prometheus.yml - volumes: - - ./prometheus.yml:/etc/prometheus/prometheus.yml - - grafana: - environment: - - GF_SECURITY_ADMIN_PASSWORD=admin - volumes: - - ./grafana/provisioning:/etc/grafana/provisioning -``` - -The `prometheus` service mounts the `prometheus.yml` file to `/etc/prometheus/prometheus.yml`. The `grafana` service -mounts the `grafana/provisioning` directory to `/etc/grafana/provisioning`. You can modify these files to configure the -observability stack. +### Customization +- **Prometheus**: Edit `prometheus.yml` to add scrape targets or alerting rules. 
+- **Grafana**: Dashboards and datasources are provisioned from the `grafana/` directory.
+- **Collector**: Edit `otel-collector-config.yaml` to modify pipelines, processors, or exporters.
+## Troubleshooting
+- **Service Health**: Check the health of services using `docker compose ps`.
+- **Logs**: View logs for a specific service using `docker compose logs -f <service-name>`.
+- **Otel Collector**: Check `http://localhost:13133` for health status and `http://localhost:1888/debug/pprof/` for profiling.
diff --git a/.docker/observability/README_ZH.md b/.docker/observability/README_ZH.md
index 48568689..75d6e80e 100644
--- a/.docker/observability/README_ZH.md
+++ b/.docker/observability/README_ZH.md
@@ -1,27 +1,85 @@
-## 部署可观测性系统
+# RustFS 可观测性技术栈
 
-OpenTelemetry Collector 提供了一个厂商中立的遥测数据处理方案,用于接收、处理和导出遥测数据。它消除了为支持多种开源可观测性数据格式(如
-Jaeger、Prometheus 等)而需要运行和维护多个代理/收集器的必要性。
+本目录包含 RustFS 的全面可观测性技术栈,旨在提供对应用程序性能、日志和追踪的深入洞察。
 
-### 快速部署
+## 组件
 
-1. 进入 `.docker/observability` 目录
-2. 执行以下命令启动服务:
+该技术栈由以下一流的开源组件组成:
+
+- **Prometheus** (v2.53.1): 行业标准的指标收集和告警工具。
+- **Grafana** (v11.1.0): 领先的可观测性可视化平台。
+- **Loki** (v3.1.0): 水平可扩展、高可用、多租户的日志聚合系统。
+- **Tempo** (v2.5.0): 高吞吐量、最小依赖的分布式追踪后端。
+- **Jaeger** (v1.59.0): 分布式追踪系统(配置为辅助 UI/存储)。
+- **OpenTelemetry Collector** (v0.104.0): 接收、处理和导出遥测数据的供应商无关实现。
+
+## 架构
+
+1. **遥测收集**: 应用程序将 OTLP (OpenTelemetry Protocol) 数据(指标、日志、追踪)发送到 **OpenTelemetry Collector**。
+2. **处理与导出**: Collector 处理数据(批处理、内存限制)并将其导出到相应的后端:
+   - **追踪** -> **Tempo** (主要) & **Jaeger** (辅助/可选)
+   - **指标** -> **Prometheus** (通过抓取 Collector 的导出器)
+   - **日志** -> **Loki**
+3. 
**可视化**: **Grafana** 连接到所有后端(Prometheus, Tempo, Loki, Jaeger),提供统一的仪表盘体验。
+
+## 特性
+
+- **完全持久化**: 所有数据(指标、日志、追踪)都持久化到 Docker 卷,确保重启后无数据丢失。
+- **关联性**: 在 Grafana 中实现指标、日志和追踪之间的无缝导航。
+  - 从指标峰值跳转到相关追踪。
+  - 从追踪跳转到相关日志。
+- **高性能**: 针对批处理、压缩和内存管理进行了优化配置。
+- **标准化协议**: 完全基于 OpenTelemetry 标准构建。
+
+## 快速开始
+
+### 前置条件
+
+- Docker
+- Docker Compose
+
+### 部署
+
+运行以下命令启动整个技术栈:
 
 ```bash
-docker compose -f docker-compose.yml up -d
+docker compose up -d
 ```
 
-### 访问监控面板
+### 访问仪表盘
 
-服务启动后,可通过以下地址访问各个监控面板:
+| 服务 | URL | 凭据 | 描述 |
+| :--- | :--- | :--- | :--- |
+| **Grafana** | [http://localhost:3000](http://localhost:3000) | `admin` / `admin` | 主要可视化中心。 |
+| **Prometheus** | [http://localhost:9090](http://localhost:9090) | - | 指标查询和状态。 |
+| **Jaeger UI** | [http://localhost:16686](http://localhost:16686) | - | 辅助追踪可视化。 |
+| **Tempo** | [http://localhost:3200](http://localhost:3200) | - | Tempo 状态/指标。 |
 
-- Grafana: `http://localhost:3000` (默认账号/密码:`admin`/`admin`)
-- Jaeger: `http://localhost:16686`
-- Prometheus: `http://localhost:9090`
+## 配置
 
-## 配置可观测性
+### 数据持久化
 
-```shell
-export RUSTFS_OBS_ENDPOINT="http://localhost:4317" # OpenTelemetry Collector 地址
+数据存储在以下 Docker 卷中:
+
+- `prometheus-data`: Prometheus 指标
+- `tempo-data`: Tempo 追踪 (WAL 和 Blocks)
+- `loki-data`: Loki 日志 (Chunks 和 Rules)
+- `jaeger-data`: Jaeger 追踪 (Badger DB)
+
+要清除所有数据:
+
+```bash
+docker compose down -v
 ```
+
+### 自定义
+
+- **Prometheus**: 编辑 `prometheus.yml` 以添加抓取目标或告警规则。
+- **Grafana**: 仪表盘和数据源从 `grafana/` 目录预置。
+- **Collector**: 编辑 `otel-collector-config.yaml` 以修改管道、处理器或导出器。
+
+## 故障排除
+
+- **服务健康**: 使用 `docker compose ps` 检查服务健康状况。
+- **日志**: 使用 `docker compose logs -f <service-name>` 查看特定服务的日志。
+- **Otel Collector**: 检查 `http://localhost:13133` 获取健康状态,检查 `http://localhost:1888/debug/pprof/` 进行性能分析。
diff --git a/.docker/observability/docker-compose.yml b/.docker/observability/docker-compose.yml
index a9d12685..5ed6ac6d 100644
--- a/.docker/observability/docker-compose.yml
+++ b/.docker/observability/docker-compose.yml
@@ -14,6 +14,8 @@ services: + # --- Tracing --- + tempo-init: image: busybox:latest command: [ "sh", "-c", "chown -R 10001:10001 /var/tempo" ] @@ -26,74 +28,52 @@ services: tempo: image: grafana/tempo:latest - user: "10001" # The container must be started with root to execute chown in the script - command: [ "-config.file=/etc/tempo.yaml" ] # This is passed as a parameter to the entry point script + user: "10001" + command: [ "-config.file=/etc/tempo.yaml" ] volumes: - ./tempo.yaml:/etc/tempo.yaml:ro - ./tempo-data:/var/tempo ports: - "3200:3200" # tempo - - "24317:4317" # otlp grpc - - "24318:4318" # otlp http + - "4317" # otlp grpc + - "4318" # otlp http restart: unless-stopped networks: - otel-network healthcheck: - test: [ "CMD", "wget", "--spider", "-q", "http://localhost:3200/metrics" ] + test: [ "CMD-SHELL", "wget --spider -q http://localhost:3200/metrics || exit 1" ] interval: 10s timeout: 5s - retries: 3 - start_period: 15s - - otel-collector: - image: otel/opentelemetry-collector-contrib:latest - environment: - - TZ=Asia/Shanghai - volumes: - - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro - ports: - - "1888:1888" # pprof - - "8888:8888" # Prometheus metrics for Collector - - "8889:8889" # Prometheus metrics for application indicators - - "13133:13133" # health check - - "4317:4317" # OTLP gRPC - - "4318:4318" # OTLP HTTP - - "55679:55679" # zpages - networks: - - otel-network - depends_on: - jaeger: - condition: service_started - tempo: - condition: service_started - prometheus: - condition: service_started - loki: - condition: service_started - healthcheck: - test: [ "CMD", "wget", "--spider", "-q", "http://localhost:13133" ] - interval: 10s - timeout: 5s - retries: 3 + retries: 5 + start_period: 40s jaeger: image: jaegertracing/jaeger:latest environment: - TZ=Asia/Shanghai - - SPAN_STORAGE_TYPE=memory + - SPAN_STORAGE_TYPE=badger + - BADGER_EPHEMERAL=false + - BADGER_DIRECTORY_VALUE=/badger/data + - BADGER_DIRECTORY_KEY=/badger/key - 
COLLECTOR_OTLP_ENABLED=true + volumes: + - ./jaeger-data:/badger ports: - "16686:16686" # Web UI - - "14317:4317" # OTLP gRPC - - "14318:4318" # OTLP HTTP - - "18888:8888" # collector + - "14269:14269" # Admin/Metrics + - "4317" + - "4318" networks: - otel-network healthcheck: - test: [ "CMD", "wget", "--spider", "-q", "http://localhost:16686" ] + test: [ "CMD-SHELL", "wget --spider -q http://localhost:14269 || exit 1" ] interval: 10s timeout: 5s - retries: 3 + retries: 5 + start_period: 20s + + # --- Metrics --- + prometheus: image: prom/prometheus:latest environment: @@ -105,11 +85,11 @@ services: - "9090:9090" command: - '--config.file=/etc/prometheus/prometheus.yml' - - '--web.enable-otlp-receiver' # Enable OTLP - - '--web.enable-remote-write-receiver' # Enable remote write - - '--enable-feature=promql-experimental-functions' # Enable info() - - '--storage.tsdb.min-block-duration=15m' # Minimum block duration - - '--storage.tsdb.max-block-duration=1h' # Maximum block duration + - '--web.enable-otlp-receiver' + - '--web.enable-remote-write-receiver' + - '--enable-feature=promql-experimental-functions' + - '--storage.tsdb.min-block-duration=2h' + - '--storage.tsdb.max-block-duration=2h' - '--log.level=info' - '--storage.tsdb.retention.time=30d' - '--storage.tsdb.path=/prometheus' @@ -119,37 +99,78 @@ services: networks: - otel-network healthcheck: - test: [ "CMD", "wget", "--spider", "-q", "http://localhost:9090/-/healthy" ] + test: [ "CMD-SHELL", "wget --spider -q http://localhost:9090/-/healthy || exit 1" ] interval: 10s timeout: 5s retries: 3 + + # --- Logging --- + loki: image: grafana/loki:latest environment: - TZ=Asia/Shanghai volumes: - ./loki-config.yaml:/etc/loki/local-config.yaml:ro + - ./loki-data:/loki ports: - "3100:3100" command: -config.file=/etc/loki/local-config.yaml networks: - otel-network healthcheck: - test: [ "CMD", "wget", "--spider", "-q", "http://localhost:3100/ready" ] + test: [ "CMD-SHELL", "wget --spider -q 
http://localhost:3100/metrics || exit 1" ] + interval: 15s + timeout: 10s + retries: 5 + start_period: 60s + + # --- Collection --- + + otel-collector: + image: otel/opentelemetry-collector-contrib:latest + environment: + - TZ=Asia/Shanghai + volumes: + - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro + ports: + - "1888:1888" # pprof + - "8888:8888" # Prometheus metrics for Collector + - "8889:8889" # Prometheus metrics for application indicators + - "13133:13133" # health check + - "4317:4317" # OTLP gRPC + - "4318:4318" # OTLP HTTP + - "55679:55679" # zpages + networks: + - otel-network + depends_on: + - tempo + - jaeger + - prometheus + - loki + healthcheck: + test: [ "CMD-SHELL", "wget --spider -q http://localhost:13133 || exit 1" ] interval: 10s timeout: 5s retries: 3 + start_period: 20s + + # --- Visualization --- + grafana: image: grafana/grafana:latest ports: - - "3000:3000" # Web UI + - "3000:3000" volumes: - - ./grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml + - ./grafana/provisioning:/etc/grafana/provisioning + - ./grafana/dashboards:/var/lib/grafana/dashboards + - ./grafana-data:/var/lib/grafana environment: - GF_SECURITY_ADMIN_PASSWORD=admin - GF_SECURITY_ADMIN_USER=admin - TZ=Asia/Shanghai - GF_INSTALL_PLUGINS=grafana-pyroscope-datasource + - GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH=/var/lib/grafana/dashboards/home.json restart: unless-stopped networks: - otel-network @@ -158,7 +179,7 @@ services: - tempo - loki healthcheck: - test: [ "CMD", "wget", "--spider", "-q", "http://localhost:3000/api/health" ] + test: [ "CMD-SHELL", "wget --spider -q http://localhost:3000/api/health || exit 1" ] interval: 10s timeout: 5s retries: 3 @@ -166,11 +187,14 @@ services: volumes: prometheus-data: tempo-data: + loki-data: + jaeger-data: + grafana-data: networks: otel-network: driver: bridge - name: "network_otel_config" + name: "network_otel" ipam: config: - subnet: 172.28.0.0/16 diff --git 
a/.docker/observability/grafana-data/.gitignore b/.docker/observability/grafana-data/.gitignore new file mode 100644 index 00000000..f59ec20a --- /dev/null +++ b/.docker/observability/grafana-data/.gitignore @@ -0,0 +1 @@ +* \ No newline at end of file diff --git a/.docker/observability/grafana/provisioning/datasources.yaml b/.docker/observability/grafana/provisioning/datasources.yaml index babfd530..b83f3b30 100644 --- a/.docker/observability/grafana/provisioning/datasources.yaml +++ b/.docker/observability/grafana/provisioning/datasources.yaml @@ -1,3 +1,17 @@ +# Copyright 2024 RustFS Team +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + apiVersion: 1 datasources: @@ -7,102 +21,77 @@ datasources: access: proxy orgId: 1 url: http://prometheus:9090 - basicAuth: false - isDefault: false + isDefault: true version: 1 editable: false jsonData: httpMethod: GET + exemplarTraceIdDestinations: + - name: trace_id + datasourceUid: tempo + - name: Tempo type: tempo + uid: tempo access: proxy orgId: 1 url: http://tempo:3200 - basicAuth: false - isDefault: true + isDefault: false version: 1 editable: false - apiVersion: 1 - uid: tempo jsonData: httpMethod: GET serviceMap: datasourceUid: prometheus - streamingEnabled: - search: true - tracesToLogsV2: - # Field with an internal link pointing to a logs data source in Grafana. - # datasourceUid value must match the uid value of the logs data source. 
- datasourceUid: 'loki' - spanStartTimeShift: '-1h' - spanEndTimeShift: '1h' - tags: [ 'job', 'instance', 'pod', 'namespace' ] - filterByTraceID: false + tracesToLogs: + datasourceUid: loki + tags: [ 'job', 'instance', 'pod', 'namespace', 'service.name' ] + mappedTags: [ { key: 'service.name', value: 'app' } ] + spanStartTimeShift: '1s' + spanEndTimeShift: '-1s' + filterByTraceID: true filterBySpanID: false - customQuery: true - query: 'method="$${__span.tags.method}"' - tracesToMetrics: - datasourceUid: 'prometheus' - spanStartTimeShift: '-1h' - spanEndTimeShift: '1h' - tags: [ { key: 'service.name', value: 'service' }, { key: 'job' } ] - queries: - - name: 'Sample query' - query: 'sum(rate(traces_spanmetrics_latency_bucket{$$__tags}[5m]))' - tracesToProfiles: - datasourceUid: 'grafana-pyroscope-datasource' - tags: [ 'job', 'instance', 'pod', 'namespace' ] - profileTypeId: 'process_cpu:cpu:nanoseconds:cpu:nanoseconds' - customQuery: true - query: 'method="$${__span.tags.method}"' - serviceMap: - datasourceUid: 'prometheus' - nodeGraph: - enabled: true - search: - hide: false - traceQuery: - timeShiftEnabled: true - spanStartTimeShift: '-1h' - spanEndTimeShift: '1h' - spanBar: - type: 'Tag' - tag: 'http.path' - streamingEnabled: - search: true - - name: Jaeger - type: jaeger - uid: Jaeger - url: http://jaeger:16686 - basicAuth: false - access: proxy - readOnly: false - isDefault: false - jsonData: - tracesToLogsV2: - # Field with an internal link pointing to a logs data source in Grafana. - # datasourceUid value must match the uid value of the logs data source. 
- datasourceUid: 'loki' - spanStartTimeShift: '1h' - spanEndTimeShift: '-1h' - tags: [ 'job', 'instance', 'pod', 'namespace' ] - filterByTraceID: false - filterBySpanID: false - customQuery: true - query: 'method="$${__span.tags.method}"' tracesToMetrics: - datasourceUid: 'Prometheus' - spanStartTimeShift: '1h' - spanEndTimeShift: '-1h' - tags: [ { key: 'service.name', value: 'service' }, { key: 'job' } ] + datasourceUid: prometheus + tags: [ { key: 'service.name' }, { key: 'job' } ] queries: - - name: 'Sample query' - query: 'sum(rate(traces_spanmetrics_latency_bucket{$$__tags}[5m]))' + - name: 'Service-Level Latency' + query: 'sum(rate(traces_spanmetrics_latency_bucket{$$__tags}[5m])) by (le)' + - name: 'Service-Level Calls' + query: 'sum(rate(traces_spanmetrics_calls_total{$$__tags}[5m]))' + - name: 'Service-Level Errors' + query: 'sum(rate(traces_spanmetrics_calls_total{status_code="ERROR", $$__tags}[5m]))' nodeGraph: enabled: true - traceQuery: - timeShiftEnabled: true - spanStartTimeShift: '1h' - spanEndTimeShift: '-1h' - spanBar: - type: 'None' \ No newline at end of file + + - name: Loki + type: loki + uid: loki + orgId: 1 + url: http://loki:3100 + isDefault: false + version: 1 + editable: false + jsonData: + derivedFields: + - datasourceUid: tempo + matcherRegex: 'trace_id=(\w+)' + name: 'TraceID' + url: '$${__value.raw}' + + - name: Jaeger + type: jaeger + uid: jaeger + url: http://jaeger:16686 + access: proxy + isDefault: false + editable: false + jsonData: + tracesToLogs: + datasourceUid: loki + tags: [ 'job', 'instance', 'pod', 'namespace', 'service.name' ] + mappedTags: [ { key: 'service.name', value: 'app' } ] + spanStartTimeShift: '1s' + spanEndTimeShift: '-1s' + filterByTraceID: true + filterBySpanID: false diff --git a/.docker/observability/jaeger-config.yaml b/.docker/observability/jaeger-config.yaml index 9f1f1ca0..271b3f66 100644 --- a/.docker/observability/jaeger-config.yaml +++ b/.docker/observability/jaeger-config.yaml @@ -31,29 +31,19 @@ 
service: host: 0.0.0.0 port: 8888 logs: - level: debug - # TODO Initialize telemetry tracer once OTEL released new feature. - # https://github.com/open-telemetry/opentelemetry-collector/issues/10663 + level: info extensions: healthcheckv2: use_v2: true http: - # pprof: - # endpoint: 0.0.0.0:1777 - # zpages: - # endpoint: 0.0.0.0:55679 - jaeger_query: storage: - traces: some_store - traces_archive: another_store + traces: badger_store ui: config_file: ./cmd/jaeger/config-ui.json log_access: true - # The maximum duration that is considered for clock skew adjustments. - # Defaults to 0 seconds, which means it's disabled. max_clock_skew_adjust: 0s grpc: endpoint: 0.0.0.0:16685 @@ -62,26 +52,16 @@ extensions: jaeger_storage: backends: - some_store: - memory: - max_traces: 1000000 - max_events: 100000 - another_store: - memory: - max_traces: 1000000 - metric_backends: - some_metrics_storage: - prometheus: - endpoint: http://prometheus:9090 - normalize_calls: true - normalize_duration: true + badger_store: + badger: + ephemeral: false + directory_key: /badger/key + directory_value: /badger/data + span_store_ttl: 72h remote_sampling: - # You can either use file or adaptive sampling strategy in remote_sampling - # file: - # path: ./cmd/jaeger/sampling-strategies.json adaptive: - sampling_store: some_store + sampling_store: badger_store initial_sampling_probability: 0.1 http: grpc: @@ -103,12 +83,8 @@ receivers: processors: batch: - metadata_keys: [ "span.kind", "http.method", "http.status_code", "db.system", "db.statement", "messaging.system", "messaging.destination", "messaging.operation","span.events","span.links" ] - # Adaptive Sampling Processor is required to support adaptive sampling. - # It expects remote_sampling extension with `adaptive:` config to be enabled. 
adaptive_sampling: exporters: jaeger_storage_exporter: - trace_storage: some_store - + trace_storage: badger_store diff --git a/.docker/observability/jaeger-data/.gitignore b/.docker/observability/jaeger-data/.gitignore new file mode 100644 index 00000000..f59ec20a --- /dev/null +++ b/.docker/observability/jaeger-data/.gitignore @@ -0,0 +1 @@ +* \ No newline at end of file diff --git a/.docker/observability/loki-config.yaml b/.docker/observability/loki-config.yaml index 4f5add74..daee60c6 100644 --- a/.docker/observability/loki-config.yaml +++ b/.docker/observability/loki-config.yaml @@ -17,16 +17,16 @@ auth_enabled: false server: http_listen_port: 3100 grpc_listen_port: 9096 - log_level: debug + log_level: info grpc_server_max_concurrent_streams: 1000 common: instance_addr: 127.0.0.1 - path_prefix: /tmp/loki + path_prefix: /loki storage: filesystem: - chunks_directory: /tmp/loki/chunks - rules_directory: /tmp/loki/rules + chunks_directory: /loki/chunks + rules_directory: /loki/rules replication_factor: 1 ring: kvstore: @@ -66,17 +66,3 @@ ruler: frontend: encoding: protobuf - - -# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration -# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/ -# -# Statistics help us better understand how Loki is used, and they show us performance -# levels for most users. This helps us prioritize features and documentation. -# For more information on what's sent, look at -# https://github.com/grafana/loki/blob/main/pkg/analytics/stats.go -# Refer to the buildReport method to see what goes into a report. 
-# -# If you would like to disable reporting, uncomment the following lines: -#analytics: -# reporting_enabled: false diff --git a/.docker/observability/loki-data/.gitignore b/.docker/observability/loki-data/.gitignore new file mode 100644 index 00000000..f59ec20a --- /dev/null +++ b/.docker/observability/loki-data/.gitignore @@ -0,0 +1 @@ +* \ No newline at end of file diff --git a/.docker/observability/otel-collector-config.yaml b/.docker/observability/otel-collector-config.yaml index 078318f1..da566c36 100644 --- a/.docker/observability/otel-collector-config.yaml +++ b/.docker/observability/otel-collector-config.yaml @@ -15,69 +15,70 @@ receivers: otlp: protocols: - grpc: # OTLP gRPC receiver + grpc: endpoint: 0.0.0.0:4317 - http: # OTLP HTTP receiver + http: endpoint: 0.0.0.0:4318 processors: - batch: # Batch processor to improve throughput - timeout: 5s - send_batch_size: 1000 - metadata_keys: [ ] - metadata_cardinality_limit: 1000 + batch: + timeout: 1s + send_batch_size: 1024 memory_limiter: check_interval: 1s - limit_mib: 512 + limit_mib: 1024 + spike_limit_mib: 256 transform/logs: log_statements: - context: log statements: - # Extract Body as attribute "message" - set(attributes["message"], body.string) - # Retain the original Body - set(attributes["log.body"], body.string) exporters: - otlp/traces: # OTLP exporter for trace data - endpoint: "http://jaeger:4317" # OTLP gRPC endpoint for Jaeger + otlp/tempo: + endpoint: "tempo:4317" tls: - insecure: true # TLS is disabled in the development environment and a certificate needs to be configured in the production environment. 
- compression: gzip # Enable compression to reduce network bandwidth + insecure: true + compression: gzip retry_on_failure: - enabled: true # Enable retry on failure - initial_interval: 1s # Initial interval for retry - max_interval: 30s # Maximum interval for retry - max_elapsed_time: 300s # Maximum elapsed time for retry + enabled: true + initial_interval: 1s + max_interval: 30s + max_elapsed_time: 300s sending_queue: - enabled: true # Enable sending queue - num_consumers: 10 # Number of consumers - queue_size: 5000 # Queue size - otlp/tempo: # OTLP exporter for trace data - endpoint: "http://tempo:4317" # OTLP gRPC endpoint for tempo + enabled: true + num_consumers: 10 + queue_size: 5000 + + otlp/jaeger: + endpoint: "jaeger:4317" tls: - insecure: true # TLS is disabled in the development environment and a certificate needs to be configured in the production environment. - compression: gzip # Enable compression to reduce network bandwidth + insecure: true + compression: gzip retry_on_failure: - enabled: true # Enable retry on failure - initial_interval: 1s # Initial interval for retry - max_interval: 30s # Maximum interval for retry - max_elapsed_time: 300s # Maximum elapsed time for retry + enabled: true + initial_interval: 1s + max_interval: 30s + max_elapsed_time: 300s sending_queue: - enabled: true # Enable sending queue - num_consumers: 10 # Number of consumers - queue_size: 5000 # Queue size - prometheus: # Prometheus exporter for metrics data - endpoint: "0.0.0.0:8889" # Prometheus scraping endpoint - send_timestamps: true # Send timestamp - metric_expiration: 5m # Metric expiration time + enabled: true + num_consumers: 10 + queue_size: 5000 + + prometheus: + endpoint: "0.0.0.0:8889" + send_timestamps: true + metric_expiration: 5m resource_to_telemetry_conversion: - enabled: true # Enable resource to telemetry conversion - otlphttp/loki: # Loki exporter for log data + enabled: true + + otlphttp/loki: endpoint: "http://loki:3100/otlp" tls: insecure: true - 
compression: gzip # Enable compression to reduce network bandwidth + compression: gzip + extensions: health_check: endpoint: 0.0.0.0:13133 @@ -85,13 +86,14 @@ extensions: endpoint: 0.0.0.0:1888 zpages: endpoint: 0.0.0.0:55679 + service: - extensions: [ health_check, pprof, zpages ] # Enable extension + extensions: [ health_check, pprof, zpages ] pipelines: traces: receivers: [ otlp ] processors: [ memory_limiter, batch ] - exporters: [ otlp/traces, otlp/tempo ] + exporters: [ otlp/tempo, otlp/jaeger ] metrics: receivers: [ otlp ] processors: [ batch ] @@ -102,20 +104,13 @@ service: exporters: [ otlphttp/loki ] telemetry: logs: - level: "debug" # Collector log level - encoding: "json" # Log encoding: console or json + level: "info" + encoding: "json" metrics: - level: "detailed" # Can be basic, normal, detailed + level: "normal" readers: - - periodic: - exporter: - otlp: - protocol: http/protobuf - endpoint: http://otel-collector:4318 - pull: exporter: prometheus: host: '0.0.0.0' port: 8888 - - diff --git a/.docker/observability/prometheus.yml b/.docker/observability/prometheus.yml index 88b0d0af..25266a3b 100644 --- a/.docker/observability/prometheus.yml +++ b/.docker/observability/prometheus.yml @@ -17,27 +17,40 @@ global: evaluation_interval: 15s external_labels: cluster: 'rustfs-dev' # Label to identify the cluster - relica: '1' # Replica identifier + replica: '1' # Replica identifier scrape_configs: - - job_name: 'otel-collector-internal' + - job_name: 'otel-collector' static_configs: - targets: [ 'otel-collector:8888' ] # Scrape metrics from Collector scrape_interval: 10s + - job_name: 'rustfs-app-metrics' static_configs: - targets: [ 'otel-collector:8889' ] # Application indicators scrape_interval: 15s metric_relabel_configs: + - source_labels: [ __name__ ] + regex: 'go_.*' + action: drop # Drop Go runtime metrics if not needed + - job_name: 'tempo' static_configs: - targets: [ 'tempo:3200' ] # Scrape metrics from Tempo + - job_name: 'jaeger' static_configs: 
- - targets: [ 'jaeger:8888' ] # Jaeger admin port + - targets: [ 'jaeger:14269' ] # Jaeger admin port (14269 is standard for admin/metrics) + + - job_name: 'loki' + static_configs: + - targets: [ 'loki:3100' ] + + - job_name: 'prometheus' + static_configs: + - targets: [ 'localhost:9090' ] otlp: - # Recommended attributes to be promoted to labels. promote_resource_attributes: - service.instance.id - service.name @@ -56,10 +69,8 @@ otlp: - k8s.pod.name - k8s.replicaset.name - k8s.statefulset.name - # Ingest OTLP data keeping all characters in metric/label names. translation_strategy: NoUTF8EscapingWithSuffixes storage: - # OTLP is a push-based protocol, Out of order samples is a common scenario. tsdb: - out_of_order_time_window: 30m \ No newline at end of file + out_of_order_time_window: 30m diff --git a/.docker/observability/tempo.yaml b/.docker/observability/tempo.yaml index 714d1310..3099aec9 100644 --- a/.docker/observability/tempo.yaml +++ b/.docker/observability/tempo.yaml @@ -1,18 +1,21 @@ -stream_over_http_enabled: true +# Copyright 2024 RustFS Team +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ server: http_listen_port: 3200 log_level: info -query_frontend: - search: - duration_slo: 5s - throughput_bytes_slo: 1.073741824e+09 - metadata_slo: - duration_slo: 5s - throughput_bytes_slo: 1.073741824e+09 - trace_by_id: - duration_slo: 5s - distributor: receivers: otlp: @@ -25,10 +28,6 @@ distributor: ingester: max_block_duration: 5m # cut the headblock when this much time passes. this is being set for demo purposes and should probably be left alone normally -compactor: - compaction: - block_retention: 1h # overall Tempo trace retention. set for demo purposes - metrics_generator: registry: external_labels: @@ -49,9 +48,3 @@ storage: path: /var/tempo/wal # where to store the wal locally local: path: /var/tempo/blocks - -overrides: - defaults: - metrics_generator: - processors: [ service-graphs, span-metrics, local-blocks ] # enables metrics generator - generate_native_histograms: both \ No newline at end of file diff --git a/.docker/openobserve-otel/README.md b/.docker/openobserve-otel/README.md index fcf89c0d..bbc863d3 100644 --- a/.docker/openobserve-otel/README.md +++ b/.docker/openobserve-otel/README.md @@ -5,71 +5,57 @@ English | [中文](README_ZH.md) -This directory contains the configuration files for setting up an observability stack with OpenObserve and OpenTelemetry -Collector. +This directory contains the configuration for an **alternative** observability stack using OpenObserve. -### Overview +## ⚠️ Note -This setup provides a complete observability solution for your applications: +For the **recommended** observability stack (Prometheus, Grafana, Tempo, Loki), please see `../observability/`. -- **OpenObserve**: A modern, open-source observability platform for logs, metrics, and traces. -- **OpenTelemetry Collector**: Collects and processes telemetry data before sending it to OpenObserve. +## 🌟 Overview -### Setup Instructions +OpenObserve is a lightweight, all-in-one observability platform that handles logs, metrics, and traces in a single binary. 
This setup is ideal for: +- Resource-constrained environments. +- Quick setup and testing. +- Users who prefer a unified UI. -1. **Prerequisites**: - - Docker and Docker Compose installed - - Sufficient memory resources (minimum 2GB recommended) +## 🚀 Quick Start -2. **Starting the Services**: - ```bash - cd .docker/openobserve-otel - docker compose -f docker-compose.yml up -d - ``` +### 1. Start Services -3. **Accessing the Dashboard**: - - OpenObserve UI: http://localhost:5080 - - Default credentials: - - Username: root@rustfs.com - - Password: rustfs123 +```bash +cd .docker/openobserve-otel +docker compose up -d +``` -### Configuration +### 2. Access Dashboard -#### OpenObserve Configuration +- **URL**: [http://localhost:5080](http://localhost:5080) +- **Username**: `root@rustfs.com` +- **Password**: `rustfs123` -The OpenObserve service is configured with: +## 🛠️ Configuration -- Root user credentials -- Data persistence through a volume mount -- Memory cache enabled -- Health checks -- Exposed ports: - - 5080: HTTP API and UI - - 5081: OTLP gRPC +### OpenObserve -#### OpenTelemetry Collector Configuration +- **Persistence**: Data is persisted to a Docker volume. +- **Ports**: + - `5080`: HTTP API and UI + - `5081`: OTLP gRPC -The collector is configured to: +### OpenTelemetry Collector -- Receive telemetry data via OTLP (HTTP and gRPC) -- Collect logs from files -- Process data in batches -- Export data to OpenObserve -- Manage memory usage +- **Receivers**: OTLP (gRPC `4317`, HTTP `4318`) +- **Exporters**: Sends data to OpenObserve. 
-### Integration with Your Application
+## 🔗 Integration
 
-To send telemetry data from your application, configure your OpenTelemetry SDK to send data to:
+Configure your application to send OTLP data to the collector:
 
-- OTLP gRPC: `localhost:4317`
-- OTLP HTTP: `localhost:4318`
+- **Endpoint**: `http://localhost:4318` (HTTP) or `localhost:4317` (gRPC)
 
-For example, in a Rust application using the `rustfs-obs` library:
+Example for RustFS:
 
 ```bash
 export RUSTFS_OBS_ENDPOINT=http://localhost:4318
-export RUSTFS_OBS_SERVICE_NAME=yourservice
-export RUSTFS_OBS_SERVICE_VERSION=1.0.0
-export RUSTFS_OBS_ENVIRONMENT=development
+export RUSTFS_OBS_SERVICE_NAME=rustfs-node-1
 ```
-
diff --git a/.docker/openobserve-otel/README_ZH.md b/.docker/openobserve-otel/README_ZH.md
index 2e2e80c9..d926d8bd 100644
--- a/.docker/openobserve-otel/README_ZH.md
+++ b/.docker/openobserve-otel/README_ZH.md
@@ -5,71 +5,57 @@
 
 [English](README.md) | 中文
 
-## 中文
+本目录包含使用 OpenObserve 的**替代**可观测性技术栈配置。
 
-本目录包含搭建 OpenObserve 和 OpenTelemetry Collector 可观测性栈的配置文件。
+## ⚠️ 注意
 
-### 概述
+对于**推荐**的可观测性技术栈(Prometheus, Grafana, Tempo, Loki),请参阅 `../observability/`。
 
-此设置为应用程序提供了完整的可观测性解决方案:
+## 🌟 概览
 
-- **OpenObserve**:现代化、开源的可观测性平台,用于日志、指标和追踪。
-- **OpenTelemetry Collector**:收集和处理遥测数据,然后将其发送到 OpenObserve。
+OpenObserve 是一个轻量级、一体化的可观测性平台,在一个二进制文件中处理日志、指标和追踪。此设置非常适合:
+- 资源受限的环境。
+- 快速设置和测试。
+- 喜欢统一 UI 的用户。
 
-### 设置说明
+## 🚀 快速开始
 
-1. **前提条件**:
-   - 已安装 Docker 和 Docker Compose
-   - 足够的内存资源(建议至少 2GB)
-
-2. **启动服务**:
-   ```bash
-   cd .docker/openobserve-otel
-   docker compose -f docker-compose.yml up -d
-   ```
-
-3. **访问仪表板**:
-   - OpenObserve UI:http://localhost:5080
-   - 默认凭据:
-     - 用户名:root@rustfs.com
-     - 密码:rustfs123
-
-### 配置
-
-#### OpenObserve 配置
-
-OpenObserve 服务配置:
-
-- 根用户凭据
-- 通过卷挂载实现数据持久化
-- 启用内存缓存
-- 健康检查
-- 暴露端口:
-  - 5080:HTTP API 和 UI
-  - 5081:OTLP gRPC
-
-#### OpenTelemetry Collector 配置
-
-收集器配置为:
-
-- 通过 OTLP(HTTP 和 gRPC)接收遥测数据
-- 从文件中收集日志
-- 批处理数据
-- 将数据导出到 OpenObserve
-- 管理内存使用
-
-### 与应用程序集成
-
-要从应用程序发送遥测数据,将 OpenTelemetry SDK 配置为发送数据到:
-
-- OTLP gRPC:`localhost:4317`
-- OTLP HTTP:`localhost:4318`
-
-例如,在使用 `rustfs-obs` 库的 Rust 应用程序中:
+### 1. 启动服务
 
 ```bash
-export RUSTFS_OBS_ENDPOINT=http://localhost:4317
-export RUSTFS_OBS_SERVICE_NAME=yourservice
-export RUSTFS_OBS_SERVICE_VERSION=1.0.0
-export RUSTFS_OBS_ENVIRONMENT=development
-```
\ No newline at end of file
+cd .docker/openobserve-otel
+docker compose up -d
+```
+
+### 2. 访问仪表盘
+
+- **URL**: [http://localhost:5080](http://localhost:5080)
+- **用户名**: `root@rustfs.com`
+- **密码**: `rustfs123`
+
+## 🛠️ 配置
+
+### OpenObserve
+
+- **持久化**: 数据持久化到 Docker 卷。
+- **端口**:
+  - `5080`: HTTP API 和 UI
+  - `5081`: OTLP gRPC
+
+### OpenTelemetry Collector
+
+- **接收器**: OTLP (gRPC `4317`, HTTP `4318`)
+- **导出器**: 将数据发送到 OpenObserve。
+
+## 🔗 集成
+
+配置您的应用程序将 OTLP 数据发送到收集器:
+
+- **端点**: `http://localhost:4318` (HTTP) 或 `localhost:4317` (gRPC)
+
+RustFS 示例:
+
+```bash
+export RUSTFS_OBS_ENDPOINT=http://localhost:4318
+export RUSTFS_OBS_SERVICE_NAME=rustfs-node-1
+```
diff --git a/Cargo.lock b/Cargo.lock
index e549865d..b42e5a9e 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -530,9 +530,9 @@ dependencies = [
 
 [[package]]
 name = "async-compression"
-version = "0.4.40"
+version = "0.4.41"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "7d67d43201f4d20c78bcda740c142ca52482d81da80681533d33bf3f0596c8e2"
+checksum = "d0f9ee0f6e02ffd7ad5816e9464499fba7b3effd01123b515c41d1697c43dad1"
 dependencies = [
  "compression-codecs",
  "compression-core",
@@ -8119,7 +8119,7 @@ checksum = "9774ba4a74de5f7b1c1451ed6cd5285a32eddb5cccb8cc655a4e50009e06477f"
 
 [[package]]
 name = "s3s"
 version = "0.13.0-alpha.3"
-source = "git+https://github.com/s3s-project/s3s.git?rev=61b96d11de81c508ba5361864676824f318ef65c#61b96d11de81c508ba5361864676824f318ef65c"
+source = "git+https://github.com/s3s-project/s3s.git?rev=218000387f4c3e67ad478bfc4587931f88b37006#218000387f4c3e67ad478bfc4587931f88b37006"
 dependencies = [
  "arc-swap",
  "arrayvec",
diff --git a/Cargo.toml b/Cargo.toml
index d8dd651c..58d64c36 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -105,7 +105,7 @@ rustfs-protocols = { path = "crates/protocols", version = "0.0.5" }
 
 # Async Runtime and Networking
 async-channel = "2.5.0"
-async-compression = { version = "0.4.40" }
+async-compression = { version = "0.4.41" }
 async-recursion = "1.1.1"
 async-trait = "0.1.89"
 axum = "0.8.8"
@@ -235,7 +235,7 @@ rumqttc = { version = "0.25.1" }
 rustix = { version = "1.1.4", features = ["fs"] }
 rust-embed = { version = "8.11.0" }
 rustc-hash = { version = "2.1.1" }
-s3s = { version = "0.13.0-alpha.3", features = ["minio"], git = "https://github.com/s3s-project/s3s.git", rev = "61b96d11de81c508ba5361864676824f318ef65c" }
+s3s = { version = "0.13.0-alpha.3", features = ["minio"], git = "https://github.com/s3s-project/s3s.git", rev = "218000387f4c3e67ad478bfc4587931f88b37006" }
 serial_test = "3.4.0"
 shadow-rs = { version = "1.7.0", default-features = false }
 siphasher = "1.0.2"
diff --git a/Dockerfile b/Dockerfile
index db9c0eb7..6cb8eabb 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,3 +1,17 @@
+# Copyright 2024 RustFS Team
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 FROM alpine:3.23 AS build
 
 ARG TARGETARCH
diff --git a/Dockerfile.glibc b/Dockerfile.glibc
index 434e8fa0..0d06d71b 100644
--- a/Dockerfile.glibc
+++ b/Dockerfile.glibc
@@ -1,3 +1,17 @@
+# Copyright 2024 RustFS Team
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 FROM ubuntu:24.04 AS build
 
 ARG TARGETARCH
diff --git a/Dockerfile.source b/Dockerfile.source
index 3dc2304b..764f8513 100644
--- a/Dockerfile.source
+++ b/Dockerfile.source
@@ -1,4 +1,18 @@
 # syntax=docker/dockerfile:1.6
+# Copyright 2024 RustFS Team
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 # Multi-stage Dockerfile for RustFS - LOCAL DEVELOPMENT ONLY
 #
 # IMPORTANT: This Dockerfile builds RustFS from source for local development and testing.
diff --git a/crates/config/src/constants/app.rs b/crates/config/src/constants/app.rs
index 6460eea8..7b1aa275 100644
--- a/crates/config/src/constants/app.rs
+++ b/crates/config/src/constants/app.rs
@@ -150,8 +150,8 @@ pub const DEFAULT_CONSOLE_ADDRESS: &str = concat!(":", DEFAULT_CONSOLE_PORT);
 /// Default region for rustfs
 /// This is the default region for rustfs.
 /// It is used to identify the region of the application.
-/// Default value: cn-east-1
-pub const RUSTFS_REGION: &str = "cn-east-1";
+/// Default value: rustfs-global-0
+pub const RUSTFS_REGION: &str = "rustfs-global-0";
 
 /// Default log filename for rustfs
 /// This is the default log filename for rustfs.
diff --git a/crates/ecstore/src/global.rs b/crates/ecstore/src/global.rs
index 65035a06..559f109c 100644
--- a/crates/ecstore/src/global.rs
+++ b/crates/ecstore/src/global.rs
@@ -42,7 +42,6 @@ lazy_static! {
     static ref GLOBAL_RUSTFS_PORT: OnceLock = OnceLock::new();
     static ref globalDeploymentIDPtr: OnceLock = OnceLock::new();
     pub static ref GLOBAL_OBJECT_API: OnceLock> = OnceLock::new();
-    pub static ref GLOBAL_LOCAL_DISK: Arc>>> = Arc::new(RwLock::new(Vec::new()));
    pub static ref GLOBAL_IsErasure: RwLock<bool> = RwLock::new(false);
    pub static ref GLOBAL_IsDistErasure: RwLock<bool> = RwLock::new(false);
    pub static ref GLOBAL_IsErasureSD: RwLock<bool> = RwLock::new(false);
@@ -57,8 +56,8 @@ lazy_static! {
     pub static ref GLOBAL_LocalNodeName: String = "127.0.0.1:9000".to_string();
     pub static ref GLOBAL_LocalNodeNameHex: String = rustfs_utils::crypto::hex(GLOBAL_LocalNodeName.as_bytes());
     pub static ref GLOBAL_NodeNamesHex: HashMap = HashMap::new();
-    pub static ref GLOBAL_REGION: OnceLock<String> = OnceLock::new();
-    pub static ref GLOBAL_LOCAL_LOCK_CLIENT: OnceLock> = OnceLock::new();
+    pub static ref GLOBAL_REGION: OnceLock<s3s::region::Region> = OnceLock::new();
+    pub static ref GLOBAL_LOCAL_LOCK_CLIENT: OnceLock> = OnceLock::new();
     pub static ref GLOBAL_LOCK_CLIENTS: OnceLock>> = OnceLock::new();
     pub static ref GLOBAL_BUCKET_MONITOR: OnceLock> = OnceLock::new();
 }
@@ -243,20 +242,20 @@ type TypeLocalDiskSetDrives = Vec>>>;
 /// Set the global region
 ///
 /// # Arguments
-/// * `region` - The region string to set globally
+/// * `region` - The Region instance to set globally
 ///
 /// # Returns
 /// * None
-pub fn set_global_region(region: String) {
+pub fn set_global_region(region: s3s::region::Region) {
     GLOBAL_REGION.set(region).unwrap();
 }
 
 /// Get the global region
 ///
 /// # Returns
-/// * `Option<String>` - The global region string, if set
+/// * `Option<Region>` - The global region, if set
 ///
-pub fn get_global_region() -> Option<String> {
+pub fn get_global_region() -> Option<s3s::region::Region> {
     GLOBAL_REGION.get().cloned()
 }
diff --git a/crates/iam/Cargo.toml b/crates/iam/Cargo.toml
index 67428cd7..bc19644a 100644
--- a/crates/iam/Cargo.toml
+++ b/crates/iam/Cargo.toml
@@ -51,10 +51,10 @@ rustfs-utils = { workspace = true, features = ["path"] }
 tokio-util.workspace = true
 pollster.workspace = true
 reqwest = { workspace = true }
-url = { workspace = true }
 moka = { workspace = true }
 openidconnect = { workspace = true }
 http = { workspace = true }
+url = { workspace = true }
 
 [dev-dependencies]
 pollster.workspace = true
diff --git a/crates/obs/src/telemetry.rs b/crates/obs/src/telemetry.rs
index e152cc3f..6119149d 100644
--- a/crates/obs/src/telemetry.rs
+++ b/crates/obs/src/telemetry.rs
@@ -179,7 +179,7 @@ fn format_with_color(w: &mut dyn std::io::Write, now: &mut DeferredNow, record:
     let binding = std::thread::current();
     let thread_name = binding.name().unwrap_or("unnamed");
     let thread_id = format!("{:?}", std::thread::current().id());
-    writeln!(
+    write!(
         w,
         "[{}] {} [{}] [{}:{}] [{}:{}] {}",
         now.now().format(flexi_logger::TS_DASHES_BLANK_COLONS_DOT_BLANK),
@@ -200,7 +200,7 @@ fn format_for_file(w: &mut dyn std::io::Write, now: &mut DeferredNow, record: &R
     let binding = std::thread::current();
     let thread_name = binding.name().unwrap_or("unnamed");
     let thread_id = format!("{:?}", std::thread::current().id());
-    writeln!(
+    write!(
         w,
         "[{}] {} [{}] [{}:{}] [{}:{}] {}",
         now.now().format(flexi_logger::TS_DASHES_BLANK_COLONS_DOT_BLANK),
diff --git a/docker-buildx.sh b/docker-buildx.sh
index ed19c077..e674bdc0 100755
--- a/docker-buildx.sh
+++ b/docker-buildx.sh
@@ -1,4 +1,17 @@
 #!/usr/bin/env bash
+# Copyright 2024 RustFS Team
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
 set -e
diff --git a/docker-compose-simple.yml b/docker-compose-simple.yml
index 5827ccab..dc107d03 100644
--- a/docker-compose-simple.yml
+++ b/docker-compose-simple.yml
@@ -1,3 +1,17 @@
+# Copyright 2024 RustFS Team
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 version: "3.9"
 
 services:
diff --git a/docker-compose.yml b/docker-compose.yml
index b8dc5d36..5fdf9c40 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -106,7 +106,37 @@ services:
     profiles:
       - dev
 
-  # OpenTelemetry Collector
+  # --- Observability Stack ---
+
+  tempo-init:
+    image: busybox:latest
+    command: [ "sh", "-c", "chown -R 10001:10001 /var/tempo" ]
+    volumes:
+      - tempo_data:/var/tempo
+    user: root
+    networks:
+      - rustfs-network
+    restart: "no"
+    profiles:
+      - observability
+
+  tempo:
+    image: grafana/tempo:latest
+    user: "10001"
+    command: [ "-config.file=/etc/tempo.yaml" ]
+    volumes:
+      - ./.docker/observability/tempo.yaml:/etc/tempo.yaml:ro
+      - tempo_data:/var/tempo
+    ports:
+      - "3200:3200"  # tempo
+      - "4317"       # otlp grpc
+      - "4318"       # otlp http
+    restart: unless-stopped
+    networks:
+      - rustfs-network
+    profiles:
+      - observability
+
   otel-collector:
     image: otel/opentelemetry-collector-contrib:latest
     container_name: otel-collector
@@ -115,32 +145,45 @@ services:
     volumes:
       - ./.docker/observability/otel-collector-config.yaml:/etc/otelcol-contrib/otel-collector.yml:ro
     ports:
-      - "4317:4317"  # OTLP gRPC receiver
-      - "4318:4318"  # OTLP HTTP receiver
-      - "8888:8888"  # Prometheus metrics
-      - "8889:8889"  # Prometheus exporter metrics
+      - "1888:1888"    # pprof
+      - "8888:8888"    # Prometheus metrics for Collector
+      - "8889:8889"    # Prometheus metrics for application indicators
+      - "13133:13133"  # health check
+      - "4317:4317"    # OTLP gRPC
+      - "4318:4318"    # OTLP HTTP
+      - "55679:55679"  # zpages
     networks:
       - rustfs-network
     restart: unless-stopped
     profiles:
       - observability
+    depends_on:
+      - tempo
+      - jaeger
+      - prometheus
+      - loki
 
-  # Jaeger for tracing
   jaeger:
-    image: jaegertracing/all-in-one:latest
+    image: jaegertracing/jaeger:latest
     container_name: jaeger
-    ports:
-      - "16686:16686"  # Jaeger UI
-      - "14250:14250"  # Jaeger gRPC
     environment:
+      - TZ=Asia/Shanghai
+      - SPAN_STORAGE_TYPE=badger
+      - BADGER_EPHEMERAL=false
+      - BADGER_DIRECTORY_VALUE=/badger/data
+      - BADGER_DIRECTORY_KEY=/badger/key
       - COLLECTOR_OTLP_ENABLED=true
+    volumes:
+      - jaeger_data:/badger
+    ports:
+      - "16686:16686"  # Web UI
+      - "14269:14269"  # Admin/Metrics
     networks:
      - rustfs-network
     restart: unless-stopped
     profiles:
       - observability
 
-  # Prometheus for metrics
   prometheus:
     image: prom/prometheus:latest
     container_name: prometheus
@@ -152,17 +195,35 @@
     command:
       - "--config.file=/etc/prometheus/prometheus.yml"
       - "--storage.tsdb.path=/prometheus"
-      - "--web.console.libraries=/etc/prometheus/console_libraries"
-      - "--web.console.templates=/etc/prometheus/consoles"
+      - "--web.console.libraries=/usr/share/prometheus/console_libraries"
+      - "--web.console.templates=/usr/share/prometheus/consoles"
       - "--storage.tsdb.retention.time=200h"
       - "--web.enable-lifecycle"
+      - "--web.enable-otlp-receiver"
+      - "--web.enable-remote-write-receiver"
+    networks:
+      - rustfs-network
+    restart: unless-stopped
+    profiles:
+      - observability
+
+  loki:
+    image: grafana/loki:latest
+    container_name: loki
+    environment:
+      - TZ=Asia/Shanghai
+    volumes:
+      - ./.docker/observability/loki-config.yaml:/etc/loki/local-config.yaml:ro
+      - loki_data:/loki
+    ports:
+      - "3100:3100"
+    command: -config.file=/etc/loki/local-config.yaml
     networks:
       - rustfs-network
     restart: unless-stopped
     profiles:
       - observability
 
-  # Grafana for visualization
   grafana:
     image: grafana/grafana:latest
     container_name: grafana
@@ -171,6 +232,7 @@
     environment:
       - GF_SECURITY_ADMIN_USER=admin
       - GF_SECURITY_ADMIN_PASSWORD=admin
+      - GF_INSTALL_PLUGINS=grafana-pyroscope-datasource
     volumes:
       - grafana_data:/var/lib/grafana
       - ./.docker/observability/grafana/provisioning:/etc/grafana/provisioning:ro
@@ -180,6 +242,10 @@
     restart: unless-stopped
     profiles:
       - observability
+    depends_on:
+      - prometheus
+      - tempo
+      - loki
 
   # NGINX reverse proxy (optional)
   nginx:
@@ -228,6 +294,12 @@ volumes:
     driver: local
   grafana_data:
     driver: local
+  tempo_data:
+    driver: local
+  loki_data:
+    driver: local
+  jaeger_data:
+    driver: local
   logs:
     driver: local
   cargo_registry:
diff --git a/rustfs/src/admin/handlers/event.rs b/rustfs/src/admin/handlers/event.rs
index 1a5b29d4..01d9e9f6 100644
--- a/rustfs/src/admin/handlers/event.rs
+++ b/rustfs/src/admin/handlers/event.rs
@@ -324,7 +324,10 @@ impl Operation for ListTargetsArns {
             .clone()
             .ok_or_else(|| s3_error!(InvalidRequest, "region not found"))?;
 
-        let data_target_arn_list: Vec<_> = active_targets.iter().map(|id| id.to_arn(&region).to_string()).collect();
+        let data_target_arn_list: Vec<_> = active_targets
+            .iter()
+            .map(|id| id.to_arn(region.as_str()).to_string())
+            .collect();
 
         let data = serde_json::to_vec(&data_target_arn_list)
             .map_err(|e| s3_error!(InternalError, "failed to serialize targets: {}", e))?;
diff --git a/rustfs/src/admin/router.rs b/rustfs/src/admin/router.rs
index e6d94395..270ddc17 100644
--- a/rustfs/src/admin/router.rs
+++ b/rustfs/src/admin/router.rs
@@ -242,7 +242,7 @@ impl Operation for AdminOperation {
 
 #[derive(Debug, Clone)]
 pub struct Extra {
     pub credentials: Option,
-    pub region: Option<String>,
+    pub region: Option<s3s::region::Region>,
     pub service: Option,
 }
diff --git a/rustfs/src/app/bucket_usecase.rs b/rustfs/src/app/bucket_usecase.rs
index b02c799e..337f7eb7 100644
--- a/rustfs/src/app/bucket_usecase.rs
+++ b/rustfs/src/app/bucket_usecase.rs
@@ -30,6 +30,7 @@ use crate::storage::*;
 use futures::StreamExt;
 use http::StatusCode;
 use metrics::counter;
+use rustfs_config::RUSTFS_REGION;
 use rustfs_ecstore::bucket::{
     lifecycle::bucket_lifecycle_ops::validate_transition_tier,
     metadata::{
@@ -57,6 +58,7 @@ use rustfs_targets::{
 use rustfs_utils::http::RUSTFS_FORCE_DELETE;
 use rustfs_utils::string::parse_bool;
 use s3s::dto::*;
+use s3s::region::Region;
 use s3s::xml;
 use s3s::{S3Error, S3ErrorCode, S3Request, S3Response, S3Result, s3_error};
 use std::{fmt::Display, sync::Arc};
@@ -73,8 +75,8 @@ fn to_internal_error(err: impl Display) -> S3Error {
     S3Error::with_message(S3ErrorCode::InternalError, format!("{err}"))
 }
 
-fn resolve_notification_region(global_region: Option<String>, request_region: Option<String>) -> String {
-    global_region.unwrap_or_else(|| request_region.unwrap_or_default())
+fn resolve_notification_region(global_region: Option<Region>, request_region: Option<Region>) -> Region {
+    global_region.unwrap_or_else(|| request_region.unwrap_or_else(|| Region::new(RUSTFS_REGION.into()).expect("valid region")))
 }
 
 #[derive(Debug, Clone, PartialEq, Eq)]
@@ -165,7 +167,7 @@ impl DefaultBucketUsecase {
         self.context.clone()
     }
 
-    fn global_region(&self) -> Option<String> {
+    fn global_region(&self) -> Option<Region> {
         self.context.as_ref().and_then(|context| context.region().get())
     }
 
@@ -431,7 +433,7 @@ impl DefaultBucketUsecase {
 
         if let Some(region) = self.global_region() {
             return Ok(S3Response::new(GetBucketLocationOutput {
-                location_constraint: Some(BucketLocationConstraint::from(region)),
+                location_constraint: Some(BucketLocationConstraint::from(region.to_string())),
             }));
         }
 
@@ -1230,9 +1232,9 @@ impl DefaultBucketUsecase {
         let event_rules = event_rules_result.map_err(|e| s3_error!(InvalidArgument, "Invalid ARN in notification configuration: {e}"))?;
         warn!("notify event rules: {:?}", &event_rules);
-
+        let region_clone = region.clone();
         notify
-            .add_event_specific_rules(&bucket, &region, &event_rules)
+            .add_event_specific_rules(&bucket, region_clone.as_str(), &event_rules)
             .await
             .map_err(|e| s3_error!(InternalError, "Failed to add rules: {e}"))?;
 
@@ -1800,20 +1802,23 @@ mod tests {
 
     #[test]
     fn resolve_notification_region_prefers_global_region() {
-        let region = resolve_notification_region(Some("us-east-1".to_string()), Some("ap-southeast-1".to_string()));
+        let binding = resolve_notification_region(Some("us-east-1".parse().unwrap()), Some("ap-southeast-1".parse().unwrap()));
+        let region = binding.as_str();
         assert_eq!(region, "us-east-1");
     }
 
     #[test]
     fn resolve_notification_region_falls_back_to_request_region() {
-        let region = resolve_notification_region(None, Some("ap-southeast-1".to_string()));
+        let binding = resolve_notification_region(None, Some("ap-southeast-1".parse().unwrap()));
+        let region = binding.as_str();
         assert_eq!(region, "ap-southeast-1");
     }
 
     #[test]
-    fn resolve_notification_region_defaults_to_empty() {
-        let region = resolve_notification_region(None, None);
-        assert!(region.is_empty());
+    fn resolve_notification_region_defaults_value() {
+        let binding = resolve_notification_region(None, None);
+        let region = binding.as_str();
+        assert_eq!(region, RUSTFS_REGION);
     }
 
     #[tokio::test]
diff --git a/rustfs/src/app/context.rs b/rustfs/src/app/context.rs
index c0ab28b2..c4230409 100644
--- a/rustfs/src/app/context.rs
+++ b/rustfs/src/app/context.rs
@@ -75,7 +75,7 @@ pub trait EndpointsInterface: Send + Sync {
 
 /// Region interface for application-layer use-cases.
 pub trait RegionInterface: Send + Sync {
-    fn get(&self) -> Option<String>;
+    fn get(&self) -> Option<s3s::region::Region>;
 }
 
 /// Tier config interface for application-layer and admin handlers.
@@ -190,7 +190,7 @@ impl EndpointsInterface for EndpointsHandle {
 pub struct RegionHandle;
 
 impl RegionInterface for RegionHandle {
-    fn get(&self) -> Option<String> {
+    fn get(&self) -> Option<s3s::region::Region> {
         get_global_region()
     }
 }
diff --git a/rustfs/src/app/multipart_usecase.rs b/rustfs/src/app/multipart_usecase.rs
index 24cc209d..824a3944 100644
--- a/rustfs/src/app/multipart_usecase.rs
+++ b/rustfs/src/app/multipart_usecase.rs
@@ -27,6 +27,7 @@ use crate::storage::options::{
 use crate::storage::*;
 use bytes::Bytes;
 use futures::StreamExt;
+use rustfs_config::RUSTFS_REGION;
 use rustfs_ecstore::StorageAPI;
 use rustfs_ecstore::bucket::quota::checker::QuotaChecker;
 use rustfs_ecstore::bucket::{
@@ -164,7 +165,7 @@ impl DefaultMultipartUsecase {
         self.context.as_ref().and_then(|context| context.bucket_metadata().handle())
     }
 
-    fn global_region(&self) -> Option<String> {
+    fn global_region(&self) -> Option<Region> {
         self.context.as_ref().and_then(|context| context.region().get())
     }
 
@@ -422,12 +423,12 @@ impl DefaultMultipartUsecase {
             }
         }
 
-        let region = self.global_region().unwrap_or_else(|| "us-east-1".to_string());
+        let region = self.global_region().unwrap_or_else(|| RUSTFS_REGION.parse().unwrap());
         let output = CompleteMultipartUploadOutput {
             bucket: Some(bucket.clone()),
             key: Some(key.clone()),
             e_tag: obj_info.etag.clone().map(|etag| to_s3s_etag(&etag)),
-            location: Some(region.clone()),
+            location: Some(region.to_string()),
             server_side_encryption: server_side_encryption.clone(),
             ssekms_key_id: ssekms_key_id.clone(),
             checksum_crc32: checksum_crc32.clone(),
@@ -448,7 +449,7 @@ impl DefaultMultipartUsecase {
             bucket: Some(bucket.clone()),
             key: Some(key.clone()),
             e_tag: obj_info.etag.clone().map(|etag| to_s3s_etag(&etag)),
-            location: Some(region),
+            location: Some(region.to_string()),
             server_side_encryption,
             ssekms_key_id,
             checksum_crc32,
diff --git a/rustfs/src/auth.rs b/rustfs/src/auth.rs
index 59e919cb..f0ed933f 100644
--- a/rustfs/src/auth.rs
+++ b/rustfs/src/auth.rs
@@ -276,7 +276,7 @@ pub fn get_condition_values(
     header: &HeaderMap,
     cred: &Credentials,
     version_id: Option<&str>,
-    region: Option<&str>,
+    region: Option<s3s::region::Region>,
     remote_addr: Option,
 ) -> HashMap<String, Vec<String>> {
     let username = if cred.is_temp() || cred.is_service_account() {
@@ -362,7 +362,7 @@ pub fn get_condition_values(
     }
 
     if let Some(lc) = region
-        && !lc.is_empty()
+        && !lc.as_str().is_empty()
     {
         args.insert("LocationConstraint".to_owned(), vec![lc.to_string()]);
     }
diff --git a/rustfs/src/config/mod.rs b/rustfs/src/config/mod.rs
index 1705d6aa..9a3f1e84 100644
--- a/rustfs/src/config/mod.rs
+++ b/rustfs/src/config/mod.rs
@@ -15,8 +15,10 @@
 use clap::Parser;
 use clap::builder::NonEmptyStringValueParser;
 use const_str::concat;
+use rustfs_config::RUSTFS_REGION;
 use std::path::PathBuf;
 use std::string::ToString;
+
 shadow_rs::shadow!(build);
 
 pub mod workload_profiles;
@@ -191,8 +193,10 @@ pub struct Config {
     /// tls path for rustfs API and console.
     pub tls_path: Option,
 
+    /// License key for enterprise features
     pub license: Option,
 
+    /// Region for the server, used for signing and other region-specific behavior
     pub region: Option<String>,
 
     /// Enable KMS encryption for server-side encryption
@@ -280,6 +284,9 @@ impl Config {
             .trim()
             .to_string();
 
+        // Region is optional, but if not set, we should default to "rustfs-global-0" for signing compatibility with AWS S3 clients
+        let region = region.or_else(|| Some(RUSTFS_REGION.to_string()));
+
         Ok(Config {
             volumes,
             address,
@@ -329,15 +336,3 @@ impl std::fmt::Debug for Config {
             .finish()
     }
 }
-
-// lazy_static::lazy_static! {
-//     pub(crate) static ref OPT: OnceLock = OnceLock::new();
-// }
-
-// pub fn init_config(opt: Opt) {
-//     OPT.set(opt).expect("Failed to set global config");
-// }
-
-// pub fn get_config() -> &'static Opt {
-//     OPT.get().expect("Global config not initialized")
-// }
diff --git a/rustfs/src/init.rs b/rustfs/src/init.rs
index 366100d7..09f8d77d 100644
--- a/rustfs/src/init.rs
+++ b/rustfs/src/init.rs
@@ -93,14 +93,15 @@ pub(crate) fn init_update_check() {
 /// * `buckets` - A vector of bucket names to process
 #[instrument(skip_all)]
 pub(crate) async fn add_bucket_notification_configuration(buckets: Vec<String>) {
-    let region_opt = rustfs_ecstore::global::get_global_region();
-    let region = match region_opt {
-        Some(ref r) if !r.is_empty() => r,
-        _ => {
+    let global_region = rustfs_ecstore::global::get_global_region();
+    let region = global_region
+        .as_ref()
+        .filter(|r| !r.as_str().is_empty())
+        .map(|r| r.as_str())
+        .unwrap_or_else(|| {
             warn!("Global region is not set; attempting notification configuration for all buckets with an empty region.");
             ""
-        }
-    };
+        });
     for bucket in buckets.iter() {
         let has_notification_config = metadata_sys::get_notification_config(bucket).await.unwrap_or_else(|err| {
             warn!("get_notification_config err {:?}", err);
@@ -368,7 +369,7 @@ pub async fn init_ftp_system() -> Result
 
-    let server: FtpsServer = FtpsServer::new(config, storage_client).await?;
+    let server: FtpsServer = FtpsServer::new(config, storage_client).await?;
 
     // Log server configuration
     info!(
@@ -451,7 +452,7 @@ pub async fn init_ftps_system() -> Result
 
-    let server: FtpsServer = FtpsServer::new(config, storage_client).await?;
+    let server: FtpsServer = FtpsServer::new(config, storage_client).await?;
 
     // Log server configuration
     info!(
diff --git a/rustfs/src/main.rs b/rustfs/src/main.rs
index d0cc7a1c..eeaee520 100644
--- a/rustfs/src/main.rs
+++ b/rustfs/src/main.rs
@@ -164,7 +164,10 @@ async fn run(config: config::Config) -> Result<()> {
     let readiness = Arc::new(GlobalReadiness::new());
 
     if let Some(region) = &config.region {
-        rustfs_ecstore::global::set_global_region(region.clone());
+        let region = region
+            .parse()
+            .map_err(|e| Error::other(format!("invalid region {}: {e}", region)))?;
+        rustfs_ecstore::global::set_global_region(region);
     }
 
     let server_addr = parse_and_resolve_address(config.address.as_str()).map_err(Error::other)?;
diff --git a/rustfs/src/storage/access.rs b/rustfs/src/storage/access.rs
index 322d7700..e8f9ac2c 100644
--- a/rustfs/src/storage/access.rs
+++ b/rustfs/src/storage/access.rs
@@ -44,7 +44,7 @@ pub(crate) struct ReqInfo {
     pub bucket: Option,
     pub object: Option,
     pub version_id: Option,
-    pub region: Option<String>,
+    pub region: Option<s3s::region::Region>,
 }
 
 #[derive(Clone, Copy)]
@@ -352,7 +352,7 @@ pub async fn authorize_request(req: &mut S3Request, action: Action) -> S3R
         &req.headers,
         &rustfs_credentials::Credentials::default(),
         req_info.version_id.as_deref(),
-        req.region.as_deref(),
+        req.region.clone(),
         remote_addr,
     );
     let bucket_name = req_info.bucket.as_deref().unwrap_or("");
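
The recurring change in this patch is the move from plain `String` regions to a validated region type plus the `rustfs-global-0` default. The fallback order used by `resolve_notification_region` (global region, then request region, then the compiled-in default) can be sketched stand-alone; the `Region` newtype and its validation rule below are hypothetical stand-ins for `s3s::region::Region`, not its real implementation:

```rust
use std::str::FromStr;

/// Hypothetical stand-in for `s3s::region::Region`: validated on construction.
#[derive(Debug, Clone, PartialEq)]
struct Region(String);

impl FromStr for Region {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumed validation rule: non-empty, lowercase ASCII letters, digits, dashes.
        let ok = !s.is_empty()
            && s.chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-');
        if ok { Ok(Region(s.to_string())) } else { Err(format!("invalid region: {s:?}")) }
    }
}

impl Region {
    fn as_str(&self) -> &str {
        &self.0
    }
}

/// Mirrors the new default in `crates/config/src/constants/app.rs`.
const RUSTFS_REGION: &str = "rustfs-global-0";

/// Same precedence as the patched `resolve_notification_region`:
/// global region wins, then the request region, then the default.
fn resolve_notification_region(global: Option<Region>, request: Option<Region>) -> Region {
    global.unwrap_or_else(|| request.unwrap_or_else(|| RUSTFS_REGION.parse().expect("valid region")))
}

fn main() {
    let global = Some("us-east-1".parse::<Region>().unwrap());
    let request = Some("ap-southeast-1".parse::<Region>().unwrap());
    assert_eq!(resolve_notification_region(global, request).as_str(), "us-east-1");
    // With neither set, the default region is used instead of an empty string,
    // which is what the updated `resolve_notification_region_defaults_value` test checks.
    assert_eq!(resolve_notification_region(None, None).as_str(), "rustfs-global-0");
}
```

Parsing at the edge (as `main.rs` now does with `region.parse()` before `set_global_region`) means every later consumer can rely on the region being well-formed instead of re-checking a raw string.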