RustFS MNMD (Multi-Node Multi-Drive) Docker Example

This directory contains a complete, ready-to-use MNMD deployment example for RustFS with 4 nodes and 4 drives per node (4x4 configuration).

Overview

This example addresses common deployment issues including:

  • VolumeNotFound errors - Fixed by using correct disk indexing (/data/rustfs{1...4} instead of /data/rustfs{0...3})
  • Startup race conditions - Solved with a simple sleep command in each service.
  • Service discovery - Uses Docker service names (rustfs-node{1..4}) instead of hard-coded IPs
  • Health checks - Implements proper health monitoring with nc (with alternatives documented)

Quick Start

From this directory (docs/examples/mnmd), run:

# Start the cluster
docker-compose up -d

# Check the status
docker-compose ps

# View logs
docker-compose logs -f

# Test the deployment
curl http://localhost:9000/health
curl http://localhost:9001/health

# Run comprehensive tests
./test-deployment.sh

# Stop the cluster
docker-compose down

# Clean up volumes (WARNING: deletes all data)
docker-compose down -v

Configuration Details

Volume Configuration

The example uses the following volume configuration:

RUSTFS_VOLUMES=http://rustfs-node{1...4}:9000/data/rustfs{1...4}

This expands to 16 endpoints (4 nodes × 4 drives):

  • Node 1: /data/rustfs1, /data/rustfs2, /data/rustfs3, /data/rustfs4
  • Node 2: /data/rustfs1, /data/rustfs2, /data/rustfs3, /data/rustfs4
  • Node 3: /data/rustfs1, /data/rustfs2, /data/rustfs3, /data/rustfs4
  • Node 4: /data/rustfs1, /data/rustfs2, /data/rustfs3, /data/rustfs4

Important: Disk indexing starts at 1 to match the mounted paths /data/rustfs1 through /data/rustfs4.
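
To preview what this notation expands to, bash's own brace expansion gives a rough approximation (note that bash writes {1..4} with two dots, while the RustFS ellipsis notation uses three):

# Rough preview of the 16 endpoints, in node-major order
# (bash syntax is {1..4}; RUSTFS_VOLUMES itself expects {1...4}):
printf '%s\n' http://rustfs-node{1..4}:9000/data/rustfs{1..4}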

Port Mappings

Node     API Port    Console Port
node1    9000        9001
node2    9010        9011
node3    9020        9021
node4    9030        9031

Startup Coordination

To reduce the risk of startup race conditions in which nodes cannot find each other, each service's command includes a short sleep 3 before launching RustFS. This brief delay gives the network and the other services time to initialize. For more complex scenarios, a health-check-based dependency or an external entrypoint script may be required; a sketch of the latter follows.
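
The snippet below is a minimal, illustrative sketch of such an entrypoint: it waits until every peer hostname resolves on the Compose network before starting the server. It is not the coordination mechanism used by the docker-compose.yml in this directory; the peer names mirror the services defined above.

#!/bin/sh
# Illustrative wait-then-start sketch; not part of this example's compose file.
# Wait until every peer hostname resolves on the Docker network, then start.
for peer in rustfs-node1 rustfs-node2 rustfs-node3 rustfs-node4; do
  until getent hosts "$peer" >/dev/null 2>&1; do
    echo "waiting for DNS entry for $peer..."
    sleep 1
  done
done
exec "${RUSTFS_CMD:-rustfs}"  # fall back to rustfs if RUSTFS_CMD is unset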

Health Checks

Default health check using nc (netcat):

healthcheck:
  test: [ "CMD-SHELL", "nc -z localhost 9000 || exit 1" ]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 30s

Alternative Health Checks

If your base image lacks nc, use one of these alternatives:

Using curl:

healthcheck:
  test: [ "CMD-SHELL", "curl -f http://localhost:9000/health || exit 1" ]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 30s

Using wget:

healthcheck:
  test: [ "CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:9000/health || exit 1" ]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 30s

Brace Expansion Alternatives

If your Docker Compose runtime doesn't support brace expansion ({1...4}), replace with explicit endpoints:

environment:
  - RUSTFS_VOLUMES=http://rustfs-node1:9000/data/rustfs1,http://rustfs-node1:9000/data/rustfs2,http://rustfs-node1:9000/data/rustfs3,http://rustfs-node1:9000/data/rustfs4,http://rustfs-node2:9000/data/rustfs1,http://rustfs-node2:9000/data/rustfs2,http://rustfs-node2:9000/data/rustfs3,http://rustfs-node2:9000/data/rustfs4,http://rustfs-node3:9000/data/rustfs1,http://rustfs-node3:9000/data/rustfs2,http://rustfs-node3:9000/data/rustfs3,http://rustfs-node3:9000/data/rustfs4,http://rustfs-node4:9000/data/rustfs1,http://rustfs-node4:9000/data/rustfs2,http://rustfs-node4:9000/data/rustfs3,http://rustfs-node4:9000/data/rustfs4
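
If you prefer not to maintain that long line by hand, a small shell loop can generate the same comma-separated list. This helper is shown for convenience only and is not part of the example files:

# Generate the explicit 16-endpoint list in node-major order:
endpoints=""
for node in 1 2 3 4; do
  for disk in 1 2 3 4; do
    endpoints="${endpoints:+$endpoints,}http://rustfs-node${node}:9000/data/rustfs${disk}"
  done
done
echo "RUSTFS_VOLUMES=$endpoints"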

Using RUSTFS_CMD

The RUSTFS_CMD environment variable provides a fallback when no command is specified:

environment:
  - RUSTFS_CMD=rustfs  # Default fallback command

This allows the entrypoint to execute the correct command when Docker doesn't provide one.
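
A minimal sketch of how such a fallback could be implemented in an entrypoint script follows; the image's real entrypoint may differ:

#!/bin/sh
# Illustrative fallback logic: run the command Docker passed in, if any,
# otherwise fall back to the command named by RUSTFS_CMD (default: rustfs).
if [ "$#" -gt 0 ]; then
  exec "$@"
else
  exec "${RUSTFS_CMD:-rustfs}"
fi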

Testing the Deployment

After starting the cluster, verify it's working:

Automated Testing

Use the provided test script for comprehensive validation:

./test-deployment.sh

This script tests:

  • Container status (4/4 running)
  • Health checks (4/4 healthy)
  • API endpoints (4 ports)
  • Console endpoints (4 ports)
  • Inter-node connectivity
  • Data directory existence

Manual Testing

For manual verification:

# 1. Check all containers are healthy
docker-compose ps

# 2. Test API endpoints
for port in 9000 9010 9020 9030; do
  echo "Testing port $port..."
  curl -s http://localhost:${port}/health | jq '.'
done

# 3. Test console endpoints
for port in 9001 9011 9021 9031; do
  echo "Testing console port $port..."
  curl -s http://localhost:${port}/health | jq '.'
done

# 4. Check inter-node connectivity
docker exec rustfs-node1 nc -zv rustfs-node2 9000
docker exec rustfs-node1 nc -zv rustfs-node3 9000
docker exec rustfs-node1 nc -zv rustfs-node4 9000

Troubleshooting

VolumeNotFound Error

Symptom: Error message about /data/rustfs0 not found.

Solution: This example uses /data/rustfs{1...4} indexing to match the mounted Docker volumes. Ensure your RUSTFS_VOLUMES configuration starts at index 1, not 0.
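
To confirm that the paths mounted inside a container match the indexing used in RUSTFS_VOLUMES:

# Should list /data/rustfs1 through /data/rustfs4 (and nothing like /data/rustfs0)
docker exec rustfs-node1 ls -d /data/rustfs*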

Health Check Failures

Symptom: Containers show as unhealthy.

Solutions:

  1. Check if nc is available: docker exec rustfs-node1 which nc
  2. Use alternative health checks (curl/wget) as documented above
  3. Increase start_period if nodes need more time to initialize
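
To see why a health check is failing, inspect the container's recorded health-check output:

# Show the most recent health-check probes with their exit codes and output
docker inspect --format '{{json .State.Health}}' rustfs-node1 | jq '.'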

Startup Timeouts

Symptom: Services timeout waiting for peers.

Solutions:

  1. Check logs: docker-compose logs rustfs-node1
  2. Verify network connectivity: docker-compose exec rustfs-node1 ping rustfs-node2
  3. Increase the sleep duration in the docker-compose.yml command directive if the nodes need a longer delay

Permission Issues

Symptom: Cannot create directories or write data.

Solution: Ensure volumes have correct permissions or set RUSTFS_UID and RUSTFS_GID environment variables.

Advanced Configuration

Custom Credentials

Replace default credentials in production:

environment:
  - RUSTFS_ACCESS_KEY=your_access_key
  - RUSTFS_SECRET_KEY=your_secret_key
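
After changing the credentials, a quick way to verify them is with any S3 client. The example below uses the AWS CLI and assumes the cluster exposes an S3-compatible API on port 9000; the bucket name and region value are illustrative:

# Smoke test with the AWS CLI (credentials must match the values above;
# the region is arbitrary, but the CLI requires one to be set):
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-east-1
aws --endpoint-url http://localhost:9000 s3 mb s3://smoke-test
aws --endpoint-url http://localhost:9000 s3 ls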

TLS Configuration

Add TLS certificates:

volumes:
  - ./certs:/opt/tls:ro
environment:
  - RUSTFS_TLS_PATH=/opt/tls

Resource Limits

Add resource constraints:

deploy:
  resources:
    limits:
      cpus: '2'
      memory: 4G
    reservations:
      cpus: '1'
      memory: 2G
