rustfs/docs/special-characters-solution.md

Special Characters in Object Path - Solution Implementation

Executive Summary

After comprehensive investigation, the root cause analysis reveals:

  1. The backend (rustfs) handles URL encoding correctly via the s3s library
  2. The primary issue lies in the UI/client layer, where URL encoding is not handled properly
  3. Backend enhancements are still recommended for robustness and clearer error messages

Root Cause Analysis

What s3s Library Does

The s3s library (version 0.12.0-rc.4) correctly URL-decodes object keys from HTTP requests:

// From s3s-0.12.0-rc.4/src/ops/mod.rs, line 261:
let decoded_uri_path = urlencoding::decode(req.uri.path())
    .map_err(|_| S3ErrorCode::InvalidURI)?
    .into_owned();

This means:

  • Client sends: PUT /bucket/a%20f+/file.txt
  • s3s decodes to: a f+/file.txt
  • Our handler receives: key = "a f+/file.txt" (already decoded)
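The same decode step can be sketched with Python's standard library (an illustration of the semantics, not the s3s code itself): `urllib.parse.unquote` performs path-style decoding, so `%20` becomes a space while a literal `+` passes through unchanged.

```python
from urllib.parse import unquote

# Path-style decoding: %XX escapes are decoded, '+' is left alone.
raw_path = "/bucket/a%20f+/file.txt"
decoded = unquote(raw_path)
print(decoded)  # /bucket/a f+/file.txt
```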

What Our Backend Does

  1. Storage: Stores objects with decoded names (e.g., "a f+/file.txt")
  2. Retrieval: Returns objects with decoded names in LIST responses
  3. Path operations: Rust's Path APIs preserve special characters correctly

The Real Problems

Problem 1: UI Client Issue (Part A)

Symptom: UI can navigate TO folder but can't LIST contents

Diagnosis:

  • User uploads: PUT /bucket/a%20f+/b/c/3/README.md (works)
  • CLI lists: GET /bucket?prefix=a%20f%2B/ (works; mc encodes the prefix)
  • UI navigates: shows folder "a f+" (works)
  • UI lists folder: GET /bucket?prefix=a f+/ (fails; the UI sends the prefix unencoded)

Root Cause: The UI is not URL-encoding the prefix when making the LIST request. It should send prefix=a%20f%2B/ but likely sends prefix=a f+/, which the server's query-string parser misreads: in a query string, an unencoded + decodes to a space.

Evidence:

  • mc (MinIO client) works → the backend handles these keys correctly
  • The UI fails on the same keys → the UI's request encoding is at fault
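The failure mode can be reproduced with a standard query-string parser. This is a sketch of form-decoding semantics using Python's `urllib.parse.parse_qs`, not the actual rustfs parsing code: a properly encoded prefix round-trips intact, while an unencoded `+` in the query string decodes to a space, so the server looks up the wrong folder.

```python
from urllib.parse import parse_qs

# Properly encoded prefix: the server recovers the intended "a f+/".
good = parse_qs("prefix=a%20f%2B/")
print(good["prefix"][0])  # a f+/

# Unencoded prefix: form decoding turns the raw '+' into a space,
# so the server sees "a f /" and lists the wrong folder.
bad = parse_qs("prefix=a f+/")
print(bad["prefix"][0])  # a f /
```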

Problem 2: Client Encoding Issue (Part B)

Symptom: 400 error with plus signs

Error Message: api error InvalidArgument: Invalid argument

Diagnosis: The plus sign (+) has special meaning in URL query parameters (represents space in form encoding) but not in URL paths. Clients must encode + as %2B in paths.

Example:

  • Correct: /bucket/ES%2Bnet/file.txt → decoded to ES+net/file.txt
  • Wrong: /bucket/ES+net/file.txt → might be misinterpreted
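The two interpretations can be contrasted with Python's two decoders (again a sketch of the semantics, not rustfs code): path-style decoding preserves a literal `+`, while form-style decoding, if any layer wrongly applies it to a path, silently turns the `+` into a space.

```python
from urllib.parse import unquote, unquote_plus

# Correctly encoded: decodes to the intended key.
print(unquote("ES%2Bnet/file.txt"))     # ES+net/file.txt

# Unencoded '+': fine under path semantics...
print(unquote("ES+net/file.txt"))       # ES+net/file.txt

# ...but corrupted if any layer applies form (query) semantics.
print(unquote_plus("ES+net/file.txt"))  # ES net/file.txt
```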

URL Encoding Rules

According to RFC 3986 and AWS S3 API:

| Character | In URL Path | In Query Param | Decoded Result |
| --------- | ----------- | -------------- | -------------- |
| Space     | %20         | %20 or +       | (space)        |
| Plus      | %2B         | %2B            | + (plus)       |
| Percent   | %25         | %25            | % (percent)    |

Critical Note: In URL paths (not query params), + represents a literal plus sign, NOT a space. Only %20 represents space in paths.
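The table rows can be checked directly against Python's encoders, taking `urllib.parse` as a convenient reference for RFC 3986 percent-encoding: `quote` produces path-style escapes, while `quote_plus` uses `+` for spaces in form-encoded query strings.

```python
from urllib.parse import quote, quote_plus

# Path-style encoding: space -> %20, plus -> %2B, percent -> %25.
print(quote(" +%", safe=""))  # %20%2B%25

# Form/query encoding: space may become '+', plus still must be %2B.
print(quote_plus(" +"))       # +%2B
```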

Solution Implementation

Phase 1: Backend Validation & Logging (Low Risk)

Add defensive validation and better logging to help diagnose issues:

// In rustfs/src/storage/ecfs.rs
use tracing::{debug, warn};

/// Log warning signs in an object key that may indicate client-side
/// URL-encoding issues, without changing request handling
fn log_potential_encoding_issues(key: &str) {
    // Check for unencoded special chars that might indicate problems
    if key.contains('\n') || key.contains('\r') || key.contains('\0') {
        warn!("Object key contains control characters: {:?}", key);
    }
    
    // Log debug info for troubleshooting
    debug!("Processing object key: {:?} (bytes: {:?})", key, key.as_bytes());
}

Benefit: Helps diagnose client-side issues without changing behavior.

Phase 2: Enhanced Error Messages (Low Risk)

When validation fails, provide helpful error messages:

// `key` arrives as a &str, so valid UTF-8 is already guaranteed by the type.
// The useful defensive check is for control characters, which almost always
// indicate a client-side encoding bug.
if key.chars().any(|c| c.is_control()) {
    return Err(S3Error::with_message(
        S3ErrorCode::InvalidArgument,
        "Object key contains control characters. Ensure keys are properly URL-encoded."
    ));
}

Phase 3: Documentation (No Risk)

  1. API Documentation: Document URL encoding requirements
  2. Client Guide: Explain how to properly encode object keys
  3. Troubleshooting Guide: Common issues and solutions

Phase 4: UI Fix (If Applicable)

If RustFS includes a web UI/console:

  1. Ensure the UI properly URL-encodes all requests:

    // Encode each path segment of the key; encoding the whole key at once
    // would also escape the '/' separators:
    const encodedKey = key.split('/').map(encodeURIComponent).join('/');
    fetch(`/bucket/${encodedKey}`);
    
    // A LIST prefix travels in a query parameter, so the whole value
    // (including '/') must be encoded:
    const encodedPrefix = encodeURIComponent(prefix);
    fetch(`/bucket?prefix=${encodedPrefix}`);
    
  2. Decode when displaying:

    // When showing keys in the UI, decode for display:
    const displayKey = decodeURIComponent(key);
    

Testing Strategy

Test Cases

Our e2e tests in crates/e2e_test/src/special_chars_test.rs cover:

  1. Spaces in paths: "a f+/b/c/3/README.md"
  2. Plus signs in paths: "ES+net/LHC+Data+Challenge/file.json"
  3. Mixed special characters
  4. PUT, GET, LIST, DELETE operations
  5. Exact scenario from issue

Running Tests

# Run special character tests
cargo test --package e2e_test special_chars -- --nocapture

# Run specific test
cargo test --package e2e_test test_issue_scenario_exact -- --nocapture

Expected Results

All tests should pass because:

  • s3s correctly decodes URL-encoded keys
  • Rust Path APIs preserve special characters
  • ecstore stores/retrieves keys correctly
  • AWS SDK (used in tests) properly encodes keys

If tests fail, it would indicate a bug in our backend implementation.

Client Guidelines

For Application Developers

When using RustFS with any S3 client:

  1. Use a proper S3 SDK: AWS SDK, MinIO SDK, etc. handle encoding automatically
  2. If using raw HTTP: Manually URL-encode object keys in paths
  3. Remember:
    • Space → %20 (not + in paths!)
    • Plus → %2B
    • Percent → %25
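For raw-HTTP clients, these rules can be wrapped in a small helper (the name encode_object_key is hypothetical, not part of any SDK) that encodes each path segment while keeping the '/' separators intact:

```python
from urllib.parse import quote

def encode_object_key(key: str) -> str:
    """Percent-encode each segment of an object key, keeping '/' intact."""
    return "/".join(quote(segment, safe="") for segment in key.split("/"))

print(encode_object_key("a f+/b/c/3/README.md"))  # a%20f%2B/b/c/3/README.md
```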

Example: Correct Client Usage

# Python boto3 - handles encoding automatically
import boto3
s3 = boto3.client('s3', endpoint_url='http://localhost:9000')

# These work correctly - boto3 encodes automatically:
s3.put_object(Bucket='test', Key='path with spaces/file.txt', Body=b'data')
s3.put_object(Bucket='test', Key='path+with+plus/file.txt', Body=b'data')
s3.list_objects_v2(Bucket='test', Prefix='path with spaces/')

// Go AWS SDK - handles encoding automatically
package main

import (
    "bytes"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

func main() {
    svc := s3.New(session.New())
    
    // These work correctly - SDK encodes automatically:
    svc.PutObject(&s3.PutObjectInput{
        Bucket: aws.String("test"),
        Key:    aws.String("path with spaces/file.txt"),
        Body:   bytes.NewReader([]byte("data")),
    })
    
    svc.ListObjectsV2(&s3.ListObjectsV2Input{
        Bucket: aws.String("test"),
        Prefix: aws.String("path with spaces/"),
    })
}

# MinIO mc client - handles encoding automatically
mc cp file.txt "local/bucket/path with spaces/file.txt"
mc ls "local/bucket/path with spaces/"

Example: Manual HTTP Requests

If making raw HTTP requests (not recommended):

# Correct: URL-encode the path
curl -X PUT "http://localhost:9000/bucket/path%20with%20spaces/file.txt" \
     -H "Content-Type: text/plain" \
     -d "data"

# Correct: Encode plus as %2B
curl -X PUT "http://localhost:9000/bucket/ES%2Bnet/file.txt" \
     -H "Content-Type: text/plain" \
     -d "data"

# List with encoded prefix
curl "http://localhost:9000/bucket?prefix=path%20with%20spaces/"

Monitoring and Debugging

Backend Logs

Enable debug logging to see key processing:

RUST_LOG=rustfs=debug cargo run

Look for log messages showing:

  • Received keys
  • Validation errors
  • Storage operations

Common Issues

| Symptom | Likely Cause | Solution |
| ------- | ------------ | -------- |
| 400 "InvalidArgument" | Client not encoding properly | Use an S3 SDK or encode manually |
| 404 "NoSuchKey" but file exists | Encoding mismatch | Check client encoding |
| UI shows folder but can't list | UI bug: prefix not encoded | Fix the UI to encode requests |
| Works with CLI, fails with UI | UI implementation issue | Compare UI requests against CLI requests |

Conclusion

Backend Status: Working Correctly

The RustFS backend correctly handles URL-encoded object keys through the s3s library. No backend code changes are required for basic functionality.

Client/UI Status: Needs Attention

The issues described appear to be client-side or UI-side problems:

  1. Part A: UI not properly encoding LIST prefix requests
  2. Part B: Client not encoding + as %2B in paths

Recommendations

  1. Short-term:

    • Add logging and better error messages (Phase 1-2)
    • Document client requirements (Phase 3)
    • Fix UI if applicable (Phase 4)
  2. Long-term:

    • Add comprehensive e2e tests (already done!)
    • Monitor for encoding-related errors
    • Educate users on proper S3 client usage
  3. For Users Experiencing Issues:

    • Use proper S3 SDKs (AWS, MinIO, etc.)
    • If using custom clients, ensure proper URL encoding
    • If using RustFS UI, report UI bugs separately

Document Version: 1.0
Date: 2025-12-09
Status: Final - Ready for Implementation
Next Steps: Implement Phase 1-3, run tests, update user documentation