Special Characters in Object Path - Comprehensive Analysis and Solution
Executive Summary
This document provides an in-depth analysis of the issues with special characters (spaces, plus signs, etc.) in object paths within RustFS, along with a comprehensive solution strategy.
Problem Statement
Issue Description
Users encounter problems when working with object paths containing special characters:
Part A: Spaces in Paths
```bash
mc cp README.md "local/dummy/a%20f+/b/c/3/README.md"
```
- The UI allows navigation to the folder `a%20f+/`
- However, it cannot display the contents within that folder
- CLI tools like `mc ls` correctly show the file exists
Part B: Plus Signs in Paths
```text
Error: blob (key "/test/data/org_main-org/dashboards/ES+net/LHC+Data+Challenge/firefly-details.json")
api error InvalidArgument: Invalid argument
```
- Files with `+` signs in paths cause 400 (Bad Request) errors
- This affects clients using the Go Cloud Development Kit or similar libraries
Root Cause Analysis
URL Encoding in S3 API
According to the AWS S3 API specification:
- Object keys in HTTP URLs MUST be URL-encoded:
  - Space character → `%20`
  - Plus sign → `%2B`
  - Literal `+` in URL path → stays as `+` (represents itself, not space)

- URL encoding rules for S3 paths:
  - In HTTP URLs: `/bucket/path%20with%20spaces/file%2Bname.txt`
  - Decoded key: `path with spaces/file+name.txt`
  - Note: `+` in URL path represents a literal `+`, NOT a space

- Important distinction:
  - In query parameters, `+` represents space (form URL encoding)
  - In URL paths, `+` represents a literal plus sign
  - Space in paths must be encoded as `%20`
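As a concrete illustration of these rules, here is a minimal sketch using the `urlencoding` crate (plain percent-decoding that never treats `+` as a space). It is a standalone example, not the decoding path s3s actually uses:

```rust
// Illustration of the path-encoding rules above, assuming the `urlencoding` crate.
fn main() {
    // Both are valid URL-path encodings of the key "a f+":
    // the space must be %20, the plus may be sent literally or as %2B.
    assert_eq!(urlencoding::decode("a%20f+").unwrap(), "a f+");
    assert_eq!(urlencoding::decode("a%20f%2B").unwrap(), "a f+");

    // Re-encoding the decoded key yields a canonical form with both escaped.
    assert_eq!(urlencoding::encode("a f+"), "a%20f%2B");
}
```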
The s3s Library Behavior
The s3s library (version 0.12.0-rc.4) handles HTTP request parsing and URL decoding:
- Expected behavior: s3s should URL-decode the path from HTTP requests before passing keys to our handlers
- Current observation: There appears to be inconsistency or a bug in how keys are decoded
- Hypothesis: The library may not be properly handling certain special characters or edge cases
Where the Problem Manifests
The issue affects multiple operations:
- PUT Object: Uploading files with special characters in path
- GET Object: Retrieving files with special characters
- LIST Objects: Listing directory contents with special characters in path
- DELETE Object: Deleting files with special characters
Consistency Issues
The core problem is inconsistency in how paths are handled:
- Storage layer: May store objects with URL-encoded names
- Retrieval layer: May expect decoded names
- Comparison layer: Path matching fails when encoding differs
- List operation: Returns encoded or decoded names inconsistently
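A minimal sketch of that inconsistency, assuming (as a deliberate simplification) that listing reduces to a prefix comparison on stored key strings:

```rust
// Simplified model of the consistency problem: listing is modelled as a plain
// prefix match over stored key names.
fn list_with_prefix<'a>(stored_keys: &'a [&'a str], prefix: &str) -> Vec<&'a str> {
    stored_keys.iter().copied().filter(|k| k.starts_with(prefix)).collect()
}

fn main() {
    // PUT stored the decoded key...
    let stored = ["a f+/b/c/3/README.md"];

    // ...but if the LIST prefix is still percent-encoded, nothing matches.
    assert!(list_with_prefix(&stored, "a%20f+/").is_empty());

    // With consistent (decoded) handling on both sides, the object is found.
    assert_eq!(list_with_prefix(&stored, "a f+/").len(), 1);
}
```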
Technical Analysis
Current Implementation
1. Storage Layer (ecfs.rs)
```rust
// In put_object
let PutObjectInput {
    bucket,
    key, // ← This comes from s3s, should be URL-decoded
    ..
} = input;

store.put_object(&bucket, &key, &mut reader, &opts).await
```
2. List Objects Implementation
```rust
// In list_objects_v2
let object_infos = store
    .list_objects_v2(
        &bucket,
        &prefix, // ← Should this be decoded?
        continuation_token,
        delimiter.clone(),
        max_keys,
        fetch_owner.unwrap_or_default(),
        start_after,
        incl_deleted,
    )
    .await
```
3. Object Retrieval
The key (object name) needs to match exactly between:
- How it's stored (during PUT)
- How it's queried (during GET/LIST)
- How it's compared (path matching)
The URL Encoding Problem
Consider this scenario:
1. Client uploads: `PUT /bucket/a%20f+/file.txt`
2. s3s decodes to: `a f+/file.txt` (correct: `%20` → space, `+` → plus)
3. We store as: `a f+/file.txt`
4. Client lists: `GET /bucket?prefix=a%20f+/`
5. s3s decodes to: `a f+/`
6. We search for: `a f+/`
7. Should work! ✓

But what if s3s is NOT decoding properly? Or decoding inconsistently?

1. Client uploads: `PUT /bucket/a%20f+/file.txt`
2. s3s passes: `a%20f+/file.txt` (BUG: not decoded!)
3. We store as: `a%20f+/file.txt`
4. Client lists: `GET /bucket?prefix=a%20f+/`
5. s3s passes: `a%20f+/`
6. We search for: `a%20f+/`
7. Works by accident! ✓

But then:

8. Client lists: `GET /bucket?prefix=a+f%2B/` (encoding `+` as `%2B`)
9. s3s passes: `a+f%2B/` or `a+f+/` ??
10. We search for that, but the stored name was `a%20f+/`
11. Mismatch! ✗
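The ambiguity in step 9 comes from the two decoding regimes: form decoding (used for query strings) turns `+` into a space, while plain percent-decoding keeps it literal. A small sketch, assuming the `url` crate's `form_urlencoded` module and the `urlencoding` crate (neither is asserted to be what s3s uses internally):

```rust
// The same bytes, decoded under the two regimes relevant to steps 8 and 9 above.
use url::form_urlencoded;

fn main() {
    // As a query-string pair, "+" means space and "%2B" means plus.
    let (_, prefix) = form_urlencoded::parse(b"prefix=a+f%2B/").next().unwrap();
    assert_eq!(prefix, "a f+/");

    // Plain percent-decoding of the same value keeps "+" literal.
    let decoded = urlencoding::decode("a+f%2B/").unwrap();
    assert_eq!(decoded, "a+f+/");

    // Picking the wrong decoder for the context yields a different prefix,
    // which then fails to match the stored key.
    assert_ne!(prefix, decoded);
}
```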
Solution Strategy
Approach 1: Trust s3s Library (Recommended)
Assumption: s3s library correctly URL-decodes all keys from HTTP requests
Strategy:
- Assume all keys received from s3s are already decoded
- Store objects with decoded names (UTF-8 strings with literal special chars)
- Use decoded names for all operations (GET, LIST, DELETE)
- Never manually URL-encode/decode keys in our handlers
- Trust s3s to handle HTTP-level encoding/decoding
Advantages:
- Follows separation of concerns
- Simpler code
- Relies on well-tested library behavior
Risks:
- If s3s has a bug, we're affected
- Need to verify s3s actually does this correctly
Approach 2: Explicit URL Decoding (Defensive)
Assumption: s3s may not decode keys properly, or there are edge cases
Strategy:
- Explicitly URL-decode all keys when received from s3s
- Use `urlencoding::decode()` on all keys in handlers
- Store and operate on decoded names
- Add safety checks and error handling
Implementation:
```rust
use urlencoding::decode;

// In put_object
let key = decode(&input.key)
    .map_err(|e| s3_error!(InvalidArgument, "Invalid URL encoding in key: {}", e))?
    .into_owned();
```
Advantages:
- More defensive
- Explicit control
- Handles s3s bugs or limitations
Risks:
- Double-decoding if s3s already decodes
- May introduce new bugs
- More complex code
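The double-decoding risk is concrete: a key that legitimately contains a percent sequence is corrupted if it is decoded twice. A small sketch with the `urlencoding` crate:

```rust
// Double-decoding hazard: decoding an already-decoded key changes it.
fn main() {
    // The object's real name contains "%20" literally.
    let real_key = "reports/100%20off.txt";

    // On the wire the "%" itself is escaped ("%25") and the "/" becomes "%2F".
    let on_the_wire = urlencoding::encode(real_key);
    assert_eq!(on_the_wire, "reports%2F100%2520off.txt");

    // One decode (what s3s is expected to do) restores the real key.
    let once = urlencoding::decode(&on_the_wire).unwrap();
    assert_eq!(once, real_key);

    // A second decode in our handler silently produces a different key.
    let twice = urlencoding::decode(&once).unwrap();
    assert_eq!(twice, "reports/100 off.txt");
}
```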
Approach 3: Hybrid Strategy (Most Robust)
Strategy:
- Add logging to understand what s3s actually passes us
- Create tests with various special characters
- Determine if s3s decodes correctly
- If yes → use Approach 1
- If no → use Approach 2 with explicit decoding
Recommended Implementation Plan
Phase 1: Investigation & Testing
- Create comprehensive tests for special characters (a candidate test matrix is sketched after this list):
  - Spaces (` ` / `%20`)
  - Plus signs (`+` / `%2B`)
  - Percent signs (`%` / `%25`)
  - Slashes in names (usually not allowed, but test edge cases)
  - Unicode characters
  - Mixed special characters

- Add detailed logging:

  ```rust
  debug!("Received key from s3s: {:?}", key);
  debug!("Key bytes: {:?}", key.as_bytes());
  ```

- Test with real S3 clients:
  - AWS SDK
  - MinIO client (mc)
  - Go Cloud Development Kit
  - boto3 (Python)
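A possible shape for that test matrix, limited to a self-contained round trip through the encoding layer (the real Phase 1 tests would additionally drive PUT/GET/LIST against a running server):

```rust
// Candidate special-character test matrix: round-trip each key through
// percent-encoding. This only exercises the encoding layer and is a starting
// point, not the final e2e test.
#[cfg(test)]
mod special_char_matrix {
    const KEYS: &[&str] = &[
        "path with spaces/file.txt",        // spaces
        "ES+net/LHC+Data+Challenge/f.json", // plus signs
        "discount/100%off.txt",             // percent sign
        "unicode/文件/файл.txt",             // unicode characters
        "mixed/a f+%/b.txt",                // mixed special characters
    ];

    #[test]
    fn encode_decode_round_trips() {
        for &key in KEYS {
            let encoded = urlencoding::encode(key);
            let decoded = urlencoding::decode(&encoded).unwrap();
            assert_eq!(decoded, key, "round trip failed for {key}");
        }
    }
}
```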
Phase 2: Fix Implementation
Based on Phase 1 findings, implement one of:
Option A: s3s handles decoding correctly
- Add tests to verify behavior
- Document the assumption
- Add assertions or validation
Option B: s3s has bugs or doesn't decode
- Add explicit URL decoding to all handlers
- Use `urlencoding::decode()` consistently
- Add error handling for invalid encoding
- Document the workaround
Phase 3: Ensure Consistency
- Audit all key usage:
  - PutObject
  - GetObject
  - DeleteObject
  - ListObjects/ListObjectsV2
  - CopyObject (source and destination)
  - HeadObject
  - Multi-part upload operations

- Standardize key handling:
  - Create a helper function `normalize_object_key()`
  - Use it consistently everywhere
  - Add validation

- Update path utilities (`crates/utils/src/path.rs`):
  - Ensure path manipulation functions handle special chars
  - Add tests for path operations with special characters
Phase 4: Testing & Validation
- Unit tests:

  ```rust
  #[test]
  fn test_object_key_with_space() {
      let key = "path with spaces/file.txt";
      // test PUT, GET, LIST operations
  }

  #[test]
  fn test_object_key_with_plus() {
      let key = "path+with+plus/file+name.txt";
      // test all operations
  }

  #[test]
  fn test_object_key_with_mixed_special_chars() {
      let key = "complex/path with spaces+plus%percent.txt";
      // test all operations
  }
  ```

- Integration tests:
- Test with real S3 clients
- Test mc (MinIO client) scenarios from the issue
- Test Go Cloud Development Kit scenario
- Test AWS SDK compatibility
- Regression testing:
- Ensure existing tests still pass
- Test with normal filenames (no special chars)
- Test with existing data
Implementation Details
Key Functions to Modify
- `rustfs/src/storage/ecfs.rs`:
  - `put_object()` - line ~2763
  - `get_object()` - find implementation
  - `list_objects_v2()` - line ~2564
  - `delete_object()` - find implementation
  - `copy_object()` - handle source and dest keys
  - `head_object()` - find implementation
- Helper function to add:

  ```rust
  /// Normalizes an object key by ensuring it's properly URL-decoded
  /// and contains only valid UTF-8 characters.
  ///
  /// This function should be called on all object keys received from
  /// the S3 API to ensure consistent handling of special characters.
  fn normalize_object_key(key: &str) -> S3Result<String> {
      // If s3s already decodes (and the key contains no literal %XX sequence),
      // this is a no-op validation. If not, this explicitly decodes.
      match urlencoding::decode(key) {
          Ok(decoded) => Ok(decoded.into_owned()),
          Err(e) => Err(s3_error!(InvalidArgument, "Invalid URL encoding in object key: {}", e)),
      }
  }
  ```
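A quick behavioral check of this helper (illustrative tests only); the second case makes the double-decoding caveat from Approach 2 explicit:

```rust
#[cfg(test)]
mod normalize_key_tests {
    use super::normalize_object_key;

    #[test]
    fn decodes_percent_sequences_and_keeps_plus() {
        assert_eq!(normalize_object_key("a%20f+/file.txt").unwrap(), "a f+/file.txt");
        assert_eq!(normalize_object_key("ES+net/file+name.txt").unwrap(), "ES+net/file+name.txt");
    }

    #[test]
    fn is_not_idempotent_for_literal_percent_sequences() {
        // If s3s already decoded the key, calling the helper again corrupts
        // names that legitimately contain "%XX"; this is why Phase 1 must
        // establish what s3s actually passes us before this is wired in.
        let once = normalize_object_key("100%2520off.txt").unwrap();
        assert_eq!(once, "100%20off.txt");
        let twice = normalize_object_key(&once).unwrap();
        assert_eq!(twice, "100 off.txt");
    }
}
```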
Testing Strategy
Create a new test module:
```rust
// crates/e2e_test/src/special_chars_test.rs

#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_put_get_object_with_space() {
        // Upload file with space in path
        let bucket = "test-bucket";
        let key = "folder/file with spaces.txt";
        let content = b"test content";

        // PUT
        put_object(bucket, key, content).await.unwrap();

        // GET
        let retrieved = get_object(bucket, key).await.unwrap();
        assert_eq!(retrieved, content);

        // LIST
        let objects = list_objects(bucket, "folder/").await.unwrap();
        assert!(objects.iter().any(|obj| obj.key == key));
    }

    #[tokio::test]
    async fn test_put_get_object_with_plus() {
        let bucket = "test-bucket";
        let key = "folder/ES+net/file+name.txt";
        // ... similar test
    }

    #[tokio::test]
    async fn test_mc_client_scenario() {
        // Reproduce the exact scenario from the issue
        let bucket = "dummy";
        let key = "a f+/b/c/3/README.md"; // Decoded form
        // ... test with mc client or simulate its behavior
    }
}
```
Edge Cases and Considerations
1. Directory Markers
RustFS uses the `__XLDIR__` suffix for directories:
- Ensure special characters in directory names are handled
- Test: `"folder with spaces/__XLDIR__"`
2. Multipart Upload
- Upload ID and part operations must handle special chars
- Test: Multipart upload of object with special char path
3. Copy Operations
CopyObject has both source and destination keys:
```rust
// Both need consistent handling
let src_key = input.copy_source.key();
let dest_key = input.key;
```
4. Presigned URLs
If RustFS supports presigned URLs, they need special attention:
- URL encoding in presigned URLs
- Signature calculation with encoded vs decoded keys
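For reference, SigV4 computes the signature over a canonical URI in which every path byte outside the unreserved set (A-Z, a-z, 0-9, `-`, `.`, `_`, `~`) is percent-encoded and `/` is kept as the segment separator (S3 does not double-encode the path). The sketch below only illustrates that encoding rule; it is not the signer RustFS or any SDK actually uses:

```rust
// Illustration of the SigV4 canonical URI encoding for an S3 object path.
// Assumption: unreserved bytes stay as-is, everything else becomes %XX,
// and "/" separates segments without being encoded.
fn canonical_uri(key: &str) -> String {
    key.split('/')
        .map(|segment| {
            segment
                .bytes()
                .map(|b| match b {
                    b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                        (b as char).to_string()
                    }
                    _ => format!("%{b:02X}"),
                })
                .collect::<String>()
        })
        .collect::<Vec<_>>()
        .join("/")
}

fn main() {
    // The signature covers the *encoded* path, so signer and verifier must
    // agree on exactly this form for keys with spaces or plus signs.
    assert_eq!(canonical_uri("a f+/file.txt"), "a%20f%2B/file.txt");
}
```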
5. Event Notifications
Events include object keys:
- Ensure event payloads have properly encoded/decoded keys
- Test: Webhook target receives correct key format
6. Versioning
Version IDs with special character keys:
- Test: List object versions with special char keys
Security Considerations
Path Traversal
Ensure URL decoding doesn't enable path traversal:
```rust
// BAD: Don't allow
//   key = "../../../etc/passwd"
// After decoding:
//   key = "..%2F..%2F..%2Fetc%2Fpasswd" → "../../../etc/passwd"

// Solution: Validate decoded keys
fn validate_object_key(key: &str) -> S3Result<()> {
    if key.contains("..") {
        return Err(s3_error!(InvalidArgument, "Invalid object key"));
    }
    if key.starts_with('/') {
        return Err(s3_error!(InvalidArgument, "Object key cannot start with /"));
    }
    Ok(())
}
```
Null Bytes
Ensure no null bytes in decoded keys:
```rust
if key.contains('\0') {
    return Err(s3_error!(InvalidArgument, "Object key contains null byte"));
}
```
Testing with Real Clients
MinIO Client (mc)
```bash
# Test space in path (from issue)
mc cp README.md "local/dummy/a%20f+/b/c/3/README.md"
mc ls "local/dummy/a%20f+/"
mc ls "local/dummy/a%20f+/b/c/3/"

# Test plus in path
mc cp test.txt "local/bucket/ES+net/file+name.txt"
mc ls "local/bucket/ES+net/"

# Test mixed
mc cp data.json "local/bucket/folder%20with%20spaces+plus/file.json"
```
AWS CLI
```bash
# Upload with space
aws --endpoint-url=http://localhost:9000 s3 cp test.txt "s3://bucket/path with spaces/file.txt"

# List
aws --endpoint-url=http://localhost:9000 s3 ls "s3://bucket/path with spaces/"
```
Go Cloud Development Kit
import "gocloud.dev/blob"
// Test the exact scenario from the issue
key := "/test/data/org_main-org/dashboards/ES+net/LHC+Data+Challenge/firefly-details.json"
err := bucket.WriteAll(ctx, key, data, nil)
Success Criteria
The fix is successful when:
- ✅ mc client can upload files with spaces in path
- ✅ UI correctly displays folders with special characters
- ✅ UI can list contents of folders with special characters
- ✅ Files with `+` in path can be uploaded without errors
- ✅ All S3 operations (PUT, GET, LIST, DELETE) work with special chars
- ✅ Go Cloud Development Kit can upload files with `+` in path
- ✅ All existing tests still pass (no regressions)
- ✅ New tests cover various special character scenarios
Documentation Updates
After implementation, update:
- API Documentation: Document how special characters are handled
- Developer Guide: Best practices for object naming
- Migration Guide: If storage format changes
- FAQ: Common issues with special characters
- This Document: Final solution and lessons learned
References
- AWS S3 API Specification: https://docs.aws.amazon.com/AmazonS3/latest/API/
- URL Encoding RFC 3986: https://tools.ietf.org/html/rfc3986
- s3s Library: https://docs.rs/s3s/0.12.0-rc.4/
- urlencoding crate: https://docs.rs/urlencoding/
- Issue #1072 (referenced in comments)
Conclusion
The issue with special characters in object paths is a critical correctness bug that affects S3 API compatibility. The solution requires:
- Understanding how s3s library handles URL encoding
- Implementing consistent key handling across all operations
- Testing thoroughly with real S3 clients
- Validating that all edge cases are covered
The recommended approach is to start with investigation and testing (Phase 1) to understand the current behavior, then implement the appropriate fix with comprehensive test coverage.
Document Version: 1.0
Date: 2025-12-09
Author: RustFS Team
Status: Draft - Awaiting Investigation Results