Commit Graph

457 Commits

Author SHA1 Message Date
loverustfs
af5c0b13ef fix: HeadObject returns 404 for deleted objects with versioning enabled (#1229)
Co-authored-by: houseme <housemecn@gmail.com>
2025-12-22 20:43:00 +08:00
weisd
80cfb4feab Add Disk Timeout and Health Check Functionality (#1196)
Signed-off-by: weisd <im@weisd.in>
Co-authored-by: loverustfs <hello@rustfs.com>
2025-12-22 17:15:19 +08:00
houseme
08f1a31f3f Fix notification event stream cleanup, add bounded send concurrency, and reduce overhead (#1224) 2025-12-22 00:57:05 +08:00
loverustfs
f3a1431fa5 fix: resolve TLS handshake failure in inter-node communication (#1201) (#1222)
Co-authored-by: houseme <housemecn@gmail.com>
2025-12-21 16:11:55 +08:00
yxrxy
3bd96bcf10 fix: resolve event target deletion issue (#1219) 2025-12-21 12:43:48 +08:00
GatewayJ
cc31e88c91 fix: expiration time (#1215) 2025-12-20 20:25:52 +08:00
yxrxy
b5535083de fix(iam): store previous credentials in .rustfs.sys bucket to preserv… (#1213) 2025-12-20 19:15:49 +08:00
Copilot
8dd3e8b534 fix: decode form-urlencoded object names in webhook/mqtt Key field (#1210)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
2025-12-20 01:31:09 +08:00
houseme
4abfc9f554 Fix/fix event 1216 (#1191)
Signed-off-by: loverustfs <hello@rustfs.com>
Co-authored-by: loverustfs <hello@rustfs.com>
2025-12-19 12:07:07 +08:00
Muhammed Hussain Karimi
46557cddd1 🧑‍💻 Improve shebang compatibility (#1180)
Signed-off-by: Muhammed Hussain Karimi <info@karimi.dev>
2025-12-18 20:13:24 +08:00
yxrxy
8821fcc1e7 feat: Replace LRU cache with Moka async cache in policy variables (#1166)
Co-authored-by: houseme <housemecn@gmail.com>
2025-12-17 00:19:31 +08:00
houseme
17828ec2a8 Dependabot/cargo/s3s df2434d 1216 (#1170)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-16 21:21:43 +08:00
mythrnr
94d5b1c1e4 fix: format of bucket event notifications (#1138) 2025-12-16 20:44:57 +08:00
唐小鸭
52c2d15a4b feat: Implement whitelist-based HTTP response compression configuration (#1136)
Signed-off-by: 唐小鸭 <tangtang1251@qq.com>
Co-authored-by: houseme <housemecn@gmail.com>
Co-authored-by: loverustfs <hello@rustfs.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-16 15:05:40 +08:00
yxrxy
352035a06f feat: Implement AWS policy variables support (#1131)
Co-authored-by: houseme <housemecn@gmail.com>
Co-authored-by: loverustfs <hello@rustfs.com>
2025-12-16 13:32:01 +08:00
yihong
fe4fabb195 fix: other two memory leak in the code base (#1160)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Co-authored-by: houseme <housemecn@gmail.com>
2025-12-16 11:45:45 +08:00
sunfkny
e8fe9731fd Fix memory leak in Cache update method (#1143) 2025-12-15 10:04:14 +08:00
yihong
67095c05f9 fix: update tool chain make everything happy (#1134)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-12-13 20:32:42 +08:00
houseme
9e2fa148ee Fix type errors in ecfs.rs and apply clippy fixes for Rust 1.92.0 (#1121) 2025-12-12 00:49:21 +08:00
houseme
1a4e95e940 chore: remove unused dependencies to optimize build (#1117) 2025-12-11 18:13:26 +08:00
dependabot[bot]
0da943a6a4 build(deps): bump s3s from 0.12.0-rc.4 to 0.12.0-rc.5 in the s3s group (#1046)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: loverustfs <hello@rustfs.com>
Co-authored-by: houseme <housemecn@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
2025-12-11 15:20:36 +08:00
guojidan
fba201df3d fix: harden data usage aggregation and cache handling (#1102)
Signed-off-by: junxiang Mu <1948535941@qq.com>
Co-authored-by: loverustfs <hello@rustfs.com>
2025-12-11 09:55:25 +08:00
yxrxy
ccbab3232b fix: ListObjectsV2 correctly handles repeated folder names in prefixes (#1104)
Co-authored-by: loverustfs <hello@rustfs.com>
2025-12-11 09:38:52 +08:00
tennisleng
978845b555 fix(lifecycle): Fix ObjectInfo fields and mod_time error handling (#1088)
Co-authored-by: loverustfs <hello@rustfs.com>
2025-12-11 07:17:35 +08:00
Jacob
53c126d678 fix: decode percent-encoded paths in get_file_path() (#1072)
Co-authored-by: houseme <housemecn@gmail.com>
Co-authored-by: loverustfs <hello@rustfs.com>
2025-12-10 22:30:02 +08:00
Jörg Thalheim
2c86fe30ec Content encoding (#1089)
Signed-off-by: Jörg Thalheim <joerg@thalheim.io>
Co-authored-by: loverustfs <hello@rustfs.com>
2025-12-10 15:21:51 +08:00
Copilot
20961d7c91 Add comprehensive special character handling with validation refactoring and extensive test coverage (#1078)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
2025-12-09 13:40:29 +08:00
Jitter
76d25d9a20 Fix/issue #1001 dead node detection (#1054)
Co-authored-by: weisd <im@weisd.in>
Co-authored-by: Jitterx69 <mohit@example.com>
2025-12-08 12:29:46 +08:00
yihong
834025d9e3 docs: fix some dead link (#1053)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-12-08 11:23:24 +08:00
Jitter
cd6a26bc3a fix(net): resolve 1GB upload hang and macos build (Issue #1001 regression) (#1035) 2025-12-07 18:05:51 +08:00
Jitter
b10d80cbb6 fix: detect dead nodes via HTTP/2 keepalives (Issue #1001) (#1025)
Co-authored-by: weisd <im@weisd.in>
2025-12-06 21:45:42 +08:00
0xdx2
7c6cbaf837 feat: enhance error handling and add precondition checks for object o… (#1008) 2025-12-06 20:39:03 +08:00
weisd
030d3c9426 fix filemeta nil versionid (#1002) 2025-12-05 20:30:08 +08:00
唐小鸭
3a79242133 feat: The observability module can be set separately. (#993) 2025-12-05 13:46:06 +08:00
weisd
b3c80ae362 fix: listdir rpc (#979)
Co-authored-by: houseme <housemecn@gmail.com>
Co-authored-by: loverustfs <hello@rustfs.com>
2025-12-04 16:12:10 +08:00
houseme
5b0a3a0764 upgrade crate version and improve heal config (#963) 2025-12-03 18:49:11 +08:00
weisd
a8b7b28fd0 Fix Admin Heal API and Add Pagination Support for Large Buckets (#933)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: loverustfs <hello@rustfs.com>
Co-authored-by: houseme <housemecn@gmail.com>
2025-12-03 18:10:46 +08:00
weisd
4d7bf98c82 add logs (#962) 2025-12-03 13:17:47 +08:00
Copilot
fdcdb30d28 Optimize concurrent GetObject performance with Moka cache, comprehensive metrics, complete test suite, cache writeback, and comprehensive documentation (#916)
* Initial plan

* feat: add concurrency-aware buffer sizing and hot object caching for GetObject

- Implement adaptive buffer sizing based on concurrent request load
- Add per-request tracking with automatic cleanup using RAII guards
- Implement hot object cache (LRU) for frequently accessed small files (<= 10MB)
- Add disk I/O semaphore to prevent saturation under extreme load
- Integrate concurrency module into GetObject implementation
- Buffer sizes now adapt: low concurrency uses large buffers for throughput,
  high concurrency uses smaller buffers for fairness and memory efficiency
- Add comprehensive metrics collection for monitoring performance

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* docs: add comprehensive documentation and tests for concurrent GetObject optimization

- Add detailed technical documentation explaining the solution
- Document root cause analysis and solution architecture
- Include performance expectations and testing recommendations
- Add integration tests for concurrency tracking and buffer sizing
- Add cache behavior tests
- Include benchmark tests for concurrent request handling

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix: address code review issues in concurrency module

- Fix race condition in cache size tracking by using consistent atomic operations within lock
- Correct buffer sizing logic: 1-2 requests use 100%, 3-4 use 75%, 5-8 use 50%, >8 use 40%
- Improve error message for semaphore acquire failure
- Document limitation of streaming cache implementation (not yet implemented)
- Add TODO for proper streaming cache with suggested approaches
- Update tests to match corrected buffer sizing thresholds
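
The corrected thresholds above can be sketched as a small pure function. This is only an illustration: the name `scaled_buffer_size` and the integer-percent arithmetic are assumptions, not the code in the concurrency module, and treating zero in-flight requests like the 1-2 tier is also an assumption.

```rust
/// Hypothetical helper (not the project's API): scale a base buffer size
/// by the number of concurrent GetObject requests, using the tiers quoted
/// in the commit message (1-2: 100%, 3-4: 75%, 5-8: 50%, >8: 40%).
fn scaled_buffer_size(base: usize, concurrent_requests: usize) -> usize {
    let percent = match concurrent_requests {
        0..=2 => 100, // assumption: zero requests is treated like the 1-2 tier
        3..=4 => 75,
        5..=8 => 50,
        _ => 40,
    };
    // Multiply before dividing so small bases do not lose precision.
    base * percent / 100
}
```

Called with a 1 MiB base, six concurrent requests would get a 512 KiB buffer under these tiers.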

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* docs: add comprehensive implementation summary for concurrent GetObject optimization

- Executive summary of problem and solution
- Detailed architecture documentation for each optimization
- Integration points and code examples
- Complete testing strategy and performance validation steps
- Monitoring and observability guidelines with Prometheus queries
- Deployment considerations and rollout strategy
- Future enhancement roadmap
- Success criteria and quantitative metrics

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* fix

* refactor: optimize cache with lru 0.16.2 read-first pattern and add advanced features

- Implement optimized read-first cache access using peek() to reduce write lock contention
- Add batch cache operations: get_cached_batch() for efficient multi-object retrieval
- Add cache utility methods: is_cached(), remove_cached(), get_hot_keys()
- Implement warm_cache() for pre-populating cache on startup
- Add get_advanced_buffer_size() with file size and sequential read optimization
- Enhance test suite with 8 new comprehensive tests covering:
  - Batch operations and cache warming
  - Hot keys tracking and analysis
  - Cache removal and LRU behavior verification
  - Concurrent cache access performance
  - Advanced buffer sizing strategies
- Improve documentation and code comments in English throughout

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* docs: add final comprehensive optimization summary

- Complete technical documentation of all optimizations
- Detailed performance analysis and metrics
- Production deployment guide with examples
- Comprehensive API reference and usage patterns
- Migration guide and future enhancement roadmap
- All documentation in professional English

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* fix

* add moka crate for cache

* feat: replace LRU with Moka cache and add comprehensive metrics

- Replace lru crate with moka 0.12.11 for superior concurrent performance
- Implement lock-free cache with automatic TTL/TTI expiration
- Add size-based eviction using Moka's weigher function
- Integrate comprehensive metrics collection throughout GetObject flow:
  * Cache hit/miss tracking with per-key access counts
  * Request concurrency gauges
  * Disk permit wait time histograms
  * Total request duration tracking
  * Response size and buffer size histograms
- Deep integration with ecfs.rs GetObject operation
- Add hit rate calculation method
- Enhanced CacheStats with hit/miss counters
- Lock-free concurrent reads for better scalability

Moka advantages over LRU:
- True lock-free concurrent access
- Built-in TTL and TTI support
- Automatic size-based eviction
- Better performance under high concurrency
- Native async support

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* docs: add comprehensive Moka cache migration and metrics documentation

- Complete technical documentation of LRU to Moka migration
- Detailed performance comparison and benchmarks
- Comprehensive metrics catalog with 15+ Prometheus metrics
- Prometheus query examples for monitoring
- Dashboard and alerting recommendations
- Migration guide with code examples
- Troubleshooting guide for common issues
- Future enhancement roadmap

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* fix

* refactor: update tests for Moka cache implementation

- Completely refactor test suite to align with Moka-based concurrency.rs
- Add Clone derive to ConcurrencyManager for test convenience
- Update all tests to handle Moka's async behavior with proper delays
- Add new tests:
  * test_cache_hit_rate - validate hit rate calculation
  * test_ttl_expiration - verify TTL configuration
  * test_is_cached_no_side_effects - ensure contains doesn't affect LRU
  * bench_concurrent_cache_performance - benchmark concurrent access
- Updated existing tests:
  * test_moka_cache_operations - renamed and updated for Moka API
  * test_moka_cache_eviction - validate automatic eviction
  * test_hot_keys_tracking - improved assertions for sorted results
  * test_concurrent_cache_access - validate lock-free performance
- All tests now include appropriate sleep delays for Moka's async processing
- Enhanced documentation and assertions for better test clarity
- Total: 18 comprehensive integration tests

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* docs: add comprehensive Moka test suite documentation

- Complete test suite documentation for all 18 tests
- Detailed test patterns and best practices for Moka
- Debugging guide for common test failures
- Performance expectations and CI integration
- 100% API coverage validation
- Future enhancement roadmap

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* fmt

* docs: enhance code comments and test documentation throughout

- Significantly improve module-level documentation in concurrency.rs
- Add detailed explanations of performance characteristics and expected improvements
- Document the solution to issue #911 with before/after metrics
- Enhance constant documentation with purpose and impact explanations
- Add comprehensive RAII guard documentation with thread safety notes
- Improve test documentation with detailed test strategy and rationale
- Add Moka-specific test patterns and timing guidance
- Include performance expectations for each test category
- Document why each test matters for solving the original issue
- All documentation written in professional English
- Follow Rust documentation best practices with examples

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* remove lru crate

* upgrade version

* fix: resolve test errors by correcting module structure and test assertions

- Fix test import paths to use crate:: instead of rustfs:: (binary-only crate)
- Keep test file in src/storage/ instead of tests/ (no lib.rs exists)
- Add #[cfg(test)] guard to mod declaration in storage/mod.rs
- Fix Arc type annotations for Moka's ConcurrencyManager in concurrent tests
- Correct test_buffer_size_bounds assertions to match actual implementation:
  * Minimum buffer is 32KB for files <100KB, 64KB otherwise
  * Maximum buffer respects base_buffer_size when concurrency is low
  * Buffer sizing doesn't cap at file size, only at min/max constraints
- All 17 integration tests now pass successfully

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix: modify `TimeoutLayer::new` to `TimeoutLayer::with_status_code` and improve docker health check

* fix

* feat: implement cache writeback for small objects in GetObject

- Add cache writeback logic for objects meeting caching criteria:
  * No range/part request (full object retrieval)
  * Object size known and <= 10MB (max_object_size threshold)
  * Not encrypted (SSE-C or managed encryption)
- Read eligible objects into memory and cache via background task
- Serve response from in-memory data for immediate client response
- Add metrics counter for cache writeback operations
- Add 3 new tests for cache writeback functionality:
  * test_cache_writeback_flow - validates round-trip caching
  * test_cache_writeback_size_limit - ensures large objects aren't cached
  * test_cache_writeback_concurrent - validates thread-safe concurrent writes
- Update test suite documentation (now 20 comprehensive tests)
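
The three writeback criteria listed above amount to a single predicate. A minimal sketch, assuming a hypothetical `GetRequestFacts` struct that summarizes the request; the real check in ecfs.rs works on the actual request and object types and may differ.

```rust
/// Threshold quoted in the commit message: objects up to 10MB are cacheable.
const MAX_CACHEABLE_SIZE: u64 = 10 * 1024 * 1024;

/// Hypothetical summary of the facts the criteria refer to.
struct GetRequestFacts {
    /// True when the request carries a range or part number.
    has_range_or_part: bool,
    /// Object size in bytes, if known.
    object_size: Option<u64>,
    /// True for SSE-C or managed encryption.
    encrypted: bool,
}

/// Returns true only when all three criteria hold: full-object read,
/// known size within the threshold, and no encryption.
fn is_writeback_eligible(req: &GetRequestFacts) -> bool {
    !req.has_range_or_part
        && !req.encrypted
        && matches!(req.object_size, Some(size) if size <= MAX_CACHEABLE_SIZE)
}
```

An object whose size is unknown is never eligible in this sketch, since the size check cannot be made.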

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* improve code for const

* cargo clippy

* feat: add cache enable/disable configuration via environment variable

- Add is_cache_enabled() method to ConcurrencyManager
- Read RUSTFS_OBJECT_CACHE_ENABLE env var (default: false) at startup
- Update ecfs.rs to check is_cache_enabled() before cache lookup and writeback
- Cache lookup and writeback now respect the enable flag
- Add test_cache_enable_configuration test
- Constants already exist in rustfs_config:
  * ENV_OBJECT_CACHE_ENABLE = "RUSTFS_OBJECT_CACHE_ENABLE"
  * DEFAULT_OBJECT_CACHE_ENABLE = false
- Total: 21 comprehensive tests passing

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* fmt

* fix

* fix

* feat: implement comprehensive CachedGetObject response cache with metadata

- Add CachedGetObject struct with full response metadata fields:
  * body, content_length, content_type, e_tag, last_modified
  * expires, cache_control, content_disposition, content_encoding
  * storage_class, version_id, delete_marker, tag_count, etc.
- Add dual cache architecture in HotObjectCache:
  * Legacy simple byte cache for backward compatibility
  * New response cache for complete GetObject responses
- Add ConcurrencyManager methods for response caching:
  * get_cached_object() - retrieve cached response with metadata
  * put_cached_object() - store complete response
  * invalidate_cache() - invalidate on write operations
  * invalidate_cache_versioned() - invalidate both version and latest
  * make_cache_key() - generate cache keys with version support
  * max_object_size() - get cache threshold
- Add builder pattern for CachedGetObject construction
- Add 6 new tests for response cache functionality (27 total):
  * test_cached_get_object_basic - basic operations
  * test_cached_get_object_versioned - version key handling
  * test_cache_invalidation - write operation invalidation
  * test_cache_invalidation_versioned - versioned invalidation
  * test_cached_get_object_size_limit - size enforcement
  * test_max_object_size - threshold accessor
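
To illustrate why versioned invalidation touches two entries, here is a rough sketch using a plain `HashMap`. The key format (`bucket/key` with an optional `?versionId=` suffix) and the function names `sketch_cache_key` and `sketch_invalidate_versioned` are assumptions made for this example; the commit does not state the format that `make_cache_key()` produces.

```rust
use std::collections::HashMap;

/// Hypothetical key format: "bucket/key" for the latest version,
/// "bucket/key?versionId=v" for a specific version.
fn sketch_cache_key(bucket: &str, key: &str, version_id: Option<&str>) -> String {
    match version_id {
        Some(v) => format!("{bucket}/{key}?versionId={v}"),
        None => format!("{bucket}/{key}"),
    }
}

/// Remove the version-specific entry (if a version was given) and the
/// "latest" entry, since a write can change what "latest" resolves to.
fn sketch_invalidate_versioned(
    cache: &mut HashMap<String, Vec<u8>>,
    bucket: &str,
    key: &str,
    version_id: Option<&str>,
) {
    if version_id.is_some() {
        cache.remove(&sketch_cache_key(bucket, key, version_id));
    }
    cache.remove(&sketch_cache_key(bucket, key, None));
}
```

Entries for other versions of the same object are left in place, which matches the "both version and latest" wording above.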

All 27 tests pass successfully.

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* feat: integrate CachedGetObject cache in ecfs.rs with full metadata and cache invalidation

Integration of CachedGetObject response cache in ecfs.rs:
1. get_object: Cache lookup uses get_cached_object() with full metadata
   - Returns complete response with e_tag, last_modified, content_type, etc.
   - Parses last_modified from RFC3339 string
   - Supports versioned cache keys via make_cache_key()

2. get_object: Cache writeback uses put_cached_object() with metadata
   - Stores content_type, e_tag, last_modified in CachedGetObject
   - Background writeback via tokio::spawn()

3. Cache invalidation added to write operations:
   - put_object: invalidate_cache_versioned() after store.put_object()
   - put_object_extract: invalidate_cache_versioned() after each file extraction
   - copy_object: invalidate_cache_versioned() after store.copy_object()
   - delete_object: invalidate_cache_versioned() after store.delete_object()
   - delete_objects: invalidate_cache_versioned() for each deleted object
   - complete_multipart_upload: invalidate_cache_versioned() after completion

4. Fixed test_adaptive_buffer_sizing to be more robust with parallel tests

All 27 tests pass.

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix: add error logging for time parsing in cache lookup and writeback

- Add warning log when RFC3339 parsing fails in cache lookup
- Add warning log when time formatting fails in cache writeback
- Improves debugging for cache-related issues

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* fix

* upgrade version

* fmt

* add http: 4138 and add max_object_size key

* fix

* fix

* fix

* modify metrics key

* add

* upgrade crates version and improve docker observability

* feat: implement adaptive I/O strategy based on disk permit wait time

Implements adaptive I/O strategy for GetObject operations based on disk permit
wait times, as requested in issue #911.

Key changes:

1. New types in concurrency.rs:
   - IoLoadLevel enum: Low (<10ms), Medium (10-50ms), High (50-200ms), Critical (>200ms)
   - IoStrategy struct: contains optimized I/O parameters based on load level
   - IoLoadMetrics: rolling window metrics for load tracking

2. New ConcurrencyManager methods:
   - calculate_io_strategy(): calculates adaptive strategy from permit wait duration
   - record_permit_wait(): records observation for load tracking
   - smoothed_load_level(): returns averaged load level for stability
   - io_load_stats(): returns (avg_wait, p95_wait, max_wait, count) for monitoring
   - adaptive_buffer_size(): convenience method for buffer sizing

3. Integration in ecfs.rs get_object:
   - Calculate IoStrategy after acquiring disk permit
   - Use strategy buffer_size instead of static sizing
   - Consider strategy.cache_writeback_enabled in cache decision
   - Record new metrics: io.load.level gauge, io.buffer.multiplier gauge,
     io.strategy.selected counter by load level

4. New tests (4 added, 31 total):
   - test_io_load_level_classification: validates load level thresholds
   - test_io_strategy_buffer_sizing: validates buffer multipliers
   - test_calculate_io_strategy: validates manager strategy calculation
   - test_io_load_stats: validates stats tracking
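
The load-level thresholds in item 1 can be sketched as a classification over the permit wait time. The type and function names here (`LoadLevel`, `classify_wait`) are illustrative, and the handling of the exact boundaries (10ms, 50ms, 200ms) is an assumption, because the commit message only gives the ranges.

```rust
use std::time::Duration;

/// Illustrative stand-in for the load levels named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LoadLevel {
    Low,      // < 10ms
    Medium,   // 10-50ms
    High,     // 50-200ms
    Critical, // > 200ms
}

/// Classify a disk permit wait time into a load level.
/// Assumption: 10ms counts as Medium, 50ms as High, and exactly 200ms as High.
fn classify_wait(wait: Duration) -> LoadLevel {
    let ms = wait.as_millis();
    if ms < 10 {
        LoadLevel::Low
    } else if ms < 50 {
        LoadLevel::Medium
    } else if ms <= 200 {
        LoadLevel::High
    } else {
        LoadLevel::Critical
    }
}
```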

All 31 tests pass. Clippy clean. Formatted.

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* fix

* docs: add comprehensive architecture design and fix cache hit event notification

1. Added CONCURRENCY_ARCHITECTURE.md with complete design documentation:
   - Executive summary and problem statement
   - Architecture overview with request flow diagram
   - Detailed module analysis for concurrency.rs and ecfs.rs
   - Critical analysis of helper.complete() for cache hits
   - Adaptive I/O strategy design with algorithm
   - Cache architecture with CachedGetObject structure
   - Metrics and monitoring with Prometheus queries
   - Performance characteristics and future enhancements

2. Fixed critical issue: Cache hit path now calls helper.complete()
   - S3 bucket notifications (s3:GetObject events) now trigger for cache hits
   - Event-driven workflows (Lambda, SNS) work correctly for all object access
   - Maintains audit trail for both cache hits and misses

All 31 tests pass.

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix: set object info and version_id on helper before complete() for cache hits

When serving from cache, properly configure the OperationHelper before
calling complete() to ensure S3 bucket notifications include complete
object metadata:

1. Build ObjectInfo from cached metadata:
   - bucket, name, size, actual_size
   - etag, mod_time, version_id, delete_marker
   - storage_class, content_type, content_encoding
   - user_metadata (user_defined)

2. Set helper.object(event_info).version_id(version_id_str) before complete()

3. Updated CONCURRENCY_ARCHITECTURE.md with:
   - Complete code example for cache hit event notification
   - Explanation of why ObjectInfo is required
   - Documentation of version_id handling

This ensures:
- Lambda triggers receive proper object metadata for cache hits
- SNS/SQS notifications include complete information
- Audit logs contain accurate object details
- Version-specific event routing works correctly

All 31 tests pass.

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* improve code

* fmt

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
2025-11-30 01:16:55 +08:00
唐小鸭
701960dd81 fix out of range for slice (#931) 2025-11-27 15:57:38 +08:00
Shyim
ee04cc77a0 remove debug (#912)
* remove debug

* Refactor get_global_encryption_service function

* Refactor get_global_encryption_service function

---------

Co-authored-by: loverustfs <hello@rustfs.com>
Co-authored-by: houseme <housemecn@gmail.com>
2025-11-26 11:56:01 +08:00
houseme
069194f553 Fix/getobjectlength (#920)
* fix getobject content length resp

* Fix regression in exception handling for non-existent key with enhanced compression predicate and metadata improvements (#915)

* Initial plan

* Fix GetObject regression by excluding error responses from compression

The issue was that CompressionLayer was attempting to compress error responses,
which could cause Content-Length header mismatches. By excluding 4xx and 5xx
responses from compression, we ensure error responses (like NoSuchKey) are sent
correctly without body truncation.

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Enhance NoSuchKey fix with improved compression predicate and comprehensive tests

- Enhanced ShouldCompress predicate with size-based exclusion (< 256 bytes)
- Added detailed documentation explaining the compression logic
- Added debug logging for better observability
- Created comprehensive test suite with 4 test cases:
  - test_get_deleted_object_returns_nosuchkey
  - test_head_deleted_object_returns_nosuchkey
  - test_get_nonexistent_object_returns_nosuchkey
  - test_multiple_gets_deleted_object
- Added extensive inline documentation and comments
- Created docs/fix-nosuchkey-regression.md with full analysis
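
The compression rule described above (skip 4xx/5xx responses and bodies under 256 bytes) can be expressed as a plain function. This sketch does not use the actual `ShouldCompress` predicate or any HTTP library types; the function name, the `u16` status, and the choice to compress when the length is unknown are all assumptions.

```rust
/// Size threshold quoted in the commit message.
const MIN_COMPRESS_SIZE: u64 = 256;

/// Decide whether a response body should be compressed.
/// `content_length` is None when the size is not known up front.
fn should_compress(status: u16, content_length: Option<u64>) -> bool {
    // Error responses (4xx and 5xx) are sent as-is so that Content-Length
    // always matches the bytes on the wire.
    if (400..600).contains(&status) {
        return false;
    }
    match content_length {
        // Tiny bodies gain little from compression.
        Some(len) => len >= MIN_COMPRESS_SIZE,
        // Assumption: streamed bodies of unknown size are still compressed.
        None => true,
    }
}
```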

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Add compression best practices documentation

Added comprehensive guide covering:
- Best practices for HTTP response compression
- Common pitfalls and solutions
- Performance considerations and trade-offs
- Testing guidelines and examples
- Monitoring and alerting recommendations
- Migration guide for existing services

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* fmt

* fmt

* Fix/objectdelete (#917)

* fix getobject content length resp

* fix delete object

---------

Co-authored-by: houseme <housemecn@gmail.com>

* Add comprehensive analysis of NoSuchKey fix and related improvements

Created detailed documentation analyzing:
- HTTP compression layer fix (primary issue)
- Content-length calculation fix from PR #917
- Delete object metadata fixes from PR #917
- How all components work together
- Complete scenario walkthrough
- Performance impact analysis
- Testing strategy and deployment checklist

This ties together all the changes in the PR branch including the merged
improvements from PR #917.

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* replace `once_cell` to `std`

* fmt

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
Co-authored-by: weisd <im@weisd.in>

* fmt

---------

Co-authored-by: weisd <weishidavip@163.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: weisd <im@weisd.in>
2025-11-24 18:56:34 +08:00
weisd
fce4e64da4 Fix/objectdelete (#917)
* fix getobject content length resp

* fix delete object

---------

Co-authored-by: houseme <housemecn@gmail.com>
2025-11-24 16:35:51 +08:00
houseme
18cd9a8b46 build(deps): bump the dependencies group with 5 updates (#896) 2025-11-20 13:04:24 +08:00
Copilot
277d80de13 Fix: Implement priority-based heal queue with comprehensive diagnostic logging (#884)
* Initial plan

* Implement priority-based heal queue with deduplication

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Apply cargo fmt formatting fixes

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Add queue monitoring, better error handling, and adaptive processing

- Add priority-based queue statistics tracking
- Implement queue capacity warnings (>80% full)
- Process multiple tasks per cycle when capacity allows
- Add proper error logging for failed heal request submissions
- Add Hash trait to HealPriority for HashMap support
- Improve observability with detailed queue status logs
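
A priority queue with deduplication and a capacity warning, as described in this commit, could look roughly like the following. Everything here is a standard-library sketch under assumptions: the names `Priority`, `DedupQueue`, and `is_nearly_full` are invented for the example, and the real heal queue may be structured differently.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashSet};

/// Illustrative priority levels; derived ordering makes High the greatest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
enum Priority {
    Low,
    Normal,
    High,
}

/// Max-heap keyed by (priority, insertion order) plus a set of queued keys.
struct DedupQueue {
    heap: BinaryHeap<(Priority, Reverse<u64>, String)>,
    queued: HashSet<String>,
    next_seq: u64,
    capacity: usize,
}

impl DedupQueue {
    fn new(capacity: usize) -> Self {
        Self { heap: BinaryHeap::new(), queued: HashSet::new(), next_seq: 0, capacity }
    }

    /// Returns false when the queue is full or the key is already queued.
    fn push(&mut self, key: &str, priority: Priority) -> bool {
        if self.heap.len() >= self.capacity || !self.queued.insert(key.to_string()) {
            return false;
        }
        // Reverse(seq) keeps FIFO order among entries of equal priority.
        self.heap.push((priority, Reverse(self.next_seq), key.to_string()));
        self.next_seq += 1;
        true
    }

    /// Pops the highest-priority entry and allows its key to be queued again.
    fn pop(&mut self) -> Option<(Priority, String)> {
        let (priority, _, key) = self.heap.pop()?;
        self.queued.remove(&key);
        Some((priority, key))
    }

    /// True when more than 80% of the capacity is in use.
    fn is_nearly_full(&self) -> bool {
        self.heap.len() * 10 > self.capacity * 8
    }
}
```

Deriving `Hash` on the priority type mirrors the "Add Hash trait to HealPriority" bullet, which is what allows per-priority statistics to be kept in a `HashMap`.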

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Add comprehensive diagnostic logging to heal flow

- Add detailed logging for shard availability checks during reads
- Log successful heal request submissions
- Add detailed logging in heal_object for disk checking
- Log each disk's heal decision with reason and endpoint
- Add logging for heal writing and renaming phases
- Improve variable naming (nil_count -> available_shards)
- Add success confirmation messages for renamed healed data

These logs will help diagnose where the heal flow is failing.

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* feat(heal): enhance robustness, safety, and observability of heal process

- **Logging & Observability**:
  - Upgrade critical heal logs from `info` to `warn` for better visibility.
  - Implement structured logging with `tracing` fields for machine readability.
  - Add `#[tracing::instrument]` to `HealTask` and `SetDisks` methods for automatic context propagation.

- **Robustness**:
  - Add exponential backoff retry (3 attempts) for acquiring write locks in `heal_object` to handle contention.
  - Handle `rename_data` failures gracefully by preserving temporary files instead of forcing deletion, preventing potential data loss.

- **Data Safety**:
  - Fix `object_exists` to propagate IO errors instead of treating them as "object not found".
  - Update `ErasureSetHealer` to mark objects as failed rather than skipped when existence checks error, ensuring they are tracked for retry.
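
The retry behavior mentioned under Robustness (exponential backoff, 3 attempts) follows a common pattern that can be sketched with the standard library alone. The helper name `retry_with_backoff`, the blocking `sleep`, and the doubling factor are assumptions for illustration; the heal code is asynchronous and its actual delays are not given in the commit message.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op` up to `max_attempts` times, doubling the delay after each failure.
/// Returns the first success, or the error from the final attempt.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay *= 2; // exponential growth: base, 2x base, 4x base, ...
                attempt += 1;
            }
        }
    }
}
```

With `max_attempts` set to 3, the operation is tried at most three times with two waits in between, which corresponds to the "3 attempts" wording above.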

* fix

* fmt

* improve code for heal_object

* fix

* fix

* fix

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
2025-11-20 00:36:25 +08:00
shiro.lee
9b9bbb662b fix: removing the Limit on the Number of Object Versions (#819) (#892)
2025-11-19 22:34:26 +08:00
weisd
55d44622ed list object include deleted support (#882)
Co-authored-by: houseme <housemecn@gmail.com>
2025-11-18 21:51:10 +08:00
weisd
85bc0ce2d5 fix: filemeta version handling and delete operations (#879)
* fix filemeta version

* fix clippy

* fix delete version

* fix clippy/test
2025-11-18 09:24:22 +08:00
Copilot
601f3456bc Fix large file upload freeze with adaptive buffer sizing (#869)
* Initial plan

* Fix large file upload freeze by increasing StreamReader buffer size

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Add comprehensive documentation for large file upload freeze fix

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* upgrade s3s version

* Fix compilation error: use BufReader instead of non-existent StreamReader::with_capacity

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Update documentation with correct BufReader implementation

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* add tokio feature `io-util`

* Implement adaptive buffer sizing based on file size

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Constants are managed uniformly and fmt code

* fix

* Fix: Trigger self-heal on read when shards missing from rejoined nodes (#871)

* Initial plan

* Fix: Trigger self-heal when missing shards detected during read

- Added proactive heal detection in get_object_with_fileinfo
- When reading an object, now checks if any shards are missing even if read succeeds
- Sends low-priority heal request to reconstruct missing shards on rejoined nodes
- This fixes the issue where data written during node outage is not healed when node rejoins

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* Unify CRC implementations to crc-fast (#873)

* Initial plan

* Replace CRC libraries with unified crc-fast implementation

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* fix: replace low to Normal

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
2025-11-17 23:15:20 +08:00
weisd
1279baa72b fix replication (#875) 2025-11-17 17:37:41 +08:00