Compare commits

..

38 Commits

Author SHA1 Message Date
weisd
b73ca0220e add ifmatch support (#885)
Co-authored-by: loverustfs <hello@rustfs.com>
2025-11-20 08:40:47 +08:00
loverustfs
b4071d493c Modify README 2025-11-20 08:24:07 +08:00
Copilot
277d80de13 Fix: Implement priority-based heal queue with comprehensive diagnostic logging (#884)
* Initial plan

* Implement priority-based heal queue with deduplication

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Apply cargo fmt formatting fixes

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Add queue monitoring, better error handling, and adaptive processing

- Add priority-based queue statistics tracking
- Implement queue capacity warnings (>80% full)
- Process multiple tasks per cycle when capacity allows
- Add proper error logging for failed heal request submissions
- Add Hash trait to HealPriority for HashMap support
- Improve observability with detailed queue status logs

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Add comprehensive diagnostic logging to heal flow

- Add detailed logging for shard availability checks during reads
- Log successful heal request submissions
- Add detailed logging in heal_object for disk checking
- Log each disk's heal decision with reason and endpoint
- Add logging for heal writing and renaming phases
- Improve variable naming (nil_count -> available_shards)
- Add success confirmation messages for renamed healed data

These logs will help diagnose where the heal flow is failing.

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* feat(heal): enhance robustness, safety, and observability of heal process

- **Logging & Observability**:
  - Upgrade critical heal logs from [info](/crates/e2e_test/src/reliant/node_interact_test.rs:196:0-213:1) to `warn` for better visibility.
  - Implement structured logging with `tracing` fields for machine readability.
  - Add `#[tracing::instrument]` to [HealTask](c/crates/ahm/src/heal/task.rs:182:0-205:1) and [SetDisks](/crates/ecstore/src/set_disk.rs:120:0-131:1) methods for automatic context propagation.

- **Robustness**:
  - Add exponential backoff retry (3 attempts) for acquiring write locks in [heal_object](/crates/ahm/src/heal/storage.rs:438:4-460:5) to handle contention.
  - Handle [rename_data](/crates/ecstore/src/set_disk.rs:392:4-516:5) failures gracefully by preserving temporary files instead of forcing deletion, preventing potential data loss.

- **Data Safety**:
  - Fix [object_exists](/crates/ahm/src/heal/storage.rs:395:4-412:5) to propagate IO errors instead of treating them as "object not found".
  - Update [ErasureSetHealer](/crates/ahm/src/heal/erasure_healer.rs:28:0-33:1) to mark objects as failed rather than skipped when existence checks error, ensuring they are tracked for retry.

* fix

* fmt

* improve code for heal_object

* fix

* fix

* fix

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
2025-11-20 00:36:25 +08:00
shiro.lee
9b9bbb662b fix: removing the Limit on the Number of Object Versions (#819) (#892)
removing the Limit on the Number of Object Versions (#819)
2025-11-19 22:34:26 +08:00
majinghe
44f3f3d070 add standalone mode support (#881)
* add standalone mode support

* update readme file

* change non-root from 1000 to 10001

* delete self sign crt content

* modify security content

* fix synatx error for readme file.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update image repository and tag info.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix helm chart syntax issue.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix helm chart syntax issue.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: houseme <housemecn@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: loverustfs <hello@rustfs.com>
2025-11-19 13:56:21 +08:00
weisd
a13ce08590 fix listbucket sign (#890) 2025-11-19 11:08:16 +08:00
weisd
55d44622ed list object include deleted support (#882)
Co-authored-by: houseme <housemecn@gmail.com>
2025-11-18 21:51:10 +08:00
Copilot
6da5766ea2 Implement adaptive buffer sizing with workload profiles - Production-Ready Implementation (#878)
* Initial plan

* Add workload profiles module with adaptive buffer sizing

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Add comprehensive adaptive buffer sizing documentation

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Add implementation summary documentation

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Implement Phase 2: Opt-In Usage for buffer profiles

- Add RUSTFS_BUFFER_PROFILE_ENABLE and RUSTFS_BUFFER_PROFILE config options
- Implement global buffer configuration with enable/disable flag
- Add WorkloadProfile::from_name() for parsing profile names
- Update put_object, put_object_extract, and upload_part to use opt-in function
- Add init_buffer_profile_system() in main.rs for initialization
- Add comprehensive tests for opt-in functionality
- Update documentation with Phase 2 usage examples

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Implement Phase 3: Default Enablement for buffer profiles

- Change default to enabled with GeneralPurpose profile
- Replace --buffer-profile-enable with --buffer-profile-disable for opt-out
- Update init_buffer_profile_system() to reflect Phase 3 logic
- Add comprehensive Phase 3 documentation
- Create MIGRATION_PHASE3.md guide for smooth transition
- Add test for Phase 3 default behavior
- Update IMPLEMENTATION_SUMMARY.md to mark Phase 3 as complete

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Implement Phase 4: Full Integration with deprecated legacy and metrics

- Deprecate get_adaptive_buffer_size() function (backward compatible)
- Implement profile-only buffer sizing in get_buffer_size_opt_in()
- Add performance metrics collection (optional, via metrics feature)
- Even disabled mode now uses GeneralPurpose profile (no hardcoded values)
- Add comprehensive Phase 4 documentation and guide
- Add test_phase4_full_integration() test
- Update IMPLEMENTATION_SUMMARY.md to mark Phase 4 as complete

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Finalize adaptive buffer sizing: remove deprecated code and improve documentation

English:
- Remove deprecated get_adaptive_buffer_size() function
- Remove deprecated test cases (test_adaptive_buffer_size, Phase 3 legacy comparisons)
- Improve all code comments to be clear and professional English
- Add comprehensive bilingual implementation summary (COMPLETE_SUMMARY.md)
- Update buffer sizing comments in put_object, put_object_extract, upload_part
- Enhance init_buffer_profile_system documentation
- Clean up phase-specific references in comments

Chinese (中文):
- 移除已弃用的 get_adaptive_buffer_size() 函数
- 移除已弃用的测试用例(test_adaptive_buffer_size、Phase 3 旧版比较)
- 改进所有代码注释,使用清晰专业的英文
- 添加全面的双语实现摘要(COMPLETE_SUMMARY.md)
- 更新 put_object、put_object_extract、upload_part 中的缓冲区调整注释
- 增强 init_buffer_profile_system 文档
- 清理注释中的特定阶段引用

This commit completes the adaptive buffer sizing implementation by:
1. Removing all deprecated legacy code and tests
2. Improving code documentation quality
3. Providing comprehensive bilingual summary

本提交完成自适应缓冲区大小实现:
1. 移除所有已弃用的旧代码和测试
2. 提高代码文档质量
3. 提供全面的双语摘要

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fmt

* fix

* fix

* fix

* fix

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
2025-11-18 13:32:02 +08:00
weisd
85bc0ce2d5 fix: filemeta version handling and delete operations (#879)
* fix filemeta version

* fix clippy

* fix delete version

* fix clippy/test
2025-11-18 09:24:22 +08:00
Copilot
601f3456bc Fix large file upload freeze with adaptive buffer sizing (#869)
* Initial plan

* Fix large file upload freeze by increasing StreamReader buffer size

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Add comprehensive documentation for large file upload freeze fix

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* upgrade s3s version

* Fix compilation error: use BufReader instead of non-existent StreamReader::with_capacity

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Update documentation with correct BufReader implementation

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* add tokio feature `io-util`

* Implement adaptive buffer sizing based on file size

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Constants are managed uniformly and fmt code

* fix

* Fix: Trigger self-heal on read when shards missing from rejoined nodes (#871)

* Initial plan

* Fix: Trigger self-heal when missing shards detected during read

- Added proactive heal detection in get_object_with_fileinfo
- When reading an object, now checks if any shards are missing even if read succeeds
- Sends low-priority heal request to reconstruct missing shards on rejoined nodes
- This fixes the issue where data written during node outage is not healed when node rejoins

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* Unify CRC implementations to crc-fast (#873)

* Initial plan

* Replace CRC libraries with unified crc-fast implementation

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* fix

* fix: replace low to Normal

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
2025-11-17 23:15:20 +08:00
weisd
1279baa72b fix replication (#875) 2025-11-17 17:37:41 +08:00
weisd
acdefb6703 fix read lock (#866) 2025-11-16 11:44:13 +08:00
Copilot
b7964081ce Fix KMS configuration synchronization across cluster nodes (#855)
* Initial plan

* Add KMS configuration persistence to cluster storage

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* Apply code formatting to KMS configuration changes

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* add comment

* fix fmt

* fix

* Fix overlapping dependabot cargo configurations

Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>

* improve code for comment and replace  `Once_Cell` to `std::sync::OnceLock`

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: houseme <4829346+houseme@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
Co-authored-by: loverustfs <155562731+loverustfs@users.noreply.github.com>
2025-11-16 00:05:03 +08:00
Nugine
f73fa59bf6 ci: fix dependabot (#860) 2025-11-15 22:35:59 +08:00
Nugine
0b1b7832fe ci: update s3s weekly (#858) 2025-11-15 22:05:03 +08:00
houseme
c242957c6f build(deps): bump the dependencies group with 8 updates (#857) 2025-11-15 19:51:07 +08:00
houseme
55e3a1f7e0 fix(audit): prevent state transition when no targets exist (#854)
Avoid setting AuditSystemState::Starting when target list is empty.
Now checks target availability before state transition, keeping the
system in Stopped state if no enabled targets are found.

- Check targets.is_empty() before setting Starting state
- Return early with Ok(()) when no targets exist
- Maintain consistent state machine behavior
- Prevent transient "Starting" state with no actual targets

Resolves issue where audit system would incorrectly enter Starting
state even when configuration contained no enabled targets.
2025-11-14 20:23:21 +08:00
majinghe
3cf565e847 delete sink file path env and update readme file with container user change (#852)
* update container user change in readme file

* delete sink file path env vars

---------

Co-authored-by: houseme <housemecn@gmail.com>
2025-11-14 13:00:15 +08:00
houseme
9d553620cf remove linux dep and upgrade Protocol Buffers and FlatBuffers (#853) 2025-11-14 12:50:55 +08:00
houseme
51584986e1 feat(obs): unify metrics initialization and fix exporter move error (#851)
* feat(obs): unify metrics initialization and fix exporter move error

- Fix Rust E0382 (use after move) by removing duplicate MetricExporter consumption.
- Consolidate MeterProvider construction into single Recorder builder path.
- Remove redundant Recorder::builder(...).install_global() call.
- Ensure PeriodicReader setup is performed only once (HTTP + optional stdout).
- Set global meter provider and metrics recorder exactly once.
- Preserve existing behavior for stdout/file vs HTTP modes.
- Minor cleanup: consistent resource reuse and interval handling.

* update telemetry.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix

* fix

* fix

* fix: modify logger level from error to event

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-14 00:50:07 +08:00
majinghe
93090adf7c enhance security context part for k8s deployment (#850) 2025-11-13 18:18:19 +08:00
houseme
d4817a4bea fix: modify logger from warn to info (#842)
* fix: modify logger from warn to info

* upgrade version
2025-11-12 11:13:38 +08:00
houseme
7e1a9e2ede 🔒 Upgrade Cryptography Libraries to Latest RC Versions (#837)
* fix

* chore: upgrade cryptography libraries to RC versions

- Upgrade aes-gcm to 0.11.0-rc.2 with rand_core support
- Upgrade chacha20poly1305 to 0.11.0-rc.2
- Upgrade argon2 to 0.6.0-rc.2 with std features
- Upgrade hmac to 0.13.0-rc.3
- Upgrade pbkdf2 to 0.13.0-rc.2
- Upgrade rsa to 0.10.0-rc.10
- Upgrade sha1 and sha2 to 0.11.0-rc.3
- Upgrade md-5 to 0.11.0-rc.3

These upgrades provide enhanced security features and performance
improvements while maintaining backward compatibility with existing
encryption workflows.

* add

* improve code

* fix
2025-11-11 21:10:03 +08:00
安正超
8a020ec4d9 wip (#830) 2025-11-11 09:34:58 +08:00
weisd
77a3489ed2 fix list object err (#831)
fix list object err (#831)

#827
#815
#635
#752
2025-11-10 23:42:15 +08:00
weisd
5941062909 fix (#828) 2025-11-10 19:22:58 +08:00
houseme
98be7df0f5 feat(storage): refactor audit and notification with OperationHelper (#825)
* improve code for audit

* improve code ecfs.rs

* improve code

* improve code for ecfs.rs

* feat(storage): refactor audit and notification with OperationHelper

This commit introduces a significant refactoring of the audit logging and event notification mechanisms within `ecfs.rs`.

The core of this change is the new `OperationHelper` struct, which encapsulates and simplifies the logic for both concerns. It replaces the previous `AuditHelper` and manual event dispatching.

Key improvements include:

- **Unified Handling**: `OperationHelper` manages both audit and notification builders, providing a single, consistent entry point for S3 operations.
- **RAII for Automation**: By leveraging the `Drop` trait, the helper automatically dispatches logs and notifications when it goes out of scope. This simplifies S3 method implementations and ensures cleanup even on early returns.
- **Fluent API**: A builder-like pattern with methods such as `.object()`, `.version_id()`, and `.suppress_event()` makes the code more readable and expressive.
- **Context-Aware Logic**: The helper's `.complete()` method intelligently populates log details based on the operation's `S3Result` and only triggers notifications on success.
- **Modular Design**: All helper logic is now isolated in `rustfs/src/storage/helper.rs`, improving separation of concerns and making `ecfs.rs` cleaner.

This refactoring significantly enhances code clarity, reduces boilerplate, and improves the robustness of logging and notification handling across the storage layer.

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* improve code for audit and notify

* fix

* fix

* fix
2025-11-10 17:30:50 +08:00
houseme
b26aad4129 improve code for logger (#822)
* improve code for logger

* fix
2025-11-08 22:36:24 +08:00
Alex Bykov
5989589c3e Update configuration.md (#812)
Escaping Pipe Character in the table "CLI Flags..."

Co-authored-by: loverustfs <155562731+loverustfs@users.noreply.github.com>
2025-11-08 10:56:14 +08:00
majinghe
4716454faa add non root user support for container deployment (#817) 2025-11-08 10:00:14 +08:00
houseme
29056a767a Refactor Telemetry Initialization and Environment Utilities (#811)
* improve code for metrics

* improve code for metrics

* fix

* fix

* Refactor telemetry initialization and environment functions ordering

- Reorder functions in envs.rs by type size (8-bit to 64-bit, signed before unsigned) and add missing variants like get_env_opt_u16.
- Optimize init_telemetry to support three modes: stdout logging (default error level with span tracing), file rolling logs (size-based with retention), and HTTP-based observability with sub-endpoints (trace, metric, log) falling back to unified endpoint.
- Fix stdout logging issue by retaining WorkerGuard in OtelGuard to prevent premature release of async writer threads.
- Enhance observability mode with HTTP protocol, compression, and proper resource management.
- Update OtelGuard to include tracing_guard for stdout and flexi_logger_handles for file logging.
- Improve error handling and configuration extraction in OtelConfig.

* fix

* up

* fix

* fix

* improve code for obs

* fix

* fix
2025-11-07 20:01:54 +08:00
weisd
e823922654 feat:add api error message (#801)
* feat:add api error message
* fix: check input
* fix: test
2025-11-07 09:53:49 +08:00
shiro.lee
8203f9ff6f fix: when the Object Lock configuration does not exist, an error message should be returned (#771) (#798)
fix: when the Object Lock configuration does not exist, an error message should be returned (#771) (#798)
2025-11-05 23:48:54 +08:00
houseme
1b22a1e078 Refactor modify stdout (#797)
* fix

* fix
2025-11-05 20:04:28 +08:00
weisd
461d5dff86 fix list max keys (#795) 2025-11-05 15:30:32 +08:00
houseme
38f26b7c94 improve import,crate version,and copyright (#790) 2025-11-05 09:10:06 +08:00
安正超
eb7eb9c5a1 fix: resolve logic errors in ahm heal module (#788)
* fix: resolve logic errors in ahm heal module

- Fix response publishing logic in HealChannelProcessor to properly handle errors
- Fix negative index handling in DiskStatusChange event to fail fast instead of silently converting to 0
- Enhance timeout control in heal_erasure_set Step 3 loop to immediately respond to cancellation/timeout
- Add proper error propagation for task cancellation and timeout in bucket healing loop

* fix: stabilize performance impact measurement test

- Increase measurement count from 3 to 5 runs for better stability
- Increase workload from 5000 to 10000 operations for more accurate timing
- Use median of 5 measurements instead of single measurement
- Ensure with_scanner duration is at least baseline to avoid negative overhead
- Increase wait time for scanner state stabilization

* wip

* Update crates/ahm/src/heal/channel.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* refactor: remove redundant ok_or_else + expect in event.rs

Replace redundant ok_or_else() + expect() pattern with
unwrap_or_else() + panic!() to avoid creating unnecessary Error
type when the value will panic anyway. This also defers error
message formatting until the error actually occurs.

* Update crates/ahm/src/heal/task.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix(ahm): fix logic errors and add unit tests

- Fix panic in HealEvent::to_heal_request for invalid indices
- Replace unwrap() calls with proper error handling in resume.rs
- Fix race conditions and timeout calculation in task.rs
- Fix semaphore acquisition error handling in erasure_healer.rs
- Improve error message for large objects in storage.rs
- Add comprehensive unit tests for progress, event, and channel modules
- Fix clippy warning: move test module to end of file in heal_channel.rs

* style: apply cargo fmt formatting

* refactor(ahm): address copilot review suggestions

- Add comment to check_control_flags explaining why return value is discarded
- Fix hardcoded median index in performance test using constant and dynamic calculation

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-05 08:15:23 +08:00
houseme
d934e3905b Refactor telemetry initialization for non-production environments (#789)
* add dep `scopeguard`

* improve for tracing

* fix

* fix

* improve code for import

* add logger trace id

* fix

* fix

* fix

* fix

* fix
2025-11-05 00:55:08 +08:00
219 changed files with 11714 additions and 3686 deletions

View File

@@ -16,7 +16,7 @@ services:
tempo-init:
image: busybox:latest
command: ["sh", "-c", "chown -R 10001:10001 /var/tempo"]
command: [ "sh", "-c", "chown -R 10001:10001 /var/tempo" ]
volumes:
- ./tempo-data:/var/tempo
user: root
@@ -39,7 +39,7 @@ services:
- otel-network
otel-collector:
image: otel/opentelemetry-collector-contrib:0.129.1
image: otel/opentelemetry-collector-contrib:latest
environment:
- TZ=Asia/Shanghai
volumes:
@@ -55,7 +55,7 @@ services:
networks:
- otel-network
jaeger:
image: jaegertracing/jaeger:2.8.0
image: jaegertracing/jaeger:latest
environment:
- TZ=Asia/Shanghai
ports:
@@ -65,17 +65,21 @@ services:
networks:
- otel-network
prometheus:
image: prom/prometheus:v3.4.2
image: prom/prometheus:latest
environment:
- TZ=Asia/Shanghai
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
ports:
- "9090:9090"
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--web.enable-otlp-receiver' # Enable OTLP
- '--enable-feature=promql-experimental-functions' # Enable info()
networks:
- otel-network
loki:
image: grafana/loki:3.5.1
image: grafana/loki:latest
environment:
- TZ=Asia/Shanghai
volumes:
@@ -86,7 +90,7 @@ services:
networks:
- otel-network
grafana:
image: grafana/grafana:12.0.2
image: grafana/grafana:latest
ports:
- "3000:3000" # Web UI
volumes:

View File

@@ -29,4 +29,80 @@ datasources:
serviceMap:
datasourceUid: prometheus
streamingEnabled:
search: true
search: true
tracesToLogsV2:
# Field with an internal link pointing to a logs data source in Grafana.
# datasourceUid value must match the uid value of the logs data source.
datasourceUid: 'loki'
spanStartTimeShift: '-1h'
spanEndTimeShift: '1h'
tags: [ 'job', 'instance', 'pod', 'namespace' ]
filterByTraceID: false
filterBySpanID: false
customQuery: true
query: 'method="$${__span.tags.method}"'
tracesToMetrics:
datasourceUid: 'prom'
spanStartTimeShift: '-1h'
spanEndTimeShift: '1h'
tags: [ { key: 'service.name', value: 'service' }, { key: 'job' } ]
queries:
- name: 'Sample query'
query: 'sum(rate(traces_spanmetrics_latency_bucket{$$__tags}[5m]))'
tracesToProfiles:
datasourceUid: 'grafana-pyroscope-datasource'
tags: [ 'job', 'instance', 'pod', 'namespace' ]
profileTypeId: 'process_cpu:cpu:nanoseconds:cpu:nanoseconds'
customQuery: true
query: 'method="$${__span.tags.method}"'
serviceMap:
datasourceUid: 'prometheus'
nodeGraph:
enabled: true
search:
hide: false
traceQuery:
timeShiftEnabled: true
spanStartTimeShift: '-1h'
spanEndTimeShift: '1h'
spanBar:
type: 'Tag'
tag: 'http.path'
streamingEnabled:
search: true
- name: Jaeger
type: jaeger
uid: Jaeger
url: http://jaeger:16686
basicAuth: false
access: proxy
readOnly: false
isDefault: false
jsonData:
tracesToLogsV2:
# Field with an internal link pointing to a logs data source in Grafana.
# datasourceUid value must match the uid value of the logs data source.
datasourceUid: 'loki'
spanStartTimeShift: '1h'
spanEndTimeShift: '-1h'
tags: [ 'job', 'instance', 'pod', 'namespace' ]
filterByTraceID: false
filterBySpanID: false
customQuery: true
query: 'method="$${__span.tags.method}"'
tracesToMetrics:
datasourceUid: 'prom'
spanStartTimeShift: '1h'
spanEndTimeShift: '-1h'
tags: [ { key: 'service.name', value: 'service' }, { key: 'job' } ]
queries:
- name: 'Sample query'
query: 'sum(rate(traces_spanmetrics_latency_bucket{$$__tags}[5m]))'
nodeGraph:
enabled: true
traceQuery:
timeShiftEnabled: true
spanStartTimeShift: '1h'
spanEndTimeShift: '-1h'
spanBar:
type: 'None'

View File

@@ -63,6 +63,7 @@ ruler:
frontend:
encoding: protobuf
# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
#

View File

@@ -43,7 +43,6 @@ exporters:
send_timestamps: true # 发送时间戳
# enable_open_metrics: true
otlphttp/loki: # Loki 导出器,用于日志数据
# endpoint: "http://loki:3100/otlp/v1/logs"
endpoint: "http://loki:3100/otlp/v1/logs"
tls:
insecure: true

View File

@@ -13,16 +13,43 @@
# limitations under the License.
global:
scrape_interval: 5s # 刮取间隔
scrape_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
scrape_configs:
- job_name: 'otel-collector'
static_configs:
- targets: [ 'otel-collector:8888' ] # Collector 刮取指标
- targets: [ 'otel-collector:8888' ] # Scrape metrics from Collector
- job_name: 'otel-metrics'
static_configs:
- targets: [ 'otel-collector:8889' ] # 应用指标
- targets: [ 'otel-collector:8889' ] # Application indicators
- job_name: 'tempo'
static_configs:
- targets: [ 'tempo:3200' ]
- targets: [ 'tempo:3200' ] # Scrape metrics from Tempo
otlp:
# Recommended attributes to be promoted to labels.
promote_resource_attributes:
- service.instance.id
- service.name
- service.namespace
- cloud.availability_zone
- cloud.region
- container.name
- deployment.environment.name
- k8s.cluster.name
- k8s.container.name
- k8s.cronjob.name
- k8s.daemonset.name
- k8s.deployment.name
- k8s.job.name
- k8s.namespace.name
- k8s.pod.name
- k8s.replicaset.name
- k8s.statefulset.name
# Ingest OTLP data keeping all characters in metric/label names.
translation_strategy: NoUTF8EscapingWithSuffixes
storage:
# OTLP is a push-based protocol, Out of order samples is a common scenario.
tsdb:
out_of_order_time_window: 30m

View File

@@ -52,24 +52,19 @@ runs:
sudo apt-get install -y \
musl-tools \
build-essential \
lld \
libdbus-1-dev \
libwayland-dev \
libwebkit2gtk-4.1-dev \
libxdo-dev \
pkg-config \
libssl-dev
- name: Install protoc
uses: arduino/setup-protoc@v3
with:
version: "31.1"
version: "33.1"
repo-token: ${{ inputs.github-token }}
- name: Install flatc
uses: Nugine/setup-flatc@v1
with:
version: "25.2.10"
version: "25.9.23"
- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@stable

View File

@@ -22,8 +22,18 @@ updates:
- package-ecosystem: "cargo" # See documentation for possible values
directory: "/" # Location of package manifests
schedule:
interval: "monthly"
interval: "weekly"
day: "monday"
timezone: "Asia/Shanghai"
time: "08:00"
groups:
s3s:
update-types:
- "minor"
- "patch"
patterns:
- "s3s"
- "s3s-*"
dependencies:
patterns:
- "*"

3
.gitignore vendored
View File

@@ -22,4 +22,5 @@ profile.json
.secrets
*.go
*.pb
*.svg
*.svg
deploy/logs/*.log.*

View File

@@ -8,12 +8,14 @@ The workspace root hosts shared dependencies in `Cargo.toml`. The service binary
## Build, Test, and Development Commands
Run `cargo check --all-targets` for fast validation. Build release binaries via `cargo build --release` or the pipeline-aligned `make build`. Use `./build-rustfs.sh --dev` for iterative development and `./build-rustfs.sh --platform <target>` for cross-compiles. Prefer `make pre-commit` before pushing to cover formatting, clippy, checks, and tests.
Always ensure `cargo fmt --all --check`, `cargo test --workspace --exclude e2e_test`, and `cargo clippy --all-targets --all-features -- -D warnings` complete successfully after each code change to keep the tree healthy and warning-free.
## Coding Style & Naming Conventions
Formatting follows the repo `rustfmt.toml` (130-column width). Use `snake_case` for items, `PascalCase` for types, and `SCREAMING_SNAKE_CASE` for constants. Avoid `unwrap()` or `expect()` outside tests; bubble errors with `Result` and crate-specific `thiserror` types. Keep async code non-blocking and offload CPU-heavy work with `tokio::task::spawn_blocking` when necessary.
## Testing Guidelines
Co-locate unit tests with their modules and give behavior-led names such as `handles_expired_token`. Integration suites belong in each crates `tests/` directory, while exhaustive end-to-end scenarios live in `crates/e2e_test/`. Run `cargo test --workspace --exclude e2e_test` during iteration, `cargo nextest run --all --exclude e2e_test` when available, and finish with `cargo test --all` before requesting review. Use `NO_PROXY=127.0.0.1,localhost HTTP_PROXY= HTTPS_PROXY=` for KMS e2e tests.
When fixing bugs or adding features, include regression tests that capture the new behavior so future changes cannot silently break it.
## Commit & Pull Request Guidelines
Work on feature branches (e.g., `feat/...`) after syncing `main`. Follow Conventional Commits under 72 characters (e.g., `feat: add kms key rotation`). Each commit must compile, format cleanly, and pass `make pre-commit`. Open PRs with a concise summary, note verification commands, link relevant issues, and wait for reviewer approval.

1010
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -97,15 +97,15 @@ async-channel = "2.5.0"
async-compression = { version = "0.4.19" }
async-recursion = "1.1.1"
async-trait = "0.1.89"
axum = "0.8.6"
axum-extra = "0.12.0"
axum-server = { version = "0.7.2", features = ["tls-rustls-no-provider"], default-features = false }
axum = "0.8.7"
axum-extra = "0.12.2"
axum-server = { version = "0.7.3", features = ["tls-rustls-no-provider"], default-features = false }
futures = "0.3.31"
futures-core = "0.3.31"
futures-util = "0.3.31"
hyper = { version = "1.7.0", features = ["http2", "http1", "server"] }
hyper = { version = "1.8.1", features = ["http2", "http1", "server"] }
hyper-rustls = { version = "0.27.7", default-features = false, features = ["native-tokio", "http1", "tls12", "logging", "http2", "ring", "webpki-roots"] }
hyper-util = { version = "0.1.17", features = ["tokio", "server-auto", "server-graceful"] }
hyper-util = { version = "0.1.18", features = ["tokio", "server-auto", "server-graceful"] }
http = "1.3.1"
http-body = "1.0.1"
reqwest = { version = "0.12.24", default-features = false, features = ["rustls-tls-webpki-roots", "charset", "http2", "system-proxy", "stream", "json", "blocking"] }
@@ -122,39 +122,36 @@ tower = { version = "0.5.2", features = ["timeout"] }
tower-http = { version = "0.6.6", features = ["cors"] }
# Serialization and Data Formats
bytes = { version = "1.10.1", features = ["serde"] }
bytesize = "2.1.0"
bytes = { version = "1.11.0", features = ["serde"] }
bytesize = "2.2.0"
byteorder = "1.5.0"
flatbuffers = "25.9.23"
form_urlencoded = "1.2.2"
prost = "0.14.1"
quick-xml = "0.38.3"
rmcp = { version = "0.8.3" }
quick-xml = "0.38.4"
rmcp = { version = "0.8.5" }
rmp = { version = "0.8.14" }
rmp-serde = { version = "1.3.0" }
serde = { version = "1.0.228", features = ["derive"] }
serde_json = { version = "1.0.145", features = ["raw_value"] }
serde_urlencoded = "0.7.1"
schemars = "1.0.4"
schemars = "1.1.0"
# Cryptography and Security
aes-gcm = { version = "0.10.3", features = ["std"] }
argon2 = { version = "0.5.3", features = ["std"] }
aes-gcm = { version = "0.11.0-rc.2", features = ["rand_core"] }
argon2 = { version = "0.6.0-rc.2", features = ["std"] }
blake3 = { version = "1.8.2" }
chacha20poly1305 = { version = "0.10.1" }
crc-fast = "1.3.0"
crc32c = "0.6.8"
crc32fast = "1.5.0"
crc64fast-nvme = "1.2.0"
hmac = "0.12.1"
jsonwebtoken = { version = "10.1.0", features = ["rust_crypto"] }
pbkdf2 = "0.12.2"
rsa = { version = "0.9.8" }
rustls = { version = "0.23.34", features = ["ring", "logging", "std", "tls12"], default-features = false }
chacha20poly1305 = { version = "0.11.0-rc.2" }
crc-fast = "1.6.0"
hmac = { version = "0.13.0-rc.3" }
jsonwebtoken = { version = "10.2.0", features = ["rust_crypto"] }
pbkdf2 = "0.13.0-rc.2"
rsa = { version = "0.10.0-rc.10" }
rustls = { version = "0.23.35", features = ["ring", "logging", "std", "tls12"], default-features = false }
rustls-pemfile = "2.2.0"
rustls-pki-types = "1.13.0"
sha1 = "0.10.6"
sha2 = "0.10.9"
sha1 = "0.11.0-rc.3"
sha2 = "0.11.0-rc.3"
zeroize = { version = "1.8.2", features = ["derive"] }
# Time and Date
@@ -169,8 +166,8 @@ astral-tokio-tar = "0.5.6"
atoi = "2.0.0"
atomic_enum = "0.3.0"
aws-config = { version = "1.8.10" }
aws-credential-types = { version = "1.2.8" }
aws-sdk-s3 = { version = "1.110.0", default-features = false, features = ["sigv4a", "rustls", "rt-tokio"] }
aws-credential-types = { version = "1.2.9" }
aws-sdk-s3 = { version = "1.112.0", default-features = false, features = ["sigv4a", "rustls", "rt-tokio"] }
aws-smithy-types = { version = "1.3.4" }
base64 = "0.22.1"
base64-simd = "0.8.0"
@@ -178,31 +175,32 @@ brotli = "8.0.2"
cfg-if = "1.0.4"
clap = { version = "4.5.51", features = ["derive", "env"] }
const-str = { version = "0.7.0", features = ["std", "proc"] }
convert_case = "0.8.0"
convert_case = "0.9.0"
criterion = { version = "0.7", features = ["html_reports"] }
crossbeam-queue = "0.3.12"
datafusion = "50.3.0"
derive_builder = "0.20.2"
enumset = "1.1.10"
faster-hex = "0.10.0"
flate2 = "1.1.5"
flexi_logger = { version = "0.31.7", features = ["trc", "dont_minimize_extra_stacks", "compress", "kv"] }
flexi_logger = { version = "0.31.7", features = ["trc", "dont_minimize_extra_stacks", "compress", "kv", "json"] }
glob = "0.3.3"
google-cloud-storage = "1.2.0"
google-cloud-auth = "1.1.0"
google-cloud-storage = "1.4.0"
google-cloud-auth = "1.2.0"
hashbrown = { version = "0.16.0", features = ["serde", "rayon"] }
heed = { version = "0.22.0" }
hex-simd = "0.8.0"
highway = { version = "1.3.0" }
ipnetwork = { version = "0.21.1", features = ["serde"] }
lazy_static = "1.5.0"
libc = "0.2.177"
libsystemd = { version = "0.7.2" }
libsystemd = "0.7.2"
local-ip-address = "0.6.5"
lz4 = "1.28.1"
matchit = "0.9.0"
md-5 = "0.10.6"
md-5 = "0.11.0-rc.3"
md5 = "0.8.0"
metrics = "0.24.2"
metrics-exporter-opentelemetry = "0.1.2"
mime_guess = "2.0.5"
moka = { version = "0.12.11", features = ["future"] }
netif = "0.1.6"
@@ -217,14 +215,14 @@ path-absolutize = "3.1.1"
path-clean = "1.0.1"
pin-project-lite = "0.2.16"
pretty_assertions = "1.4.1"
rand = "0.9.2"
rand = { version = "0.10.0-rc.5", features = ["serde"] }
rayon = "1.11.0"
reed-solomon-simd = { version = "3.1.0" }
regex = { version = "1.12.2" }
rumqttc = { version = "0.25.0" }
rust-embed = { version = "8.9.0" }
rustc-hash = { version = "2.1.1" }
s3s = { version = "0.12.0-rc.3", features = ["minio"] }
s3s = { git = "https://github.com/s3s-project/s3s.git", rev = "ba9f902", version = "0.12.0-rc.3", features = ["minio"] }
serial_test = "3.2.0"
shadow-rs = { version = "1.4.0", default-features = false }
siphasher = "1.0.1"
@@ -241,6 +239,7 @@ tempfile = "3.23.0"
test-case = "3.3.1"
thiserror = "2.0.17"
tracing = { version = "0.1.41" }
tracing-appender = "0.2.3"
tracing-error = "0.2.1"
tracing-opentelemetry = "0.32.0"
tracing-subscriber = { version = "0.3.20", features = ["env-filter", "time"] }
@@ -250,7 +249,7 @@ urlencoding = "2.1.3"
uuid = { version = "1.18.1", features = ["v4", "fast-rng", "macro-diagnostics"] }
vaultrs = { version = "0.7.4" }
walkdir = "2.5.0"
wildmatch = { version = "2.5.0", features = ["serde"] }
wildmatch = { version = "2.6.1", features = ["serde"] }
winapi = { version = "0.3.9" }
xxhash-rust = { version = "0.8.15", features = ["xxh64", "xxh3"] }
zip = "6.0.0"
@@ -259,7 +258,7 @@ zstd = "0.13.3"
# Observability and Metrics
opentelemetry = { version = "0.31.0" }
opentelemetry-appender-tracing = { version = "0.31.1", features = ["experimental_use_tracing_span_context", "experimental_metadata_attributes", "spec_unstable_logs_enabled"] }
opentelemetry-otlp = { version = "0.31.0", default-features = false, features = ["grpc-tonic", "gzip-tonic", "trace", "metrics", "logs", "internal-logs"] }
opentelemetry-otlp = { version = "0.31.0", features = ["http-proto", "zstd-http"] }
opentelemetry_sdk = { version = "0.31.0" }
opentelemetry-semantic-conventions = { version = "0.31.0", features = ["semconv_experimental"] }
opentelemetry-stdout = { version = "0.31.0" }

View File

@@ -64,8 +64,12 @@ COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /build/rustfs /usr/bin/rustfs
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /usr/bin/rustfs /entrypoint.sh && \
RUN chmod +x /usr/bin/rustfs /entrypoint.sh
RUN addgroup -g 10001 -S rustfs && \
adduser -u 10001 -G rustfs -S rustfs -D && \
mkdir -p /data /logs && \
chown -R rustfs:rustfs /data /logs && \
chmod 0750 /data /logs
ENV RUSTFS_ADDRESS=":9000" \
@@ -78,12 +82,14 @@ ENV RUSTFS_ADDRESS=":9000" \
RUSTFS_CONSOLE_CORS_ALLOWED_ORIGINS="*" \
RUSTFS_VOLUMES="/data" \
RUST_LOG="warn" \
RUSTFS_OBS_LOG_DIRECTORY="/logs" \
RUSTFS_SINKS_FILE_PATH="/logs"
RUSTFS_OBS_LOG_DIRECTORY="/logs"
EXPOSE 9000 9001
VOLUME ["/data", "/logs"]
USER rustfs
ENTRYPOINT ["/entrypoint.sh"]
CMD ["rustfs"]

View File

@@ -167,7 +167,6 @@ ENV RUSTFS_ADDRESS=":9000" \
RUSTFS_VOLUMES="/data" \
RUST_LOG="warn" \
RUSTFS_OBS_LOG_DIRECTORY="/logs" \
RUSTFS_SINKS_FILE_PATH="/logs" \
RUSTFS_USERNAME="rustfs" \
RUSTFS_GROUPNAME="rustfs" \
RUSTFS_UID="1000" \

View File

@@ -1,6 +1,6 @@
[![RustFS](https://rustfs.com/images/rustfs-github.png)](https://rustfs.com)
<p align="center">RustFS is a high-performance distributed object storage software built using Rust</p>
<p align="center">RustFS is a high-performance, distributed object storage system built in Rust.</p>
<p align="center">
<a href="https://github.com/rustfs/rustfs/actions/workflows/ci.yml"><img alt="CI" src="https://github.com/rustfs/rustfs/actions/workflows/ci.yml/badge.svg" /></a>
@@ -29,13 +29,13 @@ English | <a href="https://github.com/rustfs/rustfs/blob/main/README_ZH.md">简
<a href="https://readme-i18n.com/rustfs/rustfs?lang=ru">Русский</a>
</p>
RustFS is a high-performance distributed object storage software built using Rust, one of the most popular languages
worldwide. Along with MinIO, it shares a range of advantages such as simplicity, S3 compatibility, open-source nature,
RustFS is a high-performance, distributed object storage system built in Rust., one of the most popular languages
worldwide. RustFS combines the simplicity of MinIO with the memory safety and performance of Rust., S3 compatibility, open-source nature,
support for data lakes, AI, and big data. Furthermore, it has a better and more user-friendly open-source license in
comparison to other storage systems, being constructed under the Apache license. As Rust serves as its foundation,
RustFS provides faster speed and safer distributed features for high-performance object storage.
> ⚠️ **RustFS is under rapid development. Do NOT use in production environments!**
> ⚠️ **Current Status: Beta / Technical Preview. Not yet recommended for critical production workloads.**
## Features
@@ -65,8 +65,8 @@ Stress test server parameters
|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| Powerful Console | Simple and useless Console |
| Developed based on Rust language, memory is safer | Developed in Go or C, with potential issues like memory GC/leaks |
| Does not report logs to third-party countries | Reporting logs to other third countries may violate national security laws |
| Licensed under Apache, more business-friendly | AGPL V3 License and other License, polluted open source and License traps, infringement of intellectual property rights |
| Guaranteed Data Sovereignty: No telemetry or unauthorized data egress | Reporting logs to other third countries may violate national security laws |
| Permissive Apache 2.0 License | AGPL V3 License and other License, polluted open source and License traps, infringement of intellectual property rights |
| Comprehensive S3 support, works with domestic and international cloud providers | Full support for S3, but no local cloud vendor support |
| Rust-based development, strong support for secure and innovative devices | Poor support for edge gateways and secure innovative devices |
| Stable commercial prices, free community support | High pricing, with costs up to $250,000 for 1PiB |
@@ -84,15 +84,20 @@ To get started with RustFS, follow these steps:
2. **Docker Quick Start (Option 2)**
RustFS container run as non-root user `rustfs` with id `1000`, if you run docker with `-v` to mount host directory into docker container, please make sure the owner of host directory has been changed to `1000`, otherwise you will encounter permission denied error.
```bash
# create data and logs directories
mkdir -p data logs
# using latest alpha version
docker run -d -p 9000:9000 -v $(pwd)/data:/data -v $(pwd)/logs:/logs rustfs/rustfs:alpha
# change the owner of those two ditectories
chown -R 10001:10001 data logs
# Specific version
docker run -d -p 9000:9000 -v $(pwd)/data:/data -v $(pwd)/logs:/logs rustfs/rustfs:1.0.0.alpha.45
# using latest version
docker run -d -p 9000:9000 -p 9001:9001 -v $(pwd)/data:/data -v $(pwd)/logs:/logs rustfs/rustfs:latest
# using specific version
docker run -d -p 9000:9000 -p 9001:9001 -v $(pwd)/data:/data -v $(pwd)/logs:/logs rustfs/rustfs:1.0.0.alpha.68
```
For docker installation, you can also run the container with docker compose. With the `docker-compose.yml` file under
@@ -139,6 +144,8 @@ observability. If you want to start redis as well as nginx container, you can sp
make help-docker # Show all Docker-related commands
```
> **Heads-up (macOS cross-compilation)**: macOS keeps the default `ulimit -n` at 256, so `cargo zigbuild` or `./build-rustfs.sh --platform ...` may fail with `ProcessFdQuotaExceeded` when targeting Linux. The build script now tries to raise the limit automatically, but if you still see the warning, run `ulimit -n 4096` (or higher) in your shell before building.
4. **Build with helm chart(Option 4) - Cloud Native environment**
Following the instructions on [helm chart README](./helm/README.md) to install RustFS on kubernetes cluster.
@@ -207,4 +214,3 @@ top charts.
[Apache 2.0](https://opensource.org/licenses/Apache-2.0)
**RustFS** is a trademark of RustFS, Inc. All other trademarks are the property of their respective owners.

View File

@@ -113,12 +113,14 @@ RustFS 是一个使用 Rust全球最受欢迎的编程语言之一构建
你也可以使用 Makefile 提供的目标命令以提升便捷性:
```bash
make docker-buildx # 本地构建
make docker-buildx-push # 构建并推送
make docker-buildx-version VERSION=v1.0.0 # 构建指定版本
make help-docker # 显示全部 Docker 相关命令
```
```bash
make docker-buildx # 本地构建
make docker-buildx-push # 构建并推送
make docker-buildx-version VERSION=v1.0.0 # 构建指定版本
make help-docker # 显示全部 Docker 相关命令
```
> **提示macOS 交叉编译)**macOS 默认的 `ulimit -n` 只有 256使用 `cargo zigbuild` 或 `./build-rustfs.sh --platform ...` 编译 Linux 目标时容易触发 `ProcessFdQuotaExceeded` 链接错误。脚本会尝试自动提升该限制,如仍提示失败,请在构建前手动执行 `ulimit -n 4096`(或更大的值)。
4. **使用 Helm Chart 部署(方案四)- 云原生环境**

View File

@@ -163,6 +163,35 @@ print_message() {
echo -e "${color}${message}${NC}"
}
# Prevent zig/ld from hitting macOS file descriptor defaults during linking
ensure_file_descriptor_limit() {
local required_limit=4096
local current_limit
current_limit=$(ulimit -Sn 2>/dev/null || echo "")
if [ -z "$current_limit" ] || [ "$current_limit" = "unlimited" ]; then
return
fi
if (( current_limit >= required_limit )); then
return
fi
local hard_limit target_limit
hard_limit=$(ulimit -Hn 2>/dev/null || echo "")
target_limit=$required_limit
if [ -n "$hard_limit" ] && [ "$hard_limit" != "unlimited" ] && (( hard_limit < required_limit )); then
target_limit=$hard_limit
fi
if ulimit -Sn "$target_limit" 2>/dev/null; then
print_message $YELLOW "🔧 Increased open file limit from $current_limit to $target_limit to avoid ProcessFdQuotaExceeded"
else
print_message $YELLOW "⚠️ Unable to raise ulimit -n automatically (current: $current_limit, needed: $required_limit). Please run 'ulimit -n $required_limit' manually before building."
fi
}
# Get version from git
get_version() {
if git describe --abbrev=0 --tags >/dev/null 2>&1; then
@@ -570,10 +599,11 @@ main() {
fi
fi
ensure_file_descriptor_limit
# Start build process
build_rustfs
}
# Run main function
main

View File

@@ -40,4 +40,4 @@ serde_json = { workspace = true }
serial_test = { workspace = true }
tracing-subscriber = { workspace = true }
tempfile = { workspace = true }
heed = "0.22.0"
heed = { workspace = true }

View File

@@ -14,6 +14,10 @@
use thiserror::Error;
/// Custom error type for AHM operations
/// This enum defines various error variants that can occur during
/// the execution of AHM-related tasks, such as I/O errors, storage errors,
/// configuration errors, and specific errors related to healing operations.
#[derive(Debug, Error)]
pub enum Error {
#[error("I/O error: {0}")]
@@ -85,9 +89,13 @@ pub enum Error {
ProgressTrackingFailed { message: String },
}
/// A specialized Result type for AHM operations
///This type is a convenient alias for results returned by functions in the AHM crate,
/// using the custom Error type defined above.
pub type Result<T, E = Error> = std::result::Result<T, E>;
impl Error {
/// Create an Other error from any error type
pub fn other<E>(error: E) -> Self
where
E: Into<Box<dyn std::error::Error + Send + Sync>>,

View File

@@ -12,18 +12,19 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::error::Result;
use crate::heal::{
manager::HealManager,
task::{HealOptions, HealPriority, HealRequest, HealType},
utils,
};
use crate::{Error, Result};
use rustfs_common::heal_channel::{
HealChannelCommand, HealChannelPriority, HealChannelReceiver, HealChannelRequest, HealChannelResponse, HealScanMode,
publish_heal_response,
};
use std::sync::Arc;
use tokio::sync::mpsc;
use tracing::{error, info};
use tracing::{debug, error, info};
/// Heal channel processor
pub struct HealChannelProcessor {
@@ -60,7 +61,7 @@ impl HealChannelProcessor {
}
}
None => {
info!("Heal channel receiver closed, stopping processor");
debug!("Heal channel receiver closed, stopping processor");
break;
}
}
@@ -99,7 +100,6 @@ impl HealChannelProcessor {
Ok(task_id) => {
info!("Successfully submitted heal request: {} as task: {}", request.id, task_id);
// Send success response
let response = HealChannelResponse {
request_id: request.id,
success: true,
@@ -107,9 +107,7 @@ impl HealChannelProcessor {
error: None,
};
if let Err(e) = self.response_sender.send(response) {
error!("Failed to send heal response: {}", e);
}
self.publish_response(response);
}
Err(e) => {
error!("Failed to submit heal request: {} - {}", request.id, e);
@@ -122,9 +120,7 @@ impl HealChannelProcessor {
error: Some(e.to_string()),
};
if let Err(e) = self.response_sender.send(response) {
error!("Failed to send heal error response: {}", e);
}
self.publish_response(response);
}
}
@@ -144,9 +140,7 @@ impl HealChannelProcessor {
error: None,
};
if let Err(e) = self.response_sender.send(response) {
error!("Failed to send query response: {}", e);
}
self.publish_response(response);
Ok(())
}
@@ -164,9 +158,7 @@ impl HealChannelProcessor {
error: None,
};
if let Err(e) = self.response_sender.send(response) {
error!("Failed to send cancel response: {}", e);
}
self.publish_response(response);
Ok(())
}
@@ -174,9 +166,12 @@ impl HealChannelProcessor {
/// Convert channel request to heal request
fn convert_to_heal_request(&self, request: HealChannelRequest) -> Result<HealRequest> {
let heal_type = if let Some(disk_id) = &request.disk {
let set_disk_id = utils::normalize_set_disk_id(disk_id).ok_or_else(|| Error::InvalidHealType {
heal_type: format!("erasure-set({disk_id})"),
})?;
HealType::ErasureSet {
buckets: vec![],
set_disk_id: disk_id.clone(),
set_disk_id,
}
} else if let Some(prefix) = &request.object_prefix {
if !prefix.is_empty() {
@@ -226,8 +221,332 @@ impl HealChannelProcessor {
Ok(HealRequest::new(heal_type, options, priority))
}
fn publish_response(&self, response: HealChannelResponse) {
// Try to send to local channel first, but don't block broadcast on failure
if let Err(e) = self.response_sender.send(response.clone()) {
error!("Failed to enqueue heal response locally: {}", e);
}
// Always attempt to broadcast, even if local send failed
// Use the original response for broadcast; local send uses a clone
if let Err(e) = publish_heal_response(response) {
error!("Failed to broadcast heal response: {}", e);
}
}
/// Get response sender for external use
pub fn get_response_sender(&self) -> mpsc::UnboundedSender<HealChannelResponse> {
self.response_sender.clone()
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::heal::storage::HealStorageAPI;
use rustfs_common::heal_channel::{HealChannelPriority, HealChannelRequest, HealScanMode};
use std::sync::Arc;
// Mock storage for testing
struct MockStorage;
#[async_trait::async_trait]
impl HealStorageAPI for MockStorage {
async fn get_object_meta(
&self,
_bucket: &str,
_object: &str,
) -> crate::Result<Option<rustfs_ecstore::store_api::ObjectInfo>> {
Ok(None)
}
async fn get_object_data(&self, _bucket: &str, _object: &str) -> crate::Result<Option<Vec<u8>>> {
Ok(None)
}
async fn put_object_data(&self, _bucket: &str, _object: &str, _data: &[u8]) -> crate::Result<()> {
Ok(())
}
async fn delete_object(&self, _bucket: &str, _object: &str) -> crate::Result<()> {
Ok(())
}
async fn verify_object_integrity(&self, _bucket: &str, _object: &str) -> crate::Result<bool> {
Ok(true)
}
async fn ec_decode_rebuild(&self, _bucket: &str, _object: &str) -> crate::Result<Vec<u8>> {
Ok(vec![])
}
async fn get_disk_status(
&self,
_endpoint: &rustfs_ecstore::disk::endpoint::Endpoint,
) -> crate::Result<crate::heal::storage::DiskStatus> {
Ok(crate::heal::storage::DiskStatus::Ok)
}
async fn format_disk(&self, _endpoint: &rustfs_ecstore::disk::endpoint::Endpoint) -> crate::Result<()> {
Ok(())
}
async fn get_bucket_info(&self, _bucket: &str) -> crate::Result<Option<rustfs_ecstore::store_api::BucketInfo>> {
Ok(None)
}
async fn heal_bucket_metadata(&self, _bucket: &str) -> crate::Result<()> {
Ok(())
}
async fn list_buckets(&self) -> crate::Result<Vec<rustfs_ecstore::store_api::BucketInfo>> {
Ok(vec![])
}
async fn object_exists(&self, _bucket: &str, _object: &str) -> crate::Result<bool> {
Ok(false)
}
async fn get_object_size(&self, _bucket: &str, _object: &str) -> crate::Result<Option<u64>> {
Ok(None)
}
async fn get_object_checksum(&self, _bucket: &str, _object: &str) -> crate::Result<Option<String>> {
Ok(None)
}
async fn heal_object(
&self,
_bucket: &str,
_object: &str,
_version_id: Option<&str>,
_opts: &rustfs_common::heal_channel::HealOpts,
) -> crate::Result<(rustfs_madmin::heal_commands::HealResultItem, Option<crate::Error>)> {
Ok((rustfs_madmin::heal_commands::HealResultItem::default(), None))
}
async fn heal_bucket(
&self,
_bucket: &str,
_opts: &rustfs_common::heal_channel::HealOpts,
) -> crate::Result<rustfs_madmin::heal_commands::HealResultItem> {
Ok(rustfs_madmin::heal_commands::HealResultItem::default())
}
async fn heal_format(
&self,
_dry_run: bool,
) -> crate::Result<(rustfs_madmin::heal_commands::HealResultItem, Option<crate::Error>)> {
Ok((rustfs_madmin::heal_commands::HealResultItem::default(), None))
}
async fn list_objects_for_heal(&self, _bucket: &str, _prefix: &str) -> crate::Result<Vec<String>> {
Ok(vec![])
}
async fn get_disk_for_resume(&self, _set_disk_id: &str) -> crate::Result<rustfs_ecstore::disk::DiskStore> {
Err(crate::Error::other("Not implemented in mock"))
}
}
fn create_test_heal_manager() -> Arc<HealManager> {
let storage: Arc<dyn HealStorageAPI> = Arc::new(MockStorage);
Arc::new(HealManager::new(storage, None))
}
#[test]
fn test_heal_channel_processor_new() {
let heal_manager = create_test_heal_manager();
let processor = HealChannelProcessor::new(heal_manager);
// Verify processor is created successfully
let _sender = processor.get_response_sender();
// If we can get the sender, processor was created correctly
}
#[tokio::test]
async fn test_convert_to_heal_request_bucket() {
let heal_manager = create_test_heal_manager();
let processor = HealChannelProcessor::new(heal_manager);
let channel_request = HealChannelRequest {
id: "test-id".to_string(),
bucket: "test-bucket".to_string(),
object_prefix: None,
disk: None,
priority: HealChannelPriority::Normal,
scan_mode: None,
remove_corrupted: None,
recreate_missing: None,
update_parity: None,
recursive: None,
dry_run: None,
timeout_seconds: None,
pool_index: None,
set_index: None,
force_start: false,
};
let heal_request = processor.convert_to_heal_request(channel_request).unwrap();
assert!(matches!(heal_request.heal_type, HealType::Bucket { .. }));
assert_eq!(heal_request.priority, HealPriority::Normal);
}
#[tokio::test]
async fn test_convert_to_heal_request_object() {
let heal_manager = create_test_heal_manager();
let processor = HealChannelProcessor::new(heal_manager);
let channel_request = HealChannelRequest {
id: "test-id".to_string(),
bucket: "test-bucket".to_string(),
object_prefix: Some("test-object".to_string()),
disk: None,
priority: HealChannelPriority::High,
scan_mode: Some(HealScanMode::Deep),
remove_corrupted: Some(true),
recreate_missing: Some(true),
update_parity: Some(true),
recursive: Some(false),
dry_run: Some(false),
timeout_seconds: Some(300),
pool_index: Some(0),
set_index: Some(1),
force_start: false,
};
let heal_request = processor.convert_to_heal_request(channel_request).unwrap();
assert!(matches!(heal_request.heal_type, HealType::Object { .. }));
assert_eq!(heal_request.priority, HealPriority::High);
assert_eq!(heal_request.options.scan_mode, HealScanMode::Deep);
assert!(heal_request.options.remove_corrupted);
assert!(heal_request.options.recreate_missing);
}
#[tokio::test]
async fn test_convert_to_heal_request_erasure_set() {
let heal_manager = create_test_heal_manager();
let processor = HealChannelProcessor::new(heal_manager);
let channel_request = HealChannelRequest {
id: "test-id".to_string(),
bucket: "test-bucket".to_string(),
object_prefix: None,
disk: Some("pool_0_set_1".to_string()),
priority: HealChannelPriority::Critical,
scan_mode: None,
remove_corrupted: None,
recreate_missing: None,
update_parity: None,
recursive: None,
dry_run: None,
timeout_seconds: None,
pool_index: None,
set_index: None,
force_start: false,
};
let heal_request = processor.convert_to_heal_request(channel_request).unwrap();
assert!(matches!(heal_request.heal_type, HealType::ErasureSet { .. }));
assert_eq!(heal_request.priority, HealPriority::Urgent);
}
#[tokio::test]
async fn test_convert_to_heal_request_invalid_disk_id() {
let heal_manager = create_test_heal_manager();
let processor = HealChannelProcessor::new(heal_manager);
let channel_request = HealChannelRequest {
id: "test-id".to_string(),
bucket: "test-bucket".to_string(),
object_prefix: None,
disk: Some("invalid-disk-id".to_string()),
priority: HealChannelPriority::Normal,
scan_mode: None,
remove_corrupted: None,
recreate_missing: None,
update_parity: None,
recursive: None,
dry_run: None,
timeout_seconds: None,
pool_index: None,
set_index: None,
force_start: false,
};
let result = processor.convert_to_heal_request(channel_request);
assert!(result.is_err());
}
#[tokio::test]
async fn test_convert_to_heal_request_priority_mapping() {
let heal_manager = create_test_heal_manager();
let processor = HealChannelProcessor::new(heal_manager);
let priorities = vec![
(HealChannelPriority::Low, HealPriority::Low),
(HealChannelPriority::Normal, HealPriority::Normal),
(HealChannelPriority::High, HealPriority::High),
(HealChannelPriority::Critical, HealPriority::Urgent),
];
for (channel_priority, expected_heal_priority) in priorities {
let channel_request = HealChannelRequest {
id: "test-id".to_string(),
bucket: "test-bucket".to_string(),
object_prefix: None,
disk: None,
priority: channel_priority,
scan_mode: None,
remove_corrupted: None,
recreate_missing: None,
update_parity: None,
recursive: None,
dry_run: None,
timeout_seconds: None,
pool_index: None,
set_index: None,
force_start: false,
};
let heal_request = processor.convert_to_heal_request(channel_request).unwrap();
assert_eq!(heal_request.priority, expected_heal_priority);
}
}
#[tokio::test]
async fn test_convert_to_heal_request_force_start() {
let heal_manager = create_test_heal_manager();
let processor = HealChannelProcessor::new(heal_manager);
let channel_request = HealChannelRequest {
id: "test-id".to_string(),
bucket: "test-bucket".to_string(),
object_prefix: None,
disk: None,
priority: HealChannelPriority::Normal,
scan_mode: None,
remove_corrupted: Some(false),
recreate_missing: Some(false),
update_parity: Some(false),
recursive: None,
dry_run: None,
timeout_seconds: None,
pool_index: None,
set_index: None,
force_start: true, // Should override the above false values
};
let heal_request = processor.convert_to_heal_request(channel_request).unwrap();
assert!(heal_request.options.remove_corrupted);
assert!(heal_request.options.recreate_missing);
assert!(heal_request.options.update_parity);
}
#[tokio::test]
async fn test_convert_to_heal_request_empty_object_prefix() {
let heal_manager = create_test_heal_manager();
let processor = HealChannelProcessor::new(heal_manager);
let channel_request = HealChannelRequest {
id: "test-id".to_string(),
bucket: "test-bucket".to_string(),
object_prefix: Some("".to_string()), // Empty prefix should be treated as bucket heal
disk: None,
priority: HealChannelPriority::Normal,
scan_mode: None,
remove_corrupted: None,
recreate_missing: None,
update_parity: None,
recursive: None,
dry_run: None,
timeout_seconds: None,
pool_index: None,
set_index: None,
force_start: false,
};
let heal_request = processor.convert_to_heal_request(channel_request).unwrap();
assert!(matches!(heal_request.heal_type, HealType::Bucket { .. }));
}
}

View File

@@ -12,12 +12,12 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::error::{Error, Result};
use crate::heal::{
progress::HealProgress,
resume::{CheckpointManager, ResumeManager, ResumeUtils},
storage::HealStorageAPI,
};
use crate::{Error, Result};
use futures::future::join_all;
use rustfs_common::heal_channel::{HealOpts, HealScanMode};
use rustfs_ecstore::disk::DiskStore;
@@ -49,14 +49,15 @@ impl ErasureSetHealer {
}
/// execute erasure set heal with resume
#[tracing::instrument(skip(self, buckets), fields(set_disk_id = %set_disk_id, bucket_count = buckets.len()))]
pub async fn heal_erasure_set(&self, buckets: &[String], set_disk_id: &str) -> Result<()> {
info!("Starting erasure set heal for {} buckets on set disk {}", buckets.len(), set_disk_id);
info!("Starting erasure set heal");
// 1. generate or get task id
let task_id = self.get_or_create_task_id(set_disk_id).await?;
// 2. initialize or resume resume state
let (resume_manager, checkpoint_manager) = self.initialize_resume_state(&task_id, buckets).await?;
let (resume_manager, checkpoint_manager) = self.initialize_resume_state(&task_id, set_disk_id, buckets).await?;
// 3. execute heal with resume
let result = self
@@ -77,25 +78,38 @@ impl ErasureSetHealer {
}
/// get or create task id
async fn get_or_create_task_id(&self, _set_disk_id: &str) -> Result<String> {
async fn get_or_create_task_id(&self, set_disk_id: &str) -> Result<String> {
// check if there are resumable tasks
let resumable_tasks = ResumeUtils::get_resumable_tasks(&self.disk).await?;
for task_id in resumable_tasks {
if ResumeUtils::can_resume_task(&self.disk, &task_id).await {
info!("Found resumable task: {}", task_id);
return Ok(task_id);
match ResumeManager::load_from_disk(self.disk.clone(), &task_id).await {
Ok(manager) => {
let state = manager.get_state().await;
if state.set_disk_id == set_disk_id && ResumeUtils::can_resume_task(&self.disk, &task_id).await {
info!("Found resumable task: {} for set {}", task_id, set_disk_id);
return Ok(task_id);
}
}
Err(e) => {
warn!("Failed to load resume state for task {}: {}", task_id, e);
}
}
}
// create new task id
let task_id = ResumeUtils::generate_task_id();
let task_id = format!("{}_{}", set_disk_id, ResumeUtils::generate_task_id());
info!("Created new heal task: {}", task_id);
Ok(task_id)
}
/// initialize or resume resume state
async fn initialize_resume_state(&self, task_id: &str, buckets: &[String]) -> Result<(ResumeManager, CheckpointManager)> {
async fn initialize_resume_state(
&self,
task_id: &str,
set_disk_id: &str,
buckets: &[String],
) -> Result<(ResumeManager, CheckpointManager)> {
// check if resume state exists
if ResumeManager::has_resume_state(&self.disk, task_id).await {
info!("Loading existing resume state for task: {}", task_id);
@@ -111,8 +125,14 @@ impl ErasureSetHealer {
} else {
info!("Creating new resume state for task: {}", task_id);
let resume_manager =
ResumeManager::new(self.disk.clone(), task_id.to_string(), "erasure_set".to_string(), buckets.to_vec()).await?;
let resume_manager = ResumeManager::new(
self.disk.clone(),
task_id.to_string(),
"erasure_set".to_string(),
set_disk_id.to_string(),
buckets.to_vec(),
)
.await?;
let checkpoint_manager = CheckpointManager::new(self.disk.clone(), task_id.to_string()).await?;
@@ -162,6 +182,7 @@ impl ErasureSetHealer {
let bucket_result = self
.heal_bucket_with_resume(
bucket,
bucket_idx,
&mut current_object_index,
&mut processed_objects,
&mut successful_objects,
@@ -182,7 +203,7 @@ impl ErasureSetHealer {
// check cancel status
if self.cancel_token.is_cancelled() {
info!("Heal task cancelled");
warn!("Heal task cancelled");
return Err(Error::TaskCancelled);
}
@@ -211,9 +232,11 @@ impl ErasureSetHealer {
/// heal single bucket with resume
#[allow(clippy::too_many_arguments)]
#[tracing::instrument(skip(self, current_object_index, processed_objects, successful_objects, failed_objects, _skipped_objects, resume_manager, checkpoint_manager), fields(bucket = %bucket, bucket_index = bucket_index))]
async fn heal_bucket_with_resume(
&self,
bucket: &str,
bucket_index: usize,
current_object_index: &mut usize,
processed_objects: &mut u64,
successful_objects: &mut u64,
@@ -222,7 +245,7 @@ impl ErasureSetHealer {
resume_manager: &ResumeManager,
checkpoint_manager: &CheckpointManager,
) -> Result<()> {
info!("Starting heal for bucket: {} from object index {}", bucket, current_object_index);
info!(target: "rustfs:ahm:heal_bucket_with_resume" ,"Starting heal for bucket from object index {}", current_object_index);
// 1. get bucket info
let _bucket_info = match self.storage.get_bucket_info(bucket).await? {
@@ -252,7 +275,9 @@ impl ErasureSetHealer {
let object_exists = match self.storage.object_exists(bucket, object).await {
Ok(exists) => exists,
Err(e) => {
warn!("Failed to check existence of {}/{}: {}, skipping", bucket, object, e);
warn!("Failed to check existence of {}/{}: {}, marking as failed", bucket, object, e);
*failed_objects += 1;
checkpoint_manager.add_failed_object(object.clone()).await?;
*current_object_index = obj_idx + 1;
continue;
}
@@ -260,7 +285,7 @@ impl ErasureSetHealer {
if !object_exists {
info!(
"Object {}/{} no longer exists, skipping heal (likely deleted intentionally)",
target: "rustfs:ahm:heal_bucket_with_resume" ,"Object {}/{} no longer exists, skipping heal (likely deleted intentionally)",
bucket, object
);
checkpoint_manager.add_processed_object(object.clone()).await?;
@@ -306,7 +331,9 @@ impl ErasureSetHealer {
// save checkpoint periodically
if obj_idx % 100 == 0 {
checkpoint_manager.update_position(0, *current_object_index).await?;
checkpoint_manager
.update_position(bucket_index, *current_object_index)
.await?;
}
}
@@ -337,7 +364,10 @@ impl ErasureSetHealer {
let cancel_token = self.cancel_token.clone();
async move {
let _permit = semaphore.acquire().await.unwrap();
let _permit = semaphore
.acquire()
.await
.map_err(|e| Error::other(format!("Failed to acquire semaphore for bucket heal: {e}")))?;
if cancel_token.is_cancelled() {
return Err(Error::TaskCancelled);
@@ -432,7 +462,10 @@ impl ErasureSetHealer {
let semaphore = semaphore.clone();
async move {
let _permit = semaphore.acquire().await.unwrap();
let _permit = semaphore
.acquire()
.await
.map_err(|e| Error::other(format!("Failed to acquire semaphore for object heal: {e}")))?;
match storage.heal_object(&bucket, &object, None, &heal_opts).await {
Ok((_result, None)) => {

View File

@@ -12,7 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::heal::task::{HealOptions, HealPriority, HealRequest, HealType};
use crate::heal::{HealOptions, HealPriority, HealRequest, HealType};
use crate::{Error, Result};
use rustfs_ecstore::disk::endpoint::Endpoint;
use serde::{Deserialize, Serialize};
use std::time::SystemTime;
@@ -104,7 +105,7 @@ pub enum HealEvent {
impl HealEvent {
/// Convert HealEvent to HealRequest
pub fn to_heal_request(&self) -> HealRequest {
pub fn to_heal_request(&self) -> Result<HealRequest> {
match self {
HealEvent::ObjectCorruption {
bucket,
@@ -112,7 +113,7 @@ impl HealEvent {
version_id,
severity,
..
} => HealRequest::new(
} => Ok(HealRequest::new(
HealType::Object {
bucket: bucket.clone(),
object: object.clone(),
@@ -120,13 +121,13 @@ impl HealEvent {
},
HealOptions::default(),
Self::severity_to_priority(severity),
),
)),
HealEvent::ObjectMissing {
bucket,
object,
version_id,
..
} => HealRequest::new(
} => Ok(HealRequest::new(
HealType::Object {
bucket: bucket.clone(),
object: object.clone(),
@@ -134,34 +135,38 @@ impl HealEvent {
},
HealOptions::default(),
HealPriority::High,
),
HealEvent::MetadataCorruption { bucket, object, .. } => HealRequest::new(
)),
HealEvent::MetadataCorruption { bucket, object, .. } => Ok(HealRequest::new(
HealType::Metadata {
bucket: bucket.clone(),
object: object.clone(),
},
HealOptions::default(),
HealPriority::High,
),
)),
HealEvent::DiskStatusChange { endpoint, .. } => {
// Convert disk status change to erasure set heal
// Note: This requires access to storage to get bucket list, which is not available here
// The actual bucket list will need to be provided by the caller or retrieved differently
HealRequest::new(
let set_disk_id = crate::heal::utils::format_set_disk_id_from_i32(endpoint.pool_idx, endpoint.set_idx)
.ok_or_else(|| Error::InvalidHealType {
heal_type: format!("erasure-set(pool={}, set={})", endpoint.pool_idx, endpoint.set_idx),
})?;
Ok(HealRequest::new(
HealType::ErasureSet {
buckets: vec![], // Empty bucket list - caller should populate this
set_disk_id: format!("{}_{}", endpoint.pool_idx, endpoint.set_idx),
set_disk_id,
},
HealOptions::default(),
HealPriority::High,
)
))
}
HealEvent::ECDecodeFailure {
bucket,
object,
version_id,
..
} => HealRequest::new(
} => Ok(HealRequest::new(
HealType::ECDecode {
bucket: bucket.clone(),
object: object.clone(),
@@ -169,13 +174,13 @@ impl HealEvent {
},
HealOptions::default(),
HealPriority::Urgent,
),
)),
HealEvent::ChecksumMismatch {
bucket,
object,
version_id,
..
} => HealRequest::new(
} => Ok(HealRequest::new(
HealType::Object {
bucket: bucket.clone(),
object: object.clone(),
@@ -183,17 +188,19 @@ impl HealEvent {
},
HealOptions::default(),
HealPriority::High,
),
HealEvent::BucketMetadataCorruption { bucket, .. } => {
HealRequest::new(HealType::Bucket { bucket: bucket.clone() }, HealOptions::default(), HealPriority::High)
}
HealEvent::MRFMetadataCorruption { meta_path, .. } => HealRequest::new(
)),
HealEvent::BucketMetadataCorruption { bucket, .. } => Ok(HealRequest::new(
HealType::Bucket { bucket: bucket.clone() },
HealOptions::default(),
HealPriority::High,
)),
HealEvent::MRFMetadataCorruption { meta_path, .. } => Ok(HealRequest::new(
HealType::MRF {
meta_path: meta_path.clone(),
},
HealOptions::default(),
HealPriority::High,
),
)),
}
}
@@ -357,3 +364,319 @@ impl Default for HealEventHandler {
Self::new(1000)
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::heal::task::{HealPriority, HealType};
#[test]
fn test_heal_event_object_corruption_to_request() {
let event = HealEvent::ObjectCorruption {
bucket: "test-bucket".to_string(),
object: "test-object".to_string(),
version_id: None,
corruption_type: CorruptionType::DataCorruption,
severity: Severity::High,
};
let request = event.to_heal_request().unwrap();
assert!(matches!(request.heal_type, HealType::Object { .. }));
assert_eq!(request.priority, HealPriority::High);
}
#[test]
fn test_heal_event_object_missing_to_request() {
let event = HealEvent::ObjectMissing {
bucket: "test-bucket".to_string(),
object: "test-object".to_string(),
version_id: Some("v1".to_string()),
expected_locations: vec![0, 1],
available_locations: vec![2, 3],
};
let request = event.to_heal_request().unwrap();
assert!(matches!(request.heal_type, HealType::Object { .. }));
assert_eq!(request.priority, HealPriority::High);
}
#[test]
fn test_heal_event_metadata_corruption_to_request() {
let event = HealEvent::MetadataCorruption {
bucket: "test-bucket".to_string(),
object: "test-object".to_string(),
corruption_type: CorruptionType::MetadataCorruption,
};
let request = event.to_heal_request().unwrap();
assert!(matches!(request.heal_type, HealType::Metadata { .. }));
assert_eq!(request.priority, HealPriority::High);
}
#[test]
fn test_heal_event_ec_decode_failure_to_request() {
let event = HealEvent::ECDecodeFailure {
bucket: "test-bucket".to_string(),
object: "test-object".to_string(),
version_id: None,
missing_shards: vec![0, 1],
available_shards: vec![2, 3, 4],
};
let request = event.to_heal_request().unwrap();
assert!(matches!(request.heal_type, HealType::ECDecode { .. }));
assert_eq!(request.priority, HealPriority::Urgent);
}
#[test]
fn test_heal_event_checksum_mismatch_to_request() {
let event = HealEvent::ChecksumMismatch {
bucket: "test-bucket".to_string(),
object: "test-object".to_string(),
version_id: None,
expected_checksum: "abc123".to_string(),
actual_checksum: "def456".to_string(),
};
let request = event.to_heal_request().unwrap();
assert!(matches!(request.heal_type, HealType::Object { .. }));
assert_eq!(request.priority, HealPriority::High);
}
#[test]
fn test_heal_event_bucket_metadata_corruption_to_request() {
let event = HealEvent::BucketMetadataCorruption {
bucket: "test-bucket".to_string(),
corruption_type: CorruptionType::MetadataCorruption,
};
let request = event.to_heal_request().unwrap();
assert!(matches!(request.heal_type, HealType::Bucket { .. }));
assert_eq!(request.priority, HealPriority::High);
}
#[test]
fn test_heal_event_mrf_metadata_corruption_to_request() {
let event = HealEvent::MRFMetadataCorruption {
meta_path: "test-bucket/test-object".to_string(),
corruption_type: CorruptionType::MetadataCorruption,
};
let request = event.to_heal_request().unwrap();
assert!(matches!(request.heal_type, HealType::MRF { .. }));
assert_eq!(request.priority, HealPriority::High);
}
#[test]
fn test_heal_event_severity_to_priority() {
let event_low = HealEvent::ObjectCorruption {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
corruption_type: CorruptionType::DataCorruption,
severity: Severity::Low,
};
let request = event_low.to_heal_request().unwrap();
assert_eq!(request.priority, HealPriority::Low);
let event_medium = HealEvent::ObjectCorruption {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
corruption_type: CorruptionType::DataCorruption,
severity: Severity::Medium,
};
let request = event_medium.to_heal_request().unwrap();
assert_eq!(request.priority, HealPriority::Normal);
let event_high = HealEvent::ObjectCorruption {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
corruption_type: CorruptionType::DataCorruption,
severity: Severity::High,
};
let request = event_high.to_heal_request().unwrap();
assert_eq!(request.priority, HealPriority::High);
let event_critical = HealEvent::ObjectCorruption {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
corruption_type: CorruptionType::DataCorruption,
severity: Severity::Critical,
};
let request = event_critical.to_heal_request().unwrap();
assert_eq!(request.priority, HealPriority::Urgent);
}
#[test]
fn test_heal_event_description() {
let event = HealEvent::ObjectCorruption {
bucket: "test-bucket".to_string(),
object: "test-object".to_string(),
version_id: None,
corruption_type: CorruptionType::DataCorruption,
severity: Severity::High,
};
let desc = event.description();
assert!(desc.contains("Object corruption detected"));
assert!(desc.contains("test-bucket/test-object"));
assert!(desc.contains("DataCorruption"));
}
#[test]
fn test_heal_event_severity() {
let event = HealEvent::ECDecodeFailure {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
missing_shards: vec![],
available_shards: vec![],
};
assert_eq!(event.severity(), Severity::Critical);
let event = HealEvent::ObjectMissing {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
expected_locations: vec![],
available_locations: vec![],
};
assert_eq!(event.severity(), Severity::High);
}
#[test]
fn test_heal_event_handler_new() {
let handler = HealEventHandler::new(10);
assert_eq!(handler.event_count(), 0);
assert_eq!(handler.max_events, 10);
}
#[test]
fn test_heal_event_handler_default() {
let handler = HealEventHandler::default();
assert_eq!(handler.max_events, 1000);
}
#[test]
fn test_heal_event_handler_add_event() {
let mut handler = HealEventHandler::new(3);
let event = HealEvent::ObjectCorruption {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
corruption_type: CorruptionType::DataCorruption,
severity: Severity::High,
};
handler.add_event(event.clone());
assert_eq!(handler.event_count(), 1);
handler.add_event(event.clone());
handler.add_event(event.clone());
assert_eq!(handler.event_count(), 3);
}
#[test]
fn test_heal_event_handler_max_events() {
let mut handler = HealEventHandler::new(2);
let event = HealEvent::ObjectCorruption {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
corruption_type: CorruptionType::DataCorruption,
severity: Severity::High,
};
handler.add_event(event.clone());
handler.add_event(event.clone());
handler.add_event(event.clone()); // Should remove oldest
assert_eq!(handler.event_count(), 2);
}
#[test]
fn test_heal_event_handler_get_events() {
let mut handler = HealEventHandler::new(10);
let event = HealEvent::ObjectCorruption {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
corruption_type: CorruptionType::DataCorruption,
severity: Severity::High,
};
handler.add_event(event.clone());
handler.add_event(event.clone());
let events = handler.get_events();
assert_eq!(events.len(), 2);
}
#[test]
fn test_heal_event_handler_clear_events() {
let mut handler = HealEventHandler::new(10);
let event = HealEvent::ObjectCorruption {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
corruption_type: CorruptionType::DataCorruption,
severity: Severity::High,
};
handler.add_event(event);
assert_eq!(handler.event_count(), 1);
handler.clear_events();
assert_eq!(handler.event_count(), 0);
}
#[test]
fn test_heal_event_handler_filter_by_severity() {
let mut handler = HealEventHandler::new(10);
handler.add_event(HealEvent::ObjectCorruption {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
corruption_type: CorruptionType::DataCorruption,
severity: Severity::Low,
});
handler.add_event(HealEvent::ECDecodeFailure {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
missing_shards: vec![],
available_shards: vec![],
});
let high_severity = handler.filter_by_severity(Severity::High);
assert_eq!(high_severity.len(), 1); // Only ECDecodeFailure is Critical >= High
}
#[test]
fn test_heal_event_handler_filter_by_type() {
let mut handler = HealEventHandler::new(10);
handler.add_event(HealEvent::ObjectCorruption {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
corruption_type: CorruptionType::DataCorruption,
severity: Severity::High,
});
handler.add_event(HealEvent::ObjectMissing {
bucket: "test".to_string(),
object: "test".to_string(),
version_id: None,
expected_locations: vec![],
available_locations: vec![],
});
let corruption_events = handler.filter_by_type("ObjectCorruption");
assert_eq!(corruption_events.len(), 1);
let missing_events = handler.filter_by_type("ObjectMissing");
assert_eq!(missing_events.len(), 1);
}
}

View File

@@ -12,17 +12,17 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::error::{Error, Result};
use crate::heal::{
progress::{HealProgress, HealStatistics},
storage::HealStorageAPI,
task::{HealOptions, HealPriority, HealRequest, HealTask, HealTaskStatus, HealType},
};
use crate::{Error, Result};
use rustfs_ecstore::disk::DiskAPI;
use rustfs_ecstore::disk::error::DiskError;
use rustfs_ecstore::global::GLOBAL_LOCAL_DISK_MAP;
use std::{
collections::{HashMap, VecDeque},
collections::{BinaryHeap, HashMap, HashSet},
sync::Arc,
time::{Duration, SystemTime},
};
@@ -33,6 +33,151 @@ use tokio::{
use tokio_util::sync::CancellationToken;
use tracing::{error, info, warn};
/// Priority queue wrapper for heal requests
/// Uses BinaryHeap for priority-based ordering while maintaining FIFO for same-priority items
#[derive(Debug)]
struct PriorityHealQueue {
/// Heap of (priority, sequence, request) tuples
heap: BinaryHeap<PriorityQueueItem>,
/// Sequence counter for FIFO ordering within same priority
sequence: u64,
/// Set of request keys to prevent duplicates
dedup_keys: HashSet<String>,
}
/// Wrapper for heap items to implement proper ordering
#[derive(Debug)]
struct PriorityQueueItem {
priority: HealPriority,
sequence: u64,
request: HealRequest,
}
impl Eq for PriorityQueueItem {}
impl PartialEq for PriorityQueueItem {
fn eq(&self, other: &Self) -> bool {
self.priority == other.priority && self.sequence == other.sequence
}
}
impl Ord for PriorityQueueItem {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
// First compare by priority (higher priority first)
match self.priority.cmp(&other.priority) {
std::cmp::Ordering::Equal => {
// If priorities are equal, use sequence for FIFO (lower sequence first)
other.sequence.cmp(&self.sequence)
}
ordering => ordering,
}
}
}
impl PartialOrd for PriorityQueueItem {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
impl PriorityHealQueue {
fn new() -> Self {
Self {
heap: BinaryHeap::new(),
sequence: 0,
dedup_keys: HashSet::new(),
}
}
fn len(&self) -> usize {
self.heap.len()
}
fn is_empty(&self) -> bool {
self.heap.is_empty()
}
fn push(&mut self, request: HealRequest) -> bool {
let key = Self::make_dedup_key(&request);
// Check for duplicates
if self.dedup_keys.contains(&key) {
return false; // Duplicate request, don't add
}
self.dedup_keys.insert(key);
self.sequence += 1;
self.heap.push(PriorityQueueItem {
priority: request.priority,
sequence: self.sequence,
request,
});
true
}
/// Get statistics about queue contents by priority
fn get_priority_stats(&self) -> HashMap<HealPriority, usize> {
let mut stats = HashMap::new();
for item in &self.heap {
*stats.entry(item.priority).or_insert(0) += 1;
}
stats
}
fn pop(&mut self) -> Option<HealRequest> {
self.heap.pop().map(|item| {
let key = Self::make_dedup_key(&item.request);
self.dedup_keys.remove(&key);
item.request
})
}
/// Create a deduplication key from a heal request
fn make_dedup_key(request: &HealRequest) -> String {
match &request.heal_type {
HealType::Object {
bucket,
object,
version_id,
} => {
format!("object:{}:{}:{}", bucket, object, version_id.as_deref().unwrap_or(""))
}
HealType::Bucket { bucket } => {
format!("bucket:{}", bucket)
}
HealType::ErasureSet { set_disk_id, .. } => {
format!("erasure_set:{}", set_disk_id)
}
HealType::Metadata { bucket, object } => {
format!("metadata:{}:{}", bucket, object)
}
HealType::MRF { meta_path } => {
format!("mrf:{}", meta_path)
}
HealType::ECDecode {
bucket,
object,
version_id,
} => {
format!("ecdecode:{}:{}:{}", bucket, object, version_id.as_deref().unwrap_or(""))
}
}
}
/// Check if a request with the same key already exists in the queue
#[allow(dead_code)]
fn contains_key(&self, request: &HealRequest) -> bool {
let key = Self::make_dedup_key(request);
self.dedup_keys.contains(&key)
}
/// Check if an erasure set heal request for a specific set_disk_id exists
fn contains_erasure_set(&self, set_disk_id: &str) -> bool {
let key = format!("erasure_set:{}", set_disk_id);
self.dedup_keys.contains(&key)
}
}
/// Heal config
#[derive(Debug, Clone)]
pub struct HealConfig {
@@ -85,8 +230,8 @@ pub struct HealManager {
state: Arc<RwLock<HealState>>,
/// Active heal tasks
active_heals: Arc<Mutex<HashMap<String, Arc<HealTask>>>>,
/// Heal queue
heal_queue: Arc<Mutex<VecDeque<HealRequest>>>,
/// Heal queue (priority-based)
heal_queue: Arc<Mutex<PriorityHealQueue>>,
/// Storage layer interface
storage: Arc<dyn HealStorageAPI>,
/// Cancel token
@@ -103,7 +248,7 @@ impl HealManager {
config: Arc::new(RwLock::new(config)),
state: Arc::new(RwLock::new(HealState::default())),
active_heals: Arc::new(Mutex::new(HashMap::new())),
heal_queue: Arc::new(Mutex::new(VecDeque::new())),
heal_queue: Arc::new(Mutex::new(PriorityHealQueue::new())),
storage,
cancel_token: CancellationToken::new(),
statistics: Arc::new(RwLock::new(HealStatistics::new())),
@@ -161,17 +306,54 @@ impl HealManager {
let config = self.config.read().await;
let mut queue = self.heal_queue.lock().await;
if queue.len() >= config.queue_size {
let queue_len = queue.len();
let queue_capacity = config.queue_size;
if queue_len >= queue_capacity {
return Err(Error::ConfigurationError {
message: "Heal queue is full".to_string(),
message: format!("Heal queue is full ({}/{})", queue_len, queue_capacity),
});
}
// Warn when queue is getting full (>80% capacity)
let capacity_threshold = (queue_capacity as f64 * 0.8) as usize;
if queue_len >= capacity_threshold {
warn!(
"Heal queue is {}% full ({}/{}). Consider increasing queue size or processing capacity.",
(queue_len * 100) / queue_capacity,
queue_len,
queue_capacity
);
}
let request_id = request.id.clone();
queue.push_back(request);
let priority = request.priority;
// Try to push the request; if it's a duplicate, still return the request_id
let is_new = queue.push(request);
// Log queue statistics periodically (when adding high/urgent priority items)
if matches!(priority, HealPriority::High | HealPriority::Urgent) {
let stats = queue.get_priority_stats();
info!(
"Heal queue stats after adding {:?} priority request: total={}, urgent={}, high={}, normal={}, low={}",
priority,
queue_len + 1,
stats.get(&HealPriority::Urgent).unwrap_or(&0),
stats.get(&HealPriority::High).unwrap_or(&0),
stats.get(&HealPriority::Normal).unwrap_or(&0),
stats.get(&HealPriority::Low).unwrap_or(&0)
);
}
drop(queue);
info!("Submitted heal request: {}", request_id);
if is_new {
info!("Submitted heal request: {} with priority: {:?}", request_id, priority);
} else {
info!("Heal request already queued (duplicate): {}", request_id);
}
Ok(request_id)
}
@@ -310,17 +492,30 @@ impl HealManager {
// Create erasure set heal requests for each endpoint
for ep in endpoints {
let Some(set_disk_id) =
crate::heal::utils::format_set_disk_id_from_i32(ep.pool_idx, ep.set_idx)
else {
warn!("Skipping endpoint {} without valid pool/set index", ep);
continue;
};
// skip if already queued or healing
// Use consistent lock order: queue first, then active_heals to avoid deadlock
let mut skip = false;
{
let queue = heal_queue.lock().await;
if queue.iter().any(|req| matches!(&req.heal_type, crate::heal::task::HealType::ErasureSet { set_disk_id, .. } if set_disk_id == &format!("{}_{}", ep.pool_idx, ep.set_idx))) {
if queue.contains_erasure_set(&set_disk_id) {
skip = true;
}
}
if !skip {
let active = active_heals.lock().await;
if active.values().any(|task| matches!(&task.heal_type, crate::heal::task::HealType::ErasureSet { set_disk_id, .. } if set_disk_id == &format!("{}_{}", ep.pool_idx, ep.set_idx))) {
if active.values().any(|task| {
matches!(
&task.heal_type,
crate::heal::task::HealType::ErasureSet { set_disk_id: active_id, .. }
if active_id == &set_disk_id
)
}) {
skip = true;
}
}
@@ -330,17 +525,16 @@ impl HealManager {
}
// enqueue erasure set heal request for this disk
let set_disk_id = format!("pool_{}_set_{}", ep.pool_idx, ep.set_idx);
let req = HealRequest::new(
HealType::ErasureSet {
buckets: buckets.clone(),
set_disk_id: set_disk_id.clone()
set_disk_id: set_disk_id.clone(),
},
HealOptions::default(),
HealPriority::Normal,
);
let mut queue = heal_queue.lock().await;
queue.push_back(req);
queue.push(req);
info!("Enqueued auto erasure set heal for endpoint: {} (set_disk_id: {})", ep, set_disk_id);
}
}
@@ -351,8 +545,9 @@ impl HealManager {
}
/// Process heal queue
/// Processes multiple tasks per cycle when capacity allows and queue has high-priority items
async fn process_heal_queue(
heal_queue: &Arc<Mutex<VecDeque<HealRequest>>>,
heal_queue: &Arc<Mutex<PriorityHealQueue>>,
active_heals: &Arc<Mutex<HashMap<String, Arc<HealTask>>>>,
config: &Arc<RwLock<HealConfig>>,
statistics: &Arc<RwLock<HealStatistics>>,
@@ -361,51 +556,83 @@ impl HealManager {
let config = config.read().await;
let mut active_heals_guard = active_heals.lock().await;
// check if new heal tasks can be started
if active_heals_guard.len() >= config.max_concurrent_heals {
// Check if new heal tasks can be started
let active_count = active_heals_guard.len();
if active_count >= config.max_concurrent_heals {
return;
}
// Calculate how many tasks we can start this cycle
let available_slots = config.max_concurrent_heals - active_count;
let mut queue = heal_queue.lock().await;
if let Some(request) = queue.pop_front() {
let task = Arc::new(HealTask::from_request(request, storage.clone()));
let task_id = task.id.clone();
active_heals_guard.insert(task_id.clone(), task.clone());
drop(active_heals_guard);
let active_heals_clone = active_heals.clone();
let statistics_clone = statistics.clone();
let queue_len = queue.len();
// start heal task
tokio::spawn(async move {
info!("Starting heal task: {}", task_id);
let result = task.execute().await;
match result {
Ok(_) => {
info!("Heal task completed successfully: {}", task_id);
}
Err(e) => {
error!("Heal task failed: {} - {}", task_id, e);
}
}
let mut active_heals_guard = active_heals_clone.lock().await;
if let Some(completed_task) = active_heals_guard.remove(&task_id) {
// update statistics
let mut stats = statistics_clone.write().await;
match completed_task.get_status().await {
HealTaskStatus::Completed => {
stats.update_task_completion(true);
if queue_len == 0 {
return;
}
// Process multiple tasks if:
// 1. We have available slots
// 2. Queue is not empty
// Prioritize urgent/high priority tasks by processing up to 2 tasks per cycle if available
let tasks_to_process = if queue_len > 0 {
std::cmp::min(available_slots, std::cmp::min(2, queue_len))
} else {
0
};
for _ in 0..tasks_to_process {
if let Some(request) = queue.pop() {
let task_priority = request.priority;
let task = Arc::new(HealTask::from_request(request, storage.clone()));
let task_id = task.id.clone();
active_heals_guard.insert(task_id.clone(), task.clone());
let active_heals_clone = active_heals.clone();
let statistics_clone = statistics.clone();
// start heal task
tokio::spawn(async move {
info!("Starting heal task: {} with priority: {:?}", task_id, task_priority);
let result = task.execute().await;
match result {
Ok(_) => {
info!("Heal task completed successfully: {}", task_id);
}
_ => {
stats.update_task_completion(false);
Err(e) => {
error!("Heal task failed: {} - {}", task_id, e);
}
}
stats.update_running_tasks(active_heals_guard.len() as u64);
}
});
let mut active_heals_guard = active_heals_clone.lock().await;
if let Some(completed_task) = active_heals_guard.remove(&task_id) {
// update statistics
let mut stats = statistics_clone.write().await;
match completed_task.get_status().await {
HealTaskStatus::Completed => {
stats.update_task_completion(true);
}
_ => {
stats.update_task_completion(false);
}
}
stats.update_running_tasks(active_heals_guard.len() as u64);
}
});
} else {
break;
}
}
// update statistics
let mut stats = statistics.write().await;
stats.total_tasks += 1;
// Update statistics for all started tasks
let mut stats = statistics.write().await;
stats.total_tasks += tasks_to_process as u64;
// Log queue status if items remain
if !queue.is_empty() {
let remaining = queue.len();
if remaining > 10 {
info!("Heal queue has {} pending requests, {} tasks active", remaining, active_heals_guard.len());
}
}
}
}
@@ -420,3 +647,333 @@ impl std::fmt::Debug for HealManager {
.finish()
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::heal::task::{HealOptions, HealPriority, HealRequest, HealType};
#[test]
fn test_priority_queue_ordering() {
let mut queue = PriorityHealQueue::new();
// Add requests with different priorities
let low_req = HealRequest::new(
HealType::Bucket {
bucket: "bucket1".to_string(),
},
HealOptions::default(),
HealPriority::Low,
);
let normal_req = HealRequest::new(
HealType::Bucket {
bucket: "bucket2".to_string(),
},
HealOptions::default(),
HealPriority::Normal,
);
let high_req = HealRequest::new(
HealType::Bucket {
bucket: "bucket3".to_string(),
},
HealOptions::default(),
HealPriority::High,
);
let urgent_req = HealRequest::new(
HealType::Bucket {
bucket: "bucket4".to_string(),
},
HealOptions::default(),
HealPriority::Urgent,
);
// Add in random order: low, high, normal, urgent
assert!(queue.push(low_req));
assert!(queue.push(high_req));
assert!(queue.push(normal_req));
assert!(queue.push(urgent_req));
assert_eq!(queue.len(), 4);
// Should pop in priority order: urgent, high, normal, low
let popped1 = queue.pop().unwrap();
assert_eq!(popped1.priority, HealPriority::Urgent);
let popped2 = queue.pop().unwrap();
assert_eq!(popped2.priority, HealPriority::High);
let popped3 = queue.pop().unwrap();
assert_eq!(popped3.priority, HealPriority::Normal);
let popped4 = queue.pop().unwrap();
assert_eq!(popped4.priority, HealPriority::Low);
assert_eq!(queue.len(), 0);
}
#[test]
fn test_priority_queue_fifo_same_priority() {
let mut queue = PriorityHealQueue::new();
// Add multiple requests with same priority
let req1 = HealRequest::new(
HealType::Bucket {
bucket: "bucket1".to_string(),
},
HealOptions::default(),
HealPriority::Normal,
);
let req2 = HealRequest::new(
HealType::Bucket {
bucket: "bucket2".to_string(),
},
HealOptions::default(),
HealPriority::Normal,
);
let req3 = HealRequest::new(
HealType::Bucket {
bucket: "bucket3".to_string(),
},
HealOptions::default(),
HealPriority::Normal,
);
let id1 = req1.id.clone();
let id2 = req2.id.clone();
let id3 = req3.id.clone();
assert!(queue.push(req1));
assert!(queue.push(req2));
assert!(queue.push(req3));
// Should maintain FIFO order for same priority
let popped1 = queue.pop().unwrap();
assert_eq!(popped1.id, id1);
let popped2 = queue.pop().unwrap();
assert_eq!(popped2.id, id2);
let popped3 = queue.pop().unwrap();
assert_eq!(popped3.id, id3);
}
#[test]
fn test_priority_queue_deduplication() {
let mut queue = PriorityHealQueue::new();
let req1 = HealRequest::new(
HealType::Object {
bucket: "bucket1".to_string(),
object: "object1".to_string(),
version_id: None,
},
HealOptions::default(),
HealPriority::Normal,
);
let req2 = HealRequest::new(
HealType::Object {
bucket: "bucket1".to_string(),
object: "object1".to_string(),
version_id: None,
},
HealOptions::default(),
HealPriority::High,
);
// First request should be added
assert!(queue.push(req1));
assert_eq!(queue.len(), 1);
// Second request with same object should be rejected (duplicate)
assert!(!queue.push(req2));
assert_eq!(queue.len(), 1);
}
#[test]
fn test_priority_queue_contains_erasure_set() {
let mut queue = PriorityHealQueue::new();
let req = HealRequest::new(
HealType::ErasureSet {
buckets: vec!["bucket1".to_string()],
set_disk_id: "pool_0_set_1".to_string(),
},
HealOptions::default(),
HealPriority::Normal,
);
assert!(queue.push(req));
assert!(queue.contains_erasure_set("pool_0_set_1"));
assert!(!queue.contains_erasure_set("pool_0_set_2"));
}
#[test]
fn test_priority_queue_dedup_key_generation() {
// Test different heal types generate different keys
let obj_req = HealRequest::new(
HealType::Object {
bucket: "bucket1".to_string(),
object: "object1".to_string(),
version_id: None,
},
HealOptions::default(),
HealPriority::Normal,
);
let bucket_req = HealRequest::new(
HealType::Bucket {
bucket: "bucket1".to_string(),
},
HealOptions::default(),
HealPriority::Normal,
);
let erasure_req = HealRequest::new(
HealType::ErasureSet {
buckets: vec!["bucket1".to_string()],
set_disk_id: "pool_0_set_1".to_string(),
},
HealOptions::default(),
HealPriority::Normal,
);
let obj_key = PriorityHealQueue::make_dedup_key(&obj_req);
let bucket_key = PriorityHealQueue::make_dedup_key(&bucket_req);
let erasure_key = PriorityHealQueue::make_dedup_key(&erasure_req);
// All keys should be different
assert_ne!(obj_key, bucket_key);
assert_ne!(obj_key, erasure_key);
assert_ne!(bucket_key, erasure_key);
assert!(obj_key.starts_with("object:"));
assert!(bucket_key.starts_with("bucket:"));
assert!(erasure_key.starts_with("erasure_set:"));
}
#[test]
fn test_priority_queue_mixed_priorities_and_types() {
let mut queue = PriorityHealQueue::new();
// Add various requests
let requests = vec![
(
HealType::Object {
bucket: "b1".to_string(),
object: "o1".to_string(),
version_id: None,
},
HealPriority::Low,
),
(
HealType::Bucket {
bucket: "b2".to_string(),
},
HealPriority::Urgent,
),
(
HealType::ErasureSet {
buckets: vec!["b3".to_string()],
set_disk_id: "pool_0_set_1".to_string(),
},
HealPriority::Normal,
),
(
HealType::Object {
bucket: "b4".to_string(),
object: "o4".to_string(),
version_id: None,
},
HealPriority::High,
),
];
for (heal_type, priority) in requests {
let req = HealRequest::new(heal_type, HealOptions::default(), priority);
queue.push(req);
}
assert_eq!(queue.len(), 4);
// Check they come out in priority order
let priorities: Vec<HealPriority> = (0..4).filter_map(|_| queue.pop().map(|r| r.priority)).collect();
assert_eq!(
priorities,
vec![
HealPriority::Urgent,
HealPriority::High,
HealPriority::Normal,
HealPriority::Low,
]
);
}
#[test]
fn test_priority_queue_stats() {
let mut queue = PriorityHealQueue::new();
// Add requests with different priorities
for _ in 0..3 {
queue.push(HealRequest::new(
HealType::Bucket {
bucket: format!("bucket-low-{}", queue.len()),
},
HealOptions::default(),
HealPriority::Low,
));
}
for _ in 0..2 {
queue.push(HealRequest::new(
HealType::Bucket {
bucket: format!("bucket-normal-{}", queue.len()),
},
HealOptions::default(),
HealPriority::Normal,
));
}
queue.push(HealRequest::new(
HealType::Bucket {
bucket: "bucket-high".to_string(),
},
HealOptions::default(),
HealPriority::High,
));
let stats = queue.get_priority_stats();
assert_eq!(*stats.get(&HealPriority::Low).unwrap_or(&0), 3);
assert_eq!(*stats.get(&HealPriority::Normal).unwrap_or(&0), 2);
assert_eq!(*stats.get(&HealPriority::High).unwrap_or(&0), 1);
assert_eq!(*stats.get(&HealPriority::Urgent).unwrap_or(&0), 0);
}
#[test]
fn test_priority_queue_is_empty() {
let mut queue = PriorityHealQueue::new();
assert!(queue.is_empty());
queue.push(HealRequest::new(
HealType::Bucket {
bucket: "test".to_string(),
},
HealOptions::default(),
HealPriority::Normal,
));
assert!(!queue.is_empty());
queue.pop();
assert!(queue.is_empty());
}
}

View File

@@ -20,6 +20,7 @@ pub mod progress;
pub mod resume;
pub mod storage;
pub mod task;
pub mod utils;
pub use erasure_healer::ErasureSetHealer;
pub use manager::HealManager;

View File

@@ -146,3 +146,244 @@ impl HealStatistics {
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_heal_progress_new() {
let progress = HealProgress::new();
assert_eq!(progress.objects_scanned, 0);
assert_eq!(progress.objects_healed, 0);
assert_eq!(progress.objects_failed, 0);
assert_eq!(progress.bytes_processed, 0);
assert_eq!(progress.progress_percentage, 0.0);
assert!(progress.start_time.is_some());
assert!(progress.last_update_time.is_some());
assert!(progress.current_object.is_none());
}
#[test]
fn test_heal_progress_update_progress() {
let mut progress = HealProgress::new();
progress.update_progress(10, 8, 2, 1024);
assert_eq!(progress.objects_scanned, 10);
assert_eq!(progress.objects_healed, 8);
assert_eq!(progress.objects_failed, 2);
assert_eq!(progress.bytes_processed, 1024);
// Progress percentage should be calculated based on healed/total
// total = scanned + healed + failed = 10 + 8 + 2 = 20
// healed/total = 8/20 = 0.4 = 40%
assert!((progress.progress_percentage - 40.0).abs() < 0.001);
assert!(progress.last_update_time.is_some());
}
#[test]
fn test_heal_progress_update_progress_zero_total() {
let mut progress = HealProgress::new();
progress.update_progress(0, 0, 0, 0);
assert_eq!(progress.progress_percentage, 0.0);
}
#[test]
fn test_heal_progress_update_progress_all_healed() {
let mut progress = HealProgress::new();
// When scanned=0, healed=10, failed=0: total=10, progress = 10/10 = 100%
progress.update_progress(0, 10, 0, 2048);
// All healed, should be 100%
assert!((progress.progress_percentage - 100.0).abs() < 0.001);
}
#[test]
fn test_heal_progress_set_current_object() {
let mut progress = HealProgress::new();
let initial_time = progress.last_update_time;
// Small delay to ensure time difference
std::thread::sleep(std::time::Duration::from_millis(10));
progress.set_current_object(Some("test-bucket/test-object".to_string()));
assert_eq!(progress.current_object, Some("test-bucket/test-object".to_string()));
assert!(progress.last_update_time.is_some());
// last_update_time should be updated
assert_ne!(progress.last_update_time, initial_time);
}
#[test]
fn test_heal_progress_set_current_object_none() {
let mut progress = HealProgress::new();
progress.set_current_object(Some("test".to_string()));
progress.set_current_object(None);
assert!(progress.current_object.is_none());
}
#[test]
fn test_heal_progress_is_completed_by_percentage() {
let mut progress = HealProgress::new();
progress.update_progress(10, 10, 0, 1024);
assert!(progress.is_completed());
}
#[test]
fn test_heal_progress_is_completed_by_processed() {
let mut progress = HealProgress::new();
progress.objects_scanned = 10;
progress.objects_healed = 8;
progress.objects_failed = 2;
// healed + failed = 8 + 2 = 10 >= scanned = 10
assert!(progress.is_completed());
}
#[test]
fn test_heal_progress_is_not_completed() {
let mut progress = HealProgress::new();
progress.objects_scanned = 10;
progress.objects_healed = 5;
progress.objects_failed = 2;
// healed + failed = 5 + 2 = 7 < scanned = 10
assert!(!progress.is_completed());
}
#[test]
fn test_heal_progress_get_success_rate() {
let mut progress = HealProgress::new();
progress.objects_healed = 8;
progress.objects_failed = 2;
// success_rate = 8 / (8 + 2) * 100 = 80%
assert!((progress.get_success_rate() - 80.0).abs() < 0.001);
}
#[test]
fn test_heal_progress_get_success_rate_zero_total() {
let progress = HealProgress::new();
// No healed or failed objects
assert_eq!(progress.get_success_rate(), 0.0);
}
#[test]
fn test_heal_progress_get_success_rate_all_success() {
let mut progress = HealProgress::new();
progress.objects_healed = 10;
progress.objects_failed = 0;
assert!((progress.get_success_rate() - 100.0).abs() < 0.001);
}
#[test]
fn test_heal_statistics_new() {
let stats = HealStatistics::new();
assert_eq!(stats.total_tasks, 0);
assert_eq!(stats.successful_tasks, 0);
assert_eq!(stats.failed_tasks, 0);
assert_eq!(stats.running_tasks, 0);
assert_eq!(stats.total_objects_healed, 0);
assert_eq!(stats.total_bytes_healed, 0);
}
#[test]
fn test_heal_statistics_default() {
let stats = HealStatistics::default();
assert_eq!(stats.total_tasks, 0);
assert_eq!(stats.successful_tasks, 0);
assert_eq!(stats.failed_tasks, 0);
}
#[test]
fn test_heal_statistics_update_task_completion_success() {
let mut stats = HealStatistics::new();
let initial_time = stats.last_update_time;
std::thread::sleep(std::time::Duration::from_millis(10));
stats.update_task_completion(true);
assert_eq!(stats.successful_tasks, 1);
assert_eq!(stats.failed_tasks, 0);
assert!(stats.last_update_time > initial_time);
}
#[test]
fn test_heal_statistics_update_task_completion_failure() {
let mut stats = HealStatistics::new();
stats.update_task_completion(false);
assert_eq!(stats.successful_tasks, 0);
assert_eq!(stats.failed_tasks, 1);
}
#[test]
fn test_heal_statistics_update_running_tasks() {
let mut stats = HealStatistics::new();
let initial_time = stats.last_update_time;
std::thread::sleep(std::time::Duration::from_millis(10));
stats.update_running_tasks(5);
assert_eq!(stats.running_tasks, 5);
assert!(stats.last_update_time > initial_time);
}
#[test]
fn test_heal_statistics_add_healed_objects() {
let mut stats = HealStatistics::new();
let initial_time = stats.last_update_time;
std::thread::sleep(std::time::Duration::from_millis(10));
stats.add_healed_objects(10, 10240);
assert_eq!(stats.total_objects_healed, 10);
assert_eq!(stats.total_bytes_healed, 10240);
assert!(stats.last_update_time > initial_time);
}
#[test]
fn test_heal_statistics_add_healed_objects_accumulative() {
let mut stats = HealStatistics::new();
stats.add_healed_objects(5, 5120);
stats.add_healed_objects(3, 3072);
assert_eq!(stats.total_objects_healed, 8);
assert_eq!(stats.total_bytes_healed, 8192);
}
#[test]
fn test_heal_statistics_get_success_rate() {
let mut stats = HealStatistics::new();
stats.successful_tasks = 8;
stats.failed_tasks = 2;
// success_rate = 8 / (8 + 2) * 100 = 80%
assert!((stats.get_success_rate() - 80.0).abs() < 0.001);
}
#[test]
fn test_heal_statistics_get_success_rate_zero_total() {
let stats = HealStatistics::new();
assert_eq!(stats.get_success_rate(), 0.0);
}
#[test]
fn test_heal_statistics_get_success_rate_all_success() {
let mut stats = HealStatistics::new();
stats.successful_tasks = 10;
stats.failed_tasks = 0;
assert!((stats.get_success_rate() - 100.0).abs() < 0.001);
}
#[test]
fn test_heal_statistics_get_success_rate_all_failure() {
let mut stats = HealStatistics::new();
stats.successful_tasks = 0;
stats.failed_tasks = 5;
assert_eq!(stats.get_success_rate(), 0.0);
}
}

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::error::{Error, Result};
use crate::{Error, Result};
use rustfs_ecstore::disk::{BUCKET_META_PREFIX, DiskAPI, DiskStore, RUSTFS_META_BUCKET};
use serde::{Deserialize, Serialize};
use std::path::Path;
@@ -27,6 +27,12 @@ const RESUME_STATE_FILE: &str = "ahm_resume_state.json";
const RESUME_PROGRESS_FILE: &str = "ahm_progress.json";
const RESUME_CHECKPOINT_FILE: &str = "ahm_checkpoint.json";
/// Helper function to convert Path to &str, returning an error if conversion fails
fn path_to_str(path: &Path) -> Result<&str> {
path.to_str()
.ok_or_else(|| Error::other(format!("Invalid UTF-8 path: {path:?}")))
}
/// resume state
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ResumeState {
@@ -34,6 +40,9 @@ pub struct ResumeState {
pub task_id: String,
/// task type
pub task_type: String,
/// set disk identifier (for erasure set tasks)
#[serde(default)]
pub set_disk_id: String,
/// start time
pub start_time: u64,
/// last update time
@@ -67,12 +76,13 @@ pub struct ResumeState {
}
impl ResumeState {
pub fn new(task_id: String, task_type: String, buckets: Vec<String>) -> Self {
pub fn new(task_id: String, task_type: String, set_disk_id: String, buckets: Vec<String>) -> Self {
Self {
task_id,
task_type,
start_time: SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs(),
last_update: SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs(),
set_disk_id,
start_time: SystemTime::now().duration_since(UNIX_EPOCH).unwrap_or_default().as_secs(),
last_update: SystemTime::now().duration_since(UNIX_EPOCH).unwrap_or_default().as_secs(),
completed: false,
total_objects: 0,
processed_objects: 0,
@@ -94,13 +104,13 @@ impl ResumeState {
self.successful_objects = successful;
self.failed_objects = failed;
self.skipped_objects = skipped;
self.last_update = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
self.last_update = SystemTime::now().duration_since(UNIX_EPOCH).unwrap_or_default().as_secs();
}
pub fn set_current_item(&mut self, bucket: Option<String>, object: Option<String>) {
self.current_bucket = bucket;
self.current_object = object;
self.last_update = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
self.last_update = SystemTime::now().duration_since(UNIX_EPOCH).unwrap_or_default().as_secs();
}
pub fn complete_bucket(&mut self, bucket: &str) {
@@ -110,22 +120,22 @@ impl ResumeState {
if let Some(pos) = self.pending_buckets.iter().position(|b| b == bucket) {
self.pending_buckets.remove(pos);
}
self.last_update = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
self.last_update = SystemTime::now().duration_since(UNIX_EPOCH).unwrap_or_default().as_secs();
}
pub fn mark_completed(&mut self) {
self.completed = true;
self.last_update = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
self.last_update = SystemTime::now().duration_since(UNIX_EPOCH).unwrap_or_default().as_secs();
}
pub fn set_error(&mut self, error: String) {
self.error_message = Some(error);
self.last_update = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
self.last_update = SystemTime::now().duration_since(UNIX_EPOCH).unwrap_or_default().as_secs();
}
pub fn increment_retry(&mut self) {
self.retry_count += 1;
self.last_update = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
self.last_update = SystemTime::now().duration_since(UNIX_EPOCH).unwrap_or_default().as_secs();
}
pub fn can_retry(&self) -> bool {
@@ -156,8 +166,14 @@ pub struct ResumeManager {
impl ResumeManager {
/// create new resume manager
pub async fn new(disk: DiskStore, task_id: String, task_type: String, buckets: Vec<String>) -> Result<Self> {
let state = ResumeState::new(task_id, task_type, buckets);
pub async fn new(
disk: DiskStore,
task_id: String,
task_type: String,
set_disk_id: String,
buckets: Vec<String>,
) -> Result<Self> {
let state = ResumeState::new(task_id, task_type, set_disk_id, buckets);
let manager = Self {
disk,
state: Arc::new(RwLock::new(state)),
@@ -184,8 +200,11 @@ impl ResumeManager {
/// check if resume state exists
pub async fn has_resume_state(disk: &DiskStore, task_id: &str) -> bool {
let file_path = Path::new(BUCKET_META_PREFIX).join(format!("{task_id}_{RESUME_STATE_FILE}"));
match disk.read_all(RUSTFS_META_BUCKET, file_path.to_str().unwrap()).await {
Ok(data) => !data.is_empty(),
match path_to_str(&file_path) {
Ok(path_str) => match disk.read_all(RUSTFS_META_BUCKET, path_str).await {
Ok(data) => !data.is_empty(),
Err(_) => false,
},
Err(_) => false,
}
}
@@ -254,18 +273,15 @@ impl ResumeManager {
let checkpoint_file = Path::new(BUCKET_META_PREFIX).join(format!("{task_id}_{RESUME_CHECKPOINT_FILE}"));
// ignore delete errors, files may not exist
let _ = self
.disk
.delete(RUSTFS_META_BUCKET, state_file.to_str().unwrap(), Default::default())
.await;
let _ = self
.disk
.delete(RUSTFS_META_BUCKET, progress_file.to_str().unwrap(), Default::default())
.await;
let _ = self
.disk
.delete(RUSTFS_META_BUCKET, checkpoint_file.to_str().unwrap(), Default::default())
.await;
if let Ok(path_str) = path_to_str(&state_file) {
let _ = self.disk.delete(RUSTFS_META_BUCKET, path_str, Default::default()).await;
}
if let Ok(path_str) = path_to_str(&progress_file) {
let _ = self.disk.delete(RUSTFS_META_BUCKET, path_str, Default::default()).await;
}
if let Ok(path_str) = path_to_str(&checkpoint_file) {
let _ = self.disk.delete(RUSTFS_META_BUCKET, path_str, Default::default()).await;
}
info!("Cleaned up resume state for task: {}", task_id);
Ok(())
@@ -280,8 +296,9 @@ impl ResumeManager {
let file_path = Path::new(BUCKET_META_PREFIX).join(format!("{}_{}", state.task_id, RESUME_STATE_FILE));
let path_str = path_to_str(&file_path)?;
self.disk
.write_all(RUSTFS_META_BUCKET, file_path.to_str().unwrap(), state_data.into())
.write_all(RUSTFS_META_BUCKET, path_str, state_data.into())
.await
.map_err(|e| Error::TaskExecutionFailed {
message: format!("Failed to save resume state: {e}"),
@@ -295,7 +312,8 @@ impl ResumeManager {
async fn read_state_file(disk: &DiskStore, task_id: &str) -> Result<Vec<u8>> {
let file_path = Path::new(BUCKET_META_PREFIX).join(format!("{task_id}_{RESUME_STATE_FILE}"));
disk.read_all(RUSTFS_META_BUCKET, file_path.to_str().unwrap())
let path_str = path_to_str(&file_path)?;
disk.read_all(RUSTFS_META_BUCKET, path_str)
.await
.map(|bytes| bytes.to_vec())
.map_err(|e| Error::TaskExecutionFailed {
@@ -327,7 +345,7 @@ impl ResumeCheckpoint {
pub fn new(task_id: String) -> Self {
Self {
task_id,
checkpoint_time: SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs(),
checkpoint_time: SystemTime::now().duration_since(UNIX_EPOCH).unwrap_or_default().as_secs(),
current_bucket_index: 0,
current_object_index: 0,
processed_objects: Vec::new(),
@@ -339,7 +357,7 @@ impl ResumeCheckpoint {
pub fn update_position(&mut self, bucket_index: usize, object_index: usize) {
self.current_bucket_index = bucket_index;
self.current_object_index = object_index;
self.checkpoint_time = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
self.checkpoint_time = SystemTime::now().duration_since(UNIX_EPOCH).unwrap_or_default().as_secs();
}
pub fn add_processed_object(&mut self, object: String) {
@@ -397,8 +415,11 @@ impl CheckpointManager {
/// check if checkpoint exists
pub async fn has_checkpoint(disk: &DiskStore, task_id: &str) -> bool {
let file_path = Path::new(BUCKET_META_PREFIX).join(format!("{task_id}_{RESUME_CHECKPOINT_FILE}"));
match disk.read_all(RUSTFS_META_BUCKET, file_path.to_str().unwrap()).await {
Ok(data) => !data.is_empty(),
match path_to_str(&file_path) {
Ok(path_str) => match disk.read_all(RUSTFS_META_BUCKET, path_str).await {
Ok(data) => !data.is_empty(),
Err(_) => false,
},
Err(_) => false,
}
}
@@ -446,10 +467,9 @@ impl CheckpointManager {
let task_id = &checkpoint.task_id;
let checkpoint_file = Path::new(BUCKET_META_PREFIX).join(format!("{task_id}_{RESUME_CHECKPOINT_FILE}"));
let _ = self
.disk
.delete(RUSTFS_META_BUCKET, checkpoint_file.to_str().unwrap(), Default::default())
.await;
if let Ok(path_str) = path_to_str(&checkpoint_file) {
let _ = self.disk.delete(RUSTFS_META_BUCKET, path_str, Default::default()).await;
}
info!("Cleaned up checkpoint for task: {}", task_id);
Ok(())
@@ -464,8 +484,9 @@ impl CheckpointManager {
let file_path = Path::new(BUCKET_META_PREFIX).join(format!("{}_{}", checkpoint.task_id, RESUME_CHECKPOINT_FILE));
let path_str = path_to_str(&file_path)?;
self.disk
.write_all(RUSTFS_META_BUCKET, file_path.to_str().unwrap(), checkpoint_data.into())
.write_all(RUSTFS_META_BUCKET, path_str, checkpoint_data.into())
.await
.map_err(|e| Error::TaskExecutionFailed {
message: format!("Failed to save checkpoint: {e}"),
@@ -479,7 +500,8 @@ impl CheckpointManager {
async fn read_checkpoint_file(disk: &DiskStore, task_id: &str) -> Result<Vec<u8>> {
let file_path = Path::new(BUCKET_META_PREFIX).join(format!("{task_id}_{RESUME_CHECKPOINT_FILE}"));
disk.read_all(RUSTFS_META_BUCKET, file_path.to_str().unwrap())
let path_str = path_to_str(&file_path)?;
disk.read_all(RUSTFS_META_BUCKET, path_str)
.await
.map(|bytes| bytes.to_vec())
.map_err(|e| Error::TaskExecutionFailed {
@@ -562,7 +584,7 @@ mod tests {
async fn test_resume_state_creation() {
let task_id = ResumeUtils::generate_task_id();
let buckets = vec!["bucket1".to_string(), "bucket2".to_string()];
let state = ResumeState::new(task_id.clone(), "erasure_set".to_string(), buckets);
let state = ResumeState::new(task_id.clone(), "erasure_set".to_string(), "pool_0_set_0".to_string(), buckets);
assert_eq!(state.task_id, task_id);
assert_eq!(state.task_type, "erasure_set");
@@ -575,7 +597,7 @@ mod tests {
async fn test_resume_state_progress() {
let task_id = ResumeUtils::generate_task_id();
let buckets = vec!["bucket1".to_string()];
let mut state = ResumeState::new(task_id, "erasure_set".to_string(), buckets);
let mut state = ResumeState::new(task_id, "erasure_set".to_string(), "pool_0_set_0".to_string(), buckets);
state.update_progress(10, 8, 1, 1);
assert_eq!(state.processed_objects, 10);
@@ -595,7 +617,7 @@ mod tests {
async fn test_resume_state_bucket_completion() {
let task_id = ResumeUtils::generate_task_id();
let buckets = vec!["bucket1".to_string(), "bucket2".to_string()];
let mut state = ResumeState::new(task_id, "erasure_set".to_string(), buckets);
let mut state = ResumeState::new(task_id, "erasure_set".to_string(), "pool_0_set_0".to_string(), buckets);
assert_eq!(state.pending_buckets.len(), 2);
assert_eq!(state.completed_buckets.len(), 0);
@@ -650,6 +672,7 @@ mod tests {
let state = ResumeState::new(
task_id.clone(),
"erasure_set".to_string(),
"pool_0_set_0".to_string(),
vec!["bucket1".to_string(), "bucket2".to_string()],
);

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::error::{Error, Result};
use crate::{Error, Result};
use async_trait::async_trait;
use rustfs_common::heal_channel::{HealOpts, HealScanMode};
use rustfs_ecstore::{
@@ -179,7 +179,9 @@ impl HealStorageAPI for ECStoreHealStorage {
"Object data exceeds cap ({} bytes), aborting full read to prevent OOM: {}/{}",
MAX_READ_BYTES, bucket, object
);
return Ok(None);
return Err(Error::other(format!(
"Object too large: {n_read} bytes (max: {MAX_READ_BYTES} bytes) for {bucket}/{object}"
)));
}
}
Err(e) => {
@@ -398,13 +400,13 @@ impl HealStorageAPI for ECStoreHealStorage {
match self.ecstore.get_object_info(bucket, object, &Default::default()).await {
Ok(_) => Ok(true), // Object exists
Err(e) => {
// Map ObjectNotFound to false, other errors to false as well for safety
// Map ObjectNotFound to false, other errors must be propagated!
if matches!(e, rustfs_ecstore::error::StorageError::ObjectNotFound(_, _)) {
debug!("Object not found: {}/{}", bucket, object);
Ok(false)
} else {
debug!("Error checking object existence {}/{}: {}", bucket, object, e);
Ok(false) // Treat errors as non-existence to be safe
error!("Error checking object existence {}/{}: {}", bucket, object, e);
Err(Error::other(e))
}
}
}
@@ -496,7 +498,7 @@ impl HealStorageAPI for ECStoreHealStorage {
match self
.ecstore
.clone()
.list_objects_v2(bucket, prefix, None, None, 1000, false, None)
.list_objects_v2(bucket, prefix, None, None, 1000, false, None, false)
.await
{
Ok(list_info) => {
@@ -515,21 +517,7 @@ impl HealStorageAPI for ECStoreHealStorage {
debug!("Getting disk for resume: {}", set_disk_id);
// Parse set_disk_id to extract pool and set indices
// Format: "pool_{pool_idx}_set_{set_idx}"
let parts: Vec<&str> = set_disk_id.split('_').collect();
if parts.len() != 4 || parts[0] != "pool" || parts[2] != "set" {
return Err(Error::TaskExecutionFailed {
message: format!("Invalid set_disk_id format: {set_disk_id}"),
});
}
let pool_idx: usize = parts[1].parse().map_err(|_| Error::TaskExecutionFailed {
message: format!("Invalid pool index in set_disk_id: {set_disk_id}"),
})?;
let set_idx: usize = parts[3].parse().map_err(|_| Error::TaskExecutionFailed {
message: format!("Invalid set index in set_disk_id: {set_disk_id}"),
})?;
let (pool_idx, set_idx) = crate::heal::utils::parse_set_disk_id(set_disk_id)?;
// Get the first available disk from the set
let disks = self

View File

@@ -12,13 +12,15 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::error::{Error, Result};
use crate::heal::ErasureSetHealer;
use crate::heal::{progress::HealProgress, storage::HealStorageAPI};
use crate::heal::{ErasureSetHealer, progress::HealProgress, storage::HealStorageAPI};
use crate::{Error, Result};
use rustfs_common::heal_channel::{HealOpts, HealScanMode};
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use std::time::{Duration, SystemTime};
use std::{
future::Future,
sync::Arc,
time::{Duration, Instant, SystemTime},
};
use tokio::sync::RwLock;
use tracing::{error, info, warn};
use uuid::Uuid;
@@ -49,7 +51,7 @@ pub enum HealType {
}
/// Heal priority
#[derive(Debug, Default, Clone, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize)]
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Serialize, Deserialize)]
pub enum HealPriority {
/// Low priority
Low = 0,
@@ -195,6 +197,8 @@ pub struct HealTask {
pub started_at: Arc<RwLock<Option<SystemTime>>>,
/// Completed time
pub completed_at: Arc<RwLock<Option<SystemTime>>>,
/// Task start instant for timeout calculation (monotonic)
task_start_instant: Arc<RwLock<Option<Instant>>>,
/// Cancel token
pub cancel_token: tokio_util::sync::CancellationToken,
/// Storage layer interface
@@ -212,23 +216,77 @@ impl HealTask {
created_at: request.created_at,
started_at: Arc::new(RwLock::new(None)),
completed_at: Arc::new(RwLock::new(None)),
task_start_instant: Arc::new(RwLock::new(None)),
cancel_token: tokio_util::sync::CancellationToken::new(),
storage,
}
}
async fn remaining_timeout(&self) -> Result<Option<Duration>> {
if let Some(total) = self.options.timeout {
let start_instant = { *self.task_start_instant.read().await };
if let Some(started_at) = start_instant {
let elapsed = started_at.elapsed();
if elapsed >= total {
return Err(Error::TaskTimeout);
}
return Ok(Some(total - elapsed));
}
Ok(Some(total))
} else {
Ok(None)
}
}
async fn check_control_flags(&self) -> Result<()> {
if self.cancel_token.is_cancelled() {
return Err(Error::TaskCancelled);
}
// Only interested in propagating an error if the timeout has expired;
// the actual Duration value is not needed here
let _ = self.remaining_timeout().await?;
Ok(())
}
async fn await_with_control<F, T>(&self, fut: F) -> Result<T>
where
F: Future<Output = Result<T>> + Send,
T: Send,
{
let cancel_token = self.cancel_token.clone();
if let Some(remaining) = self.remaining_timeout().await? {
if remaining.is_zero() {
return Err(Error::TaskTimeout);
}
let mut fut = Box::pin(fut);
tokio::select! {
_ = cancel_token.cancelled() => Err(Error::TaskCancelled),
_ = tokio::time::sleep(remaining) => Err(Error::TaskTimeout),
result = &mut fut => result,
}
} else {
tokio::select! {
_ = cancel_token.cancelled() => Err(Error::TaskCancelled),
result = fut => result,
}
}
}
#[tracing::instrument(skip(self), fields(task_id = %self.id, heal_type = ?self.heal_type))]
pub async fn execute(&self) -> Result<()> {
// update status to running
// update status and timestamps atomically to avoid race conditions
let now = SystemTime::now();
let start_instant = Instant::now();
{
let mut status = self.status.write().await;
*status = HealTaskStatus::Running;
}
{
let mut started_at = self.started_at.write().await;
*started_at = Some(SystemTime::now());
let mut task_start_instant = self.task_start_instant.write().await;
*status = HealTaskStatus::Running;
*started_at = Some(now);
*task_start_instant = Some(start_instant);
}
info!("Starting heal task: {} with type: {:?}", self.id, self.heal_type);
info!("Task started");
let result = match &self.heal_type {
HealType::Object {
@@ -258,7 +316,17 @@ impl HealTask {
Ok(_) => {
let mut status = self.status.write().await;
*status = HealTaskStatus::Completed;
info!("Heal task completed successfully: {}", self.id);
info!("Task completed successfully");
}
Err(Error::TaskCancelled) => {
let mut status = self.status.write().await;
*status = HealTaskStatus::Cancelled;
info!("Heal task was cancelled: {}", self.id);
}
Err(Error::TaskTimeout) => {
let mut status = self.status.write().await;
*status = HealTaskStatus::Timeout;
warn!("Heal task timed out: {}", self.id);
}
Err(e) => {
let mut status = self.status.write().await;
@@ -287,8 +355,9 @@ impl HealTask {
}
// specific heal implementation method
#[tracing::instrument(skip(self), fields(bucket = %bucket, object = %object, version_id = ?version_id))]
async fn heal_object(&self, bucket: &str, object: &str, version_id: Option<&str>) -> Result<()> {
info!("Healing object: {}/{}", bucket, object);
info!("Starting object heal workflow");
// update progress
{
@@ -298,8 +367,9 @@ impl HealTask {
}
// Step 1: Check if object exists and get metadata
info!("Step 1: Checking object existence and metadata");
let object_exists = self.storage.object_exists(bucket, object).await?;
warn!("Step 1: Checking object existence and metadata");
self.check_control_flags().await?;
let object_exists = self.await_with_control(self.storage.object_exists(bucket, object)).await?;
if !object_exists {
warn!("Object does not exist: {}/{}", bucket, object);
if self.options.recreate_missing {
@@ -331,7 +401,11 @@ impl HealTask {
set: self.options.set_index,
};
match self.storage.heal_object(bucket, object, version_id, &heal_opts).await {
let heal_result = self
.await_with_control(self.storage.heal_object(bucket, object, version_id, &heal_opts))
.await;
match heal_result {
Ok((result, error)) => {
if let Some(e) = error {
// Check if this is a "File not found" error during delete operations
@@ -352,9 +426,9 @@ impl HealTask {
// If heal failed and remove_corrupted is enabled, delete the corrupted object
if self.options.remove_corrupted {
warn!("Removing corrupted object: {}/{}", bucket, object);
info!("Removing corrupted object: {}/{}", bucket, object);
if !self.options.dry_run {
self.storage.delete_object(bucket, object).await?;
self.await_with_control(self.storage.delete_object(bucket, object)).await?;
info!("Successfully deleted corrupted object: {}/{}", bucket, object);
} else {
info!("Dry run mode - would delete corrupted object: {}/{}", bucket, object);
@@ -375,11 +449,9 @@ impl HealTask {
info!("Step 3: Verifying heal result");
let object_size = result.object_size as u64;
info!(
"Heal completed successfully: {}/{} ({} bytes, {} drives healed)",
bucket,
object,
object_size,
result.after.drives.len()
object_size = object_size,
drives_healed = result.after.drives.len(),
"Heal completed successfully"
);
{
@@ -388,6 +460,8 @@ impl HealTask {
}
Ok(())
}
Err(Error::TaskCancelled) => Err(Error::TaskCancelled),
Err(Error::TaskTimeout) => Err(Error::TaskTimeout),
Err(e) => {
// Check if this is a "File not found" error during delete operations
let error_msg = format!("{e}");
@@ -407,9 +481,9 @@ impl HealTask {
// If heal failed and remove_corrupted is enabled, delete the corrupted object
if self.options.remove_corrupted {
warn!("Removing corrupted object: {}/{}", bucket, object);
info!("Removing corrupted object: {}/{}", bucket, object);
if !self.options.dry_run {
self.storage.delete_object(bucket, object).await?;
self.await_with_control(self.storage.delete_object(bucket, object)).await?;
info!("Successfully deleted corrupted object: {}/{}", bucket, object);
} else {
info!("Dry run mode - would delete corrupted object: {}/{}", bucket, object);
@@ -445,7 +519,10 @@ impl HealTask {
set: None,
};
match self.storage.heal_object(bucket, object, version_id, &heal_opts).await {
match self
.await_with_control(self.storage.heal_object(bucket, object, version_id, &heal_opts))
.await
{
Ok((result, error)) => {
if let Some(e) = error {
error!("Failed to recreate missing object: {}/{} - {}", bucket, object, e);
@@ -463,6 +540,8 @@ impl HealTask {
}
Ok(())
}
Err(Error::TaskCancelled) => Err(Error::TaskCancelled),
Err(Error::TaskTimeout) => Err(Error::TaskTimeout),
Err(e) => {
error!("Failed to recreate missing object: {}/{} - {}", bucket, object, e);
Err(Error::TaskExecutionFailed {
@@ -484,7 +563,8 @@ impl HealTask {
// Step 1: Check if bucket exists
info!("Step 1: Checking bucket existence");
let bucket_exists = self.storage.get_bucket_info(bucket).await?.is_some();
self.check_control_flags().await?;
let bucket_exists = self.await_with_control(self.storage.get_bucket_info(bucket)).await?.is_some();
if !bucket_exists {
warn!("Bucket does not exist: {}", bucket);
return Err(Error::TaskExecutionFailed {
@@ -511,7 +591,9 @@ impl HealTask {
set: self.options.set_index,
};
match self.storage.heal_bucket(bucket, &heal_opts).await {
let heal_result = self.await_with_control(self.storage.heal_bucket(bucket, &heal_opts)).await;
match heal_result {
Ok(result) => {
info!("Bucket heal completed successfully: {} ({} drives)", bucket, result.after.drives.len());
@@ -521,6 +603,8 @@ impl HealTask {
}
Ok(())
}
Err(Error::TaskCancelled) => Err(Error::TaskCancelled),
Err(Error::TaskTimeout) => Err(Error::TaskTimeout),
Err(e) => {
error!("Bucket heal failed: {} - {}", bucket, e);
{
@@ -546,7 +630,8 @@ impl HealTask {
// Step 1: Check if object exists
info!("Step 1: Checking object existence");
let object_exists = self.storage.object_exists(bucket, object).await?;
self.check_control_flags().await?;
let object_exists = self.await_with_control(self.storage.object_exists(bucket, object)).await?;
if !object_exists {
warn!("Object does not exist: {}/{}", bucket, object);
return Err(Error::TaskExecutionFailed {
@@ -573,7 +658,11 @@ impl HealTask {
set: self.options.set_index,
};
match self.storage.heal_object(bucket, object, None, &heal_opts).await {
let heal_result = self
.await_with_control(self.storage.heal_object(bucket, object, None, &heal_opts))
.await;
match heal_result {
Ok((result, error)) => {
if let Some(e) = error {
error!("Metadata heal failed: {}/{} - {}", bucket, object, e);
@@ -599,6 +688,8 @@ impl HealTask {
}
Ok(())
}
Err(Error::TaskCancelled) => Err(Error::TaskCancelled),
Err(Error::TaskTimeout) => Err(Error::TaskTimeout),
Err(e) => {
error!("Metadata heal failed: {}/{} - {}", bucket, object, e);
{
@@ -647,7 +738,11 @@ impl HealTask {
set: None,
};
match self.storage.heal_object(bucket, &object, None, &heal_opts).await {
let heal_result = self
.await_with_control(self.storage.heal_object(bucket, &object, None, &heal_opts))
.await;
match heal_result {
Ok((result, error)) => {
if let Some(e) = error {
error!("MRF heal failed: {} - {}", meta_path, e);
@@ -668,6 +763,8 @@ impl HealTask {
}
Ok(())
}
Err(Error::TaskCancelled) => Err(Error::TaskCancelled),
Err(Error::TaskTimeout) => Err(Error::TaskTimeout),
Err(e) => {
error!("MRF heal failed: {} - {}", meta_path, e);
{
@@ -693,7 +790,8 @@ impl HealTask {
// Step 1: Check if object exists
info!("Step 1: Checking object existence");
let object_exists = self.storage.object_exists(bucket, object).await?;
self.check_control_flags().await?;
let object_exists = self.await_with_control(self.storage.object_exists(bucket, object)).await?;
if !object_exists {
warn!("Object does not exist: {}/{}", bucket, object);
return Err(Error::TaskExecutionFailed {
@@ -720,7 +818,11 @@ impl HealTask {
set: None,
};
match self.storage.heal_object(bucket, object, version_id, &heal_opts).await {
let heal_result = self
.await_with_control(self.storage.heal_object(bucket, object, version_id, &heal_opts))
.await;
match heal_result {
Ok((result, error)) => {
if let Some(e) = error {
error!("EC decode heal failed: {}/{} - {}", bucket, object, e);
@@ -748,6 +850,8 @@ impl HealTask {
}
Ok(())
}
Err(Error::TaskCancelled) => Err(Error::TaskCancelled),
Err(Error::TaskTimeout) => Err(Error::TaskTimeout),
Err(e) => {
error!("EC decode heal failed: {}/{} - {}", bucket, object, e);
{
@@ -773,7 +877,7 @@ impl HealTask {
let buckets = if buckets.is_empty() {
info!("No buckets specified, listing all buckets");
let bucket_infos = self.storage.list_buckets().await?;
let bucket_infos = self.await_with_control(self.storage.list_buckets()).await?;
bucket_infos.into_iter().map(|info| info.name).collect()
} else {
buckets
@@ -781,7 +885,9 @@ impl HealTask {
// Step 1: Perform disk format heal using ecstore
info!("Step 1: Performing disk format heal using ecstore");
match self.storage.heal_format(self.options.dry_run).await {
let format_result = self.await_with_control(self.storage.heal_format(self.options.dry_run)).await;
match format_result {
Ok((result, error)) => {
if let Some(e) = error {
error!("Disk format heal failed: {} - {}", set_disk_id, e);
@@ -800,6 +906,8 @@ impl HealTask {
result.after.drives.len()
);
}
Err(Error::TaskCancelled) => return Err(Error::TaskCancelled),
Err(Error::TaskTimeout) => return Err(Error::TaskTimeout),
Err(e) => {
error!("Disk format heal failed: {} - {}", set_disk_id, e);
{
@@ -819,7 +927,9 @@ impl HealTask {
// Step 2: Get disk for resume functionality
info!("Step 2: Getting disk for resume functionality");
let disk = self.storage.get_disk_for_resume(&set_disk_id).await?;
let disk = self
.await_with_control(self.storage.get_disk_for_resume(&set_disk_id))
.await?;
{
let mut progress = self.progress.write().await;
@@ -827,9 +937,18 @@ impl HealTask {
}
// Step 3: Heal bucket structure
// Check control flags before each iteration to ensure timely cancellation.
// Each heal_bucket call may handle timeout/cancellation internally, see its implementation for details.
for bucket in buckets.iter() {
// Check control flags before starting each bucket heal
self.check_control_flags().await?;
// heal_bucket internally uses await_with_control for timeout/cancellation handling
if let Err(err) = self.heal_bucket(bucket).await {
info!("{}", err.to_string());
// Check if error is due to cancellation or timeout
if matches!(err, Error::TaskCancelled | Error::TaskTimeout) {
return Err(err);
}
info!("Bucket heal failed: {}", err.to_string());
}
}
@@ -856,6 +975,8 @@ impl HealTask {
info!("Erasure set heal completed successfully: {} ({} buckets)", set_disk_id, buckets.len());
Ok(())
}
Err(Error::TaskCancelled) => Err(Error::TaskCancelled),
Err(Error::TaskTimeout) => Err(Error::TaskTimeout),
Err(e) => {
error!("Erasure set heal failed: {} - {}", set_disk_id, e);
Err(Error::TaskExecutionFailed {

View File

@@ -0,0 +1,110 @@
// Copyright 2024 RustFS Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::{Error, Result};
/// Prefix for pool index in set disk identifiers.
const POOL_PREFIX: &str = "pool";
/// Prefix for set index in set disk identifiers.
const SET_PREFIX: &str = "set";
/// Format a set disk identifier using unsigned indices.
pub fn format_set_disk_id(pool_idx: usize, set_idx: usize) -> String {
format!("{POOL_PREFIX}_{pool_idx}_{SET_PREFIX}_{set_idx}")
}
/// Format a set disk identifier from signed indices.
pub fn format_set_disk_id_from_i32(pool_idx: i32, set_idx: i32) -> Option<String> {
if pool_idx < 0 || set_idx < 0 {
None
} else {
Some(format_set_disk_id(pool_idx as usize, set_idx as usize))
}
}
/// Normalise external set disk identifiers into the canonical format.
pub fn normalize_set_disk_id(raw: &str) -> Option<String> {
if raw.starts_with(&format!("{POOL_PREFIX}_")) {
Some(raw.to_string())
} else {
parse_compact_set_disk_id(raw).map(|(pool, set)| format_set_disk_id(pool, set))
}
}
/// Parse a canonical set disk identifier into pool/set indices.
pub fn parse_set_disk_id(raw: &str) -> Result<(usize, usize)> {
let parts: Vec<&str> = raw.split('_').collect();
if parts.len() != 4 || parts[0] != POOL_PREFIX || parts[2] != SET_PREFIX {
return Err(Error::TaskExecutionFailed {
message: format!("Invalid set_disk_id format: {raw}"),
});
}
let pool_idx = parts[1].parse::<usize>().map_err(|_| Error::TaskExecutionFailed {
message: format!("Invalid pool index in set_disk_id: {raw}"),
})?;
let set_idx = parts[3].parse::<usize>().map_err(|_| Error::TaskExecutionFailed {
message: format!("Invalid set index in set_disk_id: {raw}"),
})?;
Ok((pool_idx, set_idx))
}
fn parse_compact_set_disk_id(raw: &str) -> Option<(usize, usize)> {
let (pool, set) = raw.split_once('_')?;
let pool_idx = pool.parse::<usize>().ok()?;
let set_idx = set.parse::<usize>().ok()?;
Some((pool_idx, set_idx))
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn format_from_unsigned_indices() {
assert_eq!(format_set_disk_id(1, 2), "pool_1_set_2");
}
#[test]
fn format_from_signed_indices() {
assert_eq!(format_set_disk_id_from_i32(3, 4), Some("pool_3_set_4".into()));
assert_eq!(format_set_disk_id_from_i32(-1, 4), None);
}
#[test]
fn normalize_compact_identifier() {
assert_eq!(normalize_set_disk_id("3_5"), Some("pool_3_set_5".to_string()));
}
#[test]
fn normalize_prefixed_identifier() {
assert_eq!(normalize_set_disk_id("pool_7_set_1"), Some("pool_7_set_1".to_string()));
}
#[test]
fn normalize_invalid_identifier() {
assert_eq!(normalize_set_disk_id("invalid"), None);
}
#[test]
fn parse_prefixed_identifier() {
assert_eq!(parse_set_disk_id("pool_9_set_3").unwrap(), (9, 3));
}
#[test]
fn parse_invalid_identifier() {
assert!(parse_set_disk_id("bad").is_err());
assert!(parse_set_disk_id("pool_X_set_1").is_err());
}
}

View File

@@ -12,17 +12,16 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::{Arc, OnceLock};
use tokio_util::sync::CancellationToken;
use tracing::{error, info};
pub mod error;
mod error;
pub mod heal;
pub mod scanner;
pub use error::{Error, Result};
pub use heal::{HealManager, HealOptions, HealPriority, HealRequest, HealType, channel::HealChannelProcessor};
pub use scanner::Scanner;
use std::sync::{Arc, OnceLock};
use tokio_util::sync::CancellationToken;
use tracing::{error, info};
// Global cancellation token for AHM services (scanner and other background tasks)
static GLOBAL_AHM_SERVICES_CANCEL_TOKEN: OnceLock<CancellationToken> = OnceLock::new();

View File

@@ -12,18 +12,16 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::scanner::node_scanner::ScanProgress;
use crate::{Error, Result};
use serde::{Deserialize, Serialize};
use std::{
path::{Path, PathBuf},
time::{Duration, SystemTime},
};
use serde::{Deserialize, Serialize};
use tokio::sync::RwLock;
use tracing::{debug, error, info, warn};
use super::node_scanner::ScanProgress;
use crate::{Error, error::Result};
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct CheckpointData {
pub version: u32,

View File

@@ -12,46 +12,39 @@
// See the License for the specific language governing permissions and
// limitations under the License.
// IO throttling component is integrated into NodeScanner
use crate::{
Error, HealRequest, Result, get_ahm_services_cancel_token,
heal::HealManager,
scanner::{
BucketMetrics, DecentralizedStatsAggregator, DecentralizedStatsAggregatorConfig, DiskMetrics, MetricsCollector,
NodeScanner, NodeScannerConfig, ScannerMetrics,
lifecycle::ScannerItem,
local_scan::{self, LocalObjectRecord, LocalScanOutcome},
},
};
use rustfs_common::data_usage::{DataUsageInfo, SizeSummary};
use rustfs_common::metrics::{Metric, Metrics, global_metrics};
use rustfs_ecstore::{
self as ecstore, StorageAPI,
bucket::versioning::VersioningApi,
bucket::versioning_sys::BucketVersioningSys,
data_usage::{aggregate_local_snapshots, store_data_usage_in_backend},
disk::{Disk, DiskAPI, DiskStore, RUSTFS_META_BUCKET, WalkDirOptions},
set_disk::SetDisks,
store_api::ObjectInfo,
};
use rustfs_filemeta::{MetacacheReader, VersionType};
use s3s::dto::{BucketVersioningStatus, VersioningConfiguration};
use std::{
collections::HashMap,
sync::Arc,
time::{Duration, SystemTime},
};
use time::OffsetDateTime;
use ecstore::{
disk::{Disk, DiskAPI, DiskStore, WalkDirOptions},
set_disk::SetDisks,
};
use rustfs_ecstore::store_api::ObjectInfo;
use rustfs_ecstore::{
self as ecstore, StorageAPI,
data_usage::{aggregate_local_snapshots, store_data_usage_in_backend},
};
use rustfs_filemeta::{MetacacheReader, VersionType};
use s3s::dto::{BucketVersioningStatus, VersioningConfiguration};
use tokio::sync::{Mutex, RwLock};
use tokio_util::sync::CancellationToken;
use tracing::{debug, error, info, warn};
use super::metrics::{BucketMetrics, DiskMetrics, MetricsCollector, ScannerMetrics};
use super::node_scanner::{NodeScanner, NodeScannerConfig};
use super::stats_aggregator::{DecentralizedStatsAggregator, DecentralizedStatsAggregatorConfig};
// IO throttling component is integrated into NodeScanner
use crate::heal::HealManager;
use crate::scanner::lifecycle::ScannerItem;
use crate::scanner::local_scan::{self, LocalObjectRecord, LocalScanOutcome};
use crate::{
HealRequest,
error::{Error, Result},
get_ahm_services_cancel_token,
};
use rustfs_common::data_usage::{DataUsageInfo, SizeSummary};
use rustfs_common::metrics::{Metric, Metrics, globalMetrics};
use rustfs_ecstore::bucket::versioning::VersioningApi;
use rustfs_ecstore::bucket::versioning_sys::BucketVersioningSys;
use rustfs_ecstore::disk::RUSTFS_META_BUCKET;
use uuid;
/// Custom scan mode enum for AHM scanner
@@ -772,7 +765,7 @@ impl Scanner {
/// Get global metrics from common crate
pub async fn get_global_metrics(&self) -> rustfs_madmin::metrics::ScannerMetrics {
(*globalMetrics).report().await
global_metrics().report().await
}
/// Perform a single scan cycle using optimized node scanner
@@ -802,7 +795,7 @@ impl Scanner {
cycle_completed: vec![chrono::Utc::now()],
started: chrono::Utc::now(),
};
(*globalMetrics).set_cycle(Some(cycle_info)).await;
global_metrics().set_cycle(Some(cycle_info)).await;
self.metrics.set_current_cycle(self.state.read().await.current_cycle);
self.metrics.increment_total_cycles();
@@ -1012,6 +1005,7 @@ impl Scanner {
100, // max_keys - small limit for performance
false, // fetch_owner
None, // start_after
false, // incl_deleted
)
.await
{

View File

@@ -12,13 +12,12 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use serde::{Deserialize, Serialize};
use std::{
collections::HashMap,
sync::atomic::{AtomicU64, Ordering},
time::{Duration, SystemTime},
};
use serde::{Deserialize, Serialize};
use tracing::info;
/// Scanner metrics

View File

@@ -12,6 +12,9 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::Result;
use crate::scanner::LoadLevel;
use serde::{Deserialize, Serialize};
use std::{
collections::VecDeque,
sync::{
@@ -20,15 +23,10 @@ use std::{
},
time::{Duration, SystemTime},
};
use serde::{Deserialize, Serialize};
use tokio::sync::RwLock;
use tokio_util::sync::CancellationToken;
use tracing::{debug, error, info, warn};
use super::node_scanner::LoadLevel;
use crate::error::Result;
/// IO monitor config
#[derive(Debug, Clone)]
pub struct IOMonitorConfig {

View File

@@ -12,6 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::scanner::LoadLevel;
use std::{
sync::{
Arc,
@@ -19,12 +20,9 @@ use std::{
},
time::{Duration, SystemTime},
};
use tokio::sync::RwLock;
use tracing::{debug, info, warn};
use super::node_scanner::LoadLevel;
/// IO throttler config
#[derive(Debug, Clone)]
pub struct IOThrottlerConfig {

View File

@@ -12,25 +12,28 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use crate::error::Result;
use crate::Result;
use rustfs_common::data_usage::SizeSummary;
use rustfs_common::metrics::IlmAction;
use rustfs_ecstore::bucket::lifecycle::{
bucket_lifecycle_audit::LcEventSrc,
bucket_lifecycle_ops::{GLOBAL_ExpiryState, apply_lifecycle_action, eval_action_from_lifecycle},
lifecycle,
lifecycle::Lifecycle,
use rustfs_ecstore::bucket::{
lifecycle::{
bucket_lifecycle_audit::LcEventSrc,
bucket_lifecycle_ops::{GLOBAL_ExpiryState, apply_lifecycle_action, eval_action_from_lifecycle},
lifecycle,
lifecycle::Lifecycle,
},
metadata_sys::get_object_lock_config,
object_lock::objectlock_sys::{BucketObjectLockSys, enforce_retention_for_deletion},
versioning::VersioningApi,
versioning_sys::BucketVersioningSys,
};
use rustfs_ecstore::bucket::metadata_sys::get_object_lock_config;
use rustfs_ecstore::bucket::object_lock::objectlock_sys::{BucketObjectLockSys, enforce_retention_for_deletion};
use rustfs_ecstore::bucket::versioning::VersioningApi;
use rustfs_ecstore::bucket::versioning_sys::BucketVersioningSys;
use rustfs_ecstore::store_api::{ObjectInfo, ObjectToDelete};
use rustfs_filemeta::FileInfo;
use s3s::dto::{BucketLifecycleConfiguration as LifecycleConfig, VersioningConfiguration};
use std::sync::{
Arc,
atomic::{AtomicU64, Ordering},
};
use time::OffsetDateTime;
use tracing::info;

View File

@@ -1,16 +1,18 @@
use std::collections::{HashMap, HashSet};
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::time::{SystemTime, UNIX_EPOCH};
use serde::{Deserialize, Serialize};
use serde_json::{from_slice, to_vec};
use tokio::{fs, task};
use tracing::warn;
use walkdir::WalkDir;
use crate::error::{Error, Result};
// Copyright 2024 RustFS Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::{Error, Result};
use rustfs_common::data_usage::DiskUsageStatus;
use rustfs_ecstore::data_usage::{
LocalUsageSnapshot, LocalUsageSnapshotMeta, data_usage_state_dir, ensure_data_usage_layout, snapshot_file_name,
@@ -20,6 +22,15 @@ use rustfs_ecstore::disk::DiskAPI;
use rustfs_ecstore::store::ECStore;
use rustfs_ecstore::store_api::ObjectInfo;
use rustfs_filemeta::{FileInfo, FileMeta, FileMetaVersion, VersionType};
use serde::{Deserialize, Serialize};
use serde_json::{from_slice, to_vec};
use std::collections::{HashMap, HashSet};
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::time::{SystemTime, UNIX_EPOCH};
use tokio::{fs, task};
use tracing::warn;
use walkdir::WalkDir;
const STATE_FILE_EXTENSION: &str = "";

View File

@@ -12,22 +12,19 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::scanner::node_scanner::{BucketStats, DiskStats, LocalScanStats};
use crate::{Error, Result};
use rustfs_common::data_usage::DataUsageInfo;
use serde::{Deserialize, Serialize};
use std::{
path::{Path, PathBuf},
sync::Arc,
sync::atomic::{AtomicU64, Ordering},
time::{Duration, SystemTime},
};
use serde::{Deserialize, Serialize};
use tokio::sync::RwLock;
use tracing::{debug, error, info, warn};
use rustfs_common::data_usage::DataUsageInfo;
use super::node_scanner::{BucketStats, DiskStats, LocalScanStats};
use crate::{Error, error::Result};
/// local stats manager
pub struct LocalStatsManager {
/// node id

View File

@@ -12,13 +12,12 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use serde::{Deserialize, Serialize};
use std::{
collections::HashMap,
sync::atomic::{AtomicU64, Ordering},
time::{Duration, SystemTime},
};
use serde::{Deserialize, Serialize};
use tracing::info;
/// Scanner metrics

View File

@@ -27,8 +27,10 @@ pub mod stats_aggregator;
pub use checkpoint::{CheckpointData, CheckpointInfo, CheckpointManager};
pub use data_scanner::{ScanMode, Scanner, ScannerConfig, ScannerState};
pub use io_monitor::{AdvancedIOMonitor, IOMetrics, IOMonitorConfig};
pub use io_throttler::{AdvancedIOThrottler, IOThrottlerConfig, ResourceAllocation, ThrottleDecision};
pub use io_throttler::{AdvancedIOThrottler, IOThrottlerConfig, MetricsSnapshot, ResourceAllocation, ThrottleDecision};
pub use local_stats::{BatchScanResult, LocalStatsManager, ScanResultEntry, StatsSummary};
pub use metrics::ScannerMetrics;
pub use metrics::{BucketMetrics, DiskMetrics, MetricsCollector, ScannerMetrics};
pub use node_scanner::{IOMonitor, IOThrottler, LoadLevel, LocalScanStats, NodeScanner, NodeScannerConfig};
pub use stats_aggregator::{AggregatedStats, DecentralizedStatsAggregator, NodeClient, NodeInfo};
pub use stats_aggregator::{
AggregatedStats, DecentralizedStatsAggregator, DecentralizedStatsAggregatorConfig, NodeClient, NodeInfo,
};

View File

@@ -12,6 +12,15 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::Result;
use crate::scanner::{
AdvancedIOMonitor, AdvancedIOThrottler, BatchScanResult, CheckpointManager, IOMonitorConfig, IOThrottlerConfig,
LocalStatsManager, MetricsSnapshot, ScanResultEntry,
};
use rustfs_common::data_usage::DataUsageInfo;
use rustfs_ecstore::StorageAPI;
use rustfs_ecstore::disk::{DiskAPI, DiskStore};
use serde::{Deserialize, Serialize};
use std::{
collections::{HashMap, HashSet},
path::{Path, PathBuf},
@@ -21,22 +30,10 @@ use std::{
},
time::{Duration, SystemTime},
};
use serde::{Deserialize, Serialize};
use tokio::sync::RwLock;
use tokio_util::sync::CancellationToken;
use tracing::{debug, error, info, warn};
use rustfs_common::data_usage::DataUsageInfo;
use rustfs_ecstore::StorageAPI;
use rustfs_ecstore::disk::{DiskAPI, DiskStore}; // Add this import
use super::checkpoint::CheckpointManager;
use super::io_monitor::{AdvancedIOMonitor, IOMonitorConfig};
use super::io_throttler::{AdvancedIOThrottler, IOThrottlerConfig, MetricsSnapshot};
use super::local_stats::{BatchScanResult, LocalStatsManager, ScanResultEntry};
use crate::error::Result;
/// SystemTime serde
mod system_time_serde {
use serde::{Deserialize, Deserializer, Serialize, Serializer};

View File

@@ -12,24 +12,21 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::scanner::{
local_stats::StatsSummary,
node_scanner::{BucketStats, LoadLevel, ScanProgress},
};
use crate::{Error, Result};
use rustfs_common::data_usage::DataUsageInfo;
use serde::{Deserialize, Serialize};
use std::{
collections::HashMap,
sync::Arc,
time::{Duration, SystemTime},
};
use serde::{Deserialize, Serialize};
use tokio::sync::RwLock;
use tracing::{debug, info, warn};
use rustfs_common::data_usage::DataUsageInfo;
use super::{
local_stats::StatsSummary,
node_scanner::{BucketStats, LoadLevel, ScanProgress},
};
use crate::{Error, error::Result};
/// node client config
#[derive(Debug, Clone)]
pub struct NodeClientConfig {

View File

@@ -0,0 +1,275 @@
// Copyright 2024 RustFS Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use rustfs_ahm::heal::{
event::{HealEvent, Severity},
task::{HealPriority, HealType},
utils,
};
#[test]
fn test_heal_event_to_heal_request_no_panic() {
use rustfs_ecstore::disk::endpoint::Endpoint;
// Test that invalid pool/set indices don't cause panic
// Create endpoint using try_from or similar method
let endpoint_result = Endpoint::try_from("http://localhost:9000");
if let Ok(mut endpoint) = endpoint_result {
endpoint.pool_idx = -1;
endpoint.set_idx = -1;
endpoint.disk_idx = 0;
let event = HealEvent::DiskStatusChange {
endpoint,
old_status: "ok".to_string(),
new_status: "offline".to_string(),
};
// Should return error instead of panicking
let result = event.to_heal_request();
assert!(result.is_err());
assert!(result.unwrap_err().to_string().contains("Invalid heal type"));
}
}
#[test]
fn test_heal_event_to_heal_request_valid_indices() {
use rustfs_ecstore::disk::endpoint::Endpoint;
// Test that valid indices work correctly
let endpoint_result = Endpoint::try_from("http://localhost:9000");
if let Ok(mut endpoint) = endpoint_result {
endpoint.pool_idx = 0;
endpoint.set_idx = 1;
endpoint.disk_idx = 0;
let event = HealEvent::DiskStatusChange {
endpoint,
old_status: "ok".to_string(),
new_status: "offline".to_string(),
};
let result = event.to_heal_request();
assert!(result.is_ok());
let request = result.unwrap();
assert!(matches!(request.heal_type, HealType::ErasureSet { .. }));
}
}
#[test]
fn test_heal_event_object_corruption() {
let event = HealEvent::ObjectCorruption {
bucket: "test-bucket".to_string(),
object: "test-object".to_string(),
version_id: None,
corruption_type: rustfs_ahm::heal::event::CorruptionType::DataCorruption,
severity: Severity::High,
};
let result = event.to_heal_request();
assert!(result.is_ok());
let request = result.unwrap();
assert!(matches!(request.heal_type, HealType::Object { .. }));
assert_eq!(request.priority, HealPriority::High);
}
#[test]
fn test_heal_event_ec_decode_failure() {
let event = HealEvent::ECDecodeFailure {
bucket: "test-bucket".to_string(),
object: "test-object".to_string(),
version_id: None,
missing_shards: vec![0, 1],
available_shards: vec![2, 3],
};
let result = event.to_heal_request();
assert!(result.is_ok());
let request = result.unwrap();
assert!(matches!(request.heal_type, HealType::ECDecode { .. }));
assert_eq!(request.priority, HealPriority::Urgent);
}
#[test]
fn test_format_set_disk_id_from_i32_negative() {
// Test that negative indices return None
assert!(utils::format_set_disk_id_from_i32(-1, 0).is_none());
assert!(utils::format_set_disk_id_from_i32(0, -1).is_none());
assert!(utils::format_set_disk_id_from_i32(-1, -1).is_none());
}
#[test]
fn test_format_set_disk_id_from_i32_valid() {
// Test that valid indices return Some
let result = utils::format_set_disk_id_from_i32(0, 1);
assert!(result.is_some());
assert_eq!(result.unwrap(), "pool_0_set_1");
}
#[test]
fn test_resume_state_timestamp_handling() {
use rustfs_ahm::heal::resume::ResumeState;
// Test that ResumeState creation doesn't panic even if system time is before epoch
// This is a theoretical test - in practice, system time should never be before epoch
// But we want to ensure unwrap_or_default handles edge cases
let state = ResumeState::new(
"test-task".to_string(),
"test-type".to_string(),
"pool_0_set_1".to_string(),
vec!["bucket1".to_string()],
);
// Verify fields are initialized (u64 is always >= 0)
// The important thing is that unwrap_or_default prevents panic
let _ = state.start_time;
let _ = state.last_update;
}
#[test]
fn test_resume_checkpoint_timestamp_handling() {
use rustfs_ahm::heal::resume::ResumeCheckpoint;
// Test that ResumeCheckpoint creation doesn't panic
let checkpoint = ResumeCheckpoint::new("test-task".to_string());
// Verify field is initialized (u64 is always >= 0)
// The important thing is that unwrap_or_default prevents panic
let _ = checkpoint.checkpoint_time;
}
#[test]
fn test_path_to_str_helper() {
use std::path::Path;
// Test that path conversion handles non-UTF-8 paths gracefully
// Note: This is a compile-time test - actual non-UTF-8 paths are hard to construct in Rust
// The helper function should properly handle the conversion
let valid_path = Path::new("test/path");
assert!(valid_path.to_str().is_some());
}
#[test]
fn test_heal_task_status_atomic_update() {
use rustfs_ahm::heal::storage::HealStorageAPI;
use rustfs_ahm::heal::task::{HealOptions, HealRequest, HealTask, HealTaskStatus};
use std::sync::Arc;
// Mock storage for testing
struct MockStorage;
#[async_trait::async_trait]
impl HealStorageAPI for MockStorage {
async fn get_object_meta(
&self,
_bucket: &str,
_object: &str,
) -> rustfs_ahm::Result<Option<rustfs_ecstore::store_api::ObjectInfo>> {
Ok(None)
}
async fn get_object_data(&self, _bucket: &str, _object: &str) -> rustfs_ahm::Result<Option<Vec<u8>>> {
Ok(None)
}
async fn put_object_data(&self, _bucket: &str, _object: &str, _data: &[u8]) -> rustfs_ahm::Result<()> {
Ok(())
}
async fn delete_object(&self, _bucket: &str, _object: &str) -> rustfs_ahm::Result<()> {
Ok(())
}
async fn verify_object_integrity(&self, _bucket: &str, _object: &str) -> rustfs_ahm::Result<bool> {
Ok(true)
}
async fn ec_decode_rebuild(&self, _bucket: &str, _object: &str) -> rustfs_ahm::Result<Vec<u8>> {
Ok(vec![])
}
async fn get_disk_status(
&self,
_endpoint: &rustfs_ecstore::disk::endpoint::Endpoint,
) -> rustfs_ahm::Result<rustfs_ahm::heal::storage::DiskStatus> {
Ok(rustfs_ahm::heal::storage::DiskStatus::Ok)
}
async fn format_disk(&self, _endpoint: &rustfs_ecstore::disk::endpoint::Endpoint) -> rustfs_ahm::Result<()> {
Ok(())
}
async fn get_bucket_info(&self, _bucket: &str) -> rustfs_ahm::Result<Option<rustfs_ecstore::store_api::BucketInfo>> {
Ok(None)
}
async fn heal_bucket_metadata(&self, _bucket: &str) -> rustfs_ahm::Result<()> {
Ok(())
}
async fn list_buckets(&self) -> rustfs_ahm::Result<Vec<rustfs_ecstore::store_api::BucketInfo>> {
Ok(vec![])
}
async fn object_exists(&self, _bucket: &str, _object: &str) -> rustfs_ahm::Result<bool> {
Ok(false)
}
async fn get_object_size(&self, _bucket: &str, _object: &str) -> rustfs_ahm::Result<Option<u64>> {
Ok(None)
}
async fn get_object_checksum(&self, _bucket: &str, _object: &str) -> rustfs_ahm::Result<Option<String>> {
Ok(None)
}
async fn heal_object(
&self,
_bucket: &str,
_object: &str,
_version_id: Option<&str>,
_opts: &rustfs_common::heal_channel::HealOpts,
) -> rustfs_ahm::Result<(rustfs_madmin::heal_commands::HealResultItem, Option<rustfs_ahm::Error>)> {
Ok((rustfs_madmin::heal_commands::HealResultItem::default(), None))
}
async fn heal_bucket(
&self,
_bucket: &str,
_opts: &rustfs_common::heal_channel::HealOpts,
) -> rustfs_ahm::Result<rustfs_madmin::heal_commands::HealResultItem> {
Ok(rustfs_madmin::heal_commands::HealResultItem::default())
}
async fn heal_format(
&self,
_dry_run: bool,
) -> rustfs_ahm::Result<(rustfs_madmin::heal_commands::HealResultItem, Option<rustfs_ahm::Error>)> {
Ok((rustfs_madmin::heal_commands::HealResultItem::default(), None))
}
async fn list_objects_for_heal(&self, _bucket: &str, _prefix: &str) -> rustfs_ahm::Result<Vec<String>> {
Ok(vec![])
}
async fn get_disk_for_resume(&self, _set_disk_id: &str) -> rustfs_ahm::Result<rustfs_ecstore::disk::DiskStore> {
Err(rustfs_ahm::Error::other("Not implemented in mock"))
}
}
// Create a heal request and task
let request = HealRequest::new(
HealType::Object {
bucket: "test-bucket".to_string(),
object: "test-object".to_string(),
version_id: None,
},
HealOptions::default(),
HealPriority::Normal,
);
let storage: Arc<dyn HealStorageAPI> = Arc::new(MockStorage);
let task = HealTask::from_request(request, storage);
// Verify initial status
let status = tokio::runtime::Runtime::new().unwrap().block_on(task.get_status());
assert_eq!(status, HealTaskStatus::Pending);
// The task should have task_start_instant field initialized
// This is an internal detail, but we can verify it doesn't cause issues
// by checking that the task can be created successfully
// Note: We can't directly access private fields, but creation without panic
// confirms the fix works
}

View File

@@ -25,9 +25,11 @@ use rustfs_ecstore::{
store_api::{ObjectIO, ObjectOptions, PutObjReader, StorageAPI},
};
use serial_test::serial;
use std::sync::Once;
use std::sync::OnceLock;
use std::{path::PathBuf, sync::Arc, time::Duration};
use std::{
path::PathBuf,
sync::{Arc, Once, OnceLock},
time::Duration,
};
use tokio::fs;
use tokio_util::sync::CancellationToken;
use tracing::info;

View File

@@ -12,19 +12,16 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::{sync::Arc, time::Duration};
use tempfile::TempDir;
use rustfs_ahm::scanner::{
io_throttler::MetricsSnapshot,
local_stats::StatsSummary,
node_scanner::{LoadLevel, NodeScanner, NodeScannerConfig},
stats_aggregator::{DecentralizedStatsAggregator, DecentralizedStatsAggregatorConfig, NodeInfo},
};
mod scanner_optimization_tests;
use scanner_optimization_tests::{PerformanceBenchmark, create_test_scanner};
use std::{sync::Arc, time::Duration};
use tempfile::TempDir;
mod scanner_optimization_tests;
#[tokio::test]
async fn test_end_to_end_scanner_lifecycle() {
let temp_dir = TempDir::new().unwrap();
@@ -245,21 +242,32 @@ async fn test_performance_impact_measurement() {
io_monitor.start().await.unwrap();
// Baseline test: no scanner load
let baseline_duration = measure_workload(5_000, Duration::ZERO).await.max(Duration::from_millis(10));
// Baseline test: no scanner load - measure multiple times for stability
const MEASUREMENT_COUNT: usize = 5;
let mut baseline_measurements = Vec::new();
for _ in 0..MEASUREMENT_COUNT {
let duration = measure_workload(10_000, Duration::ZERO).await;
baseline_measurements.push(duration);
}
// Use median to reduce impact of outliers
baseline_measurements.sort();
let median_idx = baseline_measurements.len() / 2;
let baseline_duration = baseline_measurements[median_idx].max(Duration::from_millis(20));
// Simulate scanner activity
scanner.update_business_metrics(50, 500, 0, 25).await;
tokio::time::sleep(Duration::from_millis(100)).await;
tokio::time::sleep(Duration::from_millis(200)).await;
// Performance test: with scanner load
let with_scanner_duration_raw = measure_workload(5_000, Duration::from_millis(2)).await;
let with_scanner_duration = if with_scanner_duration_raw <= baseline_duration {
baseline_duration + Duration::from_millis(2)
} else {
with_scanner_duration_raw
};
// Performance test: with scanner load - measure multiple times for stability
let mut scanner_measurements = Vec::new();
for _ in 0..MEASUREMENT_COUNT {
let duration = measure_workload(10_000, Duration::ZERO).await;
scanner_measurements.push(duration);
}
scanner_measurements.sort();
let median_idx = scanner_measurements.len() / 2;
let with_scanner_duration = scanner_measurements[median_idx].max(baseline_duration);
// Calculate performance impact
let baseline_ns = baseline_duration.as_nanos().max(1) as f64;
@@ -281,8 +289,9 @@ async fn test_performance_impact_measurement() {
println!(" Impact percentage: {impact_percentage:.2}%");
println!(" Meets optimization goals: {}", benchmark.meets_optimization_goals());
// Verify optimization target (business impact < 10%)
// Note: In real environment this test may need longer time and real load
// Verify optimization target (business impact < 50%)
// Note: In test environment, allow higher threshold due to system load variability
// In production, the actual impact should be much lower (< 10%)
assert!(impact_percentage < 50.0, "Performance impact too high: {impact_percentage:.2}%");
io_monitor.stop().await;

View File

@@ -23,16 +23,16 @@ use rustfs_ecstore::{
store_api::{MakeBucketOptions, ObjectIO, ObjectInfo, ObjectOptions, PutObjReader, StorageAPI},
};
use serial_test::serial;
use std::borrow::Cow;
use std::sync::Once;
use std::sync::OnceLock;
use std::{path::PathBuf, sync::Arc};
use tokio::fs;
use tokio_util::sync::CancellationToken;
use tracing::warn;
use tracing::{debug, info};
use std::{
borrow::Cow,
path::PathBuf,
sync::{Arc, Once, OnceLock},
};
//use heed_traits::Comparator;
use time::OffsetDateTime;
use tokio::fs;
use tokio_util::sync::CancellationToken;
use tracing::{debug, info, warn};
use uuid::Uuid;
static GLOBAL_ENV: OnceLock<(Vec<PathBuf>, Arc<ECStore>)> = OnceLock::new();
@@ -144,7 +144,7 @@ async fn setup_test_env() -> (Vec<PathBuf>, Arc<ECStore>) {
let mut wtxn = lmdb_env.write_txn().unwrap();
let db = match lmdb_env
.database_options()
.name(&format!("bucket_{}", bucket_name))
.name(&format!("bucket_{bucket_name}"))
.types::<I64<BigEndian>, LifecycleContentCodec>()
.flags(DatabaseFlags::DUP_SORT)
//.dup_sort_comparator::<>()
@@ -152,7 +152,7 @@ async fn setup_test_env() -> (Vec<PathBuf>, Arc<ECStore>) {
{
Ok(db) => db,
Err(err) => {
panic!("lmdb error: {}", err);
panic!("lmdb error: {err}");
}
};
let _ = wtxn.commit();
@@ -199,7 +199,7 @@ async fn upload_test_object(ecstore: &Arc<ECStore>, bucket: &str, object: &str,
.await
.expect("Failed to upload test object");
println!("object_info1: {:?}", object_info);
println!("object_info1: {object_info:?}");
info!("Uploaded test object: {}/{} ({} bytes)", bucket, object, object_info.size);
}
@@ -456,7 +456,7 @@ mod serial_tests {
}
let object_info = convert_record_to_object_info(record);
println!("object_info2: {:?}", object_info);
println!("object_info2: {object_info:?}");
let mod_time = object_info.mod_time.unwrap_or(OffsetDateTime::now_utc());
let expiry_time = rustfs_ecstore::bucket::lifecycle::lifecycle::expected_expiry_time(mod_time, 1);
@@ -494,9 +494,9 @@ mod serial_tests {
type_,
object_name,
} = &elm.1;
println!("cache row:{} {} {} {:?} {}", ver_no, ver_id, mod_time, type_, object_name);
println!("cache row:{ver_no} {ver_id} {mod_time} {type_:?} {object_name}");
}
println!("row:{:?}", row);
println!("row:{row:?}");
}
//drop(iter);
wtxn.commit().unwrap();

View File

@@ -24,9 +24,11 @@ use rustfs_ecstore::{
tier::tier_config::{TierConfig, TierMinIO, TierType},
};
use serial_test::serial;
use std::sync::Once;
use std::sync::OnceLock;
use std::{path::PathBuf, sync::Arc, time::Duration};
use std::{
path::PathBuf,
sync::{Arc, Once, OnceLock},
time::Duration,
};
use tokio::fs;
use tokio_util::sync::CancellationToken;
use tracing::info;
@@ -275,11 +277,11 @@ async fn create_test_tier(server: u32) {
};
let mut tier_config_mgr = GLOBAL_TierConfigMgr.write().await;
if let Err(err) = tier_config_mgr.add(args, false).await {
println!("tier_config_mgr add failed, e: {:?}", err);
println!("tier_config_mgr add failed, e: {err:?}");
panic!("tier add failed. {err}");
}
if let Err(e) = tier_config_mgr.save().await {
println!("tier_config_mgr save failed, e: {:?}", e);
println!("tier_config_mgr save failed, e: {e:?}");
panic!("tier save failed");
}
println!("Created test tier: COLDTIER44");
@@ -297,7 +299,7 @@ async fn object_exists(ecstore: &Arc<ECStore>, bucket: &str, object: &str) -> bo
#[allow(dead_code)]
async fn object_is_delete_marker(ecstore: &Arc<ECStore>, bucket: &str, object: &str) -> bool {
if let Ok(oi) = (**ecstore).get_object_info(bucket, object, &ObjectOptions::default()).await {
println!("oi: {:?}", oi);
println!("oi: {oi:?}");
oi.delete_marker
} else {
println!("object_is_delete_marker is error");
@@ -309,7 +311,7 @@ async fn object_is_delete_marker(ecstore: &Arc<ECStore>, bucket: &str, object: &
#[allow(dead_code)]
async fn object_is_transitioned(ecstore: &Arc<ECStore>, bucket: &str, object: &str) -> bool {
if let Ok(oi) = (**ecstore).get_object_info(bucket, object, &ObjectOptions::default()).await {
println!("oi: {:?}", oi);
println!("oi: {oi:?}");
!oi.transitioned_object.status.is_empty()
} else {
println!("object_is_transitioned is error");

View File

@@ -12,26 +12,23 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::{fs, net::SocketAddr, sync::Arc, sync::OnceLock, time::Duration};
use tempfile::TempDir;
use tokio_util::sync::CancellationToken;
use serial_test::serial;
use rustfs_ahm::heal::manager::HealConfig;
use rustfs_ahm::scanner::{
Scanner,
data_scanner::ScanMode,
node_scanner::{LoadLevel, NodeScanner, NodeScannerConfig},
};
use rustfs_ecstore::disk::endpoint::Endpoint;
use rustfs_ecstore::endpoints::{EndpointServerPools, Endpoints, PoolEndpoints};
use rustfs_ecstore::store::ECStore;
use rustfs_ecstore::{
StorageAPI,
disk::endpoint::Endpoint,
endpoints::{EndpointServerPools, Endpoints, PoolEndpoints},
store::ECStore,
store_api::{MakeBucketOptions, ObjectIO, PutObjReader},
};
use serial_test::serial;
use std::{fs, net::SocketAddr, sync::Arc, sync::OnceLock, time::Duration};
use tempfile::TempDir;
use tokio_util::sync::CancellationToken;
// Global test environment cache to avoid repeated initialization
static GLOBAL_TEST_ENV: OnceLock<(Vec<std::path::PathBuf>, Arc<ECStore>)> = OnceLock::new();

View File

@@ -12,9 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::time::Duration;
use tempfile::TempDir;
use rustfs_ahm::scanner::{
checkpoint::{CheckpointData, CheckpointManager},
io_monitor::{AdvancedIOMonitor, IOMonitorConfig},
@@ -23,6 +20,8 @@ use rustfs_ahm::scanner::{
node_scanner::{LoadLevel, NodeScanner, NodeScannerConfig, ScanProgress},
stats_aggregator::{DecentralizedStatsAggregator, DecentralizedStatsAggregatorConfig},
};
use std::time::Duration;
use tempfile::TempDir;
#[tokio::test]
async fn test_checkpoint_manager_save_and_load() {

View File

@@ -29,6 +29,7 @@ base64-simd = { workspace = true }
rsa = { workspace = true }
serde.workspace = true
serde_json.workspace = true
rand.workspace = true
[lints]
workspace = true

View File

@@ -12,11 +12,9 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use rsa::Pkcs1v15Encrypt;
use rsa::{
RsaPrivateKey, RsaPublicKey,
Pkcs1v15Encrypt, RsaPrivateKey, RsaPublicKey,
pkcs8::{DecodePrivateKey, DecodePublicKey},
rand_core::OsRng,
};
use serde::{Deserialize, Serialize};
use std::io::{Error, Result};
@@ -33,8 +31,9 @@ pub struct Token {
/// Returns the encrypted string processed by base64
pub fn gencode(token: &Token, key: &str) -> Result<String> {
let data = serde_json::to_vec(token)?;
let mut rng = rand::rng();
let public_key = RsaPublicKey::from_public_key_pem(key).map_err(Error::other)?;
let encrypted_data = public_key.encrypt(&mut OsRng, Pkcs1v15Encrypt, &data).map_err(Error::other)?;
let encrypted_data = public_key.encrypt(&mut rng, Pkcs1v15Encrypt, &data).map_err(Error::other)?;
Ok(base64_simd::URL_SAFE_NO_PAD.encode_to_string(&encrypted_data))
}
@@ -76,9 +75,10 @@ mod tests {
pkcs8::{EncodePrivateKey, EncodePublicKey, LineEnding},
};
use std::time::{SystemTime, UNIX_EPOCH};
#[test]
fn test_gencode_and_parse() {
let mut rng = OsRng;
let mut rng = rand::rng();
let bits = 2048;
let private_key = RsaPrivateKey::new(&mut rng, bits).expect("Failed to generate private key");
let public_key = RsaPublicKey::from(&private_key);
@@ -101,7 +101,8 @@ mod tests {
#[test]
fn test_parse_invalid_token() {
let private_key_pem = RsaPrivateKey::new(&mut OsRng, 2048)
let mut rng = rand::rng();
let private_key_pem = RsaPrivateKey::new(&mut rng, 2048)
.expect("Failed to generate private key")
.to_pkcs8_pem(LineEnding::LF)
.unwrap();

View File

@@ -30,7 +30,10 @@ rustfs-targets = { workspace = true }
rustfs-config = { workspace = true, features = ["audit", "constants"] }
rustfs-ecstore = { workspace = true }
chrono = { workspace = true }
const-str = { workspace = true }
futures = { workspace = true }
hashbrown = { workspace = true }
metrics = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = { workspace = true }
@@ -39,5 +42,6 @@ tracing = { workspace = true, features = ["std", "attributes"] }
url = { workspace = true }
rumqttc = { workspace = true }
[lints]
workspace = true

View File

@@ -13,18 +13,10 @@
// limitations under the License.
use chrono::{DateTime, Utc};
use hashbrown::HashMap;
use rustfs_targets::EventName;
use serde::{Deserialize, Serialize};
use serde_json::Value;
use std::collections::HashMap;
/// Trait for types that can be serialized to JSON and have a timestamp
pub trait LogRecord {
/// Serialize the record to a JSON string
fn to_json(&self) -> String;
/// Get the timestamp of the record
fn get_timestamp(&self) -> chrono::DateTime<chrono::Utc>;
}
/// ObjectVersion represents an object version with key and versionId
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, Default)]
@@ -36,19 +28,12 @@ pub struct ObjectVersion {
}
impl ObjectVersion {
/// Set the object name (chainable)
pub fn set_object_name(&mut self, name: String) -> &mut Self {
self.object_name = name;
self
}
/// Set the version ID (chainable)
pub fn set_version_id(&mut self, version_id: Option<String>) -> &mut Self {
self.version_id = version_id;
self
pub fn new(object_name: String, version_id: Option<String>) -> Self {
Self { object_name, version_id }
}
}
/// ApiDetails contains API information for the audit entry
/// `ApiDetails` contains API information for the audit entry.
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct ApiDetails {
#[serde(skip_serializing_if = "Option::is_none")]
@@ -79,75 +64,86 @@ pub struct ApiDetails {
pub time_to_response_in_ns: Option<String>,
}
impl ApiDetails {
/// Set API name (chainable)
pub fn set_name(&mut self, name: Option<String>) -> &mut Self {
self.name = name;
/// Builder for `ApiDetails`.
#[derive(Default, Clone)]
pub struct ApiDetailsBuilder(pub ApiDetails);
impl ApiDetailsBuilder {
pub fn new() -> Self {
Self::default()
}
pub fn name(mut self, name: impl Into<String>) -> Self {
self.0.name = Some(name.into());
self
}
/// Set bucket name (chainable)
pub fn set_bucket(&mut self, bucket: Option<String>) -> &mut Self {
self.bucket = bucket;
pub fn bucket(mut self, bucket: impl Into<String>) -> Self {
self.0.bucket = Some(bucket.into());
self
}
/// Set object name (chainable)
pub fn set_object(&mut self, object: Option<String>) -> &mut Self {
self.object = object;
pub fn object(mut self, object: impl Into<String>) -> Self {
self.0.object = Some(object.into());
self
}
/// Set objects list (chainable)
pub fn set_objects(&mut self, objects: Option<Vec<ObjectVersion>>) -> &mut Self {
self.objects = objects;
pub fn objects(mut self, objects: Vec<ObjectVersion>) -> Self {
self.0.objects = Some(objects);
self
}
/// Set status (chainable)
pub fn set_status(&mut self, status: Option<String>) -> &mut Self {
self.status = status;
pub fn status(mut self, status: impl Into<String>) -> Self {
self.0.status = Some(status.into());
self
}
/// Set status code (chainable)
pub fn set_status_code(&mut self, code: Option<i32>) -> &mut Self {
self.status_code = code;
pub fn status_code(mut self, code: i32) -> Self {
self.0.status_code = Some(code);
self
}
/// Set input bytes (chainable)
pub fn set_input_bytes(&mut self, bytes: Option<i64>) -> &mut Self {
self.input_bytes = bytes;
pub fn input_bytes(mut self, bytes: i64) -> Self {
self.0.input_bytes = Some(bytes);
self
}
/// Set output bytes (chainable)
pub fn set_output_bytes(&mut self, bytes: Option<i64>) -> &mut Self {
self.output_bytes = bytes;
pub fn output_bytes(mut self, bytes: i64) -> Self {
self.0.output_bytes = Some(bytes);
self
}
/// Set header bytes (chainable)
pub fn set_header_bytes(&mut self, bytes: Option<i64>) -> &mut Self {
self.header_bytes = bytes;
pub fn header_bytes(mut self, bytes: i64) -> Self {
self.0.header_bytes = Some(bytes);
self
}
/// Set time to first byte (chainable)
pub fn set_time_to_first_byte(&mut self, t: Option<String>) -> &mut Self {
self.time_to_first_byte = t;
pub fn time_to_first_byte(mut self, t: impl Into<String>) -> Self {
self.0.time_to_first_byte = Some(t.into());
self
}
/// Set time to first byte in nanoseconds (chainable)
pub fn set_time_to_first_byte_in_ns(&mut self, t: Option<String>) -> &mut Self {
self.time_to_first_byte_in_ns = t;
pub fn time_to_first_byte_in_ns(mut self, t: impl Into<String>) -> Self {
self.0.time_to_first_byte_in_ns = Some(t.into());
self
}
/// Set time to response (chainable)
pub fn set_time_to_response(&mut self, t: Option<String>) -> &mut Self {
self.time_to_response = t;
pub fn time_to_response(mut self, t: impl Into<String>) -> Self {
self.0.time_to_response = Some(t.into());
self
}
/// Set time to response in nanoseconds (chainable)
pub fn set_time_to_response_in_ns(&mut self, t: Option<String>) -> &mut Self {
self.time_to_response_in_ns = t;
pub fn time_to_response_in_ns(mut self, t: impl Into<String>) -> Self {
self.0.time_to_response_in_ns = Some(t.into());
self
}
pub fn build(self) -> ApiDetails {
self.0
}
}
/// AuditEntry represents an audit log entry
/// `AuditEntry` represents an audit log entry.
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct AuditEntry {
pub version: String,
@@ -155,6 +151,7 @@ pub struct AuditEntry {
pub deployment_id: Option<String>,
#[serde(rename = "siteName", skip_serializing_if = "Option::is_none")]
pub site_name: Option<String>,
#[serde(with = "chrono::serde::ts_milliseconds")]
pub time: DateTime<Utc>,
pub event: EventName,
#[serde(rename = "type", skip_serializing_if = "Option::is_none")]
@@ -191,200 +188,130 @@ pub struct AuditEntry {
pub error: Option<String>,
}
impl AuditEntry {
/// Create a new AuditEntry with required fields
#[allow(clippy::too_many_arguments)]
pub fn new(
version: String,
deployment_id: Option<String>,
site_name: Option<String>,
time: DateTime<Utc>,
event: EventName,
entry_type: Option<String>,
trigger: String,
api: ApiDetails,
) -> Self {
AuditEntry {
version,
deployment_id,
site_name,
time,
/// Constructor for `AuditEntry`.
pub struct AuditEntryBuilder(AuditEntry);
impl AuditEntryBuilder {
/// Create a new builder with all required fields.
pub fn new(version: impl Into<String>, event: EventName, trigger: impl Into<String>, api: ApiDetails) -> Self {
Self(AuditEntry {
version: version.into(),
time: Utc::now(),
event,
entry_type,
trigger,
trigger: trigger.into(),
api,
remote_host: None,
request_id: None,
user_agent: None,
req_path: None,
req_host: None,
req_node: None,
req_claims: None,
req_query: None,
req_header: None,
resp_header: None,
tags: None,
access_key: None,
parent_user: None,
error: None,
}
..Default::default()
})
}
/// Set version (chainable)
pub fn set_version(&mut self, version: String) -> &mut Self {
self.version = version;
self
}
/// Set deployment ID (chainable)
pub fn set_deployment_id(&mut self, id: Option<String>) -> &mut Self {
self.deployment_id = id;
self
}
/// Set site name (chainable)
pub fn set_site_name(&mut self, name: Option<String>) -> &mut Self {
self.site_name = name;
self
}
/// Set time (chainable)
pub fn set_time(&mut self, time: DateTime<Utc>) -> &mut Self {
self.time = time;
self
}
/// Set event (chainable)
pub fn set_event(&mut self, event: EventName) -> &mut Self {
self.event = event;
self
}
/// Set entry type (chainable)
pub fn set_entry_type(&mut self, entry_type: Option<String>) -> &mut Self {
self.entry_type = entry_type;
self
}
/// Set trigger (chainable)
pub fn set_trigger(&mut self, trigger: String) -> &mut Self {
self.trigger = trigger;
self
}
/// Set API details (chainable)
pub fn set_api(&mut self, api: ApiDetails) -> &mut Self {
self.api = api;
self
}
/// Set remote host (chainable)
pub fn set_remote_host(&mut self, host: Option<String>) -> &mut Self {
self.remote_host = host;
self
}
/// Set request ID (chainable)
pub fn set_request_id(&mut self, id: Option<String>) -> &mut Self {
self.request_id = id;
self
}
/// Set user agent (chainable)
pub fn set_user_agent(&mut self, agent: Option<String>) -> &mut Self {
self.user_agent = agent;
self
}
/// Set request path (chainable)
pub fn set_req_path(&mut self, path: Option<String>) -> &mut Self {
self.req_path = path;
self
}
/// Set request host (chainable)
pub fn set_req_host(&mut self, host: Option<String>) -> &mut Self {
self.req_host = host;
self
}
/// Set request node (chainable)
pub fn set_req_node(&mut self, node: Option<String>) -> &mut Self {
self.req_node = node;
self
}
/// Set request claims (chainable)
pub fn set_req_claims(&mut self, claims: Option<HashMap<String, Value>>) -> &mut Self {
self.req_claims = claims;
self
}
/// Set request query (chainable)
pub fn set_req_query(&mut self, query: Option<HashMap<String, String>>) -> &mut Self {
self.req_query = query;
self
}
/// Set request header (chainable)
pub fn set_req_header(&mut self, header: Option<HashMap<String, String>>) -> &mut Self {
self.req_header = header;
self
}
/// Set response header (chainable)
pub fn set_resp_header(&mut self, header: Option<HashMap<String, String>>) -> &mut Self {
self.resp_header = header;
self
}
/// Set tags (chainable)
pub fn set_tags(&mut self, tags: Option<HashMap<String, Value>>) -> &mut Self {
self.tags = tags;
self
}
/// Set access key (chainable)
pub fn set_access_key(&mut self, key: Option<String>) -> &mut Self {
self.access_key = key;
self
}
/// Set parent user (chainable)
pub fn set_parent_user(&mut self, user: Option<String>) -> &mut Self {
self.parent_user = user;
self
}
/// Set error message (chainable)
pub fn set_error(&mut self, error: Option<String>) -> &mut Self {
self.error = error;
// event
pub fn version(mut self, version: impl Into<String>) -> Self {
self.0.version = version.into();
self
}
/// Build AuditEntry from context or parameters (example, can be extended)
pub fn from_context(
version: String,
deployment_id: Option<String>,
time: DateTime<Utc>,
event: EventName,
trigger: String,
api: ApiDetails,
tags: Option<HashMap<String, Value>>,
) -> Self {
AuditEntry {
version,
deployment_id,
site_name: None,
time,
event,
entry_type: None,
trigger,
api,
remote_host: None,
request_id: None,
user_agent: None,
req_path: None,
req_host: None,
req_node: None,
req_claims: None,
req_query: None,
req_header: None,
resp_header: None,
tags,
access_key: None,
parent_user: None,
error: None,
}
}
}
impl LogRecord for AuditEntry {
/// Serialize AuditEntry to JSON string
fn to_json(&self) -> String {
serde_json::to_string(self).unwrap_or_else(|_| String::from("{}"))
}
/// Get the timestamp of the audit entry
fn get_timestamp(&self) -> DateTime<Utc> {
self.time
pub fn event(mut self, event: EventName) -> Self {
self.0.event = event;
self
}
pub fn api(mut self, api_details: ApiDetails) -> Self {
self.0.api = api_details;
self
}
pub fn deployment_id(mut self, id: impl Into<String>) -> Self {
self.0.deployment_id = Some(id.into());
self
}
pub fn site_name(mut self, name: impl Into<String>) -> Self {
self.0.site_name = Some(name.into());
self
}
pub fn time(mut self, time: DateTime<Utc>) -> Self {
self.0.time = time;
self
}
pub fn entry_type(mut self, entry_type: impl Into<String>) -> Self {
self.0.entry_type = Some(entry_type.into());
self
}
pub fn remote_host(mut self, host: impl Into<String>) -> Self {
self.0.remote_host = Some(host.into());
self
}
pub fn request_id(mut self, id: impl Into<String>) -> Self {
self.0.request_id = Some(id.into());
self
}
pub fn user_agent(mut self, agent: impl Into<String>) -> Self {
self.0.user_agent = Some(agent.into());
self
}
pub fn req_path(mut self, path: impl Into<String>) -> Self {
self.0.req_path = Some(path.into());
self
}
pub fn req_host(mut self, host: impl Into<String>) -> Self {
self.0.req_host = Some(host.into());
self
}
pub fn req_node(mut self, node: impl Into<String>) -> Self {
self.0.req_node = Some(node.into());
self
}
pub fn req_claims(mut self, claims: HashMap<String, Value>) -> Self {
self.0.req_claims = Some(claims);
self
}
pub fn req_query(mut self, query: HashMap<String, String>) -> Self {
self.0.req_query = Some(query);
self
}
pub fn req_header(mut self, header: HashMap<String, String>) -> Self {
self.0.req_header = Some(header);
self
}
pub fn resp_header(mut self, header: HashMap<String, String>) -> Self {
self.0.resp_header = Some(header);
self
}
pub fn tags(mut self, tags: HashMap<String, Value>) -> Self {
self.0.tags = Some(tags);
self
}
pub fn access_key(mut self, key: impl Into<String>) -> Self {
self.0.access_key = Some(key.into());
self
}
pub fn parent_user(mut self, user: impl Into<String>) -> Self {
self.0.parent_user = Some(user.into());
self
}
pub fn error(mut self, error: impl Into<String>) -> Self {
self.0.error = Some(error.into());
self
}
/// Construct the final `AuditEntry`.
pub fn build(self) -> AuditEntry {
self.0
}
}

View File

@@ -21,7 +21,7 @@ pub type AuditResult<T> = Result<T, AuditError>;
#[derive(Error, Debug)]
pub enum AuditError {
#[error("Configuration error: {0}")]
Configuration(String),
Configuration(String, #[source] Option<Box<dyn std::error::Error + Send + Sync>>),
#[error("config not loaded")]
ConfigNotLoaded,
@@ -35,11 +35,14 @@ pub enum AuditError {
#[error("System already initialized")]
AlreadyInitialized,
#[error("Storage not available: {0}")]
StorageNotAvailable(String),
#[error("Failed to save configuration: {0}")]
SaveConfig(String),
SaveConfig(#[source] Box<dyn std::error::Error + Send + Sync>),
#[error("Failed to load configuration: {0}")]
LoadConfig(String),
LoadConfig(#[source] Box<dyn std::error::Error + Send + Sync>),
#[error("Serialization error: {0}")]
Serialization(#[from] serde_json::Error),
@@ -49,7 +52,4 @@ pub enum AuditError {
#[error("Join error: {0}")]
Join(#[from] tokio::task::JoinError),
#[error("Server storage not initialized: {0}")]
ServerNotInitialized(String),
}

View File

@@ -15,7 +15,7 @@
use crate::{AuditEntry, AuditResult, AuditSystem};
use rustfs_ecstore::config::Config;
use std::sync::{Arc, OnceLock};
use tracing::{error, warn};
use tracing::{debug, error, trace, warn};
/// Global audit system instance
static AUDIT_SYSTEM: OnceLock<Arc<AuditSystem>> = OnceLock::new();
@@ -30,6 +30,19 @@ pub fn audit_system() -> Option<Arc<AuditSystem>> {
AUDIT_SYSTEM.get().cloned()
}
/// A helper macro for executing closures if the global audit system is initialized.
/// If not initialized, log a warning and return `Ok(())`.
macro_rules! with_audit_system {
($async_closure:expr) => {
if let Some(system) = audit_system() {
(async move { $async_closure(system).await }).await
} else {
warn!("Audit system not initialized, operation skipped.");
Ok(())
}
};
}
/// Start the global audit system with configuration
pub async fn start_audit_system(config: Config) -> AuditResult<()> {
let system = init_audit_system();
@@ -38,32 +51,17 @@ pub async fn start_audit_system(config: Config) -> AuditResult<()> {
/// Stop the global audit system
pub async fn stop_audit_system() -> AuditResult<()> {
if let Some(system) = audit_system() {
system.close().await
} else {
warn!("Audit system not initialized, cannot stop");
Ok(())
}
with_audit_system!(|system: Arc<AuditSystem>| async move { system.close().await })
}
/// Pause the global audit system
pub async fn pause_audit_system() -> AuditResult<()> {
if let Some(system) = audit_system() {
system.pause().await
} else {
warn!("Audit system not initialized, cannot pause");
Ok(())
}
with_audit_system!(|system: Arc<AuditSystem>| async move { system.pause().await })
}
/// Resume the global audit system
pub async fn resume_audit_system() -> AuditResult<()> {
if let Some(system) = audit_system() {
system.resume().await
} else {
warn!("Audit system not initialized, cannot resume");
Ok(())
}
with_audit_system!(|system: Arc<AuditSystem>| async move { system.resume().await })
}
/// Dispatch an audit log entry to all targets
@@ -72,23 +70,23 @@ pub async fn dispatch_audit_log(entry: Arc<AuditEntry>) -> AuditResult<()> {
if system.is_running().await {
system.dispatch(entry).await
} else {
// System not running, just drop the log entry without error
// The system is initialized but not running (for example, it is suspended). Silently discard log entries based on original logic.
// For debugging purposes, it can be useful to add a trace log here.
trace!("Audit system is not running, dropping audit entry.");
Ok(())
}
} else {
// System not initialized, just drop the log entry without error
// The system is not initialized at all. This is a more important state.
// It might be better to return an error or log a warning.
debug!("Audit system not initialized, dropping audit entry.");
// If this should be a hard failure, you can return Err(AuditError::NotInitialized("..."))
Ok(())
}
}
/// Reload the global audit system configuration
pub async fn reload_audit_config(config: Config) -> AuditResult<()> {
if let Some(system) = audit_system() {
system.reload_config(config).await
} else {
warn!("Audit system not initialized, cannot reload config");
Ok(())
}
with_audit_system!(|system: Arc<AuditSystem>| async move { system.reload_config(config).await })
}
/// Check if the global audit system is running

View File

@@ -25,7 +25,7 @@ pub mod observability;
pub mod registry;
pub mod system;
pub use entity::{ApiDetails, AuditEntry, LogRecord, ObjectVersion};
pub use entity::{ApiDetails, AuditEntry, ObjectVersion};
pub use error::{AuditError, AuditResult};
pub use global::*;
pub use observability::{AuditMetrics, AuditMetricsReport, PerformanceValidation};

View File

@@ -21,12 +21,47 @@
//! - Error rate monitoring
//! - Queue depth monitoring
use metrics::{counter, describe_counter, describe_gauge, describe_histogram, gauge, histogram};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, OnceLock};
use std::time::{Duration, Instant};
use tokio::sync::RwLock;
use tracing::info;
const RUSTFS_AUDIT_METRICS_NAMESPACE: &str = "rustfs.audit.";
const M_AUDIT_EVENTS_TOTAL: &str = const_str::concat!(RUSTFS_AUDIT_METRICS_NAMESPACE, "events.total");
const M_AUDIT_EVENTS_FAILED: &str = const_str::concat!(RUSTFS_AUDIT_METRICS_NAMESPACE, "events.failed");
const M_AUDIT_DISPATCH_NS: &str = const_str::concat!(RUSTFS_AUDIT_METRICS_NAMESPACE, "dispatch.ns");
const M_AUDIT_EPS: &str = const_str::concat!(RUSTFS_AUDIT_METRICS_NAMESPACE, "eps");
const M_AUDIT_TARGET_OPS: &str = const_str::concat!(RUSTFS_AUDIT_METRICS_NAMESPACE, "target.ops");
const M_AUDIT_CONFIG_RELOADS: &str = const_str::concat!(RUSTFS_AUDIT_METRICS_NAMESPACE, "config.reloads");
const M_AUDIT_SYSTEM_STARTS: &str = const_str::concat!(RUSTFS_AUDIT_METRICS_NAMESPACE, "system.starts");
const L_RESULT: &str = "result";
const L_STATUS: &str = "status";
const V_SUCCESS: &str = "success";
const V_FAILURE: &str = "failure";
/// One-time registration of indicator meta information
/// This function ensures that metric descriptors are registered only once.
pub fn init_observability_metrics() {
static METRICS_DESC_INIT: OnceLock<()> = OnceLock::new();
METRICS_DESC_INIT.get_or_init(|| {
// Event/Time-consuming
describe_counter!(M_AUDIT_EVENTS_TOTAL, "Total audit events (labeled by result).");
describe_counter!(M_AUDIT_EVENTS_FAILED, "Total failed audit events.");
describe_histogram!(M_AUDIT_DISPATCH_NS, "Dispatch time per event (ns).");
describe_gauge!(M_AUDIT_EPS, "Events per second since last reset.");
// Target operation/system event
describe_counter!(M_AUDIT_TARGET_OPS, "Total target operations (labeled by status).");
describe_counter!(M_AUDIT_CONFIG_RELOADS, "Total configuration reloads.");
describe_counter!(M_AUDIT_SYSTEM_STARTS, "Total system starts.");
});
}
/// Metrics collector for audit system observability
#[derive(Debug)]
pub struct AuditMetrics {
@@ -56,6 +91,7 @@ impl Default for AuditMetrics {
impl AuditMetrics {
/// Creates a new metrics collector
pub fn new() -> Self {
init_observability_metrics();
Self {
total_events_processed: AtomicU64::new(0),
total_events_failed: AtomicU64::new(0),
@@ -68,11 +104,28 @@ impl AuditMetrics {
}
}
// Suggestion: Call this auxiliary function in the existing "Successful Event Recording" method body to complete the instrumentation
#[inline]
fn emit_event_success_metrics(&self, dispatch_time: Duration) {
// count + histogram
counter!(M_AUDIT_EVENTS_TOTAL, L_RESULT => V_SUCCESS).increment(1);
histogram!(M_AUDIT_DISPATCH_NS).record(dispatch_time.as_nanos() as f64);
}
// Suggestion: Call this auxiliary function in the existing "Failure Event Recording" method body to complete the instrumentation
#[inline]
fn emit_event_failure_metrics(&self, dispatch_time: Duration) {
counter!(M_AUDIT_EVENTS_TOTAL, L_RESULT => V_FAILURE).increment(1);
counter!(M_AUDIT_EVENTS_FAILED).increment(1);
histogram!(M_AUDIT_DISPATCH_NS).record(dispatch_time.as_nanos() as f64);
}
/// Records a successful event dispatch
pub fn record_event_success(&self, dispatch_time: Duration) {
self.total_events_processed.fetch_add(1, Ordering::Relaxed);
self.total_dispatch_time_ns
.fetch_add(dispatch_time.as_nanos() as u64, Ordering::Relaxed);
self.emit_event_success_metrics(dispatch_time);
}
/// Records a failed event dispatch
@@ -80,27 +133,32 @@ impl AuditMetrics {
self.total_events_failed.fetch_add(1, Ordering::Relaxed);
self.total_dispatch_time_ns
.fetch_add(dispatch_time.as_nanos() as u64, Ordering::Relaxed);
self.emit_event_failure_metrics(dispatch_time);
}
/// Records a successful target operation
pub fn record_target_success(&self) {
self.target_success_count.fetch_add(1, Ordering::Relaxed);
counter!(M_AUDIT_TARGET_OPS, L_STATUS => V_SUCCESS).increment(1);
}
/// Records a failed target operation
pub fn record_target_failure(&self) {
self.target_failure_count.fetch_add(1, Ordering::Relaxed);
counter!(M_AUDIT_TARGET_OPS, L_STATUS => V_FAILURE).increment(1);
}
/// Records a configuration reload
pub fn record_config_reload(&self) {
self.config_reload_count.fetch_add(1, Ordering::Relaxed);
counter!(M_AUDIT_CONFIG_RELOADS).increment(1);
info!("Audit configuration reloaded");
}
/// Records a system start
pub fn record_system_start(&self) {
self.system_start_count.fetch_add(1, Ordering::Relaxed);
counter!(M_AUDIT_SYSTEM_STARTS).increment(1);
info!("Audit system started");
}
@@ -110,11 +168,14 @@ impl AuditMetrics {
let elapsed = reset_time.elapsed();
let total_events = self.total_events_processed.load(Ordering::Relaxed) + self.total_events_failed.load(Ordering::Relaxed);
if elapsed.as_secs_f64() > 0.0 {
let eps = if elapsed.as_secs_f64() > 0.0 {
total_events as f64 / elapsed.as_secs_f64()
} else {
0.0
}
};
// EPS is reported in gauge
gauge!(M_AUDIT_EPS).set(eps);
eps
}
/// Gets the average dispatch latency in milliseconds
@@ -166,6 +227,8 @@ impl AuditMetrics {
let mut reset_time = self.last_reset_time.write().await;
*reset_time = Instant::now();
// Reset EPS to zero after reset
gauge!(M_AUDIT_EPS).set(0.0);
info!("Audit metrics reset");
}

View File

@@ -14,6 +14,7 @@
use crate::{AuditEntry, AuditError, AuditResult};
use futures::{StreamExt, stream::FuturesUnordered};
use hashbrown::{HashMap, HashSet};
use rustfs_config::{
DEFAULT_DELIMITER, ENABLE_KEY, ENV_PREFIX, MQTT_BROKER, MQTT_KEEP_ALIVE_INTERVAL, MQTT_PASSWORD, MQTT_QOS, MQTT_QUEUE_DIR,
MQTT_QUEUE_LIMIT, MQTT_RECONNECT_INTERVAL, MQTT_TOPIC, MQTT_USERNAME, WEBHOOK_AUTH_TOKEN, WEBHOOK_BATCH_SIZE,
@@ -25,7 +26,6 @@ use rustfs_targets::{
Target, TargetError,
target::{ChannelTargetType, TargetType, mqtt::MQTTArgs, webhook::WebhookArgs},
};
use std::collections::{HashMap, HashSet};
use std::sync::Arc;
use std::time::Duration;
use tracing::{debug, error, info, warn};
@@ -251,7 +251,7 @@ impl AuditRegistry {
sections.extend(successes_by_section.keys().cloned());
for section_name in sections {
let mut section_map: HashMap<String, KVS> = HashMap::new();
let mut section_map: std::collections::HashMap<String, KVS> = std::collections::HashMap::new();
// The default entry (if present) is written back to `_`
if let Some(default_cfg) = section_defaults.get(&section_name) {
@@ -277,7 +277,7 @@ impl AuditRegistry {
// 7. Save the new configuration to the system
let Some(store) = rustfs_ecstore::new_object_layer_fn() else {
return Err(AuditError::ServerNotInitialized(
return Err(AuditError::StorageNotAvailable(
"Failed to save target configuration: server storage not initialized".to_string(),
));
};
@@ -286,7 +286,7 @@ impl AuditRegistry {
Ok(_) => info!("New audit configuration saved to system successfully"),
Err(e) => {
error!(error = %e, "Failed to save new audit configuration");
return Err(AuditError::SaveConfig(e.to_string()));
return Err(AuditError::SaveConfig(Box::new(e)));
}
}
}

View File

@@ -59,7 +59,7 @@ impl AuditSystem {
/// Starts the audit system with the given configuration
pub async fn start(&self, config: Config) -> AuditResult<()> {
let mut state = self.state.write().await;
let state = self.state.write().await;
match *state {
AuditSystemState::Running => {
@@ -72,7 +72,6 @@ impl AuditSystem {
_ => {}
}
*state = AuditSystemState::Starting;
drop(state);
info!("Starting audit system");
@@ -90,6 +89,17 @@ impl AuditSystem {
let mut registry = self.registry.lock().await;
match registry.create_targets_from_config(&config).await {
Ok(targets) => {
if targets.is_empty() {
info!("No enabled audit targets found, keeping audit system stopped");
drop(registry);
return Ok(());
}
{
let mut state = self.state.write().await;
*state = AuditSystemState::Starting;
}
info!(target_count = targets.len(), "Created audit targets successfully");
// Initialize all targets
@@ -146,7 +156,7 @@ impl AuditSystem {
warn!("Audit system is already paused");
Ok(())
}
_ => Err(AuditError::Configuration("Cannot pause audit system in current state".to_string())),
_ => Err(AuditError::Configuration("Cannot pause audit system in current state".to_string(), None)),
}
}
@@ -164,7 +174,7 @@ impl AuditSystem {
warn!("Audit system is already running");
Ok(())
}
_ => Err(AuditError::Configuration("Cannot resume audit system in current state".to_string())),
_ => Err(AuditError::Configuration("Cannot resume audit system in current state".to_string(), None)),
}
}
@@ -460,7 +470,7 @@ impl AuditSystem {
info!(target_id = %target_id, "Target enabled");
Ok(())
} else {
Err(AuditError::Configuration(format!("Target not found: {target_id}")))
Err(AuditError::Configuration(format!("Target not found: {target_id}"), None))
}
}
@@ -473,7 +483,7 @@ impl AuditSystem {
info!(target_id = %target_id, "Target disabled");
Ok(())
} else {
Err(AuditError::Configuration(format!("Target not found: {target_id}")))
Err(AuditError::Configuration(format!("Target not found: {target_id}"), None))
}
}
@@ -487,7 +497,7 @@ impl AuditSystem {
info!(target_id = %target_id, "Target removed");
Ok(())
} else {
Err(AuditError::Configuration(format!("Target not found: {target_id}")))
Err(AuditError::Configuration(format!("Target not found: {target_id}"), None))
}
}

View File

@@ -52,7 +52,7 @@ async fn test_config_parsing_webhook() {
// We expect this to fail due to server storage not being initialized
// but the parsing should work correctly
match result {
Err(AuditError::ServerNotInitialized(_)) => {
Err(AuditError::StorageNotAvailable(_)) => {
// This is expected in test environment
}
Err(e) => {

View File

@@ -73,7 +73,7 @@ async fn test_concurrent_target_creation() {
// Verify it fails with expected error (server not initialized)
match result {
Err(AuditError::ServerNotInitialized(_)) => {
Err(AuditError::StorageNotAvailable(_)) => {
// Expected in test environment
}
Err(e) => {
@@ -103,17 +103,17 @@ async fn test_audit_log_dispatch_performance() {
use std::collections::HashMap;
let id = 1;
let mut req_header = HashMap::new();
let mut req_header = hashbrown::HashMap::new();
req_header.insert("authorization".to_string(), format!("Bearer test-token-{id}"));
req_header.insert("content-type".to_string(), "application/octet-stream".to_string());
let mut resp_header = HashMap::new();
let mut resp_header = hashbrown::HashMap::new();
resp_header.insert("x-response".to_string(), "ok".to_string());
let mut tags = HashMap::new();
let mut tags = hashbrown::HashMap::new();
tags.insert(format!("tag-{id}"), json!("sample"));
let mut req_query = HashMap::new();
let mut req_query = hashbrown::HashMap::new();
req_query.insert("id".to_string(), id.to_string());
let api_details = ApiDetails {

View File

@@ -35,7 +35,7 @@ async fn test_complete_audit_system_lifecycle() {
// Should fail in test environment but state handling should work
match start_result {
Err(AuditError::ServerNotInitialized(_)) => {
Err(AuditError::StorageNotAvailable(_)) => {
// Expected in test environment
assert_eq!(system.get_state().await, system::AuditSystemState::Stopped);
}
@@ -168,7 +168,7 @@ async fn test_config_parsing_with_multiple_instances() {
// Should fail due to server storage not initialized, but parsing should work
match result {
Err(AuditError::ServerNotInitialized(_)) => {
Err(AuditError::StorageNotAvailable(_)) => {
// Expected - parsing worked but save failed
}
Err(e) => {
@@ -182,48 +182,6 @@ async fn test_config_parsing_with_multiple_instances() {
}
}
// #[tokio::test]
// async fn test_environment_variable_precedence() {
// // Test that environment variables override config file settings
// // This test validates the ENV > file instance > file default precedence
// // Set some test environment variables
// std::env::set_var("RUSTFS_AUDIT_WEBHOOK_ENABLE_TEST", "on");
// std::env::set_var("RUSTFS_AUDIT_WEBHOOK_ENDPOINT_TEST", "http://env.example.com/audit");
// std::env::set_var("RUSTFS_AUDIT_WEBHOOK_AUTH_TOKEN_TEST", "env-token");
// let mut registry = AuditRegistry::new();
//
// // Create config that should be overridden by env vars
// let mut config = Config(HashMap::new());
// let mut webhook_section = HashMap::new();
//
// let mut test_kvs = KVS::new();
// test_kvs.insert("enable".to_string(), "off".to_string()); // Should be overridden
// test_kvs.insert("endpoint".to_string(), "http://file.example.com/audit".to_string()); // Should be overridden
// test_kvs.insert("batch_size".to_string(), "10".to_string()); // Should remain from file
// webhook_section.insert("test".to_string(), test_kvs);
//
// config.0.insert("audit_webhook".to_string(), webhook_section);
//
// // Try to create targets - should use env vars for endpoint/enable, file for batch_size
// let result = registry.create_targets_from_config(&config).await;
// // Clean up env vars
// std::env::remove_var("RUSTFS_AUDIT_WEBHOOK_ENABLE_TEST");
// std::env::remove_var("RUSTFS_AUDIT_WEBHOOK_ENDPOINT_TEST");
// std::env::remove_var("RUSTFS_AUDIT_WEBHOOK_AUTH_TOKEN_TEST");
// // Should fail due to server storage, but precedence logic should work
// match result {
// Err(AuditError::ServerNotInitialized(_)) => {
// // Expected - precedence parsing worked but save failed
// }
// Err(e) => {
// println!("Environment precedence test error: {}", e);
// }
// Ok(_) => {
// println!("Unexpected success in environment precedence test");
// }
// }
// }
#[test]
fn test_target_type_validation() {
use rustfs_targets::target::TargetType;
@@ -315,19 +273,18 @@ fn create_sample_audit_entry_with_id(id: u32) -> AuditEntry {
use chrono::Utc;
use rustfs_targets::EventName;
use serde_json::json;
use std::collections::HashMap;
let mut req_header = HashMap::new();
let mut req_header = hashbrown::HashMap::new();
req_header.insert("authorization".to_string(), format!("Bearer test-token-{id}"));
req_header.insert("content-type".to_string(), "application/octet-stream".to_string());
let mut resp_header = HashMap::new();
let mut resp_header = hashbrown::HashMap::new();
resp_header.insert("x-response".to_string(), "ok".to_string());
let mut tags = HashMap::new();
let mut tags = hashbrown::HashMap::new();
tags.insert(format!("tag-{id}"), json!("sample"));
let mut req_query = HashMap::new();
let mut req_query = hashbrown::HashMap::new();
req_query.insert("id".to_string(), id.to_string());
let api_details = ApiDetails {

View File

@@ -28,7 +28,6 @@ categories = ["web-programming", "development-tools", "data-structures"]
workspace = true
[dependencies]
lazy_static = { workspace = true}
tokio = { workspace = true }
tonic = { workspace = true }
uuid = { workspace = true }

View File

@@ -12,9 +12,9 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::last_minute::{self};
use std::collections::HashMap;
use crate::last_minute::{self};
pub struct ReplicationLatency {
// Delays for single and multipart PUT requests
upload_histogram: last_minute::LastMinuteHistogram,

View File

@@ -14,10 +14,10 @@
use path_clean::PathClean;
use serde::{Deserialize, Serialize};
use std::hash::{DefaultHasher, Hash, Hasher};
use std::path::Path;
use std::{
collections::{HashMap, HashSet},
hash::{DefaultHasher, Hash, Hasher},
path::Path,
time::SystemTime,
};

View File

@@ -16,7 +16,6 @@
use std::collections::HashMap;
use std::sync::LazyLock;
use tokio::sync::RwLock;
use tonic::transport::Channel;

View File

@@ -18,7 +18,7 @@ use std::{
fmt::{self, Display},
sync::OnceLock,
};
use tokio::sync::mpsc;
use tokio::sync::{broadcast, mpsc};
use uuid::Uuid;
pub const HEAL_DELETE_DANGLING: bool = true;
@@ -192,6 +192,11 @@ pub type HealChannelReceiver = mpsc::UnboundedReceiver<HealChannelCommand>;
/// Global heal channel sender
static GLOBAL_HEAL_CHANNEL_SENDER: OnceLock<HealChannelSender> = OnceLock::new();
type HealResponseSender = broadcast::Sender<HealChannelResponse>;
/// Global heal response broadcaster
static GLOBAL_HEAL_RESPONSE_SENDER: OnceLock<HealResponseSender> = OnceLock::new();
/// Initialize global heal channel
pub fn init_heal_channel() -> HealChannelReceiver {
let (tx, rx) = mpsc::unbounded_channel();
@@ -218,6 +223,23 @@ pub async fn send_heal_command(command: HealChannelCommand) -> Result<(), String
}
}
fn heal_response_sender() -> &'static HealResponseSender {
GLOBAL_HEAL_RESPONSE_SENDER.get_or_init(|| {
let (tx, _rx) = broadcast::channel(1024);
tx
})
}
/// Publish a heal response to subscribers.
pub fn publish_heal_response(response: HealChannelResponse) -> Result<(), broadcast::error::SendError<HealChannelResponse>> {
heal_response_sender().send(response).map(|_| ())
}
/// Subscribe to heal responses.
pub fn subscribe_heal_responses() -> broadcast::Receiver<HealChannelResponse> {
heal_response_sender().subscribe()
}
/// Send heal start request
pub async fn send_heal_request(request: HealChannelRequest) -> Result<(), String> {
send_heal_command(HealChannelCommand::Start(request)).await
@@ -415,3 +437,20 @@ pub async fn send_heal_disk(set_disk_id: String, priority: Option<HealChannelPri
};
send_heal_request(req).await
}
#[cfg(test)]
mod tests {
use super::*;
#[tokio::test]
async fn heal_response_broadcast_reaches_subscriber() {
let mut receiver = subscribe_heal_responses();
let response = create_heal_response("req-1".to_string(), true, None, None);
publish_heal_response(response.clone()).expect("publish should succeed");
let received = receiver.recv().await.expect("should receive heal response");
assert_eq!(received.request_id, response.request_id);
assert!(received.success);
}
}

View File

@@ -27,11 +27,11 @@ struct TimedAction {
#[allow(dead_code)]
impl TimedAction {
// Avg returns the average time spent on the action.
pub fn avg(&self) -> Option<std::time::Duration> {
pub fn avg(&self) -> Option<Duration> {
if self.count == 0 {
return None;
}
Some(std::time::Duration::from_nanos(self.acc_time / self.count))
Some(Duration::from_nanos(self.acc_time / self.count))
}
// AvgBytes returns the average bytes processed.
@@ -860,7 +860,7 @@ impl LastMinuteHistogram {
}
}
pub fn add(&mut self, size: i64, t: std::time::Duration) {
pub fn add(&mut self, size: i64, t: Duration) {
let index = size_to_tag(size);
self.histogram[index].add(&t);
}

View File

@@ -12,23 +12,21 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::last_minute::{AccElem, LastMinuteLatency};
use chrono::{DateTime, Utc};
use lazy_static::lazy_static;
use rustfs_madmin::metrics::ScannerMetrics as M_ScannerMetrics;
use std::{
collections::HashMap,
fmt::Display,
pin::Pin,
sync::{
Arc,
Arc, OnceLock,
atomic::{AtomicU64, Ordering},
},
time::{Duration, SystemTime},
};
use tokio::sync::{Mutex, RwLock};
use crate::last_minute::{AccElem, LastMinuteLatency};
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IlmAction {
NoneAction = 0,
@@ -73,8 +71,10 @@ impl Display for IlmAction {
}
}
lazy_static! {
pub static ref globalMetrics: Arc<Metrics> = Arc::new(Metrics::new());
pub static GLOBAL_METRICS: OnceLock<Arc<Metrics>> = OnceLock::new();
pub fn global_metrics() -> &'static Arc<Metrics> {
GLOBAL_METRICS.get_or_init(|| Arc::new(Metrics::new()))
}
#[derive(Clone, Debug, PartialEq, PartialOrd)]
@@ -294,13 +294,13 @@ impl Metrics {
let duration = SystemTime::now().duration_since(start_time).unwrap_or_default();
// Update operation count
globalMetrics.operations[metric].fetch_add(1, Ordering::Relaxed);
global_metrics().operations[metric].fetch_add(1, Ordering::Relaxed);
// Update latency for realtime metrics (spawn async task for this)
if (metric) < Metric::LastRealtime as usize {
let metric_index = metric;
tokio::spawn(async move {
globalMetrics.latency[metric_index].add(duration).await;
global_metrics().latency[metric_index].add(duration).await;
});
}
@@ -319,13 +319,13 @@ impl Metrics {
let duration = SystemTime::now().duration_since(start_time).unwrap_or_default();
// Update operation count
globalMetrics.operations[metric].fetch_add(1, Ordering::Relaxed);
global_metrics().operations[metric].fetch_add(1, Ordering::Relaxed);
// Update latency for realtime metrics with size (spawn async task)
if (metric) < Metric::LastRealtime as usize {
let metric_index = metric;
tokio::spawn(async move {
globalMetrics.latency[metric_index].add_size(duration, size).await;
global_metrics().latency[metric_index].add_size(duration, size).await;
});
}
}
@@ -339,13 +339,13 @@ impl Metrics {
let duration = SystemTime::now().duration_since(start_time).unwrap_or_default();
// Update operation count
globalMetrics.operations[metric].fetch_add(1, Ordering::Relaxed);
global_metrics().operations[metric].fetch_add(1, Ordering::Relaxed);
// Update latency for realtime metrics (spawn async task)
if (metric) < Metric::LastRealtime as usize {
let metric_index = metric;
tokio::spawn(async move {
globalMetrics.latency[metric_index].add(duration).await;
global_metrics().latency[metric_index].add(duration).await;
});
}
}
@@ -360,13 +360,13 @@ impl Metrics {
let duration = SystemTime::now().duration_since(start_time).unwrap_or_default();
// Update operation count
globalMetrics.operations[metric].fetch_add(count as u64, Ordering::Relaxed);
global_metrics().operations[metric].fetch_add(count as u64, Ordering::Relaxed);
// Update latency for realtime metrics (spawn async task)
if (metric) < Metric::LastRealtime as usize {
let metric_index = metric;
tokio::spawn(async move {
globalMetrics.latency[metric_index].add(duration).await;
global_metrics().latency[metric_index].add(duration).await;
});
}
})
@@ -384,8 +384,8 @@ impl Metrics {
Box::new(move || {
let duration = SystemTime::now().duration_since(start).unwrap_or(Duration::from_secs(0));
tokio::spawn(async move {
globalMetrics.actions[a_clone].fetch_add(versions, Ordering::Relaxed);
globalMetrics.actions_latency[a_clone].add(duration).await;
global_metrics().actions[a_clone].fetch_add(versions, Ordering::Relaxed);
global_metrics().actions_latency[a_clone].add(duration).await;
});
})
})
@@ -395,11 +395,11 @@ impl Metrics {
pub async fn inc_time(metric: Metric, duration: Duration) {
let metric = metric as usize;
// Update operation count
globalMetrics.operations[metric].fetch_add(1, Ordering::Relaxed);
global_metrics().operations[metric].fetch_add(1, Ordering::Relaxed);
// Update latency for realtime metrics
if (metric) < Metric::LastRealtime as usize {
globalMetrics.latency[metric].add(duration).await;
global_metrics().latency[metric].add(duration).await;
}
}
@@ -501,7 +501,7 @@ pub fn current_path_updater(disk: &str, initial: &str) -> (UpdateCurrentPathFn,
let tracker_clone = Arc::clone(&tracker);
let disk_clone = disk_name.clone();
tokio::spawn(async move {
globalMetrics.current_paths.write().await.insert(disk_clone, tracker_clone);
global_metrics().current_paths.write().await.insert(disk_clone, tracker_clone);
});
let update_fn = {
@@ -520,7 +520,7 @@ pub fn current_path_updater(disk: &str, initial: &str) -> (UpdateCurrentPathFn,
Arc::new(move || -> Pin<Box<dyn std::future::Future<Output = ()> + Send>> {
let disk_name = disk_name.clone();
Box::pin(async move {
globalMetrics.current_paths.write().await.remove(&disk_name);
global_metrics().current_paths.write().await.remove(&disk_name);
})
})
};

View File

@@ -16,8 +16,8 @@
//! This module defines the configuration for audit systems, including
//! webhook and MQTT audit-related settings.
pub(crate) mod mqtt;
pub(crate) mod webhook;
mod mqtt;
mod webhook;
pub use mqtt::*;
pub use webhook::*;

View File

@@ -145,14 +145,14 @@ pub const DEFAULT_LOG_ROTATION_TIME: &str = "hour";
/// It is used to keep the logs of the application.
/// Default value: 30
/// Environment variable: RUSTFS_OBS_LOG_KEEP_FILES
pub const DEFAULT_LOG_KEEP_FILES: u16 = 30;
pub const DEFAULT_LOG_KEEP_FILES: usize = 30;
/// Default log local logging enabled for rustfs
/// This is the default log local logging enabled for rustfs.
/// It is used to enable or disable local logging of the application.
/// Default value: false
/// Environment variable: RUSTFS_OBS_LOCAL_LOGGING_ENABLED
pub const DEFAULT_LOG_LOCAL_LOGGING_ENABLED: bool = false;
/// Environment variable: RUSTFS_OBS_LOGL_STDOUT_ENABLED
pub const DEFAULT_OBS_LOG_STDOUT_ENABLED: bool = false;
/// Constant representing 1 Kibibyte (1024 bytes)
/// Default value: 1024

View File

@@ -12,30 +12,39 @@
// See the License for the specific language governing permissions and
// limitations under the License.
/// Profiler related environment variable names and default values
pub const ENV_ENABLE_PROFILING: &str = "RUSTFS_ENABLE_PROFILING";
// CPU profiling
pub const ENV_CPU_MODE: &str = "RUSTFS_PROF_CPU_MODE"; // off|continuous|periodic
/// Frequency of CPU profiling samples
pub const ENV_CPU_FREQ: &str = "RUSTFS_PROF_CPU_FREQ";
/// Interval between CPU profiling sessions (for periodic mode)
pub const ENV_CPU_INTERVAL_SECS: &str = "RUSTFS_PROF_CPU_INTERVAL_SECS";
/// Duration of each CPU profiling session (for periodic mode)
pub const ENV_CPU_DURATION_SECS: &str = "RUSTFS_PROF_CPU_DURATION_SECS";
// Memory profiling (jemalloc)
/// Memory profiling (jemalloc)
pub const ENV_MEM_PERIODIC: &str = "RUSTFS_PROF_MEM_PERIODIC";
/// Interval between memory profiling snapshots (for periodic mode)
pub const ENV_MEM_INTERVAL_SECS: &str = "RUSTFS_PROF_MEM_INTERVAL_SECS";
// Output directory
/// Output directory
pub const ENV_OUTPUT_DIR: &str = "RUSTFS_PROF_OUTPUT_DIR";
// Defaults
/// Defaults for profiler settings
pub const DEFAULT_ENABLE_PROFILING: bool = false;
/// CPU profiling
pub const DEFAULT_CPU_MODE: &str = "off";
/// Frequency of CPU profiling samples
pub const DEFAULT_CPU_FREQ: usize = 100;
/// Interval between CPU profiling sessions (for periodic mode)
pub const DEFAULT_CPU_INTERVAL_SECS: u64 = 300;
/// Duration of each CPU profiling session (for periodic mode)
pub const DEFAULT_CPU_DURATION_SECS: u64 = 60;
/// Memory profiling (jemalloc)
pub const DEFAULT_MEM_PERIODIC: bool = false;
/// Interval between memory profiling snapshots (for periodic mode)
pub const DEFAULT_MEM_INTERVAL_SECS: u64 = 300;
/// Output directory
pub const DEFAULT_OUTPUT_DIR: &str = ".";

View File

@@ -0,0 +1,19 @@
// Copyright 2024 RustFS Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/// Metrics collection interval in milliseconds for system metrics (CPU, memory, disk, network).
pub const DEFAULT_METRICS_SYSTEM_INTERVAL_MS: u64 = 30000;
/// Environment variable for setting the metrics collection interval for system metrics.
pub const ENV_OBS_METRICS_SYSTEM_INTERVAL_MS: &str = "RUSTFS_OBS_METRICS_SYSTEM_INTERVAL_MS";

View File

@@ -14,7 +14,13 @@
// Observability Keys
mod metrics;
pub use metrics::*;
pub const ENV_OBS_ENDPOINT: &str = "RUSTFS_OBS_ENDPOINT";
pub const ENV_OBS_TRACE_ENDPOINT: &str = "RUSTFS_OBS_TRACE_ENDPOINT";
pub const ENV_OBS_METRIC_ENDPOINT: &str = "RUSTFS_OBS_METRIC_ENDPOINT";
pub const ENV_OBS_LOG_ENDPOINT: &str = "RUSTFS_OBS_LOG_ENDPOINT";
pub const ENV_OBS_USE_STDOUT: &str = "RUSTFS_OBS_USE_STDOUT";
pub const ENV_OBS_SAMPLE_RATIO: &str = "RUSTFS_OBS_SAMPLE_RATIO";
pub const ENV_OBS_METER_INTERVAL: &str = "RUSTFS_OBS_METER_INTERVAL";
@@ -22,7 +28,7 @@ pub const ENV_OBS_SERVICE_NAME: &str = "RUSTFS_OBS_SERVICE_NAME";
pub const ENV_OBS_SERVICE_VERSION: &str = "RUSTFS_OBS_SERVICE_VERSION";
pub const ENV_OBS_ENVIRONMENT: &str = "RUSTFS_OBS_ENVIRONMENT";
pub const ENV_OBS_LOGGER_LEVEL: &str = "RUSTFS_OBS_LOGGER_LEVEL";
pub const ENV_OBS_LOCAL_LOGGING_ENABLED: &str = "RUSTFS_OBS_LOCAL_LOGGING_ENABLED";
pub const ENV_OBS_LOG_STDOUT_ENABLED: &str = "RUSTFS_OBS_LOG_STDOUT_ENABLED";
pub const ENV_OBS_LOG_DIRECTORY: &str = "RUSTFS_OBS_LOG_DIRECTORY";
pub const ENV_OBS_LOG_FILENAME: &str = "RUSTFS_OBS_LOG_FILENAME";
pub const ENV_OBS_LOG_ROTATION_SIZE_MB: &str = "RUSTFS_OBS_LOG_ROTATION_SIZE_MB";
@@ -47,12 +53,6 @@ pub const DEFAULT_OBS_LOG_MESSAGE_CAPA: usize = 32768;
/// Default values for flush interval in milliseconds
pub const DEFAULT_OBS_LOG_FLUSH_MS: u64 = 200;
/// Audit logger queue capacity environment variable key
pub const ENV_AUDIT_LOGGER_QUEUE_CAPACITY: &str = "RUSTFS_AUDIT_LOGGER_QUEUE_CAPACITY";
/// Default values for observability configuration
pub const DEFAULT_AUDIT_LOGGER_QUEUE_CAPACITY: usize = 10000;
/// Default values for observability configuration
// ### Supported Environment Values
// - `production` - Secure file-only logging
@@ -71,6 +71,9 @@ mod tests {
#[test]
fn test_env_keys() {
assert_eq!(ENV_OBS_ENDPOINT, "RUSTFS_OBS_ENDPOINT");
assert_eq!(ENV_OBS_TRACE_ENDPOINT, "RUSTFS_OBS_TRACE_ENDPOINT");
assert_eq!(ENV_OBS_METRIC_ENDPOINT, "RUSTFS_OBS_METRIC_ENDPOINT");
assert_eq!(ENV_OBS_LOG_ENDPOINT, "RUSTFS_OBS_LOG_ENDPOINT");
assert_eq!(ENV_OBS_USE_STDOUT, "RUSTFS_OBS_USE_STDOUT");
assert_eq!(ENV_OBS_SAMPLE_RATIO, "RUSTFS_OBS_SAMPLE_RATIO");
assert_eq!(ENV_OBS_METER_INTERVAL, "RUSTFS_OBS_METER_INTERVAL");
@@ -78,18 +81,16 @@ mod tests {
assert_eq!(ENV_OBS_SERVICE_VERSION, "RUSTFS_OBS_SERVICE_VERSION");
assert_eq!(ENV_OBS_ENVIRONMENT, "RUSTFS_OBS_ENVIRONMENT");
assert_eq!(ENV_OBS_LOGGER_LEVEL, "RUSTFS_OBS_LOGGER_LEVEL");
assert_eq!(ENV_OBS_LOCAL_LOGGING_ENABLED, "RUSTFS_OBS_LOCAL_LOGGING_ENABLED");
assert_eq!(ENV_OBS_LOG_STDOUT_ENABLED, "RUSTFS_OBS_LOG_STDOUT_ENABLED");
assert_eq!(ENV_OBS_LOG_DIRECTORY, "RUSTFS_OBS_LOG_DIRECTORY");
assert_eq!(ENV_OBS_LOG_FILENAME, "RUSTFS_OBS_LOG_FILENAME");
assert_eq!(ENV_OBS_LOG_ROTATION_SIZE_MB, "RUSTFS_OBS_LOG_ROTATION_SIZE_MB");
assert_eq!(ENV_OBS_LOG_ROTATION_TIME, "RUSTFS_OBS_LOG_ROTATION_TIME");
assert_eq!(ENV_OBS_LOG_KEEP_FILES, "RUSTFS_OBS_LOG_KEEP_FILES");
assert_eq!(ENV_AUDIT_LOGGER_QUEUE_CAPACITY, "RUSTFS_AUDIT_LOGGER_QUEUE_CAPACITY");
}
#[test]
fn test_default_values() {
assert_eq!(DEFAULT_AUDIT_LOGGER_QUEUE_CAPACITY, 10000);
assert_eq!(DEFAULT_OBS_ENVIRONMENT_PRODUCTION, "production");
assert_eq!(DEFAULT_OBS_ENVIRONMENT_DEVELOPMENT, "development");
assert_eq!(DEFAULT_OBS_ENVIRONMENT_TEST, "test");

View File

@@ -29,7 +29,7 @@ documentation = "https://docs.rs/rustfs-crypto/latest/rustfs_crypto/"
workspace = true
[dependencies]
aes-gcm = { workspace = true, features = ["std"], optional = true }
aes-gcm = { workspace = true, optional = true }
argon2 = { workspace = true, features = ["std"], optional = true }
cfg-if = { workspace = true }
chacha20poly1305 = { workspace = true, optional = true }

View File

@@ -19,127 +19,37 @@ pub fn decrypt_data(password: &[u8], data: &[u8]) -> Result<Vec<u8>, crate::Erro
use aes_gcm::{Aes256Gcm, KeyInit as _};
use chacha20poly1305::ChaCha20Poly1305;
// 32: salt
// 1: id
// 12: nonce
const HEADER_LENGTH: usize = 45;
if data.len() < HEADER_LENGTH {
return Err(Error::ErrUnexpectedHeader);
}
let (salt, id, nonce) = (&data[..32], ID::try_from(data[32])?, &data[33..45]);
let data = &data[HEADER_LENGTH..];
let (salt, id, nonce_slice) = (&data[..32], ID::try_from(data[32])?, &data[33..45]);
let body = &data[HEADER_LENGTH..];
match id {
ID::Argon2idChaCHa20Poly1305 => {
let key = id.get_key(password, salt)?;
decrypt(ChaCha20Poly1305::new_from_slice(&key)?, nonce, data)
decrypt(ChaCha20Poly1305::new_from_slice(&key)?, nonce_slice, body)
}
_ => {
let key = id.get_key(password, salt)?;
decrypt(Aes256Gcm::new_from_slice(&key)?, nonce, data)
decrypt(Aes256Gcm::new_from_slice(&key)?, nonce_slice, body)
}
}
}
// use argon2::{Argon2, PasswordHasher};
// use argon2::password_hash::{SaltString};
// use aes_gcm::{Aes256Gcm, Key, Nonce}; // For AES-GCM
// use chacha20poly1305::{ChaCha20Poly1305, Key as ChaChaKey, Nonce as ChaChaNonce}; // For ChaCha20
// use pbkdf2::pbkdf2;
// use sha2::Sha256;
// use std::io::{self, Read};
// use thiserror::Error;
// #[derive(Debug, Error)]
// pub enum DecryptError {
// #[error("unexpected header")]
// UnexpectedHeader,
// #[error("invalid encryption algorithm ID")]
// InvalidAlgorithmId,
// #[error("IO error")]
// Io(#[from] io::Error),
// #[error("decryption error")]
// DecryptionError,
// }
// pub fn decrypt_data2<R: Read>(password: &str, mut data: R) -> Result<Vec<u8>, DecryptError> {
// // Parse the stream header
// let mut hdr = [0u8; 32 + 1 + 8];
// if data.read_exact(&mut hdr).is_err() {
// return Err(DecryptError::UnexpectedHeader);
// }
// let salt = &hdr[0..32];
// let id = hdr[32];
// let nonce = &hdr[33..41];
// let key = match id {
// // Argon2id + AES-GCM
// 0x01 => {
// let salt = SaltString::encode_b64(salt).map_err(|_| DecryptError::DecryptionError)?;
// let argon2 = Argon2::default();
// let hashed_key = argon2.hash_password(password.as_bytes(), &salt)
// .map_err(|_| DecryptError::DecryptionError)?;
// hashed_key.hash.unwrap().as_bytes().to_vec()
// }
// // Argon2id + ChaCha20Poly1305
// 0x02 => {
// let salt = SaltString::encode_b64(salt).map_err(|_| DecryptError::DecryptionError)?;
// let argon2 = Argon2::default();
// let hashed_key = argon2.hash_password(password.as_bytes(), &salt)
// .map_err(|_| DecryptError::DecryptionError)?;
// hashed_key.hash.unwrap().as_bytes().to_vec()
// }
// // PBKDF2 + AES-GCM
// // 0x03 => {
// // let mut key = [0u8; 32];
// // pbkdf2::<Sha256>(password.as_bytes(), salt, 10000, &mut key);
// // key.to_vec()
// // }
// _ => return Err(DecryptError::InvalidAlgorithmId),
// };
// // Decrypt data using the corresponding cipher
// let mut encrypted_data = Vec::new();
// data.read_to_end(&mut encrypted_data)?;
// let plaintext = match id {
// 0x01 => {
// let cipher = Aes256Gcm::new(Key::from_slice(&key));
// let nonce = Nonce::from_slice(nonce);
// cipher
// .decrypt(nonce, encrypted_data.as_ref())
// .map_err(|_| DecryptError::DecryptionError)?
// }
// 0x02 => {
// let cipher = ChaCha20Poly1305::new(ChaChaKey::from_slice(&key));
// let nonce = ChaChaNonce::from_slice(nonce);
// cipher
// .decrypt(nonce, encrypted_data.as_ref())
// .map_err(|_| DecryptError::DecryptionError)?
// }
// 0x03 => {
// let cipher = Aes256Gcm::new(Key::from_slice(&key));
// let nonce = Nonce::from_slice(nonce);
// cipher
// .decrypt(nonce, encrypted_data.as_ref())
// .map_err(|_| DecryptError::DecryptionError)?
// }
// _ => return Err(DecryptError::InvalidAlgorithmId),
// };
// Ok(plaintext)
// }
#[cfg(any(test, feature = "crypto"))]
#[inline]
fn decrypt<T: aes_gcm::aead::Aead>(stream: T, nonce: &[u8], data: &[u8]) -> Result<Vec<u8>, crate::Error> {
use crate::error::Error;
stream
.decrypt(aes_gcm::Nonce::from_slice(nonce), data)
.map_err(Error::ErrDecryptFailed)
use aes_gcm::AeadCore;
use aes_gcm::aead::array::Array;
use core::convert::TryFrom;
let nonce_arr: Array<u8, <T as AeadCore>::NonceSize> =
Array::try_from(nonce).map_err(|_| Error::ErrDecryptFailed(aes_gcm::aead::Error))?;
stream.decrypt(&nonce_arr, data).map_err(Error::ErrDecryptFailed)
}
#[cfg(not(any(test, feature = "crypto")))]

View File

@@ -43,7 +43,7 @@ pub fn encrypt_data(password: &[u8], data: &[u8]) -> Result<Vec<u8>, crate::Erro
if native_aes() {
encrypt(Aes256Gcm::new_from_slice(&key)?, &salt, id, data)
} else {
encrypt(ChaCha20Poly1305::new_from_slice(&key)?, &salt, id, data)
encrypt(chacha20poly1305::ChaCha20Poly1305::new_from_slice(&key)?, &salt, id, data)
}
}
}
@@ -56,16 +56,19 @@ fn encrypt<T: aes_gcm::aead::Aead>(
data: &[u8],
) -> Result<Vec<u8>, crate::Error> {
use crate::error::Error;
use aes_gcm::aead::rand_core::OsRng;
use aes_gcm::AeadCore;
use aes_gcm::aead::array::Array;
use rand::RngCore;
let nonce = T::generate_nonce(&mut OsRng);
let mut nonce: Array<u8, <T as AeadCore>::NonceSize> = Array::default();
rand::rng().fill_bytes(&mut nonce);
let encryptor = stream.encrypt(&nonce, data).map_err(Error::ErrEncryptFailed)?;
let mut ciphertext = Vec::with_capacity(salt.len() + 1 + nonce.len() + encryptor.len());
ciphertext.extend_from_slice(salt);
ciphertext.push(id as u8);
ciphertext.extend_from_slice(nonce.as_slice());
ciphertext.extend_from_slice(&nonce);
ciphertext.extend_from_slice(&encryptor);
Ok(ciphertext)

View File

@@ -106,6 +106,7 @@ serde_urlencoded.workspace = true
google-cloud-storage = { workspace = true }
google-cloud-auth = { workspace = true }
aws-config = { workspace = true }
faster-hex = { workspace = true }
[target.'cfg(not(windows))'.dependencies]
nix = { workspace = true }

View File

@@ -34,9 +34,10 @@ use rustfs_protos::{
};
use std::{
collections::{HashMap, HashSet},
time::SystemTime,
time::{Duration, SystemTime},
};
use time::OffsetDateTime;
use tokio::time::timeout;
use tonic::Request;
use tracing::warn;
@@ -44,6 +45,8 @@ use shadow_rs::shadow;
shadow!(build);
const SERVER_PING_TIMEOUT: Duration = Duration::from_secs(1);
// pub const ITEM_OFFLINE: &str = "offline";
// pub const ITEM_INITIALIZING: &str = "initializing";
// pub const ITEM_ONLINE: &str = "online";
@@ -83,42 +86,45 @@ async fn is_server_resolvable(endpoint: &Endpoint) -> Result<()> {
endpoint.url.host_str().unwrap(),
endpoint.url.port().unwrap()
);
let mut fbb = flatbuffers::FlatBufferBuilder::new();
let payload = fbb.create_vector(b"hello world");
let mut builder = PingBodyBuilder::new(&mut fbb);
builder.add_payload(payload);
let root = builder.finish();
fbb.finish(root, None);
let ping_task = async {
let mut fbb = flatbuffers::FlatBufferBuilder::new();
let payload = fbb.create_vector(b"hello world");
let finished_data = fbb.finished_data();
let mut builder = PingBodyBuilder::new(&mut fbb);
builder.add_payload(payload);
let root = builder.finish();
fbb.finish(root, None);
let decoded_payload = flatbuffers::root::<PingBody>(finished_data);
assert!(decoded_payload.is_ok());
let finished_data = fbb.finished_data();
// Create the client
let mut client = node_service_time_out_client(&addr)
let decoded_payload = flatbuffers::root::<PingBody>(finished_data);
assert!(decoded_payload.is_ok());
let mut client = node_service_time_out_client(&addr)
.await
.map_err(|err| Error::other(err.to_string()))?;
let request = Request::new(PingRequest {
version: 1,
body: bytes::Bytes::copy_from_slice(finished_data),
});
let response: PingResponse = client.ping(request).await?.into_inner();
let ping_response_body = flatbuffers::root::<PingBody>(&response.body);
if let Err(e) = ping_response_body {
eprintln!("{e}");
} else {
println!("ping_resp:body(flatbuffer): {ping_response_body:?}");
}
Ok(())
};
timeout(SERVER_PING_TIMEOUT, ping_task)
.await
.map_err(|err| Error::other(err.to_string()))?;
// Build the PingRequest
let request = Request::new(PingRequest {
version: 1,
body: bytes::Bytes::copy_from_slice(finished_data),
});
// Send the request and obtain the response
let response: PingResponse = client.ping(request).await?.into_inner();
// Print the response
let ping_response_body = flatbuffers::root::<PingBody>(&response.body);
if let Err(e) = ping_response_body {
eprintln!("{e}");
} else {
println!("ping_resp:body(flatbuffer): {ping_response_body:?}");
}
Ok(())
.map_err(|_| Error::other("server ping timeout"))?
}
pub async fn get_local_server_property() -> ServerProperties {

View File

@@ -1105,10 +1105,17 @@ impl TargetClient {
Err(e) => match e {
SdkError::ServiceError(oe) => match oe.into_err() {
HeadBucketError::NotFound(_) => Ok(false),
other => Err(other.into()),
other => Err(S3ClientError::new(format!(
"failed to check bucket exists for bucket:{bucket} please check the bucket name and credentials, error:{other:?}"
))),
},
SdkError::DispatchFailure(e) => Err(S3ClientError::new(format!(
"failed to dispatch bucket exists for bucket:{bucket} error:{e:?}"
))),
_ => Err(e.into()),
_ => Err(S3ClientError::new(format!(
"failed to check bucket exists for bucket:{bucket} error:{e:?}"
))),
},
}
}

View File

@@ -18,7 +18,24 @@
#![allow(unused_must_use)]
#![allow(clippy::all)]
use crate::bucket::lifecycle::bucket_lifecycle_audit::{LcAuditEvent, LcEventSrc};
use crate::bucket::lifecycle::lifecycle::{self, ExpirationOptions, Lifecycle, TransitionOptions};
use crate::bucket::lifecycle::tier_last_day_stats::{DailyAllTierStats, LastDayTierStats};
use crate::bucket::lifecycle::tier_sweeper::{Jentry, delete_object_from_remote_tier};
use crate::bucket::object_lock::objectlock_sys::enforce_retention_for_deletion;
use crate::bucket::{metadata_sys::get_lifecycle_config, versioning_sys::BucketVersioningSys};
use crate::client::object_api_utils::new_getobjectreader;
use crate::error::Error;
use crate::error::StorageError;
use crate::error::{error_resp_to_object_err, is_err_object_not_found, is_err_version_not_found, is_network_or_host_down};
use crate::event::name::EventName;
use crate::event_notification::{EventArgs, send_event};
use crate::global::GLOBAL_LocalNodeName;
use crate::global::{GLOBAL_LifecycleSys, GLOBAL_TierConfigMgr, get_global_deployment_id};
use crate::store::ECStore;
use crate::store_api::StorageAPI;
use crate::store_api::{GetObjectReader, HTTPRangeSpec, ObjectInfo, ObjectOptions, ObjectToDelete};
use crate::tier::warm_backend::WarmBackendGetOpts;
use async_channel::{Receiver as A_Receiver, Sender as A_Sender, bounded};
use bytes::BytesMut;
use futures::Future;
@@ -27,10 +44,15 @@ use lazy_static::lazy_static;
use rustfs_common::data_usage::TierStats;
use rustfs_common::heal_channel::rep_has_active_rules;
use rustfs_common::metrics::{IlmAction, Metrics};
use rustfs_filemeta::fileinfo::{NULL_VERSION_ID, RestoreStatusOps, is_restored_object_on_disk};
use rustfs_filemeta::{NULL_VERSION_ID, RestoreStatusOps, is_restored_object_on_disk};
use rustfs_utils::path::encode_dir_object;
use rustfs_utils::string::strings_has_prefix_fold;
use s3s::Body;
use s3s::dto::{
BucketLifecycleConfiguration, DefaultRetention, ReplicationConfiguration, RestoreRequest, RestoreRequestType, RestoreStatus,
ServerSideEncryption, Timestamp,
};
use s3s::header::{X_AMZ_RESTORE, X_AMZ_SERVER_SIDE_ENCRYPTION, X_AMZ_STORAGE_CLASS};
use sha2::{Digest, Sha256};
use std::any::Any;
use std::collections::HashMap;
@@ -47,31 +69,6 @@ use tracing::{debug, error, info};
use uuid::Uuid;
use xxhash_rust::xxh64;
//use rustfs_notify::{BucketNotificationConfig, Event, EventName, LogLevel, NotificationError, init_logger};
//use rustfs_notify::{initialize, notification_system};
use super::bucket_lifecycle_audit::{LcAuditEvent, LcEventSrc};
use super::lifecycle::{self, ExpirationOptions, Lifecycle, TransitionOptions};
use super::tier_last_day_stats::{DailyAllTierStats, LastDayTierStats};
use super::tier_sweeper::{Jentry, delete_object_from_remote_tier};
use crate::bucket::object_lock::objectlock_sys::enforce_retention_for_deletion;
use crate::bucket::{metadata_sys::get_lifecycle_config, versioning_sys::BucketVersioningSys};
use crate::client::object_api_utils::new_getobjectreader;
use crate::error::Error;
use crate::error::{error_resp_to_object_err, is_err_object_not_found, is_err_version_not_found, is_network_or_host_down};
use crate::event::name::EventName;
use crate::event_notification::{EventArgs, send_event};
use crate::global::GLOBAL_LocalNodeName;
use crate::global::{GLOBAL_LifecycleSys, GLOBAL_TierConfigMgr, get_global_deployment_id};
use crate::store::ECStore;
use crate::store_api::StorageAPI;
use crate::store_api::{GetObjectReader, HTTPRangeSpec, ObjectInfo, ObjectOptions, ObjectToDelete};
use crate::tier::warm_backend::WarmBackendGetOpts;
use s3s::dto::{
BucketLifecycleConfiguration, DefaultRetention, ReplicationConfiguration, RestoreRequest, RestoreRequestType, RestoreStatus,
ServerSideEncryption, Timestamp,
};
use s3s::header::{X_AMZ_RESTORE, X_AMZ_SERVER_SIDE_ENCRYPTION, X_AMZ_STORAGE_CLASS};
pub type TimeFn = Arc<dyn Fn() -> Pin<Box<dyn Future<Output = ()> + Send>> + Send + Sync + 'static>;
pub type TraceFn =
Arc<dyn Fn(String, HashMap<String, String>) -> Pin<Box<dyn Future<Output = ()> + Send>> + Send + Sync + 'static>;
@@ -118,10 +115,9 @@ struct ExpiryTask {
impl ExpiryOp for ExpiryTask {
fn op_hash(&self) -> u64 {
let mut hasher = Sha256::new();
let _ = hasher.write(format!("{}", self.obj_info.bucket).as_bytes());
let _ = hasher.write(format!("{}", self.obj_info.name).as_bytes());
hasher.flush();
xxh64::xxh64(hasher.clone().finalize().as_slice(), XXHASH_SEED)
hasher.update(format!("{}", self.obj_info.bucket).as_bytes());
hasher.update(format!("{}", self.obj_info.name).as_bytes());
xxh64::xxh64(hasher.finalize().as_slice(), XXHASH_SEED)
}
fn as_any(&self) -> &dyn Any {
@@ -174,10 +170,9 @@ struct FreeVersionTask(ObjectInfo);
impl ExpiryOp for FreeVersionTask {
fn op_hash(&self) -> u64 {
let mut hasher = Sha256::new();
let _ = hasher.write(format!("{}", self.0.transitioned_object.tier).as_bytes());
let _ = hasher.write(format!("{}", self.0.transitioned_object.name).as_bytes());
hasher.flush();
xxh64::xxh64(hasher.clone().finalize().as_slice(), XXHASH_SEED)
hasher.update(format!("{}", self.0.transitioned_object.tier).as_bytes());
hasher.update(format!("{}", self.0.transitioned_object.name).as_bytes());
xxh64::xxh64(hasher.finalize().as_slice(), XXHASH_SEED)
}
fn as_any(&self) -> &dyn Any {
@@ -194,10 +189,9 @@ struct NewerNoncurrentTask {
impl ExpiryOp for NewerNoncurrentTask {
fn op_hash(&self) -> u64 {
let mut hasher = Sha256::new();
let _ = hasher.write(format!("{}", self.bucket).as_bytes());
let _ = hasher.write(format!("{}", self.versions[0].object_name).as_bytes());
hasher.flush();
xxh64::xxh64(hasher.clone().finalize().as_slice(), XXHASH_SEED)
hasher.update(format!("{}", self.bucket).as_bytes());
hasher.update(format!("{}", self.versions[0].object_name).as_bytes());
xxh64::xxh64(hasher.finalize().as_slice(), XXHASH_SEED)
}
fn as_any(&self) -> &dyn Any {
@@ -418,10 +412,9 @@ struct TransitionTask {
impl ExpiryOp for TransitionTask {
fn op_hash(&self) -> u64 {
let mut hasher = Sha256::new();
let _ = hasher.write(format!("{}", self.obj_info.bucket).as_bytes());
//let _ = hasher.write(format!("{}", self.obj_info.versions[0].object_name).as_bytes());
hasher.flush();
xxh64::xxh64(hasher.clone().finalize().as_slice(), XXHASH_SEED)
hasher.update(format!("{}", self.obj_info.bucket).as_bytes());
// hasher.update(format!("{}", self.obj_info.versions[0].object_name).as_bytes());
xxh64::xxh64(hasher.finalize().as_slice(), XXHASH_SEED)
}
fn as_any(&self) -> &dyn Any {
@@ -483,7 +476,7 @@ impl TransitionState {
.and_then(|s| s.parse::<i64>().ok())
.unwrap_or_else(|| std::cmp::min(num_cpus::get() as i64, 16));
let mut n = max_workers;
let tw = 8; //globalILMConfig.getTransitionWorkers();
let tw = 8; //globalILMConfig.getTransitionWorkers();
if tw > 0 {
n = tw;
}
@@ -763,9 +756,8 @@ pub async fn expire_transitioned_object(
pub fn gen_transition_objname(bucket: &str) -> Result<String, Error> {
let us = Uuid::new_v4().to_string();
let mut hasher = Sha256::new();
let _ = hasher.write(format!("{}/{}", get_global_deployment_id().unwrap_or_default(), bucket).as_bytes());
hasher.flush();
let hash = rustfs_utils::crypto::hex(hasher.clone().finalize().as_slice());
hasher.update(format!("{}/{}", get_global_deployment_id().unwrap_or_default(), bucket).as_bytes());
let hash = rustfs_utils::crypto::hex(hasher.finalize().as_slice());
let obj = format!("{}/{}/{}/{}", &hash[0..16], &us[0..2], &us[2..4], &us);
Ok(obj)
}

View File

@@ -20,7 +20,7 @@
use sha2::{Digest, Sha256};
use std::any::Any;
use std::io::{Cursor, Write};
use std::io::Write;
use xxhash_rust::xxh64;
use super::bucket_lifecycle_ops::{ExpiryOp, GLOBAL_ExpiryState, TransitionedObject};
@@ -128,10 +128,9 @@ pub struct Jentry {
impl ExpiryOp for Jentry {
fn op_hash(&self) -> u64 {
let mut hasher = Sha256::new();
let _ = hasher.write(format!("{}", self.tier_name).as_bytes());
let _ = hasher.write(format!("{}", self.obj_name).as_bytes());
hasher.flush();
xxh64::xxh64(hasher.clone().finalize().as_slice(), XXHASH_SEED)
hasher.update(format!("{}", self.tier_name).as_bytes());
hasher.update(format!("{}", self.obj_name).as_bytes());
xxh64::xxh64(hasher.finalize().as_slice(), XXHASH_SEED)
}
fn as_any(&self) -> &dyn Any {

View File

@@ -12,10 +12,10 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use super::{error::BucketMetadataError, metadata_sys::get_bucket_metadata_sys};
use crate::error::Result;
use super::metadata_sys::get_bucket_metadata_sys;
use crate::error::{Result, StorageError};
use rustfs_policy::policy::{BucketPolicy, BucketPolicyArgs};
use tracing::warn;
use tracing::info;
pub struct PolicySys {}
@@ -24,9 +24,8 @@ impl PolicySys {
match Self::get(args.bucket).await {
Ok(cfg) => return cfg.is_allowed(args),
Err(err) => {
let berr: BucketMetadataError = err.into();
if berr != BucketMetadataError::BucketPolicyNotFound {
warn!("config get err {:?}", berr);
if err != StorageError::ConfigNotFound {
info!("config get err {:?}", err);
}
}
}

View File

@@ -103,6 +103,8 @@ impl ReplicationConfigurationExt for ReplicationConfiguration {
if filter.test_tags(&object_tags) {
rules.push(rule.clone());
}
} else {
rules.push(rule.clone());
}
}

View File

@@ -34,7 +34,7 @@ use rustfs_filemeta::{
};
use rustfs_utils::http::{
AMZ_BUCKET_REPLICATION_STATUS, AMZ_OBJECT_TAGGING, AMZ_TAGGING_DIRECTIVE, CONTENT_ENCODING, HeaderExt as _,
RESERVED_METADATA_PREFIX, RESERVED_METADATA_PREFIX_LOWER, RUSTFS_REPLICATION_AUTUAL_OBJECT_SIZE,
RESERVED_METADATA_PREFIX, RESERVED_METADATA_PREFIX_LOWER, RUSTFS_REPLICATION_ACTUAL_OBJECT_SIZE,
RUSTFS_REPLICATION_RESET_STATUS, SSEC_ALGORITHM_HEADER, SSEC_KEY_HEADER, SSEC_KEY_MD5_HEADER, headers,
};
use rustfs_utils::path::path_join_buf;
@@ -2324,7 +2324,7 @@ async fn replicate_object_with_multipart(
let mut user_metadata = HashMap::new();
user_metadata.insert(
RUSTFS_REPLICATION_AUTUAL_OBJECT_SIZE.to_string(),
RUSTFS_REPLICATION_ACTUAL_OBJECT_SIZE.to_string(),
object_info
.user_defined
.get(&format!("{RESERVED_METADATA_PREFIX}actual-size"))

View File

@@ -19,7 +19,7 @@ use rustfs_filemeta::{MetaCacheEntries, MetaCacheEntry, MetacacheReader, is_io_e
use std::{future::Future, pin::Pin, sync::Arc};
use tokio::spawn;
use tokio_util::sync::CancellationToken;
use tracing::{error, warn};
use tracing::{error, info, warn};
pub type AgreedFn = Box<dyn Fn(MetaCacheEntry) -> Pin<Box<dyn Future<Output = ()> + Send>> + Send + 'static>;
pub type PartialFn =
@@ -99,7 +99,7 @@ pub async fn list_path_raw(rx: CancellationToken, opts: ListPathRawOptions) -> d
match disk.walk_dir(wakl_opts, &mut wr).await {
Ok(_res) => {}
Err(err) => {
error!("walk dir err {:?}", &err);
info!("walk dir err {:?}", &err);
need_fallback = true;
}
}

View File

@@ -21,13 +21,12 @@
use http::HeaderMap;
use s3s::dto::ETag;
use std::pin::Pin;
use std::{collections::HashMap, io::Cursor, sync::Arc};
use tokio::io::BufReader;
use crate::error::ErrorResponse;
use crate::store_api::{GetObjectReader, HTTPRangeSpec, ObjectInfo, ObjectOptions};
use rustfs_filemeta::fileinfo::ObjectPartInfo;
use rustfs_filemeta::ObjectPartInfo;
use rustfs_rio::HashReader;
use s3s::S3ErrorCode;

View File

@@ -277,6 +277,7 @@ pub async fn compute_bucket_usage(store: Arc<ECStore>, bucket_name: &str) -> Res
1000, // max_keys
false, // fetch_owner
None, // start_after
false, // incl_deleted
)
.await?;

View File

@@ -140,6 +140,12 @@ pub enum DiskError {
#[error("io error {0}")]
Io(io::Error),
#[error("source stalled")]
SourceStalled,
#[error("timeout")]
Timeout,
}
impl DiskError {
@@ -366,6 +372,8 @@ impl Clone for DiskError {
DiskError::ErasureWriteQuorum => DiskError::ErasureWriteQuorum,
DiskError::ErasureReadQuorum => DiskError::ErasureReadQuorum,
DiskError::ShortWrite => DiskError::ShortWrite,
DiskError::SourceStalled => DiskError::SourceStalled,
DiskError::Timeout => DiskError::Timeout,
}
}
}
@@ -412,6 +420,8 @@ impl DiskError {
DiskError::ErasureWriteQuorum => 0x25,
DiskError::ErasureReadQuorum => 0x26,
DiskError::ShortWrite => 0x27,
DiskError::SourceStalled => 0x28,
DiskError::Timeout => 0x29,
}
}
@@ -456,6 +466,8 @@ impl DiskError {
0x25 => Some(DiskError::ErasureWriteQuorum),
0x26 => Some(DiskError::ErasureReadQuorum),
0x27 => Some(DiskError::ShortWrite),
0x28 => Some(DiskError::SourceStalled),
0x29 => Some(DiskError::Timeout),
_ => None,
}
}

View File

@@ -806,7 +806,7 @@ impl LocalDisk {
Ok((bytes, modtime))
}
async fn delete_versions_internal(&self, volume: &str, path: &str, fis: &Vec<FileInfo>) -> Result<()> {
async fn delete_versions_internal(&self, volume: &str, path: &str, fis: &[FileInfo]) -> Result<()> {
let volume_dir = self.get_bucket_path(volume)?;
let xlpath = self.get_object_path(volume, format!("{path}/{STORAGE_FORMAT_FILE}").as_str())?;
@@ -820,7 +820,7 @@ impl LocalDisk {
fm.unmarshal_msg(&data)?;
for fi in fis {
for fi in fis.iter() {
let data_dir = match fm.delete_version(fi) {
Ok(res) => res,
Err(err) => {
@@ -1085,16 +1085,9 @@ impl LocalDisk {
*item = "".to_owned();
if entry.ends_with(STORAGE_FORMAT_FILE) {
let metadata = match self
let metadata = self
.read_metadata(self.get_object_path(bucket, format!("{}/{}", &current, &entry).as_str())?)
.await
{
Ok(res) => res,
Err(err) => {
warn!("scan dir read_metadata error, continue {:?}", err);
continue;
}
};
.await?;
let entry = entry.strip_suffix(STORAGE_FORMAT_FILE).unwrap_or_default().to_owned();
let name = entry.trim_end_matches(SLASH_SEPARATOR);
@@ -1143,23 +1136,21 @@ impl LocalDisk {
let name = path_join_buf(&[current.as_str(), entry.as_str()]);
if !dir_stack.is_empty() {
if let Some(pop) = dir_stack.last().cloned() {
if pop < name {
out.write_obj(&MetaCacheEntry {
name: pop.clone(),
..Default::default()
})
.await?;
while let Some(pop) = dir_stack.last().cloned()
&& pop < name
{
out.write_obj(&MetaCacheEntry {
name: pop.clone(),
..Default::default()
})
.await?;
if opts.recursive {
if let Err(er) = Box::pin(self.scan_dir(pop, prefix.clone(), opts, out, objs_returned)).await {
error!("scan_dir err {:?}", er);
}
}
dir_stack.pop();
if opts.recursive {
if let Err(er) = Box::pin(self.scan_dir(pop, prefix.clone(), opts, out, objs_returned)).await {
error!("scan_dir err {:?}", er);
}
}
dir_stack.pop();
}
let mut meta = MetaCacheEntry {
@@ -1196,9 +1187,6 @@ impl LocalDisk {
// }
}
Err(err) => {
if err == Error::DiskNotDir {
continue;
}
if err == Error::FileNotFound || err == Error::IsNotRegular {
// NOT an object, append to stack (with slash)
// If dirObject, but no metadata (which is unexpected) we skip it.
@@ -1213,7 +1201,7 @@ impl LocalDisk {
};
}
while let Some(dir) = dir_stack.last() {
while let Some(dir) = dir_stack.pop() {
if opts.limit > 0 && *objs_returned >= opts.limit {
return Ok(());
}
@@ -1225,11 +1213,10 @@ impl LocalDisk {
.await?;
if opts.recursive {
if let Err(er) = Box::pin(self.scan_dir(dir.clone(), prefix.clone(), opts, out, objs_returned)).await {
if let Err(er) = Box::pin(self.scan_dir(dir, prefix.clone(), opts, out, objs_returned)).await {
warn!("scan_dir err {:?}", &er);
}
}
dir_stack.pop();
}
Ok(())
@@ -2313,7 +2300,6 @@ impl DiskAPI for LocalDisk {
let buf = match self.read_all_data(volume, &volume_dir, &xl_path).await {
Ok(res) => res,
Err(err) => {
//
if err != DiskError::FileNotFound {
return Err(err);
}

View File

@@ -25,6 +25,7 @@ use tracing::warn;
use super::error::DiskError;
/// Check path length according to OS limits.
pub fn check_path_length(path_name: &str) -> Result<()> {
// Apple OS X path length is limited to 1016
if cfg!(target_os = "macos") && path_name.len() > 1016 {
@@ -64,6 +65,10 @@ pub fn check_path_length(path_name: &str) -> Result<()> {
Ok(())
}
/// Check if the given disk path is the root disk.
/// On Windows, always return false.
/// On Unix, compare the disk paths.
#[tracing::instrument(level = "debug", skip_all)]
pub fn is_root_disk(disk_path: &str, root_disk: &str) -> Result<bool> {
if cfg!(target_os = "windows") {
return Ok(false);
@@ -72,6 +77,8 @@ pub fn is_root_disk(disk_path: &str, root_disk: &str) -> Result<bool> {
rustfs_utils::os::same_disk(disk_path, root_disk).map_err(|e| to_file_error(e).into())
}
/// Create a directory and all its parent components if they are missing.
#[tracing::instrument(level = "debug", skip_all)]
pub async fn make_dir_all(path: impl AsRef<Path>, base_dir: impl AsRef<Path>) -> Result<()> {
check_path_length(path.as_ref().to_string_lossy().to_string().as_str())?;
@@ -82,11 +89,16 @@ pub async fn make_dir_all(path: impl AsRef<Path>, base_dir: impl AsRef<Path>) ->
Ok(())
}
/// Check if a directory is empty.
/// Only reads one entry to determine if the directory is empty.
#[tracing::instrument(level = "debug", skip_all)]
pub async fn is_empty_dir(path: impl AsRef<Path>) -> bool {
read_dir(path.as_ref(), 1).await.is_ok_and(|v| v.is_empty())
}
// read_dir count read limit. when count == 0 unlimit.
/// Return file names in the directory.
#[tracing::instrument(level = "debug", skip_all)]
pub async fn read_dir(path: impl AsRef<Path>, count: i32) -> std::io::Result<Vec<String>> {
let mut entries = fs::read_dir(path.as_ref()).await?;
@@ -197,6 +209,10 @@ pub async fn reliable_mkdir_all(path: impl AsRef<Path>, base_dir: impl AsRef<Pat
Ok(())
}
/// Create a directory and all its parent components if they are missing.
/// Without recursion support, fall back to create_dir_all
/// This function will not create directories under base_dir.
#[tracing::instrument(level = "debug", skip_all)]
pub async fn os_mkdir_all(dir_path: impl AsRef<Path>, base_dir: impl AsRef<Path>) -> io::Result<()> {
if !base_dir.as_ref().to_string_lossy().is_empty() && base_dir.as_ref().starts_with(dir_path.as_ref()) {
return Ok(());
@@ -225,6 +241,9 @@ pub async fn os_mkdir_all(dir_path: impl AsRef<Path>, base_dir: impl AsRef<Path>
Ok(())
}
/// Check if a file exists.
/// Returns true if the file exists, false otherwise.
#[tracing::instrument(level = "debug", skip_all)]
pub fn file_exists(path: impl AsRef<Path>) -> bool {
std::fs::metadata(path.as_ref()).map(|_| true).unwrap_or(false)
}

View File

@@ -12,25 +12,23 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::{HashMap, HashSet};
use chrono::Utc;
use rustfs_common::{
globals::{GLOBAL_Local_Node_Name, GLOBAL_Rustfs_Addr},
heal_channel::DriveState,
metrics::globalMetrics,
};
use rustfs_madmin::metrics::{DiskIOStats, DiskMetric, RealtimeMetrics};
use rustfs_utils::os::get_drive_stats;
use serde::{Deserialize, Serialize};
use tracing::{debug, info};
use crate::{
admin_server_info::get_local_server_property,
new_object_layer_fn,
store_api::StorageAPI,
// utils::os::get_drive_stats,
};
use chrono::Utc;
use rustfs_common::{
globals::{GLOBAL_Local_Node_Name, GLOBAL_Rustfs_Addr},
heal_channel::DriveState,
metrics::global_metrics,
};
use rustfs_madmin::metrics::{DiskIOStats, DiskMetric, RealtimeMetrics};
use rustfs_utils::os::get_drive_stats;
use serde::{Deserialize, Serialize};
use std::collections::{HashMap, HashSet};
use tracing::{debug, info};
#[derive(Debug, Default, Serialize, Deserialize)]
pub struct CollectMetricsOpts {
@@ -118,7 +116,7 @@ pub async fn collect_local_metrics(types: MetricType, opts: &CollectMetricsOpts)
if types.contains(&MetricType::SCANNER) {
debug!("start get scanner metrics");
let metrics = globalMetrics.report().await;
let metrics = global_metrics().report().await;
real_time_metrics.aggregated.scanner = Some(metrics);
}

View File

@@ -26,9 +26,11 @@ use rustfs_madmin::metrics::RealtimeMetrics;
use rustfs_madmin::net::NetInfo;
use rustfs_madmin::{ItemState, ServerProperties};
use std::collections::hash_map::DefaultHasher;
use std::future::Future;
use std::hash::{Hash, Hasher};
use std::sync::OnceLock;
use std::time::SystemTime;
use std::time::{Duration, SystemTime};
use tokio::time::timeout;
use tracing::{error, warn};
lazy_static! {
@@ -220,24 +222,21 @@ impl NotificationSys {
pub async fn server_info(&self) -> Vec<ServerProperties> {
let mut futures = Vec::with_capacity(self.peer_clients.len());
let endpoints = get_global_endpoints();
let peer_timeout = Duration::from_secs(2);
for client in self.peer_clients.iter() {
let endpoints = endpoints.clone();
futures.push(async move {
if let Some(client) = client {
match client.server_info().await {
Ok(info) => info,
Err(_) => ServerProperties {
uptime: SystemTime::now()
.duration_since(*GLOBAL_BOOT_TIME.get().unwrap())
.unwrap_or_default()
.as_secs(),
version: get_commit_id(),
endpoint: client.host.to_string(),
state: ItemState::Offline.to_string().to_owned(),
disks: get_offline_disks(&client.host.to_string(), &get_global_endpoints()),
..Default::default()
},
}
let host = client.host.to_string();
call_peer_with_timeout(
peer_timeout,
&host,
|| client.server_info(),
|| offline_server_properties(&host, &endpoints),
)
.await
} else {
ServerProperties::default()
}
@@ -694,6 +693,43 @@ impl NotificationSys {
}
}
async fn call_peer_with_timeout<F, Fut>(
timeout_dur: Duration,
host_label: &str,
op: F,
fallback: impl FnOnce() -> ServerProperties,
) -> ServerProperties
where
F: FnOnce() -> Fut,
Fut: Future<Output = Result<ServerProperties>> + Send,
{
match timeout(timeout_dur, op()).await {
Ok(Ok(info)) => info,
Ok(Err(err)) => {
warn!("peer {host_label} server_info failed: {err}");
fallback()
}
Err(_) => {
warn!("peer {host_label} server_info timed out after {:?}", timeout_dur);
fallback()
}
}
}
fn offline_server_properties(host: &str, endpoints: &EndpointServerPools) -> ServerProperties {
ServerProperties {
uptime: SystemTime::now()
.duration_since(*GLOBAL_BOOT_TIME.get().unwrap())
.unwrap_or_default()
.as_secs(),
version: get_commit_id(),
endpoint: host.to_string(),
state: ItemState::Offline.to_string().to_owned(),
disks: get_offline_disks(host, endpoints),
..Default::default()
}
}
fn get_offline_disks(offline_host: &str, endpoints: &EndpointServerPools) -> Vec<rustfs_madmin::Disk> {
let mut offline_disks = Vec::new();
@@ -714,3 +750,57 @@ fn get_offline_disks(offline_host: &str, endpoints: &EndpointServerPools) -> Vec
offline_disks
}
#[cfg(test)]
mod tests {
use super::*;
fn build_props(endpoint: &str) -> ServerProperties {
ServerProperties {
endpoint: endpoint.to_string(),
..Default::default()
}
}
#[tokio::test]
async fn call_peer_with_timeout_returns_value_when_fast() {
let result = call_peer_with_timeout(
Duration::from_millis(50),
"peer-1",
|| async { Ok::<_, Error>(build_props("fast")) },
|| build_props("fallback"),
)
.await;
assert_eq!(result.endpoint, "fast");
}
#[tokio::test]
async fn call_peer_with_timeout_uses_fallback_on_error() {
let result = call_peer_with_timeout(
Duration::from_millis(50),
"peer-2",
|| async { Err::<ServerProperties, _>(Error::other("boom")) },
|| build_props("fallback"),
)
.await;
assert_eq!(result.endpoint, "fallback");
}
#[tokio::test]
async fn call_peer_with_timeout_uses_fallback_on_timeout() {
let result = call_peer_with_timeout(
Duration::from_millis(5),
"peer-3",
|| async {
tokio::time::sleep(Duration::from_millis(25)).await;
Ok::<_, Error>(build_props("slow"))
},
|| build_props("fallback"),
)
.await;
assert_eq!(result.endpoint, "fallback");
}
}

View File

@@ -15,7 +15,7 @@
use crate::global::get_global_action_cred;
use base64::Engine as _;
use base64::engine::general_purpose;
use hmac::{Hmac, Mac};
use hmac::{Hmac, KeyInit, Mac};
use http::HeaderMap;
use http::HeaderValue;
use http::Method;

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::path::PathBuf;
use std::{path::PathBuf, time::Duration};
use bytes::Bytes;
use futures::lock::Mutex;
@@ -40,7 +40,7 @@ use crate::{
use rustfs_filemeta::{FileInfo, ObjectPartInfo, RawFileInfo};
use rustfs_protos::proto_gen::node_service::RenamePartRequest;
use rustfs_rio::{HttpReader, HttpWriter};
use tokio::io::AsyncWrite;
use tokio::{io::AsyncWrite, net::TcpStream, time::timeout};
use tonic::Request;
use tracing::info;
use uuid::Uuid;
@@ -54,6 +54,8 @@ pub struct RemoteDisk {
endpoint: Endpoint,
}
const REMOTE_DISK_ONLINE_PROBE_TIMEOUT: Duration = Duration::from_millis(750);
impl RemoteDisk {
pub async fn new(ep: &Endpoint, _opt: &DiskOption) -> Result<Self> {
// let root = fs::canonicalize(ep.url.path()).await?;
@@ -83,11 +85,19 @@ impl DiskAPI for RemoteDisk {
#[tracing::instrument(skip(self))]
async fn is_online(&self) -> bool {
// TODO: connection status tracking
if node_service_time_out_client(&self.addr).await.is_ok() {
return true;
let Some(host) = self.endpoint.url.host_str().map(|host| host.to_string()) else {
return false;
};
let port = self.endpoint.url.port_or_known_default().unwrap_or(80);
match timeout(REMOTE_DISK_ONLINE_PROBE_TIMEOUT, TcpStream::connect((host, port))).await {
Ok(Ok(stream)) => {
drop(stream);
true
}
_ => false,
}
false
}
#[tracing::instrument(skip(self))]
@@ -957,6 +967,7 @@ impl DiskAPI for RemoteDisk {
#[cfg(test)]
mod tests {
use super::*;
use tokio::net::TcpListener;
use uuid::Uuid;
#[tokio::test]
@@ -1040,6 +1051,58 @@ mod tests {
assert!(path.to_string_lossy().contains("storage"));
}
#[tokio::test]
async fn test_remote_disk_is_online_detects_active_listener() {
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let addr = listener.local_addr().unwrap();
let url = url::Url::parse(&format!("http://{}:{}/data/rustfs0", addr.ip(), addr.port())).unwrap();
let endpoint = Endpoint {
url,
is_local: false,
pool_idx: 0,
set_idx: 0,
disk_idx: 0,
};
let disk_option = DiskOption {
cleanup: false,
health_check: false,
};
let remote_disk = RemoteDisk::new(&endpoint, &disk_option).await.unwrap();
assert!(remote_disk.is_online().await);
drop(listener);
}
#[tokio::test]
async fn test_remote_disk_is_online_detects_missing_listener() {
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let addr = listener.local_addr().unwrap();
let ip = addr.ip();
let port = addr.port();
drop(listener);
let url = url::Url::parse(&format!("http://{ip}:{port}/data/rustfs0")).unwrap();
let endpoint = Endpoint {
url,
is_local: false,
pool_idx: 0,
set_idx: 0,
disk_idx: 0,
};
let disk_option = DiskOption {
cleanup: false,
health_check: false,
};
let remote_disk = RemoteDisk::new(&endpoint, &disk_option).await.unwrap();
assert!(!remote_disk.is_online().await);
}
#[tokio::test]
async fn test_remote_disk_disk_id() {
let url = url::Url::parse("http://remote-server:9000").unwrap();

View File

@@ -68,6 +68,7 @@ use md5::{Digest as Md5Digest, Md5};
use rand::{Rng, seq::SliceRandom};
use regex::Regex;
use rustfs_common::heal_channel::{DriveState, HealChannelPriority, HealItemType, HealOpts, HealScanMode, send_heal_disk};
use rustfs_config::MI_B;
use rustfs_filemeta::{
FileInfo, FileMeta, FileMetaShallowVersion, MetaCacheEntries, MetaCacheEntry, MetadataResolutionParams, ObjectPartInfo,
RawFileInfo, ReplicationStatusType, VersionPurgeStatusType, file_info_from_raw, merge_file_meta_versions,
@@ -88,7 +89,7 @@ use s3s::header::X_AMZ_RESTORE;
use sha2::{Digest, Sha256};
use std::hash::Hash;
use std::mem::{self};
use std::time::SystemTime;
use std::time::{Instant, SystemTime};
use std::{
collections::{HashMap, HashSet},
io::{Cursor, Write},
@@ -104,15 +105,17 @@ use tokio::{
use tokio::{
select,
sync::mpsc::{self, Sender},
time::interval,
time::{interval, timeout},
};
use tokio_util::sync::CancellationToken;
use tracing::error;
use tracing::{debug, info, warn};
use uuid::Uuid;
pub const DEFAULT_READ_BUFFER_SIZE: usize = 1024 * 1024;
pub const DEFAULT_READ_BUFFER_SIZE: usize = MI_B; // 1 MiB = 1024 * 1024;
pub const MAX_PARTS_COUNT: usize = 10000;
const DISK_ONLINE_TIMEOUT: Duration = Duration::from_secs(1);
const DISK_HEALTH_CACHE_TTL: Duration = Duration::from_millis(750);
#[derive(Clone, Debug)]
pub struct SetDisks {
@@ -125,6 +128,23 @@ pub struct SetDisks {
pub set_index: usize,
pub pool_index: usize,
pub format: FormatV3,
disk_health_cache: Arc<RwLock<Vec<Option<DiskHealthEntry>>>>,
}
#[derive(Clone, Debug)]
struct DiskHealthEntry {
last_check: Instant,
online: bool,
}
impl DiskHealthEntry {
fn cached_value(&self) -> Option<bool> {
if self.last_check.elapsed() <= DISK_HEALTH_CACHE_TTL {
Some(self.online)
} else {
None
}
}
}
impl SetDisks {
@@ -150,8 +170,60 @@ impl SetDisks {
pool_index,
format,
set_endpoints,
disk_health_cache: Arc::new(RwLock::new(Vec::new())),
})
}
async fn cached_disk_health(&self, index: usize) -> Option<bool> {
let cache = self.disk_health_cache.read().await;
cache
.get(index)
.and_then(|entry| entry.as_ref().and_then(|state| state.cached_value()))
}
async fn update_disk_health(&self, index: usize, online: bool) {
let mut cache = self.disk_health_cache.write().await;
if cache.len() <= index {
cache.resize(index + 1, None);
}
cache[index] = Some(DiskHealthEntry {
last_check: Instant::now(),
online,
});
}
async fn is_disk_online_cached(&self, index: usize, disk: &DiskStore) -> bool {
if let Some(online) = self.cached_disk_health(index).await {
return online;
}
let disk_clone = disk.clone();
let online = timeout(DISK_ONLINE_TIMEOUT, async move { disk_clone.is_online().await })
.await
.unwrap_or(false);
self.update_disk_health(index, online).await;
online
}
async fn filter_online_disks(&self, disks: Vec<Option<DiskStore>>) -> (Vec<Option<DiskStore>>, usize) {
let mut filtered = Vec::with_capacity(disks.len());
let mut online_count = 0;
for (idx, disk) in disks.into_iter().enumerate() {
if let Some(disk_store) = disk {
if self.is_disk_online_cached(idx, &disk_store).await {
filtered.push(Some(disk_store));
online_count += 1;
} else {
filtered.push(None);
}
} else {
filtered.push(None);
}
}
(filtered, online_count)
}
fn format_lock_error(&self, bucket: &str, object: &str, mode: &str, err: &LockResult) -> String {
match err {
LockResult::Timeout => {
@@ -187,25 +259,9 @@ impl SetDisks {
}
async fn get_online_disks(&self) -> Vec<Option<DiskStore>> {
let mut disks = self.get_disks_internal().await;
// TODO: diskinfo filter online
let mut new_disk = Vec::with_capacity(disks.len());
for disk in disks.iter() {
if let Some(d) = disk {
if d.is_online().await {
new_disk.push(disk.clone());
}
}
}
let mut rng = rand::rng();
disks.shuffle(&mut rng);
new_disk
let disks = self.get_disks_internal().await;
let (filtered, _) = self.filter_online_disks(disks).await;
filtered.into_iter().filter(|disk| disk.is_some()).collect()
}
async fn get_online_local_disks(&self) -> Vec<Option<DiskStore>> {
let mut disks = self.get_online_disks().await;
@@ -1260,21 +1316,38 @@ impl SetDisks {
for (i, meta) in metas.iter().enumerate() {
if !meta.is_valid() {
debug!(
index = i,
valid = false,
version_id = ?meta.version_id,
mod_time = ?meta.mod_time,
"find_file_info_in_quorum: skipping invalid meta"
);
continue;
}
debug!(
index = i,
valid = true,
version_id = ?meta.version_id,
mod_time = ?meta.mod_time,
deleted = meta.deleted,
size = meta.size,
"find_file_info_in_quorum: inspecting meta"
);
let etag_only = mod_time.is_none() && etag.is_some() && meta.get_etag().is_some_and(|v| &v == etag.as_ref().unwrap());
let mod_valid = mod_time == &meta.mod_time;
if etag_only || mod_valid {
for part in meta.parts.iter() {
let _ = hasher.write(format!("part.{}", part.number).as_bytes())?;
let _ = hasher.write(format!("part.{}", part.size).as_bytes())?;
hasher.update(format!("part.{}", part.number).as_bytes());
hasher.update(format!("part.{}", part.size).as_bytes());
}
if !meta.deleted && meta.size != 0 {
let _ = hasher.write(format!("{}+{}", meta.erasure.data_blocks, meta.erasure.parity_blocks).as_bytes())?;
let _ = hasher.write(format!("{:?}", meta.erasure.distribution).as_bytes())?;
hasher.update(format!("{}+{}", meta.erasure.data_blocks, meta.erasure.parity_blocks).as_bytes());
hasher.update(format!("{:?}", meta.erasure.distribution).as_bytes());
}
if meta.is_remote() {
@@ -1285,11 +1358,16 @@ impl SetDisks {
// TODO: IsCompressed
hasher.flush()?;
meta_hashes[i] = Some(hex(hasher.clone().finalize().as_slice()));
hasher.reset();
} else {
debug!(
index = i,
etag_only_match = etag_only,
mod_valid_match = mod_valid,
"find_file_info_in_quorum: meta does not match common etag or mod_time, skipping hash calculation"
);
}
}
@@ -1438,7 +1516,7 @@ impl SetDisks {
object: &str,
version_id: &str,
opts: &ReadOptions,
) -> Result<Vec<rustfs_filemeta::FileInfo>> {
) -> Result<Vec<FileInfo>> {
// Use existing disk selection logic
let disks = self.disks.read().await;
let required_reads = self.format.erasure.sets.len();
@@ -2127,11 +2205,11 @@ impl SetDisks {
// TODO: replicatio
if fi.deleted {
if opts.version_id.is_none() || opts.delete_marker {
return Err(to_object_err(StorageError::FileNotFound, vec![bucket, object]));
return if opts.version_id.is_none() || opts.delete_marker {
Err(to_object_err(StorageError::FileNotFound, vec![bucket, object]))
} else {
return Err(to_object_err(StorageError::MethodNotAllowed, vec![bucket, object]));
}
Err(to_object_err(StorageError::MethodNotAllowed, vec![bucket, object]))
};
}
Ok((oi, write_quorum))
@@ -2159,7 +2237,7 @@ impl SetDisks {
where
W: AsyncWrite + Send + Sync + Unpin + 'static,
{
tracing::debug!(bucket, object, requested_length = length, offset, "get_object_with_fileinfo start");
debug!(bucket, object, requested_length = length, offset, "get_object_with_fileinfo start");
let (disks, files) = Self::shuffle_disks_and_parts_metadata_by_index(disks, &files, &fi);
let total_size = fi.size as usize;
@@ -2184,27 +2262,20 @@ impl SetDisks {
let (last_part_index, last_part_relative_offset) = fi.to_part_offset(end_offset)?;
tracing::debug!(
debug!(
bucket,
object,
offset,
length,
end_offset,
part_index,
last_part_index,
last_part_relative_offset,
"Multipart read bounds"
object, offset, length, end_offset, part_index, last_part_index, last_part_relative_offset, "Multipart read bounds"
);
let erasure = erasure_coding::Erasure::new(fi.erasure.data_blocks, fi.erasure.parity_blocks, fi.erasure.block_size);
let part_indices: Vec<usize> = (part_index..=last_part_index).collect();
tracing::debug!(bucket, object, ?part_indices, "Multipart part indices to stream");
debug!(bucket, object, ?part_indices, "Multipart part indices to stream");
let mut total_read = 0;
for current_part in part_indices {
if total_read == length {
tracing::debug!(
debug!(
bucket,
object,
total_read,
@@ -2226,7 +2297,7 @@ impl SetDisks {
let read_offset = (part_offset / erasure.block_size) * erasure.shard_size();
tracing::debug!(
debug!(
bucket,
object,
part_index = current_part,
@@ -2281,12 +2352,65 @@ impl SetDisks {
return Err(Error::other(format!("not enough disks to read: {errors:?}")));
}
// Check if we have missing shards even though we can read successfully
// This happens when a node was offline during write and comes back online
let total_shards = erasure.data_shards + erasure.parity_shards;
let available_shards = nil_count;
let missing_shards = total_shards - available_shards;
info!(
bucket,
object,
part_number,
total_shards,
available_shards,
missing_shards,
data_shards = erasure.data_shards,
parity_shards = erasure.parity_shards,
"Shard availability check"
);
if missing_shards > 0 && available_shards >= erasure.data_shards {
// We have missing shards but enough to read - trigger background heal
info!(
bucket,
object,
part_number,
missing_shards,
available_shards,
pool_index,
set_index,
"Detected missing shards during read, triggering background heal"
);
if let Err(e) =
rustfs_common::heal_channel::send_heal_request(rustfs_common::heal_channel::create_heal_request_with_options(
bucket.to_string(),
Some(object.to_string()),
false,
Some(HealChannelPriority::Normal),
Some(pool_index),
Some(set_index),
))
.await
{
warn!(
bucket,
object,
part_number,
error = %e,
"Failed to enqueue heal request for missing shards"
);
} else {
warn!(bucket, object, part_number, "Successfully enqueued heal request for missing shards");
}
}
// debug!(
// "read part {} part_offset {},part_length {},part_size {} ",
// part_number, part_offset, part_length, part_size
// );
let (written, err) = erasure.decode(writer, readers, part_offset, part_length, part_size).await;
tracing::debug!(
debug!(
bucket,
object,
part_index = current_part,
@@ -2302,7 +2426,7 @@ impl SetDisks {
match de_err {
DiskError::FileNotFound | DiskError::FileCorrupt => {
error!("erasure.decode err 111 {:?}", &de_err);
let _ = rustfs_common::heal_channel::send_heal_request(
if let Err(e) = rustfs_common::heal_channel::send_heal_request(
rustfs_common::heal_channel::create_heal_request_with_options(
bucket.to_string(),
Some(object.to_string()),
@@ -2312,7 +2436,16 @@ impl SetDisks {
Some(set_index),
),
)
.await;
.await
{
warn!(
bucket,
object,
part_number,
error = %e,
"Failed to enqueue heal request after decode error"
);
}
has_err = false;
}
_ => {}
@@ -2333,7 +2466,7 @@ impl SetDisks {
// debug!("read end");
tracing::debug!(bucket, object, total_read, expected_length = length, "Multipart read finished");
debug!(bucket, object, total_read, expected_length = length, "Multipart read finished");
Ok(())
}
@@ -2452,6 +2585,7 @@ impl SetDisks {
Ok((new_disks, new_infos, healing))
}
#[tracing::instrument(skip(self, opts), fields(bucket = %bucket, object = %object, version_id = %version_id))]
async fn heal_object(
&self,
bucket: &str,
@@ -2459,10 +2593,7 @@ impl SetDisks {
version_id: &str,
opts: &HealOpts,
) -> disk::error::Result<(HealResultItem, Option<DiskError>)> {
info!(
"SetDisks heal_object: bucket={}, object={}, version_id={}, opts={:?}",
bucket, object, version_id, opts
);
info!(?opts, "Starting heal_object");
let mut result = HealResultItem {
heal_item_type: HealItemType::Object.to_string(),
bucket: bucket.to_string(),
@@ -2494,20 +2625,34 @@ impl SetDisks {
if reuse_existing_lock {
None
} else {
let start_time = std::time::Instant::now();
let lock_result = self
.fast_lock_manager
.acquire_write_lock(bucket, object, self.locker_owner.as_str())
.await
.map_err(|e| {
let elapsed = start_time.elapsed();
let message = self.format_lock_error(bucket, object, "write", &e);
error!("Failed to acquire write lock for heal operation after {:?}: {}", elapsed, message);
DiskError::other(message)
})?;
let elapsed = start_time.elapsed();
info!("Successfully acquired write lock for object: {} in {:?}", object, elapsed);
Some(lock_result)
let mut lock_result = None;
for i in 0..3 {
let start_time = Instant::now();
match self
.fast_lock_manager
.acquire_write_lock(bucket, object, self.locker_owner.as_str())
.await
{
Ok(res) => {
let elapsed = start_time.elapsed();
info!(duration = ?elapsed, attempt = i + 1, "Write lock acquired");
lock_result = Some(res);
break;
}
Err(e) => {
let elapsed = start_time.elapsed();
info!(error = ?e, attempt = i + 1, duration = ?elapsed, "Lock acquisition failed, retrying");
if i < 2 {
tokio::time::sleep(Duration::from_millis(50 * (i as u64 + 1))).await;
} else {
let message = self.format_lock_error(bucket, object, "write", &e);
error!("Failed to acquire write lock after retries: {}", message);
return Err(DiskError::other(message));
}
}
}
}
lock_result
}
} else {
info!("Skipping lock acquisition (no_lock=true)");
@@ -2524,8 +2669,37 @@ impl SetDisks {
let disks = { self.disks.read().await.clone() };
let (mut parts_metadata, errs) = Self::read_all_fileinfo(&disks, "", bucket, object, version_id, true, true).await?;
info!("Read file info: parts_metadata.len()={}, errs={:?}", parts_metadata.len(), errs);
let (mut parts_metadata, errs) = {
let mut retry_count = 0;
loop {
let (parts, errs) = Self::read_all_fileinfo(&disks, "", bucket, object, version_id, true, true).await?;
// Check if we have enough valid metadata to proceed
// If we have too many errors, and we haven't exhausted retries, try again
let valid_count = errs.iter().filter(|e| e.is_none()).count();
// Simple heuristic: if valid_count is less than expected quorum (e.g. half disks), retry
// But we don't know the exact quorum yet. Let's just retry on high error rate if possible.
// Actually, read_all_fileinfo shouldn't fail easily.
// Let's just retry if we see ANY non-NotFound errors that might be transient (like timeouts)
let has_transient_error = errs
.iter()
.any(|e| matches!(e, Some(DiskError::SourceStalled) | Some(DiskError::Timeout)));
if !has_transient_error || retry_count >= 3 {
break (parts, errs);
}
info!(
"read_all_fileinfo encountered transient errors, retrying (attempt {}/3). Errs: {:?}",
retry_count + 1,
errs
);
tokio::time::sleep(Duration::from_millis(50 * (retry_count as u64 + 1))).await;
retry_count += 1;
}
};
info!(parts_count = parts_metadata.len(), ?errs, "File info read complete");
if DiskError::is_all_not_found(&errs) {
warn!(
"heal_object failed, all obj part not found, bucket: {}, obj: {}, version_id: {}",
@@ -2544,7 +2718,7 @@ impl SetDisks {
));
}
info!("About to call object_quorum_from_meta with parts_metadata.len()={}", parts_metadata.len());
info!(parts_count = parts_metadata.len(), "Initiating quorum check");
match Self::object_quorum_from_meta(&parts_metadata, &errs, self.default_parity_count) {
Ok((read_quorum, _)) => {
result.parity_blocks = result.disk_count - read_quorum as usize;
@@ -2569,10 +2743,12 @@ impl SetDisks {
)
.await?;
// info!(
// "disks_with_all_parts: got available_disks: {:?}, data_errs_by_disk: {:?}, data_errs_by_part: {:?}, latest_meta: {:?}",
// available_disks, data_errs_by_disk, data_errs_by_part, latest_meta
// );
info!(
"disks_with_all_parts results: available_disks count={}, total_disks={}",
available_disks.iter().filter(|d| d.is_some()).count(),
available_disks.len()
);
let erasure = if !latest_meta.deleted && !latest_meta.is_remote() {
// Initialize erasure coding
erasure_coding::Erasure::new(
@@ -2592,10 +2768,7 @@ impl SetDisks {
let mut outdate_disks = vec![None; disk_len];
let mut disks_to_heal_count = 0;
// info!(
// "errs: {:?}, data_errs_by_disk: {:?}, latest_meta: {:?}",
// errs, data_errs_by_disk, latest_meta
// );
info!("Checking {} disks for healing needs (bucket={}, object={})", disk_len, bucket, object);
for index in 0..available_disks.len() {
let (yes, reason) = should_heal_object_on_disk(
&errs[index],
@@ -2603,9 +2776,16 @@ impl SetDisks {
&parts_metadata[index],
&latest_meta,
);
info!(
"Disk {} heal check: should_heal={}, reason={:?}, err={:?}, endpoint={}",
index, yes, reason, errs[index], self.set_endpoints[index]
);
if yes {
outdate_disks[index] = disks[index].clone();
disks_to_heal_count += 1;
info!("Disk {} marked for healing (endpoint={})", index, self.set_endpoints[index]);
}
let drive_state = match reason {
@@ -2633,6 +2813,11 @@ impl SetDisks {
});
}
info!(
"Heal check complete: {} disks need healing out of {} total (bucket={}, object={})",
disks_to_heal_count, disk_len, bucket, object
);
if DiskError::is_all_not_found(&errs) {
warn!(
"heal_object failed, all obj part not found, bucket: {}, obj: {}, version_id: {}",
@@ -2667,9 +2852,18 @@ impl SetDisks {
);
if !latest_meta.deleted && disks_to_heal_count > latest_meta.erasure.parity_blocks {
let total_disks = parts_metadata.len();
let healthy_count = total_disks.saturating_sub(disks_to_heal_count);
let required_data = total_disks.saturating_sub(latest_meta.erasure.parity_blocks);
error!(
"file({} : {}) part corrupt too much, can not to fix, disks_to_heal_count: {}, parity_blocks: {}",
bucket, object, disks_to_heal_count, latest_meta.erasure.parity_blocks
"Data corruption detected for {}/{}: Insufficient healthy shards. Need at least {} data shards, but found only {} healthy disks. (Missing/Corrupt: {}, Parity: {})",
bucket,
object,
required_data,
healthy_count,
disks_to_heal_count,
latest_meta.erasure.parity_blocks
);
// Allow for dangling deletes, on versions that have DataDir missing etc.
@@ -2701,7 +2895,7 @@ impl SetDisks {
Ok((self.default_heal_result(m, &t_errs, bucket, object, version_id).await, Some(derr)))
}
Err(err) => {
// t_errs = vec![Some(err.clone()); errs.len()];
// t_errs = vec![Some(err.clone()]; errs.len());
let mut t_errs = Vec::with_capacity(errs.len());
for _ in 0..errs.len() {
t_errs.push(Some(err.clone()));
@@ -2856,7 +3050,7 @@ impl SetDisks {
);
for (index, disk) in latest_disks.iter().enumerate() {
if let Some(outdated_disk) = &out_dated_disks[index] {
info!("Creating writer for index {} (outdated disk)", index);
info!(disk_index = index, "Creating writer for outdated disk");
let writer = create_bitrot_writer(
is_inline_buffer,
Some(outdated_disk),
@@ -2869,7 +3063,7 @@ impl SetDisks {
.await?;
writers.push(Some(writer));
} else {
info!("Skipping writer for index {} (not outdated)", index);
info!(disk_index = index, "Skipping writer (disk not outdated)");
writers.push(None);
}
@@ -2879,7 +3073,7 @@ impl SetDisks {
// // Box::new(Cursor::new(Vec::new()))
// // } else {
// // let disk = disk.clone();
// // let part_path = format!("{}/{}/part.{}", tmp_id, dst_data_dir, part.number);
// // let part_path = format!("{}/{}/part.{}", object, src_data_dir, part.number);
// // disk.create_file("", RUSTFS_META_TMP_BUCKET, &part_path, 0).await?
// // }
// // };
@@ -2975,6 +3169,12 @@ impl SetDisks {
}
}
// Rename from tmp location to the actual location.
info!(
"Starting rename phase: {} disks to process (bucket={}, object={})",
out_dated_disks.iter().filter(|d| d.is_some()).count(),
bucket,
object
);
for (index, outdated_disk) in out_dated_disks.iter().enumerate() {
if let Some(disk) = outdated_disk {
// record the index of the updated disks
@@ -2983,8 +3183,8 @@ impl SetDisks {
parts_metadata[index].set_healing();
info!(
"rename temp data, src_volume: {}, src_path: {}, dst_volume: {}, dst_path: {}",
RUSTFS_META_TMP_BUCKET, tmp_id, bucket, object
"Renaming healed data for disk {} (endpoint={}): src_volume={}, src_path={}, dst_volume={}, dst_path={}",
index, self.set_endpoints[index], RUSTFS_META_TMP_BUCKET, tmp_id, bucket, object
);
let rename_result = disk
.rename_data(RUSTFS_META_TMP_BUCKET, &tmp_id, parts_metadata[index].clone(), bucket, object)
@@ -2992,10 +3192,15 @@ impl SetDisks {
if let Err(err) = &rename_result {
info!(
"rename temp data err: {}. Try fallback to direct xl.meta overwrite...",
err.to_string()
error = %err,
disk_index = index,
endpoint = %self.set_endpoints[index],
"Rename failed, attempting fallback"
);
// Preserve temp files for safety
info!(temp_uuid = %tmp_id, "Rename failed, preserving temporary files for safety");
let healthy_index = latest_disks.iter().position(|d| d.is_some()).unwrap_or(0);
if let Some(healthy_disk) = &latest_disks[healthy_index] {
@@ -3033,7 +3238,10 @@ impl SetDisks {
));
}
} else {
info!("remove temp object, volume: {}, path: {}", RUSTFS_META_TMP_BUCKET, tmp_id);
info!(
"Successfully renamed healed data for disk {} (endpoint={}), removing temp files from volume={}, path={}",
index, self.set_endpoints[index], RUSTFS_META_TMP_BUCKET, tmp_id
);
self.delete_all(RUSTFS_META_TMP_BUCKET, &tmp_id)
.await
@@ -3367,7 +3575,10 @@ impl SetDisks {
}
Ok(m)
} else {
error!("delete_if_dang_ling: is_object_dang_ling errs={:?}", errs);
error!(
"Object {}/{} is corrupted but not dangling (some parts exist). Preserving data for potential manual recovery. Errors: {:?}",
bucket, object, errs
);
Err(DiskError::ErasureReadQuorum)
}
}
@@ -3492,7 +3703,7 @@ impl ObjectIO for SetDisks {
opts: &ObjectOptions,
) -> Result<GetObjectReader> {
// Acquire a shared read-lock early to protect read consistency
let _read_lock_guard = if !opts.no_lock {
let read_lock_guard = if !opts.no_lock {
Some(
self.fast_lock_manager
.acquire_read_lock(bucket, object, self.locker_owner.as_str())
@@ -3554,7 +3765,7 @@ impl ObjectIO for SetDisks {
// Move the read-lock guard into the task so it lives for the duration of the read
// let _guard_to_hold = _read_lock_guard; // moved into closure below
tokio::spawn(async move {
// let _guard = _guard_to_hold; // keep guard alive until task ends
let _guard = read_lock_guard; // keep guard alive until task ends
let mut writer = wd;
if let Err(e) = Self::get_object_with_fileinfo(
&bucket,
@@ -3581,7 +3792,8 @@ impl ObjectIO for SetDisks {
#[tracing::instrument(level = "debug", skip(self, data,))]
async fn put_object(&self, bucket: &str, object: &str, data: &mut PutObjReader, opts: &ObjectOptions) -> Result<ObjectInfo> {
let disks = self.disks.read().await;
let disks_snapshot = self.get_disks_internal().await;
let (disks, filtered_online) = self.filter_online_disks(disks_snapshot).await;
// Acquire per-object exclusive lock via RAII guard. It auto-releases asynchronously on drop.
let _object_lock_guard = if !opts.no_lock {
@@ -3622,6 +3834,14 @@ impl ObjectIO for SetDisks {
write_quorum += 1
}
if filtered_online < write_quorum {
warn!(
"online disk snapshot {} below write quorum {} for {}/{}; returning erasure write quorum error",
filtered_online, write_quorum, bucket, object
);
return Err(to_object_err(Error::ErasureWriteQuorum, vec![bucket, object]));
}
let mut fi = FileInfo::new([bucket, object].join("/").as_str(), data_drives, parity_drives);
fi.version_id = {
@@ -4061,7 +4281,7 @@ impl StorageAPI for SetDisks {
// Acquire locks in batch mode (best effort, matching previous behavior)
let mut batch = rustfs_lock::BatchLockRequest::new(self.locker_owner.as_str()).with_all_or_nothing(false);
let mut unique_objects: std::collections::HashSet<String> = std::collections::HashSet::new();
let mut unique_objects: HashSet<String> = HashSet::new();
for dobj in &objects {
if unique_objects.insert(dobj.object_name.clone()) {
batch = batch.add_write_lock(bucket, dobj.object_name.clone());
@@ -4076,7 +4296,7 @@ impl StorageAPI for SetDisks {
.collect();
let _lock_guards = batch_result.guards;
let failed_map: HashMap<(String, String), rustfs_lock::fast_lock::LockResult> = batch_result
let failed_map: HashMap<(String, String), LockResult> = batch_result
.failed_locks
.into_iter()
.map(|(key, err)| ((key.bucket.as_ref().to_string(), key.object.as_ref().to_string()), err))
@@ -4164,7 +4384,6 @@ impl StorageAPI for SetDisks {
for (_, mut fi_vers) in vers_map {
fi_vers.versions.sort_by(|a, b| a.deleted.cmp(&b.deleted));
fi_vers.versions.reverse();
if let Some(index) = fi_vers.versions.iter().position(|fi| fi.deleted) {
fi_vers.versions.truncate(index + 1);
@@ -4401,6 +4620,7 @@ impl StorageAPI for SetDisks {
_max_keys: i32,
_fetch_owner: bool,
_start_after: Option<String>,
_incl_deleted: bool,
) -> Result<ListObjectsV2Info> {
unimplemented!()
}
@@ -4423,7 +4643,7 @@ impl StorageAPI for SetDisks {
_rx: CancellationToken,
_bucket: &str,
_prefix: &str,
_result: tokio::sync::mpsc::Sender<ObjectInfoOrErr>,
_result: Sender<ObjectInfoOrErr>,
_opts: WalkOptions,
) -> Result<()> {
unimplemented!()
@@ -4455,15 +4675,25 @@ impl StorageAPI for SetDisks {
#[tracing::instrument(skip(self))]
async fn add_partial(&self, bucket: &str, object: &str, version_id: &str) -> Result<()> {
let _ = rustfs_common::heal_channel::send_heal_request(rustfs_common::heal_channel::create_heal_request_with_options(
bucket.to_string(),
Some(object.to_string()),
false,
Some(HealChannelPriority::Normal),
Some(self.pool_index),
Some(self.set_index),
))
.await;
if let Err(e) =
rustfs_common::heal_channel::send_heal_request(rustfs_common::heal_channel::create_heal_request_with_options(
bucket.to_string(),
Some(object.to_string()),
false,
Some(HealChannelPriority::Normal),
Some(self.pool_index),
Some(self.set_index),
))
.await
{
warn!(
bucket,
object,
version_id,
error = %e,
"Failed to enqueue heal request for partial object"
);
}
Ok(())
}
@@ -4569,7 +4799,7 @@ impl StorageAPI for SetDisks {
let tgt_client = match tier_config_mgr.get_driver(&opts.transition.tier).await {
Ok(client) => client,
Err(err) => {
return Err(Error::other(format!("remote tier error: {}", err)));
return Err(Error::other(format!("remote tier error: {err}")));
}
};
@@ -4749,11 +4979,11 @@ impl StorageAPI for SetDisks {
false,
)?;
let mut p_reader = PutObjReader::new(hash_reader);
if let Err(err) = self_.clone().put_object(bucket, object, &mut p_reader, &ropts).await {
return set_restore_header_fn(&mut oi, Some(to_object_err(err, vec![bucket, object]))).await;
return if let Err(err) = self_.clone().put_object(bucket, object, &mut p_reader, &ropts).await {
set_restore_header_fn(&mut oi, Some(to_object_err(err, vec![bucket, object]))).await
} else {
return Ok(());
}
Ok(())
};
}
let res = self_.clone().new_multipart_upload(bucket, object, &ropts).await?;
@@ -4901,7 +5131,16 @@ impl StorageAPI for SetDisks {
return Err(Error::other(format!("checksum mismatch: {checksum}")));
}
let disks = self.disks.read().await.clone();
let disks_snapshot = self.get_disks_internal().await;
let (disks, filtered_online) = self.filter_online_disks(disks_snapshot).await;
if filtered_online < write_quorum {
warn!(
"online disk snapshot {} below write quorum {} for multipart {}/{}; returning erasure write quorum error",
filtered_online, write_quorum, bucket, object
);
return Err(to_object_err(Error::ErasureWriteQuorum, vec![bucket, object]));
}
let shuffle_disks = Self::shuffle_disks(&disks, &fi.erasure.distribution);
@@ -5463,6 +5702,17 @@ impl StorageAPI for SetDisks {
uploaded_parts: Vec<CompletePart>,
opts: &ObjectOptions,
) -> Result<ObjectInfo> {
let _object_lock_guard = if !opts.no_lock {
Some(
self.fast_lock_manager
.acquire_write_lock(bucket, object, self.locker_owner.as_str())
.await
.map_err(|e| Error::other(self.format_lock_error(bucket, object, "write", &e)))?,
)
} else {
None
};
let (mut fi, files_metas) = self.check_upload_id_exists(bucket, object, upload_id, true).await?;
let upload_id_path = Self::get_upload_id_dir(bucket, object, upload_id);
@@ -5581,7 +5831,7 @@ impl StorageAPI for SetDisks {
}
let ext_part = &curr_fi.parts[i];
tracing::info!(target:"rustfs_ecstore::set_disk", part_number = p.part_num, part_size = ext_part.size, part_actual_size = ext_part.actual_size, "Completing multipart part");
info!(target:"rustfs_ecstore::set_disk", part_number = p.part_num, part_size = ext_part.size, part_actual_size = ext_part.actual_size, "Completing multipart part");
// Normalize ETags by removing quotes before comparison (PR #592 compatibility)
let client_etag = p.etag.as_ref().map(|e| rustfs_utils::path::trim_etag(e));
@@ -5837,7 +6087,7 @@ impl StorageAPI for SetDisks {
bucket.to_string(),
Some(object.to_string()),
false,
Some(rustfs_common::heal_channel::HealChannelPriority::Normal),
Some(HealChannelPriority::Normal),
Some(self.pool_index),
Some(self.set_index),
))
@@ -5897,6 +6147,55 @@ impl StorageAPI for SetDisks {
version_id: &str,
opts: &HealOpts,
) -> Result<(HealResultItem, Option<Error>)> {
let mut effective_object = object.to_string();
// Optimization: Only attempt correction if the name looks suspicious (quotes or URL encoded)
// and the original object does NOT exist.
let has_quotes = (effective_object.starts_with('\'') && effective_object.ends_with('\''))
|| (effective_object.starts_with('"') && effective_object.ends_with('"'));
let has_percent = effective_object.contains('%');
if has_quotes || has_percent {
let disks = self.disks.read().await;
// 1. Check if the original object exists (lightweight check)
let (_, errs) = Self::read_all_fileinfo(&disks, "", bucket, &effective_object, version_id, false, false).await?;
if DiskError::is_all_not_found(&errs) {
// Original not found. Try candidates.
let mut candidates = Vec::new();
// Candidate 1: URL Decoded (Priority for web access issues)
if has_percent {
if let Ok(decoded) = urlencoding::decode(&effective_object) {
if decoded != effective_object {
candidates.push(decoded.to_string());
}
}
}
// Candidate 2: Quote Stripped (For shell copy-paste issues)
if has_quotes && effective_object.len() >= 2 {
candidates.push(effective_object[1..effective_object.len() - 1].to_string());
}
// Check candidates
for candidate in candidates {
let (_, errs_cand) =
Self::read_all_fileinfo(&disks, "", bucket, &candidate, version_id, false, false).await?;
if !DiskError::is_all_not_found(&errs_cand) {
info!(
"Heal request for object '{}' failed (not found). Auto-corrected to '{}'.",
effective_object, candidate
);
effective_object = candidate;
break; // Found a match, stop searching
}
}
}
}
let object = effective_object.as_str();
let _write_lock_guard = if !opts.no_lock {
let key = rustfs_lock::fast_lock::types::ObjectKey::new(bucket, object);
let mut skip_lock = false;
@@ -5911,10 +6210,10 @@ impl StorageAPI for SetDisks {
skip_lock = true;
}
}
if skip_lock {
None
} else {
info!(?opts, "Starting heal_object");
Some(
self.fast_lock_manager
.acquire_write_lock(bucket, object, self.locker_owner.as_str())
@@ -6480,9 +6779,11 @@ fn get_complete_multipart_md5(parts: &[CompletePart]) -> String {
}
let mut hasher = Md5::new();
let _ = hasher.write(&buf);
hasher.update(&buf);
format!("{:x}-{}", hasher.finalize(), parts.len())
let digest = hasher.finalize();
let etag_hex = faster_hex::hex_string(digest.as_slice());
format!("{}-{}", etag_hex, parts.len())
}
pub fn canonicalize_etag(etag: &str) -> String {
@@ -6562,6 +6863,26 @@ mod tests {
use std::collections::HashMap;
use time::OffsetDateTime;
#[test]
fn disk_health_entry_returns_cached_value_within_ttl() {
let entry = DiskHealthEntry {
last_check: Instant::now(),
online: true,
};
assert_eq!(entry.cached_value(), Some(true));
}
#[test]
fn disk_health_entry_expires_after_ttl() {
let entry = DiskHealthEntry {
last_check: Instant::now() - (DISK_HEALTH_CACHE_TTL + Duration::from_millis(100)),
online: true,
};
assert!(entry.cached_value().is_none());
}
#[test]
fn test_check_part_constants() {
// Test that all CHECK_PART constants have expected values

View File

@@ -111,6 +111,9 @@ impl Sets {
let mut disk_set = Vec::with_capacity(set_count);
// Create fast lock manager for high performance
let fast_lock_manager = Arc::new(rustfs_lock::FastObjectLockManager::new());
for i in 0..set_count {
let mut set_drive = Vec::with_capacity(set_drive_count);
let mut set_endpoints = Vec::with_capacity(set_drive_count);
@@ -164,11 +167,9 @@ impl Sets {
// Note: write_quorum was used for the old lock system, no longer needed with FastLock
let _write_quorum = set_drive_count - parity_count;
// Create fast lock manager for high performance
let fast_lock_manager = Arc::new(rustfs_lock::FastObjectLockManager::new());
let set_disks = SetDisks::new(
fast_lock_manager,
fast_lock_manager.clone(),
GLOBAL_Local_Node_Name.read().await.to_string(),
Arc::new(RwLock::new(set_drive)),
set_drive_count,
@@ -439,6 +440,7 @@ impl StorageAPI for Sets {
_max_keys: i32,
_fetch_owner: bool,
_start_after: Option<String>,
_incl_deleted: bool,
) -> Result<ListObjectsV2Info> {
unimplemented!()
}

View File

@@ -72,8 +72,7 @@ use tokio::select;
use tokio::sync::RwLock;
use tokio::time::sleep;
use tokio_util::sync::CancellationToken;
use tracing::{debug, info};
use tracing::{error, warn};
use tracing::{debug, error, info, instrument, warn};
use uuid::Uuid;
const MAX_UPLOADS_LIST: usize = 10000;
@@ -110,7 +109,7 @@ pub struct ECStore {
impl ECStore {
#[allow(clippy::new_ret_no_self)]
#[tracing::instrument(level = "debug", skip(endpoint_pools))]
#[instrument(level = "debug", skip(endpoint_pools))]
pub async fn new(address: SocketAddr, endpoint_pools: EndpointServerPools, ctx: CancellationToken) -> Result<Arc<Self>> {
// let layouts = DisksLayout::from_volumes(endpoints.as_slice())?;
@@ -275,6 +274,7 @@ impl ECStore {
Ok(ec)
}
#[instrument(level = "debug", skip(self, rx))]
pub async fn init(self: &Arc<Self>, rx: CancellationToken) -> Result<()> {
GLOBAL_BOOT_TIME.get_or_init(|| async { SystemTime::now() }).await;
@@ -461,6 +461,7 @@ impl ECStore {
// Ok(ress)
// }
#[instrument(level = "debug", skip(self))]
async fn delete_all(&self, bucket: &str, prefix: &str) -> Result<()> {
let mut futures = Vec::new();
for sets in self.pools.iter() {
@@ -1077,7 +1078,7 @@ impl Clone for PoolObjInfo {
#[async_trait::async_trait]
impl ObjectIO for ECStore {
#[tracing::instrument(level = "debug", skip(self))]
#[instrument(level = "debug", skip(self))]
async fn get_object_reader(
&self,
bucket: &str,
@@ -1107,7 +1108,7 @@ impl ObjectIO for ECStore {
.get_object_reader(bucket, object.as_str(), range, h, &opts)
.await
}
#[tracing::instrument(level = "debug", skip(self, data))]
#[instrument(level = "debug", skip(self, data))]
async fn put_object(&self, bucket: &str, object: &str, data: &mut PutObjReader, opts: &ObjectOptions) -> Result<ObjectInfo> {
check_put_object_args(bucket, object)?;
@@ -1144,7 +1145,7 @@ lazy_static! {
#[async_trait::async_trait]
impl StorageAPI for ECStore {
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn backend_info(&self) -> rustfs_madmin::BackendInfo {
let (standard_sc_parity, rr_sc_parity) = {
if let Some(sc) = GLOBAL_STORAGE_CLASS.get() {
@@ -1189,7 +1190,7 @@ impl StorageAPI for ECStore {
..Default::default()
}
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn storage_info(&self) -> rustfs_madmin::StorageInfo {
let Some(notification_sy) = get_global_notification_sys() else {
return rustfs_madmin::StorageInfo::default();
@@ -1197,7 +1198,7 @@ impl StorageAPI for ECStore {
notification_sy.storage_info(self).await
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn local_storage_info(&self) -> rustfs_madmin::StorageInfo {
let mut futures = Vec::with_capacity(self.pools.len());
@@ -1217,7 +1218,7 @@ impl StorageAPI for ECStore {
rustfs_madmin::StorageInfo { backend, disks }
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn make_bucket(&self, bucket: &str, opts: &MakeBucketOptions) -> Result<()> {
if !is_meta_bucketname(bucket) {
if let Err(err) = check_valid_bucket_name_strict(bucket) {
@@ -1265,7 +1266,7 @@ impl StorageAPI for ECStore {
Ok(())
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn get_bucket_info(&self, bucket: &str, opts: &BucketOptions) -> Result<BucketInfo> {
let mut info = self.peer_sys.get_bucket_info(bucket, opts).await?;
@@ -1277,7 +1278,7 @@ impl StorageAPI for ECStore {
Ok(info)
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn list_bucket(&self, opts: &BucketOptions) -> Result<Vec<BucketInfo>> {
// TODO: opts.cached
@@ -1292,7 +1293,7 @@ impl StorageAPI for ECStore {
}
Ok(buckets)
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn delete_bucket(&self, bucket: &str, opts: &DeleteBucketOptions) -> Result<()> {
if is_meta_bucketname(bucket) {
return Err(StorageError::BucketNameInvalid(bucket.to_string()));
@@ -1327,7 +1328,7 @@ impl StorageAPI for ECStore {
// @start_after as marker when continuation_token empty
// @delimiter default="/", empty when recursive
// @max_keys limit
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn list_objects_v2(
self: Arc<Self>,
bucket: &str,
@@ -1337,12 +1338,22 @@ impl StorageAPI for ECStore {
max_keys: i32,
fetch_owner: bool,
start_after: Option<String>,
incl_deleted: bool,
) -> Result<ListObjectsV2Info> {
self.inner_list_objects_v2(bucket, prefix, continuation_token, delimiter, max_keys, fetch_owner, start_after)
.await
self.inner_list_objects_v2(
bucket,
prefix,
continuation_token,
delimiter,
max_keys,
fetch_owner,
start_after,
incl_deleted,
)
.await
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn list_object_versions(
self: Arc<Self>,
bucket: &str,
@@ -1367,7 +1378,7 @@ impl StorageAPI for ECStore {
self.walk_internal(rx, bucket, prefix, result, opts).await
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn get_object_info(&self, bucket: &str, object: &str, opts: &ObjectOptions) -> Result<ObjectInfo> {
check_object_args(bucket, object)?;
@@ -1385,7 +1396,7 @@ impl StorageAPI for ECStore {
}
// TODO: review
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn copy_object(
&self,
src_bucket: &str,
@@ -1452,7 +1463,7 @@ impl StorageAPI for ECStore {
"put_object_reader is none".to_owned(),
))
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn delete_object(&self, bucket: &str, object: &str, opts: ObjectOptions) -> Result<ObjectInfo> {
check_del_obj_args(bucket, object)?;
@@ -1526,7 +1537,7 @@ impl StorageAPI for ECStore {
Err(StorageError::ObjectNotFound(bucket.to_owned(), object.to_owned()))
}
// TODO: review
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn delete_objects(
&self,
bucket: &str,
@@ -1709,7 +1720,7 @@ impl StorageAPI for ECStore {
// Ok((del_objects, del_errs))
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn list_object_parts(
&self,
bucket: &str,
@@ -1750,7 +1761,7 @@ impl StorageAPI for ECStore {
Err(StorageError::InvalidUploadID(bucket.to_owned(), object.to_owned(), upload_id.to_owned()))
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn list_multipart_uploads(
&self,
bucket: &str,
@@ -1802,7 +1813,7 @@ impl StorageAPI for ECStore {
})
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn new_multipart_upload(&self, bucket: &str, object: &str, opts: &ObjectOptions) -> Result<MultipartUploadResult> {
check_new_multipart_args(bucket, object)?;
@@ -1834,7 +1845,7 @@ impl StorageAPI for ECStore {
self.pools[idx].new_multipart_upload(bucket, object, opts).await
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn add_partial(&self, bucket: &str, object: &str, version_id: &str) -> Result<()> {
let object = encode_dir_object(object);
@@ -1849,7 +1860,7 @@ impl StorageAPI for ECStore {
let _ = self.pools[idx].add_partial(bucket, object.as_str(), version_id).await;
Ok(())
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn transition_object(&self, bucket: &str, object: &str, opts: &ObjectOptions) -> Result<()> {
let object = encode_dir_object(object);
if self.single_pool() {
@@ -1863,7 +1874,7 @@ impl StorageAPI for ECStore {
self.pools[idx].transition_object(bucket, &object, opts).await
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn restore_transitioned_object(self: Arc<Self>, bucket: &str, object: &str, opts: &ObjectOptions) -> Result<()> {
let object = encode_dir_object(object);
if self.single_pool() {
@@ -1880,7 +1891,7 @@ impl StorageAPI for ECStore {
.await
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn copy_object_part(
&self,
src_bucket: &str,
@@ -1902,7 +1913,7 @@ impl StorageAPI for ECStore {
unimplemented!()
}
#[tracing::instrument(skip(self, data))]
#[instrument(skip(self, data))]
async fn put_object_part(
&self,
bucket: &str,
@@ -1944,7 +1955,7 @@ impl StorageAPI for ECStore {
Err(StorageError::InvalidUploadID(bucket.to_owned(), object.to_owned(), upload_id.to_owned()))
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn get_multipart_info(
&self,
bucket: &str,
@@ -1976,7 +1987,7 @@ impl StorageAPI for ECStore {
Err(StorageError::InvalidUploadID(bucket.to_owned(), object.to_owned(), upload_id.to_owned()))
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn abort_multipart_upload(&self, bucket: &str, object: &str, upload_id: &str, opts: &ObjectOptions) -> Result<()> {
check_abort_multipart_args(bucket, object, upload_id)?;
@@ -2007,7 +2018,7 @@ impl StorageAPI for ECStore {
Err(StorageError::InvalidUploadID(bucket.to_owned(), object.to_owned(), upload_id.to_owned()))
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn complete_multipart_upload(
self: Arc<Self>,
bucket: &str,
@@ -2050,7 +2061,7 @@ impl StorageAPI for ECStore {
Err(StorageError::InvalidUploadID(bucket.to_owned(), object.to_owned(), upload_id.to_owned()))
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn get_disks(&self, pool_idx: usize, set_idx: usize) -> Result<Vec<Option<DiskStore>>> {
if pool_idx < self.pools.len() && set_idx < self.pools[pool_idx].disk_set.len() {
self.pools[pool_idx].disk_set[set_idx].get_disks(0, 0).await
@@ -2059,7 +2070,7 @@ impl StorageAPI for ECStore {
}
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
fn set_drive_counts(&self) -> Vec<usize> {
let mut counts = vec![0; self.pools.len()];
@@ -2068,7 +2079,7 @@ impl StorageAPI for ECStore {
}
counts
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn put_object_metadata(&self, bucket: &str, object: &str, opts: &ObjectOptions) -> Result<ObjectInfo> {
let object = encode_dir_object(object);
if self.single_pool() {
@@ -2082,7 +2093,7 @@ impl StorageAPI for ECStore {
self.pools[idx].put_object_metadata(bucket, object.as_str(), &opts).await
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn get_object_tags(&self, bucket: &str, object: &str, opts: &ObjectOptions) -> Result<String> {
let object = encode_dir_object(object);
@@ -2095,7 +2106,7 @@ impl StorageAPI for ECStore {
Ok(oi.user_tags)
}
#[tracing::instrument(level = "debug", skip(self))]
#[instrument(level = "debug", skip(self))]
async fn put_object_tags(&self, bucket: &str, object: &str, tags: &str, opts: &ObjectOptions) -> Result<ObjectInfo> {
let object = encode_dir_object(object);
@@ -2108,7 +2119,7 @@ impl StorageAPI for ECStore {
self.pools[idx].put_object_tags(bucket, object.as_str(), tags, opts).await
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn delete_object_version(&self, bucket: &str, object: &str, fi: &FileInfo, force_del_marker: bool) -> Result<()> {
check_del_obj_args(bucket, object)?;
@@ -2122,7 +2133,7 @@ impl StorageAPI for ECStore {
Ok(())
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn delete_object_tags(&self, bucket: &str, object: &str, opts: &ObjectOptions) -> Result<ObjectInfo> {
let object = encode_dir_object(object);
@@ -2135,7 +2146,7 @@ impl StorageAPI for ECStore {
self.pools[idx].delete_object_tags(bucket, object.as_str(), opts).await
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn heal_format(&self, dry_run: bool) -> Result<(HealResultItem, Option<Error>)> {
info!("heal_format");
let mut r = HealResultItem {
@@ -2170,13 +2181,13 @@ impl StorageAPI for ECStore {
Ok((r, None))
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn heal_bucket(&self, bucket: &str, opts: &HealOpts) -> Result<HealResultItem> {
let res = self.peer_sys.heal_bucket(bucket, opts).await?;
Ok(res)
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn heal_object(
&self,
bucket: &str,
@@ -2253,7 +2264,7 @@ impl StorageAPI for ECStore {
Ok((HealResultItem::default(), Some(Error::FileNotFound)))
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn get_pool_and_set(&self, id: &str) -> Result<(Option<usize>, Option<usize>, Option<usize>)> {
for (pool_idx, pool) in self.pools.iter().enumerate() {
for (set_idx, set) in pool.format.erasure.sets.iter().enumerate() {
@@ -2268,7 +2279,7 @@ impl StorageAPI for ECStore {
Err(Error::DiskNotFound)
}
#[tracing::instrument(skip(self))]
#[instrument(skip(self))]
async fn check_abandoned_parts(&self, bucket: &str, object: &str, opts: &HealOpts) -> Result<()> {
let object = encode_dir_object(object);
if self.single_pool() {
@@ -2473,7 +2484,7 @@ fn check_abort_multipart_args(bucket: &str, object: &str, upload_id: &str) -> Re
check_multipart_object_args(bucket, object, upload_id)
}
#[tracing::instrument(level = "debug")]
#[instrument(level = "debug")]
fn check_put_object_args(bucket: &str, object: &str) -> Result<()> {
if !is_meta_bucketname(bucket) && check_valid_bucket_name_strict(bucket).is_err() {
return Err(StorageError::BucketNameInvalid(bucket.to_string()));
@@ -2601,8 +2612,6 @@ pub async fn has_space_for(dis: &[Option<DiskInfo>], size: i64) -> Result<bool>
#[cfg(test)]
mod tests {
use crate::bucket::metadata_sys::init_bucket_metadata_sys;
use super::*;
// Test validation functions
@@ -2790,122 +2799,4 @@ mod tests {
assert!(check_put_object_args("", "test-object").is_err());
assert!(check_put_object_args("test-bucket", "").is_err());
}
#[tokio::test]
async fn test_ecstore_put_and_list_objects() {
use crate::disk::endpoint::Endpoint;
use crate::endpoints::{EndpointServerPools, Endpoints, PoolEndpoints};
use std::path::PathBuf;
use tokio::fs;
let test_base_dir = format!("/tmp/rustfs_test_put_list_{}", Uuid::new_v4());
let temp_dir = PathBuf::from(&test_base_dir);
if temp_dir.exists() {
let _ = fs::remove_dir_all(&temp_dir).await;
}
fs::create_dir_all(&temp_dir).await.expect("Failed to create test directory");
let disk_paths = vec![
temp_dir.join("disk1"),
temp_dir.join("disk2"),
temp_dir.join("disk3"),
temp_dir.join("disk4"),
];
for disk_path in &disk_paths {
fs::create_dir_all(disk_path).await.expect("Failed to create disk directory");
}
let mut endpoints = Vec::new();
for (i, disk_path) in disk_paths.iter().enumerate() {
let disk_str = disk_path.to_str().expect("Invalid disk path");
let mut endpoint = Endpoint::try_from(disk_str).expect("Failed to create endpoint");
endpoint.set_pool_index(0);
endpoint.set_set_index(0);
endpoint.set_disk_index(i);
endpoints.push(endpoint);
}
let pool_endpoints = PoolEndpoints {
legacy: false,
set_count: 1,
drives_per_set: 4,
endpoints: Endpoints::from(endpoints),
cmd_line: "test".to_string(),
platform: format!("OS: {} | Arch: {}", std::env::consts::OS, std::env::consts::ARCH),
};
let endpoint_pools = EndpointServerPools(vec![pool_endpoints]);
init_local_disks(endpoint_pools.clone())
.await
.expect("Failed to initialize local disks");
let server_addr: SocketAddr = "127.0.0.1:0".parse().expect("Invalid server address");
let ecstore = ECStore::new(server_addr, endpoint_pools, CancellationToken::new())
.await
.expect("Failed to create ECStore");
init_bucket_metadata_sys(ecstore.clone(), vec![]).await;
let bucket_name = "test-bucket";
ecstore
.make_bucket(bucket_name, &MakeBucketOptions::default())
.await
.expect("Failed to create bucket");
let test_objects = vec![
("object1.txt", b"Hello, World!".to_vec()),
("object2.txt", b"Test data for object 2".to_vec()),
("folder/object3.txt", b"Object in folder".to_vec()),
("folder/subfolder/object4.txt", b"Nested object".to_vec()),
];
for (object_name, data) in &test_objects {
let mut reader = PutObjReader::from_vec(data.clone());
let object_info = ecstore
.put_object(bucket_name, object_name, &mut reader, &ObjectOptions::default())
.await
.unwrap_or_else(|e| panic!("Failed to put object {}: {}", object_name, e));
assert_eq!(object_info.size, data.len() as i64, "Object size mismatch for {}", object_name);
assert_eq!(object_info.bucket, bucket_name);
}
let list_result = ecstore
.clone()
.list_objects_v2(bucket_name, "", None, None, 1000, false, None)
.await
.expect("Failed to list objects");
assert_eq!(list_result.objects.len(), test_objects.len(), "Number of objects mismatch");
let mut object_names: Vec<String> = list_result.objects.iter().map(|o| o.name.clone()).collect();
object_names.sort();
let mut expected_names: Vec<String> = test_objects.iter().map(|(n, _)| n.to_string()).collect();
expected_names.sort();
assert_eq!(object_names, expected_names, "Object names mismatch");
let prefix_result = ecstore
.clone()
.list_objects_v2(bucket_name, "folder/", None, None, 1000, false, None)
.await
.expect("Failed to list objects with prefix");
assert_eq!(prefix_result.objects.len(), 2, "Should find 2 objects with prefix 'folder/'");
assert!(prefix_result.objects.iter().all(|o| o.name.starts_with("folder/")));
let delimiter_result = ecstore
.clone()
.list_objects_v2(bucket_name, "", None, Some("/".to_string()), 1000, false, None)
.await
.expect("Failed to list objects with delimiter");
assert!(!delimiter_result.prefixes.is_empty() || !delimiter_result.objects.is_empty());
let _ = fs::remove_dir_all(&temp_dir).await;
}
}

View File

@@ -134,7 +134,7 @@ pub struct GetObjectReader {
}
impl GetObjectReader {
#[tracing::instrument(level = "debug", skip(reader))]
#[tracing::instrument(level = "debug", skip(reader, rs, opts, _h))]
pub fn new(
reader: Box<dyn AsyncRead + Unpin + Send + Sync>,
rs: Option<HTTPRangeSpec>,
@@ -1224,6 +1224,7 @@ pub trait StorageAPI: ObjectIO + Debug {
max_keys: i32,
fetch_owner: bool,
start_after: Option<String>,
incl_deleted: bool,
) -> Result<ListObjectsV2Info>;
// ListObjectVersions TODO: FIXME:
async fn list_object_versions(

View File

@@ -225,6 +225,7 @@ impl ECStore {
max_keys: i32,
_fetch_owner: bool,
start_after: Option<String>,
incl_deleted: bool,
) -> Result<ListObjectsV2Info> {
let marker = {
if continuation_token.is_none() {
@@ -234,7 +235,9 @@ impl ECStore {
}
};
let loi = self.list_objects_generic(bucket, prefix, marker, delimiter, max_keys).await?;
let loi = self
.list_objects_generic(bucket, prefix, marker, delimiter, max_keys, incl_deleted)
.await?;
Ok(ListObjectsV2Info {
is_truncated: loi.is_truncated,
continuation_token,
@@ -251,6 +254,7 @@ impl ECStore {
marker: Option<String>,
delimiter: Option<String>,
max_keys: i32,
incl_deleted: bool,
) -> Result<ListObjectsInfo> {
let opts = ListPathOptions {
bucket: bucket.to_owned(),
@@ -258,7 +262,7 @@ impl ECStore {
separator: delimiter.clone(),
limit: max_keys_plus_one(max_keys, marker.is_some()),
marker,
incl_deleted: false,
incl_deleted,
ask_disks: "strict".to_owned(), //TODO: from config
..Default::default()
};

View File

@@ -481,7 +481,7 @@ async fn new_and_save_tiering_config<S: StorageAPI>(api: Arc<S>) -> Result<TierC
Ok(cfg)
}
#[tracing::instrument(level = "debug")]
#[tracing::instrument(level = "debug", name = "load_tier_config", skip(api))]
async fn load_tier_config(api: Arc<ECStore>) -> std::result::Result<TierConfigMgr, std::io::Error> {
let config_file = format!("{}{}{}", CONFIG_PREFIX, SLASH_SEPARATOR, TIER_CONFIG_FILE);
let data = read_config(api.clone(), config_file.as_str()).await;

Some files were not shown because too many files have changed in this diff Show More