mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2026-05-11 03:24:21 +00:00
Add tracking for high watermark cache usage and make it available in /metrics endpoint. Use-case: Tracking largest needed cache usage under realistic workload to better understand memory requirements and be able to adjust cache size/quantization for model/cache accordingly.
205 KiB
205 KiB