llama.cpp/tools/server/server.cpp at 618575c5825d7d4f170e686e772178d2aae148ae

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-11 03:24:21 +00:00

Oleksandr Kuvshynov e5155e6986 server : export max observed n_past value (#15361)

Add tracking for high-watermark cache usage (the largest observed n_past) and expose it through the /metrics endpoint.

Use case: tracking the largest cache usage needed under a realistic workload,
to better understand memory requirements and to adjust the cache size or the
quantization of the model or cache accordingly.
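
A minimal sketch of the high-watermark idea described in the commit message, not the actual change to server.cpp. The struct `cache_metrics`, its members, and the metric name `example_n_past_max` are hypothetical names chosen for illustration. The Prometheus-style text output is likewise an assumption about how such a value might be rendered for a /metrics endpoint.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the server's metrics state (not the real llama.cpp struct).
struct cache_metrics {
    int32_t n_past_max = 0; // largest n_past observed so far (the high watermark)

    // Record the current n_past of a slot; the watermark only ever grows.
    void observe(int32_t n_past) {
        n_past_max = std::max(n_past_max, n_past);
    }

    // Render the watermark as a gauge in a Prometheus-style text format.
    // The metric name here is an assumption made for this example.
    std::string to_prometheus() const {
        return "# TYPE example_n_past_max gauge\n"
               "example_n_past_max " + std::to_string(n_past_max) + "\n";
    }
};
```

Calling `observe()` after each decode step and reading the value back when the metrics are scraped would give the peak cache usage over the lifetime of the process, which is the figure the use case above relies on.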

2025-08-18 00:28:58 +02:00

205 KiB
