* docs: fix metrics endpoint description in server README

  Documented the required `model` query parameter for router mode.

  Removed metrics:
  - llamacpp:kv_cache_usage_ratio
  - llamacpp:kv_cache_tokens

  Added metrics:
  - llamacpp:prompt_seconds_total
  - llamacpp:tokens_predicted_seconds_total
  - llamacpp:n_decode_total
  - llamacpp:n_busy_slots_per_decode

* server: fix metrics type for n_busy_slots_per_decode metric
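A minimal sketch of scraping these metrics from a running server. It assumes a llama.cpp server started locally on port 8080 with the `--metrics` flag enabled; the base URL and the example model name are placeholders, and the `model` query parameter is only passed when talking to a server in router mode, per the change above.

```python
# Hedged sketch: read the Prometheus text exposition from /metrics and
# print the metrics added by this change. localhost:8080 is an assumption.
import urllib.parse
import urllib.request

BASE = "http://localhost:8080"  # placeholder server address

def fetch_metrics(model: str | None = None) -> str:
    """Fetch the raw /metrics payload; pass a model name in router mode."""
    url = f"{BASE}/metrics"
    if model is not None:
        # Router mode: the model query parameter is required.
        url += "?model=" + urllib.parse.quote(model)
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

# The metric names listed as added in the commit message.
ADDED = (
    "llamacpp:prompt_seconds_total",
    "llamacpp:tokens_predicted_seconds_total",
    "llamacpp:n_decode_total",
    "llamacpp:n_busy_slots_per_decode",
)

for line in fetch_metrics().splitlines():
    if line.startswith(ADDED):
        print(line)
```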