mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2026-05-06 17:14:07 +00:00
* tests : fix fetch_server_test_models.py
* server : to_json_oaicompat cached_tokens

Adds OpenAI- and Anthropic-compatible information about the number of cached prompt tokens used in a response.
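As a rough illustration, a client can read the cached-token count from an OpenAI-style response under `usage.prompt_tokens_details.cached_tokens`. This is a minimal sketch assuming that field layout; the exact shape emitted by llama.cpp's server may differ.

```python
def cached_prompt_tokens(response: dict) -> int:
    """Return the number of prompt tokens served from cache, or 0 if absent.

    Assumes the OpenAI-style usage schema; field names here are an
    assumption, not confirmed against llama.cpp's actual output.
    """
    usage = response.get("usage", {})
    details = usage.get("prompt_tokens_details", {})
    return details.get("cached_tokens", 0)

# Hypothetical response fragment for illustration:
example = {
    "usage": {
        "prompt_tokens": 128,
        "completion_tokens": 20,
        "prompt_tokens_details": {"cached_tokens": 96},
    }
}
print(cached_prompt_tokens(example))  # 96
```

The accessor defaults to 0 so it also works against responses from servers that do not report cache usage at all.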