mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2026-05-01 22:54:05 +00:00
* tests : fix fetch_server_test_models.py
* server : to_json_oaicompat cached_tokens — adds OpenAI- and Anthropic-compatible information about the number of cached prompt tokens used in a response.
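As a rough illustration of what the cached_tokens change exposes to clients, the sketch below reads the cached prompt-token count from an OpenAI-compatible chat completion payload. The field path (`usage.prompt_tokens_details.cached_tokens`) follows the OpenAI usage schema; the sample payload and helper function are invented for illustration, not taken from the llama.cpp server code.

```python
# Hypothetical sketch: extracting the cached prompt-token count from an
# OpenAI-compatible response. The payload below is invented sample data
# shaped like the OpenAI usage schema, not real server output.
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "Hi!"}}],
    "usage": {
        "prompt_tokens": 128,
        "completion_tokens": 5,
        "total_tokens": 133,
        # Number of prompt tokens that were served from the KV cache.
        "prompt_tokens_details": {"cached_tokens": 96},
    },
}

def cached_prompt_tokens(response: dict) -> int:
    """Return the cached prompt-token count, or 0 if the field is absent."""
    usage = response.get("usage", {})
    details = usage.get("prompt_tokens_details", {})
    return details.get("cached_tokens", 0)

print(cached_prompt_tokens(sample_response))  # 96
```

A client can use this to estimate how much of a long system prompt was reused between requests; servers that do not report the field simply yield 0.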
3.9 KiB
Executable File