mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2026-05-06 17:14:07 +00:00
* tests : fix fetch_server_test_models.py
* server : to_json_oaicompat cached_tokens

Adds OpenAI- and Anthropic-compatible information about the number of cached prompt tokens used in a response.
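As a rough illustration, a client can read the cached-token count from an OpenAI-style response under `usage.prompt_tokens_details.cached_tokens`. This is a minimal sketch assuming that field layout; the exact shape emitted by llama.cpp's server may differ.

```python
def cached_prompt_tokens(response: dict) -> int:
    """Return the number of prompt tokens served from cache, or 0 if absent.

    Assumes the OpenAI-style usage schema; field names here are an
    assumption, not confirmed against llama.cpp's actual output.
    """
    usage = response.get("usage", {})
    details = usage.get("prompt_tokens_details", {})
    return details.get("cached_tokens", 0)

# Hypothetical response fragment for illustration:
example = {
    "usage": {
        "prompt_tokens": 128,
        "completion_tokens": 20,
        "prompt_tokens_details": {"cached_tokens": 96},
    }
}
print(cached_prompt_tokens(example))  # 96
```

The accessor defaults to 0 so it also works against responses from servers that do not report cache usage at all.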