llama.cpp/tools/server/server-context.cpp at 68522c678daa7b65718f8a3de89bb2fbb139e26f

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-10 19:14:07 +00:00

Xuan-Son Nguyen f896d2c34f server: improve speed of speculative decoding (#17808)

* server: improve speed of speculative decoding

* fix small draft case

* add link to the PR

* server : fix generation time measurement

* server : fix draft acceptance logs (add SRV_CNT, SLT_CNT macros)

* server : add comment

* add PR to docs

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2025-12-08 14:35:28 +01:00

151 KiB
