llama.cpp/tools/server/server-task.cpp at da426cb25031928bcbc0d822bbd5ac3491ed4c13

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-06 09:04:07 +00:00

Radoslav Gerganov c830f99cfa server : support max_completion_tokens request property (#19831)

"max_tokens" is deprecated in favor of "max_completion_tokens", which
sets the upper bound for reasoning+output tokens.

Closes: #13700

2026-02-24 10:30:00 +02:00
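
The commit message above describes a request property that supersedes an older one. A minimal sketch of how such a fallback could be resolved follows. The helper name resolve_n_predict, its parameters, and the rule that "max_completion_tokens" wins when both properties are present are assumptions made for illustration; they are not taken from server-task.cpp.

```cpp
#include <optional>

// Hypothetical helper (not from server-task.cpp): choose the token limit
// for a request that may carry either property.
//
// Assumed precedence for this sketch:
//   1. "max_completion_tokens" if present (bounds reasoning + output tokens)
//   2. the deprecated "max_tokens" if present
//   3. the caller-supplied default otherwise
int resolve_n_predict(std::optional<int> max_completion_tokens,
                      std::optional<int> max_tokens,
                      int default_n_predict) {
    if (max_completion_tokens.has_value()) {
        return *max_completion_tokens;
    }
    if (max_tokens.has_value()) {
        return *max_tokens;
    }
    return default_n_predict;
}
```

Under these assumptions, a request carrying both properties uses the "max_completion_tokens" value, and a request carrying neither keeps the default limit.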
