llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-04 08:04:07 +00:00

Files

Radoslav Gerganov c830f99cfa server : support max_completion_tokens request property (#19831 )

"max_tokens" is deprectated in favor of "max_completion_tokens" which
sets the upper bound for reasoning+output token.

Closes: #13700

2026-02-24 10:30:00 +02:00

batched-bench

tool/ex/tests: consistently free ctx, then model (#18168 )

2025-12-22 11:00:37 +01:00

cli

cli : provide model with text filename (#19783 )

2026-02-22 22:33:49 +01:00

completion

llama : remove write/read of output ids/logits/embeddings (#18862 )

2026-02-23 07:04:30 +01:00

cvector-generator

docs : Minor cleanups (#19252 )

2026-02-02 08:38:55 +02:00

export-lora

docs : Minor cleanups (#19252 )

2026-02-02 08:38:55 +02:00

fit-params

llama-fit-params: keep explicit --ctx-size 0 (#19070 )

2026-01-24 22:13:08 +01:00

gguf-split

cli: new CLI experience (#17824 )

2025-12-10 15:28:59 +01:00

imatrix

common : refactor common_sampler + grammar logic changes (#17937 )

2025-12-14 10:11:13 +02:00

llama-bench

Setting mmap and direct_io to false as default in llama-bench.cpp (#18841 )

2026-01-16 09:46:51 +01:00

mtmd

model: Add PaddleOCR-VL model support (#18825 )

2026-02-19 17:05:25 +01:00

perplexity

perplexity: add proper batching (#19661 )

2026-02-16 18:44:44 +02:00

quantize

quantize : add --dry-run option (#19526 )

2026-02-20 09:20:16 +01:00

rpc

NetBSD build support (#19589 )

2026-02-14 09:47:01 +01:00

server

server : support max_completion_tokens request property (#19831 )

2026-02-24 10:30:00 +02:00

tokenize

cmake : Do not install tools on iOS targets (#15903 )

2025-09-16 09:54:44 +07:00

tts

model : fix wavtokenizer embedding notions (#19479 )

2026-02-11 07:52:20 +02:00

CMakeLists.txt

cmake: only build cli when server is enabled (#18670 )

2026-01-09 16:43:26 +01:00