llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-08 18:14:07 +00:00

Files

Georgi Gerganov d7d826b3c1 server : support multi-modal context checkpoints (#19849)

* Modify llama-memory-hybrid-iswa.cpp

* Modify llama-memory-recurrent.cpp

* Modify server-common.cpp

* Modify server-common.h

* Modify server-context.cpp

* Modify server-task.h

* Added comment to llama-memory-hybrid-iswa.cpp

* Remove comment from server-context.cpp

* Stylistic fix server-context.cpp

* Fix an issue when seqrm isn't called in server-context.cpp

* cont : alternative impl

* cont : cleanup

* cont : n_tokens -> int64_t

---------

Co-authored-by: timkhronos <timkhronos@gmail.com>

2026-02-25 15:14:27 +02:00

batched-bench

tool/ex/tests: consistently free ctx, then model (#18168)

2025-12-22 11:00:37 +01:00

cli

cli : provide model with text filename (#19783)

2026-02-22 22:33:49 +01:00

completion

llama : remove write/read of output ids/logits/embeddings (#18862)

2026-02-23 07:04:30 +01:00

cvector-generator

docs : Minor cleanups (#19252)