llama.cpp/tools/server/server-task.h at gg/server-refactor

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-14 04:54:06 +00:00

Georgi Gerganov a4854f0349 cont : improve n_cmpl logic

- launch the parent task first so it finds the slot with best cache
- parent task waits for child tasks to be launched
- when a child task finishes - remove its cache
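The three points above can be illustrated with a small standalone sketch. It is not the code from `server-task.h`: every name here (`sketch_slot`, `pick_best_slot`, `launch`, `finish_child`) is hypothetical, "best cache" is assumed to mean the longest token prefix shared with the prompt, and `n_children` assumes that `n_cmpl` is the number of completions requested for one prompt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a server slot (not the llama.cpp type).
struct sketch_slot {
    std::vector<int> cache; // tokens currently cached in this slot
    bool busy = false;      // true while a task occupies the slot
};

// Number of leading tokens shared by a cache and a prompt.
static size_t common_prefix(const std::vector<int> & a, const std::vector<int> & b) {
    size_t n = 0;
    while (n < a.size() && n < b.size() && a[n] == b[n]) {
        n++;
    }
    return n;
}

// Pick the idle slot whose cache shares the longest prefix with the prompt.
// Returns -1 when every slot is busy.
static int pick_best_slot(const std::vector<sketch_slot> & slots, const std::vector<int> & prompt) {
    int    best     = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < slots.size(); i++) {
        if (slots[i].busy) {
            continue;
        }
        const size_t len = common_prefix(slots[i].cache, prompt);
        if (best == -1 || len > best_len) {
            best     = (int) i;
            best_len = len;
        }
    }
    return best;
}

struct sketch_result {
    int              parent_slot = -1;
    std::vector<int> child_slots;
    bool             parent_started = false; // set once all children are launched
};

// Launch the parent first so it gets the best-cached slot, then the children.
// The parent is only marked as started after every child has a slot.
static sketch_result launch(std::vector<sketch_slot> & slots, const std::vector<int> & prompt, int n_children) {
    sketch_result res;
    res.parent_slot = pick_best_slot(slots, prompt);
    if (res.parent_slot < 0) {
        return res; // no idle slot at all
    }
    slots[res.parent_slot].busy  = true;
    slots[res.parent_slot].cache = prompt;

    for (int i = 0; i < n_children; i++) {
        const int id = pick_best_slot(slots, prompt);
        if (id < 0) {
            break; // not enough idle slots: the parent keeps waiting
        }
        slots[id].busy  = true;
        slots[id].cache = prompt; // assumption: a child reuses the parent's prompt state
        res.child_slots.push_back(id);
    }
    res.parent_started = (int) res.child_slots.size() == n_children;
    return res;
}

// When a child finishes, drop its cache and free the slot.
static void finish_child(std::vector<sketch_slot> & slots, int id) {
    assert(id >= 0 && (size_t) id < slots.size());
    slots[id].cache.clear();
    slots[id].busy = false;
}
```

With three idle slots where only slot 1 already caches a prefix of the prompt, `launch(slots, prompt, 2)` places the parent in slot 1 and the children in slots 0 and 2. Calling `finish_child` on a child slot then empties that slot's cache while the parent's cache stays intact.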

2026-01-09 15:36:58 +02:00

16 KiB
