llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-07 17:44:09 +00:00

Files

Jeff Bolz 61bde8e21f vulkan: Reduce temporary memory usage for TOP_K (#17623)

- Compute the row size for the temp buffer based on the output of the first pass.
- Update the shader addressing math to use the output row size.
- Pass the output row size as "ncols_output"; the value previously passed as "ncols_output" is now "k".

For the common case of K=40 and src0=(200000,1,1,1), this reduces the temporary buffer
from about 3.2MB to 500KB.
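
As a rough illustration of the arithmetic behind these figures, the sketch below sizes the temporary buffer both ways: by the full input row, and by the row that remains after a first pass in which each workgroup keeps only k candidates. The 8 bytes per element (one value plus one index), the two ping-pong buffers, and the 256 input elements per workgroup are assumptions picked for illustration, not values taken from the shader, and the function names are hypothetical.

```python
# Rough sizing sketch for a multi-pass top-k temporary buffer.
# All constants are illustrative assumptions, not read from the Vulkan backend.

BYTES_PER_ELEM = 8          # assumed: 4-byte value + 4-byte index
NUM_TEMP_BUFFERS = 2        # assumed: ping-pong buffers between passes
ELEMS_PER_WORKGROUP = 256   # assumed: input elements reduced by one workgroup


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def first_pass_row_elems(ncols: int, k: int,
                         elems_per_workgroup: int = ELEMS_PER_WORKGROUP) -> int:
    """Elements per row after the first pass: each workgroup keeps k candidates."""
    return ceil_div(ncols, elems_per_workgroup) * k


def temp_bytes(row_elems: int, nrows: int = 1) -> int:
    """Total temporary storage for rows of the given length."""
    return row_elems * nrows * BYTES_PER_ELEM * NUM_TEMP_BUFFERS


ncols, k = 200_000, 40
before = temp_bytes(ncols)                           # sized by the input row
after = temp_bytes(first_pass_row_elems(ncols, k))   # sized by first-pass output
print(f"sized by input row:         {before / 1e6:.2f} MB")
print(f"sized by first-pass output: {after / 1e3:.1f} KB")
```

Under these assumptions the two sizes land near the 3.2MB and 500KB quoted above; the actual workgroup size and element layout in the shader may differ.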

2025-12-02 19:22:04 +01:00

cmake

ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)

2025-08-07 13:45:41 +02:00

include

model: LFM2-VL fixes (#17577)

2025-11-30 21:57:31 +01:00

src

vulkan: Reduce temporary memory usage for TOP_K (#17623)

2025-12-02 19:22:04 +01:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

cmake : add utf8 compilation options for msvc (#17682)

2025-12-02 19:50:57 +02:00