llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-07 17:44:09 +00:00

Files

Jeff Bolz 1946e46f4c vulkan: For coopmat2 FA, use fp16 accumulators for the final result (#19376)

The cpu and cuda backends use fp16 for the VKQ accumulator type; this change
does the same for vulkan. This helps particularly with large head sizes, which
are very register-limited.

I tried this for the coopmat1 path and it was slightly slower. I didn't try it
for the scalar path.

I applied the softmax bias that the cuda backend uses to avoid overflow,
although I was not able to reproduce the original bug without it.

2026-02-06 09:15:13 +01:00
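
The softmax bias mentioned in the commit message above can be illustrated with a small sketch. This is not the Vulkan or CUDA kernel code: the function names, the offset value of ln 8, and the use of Python's `struct` format `e` to simulate fp16 storage are all assumptions made for illustration. The idea shown is that adding a constant to the maximum subtracted inside the softmax exponent scales every weight down by the same factor, so an fp16 accumulator holding the weighted sum of values is less likely to overflow, while the final normalized result is unchanged because the denominator is scaled by the same factor.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest half-precision value; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values beyond the fp16 range,
        # so model the hardware behaviour of saturating to infinity.
        return math.copysign(math.inf, x)


def attention_output(scores, values, offset=0.0):
    """Softmax-weighted sum of scalar values with a simulated fp16 accumulator.

    `offset` is a hypothetical bias added to the maximum score before it is
    subtracted in the exponent. The denominator is kept in a Python float,
    standing in for a higher-precision accumulator.
    """
    shifted_max = max(scores) + offset
    acc = 0.0    # simulated fp16 accumulator for the weighted values
    denom = 0.0  # higher-precision accumulator for the softmax weights
    for s, v in zip(scores, values):
        w = math.exp(s - shifted_max)  # largest weight is exp(-offset) <= 1
        denom += w
        acc = to_fp16(acc + to_fp16(w * v))
    return acc / denom
```

With eight keys of equal score and values of 8192, `attention_output(scores, values)` overflows the simulated fp16 accumulator and returns infinity, while `attention_output(scores, values, offset=math.log(8.0))` stays finite and returns a value close to 8192. The trade-off of such a shift is that very small weights move closer to the fp16 underflow range.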

cmake

ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)

2025-08-07 13:45:41 +02:00

include

ggml-virtgpu: make the code thread safe (#19204)

2026-02-04 10:46:18 +08:00

src

vulkan: For coopmat2 FA, use fp16 accumulators for the final result (#19376)

2026-02-06 09:15:13 +01:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

Bump cmake max version (needed for Windows on Snapdragon builds) (#19188)

2026-02-01 14:13:38 -08:00