The MoE models have a mul_mat_vec with a very small m (32, 64, 128) right before the topk_moe selection. Running multiple rows per workgroup doesn't utilize the SMs well in that case. I think even for larger m, f32 is so bandwidth-limited that running multiple rows per workgroup doesn't help.
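To make the reasoning concrete, here is a minimal sketch (not the actual ggml-vulkan code; all names are illustrative) of a heuristic that keeps one output row per workgroup when m is small, as in the MoE gating mul_mat_vec, or when the source is f32 and therefore bandwidth-limited:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical source-type enum for illustration only.
enum class src_type { F32, F16, QUANTIZED };

// Hypothetical heuristic: how many output rows each workgroup computes.
static uint32_t rows_per_workgroup(uint32_t m, src_type type) {
    // Small m: with several rows per workgroup there aren't enough
    // workgroups to keep all SMs busy, so keep the dispatch as wide as possible.
    if (m <= 128) {
        return 1;
    }
    // f32 stays bandwidth-limited even for larger m, so packing more rows
    // per workgroup doesn't improve throughput.
    if (type == src_type::F32) {
        return 1;
    }
    // Larger f16/quantized matrices can amortize per-workgroup overhead.
    return 2;
}

int main() {
    // e.g. the MoE gating matvec with m = 64 stays at one row per workgroup.
    printf("rows/wg for m=64, f32: %u\n", rows_per_workgroup(64, src_type::F32));
    return 0;
}
```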