llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-10 11:04:06 +00:00

Files

Jeff Bolz b365c3ff01 vulkan/cuda: fix topk_moe with exp_probs_b (#18071)

I updated test_topk_moe to more closely match llm_graph_context::build_moe_ffn
and added coverage for exp_probs_b and some other missing combinations. This
exposed a bug in both CUDA and Vulkan backends where they were assuming the
input to argsort and the input to get_rows are the same. I'd like to optimize
this graph in another change, but for now just get it functional.

CUDA also had a bug where it got n_experts from the wrong place, leading to
GGML_ASSERT failures in some of the new tests.

2025-12-21 10:27:34 +01:00
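The bug described in this commit comes from top-k MoE routing using two different inputs: one to choose the experts (the argsort input) and one to supply their weights (the get_rows input). The C++ sketch below shows that distinction for a single token on the CPU. It assumes that `exp_probs_b` acts as an additive per-expert bias applied only to the selection scores. The names `route_token` and `TopkRouting` are hypothetical and are not part of ggml or llama.cpp. Any weight normalization or scaling that `llm_graph_context::build_moe_ffn` may apply afterward is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Routing result for one token: chosen expert ids and their gating weights.
struct TopkRouting {
    std::vector<int>   experts;  // indices of the selected experts
    std::vector<float> weights;  // weights gathered from the *unbiased* probs
};

// Hypothetical reference routine (not a ggml/llama.cpp API).
// probs: per-expert probabilities for one token
// bias:  per-expert selection bias (assumed role of exp_probs_b); may be empty
// k:     number of experts to use
TopkRouting route_token(const std::vector<float>& probs,
                        const std::vector<float>& bias,
                        std::size_t k) {
    assert(bias.empty() || bias.size() == probs.size());
    assert(k <= probs.size());

    // Argsort input: biased scores, used only to *choose* experts.
    std::vector<float> selection = probs;
    for (std::size_t i = 0; i < bias.size(); ++i) {
        selection[i] += bias[i];
    }

    // Partial argsort, descending by selection score.
    std::vector<int> idx(probs.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k),
                      idx.end(),
                      [&](int a, int b) { return selection[a] > selection[b]; });

    // get_rows input: weights come from the original probs,
    // not from the biased selection scores.
    TopkRouting out;
    for (std::size_t j = 0; j < k; ++j) {
        out.experts.push_back(idx[j]);
        out.weights.push_back(probs[idx[j]]);
    }
    return out;
}
```

With probs `{0.1, 0.5, 0.4}` and bias `{0.5, 0, 0}`, the bias steers selection to expert 0, but its weight stays 0.1. A kernel that treated the argsort input and the get_rows input as the same tensor would instead return the biased score of roughly 0.6.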

cmake

ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)

2025-08-07 13:45:41 +02:00

include

llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)

2025-12-15 09:24:59 +01:00

src

vulkan/cuda: fix topk_moe with exp_probs_b (#18071)

2025-12-21 10:27:34 +01:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977)

2025-12-19 09:42:28 -08:00