llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-11 19:44:06 +00:00

Files

Ruben Ortlam 47a268ea50 Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900)

* vulkan: split mul_mmq_funcs for mul_mat_vecq use

* add mxfp4 mmvq

* add q2_k mmvq

* add q3_k mmvq

* add q4_k and q5_k mmvq

* add q6_k mmvq

* handle 4x4 quants per mmvq thread

* enable MUL_MAT_ID mmvq support

* enable subgroup optimizations for mul_mat_vec_id shaders

* device tuning

* request prealloc_y sync after quantization

* fix indentation

* fix llvmpipe test failures

* fix mul_mat_id mmvq condition

* fix unused variable warning

2025-11-29 09:37:22 +01:00

cmake

ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)

2025-08-07 13:45:41 +02:00

include

rpc : cache and reuse compute graphs (#15405)

2025-11-28 08:33:51 +00:00

src

Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900)

2025-11-29 09:37:22 +01:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276)

2025-11-28 17:33:23 +02:00