llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-03-17 16:44:07 +00:00

Files

Justin Bradford 627670601a kleidiai : fix MUL_MAT support for batched (3D) inputs (#20620)

* kleidiai : fix MUL_MAT support for batched (3D) inputs

The supports_op() check incorrectly rejected MUL_MAT operations with 3D
inputs (ne[2] > 1), but the actual compute_forward_qx() implementation
handles batched inputs correctly via a loop over ne12.

This caused models with Q4_0/Q8_0 weights to crash during graph scheduling
when n_seq_max > 1, because weights were placed in KLEIDIAI buffers during
loading (tested with 2D inputs) but the runtime used 3D inputs.

Also relax the buffer check to allow supports_op() to be called during
weight loading when src[0]->buffer is NULL.

Fixes #20608

* kleidiai : supports_op() should return true only for inputs up to 3D, not for 4D
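
The shape and buffer rules described in this commit message can be sketched as follows. This is a minimal illustration, not the actual KleidiAI backend code: `toy_tensor`, `supports_mul_mat_old`, and `supports_mul_mat_new` are hypothetical names, and the sketch assumes the dimension check applies to the activation tensor (the one whose ne[2] is looped over as ne12), while the buffer check applies to the weight tensor (src[0]).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a tensor. It only mirrors the ne[4] shape
// convention and a nullable buffer pointer; it is NOT the real ggml_tensor.
struct toy_tensor {
    int64_t      ne[4];   // ne[0], ne[1]: matrix dims; ne[2]: batch; ne[3]: outer batch
    const void * buffer;  // may be null while weights are still being loaded
};

// Behaviour before the fix, as described above: only plain 2D inputs pass,
// so a batched input (ne[2] > 1) is rejected.
static bool supports_mul_mat_old(const toy_tensor & src1) {
    return src1.ne[2] == 1 && src1.ne[3] == 1;
}

// Behaviour after both commits (assumed form): any ne[2] is accepted,
// 4D inputs (ne[3] > 1) are still rejected, and a null weight buffer is
// tolerated so the check can run during weight loading.
static bool supports_mul_mat_new(const toy_tensor & src0,
                                 const toy_tensor & src1,
                                 const void *       backend_buffer) {
    const bool buffer_ok = src0.buffer == nullptr || src0.buffer == backend_buffer;
    const bool shape_ok  = src1.ne[3] == 1;  // up to 3D only
    return buffer_ok && shape_ok;
}

// Small self-check covering the 2D, 3D, and 4D cases.
static void check_examples() {
    static const int   dummy = 0;
    const void *       buf   = &dummy;  // stands in for the backend's buffer
    const toy_tensor   weights {{64, 64, 1, 1}, nullptr};  // not allocated yet
    const toy_tensor   in_2d   {{64,  8, 1, 1}, nullptr};
    const toy_tensor   in_3d   {{64,  8, 4, 1}, nullptr};  // e.g. n_seq_max > 1
    const toy_tensor   in_4d   {{64,  8, 4, 2}, nullptr};

    assert( supports_mul_mat_old(in_2d));
    assert(!supports_mul_mat_old(in_3d));  // the mismatch behind the crash

    assert( supports_mul_mat_new(weights, in_2d, buf));
    assert( supports_mul_mat_new(weights, in_3d, buf));
    assert(!supports_mul_mat_new(weights, in_4d, buf));
}
```

The point of the sketch is that the old check answered "yes" at load time (2D probe) and "no" at run time (3D input) for the same weights, which is the inconsistency the scheduler tripped over; the relaxed check gives the same answer in both situations.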

2026-03-17 14:03:54 +02:00

cmake

ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)

2025-08-07 13:45:41 +02:00

include

ggml : add OpenVINO backend (#15307)

2026-03-14 07:56:55 +02:00

src

kleidiai : fix MUL_MAT support for batched (3D) inputs (#20620)

2026-03-17 14:03:54 +02:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : add OpenVINO backend (#15307)

2026-03-14 07:56:55 +02:00