llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-01 22:54:05 +00:00

Files

Max Krasnyansky 609ea50026 hexagon: Q4_0 and MXFP4 repack fixes (#20527)

* hexagon: fix tail corruption with row sizes that are not a multiple of 256

* hexagon: use different stride for repacking partial blocks

* hex-mm: update repack and kernels to avoid shuffles for full 256-element blocks

The previous commit changed the repacking to use even:odd (0:1, 2:3, ...) packing
instead of the original (0:128, 1:129, ...) packing in order to fix tail corruption.
Since the mm kernels already deal with partial tails, we can use even:odd
packing only for the last block.
This avoids the performance penalty of having to shuffle to zip the elements
in the common case.

* hex-mm: update rmpy x8 for better optimizations

* hex-mm: tighten supported MUL_MAT checks to avoid spurious failures

* hex-mm: use vzero to init accumulators

* hex-mm: properly call partial rmpy_x8

2026-03-14 11:09:08 -07:00
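
The two nibble layouts named in the commit message above can be illustrated with a small, self-contained sketch. This is not the Hexagon backend's code: the function names, the choice of low nibble for the first element of each pair, and the flat array of unpacked 4-bit values are all assumptions made for illustration, and real Q4_0 and MXFP4 data also carries per-block scales that are ignored here. The sketch also assumes that "the last block" means a trailing block with fewer than 256 elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Elements per full repack block (256, as stated in the commit message). */
enum { BLOCK = 256 };

/* Hypothetical helper: split-half packing (0:128, 1:129, ...).
 * Byte i holds element i in the low nibble and element i + 128 in the
 * high nibble, so it only makes sense for a full 256-element block. */
static void pack_split_half(const uint8_t *q, uint8_t *out) {
    for (size_t i = 0; i < BLOCK / 2; i++) {
        out[i] = (uint8_t)((q[i] & 0x0F) | ((q[i + BLOCK / 2] & 0x0F) << 4));
    }
}

/* Hypothetical helper: even:odd packing (0:1, 2:3, ...).
 * Byte i holds element 2i in the low nibble and element 2i + 1 in the
 * high nibble, so it works for any even element count n. */
static void pack_even_odd(const uint8_t *q, size_t n, uint8_t *out) {
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        out[i] = (uint8_t)((q[2 * i] & 0x0F) | ((q[2 * i + 1] & 0x0F) << 4));
    }
}

/* Repack one row of n 4-bit values into n / 2 bytes: full blocks use the
 * split-half layout, a trailing partial block uses the even:odd layout. */
static void repack_row(const uint8_t *q, size_t n, uint8_t *out) {
    size_t full = n / BLOCK;
    for (size_t b = 0; b < full; b++) {
        pack_split_half(q + b * BLOCK, out + b * (BLOCK / 2));
    }
    size_t tail = n % BLOCK;
    if (tail != 0) {
        pack_even_odd(q + full * BLOCK, tail, out + full * (BLOCK / 2));
    }
}
```

In this sketch a row of 320 elements produces one 128-byte split-half block followed by 32 bytes of even:odd data. Pairing element i with element i + 128 inside the 64-element tail would refer to elements that do not exist, which is one way to see why a partial block needs a different layout.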

cmake

ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)

2025-08-07 13:45:41 +02:00

include

ggml : add OpenVINO backend (#15307)

2026-03-14 07:56:55 +02:00

src

hexagon: Q4_0 and MXFP4 repack fixes (#20527)

2026-03-14 11:09:08 -07:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : add OpenVINO backend (#15307)

2026-03-14 07:56:55 +02:00