llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-01 14:44:05 +00:00

Files

Yiwei Shao ee051c1e4e hexagon: support for IQ4_NL and MXFP4 (#21018)

* ggml-hexagon: add IQ4_NL and MXFP4 HMX matmul support

- Add IQ4_NL quantization type support to Hexagon backend (buffer
  set/get tensor repack, mul_mat, mul_mat_id dispatch)
- Implement HVX IQ4_NL vec_dot kernels (1x1, 2x1, 2x2) with
  LUT-based 4-bit index to int8 kvalue dequantization
- Add MXFP4 HMX dequantization path with E8M0 scale conversion,
  including batch-4 fast path and single-tile fallback
- Unify quantized row size / scale offset logic to handle Q4_0,
  Q8_0, IQ4_NL, and MXFP4 in the DMA fetch path
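
The two dequantization ideas named above can be illustrated with a minimal scalar sketch. This is not the HVX/HMX code from the commit. It assumes the reference ggml IQ4_NL block layout (32 values per block, one scale `d`, 16 bytes of packed 4-bit indices with low nibbles holding the first half of the block) and the OCP Microscaling definition of an E8M0 scale as 2^(e - 127). The Hexagon backend repacks tensors, so its in-memory layout may differ. The function names are hypothetical.

```python
# IQ4_NL codebook: 16 non-uniformly spaced int8 "kvalues" indexed by a 4-bit code.
KVALUES_IQ4NL = [-127, -104, -83, -65, -49, -35, -22, -10,
                 1, 13, 25, 38, 53, 69, 89, 113]

QK = 32  # values per quantization block (assumed reference layout)


def dequantize_iq4_nl_block(d, qs):
    """Dequantize one block: d is the block scale, qs holds 16 packed bytes."""
    assert len(qs) == QK // 2
    out = [0.0] * QK
    for j, byte in enumerate(qs):
        # Low nibble indexes the first half of the block, high nibble the second.
        out[j] = d * KVALUES_IQ4NL[byte & 0x0F]
        out[j + QK // 2] = d * KVALUES_IQ4NL[byte >> 4]
    return out


def e8m0_to_float(e):
    """Decode an E8M0 shared scale: an 8-bit biased exponent with no mantissa."""
    if e == 0xFF:
        return float("nan")  # 0xFF is reserved for NaN in the MX specification
    return 2.0 ** (e - 127)
```

A vectorized kernel would perform the same table lookup on many nibbles at once; the scalar loop only shows the mapping from index to kvalue to scaled float.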

* ggml-hexagon: fix SKIP_QUANTIZE src1 address mismatch in mixed-quant models

* Fix the pragma indent

2026-03-27 09:22:41 -07:00

cmake

ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)

2025-08-07 13:45:41 +02:00

include

llama: fix llama-model-saver (#20503)

2026-03-25 12:53:16 +02:00

src

hexagon: support for IQ4_NL and MXFP4 (#21018)

2026-03-27 09:22:41 -07:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.9.8 (ggml/1442)

2026-03-18 15:17:28 +02:00