llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-04 08:04:07 +00:00

Files

Alfred ce734a8a2f ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977)

* feat: implement real Q8_0

* feat: adding cmake option for configuring FP32 quantize group size

* typo: set() shall be used

---------

Co-authored-by: ngdxzy <zhenyu_xu@uri.edu>

2025-12-19 09:42:28 -08:00

hexagon

ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977)

2025-12-19 09:42:28 -08:00

BLIS.md

make : deprecate (#10514)

2024-12-02 21:22:53 +02:00

CANN.md

CANN: GGML_CANN_ACL_GRAPH works only USE_ACL_GRAPH enabled (#16861)

2025-11-12 14:37:52 +08:00

CUDA-FEDORA.md

docs: update: improve the Fedora CUDA guide (#12536)

2025-03-24 11:02:26 +00:00

OPENCL.md

opencl: update doc (#17011)

2025-11-04 16:02:36 -08:00

SYCL.md

added note for old Intel hardware pre sycl (#18017)

2025-12-16 17:45:09 +08:00

zDNN.md

ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690)

2025-12-07 00:13:33 +08:00

ZenDNN.md

ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690)

2025-12-07 00:13:33 +08:00