llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2026-05-11 19:44:06 +00:00

Files

Yanzhao Wang 66001722aa hexagon: add HTP kernel for GGML_OP_GATED_DELTA_NET (#22837)

Implement the Gated Delta Net recurrence on HVX with:
- 4-row fused kernels for PP (prompt processing) path
- 8-row fused kernels for TG (token generation) path, reducing
  K/Q/gate vector reload overhead by 2x
- Separate PP/TG thread functions for I-cache isolation
- VTCM state scratchpad with DMA in/out for TG single-cycle access
- Vectorized gate exp via hvx_exp_f32

2026-05-08 17:12:04 -07:00

cmake

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

include

CUDA: lower-case PCI bus id, standardize for ggml (#22820)

2026-05-08 10:09:38 +02:00

src

hexagon: add HTP kernel for GGML_OP_GATED_DELTA_NET (#22837)

2026-05-08 17:12:04 -07:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.11.0 (ggml/1478)

2026-05-05 13:15:59 +03:00