llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-10 11:04:06 +00:00

Files

Shawn Gu c5a3bc39b1 opencl: Adreno optimization for MoE - MxFP4 (#22301)

* MoE Mxfp4 CLC kernel added, router reorder on GPU

* Pass test-backend-ops for MoE mxfp4 Adreno CLC

* remove putenv in llama-model.cpp

* fix indent style and whitespace

* opencl: remove unnecessary headers

* opencl: do not save cl_program objects

* opencl: remove unnecessary assert

* fix precision issue

---------

Co-authored-by: Li He <lih@qti.qualcomm.com>

2026-05-01 23:02:24 -07:00

cmake

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

include

CUDA: manage NCCL communicators in context (#21891)

2026-04-15 15:58:40 +02:00

src

opencl: Adreno optimization for MoE - MxFP4 (#22301)

2026-05-01 23:02:24 -07:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.10.2 (ggml/1474)

2026-05-02 08:55:29 +03:00