llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2026-05-11 19:44:06 +00:00

Files

neha-ha a6cc43c286 ggml-webgpu: updated matrix-vector multiplication (#21738)

* merged properly, but slow q3_k and q5_k with u32 indexing

* Start on new mat-vec

* New format float paths working

* Working q4_0

* Work on remaining legacy q-types

* port k-quants to new matvec

* remove old shader

* Remove old constants, format

* remove accidental file

---------

Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

2026-04-20 07:37:17 -07:00

cmake

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

include

CUDA: manage NCCL communicators in context (#21891)

2026-04-15 15:58:40 +02:00

src

ggml-webgpu: updated matrix-vector multiplication (#21738)

2026-04-20 07:37:17 -07:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

cmake: remove CMP0194 policy to restore MSVC builds (#21934)

2026-04-19 10:25:05 +03:00