llama.cpp/docs/backend
Alexey Kopytko e20b83930c SYCL: reduce allocation overhead during flash attention (#22732)
* SYCL: reduce allocation overhead during flash attention
* tidy up whitespace
* add a note about the flag
* move ggml_sycl_fattn_* into fattn-buffers.hpp
* refactor implementation into fattn-buffers.cpp
* move new_fattn_kv_buffers back into ggml-sycl.cpp

2026-05-09 09:30:39 +03:00