llama.cpp/docs/backend
Alexey Kopytko e20b83930c SYCL: reduce allocation overhead during flash attention (#22732)
* SYCL: reduce allocation overhead during flash attention
* tidy up whitespace
* add a note about the flag
* move ggml_sycl_fattn_* into fattn-buffers.hpp
* refactor implementation into fattn-buffers.cpp
* move new_fattn_kv_buffers back into ggml-sycl.cpp

2026-05-09 09:30:39 +03:00