llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-01 22:54:05 +00:00

Files

Aman Gupta b94050e896 CUDA: use LRU based eviction for cuda graphs (#21611)

* CUDA: use a ring-buffer for cuda graphs

* bump limit to 128

* use LRU eviction

* better naming

* do periodic clean-up
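
For illustration, the LRU eviction named in the commit message could look roughly like the sketch below. This is not the code from the commit: `graph_entry`, `lru_graph_cache`, and the 64-bit key are hypothetical names chosen for the example, and the periodic clean-up step is left out. The capacity would presumably be the limit of 128 mentioned above, though how the real cache is keyed and sized is an assumption here.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a captured CUDA graph; not a llama.cpp type.
struct graph_entry {
    int payload;
};

// Minimal LRU cache keyed by an opaque 64-bit graph key (the key type is an assumption).
class lru_graph_cache {
public:
    explicit lru_graph_cache(std::size_t capacity) : capacity_(capacity) {}

    // Returns the cached entry and marks it most recently used, or nullptr on a miss.
    graph_entry * get(std::uint64_t key) {
        auto it = index_.find(key);
        if (it == index_.end()) {
            return nullptr;
        }
        // Move the node to the front without invalidating its iterator.
        order_.splice(order_.begin(), order_, it->second);
        return &it->second->second;
    }

    // Inserts or overwrites an entry; evicts the least recently used entry when full.
    void put(std::uint64_t key, graph_entry value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = value;
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (capacity_ == 0) {
            return;
        }
        if (order_.size() >= capacity_) {
            index_.erase(order_.back().first);  // back = least recently used
            order_.pop_back();
        }
        order_.emplace_front(key, value);
        index_[key] = order_.begin();
    }

    std::size_t size() const { return order_.size(); }

private:
    using node = std::pair<std::uint64_t, graph_entry>;
    std::size_t capacity_;
    std::list<node> order_;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<node>::iterator> index_;
};
```

Unlike a plain ring buffer, which overwrites the oldest inserted slot regardless of use, this structure keeps entries that are still being looked up and drops the one that has gone longest without a hit.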

2026-04-17 23:24:21 +08:00

.gitignore

2024-07-13 18:12:39 +02:00

CMakeLists.txt

2026-04-16 08:34:05 +03:00