This flag gives the best performance for backend sampling on CUDA. It can be removed once CCCL 3.2 is bundled with the CUDA Toolkit (CTK) and llama.cpp requires that CTK version.
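
The note does not name the flag, so the CMake sketch below is only illustrative: `GGML_CUDA_BACKEND_SAMPLING_WORKAROUND` and the CTK version cutoff are hypothetical placeholders showing how such a flag could be kept only while the installed CUDA Toolkit predates the release that bundles CCCL 3.2.

```cmake
# Illustrative sketch only; the real flag name and the CTK cutoff are not
# given in the note above.
cmake_minimum_required(VERSION 3.18)
project(backend_sampling_flag_example LANGUAGES CXX)

# CUDAToolkit_VERSION is provided by CMake's FindCUDAToolkit module.
find_package(CUDAToolkit REQUIRED)

# Hypothetical placeholder: keep the workaround flag only while the installed
# CTK does not yet bundle CCCL 3.2 ("13.1" stands in for whichever CTK
# release actually ships it).
if (CUDAToolkit_VERSION VERSION_LESS "13.1")
    add_compile_definitions(GGML_CUDA_BACKEND_SAMPLING_WORKAROUND)
endif()
```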