Let's keep `master`'s cumsum implementation for its likely better AMD performance, and add back the pure-CUB implementation in a follow-up commit