mirror of https://github.com/ggml-org/llama.cpp.git (synced 2026-05-12 20:14:09 +00:00)
The old implementation parallelizes rows across SMs, which does not fit the needs of backend sampling: there, ncols >> nrows, so we want to parallelize the columns across SMs instead.
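To make the occupancy argument concrete, here is a minimal host-side sketch that counts how many thread blocks each launch scheme produces. The helper names, the one-block-per-row assumption for the old scheme, and the tile width are all hypothetical and not taken from llama.cpp; the actual kernels may partition work differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of llama.cpp.

// Row-parallel launch (assumed shape of the old scheme): one thread block
// per row, so the grid contains nrows blocks regardless of ncols.
static int64_t blocks_row_parallel(int64_t nrows) {
    return nrows;
}

// Column-parallel launch (assumed shape of the new scheme): each row is
// split into tiles of cols_per_block columns, one thread block per tile,
// so the grid contains nrows * ceil(ncols / cols_per_block) blocks.
static int64_t blocks_col_parallel(int64_t nrows, int64_t ncols, int64_t cols_per_block) {
    assert(cols_per_block > 0);
    const int64_t tiles_per_row = (ncols + cols_per_block - 1) / cols_per_block;
    return nrows * tiles_per_row;
}
```

For example, with an assumed single row of 128000 logits and a tile width of 1024, `blocks_row_parallel(1)` gives 1 block, which can occupy only one SM, while `blocks_col_parallel(1, 128000, 1024)` gives 125 blocks that can be spread across many SMs. A column-parallel kernel would typically also need a step that combines the per-tile partial results, which this sketch does not model.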
327 KiB