llama.cpp/tests/test-backend-ops.cpp at a84dfd3e1072bcf422b90dc4b03d334395c90fe4

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-12 20:14:09 +00:00

Oliver Simons a84dfd3e10 CUDA: Add Cooperative-Groups-based parallelization of ncols in softmax

The old implementation parallelizes rows across SMs, which does not fit the
needs of backend-sampling (where ncols >> nrows, so we want to
parallelize ncols across SMs).

2025-12-09 12:58:56 +01:00

327 KiB
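The commit message describes splitting the columns of a single wide row across SMs, instead of assigning whole rows to SMs. Below is a minimal host-side C++ sketch of that idea. It assumes that each block reduces a partial max and a partial sum over its own chunk of columns, and that a grid-wide barrier separates the local and global phases (on CUDA, this would be `cooperative_groups::this_grid().sync()` under a cooperative launch). The function name `softmax_cols_parallel` and the contiguous chunking scheme are hypothetical. This is not the kernel from the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical host-side model (not the commit's kernel) of a softmax over a
// single wide row whose columns are split across nblocks "blocks".
// On a GPU each block would run on its own SM, and the marked barrier would be
// a grid-wide sync via cooperative groups.
std::vector<float> softmax_cols_parallel(const std::vector<float>& row, std::size_t nblocks) {
    assert(nblocks > 0 && "need at least one block");
    const std::size_t ncols = row.size();
    const std::size_t chunk = (ncols + nblocks - 1) / nblocks;  // columns per block

    // Phase 1 (per block): local max m_b and local sum s_b = sum exp(x - m_b).
    std::vector<float> block_max(nblocks, -std::numeric_limits<float>::infinity());
    std::vector<float> block_sum(nblocks, 0.0f);
    for (std::size_t b = 0; b < nblocks; ++b) {
        const std::size_t begin = std::min(ncols, b * chunk);
        const std::size_t end = std::min(ncols, begin + chunk);
        for (std::size_t i = begin; i < end; ++i) block_max[b] = std::max(block_max[b], row[i]);
        for (std::size_t i = begin; i < end; ++i) block_sum[b] += std::exp(row[i] - block_max[b]);
    }

    // ---- grid-wide barrier: every block's partials are now visible ----

    // Phase 2: combine partials, rescaling each s_b from m_b to the global max.
    // Empty blocks (s_b == 0) are skipped so they cannot contribute NaN.
    float row_max = -std::numeric_limits<float>::infinity();
    for (float m : block_max) row_max = std::max(row_max, m);
    float total = 0.0f;
    for (std::size_t b = 0; b < nblocks; ++b) {
        if (block_sum[b] > 0.0f) total += block_sum[b] * std::exp(block_max[b] - row_max);
    }

    // Phase 3: normalize (on a GPU, each block would normalize its own chunk).
    std::vector<float> out(ncols);
    for (std::size_t i = 0; i < ncols; ++i) out[i] = std::exp(row[i] - row_max) / total;
    return out;
}
```

Because each block carries its local max alongside its local sum, the partial sums can be rescaled to the global max after a single barrier, rather than needing one barrier for the max and a second one for the sum.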
