llama.cpp/tests/test-backend-ops.cpp at graph-profiler

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-09 02:24:17 +00:00

Jeff Bolz 303f8615e9 vulkan: Multi-pass softmax for large number of cols (#17892)

When the number of cols is large, split each row across multiple workgroups.
There are three phases that communicate partial results through temp buffers:
(1) compute max partials
(2) take max of partials, compute sum(exp(x-max)) partials
(3) sum partials, compute scaled result
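
The three phases above can be sketched on the CPU as follows. This is a minimal illustration, not the Vulkan shader from the commit: the function name `softmax_multipass` and the `chunk` parameter are hypothetical, each chunk stands in for one workgroup, and the two `std::vector` partial arrays stand in for the temp buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference for a single row. Each chunk of `chunk`
// columns plays the role of one workgroup in the multi-pass scheme.
std::vector<float> softmax_multipass(const std::vector<float> & x, size_t chunk) {
    assert(chunk > 0 && !x.empty());
    const size_t n_chunks = (x.size() + chunk - 1) / chunk;

    // Phase 1: every chunk writes its local maximum to a temp buffer.
    std::vector<float> max_partials(n_chunks, -std::numeric_limits<float>::infinity());
    for (size_t c = 0; c < n_chunks; ++c) {
        const size_t end = std::min(x.size(), (c + 1) * chunk);
        for (size_t i = c * chunk; i < end; ++i) {
            max_partials[c] = std::max(max_partials[c], x[i]);
        }
    }

    // Phase 2: reduce the max partials to the row maximum, then every
    // chunk writes its partial sum of exp(x - max) to a second temp buffer.
    const float row_max = *std::max_element(max_partials.begin(), max_partials.end());
    std::vector<float> sum_partials(n_chunks, 0.0f);
    for (size_t c = 0; c < n_chunks; ++c) {
        const size_t end = std::min(x.size(), (c + 1) * chunk);
        for (size_t i = c * chunk; i < end; ++i) {
            sum_partials[c] += std::exp(x[i] - row_max);
        }
    }

    // Phase 3: reduce the sum partials and write the scaled result.
    float sum = 0.0f;
    for (float s : sum_partials) {
        sum += s;
    }
    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - row_max) / sum;
    }
    return y;
}
```

Subtracting the row maximum before exponentiating keeps every exponent at or below zero, so large inputs do not overflow. Up to floating-point rounding, the result should not depend on the chosen `chunk` size.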

2025-12-13 10:04:29 +01:00

330 KiB
