llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-11 19:44:06 +00:00

Files

Pascal 58e68df0f9 cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667)

* cuda: fuse snake activation (mul, sin, sqr, mul, add)

Add ggml_cuda_op_snake_fused with F32 / F16 / BF16 templates. The
matcher recognizes the naive 5-op decomposition emitted by audio
decoders (BigVGAN, Vocos) for snake activation
y = x + sin(a*x)^2 * inv_b and rewrites it to a single elementwise
kernel.
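
To make the rewrite concrete, here is a minimal CPU-side sketch in C++ of the two forms described above: the five-op chain (mul, sin, sqr, mul, add) and the single-pass equivalent. It is an illustration only, not the CUDA kernel from this commit. The names snake_naive, snake_fused and max_abs_diff are hypothetical, and a and inv_b are assumed to be plain scalars here, whereas a real decoder would typically supply per-channel values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive form: five separate elementwise passes, each writing an
// intermediate buffer (mul, sin, sqr, mul, add).
std::vector<float> snake_naive(const std::vector<float> & x, float a, float inv_b) {
    const std::size_t n = x.size();
    std::vector<float> ax(n), s(n), s2(n), scaled(n), y(n);
    for (std::size_t i = 0; i < n; ++i) ax[i]     = a * x[i];          // mul
    for (std::size_t i = 0; i < n; ++i) s[i]      = std::sin(ax[i]);   // sin
    for (std::size_t i = 0; i < n; ++i) s2[i]     = s[i] * s[i];       // sqr
    for (std::size_t i = 0; i < n; ++i) scaled[i] = s2[i] * inv_b;     // mul
    for (std::size_t i = 0; i < n; ++i) y[i]      = x[i] + scaled[i];  // add
    return y;
}

// Fused form: one pass, no intermediate buffers.
// Computes y = x + sin(a*x)^2 * inv_b per element.
std::vector<float> snake_fused(const std::vector<float> & x, float a, float inv_b) {
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float s = std::sin(a * x[i]);
        y[i] = x[i] + s * s * inv_b;
    }
    return y;
}

// Largest absolute elementwise difference between two equally sized vectors.
float max_abs_diff(const std::vector<float> & p, const std::vector<float> & q) {
    assert(p.size() == q.size());
    float d = 0.0f;
    for (std::size_t i = 0; i < p.size(); ++i) {
        d = std::max(d, std::fabs(p[i] - q[i]));
    }
    return d;
}
```

Comparing the two outputs with max_abs_diff on the same input mirrors, in spirit, a naive-vs-fused check; the fused loop reads each input element once and writes each output element once instead of materializing four temporaries.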

Add test_snake_fuse comparing CPU naive vs CUDA fused across
F32 / F16 / BF16.

* cuda: address review feedback from @am17an

Use ggml_cuda_cast for F32/F16/BF16 conversions and rename
kernel_snake to snake_kernel to match upstream conventions.

* cuda: snake fusion fastdiv on T_len, Suggested-by: @am17an

* Update tests/test-backend-ops.cpp

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* cuda: snake fusion check add->type matches x->type

Address review feedback from @am17an

* cuda: snake fusion check add->type matches x->type

Moved for readability (equivalent)
Address review feedback from @am17an

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

2026-05-08 17:44:09 +08:00

cmake

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

include

CUDA: lower-case PCI bus id, standardize for ggml (#22820)

2026-05-08 10:09:38 +02:00

src

cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667)

2026-05-08 17:44:09 +08:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.11.0 (ggml/1478)

2026-05-05 13:15:59 +03:00