Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2026-05-11 03:24:21 +00:00)
* Update register tiling matmul to use f32 accumulation
* Fix profiling code
* Fix register tiling matmul for Chrome; I'm blaming Dawn
* Update batch tuning value for iOS
* Compile fix
* Fix use of new load function
* Move to a single query set for GPU profiling
* Move to batching compute passes when not profiling
* Refactor build_multi
* Remove iOS throttling now that we're batching compute passes