llama.cpp/ggml/src at b8893

sdgoij/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-10 19:14:07 +00:00

Nikhil Jain 0d0764dfd2 [WebGPU] Implement async tensor api and event api (#22099)

* Only run webgpu CI on my fork

* Implement set_tensor_async

* Implement synchronize api

* Implement event creation and deletion API

* Cleanup

* Cleanup

* Comment out jobs for local CI run

* Add webgpu only workflow

* Delete .github/workflows/build-webgpu.yml

* Cleanup

* Cleanup

* Update API with function handlers

* Run clang-format

* Replace one-shot buffer with a direct queue.WriteBuffer using the buffer context

2026-04-22 10:52:01 -07:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) (#21636)

2026-04-20 19:02:54 +03:00

ggml-cuda: flush legacy pool on OOM and retry (#22155)

2026-04-20 23:30:38 +02:00

hexagon: add support for FILL op (#22198)

2026-04-21 16:24:20 -07:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

metal : workaround macOS GPU interactivity watchdog (#22216)

2026-04-21 17:24:55 +03:00

ggml-cuda: native bf16 flash attention for vec kernel (#20525)

2026-03-22 11:05:51 +01:00

opencl: refactor q8_0 set_tensor and mul_mat host side dispatch for Adreno (#21938)

2026-04-16 22:28:33 -07:00

openvino: driver setup, CI split, thread safety, and NPU optimizations (#21944)

2026-04-21 18:58:34 +03:00

rpc : refactor the RPC transport (#21998)

2026-04-19 10:21:53 +03:00

sycl: Improve mul_mat_id memory efficiency and add BF16 fast path (#22119)

2026-04-22 20:32:56 +08:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

vulkan: Support F16 OP_FILL (#22177)

2026-04-21 11:01:56 +02:00

[WebGPU] Implement async tensor api and event api (#22099)

2026-04-22 10:52:01 -07:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

CMakeLists.txt

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml-alloc.c

ggml : remove ggml-ext.h (#21869)

2026-04-14 17:32:58 +03:00

ggml-backend-dl.cpp

hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)

2026-01-29 12:33:21 -08:00

ggml-backend-dl.h

hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)

2026-01-29 12:33:21 -08:00

ggml-backend-impl.h

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml-backend-meta.cpp

Tensor-parallel: Fix delayed AllReduce on Gemma-4 MoE (#22129)

2026-04-20 18:25:39 +02:00

ggml-backend-reg.cpp

ggml : add OpenVINO backend (#15307)

2026-03-14 07:56:55 +02:00

ggml-backend.cpp

ggml: add graph_reused (#21764)

2026-04-16 17:21:28 +08:00

ggml-common.h

ggml: add Q1_0 1-bit quantization support (CPU) (#21273)

2026-04-06 20:55:21 +02:00

ggml-impl.h

ggml: add graph_reused (#21764)

2026-04-16 17:21:28 +08:00

ggml-opt.cpp

fix: free ctx_copy in ggml_opt_free to plug per-training-session leak (#21592)

2026-04-08 17:40:15 +02:00

ggml-quants.c

ggml: add Q1_0 1-bit quantization support (CPU) (#21273)

2026-04-06 20:55:21 +02:00

ggml-quants.h

ggml: add Q1_0 1-bit quantization support (CPU) (#21273)

2026-04-06 20:55:21 +02:00

ggml-threading.cpp

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-threading.h

remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)

2024-12-12 19:02:49 +01:00

ggml.c

ggml: add graph_reused (#21764)

2026-04-16 17:21:28 +08:00

ggml.cpp

ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)

2025-06-01 13:43:57 +03:00

gguf.cpp

llama: fix llama-model-saver (#20503)

2026-03-25 12:53:16 +02:00