Commit Graph

  • f76565db92 common: map developer role to system (#20215) b8256 Piotr Wilkin (ilintar) 2026-03-09 14:25:11 +01:00
  • 43e1cbd6c1 models : fix assert in mamba2 graph (#20270) b8255 Georgi Gerganov 2026-03-09 13:15:15 +02:00
  • 107d599952 server : add kill switch when server is stuck (#20277) b8254 Georgi Gerganov 2026-03-09 10:33:12 +02:00
  • e8bbc736cb ggml-cuda: disable gdn for musa (#20278) b8253 Aman Gupta 2026-03-09 16:15:36 +08:00
  • b518195101 llama-quant : left-align tensor names in output (#20117) b8252 ddh0 2026-03-09 02:28:41 -05:00
  • e2763a6723 contributing: limit open PRs for new contributors to 1 (#20036) Aman Gupta 2026-03-09 15:05:34 +08:00
  • 0beb8db3a0 ggml-vulkan: add SGN operator, auto-generate Vulkan.csv and ops.md (#20219) b8250 Bertay Eren 2026-03-09 09:24:16 +03:00
  • b2f460bd3c vulkan: skip zero size tensors in backend copies (#20233) b8249 Ruben Ortlam 2026-03-09 07:23:45 +01:00
  • 5f4cdac385 cuda : display total and free VRAM capacity during device initialization (#20185) b8248 Michael Huang 2026-03-08 21:45:43 -07:00
  • ae87863dc1 llama-bench: introduce -hf and -hff flags & use --mmap 1 by default (#20211) b8247 Aaron Teo 2026-03-09 09:05:44 +08:00
  • 97c64fbdbd PEG parser for LFM2 (#20251) b8246 Piotr Wilkin (ilintar) 2026-03-09 01:11:22 +01:00
  • d417bc43dd server : do not create checkpoints right after mtmd chunks (#20232) b8245 Georgi Gerganov 2026-03-08 22:16:46 +02:00
  • 35bee031e1 graph : remove redundant scale_w parameter (#20235) b8244 Sigbjørn Skjæret 2026-03-08 18:58:28 +01:00
  • 451ef08432 common : gracefully handle incomplete output (#20191) b8243 Aldehir Rojas 2026-03-08 11:17:02 -05:00
  • 9b24886f78 Fix compile bug (#20203) b8242 Piotr Wilkin (ilintar) 2026-03-08 17:15:49 +01:00
  • 62b8143ad2 Fix structured outputs (#20223) b8241 Piotr Wilkin (ilintar) 2026-03-08 17:14:43 +01:00
  • d088d5b74f ggml-vulkan: Add ELU op support (#20183) b8240 GiantPrince 2026-03-08 07:38:17 -04:00
  • cd18a50ea5 vulkan: Fix data races in coopmat1 mul_mat(_id) (#20084) b8239 Jeff Bolz 2026-03-08 06:33:48 -05:00
  • a976ff081b llama: end-to-end tests (#19802) b8238 Johannes Gäßler 2026-03-08 12:30:21 +01:00
  • a95047979a readme : update infra list (#20212) Christopher Maher 2026-03-08 03:42:28 -07:00
  • b283f6d5b3 Revert to OAI-compatible args (#20213) b8236 Piotr Wilkin (ilintar) 2026-03-08 11:33:03 +01:00
  • ff52ee964d server : correct index on finish in OAI completion streams (#20226) b8235 decahedron1 2026-03-08 04:08:57 -05:00
  • 213c4a0b81 [SYCL] supprt Flash Attention for fp32/fp16/Q4/Q5/Q8 (#20190) b8234 Neo Zhang 2026-03-08 12:00:07 +08:00
  • 715ed28683 use scalar sums 0cc4m/vulkan-coopmat-int8 Ruben Ortlam 2026-03-07 22:11:40 +01:00
  • a9435151db apply scales inline Ruben Ortlam 2026-03-07 14:56:25 +01:00
  • d1f8bbd085 vulkan: add int8 coopmat quantized matmul shader Ruben Ortlam 2026-03-07 14:42:41 +01:00
  • c5a778891b ggml: add GATED_DELTA_NET op (#19504) b8233 Aman Gupta 2026-03-07 15:41:10 +08:00
  • 6fce5c6a7d opencl: add l2_norm (#20160) b8232 lhez 2026-03-06 18:03:05 -08:00
  • c024d85908 Autoparser: True streaming (#20177) b8231 Piotr Wilkin (ilintar) 2026-03-07 01:55:33 +01:00
  • 2f2923f895 Autoparser: add optional argument reshuffle capability (#20171) b8230 Piotr Wilkin (ilintar) 2026-03-06 22:34:15 +01:00
  • 649f06481e quants : Add memsets and other fixes for IQ quants (#19861) b8229 Bartowski 2026-03-06 16:06:56 -05:00
  • 7463687161 Add @pwilkin to CODEOWNERS for autoparser code (#20174) Piotr Wilkin (ilintar) 2026-03-06 21:25:41 +01:00
  • 566059a26b Autoparser - complete refactoring of parser architecture (#18675) b8227 Piotr Wilkin (ilintar) 2026-03-06 21:01:00 +01:00
  • 34df42f7be hexagon: add f32 ssm_conv op (#20122) b8226 Todor Boinovski 2026-03-06 09:59:26 -08:00
  • e68f2fb894 server : preserve anthropic thinking blocks in conversion (#20120) b8225 Tom Vaucourt 2026-03-06 17:41:12 +01:00
  • ba2fd11cdf cpu: skip redudant ROPE cache updates (#20149) b8224 Max Krasnyansky 2026-03-06 08:32:40 -08:00
  • d48e876467 ggml-cuda: add mem check for fusion (#19916) b8223 Aman Gupta 2026-03-07 00:05:43 +08:00
  • ba2ff79e43 ggml: update comments for backends which have no memory to report (#20157) b8222 Aaron Teo 2026-03-06 23:24:38 +08:00
  • c6980ff29d ggml-cpu: Fix gcc 15 ICE on ppc64le (#20083) (#20130) b8221 shalinib-ibm 2026-03-06 20:52:39 +05:30
  • 1e38a7a6fa CUDA: use shared mem for ssm_conv (#20128) b8220 Aman Gupta 2026-03-06 23:09:59 +08:00
  • 121fe62182 test pr/19802-test Georgi Gerganov 2026-03-06 16:30:32 +02:00
  • 388baabc06 context: ignore zero scale LoRAs when checking sameness (#20166) b8219 Tim Neumann 2026-03-06 14:05:52 +01:00
  • 573f2cf58e feat: add video support for Qwen3.5 andrewmd5 2026-03-06 21:37:07 +09:00
  • f5ddcd1696 Checkpoint every n tokens: squash (#20087) b8218 Piotr Wilkin (ilintar) 2026-03-06 11:39:26 +01:00
  • 803d3a1964 fix CI Johannes Gäßler 2026-03-06 10:16:05 +01:00
  • f6235a41ef webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts (#18655) Aleksander Grygier 2026-03-06 10:00:39 +01:00
  • b90486e51d fix WebGPU Johannes Gäßler 2026-03-05 22:50:22 +01:00
  • 7e466072f3 fix CI Johannes Gäßler 2026-03-05 15:26:38 +01:00
  • 4b7f407ae8 fix use-after-free in llama-model-loader.cpp Johannes Gäßler 2026-03-04 12:36:09 +01:00
  • e6a6af1cef fixup for rebase Johannes Gäßler 2026-03-03 21:27:02 +01:00
  • fc6960347b tests: add end-to-end tests per model architecture Johannes Gäßler 2026-02-21 11:15:32 +01:00
  • 2850bc6a13 ggml-cpu: fix data race for debug asserts (#20148) b8216 Johannes Gäßler 2026-03-06 09:12:49 +01:00
  • 17a4258946 kv-cache : fix M-RoPE checkpoints (#20132) b8215 Georgi Gerganov 2026-03-06 08:46:51 +02:00
  • f7db3f3789 cli : Don't clear system prompt when using '/clear' (#20067) b8214 Roj234 2026-03-06 13:41:11 +08:00
  • 6c97bffd65 opencl: add neg, exp and diag (#20127) b8213 lhez 2026-03-05 21:16:39 -08:00
  • 2b10b62677 hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) b8212 YardenTal44 2026-03-06 04:29:13 +02:00
  • a0ed91a442 models : kda chunk size = 16 (#19827) ymcki 2026-03-05 23:01:23 +08:00
  • 2cd20b72ed CUDA: Improve performance via less synchronizations between token (#17795) b8210 Andreas Kieslinger 2026-03-05 12:53:21 +01:00
  • 872646b30c model : update Qwen3.5 model type detection (#20126) b8209 Eric Zhang 2026-03-05 19:47:14 +08:00
  • b5ed0e058c cli : add command and file auto-completion (#19985) b8208 Sigbjørn Skjæret 2026-03-05 10:47:28 +01:00
  • cf232515c9 convert : register Qwen 3.5 ForCausalLM for text only (#20119) Sigbjørn Skjæret 2026-03-05 10:30:02 +01:00
  • 5e335ba113 webui: Improvements for Models Selector UI (#20066) Aleksander Grygier 2026-03-05 08:52:22 +01:00
  • 92f7da00b4 chore : correct typos [no ci] (#20041) Marcel Petrick 2026-03-05 08:50:21 +01:00
  • 7a99dc85e2 hexagon: Flash Attention optimizations (dma, mpyacc, multi-row) and MatMul updates (#20118) b8204 Max Krasnyansky 2026-03-04 21:55:29 -08:00
  • 69fd345335 opencl: add SET, support i32 for CPY, minor refactor for cpy (#20101) b8203 lhez 2026-03-04 21:32:26 -08:00
  • 1a29907d2e hexagon: add llama-completion runner script (#20095) b8202 Todor Boinovski 2026-03-04 15:04:59 -08:00
  • 24d2ee0527 [WebGPU] Fix wait logic for inflight jobs (#20096) b8201 Nikhil Jain 2026-03-04 11:54:55 -08:00
  • 541bf37622 Add concat op to webgpu. (#20068) b8200 Masashi Yoshimura 2026-03-05 04:19:00 +09:00
  • d969e933e1 tools : add missing clocale include in mtmd-cli [no ci] (#20107) Sigbjørn Skjæret 2026-03-04 14:18:04 +01:00
  • 7f5ee54968 ggml: fix ggml_is_contiguous_n for ne == 1 (#20092) b8198 Johannes Gäßler 2026-03-04 12:04:31 +01:00
  • 66199c9f03 ggml : use a simple std::thread in AMX without OpenMP (#20074) b8197 Adrien Gallouët 2026-03-04 11:57:09 +01:00
  • c99909dd0b impl : use 6 digits for tensor dims (#20094) b8196 ddh0 2026-03-04 02:53:38 -06:00
  • cb8f4fa3f8 Fix locale-dependent float printing in GGUF metadata (#17331) b8195 SamareshSingh 2026-03-04 02:30:40 -06:00
  • 54910bd4f3 completion : Fix a typo in warning message (#20082) b8194 standby24x7 2026-03-04 14:44:49 +09:00
  • ecd99d6a9a docs: Fix intel documentation link (#20040) b8193 Mickael Desgranges 2026-03-03 14:50:00 +01:00
  • 137435ff15 kleidiai : add sme fp16 compute path for q4_0 gemm on aarch64 (#20043) b8192 Charles Xu 2026-03-03 10:40:26 +01:00
  • 24350fdf9b opencl: add optimized q4_1 mm kernel for adreno (#19840) b8191 shaofeiqi 2026-03-02 19:49:41 -08:00
  • 49a7564ac1 ggml webgpu: fix workgroup dispatch limit for large batch sizes (#19965) b8190 Abhijit Ramesh 2026-03-02 19:35:11 -08:00
  • 4d828bd1ab ggml webgpu: Clean up per-thread parameter buffer pool and job submission logic (#19772) b8189 Nikhil Jain 2026-03-02 10:23:34 -08:00
  • 36a7a6589c ggml-webgpu: Support non-contiguous src0 and overlapping src0/src1 in binary ops (#19850) b8188 Masashi Yoshimura 2026-03-03 00:59:53 +09:00
  • feefb92836 vulkan: tune MMVQ for Intel Windows (#19988) b8187 Ruben Ortlam 2026-03-02 15:58:25 +01:00
  • ec88c3ceea scripts : improve get-wikitext-2.sh (#19952) Adrien Gallouët 2026-03-02 15:40:49 +01:00
  • 2afcdb9777 ggml-cpu: optimise s390x multiply extend instructions (#20032) b8185 Aaron Teo 2026-03-02 16:23:56 +08:00
  • 319146247e vulkan: improve partial offloading performance on AMD (#19976) b8184 Ruben Ortlam 2026-03-01 17:32:14 +01:00
  • 66d65ec29b cuda: cap grid.y at 65535 in non-contiguous dequantize/convert kernels (#19999) b8183 oobabooga 2026-03-01 02:40:22 -03:00
  • 05728db18e vendors : update miniaudio library to 0.11.24 (#19914) b8182 Dmitry Atamanov 2026-02-28 20:10:01 +05:00
  • 4720819d45 vendor : update cpp-httplib to 0.35.0 (#19969) b8181 Adrien Gallouët 2026-02-28 13:53:56 +01:00
  • d979f2b176 tests : model metadata loading from huggingface (#19796) b8180 Bartowski 2026-02-28 04:44:38 -05:00
  • 07e2c9707c eagle3: support --eagle3 in llama-cli ruixiangw 2026-02-28 00:33:54 +00:00
  • ecbcb7ea9d CUDA: add CDNA3 MFMA support for flash attention MMA kernel (#19806) b8179 Jayant Lohia 2026-02-28 00:07:26 +05:30
  • 3e6ab244ad server: Add pragma once to server-context.h (#19944) b8178 Roj234 2026-02-28 01:28:36 +08:00
  • 5596a35791 server: Mirroring /v1/responses to /responses to match /v1/chat/completions pattern (#19873) b8177 Sami Kama 2026-02-27 08:44:42 -08:00
  • 8d3b962f47 ci : use ubuntu-latest for gguf-publish workflow (#19951) Daniel Bevenius 2026-02-27 14:42:24 +01:00
  • d903f30e25 ggml-cpu: add repack for mxfp4 (#19738) b8175 Aman Gupta 2026-02-27 18:15:09 +08:00
  • 8387ffb28d gguf-py : dump version to 0.18.0 (#19950) gguf-v0.18.0 Daniel Bevenius 2026-02-27 11:02:53 +01:00
  • 2e7e638523 server : support multiple model aliases via comma-separated --alias (#19926) b8173 Pascal 2026-02-27 07:05:23 +01:00
  • a8b192b6ec tests : enable test-chat out of tree build (#19558) b8172 Jan Patrick Lehr 2026-02-27 05:37:54 +01:00
  • c17dce4f5c replace the magic nunber 768 by max work group size to support iGPU (#19920) b8171 Neo Zhang 2026-02-27 09:26:07 +08:00
  • 88cf781f51 ggml-zendnn: update code for latest ZenDNN API (#19923) b8170 Vishal Singh 2026-02-27 06:13:41 +05:30
  • 4e76d24f28 ggml : fix AMX and add batched support (#19925) b8169 Adrien Gallouët 2026-02-26 21:39:11 +01:00