Commit Graph

  • 5a32a9b8a5 Fix data race in CUDA's "cpy" kernel (influences GGML's DUP, CONT operations). (#20507) b8336 Rail Chabdarov 2026-03-14 06:19:44 +01:00
  • 3b439504ba opencl: fix l2_norm (#20480) lhez 2026-03-13 22:18:52 -07:00
  • 463b6a963c tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice (#19954) b8334 Adrien Gallouët 2026-03-13 21:25:57 +01:00
  • e30f1fdf74 graph : remove redundant GDN state transposes (#20443) b8333 Georgi Gerganov 2026-03-13 22:12:54 +02:00
  • 1430c35948 common/parser: gracefully handle undetected tool parser, print error message. (#20286) b8332 Piotr Wilkin (ilintar) 2026-03-13 20:56:10 +01:00
  • f17b3be63f llama : fix pooling assertion crash in chunked GDN detection path (#20468) b8331 ZeroV0LT 2026-03-13 19:53:42 +01:00
  • d7ba99c485 server: reset counter related to kill-switch on client error (#20513) b8330 SoftwareRenderer 2026-03-13 13:58:09 -04:00
  • eebf21c3e9 don't use initializer list for semaphore wait info Ruben Ortlam 2026-03-13 17:14:47 +01:00
  • fbaa95bc29 ggml-cpu: add RVV vec dot kernels for quantization types (#18859) b8329 rehan-10xengineer 2026-03-13 20:36:04 +05:00
  • 08a4ba6f03 use timeline semaphores instead of fences for event_synchronize Ruben Ortlam 2026-03-13 16:02:51 +01:00
  • b5e1212063 ggml : fix typo gmml (#20512) b8328 Adrien Gallouët 2026-03-13 14:36:13 +01:00
  • 2204bcedc8 also reset command buffers before reuse Ruben Ortlam 2026-03-13 13:53:23 +01:00
  • c0d100e0fc fix event command buffer reset validation error Ruben Ortlam 2026-03-13 13:49:31 +01:00
  • 58deae173e vulkan: fix event wait submission, event command buffer reset Ruben Ortlam 2026-03-13 13:40:53 +01:00
  • 8f974d2392 mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate (#20105) b8327 Daniel Bevenius 2026-03-13 12:30:02 +01:00
  • 2948e6049a general: CONTRIBUTING.md - guidelines for quantization schemes (#19762) Piotr Wilkin (ilintar) 2026-03-13 12:21:33 +01:00
  • 73c9eb8ced metal : fix l2 norm scale (#20493) b8325 Georgi Gerganov 2026-03-13 11:43:20 +02:00
  • 5ec6569eb5 unify scalar+vector and fix reduce function 0cc4m/vulkan-slang-flash-attention Ruben Ortlam 2026-03-13 09:23:03 +01:00
  • e880cb2e0d Revert "move kv shmem staging to function" Ruben Ortlam 2026-03-13 08:18:33 +01:00
  • 0349025db8 move kv shmem staging to function Ruben Ortlam 2026-03-09 15:02:25 +01:00
  • 2c623bfaea generic reductions Ruben Ortlam 2026-03-06 08:09:56 +01:00
  • e1b40fa53a fix slang issues Ruben Ortlam 2026-03-05 11:35:45 +01:00
  • a4ac1d903a vulkan: port Flash Attention shader to Slang Ruben Ortlam 2026-03-04 12:29:09 +01:00
  • 983df142a9 convert : fix/suppress pyright errors (#20442) Daniel Bevenius 2026-03-13 06:00:52 +01:00
  • 57819b8d4b llama : disable graph reuse with pipeline parallelism (#20463) b8323 Georgi Gerganov 2026-03-12 21:04:13 +02:00
  • 95ae9982d3 Merge branch 'master' into compilade/imatrix-neutral-prior compilade/imatrix-neutral-prior Francis Couture-Harpin 2026-03-12 13:20:00 -04:00
  • f45b59a5c3 Revert "quantize : assume the neutral prior is equal imatrix weights" Francis Couture-Harpin 2026-03-12 13:11:53 -04:00
  • 7ded1269ab unify matmul_id shader selection 0cc4m/vulkan-mm-pipeline-selection Ruben Ortlam 2026-03-12 14:55:12 +01:00
  • 664dfc7730 vulkan: unify matmul shader selection Ruben Ortlam 2026-03-12 14:47:18 +01:00
  • 557fe2d913 vendor : update cpp-httplib to 0.37.1 (#20390) b8322 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-03-12 09:57:06 -03:00
  • 0e810413bb tests : use reasoning instead of reasoning_budget in server tests (#20432) Piotr Wilkin (ilintar) 2026-03-12 13:41:01 +01:00
  • 128142fe7d test-backend-ops: allow loading tests from file and parsing model operators into file (#19896) b8320 Ruben Ortlam 2026-03-12 13:26:00 +01:00
  • 6de1bc631d common : update completion executables list [no ci] (#19934) Daniel Bevenius 2026-03-12 12:12:01 +01:00
  • 0a10c34dc1 grammar: Fix grammar root symbol check (#19761) b8318 Asbjørn Olling 2026-03-12 12:04:56 +01:00
  • deee23863b vulkan: add GATED_DELTA_NET op support (#20334) b8317 ProgenyAlpha 2026-03-12 06:32:04 -04:00
  • c3e3f9e533 convert : better mtp check and fix return [no ci] (#20419) Sigbjørn Skjæret 2026-03-12 10:04:20 +01:00
  • 40c550d4f6 vulkan: fix SSM_CONV PP scaling with large ubatch sizes (#20379) b8315 ProgenyAlpha 2026-03-12 05:03:18 -04:00
  • de190154c8 New conversations now auto-select the first loaded model (#20403) Pascal 2026-03-12 09:07:05 +01:00
  • 05039967da ggml-virtgpu: Fix some build commands (#20341) Masashi Yoshimura 2026-03-12 16:47:45 +09:00
  • e4cff0956b metal : avoid divisions in bin kernel (#20426) b8312 Georgi Gerganov 2026-03-12 09:42:40 +02:00
  • 4cc6eb158c ci: Setup self-hosted CI for Intel Linux Vulkan backend (#20154) Masato Nakasaka 2026-03-11 22:43:22 -07:00
  • 246ffc4b05 vulkan: fix l2_norm epsilon handling (#20350) b8310 Jeff Bolz 2026-03-12 00:39:41 -05:00
  • aa429cf507 vulkan: fix OOB check in flash_attn_mask_opt (#20296) b8309 Jeff Bolz 2026-03-12 00:35:49 -05:00
  • 5866e3bbc8 vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap (#20059) b8308 Masato Nakasaka 2026-03-11 22:30:16 -07:00
  • 0516e04bf9 opencl: use larger workgroup size for get_rows (#20316) lhez 2026-03-11 22:03:27 -07:00
  • 3d9ab225e7 opencl: add cumsum op (#18981) shaofeiqi 2026-03-11 22:03:07 -07:00
  • d63aa398de hip: compile debug builds with -O2 on hip to avoid a compiler bug (#20392) b8305 uvos 2026-03-12 03:37:10 +01:00
  • a8304b4d27 common/parser: add GigaChatV3/3.1 models support (#19931) b8304 Mishusha 2026-03-12 03:22:25 +03:00
  • fdb17643d3 model : add support for Phi4ForCausalLMV (#20168) b8303 DAN™ 2026-03-11 19:25:54 -04:00
  • 1eea6a2968 graph : add optional scale parameter to build_lora_mm [no ci] (#20427) Richard Davison 2026-03-12 00:22:49 +01:00
  • 4a748b8f15 common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (#20416) b8301 ddh0 2026-03-11 18:13:28 -05:00
  • f2ab047f27 ggml-webgpu: Add supports for GGML_OP_REPEAT (#20230) b8300 Masashi Yoshimura 2026-03-12 06:40:36 +09:00
  • 20fbf04cd6 metal : fix capture_started flag gg/metal-bin-mod Georgi Gerganov 2026-03-11 23:15:16 +02:00
  • a71b566137 metal : avoid modulus in bin kernel when not broadcasting Georgi Gerganov 2026-03-11 19:33:03 +02:00
  • d28961d81e llama : enable chunked fused GDN path (#20340) b8299 Georgi Gerganov 2026-03-11 22:46:40 +02:00
  • f90bd1dd84 llama : whitespace cleanup (#20422) b8298 Sigbjørn Skjæret 2026-03-11 21:18:29 +01:00
  • 5eae9cb1d9 ggml : add NVFP4 quantization type support (#19769) b8297 Richard Davison 2026-03-11 21:02:54 +01:00
  • 3ca19b0e9f benches : add nemotron super (#20420) Georgi Gerganov 2026-03-11 21:39:40 +02:00
  • eaf1d7930c llama : add support for Nemotron 3 Super (#20411) b8295 Daniel Bevenius 2026-03-11 19:27:53 +01:00
  • 76ea1c1c46 metal : fix capture_compute counter logic (#20410) Georgi Gerganov 2026-03-11 18:38:22 +02:00
  • 8c8544f9fb metal : fix capture_compute counter logic gg/metal-capture-env-cont Georgi Gerganov 2026-03-11 18:35:45 +02:00
  • bd1ec818e9 compare-llama-bench: check remotes as well (#20406) Aman Gupta 2026-03-12 00:14:42 +08:00
  • b1f856af72 kleidiai: revert unrelated requirements change Martin Klacer 2026-03-11 15:24:43 +00:00
  • b541241104 metal : fix q5_k mul_mv register spill (#20399) b8292 Georgi Gerganov 2026-03-11 16:25:27 +02:00
  • c363256839 metal : add env var to trigger graph capture (#20398) b8291 Georgi Gerganov 2026-03-11 16:25:10 +02:00
  • ecac98ee53 [SYCL] Update SYCL.md for binary package for Windows (#20401) Neo Zhang 2026-03-11 22:21:22 +08:00
  • 182acfe5c5 ci: disable coopmat on ubuntu-24-cmake-vulkan job (#20294) Ruben Ortlam 2026-03-11 14:12:29 +01:00
  • db8ea663c7 kleidiai: add cpu feature detection to CI run script Martin Klacer 2026-03-06 16:45:01 +00:00
  • b5fe4559ae common/parser: use nlohmann::ordered_json to preserve parameter order (#20385) Aldehir Rojas 2026-03-11 04:26:51 -05:00
  • acb7c79069 common/parser: handle reasoning budget (#20297) b8287 Piotr Wilkin (ilintar) 2026-03-11 10:26:12 +01:00
  • 5f91b1d5d5 ggml-cuda: gdn use shared mem for HIP (#20366) b8286 uvos 2026-03-11 06:06:19 +01:00
  • 9ef7523ee9 cuda/hip: fix loop unrolling in ssm-conv (#20369) b8285 uvos 2026-03-11 06:04:32 +01:00
  • 00de615345 Fix agentic mcp image single model (#20339) b8284 Pascal 2026-03-11 05:31:33 +01:00
  • e1a399992b vendor : update cpp-httplib to 0.37.0 (#20207) Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-03-11 00:03:53 -03:00
  • 4f2f0a163d vendor : update miniaudio to 0.11.25 (#20209) Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-03-11 00:01:56 -03:00
  • 0cec84f999 fix op rope, add rope_back (#20293) b8281 Neo Zhang 2026-03-11 09:53:34 +08:00
  • b2e1427c9b fix for failed UT case: ACC, L2_NORM, UPSCALE, fused_glu, unary (#20283) b8280 Neo Zhang 2026-03-11 09:53:05 +08:00
  • 4d99d45084 model : qwen3vl reranker text support (#20332) b8279 Vinicios Lugli 2026-03-10 19:40:14 -03:00
  • 10e5b148b0 llama-quant : correct n_attention_wv usage (#20357) b8278 ddh0 2026-03-10 14:43:29 -05:00
  • 90b2731894 ggml : bump RPC version (#20330) b8277 Georgi Gerganov 2026-03-10 21:36:57 +02:00
  • aa2d278a11 ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (#20173) b8276 Reese Levine 2026-03-10 09:14:27 -07:00
  • 6c770d16ca Reduce level of content parser warning message to avoid log spam on non-debug verbosity (#20347) Piotr Wilkin (ilintar) 2026-03-10 15:21:51 +01:00
  • 8d880ac012 examples : fix empty items in json_schema_to_grammar.py [no ci] (#19968) Ray Xu 2026-03-10 21:38:18 +08:00
  • 0f1e9d14cc docs: update CPU backend ops to mark POOL_1D as supported (#20304) a3894281 2026-03-10 15:31:24 +02:00
  • 1274fbee9e models : fix assert in mamba2 (cont) (#20335) b8272 Georgi Gerganov 2026-03-10 15:00:08 +02:00
  • a7b3dee7a5 server : make 2 checkpoints near the end of the prompt (#20288) b8271 Georgi Gerganov 2026-03-10 14:28:23 +02:00
  • ec947d2b16 common : fix incorrect uses of stoul (#20313) b8270 Sigbjørn Skjæret 2026-03-10 11:40:26 +01:00
  • 0cd4f4720b kleidiai : support for concurrent sme and neon kernel execution (#20070) b8269 Charles Xu 2026-03-10 08:25:25 +01:00
  • af237f3026 ggml-cpu: add RVV repack GEMM and GEMV for quantization types (#19121) b8268 Taimur Ahmad 2026-03-10 11:49:52 +05:00
  • 1a5631beaa metal: handle command buffer failures gracefully in synchronize (#20306) b8267 Julian Pscheid 2026-03-09 23:32:24 -07:00
  • 1dab5f5a44 llama-quant : fail early on missing imatrix, refactor type selection, code cleanup (#19770) b8266 ddh0 2026-03-10 01:16:05 -05:00
  • c96f608d98 common: consolidate PEG string parsers (#20263) b8265 Aldehir Rojas 2026-03-09 18:29:21 -05:00
  • 0842b9b465 model: fix step3.5 n_rot (#20318) b8264 Xuan-Son Nguyen 2026-03-09 23:42:24 +01:00
  • 59db9a357d llama: dynamic head_dim and n_rot for SWA (#20301) b8263 Xuan-Son Nguyen 2026-03-09 22:22:39 +01:00
  • 23fbfcb1ad server: Parse port numbers from MCP server URLs in CORS proxy (#20208) b8262 Evan Huus 2026-03-09 12:47:54 -04:00
  • e22cd0aa15 metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (#20250) b8261 Paul Flynn 2026-03-09 10:48:12 -04:00
  • 96cfc4992c server : fix checkpoints n_tokens calculation (#20287) b8260 Georgi Gerganov 2026-03-09 16:47:06 +02:00
  • ed0007aa32 metal : add upscale (#20284) b8259 Georgi Gerganov 2026-03-09 16:45:11 +02:00
  • 344ee2a38a server : warn swa-full is not supported for non-SWA models (#20291) b8258 Georgi Gerganov 2026-03-09 16:44:25 +02:00
  • d6e1556499 server : fix off-by-1 in server_tokens::size_up_to_pos() (#20279) Georgi Gerganov 2026-03-09 16:43:38 +02:00