Commit Graph

  • 5764d7c6a6 gemma : perform per-layer projections in the first layer (#21612) b8711 Georgi Gerganov 2026-04-08 16:06:30 +03:00
  • 87f4744a80 examples : disable cb_eval callback for --save-logits (#21553) b8710 Daniel Bevenius 2026-04-08 14:10:33 +02:00
  • 85d482e6b6 parser: fix MiniMax handling (#21573) b8709 Piotr Wilkin (ilintar) 2026-04-08 12:47:25 +02:00
  • ae65fbdf33 tests : remove obsolete .mjs script (#21615) b8708 Georgi Gerganov 2026-04-08 13:20:46 +03:00
  • 3bd9aa1f92 chore: Update labeler to have separate labels for server/webui and server changes (#21567) Aleksander Grygier 2026-04-08 10:35:31 +02:00
  • ece522f98c chore: Remove legacy files (#21606) Aleksander Grygier 2026-04-08 09:55:08 +02:00
  • 09343c0198 model : support step3-vl-10b (#21287) b8705 forforever73 2026-04-08 15:51:31 +08:00
  • 97508acb17 webui: fix syntax highlighting lost after streaming for non-common languages (#21206) Hamish M. Blair 2026-04-07 23:58:08 -07:00
  • 5c4aae66e1 devops: kleidiai: provide KleidiAI-Enabled ARM Release Artifact (#21259) b8703 Martin Klacer 2026-04-08 06:06:12 +01:00
  • c5ce4bc227 CUDA: make cuda graphs props check faster (#21472) b8702 Aman Gupta 2026-04-08 09:05:51 +08:00
  • 66c4f9ded0 ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels (#21168) b8701 iacopPBK 2026-04-07 21:47:42 +02:00
  • 93bdc61563 gguf-py : fix missing comma after bad merge in tensor-mapping (#21558) Daniel Bevenius 2026-04-07 21:24:25 +02:00
  • 4eb19514dd kv-cache : support attention rotation for heterogeneous iSWA (#21513) b8699 Georgi Gerganov 2026-04-07 20:31:28 +03:00
  • 957d717ce5 ggml-webgpu: parameterize submission size and add iOS specific limits (#21533) b8698 Reese Levine 2026-04-07 10:30:01 -07:00
  • de1aa6fa73 CUDA: check for buffer overlap before fusing (#21566) b8697 Aman Gupta 2026-04-08 00:57:04 +08:00
  • 69c28f1547 llama-server: fix model params not propagated (#21509) b8696 Aaron Teo 2026-04-07 21:39:41 +08:00
  • 0d049d6a92 unicode : add custom Qwen2 regex handler to fix segfault on long input (#21257) Son H. Nguyen 2026-04-07 22:13:38 +09:00
  • a8ec0df461 llama: remove per-arch tensor name lists (#21531) b8694 Johannes Gäßler 2026-04-07 15:02:03 +02:00
  • e8f5082697 server : fix restore for checkpoints with pos_min == 0 (#21510) b8693 Georgi Gerganov 2026-04-07 15:29:17 +03:00
  • 22fc79134e ggml : deprecate GGML_OP_ADD1 (#21363) b8692 Georgi Gerganov 2026-04-07 15:28:27 +03:00
  • 2a619f6fbc ggml: Vulkan build, Linux -- output error string for errno on fork failure (#20868) (#20904) b8691 Tom Overlund 2026-04-07 07:54:55 -04:00
  • edd4d9bca5 vulkan: add FA dequant for q4_1, q5_0, q5_1, iq4_nl (#21029) b8690 mkoker 2026-04-07 07:41:29 -04:00
  • 482192f12d webui : store reasoning_content so it is sent back in subsequent requests (#21249) Aldehir Rojas 2026-04-07 06:32:44 -05:00
  • 71a81f6fcc ggml-cuda : fix CDNA2 compute capability constant for gfx90a (MI210) (#21519) b8688 Antoine Viallon 2026-04-07 12:18:55 +02:00
  • ecce0087da fix: Detect streaming state in reasoning content blocks (#21549) Aleksander Grygier 2026-04-07 12:04:41 +02:00
  • d1f82e382d Fix rtl text rendering (#21382) Kabir08 2026-04-07 15:07:20 +05:30
  • 0988accf82 [SYCL] Add Q8_0 reorder optimization (~3x tg speedup on Intel Arc) (#21527) b8685 PMZFX 2026-04-07 04:12:49 -04:00
  • 0033f53a07 docs: fix typo in build.md (emdawbwebgpu -> emdawnwebgpu) (#21518) b8684 Dmytro Romanov 2026-04-07 06:37:26 +02:00
  • d0a6dfeb28 ggml-webgpu: Add the support of MUL_MAT_ID (#21147) b8683 Masashi Yoshimura 2026-04-07 05:08:46 +09:00
  • 2e1f0a889e ggml: add Q1_0 1-bit quantization support (CPU) (#21273) b8682 Pasha Khosravi 2026-04-06 11:55:21 -07:00
  • 506200cf8b cli: fix stripping of \n in multiline input (#21485) b8681 Bipin Yadav 2026-04-07 00:24:06 +05:30
  • 15f786e658 [CUDA ] Write an optimized flash_attn_stream_k_fixup kernel (#21159) b8680 Gaurav Garg 2026-04-07 00:04:29 +05:30
  • 94ca829b60 llama-bench: add -fitc and -fitt to arguments (#21304) b8679 Aman Gupta 2026-04-06 22:26:02 +08:00
  • 4aa962e2b0 vocab : add byte token handling to BPE detokenizer for Gemma4 (#21488) b8678 Aldehir Rojas 2026-04-06 09:08:37 -05:00
  • 941146b3f1 convert : fix block_ff_dim retrieval for lfm2 (#21508) Sigbjørn Skjæret 2026-04-06 14:05:18 +02:00
  • 482d862bcb server : handle unsuccessful sink.write in chunked stream provider (#21478) b8676 lainon1 2026-04-06 13:03:02 +01:00
  • 3979f2bb08 docs: add hunyuan-ocr gguf, also add test [no ci] (#21490) Xuan-Son Nguyen 2026-04-06 14:02:37 +02:00
  • 400ac8e194 convert : set "add bos" == True for Gemma 4 (#21500) Georgi Gerganov 2026-04-06 13:52:07 +03:00
  • f51fd36d79 sycl : handle other FA case (#21377) Neo Zhang 2026-04-06 18:28:00 +08:00
  • a30369d515 cpu: fix ARM NEON nvfp4 vec dot 0cc4m/cpu-arm-nvfp4-fix Ruben Ortlam 2026-04-06 10:26:50 +02:00
  • 25eec6f327 hexagon: slight optimization for argosrt output init (#21463) b8672 Yarden Tal 2026-04-06 04:30:25 +03:00
  • 58190cc84d llama : correct platform-independent loading of BOOL metadata (#21428) b8671 anchortense 2026-04-06 09:40:38 +10:00
  • af76639f72 model : add HunyuanOCR support (#21395) b8670 Richard Davison 2026-04-05 23:32:14 +02:00
  • 761797ffdf ci : use default RISE RISC-V Runners (#21263) Ludovic Henry 2026-04-05 20:29:48 +02:00
  • 5d3a4a7da5 server : fix logging of build + system info (#21460) b8668 ddh0 2026-04-05 09:14:02 -05:00
  • c08d28d088 ci: lower cuda12 floor to 12.8.1 for broader host compatibility (#21438) b8667 M1DNYT3 2026-04-05 04:04:00 +03:00
  • 661e9acb36 ci: fix vulkan workflow referencing non-existent action (#21442) Nicholas Sparks 2026-04-04 20:59:51 -04:00
  • b8635075ff common : add gemma 4 specialized parser (#21418) b8665 Aldehir Rojas 2026-04-04 13:39:00 -05:00
  • 9c699074c9 server: Fix undefined timing measurement errors in server context (#21201) b8664 Dan Hoffman 2026-04-04 07:11:19 -07:00
  • d01f6274c0 common : respect specified tag, only fallback when tag is empty (#21413) b8663 Adrien Gallouët 2026-04-04 15:08:03 +02:00
  • 650bf14eb9 llama-model: read final_logit_softcapping for Gemma 4 (#21390) b8662 SamareshSingh 2026-04-04 06:05:10 -05:00
  • b7ad48ebda llama: add custom newline split for Gemma 4 (#21406) b8661 Aman Gupta 2026-04-04 15:06:34 +08:00
  • d006858316 ggml-webgpu: move from parameter buffer pool to single buffer with offsets (#21278) b8660 Reese Levine 2026-04-03 11:40:14 -07:00
  • e439700992 ci: Add Windows Vulkan backend testing on Intel (#21292) Masato Nakasaka 2026-04-04 02:16:44 +09:00
  • 50e0ad08fb server: save and clear idle slots on new task (--clear-idle) (#20993) b8658 Yes You Can Have Your Own 2026-04-03 20:02:27 +03:00
  • f1f793ad06 common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230) b8657 Piotr Wilkin (ilintar) 2026-04-03 17:51:52 +02:00
  • af5c13841f common : fix tool call type detection for nullable and enum schemas (#21327) b8656 Samanvya Tripathi 2026-04-03 11:51:23 -04:00
  • 277ff5fff7 docker : bump cuda12 to 12.9.1 (#20920) M1DNYT3 2026-04-03 16:06:45 +03:00
  • 384c0076bc docs: Update build.md: HSA_OVERRIDE_GFX_VERSION clarification (#21331) jeromew 2026-04-03 15:05:14 +02:00
  • 1f34806c44 jinja: coerce input for string-specific filters (#21370) b8653 Sigbjørn Skjæret 2026-04-03 15:03:33 +02:00
  • 887535c33f ci: add more binary checks (#21349) Aaron Teo 2026-04-03 20:50:00 +08:00
  • d3416a4aa9 fix: remove stale assert (#21369) b8651 Piotr Wilkin (ilintar) 2026-04-03 13:40:41 +02:00
  • 43a4ee4a2c HIP: build eatch ci build test for a different architecture (#21337) uvos 2026-04-03 11:38:22 +02:00
  • f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355) Tillerino 2026-04-03 11:21:07 +02:00
  • f1ac84119c ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315) b8648 Vishal Singh 2026-04-03 14:49:08 +05:30
  • b069b10ab4 vocab: fix Gemma4 tokenizer (#21343) Piotr Wilkin (ilintar) 2026-04-03 10:33:03 +02:00
  • 0c58ba3365 rpc : reuse compute graph buffers (#21299) b8646 Radoslav Gerganov 2026-04-03 10:28:09 +03:00
  • 57ace0d612 chat : avoid including json in chat.h (#21306) b8645 Georgi Gerganov 2026-04-03 09:07:59 +03:00
  • 39b27f0da0 (revert) kv-cache : do not quantize SWA KV cache (#21332) b8644 Georgi Gerganov 2026-04-03 09:07:01 +03:00
  • f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345) b8643 Vishal Singh 2026-04-03 08:05:15 +05:30
  • 7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066) b8642 Slobodan Josic 2026-04-03 00:59:20 +02:00
  • 5208e2d5ba fix: gemma 4 template (#21326) b8641 Piotr Wilkin (ilintar) 2026-04-02 23:31:02 +02:00
  • 7992aa7c8e tests : add unit test coverage for llama_tensor_get_type (#20112) b8640 Bartowski 2026-04-02 16:53:58 -04:00
  • a1cfb64530 ggml-webgpu: add vectorized flash attention (#20709) b8639 Zheyuan Chen 2026-04-02 10:40:42 -07:00
  • 5803c8d115 tests: allow exporting graph ops from HF file without downloading weights (#21182) b8638 Ruben Ortlam 2026-04-02 18:19:20 +02:00
  • 63f8fe0ef4 model, mtmd: fix gguf conversion for audio/vision mmproj (#21309) b8637 Xuan-Son Nguyen 2026-04-02 17:10:32 +02:00
  • 223373742b common : add commentary rules for gpt-oss-20b (#21286) Aldehir Rojas 2026-04-02 08:59:59 -05:00
  • e15efe007d Relax prefill parser to allow space. (#21240) b8635 Piotr Wilkin (ilintar) 2026-04-02 11:29:11 +02:00
  • 6137c325a1 chat : add Granite 4.0 chat template with correct tool_call role mapping (#20804) b8634 Jesus Talavera 2026-04-02 11:28:56 +02:00
  • 17193cce34 kv-cache : do not quantize SWA KV cache (#21277) Georgi Gerganov 2026-04-02 11:54:05 +03:00
  • d6dac92bfd Ignore Transfer-Encoding header. (#20269) Roger Chen 2026-04-02 16:41:19 +08:00
  • dae2bf41c9 sync : ggml b8631 Georgi Gerganov 2026-04-02 10:38:24 +03:00
  • bc07d55922 ggml : bump version to 0.9.11 (ggml/1456) Georgi Gerganov 2026-04-02 10:37:26 +03:00
  • 4888137b17 sycl : fix llama_kv_cache hang when kv_cache is huge: 5GB (#21283) b8629 Neo Zhang 2026-04-02 15:08:32 +08:00
  • fbd441c379 hexagon : add cumsum op support (#21246) b8628 Todor Boinovski 2026-04-01 17:44:02 -07:00
  • c30e012253 contrib : rewrite AGENTS.md, make it more clear about project values (#21270) copilot/compare-kv-shifting-implementation Xuan-Son Nguyen 2026-04-01 23:31:51 +02:00
  • 95a6ebabb2 opencl: fix leak in Adreno q8_0 path (#21212) b8626 lhez 2026-04-01 12:54:58 -07:00
  • 12dbf1da95 server: Bypass API Key validation for WebUI static bundle assets (#21269) b8625 Aleksander Grygier 2026-04-01 21:32:15 +02:00
  • 86221cf6da CUDA: fix FA kernel selection logic (#21271) b8624 Johannes Gäßler 2026-04-01 21:28:19 +02:00
  • 6de97b9d3e kleidiai: add CPU feature detection to CI run script (#20394) Martin Klacer 2026-04-01 18:02:41 +01:00
  • 5a0ed5150a Update Dawn version in WebGPU CI (#20784) Nikhil Jain 2026-04-01 09:53:05 -07:00
  • 8710e5f9b9 hexagon: improve RMS_NORM and DIV accuracy (#21251) Aparna M P 2026-04-01 21:13:08 +05:30
  • 1d6d4cf7a5 fix: tool call parsing for LFM2 and LFM2.5 models (#21242) Jonathan 2026-04-01 07:22:44 -07:00
  • 744c0c7310 llama : rotate activations for better quantization (#21038) Georgi Gerganov 2026-04-01 16:58:01 +03:00
  • 0356e33aaf scripts: add function call test script (#21234) Xuan-Son Nguyen 2026-04-01 15:31:58 +02:00
  • 6422036fcb sync : ggml Georgi Gerganov 2026-04-01 16:02:34 +03:00
  • 296bc0538b ggml : bump version to 0.9.10 (ggml/1454) Georgi Gerganov 2026-04-01 16:01:45 +03:00
  • 6b949d1078 sycl : support nvfp4 type in mul_mat (#21227) Neo Zhang 2026-04-01 18:54:15 +08:00
  • 84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074) Michael Wand 2026-04-01 03:04:58 -07:00
  • e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224) Ettore Di Giacinto 2026-04-01 11:50:17 +02:00