Commit Graph

  • f4b5bf2f32 ci : re-enable mac workflows (#21894) b8792 Georgi Gerganov 2026-04-14 15:58:09 +03:00
  • aa0f1897b7 metal : add XIELU unary op (#20802) b8791 Seyoung Jeong 2026-04-14 21:43:59 +09:00
  • be76dd0bb2 vendor : update BoringSSL to 0.20260413.0 (#21881) b8790 Adrien Gallouët 2026-04-14 13:25:09 +02:00
  • 2e05f06ffb ggml : fix ARM NEON nvfp4 dot product on non-dotprod targets (#21559) b8789 Richard Davison 2026-04-14 13:23:45 +02:00
  • acc37a42ea cmake: fix CMP0194 warning on Windows with MSVC (#21630) b8788 texasich 2026-04-14 05:47:56 -05:00
  • 5a23695d5a ggml-webgpu: Update register tiling matmul to use f32 accumulation (#21644) b8787 Reese Levine 2026-04-14 03:46:41 -07:00
  • 56666fa607 common: skip reasoning budget sampler when no budget is requested (#21870) b8786 Berk Idem 2026-04-14 06:43:06 -04:00
  • 6a6780a232 vulkan: Support GGML_TYPE_NVFP4 (#21455) b8785 Jeff Bolz 2026-04-14 11:34:23 +02:00
  • e489a5ca0e server: support OAI /v1/audio/transcriptions API (#21863) b8784 Xuan-Son Nguyen 2026-04-14 11:09:52 +02:00
  • 53fb592060 opt arc770 for Q4_0 arthw 2026-04-14 12:22:19 +08:00
  • e21cdc11a0 common/gemma4 : handle parsing edge cases (#21760) b8783 Aldehir Rojas 2026-04-13 18:18:18 -05:00
  • e974923698 docs: listing qwen3-asr and qwen3-omni as supported (#21857) Xuan-Son Nguyen 2026-04-13 22:28:17 +02:00
  • 1c0d9081fd chat: dedicated DeepSeek v3.2 parser + "official" template (#21785) b8781 Piotr Wilkin (ilintar) 2026-04-13 22:23:53 +02:00
  • a8bad3842e ci: Also exempt 'security' tag from auto-close (#21844) Christian Kastner 2026-04-13 19:18:44 +02:00
  • c5b682b25c various clean up mtmd-video-api Xuan Son Nguyen 2026-04-13 17:39:14 +02:00
  • f558360b32 Merge branch 'master' into video-support Xuan Son Nguyen 2026-04-13 15:40:05 +02:00
  • 75f3bc94e6 vulkan: Flash Attention DP4A shader for quantized KV cache (#20797) b8779 Ruben Ortlam 2026-04-13 14:21:31 +02:00
  • aa00911d12 common : add download cancellation and temp file cleanup (#21813) b8778 Adrien Gallouët 2026-04-13 11:18:23 +02:00
  • ce8fd4b1a6 server: Expose build_info in router mode (#21835) b8777 Gaspard Petit 2026-04-13 05:14:42 -04:00
  • 9f5e1edb10 CUDA: Limit DeviceSegmentedSort to immediate mode (#21718) b8776 Oliver Simons 2026-04-13 11:14:06 +02:00
  • 920b3e78cb mtmd: use causal attn for gemma 4 audio (#21824) b8775 Xuan-Son Nguyen 2026-04-13 09:47:55 +02:00
  • 974c8c94cc webui: add setting for first-line chat titles (#21797) Rohan Jain 2026-04-13 13:00:46 +05:30
  • 227ed28e12 webui: MCP Diagnostics improvements (#21803) Aleksander Grygier 2026-04-13 07:58:38 +02:00
  • bafae27654 Remove extra conditional check on debug mode. (#21798) b8772 Masashi Yoshimura 2026-04-13 12:13:04 +09:00
  • 873c825611 sycl: disable Q1_0 in backend and cleanup unused variables (#21807) b8771 Akarshan Biswas 2026-04-13 07:14:58 +05:30
  • 82764d8f40 mtmd: fix crash when sending image under 2x2 pixels (#21711) b8770 Sergiu 2026-04-13 00:59:21 +03:00
  • 21a4933042 mtmd: qwen3 audio support (qwen3-omni and qwen3-asr) (#19441) b8769 Xuan-Son Nguyen 2026-04-12 23:57:25 +02:00
  • 1e9d771e2c convert : force f16 or f32 on step3-vl conv weights (#21646) Sigbjørn Skjæret 2026-04-12 19:22:29 +02:00
  • aa4695c5e5 mtmd: add gemma 4 test (vision + audio) [no ci] (#21806) Xuan-Son Nguyen 2026-04-12 16:29:03 +02:00
  • 547765a93e mtmd: add Gemma 4 audio conformer encoder support (#21421) b8766 Stephen Cox 2026-04-13 00:15:26 +12:00
  • 9e209c5aee fix: Proper messages rendering for "Show raw output" (#21672) Aleksander Grygier 2026-04-12 13:08:11 +02:00
  • 6313acbef0 docs: add guide on how to add multimodal support (#21778) Xuan-Son Nguyen 2026-04-12 13:02:38 +02:00
  • ff5ef82786 CUDA: skip compilation of superfluous FA kernels (#21768) b8763 Johannes Gäßler 2026-04-11 18:52:11 +02:00
  • 073bb2c20b mtmd : add MERaLiON-2 multimodal audio support (#21756) b8762 Sirui He 2026-04-11 20:15:48 +08:00
  • af1127d3c4 opencl: add basic support for q5_k (#21593) b8761 shaofeiqi 2026-04-11 01:46:19 -07:00
  • 865ff06b2f TP: fix Qwen 3 Next data split (#21732) b8760 Johannes Gäßler 2026-04-11 09:23:42 +02:00
  • 2b2cd57de6 ggml : fix a few instances of missing GGML_TYPE_Q1_0 cases (#21716) b8759 Sigbjørn Skjæret 2026-04-11 08:45:00 +02:00
  • 660386f6f8 py : Bump typer to latest to fix huggingface_hub issue (#21701) Bartowski 2026-04-11 02:44:15 -04:00
  • a29e4c0b7b CUDA: also store node->src ne/nb for graph equality (#21736) b8757 Aman Gupta 2026-04-11 10:30:30 +08:00
  • b136b62cf9 fix: Fix broken structured output when using $refs in json_schema (#21699) b8756 Galunid 2026-04-11 01:26:36 +02:00
  • 81069a808a hexagon: add support for linux on snapdragon (#21707) b8755 Todor Boinovski 2026-04-10 15:57:23 -07:00
  • 9aa2807769 hexagon: improved Op queuing, buffer and cache management (#21705) b8754 Max Krasnyansky 2026-04-10 15:47:43 -07:00
  • 3fc65063d9 common : better align to the updated official gemma4 template (#21704) b8753 Aldehir Rojas 2026-04-10 16:12:53 -05:00
  • 05b3caaa48 common : add callback interface for download progress (#21735) b8752 Adrien Gallouët 2026-04-10 22:17:00 +02:00
  • e62fa13c24 model : make Gemma 4 shared-KV tail attn_k tensors optional on load (#21739) b8751 MoonRide303 2026-04-10 21:45:50 +02:00
  • bfd1f453cb ggml-webgpu: support non-square subgroup matrix configs for Intel GPUs (#21669) b8750 Rithik Sharma 2026-04-10 10:52:38 -07:00
  • e4fed9d08d ggml-webgpu: address quantization precision and backend lifecycle managment (#21521) b8749 Chen Yuan 2026-04-10 13:52:01 -04:00
  • 5dd102539b server : ignore --alias when using --models-preset (#21380) b8748 Adrien Gallouët 2026-04-10 17:42:56 +02:00
  • fb38d6f278 common : fix when loading a cached HF models with unavailable API (#21670) b8747 Adrien Gallouët 2026-04-10 16:37:46 +02:00
  • 0893f50f2d common: mark --split-mode tensor as experimental (#21684) b8746 Johannes Gäßler 2026-04-10 12:27:27 +02:00
  • f989a6e39e webui: Static build output improvements (#21667) Aleksander Grygier 2026-04-10 11:49:47 +02:00
  • d7ff074c87 common : enable reasoning budget sampler for gemma4 (#21697) b8744 Berk Idem 2026-04-10 05:49:14 -04:00
  • 3f8752b559 docs : fix broken link to ggml-openvino in OPENVINO.md (#21709) Belem Zhang 2026-04-10 15:50:08 +08:00
  • 7b69125331 vulkan: Support Q1_0 (#21539) b8742 Jeff Bolz 2026-04-10 01:35:27 -05:00
  • e095a482a0 common : add fluidity to the progress bar (#21671) b8741 Adrien Gallouët 2026-04-10 08:24:53 +02:00
  • e34f042154 CUDA: fuse muls (#21665) b8740 Aman Gupta 2026-04-10 10:24:09 +08:00
  • d132f22fc9 HIP: add CDNA4 (gfx950) architecture support for MI350X/MI355X (#21570) b8739 andyluo7 2026-04-09 22:13:32 +03:00
  • d6f3030047 ggml: backend-agnostic tensor parallelism (experimental) (#19378) b8738 Johannes Gäßler 2026-04-09 16:42:19 +02:00
  • 009a113326 ggml : check return value of CUB calls used in argsort and top-k (they all return cudaError_t) (#21676) b8737 fairydreaming 2026-04-09 15:17:11 +02:00
  • 4cabbe36e0 state 0cc4m/vulkan-async-p2p Ruben Ortlam 2026-04-09 13:00:31 +02:00
  • 9f001cae27 state Ruben Ortlam 2026-04-09 12:51:43 +02:00
  • 88335c0490 state Ruben Ortlam 2026-04-09 12:39:51 +02:00
  • c8ac02fa1b requirements : update transformers to 5.5.1 (#21617) Daniel Bevenius 2026-04-09 12:36:29 +02:00
  • 204023c897 state Ruben Ortlam 2026-04-09 12:36:15 +02:00
  • d88d722fc1 state Ruben Ortlam 2026-04-09 12:32:08 +02:00
  • 4ef9301e4d webui: add "Send message on Enter" setting (#21577) JvM 2026-04-09 12:26:27 +02:00
  • 96d9516329 state Ruben Ortlam 2026-04-09 12:25:27 +02:00
  • ddf03c6d9a common : fix ambiguous grammar rule in gemma4 (#21661) b8734 Aldehir Rojas 2026-04-09 05:25:07 -05:00
  • 26229755c5 common : simplify autoparser tagged parser rules (#21216) b8733 Aldehir Rojas 2026-04-09 05:24:20 -05:00
  • 057dba336e model: fix multimodal padding token for gemma3n/gemma4 (#21625) b8732 Xuan-Son Nguyen 2026-04-09 12:18:23 +02:00
  • 501aeed18f mtmd: support dots.ocr (#17575) b8731 Xuan-Son Nguyen 2026-04-09 12:16:38 +02:00
  • 8a108eddb4 state Ruben Ortlam 2026-04-09 12:05:15 +02:00
  • 47dde34e00 state Ruben Ortlam 2026-04-09 11:58:46 +02:00
  • 8d0e158076 state Ruben Ortlam 2026-04-09 11:51:39 +02:00
  • aade0f81dd state Ruben Ortlam 2026-04-09 11:42:50 +02:00
  • 0ec191e1d7 vocab: add gemma4 tokenizer tests, fix edge case (#21534) b8730 Piotr Wilkin (ilintar) 2026-04-09 11:41:14 +02:00
  • 243532e556 jinja : support ensure_ascii=true, string repetition and int/float self-filtering (#21623) b8729 Kwa Jie Hao 2026-04-09 17:28:33 +08:00
  • 700270239d state Ruben Ortlam 2026-04-09 11:24:21 +02:00
  • ddaafa3dc1 state Ruben Ortlam 2026-04-09 11:11:17 +02:00
  • e5e0be0add state Ruben Ortlam 2026-04-09 11:00:36 +02:00
  • 5e9c635463 metal : add missing mm-id specializations for q1_0 (#21662) b8728 Georgi Gerganov 2026-04-09 10:54:00 +03:00
  • 9949ad08f6 fix: Model Selector choice sync (#21628) Aleksander Grygier 2026-04-09 09:46:27 +02:00
  • 3ee9da0e4f server : fix grammar commandline args (#21543) b8726 AUTOMATIC1111 2026-04-09 10:16:54 +03:00
  • 75511a8d7e webui: Add option to pre-encode conversation for faster next turns (#21034) Aleksander Grygier 2026-04-09 09:10:18 +02:00
  • b54cb2e3d0 sycl : add flash-attn support for head size 512 (#21654) b8724 Akarshan Biswas 2026-04-09 12:06:48 +05:30
  • 8a65a7a8ee ci: drop v5 all: composition from labeler.yml (#21627) Marxist-Leninist 2026-04-09 07:20:19 +01:00
  • 3c4eae7dc9 state Ruben Ortlam 2026-04-09 07:50:05 +02:00
  • 7e2799c8c9 state Ruben Ortlam 2026-04-09 07:40:02 +02:00
  • 8a132faaa0 vulkan: unify type macros to use Vx instead of _VECx (#21605) b8722 Ruben Ortlam 2026-04-09 07:31:51 +02:00
  • 4293919068 common : skip non-primary GGUF split files when selecting model (#21633) b8721 Adrien Gallouët 2026-04-09 07:28:06 +02:00
  • cd0722594a state Ruben Ortlam 2026-04-09 07:25:33 +02:00
  • d12cc3d1ca CUDA: also store node->src->data ptrs for equality check (#21635) b8720 Aman Gupta 2026-04-09 01:01:56 +08:00
  • 2dcb7f74ed fix: free ctx_copy in ggml_opt_free to plug per-training-session leak (#21592) b8719 RealOrko 2026-04-08 16:40:15 +01:00
  • 660600081f server: respect the ignore eos flag (#21203) b8718 Yuri Khrustalev 2026-04-08 11:12:15 -04:00
  • d9a12c82f0 vocab : remove </s> eog token if gemma4 (#21492) b8717 Aldehir Rojas 2026-04-08 09:53:06 -05:00
  • 4a05e0c566 webui : send both backend_sampling == false/true (#18781) Georgi Gerganov 2026-04-08 17:35:52 +03:00
  • e9fd96283d Propose fix a couple of typos (#21581) b8715 John Eismeier 2026-04-08 10:29:03 -04:00
  • 3ba12fed0a kv-cache : extend cache quantization checks (#21586) b8714 Erik Scholz 2026-04-08 15:08:57 +02:00
  • 5473949070 webgpu : Query for adapter support when registering WebGPU backend (#21579) b8713 Reese Levine 2026-04-08 06:08:29 -07:00
  • dcdcbad42a metal: Q1_0 backend (#21528) b8712 Pasha Khosravi 2026-04-08 06:07:47 -07:00