Commit Graph

  • 723c71064d vulkan: fix fp16 Flash Attention on Windows AMD RDNA2 and below (#19921) b8168 Ruben Ortlam 2026-02-26 19:11:04 +01:00
  • 37964f44f9 mtmd : fix padding of n_tokens (#19930) b8167 Georgi Gerganov 2026-02-26 18:39:49 +02:00
  • 01cd448b8c server : fix ctx checkpoint restore logic (#19924) b8166 Georgi Gerganov 2026-02-26 18:20:16 +02:00
  • 99bd67c9b2 kv-cache : fix can_shift() check to take into account M-RoPE (#19928) b8165 Georgi Gerganov 2026-02-26 18:08:54 +02:00
  • b68d75165a llama: Add option to merge gate and exp weights (#19139) b8164 Aman Gupta 2026-02-26 21:01:08 +08:00
  • ffaafde16f ggml-virtgpu: improve the reliability of the code (#19846) b8163 Kevin Pouget 2026-02-26 13:00:57 +01:00
  • efba35a860 server: fix load-on-startup not respected in ini file (#19897) b8162 drrros 2026-02-26 14:32:31 +03:00
  • 9b62913b40 jinja : correct default size for string slices (#19913) b8161 Eric Zhang 2026-02-26 19:28:09 +08:00
  • 66287bdaac model : add Jina Embeddings v5 Nano (partial EuroBERT) support (#19826) Maximilian Werk 2026-02-26 12:14:09 +01:00
  • 1ca3d1de15 gguf : avoid too many file size calls (#19919) b8159 Georgi Gerganov 2026-02-26 12:46:32 +02:00
  • bd72300591 server : fix typo in server README.md (#19900) yggdrasil75 2026-02-26 05:26:16 -05:00
  • 2943210c1e support permuted, remove check s0/s10 (#19889) b8157 Neo Zhang 2026-02-26 10:27:20 +08:00
  • 3769fe6eb7 vulkan: check for memory overlap before doing fusion (#19768) b8156 Jeff Bolz 2026-02-25 11:25:38 -06:00
  • 832aa94762 common : add more aliases for sampler CLI params (#19797) b8155 ddh0 2026-02-25 09:34:25 -06:00
  • 3af34b9ff5 ci : update the ROCm/HIP toolchain versions [no ci] (#19891) Slobodan Josic 2026-02-25 15:54:49 +01:00
  • f20469d919 server : enable multi-modal prompt caching (#19877) b8153 Georgi Gerganov 2026-02-25 15:15:42 +02:00
  • d7d826b3c1 server : support multi-modal context checkpoints (#19849) b8152 Georgi Gerganov 2026-02-25 15:14:27 +02:00
  • c747294b2d scripts: update corpus of compare-logprobs (#19326) Xuan-Son Nguyen 2026-02-25 12:57:34 +01:00
  • 8fdf269dad ci : update Windows ROCm build to 26.Q1 [no ci] (#19810) Mario Limonciello 2026-02-25 05:30:19 -06:00
  • a96a1120b4 gguf : fix ftell/fseek for Windows (#19870) b8149 Aldehir Rojas 2026-02-24 22:58:11 -06:00
  • 244641955f models : fix graph splits (#19866) b8148 Georgi Gerganov 2026-02-25 00:01:13 +02:00
  • 47eb12b953 server: fix query params lost when proxying requests in multi-model router mode (#19854) b8147 Pascal 2026-02-24 21:46:06 +01:00
  • 418dea39ce ggml/gguf : prevent integer overflows (#19856) b8146 Georgi Gerganov 2026-02-24 20:17:11 +02:00
  • da426cb250 model : update label for LFM2-24B-A2B (#19848) b8145 Tarek Dakhran 2026-02-24 14:27:42 +01:00
  • c830f99cfa server : support max_completion_tokens request property (#19831) b8144 Radoslav Gerganov 2026-02-24 10:30:00 +02:00
  • aa6f918c1c Vulkan Scalar Flash Attention Refactor (#19625) b8143 Ruben Ortlam 2026-02-24 08:35:48 +01:00
  • 8c2c0108dd vulkan: fix coopmat1 without bf16 support (#19793) b8142 Jeff Bolz 2026-02-24 00:48:32 -06:00
  • 3ea5360c00 vulkan: fix data race in mul_mat_id shader (#19790) b8141 Jeff Bolz 2026-02-24 00:43:12 -06:00
  • 39fb81f875 hexagon refactor all Ops to use local context struct (#19819) b8140 Max Krasnyansky 2026-02-23 16:32:14 -08:00
  • 5eb0ea32f0 feat: Add code blocks full height setting to parameter sync service (#19835) Aleksander Grygier 2026-02-23 22:30:13 +01:00
  • b68a83e641 vendor : update cpp-httplib to 0.34.0 (#19830) b8138 Adrien Gallouët 2026-02-23 21:05:48 +01:00
  • d8aeb65cee tests : fix typos in comments in test-backend-sampler [no ci] (#19824) Daniel Bevenius 2026-02-23 17:12:02 +01:00
  • 9051663d5d webui: Add setting to have full height Code Blocks in Chat Messages (#19829) Aleksander Grygier 2026-02-23 14:16:50 +01:00
  • 72b44c0d21 model-conversion : merge inspect-org-model.py with tensor-info.py (#19823) Daniel Bevenius 2026-02-23 14:15:16 +01:00
  • b8ab2cc559 Merge branch 'master' into pr/18039 Georgi Gerganov 2026-02-23 14:47:03 +02:00
  • bc160d3582 ggml-cpu: arm64: q5_K repack gemm and gemv (and generic) implementations (dotprod) (#19356) Alberto Cabrera Pérez 2026-02-23 12:42:52 +00:00
  • 4b436e4e5e flake8 fix ci-tmp Sigbjørn Skjæret 2026-02-23 11:48:01 +01:00
  • 2b6dfe824d llama : remove write/read of output ids/logits/embeddings (#18862) b8133 Daniel Bevenius 2026-02-23 07:04:30 +01:00
  • e8e261699a cli : provide model with text filename (#19783) b8132 Sigbjørn Skjæret 2026-02-22 22:33:49 +01:00
  • 5452d736f8 jinja: correct stats for tojson and string filters (#19785) b8131 Xuan-Son Nguyen 2026-02-22 21:08:23 +01:00
  • a6d3e9a239 ggml : relax asserts for ggml_get_type_traits() Georgi Gerganov 2026-02-22 21:37:58 +02:00
  • 9c5d8dec37 gguf : add file size check for arrays Georgi Gerganov 2026-02-22 21:36:56 +02:00
  • c76408dbb9 gguf : add mem_size overflow test Georgi Gerganov 2026-02-22 18:40:06 +02:00
  • ed4837891d common : fix improper trimming in XML parser on complete message (#19805) b8130 Aldehir Rojas 2026-02-22 10:34:54 -06:00
  • cacc371f99 Fix wrong cli-argument in documentation (#19804) Kilian Krampf 2026-02-22 16:26:33 +01:00
  • ae2368e74e model : add Kanana-2 model support (#19803) b8128 HelloKS 2026-02-23 00:15:02 +09:00
  • 9f0684f003 ci : fix rocm archive name [no ci] (#19808) Sigbjørn Skjæret 2026-02-22 16:14:37 +01:00
  • c79698f28a ggml : relax ggml_type asserts to debug-only Georgi Gerganov 2026-02-22 16:32:39 +02:00
  • 45250db0f8 ggml : remove deprecated ggml_type_sizef() Georgi Gerganov 2026-02-22 16:23:57 +02:00
  • dfac6caa40 ggml : print values when overflow Georgi Gerganov 2026-02-22 16:09:53 +02:00
  • 327e2ca6f2 gguf : minor print fix Georgi Gerganov 2026-02-22 16:01:29 +02:00
  • 09788740f3 gguf : fix ctx size for no_alloc == true Georgi Gerganov 2026-02-22 15:54:44 +02:00
  • 4e89ec67fa gguf : better name Georgi Gerganov 2026-02-22 15:47:01 +02:00
  • 34ec1c3f18 server : merge contiguous Responses input items into a single assistant message (#19773) b8126 Aldehir Rojas 2026-02-22 07:11:31 -06:00
  • 46a9a0656a enforce proper alignment in add_custom_alignment Sigbjørn Skjæret 2026-02-22 10:38:58 +01:00
  • f2ac3ef57e py : restore tensor_fields Georgi Gerganov 2026-02-22 10:54:14 +02:00
  • 12c719b3f1 gguf-py : error on duplicate keys when reading Georgi Gerganov 2026-02-22 09:47:43 +02:00
  • 5d67acd422 ggml : check int overflow in ggml_new_tensor_impl and ggml_new_object Georgi Gerganov 2026-02-22 09:46:54 +02:00
  • e877ad8bd9 ci : fix rocm release path [no ci] (#19784) Sigbjørn Skjæret 2026-02-22 08:07:46 +01:00
  • 35715657cb Update ROCm docker container to 7.2 release (#19418) b8124 Mario Limonciello 2026-02-21 14:53:39 -06:00
  • f75c4e8bf5 Add a build target to generate ROCm artifacts using ROCm 7.2 (#19433) b8123 Mario Limonciello 2026-02-21 12:56:26 -06:00
  • 99156f3a5f vendor : update cpp-httplib to 0.33.1 (#19778) b8122 Adrien Gallouët 2026-02-21 19:12:31 +01:00
  • a0c91e8f9f Improve CUDA graph capture (#19754) b8121 Gaurav Garg 2026-02-21 15:09:36 +05:30
  • 07968d53e4 fix: UI single model selection in router mode (#19767) crsawyer 2026-02-21 02:28:39 -06:00
  • ba3b9c8844 hexagon : fix build release (#19444) (#19587) b8119 Mengsheng Wu 2026-02-20 16:40:00 -08:00
  • 94b0200a01 common : merge qwen3-coder and nemotron nano 3 parsers (#19765) b8118 Aldehir Rojas 2026-02-20 16:22:22 -06:00
  • 9fea2434af eagle3: fix model convert code format ruixiangw 2026-02-20 18:05:49 +00:00
  • b3537924ef eagle3: fix model convert issue ruixiangw 2026-02-20 17:54:08 +00:00
  • b908baf182 ggml-cpu: add RVV vec dot kernels for quantization types (#18784) b8117 Taimur Ahmad 2026-02-20 16:30:07 +05:00
  • 492bc31978 quantize : add --dry-run option (#19526) b8116 ddh0 2026-02-20 02:20:16 -06:00
  • 77d6ae4ac8 test: mul_mat tests with huge batch size (#19519) b8115 Jeff Bolz 2026-02-19 18:08:25 -08:00
  • 10b26ee23a WebUI hide models in router mode (#19374) crsawyer 2026-02-19 15:53:42 -06:00
  • 3dadc88b58 common : fix Step-3.5-Flash format detection and thinking support (#19635) b8113 Jesse Posner 2026-02-19 13:40:52 -08:00
  • 39e4b1dc9b common : fix gpt-oss Jinja error when assistant message has both content and thinking with tool calls (#19704) b8112 abhijitb11 2026-02-19 12:59:20 -08:00
  • 11c325c6e0 ggml-webgpu: Add unary op (SQR, SQRT, SIN, COS) support. (#19700) b8111 Masashi Yoshimura 2026-02-20 01:18:30 +09:00
  • 237958db33 model: Add PaddleOCR-VL model support (#18825) b8110 megemini 2026-02-20 00:05:25 +08:00
  • d97dd299a0 py : assert that alignment is non-zero power of 2 Georgi Gerganov 2026-02-19 16:44:43 +02:00
  • 2e23292cfe ggml : fix negative tensor type oob Georgi Gerganov 2026-02-19 16:42:46 +02:00
  • 7babe5fb13 gguf : prevent array elements exhaustion Georgi Gerganov 2026-02-19 16:26:54 +02:00
  • 357b8e50f1 gguf : prevent string exhaustion Georgi Gerganov 2026-02-19 16:08:04 +02:00
  • abb9f3c42b vulkan: fix MMQ shader push constants and multi-dispatch (#19732) b8109 Ruben Ortlam 2026-02-19 14:59:16 +01:00
  • 69788e0d23 ggml : fix int overflows in ggml_new_object() Georgi Gerganov 2026-02-19 15:59:09 +02:00
  • 198f79d6c3 gguf : prevent integer overflow for ggml_context mem size Georgi Gerganov 2026-02-19 15:51:00 +02:00
  • da348c9dfb models : fix qwen3.5 beta/gate shapes (#19730) b8108 Georgi Gerganov 2026-02-19 15:19:53 +02:00
  • e6267a9359 mtmd: build_attn modified, flash_attn on/off via ctx_params (#19729) b8107 Saba Fallah 2026-02-19 13:50:29 +01:00
  • 2bf318fd2f model : add JAIS-2 architecture support (#19488) b8106 3 a l i 2026-02-19 16:30:17 +04:00
  • c78e682245 CUDA: fix kernel selection logic for tile FA (#19686) b8105 Johannes Gäßler 2026-02-19 12:42:58 +01:00
  • c5897995a7 mtmd : chat : Fix extra \n between text and media marker (#19595) b8104 Tarek Dakhran 2026-02-19 12:18:57 +01:00
  • 03fd9d3bb4 webui: Fix Attachments not being included in completion request (#19731) Aleksander Grygier 2026-02-19 10:27:38 +01:00
  • 8004f3a8d1 model : add tokenizer from LFM2.5-Audio-1.5B (#19687) b8102 Tarek Dakhran 2026-02-19 09:54:48 +01:00
  • eacb4b67a2 llama : use output_resolve_row() in get_logits_ith/get_embeddings_ith (#19663) b8101 Daniel Bevenius 2026-02-19 09:48:08 +01:00
  • c0d0430340 model : full modern bert support (#18330) b8100 Ryan Mangeno 2026-02-19 02:52:21 -05:00
  • 3bb2fcc856 llamafile: powerpc: add FP16 MMA path for Q4/Q8 matmul (#19709) b8099 shalinib-ibm 2026-02-19 11:58:53 +05:30
  • 27326bfce1 models : dedup qwen35 graphs (#19660) b8098 Georgi Gerganov 2026-02-19 08:17:49 +02:00
  • ad9f692f8f models : dedup Kimi Linear delta net implementation (#19668) ymcki 2026-02-19 14:15:17 +08:00
  • 8a70973557 Add Jinja support for "indent" string filter (#19529) b8096 Piotr Wilkin (ilintar) 2026-02-19 00:25:52 +01:00
  • e7f2f95c9a ggml webgpu: Fix bug in dispatching large matrix-vector multiplication (#19535) b8095 Reese Levine 2026-02-18 16:06:29 -07:00
  • b55dcdef5d server: save generated text for the /slots endpoint (for LLAMA_SERVER_SLOTS_DEBUG=1) (#19622) b8094 matteo 2026-02-18 18:53:37 +01:00
  • eeef3cfced model: support GLM-OCR (#19677) b8093 Xuan-Son Nguyen 2026-02-18 17:51:40 +01:00
  • e99f1083a0 docs: Fix broken links for preparing models in Backends (#19684) Maciej Lisowski 2026-02-18 16:50:23 +01:00