Commit Graph

  • 3fab96cd04 ci : disable self-hosted mac jobs (#20985) Georgi Gerganov 2026-03-25 14:46:40 +02:00
  • 914eb5ff0c jinja: fix macro with kwargs (#20960) b8519 Xuan-Son Nguyen 2026-03-25 12:22:48 +01:00
  • 8fc17493c3 gguf-split : clarify operation of gguf-split (#19749) Francisco Herrera 2026-03-25 06:12:50 -05:00
  • 36dafba5c4 llama: fix llama-model-saver (#20503) b8517 Johannes Gäßler 2026-03-25 11:53:16 +01:00
  • 69e0ecef06 webui: Fix editing assistant message without branching (#20944) Aleksander Grygier 2026-03-25 11:47:33 +01:00
  • 062cca58fc Add SLEEPING status to the WebUI model selector (#20949) Pascal 2026-03-25 11:02:32 +01:00
  • 406f4e3f61 android : fix-pointer-dangling (#20974) b8514 yikechayedan 2026-03-25 17:51:26 +08:00
  • 53dc8b59bf sycl : fix wrong variable check by assert (#20903) b8513 Neo Zhang 2026-03-25 17:48:37 +08:00
  • 403c9c9cef ci : bump gguf publish python version (#20982) Sigbjørn Skjæret 2026-03-25 10:04:59 +01:00
  • 8fc85db9d2 ci : limit requirements versions (#20980) Sigbjørn Skjæret 2026-03-25 09:55:37 +01:00
  • 3a60d06ad9 convert : register Qwen3Model architecture (#20967) Dowon 2026-03-25 17:37:59 +09:00
  • abd86ef175 docs : Update OpenVINO backend docs (#20968) Ravi Panchumarthy 2026-03-25 01:33:51 -07:00
  • 07a6fd8775 kleidiai: removed cpu feature detection from CI run script pr/20394 Martin Klacer 2026-03-24 17:24:41 +00:00
  • 9f102a1407 models : move the token embedding norms to the first layer (#20943) b8508 Georgi Gerganov 2026-03-24 17:00:30 +02:00
  • df488da9ac fix double semicolon 0cc4m/vulkan-repack Ruben Ortlam 2026-03-24 13:57:56 +01:00
  • 63f85ed27e add coopmat2 support Ruben Ortlam 2026-03-24 13:57:40 +01:00
  • 3fc6f1aed1 ggml-backend: re-enable graph reuse with pipeline parallelism (#20927) b8507 Aman Gupta 2026-03-24 20:47:00 +08:00
  • 29771a0a4c vendor : update cpp-httplib to 0.39.0 (#20933) b8506 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-03-24 09:33:33 -03:00
  • 42ebce3beb common : fix get_gguf_split_info (#20946) b8505 Adrien Gallouët 2026-03-24 13:33:14 +01:00
  • a94fdb090a WebUI: fix edit msg form textarea height (#20830) BlueMöhre 2026-03-24 13:17:45 +01:00
  • c9dc43333f readme : clarify MODEL_ENDPOINT usage (#20941) Adrien Gallouët 2026-03-24 10:35:07 +01:00
  • 2d2d9c2062 common : add a WARNING for HF cache migration (#20935) b8502 Adrien Gallouët 2026-03-24 09:24:39 +01:00
  • 92080b4396 metal : add FLOOR, CEIL, ROUND, TRUNC unary ops (#20930) b8501 nuri 2026-03-24 17:13:07 +09:00
  • 342d6125bc metal : add FA instantiations for HSK=512, HSV=512 (#20902) b8500 Georgi Gerganov 2026-03-24 10:03:09 +02:00
  • c2e224d829 issues: add openvino backends (#20932) Aaron Teo 2026-03-24 14:41:10 +08:00
  • 8c7957ca33 common : add standard Hugging Face cache support (#20775) b8498 Adrien Gallouët 2026-03-24 07:30:33 +01:00
  • e852eb4901 llama-fit: fix regex pattern for gate_up tensors (#20910) b8497 Aman Gupta 2026-03-24 12:57:57 +08:00
  • 312d870a89 common : replace wrap_for_generation with a prefix convenience function and fix gpt-oss (#20912) b8496 Aldehir Rojas 2026-03-23 22:21:47 -05:00
  • 7cadbfce10 hexagon: general DMA and Binary Op fixes for large strides (#20918) b8495 Max Krasnyansky 2026-03-23 15:33:49 -07:00
  • 1fb2290a51 Add codeowners for scripts/snapdragon and docs/snapdragon (#20915) Max Krasnyansky 2026-03-23 14:57:18 -07:00
  • 1772701f99 opencl: add q6_K gemm and gemv kernels for Adreno (#20089) b8493 lhez 2026-03-23 12:44:18 -07:00
  • 39bf0d3c6a rpc : RCE patch (#20908) b8492 las7 2026-03-23 10:54:57 -07:00
  • bd6992180b contrib: add "Requirements" section to PR template (#20841) Xuan-Son Nguyen 2026-03-23 16:59:02 +01:00
  • a4901b0477 vulkan: repack q4_0 into aligned arrays Ruben Ortlam 2026-03-23 14:57:06 +01:00
  • fd18364755 devops: upgraded default oneAPI version (#20731) Davi Henrique Linhares 2026-03-23 10:47:34 -03:00
  • 11fb11b901 webui: Improve chat form positioning (#20901) Aleksander Grygier 2026-03-23 14:30:55 +01:00
  • 35b662bb5d docs: Fix typo in reasoning flag documentation (#20780) Geo Maciolek 2026-03-23 09:24:55 -04:00
  • f93c09e267 memory : fix seq_id bounds in llama_memory_recurrent::state_read_meta() (#20887) b8487 Georgi Gerganov 2026-03-23 14:08:46 +02:00
  • 841bc203e2 docs : rerun llama-gen-docs to include new CLI args (#20892) Eric Zhang 2026-03-23 19:33:38 +08:00
  • 31a5cf4c3f server: use httplib dynamic threads (#20817) b8485 Xuan-Son Nguyen 2026-03-23 12:22:46 +01:00
  • e32d243849 ai : update gh permissions (#20895) Georgi Gerganov 2026-03-23 13:21:41 +02:00
  • c44a932cf4 webui: fix --webui-config-file settings not applied on load (#20823) Pascal 2026-03-23 11:25:35 +01:00
  • 177c75852a metal: add CONV_3D (#19927) Rashid Ul Islam 2026-03-23 13:15:34 +05:30
  • 7a0b6a635e common/autoparser : detect reasoning markers when enable_thinking changes system prompt (#20859) Jhen-Jie Hong 2026-03-23 15:35:27 +08:00
  • 07ff000551 CANN: add RoPE cache preload before ACL graph capture (#20747) b8480 Chenguang Li 2026-03-23 15:24:06 +08:00
  • cc18f965b6 fix(openvino): explicit memset in buffer_context allocation (#20857) b8479 Dan Hoffman 2026-03-22 23:05:37 -07:00
  • 84ffd0c192 opencl: add flattened Q4_K mv and general Q4_K mm (#20773) b8478 shaofeiqi 2026-03-22 22:45:11 -07:00
  • ec2b787ebe mtmd: Add dynamic high-resolution image preprocessing for InternVL model (#20847) b8477 bssrdf 2026-03-22 20:06:30 -04:00
  • d3ac030a5d mtmd : fix LightOnOCR image preprocessing (#20877) b8476 DorianRudolph 2026-03-23 01:04:14 +01:00
  • 49bfddeca1 server: allow router to report child instances sleep status (#20849) b8475 Xuan-Son Nguyen 2026-03-22 18:33:52 +01:00
  • bd3f1d9d65 CUDA: fix BF16 FA compilation (#20865) b8474 Johannes Gäßler 2026-03-22 17:53:33 +01:00
  • 23c9182ce8 jinja : refactor token advancement (#20864) b8473 Sigbjørn Skjæret 2026-03-22 17:45:10 +01:00
  • 81bc4d3ddc server: fix Host header (#20843) b8472 Evgeny Kurnevsky 2026-03-22 15:29:22 +01:00
  • f40a80b4f3 support bf16 and quantized type (#20803) b8471 Neo Zhang 2026-03-22 22:06:27 +08:00
  • db9d8aa428 ggml-cuda: native bf16 flash attention for vec kernel (#20525) b8470 Patrick Buckley 2026-03-22 03:05:51 -07:00
  • ccb87fa3ee [CUDA] Increase number of output elements per-thread block if the K-dimension is small (#20635) b8469 Gaurav Garg 2026-03-22 14:19:35 +05:30
  • 3306dbaef7 misc : prefer ggml-org models in docs and examples (#20827) b8468 ddh0 2026-03-21 16:00:26 -05:00
  • 990e4d9698 common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604) b8467 Andrea Arcangeli 2026-03-21 13:43:35 -04:00
  • 212f4521b0 context : use n_embd_out for pooled embedding extraction (#20840) b8466 Tom Hillbrunner 2026-03-21 18:35:00 +01:00
  • 568aec82d2 docs : explicit about banning accounts that violates policy (#19593) Xuan-Son Nguyen 2026-03-21 15:50:16 +01:00
  • 2bcdddd5e3 fix(rpc): prevent division by zero in deserialize_tensor (#20712) b8464 y198 2026-03-21 20:59:43 +07:00
  • eac9c6ea83 Convert: Make NVFP4 and MXFP4 HF conversions say NVFP4/MXFP4 instead of BF16 (#20730) Michael Wand 2026-03-21 04:35:21 -07:00
  • 29b28a9824 ci : switch from pyright to ty (#20826) Sigbjørn Skjæret 2026-03-21 08:54:34 +01:00
  • cea560f483 Add shader count for Intel Arc Pro B60 (#20818) b8461 Matt Corallo 2026-03-21 04:22:51 +00:00
  • b1c70e2e54 common/parser: fix nasty bug causing subtle corruption of generation prompt (#20825) b8460 Piotr Wilkin (ilintar) 2026-03-21 00:19:04 +01:00
  • e6ec21e62f ggml-cpu: add always_inline to tinyBLAS_PPC accumulator saves (#20791) b8459 shalinib-ibm 2026-03-21 04:41:45 +05:30
  • 4cb7e0bd61 ai : limit runtime of the agent (#20816) Georgi Gerganov 2026-03-20 20:31:25 +02:00
  • 203eec25c0 releases : disable s390x builds gg/release-disable-s390x Georgi Gerganov 2026-03-20 19:31:25 +02:00
  • 149b2493c0 common : fix typo in debug log ('extracft' -> 'extract') (#20807) b8457 James O'Leary 2026-03-20 10:23:18 -07:00
  • b31b30f31d ai : do not run bash commands in the prompt (#20810) Georgi Gerganov 2026-03-20 19:06:33 +02:00
  • 58c81f7e81 model : fix Granite Hybrid type check for 7B.A1B (#20795) Victor Villar 2026-03-20 15:16:09 +01:00
  • fb78ad29bb server: (doc) clarify in-scope and out-scope features (#20794) Xuan-Son Nguyen 2026-03-20 14:03:50 +01:00
  • e06c3ab2bc vulkan: change gated_delta_net to shard a column across a subgroup (#20662) Jeff Bolz 2026-03-20 06:17:15 -05:00
  • dc6592431b context: zero output buffer on allocation (#20781) Ruikai Peng 2026-03-20 17:31:34 +08:00
  • 3adbef7776 model: assert nextn_predict_layers to prevent underflow (#20783) Ruikai Peng 2026-03-20 17:17:58 +08:00
  • ab9d4c3678 server : improve mtmd ctx checkpoints (#20726) Georgi Gerganov 2026-03-20 11:13:12 +02:00
  • 1af9dab32b CANN: add BF16 support for core operators (#20152) hipudding 2026-03-20 17:08:39 +08:00
  • 6d99b44c7e docs : fix Metal backend op support status in ops.md (#20779) Seyoung Jeong 2026-03-20 03:06:38 -06:00
  • 464fd0e71f ai : update find-related action (#20790) Georgi Gerganov 2026-03-20 10:28:14 +02:00
  • 21c8045214 jinja : fix heap OOB read in value equality comparison (#20782) Ruikai Peng 2026-03-20 14:15:17 +08:00
  • c46583b86b common/parser : fix out_of_range crash in throw path (#20424 regression) (#20777) b8445 James O'Leary 2026-03-19 18:37:22 -07:00
  • c1b911654a server: fix router mode deadlock on child crash and TOCTOU race in models_max (#20763) Ben Racicot 2026-03-19 17:16:05 -04:00
  • b739738dad docs: Update server README to reflect PR #20297 (#20560) Tomeamis 2026-03-19 21:28:44 +01:00
  • a0bbcdd9b6 ggml: guard KleidiAI DOWNLOAD_EXTRACT_TIMESTAMP for cmake < 3.24 (#20767) Sundaram krishnan 2026-03-20 01:06:23 +05:30
  • 6c72646a61 ci : improve action for duplicate issue (#20772) Georgi Gerganov 2026-03-19 21:11:53 +02:00
  • 340807273b hip: Avoid compiler bug in RDNA code generation during debug builds on Windows (#20655) Rail Chabdarov 2026-03-19 19:14:08 +01:00
  • 26c9ce1288 server: Add cached_tokens info to oaicompat responses (#19361) Ryan Goulden 2026-03-19 11:09:33 -07:00
  • 76f2dc70c3 chat : handle tool calls with no required args in TAG_WITH_TAGGED format (#20764) James O'Leary 2026-03-19 09:53:11 -07:00
  • 900efd531d ci : clarify gh command for viewing issues (#20766) Georgi Gerganov 2026-03-19 18:43:54 +02:00
  • 74c42ee1f4 hexagon: add Matrix Extensions (HMX) for Hexagon NPU backend (#20693) Yiwei Shao 2026-03-19 09:11:06 -07:00
  • b49d8b8757 ci : add hip quality check (#20430) uvos 2026-03-19 17:05:44 +01:00
  • 5e54d51b19 common/parser: add proper reasoning tag prefill reading (#20424) Piotr Wilkin (ilintar) 2026-03-19 16:58:21 +01:00
  • c1258830b2 ggml webgpu: ops support for qwen3.5 (SET, TRI_SOLVE, SSM_CONV, GATED_DELTA_NET) + GET_ROWS optimization (#20687) Reese Levine 2026-03-19 08:45:28 -07:00
  • 922b90e567 common : add LLAMA_ARG_SPEC_TYPE (#20744) ddh0 2026-03-19 10:16:55 -05:00
  • f071ce67c9 ci : add action for finding duplicate issues (#20756) Georgi Gerganov 2026-03-19 16:17:37 +02:00
  • 4065c1a3a6 Server becomes the source of truth for sampling parameter defaults (#20558) Pascal 2026-03-19 13:20:39 +01:00
  • 1e64534570 mtmd: add clip_graph::build_mm() (#20751) b8429 Xuan-Son Nguyen 2026-03-19 13:11:39 +01:00
  • cd708db0cc WebUI: Persist the on/off state of the MCP servers for new conversations (#20750) Pascal 2026-03-19 12:54:06 +01:00
  • 512bba6ee0 webui: Improve model parsing logic + add unit tests (#20749) Aleksander Grygier 2026-03-19 12:25:50 +01:00
  • b486c17b3e convert : support is_causal hyperparameter (#20746) Dowon 2026-03-19 19:41:11 +09:00