Commit Graph

  • 5d56effdee convert : add support for Nemotron Nano 3 Omni (#22481) Daniel Bevenius 2026-04-28 19:17:57 +02:00
  • 52e5f0a5c1 common : re-arm reasoning budget after DONE on new <think> (#22323) b8964 Jillis ter Hove 2026-04-28 19:15:36 +02:00
  • f9f33654a6 vulkan: Coalesce Q4_K/Q5_K scale loads (#21751) b8963 Matt Corallo 2026-04-28 15:31:04 +00:00
  • 98bb57916a ggml-webgpu: fix buffer aliasing for ssm_scan and refactor aliasing logic (#22456) b8962 Reese Levine 2026-04-28 07:27:17 -07:00
  • f42e29fdf1 webui: Server tools (#21237) Aleksander Grygier 2026-04-28 14:35:49 +03:00
  • 19821178be vulkan: add barrier after writetimestamp (#21865) b8960 Jeff Bolz 2026-04-28 12:28:12 +02:00
  • 698d19b93c ggml: improve SPIR-V headers detection with __has_include (#21918) Emil Askerov 2026-04-28 13:19:06 +03:00
  • 50494a2800 ggml : skip already registered backends and devices (#22296) b8958 Adrien Gallouët 2026-04-28 09:02:32 +02:00
  • d530d6e7a2 ggml : revert to -lm linking instead of find_library (#22355) b8957 Adrien Gallouët 2026-04-28 08:56:02 +02:00
  • c3e08f4700 CANN: add new ops, optimize existing ops (#21204) b8956 hipudding 2026-04-28 14:27:22 +08:00
  • 14e733e36f spec : refactor params (#22397) b8955 Georgi Gerganov 2026-04-28 09:07:33 +03:00
  • 516e8d7a8a server: use pos_next instead of n_tokens for m-rope (#22439) b8954 Aman Gupta 2026-04-28 13:41:00 +08:00
  • 434b2a1ff6 ggml-webgpu: add Q1_0 support (#22374) b8953 Rithik Sharma 2026-04-27 15:50:59 -07:00
  • 983ca8992e server: (router) Forward form-data to model server (Fixes #22044) (#22118) b8952 tha80 2026-04-27 23:55:00 +02:00
  • 665abc6097 add fast mat-vec kernels for i-quants (#22344) b8951 Rithik Sharma 2026-04-27 08:25:45 -07:00
  • 4414c04b9a Additional test for common/gemma4 : handle parsing edge cases (#22420) b8950 Igor Rudenko 2026-04-27 17:36:59 +03:00
  • ceaf47c4b1 fix: rpc-server cache may not work in Windows environments (#22394) b8949 unraido 2026-04-27 23:25:09 +09:00
  • 42401c72b8 Fix type casting for unaccounted memory calculation (#22424) b8948 rankaiyx 2026-04-27 20:31:13 +08:00
  • e940b3d468 download : prefer q8_0 when q4_k not available (#22428) b8947 Georgi Gerganov 2026-04-27 15:30:29 +03:00
  • fd6f79c7a4 download : prefer q8_0 when q4_k not available gg/download-prefer-q8 Georgi Gerganov 2026-04-27 12:08:25 +03:00
  • 0f1bb602dd model : remove duplicate wo_s scale after build_attn (Qwen3, LLaMA) (#22421) b8946 ynankani 2026-04-27 07:58:48 +00:00
  • d13540becd convert : remove input_scale for dequantized fp8 modelopt (#22356) Sigbjørn Skjæret 2026-04-27 08:45:01 +02:00
  • f84270ea10 ggml : use 64 bytes aligned tile buffers (#21058) b8944 Adrien Gallouët 2026-04-27 08:30:55 +02:00
  • 5594d13224 common: fix missing exports in llama-common (#22340) b8943 Max Krasnyansky 2026-04-26 22:06:39 -07:00
  • f535774325 pr2wt : symlink .pi (#22386) Georgi Gerganov 2026-04-26 19:49:26 +03:00
  • 06a811d085 add performance-portable tuning for register-tile and subgroup matmul (#22241) b8941 Rithik Sharma 2026-04-26 09:26:28 -07:00
  • 78433f606f Fix recurrent state serialization for partial reads and writes (#22362) b8940 Gaurav Garg 2026-04-26 17:04:40 +05:30
  • 7ec36aa861 Github: set meta backend code owner (#22388) Johannes Gäßler 2026-04-26 13:34:13 +02:00
  • b1a5bd4e0c CUDA: better coalesce data-access for contiguous concat (#22330) Oliver Simons 2026-04-26 09:21:45 +02:00
  • cb9fc575e4 common : use pimpl in debug.h to reduce header dependencies pr/22340-gg Georgi Gerganov 2026-04-26 09:45:41 +03:00
  • 68adf99ff7 cont : cleanup Georgi Gerganov 2026-04-26 09:39:29 +03:00
  • 0c6ee1cade ggml-cpu : re-enable fast gelu_quick_f16 (#22339) b8937 Sigbjørn Skjæret 2026-04-26 08:28:14 +02:00
  • 2dd84169d1 ggml-cpu: optimize avx2 q6_k (#22345) b8936 Eve 2026-04-26 06:27:50 +00:00
  • f454bd7eb8 opencl: add iq4_nl support (#22272) b8935 lhez 2026-04-25 21:21:58 -07:00
  • b760272f1a hexagon: guard HMX clock request for v75+ platforms (#22377) b8934 Trivikram Reddy 2026-04-25 19:58:26 -05:00
  • 38d762d8fc common: refactor common/debug to move abort_on_nan into base_callback_data Max Krasnyansky 2026-04-25 16:48:16 -07:00
  • dcad77cc3b chat: fix handling of space in reasoning markers (#22353) b8933 Piotr Wilkin (ilintar) 2026-04-25 21:24:13 +02:00
  • 98dc1418ea spec : fix vocab compat checks (#22358) Georgi Gerganov 2026-04-25 20:11:35 +03:00
  • 9725a313be CUDA: reduce MMQ stream-k overhead (#22298) b8931 Johannes Gäßler 2026-04-25 14:15:03 +02:00
  • d1649047a3 metal : optimize Metal Tensor API usage for GGML_OP_MUL_MAT (#20962) Developer-Ecosystem-Engineering 2026-04-25 05:14:28 -07:00
  • 9d34231bb8 llama-quant : default ftype param Q5_1 --> Q8_0 (#20828) b8929 ddh0 2026-04-25 01:25:35 -05:00
  • 8ea8fee966 gitignore : add .pi + personal SYSTEM.md (#22316) Georgi Gerganov 2026-04-25 09:20:45 +03:00
  • eddd7a13a5 [SYCL] Optimize Q4_0 mul_mat for Arc770, add scripts (#22291) b8927 Neo Zhang 2026-04-25 14:20:14 +08:00
  • dd2914dc81 ggml-webgpu: support for SSM_SCAN and disable set_rows error checking (#22327) b8926 Reese Levine 2026-04-24 23:18:15 -07:00
  • 0adede866d parser: fix structured output bug (#22302) b8925 Piotr Wilkin (ilintar) 2026-04-24 23:19:55 +02:00
  • 361fe72acb Hexagon: Bump HMX Frequency to Max Corner (#22334) b8924 Trivikram Reddy 2026-04-24 15:55:17 -05:00
  • a702f39597 CI Snapdragon: Switch ubuntu-latest to ubuntu-slim runner (#22303) Shreya Jain 2026-04-24 12:21:36 -07:00
  • 13d36cf891 ggml-webgpu: enable FLASH_ATTN_EXT on browser without subgroup matrix (#22199) b8922 Zheyuan Chen 2026-04-24 10:39:09 -07:00
  • f65bc34c68 hexagon: use DIRID 13 in libggml-htp.inf for modern InfVerif (#22306) Mengsheng Wu 2026-04-25 00:21:33 +08:00
  • 91b03e4c93 Merge branch 'master' into pr/18039 Georgi Gerganov 2026-04-24 14:20:12 +03:00
  • 15fa3c493b metal : print GPU description (#22318) b8920 Georgi Gerganov 2026-04-24 13:56:03 +03:00
  • dc80c5252a common : fix jinja warnings with clang 21 (#22313) b8919 Adrien Gallouët 2026-04-24 12:36:02 +02:00
  • e583f3b4f5 ggml : minor coding style (#22308) b8918 Georgi Gerganov 2026-04-24 11:02:00 +03:00
  • 017f090442 jinja : remove unused header (#22310) b8917 Georgi Gerganov 2026-04-24 11:01:46 +03:00
  • ffdd983fb8 server : fix swa-full logic (#22288) b8916 Georgi Gerganov 2026-04-24 10:17:37 +03:00
  • 793d0a7931 server: rename debug tags to match --cache-idle-slots naming (#22292) Yes You Can Have Your Own 2026-04-24 09:28:44 +03:00
  • 8bc492ebb4 hexagon: add SOLVE_TRI op (#21974) b8914 Mengsheng Wu 2026-04-24 09:39:13 +08:00
  • e5f070a1dc fix(shader): handle the buffer aliasing for rms fuse (#22266) b8913 Chen Yuan 2026-04-23 19:32:59 -04:00
  • fa0b8a70a8 cli: Remove redundant local sampling variables (#20429) (#22264) b8912 Ethan Turner 2026-04-23 15:53:23 -07:00
  • 5d2b52d80d hexagon: add support for basic and extended Op profiling (#22269) b8911 Max Krasnyansky 2026-04-23 14:17:21 -07:00
  • 187a456370 Enable testing on Snapdragon devices (#21051) Shreya Jain 2026-04-23 13:08:10 -07:00
  • 185cbff6f1 server : convert_anthropic_to_oai: also copy chat_template_kwargs (#22154) b8909 srkizer 2026-04-24 03:32:46 +09:00
  • c78fb909b2 server: fix heap-buffer-overflow from negative n_discard (CVE-2026-21869) (#22267) b8908 Song Li 2026-04-23 12:39:07 -04:00
  • 12568ca8c8 vendor : update LibreSSL to 4.3.1 (#22285) b8907 Adrien Gallouët 2026-04-23 17:45:56 +02:00
  • c807c6e3b0 server: (anthropic API) fix prefix caching (#21793) b8906 kvc0 2026-04-23 08:45:02 -07:00
  • 0949beb5a3 fix build number for sycl release (#22283) b8905 Sigbjørn Skjæret 2026-04-23 15:38:58 +02:00
  • 9012c50fc8 model-conversion : fix mmproj output file name [no ci] (#22274) Daniel Bevenius 2026-04-23 15:07:38 +02:00
  • 0dd7f915fd cli : cleanup auto-completion code (#21745) Matthias Straka 2026-04-23 15:03:28 +02:00
  • 550d684bd1 server: Enable transcriptions API for LFM2-Audio (#22000) b8902 Tarek Dakhran 2026-04-23 10:47:26 +02:00
  • b9421898b6 add for Q4_0 opt_arc770_q4_0 arthw 2026-04-23 15:33:19 +08:00
  • 8635e221c8 metal : fix event synchronization (#22260) b8901 Georgi Gerganov 2026-04-23 08:22:49 +03:00
  • 930e0210d1 gitignore: add AGENTS.local.md (#22246) Georgi Gerganov 2026-04-23 08:22:24 +03:00
  • 96c1db26c4 ggml-base: use MATH_LIBRARY variable instead of hardcoded 'm' (#22239) Georgi Gerganov 2026-04-23 08:22:08 +03:00
  • 4ead6fd957 [SYCL] Update oneapi 2025.3.3, Seperate SYCL build, release Ubuntu 24 package. (#22078) Neo Zhang Jianyu 2026-04-23 13:21:36 +08:00
  • 5eaee65384 convert : Handle ModelOpt produced mixed precision model during convert to GGUF (#22247) ynankani 2026-04-23 05:19:51 +00:00
  • 60b68a6279 sycl : fused MoE mul_mat_vec_q for TG (#21920) abotsis 2026-04-22 23:18:56 -06:00
  • b76429a69c ggml-webgpu: add support for im2col (#22259) b8895 Chen Yuan 2026-04-22 23:17:41 -04:00
  • 86db42e97f CUDA: fuse relu + sqr (#22249) Anav Prasad 2026-04-23 02:28:56 +00:00
  • 6217b49583 HIP: flip GGML_HIP_GRAPHS to default on (#22254) b8893 uvos 2026-04-23 02:34:31 +02:00
  • 0d0764dfd2 [WebGPU] Implement async tensor api and event api (#22099) b8892 Nikhil Jain 2026-04-22 10:52:01 -07:00
  • 6da7168312 ggml-webgpu: Add fused RMS_NORM + MUL (#21983) b8891 Masashi Yoshimura 2026-04-23 02:51:40 +09:00
  • 8bccdbbff9 chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs (#22217) b8890 Piotr Wilkin (ilintar) 2026-04-22 18:10:56 +02:00
  • bcb5eeb645 speculative-simple : add checkpoint support (#22227) b8889 Georgi Gerganov 2026-04-22 15:44:45 +03:00
  • 225088ea76 sycl: Improve mul_mat_id memory efficiency and add BF16 fast path (#22119) b8888 Akarshan Biswas 2026-04-22 18:02:56 +05:30
  • 82d3f4d3b2 mtmd: also support LLAMA_ROPE_TYPE_NONE (#22242) b8887 Xuan-Son Nguyen 2026-04-22 12:16:29 +02:00
  • 17f6245168 server: ignore reasoning content from transcription api (#21905) b8886 Xuan-Son Nguyen 2026-04-22 12:10:50 +02:00
  • 7bfe60fdf9 mtmd, llama : Update HunyuanVL vision-language model support (#22037) b8885 manayang 2026-04-22 17:58:43 +08:00
  • 750579ff14 common: Refactoring sampler parameters (#20429) (#22233) b8884 Ethan Turner 2026-04-22 01:40:19 -07:00
  • 134d6e54d4 common/chat, server: refactor, move all conversion functions to common, add tests (#20690) b8883 Piotr Wilkin (ilintar) 2026-04-22 10:28:45 +02:00
  • a5355a0226 server: keep router model refcount to avoid unloading models that have running requests 0cc4m/server-router-fix-reload-deadlock Ruben Ortlam 2026-04-16 13:40:13 +02:00
  • ca7f7b7b94 ggml-webgpu(shader): support conv2d kernels. (#21964) b8882 Chen Yuan 2026-04-21 23:18:57 -04:00
  • 0dedb9ef7a hexagon: add support for FILL op (#22198) b8881 Aparna M P 2026-04-22 04:54:20 +05:30
  • 2799d933b5 ggml-webgpu: reset CPU/GPU profiling time when freeing context (#22050) b8880 Masashi Yoshimura 2026-04-22 08:05:21 +09:00
  • 04fe84b69d server: allow cancel loading model (#21814) Xuan-Son Nguyen 2026-04-22 00:26:09 +02:00
  • 5a4cd6741f Hexagon: DAIG op (#22195) b8878 Shreya Jain 2026-04-21 14:16:04 -07:00
  • 2248799a58 hexagon: fix missing v79 entry in libggml-htp.inf (#22194) Mengsheng Wu 2026-04-22 04:53:44 +08:00
  • 72d693e4fb spec : reset i_last when low acceptance streak occurs (#22168) b8876 Paul Dubs 2026-04-21 20:29:07 +02:00
  • 98d2d2884e mtmd: Add support for Reka Edge 2603 (#21616) b8875 Kwa Jie Hao 2026-04-22 02:02:49 +08:00
  • 84652b80cf arg : add --spec-default (#22223) b8874 Georgi Gerganov 2026-04-21 19:52:02 +03:00
  • 52f1096f21 openvino: driver setup, CI split, thread safety, and NPU optimizations (#21944) b8873 Zijun Yu 2026-04-21 23:58:34 +08:00