Commit Graph

  • 1b9bbaa357 common : fix gpt-oss content removal (#20745) b8425 Aldehir Rojas 2026-03-19 05:40:39 -05:00
  • 07feeaa92e vulkan: dequantize iq4_xs 4 at a time (#20657) b8424 Eve 2026-03-19 10:32:04 +00:00
  • 3fee84e156 cmake : fix build warning when kleidiai is enabled (#20457) b8423 Charles Xu 2026-03-19 09:14:48 +01:00
  • 811397745e vocab : assert array size of scores and toktypes (#20737) b8422 Sigbjørn Skjæret 2026-03-19 08:34:04 +01:00
  • c014c3f83a docs: add information about openvino in the docker page (#20743) Kevin Hannon 2026-03-19 03:08:47 -04:00
  • 7f2cbd9a4d CANN: handle in-place ROPE on non-contiguous f32 tensors (#20274) b8420 Chenguang Li 2026-03-19 14:05:01 +08:00
  • 509a31d00f ggml-webgpu: Update the RMS_NORM preprocessor and add L2_NORM (#20665) b8419 Masashi Yoshimura 2026-03-19 13:08:59 +09:00
  • ea01d196d7 ggml-webgpu: Add supports for DIAG and TRI (#20664) b8418 Masashi Yoshimura 2026-03-19 13:08:35 +09:00
  • 07ba6d275b CANN: support flash attention for head dim not multiple of 16, fix ALiBi slope offset (#20031) b8417 Chenguang Li 2026-03-19 11:02:42 +08:00
  • 6729d4920c model : add control vector support where missing (#20653) b8416 Michael Grau 2026-03-18 23:25:12 +01:00
  • d13d60af1d gguf-py : cleaner way to get the first key (#20727) Sigbjørn Skjæret 2026-03-18 23:21:42 +01:00
  • 5744d7ec43 Rebuild index.html.gz (#20724) crsawyer 2026-03-18 12:49:57 -05:00
  • 8ced5f41f9 Move to no timeout for WaitAny in graph submission to avoid deadlocks in some cases on llvm-pipe backends (#20618) b8413 Reese Levine 2026-03-18 10:23:47 -07:00
  • 78d550b541 ggml-cpu/x86: fix unused changemask warning in repack (#20692) b8412 Shaw Nguyen 2026-03-18 23:45:06 +07:00
  • 4efd326e71 sync : ggml b8411 Georgi Gerganov 2026-03-18 14:45:54 +02:00
  • b08f7322ee ggml : bump version to 0.9.8 (ggml/1442) Georgi Gerganov 2026-03-16 20:15:14 +02:00
  • 79187f2fb8 ggml : restore ggml_type_sizef() to avoid major version bump (ggml/1441) Georgi Gerganov 2026-03-16 20:09:25 +02:00
  • 48e61238e1 webui: improve tooltip wording for attachment requirements (#20688) Julien Chaumond 2026-03-18 14:01:02 +01:00
  • 4af94d9afe gguf : fix division by zero gg/gguf-fix-division-by-zero Georgi Gerganov 2026-03-18 12:39:19 +02:00
  • 312cf03328 llama : re-enable manual LoRA adapter free (#19983) b8407 Pop Flamingo 2026-03-18 11:03:26 +01:00
  • f4049ad735 tests : fix test-jinja-py Windows failures by bypassing command-line args [no ci] (#20483) Masato Nakasaka 2026-03-18 02:43:31 -07:00
  • 5e8910a0db common : rework gpt-oss parser (#20393) b8405 Aldehir Rojas 2026-03-18 04:41:25 -05:00
  • fe00a84b4b tests: enable kv_unified to prevent cuda oom error on rtx 2060 (#20645) Aaron Teo 2026-03-18 17:40:22 +08:00
  • 7ab321d40d webui: Fix duplicated messages on q param (#20715) Aleksander Grygier 2026-03-18 10:32:43 +01:00
  • 7533a7d509 HIP : ignore return of hipMemAdvise [no ci] (#20696) uvos 2026-03-18 09:53:13 +01:00
  • a69d54f990 context : fix graph not resetting when control vector changes (#20381) b8401 Andreas Obersteiner 2026-03-18 07:10:13 +01:00
  • cf23ee2447 hexagon: add neg, exp, sigmoid, softplus ops, cont, repeat ops (#20701) b8400 Krishna Sridhar 2026-03-17 15:34:36 -07:00
  • 892e3c333a vulkan: disable mmvq on Intel Windows driver (#20672) b8399 Ruben Ortlam 2026-03-17 21:51:43 +01:00
  • ee4801e5a6 ggml-blas: set mkl threads from thread context (#20602) b8398 Kevin Hannon 2026-03-17 13:16:49 -04:00
  • d2ecd2d1cf common/parser: add --skip-chat-parsing to force a pure content parser. (#20289) Piotr Wilkin (ilintar) 2026-03-17 16:16:43 +01:00
  • 054d8b0f24 ggml-cpu: fix RVV checks in quants and repacking (#20682) Taimur Ahmad 2026-03-17 19:03:40 +05:00
  • ab0bb93748 ci : bump ccache [no ci] (#20679) Sigbjørn Skjæret 2026-03-17 14:54:31 +01:00
  • 3a5cb629b1 vulkan: async and event fixes (#20518) b8394 Ruben Ortlam 2026-03-17 14:27:23 +01:00
  • 8cc2d81264 server : fix ctx checkpoint invalidation (#20671) b8393 Georgi Gerganov 2026-03-17 15:21:14 +02:00
  • 627670601a kleidiai : fix MUL_MAT support for batched (3D) inputs (#20620) b8392 Justin Bradford 2026-03-17 05:03:54 -07:00
  • 740a447fc3 vulkan: allow graphics queue only through env var (#20599) b8391 Ruben Ortlam 2026-03-17 10:09:59 +01:00
  • 8dc96153c3 enhance FA stable in UT arthw 2026-03-17 15:57:02 +08:00
  • b6c83aad55 [SYCL] enhance UPSCALE to support all UT cases (#20637) b8390 Neo Zhang 2026-03-17 10:01:52 +08:00
  • 2e4a6edd4a tools/server: support refusal content for Responses API (#20285) b8389 Piotr Wilkin (ilintar) 2026-03-17 01:42:04 +01:00
  • d34ff7eb5b model: mistral small 4 support (#20649) b8388 Xuan-Son Nguyen 2026-03-17 00:31:14 +01:00
  • 45172df4d6 ci : disable AMX jobs (#20654) Georgi Gerganov 2026-03-16 22:38:59 +02:00
  • 9b342d0a9f benches : add Nemotron 3 Nano on DGX Spark (#20652) Georgi Gerganov 2026-03-16 21:50:43 +02:00
  • 55e87026f7 tests : write to binary buffer to avoid newline translation in jinja -py [no ci] (#20365) Sigbjørn Skjæret 2026-03-16 20:40:22 +01:00
  • cf21cdf36c kleidiai: add data type check to get_tensor_traits (#20639) Martin Klacer 2026-03-16 19:25:54 +00:00
  • 0ed992973b ci : update labeler (#20629) Sigbjørn Skjæret 2026-03-16 20:24:20 +01:00
  • 1bbec6a75d jinja : add capability check for object args (#20612) Aldehir Rojas 2026-03-16 11:43:14 -05:00
  • f47a246a08 sync : ggml Georgi Gerganov 2026-03-16 14:56:06 +02:00
  • c0ccbd1f86 ggml : try fix arm build (whisper/0) Georgi Gerganov 2026-03-16 09:11:13 +02:00
  • f6da02c3f2 ggml : extend im2col f16 (ggml/1434) David366AI 2026-03-15 15:50:56 -04:00
  • dddca026bf webui: add model information dialog to router mode (#20600) Pascal 2026-03-16 15:38:11 +01:00
  • 3c8521c4f5 llama-graph: replace cont with reshape for alpha in qwen35 (#20640) b8377 Aman Gupta 2026-03-16 22:07:13 +08:00
  • 5bb2d50c0f Merge branch 'master' into pr/18039 Georgi Gerganov 2026-03-16 15:41:24 +02:00
  • 67a2209fab webui: Add MCP CORS Proxy detection logic & UI (#20167) Aleksander Grygier 2026-03-16 13:05:36 +01:00
  • d65c4f2dc9 Fix model selector locked to first loaded model with multiple models (#20580) Pascal 2026-03-16 12:04:06 +01:00
  • d8c331c0af webui: use date in more human readable exported filename (#19939) Woof Dog 2026-03-16 10:18:13 +00:00
  • 46dba9fce8 vulkan: fix flash attention dot product precision (#20589) b8373 Ruben Ortlam 2026-03-16 10:45:49 +01:00
  • de8f01c2d7 model : wire up Nemotron-H tensors for NVFP4 support (#20561) b8372 Sigbjørn Skjæret 2026-03-16 09:19:16 +01:00
  • 079e5a45f0 convert : support mixed-precision ModelOpt models with per-tensor NVFP4/FP8 quantization (#20539) Richard Davison 2026-03-16 09:18:47 +01:00
  • d3936498a3 common : fix iterator::end() dereference (#20445) b8370 Masato Nakasaka 2026-03-15 23:50:38 -07:00
  • 34818ea6c0 CUDA: GDN hide memory latency (#20537) b8369 Aman Gupta 2026-03-16 11:41:45 +08:00
  • 9e2e2198b0 tools/cli: fix disable reasoning (#20606) b8368 Piotr Wilkin (ilintar) 2026-03-15 22:40:53 +01:00
  • 88915cb55c server : fix wait in test_cancel_requests() test (#20601) Georgi Gerganov 2026-03-15 20:54:37 +02:00
  • ebbf544ed1 sycl : fix for untransposed GDA recurrent state (#20583) b8366 Sigbjørn Skjæret 2026-03-15 19:10:15 +01:00
  • b91d7dfe5b ci : only save openvino caches on github-hosted master (#20593) Sigbjørn Skjæret 2026-03-15 18:58:13 +01:00
  • ae40cd27c8 CUDA: limit number of FA stream-k CUDA blocks (#20586) b8364 Johannes Gäßler 2026-03-15 18:30:47 +01:00
  • ceef6b5233 ggml: avoid creating CUDA context during device init (#20595) b8363 Pascal 2026-03-15 17:42:56 +01:00
  • 07c6a59b4f vendor : update cpp-httplib to 0.38.0 (#20578) b8362 Adrien Gallouët 2026-03-15 17:30:06 +01:00
  • 8b7d340b6f ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain (#20536) b8361 MoonShadow 2026-03-16 00:23:58 +08:00
  • 559646472d fix: prevent nullptr dereference (#20552) b8360 Eric Hsieh 2026-03-15 23:51:49 +08:00
  • cf45437d35 codeowners : use teams (#20526) Sigbjørn Skjæret 2026-03-15 14:26:10 +01:00
  • 9cd4ebcfb1 ci : split build.yml + server.yml (#20546) b8358 Georgi Gerganov 2026-03-15 15:11:17 +02:00
  • 89d0aec042 convert : support contiguous method on lora tensors (#20489) Sigbjørn Skjæret 2026-03-15 12:15:12 +01:00
  • 15324f905b cont : reduce paths gg/ci-build-cro Sigbjørn Skjæret 2026-03-15 13:03:20 +02:00
  • 3fec0e1b86 cont : split server.yml Georgi Gerganov 2026-03-15 12:02:24 +02:00
  • 45a8ab2e0e ci : split build.yml Georgi Gerganov 2026-03-14 16:30:07 +02:00
  • b9da4444df ggml : guard against sumq2 being 0 in IQ4_NL (#20460) b8356 Bartowski 2026-03-15 04:47:28 -04:00
  • 0776a6a039 remove event pending stage 0cc4m/vulkan-async-fixes2 Ruben Ortlam 2026-03-15 08:59:39 +01:00
  • 617db241aa cuda : add RDNA4-specific MMVQ parameter table for bs=1 decode (#19478) b8355 PikaPikachu 2026-03-15 15:33:39 +08:00
  • 1a3d8edbba vulkan: use graphics queue on AMD (#20551) b8354 Ruben Ortlam 2026-03-15 08:18:54 +01:00
  • 6b10a82c00 kv-cache : fix reading llama_kv_cell_ext during state read (#20273) b8353 sprayandwipe 2026-03-15 07:11:19 +00:00
  • d23355afc3 model : wire up Qwen3.5/Qwen3.5MoE tensors for NVFP4 support (#20506) b8352 Michael Wand 2026-03-14 14:44:42 -07:00
  • b30a5fdf37 metal : add FA specialization for HSK = 320, HSV = 256 (#20549) b8351 Georgi Gerganov 2026-03-14 23:15:47 +02:00
  • b4768955c4 ci : move self-hosted workflows to separate files (#20540) b8350 Georgi Gerganov 2026-03-14 23:15:35 +02:00
  • fc350fdf96 docker : force Python 3.13 in Vulkan container (#20530) Gerard Guillemas Martos 2026-03-14 21:37:09 +01:00
  • 3a6f059909 ci : try to optimize some jobs (#20521) b8348 Eve 2026-03-14 19:27:52 +00:00
  • 609ea50026 hexagon: Q4_0 and MXFP4 repack fixes (#20527) b8347 Max Krasnyansky 2026-03-14 11:09:08 -07:00
  • 9f774e45ee ci : reduce webgpu tests timeout to 900s (#20538) Georgi Gerganov 2026-03-14 17:08:26 +02:00
  • 94d0262277 mtmd: add llama-mtmd-debug binary (#20508) Xuan-Son Nguyen 2026-03-14 15:52:29 +01:00
  • a93c0ef0fa add op gated_delta_net (#20455) Neo Zhang 2026-03-14 22:01:57 +08:00
  • 710878a7dd webui: restore code preview iframe origin isolation (#20477) Chedrian07 2026-03-14 19:28:28 +09:00
  • 0685848bc6 scripts : remove get-wikitext-103.sh (#20543) Adrien Gallouët 2026-03-14 11:22:04 +01:00
  • 0024a69b70 scripts : update get-hellaswag.sh and get-winogrande.sh (#20542) Adrien Gallouët 2026-03-14 11:21:50 +01:00
  • d0b79aaa2f ggml : add native AVX512-FP16 support for F16 operations (#20529) b8340 Adrien Gallouët 2026-03-14 10:06:14 +01:00
  • 937a425600 fix log Ruben Ortlam 2026-03-14 08:45:28 +00:00
  • 5c177a1036 fix event reuse issue with multiple vectors Ruben Ortlam 2026-03-14 09:03:29 +01:00
  • f2c0dfb739 Use fp32 in cuBLAS V100 to avoid overflows, env variables to override cuBLAS compute type (#19959) b8339 Wallentri 2026-03-14 10:43:13 +03:00
  • ccd8d4a6ce NO MERGE: sync logging Ruben Ortlam 2026-03-13 14:20:47 +01:00
  • 9789c4ecdc ggml : add OpenVINO backend (#15307) b8338 Zijun Yu 2026-03-14 13:56:55 +08:00
  • 77e20cc107 vendor : update cpp-httplib to 0.37.2 (#20484) b8337 Adrien Gallouët 2026-03-14 06:51:02 +01:00
  • 4374b5ab9a use multiple events to avoid reset issues Ruben Ortlam 2026-03-14 06:39:20 +01:00