Commit Graph

  • 4a4f819cb6 sycl: Battlemage AOT build via spir64_gen + MMQ subgroup annotations (#22147) Intel AI Get-to Market Customer Success and Solutions 2026-05-08 22:42:40 -07:00
  • 046e284437 Add flash attention MMA / Tiles to support MiMo-V2.5 (#22812) b9085 AesSedai 2026-05-08 20:28:29 -07:00
  • 66001722aa hexagon: add HTP kernel for GGML_OP_GATED_DELTA_NET (#22837) b9084 Yanzhao Wang 2026-05-08 17:12:04 -07:00
  • c5703e03a5 sycl: support non-contiguous input in PAD op (#22148) Intel AI Get-to Market Customer Success and Solutions 2026-05-08 17:05:22 -07:00
  • b46812de78 Feature hexagon l2 norm (#22816) b9082 Pranav Dhinakar 2026-05-08 13:41:40 -07:00
  • 49956041ee common : do not wrap raw strings in schema parser for tagged parsers (#22827) b9081 Aldehir Rojas 2026-05-08 15:33:17 -05:00
  • 9f5f0e689c model : support Gemma4_26B_A4B_NVFP4 (#22804) b9080 ynankani 2026-05-08 18:42:09 +00:00
  • 55b62bce15 llama : reuse device buffers when possible Georgi Gerganov 2026-05-08 20:42:56 +03:00
  • f1652197dd server : support parallel drafting Georgi Gerganov 2026-05-08 19:30:31 +03:00
  • f88c942861 spec : support parallel drafts Georgi Gerganov 2026-05-08 18:53:33 +03:00
  • f9cd456ea5 common : revert reasoning budget +inf logit bias (#22740) b9079 Aldehir Rojas 2026-05-08 10:46:43 -05:00
  • 927d6635d3 cont : prepare params Georgi Gerganov 2026-05-08 17:50:20 +03:00
  • 5d6f18a638 webui: fix LLM title generation for agentic conversations (#22840) smugman-dot 2026-05-08 15:36:04 +01:00
  • 8822c122be cont : prepare params Georgi Gerganov 2026-05-08 16:59:48 +03:00
  • 29debb3a6a server: support Vertex AI compatible API (#22545) b9077 Xuan-Son Nguyen 2026-05-08 15:23:04 +02:00
  • 6582523eaa spec : refactor for multi-sequence speculative context Georgi Gerganov 2026-05-08 15:43:36 +03:00
  • 9dcf835528 server: (router) expose child model info from router's /v1/models (#22683) b9076 Xuan-Son Nguyen 2026-05-08 14:42:15 +02:00
  • 58e68df0f9 cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667) b9075 Pascal 2026-05-08 11:44:09 +02:00
  • 9b2925e1e0 webui: Add Import/Export of Settings configuration + improve architecture (#22803) Aleksander Grygier 2026-05-08 11:26:04 +02:00
  • efa2f8e5a7 naming : improve consistency gg/spec-refactor-ctx Georgi Gerganov 2026-05-08 12:24:57 +03:00
  • 778f9e247e tools : update readme Georgi Gerganov 2026-05-08 11:55:16 +03:00
  • 1dbc054da5 server : fix slot ctx_drft ptr Georgi Gerganov 2026-05-08 11:55:05 +03:00
  • 161eae0adf spec : fix n_past type Georgi Gerganov 2026-05-08 11:54:32 +03:00
  • a8fd165fec CUDA: lower-case PCI bus id, standardize for ggml (#22820) b9073 Johannes Gäßler 2026-05-08 10:09:38 +02:00
  • e5b1401318 speculative-simple : update Georgi Gerganov 2026-05-08 11:09:34 +03:00
  • 6d57a49a70 vulkan: fix spv shadowing (#22760) b9072 miyan 2026-05-08 15:35:22 +08:00
  • 3b1a8df8fd server : clean-up + dry Georgi Gerganov 2026-05-08 10:20:01 +03:00
  • 233d1aee69 server : add comment Georgi Gerganov 2026-05-08 08:50:23 +03:00
  • 3e941b813b ggml: update SCHED_DEBUG output to use ggml_op_desc() (#22825) b9071 Max Krasnyansky 2026-05-07 22:43:04 -07:00
  • 12c7cfbe83 server : fix URL for draft model Georgi Gerganov 2026-05-08 08:03:49 +03:00
  • 6a4b05a030 server : fix mtmd draft processing Georgi Gerganov 2026-05-08 08:02:11 +03:00
  • f3e8d149ce opencl: add q4_0 MoE GEMM for Adreno (#22731) b9070 Shawn Gu 2026-05-07 21:17:07 -07:00
  • 8be14e40de spec : handle draft running out of context Georgi Gerganov 2026-05-08 07:11:51 +03:00
  • 1d72d87349 convert : fix RuntimeError when stripping FP8 KV-cache scales (#22818) Michał Piszczek 2026-05-08 05:55:48 +02:00
  • 6a2a2513dc sycl : fix script error (#22795) Neo Zhang 2026-05-08 11:54:57 +08:00
  • ba72d4d287 ggml: update SCHED_DEBUG output to use ggml_op_desc() maxk/sched-debug-use-op-desc Max Krasnyansky 2026-05-06 11:27:12 -07:00
  • 44dbe8c521 model: Support sarashina2.2-vision-3b model (#22103) samuraieng 2026-05-08 06:10:29 +09:00
  • 05ff59cb57 CUDA: batch out_prod inner loop with cublasSgemmStridedBatched (#22651) b9066 leonardHONG 2026-05-08 03:59:29 +08:00
  • aaf4a4d5e0 webui: add option for LLM title generation (#22265) smugman-dot 2026-05-07 20:14:03 +01:00
  • 7e118cdce0 cont : process images through the draft context Georgi Gerganov 2026-05-07 21:34:29 +03:00
  • ae6703fa89 cont : pass correct n_past for drafting Georgi Gerganov 2026-05-07 21:19:11 +03:00
  • 0239f4c611 cont : handle non-ckpt models Georgi Gerganov 2026-05-07 21:10:03 +03:00
  • c7facb0fe1 cont : async drft eval when possible Georgi Gerganov 2026-05-07 20:04:18 +03:00
  • 08c8012bde cont : sync main and drft contexts Georgi Gerganov 2026-05-07 18:47:34 +03:00
  • de35b1255c server, spec : transition to unified spec context Georgi Gerganov 2026-05-07 17:57:59 +03:00
  • 1afee5b262 server : improve ctx names Georgi Gerganov 2026-05-07 13:07:44 +03:00
  • 11fd5e7272 server : draft prompt cache and checkpoints Georgi Gerganov 2026-05-07 12:47:56 +03:00
  • c97dc3605e server : sketch the ctx_dft decode loop Georgi Gerganov 2026-05-07 10:50:42 +03:00
  • 8a50f6f0b9 cont : dedup ctx_seq_rm_type Georgi Gerganov 2026-05-07 10:22:20 +03:00
  • 77269ad8a7 cont : pass seq_id Georgi Gerganov 2026-05-07 10:14:18 +03:00
  • 4550f0f08b spec : update common_speculative_init() Georgi Gerganov 2026-05-07 10:06:32 +03:00
  • befc7ef635 spec : drop support for incompatible vocabs Georgi Gerganov 2026-05-07 09:54:09 +03:00
  • 2c9a40849f spec : refactor Georgi Gerganov 2026-05-07 08:39:46 +03:00
  • e43431b381 llama : fix device state save/load (#22805) b9064 Georgi Gerganov 2026-05-07 21:43:40 +03:00
  • ceb7e14b96 opencl: add opfilter regex for debugging (#22782) b9063 shaofeiqi 2026-05-07 11:00:20 -07:00
  • 093be624cc common/chat : preserve media markers for typed-content templates (#22634) b9062 Aldehir Rojas 2026-05-07 12:50:56 -05:00
  • deab41ec68 tests: add long-sequence cases and fix inputs for gated_delta_net (#22794) b9061 HaoJun ZHANG 2026-05-08 00:23:36 +08:00
  • ad09224658 sycl: add FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, GATED_DELTA_NET (#22149) b9060 Intel AI Get-to Market Customer Success and Solutions 2026-05-07 08:51:33 -07:00
  • b9afc19cb4 Write a readme on Multi-GPU usage in llama.cpp (#22729) Gaurav Garg 2026-05-07 21:18:40 +05:30
  • 803627f121 llama : remove unnecessary seq_id check during state restore (#22797) b9058 Georgi Gerganov 2026-05-07 16:37:26 +03:00
  • 68380ae11b ggml-cpu: Optimized risc-v cpu q1_0 dot b9057 pl752 2026-05-07 18:09:25 +05:00
  • cc97e45a14 mtmd: fix whisper audio tail truncation by exposing padded buffer to FFT (#22770) b9056 Pascal 2026-05-07 14:01:01 +02:00
  • 8e52631d55 model: Add Mimo v2.5 model support (#22493) b9055 AesSedai 2026-05-07 04:21:58 -07:00
  • f4b5a2ee91 webui: fix ?model= URL param race in router mode (#22771) Pascal 2026-05-07 13:09:32 +02:00
  • 97f06e9eed codeowners : add ZenDNN backend codeowner (#22772) Vishal Singh 2026-05-07 12:16:51 +05:30
  • e358d75adb webui: fix flicker issue on dismiss animation on overlay primitives (#22773) viggy 2026-05-06 23:11:31 -07:00
  • cfff1fc300 sycl : fix test script (#22737) Shane Tran Whitmire 2026-05-07 00:25:57 -05:00
  • 3980e04d5a llama : add missing call to ggml_backend_load_all() (#22752) b9050 Adrien Gallouët 2026-05-07 07:24:47 +02:00
  • 2496f9c149 mtmd : support MiniCPM-V 4.6 (#22529) b9049 tc-mb 2026-05-07 03:54:09 +08:00
  • 5207d120ea model : don't crash on unsupported architecture (#22742) b9048 Gilad S. 2026-05-06 11:51:21 -05:00
  • a0101225bc common: do not fit to unknown device memory (#22614) b9047 fl0rianr 2026-05-06 17:03:45 +02:00
  • a290ce6266 gguf-py : bump version to 0.19.0 (#22664) gguf-v0.19.0 Georgi Gerganov 2026-05-06 15:46:14 +03:00
  • a00e47e422 mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) (#22101) b9045 Yakine Tahtah 2026-05-06 14:40:59 +02:00
  • 750141969c feat: migrate to PEP 621 and add uv support (#21907) David Huggins-Daines 2026-05-06 08:04:10 -04:00
  • a736e6c0ac convert : ignore non-language tensors for Gemma4Model (#22753) Daniel Bevenius 2026-05-06 13:50:44 +02:00
  • e3e3f8e46a webui: Remove Google Favicons & Improve MCP Information logic & UI (#22719) Aleksander Grygier 2026-05-06 11:12:27 +02:00
  • f08f20a0e3 ggml-cpu: fuse RMS_NORM + MUL on CPU backend (#22423) b9041 zzzzwc 2026-05-06 15:41:14 +08:00
  • 07eaf919ed add tabindex and aria-hidden (#22699) viggy 2026-05-06 00:21:58 -07:00
  • 74d6248f71 convert : add filter_tensors method to pre-filter tensors (#22597) Sigbjørn Skjæret 2026-05-06 08:06:05 +02:00
  • 2ca1161bd7 ggml : use CL_DEVICE_GLOBAL_MEM_SIZE as memory estimate for OpenCL --fit (#22688) b9038 fl0rianr 2026-05-06 07:12:48 +02:00
  • 0445829c1d llama : enable layer input extraction gg/llama-extract-embeddings Georgi Gerganov 2026-05-05 20:50:20 +03:00
  • bbeb89d76c Hexagon: Process M-tail rows on HMX instead of HVX (#22724) b9037 Trivikram Reddy 2026-05-05 11:43:03 -05:00
  • ff806a110d opencl: refactor Adreno q4_0 (#22335) lhez 2026-05-05 09:38:57 -07:00
  • d5003b6e4d rpc : use graph uid instead of graph cache (#22701) Radoslav Gerganov 2026-05-05 13:47:13 +03:00
  • 2635ac76e8 common : fix missing-noreturn warnings when compiling with clang 21 (#22702) Adrien Gallouët 2026-05-05 12:16:25 +02:00
  • 70a8309114 sync : ggml b9033 Georgi Gerganov 2026-05-05 13:15:19 +03:00
  • c91faf997f ggml : bump version to 0.11.0 (ggml/1478) Georgi Gerganov 2026-05-05 13:14:32 +03:00
  • bf76ac77be common : only load backends when required (#22290) b9031 Adrien Gallouët 2026-05-05 09:23:50 +02:00
  • a09a00e502 vendor : update cpp-httplib to 0.43.3 (#22686) b9030 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-05 04:04:57 -03:00
  • f84632951a wip pr/18039-gg Georgi Gerganov 2026-04-25 18:27:15 +03:00
  • 4567954ab0 Merge branch 'master' into pr/18039 Georgi Gerganov 2026-05-05 09:28:39 +03:00
  • 2bacb1eb77 server : validate --tools CLI argument against known tool names (#22538) b9029 Georgi Gerganov 2026-05-05 06:35:27 +03:00
  • d6e7b033a4 llama : add option to save memory in device buffers (#22679) b9028 Georgi Gerganov 2026-05-05 06:35:07 +03:00
  • fa595462ca graph : handle non-contiguous Q/K/V in mul_mat_aux (#22630) Sigbjørn Skjæret 2026-05-05 05:34:44 +02:00
  • a817a22bc6 ggml : implement fast walsh-hadamard transform for kv rotation (#21352) (#22631) b9026 Ismail 2026-05-05 04:05:05 +02:00
  • eff06702b2 kleidiai : update to v1.24.0 and use release archive (#22549) b9025 Charles Xu 2026-05-04 21:13:31 +02:00
  • 069be0ae22 Merge branch 'master' into pr/18039 Georgi Gerganov 2026-05-04 21:42:27 +03:00
  • e77056f9b2 CUDA: use fastdiv for batch index split in get_rows (#22650) leonardHONG 2026-05-04 22:24:05 +08:00
  • 935a340292 server: implement /models?reload=1 (#21848) b9023 Xuan-Son Nguyen 2026-05-04 16:23:26 +02:00
  • d8794eecd5 examples: refactor diffusion generation (#22590) b9022 Shakhnazar Sailaukan 2026-05-04 16:19:30 +04:00