Commit Graph

  • 380b4c984e common: support negated args (#17919) b7376 Xuan-Son Nguyen 2025-12-12 23:58:53 +01:00
  • e39a2ce66d clip: move model cgraphs into their own files (#17965) b7375 Xuan-Son Nguyen 2025-12-12 21:14:48 +01:00
  • a8c7f33d79 ci : change the cann version and the container pull method (#17953) b7374 jiahao su 2025-12-13 03:43:00 +08:00
  • b7f5f46e03 docker : include legacy llama-completion binary (#17964) Sigbjørn Skjæret 2025-12-12 19:39:23 +01:00
  • 482211438d CUDA: fix overflow in MMA kernel without stream-k (#17939) b7372 Johannes Gäßler 2025-12-12 17:43:58 +01:00
  • 7bed317f53 models : fix the attn_factor for mistral3 graphs + improve consistency (#17945) b7371 Georgi Gerganov 2025-12-12 17:12:40 +02:00
  • dcb7d17758 cann : fix ops broken by circular padding guard (#17825) b7370 Sigbjørn Skjæret 2025-12-12 15:49:27 +01:00
  • 51604435e8 ggml-cpu : fix RISC-V Q4_0 repack select and RVV feature reporting (#17951) b7369 ixgbe 2025-12-12 22:26:03 +08:00
  • 17158965ac mtmd: explicitly forbidden inclusion of private header and libcommon (#17946) b7368 Xuan-Son Nguyen 2025-12-12 15:16:06 +01:00
  • 12280ae905 webui: Fix parsing non-LaTeX occurrencies of \( or \) (#17810) Aleksander Grygier 2025-12-12 15:13:36 +01:00
  • 07b809bbc0 Apply suggestions from code review Oliver Simons 2025-12-12 15:07:28 +01:00
  • 54a0fee4b7 arg: add -mm and -mmu as short form of --mmproj and --mmproj-url (#17958) b7366 Xuan-Son Nguyen 2025-12-12 14:06:06 +01:00
  • dada4c846d model-conversion : remove max diff check in compare-logits [no ci] (#17954) Daniel Bevenius 2025-12-12 13:25:16 +01:00
  • b8ee22cfde common : add minimalist multi-thread progress bar (#17602) b7364 Adrien Gallouët 2025-12-12 12:44:35 +01:00
  • 2eaa2c65cb cmake: link ws2_32 for MinGW/w64devkit builds in cpp-httplib (#17949) b7363 Gustavo Rocha Dias 2025-12-12 08:02:28 -03:00
  • c33a58bced HIP: enable mmf for RDNA3 (#17879) b7362 yulo 2025-12-12 18:34:33 +08:00
  • a81a569577 Add a search field on model selector / improve mobile display (#17765) b7361 Pascal 2025-12-11 18:21:21 +01:00
  • 53ecd4fdb9 SOLVE_TRI extension to more dimensions (#17793) b7360 Piotr Wilkin (ilintar) 2025-12-11 17:20:43 +01:00
  • 4d10b78e23 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-11 14:42:56 +02:00
  • c6f6e4f96a ggml-alloc : fix reuse-parent logic for misaligned sizes (#17884) Georgi Gerganov 2025-12-11 14:30:10 +02:00
  • d9f8f60618 batch : fix sequence id ownership (#17915) b7358 Georgi Gerganov 2025-12-11 14:29:47 +02:00
  • ab65b47a52 tests : run backend sampler tests always on the CPU Georgi Gerganov 2025-12-11 14:12:35 +02:00
  • 74b112e3e7 sampling : fix greedy Georgi Gerganov 2025-12-11 13:37:02 +02:00
  • 8544aba37f sampling : generic ggml op support detection Georgi Gerganov 2025-12-11 13:19:43 +02:00
  • d5d16651a8 cont : fix build Georgi Gerganov 2025-12-11 11:27:47 +02:00
  • 54e9054017 sampling : optimize logit_bias sampler Georgi Gerganov 2025-12-11 11:14:39 +02:00
  • e4ae383317 docs: use port 8080 in Docker examples (#17903) Yuichiro Utsumi 2025-12-11 18:12:07 +09:00
  • 56720f8f01 Merge pull request #1 from JohannesGaessler/gpu-sampling-hip Daniel Bevenius 2025-12-11 09:20:55 +01:00
  • 34ce48d97a ggml-hexagon: fix rope failure at test-backend-ops (#17565) b7356 nullname 2025-12-11 06:45:43 +08:00
  • 45e350e3d3 ci: fix riscv64-native build (#17916) Sigbjørn Skjæret 2025-12-10 23:24:31 +01:00
  • c6b2c9310c mtmd: some small clean up (#17909) b7354 Xuan-Son Nguyen 2025-12-10 22:20:06 +01:00
  • 34a6d86982 cli: enable jinja by default (#17911) b7353 Xuan-Son Nguyen 2025-12-10 22:19:42 +01:00
  • 42cf5c01e5 HIP/MUSA: fix build for backend sampling Johannes Gäßler 2025-12-10 22:00:46 +01:00
  • f32ca51bfe server: add presets (config) when using multiple models (#17859) b7352 Pascal 2025-12-10 22:18:21 +01:00
  • e1f4921980 Fix race conditions in threadpool when dealing with dynamic/frequent n_threads changes (#17748) b7351 Max Krasnyansky 2025-12-10 12:32:23 -08:00
  • 4dff236a52 ggml : remove GGML_KQ_MASK_PAD constant (#17910) b7350 Georgi Gerganov 2025-12-10 20:53:16 +02:00
  • 804e7e3795 graph : respect sampler order for graph reuse Georgi Gerganov 2025-12-10 20:40:15 +02:00
  • 44d5c4b592 batch : fix sequence id ownage Georgi Gerganov 2025-12-10 20:35:58 +02:00
  • 4df6e859e9 cuda : add missing support check for xielu (#17895) b7349 Sigbjørn Skjæret 2025-12-10 16:16:20 +01:00
  • 38882247d3 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-10 17:07:21 +02:00
  • 6c2131773c cli: new CLI experience (#17824) b7348 Xuan-Son Nguyen 2025-12-10 15:28:59 +01:00
  • b677721819 model : Qwen3-Next-80B-A3B has 48 layers (#17898) b7347 Eric Zhang 2025-12-10 22:22:40 +08:00
  • 2d2e1030e3 docs : update opencl ops (#17904) lhez 2025-12-10 06:20:00 -08:00
  • c02654eb7d graph : make the compute graph constant with respect to active samplers Georgi Gerganov 2025-12-10 15:54:33 +02:00
  • 0ecee8be37 server : reconnect the backend_sampling setting in the WebUI Georgi Gerganov 2025-12-10 15:42:02 +02:00
  • 81cb5783c8 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-10 13:41:32 +02:00
  • 17f7f4baad CUDA: fix unpadded strides in MMA FA kernel (#17891) b7345 Johannes Gäßler 2025-12-10 12:39:56 +01:00
  • 9e79b0116e convert: allow using quantized Mistral weight (#17889) Xuan-Son Nguyen 2025-12-10 10:26:22 +01:00
  • 2e9eab80c2 fix softmax for iGPU (#17838) b7343 Neo Zhang Jianyu 2025-12-10 16:59:57 +08:00
  • 2fbe3b7bb7 common : add parser for ministral/mistral large 3/devstral 2 (#17713) b7342 Aldehir Rojas 2025-12-09 17:31:04 -06:00
  • 63391852b0 docs : update cpu and cuda ops (#17890) Sigbjørn Skjæret 2025-12-09 23:31:29 +01:00
  • 086a63e3a5 metal: SSM kernel improvements (#17876) b7340 Gabe Goodhart 2025-12-09 12:30:02 -07:00
  • b63509262a Add DIAG for CUDA (#17873) b7339 Piotr Wilkin (ilintar) 2025-12-09 20:28:57 +01:00
  • 48f47565a7 docs: clarify that CPU support should be first (#17886) Johannes Gäßler 2025-12-09 20:10:36 +01:00
  • 6dc6614bf0 Disable cooperative groups for musa Oliver Simons 2025-12-09 19:09:52 +01:00
  • a25fda5290 Fix launch logic when supports_cooperative_launch=false Oliver Simons 2025-12-09 19:03:47 +01:00
  • 3f0594ad0b Try fixing HIP build errors by adding corresponding #defines Oliver Simons 2025-12-09 18:51:28 +01:00
  • 02e409a5be ggml : Provide macos-specific backtrace printing to avoid terminal death (#17869) b7337 Gabe Goodhart 2025-12-09 09:29:07 -07:00
  • 34b407b41c sampling : use host buffer type for inputs Georgi Gerganov 2025-12-09 17:53:17 +02:00
  • 92ff767918 llama : require backend samplers to be of type llama_sampler_chain Georgi Gerganov 2025-12-09 15:38:37 +02:00
  • 6b82eb7883 metal : print node names for debugging (#17882) b7336 Georgi Gerganov 2025-12-09 15:25:49 +02:00
  • 07003f1ffb Fix compiler warnings by casting const away Oliver Simons 2025-12-09 13:05:43 +01:00
  • 886c3668b5 Add TODOs to and adjust heuristics of row-wise soft_max in CUDA Oliver Simons 2025-12-09 12:55:30 +01:00
  • a84dfd3e10 CUDA: Add Cooperative-Groups-based parallelization of ncols in softmax Oliver Simons 2025-12-08 16:48:52 +01:00
  • 86a3f0fad8 ggml : allow fill node alloc inplace (#17870) b7335 Sigbjørn Skjæret 2025-12-09 12:23:47 +01:00
  • 63908b631a cmake: fix Mach-O current version number (#17877) b7334 Rhys-T 2025-12-09 06:17:41 -05:00
  • 42b12b5608 model : nit, DeepSeek V1 MoE is 16B and GigaChat is 20B (#12652) b7333 Sigbjørn Skjæret 2025-12-09 12:15:06 +01:00
  • 4e842d5120 console: allow using arrow left/right, home/end keys and history mode (#17836) b7332 Xuan-Son Nguyen 2025-12-09 11:53:59 +01:00
  • 7ab6f51b97 Revert "ggml : remove redundant src in ggml_cast" Georgi Gerganov 2025-12-09 12:52:59 +02:00
  • ca709e427b CANN: add support for partial RoPE and Vision mode (#17543) b7331 Chenguang Li 2025-12-09 17:53:23 +08:00
  • 2a615b27e4 ggml : remove redundant src in ggml_cast gg/cast-remove-src Georgi Gerganov 2025-12-09 10:58:06 +02:00
  • 9f6681c3a4 ggml-alloc : fix reuse-parent logic for misaligned sizes Georgi Gerganov 2025-12-09 11:13:44 +02:00
  • 62d1b0082d ggml : remove redundant src in ggml_cast Georgi Gerganov 2025-12-09 10:58:06 +02:00
  • d62b5804e1 metal : print node names for debugging Georgi Gerganov 2025-12-09 10:55:54 +02:00
  • 560ac16f7d server : handle unsupported cases Georgi Gerganov 2025-12-09 10:55:11 +02:00
  • 0cdce38a97 CUDA: fix FP16 overflow in tile FA kernel (#17875) b7330 Johannes Gäßler 2025-12-09 09:34:02 +01:00
  • e39502e74b llama : add token matching support to llama-grammar (#17816) b7329 Aldehir Rojas 2025-12-09 00:32:57 -06:00
  • 1d2a1ab73d model : support Rnj-1 (#17811) b7328 philip-essential 2025-12-08 19:49:03 -08:00
  • c8554b66e0 graph : use fill instead of scale_bias in grouped expert selection (#17867) b7327 Sigbjørn Skjæret 2025-12-08 21:29:59 +01:00
  • f3beb22b17 sampling : handle n_probs case Georgi Gerganov 2025-12-08 21:30:10 +02:00
  • 2fa51c19b0 model-conversion : add token ids to prompt token output [no ci] (#17863) Daniel Bevenius 2025-12-08 17:13:08 +01:00
  • 951520ddb0 server: delegate result_state creation to server_task (#17835) b7325 Xuan-Son Nguyen 2025-12-08 17:04:38 +01:00
  • 6d38db5dfe Merge branch 'master' into HEAD Georgi Gerganov 2025-12-08 17:55:24 +02:00
  • 68522c678d ci : support bfloat16 SYCL release package (#17855) b7324 Neo Zhang 2025-12-08 22:09:39 +08:00
  • f896d2c34f server: improve speed of speculative decoding (#17808) Xuan-Son Nguyen 2025-12-08 14:35:28 +01:00
  • e4e9c4329c Make graph_max_nodes vary by ubatch size (#17794) Piotr Wilkin (ilintar) 2025-12-08 14:32:41 +01:00
  • 636fc17a37 Fix Kimi-K2 tool-call parsing issues (#17376) hksdpc255 2025-12-09 00:32:04 +11:00
  • 51e0c2d917 cuda : add FILL op support (#17851) Jay Zenith 2025-12-08 05:10:12 -08:00
  • 37a4f63244 server : add development documentation (#17760) Xuan-Son Nguyen 2025-12-08 13:54:58 +01:00
  • 2bc96931d2 server : make cache_reuse configurable per request (#17858) b7318 Georgi Gerganov 2025-12-08 12:43:12 +02:00
  • 5814b4dce1 cuda: optimize SOLVE_TRI using registers and FMAF (#17703) b7317 wsbagnsv1 2025-12-08 10:41:08 +01:00
  • 79d61896d3 ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (#17784) b7316 ixgbe 2025-12-08 16:41:34 +08:00
  • 4d3726278b model: add llama 4 scaling for mistral-large (deepseek arch) (#17744) b7315 Xuan-Son Nguyen 2025-12-07 22:29:54 +01:00
  • 08f9d3cc1d Vulkan: improve mul_mat_vec_iq1_m (#16907) b7314 lovedheart 2025-12-07 18:40:42 +01:00
  • 72e3681073 sampling : fix top-p Georgi Gerganov 2025-12-07 17:11:50 +02:00
  • 42125f0e10 tests : check temp back to 0.0 Georgi Gerganov 2025-12-07 15:54:49 +02:00
  • 8ef5f900db cont : fixes Georgi Gerganov 2025-12-07 12:52:25 +02:00
  • 0a540f9abd ci : add windows-cuda 13.1 release (#17839) b7313 Sigbjørn Skjæret 2025-12-07 14:02:04 +01:00
  • 22577583a3 common : change --color to accept on/off/auto, default to auto (#17827) b7312 Sigbjørn Skjæret 2025-12-07 03:43:50 +01:00
  • d9e03db1e7 sycl: add missing BF16 conversion support for Intel oneAPI (#17780) b7311 Law Po Ying 2025-12-07 09:18:18 +08:00