Commit Graph

  • c1625620f6 sampling : return early if backend sampling is disabled Daniel Bevenius 2025-11-21 08:44:25 +01:00
  • 21d31e0810 ggml-hexagon: fix swiglu failure at test-backend-ops (#17344) b7122 nullname 2025-11-21 07:45:05 +08:00
  • dd0f321941 readme : add Unsloth exporting to GGUF in tools (#17411) Daniel Han 2025-11-20 11:07:36 -08:00
  • 054a45c3d3 grammar: fix regression caused by #17381 (#17412) b7120 Xuan-Son Nguyen 2025-11-20 18:35:10 +01:00
  • c0b9903a1a more readable xsn/fix_grammar_1 Xuan Son Nguyen 2025-11-20 17:45:37 +01:00
  • 3b195d301a grammar: fix regression caused by #17381 Xuan Son Nguyen 2025-11-20 17:31:55 +01:00
  • 6cdda87baf ci : disable op offload in some tests sl/realloc-error-cont Georgi Gerganov 2025-11-20 17:16:50 +02:00
  • 0d28b16bdc sampling : introduce sampling_info struct Daniel Bevenius 2025-11-20 14:31:37 +01:00
  • 4c91f2633f Improved file naming & structure for UI components (#17405) Aleksander Grygier 2025-11-20 14:07:31 +01:00
  • 92c0b387a9 grammar : fix integer overflow (#17381) b7118 Piotr Wilkin (ilintar) 2025-11-20 13:47:04 +01:00
  • 2286a360ff sync : ggml b7117 Georgi Gerganov 2025-11-20 14:09:48 +02:00
  • 1d321e592b metal : fix compile on macos 11 (whisper/3533) YangLe 2025-11-20 19:54:54 +08:00
  • 196f5083ef common : more accurate sampling timing (#17382) Georgi Gerganov 2025-11-20 13:40:10 +02:00
  • 5088b435d4 convert : fix TypeError when loading base model remotely in convert_lora_to_gguf (#17385) o7si 2025-11-20 19:30:12 +08:00
  • 845f200b28 ggml : Fix transposed SOLVE_TRI result (#17323) b7113 Piotr Wilkin (ilintar) 2025-11-20 11:58:21 +01:00
  • a7784a8b1d DGX Spark: UMA support (#17368) b7112 Scott Fudally 2025-11-20 02:32:02 -08:00
  • 79bb743512 ggml : remove useless and error-prone variadic macros (#17399) b7111 Adrien Gallouët 2025-11-20 11:18:27 +01:00
  • 3ae282a06f kleidiai: fix zero-size array declaration (#17240) b7110 sudhiarm 2025-11-20 09:45:49 +00:00
  • ed4345bdd9 squash! common : fix regression caused by extra memory allocations during sampling Daniel Bevenius 2025-11-20 07:56:33 +01:00
  • 5be353ec4a ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling (#17314) b7109 ixgbe 2025-11-20 14:09:18 +08:00
  • 0c660e7390 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-11-20 06:57:24 +01:00
  • 7d77f07325 vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, TRUNC (#17319) b7108 Giuseppe Scrivano 2025-11-19 17:29:45 +01:00
  • 1fa4551af0 vulkan: support larger argsort (#17313) b7107 Jeff Bolz 2025-11-19 10:25:50 -06:00
  • 2eba631b81 vulkan: Add copy_transpose shader (#17371) b7106 Jeff Bolz 2025-11-19 09:50:43 -06:00
  • 18ed4d8f96 squash! sampling : simplify backend sampling logic decode Daniel Bevenius 2025-11-19 15:10:15 +01:00
  • 99c53d6558 webui: Add a "Continue" Action for Assistant Message (#16971) Aleksander Grygier 2025-11-19 14:39:50 +01:00
  • 38f408c253 common : fix regression caused by extra memory allocations during sampling Georgi Gerganov 2025-11-19 13:43:29 +02:00
  • 07b0e7a5ac convert : use self.block_count everywhere instead of reading hparams (#17359) Sigbjørn Skjæret 2025-11-19 11:52:38 +01:00
  • d74eb61aa7 squash! sampling : simplify backend sampling logic decode Daniel Bevenius 2025-11-19 11:29:26 +01:00
  • fd7353d5eb cuda: fix rope fusion for gemma3 (#17378) b7103 Aman Gupta 2025-11-19 18:25:05 +08:00
  • 6fd4f95367 Fix too relaxed check on CUDA "fast copy" (can_be_transposed) condition (#17332) b7102 Piotr Wilkin (ilintar) 2025-11-19 10:36:33 +01:00
  • 7e98ebcc6b sampling : simplify backend sampling logic decode Daniel Bevenius 2025-11-19 09:31:33 +01:00
  • e4838046f3 llama : update worst-case graph for unified cache Georgi Gerganov 2025-11-19 09:44:04 +02:00
  • 980b7cd17e vulkan: force full subgroups for flash attention to fix intel subgroup crash (#17356) b7101 Ruben Ortlam 2025-11-19 08:46:26 +01:00
  • c49daff5ba ggml-cpu: Don't pass -mpowerpc64 when -mcpu already implies it (#17308) b7100 Jeremy Rand 2025-11-19 06:19:00 +00:00
  • 51fee29822 sampling : always populate logits for sampled probs Daniel Bevenius 2025-11-19 07:14:11 +01:00
  • 0da7e7dccc sampling : remove version from sampler chain Daniel Bevenius 2025-11-19 06:59:03 +01:00
  • 10e9780154 chat: fix int overflow, prevent size calculation in float/double (#17357) b7099 Xuan-Son Nguyen 2025-11-18 19:11:53 +01:00
  • a045492088 vocab : call reserve() for building plamo-2-translate suffix (#17343) Haiyue Wang 2025-11-19 01:58:22 +08:00
  • 1920345c3b common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) b7097 hksdpc255 2025-11-19 04:54:15 +11:00
  • 26be108be8 CUDA: Optimize argsort for gpu-based token sampling Oliver Simons 2025-11-18 18:17:44 +01:00
  • 561a3e2788 ci : change the openEuler-310p image to fix release (#17361) b7096 jiahao su 2025-11-19 01:10:23 +08:00
  • 625010d42d move enum stop_type to server-task Xuan Son Nguyen 2025-11-18 17:28:21 +01:00
  • 311c1a347f sampling : ensure at most one output token per seq Daniel Bevenius 2025-11-18 16:01:54 +01:00
  • f40a2e5f11 gitignore : be more specific about ignored stuff (#17354) Georgi Gerganov 2025-11-18 16:44:53 +02:00
  • 82957a90f2 sampling : always expose sampled_ids Daniel Bevenius 2025-11-18 14:54:49 +01:00
  • ca993bad51 rm redundant includes Xuan Son Nguyen 2025-11-18 15:01:17 +01:00
  • 3b7946034c add server-queue Xuan Son Nguyen 2025-11-18 14:24:46 +01:00
  • e1a756e934 add server-task, server-common Xuan Son Nguyen 2025-11-18 14:15:14 +01:00
  • 4b52e59903 graph : do not include llama-model.h Georgi Gerganov 2025-11-18 13:53:25 +02:00
  • bc4064cfea CANN: fix acl_tensor_ptr usage in ASCEND_310P ROPE (#17347) Chenguang Li 2025-11-18 16:41:52 +08:00
  • 97cb3fd5ae fix: resolve undefined variable 'svr' compilation error (#17348) o7si 2025-11-18 16:10:47 +08:00
  • ffa277a54c CANN: Add openEuler-cann in build and release (#17192) jiahao su 2025-11-18 16:08:55 +08:00
  • da95bf2a85 vulkan: support noncontig i32 copy (#17328) b7091 Jeff Bolz 2025-11-18 00:41:24 -06:00
  • 71574f9273 sampling : enable all backend sampler tests Daniel Bevenius 2025-11-18 07:31:54 +01:00
  • 0de8878c96 server: split HTTP into its own interface (#17216) b7090 Xuan-Son Nguyen 2025-11-17 22:05:44 +01:00
  • 38e2c1b412 vulkan: add log RTE support to fix Nvidia CI (#17320) b7089 Ruben Ortlam 2025-11-17 21:37:49 +01:00
  • cb44fc84e8 cmake : fix ARM feature verification (#17170) b7088 Adrien Gallouët 2025-11-17 21:37:29 +01:00
  • 0710d5f0f8 ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched Enabled in ggml-ci for testing. slaren 2025-11-14 22:04:41 +01:00
  • 67d3b8e84d ggml : add initial cumsum implementation for CUDA Daniel Bevenius 2025-11-17 16:03:04 +01:00
  • a3eb847d24 webui : add backend sampling options Daniel Bevenius 2025-11-17 15:32:33 +01:00
  • f1f3e68511 server : add backend sampling options/configuration Daniel Bevenius 2025-11-17 15:31:30 +01:00
  • 9fe9a00a8a llama-cli : add backend sampler configuration Daniel Bevenius 2025-11-17 15:30:16 +01:00
  • 7884b0e0ac sampling : add support for backend sampling Daniel Bevenius 2025-11-17 15:19:34 +01:00
  • cb623de3fc ggml : add missing AVX512 feature checks (#17270) b7087 Adrien Gallouët 2025-11-17 12:12:00 +01:00
  • 7aaeedc098 metal : support I32 -> I32 copy (#17317) b7086 Georgi Gerganov 2025-11-17 11:52:00 +02:00
  • 3347e6d904 metal : faster argsort (#17315) b7085 Georgi Gerganov 2025-11-17 11:51:48 +02:00
  • 1a139644a8 metal : add cumsum (#17305) b7084 Georgi Gerganov 2025-11-17 11:51:13 +02:00
  • 2376b7758c CANN: Use smart pointers to manage ACL objects (#17238) b7083 hipudding 2025-11-17 08:43:59 +08:00
  • dbed61294a vulkan: add LOG operation support for F32 and F16 (#17183) b7082 Pavels Zaicenkovs 2025-11-16 22:50:09 +01:00
  • dba1cbceb3 tune for RDNA3 0cc4m/vulkan-mmq-bk-step-tuning 0cc4m 2025-11-16 20:21:22 +01:00
  • 94e2c4d2b3 fix warptile 0cc4m 2025-11-16 20:06:04 +01:00
  • 80deff3648 vulkan: fix MMQ quantize_y condition (#17301) b7081 Ruben Ortlam 2025-11-16 19:38:17 +01:00
  • c19b3c378c device tuning 0cc4m 2025-11-16 18:37:04 +00:00
  • 8b1c339bd2 ci : revert #16249 (#17303) Eve 2025-11-16 18:09:17 +00:00
  • 7e8eb9ba0a vulkan: allow MMQ bk_step tuning 0cc4m 2025-11-16 14:25:24 +01:00
  • 6c262ac39c release: fix duplicate libs, store symbolic links Aaron Teo 2025-11-16 19:51:53 +08:00
  • 416e7c7f47 metal : remove obosolete asserts (#17295) b7079 Georgi Gerganov 2025-11-16 09:50:26 +02:00
  • 5b2093becc server : handle context overflow during decode (#17267) b7078 Georgi Gerganov 2025-11-16 09:23:37 +02:00
  • 52e5d421f1 opencl: fix rms_norm_mul (#17250) b7077 lhez 2025-11-15 17:40:14 -08:00
  • 4db5641210 opencl: add kernel to handle mat mul in attention to improve encoding speed (#17181) b7076 shaofeiqi 2025-11-15 17:33:10 -08:00
  • 72bd7321a7 sycl : unify unary kernels with a generic implementation and enable wide operator support (#17213) b7075 shani-f 2025-11-16 01:52:42 +02:00
  • 22e1ce2f81 webui: Fix clickability around chat processing statistics UI (#17278) Aleksander Grygier 2025-11-15 22:41:41 +01:00
  • 1411d9275a webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI (#16618) Pascal 2025-11-15 21:09:32 +01:00
  • 662192e1dc convert : remove unnecessary chat template patching (#17289) Sigbjørn Skjæret 2025-11-15 20:58:59 +01:00
  • 24dc769f1b vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (#17287) b7071 Jeff Bolz 2025-11-15 12:54:23 -06:00
  • 4dca015b7e vulkan: Replace 16-bit unpack8 calls to work around legacy Windows AMD driver bug (#17285) b7070 Ruben Ortlam 2025-11-15 15:18:58 +01:00
  • 9a8860cf5d convert : use all parts in safetensors index (#17286) Sigbjørn Skjæret 2025-11-15 14:12:39 +01:00
  • 9d3ef4809f convert : set expert gating func in base class (#17279) Sigbjørn Skjæret 2025-11-15 14:06:24 +01:00
  • c7b7db0445 mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli (#17277) b7067 Ankur Verma 2025-11-15 03:41:16 -08:00
  • 1568d13c2c vulkan: implement ABS and NEG (#17245) b7066 Giuseppe Scrivano 2025-11-15 12:00:29 +01:00
  • 439342ea0b vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths (#17244) b7065 Jeff Bolz 2025-11-15 04:56:15 -06:00
  • 234ae7d7bd vulkan: skip all-negative-inf blocks in FA (#17186) b7064 Jeff Bolz 2025-11-15 03:37:25 -06:00
  • 38eaf32af1 vulkan: change graph_compute to be async and enable get_tensor_async (#17158) b7063 Jeff Bolz 2025-11-15 02:06:41 -06:00
  • 9b17d74ab7 mtmd: add mtmd_log_set (#17268) b7062 Xuan-Son Nguyen 2025-11-14 15:56:19 +01:00
  • e1fcf8b09b model : add AfmoeForCausalLM support (#16477) b7061 Bartowski 2025-11-14 07:54:10 -05:00
  • 6cd0cf72ce fix : Dangling pointer for non-empty trigger words in lazy grammar construction (#17048) b7060 Marek Hradil jr. 2025-11-14 13:35:26 +01:00
  • d396b43748 server : fix "can batch with" bug (#17263) b7059 Georgi Gerganov 2025-11-14 14:03:45 +02:00
  • 45c6ef7307 metal : support argsort for ne00 > 1024 (#17247) b7058 Georgi Gerganov 2025-11-14 09:36:06 +02:00
  • 2606b0adab metal : make the FA extra sizes consistent (#17143) b7057 Georgi Gerganov 2025-11-14 09:13:34 +02:00