Commit Graph

  • 4dca015b7e vulkan: Replace 16-bit unpack8 calls to work around legacy Windows AMD driver bug (#17285) b7070 Ruben Ortlam 2025-11-15 15:18:58 +01:00
  • 9a8860cf5d convert : use all parts in safetensors index (#17286) Sigbjørn Skjæret 2025-11-15 14:12:39 +01:00
  • 9d3ef4809f convert : set expert gating func in base class (#17279) Sigbjørn Skjæret 2025-11-15 14:06:24 +01:00
  • c7b7db0445 mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli (#17277) b7067 Ankur Verma 2025-11-15 03:41:16 -08:00
  • 1568d13c2c vulkan: implement ABS and NEG (#17245) b7066 Giuseppe Scrivano 2025-11-15 12:00:29 +01:00
  • 439342ea0b vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths (#17244) b7065 Jeff Bolz 2025-11-15 04:56:15 -06:00
  • 234ae7d7bd vulkan: skip all-negative-inf blocks in FA (#17186) b7064 Jeff Bolz 2025-11-15 03:37:25 -06:00
  • 38eaf32af1 vulkan: change graph_compute to be async and enable get_tensor_async (#17158) b7063 Jeff Bolz 2025-11-15 02:06:41 -06:00
  • 9b17d74ab7 mtmd: add mtmd_log_set (#17268) b7062 Xuan-Son Nguyen 2025-11-14 15:56:19 +01:00
  • e1fcf8b09b model : add AfmoeForCausalLM support (#16477) b7061 Bartowski 2025-11-14 07:54:10 -05:00
  • 6cd0cf72ce fix : Dangling pointer for non-empty trigger words in lazy grammar construction (#17048) b7060 Marek Hradil jr. 2025-11-14 13:35:26 +01:00
  • d396b43748 server : fix "can batch with" bug (#17263) b7059 Georgi Gerganov 2025-11-14 14:03:45 +02:00
  • 45c6ef7307 metal : support argsort for ne00 > 1024 (#17247) b7058 Georgi Gerganov 2025-11-14 09:36:06 +02:00
  • 2606b0adab metal : make the FA extra sizes consistent (#17143) b7057 Georgi Gerganov 2025-11-14 09:13:34 +02:00
  • 307772fcda readme : add RVV,ZVFH,ZFH,ZICBOP support for RISC-V (#17259) ixgbe 2025-11-14 15:12:56 +08:00
  • f1bad23f88 Better UX for handling multiple attachments in WebUI (#17246) b7055 Aleksander Grygier 2025-11-14 01:19:08 +01:00
  • becc4816dd ggml-cpu: handle 3d tensors in repack mat_mul (#17241) b7054 Alberto Cabrera Pérez 2025-11-13 20:53:00 +00:00
  • c4abcb2457 server: fixing naming conflict res_error (#17243) b7053 Xuan-Son Nguyen 2025-11-13 20:53:47 +01:00
  • 389ac78b26 ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (#17063) b7052 Piotr Wilkin (ilintar) 2025-11-13 19:54:47 +01:00
  • a19bd6f7ce vulkan: remove shell call from vulkan-shaders-gen tool, revert file check (#17219) b7051 Ruben Ortlam 2025-11-13 14:51:21 +01:00
  • dd091e52f8 sched : fix reserve ignoring user tensor assignments (#17232) b7050 Diego Devesa 2025-11-13 04:14:02 -08:00
  • 1215dde7b0 ggml-cpu : add RISC-V vector intrinsic support for silu and cvar operations (#17227) b7049 ixgbe 2025-11-13 20:13:32 +08:00
  • 0cfb19166b metal: accelerated conv2d (#17175) b7048 bagheera 2025-11-13 05:32:44 -06:00
  • 2776db6c81 Revert "ggml-cpu: handle 3d tensors in repack mat_mul (#17030)" (#17233) b7047 Georgi Gerganov 2025-11-13 12:59:37 +02:00
  • 879dec341a ggml-cpu : use template for argsort (#17222) b7046 Diego Devesa 2025-11-13 00:59:05 -08:00
  • 97d5117217 CANN: Add cross_entropy_loss op support (#16886) b7045 TecJesh 2025-11-13 09:39:51 +08:00
  • a90eb94ca9 CUDA: fuse rope + set_rows (#16884) b7044 Aman Gupta 2025-11-13 08:50:01 +08:00
  • 07751f8d44 update SYCL support OPs (#17208) Neo Zhang Jianyu 2025-11-13 08:42:23 +08:00
  • ffb6f3d921 vocab : correct bounds check for UGM XCDA array access (#17215) b7042 o7si 2025-11-13 06:41:02 +08:00
  • 5d6838b74f CUDA: static assert to prevent misuse of memcpy_1 (#17198) b7041 Johannes Gäßler 2025-11-12 23:13:55 +01:00
  • 92bb442ad9 docker : preserve .so symlinks for docker container builds (#17214) Mike Abbott 2025-11-12 12:33:55 -07:00
  • 374fe09cdd ggml : use std::sort in ggml_argsort CPU implementation (#17211) b7039 Georgi Gerganov 2025-11-12 20:43:38 +02:00
  • 8e878f0cb4 Update packages + upgrade Storybook to v10 (#17201) Aleksander Grygier 2025-11-12 19:01:48 +01:00
  • 00c94083b3 server: (refactor) implement generator-based API for task results (#17174) b7037 Xuan-Son Nguyen 2025-11-12 18:50:52 +01:00
  • 017eceed61 ci: add check vendor job (#17179) Xuan-Son Nguyen 2025-11-12 14:56:02 +01:00
  • ee8dd5c658 server: move res_error/res_ok to static function (#17167) b7035 Xuan-Son Nguyen 2025-11-12 14:17:24 +01:00
  • 1c398dc9ec ggml-cpu: handle 3d tensors in repack mat_mul (#17030) b7034 Alberto Cabrera Pérez 2025-11-12 12:52:19 +00:00
  • 52cf111b31 cmake : cleanup (#17199) b7033 Adrien Gallouët 2025-11-12 13:48:30 +01:00
  • 78010a0d52 cmake : move OpenSSL linking to vendor/cpp-httplib (#17177) b7032 Adrien Gallouët 2025-11-12 12:32:50 +01:00
  • 655cddd174 CANN: Add L2_NORM op support (#16856) b7031 TecJesh 2025-11-12 15:11:42 +08:00
  • 5da7664960 [SYCL]fix ci crash about SSM_CONV (#17169) b7030 Neo Zhang Jianyu 2025-11-12 14:44:29 +08:00
  • 23a46ce972 CANN: GGML_CANN_ACL_GRAPH works only USE_ACL_GRAPH enabled (#16861) Raul Torres 2025-11-12 06:37:52 +00:00
  • c273d75375 hexagon: various Op fixes (#17135) b7028 Max Krasnyansky 2025-11-11 15:25:04 -08:00
  • 7d019cff74 disable rms norm mul rope for chips with no fp16 rte (#17134) b7027 Eve 2025-11-11 18:53:30 +00:00
  • 3fe36c3238 ci: add Arm-hosted Graviton4 runner (#17021) sudhiarm 2025-11-11 15:58:05 +00:00
  • 1d45b4228f vendor: split httplib to cpp/h files (#17150) b7025 Xuan-Son Nguyen 2025-11-11 13:32:58 +01:00
  • ca4844062b ggml-cpu : add RISC-V RVV (Zvfh) optimization for FP16 to FP32 conversion (#17161) b7024 ixgbe 2025-11-11 19:41:51 +08:00
  • 73460f6278 ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805) b7023 duduta 2025-11-11 13:33:24 +02:00
  • 8c583242ad kleidiai: add optimized per-channel kernels for Q8_0 (#16993) b7022 Charles Xu 2025-11-11 12:20:31 +01:00
  • 4a5b8aff40 cmake : add version to all shared object files (#17091) b7021 Mike Abbott 2025-11-11 04:19:50 -07:00
  • d2d626938a Install rpc-server when GGML_RPC is ON. (#17149) b7020 Nicolas B. Pierron 2025-11-11 11:53:59 +01:00
  • 2fc392ce35 convert : register UMT5Model architecture for T5 conversion (#17160) levkropp 2025-11-11 03:38:30 -05:00
  • ece0f5c177 opencl: add fastdiv and use it in set_rows, ported from cuda (#17090) b7018 lhez 2025-11-10 15:00:13 -08:00
  • 7bef684118 models : move build_inp_out_ids outside loop (#17151) b7017 Sigbjørn Skjæret 2025-11-10 22:55:30 +01:00
  • 395e286bc9 cpu: skip NOPs to avoid barriers (#17133) b7016 Max Krasnyansky 2025-11-10 12:44:49 -08:00
  • 13730c183b metal : cap threadgroups size of set_rows (#17146) b7015 Georgi Gerganov 2025-11-10 21:33:35 +02:00
  • 967eb4b2bf ggml-cpu : inspect -march and -mcpu to found the CPU (#16333) b7014 Adrien Gallouët 2025-11-10 20:03:36 +01:00
  • f117be185e vulkan: check glslc executable string (#17144) b7013 Ruben Ortlam 2025-11-10 16:59:26 +01:00
  • 85234a4b3a vulkan: fix validation issue introduced by #16868 (#17145) b7012 Ruben Ortlam 2025-11-10 16:59:10 +01:00
  • 0c74f32632 memory: Hybrid context shift (#17009) b7011 Gabe Goodhart 2025-11-10 08:14:23 -07:00
  • e6dbc81569 metal : cap threadgroups size of set_rows gg/metal-set-rows-threads Georgi Gerganov 2025-11-10 16:17:09 +02:00
  • c27efd2bd1 metal : enable tensor API for A19 (#17087) b7010 Georgi Gerganov 2025-11-10 15:38:42 +02:00
  • df70bedda7 arm64: add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_… (#15277) b7009 fj-y-saito 2025-11-10 22:12:59 +09:00
  • 3ad533689c ggml : remove KQ mask padding gg/fa-no-kq-pad Georgi Gerganov 2025-09-28 18:10:17 +03:00
  • f914544b16 batched-bench : add "separate text gen" mode (#17103) b7008 Georgi Gerganov 2025-11-10 12:59:29 +02:00
  • 4b13a684c5 mtmd: fix patch_size initialized to random value in audio models (#17128) b7007 Xuan-Son Nguyen 2025-11-10 11:41:05 +01:00
  • 9898b57cbe editorconfig : ignore benches/ (#17140) Georgi Gerganov 2025-11-10 12:17:19 +02:00
  • 1032256ec9 cuda/vulkan : bicubic interpolation (#17022) b7005 Acly 2025-11-10 10:19:39 +01:00
  • 15274c0c50 benches : add eval results (#17139) Georgi Gerganov 2025-11-10 10:44:10 +02:00
  • b8595b16e6 mtmd : fix embedding size for image input (#17123) b7003 Georgi Gerganov 2025-11-09 18:31:02 +02:00
  • 392e09a608 vulkan: fix memory allocations (#17122) b7002 Ruben Ortlam 2025-11-09 16:14:41 +01:00
  • 802cef44bf convert : parse safetensors directly (#15667) compilade 2025-11-09 09:49:40 -05:00
  • 1c07c0c68c convert : handle compressed-tensors quant method (#17069) compilade 2025-11-09 09:45:50 -05:00
  • cb1adf8851 server : handle failures to restore host cache (#17078) b6999 Georgi Gerganov 2025-11-09 14:27:05 +02:00
  • ef1d826997 benches : add folder with benchmarks (#16931) Georgi Gerganov 2025-11-09 12:53:29 +02:00
  • 86fde91e62 Switch to using Ubuntu 25.10 vulkan/mesa (#16497) Eric Curtin 2025-11-09 09:25:38 +00:00
  • 7f3e9d339c vulkan: iGPU memory reporting fix (#17110) b6996 Ruben Ortlam 2025-11-09 09:54:47 +01:00
  • 8a3519b708 vulkan: fix mmq out of bounds reads (#17108) b6995 Ruben Ortlam 2025-11-09 09:52:57 +01:00
  • 80a6cf6347 vulkan: fuse mul_mat_id + mul (#17095) b6994 Jeff Bolz 2025-11-09 02:48:42 -06:00
  • 0750a59903 metal : retain src and dst buffers during async ops (#17101) b6993 Georgi Gerganov 2025-11-09 08:28:51 +02:00
  • aa3b7a90b4 arg: add --cache-list argument to list cached models (#17073) b6992 Xuan-Son Nguyen 2025-11-08 21:54:14 +01:00
  • 333f2595a3 webui: fix keyboard shortcuts for new chat & edit chat title (#17007) chansikpark 2025-11-08 14:52:35 -05:00
  • 53d7d21e61 vulkan: Use spec constants for conv2d s/d/p and kernel W/H (#16978) b6990 Jeff Bolz 2025-11-08 13:24:29 -06:00
  • eeee367de5 server: fix correct time_ms calculation in prompt_progress (#17093) b6989 Aidan 2025-11-08 13:12:11 +00:00
  • 64fe17fbb8 Revert "CUDA: add expert reduce kernel (#16857)" (#17100) b6988 Aman Gupta 2025-11-08 21:05:19 +08:00
  • c1b187688d CUDA: skip fusion for repeating adds in bias (#17080) b6987 Aman Gupta 2025-11-08 16:58:05 +08:00
  • b8a5cfd11a vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp (#16636) b6986 SavicStefan 2025-11-08 09:28:22 +01:00
  • 08416ebe7f ggml: disable vxe for cross-compilation by default (#16966) b6985 Aleksei Nikiforov 2025-11-08 09:00:20 +01:00
  • b4e335d8dc vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977) b6984 Jeff Bolz 2025-11-08 01:52:15 -06:00
  • d6fe40fa00 vulkan: Fix test-thread-safety crashes (#17024) b6983 Jeff Bolz 2025-11-08 01:39:45 -06:00
  • e14e842e87 CUDA: fix MMQ stream-k fixup ne1 indices (#17089) b6982 Johannes Gäßler 2025-11-08 08:26:18 +01:00
  • 647b960bd8 ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031) b6981 Reese Levine 2025-11-07 19:27:20 -08:00
  • 299f5d782c CUDA: properly handle nb00=nb02 case for cpy (#17081) b6980 bssrdf 2025-11-07 17:41:58 -05:00
  • ac76d36201 vulkan : refactor buffer handling in vk_op_f32 (#16840) b6979 Acly 2025-11-07 21:08:50 +01:00
  • 6515610506 CUDA: fix should_use_mmvf for ne11 == 1 (#17085) b6978 Johannes Gäßler 2025-11-07 20:53:14 +01:00
  • 7956bb4d7f bench : cache the llama_context state at computed depth (#16944) b6977 Georgi Gerganov 2025-11-07 21:23:11 +02:00
  • 9008027aa3 hparams : add n_embd_inp() to support extended embed (#16928) b6976 Sigbjørn Skjæret 2025-11-07 19:27:58 +01:00
  • 16bcc1259d kv-cache : pad the cache size to 256 for performance (#17046) b6975 Georgi Gerganov 2025-11-07 20:03:25 +02:00
  • 9eb9a1331d Revert "ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)" (#17084) b6974 Adrien Gallouët 2025-11-07 17:34:05 +01:00
  • 7c23f3f0d4 ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239) b6973 iron 2025-11-08 00:18:14 +08:00