Commit Graph

  • 238856ec8f ggml webgpu: shader library organization (#19530) b8091 Reese Levine 2026-02-18 07:51:02 -07:00
  • ea003229d3 Pre-MCP UI and architecture cleanup (#19689) Aleksander Grygier 2026-02-18 12:02:02 +01:00
  • d0061be838 vulkan: split mul_mat into multiple dispatches to avoid overflow (#19509) b8089 Jeff Bolz 2026-02-18 01:47:10 -08:00
  • 5d45884106 metal : fix build JohannesGaessler/ggml-meta-backend-8-tmp Georgi Gerganov 2026-02-18 09:14:31 +02:00
  • a569bda445 common : make small string helpers as inline functions (#19693) b8088 Adrien Gallouët 2026-02-18 08:03:01 +01:00
  • e2f19b320f opencl: refactor expm1 and softplus (#19404) b8087 shaofeiqi 2026-02-17 14:47:18 -08:00
  • 983559d24b opencl: optimize mean and sum_row kernels (#19614) b8086 shaofeiqi 2026-02-17 13:56:09 -08:00
  • 2b089c7758 model-conversion : add option to print tensor values (#19692) Daniel Bevenius 2026-02-17 20:43:22 +01:00
  • afa6bfe4f7 Pre-MCP UI and architecture cleanup (#19685) Aleksander Grygier 2026-02-17 13:47:45 +01:00
  • ae2d3f28a8 ggml: ggml-cpu: force-no-lto-for-cpu-feats (#19609) b8083 Talha Can Havadar 2026-02-17 12:22:46 +01:00
  • ad8207af77 cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (#19645) b8082 Georgi Gerganov 2026-02-17 12:31:49 +02:00
  • 667b694278 model-conversion : make printing of config values optional (#19681) Daniel Bevenius 2026-02-17 10:46:53 +01:00
  • e48349a49d ci : bump komac version (#19682) Sigbjørn Skjæret 2026-02-17 09:30:31 +01:00
  • ae46a61e41 build : link ws2_32 as PUBLIC on Windows (#19666) b8079 Adrien Gallouët 2026-02-17 08:37:07 +01:00
  • 65cede7c70 build : cleanup library linking logic (#19665) b8078 Adrien Gallouët 2026-02-17 08:36:45 +01:00
  • 05fa625eac convert : add JoyAI-LLM-Flash (#19651) b8077 DAN™ 2026-02-16 16:49:57 -05:00
  • d612901116 perplexity: add proper batching (#19661) b8076 AesSedai 2026-02-16 08:44:44 -08:00
  • cceb1b4e33 common : inline functions (#18639) b8075 Ivan Chikish 2026-02-16 18:52:24 +03:00
  • d23a55997d ggml : make ggml_is_view as API (#19539) b8074 Judd 2026-02-16 23:43:34 +08:00
  • 5f28c53d11 model: Add support for Tiny Aya Models (#19611) b8073 Saurabh Dash 2026-02-16 10:28:46 -05:00
  • 4408494144 build : rework llama_option_depr to handle LLAMA_CURL (#19658) b8072 Adrien Gallouët 2026-02-16 16:06:48 +01:00
  • f0198ef6fc Merge pull request #6 from gaugarg-nv/get_host_buffer_type Johannes Gäßler 2026-02-16 15:11:08 +01:00
  • 2ba9adc093 Adjust workaround for ROCWMMA_FATTN/GFX9 to only newer ROCm versions (#19591) b8071 Mario Limonciello 2026-02-16 07:46:08 -06:00
  • cc45f2ada6 models : deduplicate delta-net graphs for Qwen family (#19597) b8070 Georgi Gerganov 2026-02-16 14:35:04 +02:00
  • aa8b62105c Support device-specific host buffer types if all underlying backends expose the same type. This allows using pinned memory instead of pageable memory for CUDA. Gaurav Garg 2026-02-16 15:39:26 +05:30
  • d5dfc33027 graph : fix KQ mask, lora, cvec reuse checks (#19644) b8069 Georgi Gerganov 2026-02-16 09:21:11 +02:00
  • 267ba5a1d9 ggml: aarch64: Implement SVE in Gemm q4_k 8x8 q8_k Kernel (#19132) b8068 abhijain1204fujitsu 2026-02-16 12:08:43 +05:30
  • ff4affb4c1 sync : ggml b8067 Georgi Gerganov 2026-02-15 22:23:13 +02:00
  • 55d58599c8 ggml : bump version to 0.9.7 (ggml/1425) Georgi Gerganov 2026-02-15 22:21:04 +02:00
  • 1a8c700bfd ggml : bump version to 0.9.6 (ggml/1423) Georgi Gerganov 2026-02-07 09:58:02 +02:00
  • 27b93cbd15 cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (#19624) b8064 David Friehs 2026-02-15 18:08:42 +01:00
  • 6e67fd2144 docs: update s390x build docs (#19643) Aaron Teo 2026-02-16 00:33:34 +08:00
  • 9e118b97c4 build : remove LLAMA_HTTPLIB option (#19623) b8062 Adrien Gallouët 2026-02-15 15:38:50 +01:00
  • 57088276d4 cmake : check if KleidiAI API has been fetched (#19640) b8061 Daniel Bevenius 2026-02-15 13:59:38 +01:00
  • 341bc7d23c context : fix output reorder with backend sampling (#19638) b8060 Georgi Gerganov 2026-02-15 14:57:40 +02:00
  • 08e6d914b8 ggml : avoid UB in gemm ukernel (#19642) b8059 Georgi Gerganov 2026-02-15 14:56:35 +02:00
  • 184c694f45 ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (#19399) b8058 Aaron Teo 2026-02-15 18:20:35 +08:00
  • 684b36101c ggml-cpu: FA add GEMM microkernel (#19422) b8057 Aman Gupta 2026-02-15 11:09:24 +05:30
  • 3a00c98584 cmake : fix KleidiAI install target failure with EXCLUDE_FROM_ALL (#19581) b8056 SamareshSingh 2026-02-14 23:22:53 -06:00
  • 079feab9e3 convert : ensure all models handle new experts count (#19621) b8055 Sigbjørn Skjæret 2026-02-14 22:22:32 +01:00
  • 01d8eaa28d mtmd : Add Nemotron Nano 12B v2 VL support (#19547) b8054 Anav Prasad 2026-02-14 05:07:00 -08:00
  • 1725e316c1 models : optimize qwen3next graph (#19375) b8053 Georgi Gerganov 2026-02-14 12:57:36 +02:00
  • b7742cf321 ggml : fix GGML_DEBUG with OpenMP (#19599) b8052 Adrien Gallouët 2026-02-14 11:22:57 +01:00
  • badba89320 NetBSD build support (#19589) b8051 iMil 2026-02-14 09:47:01 +01:00
  • baa12f3831 webui: Architecture and UI improvements (#19596) Aleksander Grygier 2026-02-14 09:06:41 +01:00
  • 2d8015e8a4 llama : update LoRA API. + fix excessive graph reserves (#19280) b8049 agent-enemy-2 2026-02-14 03:06:27 -05:00
  • eb145c0753 mmap: Fix Windows handle lifetime (#19598) b8048 George 2026-02-14 10:05:12 +02:00
  • 6e473fb384 metal : fix ACC op (#19427) b8047 Georgi Gerganov 2026-02-14 09:54:03 +02:00
  • c7db95f106 scripts : use official split.py for cpp-httplib (#19588) b8046 Adrien Gallouët 2026-02-14 08:41:16 +01:00
  • 0d00ef65ed convert : store ffn_gate_inp_shexp as F32 (#19606) Sigbjørn Skjæret 2026-02-14 08:17:43 +01:00
  • 91ea5d67f2 build : fix libtool call in build-xcframework.sh (#19605) Adrien Gallouët 2026-02-14 06:48:37 +01:00
  • dbb023336b vulkan: support L2_NORM with contiguous rows (#19604) b8043 Jeff Bolz 2026-02-13 21:42:04 -08:00
  • 53aef25a88 vulkan: support GGML_OP_SET (#19584) b8042 Jeff Bolz 2026-02-13 21:36:38 -08:00
  • 2dec548094 vulkan: Add vendor id for Qualcomm drivers (#19569) b8041 Sophon 2026-02-14 13:29:17 +08:00
  • 0ccbfdef3e hexagon: further optimizations and refactoring for flash attention (#19583) b8040 Max Krasnyansky 2026-02-13 16:27:30 -08:00
  • 94a602db66 github : add missing backends to issue templates (#19603) Mengsheng Wu 2026-02-13 15:56:53 -08:00
  • 05a6f0e894 vulkan: restore -inf check in FA shaders (#19582) b8038 Jeff Bolz 2026-02-13 11:35:29 -08:00
  • fd24533e89 better granularity estimate Johannes Gäßler 2026-02-13 18:20:44 +01:00
  • d8f97b99ed fix compilation Johannes Gäßler 2026-02-13 15:13:40 +01:00
  • b48e80f677 common : update download code (#19573) b8037 Adrien Gallouët 2026-02-13 15:10:46 +01:00
  • 752584d5f5 model: support GLM MoE DSA arch (NOTE: indexer is not yet supported) (#19460) b8036 Xuan-Son Nguyen 2026-02-13 14:56:53 +01:00
  • 98ab6727e4 arbitrary num. of GPUs/tensor split Johannes Gäßler 2026-02-13 11:45:05 +01:00
  • cc2aa81513 Fix wrong memcpy length for block_interleave == 4 (#19575) b8035 Alberto Cabrera Pérez 2026-02-13 12:32:14 +00:00
  • 0e21991472 fix vulkan ggml_acc only works in 3d but not 4d (#19426) b8034 ymcki 2026-02-13 20:31:37 +08:00
  • b2ecc0cdb4 support --verbose-prompt (#19576) b8033 Sigbjørn Skjæret 2026-02-13 12:49:10 +01:00
  • 5065da554e CUDA: loop over ne2*ne3 in case it overflows (#19538) b8032 Aman Gupta 2026-02-13 17:01:40 +05:30
  • 5174d7206f webui: UI and routing fixes (#19586) Aleksander Grygier 2026-02-13 12:31:00 +01:00
  • 9c7d45c0fc fix view_offs scaling Johannes Gäßler 2026-02-13 11:05:57 +01:00
  • 43919b7f4f CUDA: Do not mutate cgraph for fused ADDs (#19566) b8030 Oliver Simons 2026-02-13 10:37:55 +01:00
  • 423cf0b26f docs : fix broken link and typo (#19560) Pavan Shinde 2026-02-13 14:08:09 +05:30
  • 33a56f90a6 model : Kimi Linear fix conv state update (#19531) b8028 ymcki 2026-02-13 16:10:18 +08:00
  • 25224c8021 llama : remove deprecated codecvt (#19565) b8027 Adrien Gallouët 2026-02-13 06:43:53 +01:00
  • 2f5d8f8edc vendor : update BoringSSL to 0.20260211.0 (#19562) b8026 Adrien Gallouët 2026-02-13 06:43:26 +01:00
  • bb96bfd361 memory : fix kv cache size for hybrid models (#19559) b8025 Georgi Gerganov 2026-02-13 07:36:24 +02:00
  • 0644baefde metal : improve concurrency (#19555) b8024 Georgi Gerganov 2026-02-13 07:35:57 +02:00
  • 490eb96b88 metal : support GGML_OP_SET (#19548) b8023 Georgi Gerganov 2026-02-13 07:34:52 +02:00
  • 31e4f189bb support for tensor dims % n_devs != 0 Johannes Gäßler 2026-02-11 23:34:43 +01:00
  • 3bb78133ab hexagon: fix typo in vtcm_needs_release (#19545) b8022 Shupei Fan 2026-02-13 07:07:49 +08:00
  • 79cc0f2daf opencl: add basic support for q4_1 (#19534) b8021 lhez 2026-02-12 14:52:37 -08:00
  • 338085c69e args : add -kvu to llama-parallel (#19577) b8020 Georgi Gerganov 2026-02-12 21:52:41 +02:00
  • 5da56dc1d8 args : add -kvu to llama-parallel pr/19378 Georgi Gerganov 2026-02-12 21:50:01 +02:00
  • f8feadb20f metal : fix build Georgi Gerganov 2026-02-12 21:49:52 +02:00
  • 4c61875bf8 webui: Add switcher to Chat Message UI to show raw LLM output (#19571) Aleksander Grygier 2026-02-12 19:55:51 +01:00
  • 4b385bfcf8 vendor : update cpp-httplib (#19537) b8018 Adrien Gallouët 2026-02-12 16:11:22 +01:00
  • f488429380 llama : update outdated comment in llama.h (#19428) b8017 Christian Schmitz 2026-02-12 15:52:57 +01:00
  • b12a56351d Merge pull request #4 from gaugarg-nv/minor_fixes Johannes Gäßler 2026-02-12 14:19:13 +01:00
  • 9bb9d78368 Apply suggestion from @JohannesGaessler Johannes Gäßler 2026-02-12 14:18:49 +01:00
  • 10385e8fb8 Fix the seg fault without NCCL Gaurav Garg 2026-02-12 18:29:01 +05:30
  • 4d688f9ebb (webui) FEATURE: Enable adding or injecting System Message into chat (#19556) Aleksander Grygier 2026-02-12 13:56:08 +01:00
  • ff599039a9 scripts : add support for forks in pr2wt.sh (#19540) Daniel Bevenius 2026-02-12 13:14:28 +01:00
  • f486ce9f30 (webui) REFACTOR: UI primitives and polish (#19551) Aleksander Grygier 2026-02-12 12:21:00 +01:00
  • 38adc7d469 WebUI Architecture Cleanup (#19541) Aleksander Grygier 2026-02-12 11:22:27 +01:00
  • 3b3a948134 metal : update sum_rows kernel to support float4 (#19524) b8012 Georgi Gerganov 2026-02-12 11:35:28 +02:00
  • 6845f7f87f Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (#19461) b8011 Mario Limonciello 2026-02-12 02:38:35 -06:00
  • fa16e517a3 server : fix typo in README.md for features list (#19510) RichardScottOZ 2026-02-12 18:26:25 +10:30
  • 313493de53 docs : update path in snapdragon README.md (#19533) TriDefender 2026-02-12 15:13:51 +08:00
  • b1ff83bbb0 hexagon: further optimization and tuning of matmul and dot kernels (#19407) b8008 Max Krasnyansky 2026-02-11 23:04:27 -08:00
  • 4ae1b7517a common : replace deprecated codecvt using parse_utf8_codepoint (#19517) b8007 Adrien Gallouët 2026-02-12 07:27:52 +01:00
  • 3fdd0b7a6e 2d tensor set/get support Johannes Gäßler 2026-02-11 17:42:51 +01:00
  • 4d3daf80f8 opencl: add general Q6_K mm and Q4_K mv (#19347) b8006 lhez 2026-02-11 10:33:13 -08:00