Commit Graph

  • 4abef75f2c vulkan: Implement SOLVE_TRI (#17486) b7179 Jeff Bolz 2025-11-27 08:48:00 -06:00
  • c386114922 arch : add description about LLM_TENSOR_INFOS (#17550) b7178 Georgi Gerganov 2025-11-27 16:34:13 +02:00
  • e9d070980b sampling : remove backend sampling chain from common_sampler Daniel Bevenius 2025-11-27 15:28:37 +01:00
  • 6783b11fb0 models : fix LFM2 tensors (#17548) b7177 Georgi Gerganov 2025-11-27 16:04:29 +02:00
  • c6bba89ea9 arch : add description about LLM_TENSOR_INFOS gg/arch-add-desc Georgi Gerganov 2025-11-27 16:03:09 +02:00
  • 172208afbf sampling : add comments about backend sampler [no ci] Daniel Bevenius 2025-11-27 14:59:52 +01:00
  • d93ff58322 models : fix LFM2 tensors gg/lfm-fix-tensors Georgi Gerganov 2025-11-27 14:53:24 +02:00
  • 909072abcf cuda : fix UMA detection on discrete GPUs. (#17537) b7176 matt23654 2025-11-27 11:35:35 +00:00
  • cd8370b408 ggml-cpu: aarm64: q4_K repack gemm and gemv implementations (dotprod only) (#17494) b7175 Alberto Cabrera Pérez 2025-11-27 11:25:14 +00:00
  • d21a76ac38 devops: Add build-essential to Ubuntu 26.04 image (#17531) Eric Curtin 2025-11-27 10:35:47 +00:00
  • 4fcd87cf7c gguf-py : skip endian-conversion of MXFP4 data (#17523) Aleksei Nikiforov 2025-11-27 11:35:38 +01:00
  • 5ea3be265b cuda : fix top-k compilation when CUB is unavailable Daniel Bevenius 2025-11-27 09:40:13 +01:00
  • 51107a0b63 sampling : fix temperature check to allow zero temperature Daniel Bevenius 2025-11-27 09:18:43 +01:00
  • d9d736102b sampling : use argmax for min-p sampling Daniel Bevenius 2025-11-27 07:38:44 +01:00
  • b78db3bd50 vulkan : move contiguous checks to device_supports_op (#17490) b7172 Acly 2025-11-27 06:54:19 +01:00
  • 142df17c9c vulkan: use a fixed 1KB buffer for the add_rms_fusion opt (#17514) b7171 Jeff Bolz 2025-11-26 23:32:30 -06:00
  • e509411cf1 server: enable jinja by default, update docs (#17524) b7170 Xuan-Son Nguyen 2025-11-27 01:02:50 +01:00
  • 7cba58bbea opencl: add sqr, sqrt, mean and ssm_conv (#17476) b7169 lhez 2025-11-26 13:29:58 -08:00
  • 5449367b21 Fix chunks being too small with small matrix sizes (#17526) b7168 Alberto Cabrera Pérez 2025-11-26 21:14:54 +00:00
  • 1d594c295c clip: (minicpmv) fix resampler kq_scale (#17516) b7167 Han Qingzhe 2025-11-27 04:44:07 +08:00
  • 7c2bfb352e Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-11-26 17:52:29 +01:00
  • 90a3aff2c2 cuda : fix editorconfig-checker warning Daniel Bevenius 2025-11-26 17:44:04 +01:00
  • eec1e33a9e vulkan: allow graph_optimize for prompt processing workloads (#17475) b7166 Jeff Bolz 2025-11-26 09:46:33 -06:00
  • 879d673759 vulkan: Implement top-k (#17418) b7165 Jeff Bolz 2025-11-26 09:45:43 -06:00
  • 0f7805f32a common : add get_active_samplers function to check enabled samplers Daniel Bevenius 2025-11-26 13:12:36 +01:00
  • 4fea191c66 Use FetchContent over CPM as it's bundled with CMake Oliver Simons 2025-11-26 15:00:24 +01:00
  • 6ab4e50d9c ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (#17448) b7164 xctan 2025-11-26 21:33:05 +08:00
  • 2336cc4784 cmake : use EXCLUDE_FROM_ALL to avoid patch-boringssl.cmake (#17520) b7163 Adrien Gallouët 2025-11-26 14:15:21 +01:00
  • e6923caaec ggml : fix ARM feature verification (#17519) b7162 Adrien Gallouët 2025-11-26 14:14:41 +01:00
  • 3e18dba9fd HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (#17502) b7161 Jiacheng (Jason) Chen 2025-11-26 05:18:48 -05:00
  • b45d504e70 sampling : add min-p backend sampler Daniel Bevenius 2025-11-26 10:50:58 +01:00
  • eeb5605de2 CANN: Add MROPE and IMROPE support (#17401) b7160 hipudding 2025-11-26 16:44:19 +08:00
  • f3a848a3b1 chore: upgrade cpp-httplib from v0.27.0 to v0.28.0 (#17513) b7159 o7si 2025-11-26 15:21:06 +08:00
  • b3b03a7baf vulkan: Implement GGML_OP_CUMSUM (#17479) b7158 Jeff Bolz 2025-11-26 00:08:10 -06:00
  • 05429433a1 examples: add model-backend-compare tool to compare intermediate device tensors with CPU reference 0cc4m/model-backend-compare 0cc4m 2025-11-25 18:05:56 +01:00
  • f23b306cc5 CUDA: Add top-k implementation Oliver Simons 2025-11-21 12:01:32 +01:00
  • ec047e12ee Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-11-25 15:16:44 +01:00
  • 583cb83416 ggml : add ggml_top_k (#17365) b7157 Georgi Gerganov 2025-11-25 15:31:43 +02:00
  • 05872ac885 convert : fix big-endian conversion (#17431) Aleksei Nikiforov 2025-11-25 14:18:16 +01:00
  • 9e5e09d087 sampling : remove backend-dist option (wip) Daniel Bevenius 2025-11-25 13:45:02 +01:00
  • 55ab25caf5 codeowners : remove slaren (#17492) Diego Devesa 2025-11-25 04:00:23 -08:00
  • 064c90d843 CANN: supports out_prod operator for F32 and F16 (#17406) b7154 TianHao324 2025-11-25 17:39:06 +08:00
  • 53dca56d9b Merge remote-tracking branch 'upstream/master' into gpu-sampling Daniel Bevenius 2025-11-25 08:20:50 +01:00
  • 0f17ccdee7 examples : add info about hybrid sampling in batched [no ci] Daniel Bevenius 2025-11-25 08:12:42 +01:00
  • b1846f1c8e webui: add rehype plugin to restore HTML in Markdown table cells (#17477) Pascal 2025-11-25 08:01:02 +01:00
  • d414db02d3 vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 (#17455) b7152 Jeff Bolz 2025-11-25 00:11:27 -06:00
  • 2b4c7927ee Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-11-25 06:10:33 +01:00
  • 877566d512 llama: introduce support for model-embedded sampling parameters (#17120) b7151 Aaron Teo 2025-11-25 09:56:07 +08:00
  • 3d07caa99b vulkan: more FA details in vk_perf_logger (#17443) b7150 Jeff Bolz 2025-11-24 15:25:24 -06:00
  • 134e6940ca llama : skip output reordering for single token batches (#17466) b7149 Daniel Bevenius 2025-11-24 21:06:17 +01:00
  • a02adf4211 sampling : add assertions for contiguous tensors in async copy functions Daniel Bevenius 2025-11-24 21:00:03 +01:00
  • 883a87043a samplers : add missing cont Georgi Gerganov 2025-11-24 21:46:57 +02:00
  • 0543f928a3 HIP: WMMA-MMQ kernels for RDNA 4 (#17156) b7148 Jiacheng (Jason) Chen 2025-11-24 14:00:10 -05:00
  • b26c7069fb common : initialize backend samplers Georgi Gerganov 2025-11-24 20:25:44 +02:00
  • e2d4f0829c llama-cli : fix dangling reference to sampler config Georgi Gerganov 2025-11-24 19:51:32 +02:00
  • d0bea21a3c examples : update batched to use backend sampling Daniel Bevenius 2025-11-24 16:37:22 +01:00
  • b61de2b2df convert : allow quantizing lora again (#17453) Sigbjørn Skjæret 2025-11-24 15:50:55 +01:00
  • 25f33806d3 sampling : add debug log when backend sampler selects token Daniel Bevenius 2025-11-24 15:03:41 +01:00
  • b8372eecd9 server: split server.cpp code into server/common/task/queue (#17362) b7146 Xuan-Son Nguyen 2025-11-24 14:41:53 +01:00
  • 6ab8eacddf examples : add -kvu to batched usage example [no ci] (#17469) Daniel Bevenius 2025-11-24 14:38:45 +01:00
  • 2d50b9d8cb sync : ggml b7144 Georgi Gerganov 2025-11-24 14:28:37 +02:00
  • 697edfeead ggml : remove dirty flag from version string (ggml/1391) Daniel Bevenius 2025-11-24 12:51:50 +01:00
  • 8eb9b4769d sampling : remove redundant checks for stride and size [no ci] Daniel Bevenius 2025-11-24 13:53:29 +01:00
  • 4a90583d7d sampling : cleanup and clarify output_reserve Daniel Bevenius 2025-11-24 13:26:18 +01:00
  • dbb852b549 ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) (#16739) b7142 Alberto Cabrera Pérez 2025-11-24 11:08:11 +00:00
  • 5f55c385cb ggml: add RISC-V cpu-feats (#17461) b7141 ixgbe 2025-11-24 19:07:14 +08:00
  • 72f80499ee server : headers cleanup gg/tmp Georgi Gerganov 2025-11-24 10:43:56 +02:00
  • d88ba1813c common : remove build-info.cpp from commit [no ci] Daniel Bevenius 2025-11-24 09:31:14 +01:00
  • 7816f0bb56 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-11-24 07:44:06 +01:00
  • 50d21aa4a4 tests : cleanup test-backend-sampler.cpp Daniel Bevenius 2025-11-24 07:18:39 +01:00
  • 4902eebe33 models : Added support for RND1 Diffusion Language Model (#17433) b7140 william pan 2025-11-23 22:16:56 -08:00
  • 923ae3c619 hexagon: add support for ROPE_NEOX (#17458) b7139 Max Krasnyansky 2025-11-23 18:55:56 -08:00
  • 01ad35e6d6 CANN: Define cann_graph_update_required before macro (#17434) b7138 Raul Torres 2025-11-24 02:02:52 +00:00
  • fcb013847c ggml-hexagon: Initial Hexagon v68/v69 support (#17394) b7137 M. Mediouni 2025-11-24 01:54:49 +01:00
  • d5bc1ad110 ggml-hexagon: add hex_supported_buffer for better buffer supported check (#17212) b7136 nullname 2025-11-24 06:26:36 +08:00
  • 0c7220db56 webui: minor settings reorganization and add disable autoscroll option (#17452) Pascal 2025-11-23 18:42:00 +01:00
  • 9e273f7aa4 sampling : fix copying both sampled tokens and logits/probs from backend Daniel Bevenius 2025-11-23 13:08:08 +01:00
  • ae23d2d2c1 sampling: clarify candidate ids usage in comments Daniel Bevenius 2025-11-23 11:28:19 +01:00
  • 65500d05ab sampling : add stride variable for clarity Daniel Bevenius 2025-11-23 11:27:54 +01:00
  • 96ac5a2329 cuda : support non-contiguous i32 to i32 copy (#17326) b7134 Sigbjørn Skjæret 2025-11-23 11:13:34 +01:00
  • bc809e9c53 vulkan: Update docker image to Ubuntu 26.04 to enable glslc features (#17439) Eric Curtin 2025-11-23 09:29:36 +00:00
  • 722f9defe9 vulkan: intel mmv fix attempt 0cc4m/vulkan-intel-mmv-fix 0cc4m 2025-11-23 10:13:19 +01:00
  • 54d83bbe85 vulkan: remove a couple unnecessary switches (#17419) b7132 Jeff Bolz 2025-11-22 23:29:40 -06:00
  • 4949ac0f18 ci : switch to BoringSSL on Server workflow (#17441) b7131 Adrien Gallouët 2025-11-22 21:38:19 +01:00
  • 3f3a4fb9c3 Revive MUL_MAT_ID to perf testing (#17397) b7130 Masato Nakasaka 2025-11-22 18:55:43 +09:00
  • 8174e29b0e release: fix linting Aaron Teo 2025-11-22 13:59:24 +08:00
  • 49d4164952 release: fix duplicate libs, store symbolic links Aaron Teo 2025-11-16 19:51:53 +08:00
  • d6abfe8c84 release: add deprecation notice to release.yml fix-release-duplicate-libs-b7083-d6abfe8 Aaron Teo 2025-11-22 11:52:11 +08:00
  • 028f93ef98 HIP: RDNA4 tensor core support for MMF (#17077) b7129 yulo 2025-11-22 07:03:24 +08:00
  • 8e9ddba610 opencl: refine condition for kqv mm (#17392) b7128 lhez 2025-11-21 14:34:48 -08:00
  • 79b8cf2a75 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-11-21 16:38:32 +01:00
  • 79bfc1c0d3 release: rm gunzip Aaron Teo 2025-11-21 22:49:48 +08:00
  • dec86a0c8c release: add .tar release Aaron Teo 2025-11-21 22:45:07 +08:00
  • 23bc779a6e model : detect GigaChat3-10-A1.8B as deepseek lite (#17420) b7127 ubergarm 2025-11-21 08:51:38 -05:00
  • 9b2439347f common, tools : refactor model loading to support backend samplers Daniel Bevenius 2025-11-21 14:26:52 +01:00
  • 61ffe41dc1 sampling : use pinned memory for backend sampling buffers Daniel Bevenius 2025-11-21 14:02:16 +01:00
  • 28175f857d cmake : add option to build and link BoringSSL (#17205) b7126 Adrien Gallouët 2025-11-21 11:46:45 +01:00
  • 9cc4080441 ci : start using OpenSSL (#17235) Adrien Gallouët 2025-11-21 11:45:00 +01:00
  • f1ffbba68e vulkan: disable async for older Intel devices (#17369) b7124 Jeff Bolz 2025-11-21 02:58:17 -06:00
  • 2370665e56 CANN: Refactor evaluate_and_capture_cann_graph (#17333) b7123 Raul Torres 2025-11-21 08:23:29 +00:00