Commit Graph

  • 0a540f9abd ci : add windows-cuda 13.1 release (#17839) b7313 Sigbjørn Skjæret 2025-12-07 14:02:04 +01:00
  • 22577583a3 common : change --color to accept on/off/auto, default to auto (#17827) b7312 Sigbjørn Skjæret 2025-12-07 03:43:50 +01:00
  • d9e03db1e7 sycl: add missing BF16 conversion support for Intel oneAPI (#17780) b7311 Law Po Ying 2025-12-07 09:18:18 +08:00
  • db97837385 vulkan: perf_logger improvements (#17672) b7310 Jeff Bolz 2025-12-06 11:46:46 -06:00
  • 017761daf5 ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690) Vishal Singh 2025-12-06 21:43:33 +05:30
  • 52258181da tests : fix memory leaks Georgi Gerganov 2025-12-06 17:11:15 +02:00
  • fdac9686f7 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-06 16:55:33 +02:00
  • c42712b056 server: support multiple generations from one prompt (OAI "n" option) (#17775) Xuan-Son Nguyen 2025-12-06 15:54:38 +01:00
  • 30742a6ff5 sampling : expand support (wip) Georgi Gerganov 2025-12-05 22:02:48 +02:00
  • 09c7c50e64 ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) (#16985) b7307 Phylliida Dev 2025-12-06 06:07:02 -08:00
  • f334b79494 HIP: fix RDNA3 FP16/BF16 matrix multiplication (#17817) b7306 Johannes Gäßler 2025-12-06 13:45:36 +01:00
  • a28e3c7567 webui: Stop generation from chat sidebar (#17806) Aleksander Grygier 2025-12-06 13:29:15 +01:00
  • e31b5c55c3 webui: Fix context available value in Multi-model Router mode (#17804) Aleksander Grygier 2025-12-06 13:23:29 +01:00
  • 21f24f27a9 webui: Per-conversation system message with UI displaying, edition & branching (#17275) Aleksander Grygier 2025-12-06 13:19:05 +01:00
  • 7b43f55753 ggml : improve error handling for search path existence checks (#17653) b7302 Sky 2025-12-06 19:28:16 +08:00
  • 444f00b0ec llama : remove quantization sanity check (#17788) b7301 Daniel Bevenius 2025-12-06 12:26:20 +01:00
  • 2960eb2975 vulkan: Use one row per workgroup for f32 mmv (#17711) b7300 Jeff Bolz 2025-12-06 04:12:26 -06:00
  • dbc15a7967 convert: support Mistral 3 Large MoE (#17730) Xuan-Son Nguyen 2025-12-06 10:49:33 +01:00
  • c6c5e85979 vulkan: support solve_tri with larger N/K values (#17781) b7298 Jeff Bolz 2025-12-06 01:56:45 -06:00
  • 8e5f4987b1 contrib : stale PRs (#17803) Georgi Gerganov 2025-12-06 09:34:18 +02:00
  • 8ce774a102 metal : fix build(#17799) b7296 Georgi Gerganov 2025-12-06 09:33:59 +02:00
  • 67788f6846 vulkan: Replace deprecated VK_EXT_validation_features (#17637) Masato Nakasaka 2025-12-06 14:39:42 +09:00
  • d8c0a7b085 vulkan: Fix mismatch in TOPK_MOE unit test (#17541) Masato Nakasaka 2025-12-06 14:23:30 +09:00
  • 933414c0b6 vulkan: add more num_blocks instantiations in rms_norm (#17701) b7293 Jeff Bolz 2025-12-05 15:08:56 -06:00
  • a0f3897d53 vulkan: fix top_k bug when there are ties in the input (#17659) Jeff Bolz 2025-12-05 15:03:19 -06:00
  • 31436df5ae contrib : stale PRs gg/contrib-stale Georgi Gerganov 2025-12-05 22:49:15 +02:00
  • e15cd06a94 vulkan : support conv-2d with large output size (#17685) Acly 2025-12-05 21:46:39 +01:00
  • fd57b24c0f ggml webgpu: unary op suppport, code refactoring, ops support (#17764) Reese Levine 2025-12-05 12:25:51 -08:00
  • 6ab0d64960 vulkan: enable mmvq for q2_k on NVIDIA (#17675) Jeff Bolz 2025-12-05 14:21:57 -06:00
  • 93bb92664e vulkan: set all memory allocations to high priority (#17624) Jeff Bolz 2025-12-05 14:21:04 -06:00
  • 8160b38a5f rpc : fix alloc size logic (#17116) Georgi Gerganov 2025-12-05 19:39:04 +02:00
  • c41bde6fbd metal : add residency sets keep-alive heartbeat (#17766) Georgi Gerganov 2025-12-05 19:38:54 +02:00
  • e652566139 Readd cub::DeviceScan::InclusiveSum-based CumSum Oliver Simons 2025-12-05 16:15:31 +01:00
  • 7668999518 Merge branch 'master' into gpu-sampling Oliver Simons 2025-12-05 14:41:08 +01:00
  • dd11f6eb7b Add perf-tests for CUMSUM Oliver Simons 2025-12-05 13:54:44 +01:00
  • 6016d0bd41 HIP : fix RDNA4 build (#17792) b7285 Johannes Gäßler 2025-12-05 13:47:52 +01:00
  • cf74b1a8ec sampling : fix candidates logic Georgi Gerganov 2025-12-05 14:21:08 +02:00
  • 1be97831e4 fix: prevent segfault in tokenizer on highly repetitive input (#17786) Pascal 2025-12-05 12:52:23 +01:00
  • a6cfc212ed ci : fix winget workflow (#17790) Adrien Gallouët 2025-12-05 12:44:17 +01:00
  • 3a0d10533a Q4/Q8 Tiled Gemm Optimization. (#16999) shalinib-ibm 2025-12-05 17:11:51 +05:30
  • 6648989673 Add pwilkin to CODEOWNERS for chat files (#17789) Piotr Wilkin (ilintar) 2025-12-05 12:00:57 +01:00
  • e95d0bc8fd CUDA: fix FA VKQ accumulator overflow (#17746) Johannes Gäßler 2025-12-05 09:18:10 +01:00
  • 668ed76574 HIP: enable WMMA-MMQ INT kernels for RDNA 3 (#17576) Jiacheng (Jason) Chen 2025-12-05 03:17:37 -05:00
  • 03d9a77b85 ci : transform release binary root dir in tar to llama-bXXXX (#17773) b7278 Sigbjørn Skjæret 2025-12-05 01:50:19 +01:00
  • 3143a755c8 docs : update ops.md (Metal, BLAS) (#17768) Gabe Goodhart 2025-12-04 16:55:34 -07:00
  • 96fe9badfc Add support for CUMSUM and TRI for CUDA. (#17584) b7276 Piotr Wilkin (ilintar) 2025-12-04 22:19:51 +01:00
  • 7864074fdb sampling : fix outputs and device checks Georgi Gerganov 2025-12-04 19:33:01 +02:00
  • bde188d60f metal: TRI, FILL, EXPM1, SOFTPLUS (#16623) b7275 Gabe Goodhart 2025-12-04 10:12:19 -07:00
  • abc19635a3 cont : keep backend sampling disabled for now Georgi Gerganov 2025-12-04 17:42:09 +02:00
  • 9d0229967a server: strip content-length header on proxy (#17734) b7274 Xuan-Son Nguyen 2025-12-04 16:32:57 +01:00
  • 6958d41366 sampling : check backend support during init Georgi Gerganov 2025-12-04 17:29:08 +02:00
  • c4c10bfb86 server: move msg diffs tracking to HTTP thread (#17740) b7273 Xuan-Son Nguyen 2025-12-04 15:46:08 +01:00
  • 817d743cc1 examples : add missing code block end marker [no ci] (#17756) Daniel Bevenius 2025-12-04 14:17:30 +01:00
  • bd4ef13476 common : skip model validation when --help is requested (#17755) b7271 Daniel Bevenius 2025-12-04 13:36:50 +01:00
  • 1bde70785d sampling : remove redundant calls to ggml_build_forward_expand Georgi Gerganov 2025-12-04 14:25:28 +02:00
  • fce571ee51 sampling : simplify temp sampling Georgi Gerganov 2025-12-04 14:23:02 +02:00
  • 87a2084c45 ggml-cpu : remove asserts always evaluating to false (#17728) b7270 Alberto Cabrera Pérez 2025-12-04 12:16:38 +00:00
  • 3659aa28e9 convert: use existing local chat_template if mistral-format model has one. (#17749) SmartestWashingMachine 2025-12-04 22:12:45 +11:00
  • 2a73f81f8a cmake : simplify build info detection using standard variables (#17423) b7268 Adrien Gallouët 2025-12-04 11:42:13 +01:00
  • 7dba049b07 ci : disable ggml-ci-x64-amd-* (#17753) Sigbjørn Skjæret 2025-12-04 11:25:08 +01:00
  • dad7571ff2 tests : better input range for unary operators gg/tests-better-unary-range Georgi Gerganov 2025-12-04 12:18:24 +02:00
  • 83c1171529 common: use native MultiByteToWideChar (#17738) b7266 Adrien Gallouët 2025-12-04 11:06:49 +01:00
  • ac9e164714 sampling : fix backend temp sampling to use logits masking Daniel Bevenius 2025-12-04 09:39:20 +01:00
  • 0d1324856f metal : use params per pipeline instance (#17739) b7265 Georgi Gerganov 2025-12-04 10:34:11 +02:00
  • a67ef0f47f llama : fix sanity checks during quantization (#17721) b7264 Georgi Gerganov 2025-12-04 10:33:42 +02:00
  • 10bd640aae Revert "sampling : stop short if backend sampler sampled a token" Daniel Bevenius 2025-12-04 08:26:33 +01:00
  • c0b182f4d6 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-12-04 08:17:50 +01:00
  • 87b2719eca sampling : stop short if backend sampler sampled a token Daniel Bevenius 2025-12-04 08:13:49 +01:00
  • ef75a89fdb build : move _WIN32_WINNT definition to headers (#17736) b7263 Adrien Gallouët 2025-12-04 07:04:02 +01:00
  • d8b5cdc4fe build: enable parallel builds in msbuild using MTT (#17708) b7262 Jeff Bolz 2025-12-03 22:42:29 -06:00
  • dea9ba27cb ggml-cpu: remove duplicate conditional check 'iid' (#17650) b7261 Herman Semenoff 2025-12-04 00:03:19 +03:00
  • c6d1a00aa7 Add a couple of file types to the text section (#17670) Piotr Wilkin (ilintar) 2025-12-03 21:45:06 +01:00
  • 424c579455 convert : support latest mistral-common (fix conversion with --mistral-format) (#17712) SmartestWashingMachine 2025-12-04 07:15:04 +11:00
  • e9f9483464 Use OpenAI-compatible /v1/models endpoint by default (#17689) Aleksander Grygier 2025-12-03 20:49:09 +01:00
  • 41c5e02f42 webui: Fix zero pasteLongTextToFileLen to disable conversion being overridden (#17445) Andika Wasisto 2025-12-04 02:45:17 +07:00
  • 2e1c9cd814 CUDA: generalized (mma) FA, add Volta support (#17505) b7256 Johannes Gäßler 2025-12-03 16:57:05 +01:00
  • 190c4838bd chat : reserve memory in compute_diffs and improve naming (#17729) b7255 Georgi Gerganov 2025-12-03 17:22:10 +02:00
  • e7c2cf1356 server: add router multi-model tests (#17704) (#17722) Pascal 2025-12-03 15:10:37 +01:00
  • 1257491047 server : fix bad fmt, size() is a size_type (#17735) b7253 Adrien Gallouët 2025-12-03 14:47:22 +01:00
  • 083e18b11c cmake: explicitly link against crypt32 on non-MSVC Windows builds (#17727) b7252 Adrien Gallouët 2025-12-03 14:47:02 +01:00
  • cce3b2a8ad sampling : minor cleanup Georgi Gerganov 2025-12-03 15:39:44 +02:00
  • 3d94e967a1 metal : fix data race in pipeline library (#17731) b7251 Georgi Gerganov 2025-12-03 14:03:40 +02:00
  • 7feb0a1005 ci : remove the build of openeuler-cann in release (#17724) b7250 jiahao su 2025-12-03 19:24:59 +08:00
  • 0a8026e768 common : introduce composable PEG parser combinators for chat parsing (#17136) Aldehir Rojas 2025-12-03 04:45:32 -06:00
  • 5ceed62421 server: fix duplicate HTTP headers in multiple models mode (#17698) b7248 Pascal 2025-12-03 10:28:43 +01:00
  • 7ca5991d2b ggml webgpu: add support for emscripten builds (#17184) b7247 Reese Levine 2025-12-03 01:25:34 -08:00
  • 01c9e9fd5c llama : fix sanity checks during quantization gg/llama-quant-fix-sanity-checks Georgi Gerganov 2025-12-03 11:10:11 +02:00
  • b3e3060f4e ci : move release details to the top visible by default (#17719) Sigbjørn Skjæret 2025-12-03 09:22:46 +01:00
  • 37adc9c6ba ggml, llama : use defaulted constructors/destructors (#17649) b7245 Herman Semenoff 2025-12-03 09:12:18 +03:00
  • 16cc3c606e build: document how to compile with Vulkan using Debian/Ubuntu packages (#17688) b7244 Marcos Del Sol Vives 2025-12-03 01:25:11 +01:00
  • 13628d8bdb server: add --media-path for local media files (#17697) b7243 Xuan-Son Nguyen 2025-12-02 22:49:20 +01:00
  • a96283adc4 mtmd: fix --no-warmup (#17695) Xuan-Son Nguyen 2025-12-02 22:48:08 +01:00
  • 4eba8d9451 ci : RVV1.0 builds with tests (#16682) Ali Tariq 2025-12-03 01:46:10 +05:00
  • 61bde8e21f vulkan: Reduce temporary memory usage for TOP_K (#17623) b7240 Jeff Bolz 2025-12-02 12:22:04 -06:00
  • e251e5ebbe cmake : add utf8 compilation options for msvc (#17682) b7239 xiaobing318 2025-12-03 01:50:57 +08:00
  • c4357dcc35 Server: Change Invalid Schema from Server Error (500) to User Error (400) (#17572) Chad Voegele 2025-12-02 10:33:50 -06:00
  • aad5a6afd7 sampling : implement temp_ext_backend sampling Daniel Bevenius 2025-12-02 17:26:04 +01:00
  • e148380c7c ggml : use svcntb() for SVE vector length detection (#17474) b7237 Adrien Gallouët 2025-12-02 17:21:11 +01:00
  • a2b0fe8d37 CANN: Disable Ger operator of OUT_PROD on 310p device (#17563) b7236 TianHao324 2025-12-02 20:35:23 +08:00
  • 7f3a72a8ed ggml : remove redundant n_copies check when setting input/output (#17612) b7235 Daniel Bevenius 2025-12-02 12:52:45 +01:00