Commit Graph

  • b9a37717b0 codeowners : remove ericcurtin (#17658) Eric Curtin 2025-12-02 11:18:15 +00:00
  • 2595818a68 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-12-02 12:07:01 +01:00
  • f3a9674ae8 llama : fix signed comparison warning on FreeBSD (#17497) b7233 Adrien Gallouët 2025-12-02 12:05:38 +01:00
  • db8972e251 squash! sampling : fix backend temp sampler for zero temperature Daniel Bevenius 2025-12-02 11:53:29 +01:00
  • 2c453c6c77 convert: add error message for mistral3 quantized weight (#17686) Xuan-Son Nguyen 2025-12-02 11:48:31 +01:00
  • 5d6bd842ea server: remove default "gpt-3.5-turbo" model name (#17668) b7231 Xuan-Son Nguyen 2025-12-02 11:38:57 +01:00
  • 516af33ca6 CUDA: Update CCCL's rc candidate Oliver Simons 2025-12-02 11:23:01 +01:00
  • 244880ae3a CUDA: Use standard-compliant preprocessor for MSVC builds Oliver Simons 2025-12-02 11:22:25 +01:00
  • 559d058dd2 CUDA: Move cccl fetch to after cuda has been enabled in CMakeLists.txt Oliver Simons 2025-12-01 17:54:06 +01:00
  • fd3abe849e server: fixing naming conflict res_error in server-models.cpp (#17679) b7230 senhtry 2025-12-02 18:18:39 +08:00
  • 682e6658bb server: explicitly set exec path when create new instance (#17669) b7229 Xuan-Son Nguyen 2025-12-02 10:25:11 +01:00
  • 4574f2949e ci : skip winget update when not in ggml-org (#17465) Adrien Gallouët 2025-12-02 10:15:01 +01:00
  • ab6726eeff ggml : add fallback definition for HWCAP2_SVE2 (#17683) b7227 Adrien Gallouët 2025-12-02 09:41:26 +01:00
  • 3e9a258c14 Merge remote-tracking branch 'upstream/master' into gpu-sampling Daniel Bevenius 2025-12-02 09:26:04 +01:00
  • cee92af553 Add context info to server error (#17663) Aleksander Grygier 2025-12-02 09:20:57 +01:00
  • 739b597804 sampling : fix backend temp sampler for zero temperature Daniel Bevenius 2025-12-02 09:03:08 +01:00
  • ed32089927 ggml-cuda: reorder only relevant nodes (#17639) b7225 Aman Gupta 2025-12-02 12:36:31 +08:00
  • 7b6d745364 release: fix duplicate libs, store symbolic links (#17299) b7224 Aaron Teo 2025-12-02 11:52:05 +08:00
  • 98bd9ab1e4 enhance argsort for UT (#17573) b7223 Neo Zhang Jianyu 2025-12-02 08:56:46 +08:00
  • 746f9ee889 Override SSM_A op for Qwen3 Next to reduce splits (#17587) b7222 Piotr Wilkin (ilintar) 2025-12-02 00:43:13 +01:00
  • 9810cb8247 ops.md: update vulkan support (#17661) Jeff Bolz 2025-12-01 15:26:21 -06:00
  • ecf74a8417 mtmd: add mtmd_context_params::warmup option (#17652) b7220 Xuan-Son Nguyen 2025-12-01 21:32:25 +01:00
  • 00c361fe53 fix: llama arch implementation (#17665) b7219 Gilad S. 2025-12-01 22:21:13 +02:00
  • ec18edfcba server: introduce API for serving / loading / unloading multiple models (#17470) b7218 Xuan-Son Nguyen 2025-12-01 19:41:04 +01:00
  • 988261b18d examples : remove outdated backend sampling section Daniel Bevenius 2025-12-01 18:20:41 +01:00
  • 88cca45bb8 sampling : fix top_p empty condition Georgi Gerganov 2025-12-01 18:02:34 +02:00
  • 04f2822a86 sampling : do not create empty samplers Georgi Gerganov 2025-12-01 17:52:07 +02:00
  • 4032ce2378 common : simplify sampler chain initialization Georgi Gerganov 2025-12-01 17:10:32 +02:00
  • 217469f07f Make backend's top_p sampler inclusive Oliver Simons 2025-12-01 15:24:32 +01:00
  • ae0bb6a6da Factor out ggml_sort into its own function Oliver Simons 2025-12-01 14:46:47 +01:00
  • 7733409734 common: improve verbosity level definitions (#17630) b7217 Xuan-Son Nguyen 2025-12-01 14:38:13 +01:00
  • 16451d6bc3 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-01 14:47:50 +02:00
  • cd3c118908 model: support Ministral3 (#17644) b7216 Xuan-Son Nguyen 2025-12-01 12:26:52 +01:00
  • 8bee483c97 Fix backend_top_p_sampler Oliver Simons 2025-12-01 12:07:30 +01:00
  • 649495c9d9 metal : add FA head size 48 (#17619) b7215 Georgi Gerganov 2025-12-01 12:49:53 +02:00
  • 90c72a614a ggml : extend the GGML_SCHED_NO_REALLOC debug logic of the scheduler (#17617) b7214 Georgi Gerganov 2025-12-01 12:49:33 +02:00
  • 6eea666912 llama-graph: avoid expand_forward for fusion (#17633) b7213 Aman Gupta 2025-12-01 17:12:48 +08:00
  • cf0e1475c5 sampling : lower log level for output buffer reallocations [no ci] Daniel Bevenius 2025-12-01 09:13:47 +01:00
  • ff90508d68 contributing: update guidelines for AI-generated code (#17625) b7212 Xuan-Son Nguyen 2025-11-30 22:51:34 +01:00
  • 0a4aeb927d cmake : add option to build and link LibreSSL (#17552) b7211 Adrien Gallouët 2025-11-30 22:14:32 +01:00
  • 2ba719519d model: LFM2-VL fixes (#17577) b7210 Tarek Dakhran 2025-11-30 21:57:31 +01:00
  • 874c877bde revise xsn/update_guidelines_for_ai Xuan Son Nguyen 2025-11-30 17:54:44 +01:00
  • 91faddb9b4 contributing: update guidelines for AI-generated code Xuan Son Nguyen 2025-11-30 17:46:42 +01:00
  • 7f8ef50cce clip: fix nb calculation for qwen3-vl (#17594) b7209 Xuan-Son Nguyen 2025-11-30 15:33:55 +01:00
  • 3c136b21a3 cli: add migration warning (#17620) b7208 Xuan-Son Nguyen 2025-11-30 15:32:43 +01:00
  • beb1f0c503 common : throttle download progress output to reduce IO flush (#17427) b7207 Adrien Gallouët 2025-11-30 13:22:44 +01:00
  • def5404f26 common: add LLAMA_LOG_FILE env var (#17609) b7206 Aaron Teo 2025-11-30 19:12:32 +08:00
  • 80742cbaeb cont : naming Georgi Gerganov 2025-11-30 00:07:13 +02:00
  • fa0465954f ggml: fix: macOS build with -DGGML_BACKEND_DL=ON (#17581) b7205 Gilad S. 2025-11-30 04:00:59 +02:00
  • 5a6241feb0 common: update env var name (#17588) b7204 ddh0 2025-11-29 19:59:25 -06:00
  • c7af376c29 CUDA: add stream-based concurrency (#16991) b7203 Aman Gupta 2025-11-30 08:17:55 +08:00
  • 00425e2ed1 cuda : add error checking for cudaMemcpyAsync in argsort (#17599) b7202 Mahekk Shaikh 2025-11-29 19:16:28 -05:00
  • 385c3da5e6 vulkan : fix FA mask load with bounds check (coopmat2) (#17606) b7201 Acly 2025-11-30 01:03:21 +01:00
  • c187003d81 llama : naming Georgi Gerganov 2025-11-30 00:05:47 +02:00
  • 1760bd69b3 llama : reserve graphs with samplers Georgi Gerganov 2025-11-29 23:57:25 +02:00
  • 467746e3ad Merge branch 'master' into HEAD Georgi Gerganov 2025-11-29 23:17:25 +02:00
  • ff7b0bf632 llama : call backend_init once Georgi Gerganov 2025-11-29 23:09:53 +02:00
  • ab49f094d2 server: move server-context to its own cpp|h (#17595) b7200 Xuan-Son Nguyen 2025-11-29 22:04:44 +01:00
  • d8d98bb4bb Merge branch 'master' into HEAD Georgi Gerganov 2025-11-29 22:38:44 +02:00
  • 9028ebfea8 llama : cleanup + naming Georgi Gerganov 2025-11-29 22:37:07 +02:00
  • 8c32d9d96d server: explicitly set the function name in lambda (#17538) b7199 Haiyue Wang 2025-11-30 01:43:29 +08:00
  • 0874693b44 common : fix json schema with '\' in literals (#17307) b7198 Igor Smirnov 2025-11-29 21:06:32 +05:00
  • 865bcb4abc release: bugfix missing .tar.gz upload fix-release-duplicate-libs-b7137-865bcb4 Aaron Teo 2025-11-29 23:46:34 +08:00
  • bd119c7471 release: undo debug info and attempt release fix-release-duplicate-libs-b7136-bd119c7 Aaron Teo 2025-11-29 23:04:02 +08:00
  • fbc8f49f3c llama : simplify Georgi Gerganov 2025-11-29 15:58:59 +02:00
  • a00ecf21eb release: debug file info Aaron Teo 2025-11-29 22:50:24 +08:00
  • 7d2add51d8 sycl : support to malloc memory on device more than 4GB, update the doc and script (#17566) b7197 Neo Zhang 2025-11-29 20:59:44 +08:00
  • 76f6335fef release: disable release workflow for debug Aaron Teo 2025-11-29 20:59:41 +08:00
  • f698a79c63 ggml: replace hwcap with riscv_hwprobe for RVV detection (#17567) b7196 ixgbe 2025-11-29 20:56:31 +08:00
  • 751a3cf956 release: update release message Aaron Teo 2025-11-29 19:42:19 +08:00
  • f5335532e5 release: switch to .tar.gz Aaron Teo 2025-11-29 19:02:18 +08:00
  • 47a268ea50 Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900) b7195 Ruben Ortlam 2025-11-29 09:37:22 +01:00
  • 59d8d4e963 vulkan: improve topk perf for large k, fix overflow in unit tests (#17582) b7194 Jeff Bolz 2025-11-29 01:39:57 -06:00
  • d82b7a7c1d gguf-py : fix passing non-native endian tensors (editor-gui and new-metadata) (#17553) b7193 Aleksei Nikiforov 2025-11-28 20:53:01 +01:00
  • 03914c7ef8 common : move all common_chat_parse_* to chat-parser.cpp. (#17481) b7192 DAN™ 2025-11-28 13:29:36 -05:00
  • 3ce7a65c2f server: fix: /metrics endpoint returning JSON-escaped Prometheus format (#17386) b7191 o7si 2025-11-29 02:14:00 +08:00
  • e072b2052e ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276) b7190 Diego Devesa 2025-11-28 07:33:23 -08:00
  • 2464d1b3fc sampling : simplify Georgi Gerganov 2025-11-28 17:21:12 +02:00
  • 8cac9dee45 sampling : use logits directly for min-p filtering Daniel Bevenius 2025-11-28 16:12:05 +01:00
  • 333da805fe Add initial version for top-p sampling Oliver Simons 2025-11-28 15:08:20 +01:00
  • 117e2079a9 refactor : simplify and improve memory management Georgi Gerganov 2025-11-28 11:47:59 +02:00
  • c6f7a423c8 [MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551) b7189 R0CKSTAR 2025-11-28 21:08:29 +08:00
  • 459b7ae7b9 squash! sampling : support intermixed backend/cpu samplers Daniel Bevenius 2025-11-28 13:46:51 +01:00
  • 2e7ef98f18 ggml-cuda: add stricter checking for fusion (#17568) b7188 Aman Gupta 2025-11-28 20:34:51 +08:00
  • ddf9f94389 server : add Anthropic Messages API support (#17570) b7187 Fredrik Hultin 2025-11-28 12:57:04 +01:00
  • ff55414c42 model : Qwen3 Next (#16095) b7186 Piotr Wilkin (ilintar) 2025-11-28 12:02:56 +01:00
  • 73955f7d2a CUDA: no FP16 arithmetic for vector FA kernel (#17558) b7185 Johannes Gäßler 2025-11-28 10:29:09 +01:00
  • 35cf8887e1 vulkan: Implement GGML_OP_TRI (#17503) b7184 Jeff Bolz 2025-11-28 03:07:29 -06:00
  • 15d2b46b4d rpc : cache and reuse compute graphs (#15405) b7183 Radoslav Gerganov 2025-11-28 10:33:51 +02:00
  • 9ad6522be6 squash! sampling : support intermixed backend/cpu samplers Daniel Bevenius 2025-11-28 08:57:48 +01:00
  • 74be332e24 sampling : support intermixed backend/cpu samplers Daniel Bevenius 2025-11-27 19:39:41 +01:00
  • 6bca76ff5e HIP: enable mul_mat_f for RDNA4 (#17437) b7182 yulo 2025-11-28 15:24:30 +08:00
  • cd0e3a7a3b SOLVE_TRI CUDA kernel for small matrices (#17457) b7181 Piotr Wilkin (ilintar) 2025-11-28 05:15:32 +01:00
  • efaaccdd69 refactor pad_reflect_1d to make the UT case pass (#17204) b7180 Neo Zhang Jianyu 2025-11-28 08:50:56 +08:00
  • f9889cf1c7 Fix top-k comp & behavior for non-CUB path Oliver Simons 2025-11-27 16:40:41 +01:00
  • 4abef75f2c vulkan: Implement SOLVE_TRI (#17486) b7179 Jeff Bolz 2025-11-27 08:48:00 -06:00
  • c386114922 arch : add description about LLM_TENSOR_INFOS (#17550) b7178 Georgi Gerganov 2025-11-27 16:34:13 +02:00
  • e9d070980b sampling : remove backend sampling chain from common_sampler Daniel Bevenius 2025-11-27 15:28:37 +01:00
  • 6783b11fb0 models : fix LFM2 tensors (#17548) b7177 Georgi Gerganov 2025-11-27 16:04:29 +02:00
  • c6bba89ea9 arch : add description about LLM_TENSOR_INFOS gg/arch-add-desc Georgi Gerganov 2025-11-27 16:03:09 +02:00