Commit Graph

  • aa8b62105c Support device-specific host buffer types if all underlying backends expose the same type. This allows using pinned memory instead of pageable memory for CUDA. Gaurav Garg 2026-02-16 15:39:26 +05:30
  • de956a6ca8 cleanup Georgi Gerganov 2026-02-16 12:02:16 +02:00
  • 350e7c1409 datasets : fix aime2025 Georgi Gerganov 2026-02-16 11:55:57 +02:00
  • db10dda1f3 grade : improve regex + logs Georgi Gerganov 2026-02-16 11:51:36 +02:00
  • 52759bf078 grader : update prompt Georgi Gerganov 2026-02-16 11:17:53 +02:00
  • 99e3c3d02c datasets : add aime2025 Georgi Gerganov 2026-02-16 11:07:54 +02:00
  • c6315655b7 cont Georgi Gerganov 2026-02-16 10:56:58 +02:00
  • f762a71d56 grader : improve example answers Georgi Gerganov 2026-02-16 10:51:41 +02:00
  • 73e61d5b75 rename Georgi Gerganov 2026-02-16 10:30:10 +02:00
  • d5dfc33027 graph : fix KQ mask, lora, cvec reuse checks (#19644) b8069 Georgi Gerganov 2026-02-16 09:21:11 +02:00
  • 267ba5a1d9 ggml: aarch64: Implement SVE in Gemm q4_k 8x8 q8_k Kernel (#19132) b8068 abhijain1204fujitsu 2026-02-16 12:08:43 +05:30
  • cffd268bb3 add gpqa + sampling + docs Georgi Gerganov 2026-02-16 00:52:17 +02:00
  • e8a807519a datasets : add gsm8k Georgi Gerganov 2026-02-15 23:19:46 +02:00
  • ff4affb4c1 sync : ggml b8067 Georgi Gerganov 2026-02-15 22:23:13 +02:00
  • 55d58599c8 ggml : bump version to 0.9.7 (ggml/1425) Georgi Gerganov 2026-02-15 22:21:04 +02:00
  • 1a8c700bfd ggml : bump version to 0.9.6 (ggml/1423) Georgi Gerganov 2026-02-07 09:58:02 +02:00
  • 1db8428f00 remove old files Georgi Gerganov 2026-02-15 22:16:54 +02:00
  • 7751ae2796 docs Georgi Gerganov 2026-02-15 22:15:50 +02:00
  • d2b10302ce improve grader Georgi Gerganov 2026-02-15 21:50:45 +02:00
  • 68dde884d6 minor Georgi Gerganov 2026-02-15 21:21:40 +02:00
  • fd90796da2 eval : support multiple dataset runs Georgi Gerganov 2026-02-02 22:34:25 +02:00
  • 8156d549f6 sim : fix answer matching Georgi Gerganov 2026-02-02 19:45:04 +02:00
  • 9695e6feb4 test : fix path Georgi Gerganov 2026-02-02 19:13:37 +02:00
  • fb1481d60d eval : add prompts Georgi Gerganov 2026-01-31 22:37:57 +02:00
  • 812ae13ec1 eval : print progress Georgi Gerganov 2026-01-31 19:33:37 +02:00
  • e79e8d02d5 examples: add task summary table to llama-eval-new.py Georgi Gerganov 2026-01-31 18:58:27 +02:00
  • a939f4c47e docs: update llama-eval-discussion.md with threading and model parameter updates Georgi Gerganov 2026-01-31 16:58:36 +02:00
  • 62b04cef54 examples: add threading support and model parameter to llama-eval-new.py Georgi Gerganov 2026-01-31 16:56:56 +02:00
  • 37b26cafee docs: update llama-eval-discussion.md with session work summary Georgi Gerganov 2026-01-31 16:41:55 +02:00
  • 04f6872116 examples: use cached dataset path in simulator to avoid HF Hub requests Georgi Gerganov 2026-01-31 16:39:51 +02:00
  • c2619c18bf examples: use cached dataset path to avoid HF Hub requests Georgi Gerganov 2026-01-31 16:38:46 +02:00
  • 87f8930968 examples: remove HF_HUB_OFFLINE to allow dataset download Georgi Gerganov 2026-01-31 16:33:45 +02:00
  • 9453f9de12 examples: use HF_HUB_OFFLINE to avoid HF Hub warnings Georgi Gerganov 2026-01-31 16:32:39 +02:00
  • 5a1be6ce37 examples: implement flexible grader system for answer validation Georgi Gerganov 2026-01-31 16:31:46 +02:00
  • a80814e97b docs: remove README.md from llama-eval Georgi Gerganov 2026-01-31 16:17:43 +02:00
  • 5cc2258e82 examples: add simplified llama-eval-new.py for AIME evaluation Georgi Gerganov 2026-01-31 16:17:06 +02:00
  • c87af1d527 docs: update llama-eval-discussion.md with session work summary Georgi Gerganov 2026-01-31 15:49:43 +02:00
  • 23d4e21a81 examples: refactor test-simulator.sh for better readability Georgi Gerganov 2026-01-31 15:45:47 +02:00
  • 07d5e1e0ea examples: add llama-server simulator for testing eval scripts Georgi Gerganov 2026-01-31 15:37:31 +02:00
  • 8839037528 add checkpointing gatbontonpc 2026-01-16 17:58:31 -05:00
  • 89cab3dbc5 Add readme gatbontonpc 2026-01-12 13:53:39 -05:00
  • c2d83ca048 multi source llama-eval gatbontonpc 2026-01-12 13:47:43 -05:00
  • c05df17ce3 working llama-eval mc and math suite gatbontonpc 2026-01-10 22:19:08 -08:00
  • 27b93cbd15 cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (#19624) b8064 David Friehs 2026-02-15 18:08:42 +01:00
  • 6e67fd2144 docs: update s390x build docs (#19643) Aaron Teo 2026-02-16 00:33:34 +08:00
  • 9e118b97c4 build : remove LLAMA_HTTPLIB option (#19623) b8062 Adrien Gallouët 2026-02-15 15:38:50 +01:00
  • 57088276d4 cmake : check if KleidiAI API has been fetched (#19640) b8061 Daniel Bevenius 2026-02-15 13:59:38 +01:00
  • 341bc7d23c context : fix output reorder with backend sampling (#19638) b8060 Georgi Gerganov 2026-02-15 14:57:40 +02:00
  • 08e6d914b8 ggml : avoid UB in gemm ukernel (#19642) b8059 Georgi Gerganov 2026-02-15 14:56:35 +02:00
  • 184c694f45 ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (#19399) b8058 Aaron Teo 2026-02-15 18:20:35 +08:00
  • 684b36101c ggml-cpu: FA add GEMM microkernel (#19422) b8057 Aman Gupta 2026-02-15 11:09:24 +05:30
  • 3a00c98584 cmake : fix KleidiAI install target failure with EXCLUDE_FROM_ALL (#19581) b8056 SamareshSingh 2026-02-14 23:22:53 -06:00
  • 079feab9e3 convert : ensure all models handle new experts count (#19621) b8055 Sigbjørn Skjæret 2026-02-14 22:22:32 +01:00
  • 01d8eaa28d mtmd : Add Nemotron Nano 12B v2 VL support (#19547) b8054 Anav Prasad 2026-02-14 05:07:00 -08:00
  • 1725e316c1 models : optimize qwen3next graph (#19375) b8053 Georgi Gerganov 2026-02-14 12:57:36 +02:00
  • b7742cf321 ggml : fix GGML_DEBUG with OpenMP (#19599) b8052 Adrien Gallouët 2026-02-14 11:22:57 +01:00
  • badba89320 NetBSD build support (#19589) b8051 iMil 2026-02-14 09:47:01 +01:00
  • baa12f3831 webui: Architecture and UI improvements (#19596) Aleksander Grygier 2026-02-14 09:06:41 +01:00
  • 2d8015e8a4 llama : update LoRA API. + fix excessive graph reserves (#19280) b8049 agent-enemy-2 2026-02-14 03:06:27 -05:00
  • eb145c0753 mmap: Fix Windows handle lifetime (#19598) b8048 George 2026-02-14 10:05:12 +02:00
  • 6e473fb384 metal : fix ACC op (#19427) b8047 Georgi Gerganov 2026-02-14 09:54:03 +02:00
  • c7db95f106 scripts : use official split.py for cpp-httplib (#19588) b8046 Adrien Gallouët 2026-02-14 08:41:16 +01:00
  • 0d00ef65ed convert : store ffn_gate_inp_shexp as F32 (#19606) Sigbjørn Skjæret 2026-02-14 08:17:43 +01:00
  • 91ea5d67f2 build : fix libtool call in build-xcframework.sh (#19605) Adrien Gallouët 2026-02-14 06:48:37 +01:00
  • dbb023336b vulkan: support L2_NORM with contiguous rows (#19604) b8043 Jeff Bolz 2026-02-13 21:42:04 -08:00
  • 53aef25a88 vulkan: support GGML_OP_SET (#19584) b8042 Jeff Bolz 2026-02-13 21:36:38 -08:00
  • 2dec548094 vulkan: Add vendor id for Qualcomm drivers (#19569) b8041 Sophon 2026-02-14 13:29:17 +08:00
  • 0ccbfdef3e hexagon: further optimizations and refactoring for flash attention (#19583) b8040 Max Krasnyansky 2026-02-13 16:27:30 -08:00
  • 94a602db66 github : add missing backends to issue templates (#19603) Mengsheng Wu 2026-02-13 15:56:53 -08:00
  • 05a6f0e894 vulkan: restore -inf check in FA shaders (#19582) b8038 Jeff Bolz 2026-02-13 11:35:29 -08:00
  • fd24533e89 better granularity estimate Johannes Gäßler 2026-02-13 18:20:44 +01:00
  • d8f97b99ed fix compilation Johannes Gäßler 2026-02-13 15:13:40 +01:00
  • b48e80f677 common : update download code (#19573) b8037 Adrien Gallouët 2026-02-13 15:10:46 +01:00
  • 752584d5f5 model: support GLM MoE DSA arch (NOTE: indexer is not yet supported) (#19460) b8036 Xuan-Son Nguyen 2026-02-13 14:56:53 +01:00
  • 98ab6727e4 arbitrary num. of GPUs/tensor split Johannes Gäßler 2026-02-13 11:45:05 +01:00
  • cc2aa81513 Fix wrong memcpy length for block_interleave == 4 (#19575) b8035 Alberto Cabrera Pérez 2026-02-13 12:32:14 +00:00
  • 0e21991472 fix vulkan ggml_acc only works in 3d but not 4d (#19426) b8034 ymcki 2026-02-13 20:31:37 +08:00
  • b2ecc0cdb4 support --verbose-prompt (#19576) b8033 Sigbjørn Skjæret 2026-02-13 12:49:10 +01:00
  • 5065da554e CUDA: loop over ne2*ne3 in case it overflows (#19538) b8032 Aman Gupta 2026-02-13 17:01:40 +05:30
  • 5174d7206f webui: UI and routing fixes (#19586) Aleksander Grygier 2026-02-13 12:31:00 +01:00
  • 9c7d45c0fc fix view_offs scaling Johannes Gäßler 2026-02-13 11:05:57 +01:00
  • 43919b7f4f CUDA: Do not mutate cgraph for fused ADDs (#19566) b8030 Oliver Simons 2026-02-13 10:37:55 +01:00
  • 423cf0b26f docs : fix broken link and typo (#19560) Pavan Shinde 2026-02-13 14:08:09 +05:30
  • 33a56f90a6 model : Kimi Linear fix conv state update (#19531) b8028 ymcki 2026-02-13 16:10:18 +08:00
  • 25224c8021 llama : remove deprecated codecvt (#19565) b8027 Adrien Gallouët 2026-02-13 06:43:53 +01:00
  • 2f5d8f8edc vendor : update BoringSSL to 0.20260211.0 (#19562) b8026 Adrien Gallouët 2026-02-13 06:43:26 +01:00
  • bb96bfd361 memory : fix kv cache size for hybrid models (#19559) b8025 Georgi Gerganov 2026-02-13 07:36:24 +02:00
  • 0644baefde metal : improve concurrency (#19555) b8024 Georgi Gerganov 2026-02-13 07:35:57 +02:00
  • 490eb96b88 metal : support GGML_OP_SET (#19548) b8023 Georgi Gerganov 2026-02-13 07:34:52 +02:00
  • 31e4f189bb support for tensor dims % n_devs != 0 Johannes Gäßler 2026-02-11 23:34:43 +01:00
  • 3bb78133ab hexagon: fix typo in vtcm_needs_release (#19545) b8022 Shupei Fan 2026-02-13 07:07:49 +08:00
  • 79cc0f2daf opencl: add basic support for q4_1 (#19534) b8021 lhez 2026-02-12 14:52:37 -08:00
  • 338085c69e args : add -kvu to llama-parallel (#19577) b8020 Georgi Gerganov 2026-02-12 21:52:41 +02:00
  • 5da56dc1d8 args : add -kvu to llama-parallel pr/19378 Georgi Gerganov 2026-02-12 21:50:01 +02:00
  • f8feadb20f metal : fix build Georgi Gerganov 2026-02-12 21:49:52 +02:00
  • 4c61875bf8 webui: Add switcher to Chat Message UI to show raw LLM output (#19571) Aleksander Grygier 2026-02-12 19:55:51 +01:00
  • 4b385bfcf8 vendor : update cpp-httplib (#19537) b8018 Adrien Gallouët 2026-02-12 16:11:22 +01:00
  • f488429380 llama : update outdated comment in llama.h (#19428) b8017 Christian Schmitz 2026-02-12 15:52:57 +01:00
  • b12a56351d Merge pull request #4 from gaugarg-nv/minor_fixes Johannes Gäßler 2026-02-12 14:19:13 +01:00
  • 9bb9d78368 Apply suggestion from @JohannesGaessler Johannes Gäßler 2026-02-12 14:18:49 +01:00