Commit Graph

  • 7dbb0e998a examples : update args speculative-simple README.md [no ci] (#22938) master Daniel Bevenius 2026-05-11 13:00:57 +02:00
  • dd9280a664 vulkan: Support asymmetric FA in scalar/mmq/coopmat1 paths (#22589) Jeff Bolz 2026-05-11 05:49:03 -05:00
  • 8cef8201a1 CUDA: directly include cuda/iterator (#22936) Oliver Simons 2026-05-11 12:16:38 +02:00
  • f5636f8fc7 convert : add image break token fallback (#22914) Daniel Bevenius 2026-05-11 12:07:17 +02:00
  • c8f8e2364c cont : simplify gg/spec-mtp-experiments Georgi Gerganov 2026-05-11 09:41:00 +03:00
  • 838374375c vendor : update cpp-httplib to 0.44.0 (#22919) Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-11 03:47:13 -03:00
  • 7d442abf5c [SYCL] Add OP im2col_3d (#22903) b9102 Neo Zhang 2026-05-11 13:01:47 +08:00
  • c417ddfc74 fix batch size Aman Gupta 2026-05-11 12:22:37 +08:00
  • a428b010ab spec: support MTP Aman Gupta 2026-05-11 11:18:17 +08:00
  • 389ff61d77 server : print warning when HTTP timeout exceeded (#22907) b9101 Georgi Gerganov 2026-05-10 22:00:18 +03:00
  • f49c636db0 llama-eval : protect dump() with lock for thread safety gg/scripts-eval Georgi Gerganov 2026-05-10 21:52:43 +03:00
  • d5165e8f2e llama-eval : require --grader-model or --model when using --grader-type llm Georgi Gerganov 2026-05-10 21:49:58 +03:00
  • 85c6aa006d llama-server-simulator : fix comment - Dice coefficient, not Levenshtein Georgi Gerganov 2026-05-10 21:49:02 +03:00
  • e5ac6d1da6 llama-eval : track model name in eval state and verify on resume Georgi Gerganov 2026-05-10 21:43:35 +03:00
  • 094554dbcc llama-eval : update README with PR link and quick-start examples Georgi Gerganov 2026-05-10 21:22:48 +03:00
  • f64d56bcd8 llama-server-simulator : replace Flask with stdlib http.server Georgi Gerganov 2026-05-10 20:47:08 +03:00
  • 43f14a0a46 llama-eval : support multiple evaluation endpoints with dynamic task distribution ggerganov 2026-05-10 20:42:14 +03:00
  • 2e97c5f96f backend sampling: support returning post-sampling probs (#22622) b9100 Tim Neumann 2026-05-10 19:12:02 +02:00
  • 5d5d2e15d2 vendor : update cpp-httplib to 0.43.4 (#22888) b9099 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-10 13:46:54 -03:00
  • d26b1ffcc9 llama-eval : rename display, escaped, and count variables to use prefix convention Georgi Gerganov 2026-05-10 19:23:30 +03:00
  • 9f10d8d195 llama-eval : add per-task generation time from server timings Georgi Gerganov 2026-05-10 19:15:34 +03:00
  • 4d5dedc569 llama-eval : add per-task generation speed from server timings Georgi Gerganov 2026-05-10 19:05:20 +03:00
  • 81a65cf035 eval : add Wilson score confidence interval to results Georgi Gerganov 2026-05-10 18:46:36 +03:00
  • 2b2babd124 ggml-virtgpu : include missing mutex header (#22810) Oliver Walsh 2026-05-10 16:32:41 +01:00
  • 7d433f767b eval : unify "judge" terminology to "grader" Georgi Gerganov 2026-05-10 18:23:28 +03:00
  • 633a68d6c2 remove junk Georgi Gerganov 2026-03-29 17:31:04 +03:00
  • e0a2cf48ca track total time Georgi Gerganov 2026-02-23 21:22:02 +02:00
  • bad9565a1e refactor Georgi Gerganov 2026-02-16 23:02:45 +02:00
  • 752b703a5e resoning and error handling Georgi Gerganov 2026-02-16 22:16:15 +02:00
  • fc571f3a1e add tokens Georgi Gerganov 2026-02-16 21:52:54 +02:00
  • 6797d80dff store full response Georgi Gerganov 2026-02-16 21:44:29 +02:00
  • 3649793811 add html Georgi Gerganov 2026-02-16 21:22:06 +02:00
  • 7e8c88c5e0 fix prompts Georgi Gerganov 2026-02-16 21:02:25 +02:00
  • 2e0b6766f3 simplify Georgi Gerganov 2026-02-16 19:47:06 +02:00
  • f95f4dd1ca fix counts Georgi Gerganov 2026-02-16 16:38:31 +02:00
  • 095c8ab655 cleanup Georgi Gerganov 2026-02-16 16:31:14 +02:00
  • d830acacc5 resume eval Georgi Gerganov 2026-02-16 16:21:36 +02:00
  • f35b10f0a9 ignore errors Georgi Gerganov 2026-02-16 15:23:23 +02:00
  • 802d85e26e add AGENTS.md Georgi Gerganov 2026-02-16 13:08:56 +02:00
  • 91bd92c6b6 cleanup Georgi Gerganov 2026-02-16 12:02:16 +02:00
  • f20b5a72cf datasets : fix aime2025 Georgi Gerganov 2026-02-16 11:55:57 +02:00
  • 122dfe3eab grade : improve regex + logs Georgi Gerganov 2026-02-16 11:51:36 +02:00
  • 8b94ab4f4a grader : update prompt Georgi Gerganov 2026-02-16 11:17:53 +02:00
  • f99d77f3bd datasets : add aime2025 Georgi Gerganov 2026-02-16 11:07:54 +02:00
  • 55a7cf4a06 cont Georgi Gerganov 2026-02-16 10:56:58 +02:00
  • 6e7e1a5a63 grader : improve example answers Georgi Gerganov 2026-02-16 10:51:41 +02:00
  • 9f02fa6382 rename Georgi Gerganov 2026-02-16 10:30:10 +02:00
  • e7b8646098 add gpqa + sampling + docs Georgi Gerganov 2026-02-16 00:52:17 +02:00
  • 55ce1b4e2f datasets : add gsm8k Georgi Gerganov 2026-02-15 23:19:46 +02:00
  • abec77e068 remove old files Georgi Gerganov 2026-02-15 22:16:54 +02:00
  • 65e3c5a928 docs Georgi Gerganov 2026-02-15 22:15:50 +02:00
  • 4f176f6a4d improve grader Georgi Gerganov 2026-02-15 21:50:45 +02:00
  • 9578e83ac2 minor Georgi Gerganov 2026-02-15 21:21:40 +02:00
  • 530f38f9c3 eval : support multiple dataset runs Georgi Gerganov 2026-02-02 22:34:25 +02:00
  • cda8cae01a sim : fix answer matching Georgi Gerganov 2026-02-02 19:45:04 +02:00
  • 64720e1e01 test : fix path Georgi Gerganov 2026-02-02 19:13:37 +02:00
  • 1a780f7c44 eval : add prompts Georgi Gerganov 2026-01-31 22:37:57 +02:00
  • 940364e4c9 eval : print progress Georgi Gerganov 2026-01-31 19:33:37 +02:00
  • ee9b715eb6 examples: add task summary table to llama-eval-new.py Georgi Gerganov 2026-01-31 18:58:27 +02:00
  • d639ee52ea docs: update llama-eval-discussion.md with threading and model parameter updates Georgi Gerganov 2026-01-31 16:58:36 +02:00
  • fb40d1a04a examples: add threading support and model parameter to llama-eval-new.py Georgi Gerganov 2026-01-31 16:56:56 +02:00
  • 2fe445cc60 docs: update llama-eval-discussion.md with session work summary Georgi Gerganov 2026-01-31 16:41:55 +02:00
  • 3732aea2df examples: use cached dataset path in simulator to avoid HF Hub requests Georgi Gerganov 2026-01-31 16:39:51 +02:00
  • edc766c919 examples: use cached dataset path to avoid HF Hub requests Georgi Gerganov 2026-01-31 16:38:46 +02:00
  • d7d2c22909 examples: remove HF_HUB_OFFLINE to allow dataset download Georgi Gerganov 2026-01-31 16:33:45 +02:00
  • 30ea5124de examples: use HF_HUB_OFFLINE to avoid HF Hub warnings Georgi Gerganov 2026-01-31 16:32:39 +02:00
  • 0ca458d892 examples: implement flexible grader system for answer validation Georgi Gerganov 2026-01-31 16:31:46 +02:00
  • de8eda468b docs: remove README.md from llama-eval Georgi Gerganov 2026-01-31 16:17:43 +02:00
  • a2b96e0444 examples: add simplified llama-eval-new.py for AIME evaluation Georgi Gerganov 2026-01-31 16:17:06 +02:00
  • deed078654 docs: update llama-eval-discussion.md with session work summary Georgi Gerganov 2026-01-31 15:49:43 +02:00
  • 05b8425bd6 examples: refactor test-simulator.sh for better readability Georgi Gerganov 2026-01-31 15:45:47 +02:00
  • 58bd57ba99 examples: add llama-server simulator for testing eval scripts Georgi Gerganov 2026-01-31 15:37:31 +02:00
  • 5cbe95b6e5 add checkpointing gatbontonpc 2026-01-16 17:58:31 -05:00
  • c7f3ce25f5 Add readme gatbontonpc 2026-01-12 13:53:39 -05:00
  • 4db4497ca7 multi source llama-eval gatbontonpc 2026-01-12 13:47:43 -05:00
  • db8b09d6e8 working llama-eval mc and math suite gatbontonpc 2026-01-10 22:19:08 -08:00
  • 0b047287fe sync : ggml b9097 Georgi Gerganov 2026-05-10 16:59:29 +03:00
  • efbada936f ggml : bump version to 0.11.1 (ggml/1484) Georgi Gerganov 2026-05-10 16:57:19 +03:00
  • f3c3e0e9a0 internal AllReduce kernel for CUDA provider (#22299) b9095 scutler-nv 2026-05-10 02:05:22 -07:00
  • 5755a100cd model : fix model type check for granite/llama3 and deepseek2/glm4.7 lite (#22870) b9094 Sigbjørn Skjæret 2026-05-10 08:44:29 +02:00
  • f0210cc40d Converge implementation with export-graph-ops cross-profiler Piotr Wilkin 2026-04-07 22:01:00 +02:00
  • 2bbcc61af7 Add missing op parameters to the profiler; add support for test-backend-ops to run performance tests with exactly the tensor shapes from the run Piotr Wilkin 2026-04-03 17:41:57 +02:00
  • ff5d0bbc11 docs, pass copy details Piotr Wilkin 2026-03-29 23:35:38 +02:00
  • 395c43eadf fix mul_mat_id stats, add throughput stat, add envvar trigger, add concurrent mode fix Piotr Wilkin 2026-03-29 22:52:33 +02:00
  • b1252bcd73 fix builds, integrate vulkan profiler, fix copy events, fix export Piotr Wilkin 2026-03-29 16:52:50 +02:00
  • 8291ecd707 Fix more missing backend stuff (and Python errors) Piotr Wilkin 2026-03-29 01:57:02 +01:00
  • 391d2bf23a add second dimension to reported tensors, fix Mac build, add missing initializer to all backends Piotr Wilkin 2026-03-29 01:49:52 +01:00
  • 2e66d2c130 feat: cool profiler thingy Piotr Wilkin 2026-03-29 01:14:09 +01:00
  • 1e5ad35d56 model : add sarvam_moe architecture support (#20275) b9093 Sumit Chatterjee 2026-05-10 00:31:50 +10:00
  • 65d7a8bbf0 devops : updated Nix systems (#22869) Yuannan 2026-05-09 14:15:03 +00:00
  • db8e326913 spec : introduce common_speculative_process() gg/spec-refactor-parallel Georgi Gerganov 2026-05-09 17:12:24 +03:00
  • 0d5dd61d66 spec : reset drafting flag at the end Georgi Gerganov 2026-05-09 17:12:06 +03:00
  • ec8bc44854 cont : minor Georgi Gerganov 2026-05-09 15:28:29 +03:00
  • b3bd3bd4cc cont : clean-up Georgi Gerganov 2026-05-09 14:09:45 +03:00
  • 00d56b11c3 docker : upgraded the default intel compute-runtime version (#22567) Davi Henrique Linhares 2026-05-09 05:22:23 -03:00
  • 5757c4dcb1 cmake : update BoringSSL to 0.20260508.0 (#22839) b9090 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-09 04:26:33 -03:00
  • ce0acf03ea server, spec : clean-up Georgi Gerganov 2026-05-09 10:21:57 +03:00
  • e20b83930c SYCL: reduce allocation overhead during flash attention (#22732) b9089 Alexey Kopytko 2026-05-09 15:30:39 +09:00
  • fd89556567 [SYCL] Add BF16 support to GET_ROWS operation (#21391) b9088 Devedse 2026-05-09 07:50:24 +02:00
  • 60489932ec sycl: Q5_K reorder MMVQ/dequant + Q8_0 reorder MMVQ path (#22152) b9087 Intel AI Get-to Market Customer Success and Solutions 2026-05-08 22:48:07 -07:00