llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-13 04:24:17 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	3649793811	add html	2026-05-10 18:13:50 +03:00
Georgi Gerganov	7e8c88c5e0	fix prompts	2026-05-10 18:13:49 +03:00
Georgi Gerganov	2e0b6766f3	simplify	2026-05-10 18:13:49 +03:00
Georgi Gerganov	f95f4dd1ca	fix counts	2026-05-10 18:13:49 +03:00
Georgi Gerganov	095c8ab655	cleanup	2026-05-10 18:13:49 +03:00
Georgi Gerganov	d830acacc5	resume eval	2026-05-10 18:13:49 +03:00
Georgi Gerganov	f35b10f0a9	ignore errors	2026-05-10 18:13:49 +03:00
Georgi Gerganov	802d85e26e	add AGENTS.md	2026-05-10 18:13:49 +03:00
Georgi Gerganov	91bd92c6b6	cleanup	2026-05-10 18:13:48 +03:00
Georgi Gerganov	f20b5a72cf	datasets : fix aime2025	2026-05-10 18:13:48 +03:00
Georgi Gerganov	122dfe3eab	grade : improve regex + logs	2026-05-10 18:13:48 +03:00
Georgi Gerganov	8b94ab4f4a	grader : update prompt	2026-05-10 18:13:48 +03:00
Georgi Gerganov	f99d77f3bd	datasets : add aime2025	2026-05-10 18:13:48 +03:00
Georgi Gerganov	55a7cf4a06	cont	2026-05-10 18:13:48 +03:00
Georgi Gerganov	6e7e1a5a63	grader : improve example answers	2026-05-10 18:13:48 +03:00
Georgi Gerganov	9f02fa6382	rename	2026-05-10 18:13:47 +03:00
Georgi Gerganov	e7b8646098	add gpqa + sampling + docs	2026-05-10 18:13:47 +03:00
Georgi Gerganov	55ce1b4e2f	datasets : add gsm8k	2026-05-10 18:13:47 +03:00
Georgi Gerganov	abec77e068	remove old files	2026-05-10 18:13:47 +03:00
Georgi Gerganov	65e3c5a928	docs	2026-05-10 18:13:47 +03:00
Georgi Gerganov	4f176f6a4d	improve grader	2026-05-10 18:13:47 +03:00
Georgi Gerganov	9578e83ac2	minor	2026-05-10 18:13:47 +03:00
Georgi Gerganov	530f38f9c3	eval : support multiple dataset runs	2026-05-10 18:13:46 +03:00
Georgi Gerganov	cda8cae01a	sim : fix answer matching	2026-05-10 18:13:46 +03:00
Georgi Gerganov	64720e1e01	test : fix path	2026-05-10 18:13:46 +03:00
Georgi Gerganov	1a780f7c44	eval : add prompts	2026-05-10 18:13:46 +03:00
Georgi Gerganov	940364e4c9	eval : print progress	2026-05-10 18:13:46 +03:00
Georgi Gerganov	ee9b715eb6	examples: add task summary table to llama-eval-new.py	2026-05-10 18:13:46 +03:00
Georgi Gerganov	d639ee52ea	docs: update llama-eval-discussion.md with threading and model parameter updates - Add threading support implementation details - Document ThreadPoolExecutor usage and thread safety - Add model parameter implementation details - Include testing results for both features	2026-05-10 18:13:46 +03:00
Georgi Gerganov	fb40d1a04a	examples: add threading support and model parameter to llama-eval-new.py - Add ThreadPoolExecutor for parallel request processing controlled by --threads - Add --model argument to specify model name in request data - Refactor process() to use thread-safe _process_single_case() method - Update progress tracking to work with concurrent execution	2026-05-10 18:13:45 +03:00
Georgi Gerganov	2fe445cc60	docs: update llama-eval-discussion.md with session work summary	2026-05-10 18:13:45 +03:00
Georgi Gerganov	3732aea2df	examples: use cached dataset path in simulator to avoid HF Hub requests	2026-05-10 18:13:45 +03:00
Georgi Gerganov	edc766c919	examples: use cached dataset path to avoid HF Hub requests	2026-05-10 18:13:45 +03:00
Georgi Gerganov	d7d2c22909	examples: remove HF_HUB_OFFLINE to allow dataset download	2026-05-10 18:13:45 +03:00
Georgi Gerganov	30ea5124de	examples: use HF_HUB_OFFLINE to avoid HF Hub warnings	2026-05-10 18:13:45 +03:00
Georgi Gerganov	0ca458d892	examples: implement flexible grader system for answer validation - Add Grader class supporting regex and CLI-based grading - Implement built-in regex patterns for AIME, GSM8K, MMLU, HellaSwag, ARC, WinoGrande - Add CLI grader interface: python script.py --answer <pred> --expected <gold> - Add HF telemetry disable to avoid warnings - Support exact match requirement for regex patterns - Add 30-second timeout for CLI grader - Handle both boxed and plain text formats for AIME answers	2026-05-10 18:13:45 +03:00
Georgi Gerganov	de8eda468b	docs: remove README.md from llama-eval	2026-05-10 18:13:44 +03:00
Georgi Gerganov	a2b96e0444	examples: add simplified llama-eval-new.py for AIME evaluation - Create new simplified evaluation script focused only on AIME - Implement EvalState and Processor dataclasses for structured state management - Add real-time feedback showing correct/incorrect status per case - Abstract grading interface for external grader support - Use structured JSON output for eval state - Apply HuggingFace dataset caching to avoid repeated downloads - Remove Levenshtein matching - eval script only sends requests and validates answers	2026-05-10 18:13:44 +03:00
Georgi Gerganov	deed078654	docs: update llama-eval-discussion.md with session work summary Add summary of llama-server-simulator implementation work including features, testing results, technical decisions, and refactoring.	2026-05-10 18:13:44 +03:00
Georgi Gerganov	05b8425bd6	examples: refactor test-simulator.sh for better readability Extract repeating question string into TEST_QUESTION variable and create make_request() helper function to reduce code duplication. Add proper error handling for error responses.	2026-05-10 18:13:44 +03:00
Georgi Gerganov	58bd57ba99	examples: add llama-server simulator for testing eval scripts Add a standalone Python script that simulates a llama-server HTTP endpoint for testing the eval script. The simulator: - Implements /v1/chat/completions endpoint with OpenAI-compatible format - Loads AIME dataset from HuggingFace with local caching - Uses Levenshtein distance for intelligent question matching - Supports configurable success rate for correct/wrong answer generation - Provides debug logging for troubleshooting Also includes test scripts and documentation for testing and understanding the simulator functionality.	2026-05-10 18:13:44 +03:00
gatbontonpc	5cbe95b6e5	add checkpointing	2026-05-10 18:13:44 +03:00
gatbontonpc	c7f3ce25f5	Add readme	2026-05-10 18:13:44 +03:00
gatbontonpc	4db4497ca7	multi source llama-eval	2026-05-10 18:13:43 +03:00
gatbontonpc	db8b09d6e8	working llama-eval mc and math suite	2026-05-10 18:13:42 +03:00
Neo Zhang	6a2a2513dc	sycl : fix script error (#22795)	2026-05-08 06:54:57 +03:00
Shane Tran Whitmire	cfff1fc300	sycl : fix test script (#22737) The error "./examples/sycl/test.sh: line 122: level_zero:${$GGML_SYCL_DEVICE}: bad substitution" was thrown whenever the user ran "./examples/sycl/test.sh -mg 0". The fix is to remove the extra dollar sign.	2026-05-07 08:25:57 +03:00
Adrien Gallouët	bf76ac77be	common : only load backends when required (#22290) * common : only load backends when required * llama : call ggml_backend_load_all() directly from llama_backend_init() * Add ggml_backend_load_all() where llama_backend_init() is not used Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-05-05 09:23:50 +02:00
Georgi Gerganov	d6e7b033a4	llama : add option to save memory in device buffers (#22679) * llama : add option to save memory in device buffers * tests : extend llama-save-load-state	2026-05-05 06:35:07 +03:00
Shakhnazar Sailaukan	d8794eecd5	examples: refactor diffusion generation (#22590) * examples: refactor diffusion generation * renamed enum values	2026-05-04 20:19:30 +08:00

1677 Commits