Commit Graph

63 Commits

Author SHA1 Message Date
Georgi Gerganov
f49c636db0 llama-eval : protect dump() with lock for thread safety
Assisted-by: llama.cpp:local pi
2026-05-10 21:52:43 +03:00
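The commit above protects `dump()` with a lock. A minimal sketch of the idea, with a hypothetical `EvalState` holding its results behind one `threading.Lock` (field and method names assumed for illustration):

```python
import json
import threading

class EvalState:
    """Sketch: all mutations and serialization share one lock."""
    def __init__(self):
        self.lock = threading.Lock()
        self.results = {}

    def record(self, task_id, outcome):
        with self.lock:
            self.results[task_id] = outcome

    def dump(self):
        # Hold the lock while serializing so a concurrent record()
        # cannot mutate results mid-serialization.
        with self.lock:
            return json.dumps(self.results, sort_keys=True)
```

Without the lock in `dump()`, a worker thread calling `record()` during serialization could raise `RuntimeError: dictionary changed size during iteration`.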
Georgi Gerganov
d5165e8f2e llama-eval : require --grader-model or --model when using --grader-type llm
Assisted-by: llama.cpp:local pi
2026-05-10 21:49:58 +03:00
Georgi Gerganov
85c6aa006d llama-server-simulator : fix comment - Dice coefficient, not Levenshtein
Assisted-by: llama.cpp:local pi
2026-05-10 21:49:02 +03:00
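The corrected comment names the Dice coefficient rather than Levenshtein distance. A minimal sketch of bigram Dice similarity (the function name and bigram choice are assumptions, not the simulator's actual code):

```python
def dice_coefficient(a: str, b: str) -> float:
    """Sørensen–Dice similarity over character bigrams.
    Unlike Levenshtein distance, which counts edit operations,
    this measures the overlap of shared bigrams: 2|X∩Y| / (|X|+|Y|)."""
    bigrams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0  # two empty/one-char strings: treat as identical
    return 2 * len(x & y) / (len(x) + len(y))
```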
Georgi Gerganov
e5ac6d1da6 llama-eval : track model name in eval state and verify on resume
- Store model_name in EvalState and JSON output
- Display model in HTML summary table
- Verify --model matches stored model when resuming

Assisted-by: llama.cpp:local pi
2026-05-10 21:43:35 +03:00
Georgi Gerganov
094554dbcc llama-eval : update README with PR link and quick-start examples
Assisted-by: llama.cpp:local pi
2026-05-10 21:22:48 +03:00
Georgi Gerganov
f64d56bcd8 llama-server-simulator : replace Flask with stdlib http.server
- Use HTTPServer + BaseHTTPRequestHandler instead of Flask
- RequestHandler handles POST /v1/chat/completions
- Server runs in daemon thread with clean Ctrl+C shutdown
- Remove flask and unused asdict imports

Assisted-by: llama.cpp:local pi
2026-05-10 20:47:08 +03:00
ggerganov
43f14a0a46 llama-eval : support multiple evaluation endpoints with dynamic task distribution
- Add ServerConfig dataclass (url, threads, name)
- Accept comma-separated --server, --threads, --server-name CLI args
- Dynamic shared-queue task distribution across servers (fast servers do more work)
- One ThreadPoolExecutor per server, workers pull from shared Queue
- Track which server processed each task (server_name in results)
- Thread-safe EvalState with threading.Lock for concurrent mutations
- Server column in HTML report and console output
- Backward compatible: single server works as before

Assisted-by: llama.cpp:local pi
2026-05-10 20:42:23 +03:00
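The dynamic distribution described above — one `ThreadPoolExecutor` per server, all workers pulling from a single shared `Queue` so faster servers naturally pick up more tasks — can be sketched like this (the `send` callback and result tuple shape are assumptions):

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class ServerConfig:
    url: str
    threads: int
    name: str

def run_tasks(tasks, servers, send):
    """Shared-queue distribution sketch: every worker, regardless of
    which server's pool it belongs to, pulls from the same queue."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results, lock = [], threading.Lock()

    def worker(server):
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            out = send(server, task)
            with lock:
                # Track which server processed each task.
                results.append((server.name, task, out))

    pools = [ThreadPoolExecutor(s.threads) for s in servers]
    futures = [p.submit(worker, s)
               for p, s in zip(pools, servers)
               for _ in range(s.threads)]
    for f in futures:
        f.result()  # propagate worker exceptions
    for p in pools:
        p.shutdown()
    return results
```

With a single entry in `servers`, this degenerates to the original one-server behavior, which is what keeps the change backward compatible.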
Georgi Gerganov
d26b1ffcc9 llama-eval : rename display, escaped, and count variables to use prefix convention
- _display suffix → display_ prefix (answer, tokens, tps, t_gen)
- _escaped suffix → escaped_ prefix (response, prompt, reasoning)
- _count suffix → n_ prefix (correct, incorrect, pending)

Assisted-by: llama.cpp:local pi
2026-05-10 19:24:29 +03:00
Georgi Gerganov
9f10d8d195 llama-eval : add per-task generation time from server timings
Extract predicted_ms from the server timings response and store it as
t_gen_ms per task. Display in seconds with one decimal digit in console
progress, print_all_tasks, and HTML report.

Assisted-by: llama.cpp:local pi
2026-05-10 19:15:34 +03:00
Georgi Gerganov
4d5dedc569 llama-eval : add per-task generation speed from server timings
Extract predicted_per_second from the server timings response and store
it as tps_gen per task. Display in console progress, print_all_tasks,
and HTML report.

Assisted-by: llama.cpp:local pi
2026-05-10 19:05:20 +03:00
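The two commits above pull `predicted_ms` and `predicted_per_second` out of the server's `timings` object. A minimal extraction/formatting sketch (helper names are hypothetical; the timings field names are taken from the commit messages):

```python
def extract_timings(response: dict) -> dict:
    """Pull per-task generation speed and time out of a
    llama-server-style `timings` object (sketch)."""
    timings = response.get("timings", {})
    return {
        "tps_gen": timings.get("predicted_per_second"),  # tokens/s
        "t_gen_ms": timings.get("predicted_ms"),         # total gen time
    }

def format_t_gen(t_gen_ms) -> str:
    """Display generation time in seconds with one decimal digit,
    as described for console progress and the HTML report."""
    return "-" if t_gen_ms is None else f"{t_gen_ms / 1000:.1f}s"
```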
Georgi Gerganov
81a65cf035 eval : add Wilson score confidence interval to results
Compute 95% CI on-the-fly from completed cases. Displayed in
terminal output, HTML report, and JSON state.
2026-05-10 18:46:36 +03:00
Georgi Gerganov
7d433f767b eval : unify "judge" terminology to "grader"
Replace all occurrences of "judge" with "grader" for consistency
across the codebase (CLI args, Grader class fields, help text).

Assisted-by: llama.cpp:local pi
2026-05-10 18:23:28 +03:00
Georgi Gerganov
633a68d6c2 remove junk 2026-05-10 18:13:50 +03:00
Georgi Gerganov
e0a2cf48ca track total time 2026-05-10 18:13:50 +03:00
Georgi Gerganov
bad9565a1e refactor 2026-05-10 18:13:50 +03:00
Georgi Gerganov
752b703a5e reasoning and error handling 2026-05-10 18:13:50 +03:00
Georgi Gerganov
fc571f3a1e add tokens 2026-05-10 18:13:50 +03:00
Georgi Gerganov
6797d80dff store full response 2026-05-10 18:13:50 +03:00
Georgi Gerganov
3649793811 add html 2026-05-10 18:13:50 +03:00
Georgi Gerganov
7e8c88c5e0 fix prompts 2026-05-10 18:13:49 +03:00
Georgi Gerganov
2e0b6766f3 simplify 2026-05-10 18:13:49 +03:00
Georgi Gerganov
f95f4dd1ca fix counts 2026-05-10 18:13:49 +03:00
Georgi Gerganov
095c8ab655 cleanup 2026-05-10 18:13:49 +03:00
Georgi Gerganov
d830acacc5 resume eval 2026-05-10 18:13:49 +03:00
Georgi Gerganov
f35b10f0a9 ignore errors 2026-05-10 18:13:49 +03:00
Georgi Gerganov
802d85e26e add AGENTS.md 2026-05-10 18:13:49 +03:00
Georgi Gerganov
91bd92c6b6 cleanup 2026-05-10 18:13:48 +03:00
Georgi Gerganov
f20b5a72cf datasets : fix aime2025 2026-05-10 18:13:48 +03:00
Georgi Gerganov
122dfe3eab grade : improve regex + logs 2026-05-10 18:13:48 +03:00
Georgi Gerganov
8b94ab4f4a grader : update prompt 2026-05-10 18:13:48 +03:00
Georgi Gerganov
f99d77f3bd datasets : add aime2025 2026-05-10 18:13:48 +03:00
Georgi Gerganov
55a7cf4a06 cont 2026-05-10 18:13:48 +03:00
Georgi Gerganov
6e7e1a5a63 grader : improve example answers 2026-05-10 18:13:48 +03:00
Georgi Gerganov
9f02fa6382 rename 2026-05-10 18:13:47 +03:00
Georgi Gerganov
e7b8646098 add gpqa + sampling + docs 2026-05-10 18:13:47 +03:00
Georgi Gerganov
55ce1b4e2f datasets : add gsm8k 2026-05-10 18:13:47 +03:00
Georgi Gerganov
abec77e068 remove old files 2026-05-10 18:13:47 +03:00
Georgi Gerganov
65e3c5a928 docs 2026-05-10 18:13:47 +03:00
Georgi Gerganov
4f176f6a4d improve grader 2026-05-10 18:13:47 +03:00
Georgi Gerganov
9578e83ac2 minor 2026-05-10 18:13:47 +03:00
Georgi Gerganov
530f38f9c3 eval : support multiple dataset runs 2026-05-10 18:13:46 +03:00
Georgi Gerganov
cda8cae01a sim : fix answer matching 2026-05-10 18:13:46 +03:00
Georgi Gerganov
64720e1e01 test : fix path 2026-05-10 18:13:46 +03:00
Georgi Gerganov
1a780f7c44 eval : add prompts 2026-05-10 18:13:46 +03:00
Georgi Gerganov
940364e4c9 eval : print progress 2026-05-10 18:13:46 +03:00
Georgi Gerganov
ee9b715eb6 examples: add task summary table to llama-eval-new.py 2026-05-10 18:13:46 +03:00
Georgi Gerganov
d639ee52ea docs: update llama-eval-discussion.md with threading and model parameter updates
- Add threading support implementation details
- Document ThreadPoolExecutor usage and thread safety
- Add model parameter implementation details
- Include testing results for both features
2026-05-10 18:13:46 +03:00
Georgi Gerganov
fb40d1a04a examples: add threading support and model parameter to llama-eval-new.py
- Add ThreadPoolExecutor for parallel request processing controlled by --threads
- Add --model argument to specify model name in request data
- Refactor process() to use thread-safe _process_single_case() method
- Update progress tracking to work with concurrent execution
2026-05-10 18:13:45 +03:00
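The `--threads`/`--model` flags and the executor-based `process()` described in the bullets above can be sketched as follows (the `handle_case` callback stands in for the commit's `_process_single_case()`):

```python
import argparse
from concurrent.futures import ThreadPoolExecutor

def build_parser():
    """CLI flags as described in the commit (sketch)."""
    p = argparse.ArgumentParser()
    p.add_argument("--threads", type=int, default=1,
                   help="number of parallel request workers")
    p.add_argument("--model", default=None,
                   help="model name to send in request data")
    return p

def process(cases, handle_case, threads):
    # Each case is evaluated independently, so a plain
    # ThreadPoolExecutor.map parallelizes cleanly while
    # preserving the input order of results.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(handle_case, cases))
```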
Georgi Gerganov
2fe445cc60 docs: update llama-eval-discussion.md with session work summary 2026-05-10 18:13:45 +03:00
Georgi Gerganov
3732aea2df examples: use cached dataset path in simulator to avoid HF Hub requests 2026-05-10 18:13:45 +03:00