llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2026-05-12 03:54:06 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	55ce1b4e2f	datasets : add gsm8k	2026-05-10 18:13:47 +03:00
Georgi Gerganov	abec77e068	remove old files	2026-05-10 18:13:47 +03:00
Georgi Gerganov	65e3c5a928	docs	2026-05-10 18:13:47 +03:00
Georgi Gerganov	4f176f6a4d	improve grader	2026-05-10 18:13:47 +03:00
Georgi Gerganov	9578e83ac2	minor	2026-05-10 18:13:47 +03:00
Georgi Gerganov	530f38f9c3	eval : support multiple dataset runs	2026-05-10 18:13:46 +03:00
Georgi Gerganov	cda8cae01a	sim : fix answer matching	2026-05-10 18:13:46 +03:00
Georgi Gerganov	64720e1e01	test : fix path	2026-05-10 18:13:46 +03:00
Georgi Gerganov	1a780f7c44	eval : add prompts	2026-05-10 18:13:46 +03:00
Georgi Gerganov	940364e4c9	eval : print progress	2026-05-10 18:13:46 +03:00
Georgi Gerganov	ee9b715eb6	examples: add task summary table to llama-eval-new.py	2026-05-10 18:13:46 +03:00
Georgi Gerganov	d639ee52ea	docs: update llama-eval-discussion.md with threading and model parameter updates - Add threading support implementation details - Document ThreadPoolExecutor usage and thread safety - Add model parameter implementation details - Include testing results for both features	2026-05-10 18:13:46 +03:00
Georgi Gerganov	fb40d1a04a	examples: add threading support and model parameter to llama-eval-new.py - Add ThreadPoolExecutor for parallel request processing controlled by --threads - Add --model argument to specify model name in request data - Refactor process() to use thread-safe _process_single_case() method - Update progress tracking to work with concurrent execution	2026-05-10 18:13:45 +03:00
Georgi Gerganov	2fe445cc60	docs: update llama-eval-discussion.md with session work summary	2026-05-10 18:13:45 +03:00
Georgi Gerganov	3732aea2df	examples: use cached dataset path in simulator to avoid HF Hub requests	2026-05-10 18:13:45 +03:00
Georgi Gerganov	edc766c919	examples: use cached dataset path to avoid HF Hub requests	2026-05-10 18:13:45 +03:00
Georgi Gerganov	d7d2c22909	examples: remove HF_HUB_OFFLINE to allow dataset download	2026-05-10 18:13:45 +03:00
Georgi Gerganov	30ea5124de	examples: use HF_HUB_OFFLINE to avoid HF Hub warnings	2026-05-10 18:13:45 +03:00
Georgi Gerganov	0ca458d892	examples: implement flexible grader system for answer validation - Add Grader class supporting regex and CLI-based grading - Implement built-in regex patterns for AIME, GSM8K, MMLU, HellaSwag, ARC, WinoGrande - Add CLI grader interface: python script.py --answer <pred> --expected <gold> - Add HF telemetry disable to avoid warnings - Support exact match requirement for regex patterns - Add 30-second timeout for CLI grader - Handle both boxed and plain text formats for AIME answers	2026-05-10 18:13:45 +03:00
Georgi Gerganov	de8eda468b	docs: remove README.md from llama-eval	2026-05-10 18:13:44 +03:00
Georgi Gerganov	a2b96e0444	examples: add simplified llama-eval-new.py for AIME evaluation - Create new simplified evaluation script focused only on AIME - Implement EvalState and Processor dataclasses for structured state management - Add real-time feedback showing correct/incorrect status per case - Abstract grading interface for external grader support - Use structured JSON output for eval state - Apply HuggingFace dataset caching to avoid repeated downloads - Remove Levenshtein matching - eval script only sends requests and validates answers	2026-05-10 18:13:44 +03:00
Georgi Gerganov	deed078654	docs: update llama-eval-discussion.md with session work summary Add summary of llama-server-simulator implementation work including features, testing results, technical decisions, and refactoring.	2026-05-10 18:13:44 +03:00
Georgi Gerganov	05b8425bd6	examples: refactor test-simulator.sh for better readability Extract repeating question string into TEST_QUESTION variable and create make_request() helper function to reduce code duplication. Add proper error handling for error responses.	2026-05-10 18:13:44 +03:00
Georgi Gerganov	58bd57ba99	examples: add llama-server simulator for testing eval scripts Add a standalone Python script that simulates a llama-server HTTP endpoint for testing the eval script. The simulator: - Implements /v1/chat/completions endpoint with OpenAI-compatible format - Loads AIME dataset from HuggingFace with local caching - Uses Levenshtein distance for intelligent question matching - Supports configurable success rate for correct/wrong answer generation - Provides debug logging for troubleshooting Also includes test scripts and documentation for testing and understanding the simulator functionality.	2026-05-10 18:13:44 +03:00
gatbontonpc	5cbe95b6e5	add checkpointing	2026-05-10 18:13:44 +03:00
gatbontonpc	c7f3ce25f5	Add readme	2026-05-10 18:13:44 +03:00
gatbontonpc	4db4497ca7	multi source llama-eval	2026-05-10 18:13:43 +03:00
gatbontonpc	db8b09d6e8	working llama-eval mc and math suite	2026-05-10 18:13:42 +03:00
Neo Zhang	6a2a2513dc	sycl : fix script error (#22795 )	2026-05-08 06:54:57 +03:00
Shane Tran Whitmire	cfff1fc300	sycl : fix test script (#22737 ) The error: ./examples/sycl/test.sh: line 122: level_zero:${$GGML_SYCL_DEVICE}: bad substitution was thrown whenever the user used this command: ./examples/sycl/test.sh -mg 0 Fix is to get rid of a dollar sign.	2026-05-07 08:25:57 +03:00
Adrien Gallouët	bf76ac77be	common : only load backends when required (#22290 ) * common : only load backends when required Signed-off-by: Adrien Gallouët <angt@huggingface.co> * llama : call ggml_backend_load_all() directly from llama_backend_init() Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Add ggml_backend_load_all() where llama_backend_init() is not used Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-05-05 09:23:50 +02:00
Georgi Gerganov	d6e7b033a4	llama : add option to save memory in device buffers (#22679 ) * llama : add option to save memory in device buffers * tests : extend llama-save-load-state	2026-05-05 06:35:07 +03:00
Shakhnazar Sailaukan	d8794eecd5	examples: refactor diffusion generation (#22590 ) * examples: refactor diffusion generation * renamed enum values	2026-05-04 20:19:30 +08:00
Peter Sideris	b42c7fa5b8	spec : fix vocab compat checks in spec example (#22426 ) * port #22358 PR to examples/speculative/speculative.cpp * use vocab_[tgt,dft] instead of ctx_[tgt,dft] when logging on draft model / target model vocabulary mismatch Co-authored-by: Petros Sideris <petros.sideris@nokia.com>	2026-04-30 08:18:25 +03:00
Georgi Gerganov	14e733e36f	spec : refactor params (#22397 ) * spec : refactor params * cont : fix * cont : rename "sparam" to "sampling" * cont : add spec params category * cont : add info about removed arguments * cont : skip param length check for spec params * cont : adapt server tests	2026-04-28 09:07:33 +03:00
Max Krasnyansky	5594d13224	common: fix missing exports in llama-common (#22340 ) * common: refactor common/debug to move abort_on_nan into base_callback_data Passing bool abort_on_nan as template parameter for common_debug_cb_eval is unnecessary and creates an issue with LTO. It should just be a member of the base_callback_data instead. * cont : cleanup * common : use pimpl in debug.h to reduce header dependencies Move common_debug_cb_user_data's data members (std::regex, std::vector<uint8_t>) into a private impl struct in debug.cpp. This removes the includes of common.h and <regex> from debug.h, reducing transitive dependencies for any translation unit that includes the header. Assisted-by: llama.cpp:local pi --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-04-27 08:06:39 +03:00
Neo Zhang	eddd7a13a5	[SYCL] Optimize Q4_0 mul_mat for Arc770, add scripts (#22291 ) * opt arc770 for Q4_0 * add for Q4_0 * update the script * add help script for windows * update guide * fix format issue * convert from dos to unix for format issue * fix missed -sm parameter	2026-04-25 09:20:14 +03:00
Daniel Bevenius	9012c50fc8	model-conversion : fix mmproj output file name [no ci] (#22274 ) * model-conversion : fix mmproj output file name [no ci] This commit updates the convert-model.sh script to properly handle mmproj output files. The motivation for this that currently the same name as the original model is used as the mmproj file, which causes the original model to be overwritten and no mmproj-<model_name>.gguf to be created. * model-conversion : use MODEL_NAME [no ci]	2026-04-23 15:07:38 +02:00
Georgi Gerganov	bcb5eeb645	speculative-simple : add checkpoint support (#22227 ) * speculative-simple : add checkpoint support * cont : fix build	2026-04-22 15:44:45 +03:00
Sigbjørn Skjæret	23b8cc4991	android : libcommon -> libllama-common (#22076 )	2026-04-18 11:19:40 +02:00
Georgi Gerganov	6990e2f1f7	libs : rename libcommon -> libllama-common (#21936 ) * cmake : allow libcommon to be shared * cmake : rename libcommon to libllama-common * cont : set -fPIC for httplib * cont : export all symbols * cont : fix build_info exports * libs : add libllama-common-base * log : add common_log_get_verbosity_thold()	2026-04-17 11:11:46 +03:00
Matt	e39eba26f3	read n_ctx back after making llama_context (#21939 )	2026-04-15 15:24:57 +08:00
Daniel Bevenius	c8ac02fa1b	requirements : update transformers to 5.5.1 (#21617 ) * requirements : update transformers to 5.5.0 This commit updates the transformers dependency to version 5.5.0. The motivation for this is that transformers 5.5.0 includes support for Gemma4 and is required to be able to convert Gemma4 models. This is also causing issues for user of gguf-my-repo. Refs: https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/202 * fix huggingface_hub version * set version of transformers to 5.5.0 * convert : add ty ignore directives to convert_hf_to_gguf.py This commit adds `ty: ignore` directives to transformers tokenizers field/methods to avoid type check errors. There might be better ways to handle this and perhaps this can be done in a follow up commit. The motivation for this is that it looks like in transformers 5.5.0 AutoTokenizer.from_pretrained can return generic tokenizer types or None and the type checker now produces an error when the conversion script accesses field like tokenizer.vocab. * convert : add ty ignore to suppress type check errors * convert : remove incorrect type ignores * convert : fix remaining python checks I was running a newer version of ty locally but I've switched to version 0.0.26 which is what CI uses and I was then able to reproduce the errors. Sorry about the noise. * update transformers version to 5.5.1	2026-04-09 12:36:29 +02:00
Daniel Bevenius	87f4744a80	examples : disable cb_eval callback for --save-logits (#21553 ) This commit updates the debug example to not create the base_callback_data. The motivation for this is when using `--save-logits`, which is used by examples/model-conversion scripts, we often don't care about the tensor outputs and they just add noise to the output. This changes is quiet by default we can always remove --save-logits to get the tensor outputs when debugging.	2026-04-08 14:10:33 +02:00
Xuan-Son Nguyen	63f8fe0ef4	model, mtmd: fix gguf conversion for audio/vision mmproj (#21309 ) * fix gguf conversion for audio/vision mmproj * fix test	2026-04-02 17:10:32 +02:00
Adrien Gallouët	41361c8599	common : move up common_init() and fix Windows UTF-8 logs (#21176 ) The build info is now only for debug, so we avoid the duplicate with `--version`. The UTF-8 setup at the beginning is needed to avoid logging garbage on Windows. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-31 12:53:41 +02:00
Sigbjørn Skjæret	e2eb39e81c	ci : bump ty to 0.0.26 (#21156 ) * fix incorrect type ignore comments * bump ty to 0.0.26	2026-03-30 09:29:15 +02:00
Neo Zhang	afe65aa282	[SYCL] Enhance build script to use half cores to build, avoid OS hang (#21093 ) * use half cores to build, avoid OS hang * reduce the output text num to short test time * avoid to return 0	2026-03-29 09:02:45 +08:00
yikechayedan	406f4e3f61	android : fix-pointer-dangling (#20974 )	2026-03-25 11:51:26 +02:00
Sigbjørn Skjæret	29b28a9824	ci : switch from pyright to ty (#20826 ) * type fixes * switch to ty * tweak rules * tweak more rules * more tweaks * final tweak * use common import-not-found rule	2026-03-21 08:54:34 +01:00

1660 Commits