llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-13 12:34:05 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	c6d70b9bea	add AGENTS.md	2026-02-16 13:13:35 +02:00
Georgi Gerganov	de956a6ca8	cleanup	2026-02-16 12:02:16 +02:00
Georgi Gerganov	350e7c1409	datasets : fix aime2025	2026-02-16 11:55:57 +02:00
Georgi Gerganov	db10dda1f3	grade : improve regex + logs	2026-02-16 11:51:36 +02:00
Georgi Gerganov	52759bf078	grader : update prompt	2026-02-16 11:17:53 +02:00
Georgi Gerganov	99e3c3d02c	datasets : add aime2025	2026-02-16 11:07:54 +02:00
Georgi Gerganov	c6315655b7	cont	2026-02-16 10:56:58 +02:00
Georgi Gerganov	f762a71d56	grader : improve example answers	2026-02-16 10:51:41 +02:00
Georgi Gerganov	73e61d5b75	rename	2026-02-16 10:30:10 +02:00
Georgi Gerganov	cffd268bb3	add gpqa + sampling + docs	2026-02-16 00:52:33 +02:00
Georgi Gerganov	e8a807519a	datasets : add gsm8k	2026-02-15 23:19:46 +02:00
Georgi Gerganov	1db8428f00	remove old files	2026-02-15 22:16:54 +02:00
Georgi Gerganov	7751ae2796	docs	2026-02-15 22:15:50 +02:00
Georgi Gerganov	d2b10302ce	improve grader	2026-02-15 22:12:02 +02:00
Georgi Gerganov	68dde884d6	minor	2026-02-15 21:21:40 +02:00
Georgi Gerganov	fd90796da2	eval : support multiple dataset runs	2026-02-15 21:08:24 +02:00
Georgi Gerganov	8156d549f6	sim : fix answer matching	2026-02-15 21:08:24 +02:00
Georgi Gerganov	9695e6feb4	test : fix path	2026-02-15 21:08:24 +02:00
Georgi Gerganov	fb1481d60d	eval : add prompts	2026-02-15 21:08:24 +02:00
Georgi Gerganov	812ae13ec1	eval : print progress	2026-02-15 21:08:24 +02:00
Georgi Gerganov	e79e8d02d5	examples: add task summary table to llama-eval-new.py	2026-02-15 21:08:23 +02:00
Georgi Gerganov	a939f4c47e	docs: update llama-eval-discussion.md with threading and model parameter updates - Add threading support implementation details - Document ThreadPoolExecutor usage and thread safety - Add model parameter implementation details - Include testing results for both features	2026-02-15 21:08:23 +02:00
Georgi Gerganov	62b04cef54	examples: add threading support and model parameter to llama-eval-new.py - Add ThreadPoolExecutor for parallel request processing controlled by --threads - Add --model argument to specify model name in request data - Refactor process() to use thread-safe _process_single_case() method - Update progress tracking to work with concurrent execution	2026-02-15 21:08:23 +02:00
Georgi Gerganov	37b26cafee	docs: update llama-eval-discussion.md with session work summary	2026-02-15 21:08:23 +02:00
Georgi Gerganov	04f6872116	examples: use cached dataset path in simulator to avoid HF Hub requests	2026-02-15 21:08:23 +02:00
Georgi Gerganov	c2619c18bf	examples: use cached dataset path to avoid HF Hub requests	2026-02-15 21:08:23 +02:00
Georgi Gerganov	87f8930968	examples: remove HF_HUB_OFFLINE to allow dataset download	2026-02-15 21:08:23 +02:00
Georgi Gerganov	9453f9de12	examples: use HF_HUB_OFFLINE to avoid HF Hub warnings	2026-02-15 21:08:23 +02:00
Georgi Gerganov	5a1be6ce37	examples: implement flexible grader system for answer validation - Add Grader class supporting regex and CLI-based grading - Implement built-in regex patterns for AIME, GSM8K, MMLU, HellaSwag, ARC, WinoGrande - Add CLI grader interface: python script.py --answer <pred> --expected <gold> - Add HF telemetry disable to avoid warnings - Support exact match requirement for regex patterns - Add 30-second timeout for CLI grader - Handle both boxed and plain text formats for AIME answers	2026-02-15 21:08:23 +02:00
Georgi Gerganov	a80814e97b	docs: remove README.md from llama-eval	2026-02-15 21:08:23 +02:00
Georgi Gerganov	5cc2258e82	examples: add simplified llama-eval-new.py for AIME evaluation - Create new simplified evaluation script focused only on AIME - Implement EvalState and Processor dataclasses for structured state management - Add real-time feedback showing correct/incorrect status per case - Abstract grading interface for external grader support - Use structured JSON output for eval state - Apply HuggingFace dataset caching to avoid repeated downloads - Remove Levenshtein matching - eval script only sends requests and validates answers	2026-02-15 21:08:22 +02:00
Georgi Gerganov	c87af1d527	docs: update llama-eval-discussion.md with session work summary Add summary of llama-server-simulator implementation work including features, testing results, technical decisions, and refactoring.	2026-02-15 21:08:22 +02:00
Georgi Gerganov	23d4e21a81	examples: refactor test-simulator.sh for better readability Extract repeating question string into TEST_QUESTION variable and create make_request() helper function to reduce code duplication. Add proper error handling for error responses.	2026-02-15 21:08:22 +02:00
Georgi Gerganov	07d5e1e0ea	examples: add llama-server simulator for testing eval scripts Add a standalone Python script that simulates a llama-server HTTP endpoint for testing the eval script. The simulator: - Implements /v1/chat/completions endpoint with OpenAI-compatible format - Loads AIME dataset from HuggingFace with local caching - Uses Levenshtein distance for intelligent question matching - Supports configurable success rate for correct/wrong answer generation - Provides debug logging for troubleshooting Also includes test scripts and documentation for testing and understanding the simulator functionality.	2026-02-15 21:08:22 +02:00
gatbontonpc	8839037528	add checkpointing	2026-02-15 21:08:22 +02:00
gatbontonpc	89cab3dbc5	Add readme	2026-02-15 21:08:22 +02:00
gatbontonpc	c2d83ca048	multi source llama-eval	2026-02-15 21:08:22 +02:00
gatbontonpc	c05df17ce3	working llama-eval mc and math suite	2026-02-15 21:08:19 +02:00
Daniel Bevenius	6ab881b7c3	model-conversion : add tensor-info.py utility (#18954) This commit adds a new python script that can be used to print tensors information from a tensor in a safetensors model. The motivation for this is that during model conversion work it can sometimes be useful to verify the shape of tensors in the original model. While it is possible to print the tensors when loading the model this can be slow when working with larger models. With this script it is possible to quickly query tensor shapes. Example usage: ```console (venv) $ ./scripts/utils/tensor-info.py --help usage: tensor-info.py [-h] [-m MODEL_PATH] [-l] [tensor_name] Print tensor information from a safetensors model positional arguments: tensor_name Name of the tensor to inspect options: -h, --help show this help message and exit -m MODEL_PATH, --model-path MODEL_PATH Path to the model directory (default: MODEL_PATH environment variable) -l, --list List unique tensor patterns in the model (layer numbers replaced with #) ``` Listing tensor names: ```console (venv) $ ./scripts/utils/tensor-info.py -m ~/work/ai/models/google/embeddinggemma-300m -l embed_tokens.weight layers.#.input_layernorm.weight layers.#.mlp.down_proj.weight layers.#.mlp.gate_proj.weight layers.#.mlp.up_proj.weight layers.#.post_attention_layernorm.weight layers.#.post_feedforward_layernorm.weight layers.#.pre_feedforward_layernorm.weight layers.#.self_attn.k_norm.weight layers.#.self_attn.k_proj.weight layers.#.self_attn.o_proj.weight layers.#.self_attn.q_norm.weight layers.#.self_attn.q_proj.weight layers.#.self_attn.v_proj.weight norm.weight ``` Printing a specific tensor's information: ```console (venv) $ ./scripts/utils/tensor-info.py -m ~/work/ai/models/google/embeddinggemma-300m layers.0.input_layernorm.weight Tensor: layers.0.input_layernorm.weight File: model.safetensors Shape: [768] ```	2026-02-04 10:40:53 +01:00
Daniel Bevenius	6156ae5111	model-conversion : add debug option to conversion script (#19265) This commit adds a debug option to the model conversion script to enable using the Python debugger (pdb) during model conversion. The motivation for this is that I've found myself adding this a few times now and it would be quicker to have this flag as an option and a makefile target/recipe for it.	2026-02-02 11:29:57 +01:00
Christian Kastner	7a4ca3cbd9	docs : Minor cleanups (#19252) * Update old URLs to github.com/ggml-org/ * Bump copyrights	2026-02-02 08:38:55 +02:00
Neo Zhang	2634ed207a	create test.sh to enhance the parameters for testing, update the guide, rm useless script (#19243)	2026-02-01 18:24:00 +08:00
Daniele Pinna	1488339138	lookup, lookahead: fix crash when n_ctx not specified (#18729) * lookup, lookahead: fix crash when n_ctx not specified Since PR #16653 (Dec 15, 2025), the default n_ctx is 0 to enable automatic GPU memory fitting. This causes llama-lookup and llama-lookahead to crash when run without explicit -c flag: GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded") Root cause: Both examples use params.n_ctx directly for batch initialization, but params.n_ctx remains 0 even after the context is properly initialized to n_ctx_train internally. Bug history: - Nov 2023: lookahead.cpp created (PR #4207) with params.n_ctx pattern - Dec 2023: lookup.cpp created (PR #4484) with same pattern - Nov 2024: default n_ctx changed to 4096 (PR #10136) - bug dormant - Dec 2025: default n_ctx changed to 0 (PR #16653) - bug activated The bug was dormant for 2+ years because params.n_ctx defaulted to 512, then 4096. PR #16653 changed it to 0 for GPU auto-fitting, triggering the crash. Fix: Use llama_n_ctx(ctx) to get the actual runtime context size, matching the pattern already used elsewhere in lookup.cpp (line 72) and in speculative.cpp/speculative-simple.cpp. Tested: llama-lookup now works without -c flag (12.5% acceptance on Gemma-3-1B). Note: llama-lookahead has a separate pre-existing issue with sequence initialization (n_seq_max=1 vs W+G+1 needed) that is unrelated to this fix. * lookahead: fix n_seq_max and kv_unified configuration Lookahead decoding requires: - W + G + 1 = 31 sequences for parallel Jacobi decoding - Unified KV cache for coupled sequences in batch splitting These requirements were broken after PR #14482 changed validation logic. Consolidates fix from PR #18730 per maintainer request. Commit message drafted with Claude.	2026-01-30 22:10:24 +02:00
Sascha Rogmann	72d3b1898a	spec : add self‑speculative decoding (no draft model required) + refactor (#18471) * server: introduce self-speculative decoding * server: moved self-call into speculative.cpp * can_speculate() includes self-speculation Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server: can_speculate() tests self-spec * server: replace can_speculate() with slot.can_speculate() Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * common: use %zu format specifier for size_t in logging Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * server: can_speculate() requires a task instance * common: ngram map, config self-speculative decoding * common: add enum common_speculative_type * common: add vector of speculative states * common: add option --spec-draftless * server: cleanup (remove slot.batch_spec, rename) * common: moved self-spec impl to ngram-map * common: cleanup (use common_speculative_state_draft) * spec : refactor * cont : naming * spec: remove --spec-config * doc: (draftless) speculative decoding * common: print performance in spec decoding * minor : cleanup * common : better names * minor : cleanup + fix build * minor: comments * CODEOWNERS: add common/ngram-map.* (#18471) * common : rename speculative.draftless_type -> speculative.type * ngram-map : fix uninitialized values * ngram-map : take into account the input can become shorter * ngram-map : revert len check for now * arg : change `--spec-draftless` -> `--spec-type` * spec : add common_speculative_state::accept() * spec : refactor + add common_speculative_begin() * spec : fix begin() call with mtmd * spec : additional refactor + remove common_speculative_params --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-28 19:42:42 +02:00
Daniel Bevenius	a14b960bc7	model-conversion : use BUILD_DIR variable in all scripts (#19015) This commit modifies all the utility scripts to use an optional BUILD_DIR variable/argument to specify the build directory. The motivation for this is that Commit `3d55846a5c` ("model-conversion : add BUILD_DIR variable to run-converted-model scripts") introduced this variable to the causal and embeddings scripts, but I missed the scripts in the utils directory.	2026-01-23 09:01:36 +01:00
Daniel Bevenius	3d55846a5c	model-conversion : add BUILD_DIR variable to run-converted-model scripts (#18927) This commit adds a BUILD_DIR variable to the scripts used for running converted models. The motivation for this is that currently the `build` directory is hardcoded and it can be useful to specify a different build directory, with builds for different configurations.	2026-01-19 13:12:38 +01:00
Georgi Gerganov	39173bcacb	context : reserve new scheduler when graph topology changes (#18547) * context : reserve new scheduler when graph topology changes * cont : fix * cont : fix reserve * cont : reserve only when changes occur + timing * context : add comments * llama : reserve on sampler changes * common : allow null common_sampler * server : task declares needs (embd, logits, sampling) * server : do not init sampler if not needed * llama : fix need_reserve when unsetting a sampler * server : consolidate slot reset/clear logic	2026-01-15 16:39:17 +02:00
Adrien Gallouët	ec997b4f2b	tests : download models only when running ctest (#18843) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-15 09:47:29 +01:00
Piotr Wilkin (ilintar)	d98b548120	Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914) * Extract common debugging functions; plug eval-callback and mtmd's MTMD_DEBUG_GRAPH with same functionality * Move to common * Remove unneeded header * Unlink from common * chore: update webui build output * Cleanup; properly pass params to mtmd without depending on common; factorize debug.cpp to use common debug code. * Revert change to webapp * Post-merge adjust * Apply suggestions from code review Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Apply code review changes * Remove changes to server-context * Remove mtmd.h include * Remove utility functions from header * Apply suggestions from code review Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Rename functions * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2026-01-14 20:29:35 +01:00
Adrien Gallouët	516a4ca9b5	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
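
Commit 5a1be6ce37 in the log above describes a regex-based grader that handles both boxed and plain-text AIME answers, and commit 62b04cef54 describes parallel processing with ThreadPoolExecutor controlled by --threads. The sketch below is only an illustration of how those two ideas might fit together. The patterns and the names `extract_answer`, `grade`, and `grade_all` are hypothetical and are not taken from llama-eval-new.py, whose actual implementation is not shown in this log.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical patterns (assumption): the real script's regexes are not in the log.
BOXED = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")  # e.g. \boxed{204}
PLAIN = re.compile(r"-?\d+")                     # any bare integer


def extract_answer(text):
    """Return the last boxed integer, else the last plain integer, else None."""
    boxed = BOXED.findall(text)
    if boxed:
        return boxed[-1]
    plain = PLAIN.findall(text)
    return plain[-1] if plain else None


def grade(response, expected):
    """Exact integer match between the extracted answer and the gold answer."""
    pred = extract_answer(response)
    return pred is not None and int(pred) == int(expected)


def grade_all(cases, threads=4):
    """Grade (response, expected) pairs in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda case: grade(*case), cases))


if __name__ == "__main__":
    cases = [
        ("The answer is \\boxed{204}.", "204"),
        ("So we get 17, final answer 42", "42"),
        ("I am not sure.", "7"),
    ]
    print(grade_all(cases, threads=2))
```

Preferring the boxed match over plain integers means intermediate numbers in the reasoning are ignored whenever the model marks its final answer explicitly. `pool.map` is used because it returns results in input order, which keeps per-case bookkeeping simple.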

1640 Commits