Commit Graph

1649 Commits

Author SHA1 Message Date
Georgi Gerganov
2ffa45edfc add tokens 2026-02-16 21:52:54 +02:00
Georgi Gerganov
9c29be1177 store full response 2026-02-16 21:44:29 +02:00
Georgi Gerganov
013963cfd5 add html 2026-02-16 21:22:06 +02:00
Georgi Gerganov
e2e998a2d6 fix prompts 2026-02-16 21:02:25 +02:00
Georgi Gerganov
6c41664b8b simplify 2026-02-16 19:50:27 +02:00
Georgi Gerganov
7b84af8051 fix counts 2026-02-16 16:38:31 +02:00
Georgi Gerganov
60a501e138 cleanup 2026-02-16 16:31:14 +02:00
Georgi Gerganov
e6e777cfb3 resume eval 2026-02-16 16:21:36 +02:00
Georgi Gerganov
ad3a54eb68 ignore errors 2026-02-16 15:23:23 +02:00
Georgi Gerganov
c6d70b9bea add AGENTS.md 2026-02-16 13:13:35 +02:00
Georgi Gerganov
de956a6ca8 cleanup 2026-02-16 12:02:16 +02:00
Georgi Gerganov
350e7c1409 datasets : fix aime2025 2026-02-16 11:55:57 +02:00
Georgi Gerganov
db10dda1f3 grade : improve regex + logs 2026-02-16 11:51:36 +02:00
Georgi Gerganov
52759bf078 grader : update prompt 2026-02-16 11:17:53 +02:00
Georgi Gerganov
99e3c3d02c datasets : add aime2025 2026-02-16 11:07:54 +02:00
Georgi Gerganov
c6315655b7 cont 2026-02-16 10:56:58 +02:00
Georgi Gerganov
f762a71d56 grader : improve example answers 2026-02-16 10:51:41 +02:00
Georgi Gerganov
73e61d5b75 rename 2026-02-16 10:30:10 +02:00
Georgi Gerganov
cffd268bb3 add gpqa + sampling + docs 2026-02-16 00:52:33 +02:00
Georgi Gerganov
e8a807519a datasets : add gsm8k 2026-02-15 23:19:46 +02:00
Georgi Gerganov
1db8428f00 remove old files 2026-02-15 22:16:54 +02:00
Georgi Gerganov
7751ae2796 docs 2026-02-15 22:15:50 +02:00
Georgi Gerganov
d2b10302ce improve grader 2026-02-15 22:12:02 +02:00
Georgi Gerganov
68dde884d6 minor 2026-02-15 21:21:40 +02:00
Georgi Gerganov
fd90796da2 eval : support multiple dataset runs 2026-02-15 21:08:24 +02:00
Georgi Gerganov
8156d549f6 sim : fix answer matching 2026-02-15 21:08:24 +02:00
Georgi Gerganov
9695e6feb4 test : fix path 2026-02-15 21:08:24 +02:00
Georgi Gerganov
fb1481d60d eval : add prompts 2026-02-15 21:08:24 +02:00
Georgi Gerganov
812ae13ec1 eval : print progress 2026-02-15 21:08:24 +02:00
Georgi Gerganov
e79e8d02d5 examples: add task summary table to llama-eval-new.py 2026-02-15 21:08:23 +02:00
Georgi Gerganov
a939f4c47e docs: update llama-eval-discussion.md with threading and model parameter updates
- Add threading support implementation details
- Document ThreadPoolExecutor usage and thread safety
- Add model parameter implementation details
- Include testing results for both features
2026-02-15 21:08:23 +02:00
Georgi Gerganov
62b04cef54 examples: add threading support and model parameter to llama-eval-new.py
- Add ThreadPoolExecutor for parallel request processing controlled by --threads
- Add --model argument to specify model name in request data
- Refactor process() to use thread-safe _process_single_case() method
- Update progress tracking to work with concurrent execution
2026-02-15 21:08:23 +02:00
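Note on 62b04cef54: the pattern this commit describes (a `ThreadPoolExecutor` sized by `--threads`, one call per case, progress tracked under concurrency) might look roughly like the sketch below. The names `run_cases`, `process_single_case` and `on_progress` are hypothetical and are not taken from `llama-eval-new.py`; only the standard-library calls are real.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_cases(cases, process_single_case, threads=4, on_progress=None):
    """Run process_single_case over cases in parallel; return results in input order."""
    results = [None] * len(cases)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Remember which input index each future belongs to.
        future_to_index = {
            pool.submit(process_single_case, case): i
            for i, case in enumerate(cases)
        }
        # as_completed yields futures as they finish, in the calling thread,
        # so the progress counter needs no lock.
        for done, future in enumerate(as_completed(future_to_index), start=1):
            results[future_to_index[future]] = future.result()
            if on_progress is not None:
                on_progress(done, len(cases))
    return results
```

Collecting results in the calling thread keeps the worker function free of shared state, which is one simple way to get the thread safety the commit message mentions.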
Georgi Gerganov
37b26cafee docs: update llama-eval-discussion.md with session work summary 2026-02-15 21:08:23 +02:00
Georgi Gerganov
04f6872116 examples: use cached dataset path in simulator to avoid HF Hub requests 2026-02-15 21:08:23 +02:00
Georgi Gerganov
c2619c18bf examples: use cached dataset path to avoid HF Hub requests 2026-02-15 21:08:23 +02:00
Georgi Gerganov
87f8930968 examples: remove HF_HUB_OFFLINE to allow dataset download 2026-02-15 21:08:23 +02:00
Georgi Gerganov
9453f9de12 examples: use HF_HUB_OFFLINE to avoid HF Hub warnings 2026-02-15 21:08:23 +02:00
Georgi Gerganov
5a1be6ce37 examples: implement flexible grader system for answer validation
- Add Grader class supporting regex and CLI-based grading
- Implement built-in regex patterns for AIME, GSM8K, MMLU, HellaSwag, ARC, WinoGrande
- Add CLI grader interface: python script.py --answer <pred> --expected <gold>
- Add HF telemetry disable to avoid warnings
- Support exact match requirement for regex patterns
- Add 30-second timeout for CLI grader
- Handle both boxed and plain text formats for AIME answers
2026-02-15 21:08:23 +02:00
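Note on 5a1be6ce37: a regex grader that accepts both boxed and plain-text AIME answers could be sketched as follows. The patterns and the function names `extract_answer` and `grade` are assumptions made for illustration; the commit's actual built-in patterns are not shown in this log.

```python
import re

# Assumed patterns: a LaTeX \boxed{N} answer, or any standalone integer.
BOXED = re.compile(r"\\boxed\{\s*(\d+)\s*\}")
PLAIN = re.compile(r"\b(\d+)\b")


def extract_answer(text):
    """Return the last boxed integer if present, else the last plain integer."""
    boxed = BOXED.findall(text)
    if boxed:
        return boxed[-1]
    plain = PLAIN.findall(text)
    return plain[-1] if plain else None


def grade(pred_text, expected):
    """Exact match on the integer value, so '042' and '42' compare equal."""
    answer = extract_answer(pred_text)
    return answer is not None and int(answer) == int(expected)
```

Preferring the boxed form over plain integers avoids grading an intermediate number that happens to appear after the final answer.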
Georgi Gerganov
a80814e97b docs: remove README.md from llama-eval 2026-02-15 21:08:23 +02:00
Georgi Gerganov
5cc2258e82 examples: add simplified llama-eval-new.py for AIME evaluation
- Create new simplified evaluation script focused only on AIME
- Implement EvalState and Processor dataclasses for structured state management
- Add real-time feedback showing correct/incorrect status per case
- Abstract grading interface for external grader support
- Use structured JSON output for eval state
- Apply HuggingFace dataset caching to avoid repeated downloads
- Remove Levenshtein matching - eval script only sends requests and validates answers
2026-02-15 21:08:22 +02:00
Georgi Gerganov
c87af1d527 docs: update llama-eval-discussion.md with session work summary
Add summary of llama-server-simulator implementation work including
features, testing results, technical decisions, and refactoring.
2026-02-15 21:08:22 +02:00
Georgi Gerganov
23d4e21a81 examples: refactor test-simulator.sh for better readability
Extract repeating question string into TEST_QUESTION variable and
create make_request() helper function to reduce code duplication.
Add proper error handling for error responses.
2026-02-15 21:08:22 +02:00
Georgi Gerganov
07d5e1e0ea examples: add llama-server simulator for testing eval scripts
Add a standalone Python script that simulates a llama-server HTTP endpoint
for testing the eval script. The simulator:

- Implements /v1/chat/completions endpoint with OpenAI-compatible format
- Loads AIME dataset from HuggingFace with local caching
- Uses Levenshtein distance for intelligent question matching
- Supports configurable success rate for correct/wrong answer generation
- Provides debug logging for troubleshooting

Also includes test scripts and documentation for testing and understanding
the simulator functionality.
2026-02-15 21:08:22 +02:00
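Note on 07d5e1e0ea: question matching by Levenshtein distance can be sketched with a small dynamic-programming routine. This is a minimal illustration, assuming the simulator picks the dataset question with the smallest edit distance to the incoming prompt; `levenshtein` and `best_match` are hypothetical names, and the real script may use a library or a distance threshold instead.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def best_match(query, questions):
    """Return the known question closest to query."""
    return min(questions, key=lambda q: levenshtein(query, q))
```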
gatbontonpc
8839037528 add checkpointing 2026-02-15 21:08:22 +02:00
gatbontonpc
89cab3dbc5 Add readme 2026-02-15 21:08:22 +02:00
gatbontonpc
c2d83ca048 multi source llama-eval 2026-02-15 21:08:22 +02:00
gatbontonpc
c05df17ce3 working llama-eval mc and math suite 2026-02-15 21:08:19 +02:00
Daniel Bevenius
6ab881b7c3 model-conversion : add tensor-info.py utility (#18954)
This commit adds a new Python script that can be used to print information
about a tensor in a safetensors model.

The motivation for this is that during model conversion work it can
sometimes be useful to verify the shape of tensors in the original
model. While it is possible to print the tensors when loading the model,
this can be slow when working with larger models.
With this script it is possible to quickly query tensor shapes.

Example usage:
```console
(venv) $ ./scripts/utils/tensor-info.py --help
usage: tensor-info.py [-h] [-m MODEL_PATH] [-l] [tensor_name]

Print tensor information from a safetensors model

positional arguments:
  tensor_name           Name of the tensor to inspect

options:
  -h, --help            show this help message and exit
  -m MODEL_PATH, --model-path MODEL_PATH
                        Path to the model directory (default: MODEL_PATH environment variable)
  -l, --list            List unique tensor patterns in the model (layer numbers replaced with #)
```

Listing tensor names:
```console
(venv) $ ./scripts/utils/tensor-info.py -m ~/work/ai/models/google/embeddinggemma-300m -l
embed_tokens.weight
layers.#.input_layernorm.weight
layers.#.mlp.down_proj.weight
layers.#.mlp.gate_proj.weight
layers.#.mlp.up_proj.weight
layers.#.post_attention_layernorm.weight
layers.#.post_feedforward_layernorm.weight
layers.#.pre_feedforward_layernorm.weight
layers.#.self_attn.k_norm.weight
layers.#.self_attn.k_proj.weight
layers.#.self_attn.o_proj.weight
layers.#.self_attn.q_norm.weight
layers.#.self_attn.q_proj.weight
layers.#.self_attn.v_proj.weight
norm.weight
```

Printing a specific tensor's information:
```console
(venv) $ ./scripts/utils/tensor-info.py -m ~/work/ai/models/google/embeddinggemma-300m layers.0.input_layernorm.weight
Tensor: layers.0.input_layernorm.weight
File:   model.safetensors
Shape:  [768]
```
2026-02-04 10:40:53 +01:00
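Note on 6ab881b7c3: the `-l` listing replaces layer numbers with `#` and prints each pattern once. A minimal sketch of that normalization, assuming layer indices appear as purely numeric components between dots, is shown below. `tensor_pattern` and `unique_patterns` are hypothetical names, not functions from `tensor-info.py`.

```python
import re


def tensor_pattern(name):
    """Replace numeric path components (layer indices) with '#'."""
    return re.sub(r"(?<=\.)\d+(?=\.)", "#", name)


def unique_patterns(names):
    """Collapse per-layer tensor names into a sorted list of unique patterns."""
    return sorted({tensor_pattern(n) for n in names})
```

For example, `layers.0.mlp.up_proj.weight` and `layers.1.mlp.up_proj.weight` both collapse to `layers.#.mlp.up_proj.weight`.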
Daniel Bevenius
6156ae5111 model-conversion : add debug option to conversion script (#19265)
This commit adds a debug option to the model conversion script to enable
using the Python debugger (pdb) during model conversion.

The motivation for this is that I've found myself adding this a few
times now and it would be quicker to have this flag as an option and a
makefile target/recipe for it.
2026-02-02 11:29:57 +01:00
Christian Kastner
7a4ca3cbd9 docs : Minor cleanups (#19252)
* Update old URLs to github.com/ggml-org/

* Bump copyrights
2026-02-02 08:38:55 +02:00