# llama.cpp/examples/llama-eval
llama-eval.py is a single-script evaluation runner: it sends prompts to any OpenAI-compatible HTTP server (by default, llama-server) and grades the responses.
```sh
./llama-server -m model.gguf --port 8033
python examples/llama-eval/llama-eval.py --path_server http://localhost:8033 --n_prompts 100 --prompt_source arc
```
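Under the hood, each prompt becomes a request to the server's OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below shows the shape of that exchange; the helper names (`build_request`, `extract_text`) are hypothetical and not llama-eval's actual code, only the JSON payload and response layout follow the OpenAI chat completions convention that llama-server exposes.

```python
# Minimal sketch of an OpenAI-compatible chat completion round trip.
# build_request/extract_text are hypothetical names, not llama-eval's API.
import json
import urllib.request


def build_request(prompt: str, server: str = "http://localhost:8033"):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # deterministic decoding suits evaluation
    }
    return urllib.request.Request(
        server + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


def extract_text(response: dict) -> str:
    """Pull the generated text out of a chat completion response body."""
    return response["choices"][0]["message"]["content"]


# Abridged response shape, as returned by OpenAI-compatible servers:
example = {"choices": [{"message": {"role": "assistant", "content": "42"}}]}
print(extract_text(example))  # prints "42"
```

Sending the request (e.g. with `urllib.request.urlopen`) and parsing the JSON body would complete the loop; the evaluator then compares the extracted text against the dataset's reference answer.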
The supported tasks are:
- GSM8K — grade-school math
- AIME — competition math (integer answers)
- MMLU — multi-domain multiple choice
- HellaSwag — commonsense reasoning multiple choice
- ARC — grade-school science multiple choice
- WinoGrande — commonsense coreference multiple choice
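These tasks fall into two scoring families: numeric answers (GSM8K, AIME) and multiple choice (MMLU, HellaSwag, ARC, WinoGrande). A minimal sketch of how such graders can work is below; this is an illustration of the two families, not llama-eval's actual extraction logic.

```python
# Hypothetical graders for the two answer families; not llama-eval's code.
import re


def grade_multiple_choice(model_output: str, reference: str) -> bool:
    """Take the last standalone A-D letter in the output as the answer."""
    letters = re.findall(r"\b([A-D])\b", model_output.upper())
    return bool(letters) and letters[-1] == reference.upper()


def grade_numeric(model_output: str, reference: str) -> bool:
    """Compare the last number in the output to the reference answer."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", model_output.replace(",", ""))
    return bool(nums) and float(nums[-1]) == float(reference)
```

Taking the *last* match is a common heuristic for chain-of-thought outputs, where the model reasons first and states its final answer at the end.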