# llama-eval
A simple evaluation tool for llama.cpp servers with support for multiple datasets.
For a full description, usage examples, and sample results, see:
## Quick start
```sh
# Single server
python3 llama-eval.py \
    --server http://localhost:8033 \
    --model my-model \
    --dataset gsm8k --n_cases 100 \
    --grader-type regex --threads 32
```
```sh
# Multiple servers (comma-separated URLs and thread counts)
python3 llama-eval.py \
    --server http://gpu1:8033,http://gpu2:8033 \
    --server-name gpu1,gpu2 \
    --threads 16,16 \
    --dataset aime2025 --n_cases 240 \
    --grader-type regex
```
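The `--grader-type regex` option refers to regex-based answer grading. As a rough illustration of how such a grader can work (this is a hypothetical sketch, not the tool's actual implementation), one common approach for datasets like GSM8K is to pull the last number out of the model's completion and compare it to the gold answer:

```python
import re

def grade_regex(completion: str, gold: str) -> bool:
    """Hypothetical regex grader sketch: extract the last number
    from the model's completion and compare it to the gold answer.
    This is an illustration only, not llama-eval's actual grader."""
    # Strip thousands separators, then find all int/float literals.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    if not numbers:
        return False
    return numbers[-1] == gold

# GSM8K-style completions typically end with "#### <answer>".
print(grade_regex("... so the total is 42. #### 42", "42"))
print(grade_regex("the answer is 41", "42"))
```

Real graders are usually more careful (normalizing whitespace, handling fractions, or matching the `#### ` marker explicitly), but the core idea is the same.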