mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-12 03:54:06 +00:00

Georgi Gerganov 81a65cf035 eval : add Wilson score confidence interval to results

Compute 95% CI on-the-fly from completed cases. Displayed in
terminal output, HTML report, and JSON state.

2026-05-10 18:46:36 +03:00
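
For reference, the Wilson score interval named in the commit above can be sketched as follows. This is a minimal illustration, not the code in llama-eval.py: the function name `wilson_interval`, its parameters, and the choice to return the full range (0.0, 1.0) when no cases have completed are assumptions. The default `z = 1.96` corresponds to a two-sided 95% interval.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (hypothetical helper).

    correct: number of completed cases answered correctly
    total:   number of completed cases
    z:       normal quantile; 1.96 gives a two-sided 95% interval
    """
    if total == 0:
        # Assumption: with no completed cases, report the uninformative full range.
        return (0.0, 1.0)

    p = correct / total
    z2 = z * z
    denom = 1.0 + z2 / total
    center = (p + z2 / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z2 / (4.0 * total * total)) / denom

    # Clamp to [0, 1] to guard against floating-point overshoot at the extremes.
    return (max(0.0, center - half), min(1.0, center + half))


if __name__ == "__main__":
    lo, hi = wilson_interval(50, 100)
    print(f"accuracy 0.500, 95% CI [{lo:.4f}, {hi:.4f}]")
```

Because the function depends only on the running counts, it can be recomputed after each completed case, which matches the "on-the-fly" wording of the commit message. Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and remains usable when the observed accuracy is 0 or 1.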

llama-eval.py

2026-05-10 18:46:36 +03:00

llama-server-simulator.py

sim : fix answer matching

2026-05-10 18:13:46 +03:00

README.md

remove junk

2026-05-10 18:13:50 +03:00

test-simulator.sh

test : fix path

2026-05-10 18:13:46 +03:00

llama-eval

Simple evaluation tool for llama.cpp with support for multiple datasets.

TODO: add usage