llama.cpp/examples/llama-eval/test-grader.py
Georgi Gerganov 5a1be6ce37 examples: implement flexible grader system for answer validation
- Add Grader class supporting regex and CLI-based grading
- Implement built-in regex patterns for AIME, GSM8K, MMLU, HellaSwag, ARC, WinoGrande
- Add CLI grader interface: python script.py --answer <pred> --expected <gold>
- Add HF telemetry disable to avoid warnings
- Support exact match requirement for regex patterns
- Add 30-second timeout for CLI grader
- Handle both boxed and plain text formats for AIME answers
2026-02-15 21:08:23 +02:00
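
The commit message mentions regex grading that accepts both boxed and plain-text AIME answers. The patterns used in the commit are not shown on this page, so the following is only a rough sketch of the idea. The names `BOXED`, `PLAIN`, `extract_aime_answer`, and `grade` are hypothetical, and the rule of taking the last match is an assumption.

```python
import re

# Hypothetical patterns for illustration; not the ones from the commit.
BOXED = re.compile(r"\\boxed\{\s*(\d{1,3})\s*\}")  # e.g. \boxed{204}
PLAIN = re.compile(r"\b(\d{1,3})\b")               # standalone 1-3 digit integer


def extract_aime_answer(text):
    """Return the last boxed integer if any, else the last plain one, else None."""
    boxed = BOXED.findall(text)
    if boxed:
        return boxed[-1]
    plain = PLAIN.findall(text)
    return plain[-1] if plain else None


def grade(pred_text, gold):
    """Compare numerically so that '073' and '73' count as the same answer."""
    answer = extract_aime_answer(pred_text)
    return answer is not None and int(answer) == int(gold)
```

Preferring the boxed form over plain digits is a design choice in this sketch: a model response often contains intermediate numbers, and a boxed value is a stronger signal of the final answer.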

27 lines
623 B
Python
Executable File

#!/usr/bin/env python3

import sys
import argparse

def main():
    parser = argparse.ArgumentParser(description="Test grader script")
    parser.add_argument("--answer", type=str, required=True, help="Predicted answer")
    parser.add_argument("--expected", type=str, required=True, help="Expected answer")

    args = parser.parse_args()

    pred = args.answer.strip()
    gold = args.expected.strip()

    print(f"Gold: {gold}")
    print(f"Pred: {pred}")

    if pred == gold:
        print("Correct!")
        sys.exit(0)
    else:
        print("Incorrect")
        sys.exit(1)

if __name__ == "__main__":
    main()
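
The script above reports its verdict through the exit status (0 for a match, 1 otherwise), and the commit message mentions a 30-second timeout for CLI graders. A caller could drive such a script roughly as follows. This is a minimal sketch, not the Grader class from the commit: `run_cli_grader` is a hypothetical helper, and treating a timeout as an incorrect answer is an assumption.

```python
import subprocess
import sys


def run_cli_grader(script_path, pred, gold, timeout=30.0):
    """Run a grader script and return True if it exits with status 0.

    Hypothetical helper: a timeout is treated as an incorrect answer.
    """
    cmd = [sys.executable, script_path, "--answer", pred, "--expected", gold]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Passing the arguments as a list avoids shell quoting issues when the predicted answer contains spaces or special characters.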