AI benchmark (Machine Learning)
noun phrase
Definition: A standardized test, dataset, or evaluation framework used to measure and compare the performance of AI systems on defined tasks under specified conditions. Recent work on benchmark quality also emphasizes that a good AI benchmark should have a clearly defined purpose and scope, interpretable results, and usable evaluation procedures [Reuel et al. 2024].
Example in context: “Quantitative Artificial Intelligence (AI) Benchmarks have emerged as fundamental tools for evaluating the performance, capability, and safety of AI models and systems.” [Lazar and Nelson 2025]
Synonyms: benchmark; AI evaluation benchmark
Related terms: leaderboard; evaluation metric; benchmark dataset; model evaluation