Benchmark
Technique
Definition
A standardized test or dataset used to evaluate and compare the performance of AI models, algorithms, or systems under the same conditions. Benchmarks such as MMLU (knowledge), HumanEval (code generation), and ARC (reasoning) give a common, repeatable measure of LLM capabilities.
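As a rough illustration of the idea, the sketch below scores two stand-in "models" on the same fixed set of multiple-choice items and reports accuracy for each. Everything in it is hypothetical: the `ITEMS` data, the `accuracy` helper, and the two baseline functions are invented for this example and are not taken from MMLU, HumanEval, ARC, or any evaluation library.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy benchmark: (question, answer choices, index of the correct choice).
Item = Tuple[str, Sequence[str], int]
ITEMS: List[Item] = [
    ("2 + 2 = ?", ["3", "4", "5"], 1),
    ("Capital of France?", ["Paris", "Rome", "Berlin"], 0),
    ("Largest planet in the Solar System?", ["Mars", "Venus", "Jupiter"], 2),
    ("H2O is commonly called?", ["salt", "water", "sand"], 1),
]

# A "model" here is any function that returns the index of its chosen answer.
Model = Callable[[str, Sequence[str]], int]


def accuracy(model: Model, items: Sequence[Item]) -> float:
    """Fraction of items for which the model picks the correct choice."""
    correct = sum(1 for question, choices, answer in items
                  if model(question, choices) == answer)
    return correct / len(items)


def always_first(question: str, choices: Sequence[str]) -> int:
    """Trivial baseline: always pick the first choice."""
    return 0


def always_second(question: str, choices: Sequence[str]) -> int:
    """Trivial baseline: always pick the second choice."""
    return 1


# Both systems are scored on identical items, so the numbers are comparable.
scores = {
    "always_first": accuracy(always_first, ITEMS),
    "always_second": accuracy(always_second, ITEMS),
}
print(scores)
```

The point of the sketch is that the items and the scoring rule stay fixed while only the system under test changes, which is what makes scores comparable across models.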
Related terms
Reinforcement Learning
Deep Learning
Attention Mechanism
Classification
Clustering
Data Augmentation