Benchmark

Technique

Definition

A standardized test or dataset used to evaluate and compare the performance of different AI models, algorithms, or systems under identical conditions. Benchmarks such as MMLU (multi-subject knowledge), HumanEval (code generation), and ARC (reasoning) give a common, repeatable measure of LLM capabilities, so that scores from different models can be compared directly.
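To make the idea concrete, here is a minimal sketch of how a benchmark turns a fixed set of questions into a comparable score. Everything in it is an assumption made for illustration: the four-item dataset, the `Item` type, the two toy "models" (`always_first` and `lookup_model`), and the `evaluate` helper are hypothetical and are not taken from MMLU, HumanEval, ARC, or any evaluation library. The point is only that every model answers the same items and is scored by the same rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Item:
    """One multiple-choice benchmark question with its gold answer index."""
    question: str
    choices: List[str]
    answer: int


# Hypothetical toy benchmark: a fixed, shared set of items.
BENCHMARK: List[Item] = [
    Item("2 + 2 = ?", ["3", "4", "5"], 1),
    Item("Capital of France?", ["Paris", "Rome", "Madrid"], 0),
    Item("How many days are in a week?", ["5", "7", "9"], 1),
    Item("Largest planet in the Solar System?", ["Mars", "Venus", "Jupiter"], 2),
]

# A "model" here is any function mapping (question, choices) to a choice index.
Model = Callable[[str, List[str]], int]


def always_first(question: str, choices: List[str]) -> int:
    """Trivial baseline: always pick the first choice."""
    return 0


def lookup_model(question: str, choices: List[str]) -> int:
    """Toy model that knows some answers and falls back to the first choice."""
    known: Dict[str, str] = {
        "2 + 2 = ?": "4",
        "Capital of France?": "Paris",
        "Largest planet in the Solar System?": "Jupiter",
    }
    target = known.get(question)
    return choices.index(target) if target in choices else 0


def evaluate(model: Model, items: List[Item]) -> float:
    """Accuracy: fraction of items where the model picks the gold answer."""
    correct = sum(1 for it in items if model(it.question, it.choices) == it.answer)
    return correct / len(items)


# Same items, same metric for every model: that is what makes scores comparable.
scores = {
    "always_first": evaluate(always_first, BENCHMARK),
    "lookup_model": evaluate(lookup_model, BENCHMARK),
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Real benchmarks differ in scale and in scoring rule (code benchmarks, for example, typically run unit tests on generated programs rather than compare choice indices), but the structure is the same: fixed inputs, a fixed metric, and one score per model.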

Related terms

Reinforcement Learning
Deep Learning
Attention Mechanism
Classification
Clustering
Data Augmentation
