⚙️

Benchmark

Technique

⚙️ Technique 🌐 Benchmark

Définition

Test standardisé permettant d’évaluer et comparer les performances de différents modèles ou systèmes IA. Les benchmarks comme MMLU, HumanEval ou ARC permettent de mesurer objectivement les capacités des LLM en raisonnement, code et connaissances.

En anglais

Benchmark — A standardized test or dataset used to evaluate and compare the performance of different AI models, algorithms, or systems. Benchmarks like MMLU, HumanEval, and ARC enable objective measurement of LLM capabilities in reasoning, code, and knowledge.