WinoGrande: LLM Evaluation Dataset

About 44k commonsense reasoning problems inspired by the Winograd Schema Challenge. Each problem is a sentence with a blank and two candidate fillers, and it tests whether a model can pick the correct referent using real-world knowledge. Solving it requires an understanding of causality and physical common sense that pure language statistics can't capture. Included in the Open LLM Leaderboard.

Dataset Details

Provider: allenai
Category: evaluation
Size: 44k problems
License: Apache 2.0
Downloads: 3M
Tags: Benchmark, Commonsense-Reasoning, Winograd, NLU

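from datasets import load_dataset
# The dataset has several configurations, so a name is required; "winogrande_xl" has the largest training set.
ds = load_dataset("allenai/winogrande", "winogrande_xl")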
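
To make the task format concrete, here is a minimal sketch of how a problem could be expanded and scored. It assumes each record has `sentence` (with `_` marking the blank), `option1`, `option2`, and `answer` ("1" or "2") fields. The helpers `fill_blank` and `accuracy`, the `pick_first` baseline, and the toy record are illustrative names and data, not part of the dataset or any library.

```python
def fill_blank(example):
    """Return the two candidate sentences, one per option."""
    sentence = example["sentence"]
    return [
        sentence.replace("_", example[key], 1)
        for key in ("option1", "option2")
    ]


def accuracy(examples, choose):
    """Fraction of examples where choose(candidates) picks the labeled option.

    `choose` receives the two filled-in sentences and returns index 0 or 1.
    Only use this on labeled records: an empty `answer` would fail int().
    """
    correct = 0
    for ex in examples:
        gold = int(ex["answer"]) - 1  # "1"/"2" -> index 0/1
        if choose(fill_blank(ex)) == gold:
            correct += 1
    return correct / len(examples)


# Hand-written toy record in the assumed schema (not drawn from the dataset).
toy = [
    {
        "sentence": "The trophy doesn't fit into the suitcase because the _ is too large.",
        "option1": "trophy",
        "option2": "suitcase",
        "answer": "1",
    }
]


def pick_first(candidates):
    """Trivial baseline that always picks the first option."""
    return 0


score = accuracy(toy, pick_first)
```

In practice `choose` would wrap a model, for example by comparing the likelihood the model assigns to each filled-in sentence. That scoring step is left out here because it depends on the model and evaluation harness in use.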
