Anthropic HH-RLHF — LLM Preference Dataset

170k human preference comparisons between two AI assistant responses, with one response labeled `chosen` (more helpful and harmless) and the other `rejected`. This is the dataset that defined RLHF alignment for chat assistants: it was used to train early Claude models and became the standard reference for alignment research and DPO fine-tuning.
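Each record pairs two complete conversation transcripts under `chosen` and `rejected` keys, with turns delimited by `\n\nHuman:` and `\n\nAssistant:` markers. A minimal sketch of the layout (the transcript text below is invented for illustration):

```python
# Illustrative hh-rlhf record layout; the transcript content is made up.
example = {
    "chosen": "\n\nHuman: How do I bake bread?"
              "\n\nAssistant: Start with flour, water, yeast, and salt...",
    "rejected": "\n\nHuman: How do I bake bread?"
                "\n\nAssistant: I can't help with that.",
}

# Both fields share the same prompt; only the final assistant turn differs.
print(sorted(example.keys()))  # → ['chosen', 'rejected']
```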

Dataset Details

Provider: Anthropic
Category: preference
Size: 170k pairs
License: MIT
Downloads: 2.8M
Tags: RLHF, Alignment, Safety, Helpfulness, Human-Feedback
```python
from datasets import load_dataset

# Loads the train and test splits of the preference pairs
ds = load_dataset("Anthropic/hh-rlhf")
```
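For the DPO fine-tuning use case mentioned above, each pair contributes a loss term comparing the policy's log-probabilities of the chosen and rejected responses against a reference model. A minimal per-pair sketch of the DPO objective (the log-probability arguments are assumed to come from your own model forward passes):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from summed log-probs of each response.

    pi_*  : log-prob of the response under the policy being trained
    ref_* : log-prob of the same response under the frozen reference model
    beta  : strength of the implicit KL penalty
    """
    # Margin between the policy/reference log-ratios of chosen vs. rejected
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid: small when the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy matches the reference (zero margin), the loss is `log 2`; it falls toward zero as the policy assigns relatively more probability to the chosen response.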
