DPO Mix 7K — LLM Preference Dataset

A compact, curated DPO (Direct Preference Optimization) dataset of 7k chosen/rejected pairs drawn from multiple high-quality sources. It is well suited to running DPO experiments locally: small enough to fine-tune in under an hour on consumer GPUs while still producing meaningfully aligned models.
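To make the pair format concrete, the snippet below sketches what a single chosen/rejected record might look like. The field names and chat-message structure are assumptions inferred from the description above, not this card's documented schema; verify them against the loaded dataset.

# Hypothetical shape of one preference pair. Field names are assumed
# from the "chosen/rejected pairs" description; confirm with the
# dataset's column names after loading (see Usage below).
pair = {
    "chosen": [
        {"role": "user", "content": "What is DPO?"},
        {"role": "assistant", "content": "A preferred, higher-quality answer."},
    ],
    "rejected": [
        {"role": "user", "content": "What is DPO?"},
        {"role": "assistant", "content": "A dispreferred, lower-quality answer."},
    ],
}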

Dataset Details

Provider: argilla
Category: preference
Size: 7k pairs
License: Apache 2.0
Downloads: 220k
Tags: DPO, Alignment, Curated, Community, Quick-Training
Usage

Load the dataset with the Hugging Face datasets library:

from datasets import load_dataset

# Load all 7k preference pairs from the Hugging Face Hub
ds = load_dataset("argilla/dpo-mix-7k")
print(ds)  # inspect the available splits and column names
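
For a full fine-tuning run, a minimal sketch using TRL's DPOTrainer follows, assuming the trl and transformers libraries are installed. The base model, split name, and hyperparameters are placeholders rather than recommendations from this card, and the exact DPOTrainer/DPOConfig argument names differ across TRL releases, so check the documentation for the version you use.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder base model; any small instruction-tuned causal LM works.
model_name = "Qwen/Qwen2-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The split name is an assumption; use print(ds) above to confirm.
# Recent TRL versions accept conversational chosen/rejected columns
# directly; older versions may require an explicit prompt column.
train_ds = load_dataset("argilla/dpo-mix-7k", split="train")

# Illustrative hyperparameters, not recommendations from this card.
args = DPOConfig(
    output_dir="dpo-mix-7k-run",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    beta=0.1,  # strength of the KL tether to the reference model
)

trainer = DPOTrainer(
    model=model,  # ref_model omitted; TRL builds a reference copy internally
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,  # older TRL releases name this tokenizer=
)
trainer.train()

Here beta trades off reward maximization against staying close to the reference policy; 0.1 matches the value used in the original DPO paper and is a common starting point for small-scale runs.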
