Stanford Alpaca: LLM Instruction Dataset

The foundational self-instruct dataset that launched a thousand fine-tunes. It contains 52k instruction-following examples generated with OpenAI's text-davinci-003 (a GPT-3-family model) using a modified version of the Self-Instruct technique. It was used to fine-tune the original Alpaca 7B model from Meta's LLaMA 7B, and it inspired many of the instruction datasets that followed.
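One of the filters described in the Self-Instruct paper keeps the generated task pool diverse: a new instruction is added only if its ROUGE-L similarity to every instruction already in the pool is below 0.7. The sketch below is a simplified, pure-Python illustration of that novelty filter. It uses its own lowercase alphanumeric tokenizer with no stemming, and the helper names (tokenize, lcs_length, rouge_l_f1, filter_novel) are hypothetical. It is not the exact code used to build this dataset.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Simplified tokenizer (an assumption): lowercase, keep alphanumeric runs, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a: List[str], b: List[str]) -> int:
    # Longest common subsequence via dynamic programming, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(a: str, b: str) -> float:
    # ROUGE-L F-measure (beta = 1): harmonic mean of LCS-based precision and recall.
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    lcs = lcs_length(ta, tb)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(ta), lcs / len(tb)
    return 2 * precision * recall / (precision + recall)


def filter_novel(candidates: List[str], pool: List[str], threshold: float = 0.7) -> List[str]:
    """Keep a candidate only if its ROUGE-L F1 against every pooled instruction is below threshold."""
    accepted_pool = list(pool)  # copy so the caller's pool is not mutated
    kept = []
    for cand in candidates:
        if all(rouge_l_f1(cand, existing) < threshold for existing in accepted_pool):
            accepted_pool.append(cand)  # accepted instructions are compared against later candidates
            kept.append(cand)
    return kept
```

Because accepted candidates join the pool immediately, a near-duplicate generated in the same batch is rejected as well. The real pipeline applies further filters that this sketch omits.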

Dataset Details

Provider: tatsu-lab
Category: instruction
Size: 52k rows
License: CC-BY-NC 4.0
Downloads: 4.5M
Tags: Self-Instruct, GPT-3, Foundational, General
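Quick start with the Hugging Face datasets library (pip install datasets):

from datasets import load_dataset

# Download from the Hugging Face Hub; returns a DatasetDict keyed by split name
ds = load_dataset("tatsu-lab/alpaca")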
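Alpaca 7B was trained on prompts built from each row's instruction, input and output fields, using one template when the input is empty and another when it is not. The sketch below reproduces those two templates as they appear in the stanford_alpaca training script, to the best of our knowledge. Check the repository before relying on the exact whitespace. It assumes each row is a dict with those three string fields, with input set to an empty string when the task needs no extra context. The function names build_prompt and to_training_text are hypothetical.

```python
from typing import Dict

# Templates assumed to match the stanford_alpaca training script; verify exact whitespace there.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(row: Dict[str, str]) -> str:
    """Pick the template based on whether the row's input field is non-empty."""
    if row.get("input", "").strip():
        return PROMPT_WITH_INPUT.format(instruction=row["instruction"], input=row["input"])
    return PROMPT_NO_INPUT.format(instruction=row["instruction"])


def to_training_text(row: Dict[str, str]) -> Dict[str, str]:
    """Return the prompt and the full text (prompt + target output).

    Keeping the prompt separate lets a trainer mask prompt tokens out of the loss.
    An end-of-sequence token is not appended here; add your tokenizer's EOS if needed.
    """
    prompt = build_prompt(row)
    return {"prompt": prompt, "full_text": prompt + row["output"]}
```

With the dataset loaded as above, and assuming the rows live in a "train" split, ds["train"].map(to_training_text) would add prompt and full_text columns to every row.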
