ShareGPT 52K — LLM instruction Dataset

Real multi-turn conversations scraped from ShareGPT.com — a site where users shared their best ChatGPT chats. Contains 52k real human-AI conversation trees. The dataset that made Vicuna and OpenChat possible by teaching models natural multi-turn dialogue.

Dataset Details

ProviderRyokoAI
Categoryinstruction
Size52k Conversations
LicenseCC-BY-NC 4.0
Downloads1.1M
TagsMulti-Turn, Real-World, ChatGPT, Conversational
from datasets import load_dataset
ds = load_dataset("RyokoAI/sharegpt")

← All Datasets | Fine-Tuning Guide