CodeFeedback-Filtered-Instruction — LLM code Dataset

74k coding instruction pairs across Python, Java, JavaScript, and C++ with a focus on interactive code generation and debugging. Combines data from Magicoder, ShareGPT-Python, and original generation to create a diverse multi-language coding chat dataset. Used to train OpenCodeInterpreter.

Dataset Details

Providerm-a-p
Categorycode
Size74k Rows
LicenseApache 2.0
Downloads180k
TagsMulti-Language, Python, Java, Code-Chat
from datasets import load_dataset
ds = load_dataset("m-a-p/codefeedback")

← All Datasets | Fine-Tuning Guide