Question 1

Can I use CodeFeedback-Filtered-Instruction commercially?

Accepted Answer

Yes. CodeFeedback-Filtered-Instruction is released under Apache 2.0, a permissive license that allows commercial use, including training models you ship in a product. Check the dataset card for attribution requirements before release; Apache 2.0 itself requires that redistributed copies keep the license text and any existing notices.

Question 2

How much data does CodeFeedback-Filtered-Instruction contain, and do I need all of it?

Accepted Answer

CodeFeedback-Filtered-Instruction contains about 74k rows. You rarely need all of it: for style and format fine-tuning, a few hundred to a few thousand examples are usually enough. Load a slice (e.g. split="train[:1000]") and scale up only if quality stalls below what you need.
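A minimal sketch of loading such a slice with the Hugging Face `datasets` library. The repository id `m-a-p/CodeFeedback-Filtered-Instruction` is an assumption pieced together from the provider and dataset name listed on this page, and `slice_spec` and `load_subset` are hypothetical helper names used only for illustration.

```python
def slice_spec(n_rows, split="train"):
    """Build a split string such as 'train[:1000]' for the first n_rows rows."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    return f"{split}[:{n_rows}]"


def load_subset(n_rows=1000, repo_id="m-a-p/CodeFeedback-Filtered-Instruction"):
    """Load only the first n_rows rows of the train split.

    repo_id is an assumed Hub id; confirm it on the dataset card.
    """
    # Imported here so slice_spec can be used without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=slice_spec(n_rows))
```

Calling `load_subset(1000)` would fetch the first 1,000 training rows. If results stall below what you need, call it again with a larger `n_rows` rather than starting with the full set.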

Question 3

What is CodeFeedback-Filtered-Instruction best used for?

Accepted Answer

Training coding models that respond to execution feedback. It belongs to the Code section of our dataset hub, where you'll find alternatives and complementary sets.

Provider	m-a-p
Category	Code
Size	74k Rows
License	Apache 2.0
Downloads	180k
Tags	Multi-Language, Python, Java, Code-Chat

7B QLoRA	~6GB VRAM
13B QLoRA	~10GB VRAM

CodeFeedback-Filtered-Instruction — LLM Code Dataset
