Instruction FollowingSFT, Distillation, Synthetic DataCommercial OK
OpenCodeReasoning
by nvidia
4.1Kdownloads
533likes
100K<n<1MDescription
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Data Overview
OpenCodeReasoning is the largest reasoning-based synthetic dataset to date for coding, comprises 735,255 samples in Python across 28,319 unique competitive programming
questions. OpenCodeReasoning is designed for supervised fine-tuning (SFT).
Technical Report - Discover the methodology and technical details behind OpenCodeReasoning.
Github Repo - Access the complete pipeline used to… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenCodeReasoning.
What can I do with this?
Tags
task_categories:text-generationlicense:cc-by-4.0size_categories:100K<n<1Mformat:parquetmodality:textlibrary:datasetslibrary:dasklibrary:mlcroissantlibrary:polarsarxiv:2504.01943region:ussynthetic