Instruction Following | SFT, Distillation, Synthetic Data | Commercial OK

OpenCodeReasoning

by nvidia

Silver 52
4.1K downloads
533 likes
100K<n<1M

Description

OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

Data Overview

OpenCodeReasoning is the largest reasoning-based synthetic dataset to date for coding, comprising 735,255 Python samples across 28,319 unique competitive programming questions. It is designed for supervised fine-tuning (SFT).

Technical Report - Discover the methodology and technical details behind OpenCodeReasoning.
GitHub Repo - Access the complete pipeline used to…

See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenCodeReasoning.
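As a rough illustration of the SFT use case described above, the sketch below turns one record into a prompt/completion pair and works out the average number of samples per question from the counts quoted in the description. The field names `input` and `output`, the helper names, and the `config_name` / `split_name` parameters are assumptions made for illustration; check the dataset page for the actual schema and configuration names before relying on them.

```python
from typing import Dict, Iterator

# Counts quoted in the dataset description.
SAMPLES = 735_255
QUESTIONS = 28_319


def samples_per_question(samples: int = SAMPLES, questions: int = QUESTIONS) -> float:
    """Average number of samples per unique question."""
    return samples / questions


def to_sft_pair(
    record: Dict[str, str],
    prompt_field: str = "input",     # assumed column name, not confirmed by this page
    response_field: str = "output",  # assumed column name, not confirmed by this page
) -> Dict[str, str]:
    """Map one dataset record to a prompt/completion pair for SFT."""
    return {
        "prompt": record[prompt_field].strip(),
        "completion": record[response_field].strip(),
    }


def stream_sft_pairs(config_name: str, split_name: str, limit: int = 3) -> Iterator[Dict[str, str]]:
    """Stream a few records from the Hub and convert them.

    config_name and split_name are left to the caller because this page
    does not list them; look them up on the dataset page.
    """
    from datasets import load_dataset  # imported lazily; requires `pip install datasets`

    ds = load_dataset(
        "nvidia/OpenCodeReasoning", config_name, split=split_name, streaming=True
    )
    for i, record in enumerate(ds):
        if i >= limit:
            break
        yield to_sft_pair(record)


if __name__ == "__main__":
    print(f"about {samples_per_question():.1f} samples per question")
```

Streaming avoids downloading the full parquet files just to inspect a few rows, which is convenient for a dataset in the 100K<n<1M size range.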

Tags

task_categories:text-generation
license:cc-by-4.0
size_categories:100K<n<1M
format:parquet
modality:text
library:datasets
library:dask
library:mlcroissant
library:polars
arxiv:2504.01943
region:us
synthetic