CodeSynthetic DataCommercial OK
OpenThoughts-114k
by open-thoughts
169.4Kdownloads
819likes
Description
[!NOTE]
We have released a paper for OpenThoughts! See our paper here.
Open-Thoughts-114k
Open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles!
Inspect the content with rich formatting with Curator Viewer.
Available Subsets
default subset containing ready-to-train data used to finetune the OpenThinker-7B and OpenThinker-32B models:
ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")… See the full description on the dataset page: https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k.
What can I do with this?
Tags
license:apache-2.0size_categories:100K<n<1Mformat:parquetmodality:textlibrary:datasetslibrary:dasklibrary:polarslibrary:mlcroissantarxiv:2506.04178region:uscuratorsynthetic