Math & Reasoning · Commercial OK
dolma3_mix-6T-1025-7B
by allenai
536.4K downloads
37 likes
Description
⚠️ WARNING: This dataset is intended ONLY for reproducing Olmo 3 7B ⚠️
For all other training use cases, including training from scratch, please use our primary Dolma 3 data mix: https://huggingface.co/datasets/allenai/dolma3_mix-6T.
Note: Some olmOCR science PDFs in the current dataset were redacted after Olmo 3 7B was trained. These texts are marked with [REMOVED] in the text field, which affects the reproducibility of Olmo 3 7B.
For this reason, please use our… See the full description on the dataset page: https://huggingface.co/datasets/allenai/dolma3_mix-6T-1025-7B.
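Because redacted documents are marked with [REMOVED] in the text field, it can help to measure how many records are affected before attempting a reproduction run. The sketch below is a minimal, stdlib-only illustration that assumes each record is a mapping with a "text" key. The card does not say whether the marker replaces the whole text or appears inside it, so the check looks for it as a substring. The helper names `is_redacted` and `count_redacted` are hypothetical, not part of any Allen AI tooling.

```python
from typing import Iterable, Mapping

# Marker named in the dataset note; assumed to appear verbatim in the "text" field.
REDACTION_MARKER = "[REMOVED]"


def is_redacted(record: Mapping[str, object]) -> bool:
    """Return True if the record's "text" field is a string containing the marker."""
    text = record.get("text")
    return isinstance(text, str) and REDACTION_MARKER in text


def count_redacted(records: Iterable[Mapping[str, object]]) -> tuple[int, int]:
    """Return (redacted_count, total_count) over an iterable of records."""
    total = redacted = 0
    for record in records:
        total += 1
        if is_redacted(record):
            redacted += 1
    return redacted, total


if __name__ == "__main__":
    # Toy records standing in for real rows (hypothetical data).
    sample = [
        {"text": "Body of an olmOCR science PDF."},
        {"text": "[REMOVED]"},
        {"text": "Intro paragraph ... [REMOVED] ..."},
    ]
    print(count_redacted(sample))  # (2, 3)
```

To run this over the real data, you could pass rows from the Hugging Face `datasets` library, for example from `load_dataset("allenai/dolma3_mix-6T-1025-7B", split="train", streaming=True)`; whether a configuration name or a different split is required is an assumption not covered on this card. Streaming avoids downloading the full 6T-token mix just to count markers.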
Tags
task_categories: text-generation, language: en, license: odc-by, arxiv: 2512.13961, region: us