Math & Reasoning
Commercial OK

dolma3_mix-6T-1025-7B

by allenai

Silver 57
536.4K downloads
37 likes

Description

⚠️ WARNING: This dataset is intended ONLY for reproducing Olmo 3 7B. ⚠️ For all other training use cases, including training from scratch, please use our primary dolma 3 data mix: https://huggingface.co/datasets/allenai/dolma3_mix-6T.

Note: Some olmOCR science PDFs in the current dataset were redacted after Olmo 3 7B was trained. These texts are marked with [REMOVED] in the text field. This affects the reproducibility of Olmo 3 7B. For this reason, please use our…

See the full description on the dataset page: https://huggingface.co/datasets/allenai/dolma3_mix-6T-1025-7B.
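
Because redacted documents are flagged with [REMOVED] in the text field, it can help to count or skip them before comparing a run against the original training data. The following is a minimal sketch, not the dataset authors' tooling. It assumes each record is a dictionary with a string `text` field, and that the marker may be either the whole field or embedded in it. The helper names `is_redacted` and `drop_redacted` and the sample records are hypothetical.

```python
from typing import Iterable, Iterator

# Marker named in the description; whether it replaces the whole
# text or only part of it is an assumption, so we test by substring.
REDACTION_MARKER = "[REMOVED]"


def is_redacted(record: dict, field: str = "text") -> bool:
    """Return True if the record's text field contains the redaction marker."""
    value = record.get(field)
    return isinstance(value, str) and REDACTION_MARKER in value


def drop_redacted(records: Iterable[dict], field: str = "text") -> Iterator[dict]:
    """Yield only the records whose text field does not contain the marker."""
    for record in records:
        if not is_redacted(record, field):
            yield record


# Hypothetical in-memory sample standing in for real dataset rows.
sample = [
    {"id": "a", "text": "An ordinary document."},
    {"id": "b", "text": "[REMOVED]"},
    {"id": "c", "text": "Another ordinary document."},
]

kept = list(drop_redacted(sample))
redacted_count = sum(is_redacted(r) for r in sample)
```

Since `drop_redacted` is a generator over any iterable of dictionaries, it should also work on a streamed dataset without loading everything into memory, provided the rows expose a `text` key as assumed here.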

Tags

task_categories:text-generation
language:en
license:odc-by
arxiv:2512.13961
region:us