Text Generation & Chat | Pretraining | Commercial OK

Dolma

by allenai

Silver 53
2.9K downloads
1.0K likes
n>1T

Description

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Tags

task_categories:text-generation, language:en, license:odc-by, size_categories:n>1T, arxiv:2402.00159, arxiv:2301.13688, region:us, language-modeling, casual-lm, llm