Text Generation & Chat
Pretraining
Commercial OK
Dolma
by allenai
2.9K downloads
1.0K likes
n>1T
Description
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Tags
task_categories:text-generation
language:en
license:odc-by
size_categories:n>1T
arxiv:2402.00159
arxiv:2301.13688
region:us
language-modeling
casual-lm
llm