Text Generation & Chat · Non-Commercial
The Pile
by EleutherAI
Size category: 100B<n<1T

Description
The Pile is an 825 GiB, diverse, open-source language modelling dataset made up of 22 smaller, high-quality datasets.
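Because the Pile is a combination of 22 component datasets, a common first step is to check how documents are distributed across them. The snippet below is a minimal sketch, not an official loader. It assumes that each record is stored as one line of JSON with the component name under a `meta.pile_set_name` key, and that any compression on the files has already been removed. This card does not describe the file layout, so treat both points as assumptions. The helper `count_by_component` and the sample records are hypothetical and exist only for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_component(lines: Iterable[str]) -> Counter:
    """Tally documents per component dataset.

    Assumption (not confirmed by this card): each line is one JSON record
    shaped like {"text": ..., "meta": {"pile_set_name": ...}}.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Fall back to "unknown" if the assumed metadata key is missing.
        name = record.get("meta", {}).get("pile_set_name", "unknown")
        counts[name] += 1
    return counts


# Hypothetical sample records in the assumed layout.
sample = [
    '{"text": "def f(): pass", "meta": {"pile_set_name": "Github"}}',
    '{"text": "Abstract ...", "meta": {"pile_set_name": "ArXiv"}}',
    '{"text": "import os", "meta": {"pile_set_name": "Github"}}',
]

print(count_by_component(sample))
```

To run it on real data, pass an open text file instead of `sample`. The function consumes the file line by line, so it does not need to hold the full 825 GiB in memory.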
Tags
task_categories: text-generation
task_categories: fill-mask
task_ids: language-modeling
task_ids: masked-language-modeling
annotations_creators: no-annotation
language_creators: found
multilinguality: monolingual
source_datasets: original
language: en
license: other
size_categories: 100B<n<1T
arxiv: 2201.07311
arxiv: 2101.00027
region: us