Text Generation & Chat, Non-Commercial

The Pile

by EleutherAI

Silver50
1.8K downloads
492 likes
100B<n<1T

Description

The Pile is an 825 GiB diverse, open-source language modelling dataset that combines 22 smaller, high-quality datasets.

Tags

task_categories:text-generation
task_categories:fill-mask
task_ids:language-modeling
task_ids:masked-language-modeling
annotations_creators:no-annotation
language_creators:found
multilinguality:monolingual
source_datasets:original
language:en
license:other
size_categories:100B<n<1T
arxiv:2201.07311
arxiv:2101.00027
region:us
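
The tags above all follow a `key:value` pattern, and some keys (for example `task_categories` and `arxiv`) appear more than once. As a minimal sketch of how they could be grouped for filtering or display, the snippet below collects them into a dictionary. The names `TAGS` and `group_tags` are hypothetical helpers introduced here for illustration; they are not part of any hub API.

```python
from collections import defaultdict

# Tags copied from this card; each has the form "key:value".
TAGS = [
    "task_categories:text-generation",
    "task_categories:fill-mask",
    "task_ids:language-modeling",
    "task_ids:masked-language-modeling",
    "annotations_creators:no-annotation",
    "language_creators:found",
    "multilinguality:monolingual",
    "source_datasets:original",
    "language:en",
    "license:other",
    "size_categories:100B<n<1T",
    "arxiv:2201.07311",
    "arxiv:2101.00027",
    "region:us",
]


def group_tags(tags):
    """Group "key:value" tags into a dict mapping each key to its values.

    Hypothetical helper: splits on the first colon only, so values that
    contain other punctuation (e.g. "100B<n<1T") are kept intact.
    """
    grouped = defaultdict(list)
    for tag in tags:
        key, _, value = tag.partition(":")
        grouped[key].append(value)
    return dict(grouped)


grouped = group_tags(TAGS)
print(grouped["task_categories"])
print(grouped["arxiv"])
```

Keeping the values in a list per key preserves repeated tags, such as the two arXiv identifiers, in the order they appear on the card.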