OpenWebText
by Skylion007
80.8K downloads
501 likes
1M<n<10M
Description
Dataset Card for "openwebtext"
Dataset Summary
An open-source replication of OpenAI's WebText dataset, which was used to train GPT-2.
This distribution was created by Aaron Gokaslan and Vanya Cohen of Brown University.
Supported Tasks and Leaderboards
More Information Needed
Languages
More Information Needed
Dataset Structure
Data Instances
plain_text
Size of downloaded dataset files: 13.51 GB
Size of the… See the full description on the dataset page: https://huggingface.co/datasets/Skylion007/openwebtext.
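Because the downloaded files total 13.51 GB, it can be convenient to stream a few records before committing to a full download. The following is a minimal sketch, not an official usage example: it assumes the Hugging Face `datasets` library is installed, that the repository exposes a `train` split, and that each record carries its document in a `text` field. The helper names `preview` and `stream_openwebtext` are hypothetical.

```python
from itertools import islice


def preview(records, n=3, width=80):
    """Return the first `n` text snippets, each truncated to `width` characters."""
    # Assumption: every record is a mapping with a "text" key.
    return [rec["text"][:width] for rec in islice(records, n)]


def stream_openwebtext():
    """Open the train split lazily instead of downloading all files up front."""
    # Imported here so the helper above works without the library installed.
    # Requires `pip install datasets` and network access.
    from datasets import load_dataset

    # Assumption: the split is named "train".
    return load_dataset("Skylion007/openwebtext", split="train", streaming=True)


if __name__ == "__main__":
    for snippet in preview(stream_openwebtext()):
        print(snippet)
```

With `streaming=True` the records are fetched as they are iterated, so `preview` only pulls the first few documents. If the actual split or field names differ, adjust the two labeled assumptions.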
Tags
task_categories:text-generation
task_categories:fill-mask
task_ids:language-modeling
task_ids:masked-language-modeling
annotations_creators:no-annotation
language_creators:found
multilinguality:monolingual
source_datasets:original
language:en
license:cc0-1.0
size_categories:1M<n<10M
format:parquet
modality:text
library:datasets
library:dask
library:polars
library:mlcroissant
region:us