Benchmarks & EvaluationPPOCommercial OK

OpenWebText

by Skylion007

Silver59
80.8Kdownloads
501likes
1M<n<10M

Description

Dataset Card for "openwebtext" Dataset Summary An open-source replication of the WebText dataset from OpenAI, that was used to train GPT-2. This distribution was created by Aaron Gokaslan and Vanya Cohen of Brown University. Supported Tasks and Leaderboards More Information Needed Languages More Information Needed Dataset Structure Data Instances plain_text Size of downloaded dataset files: 13.51 GB Size of the… See the full description on the dataset page: https://huggingface.co/datasets/Skylion007/openwebtext.

What can I do with this?

Tags

task_categories:text-generationtask_categories:fill-masktask_ids:language-modelingtask_ids:masked-language-modelingannotations_creators:no-annotationlanguage_creators:foundmultilinguality:monolingualsource_datasets:originallanguage:enlicense:cc0-1.0size_categories:1M<n<10Mformat:parquetmodality:textlibrary:datasetslibrary:dasklibrary:polarslibrary:mlcroissantregion:us