Text Generation & Chat · Human Annotated · Copyleft
Wikipedia
by legacy-datasets
75.9K downloads
613 likes
n<1K
Description
Wikipedia dataset containing cleaned articles in all languages.
The datasets are built from the Wikipedia dump
(https://dumps.wikimedia.org/) with one split per language. Each example
contains the content of one full Wikipedia article, cleaned to strip
markup and unwanted sections (references, etc.).
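
To make the cleaning step concrete, here is a minimal Python sketch of what stripping markup and unwanted sections from one article's wikitext could look like. It is an illustration only, not the dataset's actual cleaning code: the function name `clean_article`, the `UNWANTED_SECTIONS` list, and the choice to keep section titles as plain lines are all assumptions, and only a few common wikitext constructs are handled.

```python
import re

# Hypothetical list of sections to drop; the source only says "references, etc.".
UNWANTED_SECTIONS = {"references", "external links", "see also"}

# Matches wikitext headings such as "== Title ==" or "=== Title ===".
HEADING = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")


def clean_article(wikitext: str) -> str:
    """Drop unwanted sections, then strip a few simple markup constructs."""
    kept = []
    skip_level = None  # heading level of the section currently being skipped
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            # A heading at the same or a higher level ends the skipped section.
            if skip_level is not None and level <= skip_level:
                skip_level = None
            if skip_level is None and title.lower() in UNWANTED_SECTIONS:
                skip_level = level
                continue
            if skip_level is None:
                kept.append(title)  # assumption: keep the title as plain text
            continue
        if skip_level is None:
            kept.append(line)

    text = "\n".join(kept)
    text = re.sub(r"<ref[^>]*/>", "", text)                           # self-closing refs
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)  # inline refs
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)     # [[target|label]] -> label
    text = re.sub(r"'{2,}", "", text)                                 # bold/italic quote runs
    return text.strip()


sample = (
    "'''Ada''' was a [[mathematician|math pioneer]].<ref>Cite</ref>\n"
    "== Life ==\n"
    "Born in [[London]].\n"
    "== References ==\n"
    "<references/>\n"
)
cleaned = clean_article(sample)
```

Real wikitext also contains templates, tables, and nested links that these regular expressions do not cover, so a dedicated wikitext parser is the more robust choice for anything beyond a demonstration.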
Tags
task_categories:text-generation, task_categories:fill-mask,
task_ids:language-modeling, task_ids:masked-language-modeling,
annotations_creators:no-annotation, language_creators:crowdsourced,
multilinguality:multilingual, source_datasets:original,
language:aa, language:ab, language:ace, language:af, language:ak,
language:als, language:am, language:an, language:ang, language:ar,
language:arc, language:arz
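
Each tag is a `key:value` pair, so the list can be grouped by key to see, for example, all language codes at once. The following is a small sketch in plain Python; the helper name `group_tags` is hypothetical, and the input is a shortened subset of the tags listed above.

```python
from collections import defaultdict


def group_tags(tags):
    """Group 'key:value' tag strings into a dict of key -> list of values."""
    grouped = defaultdict(list)
    for tag in tags:
        # Split on the first colon only, so values may contain colons.
        key, _, value = tag.partition(":")
        grouped[key].append(value)
    return dict(grouped)


tags = [
    "task_categories:text-generation",
    "task_categories:fill-mask",
    "multilinguality:multilingual",
    "language:aa",
    "language:ab",
    "language:ace",
]
grouped = group_tags(tags)
```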