Text Generation & Chat
Copyleft
wikipedia
by wikimedia
95.3K downloads
1.2K likes
n<1K
Description
Dataset Card for Wikimedia Wikipedia
Dataset Summary
Wikipedia dataset containing cleaned articles in all languages.
The dataset is built from the Wikipedia dumps (https://dumps.wikimedia.org/)
with one subset per language, each containing a single train split.
Each example contains the content of one full Wikipedia article, cleaned to strip
markup and unwanted sections (references, etc.).
All language subsets have already been processed for a recent dump, and you…
See the full description on the dataset page: https://huggingface.co/datasets/wikimedia/wikipedia.
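Because the data is organized as one subset per language with a single train split, loading one language could look roughly like the sketch below. It assumes the Hugging Face `datasets` library is installed and that subsets are named `<dump date>.<language code>`. The default dump date `20231101` and the helper names `subset_name` and `load_language_subset` are illustrative assumptions that do not come from the summary above, so check the dataset page for the subsets actually available.

```python
def subset_name(dump_date: str, language: str) -> str:
    """Build a subset (config) name.

    Assumption: subsets follow the pattern "<dump date>.<language code>",
    e.g. "20231101.ab". Verify the exact names on the dataset page.
    """
    return f"{dump_date}.{language}"


def load_language_subset(language: str, dump_date: str = "20231101", streaming: bool = True):
    """Load the single train split of one language subset.

    The default dump date is an assumption. Streaming avoids downloading
    the whole subset before the first article can be read.
    """
    # Imported lazily so the naming helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "wikimedia/wikipedia",
        subset_name(dump_date, language),
        split="train",
        streaming=streaming,
    )


if __name__ == "__main__":
    # Requires network access; "ab" is one of the language tags listed for the dataset.
    articles = load_language_subset("ab")
    first = next(iter(articles))
    # Each example is one full article; print the field names it carries.
    print(sorted(first.keys()))
```

Streaming is used here so that a large language subset can be inspected without a full download; pass `streaming=False` to materialize the split locally instead.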
Tags
task_categories:text-generation
task_categories:fill-mask
task_ids:language-modeling
task_ids:masked-language-modeling
language:ab
language:ace
language:ady
language:af
language:alt
language:am
language:ami
language:an
language:ang
language:anp
language:ar
language:arc
language:ary
language:arz
language:as
language:ast