Text Generation & Chat, Human Annotated, Copyleft

Wikipedia

by legacy-datasets

Silver 60

75.9K downloads

613 likes

n<1K

Description

Wikipedia dataset containing cleaned articles in all languages. The datasets are built from the Wikipedia dumps (https://dumps.wikimedia.org/), with one split per language. Each example contains the content of one full Wikipedia article, cleaned to strip markup and unwanted sections (references, etc.).
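
To make the cleaning step concrete, here is a minimal sketch of what stripping wiki markup and unwanted sections could look like. It is not the dataset's actual cleaning code: the function name `clean_wikitext`, the `UNWANTED_SECTIONS` list, and the handful of regular expressions are assumptions chosen for illustration, and they cover only a small subset of wikitext (no templates, tables, or nested links).

```python
import re

# Hypothetical list of section titles to drop; the real pipeline may differ.
UNWANTED_SECTIONS = {"references", "external links", "see also"}

# Matches wikitext headings such as "== Title ==" or "=== Title ===".
HEADING = re.compile(r"\s*(={2,})\s*(.*?)\s*\1\s*")


def clean_wikitext(raw: str) -> str:
    """Drop unwanted sections, then strip a few common markup forms."""
    kept = []
    skip_level = None  # heading depth of the section being skipped, if any
    for line in raw.splitlines():
        match = HEADING.fullmatch(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            if skip_level is not None and level > skip_level:
                continue  # subsection of a section we are already skipping
            skip_level = level if title.lower() in UNWANTED_SECTIONS else None
            if skip_level is None:
                kept.append(title)  # keep the heading text without "=="
            continue
        if skip_level is None:
            kept.append(line)
    text = "\n".join(kept)
    text = re.sub(r"<ref[^>]*/>", "", text)  # self-closing <ref ... />
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)  # inline refs
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # [[a|b]] -> b
    text = re.sub(r"'{2,}", "", text)  # bold and italic quote runs
    return text.strip()


raw = (
    "'''Ada''' was a [[mathematician|pioneer]].<ref>Src</ref>\n"
    "== Life ==\n"
    "Born in [[London]].\n"
    "== References ==\n"
    "* Some ref\n"
)
print(clean_wikitext(raw))
```

For real dumps, a dedicated wikitext parser is likely a better fit than regular expressions, since templates and nested constructs are not handled here.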

Tags

task_categories:text-generation, task_categories:fill-mask, task_ids:language-modeling, task_ids:masked-language-modeling, annotations_creators:no-annotation, language_creators:crowdsourced, multilinguality:multilingual, source_datasets:original, language:aa, language:ab, language:ace, language:af, language:ak, language:als, language:am, language:an, language:ang, language:ar, language:arc, language:arz

Details

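Tasks: text-generation, fill-mask
Languages: aa, ab, ace, af, ak, als, am, an, ang, ar, arc, arz, as, ast, atj, av, ay, az, azb, ba, bar, bcl, be, bg, bh, bi, bjn, bm, bn, bo, bpy, br, bs, bug, bxr, ca, cbk, cdo, ce, ceb, ch, cho, chr, chy, ckb, co, cr, crh, cs, csb, cu, cv, cy, da, de, din, diq, dsb, dty, dv, dz, ee, el, eml, en, eo, es, et, eu, ext, fa, ff, fi, fj, fo, fr, frp, frr, fur, fy, ga, gag, gan, gd, gl, glk, gn, gom, gor, got, gu, gv, ha, hak, haw, he, hi, hif, ho, hr, hsb, ht, hu, hy, ia, id, ie, ig, ii, ik, ilo, inh, io, is, it, iu, ja, jam, jbo, jv, ka, kaa, kab, kbd, kbp, kg, ki, kj, kk, kl, km, kn, ko, koi, krc, ks, ksh, ku, kv, kw, ky, la, lad, lb, lbe, lez, lfn, lg, li, lij, lmo, ln, lo, lrc, lt, ltg, lv, lzh, mai, mdf, mg, mh, mhr, mi, min, mk, ml, mn, mr, mrj, ms, mt, mus, mwl, my, myv, mzn, na, nah, nan, nap, nds, ne, new, ng, nl, nn, no, nov, nrf, nso, nv, ny, oc, olo, om, or, os, pa, pag, pam, pap, pcd, pdc, pfl, pi, pih, pl, pms, pnb, pnt, ps, pt, qu, rm, rmy, rn, ro, ru, rue, rup, rw, sa, sah, sat, sc, scn, sco, sd, se, sg, sgs, sh, si, sk, sl, sm, sn, so, sq, sr, srn, ss, st, stq, su, sv, sw, szl, ta, tcy, tdt, te, tg, th, ti, tk, tl, tn, to, tpi, tr, ts, tt, tum, tw, ty, tyv, udm, ug, uk, ur, uz, ve, vec, vep, vi, vls, vo, vro, wa, war, wo, wuu, xal, xh, xmf, yi, yo, yue, za, zea, zh, zu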
License: cc-by-sa-3.0
HuggingFace ID: legacy-datasets/wikipedia
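
As a rough sketch of how the ID above might be used, the snippet below loads one language with the `datasets` library. Several details are assumptions rather than facts from this card: that each language is selected through a configuration name of the form `<dump date>.<language code>`, that `20220301` is an available dump date, that the data sits in a `train` split, and the helper names `config_name` and `load_wikipedia` themselves. Depending on the installed `datasets` version, extra arguments may be required.

```python
def config_name(dump_date: str, language: str) -> str:
    """Build a configuration name such as '20220301.en' (assumed format)."""
    return f"{dump_date}.{language}"


def load_wikipedia(language: str = "en", dump_date: str = "20220301"):
    """Load one language of the dataset; needs network access and `datasets`."""
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(
        "legacy-datasets/wikipedia",         # HuggingFace ID from the card
        config_name(dump_date, language),    # assumed configuration naming
        split="train",                       # assumed split name
    )


# Example call, left commented out because it downloads data:
# articles = load_wikipedia("frr")
# print(articles[0])
```

The language argument would be one of the codes in the Languages list, for example `frr` or `en`.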