Math & Reasoning

Commercial OK

HPLT2.0_cleaned

by HPLT

Silver 50
28.4K downloads
38 likes
n>1T

Description

NB: HPLT2.0 is now superseded by a newer release, HPLT3.0. We recommend switching to v3.0 unless you have a compelling reason to stay on 2.0.

This is a large-scale collection of web-crawled documents in 191 world languages, produced by the HPLT project. The data comes mostly from the Internet Archive, with some additions from Common Crawl. For a detailed description of the dataset, please refer to our website and our pre-print.

The Cleaned variant of HPLT Datasets v2.0

This is the…

See the full description on the dataset page: https://huggingface.co/datasets/HPLT/HPLT2.0_cleaned.


Tags

task_categories:fill-mask, task_categories:text-generation, task_ids:language-modeling, multilinguality:multilingual, language:ace, language:af, language:als, language:am, language:ar, language:as, language:ast, language:awa, language:ayr, language:azb, language:azj, language:ba, language:bm, language:ban, language:be, language:bem
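
The tags on this listing follow a `key:value` pattern (for example `language:ace`). As a minimal sketch, assuming you have the tags as a list of strings, they can be grouped by key to see which tasks and languages are declared. The helper name `group_tags` is hypothetical and not part of any HPLT or Hugging Face tooling, and the list below holds only the tags visible on this page, not the full set for all 191 languages.

```python
from collections import defaultdict

# Tags copied from this listing; the dataset card itself declares more languages.
TAGS = [
    "task_categories:fill-mask",
    "task_categories:text-generation",
    "task_ids:language-modeling",
    "multilinguality:multilingual",
    "language:ace", "language:af", "language:als", "language:am",
    "language:ar", "language:as", "language:ast", "language:awa",
    "language:ayr", "language:azb", "language:azj", "language:ba",
    "language:bm", "language:ban", "language:be", "language:bem",
]


def group_tags(tags):
    """Group 'key:value' tag strings into {key: [values]}, keeping input order.

    Hypothetical helper for illustration only.
    """
    grouped = defaultdict(list)
    for tag in tags:
        # Split on the first colon only, so values may themselves contain colons.
        key, _, value = tag.partition(":")
        grouped[key].append(value)
    return dict(grouped)


grouped = group_tags(TAGS)
print(grouped["task_categories"])
print(grouped["language"])
```

Grouping by key makes it easy to check, for instance, whether a language code you need appears among the declared `language` values before downloading anything.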