Text Generation & Chat
Commercial OK

FineTranslations-Edu

by HuggingFaceFW

1.5K downloads
26 likes
Size category: n>1T

Description

💬 FineTranslations: The world's knowledge in 1+ trillion tokens of parallel text

NOTE: This is the Edu version of the dataset. It contains only the top 10% of the data, ranked by an educational classifier applied to the English translations, and it has no splits. For the base dataset, see the main FineTranslations dataset page.

What is it?
This dataset contains over 1 trillion tokens of parallel text in English and 500+ languages. It was obtained by translating data from 🥂 FineWeb2 into English using… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/finetranslations-edu.
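The Edu subset keeps only the top 10% of records by educational-classifier score. The snippet below is a minimal sketch of that kind of top-fraction filter, not the FineTranslations pipeline itself. It assumes each record already carries a numeric score computed on its English translation. The field name `edu_score` and the helper `keep_top_fraction` are hypothetical, and the classifier is not reproduced here.

```python
from typing import Iterable


def keep_top_fraction(
    records: Iterable[dict], score_key: str = "edu_score", fraction: float = 0.10
) -> list[dict]:
    """Keep the highest-scoring `fraction` of records.

    Hypothetical helper: `score_key` names an assumed field holding a
    classifier score for the record's English translation (higher = more
    educational). This is an illustration, not the dataset's actual code.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    items = list(records)
    if not items:
        return []
    # Round to the nearest record count, keeping at least one record.
    k = max(1, round(len(items) * fraction))
    # Sort by score, highest first (stable for ties), and keep the top k.
    ranked = sorted(items, key=lambda r: r[score_key], reverse=True)
    return ranked[:k]


# Toy records with made-up scores, for illustration only.
docs = [
    {"id": i, "edu_score": s}
    for i, s in enumerate([0.2, 3.1, 1.5, 2.8, 0.9, 4.0, 0.1, 2.2, 1.1, 3.6])
]
print([d["id"] for d in keep_top_fraction(docs)])  # prints [5]
```

Ranking and taking a fixed fraction keeps the subset size proportional to the input. A production pipeline might instead apply a fixed score threshold computed once over the whole corpus, which scales better than sorting everything in memory.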


Tags

task_categories: text-generation, translation
language: abk, abq, abs, acm, adh, adi, ady, aeb, afr, agx, aii, aim, ain, ajz, akb, aln, als, alt