Text Generation & Chat
Commercial OK
FineTranslations-Edu
by HuggingFaceFW
1.5K downloads
26 likes
n>1T
Description
💬 FineTranslations
The world's knowledge in 1+1T tokens of parallel text
NOTE: this is the Edu version of the dataset, containing only the top 10% scoring data based on an educational classifier applied to the English translations. It has no splits. For the base dataset, see here.
What is it?
This dataset contains over 1 trillion tokens of parallel text in English and 500+ languages. It was obtained by translating data from 🥂 FineWeb2 into English using… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/finetranslations-edu.
Tags
task_categories:text-generation, task_categories:translation, language:abk, language:abq, language:abs, language:acm, language:adh, language:adi, language:ady, language:aeb, language:afr, language:agx, language:aii, language:aim, language:ain, language:ajz, language:akb, language:aln, language:als, language:alt
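
The tags above follow a `key:value` pattern (`task_categories:…`, `language:…`). As a minimal sketch, assuming the tags are already available as a list of strings, they could be grouped by key as shown here. The helper name `group_tags` and the `"other"` bucket are hypothetical, introduced only for illustration; they are not part of any Hugging Face API.

```python
from collections import defaultdict


def group_tags(tags):
    """Group 'key:value' tags by key, keeping the original order of values."""
    grouped = defaultdict(list)
    for tag in tags:
        key, sep, value = tag.partition(":")
        if not sep:
            # Assumption: tags without a colon are collected under "other".
            key, value = "other", tag
        grouped[key].append(value)
    return dict(grouped)


# A few of the tags listed for this dataset.
tags = [
    "task_categories:text-generation",
    "task_categories:translation",
    "language:abk",
    "language:abq",
    "language:abs",
]

grouped = group_tags(tags)
print(grouped["task_categories"])
print(grouped["language"])
```

Grouping this way makes it easy to check which task categories or language codes a dataset declares without scanning the full tag list by eye.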