Translation & Multilingual
Commercial OK

fineweb-edu-translated

by Helsinki-NLP

Silver 50
300.6K downloads
4 likes

Description

fineweb-edu-translated is a collection of automatically translated documents from fineweb-edu. The translations were produced with OPUS-MT and HPLT-MT models. The data covers 36,704,000 documents containing over 28 billion space-separated tokens of English text, translated into 36 languages. The full dataset comprises over 960 billion tokens, and the translated documents are aligned across all languages. More information about how the data has been produced can… See the full description on the dataset page: https://huggingface.co/datasets/Helsinki-NLP/fineweb-edu-translated.
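The following minimal Python sketch illustrates the two properties described above: token counts based on space-separated tokens, and documents that are aligned across languages. It makes two assumptions. First, it assumes tokens are counted by splitting on whitespace. Second, it assumes aligned translations share a document identifier. The (doc_id, lang, text) record layout and all function names are hypothetical and do not describe the dataset's actual schema.

```python
from collections import defaultdict

def count_space_separated_tokens(text: str) -> int:
    """Count tokens by splitting on runs of whitespace (an assumed approximation)."""
    return len(text.split())

def tokens_per_language(records):
    """Sum whitespace token counts per language code.

    Each record is a hypothetical (doc_id, lang, text) tuple.
    """
    totals = defaultdict(int)
    for _doc_id, lang, text in records:
        totals[lang] += count_space_separated_tokens(text)
    return dict(totals)

def align_by_doc_id(records):
    """Group translations that share a document id into {doc_id: {lang: text}}."""
    aligned = defaultdict(dict)
    for doc_id, lang, text in records:
        aligned[doc_id][lang] = text
    return dict(aligned)

# Hypothetical sample: one English source document and its Finnish translation.
SAMPLE = [
    ("doc-1", "eng", "Photosynthesis converts light into chemical energy."),
    ("doc-1", "fin", "Fotosynteesi muuttaa valon kemialliseksi energiaksi."),
]

print(tokens_per_language(SAMPLE))        # per-language whitespace token totals
print(align_by_doc_id(SAMPLE)["doc-1"])   # every translation of doc-1, keyed by language
```

Grouping by a shared identifier makes aligned documents directly comparable across languages. For example, the English and Finnish versions of the same source document can be retrieved together.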


Tags

task_categories:translation, task_categories:text-generation, language:bos, language:bul, language:cat, language:ces, language:dan, language:deu, language:ell, language:eng, language:est, language:eus, language:fin, language:fra, language:gle, language:glg, language:hrv, language:hun, language:isl, language:ita