Vision-Language, Pretraining, Commercial OK

MINT-1T

by mlfoundations

Silver 57
174.5K downloads
94 likes
100B<n<1T

Description

πŸƒ MINT-1T:Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens πŸƒ MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. πŸƒ MINT-1T is designed to facilitate research in multimodal pretraining. πŸƒ MINT-1T is created by a team from the University of Washington in… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/MINT-1T-HTML.

Tags

task_categories:image-to-text
task_categories:text-generation
language:en
license:cc-by-4.0
size_categories:100M<n<1B
format:parquet
modality:text
library:datasets
library:dask
library:mlcroissant
library:polars
arxiv:2406.11271
region:us
multimodal
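
The tags list `library:datasets` and `format:parquet`, so one plausible way to inspect a few records is to stream them with the Hugging Face `datasets` library instead of downloading the full dataset. The following is a minimal sketch, not an official loading recipe. The repository id comes from the dataset page URL in the description. The split name `"train"` and the helper names `take` and `stream_first_records` are assumptions made for illustration, and the column names are not stated on this page, so the sketch does not rely on any of them.

```python
from itertools import islice

# Repository id taken from the dataset page URL in the description.
REPO_ID = "mlfoundations/MINT-1T-HTML"


def take(iterable, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(iterable, n))


def stream_first_records(n=2, split="train"):
    """Stream the first n records without downloading the whole dataset.

    Assumptions: the third-party `datasets` library is installed, network
    access is available, and the repository exposes a split named "train".
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    # streaming=True yields records lazily instead of materializing
    # the dataset on disk first.
    ds = load_dataset(REPO_ID, split=split, streaming=True)
    return take(ds, n)
```

Calling `stream_first_records(2)` should return a list of two dictionaries, one per record. Printing `sorted(record.keys())` for each is a quick way to discover the actual column names before writing any processing code.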