Vision-Language, Pretraining, Non-Commercial

cc12m-wds

by pixparse

Silver 50
30.7K downloads
38 likes
10M<n<100M

Description

Dataset Card for Conceptual Captions 12M (CC12M)

Dataset Summary

Conceptual 12M (CC12M) is a dataset of 12 million image-text pairs intended for vision-and-language pre-training. Its data collection pipeline is a relaxed version of the one used in Conceptual Captions 3M (CC3M).

Usage

This instance of Conceptual Captions is in webdataset .tar format. It can be used with the webdataset library or upcoming releases of Hugging Face datasets.…

See the full description on the dataset page: https://huggingface.co/datasets/pixparse/cc12m-wds.
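For readers unfamiliar with the webdataset .tar layout mentioned above, the following is a minimal sketch of the general convention: files that share a basename inside the tar form one sample, and the file extension names the field. It uses only the Python standard library and a toy in-memory shard. The helper names (`make_toy_shard`, `iter_samples`), the sample keys, and the `jpg`/`txt` field names are assumptions made for illustration and are not taken from this dataset's actual shards.

```python
import io
import tarfile


def make_toy_shard(samples):
    """Build an in-memory tar laid out webdataset-style (hypothetical helper).

    Each sample is written as one file per field, all sharing the same basename.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, fields in samples:
            for ext, payload in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def iter_samples(fileobj):
    """Yield one dict per sample by grouping consecutive members with the same basename.

    Assumes member names contain no directory component with a dot in it;
    the key is everything before the first dot, the field name everything after.
    """
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        current_key, sample = None, {}
        for member in tar:
            if not member.isfile():
                continue
            key, _, ext = member.name.partition(".")
            if key != current_key and sample:
                yield sample
                sample = {}
            current_key = key
            sample["__key__"] = key
            sample[ext] = tar.extractfile(member).read()
        if sample:
            yield sample


# Toy data only: placeholder bytes stand in for a real JPEG.
toy = make_toy_shard([
    ("000000", {"jpg": b"\xff\xd8placeholder", "txt": b"a dog on a beach"}),
    ("000001", {"jpg": b"\xff\xd8placeholder", "txt": b"a red bicycle"}),
])
for s in iter_samples(toy):
    print(s["__key__"], s["txt"].decode("utf-8"))
```

In practice the webdataset library's `WebDataset` class performs this grouping (plus decoding and shuffling) over many shards; the sketch only shows why a plain tar reader is enough to inspect a shard.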

Tags

task_categories:image-to-text
license:other
size_categories:10M<n<100M
format:webdataset
modality:image
modality:text
library:datasets
library:webdataset
library:mlcroissant
arxiv:2102.08981
region:us