Vision-Language Pretraining, Non-Commercial
cc12m-wds
by pixparse
Size category: 10M<n<100M
Description
Dataset Card for Conceptual Captions 12M (CC12M)
Dataset Summary
Conceptual 12M (CC12M) is a dataset of 12 million image-text pairs intended specifically for vision-and-language pre-training.
Its data collection pipeline is a relaxed version of the one used in Conceptual Captions 3M (CC3M).
Usage
This instance of Conceptual Captions is stored in WebDataset .tar format. It can be used with the webdataset library or with upcoming releases of the Hugging Face datasets library.… See the full description on the dataset page: https://huggingface.co/datasets/pixparse/cc12m-wds.
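In WebDataset format, each sample is a group of files inside a .tar shard that share a basename (the "key") and differ only by extension, stored consecutively. The sketch below shows that grouping using only the Python standard library. It is an illustration of the convention, not the webdataset library's API (that library's WebDataset class does this grouping for you). The helper names and the jpg/txt extensions are assumptions for illustration; check the actual shard contents on the dataset page before relying on specific extensions.

```python
import io
import posixpath
import tarfile


def iter_wds_samples(tar_bytes):
    """Yield WebDataset-style samples from an in-memory .tar shard (sketch).

    Files sharing a key (path up to the first dot of the basename) are merged
    into one dict, e.g. 000001.jpg + 000001.txt -> {"__key__": "000001",
    "jpg": ..., "txt": ...}. Assumes a sample's files are stored consecutively.
    """
    current_key = None
    sample = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other non-regular entries
            # Split "dir/000001.jpg" into key "dir/000001" and extension "jpg".
            dirname, base = posixpath.split(member.name)
            stem, _, ext = base.partition(".")
            key = posixpath.join(dirname, stem)
            if key != current_key:
                if sample:
                    yield sample  # previous sample is complete
                current_key = key
                sample = {"__key__": key}
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield sample


def make_demo_shard(pairs):
    """Build a tiny hypothetical shard from (key, image_bytes, caption) tuples."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, image_bytes, caption in pairs:
            for ext, payload in (("jpg", image_bytes), ("txt", caption.encode("utf-8"))):
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    shard = make_demo_shard([
        ("000000", b"fake-jpeg-0", "a dog on a beach"),
        ("000001", b"fake-jpeg-1", "a red bicycle"),
    ])
    for s in iter_wds_samples(shard):
        print(s["__key__"], s["txt"].decode("utf-8"))
    # prints:
    # 000000 a dog on a beach
    # 000001 a red bicycle
```

Splitting the key at the first dot of the basename (rather than of the full path) keeps directory names containing dots intact, while still treating multi-part extensions such as "seg.png" as a single field.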
Tags
task_categories: image-to-text
license: other
size_categories: 10M<n<100M
format: webdataset
modality: image
modality: text
library: datasets
library: webdataset
library: mlcroissant
arxiv: 2102.08981
region: us