
DataCompDR-1B

by apple

39.1K downloads
30 likes
1B<n<10B

Description

Dataset Card for DataCompDR-1B

This dataset contains synthetic captions, embeddings, and metadata for DataCompDR-1B. The metadata was generated using pretrained image-text models on DataComp-1B. For details on how to use the metadata, please visit our GitHub repository.

Dataset Details

Dataset Description

DataCompDR is an image-text dataset and an enhancement to the DataComp dataset. We reinforce the DataComp dataset using our multi-modal dataset… See the full description on the dataset page: https://huggingface.co/datasets/apple/DataCompDR-1B.


Tags

task_categories:text-to-image
task_categories:image-to-text
language:en
license:apple-amlr
size_categories:1B<n<10B
format:webdataset
modality:image
modality:text
library:datasets
library:webdataset
library:mlcroissant
arxiv:2311.17049
region:us
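
The tags above follow a `key:value` pattern, and some keys (for example `library` and `modality`) appear more than once. As a minimal sketch, the snippet below groups them by key so the supported libraries, modalities, and task categories can be read at a glance. The `TAGS` list is copied from this listing; the helper name `group_tags` is hypothetical and not part of any library.

```python
from collections import defaultdict

# Tags copied from the listing above, one string per tag.
TAGS = [
    "task_categories:text-to-image",
    "task_categories:image-to-text",
    "language:en",
    "license:apple-amlr",
    "size_categories:1B<n<10B",
    "format:webdataset",
    "modality:image",
    "modality:text",
    "library:datasets",
    "library:webdataset",
    "library:mlcroissant",
    "arxiv:2311.17049",
    "region:us",
]


def group_tags(tags):
    """Group 'key:value' tags into a dict mapping each key to its values.

    Hypothetical helper for illustration only. The split happens at the
    first colon, so a value that itself contains a colon stays intact.
    """
    grouped = defaultdict(list)
    for tag in tags:
        key, _, value = tag.partition(":")
        grouped[key].append(value)
    return dict(grouped)


grouped = group_tags(TAGS)
for key, values in grouped.items():
    print(f"{key}: {', '.join(values)}")
```

Grouping by key keeps repeated keys together, which is why the values are stored as lists instead of single strings.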