DataCompDR-1B
by apple
Size category: 1B<n<10B
Description
Dataset Card for DataCompDR-1B
This dataset contains synthetic captions, embeddings, and metadata for DataCompDR-1B.
The metadata has been generated using pretrained image-text models on DataComp-1B.
For details on how to use the metadata, please visit our GitHub repository.
Dataset Details
Dataset Description
DataCompDR is an image-text dataset and an enhancement to the DataComp dataset.
We reinforce the DataComp dataset using our multi-modal dataset… See the full description on the dataset page: https://huggingface.co/datasets/apple/DataCompDR-1B.
Tags
task_categories:text-to-image
task_categories:image-to-text
language:en
license:apple-amlr
size_categories:1B<n<10B
format:webdataset
modality:image
modality:text
library:datasets
library:webdataset
library:mlcroissant
arxiv:2311.17049
region:us
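
The format:webdataset tag means the data ships as tar shards, where files that share the same name up to the first dot belong to one sample. The sketch below illustrates that grouping rule using only the Python standard library and a tiny in-memory shard. The member names (`url.txt`, `syn.json`), the `syn_text` field, and the example URLs are hypothetical placeholders chosen for illustration, not the confirmed layout of DataCompDR-1B; check the dataset page and repository for the actual fields.

```python
import io
import json
import tarfile
from collections import defaultdict


def make_demo_shard() -> bytes:
    """Build a tiny in-memory tar shard. All member names are hypothetical."""
    members = {
        "0000001.url.txt": b"https://example.com/a.jpg",
        "0000001.syn.json": json.dumps({"syn_text": ["a synthetic caption"]}).encode(),
        "0000002.url.txt": b"https://example.com/b.jpg",
        "0000002.syn.json": json.dumps({"syn_text": ["another caption"]}).encode(),
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def group_samples(shard: bytes) -> dict:
    """Group tar members into samples keyed by the name before the first dot."""
    samples = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(shard), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # "0000001.syn.json" -> key "0000001", extension "syn.json"
            key, _, ext = member.name.partition(".")
            samples[key][ext] = tar.extractfile(member).read()
    return dict(samples)


samples = group_samples(make_demo_shard())
for key, fields in samples.items():
    caption = json.loads(fields["syn.json"])["syn_text"][0]
    print(key, fields["url.txt"].decode(), caption)
```

In practice the webdataset or datasets libraries listed in the tags handle this grouping and decoding; the manual version is only meant to show what a sample looks like inside a shard.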