Description
CoSyn-400k
CoSyn-400k is a collection of synthetic question-answer pairs about a very diverse range of computer-generated images.
The data was created by using the Claude large language model to generate code that can be executed to render an image,
and using GPT-4o mini to generate Q/A pairs based on the code (without using the rendered image).
The code used to generate this data is open source.
Synthetic pointing data is available in a separate repo.
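The generation recipe described above (code is generated, executed to render an image, and Q/A pairs are derived from the code rather than from the rendered image) can be illustrated with a toy sketch. This is not the CoSyn pipeline itself: `RENDER_CODE`, `render_image`, and `make_qa_pairs` are hypothetical names, the "generated" code is a fixed string standing in for the output of a code-generating model, and the Q/A step is a simple rule standing in for a second model.

```python
import ast

# Hypothetical stand-in for code produced by a code-generating model.
# When executed, it builds a small bar chart as an SVG string named `svg`.
RENDER_CODE = '''
values = {"A": 3, "B": 7, "C": 5}
bars = "".join(
    f"<rect x='{i * 40}' y='{100 - v * 10}' width='30' height='{v * 10}'/>"
    for i, v in enumerate(values.values())
)
svg = f"<svg xmlns='http://www.w3.org/2000/svg' width='120' height='100'>{bars}</svg>"
'''


def render_image(code: str) -> str:
    """Execute the rendering code and return the image it produces (SVG text)."""
    namespace = {}
    exec(code, namespace)  # only safe here because the code is a trusted constant
    return namespace["svg"]


def make_qa_pairs(code: str) -> list:
    """Derive Q/A pairs from the code alone, never looking at the rendered image.

    Assumption: a simple rule replaces the Q/A-generating model used in practice.
    """
    tree = ast.parse(code)
    for node in ast.walk(tree):
        is_values_assignment = (
            isinstance(node, ast.Assign)
            and isinstance(node.targets[0], ast.Name)
            and node.targets[0].id == "values"
        )
        if is_values_assignment:
            data = ast.literal_eval(node.value)
            largest = max(data, key=data.get)
            return [
                {"question": "Which category has the largest value?", "answer": largest},
                {"question": "How many bars are shown?", "answer": str(len(data))},
            ]
    return []


image = render_image(RENDER_CODE)
qa_pairs = make_qa_pairs(RENDER_CODE)
```

Because the Q/A step reads the underlying data from the code, its answers should agree with what the image shows without any image understanding being required.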
Quick links:
📃 CoSyn… See the full description on the dataset page: https://huggingface.co/datasets/allenai/CoSyn-400K.
Tags
task_categories:visual-question-answering
license:odc-by
size_categories:100K<n<1M
format:parquet
modality:image
modality:text
library:datasets
library:dask
library:mlcroissant
library:polars
arxiv:2502.14846
arxiv:2409.17146
region:us
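Since the tags list the `datasets` library, one plausible way to read the data is sketched below. The repository ID comes from the dataset page URL above; the helper name `load_cosyn_subset` is hypothetical, the subset (configuration) name must be supplied by the caller because none is listed here, and the `"train"` split default is an assumption to check against the dataset page.

```python
def load_cosyn_subset(subset: str, split: str = "train", streaming: bool = True):
    """Load one subset of CoSyn-400K from the Hugging Face Hub.

    `subset` is the configuration name (not listed in this description);
    `split="train"` is an assumption, not confirmed by the text above.
    """
    # Imported lazily so that defining this helper does not require the package.
    from datasets import load_dataset

    # streaming=True iterates over the Parquet shards without a full download.
    return load_dataset("allenai/CoSyn-400K", subset, split=split, streaming=streaming)
```

Streaming is chosen here only to avoid downloading a full subset before inspecting a few examples; passing `streaming=False` should give a regular in-memory dataset instead.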