Description
🍮 The WHOLE FLAN Collection! 🍮
Overview
This repository includes the full dataset from the FLAN Collection, totalling ~300 GB of Parquet files.
The data was generated using the official seqio templating from the Google FLAN Collection GitHub repo.
The data is subject to the same licenses as its component datasets.
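Because the full collection is around 300 GB, it can help to peek at a few records before committing to a full download. The following is a minimal sketch, not an official loading recipe. It assumes the Hugging Face `datasets` library is installed, that the repository id is `Open-Orca/FLAN` (taken from the dataset page URL), and that a `train` split exists. The helper names `take`, `stream_flan`, and `preview` are hypothetical and exist only for illustration.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def take(records: Iterable[Dict[str, Any]], n: int) -> List[Dict[str, Any]]:
    """Return the first n records of an iterable without consuming the rest."""
    return list(islice(records, n))


def stream_flan(split: str = "train") -> Iterator[Dict[str, Any]]:
    """Lazily iterate over the dataset instead of downloading it up front.

    Assumptions: the repository id "Open-Orca/FLAN" and the split name "train".
    """
    # Imported here so that take() stays usable without the library installed.
    from datasets import load_dataset

    return iter(load_dataset("Open-Orca/FLAN", split=split, streaming=True))


def preview(n: int = 3) -> None:
    """Print the field names of the first n records (requires network access)."""
    for record in take(stream_flan(), n):
        # Field names are not documented in this section, so only keys are shown.
        print(sorted(record.keys()))
```

Calling `preview()` fetches only as much data as is needed for the first few records. Streaming is used here as one reasonable choice for a dataset of this size; a regular non-streaming load would also work if enough disk space is available.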
To keep up with our continued work on OpenOrca and other exciting research, find our Discord here:
https://AlignmentLab.ai
Motivation
This work was done as part of… See the full description on the dataset page: https://huggingface.co/datasets/Open-Orca/FLAN.
What can I do with this?
Tags
language:en, license:cc-by-4.0, size_categories:100M<n<1B, format:parquet, modality:text, library:datasets, library:dask, library:mlcroissant, library:polars, arxiv:2301.13688, arxiv:2109.01652, arxiv:2110.08207, arxiv:2204.07705, region:us