Math & Reasoning, Synthetic Data, Commercial OK
dolphin
by QuixiAI
1.4K downloads
431 likes
Description
Dolphin 🐬
https://erichartford.com/dolphin
Dataset details
This dataset is an attempt to replicate the results of Microsoft's Orca.
Our dataset consists of:
~1 million of FLANv2 augmented with GPT-4 completions (flan1m-alpaca-uncensored.jsonl)
~3.5 million of FLANv2 augmented with GPT-3.5 completions (flan5m-alpaca-uncensored.jsonl)
We followed the submix and system prompt distribution outlined in the Orca paper, with a few exceptions. We included all 75k of CoT in the FLAN-1m… See the full description on the dataset page: https://huggingface.co/datasets/QuixiAI/dolphin.
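The two files listed above are JSON Lines files, so they can be read one record at a time without loading millions of rows into memory. The sketch below is a minimal illustration, not the authors' tooling. It assumes each line is a JSON object with Alpaca-style keys ("instruction" for the system prompt, "input" for the question, "output" for the completion), which is suggested by the "alpaca" in the file names but not stated in this description. The helper names iter_jsonl and to_prompt are hypothetical.

```python
import json
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def to_prompt(record: dict) -> str:
    """Join the system prompt and the question into one prompt string.

    ASSUMPTION: Alpaca-style keys "instruction" and "input". Missing or
    empty fields are dropped so no stray separators are produced.
    """
    system = record.get("instruction", "")
    question = record.get("input", "")
    return "\n\n".join(part for part in (system, question) if part)
```

With a local copy of one of the files, for example flan1m-alpaca-uncensored.jsonl, iterating over iter_jsonl("flan1m-alpaca-uncensored.jsonl") and calling to_prompt on each record gives the model input, while the assumed "output" key holds the target completion.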
Tags
task_categories:text-generation, language:en, license:apache-2.0, size_categories:1M<n<10M, format:json, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, region:us