Math & Reasoning, Synthetic Data, Commercial OK

dolphin

by QuixiAI

Bronze 49
1.4K downloads
431 likes

Description

Dolphin 🐬 https://erichartford.com/dolphin

Dataset details

This dataset is an attempt to replicate the results of Microsoft's Orca.

Our dataset consists of:

- ~1 million FLANv2 examples augmented with GPT-4 completions (flan1m-alpaca-uncensored.jsonl)
- ~3.5 million FLANv2 examples augmented with GPT-3.5 completions (flan5m-alpaca-uncensored.jsonl)

We followed the submix and system prompt distribution outlined in the Orca paper, with a few exceptions. We included all 75k of CoT in the FLAN-1m…

See the full description on the dataset page: https://huggingface.co/datasets/QuixiAI/dolphin.
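The two files named above are JSON Lines files, so once one has been downloaded locally it can be streamed record by record instead of being loaded whole. The following is a minimal sketch, not the authors' tooling. The helper names `iter_jsonl` and `to_prompt` are hypothetical, and the Alpaca-style keys `instruction`, `input`, and `output` are an assumption inferred from the "alpaca" in the file names; check them against the actual files before relying on them.

```python
import json
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def to_prompt(record: dict) -> str:
    """Join the system prompt and the user question into one prompt string.

    ASSUMPTION: records use Alpaca-style keys, with "instruction" holding
    the system prompt and "input" holding the question. Missing or empty
    fields are simply left out of the result.
    """
    parts = [record.get("instruction", ""), record.get("input", "")]
    return "\n\n".join(p for p in parts if p)
```

With a local copy of `flan1m-alpaca-uncensored.jsonl`, iterating `iter_jsonl("flan1m-alpaca-uncensored.jsonl")` and calling `to_prompt` on each record would give prompt strings to pair with the `output` field. Streaming keeps memory use flat, which matters for files with millions of rows.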

Tags

task_categories:text-generation
language:en
license:apache-2.0
size_categories:1M<n<10M
format:json
modality:text
library:datasets
library:pandas
library:mlcroissant
library:polars
region:us