Math & Reasoning · Synthetic Data · Commercial OK

dolphin

by QuixiAI

Bronze 49

1.4K downloads

431 likes

Description

Dolphin 🐬

https://erichartford.com/dolphin

Dataset details

This dataset is an attempt to replicate the results of Microsoft's Orca.

Our dataset consists of:

~1 million FLANv2 examples augmented with GPT-4 completions (flan1m-alpaca-uncensored.jsonl)

~3.5 million FLANv2 examples augmented with GPT-3.5 completions (flan5m-alpaca-uncensored.jsonl)

We followed the submix and system prompt distribution outlined in the Orca paper, with a few exceptions. We included all 75k of CoT in the FLAN-1m…

See the full description on the dataset page: https://huggingface.co/datasets/QuixiAI/dolphin.
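Since the description names two JSONL files, a short sketch of reading one may help. The example below is a minimal illustration, not the dataset authors' loader. It assumes each line is a JSON object with Alpaca-style keys (`instruction`, `input`, `output`), which the "alpaca" in the filenames suggests but the description does not confirm. The helper names `iter_records` and `to_prompt` are hypothetical, and the sample record is made up for illustration rather than taken from the dataset.

```python
import io
import json
from typing import Iterator, TextIO


def iter_records(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def to_prompt(record: dict) -> str:
    """Join the system-style instruction and the user input into one prompt.

    The key names are an assumption (Alpaca-style); check them against the
    real files before relying on this.
    """
    instruction = record.get("instruction", "")
    user_input = record.get("input", "")
    parts = [p for p in (instruction, user_input) if p]
    return "\n\n".join(parts)


# Made-up sample standing in for one line of flan1m-alpaca-uncensored.jsonl.
sample = io.StringIO(
    '{"instruction": "You are a helpful assistant.", '
    '"input": "What is 2 + 2?", "output": "4"}\n'
)
records = list(iter_records(sample))
print(to_prompt(records[0]))
```

To read a downloaded file instead of the in-memory sample, the same generator could be fed an open file handle, for example `open("flan1m-alpaca-uncensored.jsonl", encoding="utf-8")`. Streaming line by line avoids loading millions of records into memory at once.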


Tags