Instruction FollowingSynthetic DataCommercial OK

UltraChat 200k

by HuggingFaceH4

Silver58
42.4Kdownloads
678likes
100K<n<1M

Description

Dataset Card for UltraChat 200k Dataset Description This is a heavily filtered version of the UltraChat dataset and was used to train Zephyr-7B-β, a state of the art 7b chat model. The original datasets consists of 1.4M dialogues generated by ChatGPT and spanning a wide range of topics. To create UltraChat 200k, we applied the following logic: Selection of a subset of data for faster supervised fine tuning. Truecasing of the dataset, as we observed around 5% of the data… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k.

What can I do with this?

Tags

task_categories:text-generationlanguage:enlicense:mitsize_categories:100K<n<1Mformat:parquetmodality:textlibrary:datasetslibrary:dasklibrary:mlcroissantlibrary:polarsarxiv:2305.14233region:us