Preference & Alignment (DPO/RLHF) · Preference Learning, Reward Modeling, Evol-Instruct · Commercial OK
UltraFeedback
by openbmb
4.1K downloads · 410 likes · 100K<n<1M rows

Description
Links: Introduction · GitHub Repo · UltraRM-13b · UltraCM-13b
UltraFeedback is a large-scale, fine-grained, diverse preference dataset used for training powerful reward models and critic models. We collect about 64k prompts from diverse sources (including UltraChat, ShareGPT, Evol-Instruct, TruthfulQA, FalseQA, and FLAN). We then use these prompts to query multiple LLMs (see the accompanying table for the model list) and generate 4 different responses for each prompt, resulting in a total of 256k samples.
See the full description on the dataset page: https://huggingface.co/datasets/openbmb/UltraFeedback.
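Datasets like this are typically consumed by converting each prompt's rated responses into (chosen, rejected) preference pairs for reward-model or DPO training. The sketch below shows that conversion on a toy in-memory record; the record layout and field names (`prompt`, `responses`, `score`) are hypothetical stand-ins for the per-prompt structure described above (one prompt, 4 model responses, each with a quality rating), not the dataset's actual schema. In practice you would load the real data via the `datasets` library (`load_dataset("openbmb/UltraFeedback")`).

```python
# Sketch: turning rated responses into (chosen, rejected) preference pairs,
# as used for reward-model / DPO training. The record layout is a hypothetical
# stand-in for the structure described above, not the dataset's real schema.
from itertools import combinations

def to_preference_pairs(record):
    """Emit one (prompt, chosen, rejected) tuple per pair of responses
    whose scores differ, preferring the higher-scored response."""
    pairs = []
    for a, b in combinations(record["responses"], 2):
        if a["score"] == b["score"]:
            continue  # ties carry no preference signal
        chosen, rejected = (a, b) if a["score"] > b["score"] else (b, a)
        pairs.append((record["prompt"], chosen["text"], rejected["text"]))
    return pairs

# Toy record mimicking one prompt with 4 scored responses.
record = {
    "prompt": "Explain photosynthesis in one sentence.",
    "responses": [
        {"text": "Plants convert light into chemical energy.", "score": 9},
        {"text": "It is a plant thing.", "score": 3},
        {"text": "Photosynthesis turns CO2 and water into sugar using sunlight.", "score": 8},
        {"text": "A process in leaves.", "score": 3},
    ],
}

pairs = to_preference_pairs(record)
print(len(pairs))  # 5: C(4,2) = 6 candidate pairs minus the one tied pair
```

With 4 responses per prompt, each prompt yields up to 6 ordered pairs, which is why a 64k-prompt corpus supports far more pairwise comparisons than its 256k raw samples suggest.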
Tags
task_categories: text-generation · language: en · license: mit · size_categories: 10K<n<100K · format: json · modality: text · libraries: datasets, dask, mlcroissant, polars · arxiv: 2310.01377 · region: us