Instruction Following
PPO, Reward Modeling
Commercial OK

HelpSteer2

by nvidia

Silver 51
3.5K downloads
442 likes
10K<n<100K

Description

HelpSteer2: Open-source dataset for training top-performing reward models

HelpSteer2 is an open-source helpfulness dataset (CC-BY-4.0) for aligning models to be more helpful, factually correct, and coherent, while keeping the complexity and verbosity of their responses adjustable. The dataset was created in partnership with Scale AI. A Llama 3.1 70B Instruct model tuned on it achieves 94.1% on RewardBench, which makes it the best Reward Model as… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/HelpSteer2.
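As a rough illustration of how the dataset might be consumed, the sketch below pulls the per-attribute ratings out of a single record. The column names (`helpfulness`, `correctness`, `coherence`, `complexity`, `verbosity`) and the `train` split are assumptions inferred from the description above, and the helper names `attribute_scores` and `load_helpsteer2` are hypothetical; check the dataset page for the actual schema.

```python
from typing import Any, Dict, List

# Assumed column names, inferred from the description; verify on the dataset page.
ATTRIBUTES: List[str] = [
    "helpfulness",
    "correctness",
    "coherence",
    "complexity",
    "verbosity",
]


def attribute_scores(record: Dict[str, Any]) -> Dict[str, int]:
    """Return the ratings found in one record, skipping attributes that are absent."""
    return {name: int(record[name]) for name in ATTRIBUTES if name in record}


def load_helpsteer2(split: str = "train"):
    """Fetch the dataset from the Hugging Face Hub.

    Requires `pip install datasets` and network access. The import is lazy so
    that `attribute_scores` stays usable without the library installed.
    The split name is an assumption.
    """
    from datasets import load_dataset

    return load_dataset("nvidia/HelpSteer2", split=split)


# Example usage (not executed here because it downloads data):
#   ds = load_helpsteer2()
#   print(attribute_scores(ds[0]))
```

Keeping the attribute list in one place makes it easy to adjust if the real column names differ from the ones assumed here.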

Tags

language:en, license:cc-by-4.0, size_categories:10K<n<100K, format:json, modality:tabular, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, arxiv:2410.01257, arxiv:2406.08673, region:us, human-feedback