Instruction Following | SFT, Synthetic Data, Self-Instruct | Non-Commercial

Alpaca

by tatsu-lab

Silver 61
87.3K downloads
934 likes

Description

Dataset Card for Alpaca

Dataset Summary

Alpaca is a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used for instruction tuning of language models, making them better at following instructions. The authors built on the data generation pipeline of the Self-Instruct framework and made the following modifications: The text-davinci-003 engine to generate the instruction data instead… See the full description on the dataset page: https://huggingface.co/datasets/tatsu-lab/alpaca.
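To make "instruction tuning" concrete, here is a minimal sketch of turning one instruction/demonstration record into a (prompt, target) pair for supervised fine-tuning. It assumes each row has `instruction`, `input` (possibly empty), and `output` fields. The two prompt templates are modeled on the ones in the Stanford Alpaca training code, but treat their exact wording as an assumption. The names `AlpacaRecord` and `to_sft_pair` are hypothetical, and the example record is made up, not taken from the dataset.

```python
from typing import TypedDict


class AlpacaRecord(TypedDict):
    """One row, assuming instruction/input/output fields (hypothetical type)."""
    instruction: str
    input: str  # empty string when the task needs no extra context
    output: str


# Assumed prompt templates, modeled on the Stanford Alpaca training code.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def to_sft_pair(record: AlpacaRecord) -> tuple[str, str]:
    """Split a record into (prompt, target) for supervised fine-tuning.

    Picks the template with an Input section only when the record's input
    is non-blank. str.format ignores the unused `input` keyword otherwise.
    """
    has_input = bool(record["input"].strip())
    template = PROMPT_WITH_INPUT if has_input else PROMPT_NO_INPUT
    prompt = template.format(instruction=record["instruction"], input=record["input"])
    return prompt, record["output"]


if __name__ == "__main__":
    # A made-up record for illustration; it is not taken from the dataset.
    example: AlpacaRecord = {
        "instruction": "Translate the sentence to French.",
        "input": "Good morning.",
        "output": "Bonjour.",
    }
    prompt, target = to_sft_pair(example)
    print(prompt + target)
```

Keeping the prompt and the target separate is deliberate. A common setup trains the model to predict only the target (response) tokens and masks the prompt tokens out of the loss.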


Tags

task_categories:text-generation, language:en, license:cc-by-nc-4.0, size_categories:10K<n<100K, format:parquet, modality:text, library:datasets, library:pandas, library:polars, library:mlcroissant, region:us, instruction-finetuning