Instruction Following · SFT, Synthetic Data, Self-Instruct · Non-Commercial
Alpaca
by tatsu-lab
87.3K downloads · 934 likes
Description
Dataset Card for Alpaca
Dataset Summary
Alpaca is a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used for instruction-tuning language models so that they follow instructions better.
The authors built on the data generation pipeline of the Self-Instruct framework and made the following modifications:
Used the text-davinci-003 engine to generate the instruction data instead…
See the full description on the dataset page: https://huggingface.co/datasets/tatsu-lab/alpaca.
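To make the instruction-tuning use concrete, here is a minimal sketch of turning one Alpaca-style record into a (prompt, target) pair for supervised fine-tuning. It assumes each record has `instruction`, `input`, and `output` string fields, where `input` may be empty. The prompt wording is modeled on the template published in the Stanford Alpaca repository, but treat the exact text as an assumption. The helper `build_sft_example` and the sample record are hypothetical and illustrative only; the record is not taken from the dataset.

```python
from typing import TypedDict


class AlpacaRecord(TypedDict):
    # Assumed field names for one Alpaca-style record.
    instruction: str
    input: str   # optional context; may be an empty string
    output: str  # the demonstration (target response)


# Template wording modeled on the Stanford Alpaca repository (assumption).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_sft_example(record: AlpacaRecord) -> tuple[str, str]:
    """Hypothetical helper: return (prompt, target) for one record."""
    if record["input"].strip():
        # Records with context use the template that includes an Input section.
        prompt = PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    else:
        # Records without context omit the Input section entirely.
        prompt = PROMPT_NO_INPUT.format(instruction=record["instruction"])
    return prompt, record["output"]


if __name__ == "__main__":
    # Made-up record for illustration, not an entry from the dataset.
    example: AlpacaRecord = {
        "instruction": "Translate the sentence to French.",
        "input": "Good morning.",
        "output": "Bonjour.",
    }
    prompt, target = build_sft_example(example)
    print(prompt + target)
```

Keeping prompt and target separate leaves room for a common design choice in supervised fine-tuning: computing the training loss only on the target tokens, so the model is not trained to reproduce the fixed template.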
Tags
task_categories: text-generation
language: en
license: cc-by-nc-4.0
size_categories: 10K<n<100K
format: parquet
modality: text
library: datasets
library: pandas
library: polars
library: mlcroissant
region: us
instruction-finetuning