Text Generation & Chat
Unknown

tiny-shakespeare

by Trelis

Bronze42
5.1K downloads
11 likes
n<1K

Description

Data source

Downloaded via Andrej Karpathy's nanoGPT repo from this link.

Data Format

The entire dataset is split into train (90%) and test (10%).
All rows are at most 1024 tokens, as counted by the Llama 2 tokenizer.
Rows are split on sentence boundaries, so every sentence is whole and unbroken.
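The format rules above (rows capped at 1024 tokens, whole sentences only, a 90/10 train/test split) can be sketched roughly as follows. This is not the script used to build the dataset; it is a minimal illustration under stated assumptions. The names `count_tokens`, `split_sentences`, `chunk_rows`, and `train_test_split` are hypothetical, and `count_tokens` is a whitespace word count standing in for the Llama 2 tokenizer so the sketch runs without downloading anything.

```python
import re
from typing import Callable, List, Sequence, Tuple


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): counts whitespace-separated words.
    # The dataset card says the Llama 2 tokenizer was used for the real limit.
    return len(text.split())


def split_sentences(text: str) -> List[str]:
    # Naive sentence splitter (assumption): break after '.', '!' or '?'
    # when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_rows(
    text: str,
    max_tokens: int = 1024,
    counter: Callable[[str], int] = count_tokens,
) -> List[str]:
    # Greedily pack whole sentences into rows, starting a new row whenever
    # adding the next sentence would exceed max_tokens.
    # Caveat: a single sentence longer than max_tokens still becomes its own
    # (oversized) row, and with a subword tokenizer the count of a joined row
    # can differ slightly from the sum of its sentence counts.
    rows: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = counter(sentence)
        if current and current_len + n > max_tokens:
            rows.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        rows.append(" ".join(current))
    return rows


def train_test_split(
    rows: Sequence[str], train_fraction: float = 0.9
) -> Tuple[List[str], List[str]]:
    # Ordered split: the first 90% of rows go to train, the rest to test.
    cut = int(len(rows) * train_fraction)
    return list(rows[:cut]), list(rows[cut:])
```

To get closer to the real setup, one would pass a `counter` backed by an actual Llama 2 tokenizer instead of the whitespace stand-in; the packing logic itself stays the same.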


Tags

task_categories:text-generation
language:en
size_categories:n<1K
format:csv
modality:text
library:datasets
library:pandas
library:mlcroissant
library:polars
region:us
fine-tuning
shakespeare