tiny-shakespeare
by Trelis
Description
Data source
Downloaded via Andrej Karpathy's nanoGPT repo from this link.
Data Format
The dataset is split into train (90%) and test (10%) sets.
Each row is at most 1024 tokens long, as measured with the Llama 2 tokenizer.
Rows are split at sentence boundaries, so every sentence is whole and no sentence is broken across rows.
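The card does not include the preprocessing script. The following is a minimal Python sketch of the procedure described above, assuming a naive regex sentence splitter, a whitespace word count standing in for the Llama 2 tokenizer, and a contiguous (unshuffled) 90/10 split. The helpers `count_words`, `split_sentences`, `chunk_by_sentences`, and `split_train_test` are hypothetical illustrations, not part of the dataset.

```python
import re
from typing import Callable, List, Tuple


def count_words(text: str) -> int:
    """Stand-in token counter (ASSUMPTION): whitespace-separated words.
    The dataset itself was measured with the Llama 2 tokenizer."""
    return len(text.split())


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (ASSUMPTION): break after '.', '!' or '?'
    followed by whitespace. The card does not say how boundaries were found."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_by_sentences(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = count_words,
) -> List[str]:
    """Greedily pack whole sentences into rows of at most max_tokens."""
    rows: List[str] = []
    current: List[str] = []
    for sentence in split_sentences(text):
        if count_tokens(sentence) > max_tokens:
            # A sentence that alone exceeds the budget cannot stay unbroken.
            raise ValueError(f"sentence exceeds {max_tokens} tokens: {sentence[:40]!r}")
        candidate = " ".join(current + [sentence])
        # Re-count the joined row instead of summing per-sentence counts,
        # because subword tokenizers are not additive across joins.
        if current and count_tokens(candidate) > max_tokens:
            rows.append(" ".join(current))  # close the row; start a new one
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        rows.append(" ".join(current))
    return rows


def split_train_test(
    rows: List[str], train_frac: float = 0.9
) -> Tuple[List[str], List[str]]:
    """Contiguous train/test split (ASSUMPTION: rows are not shuffled)."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]


if __name__ == "__main__":
    sample = ("To be, or not to be. That is the question! "
              "Whether 'tis nobler? In the mind to suffer.")
    rows = chunk_by_sentences(sample, max_tokens=10)
    train, test = split_train_test(rows)
    print(rows)
    print(len(train), len(test))
```

To measure rows in real tokens, pass a different counter, for example `lambda s: len(tok.encode(s))` for a Hugging Face `transformers` tokenizer `tok`. Re-counting each joined candidate keeps the limit exact for any tokenizer, at the cost of extra tokenizer calls.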
What can I do with this?
Tags
task_categories: text-generation; language: en; size_categories: n<1K; format: csv; modality: text; library: datasets, pandas, mlcroissant, polars; region: us; fine-tuning; shakespeare