Code, Human Annotated, Non-Commercial
github-code
by codeparrot
25.8K downloads
353 likes
Description
The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling 1TB of text data. The dataset was created from the GitHub dataset on BigQuery.
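Because the full dataset is about 1TB, streaming and filtering by language is usually more practical than downloading everything. The following is a minimal sketch, not an official loader. It assumes the dataset is published on the Hugging Face Hub as `codeparrot/github-code`, that each record is a dict with a `language` field, and that the installed `datasets` version can still load this dataset. The helper names `stream_github_code`, `filter_by_language`, and `take` are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Set


def stream_github_code(split: str = "train"):
    """Open the dataset in streaming mode (needs the `datasets` package and network access).

    Assumption: the Hub identifier is "codeparrot/github-code".
    """
    # Imported lazily so the pure-Python helpers below work without `datasets`.
    from datasets import load_dataset

    return load_dataset("codeparrot/github-code", split=split, streaming=True)


def filter_by_language(records: Iterable[dict], languages: Set[str]) -> Iterator[dict]:
    """Yield only records whose (assumed) "language" field is in `languages`."""
    for record in records:
        if record.get("language") in languages:
            yield record


def take(records: Iterable[dict], n: int) -> List[dict]:
    """Materialize at most `n` records from a possibly huge stream."""
    return list(islice(records, n))
```

With network access, `take(filter_by_language(stream_github_code(), {"Python"}), 3)` would return up to three Python records without fetching the whole corpus. The filter runs lazily, so memory use stays bounded by `n`.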
Tags
task_categories:text-generation
task_ids:language-modeling
language_creators:crowdsourced
language_creators:expert-generated
multilinguality:multilingual
language:code
license:other
region:us