Code · Human Annotated · Non-Commercial

github-code

by codeparrot

Silver55
25.8K downloads
353 likes
unknown

Description

The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totalling 1TB of text data. The dataset was created from the GitHub dataset on BigQuery.
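
At roughly 1TB, the dataset is easier to explore as a stream than as a full download. The following is a minimal sketch, not an official loading recipe. It assumes the dataset is available on the Hugging Face Hub as "codeparrot/github-code" with a "train" split, and that each record is a dict with "language" and "code" fields; none of this is stated in the description above. The helper names stream_github_code and take_by_language are hypothetical, and the sample records are made up for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator


def stream_github_code() -> Iterable[dict]:
    """Open the dataset lazily so the full ~1TB is not downloaded up front.

    Assumption: the Hub id "codeparrot/github-code" and the "train" split.
    Depending on the installed `datasets` version, extra arguments may be needed.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset("codeparrot/github-code", split="train", streaming=True)


def take_by_language(records: Iterable[dict], language: str, limit: int) -> Iterator[dict]:
    """Return an iterator over at most `limit` records in the given language."""
    # Assumption: each record carries a "language" key.
    matches = (r for r in records if r.get("language") == language)
    return islice(matches, limit)


# Hypothetical in-memory records standing in for the real stream.
sample = [
    {"language": "Python", "code": "print('a')"},
    {"language": "C", "code": "int main(void) { return 0; }"},
    {"language": "Python", "code": "print('b')"},
    {"language": "Python", "code": "print('c')"},
]

python_files = list(take_by_language(sample, "Python", limit=2))
```

To try this against the real data, pass stream_github_code() in place of sample. Because both the stream and the filter are lazy, iteration stops as soon as the limit is reached.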

Tags

task_categories:text-generation, task_ids:language-modeling, language_creators:crowdsourced, language_creators:expert-generated, multilinguality:multilingual, language:code, license:other, region:us