Code, Commercial OK

github-code-clean

by codeparrot

Silver 53
23.7K downloads
136 likes

Description

The GitHub Code clean dataset is a more heavily filtered version of the codeparrot/github-code dataset. It consists of 115M code files from GitHub in 32 programming languages with 60 file extensions, totaling almost 1 TB of text data.
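Because the corpus is close to 1 TB, streaming is usually more practical than a full download. The following is a minimal sketch, not an official recipe. It assumes the repository ID is `codeparrot/github-code-clean`, that a `train` split exists, and that each record carries a `language` field (as in the parent codeparrot/github-code dataset). The helper names `stream_dataset`, `filter_by_language`, and `take` are hypothetical and introduced only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def stream_dataset(repo_id: str = "codeparrot/github-code-clean"):
    """Open the dataset in streaming mode so nothing is downloaded up front."""
    # Imported lazily so the pure-Python helpers below work without `datasets`.
    from datasets import load_dataset

    # Assumption: a "train" split exists. Depending on the installed
    # `datasets` version, additional arguments may be required.
    return load_dataset(repo_id, split="train", streaming=True)


def filter_by_language(records: Iterable[dict], language: str) -> Iterator[dict]:
    """Yield only records whose (assumed) "language" field equals `language`."""
    for record in records:
        if record.get("language") == language:
            yield record


def take(records: Iterable[dict], n: int) -> List[dict]:
    """Materialize at most `n` records from a possibly endless stream."""
    return list(islice(records, n))
```

A typical call would be `take(filter_by_language(stream_dataset(), "Python"), 3)`, which pulls records lazily and stops after three matches instead of scanning all 115M files.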

Tags

license:apache-2.0, size_categories:10M<n<100M, modality:text, library:datasets, library:mlcroissant, region:us