Text Generation & Chat, Commercial OK

Youtube Commons Corpus

by PleIAs

Silver50
3.0K downloads
378 likes

Description

📺 YouTube-Commons 📺

YouTube-Commons is a collection of audio transcripts of 2,063,066 videos shared on YouTube under a CC-BY license.

Content

The collection comprises 22,709,724 original and automatically translated transcripts from 3,156,703 videos (721,136 individual channels). In total, this represents nearly 45 billion words (44,811,518,375). All the videos were shared on YouTube with a CC-BY license: the dataset provides all the necessary provenance information…

See the full description on the dataset page: https://huggingface.co/datasets/PleIAs/YouTube-Commons.
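To put the figures quoted in the description in proportion, here is a minimal Python sketch that derives a few averages from them. The constant names and the `ratios` helper are hypothetical and used only for illustration. The description quotes two different video counts; this sketch assumes the one given alongside the transcript and channel totals (3,156,703). Nothing here is measured from the dataset itself.

```python
# Figures copied from the description; they are not re-measured here.
TRANSCRIPTS = 22_709_724   # original and automatically translated transcripts
VIDEOS = 3_156_703         # video count quoted with the transcript total (assumption)
CHANNELS = 721_136         # individual channels
WORDS = 44_811_518_375     # total words


def ratios(words=WORDS, transcripts=TRANSCRIPTS, videos=VIDEOS, channels=CHANNELS):
    """Return simple averages derived from the quoted totals."""
    return {
        "words_per_transcript": words / transcripts,
        "transcripts_per_video": transcripts / videos,
        "videos_per_channel": videos / channels,
    }


summary = ratios()
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

These are plain averages over the quoted totals, so they say nothing about how transcript length or channel size is actually distributed.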

Tags

task_categories:text-generation, language:en, language:fr, language:es, language:pt, language:de, language:ru, license:cc-by-4.0, region:us, conversational