Text Generation & Chat | Commercial OK
Youtube Commons Corpus
by PleIAs
3.0K downloads
378 likes
Description
📺 YouTube-Commons 📺
YouTube-Commons is a collection of audio transcripts of 2,063,066 videos shared on YouTube under a CC-BY license.
Content
The collection comprises 22,709,724 original and automatically translated transcripts from 3,156,703 videos (721,136 individual channels).
In total, this represents nearly 45 billion words (44,811,518,375).
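As a rough sanity check on the figures above, the short sketch below derives a few per-item averages. It uses only the counts quoted in this section; the variable names are illustrative, and the results are plain ratios, not statistics published by the dataset authors.

```python
# Counts quoted in the Content section of this description.
TRANSCRIPTS = 22_709_724   # original and automatically translated transcripts
VIDEOS = 3_156_703         # videos covered by those transcripts
CHANNELS = 721_136         # individual channels
WORDS = 44_811_518_375     # total word count

# Simple ratios; these are averages only and say nothing about the distribution.
words_per_transcript = WORDS / TRANSCRIPTS
transcripts_per_video = TRANSCRIPTS / VIDEOS
videos_per_channel = VIDEOS / CHANNELS

if __name__ == "__main__":
    print(f"Average words per transcript:  {words_per_transcript:,.0f}")
    print(f"Average transcripts per video: {transcripts_per_video:.2f}")
    print(f"Average videos per channel:    {videos_per_channel:.2f}")
```

The transcripts-per-video ratio is above one because the transcript count includes automatically translated versions alongside the originals.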
All the videos were shared on YouTube with a CC-BY license: the dataset provides all the necessary provenance information… See the full description on the dataset page: https://huggingface.co/datasets/PleIAs/YouTube-Commons.
Tags
task_categories:text-generation, language:en, language:fr, language:es, language:pt, language:de, language:ru, license:cc-by-4.0, region:us, conversational