Speech & Audio · PPO, Human Annotated · Other

People's Speech

by MLCommons

Silver 55
23.8K downloads
263 likes
1T<n

Description

Dataset Card for People's Speech

Dataset Summary

The People's Speech Dataset is among the world's largest English speech recognition corpora available today, and it is licensed for academic and commercial use under CC-BY-SA and CC-BY 4.0. It includes 30,000+ hours of transcribed English speech from a diverse set of speakers. This open dataset is large enough to train speech-to-text systems and, crucially, is available under a permissive license.

Supported Tasks…

See the full description on the dataset page: https://huggingface.co/datasets/MLCommons/peoples_speech.

What can I do with this?
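The dataset is tagged for automatic speech recognition and for the Hugging Face `datasets` library, so a natural first step is to pull a few utterances and inspect them. The sketch below is illustrative only and rests on several assumptions. It assumes `datasets` is installed. It assumes a configuration named `"clean"` exists, which is a guess, so check the dataset page for the real configuration names. It also assumes each example carries `id`, `text`, and `duration_ms` fields. `preview_peoples_speech` and `transcribed_hours` are hypothetical helpers, not part of any library. Streaming mode avoids downloading the whole corpus up front.

```python
from itertools import islice
from typing import Iterable, Mapping


def preview_peoples_speech(n: int = 5, config: str = "clean", split: str = "train") -> list:
    """Return the first `n` examples, fetched lazily via streaming.

    Hypothetical helper. Assumes the `datasets` package is installed and
    that the repository exposes a configuration named `config`.
    """
    # Imported here so the pure helper below works without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset("MLCommons/peoples_speech", config, split=split, streaming=True)
    # islice stops the stream after n items instead of iterating the full split.
    return list(islice(ds, n))


def transcribed_hours(examples: Iterable[Mapping]) -> float:
    """Sum the audio length of some examples, in hours.

    Hypothetical helper. Assumes each example has an integer `duration_ms` field.
    """
    total_ms = sum(ex["duration_ms"] for ex in examples)
    return total_ms / 3_600_000  # 1 hour = 3,600,000 ms
```

For example, `transcribed_hours(preview_peoples_speech(100))` would report how much audio the first 100 streamed clips contain, assuming the field names above match the actual schema.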

Tags

task_categories:automatic-speech-recognition, annotations_creators:crowdsourced, annotations_creators:machine-generated, language_creators:crowdsourced, language_creators:machine-generated, multilinguality:monolingual, source_datasets:original, language:en, license:cc-by-2.0, license:cc-by-2.5, license:cc-by-3.0, license:cc-by-4.0, license:cc-by-sa-3.0, license:cc-by-sa-4.0, size_categories:1M<n<10M, format:parquet, modality:audio, modality:text, library:datasets, library:dask