Benchmarks & Evaluation | PPO, Human Annotated | Commercial OK

LibriSpeech

by openslr

Silver 57
76.5K downloads
220 likes
100K<n<1M

Description

Dataset Card for librispeech_asr

Dataset Summary

LibriSpeech is a corpus of approximately 1000 hours of 16 kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.

Supported Tasks and Leaderboards

automatic-speech-recognition, audio-speaker-identification: The dataset can be used to train a model for Automatic…

See the full description on the dataset page: https://huggingface.co/datasets/openslr/librispeech_asr.
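As a rough illustration of working with the corpus, the sketch below streams a few transcripts through the Hugging Face `datasets` library (listed in the tags) and converts a sample count to seconds at the 16 kHz rate stated in the summary. The helper names `duration_seconds` and `peek_transcripts` are hypothetical, and the config name `"clean"`, the split name `"validation"`, and the column names `"audio"` and `"text"` are assumptions that should be checked against the dataset page.

```python
DATASET_ID = "openslr/librispeech_asr"  # taken from the dataset page URL
SAMPLING_RATE_HZ = 16_000  # the summary describes 16 kHz speech


def duration_seconds(num_samples: int, sampling_rate: int = SAMPLING_RATE_HZ) -> float:
    """Length of a waveform in seconds, given its number of samples."""
    return num_samples / sampling_rate


def peek_transcripts(n: int = 3, config: str = "clean", split: str = "validation"):
    """Stream the first n transcripts without downloading the whole corpus.

    Assumptions: the config/split names and the "audio"/"text" column
    names are not confirmed by the card excerpt above.
    """
    # Imported lazily so the pure helper above works without the package
    # (requires `pip install datasets` and network access).
    from datasets import load_dataset

    stream = load_dataset(DATASET_ID, config, split=split, streaming=True)
    # Drop the audio column so no audio decoder is needed just to read text.
    stream = stream.remove_columns("audio")
    return [row["text"] for row in stream.take(n)]
```

Under these assumptions, `peek_transcripts(3)` would return a list of three transcript strings, and `duration_seconds(160_000)` gives the length of a 160,000-sample clip at 16 kHz.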

Tags

task_categories:automatic-speech-recognition
task_categories:audio-classification
task_ids:speaker-identification
annotations_creators:expert-generated
language_creators:crowdsourced
language_creators:expert-generated
multilinguality:monolingual
source_datasets:original
language:en
license:cc-by-4.0
size_categories:100K<n<1M
format:parquet
modality:audio
modality:text
library:datasets
library:dask
library:polars
library:mlcroissant
region:us