Benchmarks & Evaluation | PPO, Human Annotated | Commercial OK
LibriSpeech
by openslr
76.5K downloads
220 likes
100K<n<1M
Description
Dataset Card for librispeech_asr
Dataset Summary
LibriSpeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.
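To give a sense of scale, the figures in the summary (about 1000 hours at 16 kHz) can be turned into a rough sample count and uncompressed size. This is only a back-of-the-envelope sketch: the 16-bit mono PCM encoding is an assumption made for illustration, not something stated in the summary, and the distributed files are compressed, so their actual size will differ.

```python
# Rough size estimate for ~1000 hours of 16 kHz audio.
HOURS = 1000               # approximate corpus duration, from the summary
SAMPLE_RATE_HZ = 16_000    # sampling rate, from the summary
BYTES_PER_SAMPLE = 2       # ASSUMPTION: 16-bit mono PCM, for illustration only


def total_samples(hours: float, sample_rate_hz: int) -> int:
    """Number of audio samples in `hours` of audio at `sample_rate_hz`."""
    return int(hours * 3600 * sample_rate_hz)


def raw_size_gb(hours: float, sample_rate_hz: int, bytes_per_sample: int) -> float:
    """Uncompressed size in gigabytes (1 GB = 1e9 bytes)."""
    return total_samples(hours, sample_rate_hz) * bytes_per_sample / 1e9


samples = total_samples(HOURS, SAMPLE_RATE_HZ)
size_gb = raw_size_gb(HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
print(f"{samples:,} samples, about {size_gb:.1f} GB uncompressed")
```

Under these assumptions the corpus comes to roughly 57.6 billion samples, or on the order of 115 GB of raw audio.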
Supported Tasks and Leaderboards
automatic-speech-recognition, audio-speaker-identification: The dataset can be used to train a model for Automatic… See the full description on the dataset page: https://huggingface.co/datasets/openslr/librispeech_asr.
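As a starting point for either task, the transcripts and speaker labels can be inspected with the Hugging Face `datasets` library listed in the tags. The sketch below makes several assumptions that are not stated on this page and should be checked against the dataset page: a configuration named `clean`, a split named `validation`, and columns named `text` and `speaker_id`. The helper name `peek_librispeech` is hypothetical. It streams a few rows rather than downloading the corpus, and it drops the audio column so that no audio decoder is needed.

```python
REPO_ID = "openslr/librispeech_asr"  # repository name from the dataset page URL


def peek_librispeech(n_rows: int = 3, config: str = "clean", split: str = "validation"):
    """Return the first `n_rows` transcript/speaker pairs (hypothetical helper).

    ASSUMPTIONS: the `clean` config, the `validation` split, and the `text` and
    `speaker_id` columns exist as named here. Needs network access and the
    `datasets` package.
    """
    from datasets import load_dataset  # imported lazily so the module loads without it

    stream = load_dataset(REPO_ID, config, split=split, streaming=True)
    # Keep only the text-side columns so the audio is not decoded.
    stream = stream.select_columns(["text", "speaker_id"])
    return list(stream.take(n_rows))


if __name__ == "__main__":
    for row in peek_librispeech():
        print(row["speaker_id"], row["text"])
```

For speech recognition the `text` column would serve as the target transcript, while for speaker identification `speaker_id` would be the label. Both uses also need the audio column, which this sketch leaves out.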
Tags
task_categories:automatic-speech-recognition
task_categories:audio-classification
task_ids:speaker-identification
annotations_creators:expert-generated
language_creators:crowdsourced
language_creators:expert-generated
multilinguality:monolingual
source_datasets:original
language:en
license:cc-by-4.0
size_categories:100K<n<1M
format:parquet
modality:audio
modality:text
library:datasets
library:dask
library:polars
library:mlcroissant
region:us