Turkmen Speech Dataset
by rozumov
Description
Turkmen Speech Dataset (ASR)
This dataset contains 251.86 hours of Turkmen speech audio with transcriptions, intended for training and evaluating Automatic Speech Recognition (ASR) models.
It is one of the largest publicly available Turkmen speech datasets.
Dataset Overview
Property          Value
Total clips       119,847
Total duration    251.86 hours
Sampling rate     16,000 Hz
Language          Turkmen (tk)
Split             train
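The figures in the overview can be turned into rough planning numbers. The sketch below derives the mean clip length and an approximate decoded size from the clip count, total duration, and sampling rate listed above. The 16-bit mono sample format is an assumption for illustration only; the card does not state it.

```python
# Figures copied from the overview table.
TOTAL_CLIPS = 119_847
TOTAL_HOURS = 251.86
SAMPLE_RATE_HZ = 16_000

# Assumption (not stated in the card): mono audio, 16-bit PCM once decoded.
BYTES_PER_SAMPLE = 2

total_seconds = TOTAL_HOURS * 3600
mean_clip_seconds = total_seconds / TOTAL_CLIPS
total_samples = total_seconds * SAMPLE_RATE_HZ
decoded_gib = total_samples * BYTES_PER_SAMPLE / 2**30

print(f"mean clip length: {mean_clip_seconds:.2f} s")
print(f"approximate decoded size: {decoded_gib:.1f} GiB")
```

Under these assumptions the mean clip is a little over 7.5 seconds long, which suggests short utterances rather than long-form recordings.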
Each item includes:
audio: waveform + sampling rate
text:…
See the full description on the dataset page: https://huggingface.co/datasets/rozumov/TurkmenSpeech.
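A minimal sketch of reading items with the Hugging Face `datasets` library follows. It assumes each item's `audio` field decodes to a mapping with `array` and `sampling_rate` keys; that layout is not confirmed by the card and may differ between library versions. The helper names `clip_duration_seconds` and `iter_clips` are hypothetical, and the `text` value in the synthetic item is a placeholder, not real data.

```python
from typing import Any, Iterator, Mapping


def clip_duration_seconds(item: Mapping[str, Any]) -> float:
    """Return the clip length in seconds.

    Assumes item["audio"] is a mapping with "array" (1-D samples) and
    "sampling_rate" keys; this layout is an assumption, not taken from the card.
    """
    audio = item["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


def iter_clips(repo_id: str = "rozumov/TurkmenSpeech") -> Iterator[Mapping[str, Any]]:
    """Stream the train split instead of downloading it in full."""
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset

    yield from load_dataset(repo_id, split="train", streaming=True)


# Synthetic item with the assumed layout: one second of silence at 16 kHz.
example = {
    "audio": {"array": [0.0] * 16_000, "sampling_rate": 16_000},
    "text": "...",  # placeholder transcription
}
print(clip_duration_seconds(example))
```

Streaming is used here because the full set is large; calling `next(iter_clips())` would fetch a single item over the network, provided the repository is accessible from your environment.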
Tags
task_categories:automatic-speech-recognition
task_categories:text-to-speech
language:tk
license:cc-by-nc-4.0
size_categories:100K<n<1M
region:us