fc-amf-ocr
by lightonai
12.1K downloads
23 likes
1M<n<10M
Description
Dataset Card for Finance Commons AMF OCR dataset (FC-AMF-OCR)
Dataset Summary
The FC-AMF-OCR dataset is a document collection derived from the AMF-PDF dataset, which is part of the Finance Commons collection. It comprises 9.3 million images, each processed with Optical Character Recognition (OCR) using the docTR library. While native text annotations are available in the AMF-Text dataset, these annotations suffer from imperfections and… See the full description on the dataset page: https://huggingface.co/datasets/lightonai/fc-amf-ocr.
What can I do with this?
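One plausible starting point is to stream a few samples and inspect their fields before committing to a full download. The sketch below is a minimal example, not the authors' documented workflow: it assumes the Hugging Face `datasets` library is installed (the tags list `library:datasets`), that the repository exposes a split named `train`, and that streaming works for this repository. The helper names `iter_samples` and `describe` are hypothetical, and no field names are assumed because the card excerpt does not list them.

```python
from typing import Any, Iterator

# Repository ID taken from the dataset page URL in the description.
REPO_ID = "lightonai/fc-amf-ocr"


def iter_samples(limit: int = 3, split: str = "train") -> Iterator[dict[str, Any]]:
    """Yield up to `limit` samples without downloading the whole dataset.

    The split name "train" is an assumption; check the dataset page.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # streaming=True returns an iterable dataset that reads data on demand.
    stream = load_dataset(REPO_ID, split=split, streaming=True)
    for index, sample in enumerate(stream):
        if index >= limit:
            break
        yield sample


def describe(sample: dict[str, Any]) -> dict[str, str]:
    """Map each field name to the type name of its value."""
    return {key: type(value).__name__ for key, value in sample.items()}
```

Calling `for sample in iter_samples(): print(describe(sample))` would print the field names and value types of the first few records, which is a cheap way to learn the schema given the dataset's size.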
Tags
task_categories:image-to-text
language:en
language:fr
size_categories:10K<n<100K
format:webdataset
modality:image
modality:text
library:datasets
library:webdataset
library:mlcroissant
region:us