Science & Research
Commercial OK
NewsWire
by dell-research-harvard
5.5K downloads
90 likes
1M<n<10M
Description
Dataset Card for NewsWire
Dataset Summary
NewsWire contains 2.7 million unique public domain U.S. news wire articles, written between 1878 and 1977. Locations in these articles are georeferenced, topics are tagged using customized neural topic classification, named entities are recognized, and individuals are disambiguated to Wikipedia using a novel entity disambiguation model.
Languages
English (en)
Dataset Structure
Each year in the dataset is… See the full description on the dataset page: https://huggingface.co/datasets/dell-research-harvard/newswire.
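As a minimal sketch of how one might inspect the dataset, the snippet below uses the Hugging Face `datasets` library (listed in the tags for this dataset). Because the structure description here is truncated, the code discovers split names and record fields at runtime instead of assuming them. The helper names `field_names` and `peek` are hypothetical, introduced only for illustration, and `get_dataset_split_names` may require an explicit configuration name if the repository defines more than one.

```python
from itertools import islice

# Repository id taken from the dataset page URL.
REPO_ID = "dell-research-harvard/newswire"


def field_names(records):
    """Return the sorted union of keys across an iterable of dict records."""
    names = set()
    for record in records:
        names.update(record.keys())
    return sorted(names)


def peek(split_name, n=3):
    """Stream the first n records of one split without downloading all of it.

    Hypothetical helper: the import is done lazily so that this module can
    be loaded even when the `datasets` package is not installed.
    """
    from datasets import load_dataset

    stream = load_dataset(REPO_ID, split=split_name, streaming=True)
    return list(islice(stream, n))


if __name__ == "__main__":
    from datasets import get_dataset_split_names

    # Discover the available splits rather than guessing their names.
    splits = get_dataset_split_names(REPO_ID)
    print("splits found:", len(splits))

    # Look at a few records from the first split and list their fields.
    rows = peek(splits[0])
    print("fields:", field_names(rows))
```

Streaming keeps the example light: with 2.7 million articles, listing the fields of a handful of records is usually enough to decide which splits and columns to load in full.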
Tags
task_categories:text-classification
task_categories:text-generation
task_categories:text-retrieval
task_categories:summarization
task_categories:question-answering
language:en
license:cc-by-4.0
size_categories:1M<n<10M
format:json
modality:tabular
modality:text
library:datasets
library:dask
library:mlcroissant
arxiv:2406.09490
doi:10.57967/hf/2423
region:us
social science
economics
news