Classification & Sentiment
Commercial OK
FineFineWeb
by m-a-p
2.6M downloads
115 likes
n>1T
Description
FineFineWeb: A Comprehensive Study on Fine-Grained Domain Web Corpus
arXiv: Coming Soon
Project Page: Coming Soon
Blog: Coming Soon
Data Statistics
| Domain (#tokens/#samples) | Iteration 1 Tokens | Iteration 2 Tokens | Iteration 3 Tokens | Total Tokens | Iteration 1 Count | Iteration 2 Count | Iteration 3 Count | Total Count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| aerospace | 5.77B | 261.63M | 309.33M | 6.34B | 9100000 | 688505 | 611034 | 10399539 |
| agronomy | 13.08B | 947.41M | 229.04M | 14.26B | 15752828 | 2711790 | 649404 | 19114022 |

artistic…

See the full description on the dataset page: https://huggingface.co/datasets/m-a-p/FineFineWeb.
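The statistics above suggest that each "Total" column is the sum of the three iteration columns. A minimal sketch of that arithmetic check, using only the two complete rows shown (aerospace and agronomy), might look like the following. The helper names (`parse_tokens`, `counts_consistent`, `tokens_consistent`) and the rounding tolerance are assumptions made for illustration; they are not part of the dataset or its tooling.

```python
# Values copied from the two complete rows of the statistics table.
# Each tuple is (iteration 1, iteration 2, iteration 3, total).
SAMPLE_COUNTS = {
    "aerospace": (9_100_000, 688_505, 611_034, 10_399_539),
    "agronomy": (15_752_828, 2_711_790, 649_404, 19_114_022),
}
TOKEN_COUNTS = {
    "aerospace": ("5.77B", "261.63M", "309.33M", "6.34B"),
    "agronomy": ("13.08B", "947.41M", "229.04M", "14.26B"),
}

# Suffix multipliers for the abbreviated token figures.
UNIT = {"M": 1e6, "B": 1e9, "T": 1e12}


def parse_tokens(text: str) -> float:
    """Convert an abbreviated figure such as '261.63M' into a number."""
    return float(text[:-1]) * UNIT[text[-1]]


def counts_consistent(row: tuple) -> bool:
    """Sample counts are exact integers, so the sum must match exactly."""
    *iterations, total = row
    return sum(iterations) == total


def tokens_consistent(row: tuple, tolerance: float = 0.01e9) -> bool:
    """Token figures are rounded to two decimals, so allow a small gap.

    The tolerance of 0.01B is an assumed bound on that rounding error.
    """
    *iterations, total = row
    summed = sum(parse_tokens(value) for value in iterations)
    return abs(summed - parse_tokens(total)) <= tolerance


if __name__ == "__main__":
    for domain in SAMPLE_COUNTS:
        print(
            domain,
            counts_consistent(SAMPLE_COUNTS[domain]),
            tokens_consistent(TOKEN_COUNTS[domain]),
        )
```

The sample counts are compared exactly, while the token figures are compared within a tolerance because the table reports them rounded.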
Tags
task_categories:text-classification
task_categories:text-generation
language:en
license:apache-2.0
size_categories:1B<n<10B
modality:tabular
modality:text
region:us