Uncategorized
Commercial OK
Parameter Golf FineWeb Export
by willdepueoai
40.7K downloads
6 likes
Description
This repository hosts tokenizer-matched export artifacts derived from HuggingFaceFW/fineweb, specifically a 30B subset pulled from the 100B FineWeb cut used for parameter-golf experiments.
The repository contains:
pretokenized training and validation shards under datasets/datasets/
tokenizer artifacts under datasets/tokenizers/
the export manifest at datasets/manifest.json
selected-document metadata at datasets/docs_selected.jsonl
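The layout listed above can be sketched in code as follows. This is a minimal illustration, not the repository's own tooling: the helper names `classify` and `fetch_manifest` are hypothetical, the repository id is taken from the dataset page URL, and nothing is assumed about the manifest's internal schema. The download step relies on `hf_hub_download` from the third-party `huggingface_hub` package and needs network access.

```python
from pathlib import PurePosixPath

# Repository id as it appears in the dataset page URL.
REPO_ID = "willdepueoai/parameter-golf"
REPO_TYPE = "dataset"

# Paths as listed in the description.
SHARDS_DIR = ("datasets", "datasets")
TOKENIZERS_DIR = ("datasets", "tokenizers")
MANIFEST_PATH = "datasets/manifest.json"
DOCS_SELECTED_PATH = "datasets/docs_selected.jsonl"


def classify(repo_path: str) -> str:
    """Map a repository-relative path to one of the listed artifact kinds.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    if repo_path == MANIFEST_PATH:
        return "manifest"
    if repo_path == DOCS_SELECTED_PATH:
        return "docs_metadata"
    prefix = PurePosixPath(repo_path).parts[:2]
    if prefix == SHARDS_DIR:
        return "shard"
    if prefix == TOKENIZERS_DIR:
        return "tokenizer"
    return "other"


def fetch_manifest() -> object:
    """Download and parse the export manifest.

    Requires network access and `pip install huggingface_hub`.
    The parsed JSON is returned as-is; its structure is not assumed here.
    """
    import json

    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename=MANIFEST_PATH,
        repo_type=REPO_TYPE,
    )
    with open(local_path, encoding="utf-8") as fh:
        return json.load(fh)
```

Fetching only the manifest first keeps the initial download small; the shard and tokenizer files can then be requested individually once their names are known.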
License… See the full description on the dataset page: https://huggingface.co/datasets/willdepueoai/parameter-golf.
Tags
source_datasets:HuggingFaceFW/fineweb
license:odc-by
region:us