Uncategorized
Commercial OK

Parameter Golf FineWeb Export

by willdepueoai

Bronze 46
40.7K downloads
6 likes

Description

This repository hosts tokenizer-matched export artifacts derived from HuggingFaceFW/fineweb, specifically a 30B subset pulled from the 100B FineWeb cut used for parameter-golf experiments.

The repository contains:

- pretokenized training and validation shards under datasets/datasets/
- tokenizer artifacts under datasets/tokenizers/
- the export manifest at datasets/manifest.json
- selected-document metadata at datasets/docs_selected.jsonl

License…

See the full description on the dataset page: https://huggingface.co/datasets/willdepueoai/parameter-golf.
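One way to start exploring the layout described above is to fetch the export manifest and look at its top-level keys before downloading any shards. The following is a minimal sketch, not an official loader: it assumes the third-party `huggingface_hub` package is installed, and the helper names `fetch_file` and `summarize_manifest` are hypothetical. The manifest schema is not documented in this description, so the code only lists keys rather than assuming any particular fields.

```python
import json
from pathlib import Path

# Repo id taken from the dataset page URL; file paths taken from the description.
REPO_ID = "willdepueoai/parameter-golf"
MANIFEST_PATH = "datasets/manifest.json"
DOCS_PATH = "datasets/docs_selected.jsonl"


def fetch_file(filename: str) -> Path:
    """Download one file from the dataset repo and return its local cache path."""
    # Imported lazily so the rest of the module works without the dependency.
    from huggingface_hub import hf_hub_download

    local = hf_hub_download(repo_id=REPO_ID, filename=filename, repo_type="dataset")
    return Path(local)


def summarize_manifest(path: Path) -> list[str]:
    """Return the sorted top-level keys of a JSON manifest.

    The schema is an unknown here, so a non-object manifest yields an empty list.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return sorted(data) if isinstance(data, dict) else []


if __name__ == "__main__":
    # Requires network access; downloads only the manifest, not the shards.
    manifest = fetch_file(MANIFEST_PATH)
    print(summarize_manifest(manifest))
```

Inspecting the manifest first keeps the download small; the same `fetch_file` helper could then be pointed at `DOCS_PATH` or at individual shard paths once their names are known from the manifest.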

Tags

source_datasets:HuggingFaceFW/fineweb
license:odc-by
region:us