Live catalog — syncing from HuggingFace

Find the perfect fine-tuned model for your project

The curated directory of fine-tuned AI models and training datasets. No ML expertise required — browse by use case, modality, or training method.

3,452 Fine-tuned models
10,000 Training datasets
100 Uncensored models
47 Base model families

Top datasets
Math & Reasoning

Grade School Math 8K

openai

Silver · 67

Dataset Card for GSM8K. Dataset Summary: GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade-school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning. These problems take between 2 and 8 steps to solve, and solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.

PPO · Human Annotated · 1K<n<10K
761.9K downloads · 1.2K likes · en
Commercial OK
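The multi-step structure the GSM8K summary describes can be illustrated with a made-up problem in the same style. The problem and numbers below are invented for illustration and are not an actual dataset entry; each solution step is one elementary calculation, as in the dataset:

```python
# A hypothetical GSM8K-style word problem (invented, not from the dataset),
# solved with the kind of 2-to-8-step elementary arithmetic the dataset targets.
problem = (
    "A bakery sells 24 muffins in the morning and half as many in the "
    "afternoon. Each muffin costs $3. How much money does the bakery "
    "make in total?"
)

# Step 1: muffins sold in the afternoon (half of the morning's 24)
afternoon = 24 // 2             # 12
# Step 2: total muffins sold over the day
total_muffins = 24 + afternoon  # 36
# Step 3: total revenue at $3 per muffin
revenue = total_muffins * 3     # 108

print(revenue)  # 108
```

GSM8K's reference solutions are written in exactly this step-by-step form, with the final numeric answer at the end, which is what makes the dataset useful for training and evaluating multi-step reasoning.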
Uncategorized

PhysicalAI-Autonomous-Vehicles

nvidia

Silver · 67

PHYSICAL AI AUTONOMOUS VEHICLES. The PhysicalAI-Autonomous-Vehicles dataset provides one of the largest, most geographically diverse collections of multi-sensor data, empowering AV researchers to build the next generation of Physical AI-based end-to-end driving systems. This dataset is ready for commercial/non-commercial AV use per the license agreement. Data Collection Method: Automatic/Sensor. Labeling Method: Automatic/Sensor. This dataset has a total of 1700 hours of driving… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles.

993.0K downloads · 805 likes
Non-Commercial
Text Generation & Chat

FineWeb

HuggingFaceFW

Silver · 66

🍷 FineWeb: 15 trillion tokens of the finest data the 🌐 web has to offer. What is it? The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and ran on the 🏭 datatrove library, our large-scale data processing library. 🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb.

n>1T
206.0K downloads · 2.7K likes · en
Attrib. Required
Text Generation & Chat

WikiText

Salesforce

Silver · 66

Dataset Card for "wikitext". Dataset Summary: The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License. Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger… See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/wikitext.

Human Annotated · 1M<n<10M
1.1M downloads · 653 likes · en
Copyleft
Code

prompts.chat

fka

Silver · 65

a.k.a. Awesome ChatGPT Prompts. This is a Dataset Repository mirror of prompts.chat, a social platform for AI prompts. 📢 Notice: this Hugging Face dataset is a mirror. For the latest prompts, features, and community contributions, please visit the 🌐 website (prompts.chat) or 📦 GitHub (github.com/f/awesome-chatgpt-prompts). About: prompts.chat is an open-source platform where users can share, discover, and collect AI prompts from the community. The project can be… See the full description on the dataset page: https://huggingface.co/datasets/fka/prompts.chat.

Synthetic Data · 100K<n<1M
32.0K downloads · 9.6K likes
Commercial OK
Video

Xperience-10M

ropedia-ai

Silver · 64

⚠️ Important: If you have already submitted an access request but have not completed the required DocuSign agreement, your request will remain pending. Please complete signing and we will grant access once verified. Interactive Intelligence from Human Xperience. Xperience-10M Dataset Summary: Xperience-10M is a large-scale egocentric multimodal dataset of human experience for embodied AI, robotics, world models, and spatial… See the full description on the dataset page: https://huggingface.co/datasets/ropedia-ai/xperience-10m.

1M<n<10M
2.2M downloads · 154 likes · en
Non-Commercial

What is fine-tuning?

Fine-tuning takes a pre-trained AI model and trains it further on specialized data, teaching a generalist to become an expert in your field. Instead of training from scratch, which can cost millions of dollars, you start from an existing model and adapt it with your own dataset.
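The core idea can be sketched with a toy one-parameter model: "pre-train" on a general dataset, then continue training from the learned weight on a small specialized dataset rather than starting from zero. Everything below (the data, learning rate, and step counts) is invented for illustration; real fine-tuning adapts large neural networks, but the principle of resuming from pre-trained weights is the same:

```python
# Toy illustration of fine-tuning with a one-parameter linear model
# y = w * x, trained by plain gradient descent on mean squared error.
# All data and hyperparameters here are invented for illustration.

def train(w, data, lr=0.01, steps=200):
    """Gradient descent on MSE, starting from an initial weight w."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# "Pre-training": fit a broad, general dataset whose true slope is 2.0.
general_data = [(x, 2.0 * x) for x in range(1, 6)]
w_pretrained = train(0.0, general_data)          # converges near 2.0

# "Fine-tuning": continue from the pre-trained weight on a tiny
# specialized dataset whose true slope is 2.5.
special_data = [(1.0, 2.5), (2.0, 5.0)]
w_finetuned = train(w_pretrained, special_data)  # converges near 2.5

print(round(w_pretrained, 2), round(w_finetuned, 2))
```

Because fine-tuning starts close to a good solution, the specialized dataset can be far smaller than what pre-training required, which is exactly why adapting an existing model is so much cheaper than training one from scratch.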

This catalog helps you find both the fine-tuned models others have created and the training datasets you need for your own fine-tuning projects. We filter out pure quantizations and format conversions, so every entry is a genuine fine-tune that involved real training.

Built on generosity

Every model and dataset in this catalog exists because someone chose to share their work with the world. Behind each entry is real human expertise — researchers, engineers, and hobbyists who invested their knowledge, their time, and often significant compute resources to create something valuable, then gave it away freely.

Fine-tuning a model can take days of GPU time. Curating a training dataset can take months of careful annotation. These contributions represent a quiet, extraordinary act of generosity — people sharing the fruits of their labor so that others can build on them, learn from them, and push the boundaries of what's possible.

To every open-source contributor in this catalog: thank you.