Benchmarks & Evaluation
Unknown

General AI Assistants Benchmark

by gaia-benchmark

Silver 58
33.8K downloads
632 likes

Description

GAIA dataset

GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.

Data and leaderboard

GAIA is made of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It… See the full description on the dataset page: https://huggingface.co/datasets/gaia-benchmark/GAIA.
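Because each question has an unambiguous answer, model outputs can in principle be scored by direct comparison against the reference. The following is a minimal sketch of such a comparison, assuming a simple normalized exact match; the function names and the normalization rules are illustrative assumptions, not the official GAIA scorer, which may handle numbers and lists differently.

```python
import re


def normalize_answer(text: str) -> str:
    """Assumed normalization: trim, lowercase, collapse whitespace, drop trailing periods."""
    text = text.strip().lower()
    text = re.sub(r"\s+", " ", text)
    return text.rstrip(".")


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        normalize_answer(pred) == normalize_answer(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


if __name__ == "__main__":
    # Hypothetical answers for illustration only; not taken from the dataset.
    preds = ["Paris", "42", "blue"]
    refs = ["paris.", "42", "red"]
    print(exact_match_accuracy(preds, refs))
```

The normalization is deliberately conservative: it leaves internal punctuation alone so that numeric answers such as "3.14" are not altered.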


Tags

language:en
size_categories:n<1K
format:parquet
modality:audio
modality:document
modality:image
modality:text
library:datasets
library:pandas
library:polars
library:mlcroissant
arxiv:2311.12983
region:us