Benchmarks & Evaluation
General AI Assistants Benchmark
by gaia-benchmark
33.8K downloads
632 likes
Description
GAIA dataset
GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.).
We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.
Data and leaderboard
GAIA is made of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It… See the full description on the dataset page: https://huggingface.co/datasets/gaia-benchmark/GAIA.
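Because each question has a single unambiguous answer, a system's output can in principle be checked by comparing it with the reference answer. The sketch below illustrates that idea with a simple normalized exact match. It is an assumption for illustration only, not the benchmark's official scorer; the function names `normalize` and `exact_match_accuracy` and the normalization rules are hypothetical.

```python
def normalize(answer: str) -> str:
    """Lowercase, trim, collapse inner whitespace, and drop trailing periods.

    These normalization rules are an illustrative assumption, not GAIA's own.
    """
    text = " ".join(answer.strip().lower().split())
    return text.rstrip(".")


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    # Made-up toy answers, not taken from the dataset.
    preds = ["Paris.", " 42 ", "blue"]
    refs = ["paris", "42", "red"]
    print(exact_match_accuracy(preds, refs))
```

Exact matching only makes sense when answers are short and unambiguous, which is the property the description above states for the questions.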
Tags
language:en, size_categories:n<1K, format:parquet, modality:audio, modality:document, modality:image, modality:text, library:datasets, library:pandas, library:polars, library:mlcroissant, arxiv:2311.12983, region:us