LongBench
by zai-org
91.8K downloads
175 likes
1K<n<10K
Description
LongBench is a comprehensive multilingual, multi-task benchmark that measures how well pre-trained language models understand long text. The dataset consists of twenty tasks covering key long-text application scenarios such as multi-document QA, single-document QA, summarization, few-shot learning, synthetic tasks, and code completion.
Tags
task_categories:question-answering, task_categories:text-generation, task_categories:summarization, task_categories:text-classification
language:en, language:zh
size_categories:1K<n<10K
arxiv:2308.14508, arxiv:2108.00573, arxiv:1712.07040, arxiv:2105.03011, arxiv:2104.02112, arxiv:2104.05938, arxiv:2305.05280, arxiv:2303.09752, arxiv:1910.10683, arxiv:2306.14893, arxiv:2306.03091
region:us
Long Context
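
Most of the tags above follow a `key:value` pattern, so they can be grouped by key for filtering or display. The following is a minimal sketch, assuming the tags have already been split into separate strings. The helper name `group_tags` and the `"other"` bucket for tags without a colon are illustrative choices, not part of the dataset or any hub API, and only a subset of the tags is shown.

```python
from collections import defaultdict


def group_tags(tags):
    """Group 'key:value' tags by key; tags without a colon go under 'other'."""
    grouped = defaultdict(list)
    for tag in tags:
        # partition splits on the first colon only, so values keep any later colons
        key, sep, value = tag.partition(":")
        if sep:
            grouped[key].append(value)
        else:
            grouped["other"].append(tag)
    return dict(grouped)


# A subset of the tags listed for LongBench (hypothetical pre-split input)
tags = [
    "task_categories:question-answering",
    "task_categories:summarization",
    "language:en",
    "language:zh",
    "size_categories:1K<n<10K",
    "arxiv:2308.14508",
    "region:us",
    "Long Context",
]

grouped = group_tags(tags)
print(grouped["language"])
```

Grouping this way makes it easy to check, for example, which languages or task categories a dataset declares without scanning the full tag list.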