Description
Dataset Summary
SWE-Bench Pro is a challenging, enterprise-level dataset for testing agent ability on long-horizon software engineering tasks.
Paper: https://static.scale.com/uploads/654197dc94d34f66c0f5184e/SWEAP_Eval_Scale%20(9).pdf
Related evaluation code on GitHub: https://github.com/scaleapi/SWE-bench_Pro-os
Dataset Structure
We follow SWE-Bench Verified (https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified) in terms of dataset structure, with several… See the full description on the dataset page: https://huggingface.co/datasets/ScaleAI/SWE-bench_Pro.
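As a rough illustration of working with records in a SWE-Bench Verified style layout, the sketch below loads the dataset with the `datasets` library and summarizes it. Several details are assumptions rather than facts from this page: the split name `"test"`, the `repo` field name, and the idea that test lists may be stored as JSON-encoded strings (as they are in SWE-Bench Verified). The helper names `load_swe_bench_pro`, `parse_test_list`, and `count_by_repo` are hypothetical. Check the full dataset page for the actual schema before relying on any of them.

```python
import json
from collections import Counter
from typing import Any, Iterable


def load_swe_bench_pro(split: str = "test"):
    """Download the dataset from the Hugging Face Hub (needs network access).

    The split name "test" is an assumption; confirm it on the dataset page.
    """
    # Imported lazily so the offline helpers below work without `datasets`.
    from datasets import load_dataset

    return load_dataset("ScaleAI/SWE-bench_Pro", split=split)


def parse_test_list(raw: Any) -> list[str]:
    """Return test names from a field that is either a list or a JSON-encoded string."""
    if isinstance(raw, str):
        return list(json.loads(raw))
    return list(raw)


def count_by_repo(records: Iterable[dict]) -> Counter:
    """Count task instances per repository, assuming each record has a "repo" field."""
    return Counter(record["repo"] for record in records)
```

With network access, `count_by_repo(load_swe_bench_pro())` would give a per-repository task count, provided the assumed field and split names match the published schema.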
Tags
benchmark:official, benchmark:eval-yaml, size_categories:n<1K, format:parquet, modality:text, library:datasets, library:pandas, library:polars, library:mlcroissant, region:us