BigCodeBench
🧠A benchmark for evaluating the code generation capabilities of large language models, featuring 1,140 software-engineering-oriented programming tasks with two modes (Complete and Instruct) to test models on complex instructions and diverse function call scenarios.
PythonPyTorch大语言模型