
OpenCompass v0.1.2

@gaotongxiao released this on 11 Aug, 10:45 (commit 4fc1701)

This release continues the evolution of OpenCompass, bringing a mix of new features, optimizations, documentation improvements, and bug fixes.

🆕 Highlights

🏆 Leaderboard: Evaluation results for Qwen-7B, XVERSE-13B, LLaMA-2, and GPT-4 have been posted to our leaderboard, which now also supports comparing models online. We hope this feature offers deeper insights!

📊 Datasets: Added the Xiezhi, SQuAD2.0, ANLI, and LEval datasets, among others, for diverse applications (#101, #192). Safety-related datasets have also been added to the collections (#185).

🎭 New modality: Support for MMBench has been introduced, and evaluation of multi-modal models is on the way (#56, #161)! In addition, support for the Intern language model has been added (#51).

⚙️ Enhancements: Several improvements to OpenAI models, including key deprecation and temperature settings (#121, #128). Other additions include running multiple tasks on one GPU, filtering messages by level, and more (#148, #187).

📝 Documentation: Comprehensive updates and fixes across READMEs, issue templates, prompt docs, metric documentation, and more.

🛠️ Bug Fixes: These include seed fixes in HFEvaluator, fixes for issues with AGIEval multiple-choice questions, and more (#122, #137).

🎉 New Contributors

Thank you to all our contributors for this release, with a special shoutout to our new contributors:

@go-with-me000 (First Contribution)
@anakin-skywalker-Joseph (First Contribution)
@zhouzaida (First Contribution)
@dependabot (First Contribution)

Changelog

Full Changelog: 0.1.1...0.1.2