Releases: open-compass/opencompass

0.3.3

30 Sep 08:58
22a4e76

🌟 OpenCompass v0.3.3 Release Log
The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.3!

🚀 New Features

  • 🔧 Added support for the SciCode summarizer configuration.
  • 🛠 Introduced support for internal Followbench.
  • 🔧 Updated models and configurations for MathBench & WikiBench under FullBench.
  • 🛠 Enhanced support for OpenAI O1 models and Qwen2.5 Instruct.
  • 🔧 Included a postprocess function for custom models.
  • 🛠 Added InternTrain feature for broader model training scenarios.

📖 Documentation

  • 📚 Updated the README with the latest information on how to use OpenCompass effectively.

🐛 Bug Fixes

  • 🔧 Fixed issues with the link-check workflow and wildbench.
  • 🛠 Resolved errors in partitioning and corrected typos throughout the codebase.
  • 🔧 Addressed compatibility issues with lmdeploy interface type changes.
  • 🛠 Fixed the followbench dataset configuration and token settings.

⚙ Enhancements and Refactors

  • 🛠 Enhanced support for verbose output in OpenAI API interactions.
  • 🔧 Updated maximum output length configurations for multiple models.
  • 🛠 Improved handling of the "begin section" in meta_template for better parsing.
  • 🔧 Added a common summarizer for qabench and expanded test coverage for various models.

🎉 Welcome New Contributors
👋 We'd like to extend a warm welcome to the new contributors who made their first contributions to OpenCompass in this release.

Thank you to all our contributors for making this release possible!

Full Changelog: 0.3.2.post1...0.3.3

0.3.2.post1

06 Sep 10:48
b5f8afb

What's Changed

Full Changelog: 0.3.2...0.3.2.post1

0.3.2

06 Sep 08:21
ff18545

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.2!

🚀 New Features

  • 🛠 Added extra_body support for OpenAISDK and introduced proxy URL support when connecting to OpenAI's API.
  • 🗂 Included auto-download functionality for Mmlu-pro, Needlebench, Longbench and other datasets.
  • 🤝 Integrated support for the Rendu API.
  • 🧪 Added a model postprocess function.

📖 Documentation

  • 📜 Updated the README file for better clarity and guidance.

🐛 Bug Fixes

  • 🛠 Fixed CLI evaluation for multiple models.
  • 🛠 Updated requirements to resolve dependency issues.
  • 🛠 Corrected configurations for the Llama model series.
  • 🛠 Addressed bad cases and added environment information to improve testing.

⚙ Enhancements and Refactors

  • 🛠 Made OPENAI_API_BASE compatible with OpenAI's default environment settings.
  • 🛠 Optimized SciCode for improved performance.
  • 🛠 Added an api_key attribute to TurboMindAPIModel.
  • 🛠 Implemented fixes and improvements to the CI test environment, including baselines for vllm.
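
As an illustration of the OPENAI_API_BASE item above, the following is a minimal sketch of resolving an API base URL from the environment with a fallback. It is not OpenCompass's actual code: the helper name `resolve_openai_base` and the fallback constant are assumptions made for this example, and the exact precedence rules in OpenCompass may differ.

```python
import os

# Assumed fallback for illustration; OpenCompass may use a different default.
DEFAULT_OPENAI_BASE = "https://api.openai.com/v1"


def resolve_openai_base(env=None):
    """Return the API base URL from OPENAI_API_BASE, or a default if unset.

    `env` is any mapping of environment variables; it defaults to os.environ.
    This helper is hypothetical and exists only to illustrate the idea.
    """
    env = os.environ if env is None else env
    base = env.get("OPENAI_API_BASE", "").strip()
    # Treat an empty value as unset and drop any trailing slash.
    return (base or DEFAULT_OPENAI_BASE).rstrip("/")
```

Passing a plain dictionary instead of `os.environ` makes the lookup easy to check without touching the real environment.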

🎉 Welcome New Contributors

  • 👋 @cpa2001 added icl_sliding_k_retriever.py and updated __init__.py.
  • 👋 @gyin94 made OPENAI_API_BASE compatible with OpenAI's default environment.
  • 👋 @chengyingshe added an api_key attribute to TurboMindAPIModel.
  • 👋 @yanzeyu integrated support for the Rendu API.

Full Changelog: 0.3.1...0.3.2

OpenCompass v0.3.1

23 Aug 03:00
5485207

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.1!


🌟 Highlights

  • 🚀 Added support for pip installation and updated the README and evaluation demo.
  • 🐛 Fixed various dataset loading issues.
  • ⚙️ Enhanced auto-download features for datasets.

🚀 New Features

  • 🆕 Introduced support for Ruler datasets.
  • 🆕 Enhanced model compatibility.
  • 🆕 Improved dataset handling, with auto-download support for various datasets.

📖 Documentation

  • 📚 Updated README to reflect the latest changes.
  • 📚 Improved documentation for dataset loading procedures.

🐛 Bug Fixes

  • 🐞 Resolved modelscope dataset load issues.
  • 🐞 Corrected evaluation scores for the Lawbench dataset.
  • 🐞 Fixed dataset bugs for CommonsenseQA and Longbench.

⚙ Enhancements and Refactors

  • 🔧 Retained first and last halves of prompts to avoid max_seq_len issues.
  • 🔧 Updated Compassbench to v1.3.
  • 🔧 Switched to Python runner for single GPU operations.
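
The prompt-truncation item above (keeping the first and last halves of a prompt so it fits within max_seq_len) can be sketched as follows. This is a minimal illustration of the general technique, not the code from the release: the function name `truncate_middle` and the assumption that the prompt is already a list of token IDs are both hypothetical.

```python
def truncate_middle(token_ids, max_seq_len):
    """Drop tokens from the middle so the result fits within max_seq_len.

    Hypothetical helper: keeps the first half and the last half of the
    allowed budget, on the assumption that the start and end of a prompt
    carry the instructions and the question.
    """
    if max_seq_len <= 0:
        return []
    if len(token_ids) <= max_seq_len:
        return list(token_ids)
    head = max_seq_len // 2          # tokens kept from the start
    tail = max_seq_len - head        # tokens kept from the end (at least 1)
    return list(token_ids[:head]) + list(token_ids[-tail:])
```

When max_seq_len is odd, this sketch gives the extra token to the tail; that choice is arbitrary.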

🎉 Welcome New Contributors

  • 🙌 @Yunnglin for fixing the ModelScope dataset loading problem.
  • 🙌 @changyeyu for addressing max_seq_len issues with prompt handling.
  • 🙌 @seetimee for updates to openai_api.py.
  • 🙌 @HariSeldon0 for adding the scicode dataset.

What's Changed

Full Changelog: 0.3.0...0.3.1


Thank you for your continued support and contributions to OpenCompass!

OpenCompass v0.3.0

06 Aug 17:34
264fd23

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.0! This release brings a variety of new features, enhancements, and bug fixes to improve your experience.

🌟 Highlights

  1. Support for OpenAI ChatCompletion
  2. Updated Model Support List
  3. Support Dataset Automatic Download
  4. Support pip install opencompass

🚀 New Features

  1. Support for CompassBench Checklist Evaluation
  2. Support for the Doubao API
  3. Support for ModelScope Datasets

📖 Documentation

  1. Update NeedleBench Docs
  2. Update Documentation

🐛 Bug Fixes

  1. Fix Typing and Typo
  2. Fix Lint Issues
  3. Fix Summary Error in subjective.py

⚙ Enhancements and Refactors

  1. Upgrade Default Math pred_postprocessor
  2. Fix Path and Folder Updates
  3. Update Get Data Path for LCBench and HumanEval

Full Changelog: 0.2.6...0.3.0

OpenCompass v0.2.6

05 Jul 16:36
a62c613

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.6!

🌟 Highlights

  • No noteworthy highlights.

🚀 New Features

  1. #1215 #1224 #1266 Add Datasets MT-Bench-101, Fofo, wildbench
  2. #1286 Add Models InternLM2.5-7B

📖 Documentation

  1. #1252 Add doc for accelerator function
  2. #1263 Update quick start guide

🐛 Bug Fixes

  1. #1221 Resolve release version installation and import issues
  2. #1228 Fix pip version issues
  3. #1282 Update MathBench summarizer & fix cot setting

⚙ Enhancements and Refactors

  1. #1284 Reorganize subjective eval

Full Changelog: 0.2.5...0.2.6

OpenCompass v0.2.5

29 May 16:35
a77b8a5

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.5!

🌟 Highlights

  • Simplified the huggingface / vllm / lmdeploy model wrappers; meta_template no longer needs to be hand-crafted in model configs.
  • Introduced evaluation-results READMEs in ~20 dataset config folders.

🚀 New Features

  1. #1065 Add LLaMA-3 Series Configs
  2. #1048 Add TheoremQA with 5-shot
  3. #1094 Support Math evaluation via judgemodel
  4. #1080 Add gpqa prompt from simple_evals, openai
  5. #1074 Add mmlu prompt from simple_evals, openai
  6. #1123 Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs

📖 Documentation

  1. #1053 Update readme
  2. #1102 Update NeedleInAHaystack Docs
  3. #1110 Update README.md
  4. #1205 Remove --no-batch-padding and Use --hf-num-gpus

🐛 Bug Fixes

  1. #1036 Update setup.py install_requires
  2. #1051 Fixed the issue caused
  3. #1043 fix multiround
  4. #1070 Fix sequential runner
  5. #1079 Fix Llama-3 meta template

⚙ Enhancements and Refactors

  1. #1163 enable HuggingFacewithChatTemplate with --accelerator via cli
  2. #1104 fix prompt template
  3. #1109 Update performance of common benchmarks

OpenCompass v0.2.5.rc1

23 Apr 09:21
81d0e4d
Pre-release
[Feature] Add lmdeploy tis python backend model (#1014)

* add lmdeploy tis python backend model

* fix pr check

* update

OpenCompass v0.2.4

09 Apr 10:06
b39f501

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.4!

🌟 Highlights

  • Enhanced support for multiple datasets including QuALITY, APPS and TACO.
  • Introduced multi-model judging for subjective tests.
  • Bug fixes and improvements in configurations and documentation.

🚀 New Features

🌐 General

  1. Feat #963 - Support for APPS dataset.
  2. Feature #976 - Add the implementation of QuALITY datasets.
  3. Feature #984 - Add support for setting prediction paths.
  4. Feature #1006 - Support alpacaeval_v2.
  5. Feature #1016 - Add multi-model judge.
  6. Feature #1019 - Add ATC Choice Version.

📖 Documentation

  1. Updates docs #1015 - General documentation updates and improvements.

🐛 Bug Fixes

  1. Fix #964 - Fix the config's name of deepseek-coder.
  2. Fix #890 - Update links and link checkers.
  3. Fix #977 - Fix a bug in internlm2 series configs.
  4. Fix #975 - Fix documentation issues.
  5. Fix #992 - Fix running issues in turbomind_tis.
  6. Fix #994 - Change status to list in base.py.
  7. Fix #995, Fix #1020 - Quick fixes and refactors for configs.

⚙ Enhancements and Refactors

  1. Modify requirements/runtime.txt #983 - Update numpy version requirement.
  2. Update Needlebench and configs #986 - Enhancements in Needlebench configurations.
  3. Simplify needlebench summarizer #1024 - Streamline Needlebench summarizer for better efficiency.

🔗 Full Change Logs

[Fix] fix the config's name of deepseek-coder by @jingmingzhuo in #964
[Fix] Update links and link checkers by @Leymore in #890
[Feat] support apps by @Connor-Shen in #963
fix doc problem by @seanzhang-zhichen in #975
[Fix] fix a bug in internlm2 series configs by @jingmingzhuo in #977
[Feature] Add the implement of QuALITY datasets by @jingmingzhuo in #976
modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 by @kleinzcy in #983
[Feature] add support for set prediction path by @bittersweet1999 in #984
[Feat] Support TACO by @Connor-Shen in #966
[Feature] update apps by @Connor-Shen in #985
[Fix] update apps/taco by @Connor-Shen in #988
[Feature] add one script for subjective by @bittersweet1999 in #993
Fix running issues in turbomind_tis by @ispobock in #992
[Fix] base.py change status into list by @Chaseldot in #994
[Fix] quick fix for configs by @bittersweet1999 in #995
[Feature] update needlebench and configs by @DseidLi in #986
[Feature] support alpacaeval_v2 by @bittersweet1999 in #1006
updates docs by @Y0oMu in #1015
[Feature] Add multi-model judge and fix some problems by @bittersweet1999 in #1016
[Fix] Refactor Needlebench Configs for CLI Testing Support by @DseidLi in #1020
[Feature] Add ATC Choice Version by @DseidLi in #1019
[Fix] Simplify needlebench summarizer by @DseidLi in #1024

For a detailed overview of all changes, check out our Full Changelog.

OpenCompass v0.2.4.rc1

25 Mar 10:15
0a6a03f
Pre-release

More parsed datasets are provided:

OpenCompassData-complete-20240325.zip

Important updates compared to the previous version are as follows:

Subjective: Add MTBench
LongText: Support Needle-In-Haystack Test Dataset
Code: Update generation version of CIBench