News

[2023.09.08] We have updated the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Visit our homepage for more details.
[2023.09.06] The Baichuan2 team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
[2023.09.02] OpenCompass now supports the evaluation of Qwen-VL.
[2023.08.25] The TigerBot team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
[2023.08.21] Lagent, a lightweight framework for building LLM-based agents, has been released. We are working with the Lagent team to support the evaluation of general tool-use capability. Stay tuned!
[2023.08.18] We now support evaluation for multi-modality learning, including MMBench, SEED-Bench, COCO-Caption, Flickr-30K, OCR-VQA, ScienceQA, and more. A leaderboard is on the way. Feel free to try multi-modality evaluation with OpenCompass!
[2023.08.18] The dataset card is now online. New evaluation benchmarks are welcome to join OpenCompass!
[2023.08.11] Model comparison is now online. We hope this feature offers deeper insights!
[2023.08.11] We now support LEval.
[2023.08.10] OpenCompass is now compatible with LMDeploy. You can follow this instruction to evaluate the accelerated models provided by Turbomind.
[2023.08.10] We now support Qwen-7B and XVERSE-13B! Go to our leaderboard for more results! More models are welcome to join OpenCompass.
[2023.08.09] Several new datasets (CMMLU, TydiQA, SQuAD2.0, DROP) have been added to our leaderboard! More datasets are welcome to join OpenCompass.
[2023.08.07] We have added a script for users to evaluate the inference results of MMBench-dev.
[2023.08.05] We now support GPT-4! Go to our leaderboard for more results! More models are welcome to join OpenCompass.
[2023.07.27] We now support CMMLU! More datasets are welcome to join OpenCompass.
