VIDEO DOI: https://doi.org/10.48448/4k66-ev35

poster

ACL 2024

August 12, 2024

Bangkok, Thailand

CMMLU: Measuring massive multitask language understanding in Chinese

Keywords:

LLM evaluation

Chinese LLM

benchmark

As the capabilities of large language models (LLMs) continue to advance, evaluating their performance is becoming both more important and more challenging. This paper addresses this issue for Mandarin Chinese by introducing CMMLU, a comprehensive Chinese benchmark that covers a range of subjects, including the natural sciences, social sciences, engineering, and the humanities. We conduct a thorough evaluation of more than 20 contemporary multilingual and Chinese LLMs, assessing their performance across different subjects and settings. The results reveal that most existing LLMs struggle to reach an accuracy of even 60%, the pass mark for Chinese exams, which indicates substantial room for improvement in LLM capabilities. We also conduct extensive experiments to identify factors that affect model performance and propose directions for improving LLMs. CMMLU fills a gap in evaluating the knowledge and reasoning capabilities of large language models in Chinese.
