A Comprehensive and Explainable Approach to Evaluating LLMs’ Defense Capabilities

Organizers

Mingming Sun, Yaqing Wang

Speaker

Yue Feng

Time

Thursday, November 28, 2024 2:00 PM - 4:00 PM

Venue

A3-1-301

Online

Zoom 230 432 7880 (BIMSA)

Abstract

Given the importance of large language model (LLM) safety, evaluating LLMs’ defense capabilities against jailbreak attacks has become a key area of focus. However, current evaluation methods often fail to generalize to complex scenarios and lack transparency, leading to incomplete and inaccurate assessments. To address these limitations, we introduce JAILJUDGE, a comprehensive and explainable benchmark designed to assess LLMs’ defense capabilities. JAILJUDGE covers a wide array of risk scenarios, including synthetic, adversarial, in-the-wild, and multilingual prompts. It also offers detailed explanations to ensure transparent and reliable evaluations.

Speaker Intro

Yue Feng is an assistant professor at the University of Birmingham. She received her Ph.D. from University College London. Her research interests lie in natural language processing and information retrieval. She has published more than 30 papers in top conferences (e.g., ACL, SIGIR, EMNLP, and WSDM). She also won the Amazon Alexa Prize TaskBot Challenge and was awarded the Baidu Outstanding Research Intern Star.