BIMSA: Advances in Artificial Intelligence
A Comprehensive and Explainable Approach to Evaluating LLMs’ Defense Capabilities
Speaker
Yue Feng
Time
Thursday, November 28, 2024 2:00 PM - 4:00 PM
Venue
A3-1-301
Online
Zoom 230 432 7880 (BIMSA)
Abstract
Given the importance of large language model (LLM) safety, evaluating LLMs’ defense capabilities against jailbreak attacks has become a key area of focus. However, current evaluation methods often fail to generalize to complex scenarios and lack transparency, leading to incomplete and inaccurate assessments. To address these limitations, we introduce JAILJUDGE, a comprehensive and explainable benchmark designed to assess LLMs’ defense capabilities. JAILJUDGE covers a wide range of risk scenarios, including synthetic, adversarial, in-the-wild, and multilingual prompts. It also provides detailed explanations alongside its judgments, making evaluations transparent and reliable.
Speaker Intro
Yue Feng is an assistant professor at the University of Birmingham. She received her Ph.D. from University College London. Her research interests lie in natural language processing and information retrieval. She has published more than 30 papers in top conferences (e.g., ACL, SIGIR, EMNLP, and WSDM). She also won the Amazon Alexa Prize TaskBot Challenge and received the Baidu Outstanding Research Intern Star award.