Few-Shot Learning in AI for Science
Speaker
Time
March 21, 2025, 15:00 to 16:30
Venue
A3-2a-302
Online
Zoom 637 734 0280
(BIMSA)
Abstract
In AI-assisted scientific research (AI for Science), particularly in drug discovery and biomedicine, labeled data is often scarce. Few-shot learning has become a key technique for addressing this challenge, because it can learn and predict from only a handful of labeled examples. In this talk, I will introduce a series of machine learning algorithms developed specifically to improve data efficiency and prediction accuracy in AI for Science under data scarcity. I will first discuss few-shot learning for molecular property prediction, reviewing existing methods and presenting our proposed Property-Aware Relation Networks (PAR) (NeurIPS 2021, TPAMI 2024) and the parameter-efficient graph neural network adapter PACIA (IJCAI 2024). PAR refines the relation representations between molecules by introducing a property-aware molecular encoder and a query-dependent relation graph learning module, thereby improving prediction accuracy across a variety of chemical properties. PACIA improves few-shot molecular property prediction by generating a small number of adaptive parameters that modulate the message-passing process in graph neural networks. In addition, I will introduce KnowDDI (Communications Medicine 2024), which enhances drug representations by leveraging large biomedical knowledge graphs and explains predicted drug-drug interactions (DDIs) by learning knowledge subgraphs for drug pairs, which compensates for the scarcity of known interactions. KnowDDI improves both prediction performance and model interpretability, making the prediction process more transparent and trustworthy. Finally, I will share a vision for applying few-shot learning techniques to broader areas of scientific research.
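About the Speaker
Yaqing Wang (王雅晴) is an Associate Research Fellow at the Beijing Institute of Mathematical Sciences and Applications (BIMSA). She received her Ph.D. in 2019 from the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, where she was advised by Prof. Lionel M. Ni and Prof. James T. Kwok and worked on machine learning. From 2019 to 2024, she was a senior researcher at Baidu Research, where she focused on research and development in cold-start recommendation with scarce labeled samples, search intent recognition, optimization of large models and agents, and AI4Science. Dr. Wang's research spans machine learning, artificial intelligence, and data science. Guided by the principle of parsimony, her work aims to discover explanatory scientific mechanisms efficiently and at low cost in order to tackle complex real-world problems. She currently focuses on efficient learning paradigms such as few-shot learning, meta-learning, and in-context learning; modeling methods for large language models and agents; interdisciplinary applications of artificial intelligence in science and mathematics (AI + X); and recommender systems and user modeling for cold-start problems. Dr. Wang has published more than 30 papers in top international conferences and journals, including NeurIPS, ICML, ICLR, KDD, TheWebConf, SIGIR, EMNLP, TPAMI, JMLR, and TIP, which have been cited more than 4,700 times. As a key project member, she has taken part in a Science and Technology Innovation 2030 "Next Generation Artificial Intelligence" Major Project of the Ministry of Science and Technology and a General Program project of the National Natural Science Foundation of China. She has long served as a Senior Program Committee member for IJCAI and AAAI and as a reviewer for top conferences and journals such as ICML, NeurIPS, ICLR, and TPAMI. Dr. Wang was named to the World's Top 2% Scientists list in 2024 and was selected for the Beijing Nova Program in 2025.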