BIMSA

Workshop on Topological AI and Applications

Artificial intelligence (AI) has fundamentally changed the landscape of science, engineering, and technology in the past decade and holds great promise for data-driven discovery. However, AI-based discovery encounters challenges arising from the intricate complexity, high dimensionality, nonlinearity, and multiscale nature of many datasets, particularly those from biological science. Rooted in algebraic, differential, and geometric topologies, topological data analysis (TDA) addresses these challenges in a unique manner that cannot be achieved by any other mathematical, physical, or statistical means. Topological AI is a new paradigm in data science and the frontier in rational learning. It has found successful applications in various fields such as science, engineering, medicine, defense, and industry. This workshop will bring together researchers from mathematics, computer science, and various applied sciences to exchange ideas and foster collaborations.
Organizing Institution: Beijing Key Laboratory of Topological Statistics and Applications for Complex Systems, BIMSA

Organizers

雷逢春 , Xinqi Gong , 魏国卫 , 吴杰 , 邬荣领 , 李京艳 , 刘冉 , 田芳

Speakers

毕婉莹 (Hebei Normal University & BIMSA)

董昂 (BIMSA)

段海豹 (Academy of Mathematics and Systems Science, Chinese Academy of Sciences; BIMSA-University of Chinese Academy of Sciences)

耿伟华 (Southern Methodist University)

Xinqi Gong (Renmin University of China)

胡国庆 (BIMSA)

康家熠 (BIMSA)

雷逢春 (BIMSA)

刘冉 (BIMSA-Beihang University)

宋汝志 (Dalian University of Technology)

王雅晴 (BIMSA)

王炳胥 (School of Advanced Materials, Peking University Shenzhen Graduate School)

魏国卫 (Michigan State University)

Date

July 3-4, 2025

Location

Weekday	Time	Venue	Online	ID	Password
Thu, Fri	09:00 - 18:00	A6-101	ZOOM 07	559 700 6085	BIMSA

Schedule

Time \ Date	07-03 Thu	07-04 Fri
09:00-09:40		段海豹
09:40-10:20		Xinqi Gong
10:00-10:40	魏国卫
10:30-10:55		董昂
10:50-11:30	胡国庆
10:55-11:20		毕婉莹
11:20-11:45		宋汝志
11:30-11:55	刘冉
14:00-14:40	王雅晴
14:40-15:05	王炳胥
15:15-15:55	耿伟华
15:55-16:20	康家熠
16:30-17:10	雷逢春

*All times on this page are Beijing time (GMT+8).

Agenda

2025-07-03

10:00-10:40 魏国卫

Mathematical AI and Application

Artificial intelligence (AI) has fundamentally changed the landscape of science, engineering, and technology in the past decade. However, AI-based discovery encounters data challenges arising from intricate complexity, high dimensionality, nonlinearity, and multiscale nature. We tackle these challenges with mathematical AI paradigms, including topological deep learning (TDL), a new frontier in rational learning that we introduced in 2017. We devised tools from algebraic topology, differential geometry, geometric topology, and commutative algebra to significantly improve AI's ability to tackle data challenges. Using our mathematical AI approaches, my team has for years been the top winner in the D3R Grand Challenges, a worldwide annual competition series in computer-aided drug design and discovery. By further integrating mathematical AI with millions of genomes isolated from patients, we discovered the mechanisms of SARS-CoV-2 evolution and accurately predicted emerging dominant SARS-CoV-2 variants months in advance.

10:50-11:30 胡国庆

Exploring potential transcription factors and their regulatory relationships based on asymmetric covariance natural vector encoding method and machine learning algorithms

Transcription factors (TFs) are pivotal in regulating cellular functions and responses to external stimuli by modulating gene transcription, either activating or repressing it. Despite advances that integrate experimental and computational methods, current TF prediction techniques still face significant challenges in deciphering the regulatory relationships between TFs and target genes, particularly in determining whether these relationships promote or inhibit gene expression. To overcome these obstacles, we develop ACNVE-K, a framework that integrates k-mer decomposition with asymmetric covariance natural vector encoding to transform amino acid sequences into multidimensional feature vectors. Leveraging the XGBoost, Gradient Boosting, and Random Forest algorithms, we establish five predictive models for TF identification, target gene prediction, and directional regulation classification (activation/inhibition). Systematic evaluation demonstrates XGBoost's superior performance across human and mouse genomes, achieving enhanced accuracy with the latest genome annotations. The 5-mer configuration optimally captures sequence features while maintaining computational efficiency. Furthermore, we implement Graph Attention Networks (GAT) to reconstruct TF-target regulatory networks, enabling the discovery of novel interactions through topological analysis. Overall, this study presents powerful tools for TF prediction, directional transcriptional regulation analysis, and network-level exploration, thereby advancing the fields of precision medicine and functional genomics.
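
The k-mer decomposition step mentioned in the abstract can be illustrated with a minimal sketch. This covers only the counting of overlapping k-mers in an amino acid sequence; it does not implement the asymmetric covariance natural vector encoding or any part of ACNVE-K itself. The function names and the toy sequence are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int = 5) -> Counter:
    """Count overlapping k-mers in a sequence (illustrative helper, not ACNVE-K)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_frequencies(sequence: str, vocabulary: list, k: int = 5) -> list:
    """Map a sequence to relative k-mer frequencies over a fixed vocabulary."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


# Hypothetical toy sequence; real inputs would be full protein sequences.
toy_sequence = "MKTAYIAKQRMKTAY"
counts = kmer_counts(toy_sequence, k=5)
features = kmer_frequencies(toy_sequence, ["MKTAY", "AAAAA"], k=5)
```

Here the 5-mer "MKTAY" occurs twice among the 11 overlapping 5-mers of the toy sequence, so its relative frequency is 2/11. A fixed vocabulary keeps feature vectors the same length across sequences, which tree-based classifiers require.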

11:30-11:55 刘冉

IntComplex for high-order interactions

Graphs are essential tools for modeling pairwise interactions in fields such as biology, material science, and social networks. However, they fall short in representing many-body interactions involving more than two entities. Simplicial complexes and hypergraphs have been developed to address this need but still face challenges, particularly in capturing transitions between different orders of interactions. In this talk, I will present IntComplex, a new framework designed to model these high-order interactions. By incorporating homology theory, IntComplex offers a quantitative topological representation of these interactions. Additionally, I will demonstrate how persistent homology, applied through a filtration process, ensures a stable and robust analysis of the interactions. This framework introduces a new approach to understanding the topological properties of high-order interactions, with applications in complex network analysis.
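
As a point of reference for the filtration idea above, the following is a minimal sketch of 0-dimensional persistent homology for a standard Vietoris-Rips filtration on a point cloud. It is not IntComplex, whose construction is not described here; it only shows how connected components are born and merge as the distance threshold grows. The function name `h0_persistence` is an assumption made for this example.

```python
from itertools import combinations
from math import dist, inf


def h0_persistence(points):
    """Return (birth, death) bars for connected components of a Rips filtration."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges enter the filtration in order of increasing length.
    edges = sorted(
        (dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this threshold.
            parent[root_i] = root_j
            bars.append((0.0, length))
    if n > 0:
        bars.append((0.0, inf))  # the last surviving component never dies
    return bars


bars = h0_persistence([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
```

For the three collinear points in the usage line, the components merge at thresholds 1 and 4, and one component persists forever. Stability in this setting means that small perturbations of the points move the bar endpoints only slightly.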

14:00-14:40 王雅晴

Few-Shot Learning on Graphs

In many real-world scenarios, especially in AI-assisted scientific research such as drug discovery and biomedicine, data is naturally represented in graph structures—molecules as graphs, biological networks, and interaction graphs. However, a major challenge remains: labeled data is often extremely limited. Few-shot learning on graphs has emerged as a promising approach to address this challenge by enabling models to generalize from only a few annotated graph instances. In this talk, I will introduce a series of graph-based machine learning algorithms we have developed to enhance data efficiency and predictive performance under few-shot settings. Specifically, I will discuss our Property-Aware Relationship Network (PAR) (NeurIPS 2021, TPAMI 2024), which improves molecular property prediction by learning property-conditioned relational graphs through a dependency-query mechanism. I will also present our Parameter-Efficient Adapter for GNNs (PACIA) (IJCAI 2024), which dynamically generates a small number of task-specific parameters to modulate information propagation in graph neural networks, achieving superior few-shot generalization with minimal overhead. These methods demonstrate how few-shot learning tailored to graph-structured data can unlock new potential in scientific AI applications. I will conclude by discussing broader opportunities and future directions on developing more advanced methods, incorporating topology and machine learning on graphs.

14:40-15:05 王炳胥

Analyzing Material Structures Based on Topological Data Analysis with Artificial Intelligence

Topological features of material structures provide a novel perspective for analyzing material functions and properties, enabling more effective structural characterization. At the same time, topological data analysis has advanced rapidly as an emerging method for structural data analysis; because it can significantly simplify data complexity while preserving key information, it has greatly expanded the scope of topological studies on material structures. This presentation introduces the application of topology-based approaches to property prediction and stability analysis of small molecular clusters, energy prediction and structural design of surface-phase crystals, and property prediction and structural screening of bulk-phase crystals. These studies are all based on feature extraction through topological data analysis combined with artificial intelligence-based prediction methods, providing a new research paradigm for computational materials science.

15:15-15:55 耿伟华

A DNN Biophysics Model with Topological and Electrostatic Features

In this project, we provide a deep-learning neural network (DNN) based biophysics model to predict protein properties. The model uses multi-scale and uniform topological and electrostatic features generated from protein structural information and the force field. The topological features are generated using element specified persistent homology (ESPH) with a selection of heavy atoms or carbon atoms, which contribute significantly to protein properties. The electrostatic features are computed rapidly using a Cartesian treecode, with treecode parameters tuned for the accuracy-cost trade-off. Machine learning experiments on two data sets with 4000+ and 17000+ protein structures show the efficiency and fidelity of these features in representing the protein structure and force field for the prediction of biophysical properties such as electrostatic Coulombic energies and solvation energies. The feature generation algorithms have potential as general tools for assisting machine learning based prediction of biophysical properties and functions for a broad range of biomolecules whose structures are obtained from both theoretical computation and experiments.

15:55-16:20 康家熠

Data Fusion over Sensor Networks: A Topological and Distributed Kalman Filters Approach

This report summarizes existing research on how network topology in wireless sensor networks (WSNs) influences data fusion and distributed Kalman filtering. It examines how various configurations—ranging from single-sensor setups to multi-sensor systems and dynamic network topologies—affect state estimation. Key findings from the reviewed studies include the analysis of data fusion under network constraints, the development of distributed Kalman filters (DKFs) to enable collaborative sensor processing, the design of Kalman consensus filters to ensure consistent state estimates across nodes, the management of state consensus in time-varying topologies, and the evaluation of performance in dynamic communication scenarios. These studies provide a robust framework for reliable state estimation in WSNs, highlighting the impact of changing network structures on the stability and effectiveness of distributed filtering.
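
To make the role of network topology concrete, here is a minimal sketch of average consensus on a fixed undirected sensor graph. It shows only the consensus averaging step that Kalman consensus filters build on; it is not a distributed Kalman filter and is not taken from the reviewed studies. The names `consensus_step` and `run_consensus`, the step size `epsilon`, and the three-node path graph are assumptions for illustration.

```python
def consensus_step(values, neighbors, epsilon):
    """One synchronous update: each node moves toward its neighbors' values."""
    return [
        x + epsilon * sum(values[j] - x for j in neighbors[i])
        for i, x in enumerate(values)
    ]


def run_consensus(values, neighbors, epsilon=0.3, steps=100):
    """Iterate the consensus update a fixed number of times."""
    for _ in range(steps):
        values = consensus_step(values, neighbors, epsilon)
    return values


# Hypothetical path graph 0 - 1 - 2: node 1 talks to both ends.
neighbors = [[1], [0, 2], [1]]
local_estimates = [1.0, 2.0, 6.0]
agreed = run_consensus(local_estimates, neighbors)
```

On an undirected graph this update preserves the sum of the node values, so with a step size below the reciprocal of the maximum degree the nodes converge to the average of the initial estimates (3.0 here). The convergence rate depends on the graph's connectivity, which is one way topology affects distributed estimation.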

16:30-17:10 雷逢春

Introduction to Knot Data Analysis

Knot theory is an area of topology that studies how pairwise disjoint circles are knotted in 3-dimensional space. Knot theory has been applied in a broad range of fields, including molecular biology and chemistry. In recent years, topological data analysis has emerged as a powerful algebraic topology approach in data science, but knot theory has played a smaller role in it because of its lack of localization and quantization. We address these challenges by introducing a multiscale knot theory paradigm that extends its scope from qualitative to quantitative analysis, providing an effective data analysis tool. It has been validated in quantitative protein flexibility analysis, drug toxicity evaluation, and many other applications. In the talk, after reviewing some fundamental facts in knot theory, I will explain the basic idea of knot data analysis. This is joint work with Li Shen, Hongsong Feng, Ruzhi Song, Fengling Li, Jie Wu and Guo-Wei Wei.

2025-07-04

09:00-09:40 段海豹

Make Schubert calculus calculable

Hilbert's 15th problem called for a rigorous foundation of Schubert's enumerative calculus, in which a long-standing and challenging part is Schubert's problem of characteristics. In the course of securing the foundation of algebraic geometry, Van der Waerden and André Weil attributed the problem to the determination of the intersection theory of flag manifolds. This talk surveys the background, content, and resolution of the problem of characteristics. Our main results are a unified formula for the characteristics, and a system description for the intersection rings of flag manifolds. We illustrate the effectiveness of the formula and the algorithm via explicit examples.

09:40-10:20 Xinqi Gong

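Topological Problems in Predicting Very Large Multi-Body Protein Complexes

In the post-AlphaFold era, structure prediction for monomeric proteins has achieved a breakthrough, but high-accuracy prediction of multi-body protein complexes still faces major challenges. In recent years, my group has focused on topological optimization and structure prediction of multi-body complexes and has proposed a series of innovative methods: (1) topological constraints and geometric modeling; (2) deep-learning-driven interface prediction; (3) symmetry and dynamic assembly optimization. In this talk, I will report on the main topological difficulties in predicting the structures of multi-body protein complexes. Solving these difficulties has important applications in antibody design, drug target identification, and synthetic biology.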

10:30-10:55 董昂

Complex Network Reconstruction and its Application

In this talk, I will introduce a framework for reconstructing complex networks using ordinary differential equation (ODE)-based methods, with a particular focus on gene regulatory networks (GRNs). I will highlight how our approach captures the dynamic and nonlinear nature of biological regulation and demonstrate its flexibility across diverse real-world applications. These include modeling gene interactions in complex diseases, uncovering microbial community dynamics, and some of our latest work. This integrative perspective offers new insights into the structure and function of biological systems and beyond, with promising implications for both fundamental research and translational applications.

10:55-11:20 毕婉莹

Topological magnitude for protein flexibility analysis

Protein flexibility is critical for structural adaptability and biological function, influencing enzymatic catalysis, molecular recognition, and signal transduction. Magnitude, an invariant from category theory and algebraic topology, was initially used to measure the size of finite metric spaces and has since been extended to graphs, hypergraphs, and material science. In this study, we introduce a novel method for predicting protein flexibility based on the magnitude of finite metric spaces. We model protein structures as finite metric spaces with biologically relevant distance functions and use magnitude to capture their geometric and topological properties, improving the prediction of B-factors, which represent atomic fluctuation. Experimental results on the Superset dataset show that our method achieves a Pearson correlation coefficient (PCC) of 0.725, outperforming GNM, pfFRI, ASPH, opFRI, and EH. Compared to GNM, our approach improves PCC by 28.32%, demonstrating its performance advantage. Additionally, by combining magnitude with global and local structural features, we validate the method’s robustness in blind tests, proving its effectiveness in capturing protein flexibility and offering a novel approach to protein dynamics analysis.
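
For readers unfamiliar with magnitude, the following minimal sketch computes the magnitude of a finite metric space from its distance matrix using the standard definition: form the similarity matrix Z with entries exp(-t * d(i, j)), solve Z w = 1 for the weighting w, and sum the weights. It is not the speaker's prediction method; the biologically relevant distance functions and the B-factor model are not specified in the abstract. The helper names and the `scale` parameter t are assumptions for illustration.

```python
from math import exp


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("similarity matrix is singular; magnitude undefined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def magnitude(distances, scale=1.0):
    """Magnitude of a finite metric space given its distance matrix."""
    n = len(distances)
    similarity = [
        [exp(-scale * distances[i][j]) for j in range(n)] for i in range(n)
    ]
    weighting = solve_linear(similarity, [1.0] * n)  # solves Z w = 1
    return sum(weighting)


two_points = magnitude([[0.0, 2.0], [2.0, 0.0]])
```

For two points at distance d the magnitude is 2 / (1 + exp(-d)), which grows from 1 (points indistinguishable) toward 2 (points far apart). This is why magnitude is often described as an effective number of points at a given scale.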

11:20-11:45 宋汝志

Multi-scale Jones Polynomial and Persistent Jones Polynomial for Knot Data Analysis

Many structures in science, engineering, and art can be viewed as curves in 3-space. The entanglement of these curves plays a crucial role in determining the functionality and physical properties of materials. Many concepts in knot theory provide theoretical tools to explore the complexity and entanglement of curves in 3-space. However, classical knot theory focuses on global topological properties and does not account for local structural information, which is critical in practical applications. In this work, we propose two localized models based on the Jones polynomial: the multi-scale Jones polynomial and the persistent Jones polynomial. We analyze the stability of these models, in particular their insensitivity to small perturbations in curve collections, which ensures their robustness for real-world applications.

Contact

Prof. Jingyan Li, Tel: 18301163837, Email: jingyanli@bimsa.cn