BIMSA

The three speakers develop principled statistical or deep learning methods to uncover latent, multilevel structures in high-dimensional genomic/omics data.
Dr. Fu introduces a deep graph attention autoencoder that detects hierarchical communities within gene regulatory networks, revealing organization across scales.
Dr. Wen presents a general probabilistic framework that takes any initial clustering result and systematically explores nested, multiresolution cluster structures, reconciling inconsistencies and recovering interpretable patterns in genetic and spatial transcriptomics data.
Dr. Hu addresses a complementary question: once groups or features are identified, how do we robustly test which ones act as mediators linking exposures to outcomes? Her symmetric mediation statistics provide powerful, FDR-controlled inference for high-dimensional omics mediators.
Together, the three talks move from detecting hierarchical network communities, to unifying and reconciling multiscale clusters, to pinpointing causal mediators among high-dimensional molecular features. This progression highlights how modern statistical learning can extract richer, more reliable biological insights from complex data.

傅秋燕 ( Wayne State University School of Medicine )

胡懿娟 ( Peking University )

温晓泉 ( University of Michigan )

邬荣领 ( Beijing Institute of Mathematical Sciences and Applications , Yau Mathematical Sciences Center, Tsinghua University )

2026-07-03

Weekday	Time	Venue	Online	ID	Password
Fri	15:00 - 17:00	A3-4-301	ZOOM 08	787 662 9899	BIMSA

Time\Date	07-03 Fri
15:00-15:00	邬荣领
15:00-15:40	傅秋燕
15:40-16:20	温晓泉
16:20-17:00	胡懿娟

*All times on this page are Beijing time (GMT+8).

2026-07-03

15:00-15:40 傅秋燕

Deep learning-based hierarchical community detection for high-dimensional gene regulatory networks

Reconstructing genome-wide gene regulatory networks (GRNs) from genomic data is challenging due to high dimensionality and complexity. We propose a hierarchical model with three layers: individual genes at the bottom, gene communities in the middle, and communities of communities at the top, revealing patterns at different scales. We developed DeepHCD, a deep learning algorithm that uses a graph attention autoencoder to learn low-dimensional embeddings and infer community structures top-down. DeepHCD minimizes a multitask loss function encompassing graph reconstruction, attribute reconstruction, clustering, and modularity, and requires only rough upper bounds on the number of communities at each level. Simulations across diverse network types demonstrate DeepHCD's superior performance in detecting middle-layer communities, as measured by homogeneity and completeness metrics. Applied to single-cell regulon activity data (243 regulons, 30,000+ cells), DeepHCD outperforms existing methods, producing clearer community structures with the highest intra-group correlations.
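For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of homogeneity and completeness using their standard entropy-based definitions. It is not DeepHCD code. The function names and the toy labels are illustrative assumptions, and the abstract does not say which implementation the authors used.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two equal-length label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity(truth, pred):
    """1 - H(truth | pred) / H(truth): each predicted community is pure."""
    h_truth = entropy(truth)
    return 1.0 if h_truth == 0 else 1.0 - conditional_entropy(truth, pred) / h_truth


def completeness(truth, pred):
    """1 - H(pred | truth) / H(pred): each true community stays together."""
    # Completeness is homogeneity with the two labelings swapped.
    return homogeneity(pred, truth)


# Toy example (hypothetical labels): four genes, two true communities,
# but the prediction puts every gene in its own community.
truth = [0, 0, 1, 1]
pred = [0, 1, 2, 3]
print(homogeneity(truth, pred), completeness(truth, pred))
```

In the toy example every predicted community is pure, so homogeneity is 1, while each true community is split in two, so completeness drops to about 0.5. Reporting both metrics guards against trivially over-split or over-merged solutions.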

15:00-15:00 邬荣领

Opening Remarks

15:40-16:20 温晓泉

Probabilistic multiresolution clustering

Cluster analysis is a widely used unsupervised learning technique in genomics, with applications ranging from inferring genetic population structure to identifying spatial domains in spatial transcriptomics (ST) data. However, existing clustering methods often yield inconsistent results and typically focus on identifying a single optimal partition, overlooking the intrinsic relationships among the inferred clusters. In this work, we introduce a computational framework for systematically exploring multiresolution clustering structures in scientific data, starting from an initial configuration generated by any existing clustering algorithm. The proposed framework provides a unified and principled approach for uncovering complex nested latent structures and reconciling discrepancies among clustering results. Through simulations and applications to large-scale, high-dimensional genetic and spatial transcriptomics data, we demonstrate the framework's ability to recover interpretable clustering patterns and reveal biologically meaningful multiresolution structures.
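To make the notion of "nested" clusterings concrete, here is a small sketch that checks whether a fine-resolution partition is nested inside a coarser one, that is, whether every fine cluster falls entirely within a single coarse cluster. This only illustrates the concept. It is not the speaker's probabilistic framework, and the function name and toy labels are assumptions.

```python
def is_refinement(fine, coarse):
    """Return True if each cluster in `fine` lies inside exactly one cluster of `coarse`."""
    if len(fine) != len(coarse):
        raise ValueError("label sequences must have the same length")
    parent = {}  # maps each fine label to the coarse label it was first seen with
    for f, c in zip(fine, coarse):
        if parent.setdefault(f, c) != c:
            return False  # this fine cluster straddles two coarse clusters
    return True


# Hypothetical labels for six samples at two resolutions.
fine = [0, 0, 1, 1, 2, 2]
nested_coarse = ["A", "A", "A", "A", "B", "B"]
crossing_coarse = ["A", "A", "A", "B", "B", "B"]

print(is_refinement(fine, nested_coarse))    # fine clusters 0 and 1 sit in A, 2 in B
print(is_refinement(fine, crossing_coarse))  # fine cluster 1 is split across A and B
```

Two clustering results that fail this check in both directions are the kind of discrepancy the abstract describes as needing reconciliation.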

16:20-17:00 胡懿娟

SMS: Symmetric Mediation Statistics for Powerful High-Dimensional Mediation Analysis

Mediation analysis of high-dimensional features, particularly molecular-level omics features, provides important opportunities to uncover biological mechanisms underlying human health and disease. However, two central statistical challenges remain: testing the composite null hypothesis and maintaining power when the exposure-mediator and mediator-outcome associations differ substantially in statistical significance. Existing methods typically rely on accurate estimation of the proportions of the three null types or on the maximum of the two association p-values; they may not always control the FDR well and may have limited power under imbalanced significance. We propose SMS, a new statistical framework based on symmetric mediation statistics. By exploiting symmetry, SMS calibrates the rejection threshold for FDR control under the composite null as a whole. It also allows flexible combinations of the two p-values corresponding to the exposure-mediator (E-M) and mediator-outcome (M-O) associations, including the maximum, which in turn enables an omnibus test. Moreover, it permits direct use of effect size estimates, bypassing the need to compute p-values. SMS maintained accurate FDR control across a wide range of simulation scenarios while achieving a substantial power gain, approximately 20%, over existing methods including HDMT, DACT, and DEI-B. Applications to a metabolomics dataset and a DNA methylation dataset further corroborated these findings. Notably, SMS discovered five plausible mediators in the metabolomics dataset that were missed by all existing methods considered.
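As background for the "maximum of the two association p-values" approach mentioned in the abstract, the sketch below combines each candidate mediator's two p-values by their maximum and then applies the Benjamini-Hochberg step-up procedure. This is a baseline illustration only, not SMS. The choice of Benjamini-Hochberg, the function names, and the p-values are all assumptions made for the example.

```python
def max_p(p_em, p_mo):
    """Per-mediator max-P statistic: the larger of the E-M and M-O p-values."""
    return [max(a, b) for a, b in zip(p_em, p_mo)]


def benjamini_hochberg(pvals, alpha=0.05):
    """Return sorted indices rejected by the BH step-up procedure at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value is at or below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Hypothetical p-values for four candidate mediators.
p_em = [0.001, 0.0004, 0.30, 0.002]   # exposure-mediator associations
p_mo = [0.002, 0.60, 0.001, 0.01]     # mediator-outcome associations

combined = max_p(p_em, p_mo)          # [0.002, 0.60, 0.30, 0.01]
print(benjamini_hochberg(combined))   # mediators 0 and 3 are selected
```

Mediators 1 and 2 each have one strong association but are not selected, because a mediator requires both. The max-P statistic is known to be conservative when both associations are null, which is one source of the power loss the abstract refers to.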