Dr. Fu introduces a deep graph attention autoencoder that detects hierarchical communities within gene regulatory networks, revealing organization across scales.
Dr. Wen presents a general probabilistic framework that takes any initial clustering result and systematically explores nested, multiresolution cluster structures, reconciling inconsistencies and recovering interpretable patterns in genetic and spatial transcriptomics data.
Dr. Hu addresses a complementary question: once groups or features are identified, how do we robustly test which ones act as mediators linking exposures to outcomes? Her symmetric mediation statistics provide powerful, FDR-controlled inference for high-dimensional omics mediators.
Together, the three talks move from detecting hierarchical network communities, to unifying and reconciling multiscale clusters, to pinpointing causal mediators among high-dimensional molecular features. This progression highlights how modern statistical learning can extract richer, more reliable biological insights from complex data.
| Weekday | Time | Venue | Online | ID | Password |
|---|---|---|---|---|---|
| Friday | 15:00 - 17:00 | A3-4-301 | ZOOM 08 | 787 662 9899 | BIMSA |

| Time \ Date | 07-03 Fri |
|---|---|
| 15:00-15:00 | 邬荣领 |
| 15:00-15:40 | 傅秋燕 |
| 15:40-16:20 | 温晓泉 |
| 16:20-17:00 | 胡懿娟 |
*All times on this page are in Beijing Time (GMT+8).
15:00-15:40 傅秋燕
Deep learning-based hierarchical community detection for high-dimensional gene regulatory networks
Reconstructing genome-wide gene regulatory networks (GRNs) from genomic data is challenging due to high dimensionality and complexity. We propose a hierarchical model with three layers: individual genes at the bottom, gene communities in the middle, and communities of communities at the top, revealing patterns at different scales. We developed DeepHCD, a deep learning algorithm using a graph attention autoencoder to learn low-dimensional embeddings and infer community structures top-down. DeepHCD minimizes a multitask loss function encompassing graph reconstruction, attribute reconstruction, clustering, and modularity, requiring only rough upper bounds for community numbers at each level. Simulations across diverse network types demonstrate DeepHCD's superior performance in detecting middle-layer communities using homogeneity and completeness metrics. Applied to single-cell regulon activity data (243 regulons, 30,000+ cells), DeepHCD outperforms existing methods, producing clearer community structures with the highest intra-group correlations.
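The abstract evaluates detected communities with homogeneity and completeness. As a rough illustration of what those two metrics measure, here is a minimal sketch using the standard entropy-based definitions. It is not the DeepHCD evaluation code; the function names and the toy labels are assumptions made for this example.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """H(target | given), estimated from paired label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def homogeneity(truth, pred):
    """1 when every predicted community contains members of one true class."""
    h = _entropy(truth)
    return 1.0 if h == 0 else 1.0 - _conditional_entropy(truth, pred) / h


def completeness(truth, pred):
    """1 when all members of a true class land in the same community."""
    h = _entropy(pred)
    return 1.0 if h == 0 else 1.0 - _conditional_entropy(pred, truth) / h


# Hypothetical toy example: two true communities, each split in two.
truth = [0, 0, 1, 1]
pred = [0, 1, 2, 3]
hom = homogeneity(truth, pred)
com = completeness(truth, pred)
```

In this toy case the over-split partition is perfectly homogeneous but only half complete, which is why the two metrics are usually reported together.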
15:00-15:00 邬荣领
Opening Remarks
15:40-16:20 温晓泉
Probabilistic multiresolution clustering
Cluster analysis is a widely used unsupervised learning technique in genomics, with applications ranging from inferring genetic population structure to identifying spatial domains in spatial transcriptomics (ST) data. However, existing clustering methods often yield inconsistent results and typically focus on identifying a single optimal partition, overlooking the intrinsic relationships among the inferred clusters. In this work, we introduce a computational framework for systematically exploring multiresolution clustering structures in scientific data, starting from an initial configuration generated by any existing clustering algorithm. The proposed framework provides a unified and principled approach for uncovering complex nested latent structures and reconciling discrepancies among clustering results. Through simulations and applications to large-scale, high-dimensional genetic and spatial transcriptomics data, we demonstrate the framework's ability to recover interpretable clustering patterns and reveal biologically meaningful multiresolution structures.
16:20-17:00 胡懿娟
SMS: Symmetric Mediation Statistics for Powerful High-Dimensional Mediation Analysis
Mediation analysis of high-dimensional features, particularly molecular-level omics features, provides important opportunities to uncover biological mechanisms underlying human health and disease. However, two central statistical challenges remain: testing the composite null hypothesis and maintaining power when the exposure-mediator (E-M) and mediator-outcome (M-O) associations differ substantially in statistical significance. Existing methods typically rely on accurate estimation of the proportions of the three null types or on the maximum of the two association p-values; they may not always control the FDR well and may have limited power under imbalanced significance. We propose SMS, a new statistical framework based on symmetric mediation statistics. By exploiting symmetry, SMS calibrates the rejection threshold for FDR control under the composite null as a whole. It also allows flexible combinations of the two p-values corresponding to the E-M and M-O associations, including the maximum, which in turn enables an omnibus test. Moreover, it permits direct use of effect size estimates, bypassing the need to compute p-values. SMS maintained accurate FDR control across a wide range of simulation scenarios while achieving a substantial power gain, approximately 20%, over existing methods including HDMT, DACT, and DEI-B. Applications to a metabolomics dataset and a DNA methylation dataset further corroborated these findings. Notably, SMS discovered five plausible mediators in the metabolomics dataset that were missed by all existing methods considered.
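To make the composite null concrete, the sketch below shows the conventional baseline the abstract refers to: take the maximum of the E-M and M-O p-values for each candidate mediator, then apply the Benjamini-Hochberg step-up procedure. This is not SMS, whose calibration is not described here. The function names and p-values are hypothetical and chosen only for illustration.

```python
def max_p(p_em, p_mo):
    """Combine per-mediator p-values by taking the larger of the two.

    A mediator is non-null only if BOTH associations are non-null, so the
    larger p-value is the one that limits the evidence for mediation.
    """
    return [max(a, b) for a, b in zip(p_em, p_mo)]


def benjamini_hochberg(pvals, alpha=0.05):
    """Return sorted indices rejected by the BH step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value falls under its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Hypothetical p-values for four candidate mediators.
p_em = [0.001, 0.0001, 0.5, 0.002]  # exposure-mediator associations
p_mo = [0.002, 0.4, 0.001, 0.003]   # mediator-outcome associations

combined = max_p(p_em, p_mo)
selected = benjamini_hochberg(combined, alpha=0.05)
```

Here mediators 1 and 2 each have one very small p-value but are not selected, because the other association shows no signal. The maximum-based rule is known to be conservative in this setting, which is the power limitation the abstract describes.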