
Cross-Barcodes and their applications in data analysis

Organizers

Matthew Burfitt, Jingyan Li, Jie Wu, Nanjun Yang, Jiawei Zhou

Speaker

Serguei Barannikov

Time

January 9, 2025, 13:30 to 14:30

Venue

A3-2a-302

Online

Zoom 482 240 1589 (BIMSA)

Abstract

In this talk, we describe a topological framework for comparing data representations, called R-Cross-Barcodes [1], and discuss its use in data analysis. R-Cross-Barcodes measure multi-scale discrepancies between the topological structures of two point clouds that have a one-to-one correspondence between their points. They track discrepancies of topological features while taking their localization into account, and because the construction uses only the pairwise distances within each point cloud, data embeddings can be compared even when they lie in distinct ambient spaces. Based on the R-Cross-Barcode, we define the Representation Topology Divergence (RTD), a scalar quantifying the topological difference between two data representations. We review the construction and principal properties of R-Cross-Barcodes, including the exact sequence:

\[
\cdots \;\to\;
H_i\bigl(\mathrm{VR}_\alpha(\mathcal{G}^w)\bigr)
\;\to\;
H_i\bigl(\mathrm{VR}_\alpha(\mathcal{G}^{\min(w,\tilde{w})})\bigr)
\;\to\;
H_i\bigl(\mathrm{VR}_\alpha(\hat{\mathcal{G}}^{w,\tilde{w}})\bigr)
\;\to\;
H_{i-1}\bigl(\mathrm{VR}_\alpha(\mathcal{G}^w)\bigr)
\;\to\;
\cdots
\]

which relates the R-Cross-Barcodes to localization-aware discrepancies between the features of the standard barcodes.
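The exact sequence already yields something computable. Since $\mathrm{VR}_\alpha(\mathcal{G}^w)$ and $\mathrm{VR}_\alpha(\mathcal{G}^{\min(w,\tilde{w})})$ have the same vertices and the second has at least as many edges, the map between their $H_0$ groups is surjective. The image of $H_1(\mathrm{VR}_\alpha(\hat{\mathcal{G}}^{w,\tilde{w}}))$ in $H_0(\mathrm{VR}_\alpha(\mathcal{G}^w))$, which is the kernel of that map, therefore has dimension $\beta_0(\mathcal{G}^w_\alpha) - \beta_0(\mathcal{G}^{\min(w,\tilde{w})}_\alpha)$. The Python sketch below computes only this $H_0$ contribution. It is not the full R-Cross-Barcode or RTD of [1]. The function names are hypothetical, and it assumes Euclidean distances and the convention that an edge enters $\mathrm{VR}_\alpha$ when its weight is $\le \alpha$. Since only pairwise distances are used, the two clouds may live in spaces of different dimension.

```python
import numpy as np


def pairwise_distances(X: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix of an (N, d) point cloud; d may differ between clouds."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def h0_deaths(w: np.ndarray) -> list[float]:
    """Death times of the finite H_0 bars of the Vietoris-Rips filtration of w.

    Every vertex is born at 0 and a bar dies whenever Kruskal's algorithm
    merges two components, so the deaths are the minimum-spanning-tree weights.
    """
    n = w.shape[0]
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    deaths = []
    for weight, i, j in sorted((w[i, j], i, j) for i in range(n) for j in range(i + 1, n)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(float(weight))
    return deaths


def n_components(w: np.ndarray, alpha: float) -> int:
    """beta_0 of VR_alpha: N minus the number of MST edges of weight <= alpha."""
    return 1 + sum(d > alpha for d in h0_deaths(w))


def h0_kernel_dim(w: np.ndarray, w_tilde: np.ndarray, alpha: float) -> int:
    """dim ker(H_0(VR_alpha(G^w)) -> H_0(VR_alpha(G^min(w, w_tilde)))).

    The map is surjective (same vertices, more edges), so its kernel
    dimension is the drop in the number of connected components.
    """
    return n_components(w, alpha) - n_components(np.minimum(w, w_tilde), alpha)


def h0_kernel_total_length(w: np.ndarray, w_tilde: np.ndarray) -> float:
    """Total bar length of the kernel module, i.e. the integral of h0_kernel_dim over alpha."""
    return sum(h0_deaths(w)) - sum(h0_deaths(np.minimum(w, w_tilde)))


# Toy example: two clouds that pair up the same four points differently.
w = np.full((4, 4), 10.0)
np.fill_diagonal(w, 0.0)
w_tilde = w.copy()
w[0, 1] = w[1, 0] = w[2, 3] = w[3, 2] = 1.0                          # clusters {0,1}, {2,3}
w_tilde[0, 2] = w_tilde[2, 0] = w_tilde[1, 3] = w_tilde[3, 1] = 1.0  # clusters {0,2}, {1,3}

print(h0_deaths(w), h0_deaths(w_tilde))   # [1.0, 1.0, 10.0] [1.0, 1.0, 10.0]
print(h0_kernel_dim(w, w_tilde, 1.0))      # 1
print(h0_kernel_total_length(w, w_tilde))  # 9.0
```

In the toy example, both clouds have identical standard $H_0$ barcodes. However, they group the points differently, so the kernel is one-dimensional for $1 \le \alpha < 10$. This is a localization-aware discrepancy that comparing the two barcodes alone would miss. For real embeddings, `w` and `w_tilde` would be `pairwise_distances` of the two representations of the same $N$ points.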

We then incorporate RTD as a loss in deep auto-encoders to obtain topology-preserving data embeddings. Minimizing RTD aligns the topological features of the original dataset with those of the embedding. We explain the stability, continuity and differentiability properties of the RTD loss. Our experiments with neural network representation analysis demonstrate that R-Cross-Barcodes and RTD capture and preserve topological structure, providing a robust method for analyzing, comparing and optimizing complex data representations.
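The differentiability claim can be illustrated on a simpler quantity than RTD. The Python sketch below uses the total length of the finite $H_0$ bars of a single point cloud, which equals the length of its minimum spanning tree. The gradient is written by hand rather than with an autodiff framework, and the function names are hypothetical. This is not the RTD loss of [1], which is built from R-Cross-Barcodes of two representations. It does show the mechanism that typically makes barcode-based losses differentiable almost everywhere: every bar endpoint equals one pairwise distance, so gradients reach the coordinates only through those distances.

```python
import numpy as np


def mst_edges(X: np.ndarray) -> list[tuple[int, int]]:
    """Edges (i, j) of a Euclidean minimum spanning tree of X, found with Kruskal."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = []
    for _, i, j in sorted((d[i, j], i, j) for i in range(n) for j in range(i + 1, n)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges


def h0_total_persistence(X: np.ndarray) -> float:
    """Sum of the finite H_0 bar lengths of the VR filtration = total MST length."""
    return float(sum(np.linalg.norm(X[i] - X[j]) for i, j in mst_edges(X)))


def h0_total_persistence_grad(X: np.ndarray) -> np.ndarray:
    """Gradient w.r.t. the coordinates, valid where the MST is unique.

    Each bar dies at one pairwise distance |x_i - x_j|, whose gradient is
    the unit vector (x_i - x_j) / |x_i - x_j| for x_i and its negative for x_j.
    """
    grad = np.zeros_like(X)
    for i, j in mst_edges(X):
        diff = X[i] - X[j]
        unit = diff / np.linalg.norm(diff)
        grad[i] += unit
        grad[j] -= unit
    return grad


X = np.array([[0.0], [1.0], [3.0]])
print(h0_total_persistence(X))  # 3.0
print(h0_total_persistence_grad(X))
```

For the three points on a line, the gradient is $-1$, $0$ and $+1$. Pushing the outer points apart lengthens both bars, while moving the middle point leaves the total unchanged. Where two edge lengths tie, the spanning tree, and hence the gradient, can jump. This is the usual reason such losses are differentiable only almost everywhere. In an auto-encoder, the same quantity would normally be computed inside an autodiff framework instead of differentiating by hand.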

About the Speaker

Prof. Serguei Barannikov earned his Ph.D. from UC Berkeley and has made contributions to algebraic topology, algebraic geometry, mathematical physics, and machine learning. Before his Ph.D., he introduced canonical forms of filtered complexes, now known as persistence barcodes, which have become fundamental in topological data analysis. More recently, he has applied topological methods to machine learning, particularly to the study of large language models, with results published at leading ML conferences such as NeurIPS, ICML, and ICLR, bridging pure mathematics and advanced AI research.