Asymmetric Natural Vector Method for Predicting Ambiguous Non-standard Base Codes

Organizer

Stephen S.-T. Yau

Speaker

Guoqing Hu

Time

Monday, October 21, 2024 9:00 PM - 9:30 PM

Venue

Online

Abstract

In this report, we introduce a novel approach based on the Asymmetric Natural Vector (ANV) method to address the problem of ambiguity in DNA sequences. We propose using ANV to predict the bases represented by non-standard codes in DNA sequences. Our approach involves developing a deep learning framework to establish a correspondence between DNA sequences (in FASTA format) and natural vectors, which encode relevant sequence properties. By training on a large dataset, we learn the distribution of these ambiguous base codes within the dataset. This method allows us to accurately predict masked or ambiguous bases in genomic fragments. It is particularly applicable to datasets such as the COVID-19 genome data, which contain numerous non-standard base codes like R, Y, S, W, K, M, B, D, H, and V. By employing our algorithm, we can effectively estimate the corresponding standard bases and assign a confidence score to each prediction, aiding in the resolution of sequencing uncertainties.
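
For readers unfamiliar with the notation, the following Python sketch illustrates two background ingredients mentioned in the abstract. It is not the speaker's ANV method or deep learning framework. It lists the standard IUPAC meaning of each ambiguity code (N, which the abstract does not list, is included as an assumption that such positions may also be of interest), and it computes the classical 12-dimensional natural vector (per-base count, mean position, and normalized second central moment) rather than the asymmetric variant presented in the talk. The function names are hypothetical.

```python
# IUPAC nucleotide ambiguity codes and the standard bases each may stand for.
IUPAC_CANDIDATES = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def ambiguous_positions(seq):
    """Return (index, code, candidate bases) for each ambiguity code in seq."""
    seq = seq.upper()
    return [
        (i, c, IUPAC_CANDIDATES[c])
        for i, c in enumerate(seq)
        if c in IUPAC_CANDIDATES
    ]


def natural_vector(seq):
    """Classical natural vector of a DNA sequence (illustrative only).

    For each base k in A, C, G, T this returns the count n_k, the mean
    1-based position mu_k, and the moment sum((p - mu_k)^2) / (n_k * n),
    where n is the sequence length. Ambiguous symbols are simply skipped,
    which is an assumption of this sketch.
    """
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in "ACGT":
        positions = [i + 1 for i, c in enumerate(seq) if c == base]
        n_k = len(positions)
        if n_k == 0:
            # Base does not occur: use zeros for all three components.
            counts.append(0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(positions) / n_k
        d2 = sum((p - mu) ** 2 for p in positions) / (n_k * n)
        counts.append(n_k)
        means.append(mu)
        moments.append(d2)
    return counts + means + moments
```

Calling `ambiguous_positions` on a genomic fragment gives, for each ambiguous site, the restricted set of standard bases a predictor would have to choose among. A model such as the one described in the talk would then rank those candidates and attach a confidence score.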

Speaker Intro

Researcher at the Beijing Institute of Mathematical Sciences and Applications
Xi'an Jiaotong University, Bachelor's and Master's in Computational Mathematics.
University of Illinois at Chicago (UIC), PhD in Computer Science. Research direction: nonlinear filter control. Supervisor: Stephen S.-T. Yau (丘成栋).
After graduation, he worked mainly in the field of wireless communications in the United States, serving as a senior engineer at Lucent, Alcatel-Lucent, Nokia, and other companies.
He joined the Beijing Institute of Mathematical Sciences and Applications (BIMSA) in January 2024 and is currently engaged in research on neural networks, artificial intelligence, big data, machine learning, and biomathematics.