BIMSA
Seminar on Bioinformatics
An Alignment-Free Method for Detection of Missing Regions for Phylogenetic Analysis
Organizer
Speaker
Mengcen Guan
Time
Sunday, April 23, 2023 10:30 AM - 11:00 AM
Venue
Online
Abstract
Phylogenetic tree estimation from genomic sequences is a fundamental problem in computational biology. Conventional approaches to phylogenetic analysis often require pairwise or multiple sequence alignment as a prerequisite. However, sequence alignment scales poorly and loses accuracy for long sequences such as whole genomes, for sequences with low identity, and in the presence of genomic rearrangements. Alignment-free approaches have been proposed to address these issues. While these methods have shown promising results, many of them produce errors when regions are missing from the sequences of one or more species, a situation that alignment-based methods detect trivially. Here, we present an alignment-free method for detecting missing regions in the sequences of species whose phylogeny is to be estimated. It is based on k-mer counts and can be used to filter out k-mers that belong to regions present in one species but missing in one or more of the others. In experiments with real and simulated datasets containing missing regions, the method successfully detects a large fraction of such k-mers and can improve the estimated phylogenies. It can be incorporated into k-mer-based alignment-free phylogeny estimation methods to filter out k-mers corresponding to missing regions.
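The abstract does not spell out the detection procedure, so the following is only a minimal illustrative sketch of the general idea of filtering k-mers by counts across species. It assumes a deliberately naive rule (keep a k-mer only if it occurs in at least `min_species` of the input sequences), which is not the speaker's actual method. The function names `kmer_counts` and `filter_unshared_kmers` and the toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def filter_unshared_kmers(sequences, k, min_species=None):
    """Drop k-mers that occur in fewer than min_species sequences.

    sequences: dict mapping a species name to its sequence string.
    min_species: defaults to the number of species, i.e. a k-mer must
    occur in every species to be kept (an assumed, naive criterion).
    """
    if min_species is None:
        min_species = len(sequences)

    counts = {name: kmer_counts(seq, k) for name, seq in sequences.items()}

    # Number of species in which each k-mer occurs at least once.
    presence = Counter()
    for species_counts in counts.values():
        presence.update(species_counts.keys())

    return {
        name: Counter({kmer: n for kmer, n in species_counts.items()
                       if presence[kmer] >= min_species})
        for name, species_counts in counts.items()
    }


# Toy data: species B lacks the trailing "TTTT" region found in species A.
toy = {"A": "ACGTACGTTTTT", "B": "ACGTACGT"}
filtered = filter_unshared_kmers(toy, k=4)
```

With these toy inputs, the k-mers that overlap the region found only in species A are removed from A's profile, and the two filtered profiles can then be passed to a k-mer-based distance measure. A real dataset would need a more tolerant criterion, since mutations also make k-mers disappear from a species even when no region is missing.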