Phylogenetic analysis is the study of evolutionary relationships among organisms. To this end, phylogenetic trees, or evolutionary trees, are used to depict the evolutionary relationships between organisms, as reconstructed from DNA sequence data. The likelihood of a given tree is commonly calculated for many purposes, including finding the tree that provides the best fit to the data, sampling from the space of likely trees, and inferring other parameters governing the evolutionary process. This is done using Felsenstein’s algorithm, a widely implemented dynamic programming approach that reduces the computational complexity from exponential to linear in number of taxa. However, with the advent of efficient modern sequencing techniques the size of data sets are rapidly increasing beyond current computational capability.
Parallel computing has been used successfully to address many similar problems and is currently receiving attention in the realm of phylogenetic analysis. Work has been done using data decomposition, where the likelihood calculation is parallelised over DNA sequence sites. We propose an alternative way of parallelising the likelihood calculation, where the tree is broken down into sub-trees and the likelihood of each sub-tree is calculated in parallel. We present an overview of phylogenetic trees and how Felsenstein’s algorithm is used to calculate the likelihood of these trees. Then we analyse the two proposed parallel likelihood calculation methods, which aim at drastically increasing the size of trees that can be use in phylogenetic analysis.
Peter Hayward is a MSc Computer Science student at the Bioinformatics Research Group, Stellenbosch University.