Widely used models of amino acid sequence evolution, such as the WAG and JTT models, are based on empirically derived rate matrices that capture the exchangeabilities and frequencies of amino acids observed in large databases of alignments. For phylogenetic estimation, these matrices are typically combined with a parametric model of rate variation among sites. However, these models ignore important effects in the evolution of real protein sequences: changes in the rate of evolution at a position over the tree, changes in amino acid frequencies in different lineages, and site-specific amino acid exchangeabilities and frequencies that depend on the 3D structural and functional context of the site in the protein. Here I provide an overview of several models we have developed to capture these biologically important effects. We have addressed the problem of rate changes over lineages by implementing a general covarion model for amino acid sequences in the maximum likelihood framework. We have approached the problem of site-specific substitution processes in two ways. First, we have implemented a “mixture of frequencies” model that captures the 3 dominant amino acid frequency patterns in a set of 21 taxon-rich alignments. Second, we have implemented an independence energy model that uses statistical energy potentials from the 3D structure of the protein of interest to construct site-specific substitution matrices. Likelihood ratio tests indicate that all three of these models fit real protein sequence data significantly better than the standard empirical amino acid substitution models. Furthermore, some of these more biologically realistic substitution models avoid phylogenetic estimation biases that can arise if these important effects are ignored.
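For readers unfamiliar with the likelihood ratio tests mentioned above, the following is a minimal sketch of the generic calculation for two nested models, not the analysis pipeline used in this work. The function names and the log-likelihood values are hypothetical, and the sketch assumes the usual chi-square approximation, which may be only approximate when the simpler model lies on a boundary of the richer model's parameter space (as can happen with mixture weights).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the closed-form finite sums for integer degrees of freedom,
    so only the standard library is needed.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum
    # with terms x^{(2r-1)/2} / (1 * 3 * ... * (2r-1)).
    total = 0.0
    term = math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density_factor = 2.0 * math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(math.sqrt(half)) + density_factor * total


def likelihood_ratio_test(lnl_null, lnl_alt, extra_params):
    """Return (statistic, p_value) for nested models.

    lnl_null: maximised log-likelihood of the simpler model
    lnl_alt: maximised log-likelihood of the richer model
    extra_params: number of additional free parameters in the richer model
    """
    # Clamp at zero in case optimisation noise makes the richer model
    # appear marginally worse than the nested one.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, extra_params=2)
print(f"LRT statistic = {stat:.2f}, p = {p:.4g}")
```

A small p-value here would suggest that the extra parameters of the richer model improve the fit by more than expected by chance under the simpler model.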
Related Links
* http://www.rogerlab.biochem.dal.ca - Andrew Roger Homepage
* http://www.mscs.dal.ca/Faculty/susko.html - Edward Susko Homepage