Detecting the Signatures of Adaptive Evolution in Protein‐Coding Genes

Joseph P. Bielawski1

1 Department of Biology, Department of Mathematics & Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
Publication Name: Current Protocols in Molecular Biology
Unit Number: Unit 19.1
DOI: 10.1002/0471142727.mb1901s101
Online Posting Date: January 2013
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


The field of molecular evolution, which includes genome evolution, is devoted to finding variation within and between groups of organisms and explaining the processes responsible for generating this variation. Many DNA changes are believed to have little to no functional effect, and a neutral process will best explain their evolution. Thus, a central task is to discover which changes had positive fitness consequences and were subject to Darwinian natural selection during the course of evolution. Due to the size and complexity of modern molecular datasets, the field has come to rely extensively on statistical modeling techniques to meet this analytical challenge. For DNA sequences that encode proteins, one of the most powerful approaches is to employ a statistical model of codon evolution. This unit provides a general introduction to the practice of modeling codon evolution using the statistical framework of maximum likelihood. Four real‐data analysis activities are used to illustrate the principles of parameter estimation, robustness, hypothesis testing, and site classification. Each activity includes an explicit analytical protocol based on programs provided by the Phylogenetic Analysis by Maximum Likelihood (PAML) package. Curr. Protoc. Mol. Biol. 101:19.1.1–19.1.21. © 2013 by John Wiley & Sons, Inc.

Keywords: molecular evolution; protein evolution; selection pressure; codon models; maximum likelihood
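
As an illustration of the hypothesis-testing step mentioned in the abstract, the following is a minimal sketch of a likelihood ratio test between two nested models fitted by maximum likelihood. It is not taken from the unit: the function names are hypothetical, the log-likelihood values are made up for the example, and the use of a chi-square reference distribution is an assumption that may not hold for every model comparison (for example, when a parameter sits on a boundary).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (df = 1 or 2 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # For df = 2, P(X > x) = exp(-x / 2).
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (test statistic, approximate p-value) for two nested models.

    lnl_null and lnl_alt are maximized log-likelihoods; df is the
    difference in the number of free parameters (assumed here).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p_value = likelihood_ratio_test(-1000.0, -995.0, df=2)
    print(f"2*delta(lnL) = {stat:.3f}, approximate p = {p_value:.5f}")
```

In practice the two log-likelihoods would be read from the output of the fitted null and alternative models, and the degrees of freedom would follow from the difference in their parameter counts.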


Table of Contents

  • Introduction
  • Codon Modeling Activities Using the CODEML Program
  • Concluding Remarks
  • Literature Cited
  • Figures
  • Tables




Supplementary Materials

Codon usage bias in amy2 gene of D. melanogaster and D. pseudoobscura: Supplementary_FigureS1.pptx

The supporting files for Activities 1 through 4 for codon modeling activities using the CODEML program: